Message ID | 20250228161824.3164826-1-mathias.nyman@linux.intel.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | [RFT] xhci: Handle spurious events on Etron host isoc enpoints | expand |
Hi Mathias, kernel test robot noticed the following build errors: [auto build test ERROR on usb/usb-testing] [also build test ERROR on usb/usb-next usb/usb-linus linus/master v6.14-rc4 next-20250228] [If your patch is applied to the wrong git tree, kindly drop us a note. And when submitting patch, we suggest to use '--base' as documented in https://git-scm.com/docs/git-format-patch#_base_tree_information] url: https://github.com/intel-lab-lkp/linux/commits/Mathias-Nyman/xhci-Handle-spurious-events-on-Etron-host-isoc-enpoints/20250301-001842 base: https://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git usb-testing patch link: https://lore.kernel.org/r/20250228161824.3164826-1-mathias.nyman%40linux.intel.com patch subject: [RFT PATCH] xhci: Handle spurious events on Etron host isoc enpoints config: sh-randconfig-002-20250301 (https://download.01.org/0day-ci/archive/20250301/202503010346.46nbVSmT-lkp@intel.com/config) compiler: sh4-linux-gcc (GCC) 14.2.0 reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250301/202503010346.46nbVSmT-lkp@intel.com/reproduce) If you fix the issue in a separate patch/commit (i.e. not just a new version of the same patch/commit), kindly add following tags | Reported-by: kernel test robot <lkp@intel.com> | Closes: https://lore.kernel.org/oe-kbuild-all/202503010346.46nbVSmT-lkp@intel.com/ All error/warnings (new ones prefixed by >>): drivers/usb/host/xhci-ring.c: In function 'xhci_spurious_success_tx_event': >> drivers/usb/host/xhci-ring.c:2650:21: error: 'struct xhci_ring' has no member named 'old_trb_comp_code' 2650 | switch (ring->old_trb_comp_code) { | ^~ In file included from include/linux/device.h:15, from include/linux/dma-mapping.h:5, from drivers/usb/host/xhci-ring.c:59: drivers/usb/host/xhci-ring.c: In function 'handle_tx_event': drivers/usb/host/xhci-ring.c:2717:60: error: 'struct xhci_ring' has no member named 'old_trb_comp_code' 2717 | slot_id, ep_index, ep_ring->old_trb_comp_code); | ^~ include/linux/dev_printk.h:139:56: note: in definition of macro 'dev_no_printk' 139 | _dev_printk(level, dev, fmt, ##__VA_ARGS__); \ | ^~~~~~~~~~~ drivers/usb/host/xhci.h:1733:9: note: in expansion of macro 'dev_dbg' 1733 | dev_dbg(xhci_to_hcd(xhci)->self.controller , fmt , ## args) | ^~~~~~~ drivers/usb/host/xhci-ring.c:2716:25: note: in expansion of macro 'xhci_dbg' 2716 | xhci_dbg(xhci, "Successful completion on short TX for slot %u ep %u with last td comp code %d\n", | ^~~~~~~~ drivers/usb/host/xhci-ring.c:2913:77: error: 'struct xhci_ring' has no member named 'old_trb_comp_code' 2913 | &ep_trb_dma, trb_comp_code, ep_ring->old_trb_comp_code); | ^~ include/linux/dev_printk.h:139:56: note: in definition of macro 'dev_no_printk' 139 | _dev_printk(level, dev, fmt, ##__VA_ARGS__); \ | ^~~~~~~~~~~ drivers/usb/host/xhci.h:1733:9: note: in expansion of macro 'dev_dbg' 1733 | dev_dbg(xhci_to_hcd(xhci)->self.controller , fmt , ## args) | ^~~~~~~ drivers/usb/host/xhci-ring.c:2912:33: note: in expansion of macro 'xhci_dbg' 2912 | xhci_dbg(xhci, "Spurious event dma %pad, comp_code %u after %u\n", | ^~~~~~~~ drivers/usb/host/xhci-ring.c:2914:40: error: 'struct xhci_ring' has no member named 'old_trb_comp_code' 2914 | ep_ring->old_trb_comp_code = trb_comp_code; | ^~ drivers/usb/host/xhci-ring.c:2942:16: error: 'struct xhci_ring' has no member named 'old_trb_comp_code' 2942 | ep_ring->old_trb_comp_code = trb_comp_code; | ^~ drivers/usb/host/xhci-ring.c: In function 'xhci_spurious_success_tx_event': >> drivers/usb/host/xhci-ring.c:2661:1: warning: control reaches end of non-void function [-Wreturn-type] 2661 | } | ^ vim +2650 drivers/usb/host/xhci-ring.c 2646 2647 static bool xhci_spurious_success_tx_event(struct xhci_hcd *xhci, 2648 struct xhci_ring *ring) 2649 { > 2650 switch (ring->old_trb_comp_code) { 2651 case COMP_SHORT_PACKET: 2652 return xhci->quirks & XHCI_SPURIOUS_SUCCESS; 2653 case COMP_USB_TRANSACTION_ERROR: 2654 case COMP_BABBLE_DETECTED_ERROR: 2655 case COMP_ISOCH_BUFFER_OVERRUN: 2656 return xhci->quirks & XHCI_ETRON_HOST && 2657 ring->type == TYPE_ISOC; 2658 default: 2659 return false; 2660 } > 2661 } 2662
Hi Mathias, kernel test robot noticed the following build errors: [auto build test ERROR on usb/usb-testing] [also build test ERROR on usb/usb-next usb/usb-linus linus/master v6.14-rc4 next-20250228] [If your patch is applied to the wrong git tree, kindly drop us a note. And when submitting patch, we suggest to use '--base' as documented in https://git-scm.com/docs/git-format-patch#_base_tree_information] url: https://github.com/intel-lab-lkp/linux/commits/Mathias-Nyman/xhci-Handle-spurious-events-on-Etron-host-isoc-enpoints/20250301-001842 base: https://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git usb-testing patch link: https://lore.kernel.org/r/20250228161824.3164826-1-mathias.nyman%40linux.intel.com patch subject: [RFT PATCH] xhci: Handle spurious events on Etron host isoc enpoints config: hexagon-randconfig-002-20250301 (https://download.01.org/0day-ci/archive/20250301/202503010302.yhUVRgse-lkp@intel.com/config) compiler: clang version 21.0.0git (https://github.com/llvm/llvm-project 14170b16028c087ca154878f5ed93d3089a965c6) reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250301/202503010302.yhUVRgse-lkp@intel.com/reproduce) If you fix the issue in a separate patch/commit (i.e. not just a new version of the same patch/commit), kindly add following tags | Reported-by: kernel test robot <lkp@intel.com> | Closes: https://lore.kernel.org/oe-kbuild-all/202503010302.yhUVRgse-lkp@intel.com/ All errors (new ones prefixed by >>): >> drivers/usb/host/xhci-ring.c:2650:16: error: no member named 'old_trb_comp_code' in 'struct xhci_ring' 2650 | switch (ring->old_trb_comp_code) { | ~~~~ ^ drivers/usb/host/xhci-ring.c:2717:34: error: no member named 'old_trb_comp_code' in 'struct xhci_ring' 2717 | slot_id, ep_index, ep_ring->old_trb_comp_code); | ~~~~~~~ ^ drivers/usb/host/xhci.h:1733:56: note: expanded from macro 'xhci_dbg' 1733 | dev_dbg(xhci_to_hcd(xhci)->self.controller , fmt , ## args) | ^~~~ include/linux/dev_printk.h:165:39: note: expanded from macro 'dev_dbg' 165 | dynamic_dev_dbg(dev, dev_fmt(fmt), ##__VA_ARGS__) | ^~~~~~~~~~~ include/linux/dynamic_debug.h:274:19: note: expanded from macro 'dynamic_dev_dbg' 274 | dev, fmt, ##__VA_ARGS__) | ^~~~~~~~~~~ include/linux/dynamic_debug.h:250:59: note: expanded from macro '_dynamic_func_call' 250 | _dynamic_func_call_cls(_DPRINTK_CLASS_DFLT, fmt, func, ##__VA_ARGS__) | ^~~~~~~~~~~ include/linux/dynamic_debug.h:248:65: note: expanded from macro '_dynamic_func_call_cls' 248 | __dynamic_func_call_cls(__UNIQUE_ID(ddebug), cls, fmt, func, ##__VA_ARGS__) | ^~~~~~~~~~~ include/linux/dynamic_debug.h:224:15: note: expanded from macro '__dynamic_func_call_cls' 224 | func(&id, ##__VA_ARGS__); \ | ^~~~~~~~~~~ drivers/usb/host/xhci-ring.c:2913:44: error: no member named 'old_trb_comp_code' in 'struct xhci_ring' 2913 | &ep_trb_dma, trb_comp_code, ep_ring->old_trb_comp_code); | ~~~~~~~ ^ drivers/usb/host/xhci.h:1733:56: note: expanded from macro 'xhci_dbg' 1733 | dev_dbg(xhci_to_hcd(xhci)->self.controller , fmt , ## args) | ^~~~ include/linux/dev_printk.h:165:39: note: expanded from macro 'dev_dbg' 165 | dynamic_dev_dbg(dev, dev_fmt(fmt), ##__VA_ARGS__) | ^~~~~~~~~~~ include/linux/dynamic_debug.h:274:19: note: expanded from macro 'dynamic_dev_dbg' 274 | dev, fmt, ##__VA_ARGS__) | ^~~~~~~~~~~ include/linux/dynamic_debug.h:250:59: note: expanded from macro '_dynamic_func_call' 250 | _dynamic_func_call_cls(_DPRINTK_CLASS_DFLT, fmt, func, ##__VA_ARGS__) | ^~~~~~~~~~~ include/linux/dynamic_debug.h:248:65: note: expanded from macro '_dynamic_func_call_cls' 248 | __dynamic_func_call_cls(__UNIQUE_ID(ddebug), cls, fmt, func, ##__VA_ARGS__) | ^~~~~~~~~~~ include/linux/dynamic_debug.h:224:15: note: expanded from macro '__dynamic_func_call_cls' 224 | func(&id, ##__VA_ARGS__); \ | ^~~~~~~~~~~ drivers/usb/host/xhci-ring.c:2914:14: error: no member named 'old_trb_comp_code' in 'struct xhci_ring' 2914 | ep_ring->old_trb_comp_code = trb_comp_code; | ~~~~~~~ ^ drivers/usb/host/xhci-ring.c:2942:11: error: no member named 'old_trb_comp_code' in 'struct xhci_ring' 2942 | ep_ring->old_trb_comp_code = trb_comp_code; | ~~~~~~~ ^ 5 errors generated. vim +2650 drivers/usb/host/xhci-ring.c 2646 2647 static bool xhci_spurious_success_tx_event(struct xhci_hcd *xhci, 2648 struct xhci_ring *ring) 2649 { > 2650 switch (ring->old_trb_comp_code) { 2651 case COMP_SHORT_PACKET: 2652 return xhci->quirks & XHCI_SPURIOUS_SUCCESS; 2653 case COMP_USB_TRANSACTION_ERROR: 2654 case COMP_BABBLE_DETECTED_ERROR: 2655 case COMP_ISOCH_BUFFER_OVERRUN: 2656 return xhci->quirks & XHCI_ETRON_HOST && 2657 ring->type == TYPE_ISOC; 2658 default: 2659 return false; 2660 } 2661 } 2662
Hi, Thanks for the patch. Mathias Nyman <mathias.nyman@linux.intel.com> 於 2025年3月1日 週六 上午12:17寫道: > > Unplugging a USB3.0 webcam from Etron hosts while streaming results > in errors like this: > > [ 2.646387] xhci_hcd 0000:03:00.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 18 comp_code 13 > [ 2.646446] xhci_hcd 0000:03:00.0: Looking for event-dma 000000002fdf8630 trb-start 000000002fdf8640 trb-end 000000002fdf8650 > [ 2.646560] xhci_hcd 0000:03:00.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 18 comp_code 13 > [ 2.646568] xhci_hcd 0000:03:00.0: Looking for event-dma 000000002fdf8660 trb-start 000000002fdf8670 trb-end 000000002fdf8670 > > Etron xHC generates two transfer events for the TRB if an error is > detected while processing the last TRB of an isoc TD. > > The first event can be any sort of error (like USB Transaction or > Babble Detected, etc), and the final event is Success. > > The xHCI driver will handle the TD after the first event and remove it > from its internal list, and then print an "Transfer event TRB DMA ptr > not part of current TD" error message after the final event. > > Commit 5372c65e1311 ("xhci: process isoc TD properly when there was a > transaction error mid TD.") is designed to address isoc transaction > errors, but unfortunately it doesn't account for this scenario. > > This issue is similar to the XHCI_SPURIOUS_SUCCESS case where a success > event follows a 'short transfer' event, but the TD the event points to > is already given back. > > Expand the spurious success 'short transfer' event handling to cover > the spurious success after error on Etron hosts. > > Kuangyi Chiang reported this issue and submitted a different solution > based on using error_mid_td. This commit message is mostly taken > from that patch. > > Reported-by: Kuangyi Chiang <ki.chiang65@gmail.com> > Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com> > --- > drivers/usb/host/xhci-ring.c | 36 +++++++++++++++++++++++++----------- > drivers/usb/host/xhci.h | 2 +- > 2 files changed, 26 insertions(+), 12 deletions(-) > > diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c > index 965bffce301e..3d3e6cd69019 100644 > --- a/drivers/usb/host/xhci-ring.c > +++ b/drivers/usb/host/xhci-ring.c > @@ -2644,6 +2644,22 @@ static int handle_transferless_tx_event(struct xhci_hcd *xhci, struct xhci_virt_ > return 0; > } > > +static bool xhci_spurious_success_tx_event(struct xhci_hcd *xhci, > + struct xhci_ring *ring) > +{ > + switch (ring->old_trb_comp_code) { > + case COMP_SHORT_PACKET: > + return xhci->quirks & XHCI_SPURIOUS_SUCCESS; > + case COMP_USB_TRANSACTION_ERROR: > + case COMP_BABBLE_DETECTED_ERROR: > + case COMP_ISOCH_BUFFER_OVERRUN: > + return xhci->quirks & XHCI_ETRON_HOST && > + ring->type == TYPE_ISOC; > + default: > + return false; > + } > +} > + > /* > * If this function returns an error condition, it means it got a Transfer > * event with a corrupted Slot ID, Endpoint ID, or TRB DMA address. > @@ -2697,8 +2713,8 @@ static int handle_tx_event(struct xhci_hcd *xhci, > case COMP_SUCCESS: > if (EVENT_TRB_LEN(le32_to_cpu(event->transfer_len)) != 0) { > trb_comp_code = COMP_SHORT_PACKET; > - xhci_dbg(xhci, "Successful completion on short TX for slot %u ep %u with last td short %d\n", > - slot_id, ep_index, ep_ring->last_td_was_short); > + xhci_dbg(xhci, "Successful completion on short TX for slot %u ep %u with last td comp code %d\n", > + slot_id, ep_index, ep_ring->old_trb_comp_code); > } > break; > case COMP_SHORT_PACKET: > @@ -2846,7 +2862,7 @@ static int handle_tx_event(struct xhci_hcd *xhci, > */ > if (trb_comp_code != COMP_STOPPED && > trb_comp_code != COMP_STOPPED_LENGTH_INVALID && > - !ep_ring->last_td_was_short) { > + !xhci_spurious_success_tx_event(xhci, ep_ring)) { > xhci_warn(xhci, "Event TRB for slot %u ep %u with no TDs queued\n", > slot_id, ep_index); > } > @@ -2890,11 +2906,12 @@ static int handle_tx_event(struct xhci_hcd *xhci, > > /* > * Some hosts give a spurious success event after a short > - * transfer. Ignore it. > + * transfer or error on last TRB. Ignore it. > */ > - if ((xhci->quirks & XHCI_SPURIOUS_SUCCESS) && > - ep_ring->last_td_was_short) { > - ep_ring->last_td_was_short = false; > + if (xhci_spurious_success_tx_event(xhci, ep_ring)) { > + xhci_dbg(xhci, "Spurious event dma %pad, comp_code %u after %u\n", > + &ep_trb_dma, trb_comp_code, ep_ring->old_trb_comp_code); > + ep_ring->old_trb_comp_code = trb_comp_code; > return 0; > } > > @@ -2922,10 +2939,7 @@ static int handle_tx_event(struct xhci_hcd *xhci, > */ > } while (ep->skip); > > - if (trb_comp_code == COMP_SHORT_PACKET) > - ep_ring->last_td_was_short = true; > - else > - ep_ring->last_td_was_short = false; > + ep_ring->old_trb_comp_code = trb_comp_code; > > ep_trb = &ep_seg->trbs[(ep_trb_dma - ep_seg->dma) / sizeof(*ep_trb)]; > trace_xhci_handle_transfer(ep_ring, (struct xhci_generic_trb *) ep_trb, ep_trb_dma); > diff --git a/drivers/usb/host/xhci.h b/drivers/usb/host/xhci.h > index 8c164340a2c3..c75c2c12ce53 100644 > --- a/drivers/usb/host/xhci.h > +++ b/drivers/usb/host/xhci.h > @@ -1371,7 +1371,7 @@ struct xhci_ring { > unsigned int num_trbs_free; /* used only by xhci DbC */ > unsigned int bounce_buf_len; > enum xhci_ring_type type; > - bool last_td_was_short; > + u32 last_td_comp_code; Should be changed to old_trb_comp_code. > struct radix_tree_root *trb_address_map; > }; > > -- > 2.43.0 > I'll test this later. Thanks, Kuangyi Chiang
Hi, Kuangyi Chiang <ki.chiang65@gmail.com> 於 2025年3月1日 週六 上午10:05寫道: > > Hi, > > Thanks for the patch. > > Mathias Nyman <mathias.nyman@linux.intel.com> 於 2025年3月1日 週六 上午12:17寫道: > > > > Unplugging a USB3.0 webcam from Etron hosts while streaming results > > in errors like this: > > > > [ 2.646387] xhci_hcd 0000:03:00.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 18 comp_code 13 > > [ 2.646446] xhci_hcd 0000:03:00.0: Looking for event-dma 000000002fdf8630 trb-start 000000002fdf8640 trb-end 000000002fdf8650 > > [ 2.646560] xhci_hcd 0000:03:00.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 18 comp_code 13 > > [ 2.646568] xhci_hcd 0000:03:00.0: Looking for event-dma 000000002fdf8660 trb-start 000000002fdf8670 trb-end 000000002fdf8670 > > > > Etron xHC generates two transfer events for the TRB if an error is > > detected while processing the last TRB of an isoc TD. > > > > The first event can be any sort of error (like USB Transaction or > > Babble Detected, etc), and the final event is Success. > > > > The xHCI driver will handle the TD after the first event and remove it > > from its internal list, and then print an "Transfer event TRB DMA ptr > > not part of current TD" error message after the final event. > > > > Commit 5372c65e1311 ("xhci: process isoc TD properly when there was a > > transaction error mid TD.") is designed to address isoc transaction > > errors, but unfortunately it doesn't account for this scenario. > > > > This issue is similar to the XHCI_SPURIOUS_SUCCESS case where a success > > event follows a 'short transfer' event, but the TD the event points to > > is already given back. > > > > Expand the spurious success 'short transfer' event handling to cover > > the spurious success after error on Etron hosts. > > > > Kuangyi Chiang reported this issue and submitted a different solution > > based on using error_mid_td. This commit message is mostly taken > > from that patch. > > > > Reported-by: Kuangyi Chiang <ki.chiang65@gmail.com> > > Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com> > > --- > > drivers/usb/host/xhci-ring.c | 36 +++++++++++++++++++++++++----------- > > drivers/usb/host/xhci.h | 2 +- > > 2 files changed, 26 insertions(+), 12 deletions(-) > > > > diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c > > index 965bffce301e..3d3e6cd69019 100644 > > --- a/drivers/usb/host/xhci-ring.c > > +++ b/drivers/usb/host/xhci-ring.c > > @@ -2644,6 +2644,22 @@ static int handle_transferless_tx_event(struct xhci_hcd *xhci, struct xhci_virt_ > > return 0; > > } > > > > +static bool xhci_spurious_success_tx_event(struct xhci_hcd *xhci, > > + struct xhci_ring *ring) > > +{ > > + switch (ring->old_trb_comp_code) { > > + case COMP_SHORT_PACKET: > > + return xhci->quirks & XHCI_SPURIOUS_SUCCESS; > > + case COMP_USB_TRANSACTION_ERROR: > > + case COMP_BABBLE_DETECTED_ERROR: > > + case COMP_ISOCH_BUFFER_OVERRUN: > > + return xhci->quirks & XHCI_ETRON_HOST && > > + ring->type == TYPE_ISOC; > > + default: > > + return false; > > + } > > +} > > + > > /* > > * If this function returns an error condition, it means it got a Transfer > > * event with a corrupted Slot ID, Endpoint ID, or TRB DMA address. > > @@ -2697,8 +2713,8 @@ static int handle_tx_event(struct xhci_hcd *xhci, > > case COMP_SUCCESS: > > if (EVENT_TRB_LEN(le32_to_cpu(event->transfer_len)) != 0) { > > trb_comp_code = COMP_SHORT_PACKET; > > - xhci_dbg(xhci, "Successful completion on short TX for slot %u ep %u with last td short %d\n", > > - slot_id, ep_index, ep_ring->last_td_was_short); > > + xhci_dbg(xhci, "Successful completion on short TX for slot %u ep %u with last td comp code %d\n", > > + slot_id, ep_index, ep_ring->old_trb_comp_code); > > } > > break; > > case COMP_SHORT_PACKET: > > @@ -2846,7 +2862,7 @@ static int handle_tx_event(struct xhci_hcd *xhci, > > */ > > if (trb_comp_code != COMP_STOPPED && > > trb_comp_code != COMP_STOPPED_LENGTH_INVALID && > > - !ep_ring->last_td_was_short) { > > + !xhci_spurious_success_tx_event(xhci, ep_ring)) { > > xhci_warn(xhci, "Event TRB for slot %u ep %u with no TDs queued\n", > > slot_id, ep_index); > > } > > @@ -2890,11 +2906,12 @@ static int handle_tx_event(struct xhci_hcd *xhci, > > > > /* > > * Some hosts give a spurious success event after a short > > - * transfer. Ignore it. > > + * transfer or error on last TRB. Ignore it. > > */ > > - if ((xhci->quirks & XHCI_SPURIOUS_SUCCESS) && > > - ep_ring->last_td_was_short) { > > - ep_ring->last_td_was_short = false; > > + if (xhci_spurious_success_tx_event(xhci, ep_ring)) { > > + xhci_dbg(xhci, "Spurious event dma %pad, comp_code %u after %u\n", > > + &ep_trb_dma, trb_comp_code, ep_ring->old_trb_comp_code); > > + ep_ring->old_trb_comp_code = trb_comp_code; > > return 0; > > } > > > > @@ -2922,10 +2939,7 @@ static int handle_tx_event(struct xhci_hcd *xhci, > > */ > > } while (ep->skip); > > > > - if (trb_comp_code == COMP_SHORT_PACKET) > > - ep_ring->last_td_was_short = true; > > - else > > - ep_ring->last_td_was_short = false; > > + ep_ring->old_trb_comp_code = trb_comp_code; > > > > ep_trb = &ep_seg->trbs[(ep_trb_dma - ep_seg->dma) / sizeof(*ep_trb)]; > > trace_xhci_handle_transfer(ep_ring, (struct xhci_generic_trb *) ep_trb, ep_trb_dma); > > diff --git a/drivers/usb/host/xhci.h b/drivers/usb/host/xhci.h > > index 8c164340a2c3..c75c2c12ce53 100644 > > --- a/drivers/usb/host/xhci.h > > +++ b/drivers/usb/host/xhci.h > > @@ -1371,7 +1371,7 @@ struct xhci_ring { > > unsigned int num_trbs_free; /* used only by xhci DbC */ > > unsigned int bounce_buf_len; > > enum xhci_ring_type type; > > - bool last_td_was_short; > > + u32 last_td_comp_code; > > Should be changed to old_trb_comp_code. > > > struct radix_tree_root *trb_address_map; > > }; > > > > -- > > 2.43.0 > > > > I'll test this later. It works. See the below dynamic debug messages: [ 1138.281772] xhci_hcd 0000:04:00.0: Successful completion on short TX for slot 1 ep 18 with last td comp code 13 [ 1138.281777] xhci_hcd 0000:04:00.0: Spurious event dma 0x000000010b5ba600, comp_code 13 after 13 [ 1138.282013] xhci_hcd 0000:04:00.0: Successful completion on short TX for slot 1 ep 18 with last td comp code 13 [ 1138.282019] xhci_hcd 0000:04:00.0: Spurious event dma 0x000000010b5ba630, comp_code 13 after 13 [ 1138.282271] xhci_hcd 0000:04:00.0: Successful completion on short TX for slot 1 ep 18 with last td comp code 13 [ 1138.282277] xhci_hcd 0000:04:00.0: Spurious event dma 0x000000010b5ba660, comp_code 13 after 13 [ 1138.282420] xhci_hcd 0000:04:00.0: Successful completion on short TX for slot 1 ep 18 with last td comp code 13 [ 1138.282425] xhci_hcd 0000:04:00.0: Spurious event dma 0x000000010b5ba680, comp_code 13 after 13 [ 1138.282653] xhci_hcd 0000:04:00.0: Transfer error for slot 1 ep 18 on endpoint [ 1138.282659] xhci_hcd 0000:04:00.0: Error mid isoc TD, wait for final completion event [ 1138.282779] xhci_hcd 0000:04:00.0: Successful completion on short TX for slot 1 ep 18 with last td comp code 4 [ 1138.282785] xhci_hcd 0000:04:00.0: Transfer error for slot 1 ep 18 on endpoint [ 1138.282911] xhci_hcd 0000:04:00.0: Successful completion on short TX for slot 1 ep 18 with last td comp code 4 [ 1138.282916] xhci_hcd 0000:04:00.0: Spurious event dma 0x000000010b5ba6c0, comp_code 13 after 4 [ 1138.282920] xhci_hcd 0000:04:00.0: Transfer error for slot 1 ep 18 on endpoint [ 1138.282923] xhci_hcd 0000:04:00.0: Error mid isoc TD, wait for final completion event [ 1138.283039] xhci_hcd 0000:04:00.0: Successful completion on short TX for slot 1 ep 18 with last td comp code 4 [ 1138.283045] xhci_hcd 0000:04:00.0: Transfer error for slot 1 ep 18 on endpoint [ 1138.283163] xhci_hcd 0000:04:00.0: Successful completion on short TX for slot 1 ep 18 with last td comp code 4 [ 1138.283167] xhci_hcd 0000:04:00.0: Spurious event dma 0x000000010b5ba6f0, comp_code 13 after 4 > > Thanks, > Kuangyi Chiang Thanks, Kuangyi Chiang
On 1.3.2025 4.05, Kuangyi Chiang wrote: > Hi, > > Thanks for the patch. > > Mathias Nyman <mathias.nyman@linux.intel.com> 於 2025年3月1日 週六 上午12:17寫道: >> index 8c164340a2c3..c75c2c12ce53 100644 >> --- a/drivers/usb/host/xhci.h >> +++ b/drivers/usb/host/xhci.h >> @@ -1371,7 +1371,7 @@ struct xhci_ring { >> unsigned int num_trbs_free; /* used only by xhci DbC */ >> unsigned int bounce_buf_len; >> enum xhci_ring_type type; >> - bool last_td_was_short; >> + u32 last_td_comp_code; > > Should be changed to old_trb_comp_code. > Thanks, forgot to add that last xhci.h change to the patch -Mathias
On Fri, 28 Feb 2025 18:18:24 +0200, Mathias Nyman wrote: > Unplugging a USB3.0 webcam from Etron hosts while streaming results > in errors like this: > > [ 2.646387] xhci_hcd 0000:03:00.0: ERROR Transfer event TRB DMA ptr > not part of current TD ep_index 18 comp_code 13 [ 2.646446] xhci_hcd > 0000:03:00.0: Looking for event-dma 000000002fdf8630 trb-start > 000000002fdf8640 trb-end 000000002fdf8650 [ 2.646560] xhci_hcd > 0000:03:00.0: ERROR Transfer event TRB DMA ptr not part of current TD > ep_index 18 comp_code 13 [ 2.646568] xhci_hcd 0000:03:00.0: Looking > for event-dma 000000002fdf8660 trb-start 000000002fdf8670 trb-end > 000000002fdf8670 > > Etron xHC generates two transfer events for the TRB if an error is > detected while processing the last TRB of an isoc TD. > > The first event can be any sort of error (like USB Transaction or > Babble Detected, etc), and the final event is Success. > > The xHCI driver will handle the TD after the first event and remove it > from its internal list, and then print an "Transfer event TRB DMA ptr > not part of current TD" error message after the final event. > > Commit 5372c65e1311 ("xhci: process isoc TD properly when there was a > transaction error mid TD.") is designed to address isoc transaction > errors, but unfortunately it doesn't account for this scenario. > > This issue is similar to the XHCI_SPURIOUS_SUCCESS case where a > success event follows a 'short transfer' event, but the TD the event > points to is already given back. > > Expand the spurious success 'short transfer' event handling to cover > the spurious success after error on Etron hosts. > > Kuangyi Chiang reported this issue and submitted a different solution > based on using error_mid_td. This commit message is mostly taken > from that patch. > > Reported-by: Kuangyi Chiang <ki.chiang65@gmail.com> > Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com> Works here too, modulo the obvious build problem. Etron with errors: [ 1064.865311] xhci_hcd 0000:06:00.0: Transfer error for slot 1 ep 2 on endpoint [ 1064.865322] xhci_hcd 0000:06:00.0: Successful completion on short TX for slot 1 ep 2 with last td comp code 4 [ 1064.865326] xhci_hcd 0000:06:00.0: Spurious event dma 0x00000000ffef88c0, comp_code 13 after 4 Renesas with short packets: [ 1365.299218] xhci_hcd 0000:08:00.0: Successful completion on short TX for slot 1 ep 2 with last td comp code 13 [ 1365.299223] xhci_hcd 0000:08:00.0: Spurious event dma 0x00000000ffbbf870, comp_code 13 after 13 BTW, it says "comp_code 13 after something" because of this crazy TRUST_TX_LENGTH hack, which changes trb_comp_code if it's success but the residual is nonzero. If I remove the hack, Etron: [ 2713.630443] xhci_hcd 0000:06:00.0: Spurious event dma 0x00000000ff3b6130, comp_code 1 after 4 Renesas: [ 4033.652300] xhci_hcd 0000:08:00.0: Spurious event dma 0x00000000ffcd1b80, comp_code 1 after 13 The hack could almost be removed now, but if there really are HCs which report Success on the first event, this won't work for them: > +static bool xhci_spurious_success_tx_event(struct xhci_hcd *xhci, > + struct xhci_ring *ring) > +{ > + switch (ring->old_trb_comp_code) { > + case COMP_SHORT_PACKET: > + return xhci->quirks & XHCI_SPURIOUS_SUCCESS; Could it work without relying on fictional COMP_SHORT_PACKET events? > + if (xhci_spurious_success_tx_event(xhci, ep_ring)) { > + xhci_dbg(xhci, "Spurious event dma %pad, comp_code %u after %u\n", > + &ep_trb_dma, trb_comp_code, ep_ring->old_trb_comp_code); > + ep_ring->old_trb_comp_code = trb_comp_code; This part will (quite arbitrarily IMO) not execute if td_list is empty. I had this idea that "empty td_list" and "no matching TD on td_list" are practically identical cases, and their code could be merged.
diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c index 965bffce301e..3d3e6cd69019 100644 --- a/drivers/usb/host/xhci-ring.c +++ b/drivers/usb/host/xhci-ring.c @@ -2644,6 +2644,22 @@ static int handle_transferless_tx_event(struct xhci_hcd *xhci, struct xhci_virt_ return 0; } +static bool xhci_spurious_success_tx_event(struct xhci_hcd *xhci, + struct xhci_ring *ring) +{ + switch (ring->old_trb_comp_code) { + case COMP_SHORT_PACKET: + return xhci->quirks & XHCI_SPURIOUS_SUCCESS; + case COMP_USB_TRANSACTION_ERROR: + case COMP_BABBLE_DETECTED_ERROR: + case COMP_ISOCH_BUFFER_OVERRUN: + return xhci->quirks & XHCI_ETRON_HOST && + ring->type == TYPE_ISOC; + default: + return false; + } +} + /* * If this function returns an error condition, it means it got a Transfer * event with a corrupted Slot ID, Endpoint ID, or TRB DMA address. @@ -2697,8 +2713,8 @@ static int handle_tx_event(struct xhci_hcd *xhci, case COMP_SUCCESS: if (EVENT_TRB_LEN(le32_to_cpu(event->transfer_len)) != 0) { trb_comp_code = COMP_SHORT_PACKET; - xhci_dbg(xhci, "Successful completion on short TX for slot %u ep %u with last td short %d\n", - slot_id, ep_index, ep_ring->last_td_was_short); + xhci_dbg(xhci, "Successful completion on short TX for slot %u ep %u with last td comp code %d\n", + slot_id, ep_index, ep_ring->old_trb_comp_code); } break; case COMP_SHORT_PACKET: @@ -2846,7 +2862,7 @@ static int handle_tx_event(struct xhci_hcd *xhci, */ if (trb_comp_code != COMP_STOPPED && trb_comp_code != COMP_STOPPED_LENGTH_INVALID && - !ep_ring->last_td_was_short) { + !xhci_spurious_success_tx_event(xhci, ep_ring)) { xhci_warn(xhci, "Event TRB for slot %u ep %u with no TDs queued\n", slot_id, ep_index); } @@ -2890,11 +2906,12 @@ static int handle_tx_event(struct xhci_hcd *xhci, /* * Some hosts give a spurious success event after a short - * transfer. Ignore it. + * transfer or error on last TRB. Ignore it. */ - if ((xhci->quirks & XHCI_SPURIOUS_SUCCESS) && - ep_ring->last_td_was_short) { - ep_ring->last_td_was_short = false; + if (xhci_spurious_success_tx_event(xhci, ep_ring)) { + xhci_dbg(xhci, "Spurious event dma %pad, comp_code %u after %u\n", + &ep_trb_dma, trb_comp_code, ep_ring->old_trb_comp_code); + ep_ring->old_trb_comp_code = trb_comp_code; return 0; } @@ -2922,10 +2939,7 @@ static int handle_tx_event(struct xhci_hcd *xhci, */ } while (ep->skip); - if (trb_comp_code == COMP_SHORT_PACKET) - ep_ring->last_td_was_short = true; - else - ep_ring->last_td_was_short = false; + ep_ring->old_trb_comp_code = trb_comp_code; ep_trb = &ep_seg->trbs[(ep_trb_dma - ep_seg->dma) / sizeof(*ep_trb)]; trace_xhci_handle_transfer(ep_ring, (struct xhci_generic_trb *) ep_trb, ep_trb_dma); diff --git a/drivers/usb/host/xhci.h b/drivers/usb/host/xhci.h index 8c164340a2c3..c75c2c12ce53 100644 --- a/drivers/usb/host/xhci.h +++ b/drivers/usb/host/xhci.h @@ -1371,7 +1371,7 @@ struct xhci_ring { unsigned int num_trbs_free; /* used only by xhci DbC */ unsigned int bounce_buf_len; enum xhci_ring_type type; - bool last_td_was_short; + u32 last_td_comp_code; struct radix_tree_root *trb_address_map; };
Unplugging a USB3.0 webcam from Etron hosts while streaming results in errors like this: [ 2.646387] xhci_hcd 0000:03:00.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 18 comp_code 13 [ 2.646446] xhci_hcd 0000:03:00.0: Looking for event-dma 000000002fdf8630 trb-start 000000002fdf8640 trb-end 000000002fdf8650 [ 2.646560] xhci_hcd 0000:03:00.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 18 comp_code 13 [ 2.646568] xhci_hcd 0000:03:00.0: Looking for event-dma 000000002fdf8660 trb-start 000000002fdf8670 trb-end 000000002fdf8670 Etron xHC generates two transfer events for the TRB if an error is detected while processing the last TRB of an isoc TD. The first event can be any sort of error (like USB Transaction or Babble Detected, etc), and the final event is Success. The xHCI driver will handle the TD after the first event and remove it from its internal list, and then print an "Transfer event TRB DMA ptr not part of current TD" error message after the final event. Commit 5372c65e1311 ("xhci: process isoc TD properly when there was a transaction error mid TD.") is designed to address isoc transaction errors, but unfortunately it doesn't account for this scenario. This issue is similar to the XHCI_SPURIOUS_SUCCESS case where a success event follows a 'short transfer' event, but the TD the event points to is already given back. Expand the spurious success 'short transfer' event handling to cover the spurious success after error on Etron hosts. Kuangyi Chiang reported this issue and submitted a different solution based on using error_mid_td. This commit message is mostly taken from that patch. Reported-by: Kuangyi Chiang <ki.chiang65@gmail.com> Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com> --- drivers/usb/host/xhci-ring.c | 36 +++++++++++++++++++++++++----------- drivers/usb/host/xhci.h | 2 +- 2 files changed, 26 insertions(+), 12 deletions(-)