Message ID | 20210122104342.12451-1-stf_xl@wp.pl (mailing list archive) |
---|---|
State | Not Applicable |
Delegated to: | Kalle Valo |
Headers | show |
Series | usb, xhci, rt2800usb: do not perform Soft Retry | expand |
On Fri, Jan 22, 2021 at 11:43:42AM +0100, stf_xl@wp.pl wrote: > From: Stanislaw Gruszka <stf_xl@wp.pl> > > Since f8f80be501aa ("xhci: Use soft retry to recover faster from transaction > errors") on some systems rt2800usb devices are unable to operate. Looks > that due to firmware or hardware limitations of those devices, they > require full recovery from USB Transaction Errors. > > To avoid the problem add URB transfer flag, that restore pre f8f80be501aa > xhci behaviour when the flag is set. For now only add it only to rt2800usb > driver. This feels like a really heavy hammer, to add a xhci flag for a single broken device. Are you sure this is really needed? What does this device do on other operating systems, do they have such a quirk for their host controller driver? Or is this due to the specific host controller device hardware? Should this be a xhci quirk for a specific pci device instead? > > Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=202541 > Fixes: f8f80be501aa ("xhci: Use soft retry to recover faster from transaction errors") > Reported-and-tested-by: Bernhard <bernhard.gebetsberger@gmx.at> > Bisected-by: Bernhard <bernhard.gebetsberger@gmx.at> > Signed-off-by: Stanislaw Gruszka <stf_xl@wp.pl> > --- > drivers/net/wireless/ralink/rt2x00/rt2x00usb.c | 3 +++ > drivers/usb/core/urb.c | 2 +- > drivers/usb/host/xhci-ring.c | 3 ++- > include/linux/usb.h | 1 + > 4 files changed, 7 insertions(+), 2 deletions(-) > > diff --git a/drivers/net/wireless/ralink/rt2x00/rt2x00usb.c b/drivers/net/wireless/ralink/rt2x00/rt2x00usb.c > index e4473a551241..f1d82b3e6bba 100644 > --- a/drivers/net/wireless/ralink/rt2x00/rt2x00usb.c > +++ b/drivers/net/wireless/ralink/rt2x00/rt2x00usb.c > @@ -214,6 +214,7 @@ void rt2x00usb_register_read_async(struct rt2x00_dev *rt2x00dev, > usb_fill_control_urb(urb, usb_dev, usb_rcvctrlpipe(usb_dev, 0), > (unsigned char *)(&rd->cr), &rd->reg, sizeof(rd->reg), > rt2x00usb_register_read_async_cb, rd); > + urb->transfer_flags |= URB_SOFT_RETRY_NOT_OK; > usb_anchor_urb(urb, rt2x00dev->anchor); > if (usb_submit_urb(urb, GFP_ATOMIC) < 0) { > usb_unanchor_urb(urb); > @@ -323,6 +324,7 @@ static bool rt2x00usb_kick_tx_entry(struct queue_entry *entry, void *data) > usb_sndbulkpipe(usb_dev, entry->queue->usb_endpoint), > entry->skb->data, length, > rt2x00usb_interrupt_txdone, entry); > + entry_priv->urb->transfer_flags |= URB_SOFT_RETRY_NOT_OK; > > status = usb_submit_urb(entry_priv->urb, GFP_ATOMIC); > if (status) { > @@ -409,6 +411,7 @@ static bool rt2x00usb_kick_rx_entry(struct queue_entry *entry, void *data) > usb_rcvbulkpipe(usb_dev, entry->queue->usb_endpoint), > entry->skb->data, entry->skb->len, > rt2x00usb_interrupt_rxdone, entry); > + entry_priv->urb->transfer_flags |= URB_SOFT_RETRY_NOT_OK; > > status = usb_submit_urb(entry_priv->urb, GFP_ATOMIC); > if (status) { > diff --git a/drivers/usb/core/urb.c b/drivers/usb/core/urb.c > index 357b149b20d3..140bac59dc32 100644 > --- a/drivers/usb/core/urb.c > +++ b/drivers/usb/core/urb.c > @@ -495,7 +495,7 @@ int usb_submit_urb(struct urb *urb, gfp_t mem_flags) > > /* Check against a simple/standard policy */ > allowed = (URB_NO_TRANSFER_DMA_MAP | URB_NO_INTERRUPT | URB_DIR_MASK | > - URB_FREE_BUFFER); > + URB_SOFT_RETRY_NOT_OK | URB_FREE_BUFFER); > switch (xfertype) { > case USB_ENDPOINT_XFER_BULK: > case USB_ENDPOINT_XFER_INT: > diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c > index 5677b81c0915..6712e1a7735c 100644 > --- a/drivers/usb/host/xhci-ring.c > +++ b/drivers/usb/host/xhci-ring.c > @@ -2302,7 +2302,8 @@ static int process_bulk_intr_td(struct xhci_hcd *xhci, struct xhci_td *td, > remaining = 0; > break; > case COMP_USB_TRANSACTION_ERROR: > - if ((ep_ring->err_count++ > MAX_SOFT_RETRY) || > + if (td->urb->transfer_flags & URB_SOFT_RETRY_NOT_OK || > + (ep_ring->err_count++ > MAX_SOFT_RETRY) || > le32_to_cpu(slot_ctx->tt_info) & TT_SLOT) > break; > *status = 0; > diff --git a/include/linux/usb.h b/include/linux/usb.h > index 7d72c4e0713c..dcdac2f03263 100644 > --- a/include/linux/usb.h > +++ b/include/linux/usb.h > @@ -1329,6 +1329,7 @@ extern int usb_disabled(void); > #define URB_ISO_ASAP 0x0002 /* iso-only; use the first unexpired > * slot in the schedule */ > #define URB_NO_TRANSFER_DMA_MAP 0x0004 /* urb->transfer_dma valid on submit */ > +#define URB_SOFT_RETRY_NOT_OK 0x0008 /* Avoid XHCI Soft Retry */ To match other flags here, how about "URB_NO_SOFT_RETRY"? thanks, greg k-h
On 22.01.21 at 12:56 Greg KH wrote: > On Fri, Jan 22, 2021 at 11:43:42AM +0100, stf_xl@wp.pl wrote: >> From: Stanislaw Gruszka <stf_xl@wp.pl> >> >> Since f8f80be501aa ("xhci: Use soft retry to recover faster from transaction >> errors") on some systems rt2800usb devices are unable to operate. Looks >> that due to firmware or hardware limitations of those devices, they >> require full recovery from USB Transaction Errors. >> >> To avoid the problem add URB transfer flag, that restore pre f8f80be501aa >> xhci behaviour when the flag is set. For now only add it only to rt2800usb >> driver. > > This feels like a really heavy hammer, to add a xhci flag for a single > broken device. > > Are you sure this is really needed? What does this device do on other > operating systems, do they have such a quirk for their host controller > driver? > > Or is this due to the specific host controller device hardware? Should > this be a xhci quirk for a specific pci device instead? Well, rt2800usb USB implementation does have a lot of potential for optimization since the very beginning (current throughput comparison 2 MiB/s vs 13 MiB/s with the original driver e.g.). That's why I'm using until today a self patched version (it's bound to cfg80211 meanwhile) of the original driver (rt5572sta), which doesn't have those problems at all. From my point of view, the goal should be to solve the real reason for the problem. The original driver works much better (leastwise here) and doesn't show this problem at all! But anyway, there is from my point of view a basic problem with xhci_hcd, which just seems not to be completely backward compatible to existing USB 2 drivers (see https://marc.info/?l=linux-usb&m=161130327411612&w=2) if the device is plugged to an USB 3.x interface. Thanks Andreas
On Fri, Jan 22, 2021 at 12:56:30PM +0100, Greg KH wrote: > On Fri, Jan 22, 2021 at 11:43:42AM +0100, stf_xl@wp.pl wrote: > > From: Stanislaw Gruszka <stf_xl@wp.pl> > > > > Since f8f80be501aa ("xhci: Use soft retry to recover faster from transaction > > errors") on some systems rt2800usb devices are unable to operate. Looks > > that due to firmware or hardware limitations of those devices, they > > require full recovery from USB Transaction Errors. > > > > To avoid the problem add URB transfer flag, that restore pre f8f80be501aa > > xhci behaviour when the flag is set. For now only add it only to rt2800usb > > driver. > > This feels like a really heavy hammer, to add a xhci flag for a single > broken device. > > Are you sure this is really needed? I'm not sure if this is needed, however this particular bug was reported as regression caused by f8f80be501aa commit on 4.19 -> 4.20 cycle. It was reported to Mathias Nyman - xhci maintainer and f8f80be501aa commit author. But since then, regardless of some Mathias work on this on xhci side, we did not get solution that fixed the problem. From other side, I can ask why change from f8f80be501aa is need? Taking it's commit message, the benefit of the change is not obvious. What I can notice, is that it only broke support for hardware that worked previously. However I assume that the change is useful and needed, so I come up with patch that just revert this change only for rt2800usb. > What does this device do on other > operating systems, do they have such a quirk for their host controller > driver? I don't know what other OSes do. > Or is this due to the specific host controller device hardware? Should > this be a xhci quirk for a specific pci device instead? That certainly possible. However taking that the issue is observed only when usb bus transition error happens, what I think can be very rare, it's not easy to identify actual culprit of the problem. > > Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=202541 > > Fixes: f8f80be501aa ("xhci: Use soft retry to recover faster from transaction errors") > > Reported-and-tested-by: Bernhard <bernhard.gebetsberger@gmx.at> > > Bisected-by: Bernhard <bernhard.gebetsberger@gmx.at> > > Signed-off-by: Stanislaw Gruszka <stf_xl@wp.pl> > > --- > > drivers/net/wireless/ralink/rt2x00/rt2x00usb.c | 3 +++ > > drivers/usb/core/urb.c | 2 +- > > drivers/usb/host/xhci-ring.c | 3 ++- > > include/linux/usb.h | 1 + > > 4 files changed, 7 insertions(+), 2 deletions(-) [snip] > > --- a/include/linux/usb.h > > +++ b/include/linux/usb.h > > @@ -1329,6 +1329,7 @@ extern int usb_disabled(void); > > #define URB_ISO_ASAP 0x0002 /* iso-only; use the first unexpired > > * slot in the schedule */ > > #define URB_NO_TRANSFER_DMA_MAP 0x0004 /* urb->transfer_dma valid on submit */ > > +#define URB_SOFT_RETRY_NOT_OK 0x0008 /* Avoid XHCI Soft Retry */ > > To match other flags here, how about "URB_NO_SOFT_RETRY"? I named the flag based on existing "URB_SHORT_NOT_OK" flag, but I could rename it to URB_NO_SOFT_RETRY, no problem with that. Stanislaw
On 22.1.2021 15.26, Stanislaw Gruszka wrote: > On Fri, Jan 22, 2021 at 12:56:30PM +0100, Greg KH wrote: >> On Fri, Jan 22, 2021 at 11:43:42AM +0100, stf_xl@wp.pl wrote: >>> From: Stanislaw Gruszka <stf_xl@wp.pl> >>> >>> Since f8f80be501aa ("xhci: Use soft retry to recover faster from transaction >>> errors") on some systems rt2800usb devices are unable to operate. Looks >>> that due to firmware or hardware limitations of those devices, they >>> require full recovery from USB Transaction Errors. >>> >>> To avoid the problem add URB transfer flag, that restore pre f8f80be501aa >>> xhci behaviour when the flag is set. For now only add it only to rt2800usb >>> driver. >> >> This feels like a really heavy hammer, to add a xhci flag for a single >> broken device. I agree, rootcause is still unknown. This bug hasn't gotten enough attention. I'm tidying up a rewrite of areas that touches this, but it didn't seem to help. I'd anyway like to get the rewrite done first, then get a new set of logs and traces, and take a fresh look at this Meanwhile it could be checked if this issue is seen only on some xHCI controllers. Maybe some vendors don't support soft retry properly. we could easily prevent soft retry usage on those xHC hosts. >> >> Are you sure this is really needed? > > I'm not sure if this is needed, however this particular bug was reported > as regression caused by f8f80be501aa commit on 4.19 -> 4.20 cycle. It > was reported to Mathias Nyman - xhci maintainer and f8f80be501aa commit > author. But since then, regardless of some Mathias work on this on xhci > side, we did not get solution that fixed the problem. > > From other side, I can ask why change from f8f80be501aa is need? Taking > it's commit message, the benefit of the change is not obvious. What > I can notice, is that it only broke support for hardware that worked > previously. However I assume that the change is useful and needed, > so I come up with patch that just revert this change only for rt2800usb. Significantly faster recovery from transaction errors. Many errors are temporary due to electrical interference, and a simple retry will solve the case. see xhci spec section 4.6.8.1 "Soft Retry" for details. > >> What does this device do on other >> operating systems, do they have such a quirk for their host controller >> driver? > > I don't know what other OSes do. > >> Or is this due to the specific host controller device hardware? Should >> this be a xhci quirk for a specific pci device instead? Exactly, this should be checked. Stanislaw, weren't there a few users already that saw this issue? Do we know what xHCI controllers they were using? -Mathias
On 22.1.2021 15.17, Andreas Hartmann wrote: > > On 22.01.21 at 12:56 Greg KH wrote: >> On Fri, Jan 22, 2021 at 11:43:42AM +0100, stf_xl@wp.pl wrote: >>> From: Stanislaw Gruszka <stf_xl@wp.pl> >>> >>> Since f8f80be501aa ("xhci: Use soft retry to recover faster from transaction >>> errors") on some systems rt2800usb devices are unable to operate. Looks >>> that due to firmware or hardware limitations of those devices, they >>> require full recovery from USB Transaction Errors. >>> >>> To avoid the problem add URB transfer flag, that restore pre f8f80be501aa >>> xhci behaviour when the flag is set. For now only add it only to rt2800usb >>> driver. >> >> This feels like a really heavy hammer, to add a xhci flag for a single >> broken device. >> >> Are you sure this is really needed? What does this device do on other >> operating systems, do they have such a quirk for their host controller >> driver? >> >> Or is this due to the specific host controller device hardware? Should >> this be a xhci quirk for a specific pci device instead? > > Well, rt2800usb USB implementation does have a lot of potential for > optimization since the very beginning (current throughput comparison > 2 MiB/s vs 13 MiB/s with the original driver e.g.). That's why I'm > using until today a self patched version (it's bound to cfg80211 > meanwhile) of the original driver (rt5572sta), which doesn't have those > problems at all. From my point of view, the goal should be to solve the > real reason for the problem. The original driver works much better > (leastwise here) and doesn't show this problem at all! Ok, so it could be a rt2800 driver issue, or it just hitting some unlucky sequence that triggers this. > > But anyway, there is from my point of view a basic problem with xhci_hcd, > which just seems not to be completely backward compatible to existing USB 2 > drivers (see https://marc.info/?l=linux-usb&m=161130327411612&w=2) if the > device is plugged to an USB 3.x interface. This looks like a different issue, lets keep it in its own thread. The xHCI usb host controller handles both USB 2 and USB 3 speeds. If the USB port is connected to a xHC controller then the xhci driver will be used. If the port is connected to a EHCI then the ehci driever is used. EHCI does not support USB3 speeds. It's very possible that something that worked behind a EHCI host has issues when connected to a xHCI host. -Mathias
On 22.01.21 at 16:22 Mathias Nyman wrote: > On 22.1.2021 15.17, Andreas Hartmann wrote: >> But anyway, there is from my point of view a basic problem with xhci_hcd, >> which just seems not to be completely backward compatible to existing USB 2 >> drivers (see https://marc.info/?l=linux-usb&m=161130327411612&w=2) if the >> device is plugged to an USB 3.x interface. > > This looks like a different issue, lets keep it in its own thread. > > The xHCI usb host controller handles both USB 2 and USB 3 speeds. > If the USB port is connected to a xHC controller then the xhci driver will > be used. If the port is connected to a EHCI then the ehci driever is used. > EHCI does not support USB3 speeds. > > It's very possible that something that worked behind a EHCI host has issues > when connected to a xHCI host. I would be very glad to get this sorted out. At the moment I absolutely don't know where exactly to try to prevent to trigger the "xhci_hcd 0000:05:00.3: WARN Wrong bounce buffer write length: 0 != 512" situation. Is it on the level where the bulk packages are created or even before? Thanks Andreas
On Fri, Jan 22, 2021 at 05:00:21PM +0200, Mathias Nyman wrote: > >> Or is this due to the specific host controller device hardware? Should > >> this be a xhci quirk for a specific pci device instead? > > Exactly, this should be checked. > Stanislaw, weren't there a few users already that saw this issue? There are 30+ users cc-ed in the bugzilla report. However I think some of them are not affected by issue originally reported by Bernhard. They just saw "WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state" message caused by different problem and added comment to this particular bug report. > Do we know what xHCI controllers they were using? What I can tell issue was reported mostly on ASMedia and AMD controllers. We can ask for exact vendor and device IDs and just add xhci->quirks flag. However I'm not entirely sure that xHCI hardware misbehave is actual root cause. I think equally probable is that connected device do not handle soft retry correctly. In that case disabling Soft Retry per device would be actually "lightest hammer" since other devices connected to the xHCI host could benefit from faster recovery. Is there way to debug/identify which side: host or device hardware misbehave when Soft Retry is performed ? Stanislaw
On Sat, Jan 23, 2021 at 11:14:19AM +0100, Stanislaw Gruszka wrote: > On Fri, Jan 22, 2021 at 05:00:21PM +0200, Mathias Nyman wrote: > > >> Or is this due to the specific host controller device hardware? Should > > >> this be a xhci quirk for a specific pci device instead? > > > > Exactly, this should be checked. > > Stanislaw, weren't there a few users already that saw this issue? > > There are 30+ users cc-ed in the bugzilla report. However I think some > of them are not affected by issue originally reported by Bernhard. > They just saw "WARN Set TR Deq Ptr cmd failed due to incorrect slot > or ep state" message caused by different problem and added comment > to this particular bug report. > > > Do we know what xHCI controllers they were using? > > What I can tell issue was reported mostly on ASMedia and AMD > controllers. We can ask for exact vendor and device IDs and > just add xhci->quirks flag. > > However I'm not entirely sure that xHCI hardware misbehave > is actual root cause. I think equally probable is that > connected device do not handle soft retry correctly. In that > case disabling Soft Retry per device would be actually > "lightest hammer" since other devices connected to the > xHCI host could benefit from faster recovery. > > Is there way to debug/identify which side: host or device > hardware misbehave when Soft Retry is performed ? So I think we do not have such possibility. If xhci quirk is something that can be accepted I'll precede with such patch. I'm going to ask bug reporters about xHCI conntroler PCI-id's ... Stanislaw
diff --git a/drivers/net/wireless/ralink/rt2x00/rt2x00usb.c b/drivers/net/wireless/ralink/rt2x00/rt2x00usb.c index e4473a551241..f1d82b3e6bba 100644 --- a/drivers/net/wireless/ralink/rt2x00/rt2x00usb.c +++ b/drivers/net/wireless/ralink/rt2x00/rt2x00usb.c @@ -214,6 +214,7 @@ void rt2x00usb_register_read_async(struct rt2x00_dev *rt2x00dev, usb_fill_control_urb(urb, usb_dev, usb_rcvctrlpipe(usb_dev, 0), (unsigned char *)(&rd->cr), &rd->reg, sizeof(rd->reg), rt2x00usb_register_read_async_cb, rd); + urb->transfer_flags |= URB_SOFT_RETRY_NOT_OK; usb_anchor_urb(urb, rt2x00dev->anchor); if (usb_submit_urb(urb, GFP_ATOMIC) < 0) { usb_unanchor_urb(urb); @@ -323,6 +324,7 @@ static bool rt2x00usb_kick_tx_entry(struct queue_entry *entry, void *data) usb_sndbulkpipe(usb_dev, entry->queue->usb_endpoint), entry->skb->data, length, rt2x00usb_interrupt_txdone, entry); + entry_priv->urb->transfer_flags |= URB_SOFT_RETRY_NOT_OK; status = usb_submit_urb(entry_priv->urb, GFP_ATOMIC); if (status) { @@ -409,6 +411,7 @@ static bool rt2x00usb_kick_rx_entry(struct queue_entry *entry, void *data) usb_rcvbulkpipe(usb_dev, entry->queue->usb_endpoint), entry->skb->data, entry->skb->len, rt2x00usb_interrupt_rxdone, entry); + entry_priv->urb->transfer_flags |= URB_SOFT_RETRY_NOT_OK; status = usb_submit_urb(entry_priv->urb, GFP_ATOMIC); if (status) { diff --git a/drivers/usb/core/urb.c b/drivers/usb/core/urb.c index 357b149b20d3..140bac59dc32 100644 --- a/drivers/usb/core/urb.c +++ b/drivers/usb/core/urb.c @@ -495,7 +495,7 @@ int usb_submit_urb(struct urb *urb, gfp_t mem_flags) /* Check against a simple/standard policy */ allowed = (URB_NO_TRANSFER_DMA_MAP | URB_NO_INTERRUPT | URB_DIR_MASK | - URB_FREE_BUFFER); + URB_SOFT_RETRY_NOT_OK | URB_FREE_BUFFER); switch (xfertype) { case USB_ENDPOINT_XFER_BULK: case USB_ENDPOINT_XFER_INT: diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c index 5677b81c0915..6712e1a7735c 100644 --- a/drivers/usb/host/xhci-ring.c +++ b/drivers/usb/host/xhci-ring.c @@ -2302,7 +2302,8 @@ static int process_bulk_intr_td(struct xhci_hcd *xhci, struct xhci_td *td, remaining = 0; break; case COMP_USB_TRANSACTION_ERROR: - if ((ep_ring->err_count++ > MAX_SOFT_RETRY) || + if (td->urb->transfer_flags & URB_SOFT_RETRY_NOT_OK || + (ep_ring->err_count++ > MAX_SOFT_RETRY) || le32_to_cpu(slot_ctx->tt_info) & TT_SLOT) break; *status = 0; diff --git a/include/linux/usb.h b/include/linux/usb.h index 7d72c4e0713c..dcdac2f03263 100644 --- a/include/linux/usb.h +++ b/include/linux/usb.h @@ -1329,6 +1329,7 @@ extern int usb_disabled(void); #define URB_ISO_ASAP 0x0002 /* iso-only; use the first unexpired * slot in the schedule */ #define URB_NO_TRANSFER_DMA_MAP 0x0004 /* urb->transfer_dma valid on submit */ +#define URB_SOFT_RETRY_NOT_OK 0x0008 /* Avoid XHCI Soft Retry */ #define URB_ZERO_PACKET 0x0040 /* Finish bulk OUT with short packet */ #define URB_NO_INTERRUPT 0x0080 /* HINT: no non-error interrupt * needed */