Message ID | 20190219154135.5967-1-suwan.kim027@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | usb: xhci: Support running urb giveback in tasklet context | expand |
Hi On 19.2.2019 17.41, Suwan Kim wrote: > Patch "USB: HCD: support giveback of URB in tasklet context" > introduced giveback of urb in tasklet context. [1] This patch was > applied to ehci but not xhci. [2] It significantly reduces the hard > irq time of xhci. Especially for the uvc driver, the hard irq including > the uvc complete function runs quite long (about 300-350us in my > Thinkpad s440 laptop with webcam) but applying this patch reduces > the hard irq time of xhci to about 18-30us. Sorry about the late reply. Can you recall any reason why this wasn't applied to xhci back then? xhci is doing a lot in hard interrupt context, and reducing that would be a good idea. Another option to look at is using threaded interrupts for xhci. We might however be opening a can of worms with this, the impact is unknown. How much testing was done with URB return in tasklet for SS devices? Would be nice to test this out on a bit wider audience before applying it. -Mathias
Hi Mathias, On Thu, Feb 28, 2019 at 11:18:58AM +0200, Mathias Nyman wrote: > Hi > > On 19.2.2019 17.41, Suwan Kim wrote: > > Patch "USB: HCD: support giveback of URB in tasklet context" > > introduced giveback of urb in tasklet context. [1] This patch was > > applied to ehci but not xhci. [2] It significantly reduces the hard > > irq time of xhci. Especially for the uvc driver, the hard irq including > > the uvc complete function runs quite long (about 300-350us in my > > Thinkpad s440 laptop with webcam) but applying this patch reduces > > the hard irq time of xhci to about 18-30us. > > Sorry about the late reply. > Can you recall any reason why this wasn't applied to xhci back then? Ming Lei who is the author of the giveback-tasklet patch in usb core worked with Alan stern when he was working on the patch, and he only replaced giveback of ehci. I don't know why he didn't replace xhci. After some time, somebody asked him why this patch did not apply to xhci. And there was no apparent reason.[1] IMO, no one seems interested in xhci... > xhci is doing a lot in hard interrupt context, and reducing that would > be a good idea. Another option to look at is using threaded interrupts > for xhci. When the giveback-tasklet patch was working, the author experimented with comparing threaded interrupt and tasklet. At that time, tasklet showed better performance than threaded interrupt.[2] > We might however be opening a can of worms with this, the impact is unknown. > > How much testing was done with URB return in tasklet for SS devices? > > Would be nice to test this out on a bit wider audience before applying it. Unfortunately, SS devices I have are only usb mass storage devices. When I tested with a USB mass storage device, both xhci with tasklet and without tasklet(urb complete in hard IRQ) showed similar performance. [USB-mass storage TEST] - Testbed is i5-7600 and two mass storage devices (usb flash memory, external hard drive) are used. Test is executed 10 times and figure out the average speed - dd if=/dev/sdN of=/dev/null iflag=direct bs=1G count=1 - device1 : Sandisk Ultra Flair USB 3.0 32GB - device2 : WD My Passport 2TB (external hard drive) - xhci without tasklet - device1 - 129.727MB/s - device2 - 103.667MB/s - xhci with tasklet - device1 - 103.2MB/s - device2 - 103.692MB/s I have only one high-speed isochronous type device that is built-in webcam in my laptop (Thinkpad s440, i5-4210U) In this case, it is difficult to measure the performance. So i checked the overrun/underrun event in xhci. Until now, no overrun/underrun event has occurred in actual use. Regards Suwan Kim
I forgot to add links... sorry! [1], https://marc.info/?l=linux-usb&m=139523715802919&w=2 [2], https://marc.info/?l=linux-usb&m=137109031822092&w=2
There is typo.. sorry again in [USB-mass storge TEST] - xhci with tasklet - device1 - 130.2MB/s (not 103.2MS/s!) - device2 - 103.692MB/s
diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c index 40fa25c4d041..0ede5265e6e2 100644 --- a/drivers/usb/host/xhci-ring.c +++ b/drivers/usb/host/xhci-ring.c @@ -644,10 +644,8 @@ static void xhci_giveback_urb_in_irq(struct xhci_hcd *xhci, } xhci_urb_free_priv(urb_priv); usb_hcd_unlink_urb_from_ep(hcd, urb); - spin_unlock(&xhci->lock); trace_xhci_urb_giveback(urb); usb_hcd_giveback_urb(hcd, urb, status); - spin_lock(&xhci->lock); } static void xhci_unmap_td_bounce_buffer(struct xhci_hcd *xhci, diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c index 005e65922608..b4610f70d660 100644 --- a/drivers/usb/host/xhci.c +++ b/drivers/usb/host/xhci.c @@ -5142,7 +5142,7 @@ static const struct hc_driver xhci_hc_driver = { * generic hardware linkage */ .irq = xhci_irq, - .flags = HCD_MEMORY | HCD_USB3 | HCD_SHARED, + .flags = HCD_MEMORY | HCD_USB3 | HCD_SHARED | HCD_BH, /* * basic lifecycle operations
Patch "USB: HCD: support giveback of URB in tasklet context" introduced giveback of urb in tasklet context. [1] This patch was applied to ehci but not xhci. [2] It significantly reduces the hard irq time of xhci. Especially for the uvc driver, the hard irq including the uvc complete function runs quite long (about 300-350us in my Thinkpad s440 laptop with webcam) but applying this patch reduces the hard irq time of xhci to about 18-30us. As mentioned in the patch [1], when the urb complete function is executed in the tasklet, urb resubmission may be delayed and a time sensitive interrupt/isochronous pipe may be affected. However, as mentioned in [1], an interrupt pipe is mainly used for event report. So some delays are fine. In the case of isochronous pipe, executing the urb complete function in tasklet can cause late urb resubmissions due to tasklet schduling delay. Then the td can not match the endpoint service time interval resulting in an empty isochronous transfer ring and ring overrun/ underrun event. But this is not a problem. xhci spec 4.10.3.1 describes the xhc behavior when an empty isochronous transfer ring situation occurs due to the software's late resubmission of urb. And ehci also operates as the xhci specification. [3] If the driver can not match the endpoint service time interval, the isochronous transfer ring is empty and an overrun/underrun event occurs. Then the isochronous td that missed the frame is dropped and xhc shcedule the next td in appropriate frame. Or the xhc immediately schedules the missing td to the next frame as soon as possible if the driver sets the URB_ISO_ASAP flag in the urb->transfer_flags. Therefore, there is no scheduling issue for the late submitted isochronous td. And drivers using isochronous pipe usually submit multiple urbs. For example, uvc driver uses up to 5 urbs. Therefore, the time interval at which the isochronous transfer ring is empty is quite long.[4] Therefore, executing the urb complete function in the tasklet does not affect the isochronous transfer ring overrun/underrun. [1], https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=94dfd7edfd5c9b605caf7b562de7a813d216e011 [2], https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=428aac8a81058e2303677a8fbf26670229e51d3a [3], https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=46c73d1d3ebc38feed1d97c6980252a0a01f6a5b [4], https://marc.info/?l=linux-usb&m=137268420600615&w=2 Signed-off-by: Suwan Kim <suwan.kim027@gmail.com> --- drivers/usb/host/xhci-ring.c | 2 -- drivers/usb/host/xhci.c | 2 +- 2 files changed, 1 insertion(+), 3 deletions(-)