Message ID | 186569458.91967.1492547106553@email.1und1.de (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Stefan, On Tue, Apr 18, 2017 at 1:25 PM, Stefan Wahren <stefan.wahren@i2se.com> wrote: > Hi, > > [add Johan] > >> Stefan Wahren <stefan.wahren@i2se.com> hat am 18. April 2017 um 10:07 geschrieben: >> >> >> Am 18.04.2017 um 00:37 schrieb Doug Anderson: >> > Hi, >> > >> > On Mon, Apr 17, 2017 at 4:05 AM, Stefan Wahren <stefan.wahren@i2se.com> wrote: >> >> Hi, >> >> >> >>> Stefan Wahren <stefan.wahren@i2se.com> hat am 31. Oktober 2016 um 21:34 geschrieben: >> >>> >> >>> >> >>> I inspired by this issue [1] i build up a slightly modified setup with a >> >>> Raspberry Pi B (mainline kernel 4.9rc3), a powered 7 port USB hub and 5 Prolific >> >>> PL2303 USB to serial convertors. I modified the usb_test for dwc2 [2], which >> >>> only tries to open all ttyUSB devices one after the other. >> >>> >> >>> Unfortunately the complete system stuck after opening the first ttyUSB device ( >> >>> heartbeat LED stop blinking, no reaction to debug UART). The only way to >> >>> reanimate the system is to powerdown the USB hub with the USB to serial >> >>> convertors. >> >>> >> >>> [1] - https://github.com/raspberrypi/linux/issues/1692 >> >>> [2] - https://gist.github.com/lategoodbye/dd0d30af27b6f101b03d5923b279dbaa >> >> since this issue still exists with 4.11 (even without or with microframe scheduler enabled), i want to ask some additional questions: >> >> >> >> Is this issue reproducible with other dwc2 platforms than bcm2835? >> > +Edmund Szeto, who I seem to remember emailing me about similar >> > questions in the past. >> > >> > >> >> Does the soft lockup also occurs after opening the second serial convertor or later? >> > I don't have serial converters easily available to me, but back in the >> > day when I was stressing things out on rk3288 I never saw anything >> > this bad. ...of course, on rk3288 we've got 4 A17 cores running >> > really fast, so possibly just being slower is what causes your >> > problems here? >> >> The downstream kernel of the Raspberry Pi foundation with it's >> out-of-tree dwc_otg driver is able to handle 8 serial converter on a RPI >> B. I would be happy to get at least 2 or 3 working on mainline. >> >> > >> > I will make the following observations: >> > >> > 1. With dwc2 you often end up in the situation where you need to >> > service an interrupt every 125 uS. If servicing that interrupt takes >> > anywhere near 125 uS in the common case then you'll be in trouble. >> >> I will try to measure this with a logic analyzer. >> > > i took GPIO17 to measure _dwc2_hcd_irq and GPIO18 to measure _dwc2_hcd_urb_enqueue (patch against 4.11rc1 below). > > So i made my observations for 3 test cases: > > 1) no serial converter connected (idle) > 2) 1 FTDI serial converter connected > 3) 1 PL2303 serial converter connected > > case | ksoftirq cpu | mean duration | max duration | max duration | urb_enqueue | > | | hcd_irq | hcd_irq | urb_enqueue | within 10 sec| > -------+------------------+---------------+---------------+--------------+--------------+ > idle | 0.0% | 2 us | 16.5 us | 12 us | 5 | > FTDI | 25.0% | 8.5 us | 18.0 us | 31000 us | ~ 400 | > PL2303 | top doesn't work | 8.5 us | 22.5 us | 900000 us | 4 | It's hard to know for sure that all of this time is really in urb_enqueue(). Possible we could have task switched out and been blocked elsewhere. Using ftrace to get more fine-grained timings would be useful. ktime_get(), ktime_sub(), and ktime_to_us() are your friends here if you want to use trace_printk. > So it seems the serial USB driver has also an impact. In the analyzer trace the FTDI triggers many smaller urb_enqueue calls in the opposite to the PL2303 which only has few but huge calls. > > Additional notes: > After closing the serial connection to the FTDI the system is usable as before. In case of PL2303 i need to disconnect the converter in order to get a usable system. > > Why do they behave so differently? > Are these results of a overload? > Doug, can you point me to your timing patch? I did in the previous message, but basically take a look at: https://chromium-review.googlesource.com/c/321980/4/drivers/usb/core/hcd.c#1668 I never posted this upstream since it was just a test patch. > > Thanks > Stefan > > PS: My hacky patches for measuring > > diff --git a/drivers/usb/dwc2/core.h b/drivers/usb/dwc2/core.h > index 1a7e830..aa490c3 100644 > --- a/drivers/usb/dwc2/core.h > +++ b/drivers/usb/dwc2/core.h > @@ -930,6 +930,9 @@ struct dwc2_hsotg { > struct dentry *debug_root; > struct debugfs_regset32 *regset; > > + int gpio_17; > + int gpio_18; > + > /* DWC OTG HW Release versions */ > #define DWC2_CORE_REV_2_71a 0x4f54271a > #define DWC2_CORE_REV_2_90a 0x4f54290a > diff --git a/drivers/usb/dwc2/hcd.c b/drivers/usb/dwc2/hcd.c > index a73722e..97271c8 100644 > --- a/drivers/usb/dwc2/hcd.c > +++ b/drivers/usb/dwc2/hcd.c > @@ -45,6 +45,7 @@ > #include <linux/platform_device.h> > #include <linux/dma-mapping.h> > #include <linux/delay.h> > +#include <linux/gpio.h> > #include <linux/io.h> > #include <linux/slab.h> > #include <linux/usb.h> > @@ -4595,13 +4596,17 @@ static int _dwc2_hcd_urb_enqueue(struct usb_hcd *hcd, struct urb *urb, > bool qh_allocated = false; > struct dwc2_qtd *qtd; > > + gpio_set_value(hsotg->gpio_18, 1); > + > if (dbg_urb(urb)) { > dev_vdbg(hsotg->dev, "DWC OTG HCD URB Enqueue\n"); > dwc2_dump_urb_info(hcd, urb, "urb_enqueue"); > } > > - if (!ep) > + if (!ep) { > + gpio_set_value(hsotg->gpio_18, 0); > return -EINVAL; > + } > > if (usb_pipetype(urb->pipe) == PIPE_ISOCHRONOUS || > usb_pipetype(urb->pipe) == PIPE_INTERRUPT) { > @@ -4628,8 +4633,10 @@ static int _dwc2_hcd_urb_enqueue(struct usb_hcd *hcd, struct urb *urb, > > dwc2_urb = dwc2_hcd_urb_alloc(hsotg, urb->number_of_packets, > mem_flags); > - if (!dwc2_urb) > + if (!dwc2_urb) { > + gpio_set_value(hsotg->gpio_18, 0); > return -ENOMEM; > + } > > dwc2_hcd_urb_set_pipeinfo(hsotg, dwc2_urb, usb_pipedevice(urb->pipe), > usb_pipeendpoint(urb->pipe), ep_type, > @@ -4705,6 +4712,8 @@ static int _dwc2_hcd_urb_enqueue(struct usb_hcd *hcd, struct urb *urb, > > spin_unlock_irqrestore(&hsotg->lock, flags); > > + gpio_set_value(hsotg->gpio_18, 0); > + > return 0; > > fail3: > @@ -4732,6 +4741,8 @@ static int _dwc2_hcd_urb_enqueue(struct usb_hcd *hcd, struct urb *urb, > fail0: > kfree(dwc2_urb); > > + gpio_set_value(hsotg->gpio_18, 0); > + > return retval; > } > > @@ -4825,8 +4836,15 @@ static void _dwc2_hcd_endpoint_reset(struct usb_hcd *hcd, > static irqreturn_t _dwc2_hcd_irq(struct usb_hcd *hcd) > { > struct dwc2_hsotg *hsotg = dwc2_hcd_to_hsotg(hcd); > + irqreturn_t ret; > > - return dwc2_handle_hcd_intr(hsotg); > + gpio_set_value(hsotg->gpio_17, 1); > + > + ret = dwc2_handle_hcd_intr(hsotg); > + > + gpio_set_value(hsotg->gpio_17, 0); > + > + return ret; > } > > /* > diff --git a/drivers/usb/dwc2/platform.c b/drivers/usb/dwc2/platform.c > index 9564bc7..e3f8a8a 100644 > --- a/drivers/usb/dwc2/platform.c > +++ b/drivers/usb/dwc2/platform.c > @@ -35,6 +35,7 @@ > */ > > #include <linux/kernel.h> > +#include <linux/gpio.h> > #include <linux/module.h> > #include <linux/slab.h> > #include <linux/clk.h> > @@ -404,6 +405,16 @@ static int dwc2_driver_probe(struct platform_device *dev) > return hsotg->irq; > } > > + hsotg->gpio_17 = 475; > + retval = devm_gpio_request_one(hsotg->dev, hsotg->gpio_17, GPIOF_OUT_INIT_LOW, "gpio_17"); > + if (retval) > + pr_warn("Unable to request GPIO17: %d\n", retval); > + > + hsotg->gpio_18 = 476; > + retval = devm_gpio_request_one(hsotg->dev, hsotg->gpio_18, GPIOF_OUT_INIT_LOW, "gpio_18"); > + if (retval) > + pr_warn("Unable to request GPIO18: %d\n", retval); > + > dev_dbg(hsotg->dev, "registering common handler for irq%d\n", > hsotg->irq); > retval = devm_request_irq(hsotg->dev, hsotg->irq,
> Doug Anderson <dianders@chromium.org> hat am 18. April 2017 um 22:41 geschrieben: > > > Stefan, > > On Tue, Apr 18, 2017 at 1:25 PM, Stefan Wahren <stefan.wahren@i2se.com> wrote: > > Hi, > > > > [add Johan] > > > >> Stefan Wahren <stefan.wahren@i2se.com> hat am 18. April 2017 um 10:07 geschrieben: > >> > >> > >> Am 18.04.2017 um 00:37 schrieb Doug Anderson: > >> > Hi, > >> > > >> > On Mon, Apr 17, 2017 at 4:05 AM, Stefan Wahren <stefan.wahren@i2se.com> wrote: > >> >> Hi, > >> >> > >> >>> Stefan Wahren <stefan.wahren@i2se.com> hat am 31. Oktober 2016 um 21:34 geschrieben: > >> >>> > >> >>> > >> >>> I inspired by this issue [1] i build up a slightly modified setup with a > >> >>> Raspberry Pi B (mainline kernel 4.9rc3), a powered 7 port USB hub and 5 Prolific > >> >>> PL2303 USB to serial convertors. I modified the usb_test for dwc2 [2], which > >> >>> only tries to open all ttyUSB devices one after the other. > >> >>> > >> >>> Unfortunately the complete system stuck after opening the first ttyUSB device ( > >> >>> heartbeat LED stop blinking, no reaction to debug UART). The only way to > >> >>> reanimate the system is to powerdown the USB hub with the USB to serial > >> >>> convertors. > >> >>> > >> >>> [1] - https://github.com/raspberrypi/linux/issues/1692 > >> >>> [2] - https://gist.github.com/lategoodbye/dd0d30af27b6f101b03d5923b279dbaa > >> >> since this issue still exists with 4.11 (even without or with microframe scheduler enabled), i want to ask some additional questions: > >> >> > >> >> Is this issue reproducible with other dwc2 platforms than bcm2835? > >> > +Edmund Szeto, who I seem to remember emailing me about similar > >> > questions in the past. > >> > > >> > > >> >> Does the soft lockup also occurs after opening the second serial convertor or later? > >> > I don't have serial converters easily available to me, but back in the > >> > day when I was stressing things out on rk3288 I never saw anything > >> > this bad. ...of course, on rk3288 we've got 4 A17 cores running > >> > really fast, so possibly just being slower is what causes your > >> > problems here? > >> > >> The downstream kernel of the Raspberry Pi foundation with it's > >> out-of-tree dwc_otg driver is able to handle 8 serial converter on a RPI > >> B. I would be happy to get at least 2 or 3 working on mainline. > >> > >> > > >> > I will make the following observations: > >> > > >> > 1. With dwc2 you often end up in the situation where you need to > >> > service an interrupt every 125 uS. If servicing that interrupt takes > >> > anywhere near 125 uS in the common case then you'll be in trouble. > >> > >> I will try to measure this with a logic analyzer. > >> > > > > i took GPIO17 to measure _dwc2_hcd_irq and GPIO18 to measure _dwc2_hcd_urb_enqueue (patch against 4.11rc1 below). > > > > So i made my observations for 3 test cases: > > > > 1) no serial converter connected (idle) > > 2) 1 FTDI serial converter connected > > 3) 1 PL2303 serial converter connected > > > > case | ksoftirq cpu | mean duration | max duration | max duration | urb_enqueue | > > | | hcd_irq | hcd_irq | urb_enqueue | within 10 sec| > > -------+------------------+---------------+---------------+--------------+--------------+ > > idle | 0.0% | 2 us | 16.5 us | 12 us | 5 | > > FTDI | 25.0% | 8.5 us | 18.0 us | 31000 us | ~ 400 | > > PL2303 | top doesn't work | 8.5 us | 22.5 us | 900000 us | 4 | > > It's hard to know for sure that all of this time is really in > urb_enqueue(). Possible we could have task switched out and been > blocked elsewhere. Using ftrace to get more fine-grained timings > would be useful. ktime_get(), ktime_sub(), and ktime_to_us() are your > friends here if you want to use trace_printk. > I saw your last reply after sending my last mail. I will go further with ftrace.
On Tue, Apr 18, 2017 at 10:25:06PM +0200, Stefan Wahren wrote: > Hi, > > [add Johan] > > > Stefan Wahren <stefan.wahren@i2se.com> hat am 18. April 2017 um 10:07 geschrieben: > > > > > > Am 18.04.2017 um 00:37 schrieb Doug Anderson: > > > Hi, > > > > > > On Mon, Apr 17, 2017 at 4:05 AM, Stefan Wahren <stefan.wahren@i2se.com> wrote: > > >> Hi, > > >> > > >>> Stefan Wahren <stefan.wahren@i2se.com> hat am 31. Oktober 2016 um 21:34 geschrieben: > > >>> > > >>> > > >>> I inspired by this issue [1] i build up a slightly modified > > >>> setup with a Raspberry Pi B (mainline kernel 4.9rc3), a powered > > >>> 7 port USB hub and 5 Prolific PL2303 USB to serial convertors. I > > >>> modified the usb_test for dwc2 [2], which only tries to open all > > >>> ttyUSB devices one after the other. > > >>> > > >>> Unfortunately the complete system stuck after opening the first > > >>> ttyUSB device ( heartbeat LED stop blinking, no reaction to > > >>> debug UART). The only way to reanimate the system is to > > >>> powerdown the USB hub with the USB to serial convertors. > > >>> > > >>> [1] - https://github.com/raspberrypi/linux/issues/1692 > i took GPIO17 to measure _dwc2_hcd_irq and GPIO18 to measure > _dwc2_hcd_urb_enqueue (patch against 4.11rc1 below). > > So i made my observations for 3 test cases: > > 1) no serial converter connected (idle) > 2) 1 FTDI serial converter connected > 3) 1 PL2303 serial converter connected > > case | ksoftirq cpu | mean duration | max duration | max duration | urb_enqueue | > | | hcd_irq | hcd_irq | urb_enqueue | within 10 sec| > -------+------------------+---------------+---------------+--------------+--------------+ > idle | 0.0% | 2 us | 16.5 us | 12 us | 5 | > FTDI | 25.0% | 8.5 us | 18.0 us | 31000 us | ~ 400 | > PL2303 | top doesn't work | 8.5 us | 22.5 us | 900000 us | 4 | > > So it seems the serial USB driver has also an impact. In the analyzer > trace the FTDI triggers many smaller urb_enqueue calls in the opposite > to the PL2303 which only has few but huge calls. What do you mean by "huge calls" above? Are you just keeping the ports open (i.e. with no data being received or sent)? FTDI devices are notorious for their status messages being sent continuously while the port is open. You'll get a two-byte bulk-in message every 16ms by default (these were sent every millisecond up until recently due to a long-standing regression). PL2303 devices have an interrupt-in endpoint which is used for status notifications so you would not see any completion callbacks on an otherwise idle port. > Additional notes: > After closing the serial connection to the FTDI the system is usable > as before. In case of PL2303 i need to disconnect the converter in > order to get a usable system. Does your system lock up when you open the first pl2303 device? > Why do they behave so differently? So the ftdi bulk-in status messages and the pl2303 interrupt-in endpoint could be two reasons, but I guess so could any change in timing etc. Is your ftdi-device a Full Speed device like the pl2303? Johan
Am 19.04.2017 um 10:56 schrieb Johan Hovold: > On Tue, Apr 18, 2017 at 10:25:06PM +0200, Stefan Wahren wrote: >> Hi, >> >> [add Johan] >> >>> Stefan Wahren <stefan.wahren@i2se.com> hat am 18. April 2017 um 10:07 geschrieben: >>> >>> >>> Am 18.04.2017 um 00:37 schrieb Doug Anderson: >>>> Hi, >>>> >>>> On Mon, Apr 17, 2017 at 4:05 AM, Stefan Wahren <stefan.wahren@i2se.com> wrote: >>>>> Hi, >>>>> >>>>>> Stefan Wahren <stefan.wahren@i2se.com> hat am 31. Oktober 2016 um 21:34 geschrieben: >>>>>> >>>>>> >>>>>> I inspired by this issue [1] i build up a slightly modified >>>>>> setup with a Raspberry Pi B (mainline kernel 4.9rc3), a powered >>>>>> 7 port USB hub and 5 Prolific PL2303 USB to serial convertors. I >>>>>> modified the usb_test for dwc2 [2], which only tries to open all >>>>>> ttyUSB devices one after the other. >>>>>> >>>>>> Unfortunately the complete system stuck after opening the first >>>>>> ttyUSB device ( heartbeat LED stop blinking, no reaction to >>>>>> debug UART). The only way to reanimate the system is to >>>>>> powerdown the USB hub with the USB to serial convertors. >>>>>> >>>>>> [1] - https://github.com/raspberrypi/linux/issues/1692 >> i took GPIO17 to measure _dwc2_hcd_irq and GPIO18 to measure >> _dwc2_hcd_urb_enqueue (patch against 4.11rc1 below). >> >> So i made my observations for 3 test cases: >> >> 1) no serial converter connected (idle) >> 2) 1 FTDI serial converter connected >> 3) 1 PL2303 serial converter connected >> >> case | ksoftirq cpu | mean duration | max duration | max duration | urb_enqueue | >> | | hcd_irq | hcd_irq | urb_enqueue | within 10 sec| >> -------+------------------+---------------+---------------+--------------+--------------+ >> idle | 0.0% | 2 us | 16.5 us | 12 us | 5 | >> FTDI | 25.0% | 8.5 us | 18.0 us | 31000 us | ~ 400 | >> PL2303 | top doesn't work | 8.5 us | 22.5 us | 900000 us | 4 | >> >> So it seems the serial USB driver has also an impact. In the analyzer >> trace the FTDI triggers many smaller urb_enqueue calls in the opposite >> to the PL2303 which only has few but huge calls. > What do you mean by "huge calls" above? The time it takes until _dwc2_hcd_urb_enqueue is finished. > > Are you just keeping the ports open (i.e. with no data being received or > sent)? Yes, only open and no data is received or sent (LEDs doesn't show any activity). > > FTDI devices are notorious for their status messages being sent > continuously while the port is open. You'll get a two-byte bulk-in > message every 16ms by default (these were sent every millisecond up > until recently due to a long-standing regression). > > PL2303 devices have an interrupt-in endpoint which is used for status > notifications so you would not see any completion callbacks on an > otherwise idle port. > >> Additional notes: >> After closing the serial connection to the FTDI the system is usable >> as before. In case of PL2303 i need to disconnect the converter in >> order to get a usable system. > Does your system lock up when you open the first pl2303 device? Yes (directly to the onboard hub or an external hub make no difference). > >> Why do they behave so differently? > So the ftdi bulk-in status messages and the pl2303 interrupt-in endpoint > could be two reasons, but I guess so could any change in timing etc. > > Is your ftdi-device a Full Speed device like the pl2303? Sorry, i will need to verify. Thanks Stefan > > Johan
On Wed, Apr 19, 2017 at 11:12:17AM +0200, Stefan Wahren wrote: > Am 19.04.2017 um 10:56 schrieb Johan Hovold: > > On Tue, Apr 18, 2017 at 10:25:06PM +0200, Stefan Wahren wrote: > >> Hi, > >> > >> [add Johan] > >> > >>> Stefan Wahren <stefan.wahren@i2se.com> hat am 18. April 2017 um 10:07 geschrieben: > >>> > >>> > >>> Am 18.04.2017 um 00:37 schrieb Doug Anderson: > >>>> Hi, > >>>> > >>>> On Mon, Apr 17, 2017 at 4:05 AM, Stefan Wahren <stefan.wahren@i2se.com> wrote: > >>>>> Hi, > >>>>> > >>>>>> Stefan Wahren <stefan.wahren@i2se.com> hat am 31. Oktober 2016 um 21:34 geschrieben: > >>>>>> > >>>>>> > >>>>>> I inspired by this issue [1] i build up a slightly modified > >>>>>> setup with a Raspberry Pi B (mainline kernel 4.9rc3), a powered > >>>>>> 7 port USB hub and 5 Prolific PL2303 USB to serial convertors. I > >>>>>> modified the usb_test for dwc2 [2], which only tries to open all > >>>>>> ttyUSB devices one after the other. > >>>>>> > >>>>>> Unfortunately the complete system stuck after opening the first > >>>>>> ttyUSB device ( heartbeat LED stop blinking, no reaction to > >>>>>> debug UART). The only way to reanimate the system is to > >>>>>> powerdown the USB hub with the USB to serial convertors. > >>>>>> > >>>>>> [1] - https://github.com/raspberrypi/linux/issues/1692 > >> i took GPIO17 to measure _dwc2_hcd_irq and GPIO18 to measure > >> _dwc2_hcd_urb_enqueue (patch against 4.11rc1 below). > >> > >> So i made my observations for 3 test cases: > >> > >> 1) no serial converter connected (idle) > >> 2) 1 FTDI serial converter connected > >> 3) 1 PL2303 serial converter connected > >> > >> case | ksoftirq cpu | mean duration | max duration | max duration | urb_enqueue | > >> | | hcd_irq | hcd_irq | urb_enqueue | within 10 sec| > >> -------+------------------+---------------+---------------+--------------+--------------+ > >> idle | 0.0% | 2 us | 16.5 us | 12 us | 5 | > >> FTDI | 25.0% | 8.5 us | 18.0 us | 31000 us | ~ 400 | > >> PL2303 | top doesn't work | 8.5 us | 22.5 us | 900000 us | 4 | > >> > >> So it seems the serial USB driver has also an impact. In the analyzer > >> trace the FTDI triggers many smaller urb_enqueue calls in the opposite > >> to the PL2303 which only has few but huge calls. > > What do you mean by "huge calls" above? > > The time it takes until _dwc2_hcd_urb_enqueue is finished. Yes, sorry, I realised that after I sent my reply. > > Are you just keeping the ports open (i.e. with no data being received or > > sent)? > > Yes, only open and no data is received or sent (LEDs doesn't show any > activity). > > > FTDI devices are notorious for their status messages being sent > > continuously while the port is open. You'll get a two-byte bulk-in > > message every 16ms by default (these were sent every millisecond up > > until recently due to a long-standing regression). > > > > PL2303 devices have an interrupt-in endpoint which is used for status > > notifications so you would not see any completion callbacks on an > > otherwise idle port. > > > >> Additional notes: > >> After closing the serial connection to the FTDI the system is usable > >> as before. In case of PL2303 i need to disconnect the converter in > >> order to get a usable system. > > Does your system lock up when you open the first pl2303 device? > > Yes (directly to the onboard hub or an external hub make no difference). Ok, good. I suggest not using the external hub until you've tracked this down. > >> Why do they behave so differently? > > So the ftdi bulk-in status messages and the pl2303 interrupt-in endpoint > > could be two reasons, but I guess so could any change in timing etc. > > > > Is your ftdi-device a Full Speed device like the pl2303? > > Sorry, i will need to verify. Can you reproduce this on 4.11-rc7 without the external hub and post the corresponding soft lockup bug output? On open, the pl2303 driver first enqueues the interrupt-in URB before enqueueing two bulk-in URBs. From your original stack trace it looks like the lockup happens when queueing either of those. Perhaps you can enable debugging in the dwc2 driver or add some printks to determine how far you get. Another thing to try is to comment out the interrupt-in URB submission in pl2303 and see whether that avoids triggering the lockup. Johan
Hi, > Doug Anderson <dianders@chromium.org> hat am 18. April 2017 um 22:41 geschrieben: > > > It's hard to know for sure that all of this time is really in > urb_enqueue(). Possible we could have task switched out and been > blocked elsewhere. Using ftrace to get more fine-grained timings > would be useful. ktime_get(), ktime_sub(), and ktime_to_us() are your > friends here if you want to use trace_printk. i'm a newbie to ftrace, so i hope this would be helpful. # connect PL2303 to the onboard hub # echo 0 > options/sleep-time # echo 0 > function_profile_enabled # echo 1 > function_profile_enabled # ./usb_test # Waiting for at least 20 seconds and then disconnect PL2303 # echo 0 > function_profile_enabled # cat trace_stat/function0 Function Hit Time Avg s^2 -------- --- ---- --- --- bcm2835_handle_irq 361347 219567633 us 607.636 us 1485199 us __handle_domain_irq 1082482 212639551 us 196.437 us 3642030 us generic_handle_irq 1082482 100592051 us 92.927 us 50511334 us irq_exit 1082482 98197771 us 90.715 us 29649040 us handle_level_irq 1082482 95812379 us 88.511 us 51910093 us do_sys_open 1806 87612983 us 48512.17 us 2198507 us SyS_open 1601 87372331 us 54573.59 us 1898996 us do_filp_open 1862 87368058 us 46921.62 us 1634982 us path_openat 1862 87314553 us 46892.88 us 3357817 us __do_softirq 3035 86266050 us 28423.73 us 6449768 us vfs_open 1515 85877012 us 56684.49 us 101673.5 us do_dentry_open 1515 85861429 us 56674.21 us 812420.7 us usb_submit_urb 136 85760172 us 630589.5 us 59532024 us usb_hcd_submit_urb 134 85756518 us 639974.0 us 726298102 us _dwc2_hcd_urb_enqueue 134 85738333 us 639838.3 us 874104371 us chrdev_open 87 85716519 us 985247.3 us 1951112835 us tty_open 3 85714494 us 28571498 us 3743206849 us tty_port_open 3 85712603 us 28570867 us 1968003468 us serial_open 1 85712472 us 85712472 us 0.000 us serial_port_activate 1 85709890 us 85709890 us 0.000 us pl2303_open 1 85709875 us 85709875 us 0.000 us usb_serial_generic_open 1 85701170 us 85701170 us 0.000 us usb_serial_generic_submit_read 1 85701166 us 85701166 us 0.000 us usb_serial_generic_submit_read 2 85701160 us 42850580 us 2252410463 us handle_irq_event 1082482 80686112 us 74.538 us 36417905 us handle_irq_event_percpu 1082482 78398378 us 72.424 us 34060119 us __handle_irq_event_percpu 1082482 68466046 us 63.249 us 53609076 us usb_hcd_irq 1078945 59761116 us 55.388 us 25139388 us _dwc2_hcd_irq 1078945 57481606 us 53.275 us 30847036 us dwc2_handle_hcd_intr 1078945 55380910 us 51.328 us 35986237 us do_idle 394 43577379 us 110602.4 us 84278209 us dwc2_hc_n_intr 651065 42408491 us 65.137 us 7281229 us default_idle_call 229969 38021287 us 165.332 us 11386904 us arch_cpu_idle 229969 37379305 us 162.540 us 2168909 us dwc2_halt_channel 650811 34208348 us 52.562 us 7792012 us dwc2_release_channel 651065 33042638 us 50.751 us 8112754 us dwc2_hc_nak_intr 334689 18892951 us 56.449 us 1823592 us dwc2_hc_ack_intr 315986 18323433 us 57.988 us 151346.6 us dwc2_hcd_select_transactions 1730010 11694955 us 6.760 us 1188442 us dwc2_hcd_queue_transactions 651065 10931854 us 16.790 us 952074.2 us dwc2_assign_and_init_hc 651065 9540885 us 14.654 us 258067.1 us dwc2_queue_transaction 651065 9192559 us 14.119 us 736431.6 us tick_nohz_irq_exit 325055 9052192 us 27.848 us 916293.3 us __tick_nohz_idle_enter.constpr 324760 8463861 us 26.061 us 2410440 us dwc2_hc_start_transfer 651065 7687376 us 11.807 us 625773.3 us irq_enter 1082482 6627959 us 6.122 us 11528932 us arm_heavy_mb 10903913 6603586 us 0.605 us 60562924 us unmask_irq 1082482 4763699 us 4.400 us 1032638 us tick_irq_enter 326467 4377088 us 13.407 us 4606805 us dwc2_handle_common_intr 1078945 3799392 us 3.521 us 1131504 us add_interrupt_randomness 1082482 3542044 us 3.272 us 30545511 us get_next_armctrl_hwirq 1443829 3276974 us 2.269 us 2116848 us tick_nohz_restart 70325 3250462 us 46.220 us 35710.73 us dwc2_hc_cleanup 651065 3079003 us 4.729 us 81202.50 us dwc2_hcd_qh_deactivate 651065 3039940 us 4.669 us 90928.10 us ledtrig_cpu 459938 2450327 us 5.327 us 1466705 us arch_cpu_idle_exit 229969 2443315 us 10.624 us 8606787 us do_page_fault 5509 2304998 us 418.405 us 17961439 us ktime_get 795998 2121741 us 2.665 us 166119.5 us get_next_timer_interrupt 324760 1998999 us 6.155 us 32201.63 us handle_mm_fault 5537 1945500 us 351.363 us 8324917 us dwc2_is_controller_alive 2157890 1473590 us 0.682 us 291399.4 us hrtimer_cancel 70450 1394330 us 19.791 us 36980.28 us arch_cpu_idle_enter 229969 1350759 us 5.873 us 10789.94 us hrtimer_try_to_cancel 71029 1266967 us 17.837 us 30260.75 us filemap_map_pages 1632 1263006 us 773.900 us 207846.4 us tick_program_event 141799 1260485 us 8.889 us 9343.194 us hrtimer_start_range_ns 97555 1253698 us 12.851 us 43166.94 us irq_to_desc 1082482 1146955 us 1.059 us 546937.6 us link_path_walk 3486 1134351 us 325.401 us 484443.0 us __remove_hrtimer 99205 999724.0 us 10.077 us 201933.9 us clockevents_program_event 141799 985346.0 us 6.948 us 7114.725 us note_interrupt 1082482 899494.0 us 0.830 us 705141.6 us update_ts_time_stats.constprop 324760 862996.0 us 2.657 us 21813.81 us alloc_set_pte 20011 805880.0 us 40.271 us 1086748 us SyS_read 1940 805061.0 us 414.979 us 164198.7 us _local_bh_enable 326467 777908.0 us 2.382 us 37202.43 us vfs_read 2080 761487.0 us 366.099 us 22944777 us dwc2_hcd_get_frame_number 1097673 721108.0 us 0.656 us 525348.1 us do_munmap 1194 717161.0 us 600.637 us 65635.31 us schedule 3297 700578.0 us 212.489 us 28594.59 us __vfs_read 2080 694376.0 us 333.834 us 21955675 us walk_component 9912 683211.0 us 68.927 us 349771.7 us irq_find_mapping 1082482 673162.0 us 0.621 us 900861.5 us __next_timer_interrupt 324929 667175.0 us 2.053 us 4576.720 us armctrl_translate_shortcut 1079274 662875.0 us 0.614 us 164464.8 us filename_lookup 1316 646059.0 us 490.926 us 148884.8 us mmput 259 644417.0 us 2488.096 us 33155247 us armctrl_unmask_irq 1082482 639692.0 us 0.590 us 891074.7 us armctrl_mask_irq 1082482 636505.0 us 0.588 us 888583.1 us exit_mmap 45 634389.0 us 14097.53 us 24105769 us irq_state_set_masked 1082482 622182.0 us 0.574 us 894212.6 us user_path_at_empty 1156 620229.0 us 536.530 us 166865.3 us SyS_munmap 895 600613.0 us 671.075 us 75763.68 us vm_mmap_pgoff 1437 598955.0 us 416.809 us 55882.41 us read_current_timer 1082666 598757.0 us 0.553 us 86817.59 us irq_may_run 1082482 598471.0 us 0.552 us 898308.6 us vm_munmap 937 595309.0 us 635.335 us 70914.31 us path_lookupat 1316 592448.0 us 450.188 us 147624.4 us SyS_mmap_pgoff 1325 588992.0 us 444.522 us 62142.49 us irq_state_clr_masked 1082482 582077.0 us 0.537 us 893572.4 us idle_cpu 1082482 580428.0 us 0.536 us 905609.5 us do_execve 28 573864.0 us 20495.14 us 98356111 us do_execveat_common 28 573576.0 us 20484.85 us 98471441 us dwc2_hc_set_even_odd_frame 651065 570127.0 us 0.875 us 543701.5 us ledtrig_cpu.part.0 229969 549060.0 us 2.387 us 9142.920 us do_mmap 1437 548817.0 us 381.918 us 54270.64 us do_work_pending 2211 525805.0 us 237.813 us 234836.0 us led_trigger_event 460228 501093.0 us 1.088 us 4759241 us search_binary_handler 28 501026.0 us 17893.78 us 93445723 us clocksource_mmio_readl_up 807791 500765.0 us 0.619 us 91050.42 us load_elf_binary 28 500359.0 us 17869.96 us 93309220 us process_one_work 558 496994.0 us 890.670 us 7196614 us tick_do_update_jiffies64 324826 496168.0 us 1.527 us 3422867 us hrtimer_forward 72095 490887.0 us 6.808 us 4548.034 us unmap_vmas 1239 490538.0 us 395.914 us 2723415 us finish_task_switch 3926 479983.0 us 122.257 us 25336.71 us unmap_single_vma 3403 474812.0 us 139.527 us 959184.6 us __sync_icache_dcache 27231 468069.0 us 17.188 us 364740.1 us SyS_execve 16 462210.0 us 28888.12 us 4450186 us unmap_region 1194 460297.0 us 385.508 us 44372.58 us unmap_page_range 3403 455941.0 us 133.982 us 956896.1 us bcm2835_time_interrupt 3208 455584.0 us 142.014 us 56012.72 us mmap_region 1437 449599.0 us 312.873 us 55175.49 us hrtimer_interrupt 3208 443908.0 us 138.375 us 55446.73 us dwc2_hcd_qh_add 632341 436511.0 us 0.690 us 58787.66 us lookup_fast 11601 436227.0 us 37.602 us 570183.6 us inode_permission 14699 426528.0 us 29.017 us 404007.8 us SyS_access 561 412200.0 us 734.759 us 146663.8 us dwc2_hcd_qh_unlink 632344 410157.0 us 0.648 us 87505.07 us seq_read 669 408885.0 us 611.188 us 226703.8 us SyS_faccessat 564 406969.0 us 721.576 us 141955.4 us dwc2_check_qtd_still_ok 651065 392283.0 us 0.602 us 59460.80 us SyS_getdents64 392 379673.0 us 968.553 us 13678636 us SyS_write 404 378782.0 us 937.579 us 254399.0 us iterate_dir 393 367777.0 us 935.819 us 13659529 us vfs_write 404 358326.0 us 886.945 us 240956.8 us __hrtimer_run_queues.constprop 3227 357720.0 us 110.852 us 47783.12 us task_work_run 1460 350902.0 us 240.343 us 71379.18 us _cond_resched 109971 344855.0 us 3.135 us 2245087 us __vfs_write 404 333260.0 us 824.900 us 230347.3 us ____fput 1680 326232.0 us 194.185 us 40913.99 us __inode_permission 14699 324102.0 us 22.049 us 328791.7 us __fput 1680 316285.0 us 188.264 us 40874.09 us flush_old_exec 28 313336.0 us 11190.57 us 96843591 us tick_sched_timer 1964 312793.0 us 159.263 us 2002.533 us do_wp_page 1758 300693.0 us 171.042 us 17667.03 us _do_fork 29 281388.0 us 9703.034 us 54207832 us copy_process.part.4 29 273985.0 us 9447.758 us 53888393 us SyS_clone 17 267170.0 us 15715.88 us 2444683 us free_pgtables 1239 265593.0 us 214.360 us 500428.0 us dput 15879 265579.0 us 16.725 us 269685.7 us SyS_openat 205 258191.0 us 1259.468 us 209643.6 us pfn_valid 18146 257726.0 us 14.202 us 121416.6 us update_wall_time 5957 250787.0 us 42.099 us 619.230 us mmc_blk_issue_rq 63 246608.0 us 3914.412 us 8858815 us mmc_blk_issue_rw_rq 63 241626.0 us 3835.333 us 8884205 us vfs_statx 441 232198.0 us 526.526 us 150124.0 us SyS_sendmsg 316 226698.0 us 717.398 us 164208.5 us mmc_start_request 145 225615.0 us 1555.965 us 5542154 us __sys_sendmsg 316 223856.0 us 708.405 us 164907.1 us wp_page_copy 1061 222841.0 us 210.029 us 13010.76 us __generic_file_write_iter 206 219092.0 us 1063.553 us 242948.7 us terminate_walk 3225 217649.0 us 67.488 us 47336.14 us ext4_file_write_iter 170 216280.0 us 1272.235 us 127738.3 us __mmc_start_request 145 214090.0 us 1476.482 us 5509235 us ___sys_sendmsg 316 212732.0 us 673.202 us 161410.8 us kmem_cache_alloc 17570 210590.0 us 11.985 us 120917.2 us mmc_start_areq 63 210528.0 us 3341.714 us 9610382 us sdhci_request 145 209828.0 us 1447.089 us 5547701 us sock_sendmsg 353 208523.0 us 590.716 us 147666.9 us __split_vma 981 206458.0 us 210.456 us 13253.43 us credit_entropy_bits 16914 205434.0 us 12.145 us 208.830 us hrtimer_get_next_event 324760 199659.0 us 0.614 us 17454.58 us path_put 6090 196782.0 us 32.312 us 54264.43 us SyS_mprotect 427 193190.0 us 452.435 us 42168.10 us SyS_stat64 370 181587.0 us 490.775 us 119707.4 us nr_iowait_cpu 324760 176767.0 us 0.544 us 17782.33 us __vma_adjust 1452 174253.0 us 120.008 us 5746.323 us tick_sched_handle 1964 174004.0 us 88.596 us 419.232 us sdhci_irq 211 173984.0 us 824.568 us 4212897 us touch_softlockup_watchdog_sche 323207 173767.0 us 0.537 us 108651.0 us kernfs_iop_permission 5012 173442.0 us 34.605 us 21830.09 us __local_bh_enable 329502 172898.0 us 0.524 us 9002.660 us proc_single_show 233 170938.0 us 733.639 us 235941.1 us mprotect_fixup 455 163622.0 us 359.608 us 43581.82 us SyS_recvmsg 515 163337.0 us 317.159 us 23749.85 us update_process_times 1964 161227.0 us 82.091 us 436.558 us usb_control_msg 131 160306.0 us 1223.709 us 63779.20 us __sys_recvmsg 515 159685.0 us 310.067 us 23888.88 us check_carrier 29 159209.0 us 5489.965 us 157176.1 us __dentry_kill 1306 159044.0 us 121.779 us 13522.69 us smsc95xx_mdio_read 29 156605.0 us 5400.172 us 147578.2 us __smsc95xx_mdio_read 29 156128.0 us 5383.724 us 153872.9 us ext4_readdir 60 152036.0 us 2533.933 us 67111799 us run_ksoftirqd 844 151846.0 us 179.912 us 74010.42 us usb_start_wait_urb 131 151181.0 us 1154.053 us 59721.86 us find_vma 8806 149418.0 us 16.967 us 419885.5 us __alloc_pages_nodemask 4365 143647.0 us 32.908 us 22236.43 us ___sys_recvmsg 515 143567.0 us 278.770 us 23183.05 us memblock_is_map_memory 18146 142969.0 us 7.878 us 62191.29 us timekeeping_max_deferment 252165 141177.0 us 0.559 us 3782.924 us unlink_anon_vmas 3403 137080.0 us 40.282 us 17090.37 us __vma_link_rb 3403 136104.0 us 39.995 us 12357.79 us rcu_process_callbacks 1050 135701.0 us 129.239 us 66292.86 us SyS_close 1977 135041.0 us 68.306 us 4406.697 us pick_next_task_fair 3996 133622.0 us 33.438 us 1391.499 us proc_readfd 50 133185.0 us 2663.700 us 18788258 us proc_readfd_common 50 132911.0 us 2658.220 us 18769391 us generic_perform_write 206 132888.0 us 645.087 us 134152.0 us __mix_pool_bytes 16914 128343.0 us 7.587 us 303.814 us run_timer_softirq 1960 127980.0 us 65.295 us 13337.43 us timekeeping_update 5956 127128.0 us 21.344 us 147.035 us try_to_wake_up 2348 125639.0 us 53.508 us 5318.431 us ktime_add_safe 241886 125624.0 us 0.519 us 6352.310 us __rcu_process_callbacks 2100 125522.0 us 59.772 us 1422883 us __wake_up 19318 121496.0 us 6.289 us 54037.07 us tick_nohz_idle_enter 394 119544.0 us 303.411 us 24204.73 us proc_fill_cache 402 119333.0 us 296.848 us 32614.19 us unix_dgram_sendmsg 144 119088.0 us 827.000 us 197140.4 us enqueue_hrtimer 99205 118756.0 us 1.197 us 1634.105 us call_usermodehelper_exec_async 12 117852.0 us 9821.000 us 319067.4 us mutex_lock 14089 117414.0 us 8.333 us 245056.5 us sock_recvmsg 548 116787.0 us 213.114 us 20249.99 us __smsc95xx_read_reg 87 116418.0 us 1338.137 us 39000.81 us usbnet_read_cmd 87 115523.0 us 1327.850 us 37075.87 us SyS_epoll_wait 336 114271.0 us 340.092 us 47629.77 us __close_fd 1977 113569.0 us 57.445 us 3969.846 us split_vma 501 112236.0 us 224.023 us 20703.73 us vma_link 1248 111097.0 us 89.020 us 5058.706 us __usbnet_read_cmd 87 111024.0 us 1276.137 us 34288.14 us schedule_hrtimeout_range 390 107480.0 us 275.589 us 10798.57 us dentry_unlink_inode 733 107440.0 us 146.575 us 8915.389 us unix_find_other 172 106799.0 us 620.924 us 79499.41 us iput 757 106445.0 us 140.614 us 8828.710 us ext4_htree_fill_tree 36 106438.0 us 2956.611 us 10423331 us kernfs_fop_read 198 105215.0 us 531.388 us 48887.87 us htree_dirblock_to_tree 36 104352.0 us 2898.666 us 10008360 us schedule_hrtimeout_range_clock 390 103931.0 us 266.489 us 10595.17 us SyS_connect 99 102476.0 us 1035.111 us 160650.8 us __mark_inode_dirty 641 102069.0 us 159.234 us 35047.69 us expire_timers 700 100145.0 us 143.064 us 8519.548 us do_wait 42 99498.00 us 2369.000 us 11701021 us lookup_slow 413 99137.00 us 240.041 us 40034.02 us release_task 29 98995.00 us 3413.620 us 12343735 us SyS_fstat64 1168 98993.00 us 84.754 us 5326.254 us wait_consider_task 444 98839.00 us 222.610 us 1532328 us kmem_cache_free 17783 98715.00 us 5.551 us 72804.30 us ...
Hi, On Wed, Apr 19, 2017 at 1:25 PM, Stefan Wahren <stefan.wahren@i2se.com> wrote: > Hi, > >> Doug Anderson <dianders@chromium.org> hat am 18. April 2017 um 22:41 geschrieben: >> >> >> It's hard to know for sure that all of this time is really in >> urb_enqueue(). Possible we could have task switched out and been >> blocked elsewhere. Using ftrace to get more fine-grained timings >> would be useful. ktime_get(), ktime_sub(), and ktime_to_us() are your >> friends here if you want to use trace_printk. > > i'm a newbie to ftrace, so i hope this would be helpful. > > # connect PL2303 to the onboard hub > # echo 0 > options/sleep-time > # echo 0 > function_profile_enabled > # echo 1 > function_profile_enabled > # ./usb_test > # Waiting for at least 20 seconds and then disconnect PL2303 > # echo 0 > function_profile_enabled > # cat trace_stat/function0 > > Function Hit Time Avg s^2 > -------- --- ---- --- --- > bcm2835_handle_irq 361347 219567633 us 607.636 us 1485199 us > __handle_domain_irq 1082482 212639551 us 196.437 us 3642030 us > generic_handle_irq 1082482 100592051 us 92.927 us 50511334 us > irq_exit 1082482 98197771 us 90.715 us 29649040 us > handle_level_irq 1082482 95812379 us 88.511 us 51910093 us > do_sys_open 1806 87612983 us 48512.17 us 2198507 us > SyS_open 1601 87372331 us 54573.59 us 1898996 us > do_filp_open 1862 87368058 us 46921.62 us 1634982 us > path_openat 1862 87314553 us 46892.88 us 3357817 us > __do_softirq 3035 86266050 us 28423.73 us 6449768 us > vfs_open 1515 85877012 us 56684.49 us 101673.5 us > do_dentry_open 1515 85861429 us 56674.21 us 812420.7 us > usb_submit_urb 136 85760172 us 630589.5 us 59532024 us > usb_hcd_submit_urb 134 85756518 us 639974.0 us 726298102 us > _dwc2_hcd_urb_enqueue 134 85738333 us 639838.3 us 874104371 us The 134 calls to this are taking a ton of time. It would be nice to know where all the time actually was in here. Are you on a single-core device, or multi-core? Oh, probably this so slow because we're just getting interrupted constantly. You can see that during your trace 80686112 us was in handle_irq_event(). Presumably all of that time could have been counted towards whatever code you were in when the interrupt went off. Of that 80 seconds, 60 seconds or so was usb_hcd_irq, and of that 57 seconds was in _dwc2_hcd_irq(). Presumably the big part of your problem is that dwc2_handle_hcd_intr() is just too slow. From glancing at the numbers below, it seems likely that nothing in particular is slow, it's just running a lot of code and (in total) all of that is slow. ...but someone with more time on their hands would need to really debug. One thing that would be interesting would be to see if you can increase the bug clock for talking to the dwc2 controller. Possibly some of those memory mapped IOs talking to dwc2 could be making everything slow? Other than that, maybe you can find some way to optimize the code in dwc2 so it runs faster, at least in the cases you care about... > chrdev_open 87 85716519 us 985247.3 us 1951112835 us > tty_open 3 85714494 us 28571498 us 3743206849 us > tty_port_open 3 85712603 us 28570867 us 1968003468 us > serial_open 1 85712472 us 85712472 us 0.000 us > serial_port_activate 1 85709890 us 85709890 us 0.000 us > pl2303_open 1 85709875 us 85709875 us 0.000 us > usb_serial_generic_open 1 85701170 us 85701170 us 0.000 us > usb_serial_generic_submit_read 1 85701166 us 85701166 us 0.000 us > usb_serial_generic_submit_read 2 85701160 us 42850580 us 2252410463 us > handle_irq_event 1082482 80686112 us 74.538 us 36417905 us > handle_irq_event_percpu 1082482 78398378 us 72.424 us 34060119 us > __handle_irq_event_percpu 1082482 68466046 us 63.249 us 53609076 us > usb_hcd_irq 1078945 59761116 us 55.388 us 25139388 us > _dwc2_hcd_irq 1078945 57481606 us 53.275 us 30847036 us > dwc2_handle_hcd_intr 1078945 55380910 us 51.328 us 35986237 us > do_idle 394 43577379 us 110602.4 us 84278209 us > dwc2_hc_n_intr 651065 42408491 us 65.137 us 7281229 us > default_idle_call 229969 38021287 us 165.332 us 11386904 us > arch_cpu_idle 229969 37379305 us 162.540 us 2168909 us > dwc2_halt_channel 650811 34208348 us 52.562 us 7792012 us > dwc2_release_channel 651065 33042638 us 50.751 us 8112754 us > dwc2_hc_nak_intr 334689 18892951 us 56.449 us 1823592 us > dwc2_hc_ack_intr 315986 18323433 us 57.988 us 151346.6 us > dwc2_hcd_select_transactions 1730010 11694955 us 6.760 us 1188442 us > dwc2_hcd_queue_transactions 651065 10931854 us 16.790 us 952074.2 us > dwc2_assign_and_init_hc 651065 9540885 us 14.654 us 258067.1 us > dwc2_queue_transaction 651065 9192559 us 14.119 us 736431.6 us > tick_nohz_irq_exit 325055 9052192 us 27.848 us 916293.3 us > __tick_nohz_idle_enter.constpr 324760 8463861 us 26.061 us 2410440 us > dwc2_hc_start_transfer 651065 7687376 us 11.807 us 625773.3 us > irq_enter 1082482 6627959 us 6.122 us 11528932 us > arm_heavy_mb 10903913 6603586 us 0.605 us 60562924 us > unmask_irq 1082482 4763699 us 4.400 us 1032638 us > tick_irq_enter 326467 4377088 us 13.407 us 4606805 us > dwc2_handle_common_intr 1078945 3799392 us 3.521 us 1131504 us > add_interrupt_randomness 1082482 3542044 us 3.272 us 30545511 us > get_next_armctrl_hwirq 1443829 3276974 us 2.269 us 2116848 us > tick_nohz_restart 70325 3250462 us 46.220 us 35710.73 us > dwc2_hc_cleanup 651065 3079003 us 4.729 us 81202.50 us > dwc2_hcd_qh_deactivate 651065 3039940 us 4.669 us 90928.10 us > ledtrig_cpu 459938 2450327 us 5.327 us 1466705 us > arch_cpu_idle_exit 229969 2443315 us 10.624 us 8606787 us > do_page_fault 5509 2304998 us 418.405 us 17961439 us > ktime_get 795998 2121741 us 2.665 us 166119.5 us > get_next_timer_interrupt 324760 1998999 us 6.155 us 32201.63 us > handle_mm_fault 5537 1945500 us 351.363 us 8324917 us > dwc2_is_controller_alive 2157890 1473590 us 0.682 us 291399.4 us > hrtimer_cancel 70450 1394330 us 19.791 us 36980.28 us > arch_cpu_idle_enter 229969 1350759 us 5.873 us 10789.94 us > hrtimer_try_to_cancel 71029 1266967 us 17.837 us 30260.75 us > filemap_map_pages 1632 1263006 us 773.900 us 207846.4 us > tick_program_event 141799 1260485 us 8.889 us 9343.194 us > hrtimer_start_range_ns 97555 1253698 us 12.851 us 43166.94 us > irq_to_desc 1082482 1146955 us 1.059 us 546937.6 us > link_path_walk 3486 1134351 us 325.401 us 484443.0 us > __remove_hrtimer 99205 999724.0 us 10.077 us 201933.9 us > clockevents_program_event 141799 985346.0 us 6.948 us 7114.725 us > note_interrupt 1082482 899494.0 us 0.830 us 705141.6 us > update_ts_time_stats.constprop 324760 862996.0 us 2.657 us 21813.81 us > alloc_set_pte 20011 805880.0 us 40.271 us 1086748 us > SyS_read 1940 805061.0 us 414.979 us 164198.7 us > _local_bh_enable 326467 777908.0 us 2.382 us 37202.43 us > vfs_read 2080 761487.0 us 366.099 us 22944777 us > dwc2_hcd_get_frame_number 1097673 721108.0 us 0.656 us 525348.1 us > do_munmap 1194 717161.0 us 600.637 us 65635.31 us > schedule 3297 700578.0 us 212.489 us 28594.59 us > __vfs_read 2080 694376.0 us 333.834 us 21955675 us > walk_component 9912 683211.0 us 68.927 us 349771.7 us > irq_find_mapping 1082482 673162.0 us 0.621 us 900861.5 us > __next_timer_interrupt 324929 667175.0 us 2.053 us 4576.720 us > armctrl_translate_shortcut 1079274 662875.0 us 0.614 us 164464.8 us > filename_lookup 1316 646059.0 us 490.926 us 148884.8 us > mmput 259 644417.0 us 2488.096 us 33155247 us > armctrl_unmask_irq 1082482 639692.0 us 0.590 us 891074.7 us > armctrl_mask_irq 1082482 636505.0 us 0.588 us 888583.1 us > exit_mmap 45 634389.0 us 14097.53 us 24105769 us > irq_state_set_masked 1082482 622182.0 us 0.574 us 894212.6 us > user_path_at_empty 1156 620229.0 us 536.530 us 166865.3 us > SyS_munmap 895 600613.0 us 671.075 us 75763.68 us > vm_mmap_pgoff 1437 598955.0 us 416.809 us 55882.41 us > read_current_timer 1082666 598757.0 us 0.553 us 86817.59 us > irq_may_run 1082482 598471.0 us 0.552 us 898308.6 us > vm_munmap 937 595309.0 us 635.335 us 70914.31 us > path_lookupat 1316 592448.0 us 450.188 us 147624.4 us > SyS_mmap_pgoff 1325 588992.0 us 444.522 us 62142.49 us > irq_state_clr_masked 1082482 582077.0 us 0.537 us 893572.4 us > idle_cpu 1082482 580428.0 us 0.536 us 905609.5 us > do_execve 28 573864.0 us 20495.14 us 98356111 us > do_execveat_common 28 573576.0 us 20484.85 us 98471441 us > dwc2_hc_set_even_odd_frame 651065 570127.0 us 0.875 us 543701.5 us > ledtrig_cpu.part.0 229969 549060.0 us 2.387 us 9142.920 us > do_mmap 1437 548817.0 us 381.918 us 54270.64 us > do_work_pending 2211 525805.0 us 237.813 us 234836.0 us > led_trigger_event 460228 501093.0 us 1.088 us 4759241 us > search_binary_handler 28 501026.0 us 17893.78 us 93445723 us > clocksource_mmio_readl_up 807791 500765.0 us 0.619 us 91050.42 us > load_elf_binary 28 500359.0 us 17869.96 us 93309220 us > process_one_work 558 496994.0 us 890.670 us 7196614 us > tick_do_update_jiffies64 324826 496168.0 us 1.527 us 3422867 us > hrtimer_forward 72095 490887.0 us 6.808 us 4548.034 us > unmap_vmas 1239 490538.0 us 395.914 us 2723415 us > finish_task_switch 3926 479983.0 us 122.257 us 25336.71 us > unmap_single_vma 3403 474812.0 us 139.527 us 959184.6 us > __sync_icache_dcache 27231 468069.0 us 17.188 us 364740.1 us > SyS_execve 16 462210.0 us 28888.12 us 4450186 us > unmap_region 1194 460297.0 us 385.508 us 44372.58 us > unmap_page_range 3403 455941.0 us 133.982 us 956896.1 us > bcm2835_time_interrupt 3208 455584.0 us 142.014 us 56012.72 us > mmap_region 1437 449599.0 us 312.873 us 55175.49 us > hrtimer_interrupt 3208 443908.0 us 138.375 us 55446.73 us > dwc2_hcd_qh_add 632341 436511.0 us 0.690 us 58787.66 us > lookup_fast 11601 436227.0 us 37.602 us 570183.6 us > inode_permission 14699 426528.0 us 29.017 us 404007.8 us > SyS_access 561 412200.0 us 734.759 us 146663.8 us > dwc2_hcd_qh_unlink 632344 410157.0 us 0.648 us 87505.07 us > seq_read 669 408885.0 us 611.188 us 226703.8 us > SyS_faccessat 564 406969.0 us 721.576 us 141955.4 us > dwc2_check_qtd_still_ok 651065 392283.0 us 0.602 us 59460.80 us > SyS_getdents64 392 379673.0 us 968.553 us 13678636 us > SyS_write 404 378782.0 us 937.579 us 254399.0 us > iterate_dir 393 367777.0 us 935.819 us 13659529 us > vfs_write 404 358326.0 us 886.945 us 240956.8 us > __hrtimer_run_queues.constprop 3227 357720.0 us 110.852 us 47783.12 us > task_work_run 1460 350902.0 us 240.343 us 71379.18 us > _cond_resched 109971 344855.0 us 3.135 us 2245087 us > __vfs_write 404 333260.0 us 824.900 us 230347.3 us > ____fput 1680 326232.0 us 194.185 us 40913.99 us > __inode_permission 14699 324102.0 us 22.049 us 328791.7 us > __fput 1680 316285.0 us 188.264 us 40874.09 us > flush_old_exec 28 313336.0 us 11190.57 us 96843591 us > tick_sched_timer 1964 312793.0 us 159.263 us 2002.533 us > do_wp_page 1758 300693.0 us 171.042 us 17667.03 us > _do_fork 29 281388.0 us 9703.034 us 54207832 us > copy_process.part.4 29 273985.0 us 9447.758 us 53888393 us > SyS_clone 17 267170.0 us 15715.88 us 2444683 us > free_pgtables 1239 265593.0 us 214.360 us 500428.0 us > dput 15879 265579.0 us 16.725 us 269685.7 us > SyS_openat 205 258191.0 us 1259.468 us 209643.6 us > pfn_valid 18146 257726.0 us 14.202 us 121416.6 us > update_wall_time 5957 250787.0 us 42.099 us 619.230 us > mmc_blk_issue_rq 63 246608.0 us 3914.412 us 8858815 us > mmc_blk_issue_rw_rq 63 241626.0 us 3835.333 us 8884205 us > vfs_statx 441 232198.0 us 526.526 us 150124.0 us > SyS_sendmsg 316 226698.0 us 717.398 us 164208.5 us > mmc_start_request 145 225615.0 us 1555.965 us 5542154 us > __sys_sendmsg 316 223856.0 us 708.405 us 164907.1 us > wp_page_copy 1061 222841.0 us 210.029 us 13010.76 us > __generic_file_write_iter 206 219092.0 us 1063.553 us 242948.7 us > terminate_walk 3225 217649.0 us 67.488 us 47336.14 us > ext4_file_write_iter 170 216280.0 us 1272.235 us 127738.3 us > __mmc_start_request 145 214090.0 us 1476.482 us 5509235 us > ___sys_sendmsg 316 212732.0 us 673.202 us 161410.8 us > kmem_cache_alloc 17570 210590.0 us 11.985 us 120917.2 us > mmc_start_areq 63 210528.0 us 3341.714 us 9610382 us > sdhci_request 145 209828.0 us 1447.089 us 5547701 us > sock_sendmsg 353 208523.0 us 590.716 us 147666.9 us > __split_vma 981 206458.0 us 210.456 us 13253.43 us > credit_entropy_bits 16914 205434.0 us 12.145 us 208.830 us > hrtimer_get_next_event 324760 199659.0 us 0.614 us 17454.58 us > path_put 6090 196782.0 us 32.312 us 54264.43 us > SyS_mprotect 427 193190.0 us 452.435 us 42168.10 us > SyS_stat64 370 181587.0 us 490.775 us 119707.4 us > nr_iowait_cpu 324760 176767.0 us 0.544 us 17782.33 us > __vma_adjust 1452 174253.0 us 120.008 us 5746.323 us > tick_sched_handle 1964 174004.0 us 88.596 us 419.232 us > sdhci_irq 211 173984.0 us 824.568 us 4212897 us > touch_softlockup_watchdog_sche 323207 173767.0 us 0.537 us 108651.0 us > kernfs_iop_permission 5012 173442.0 us 34.605 us 21830.09 us > __local_bh_enable 329502 172898.0 us 0.524 us 9002.660 us > proc_single_show 233 170938.0 us 733.639 us 235941.1 us > mprotect_fixup 455 163622.0 us 359.608 us 43581.82 us > SyS_recvmsg 515 163337.0 us 317.159 us 23749.85 us > update_process_times 1964 161227.0 us 82.091 us 436.558 us > usb_control_msg 131 160306.0 us 1223.709 us 63779.20 us > __sys_recvmsg 515 159685.0 us 310.067 us 23888.88 us > check_carrier 29 159209.0 us 5489.965 us 157176.1 us > __dentry_kill 1306 159044.0 us 121.779 us 13522.69 us > smsc95xx_mdio_read 29 156605.0 us 5400.172 us 147578.2 us > __smsc95xx_mdio_read 29 156128.0 us 5383.724 us 153872.9 us > ext4_readdir 60 152036.0 us 2533.933 us 67111799 us > run_ksoftirqd 844 151846.0 us 179.912 us 74010.42 us > usb_start_wait_urb 131 151181.0 us 1154.053 us 59721.86 us > find_vma 8806 149418.0 us 16.967 us 419885.5 us > __alloc_pages_nodemask 4365 143647.0 us 32.908 us 22236.43 us > ___sys_recvmsg 515 143567.0 us 278.770 us 23183.05 us > memblock_is_map_memory 18146 142969.0 us 7.878 us 62191.29 us > timekeeping_max_deferment 252165 141177.0 us 0.559 us 3782.924 us > unlink_anon_vmas 3403 137080.0 us 40.282 us 17090.37 us > __vma_link_rb 3403 136104.0 us 39.995 us 12357.79 us > rcu_process_callbacks 1050 135701.0 us 129.239 us 66292.86 us > SyS_close 1977 135041.0 us 68.306 us 4406.697 us > pick_next_task_fair 3996 133622.0 us 33.438 us 1391.499 us > proc_readfd 50 133185.0 us 2663.700 us 18788258 us > proc_readfd_common 50 132911.0 us 2658.220 us 18769391 us > generic_perform_write 206 132888.0 us 645.087 us 134152.0 us > __mix_pool_bytes 16914 128343.0 us 7.587 us 303.814 us > run_timer_softirq 1960 127980.0 us 65.295 us 13337.43 us > timekeeping_update 5956 127128.0 us 21.344 us 147.035 us > try_to_wake_up 2348 125639.0 us 53.508 us 5318.431 us > ktime_add_safe 241886 125624.0 us 0.519 us 6352.310 us > __rcu_process_callbacks 2100 125522.0 us 59.772 us 1422883 us > __wake_up 19318 121496.0 us 6.289 us 54037.07 us > tick_nohz_idle_enter 394 119544.0 us 303.411 us 24204.73 us > proc_fill_cache 402 119333.0 us 296.848 us 32614.19 us > unix_dgram_sendmsg 144 119088.0 us 827.000 us 197140.4 us > enqueue_hrtimer 99205 118756.0 us 1.197 us 1634.105 us > call_usermodehelper_exec_async 12 117852.0 us 9821.000 us 319067.4 us > mutex_lock 14089 117414.0 us 8.333 us 245056.5 us > sock_recvmsg 548 116787.0 us 213.114 us 20249.99 us > __smsc95xx_read_reg 87 116418.0 us 1338.137 us 39000.81 us > usbnet_read_cmd 87 115523.0 us 1327.850 us 37075.87 us > SyS_epoll_wait 336 114271.0 us 340.092 us 47629.77 us > __close_fd 1977 113569.0 us 57.445 us 3969.846 us > split_vma 501 112236.0 us 224.023 us 20703.73 us > vma_link 1248 111097.0 us 89.020 us 5058.706 us > __usbnet_read_cmd 87 111024.0 us 1276.137 us 34288.14 us > schedule_hrtimeout_range 390 107480.0 us 275.589 us 10798.57 us > dentry_unlink_inode 733 107440.0 us 146.575 us 8915.389 us > unix_find_other 172 106799.0 us 620.924 us 79499.41 us > iput 757 106445.0 us 140.614 us 8828.710 us > ext4_htree_fill_tree 36 106438.0 us 2956.611 us 10423331 us > kernfs_fop_read 198 105215.0 us 531.388 us 48887.87 us > htree_dirblock_to_tree 36 104352.0 us 2898.666 us 10008360 us > schedule_hrtimeout_range_clock 390 103931.0 us 266.489 us 10595.17 us > SyS_connect 99 102476.0 us 1035.111 us 160650.8 us > __mark_inode_dirty 641 102069.0 us 159.234 us 35047.69 us > expire_timers 700 100145.0 us 143.064 us 8519.548 us > do_wait 42 99498.00 us 2369.000 us 11701021 us > lookup_slow 413 99137.00 us 240.041 us 40034.02 us > release_task 29 98995.00 us 3413.620 us 12343735 us > SyS_fstat64 1168 98993.00 us 84.754 us 5326.254 us > wait_consider_task 444 98839.00 us 222.610 us 1532328 us > kmem_cache_free 17783 98715.00 us 5.551 us 72804.30 us > > ...
Am 19.04.2017 um 23:47 schrieb Doug Anderson: > Hi, > > On Wed, Apr 19, 2017 at 1:25 PM, Stefan Wahren <stefan.wahren@i2se.com> wrote: >> Hi, >> >>> Doug Anderson <dianders@chromium.org> hat am 18. April 2017 um 22:41 geschrieben: >>> >>> >>> It's hard to know for sure that all of this time is really in >>> urb_enqueue(). Possible we could have task switched out and been >>> blocked elsewhere. Using ftrace to get more fine-grained timings >>> would be useful. ktime_get(), ktime_sub(), and ktime_to_us() are your >>> friends here if you want to use trace_printk. >> i'm a newbie to ftrace, so i hope this would be helpful. >> >> # connect PL2303 to the onboard hub >> # echo 0 > options/sleep-time >> # echo 0 > function_profile_enabled >> # echo 1 > function_profile_enabled >> # ./usb_test >> # Waiting for at least 20 seconds and then disconnect PL2303 >> # echo 0 > function_profile_enabled >> # cat trace_stat/function0 >> >> Function Hit Time Avg s^2 >> -------- --- ---- --- --- >> bcm2835_handle_irq 361347 219567633 us 607.636 us 1485199 us >> __handle_domain_irq 1082482 212639551 us 196.437 us 3642030 us >> generic_handle_irq 1082482 100592051 us 92.927 us 50511334 us >> irq_exit 1082482 98197771 us 90.715 us 29649040 us >> handle_level_irq 1082482 95812379 us 88.511 us 51910093 us >> do_sys_open 1806 87612983 us 48512.17 us 2198507 us >> SyS_open 1601 87372331 us 54573.59 us 1898996 us >> do_filp_open 1862 87368058 us 46921.62 us 1634982 us >> path_openat 1862 87314553 us 46892.88 us 3357817 us >> __do_softirq 3035 86266050 us 28423.73 us 6449768 us >> vfs_open 1515 85877012 us 56684.49 us 101673.5 us >> do_dentry_open 1515 85861429 us 56674.21 us 812420.7 us >> usb_submit_urb 136 85760172 us 630589.5 us 59532024 us >> usb_hcd_submit_urb 134 85756518 us 639974.0 us 726298102 us >> _dwc2_hcd_urb_enqueue 134 85738333 us 639838.3 us 874104371 us > The 134 calls to this are taking a ton of time. It would be nice to > know where all the time actually was in here. Are you on a > single-core device, or multi-core? Raspberry Pi B is a single core device which runs with 700 MHz cpu freq. > Oh, probably this so slow because we're just getting interrupted > constantly. You can see that during your trace 80686112 us was in > handle_irq_event(). Presumably all of that time could have been > counted towards whatever code you were in when the interrupt went off. > Of that 80 seconds, 60 seconds or so was usb_hcd_irq, and of that 57 > seconds was in _dwc2_hcd_irq(). Since the FTDI (full speed device) and the PL2303 (also full speed device) behaves differently, i will try to compare those ftraces. > > Presumably the big part of your problem is that dwc2_handle_hcd_intr() > is just too slow. From glancing at the numbers below, it seems likely > that nothing in particular is slow, it's just running a lot of code > and (in total) all of that is slow. ...but someone with more time on > their hands would need to really debug. > > One thing that would be interesting would be to see if you can > increase the bug clock for talking to the dwc2 controller. Unfortunately the bcm2835 datasheet [1] doesn't provide many information about the USB IP core and the Synopsys documents aren't public available. The Raspberry Pi Foundation decided to use a different driver which uses FIQ in the downstream tree [2], but this won't be an option here. Maybe i could start with comparing the register settings between dwc2 and dwc_otg. [1] - https://www.raspberrypi.org/app/uploads/2012/02/BCM2835-ARM-Peripherals.pdf [2] - https://github.com/raspberrypi/linux/commit/65165df02f2ba9feccebf45d8c732dffb34b6109 > Possibly > some of those memory mapped IOs talking to dwc2 could be making > everything slow? > > Other than that, maybe you can find some way to optimize the code in > dwc2 so it runs faster, at least in the cases you care about... >
Hi, On Thu, Apr 20, 2017 at 12:46 AM, Stefan Wahren <stefan.wahren@i2se.com> wrote: > Am 19.04.2017 um 23:47 schrieb Doug Anderson: >> Hi, >> >> On Wed, Apr 19, 2017 at 1:25 PM, Stefan Wahren <stefan.wahren@i2se.com> wrote: >>> Hi, >>> >>>> Doug Anderson <dianders@chromium.org> hat am 18. April 2017 um 22:41 geschrieben: >>>> >>>> >>>> It's hard to know for sure that all of this time is really in >>>> urb_enqueue(). Possible we could have task switched out and been >>>> blocked elsewhere. Using ftrace to get more fine-grained timings >>>> would be useful. ktime_get(), ktime_sub(), and ktime_to_us() are your >>>> friends here if you want to use trace_printk. >>> i'm a newbie to ftrace, so i hope this would be helpful. >>> >>> # connect PL2303 to the onboard hub >>> # echo 0 > options/sleep-time >>> # echo 0 > function_profile_enabled >>> # echo 1 > function_profile_enabled >>> # ./usb_test >>> # Waiting for at least 20 seconds and then disconnect PL2303 >>> # echo 0 > function_profile_enabled >>> # cat trace_stat/function0 >>> >>> Function Hit Time Avg s^2 >>> -------- --- ---- --- --- >>> bcm2835_handle_irq 361347 219567633 us 607.636 us 1485199 us >>> __handle_domain_irq 1082482 212639551 us 196.437 us 3642030 us >>> generic_handle_irq 1082482 100592051 us 92.927 us 50511334 us >>> irq_exit 1082482 98197771 us 90.715 us 29649040 us >>> handle_level_irq 1082482 95812379 us 88.511 us 51910093 us >>> do_sys_open 1806 87612983 us 48512.17 us 2198507 us >>> SyS_open 1601 87372331 us 54573.59 us 1898996 us >>> do_filp_open 1862 87368058 us 46921.62 us 1634982 us >>> path_openat 1862 87314553 us 46892.88 us 3357817 us >>> __do_softirq 3035 86266050 us 28423.73 us 6449768 us >>> vfs_open 1515 85877012 us 56684.49 us 101673.5 us >>> do_dentry_open 1515 85861429 us 56674.21 us 812420.7 us >>> usb_submit_urb 136 85760172 us 630589.5 us 59532024 us >>> usb_hcd_submit_urb 134 85756518 us 639974.0 us 726298102 us >>> _dwc2_hcd_urb_enqueue 134 85738333 us 639838.3 us 874104371 us >> The 134 calls to this are taking a ton of time. It would be nice to >> know where all the time actually was in here. Are you on a >> single-core device, or multi-core? > > Raspberry Pi B is a single core device which runs with 700 MHz cpu freq. > >> Oh, probably this so slow because we're just getting interrupted >> constantly. You can see that during your trace 80686112 us was in >> handle_irq_event(). Presumably all of that time could have been >> counted towards whatever code you were in when the interrupt went off. >> Of that 80 seconds, 60 seconds or so was usb_hcd_irq, and of that 57 >> seconds was in _dwc2_hcd_irq(). > > Since the FTDI (full speed device) and the PL2303 (also full speed > device) behaves differently, i will try to compare those ftraces. The most interesting to know is what the endpoints looked like for these two devices. Since they are full speed devices, I'd expect that any interrupt end points should fire once per "ms" at most. >> Presumably the big part of your problem is that dwc2_handle_hcd_intr() >> is just too slow. From glancing at the numbers below, it seems likely >> that nothing in particular is slow, it's just running a lot of code >> and (in total) all of that is slow. ...but someone with more time on >> their hands would need to really debug. >> >> One thing that would be interesting would be to see if you can >> increase the bug clock for talking to the dwc2 controller. > > Unfortunately the bcm2835 datasheet [1] doesn't provide many information > about the USB IP core and the Synopsys documents aren't public > available. The Raspberry Pi Foundation decided to use a different driver > which uses FIQ in the downstream tree [2], but this won't be an option here. > > Maybe i could start with comparing the register settings between dwc2 > and dwc_otg. I was suggesting looking at clock speeds related to usb that might be in /sys/kernel/debug/clk/clk_summary ...but at 700 MHz single core (and I think it's also an older ARM core so 700 MHz there might be a lot slower than 700 MHz on newer cores), the theory that you're just running too many instructions is a sane one. Probably someone just needs to see if there's any way to fast path some of the common cases in the IRQ handler. -Doug
Stefan Wahren <stefan.wahren@i2se.com> writes: > Hi, > >> Doug Anderson <dianders@chromium.org> hat am 18. April 2017 um 22:41 geschrieben: >> >> >> It's hard to know for sure that all of this time is really in >> urb_enqueue(). Possible we could have task switched out and been >> blocked elsewhere. Using ftrace to get more fine-grained timings >> would be useful. ktime_get(), ktime_sub(), and ktime_to_us() are your >> friends here if you want to use trace_printk. > > i'm a newbie to ftrace, so i hope this would be helpful. > > # connect PL2303 to the onboard hub > # echo 0 > options/sleep-time > # echo 0 > function_profile_enabled > # echo 1 > function_profile_enabled > # ./usb_test > # Waiting for at least 20 seconds and then disconnect PL2303 > # echo 0 > function_profile_enabled > # cat trace_stat/function0 > > Function Hit Time Avg s^2 > -------- --- ---- --- --- > bcm2835_handle_irq 361347 219567633 us 607.636 us 1485199 us > __handle_domain_irq 1082482 212639551 us 196.437 us 3642030 us > generic_handle_irq 1082482 100592051 us 92.927 us 50511334 us > irq_exit 1082482 98197771 us 90.715 us 29649040 us > handle_level_irq 1082482 95812379 us 88.511 us 51910093 us If I'm reading this output right, we're spending half of our interrupt processing time in irq_exit(), so even if dwc2's interrupt was free (the generic_handle_irq() chain), we'd be eating about half the CPU getting back out of the interrupt handler, right? I don't really know anything about DWC2 or USB, but is there any way we could mitigate the interrupt frequency with this hardware? If nothing else, could we loop reading gintsts until it reads back 0?
Hi, On Thu, Apr 20, 2017 at 11:54 AM, Eric Anholt <eric@anholt.net> wrote: > Stefan Wahren <stefan.wahren@i2se.com> writes: > >> Hi, >> >>> Doug Anderson <dianders@chromium.org> hat am 18. April 2017 um 22:41 geschrieben: >>> >>> >>> It's hard to know for sure that all of this time is really in >>> urb_enqueue(). Possible we could have task switched out and been >>> blocked elsewhere. Using ftrace to get more fine-grained timings >>> would be useful. ktime_get(), ktime_sub(), and ktime_to_us() are your >>> friends here if you want to use trace_printk. >> >> i'm a newbie to ftrace, so i hope this would be helpful. >> >> # connect PL2303 to the onboard hub >> # echo 0 > options/sleep-time >> # echo 0 > function_profile_enabled >> # echo 1 > function_profile_enabled >> # ./usb_test >> # Waiting for at least 20 seconds and then disconnect PL2303 >> # echo 0 > function_profile_enabled >> # cat trace_stat/function0 >> >> Function Hit Time Avg s^2 >> -------- --- ---- --- --- >> bcm2835_handle_irq 361347 219567633 us 607.636 us 1485199 us >> __handle_domain_irq 1082482 212639551 us 196.437 us 3642030 us >> generic_handle_irq 1082482 100592051 us 92.927 us 50511334 us >> irq_exit 1082482 98197771 us 90.715 us 29649040 us >> handle_level_irq 1082482 95812379 us 88.511 us 51910093 us > > If I'm reading this output right, we're spending half of our interrupt > processing time in irq_exit(), so even if dwc2's interrupt was free (the > generic_handle_irq() chain), we'd be eating about half the CPU getting > back out of the interrupt handler, right? > > I don't really know anything about DWC2 or USB, but is there any way we > could mitigate the interrupt frequency with this hardware? If nothing > else, could we loop reading gintsts until it reads back 0? Take ftrace with a little bit of a grain of salt, especially on older / slower ARMs (without the arch timer). Whenever ftrace takes a log it grabs a timestamp. This can be an expensive (ish) operation. Even on newer CPUs it's still not free if you call it as much as ftrace, but on older CPUs it's extra expensive. I spent a chunk of time working on optimizations for that on exynos since it showed up in profiles as an expensive operation (Chrome asks for the time a lot during its internal profiling). Some of that type of data is in commit 3252a646aa2c ("clocksource: exynos_mct: Only use 32-bits where possible").
Doug Anderson <dianders@chromium.org> writes: > Hi, > > On Thu, Apr 20, 2017 at 11:54 AM, Eric Anholt <eric@anholt.net> wrote: >> Stefan Wahren <stefan.wahren@i2se.com> writes: >> >>> Hi, >>> >>>> Doug Anderson <dianders@chromium.org> hat am 18. April 2017 um 22:41 geschrieben: >>>> >>>> >>>> It's hard to know for sure that all of this time is really in >>>> urb_enqueue(). Possible we could have task switched out and been >>>> blocked elsewhere. Using ftrace to get more fine-grained timings >>>> would be useful. ktime_get(), ktime_sub(), and ktime_to_us() are your >>>> friends here if you want to use trace_printk. >>> >>> i'm a newbie to ftrace, so i hope this would be helpful. >>> >>> # connect PL2303 to the onboard hub >>> # echo 0 > options/sleep-time >>> # echo 0 > function_profile_enabled >>> # echo 1 > function_profile_enabled >>> # ./usb_test >>> # Waiting for at least 20 seconds and then disconnect PL2303 >>> # echo 0 > function_profile_enabled >>> # cat trace_stat/function0 >>> >>> Function Hit Time Avg s^2 >>> -------- --- ---- --- --- >>> bcm2835_handle_irq 361347 219567633 us 607.636 us 1485199 us >>> __handle_domain_irq 1082482 212639551 us 196.437 us 3642030 us >>> generic_handle_irq 1082482 100592051 us 92.927 us 50511334 us >>> irq_exit 1082482 98197771 us 90.715 us 29649040 us >>> handle_level_irq 1082482 95812379 us 88.511 us 51910093 us >> >> If I'm reading this output right, we're spending half of our interrupt >> processing time in irq_exit(), so even if dwc2's interrupt was free (the >> generic_handle_irq() chain), we'd be eating about half the CPU getting >> back out of the interrupt handler, right? >> >> I don't really know anything about DWC2 or USB, but is there any way we >> could mitigate the interrupt frequency with this hardware? If nothing >> else, could we loop reading gintsts until it reads back 0? > > Take ftrace with a little bit of a grain of salt, especially on older > / slower ARMs (without the arch timer). Whenever ftrace takes a log > it grabs a timestamp. This can be an expensive (ish) operation. Even > on newer CPUs it's still not free if you call it as much as ftrace, > but on older CPUs it's extra expensive. If per-function timestamp cost was the problem, shouldn't I expect to see a bunch of irq_exit()'s children each taking a bit of time? We have a long callchain with the functions each taking a bit of time in the dwc2 interrupt handler, but irq_exit() seems to be a monolithic cost.
Hi, On Thu, Apr 20, 2017 at 12:57 PM, Eric Anholt <eric@anholt.net> wrote: > Doug Anderson <dianders@chromium.org> writes: > >> Hi, >> >> On Thu, Apr 20, 2017 at 11:54 AM, Eric Anholt <eric@anholt.net> wrote: >>> Stefan Wahren <stefan.wahren@i2se.com> writes: >>> >>>> Hi, >>>> >>>>> Doug Anderson <dianders@chromium.org> hat am 18. April 2017 um 22:41 geschrieben: >>>>> >>>>> >>>>> It's hard to know for sure that all of this time is really in >>>>> urb_enqueue(). Possible we could have task switched out and been >>>>> blocked elsewhere. Using ftrace to get more fine-grained timings >>>>> would be useful. ktime_get(), ktime_sub(), and ktime_to_us() are your >>>>> friends here if you want to use trace_printk. >>>> >>>> i'm a newbie to ftrace, so i hope this would be helpful. >>>> >>>> # connect PL2303 to the onboard hub >>>> # echo 0 > options/sleep-time >>>> # echo 0 > function_profile_enabled >>>> # echo 1 > function_profile_enabled >>>> # ./usb_test >>>> # Waiting for at least 20 seconds and then disconnect PL2303 >>>> # echo 0 > function_profile_enabled >>>> # cat trace_stat/function0 >>>> >>>> Function Hit Time Avg s^2 >>>> -------- --- ---- --- --- >>>> bcm2835_handle_irq 361347 219567633 us 607.636 us 1485199 us >>>> __handle_domain_irq 1082482 212639551 us 196.437 us 3642030 us >>>> generic_handle_irq 1082482 100592051 us 92.927 us 50511334 us >>>> irq_exit 1082482 98197771 us 90.715 us 29649040 us >>>> handle_level_irq 1082482 95812379 us 88.511 us 51910093 us >>> >>> If I'm reading this output right, we're spending half of our interrupt >>> processing time in irq_exit(), so even if dwc2's interrupt was free (the >>> generic_handle_irq() chain), we'd be eating about half the CPU getting >>> back out of the interrupt handler, right? >>> >>> I don't really know anything about DWC2 or USB, but is there any way we >>> could mitigate the interrupt frequency with this hardware? If nothing >>> else, could we loop reading gintsts until it reads back 0? >> >> Take ftrace with a little bit of a grain of salt, especially on older >> / slower ARMs (without the arch timer). Whenever ftrace takes a log >> it grabs a timestamp. This can be an expensive (ish) operation. Even >> on newer CPUs it's still not free if you call it as much as ftrace, >> but on older CPUs it's extra expensive. > > If per-function timestamp cost was the problem, shouldn't I expect to > see a bunch of irq_exit()'s children each taking a bit of time? We have > a long callchain with the functions each taking a bit of time in the > dwc2 interrupt handler, but irq_exit() seems to be a monolithic cost. Maybe. I remember some of the timestamp code being a might bit odd. Specifically there might have been cases where the timestamp code predictable slower in some cases, but that slowness would be blamed on the wrong function. I think this might have to do with some of the the fact that a memory mapped read could block until other outstanding memory mapped operations finished. -Doug
Hi, > Eric Anholt <eric@anholt.net> hat am 20. April 2017 um 20:54 geschrieben: > > > Stefan Wahren <stefan.wahren@i2se.com> writes: > > > Hi, > > > >> Doug Anderson <dianders@chromium.org> hat am 18. April 2017 um 22:41 geschrieben: > >> > >> > >> It's hard to know for sure that all of this time is really in > >> urb_enqueue(). Possible we could have task switched out and been > >> blocked elsewhere. Using ftrace to get more fine-grained timings > >> would be useful. ktime_get(), ktime_sub(), and ktime_to_us() are your > >> friends here if you want to use trace_printk. > > > > i'm a newbie to ftrace, so i hope this would be helpful. > > > > # connect PL2303 to the onboard hub > > # echo 0 > options/sleep-time > > # echo 0 > function_profile_enabled > > # echo 1 > function_profile_enabled > > # ./usb_test > > # Waiting for at least 20 seconds and then disconnect PL2303 > > # echo 0 > function_profile_enabled > > # cat trace_stat/function0 > > > > Function Hit Time Avg s^2 > > -------- --- ---- --- --- > > bcm2835_handle_irq 361347 219567633 us 607.636 us 1485199 us > > __handle_domain_irq 1082482 212639551 us 196.437 us 3642030 us > > generic_handle_irq 1082482 100592051 us 92.927 us 50511334 us > > irq_exit 1082482 98197771 us 90.715 us 29649040 us > > handle_level_irq 1082482 95812379 us 88.511 us 51910093 us > > If I'm reading this output right, we're spending half of our interrupt > processing time in irq_exit(), so even if dwc2's interrupt was free (the > generic_handle_irq() chain), we'd be eating about half the CPU getting > back out of the interrupt handler, right? > > I don't really know anything about DWC2 or USB, but is there any way we > could mitigate the interrupt frequency with this hardware? If nothing > else, could we loop reading gintsts until it reads back 0? first of all i updated my kernel to 4.11rc7 and the issue still occures. Today i had some time to investigate this issue further and i made some interesting observations: 1. The lockup doesn't occure always after starting usb_test. In rare cases i was able to run the program without lockup. 2. In case the lockup occured we are always able to "rescue" the BCM2835 from this state by sending some serial data to the PL2303. 3. I looked again at the logic analyzer traces of the PL2303 and FTDI. After usb_test has been started the time between 2 _dwc2_hcd_irq calls is mostly smaller than 6 us (this can't be correct for a single device doesn't send any data). With other words it seems to me one or more interrupt(s) are never acked. 4. I placed a trace_printk into dwc2_handle_hcd_intr() in order to dump GINTSTS. In normal state we usually reach the value 0x04000029, but not in lockup state. Regards Stefan
diff --git a/drivers/usb/dwc2/core.h b/drivers/usb/dwc2/core.h index 1a7e830..aa490c3 100644 --- a/drivers/usb/dwc2/core.h +++ b/drivers/usb/dwc2/core.h @@ -930,6 +930,9 @@ struct dwc2_hsotg { struct dentry *debug_root; struct debugfs_regset32 *regset; + int gpio_17; + int gpio_18; + /* DWC OTG HW Release versions */ #define DWC2_CORE_REV_2_71a 0x4f54271a #define DWC2_CORE_REV_2_90a 0x4f54290a diff --git a/drivers/usb/dwc2/hcd.c b/drivers/usb/dwc2/hcd.c index a73722e..97271c8 100644 --- a/drivers/usb/dwc2/hcd.c +++ b/drivers/usb/dwc2/hcd.c @@ -45,6 +45,7 @@ #include <linux/platform_device.h> #include <linux/dma-mapping.h> #include <linux/delay.h> +#include <linux/gpio.h> #include <linux/io.h> #include <linux/slab.h> #include <linux/usb.h> @@ -4595,13 +4596,17 @@ static int _dwc2_hcd_urb_enqueue(struct usb_hcd *hcd, struct urb *urb, bool qh_allocated = false; struct dwc2_qtd *qtd; + gpio_set_value(hsotg->gpio_18, 1); + if (dbg_urb(urb)) { dev_vdbg(hsotg->dev, "DWC OTG HCD URB Enqueue\n"); dwc2_dump_urb_info(hcd, urb, "urb_enqueue"); } - if (!ep) + if (!ep) { + gpio_set_value(hsotg->gpio_18, 0); return -EINVAL; + } if (usb_pipetype(urb->pipe) == PIPE_ISOCHRONOUS || usb_pipetype(urb->pipe) == PIPE_INTERRUPT) { @@ -4628,8 +4633,10 @@ static int _dwc2_hcd_urb_enqueue(struct usb_hcd *hcd, struct urb *urb, dwc2_urb = dwc2_hcd_urb_alloc(hsotg, urb->number_of_packets, mem_flags); - if (!dwc2_urb) + if (!dwc2_urb) { + gpio_set_value(hsotg->gpio_18, 0); return -ENOMEM; + } dwc2_hcd_urb_set_pipeinfo(hsotg, dwc2_urb, usb_pipedevice(urb->pipe), usb_pipeendpoint(urb->pipe), ep_type, @@ -4705,6 +4712,8 @@ static int _dwc2_hcd_urb_enqueue(struct usb_hcd *hcd, struct urb *urb, spin_unlock_irqrestore(&hsotg->lock, flags); + gpio_set_value(hsotg->gpio_18, 0); + return 0; fail3: @@ -4732,6 +4741,8 @@ static int _dwc2_hcd_urb_enqueue(struct usb_hcd *hcd, struct urb *urb, fail0: kfree(dwc2_urb); + gpio_set_value(hsotg->gpio_18, 0); + return retval; } @@ -4825,8 +4836,15 @@ static void _dwc2_hcd_endpoint_reset(struct usb_hcd *hcd, static irqreturn_t _dwc2_hcd_irq(struct usb_hcd *hcd) { struct dwc2_hsotg *hsotg = dwc2_hcd_to_hsotg(hcd); + irqreturn_t ret; - return dwc2_handle_hcd_intr(hsotg); + gpio_set_value(hsotg->gpio_17, 1); + + ret = dwc2_handle_hcd_intr(hsotg); + + gpio_set_value(hsotg->gpio_17, 0); + + return ret; } /* diff --git a/drivers/usb/dwc2/platform.c b/drivers/usb/dwc2/platform.c index 9564bc7..e3f8a8a 100644 --- a/drivers/usb/dwc2/platform.c +++ b/drivers/usb/dwc2/platform.c @@ -35,6 +35,7 @@ */ #include <linux/kernel.h> +#include <linux/gpio.h> #include <linux/module.h> #include <linux/slab.h> #include <linux/clk.h> @@ -404,6 +405,16 @@ static int dwc2_driver_probe(struct platform_device *dev) return hsotg->irq; } + hsotg->gpio_17 = 475; + retval = devm_gpio_request_one(hsotg->dev, hsotg->gpio_17, GPIOF_OUT_INIT_LOW, "gpio_17"); + if (retval) + pr_warn("Unable to request GPIO17: %d\n", retval); + + hsotg->gpio_18 = 476; + retval = devm_gpio_request_one(hsotg->dev, hsotg->gpio_18, GPIOF_OUT_INIT_LOW, "gpio_18"); + if (retval) + pr_warn("Unable to request GPIO18: %d\n", retval); + dev_dbg(hsotg->dev, "registering common handler for irq%d\n", hsotg->irq); retval = devm_request_irq(hsotg->dev, hsotg->irq,