[4.19.y,0/9] Fix scheduling while atomic in dwc3_gadget_ep_dequeue

Message ID 20190627205240.38366-1-john.stultz@linaro.org (mailing list archive)

Message

John Stultz June 27, 2019, 8:52 p.m. UTC
With recent changes in AOSP, adb now uses asynchronous I/O, which
causes the following crash, usually on reboot:

[  184.278302] BUG: scheduling while atomic: ksoftirqd/0/9/0x00000104
[  184.284617] Modules linked in: wl18xx wlcore snd_soc_hdmi_codec wlcore_sdio tcpci_rt1711h tcpci tcpm typec adv7511 cec dwc3 phy_hi3660_usb3 snd_soc_simple_card snd_soc_a
[  184.316034] Preemption disabled at:
[  184.316072] [<ffffff8008081de4>] __do_softirq+0x64/0x398
[  184.324953] CPU: 0 PID: 9 Comm: ksoftirqd/0 Tainted: G S                4.19.43-00669-g8e4970572c43-dirty #356
[  184.334963] Hardware name: HiKey960 (DT)
[  184.338892] Call trace:
[  184.341352]  dump_backtrace+0x0/0x158
[  184.345025]  show_stack+0x14/0x20
[  184.348355]  dump_stack+0x80/0xa4
[  184.351685]  __schedule_bug+0x6c/0xc0
[  184.355363]  __schedule+0x64c/0x978
[  184.358863]  schedule+0x2c/0x90
[  184.362053]  dwc3_gadget_ep_dequeue+0x274/0x388 [dwc3]
[  184.367210]  usb_ep_dequeue+0x24/0xf8
[  184.370884]  ffs_aio_cancel+0x3c/0x80
[  184.374561]  free_ioctx_users+0x40/0x148
[  184.378500]  percpu_ref_switch_to_atomic_rcu+0x180/0x1c0
[  184.383830]  rcu_process_callbacks+0x24c/0x5d8
[  184.388283]  __do_softirq+0x13c/0x398
[  184.391959]  run_ksoftirqd+0x3c/0x48
[  184.395549]  smpboot_thread_fn+0x220/0x288
[  184.399660]  kthread+0x12c/0x130
[  184.402901]  ret_from_fork+0x10/0x1c


This happens because usb_ep_dequeue() can be called in atomic
context (here, from a softirq), and dwc3_gadget_ep_dequeue() then
calls wait_event_lock_irq(), which can sleep.
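
For readers decoding the BUG line: the last field, 0x00000104, is the preempt_count value at the time of the failed schedule. The sketch below splits it into its fields, assuming the bit layout from include/linux/preempt.h in 4.19-era kernels (8 preempt bits, 8 softirq bits, 4 hardirq bits, 1 NMI bit). The struct and function names are made up for this illustration and are not kernel APIs.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type; the kernel keeps these fields packed in one int. */
struct preempt_fields {
	unsigned int preempt;  /* bits 0-7: preempt_disable() nesting depth */
	unsigned int softirq;  /* bits 8-15: softirq count */
	unsigned int hardirq;  /* bits 16-19: hardirq nesting */
	unsigned int nmi;      /* bit 20: NMI flag */
};

/* Split a raw preempt_count value using the assumed 4.19 layout. */
static struct preempt_fields decode_preempt_count(uint32_t pc)
{
	struct preempt_fields f;

	f.preempt = pc & 0xffu;
	f.softirq = (pc >> 8) & 0xffu;
	f.hardirq = (pc >> 16) & 0xfu;
	f.nmi = (pc >> 20) & 0x1u;
	return f;
}

/* Decode the value from the trace above. */
static void decode_example(void)
{
	struct preempt_fields f = decode_preempt_count(0x00000104u);

	assert(f.preempt == 4);  /* preemption disabled, nested four deep */
	assert(f.softirq == 1);  /* bit 8 set: a softirq is being served */
	assert(f.hardirq == 0);
	assert(f.nmi == 0);
}
```

A non-zero softirq field agrees with the call trace, where the dequeue is reached from rcu_process_callbacks() under __do_softirq().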

Upstream kernels are not affected due to the change
fec9095bdef4 ("usb: dwc3: gadget: remove wait_end_transfer"), which
removes the wait_event_lock_irq() code. Unfortunately that change
has a number of dependencies, which I'm submitting here.
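
As a rough illustration of the idea suggested by the patch titles in this series (a cancelled_list instead of waiting for the End Transfer command), here is a minimal userspace sketch. It is not the dwc3 code: every type and function name below is hypothetical, locking is omitted, and the hardware command is reduced to a flag. It only shows the shape of a dequeue that never sleeps, with the give-back deferred to the command-completion handler.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; not the real dwc3 structures. */
enum model_req_state {
	MODEL_REQ_STARTED,
	MODEL_REQ_CANCELLED,
	MODEL_REQ_GIVEN_BACK,
};

struct model_req {
	enum model_req_state state;
	int status;              /* 0 while in flight, negative errno once given back */
	struct model_req *next;  /* link on the endpoint's cancelled list */
};

struct model_ep {
	struct model_req *cancelled;  /* requests waiting for the stop to finish */
	int end_transfer_pending;     /* non-zero once a stop has been requested */
};

/*
 * Dequeue without sleeping: record that the transfer must be stopped, park
 * the request on the cancelled list, and return immediately.
 */
static void model_ep_dequeue(struct model_ep *ep, struct model_req *req)
{
	assert(req->state == MODEL_REQ_STARTED);
	ep->end_transfer_pending = 1;
	req->state = MODEL_REQ_CANCELLED;
	req->next = ep->cancelled;
	ep->cancelled = req;
}

/*
 * Models the handler that runs when the controller reports the stop command
 * as complete: give back every cancelled request and return how many.
 */
static int model_end_transfer_complete(struct model_ep *ep)
{
	int given_back = 0;

	ep->end_transfer_pending = 0;
	while (ep->cancelled) {
		struct model_req *req = ep->cancelled;

		ep->cancelled = req->next;
		req->next = NULL;
		req->state = MODEL_REQ_GIVEN_BACK;
		req->status = -ECONNRESET;
		given_back++;
	}
	return given_back;
}
```

In this model, a caller in atomic context can invoke model_ep_dequeue() safely because nothing in it waits; the requests only reach their final state (status -ECONNRESET) when model_end_transfer_complete() runs later.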

Also, to match upstream, this series reverts one change that was
backported to -stable and replaces it with the cherry-picked
upstream commit (as its dependencies are now present).

This issue also affects 4.14, 4.9 and, I believe, 4.4 kernels;
however, I don't know how best to backport this functionality
that far back. Help from the maintainers would be very much
appreciated!

Feedback and comments would be welcome!

thanks
-john

Cc: Fei Yang <fei.yang@intel.com>
Cc: Sam Protsenko <semen.protsenko@linaro.org>
Cc: Felipe Balbi <balbi@kernel.org>
Cc: linux-usb@vger.kernel.org
Cc: stable@vger.kernel.org # 4.19.y

Felipe Balbi (7):
  usb: dwc3: gadget: combine unaligned and zero flags
  usb: dwc3: gadget: track number of TRBs per request
  usb: dwc3: gadget: use num_trbs when skipping TRBs on ->dequeue()
  usb: dwc3: gadget: extract dwc3_gadget_ep_skip_trbs()
  usb: dwc3: gadget: introduce cancelled_list
  usb: dwc3: gadget: move requests to cancelled_list
  usb: dwc3: gadget: remove wait_end_transfer

Jack Pham (1):
  usb: dwc3: gadget: Clear req->needs_extra_trb flag on cleanup

John Stultz (1):
  Revert "usb: dwc3: gadget: Clear req->needs_extra_trb flag on cleanup"

 drivers/usb/dwc3/core.h   |  15 ++--
 drivers/usb/dwc3/gadget.c | 158 +++++++++++++-------------------------
 drivers/usb/dwc3/gadget.h |  15 ++++
 3 files changed, 75 insertions(+), 113 deletions(-)

Comments

Gopal, Saranya June 28, 2019, 10:10 a.m. UTC | #1
> With recent changes in AOSP, adb is using asynchronous io, which
> causes the following crash usually on a reboot:
> 
> [  184.278302] BUG: scheduling while atomic: ksoftirqd/0/9/0x00000104
> [  184.284617] Modules linked in: wl18xx wlcore snd_soc_hdmi_codec wlcore_sdio tcpci_rt1711h tcpci tcpm typec adv7511 cec dwc3 phy_hi3660_usb3 snd_soc_simple_card snd_soc_a
> [  184.316034] Preemption disabled at:
> [  184.316072] [<ffffff8008081de4>] __do_softirq+0x64/0x398
> [  184.324953] CPU: 0 PID: 9 Comm: ksoftirqd/0 Tainted: G S                4.19.43-00669-g8e4970572c43-dirty #356
> [  184.334963] Hardware name: HiKey960 (DT)
> [  184.338892] Call trace:
> [  184.341352]  dump_backtrace+0x0/0x158
> [  184.345025]  show_stack+0x14/0x20
> [  184.348355]  dump_stack+0x80/0xa4
> [  184.351685]  __schedule_bug+0x6c/0xc0
> [  184.355363]  __schedule+0x64c/0x978
> [  184.358863]  schedule+0x2c/0x90
> [  184.362053]  dwc3_gadget_ep_dequeue+0x274/0x388 [dwc3]


> This happens as usb_ep_dequeue can be called in interrupt
> context, and dwc3_gadget_ep_dequeue() then calls
> wait_event_lock_irq() which can sleep.
> 
> Upstream kernels are not affected due to the change
> fec9095bdef4 ("dwc3: gadget: remove wait_end_transfer") which
> removes the wait_event_lock_irq code. Unfortunately that change
> has a number of dependencies, which I'm submitting here.
> 
> Also, to match upstream, in this series I've reverted one
> change that was backported to -stable, to replace it with the
> cherry-picked upstream commit (as the dependencies are now
> there)
> 
> This issue also affects 4.14,4.9 and I believe 4.4 kernels,
> however I don't know how to best backport this functionality
> that far back. Help from the maintainers would be very much
> appreciated!
> 
> Feedback and comments would be welcome!
> 
> thanks
> -john

I confirm that this patch series fixes the crash seen on reboot.
Considering that many Android platforms use the 4.19 stable kernel with the latest AOSP codebase, it would be really helpful if these patches were merged into 4.19 stable.

Thanks,
Saranya

John Stultz June 28, 2019, 6:14 p.m. UTC | #2
On Fri, Jun 28, 2019 at 3:10 AM Gopal, Saranya <saranya.gopal@intel.com> wrote:
>
> > With recent changes in AOSP, adb is using asynchronous io, which
> > causes the following crash usually on a reboot:
> >
> > [  184.278302] BUG: scheduling while atomic: ksoftirqd/0/9/0x00000104
> > [  184.284617] Modules linked in: wl18xx wlcore snd_soc_hdmi_codec wlcore_sdio tcpci_rt1711h tcpci tcpm typec adv7511 cec dwc3 phy_hi3660_usb3 snd_soc_simple_card snd_soc_a
> > [  184.316034] Preemption disabled at:
> > [  184.316072] [<ffffff8008081de4>] __do_softirq+0x64/0x398
> > [  184.324953] CPU: 0 PID: 9 Comm: ksoftirqd/0 Tainted: G S                4.19.43-00669-g8e4970572c43-dirty #356
> > [  184.334963] Hardware name: HiKey960 (DT)
> > [  184.338892] Call trace:
> > [  184.341352]  dump_backtrace+0x0/0x158
> > [  184.345025]  show_stack+0x14/0x20
> > [  184.348355]  dump_stack+0x80/0xa4
> > [  184.351685]  __schedule_bug+0x6c/0xc0
> > [  184.355363]  __schedule+0x64c/0x978
> > [  184.358863]  schedule+0x2c/0x90
> > [  184.362053]  dwc3_gadget_ep_dequeue+0x274/0x388 [dwc3]
>
>
> > This happens as usb_ep_dequeue can be called in interrupt
> > context, and dwc3_gadget_ep_dequeue() then calls
> > wait_event_lock_irq() which can sleep.
> >
> > Upstream kernels are not affected due to the change
> > fec9095bdef4 ("dwc3: gadget: remove wait_end_transfer") which
> > removes the wait_event_lock_irq code. Unfortunately that change
> > has a number of dependencies, which I'm submitting here.
> >
> > Also, to match upstream, in this series I've reverted one
> > change that was backported to -stable, to replace it with the
> > cherry-picked upstream commit (as the dependencies are now
> > there)
> >
> > This issue also affects 4.14,4.9 and I believe 4.4 kernels,
> > however I don't know how to best backport this functionality
> > that far back. Help from the maintainers would be very much
> > appreciated!
> >
> > Feedback and comments would be welcome!
> >
> > thanks
> > -john
>
> I confirm that this patch series fixes crash seen on reboot.
> Considering that many Android platforms use 4.19 stable kernel with latest AOSP codebase, it would be really helpful if these patches are merged to 4.19 stable.
>

Thanks so much for the testing! Do let me know if you come across any
ideas on how to cleanly resolve this for 4.14/4.9/4.4!

thanks
-john