Message ID | 20240119094825.26530-1-quic_uaggarwa@quicinc.com (mailing list archive) |
---|---|
State | Accepted |
Commit | 61a348857e869432e6a920ad8ea9132e8d44c316 |
Series | [v3] usb: dwc3: gadget: Fix NULL pointer dereference in dwc3_gadget_suspend |
Dear All,

On 19.01.2024 10:48, Uttkarsh Aggarwal wrote:
> In current scenario if Plug-out and Plug-In performed continuously
> there could be a chance while checking for dwc->gadget_driver in
> dwc3_gadget_suspend, a NULL pointer dereference may occur.
>
> Call Stack:
>
>	CPU1:                           CPU2:
>	gadget_unbind_driver            dwc3_suspend_common
>	dwc3_gadget_stop                dwc3_gadget_suspend
>	                                dwc3_disconnect_gadget
>
> CPU1 basically clears the variable and CPU2 checks the variable.
> Consider CPU1 is running and right before gadget_driver is cleared
> and in parallel CPU2 executes dwc3_gadget_suspend where it finds
> dwc->gadget_driver which is not NULL and resumes execution and then
> CPU1 completes execution. CPU2 executes dwc3_disconnect_gadget where
> it checks dwc->gadget_driver is already NULL because of which the
> NULL pointer dereference occurs.
>
> Cc: <stable@vger.kernel.org>
> Fixes: 9772b47a4c29 ("usb: dwc3: gadget: Fix suspend/resume during device mode")
> Acked-by: Thinh Nguyen <Thinh.Nguyen@synopsys.com>
> Signed-off-by: Uttkarsh Aggarwal <quic_uaggarwa@quicinc.com>

This patch landed some time ago in linux-next as commit 61a348857e86
("usb: dwc3: gadget: Fix NULL pointer dereference in
dwc3_gadget_suspend"). Recently I found that it causes the following
warning when no USB gadget is bound to the DWC3 driver and a system
suspend/resume cycle is performed:

dwc3 12400000.usb: wait for SETUP phase timed out
dwc3 12400000.usb: failed to set STALL on ep0out
------------[ cut here ]------------
WARNING: CPU: 4 PID: 604 at drivers/usb/dwc3/ep0.c:289 dwc3_ep0_out_start+0xc8/0xcc
Modules linked in:
CPU: 4 PID: 604 Comm: rtcwake Not tainted 6.8.0-rc3-next-20240207 #7979
Hardware name: Samsung Exynos (Flattened Device Tree)
 unwind_backtrace from show_stack+0x10/0x14
 show_stack from dump_stack_lvl+0x58/0x70
 dump_stack_lvl from __warn+0x7c/0x1bc
 __warn from warn_slowpath_fmt+0x1a0/0x1a8
 warn_slowpath_fmt from dwc3_ep0_out_start+0xc8/0xcc
 dwc3_ep0_out_start from dwc3_gadget_soft_disconnect+0x16c/0x230
 dwc3_gadget_soft_disconnect from dwc3_gadget_suspend+0xc/0x90
 dwc3_gadget_suspend from dwc3_suspend_common+0x44/0x30c
 dwc3_suspend_common from dwc3_suspend+0x14/0x2c
 dwc3_suspend from dpm_run_callback+0x94/0x288
 dpm_run_callback from device_suspend+0x130/0x6d0
 device_suspend from dpm_suspend+0x124/0x35c
 dpm_suspend from dpm_suspend_start+0x64/0x6c
 dpm_suspend_start from suspend_devices_and_enter+0x134/0xbd8
 suspend_devices_and_enter from pm_suspend+0x2ec/0x380
 pm_suspend from state_store+0x68/0xc8
 state_store from kernfs_fop_write_iter+0x110/0x1d4
 kernfs_fop_write_iter from vfs_write+0x2e8/0x430
 vfs_write from ksys_write+0x5c/0xd4
 ksys_write from ret_fast_syscall+0x0/0x1c
Exception stack(0xf1421fa8 to 0xf1421ff0)
...
irq event stamp: 14304
hardirqs last enabled at (14303): [<c01a599c>] console_unlock+0x108/0x114
hardirqs last disabled at (14304): [<c0c229d8>] _raw_spin_lock_irqsave+0x64/0x68
softirqs last enabled at (13030): [<c010163c>] __do_softirq+0x318/0x4f4
softirqs last disabled at (13025): [<c012dd40>] __irq_exit_rcu+0x130/0x184
---[ end trace 0000000000000000 ]---

IMHO dwc3_gadget_soft_disconnect() requires some kind of a check if
dwc->gadget_driver is present or not, as it really makes no sense to do
any ep0 related operations if there is no gadget driver at all.

> ---
>
> changes in v3:
> Corrected fixes tag and typo mistake in commit message dw3_gadget_stop -> dwc3_gadget_stop.
>
> Link to v2:
> https://lore.kernel.org/linux-usb/CAKzKK0r8RUqgXy1o5dndU21KuTKtyZ5rn5Fb9sZqTPZqAjT_9A@mail.gmail.com/T/#t
>
> Changes in v2:
> Added cc and fixes tag missing in v1.
>
> Link to v1:
> https://lore.kernel.org/linux-usb/20240110095532.4776-1-quic_uaggarwa@quicinc.com/T/#u
>
>  drivers/usb/dwc3/gadget.c | 6 ++----
>  1 file changed, 2 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
> index 019368f8e9c4..564976b3e2b9 100644
> --- a/drivers/usb/dwc3/gadget.c
> +++ b/drivers/usb/dwc3/gadget.c
> @@ -4709,15 +4709,13 @@ int dwc3_gadget_suspend(struct dwc3 *dwc)
>  	unsigned long flags;
>  	int ret;
>
> -	if (!dwc->gadget_driver)
> -		return 0;
> -
>  	ret = dwc3_gadget_soft_disconnect(dwc);
>  	if (ret)
>  		goto err;
>
>  	spin_lock_irqsave(&dwc->lock, flags);
> -	dwc3_disconnect_gadget(dwc);
> +	if (dwc->gadget_driver)
> +		dwc3_disconnect_gadget(dwc);
>  	spin_unlock_irqrestore(&dwc->lock, flags);
>
>  	return 0;

Best regards
On Wed, Feb 07, 2024, Marek Szyprowski wrote:
> Dear All,
>
> On 19.01.2024 10:48, Uttkarsh Aggarwal wrote:
> > In current scenario if Plug-out and Plug-In performed continuously
> > there could be a chance while checking for dwc->gadget_driver in
> > dwc3_gadget_suspend, a NULL pointer dereference may occur.
> >
> > [...]
>
> This patch landed some time ago in linux-next as commit 61a348857e86
> ("usb: dwc3: gadget: Fix NULL pointer dereference in
> dwc3_gadget_suspend"). Recently I found that it causes the following
> warning when no USB gadget is bound to the DWC3 driver and a system
> suspend/resume cycle is performed:
>
> dwc3 12400000.usb: wait for SETUP phase timed out
> dwc3 12400000.usb: failed to set STALL on ep0out
> ------------[ cut here ]------------
> WARNING: CPU: 4 PID: 604 at drivers/usb/dwc3/ep0.c:289
> dwc3_ep0_out_start+0xc8/0xcc
>
> [...]
>
> IMHO dwc3_gadget_soft_disconnect() requires some kind of a check if
> dwc->gadget_driver is present or not, as it really makes no sense to do

I don't think checking that is sufficient, and I don't think that's the
case here.

> any ep0 related operations if there is no gadget driver at all.
>

If there's indeed no gadget_driver present, then we wouldn't get this
stack trace (i.e. dwc3_ep0_out_start should occur when gadget_driver is
present). This is a race that happened between binding and suspend.

I think something like this should be sufficient. Would you mind giving
it a try?

diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
index 564976b3e2b9..1990d6371066 100644
--- a/drivers/usb/dwc3/gadget.c
+++ b/drivers/usb/dwc3/gadget.c
@@ -2656,6 +2656,11 @@ static int dwc3_gadget_soft_disconnect(struct dwc3 *dwc)
 	int ret;

 	spin_lock_irqsave(&dwc->lock, flags);
+	if (!dwc->pullups_connected) {
+		spin_unlock_irqrestore(&dwc->lock, flags);
+		return 0;
+	}
+
 	dwc->connected = false;

 	/*

Thanks,
Thinh
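The guard proposed in this reply can be pictured with a small user-space model. This is only a sketch under stated assumptions: `struct pullup_model`, `model_soft_disconnect()` and the `ep0_operations` counter are hypothetical stand-ins invented for illustration, not the dwc3 driver's types or functions, and the locking around the check is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the few struct dwc3 fields discussed here. */
struct pullup_model {
	bool pullups_connected;	/* true once the controller has been started */
	bool connected;
	int ep0_operations;	/* counts ep0 work done during disconnect */
};

/*
 * Models the proposed early return: when the pull-ups were never
 * enabled there is nothing to disconnect, so no ep0 work is attempted.
 * The patch performs this check under dwc->lock; the model omits locking.
 */
static int model_soft_disconnect(struct pullup_model *dwc)
{
	assert(dwc != NULL);

	if (!dwc->pullups_connected)
		return 0;

	dwc->connected = false;
	dwc->ep0_operations++;	/* placeholder for the ep0 wait/stall/restart path */
	return 0;
}
```

Calling `model_soft_disconnect()` on a zero-initialized `struct pullup_model` leaves `ep0_operations` at 0, which corresponds to the "no gadget bound" suspend case in the report.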
On 08.02.2024 23:54, Thinh Nguyen wrote:
> On Wed, Feb 07, 2024, Marek Szyprowski wrote:
>> On 19.01.2024 10:48, Uttkarsh Aggarwal wrote:
>>> [...]
>> This patch landed some time ago in linux-next as commit 61a348857e86
>> ("usb: dwc3: gadget: Fix NULL pointer dereference in
>> dwc3_gadget_suspend"). Recently I found that it causes the following
>> warning when no USB gadget is bound to the DWC3 driver and a system
>> suspend/resume cycle is performed:
>>
>> [...]
>>
>> IMHO dwc3_gadget_soft_disconnect() requires some kind of a check if
>> dwc->gadget_driver is present or not, as it really makes no sense to do
> I don't think checking that is sufficient, and I don't think that's the
> case here.
>
>> any ep0 related operations if there is no gadget driver at all.
>>
> If there's indeed no gadget_driver present, then we wouldn't get this
> stack trace (i.e. dwc3_ep0_out_start should occur when gadget_driver is
> present). This is a race that happened between binding and suspend.

I have no gadget compiled into the kernel and none created via
configfs, so how can this be caused by a race?

> I think something like this should be sufficient. Would you mind giving
> it a try?
>
> diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
> index 564976b3e2b9..1990d6371066 100644
> --- a/drivers/usb/dwc3/gadget.c
> +++ b/drivers/usb/dwc3/gadget.c
> @@ -2656,6 +2656,11 @@ static int dwc3_gadget_soft_disconnect(struct dwc3 *dwc)
>  	int ret;
>
>  	spin_lock_irqsave(&dwc->lock, flags);
> +	if (!dwc->pullups_connected) {
> +		spin_unlock_irqrestore(&dwc->lock, flags);
> +		return 0;
> +	}
> +
>  	dwc->connected = false;
>
>  	/*

This patch fixes the reported issue. Feel free to add:

Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>

Best regards
Sorry for the late reply.

On Fri, Feb 09, 2024, Marek Szyprowski wrote:
> On 08.02.2024 23:54, Thinh Nguyen wrote:
> > On Wed, Feb 07, 2024, Marek Szyprowski wrote:
> >> [...]
> >> IMHO dwc3_gadget_soft_disconnect() requires some kind of a check if
> >> dwc->gadget_driver is present or not, as it really makes no sense to do
> > I don't think checking that is sufficient, and I don't think that's the
> > case here.
> >
> >> any ep0 related operations if there is no gadget driver at all.
> >>
> > If there's indeed no gadget_driver present, then we wouldn't get this
> > stack trace (i.e. dwc3_ep0_out_start should occur when gadget_driver is
> > present). This is a race that happened between binding and suspend.
>
> I have no gadget compiled into the kernel and none created via
> configfs, so how can this be caused by a race?

Ah... In that case, we got through the incomplete/wrong check in
dwc3_gadget_soft_disconnect():

	if (dwc->ep0state != EP0_SETUP_PHASE)

Since there's no gadget driver, the controller never started and the
ep0state defaulted to EP0_UNCONNECTED, which explains why it got into
the timeout condition above and incorrectly attempted to start the
control transfer.

> > I think something like this should be sufficient. Would you mind giving
> > it a try?
> >
> > diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
> > index 564976b3e2b9..1990d6371066 100644
> > --- a/drivers/usb/dwc3/gadget.c
> > +++ b/drivers/usb/dwc3/gadget.c
> > @@ -2656,6 +2656,11 @@ static int dwc3_gadget_soft_disconnect(struct dwc3 *dwc)
> >  	int ret;
> >
> >  	spin_lock_irqsave(&dwc->lock, flags);
> > +	if (!dwc->pullups_connected) {
> > +		spin_unlock_irqrestore(&dwc->lock, flags);
> > +		return 0;
> > +	}
> > +
> >  	dwc->connected = false;
> >
> >  	/*
>
> This patch fixes the reported issue. Feel free to add:
>
> Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
>

Thanks for the report and Tested-by! I'll send a fix patch soon.

BR,
Thinh
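To illustrate the ep0state explanation in this reply, here is a minimal model of the check. The enum and the helper functions are hypothetical names made up for this sketch; only the condition `ep0state != EP0_SETUP_PHASE` and the `pullups_connected` guard come from the thread. It assumes, as the reply states, that a never-started controller is left in the unconnected state, modeled here as the zero value.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirror of the two ep0 states named in the thread. */
enum model_ep0_state {
	MODEL_EP0_UNCONNECTED = 0,	/* assumed default for a never-started controller */
	MODEL_EP0_SETUP_PHASE,
};

/* The check as quoted: wait whenever ep0 is not in the SETUP phase. */
static bool waits_without_guard(enum model_ep0_state state)
{
	return state != MODEL_EP0_SETUP_PHASE;
}

/* With the proposed guard, a controller whose pull-ups are off never waits. */
static bool waits_with_guard(bool pullups_connected, enum model_ep0_state state)
{
	if (!pullups_connected)
		return false;
	return waits_without_guard(state);
}

/* Usage: the "no gadget driver bound" case from the report. */
static void check_unbound_case(void)
{
	/* Unguarded, the default state enters the wait path (and would time out). */
	assert(waits_without_guard(MODEL_EP0_UNCONNECTED));
	/* Guarded, the wait path is skipped entirely. */
	assert(!waits_with_guard(false, MODEL_EP0_UNCONNECTED));
}
```

The model only captures which branch is taken; the timeout, the STALL attempt and the restart of ep0 that follow in the driver are not reproduced.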
diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
index 019368f8e9c4..564976b3e2b9 100644
--- a/drivers/usb/dwc3/gadget.c
+++ b/drivers/usb/dwc3/gadget.c
@@ -4709,15 +4709,13 @@ int dwc3_gadget_suspend(struct dwc3 *dwc)
 	unsigned long flags;
 	int ret;

-	if (!dwc->gadget_driver)
-		return 0;
-
 	ret = dwc3_gadget_soft_disconnect(dwc);
 	if (ret)
 		goto err;

 	spin_lock_irqsave(&dwc->lock, flags);
-	dwc3_disconnect_gadget(dwc);
+	if (dwc->gadget_driver)
+		dwc3_disconnect_gadget(dwc);
 	spin_unlock_irqrestore(&dwc->lock, flags);

 	return 0;
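The idea behind this hunk, testing the pointer inside the same critical section that uses it, can be sketched in user space as follows. Everything here is an assumption made for illustration: `struct suspend_model`, `struct driver_model` and the two functions are hypothetical, a pthread mutex stands in for `dwc->lock`, and the unbind path is assumed to clear the pointer while holding that lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for the bound gadget driver. */
struct driver_model {
	int disconnect_calls;
};

/* Hypothetical stand-in for struct dwc3. */
struct suspend_model {
	pthread_mutex_t lock;			/* plays the role of dwc->lock */
	struct driver_model *gadget_driver;
};

/* Models the unbind path (CPU1): assumed to clear the pointer under the lock. */
static void model_gadget_stop(struct suspend_model *dwc)
{
	pthread_mutex_lock(&dwc->lock);
	dwc->gadget_driver = NULL;
	pthread_mutex_unlock(&dwc->lock);
}

/*
 * Models the suspend path (CPU2) after the patch: the NULL test and the
 * use of the pointer sit in one critical section, so the unbind path
 * cannot clear it in between. Returns 1 if disconnect was invoked.
 */
static int model_gadget_suspend(struct suspend_model *dwc)
{
	int called = 0;

	assert(dwc != NULL);

	pthread_mutex_lock(&dwc->lock);
	if (dwc->gadget_driver) {
		dwc->gadget_driver->disconnect_calls++;
		called = 1;
	}
	pthread_mutex_unlock(&dwc->lock);

	return called;
}
```

Before the patch, the equivalent of the `if` test ran outside the lock, which left a window between the test and the use. The model is deliberately simplified and does not reproduce how the driver invokes the gadget driver's callbacks.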