diff mbox series

[02/11] drivers: usb: host: Fix deadlock in oxu_bus_suspend()

Message ID 8b1201dc7554a2ab3ca555a2b6e2747761603d19.1649310812.git.duoming@zju.edu.cn (mailing list archive)
State Not Applicable
Headers show
Series Fix deadlocks caused by del_timer_sync() | expand

Commit Message

Duoming Zhou April 7, 2022, 6:33 a.m. UTC
There is a deadlock in oxu_bus_suspend(), which is shown below:

   (Thread 1)              |      (Thread 2)
                           | timer_action()
oxu_bus_suspend()          |  mod_timer()
 spin_lock_irq() //(1)     |  (wait a time)
 ...                       | oxu_watchdog()
 del_timer_sync()          |  spin_lock_irq() //(2)
 (wait timer to stop)      |  ...

We hold oxu->lock in position (1) of thread 1, and use
del_timer_sync() to wait timer to stop, but timer handler
also need oxu->lock in position (2) of thread 2. As a result,
oxu_bus_suspend() will block forever.

This patch extracts del_timer_sync() from the protection of
spin_lock_irq(), which could let timer handler to obtain
the needed lock.

Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
---
 drivers/usb/host/oxu210hp-hcd.c | 2 ++
 1 file changed, 2 insertions(+)

Comments

Oliver Neukum April 7, 2022, 8:01 a.m. UTC | #1
On 07.04.22 08:33, Duoming Zhou wrote:
> There is a deadlock in oxu_bus_suspend(), which is shown below:
>
>    (Thread 1)              |      (Thread 2)
>                            | timer_action()
> oxu_bus_suspend()          |  mod_timer()
>  spin_lock_irq() //(1)     |  (wait a time)
>  ...                       | oxu_watchdog()
>  del_timer_sync()          |  spin_lock_irq() //(2)
>  (wait timer to stop)      |  ...
>
> We hold oxu->lock in position (1) of thread 1, and use
> del_timer_sync() to wait timer to stop, but timer handler
> also need oxu->lock in position (2) of thread 2. As a result,
> oxu_bus_suspend() will block forever.
>
> This patch extracts del_timer_sync() from the protection of
> spin_lock_irq(), which could let timer handler to obtain
> the needed lock.
Good catch.
> Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
> ---
>  drivers/usb/host/oxu210hp-hcd.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/drivers/usb/host/oxu210hp-hcd.c b/drivers/usb/host/oxu210hp-hcd.c
> index b741670525e..ee403df3309 100644
> --- a/drivers/usb/host/oxu210hp-hcd.c
> +++ b/drivers/usb/host/oxu210hp-hcd.c
> @@ -3909,8 +3909,10 @@ static int oxu_bus_suspend(struct usb_hcd *hcd)
>  		}
>  	}
>  
> +	spin_unlock_irq(&oxu->lock);
>  	/* turn off now-idle HC */
>  	del_timer_sync(&oxu->watchdog);
> +	spin_lock_irq(&oxu->lock);
>  	ehci_halt(oxu);
>  	hcd->state = HC_STATE_SUSPENDED;
>  

What is the lock protecting at that stage? Why not just drop it earlier.

    Regards
        Oliver
Duoming Zhou April 7, 2022, 12:02 p.m. UTC | #2
Hello,

On Thu, 7 Apr 2022 10:01:43 +0200 Oliver Neukum wrote:

> On 07.04.22 08:33, Duoming Zhou wrote:
> > There is a deadlock in oxu_bus_suspend(), which is shown below:
> >
> >    (Thread 1)              |      (Thread 2)
> >                            | timer_action()
> > oxu_bus_suspend()          |  mod_timer()
> >  spin_lock_irq() //(1)     |  (wait a time)
> >  ...                       | oxu_watchdog()
> >  del_timer_sync()          |  spin_lock_irq() //(2)
> >  (wait timer to stop)      |  ...
> >
> > We hold oxu->lock in position (1) of thread 1, and use
> > del_timer_sync() to wait timer to stop, but timer handler
> > also need oxu->lock in position (2) of thread 2. As a result,
> > oxu_bus_suspend() will block forever.
> >
> > This patch extracts del_timer_sync() from the protection of
> > spin_lock_irq(), which could let timer handler to obtain
> > the needed lock.
> Good catch.
> > Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
> > ---
> >  drivers/usb/host/oxu210hp-hcd.c | 2 ++
> >  1 file changed, 2 insertions(+)
> >
> > diff --git a/drivers/usb/host/oxu210hp-hcd.c b/drivers/usb/host/oxu210hp-hcd.c
> > index b741670525e..ee403df3309 100644
> > --- a/drivers/usb/host/oxu210hp-hcd.c
> > +++ b/drivers/usb/host/oxu210hp-hcd.c
> > @@ -3909,8 +3909,10 @@ static int oxu_bus_suspend(struct usb_hcd *hcd)
> >  		}
> >  	}
> >  
> > +	spin_unlock_irq(&oxu->lock);
> >  	/* turn off now-idle HC */
> >  	del_timer_sync(&oxu->watchdog);
> > +	spin_lock_irq(&oxu->lock);
> >  	ehci_halt(oxu);
> >  	hcd->state = HC_STATE_SUSPENDED;
> >  
> 
> What is the lock protecting at that stage? Why not just drop it earlier.

I think there is a race condition between oxu_bus_suspend() and oxu_stop(),
so I think we could not drop the oxu->lock earlier.

               (Thread 1)               |         (Thread 2)
 oxu_bus_suspend()                      | oxu_stop()
                                        | 
 hcd->state = HC_STATE_SUSPENDED;       |  spin_lock_irq(&oxu->lock);
 ...                                    | 
 writel(mask, &oxu->regs->intr_enable); |  ...
                                        |  writel(0, &oxu->regs->intr_enable);
 readl(&oxu->regs->intr_enable);        |

The oxu->regs->intr_enable is set to 0 in oxu_stop(), and the readl() in
oxu_bus_suspend() will read the wrong value.

Thanks a lot for your time and advice. If you have questions, welcome to ask me.

Best regards,
Duoming Zhou
diff mbox series

Patch

diff --git a/drivers/usb/host/oxu210hp-hcd.c b/drivers/usb/host/oxu210hp-hcd.c
index b741670525e..ee403df3309 100644
--- a/drivers/usb/host/oxu210hp-hcd.c
+++ b/drivers/usb/host/oxu210hp-hcd.c
@@ -3909,8 +3909,10 @@  static int oxu_bus_suspend(struct usb_hcd *hcd)
 		}
 	}
 
+	spin_unlock_irq(&oxu->lock);
 	/* turn off now-idle HC */
 	del_timer_sync(&oxu->watchdog);
+	spin_lock_irq(&oxu->lock);
 	ehci_halt(oxu);
 	hcd->state = HC_STATE_SUSPENDED;