usb: hub: Fix usb enumeration issue due to address0 race

Message ID	20211115221630.871204-1-mathias.nyman@linux.intel.com (mailing list archive)
State	Accepted
Commit	6ae6dc22d2d1ce6aa77a6da8a761e61aca216f8b
Headers	show Return-Path: <linux-usb-owner@kernel.org> From: Mathias Nyman <mathias.nyman@linux.intel.com> To: <gregkh@linuxfoundation.org>, <stern@rowland.harvard.edu>, kishon@ti.com Cc: hdegoede@redhat.com, chris.chiu@canonical.com, linux-usb@vger.kernel.org, Mathias Nyman <mathias.nyman@linux.intel.com>, stable@vger.kernel.org Subject: [PATCH] usb: hub: Fix usb enumeration issue due to address0 race Date: Tue, 16 Nov 2021 00:16:30 +0200 Message-Id: <20211115221630.871204-1-mathias.nyman@linux.intel.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Precedence: bulk
Series	usb: hub: Fix usb enumeration issue due to address0 race \| expand usb: hub: Fix usb enumeration issue due to address0 race

Mathias Nyman Nov. 15, 2021, 10:16 p.m. UTC

xHC hardware can only have one slot in default state with address 0
waiting for a unique address at a time, otherwise "undefined behavior
may occur" according to xhci spec 5.4.3.4

The address0_mutex exists to prevent this across both xhci roothubs.

If hub_port_init() fails, it may unlock the mutex and exit with a xhci
slot in default state. If the other xhci roothub calls hub_port_init()
at this point we end up with two slots in default state.

Make sure the address0_mutex protects the slot default state across
hub_port_init() retries, until slot is addressed or disabled.

Note, one known minor case is not fixed by this patch.
If device needs to be reset during resume, but fails all hub_port_init()
retries in usb_reset_and_verify_device(), then it's possible the slot is
still left in default state when address0_mutex is unlocked.

Cc: <stable@vger.kernel.org>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
---
 drivers/usb/core/hub.c | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

Greg KH Nov. 16, 2021, 8:22 a.m. UTC | #1

On Tue, Nov 16, 2021 at 12:16:30AM +0200, Mathias Nyman wrote:
> xHC hardware can only have one slot in default state with address 0
> waiting for a unique address at a time, otherwise "undefined behavior
> may occur" according to xhci spec 5.4.3.4
> 
> The address0_mutex exists to prevent this across both xhci roothubs.
> 
> If hub_port_init() fails, it may unlock the mutex and exit with a xhci
> slot in default state. If the other xhci roothub calls hub_port_init()
> at this point we end up with two slots in default state.
> 
> Make sure the address0_mutex protects the slot default state across
> hub_port_init() retries, until slot is addressed or disabled.
> 
> Note, one known minor case is not fixed by this patch.
> If device needs to be reset during resume, but fails all hub_port_init()
> retries in usb_reset_and_verify_device(), then it's possible the slot is
> still left in default state when address0_mutex is unlocked.
> 
> Cc: <stable@vger.kernel.org>
> Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>

What commit id does this "fix"?

thanks,

greg k-h

Mathias Nyman Nov. 16, 2021, 8:39 a.m. UTC | #2

On 16.11.2021 10.22, Greg KH wrote:
> On Tue, Nov 16, 2021 at 12:16:30AM +0200, Mathias Nyman wrote:
>> xHC hardware can only have one slot in default state with address 0
>> waiting for a unique address at a time, otherwise "undefined behavior
>> may occur" according to xhci spec 5.4.3.4
>>
>> The address0_mutex exists to prevent this across both xhci roothubs.
>>
>> If hub_port_init() fails, it may unlock the mutex and exit with a xhci
>> slot in default state. If the other xhci roothub calls hub_port_init()
>> at this point we end up with two slots in default state.
>>
>> Make sure the address0_mutex protects the slot default state across
>> hub_port_init() retries, until slot is addressed or disabled.
>>
>> Note, one known minor case is not fixed by this patch.
>> If device needs to be reset during resume, but fails all hub_port_init()
>> retries in usb_reset_and_verify_device(), then it's possible the slot is
>> still left in default state when address0_mutex is unlocked.
>>
>> Cc: <stable@vger.kernel.org>
>> Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
> 
> What commit id does this "fix"?
> 

Looks like original cause could be:
638139eb95d2 ("usb: hub: allow to process more usb hub events in parallel")

which was partially fixed in 4.7 by:
feb26ac31a2a ("usb: core: hub: hub_port_init lock controller instead of bus")

And now improved by this patch 

-Mathias

Marek Szyprowski Nov. 18, 2021, 11:19 a.m. UTC | #3

Hi,

On 15.11.2021 23:16, Mathias Nyman wrote:
> xHC hardware can only have one slot in default state with address 0
> waiting for a unique address at a time, otherwise "undefined behavior
> may occur" according to xhci spec 5.4.3.4
>
> The address0_mutex exists to prevent this across both xhci roothubs.
>
> If hub_port_init() fails, it may unlock the mutex and exit with a xhci
> slot in default state. If the other xhci roothub calls hub_port_init()
> at this point we end up with two slots in default state.
>
> Make sure the address0_mutex protects the slot default state across
> hub_port_init() retries, until slot is addressed or disabled.
>
> Note, one known minor case is not fixed by this patch.
> If device needs to be reset during resume, but fails all hub_port_init()
> retries in usb_reset_and_verify_device(), then it's possible the slot is
> still left in default state when address0_mutex is unlocked.
>
> Cc: <stable@vger.kernel.org>
> Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>

This patch landed in linux next-20211118 as commit 6ae6dc22d2d1 ("usb: 
hub: Fix usb enumeration issue due to address0 race"). On my test 
systems it triggers the following deplock warning during system 
suspend/resume cycle:

======================================================
WARNING: possible circular locking dependency detected
5.16.0-rc1-00014-g6ae6dc22d2d1 #4126 Not tainted
------------------------------------------------------
kworker/u16:8/738 is trying to acquire lock:
cf81f738 (hcd->address0_mutex){+.+.}-{3:3}, at: 
usb_reset_and_verify_device+0xe8/0x3e4

but task is already holding lock:
cf80ab3c (&port_dev->status_lock){+.+.}-{3:3}, at: 
usb_port_resume+0xa0/0x7e8

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #1 (&port_dev->status_lock){+.+.}-{3:3}:
        mutex_lock_nested+0x1c/0x24
        hub_event+0x824/0x1758
        process_one_work+0x2c8/0x7ec
        worker_thread+0x50/0x584
        kthread+0x13c/0x19c
        ret_from_fork+0x14/0x2c
        0x0

-> #0 (hcd->address0_mutex){+.+.}-{3:3}:
        lock_acquire+0x2a0/0x42c
        __mutex_lock+0x94/0xaa8
        mutex_lock_nested+0x1c/0x24
        usb_reset_and_verify_device+0xe8/0x3e4
        usb_port_resume+0x4e0/0x7e8
        usb_generic_driver_resume+0x18/0x40
        usb_resume_both+0x120/0x164
        usb_resume+0x14/0x60
        usb_dev_resume+0xc/0x10
        dpm_run_callback+0x98/0x32c
        device_resume+0xb4/0x258
        async_resume+0x20/0x64
        async_run_entry_fn+0x40/0x15c
        process_one_work+0x2c8/0x7ec
        worker_thread+0x50/0x584
        kthread+0x13c/0x19c
        ret_from_fork+0x14/0x2c
        0x0

other info that might help us debug this:

  Possible unsafe locking scenario:

        CPU0                    CPU1
        ----                    ----
   lock(&port_dev->status_lock);
                                lock(hcd->address0_mutex);
lock(&port_dev->status_lock);
   lock(hcd->address0_mutex);

  *** DEADLOCK ***

4 locks held by kworker/u16:8/738:
  #0: c1c08ca8 ((wq_completion)events_unbound){+.+.}-{0:0}, at: 
process_one_work+0x21c/0x7ec
  #1: cd9cff18 ((work_completion)(&entry->work)){+.+.}-{0:0}, at: 
process_one_work+0x21c/0x7ec
  #2: cf810148 (&dev->mutex){....}-{3:3}, at: device_resume+0x60/0x258
  #3: cf80ab3c (&port_dev->status_lock){+.+.}-{3:3}, at: 
usb_port_resume+0xa0/0x7e8

stack backtrace:
CPU: 4 PID: 738 Comm: kworker/u16:8 Not tainted 
5.16.0-rc1-00014-g6ae6dc22d2d1 #4126
Hardware name: Samsung Exynos (Flattened Device Tree)
Workqueue: events_unbound async_run_entry_fn
[<c01110d0>] (unwind_backtrace) from [<c010cab8>] (show_stack+0x10/0x14)
[<c010cab8>] (show_stack) from [<c0b7427c>] (dump_stack_lvl+0x58/0x70)
[<c0b7427c>] (dump_stack_lvl) from [<c0192958>] 
(check_noncircular+0x144/0x15c)
[<c0192958>] (check_noncircular) from [<c0195e68>] 
(__lock_acquire+0x17e4/0x3180)
[<c0195e68>] (__lock_acquire) from [<c01983ec>] (lock_acquire+0x2a0/0x42c)
[<c01983ec>] (lock_acquire) from [<c0b7ba80>] (__mutex_lock+0x94/0xaa8)
[<c0b7ba80>] (__mutex_lock) from [<c0b7c4b0>] (mutex_lock_nested+0x1c/0x24)
[<c0b7c4b0>] (mutex_lock_nested) from [<c077bf9c>] 
(usb_reset_and_verify_device+0xe8/0x3e4)
[<c077bf9c>] (usb_reset_and_verify_device) from [<c077e79c>] 
(usb_port_resume+0x4e0/0x7e8)
[<c077e79c>] (usb_port_resume) from [<c07941f0>] 
(usb_generic_driver_resume+0x18/0x40)
[<c07941f0>] (usb_generic_driver_resume) from [<c0789a40>] 
(usb_resume_both+0x120/0x164)
[<c0789a40>] (usb_resume_both) from [<c078a854>] (usb_resume+0x14/0x60)
[<c078a854>] (usb_resume) from [<c0777fa0>] (usb_dev_resume+0xc/0x10)
[<c0777fa0>] (usb_dev_resume) from [<c06ca1b8>] 
(dpm_run_callback+0x98/0x32c)
[<c06ca1b8>] (dpm_run_callback) from [<c06ca500>] (device_resume+0xb4/0x258)
[<c06ca500>] (device_resume) from [<c06ca6c4>] (async_resume+0x20/0x64)
[<c06ca6c4>] (async_resume) from [<c01548bc>] 
(async_run_entry_fn+0x40/0x15c)
[<c01548bc>] (async_run_entry_fn) from [<c0148a68>] 
(process_one_work+0x2c8/0x7ec)
[<c0148a68>] (process_one_work) from [<c0148fdc>] (worker_thread+0x50/0x584)
[<c0148fdc>] (worker_thread) from [<c0151274>] (kthread+0x13c/0x19c)
[<c0151274>] (kthread) from [<c0100108>] (ret_from_fork+0x14/0x2c)
Exception stack(0xcd9cffb0 to 0xcd9cfff8)
ffa0:                                     00000000 00000000 00000000 
00000000
ffc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
00000000
ffe0: 00000000 00000000 00000000 00000000 00000013 00000000
usb 1-1: reset high-speed USB device number 2 using exynos-ehci
usb 1-1.1: reset high-speed USB device number 3 using exynos-ehci
usb usb3: root hub lost power or was reset
usb usb4: root hub lost power or was reset
s3c-rtc 101e0000.rtc: rtc disabled, re-enabling
smsc95xx 1-1.1:1.0 eth0: Link is Down
OOM killer enabled.
Restarting tasks ... done.
PM: suspend exit

Reverting it on top of next-20211118 fixes/hides this warning. Let me 
know if I can help somehow fixing this issue.

> ---
>   drivers/usb/core/hub.c | 14 +++++++++++---
>   1 file changed, 11 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/usb/core/hub.c b/drivers/usb/core/hub.c
> index 86658a81d284..00c3506324e4 100644
> --- a/drivers/usb/core/hub.c
> +++ b/drivers/usb/core/hub.c
> @@ -4700,8 +4700,6 @@ hub_port_init(struct usb_hub *hub, struct usb_device *udev, int port1,
>   	if (oldspeed == USB_SPEED_LOW)
>   		delay = HUB_LONG_RESET_TIME;
>   
> -	mutex_lock(hcd->address0_mutex);
> -
>   	/* Reset the device; full speed may morph to high speed */
>   	/* FIXME a USB 2.0 device may morph into SuperSpeed on reset. */
>   	retval = hub_port_reset(hub, port1, udev, delay, false);
> @@ -5016,7 +5014,6 @@ hub_port_init(struct usb_hub *hub, struct usb_device *udev, int port1,
>   		hub_port_disable(hub, port1, 0);
>   		update_devnum(udev, devnum);	/* for disconnect processing */
>   	}
> -	mutex_unlock(hcd->address0_mutex);
>   	return retval;
>   }
>   
> @@ -5246,6 +5243,9 @@ static void hub_port_connect(struct usb_hub *hub, int port1, u16 portstatus,
>   		unit_load = 100;
>   
>   	status = 0;
> +
> +	mutex_lock(hcd->address0_mutex);
> +
>   	for (i = 0; i < PORT_INIT_TRIES; i++) {
>   
>   		/* reallocate for each attempt, since references
> @@ -5282,6 +5282,8 @@ static void hub_port_connect(struct usb_hub *hub, int port1, u16 portstatus,
>   		if (status < 0)
>   			goto loop;
>   
> +		mutex_unlock(hcd->address0_mutex);
> +
>   		if (udev->quirks & USB_QUIRK_DELAY_INIT)
>   			msleep(2000);
>   
> @@ -5370,6 +5372,7 @@ static void hub_port_connect(struct usb_hub *hub, int port1, u16 portstatus,
>   
>   loop_disable:
>   		hub_port_disable(hub, port1, 1);
> +		mutex_lock(hcd->address0_mutex);
>   loop:
>   		usb_ep0_reinit(udev);
>   		release_devnum(udev);
> @@ -5396,6 +5399,8 @@ static void hub_port_connect(struct usb_hub *hub, int port1, u16 portstatus,
>   	}
>   
>   done:
> +	mutex_unlock(hcd->address0_mutex);
> +
>   	hub_port_disable(hub, port1, 1);
>   	if (hcd->driver->relinquish_port && !hub->hdev->parent) {
>   		if (status != -ENOTCONN && status != -ENODEV)
> @@ -5915,6 +5920,8 @@ static int usb_reset_and_verify_device(struct usb_device *udev)
>   	bos = udev->bos;
>   	udev->bos = NULL;
>   
> +	mutex_lock(hcd->address0_mutex);
> +
>   	for (i = 0; i < PORT_INIT_TRIES; ++i) {
>   
>   		/* ep0 maxpacket size may change; let the HCD know about it.
> @@ -5924,6 +5931,7 @@ static int usb_reset_and_verify_device(struct usb_device *udev)
>   		if (ret >= 0 || ret == -ENOTCONN || ret == -ENODEV)
>   			break;
>   	}
> +	mutex_unlock(hcd->address0_mutex);
>   
>   	if (ret < 0)
>   		goto re_enumerate;

Best regards

Mathias Nyman Nov. 18, 2021, 1:50 p.m. UTC | #4

On 18.11.2021 13.19, Marek Szyprowski wrote:
> Hi,
> 
> On 15.11.2021 23:16, Mathias Nyman wrote:
>> xHC hardware can only have one slot in default state with address 0
>> waiting for a unique address at a time, otherwise "undefined behavior
>> may occur" according to xhci spec 5.4.3.4
>>
>> The address0_mutex exists to prevent this across both xhci roothubs.
>>
>> If hub_port_init() fails, it may unlock the mutex and exit with a xhci
>> slot in default state. If the other xhci roothub calls hub_port_init()
>> at this point we end up with two slots in default state.
>>
>> Make sure the address0_mutex protects the slot default state across
>> hub_port_init() retries, until slot is addressed or disabled.
>>
>> Note, one known minor case is not fixed by this patch.
>> If device needs to be reset during resume, but fails all hub_port_init()
>> retries in usb_reset_and_verify_device(), then it's possible the slot is
>> still left in default state when address0_mutex is unlocked.
>>
>> Cc: <stable@vger.kernel.org>
>> Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
> 
> This patch landed in linux next-20211118 as commit 6ae6dc22d2d1 ("usb: 
> hub: Fix usb enumeration issue due to address0 race"). On my test 
> systems it triggers the following deplock warning during system 
> suspend/resume cycle:
> 
> ======================================================
> WARNING: possible circular locking dependency detected
> 5.16.0-rc1-00014-g6ae6dc22d2d1 #4126 Not tainted
> ------------------------------------------------------
> kworker/u16:8/738 is trying to acquire lock:
> cf81f738 (hcd->address0_mutex){+.+.}-{3:3}, at: 
> usb_reset_and_verify_device+0xe8/0x3e4
> 
> but task is already holding lock:
> cf80ab3c (&port_dev->status_lock){+.+.}-{3:3}, at: 
> usb_port_resume+0xa0/0x7e8
> 

Thanks, I see it now.

Lock order is:
  mutex_lock(&port_dev->status_lock)
    mutex_lock(hcd->address0_mutex)
    mutex_unlock(hcd->address0_mutex)
  mutex_unlock(&port_dev->status_lock)
in hub_port_connect(), usb_port_resume() and usb_reset_device()

But patch changed the status_lock and address0_mutex lock order in hub_port_connect().
Lets see if we can take the status_lock a bit earlier in hub_port_connect() to
solve this.

-Mathias

Mathias Nyman Nov. 22, 2021, 10:44 a.m. UTC | #5

On 18.11.2021 15.50, Mathias Nyman wrote:
> On 18.11.2021 13.19, Marek Szyprowski wrote:
>> Hi,
>>
>> On 15.11.2021 23:16, Mathias Nyman wrote:
>>> xHC hardware can only have one slot in default state with address 0
>>> waiting for a unique address at a time, otherwise "undefined behavior
>>> may occur" according to xhci spec 5.4.3.4
>>>
>>> The address0_mutex exists to prevent this across both xhci roothubs.
>>>
>>> If hub_port_init() fails, it may unlock the mutex and exit with a xhci
>>> slot in default state. If the other xhci roothub calls hub_port_init()
>>> at this point we end up with two slots in default state.
>>>
>>> Make sure the address0_mutex protects the slot default state across
>>> hub_port_init() retries, until slot is addressed or disabled.
>>>
>>> Note, one known minor case is not fixed by this patch.
>>> If device needs to be reset during resume, but fails all hub_port_init()
>>> retries in usb_reset_and_verify_device(), then it's possible the slot is
>>> still left in default state when address0_mutex is unlocked.
>>>
>>> Cc: <stable@vger.kernel.org>
>>> Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
>>
>> This patch landed in linux next-20211118 as commit 6ae6dc22d2d1 ("usb: 
>> hub: Fix usb enumeration issue due to address0 race"). On my test 
>> systems it triggers the following deplock warning during system 
>> suspend/resume cycle:
>>
>> ======================================================
>> WARNING: possible circular locking dependency detected
>> 5.16.0-rc1-00014-g6ae6dc22d2d1 #4126 Not tainted
>> ------------------------------------------------------
>> kworker/u16:8/738 is trying to acquire lock:
>> cf81f738 (hcd->address0_mutex){+.+.}-{3:3}, at: 
>> usb_reset_and_verify_device+0xe8/0x3e4
>>
>> but task is already holding lock:
>> cf80ab3c (&port_dev->status_lock){+.+.}-{3:3}, at: 
>> usb_port_resume+0xa0/0x7e8
>>
> 
> Thanks, I see it now.
> 
> Lock order is:
>   mutex_lock(&port_dev->status_lock)
>     mutex_lock(hcd->address0_mutex)
>     mutex_unlock(hcd->address0_mutex)
>   mutex_unlock(&port_dev->status_lock)
> in hub_port_connect(), usb_port_resume() and usb_reset_device()
> 
> But patch changed the status_lock and address0_mutex lock order in hub_port_connect().
> Lets see if we can take the status_lock a bit earlier in hub_port_connect() to
> solve this.
> 

I can easily reproduce this myself now.
I'll send a patch on top of this one to fix it.
Lockdep warnings are gone for me after applying it,
Also fixes an unbalanced address0_mutex unlock in error codepath.

Grateful if someone else could try it out as well.

Thanks 
-Mathias

usb: hub: Fix usb enumeration issue due to address0 race

Commit Message

Comments

Patch