diff mbox

[v3,9/9] PCI: imx6: Fix possible dead lock

Message ID 1411966997-27118-10-git-send-email-r65037@freescale.com (mailing list archive)
State New, archived
Delegated to: Bjorn Helgaas
Headers show

Commit Message

Richard Zhu Sept. 29, 2014, 5:03 a.m. UTC
kernel report one possible dead lock during imx6sx pcie
suspend resume stress tests, after enable Lock Debugging.
platform: imx6sx sdb board + xhci(pcie2usb3.0 ep)

reason: usleep_range used in imx6_pcie_link_up maybe scheduled
out from dw_pcie_valid_config.isra...
About details, please see the following logs.

solution: replace the usleep_range(1000, 2000) by udelay(10) and
enlarge the loop counter.

logs:
[   50.643062] xhci_hcd 0000:01:00.0: Refused to change power state, currently in D3
[   50.653390]
[   50.654898] =========================================================
[   50.661343] [ INFO: possible irq lock inversion dependency detected ]
[   50.667792] 3.17.0-rc2-01341-gfc43ff7-dirty #101 Not tainted
[   50.673454] ---------------------------------------------------------
[   50.679898] kworker/u2:2/48 just changed the state of lock:
[   50.685477]  (pci_lock){+.....}, at: [<802d650c>] pci_bus_read_config_dword+0x44/0x94
[   50.693394] but this lock was taken by another, HARDIRQ-safe lock in the past:
[   50.700619]  (&irq_desc_lock_class){-.-...}

and interrupts could create inverse lock ordering between them.

[   50.710843]
[   50.710843] other info that might help us debug this:
[   50.717377]  Possible interrupt unsafe locking scenario:
[   50.717377]
[   50.724169]        CPU0                    CPU1
[   50.728702]        ----                    ----
[   50.733234]   lock(pci_lock);
[   50.736232]                                local_irq_disable();
[   50.742154]                                lock(&irq_desc_lock_class);
[   50.748713]                                lock(pci_lock);
[   50.754229]   <Interrupt>
[   50.756852]     lock(&irq_desc_lock_class);
[   50.761065]
[   50.761065]  *** DEADLOCK ***
...

[   52.119515] [<806e8ad0>] (schedule_hrtimeout_range) from [<80077e90>] (usleep_range+0x50/0x58)
[   52.128141] [<80077e40>] (usleep_range) from [<802f3694>] (imx6_pcie_link_up+0x48/0x16c)
[   52.136242] [<802f364c>] (imx6_pcie_link_up) from [<802f1b74>] (dw_pcie_valid_config.isra.10+0x40/0x7c)
[   52.145637]  r6:ae72606d r5:ae72606c r4:ae711228
[   52.150311] [<802f1b34>] (dw_pcie_valid_config.isra.10) from [<802f1ca8>] (dw_pcie_rd_conf+0x4c/0x154)
[   52.159620]  r7:00000000 r6:00000000 r5:ae711228 r4:ae726000
[   52.165355] [<802f1c5c>] (dw_pcie_rd_conf) from [<802d6534>] (pci_bus_read_config_dword+0x6c/0x94)
[   52.174316]  r9:adf4e910 r8:809bc51c r7:00000000 r6:60000153 r5:adc3fd5c r4:ae726000
[   52.182152] [<802d64c8>] (pci_bus_read_config_dword) from [<802db588>] (pci_restore_config_dword+0x54/0xa4)
[   52.191894]  r6:00000024 r5:0000000a r4:ae61e000
[   52.196570] [<802db534>] (pci_restore_config_dword) from [<802dd150>] (pci_restore_state.part.37+0x7c/0x1f8)
[   52.206399]  r8:802e086c r7:ae61dfe8 r6:519e2024 r5:ae61e000 r4:ae61dffc
[   52.213187] [<802dd0d4>] (pci_restore_state.part.37) from [<802dd2e8>] (pci_restore_state+0x1c/0x20)
[   52.222322]  r7:adc3fea8 r6:809d90e0 r5:ae61e000 r4:ae61e068
[   52.228056] [<802dd2cc>] (pci_restore_state) from [<802e0894>] (pci_pm_resume_noirq+0x28/0xa4)
[   52.236683] [<802e086c>] (pci_pm_resume_noirq) from [<8037ea1c>] (dpm_run_callback.isra.12+0x34/0x7c)

Signed-off-by: Richard Zhu <r65037@freescale.com>
---
 drivers/pci/host/pci-imx6.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Comments

Lucas Stach Sept. 29, 2014, 10:38 a.m. UTC | #1
Am Montag, den 29.09.2014, 13:03 +0800 schrieb Richard Zhu:
> kernel report one possible dead lock during imx6sx pcie
> suspend resume stress tests, after enable Lock Debugging.
> platform: imx6sx sdb board + xhci(pcie2usb3.0 ep)
> 
> reason: usleep_range used in imx6_pcie_link_up maybe scheduled
> out from dw_pcie_valid_config.isra...
> About details, please see the following logs.
> 
> solution: replace the usleep_range(1000, 2000) by udelay(10) and
> enlarge the loop counter.
> 
I don't like this change, as we eat one CPU just to work around a
slightly wrong behavior of the link-up signaling. I had a look at the
code in question and it seems there are more things that need different
handling in there. I'll post a patch to resolve the deadlock without
using a udelay shortly.

Regards,
Lucas

> logs:
> [   50.643062] xhci_hcd 0000:01:00.0: Refused to change power state, currently in D3
> [   50.653390]
> [   50.654898] =========================================================
> [   50.661343] [ INFO: possible irq lock inversion dependency detected ]
> [   50.667792] 3.17.0-rc2-01341-gfc43ff7-dirty #101 Not tainted
> [   50.673454] ---------------------------------------------------------
> [   50.679898] kworker/u2:2/48 just changed the state of lock:
> [   50.685477]  (pci_lock){+.....}, at: [<802d650c>] pci_bus_read_config_dword+0x44/0x94
> [   50.693394] but this lock was taken by another, HARDIRQ-safe lock in the past:
> [   50.700619]  (&irq_desc_lock_class){-.-...}
> 
> and interrupts could create inverse lock ordering between them.
> 
> [   50.710843]
> [   50.710843] other info that might help us debug this:
> [   50.717377]  Possible interrupt unsafe locking scenario:
> [   50.717377]
> [   50.724169]        CPU0                    CPU1
> [   50.728702]        ----                    ----
> [   50.733234]   lock(pci_lock);
> [   50.736232]                                local_irq_disable();
> [   50.742154]                                lock(&irq_desc_lock_class);
> [   50.748713]                                lock(pci_lock);
> [   50.754229]   <Interrupt>
> [   50.756852]     lock(&irq_desc_lock_class);
> [   50.761065]
> [   50.761065]  *** DEADLOCK ***
> ...
> 
> [   52.119515] [<806e8ad0>] (schedule_hrtimeout_range) from [<80077e90>] (usleep_range+0x50/0x58)
> [   52.128141] [<80077e40>] (usleep_range) from [<802f3694>] (imx6_pcie_link_up+0x48/0x16c)
> [   52.136242] [<802f364c>] (imx6_pcie_link_up) from [<802f1b74>] (dw_pcie_valid_config.isra.10+0x40/0x7c)
> [   52.145637]  r6:ae72606d r5:ae72606c r4:ae711228
> [   52.150311] [<802f1b34>] (dw_pcie_valid_config.isra.10) from [<802f1ca8>] (dw_pcie_rd_conf+0x4c/0x154)
> [   52.159620]  r7:00000000 r6:00000000 r5:ae711228 r4:ae726000
> [   52.165355] [<802f1c5c>] (dw_pcie_rd_conf) from [<802d6534>] (pci_bus_read_config_dword+0x6c/0x94)
> [   52.174316]  r9:adf4e910 r8:809bc51c r7:00000000 r6:60000153 r5:adc3fd5c r4:ae726000
> [   52.182152] [<802d64c8>] (pci_bus_read_config_dword) from [<802db588>] (pci_restore_config_dword+0x54/0xa4)
> [   52.191894]  r6:00000024 r5:0000000a r4:ae61e000
> [   52.196570] [<802db534>] (pci_restore_config_dword) from [<802dd150>] (pci_restore_state.part.37+0x7c/0x1f8)
> [   52.206399]  r8:802e086c r7:ae61dfe8 r6:519e2024 r5:ae61e000 r4:ae61dffc
> [   52.213187] [<802dd0d4>] (pci_restore_state.part.37) from [<802dd2e8>] (pci_restore_state+0x1c/0x20)
> [   52.222322]  r7:adc3fea8 r6:809d90e0 r5:ae61e000 r4:ae61e068
> [   52.228056] [<802dd2cc>] (pci_restore_state) from [<802e0894>] (pci_pm_resume_noirq+0x28/0xa4)
> [   52.236683] [<802e086c>] (pci_pm_resume_noirq) from [<8037ea1c>] (dpm_run_callback.isra.12+0x34/0x7c)
> 
> Signed-off-by: Richard Zhu <r65037@freescale.com>
> ---
>  drivers/pci/host/pci-imx6.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/pci/host/pci-imx6.c b/drivers/pci/host/pci-imx6.c
> index be953aa..e60c195 100644
> --- a/drivers/pci/host/pci-imx6.c
> +++ b/drivers/pci/host/pci-imx6.c
> @@ -521,7 +521,7 @@ static void imx6_pcie_reset_phy(struct pcie_port *pp)
>  static int imx6_pcie_link_up(struct pcie_port *pp)
>  {
>  	u32 rc, debug_r0, rx_valid;
> -	int count = 5;
> +	int count = 500;
>  
>  	/*
>  	 * Test if the PHY reports that the link is up and also that the LTSSM
> @@ -552,7 +552,7 @@ static int imx6_pcie_link_up(struct pcie_port *pp)
>  		 * Wait a little bit, then re-check if the link finished
>  		 * the training.
>  		 */
> -		usleep_range(1000, 2000);
> +		udelay(10);
>  	}
>  	/*
>  	 * From L0, initiate MAC entry to gen2 if EP/RC supports gen2.
diff mbox

Patch

diff --git a/drivers/pci/host/pci-imx6.c b/drivers/pci/host/pci-imx6.c
index be953aa..e60c195 100644
--- a/drivers/pci/host/pci-imx6.c
+++ b/drivers/pci/host/pci-imx6.c
@@ -521,7 +521,7 @@  static void imx6_pcie_reset_phy(struct pcie_port *pp)
 static int imx6_pcie_link_up(struct pcie_port *pp)
 {
 	u32 rc, debug_r0, rx_valid;
-	int count = 5;
+	int count = 500;
 
 	/*
 	 * Test if the PHY reports that the link is up and also that the LTSSM
@@ -552,7 +552,7 @@  static int imx6_pcie_link_up(struct pcie_port *pp)
 		 * Wait a little bit, then re-check if the link finished
 		 * the training.
 		 */
-		usleep_range(1000, 2000);
+		udelay(10);
 	}
 	/*
 	 * From L0, initiate MAC entry to gen2 if EP/RC supports gen2.