diff mbox

Warning about possible recursive locking detected in IPoIB

Message ID 519E3472.3050708@profitbricks.com (mailing list archive)
State Rejected
Headers show

Commit Message

Jinpu Wang May 23, 2013, 3:23 p.m. UTC
Hi Or,

I saw below warning when enable CONFIG_DEBUG_MUTEXES


> 1893 May 21 08:56:32 ib2 kernel: [   44.738725] =============================================
>  1894 May 21 08:56:32 ib2 kernel: [   44.738782] [ INFO: possible recursive locking detected ]
>  1895 May 21 08:56:32 ib2 kernel: [   44.738841] 3.9.0-rc7-pserver #4 Tainted: G           O
>  1896 May 21 08:56:32 ib2 kernel: [   44.738896] ---------------------------------------------
>  1897 May 21 08:56:32 ib2 kernel: [   44.738953] kworker/u:5/238 is trying to acquire lock:
>  1898 May 21 08:56:32 ib2 kernel: [   44.739008]  (&priv->vlan_mutex){+.+.+.}, at: [<ffffffffa073f36c>] __ipoib_ib_dev_flush+0x3c/0x230 [ib_ipoib]
>  1899 May 21 08:56:32 ib2 kernel: [   44.739218]
>  1900 May 21 08:56:32 ib2 kernel: [   44.739218] but task is already holding lock:
>  1901 May 21 08:56:32 ib2 kernel: [   44.739328]  (&priv->vlan_mutex){+.+.+.}, at: [<ffffffffa073f36c>] __ipoib_ib_dev_flush+0x3c/0x230 [ib_ipoib]
>  1902 May 21 08:56:32 ib2 kernel: [   44.739537]
>  1903 May 21 08:56:32 ib2 kernel: [   44.739537] other info that might help us debug this:
>  1904 May 21 08:56:32 ib2 kernel: [   44.739613]  Possible unsafe locking scenario:
>  1905 May 21 08:56:32 ib2 kernel: [   44.739613]
>  1906 May 21 08:56:32 ib2 kernel: [   44.739688]        CPU0
>  1907 May 21 08:56:32 ib2 kernel: [   44.739741]        ----
>  1908 May 21 08:56:32 ib2 kernel: [   44.739791]   lock(&priv->vlan_mutex);
>  1909 May 21 08:56:32 ib2 kernel: [   44.739902]   lock(&priv->vlan_mutex);
>  1910 May 21 08:56:32 ib2 kernel: [   44.740014]
>  1911 May 21 08:56:32 ib2 kernel: [   44.740014]  *** DEADLOCK ***
>  1912 May 21 08:56:32 ib2 kernel: [   44.740014]
>  1913 May 21 08:56:32 ib2 kernel: [   44.740103]  May be due to missing lock nesting notation
>  1914 May 21 08:56:32 ib2 kernel: [   44.740103]
>  1915 May 21 08:56:32 ib2 kernel: [   44.740213] 3 locks held by kworker/u:5/238:
>  1916 May 21 08:56:32 ib2 kernel: [   44.740266]  #0:  (ipoib){.+.+.+}, at: [<ffffffff81067755>] process_one_work+0x165/0x560
>  1917 May 21 08:56:32 ib2 kernel: [   44.740495]  #1:  ((&priv->flush_heavy)){+.+...}, at: [<ffffffff81067755>] process_one_work+0x165/0x560
>  1918 May 21 08:56:32 ib2 kernel: [   44.740725]  #2:  (&priv->vlan_mutex){+.+.+.}, at: [<ffffffffa073f36c>] __ipoib_ib_dev_flush+0x3c/0x230 [ib_ipoib]
>  1919 May 21 08:56:32 ib2 kernel: [   44.740961]
>  1920 May 21 08:56:32 ib2 kernel: [   44.740961] stack backtrace:
>  1921 May 21 08:56:32 ib2 kernel: [   44.741035] Pid: 238, comm: kworker/u:5 Tainted: G           O 3.9.0-rc7-pserver #4
>  1922 May 21 08:56:32 ib2 kernel: [   44.741111] Call Trace:
>  1923 May 21 08:56:32 ib2 kernel: [   44.741170]  [<ffffffff81046430>] ? vprintk_emit+0x280/0x520
>  1924 May 21 08:56:32 ib2 kernel: [   44.741233]  [<ffffffff810ac473>] __lock_acquire+0x6c3/0x17c0
>  1925 May 21 08:56:32 ib2 kernel: [   44.741295]  [<ffffffff810aca3c>] ? __lock_acquire+0xc8c/0x17c0
>  1926 May 21 08:56:32 ib2 kernel: [   44.741357]  [<ffffffff810047a7>] ? dump_trace+0x177/0x2f0
>  1927 May 21 08:56:32 ib2 kernel: [   44.741418]  [<ffffffff810ad612>] lock_acquire+0xa2/0x180
>  1928 May 21 08:56:32 ib2 kernel: [   44.741483]  [<ffffffffa073f36c>] ? __ipoib_ib_dev_flush+0x3c/0x230 [ib_ipoib]
>  1929 May 21 08:56:32 ib2 kernel: [   44.741563]  [<ffffffff8101090f>] ? save_stack_trace+0x2f/0x50
>  1930 May 21 08:56:32 ib2 kernel: [   44.741626]  [<ffffffff817bb0cb>] __mutex_lock_common+0x5b/0x3e0
>  1931 May 21 08:56:32 ib2 kernel: [   44.741693]  [<ffffffffa073f36c>] ? __ipoib_ib_dev_flush+0x3c/0x230 [ib_ipoib]
>  1932 May 21 08:56:32 ib2 kernel: [   44.741777]  [<ffffffffa073f36c>] ? __ipoib_ib_dev_flush+0x3c/0x230 [ib_ipoib]
>  1933 May 21 08:56:32 ib2 kernel: [   44.741855]  [<ffffffff817bb585>] mutex_lock_nested+0x45/0x50
>  1934 May 21 08:56:32 ib2 kernel: [   44.741922]  [<ffffffffa073f36c>] __ipoib_ib_dev_flush+0x3c/0x230 [ib_ipoib]
>  1935 May 21 08:56:32 ib2 kernel: [   44.741991]  [<ffffffffa073f38a>] __ipoib_ib_dev_flush+0x5a/0x230 [ib_ipoib]
>  1936 May 21 08:56:32 ib2 kernel: [   44.742060]  [<ffffffffa073f57a>] ipoib_ib_dev_flush_heavy+0x1a/0x20 [ib_ipoib]
> 1937 May 21 08:56:32 ib2 kernel: [   44.742138]  [<ffffffff810677c6>] process_one_work+0x1d6/0x560
>  1938 May 21 08:56:32 ib2 kernel: [   44.742199]  [<ffffffff81067755>] ? process_one_work+0x165/0x560
>  1939 May 21 08:56:32 ib2 kernel: [   44.742262]  [<ffffffff81068e29>] worker_thread+0x119/0x370
>  1940 May 21 08:56:32 ib2 kernel: [   44.742324]  [<ffffffff81068d10>] ? manage_workers+0x340/0x340
>  1941 May 21 08:56:32 ib2 kernel: [   44.742388]  [<ffffffff8106e846>] kthread+0xe6/0xf0
>  1942 May 21 08:56:32 ib2 kernel: [   44.742450]  [<ffffffff8106e760>] ? __init_kthread_worker+0x70/0x70
>  1943 May 21 08:56:32 ib2 kernel: [   44.742513]  [<ffffffff817c842c>] ret_from_fork+0x7c/0xb0
>  1944 May 21 08:56:32 ib2 kernel: [   44.742575]  [<ffffffff8106e760>] ? __init_kthread_worker+0x70/0x70
>  1945 May 21 08:56:32 ib2 kernel: [   44.744467] IPv6: ADDRCONF(NETDEV_CHANGE): ib1: link becomes ready
>  1946 May 21 08:56:45 ib2 kernel: [   57.700823] IPv6: ADDRCONF(NETDEV_CHANGE): ib0: link becomes ready

And I found attached patch you submitted long time ago, I tried that
patch, it fixed the warning, I wonder why the patch was not accepted,
anything wrong?

Regards,
Jack
diff mbox

Patch

do nested flushing only if the device isn't a child

Signed-off-by: Or Gerlitz <ogerl...@mellanox.com>

----

setting CONFIG_DEBUG_MUTEXES I see the below warning, however,
for some reason, I didn't manage to trigger it without my other
patch that adds the clones, I don't see how that patch could
be the reason for the warning, as the code always goes nested,
I've instrumented the flush code to dump its caller/stack and
indeed, you can see that the flushing code is called recursively
and should have that warning, but it doesn't...

ib0.8001: downing ib_dev
ib0: downing ib_dev
ib0: ipoib_ib_dev_flush_light called
ib0: __ipoib_ib_dev_flush pid 29251
Pid: 29251, comm: kworker/u:1 Not tainted 3.2.0-06106-g75f0703-dirty #16
Call Trace:
 [<ffffffffa02b5e1e>] __ipoib_ib_dev_flush+0x57/0x204 [ib_ipoib]
 [<ffffffffa02b6057>] ? ipoib_ib_dev_flush_normal+0x46/0x46 [ib_ipoib]
 [<ffffffffa02b6096>] ipoib_ib_dev_flush_light+0x3f/0x43 [ib_ipoib]
 [<ffffffff81041ee6>] process_one_work+0x2bd/0x4a6
 [<ffffffff81041e39>] ? process_one_work+0x210/0x4a6
 [<ffffffff810424e6>] worker_thread+0x1d6/0x350
 [<ffffffff81042310>] ? rescuer_thread+0x241/0x241
 [<ffffffff81045d5a>] kthread+0x84/0x8c
 [<ffffffff81366ee4>] kernel_thread_helper+0x4/0x10
 [<ffffffff810514d1>] ? finish_task_switch+0x154/0x156
 [<ffffffff8135f243>] ? _raw_spin_unlock_irq+0x2b/0x40
 [<ffffffff8135f59d>] ? retint_restore_args+0xe/0xe
 [<ffffffff81045cd6>] ? __init_kthread_worker+0x56/0x56
 [<ffffffff81366ee0>] ? gs_change+0xb/0xb
ib0.8001: __ipoib_ib_dev_flush pid 29251
Pid: 29251, comm: kworker/u:1 Not tainted 3.2.0-06106-g75f0703-dirty #16
Call Trace:
 [<ffffffffa02b5e1e>] __ipoib_ib_dev_flush+0x57/0x204 [ib_ipoib]
 [<ffffffffa02b5e4e>] __ipoib_ib_dev_flush+0x87/0x204 [ib_ipoib]
 [<ffffffffa02b6057>] ? ipoib_ib_dev_flush_normal+0x46/0x46 [ib_ipoib]
 [<ffffffffa02b6096>] ipoib_ib_dev_flush_light+0x3f/0x43 [ib_ipoib]
 [<ffffffff81041ee6>] process_one_work+0x2bd/0x4a6
 [<ffffffff81041e39>] ? process_one_work+0x210/0x4a6
 [<ffffffff810424e6>] worker_thread+0x1d6/0x350
 [<ffffffff81042310>] ? rescuer_thread+0x241/0x241
 [<ffffffff81045d5a>] kthread+0x84/0x8c
 [<ffffffff81366ee4>] kernel_thread_helper+0x4/0x10
 [<ffffffff810514d1>] ? finish_task_switch+0x154/0x156
 [<ffffffff8135f243>] ? _raw_spin_unlock_irq+0x2b/0x40
 [<ffffffff8135f59d>] ? retint_restore_args+0xe/0xe
 [<ffffffff81045cd6>] ? __init_kthread_worker+0x56/0x56
 [<ffffffff81366ee0>] ? gs_change+0xb/0xb

---

=============================================
[ INFO: possible recursive locking detected ]
3.2.0-06106-g75f0703-dirty #16 Not tainted
---------------------------------------------
kworker/u:2/1578 is trying to acquire lock:
 (&priv->vlan_mutex){+.+.+.}, at: [<ffffffffa021ae9f>] 
__ipoib_ib_dev_flush+0x2c/0x1cf [ib_ipoib]

but task is already holding lock:
 (&priv->vlan_mutex){+.+.+.}, at: [<ffffffffa021ae9f>] 
__ipoib_ib_dev_flush+0x2c/0x1cf [ib_ipoib]

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(&priv->vlan_mutex);
  lock(&priv->vlan_mutex);

 *** DEADLOCK ***

 May be due to missing lock nesting notation

3 locks held by kworker/u:2/1578:
 #0:  (ipoib){.+.+.+}, at: [<ffffffff81041e39>] process_one_work+0x210/0x4a6
 #1:  ((&priv->flush_heavy)){+.+...}, at: [<ffffffff81041e39>] 
process_one_work+0x210/0x4a6
 #2:  (&priv->vlan_mutex){+.+.+.}, at: [<ffffffffa021ae9f>] 
__ipoib_ib_dev_flush+0x2c/0x1cf [ib_ipoib]

stack backtrace:
Pid: 1578, comm: kworker/u:2 Not tainted 3.2.0-06106-g75f0703-dirty #16
Call Trace:
 [<ffffffff81029a02>] ? console_unlock+0x10c/0x207
 [<ffffffff810668a6>] __lock_acquire+0x16b5/0x174e
 [<ffffffff8100ca22>] ? save_stack_trace+0x2a/0x47
 [<ffffffff81066a2f>] lock_acquire+0xf0/0x116
 [<ffffffffa021ae9f>] ? __ipoib_ib_dev_flush+0x2c/0x1cf [ib_ipoib]
 [<ffffffff8135cbb9>] mutex_lock_nested+0x64/0x2e6
 [<ffffffffa021ae9f>] ? __ipoib_ib_dev_flush+0x2c/0x1cf [ib_ipoib]
 [<ffffffff81063bad>] ? trace_hardirqs_on_caller+0x11e/0x155
 [<ffffffffa021ae9f>] __ipoib_ib_dev_flush+0x2c/0x1cf [ib_ipoib]
 [<ffffffffa021aec5>] __ipoib_ib_dev_flush+0x52/0x1cf [ib_ipoib]
 [<ffffffff81063bad>] ? trace_hardirqs_on_caller+0x11e/0x155
 [<ffffffffa021b042>] ? __ipoib_ib_dev_flush+0x1cf/0x1cf [ib_ipoib]
 [<ffffffffa021b057>] ipoib_ib_dev_flush_heavy+0x15/0x17 [ib_ipoib]
 [<ffffffff81041ee6>] process_one_work+0x2bd/0x4a6
 [<ffffffff81041e39>] ? process_one_work+0x210/0x4a6
 [<ffffffff8135f243>] ? _raw_spin_unlock_irq+0x2b/0x40
 [<ffffffff810424e6>] worker_thread+0x1d6/0x350
 [<ffffffff81042310>] ? rescuer_thread+0x241/0x241
 [<ffffffff81045d5a>] kthread+0x84/0x8c
 [<ffffffff81366ee4>] kernel_thread_helper+0x4/0x10
 [<ffffffff8135f59d>] ? retint_restore_args+0xe/0xe
 [<ffffffff81045cd6>] ? __init_kthread_worker+0x56/0x56
 [<ffffffff81366ee0>] ? gs_change+0xb/0xb
ADDRCONF(NETDEV_CHANGE): ib0.8001: link becomes ready
ADDRCONF(NETDEV_CHANGE): ib0: link becomes ready

 drivers/infiniband/ulp/ipoib/ipoib_ib.c |   18 ++++++++++--------
 1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/drivers/infiniband/ulp/ipoib/ipoib_ib.c 
b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
index 5c1bc99..cac2b71 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_ib.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
@@ -934,16 +934,18 @@  static void __ipoib_ib_dev_flush(struct ipoib_dev_priv 
*priv,
        struct net_device *dev = priv->dev;
        u16 new_index;

-       mutex_lock(&priv->vlan_mutex);
+       if (!priv->parent) {
+               mutex_lock(&priv->vlan_mutex);

-       /*
-        * Flush any child interfaces too -- they might be up even if
-        * the parent is down.
-        */
-       list_for_each_entry(cpriv, &priv->child_intfs, list)
-               __ipoib_ib_dev_flush(cpriv, level);
+               /*
+                * Flush any child interfaces too -- they might be up even if
+                * the parent is down.
+                */
+               list_for_each_entry(cpriv, &priv->child_intfs, list)
+                       __ipoib_ib_dev_flush(cpriv, level);

-       mutex_unlock(&priv->vlan_mutex);
+               mutex_unlock(&priv->vlan_mutex);
+       }

        if (!test_bit(IPOIB_FLAG_INITIALIZED, &priv->flags)) {
                ipoib_dbg(priv, "Not flushing - IPOIB_FLAG_INITIALIZED not 
set.\n");