From patchwork Thu May 23 15:23:30 2013 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jinpu Wang X-Patchwork-Id: 2607661 Return-Path: X-Original-To: patchwork-linux-rdma@patchwork.kernel.org Delivered-To: patchwork-process-083081@patchwork2.kernel.org Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by patchwork2.kernel.org (Postfix) with ESMTP id 583DCDFB78 for ; Thu, 23 May 2013 15:23:34 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1758797Ab3EWPXd (ORCPT ); Thu, 23 May 2013 11:23:33 -0400 Received: from mail-bk0-f52.google.com ([209.85.214.52]:57607 "EHLO mail-bk0-f52.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1758652Ab3EWPXc (ORCPT ); Thu, 23 May 2013 11:23:32 -0400 Received: by mail-bk0-f52.google.com with SMTP id mz10so1926019bkb.11 for ; Thu, 23 May 2013 08:23:30 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20120113; h=message-id:date:from:user-agent:mime-version:to:subject :content-type:x-gm-message-state; bh=HQIs0Iqaxs069GfD4uoOHbUHsEHzeRArRr6yJJig3II=; b=lx3OKOzfN6oomHQHYJemokMViIdoEqxY9DtMBN14umbQJzm96wjH2gPpKEgXDwhtSc 0PmtHNQDd0YKYeXzfiss0/J4tS3fLvnz9tmn9LjFr8qsm+JniABBwpVQFTECfe6UhY+1 nOHgRIwKXd16YM0Kpbt5dYmFsd8/wTynhQi1F6of2HvjMHDFV1rcwFJ+aegPHIEhxour f2xF2IHF2YmUCkSiNiYwk2SdS31TTly+1JBxOcnZxN4O0h6euuSU0QYRVOhwREx8duyj Sgsxw8pkBXSs4Kljy9dYWcrscDjOvs9zLpTJMCUTaZtBvSEg0f2sPRGKJSLSA1i+QA+r YeJA== X-Received: by 10.205.44.129 with SMTP id ug1mr7160254bkb.29.1369322610711; Thu, 23 May 2013 08:23:30 -0700 (PDT) Received: from [192.168.88.32] ([62.217.45.26]) by mx.google.com with ESMTPSA id kw1sm3453906bkb.21.2013.05.23.08.23.28 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Thu, 23 May 2013 08:23:29 -0700 (PDT) Message-ID: <519E3472.3050708@profitbricks.com> Date: Thu, 23 May 2013 17:23:30 +0200 From: Jack Wang User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130308 Thunderbird/17.0.4 MIME-Version: 1.0 To: Or Gerlitz , linux-rdma@vger.kernel.org, Sebastian Riemer , Dongsu Park Subject: Warning about possible recursive locking detected in IPoIB X-Gm-Message-State: ALoCoQnUBDS7EkyQzxR0L3XgX5JgpVofct0B1CA5HtLa2p4p9xeh+V/QAPApyOa1YlKYtTKzuEqI Sender: linux-rdma-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-rdma@vger.kernel.org Hi Or, I saw below warning when enable CONFIG_DEBUG_MUTEXES > 1893 May 21 08:56:32 ib2 kernel: [ 44.738725] ============================================= > 1894 May 21 08:56:32 ib2 kernel: [ 44.738782] [ INFO: possible recursive locking detected ] > 1895 May 21 08:56:32 ib2 kernel: [ 44.738841] 3.9.0-rc7-pserver #4 Tainted: G O > 1896 May 21 08:56:32 ib2 kernel: [ 44.738896] --------------------------------------------- > 1897 May 21 08:56:32 ib2 kernel: [ 44.738953] kworker/u:5/238 is trying to acquire lock: > 1898 May 21 08:56:32 ib2 kernel: [ 44.739008] (&priv->vlan_mutex){+.+.+.}, at: [] __ipoib_ib_dev_flush+0x3c/0x230 [ib_ipoib] > 1899 May 21 08:56:32 ib2 kernel: [ 44.739218] > 1900 May 21 08:56:32 ib2 kernel: [ 44.739218] but task is already holding lock: > 1901 May 21 08:56:32 ib2 kernel: [ 44.739328] (&priv->vlan_mutex){+.+.+.}, at: [] __ipoib_ib_dev_flush+0x3c/0x230 [ib_ipoib] > 1902 May 21 08:56:32 ib2 kernel: [ 44.739537] > 1903 May 21 08:56:32 ib2 kernel: [ 44.739537] other info that might help us debug this: > 1904 May 21 08:56:32 ib2 kernel: [ 44.739613] Possible unsafe locking scenario: > 1905 May 21 08:56:32 ib2 kernel: [ 44.739613] > 1906 May 21 08:56:32 ib2 kernel: [ 44.739688] CPU0 > 1907 May 21 08:56:32 ib2 kernel: [ 44.739741] ---- > 1908 May 21 08:56:32 ib2 kernel: [ 44.739791] lock(&priv->vlan_mutex); > 1909 May 21 08:56:32 ib2 kernel: [ 44.739902] lock(&priv->vlan_mutex); > 1910 May 21 08:56:32 ib2 kernel: [ 44.740014] > 1911 May 21 08:56:32 ib2 kernel: [ 44.740014] *** DEADLOCK *** > 1912 May 21 08:56:32 ib2 kernel: [ 44.740014] > 1913 May 21 08:56:32 ib2 kernel: [ 44.740103] May be due to missing lock nesting notation > 1914 May 21 08:56:32 ib2 kernel: [ 44.740103] > 1915 May 21 08:56:32 ib2 kernel: [ 44.740213] 3 locks held by kworker/u:5/238: > 1916 May 21 08:56:32 ib2 kernel: [ 44.740266] #0: (ipoib){.+.+.+}, at: [] process_one_work+0x165/0x560 > 1917 May 21 08:56:32 ib2 kernel: [ 44.740495] #1: ((&priv->flush_heavy)){+.+...}, at: [] process_one_work+0x165/0x560 > 1918 May 21 08:56:32 ib2 kernel: [ 44.740725] #2: (&priv->vlan_mutex){+.+.+.}, at: [] __ipoib_ib_dev_flush+0x3c/0x230 [ib_ipoib] > 1919 May 21 08:56:32 ib2 kernel: [ 44.740961] > 1920 May 21 08:56:32 ib2 kernel: [ 44.740961] stack backtrace: > 1921 May 21 08:56:32 ib2 kernel: [ 44.741035] Pid: 238, comm: kworker/u:5 Tainted: G O 3.9.0-rc7-pserver #4 > 1922 May 21 08:56:32 ib2 kernel: [ 44.741111] Call Trace: > 1923 May 21 08:56:32 ib2 kernel: [ 44.741170] [] ? vprintk_emit+0x280/0x520 > 1924 May 21 08:56:32 ib2 kernel: [ 44.741233] [] __lock_acquire+0x6c3/0x17c0 > 1925 May 21 08:56:32 ib2 kernel: [ 44.741295] [] ? __lock_acquire+0xc8c/0x17c0 > 1926 May 21 08:56:32 ib2 kernel: [ 44.741357] [] ? dump_trace+0x177/0x2f0 > 1927 May 21 08:56:32 ib2 kernel: [ 44.741418] [] lock_acquire+0xa2/0x180 > 1928 May 21 08:56:32 ib2 kernel: [ 44.741483] [] ? __ipoib_ib_dev_flush+0x3c/0x230 [ib_ipoib] > 1929 May 21 08:56:32 ib2 kernel: [ 44.741563] [] ? save_stack_trace+0x2f/0x50 > 1930 May 21 08:56:32 ib2 kernel: [ 44.741626] [] __mutex_lock_common+0x5b/0x3e0 > 1931 May 21 08:56:32 ib2 kernel: [ 44.741693] [] ? __ipoib_ib_dev_flush+0x3c/0x230 [ib_ipoib] > 1932 May 21 08:56:32 ib2 kernel: [ 44.741777] [] ? __ipoib_ib_dev_flush+0x3c/0x230 [ib_ipoib] > 1933 May 21 08:56:32 ib2 kernel: [ 44.741855] [] mutex_lock_nested+0x45/0x50 > 1934 May 21 08:56:32 ib2 kernel: [ 44.741922] [] __ipoib_ib_dev_flush+0x3c/0x230 [ib_ipoib] > 1935 May 21 08:56:32 ib2 kernel: [ 44.741991] [] __ipoib_ib_dev_flush+0x5a/0x230 [ib_ipoib] > 1936 May 21 08:56:32 ib2 kernel: [ 44.742060] [] ipoib_ib_dev_flush_heavy+0x1a/0x20 [ib_ipoib] > 1937 May 21 08:56:32 ib2 kernel: [ 44.742138] [] process_one_work+0x1d6/0x560 > 1938 May 21 08:56:32 ib2 kernel: [ 44.742199] [] ? process_one_work+0x165/0x560 > 1939 May 21 08:56:32 ib2 kernel: [ 44.742262] [] worker_thread+0x119/0x370 > 1940 May 21 08:56:32 ib2 kernel: [ 44.742324] [] ? manage_workers+0x340/0x340 > 1941 May 21 08:56:32 ib2 kernel: [ 44.742388] [] kthread+0xe6/0xf0 > 1942 May 21 08:56:32 ib2 kernel: [ 44.742450] [] ? __init_kthread_worker+0x70/0x70 > 1943 May 21 08:56:32 ib2 kernel: [ 44.742513] [] ret_from_fork+0x7c/0xb0 > 1944 May 21 08:56:32 ib2 kernel: [ 44.742575] [] ? __init_kthread_worker+0x70/0x70 > 1945 May 21 08:56:32 ib2 kernel: [ 44.744467] IPv6: ADDRCONF(NETDEV_CHANGE): ib1: link becomes ready > 1946 May 21 08:56:45 ib2 kernel: [ 57.700823] IPv6: ADDRCONF(NETDEV_CHANGE): ib0: link becomes ready And I found attached patch you submitted long time ago, I tried that patch, it fixed the warning, I wonder why the patch was not accepted, anything wrong? Regards, Jack do nested flushing only if the device isn't a child Signed-off-by: Or Gerlitz ---- setting CONFIG_DEBUG_MUTEXES I see the below warning, however, for some reason, I didn't manage to trigger it without my other patch that adds the clones, I don't see how that patch could be the reason for the warning, as the code always goes nested, I've instrumented the flush code to dump its caller/stack and indeed, you can see that the flushing code is called recursively and should have that warning, but it doesn't... ib0.8001: downing ib_dev ib0: downing ib_dev ib0: ipoib_ib_dev_flush_light called ib0: __ipoib_ib_dev_flush pid 29251 Pid: 29251, comm: kworker/u:1 Not tainted 3.2.0-06106-g75f0703-dirty #16 Call Trace: [] __ipoib_ib_dev_flush+0x57/0x204 [ib_ipoib] [] ? ipoib_ib_dev_flush_normal+0x46/0x46 [ib_ipoib] [] ipoib_ib_dev_flush_light+0x3f/0x43 [ib_ipoib] [] process_one_work+0x2bd/0x4a6 [] ? process_one_work+0x210/0x4a6 [] worker_thread+0x1d6/0x350 [] ? rescuer_thread+0x241/0x241 [] kthread+0x84/0x8c [] kernel_thread_helper+0x4/0x10 [] ? finish_task_switch+0x154/0x156 [] ? _raw_spin_unlock_irq+0x2b/0x40 [] ? retint_restore_args+0xe/0xe [] ? __init_kthread_worker+0x56/0x56 [] ? gs_change+0xb/0xb ib0.8001: __ipoib_ib_dev_flush pid 29251 Pid: 29251, comm: kworker/u:1 Not tainted 3.2.0-06106-g75f0703-dirty #16 Call Trace: [] __ipoib_ib_dev_flush+0x57/0x204 [ib_ipoib] [] __ipoib_ib_dev_flush+0x87/0x204 [ib_ipoib] [] ? ipoib_ib_dev_flush_normal+0x46/0x46 [ib_ipoib] [] ipoib_ib_dev_flush_light+0x3f/0x43 [ib_ipoib] [] process_one_work+0x2bd/0x4a6 [] ? process_one_work+0x210/0x4a6 [] worker_thread+0x1d6/0x350 [] ? rescuer_thread+0x241/0x241 [] kthread+0x84/0x8c [] kernel_thread_helper+0x4/0x10 [] ? finish_task_switch+0x154/0x156 [] ? _raw_spin_unlock_irq+0x2b/0x40 [] ? retint_restore_args+0xe/0xe [] ? __init_kthread_worker+0x56/0x56 [] ? gs_change+0xb/0xb --- ============================================= [ INFO: possible recursive locking detected ] 3.2.0-06106-g75f0703-dirty #16 Not tainted --------------------------------------------- kworker/u:2/1578 is trying to acquire lock: (&priv->vlan_mutex){+.+.+.}, at: [] __ipoib_ib_dev_flush+0x2c/0x1cf [ib_ipoib] but task is already holding lock: (&priv->vlan_mutex){+.+.+.}, at: [] __ipoib_ib_dev_flush+0x2c/0x1cf [ib_ipoib] other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&priv->vlan_mutex); lock(&priv->vlan_mutex); *** DEADLOCK *** May be due to missing lock nesting notation 3 locks held by kworker/u:2/1578: #0: (ipoib){.+.+.+}, at: [] process_one_work+0x210/0x4a6 #1: ((&priv->flush_heavy)){+.+...}, at: [] process_one_work+0x210/0x4a6 #2: (&priv->vlan_mutex){+.+.+.}, at: [] __ipoib_ib_dev_flush+0x2c/0x1cf [ib_ipoib] stack backtrace: Pid: 1578, comm: kworker/u:2 Not tainted 3.2.0-06106-g75f0703-dirty #16 Call Trace: [] ? console_unlock+0x10c/0x207 [] __lock_acquire+0x16b5/0x174e [] ? save_stack_trace+0x2a/0x47 [] lock_acquire+0xf0/0x116 [] ? __ipoib_ib_dev_flush+0x2c/0x1cf [ib_ipoib] [] mutex_lock_nested+0x64/0x2e6 [] ? __ipoib_ib_dev_flush+0x2c/0x1cf [ib_ipoib] [] ? trace_hardirqs_on_caller+0x11e/0x155 [] __ipoib_ib_dev_flush+0x2c/0x1cf [ib_ipoib] [] __ipoib_ib_dev_flush+0x52/0x1cf [ib_ipoib] [] ? trace_hardirqs_on_caller+0x11e/0x155 [] ? __ipoib_ib_dev_flush+0x1cf/0x1cf [ib_ipoib] [] ipoib_ib_dev_flush_heavy+0x15/0x17 [ib_ipoib] [] process_one_work+0x2bd/0x4a6 [] ? process_one_work+0x210/0x4a6 [] ? _raw_spin_unlock_irq+0x2b/0x40 [] worker_thread+0x1d6/0x350 [] ? rescuer_thread+0x241/0x241 [] kthread+0x84/0x8c [] kernel_thread_helper+0x4/0x10 [] ? retint_restore_args+0xe/0xe [] ? __init_kthread_worker+0x56/0x56 [] ? gs_change+0xb/0xb ADDRCONF(NETDEV_CHANGE): ib0.8001: link becomes ready ADDRCONF(NETDEV_CHANGE): ib0: link becomes ready drivers/infiniband/ulp/ipoib/ipoib_ib.c | 18 ++++++++++-------- 1 file changed, 10 insertions(+), 8 deletions(-) diff --git a/drivers/infiniband/ulp/ipoib/ipoib_ib.c b/drivers/infiniband/ulp/ipoib/ipoib_ib.c index 5c1bc99..cac2b71 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_ib.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_ib.c @@ -934,16 +934,18 @@ static void __ipoib_ib_dev_flush(struct ipoib_dev_priv *priv, struct net_device *dev = priv->dev; u16 new_index; - mutex_lock(&priv->vlan_mutex); + if (!priv->parent) { + mutex_lock(&priv->vlan_mutex); - /* - * Flush any child interfaces too -- they might be up even if - * the parent is down. - */ - list_for_each_entry(cpriv, &priv->child_intfs, list) - __ipoib_ib_dev_flush(cpriv, level); + /* + * Flush any child interfaces too -- they might be up even if + * the parent is down. + */ + list_for_each_entry(cpriv, &priv->child_intfs, list) + __ipoib_ib_dev_flush(cpriv, level); - mutex_unlock(&priv->vlan_mutex); + mutex_unlock(&priv->vlan_mutex); + } if (!test_bit(IPOIB_FLAG_INITIALIZED, &priv->flags)) { ipoib_dbg(priv, "Not flushing - IPOIB_FLAG_INITIALIZED not set.\n");