Message ID: 1646042220-28952-1-git-send-email-byungchul.park@lge.com
Series: DEPT(Dependency Tracker)
On Mon, Feb 28, 2022 at 06:56:39PM +0900, Byungchul Park wrote:
> I didn't want to bother you so I was planning to send the next spin
> after making more progress. However, PATCH v2 reports too many false
> positives because Dept tracked the bit_wait_table[] wrong way - I
> apologize for that. So I decided to send PATCH v3 first before going
> further for those who want to run Dept for now.
>
> There might still be some false positives but not overwhelming.
>

Hello Byungchul, I'm running DEPT v3 on my system
and I see the report below.

Looking at the kmemleak code and comment, I think
kmemleak tried to avoid a lockdep recursive warning
but it was detected by DEPT?

===================================================
DEPT: Circular dependency has been detected.
5.17.0-rc1+ #1 Tainted: G W
---------------------------------------------------
summary
---------------------------------------------------
*** AA DEADLOCK ***

context A
    [S] __raw_spin_lock_irqsave(&object->lock:0)
    [W] _raw_spin_lock_nested(&object->lock:0)
    [E] spin_unlock(&object->lock:0)

[S]: start of the event context
[W]: the wait blocked
[E]: the event not reachable
---------------------------------------------------
context A's detail
---------------------------------------------------
context A
    [S] __raw_spin_lock_irqsave(&object->lock:0)
    [W] _raw_spin_lock_nested(&object->lock:0)
    [E] spin_unlock(&object->lock:0)

[S] __raw_spin_lock_irqsave(&object->lock:0):
[<ffffffc00810302c>] scan_gray_list+0x84/0x13c
stacktrace:
      dept_ecxt_enter+0x88/0xf4
      _raw_spin_lock_irqsave+0xf0/0x1c4
      scan_gray_list+0x84/0x13c
      kmemleak_scan+0x2d8/0x54c
      kmemleak_scan_thread+0xac/0xd4
      kthread+0xd4/0xe4
      ret_from_fork+0x10/0x20

[E] spin_unlock(&object->lock:0):
[<ffffffc008102ee0>] scan_block+0x60/0x128
---------------------------------------------------
information that might be helpful
---------------------------------------------------
CPU: 1 PID: 38 Comm: kmemleak Tainted: G W 5.17.0-rc1+ #1
Hardware name: linux,dummy-virt (DT)
Call trace:
 dump_backtrace.part.0+0x9c/0xc4
 show_stack+0x14/0x28
 dump_stack_lvl+0x9c/0xcc
 dump_stack+0x14/0x2c
 print_circle+0x2d4/0x438
 cb_check_dl+0x44/0x70
 bfs+0x60/0x168
 add_dep+0x88/0x11c
 add_wait+0x2d0/0x2dc
 __dept_wait+0x8c/0xa4
 dept_wait+0x6c/0x88
 _raw_spin_lock_nested+0xa8/0x1b0
 scan_block+0xb4/0x128
 scan_gray_list+0xc4/0x13c
 kmemleak_scan+0x2d8/0x54c
 kmemleak_scan_thread+0xac/0xd4
 kthread+0xd4/0xe4
 ret_from_fork+0x10/0x20

> ---
>
> Hi Linus and folks,
>
> I've been developing a tool for detecting deadlock possibilities by
> tracking wait/event rather than lock(?) acquisition order to try to
> cover all synchronization mechanisms. It's done on v5.17-rc1 tag.
>
> https://github.com/lgebyungchulpark/linux-dept/commits/dept1.14_on_v5.17-rc1
>
[...]
> Benefit:
>
> 0. Works with all lock primitives.
> 1. Works with wait_for_completion()/complete().
> 2. Works with 'wait' on PG_locked.
> 3. Works with 'wait' on PG_writeback.
> 4. Works with swait/wakeup.
> 5. Works with waitqueue.
> 6. Multiple reports are allowed.
> 7. Deduplication control on multiple reports.
> 8. Withstand false positives thanks to 6.
> 9. Easy to tag any wait/event.
>
> Future work:
>
> 0. To make it more stable.
> 1. To separate Dept from Lockdep.
> 2. To improve performance in terms of time and space.
> 3. To use Dept as a dependency engine for Lockdep.
> 4. To add any missing tags of wait/event in the kernel.
> 5. To deduplicate stack trace.
>
> How to interpret reports:
>
> 1. E(event) in each context cannot be triggered because of the
>    W(wait) that cannot be woken.
> 2. The stack trace helping find the problematic code is located
>    in each context's detail.
>
> Thanks,
> Byungchul
>
> ---
>
> Changes from v2:
>
> 1. Disable Dept on bit_wait_table[] in sched/wait_bit.c
>    reporting a lot of false positives, which is my fault.
>    Wait/event for bit_wait_table[] should've been tagged in a
>    higher layer for better work, which is a future work.
>    (feedback from Jan Kara)
> 2. Disable Dept on crypto_larval's completion to prevent a false
>    positive.
>
> Changes from v1:
>
> 1. Fix coding style and typo. (feedback from Steven)
> 2. Distinguish each work context from another in workqueue.
> 3. Skip checking lock acquisition with nest_lock, which is about
>    correct lock usage that should be checked by Lockdep.
>
> Changes from RFC:
>
> 1. Prevent adding a wait tag at prepare_to_wait() but __schedule().
>    (feedback from Linus and Matthew)
> 2. Use try version at lockdep_acquire_cpus_lock() annotation.
> 3. Distinguish each syscall context from another.

[ ... ]
On Wed, Mar 02, 2022 at 04:36:38AM +0000, Hyeonggon Yoo wrote: > On Mon, Feb 28, 2022 at 06:56:39PM +0900, Byungchul Park wrote: > > I didn't want to bother you so I was planning to send the next spin > > after making more progress. However, PATCH v2 reports too many false > > positives because Dept tracked the bit_wait_table[] wrong way - I > > apologize for that. So I decided to send PATCH v3 first before going > > further for those who want to run Dept for now. > > > > There might still be some false positives but not overwhelming. > > > > Hello Byungchul, I'm running DEPT v3 on my system > and I see report below. > > Looking at the kmemleak code and comment, I think > kmemleak tried to avoid lockdep recursive warning > but detected by DEPT? > Forgot to include another warning caused by DEPT. And comment below might be useful for debugging: in kmemleak.c: 43 * Locks and mutexes are acquired/nested in the following order: 44 * 45 * scan_mutex [-> object->lock] -> kmemleak_lock -> other_object->lock (SINGLE_DEPTH_NESTING) 46 * 47 * No kmemleak_lock and object->lock nesting is allowed outside scan_mutex 48 * regions. =================================================== DEPT: Circular dependency has been detected. 5.17.0-rc1+ #1 Tainted: G W --------------------------------------------------- summary --------------------------------------------------- *** DEADLOCK *** context A [S] __raw_spin_lock_irqsave(&object->lock:0) [W] __raw_spin_lock_irqsave(kmemleak_lock:0) [E] spin_unlock(&object->lock:0) context B [S] __raw_spin_lock_irqsave(kmemleak_lock:0) [W] _raw_spin_lock_nested(&object->lock:0) [E] spin_unlock(kmemleak_lock:0) [S]: start of the event context [W]: the wait blocked [E]: the event not reachable --------------------------------------------------- context A's detail --------------------------------------------------- context A [S] __raw_spin_lock_irqsave(&object->lock:0) [W] __raw_spin_lock_irqsave(kmemleak_lock:0) [E] spin_unlock(&object->lock:0) [S] __raw_spin_lock_irqsave(&object->lock:0): [<ffffffc00810302c>] scan_gray_list+0x84/0x13c stacktrace: dept_ecxt_enter+0x88/0xf4 _raw_spin_lock_irqsave+0xf0/0x1c4 scan_gray_list+0x84/0x13c kmemleak_scan+0x2d8/0x54c kmemleak_scan_thread+0xac/0xd4 kthread+0xd4/0xe4 ret_from_fork+0x10/0x20 [W] __raw_spin_lock_irqsave(kmemleak_lock:0): [<ffffffc008102ebc>] scan_block+0x3c/0x128 stacktrace: __dept_wait+0x8c/0xa4 dept_wait+0x6c/0x88 _raw_spin_lock_irqsave+0xb8/0x1c4 scan_block+0x3c/0x128 scan_gray_list+0xc4/0x13c kmemleak_scan+0x2d8/0x54c kmemleak_scan_thread+0xac/0xd4 kthread+0xd4/0xe4 ret_from_fork+0x10/0x20 [E] spin_unlock(&object->lock:0): [<ffffffc008102ee0>] scan_block+0x60/0x128 --------------------------------------------------- context B's detail --------------------------------------------------- context B [S] __raw_spin_lock_irqsave(kmemleak_lock:0) [W] _raw_spin_lock_nested(&object->lock:0) [E] spin_unlock(kmemleak_lock:0) [S] __raw_spin_lock_irqsave(kmemleak_lock:0): [<ffffffc008102ebc>] scan_block+0x3c/0x128 stacktrace: dept_ecxt_enter+0x88/0xf4 _raw_spin_lock_irqsave+0xf0/0x1c4 scan_block+0x3c/0x128 kmemleak_scan+0x19c/0x54c kmemleak_scan_thread+0xac/0xd4 kthread+0xd4/0xe4 ret_from_fork+0x10/0x20 [W] _raw_spin_lock_nested(&object->lock:0): [<ffffffc008102f34>] scan_block+0xb4/0x128 stacktrace: dept_wait+0x74/0x88 _raw_spin_lock_nested+0xa8/0x1b0 scan_block+0xb4/0x128 kmemleak_scan+0x19c/0x54c kmemleak_scan_thread+0xac/0xd4 kthread+0xd4/0xe4 ret_from_fork+0x10/0x20 [E] spin_unlock(kmemleak_lock:0): [<ffffffc008102ee0>] 
scan_block+0x60/0x128 stacktrace: dept_event+0x7c/0xfc _raw_spin_unlock_irqrestore+0x8c/0x120 scan_block+0x60/0x128 kmemleak_scan+0x19c/0x54c kmemleak_scan_thread+0xac/0xd4 kthread+0xd4/0xe4 ret_from_fork+0x10/0x20 --------------------------------------------------- information that might be helpful --------------------------------------------------- CPU: 1 PID: 38 Comm: kmemleak Tainted: G W 5.17.0-rc1+ #1 Hardware name: linux,dummy-virt (DT) Call trace: dump_backtrace.part.0+0x9c/0xc4 show_stack+0x14/0x28 dump_stack_lvl+0x9c/0xcc dump_stack+0x14/0x2c print_circle+0x2d4/0x438 cb_check_dl+0x6c/0x70 bfs+0xc0/0x168 add_dep+0x88/0x11c add_wait+0x2d0/0x2dc __dept_wait+0x8c/0xa4 dept_wait+0x6c/0x88 _raw_spin_lock_irqsave+0xb8/0x1c4 scan_block+0x3c/0x128 scan_gray_list+0xc4/0x13c kmemleak_scan+0x2d8/0x54c kmemleak_scan_thread+0xac/0xd4 kthread+0xd4/0xe4 ret_from_fork+0x10/0x20 > =================================================== > DEPT: Circular dependency has been detected. > 5.17.0-rc1+ #1 Tainted: G W > --------------------------------------------------- > summary > --------------------------------------------------- > *** AA DEADLOCK *** > > context A > [S] __raw_spin_lock_irqsave(&object->lock:0) > [W] _raw_spin_lock_nested(&object->lock:0) > [E] spin_unlock(&object->lock:0) > > [S]: start of the event context > [W]: the wait blocked > [E]: the event not reachable > --------------------------------------------------- > context A's detail > --------------------------------------------------- > context A > [S] __raw_spin_lock_irqsave(&object->lock:0) > [W] _raw_spin_lock_nested(&object->lock:0) > [E] spin_unlock(&object->lock:0) > > [S] __raw_spin_lock_irqsave(&object->lock:0): > [<ffffffc00810302c>] scan_gray_list+0x84/0x13c > stacktrace: > dept_ecxt_enter+0x88/0xf4 > _raw_spin_lock_irqsave+0xf0/0x1c4 > scan_gray_list+0x84/0x13c > kmemleak_scan+0x2d8/0x54c > kmemleak_scan_thread+0xac/0xd4 > kthread+0xd4/0xe4 > ret_from_fork+0x10/0x20 > > [E] spin_unlock(&object->lock:0): > [<ffffffc008102ee0>] scan_block+0x60/0x128 > --------------------------------------------------- > information that might be helpful > --------------------------------------------------- > CPU: 1 PID: 38 Comm: kmemleak Tainted: G W 5.17.0-rc1+ #1 > Hardware name: linux,dummy-virt (DT) > Call trace: > dump_backtrace.part.0+0x9c/0xc4 > show_stack+0x14/0x28 > dump_stack_lvl+0x9c/0xcc > dump_stack+0x14/0x2c > print_circle+0x2d4/0x438 > cb_check_dl+0x44/0x70 > bfs+0x60/0x168 > add_dep+0x88/0x11c > add_wait+0x2d0/0x2dc > __dept_wait+0x8c/0xa4 > dept_wait+0x6c/0x88 > _raw_spin_lock_nested+0xa8/0x1b0 > scan_block+0xb4/0x128 > scan_gray_list+0xc4/0x13c > kmemleak_scan+0x2d8/0x54c > kmemleak_scan_thread+0xac/0xd4 > kthread+0xd4/0xe4 > ret_from_fork+0x10/0x20 > [...] -- Thank you, You are awesome! Hyeonggon :-)
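To make the reported chain easier to follow, here is a minimal sketch of the scan-time locking pattern the two reports point at. It is reconstructed only from the stack traces and the kmemleak.c comment above, so it is not the actual mm/kmemleak.c code and every identifier in it should be read as an assumption.

```c
/*
 * Hedged sketch, reconstructed from the stack traces above - not the
 * real mm/kmemleak.c source.  scan_gray_list() holds object->lock while
 * scan_block() takes kmemleak_lock and then other objects' locks with a
 * nested subclass, which is the chain DEPT complains about.
 */
#include <linux/types.h>
#include <linux/spinlock.h>

static DEFINE_RAW_SPINLOCK(kmemleak_lock);

struct kmemleak_object {
	raw_spinlock_t lock;
	unsigned long pointer;
	size_t size;
};

/* Called with scanned->lock already held by scan_gray_list(). */
static void scan_block(void *start, void *end, struct kmemleak_object *scanned)
{
	unsigned long flags;
	void *ptr;

	/* [W] in context A: kmemleak_lock is taken under object->lock */
	raw_spin_lock_irqsave(&kmemleak_lock, flags);
	for (ptr = start; ptr < end; ptr += sizeof(void *)) {
		/* lookup of the referenced object is elided in this sketch */
		struct kmemleak_object *object = NULL;

		if (!object || object == scanned)
			continue;
		/* [W] in context B: other_object->lock, nested subclass */
		raw_spin_lock_nested(&object->lock, SINGLE_DEPTH_NESTING);
		/* ... colour the referenced object ... */
		raw_spin_unlock(&object->lock);
	}
	raw_spin_unlock_irqrestore(&kmemleak_lock, flags);
}

static void scan_gray_list(struct kmemleak_object *object)
{
	unsigned long flags;

	/* [S] in context A: object->lock is taken first */
	raw_spin_lock_irqsave(&object->lock, flags);
	scan_block((void *)object->pointer,
		   (void *)(object->pointer + object->size), object);
	raw_spin_unlock_irqrestore(&object->lock, flags);
}
```

In the report, context A corresponds to the scan_gray_list() path above (object->lock, then kmemleak_lock) and context B to a scan_block() call made without object->lock held (kmemleak_lock, then other_object->lock), which is what closes the circle in the summary.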
On Wed, Mar 02, 2022 at 04:53:41AM +0000, Hyeonggon Yoo wrote:
> On Wed, Mar 02, 2022 at 04:36:38AM +0000, Hyeonggon Yoo wrote:
> > On Mon, Feb 28, 2022 at 06:56:39PM +0900, Byungchul Park wrote:
> > > I didn't want to bother you so I was planning to send the next spin
> > > after making more progress. However, PATCH v2 reports too many false
> > > positives because Dept tracked the bit_wait_table[] wrong way - I
> > > apologize for that. So I decided to send PATCH v3 first before going
> > > further for those who want to run Dept for now.
> > >
> > > There might still be some false positives but not overwhelming.
> > >
> >
> > Hello Byungchul, I'm running DEPT v3 on my system
> > and I see report below.
> >
> > Looking at the kmemleak code and comment, I think
> > kmemleak tried to avoid lockdep recursive warning
> > but detected by DEPT?
> >
>
> Forgot to include another warning caused by DEPT.
>
> And comment below might be useful for debugging:
>
> in kmemleak.c:
>  43 * Locks and mutexes are acquired/nested in the following order:
>  44 *
>  45 *   scan_mutex [-> object->lock] -> kmemleak_lock -> other_object->lock (SINGLE_DEPTH_NESTING)
>  46 *
>  47 * No kmemleak_lock and object->lock nesting is allowed outside scan_mutex
>  48 * regions.
>
> ===================================================
> DEPT: Circular dependency has been detected.
> 5.17.0-rc1+ #1 Tainted: G W
> ---------------------------------------------------
> summary
> ---------------------------------------------------
> *** DEADLOCK ***
>
> context A
>     [S] __raw_spin_lock_irqsave(&object->lock:0)
>     [W] __raw_spin_lock_irqsave(kmemleak_lock:0)
>     [E] spin_unlock(&object->lock:0)
>
> context B
>     [S] __raw_spin_lock_irqsave(kmemleak_lock:0)
>     [W] _raw_spin_lock_nested(&object->lock:0)
>     [E] spin_unlock(kmemleak_lock:0)
>
> [S]: start of the event context
> [W]: the wait blocked
> [E]: the event not reachable

Hi Hyeonggon,

Dept also allows the following scenario when a user guarantees that
each lock instance is different from another at a different depth:

   lock A0 with depth
   lock A1 with depth + 1
   lock A2 with depth + 2
   lock A3 with depth + 3
   (and so on)
   ..
   unlock A3
   unlock A2
   unlock A1
   unlock A0

However, Dept does not allow the following scenario where another lock
class cuts in the dependency chain:

   lock A0 with depth
   lock B
   lock A1 with depth + 1
   lock A2 with depth + 2
   lock A3 with depth + 3
   (and so on)
   ..
   unlock A3
   unlock A2
   unlock A1
   unlock B
   unlock A0

This scenario is clearly problematic. What do you think is going to
happen with another context running the following?

   lock A1 with depth
   lock B
   lock A2 with depth + 1
   lock A3 with depth + 2
   (and so on)
   ..
   unlock A3
   unlock A2
   unlock B
   unlock A1

It's a deadlock. That's why Dept reports this case as a problem. Or am I
missing something?

Thanks,
Byungchul

> [...]
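The interleaving described in the mail above can be written out as a small, self-contained sketch. The functions and lock names below are made up for illustration (a0/a1/a2 play the role of object->lock instances at different nesting depths, lock_b plays kmemleak_lock) and do not come from any kernel source.

```c
/*
 * Hypothetical illustration of the risky pattern: a foreign lock (lock_b)
 * is taken between two nesting levels of the same lock class.  The
 * _nested() subclass keeps lockdep quiet but does not change runtime
 * behaviour.
 */
#include <linux/spinlock.h>

struct obj {
	raw_spinlock_t lock;
};

static DEFINE_RAW_SPINLOCK(lock_b);	/* plays the role of kmemleak_lock */

static void context_a(struct obj *a0, struct obj *a1)
{
	raw_spin_lock(&a0->lock);	/* lock A0 at depth 0 */
	raw_spin_lock(&lock_b);		/* lock B             */
	raw_spin_lock_nested(&a1->lock, SINGLE_DEPTH_NESTING);	/* lock A1 */
	raw_spin_unlock(&a1->lock);
	raw_spin_unlock(&lock_b);
	raw_spin_unlock(&a0->lock);
}

static void context_b(struct obj *a1, struct obj *a2)
{
	raw_spin_lock(&a1->lock);	/* lock A1 at depth 0 */
	raw_spin_lock(&lock_b);		/* lock B             */
	raw_spin_lock_nested(&a2->lock, SINGLE_DEPTH_NESTING);	/* lock A2 */
	raw_spin_unlock(&a2->lock);
	raw_spin_unlock(&lock_b);
	raw_spin_unlock(&a1->lock);
}
```

If context_a() already holds a0->lock and lock_b while context_b() holds a1->lock, then context_a() blocks on a1->lock and context_b() blocks on lock_b: a circular wait on the very instances the nesting annotation was supposed to make safe.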
On Wed, Mar 02, 2022 at 04:36:38AM +0000, Hyeonggon Yoo wrote: > On Mon, Feb 28, 2022 at 06:56:39PM +0900, Byungchul Park wrote: > > I didn't want to bother you so I was planning to send the next spin > > after making more progress. However, PATCH v2 reports too many false > > positives because Dept tracked the bit_wait_table[] wrong way - I > > apologize for that. So I decided to send PATCH v3 first before going > > further for those who want to run Dept for now. > > > > There might still be some false positives but not overwhelming. > > > > Hello Byungchul, I'm running DEPT v3 on my system > and I see report below. > > Looking at the kmemleak code and comment, I think > kmemleak tried to avoid lockdep recursive warning > but detected by DEPT? > > =================================================== > DEPT: Circular dependency has been detected. > 5.17.0-rc1+ #1 Tainted: G W > --------------------------------------------------- > summary > --------------------------------------------------- > *** AA DEADLOCK *** > > context A > [S] __raw_spin_lock_irqsave(&object->lock:0) > [W] _raw_spin_lock_nested(&object->lock:0) > [E] spin_unlock(&object->lock:0) > > [S]: start of the event context > [W]: the wait blocked > [E]: the event not reachable > --------------------------------------------------- > context A's detail > --------------------------------------------------- > context A > [S] __raw_spin_lock_irqsave(&object->lock:0) > [W] _raw_spin_lock_nested(&object->lock:0) > [E] spin_unlock(&object->lock:0) > > [S] __raw_spin_lock_irqsave(&object->lock:0): > [<ffffffc00810302c>] scan_gray_list+0x84/0x13c > stacktrace: > dept_ecxt_enter+0x88/0xf4 > _raw_spin_lock_irqsave+0xf0/0x1c4 > scan_gray_list+0x84/0x13c > kmemleak_scan+0x2d8/0x54c > kmemleak_scan_thread+0xac/0xd4 > kthread+0xd4/0xe4 > ret_from_fork+0x10/0x20 [W]'s stack trace is missed. But I guess this issue is the same issue of what you reported following this one. We can discuss this issue on the other report's thread. Thanks, Byunghcul > [E] spin_unlock(&object->lock:0): > [<ffffffc008102ee0>] scan_block+0x60/0x128 > --------------------------------------------------- > information that might be helpful > --------------------------------------------------- > CPU: 1 PID: 38 Comm: kmemleak Tainted: G W 5.17.0-rc1+ #1 > Hardware name: linux,dummy-virt (DT) > Call trace: > dump_backtrace.part.0+0x9c/0xc4 > show_stack+0x14/0x28 > dump_stack_lvl+0x9c/0xcc > dump_stack+0x14/0x2c > print_circle+0x2d4/0x438 > cb_check_dl+0x44/0x70 > bfs+0x60/0x168 > add_dep+0x88/0x11c > add_wait+0x2d0/0x2dc > __dept_wait+0x8c/0xa4 > dept_wait+0x6c/0x88 > _raw_spin_lock_nested+0xa8/0x1b0 > scan_block+0xb4/0x128 > scan_gray_list+0xc4/0x13c > kmemleak_scan+0x2d8/0x54c > kmemleak_scan_thread+0xac/0xd4 > kthread+0xd4/0xe4 > ret_from_fork+0x10/0x20 > > > --- > > > > Hi Linus and folks, > > > > I've been developing a tool for detecting deadlock possibilities by > > tracking wait/event rather than lock(?) acquisition order to try to > > cover all synchonization machanisms. It's done on v5.17-rc1 tag. > > > > https://github.com/lgebyungchulpark/linux-dept/commits/dept1.14_on_v5.17-rc1 > > > [...] > > Benifit: > > > > 0. Works with all lock primitives. > > 1. Works with wait_for_completion()/complete(). > > 2. Works with 'wait' on PG_locked. > > 3. Works with 'wait' on PG_writeback. > > 4. Works with swait/wakeup. > > 5. Works with waitqueue. > > 6. Multiple reports are allowed. > > 7. Deduplication control on multiple reports. > > 8. 
Withstand false positives thanks to 6. > > 9. Easy to tag any wait/event. > > > > Future work: > > > > 0. To make it more stable. > > 1. To separates Dept from Lockdep. > > 2. To improves performance in terms of time and space. > > 3. To use Dept as a dependency engine for Lockdep. > > 4. To add any missing tags of wait/event in the kernel. > > 5. To deduplicate stack trace. > > > > How to interpret reports: > > > > 1. E(event) in each context cannot be triggered because of the > > W(wait) that cannot be woken. > > 2. The stack trace helping find the problematic code is located > > in each conext's detail. > > > > Thanks, > > Byungchul > > > > --- > > > > Changes from v2: > > > > 1. Disable Dept on bit_wait_table[] in sched/wait_bit.c > > reporting a lot of false positives, which is my fault. > > Wait/event for bit_wait_table[] should've been tagged in a > > higher layer for better work, which is a future work. > > (feedback from Jan Kara) > > 2. Disable Dept on crypto_larval's completion to prevent a false > > positive. > > > > Changes from v1: > > > > 1. Fix coding style and typo. (feedback from Steven) > > 2. Distinguish each work context from another in workqueue. > > 3. Skip checking lock acquisition with nest_lock, which is about > > correct lock usage that should be checked by Lockdep. > > > > Changes from RFC: > > > > 1. Prevent adding a wait tag at prepare_to_wait() but __schedule(). > > (feedback from Linus and Matthew) > > 2. Use try version at lockdep_acquire_cpus_lock() annotation. > > 3. Distinguish each syscall context from another. > [ ... ] > > -- > Thank you, You are awesome! > Hyeonggon :-)
On Thu, Mar 03, 2022 at 09:18:13AM +0900, Byungchul Park wrote:
> On Wed, Mar 02, 2022 at 04:53:41AM +0000, Hyeonggon Yoo wrote:
> > [...]
>
> Hi Hyeonggon,
>
> Dept also allows the following scenario when a user guarantees that
> each lock instance is different from another at a different depth:
>
>    lock A0 with depth
>    lock A1 with depth + 1
>    lock A2 with depth + 2
>    lock A3 with depth + 3
>    (and so on)
>    ..
>    unlock A3
>    unlock A2
>    unlock A1
>    unlock A0
>
> However, Dept does not allow the following scenario where another lock
> class cuts in the dependency chain:
>
>    lock A0 with depth
>    lock B
>    lock A1 with depth + 1
>    lock A2 with depth + 2
>    lock A3 with depth + 3
>    (and so on)
>    ..
>    unlock A3
>    unlock A2
>    unlock A1
>    unlock B
>    unlock A0
>
> This scenario is clearly problematic. What do you think is going to
> happen with another context running the following?

First of all, I want to say I'm not an expert at locking primitives.
I may be wrong.

> > 45 *   scan_mutex [-> object->lock] -> kmemleak_lock -> other_object->lock (SINGLE_DEPTH_NESTING)
> > 46 *
> > 47 * No kmemleak_lock and object->lock nesting is allowed outside scan_mutex
> > 48 * regions.

The lock order in kmemleak is described above.

And DEPT detects two cases as deadlocks:

1) object->lock -> other_object->lock
2) object->lock -> kmemleak_lock, kmemleak_lock -> other_object->lock

And in the kmemleak case, 1) and 2) are not possible because it must
hold scan_mutex first.

I think the author of kmemleak intended lockdep to treat object->lock
and other_object->lock as different classes, using raw_spin_lock_nested().

Am I missing something?

Thanks.

> lock A1 with depth
> lock B
> lock A2 with depth + 1
> lock A3 with depth + 2
> (and so on)
> ..
> unlock A3
> unlock A2
> unlock B
> unlock A1
>
> It's a deadlock. That's why Dept reports this case as a problem. Or am I
> missing something?
>
> Thanks,
> Byungchul
>
> > [...]
On Thu, Mar 03, 2022 at 08:03:21AM +0000, Hyeonggon Yoo wrote:
> On Thu, Mar 03, 2022 at 09:18:13AM +0900, Byungchul Park wrote:
> > Hi Hyeonggon,
> >
> > Dept also allows the following scenario when a user guarantees that
> > each lock instance is different from another at a different depth:
> >
> >    lock A0 with depth
> >    lock A1 with depth + 1
> >    lock A2 with depth + 2
> >    lock A3 with depth + 3
> >    (and so on)
> >    ..
> >    unlock A3
> >    unlock A2
> >    unlock A1
> >    unlock A0

Look at this. Dept allows object->lock -> other_object->lock (with a
different depth using *_lock_nested()) so it won't report it.

> > However, Dept does not allow the following scenario where another lock
> > class cuts in the dependency chain:
> >
> >    lock A0 with depth
> >    lock B
> >    lock A1 with depth + 1
> >    lock A2 with depth + 2
> >    lock A3 with depth + 3
> >    (and so on)
> >    ..
> >    unlock A3
> >    unlock A2
> >    unlock A1
> >    unlock B
> >    unlock A0
> >
> > This scenario is clearly problematic. What do you think is going to
> > happen with another context running the following?
> >
>
> First of all, I want to say I'm not an expert at locking primitives.
> I may be wrong.

It's okay. Thanks anyway for your feedback.

> > > 45 *   scan_mutex [-> object->lock] -> kmemleak_lock -> other_object->lock (SINGLE_DEPTH_NESTING)
> > > 46 *
> > > 47 * No kmemleak_lock and object->lock nesting is allowed outside scan_mutex
> > > 48 * regions.
>
> The lock order in kmemleak is described above.
>
> And DEPT detects two cases as deadlocks:
>
> 1) object->lock -> other_object->lock

It's not a deadlock *IF* the two have different depths using
*_lock_nested(). Dept also allows this case. So Dept wouldn't report it.

> 2) object->lock -> kmemleak_lock, kmemleak_lock -> other_object->lock

But this usage is risky. I already explained it in the mail you replied
to. I copied it. See below.

context A

> >    lock A0 with depth
> >    lock B
> >    lock A1 with depth + 1
> >    lock A2 with depth + 2
> >    lock A3 with depth + 3
> >    (and so on)
> >    ..
> >    unlock A3
> >    unlock A2
> >    unlock A1
> >    unlock B
> >    unlock A0

...

context B

> >    lock A1 with depth
> >    lock B
> >    lock A2 with depth + 1
> >    lock A3 with depth + 2
> >    (and so on)
> >    ..
> >    unlock A3
> >    unlock A2
> >    unlock B
> >    unlock A1

where Ax : object->lock, B : kmemleak_lock.

A deadlock might occur if the two contexts run at the same time.

> And in the kmemleak case, 1) and 2) are not possible because it must
> hold scan_mutex first.

This is another issue. Let's focus on whether the order is okay for now.

> I think the author of kmemleak intended lockdep to treat object->lock
> and other_object->lock as different classes, using raw_spin_lock_nested().

Yes. The author meant to assign a different class according to its depth
using a Lockdep API. Strictly speaking, those are the same class anyway
but we assign a different class to each depth to avoid Lockdep splats
*IF* the user guarantees the nesting lock usage is safe, IOW, guarantees
each lock instance is different at a different depth.

I was fundamentally asking you... so... is the nesting lock usage safe
for real? I hope you distinguish between the safe case and the risky
case when *_lock_nested() is involved. Thoughts?

Thanks,
Byungchul

> [...]
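For comparison, this is the contract that *_lock_nested() expresses to Lockdep, shown on a hypothetical parent/child pair (the struct and function below are invented for illustration, not taken from the kernel): the caller promises that the instance acquired at SINGLE_DEPTH_NESTING is a different object from the one already held at subclass 0, and Lockdep then files the two acquisitions under different classes and stays quiet.

```c
/*
 * Hypothetical example of the "safe nesting" promise behind
 * spin_lock_nested(): parent and child are guaranteed to be different
 * lock instances, so taking them in this order cannot self-deadlock.
 */
#include <linux/spinlock.h>

struct node {
	spinlock_t lock;
};

static void lock_pair(struct node *parent, struct node *child)
{
	spin_lock(&parent->lock);				/* subclass 0 */
	spin_lock_nested(&child->lock, SINGLE_DEPTH_NESTING);	/* subclass 1 */
	/* ... operate on both nodes ... */
	spin_unlock(&child->lock);
	spin_unlock(&parent->lock);
}
```

Whether Dept should trust that same promise, or additionally require that no other lock class cuts in between the two levels, is exactly the question being discussed here.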
On Thu, Mar 03, 2022 at 06:48:24PM +0900, Byungchul Park wrote: > On Thu, Mar 03, 2022 at 08:03:21AM +0000, Hyeonggon Yoo wrote: > > On Thu, Mar 03, 2022 at 09:18:13AM +0900, Byungchul Park wrote: > > > Hi Hyeonggon, > > > > > > Dept also allows the following scenario when an user guarantees that > > > each lock instance is different from another at a different depth: > > > > > > lock A0 with depth > > > lock A1 with depth + 1 > > > lock A2 with depth + 2 > > > lock A3 with depth + 3 > > > (and so on) > > > .. > > > unlock A3 > > > unlock A2 > > > unlock A1 > > > unlock A0 > [+Cc kmemleak maintainer] > Look at this. Dept allows object->lock -> other_object->lock (with a > different depth using *_lock_nested()) so won't report it. > No, It did. S: object->lock ( _raw_spin_lock_irqsave) W: other_object->lock (_raw_spin_lock_nested) DEPT reported this as AA deadlock. =================================================== DEPT: Circular dependency has been detected. 5.17.0-rc1+ #1 Tainted: G W --------------------------------------------------- summary --------------------------------------------------- *** AA DEADLOCK *** context A [S] __raw_spin_lock_irqsave(&object->lock:0) [W] _raw_spin_lock_nested(&object->lock:0) [E] spin_unlock(&object->lock:0) [S]: start of the event context [W]: the wait blocked [E]: the event not reachable --------------------------------------------------- context A's detail --------------------------------------------------- context A [S] __raw_spin_lock_irqsave(&object->lock:0) [W] _raw_spin_lock_nested(&object->lock:0) [E] spin_unlock(&object->lock:0) --------------------------------------------------- context A's detail --------------------------------------------------- context A [S] __raw_spin_lock_irqsave(&object->lock:0) [W] _raw_spin_lock_nested(&object->lock:0) [E] spin_unlock(&object->lock:0) [S] __raw_spin_lock_irqsave(&object->lock:0): [<ffffffc00810302c>] scan_gray_list+0x84/0x13c stacktrace: dept_ecxt_enter+0x88/0xf4 _raw_spin_lock_irqsave+0xf0/0x1c4 scan_gray_list+0x84/0x13c kmemleak_scan+0x2d8/0x54c kmemleak_scan_thread+0xac/0xd4 kthread+0xd4/0xe4 ret_from_fork+0x10/0x20 [W] _raw_spin_lock_nested(&object->lock:0): [<ffffffc008102f34>] scan_block+0xb4/0x128 stacktrace: __dept_wait+0x8c/0xa4 dept_wait+0x6c/0x88 _raw_spin_lock_nested+0xa8/0x1b0 scan_block+0xb4/0x128 scan_gray_list+0xc4/0x13c kmemleak_scan+0x2d8/0x54c kmemleak_scan_thread+0xac/0xd4 kthread+0xd4/0xe4 ret_from_fork+0x10/0x20 [E] spin_unlock(&object->lock:0): [<ffffffc008102ee0>] scan_block+0x60/0x128 --------------------------------------------------- information that might be helpful --------------------------------------------------- CPU: 2 PID: 38 Comm: kmemleak Tainted: G W 5.17.0-rc1+ #1 Hardware name: linux,dummy-virt (DT) Call trace: dump_backtrace.part.0+0x9c/0xc4 show_stack+0x14/0x28 dump_stack_lvl+0x9c/0xcc dump_stack+0x14/0x2c print_circle+0x2d4/0x438 cb_check_dl+0x44/0x70 bfs+0x60/0x168 add_dep+0x88/0x11c add_wait+0x2d0/0x2dc __dept_wait+0x8c/0xa4 dept_wait+0x6c/0x88 _raw_spin_lock_nested+0xa8/0x1b0 scan_block+0xb4/0x128 scan_gray_list+0xc4/0x13c kmemleak_scan+0x2d8/0x54c kmemleak_scan_thread+0xac/0xd4 kthread+0xd4/0xe4 ret_from_fork+0x10/0x20 > > > However, Dept does not allow the following scenario where another lock > > > class cuts in the dependency chain: > > > > > > lock A0 with depth > > > lock B > > > lock A1 with depth + 1 > > > lock A2 with depth + 2 > > > lock A3 with depth + 3 > > > (and so on) > > > .. 
> > > unlock A3 > > > unlock A2 > > > unlock A1 > > > unlock B > > > unlock A0 > > > > > > This scenario is clearly problematic. What do you think is going to > > > happen with another context running the following? > > > > > > > First of all, I want to say I'm not expert at locking primitives. > > I may be wrong. > > It's okay. Thanks anyway for your feedback. > Thanks. > > > > 45 * scan_mutex [-> object->lock] -> kmemleak_lock -> other_object->lock (SINGLE_DEPTH_NESTING) > > > > 46 * > > > > 47 * No kmemleak_lock and object->lock nesting is allowed outside scan_mutex > > > > 48 * regions. > > > > lock order in kmemleak is described above. > > > > and DEPT detects two cases as deadlock: > > > > 1) object->lock -> other_object->lock > > It's not a deadlock *IF* two have different depth using *_lock_nested(). > Dept also allows this case. So Dept wouldn't report it. > > > 2) object->lock -> kmemleak_lock, kmemleak_lock -> other_object->lock > > But this usage is risky. I already explained it in the mail you replied > to. I copied it. See the below. > I understand why you said this is risky. Its lock ordering is not good. > context A > > > lock A0 with depth > > > lock B > > > lock A1 with depth + 1 > > > lock A2 with depth + 2 > > > lock A3 with depth + 3 > > > (and so on) > > > .. > > > unlock A3 > > > unlock A2 > > > unlock A1 > > > unlock B > > > unlock A0 > > ... > > context B > > > lock A1 with depth > > > lock B > > > lock A2 with depth + 1 > > > lock A3 with depth + 2 > > > (and so on) > > > .. > > > unlock A3 > > > unlock A2 > > > unlock B > > > unlock A1 > > where Ax : object->lock, B : kmemleak_lock. > > A deadlock might occur if the two contexts run at the same time. > But I want to say kmemleak is getting things under control. No two contexts can run at same time. > > And in kmemleak case, 1) and 2) is not possible because it must hold > > scan_mutex first. > > This is another issue. Let's focus on whether the order is okay for now. > Why is it another issue? > > I think the author of kmemleak intended lockdep to treat object->lock > > and other_object->lock as different class, using raw_spin_lock_nested(). > > Yes. The author meant to assign a different class according to its depth > using a Lockdep API. Strictly speaking, those are the same class anyway > but we assign a different class to each depth to avoid Lockdep splats > *IF* the user guarantees the nesting lock usage is safe, IOW, guarantees > each lock instance is different at a different depth. Then why DEPT reports 1) and 2) as deadlock? Does DEPT assign same class unlike Lockdep? > I was fundamentally asking you... so... is the nesting lock usage safe > for real? I don't get what the point is. I agree it's not a good lock ordering. But in kmemleak case, I think kmemleak is getting things under control.
On Thu, Mar 03, 2022 at 12:38:39PM +0000, Hyeonggon Yoo wrote:
> On Thu, Mar 03, 2022 at 06:48:24PM +0900, Byungchul Park wrote:
> > On Thu, Mar 03, 2022 at 08:03:21AM +0000, Hyeonggon Yoo wrote:
> > > On Thu, Mar 03, 2022 at 09:18:13AM +0900, Byungchul Park wrote:
> > > > [...]
>
> [+Cc kmemleak maintainer]
>
> > Look at this. Dept allows object->lock -> other_object->lock (with a
> > different depth using *_lock_nested()) so it won't report it.
>
> No, It did.

Yes, you are right. I should've asked you to resend the AA deadlock
report when I found [W]'s stacktrace was missing in what you shared,
and I should've taken a closer look at it.

Dept normally doesn't report this type of AA deadlock, but it does in
the case we are talking about, where another lock class cuts in between
the nesting locks. I will fix it. The AA deadlock report here doesn't
make sense. Thank you.

However, the other report below still makes sense.

> > > > > 45 *   scan_mutex [-> object->lock] -> kmemleak_lock -> other_object->lock (SINGLE_DEPTH_NESTING)
> > > > > 46 *
> > > > > 47 * No kmemleak_lock and object->lock nesting is allowed outside scan_mutex
> > > > > 48 * regions.
> > >
> > > The lock order in kmemleak is described above.
> > >
> > > And DEPT detects two cases as deadlocks:
> > >
> > > 1) object->lock -> other_object->lock
> >
> > It's not a deadlock *IF* the two have different depths using
> > *_lock_nested(). Dept also allows this case. So Dept wouldn't report it.
> >
> > > 2) object->lock -> kmemleak_lock, kmemleak_lock -> other_object->lock
> >
> > But this usage is risky. I already explained it in the mail you replied
> > to. I copied it. See below.
>
> I understand why you said this is risky.
> Its lock ordering is not good.
>
> > [...]
> >
> > where Ax : object->lock, B : kmemleak_lock.
> >
> > A deadlock might occur if the two contexts run at the same time.
>
> But I want to say kmemleak is getting things under control. No two contexts
> can run at same time.

So... do you think the below is also okay, because lock C and lock B
are under control?

   context X			context Y

   lock mutex A			lock mutex A
   lock B			lock C
   lock C			lock B
   unlock C			unlock B
   unlock B			unlock C
   unlock mutex A		unlock mutex A

In my opinion, lock B and lock C are unnecessary if they are always
taken along with mutex A. Or we should keep the correct lock order
across all the code.

> > > And in the kmemleak case, 1) and 2) are not possible because it must
> > > hold scan_mutex first.
> >
> > This is another issue. Let's focus on whether the order is okay for now.
>
> Why is it another issue?

You seem to insist that locking order is not important *if* they are
under control by serializing the sections. I meant this is another
issue.

> > > I think the author of kmemleak intended lockdep to treat object->lock
> > > and other_object->lock as different classes, using raw_spin_lock_nested().
> >
> > Yes. The author meant to assign a different class according to its depth
> > using a Lockdep API. Strictly speaking, those are the same class anyway
> > but we assign a different class to each depth to avoid Lockdep splats
> > *IF* the user guarantees the nesting lock usage is safe, IOW, guarantees
> > each lock instance is different at a different depth.
>
> Then why DEPT reports 1) and 2) as deadlock?

1) will be fixed so that Dept doesn't report it. But I still think the
case 2) should be reported for the wrong usage.

Thanks,
Byungchul

> Does DEPT assign same class unlike Lockdep?
>
> > I was fundamentally asking you... so... is the nesting lock usage safe
> > for real?
>
> I don't get what the point is. I agree it's not a good lock ordering.
> But in kmemleak case, I think kmemleak is getting things under control.
>
> [...]
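The situation questioned in the last mail can be sketched as follows; the locks are hypothetical and not taken from kmemleak. Both contexts serialize on mutex_a, so the opposite B/C ordering cannot actually deadlock today, but that safety rests entirely on every path that takes B and C also taking mutex_a first, which is the point of contention.

```c
/*
 * Hypothetical sketch of an ABBA ordering that is "under control"
 * only because both paths are serialized by an outer mutex.
 */
#include <linux/mutex.h>
#include <linux/spinlock.h>

static DEFINE_MUTEX(mutex_a);
static DEFINE_SPINLOCK(lock_b);
static DEFINE_SPINLOCK(lock_c);

static void context_x(void)
{
	mutex_lock(&mutex_a);
	spin_lock(&lock_b);
	spin_lock(&lock_c);		/* B -> C under the mutex */
	spin_unlock(&lock_c);
	spin_unlock(&lock_b);
	mutex_unlock(&mutex_a);
}

static void context_y(void)
{
	mutex_lock(&mutex_a);
	spin_lock(&lock_c);
	spin_lock(&lock_b);		/* C -> B, the opposite order */
	spin_unlock(&lock_b);
	spin_unlock(&lock_c);
	mutex_unlock(&mutex_a);
}
```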