[RFC,v7,00/23] DEPT(Dependency Tracker)

Message ID 1673235231-30302-1-git-send-email-byungchul.park@lge.com

Message

Byungchul Park Jan. 9, 2023, 3:33 a.m. UTC
Just for those who want to try the latest version of DEPT.

---

Hi Linus and folks,

I've been developing a tool for detecting deadlock possibilities by
tracking waits/events rather than lock acquisition order, in order to
cover all synchronization mechanisms. It's based on v6.2-rc2.

https://github.com/lgebyungchulpark/linux-dept/commits/dept2.0_on_v6.2-rc2

Benefits:

	0. Works with all lock primitives.
	1. Works with wait_for_completion()/complete().
	2. Works with 'wait' on PG_locked.
	3. Works with 'wait' on PG_writeback.
	4. Works with swait/wakeup.
	5. Works with waitqueue.
	6. Multiple reports are allowed.
	7. Deduplication control on multiple reports.
	8. Withstands false positives thanks to 6.
	9. Easy to tag any wait/event (see the annotation sketch below).
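
For reference, the annotation pattern these patches add around a wait
site looks roughly like the sketch below (a sketch only; the exact API
lives in include/linux/dept_sdt.h in the series, and argument and
variant choice depend on the wait site):

	/*
	 * Rough sketch of the SDT annotation pattern used by this
	 * series around a sleeping wait. NULL means the default class;
	 * *_timeout() variants exist for waits with a timeout set.
	 */
	sdt_might_sleep_strong(NULL);	/* about to wait for an event */
	/* ... the code that actually sleeps until the event ... */
	sdt_might_sleep_finish();	/* the wait site is done */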

Future work:

	0. To make it more stable.
	1. To separate Dept from Lockdep.
	2. To improve performance in terms of time and space.
	3. To use Dept as a dependency engine for Lockdep.
	4. To add any missing tags of wait/event in the kernel.
	5. To deduplicate stack traces.

How to interpret reports:

	1. The E (event) in each context cannot be triggered because of
	   the W (wait) in that context that cannot be woken up.
	2. The stack trace that helps find the problematic code is shown
	   in each context's detail. (See the illustration below.)
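
As a hypothetical illustration of the wait/event view (not actual DEPT
output; the completion and thread names are made up for this sketch),
the classic two-context circle looks like this when expressed as waits
and events:

	#include <linux/completion.h>

	static DECLARE_COMPLETION(comp_a);
	static DECLARE_COMPLETION(comp_b);

	/* Context X */
	static int thread_x(void *unused)
	{
		wait_for_completion(&comp_a);	/* W: waits for event E_a */
		complete(&comp_b);		/* E_b: never reached */
		return 0;
	}

	/* Context Y */
	static int thread_y(void *unused)
	{
		wait_for_completion(&comp_b);	/* W: waits for event E_b */
		complete(&comp_a);		/* E_a: never reached */
		return 0;
	}

	/*
	 * Neither E_a nor E_b can be triggered because the W ahead of
	 * it in its own context cannot be woken up; the two W/E pairs
	 * form a circle, which is what a report points at, with one
	 * stack trace per context.
	 */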

Thanks,
Byungchul

---

Changes from v6:

	1. Tie into the task scheduler code to track sleeps and
	   try_to_wake_up(), assuming that sleeps are waits and that
	   try_to_wake_up() calls are the events those waits are waiting
	   for, given proper DEPT annotations such as
	   sdt_might_sleep_weak() and sdt_might_sleep_strong(). For
	   these cases, the class is assigned at the sleep entrance
	   rather than at the synchronization initialization code, which
	   greatly reduces false alarms.
	2. Remove the DEPT instance from each page struct that was used
	   to track dependencies on PG_locked and PG_writeback, thanks
	   to item 1 above.
	3. Introduce CONFIG_DEPT_AGGRESIVE_TIMEOUT_WAIT to suppress
	   reports involving waits with a timeout set, for those who
	   don't like verbose reporting.
	4. Add a mechanism to refill the internal memory pools when they
	   run out so that DEPT can keep working as long as free memory
	   is available in the system.
	5. Re-enable tracking of hashed-waitqueue waits. This no longer
	   generates false positives because the class is assigned at
	   the sleep entrance rather than at waitqueue initialization.
	6. Refactor to make it easier to port onto each new version of
	   the kernel.
	7. Apply DEPT to dma fence.
	8. Do trivial optimizations.

Changes from v5:

	1. Use pr_warn_once() rather than WARN_ONCE() on a lack of
	   internal resources, because having WARN_*() print a stack
	   trace is too much for reporting that condition. (feedback
	   from Ted, Hyeonggon)
	2. Fix trivial bugs such as failing to initialize a struct
	   before using it.
	3. Assign a different class per task when handling on-stack
	   variables for waitqueues and the like, which lets Dept
	   distinguish on-stack variables of different tasks and
	   prevents false positives. (reported by Hyeonggon)
	4. Make Dept aware of even raw_local_irq_*() to prevent false
	   positives. (reported by Hyeonggon)
	5. Don't consider dependencies between events that might be
	   triggered within __schedule() and waits that require
	   __schedule() to be real ones. (reported by Hyeonggon)
	6. Unstage a staged wait that has gone through
	   prepare_to_wait_event() but has not yet reached __schedule(),
	   if __schedule() is encountered in between for another sleep,
	   which can happen if e.g. a mutex_lock() sits in the
	   'condition' of ___wait_event(). (See the sketch after this
	   list.)
	7. Turn on CONFIG_PROVE_LOCKING when CONFIG_DEPT is on, to rely
	   on the hardirq and softirq entrance tracing to make Dept more
	   portable for now.
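
Regarding item 6 above, a hypothetical sketch of the pattern in
question; wq, lock and check_state() are made-up names for illustration
only:

	#include <linux/wait.h>
	#include <linux/mutex.h>

	static DECLARE_WAIT_QUEUE_HEAD(wq);
	static DEFINE_MUTEX(lock);

	static bool check_state(void)
	{
		return true;	/* stand-in for the real condition */
	}

	static void wait_with_sleeping_condition(void)
	{
		/*
		 * The 'condition' below sleeps through mutex_lock(), so
		 * __schedule() is reached for that inner sleep before
		 * the outer wait itself gets there; that is the staged
		 * wait that has to be unstaged.
		 */
		wait_event(wq, ({
			bool done;

			mutex_lock(&lock);
			done = check_state();
			mutex_unlock(&lock);
			done;
		}));
	}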

Changes from v4:

	1. Fix some bugs that produce false alarms.
	2. Distinguish each syscall context from another *for arm64*.
	3. Just print a message instead of warning when the Dept ring
	   buffer gets exhausted. (feedback from Hyeonggon)
	4. Explicitly describe "EXPERIMENTAL" and "Dept might produce
	   false positive reports" in Kconfig. (feedback from Ted)

Changes from v3:

	1. Dept shouldn't create dependencies between different depths
	   of a class indicated by *_lock_nested(). Dept normally
	   doesn't, but it did once another lock class came in. Fixed
	   it. (feedback from Hyeonggon)
	2. Dept considered a wait a real wait once it got to
	   __schedule(), even if the task had already been set to
	   TASK_RUNNING by a wake-up source. Fixed it so that Dept no
	   longer considers that case a real wait. (feedback from Jan
	   Kara)
	3. Stop tracking dependencies with a map once the event
	   associated with the map has been handled. Dept will start to
	   work with the map again, on the next sleep.

Changes from v2:

	1. Disable Dept on bit_wait_table[] in sched/wait_bit.c, which
	   was reporting a lot of false positives; that was my fault.
	   The wait/event for bit_wait_table[] should be tagged in a
	   higher layer to work better, which is future work. (feedback
	   from Jan Kara)
	2. Disable Dept on crypto_larval's completion to prevent a false
	   positive.

Changes from v1:

	1. Fix coding style and typo. (feedback from Steven)
	2. Distinguish each work context from another in workqueue.
	3. Skip checking lock acquisition with nest_lock, which is about
	   correct lock usage that should be checked by Lockdep.

Changes from RFC(v0):

	1. Add the wait tag at __schedule() rather than at
	   prepare_to_wait(). (feedback from Linus and Matthew)
	2. Use the try version for the lockdep_acquire_cpus_lock() annotation.
	3. Distinguish each syscall context from another.

Byungchul Park (23):
  llist: Move llist_{head,node} definition to types.h
  dept: Implement Dept(Dependency Tracker)
  dept: Add single event dependency tracker APIs
  dept: Add lock dependency tracker APIs
  dept: Tie to Lockdep and IRQ tracing
  dept: Add proc knobs to show stats and dependency graph
  dept: Apply sdt_might_sleep_strong() to
    wait_for_completion()/complete()
  dept: Apply sdt_might_sleep_strong() to PG_{locked,writeback} wait
  dept: Apply sdt_might_sleep_weak() to swait
  dept: Apply sdt_might_sleep_weak() to waitqueue wait
  dept: Apply sdt_might_sleep_weak() to hashed-waitqueue wait
  dept: Distinguish each syscall context from another
  dept: Distinguish each work from another
  dept: Add a mechanism to refill the internal memory pools on running
    out
  locking/lockdep, cpu/hotplug: Use a weaker annotation in AP thread
  dept: Apply sdt_might_sleep_strong() to dma fence wait
  dept: Track timeout waits separately with a new Kconfig
  dept: Apply timeout consideration to wait_for_completion()/complete()
  dept: Apply timeout consideration to swait
  dept: Apply timeout consideration to waitqueue wait
  dept: Apply timeout consideration to hashed-waitqueue wait
  dept: Apply timeout consideration to dma fence wait
  dept: Record the latest one out of consecutive waits of the same class

 arch/arm64/kernel/syscall.c         |    2 +
 arch/x86/entry/common.c             |    4 +
 drivers/dma-buf/dma-fence.c         |   11 +
 include/linux/completion.h          |  102 +-
 include/linux/dept.h                |  600 +++++++
 include/linux/dept_ldt.h            |   77 +
 include/linux/dept_sdt.h            |  101 ++
 include/linux/hardirq.h             |    3 +
 include/linux/irqflags.h            |   22 +-
 include/linux/llist.h               |    8 -
 include/linux/local_lock_internal.h |    1 +
 include/linux/lockdep.h             |  108 +-
 include/linux/lockdep_types.h       |    3 +
 include/linux/mutex.h               |    1 +
 include/linux/percpu-rwsem.h        |    2 +-
 include/linux/rtmutex.h             |    1 +
 include/linux/rwlock_types.h        |    1 +
 include/linux/rwsem.h               |    1 +
 include/linux/sched.h               |    3 +
 include/linux/seqlock.h             |    2 +-
 include/linux/spinlock_types_raw.h  |    3 +
 include/linux/srcu.h                |    2 +-
 include/linux/swait.h               |    6 +
 include/linux/types.h               |    8 +
 include/linux/wait.h                |    6 +
 include/linux/wait_bit.h            |    6 +
 init/init_task.c                    |    2 +
 init/main.c                         |    2 +
 kernel/Makefile                     |    1 +
 kernel/cpu.c                        |    2 +-
 kernel/dependency/Makefile          |    4 +
 kernel/dependency/dept.c            | 3120 +++++++++++++++++++++++++++++++++++
 kernel/dependency/dept_hash.h       |   10 +
 kernel/dependency/dept_internal.h   |   26 +
 kernel/dependency/dept_object.h     |   13 +
 kernel/dependency/dept_proc.c       |   93 ++
 kernel/exit.c                       |    1 +
 kernel/fork.c                       |    2 +
 kernel/locking/lockdep.c            |   23 +
 kernel/module/main.c                |    2 +
 kernel/sched/completion.c           |   60 +-
 kernel/sched/core.c                 |    9 +
 kernel/workqueue.c                  |    3 +
 lib/Kconfig.debug                   |   37 +
 lib/locking-selftest.c              |    2 +
 mm/filemap.c                        |   11 +
 46 files changed, 4432 insertions(+), 75 deletions(-)
 create mode 100644 include/linux/dept.h
 create mode 100644 include/linux/dept_ldt.h
 create mode 100644 include/linux/dept_sdt.h
 create mode 100644 kernel/dependency/Makefile
 create mode 100644 kernel/dependency/dept.c
 create mode 100644 kernel/dependency/dept_hash.h
 create mode 100644 kernel/dependency/dept_internal.h
 create mode 100644 kernel/dependency/dept_object.h
 create mode 100644 kernel/dependency/dept_proc.c

Comments

Linus Torvalds Jan. 16, 2023, 6 p.m. UTC | #1
[ Back from travel, so trying to make sense of this series.. ]

On Sun, Jan 8, 2023 at 7:33 PM Byungchul Park <byungchul.park@lge.com> wrote:
>
> I've been developing a tool for detecting deadlock possibilities by
> tracking wait/event rather than lock(?) acquisition order to try to
> cover all synchonization machanisms. It's done on v6.2-rc2.

Ugh. I hate how this adds random patterns like

        if (timeout == MAX_SCHEDULE_TIMEOUT)
                sdt_might_sleep_strong(NULL);
        else
                sdt_might_sleep_strong_timeout(NULL);
   ...
        sdt_might_sleep_finish();

to various places, it seems so very odd and unmaintainable.

I also recall this giving a fair amount of false positives, are they all fixed?

Anyway, I'd really like the lockdep people to comment and be involved.
We did have a fairly recent case of "lockdep doesn't track page lock
dependencies because it fundamentally cannot" issue, so DEPT might fix
those kinds of missing dependency analysis. See

    https://lore.kernel.org/lkml/00000000000060d41f05f139aa44@google.com/

for some context to that one, but at the same time I would *really*
want the lockdep people more involved and acking this work.

Maybe I missed the email where you reported on things DEPT has found
(and on the lack of false positives)?

               Linus
Boqun Feng Jan. 17, 2023, 6:18 p.m. UTC | #2
[Cc Waiman]

On Mon, Jan 16, 2023 at 10:00:52AM -0800, Linus Torvalds wrote:
> [ Back from travel, so trying to make sense of this series.. ]
> 
> On Sun, Jan 8, 2023 at 7:33 PM Byungchul Park <byungchul.park@lge.com> wrote:
> >
> > I've been developing a tool for detecting deadlock possibilities by
> > tracking wait/event rather than lock(?) acquisition order to try to
> > cover all synchonization machanisms. It's done on v6.2-rc2.
> 
> Ugh. I hate how this adds random patterns like
> 
>         if (timeout == MAX_SCHEDULE_TIMEOUT)
>                 sdt_might_sleep_strong(NULL);
>         else
>                 sdt_might_sleep_strong_timeout(NULL);
>    ...
>         sdt_might_sleep_finish();
> 
> to various places, it seems so very odd and unmaintainable.
> 
> I also recall this giving a fair amount of false positives, are they all fixed?
> 

From the following part in the cover letter, I guess the answer is no?

	...
        6. Multiple reports are allowed.
        7. Deduplication control on multiple reports.
        8. Withstand false positives thanks to 6.
	...

seems to me that the logic is: since DEPT allows multiple reports,
false positives are filterable by users?

> Anyway, I'd really like the lockdep people to comment and be involved.

I never got Cced, so I was unaware of this for a long time...

A few comments after a quick look:

*	Looks like the DEPT dependency graph doesn't handle the
	fair/unfair readers as lockdep currently does, which brings up
	the next question.

*	Can DEPT pass all the selftests of lockdep in
	lib/locking-selftest.c?

*	Instead of introducing a brand new detector/dependency tracker,
	could we first improve the lockdep's dependency tracker? I think
	Byungchul also agrees that DEPT and lockdep should share the
	same dependency tracker and the benefit of improving the
	existing one is that we can always use the self test to catch
	any regression. Thoughts?

Actually the above suggestion is just to revert the revert of
cross-release without exposing any annotation, which I think is more
practical to review and test.

I'd suggest we 1) first improve the lockdep dependency tracker with
wait/event in mind, then 2) introduce wait-related annotations that
users can use, then 3) look for practical ways to resolve false
positives/multiple reports with the help of users, and, if all goes
well, 4) annotate all operations.

Thoughts?

Regards,
Boqun

> We did have a fairly recent case of "lockdep doesn't track page lock
> dependencies because it fundamentally cannot" issue, so DEPT might fix
> those kinds of missing dependency analysis. See
> 
>     https://lore.kernel.org/lkml/00000000000060d41f05f139aa44@google.com/
> 
> for some context to that one, but at teh same time I would *really*
> want the lockdep people more involved and acking this work.
> 
> Maybe I missed the email where you reported on things DEPT has found
> (and on the lack of false positives)?
> 
>                Linus
>
Waiman Long Jan. 17, 2023, 6:40 p.m. UTC | #3
On 1/17/23 13:18, Boqun Feng wrote:
> [Cc Waiman]
>
> On Mon, Jan 16, 2023 at 10:00:52AM -0800, Linus Torvalds wrote:
>> [ Back from travel, so trying to make sense of this series.. ]
>>
>> On Sun, Jan 8, 2023 at 7:33 PM Byungchul Park <byungchul.park@lge.com> wrote:
>>> I've been developing a tool for detecting deadlock possibilities by
>>> tracking wait/event rather than lock(?) acquisition order to try to
>>> cover all synchonization machanisms. It's done on v6.2-rc2.
>> Ugh. I hate how this adds random patterns like
>>
>>          if (timeout == MAX_SCHEDULE_TIMEOUT)
>>                  sdt_might_sleep_strong(NULL);
>>          else
>>                  sdt_might_sleep_strong_timeout(NULL);
>>     ...
>>          sdt_might_sleep_finish();
>>
>> to various places, it seems so very odd and unmaintainable.
>>
>> I also recall this giving a fair amount of false positives, are they all fixed?
>>
>  From the following part in the cover letter, I guess the answer is no?
>
> 	...
>          6. Multiple reports are allowed.
>          7. Deduplication control on multiple reports.
>          8. Withstand false positives thanks to 6.
> 	...
>
> seems to me that the logic is since DEPT allows multiple reports so that
> false positives are fitlerable by users?
>
>> Anyway, I'd really like the lockdep people to comment and be involved.
> I never get Cced, so I'm unware of this for a long time...
>
> A few comments after a quick look:
>
> *	Looks like the DEPT dependency graph doesn't handle the
> 	fair/unfair readers as lockdep current does. Which bring the
> 	next question.
>
> *	Can DEPT pass all the selftests of lockdep in
> 	lib/locking-selftests.c?
>
> *	Instead of introducing a brand new detector/dependency tracker,
> 	could we first improve the lockdep's dependency tracker? I think
> 	Byungchul also agrees that DEPT and lockdep should share the
> 	same dependency tracker and the benefit of improving the
> 	existing one is that we can always use the self test to catch
> 	any regression. Thoughts?
>
> Actually the above sugguest is just to revert revert cross-release
> without exposing any annotation, which I think is more practical to
> review and test.
>
> I'd sugguest we 1) first improve the lockdep dependency tracker with
> wait/event in mind and then 2) introduce wait related annotation so that
> users can use, and then 3) look for practical ways to resolve false
> positives/multi reports with the help of users, if all goes well,
> 4) make it all operation annotated.

I agree with your suggestions. In fact, the lockdep code itself is one
of the major overheads when running a debug kernel. If we have another
parallel dependency tracker, we may slow down a debug kernel even more.
So I would rather prefer improving the existing lockdep code instead of
creating a completely new one.

I do agree that the lockdep code itself is now rather complex. A
separate dependency tracker, however, may undergo a similar
transformation over time and become more and more complex due to the
need to meet different requirements and constraints.

Cheers,
Longman
Thomas Gleixner Jan. 18, 2023, 12:55 p.m. UTC | #4
On Tue, Jan 17 2023 at 10:18, Boqun Feng wrote:
> On Mon, Jan 16, 2023 at 10:00:52AM -0800, Linus Torvalds wrote:
>> I also recall this giving a fair amount of false positives, are they all fixed?
>
> From the following part in the cover letter, I guess the answer is no?
> 	...
>         6. Multiple reports are allowed.
>         7. Deduplication control on multiple reports.
>         8. Withstand false positives thanks to 6.
> 	...
>
> seems to me that the logic is since DEPT allows multiple reports so that
> false positives are fitlerable by users?

I really do not know what's so valuable about multiple reports. They
produce a flood of information which needs to be filtered, while a
single report ensures that the first detected issue is dumped, which
increases the probability that it can be recorded and acted upon.

Filtering out false positives is just the wrong approach. Decoding
dependency issues from any tracker is complex enough given the nature of
the problem, so adding the burden of filtering out issues from a stream
of dumps is not helpful at all. It's just a marketing gag.

> *	Instead of introducing a brand new detector/dependency tracker,
> 	could we first improve the lockdep's dependency tracker? I think
> 	Byungchul also agrees that DEPT and lockdep should share the
> 	same dependency tracker and the benefit of improving the
> 	existing one is that we can always use the self test to catch
> 	any regression. Thoughts?

Ack. If the internal implementation of lockdep has shortcomings, then we
can expand and/or replace it instead of having yet another
infrastructure which is not even remotely as mature.

Thanks,

        tglx
Byungchul Park Jan. 19, 2023, 12:58 a.m. UTC | #5
Torvalds wrote:
> On Sun, Jan 8, 2023 at 7:33 PM Byungchul Park <byungchul.park@lge.com> wrote:
>>
>> I've been developing a tool for detecting deadlock possibilities by
>> tracking wait/event rather than lock(?) acquisition order to try to
>> cover all synchonization machanisms. It's done on v6.2-rc2.
> 
> Ugh. I hate how this adds random patterns like

I understand what you mean. But all the synchronization primitives
should let DEPT know the beginning and the end of each. However, I will
remove the ugly-looking 'if' statement in the next spin, and place the
pattern in a better place if possible.

> 	if (timeout == MAX_SCHEDULE_TIMEOUT)
> 		sdt_might_sleep_strong(NULL);
> 	else
> 		sdt_might_sleep_strong_timeout(NULL);
> 	...
> 	sdt_might_sleep_finish();
> 
> to various places, it seems so very odd and unmaintainable.
> 
> I also recall this giving a fair amount of false positives, are they all fixed?

Yes. Of course I removed all the false positives we found.

> Anyway, I'd really like the lockdep people to comment and be involved.
> We did have a fairly recent case of "lockdep doesn't track page lock
> dependencies because it fundamentally cannot" issue, so DEPT might fix
> those kinds of missing dependency analysis. See

Sure. That's exactly what DEPT works for, e.g. PG_locked.

> 	https://lore.kernel.org/lkml/00000000000060d41f05f139aa44@google.com/

I will reproduce it and share the result.

> for some context to that one, but at teh same time I would *really*
> want the lockdep people more involved and acking this work.
> 
> Maybe I missed the email where you reported on things DEPT has found
> (and on the lack of false positives)?

Maybe you didn't miss it. It's still hard to decide between:

	Aggressive detection with false alarms that need to be fixed by
	manual classification, as Lockdep did, focusing more on
	potential possibilities.

	versus

	Conservative detection with few false alarms, which requires
	much longer testing to get the results we expect, focusing on
	what actually happens.

> 
> 	Linus

	Byungchul
Byungchul Park Jan. 19, 2023, 6:23 a.m. UTC | #6
Boqun wrote:
> On Mon, Jan 16, 2023 at 10:00:52AM -0800, Linus Torvalds wrote:
> > [ Back from travel, so trying to make sense of this series..  ]
> > 
> > On Sun, Jan 8, 2023 at 7:33 PM Byungchul Park <byungchul.park@lge.com> wrote:
> > >
> > > I've been developing a tool for detecting deadlock possibilities by
> > > tracking wait/event rather than lock(?) acquisition order to try to
> > > cover all synchonization machanisms. It's done on v6.2-rc2.
> > 
> > Ugh. I hate how this adds random patterns like
> > 
> >         if (timeout == MAX_SCHEDULE_TIMEOUT)
> >                 sdt_might_sleep_strong(NULL);
> >         else
> >                 sdt_might_sleep_strong_timeout(NULL);
> >    ...
> >         sdt_might_sleep_finish();
> > 
> > to various places, it seems so very odd and unmaintainable.
> > 
> > I also recall this giving a fair amount of false positives, are they all fixed?
> > 
> 
> From the following part in the cover letter, I guess the answer is no?

I fixed what we found anyway.

> 	...
> 	6. Multiple reports are allowed.
> 	7. Deduplication control on multiple reports.
> 	8. Withstand false positives thanks to 6.
> 	...
> 
> seems to me that the logic is since DEPT allows multiple reports so that
> false positives are fitlerable by users?

At least, it's needed until DEPT is considered stable, because stronger
detection inevitably has a higher chance of false alarms unless we fix
each one manually, which is the same as with Lockdep.

> > Anyway, I'd really like the lockdep people to comment and be involved.
> 
> I never get Cced, so I'm unware of this for a long time...

Sorry I missed it. I will cc you from now on.

> A few comments after a quick look:
> 
> *	Looks like the DEPT dependency graph doesn't handle the
> 	fair/unfair readers as lockdep current does. Which bring the
> 	next question.

No. DEPT works better for unfair reads. It works based on wait/event:
read_lock() is considered a potential wait waiting on write_unlock(),
while write_lock() is considered a potential wait waiting on either
write_unlock() or read_unlock(). DEPT works perfectly for this.

For fair reads (maybe you meant queued read locks), I think the case
should be handled in the same way as a normal lock. I might have gotten
it wrong. Please let me know if I missed something.

> *	Can DEPT pass all the selftests of lockdep in
> 	lib/locking-selftests.c?
> 
> *	Instead of introducing a brand new detector/dependency tracker,
> 	could we first improve the lockdep's dependency tracker? I think

At the beginning of this work, I was of course thinking of improving
Lockdep, but I decided to implement a new tool because:

	1. What we need to check for deadlock detection is no longer
	   lock dependencies but the more fundamental dependencies
	   formed by waits/events. A better design would have a separate
	   dependency engine for that, not one inside Lockdep. Remember,
	   lock/unlock are also wait/event after all.

	2. I was thinking of reverting the revert of cross-release. But
	   it would start to report false alarms, as Lockdep did at the
	   beginning, and require us to keep fixing things until we
	   could see what we are interested in, maybe forever. How can
	   we control that situation? I wouldn't use this extension.

	3. Okay. Let's think about modifying the current Lockdep to make
	   it work similarly to DEPT. That would require more effort
	   than developing a new, simple tool from scratch with the
	   basic requirements.

	4. Big change at once right away? No way. The new tool needs to
	   mature and there are people who want to make use of DEPT at
	   the same time. The best approach, I think, would be to go
	   along together for a while.

Please don't look at each detail but at the big picture, the
architecture. Plus, please consider that I'm introducing a tool focused
only on the fundamental dependencies themselves, which Lockdep can make
use of. I wish great developers like you would join in improving the
common engine together.

> 	Byungchul also agrees that DEPT and lockdep should share the
> 	same dependency tracker and the benefit of improving the

I agree that both should share a single tracker.

> 	existing one is that we can always use the self test to catch
> 	any regression. Thoughts?

I imagine the following as the final form:

     Lock correctness checker(LOCKDEP)
     +-----------------------------------------+
     | Lock usage correctness check            |
     |                                         |
     |                                         |
     |       (Request dependency check)        |
     |                           T             |
     +---------------------------|-------------+
                                 |
     Dependency tracker(DEPT)    V
     +-----------------------------------------+
     | Dependency check                        |
     | (by tracking wait and event context)    |
     +-----------------------------------------+

> Actually the above sugguest is just to revert revert cross-release
> without exposing any annotation, which I think is more practical to
> review and test.

Reverting the revert of cross-release is not bad. But I'd suggest a
nicer design for the reasons I explained above.

	Byungchul

> I'd sugguest we 1) first improve the lockdep dependency tracker with
> wait/event in mind and then 2) introduce wait related annotation so that
> users can use, and then 3) look for practical ways to resolve false
> positives/multi reports with the help of users, if all goes well,
> 4) make it all operation annotated.
> 
> Thoughts?
> 
> Regards,
> Boqun
Byungchul Park Jan. 19, 2023, 7:06 a.m. UTC | #7
Byungchul wrote:
> Boqun wrote:
> > On Mon, Jan 16, 2023 at 10:00:52AM -0800, Linus Torvalds wrote:
> > > [ Back from travel, so trying to make sense of this series..  ]
> > > 
> > > On Sun, Jan 8, 2023 at 7:33 PM Byungchul Park <byungchul.park@lge.com> wrote:
> > > >
> > > > I've been developing a tool for detecting deadlock possibilities by
> > > > tracking wait/event rather than lock(?) acquisition order to try to
> > > > cover all synchonization machanisms. It's done on v6.2-rc2.
> > > 
> > > Ugh. I hate how this adds random patterns like
> > > 
> > >         if (timeout == MAX_SCHEDULE_TIMEOUT)
> > >                 sdt_might_sleep_strong(NULL);
> > >         else
> > >                 sdt_might_sleep_strong_timeout(NULL);
> > >    ...
> > >         sdt_might_sleep_finish();
> > > 
> > > to various places, it seems so very odd and unmaintainable.
> > > 
> > > I also recall this giving a fair amount of false positives, are they all fixed?
> > > 
> > 
> > From the following part in the cover letter, I guess the answer is no?
> 
> I fixed what we found anyway.
> 
> > 	...
> > 	6. Multiple reports are allowed.
> > 	7. Deduplication control on multiple reports.
> > 	8. Withstand false positives thanks to 6.
> > 	...
> > 
> > seems to me that the logic is since DEPT allows multiple reports so that
> > false positives are fitlerable by users?
> 
> At lease, it's needed until DEPT is considered stable because stronger
> detection inevitably has more chance of false alarms unless we do manual
> fix on each, which is the same as Lockdep.
> 
> > > Anyway, I'd really like the lockdep people to comment and be involved.
> > 
> > I never get Cced, so I'm unware of this for a long time...
> 
> Sorry I missed it. I will cc you from now on.
> 
> > A few comments after a quick look:
> > 
> > *	Looks like the DEPT dependency graph doesn't handle the
> > 	fair/unfair readers as lockdep current does. Which bring the
> > 	next question.
> 
> No. DEPT works better for unfair read. It works based on wait/event. So
> read_lock() is considered a potential wait waiting on write_unlock()
> while write_lock() is considered a potential wait waiting on either
> write_unlock() or read_unlock(). DEPT is working perfect for it.
> 
> For fair read (maybe you meant queued read lock), I think the case
> should be handled in the same way as normal lock. I might get it wrong.
> Please let me know if I miss something.
> 
> > *	Can DEPT pass all the selftests of lockdep in
> > 	lib/locking-selftests.c?
> > 
> > *	Instead of introducing a brand new detector/dependency tracker,
> > 	could we first improve the lockdep's dependency tracker? I think
> 
> At the beginning of this work, of course I was thinking to improve
> Lockdep but I decided to implement a new tool because:
> 
> 	1. What we need to check for deadlock detection is no longer
> 	   lock dependency but more fundamental dependency by wait/event.
> 	   A better design would have a separate dependency engine for
> 	   that, not within Lockdep. Remind lock/unlock are also
> 	   wait/event after all.
> 
> 	2. I was thinking to revert the revert of cross-release. But it
> 	   will start to report false alarms as Lockdep was at the
> 	   beginning, and require us to keep fixing things until being
> 	   able to see what we are interested in, maybe for ever. How
> 	   can we control that situation? I wouldn't use this extention.
> 
> 	3. Okay. Let's think about modifying the current Lockdep to make
> 	   it work similar to DEPT. It'd require us to pay more effort
> 	   than developing a new simple tool from the scratch with the
> 	   basic requirement.
> 
> 	4. Big change at once right away? No way. The new tool need to
> 	   be matured and there are ones who want to make use of DEPT at
> 	   the same time. The best approach would be I think to go along
> 	   together for a while.

(Apologies for this. Let me re-write this part.)

	4. Big change at once right away? No way. The new feature needs
	   to mature and there are people who want to use the new
	   feature at the same time. The best approach, I think, would
	   be to go along together for a while.

Thanks,
	Byungchul

> Please don't look at each detail but the big picture, the architecture.
> Plus, please consider I introduce a tool only focucing on fundamental
> dependency itself that Lockdep can make use of. I wish great developers
> like you would join improve the common engine togather.
> 
> > 	Byungchul also agrees that DEPT and lockdep should share the
> > 	same dependency tracker and the benefit of improving the
> 
> I agree that both should share a single tracker.
> 
> > 	existing one is that we can always use the self test to catch
> > 	any regression. Thoughts?
> 
> I imagine the follownig look for the final form:
> 
>      Lock correctness checker(LOCKDEP)
>      +-----------------------------------------+
>      | Lock usage correctness check            |
>      |                                         |
>      |                                         |
>      |       (Request dependency check)        |
>      |                           T             |
>      +---------------------------|-------------+
>                                  |
>      Dependency tracker(DEPT)    V
>      +-----------------------------------------+
>      | Dependency check                        |
>      | (by tracking wait and event context)    |
>      +-----------------------------------------+
> 
> > Actually the above sugguest is just to revert revert cross-release
> > without exposing any annotation, which I think is more practical to
> > review and test.
> 
> Reverting the revert of cross-release is not bad. But I'd suggest a
> nicer design for the reasons I explained above.
> 
> 	Byungchul
> 
> > I'd sugguest we 1) first improve the lockdep dependency tracker with
> > wait/event in mind and then 2) introduce wait related annotation so that
> > users can use, and then 3) look for practical ways to resolve false
> > positives/multi reports with the help of users, if all goes well,
> > 4) make it all operation annotated.
> > 
> > Thoughts?
> > 
> > Regards,
> > Boqun
Byungchul Park Jan. 19, 2023, 9:05 a.m. UTC | #8
Thomas wrote:
> On Tue, Jan 17 2023 at 10:18, Boqun Feng wrote:
> > On Mon, Jan 16, 2023 at 10:00:52AM -0800, Linus Torvalds wrote:
> > > I also recall this giving a fair amount of false positives, are they all fixed?
> >
> > From the following part in the cover letter, I guess the answer is no?
> >	...
> >       6. Multiple reports are allowed.
> >       7. Deduplication control on multiple reports.
> >       8. Withstand false positives thanks to 6.
> >	...
> >
> > seems to me that the logic is since DEPT allows multiple reports so that
> > false positives are fitlerable by users?
> 
> I really do not know what's so valuable about multiple reports. They
> produce a flood of information which needs to be filtered, while a
> single report ensures that the first detected issue is dumped, which
> increases the probability that it can be recorded and acted upon.

Under the following two assumptions, you are right.

Assumption 1. There will be too many reports with the multi-report
	      support, like all the combinations of dependencies
	      between e.g. in-irq and irq-enabled contexts.

Assumption 2. The detection is mature enough that having to fix false
	      ones in order to see a true one barely happens, which is
	      not a big deal.

However, DEPT doesn't generate all the combinations of irq contexts as
Lockdep does, so we only see a few multi-reports even with the support,
and I admit DEPT hasn't matured enough yet because fine classification
is still required to suppress false alarms. That's why I introduced
multi-report support, at least for now. IMHO, it would still be useful
even if, some day, it reports a few true ones at once without any false
ones.

> Filtering out false positives is just the wrong approach. Decoding
> dependency issues from any tracker is complex enough given the nature of
> the problem, so adding the burden of filtering out issues from a stream
> of dumps is not helpful at all. It's just a marketing gag.
> 
> > *	Instead of introducing a brand new detector/dependency tracker,
> >	could we first improve the lockdep's dependency tracker? I think
> >	Byungchul also agrees that DEPT and lockdep should share the
> >	same dependency tracker and the benefit of improving the
> >	existing one is that we can always use the self test to catch
> >	any regression. Thoughts?
> 
> Ack. If the internal implementation of lockdep has shortcomings, then we
> can expand and/or replace it instead of having yet another
> infrastructure which is not even remotely as mature.

Ultimately, yes. We should expand or replace it instead of having
another.

	Byungchul
> 
> Thanks,
> 
>         tglx
Matthew Wilcox Jan. 19, 2023, 1:33 p.m. UTC | #9
On Thu, Jan 19, 2023 at 03:23:08PM +0900, Byungchul Park wrote:
> Boqun wrote:
> > *	Looks like the DEPT dependency graph doesn't handle the
> > 	fair/unfair readers as lockdep current does. Which bring the
> > 	next question.
> 
> No. DEPT works better for unfair read. It works based on wait/event. So
> read_lock() is considered a potential wait waiting on write_unlock()
> while write_lock() is considered a potential wait waiting on either
> write_unlock() or read_unlock(). DEPT is working perfect for it.
> 
> For fair read (maybe you meant queued read lock), I think the case
> should be handled in the same way as normal lock. I might get it wrong.
> Please let me know if I miss something.

From the lockdep/DEPT point of view, the question is whether:

	read_lock(A)
	read_lock(A)

can deadlock if a writer comes in between the two acquisitions and
sleeps waiting on A to be released.  A fair lock will block new
readers when a writer is waiting, while an unfair lock will allow
new readers even while a writer is waiting.
Boqun Feng Jan. 19, 2023, 7:25 p.m. UTC | #10
On Thu, Jan 19, 2023 at 01:33:58PM +0000, Matthew Wilcox wrote:
> On Thu, Jan 19, 2023 at 03:23:08PM +0900, Byungchul Park wrote:
> > Boqun wrote:
> > > *	Looks like the DEPT dependency graph doesn't handle the
> > > 	fair/unfair readers as lockdep current does. Which bring the
> > > 	next question.
> > 
> > No. DEPT works better for unfair read. It works based on wait/event. So
> > read_lock() is considered a potential wait waiting on write_unlock()
> > while write_lock() is considered a potential wait waiting on either
> > write_unlock() or read_unlock(). DEPT is working perfect for it.
> > 
> > For fair read (maybe you meant queued read lock), I think the case
> > should be handled in the same way as normal lock. I might get it wrong.
> > Please let me know if I miss something.
> 
> From the lockdep/DEPT point of view, the question is whether:
> 
> 	read_lock(A)
> 	read_lock(A)
> 
> can deadlock if a writer comes in between the two acquisitions and
> sleeps waiting on A to be released.  A fair lock will block new
> readers when a writer is waiting, while an unfair lock will allow
> new readers even while a writer is waiting.
> 

To be more accurate, a fair reader will wait if there is a writer
waiting for another reader (fair or not) to unlock, while an unfair
reader won't.

In the kernel there are read/write locks that can have both fair and
unfair readers (e.g. queued rwlock). Regarding deadlocks,

	T0		T1		T2
	--		--		--
	fair_read_lock(A);
			write_lock(B);
					write_lock(A);
	write_lock(B);
			unfair_read_lock(A);

the above is not a deadlock, since T1's unfair reader can "steal" the
lock. However the following is a deadlock:

	T0		T1		T2
	--		--		--
	unfair_read_lock(A);
			write_lock(B);
					write_lock(A);
	write_lock(B);
			fair_read_lock(A);

, since T1's fair reader will wait.

FWIW, lockdep has been able to catch this (figuring out which is a
deadlock and which is not) for two years now, plus other trivial
deadlock detection for read/write locks. Needless to say, if
lib/locking-selftest.c were given a try, one could find this out on
one's own.

Regards,
Boqun
Byungchul Park Jan. 20, 2023, 1:51 a.m. UTC | #11
Boqun wrote:
> On Thu, Jan 19, 2023 at 01:33:58PM +0000, Matthew Wilcox wrote:
> > On Thu, Jan 19, 2023 at 03:23:08PM +0900, Byungchul Park wrote:
> > > Boqun wrote:
> > > > *Looks like the DEPT dependency graph doesn't handle the
> > > > fair/unfair readers as lockdep current does. Which bring the
> > > > next question.
> > > 
> > > No. DEPT works better for unfair read. It works based on wait/event. So
> > > read_lock() is considered a potential wait waiting on write_unlock()
> > > while write_lock() is considered a potential wait waiting on either
> > > write_unlock() or read_unlock(). DEPT is working perfect for it.
> > > 
> > > For fair read (maybe you meant queued read lock), I think the case
> > > should be handled in the same way as normal lock. I might get it wrong.
> > > Please let me know if I miss something.
> > 
> > From the lockdep/DEPT point of view, the question is whether:
> > 
> >	read_lock(A)
> >	read_lock(A)
> > 
> > can deadlock if a writer comes in between the two acquisitions and
> > sleeps waiting on A to be released.  A fair lock will block new
> > readers when a writer is waiting, while an unfair lock will allow
> > new readers even while a writer is waiting.
> > 
> 
> To be more accurate, a fair reader will wait if there is a writer
> waiting for other reader (fair or not) to unlock, and an unfair reader
> won't.

What kind guys, both of you! Thanks.

I asked in order to check whether there are other subtle things beyond
this. Fortunately, I already understand what you guys shared.

> In kernel there are read/write locks that can have both fair and unfair
> readers (e.g. queued rwlock). Regarding deadlocks,
> 
> 	T0		T1		T2
> 	--		--		--
> 	fair_read_lock(A);
> 			write_lock(B);
> 					write_lock(A);
> 	write_lock(B);
> 			unfair_read_lock(A);

From DEPT's point of view (let me re-write the scenario):

	T0		T1		T2
	--		--		--
	fair_read_lock(A);
			write_lock(B);
					write_lock(A);
	write_lock(B);
			unfair_read_lock(A);
	write_unlock(B);
	read_unlock(A);
			read_unlock(A);
			write_unlock(B);
					write_unlock(A);

T0: read_unlock(A) cannot happen if write_lock(B) is stuck by a B owner
    not doing either write_unlock(B) or read_unlock(B). In other words:

      1. read_unlock(A) happening depends on write_unlock(B) happening.
      2. read_unlock(A) happening depends on read_unlock(B) happening.

T1: write_unlock(B) cannot happen if unfair_read_lock(A) is stuck by a A
    owner not doing write_unlock(A). In other words:

      3. write_unlock(B) happening depends on write_unlock(A) happening.

1, 2 and 3 give the following dependencies:

    1. read_unlock(A) -> write_unlock(B)
    2. read_unlock(A) -> read_unlock(B)
    3. write_unlock(B) -> write_unlock(A)

There's no circular dependency so it's safe. DEPT doesn't report this.

> the above is not a deadlock, since T1's unfair reader can "steal" the
> lock. However the following is a deadlock:
> 
> 	T0		T1		T2
> 	--		--		--
> 	unfair_read_lock(A);
> 			write_lock(B);
> 					write_lock(A);
> 	write_lock(B);
> 			fair_read_lock(A);
> 
> , since T'1 fair reader will wait.

From DEPT's point of view (let me re-write the scenario):

	T0		T1		T2
	--		--		--
	unfair_read_lock(A);
			write_lock(B);
					write_lock(A);
	write_lock(B);
			fair_read_lock(A);
	write_unlock(B);
	read_unlock(A);
			read_unlock(A);
			write_unlock(B);
					write_unlock(A);

T0: read_unlock(A) cannot happen if write_lock(B) is stuck by a B owner
    not doing either write_unlock(B) or read_unlock(B). In other words:

      1. read_unlock(A) happening depends on write_unlock(B) happening.
      2. read_unlock(A) happening depends on read_unlock(B) happening.

T1: write_unlock(B) cannot happen if fair_read_lock(A) is stuck by a A
    owner not doing either write_unlock(A) or read_unlock(A). In other
    words:

      3. write_unlock(B) happening depends on write_unlock(A) happening.
      4. write_unlock(B) happening depends on read_unlock(A) happening.

1, 2, 3 and 4 give the following dependencies:

    1. read_unlock(A) -> write_unlock(B)
    2. read_unlock(A) -> read_unlock(B)
    3. write_unlock(B) -> write_unlock(A)
    4. write_unlock(B) -> read_unlock(A)

With 1 and 4, there's a circular dependency, so DEPT definitely reports
this as a problem.

REMIND: DEPT focuses on waits and events.

> FWIW, lockdep is able to catch this (figuring out which is deadlock and
> which is not) since two years ago, plus other trivial deadlock detection
> for read/write locks. Needless to say, if lib/lock-selftests.c was given
> a try, one could find it out on one's own.
> 
> Regards,
> Boqun
>
Boqun Feng Jan. 20, 2023, 2:23 a.m. UTC | #12
On Fri, Jan 20, 2023 at 10:51:45AM +0900, Byungchul Park wrote:
> Boqun wrote:
> > On Thu, Jan 19, 2023 at 01:33:58PM +0000, Matthew Wilcox wrote:
> > > On Thu, Jan 19, 2023 at 03:23:08PM +0900, Byungchul Park wrote:
> > > > Boqun wrote:
> > > > > *Looks like the DEPT dependency graph doesn't handle the
> > > > > fair/unfair readers as lockdep current does. Which bring the
> > > > > next question.
> > > > 
> > > > No. DEPT works better for unfair read. It works based on wait/event. So
> > > > read_lock() is considered a potential wait waiting on write_unlock()
> > > > while write_lock() is considered a potential wait waiting on either
> > > > write_unlock() or read_unlock(). DEPT is working perfect for it.
> > > > 
> > > > For fair read (maybe you meant queued read lock), I think the case
> > > > should be handled in the same way as normal lock. I might get it wrong.
> > > > Please let me know if I miss something.
> > > 
> > > From the lockdep/DEPT point of view, the question is whether:
> > > 
> > >	read_lock(A)
> > >	read_lock(A)
> > > 
> > > can deadlock if a writer comes in between the two acquisitions and
> > > sleeps waiting on A to be released.  A fair lock will block new
> > > readers when a writer is waiting, while an unfair lock will allow
> > > new readers even while a writer is waiting.
> > > 
> > 
> > To be more accurate, a fair reader will wait if there is a writer
> > waiting for other reader (fair or not) to unlock, and an unfair reader
> > won't.
> 
> What a kind guys, both of you! Thanks.
> 
> I asked to check if there are other subtle things than this. Fortunately,
> I already understand what you guys shared.
> 
> > In kernel there are read/write locks that can have both fair and unfair
> > readers (e.g. queued rwlock). Regarding deadlocks,
> > 
> > 	T0		T1		T2
> > 	--		--		--
> > 	fair_read_lock(A);
> > 			write_lock(B);
> > 					write_lock(A);
> > 	write_lock(B);
> > 			unfair_read_lock(A);
> 
> With the DEPT's point of view (let me re-write the scenario):
> 
> 	T0		T1		T2
> 	--		--		--
> 	fair_read_lock(A);
> 			write_lock(B);
> 					write_lock(A);
> 	write_lock(B);
> 			unfair_read_lock(A);
> 	write_unlock(B);
> 	read_unlock(A);
> 			read_unlock(A);
> 			write_unlock(B);
> 					write_unlock(A);
> 
> T0: read_unlock(A) cannot happen if write_lock(B) is stuck by a B owner
>     not doing either write_unlock(B) or read_unlock(B). In other words:
> 
>       1. read_unlock(A) happening depends on write_unlock(B) happening.
>       2. read_unlock(A) happening depends on read_unlock(B) happening.
> 
> T1: write_unlock(B) cannot happen if unfair_read_lock(A) is stuck by a A
>     owner not doing write_unlock(A). In other words:
> 
>       3. write_unlock(B) happening depends on write_unlock(A) happening.
> 
> 1, 2 and 3 give the following dependencies:
> 
>     1. read_unlock(A) -> write_unlock(B)
>     2. read_unlock(A) -> read_unlock(B)
>     3. write_unlock(B) -> write_unlock(A)
> 
> There's no circular dependency so it's safe. DEPT doesn't report this.
> 
> > the above is not a deadlock, since T1's unfair reader can "steal" the
> > lock. However the following is a deadlock:
> > 
> > 	T0		T1		T2
> > 	--		--		--
> > 	unfair_read_lock(A);
> > 			write_lock(B);
> > 					write_lock(A);
> > 	write_lock(B);
> > 			fair_read_lock(A);
> > 
> > , since T'1 fair reader will wait.
> 
> With the DEPT's point of view (let me re-write the scenario):
> 
> 	T0		T1		T2
> 	--		--		--
> 	unfair_read_lock(A);
> 			write_lock(B);
> 					write_lock(A);
> 	write_lock(B);
> 			fair_read_lock(A);
> 	write_unlock(B);
> 	read_unlock(A);
> 			read_unlock(A);
> 			write_unlock(B);
> 					write_unlock(A);
> 
> T0: read_unlock(A) cannot happen if write_lock(B) is stuck by a B owner
>     not doing either write_unlock(B) or read_unlock(B). In other words:
> 
>       1. read_unlock(A) happening depends on write_unlock(B) happening.
>       2. read_unlock(A) happening depends on read_unlock(B) happening.
> 
> T1: write_unlock(B) cannot happen if fair_read_lock(A) is stuck by a A
>     owner not doing either write_unlock(A) or read_unlock(A). In other
>     words:
> 
>       3. write_unlock(B) happening depends on write_unlock(A) happening.
>       4. write_unlock(B) happening depends on read_unlock(A) happening.
> 
> 1, 2, 3 and 4 give the following dependencies:
> 
>     1. read_unlock(A) -> write_unlock(B)
>     2. read_unlock(A) -> read_unlock(B)
>     3. write_unlock(B) -> write_unlock(A)
>     4. write_unlock(B) -> read_unlock(A)
> 
> With 1 and 4, there's a circular dependency so DEPT definitely report
> this as a problem.
> 
> REMIND: DEPT focuses on waits and events.

Do you have the test cases showing DEPT can detect this?

Regards,
Boqun

> 
> > FWIW, lockdep is able to catch this (figuring out which is deadlock and
> > which is not) since two years ago, plus other trivial deadlock detection
> > for read/write locks. Needless to say, if lib/lock-selftests.c was given
> > a try, one could find it out on one's own.
> > 
> > Regards,
> > Boqun
> >
Boqun Feng Jan. 20, 2023, 3:07 a.m. UTC | #13
On Thu, Jan 19, 2023 at 06:23:49PM -0800, Boqun Feng wrote:
> On Fri, Jan 20, 2023 at 10:51:45AM +0900, Byungchul Park wrote:
> > Boqun wrote:
> > > On Thu, Jan 19, 2023 at 01:33:58PM +0000, Matthew Wilcox wrote:
> > > > On Thu, Jan 19, 2023 at 03:23:08PM +0900, Byungchul Park wrote:
> > > > > Boqun wrote:
> > > > > > *Looks like the DEPT dependency graph doesn't handle the
> > > > > > fair/unfair readers as lockdep current does. Which bring the
> > > > > > next question.
> > > > > 
> > > > > No. DEPT works better for unfair read. It works based on wait/event. So
> > > > > read_lock() is considered a potential wait waiting on write_unlock()
> > > > > while write_lock() is considered a potential wait waiting on either
> > > > > write_unlock() or read_unlock(). DEPT is working perfect for it.
> > > > > 
> > > > > For fair read (maybe you meant queued read lock), I think the case
> > > > > should be handled in the same way as normal lock. I might get it wrong.
> > > > > Please let me know if I miss something.
> > > > 
> > > > From the lockdep/DEPT point of view, the question is whether:
> > > > 
> > > >	read_lock(A)
> > > >	read_lock(A)
> > > > 
> > > > can deadlock if a writer comes in between the two acquisitions and
> > > > sleeps waiting on A to be released.  A fair lock will block new
> > > > readers when a writer is waiting, while an unfair lock will allow
> > > > new readers even while a writer is waiting.
> > > > 
> > > 
> > > To be more accurate, a fair reader will wait if there is a writer
> > > waiting for other reader (fair or not) to unlock, and an unfair reader
> > > won't.
> > 
> > What a kind guys, both of you! Thanks.
> > 
> > I asked to check if there are other subtle things than this. Fortunately,
> > I already understand what you guys shared.
> > 
> > > In kernel there are read/write locks that can have both fair and unfair
> > > readers (e.g. queued rwlock). Regarding deadlocks,
> > > 
> > > 	T0		T1		T2
> > > 	--		--		--
> > > 	fair_read_lock(A);
> > > 			write_lock(B);
> > > 					write_lock(A);
> > > 	write_lock(B);
> > > 			unfair_read_lock(A);
> > 
> > With the DEPT's point of view (let me re-write the scenario):
> > 
> > 	T0		T1		T2
> > 	--		--		--
> > 	fair_read_lock(A);
> > 			write_lock(B);
> > 					write_lock(A);
> > 	write_lock(B);
> > 			unfair_read_lock(A);
> > 	write_unlock(B);
> > 	read_unlock(A);
> > 			read_unlock(A);
> > 			write_unlock(B);
> > 					write_unlock(A);
> > 
> > T0: read_unlock(A) cannot happen if write_lock(B) is stuck by a B owner
> >     not doing either write_unlock(B) or read_unlock(B). In other words:
> > 
> >       1. read_unlock(A) happening depends on write_unlock(B) happening.
> >       2. read_unlock(A) happening depends on read_unlock(B) happening.
> > 
> > T1: write_unlock(B) cannot happen if unfair_read_lock(A) is stuck by a A
> >     owner not doing write_unlock(A). In other words:
> > 
> >       3. write_unlock(B) happening depends on write_unlock(A) happening.
> > 
> > 1, 2 and 3 give the following dependencies:
> > 
> >     1. read_unlock(A) -> write_unlock(B)
> >     2. read_unlock(A) -> read_unlock(B)
> >     3. write_unlock(B) -> write_unlock(A)
> > 
> > There's no circular dependency so it's safe. DEPT doesn't report this.
> > 
> > > the above is not a deadlock, since T1's unfair reader can "steal" the
> > > lock. However the following is a deadlock:
> > > 
> > > 	T0		T1		T2
> > > 	--		--		--
> > > 	unfair_read_lock(A);
> > > 			write_lock(B);
> > > 					write_lock(A);
> > > 	write_lock(B);
> > > 			fair_read_lock(A);
> > > 
> > > , since T'1 fair reader will wait.
> > 
> > With the DEPT's point of view (let me re-write the scenario):
> > 
> > 	T0		T1		T2
> > 	--		--		--
> > 	unfair_read_lock(A);
> > 			write_lock(B);
> > 					write_lock(A);
> > 	write_lock(B);
> > 			fair_read_lock(A);
> > 	write_unlock(B);
> > 	read_unlock(A);
> > 			read_unlock(A);
> > 			write_unlock(B);
> > 					write_unlock(A);
> > 
> > T0: read_unlock(A) cannot happen if write_lock(B) is stuck by a B owner
> >     not doing either write_unlock(B) or read_unlock(B). In other words:
> > 
> >       1. read_unlock(A) happening depends on write_unlock(B) happening.
> >       2. read_unlock(A) happening depends on read_unlock(B) happening.
> > 
> > T1: write_unlock(B) cannot happen if fair_read_lock(A) is stuck by a A
> >     owner not doing either write_unlock(A) or read_unlock(A). In other
> >     words:
> > 
> >       3. write_unlock(B) happening depends on write_unlock(A) happening.
> >       4. write_unlock(B) happening depends on read_unlock(A) happening.
> > 
> > 1, 2, 3 and 4 give the following dependencies:
> > 
> >     1. read_unlock(A) -> write_unlock(B)
> >     2. read_unlock(A) -> read_unlock(B)
> >     3. write_unlock(B) -> write_unlock(A)
> >     4. write_unlock(B) -> read_unlock(A)
> > 
> > With 1 and 4, there's a circular dependency so DEPT definitely report
> > this as a problem.
> > 
> > REMIND: DEPT focuses on waits and events.
> 
> Do you have the test cases showing DEPT can detect this?
> 

I just tried the following on your latest GitHub branch; I commented
out all but one deadlock case. Lockdep CAN detect it but DEPT CANNOT.
Feel free to double-check.

Regards,
Boqun

------------------------------------------->8
diff --git a/lib/locking-selftest.c b/lib/locking-selftest.c
index cd89138d62ba..f38e4109e013 100644
--- a/lib/locking-selftest.c
+++ b/lib/locking-selftest.c
@@ -2375,6 +2375,7 @@ static void ww_tests(void)
  */
 static void queued_read_lock_hardirq_RE_Er(void)
 {
+	// T0
 	HARDIRQ_ENTER();
 	read_lock(&rwlock_A);
 	LOCK(B);
@@ -2382,12 +2383,17 @@ static void queued_read_lock_hardirq_RE_Er(void)
 	read_unlock(&rwlock_A);
 	HARDIRQ_EXIT();
 
+	// T1
 	HARDIRQ_DISABLE();
 	LOCK(B);
 	read_lock(&rwlock_A);
 	read_unlock(&rwlock_A);
 	UNLOCK(B);
 	HARDIRQ_ENABLE();
+
+	// T2
+	write_lock_irq(&rwlock_A);
+	write_unlock_irq(&rwlock_A);
 }
 
 /*
@@ -2455,6 +2461,7 @@ static void queued_read_lock_tests(void)
 	dotest(queued_read_lock_hardirq_RE_Er, FAILURE, LOCKTYPE_RWLOCK);
 	pr_cont("\n");
 
+#if 0
 	print_testname("hardirq lock-read/read-lock");
 	dotest(queued_read_lock_hardirq_ER_rE, SUCCESS, LOCKTYPE_RWLOCK);
 	pr_cont("\n");
@@ -2462,6 +2469,7 @@ static void queued_read_lock_tests(void)
 	print_testname("hardirq inversion");
 	dotest(queued_read_lock_hardirq_inversion, FAILURE, LOCKTYPE_RWLOCK);
 	pr_cont("\n");
+#endif
 }
 
 static void fs_reclaim_correct_nesting(void)
@@ -2885,6 +2893,7 @@ void locking_selftest(void)
 	init_shared_classes();
 	lockdep_set_selftest_task(current);
 
+#if 0
 	DO_TESTCASE_6R("A-A deadlock", AA);
 	DO_TESTCASE_6R("A-B-B-A deadlock", ABBA);
 	DO_TESTCASE_6R("A-B-B-C-C-A deadlock", ABBCCA);
@@ -2967,6 +2976,7 @@ void locking_selftest(void)
 	DO_TESTCASE_6x2x2RW("irq read-recursion #3", irq_read_recursion3);
 
 	ww_tests();
+#endif
 
 	force_read_lock_recursive = 0;
 	/*
@@ -2975,6 +2985,7 @@ void locking_selftest(void)
 	if (IS_ENABLED(CONFIG_QUEUED_RWLOCKS))
 		queued_read_lock_tests();
 
+#if 0
 	fs_reclaim_tests();
 
 	/* Wait context test cases that are specific for RAW_LOCK_NESTING */
@@ -2987,6 +2998,7 @@ void locking_selftest(void)
 	dotest(hardirq_deadlock_softirq_not_deadlock, FAILURE, LOCKTYPE_SPECIAL);
 	pr_cont("\n");
 
+#endif
 	if (unexpected_testcase_failures) {
 		printk("-----------------------------------------------------------------\n");
 		debug_locks = 0;
Boqun Feng Jan. 20, 2023, 3:26 a.m. UTC | #14
On Thu, Jan 19, 2023 at 07:07:59PM -0800, Boqun Feng wrote:
> On Thu, Jan 19, 2023 at 06:23:49PM -0800, Boqun Feng wrote:
> > On Fri, Jan 20, 2023 at 10:51:45AM +0900, Byungchul Park wrote:
> > > Boqun wrote:
> > > > On Thu, Jan 19, 2023 at 01:33:58PM +0000, Matthew Wilcox wrote:
> > > > > On Thu, Jan 19, 2023 at 03:23:08PM +0900, Byungchul Park wrote:
> > > > > > Boqun wrote:
> > > > > > > *Looks like the DEPT dependency graph doesn't handle the
> > > > > > > fair/unfair readers as lockdep current does. Which bring the
> > > > > > > next question.
> > > > > > 
> > > > > > No. DEPT works better for unfair read. It works based on wait/event. So
> > > > > > read_lock() is considered a potential wait waiting on write_unlock()
> > > > > > while write_lock() is considered a potential wait waiting on either
> > > > > > write_unlock() or read_unlock(). DEPT is working perfect for it.
> > > > > > 
> > > > > > For fair read (maybe you meant queued read lock), I think the case
> > > > > > should be handled in the same way as normal lock. I might get it wrong.
> > > > > > Please let me know if I miss something.
> > > > > 
> > > > > From the lockdep/DEPT point of view, the question is whether:
> > > > > 
> > > > >	read_lock(A)
> > > > >	read_lock(A)
> > > > > 
> > > > > can deadlock if a writer comes in between the two acquisitions and
> > > > > sleeps waiting on A to be released.  A fair lock will block new
> > > > > readers when a writer is waiting, while an unfair lock will allow
> > > > > new readers even while a writer is waiting.
> > > > > 
> > > > 
> > > > To be more accurate, a fair reader will wait if there is a writer
> > > > waiting for other reader (fair or not) to unlock, and an unfair reader
> > > > won't.
> > > 
> > > What a kind guys, both of you! Thanks.
> > > 
> > > I asked to check if there are other subtle things than this. Fortunately,
> > > I already understand what you guys shared.
> > > 
> > > > In kernel there are read/write locks that can have both fair and unfair
> > > > readers (e.g. queued rwlock). Regarding deadlocks,
> > > > 
> > > > 	T0		T1		T2
> > > > 	--		--		--
> > > > 	fair_read_lock(A);
> > > > 			write_lock(B);
> > > > 					write_lock(A);
> > > > 	write_lock(B);
> > > > 			unfair_read_lock(A);
> > > 
> > > With the DEPT's point of view (let me re-write the scenario):
> > > 
> > > 	T0		T1		T2
> > > 	--		--		--
> > > 	fair_read_lock(A);
> > > 			write_lock(B);
> > > 					write_lock(A);
> > > 	write_lock(B);
> > > 			unfair_read_lock(A);
> > > 	write_unlock(B);
> > > 	read_unlock(A);
> > > 			read_unlock(A);
> > > 			write_unlock(B);
> > > 					write_unlock(A);
> > > 
> > > T0: read_unlock(A) cannot happen if write_lock(B) is stuck by a B owner
> > >     not doing either write_unlock(B) or read_unlock(B). In other words:
> > > 
> > >       1. read_unlock(A) happening depends on write_unlock(B) happening.
> > >       2. read_unlock(A) happening depends on read_unlock(B) happening.
> > > 
> > > T1: write_unlock(B) cannot happen if unfair_read_lock(A) is stuck by a A
> > >     owner not doing write_unlock(A). In other words:
> > > 
> > >       3. write_unlock(B) happening depends on write_unlock(A) happening.
> > > 
> > > 1, 2 and 3 give the following dependencies:
> > > 
> > >     1. read_unlock(A) -> write_unlock(B)
> > >     2. read_unlock(A) -> read_unlock(B)
> > >     3. write_unlock(B) -> write_unlock(A)
> > > 
> > > There's no circular dependency so it's safe. DEPT doesn't report this.
> > > 
> > > > the above is not a deadlock, since T1's unfair reader can "steal" the
> > > > lock. However the following is a deadlock:
> > > > 
> > > > 	T0		T1		T2
> > > > 	--		--		--
> > > > 	unfair_read_lock(A);
> > > > 			write_lock(B);
> > > > 					write_lock(A);
> > > > 	write_lock(B);
> > > > 			fair_read_lock(A);
> > > > 
> > > > , since T'1 fair reader will wait.
> > > 
> > > With the DEPT's point of view (let me re-write the scenario):
> > > 
> > > 	T0		T1		T2
> > > 	--		--		--
> > > 	unfair_read_lock(A);
> > > 			write_lock(B);
> > > 					write_lock(A);
> > > 	write_lock(B);
> > > 			fair_read_lock(A);
> > > 	write_unlock(B);
> > > 	read_unlock(A);
> > > 			read_unlock(A);
> > > 			write_unlock(B);
> > > 					write_unlock(A);
> > > 
> > > T0: read_unlock(A) cannot happen if write_lock(B) is stuck by a B owner
> > >     not doing either write_unlock(B) or read_unlock(B). In other words:
> > > 
> > >       1. read_unlock(A) happening depends on write_unlock(B) happening.
> > >       2. read_unlock(A) happening depends on read_unlock(B) happening.
> > > 
> > > T1: write_unlock(B) cannot happen if fair_read_lock(A) is stuck by a A
> > >     owner not doing either write_unlock(A) or read_unlock(A). In other
> > >     words:
> > > 
> > >       3. write_unlock(B) happening depends on write_unlock(A) happening.
> > >       4. write_unlock(B) happening depends on read_unlock(A) happening.
> > > 
> > > 1, 2, 3 and 4 give the following dependencies:
> > > 
> > >     1. read_unlock(A) -> write_unlock(B)
> > >     2. read_unlock(A) -> read_unlock(B)
> > >     3. write_unlock(B) -> write_unlock(A)
> > >     4. write_unlock(B) -> read_unlock(A)
> > > 
> > > With 1 and 4, there's a circular dependency so DEPT definitely report
> > > this as a problem.
> > > 
> > > REMIND: DEPT focuses on waits and events.
> > 
> > Do you have the test cases showing DEPT can detect this?
> > 
> 
> Just tried the following on your latest GitHub branch, I commented all
> but one deadlock case. Lockdep CAN detect it but DEPT CANNOT detect it.
> Feel free to double check.
> 

In case anyone else wants to try, let me explain a bit how to verify
the behavior of the detectors. With the change, the only test that
runs is

	dotest(queued_read_lock_hardirq_RE_Er, FAILURE, LOCKTYPE_RWLOCK);

"FAILURE" indicates selftests think lockdep should report a deadlock,
therefore for lockdep if all goes well, you will see:

	[...]       hardirq read-lock/lock-read:  ok  |

If you want lockdep to print a full splat for the test (lockdep is
silent by default), you can add "debug_locks_verbose=2" to the kernel
command line; "2" means the RWLOCK test suite.

Regards,
Boqun

> Regards,
> Boqun
Byungchul Park Jan. 21, 2023, 2:40 a.m. UTC | #15
Byungchul wrote:
> Torvalds wrote:
> > On Sun, Jan 8, 2023 at 7:33 PM Byungchul Park <byungchul.park@lge.com> wrote:
> > >
> > > I've been developing a tool for detecting deadlock possibilities by
> > > tracking wait/event rather than lock(?) acquisition order to try to
> > > cover all synchonization machanisms. It's done on v6.2-rc2.
> > 
> > Ugh. I hate how this adds random patterns like
> 
> I undertand what you mean.. But all the synchronization primitives
> should let DEPT know the beginning and the end of each. However, I will
> remove the 'if' statement that looks ugly from the next spin, and place
> the pattern to a better place if possible.
> 
> >	if (timeout == MAX_SCHEDULE_TIMEOUT)
> >		sdt_might_sleep_strong(NULL);
> >	else
> >		sdt_might_sleep_strong_timeout(NULL);
> >	...
> >	sdt_might_sleep_finish();
> > 
> > to various places, it seems so very odd and unmaintainable.
> > 
> > I also recall this giving a fair amount of false positives, are they all fixed?
> 
> Yes. Of course I removed all the false positives we found.
> 
> > Anyway, I'd really like the lockdep people to comment and be involved.
> > We did have a fairly recent case of "lockdep doesn't track page lock
> > dependencies because it fundamentally cannot" issue, so DEPT might fix
> > those kinds of missing dependency analysis. See
> 
> Sure. That's exactly what DEPT works for e.g. PG_locked.
> 
> >	https://lore.kernel.org/lkml/00000000000060d41f05f139aa44@google.com/
> 
> I will reproduce it and share the result.

Hi Torvalds and folks,

I reproduced the issue with DEPT on (after making DEPT work a little
more aggressively for PG_locked) and obtained a DEPT report. I hope
this is a true positive that explains the issue correctly!

Let me remind you guys again: "DEPT is designed exactly for this kind of
deadlock issue involving e.g. PG_locked, PG_writeback and any wait APIs".

I attach the report below and explain how to interpret it at the end.

---

[  227.854322] ===================================================
[  227.854880] DEPT: Circular dependency has been detected.
[  227.855341] 6.2.0-rc1-00025-gb0c20ebf51ac-dirty #28 Not tainted
[  227.855864] ---------------------------------------------------
[  227.856367] summary
[  227.856601] ---------------------------------------------------
[  227.857107] *** DEADLOCK ***

[  227.857551] context A
[  227.857803]     [S] lock(&ni->ni_lock:0)
[  227.858175]     [W] folio_wait_bit_common(PG_locked_map:0)
[  227.858658]     [E] unlock(&ni->ni_lock:0)

[  227.859233] context B
[  227.859484]     [S] (unknown)(PG_locked_map:0)
[  227.859906]     [W] lock(&ni->ni_lock:0)
[  227.860277]     [E] folio_unlock(PG_locked_map:0)

[  227.860883] [S]: start of the event context
[  227.861263] [W]: the wait blocked
[  227.861581] [E]: the event not reachable
[  227.861941] ---------------------------------------------------
[  227.862436] context A's detail
[  227.862738] ---------------------------------------------------
[  227.863242] context A
[  227.863490]     [S] lock(&ni->ni_lock:0)
[  227.863865]     [W] folio_wait_bit_common(PG_locked_map:0)
[  227.864356]     [E] unlock(&ni->ni_lock:0)

[  227.864929] [S] lock(&ni->ni_lock:0):
[  227.865279] [<ffffffff82b396fb>] ntfs3_setattr+0x54b/0xd40
[  227.865803] stacktrace:
[  227.866064]       ntfs3_setattr+0x54b/0xd40
[  227.866469]       notify_change+0xcb3/0x1430
[  227.866875]       do_truncate+0x149/0x210
[  227.867277]       path_openat+0x21a3/0x2a90
[  227.867692]       do_filp_open+0x1ba/0x410
[  227.868110]       do_sys_openat2+0x16d/0x4e0
[  227.868520]       __x64_sys_creat+0xcd/0x120
[  227.868925]       do_syscall_64+0x41/0xc0
[  227.869322]       entry_SYSCALL_64_after_hwframe+0x63/0xcd

[  227.870019] [W] folio_wait_bit_common(PG_locked_map:0):
[  227.870491] [<ffffffff81b228b0>] truncate_inode_pages_range+0x9b0/0xf20
[  227.871074] stacktrace:
[  227.871335]       folio_wait_bit_common+0x5e0/0xaf0
[  227.871796]       truncate_inode_pages_range+0x9b0/0xf20
[  227.872287]       truncate_pagecache+0x67/0x90
[  227.872730]       ntfs3_setattr+0x55a/0xd40
[  227.873152]       notify_change+0xcb3/0x1430
[  227.873578]       do_truncate+0x149/0x210
[  227.873981]       path_openat+0x21a3/0x2a90
[  227.874395]       do_filp_open+0x1ba/0x410
[  227.874803]       do_sys_openat2+0x16d/0x4e0
[  227.875215]       __x64_sys_creat+0xcd/0x120
[  227.875623]       do_syscall_64+0x41/0xc0
[  227.876035]       entry_SYSCALL_64_after_hwframe+0x63/0xcd

[  227.876738] [E] unlock(&ni->ni_lock:0):
[  227.877105] (N/A)
[  227.877331] ---------------------------------------------------
[  227.877850] context B's detail
[  227.878169] ---------------------------------------------------
[  227.878699] context B
[  227.878956]     [S] (unknown)(PG_locked_map:0)
[  227.879381]     [W] lock(&ni->ni_lock:0)
[  227.879774]     [E] folio_unlock(PG_locked_map:0)

[  227.880429] [S] (unknown)(PG_locked_map:0):
[  227.880825] (N/A)

[  227.881249] [W] lock(&ni->ni_lock:0):
[  227.881607] [<ffffffff82b009ec>] attr_data_get_block+0x32c/0x19f0
[  227.882151] stacktrace:
[  227.882421]       attr_data_get_block+0x32c/0x19f0
[  227.882877]       ntfs_get_block_vbo+0x264/0x1330
[  227.883316]       __block_write_begin_int+0x3bd/0x14b0
[  227.883809]       block_write_begin+0xb9/0x4d0
[  227.884231]       ntfs_write_begin+0x27e/0x480
[  227.884650]       generic_perform_write+0x256/0x570
[  227.885155]       __generic_file_write_iter+0x2ae/0x500
[  227.885658]       ntfs_file_write_iter+0x66d/0x1d70
[  227.886136]       do_iter_readv_writev+0x20b/0x3c0
[  227.886596]       do_iter_write+0x188/0x710
[  227.887015]       vfs_iter_write+0x74/0xa0
[  227.887425]       iter_file_splice_write+0x745/0xc90
[  227.887913]       direct_splice_actor+0x114/0x180
[  227.888364]       splice_direct_to_actor+0x33b/0x8b0
[  227.888831]       do_splice_direct+0x1b7/0x280
[  227.889256]       do_sendfile+0xb49/0x1310

[  227.889854] [E] folio_unlock(PG_locked_map:0):
[  227.890265] [<ffffffff81f10222>] generic_write_end+0xf2/0x440
[  227.890788] stacktrace:
[  227.891056]       generic_write_end+0xf2/0x440
[  227.891484]       ntfs_write_end+0x42e/0x980
[  227.891920]       generic_perform_write+0x316/0x570
[  227.892393]       __generic_file_write_iter+0x2ae/0x500
[  227.892899]       ntfs_file_write_iter+0x66d/0x1d70
[  227.893378]       do_iter_readv_writev+0x20b/0x3c0
[  227.893838]       do_iter_write+0x188/0x710
[  227.894253]       vfs_iter_write+0x74/0xa0
[  227.894660]       iter_file_splice_write+0x745/0xc90
[  227.895133]       direct_splice_actor+0x114/0x180
[  227.895585]       splice_direct_to_actor+0x33b/0x8b0
[  227.896082]       do_splice_direct+0x1b7/0x280
[  227.896521]       do_sendfile+0xb49/0x1310
[  227.896926]       __x64_sys_sendfile64+0x1d0/0x210
[  227.897389]       do_syscall_64+0x41/0xc0
[  227.897804]       entry_SYSCALL_64_after_hwframe+0x63/0xcd
[  227.898332] ---------------------------------------------------
[  227.898858] information that might be helpful
[  227.899278] ---------------------------------------------------
[  227.899817] CPU: 1 PID: 8060 Comm: a.out Not tainted 6.2.0-rc1-00025-gb0c20ebf51ac-dirty #28
[  227.900547] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
[  227.901249] Call Trace:
[  227.901527]  <TASK>
[  227.901778]  dump_stack_lvl+0xf2/0x169
[  227.902167]  print_circle.cold+0xca4/0xd28
[  227.902593]  ? lookup_dep+0x240/0x240
[  227.902989]  ? extend_queue+0x223/0x300
[  227.903392]  cb_check_dl+0x1e7/0x260
[  227.903783]  bfs+0x27b/0x610
[  227.904102]  ? print_circle+0x240/0x240
[  227.904493]  ? llist_add_batch+0x180/0x180
[  227.904901]  ? extend_queue_rev+0x300/0x300
[  227.905317]  ? __add_dep+0x60f/0x810
[  227.905689]  add_dep+0x221/0x5b0
[  227.906041]  ? __add_idep+0x310/0x310
[  227.906432]  ? add_iecxt+0x1bc/0xa60
[  227.906821]  ? add_iecxt+0x1bc/0xa60
[  227.907210]  ? add_iecxt+0x1bc/0xa60
[  227.907599]  ? add_iecxt+0x1bc/0xa60
[  227.907997]  __dept_wait+0x600/0x1490
[  227.908392]  ? add_iecxt+0x1bc/0xa60
[  227.908778]  ? truncate_inode_pages_range+0x9b0/0xf20
[  227.909274]  ? check_new_class+0x790/0x790
[  227.909700]  ? dept_enirq_transition+0x519/0x9c0
[  227.910162]  dept_wait+0x159/0x3b0
[  227.910535]  ? truncate_inode_pages_range+0x9b0/0xf20
[  227.911032]  folio_wait_bit_common+0x5e0/0xaf0
[  227.911482]  ? filemap_get_folios_contig+0xa30/0xa30
[  227.911975]  ? dept_enirq_transition+0x519/0x9c0
[  227.912440]  ? lock_is_held_type+0x10e/0x160
[  227.912868]  ? lock_is_held_type+0x11e/0x160
[  227.913300]  truncate_inode_pages_range+0x9b0/0xf20
[  227.913782]  ? truncate_inode_partial_folio+0xba0/0xba0
[  227.914304]  ? setattr_prepare+0x142/0xc40
[  227.914718]  truncate_pagecache+0x67/0x90
[  227.915135]  ntfs3_setattr+0x55a/0xd40
[  227.915535]  ? ktime_get_coarse_real_ts64+0x1e5/0x2f0
[  227.916031]  ? ntfs_extend+0x5c0/0x5c0
[  227.916431]  ? mode_strip_sgid+0x210/0x210
[  227.916861]  ? ntfs_extend+0x5c0/0x5c0
[  227.917262]  notify_change+0xcb3/0x1430
[  227.917661]  ? do_truncate+0x149/0x210
[  227.918061]  do_truncate+0x149/0x210
[  227.918449]  ? file_open_root+0x430/0x430
[  227.918871]  ? process_measurement+0x18c0/0x18c0
[  227.919337]  ? ntfs_file_release+0x230/0x230
[  227.919784]  path_openat+0x21a3/0x2a90
[  227.920185]  ? path_lookupat+0x840/0x840
[  227.920595]  ? dept_enirq_transition+0x519/0x9c0
[  227.921047]  ? lock_is_held_type+0x10e/0x160
[  227.921460]  do_filp_open+0x1ba/0x410
[  227.921839]  ? may_open_dev+0xf0/0xf0
[  227.922214]  ? find_held_lock+0x2d/0x110
[  227.922612]  ? lock_release+0x43c/0x830
[  227.922992]  ? dept_ecxt_exit+0x31a/0x590
[  227.923395]  ? _raw_spin_unlock+0x3b/0x50
[  227.923793]  ? alloc_fd+0x2de/0x6e0
[  227.924148]  do_sys_openat2+0x16d/0x4e0
[  227.924529]  ? __ia32_sys_get_robust_list+0x3b0/0x3b0
[  227.925013]  ? build_open_flags+0x6f0/0x6f0
[  227.925414]  ? dept_enirq_transition+0x519/0x9c0
[  227.925870]  ? dept_enirq_transition+0x519/0x9c0
[  227.926331]  ? lock_is_held_type+0x4e/0x160
[  227.926751]  ? lock_is_held_type+0x4e/0x160
[  227.927168]  __x64_sys_creat+0xcd/0x120
[  227.927561]  ? __x64_compat_sys_openat+0x1f0/0x1f0
[  227.928031]  do_syscall_64+0x41/0xc0
[  227.928416]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
[  227.928912] RIP: 0033:0x7f8b9e4e4469
[  227.929285] Code: 00 f3 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d ff 49 2b 00 f7 d8 64 89 01 48
[  227.930793] RSP: 002b:00007f8b9eea4ef8 EFLAGS: 00000202 ORIG_RAX: 0000000000000055
[  227.931456] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f8b9e4e4469
[  227.932062] RDX: 0000000000737562 RSI: 0000000000000000 RDI: 0000000020000000
[  227.932661] RBP: 00007f8b9eea4f20 R08: 0000000000000000 R09: 0000000000000000
[  227.933252] R10: 0000000000000000 R11: 0000000000000202 R12: 00007fffa75511ee
[  227.933845] R13: 00007fffa75511ef R14: 00007f8b9ee85000 R15: 0000000000000003
[  227.934443]  </TASK>

---

This part is the most important.

[  227.857551] context A
[  227.857803]     [S] lock(&ni->ni_lock:0)
[  227.858175]     [W] folio_wait_bit_common(PG_locked_map:0)
[  227.858658]     [E] unlock(&ni->ni_lock:0)

[  227.859233] context B
[  227.859484]     [S] (unknown)(PG_locked_map:0)
[  227.859906]     [W] lock(&ni->ni_lock:0)
[  227.860277]     [E] folio_unlock(PG_locked_map:0)

[  227.860883] [S]: start of the event context
[  227.861263] [W]: the wait blocked
[  227.861581] [E]: the event not reachable

Dependency 1. A's unlock(&ni->ni_lock:0) cannot happen if A's
              folio_wait_bit_common(PG_locked_map:0) is stuck waiting on
              the folio_unlock(PG_locked_map:0) that would wake A up.

Dependency 2. B's folio_unlock(PG_locked_map:0) cannot happen if B's
              lock(&ni->ni_lock:0) is stuck waiting on the
              unlock(&ni->ni_lock:0) that would release &ni->ni_lock.

So if these two contexts run at the same time, a deadlock is going to
happen. DEPT reports it based on the two dependencies above. You can
check the stacktrace of each [W] and [E] in each context's detail
section.
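
To make those two dependencies concrete, here is a minimal C sketch of
the two paths as I read them from the stack traces above. The function
names are taken from the report; the bodies, the mutex_lock() calls on
&ni->ni_lock and the folio helpers are illustrative only, not the
actual ntfs3 code:

	/* context A: truncate path, &ni->ni_lock held across a folio wait */
	static void context_A(struct ntfs_inode *ni, struct folio *folio)
	{
		mutex_lock(&ni->ni_lock);		/* [S] lock(&ni->ni_lock) */
		folio_wait_bit(folio, PG_locked);	/* [W] waits for B's folio_unlock() */
		mutex_unlock(&ni->ni_lock);		/* [E] never reached */
	}

	/* context B: write path, the folio stays locked while taking ni_lock */
	static void context_B(struct ntfs_inode *ni, struct folio *folio)
	{
		folio_lock(folio);			/* [S] starts the PG_locked event context */
		mutex_lock(&ni->ni_lock);		/* [W] waits for A's unlock(&ni->ni_lock) */
		/* ... attr_data_get_block() and the actual write ... */
		mutex_unlock(&ni->ni_lock);
		folio_unlock(folio);			/* [E] never reached */
	}

Each context holds what the other one is waiting for, so neither [E]
can ever fire: a classic ABBA, except that one side is a page bit
rather than a lock.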

It'd be appreciated if you could share your opinion. I will work on it
and post the next spin after getting back to work in 4 days.

	Byungchul
Byungchul Park Jan. 21, 2023, 3:28 a.m. UTC | #16
On Thu, Jan 19, 2023 at 07:07:59PM -0800, Boqun Feng wrote:
> On Thu, Jan 19, 2023 at 06:23:49PM -0800, Boqun Feng wrote:
> > On Fri, Jan 20, 2023 at 10:51:45AM +0900, Byungchul Park wrote:

[...]

> > > T0		T1		T2
> > > --		--		--
> > > unfair_read_lock(A);
> > >			write_lock(B);
> > >					write_lock(A);
> > > write_lock(B);
> > >			fair_read_lock(A);
> > > write_unlock(B);
> > > read_unlock(A);
> > >			read_unlock(A);
> > >			write_unlock(B);
> > >					write_unlock(A);
> > > 
> > > T0: read_unlock(A) cannot happen if write_lock(B) is stuck by a B owner
> > >     not doing either write_unlock(B) or read_unlock(B). In other words:
> > > 
> > >       1. read_unlock(A) happening depends on write_unlock(B) happening.
> > >       2. read_unlock(A) happening depends on read_unlock(B) happening.
> > > 
> > > T1: write_unlock(B) cannot happen if fair_read_lock(A) is stuck by a A
> > >     owner not doing either write_unlock(A) or read_unlock(A). In other
> > >     words:
> > > 
> > >       3. write_unlock(B) happening depends on write_unlock(A) happening.
> > >       4. write_unlock(B) happening depends on read_unlock(A) happening.
> > > 
> > > 1, 2, 3 and 4 give the following dependencies:
> > > 
> > >     1. read_unlock(A) -> write_unlock(B)
> > >     2. read_unlock(A) -> read_unlock(B)
> > >     3. write_unlock(B) -> write_unlock(A)
> > >     4. write_unlock(B) -> read_unlock(A)
> > > 
> > > With 1 and 4, there's a circular dependency so DEPT definitely report
> > > this as a problem.
> > > 
> > > REMIND: DEPT focuses on waits and events.
> > 
> > Do you have the test cases showing DEPT can detect this?
> > 
> 
> Just tried the following on your latest GitHub branch, I commented all
> but one deadlock case. Lockdep CAN detect it but DEPT CANNOT detect it.
> Feel free to double check.

I tried the 'queued read lock' test cases with DEPT on. I can see DEPT
detect and report it. But yeah, it's too verbose now. That's because
DEPT is not aware of the test environment, so it just works hard to
report every case.

To make DEPT work better with the selftest, some work is needed. I
will work on it later, or please feel free to work on it.

The corresponding report is the following.

---

[    4.583997] ===================================================
[    4.585094] DEPT: Circular dependency has been detected.
[    4.585620] 6.0.0-00023-g331e0412f735 #2 Tainted: G        W         
[    4.586347] ---------------------------------------------------
[    4.586942] summary
[    4.587161] ---------------------------------------------------
[    4.587757] *** DEADLOCK ***
[    4.587757] 
[    4.588198] context A
[    4.588434]     [S] lock(&rwlock_A:0)
[    4.588804]     [W] lock(&rwlock_B:0)
[    4.589175]     [E] unlock(&rwlock_A:0)
[    4.589565] 
[    4.589727] context B
[    4.589963]     [S] lock(&rwlock_B:0)
[    4.590375]     [W] lock(&rwlock_A:0)
[    4.590749]     [E] unlock(&rwlock_B:0)
[    4.591136] 
[    4.591295] [S]: start of the event context
[    4.591716] [W]: the wait blocked
[    4.592049] [E]: the event not reachable
[    4.592443] ---------------------------------------------------
[    4.593037] context A's detail
[    4.593351] ---------------------------------------------------
[    4.593944] context A
[    4.594182]     [S] lock(&rwlock_A:0)
[    4.594577]     [W] lock(&rwlock_B:0)
[    4.594952]     [E] unlock(&rwlock_A:0)
[    4.595341] 
[    4.595501] [S] lock(&rwlock_A:0):
[    4.595848] [<ffffffff814eb244>] queued_read_lock_hardirq_ER_rE+0xf4/0x170
[    4.596547] stacktrace:
[    4.596797]       _raw_read_lock+0xcf/0x110
[    4.597215]       queued_read_lock_hardirq_ER_rE+0xf4/0x170
[    4.597766]       dotest+0x30/0x7bc
[    4.598118]       locking_selftest+0x2c6f/0x2ead
[    4.598602]       start_kernel+0x5aa/0x6d5
[    4.599017]       secondary_startup_64_no_verify+0xe0/0xeb
[    4.599562] 
[    4.599721] [W] lock(&rwlock_B:0):
[    4.600064] [<ffffffff814eb250>] queued_read_lock_hardirq_ER_rE+0x100/0x170
[    4.600823] stacktrace:
[    4.601075]       dept_wait+0x12c/0x1d0
[    4.601465]       _raw_write_lock+0xa0/0xd0
[    4.601892]       queued_read_lock_hardirq_ER_rE+0x100/0x170
[    4.602496]       dotest+0x30/0x7bc
[    4.602854]       locking_selftest+0x2c6f/0x2ead
[    4.603333]       start_kernel+0x5aa/0x6d5
[    4.603745]       secondary_startup_64_no_verify+0xe0/0xeb
[    4.604298] 
[    4.604458] [E] unlock(&rwlock_A:0):
[    4.604820] (N/A)
[    4.605023] ---------------------------------------------------
[    4.605617] context B's detail
[    4.605930] ---------------------------------------------------
[    4.606551] context B
[    4.606790]     [S] lock(&rwlock_B:0)
[    4.607163]     [W] lock(&rwlock_A:0)
[    4.607534]     [E] unlock(&rwlock_B:0)
[    4.607920] 
[    4.608080] [S] lock(&rwlock_B:0):
[    4.608427] [<ffffffff814eb3b4>] queued_read_lock_hardirq_RE_Er+0xf4/0x170
[    4.609113] stacktrace:
[    4.609366]       _raw_write_lock+0xc3/0xd0
[    4.609788]       queued_read_lock_hardirq_RE_Er+0xf4/0x170
[    4.610371]       dotest+0x30/0x7bc
[    4.610730]       locking_selftest+0x2c41/0x2ead
[    4.611195]       start_kernel+0x5aa/0x6d5
[    4.611615]       secondary_startup_64_no_verify+0xe0/0xeb
[    4.612164] 
[    4.612325] [W] lock(&rwlock_A:0):
[    4.612671] [<ffffffff814eb3c0>] queued_read_lock_hardirq_RE_Er+0x100/0x170
[    4.613369] stacktrace:
[    4.613622]       _raw_read_lock+0xac/0x110
[    4.614047]       queued_read_lock_hardirq_RE_Er+0x100/0x170
[    4.614652]       dotest+0x30/0x7bc
[    4.615007]       locking_selftest+0x2c41/0x2ead
[    4.615468]       start_kernel+0x5aa/0x6d5
[    4.615879]       secondary_startup_64_no_verify+0xe0/0xeb
[    4.616607] 
[    4.616769] [E] unlock(&rwlock_B:0):
[    4.617132] (N/A)
[    4.617336] ---------------------------------------------------
[    4.617927] information that might be helpful
[    4.618390] ---------------------------------------------------
[    4.618981] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G        W          6.0.0-00023-g331e0412f735 #2
[    4.619886] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
[    4.620699] Call Trace:
[    4.620958]  <TASK>
[    4.621182]  dump_stack_lvl+0x5d/0x81
[    4.621561]  print_circle.cold+0x52b/0x545
[    4.621983]  ? print_circle+0xd0/0xd0
[    4.622385]  cb_check_dl+0x58/0x60
[    4.622737]  bfs+0xba/0x170
[    4.623029]  add_dep+0x85/0x170
[    4.623355]  ? from_pool+0x4c/0x160
[    4.623714]  __dept_wait+0x1fd/0x600
[    4.624081]  ? queued_read_lock_hardirq_ER_rE+0x100/0x170
[    4.624628]  ? rcu_read_lock_held_common+0x9/0x50
[    4.625108]  ? queued_read_lock_hardirq_ER_rE+0x100/0x170
[    4.625652]  dept_wait+0x12c/0x1d0
[    4.626000]  _raw_write_lock+0xa0/0xd0
[    4.626417]  queued_read_lock_hardirq_ER_rE+0x100/0x170
[    4.626951]  dotest+0x30/0x7bc
[    4.627270]  locking_selftest+0x2c6f/0x2ead
[    4.627702]  start_kernel+0x5aa/0x6d5
[    4.628081]  secondary_startup_64_no_verify+0xe0/0xeb
[    4.628597]  </TASK>
---

The most important part is the following.

[    4.588198] context A
[    4.588434]     [S] lock(&rwlock_A:0)
[    4.588804]     [W] lock(&rwlock_B:0)
[    4.589175]     [E] unlock(&rwlock_A:0)
[    4.589565] 
[    4.589727] context B
[    4.589963]     [S] lock(&rwlock_B:0)
[    4.590375]     [W] lock(&rwlock_A:0)
[    4.590749]     [E] unlock(&rwlock_B:0)

As I told you, DEPT treats a queued lock as a normal-type lock, no
matter whether it's a read lock. That's why it prints just
'lock(&rwlock_A:0)' instead of 'read_lock(&rwlock_A:0)'. If needed, I'm
going to change the format.

I checked the selftest code and found that LOCK(B) is transformed like:

	LOCK(B) -> WL(B) -> write_lock(&rwlock_B)

That's why '&rwlock_B' is printed instead of just 'B', JFYI.
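
For reference, this is roughly how the shortcut macros in
lib/locking-selftest.c get there, as I read the file (the exact header
that picks which variant LOCK maps to may differ, so take this as a
sketch rather than the literal source):

	#define WL(x)	write_lock(&rwlock_##x)
	#define WU(x)	write_unlock(&rwlock_##x)

	/* a per-testcase header selects the variant, e.g.: */
	#undef LOCK
	#define LOCK	WL

	LOCK(B);	/* token-pastes to write_lock(&rwlock_B) */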

Plus, for your information, you should turn on CONFIG_DEPT to use it.

	Byungchul
Boqun Feng Jan. 21, 2023, 3:44 a.m. UTC | #17
On Sat, Jan 21, 2023 at 12:28:14PM +0900, Byungchul Park wrote:
> On Thu, Jan 19, 2023 at 07:07:59PM -0800, Boqun Feng wrote:
> > On Thu, Jan 19, 2023 at 06:23:49PM -0800, Boqun Feng wrote:
> > > On Fri, Jan 20, 2023 at 10:51:45AM +0900, Byungchul Park wrote:
> 
> [...]
> 
> > > > T0		T1		T2
> > > > --		--		--
> > > > unfair_read_lock(A);
> > > >			write_lock(B);
> > > >					write_lock(A);
> > > > write_lock(B);
> > > >			fair_read_lock(A);
> > > > write_unlock(B);
> > > > read_unlock(A);
> > > >			read_unlock(A);
> > > >			write_unlock(B);
> > > >					write_unlock(A);
> > > > 
> > > > T0: read_unlock(A) cannot happen if write_lock(B) is stuck by a B owner
> > > >     not doing either write_unlock(B) or read_unlock(B). In other words:
> > > > 
> > > >       1. read_unlock(A) happening depends on write_unlock(B) happening.
> > > >       2. read_unlock(A) happening depends on read_unlock(B) happening.
> > > > 
> > > > T1: write_unlock(B) cannot happen if fair_read_lock(A) is stuck by a A
> > > >     owner not doing either write_unlock(A) or read_unlock(A). In other
> > > >     words:
> > > > 
> > > >       3. write_unlock(B) happening depends on write_unlock(A) happening.
> > > >       4. write_unlock(B) happening depends on read_unlock(A) happening.
> > > > 
> > > > 1, 2, 3 and 4 give the following dependencies:
> > > > 
> > > >     1. read_unlock(A) -> write_unlock(B)
> > > >     2. read_unlock(A) -> read_unlock(B)
> > > >     3. write_unlock(B) -> write_unlock(A)
> > > >     4. write_unlock(B) -> read_unlock(A)
> > > > 
> > > > With 1 and 4, there's a circular dependency so DEPT definitely report
> > > > this as a problem.
> > > > 
> > > > REMIND: DEPT focuses on waits and events.
> > > 
> > > Do you have the test cases showing DEPT can detect this?
> > > 
> > 
> > Just tried the following on your latest GitHub branch, I commented all
> > but one deadlock case. Lockdep CAN detect it but DEPT CANNOT detect it.
> > Feel free to double check.
> 
> I tried the 'queued read lock' test cases with DEPT on. I can see DEPT
> detect and report it. But yeah.. it's too verbose now. It's because DEPT
> is not aware of the test environment so it's just working hard to report
> every case.
> 
> To make DEPT work with the selftest better, some works are needed. I
> will work on it later or you please work on it.
> 
> The corresponding report is the following.
> 
[...]
> [    4.593037] context A's detail
> [    4.593351] ---------------------------------------------------
> [    4.593944] context A
> [    4.594182]     [S] lock(&rwlock_A:0)
> [    4.594577]     [W] lock(&rwlock_B:0)
> [    4.594952]     [E] unlock(&rwlock_A:0)
> [    4.595341] 
> [    4.595501] [S] lock(&rwlock_A:0):
> [    4.595848] [<ffffffff814eb244>] queued_read_lock_hardirq_ER_rE+0xf4/0x170
> [    4.596547] stacktrace:
> [    4.596797]       _raw_read_lock+0xcf/0x110
> [    4.597215]       queued_read_lock_hardirq_ER_rE+0xf4/0x170
> [    4.597766]       dotest+0x30/0x7bc
> [    4.598118]       locking_selftest+0x2c6f/0x2ead
> [    4.598602]       start_kernel+0x5aa/0x6d5
> [    4.599017]       secondary_startup_64_no_verify+0xe0/0xeb
> [    4.599562] 
[...]
> [    4.608427] [<ffffffff814eb3b4>] queued_read_lock_hardirq_RE_Er+0xf4/0x170
> [    4.609113] stacktrace:
> [    4.609366]       _raw_write_lock+0xc3/0xd0
> [    4.609788]       queued_read_lock_hardirq_RE_Er+0xf4/0x170
> [    4.610371]       dotest+0x30/0x7bc
> [    4.610730]       locking_selftest+0x2c41/0x2ead
> [    4.611195]       start_kernel+0x5aa/0x6d5
> [    4.611615]       secondary_startup_64_no_verify+0xe0/0xeb
> [    4.612164] 
> [    4.612325] [W] lock(&rwlock_A:0):
> [    4.612671] [<ffffffff814eb3c0>] queued_read_lock_hardirq_RE_Er+0x100/0x170
> [    4.613369] stacktrace:
> [    4.613622]       _raw_read_lock+0xac/0x110
> [    4.614047]       queued_read_lock_hardirq_RE_Er+0x100/0x170
> [    4.614652]       dotest+0x30/0x7bc
> [    4.615007]       locking_selftest+0x2c41/0x2ead
> [    4.615468]       start_kernel+0x5aa/0x6d5
> [    4.615879]       secondary_startup_64_no_verify+0xe0/0xeb
> [    4.616607] 
[...]

> As I told you, DEPT treats a queued lock as a normal type lock, no
> matter whether it's a read lock. That's why it prints just
> 'lock(&rwlock_A:0)' instead of 'read_lock(&rwlock_A:0)'. If needed, I'm
> gonna change the format.
> 
> I checked the selftest code and found, LOCK(B) is transformed like:
> 
> 	LOCK(B) -> WL(B) -> write_lock(&rwlock_B)
> 
> That's why '&rwlock_B' is printed instead of just 'B', JFYI.
> 

Nah, your output shows that you've run at least both functions

	queued_read_lock_hardirq_RE_Er()
	queued_read_lock_hardirq_ER_rE()

but if you apply my diff

	https://lore.kernel.org/lkml/Y8oFj9A19cw3enHB@boqun-archlinux/

you should run only one test:

	queued_read_lock_hardirq_RE_Er()

One of the reasons that DEPT "detects" this is that DEPT doesn't reset
between tests, so old dependencies from a previous run get carried over.


> Plus, for your information, you should turn on CONFIG_DEPT to use it.
> 

Yes, I turned that config on.

Regards,
Boqun

> 	Byungchul
Byungchul Park Jan. 21, 2023, 4:47 a.m. UTC | #18
Boqun wrote:
> On Sat, Jan 21, 2023 at 12:28:14PM +0900, Byungchul Park wrote:
> > On Thu, Jan 19, 2023 at 07:07:59PM -0800, Boqun Feng wrote:
> > > On Thu, Jan 19, 2023 at 06:23:49PM -0800, Boqun Feng wrote:
> > > > On Fri, Jan 20, 2023 at 10:51:45AM +0900, Byungchul Park wrote:
> > 
> > [...]
> > 
> > > > > T0		T1		T2
> > > > > --		--		--
> > > > > unfair_read_lock(A);
> > > > >			write_lock(B);
> > > > >					write_lock(A);
> > > > > write_lock(B);
> > > > >			fair_read_lock(A);
> > > > > write_unlock(B);
> > > > > read_unlock(A);
> > > > >			read_unlock(A);
> > > > >			write_unlock(B);
> > > > >					write_unlock(A);
> > > > > 
> > > > > T0: read_unlock(A) cannot happen if write_lock(B) is stuck by a B owner
> > > > >     not doing either write_unlock(B) or read_unlock(B). In other words:
> > > > > 
> > > > >       1. read_unlock(A) happening depends on write_unlock(B) happening.
> > > > >       2. read_unlock(A) happening depends on read_unlock(B) happening.
> > > > > 
> > > > > T1: write_unlock(B) cannot happen if fair_read_lock(A) is stuck by a A
> > > > >     owner not doing either write_unlock(A) or read_unlock(A). In other
> > > > >     words:
> > > > > 
> > > > >       3. write_unlock(B) happening depends on write_unlock(A) happening.
> > > > >       4. write_unlock(B) happening depends on read_unlock(A) happening.
> > > > > 
> > > > > 1, 2, 3 and 4 give the following dependencies:
> > > > > 
> > > > >     1. read_unlock(A) -> write_unlock(B)
> > > > >     2. read_unlock(A) -> read_unlock(B)
> > > > >     3. write_unlock(B) -> write_unlock(A)
> > > > >     4. write_unlock(B) -> read_unlock(A)
> > > > > 
> > > > > With 1 and 4, there's a circular dependency so DEPT definitely report
> > > > > this as a problem.
> > > > > 
> > > > > REMIND: DEPT focuses on waits and events.
> > > > 
> > > > Do you have the test cases showing DEPT can detect this?
> > > > 
> > > 
> > > Just tried the following on your latest GitHub branch, I commented all
> > > but one deadlock case. Lockdep CAN detect it but DEPT CANNOT detect it.
> > > Feel free to double check.
> > 
> > I tried the 'queued read lock' test cases with DEPT on. I can see DEPT
> > detect and report it. But yeah.. it's too verbose now. It's because DEPT
> > is not aware of the test environment so it's just working hard to report
> > every case.
> > 
> > To make DEPT work with the selftest better, some works are needed. I
> > will work on it later or you please work on it.
> > 
> > The corresponding report is the following.
> > 
> [...]
> > [    4.593037] context A's detail
> > [    4.593351] ---------------------------------------------------
> > [    4.593944] context A
> > [    4.594182]     [S] lock(&rwlock_A:0)
> > [    4.594577]     [W] lock(&rwlock_B:0)
> > [    4.594952]     [E] unlock(&rwlock_A:0)
> > [    4.595341] 
> > [    4.595501] [S] lock(&rwlock_A:0):
> > [    4.595848] [<ffffffff814eb244>] queued_read_lock_hardirq_ER_rE+0xf4/0x170
> > [    4.596547] stacktrace:
> > [    4.596797]       _raw_read_lock+0xcf/0x110
> > [    4.597215]       queued_read_lock_hardirq_ER_rE+0xf4/0x170
> > [    4.597766]       dotest+0x30/0x7bc
> > [    4.598118]       locking_selftest+0x2c6f/0x2ead
> > [    4.598602]       start_kernel+0x5aa/0x6d5
> > [    4.599017]       secondary_startup_64_no_verify+0xe0/0xeb
> > [    4.599562] 
> [...]
> > [    4.608427] [<ffffffff814eb3b4>] queued_read_lock_hardirq_RE_Er+0xf4/0x170
> > [    4.609113] stacktrace:
> > [    4.609366]       _raw_write_lock+0xc3/0xd0
> > [    4.609788]       queued_read_lock_hardirq_RE_Er+0xf4/0x170
> > [    4.610371]       dotest+0x30/0x7bc
> > [    4.610730]       locking_selftest+0x2c41/0x2ead
> > [    4.611195]       start_kernel+0x5aa/0x6d5
> > [    4.611615]       secondary_startup_64_no_verify+0xe0/0xeb
> > [    4.612164] 
> > [    4.612325] [W] lock(&rwlock_A:0):
> > [    4.612671] [<ffffffff814eb3c0>] queued_read_lock_hardirq_RE_Er+0x100/0x170
> > [    4.613369] stacktrace:
> > [    4.613622]       _raw_read_lock+0xac/0x110
> > [    4.614047]       queued_read_lock_hardirq_RE_Er+0x100/0x170
> > [    4.614652]       dotest+0x30/0x7bc
> > [    4.615007]       locking_selftest+0x2c41/0x2ead
> > [    4.615468]       start_kernel+0x5aa/0x6d5
> > [    4.615879]       secondary_startup_64_no_verify+0xe0/0xeb
> > [    4.616607] 
> [...]
> 
> > As I told you, DEPT treats a queued lock as a normal type lock, no
> > matter whether it's a read lock. That's why it prints just
> > 'lock(&rwlock_A:0)' instead of 'read_lock(&rwlock_A:0)'. If needed, I'm
> > gonna change the format.
> > 
> > I checked the selftest code and found, LOCK(B) is transformed like:
> > 
> > 	LOCK(B) -> WL(B) -> write_lock(&rwlock_B)
> > 
> > That's why '&rwlock_B' is printed instead of just 'B', JFYI.
> > 
> 
> Nah, you output shows that you've run at least both function
> 
> 	queued_read_lock_hardirq_RE_Er()
> 	queued_read_lock_hardirq_ER_rE()

Indeed! I'm sorry for that.

> but if you apply my diff
> 
> 	https://lore.kernel.org/lkml/Y8oFj9A19cw3enHB@boqun-archlinux/
> 
> you should only run
> 
> 	queued_read_lock_hardirq_RE_Er()
> 
> one test.

I checked it. DEPT doesn't assume that a rwlock switches between a
recursive read lock and a non-recursive read lock at run time. But
maybe it does switch, since a read lock needs to become a recursive one
in interrupt context.

By forcing read_lock_is_recursive() to always return false, DEPT works
as we expect. Otherwise, it doesn't.
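
For clarity, by "forcing" I mean a throwaway hack along these lines, on
top of read_lock_is_recursive() (which lives in kernel/locking/lockdep.c,
if I recall correctly); illustrative only and not meant for merging:

	/* debugging hack: treat every read lock as non-recursive */
	static bool read_lock_is_recursive(void)
	{
		return false;
	}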

Probably I need to fix it.

Thanks.

	Byungchul