diff mbox

locking/lockdep: Add CONFIG_LOCKDEP_AGGRESSIVE

Message ID 20171211035017.32678-1-tytso@mit.edu (mailing list archive)
State New, archived
Headers show

Commit Message

Theodore Ts'o Dec. 11, 2017, 3:50 a.m. UTC
CONFIG_LOCKDEP_CROSSRELEASE and CONFIG_LOCKDEP_COMPLETIONS can result
in a large number of false positives because lockdep doesn't
understand how to deal with multiple stacked loop or MD devices.

Until someone can figure out how to automatically add annotations to
all file system and storage devices --- hopefully without forcing
developers to insert a large number of calls to undocumented lockdep
macros into the the loop device, every single file system, and every
single device-mapper backend --- let's add an option to allow these
lockdep features to be disabled.  Otherwise, many file system
developers will disable LOCKDEP entirely since it results in far too
many false positives when trying to use xfstests.

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: Byungchul Park <byungchul.park@lge.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kernel-team@lge.com
Cc: linux-block@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org
Cc: oleg@redhat.com
Cc: tj@kernel.org
---
 lib/Kconfig.debug | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

Comments

Theodore Ts'o Dec. 11, 2017, 3:57 a.m. UTC | #1
On Sun, Dec 10, 2017 at 10:50:17PM -0500, Theodore Ts'o wrote:
> CONFIG_LOCKDEP_CROSSRELEASE and CONFIG_LOCKDEP_COMPLETIONS can result
> in a large number of false positives because lockdep doesn't
> understand how to deal with multiple stacked loop or MD devices.
> 
> Until someone can figure out how to automatically add annotations to
> all file system and storage devices --- hopefully without forcing
> developers to insert a large number of calls to undocumented lockdep
> macros into the the loop device, every single file system, and every
> single device-mapper backend --- let's add an option to allow these
> lockdep features to be disabled.  Otherwise, many file system
> developers will disable LOCKDEP entirely since it results in far too
> many false positives when trying to use xfstests.

Note --- this is going to be in the ext4 git tree because otherwise
using Lockdep would greatly interfere with ext4 development --- and
I'd prefer not to completely give up on Lockdep as a lost cause.  As
such, it will appear in linux-next.

I'll drop it from the ext4 git branch before I send a pull request to
Linus, unless I get an OK for me to merge this patch to mainline via
the ext4 git tree.

Cheers,

						- Ted
Linus Torvalds Dec. 11, 2017, 9:06 p.m. UTC | #2
On Sun, Dec 10, 2017 at 7:50 PM, Theodore Ts'o <tytso@mit.edu> wrote:
> CONFIG_LOCKDEP_CROSSRELEASE and CONFIG_LOCKDEP_COMPLETIONS can result
> in a large number of false positives because lockdep doesn't
> understand how to deal with multiple stacked loop or MD devices.

Guys, can we just remove this nasty crud already?

It's not working. Give it up. It was complex, it was buggy, it was slow.

Now it's causing people to disable lockdep entirely, or play these
kinds of games in unrelated trees.

It's time to give up on bad debugging, and definitely _not_ enable it
by default for PROVE_LOCKING.

Ted: in the meantime, don't use PROVE_LOCKING. Just use
DEBUG_LOCK_ALLOC instead. Please drop this patch from your tree.

                      Linus
Theodore Ts'o Dec. 12, 2017, 1:56 a.m. UTC | #3
On Mon, Dec 11, 2017 at 01:06:55PM -0800, Linus Torvalds wrote:
> On Sun, Dec 10, 2017 at 7:50 PM, Theodore Ts'o <tytso@mit.edu> wrote:
> > CONFIG_LOCKDEP_CROSSRELEASE and CONFIG_LOCKDEP_COMPLETIONS can result
> > in a large number of false positives because lockdep doesn't
> > understand how to deal with multiple stacked loop or MD devices.
> 
> Guys, can we just remove this nasty crud already?
> 
> It's not working. Give it up. It was complex, it was buggy, it was slow.
> 
> Now it's causing people to disable lockdep entirely, or play these
> kinds of games in unrelated trees.
> 
> It's time to give up on bad debugging, and definitely _not_ enable it
> by default for PROVE_LOCKING.

To be fair to Byungchul, I think it *can* be valid for finding some
classes of bugs.  It's just a disaster for anything to do with storage.

I crafted this patch as something something which I thought *could* be
a path forward; it disables it by default, and gives a warning about
how it could cause a lot of pain for storage developers, but if other
kernel devs want to use it to potentially find problem in their
networking or wifi drivers --- sure, why not?  Just make it be
something *optional*.

If people really want to make this work for storage, what I think we
would need is variants of spin_lock_init(), mutex_init(), etc., which
take a struct super or a struct block device, with proper
documentation so that people don't have to struggle with undocumented
C preprocessor macros where every single time I need to mess with
lockdep annotations, I have to try figure out exactly what is a class
and subclass.

So in fact, what I was really hoping for was that some variant of this
patch would end up in the sched tree, and get pushed to you v4.15-rcX
patch as a regression fix, and I'd drop it from the ext4 tree.

	      	   	      	       	   - Ted
Byungchul Park Dec. 12, 2017, 5:20 a.m. UTC | #4
On 12/12/2017 6:06 AM, Linus Torvalds wrote:
> On Sun, Dec 10, 2017 at 7:50 PM, Theodore Ts'o <tytso@mit.edu> wrote:
>> CONFIG_LOCKDEP_CROSSRELEASE and CONFIG_LOCKDEP_COMPLETIONS can result
>> in a large number of false positives because lockdep doesn't
>> understand how to deal with multiple stacked loop or MD devices.
> 
> Guys, can we just remove this nasty crud already?
> 
> It's not working. Give it up. It was complex, it was buggy, it was slow.

Sorry to hear that, but it works well and fortunately it has not
been buggy until now, of course, it has been wrongly accused
twice though. Furthormore, it's not slow now since it was fixed.
You seem to say that w/o foundation but emotionally.

The *problem* is false positives, since locks and waiters in
kernel are not classified properly, at the moment, which is just
a fact that is not related to cross-release stuff at all. IOW,
that would be useful once all locks and waiters are classified
correctly. It might take time but the classifying is a must-do
we have to keep doing.

I admit to make it optional for now, but I don't see why you
want to remove it entierly.
> Now it's causing people to disable lockdep entirely, or play these
> kinds of games in unrelated trees.
> 
> It's time to give up on bad debugging, and definitely _not_ enable it
> by default for PROVE_LOCKING.
> 
> Ted: in the meantime, don't use PROVE_LOCKING. Just use
> DEBUG_LOCK_ALLOC instead. Please drop this patch from your tree.
> 
>                        Linus
>
Theodore Ts'o Dec. 12, 2017, 1:03 p.m. UTC | #5
On Tue, Dec 12, 2017 at 02:20:32PM +0900, Byungchul Park wrote:
> 
> The *problem* is false positives, since locks and waiters in
> kernel are not classified properly, at the moment, which is just
> a fact that is not related to cross-release stuff at all. IOW,
> that would be useful once all locks and waiters are classified
> correctly. It might take time but the classifying is a must-do
> we have to keep doing.

This is the wrong attitude.  The reason why LOCKDEP was so powerful
was because it automatically classified locks, instead of requiring
developers to document the locking hierarchy.  Requiring developers to
have to document and classified locks --- especially when the d*mned
mechanisms for doign so are so primitive and not even documented ---
is a complete non-strarter.

So, ***NO***, WE do not have to do anything.  We can just disable
Lockdep instead.  You can not and must not transfer responsibility for
documenting locks to the subsystem maintainers, as if it is some sort
of bug that the locks are not "classified correctly".  The fact that
your new technique requires clasification is a BUG, and goes against
the original design principle of LOCKDEP in the first place.

> I admit to make it optional for now, but I don't see why you
> want to remove it entierly.

So are you willing to take my patch?  Or give me permission to keep in
the ext4 tree?

						- Ted
Matthew Wilcox (Oracle) Dec. 12, 2017, 3:39 p.m. UTC | #6
On Tue, Dec 12, 2017 at 08:03:43AM -0500, Theodore Ts'o wrote:
> On Tue, Dec 12, 2017 at 02:20:32PM +0900, Byungchul Park wrote:
> > The *problem* is false positives, since locks and waiters in
> > kernel are not classified properly, at the moment, which is just
> > a fact that is not related to cross-release stuff at all. IOW,
> > that would be useful once all locks and waiters are classified
> > correctly. It might take time but the classifying is a must-do
> > we have to keep doing.
> 
> This is the wrong attitude.  The reason why LOCKDEP was so powerful
> was because it automatically classified locks, instead of requiring
> developers to document the locking hierarchy.  Requiring developers to
> have to document and classified locks --- especially when the d*mned
> mechanisms for doign so are so primitive and not even documented ---
> is a complete non-strarter.

That's not fair.  We had to annotate i_mutex nesting, for example, and
several other places.  crosslock doesn't change anything in this respect,
it's just that the case that you hit every damn day as a filesystem
developer is something that the normal person almost never does.

> So are you willing to take my patch?  Or give me permission to keep in
> the ext4 tree?

He sent a patch earlier ...
Linus Torvalds Dec. 12, 2017, 5 p.m. UTC | #7
On Mon, Dec 11, 2017 at 9:20 PM, Byungchul Park <byungchul.park@lge.com> wrote:
>
> The *problem* is false positives, since locks and waiters in
> kernel are not classified properly

So the problem is that those false positives apparently end up being a
big deal for the filesystem people.

I personally don't think the code itself has to be removed, but I do
think that it should never have been added on as part of the generic
lock proving, and should always have been a separate config option.

I also feel that you dismiss "false positives" much too easily. A
false positive is a big problem - because it makes people ignore the
real cases (or just disable the functionality entirely).

It's why I am very quick to disable compiler warnings that have false
positives, for example. Just a couple of "harmless" false positive
warnings will poison the real warnings for people because they'll get
used to seeing warnings while building, and no longer actually look at
them.

                 Linus
Byungchul Park Dec. 13, 2017, 5:33 a.m. UTC | #8
On 12/12/2017 10:03 PM, Theodore Ts'o wrote:
> On Tue, Dec 12, 2017 at 02:20:32PM +0900, Byungchul Park wrote:
>>
>> The *problem* is false positives, since locks and waiters in
>> kernel are not classified properly, at the moment, which is just
>> a fact that is not related to cross-release stuff at all. IOW,
>> that would be useful once all locks and waiters are classified
>> correctly. It might take time but the classifying is a must-do
>> we have to keep doing.
> 
> This is the wrong attitude.  The reason why LOCKDEP was so powerful

I understand you why you told me like this, but you seem to
mis-understand me. It's not a matter of attitudes. I also
think any tools should not bother others as far as possible.

> was because it automatically classified locks, instead of requiring

Right. Original lockdep tries to classify locks automatically,
but it can never achieve it with the current way. Conceptually,
there are still many places where locks are not classified
properly yet, but enough for lockdep to work for now.

> developers to document the locking hierarchy.  Requiring developers to
> have to document and classified locks --- especially when the d*mned

I also think requesting it to others instead of solving it
internally is bad. Please don't mis-understand me. I just
wanted to say that experts of specific domains can do it best.

I know you and fs folks have worked on it as enough as original
lockdep works, even though many other locks are still left
classified unproperly.

> mechanisms for doign so are so primitive and not even documented ---
> is a complete non-strarter.
> 
> So, ***NO***, WE do not have to do anything.  We can just disable

Right. I also think the best way is to classify locks more
perfectly and automatically w/o bothering others - but we cannot
achieve it in current way - relying on caller sites.

> Lockdep instead.  You can not and must not transfer responsibility for
> documenting locks to the subsystem maintainers, as if it is some sort

I said it can be done by each sub-system expert best, but it's not
about responsibility. I also think we should not transfer
respensibility to them. Sorry for confusing you.

> of bug that the locks are not "classified correctly".  The fact that

However, "locks are not classified correctly, yet" is a fact. Of
course, it has been just annotated as enough as lockdep works for
now. But many things are still left.

> your new technique requires clasification is a BUG, and goes against

Sorry to say it, but I don't think so, it's not a bug. To classify
locks more precisely just becomes required. For *example*,

    Lockdep automatically classifies locks 30%.
    Manually we have classified locks 20% more precisely until now.
    Cross-release requires to classify locks 20% more precisely.
    However, we still have 30% locks classified unproperly,
    but it's enough to work and those don't affect lockdep's work.

> the original design principle of LOCKDEP in the first place.

It's never against the original principle but following exactly
same as original principle.

>> I admit to make it optional for now, but I don't see why you
>> want to remove it entierly.
> 
> So are you willing to take my patch?  Or give me permission to keep in
> the ext4 tree?

I agree with your patch. Or I want another to be taken which makes
cross-release optional. Whatever.

Thank you for your opinion.

> 
> 						- Ted
>
Byungchul Park Dec. 13, 2017, 5:38 a.m. UTC | #9
On 12/13/2017 2:00 AM, Linus Torvalds wrote:
> On Mon, Dec 11, 2017 at 9:20 PM, Byungchul Park <byungchul.park@lge.com> wrote:
>>
>> The *problem* is false positives, since locks and waiters in
>> kernel are not classified properly
> 
> So the problem is that those false positives apparently end up being a
> big deal for the filesystem people.
> 
> I personally don't think the code itself has to be removed, but I do
> think that it should never have been added on as part of the generic
> lock proving, and should always have been a separate config option.

I admit it.

> I also feel that you dismiss "false positives" much too easily. A

I don't dismiss the ones easily...

Anyway, I mostly agree with your whole opinion.

Thanks for replying.

> false positive is a big problem - because it makes people ignore the
> real cases (or just disable the functionality entirely).
> 
> It's why I am very quick to disable compiler warnings that have false
> positives, for example. Just a couple of "harmless" false positive
> warnings will poison the real warnings for people because they'll get
> used to seeing warnings while building, and no longer actually look at
> them.
> 
>                   Linus
>
diff mbox

Patch

diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 947d3e2ed5c2..acae7ba99c16 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1099,8 +1099,8 @@  config PROVE_LOCKING
 	select DEBUG_MUTEXES
 	select DEBUG_RT_MUTEXES if RT_MUTEXES
 	select DEBUG_LOCK_ALLOC
-	select LOCKDEP_CROSSRELEASE
-	select LOCKDEP_COMPLETIONS
+	select LOCKDEP_CROSSRELEASE if LOCKDEP_AGGRESSIVE
+	select LOCKDEP_COMPLETIONS if LOCKDEP_AGGRESSIVE
 	select TRACE_IRQFLAGS
 	default n
 	help
@@ -1137,6 +1137,16 @@  config PROVE_LOCKING
 
 	 For more details, see Documentation/locking/lockdep-design.txt.
 
+config LOCKDEP_AGGRESSIVE
+	bool "Lock debugging: use aggressive techniques"
+	default n
+	help
+	 This enables some extremely aggressive lockdep checking that
+	 may result in false positives.
+
+	 Say N here if you are planning on using xfstests and don't
+	 want to deal with many false positive test failures.
+
 config LOCKDEP
 	bool
 	depends on DEBUG_KERNEL && TRACE_IRQFLAGS_SUPPORT && STACKTRACE_SUPPORT && LOCKDEP_SUPPORT