[RFC,0/2] coding-style.rst: document BUG() and WARN() rules

Message ID: 20220824163100.224449-1-david@redhat.com

David Hildenbrand Aug. 24, 2022, 4:30 p.m. UTC
As it seems to be rather unclear if/when to use BUG(), BUG_ON(),
VM_BUG_ON(), WARN_ON_ONCE(), ... let's try to document the result of a
recent discussion.

Details can be found in patch #1.

--------------------------------------------------------------------------

Here is some braindump after thinking about BUG_ON(), WARN_ON(), ... and
how it interacts with kdump.

I was wondering what the expectations on a system with armed kdump are,
for example, after we removed most BUG_ON() instances and replaced them
by WARN_ON_ONCE(). I would assume that we actually want to panic in some
cases to capture a proper system dump instead of continuing and eventually
ending up with a completely broken system where it's hard to extract any
useful debug information. We'd have to enable panic_on_warn. But we'd only
want to do that in case kdump is actually armed after boot.
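As a rough userspace illustration of "enable panic_on_warn only if kdump is
armed": the sketch below reads a flag file and, only if it reports "1", writes
"1" to a second file. The helper names flag_file_is_set() and
arm_panic_on_warn_if_kdump() are made up for this example. The paths are
parameters, so nothing here is tied to a particular system.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical helper: returns true if the file at `path` starts with '1'.
 * An unreadable or missing file is treated as "not set".
 */
static bool flag_file_is_set(const char *path)
{
	FILE *f;
	int c;

	assert(path != NULL);
	f = fopen(path, "r");
	if (!f)
		return false;
	c = fgetc(f);
	fclose(f);
	return c == '1';
}

/*
 * Hypothetical helper: write "1" to `warn_path` only if `kdump_path`
 * reports that a crash kernel is loaded.
 * Returns 1 if enabled, 0 if kdump is not armed, -1 if the write failed.
 */
static int arm_panic_on_warn_if_kdump(const char *kdump_path,
				      const char *warn_path)
{
	FILE *f;
	bool ok;

	if (!flag_file_is_set(kdump_path))
		return 0;

	f = fopen(warn_path, "w");
	if (!f)
		return -1;
	ok = fputs("1\n", f) >= 0;
	if (fclose(f) != 0)
		ok = false;
	return ok ? 1 : -1;
}
```

On a real system this would presumably be called with
/sys/kernel/kexec_crash_loaded and /proc/sys/kernel/panic_on_warn (as root),
once the kdump service has loaded the crash kernel.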

So one idea would be to have some kind of "panic_on_warn_with_kdump" mode.
But then, we'd actually crash+kdump even on the most harmless WARN_ON()
conditions, because they all look alike. To compensate, we would need
some kind of "severity" levels of a warning -- at least some kind of
"this is harmless and we can easily recover, but please tell the
developers" vs. "this is real bad and unexpected, capture a dump
immediately instead of trying to recover and eventually failing miserably".
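To make the "severity" idea a bit more concrete, here is a minimal sketch of
such a policy decision. Everything in it (enum warn_severity, struct
warn_policy, warn_should_panic()) is hypothetical and only models the two
levels described above. It is not an existing kernel interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical severity levels for a warning. */
enum warn_severity {
	WARN_SEV_HARMLESS,	/* easy to recover from; just tell the developers */
	WARN_SEV_SEVERE,	/* bad and unexpected; capture a dump right away */
};

/* Hypothetical runtime policy. */
struct warn_policy {
	bool panic_on_warn;	/* panic on every warning, whatever its severity */
	bool kdump_armed;	/* a crash kernel is loaded and can take a dump */
};

/* Decide whether a warning of the given severity should panic. */
static bool warn_should_panic(enum warn_severity sev,
			      const struct warn_policy *p)
{
	assert(p != NULL);
	if (p->panic_on_warn)
		return true;
	/* Without panic_on_warn, only severe warnings crash, and only if a
	 * dump can actually be captured. */
	return sev == WARN_SEV_SEVERE && p->kdump_armed;
}
```

With such a split, a harmless warning on a kdump-armed system would still only
be reported, while a severe one would trigger the crash and the dump.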

But then, maybe we really want something like BUG_ON() -- let's call it
CBUG_ON() for simplicity -- that is usable in conditionals (to implement
recovery code where that is easily possible) and whose runtime behavior
is configurable.

if (CBUG_ON(whatever))
	try_to_recover();

Whereby, for example, "panic_on_cbug" and "panic_on_cbug_with_kdump"
could control the runtime behavior.
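A userspace sketch of how CBUG_ON() might behave, just to illustrate the
idea. CBUG_ON, panic_on_cbug and panic_on_cbug_with_kdump are the names used
above. cbug_report(), kdump_armed and checked_div() are invented for the
example, and abort() stands in for panic().

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Runtime knobs; both default to "off". */
static bool panic_on_cbug;
static bool panic_on_cbug_with_kdump;
/* Hypothetical stand-in for "a crash kernel is loaded". */
static bool kdump_armed;

/*
 * Report a triggered condition. Depending on the knobs, either terminate
 * (stand-in for panic()) or return true so the caller can try to recover.
 */
static bool cbug_report(bool cond, const char *expr, const char *file, int line)
{
	if (!cond)
		return false;

	fprintf(stderr, "CBUG: %s at %s:%d\n", expr, file, line);
	if (panic_on_cbug || (panic_on_cbug_with_kdump && kdump_armed))
		abort();
	return true;
}

/* Evaluates to true if the condition triggered and we did not panic. */
#define CBUG_ON(cond) cbug_report(!!(cond), #cond, __FILE__, __LINE__)

/* Hypothetical caller with a recovery path. */
static int checked_div(int a, int b, int *out)
{
	if (CBUG_ON(b == 0))
		return -1;	/* recover by failing the operation */
	*out = a / b;
	return 0;
}
```

With both knobs off, checked_div(1, 0, &r) logs the condition and returns -1.
With panic_on_cbug set, the same call would terminate instead.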

But this is just a braindump and I assume people reading along have other,
better ideas. Especially, a better name for CBUG.


Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: David Laight <David.Laight@ACULAB.COM>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Andy Whitcroft <apw@canonical.com>
Cc: Joe Perches <joe@perches.com>
Cc: Dwaipayan Ray <dwaipayanray1@gmail.com>
Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Dave Young <dyoung@redhat.com>

David Hildenbrand (2):
  coding-style.rst: document BUG() and WARN() rules ("do not crash the
    kernel")
  checkpatch: warn on usage of VM_BUG_ON() and friends

 Documentation/process/coding-style.rst | 27 ++++++++++++++++++++++++++
 scripts/checkpatch.pl                  |  6 +++---
 2 files changed, 30 insertions(+), 3 deletions(-)

John Hubbard Aug. 25, 2022, 2:30 a.m. UTC | #1
On 8/24/22 09:30, David Hildenbrand wrote:
...
> So one idea would be to have some kind of "panic_on_warn_with_kdump" mode.
> But then, we'd actually crash+kdump even on the most harmless WARN_ON()
> conditions, because they all look alike. To compensate, we would need
> some kind of "severity" levels of a warning -- at least some kind of
> "this is harmless and we can easily recover, but please tell the
> developers" vs. "this is real bad and unexpected, capture a dump
> immediately instead of trying to recover and eventually failing miserably".
> 
> But then, maybe we really want something like BUG_ON() -- let's call it
> CBUG_ON() for simplicity -- that is usable in conditionals (to implement
> recovery code where that is easily possible) and whose runtime behavior
> is configurable.
> 
> if (CBUG_ON(whatever))
> 	try_to_recover();
> 
> Whereby, for example, "panic_on_cbug" and "panic_on_cbug_with_kdump"
> could control the runtime behavior.
> 
> But this is just a braindump and I assume people reading along have other,
> better ideas. Especially, a better name for CBUG.
> 

If this direction is pursued (as opposed to just recommending the
panic_on_warn approach, which is probably viable as well, btw), then I'd
suggest this name:

    PANIC_ON()

It's different than BUG_ON(), because it calls panic() instead of
immediately halting on an undefined instruction exception (yes, that's
x86-centric, I know). So at least in the better behaved cases, there is
a backtrace and a reboot, rather than a mysterious hard lockup.

As Mel points out [1], it's not always that much better. But in my
experience, this is usually a *lot* better.

It's only intended for a few very special cases. It is not intended as any
sort of assert (which BUG was sometimes used for).

This forces a panic(), which is what David is looking for.
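For illustration only, a userspace sketch of what such a macro could look
like. PANIC_ON is the proposed name. fake_panic() is a made-up stand-in for
the kernel's panic(), and refcount_put() is a hypothetical caller.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Stand-in for panic(): print a message, then terminate. */
static void fake_panic(const char *fmt, ...)
{
	va_list ap;

	va_start(ap, fmt);
	fputs("panic: ", stderr);
	vfprintf(stderr, fmt, ap);
	fputc('\n', stderr);
	va_end(ap);
	abort();
}

/*
 * Unlike a configurable check, this always panics when the condition holds.
 * It is a statement, not an expression, so there is no recovery path.
 */
#define PANIC_ON(cond)							\
	do {								\
		if (cond)						\
			fake_panic("PANIC_ON(%s) at %s:%d",		\
				   #cond, __FILE__, __LINE__);		\
	} while (0)

/* Hypothetical caller: dropping a reference that is not held is fatal. */
static int refcount_put(int *refs)
{
	PANIC_ON(*refs <= 0);
	return --(*refs);
}
```

The point of the sketch is only the shape: an unconditional, message-carrying
panic for the rare cases where continuing is never acceptable.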

[1] https://lore.kernel.org/all/20220816094056.x4ldzednboaln3ag@suse.de/


thanks,