diff mbox series

[3/3] panic: Allow printing extra panic information on kdump

Message ID 20211109202848.610874-4-gpiccoli@igalia.com (mailing list archive)
State New, archived
Headers show
Series Some improvements on panic_print | expand

Commit Message

Guilherme G. Piccoli Nov. 9, 2021, 8:28 p.m. UTC
Currently we have the "panic_print" parameter/sysctl to allow some extra
information to be printed in a panic event. On the other hand, the kdump
mechanism allows to kexec a new kernel to collect a memory dump for the
running kernel in case of panic.
Right now these options are incompatible: the user either sets the kdump
or makes use of "panic_print". The code path of "panic_print" isn't
reached when kdump is configured.

There are situations though in which this would be interesting: for
example, in systems that are very memory constrained, a handcrafted
tiny kernel/initrd for kdump might be used in order to only collect the
dmesg in kdump kernel. Even more common, systems with no disk space for
the full (compressed) memory dump might very well rely in this
functionality too, dumping only the dmesg with the additional information
provided by "panic_print".

So, this is what the patch does: allows both functionality to co-exist;
if "panic_print" is set and the system performs a kdump, the extra
information is printed on dmesg before the kexec. Some notes about the
design choices here:

(a) We could have introduced a sysctl or an extra bit on "panic_print"
to allow enabling the co-existence of kdump and "panic_print", but seems
that would be over-engineering; we have 3 cases, let's check how this
patch change things:

- if the user have kdump set and not "panic_print", nothing changes;
- if the user have "panic_print" set and not kdump, nothing changes;
- if both are enabled, now we print the extra information before kdump,
which is exactly the goal of the patch (and should be the goal of the
user, since they enabled both options).

(b) We assume that the code path won't return from __crash_kexec()
so we didn't guard against double execution of panic_print_sys_info().

Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
---
 kernel/panic.c | 7 +++++++
 1 file changed, 7 insertions(+)

Comments

Dave Young Dec. 22, 2021, 11:45 a.m. UTC | #1
Hi Guilherme,

Thanks for you patch.  Could you add kexec list for any following up
patches?  This could change kdump behavior so let's see if any comments
from kexec list.

Kudos for the lore+lei tool so that I can catch this by seeing this
coming into Andrews tree :)
On 11/09/21 at 05:28pm, Guilherme G. Piccoli wrote:
> Currently we have the "panic_print" parameter/sysctl to allow some extra
> information to be printed in a panic event. On the other hand, the kdump
> mechanism allows to kexec a new kernel to collect a memory dump for the
> running kernel in case of panic.
> Right now these options are incompatible: the user either sets the kdump
> or makes use of "panic_print". The code path of "panic_print" isn't
> reached when kdump is configured.
> 
> There are situations though in which this would be interesting: for
> example, in systems that are very memory constrained, a handcrafted
> tiny kernel/initrd for kdump might be used in order to only collect the
> dmesg in kdump kernel. Even more common, systems with no disk space for
> the full (compressed) memory dump might very well rely in this
> functionality too, dumping only the dmesg with the additional information
> provided by "panic_print".
> 
> So, this is what the patch does: allows both functionality to co-exist;
> if "panic_print" is set and the system performs a kdump, the extra
> information is printed on dmesg before the kexec. Some notes about the
> design choices here:
> 
> (a) We could have introduced a sysctl or an extra bit on "panic_print"
> to allow enabling the co-existence of kdump and "panic_print", but seems
> that would be over-engineering; we have 3 cases, let's check how this
> patch change things:
> 
> - if the user have kdump set and not "panic_print", nothing changes;
> - if the user have "panic_print" set and not kdump, nothing changes;
> - if both are enabled, now we print the extra information before kdump,
> which is exactly the goal of the patch (and should be the goal of the
> user, since they enabled both options).

People may enable kdump crashkernel and panic_print together but
they are not aware the extra panic print could cause kdump not reliable
(in theory).  So at least some words in kernel-parameters.txt would
help.
 
> 
> (b) We assume that the code path won't return from __crash_kexec()
> so we didn't guard against double execution of panic_print_sys_info().
> 
> Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
> ---
>  kernel/panic.c | 7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/kernel/panic.c b/kernel/panic.c
> index 5da71fa4e5f1..439dbf93b406 100644
> --- a/kernel/panic.c
> +++ b/kernel/panic.c
> @@ -243,6 +243,13 @@ void panic(const char *fmt, ...)
>  	 */
>  	kgdb_panic(buf);
>  
> +	/*
> +	 * If we have a kdump kernel loaded, give a chance to panic_print
> +	 * show some extra information on kernel log if it was set...
> +	 */
> +	if (kexec_crash_loaded())
> +		panic_print_sys_info();
> +
>  	/*
>  	 * If we have crashed and we have a crash kernel loaded let it handle
>  	 * everything else.
> -- 
> 2.33.1
> 
> 

Thanks
Dave
Guilherme G. Piccoli Dec. 22, 2021, 12:34 p.m. UTC | #2
On 22/12/2021 08:45, Dave Young wrote:
> Hi Guilherme,
> 
> Thanks for you patch.  Could you add kexec list for any following up
> patches?  This could change kdump behavior so let's see if any comments
> from kexec list.
> 
> Kudos for the lore+lei tool so that I can catch this by seeing this
> coming into Andrews tree :)

Hi Dave, I'm really sorry for not adding the kexec list, I forgot. But I
will do next time for sure, my apologies. And thanks for taking a look
after you noticed that on lore, I appreciate your feedback!

> [...]
> People may enable kdump crashkernel and panic_print together but
> they are not aware the extra panic print could cause kdump not reliable
> (in theory).  So at least some words in kernel-parameters.txt would
> help.
>  

That makes sense, I'll improve that in a follow-up patch, how about
that? Indeed it's a good idea to let people be sure that panic_print
might affect kdump reliability, although I consider the risk to be
pretty low. And I'll loop the kexec list for sure!

Cheers,


Guilherme
Dave Young Dec. 24, 2021, 1:35 a.m. UTC | #3
Hi Guilherme,
On 12/22/21 at 09:34am, Guilherme G. Piccoli wrote:
> On 22/12/2021 08:45, Dave Young wrote:
> > Hi Guilherme,
> > 
> > Thanks for you patch.  Could you add kexec list for any following up
> > patches?  This could change kdump behavior so let's see if any comments
> > from kexec list.
> > 
> > Kudos for the lore+lei tool so that I can catch this by seeing this
> > coming into Andrews tree :)
> 
> Hi Dave, I'm really sorry for not adding the kexec list, I forgot. But I
> will do next time for sure, my apologies. And thanks for taking a look
> after you noticed that on lore, I appreciate your feedback!

Thanks!

> 
> > [...]
> > People may enable kdump crashkernel and panic_print together but
> > they are not aware the extra panic print could cause kdump not reliable
> > (in theory).  So at least some words in kernel-parameters.txt would
> > help.
> >  
> 
> That makes sense, I'll improve that in a follow-up patch, how about
> that? Indeed it's a good idea to let people be sure that panic_print
> might affect kdump reliability, although I consider the risk to be
> pretty low. And I'll loop the kexec list for sure!

If only the doc update, I think it is fine to be another follup-up
patch.

About your 1st option in patch log, there is crash_kexec_post_notifiers
kernel param which can be used to switch on panic notifiers before kdump
bootup.   Another way probably you can try to move panic print to be
panic notifier. Have this been discussed before? 

> 
> Cheers,
> 
> 
> Guilherme

Thanks
Dave
Guilherme G. Piccoli Dec. 25, 2021, 7:21 p.m. UTC | #4
On 23/12/2021 22:35, Dave Young wrote:
> Hi Guilherme,
> [...]
> If only the doc update, I think it is fine to be another follup-up
> patch.
> 
> About your 1st option in patch log, there is crash_kexec_post_notifiers
> kernel param which can be used to switch on panic notifiers before kdump
> bootup.   Another way probably you can try to move panic print to be
> panic notifier. Have this been discussed before? 
> 

Hey Dave, thanks for the suggestion. I've considered that but didn't
like the idea. My reasoning was: allowing post notifiers on kdump will
highly compromise the reliability, whereas the panic_print is a solo
option, and not very invasive.

To mix it with all panic notifiers would just increase a lot the risk of
a kdump failure. Put in other words: if I'm a kdump user and in order to
have this panic_print setting I'd also need to enable post notifiers,
certainly I'll not use the feature, 'cause I don't wanna risk kdump too
much.

One other option I've considered however, and I'd appreciate your
opinion here, would be a new option on crash_kexec_post_notifiers that
allows the users to select *which few notifiers* they want to enable.
Currently it's all or nothing, and this approach is too heavy/risky I
believe. Allowing customization on which post notifiers the user wants
would be much better and in this case, having a post notifier for
panic_print makes a lot of sense. What do you think?

Thanks!
Dave Young Dec. 27, 2021, 1:45 a.m. UTC | #5
On 12/25/21 at 04:21pm, Guilherme G. Piccoli wrote:
> On 23/12/2021 22:35, Dave Young wrote:
> > Hi Guilherme,
> > [...]
> > If only the doc update, I think it is fine to be another follup-up
> > patch.
> > 
> > About your 1st option in patch log, there is crash_kexec_post_notifiers
> > kernel param which can be used to switch on panic notifiers before kdump
> > bootup.   Another way probably you can try to move panic print to be
> > panic notifier. Have this been discussed before? 
> > 
> 
> Hey Dave, thanks for the suggestion. I've considered that but didn't
> like the idea. My reasoning was: allowing post notifiers on kdump will
> highly compromise the reliability, whereas the panic_print is a solo
> option, and not very invasive.
> 
> To mix it with all panic notifiers would just increase a lot the risk of
> a kdump failure. Put in other words: if I'm a kdump user and in order to
> have this panic_print setting I'd also need to enable post notifiers,
> certainly I'll not use the feature, 'cause I don't wanna risk kdump too
> much.

Hi Guilherme, yes, I have the same concern.  But there could be more
things like the panic_print in the future, it looks odd to have more
kernel cmdline params though.

> 
> One other option I've considered however, and I'd appreciate your
> opinion here, would be a new option on crash_kexec_post_notifiers that
> allows the users to select *which few notifiers* they want to enable.
> Currently it's all or nothing, and this approach is too heavy/risky I
> believe. Allowing customization on which post notifiers the user wants
> would be much better and in this case, having a post notifier for
> panic_print makes a lot of sense. What do you think?

It is definitely a good idea, I'm more than glad to see this if you
would like to work on this! 

> 
> Thanks!
> 

Thanks
Dave
Guilherme G. Piccoli Dec. 27, 2021, 3:14 a.m. UTC | #6
On 26/12/2021 22:45, Dave Young wrote:
> On 12/25/21 at 04:21pm, Guilherme G. Piccoli wrote:
> [...] 
> Hi Guilherme, yes, I have the same concern.  But there could be more
> things like the panic_print in the future, it looks odd to have more
> kernel cmdline params though.
> 

Agreed! We're on the same page here, definitely.


> [...] 
> It is definitely a good idea, I'm more than glad to see this if you
> would like to work on this! 
> 

Awesome! I'll try to give it a shot this week, let's see how complex
that'd be.

Thanks again for the great feedback!
Cheers,


Guilherme
Petr Mladek Jan. 13, 2022, 9:02 a.m. UTC | #7
On Tue 2021-11-09 17:28:48, Guilherme G. Piccoli wrote:
> Currently we have the "panic_print" parameter/sysctl to allow some extra
> information to be printed in a panic event. On the other hand, the kdump
> mechanism allows to kexec a new kernel to collect a memory dump for the
> running kernel in case of panic.
> Right now these options are incompatible: the user either sets the kdump
> or makes use of "panic_print". The code path of "panic_print" isn't
> reached when kdump is configured.
> 
> There are situations though in which this would be interesting: for
> example, in systems that are very memory constrained, a handcrafted
> tiny kernel/initrd for kdump might be used in order to only collect the
> dmesg in kdump kernel. Even more common, systems with no disk space for
> the full (compressed) memory dump might very well rely in this
> functionality too, dumping only the dmesg with the additional information
> provided by "panic_print".

Is anyone really using this approach? kmsg_dump() looks like a better
choice when there are memory constrains. It does not need to reserve
memory for booting the crash kernel.

I would not mind much but this change depends on a not fully reliable
assumption, see below.

Also it will also complicate the solution for the kmsg_dump() code path.
It would be better to discuss this togeter with the other patch
https://lore.kernel.org/r/20220106212835.119409-1-gpiccoli@igalia.com


> So, this is what the patch does: allows both functionality to co-exist;
> if "panic_print" is set and the system performs a kdump, the extra
> information is printed on dmesg before the kexec. Some notes about the
> design choices here:
> 
> (a) We could have introduced a sysctl or an extra bit on "panic_print"
> to allow enabling the co-existence of kdump and "panic_print", but seems
> that would be over-engineering; we have 3 cases, let's check how this
> patch change things:
> 
> - if the user have kdump set and not "panic_print", nothing changes;
> - if the user have "panic_print" set and not kdump, nothing changes;
> - if both are enabled, now we print the extra information before kdump,
> which is exactly the goal of the patch (and should be the goal of the
> user, since they enabled both options).
> 
> (b) We assume that the code path won't return from __crash_kexec()
> so we didn't guard against double execution of panic_print_sys_info().

This sounds suspiciously. There is small race window but it actually works.
__crash_kexec() really never returns when @kexec_crash_image is
loaded. Well, it might break in the future if the code is modified.

Best Regards,
Petr
Guilherme G. Piccoli Jan. 13, 2022, 1 p.m. UTC | #8
On 13/01/2022 06:02, Petr Mladek wrote:
> [...]
> Is anyone really using this approach? kmsg_dump() looks like a better
> choice when there are memory constrains. It does not need to reserve
> memory for booting the crash kernel.
> 
> I would not mind much but this change depends on a not fully reliable
> assumption, see below.
> 
> Also it will also complicate the solution for the kmsg_dump() code path.
> It would be better to discuss this togeter with the other patch
> https://lore.kernel.org/r/20220106212835.119409-1-gpiccoli@igalia.com

Hi Petr, thanks for your analysis here. Indeed, our use case benefits
from both this and the other patch in the thread you mentioned above -
see [0].


> [...]
>> (b) We assume that the code path won't return from __crash_kexec()
>> so we didn't guard against double execution of panic_print_sys_info().
> 
> This sounds suspiciously. There is small race window but it actually works.
> __crash_kexec() really never returns when @kexec_crash_image is
> loaded. Well, it might break in the future if the code is modified.
> 
> Best Regards,
> Petr

OK, so since this patch is already on linux-next and is relevant for our
use case, how about if we explicitly guard against the double print, as
I suggested in [0]?

Cheers,

Guilherme


[0]
https://lore.kernel.org/lkml/ba0e29ba-0e08-df6e-ade5-eb58ae2495e3@igalia.com/
Guilherme G. Piccoli Jan. 27, 2022, 4:53 p.m. UTC | #9
Hi Andrew, can I ask you to please remove this patch from linux-next?

It shows here:
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=56439cb78293

Baoquan has concerns, and we're discussing that in another thread [0],
after I submit another change on top of this one. So, I guess it's
simpler to just drop it.

My apologies for this, I should have definitely loop the kexec list in
this one , but I forgot.

Cheers,


Guilherme


[0]
https://lore.kernel.org/lkml/7b93afff-66a0-44ee-3bb7-3d1e12dd47c2@igalia.com/
Guilherme G. Piccoli Feb. 8, 2022, 6:12 p.m. UTC | #10
On 27/01/2022 13:53, Guilherme G. Piccoli wrote:
> Hi Andrew, can I ask you to please remove this patch from linux-next?
> 
> It shows here:
> https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=56439cb78293
> 
> Baoquan has concerns, and we're discussing that in another thread [0],
> after I submit another change on top of this one. So, I guess it's
> simpler to just drop it.
> 
> My apologies for this, I should have definitely loop the kexec list in
> this one , but I forgot.
> 
> Cheers,
> 
> 
> Guilherme
> 
> 
> [0]
> https://lore.kernel.org/lkml/7b93afff-66a0-44ee-3bb7-3d1e12dd47c2@igalia.com/


Hi Stephen / Andrew, sorry for the annoyance, but can you please remove
this patch from linux-next?

Today it shows as commit
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=d691651735f1
- this commit is subject to concern from Baoquan and we are discussing
better ways to achieve that, through a refactor.

Thanks in advance,


Guilherme
Stephen Rothwell Feb. 8, 2022, 9:39 p.m. UTC | #11
Hi Guilherme,

On Tue, 8 Feb 2022 15:12:04 -0300 "Guilherme G. Piccoli" <gpiccoli@igalia.com> wrote:
>
> Hi Stephen / Andrew, sorry for the annoyance, but can you please remove
> this patch from linux-next?
> 
> Today it shows as commit
> https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=d691651735f1
> - this commit is subject to concern from Baoquan and we are discussing
> better ways to achieve that, through a refactor.

Dropped from linux-next today.
Guilherme G. Piccoli Feb. 9, 2022, 3:06 p.m. UTC | #12
On 08/02/2022 18:39, Stephen Rothwell wrote:
> Hi Guilherme,
> [...] 
> Dropped from linux-next today.
> 

Hi Stephen, thanks! I'm still seeing this patch over there, though - I'm
not sure if takes a while to show up in the tree...

Notice this request is only for patch 3/3 in this series - patches 1 and
2 are fine, were reviewed and accepted =)

Cheers,


Guilherme
Stephen Rothwell Feb. 9, 2022, 11:26 p.m. UTC | #13
Hi Guilherme,

On Wed, 9 Feb 2022 12:06:03 -0300 "Guilherme G. Piccoli" <gpiccoli@igalia.com> wrote:
>
> On 08/02/2022 18:39, Stephen Rothwell wrote:
> > Hi Guilherme,
> > [...] 
> > Dropped from linux-next today.
> >   
> 
> Hi Stephen, thanks! I'm still seeing this patch over there, though - I'm
> not sure if takes a while to show up in the tree...

Andrew did another mmotm release which put it back in.  I have removed
it again for today.

> Notice this request is only for patch 3/3 in this series - patches 1 and
> 2 are fine, were reviewed and accepted =)

Understood.
Guilherme G. Piccoli Feb. 10, 2022, 12:50 p.m. UTC | #14
On 09/02/2022 20:26, Stephen Rothwell wrote:
> [...]
>> Hi Stephen, thanks! I'm still seeing this patch over there, though - I'm
>> not sure if takes a while to show up in the tree...
> 
> Andrew did another mmotm release which put it back in.  I have removed
> it again for today.
> 
>> Notice this request is only for patch 3/3 in this series - patches 1 and
>> 2 are fine, were reviewed and accepted =)
> 
> Understood.
> 

Thanks a bunch! So it seems Andrew needs to remove patch 3 from mmotm right?

Cheers,


Guilherme
diff mbox series

Patch

diff --git a/kernel/panic.c b/kernel/panic.c
index 5da71fa4e5f1..439dbf93b406 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -243,6 +243,13 @@  void panic(const char *fmt, ...)
 	 */
 	kgdb_panic(buf);
 
+	/*
+	 * If we have a kdump kernel loaded, give a chance to panic_print
+	 * show some extra information on kernel log if it was set...
+	 */
+	if (kexec_crash_loaded())
+		panic_print_sys_info();
+
 	/*
 	 * If we have crashed and we have a crash kernel loaded let it handle
 	 * everything else.