Message ID | 20211109202848.610874-4-gpiccoli@igalia.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Some improvements on panic_print | expand |
Hi Guilherme, Thanks for you patch. Could you add kexec list for any following up patches? This could change kdump behavior so let's see if any comments from kexec list. Kudos for the lore+lei tool so that I can catch this by seeing this coming into Andrews tree :) On 11/09/21 at 05:28pm, Guilherme G. Piccoli wrote: > Currently we have the "panic_print" parameter/sysctl to allow some extra > information to be printed in a panic event. On the other hand, the kdump > mechanism allows to kexec a new kernel to collect a memory dump for the > running kernel in case of panic. > Right now these options are incompatible: the user either sets the kdump > or makes use of "panic_print". The code path of "panic_print" isn't > reached when kdump is configured. > > There are situations though in which this would be interesting: for > example, in systems that are very memory constrained, a handcrafted > tiny kernel/initrd for kdump might be used in order to only collect the > dmesg in kdump kernel. Even more common, systems with no disk space for > the full (compressed) memory dump might very well rely in this > functionality too, dumping only the dmesg with the additional information > provided by "panic_print". > > So, this is what the patch does: allows both functionality to co-exist; > if "panic_print" is set and the system performs a kdump, the extra > information is printed on dmesg before the kexec. Some notes about the > design choices here: > > (a) We could have introduced a sysctl or an extra bit on "panic_print" > to allow enabling the co-existence of kdump and "panic_print", but seems > that would be over-engineering; we have 3 cases, let's check how this > patch change things: > > - if the user have kdump set and not "panic_print", nothing changes; > - if the user have "panic_print" set and not kdump, nothing changes; > - if both are enabled, now we print the extra information before kdump, > which is exactly the goal of the patch (and should be the goal of the > user, since they enabled both options). People may enable kdump crashkernel and panic_print together but they are not aware the extra panic print could cause kdump not reliable (in theory). So at least some words in kernel-parameters.txt would help. > > (b) We assume that the code path won't return from __crash_kexec() > so we didn't guard against double execution of panic_print_sys_info(). > > Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com> > --- > kernel/panic.c | 7 +++++++ > 1 file changed, 7 insertions(+) > > diff --git a/kernel/panic.c b/kernel/panic.c > index 5da71fa4e5f1..439dbf93b406 100644 > --- a/kernel/panic.c > +++ b/kernel/panic.c > @@ -243,6 +243,13 @@ void panic(const char *fmt, ...) > */ > kgdb_panic(buf); > > + /* > + * If we have a kdump kernel loaded, give a chance to panic_print > + * show some extra information on kernel log if it was set... > + */ > + if (kexec_crash_loaded()) > + panic_print_sys_info(); > + > /* > * If we have crashed and we have a crash kernel loaded let it handle > * everything else. > -- > 2.33.1 > > Thanks Dave
On 22/12/2021 08:45, Dave Young wrote: > Hi Guilherme, > > Thanks for you patch. Could you add kexec list for any following up > patches? This could change kdump behavior so let's see if any comments > from kexec list. > > Kudos for the lore+lei tool so that I can catch this by seeing this > coming into Andrews tree :) Hi Dave, I'm really sorry for not adding the kexec list, I forgot. But I will do next time for sure, my apologies. And thanks for taking a look after you noticed that on lore, I appreciate your feedback! > [...] > People may enable kdump crashkernel and panic_print together but > they are not aware the extra panic print could cause kdump not reliable > (in theory). So at least some words in kernel-parameters.txt would > help. > That makes sense, I'll improve that in a follow-up patch, how about that? Indeed it's a good idea to let people be sure that panic_print might affect kdump reliability, although I consider the risk to be pretty low. And I'll loop the kexec list for sure! Cheers, Guilherme
Hi Guilherme, On 12/22/21 at 09:34am, Guilherme G. Piccoli wrote: > On 22/12/2021 08:45, Dave Young wrote: > > Hi Guilherme, > > > > Thanks for you patch. Could you add kexec list for any following up > > patches? This could change kdump behavior so let's see if any comments > > from kexec list. > > > > Kudos for the lore+lei tool so that I can catch this by seeing this > > coming into Andrews tree :) > > Hi Dave, I'm really sorry for not adding the kexec list, I forgot. But I > will do next time for sure, my apologies. And thanks for taking a look > after you noticed that on lore, I appreciate your feedback! Thanks! > > > [...] > > People may enable kdump crashkernel and panic_print together but > > they are not aware the extra panic print could cause kdump not reliable > > (in theory). So at least some words in kernel-parameters.txt would > > help. > > > > That makes sense, I'll improve that in a follow-up patch, how about > that? Indeed it's a good idea to let people be sure that panic_print > might affect kdump reliability, although I consider the risk to be > pretty low. And I'll loop the kexec list for sure! If only the doc update, I think it is fine to be another follup-up patch. About your 1st option in patch log, there is crash_kexec_post_notifiers kernel param which can be used to switch on panic notifiers before kdump bootup. Another way probably you can try to move panic print to be panic notifier. Have this been discussed before? > > Cheers, > > > Guilherme Thanks Dave
On 23/12/2021 22:35, Dave Young wrote: > Hi Guilherme, > [...] > If only the doc update, I think it is fine to be another follup-up > patch. > > About your 1st option in patch log, there is crash_kexec_post_notifiers > kernel param which can be used to switch on panic notifiers before kdump > bootup. Another way probably you can try to move panic print to be > panic notifier. Have this been discussed before? > Hey Dave, thanks for the suggestion. I've considered that but didn't like the idea. My reasoning was: allowing post notifiers on kdump will highly compromise the reliability, whereas the panic_print is a solo option, and not very invasive. To mix it with all panic notifiers would just increase a lot the risk of a kdump failure. Put in other words: if I'm a kdump user and in order to have this panic_print setting I'd also need to enable post notifiers, certainly I'll not use the feature, 'cause I don't wanna risk kdump too much. One other option I've considered however, and I'd appreciate your opinion here, would be a new option on crash_kexec_post_notifiers that allows the users to select *which few notifiers* they want to enable. Currently it's all or nothing, and this approach is too heavy/risky I believe. Allowing customization on which post notifiers the user wants would be much better and in this case, having a post notifier for panic_print makes a lot of sense. What do you think? Thanks!
On 12/25/21 at 04:21pm, Guilherme G. Piccoli wrote: > On 23/12/2021 22:35, Dave Young wrote: > > Hi Guilherme, > > [...] > > If only the doc update, I think it is fine to be another follup-up > > patch. > > > > About your 1st option in patch log, there is crash_kexec_post_notifiers > > kernel param which can be used to switch on panic notifiers before kdump > > bootup. Another way probably you can try to move panic print to be > > panic notifier. Have this been discussed before? > > > > Hey Dave, thanks for the suggestion. I've considered that but didn't > like the idea. My reasoning was: allowing post notifiers on kdump will > highly compromise the reliability, whereas the panic_print is a solo > option, and not very invasive. > > To mix it with all panic notifiers would just increase a lot the risk of > a kdump failure. Put in other words: if I'm a kdump user and in order to > have this panic_print setting I'd also need to enable post notifiers, > certainly I'll not use the feature, 'cause I don't wanna risk kdump too > much. Hi Guilherme, yes, I have the same concern. But there could be more things like the panic_print in the future, it looks odd to have more kernel cmdline params though. > > One other option I've considered however, and I'd appreciate your > opinion here, would be a new option on crash_kexec_post_notifiers that > allows the users to select *which few notifiers* they want to enable. > Currently it's all or nothing, and this approach is too heavy/risky I > believe. Allowing customization on which post notifiers the user wants > would be much better and in this case, having a post notifier for > panic_print makes a lot of sense. What do you think? It is definitely a good idea, I'm more than glad to see this if you would like to work on this! > > Thanks! > Thanks Dave
On 26/12/2021 22:45, Dave Young wrote: > On 12/25/21 at 04:21pm, Guilherme G. Piccoli wrote: > [...] > Hi Guilherme, yes, I have the same concern. But there could be more > things like the panic_print in the future, it looks odd to have more > kernel cmdline params though. > Agreed! We're on the same page here, definitely. > [...] > It is definitely a good idea, I'm more than glad to see this if you > would like to work on this! > Awesome! I'll try to give it a shot this week, let's see how complex that'd be. Thanks again for the great feedback! Cheers, Guilherme
On Tue 2021-11-09 17:28:48, Guilherme G. Piccoli wrote: > Currently we have the "panic_print" parameter/sysctl to allow some extra > information to be printed in a panic event. On the other hand, the kdump > mechanism allows to kexec a new kernel to collect a memory dump for the > running kernel in case of panic. > Right now these options are incompatible: the user either sets the kdump > or makes use of "panic_print". The code path of "panic_print" isn't > reached when kdump is configured. > > There are situations though in which this would be interesting: for > example, in systems that are very memory constrained, a handcrafted > tiny kernel/initrd for kdump might be used in order to only collect the > dmesg in kdump kernel. Even more common, systems with no disk space for > the full (compressed) memory dump might very well rely in this > functionality too, dumping only the dmesg with the additional information > provided by "panic_print". Is anyone really using this approach? kmsg_dump() looks like a better choice when there are memory constrains. It does not need to reserve memory for booting the crash kernel. I would not mind much but this change depends on a not fully reliable assumption, see below. Also it will also complicate the solution for the kmsg_dump() code path. It would be better to discuss this togeter with the other patch https://lore.kernel.org/r/20220106212835.119409-1-gpiccoli@igalia.com > So, this is what the patch does: allows both functionality to co-exist; > if "panic_print" is set and the system performs a kdump, the extra > information is printed on dmesg before the kexec. Some notes about the > design choices here: > > (a) We could have introduced a sysctl or an extra bit on "panic_print" > to allow enabling the co-existence of kdump and "panic_print", but seems > that would be over-engineering; we have 3 cases, let's check how this > patch change things: > > - if the user have kdump set and not "panic_print", nothing changes; > - if the user have "panic_print" set and not kdump, nothing changes; > - if both are enabled, now we print the extra information before kdump, > which is exactly the goal of the patch (and should be the goal of the > user, since they enabled both options). > > (b) We assume that the code path won't return from __crash_kexec() > so we didn't guard against double execution of panic_print_sys_info(). This sounds suspiciously. There is small race window but it actually works. __crash_kexec() really never returns when @kexec_crash_image is loaded. Well, it might break in the future if the code is modified. Best Regards, Petr
On 13/01/2022 06:02, Petr Mladek wrote: > [...] > Is anyone really using this approach? kmsg_dump() looks like a better > choice when there are memory constrains. It does not need to reserve > memory for booting the crash kernel. > > I would not mind much but this change depends on a not fully reliable > assumption, see below. > > Also it will also complicate the solution for the kmsg_dump() code path. > It would be better to discuss this togeter with the other patch > https://lore.kernel.org/r/20220106212835.119409-1-gpiccoli@igalia.com Hi Petr, thanks for your analysis here. Indeed, our use case benefits from both this and the other patch in the thread you mentioned above - see [0]. > [...] >> (b) We assume that the code path won't return from __crash_kexec() >> so we didn't guard against double execution of panic_print_sys_info(). > > This sounds suspiciously. There is small race window but it actually works. > __crash_kexec() really never returns when @kexec_crash_image is > loaded. Well, it might break in the future if the code is modified. > > Best Regards, > Petr OK, so since this patch is already on linux-next and is relevant for our use case, how about if we explicitly guard against the double print, as I suggested in [0]? Cheers, Guilherme [0] https://lore.kernel.org/lkml/ba0e29ba-0e08-df6e-ade5-eb58ae2495e3@igalia.com/
Hi Andrew, can I ask you to please remove this patch from linux-next? It shows here: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=56439cb78293 Baoquan has concerns, and we're discussing that in another thread [0], after I submit another change on top of this one. So, I guess it's simpler to just drop it. My apologies for this, I should have definitely loop the kexec list in this one , but I forgot. Cheers, Guilherme [0] https://lore.kernel.org/lkml/7b93afff-66a0-44ee-3bb7-3d1e12dd47c2@igalia.com/
On 27/01/2022 13:53, Guilherme G. Piccoli wrote: > Hi Andrew, can I ask you to please remove this patch from linux-next? > > It shows here: > https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=56439cb78293 > > Baoquan has concerns, and we're discussing that in another thread [0], > after I submit another change on top of this one. So, I guess it's > simpler to just drop it. > > My apologies for this, I should have definitely loop the kexec list in > this one , but I forgot. > > Cheers, > > > Guilherme > > > [0] > https://lore.kernel.org/lkml/7b93afff-66a0-44ee-3bb7-3d1e12dd47c2@igalia.com/ Hi Stephen / Andrew, sorry for the annoyance, but can you please remove this patch from linux-next? Today it shows as commit https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=d691651735f1 - this commit is subject to concern from Baoquan and we are discussing better ways to achieve that, through a refactor. Thanks in advance, Guilherme
Hi Guilherme, On Tue, 8 Feb 2022 15:12:04 -0300 "Guilherme G. Piccoli" <gpiccoli@igalia.com> wrote: > > Hi Stephen / Andrew, sorry for the annoyance, but can you please remove > this patch from linux-next? > > Today it shows as commit > https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=d691651735f1 > - this commit is subject to concern from Baoquan and we are discussing > better ways to achieve that, through a refactor. Dropped from linux-next today.
On 08/02/2022 18:39, Stephen Rothwell wrote: > Hi Guilherme, > [...] > Dropped from linux-next today. > Hi Stephen, thanks! I'm still seeing this patch over there, though - I'm not sure if takes a while to show up in the tree... Notice this request is only for patch 3/3 in this series - patches 1 and 2 are fine, were reviewed and accepted =) Cheers, Guilherme
Hi Guilherme, On Wed, 9 Feb 2022 12:06:03 -0300 "Guilherme G. Piccoli" <gpiccoli@igalia.com> wrote: > > On 08/02/2022 18:39, Stephen Rothwell wrote: > > Hi Guilherme, > > [...] > > Dropped from linux-next today. > > > > Hi Stephen, thanks! I'm still seeing this patch over there, though - I'm > not sure if takes a while to show up in the tree... Andrew did another mmotm release which put it back in. I have removed it again for today. > Notice this request is only for patch 3/3 in this series - patches 1 and > 2 are fine, were reviewed and accepted =) Understood.
On 09/02/2022 20:26, Stephen Rothwell wrote: > [...] >> Hi Stephen, thanks! I'm still seeing this patch over there, though - I'm >> not sure if takes a while to show up in the tree... > > Andrew did another mmotm release which put it back in. I have removed > it again for today. > >> Notice this request is only for patch 3/3 in this series - patches 1 and >> 2 are fine, were reviewed and accepted =) > > Understood. > Thanks a bunch! So it seems Andrew needs to remove patch 3 from mmotm right? Cheers, Guilherme
diff --git a/kernel/panic.c b/kernel/panic.c index 5da71fa4e5f1..439dbf93b406 100644 --- a/kernel/panic.c +++ b/kernel/panic.c @@ -243,6 +243,13 @@ void panic(const char *fmt, ...) */ kgdb_panic(buf); + /* + * If we have a kdump kernel loaded, give a chance to panic_print + * show some extra information on kernel log if it was set... + */ + if (kexec_crash_loaded()) + panic_print_sys_info(); + /* * If we have crashed and we have a crash kernel loaded let it handle * everything else.
Currently we have the "panic_print" parameter/sysctl to allow some extra information to be printed in a panic event. On the other hand, the kdump mechanism allows to kexec a new kernel to collect a memory dump for the running kernel in case of panic. Right now these options are incompatible: the user either sets the kdump or makes use of "panic_print". The code path of "panic_print" isn't reached when kdump is configured. There are situations though in which this would be interesting: for example, in systems that are very memory constrained, a handcrafted tiny kernel/initrd for kdump might be used in order to only collect the dmesg in kdump kernel. Even more common, systems with no disk space for the full (compressed) memory dump might very well rely in this functionality too, dumping only the dmesg with the additional information provided by "panic_print". So, this is what the patch does: allows both functionality to co-exist; if "panic_print" is set and the system performs a kdump, the extra information is printed on dmesg before the kexec. Some notes about the design choices here: (a) We could have introduced a sysctl or an extra bit on "panic_print" to allow enabling the co-existence of kdump and "panic_print", but seems that would be over-engineering; we have 3 cases, let's check how this patch change things: - if the user have kdump set and not "panic_print", nothing changes; - if the user have "panic_print" set and not kdump, nothing changes; - if both are enabled, now we print the extra information before kdump, which is exactly the goal of the patch (and should be the goal of the user, since they enabled both options). (b) We assume that the code path won't return from __crash_kexec() so we didn't guard against double execution of panic_print_sys_info(). Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com> --- kernel/panic.c | 7 +++++++ 1 file changed, 7 insertions(+)