Message ID | cover.1698441495.git.kjlx@templeofstupid.com (mailing list archive) |
---|---|
Series | Triggering a softlockup panic during SMP boot |
On Fri, Oct 27, 2023 at 02:46:26PM -0700, Krister Johansen wrote:
> Hi,
> This pair of patches was the result of an unsuccessful attempt to set
> softlockup_panic before SMP boot. The rationale for wanting to set this
> parameter is that some of the VMs that my team runs will occasionally
> get stuck while onlining the non-boot processors as part of SMP boot.
>
> In the cases where this happens, we find out about it after the instance
> successfully boots; however, the machines can get stuck for tens of
> minutes at a time before finally completing onlining processors. Since
> we pay per minute for many of these VMs there were two goals for setting
> this value on boot: first, fail fast and hope that a subsequent boot
> attempt will be successful. Second, a panic is a little easier to keep
> track of, especially if we're scraping serial logs after the fact. In
> essence, the goal is to trigger the failure earlier and hopefully get
> more useful information for further debugging the problem as well.
>
> While testing to make sure that this value was getting correctly set on
> boot, I ran into a pair of surprises. First, when the softlockup_panic
> parameter was migrated to a sysctl alias, it had the side effect of
> setting the parameter value after SMP boot has occurred, when it used to
> be set before this. Second, testing revealed that even though the
> aliases were being correctly processed, the kernel was reporting the
> commandline arguments as unrecognized. This generated a message in the
> logs about an unrecognized parameter (even though it was) and the
> parameter was passed as an environment variable to init.
>
> The first patch ensures that aliased sysctl arguments are not reported
> as unrecognized boot arguments.
>
> The second patch moves the setting of softlockup_panic earlier in boot,
> where it can take effect before SMP boot begins.

Sounds all great but I only got the cover letter, so maybe resend?

  Luis

On Fri, Oct 27, 2023 at 03:04:56PM -0700, Luis Chamberlain wrote:
> On Fri, Oct 27, 2023 at 02:46:26PM -0700, Krister Johansen wrote:
> > Hi,
> > This pair of patches was the result of an unsuccessful attempt to set
> > softlockup_panic before SMP boot. The rationale for wanting to set this
> > parameter is that some of the VMs that my team runs will occasionally
> > get stuck while onlining the non-boot processors as part of SMP boot.
> >
> > In the cases where this happens, we find out about it after the instance
> > successfully boots; however, the machines can get stuck for tens of
> > minutes at a time before finally completing onlining processors. Since
> > we pay per minute for many of these VMs there were two goals for setting
> > this value on boot: first, fail fast and hope that a subsequent boot
> > attempt will be successful. Second, a panic is a little easier to keep
> > track of, especially if we're scraping serial logs after the fact. In
> > essence, the goal is to trigger the failure earlier and hopefully get
> > more useful information for further debugging the problem as well.
> >
> > While testing to make sure that this value was getting correctly set on
> > boot, I ran into a pair of surprises. First, when the softlockup_panic
> > parameter was migrated to a sysctl alias, it had the side effect of
> > setting the parameter value after SMP boot has occurred, when it used to
> > be set before this. Second, testing revealed that even though the
> > aliases were being correctly processed, the kernel was reporting the
> > commandline arguments as unrecognized. This generated a message in the
> > logs about an unrecognized parameter (even though it was) and the
> > parameter was passed as an environment variable to init.
> >
> > The first patch ensures that aliased sysctl arguments are not reported
> > as unrecognized boot arguments.
> >
> > The second patch moves the setting of softlockup_panic earlier in boot,
> > where it can take effect before SMP boot begins.
>
> Sounds all great but I only got the cover letter, so maybe resend?

Apologies, I'm not sure quite what went wrong there. I've resent the
patches to the people in the To: of the original messages, in an attempt
to avoid sending copies to everybody a second time.

The entire set seems to have made it to lore:

https://lore.kernel.org/linux-fsdevel/ZTw0CACF3jtT3%2FdX@bombadil.infradead.org/T/#r831972d73aad653c3b732e4e36e743cd53673847

If you still haven't got the copies, please let me know and I'll see if
there's something else I can do to get them to you.

Sorry about this. :/

-K

On Fri, Oct 27, 2023 at 02:46:26PM -0700, Krister Johansen wrote:
> Hi,
> This pair of patches was the result of an unsuccessful attempt to set
> softlockup_panic before SMP boot. The rationale for wanting to set this
> parameter is that some of the VMs that my team runs will occasionally
> get stuck while onlining the non-boot processors as part of SMP boot.
>
> In the cases where this happens, we find out about it after the instance
> successfully boots; however, the machines can get stuck for tens of
> minutes at a time before finally completing onlining processors. Since
> we pay per minute for many of these VMs there were two goals for setting
> this value on boot: first, fail fast and hope that a subsequent boot
> attempt will be successful. Second, a panic is a little easier to keep
> track of, especially if we're scraping serial logs after the fact. In
> essence, the goal is to trigger the failure earlier and hopefully get
> more useful information for further debugging the problem as well.
>
> While testing to make sure that this value was getting correctly set on
> boot, I ran into a pair of surprises. First, when the softlockup_panic
> parameter was migrated to a sysctl alias, it had the side effect of
> setting the parameter value after SMP boot has occurred, when it used to
> be set before this. Second, testing revealed that even though the
> aliases were being correctly processed, the kernel was reporting the
> commandline arguments as unrecognized. This generated a message in the
> logs about an unrecognized parameter (even though it was) and the
> parameter was passed as an environment variable to init.
>
> The first patch ensures that aliased sysctl arguments are not reported
> as unrecognized boot arguments.
>
> The second patch moves the setting of softlockup_panic earlier in boot,
> where it can take effect before SMP boot begins.

Thanks! Looks good, merged and will push to Linus soon.

  Luis
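
For reference, the second surprise described in the cover letter (a parameter that is handled as a sysctl alias yet still reported as unrecognized) can be modelled with a small toy. This is only a sketch in Python, not the kernel's C implementation and not the contents of either patch: the function name `classify_boot_args`, the `KNOWN_PARAMS` set, and the shape of the alias table are assumptions made for illustration. It shows the behaviour the first patch is described as aiming for, where an aliased argument is routed to its sysctl and left out of the unrecognized list.

```python
# Toy model only. All names here are hypothetical; the real logic lives in
# kernel C code and is not reproduced here.

# Boot parameter name -> sysctl it is an alias for (illustrative entries).
SYSCTL_ALIASES = {
    "softlockup_panic": "kernel.softlockup_panic",
    "hung_task_panic": "kernel.hung_task_panic",
}

# Stand-in for "parameters some other handler already understands".
KNOWN_PARAMS = {"console", "root", "quiet"}


def classify_boot_args(cmdline):
    """Split a command line into sysctl settings and unrecognized arguments.

    Returns (sysctls, unrecognized), where sysctls maps a sysctl name to the
    value given on the command line, and unrecognized lists the arguments
    that nothing claimed (in this model, the ones that would be passed on
    to init).
    """
    sysctls = {}
    unrecognized = []
    for arg in cmdline.split():
        name, _, value = arg.partition("=")
        if name in SYSCTL_ALIASES:
            # Aliased parameter: recognized, so it must not be reported.
            sysctls[SYSCTL_ALIASES[name]] = value
        elif name in KNOWN_PARAMS:
            # Claimed by an ordinary handler; nothing to record in this toy.
            continue
        else:
            unrecognized.append(arg)
    return sysctls, unrecognized


if __name__ == "__main__":
    sysctls, unrecognized = classify_boot_args(
        "console=ttyS0 softlockup_panic=1 foo=bar"
    )
    print(sysctls)       # {'kernel.softlockup_panic': '1'}
    print(unrecognized)  # ['foo=bar']
```

In this model `softlockup_panic=1` never reaches the unrecognized list, while a genuinely unknown argument such as `foo=bar` still does. The timing issue addressed by the second patch (applying the value before SMP boot rather than after) is not modelled here.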