Message ID | 20230127015030.30074-1-tony.luck@intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | x86/mce: Set PG_hwpoison page flag to avoid the capture kernel panic | expand |
On Thu, Jan 26, 2023 at 05:50:30PM -0800, Tony Luck wrote: > From: Zhiquan Li <zhiquan1.li@intel.com> > > Kdump can exclude the HWPosion page to avoid touch the error page > again, the prerequisite is the PG_hwpoison page flag is set. > However, for some MCE fatal error cases, there are no opportunity > to queue a task for calling memory_failure(), as a result, > the capture kernel touches the error page again and panics. > > Add function mce_set_page_hwpoison_now() which mark a page as > HWPoison before kernel panic() for MCE error, so that the dump > program can check and skip the error page and prevent the capture > kernel panic. > > [Tony: Changed TestSetPageHWPoison() to SetPageHWPoison()] > > Co-developedd-by: Youquan Song <youquan.song@intel.com> > Signed-off-by: Youquan Song <youquan.song@intel.com> > Signed-off-by: Zhiquan Li <zhiquan1.li@intel.com> > Signed-off-by: Tony Luck <tony.luck@intel.com> Hi, Thank you for the patch. > --- > arch/x86/kernel/cpu/mce/core.c | 20 ++++++++++++++++++++ > 1 file changed, 20 insertions(+) > > diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c > index 2c8ec5c71712..0630999c6311 100644 > --- a/arch/x86/kernel/cpu/mce/core.c > +++ b/arch/x86/kernel/cpu/mce/core.c > @@ -162,6 +162,24 @@ void mce_unregister_decode_chain(struct notifier_block *nb) > } > EXPORT_SYMBOL_GPL(mce_unregister_decode_chain); > > +/* > + * Kdump can exclude the HWPosion page to avoid touch the error page again, > + * the prerequisite is the PG_hwpoison page flag is set. However, for some > + * MCE fatal error cases, there are no opportunity to queue a task > + * for calling memory_failure(), as a result, the capture kernel panic. s/panic/panics/ ? > + * This function mark the page as HWPoison before kernel panic() for MCE. s/mark/marks/ > + */ > +static void mce_set_page_hwpoison_now(unsigned long pfn) > +{ > + struct page *p; > + > + /* TODO: need to handle other sort of page, like SGX, PMEM and > + * HugeTLB pages*/ Although I'm not sure that SGX memory or PMEM pages are expected to be included in kdump, but simply setting PageHWPoison does not work for them? (Maybe that depends on how kdump handles these types of memory.) As for HugeTLB, kdump utility should parse the struct page and be aware of HugeTLB pages, so maybe setting PageHWPoison on the head page could work. > + p = pfn_to_online_page(pfn); > + if (p) > + SetPageHWPoison(p); > +} > + > static void __print_mce(struct mce *m) > { > pr_emerg(HW_ERR "CPU %d: Machine Check%s: %Lx Bank %d: %016Lx\n", > @@ -292,6 +310,8 @@ static noinstr void mce_panic(const char *msg, struct mce *final, char *exp) > if (!fake_panic) { > if (panic_timeout == 0) > panic_timeout = mca_cfg.panic_timeout; > + if (final && (final->status & MCI_STATUS_ADDRV)) > + mce_set_page_hwpoison_now(final->addr >> PAGE_SHIFT); > panic(msg); I think that setting PageHWPoison outside hwpoison subsystem is OK here, because this is called just before calling panic() so it's expected to not conflict with other hwpoison-related code. Thanks, Naoya Horiguchi
> Although I'm not sure that SGX memory or PMEM pages are expected to be > included in kdump, but simply setting PageHWPoison does not work for them? > (Maybe that depends on how kdump handles these types of memory.) SGX/TDX pages can't be dumped. They are encrypted with no way for kdump to get the key. PMEM seems pointless (but I don't know what kdump does here). > As for HugeTLB, kdump utility should parse the struct page and be aware of > HugeTLB pages, so maybe setting PageHWPoison on the head page could work. Or maybe kdump can take not of the PageHWPoison flag on the sub-page of the huge page? It depends on whether there is any benefit to the dump to include the not-poisoned parts of a huge page. > I think that setting PageHWPoison outside hwpoison subsystem is OK here, > because this is called just before calling panic() so it's expected to not > conflict with other hwpoison-related code. Thanks for the review. -Tony
On Mon, Jan 30, 2023 at 07:21:15PM +0000, Luck, Tony wrote: > > Although I'm not sure that SGX memory or PMEM pages are expected to be > > included in kdump, but simply setting PageHWPoison does not work for them? > > (Maybe that depends on how kdump handles these types of memory.) > > SGX/TDX pages can't be dumped. They are encrypted with no way for kdump to > get the key. > > PMEM seems pointless (but I don't know what kdump does here). > > > As for HugeTLB, kdump utility should parse the struct page and be aware of > > HugeTLB pages, so maybe setting PageHWPoison on the head page could work. > > Or maybe kdump can take not of the PageHWPoison flag on the sub-page of the > huge page? It depends on whether there is any benefit to the dump to include the > not-poisoned parts of a huge page. I think that many kdump users filter out HugeTLB pages (setting dump_level to filter "User pages") to reduce the size of kdump. User pages are not much helpful to investigate kernel problems, so filtering all sub-pages in hwpoisoned hugepage seems to me not so harmful. I don't say that saving healthy subpages has no benefit, but I don't know much about usecases where user pages in kdump file help. Thanks, Naoya Horiguchi
On 2023/1/31 12:50, HORIGUCHI NAOYA(堀口 直也) wrote: > On Mon, Jan 30, 2023 at 07:21:15PM +0000, Luck, Tony wrote: >>> Although I'm not sure that SGX memory or PMEM pages are expected to be >>> included in kdump, but simply setting PageHWPoison does not work for them? >>> (Maybe that depends on how kdump handles these types of memory.) >> >> SGX/TDX pages can't be dumped. They are encrypted with no way for kdump to >> get the key. >> >> PMEM seems pointless (but I don't know what kdump does here). >> >>> As for HugeTLB, kdump utility should parse the struct page and be aware of >>> HugeTLB pages, so maybe setting PageHWPoison on the head page could work. >> >> Or maybe kdump can take not of the PageHWPoison flag on the sub-page of the >> huge page? It depends on whether there is any benefit to the dump to include the >> not-poisoned parts of a huge page. > > I think that many kdump users filter out HugeTLB pages (setting dump_level > to filter "User pages") to reduce the size of kdump. User pages are not > much helpful to investigate kernel problems, so filtering all sub-pages in > hwpoisoned hugepage seems to me not so harmful. > > I don't say that saving healthy subpages has no benefit, but I don't know > much about usecases where user pages in kdump file help. > Many thanks, Naoya and Tony. The "TODO" comment was initially added by me. I referenced memory_failure() that gets rid of the three cases specifically, so I'd like the cases are fully discussed here to make sure it is a strong fix. - SGX, as Tony said those pages can't be dumped. - PMEM, although the memory region is figured out by kexec, kdump utility doesn't have corresponding implementation for it. So we can ignore it. - HugeTLB As we known there are complicated processes for HugeTLB pages at memory_failure(). However, it doesn't make sense for a routine going to call panic(). Kernel marks the page hwpoisoned would be enough. The information is sufficient for kdump to make decision on how to deal with a hwpoisoned huge pages, even though setting the PageHWPoison flag on the sub-page of a huge page. It's kdump's responsibility for the not-poisoned parts of a huge page. Currently the implementation is to exclude the whole huge page simply: - https://github.com/makedumpfile/makedumpfile/commit/e8b4f93b3260defe86f5e13ca7536c07f2e32914 - https://bugzilla.redhat.com/show_bug.cgi?id=1181649 Best Regards, Zhiquan
diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c index 2c8ec5c71712..0630999c6311 100644 --- a/arch/x86/kernel/cpu/mce/core.c +++ b/arch/x86/kernel/cpu/mce/core.c @@ -162,6 +162,24 @@ void mce_unregister_decode_chain(struct notifier_block *nb) } EXPORT_SYMBOL_GPL(mce_unregister_decode_chain); +/* + * Kdump can exclude the HWPosion page to avoid touch the error page again, + * the prerequisite is the PG_hwpoison page flag is set. However, for some + * MCE fatal error cases, there are no opportunity to queue a task + * for calling memory_failure(), as a result, the capture kernel panic. + * This function mark the page as HWPoison before kernel panic() for MCE. + */ +static void mce_set_page_hwpoison_now(unsigned long pfn) +{ + struct page *p; + + /* TODO: need to handle other sort of page, like SGX, PMEM and + * HugeTLB pages*/ + p = pfn_to_online_page(pfn); + if (p) + SetPageHWPoison(p); +} + static void __print_mce(struct mce *m) { pr_emerg(HW_ERR "CPU %d: Machine Check%s: %Lx Bank %d: %016Lx\n", @@ -292,6 +310,8 @@ static noinstr void mce_panic(const char *msg, struct mce *final, char *exp) if (!fake_panic) { if (panic_timeout == 0) panic_timeout = mca_cfg.panic_timeout; + if (final && (final->status & MCI_STATUS_ADDRV)) + mce_set_page_hwpoison_now(final->addr >> PAGE_SHIFT); panic(msg); } else pr_emerg(HW_ERR "Fake kernel panic: %s\n", msg);