Message ID | 20210317163714.328a038d@alex-virtual-machine (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | mm/gup: check page posion status for coredump. | expand |
On 17.03.21 09:37, Aili Yao wrote: > When we do coredump for user process signal, this may be an SIGBUS signal > with BUS_MCEERR_AR or BUS_MCEERR_AO code, which means this signal is > resulted from ECC memory fail like SRAR or SRAO, we expect the memory > recovery work is finished correctly, then the get_dump_page() will not > return the error page as its process pte is set invalid by > memory_failure(). > > But memory_failure() may fail, and the process's related pte may not be > correctly set invalid, for current code, we will return the poison page > and get it dumped and lead to system panic as its in kernel code. > > So check the poison status in get_dump_page(), and if TRUE, return NULL. > > Signed-off-by: Aili Yao <yaoaili@kingsoft.com> > --- > mm/gup.c | 8 ++++++++ > 1 file changed, 8 insertions(+) > > diff --git a/mm/gup.c b/mm/gup.c > index e4c224c..499a496 100644 > --- a/mm/gup.c > +++ b/mm/gup.c > @@ -1536,6 +1536,14 @@ struct page *get_dump_page(unsigned long addr) > FOLL_FORCE | FOLL_DUMP | FOLL_GET); > if (locked) > mmap_read_unlock(mm); > + > + if (IS_ENABLED(CONFIG_MEMORY_FAILURE) && ret == 1) { > + if (unlikely(PageHuge(page) && PageHWPoison(compound_head(page)))) > + ret = 0; > + else if (unlikely(PageHWPoison(page))) > + ret = 0; > + } I wonder if a simple if (PageHWPoison(compound_head(page))) ret = 0; won't suffice. But I guess the "issue" is compound pages that are not huge pages or transparent huge pages. If not, we certainly want a wrapper for that magic, otherwise we have to replicate the same logic all over the place. > + > return (ret == 1) ? page : NULL; > } > #endif /* CONFIG_ELF_CORE */ >
On Wed, 17 Mar 2021 10:12:02 +0100 David Hildenbrand <david@redhat.com> wrote: > > I wonder if a simple > > if (PageHWPoison(compound_head(page))) > ret = 0; > > won't suffice. But I guess the "issue" is compound pages that are not > huge pages or transparent huge pages. Yes, the simple case won't suffice, as we mark the hugetlb page poison in head, and other cases in the specific single page struct. > If not, we certainly want a wrapper for that magic, otherwise we have to > replicate the same logic all over the place. > > > + > > return (ret == 1) ? page : NULL; > > } > > #endif /* CONFIG_ELF_CORE */ > > > > Yes, May other places meet the requirements as the coredump meets, it's better to make a wrapper for this. But i am not familiar with the specific scenario, so this patch only cover the coredump case. I will post a v2 patch for this.
On Wed, Mar 17, 2021 at 10:12:02AM +0100, David Hildenbrand wrote: > > + if (IS_ENABLED(CONFIG_MEMORY_FAILURE) && ret == 1) { > > + if (unlikely(PageHuge(page) && PageHWPoison(compound_head(page)))) > > + ret = 0; > > + else if (unlikely(PageHWPoison(page))) > > + ret = 0; > > + } > > I wonder if a simple > > if (PageHWPoison(compound_head(page))) > ret = 0; > > won't suffice. But I guess the "issue" is compound pages that are not huge > pages or transparent huge pages. THPs don't set the HWPoison bit on the head page. https://lore.kernel.org/linux-mm/20210316140947.GA3420@casper.infradead.org/ (and PAGEFLAG(HWPoison, hwpoison, PF_ANY)) By the way, #ifdef CONFIG_MEMORY_FAILURE PAGEFLAG(HWPoison, hwpoison, PF_ANY) TESTSCFLAG(HWPoison, hwpoison, PF_ANY) #define __PG_HWPOISON (1UL << PG_hwpoison) extern bool take_page_off_buddy(struct page *page); #else PAGEFLAG_FALSE(HWPoison) #define __PG_HWPOISON 0 #endif so there's no need for this if (IS_ENABLED(CONFIG_MEMORY_FAILURE) check, as it simply turns into if (PageHuge(page) && 0) else if (0) and the compiler can optimise it all away.
On Thu, 18 Mar 2021 04:46:00 +0000 Matthew Wilcox <willy@infradead.org> wrote: > On Wed, Mar 17, 2021 at 10:12:02AM +0100, David Hildenbrand wrote: > > > + if (IS_ENABLED(CONFIG_MEMORY_FAILURE) && ret == 1) { > > > + if (unlikely(PageHuge(page) && PageHWPoison(compound_head(page)))) > > > + ret = 0; > > > + else if (unlikely(PageHWPoison(page))) > > > + ret = 0; > > > + } > > > > I wonder if a simple > > > > if (PageHWPoison(compound_head(page))) > > ret = 0; > > > > won't suffice. But I guess the "issue" is compound pages that are not huge > > pages or transparent huge pages. > > THPs don't set the HWPoison bit on the head page. > > https://lore.kernel.org/linux-mm/20210316140947.GA3420@casper.infradead.org/ > > (and PAGEFLAG(HWPoison, hwpoison, PF_ANY)) > > By the way, > > #ifdef CONFIG_MEMORY_FAILURE > PAGEFLAG(HWPoison, hwpoison, PF_ANY) > TESTSCFLAG(HWPoison, hwpoison, PF_ANY) > #define __PG_HWPOISON (1UL << PG_hwpoison) > extern bool take_page_off_buddy(struct page *page); > #else > PAGEFLAG_FALSE(HWPoison) > #define __PG_HWPOISON 0 > #endif > > so there's no need for this > if (IS_ENABLED(CONFIG_MEMORY_FAILURE) > check, as it simply turns into > > if (PageHuge(page) && 0) > else if (0) > > and the compiler can optimise it all away. Yes, You are right, I will modify this later. Thanks for correction
On 18.03.21 05:46, Matthew Wilcox wrote: > On Wed, Mar 17, 2021 at 10:12:02AM +0100, David Hildenbrand wrote: >>> + if (IS_ENABLED(CONFIG_MEMORY_FAILURE) && ret == 1) { >>> + if (unlikely(PageHuge(page) && PageHWPoison(compound_head(page)))) >>> + ret = 0; >>> + else if (unlikely(PageHWPoison(page))) >>> + ret = 0; >>> + } >> >> I wonder if a simple >> >> if (PageHWPoison(compound_head(page))) >> ret = 0; >> >> won't suffice. But I guess the "issue" is compound pages that are not huge >> pages or transparent huge pages. > > THPs don't set the HWPoison bit on the head page. > > https://lore.kernel.org/linux-mm/20210316140947.GA3420@casper.infradead.org/ Oh, okay -- I was missing that we actually already set the HWPoison bit before trying to split via TestSetPageHWPoison(). I thought for a second that if splitting fails, we don't set any HWPoison bit.
diff --git a/mm/gup.c b/mm/gup.c index e4c224c..499a496 100644 --- a/mm/gup.c +++ b/mm/gup.c @@ -1536,6 +1536,14 @@ struct page *get_dump_page(unsigned long addr) FOLL_FORCE | FOLL_DUMP | FOLL_GET); if (locked) mmap_read_unlock(mm); + + if (IS_ENABLED(CONFIG_MEMORY_FAILURE) && ret == 1) { + if (unlikely(PageHuge(page) && PageHWPoison(compound_head(page)))) + ret = 0; + else if (unlikely(PageHWPoison(page))) + ret = 0; + } + return (ret == 1) ? page : NULL; } #endif /* CONFIG_ELF_CORE */
When we do coredump for user process signal, this may be an SIGBUS signal with BUS_MCEERR_AR or BUS_MCEERR_AO code, which means this signal is resulted from ECC memory fail like SRAR or SRAO, we expect the memory recovery work is finished correctly, then the get_dump_page() will not return the error page as its process pte is set invalid by memory_failure(). But memory_failure() may fail, and the process's related pte may not be correctly set invalid, for current code, we will return the poison page and get it dumped and lead to system panic as its in kernel code. So check the poison status in get_dump_page(), and if TRUE, return NULL. Signed-off-by: Aili Yao <yaoaili@kingsoft.com> --- mm/gup.c | 8 ++++++++ 1 file changed, 8 insertions(+)