Message ID | 20210429122519.15183-4-david@redhat.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | fs/proc/kcore: don't read offline sections, logically offline pages and hwpoisoned pages | expand |
On Thu, Apr 29, 2021 at 02:25:15PM +0200, David Hildenbrand wrote: > Commit d3378e86d182 ("mm/gup: check page posion status for coredump.") > introduced page_is_poisoned(), however, v5 [1] of the patch used > "page_is_hwpoison()" and something went wrong while upstreaming. Rename the > function and move it to page-flags.h, from where it can be used in other > -- kcore -- context. > > Move the comment to the place where it belongs and simplify. > > [1] https://lkml.kernel.org/r/20210322193318.377c9ce9@alex-virtual-machine > > Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Mike Rapoport <rppt@linux.ibm.com> > --- > include/linux/page-flags.h | 7 +++++++ > mm/gup.c | 6 +++++- > mm/internal.h | 20 -------------------- > 3 files changed, 12 insertions(+), 21 deletions(-) Nice :) > > diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h > index 04a34c08e0a6..b8c56672a588 100644 > --- a/include/linux/page-flags.h > +++ b/include/linux/page-flags.h > @@ -694,6 +694,13 @@ PAGEFLAG_FALSE(DoubleMap) > TESTSCFLAG_FALSE(DoubleMap) > #endif > > +static inline bool is_page_hwpoison(struct page *page) > +{ > + if (PageHWPoison(page)) > + return true; > + return PageHuge(page) && PageHWPoison(compound_head(page)); > +} > + > /* > * For pages that are never mapped to userspace (and aren't PageSlab), > * page_type may be used. Because it is initialised to -1, we invert the > diff --git a/mm/gup.c b/mm/gup.c > index ef7d2da9f03f..000f3303e7f2 100644 > --- a/mm/gup.c > +++ b/mm/gup.c > @@ -1536,7 +1536,11 @@ struct page *get_dump_page(unsigned long addr) > if (locked) > mmap_read_unlock(mm); > > - if (ret == 1 && is_page_poisoned(page)) > + /* > + * We might have hwpoisoned pages still mapped into user space. Don't > + * read these pages when creating a coredump, access could be fatal. > + */ > + if (ret == 1 && is_page_hwpoison(page)) > return NULL; > > return (ret == 1) ? page : NULL; > diff --git a/mm/internal.h b/mm/internal.h > index cb3c5e0a7799..1432feec62df 100644 > --- a/mm/internal.h > +++ b/mm/internal.h > @@ -97,26 +97,6 @@ static inline void set_page_refcounted(struct page *page) > set_page_count(page, 1); > } > > -/* > - * When kernel touch the user page, the user page may be have been marked > - * poison but still mapped in user space, if without this page, the kernel > - * can guarantee the data integrity and operation success, the kernel is > - * better to check the posion status and avoid touching it, be good not to > - * panic, coredump for process fatal signal is a sample case matching this > - * scenario. Or if kernel can't guarantee the data integrity, it's better > - * not to call this function, let kernel touch the poison page and get to > - * panic. > - */ > -static inline bool is_page_poisoned(struct page *page) > -{ > - if (PageHWPoison(page)) > - return true; > - else if (PageHuge(page) && PageHWPoison(compound_head(page))) > - return true; > - > - return false; > -} > - > extern unsigned long highest_memmap_pfn; > > /* > -- > 2.30.2 >
On Thu 29-04-21 14:25:15, David Hildenbrand wrote: > Commit d3378e86d182 ("mm/gup: check page posion status for coredump.") > introduced page_is_poisoned(), however, v5 [1] of the patch used > "page_is_hwpoison()" and something went wrong while upstreaming. Rename the > function and move it to page-flags.h, from where it can be used in other > -- kcore -- context. > > Move the comment to the place where it belongs and simplify. > > [1] https://lkml.kernel.org/r/20210322193318.377c9ce9@alex-virtual-machine > > Signed-off-by: David Hildenbrand <david@redhat.com> I do agree that being explicit about hwpoison is much better. Poisoned page can be also an unitialized one and I believe this is the reason why you are bringing that up. But you've made me look at d3378e86d182 and I am wondering whether this is really a valid patch. First of all it can leak a reference count AFAICS. Moreover it doesn't really fix anything because the page can be marked hwpoison right after the check is done. I do not think the race is feasible to be closed. So shouldn't we rather revert it?
On 05.05.21 15:13, Michal Hocko wrote: > On Thu 29-04-21 14:25:15, David Hildenbrand wrote: >> Commit d3378e86d182 ("mm/gup: check page posion status for coredump.") >> introduced page_is_poisoned(), however, v5 [1] of the patch used >> "page_is_hwpoison()" and something went wrong while upstreaming. Rename the >> function and move it to page-flags.h, from where it can be used in other >> -- kcore -- context. >> >> Move the comment to the place where it belongs and simplify. >> >> [1] https://lkml.kernel.org/r/20210322193318.377c9ce9@alex-virtual-machine >> >> Signed-off-by: David Hildenbrand <david@redhat.com> > > I do agree that being explicit about hwpoison is much better. Poisoned > page can be also an unitialized one and I believe this is the reason why > you are bringing that up. I'm bringing it up because I want to reuse that function as state above :) > > But you've made me look at d3378e86d182 and I am wondering whether this > is really a valid patch. First of all it can leak a reference count > AFAICS. Moreover it doesn't really fix anything because the page can be > marked hwpoison right after the check is done. I do not think the race > is feasible to be closed. So shouldn't we rather revert it? I am not sure if we really care about races here that much here? I mean, essentially we are racing with HW breaking asynchronously. Just because we would be synchronizing with SetPageHWPoison() wouldn't mean we can stop HW from breaking. Long story short, this should be good enough for the cases we actually can handle? What am I missing?
On Wed 05-05-21 15:17:53, David Hildenbrand wrote: > On 05.05.21 15:13, Michal Hocko wrote: > > On Thu 29-04-21 14:25:15, David Hildenbrand wrote: > > > Commit d3378e86d182 ("mm/gup: check page posion status for coredump.") > > > introduced page_is_poisoned(), however, v5 [1] of the patch used > > > "page_is_hwpoison()" and something went wrong while upstreaming. Rename the > > > function and move it to page-flags.h, from where it can be used in other > > > -- kcore -- context. > > > > > > Move the comment to the place where it belongs and simplify. > > > > > > [1] https://lkml.kernel.org/r/20210322193318.377c9ce9@alex-virtual-machine > > > > > > Signed-off-by: David Hildenbrand <david@redhat.com> > > > > I do agree that being explicit about hwpoison is much better. Poisoned > > page can be also an unitialized one and I believe this is the reason why > > you are bringing that up. > > I'm bringing it up because I want to reuse that function as state above :) > > > > > But you've made me look at d3378e86d182 and I am wondering whether this > > is really a valid patch. First of all it can leak a reference count > > AFAICS. Moreover it doesn't really fix anything because the page can be > > marked hwpoison right after the check is done. I do not think the race > > is feasible to be closed. So shouldn't we rather revert it? > > I am not sure if we really care about races here that much here? I mean, > essentially we are racing with HW breaking asynchronously. Just because we > would be synchronizing with SetPageHWPoison() wouldn't mean we can stop HW > from breaking. Right > Long story short, this should be good enough for the cases we actually can > handle? What am I missing? I am not sure I follow. My point is that I fail to see any added value of the check as it doesn't prevent the race (it fundamentally cannot as the page can be poisoned at any time) but the failure path doesn't put_page which is incorrect even for hwpoison pages.
>> Long story short, this should be good enough for the cases we actually can >> handle? What am I missing? > > I am not sure I follow. My point is that I fail to see any added value > of the check as it doesn't prevent the race (it fundamentally cannot as > the page can be poisoned at any time) but the failure path doesn't > put_page which is incorrect even for hwpoison pages. Oh, I think you are right. If we have a page and return NULL we would leak a reference. Actually, we discussed in that thread handling this entirely differently, which resulted in a v7 [1]; however Andrew moved forward with this (outdated?) patch, maybe that was just a mistake? Yes, I agree we should revert that patch for now. Regarding the race comment: AFAIU e.g., [2], it's not really a problem with a race, but rather some corner case issue that can happen if we fail in memory_failure(). [1] https://lkml.kernel.org/r/20210406104123.451ee3c3@alex-virtual-machine [2] https://lkml.kernel.org/r/20210331015258.GB22060@hori.linux.bs1.fc.nec.co.jp
On Wed 05-05-21 15:39:08, David Hildenbrand wrote: > > > Long story short, this should be good enough for the cases we actually can > > > handle? What am I missing? > > > > I am not sure I follow. My point is that I fail to see any added value > > of the check as it doesn't prevent the race (it fundamentally cannot as > > the page can be poisoned at any time) but the failure path doesn't > > put_page which is incorrect even for hwpoison pages. > > Oh, I think you are right. If we have a page and return NULL we would leak a > reference. > > Actually, we discussed in that thread handling this entirely differently, > which resulted in a v7 [1]; however Andrew moved forward with this > (outdated?) patch, maybe that was just a mistake? > > Yes, I agree we should revert that patch for now. OK, Let me send the revert to Andrew.
On Wed, 5 May 2021 15:27:39 +0200 Michal Hocko <mhocko@suse.com> wrote: > On Wed 05-05-21 15:17:53, David Hildenbrand wrote: > > On 05.05.21 15:13, Michal Hocko wrote: > > > On Thu 29-04-21 14:25:15, David Hildenbrand wrote: > > > > Commit d3378e86d182 ("mm/gup: check page posion status for coredump.") > > > > introduced page_is_poisoned(), however, v5 [1] of the patch used > > > > "page_is_hwpoison()" and something went wrong while upstreaming. Rename the > > > > function and move it to page-flags.h, from where it can be used in other > > > > -- kcore -- context. > > > > > > > > Move the comment to the place where it belongs and simplify. > > > > > > > > [1] https://lkml.kernel.org/r/20210322193318.377c9ce9@alex-virtual-machine > > > > > > > > Signed-off-by: David Hildenbrand <david@redhat.com> > > > > > > I do agree that being explicit about hwpoison is much better. Poisoned > > > page can be also an unitialized one and I believe this is the reason why > > > you are bringing that up. > > > > I'm bringing it up because I want to reuse that function as state above :) > > > > > > > > But you've made me look at d3378e86d182 and I am wondering whether this > > > is really a valid patch. First of all it can leak a reference count > > > AFAICS. Moreover it doesn't really fix anything because the page can be > > > marked hwpoison right after the check is done. I do not think the race > > > is feasible to be closed. So shouldn't we rather revert it? > > > > I am not sure if we really care about races here that much here? I mean, > > essentially we are racing with HW breaking asynchronously. Just because we > > would be synchronizing with SetPageHWPoison() wouldn't mean we can stop HW > > from breaking. > > Right > > > Long story short, this should be good enough for the cases we actually can > > handle? What am I missing? > > I am not sure I follow. My point is that I fail to see any added value > of the check as it doesn't prevent the race (it fundamentally cannot as > the page can be poisoned at any time) but the failure path doesn't > put_page which is incorrect even for hwpoison pages. Sorry, I have something to say: I have noticed the ref count leak in the previous topic ,but I don't think it's a really matter. For memory recovery case for user pages, we will keep one reference to the poison page so the error page will not be freed to buddy allocator. which can be checked in memory_faulure() function. For the case here, the reference count for error page may be greater than one as it's not unmmapped successfully and may shared. we take a reference for that page will not result some really mattering issue. The only break I can think for this leak is that we will fail to unpoison the error page for test purpose. Thanks! Aili Yao
On Wed, 5 May 2021 15:45:47 +0200 Michal Hocko <mhocko@suse.com> wrote: > On Wed 05-05-21 15:39:08, David Hildenbrand wrote: > > > > Long story short, this should be good enough for the cases we actually can > > > > handle? What am I missing? > > > > > > I am not sure I follow. My point is that I fail to see any added value > > > of the check as it doesn't prevent the race (it fundamentally cannot as > > > the page can be poisoned at any time) but the failure path doesn't > > > put_page which is incorrect even for hwpoison pages. > > > > Oh, I think you are right. If we have a page and return NULL we would leak a > > reference. > > > > Actually, we discussed in that thread handling this entirely differently, > > which resulted in a v7 [1]; however Andrew moved forward with this > > (outdated?) patch, maybe that was just a mistake? > > > > Yes, I agree we should revert that patch for now. > > OK, Let me send the revert to Andrew. > Got this! Anyway, I will try to post a new patch for this issue based on the previous patch v7. Thanks! Aili Yao
On Thu 06-05-21 08:56:11, Aili Yao wrote: > On Wed, 5 May 2021 15:27:39 +0200 > Michal Hocko <mhocko@suse.com> wrote: > > > On Wed 05-05-21 15:17:53, David Hildenbrand wrote: > > > On 05.05.21 15:13, Michal Hocko wrote: > > > > On Thu 29-04-21 14:25:15, David Hildenbrand wrote: > > > > > Commit d3378e86d182 ("mm/gup: check page posion status for coredump.") > > > > > introduced page_is_poisoned(), however, v5 [1] of the patch used > > > > > "page_is_hwpoison()" and something went wrong while upstreaming. Rename the > > > > > function and move it to page-flags.h, from where it can be used in other > > > > > -- kcore -- context. > > > > > > > > > > Move the comment to the place where it belongs and simplify. > > > > > > > > > > [1] https://lkml.kernel.org/r/20210322193318.377c9ce9@alex-virtual-machine > > > > > > > > > > Signed-off-by: David Hildenbrand <david@redhat.com> > > > > > > > > I do agree that being explicit about hwpoison is much better. Poisoned > > > > page can be also an unitialized one and I believe this is the reason why > > > > you are bringing that up. > > > > > > I'm bringing it up because I want to reuse that function as state above :) > > > > > > > > > > > But you've made me look at d3378e86d182 and I am wondering whether this > > > > is really a valid patch. First of all it can leak a reference count > > > > AFAICS. Moreover it doesn't really fix anything because the page can be > > > > marked hwpoison right after the check is done. I do not think the race > > > > is feasible to be closed. So shouldn't we rather revert it? > > > > > > I am not sure if we really care about races here that much here? I mean, > > > essentially we are racing with HW breaking asynchronously. Just because we > > > would be synchronizing with SetPageHWPoison() wouldn't mean we can stop HW > > > from breaking. > > > > Right > > > > > Long story short, this should be good enough for the cases we actually can > > > handle? What am I missing? > > > > I am not sure I follow. My point is that I fail to see any added value > > of the check as it doesn't prevent the race (it fundamentally cannot as > > the page can be poisoned at any time) but the failure path doesn't > > put_page which is incorrect even for hwpoison pages. > > Sorry, I have something to say: > > I have noticed the ref count leak in the previous topic ,but I don't think > it's a really matter. For memory recovery case for user pages, we will keep one > reference to the poison page so the error page will not be freed to buddy allocator. > which can be checked in memory_faulure() function. So what would happen if those pages are hwpoisoned from userspace rather than by HW. And repeatedly so?
On Thu, 6 May 2021 09:06:14 +0200 Michal Hocko <mhocko@suse.com> wrote: > On Thu 06-05-21 08:56:11, Aili Yao wrote: > > On Wed, 5 May 2021 15:27:39 +0200 > > Michal Hocko <mhocko@suse.com> wrote: > > > > > On Wed 05-05-21 15:17:53, David Hildenbrand wrote: > > > > On 05.05.21 15:13, Michal Hocko wrote: > > > > > On Thu 29-04-21 14:25:15, David Hildenbrand wrote: > > > > > > Commit d3378e86d182 ("mm/gup: check page posion status for coredump.") > > > > > > introduced page_is_poisoned(), however, v5 [1] of the patch used > > > > > > "page_is_hwpoison()" and something went wrong while upstreaming. Rename the > > > > > > function and move it to page-flags.h, from where it can be used in other > > > > > > -- kcore -- context. > > > > > > > > > > > > Move the comment to the place where it belongs and simplify. > > > > > > > > > > > > [1] https://lkml.kernel.org/r/20210322193318.377c9ce9@alex-virtual-machine > > > > > > > > > > > > Signed-off-by: David Hildenbrand <david@redhat.com> > > > > > > > > > > I do agree that being explicit about hwpoison is much better. Poisoned > > > > > page can be also an unitialized one and I believe this is the reason why > > > > > you are bringing that up. > > > > > > > > I'm bringing it up because I want to reuse that function as state above :) > > > > > > > > > > > > > > But you've made me look at d3378e86d182 and I am wondering whether this > > > > > is really a valid patch. First of all it can leak a reference count > > > > > AFAICS. Moreover it doesn't really fix anything because the page can be > > > > > marked hwpoison right after the check is done. I do not think the race > > > > > is feasible to be closed. So shouldn't we rather revert it? > > > > > > > > I am not sure if we really care about races here that much here? I mean, > > > > essentially we are racing with HW breaking asynchronously. Just because we > > > > would be synchronizing with SetPageHWPoison() wouldn't mean we can stop HW > > > > from breaking. > > > > > > Right > > > > > > > Long story short, this should be good enough for the cases we actually can > > > > handle? What am I missing? > > > > > > I am not sure I follow. My point is that I fail to see any added value > > > of the check as it doesn't prevent the race (it fundamentally cannot as > > > the page can be poisoned at any time) but the failure path doesn't > > > put_page which is incorrect even for hwpoison pages. > > > > Sorry, I have something to say: > > > > I have noticed the ref count leak in the previous topic ,but I don't think > > it's a really matter. For memory recovery case for user pages, we will keep one > > reference to the poison page so the error page will not be freed to buddy allocator. > > which can be checked in memory_faulure() function. > > So what would happen if those pages are hwpoisoned from userspace rather > than by HW. And repeatedly so? Sorry, I may be not totally understand what you mean. Do you mean hard page offline from mcelog? If yes, I think it's not for one real UC error but for CE storms. when we access this page in kernel, the access may success even it was marked hwpoison. Thanks! Aili Yao
On Thu 06-05-21 15:28:05, Aili Yao wrote: > On Thu, 6 May 2021 09:06:14 +0200 > Michal Hocko <mhocko@suse.com> wrote: > > > On Thu 06-05-21 08:56:11, Aili Yao wrote: > > > On Wed, 5 May 2021 15:27:39 +0200 > > > Michal Hocko <mhocko@suse.com> wrote: [...] > > > > I am not sure I follow. My point is that I fail to see any added value > > > > of the check as it doesn't prevent the race (it fundamentally cannot as > > > > the page can be poisoned at any time) but the failure path doesn't > > > > put_page which is incorrect even for hwpoison pages. > > > > > > Sorry, I have something to say: > > > > > > I have noticed the ref count leak in the previous topic ,but I don't think > > > it's a really matter. For memory recovery case for user pages, we will keep one > > > reference to the poison page so the error page will not be freed to buddy allocator. > > > which can be checked in memory_faulure() function. > > > > So what would happen if those pages are hwpoisoned from userspace rather > > than by HW. And repeatedly so? > > Sorry, I may be not totally understand what you mean. > > Do you mean hard page offline from mcelog? No I mean soft hwpoison from userspace (e.g. by MADV_HWPOISON but there are other interfaces AFAIK). And just to be explicit. All those interfaces are root only (CAP_SYS_ADMIN) so I am not really worried about any malitious abuse of the reference leak. I am mostly concerned that this is obviously broken without a good reason. The most trivial fix would have been to put_page in the return path but as I've mentioned in other email thread the fix really needs a deeper thought and consider other things. Hope that clarifies this some more.
On Thu, 6 May 2021 09:55:51 +0200 Michal Hocko <mhocko@suse.com> wrote: > On Thu 06-05-21 15:28:05, Aili Yao wrote: > > On Thu, 6 May 2021 09:06:14 +0200 > > Michal Hocko <mhocko@suse.com> wrote: > > > > > On Thu 06-05-21 08:56:11, Aili Yao wrote: > > > > On Wed, 5 May 2021 15:27:39 +0200 > > > > Michal Hocko <mhocko@suse.com> wrote: > [...] > > > > > I am not sure I follow. My point is that I fail to see any added value > > > > > of the check as it doesn't prevent the race (it fundamentally cannot as > > > > > the page can be poisoned at any time) but the failure path doesn't > > > > > put_page which is incorrect even for hwpoison pages. > > > > > > > > Sorry, I have something to say: > > > > > > > > I have noticed the ref count leak in the previous topic ,but I don't think > > > > it's a really matter. For memory recovery case for user pages, we will keep one > > > > reference to the poison page so the error page will not be freed to buddy allocator. > > > > which can be checked in memory_faulure() function. > > > > > > So what would happen if those pages are hwpoisoned from userspace rather > > > than by HW. And repeatedly so? > > > > Sorry, I may be not totally understand what you mean. > > > > Do you mean hard page offline from mcelog? > > No I mean soft hwpoison from userspace (e.g. by MADV_HWPOISON but there > are other interfaces AFAIK). > > And just to be explicit. All those interfaces are root only > (CAP_SYS_ADMIN) so I am not really worried about any malitious abuse of > the reference leak. I am mostly concerned that this is obviously broken > without a good reason. The most trivial fix would have been to put_page > in the return path but as I've mentioned in other email thread the fix > really needs a deeper thought and consider other things. > > Hope that clarifies this some more. Thanks, got it! Yes, there are some test scenarios that should be covered. But for test, the default SIGBUS handlers is usually replaced, and the test process may not hit the coredump code. Anyway, there is a ref leak in the normal enviorments and better to be fixed. Thanks! Aili Yao
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h index 04a34c08e0a6..b8c56672a588 100644 --- a/include/linux/page-flags.h +++ b/include/linux/page-flags.h @@ -694,6 +694,13 @@ PAGEFLAG_FALSE(DoubleMap) TESTSCFLAG_FALSE(DoubleMap) #endif +static inline bool is_page_hwpoison(struct page *page) +{ + if (PageHWPoison(page)) + return true; + return PageHuge(page) && PageHWPoison(compound_head(page)); +} + /* * For pages that are never mapped to userspace (and aren't PageSlab), * page_type may be used. Because it is initialised to -1, we invert the diff --git a/mm/gup.c b/mm/gup.c index ef7d2da9f03f..000f3303e7f2 100644 --- a/mm/gup.c +++ b/mm/gup.c @@ -1536,7 +1536,11 @@ struct page *get_dump_page(unsigned long addr) if (locked) mmap_read_unlock(mm); - if (ret == 1 && is_page_poisoned(page)) + /* + * We might have hwpoisoned pages still mapped into user space. Don't + * read these pages when creating a coredump, access could be fatal. + */ + if (ret == 1 && is_page_hwpoison(page)) return NULL; return (ret == 1) ? page : NULL; diff --git a/mm/internal.h b/mm/internal.h index cb3c5e0a7799..1432feec62df 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -97,26 +97,6 @@ static inline void set_page_refcounted(struct page *page) set_page_count(page, 1); } -/* - * When kernel touch the user page, the user page may be have been marked - * poison but still mapped in user space, if without this page, the kernel - * can guarantee the data integrity and operation success, the kernel is - * better to check the posion status and avoid touching it, be good not to - * panic, coredump for process fatal signal is a sample case matching this - * scenario. Or if kernel can't guarantee the data integrity, it's better - * not to call this function, let kernel touch the poison page and get to - * panic. - */ -static inline bool is_page_poisoned(struct page *page) -{ - if (PageHWPoison(page)) - return true; - else if (PageHuge(page) && PageHWPoison(compound_head(page))) - return true; - - return false; -} - extern unsigned long highest_memmap_pfn; /*
Commit d3378e86d182 ("mm/gup: check page posion status for coredump.") introduced page_is_poisoned(), however, v5 [1] of the patch used "page_is_hwpoison()" and something went wrong while upstreaming. Rename the function and move it to page-flags.h, from where it can be used in other -- kcore -- context. Move the comment to the place where it belongs and simplify. [1] https://lkml.kernel.org/r/20210322193318.377c9ce9@alex-virtual-machine Signed-off-by: David Hildenbrand <david@redhat.com> --- include/linux/page-flags.h | 7 +++++++ mm/gup.c | 6 +++++- mm/internal.h | 20 -------------------- 3 files changed, 12 insertions(+), 21 deletions(-)