Message ID | 20210909004131.163221-1-naoya.horiguchi@linux.dev (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | [v1] mm, hwpoison: add is_free_buddy_page() in HWPoisonHandlable() | expand |
On Wed, Sep 8, 2021 at 5:41 PM Naoya Horiguchi <naoya.horiguchi@linux.dev> wrote: > > From: Naoya Horiguchi <naoya.horiguchi@nec.com> > > commit fcc00621d88b ("mm/hwpoison: retry with shake_page() for > unhandlable pages") changes the return value of __get_hwpoison_page() to > retry for transiently unhandlable cases. However, __get_hwpoison_page() > currently fails to properly judge buddy pages as handlable, so hard/soft > offline for buddy pages always fail as "unhandlable page". This is > totally regrettable. > > So let's add is_free_buddy_page() in HWPoisonHandlable(), so that > __get_hwpoison_page() returns different return values between buddy > pages and unhandlable pages as intended. > > Fixes: fcc00621d88b ("mm/hwpoison: retry with shake_page() for unhandlable pages") > Cc: <stable@vger.kernel.org> > Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com> > --- > mm/memory-failure.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git v5.14-rc7-mmotm-2021-08-23-16-42/mm/memory-failure.c v5.14-rc7-mmotm-2021-08-23-16-42_patched/mm/memory-failure.c > index 60df8fcd0444..3416c55be810 100644 > --- v5.14-rc7-mmotm-2021-08-23-16-42/mm/memory-failure.c > +++ v5.14-rc7-mmotm-2021-08-23-16-42_patched/mm/memory-failure.c > @@ -1126,7 +1126,7 @@ static int page_action(struct page_state *ps, struct page *p, > */ > static inline bool HWPoisonHandlable(struct page *page) > { > - return PageLRU(page) || __PageMovable(page); > + return PageLRU(page) || __PageMovable(page) || is_free_buddy_page(page); It seems to work. Although this may change the return value of get_any_page() to 1 when MF_COUNT_INCREASED is set. This may cause soft offline to mishandle free buddy page, but MF_COUNT_INCREASED is only set when madvise is used, and madvise definitely can't soft offline free buddy page. It did take me a while to figure out this trick. Maybe need some refactor? Anyway this patch looks fine to me. Reviewed-by: Yang Shi <shy828301@gmail.com> > } > > static int __get_hwpoison_page(struct page *page) > -- > 2.25.1 >
On Wed, Sep 08, 2021 at 08:22:14PM -0700, Yang Shi wrote: > On Wed, Sep 8, 2021 at 5:41 PM Naoya Horiguchi > <naoya.horiguchi@linux.dev> wrote: > > > > From: Naoya Horiguchi <naoya.horiguchi@nec.com> > > > > commit fcc00621d88b ("mm/hwpoison: retry with shake_page() for > > unhandlable pages") changes the return value of __get_hwpoison_page() to > > retry for transiently unhandlable cases. However, __get_hwpoison_page() > > currently fails to properly judge buddy pages as handlable, so hard/soft > > offline for buddy pages always fail as "unhandlable page". This is > > totally regrettable. > > > > So let's add is_free_buddy_page() in HWPoisonHandlable(), so that > > __get_hwpoison_page() returns different return values between buddy > > pages and unhandlable pages as intended. > > > > Fixes: fcc00621d88b ("mm/hwpoison: retry with shake_page() for unhandlable pages") > > Cc: <stable@vger.kernel.org> > > Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com> > > --- > > mm/memory-failure.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git v5.14-rc7-mmotm-2021-08-23-16-42/mm/memory-failure.c v5.14-rc7-mmotm-2021-08-23-16-42_patched/mm/memory-failure.c > > index 60df8fcd0444..3416c55be810 100644 > > --- v5.14-rc7-mmotm-2021-08-23-16-42/mm/memory-failure.c > > +++ v5.14-rc7-mmotm-2021-08-23-16-42_patched/mm/memory-failure.c > > @@ -1126,7 +1126,7 @@ static int page_action(struct page_state *ps, struct page *p, > > */ > > static inline bool HWPoisonHandlable(struct page *page) > > { > > - return PageLRU(page) || __PageMovable(page); > > + return PageLRU(page) || __PageMovable(page) || is_free_buddy_page(page); > > It seems to work. Although this may change the return value of > get_any_page() to 1 when MF_COUNT_INCREASED is set. This may cause > soft offline to mishandle free buddy page, but MF_COUNT_INCREASED is > only set when madvise is used, and madvise definitely can't soft > offline free buddy page. Yes, madvise(MADV_SOFT_OFFLINE) should not be called for free buddy pages, because the interface covers only user-mapped pages. > It did take me a while to figure out this > trick. Maybe need some refactor? I think so, the current code looks to me fragile for future change, but I'm not sure how we can improve this. One (maybe not related to MF_COUNT_INCREASED) room for refactoring is "if (PageTransHuge(head))" block in __get_hwpoison_page(), which might be old and useless code for now. > > Anyway this patch looks fine to me. Reviewed-by: Yang Shi <shy828301@gmail.com> Thank you for the review. - Naoya
On 09.09.21 02:41, Naoya Horiguchi wrote: > From: Naoya Horiguchi <naoya.horiguchi@nec.com> > > commit fcc00621d88b ("mm/hwpoison: retry with shake_page() for > unhandlable pages") changes the return value of __get_hwpoison_page() to > retry for transiently unhandlable cases. However, __get_hwpoison_page() > currently fails to properly judge buddy pages as handlable, so hard/soft > offline for buddy pages always fail as "unhandlable page". This is > totally regrettable. > > So let's add is_free_buddy_page() in HWPoisonHandlable(), so that > __get_hwpoison_page() returns different return values between buddy > pages and unhandlable pages as intended. > > Fixes: fcc00621d88b ("mm/hwpoison: retry with shake_page() for unhandlable pages") > Cc: <stable@vger.kernel.org> > Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com> > --- > mm/memory-failure.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git v5.14-rc7-mmotm-2021-08-23-16-42/mm/memory-failure.c v5.14-rc7-mmotm-2021-08-23-16-42_patched/mm/memory-failure.c > index 60df8fcd0444..3416c55be810 100644 > --- v5.14-rc7-mmotm-2021-08-23-16-42/mm/memory-failure.c > +++ v5.14-rc7-mmotm-2021-08-23-16-42_patched/mm/memory-failure.c > @@ -1126,7 +1126,7 @@ static int page_action(struct page_state *ps, struct page *p, > */ > static inline bool HWPoisonHandlable(struct page *page) > { > - return PageLRU(page) || __PageMovable(page); > + return PageLRU(page) || __PageMovable(page) || is_free_buddy_page(page); > } > > static int __get_hwpoison_page(struct page *page) > LGTM and I agree that code might desrve some cleanups. Acked-by: David Hildenbrand <david@redhat.com>
diff --git v5.14-rc7-mmotm-2021-08-23-16-42/mm/memory-failure.c v5.14-rc7-mmotm-2021-08-23-16-42_patched/mm/memory-failure.c index 60df8fcd0444..3416c55be810 100644 --- v5.14-rc7-mmotm-2021-08-23-16-42/mm/memory-failure.c +++ v5.14-rc7-mmotm-2021-08-23-16-42_patched/mm/memory-failure.c @@ -1126,7 +1126,7 @@ static int page_action(struct page_state *ps, struct page *p, */ static inline bool HWPoisonHandlable(struct page *page) { - return PageLRU(page) || __PageMovable(page); + return PageLRU(page) || __PageMovable(page) || is_free_buddy_page(page); } static int __get_hwpoison_page(struct page *page)