Message ID | 20230708085744.3599311-9-linmiaohe@huawei.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | A few fixup and cleanup patches for memory-failure | expand |
On Sat, Jul 08, 2023 at 04:57:44PM +0800, Miaohe Lin wrote: > page_folio() is fetched before calling get_hwpoison_hugetlb_folio() > without hugetlb_lock being held. So hugetlb page could be demoted > before get_hwpoison_hugetlb_folio() holding hugetlb_lock but after > page_folio() is fetched. So get_hwpoison_hugetlb_folio() will hold > unexpected extra refcnt of hugetlb folio while leaving demoted page > un-refcnted. Very nice, thank you for finding the issue. > > Fixes: 25182f05ffed ("mm,hwpoison: fix race with hugetlb page allocation") > Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> > --- > mm/memory-failure.c | 18 ++++++++++++++---- > 1 file changed, 14 insertions(+), 4 deletions(-) > > diff --git a/mm/memory-failure.c b/mm/memory-failure.c > index 76d88d27cdbe..066bf57f2d22 100644 > --- a/mm/memory-failure.c > +++ b/mm/memory-failure.c > @@ -1388,8 +1388,14 @@ static int __get_hwpoison_page(struct page *page, unsigned long flags) > bool hugetlb = false; > > ret = get_hwpoison_hugetlb_folio(folio, &hugetlb, false); > - if (hugetlb) > - return ret; > + if (hugetlb) { > + if (folio == page_folio(page)) > + return ret; Some short comment about the race against demotion here is helpful. Anyway, the patch looks good to me. Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com> > + if (ret > 0) { > + folio_put(folio); > + folio = page_folio(page); > + } > + } > > /* > * This check prevents from calling folio_try_get() for any > @@ -1478,8 +1484,12 @@ static int __get_unpoison_page(struct page *page) > bool hugetlb = false; > > ret = get_hwpoison_hugetlb_folio(folio, &hugetlb, true); > - if (hugetlb) > - return ret; > + if (hugetlb) { > + if (folio == page_folio(page)) > + return ret; > + if (ret > 0) > + folio_put(folio); > + } > > /* > * PageHWPoisonTakenOff pages are not only marked as PG_hwpoison, > -- > 2.33.0 > > >
On 2023/7/10 15:58, Naoya Horiguchi wrote: > On Sat, Jul 08, 2023 at 04:57:44PM +0800, Miaohe Lin wrote: >> page_folio() is fetched before calling get_hwpoison_hugetlb_folio() >> without hugetlb_lock being held. So hugetlb page could be demoted >> before get_hwpoison_hugetlb_folio() holding hugetlb_lock but after >> page_folio() is fetched. So get_hwpoison_hugetlb_folio() will hold >> unexpected extra refcnt of hugetlb folio while leaving demoted page >> un-refcnted. > > Very nice, thank you for finding the issue. > >> >> Fixes: 25182f05ffed ("mm,hwpoison: fix race with hugetlb page allocation") >> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> >> --- >> mm/memory-failure.c | 18 ++++++++++++++---- >> 1 file changed, 14 insertions(+), 4 deletions(-) >> >> diff --git a/mm/memory-failure.c b/mm/memory-failure.c >> index 76d88d27cdbe..066bf57f2d22 100644 >> --- a/mm/memory-failure.c >> +++ b/mm/memory-failure.c >> @@ -1388,8 +1388,14 @@ static int __get_hwpoison_page(struct page *page, unsigned long flags) >> bool hugetlb = false; >> >> ret = get_hwpoison_hugetlb_folio(folio, &hugetlb, false); >> - if (hugetlb) >> - return ret; >> + if (hugetlb) { >> + if (folio == page_folio(page)) >> + return ret; > > Some short comment about the race against demotion here is helpful. Does the below comment makes sense to you? " Make sure hugetlb demotion did not happen from under us. " > > Anyway, the patch looks good to me. > > Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com> Many thanks for your review and comment, Naoya.
On Mon, Jul 10, 2023 at 04:32:27PM +0800, Miaohe Lin wrote: > On 2023/7/10 15:58, Naoya Horiguchi wrote: > > On Sat, Jul 08, 2023 at 04:57:44PM +0800, Miaohe Lin wrote: > >> page_folio() is fetched before calling get_hwpoison_hugetlb_folio() > >> without hugetlb_lock being held. So hugetlb page could be demoted > >> before get_hwpoison_hugetlb_folio() holding hugetlb_lock but after > >> page_folio() is fetched. So get_hwpoison_hugetlb_folio() will hold > >> unexpected extra refcnt of hugetlb folio while leaving demoted page > >> un-refcnted. > > > > Very nice, thank you for finding the issue. > > > >> > >> Fixes: 25182f05ffed ("mm,hwpoison: fix race with hugetlb page allocation") > >> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> > >> --- > >> mm/memory-failure.c | 18 ++++++++++++++---- > >> 1 file changed, 14 insertions(+), 4 deletions(-) > >> > >> diff --git a/mm/memory-failure.c b/mm/memory-failure.c > >> index 76d88d27cdbe..066bf57f2d22 100644 > >> --- a/mm/memory-failure.c > >> +++ b/mm/memory-failure.c > >> @@ -1388,8 +1388,14 @@ static int __get_hwpoison_page(struct page *page, unsigned long flags) > >> bool hugetlb = false; > >> > >> ret = get_hwpoison_hugetlb_folio(folio, &hugetlb, false); > >> - if (hugetlb) > >> - return ret; > >> + if (hugetlb) { > >> + if (folio == page_folio(page)) > >> + return ret; > > > > Some short comment about the race against demotion here is helpful. > > Does the below comment makes sense to you? > > " > Make sure hugetlb demotion did not happen from under us. > " Yes, this sounds fine. Thanks, Naoya Horiguchi
On 2023/7/10 16:39, Naoya Horiguchi wrote: > On Mon, Jul 10, 2023 at 04:32:27PM +0800, Miaohe Lin wrote: >> On 2023/7/10 15:58, Naoya Horiguchi wrote: >>> On Sat, Jul 08, 2023 at 04:57:44PM +0800, Miaohe Lin wrote: >>>> page_folio() is fetched before calling get_hwpoison_hugetlb_folio() >>>> without hugetlb_lock being held. So hugetlb page could be demoted >>>> before get_hwpoison_hugetlb_folio() holding hugetlb_lock but after >>>> page_folio() is fetched. So get_hwpoison_hugetlb_folio() will hold >>>> unexpected extra refcnt of hugetlb folio while leaving demoted page >>>> un-refcnted. >>> >>> Very nice, thank you for finding the issue. >>> >>>> >>>> Fixes: 25182f05ffed ("mm,hwpoison: fix race with hugetlb page allocation") >>>> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> >>>> --- >>>> mm/memory-failure.c | 18 ++++++++++++++---- >>>> 1 file changed, 14 insertions(+), 4 deletions(-) >>>> >>>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c >>>> index 76d88d27cdbe..066bf57f2d22 100644 >>>> --- a/mm/memory-failure.c >>>> +++ b/mm/memory-failure.c >>>> @@ -1388,8 +1388,14 @@ static int __get_hwpoison_page(struct page *page, unsigned long flags) >>>> bool hugetlb = false; >>>> >>>> ret = get_hwpoison_hugetlb_folio(folio, &hugetlb, false); >>>> - if (hugetlb) >>>> - return ret; >>>> + if (hugetlb) { >>>> + if (folio == page_folio(page)) >>>> + return ret; >>> >>> Some short comment about the race against demotion here is helpful. >> >> Does the below comment makes sense to you? >> >> " >> Make sure hugetlb demotion did not happen from under us. >> " > > Yes, this sounds fine. Will do it in v2. Thanks. > > Thanks, > Naoya Horiguchi > > . >
diff --git a/mm/memory-failure.c b/mm/memory-failure.c index 76d88d27cdbe..066bf57f2d22 100644 --- a/mm/memory-failure.c +++ b/mm/memory-failure.c @@ -1388,8 +1388,14 @@ static int __get_hwpoison_page(struct page *page, unsigned long flags) bool hugetlb = false; ret = get_hwpoison_hugetlb_folio(folio, &hugetlb, false); - if (hugetlb) - return ret; + if (hugetlb) { + if (folio == page_folio(page)) + return ret; + if (ret > 0) { + folio_put(folio); + folio = page_folio(page); + } + } /* * This check prevents from calling folio_try_get() for any @@ -1478,8 +1484,12 @@ static int __get_unpoison_page(struct page *page) bool hugetlb = false; ret = get_hwpoison_hugetlb_folio(folio, &hugetlb, true); - if (hugetlb) - return ret; + if (hugetlb) { + if (folio == page_folio(page)) + return ret; + if (ret > 0) + folio_put(folio); + } /* * PageHWPoisonTakenOff pages are not only marked as PG_hwpoison,
page_folio() is fetched before calling get_hwpoison_hugetlb_folio() without hugetlb_lock being held. So hugetlb page could be demoted before get_hwpoison_hugetlb_folio() holding hugetlb_lock but after page_folio() is fetched. So get_hwpoison_hugetlb_folio() will hold unexpected extra refcnt of hugetlb folio while leaving demoted page un-refcnted. Fixes: 25182f05ffed ("mm,hwpoison: fix race with hugetlb page allocation") Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> --- mm/memory-failure.c | 18 ++++++++++++++---- 1 file changed, 14 insertions(+), 4 deletions(-)