Message ID | 20240521235429.2368017-6-jane.chu@oracle.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | Enhance soft hwpoison handling and injection | expand |
On Tue, May 21, 2024 at 05:54:29PM -0600, Jane Chu wrote: > While handling hwpoison in a THP page, it is possible that > try_to_split_thp_page() fails. For example, when the THP page has > been RDMA pinned. At this point, the kernel cannot isolate the > poisoned THP page, all it could do is to send a SIGBUS to the user > process with meaningful payload to give user-level recovery a chance. > > Signed-off-by: Jane Chu <jane.chu@oracle.com> Reviewed-by: Oscar Salvador <osalvador@suse.de>
On 2024/5/22 7:54, Jane Chu wrote: > While handling hwpoison in a THP page, it is possible that > try_to_split_thp_page() fails. For example, when the THP page has > been RDMA pinned. At this point, the kernel cannot isolate the > poisoned THP page, all it could do is to send a SIGBUS to the user > process with meaningful payload to give user-level recovery a chance. > Thanks for your patch. > Signed-off-by: Jane Chu <jane.chu@oracle.com> > --- > mm/memory-failure.c | 35 ++++++++++++++++++++++++++++++----- > 1 file changed, 30 insertions(+), 5 deletions(-) > > diff --git a/mm/memory-failure.c b/mm/memory-failure.c > index 794196951a04..a14d56e66902 100644 > --- a/mm/memory-failure.c > +++ b/mm/memory-failure.c > @@ -1706,7 +1706,12 @@ static int identify_page_state(unsigned long pfn, struct page *p, > return page_action(ps, p, pfn); > } > > -static int try_to_split_thp_page(struct page *page) > +/* > + * When 'release' is 'false', it means that if thp split has failed, > + * there is still more to do, hence the page refcount we took earlier > + * is still needed. > + */ > +static int try_to_split_thp_page(struct page *page, bool release) > { > int ret; > > @@ -1714,7 +1719,7 @@ static int try_to_split_thp_page(struct page *page) > ret = split_huge_page(page); > unlock_page(page); > > - if (unlikely(ret)) > + if (ret && release) > put_page(page); Is "unlikely" still needed? > > return ret; > @@ -2187,6 +2192,24 @@ static int memory_failure_dev_pagemap(unsigned long pfn, int flags, > return rc; > } > > +/* > + * The calling condition is as such: thp split failed, page might have > + * been RDMA pinned, not much can be done for recovery. > + * But a SIGBUS should be delivered with vaddr provided so that the user > + * application has a chance to recover. Also, application processes' > + * election for MCE early killed will be honored. > + */ > +static int kill_procs_now(struct page *p, unsigned long pfn, int flags, > + struct folio *folio) > +{ > + LIST_HEAD(tokill); > + > + collect_procs(folio, p, &tokill, flags & MF_ACTION_REQUIRED); > + kill_procs(&tokill, true, pfn, flags); > + > + return -EHWPOISON; > +} > + > /** > * memory_failure - Handle memory failure of a page. > * @pfn: Page Number of the corrupted page > @@ -2328,8 +2351,10 @@ int memory_failure(unsigned long pfn, int flags) > * page is a valid handlable page. > */ > folio_set_has_hwpoisoned(folio); > - if (try_to_split_thp_page(p) < 0) { > - res = action_result(pfn, MF_MSG_UNSPLIT_THP, MF_IGNORED); > + if (try_to_split_thp_page(p, false) < 0) { > + res = kill_procs_now(p, pfn, flags, folio); No strong opinion but we might remove the return value of kill_procs_now as it always return -EHWPOISON? We could simply set res to -EHWPOISON here. Besides from above possible nits, this patch looks good to me. Acked-by: Miaohe Lin <linmiaohe@huawei.com> Thanks. .
On 5/22/2024 8:02 PM, Miaohe Lin wrote: > On 2024/5/22 7:54, Jane Chu wrote: >> While handling hwpoison in a THP page, it is possible that >> try_to_split_thp_page() fails. For example, when the THP page has >> been RDMA pinned. At this point, the kernel cannot isolate the >> poisoned THP page, all it could do is to send a SIGBUS to the user >> process with meaningful payload to give user-level recovery a chance. >> > Thanks for your patch. > >> Signed-off-by: Jane Chu <jane.chu@oracle.com> >> --- >> mm/memory-failure.c | 35 ++++++++++++++++++++++++++++++----- >> 1 file changed, 30 insertions(+), 5 deletions(-) >> >> diff --git a/mm/memory-failure.c b/mm/memory-failure.c >> index 794196951a04..a14d56e66902 100644 >> --- a/mm/memory-failure.c >> +++ b/mm/memory-failure.c >> @@ -1706,7 +1706,12 @@ static int identify_page_state(unsigned long pfn, struct page *p, >> return page_action(ps, p, pfn); >> } >> >> -static int try_to_split_thp_page(struct page *page) >> +/* >> + * When 'release' is 'false', it means that if thp split has failed, >> + * there is still more to do, hence the page refcount we took earlier >> + * is still needed. >> + */ >> +static int try_to_split_thp_page(struct page *page, bool release) >> { >> int ret; >> >> @@ -1714,7 +1719,7 @@ static int try_to_split_thp_page(struct page *page) >> ret = split_huge_page(page); >> unlock_page(page); >> >> - if (unlikely(ret)) >> + if (ret && release) >> put_page(page); > Is "unlikely" still needed? I'd say not, because this code is not on performance sensitive code path. >> >> return ret; >> @@ -2187,6 +2192,24 @@ static int memory_failure_dev_pagemap(unsigned long pfn, int flags, >> return rc; >> } >> >> +/* >> + * The calling condition is as such: thp split failed, page might have >> + * been RDMA pinned, not much can be done for recovery. >> + * But a SIGBUS should be delivered with vaddr provided so that the user >> + * application has a chance to recover. Also, application processes' >> + * election for MCE early killed will be honored. >> + */ >> +static int kill_procs_now(struct page *p, unsigned long pfn, int flags, >> + struct folio *folio) >> +{ >> + LIST_HEAD(tokill); >> + >> + collect_procs(folio, p, &tokill, flags & MF_ACTION_REQUIRED); >> + kill_procs(&tokill, true, pfn, flags); >> + >> + return -EHWPOISON; >> +} >> + >> /** >> * memory_failure - Handle memory failure of a page. >> * @pfn: Page Number of the corrupted page >> @@ -2328,8 +2351,10 @@ int memory_failure(unsigned long pfn, int flags) >> * page is a valid handlable page. >> */ >> folio_set_has_hwpoisoned(folio); >> - if (try_to_split_thp_page(p) < 0) { >> - res = action_result(pfn, MF_MSG_UNSPLIT_THP, MF_IGNORED); >> + if (try_to_split_thp_page(p, false) < 0) { >> + res = kill_procs_now(p, pfn, flags, folio); > No strong opinion but we might remove the return value of kill_procs_now as > it always return -EHWPOISON? We could simply set res to -EHWPOISON here. I like that, will change. > > Besides from above possible nits, this patch looks good to me. > Acked-by: Miaohe Lin <linmiaohe@huawei.com> > Thanks. Thank! -jane > . >
diff --git a/mm/memory-failure.c b/mm/memory-failure.c index 794196951a04..a14d56e66902 100644 --- a/mm/memory-failure.c +++ b/mm/memory-failure.c @@ -1706,7 +1706,12 @@ static int identify_page_state(unsigned long pfn, struct page *p, return page_action(ps, p, pfn); } -static int try_to_split_thp_page(struct page *page) +/* + * When 'release' is 'false', it means that if thp split has failed, + * there is still more to do, hence the page refcount we took earlier + * is still needed. + */ +static int try_to_split_thp_page(struct page *page, bool release) { int ret; @@ -1714,7 +1719,7 @@ static int try_to_split_thp_page(struct page *page) ret = split_huge_page(page); unlock_page(page); - if (unlikely(ret)) + if (ret && release) put_page(page); return ret; @@ -2187,6 +2192,24 @@ static int memory_failure_dev_pagemap(unsigned long pfn, int flags, return rc; } +/* + * The calling condition is as such: thp split failed, page might have + * been RDMA pinned, not much can be done for recovery. + * But a SIGBUS should be delivered with vaddr provided so that the user + * application has a chance to recover. Also, application processes' + * election for MCE early killed will be honored. + */ +static int kill_procs_now(struct page *p, unsigned long pfn, int flags, + struct folio *folio) +{ + LIST_HEAD(tokill); + + collect_procs(folio, p, &tokill, flags & MF_ACTION_REQUIRED); + kill_procs(&tokill, true, pfn, flags); + + return -EHWPOISON; +} + /** * memory_failure - Handle memory failure of a page. * @pfn: Page Number of the corrupted page @@ -2328,8 +2351,10 @@ int memory_failure(unsigned long pfn, int flags) * page is a valid handlable page. */ folio_set_has_hwpoisoned(folio); - if (try_to_split_thp_page(p) < 0) { - res = action_result(pfn, MF_MSG_UNSPLIT_THP, MF_IGNORED); + if (try_to_split_thp_page(p, false) < 0) { + res = kill_procs_now(p, pfn, flags, folio); + put_page(p); + action_result(pfn, MF_MSG_UNSPLIT_THP, MF_FAILED); goto unlock_mutex; } VM_BUG_ON_PAGE(!page_count(p), p); @@ -2703,7 +2728,7 @@ static int soft_offline_in_use_page(struct page *page) }; if (!huge && folio_test_large(folio)) { - if (try_to_split_thp_page(page)) { + if (try_to_split_thp_page(page, true)) { pr_info("soft offline: %#lx: thp split failed\n", pfn); return -EBUSY; }
While handling hwpoison in a THP page, it is possible that try_to_split_thp_page() fails. For example, when the THP page has been RDMA pinned. At this point, the kernel cannot isolate the poisoned THP page, all it could do is to send a SIGBUS to the user process with meaningful payload to give user-level recovery a chance. Signed-off-by: Jane Chu <jane.chu@oracle.com> --- mm/memory-failure.c | 35 ++++++++++++++++++++++++++++++----- 1 file changed, 30 insertions(+), 5 deletions(-)