Message ID | 20240510062602.901510-6-jane.chu@oracle.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | Enhance soft hwpoison handling and injection | expand |
On Fri, May 10, 2024 at 12:26:02AM -0600, Jane Chu wrote: > When handle hwpoison in a RDMA longterm pinned thp page, > try_to_split_thp_page() will fail. And at this point, there is > little else the kernel could do except sending a SIGBUS to > the user process, thus give it a chance to recover. Well, it does need to be a RDMA longterm pinned, right? Anything holding an extra refcount can already make us bite the dust, so I would not make it that specific. > Signed-off-by: Jane Chu <jane.chu@oracle.com> > --- > mm/memory-failure.c | 31 ++++++++++++++++++++++++++----- > 1 file changed, 26 insertions(+), 5 deletions(-) > > diff --git a/mm/memory-failure.c b/mm/memory-failure.c > index 2fa884d8b5a3..15bb1c0c42e8 100644 > --- a/mm/memory-failure.c > +++ b/mm/memory-failure.c > @@ -1697,7 +1697,7 @@ static int identify_page_state(unsigned long pfn, struct page *p, > return page_action(ps, p, pfn); > } > > -static int try_to_split_thp_page(struct page *page) > +static int try_to_split_thp_page(struct page *page, bool release) > { > int ret; > > @@ -1705,7 +1705,7 @@ static int try_to_split_thp_page(struct page *page) > ret = split_huge_page(page); > unlock_page(page); > > - if (unlikely(ret)) > + if (ret && release) > put_page(page); I would document whhen and when not we can release the page. E.g: we cannot release it if there are still processes mapping the thp. > +static int kill_procs_now(struct page *p, unsigned long pfn, int flags, > + struct folio *folio) > +{ > + LIST_HEAD(tokill); > + > + collect_procs(folio, p, &tokill, flags & MF_ACTION_REQUIRED); > + kill_procs(&tokill, true, pfn, flags); > + > + return -EHWPOISON; You are returning -EHWPOISON here, > +} > + > /** > * memory_failure - Handle memory failure of a page. > * @pfn: Page Number of the corrupted page > @@ -2313,8 +2331,11 @@ int memory_failure(unsigned long pfn, int flags) > * page is a valid handlable page. > */ > folio_set_has_hwpoisoned(folio); > - if (try_to_split_thp_page(p) < 0) { > - res = action_result(pfn, MF_MSG_UNSPLIT_THP, MF_IGNORED); > + if (try_to_split_thp_page(p, false) < 0) { > + pr_err("%#lx: thp split failed\n", pfn); > + res = kill_procs_now(p, pfn, flags, folio); > + put_page(p); > + res = action_result(pfn, MF_MSG_UNSPLIT_THP, MF_FAILED); just to overwrite it here with action_result(). Which one do we need? I think we would need -EBUSY here, right? So I would drop the retcode from kill_procs_now. Also, do we want the extra pr_err() here. action_result() will already provide us the pfn and the action_page_types which will be "unsplit thp". Is not that clear enough? I would drop that.
On 5/16/2024 5:47 AM, Oscar Salvador wrote: > On Fri, May 10, 2024 at 12:26:02AM -0600, Jane Chu wrote: >> When handle hwpoison in a RDMA longterm pinned thp page, >> try_to_split_thp_page() will fail. And at this point, there is >> little else the kernel could do except sending a SIGBUS to >> the user process, thus give it a chance to recover. > Well, it does need to be a RDMA longterm pinned, right? > Anything holding an extra refcount can already make us bite the dust, so > I would not make it that specific. How about let me just mention RDMA longterm pin as one of the use cases? To be honest, it is the only known case to me, and not all FOLL_LONGTERM pin lead to THP split failure. > >> Signed-off-by: Jane Chu <jane.chu@oracle.com> >> --- >> mm/memory-failure.c | 31 ++++++++++++++++++++++++++----- >> 1 file changed, 26 insertions(+), 5 deletions(-) >> >> diff --git a/mm/memory-failure.c b/mm/memory-failure.c >> index 2fa884d8b5a3..15bb1c0c42e8 100644 >> --- a/mm/memory-failure.c >> +++ b/mm/memory-failure.c >> @@ -1697,7 +1697,7 @@ static int identify_page_state(unsigned long pfn, struct page *p, >> return page_action(ps, p, pfn); >> } >> >> -static int try_to_split_thp_page(struct page *page) >> +static int try_to_split_thp_page(struct page *page, bool release) >> { >> int ret; >> >> @@ -1705,7 +1705,7 @@ static int try_to_split_thp_page(struct page *page) >> ret = split_huge_page(page); >> unlock_page(page); >> >> - if (unlikely(ret)) >> + if (ret && release) >> put_page(page); > I would document whhen and when not we can release the page. > E.g: we cannot release it if there are still processes mapping the thp. Sure. > >> +static int kill_procs_now(struct page *p, unsigned long pfn, int flags, >> + struct folio *folio) >> +{ >> + LIST_HEAD(tokill); >> + >> + collect_procs(folio, p, &tokill, flags & MF_ACTION_REQUIRED); >> + kill_procs(&tokill, true, pfn, flags); >> + >> + return -EHWPOISON; > You are returning -EHWPOISON here, Yes, indeed. >> +} >> + >> /** >> * memory_failure - Handle memory failure of a page. >> * @pfn: Page Number of the corrupted page >> @@ -2313,8 +2331,11 @@ int memory_failure(unsigned long pfn, int flags) >> * page is a valid handlable page. >> */ >> folio_set_has_hwpoisoned(folio); >> - if (try_to_split_thp_page(p) < 0) { >> - res = action_result(pfn, MF_MSG_UNSPLIT_THP, MF_IGNORED); >> + if (try_to_split_thp_page(p, false) < 0) { >> + pr_err("%#lx: thp split failed\n", pfn); >> + res = kill_procs_now(p, pfn, flags, folio); >> + put_page(p); >> + res = action_result(pfn, MF_MSG_UNSPLIT_THP, MF_FAILED); > just to overwrite it here with action_result(). Which one do we need? > I think we would need -EBUSY here, right? So I would drop the retcode > from kill_procs_now. The overwrite was wrong, it should return -EHWPOISON to indicate to the caller (such as kill_me_maybe) that no further action against the process is needed since the m-f() handler has killed the process. > > Also, do we want the extra pr_err() here. > action_result() will already provide us the pfn and the > action_page_types which will be "unsplit thp". Is not that clear enough? > > I would drop that. Agreed, will drop the extra print. thanks! -jane > >
diff --git a/mm/memory-failure.c b/mm/memory-failure.c index 2fa884d8b5a3..15bb1c0c42e8 100644 --- a/mm/memory-failure.c +++ b/mm/memory-failure.c @@ -1697,7 +1697,7 @@ static int identify_page_state(unsigned long pfn, struct page *p, return page_action(ps, p, pfn); } -static int try_to_split_thp_page(struct page *page) +static int try_to_split_thp_page(struct page *page, bool release) { int ret; @@ -1705,7 +1705,7 @@ static int try_to_split_thp_page(struct page *page) ret = split_huge_page(page); unlock_page(page); - if (unlikely(ret)) + if (ret && release) put_page(page); return ret; @@ -2177,6 +2177,24 @@ static int memory_failure_dev_pagemap(unsigned long pfn, int flags, return rc; } +/* + * The calling condition is as such: thp split failed, page might have + * been RDMA pinned, not much can be done for recovery. + * But a SIGBUS should be delivered with vaddr provided so that the user + * application has a chance to recover. Also, application processes' + * election for MCE early killed will be honored. + */ +static int kill_procs_now(struct page *p, unsigned long pfn, int flags, + struct folio *folio) +{ + LIST_HEAD(tokill); + + collect_procs(folio, p, &tokill, flags & MF_ACTION_REQUIRED); + kill_procs(&tokill, true, pfn, flags); + + return -EHWPOISON; +} + /** * memory_failure - Handle memory failure of a page. * @pfn: Page Number of the corrupted page @@ -2313,8 +2331,11 @@ int memory_failure(unsigned long pfn, int flags) * page is a valid handlable page. */ folio_set_has_hwpoisoned(folio); - if (try_to_split_thp_page(p) < 0) { - res = action_result(pfn, MF_MSG_UNSPLIT_THP, MF_IGNORED); + if (try_to_split_thp_page(p, false) < 0) { + pr_err("%#lx: thp split failed\n", pfn); + res = kill_procs_now(p, pfn, flags, folio); + put_page(p); + res = action_result(pfn, MF_MSG_UNSPLIT_THP, MF_FAILED); goto unlock_mutex; } VM_BUG_ON_PAGE(!page_count(p), p); @@ -2688,7 +2709,7 @@ static int soft_offline_in_use_page(struct page *page) }; if (!huge && folio_test_large(folio)) { - if (try_to_split_thp_page(page)) { + if (try_to_split_thp_page(page, true)) { pr_info("soft offline: %#lx: thp split failed\n", pfn); return -EBUSY; }
When handle hwpoison in a RDMA longterm pinned thp page, try_to_split_thp_page() will fail. And at this point, there is little else the kernel could do except sending a SIGBUS to the user process, thus give it a chance to recover. Signed-off-by: Jane Chu <jane.chu@oracle.com> --- mm/memory-failure.c | 31 ++++++++++++++++++++++++++----- 1 file changed, 26 insertions(+), 5 deletions(-)