Message ID: 20230707201904.953262-3-jiaqiyan@google.com (mailing list archive)
State: New
Series: Improve hugetlbfs read on HWPOISON hugepages
On Fri, Jul 07, 2023 at 08:19:02PM +0000, Jiaqi Yan wrote:
> Add the functionality, is_raw_hwp_subpage, to tell if a subpage of a
This is incorrect naming. "subpage" was needed before we had the
folio concept, but it should not be used any more. We have folios
and pages now.
Also, abbreviating "hwpoison" as "hwp" seems like a bad idea to me.
hwp is already used as an acronym by acpi, intel_pstate, some clock
drivers, an ethernet driver, and a scsi driver.
On Fri, 7 Jul 2023 21:31:39 +0100 Matthew Wilcox <willy@infradead.org> wrote:

> On Fri, Jul 07, 2023 at 08:19:02PM +0000, Jiaqi Yan wrote:
> > Add the functionality, is_raw_hwp_subpage, to tell if a subpage of a
>
> This is incorrect naming. "subpage" was needed before we had the
> folio concept, but it should not be used any more. We have folios
> and pages now.
>
> Also, abbreviating "hwpoison" as "hwp" seems like a bad idea to me.
> hwp is already used as an acronym by acpi, intel_pstate, some clock
> drivers, an ethernet driver, and a scsi driver.

Thanks, Matthew. I'll merge v3 so we hit next -next (which actually
won't be until Monday), and add a note that v4 is expected.
On 2023/7/8 4:19, Jiaqi Yan wrote:
> Add the functionality, is_raw_hwp_subpage, to tell if a subpage of a
> hugetlb folio is a raw HWPOISON page. This functionality relies on
> RawHwpUnreliable to be not set; otherwise hugepage's raw HWPOISON list
> becomes meaningless.
>
> is_raw_hwp_subpage needs to hold hugetlb_lock in order to synchronize
> with __get_huge_page_for_hwpoison, who iterates and inserts an entry to
> raw_hwp_list. llist itself doesn't ensure insertion is synchronized with
> the iterating used by __is_raw_hwp_subpage. Caller can minimize the
> overhead of lock cycles by first checking if folio / head page's
> HWPOISON flag is set.
>
> Exports this functionality to be immediately used in the read operation
> for hugetlbfs.
>
> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
> Reviewed-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
> Signed-off-by: Jiaqi Yan <jiaqiyan@google.com>
> ---
>  include/linux/hugetlb.h | 19 +++++++++++++++++++
>  include/linux/mm.h      |  7 +++++++
>  mm/hugetlb.c            | 10 ++++++++++
>  mm/memory-failure.c     | 34 ++++++++++++++++++++++++----------
>  4 files changed, 60 insertions(+), 10 deletions(-)
> ...
> -static inline struct llist_head *raw_hwp_list_head(struct folio *folio)
> +bool __is_raw_hwp_subpage(struct folio *folio, struct page *subpage)
>  {
> -	return (struct llist_head *)&folio->_hugetlb_hwpoison;
> +	struct llist_head *raw_hwp_head;
> +	struct raw_hwp_page *p, *tmp;
> +	bool ret = false;
> +
> +	if (!folio_test_hwpoison(folio))
> +		return false;
> +
> +	/*
> +	 * When RawHwpUnreliable is set, kernel lost track of which subpages
> +	 * are HWPOISON. So return as if ALL subpages are HWPOISONed.
> +	 */
> +	if (folio_test_hugetlb_raw_hwp_unreliable(folio))
> +		return true;
> +
> +	raw_hwp_head = raw_hwp_list_head(folio);
> +	llist_for_each_entry_safe(p, tmp, raw_hwp_head->first, node) {

Since we don't free the raw_hwp_list, does llist_for_each_entry work the same
as llist_for_each_entry_safe?

> +		if (subpage == p->page) {
> +			ret = true;
> +			break;
> +		}
> +	}
> +
> +	return ret;
> }

It seems there's a race between __is_raw_hwp_subpage and unpoison_memory:

  unpoison_memory                  __is_raw_hwp_subpage
                                   if (!folio_test_hwpoison(folio)) -- hwpoison is set
  folio_free_raw_hwp               llist_for_each_entry_safe raw_hwp_list
    llist_del_all                  ..
  folio_test_clear_hwpoison

But __is_raw_hwp_subpage is used in hugetlbfs, unpoison_memory couldn't reach
here because there's a folio_mapping == NULL check before folio_free_raw_hwp.

Anyway, this patch looks good to me.

Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>

Thanks.
On Fri, Jul 07, 2023 at 09:31:39PM +0100, Matthew Wilcox wrote:
> On Fri, Jul 07, 2023 at 08:19:02PM +0000, Jiaqi Yan wrote:
> > Add the functionality, is_raw_hwp_subpage, to tell if a subpage of a
>
> This is incorrect naming. "subpage" was needed before we had the
> folio concept, but it should not be used any more. We have folios
> and pages now.

Thanks for your comment, Matthew.

I think we can address the raw hwpoison page by the offset in folio/hugepage
to eliminate the concept of "subpage".

> Also, abbreviating "hwpoison" as "hwp" seems like a bad idea to me.
> hwp is already used as an acronym by acpi, intel_pstate, some clock
> drivers, an ethernet driver, and a scsi driver.

I originally introduced the abbreviation "hwp" to avoid using a lengthy
function name such as "folio_test_hugetlb_raw_hwpoison_unreliable()."
Therefore, I prefer using "rawhwp" instead of a longer form like
"raw_hwpoison," although I don't expect any confusion between "hwp" and
"rawhwp."

As for "hwp_walk", another user of "hwp" in mm/memory-failure.c,
we can easily substitute it with "hwpoison_walk."

Thanks,
Naoya Horiguchi
On Sun, Jul 9, 2023 at 5:21 PM Naoya Horiguchi <naoya.horiguchi@linux.dev> wrote:
>
> On Fri, Jul 07, 2023 at 09:31:39PM +0100, Matthew Wilcox wrote:
> > On Fri, Jul 07, 2023 at 08:19:02PM +0000, Jiaqi Yan wrote:
> > > Add the functionality, is_raw_hwp_subpage, to tell if a subpage of a
> >
> > This is incorrect naming. "subpage" was needed before we had the
> > folio concept, but it should not be used any more. We have folios
> > and pages now.
>
> Thanks for your comment, Matthew.
>
> I think we can address the raw hwpoison page by the offset in folio/hugepage
> to eliminate the concept of "subpage".
>
> > Also, abbreviating "hwpoison" as "hwp" seems like a bad idea to me.
> > hwp is already used as an acronym by acpi, intel_pstate, some clock
> > drivers, an ethernet driver, and a scsi driver.
>
> I originally introduced the abbreviation "hwp" to avoid using a lengthy
> function name such as "folio_test_hugetlb_raw_hwpoison_unreliable()."
> Therefore, I prefer using "rawhwp" instead of a longer form like
> "raw_hwpoison," although I don't expect any confusion between "hwp" and
> "rawhwp."

These are names in my mind, what do you think?
* is_rawhwp_page_in_hugepage
* is_raw_hwpoison_page_in_hugepage  // I prefer this one
* folio_test_hugetlb_raw_hwpoison_page

> As for "hwp_walk", another user of "hwp" in mm/memory-failure.c,
> we can easily substitute it with "hwpoison_walk."

In this "hwp_walk" case, I also prefer "hwpoison" over "hwp". I can
create a separate renaming patch.

>
> Thanks,
> Naoya Horiguchi
On Fri, Jul 7, 2023 at 7:57 PM Miaohe Lin <linmiaohe@huawei.com> wrote:
>
> On 2023/7/8 4:19, Jiaqi Yan wrote:
> > Add the functionality, is_raw_hwp_subpage, to tell if a subpage of a
> > hugetlb folio is a raw HWPOISON page. This functionality relies on
> > RawHwpUnreliable to be not set; otherwise hugepage's raw HWPOISON list
> > becomes meaningless.
> >
> > is_raw_hwp_subpage needs to hold hugetlb_lock in order to synchronize
> > with __get_huge_page_for_hwpoison, who iterates and inserts an entry to
> > raw_hwp_list. llist itself doesn't ensure insertion is synchronized with
> > the iterating used by __is_raw_hwp_subpage. Caller can minimize the
> > overhead of lock cycles by first checking if folio / head page's
> > HWPOISON flag is set.
> >
> > Exports this functionality to be immediately used in the read operation
> > for hugetlbfs.
> >
> > Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
> > Reviewed-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
> > Signed-off-by: Jiaqi Yan <jiaqiyan@google.com>
> > ---
> >  include/linux/hugetlb.h | 19 +++++++++++++++++++
> >  include/linux/mm.h      |  7 +++++++
> >  mm/hugetlb.c            | 10 ++++++++++
> >  mm/memory-failure.c     | 34 ++++++++++++++++++++++++----------
> >  4 files changed, 60 insertions(+), 10 deletions(-)
> > ...
> > -static inline struct llist_head *raw_hwp_list_head(struct folio *folio)
> > +bool __is_raw_hwp_subpage(struct folio *folio, struct page *subpage)
> >  {
> > -	return (struct llist_head *)&folio->_hugetlb_hwpoison;
> > +	struct llist_head *raw_hwp_head;
> > +	struct raw_hwp_page *p, *tmp;
> > +	bool ret = false;
> > +
> > +	if (!folio_test_hwpoison(folio))
> > +		return false;
> > +
> > +	/*
> > +	 * When RawHwpUnreliable is set, kernel lost track of which subpages
> > +	 * are HWPOISON. So return as if ALL subpages are HWPOISONed.
> > +	 */
> > +	if (folio_test_hugetlb_raw_hwp_unreliable(folio))
> > +		return true;
> > +
> > +	raw_hwp_head = raw_hwp_list_head(folio);
> > +	llist_for_each_entry_safe(p, tmp, raw_hwp_head->first, node) {
>
> Since we don't free the raw_hwp_list, does llist_for_each_entry work the same
> as llist_for_each_entry_safe?
>
> > +		if (subpage == p->page) {
> > +			ret = true;
> > +			break;
> > +		}
> > +	}
> > +
> > +	return ret;
> > }
>
> It seems there's a race between __is_raw_hwp_subpage and unpoison_memory:
>
>   unpoison_memory                  __is_raw_hwp_subpage
>                                    if (!folio_test_hwpoison(folio)) -- hwpoison is set
>   folio_free_raw_hwp               llist_for_each_entry_safe raw_hwp_list
>     llist_del_all                  ..
>   folio_test_clear_hwpoison
>

Thanks Miaohe for raising this concern.

> But __is_raw_hwp_subpage is used in hugetlbfs, unpoison_memory couldn't reach
> here because there's a folio_mapping == NULL check before folio_free_raw_hwp.

I agree. But in near future I do want to make __is_raw_hwp_subpage
work for shared-mapping hugetlb, so it would be nice to work with
unpoison_memory. It doesn't seem to me that holding mf_mutex in
__is_raw_hwp_subpage is nice or even absolutely correct. Let me think
if I can come up with something in v4.

>
> Anyway, this patch looks good to me.
>
> Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
> Thanks.
>
On Mon, Jul 10, 2023 at 08:11:48AM -0700, Jiaqi Yan wrote:
> On Sun, Jul 9, 2023 at 5:21 PM Naoya Horiguchi
> <naoya.horiguchi@linux.dev> wrote:
> >
> > On Fri, Jul 07, 2023 at 09:31:39PM +0100, Matthew Wilcox wrote:
> > > On Fri, Jul 07, 2023 at 08:19:02PM +0000, Jiaqi Yan wrote:
> > > > Add the functionality, is_raw_hwp_subpage, to tell if a subpage of a
> > >
> > > This is incorrect naming. "subpage" was needed before we had the
> > > folio concept, but it should not be used any more. We have folios
> > > and pages now.
> >
> > Thanks for your comment, Matthew.
> >
> > I think we can address the raw hwpoison page by the offset in folio/hugepage
> > to eliminate the concept of "subpage".
> >
> > > Also, abbreviating "hwpoison" as "hwp" seems like a bad idea to me.
> > > hwp is already used as an acronym by acpi, intel_pstate, some clock
> > > drivers, an ethernet driver, and a scsi driver.
> >
> > I originally introduced the abbreviation "hwp" to avoid using a lengthy
> > function name such as "folio_test_hugetlb_raw_hwpoison_unreliable()."
> > Therefore, I prefer using "rawhwp" instead of a longer form like
> > "raw_hwpoison," although I don't expect any confusion between "hwp" and
> > "rawhwp."
>
> These are names in my mind, what do you think?
> * is_rawhwp_page_in_hugepage
> * is_raw_hwpoison_page_in_hugepage  // I prefer this one

This one is fine to me.

> * folio_test_hugetlb_raw_hwpoison_page
>
> > As for "hwp_walk", another user of "hwp" in mm/memory-failure.c,
> > we can easily substitute it with "hwpoison_walk."
>
> In this "hwp_walk" case, I also prefer "hwpoison" over "hwp". I can
> create a separate renaming patch.

Great, thank you.

- Naoya Horiguchi
On Mon, Jul 10, 2023 at 8:16 AM Jiaqi Yan <jiaqiyan@google.com> wrote:
>
> On Fri, Jul 7, 2023 at 7:57 PM Miaohe Lin <linmiaohe@huawei.com> wrote:
> >
> > On 2023/7/8 4:19, Jiaqi Yan wrote:
> > > Add the functionality, is_raw_hwp_subpage, to tell if a subpage of a
> > > hugetlb folio is a raw HWPOISON page. This functionality relies on
> > > RawHwpUnreliable to be not set; otherwise hugepage's raw HWPOISON list
> > > becomes meaningless.
> > >
> > > is_raw_hwp_subpage needs to hold hugetlb_lock in order to synchronize
> > > with __get_huge_page_for_hwpoison, who iterates and inserts an entry to
> > > raw_hwp_list. llist itself doesn't ensure insertion is synchronized with
> > > the iterating used by __is_raw_hwp_subpage. Caller can minimize the
> > > overhead of lock cycles by first checking if folio / head page's
> > > HWPOISON flag is set.
> > >
> > > Exports this functionality to be immediately used in the read operation
> > > for hugetlbfs.
> > >
> > > Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
> > > Reviewed-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
> > > Signed-off-by: Jiaqi Yan <jiaqiyan@google.com>
> > > ---
> > >  include/linux/hugetlb.h | 19 +++++++++++++++++++
> > >  include/linux/mm.h      |  7 +++++++
> > >  mm/hugetlb.c            | 10 ++++++++++
> > >  mm/memory-failure.c     | 34 ++++++++++++++++++++++++----------
> > >  4 files changed, 60 insertions(+), 10 deletions(-)
> > > ...
> > > -static inline struct llist_head *raw_hwp_list_head(struct folio *folio)
> > > +bool __is_raw_hwp_subpage(struct folio *folio, struct page *subpage)
> > >  {
> > > -	return (struct llist_head *)&folio->_hugetlb_hwpoison;
> > > +	struct llist_head *raw_hwp_head;
> > > +	struct raw_hwp_page *p, *tmp;
> > > +	bool ret = false;
> > > +
> > > +	if (!folio_test_hwpoison(folio))
> > > +		return false;
> > > +
> > > +	/*
> > > +	 * When RawHwpUnreliable is set, kernel lost track of which subpages
> > > +	 * are HWPOISON. So return as if ALL subpages are HWPOISONed.
> > > +	 */
> > > +	if (folio_test_hugetlb_raw_hwp_unreliable(folio))
> > > +		return true;
> > > +
> > > +	raw_hwp_head = raw_hwp_list_head(folio);
> > > +	llist_for_each_entry_safe(p, tmp, raw_hwp_head->first, node) {
> >
> > Since we don't free the raw_hwp_list, does llist_for_each_entry work the
> > same as llist_for_each_entry_safe?

Sorry I missed this comment. Yes they are the same but
llist_for_each_entry doesn't need "tmp". I will switch to
llist_for_each_entry in v4.

> >
> > > +		if (subpage == p->page) {
> > > +			ret = true;
> > > +			break;
> > > +		}
> > > +	}
> > > +
> > > +	return ret;
> > > }
> >
> > It seems there's a race between __is_raw_hwp_subpage and unpoison_memory:
> >
> >   unpoison_memory                  __is_raw_hwp_subpage
> >                                    if (!folio_test_hwpoison(folio)) -- hwpoison is set
> >   folio_free_raw_hwp               llist_for_each_entry_safe raw_hwp_list
> >     llist_del_all                  ..
> >   folio_test_clear_hwpoison
> >
>
> Thanks Miaohe for raising this concern.
>
> > But __is_raw_hwp_subpage is used in hugetlbfs, unpoison_memory couldn't reach
> > here because there's a folio_mapping == NULL check before folio_free_raw_hwp.
>
> I agree. But in near future I do want to make __is_raw_hwp_subpage
> work for shared-mapping hugetlb, so it would be nice to work with
> unpoison_memory. It doesn't seem to me that holding mf_mutex in
> __is_raw_hwp_subpage is nice or even absolutely correct. Let me think
> if I can come up with something in v4.

At my 2nd thought, if __is_raw_hwp_subpage simply takes mf_mutex
before llist_for_each_entry, it will introduce a deadlock:

  unpoison_memory                __is_raw_hwp_subpage
  held mf_mutex                  held hugetlb_lock
  get_hwpoison_hugetlb_folio     attempts mf_mutex
    attempts hugetlb lock

Not for this patch series, but for future, is it a good idea to make
mf_mutex available to hugetlb code? Then enforce the order of locking
to be mf_mutex first, hugetlb_lock second? I believe this is the
current locking pattern / order for try_memory_failure_hugetlb.

> >
> > Anyway, this patch looks good to me.
> >
> > Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
> > Thanks.
> >
On 07/11/23 10:05, Jiaqi Yan wrote:
> On Mon, Jul 10, 2023 at 8:16 AM Jiaqi Yan <jiaqiyan@google.com> wrote:
> > On Fri, Jul 7, 2023 at 7:57 PM Miaohe Lin <linmiaohe@huawei.com> wrote:
> > > On 2023/7/8 4:19, Jiaqi Yan wrote:
> > >
> > > > +		if (subpage == p->page) {
> > > > +			ret = true;
> > > > +			break;
> > > > +		}
> > > > +	}
> > > > +
> > > > +	return ret;
> > > > }
> > >
> > > It seems there's a race between __is_raw_hwp_subpage and unpoison_memory:
> > >
> > >   unpoison_memory                  __is_raw_hwp_subpage
> > >                                    if (!folio_test_hwpoison(folio)) -- hwpoison is set
> > >   folio_free_raw_hwp               llist_for_each_entry_safe raw_hwp_list
> > >     llist_del_all                  ..
> > >   folio_test_clear_hwpoison
> > >
> >
> > Thanks Miaohe for raising this concern.
> >
> > > But __is_raw_hwp_subpage is used in hugetlbfs, unpoison_memory couldn't reach
> > > here because there's a folio_mapping == NULL check before folio_free_raw_hwp.
> >
> > I agree. But in near future I do want to make __is_raw_hwp_subpage
> > work for shared-mapping hugetlb, so it would be nice to work with
> > unpoison_memory. It doesn't seem to me that holding mf_mutex in
> > __is_raw_hwp_subpage is nice or even absolutely correct. Let me think
> > if I can come up with something in v4.
>
> At my 2nd thought, if __is_raw_hwp_subpage simply takes mf_mutex
> before llist_for_each_entry, it will introduce a deadlock:
>
>   unpoison_memory                __is_raw_hwp_subpage
>   held mf_mutex                  held hugetlb_lock
>   get_hwpoison_hugetlb_folio     attempts mf_mutex
>     attempts hugetlb lock
>
> Not for this patch series, but for future, is it a good idea to make
> mf_mutex available to hugetlb code? Then enforce the order of locking
> to be mf_mutex first, hugetlb_lock second? I believe this is the
> current locking pattern / order for try_memory_failure_hugetlb.

I think only holding mf_mutex in __is_raw_hwp_subpage would be sufficient
to prevent races with unpoison_memory. memory failure code needs to take
both mf_mutex and hugetlb_lock. The hugetlb lock is to prevent hugetlb
page state changes. IIUC, __is_raw_hwp_subpage is only taking hugetlb_lock
to prevent races with memory failure code.

Of course, I could be missing something as there are subtle issues with
locking in the memory failure code.
On Tue, Jul 11, 2023 at 11:02 AM Mike Kravetz <mike.kravetz@oracle.com> wrote:
>
> On 07/11/23 10:05, Jiaqi Yan wrote:
> > On Mon, Jul 10, 2023 at 8:16 AM Jiaqi Yan <jiaqiyan@google.com> wrote:
> > > On Fri, Jul 7, 2023 at 7:57 PM Miaohe Lin <linmiaohe@huawei.com> wrote:
> > > > On 2023/7/8 4:19, Jiaqi Yan wrote:
> > > >
> > > > > +		if (subpage == p->page) {
> > > > > +			ret = true;
> > > > > +			break;
> > > > > +		}
> > > > > +	}
> > > > > +
> > > > > +	return ret;
> > > > > }
> > > >
> > > > It seems there's a race between __is_raw_hwp_subpage and unpoison_memory:
> > > >
> > > >   unpoison_memory                  __is_raw_hwp_subpage
> > > >                                    if (!folio_test_hwpoison(folio)) -- hwpoison is set
> > > >   folio_free_raw_hwp               llist_for_each_entry_safe raw_hwp_list
> > > >     llist_del_all                  ..
> > > >   folio_test_clear_hwpoison
> > > >
> > >
> > > Thanks Miaohe for raising this concern.
> > >
> > > > But __is_raw_hwp_subpage is used in hugetlbfs, unpoison_memory couldn't reach
> > > > here because there's a folio_mapping == NULL check before folio_free_raw_hwp.
> > >
> > > I agree. But in near future I do want to make __is_raw_hwp_subpage
> > > work for shared-mapping hugetlb, so it would be nice to work with
> > > unpoison_memory. It doesn't seem to me that holding mf_mutex in
> > > __is_raw_hwp_subpage is nice or even absolutely correct. Let me think
> > > if I can come up with something in v4.
> >
> > At my 2nd thought, if __is_raw_hwp_subpage simply takes mf_mutex
> > before llist_for_each_entry, it will introduce a deadlock:
> >
> >   unpoison_memory                __is_raw_hwp_subpage
> >   held mf_mutex                  held hugetlb_lock
> >   get_hwpoison_hugetlb_folio     attempts mf_mutex
> >     attempts hugetlb lock
> >
> > Not for this patch series, but for future, is it a good idea to make
> > mf_mutex available to hugetlb code? Then enforce the order of locking
> > to be mf_mutex first, hugetlb_lock second? I believe this is the
> > current locking pattern / order for try_memory_failure_hugetlb.
>
> I think only holding mf_mutex in __is_raw_hwp_subpage would be sufficient
> to prevent races with unpoison_memory. memory failure code needs to take
> both mf_mutex and hugetlb_lock. The hugetlb lock is to prevent hugetlb
> page state changes. IIUC, __is_raw_hwp_subpage is only taking hugetlb_lock
> to prevent races with memory failure code.

Thanks Mike, I think mf_mutex is a simple and correct solution. I will
drop hugetlb_lock from __is_raw_hwp_subpage in v4.

>
> Of course, I could be missing something as there are subtle issues with
> locking in the memory failure code.
> --
> Mike Kravetz
On 2023/7/12 2:01, Mike Kravetz wrote:
> On 07/11/23 10:05, Jiaqi Yan wrote:
>> On Mon, Jul 10, 2023 at 8:16 AM Jiaqi Yan <jiaqiyan@google.com> wrote:
>>> On Fri, Jul 7, 2023 at 7:57 PM Miaohe Lin <linmiaohe@huawei.com> wrote:
>>>> On 2023/7/8 4:19, Jiaqi Yan wrote:
>>>>
>>>>> +		if (subpage == p->page) {
>>>>> +			ret = true;
>>>>> +			break;
>>>>> +		}
>>>>> +	}
>>>>> +
>>>>> +	return ret;
>>>>> }
>>>>
>>>> It seems there's a race between __is_raw_hwp_subpage and unpoison_memory:
>>>>   unpoison_memory                  __is_raw_hwp_subpage
>>>>                                    if (!folio_test_hwpoison(folio)) -- hwpoison is set
>>>>   folio_free_raw_hwp               llist_for_each_entry_safe raw_hwp_list
>>>>     llist_del_all                  ..
>>>>   folio_test_clear_hwpoison
>>>>
>>>
>>> Thanks Miaohe for raising this concern.
>>>
>>>> But __is_raw_hwp_subpage is used in hugetlbfs, unpoison_memory couldn't reach
>>>> here because there's a folio_mapping == NULL check before folio_free_raw_hwp.
>>>
>>> I agree. But in near future I do want to make __is_raw_hwp_subpage
>>> work for shared-mapping hugetlb, so it would be nice to work with
>>> unpoison_memory. It doesn't seem to me that holding mf_mutex in
>>> __is_raw_hwp_subpage is nice or even absolutely correct. Let me think
>>> if I can come up with something in v4.
>>
>> At my 2nd thought, if __is_raw_hwp_subpage simply takes mf_mutex
>> before llist_for_each_entry, it will introduce a deadlock:
>>
>>   unpoison_memory                __is_raw_hwp_subpage
>>   held mf_mutex                  held hugetlb_lock
>>   get_hwpoison_hugetlb_folio     attempts mf_mutex
>>     attempts hugetlb lock
>>
>> Not for this patch series, but for future, is it a good idea to make
>> mf_mutex available to hugetlb code? Then enforce the order of locking
>> to be mf_mutex first, hugetlb_lock second? I believe this is the
>> current locking pattern / order for try_memory_failure_hugetlb.
>
> I think only holding mf_mutex in __is_raw_hwp_subpage would be sufficient
> to prevent races with unpoison_memory. memory failure code needs to take

Since soft_offline_page, memory_failure and unpoison_memory all hold
mf_mutex, I think this should be enough to prevent races between them too.

Thanks.
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index ca3c8e10f24a..4a745af98525 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -1007,6 +1007,25 @@ void hugetlb_register_node(struct node *node);
 void hugetlb_unregister_node(struct node *node);
 #endif
 
+/*
+ * Struct raw_hwp_page represents information about "raw error page",
+ * constructing singly linked list from ->_hugetlb_hwpoison field of folio.
+ */
+struct raw_hwp_page {
+	struct llist_node node;
+	struct page *page;
+};
+
+static inline struct llist_head *raw_hwp_list_head(struct folio *folio)
+{
+	return (struct llist_head *)&folio->_hugetlb_hwpoison;
+}
+
+/*
+ * Check if a given raw @subpage in a hugepage @folio is HWPOISON.
+ */
+bool is_raw_hwp_subpage(struct folio *folio, struct page *subpage);
+
 #else /* CONFIG_HUGETLB_PAGE */
 
 struct hstate {};
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 74f1be743ba2..edaa18b6f731 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -3680,6 +3680,7 @@ extern const struct attribute_group memory_failure_attr_group;
 extern void memory_failure_queue(unsigned long pfn, int flags);
 extern int __get_huge_page_for_hwpoison(unsigned long pfn, int flags,
 				      bool *migratable_cleared);
+extern bool __is_raw_hwp_subpage(struct folio *folio, struct page *subpage);
 void num_poisoned_pages_inc(unsigned long pfn);
 void num_poisoned_pages_sub(unsigned long pfn, long i);
 struct task_struct *task_early_kill(struct task_struct *tsk, int force_early);
@@ -3694,6 +3695,12 @@ static inline int __get_huge_page_for_hwpoison(unsigned long pfn, int flags,
 	return 0;
 }
 
+static inline bool __is_raw_hwp_subpage(struct folio *folio,
+					struct page *subpage)
+{
+	return false;
+}
+
 static inline void num_poisoned_pages_inc(unsigned long pfn)
 {
 }
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index bce28cca73a1..9c608d2f6630 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -7373,6 +7373,16 @@ int get_huge_page_for_hwpoison(unsigned long pfn, int flags,
 	return ret;
 }
 
+bool is_raw_hwp_subpage(struct folio *folio, struct page *subpage)
+{
+	bool ret;
+
+	spin_lock_irq(&hugetlb_lock);
+	ret = __is_raw_hwp_subpage(folio, subpage);
+	spin_unlock_irq(&hugetlb_lock);
+	return ret;
+}
+
 void folio_putback_active_hugetlb(struct folio *folio)
 {
 	spin_lock_irq(&hugetlb_lock);
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index a08677dcf953..5b6c8ceb13c0 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1813,18 +1813,32 @@ EXPORT_SYMBOL_GPL(mf_dax_kill_procs);
 #endif /* CONFIG_FS_DAX */
 
 #ifdef CONFIG_HUGETLB_PAGE
-/*
- * Struct raw_hwp_page represents information about "raw error page",
- * constructing singly linked list from ->_hugetlb_hwpoison field of folio.
- */
-struct raw_hwp_page {
-	struct llist_node node;
-	struct page *page;
-};
-
-static inline struct llist_head *raw_hwp_list_head(struct folio *folio)
+bool __is_raw_hwp_subpage(struct folio *folio, struct page *subpage)
 {
-	return (struct llist_head *)&folio->_hugetlb_hwpoison;
+	struct llist_head *raw_hwp_head;
+	struct raw_hwp_page *p, *tmp;
+	bool ret = false;
+
+	if (!folio_test_hwpoison(folio))
+		return false;
+
+	/*
+	 * When RawHwpUnreliable is set, kernel lost track of which subpages
+	 * are HWPOISON. So return as if ALL subpages are HWPOISONed.
+	 */
+	if (folio_test_hugetlb_raw_hwp_unreliable(folio))
+		return true;
+
+	raw_hwp_head = raw_hwp_list_head(folio);
+	llist_for_each_entry_safe(p, tmp, raw_hwp_head->first, node) {
+		if (subpage == p->page) {
+			ret = true;
+			break;
+		}
+	}
+
+	return ret;
 }
 
 static unsigned long __folio_free_raw_hwp(struct folio *folio, bool move_flag)