Message ID | 20230414194832.973194-1-tsahu@linux.ibm.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | mm/folio: Avoid special handling for order value 0 in folio_set_order | expand |
On Sat, Apr 15, 2023 at 01:18:32AM +0530, Tarun Sahu wrote: > folio_set_order(folio, 0); which is an abuse of folio_set_order as 0-order > folio does not have any tail page to set order. I think you're missing the point of how folio_set_order() is used. When splitting a large folio, we need to zero out the folio_nr_pages in the tail, so it does have a tail page, and that tail page needs to be zeroed. We even assert that there is a tail page: if (WARN_ON_ONCE(!folio_test_large(folio))) return; Or maybe you need to explain yourself better. > folio->_folio_nr_pages is > set to 0 for order 0 in folio_set_order. It is required because > _folio_nr_pages overlapped with page->mapping and leaving it non zero > caused "bad page" error while freeing gigantic hugepages. This was fixed in > Commit ba9c1201beaa ("mm/hugetlb: clear compound_nr before freeing gigantic > pages"). Also commit a01f43901cfb ("hugetlb: be sure to free demoted CMA > pages to CMA") now explicitly clear page->mapping and hence we won't see > the bad page error even if _folio_nr_pages remains unset. Also the order 0 > folios are not supposed to call folio_set_order, So now we can get rid of > folio_set_order(folio, 0) from hugetlb code path to clear the confusion. ... this is all very confusing. > The patch also moves _folio_set_head and folio_set_order calls in > __prep_compound_gigantic_folio() such that we avoid clearing them in the > error path. But don't we need those bits set while we operate on the folio to set it up? It makes me nervous if we don't have those bits set because we can end up with speculative references that point to a head page while that page is not marked as a head page. It may not be a problem, but I want to see some air-tight analysis of that. > Testing: I have run LTP tests, which all passes. and also I have written > the test in LTP which tests the bug caused by compound_nr and page->mapping > overlapping. > > https://lore.kernel.org/all/20230413090753.883953-1-tsahu@linux.ibm.com/ > > Running on older kernel ( < 5.10-rc7) with the above bug this fails while > on newer kernel and, also with this patch it passes. > > Signed-off-by: Tarun Sahu <tsahu@linux.ibm.com> > --- > mm/hugetlb.c | 9 +++------ > mm/internal.h | 8 ++------ > 2 files changed, 5 insertions(+), 12 deletions(-) > > diff --git a/mm/hugetlb.c b/mm/hugetlb.c > index f16b25b1a6b9..e2540269c1dc 100644 > --- a/mm/hugetlb.c > +++ b/mm/hugetlb.c > @@ -1489,7 +1489,6 @@ static void __destroy_compound_gigantic_folio(struct folio *folio, > set_page_refcounted(p); > } > > - folio_set_order(folio, 0); > __folio_clear_head(folio); > } > > @@ -1951,9 +1950,6 @@ static bool __prep_compound_gigantic_folio(struct folio *folio, > struct page *p; > > __folio_clear_reserved(folio); > - __folio_set_head(folio); > - /* we rely on prep_new_hugetlb_folio to set the destructor */ > - folio_set_order(folio, order); > for (i = 0; i < nr_pages; i++) { > p = folio_page(folio, i); > > @@ -1999,6 +1995,9 @@ static bool __prep_compound_gigantic_folio(struct folio *folio, > if (i != 0) > set_compound_head(p, &folio->page); > } > + __folio_set_head(folio); > + /* we rely on prep_new_hugetlb_folio to set the destructor */ > + folio_set_order(folio, order); > atomic_set(&folio->_entire_mapcount, -1); > atomic_set(&folio->_nr_pages_mapped, 0); > atomic_set(&folio->_pincount, 0); > @@ -2017,8 +2016,6 @@ static bool __prep_compound_gigantic_folio(struct folio *folio, > p = folio_page(folio, j); > __ClearPageReserved(p); > } > - folio_set_order(folio, 0); > - __folio_clear_head(folio); > return false; > } > > diff --git a/mm/internal.h b/mm/internal.h > index 18cda26b8a92..0d96a3bc1d58 100644 > --- a/mm/internal.h > +++ b/mm/internal.h > @@ -425,16 +425,12 @@ int split_free_page(struct page *free_page, > */ > static inline void folio_set_order(struct folio *folio, unsigned int order) > { > - if (WARN_ON_ONCE(!folio_test_large(folio))) > + if (WARN_ON_ONCE(!order || !folio_test_large(folio))) > return; > > folio->_folio_order = order; > #ifdef CONFIG_64BIT > - /* > - * When hugetlb dissolves a folio, we need to clear the tail > - * page, rather than setting nr_pages to 1. > - */ > - folio->_folio_nr_pages = order ? 1U << order : 0; > + folio->_folio_nr_pages = 1U << order; > #endif > } > > -- > 2.31.1 >
On 4/14/23 12:48 PM, Tarun Sahu wrote: > folio_set_order(folio, 0); which is an abuse of folio_set_order as 0-order > folio does not have any tail page to set order. folio->_folio_nr_pages is > set to 0 for order 0 in folio_set_order. It is required because In the previous discussion of this function, Mike mentioned having folio_set_order() be used for non-zero orders and adding a folio_clear_order() that is used to set order to 0. This could be done to reduce confusion. > _folio_nr_pages overlapped with page->mapping and leaving it non zero > caused "bad page" error while freeing gigantic hugepages. This was fixed in > Commit ba9c1201beaa ("mm/hugetlb: clear compound_nr before freeing gigantic > pages"). Also commit a01f43901cfb ("hugetlb: be sure to free demoted CMA > pages to CMA") now explicitly clear page->mapping and hence we won't see > the bad page error even if _folio_nr_pages remains unset. Also the order 0 > folios are not supposed to call folio_set_order, So now we can get rid of > folio_set_order(folio, 0) from hugetlb code path to clear the confusion. > > The patch also moves _folio_set_head and folio_set_order calls in > __prep_compound_gigantic_folio() such that we avoid clearing them in the > error path. > > Testing: I have run LTP tests, which all passes. and also I have written > the test in LTP which tests the bug caused by compound_nr and page->mapping > overlapping. > > https://lore.kernel.org/all/20230413090753.883953-1-tsahu@linux.ibm.com/ > > Running on older kernel ( < 5.10-rc7) with the above bug this fails while > on newer kernel and, also with this patch it passes. > > Signed-off-by: Tarun Sahu <tsahu@linux.ibm.com> > --- > mm/hugetlb.c | 9 +++------ > mm/internal.h | 8 ++------ > 2 files changed, 5 insertions(+), 12 deletions(-) > > diff --git a/mm/hugetlb.c b/mm/hugetlb.c > index f16b25b1a6b9..e2540269c1dc 100644 > --- a/mm/hugetlb.c > +++ b/mm/hugetlb.c > @@ -1489,7 +1489,6 @@ static void __destroy_compound_gigantic_folio(struct folio *folio, > set_page_refcounted(p); > } > > - folio_set_order(folio, 0); > __folio_clear_head(folio); > } > > @@ -1951,9 +1950,6 @@ static bool __prep_compound_gigantic_folio(struct folio *folio, > struct page *p; > > __folio_clear_reserved(folio); > - __folio_set_head(folio); > - /* we rely on prep_new_hugetlb_folio to set the destructor */ > - folio_set_order(folio, order); > for (i = 0; i < nr_pages; i++) { > p = folio_page(folio, i); > > @@ -1999,6 +1995,9 @@ static bool __prep_compound_gigantic_folio(struct folio *folio, > if (i != 0) > set_compound_head(p, &folio->page); > } calling set_compound_head() for the tail page before the folio has the head flag set could seem misleading. At this point order is not set as well so it is not clear that the folio is a compound page folio. > + __folio_set_head(folio); > + /* we rely on prep_new_hugetlb_folio to set the destructor */ > + folio_set_order(folio, order); > atomic_set(&folio->_entire_mapcount, -1); > atomic_set(&folio->_nr_pages_mapped, 0); > atomic_set(&folio->_pincount, 0); > @@ -2017,8 +2016,6 @@ static bool __prep_compound_gigantic_folio(struct folio *folio, > p = folio_page(folio, j); > __ClearPageReserved(p); > } > - folio_set_order(folio, 0); > - __folio_clear_head(folio); > return false; > } > > diff --git a/mm/internal.h b/mm/internal.h > index 18cda26b8a92..0d96a3bc1d58 100644 > --- a/mm/internal.h > +++ b/mm/internal.h > @@ -425,16 +425,12 @@ int split_free_page(struct page *free_page, > */ > static inline void folio_set_order(struct folio *folio, unsigned int order) > { > - if (WARN_ON_ONCE(!folio_test_large(folio))) > + if (WARN_ON_ONCE(!order || !folio_test_large(folio))) > return; > > folio->_folio_order = order; > #ifdef CONFIG_64BIT > - /* > - * When hugetlb dissolves a folio, we need to clear the tail > - * page, rather than setting nr_pages to 1. > - */ > - folio->_folio_nr_pages = order ? 1U << order : 0; > + folio->_folio_nr_pages = 1U << order; > #endif > } >
Matthew Wilcox <willy@infradead.org> writes: Hi Mathew, Thanks for reviewing. please find my comments inline. > On Sat, Apr 15, 2023 at 01:18:32AM +0530, Tarun Sahu wrote: >> folio_set_order(folio, 0); which is an abuse of folio_set_order as 0-order >> folio does not have any tail page to set order. > > I think you're missing the point of how folio_set_order() is used. > When splitting a large folio, we need to zero out the folio_nr_pages > in the tail, so it does have a tail page, and that tail page needs to > be zeroed. We even assert that there is a tail page: > > if (WARN_ON_ONCE(!folio_test_large(folio))) > return; > > Or maybe you need to explain yourself better. > Yes, I understand, folio_set_order(order, 0) is called to clear out tail pages folio_order/folio_nr_pages. With this patch, I am trying to convey two things:- 1. It is not necessary to clear out these field if page->mapping is being explicitly updated. I explain this below [EXP]. 2. folio_set_order(order, 0) now currently being used to clear folio_order and folio_nr_pages which is ok. But looking at folio_set_order(folio, 0) is confusing as setting order 0 implies that only 1 page in folio. and folio_order and folio_nr_pages are part of first tail page. IIRC, there was a discussion to use folio_clear_order to avoid such confusion. But if above point 1 deemed to be correct, there will not be any need of this too. **[EXP]** IIUC, during splitting, page->mapping is updated explicitly for tail pages. There is no code path I see, where folio_set_order(order, 0) or set_compound_order(head, 0) is called except below two places. 1. __destroy_compound_gigantic_folio Here, in past there was a problem when struct page used to have compound_nr field which used to overlap with page->mapping. So while freeing, it was necessary to explicitly clear out compound_nr. Which was taken care by Commit ba9c1201beaa ("mm/hugetlb: clear compound_nr before freeing gigantic pages"). But after, Commit a01f43901cfb ("hugetlb: be sure to free demoted CMA pages to CMA"), page->mapping has explicitly been cleared out for all tail pages. for (i = 1; i < nr_pages; i++) { p = folio_page(folio, i); p->mapping = NULL; <======== (Here) clear_compound_head(p); if (!demote) set_page_refcounted(p); } folio_set_order(folio, 0); <== this line can be removed. 2. __prep_compound_gigantic_folio Here, folio_set_order(folio, 0) is called in error path only. which can be avoided if we call folio_set_order(folio, order) after the for loop. I am new to memory allocators. But as far as I could understood by looking at past discussion around this function [1][2], During RCU grace period there could be a race condition causing ref count inflation. But IIUC, that doesn't have any dependency on newly allocated gigantic page except that the ref count might be taken by folio_ref_try_add_rcu for the same page/s which will cause prep_compound_gigantic_folio to fail. So IMHO, it will be ok to move __folio_set_head and folio_set_order after the for loop. Here, Just for reference, below I copy pasted the *for loop*, from before, I am moving these two calls to after this. for (i = 0; i < nr_pages; i++) { p = folio_page(folio, i); if (i != 0) /* head page cleared above */ __ClearPageReserved(p); if (!demote) { if (!page_ref_freeze(p, 1)) { pr_warn("HugeTLB page can not be used due to unexpected inflated ref count\n"); goto out_error; } } else { VM_BUG_ON_PAGE(page_count(p), p); } if (i != 0) set_compound_head(p, &folio->page); } I also tested it with triggering demotion of gigantic hugepages to PMD hugepages. $ echo 5 > /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages $ cat /sys/kernel/mm/hugepages/hugepages-1048576kB/free_hugepages 5 $ cat /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages 0 $ echo 1 > /sys/kernel/mm/hugepages/hugepages-1048576kB/demote $ cat /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages 512 I am quite new to field. Please correct me if I understood it differently than it is. Also if I didn't consider other code path for its consideration. [1] https://lore.kernel.org/linux-mm/CAG48ez23q0Jy9cuVnwAe7t_fdhMk2S7N5Hdi-GLcCeq5bsfLxw@mail.gmail.com/ [2] https://lore.kernel.org/all/20210622021423.154662-3-mike.kravetz@oracle.com/T/#u >> folio->_folio_nr_pages is >> set to 0 for order 0 in folio_set_order. It is required because >> _folio_nr_pages overlapped with page->mapping and leaving it non zero >> caused "bad page" error while freeing gigantic hugepages. This was fixed in >> Commit ba9c1201beaa ("mm/hugetlb: clear compound_nr before freeing gigantic >> pages"). Also commit a01f43901cfb ("hugetlb: be sure to free demoted CMA >> pages to CMA") now explicitly clear page->mapping and hence we won't see >> the bad page error even if _folio_nr_pages remains unset. Also the order 0 >> folios are not supposed to call folio_set_order, So now we can get rid of >> folio_set_order(folio, 0) from hugetlb code path to clear the confusion. > > ... this is all very confusing. > Sorry, for this. Lemme know if above explanation [EXP] is clear. >> The patch also moves _folio_set_head and folio_set_order calls in >> __prep_compound_gigantic_folio() such that we avoid clearing them in the >> error path. > > But don't we need those bits set while we operate on the folio to set it > up? It makes me nervous if we don't have those bits set because we can > end up with speculative references that point to a head page while that > page is not marked as a head page. It may not be a problem, but I want > to see some air-tight analysis of that. > >> Testing: I have run LTP tests, which all passes. and also I have written >> the test in LTP which tests the bug caused by compound_nr and page->mapping >> overlapping. >> >> https://lore.kernel.org/all/20230413090753.883953-1-tsahu@linux.ibm.com/ >> >> Running on older kernel ( < 5.10-rc7) with the above bug this fails while >> on newer kernel and, also with this patch it passes. >> >> Signed-off-by: Tarun Sahu <tsahu@linux.ibm.com> >> --- >> mm/hugetlb.c | 9 +++------ >> mm/internal.h | 8 ++------ >> 2 files changed, 5 insertions(+), 12 deletions(-) >> >> diff --git a/mm/hugetlb.c b/mm/hugetlb.c >> index f16b25b1a6b9..e2540269c1dc 100644 >> --- a/mm/hugetlb.c >> +++ b/mm/hugetlb.c >> @@ -1489,7 +1489,6 @@ static void __destroy_compound_gigantic_folio(struct folio *folio, >> set_page_refcounted(p); >> } >> >> - folio_set_order(folio, 0); >> __folio_clear_head(folio); >> } >> >> @@ -1951,9 +1950,6 @@ static bool __prep_compound_gigantic_folio(struct folio *folio, >> struct page *p; >> >> __folio_clear_reserved(folio); >> - __folio_set_head(folio); >> - /* we rely on prep_new_hugetlb_folio to set the destructor */ >> - folio_set_order(folio, order); >> for (i = 0; i < nr_pages; i++) { >> p = folio_page(folio, i); >> >> @@ -1999,6 +1995,9 @@ static bool __prep_compound_gigantic_folio(struct folio *folio, >> if (i != 0) >> set_compound_head(p, &folio->page); >> } >> + __folio_set_head(folio); >> + /* we rely on prep_new_hugetlb_folio to set the destructor */ >> + folio_set_order(folio, order); >> atomic_set(&folio->_entire_mapcount, -1); >> atomic_set(&folio->_nr_pages_mapped, 0); >> atomic_set(&folio->_pincount, 0); >> @@ -2017,8 +2016,6 @@ static bool __prep_compound_gigantic_folio(struct folio *folio, >> p = folio_page(folio, j); >> __ClearPageReserved(p); >> } >> - folio_set_order(folio, 0); >> - __folio_clear_head(folio); >> return false; >> } >> >> diff --git a/mm/internal.h b/mm/internal.h >> index 18cda26b8a92..0d96a3bc1d58 100644 >> --- a/mm/internal.h >> +++ b/mm/internal.h >> @@ -425,16 +425,12 @@ int split_free_page(struct page *free_page, >> */ >> static inline void folio_set_order(struct folio *folio, unsigned int order) >> { >> - if (WARN_ON_ONCE(!folio_test_large(folio))) >> + if (WARN_ON_ONCE(!order || !folio_test_large(folio))) >> return; >> >> folio->_folio_order = order; >> #ifdef CONFIG_64BIT >> - /* >> - * When hugetlb dissolves a folio, we need to clear the tail >> - * page, rather than setting nr_pages to 1. >> - */ >> - folio->_folio_nr_pages = order ? 1U << order : 0; >> + folio->_folio_nr_pages = 1U << order; >> #endif >> } >> >> -- >> 2.31.1 >>
Hi Sidhartha, Thanks for your inputs, please find my comments inline > On 4/14/23 12:48 PM, Tarun Sahu wrote: >> folio_set_order(folio, 0); which is an abuse of folio_set_order as 0-order >> folio does not have any tail page to set order. folio->_folio_nr_pages is >> set to 0 for order 0 in folio_set_order. It is required because > > In the previous discussion of this function, Mike mentioned having > folio_set_order() be used for non-zero orders and adding a > folio_clear_order() that is used to set order to 0. This could be done > to reduce confusion. > Yes, I agree, I replied to Mathew reply to this thread, Lemme know your thought on this. In this patch, I proposed that there won't be need of folio_clear_order if folio_set_order(folio, 0) is not needed with minor changes in code path. >> _folio_nr_pages overlapped with page->mapping and leaving it non zero >> caused "bad page" error while freeing gigantic hugepages. This was fixed in >> Commit ba9c1201beaa ("mm/hugetlb: clear compound_nr before freeing gigantic >> pages"). Also commit a01f43901cfb ("hugetlb: be sure to free demoted CMA >> pages to CMA") now explicitly clear page->mapping and hence we won't see >> the bad page error even if _folio_nr_pages remains unset. Also the order 0 >> folios are not supposed to call folio_set_order, So now we can get rid of >> folio_set_order(folio, 0) from hugetlb code path to clear the confusion. >> >> The patch also moves _folio_set_head and folio_set_order calls in >> __prep_compound_gigantic_folio() such that we avoid clearing them in the >> error path. >> >> Testing: I have run LTP tests, which all passes. and also I have written >> the test in LTP which tests the bug caused by compound_nr and page->mapping >> overlapping. >> >> https://lore.kernel.org/all/20230413090753.883953-1-tsahu@linux.ibm.com/ >> >> Running on older kernel ( < 5.10-rc7) with the above bug this fails while >> on newer kernel and, also with this patch it passes. >> >> Signed-off-by: Tarun Sahu <tsahu@linux.ibm.com> >> --- >> mm/hugetlb.c | 9 +++------ >> mm/internal.h | 8 ++------ >> 2 files changed, 5 insertions(+), 12 deletions(-) >> >> diff --git a/mm/hugetlb.c b/mm/hugetlb.c >> index f16b25b1a6b9..e2540269c1dc 100644 >> --- a/mm/hugetlb.c >> +++ b/mm/hugetlb.c >> @@ -1489,7 +1489,6 @@ static void __destroy_compound_gigantic_folio(struct folio *folio, >> set_page_refcounted(p); >> } >> >> - folio_set_order(folio, 0); >> __folio_clear_head(folio); >> } >> >> @@ -1951,9 +1950,6 @@ static bool __prep_compound_gigantic_folio(struct folio *folio, >> struct page *p; >> >> __folio_clear_reserved(folio); >> - __folio_set_head(folio); >> - /* we rely on prep_new_hugetlb_folio to set the destructor */ >> - folio_set_order(folio, order); >> for (i = 0; i < nr_pages; i++) { >> p = folio_page(folio, i); >> >> @@ -1999,6 +1995,9 @@ static bool __prep_compound_gigantic_folio(struct folio *folio, >> if (i != 0) >> set_compound_head(p, &folio->page); >> } > > calling set_compound_head() for the tail page before the folio has the > head flag set could seem misleading. At this point order is not set as > well so it is not clear that the folio is a compound page folio. > Yeah, I agree, But they are part of same call. I can avoid moving __folio_set_head. And I think, It wont mislead if I avoid moving __folio_set_head. Below function has the similar path. void prep_compound_page(struct page *page, unsigned int order) { int i; int nr_pages = 1 << order; __SetPageHead(page); for (i = 1; i < nr_pages; i++) prep_compound_tail(page, i); prep_compound_head(page, order); } Lemme know you thoughts. ~Tarun >> + __folio_set_head(folio); >> + /* we rely on prep_new_hugetlb_folio to set the destructor */ >> + folio_set_order(folio, order); >> atomic_set(&folio->_entire_mapcount, -1); >> atomic_set(&folio->_nr_pages_mapped, 0); >> atomic_set(&folio->_pincount, 0); >> @@ -2017,8 +2016,6 @@ static bool __prep_compound_gigantic_folio(struct folio *folio, >> p = folio_page(folio, j); >> __ClearPageReserved(p); >> } >> - folio_set_order(folio, 0); >> - __folio_clear_head(folio); >> return false; >> } >> >> diff --git a/mm/internal.h b/mm/internal.h >> index 18cda26b8a92..0d96a3bc1d58 100644 >> --- a/mm/internal.h >> +++ b/mm/internal.h >> @@ -425,16 +425,12 @@ int split_free_page(struct page *free_page, >> */ >> static inline void folio_set_order(struct folio *folio, unsigned int order) >> { >> - if (WARN_ON_ONCE(!folio_test_large(folio))) >> + if (WARN_ON_ONCE(!order || !folio_test_large(folio))) >> return; >> >> folio->_folio_order = order; >> #ifdef CONFIG_64BIT >> - /* >> - * When hugetlb dissolves a folio, we need to clear the tail >> - * page, rather than setting nr_pages to 1. >> - */ >> - folio->_folio_nr_pages = order ? 1U << order : 0; >> + folio->_folio_nr_pages = 1U << order; >> #endif >> } >>
Tarun Sahu <tsahu@linux.ibm.com> writes: > Hi Sidhartha, > > Thanks for your inputs, please find my comments inline > >> On 4/14/23 12:48 PM, Tarun Sahu wrote: >>> folio_set_order(folio, 0); which is an abuse of folio_set_order as 0-order >>> folio does not have any tail page to set order. folio->_folio_nr_pages is >>> set to 0 for order 0 in folio_set_order. It is required because >> >> In the previous discussion of this function, Mike mentioned having >> folio_set_order() be used for non-zero orders and adding a >> folio_clear_order() that is used to set order to 0. This could be done >> to reduce confusion. >> > Yes, I agree, I replied to Mathew reply to this thread, Lemme know your > thought on this. In this patch, I proposed that there won't be need of > folio_clear_order if folio_set_order(folio, 0) is not needed with minor > changes in code path. > >>> _folio_nr_pages overlapped with page->mapping and leaving it non zero >>> caused "bad page" error while freeing gigantic hugepages. This was fixed in >>> Commit ba9c1201beaa ("mm/hugetlb: clear compound_nr before freeing gigantic >>> pages"). Also commit a01f43901cfb ("hugetlb: be sure to free demoted CMA >>> pages to CMA") now explicitly clear page->mapping and hence we won't see >>> the bad page error even if _folio_nr_pages remains unset. Also the order 0 >>> folios are not supposed to call folio_set_order, So now we can get rid of >>> folio_set_order(folio, 0) from hugetlb code path to clear the confusion. >>> >>> The patch also moves _folio_set_head and folio_set_order calls in >>> __prep_compound_gigantic_folio() such that we avoid clearing them in the >>> error path. >>> >>> Testing: I have run LTP tests, which all passes. and also I have written >>> the test in LTP which tests the bug caused by compound_nr and page->mapping >>> overlapping. >>> >>> https://lore.kernel.org/all/20230413090753.883953-1-tsahu@linux.ibm.com/ >>> >>> Running on older kernel ( < 5.10-rc7) with the above bug this fails while >>> on newer kernel and, also with this patch it passes. >>> >>> Signed-off-by: Tarun Sahu <tsahu@linux.ibm.com> >>> --- >>> mm/hugetlb.c | 9 +++------ >>> mm/internal.h | 8 ++------ >>> 2 files changed, 5 insertions(+), 12 deletions(-) >>> >>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c >>> index f16b25b1a6b9..e2540269c1dc 100644 >>> --- a/mm/hugetlb.c >>> +++ b/mm/hugetlb.c >>> @@ -1489,7 +1489,6 @@ static void __destroy_compound_gigantic_folio(struct folio *folio, >>> set_page_refcounted(p); >>> } >>> >>> - folio_set_order(folio, 0); >>> __folio_clear_head(folio); >>> } >>> >>> @@ -1951,9 +1950,6 @@ static bool __prep_compound_gigantic_folio(struct folio *folio, >>> struct page *p; >>> >>> __folio_clear_reserved(folio); >>> - __folio_set_head(folio); >>> - /* we rely on prep_new_hugetlb_folio to set the destructor */ >>> - folio_set_order(folio, order); >>> for (i = 0; i < nr_pages; i++) { >>> p = folio_page(folio, i); >>> >>> @@ -1999,6 +1995,9 @@ static bool __prep_compound_gigantic_folio(struct folio *folio, >>> if (i != 0) >>> set_compound_head(p, &folio->page); >>> } >> >> calling set_compound_head() for the tail page before the folio has the >> head flag set could seem misleading. At this point order is not set as >> well so it is not clear that the folio is a compound page folio. >> > Yeah, I agree, But they are part of same call. I can avoid moving > __folio_set_head. And I think, It wont mislead if I avoid moving > __folio_set_head. Below function has the similar path. Apologies, Here, I mixed the sentences, I want to say It won't mislead if we avoid moving __folio_set_head, but move only folio_set_order. Below function does the same. > > void prep_compound_page(struct page *page, unsigned int order) > { > int i; > int nr_pages = 1 << order; > > __SetPageHead(page); > for (i = 1; i < nr_pages; i++) > prep_compound_tail(page, i); > > prep_compound_head(page, order); > } > > Lemme know you thoughts. > > > ~Tarun > >>> + __folio_set_head(folio); >>> + /* we rely on prep_new_hugetlb_folio to set the destructor */ >>> + folio_set_order(folio, order); >>> atomic_set(&folio->_entire_mapcount, -1); >>> atomic_set(&folio->_nr_pages_mapped, 0); >>> atomic_set(&folio->_pincount, 0); >>> @@ -2017,8 +2016,6 @@ static bool __prep_compound_gigantic_folio(struct folio *folio, >>> p = folio_page(folio, j); >>> __ClearPageReserved(p); >>> } >>> - folio_set_order(folio, 0); >>> - __folio_clear_head(folio); >>> return false; >>> } >>> >>> diff --git a/mm/internal.h b/mm/internal.h >>> index 18cda26b8a92..0d96a3bc1d58 100644 >>> --- a/mm/internal.h >>> +++ b/mm/internal.h >>> @@ -425,16 +425,12 @@ int split_free_page(struct page *free_page, >>> */ >>> static inline void folio_set_order(struct folio *folio, unsigned int order) >>> { >>> - if (WARN_ON_ONCE(!folio_test_large(folio))) >>> + if (WARN_ON_ONCE(!order || !folio_test_large(folio))) >>> return; >>> >>> folio->_folio_order = order; >>> #ifdef CONFIG_64BIT >>> - /* >>> - * When hugetlb dissolves a folio, we need to clear the tail >>> - * page, rather than setting nr_pages to 1. >>> - */ >>> - folio->_folio_nr_pages = order ? 1U << order : 0; >>> + folio->_folio_nr_pages = 1U << order; >>> #endif >>> } >>>
On 04/14/23 21:12, Matthew Wilcox wrote: > On Sat, Apr 15, 2023 at 01:18:32AM +0530, Tarun Sahu wrote: > > folio_set_order(folio, 0); which is an abuse of folio_set_order as 0-order > > folio does not have any tail page to set order. > > I think you're missing the point of how folio_set_order() is used. > When splitting a large folio, we need to zero out the folio_nr_pages > in the tail, so it does have a tail page, and that tail page needs to > be zeroed. We even assert that there is a tail page: > > if (WARN_ON_ONCE(!folio_test_large(folio))) > return; > > Or maybe you need to explain yourself better. > > > folio->_folio_nr_pages is > > set to 0 for order 0 in folio_set_order. It is required because > > _folio_nr_pages overlapped with page->mapping and leaving it non zero > > caused "bad page" error while freeing gigantic hugepages. This was fixed in > > Commit ba9c1201beaa ("mm/hugetlb: clear compound_nr before freeing gigantic > > pages"). Also commit a01f43901cfb ("hugetlb: be sure to free demoted CMA > > pages to CMA") now explicitly clear page->mapping and hence we won't see > > the bad page error even if _folio_nr_pages remains unset. Also the order 0 > > folios are not supposed to call folio_set_order, So now we can get rid of > > folio_set_order(folio, 0) from hugetlb code path to clear the confusion. > > ... this is all very confusing. > > > The patch also moves _folio_set_head and folio_set_order calls in > > __prep_compound_gigantic_folio() such that we avoid clearing them in the > > error path. > > But don't we need those bits set while we operate on the folio to set it > up? It makes me nervous if we don't have those bits set because we can > end up with speculative references that point to a head page while that > page is not marked as a head page. It may not be a problem, but I want > to see some air-tight analysis of that. I am fairly certain we are 'safe'. Here is code before setting up the pointer to the head page. * In the case of demote, the ref count will be zero. */ if (!demote) { if (!page_ref_freeze(p, 1)) { pr_warn("HugeTLB page can not be used due to unexpected inflated ref count\n"); goto out_error; } } else { VM_BUG_ON_PAGE(page_count(p), p); } if (i != 0) set_compound_head(p, &folio->page); So, before setting the pointer to head page ref count will be zero. I 'think' it would actually be better to move the calls to _folio_set_head and folio_set_order in __prep_compound_gigantic_folio() as suggested here. Why? In the current code, the ref count on the 'head page' is still 1 (or more) while those calls are made. So, someone could take a speculative ref on the page BEFORE the tail pages are set up. TBH, I do not have much of an opinion about potential confusion surrounding folio_set_compound_order(folio, 0). IIUC, hugetlb gigantic page setup is the only place outside the page allocation code that sets up compound pages/large folios. So, it is going to be a bit 'special'. As mentioned, when this was originally discussed I suggested folio_clear_order(). I would be happy with either.
Hi, Mathew, I am not sure If I was clear about my intention behind the patch. Here, I attempt to answer it again. Please lemme know if you agree. Matthew Wilcox <willy@infradead.org> writes: > On Sat, Apr 15, 2023 at 01:18:32AM +0530, Tarun Sahu wrote: >> folio_set_order(folio, 0); which is an abuse of folio_set_order as 0-order >> folio does not have any tail page to set order. > > I think you're missing the point of how folio_set_order() is used. > When splitting a large folio, we need to zero out the folio_nr_pages > in the tail, so it does have a tail page, and that tail page needs to > be zeroed. We even assert that there is a tail page: > > if (WARN_ON_ONCE(!folio_test_large(folio))) > return; > > Or maybe you need to explain yourself better. > In the upstream, I don't see folio_set_order(folio, 0) being called in splitting path. IIUC, we had to zero out _folio_nr_pages during freeing gigantic folio as described by Commit ba9c1201beaa ("mm/hugetlb: clear compound_nr before freeing gigantic pages"). I agree that folio_set_order(folio, 0) is called with folio having tail pages. But I meant only that, in general, it is just confusing to have setting the folio order to 0. With this patch, I would like to draw attention to the point that there is no need to call folio_set_order(folio, 0) anymore to zero out _folio_order and _folio_nr_pages. In past, it was needed because page->mapping used to overlap with _folio_nr_pages and _folio_order. So if these fields were left uncleared during freeing gigantic hugepages, they were causing "BUG: bad page state" due to non-zero page->mapping. Now, After Commit a01f43901cfb ("hugetlb: be sure to free demoted CMA pages to CMA") page->mapping has explicitly been cleared out for tail pages. Also, _folio_order and _folio_nr_pages no longer overlaps with page->mapping. struct page { ... struct address_space * mapping; /* 24 8 */ ... } struct folio { ... union { struct { long unsigned int _flags_1; /* 64 8 */ long unsigned int _head_1; /* 72 8 */ unsigned char _folio_dtor; /* 80 1 */ unsigned char _folio_order; /* 81 1 */ /* XXX 2 bytes hole, try to pack */ atomic_t _entire_mapcount; /* 84 4 */ atomic_t _nr_pages_mapped; /* 88 4 */ atomic_t _pincount; /* 92 4 */ unsigned int _folio_nr_pages; /* 96 4 */ }; /* 64 40 */ struct page __page_1 __attribute__((__aligned__(8))); /* 64 64 */ } ... } So, folio_set_order(folio, 0) can be removed from freeing gigantic folio path (__destroy_compound_gigantic_folio). Another place, where folio_set_order(folio, 0) is called inside __prep_compound_gigantic_folio during error path. Here, folio_set_order(folio, 0) can also be removed if we move folio_set_order(folio, order) after for loop. Also, Mike confirmed that it is safe to move the call. ~Tarun >> folio->_folio_nr_pages is >> set to 0 for order 0 in folio_set_order. It is required because >> _folio_nr_pages overlapped with page->mapping and leaving it non zero >> caused "bad page" error while freeing gigantic hugepages. This was fixed in >> Commit ba9c1201beaa ("mm/hugetlb: clear compound_nr before freeing gigantic >> pages"). Also commit a01f43901cfb ("hugetlb: be sure to free demoted CMA >> pages to CMA") now explicitly clear page->mapping and hence we won't see >> the bad page error even if _folio_nr_pages remains unset. Also the order 0 >> folios are not supposed to call folio_set_order, So now we can get rid of >> folio_set_order(folio, 0) from hugetlb code path to clear the confusion. > > ... this is all very confusing. > >> The patch also moves _folio_set_head and folio_set_order calls in >> __prep_compound_gigantic_folio() such that we avoid clearing them in the >> error path. > > But don't we need those bits set while we operate on the folio to set it > up? It makes me nervous if we don't have those bits set because we can > end up with speculative references that point to a head page while that > page is not marked as a head page. It may not be a problem, but I want > to see some air-tight analysis of that. > >> Testing: I have run LTP tests, which all passes. and also I have written >> the test in LTP which tests the bug caused by compound_nr and page->mapping >> overlapping. >> >> https://lore.kernel.org/all/20230413090753.883953-1-tsahu@linux.ibm.com/ >> >> Running on older kernel ( < 5.10-rc7) with the above bug this fails while >> on newer kernel and, also with this patch it passes. >> >> Signed-off-by: Tarun Sahu <tsahu@linux.ibm.com> >> --- >> mm/hugetlb.c | 9 +++------ >> mm/internal.h | 8 ++------ >> 2 files changed, 5 insertions(+), 12 deletions(-) >> >> diff --git a/mm/hugetlb.c b/mm/hugetlb.c >> index f16b25b1a6b9..e2540269c1dc 100644 >> --- a/mm/hugetlb.c >> +++ b/mm/hugetlb.c >> @@ -1489,7 +1489,6 @@ static void __destroy_compound_gigantic_folio(struct folio *folio, >> set_page_refcounted(p); >> } >> >> - folio_set_order(folio, 0); >> __folio_clear_head(folio); >> } >> >> @@ -1951,9 +1950,6 @@ static bool __prep_compound_gigantic_folio(struct folio *folio, >> struct page *p; >> >> __folio_clear_reserved(folio); >> - __folio_set_head(folio); >> - /* we rely on prep_new_hugetlb_folio to set the destructor */ >> - folio_set_order(folio, order); >> for (i = 0; i < nr_pages; i++) { >> p = folio_page(folio, i); >> >> @@ -1999,6 +1995,9 @@ static bool __prep_compound_gigantic_folio(struct folio *folio, >> if (i != 0) >> set_compound_head(p, &folio->page); >> } >> + __folio_set_head(folio); >> + /* we rely on prep_new_hugetlb_folio to set the destructor */ >> + folio_set_order(folio, order); >> atomic_set(&folio->_entire_mapcount, -1); >> atomic_set(&folio->_nr_pages_mapped, 0); >> atomic_set(&folio->_pincount, 0); >> @@ -2017,8 +2016,6 @@ static bool __prep_compound_gigantic_folio(struct folio *folio, >> p = folio_page(folio, j); >> __ClearPageReserved(p); >> } >> - folio_set_order(folio, 0); >> - __folio_clear_head(folio); >> return false; >> } >> >> diff --git a/mm/internal.h b/mm/internal.h >> index 18cda26b8a92..0d96a3bc1d58 100644 >> --- a/mm/internal.h >> +++ b/mm/internal.h >> @@ -425,16 +425,12 @@ int split_free_page(struct page *free_page, >> */ >> static inline void folio_set_order(struct folio *folio, unsigned int order) >> { >> - if (WARN_ON_ONCE(!folio_test_large(folio))) >> + if (WARN_ON_ONCE(!order || !folio_test_large(folio))) >> return; >> >> folio->_folio_order = order; >> #ifdef CONFIG_64BIT >> - /* >> - * When hugetlb dissolves a folio, we need to clear the tail >> - * page, rather than setting nr_pages to 1. >> - */ >> - folio->_folio_nr_pages = order ? 1U << order : 0; >> + folio->_folio_nr_pages = 1U << order; >> #endif >> } >> >> -- >> 2.31.1 >>
Hi Mike, Mike Kravetz <mike.kravetz@oracle.com> writes: > On 04/14/23 21:12, Matthew Wilcox wrote: >> On Sat, Apr 15, 2023 at 01:18:32AM +0530, Tarun Sahu wrote: >> > folio_set_order(folio, 0); which is an abuse of folio_set_order as 0-order >> > folio does not have any tail page to set order. >> >> I think you're missing the point of how folio_set_order() is used. >> When splitting a large folio, we need to zero out the folio_nr_pages >> in the tail, so it does have a tail page, and that tail page needs to >> be zeroed. We even assert that there is a tail page: >> >> if (WARN_ON_ONCE(!folio_test_large(folio))) >> return; >> >> Or maybe you need to explain yourself better. >> >> > folio->_folio_nr_pages is >> > set to 0 for order 0 in folio_set_order. It is required because >> > _folio_nr_pages overlapped with page->mapping and leaving it non zero >> > caused "bad page" error while freeing gigantic hugepages. This was fixed in >> > Commit ba9c1201beaa ("mm/hugetlb: clear compound_nr before freeing gigantic >> > pages"). Also commit a01f43901cfb ("hugetlb: be sure to free demoted CMA >> > pages to CMA") now explicitly clear page->mapping and hence we won't see >> > the bad page error even if _folio_nr_pages remains unset. Also the order 0 >> > folios are not supposed to call folio_set_order, So now we can get rid of >> > folio_set_order(folio, 0) from hugetlb code path to clear the confusion. >> >> ... this is all very confusing. >> >> > The patch also moves _folio_set_head and folio_set_order calls in >> > __prep_compound_gigantic_folio() such that we avoid clearing them in the >> > error path. >> >> But don't we need those bits set while we operate on the folio to set it >> up? It makes me nervous if we don't have those bits set because we can >> end up with speculative references that point to a head page while that >> page is not marked as a head page. It may not be a problem, but I want >> to see some air-tight analysis of that. > > I am fairly certain we are 'safe'. Here is code before setting up the > pointer to the head page. > > * In the case of demote, the ref count will be zero. > */ > if (!demote) { > if (!page_ref_freeze(p, 1)) { > pr_warn("HugeTLB page can not be used due to unexpected inflated ref count\n"); > goto out_error; > } > } else { > VM_BUG_ON_PAGE(page_count(p), p); > } > if (i != 0) > set_compound_head(p, &folio->page); > > So, before setting the pointer to head page ref count will be zero. > > I 'think' it would actually be better to move the calls to _folio_set_head and > folio_set_order in __prep_compound_gigantic_folio() as suggested here. Why? > In the current code, the ref count on the 'head page' is still 1 (or more) > while those calls are made. So, someone could take a speculative ref on the > page BEFORE the tail pages are set up. > Thanks, for confirming the correctness of moving these calls. Also I didn't look at it this way while moving them. Thanks for the comment. I will update the commit msg and send the v2. ~Tarun > TBH, I do not have much of an opinion about potential confusion surrounding > folio_set_compound_order(folio, 0). IIUC, hugetlb gigantic page setup is the > only place outside the page allocation code that sets up compound pages/large > folios. So, it is going to be a bit 'special'. As mentioned, when this was > originally discussed I suggested folio_clear_order(). I would be happy with > either. > -- > Mike Kravetz
diff --git a/mm/hugetlb.c b/mm/hugetlb.c index f16b25b1a6b9..e2540269c1dc 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -1489,7 +1489,6 @@ static void __destroy_compound_gigantic_folio(struct folio *folio, set_page_refcounted(p); } - folio_set_order(folio, 0); __folio_clear_head(folio); } @@ -1951,9 +1950,6 @@ static bool __prep_compound_gigantic_folio(struct folio *folio, struct page *p; __folio_clear_reserved(folio); - __folio_set_head(folio); - /* we rely on prep_new_hugetlb_folio to set the destructor */ - folio_set_order(folio, order); for (i = 0; i < nr_pages; i++) { p = folio_page(folio, i); @@ -1999,6 +1995,9 @@ static bool __prep_compound_gigantic_folio(struct folio *folio, if (i != 0) set_compound_head(p, &folio->page); } + __folio_set_head(folio); + /* we rely on prep_new_hugetlb_folio to set the destructor */ + folio_set_order(folio, order); atomic_set(&folio->_entire_mapcount, -1); atomic_set(&folio->_nr_pages_mapped, 0); atomic_set(&folio->_pincount, 0); @@ -2017,8 +2016,6 @@ static bool __prep_compound_gigantic_folio(struct folio *folio, p = folio_page(folio, j); __ClearPageReserved(p); } - folio_set_order(folio, 0); - __folio_clear_head(folio); return false; } diff --git a/mm/internal.h b/mm/internal.h index 18cda26b8a92..0d96a3bc1d58 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -425,16 +425,12 @@ int split_free_page(struct page *free_page, */ static inline void folio_set_order(struct folio *folio, unsigned int order) { - if (WARN_ON_ONCE(!folio_test_large(folio))) + if (WARN_ON_ONCE(!order || !folio_test_large(folio))) return; folio->_folio_order = order; #ifdef CONFIG_64BIT - /* - * When hugetlb dissolves a folio, we need to clear the tail - * page, rather than setting nr_pages to 1. - */ - folio->_folio_nr_pages = order ? 1U << order : 0; + folio->_folio_nr_pages = 1U << order; #endif }
folio_set_order(folio, 0); which is an abuse of folio_set_order as 0-order folio does not have any tail page to set order. folio->_folio_nr_pages is set to 0 for order 0 in folio_set_order. It is required because _folio_nr_pages overlapped with page->mapping and leaving it non zero caused "bad page" error while freeing gigantic hugepages. This was fixed in Commit ba9c1201beaa ("mm/hugetlb: clear compound_nr before freeing gigantic pages"). Also commit a01f43901cfb ("hugetlb: be sure to free demoted CMA pages to CMA") now explicitly clear page->mapping and hence we won't see the bad page error even if _folio_nr_pages remains unset. Also the order 0 folios are not supposed to call folio_set_order, So now we can get rid of folio_set_order(folio, 0) from hugetlb code path to clear the confusion. The patch also moves _folio_set_head and folio_set_order calls in __prep_compound_gigantic_folio() such that we avoid clearing them in the error path. Testing: I have run LTP tests, which all passes. and also I have written the test in LTP which tests the bug caused by compound_nr and page->mapping overlapping. https://lore.kernel.org/all/20230413090753.883953-1-tsahu@linux.ibm.com/ Running on older kernel ( < 5.10-rc7) with the above bug this fails while on newer kernel and, also with this patch it passes. Signed-off-by: Tarun Sahu <tsahu@linux.ibm.com> --- mm/hugetlb.c | 9 +++------ mm/internal.h | 8 ++------ 2 files changed, 5 insertions(+), 12 deletions(-)