Message ID | 20240312035017.516865-1-willy@infradead.org (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | mm: Always initialise folio->_deferred_list | expand |
On 12.03.24 04:50, Matthew Wilcox (Oracle) wrote: > For compound pages which are at least order-2 (and hence have a > deferred_list), initialise it and then we can check at free that the > page is not part of a deferred list. We recently found this useful to > rule out a source of corruption. > > Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> > --- > mm/huge_memory.c | 2 -- > mm/internal.h | 2 ++ > mm/memcontrol.c | 2 ++ > mm/page_alloc.c | 9 +++++---- > 4 files changed, 9 insertions(+), 6 deletions(-) > > diff --git a/mm/huge_memory.c b/mm/huge_memory.c > index 9859aa4f7553..04fb994a7b0b 100644 > --- a/mm/huge_memory.c > +++ b/mm/huge_memory.c > @@ -792,8 +792,6 @@ void folio_prep_large_rmappable(struct folio *folio) > { > if (!folio || !folio_test_large(folio)) > return; > - if (folio_order(folio) > 1) > - INIT_LIST_HEAD(&folio->_deferred_list); > folio_set_large_rmappable(folio); > } > > diff --git a/mm/internal.h b/mm/internal.h > index 7e486f2c502c..10895ec52546 100644 > --- a/mm/internal.h > +++ b/mm/internal.h > @@ -525,6 +525,8 @@ static inline void prep_compound_head(struct page *page, unsigned int order) > atomic_set(&folio->_entire_mapcount, -1); > atomic_set(&folio->_nr_pages_mapped, 0); > atomic_set(&folio->_pincount, 0); > + if (order > 1) > + INIT_LIST_HEAD(&folio->_deferred_list); > } > > static inline void prep_compound_tail(struct page *head, int tail_idx) > diff --git a/mm/memcontrol.c b/mm/memcontrol.c > index 138bcfa18234..e2334c4ee550 100644 > --- a/mm/memcontrol.c > +++ b/mm/memcontrol.c > @@ -7483,6 +7483,8 @@ static void uncharge_folio(struct folio *folio, struct uncharge_gather *ug) > struct obj_cgroup *objcg; > > VM_BUG_ON_FOLIO(folio_test_lru(folio), folio); > + VM_BUG_ON_FOLIO(folio_order(folio) > 1 && > + !list_empty(&folio->_deferred_list), folio); > > /* > * Nobody should be changing or seriously looking at > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index 62240014a498..8374ba9b6249 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -1006,10 +1006,11 @@ static int free_tail_page_prepare(struct page *head_page, struct page *page) > } > break; > case 2: > - /* > - * the second tail page: ->mapping is > - * deferred_list.next -- ignore value. > - */ > + /* the second tail page: deferred_list overlaps ->mapping */ > + if (unlikely(!list_empty(&folio->_deferred_list))) { > + bad_page(page, "on deferred list"); > + goto out; > + } IIRC, hugetlb might overwrite this with _hugetlb_subpool? Isn't that a problem we have to handle?
On Tue, Mar 12, 2024 at 02:17:01PM +0100, David Hildenbrand wrote: > On 12.03.24 04:50, Matthew Wilcox (Oracle) wrote: > > +++ b/mm/page_alloc.c > > @@ -1006,10 +1006,11 @@ static int free_tail_page_prepare(struct page *head_page, struct page *page) > > } > > break; > > case 2: > > - /* > > - * the second tail page: ->mapping is > > - * deferred_list.next -- ignore value. > > - */ > > + /* the second tail page: deferred_list overlaps ->mapping */ > > + if (unlikely(!list_empty(&folio->_deferred_list))) { > > + bad_page(page, "on deferred list"); > > + goto out; > > + } > > IIRC, hugetlb might overwrite this with _hugetlb_subpool? Isn't that a > problem we have to handle? Ah yes, you're right. I think this should do the trick? +++ b/mm/hugetlb.c @@ -1796,6 +1796,7 @@ static void __update_and_free_hugetlb_folio(struct hstate *h, destroy_compound_gigantic_folio(folio, huge_page_order(h)); free_gigantic_folio(folio, huge_page_order(h)); } else { + INIT_LIST_HEAD(&folio->_deferred_list); __free_pages(&folio->page, huge_page_order(h)); } } (that should also be a folio_put(), IMO)
On 12.03.24 15:01, Matthew Wilcox wrote: > On Tue, Mar 12, 2024 at 02:17:01PM +0100, David Hildenbrand wrote: >> On 12.03.24 04:50, Matthew Wilcox (Oracle) wrote: >>> +++ b/mm/page_alloc.c >>> @@ -1006,10 +1006,11 @@ static int free_tail_page_prepare(struct page *head_page, struct page *page) >>> } >>> break; >>> case 2: >>> - /* >>> - * the second tail page: ->mapping is >>> - * deferred_list.next -- ignore value. >>> - */ >>> + /* the second tail page: deferred_list overlaps ->mapping */ >>> + if (unlikely(!list_empty(&folio->_deferred_list))) { >>> + bad_page(page, "on deferred list"); >>> + goto out; >>> + } >> >> IIRC, hugetlb might overwrite this with _hugetlb_subpool? Isn't that a >> problem we have to handle? > > Ah yes, you're right. I think this should do the trick? > > +++ b/mm/hugetlb.c > @@ -1796,6 +1796,7 @@ static void __update_and_free_hugetlb_folio(struct hstate *h, > destroy_compound_gigantic_folio(folio, huge_page_order(h)); > free_gigantic_folio(folio, huge_page_order(h)); > } else { > + INIT_LIST_HEAD(&folio->_deferred_list); > __free_pages(&folio->page, huge_page_order(h)); Heh, __free_pages() says: "This function can free multi-page allocations that are not compound pages". I suspect that means "can also free compound pages", but it's confusing. Especially, I thought we recently learned that free hugetlb folios do have a refcount of 0? Confusing. Anyhow, I suspect your change does the trick and agree that __free_pages() might need a replacement. ... but it's confusing.
On Tue, Mar 12, 2024 at 03:34:13PM +0100, David Hildenbrand wrote: > On 12.03.24 15:01, Matthew Wilcox wrote: > > __free_pages(&folio->page, huge_page_order(h)); > > Heh, __free_pages() says: > > "This function can free multi-page allocations that are not compound pages". > I suspect that means "can also free compound pages", but it's confusing. Sorry. I wrote that documentation and I was focused on one thing, but failed to think of the other thing ... it can indeed free compound pages. I'll add this: * If the pages were allocated with __GFP_COMP, prefer using put_page() * to using this function. Also prefer to call put_page() rather than * calling __free_page() or __free_pages(page, 0). > Especially, I thought we recently learned that free hugetlb folios do have a > refcount of 0? Confusing. Yes, that is confusing. This function wouldn't work if the refcount were 0 (put_page_testzero() would complain). Ah, here we go: In __folio_put(), we know that refcount is 0. __folio_put_large() -> destroy_large_folio -> free_huge_folio (refcount still 0 here, indeed it asserts it) If we add it back to the pool, the refcount stays at 0. But if we're removing it, we call __remove_hugetlb_folio() which calls folio_ref_unfreeze().
On 12.03.24 16:38, Matthew Wilcox wrote: > On Tue, Mar 12, 2024 at 03:34:13PM +0100, David Hildenbrand wrote: >> On 12.03.24 15:01, Matthew Wilcox wrote: >>> __free_pages(&folio->page, huge_page_order(h)); >> >> Heh, __free_pages() says: >> >> "This function can free multi-page allocations that are not compound pages". >> I suspect that means "can also free compound pages", but it's confusing. > > Sorry. I wrote that documentation and I was focused on one thing, > but failed to think of the other thing ... it can indeed free compound > pages. I'll add this: > > * If the pages were allocated with __GFP_COMP, prefer using put_page() > * to using this function. Also prefer to call put_page() rather than > * calling __free_page() or __free_pages(page, 0). Sounds good. > >> Especially, I thought we recently learned that free hugetlb folios do have a >> refcount of 0? Confusing. > > Yes, that is confusing. This function wouldn't work if the refcount > were 0 (put_page_testzero() would complain). Ah, here we go: > > In __folio_put(), we know that refcount is 0. > __folio_put_large() -> destroy_large_folio -> free_huge_folio > (refcount still 0 here, indeed it asserts it) > If we add it back to the pool, the refcount stays at 0. > But if we're removing it, we call __remove_hugetlb_folio() > which calls folio_ref_unfreeze(). Ah, that's the case I was missing. The joy of hugetlb :) As a side note: hugetlb.c still talks about "destructor" and "dtor". I assume that dates back to the times where we identified hugetlb folios by a deconstrucor in the compound page.
diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 9859aa4f7553..04fb994a7b0b 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -792,8 +792,6 @@ void folio_prep_large_rmappable(struct folio *folio) { if (!folio || !folio_test_large(folio)) return; - if (folio_order(folio) > 1) - INIT_LIST_HEAD(&folio->_deferred_list); folio_set_large_rmappable(folio); } diff --git a/mm/internal.h b/mm/internal.h index 7e486f2c502c..10895ec52546 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -525,6 +525,8 @@ static inline void prep_compound_head(struct page *page, unsigned int order) atomic_set(&folio->_entire_mapcount, -1); atomic_set(&folio->_nr_pages_mapped, 0); atomic_set(&folio->_pincount, 0); + if (order > 1) + INIT_LIST_HEAD(&folio->_deferred_list); } static inline void prep_compound_tail(struct page *head, int tail_idx) diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 138bcfa18234..e2334c4ee550 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -7483,6 +7483,8 @@ static void uncharge_folio(struct folio *folio, struct uncharge_gather *ug) struct obj_cgroup *objcg; VM_BUG_ON_FOLIO(folio_test_lru(folio), folio); + VM_BUG_ON_FOLIO(folio_order(folio) > 1 && + !list_empty(&folio->_deferred_list), folio); /* * Nobody should be changing or seriously looking at diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 62240014a498..8374ba9b6249 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -1006,10 +1006,11 @@ static int free_tail_page_prepare(struct page *head_page, struct page *page) } break; case 2: - /* - * the second tail page: ->mapping is - * deferred_list.next -- ignore value. - */ + /* the second tail page: deferred_list overlaps ->mapping */ + if (unlikely(!list_empty(&folio->_deferred_list))) { + bad_page(page, "on deferred list"); + goto out; + } break; default: if (page->mapping != TAIL_MAPPING) {
For compound pages which are at least order-2 (and hence have a deferred_list), initialise it and then we can check at free that the page is not part of a deferred list. We recently found this useful to rule out a source of corruption. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> --- mm/huge_memory.c | 2 -- mm/internal.h | 2 ++ mm/memcontrol.c | 2 ++ mm/page_alloc.c | 9 +++++---- 4 files changed, 9 insertions(+), 6 deletions(-)