Message ID | 20240402200656.913841-1-willy@infradead.org (mailing list archive)
---|---
State | New
Series | hugetlb: Convert alloc_buddy_hugetlb_folio to use a folio
On 4/2/24 1:06 PM, Matthew Wilcox (Oracle) wrote:
> While this function returned a folio, it was still using __alloc_pages()
> and __free_pages(). Use __folio_alloc() and folio_put() instead. This
> actually removes a call to compound_head(), but more importantly, it
> prepares us for the move to memdescs.
>
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
>  mm/hugetlb.c | 33 ++++++++++++++++-----------------
>  1 file changed, 16 insertions(+), 17 deletions(-)
>
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index a3bffa8debde..5f1e0b1a0d57 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -2177,13 +2177,13 @@ static struct folio *alloc_buddy_hugetlb_folio(struct hstate *h,
>  					nodemask_t *node_alloc_noretry)
>  {
>  	int order = huge_page_order(h);
> -	struct page *page;
> +	struct folio *folio;
>  	bool alloc_try_hard = true;
>  	bool retry = true;
>
>  	/*
> -	 * By default we always try hard to allocate the page with
> -	 * __GFP_RETRY_MAYFAIL flag. However, if we are allocating pages in
> +	 * By default we always try hard to allocate the folio with
> +	 * __GFP_RETRY_MAYFAIL flag. However, if we are allocating folios in
>  	 * a loop (to adjust global huge page counts) and previous allocation
>  	 * failed, do not continue to try hard on the same node. Use the
>  	 * node_alloc_noretry bitmap to manage this state information.
> @@ -2196,43 +2196,42 @@ static struct folio *alloc_buddy_hugetlb_folio(struct hstate *h,
>  	if (nid == NUMA_NO_NODE)
>  		nid = numa_mem_id();
>  retry:
> -	page = __alloc_pages(gfp_mask, order, nid, nmask);
> +	folio = __folio_alloc(gfp_mask, order, nid, nmask);
>
> -	/* Freeze head page */
> -	if (page && !page_ref_freeze(page, 1)) {
> -		__free_pages(page, order);
> +	if (folio && !folio_ref_freeze(folio, 1)) {
> +		folio_put(folio);
>  		if (retry) {	/* retry once */
>  			retry = false;
>  			goto retry;
>  		}
>  		/* WOW! twice in a row. */
> -		pr_warn("HugeTLB head page unexpected inflated ref count\n");
> -		page = NULL;
> +		pr_warn("HugeTLB unexpected inflated folio ref count\n");
> +		folio = NULL;
>  	}
>
>  	/*
> -	 * If we did not specify __GFP_RETRY_MAYFAIL, but still got a page this
> -	 * indicates an overall state change. Clear bit so that we resume
> -	 * normal 'try hard' allocations.
> +	 * If we did not specify __GFP_RETRY_MAYFAIL, but still got a
> +	 * folio this indicates an overall state change. Clear bit so
> +	 * that we resume normal 'try hard' allocations.
>  	 */
> -	if (node_alloc_noretry && page && !alloc_try_hard)
> +	if (node_alloc_noretry && folio && !alloc_try_hard)
>  		node_clear(nid, *node_alloc_noretry);
>
>  	/*
> -	 * If we tried hard to get a page but failed, set bit so that
> +	 * If we tried hard to get a folio but failed, set bit so that
>  	 * subsequent attempts will not try as hard until there is an
>  	 * overall state change.
>  	 */
> -	if (node_alloc_noretry && !page && alloc_try_hard)
> +	if (node_alloc_noretry && !folio && alloc_try_hard)
>  		node_set(nid, *node_alloc_noretry);
>
> -	if (!page) {
> +	if (!folio) {
>  		__count_vm_event(HTLB_BUDDY_PGALLOC_FAIL);
>  		return NULL;
>  	}
>
>  	__count_vm_event(HTLB_BUDDY_PGALLOC);
> -	return page_folio(page);
> +	return folio;
>  }
>
>  static struct folio *__alloc_fresh_hugetlb_folio(struct hstate *h,

Reviewed-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
On Tue, Apr 02, 2024 at 09:06:54PM +0100, Matthew Wilcox (Oracle) wrote:
> While this function returned a folio, it was still using __alloc_pages()
> and __free_pages(). Use __folio_alloc() and folio_put() instead. This
> actually removes a call to compound_head(), but more importantly, it
> prepares us for the move to memdescs.
>
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>

Reviewed-by: Oscar Salvador <osalvador@suse.de>

> -	page = __alloc_pages(gfp_mask, order, nid, nmask);
> +	folio = __folio_alloc(gfp_mask, order, nid, nmask);
>
> -	/* Freeze head page */
> -	if (page && !page_ref_freeze(page, 1)) {
> -		__free_pages(page, order);
> +	if (folio && !folio_ref_freeze(folio, 1)) {
> +		folio_put(folio);

This made me look again at the problem we had in the past with
speculative refcount vs hugetlb pages, and made me think whether there
are any more users trying to defeat speculative refcounts this way.

It was discussed some time ago that maybe all pages returned from the
buddy allocator should have their refcounts frozen, to avoid this.
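The freeze trick under discussion boils down to a compare-and-swap from the
expected refcount to zero; it fails whenever a speculative walker has
transiently raised the count, which is exactly what the retry loop in the
patch papers over. A minimal userspace model using C11 atomics (an
illustration of the idiom only, not the kernel's actual page_ref_freeze()
or folio_ref_freeze() implementation; all names here are made up):

#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* Toy stand-in for a struct page's _refcount. */
struct toy_page {
	atomic_int refcount;
};

/* Model of page_ref_freeze(): succeed only if the count is exactly
 * 'expected', in which case it drops atomically to zero. */
static bool toy_ref_freeze(struct toy_page *p, int expected)
{
	int old = expected;
	return atomic_compare_exchange_strong(&p->refcount, &old, 0);
}

/* Model of a speculative walker: take a reference only if nonzero. */
static bool toy_get_unless_zero(struct toy_page *p)
{
	int old = atomic_load(&p->refcount);
	while (old != 0) {
		if (atomic_compare_exchange_weak(&p->refcount, &old, old + 1))
			return true;
	}
	return false;
}

int main(void)
{
	struct toy_page page = { .refcount = 1 };	/* fresh from allocator */

	toy_get_unless_zero(&page);			/* speculative reference */
	printf("freeze with speculative ref: %s\n",
	       toy_ref_freeze(&page, 1) ? "ok" : "fails");	/* fails: count is 2 */

	atomic_fetch_sub(&page.refcount, 1);		/* speculative ref dropped */
	printf("freeze after it is dropped: %s\n",
	       toy_ref_freeze(&page, 1) ? "ok" : "fails");	/* succeeds */
	return 0;
}

Freezing to zero also closes the window for new speculative references,
since get-unless-zero refuses a zero count.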
> On Apr 3, 2024, at 04:06, Matthew Wilcox (Oracle) <willy@infradead.org> wrote:
>
> While this function returned a folio, it was still using __alloc_pages()
> and __free_pages(). Use __folio_alloc() and folio_put() instead. This
> actually removes a call to compound_head(), but more importantly, it
> prepares us for the move to memdescs.
>
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>

Reviewed-by: Muchun Song <muchun.song@linux.dev>

Thanks.
> On Apr 3, 2024, at 12:55, Oscar Salvador <osalvador@suse.de> wrote:
>
> On Tue, Apr 02, 2024 at 09:06:54PM +0100, Matthew Wilcox (Oracle) wrote:
>> While this function returned a folio, it was still using __alloc_pages()
>> and __free_pages(). Use __folio_alloc() and folio_put() instead. This
>> actually removes a call to compound_head(), but more importantly, it
>> prepares us for the move to memdescs.
>>
>> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
>
> Reviewed-by: Oscar Salvador <osalvador@suse.de>
>
>> -	page = __alloc_pages(gfp_mask, order, nid, nmask);
>> +	folio = __folio_alloc(gfp_mask, order, nid, nmask);
>>
>> -	/* Freeze head page */
>> -	if (page && !page_ref_freeze(page, 1)) {
>> -		__free_pages(page, order);
>> +	if (folio && !folio_ref_freeze(folio, 1)) {
>> +		folio_put(folio);
>
> This made me look again at the problem we had in the past with
> speculative refcount vs hugetlb pages, and made me think whether there
> are any more users trying to defeat speculative refcounts this way.
> It was discussed some time ago that maybe all pages returned from the
> buddy allocator should have their refcounts frozen, to avoid this.

I think you mean this patch [1], right? With alloc_frozen_pages()
introduced, we could get rid of the trick from HugeTLB code.

[1] https://lore.kernel.org/linux-mm/B4889CC0-5D36-44E2-B901-CDC5226995A2@oracle.com/T/#m20457752e757cdafc99c7f5e6a3d8cbbb65fcd3e

>
> --
> Oscar Salvador
> SUSE Labs
On Wed, Apr 03, 2024 at 02:19:19PM +0800, Muchun Song wrote:
> I think you mean this patch [1], right? With alloc_frozen_pages()
> introduced, we could get rid of the trick from HugeTLB code.

Ah yes, that one, thanks.
It would be nice, but having read the discussion I am kind of skeptical.

But maybe something to revisit.
On Wed, Apr 03, 2024 at 09:25:41AM +0200, Oscar Salvador wrote:
> On Wed, Apr 03, 2024 at 02:19:19PM +0800, Muchun Song wrote:
> > I think you mean this patch [1], right? With alloc_frozen_pages()
> > introduced, we could get rid of the trick from HugeTLB code.
>
> Ah yes, that one, thanks.
> It would be nice, but having read the discussion I am kind of skeptical.
>
> But maybe something to revisit.

I haven't given up on it. It's just currently parked, awaiting more
cleanups, some of which I have scheduled for the next merge window.
Part of the memdesc project will involve not having refcounts for some
memdescs. Slab, percpu and pagetable don't need them, for example.

I think hugetlb is being unnecessarily paranoid here, tbh. Or maybe
this part is just badly structured; if we're allocating a hugetlb folio,
it should be fine for its refcount to be temporarily elevated by someone
else. Not sure I can figure out what's going on in
alloc_and_dissolve_hugetlb_folio() though.
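To make the parked idea concrete: if the allocator handed back folios that
are already frozen, the whole freeze-and-retry dance would vanish from
alloc_buddy_hugetlb_folio(). A sketch under that assumption, where
__folio_alloc_frozen() is hypothetical (modelled on the alloc_frozen_pages()
proposal Muchun linked above as [1]) and the node_alloc_noretry bookkeeping
is elided for brevity:

/* Sketch only: __folio_alloc_frozen() is assumed to return a folio with
 * a refcount of zero, so no freeze/retry is needed here and speculative
 * walkers can never take a reference in the first place.
 */
static struct folio *alloc_buddy_hugetlb_folio_frozen(struct hstate *h,
		gfp_t gfp_mask, int nid, nodemask_t *nmask)
{
	int order = huge_page_order(h);
	struct folio *folio;

	folio = __folio_alloc_frozen(gfp_mask, order, nid, nmask);
	if (!folio) {
		__count_vm_event(HTLB_BUDDY_PGALLOC_FAIL);
		return NULL;
	}

	__count_vm_event(HTLB_BUDDY_PGALLOC);
	return folio;	/* already frozen; nobody else holds a reference */
}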
On Wed, Apr 03, 2024 at 10:17:45PM +0100, Matthew Wilcox wrote:
> I think hugetlb is being unnecessarily paranoid here, tbh. Or maybe
> this part is just badly structured; if we're allocating a hugetlb folio,
> it should be fine for its refcount to be temporarily elevated by someone
> else. Not sure I can figure out what's going on in
> alloc_and_dissolve_hugetlb_folio() though.

AFAICR, the problem comes when we need to remap the pages for vmemmap
optimization [1].

So, IIUC:

1) if someone comes around and grabs a refcount (say something doing
   speculative stuff)
2) we do the remapping
3) that someone who took the refcount now does a put_page()
4) vmemmap no longer points to the old page but the new one, meaning
   that that put_page() is done on the wrong page.

@Muchun: Did I get this right?

[1] https://lore.kernel.org/linux-mm/YupRjWRiz4lPo+y7@FVFYT0MHHV2J/
> On Apr 4, 2024, at 19:13, Oscar Salvador <osalvador@suse.de> wrote:
>
> On Wed, Apr 03, 2024 at 10:17:45PM +0100, Matthew Wilcox wrote:
>> I think hugetlb is being unnecessarily paranoid here, tbh. Or maybe
>> this part is just badly structured; if we're allocating a hugetlb folio,
>> it should be fine for its refcount to be temporarily elevated by someone
>> else. Not sure I can figure out what's going on in
>> alloc_and_dissolve_hugetlb_folio() though.
>
> AFAICR, the problem comes when we need to remap the pages for vmemmap
> optimization [1].
> So, IIUC:
>
> 1) if someone comes around and grabs a refcount (say something doing
>    speculative stuff)
> 2) we do the remapping
> 3) that someone who took the refcount now does a put_page()
> 4) vmemmap no longer points to the old page but the new one, meaning
>    that that put_page() is done on the wrong page.
>
> @Muchun: Did I get this right?

Right. But let me clarify this again. We need to keep the contents of
the physical page constant throughout the processing of HVO (i.e.
between the copying of the head vmemmap page and the remapping of it).
A zero refcount prevents others from updating the contents of the page
structs in that window.

>
> [1] https://lore.kernel.org/linux-mm/YupRjWRiz4lPo+y7@FVFYT0MHHV2J/

Yes, I've also pointed it out here.

Thanks.

>
> --
> Oscar Salvador
> SUSE Labs
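A toy re-enactment of the lost-update window Oscar and Muchun describe:
plain single-threaded C standing in for two racing CPUs, with the vmemmap
reduced to one pointer that HVO flips from the old backing page to the
copied one. This is an illustration only, not kernel code:

#include <stdio.h>

/* Toy stand-in for the struct page that backs a vmemmap entry. */
struct toy_struct_page { int refcount; };

int main(void)
{
	struct toy_struct_page old_backing = { .refcount = 1 };
	struct toy_struct_page new_backing;
	struct toy_struct_page *vmemmap = &old_backing;

	/* CPU0 (HVO): copy the head vmemmap page; snapshots refcount == 1. */
	new_backing = old_backing;

	/* CPU1: speculative walker grabs a reference through vmemmap,
	 * which still points at the old backing page. Without the
	 * freeze, nothing stops this write from landing here. */
	vmemmap->refcount++;			/* old_backing: 2 */

	/* CPU0 (HVO): remap vmemmap to the copied page; CPU1's update
	 * above is now lost. */
	vmemmap = &new_backing;

	/* CPU1: drops its reference, but through the NEW mapping. */
	vmemmap->refcount--;			/* new_backing: 0 */

	/* The count hits zero while the hugetlb folio is still in use. */
	printf("refcount seen through vmemmap: %d\n", vmemmap->refcount);
	return 0;
}

Freezing the refcount to zero before the copy makes get_page_unless_zero()
fail for the whole window, which is what the hugetlb code is defending
against.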
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index a3bffa8debde..5f1e0b1a0d57 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -2177,13 +2177,13 @@ static struct folio *alloc_buddy_hugetlb_folio(struct hstate *h,
 					nodemask_t *node_alloc_noretry)
 {
 	int order = huge_page_order(h);
-	struct page *page;
+	struct folio *folio;
 	bool alloc_try_hard = true;
 	bool retry = true;
 
 	/*
-	 * By default we always try hard to allocate the page with
-	 * __GFP_RETRY_MAYFAIL flag. However, if we are allocating pages in
+	 * By default we always try hard to allocate the folio with
+	 * __GFP_RETRY_MAYFAIL flag. However, if we are allocating folios in
 	 * a loop (to adjust global huge page counts) and previous allocation
 	 * failed, do not continue to try hard on the same node. Use the
 	 * node_alloc_noretry bitmap to manage this state information.
@@ -2196,43 +2196,42 @@ static struct folio *alloc_buddy_hugetlb_folio(struct hstate *h,
 	if (nid == NUMA_NO_NODE)
 		nid = numa_mem_id();
 retry:
-	page = __alloc_pages(gfp_mask, order, nid, nmask);
+	folio = __folio_alloc(gfp_mask, order, nid, nmask);
 
-	/* Freeze head page */
-	if (page && !page_ref_freeze(page, 1)) {
-		__free_pages(page, order);
+	if (folio && !folio_ref_freeze(folio, 1)) {
+		folio_put(folio);
 		if (retry) {	/* retry once */
 			retry = false;
 			goto retry;
 		}
 		/* WOW! twice in a row. */
-		pr_warn("HugeTLB head page unexpected inflated ref count\n");
-		page = NULL;
+		pr_warn("HugeTLB unexpected inflated folio ref count\n");
+		folio = NULL;
 	}
 
 	/*
-	 * If we did not specify __GFP_RETRY_MAYFAIL, but still got a page this
-	 * indicates an overall state change. Clear bit so that we resume
-	 * normal 'try hard' allocations.
+	 * If we did not specify __GFP_RETRY_MAYFAIL, but still got a
+	 * folio this indicates an overall state change. Clear bit so
+	 * that we resume normal 'try hard' allocations.
 	 */
-	if (node_alloc_noretry && page && !alloc_try_hard)
+	if (node_alloc_noretry && folio && !alloc_try_hard)
 		node_clear(nid, *node_alloc_noretry);
 
 	/*
-	 * If we tried hard to get a page but failed, set bit so that
+	 * If we tried hard to get a folio but failed, set bit so that
 	 * subsequent attempts will not try as hard until there is an
 	 * overall state change.
 	 */
-	if (node_alloc_noretry && !page && alloc_try_hard)
+	if (node_alloc_noretry && !folio && alloc_try_hard)
 		node_set(nid, *node_alloc_noretry);
 
-	if (!page) {
+	if (!folio) {
 		__count_vm_event(HTLB_BUDDY_PGALLOC_FAIL);
 		return NULL;
 	}
 
 	__count_vm_event(HTLB_BUDDY_PGALLOC);
-	return page_folio(page);
+	return folio;
 }
 
 static struct folio *__alloc_fresh_hugetlb_folio(struct hstate *h,
While this function returned a folio, it was still using __alloc_pages()
and __free_pages(). Use __folio_alloc() and folio_put() instead. This
actually removes a call to compound_head(), but more importantly, it
prepares us for the move to memdescs.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 mm/hugetlb.c | 33 ++++++++++++++++-----------------
 1 file changed, 16 insertions(+), 17 deletions(-)