Message ID | 20250304083841.283159-1-liushixin2@huawei.com (mailing list archive)
---|---
State | New
Series | [v2] mm/hugetlb: update nr_huge_pages and surplus_huge_pages together
On Tue, Mar 04, 2025 at 04:38:41PM +0800, Liu Shixin wrote:
> In alloc_surplus_hugetlb_folio(), we increase nr_huge_pages and
> surplus_huge_pages separately. In the middle window, if we set
> nr_hugepages to smaller and satisfy count < persistent_huge_pages(h),
> the surplus_huge_pages will be increased by adjust_pool_surplus().
>
> After adding delay in the middle window, we can reproduce the problem
> easily by following step:
>
> 1. echo 3 > /proc/sys/vm/nr_overcommit_hugepages
> 2. mmap two hugepages. When nr_huge_pages=2 and surplus_huge_pages=1,
>    goto step 3.
> 3. echo 0 > /proc/sys/vm/nr_huge_pages
>
> Finally, nr_huge_pages is less than surplus_huge_pages.
>
> To fix the problem, call only_alloc_fresh_hugetlb_folio() instead and
> move down __prep_account_new_huge_page() into the hugetlb_lock.
>
> Fixes: 0c397daea1d4 ("mm, hugetlb: further simplify hugetlb allocation API")
> Signed-off-by: Liu Shixin <liushixin2@huawei.com>
> ---
>  mm/hugetlb.c | 10 +++++++++-
>  1 file changed, 9 insertions(+), 1 deletion(-)
>
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 9faa1034704ff..0b02ea1c39e63 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -2253,11 +2253,19 @@ static struct folio *alloc_surplus_hugetlb_folio(struct hstate *h,
>  		goto out_unlock;
>  	spin_unlock_irq(&hugetlb_lock);
>
> -	folio = alloc_fresh_hugetlb_folio(h, gfp_mask, nid, nmask);
> +	folio = only_alloc_fresh_hugetlb_folio(h, gfp_mask, nid, nmask, NULL);
>  	if (!folio)
>  		return NULL;
>
> +	hugetlb_vmemmap_optimize_folio(h, folio);
> +
>  	spin_lock_irq(&hugetlb_lock);
> +	/*
> +	 * Update nr_huge_pages and surplus_huge_pages together,
> +	 * otherwise it might confuse persistent_huge_pages() momentarily.
> +	 */
> +	__prep_account_new_huge_page(h, nid);

It would be great if we could pair up this one with the actual increase of
surplus pages, but then free_huge_folio() will decrease the nr_huge_pages if
the pool changed.

Also, that comment makes me think that __prep_account_new_huge_page() will
adjust both counters, so maybe just go with something like "nr_huge_pages
needs to be adjusted within the same lock cycle as surplus_pages,
otherwise..."
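For readers skimming the thread, the "momentary confusion" that the new comment and the review refer to comes from the persistent pool size being derived from both counters. Below is a minimal userspace sketch of that relationship and of the shrink path that consumes it; the names mirror persistent_huge_pages() and the adjust_pool_surplus() loop in mm/hugetlb.c, but the struct and helpers are simplified illustrations, not the kernel code.

```c
/*
 * Simplified userspace model of the two hstate counters; not kernel code.
 * The persistent pool size is derived from both counters, so any window
 * where only one of them has been updated skews the result.
 */
#include <stdio.h>

struct hstate_model {
        unsigned long nr_huge_pages;       /* all hugetlb pages in the pool */
        unsigned long surplus_huge_pages;  /* overcommitted (temporary) pages */
};

/* Persistent pages are everything that is not surplus. */
static unsigned long persistent_huge_pages(const struct hstate_model *h)
{
        return h->nr_huge_pages - h->surplus_huge_pages;
}

/*
 * Rough model of the shrink side of set_max_huge_pages(): while the
 * requested count is below the persistent pool size, persistent pages
 * are converted to surplus (adjust_pool_surplus() in the kernel).
 */
static void shrink_pool_to(struct hstate_model *h, unsigned long count)
{
        while (count < persistent_huge_pages(h))
                h->surplus_huge_pages++;
}

int main(void)
{
        struct hstate_model h = { .nr_huge_pages = 3, .surplus_huge_pages = 0 };

        shrink_pool_to(&h, 1);
        /* nr=3 surplus=2 persistent=1: consistent, as long as the counters
         * are only read and written together under the same lock. */
        printf("nr=%lu surplus=%lu persistent=%lu\n",
               h.nr_huge_pages, h.surplus_huge_pages, persistent_huge_pages(&h));
        return 0;
}
```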
```diff
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 9faa1034704ff..0b02ea1c39e63 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -2253,11 +2253,19 @@ static struct folio *alloc_surplus_hugetlb_folio(struct hstate *h,
 		goto out_unlock;
 	spin_unlock_irq(&hugetlb_lock);
 
-	folio = alloc_fresh_hugetlb_folio(h, gfp_mask, nid, nmask);
+	folio = only_alloc_fresh_hugetlb_folio(h, gfp_mask, nid, nmask, NULL);
 	if (!folio)
 		return NULL;
 
+	hugetlb_vmemmap_optimize_folio(h, folio);
+
 	spin_lock_irq(&hugetlb_lock);
+	/*
+	 * Update nr_huge_pages and surplus_huge_pages together,
+	 * otherwise it might confuse persistent_huge_pages() momentarily.
+	 */
+	__prep_account_new_huge_page(h, nid);
+
 	/*
 	 * We could have raced with the pool size change.
 	 * Double check that and simply deallocate the new page
```
In alloc_surplus_hugetlb_folio(), we increase nr_huge_pages and
surplus_huge_pages separately. In the middle window, if we set
nr_hugepages to smaller and satisfy count < persistent_huge_pages(h),
the surplus_huge_pages will be increased by adjust_pool_surplus().

After adding delay in the middle window, we can reproduce the problem
easily by following step:

1. echo 3 > /proc/sys/vm/nr_overcommit_hugepages
2. mmap two hugepages. When nr_huge_pages=2 and surplus_huge_pages=1,
   goto step 3.
3. echo 0 > /proc/sys/vm/nr_huge_pages

Finally, nr_huge_pages is less than surplus_huge_pages.

To fix the problem, call only_alloc_fresh_hugetlb_folio() instead and
move down __prep_account_new_huge_page() into the hugetlb_lock.

Fixes: 0c397daea1d4 ("mm, hugetlb: further simplify hugetlb allocation API")
Signed-off-by: Liu Shixin <liushixin2@huawei.com>
---
 mm/hugetlb.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)
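To make the reproducer concrete, here is a sequential userspace sketch of the interleaving it triggers, reusing the same simplified counter model as above (again illustrative, not kernel code); the comments map each action back to the steps in the commit message.

```c
/*
 * Sequential trace of the race described above. "Task A" is the in-flight
 * surplus allocation in alloc_surplus_hugetlb_folio() before the patch;
 * "task B" is "echo 0 > nr_hugepages" landing between task A's two updates.
 */
#include <assert.h>
#include <stdio.h>

struct hstate_model {
        unsigned long nr_huge_pages;
        unsigned long surplus_huge_pages;
};

int main(void)
{
        /*
         * Step 2 of the reproducer: the second surplus allocation has
         * already bumped nr_huge_pages (old alloc_fresh_hugetlb_folio()
         * path) but not yet surplus_huge_pages, so nr=2, surplus=1.
         */
        struct hstate_model h = { .nr_huge_pages = 2, .surplus_huge_pages = 1 };

        /*
         * Step 3, task B: shrink to 0. persistent = nr - surplus = 1, so
         * the shrink loop converts one persistent page to surplus.
         */
        while (0 < h.nr_huge_pages - h.surplus_huge_pages)
                h.surplus_huge_pages++;          /* surplus: 1 -> 2 */

        /* Task A resumes and finally does its own surplus accounting. */
        h.surplus_huge_pages++;                  /* surplus: 2 -> 3 */

        printf("nr=%lu surplus=%lu\n", h.nr_huge_pages, h.surplus_huge_pages);

        /* The invariant the patch restores: surplus never exceeds nr.
         * This assertion fails here, demonstrating the broken state. */
        assert(h.surplus_huge_pages <= h.nr_huge_pages);
        return 0;
}
```

With the patch applied, __prep_account_new_huge_page() and the surplus accounting both happen within one hugetlb_lock critical section, so a concurrent shrink can no longer observe the half-updated state modelled here.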