Message ID | 20210809184832.18342-2-mike.kravetz@oracle.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | hugetlb: fix potential ref counting races | expand |
On 2021-08-09 20:48, Mike Kravetz wrote: > Code in prep_compound_gigantic_page waits for a rcu grace period if it > notices a temporarily inflated ref count on a tail page. This was due > to the identified potential race with speculative page cache references > which could only last for a rcu grace period. This is overly > complicated > as this situation is VERY unlikely to ever happen. Instead, just > quickly > return an error. > Also, only print a warning in prep_compound_gigantic_page instead of > multiple callers. The above makes sense to me. My only question would be: gather_bootmem_prealloc() is an __init function. Can we have speculative refcounts due to e.g: pagecache at that early stage? I think we cannot, but I am not really sure. We might be able to remove that else() in case we cannot have such scenarios.
On 8/10/21 2:29 AM, Oscar Salvador wrote: > On 2021-08-09 20:48, Mike Kravetz wrote: >> Code in prep_compound_gigantic_page waits for a rcu grace period if it >> notices a temporarily inflated ref count on a tail page. This was due >> to the identified potential race with speculative page cache references >> which could only last for a rcu grace period. This is overly complicated >> as this situation is VERY unlikely to ever happen. Instead, just quickly >> return an error. >> Also, only print a warning in prep_compound_gigantic_page instead of >> multiple callers. > > The above makes sense to me. Thanks for taking a look! > My only question would be: gather_bootmem_prealloc() is an __init function. > Can we have speculative refcounts due to e.g: pagecache at that early stage? > I think we cannot, but I am not really sure. I had the same thought when adding that code. We cannot have a speculative refcount this early after boot. However, further thinking below... > > We might be able to remove that else() in case we cannot have such scenarios. > Instead of removing the else, I think we should put a BUG_ON() just to make sure the condition never happens in the future. Otherwise, we would just abandon the gigantic page and leave memory (1GB or so) unavailable. Even if it were possible to have speculative references this early in the boot process, the likelihood of it happening here is still VERY small. So, I would not expect a BUG_ON() to ever be hit in development or testing. Rather, with our luck it would show up in some production environment. Since handling the race in this routine is so simple, I chose to just add the code to handle it.
diff --git a/mm/hugetlb.c b/mm/hugetlb.c index dfc940d5221d..791ee699d635 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -1657,16 +1657,14 @@ static bool prep_compound_gigantic_page(struct page *page, unsigned int order) * cache adding could take a ref on a 'to be' tail page. * We need to respect any increased ref count, and only set * the ref count to zero if count is currently 1. If count - * is not 1, we call synchronize_rcu in the hope that a rcu - * grace period will cause ref count to drop and then retry. - * If count is still inflated on retry we return an error and - * must discard the pages. + * is not 1, we return an error. An error return indicates + * the set of pages can not be converted to a gigantic page. + * The caller who allocated the pages should then discard the + * pages using the appropriate free interface. */ if (!page_ref_freeze(p, 1)) { - pr_info("HugeTLB unexpected inflated ref count on freshly allocated page\n"); - synchronize_rcu(); - if (!page_ref_freeze(p, 1)) - goto out_error; + pr_warn("HugeTLB page can not be used due to unexpected inflated ref count\n"); + goto out_error; } set_page_count(p, 0); set_compound_head(p, page); @@ -1830,7 +1828,6 @@ static struct page *alloc_fresh_huge_page(struct hstate *h, retry = true; goto retry; } - pr_warn("HugeTLB page can not be used due to unexpected inflated ref count\n"); return NULL; } } @@ -2828,8 +2825,8 @@ static void __init gather_bootmem_prealloc(void) prep_new_huge_page(h, page, page_to_nid(page)); put_page(page); /* add to the hugepage allocator */ } else { + /* VERY unlikely inflated ref count on a tail page */ free_gigantic_page(page, huge_page_order(h)); - pr_warn("HugeTLB page can not be used due to unexpected inflated ref count\n"); } /*
Code in prep_compound_gigantic_page waits for a rcu grace period if it notices a temporarily inflated ref count on a tail page. This was due to the identified potential race with speculative page cache references which could only last for a rcu grace period. This is overly complicated as this situation is VERY unlikely to ever happen. Instead, just quickly return an error. Also, only print a warning in prep_compound_gigantic_page instead of multiple callers. Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> --- mm/hugetlb.c | 17 +++++++---------- 1 file changed, 7 insertions(+), 10 deletions(-)