diff mbox series

[v2,1/3] hugetlb: simplify prep_compound_gigantic_page ref count racing code

Message ID 20210809184832.18342-2-mike.kravetz@oracle.com (mailing list archive)
State New
Headers show
Series hugetlb: fix potential ref counting races | expand

Commit Message

Mike Kravetz Aug. 9, 2021, 6:48 p.m. UTC
Code in prep_compound_gigantic_page waits for a rcu grace period if it
notices a temporarily inflated ref count on a tail page.  This was due
to the identified potential race with speculative page cache references
which could only last for a rcu grace period.  This is overly complicated
as this situation is VERY unlikely to ever happen.  Instead, just quickly
return an error.

Also, only print a warning in prep_compound_gigantic_page instead of
multiple callers.

Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
---
 mm/hugetlb.c | 17 +++++++----------
 1 file changed, 7 insertions(+), 10 deletions(-)

Comments

Oscar Salvador Aug. 10, 2021, 9:29 a.m. UTC | #1
On 2021-08-09 20:48, Mike Kravetz wrote:
> Code in prep_compound_gigantic_page waits for a rcu grace period if it
> notices a temporarily inflated ref count on a tail page.  This was due
> to the identified potential race with speculative page cache references
> which could only last for a rcu grace period.  This is overly 
> complicated
> as this situation is VERY unlikely to ever happen.  Instead, just 
> quickly
> return an error.
> Also, only print a warning in prep_compound_gigantic_page instead of
> multiple callers.

The above makes sense to me.
My only question would be: gather_bootmem_prealloc() is an __init 
function.
Can we have speculative refcounts due to e.g: pagecache at that early 
stage?
I think we cannot, but I am not really sure.

We might be able to remove that else() in case we cannot have such 
scenarios.
Mike Kravetz Aug. 10, 2021, 4:51 p.m. UTC | #2
On 8/10/21 2:29 AM, Oscar Salvador wrote:
> On 2021-08-09 20:48, Mike Kravetz wrote:
>> Code in prep_compound_gigantic_page waits for a rcu grace period if it
>> notices a temporarily inflated ref count on a tail page.  This was due
>> to the identified potential race with speculative page cache references
>> which could only last for a rcu grace period.  This is overly complicated
>> as this situation is VERY unlikely to ever happen.  Instead, just quickly
>> return an error.
>> Also, only print a warning in prep_compound_gigantic_page instead of
>> multiple callers.
> 
> The above makes sense to me.

Thanks for taking a look!

> My only question would be: gather_bootmem_prealloc() is an __init function.
> Can we have speculative refcounts due to e.g: pagecache at that early stage?
> I think we cannot, but I am not really sure.

I had the same thought when adding that code.  We cannot have a
speculative refcount this early after boot.  However, further
thinking below...

> 
> We might be able to remove that else() in case we cannot have such scenarios.
> 

Instead of removing the else, I think we should put a BUG_ON() just to
make sure the condition never happens in the future.  Otherwise, we would
just abandon the gigantic page and leave memory (1GB or so) unavailable.

Even if it were possible to have speculative references this early in
the boot process, the likelihood of it happening here is still VERY
small.  So, I would not expect a BUG_ON() to ever be hit in development or
testing.  Rather, with our luck it would show up in some production
environment.

Since handling the race in this routine is so simple, I chose to just
add the code to handle it.
diff mbox series

Patch

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index dfc940d5221d..791ee699d635 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1657,16 +1657,14 @@  static bool prep_compound_gigantic_page(struct page *page, unsigned int order)
 		 * cache adding could take a ref on a 'to be' tail page.
 		 * We need to respect any increased ref count, and only set
 		 * the ref count to zero if count is currently 1.  If count
-		 * is not 1, we call synchronize_rcu in the hope that a rcu
-		 * grace period will cause ref count to drop and then retry.
-		 * If count is still inflated on retry we return an error and
-		 * must discard the pages.
+		 * is not 1, we return an error.  An error return indicates
+		 * the set of pages can not be converted to a gigantic page.
+		 * The caller who allocated the pages should then discard the
+		 * pages using the appropriate free interface.
 		 */
 		if (!page_ref_freeze(p, 1)) {
-			pr_info("HugeTLB unexpected inflated ref count on freshly allocated page\n");
-			synchronize_rcu();
-			if (!page_ref_freeze(p, 1))
-				goto out_error;
+			pr_warn("HugeTLB page can not be used due to unexpected inflated ref count\n");
+			goto out_error;
 		}
 		set_page_count(p, 0);
 		set_compound_head(p, page);
@@ -1830,7 +1828,6 @@  static struct page *alloc_fresh_huge_page(struct hstate *h,
 				retry = true;
 				goto retry;
 			}
-			pr_warn("HugeTLB page can not be used due to unexpected inflated ref count\n");
 			return NULL;
 		}
 	}
@@ -2828,8 +2825,8 @@  static void __init gather_bootmem_prealloc(void)
 			prep_new_huge_page(h, page, page_to_nid(page));
 			put_page(page); /* add to the hugepage allocator */
 		} else {
+			/* VERY unlikely inflated ref count on a tail page */
 			free_gigantic_page(page, huge_page_order(h));
-			pr_warn("HugeTLB page can not be used due to unexpected inflated ref count\n");
 		}
 
 		/*