diff mbox series

[1/4] mm/memory-failure.c: fix race with changing page compound again

Message ID 20220228140245.24552-2-linmiaohe@huawei.com (mailing list archive)
State New
Headers show
Series A few fixup patches for memory failure | expand

Commit Message

Miaohe Lin Feb. 28, 2022, 2:02 p.m. UTC
There is a race window where we got the compound_head, the hugetlb page
could be freed to buddy, or even changed to another compound page just
before we try to get hwpoison page. If this happens, just bail out.

Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
---
 mm/memory-failure.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

Comments

HORIGUCHI NAOYA(堀口 直也) March 4, 2022, 8:26 a.m. UTC | #1
On Mon, Feb 28, 2022 at 10:02:42PM +0800, Miaohe Lin wrote:
> There is a race window where we got the compound_head, the hugetlb page
> could be freed to buddy, or even changed to another compound page just
> before we try to get hwpoison page. If this happens, just bail out.

I think that when some hugetlb page is about to change into other type/size
of compound page, it has to go through buddy allocator because hugetlb pages
are maintained in separate memory allocator and they never change into other
normal state directly.  memory_failure_hugetlb() takes refcount before
lock_page(), so the hugetlb page seems not change between get_hwpoison_page()
and lock_page(). So it this new check really necessary?

Thanks,
Naoya Horiguchi

> 
> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
> ---
>  mm/memory-failure.c | 11 +++++++++++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index 5444a8ef4867..0d7c58340a98 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -1534,6 +1534,17 @@ static int memory_failure_hugetlb(unsigned long pfn, int flags)
>  	}
>  
>  	lock_page(head);
> +
> +	/**
> +	 * The page could have changed compound pages due to race window.
> +	 * If this happens just bail out.
> +	 */
> +	if (!PageHuge(p) || compound_head(p) != head) {
> +		action_result(pfn, MF_MSG_DIFFERENT_COMPOUND, MF_IGNORED);
> +		res = -EBUSY;
> +		goto out;
> +	}
> +
>  	page_flags = head->flags;
>  
>  	if (hwpoison_filter(p)) {
> -- 
> 2.23.0
Mike Kravetz March 4, 2022, 7:32 p.m. UTC | #2
On 3/4/22 00:26, HORIGUCHI NAOYA(堀口 直也) wrote:
> On Mon, Feb 28, 2022 at 10:02:42PM +0800, Miaohe Lin wrote:
>> There is a race window where we got the compound_head, the hugetlb page
>> could be freed to buddy, or even changed to another compound page just
>> before we try to get hwpoison page. If this happens, just bail out.
> 
> I think that when some hugetlb page is about to change into other type/size
> of compound page, it has to go through buddy allocator because hugetlb pages
> are maintained in separate memory allocator and they never change into other
> normal state directly.  memory_failure_hugetlb() takes refcount before
> lock_page(), so the hugetlb page seems not change between get_hwpoison_page()
> and lock_page(). So it this new check really necessary?

A hugetlb page could change size without going through buddy via the new
demote functionality [1].  Only hugetlb pages on the hugetlb free list can
be demoted.  

We should not demote a page if poison is set.  However, there is no check in
the demote code.  IIUC, poison is set early in the memory error handling
process, even before taking ref on page.  Demote code needs to be fixed so
that poisoned pages are not demoted.  I can do that.

With this change in place, then I think Naoya's statement that hugetlb pages
can not change state is correct and this patch is not necessary.

Does that sound reasonable?

[1] https://lore.kernel.org/linux-mm/20211007181918.136982-1-mike.kravetz@oracle.com/
Miaohe Lin March 7, 2022, 3:44 a.m. UTC | #3
On 2022/3/5 3:32, Mike Kravetz wrote:
> On 3/4/22 00:26, HORIGUCHI NAOYA(堀口 直也) wrote:
>> On Mon, Feb 28, 2022 at 10:02:42PM +0800, Miaohe Lin wrote:
>>> There is a race window where we got the compound_head, the hugetlb page
>>> could be freed to buddy, or even changed to another compound page just
>>> before we try to get hwpoison page. If this happens, just bail out.
>>
>> I think that when some hugetlb page is about to change into other type/size
>> of compound page, it has to go through buddy allocator because hugetlb pages
>> are maintained in separate memory allocator and they never change into other
>> normal state directly.  memory_failure_hugetlb() takes refcount before
>> lock_page(), so the hugetlb page seems not change between get_hwpoison_page()
>> and lock_page(). So it this new check really necessary?
> 
> A hugetlb page could change size without going through buddy via the new
> demote functionality [1].  Only hugetlb pages on the hugetlb free list can
> be demoted.  
> 
> We should not demote a page if poison is set.  However, there is no check in
> the demote code.  IIUC, poison is set early in the memory error handling
> process, even before taking ref on page.  Demote code needs to be fixed so
> that poisoned pages are not demoted.  I can do that.
> 
> With this change in place, then I think Naoya's statement that hugetlb pages
> can not change state is correct and this patch is not necessary.
> 

Sorry for my confusing commit words. What I mean to tell is indeed the below race:
  CPU 1							CPU 2
  memory_failure_hugetlb
  struct page *head = compound_head(p);
							hugetlb page is freed to buddy, or
							even changed to another compound page
							as we haven't held the page refcnt now
  get_hwpoison_page -- page is not what we want now...

Does this make sense for both of you? Many thanks for comment and reply! :)

> Does that sound reasonable?
> 
> [1] https://lore.kernel.org/linux-mm/20211007181918.136982-1-mike.kravetz@oracle.com/

This is really a nice feature. Thanks.

>
HORIGUCHI NAOYA(堀口 直也) March 7, 2022, 7:01 a.m. UTC | #4
On Mon, Mar 07, 2022 at 11:44:20AM +0800, Miaohe Lin wrote:
> On 2022/3/5 3:32, Mike Kravetz wrote:
> > On 3/4/22 00:26, HORIGUCHI NAOYA(堀口 直也) wrote:
> >> On Mon, Feb 28, 2022 at 10:02:42PM +0800, Miaohe Lin wrote:
> >>> There is a race window where we got the compound_head, the hugetlb page
> >>> could be freed to buddy, or even changed to another compound page just
> >>> before we try to get hwpoison page. If this happens, just bail out.
> >>
> >> I think that when some hugetlb page is about to change into other type/size
> >> of compound page, it has to go through buddy allocator because hugetlb pages
> >> are maintained in separate memory allocator and they never change into other
> >> normal state directly.  memory_failure_hugetlb() takes refcount before
> >> lock_page(), so the hugetlb page seems not change between get_hwpoison_page()
> >> and lock_page(). So it this new check really necessary?
> > 
> > A hugetlb page could change size without going through buddy via the new
> > demote functionality [1].  Only hugetlb pages on the hugetlb free list can
> > be demoted.  
> > 
> > We should not demote a page if poison is set.  However, there is no check in
> > the demote code.  IIUC, poison is set early in the memory error handling
> > process, even before taking ref on page.  Demote code needs to be fixed so
> > that poisoned pages are not demoted.  I can do that.
> > 
> > With this change in place, then I think Naoya's statement that hugetlb pages
> > can not change state is correct and this patch is not necessary.
> > 
> 
> Sorry for my confusing commit words. What I mean to tell is indeed the below race:
>   CPU 1							CPU 2
>   memory_failure_hugetlb
>   struct page *head = compound_head(p);
> 							hugetlb page is freed to buddy, or
> 							even changed to another compound page
> 							as we haven't held the page refcnt now
>   get_hwpoison_page -- page is not what we want now...
> 
> Does this make sense for both of you? Many thanks for comment and reply! :)

Thanks for elaboration, I agree with you (I simply overlooked this race, sorry).
And please add this in the commit log.

> +
> +       /**
> +        * The page could have changed compound pages due to race window.
> +        * If this happens just bail out.
> +        */
> +       if (!PageHuge(p) || compound_head(p) != head) {
> +               action_result(pfn, MF_MSG_DIFFERENT_COMPOUND, MF_IGNORED);
> +               res = -EBUSY;
> +               goto out;
> +       }

Let me have one comment on the diff. The result code MF_MSG_DIFFERENT_COMPOUND
might not fit when PageHuge is false in the check (because it's no longer a
compound page).  Maybe you may invent another result code, or changes
MF_MSG_DIFFERENT_COMPOUND (for example) to MF_MSG_DIFFERENT_PAGE_SIZE?

> 
> > Does that sound reasonable?
> > 
> > [1] https://lore.kernel.org/linux-mm/20211007181918.136982-1-mike.kravetz@oracle.com/
> 
> This is really a nice feature. Thanks.

Yes. Hugepage demotion after entering memory_failure_hugetlb() should
cause the race, so we need the check after page lock.

Thanks,
Naoya Horiguchi
Mike Kravetz March 7, 2022, 7:07 p.m. UTC | #5
On 3/6/22 23:01, HORIGUCHI NAOYA(堀口 直也) wrote:
> On Mon, Mar 07, 2022 at 11:44:20AM +0800, Miaohe Lin wrote:
>> On 2022/3/5 3:32, Mike Kravetz wrote:
>>> On 3/4/22 00:26, HORIGUCHI NAOYA(堀口 直也) wrote:
>>>> On Mon, Feb 28, 2022 at 10:02:42PM +0800, Miaohe Lin wrote:
>>>>> There is a race window where we got the compound_head, the hugetlb page
>>>>> could be freed to buddy, or even changed to another compound page just
>>>>> before we try to get hwpoison page. If this happens, just bail out.
>>>>
>>>> I think that when some hugetlb page is about to change into other type/size
>>>> of compound page, it has to go through buddy allocator because hugetlb pages
>>>> are maintained in separate memory allocator and they never change into other
>>>> normal state directly.  memory_failure_hugetlb() takes refcount before
>>>> lock_page(), so the hugetlb page seems not change between get_hwpoison_page()
>>>> and lock_page(). So it this new check really necessary?
>>>
>>> A hugetlb page could change size without going through buddy via the new
>>> demote functionality [1].  Only hugetlb pages on the hugetlb free list can
>>> be demoted.  
>>>
>>> We should not demote a page if poison is set.  However, there is no check in
>>> the demote code.  IIUC, poison is set early in the memory error handling
>>> process, even before taking ref on page.  Demote code needs to be fixed so
>>> that poisoned pages are not demoted.  I can do that.
>>>
>>> With this change in place, then I think Naoya's statement that hugetlb pages
>>> can not change state is correct and this patch is not necessary.
>>>
>>
>> Sorry for my confusing commit words. What I mean to tell is indeed the below race:
>>   CPU 1							CPU 2
>>   memory_failure_hugetlb
>>   struct page *head = compound_head(p);
>> 							hugetlb page is freed to buddy, or
>> 							even changed to another compound page
>> 							as we haven't held the page refcnt now
>>   get_hwpoison_page -- page is not what we want now...
>>
>> Does this make sense for both of you? Many thanks for comment and reply! :)
> 
> Thanks for elaboration, I agree with you (I simply overlooked this race, sorry).

Yes, thank you.

> And please add this in the commit log.
> 
>> +
>> +       /**
>> +        * The page could have changed compound pages due to race window.
>> +        * If this happens just bail out.
>> +        */
>> +       if (!PageHuge(p) || compound_head(p) != head) {
>> +               action_result(pfn, MF_MSG_DIFFERENT_COMPOUND, MF_IGNORED);
>> +               res = -EBUSY;
>> +               goto out;
>> +       }
> 
> Let me have one comment on the diff. The result code MF_MSG_DIFFERENT_COMPOUND
> might not fit when PageHuge is false in the check (because it's no longer a
> compound page).  Maybe you may invent another result code, or changes
> MF_MSG_DIFFERENT_COMPOUND (for example) to MF_MSG_DIFFERENT_PAGE_SIZE?
> 

Suppose we do encounter this race.  Also, suppose p != head.
At the beginning of memory_failure_hugetlb, we do:

struct page *head = compound_head(p);
...
if (TestSetPageHWPoison(head))

So, it could be that we set Poison in the 'head' page but the error was really
in another page.  Is that correct?

Now with the race, head is not a huge page and the pages could even be on
buddy.  Does this mean we could have poison set on the wrong page in buddy?
HORIGUCHI NAOYA(堀口 直也) March 8, 2022, 6:56 a.m. UTC | #6
On Mon, Mar 07, 2022 at 11:07:32AM -0800, Mike Kravetz wrote:
...
> > 
> >> +
> >> +       /**
> >> +        * The page could have changed compound pages due to race window.
> >> +        * If this happens just bail out.
> >> +        */
> >> +       if (!PageHuge(p) || compound_head(p) != head) {
> >> +               action_result(pfn, MF_MSG_DIFFERENT_COMPOUND, MF_IGNORED);
> >> +               res = -EBUSY;
> >> +               goto out;
> >> +       }
> > 
> > Let me have one comment on the diff. The result code MF_MSG_DIFFERENT_COMPOUND
> > might not fit when PageHuge is false in the check (because it's no longer a
> > compound page).  Maybe you may invent another result code, or changes
> > MF_MSG_DIFFERENT_COMPOUND (for example) to MF_MSG_DIFFERENT_PAGE_SIZE?
> > 
> 
> Suppose we do encounter this race.  Also, suppose p != head.
> At the beginning of memory_failure_hugetlb, we do:
> 
> struct page *head = compound_head(p);
> ...
> if (TestSetPageHWPoison(head))
> 
> So, it could be that we set Poison in the 'head' page but the error was really
> in another page.  Is that correct?
> 
> Now with the race, head is not a huge page and the pages could even be on
> buddy.  Does this mean we could have poison set on the wrong page in buddy?

Correct, the race might be rare, but this needs a fix.
I think that setting PageHWPoison first (before taking refcount and page lock)
is the root of all related problems.  This behavior came from the original
concept in hwpoison that preventing consumption of corrupted data is the first
priority.  But now I think that this makes no sense if we have this kind of bugs.

I'll try to write a patch for this (I only fix memory_failure_hugetlb() first,
but generic path should be fixed later).
Thank you for pointing out.

- Naoya Horiguchi
Miaohe Lin March 8, 2022, 11:28 a.m. UTC | #7
On 2022/3/8 14:56, HORIGUCHI NAOYA(堀口 直也) wrote:
> On Mon, Mar 07, 2022 at 11:07:32AM -0800, Mike Kravetz wrote:
> ...
>>>
>>>> +
>>>> +       /**
>>>> +        * The page could have changed compound pages due to race window.
>>>> +        * If this happens just bail out.
>>>> +        */
>>>> +       if (!PageHuge(p) || compound_head(p) != head) {
>>>> +               action_result(pfn, MF_MSG_DIFFERENT_COMPOUND, MF_IGNORED);
>>>> +               res = -EBUSY;
>>>> +               goto out;
>>>> +       }
>>>
>>> Let me have one comment on the diff. The result code MF_MSG_DIFFERENT_COMPOUND
>>> might not fit when PageHuge is false in the check (because it's no longer a
>>> compound page).  Maybe you may invent another result code, or changes
>>> MF_MSG_DIFFERENT_COMPOUND (for example) to MF_MSG_DIFFERENT_PAGE_SIZE?
>>>
>>
>> Suppose we do encounter this race.  Also, suppose p != head.
>> At the beginning of memory_failure_hugetlb, we do:
>>
>> struct page *head = compound_head(p);
>> ...
>> if (TestSetPageHWPoison(head))
>>
>> So, it could be that we set Poison in the 'head' page but the error was really
>> in another page.  Is that correct?
>>
>> Now with the race, head is not a huge page and the pages could even be on
>> buddy.  Does this mean we could have poison set on the wrong page in buddy?
> 
> Correct, the race might be rare, but this needs a fix.
> I think that setting PageHWPoison first (before taking refcount and page lock)
> is the root of all related problems.  This behavior came from the original
> concept in hwpoison that preventing consumption of corrupted data is the first
> priority.  But now I think that this makes no sense if we have this kind of bugs.
> 
> I'll try to write a patch for this (I only fix memory_failure_hugetlb() first,
> but generic path should be fixed later).
> Thank you for pointing out.

Many thanks for both of you for doing this. :)

> 
> - Naoya Horiguchi
>
diff mbox series

Patch

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 5444a8ef4867..0d7c58340a98 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1534,6 +1534,17 @@  static int memory_failure_hugetlb(unsigned long pfn, int flags)
 	}
 
 	lock_page(head);
+
+	/**
+	 * The page could have changed compound pages due to race window.
+	 * If this happens just bail out.
+	 */
+	if (!PageHuge(p) || compound_head(p) != head) {
+		action_result(pfn, MF_MSG_DIFFERENT_COMPOUND, MF_IGNORED);
+		res = -EBUSY;
+		goto out;
+	}
+
 	page_flags = head->flags;
 
 	if (hwpoison_filter(p)) {