diff mbox series

[6/8] mm/memory-failure: Convert memory_failure() to use a folio

Message ID 20240229212036.2160900-7-willy@infradead.org (mailing list archive)
State New
Headers show
Series Some cleanups for memory-failure | expand

Commit Message

Matthew Wilcox (Oracle) Feb. 29, 2024, 9:20 p.m. UTC
Saves dozens of calls to compound_head().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 mm/memory-failure.c | 40 +++++++++++++++++++++-------------------
 1 file changed, 21 insertions(+), 19 deletions(-)

Comments

Miaohe Lin March 8, 2024, 8:48 a.m. UTC | #1
On 2024/3/1 5:20, Matthew Wilcox (Oracle) wrote:
> Saves dozens of calls to compound_head().
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
>  mm/memory-failure.c | 40 +++++++++++++++++++++-------------------
>  1 file changed, 21 insertions(+), 19 deletions(-)
> 
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index fe4959e994d0..74e87a0a792c 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -2189,7 +2189,7 @@ static int memory_failure_dev_pagemap(unsigned long pfn, int flags,
>  int memory_failure(unsigned long pfn, int flags)
>  {
>  	struct page *p;
> -	struct page *hpage;
> +	struct folio *folio;
>  	struct dev_pagemap *pgmap;
>  	int res = 0;
>  	unsigned long page_flags;
> @@ -2277,8 +2277,8 @@ int memory_failure(unsigned long pfn, int flags)
>  		}
>  	}
>  
> -	hpage = compound_head(p);
> -	if (PageTransHuge(hpage)) {
> +	folio = page_folio(p);
> +	if (folio_test_large(folio)) {
>  		/*
>  		 * The flag must be set after the refcount is bumped
>  		 * otherwise it may race with THP split.
> @@ -2292,12 +2292,13 @@ int memory_failure(unsigned long pfn, int flags)
>  		 * or unhandlable page.  The refcount is bumped iff the
>  		 * page is a valid handlable page.
>  		 */
> -		SetPageHasHWPoisoned(hpage);
> +		folio_set_has_hwpoisoned(folio);
>  		if (try_to_split_thp_page(p) < 0) {
>  			res = action_result(pfn, MF_MSG_UNSPLIT_THP, MF_IGNORED);
>  			goto unlock_mutex;
>  		}
>  		VM_BUG_ON_PAGE(!page_count(p), p);
> +		folio = page_folio(p);
>  	}
>  
>  	/*
> @@ -2308,9 +2309,9 @@ int memory_failure(unsigned long pfn, int flags)
>  	 * The check (unnecessarily) ignores LRU pages being isolated and
>  	 * walked by the page reclaim code, however that's not a big loss.
>  	 */
> -	shake_page(p);
> +	shake_folio(folio);
>  
> -	lock_page(p);
> +	folio_lock(folio);
>  
>  	/*
>  	 * We're only intended to deal with the non-Compound page here.
> @@ -2318,11 +2319,11 @@ int memory_failure(unsigned long pfn, int flags)
>  	 * race window. If this happens, we could try again to hopefully
>  	 * handle the page next round.
>  	 */
> -	if (PageCompound(p)) {
> +	if (folio_test_large(folio)) {

folio_test_large() only checks whether PG_head is set but PageCompound() also checks PageTail().
So folio_test_large() and PageCompound() are not equivalent?

Thanks.
Matthew Wilcox (Oracle) March 11, 2024, 12:31 p.m. UTC | #2
On Fri, Mar 08, 2024 at 04:48:33PM +0800, Miaohe Lin wrote:
> On 2024/3/1 5:20, Matthew Wilcox (Oracle) wrote:
> > @@ -2277,8 +2277,8 @@ int memory_failure(unsigned long pfn, int flags)
> >  		}
> >  	}
> >  
> > -	hpage = compound_head(p);
> > -	if (PageTransHuge(hpage)) {
> > +	folio = page_folio(p);
> > +	if (folio_test_large(folio)) {
> >  		/*
> >  		 * The flag must be set after the refcount is bumped
> >  		 * otherwise it may race with THP split.
[...]
> > @@ -2318,11 +2319,11 @@ int memory_failure(unsigned long pfn, int flags)
> >  	 * race window. If this happens, we could try again to hopefully
> >  	 * handle the page next round.
> >  	 */
> > -	if (PageCompound(p)) {
> > +	if (folio_test_large(folio)) {
> 
> folio_test_large() only checks whether PG_head is set but PageCompound() also checks PageTail().
> So folio_test_large() and PageCompound() are not equivalent?

Assuming we have a refcount on this page so it can't be simultaneously
split/freed/whatever, these three sequences are equivalent:

1	if (PageCompound(p))

2	struct page *head = compound_head(p);
2	if (PageHead(head))

3	struct folio *folio = page_folio(p);
3	if (folio_test_large(folio))
Miaohe Lin March 12, 2024, 7:07 a.m. UTC | #3
On 2024/3/11 20:31, Matthew Wilcox wrote:
> On Fri, Mar 08, 2024 at 04:48:33PM +0800, Miaohe Lin wrote:
>> On 2024/3/1 5:20, Matthew Wilcox (Oracle) wrote:
>>> @@ -2277,8 +2277,8 @@ int memory_failure(unsigned long pfn, int flags)
>>>  		}
>>>  	}
>>>  
>>> -	hpage = compound_head(p);
>>> -	if (PageTransHuge(hpage)) {
>>> +	folio = page_folio(p);
>>> +	if (folio_test_large(folio)) {
>>>  		/*
>>>  		 * The flag must be set after the refcount is bumped
>>>  		 * otherwise it may race with THP split.
> [...]
>>> @@ -2318,11 +2319,11 @@ int memory_failure(unsigned long pfn, int flags)
>>>  	 * race window. If this happens, we could try again to hopefully
>>>  	 * handle the page next round.
>>>  	 */
>>> -	if (PageCompound(p)) {
>>> +	if (folio_test_large(folio)) {
>>
>> folio_test_large() only checks whether PG_head is set but PageCompound() also checks PageTail().
>> So folio_test_large() and PageCompound() are not equivalent?
> 
> Assuming we have a refcount on this page so it can't be simultaneously
> split/freed/whatever, these three sequences are equivalent:

If page is stable after page refcnt is held, I agree below three sequences are equivalent.

> 
> 1	if (PageCompound(p))
> 
> 2	struct page *head = compound_head(p);
> 2	if (PageHead(head))
> 
> 3	struct folio *folio = page_folio(p);
> 3	if (folio_test_large(folio))
> 
> .
> 

But please see below commit:

"""
commit f37d4298aa7f8b74395aa13c728677e2ed86fdaf
Author: Andi Kleen <ak@linux.intel.com>
Date:   Wed Aug 6 16:06:49 2014 -0700

    hwpoison: fix race with changing page during offlining

    When a hwpoison page is locked it could change state due to parallel
    modifications.  The original compound page can be torn down and then
    this 4k page becomes part of a differently-size compound page is is a
    standalone regular page.

    Check after the lock if the page is still the same compound page.

    We could go back, grab the new head page and try again but it should be
    quite rare, so I thought this was safest.  A retry loop would be more
    difficult to test and may have more side effects.

    The hwpoison code by design only tries to handle cases that are
    reasonably common in workloads, as visible in page-flags.

    I'm not really that concerned about handling this (likely rare case),
    just not crashing on it.

    Signed-off-by: Andi Kleen <ak@linux.intel.com>
    Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index a013bc94ebbe..44c6bd201d3a 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1172,6 +1172,16 @@ int memory_failure(unsigned long pfn, int trapno, int flags)

 	lock_page(hpage);

+	/*
+	 * The page could have changed compound pages during the locking.
+	 * If this happens just bail out.
+	 */
+	if (compound_head(p) != hpage) {
+		action_result(pfn, "different compound page after locking", IGNORED);
+		res = -EBUSY;
+		goto out;
+	}
+
 	/*
 	 * We use page flags to determine what action should be taken, but
 	 * the flags can be modified by the error containment action.  One

"""

It says a page could still change to a differently-size compound page due to parallel
modifications even if extra page refcnt is held and page is locked. But this commit is
early (ten years ago) things might have changed. Any thoughts?

Thanks.
Matthew Wilcox (Oracle) March 12, 2024, 2:14 p.m. UTC | #4
On Tue, Mar 12, 2024 at 03:07:39PM +0800, Miaohe Lin wrote:
> On 2024/3/11 20:31, Matthew Wilcox wrote:
> > Assuming we have a refcount on this page so it can't be simultaneously
> > split/freed/whatever, these three sequences are equivalent:
> 
> If page is stable after page refcnt is held, I agree below three sequences are equivalent.
> 
> > 
> > 1	if (PageCompound(p))
> > 
> > 2	struct page *head = compound_head(p);
> > 2	if (PageHead(head))
> > 
> > 3	struct folio *folio = page_folio(p);
> > 3	if (folio_test_large(folio))
> > 
> > .
> > 
> 
> But please see below commit:
> 
> """
> commit f37d4298aa7f8b74395aa13c728677e2ed86fdaf
> Author: Andi Kleen <ak@linux.intel.com>
> Date:   Wed Aug 6 16:06:49 2014 -0700
> 
>     hwpoison: fix race with changing page during offlining
> 
>     When a hwpoison page is locked it could change state due to parallel
>     modifications.  The original compound page can be torn down and then
>     this 4k page becomes part of a differently-size compound page is is a
>     standalone regular page.
> 
>     Check after the lock if the page is still the same compound page.

I can't speak to what the rules were ten years ago, but this is not
true now.  Compound pages cannot be split if you hold a refcount.
Since we don't track a per-page refcount, we wouldn't know which of
the split pages to give the excess refcount to.
Jane Chu March 13, 2024, 1:23 a.m. UTC | #5
On 3/12/2024 7:14 AM, Matthew Wilcox wrote:

> On Tue, Mar 12, 2024 at 03:07:39PM +0800, Miaohe Lin wrote:
>> On 2024/3/11 20:31, Matthew Wilcox wrote:
>>> Assuming we have a refcount on this page so it can't be simultaneously
>>> split/freed/whatever, these three sequences are equivalent:
>> If page is stable after page refcnt is held, I agree below three sequences are equivalent.
>>
>>> 1	if (PageCompound(p))
>>>
>>> 2	struct page *head = compound_head(p);
>>> 2	if (PageHead(head))
>>>
>>> 3	struct folio *folio = page_folio(p);
>>> 3	if (folio_test_large(folio))
>>>
>>> .
>>>
>> But please see below commit:
>>
>> """
>> commit f37d4298aa7f8b74395aa13c728677e2ed86fdaf
>> Author: Andi Kleen <ak@linux.intel.com>
>> Date:   Wed Aug 6 16:06:49 2014 -0700
>>
>>      hwpoison: fix race with changing page during offlining
>>
>>      When a hwpoison page is locked it could change state due to parallel
>>      modifications.  The original compound page can be torn down and then
>>      this 4k page becomes part of a differently-size compound page is is a
>>      standalone regular page.
>>
>>      Check after the lock if the page is still the same compound page.
> I can't speak to what the rules were ten years ago, but this is not
> true now.  Compound pages cannot be split if you hold a refcount.
> Since we don't track a per-page refcount, we wouldn't know which of
> the split pages to give the excess refcount to.

I noticed this recently

  * GUP pin and PG_locked transferred to @page. Rest subpages can be 
freed if
  * they are not mapped.
  *
  * Returns 0 if the hugepage is split successfully.
  * Returns -EBUSY if the page is pinned or if anon_vma disappeared from 
under
  * us.
  */
int split_huge_page_to_list(struct page *page, struct list_head *list)
{

I have a test case with poisoned shmem THP page that was mlocked and

GUP pinned (FOLL_LONGTERM|FOLL_WRITE), but the split succeeded.

thanks,

-jane
Miaohe Lin March 14, 2024, 2:34 a.m. UTC | #6
On 2024/3/13 9:23, Jane Chu wrote:
> On 3/12/2024 7:14 AM, Matthew Wilcox wrote:
> 
>> On Tue, Mar 12, 2024 at 03:07:39PM +0800, Miaohe Lin wrote:
>>> On 2024/3/11 20:31, Matthew Wilcox wrote:
>>>> Assuming we have a refcount on this page so it can't be simultaneously
>>>> split/freed/whatever, these three sequences are equivalent:
>>> If page is stable after page refcnt is held, I agree below three sequences are equivalent.
>>>
>>>> 1    if (PageCompound(p))
>>>>
>>>> 2    struct page *head = compound_head(p);
>>>> 2    if (PageHead(head))
>>>>
>>>> 3    struct folio *folio = page_folio(p);
>>>> 3    if (folio_test_large(folio))
>>>>
>>>> .
>>>>
>>> But please see below commit:
>>>
>>> """
>>> commit f37d4298aa7f8b74395aa13c728677e2ed86fdaf
>>> Author: Andi Kleen <ak@linux.intel.com>
>>> Date:   Wed Aug 6 16:06:49 2014 -0700
>>>
>>>      hwpoison: fix race with changing page during offlining
>>>
>>>      When a hwpoison page is locked it could change state due to parallel
>>>      modifications.  The original compound page can be torn down and then
>>>      this 4k page becomes part of a differently-size compound page is is a
>>>      standalone regular page.
>>>
>>>      Check after the lock if the page is still the same compound page.
>> I can't speak to what the rules were ten years ago, but this is not
>> true now.  Compound pages cannot be split if you hold a refcount.
>> Since we don't track a per-page refcount, we wouldn't know which of
>> the split pages to give the excess refcount to.
> 
> I noticed this recently
> 
>  * GUP pin and PG_locked transferred to @page. Rest subpages can be freed if
>  * they are not mapped.
>  *
>  * Returns 0 if the hugepage is split successfully.
>  * Returns -EBUSY if the page is pinned or if anon_vma disappeared from under
>  * us.
>  */
> int split_huge_page_to_list(struct page *page, struct list_head *list)
> {
> 
> I have a test case with poisoned shmem THP page that was mlocked and
> 
> GUP pinned (FOLL_LONGTERM|FOLL_WRITE), but the split succeeded.

Thanks for points this out. Compound pages can be split even if extra refcnt is held. So folio_test_large
check is not stable if we hold a refcnt now? Will it introduce some obscure races?

Except from that, I think a page cannot become a subpage of a THP when extra refcnt is held now. So below code can be removed.
Any thought?

	/*
	 * We're only intended to deal with the non-Compound page here.
	 * However, the page could have changed compound pages due to
	 * race window. If this happens, we could try again to hopefully
	 * handle the page next round.
	 */
	if (PageCompound(p)) {
		if (retry) {
			ClearPageHWPoison(p);
			unlock_page(p);
			put_page(p);
			flags &= ~MF_COUNT_INCREASED;
			retry = false;
			goto try_again;
		}
		res = action_result(pfn, MF_MSG_DIFFERENT_COMPOUND, MF_IGNORED);
		goto unlock_page;
	}

Thanks.

> 
> thanks,
> 
> -jane
> 
> .
Jane Chu March 14, 2024, 6:15 p.m. UTC | #7
On 3/13/2024 7:34 PM, Miaohe Lin wrote:

> On 2024/3/13 9:23, Jane Chu wrote:
>> On 3/12/2024 7:14 AM, Matthew Wilcox wrote:
>>
>>> On Tue, Mar 12, 2024 at 03:07:39PM +0800, Miaohe Lin wrote:
>>>> On 2024/3/11 20:31, Matthew Wilcox wrote:
>>>>> Assuming we have a refcount on this page so it can't be simultaneously
>>>>> split/freed/whatever, these three sequences are equivalent:
>>>> If page is stable after page refcnt is held, I agree below three sequences are equivalent.
>>>>
>>>>> 1    if (PageCompound(p))
>>>>>
>>>>> 2    struct page *head = compound_head(p);
>>>>> 2    if (PageHead(head))
>>>>>
>>>>> 3    struct folio *folio = page_folio(p);
>>>>> 3    if (folio_test_large(folio))
>>>>>
>>>>> .
>>>>>
>>>> But please see below commit:
>>>>
>>>> """
>>>> commit f37d4298aa7f8b74395aa13c728677e2ed86fdaf
>>>> Author: Andi Kleen <ak@linux.intel.com>
>>>> Date:   Wed Aug 6 16:06:49 2014 -0700
>>>>
>>>>       hwpoison: fix race with changing page during offlining
>>>>
>>>>       When a hwpoison page is locked it could change state due to parallel
>>>>       modifications.  The original compound page can be torn down and then
>>>>       this 4k page becomes part of a differently-size compound page is is a
>>>>       standalone regular page.
>>>>
>>>>       Check after the lock if the page is still the same compound page.
>>> I can't speak to what the rules were ten years ago, but this is not
>>> true now.  Compound pages cannot be split if you hold a refcount.
>>> Since we don't track a per-page refcount, we wouldn't know which of
>>> the split pages to give the excess refcount to.
>> I noticed this recently
>>
>>   * GUP pin and PG_locked transferred to @page. Rest subpages can be freed if
>>   * they are not mapped.
>>   *
>>   * Returns 0 if the hugepage is split successfully.
>>   * Returns -EBUSY if the page is pinned or if anon_vma disappeared from under
>>   * us.
>>   */
>> int split_huge_page_to_list(struct page *page, struct list_head *list)
>> {
>>
>> I have a test case with poisoned shmem THP page that was mlocked and
>>
>> GUP pinned (FOLL_LONGTERM|FOLL_WRITE), but the split succeeded.
> Thanks for points this out. Compound pages can be split even if extra refcnt is held. So folio_test_large
> check is not stable if we hold a refcnt now? Will it introduce some obscure races?
>
> Except from that, I think a page cannot become a subpage of a THP when extra refcnt is held now. So below code can be removed.
> Any thought?
>
> 	/*
> 	 * We're only intended to deal with the non-Compound page here.
> 	 * However, the page could have changed compound pages due to
> 	 * race window. If this happens, we could try again to hopefully
> 	 * handle the page next round.
> 	 */
> 	if (PageCompound(p)) {
> 		if (retry) {
> 			ClearPageHWPoison(p);
> 			unlock_page(p);
> 			put_page(p);
> 			flags &= ~MF_COUNT_INCREASED;
> 			retry = false;
> 			goto try_again;
> 		}
> 		res = action_result(pfn, MF_MSG_DIFFERENT_COMPOUND, MF_IGNORED);
> 		goto unlock_page;
> 	}
Not sure of what scenario it was meant to deal with.  How about adding a 
warning instead of removal? It'll be interesting to see how the warning 
got triggered. But if after a while nothing happens, then remove it.

thanks!

-jane

>
> Thanks.
>
>> thanks,
>>
>> -jane
>>
>> .
Miaohe Lin March 15, 2024, 6:25 a.m. UTC | #8
On 2024/3/15 2:15, Jane Chu wrote:
> On 3/13/2024 7:34 PM, Miaohe Lin wrote:
> 
>> On 2024/3/13 9:23, Jane Chu wrote:
>>> On 3/12/2024 7:14 AM, Matthew Wilcox wrote:
>>>
>>>> On Tue, Mar 12, 2024 at 03:07:39PM +0800, Miaohe Lin wrote:
>>>>> On 2024/3/11 20:31, Matthew Wilcox wrote:
>>>>>> Assuming we have a refcount on this page so it can't be simultaneously
>>>>>> split/freed/whatever, these three sequences are equivalent:
>>>>> If page is stable after page refcnt is held, I agree below three sequences are equivalent.
>>>>>
>>>>>> 1    if (PageCompound(p))
>>>>>>
>>>>>> 2    struct page *head = compound_head(p);
>>>>>> 2    if (PageHead(head))
>>>>>>
>>>>>> 3    struct folio *folio = page_folio(p);
>>>>>> 3    if (folio_test_large(folio))
>>>>>>
>>>>>> .
>>>>>>
>>>>> But please see below commit:
>>>>>
>>>>> """
>>>>> commit f37d4298aa7f8b74395aa13c728677e2ed86fdaf
>>>>> Author: Andi Kleen <ak@linux.intel.com>
>>>>> Date:   Wed Aug 6 16:06:49 2014 -0700
>>>>>
>>>>>       hwpoison: fix race with changing page during offlining
>>>>>
>>>>>       When a hwpoison page is locked it could change state due to parallel
>>>>>       modifications.  The original compound page can be torn down and then
>>>>>       this 4k page becomes part of a differently-size compound page is is a
>>>>>       standalone regular page.
>>>>>
>>>>>       Check after the lock if the page is still the same compound page.
>>>> I can't speak to what the rules were ten years ago, but this is not
>>>> true now.  Compound pages cannot be split if you hold a refcount.
>>>> Since we don't track a per-page refcount, we wouldn't know which of
>>>> the split pages to give the excess refcount to.
>>> I noticed this recently
>>>
>>>   * GUP pin and PG_locked transferred to @page. Rest subpages can be freed if
>>>   * they are not mapped.
>>>   *
>>>   * Returns 0 if the hugepage is split successfully.
>>>   * Returns -EBUSY if the page is pinned or if anon_vma disappeared from under
>>>   * us.
>>>   */
>>> int split_huge_page_to_list(struct page *page, struct list_head *list)
>>> {
>>>
>>> I have a test case with poisoned shmem THP page that was mlocked and
>>>
>>> GUP pinned (FOLL_LONGTERM|FOLL_WRITE), but the split succeeded.
>> Thanks for points this out. Compound pages can be split even if extra refcnt is held. So folio_test_large
>> check is not stable if we hold a refcnt now? Will it introduce some obscure races?
>>
>> Except from that, I think a page cannot become a subpage of a THP when extra refcnt is held now. So below code can be removed.
>> Any thought?
>>
>>     /*
>>      * We're only intended to deal with the non-Compound page here.
>>      * However, the page could have changed compound pages due to
>>      * race window. If this happens, we could try again to hopefully
>>      * handle the page next round.
>>      */
>>     if (PageCompound(p)) {
>>         if (retry) {
>>             ClearPageHWPoison(p);
>>             unlock_page(p);
>>             put_page(p);
>>             flags &= ~MF_COUNT_INCREASED;
>>             retry = false;
>>             goto try_again;
>>         }
>>         res = action_result(pfn, MF_MSG_DIFFERENT_COMPOUND, MF_IGNORED);
>>         goto unlock_page;
>>     }
> Not sure of what scenario it was meant to deal with.  How about adding a warning instead of removal? It'll be interesting to see how the warning got triggered. But if after a while nothing happens, then remove it.

This sounds like a good alternative. Thanks.
Miaohe Lin March 15, 2024, 8:32 a.m. UTC | #9
On 2024/3/13 9:23, Jane Chu wrote:
> On 3/12/2024 7:14 AM, Matthew Wilcox wrote:
> 
>> On Tue, Mar 12, 2024 at 03:07:39PM +0800, Miaohe Lin wrote:
>>> On 2024/3/11 20:31, Matthew Wilcox wrote:
>>>> Assuming we have a refcount on this page so it can't be simultaneously
>>>> split/freed/whatever, these three sequences are equivalent:
>>> If page is stable after page refcnt is held, I agree below three sequences are equivalent.
>>>
>>>> 1    if (PageCompound(p))
>>>>
>>>> 2    struct page *head = compound_head(p);
>>>> 2    if (PageHead(head))
>>>>
>>>> 3    struct folio *folio = page_folio(p);
>>>> 3    if (folio_test_large(folio))
>>>>
>>>> .
>>>>
>>> But please see below commit:
>>>
>>> """
>>> commit f37d4298aa7f8b74395aa13c728677e2ed86fdaf
>>> Author: Andi Kleen <ak@linux.intel.com>
>>> Date:   Wed Aug 6 16:06:49 2014 -0700
>>>
>>>      hwpoison: fix race with changing page during offlining
>>>
>>>      When a hwpoison page is locked it could change state due to parallel
>>>      modifications.  The original compound page can be torn down and then
>>>      this 4k page becomes part of a differently-size compound page is is a
>>>      standalone regular page.
>>>
>>>      Check after the lock if the page is still the same compound page.
>> I can't speak to what the rules were ten years ago, but this is not
>> true now.  Compound pages cannot be split if you hold a refcount.
>> Since we don't track a per-page refcount, we wouldn't know which of
>> the split pages to give the excess refcount to.
> 
> I noticed this recently
> 
>  * GUP pin and PG_locked transferred to @page. Rest subpages can be freed if
>  * they are not mapped.
>  *
>  * Returns 0 if the hugepage is split successfully.
>  * Returns -EBUSY if the page is pinned or if anon_vma disappeared from under
>  * us.
>  */
> int split_huge_page_to_list(struct page *page, struct list_head *list)
> {
> 
> I have a test case with poisoned shmem THP page that was mlocked and
> 
> GUP pinned (FOLL_LONGTERM|FOLL_WRITE), but the split succeeded.

Can you elaborate your test case a little bit more detail? There is a check in split_huge_page_to_list():

/* Racy check whether the huge page can be split */
bool can_split_folio(struct folio *folio, int *pextra_pins)
{
	int extra_pins;

	/* Additional pins from page cache */
	if (folio_test_anon(folio))
		extra_pins = folio_test_swapcache(folio) ?
				folio_nr_pages(folio) : 0;
	else
		extra_pins = folio_nr_pages(folio);
	if (pextra_pins)
		*pextra_pins = extra_pins;
	return folio_mapcount(folio) == folio_ref_count(folio) - extra_pins - 1;
}

So a large folio can only be split if only one extra page refcnt is held. It means large folio won't be split from
under us if we hold an page refcnt. Or am I miss something?

Thanks.

> 
> thanks,
> 
> -jane
> 
> .
Jane Chu March 15, 2024, 7:22 p.m. UTC | #10
On 3/15/2024 1:32 AM, Miaohe Lin wrote:

> On 2024/3/13 9:23, Jane Chu wrote:
>> On 3/12/2024 7:14 AM, Matthew Wilcox wrote:
>>
>>> On Tue, Mar 12, 2024 at 03:07:39PM +0800, Miaohe Lin wrote:
>>>> On 2024/3/11 20:31, Matthew Wilcox wrote:
>>>>> Assuming we have a refcount on this page so it can't be simultaneously
>>>>> split/freed/whatever, these three sequences are equivalent:
>>>> If page is stable after page refcnt is held, I agree below three sequences are equivalent.
>>>>
>>>>> 1    if (PageCompound(p))
>>>>>
>>>>> 2    struct page *head = compound_head(p);
>>>>> 2    if (PageHead(head))
>>>>>
>>>>> 3    struct folio *folio = page_folio(p);
>>>>> 3    if (folio_test_large(folio))
>>>>>
>>>>> .
>>>>>
>>>> But please see below commit:
>>>>
>>>> """
>>>> commit f37d4298aa7f8b74395aa13c728677e2ed86fdaf
>>>> Author: Andi Kleen <ak@linux.intel.com>
>>>> Date:   Wed Aug 6 16:06:49 2014 -0700
>>>>
>>>>       hwpoison: fix race with changing page during offlining
>>>>
>>>>       When a hwpoison page is locked it could change state due to parallel
>>>>       modifications.  The original compound page can be torn down and then
>>>>       this 4k page becomes part of a differently-size compound page is is a
>>>>       standalone regular page.
>>>>
>>>>       Check after the lock if the page is still the same compound page.
>>> I can't speak to what the rules were ten years ago, but this is not
>>> true now.  Compound pages cannot be split if you hold a refcount.
>>> Since we don't track a per-page refcount, we wouldn't know which of
>>> the split pages to give the excess refcount to.
>> I noticed this recently
>>
>>   * GUP pin and PG_locked transferred to @page. Rest subpages can be freed if
>>   * they are not mapped.
>>   *
>>   * Returns 0 if the hugepage is split successfully.
>>   * Returns -EBUSY if the page is pinned or if anon_vma disappeared from under
>>   * us.
>>   */
>> int split_huge_page_to_list(struct page *page, struct list_head *list)
>> {
>>
>> I have a test case with poisoned shmem THP page that was mlocked and
>>
>> GUP pinned (FOLL_LONGTERM|FOLL_WRITE), but the split succeeded.
> Can you elaborate your test case a little bit more detail? There is a check in split_huge_page_to_list():
>
> /* Racy check whether the huge page can be split */
> bool can_split_folio(struct folio *folio, int *pextra_pins)
> {
> 	int extra_pins;
>
> 	/* Additional pins from page cache */
> 	if (folio_test_anon(folio))
> 		extra_pins = folio_test_swapcache(folio) ?
> 				folio_nr_pages(folio) : 0;
> 	else
> 		extra_pins = folio_nr_pages(folio);
> 	if (pextra_pins)
> 		*pextra_pins = extra_pins;
> 	return folio_mapcount(folio) == folio_ref_count(folio) - extra_pins - 1;
> }
>
> So a large folio can only be split if only one extra page refcnt is held. It means large folio won't be split from
> under us if we hold an page refcnt. Or am I miss something?
My experiment was with an older kernel, though the can_split check is 
the same.
Also, I was emulating GUP pin with a hack:  in madvise_inject_error(), 
replaced
get_user_pages_fast(start, 1, 0, &page) with
pin_user_pages_fast(start, 1, FOLL_WRITE|FOLL_LONGTERM, &page)
I suspect something might be wrong with my hack, I'm trying to reproduce 
with real GUP pin and on a newer kernel.
Will keep you informed.
thanks!
-jane


>
> Thanks.
>
>> thanks,
>>
>> -jane
>>
>> .
Miaohe Lin March 18, 2024, 2:28 a.m. UTC | #11
On 2024/3/16 3:22, Jane Chu wrote:
> On 3/15/2024 1:32 AM, Miaohe Lin wrote:
> 
>> On 2024/3/13 9:23, Jane Chu wrote:
>>> On 3/12/2024 7:14 AM, Matthew Wilcox wrote:
>>>
>>>> On Tue, Mar 12, 2024 at 03:07:39PM +0800, Miaohe Lin wrote:
>>>>> On 2024/3/11 20:31, Matthew Wilcox wrote:
>>>>>> Assuming we have a refcount on this page so it can't be simultaneously
>>>>>> split/freed/whatever, these three sequences are equivalent:
>>>>> If page is stable after page refcnt is held, I agree below three sequences are equivalent.
>>>>>
>>>>>> 1    if (PageCompound(p))
>>>>>>
>>>>>> 2    struct page *head = compound_head(p);
>>>>>> 2    if (PageHead(head))
>>>>>>
>>>>>> 3    struct folio *folio = page_folio(p);
>>>>>> 3    if (folio_test_large(folio))
>>>>>>
>>>>>> .
>>>>>>
>>>>> But please see below commit:
>>>>>
>>>>> """
>>>>> commit f37d4298aa7f8b74395aa13c728677e2ed86fdaf
>>>>> Author: Andi Kleen <ak@linux.intel.com>
>>>>> Date:   Wed Aug 6 16:06:49 2014 -0700
>>>>>
>>>>>       hwpoison: fix race with changing page during offlining
>>>>>
>>>>>       When a hwpoison page is locked it could change state due to parallel
>>>>>       modifications.  The original compound page can be torn down and then
>>>>>       this 4k page becomes part of a differently-size compound page is is a
>>>>>       standalone regular page.
>>>>>
>>>>>       Check after the lock if the page is still the same compound page.
>>>> I can't speak to what the rules were ten years ago, but this is not
>>>> true now.  Compound pages cannot be split if you hold a refcount.
>>>> Since we don't track a per-page refcount, we wouldn't know which of
>>>> the split pages to give the excess refcount to.
>>> I noticed this recently
>>>
>>>   * GUP pin and PG_locked transferred to @page. Rest subpages can be freed if
>>>   * they are not mapped.
>>>   *
>>>   * Returns 0 if the hugepage is split successfully.
>>>   * Returns -EBUSY if the page is pinned or if anon_vma disappeared from under
>>>   * us.
>>>   */
>>> int split_huge_page_to_list(struct page *page, struct list_head *list)
>>> {
>>>
>>> I have a test case with poisoned shmem THP page that was mlocked and
>>>
>>> GUP pinned (FOLL_LONGTERM|FOLL_WRITE), but the split succeeded.
>> Can you elaborate your test case a little bit more detail? There is a check in split_huge_page_to_list():
>>
>> /* Racy check whether the huge page can be split */
>> bool can_split_folio(struct folio *folio, int *pextra_pins)
>> {
>>     int extra_pins;
>>
>>     /* Additional pins from page cache */
>>     if (folio_test_anon(folio))
>>         extra_pins = folio_test_swapcache(folio) ?
>>                 folio_nr_pages(folio) : 0;
>>     else
>>         extra_pins = folio_nr_pages(folio);
>>     if (pextra_pins)
>>         *pextra_pins = extra_pins;
>>     return folio_mapcount(folio) == folio_ref_count(folio) - extra_pins - 1;
>> }
>>
>> So a large folio can only be split if only one extra page refcnt is held. It means large folio won't be split from
>> under us if we hold an page refcnt. Or am I miss something?
> My experiment was with an older kernel, though the can_split check is the same.
> Also, I was emulating GUP pin with a hack:  in madvise_inject_error(), replaced
> get_user_pages_fast(start, 1, 0, &page) with
> pin_user_pages_fast(start, 1, FOLL_WRITE|FOLL_LONGTERM, &page)

IIUC, get_user_pages_fast() and pin_user_pages_fast(FOLL_LONGTERM) will both call try_grab_folio() to fetch extra page refcnt.
get_user_pages_fast() will have FOLL_GET set while pin_user_pages_fast() will have FOLL_PIN set. It seems they works same for
large folio about page refcnt.

 *
 *    FOLL_GET: folio's refcount will be incremented by @refs.
 *
 *    FOLL_PIN on large folios: folio's refcount will be incremented by
 *    @refs, and its pincount will be incremented by @refs.
 *
 *    FOLL_PIN on single-page folios: folio's refcount will be incremented by
 *    @refs * GUP_PIN_COUNTING_BIAS.
 *
 * Return: The folio containing @page (with refcount appropriately
 * incremented) for success, or NULL upon failure. If neither FOLL_GET
 * nor FOLL_PIN was set, that's considered failure, and furthermore,
 * a likely bug in the caller, so a warning is also emitted.
 */
struct folio *try_grab_folio(struct page *page, int refs, unsigned int flags)

They will both call try_get_folio(page, refs) to fetch the page refcnt. So your hack with emulating GUP pin seems
doesn't work as you expected. Or am I miss something?

Thanks.

> I suspect something might be wrong with my hack, I'm trying to reproduce with real GUP pin and on a newer kernel.
> Will keep you informed.
> thanks!
> -jane
> 
> 
>>
>> Thanks.
>>
>>> thanks,
>>>
>>> -jane
>>>
>>> .
> .
diff mbox series

Patch

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index fe4959e994d0..74e87a0a792c 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -2189,7 +2189,7 @@  static int memory_failure_dev_pagemap(unsigned long pfn, int flags,
 int memory_failure(unsigned long pfn, int flags)
 {
 	struct page *p;
-	struct page *hpage;
+	struct folio *folio;
 	struct dev_pagemap *pgmap;
 	int res = 0;
 	unsigned long page_flags;
@@ -2277,8 +2277,8 @@  int memory_failure(unsigned long pfn, int flags)
 		}
 	}
 
-	hpage = compound_head(p);
-	if (PageTransHuge(hpage)) {
+	folio = page_folio(p);
+	if (folio_test_large(folio)) {
 		/*
 		 * The flag must be set after the refcount is bumped
 		 * otherwise it may race with THP split.
@@ -2292,12 +2292,13 @@  int memory_failure(unsigned long pfn, int flags)
 		 * or unhandlable page.  The refcount is bumped iff the
 		 * page is a valid handlable page.
 		 */
-		SetPageHasHWPoisoned(hpage);
+		folio_set_has_hwpoisoned(folio);
 		if (try_to_split_thp_page(p) < 0) {
 			res = action_result(pfn, MF_MSG_UNSPLIT_THP, MF_IGNORED);
 			goto unlock_mutex;
 		}
 		VM_BUG_ON_PAGE(!page_count(p), p);
+		folio = page_folio(p);
 	}
 
 	/*
@@ -2308,9 +2309,9 @@  int memory_failure(unsigned long pfn, int flags)
 	 * The check (unnecessarily) ignores LRU pages being isolated and
 	 * walked by the page reclaim code, however that's not a big loss.
 	 */
-	shake_page(p);
+	shake_folio(folio);
 
-	lock_page(p);
+	folio_lock(folio);
 
 	/*
 	 * We're only intended to deal with the non-Compound page here.
@@ -2318,11 +2319,11 @@  int memory_failure(unsigned long pfn, int flags)
 	 * race window. If this happens, we could try again to hopefully
 	 * handle the page next round.
 	 */
-	if (PageCompound(p)) {
+	if (folio_test_large(folio)) {
 		if (retry) {
 			ClearPageHWPoison(p);
-			unlock_page(p);
-			put_page(p);
+			folio_unlock(folio);
+			folio_put(folio);
 			flags &= ~MF_COUNT_INCREASED;
 			retry = false;
 			goto try_again;
@@ -2338,29 +2339,29 @@  int memory_failure(unsigned long pfn, int flags)
 	 * folio_remove_rmap_*() in try_to_unmap_one(). So to determine page
 	 * status correctly, we save a copy of the page flags at this time.
 	 */
-	page_flags = p->flags;
+	page_flags = folio->flags;
 
 	if (hwpoison_filter(p)) {
 		ClearPageHWPoison(p);
-		unlock_page(p);
-		put_page(p);
+		folio_unlock(folio);
+		folio_put(folio);
 		res = -EOPNOTSUPP;
 		goto unlock_mutex;
 	}
 
 	/*
-	 * __munlock_folio() may clear a writeback page's LRU flag without
-	 * page_lock. We need wait writeback completion for this page or it
-	 * may trigger vfs BUG while evict inode.
+	 * __munlock_folio() may clear a writeback folio's LRU flag without
+	 * the folio lock. We need to wait for writeback completion for this
+	 * folio or it may trigger a vfs BUG while evicting inode.
 	 */
-	if (!PageLRU(p) && !PageWriteback(p))
+	if (!folio_test_lru(folio) && !folio_test_writeback(folio))
 		goto identify_page_state;
 
 	/*
 	 * It's very difficult to mess with pages currently under IO
 	 * and in many cases impossible, so we just avoid it here.
 	 */
-	wait_on_page_writeback(p);
+	folio_wait_writeback(folio);
 
 	/*
 	 * Now take care of user space mappings.
@@ -2374,7 +2375,8 @@  int memory_failure(unsigned long pfn, int flags)
 	/*
 	 * Torn down by someone else?
 	 */
-	if (PageLRU(p) && !PageSwapCache(p) && p->mapping == NULL) {
+	if (folio_test_lru(folio) && !folio_test_swapcache(folio) &&
+	    folio->mapping == NULL) {
 		res = action_result(pfn, MF_MSG_TRUNCATED_LRU, MF_IGNORED);
 		goto unlock_page;
 	}
@@ -2384,7 +2386,7 @@  int memory_failure(unsigned long pfn, int flags)
 	mutex_unlock(&mf_mutex);
 	return res;
 unlock_page:
-	unlock_page(p);
+	folio_unlock(folio);
 unlock_mutex:
 	mutex_unlock(&mf_mutex);
 	return res;