mm/vmscan: don't try to reclaim freed folios

Message ID	20220527080451.48549-1-linmiaohe@huawei.com (mailing list archive)
State	New
Headers	show Return-Path: <owner-linux-mm@kvack.org> From: Miaohe Lin <linmiaohe@huawei.com> To: <akpm@linux-foundation.org> CC: <linux-mm@kvack.org>, <linux-kernel@vger.kernel.org>, <linmiaohe@huawei.com> Subject: [PATCH] mm/vmscan: don't try to reclaim freed folios Date: Fri, 27 May 2022 16:04:51 +0800 Message-ID: <20220527080451.48549-1-linmiaohe@huawei.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Content-Type: text/plain Sender: owner-linux-mm@kvack.org Precedence: bulk
Series	mm/vmscan: don't try to reclaim freed folios \| expand mm/vmscan: don't try to reclaim freed folios

Miaohe Lin May 27, 2022, 8:04 a.m. UTC

If folios were freed from under us, there's no need to reclaim them. Skip
these folios to save lots of cpu cycles and avoid possible unnecessary
disk IO.

Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
---
 mm/vmscan.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

Matthew Wilcox May 27, 2022, 3:02 p.m. UTC | #1

On Fri, May 27, 2022 at 04:04:51PM +0800, Miaohe Lin wrote:
> If folios were freed from under us, there's no need to reclaim them. Skip
> these folios to save lots of cpu cycles and avoid possible unnecessary
> disk IO.
> 
> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
> ---
>  mm/vmscan.c | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index f7d9a683e3a7..646dd1efad32 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1556,12 +1556,18 @@ static unsigned int shrink_page_list(struct list_head *page_list,
>  		folio = lru_to_folio(page_list);
>  		list_del(&folio->lru);
>  
> +		nr_pages = folio_nr_pages(folio);
> +		if (folio_ref_count(folio) == 1) {
> +			/* folio was freed from under us. So we are done. */
> +			WARN_ON(!folio_put_testzero(folio));

What?  No.  This can absolutely happen.  We have a refcount on the folio,
which means that any other thread can temporarily raise the refcount,
so this WARN_ON can trigger.  Also, we don't hold the folio locked,
or an extra reference, so nr_pages is unstable because it can be split.

> +			goto free_it;
> +		}
> +
>  		if (!folio_trylock(folio))
>  			goto keep;
>  
>  		VM_BUG_ON_FOLIO(folio_test_active(folio), folio);
>  
> -		nr_pages = folio_nr_pages(folio);
>  
>  		/* Account the number of base pages */
>  		sc->nr_scanned += nr_pages;
> -- 
> 2.23.0
> 
>

Miaohe Lin May 28, 2022, 2:52 a.m. UTC | #2

On 2022/5/27 23:02, Matthew Wilcox wrote:
> On Fri, May 27, 2022 at 04:04:51PM +0800, Miaohe Lin wrote:
>> If folios were freed from under us, there's no need to reclaim them. Skip
>> these folios to save lots of cpu cycles and avoid possible unnecessary
>> disk IO.
>>
>> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
>> ---
>>  mm/vmscan.c | 8 +++++++-
>>  1 file changed, 7 insertions(+), 1 deletion(-)
>>
>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>> index f7d9a683e3a7..646dd1efad32 100644
>> --- a/mm/vmscan.c
>> +++ b/mm/vmscan.c
>> @@ -1556,12 +1556,18 @@ static unsigned int shrink_page_list(struct list_head *page_list,
>>  		folio = lru_to_folio(page_list);
>>  		list_del(&folio->lru);
>>  
>> +		nr_pages = folio_nr_pages(folio);
>> +		if (folio_ref_count(folio) == 1) {
>> +			/* folio was freed from under us. So we are done. */
>> +			WARN_ON(!folio_put_testzero(folio));
> 
> What?  No.  This can absolutely happen.  We have a refcount on the folio,
> which means that any other thread can temporarily raise the refcount,

IIUC, the folio is only in the isolated page_list now and it's not in the page cache, swap cache, pagetable or
under any use. So there should be no way that any other thread can temporarily raise the refcount when
folio_ref_count == 1. Or am I miss something?

> so this WARN_ON can trigger.  Also, we don't hold the folio locked,
> or an extra reference, so nr_pages is unstable because it can be split.

Yes, you're right. When folio_ref_count != 1, nr_pages is unstable. Will fix it if v2 is possible. :)

Thanks a lot for review and comment!

> 
>> +			goto free_it;
>> +		}
>> +
>>  		if (!folio_trylock(folio))
>>  			goto keep;
>>  
>>  		VM_BUG_ON_FOLIO(folio_test_active(folio), folio);
>>  
>> -		nr_pages = folio_nr_pages(folio);
>>  
>>  		/* Account the number of base pages */
>>  		sc->nr_scanned += nr_pages;
>> -- 
>> 2.23.0
>>
>>
> 
> .
>

Matthew Wilcox May 28, 2022, 3:13 a.m. UTC | #3

On Sat, May 28, 2022 at 10:52:11AM +0800, Miaohe Lin wrote:
> On 2022/5/27 23:02, Matthew Wilcox wrote:
> > What?  No.  This can absolutely happen.  We have a refcount on the folio,
> > which means that any other thread can temporarily raise the refcount,
> 
> IIUC, the folio is only in the isolated page_list now and it's not in the page cache, swap cache, pagetable or
> under any use. So there should be no way that any other thread can temporarily raise the refcount when
> folio_ref_count == 1. Or am I miss something?

Take a look at something like GUP (fast).  If this page _was_ mapped to
userspace, something like this can happen:

Thread A	Thread B
load PTE
		unmap page
		refcount goes to 1
		vmscan sees the page
try_get_ref
		refcount is now 2.  WARN_ON.

Thread A will see that the PTE has changed and will now drop its
reference, but Thread B already spat out the WARN.

A similar thing can happen with the page cache.

If this is a worthwhile optimisation (does it happen often that we find
a refcount == 1 page?), we could do something like ...

		if (folio_ref_freeze(folio, 1)) {
			nr_pages = folio_nr_pages(folio);
			goto free_it;
		}

... or ...

		if (folio_ref_count(folio) == 1 &&
		    folio_ref_freeze(folio, 1)) {

... if we want to test-and-test-and-clear

But this function is far too complicated already.  I really want to
see numbers that proves the extra complexity is worth it.

Miaohe Lin May 28, 2022, 6:24 a.m. UTC | #4

On 2022/5/28 11:13, Matthew Wilcox wrote:
> On Sat, May 28, 2022 at 10:52:11AM +0800, Miaohe Lin wrote:
>> On 2022/5/27 23:02, Matthew Wilcox wrote:
>>> What?  No.  This can absolutely happen.  We have a refcount on the folio,
>>> which means that any other thread can temporarily raise the refcount,
>>
>> IIUC, the folio is only in the isolated page_list now and it's not in the page cache, swap cache, pagetable or
>> under any use. So there should be no way that any other thread can temporarily raise the refcount when
>> folio_ref_count == 1. Or am I miss something?
> 
> Take a look at something like GUP (fast).  If this page _was_ mapped to
> userspace, something like this can happen:
> 
> Thread A	Thread B
> load PTE
> 		unmap page
> 		refcount goes to 1
> 		vmscan sees the page
> try_get_ref
> 		refcount is now 2.  WARN_ON.
> 
> Thread A will see that the PTE has changed and will now drop its
> reference, but Thread B already spat out the WARN.
> 
> A similar thing can happen with the page cache.

Oh, I see. Many thanks for your patient explanation! :)

> 
> If this is a worthwhile optimisation (does it happen often that we find
> a refcount == 1 page?), we could do something like ...

No, It should be rare.

> 
> 		if (folio_ref_freeze(folio, 1)) {
> 			nr_pages = folio_nr_pages(folio);
> 			goto free_it;
> 		}
> 
> ... or ...
> 
> 		if (folio_ref_count(folio) == 1 &&
> 		    folio_ref_freeze(folio, 1)) {
> 
> ... if we want to test-and-test-and-clear

These proposed code changes look good to me.

> 
> But this function is far too complicated already.  I really want to
> see numbers that proves the extra complexity is worth it.

This optimization can save lots of cpu cycles and avoid possible disk I/O in
that specified case. But that is a somewhat rare case. So there's no numbers
that proves the extra complexity is worth it.

Should I drop this patch or proceed with the proposed code changes above in
next version? :)

Many thanks!

> 
> 
> .
>

Miaohe Lin June 8, 2022, 2:09 p.m. UTC | #5

On 2022/5/27 23:02, Matthew Wilcox wrote:
> On Fri, May 27, 2022 at 04:04:51PM +0800, Miaohe Lin wrote:
>> If folios were freed from under us, there's no need to reclaim them. Skip
>> these folios to save lots of cpu cycles and avoid possible unnecessary
>> disk IO.
>>
>> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
>> ---
>>  mm/vmscan.c | 8 +++++++-
>>  1 file changed, 7 insertions(+), 1 deletion(-)
>>
>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>> index f7d9a683e3a7..646dd1efad32 100644
>> --- a/mm/vmscan.c
>> +++ b/mm/vmscan.c
>> @@ -1556,12 +1556,18 @@ static unsigned int shrink_page_list(struct list_head *page_list,
>>  		folio = lru_to_folio(page_list);
>>  		list_del(&folio->lru);
>>  
>> +		nr_pages = folio_nr_pages(folio);
>> +		if (folio_ref_count(folio) == 1) {
>> +			/* folio was freed from under us. So we are done. */
>> +			WARN_ON(!folio_put_testzero(folio));
> 
> What?  No.  This can absolutely happen.  We have a refcount on the folio,
> which means that any other thread can temporarily raise the refcount,
> so this WARN_ON can trigger.  Also, we don't hold the folio locked,
> or an extra reference, so nr_pages is unstable because it can be split.

When I reread the code, I found caller holds an extra reference to the folio when
calling isolate_lru_pages(), so folio can't be split and thus nr_pages should be
stable indeed? Or am I miss something again?

Thanks!

> 
>> +			goto free_it;
>> +		}
>> +
>>  		if (!folio_trylock(folio))
>>  			goto keep;
>>  
>>  		VM_BUG_ON_FOLIO(folio_test_active(folio), folio);
>>  
>> -		nr_pages = folio_nr_pages(folio);
>>  
>>  		/* Account the number of base pages */
>>  		sc->nr_scanned += nr_pages;
>> -- 
>> 2.23.0
>>
>>
> 
> .
>

mm/vmscan: don't try to reclaim freed folios

Commit Message

Comments

Patch