mm, compaction: Skip all pinned pages during scan

Message ID 20230511165516.77957-1-khalid.aziz@oracle.com (mailing list archive)
State New
Series mm, compaction: Skip all pinned pages during scan

Commit Message

Khalid Aziz May 11, 2023, 4:55 p.m. UTC
Pinned pages cannot be migrated. Currently, as
isolate_migratepages_block() scans pages for compaction, it skips
only pinned anonymous pages. All pinned pages should be skipped,
not just the anonymous ones. This patch adds a check for a pinned
page by comparing its refcount with its mapcount, accounting for
possible extra refcounts. This was seen as a real issue on a
customer workload where a large number of pages were pinned by vfio
on the host, and any attempt to allocate hugepages resulted in a
significant amount of CPU time spent in either direct compaction or
in kcompactd, scanning vfio-pinned pages over and over again even
though they cannot be migrated.

Signed-off-by: Khalid Aziz <khalid.aziz@oracle.com>
Suggested-by: Steve Sistare <steven.sistare@oracle.com>
---
 mm/compaction.c | 33 +++++++++++++++++++++++++++++----
 1 file changed, 29 insertions(+), 4 deletions(-)

Comments

Andrew Morton May 12, 2023, 3:45 a.m. UTC | #1
On Thu, 11 May 2023 10:55:16 -0600 Khalid Aziz <khalid.aziz@oracle.com> wrote:

> Pinned pages cannot be migrated. Currently, as
> isolate_migratepages_block() scans pages for compaction, it skips
> only pinned anonymous pages. All pinned pages should be skipped,
> not just the anonymous ones. This patch adds a check for a pinned
> page by comparing its refcount with its mapcount, accounting for
> possible extra refcounts. This was seen as a real issue on a
> customer workload where a large number of pages were pinned by vfio
> on the host, and any attempt to allocate hugepages resulted in a
> significant amount of CPU time spent in either direct compaction or
> in kcompactd, scanning vfio-pinned pages over and over again even
> though they cannot be migrated.
> 
> Signed-off-by: Khalid Aziz <khalid.aziz@oracle.com>
> Suggested-by: Steve Sistare <steven.sistare@oracle.com>
> ---
>  mm/compaction.c | 33 +++++++++++++++++++++++++++++----
>  1 file changed, 29 insertions(+), 4 deletions(-)
> 
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 5a9501e0ae01..d1371fd75391 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -764,6 +764,32 @@ static bool too_many_isolated(pg_data_t *pgdat)
>  	return too_many;
>  }
>  
> +/*
> + * Check if this base page should be skipped from isolation because
> + * it is pinned. This function is called for regular pages only, and not
> + * for THP or hugetlbfs pages. This code is inspired by similar code
> + * in migrate_vma_check_page(), can_split_folio() and
> + * folio_migrate_mapping()
> + */
> +static inline bool is_pinned_page(struct page *page)
> +{
> +	unsigned long extra_refs;
> +
> +	/* anonymous page can have extra ref from page cache */

"from swapcache"?

> +	if (page_mapping(page))
> +		extra_refs = 1 + page_has_private(page);
> +	else
> +		extra_refs = PageSwapCache(page) ? 1 : 0;
> +
> +	/*
> +	 * This is an admittedly racy check but good enough to determine
> +	 * if a page should be isolated

"cannot be migrated"?

> +	 */
> +	if ((page_count(page) - extra_refs) > page_mapcount(page))
> +		return true;
> +	return false;
> +}
> +
>  /**
>   * isolate_migratepages_block() - isolate all migrate-able pages within
>   *				  a single pageblock
> @@ -992,12 +1018,11 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
>  			goto isolate_fail;
>  
>  		/*
> -		 * Migration will fail if an anonymous page is pinned in memory,
> -		 * so avoid taking lru_lock and isolating it unnecessarily in an
> -		 * admittedly racy check.
> +		 * Migration will fail if a page is pinned in memory,
> +		 * so avoid taking lru_lock and isolating it unnecessarily
>  		 */
>  		mapping = page_mapping(page);
> -		if (!mapping && (page_count(page) - 1) > total_mapcount(page))
> +		if (is_pinned_page(page))
>  			goto isolate_fail_put;
>  
>  		/*
> -- 
> 2.37.2
Khalid Aziz May 12, 2023, 3:30 p.m. UTC | #2
On 5/11/23 21:45, Andrew Morton wrote:
> On Thu, 11 May 2023 10:55:16 -0600 Khalid Aziz <khalid.aziz@oracle.com> wrote:
> 
>> Pinned pages cannot be migrated. Currently, as
>> isolate_migratepages_block() scans pages for compaction, it skips
>> only pinned anonymous pages. All pinned pages should be skipped,
>> not just the anonymous ones. This patch adds a check for a pinned
>> page by comparing its refcount with its mapcount, accounting for
>> possible extra refcounts. This was seen as a real issue on a
>> customer workload where a large number of pages were pinned by vfio
>> on the host, and any attempt to allocate hugepages resulted in a
>> significant amount of CPU time spent in either direct compaction or
>> in kcompactd, scanning vfio-pinned pages over and over again even
>> though they cannot be migrated.
>>
>> Signed-off-by: Khalid Aziz <khalid.aziz@oracle.com>
>> Suggested-by: Steve Sistare <steven.sistare@oracle.com>
>> ---
>>   mm/compaction.c | 33 +++++++++++++++++++++++++++++----
>>   1 file changed, 29 insertions(+), 4 deletions(-)
>>
>> diff --git a/mm/compaction.c b/mm/compaction.c
>> index 5a9501e0ae01..d1371fd75391 100644
>> --- a/mm/compaction.c
>> +++ b/mm/compaction.c
>> @@ -764,6 +764,32 @@ static bool too_many_isolated(pg_data_t *pgdat)
>>   	return too_many;
>>   }
>>   
>> +/*
>> + * Check if this base page should be skipped from isolation because
>> + * it is pinned. This function is called for regular pages only, and not
>> + * for THP or hugetlbfs pages. This code is inspired by similar code
>> + * in migrate_vma_check_page(), can_split_folio() and
>> + * folio_migrate_mapping()
>> + */
>> +static inline bool is_pinned_page(struct page *page)
>> +{
>> +	unsigned long extra_refs;
>> +
>> +	/* anonymous page can have extra ref from page cache */
> 
> "from swapcache"?
> 
>> +	if (page_mapping(page))
>> +		extra_refs = 1 + page_has_private(page);
>> +	else
>> +		extra_refs = PageSwapCache(page) ? 1 : 0;
>> +
>> +	/*
>> +	 * This is an admittedly racy check but good enough to determine
>> +	 * if a page should be isolated
> 
> "cannot be migrated"?
> 

Thanks, Andrew! I will fix these and send out a v2 patch.

--
Khalid
Matthew Wilcox (Oracle) May 12, 2023, 3:53 p.m. UTC | #3
On Thu, May 11, 2023 at 10:55:16AM -0600, Khalid Aziz wrote:
> +/*
> + * Check if this base page should be skipped from isolation because
> + * it is pinned. This function is called for regular pages only, and not
> + * for THP or hugetlbfs pages. This code is inspired by similar code
> + * in migrate_vma_check_page(), can_split_folio() and
> + * folio_migrate_mapping()
> + */
> +static inline bool is_pinned_page(struct page *page)

... yet another reminder this file hasn't been converted to folios :-(
This part is particularly hard because we don't have a refcount on the
page yet, so it may be allocated or freed while we're looking at it
which means we can't use folios _here_ because the Tail flag may get
set which would cause the folio code to drop BUGs all over us.

> +{
> +	unsigned long extra_refs;
> +
> +	/* anonymous page can have extra ref from page cache */
> +	if (page_mapping(page))

We already did the work of calling page_mapping() in the caller.
Probably best to pass it in here.

> +		extra_refs = 1 + page_has_private(page);

page_has_private() is wrong.  That's for determining if we need to call
the release function.  Filesystems don't increment the refcount when
they set PG_private_2.  This should just be PagePrivate().

> +	else
> +		extra_refs = PageSwapCache(page) ? 1 : 0;
> +
> +	/*
> +	 * This is an admittedly racy check but good enough to determine
> +	 * if a page should be isolated
> +	 */
> +	if ((page_count(page) - extra_refs) > page_mapcount(page))

page_count() includes a hidden call to compound_head(); you probably
meant page_ref_count() here.

>  		/*
> -		 * Migration will fail if an anonymous page is pinned in memory,
> -		 * so avoid taking lru_lock and isolating it unnecessarily in an
> -		 * admittedly racy check.
> +		 * Migration will fail if a page is pinned in memory,
> +		 * so avoid taking lru_lock and isolating it unnecessarily
>  		 */
>  		mapping = page_mapping(page);
> -		if (!mapping && (page_count(page) - 1) > total_mapcount(page))
> +		if (is_pinned_page(page))

"pinned" now has two meanings when applied to pages, alas.  Better to
say "If there are extra references to this page beyond those from the
page/swap cache and page tables".

So it's probably also unwise to call it is_pinned_page().  Maybe

		if (page_extra_refcounts(page)) ?
Khalid Aziz May 12, 2023, 5:04 p.m. UTC | #4
Hi Matthew,

Thanks for the review.

On 5/12/23 09:53, Matthew Wilcox wrote:
> On Thu, May 11, 2023 at 10:55:16AM -0600, Khalid Aziz wrote:
>> +/*
>> + * Check if this base page should be skipped from isolation because
>> + * it is pinned. This function is called for regular pages only, and not
>> + * for THP or hugetlbfs pages. This code is inspired by similar code
>> + * in migrate_vma_check_page(), can_split_folio() and
>> + * folio_migrate_mapping()
>> + */
>> +static inline bool is_pinned_page(struct page *page)
> 
> ... yet another reminder this file hasn't been converted to folios :-(
> This part is particularly hard because we don't have a refcount on the
> page yet, so it may be allocated or freed while we're looking at it
> which means we can't use folios _here_ because the Tail flag may get
> set which would cause the folio code to drop BUGs all over us.
> 
>> +{
>> +	unsigned long extra_refs;
>> +
>> +	/* anonymous page can have extra ref from page cache */
>> +	if (page_mapping(page))
> 
> We already did the work of calling page_mapping() in the caller.
> Probably best to pass it in here.

That makes sense. I will change that.

> 
>> +		extra_refs = 1 + page_has_private(page);
> 
> page_has_private() is wrong.  That's for determining if we need to call
> the release function.  Filesystems don't increment the refcount when
> they set PG_private_2.  This should just be PagePrivate().

I will fix that.

> 
>> +	else
>> +		extra_refs = PageSwapCache(page) ? 1 : 0;
>> +
>> +	/*
>> +	 * This is an admittedly racy check but good enough to determine
>> +	 * if a page should be isolated
>> +	 */
>> +	if ((page_count(page) - extra_refs) > page_mapcount(page))
> 
> page_count() includes a hidden call to compound_head(); you probably
> meant page_ref_count() here.

You are right.

> 
>>   		/*
>> -		 * Migration will fail if an anonymous page is pinned in memory,
>> -		 * so avoid taking lru_lock and isolating it unnecessarily in an
>> -		 * admittedly racy check.
>> +		 * Migration will fail if a page is pinned in memory,
>> +		 * so avoid taking lru_lock and isolating it unnecessarily
>>   		 */
>>   		mapping = page_mapping(page);
>> -		if (!mapping && (page_count(page) - 1) > total_mapcount(page))
>> +		if (is_pinned_page(page))
> 
> "pinned" now has two meanings when applied to pages, alas.  Better to
> say "If there are extra references to this page beyond those from the
> page/swap cache and page tables".
> 
> So it's probably also unwise to call it is_pinned_page().  Maybe
> 
> 		if (page_extra_refcounts(page)) ?

I like that better. I will make these modifications.

Thanks,
Khalid
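
[Editor's note: folding the review feedback into the helper gives a rough
picture of what the next revision might look like. The sketch below is
illustrative only, assuming Matthew's suggested page_extra_refcounts()
name and a signature that takes the already-computed mapping; it is not
the v2 patch that was eventually posted.]

/*
 * Illustrative sketch (not the posted v2): helper renamed per Matthew's
 * suggestion, mapping passed in by the caller, PagePrivate() instead of
 * page_has_private(), and page_ref_count() instead of page_count() to
 * avoid the hidden compound_head() call.
 */
static inline bool page_extra_refcounts(struct page *page,
					struct address_space *mapping)
{
	int extra_refs;

	if (mapping)
		extra_refs = 1 + PagePrivate(page);	/* page cache + fs private data */
	else
		extra_refs = PageSwapCache(page) ? 1 : 0;	/* anon page in swap cache */

	/*
	 * Admittedly racy, but good enough to decide whether isolating
	 * the page is worth the trouble.
	 */
	return (page_ref_count(page) - extra_refs) > page_mapcount(page);
}

The call site in isolate_migratepages_block() would then presumably become:

	mapping = page_mapping(page);
	if (page_extra_refcounts(page, mapping))
		goto isolate_fail_put;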

Patch

diff --git a/mm/compaction.c b/mm/compaction.c
index 5a9501e0ae01..d1371fd75391 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -764,6 +764,32 @@  static bool too_many_isolated(pg_data_t *pgdat)
 	return too_many;
 }
 
+/*
+ * Check if this base page should be skipped from isolation because
+ * it is pinned. This function is called for regular pages only, and not
+ * for THP or hugetlbfs pages. This code is inspired by similar code
+ * in migrate_vma_check_page(), can_split_folio() and
+ * folio_migrate_mapping()
+ */
+static inline bool is_pinned_page(struct page *page)
+{
+	unsigned long extra_refs;
+
+	/* anonymous page can have extra ref from page cache */
+	if (page_mapping(page))
+		extra_refs = 1 + page_has_private(page);
+	else
+		extra_refs = PageSwapCache(page) ? 1 : 0;
+
+	/*
+	 * This is an admittedly racy check but good enough to determine
+	 * if a page should be isolated
+	 */
+	if ((page_count(page) - extra_refs) > page_mapcount(page))
+		return true;
+	return false;
+}
+
 /**
  * isolate_migratepages_block() - isolate all migrate-able pages within
  *				  a single pageblock
@@ -992,12 +1018,11 @@  isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
 			goto isolate_fail;
 
 		/*
-		 * Migration will fail if an anonymous page is pinned in memory,
-		 * so avoid taking lru_lock and isolating it unnecessarily in an
-		 * admittedly racy check.
+		 * Migration will fail if a page is pinned in memory,
+		 * so avoid taking lru_lock and isolating it unnecessarily
 		 */
 		mapping = page_mapping(page);
-		if (!mapping && (page_count(page) - 1) > total_mapcount(page))
+		if (is_pinned_page(page))
 			goto isolate_fail_put;
 
 		/*