diff mbox series

[v2] mm, compaction: Skip all non-migratable pages during scan

Message ID 20230512190000.103737-1-khalid.aziz@oracle.com (mailing list archive)
State New
Headers show
Series [v2] mm, compaction: Skip all non-migratable pages during scan | expand

Commit Message

Khalid Aziz May 12, 2023, 7 p.m. UTC
Pages pinned in memory through extra refcounts can not be migrated.
Currently as isolate_migratepages_block() scans pages for
compaction, it skips any pinned anonymous pages. All non-migratable
pages should be skipped and not just the anonymous pinned pages.
This patch adds a check for extra refcounts on a page to determine
if the page can be migrated.  This was seen as a real issue on a
customer workload where a large number of pages were pinned by vfio
on the host and any attempts to allocate hugepages resulted in
significant amount of cpu time spent in either direct compaction or
in kcompatd scanning vfio pinned pages over and over again that can
not be migrated.

Signed-off-by: Khalid Aziz <khalid.aziz@oracle.com>
Suggested-by: Steve Sistare <steven.sistare@oracle.com>
Cc: Khalid Aziz <khalid@kernel.org>
---
v2:
	- Update comments in the code (Suggested by Andrew)
	- Use PagePrivate() instead of page_has_private() (Suggested
	  by Matthew)
	- Pass mapping to page_has_extrarefs() (Suggested by Matthew)
	- Use page_ref_count() (Suggested by Matthew)
	- Rename is_pinned_page() to reflect its function more
	  accurately (Suggested by Matthew)

 mm/compaction.c | 36 ++++++++++++++++++++++++++++++++----
 1 file changed, 32 insertions(+), 4 deletions(-)

Comments

Huang, Ying May 15, 2023, 8:47 a.m. UTC | #1
Hi, Khalid,

Cced Mel.

Khalid Aziz <khalid.aziz@oracle.com> writes:

> Pages pinned in memory through extra refcounts can not be migrated.
> Currently as isolate_migratepages_block() scans pages for
> compaction, it skips any pinned anonymous pages. All non-migratable
> pages should be skipped and not just the anonymous pinned pages.
> This patch adds a check for extra refcounts on a page to determine
> if the page can be migrated.  This was seen as a real issue on a
> customer workload where a large number of pages were pinned by vfio
> on the host and any attempts to allocate hugepages resulted in
> significant amount of cpu time spent in either direct compaction or
> in kcompatd scanning vfio pinned pages over and over again that can

s/kcompatd/kcompactd/

> not be migrated.

With the patch below, the cycles for kcompactd disappeared?

> Signed-off-by: Khalid Aziz <khalid.aziz@oracle.com>
> Suggested-by: Steve Sistare <steven.sistare@oracle.com>
> Cc: Khalid Aziz <khalid@kernel.org>
> ---
> v2:
> 	- Update comments in the code (Suggested by Andrew)
> 	- Use PagePrivate() instead of page_has_private() (Suggested
> 	  by Matthew)
> 	- Pass mapping to page_has_extrarefs() (Suggested by Matthew)
> 	- Use page_ref_count() (Suggested by Matthew)
> 	- Rename is_pinned_page() to reflect its function more
> 	  accurately (Suggested by Matthew)
>
>  mm/compaction.c | 36 ++++++++++++++++++++++++++++++++----
>  1 file changed, 32 insertions(+), 4 deletions(-)
>
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 5a9501e0ae01..837f20df2bbb 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -764,6 +764,34 @@ static bool too_many_isolated(pg_data_t *pgdat)
>  	return too_many;
>  }
>  
> +/*
> + * Check if this base page should be skipped from isolation because
> + * it has extra refcounts that will prevent it from being migrated.
> + * This function is called for regular pages only, and not
> + * for THP or hugetlbfs pages. This code is inspired by similar code
> + * in migrate_vma_check_page(), can_split_folio() and
> + * folio_migrate_mapping()
> + */
> +static inline bool page_has_extrarefs(struct page *page,

Better to be named as page_has_extra_refs()?

> +					struct address_space *mapping)
> +{
> +	unsigned long extra_refs;
> +
> +	/* anonymous page can have extra ref from swap cache */
> +	if (mapping)
> +		extra_refs = 1 + PagePrivate(page);
> +	else
> +		extra_refs = PageSwapCache(page) ? 1 : 0;

IIUC, mapping != NULL if PageSwapCache(page) is true.  Please check the
implementation of page_mapping().

And even if mapping == NULL, the extra_refs should be 1, because we have
elevated the page refcount in isolate_migratepages_block() before
checking whether the page is pinned.  IIUC, this is the original
behavior.  Or, we can add "- 1" in the following checking.

> +
> +	/*
> +	 * This is an admittedly racy check but good enough to determine
> +	 * if a page is pinned and can not be migrated
> +	 */
> +	if ((page_ref_count(page) - extra_refs) > page_mapcount(page))
> +		return true;
> +	return false;
> +}
> +
>  /**
>   * isolate_migratepages_block() - isolate all migrate-able pages within
>   *				  a single pageblock
> @@ -992,12 +1020,12 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
>  			goto isolate_fail;
>  
>  		/*
> -		 * Migration will fail if an anonymous page is pinned in memory,
> -		 * so avoid taking lru_lock and isolating it unnecessarily in an
> -		 * admittedly racy check.
> +		 * Migration will fail if a page has extra refcounts
> +		 * preventing it from migrating, so avoid taking
> +		 * lru_lock and isolating it unnecessarily
>  		 */
>  		mapping = page_mapping(page);
> -		if (!mapping && (page_count(page) - 1) > total_mapcount(page))
> +		if (page_has_extrarefs(page, mapping))
>  			goto isolate_fail_put;
>  
>  		/*

Best Regards,
Huang, Ying
Khalid Aziz May 15, 2023, 10:36 p.m. UTC | #2
On 5/15/23 02:47, Huang, Ying wrote:
> Hi, Khalid,
> 
> Cced Mel.
> 
> Khalid Aziz <khalid.aziz@oracle.com> writes:
> 
>> Pages pinned in memory through extra refcounts can not be migrated.
>> Currently as isolate_migratepages_block() scans pages for
>> compaction, it skips any pinned anonymous pages. All non-migratable
>> pages should be skipped and not just the anonymous pinned pages.
>> This patch adds a check for extra refcounts on a page to determine
>> if the page can be migrated.  This was seen as a real issue on a
>> customer workload where a large number of pages were pinned by vfio
>> on the host and any attempts to allocate hugepages resulted in
>> significant amount of cpu time spent in either direct compaction or
>> in kcompatd scanning vfio pinned pages over and over again that can
> 
> s/kcompatd/kcompactd/

Thanks for the review. I will add the missing "c".

> 
>> not be migrated.
> 
> With the patch below, the cycles for kcompactd disappeared?

Yes, the test showed cpu time accumulated by kcompatcd during test run dropped significantly.

> 
>> Signed-off-by: Khalid Aziz <khalid.aziz@oracle.com>
>> Suggested-by: Steve Sistare <steven.sistare@oracle.com>
>> Cc: Khalid Aziz <khalid@kernel.org>
>> ---
>> v2:
>> 	- Update comments in the code (Suggested by Andrew)
>> 	- Use PagePrivate() instead of page_has_private() (Suggested
>> 	  by Matthew)
>> 	- Pass mapping to page_has_extrarefs() (Suggested by Matthew)
>> 	- Use page_ref_count() (Suggested by Matthew)
>> 	- Rename is_pinned_page() to reflect its function more
>> 	  accurately (Suggested by Matthew)
>>
>>   mm/compaction.c | 36 ++++++++++++++++++++++++++++++++----
>>   1 file changed, 32 insertions(+), 4 deletions(-)
>>
>> diff --git a/mm/compaction.c b/mm/compaction.c
>> index 5a9501e0ae01..837f20df2bbb 100644
>> --- a/mm/compaction.c
>> +++ b/mm/compaction.c
>> @@ -764,6 +764,34 @@ static bool too_many_isolated(pg_data_t *pgdat)
>>   	return too_many;
>>   }
>>   
>> +/*
>> + * Check if this base page should be skipped from isolation because
>> + * it has extra refcounts that will prevent it from being migrated.
>> + * This function is called for regular pages only, and not
>> + * for THP or hugetlbfs pages. This code is inspired by similar code
>> + * in migrate_vma_check_page(), can_split_folio() and
>> + * folio_migrate_mapping()
>> + */
>> +static inline bool page_has_extrarefs(struct page *page,
> 
> Better to be named as page_has_extra_refs()?

Sure, I can do that.

> 
>> +					struct address_space *mapping)
>> +{
>> +	unsigned long extra_refs;
>> +
>> +	/* anonymous page can have extra ref from swap cache */
>> +	if (mapping)
>> +		extra_refs = 1 + PagePrivate(page);
>> +	else
>> +		extra_refs = PageSwapCache(page) ? 1 : 0;
> 
> IIUC, mapping != NULL if PageSwapCache(page) is true.  Please check the
> implementation of page_mapping().

which should work out for refcount since it would go to if part of conditional which will add a 1 for swap cache 
reference, but this code could be written better. I will work on it.

> 
> And even if mapping == NULL, the extra_refs should be 1, because we have
> elevated the page refcount in isolate_migratepages_block() before
> checking whether the page is pinned.  IIUC, this is the original
> behavior.  Or, we can add "- 1" in the following checking.
> 

You are right. There is a get_page_unless_zero() earlier in the code. I will compensate for that.

>> +
>> +	/*
>> +	 * This is an admittedly racy check but good enough to determine
>> +	 * if a page is pinned and can not be migrated
>> +	 */
>> +	if ((page_ref_count(page) - extra_refs) > page_mapcount(page))
>> +		return true;
>> +	return false;
>> +}
>> +
>>   /**
>>    * isolate_migratepages_block() - isolate all migrate-able pages within
>>    *				  a single pageblock
>> @@ -992,12 +1020,12 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
>>   			goto isolate_fail;
>>   
>>   		/*
>> -		 * Migration will fail if an anonymous page is pinned in memory,
>> -		 * so avoid taking lru_lock and isolating it unnecessarily in an
>> -		 * admittedly racy check.
>> +		 * Migration will fail if a page has extra refcounts
>> +		 * preventing it from migrating, so avoid taking
>> +		 * lru_lock and isolating it unnecessarily
>>   		 */
>>   		mapping = page_mapping(page);
>> -		if (!mapping && (page_count(page) - 1) > total_mapcount(page))
>> +		if (page_has_extrarefs(page, mapping))
>>   			goto isolate_fail_put;
>>   
>>   		/*
> 
> Best Regards,
> Huang, Ying

Thanks,
Khalid
diff mbox series

Patch

diff --git a/mm/compaction.c b/mm/compaction.c
index 5a9501e0ae01..837f20df2bbb 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -764,6 +764,34 @@  static bool too_many_isolated(pg_data_t *pgdat)
 	return too_many;
 }
 
+/*
+ * Check if this base page should be skipped from isolation because
+ * it has extra refcounts that will prevent it from being migrated.
+ * This function is called for regular pages only, and not
+ * for THP or hugetlbfs pages. This code is inspired by similar code
+ * in migrate_vma_check_page(), can_split_folio() and
+ * folio_migrate_mapping()
+ */
+static inline bool page_has_extrarefs(struct page *page,
+					struct address_space *mapping)
+{
+	unsigned long extra_refs;
+
+	/* anonymous page can have extra ref from swap cache */
+	if (mapping)
+		extra_refs = 1 + PagePrivate(page);
+	else
+		extra_refs = PageSwapCache(page) ? 1 : 0;
+
+	/*
+	 * This is an admittedly racy check but good enough to determine
+	 * if a page is pinned and can not be migrated
+	 */
+	if ((page_ref_count(page) - extra_refs) > page_mapcount(page))
+		return true;
+	return false;
+}
+
 /**
  * isolate_migratepages_block() - isolate all migrate-able pages within
  *				  a single pageblock
@@ -992,12 +1020,12 @@  isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
 			goto isolate_fail;
 
 		/*
-		 * Migration will fail if an anonymous page is pinned in memory,
-		 * so avoid taking lru_lock and isolating it unnecessarily in an
-		 * admittedly racy check.
+		 * Migration will fail if a page has extra refcounts
+		 * preventing it from migrating, so avoid taking
+		 * lru_lock and isolating it unnecessarily
 		 */
 		mapping = page_mapping(page);
-		if (!mapping && (page_count(page) - 1) > total_mapcount(page))
+		if (page_has_extrarefs(page, mapping))
 			goto isolate_fail_put;
 
 		/*