Message ID | 20230511165516.77957-1-khalid.aziz@oracle.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | mm, compaction: Skip all pinned pages during scan | expand |
On Thu, 11 May 2023 10:55:16 -0600 Khalid Aziz <khalid.aziz@oracle.com> wrote: > Pinned pages can not be migrated. Currently as > isolate_migratepages_block() scans pages for compaction, it skips > any pinned anonymous pages. All pinned pages should be skipped and > not just the anonymous pinned pages. This patch adds a check for > pinned page by comparing its refcount with mapcount and accounts for > possible extra refcounts. This was seen as a real issue on a > customer workload where a large number of pages were pinned by vfio > on the host and any attempts to allocate hugepages resulted in > significant amount of cpu time spent in either direct compaction or > in kcompatd scanning vfio pinned pages over and over again that can > not be migrated. > > Signed-off-by: Khalid Aziz <khalid.aziz@oracle.com> > Suggested-by: Steve Sistare <steven.sistare@oracle.com> > --- > mm/compaction.c | 33 +++++++++++++++++++++++++++++---- > 1 file changed, 29 insertions(+), 4 deletions(-) > > diff --git a/mm/compaction.c b/mm/compaction.c > index 5a9501e0ae01..d1371fd75391 100644 > --- a/mm/compaction.c > +++ b/mm/compaction.c > @@ -764,6 +764,32 @@ static bool too_many_isolated(pg_data_t *pgdat) > return too_many; > } > > +/* > + * Check if this base page should be skipped from isolation because > + * it is pinned. This function is called for regular pages only, and not > + * for THP or hugetlbfs pages. This code is inspired by similar code > + * in migrate_vma_check_page(), can_split_folio() and > + * folio_migrate_mapping() > + */ > +static inline bool is_pinned_page(struct page *page) > +{ > + unsigned long extra_refs; > + > + /* anonymous page can have extra ref from page cache */ "from swapcache"? > + if (page_mapping(page)) > + extra_refs = 1 + page_has_private(page); > + else > + extra_refs = PageSwapCache(page) ? 1 : 0; > + > + /* > + * This is an admittedly racy check but good enough to determine > + * if a page should be isolated "cannot be migrated"? > + */ > + if ((page_count(page) - extra_refs) > page_mapcount(page)) > + return true; > + return false; > +} > + > /** > * isolate_migratepages_block() - isolate all migrate-able pages within > * a single pageblock > @@ -992,12 +1018,11 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn, > goto isolate_fail; > > /* > - * Migration will fail if an anonymous page is pinned in memory, > - * so avoid taking lru_lock and isolating it unnecessarily in an > - * admittedly racy check. > + * Migration will fail if a page is pinned in memory, > + * so avoid taking lru_lock and isolating it unnecessarily > */ > mapping = page_mapping(page); > - if (!mapping && (page_count(page) - 1) > total_mapcount(page)) > + if (is_pinned_page(page)) > goto isolate_fail_put; > > /* > -- > 2.37.2
On 5/11/23 21:45, Andrew Morton wrote: > On Thu, 11 May 2023 10:55:16 -0600 Khalid Aziz <khalid.aziz@oracle.com> wrote: > >> Pinned pages can not be migrated. Currently as >> isolate_migratepages_block() scans pages for compaction, it skips >> any pinned anonymous pages. All pinned pages should be skipped and >> not just the anonymous pinned pages. This patch adds a check for >> pinned page by comparing its refcount with mapcount and accounts for >> possible extra refcounts. This was seen as a real issue on a >> customer workload where a large number of pages were pinned by vfio >> on the host and any attempts to allocate hugepages resulted in >> significant amount of cpu time spent in either direct compaction or >> in kcompatd scanning vfio pinned pages over and over again that can >> not be migrated. >> >> Signed-off-by: Khalid Aziz <khalid.aziz@oracle.com> >> Suggested-by: Steve Sistare <steven.sistare@oracle.com> >> --- >> mm/compaction.c | 33 +++++++++++++++++++++++++++++---- >> 1 file changed, 29 insertions(+), 4 deletions(-) >> >> diff --git a/mm/compaction.c b/mm/compaction.c >> index 5a9501e0ae01..d1371fd75391 100644 >> --- a/mm/compaction.c >> +++ b/mm/compaction.c >> @@ -764,6 +764,32 @@ static bool too_many_isolated(pg_data_t *pgdat) >> return too_many; >> } >> >> +/* >> + * Check if this base page should be skipped from isolation because >> + * it is pinned. This function is called for regular pages only, and not >> + * for THP or hugetlbfs pages. This code is inspired by similar code >> + * in migrate_vma_check_page(), can_split_folio() and >> + * folio_migrate_mapping() >> + */ >> +static inline bool is_pinned_page(struct page *page) >> +{ >> + unsigned long extra_refs; >> + >> + /* anonymous page can have extra ref from page cache */ > > "from swapcache"? > >> + if (page_mapping(page)) >> + extra_refs = 1 + page_has_private(page); >> + else >> + extra_refs = PageSwapCache(page) ? 1 : 0; >> + >> + /* >> + * This is an admittedly racy check but good enough to determine >> + * if a page should be isolated > > "cannot be migrated"? > Thanks, Andrew! I will fix these and send out v2 patch. -- Khalid
On Thu, May 11, 2023 at 10:55:16AM -0600, Khalid Aziz wrote: > +/* > + * Check if this base page should be skipped from isolation because > + * it is pinned. This function is called for regular pages only, and not > + * for THP or hugetlbfs pages. This code is inspired by similar code > + * in migrate_vma_check_page(), can_split_folio() and > + * folio_migrate_mapping() > + */ > +static inline bool is_pinned_page(struct page *page) ... yet another reminder this file hasn't been converted to folios :-( This part is particularly hard because we don't have a refcount on the page yet, so it may be allocated or freed while we're looking at it which means we can't use folios _here_ because the Tail flag may get set which would cause the folio code to drop BUGs all over us. > +{ > + unsigned long extra_refs; > + > + /* anonymous page can have extra ref from page cache */ > + if (page_mapping(page)) We already did the work of calling page_mapping() in the caller. Probably best to pass it in here. > + extra_refs = 1 + page_has_private(page); page_has_private() is wrong. That's for determining if we need to call the release function. Filesystems don't increment the refcount when they set PG_private_2. This should just be PagePrivate(). > + else > + extra_refs = PageSwapCache(page) ? 1 : 0; > + > + /* > + * This is an admittedly racy check but good enough to determine > + * if a page should be isolated > + */ > + if ((page_count(page) - extra_refs) > page_mapcount(page)) page_count() includes a hidden call to compound_head(); you probably meant page_ref_count() here. > /* > - * Migration will fail if an anonymous page is pinned in memory, > - * so avoid taking lru_lock and isolating it unnecessarily in an > - * admittedly racy check. > + * Migration will fail if a page is pinned in memory, > + * so avoid taking lru_lock and isolating it unnecessarily > */ > mapping = page_mapping(page); > - if (!mapping && (page_count(page) - 1) > total_mapcount(page)) > + if (is_pinned_page(page)) "pinned" now has two meanings when applied to pages, alas. Better to say "If there are extra references to this page beyond those from the page/swap cache and page tables". So it's probably also unwise to call it is_pinned_page(). Maybe if (page_extra_refcounts(page)) ?
Hi Matthew, Thanks for the review. On 5/12/23 09:53, Matthew Wilcox wrote: > On Thu, May 11, 2023 at 10:55:16AM -0600, Khalid Aziz wrote: >> +/* >> + * Check if this base page should be skipped from isolation because >> + * it is pinned. This function is called for regular pages only, and not >> + * for THP or hugetlbfs pages. This code is inspired by similar code >> + * in migrate_vma_check_page(), can_split_folio() and >> + * folio_migrate_mapping() >> + */ >> +static inline bool is_pinned_page(struct page *page) > > ... yet another reminder this file hasn't been converted to folios :-( > This part is particularly hard because we don't have a refcount on the > page yet, so it may be allocated or freed while we're looking at it > which means we can't use folios _here_ because the Tail flag may get > set which would cause the folio code to drop BUGs all over us. > >> +{ >> + unsigned long extra_refs; >> + >> + /* anonymous page can have extra ref from page cache */ >> + if (page_mapping(page)) > > We already did the work of calling page_mapping() in the caller. > Probably best to pass it in here. That makes sense. I will change that. > >> + extra_refs = 1 + page_has_private(page); > > page_has_private() is wrong. That's for determining if we need to call > the release function. Filesystems don't increment the refcount when > they set PG_private_2. This should just be PagePrivate(). I will fix that. > >> + else >> + extra_refs = PageSwapCache(page) ? 1 : 0; >> + >> + /* >> + * This is an admittedly racy check but good enough to determine >> + * if a page should be isolated >> + */ >> + if ((page_count(page) - extra_refs) > page_mapcount(page)) > > page_count() includes a hidden call to compound_head(); you probably > meant page_ref_count() here. You are right. > >> /* >> - * Migration will fail if an anonymous page is pinned in memory, >> - * so avoid taking lru_lock and isolating it unnecessarily in an >> - * admittedly racy check. >> + * Migration will fail if a page is pinned in memory, >> + * so avoid taking lru_lock and isolating it unnecessarily >> */ >> mapping = page_mapping(page); >> - if (!mapping && (page_count(page) - 1) > total_mapcount(page)) >> + if (is_pinned_page(page)) > > "pinned" now has two meanings when applied to pages, alas. Better to > say "If there are extra references to this page beyond those from the > page/swap cache and page tables". > > So it's probably also unwise to call it is_pinned_page(). Maybe > > if (page_extra_refcounts(page)) ? I like that better. I will make these modifications. Thanks, Khalid
diff --git a/mm/compaction.c b/mm/compaction.c index 5a9501e0ae01..d1371fd75391 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -764,6 +764,32 @@ static bool too_many_isolated(pg_data_t *pgdat) return too_many; } +/* + * Check if this base page should be skipped from isolation because + * it is pinned. This function is called for regular pages only, and not + * for THP or hugetlbfs pages. This code is inspired by similar code + * in migrate_vma_check_page(), can_split_folio() and + * folio_migrate_mapping() + */ +static inline bool is_pinned_page(struct page *page) +{ + unsigned long extra_refs; + + /* anonymous page can have extra ref from page cache */ + if (page_mapping(page)) + extra_refs = 1 + page_has_private(page); + else + extra_refs = PageSwapCache(page) ? 1 : 0; + + /* + * This is an admittedly racy check but good enough to determine + * if a page should be isolated + */ + if ((page_count(page) - extra_refs) > page_mapcount(page)) + return true; + return false; +} + /** * isolate_migratepages_block() - isolate all migrate-able pages within * a single pageblock @@ -992,12 +1018,11 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn, goto isolate_fail; /* - * Migration will fail if an anonymous page is pinned in memory, - * so avoid taking lru_lock and isolating it unnecessarily in an - * admittedly racy check. + * Migration will fail if a page is pinned in memory, + * so avoid taking lru_lock and isolating it unnecessarily */ mapping = page_mapping(page); - if (!mapping && (page_count(page) - 1) > total_mapcount(page)) + if (is_pinned_page(page)) goto isolate_fail_put; /*
Pinned pages can not be migrated. Currently as isolate_migratepages_block() scans pages for compaction, it skips any pinned anonymous pages. All pinned pages should be skipped and not just the anonymous pinned pages. This patch adds a check for pinned page by comparing its refcount with mapcount and accounts for possible extra refcounts. This was seen as a real issue on a customer workload where a large number of pages were pinned by vfio on the host and any attempts to allocate hugepages resulted in significant amount of cpu time spent in either direct compaction or in kcompatd scanning vfio pinned pages over and over again that can not be migrated. Signed-off-by: Khalid Aziz <khalid.aziz@oracle.com> Suggested-by: Steve Sistare <steven.sistare@oracle.com> --- mm/compaction.c | 33 +++++++++++++++++++++++++++++---- 1 file changed, 29 insertions(+), 4 deletions(-)