Message ID | 20240415081220.3246839-2-wangkefeng.wang@huawei.com (mailing list archive)
---|---
State | New
Series | mm: allow more high-order pages stored on PCP lists
On 2024/4/15 16:12, Kefeng Wang wrote:
> Both the file pages and anonymous pages support large folio, high-order
> pages except HPAGE_PMD_ORDER(PMD_SHIFT - PAGE_SHIFT) will be allocated
> frequently which will increase the zone lock contention, allow high-order
> pages on pcp lists could alleviate the big zone lock contention, in order
> to allows high-orders(PAGE_ALLOC_COSTLY_ORDER, HPAGE_PMD_ORDER) to be
> stored on the per-cpu lists, similar with PMD_ORDER pages, more lists is
> added in struct per_cpu_pages (one list each high-order pages), also a
> new PCP_MAX_ORDER instead of HPAGE_PMD_ORDER is added in mmzone.h.
>
> But as commit 44042b449872 ("mm/page_alloc: allow high-order pages to be
> stored on the per-cpu lists") pointed, it may not win in all the scenes,
> so this don't allow higher-order pages to be added to PCP list, the next
> will add a control to enable or disable it.
>
> The struct per_cpu_pages increases in size from 256(4 cache lines) to
> 320 bytes (5 cache lines) on arm64 with defconfig.
>
> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
> ---
>  include/linux/mmzone.h |  4 +++-
>  mm/page_alloc.c        | 10 +++++-----
>  2 files changed, 8 insertions(+), 6 deletions(-)
>
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index c11b7cde81ef..c745e2f1a0f2 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -657,11 +657,13 @@ enum zone_watermarks {
>   * failures.
>   */
>  #ifdef CONFIG_TRANSPARENT_HUGEPAGE
> -#define NR_PCP_THP 1
> +#define PCP_MAX_ORDER (PMD_SHIFT - PAGE_SHIFT)
> +#define NR_PCP_THP (PCP_MAX_ORDER - PAGE_ALLOC_COSTLY_ORDER)
>  #else
>  #define NR_PCP_THP 0
>  #endif
>  #define NR_LOWORDER_PCP_LISTS (MIGRATE_PCPTYPES * (PAGE_ALLOC_COSTLY_ORDER + 1))
> +#define HIGHORDER_PCP_LIST_INDEX (NR_LOWORDER_PCP_LISTS - (PAGE_ALLOC_COSTLY_ORDER + 1))

Thanks for starting the discussion.
I am concerned that mixing mTHPs of different migratetypes in a single pcp list might lead to fragmentation issues, potentially causing unmovable mTHPs to occupy movable pageblocks, which would reduce compaction efficiency. I am also not sure whether it is suitable to add more pcp lists; maybe we can just add the most commonly used mTHP size as a start, for example 64K?
On 2024/4/15 19:41, Baolin Wang wrote:
>
> On 2024/4/15 16:12, Kefeng Wang wrote:
>> [...]
>
> Thanks for starting the discussion.
> I am concerned that mixing mTHPs of different migratetypes in a single
> pcp list might lead to fragmentation issues, potentially causing
> unmovable mTHPs to occupy movable pageblocks, which would reduce
> compaction efficiency.

Yes, that is why this is not enabled by default.

> But also not sure if it is suitable to add more pcp lists, maybe we can
> just add the most commonly used mTHP as a start, for example: 64K?

Do you mean adding only one list, for 64K? I thought about that before, but it does not hold for all cases: a different order may be the most used one in different tests, which is why only the specified high orders are enabled through a pcp_enabled sysfs interface. But it is certain that we need to find a case that shows an improvement when a high order (e.g. order 4 = 64K) is kept on the pcp lists.
```diff
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index c11b7cde81ef..c745e2f1a0f2 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -657,11 +657,13 @@ enum zone_watermarks {
  * failures.
  */
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
-#define NR_PCP_THP 1
+#define PCP_MAX_ORDER (PMD_SHIFT - PAGE_SHIFT)
+#define NR_PCP_THP (PCP_MAX_ORDER - PAGE_ALLOC_COSTLY_ORDER)
 #else
 #define NR_PCP_THP 0
 #endif
 #define NR_LOWORDER_PCP_LISTS (MIGRATE_PCPTYPES * (PAGE_ALLOC_COSTLY_ORDER + 1))
+#define HIGHORDER_PCP_LIST_INDEX (NR_LOWORDER_PCP_LISTS - (PAGE_ALLOC_COSTLY_ORDER + 1))
 #define NR_PCP_LISTS (NR_LOWORDER_PCP_LISTS + NR_PCP_THP)
 
 #define min_wmark_pages(z) (z->_watermark[WMARK_MIN] + z->watermark_boost)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index b51becf03d1e..2248afc7b73a 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -506,8 +506,8 @@ static inline unsigned int order_to_pindex(int migratetype, int order)
 {
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 	if (order > PAGE_ALLOC_COSTLY_ORDER) {
-		VM_BUG_ON(order != HPAGE_PMD_ORDER);
-		return NR_LOWORDER_PCP_LISTS;
+		VM_BUG_ON(order > PCP_MAX_ORDER);
+		return order + HIGHORDER_PCP_LIST_INDEX;
 	}
 #else
 	VM_BUG_ON(order > PAGE_ALLOC_COSTLY_ORDER);
@@ -521,8 +521,8 @@ static inline int pindex_to_order(unsigned int pindex)
 	int order = pindex / MIGRATE_PCPTYPES;
 
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
-	if (pindex == NR_LOWORDER_PCP_LISTS)
-		order = HPAGE_PMD_ORDER;
+	if (pindex >= NR_LOWORDER_PCP_LISTS)
+		order = pindex - HIGHORDER_PCP_LIST_INDEX;
 #else
 	VM_BUG_ON(order > PAGE_ALLOC_COSTLY_ORDER);
 #endif
@@ -535,7 +535,7 @@ static inline bool pcp_allowed_order(unsigned int order)
 	if (order <= PAGE_ALLOC_COSTLY_ORDER)
 		return true;
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
-	if (order == HPAGE_PMD_ORDER)
+	if (order == PCP_MAX_ORDER)
 		return true;
 #endif
 	return false;
```
Both file pages and anonymous pages support large folios, so high-order pages other than HPAGE_PMD_ORDER (PMD_SHIFT - PAGE_SHIFT) will be allocated frequently, which increases zone lock contention; allowing high-order pages on the pcp lists could alleviate that contention. To allow the high orders (PAGE_ALLOC_COSTLY_ORDER, HPAGE_PMD_ORDER) to be stored on the per-cpu lists, similar to PMD_ORDER pages, more lists are added to struct per_cpu_pages (one list per high order), and a new PCP_MAX_ORDER is added in mmzone.h in place of HPAGE_PMD_ORDER.

But as commit 44042b449872 ("mm/page_alloc: allow high-order pages to be stored on the per-cpu lists") pointed out, this may not be a win in all scenarios, so this patch does not yet allow the higher orders to be added to the PCP lists; the next patch will add a control to enable or disable it.

The struct per_cpu_pages increases in size from 256 bytes (4 cache lines) to 320 bytes (5 cache lines) on arm64 with defconfig.

Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
---
 include/linux/mmzone.h |  4 +++-
 mm/page_alloc.c        | 10 +++++-----
 2 files changed, 8 insertions(+), 6 deletions(-)