Message ID | 1734436004-1212-1-git-send-email-yangge1116@126.com (mailing list archive)
---|---
State | New
Series | [V7] mm, compaction: don't use ALLOC_CMA for unmovable allocations
Hello Yangge,

On Tue, Dec 17, 2024 at 07:46:44PM +0800, yangge1116@126.com wrote:
> From: yangge <yangge1116@126.com>
>
> Since commit 984fdba6a32e ("mm, compaction: use proper alloc_flags in __compaction_suitable()") allowed compaction to proceed when the free pages required for compaction reside in CMA pageblocks, it's possible that __compaction_suitable() always returns true, and in some cases that's not acceptable.
>
> There are 4 NUMA nodes on my machine, and each NUMA node has 32GB of memory. I have configured 16GB of CMA memory on each NUMA node, and starting a 32GB virtual machine with device passthrough is extremely slow, taking almost an hour.
>
> During start-up of the virtual machine, it calls pin_user_pages_remote(..., FOLL_LONGTERM, ...) to allocate memory. Long-term GUP cannot allocate memory from the CMA area, so at most 16GB of non-CMA memory on a NUMA node can be used as virtual machine memory. Since there is 16GB of free CMA memory on the NUMA node, the order-0 watermark for compaction is always met, so __compaction_suitable() always returns true, even if the node is unable to allocate non-CMA memory for the virtual machine.
>
> For costly allocations, because __compaction_suitable() always returns true, __alloc_pages_slowpath() can't exit at the appropriate place, resulting in excessively long virtual machine startup times. Call trace:
> __alloc_pages_slowpath
>     if (compact_result == COMPACT_SKIPPED ||
>         compact_result == COMPACT_DEFERRED)
>         goto nopage; // should exit __alloc_pages_slowpath() from here
>
> Other unmovable allocations, like dma_buf, which can be large in a Linux system, are also unable to allocate memory from CMA, and these allocations suffer from the same problem described above. In order to quickly fall back to a remote node, we should remove ALLOC_CMA both in __compaction_suitable() and __isolate_free_page() for unmovable allocations. After this fix, starting a 32GB virtual machine with device passthrough takes only a few seconds.

The symptom is obviously bad, but I don't understand this fix.

The reason we do ALLOC_CMA is that, even for unmovable allocations, you can create space in non-CMA memory by moving migratable pages over to CMA space. This is not a property we want to lose. But I also don't see how it would interfere with your scenario.

There is the compaction_suitable() check in should_compact_retry(), but that only applies to COMPACT_SKIPPED. IOW, it should only happen when compaction_suitable() just now returned false. IOW, a race condition. Which is why it's also not subject to limited retries.

What's the exact condition that traps the allocator inside the loop?
On 2024/12/17 23:55, Johannes Weiner wrote:
> Hello Yangge,
>
> On Tue, Dec 17, 2024 at 07:46:44PM +0800, yangge1116@126.com wrote:
>> From: yangge <yangge1116@126.com>
>>
>> Since commit 984fdba6a32e ("mm, compaction: use proper alloc_flags in __compaction_suitable()") allowed compaction to proceed when the free pages required for compaction reside in CMA pageblocks, it's possible that __compaction_suitable() always returns true, and in some cases that's not acceptable.
>>
>> There are 4 NUMA nodes on my machine, and each NUMA node has 32GB of memory. I have configured 16GB of CMA memory on each NUMA node, and starting a 32GB virtual machine with device passthrough is extremely slow, taking almost an hour.
>>
>> During start-up of the virtual machine, it calls pin_user_pages_remote(..., FOLL_LONGTERM, ...) to allocate memory. Long-term GUP cannot allocate memory from the CMA area, so at most 16GB of non-CMA memory on a NUMA node can be used as virtual machine memory. Since there is 16GB of free CMA memory on the NUMA node, the order-0 watermark for compaction is always met, so __compaction_suitable() always returns true, even if the node is unable to allocate non-CMA memory for the virtual machine.
>>
>> For costly allocations, because __compaction_suitable() always returns true, __alloc_pages_slowpath() can't exit at the appropriate place, resulting in excessively long virtual machine startup times. Call trace:
>> __alloc_pages_slowpath
>>     if (compact_result == COMPACT_SKIPPED ||
>>         compact_result == COMPACT_DEFERRED)
>>         goto nopage; // should exit __alloc_pages_slowpath() from here
>>
>> Other unmovable allocations, like dma_buf, which can be large in a Linux system, are also unable to allocate memory from CMA, and these allocations suffer from the same problem described above. In order to quickly fall back to a remote node, we should remove ALLOC_CMA both in __compaction_suitable() and __isolate_free_page() for unmovable allocations. After this fix, starting a 32GB virtual machine with device passthrough takes only a few seconds.
>
> The symptom is obviously bad, but I don't understand this fix.
>
> The reason we do ALLOC_CMA is that, even for unmovable allocations, you can create space in non-CMA memory by moving migratable pages over to CMA space. This is not a property we want to lose. But I also don't see how it would interfere with your scenario.

The __alloc_pages_slowpath() function was originally intended to exit at place 1, but because __compaction_suitable() always returns true, it exits at place 2 instead. This ultimately leads to a significantly longer execution time for __alloc_pages_slowpath().

Call trace:
__alloc_pages_slowpath
    if (compact_result == COMPACT_SKIPPED ||
        compact_result == COMPACT_DEFERRED)
        goto nopage;                  // place 1
    __alloc_pages_direct_reclaim()    // reclaim is very expensive
    __alloc_pages_direct_compact()
        if (gfp_mask & __GFP_NORETRY)
            goto nopage;              // place 2

Every time a memory allocation goes through the slower path above, it ultimately leads to significantly longer virtual machine startup times.

> There is the compaction_suitable() check in should_compact_retry(), but that only applies to COMPACT_SKIPPED. IOW, it should only happen when compaction_suitable() just now returned false. IOW, a race condition. Which is why it's also not subject to limited retries.
>
> What's the exact condition that traps the allocator inside the loop?

The should_compact_retry() function was not executed; the slowness here was mainly due to the execution of __alloc_pages_direct_reclaim().
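For reference, the control flow being described can be condensed into the following self-contained model (plain C, compilable outside the kernel). The helper names and numbers are stand-ins chosen to mirror the scenario above, not the actual mm/page_alloc.c code: when the suitability check keeps counting free CMA pages, a costly __GFP_NORETRY request only gives up at place 2, after paying for direct reclaim.

#include <stdbool.h>
#include <stdio.h>

/* Stand-ins for the kernel's compact_result values used at "place 1". */
enum compact_result { COMPACT_SKIPPED, COMPACT_DEFERRED, COMPACT_COMPLETE };

/*
 * Models __compaction_suitable(): with free CMA pages counted towards the
 * order-0 watermark, this keeps returning true on the reporter's node even
 * though non-CMA memory is exhausted.  Units are KiB, values illustrative.
 */
static bool compaction_suitable_model(long free_noncma_kb, long free_cma_kb,
				      bool count_cma)
{
	long counted_kb = free_noncma_kb + (count_cma ? free_cma_kb : 0);

	return counted_kb > 64 * 1024;	/* stand-in for watermark + compact_gap() */
}

/* Models the costly __GFP_NORETRY branch of __alloc_pages_slowpath(). */
static const char *slowpath_model(bool count_cma)
{
	enum compact_result r;

	/* async direct compaction; SKIPPED means "not enough order-0 pages" */
	r = compaction_suitable_model(512, 16L * 1024 * 1024, count_cma) ?
			COMPACT_COMPLETE : COMPACT_SKIPPED;

	if (r == COMPACT_SKIPPED || r == COMPACT_DEFERRED)
		return "place 1: fail fast, caller falls back to another node";

	/* otherwise: direct reclaim (expensive), then another compaction try */
	return "place 2: fail only after paying for direct reclaim";
}

int main(void)
{
	printf("free CMA counted:     %s\n", slowpath_model(true));
	printf("free CMA not counted: %s\n", slowpath_model(false));
	return 0;
}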
On Wed, Dec 18, 2024 at 10:15:06AM +0800, Ge Yang wrote:
> On 2024/12/17 23:55, Johannes Weiner wrote:
>> Hello Yangge,
>>
>> On Tue, Dec 17, 2024 at 07:46:44PM +0800, yangge1116@126.com wrote:
>>> From: yangge <yangge1116@126.com>
>>>
>>> Since commit 984fdba6a32e ("mm, compaction: use proper alloc_flags in __compaction_suitable()") allowed compaction to proceed when the free pages required for compaction reside in CMA pageblocks, it's possible that __compaction_suitable() always returns true, and in some cases that's not acceptable.
>>>
>>> There are 4 NUMA nodes on my machine, and each NUMA node has 32GB of memory. I have configured 16GB of CMA memory on each NUMA node, and starting a 32GB virtual machine with device passthrough is extremely slow, taking almost an hour.
>>>
>>> During start-up of the virtual machine, it calls pin_user_pages_remote(..., FOLL_LONGTERM, ...) to allocate memory. Long-term GUP cannot allocate memory from the CMA area, so at most 16GB of non-CMA memory on a NUMA node can be used as virtual machine memory. Since there is 16GB of free CMA memory on the NUMA node, the order-0 watermark for compaction is always met, so __compaction_suitable() always returns true, even if the node is unable to allocate non-CMA memory for the virtual machine.
>>>
>>> For costly allocations, because __compaction_suitable() always returns true, __alloc_pages_slowpath() can't exit at the appropriate place, resulting in excessively long virtual machine startup times. Call trace:
>>> __alloc_pages_slowpath
>>>     if (compact_result == COMPACT_SKIPPED ||
>>>         compact_result == COMPACT_DEFERRED)
>>>         goto nopage; // should exit __alloc_pages_slowpath() from here
>>>
>>> Other unmovable allocations, like dma_buf, which can be large in a Linux system, are also unable to allocate memory from CMA, and these allocations suffer from the same problem described above. In order to quickly fall back to a remote node, we should remove ALLOC_CMA both in __compaction_suitable() and __isolate_free_page() for unmovable allocations. After this fix, starting a 32GB virtual machine with device passthrough takes only a few seconds.
>>
>> The symptom is obviously bad, but I don't understand this fix.
>>
>> The reason we do ALLOC_CMA is that, even for unmovable allocations, you can create space in non-CMA memory by moving migratable pages over to CMA space. This is not a property we want to lose. But I also don't see how it would interfere with your scenario.
>
> The __alloc_pages_slowpath() function was originally intended to exit at place 1, but because __compaction_suitable() always returns true, it exits at place 2 instead. This ultimately leads to a significantly longer execution time for __alloc_pages_slowpath().
>
> Call trace:
> __alloc_pages_slowpath
>     if (compact_result == COMPACT_SKIPPED ||
>         compact_result == COMPACT_DEFERRED)
>         goto nopage;                  // place 1
>     __alloc_pages_direct_reclaim()    // reclaim is very expensive
>     __alloc_pages_direct_compact()
>         if (gfp_mask & __GFP_NORETRY)
>             goto nopage;              // place 2
>
> Every time a memory allocation goes through the slower path above, it ultimately leads to significantly longer virtual machine startup times.

I still don't follow. Why do you want the allocation to fail?

The changelog says this is in order to fall back quickly to other nodes. But there is a full node walk in get_page_from_freelist() before the allocator even engages reclaim. There is something missing from the story still.

But regardless - surely you can see that we can't make the allocator generally weaker on large requests just because they happen to be optional in your specific case?

>> There is the compaction_suitable() check in should_compact_retry(), but that only applies to COMPACT_SKIPPED. IOW, it should only happen when compaction_suitable() just now returned false. IOW, a race condition. Which is why it's also not subject to limited retries.
>>
>> What's the exact condition that traps the allocator inside the loop?
>
> The should_compact_retry() function was not executed; the slowness here was mainly due to the execution of __alloc_pages_direct_reclaim().

Ok.
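For reference, the should_compact_retry() behavior mentioned above can be summarized with this sketch (plain C; the helper bodies are stand-ins, so treat it as an illustration of the logic rather than the upstream function): only the COMPACT_SKIPPED case re-asks the compaction_suitable() question, and that case is not charged against the bounded retry budget.

#include <stdbool.h>

enum compact_result { COMPACT_SKIPPED, COMPACT_DEFERRED, COMPACT_COMPLETE };

#define MAX_COMPACT_RETRIES 16	/* same cap the kernel uses */

/*
 * Stand-in for compaction_zonelist_suitable(): would compaction have enough
 * order-0 pages to work with if reclaim freed everything reclaimable?
 */
static bool compaction_zonelist_suitable_model(void)
{
	return false;	/* illustrative */
}

/* Sketch of the retry decision for a costly allocation. */
static bool should_compact_retry_model(enum compact_result result,
				       int *compaction_retries)
{
	/*
	 * Compaction was skipped for lack of free base pages: retry only if
	 * reclaim could plausibly change that.  This re-asks the same
	 * "suitable?" question that just produced COMPACT_SKIPPED, which is
	 * why it is not charged to the bounded retry counter below.
	 */
	if (result == COMPACT_SKIPPED)
		return compaction_zonelist_suitable_model();

	/* Other outcomes are (roughly) bounded by a retry/priority budget. */
	return ++(*compaction_retries) <= MAX_COMPACT_RETRIES;
}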
On 2024/12/18 11:29, Johannes Weiner wrote:
> On Wed, Dec 18, 2024 at 10:15:06AM +0800, Ge Yang wrote:
>> On 2024/12/17 23:55, Johannes Weiner wrote:
>>> Hello Yangge,
>>>
>>> On Tue, Dec 17, 2024 at 07:46:44PM +0800, yangge1116@126.com wrote:
>>>> From: yangge <yangge1116@126.com>
>>>>
>>>> Since commit 984fdba6a32e ("mm, compaction: use proper alloc_flags in __compaction_suitable()") allowed compaction to proceed when the free pages required for compaction reside in CMA pageblocks, it's possible that __compaction_suitable() always returns true, and in some cases that's not acceptable.
>>>>
>>>> There are 4 NUMA nodes on my machine, and each NUMA node has 32GB of memory. I have configured 16GB of CMA memory on each NUMA node, and starting a 32GB virtual machine with device passthrough is extremely slow, taking almost an hour.
>>>>
>>>> During start-up of the virtual machine, it calls pin_user_pages_remote(..., FOLL_LONGTERM, ...) to allocate memory. Long-term GUP cannot allocate memory from the CMA area, so at most 16GB of non-CMA memory on a NUMA node can be used as virtual machine memory. Since there is 16GB of free CMA memory on the NUMA node, the order-0 watermark for compaction is always met, so __compaction_suitable() always returns true, even if the node is unable to allocate non-CMA memory for the virtual machine.
>>>>
>>>> For costly allocations, because __compaction_suitable() always returns true, __alloc_pages_slowpath() can't exit at the appropriate place, resulting in excessively long virtual machine startup times. Call trace:
>>>> __alloc_pages_slowpath
>>>>     if (compact_result == COMPACT_SKIPPED ||
>>>>         compact_result == COMPACT_DEFERRED)
>>>>         goto nopage; // should exit __alloc_pages_slowpath() from here
>>>>
>>>> Other unmovable allocations, like dma_buf, which can be large in a Linux system, are also unable to allocate memory from CMA, and these allocations suffer from the same problem described above. In order to quickly fall back to a remote node, we should remove ALLOC_CMA both in __compaction_suitable() and __isolate_free_page() for unmovable allocations. After this fix, starting a 32GB virtual machine with device passthrough takes only a few seconds.
>>>
>>> The symptom is obviously bad, but I don't understand this fix.
>>>
>>> The reason we do ALLOC_CMA is that, even for unmovable allocations, you can create space in non-CMA memory by moving migratable pages over to CMA space. This is not a property we want to lose. But I also don't see how it would interfere with your scenario.
>>
>> The __alloc_pages_slowpath() function was originally intended to exit at place 1, but because __compaction_suitable() always returns true, it exits at place 2 instead. This ultimately leads to a significantly longer execution time for __alloc_pages_slowpath().
>>
>> Call trace:
>> __alloc_pages_slowpath
>>     if (compact_result == COMPACT_SKIPPED ||
>>         compact_result == COMPACT_DEFERRED)
>>         goto nopage;                  // place 1
>>     __alloc_pages_direct_reclaim()    // reclaim is very expensive
>>     __alloc_pages_direct_compact()
>>         if (gfp_mask & __GFP_NORETRY)
>>             goto nopage;              // place 2
>>
>> Every time a memory allocation goes through the slower path above, it ultimately leads to significantly longer virtual machine startup times.
>
> I still don't follow. Why do you want the allocation to fail?

pin_user_pages_remote(..., FOLL_LONGTERM, ...) first attempts to allocate THP only on the local node, and then falls back to remote NUMA nodes if the local allocation fails. For details, see alloc_pages_mpol():

static struct page *alloc_pages_mpol()
{
	page = __alloc_frozen_pages_noprof(__GFP_THISNODE, ...); // 1, try to allocate THP only on the local node
	if (page || !(gfp & __GFP_DIRECT_RECLAIM))
		return page;
	page = __alloc_frozen_pages_noprof(gfp, order, nid, nodemask); // 2, fall back to remote NUMA nodes
}

> The changelog says this is in order to fall back quickly to other nodes. But there is a full node walk in get_page_from_freelist() before the allocator even engages reclaim. There is something missing from the story still.
>
> But regardless - surely you can see that we can't make the allocator generally weaker on large requests just because they happen to be optional in your specific case?

First, try to allocate THP on the local node as much as possible, and then fall back to a remote node if the local allocation fails. This is the default memory allocation strategy when starting virtual machines.

>>> There is the compaction_suitable() check in should_compact_retry(), but that only applies to COMPACT_SKIPPED. IOW, it should only happen when compaction_suitable() just now returned false. IOW, a race condition. Which is why it's also not subject to limited retries.
>>>
>>> What's the exact condition that traps the allocator inside the loop?
>>
>> The should_compact_retry() function was not executed; the slowness here was mainly due to the execution of __alloc_pages_direct_reclaim().
>
> Ok.
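For context on why the local attempt cannot simply be satisfied from the 16GB of free CMA: long-term pins refuse pages in CMA pageblocks and migrate them out before pinning. Below is a simplified sketch of that check, modeled on folio_is_longterm_pinnable() in recent kernels; the secretmem and device-memory cases are trimmed, so treat it as an approximation rather than the exact upstream helper.

/*
 * Simplified sketch: a folio in a CMA or isolated pageblock, or in
 * ZONE_MOVABLE, is not long-term pinnable.  GUP migrates any folio that
 * fails this check before taking a FOLL_LONGTERM pin, so the pinned VM
 * memory has to fit in the node's non-CMA portion.
 */
static bool longterm_pinnable_sketch(struct folio *folio)
{
	int mt = folio_migratetype(folio);

	/* Long-term pins must not make CMA or isolated pageblocks unmovable. */
	if (mt == MIGRATE_CMA || mt == MIGRATE_ISOLATE)
		return false;

	/* ZONE_MOVABLE has the same "must stay migratable" guarantee. */
	return !folio_is_zone_movable(folio);
}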
On 2024/12/17 23:55, Johannes Weiner wrote:
> Hello Yangge,
>
> On Tue, Dec 17, 2024 at 07:46:44PM +0800, yangge1116@126.com wrote:
>> From: yangge <yangge1116@126.com>
>>
>> Since commit 984fdba6a32e ("mm, compaction: use proper alloc_flags in __compaction_suitable()") allowed compaction to proceed when the free pages required for compaction reside in CMA pageblocks, it's possible that __compaction_suitable() always returns true, and in some cases that's not acceptable.
>>
>> There are 4 NUMA nodes on my machine, and each NUMA node has 32GB of memory. I have configured 16GB of CMA memory on each NUMA node, and starting a 32GB virtual machine with device passthrough is extremely slow, taking almost an hour.
>>
>> During start-up of the virtual machine, it calls pin_user_pages_remote(..., FOLL_LONGTERM, ...) to allocate memory. Long-term GUP cannot allocate memory from the CMA area, so at most 16GB of non-CMA memory on a NUMA node can be used as virtual machine memory. Since there is 16GB of free CMA memory on the NUMA node, the order-0 watermark for compaction is always met, so __compaction_suitable() always returns true, even if the node is unable to allocate non-CMA memory for the virtual machine.
>>
>> For costly allocations, because __compaction_suitable() always returns true, __alloc_pages_slowpath() can't exit at the appropriate place, resulting in excessively long virtual machine startup times. Call trace:
>> __alloc_pages_slowpath
>>     if (compact_result == COMPACT_SKIPPED ||
>>         compact_result == COMPACT_DEFERRED)
>>         goto nopage; // should exit __alloc_pages_slowpath() from here
>>
>> Other unmovable allocations, like dma_buf, which can be large in a Linux system, are also unable to allocate memory from CMA, and these allocations suffer from the same problem described above. In order to quickly fall back to a remote node, we should remove ALLOC_CMA both in __compaction_suitable() and __isolate_free_page() for unmovable allocations. After this fix, starting a 32GB virtual machine with device passthrough takes only a few seconds.
>
> The symptom is obviously bad, but I don't understand this fix.
>
> The reason we do ALLOC_CMA is that, even for unmovable allocations, you can create space in non-CMA memory by moving migratable pages over to CMA space. This is not a property we want to lose. But I also don't

Good point. I missed that and I need to withdraw my reviewed tag. Thanks.
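The ALLOC_CMA point can be made concrete with a small self-contained model of the free-page accounting behind a __zone_watermark_ok()-style check (illustrative only; it uses the reporter's numbers of 16GB free CMA with non-CMA nearly exhausted, not the kernel's actual function): with ALLOC_CMA the check passes on free CMA pages alone, without it those pages are subtracted first.

#include <stdbool.h>
#include <stdio.h>

#define ALLOC_CMA 0x80U		/* flag bit; the exact value is irrelevant here */

/*
 * Simplified model: without ALLOC_CMA, free pages sitting in CMA pageblocks
 * are subtracted before the watermark comparison.  Values are 4KiB pages.
 */
static bool watermark_ok_model(unsigned long free_pages,
			       unsigned long free_cma_pages,
			       unsigned long watermark,
			       unsigned int alloc_flags)
{
	unsigned long usable = free_pages;

	if (!(alloc_flags & ALLOC_CMA))
		usable -= free_cma_pages;	/* CMA cannot satisfy this request */

	return usable > watermark;
}

int main(void)
{
	unsigned long free_cma = 16UL << 18;		/* 16GB in 4KiB pages */
	unsigned long free_all = free_cma + 1024;	/* non-CMA nearly empty */
	unsigned long wmark = 16384;			/* watermark + compact_gap() */

	printf("with ALLOC_CMA:    %d\n",
	       watermark_ok_model(free_all, free_cma, wmark, ALLOC_CMA));
	printf("without ALLOC_CMA: %d\n",
	       watermark_ok_model(free_all, free_cma, wmark, 0));
	return 0;
}

The first check passes purely on free CMA pages and the second fails; that asymmetry is what lets __compaction_suitable() keep reporting the node as compactable for allocations that can never use those CMA pages.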
diff --git a/include/linux/compaction.h b/include/linux/compaction.h
index e947764..b4c3ac3 100644
--- a/include/linux/compaction.h
+++ b/include/linux/compaction.h
@@ -90,7 +90,8 @@ extern enum compact_result try_to_compact_pages(gfp_t gfp_mask,
 		struct page **page);
 extern void reset_isolation_suitable(pg_data_t *pgdat);
 extern bool compaction_suitable(struct zone *zone, int order,
-					int highest_zoneidx);
+					int highest_zoneidx,
+					unsigned int alloc_flags);
 
 extern void compaction_defer_reset(struct zone *zone, int order,
 				bool alloc_success);
@@ -108,7 +109,8 @@ static inline void reset_isolation_suitable(pg_data_t *pgdat)
 }
 
 static inline bool compaction_suitable(struct zone *zone, int order,
-						int highest_zoneidx)
+						int highest_zoneidx,
+						unsigned int alloc_flags)
 {
 	return false;
 }
diff --git a/mm/compaction.c b/mm/compaction.c
index 07bd227..223f2da 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -655,7 +655,7 @@ static unsigned long isolate_freepages_block(struct compact_control *cc,
 
 		/* Found a free page, will break it into order-0 pages */
 		order = buddy_order(page);
-		isolated = __isolate_free_page(page, order);
+		isolated = __isolate_free_page(page, order, cc->alloc_flags);
 		if (!isolated)
 			break;
 		set_page_private(page, order);
@@ -1634,7 +1634,7 @@ static void fast_isolate_freepages(struct compact_control *cc)
 
 		/* Isolate the page if available */
 		if (page) {
-			if (__isolate_free_page(page, order)) {
+			if (__isolate_free_page(page, order, cc->alloc_flags)) {
 				set_page_private(page, order);
 				nr_isolated = 1 << order;
 				nr_scanned += nr_isolated - 1;
@@ -2381,6 +2381,7 @@ static enum compact_result compact_finished(struct compact_control *cc)
 
 static bool __compaction_suitable(struct zone *zone, int order,
 				  int highest_zoneidx,
+				  unsigned int alloc_flags,
 				  unsigned long wmark_target)
 {
 	unsigned long watermark;
@@ -2395,25 +2396,26 @@ static bool __compaction_suitable(struct zone *zone, int order,
 	 * even if compaction succeeds.
 	 * For costly orders, we require low watermark instead of min for
 	 * compaction to proceed to increase its chances.
-	 * ALLOC_CMA is used, as pages in CMA pageblocks are considered
-	 * suitable migration targets
+	 * In addition to unmovable allocations, ALLOC_CMA is used, as pages in
+	 * CMA pageblocks are considered suitable migration targets
 	 */
 	watermark = (order > PAGE_ALLOC_COSTLY_ORDER) ?
 				low_wmark_pages(zone) : min_wmark_pages(zone);
 	watermark += compact_gap(order);
 	return __zone_watermark_ok(zone, 0, watermark, highest_zoneidx,
-				   ALLOC_CMA, wmark_target);
+				   alloc_flags & ALLOC_CMA, wmark_target);
 }
 
 /*
  * compaction_suitable: Is this suitable to run compaction on this zone now?
  */
-bool compaction_suitable(struct zone *zone, int order, int highest_zoneidx)
+bool compaction_suitable(struct zone *zone, int order, int highest_zoneidx,
+				unsigned int alloc_flags)
 {
 	enum compact_result compact_result;
 	bool suitable;
 
-	suitable = __compaction_suitable(zone, order, highest_zoneidx,
+	suitable = __compaction_suitable(zone, order, highest_zoneidx, alloc_flags,
 					 zone_page_state(zone, NR_FREE_PAGES));
 	/*
 	 * fragmentation index determines if allocation failures are due to
@@ -2474,7 +2476,7 @@ bool compaction_zonelist_suitable(struct alloc_context *ac, int order,
 		available = zone_reclaimable_pages(zone) / order;
 		available += zone_page_state_snapshot(zone, NR_FREE_PAGES);
 		if (__compaction_suitable(zone, order, ac->highest_zoneidx,
-					  available))
+					  alloc_flags, available))
 			return true;
 	}
 
@@ -2499,7 +2501,7 @@ compaction_suit_allocation_order(struct zone *zone, unsigned int order,
 			      alloc_flags))
 		return COMPACT_SUCCESS;
 
-	if (!compaction_suitable(zone, order, highest_zoneidx))
+	if (!compaction_suitable(zone, order, highest_zoneidx, alloc_flags))
 		return COMPACT_SKIPPED;
 
 	return COMPACT_CONTINUE;
@@ -2893,6 +2895,7 @@ static int compact_node(pg_data_t *pgdat, bool proactive)
 		struct compact_control cc = {
 			.order = -1,
 			.mode = proactive ? MIGRATE_SYNC_LIGHT : MIGRATE_SYNC,
+			.alloc_flags = ALLOC_CMA,
 			.ignore_skip_hint = true,
 			.whole_zone = true,
 			.gfp_mask = GFP_KERNEL,
@@ -3037,7 +3040,7 @@ static bool kcompactd_node_suitable(pg_data_t *pgdat)
 
 		ret = compaction_suit_allocation_order(zone,
 				pgdat->kcompactd_max_order,
-				highest_zoneidx, ALLOC_WMARK_MIN);
+				highest_zoneidx, ALLOC_CMA | ALLOC_WMARK_MIN);
 		if (ret == COMPACT_CONTINUE)
 			return true;
 	}
@@ -3058,6 +3061,7 @@ static void kcompactd_do_work(pg_data_t *pgdat)
 		.search_order = pgdat->kcompactd_max_order,
 		.highest_zoneidx = pgdat->kcompactd_highest_zoneidx,
 		.mode = MIGRATE_SYNC_LIGHT,
+		.alloc_flags = ALLOC_CMA | ALLOC_WMARK_MIN,
 		.ignore_skip_hint = false,
 		.gfp_mask = GFP_KERNEL,
 	};
@@ -3078,7 +3082,7 @@ static void kcompactd_do_work(pg_data_t *pgdat)
 			continue;
 
 		ret = compaction_suit_allocation_order(zone,
-				cc.order, zoneid, ALLOC_WMARK_MIN);
+				cc.order, zoneid, cc.alloc_flags);
 		if (ret != COMPACT_CONTINUE)
 			continue;
 
diff --git a/mm/internal.h b/mm/internal.h
index 3922788..6d257c8 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -662,7 +662,8 @@ static inline void clear_zone_contiguous(struct zone *zone)
 	zone->contiguous = false;
 }
 
-extern int __isolate_free_page(struct page *page, unsigned int order);
+extern int __isolate_free_page(struct page *page, unsigned int order,
+			       unsigned int alloc_flags);
 extern void __putback_isolated_page(struct page *page, unsigned int order,
 				    int mt);
 extern void memblock_free_pages(struct page *page, unsigned long pfn,
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index dde19db..1bfdca3 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2809,7 +2809,8 @@ void split_page(struct page *page, unsigned int order)
 }
 EXPORT_SYMBOL_GPL(split_page);
 
-int __isolate_free_page(struct page *page, unsigned int order)
+int __isolate_free_page(struct page *page, unsigned int order,
+			unsigned int alloc_flags)
 {
 	struct zone *zone = page_zone(page);
 	int mt = get_pageblock_migratetype(page);
@@ -2823,7 +2824,8 @@ int __isolate_free_page(struct page *page, unsigned int order)
 		 * exists.
 		 */
 		watermark = zone->_watermark[WMARK_MIN] + (1UL << order);
-		if (!zone_watermark_ok(zone, 0, watermark, 0, ALLOC_CMA))
+		if (!zone_watermark_ok(zone, 0, watermark, 0,
+				       alloc_flags & ALLOC_CMA))
 			return 0;
 	}
 
@@ -6454,6 +6456,7 @@ int alloc_contig_range_noprof(unsigned long start, unsigned long end,
 		.order = -1,
 		.zone = page_zone(pfn_to_page(start)),
 		.mode = MIGRATE_SYNC,
+		.alloc_flags = ALLOC_CMA,
 		.ignore_skip_hint = true,
 		.no_set_skip_hint = true,
 		.alloc_contig = true,
diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index c608e9d..a1f2c79 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -229,7 +229,8 @@ static void unset_migratetype_isolate(struct page *page, int migratetype)
 		buddy = find_buddy_page_pfn(page, page_to_pfn(page),
 					    order, NULL);
 		if (buddy && !is_migrate_isolate_page(buddy)) {
-			isolated_page = !!__isolate_free_page(page, order);
+			isolated_page = !!__isolate_free_page(page, order,
+							      ALLOC_CMA);
 			/*
 			 * Isolating a free page in an isolated pageblock
 			 * is expected to always work as watermarks don't
diff --git a/mm/page_reporting.c b/mm/page_reporting.c
index e4c428e..fd3813b 100644
--- a/mm/page_reporting.c
+++ b/mm/page_reporting.c
@@ -198,7 +198,7 @@ page_reporting_cycle(struct page_reporting_dev_info *prdev, struct zone *zone,
 
 		/* Attempt to pull page from list and place in scatterlist */
 		if (*offset) {
-			if (!__isolate_free_page(page, order)) {
+			if (!__isolate_free_page(page, order, ALLOC_CMA)) {
 				next = page;
 				break;
 			}
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 5e03a61..33f5b46 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -5815,7 +5815,7 @@ static inline bool should_continue_reclaim(struct pglist_data *pgdat,
 				      sc->reclaim_idx, 0))
 			return false;
 
-		if (compaction_suitable(zone, sc->order, sc->reclaim_idx))
+		if (compaction_suitable(zone, sc->order, sc->reclaim_idx, ALLOC_CMA))
 			return false;
 	}
 
@@ -6043,7 +6043,7 @@ static inline bool compaction_ready(struct zone *zone, struct scan_control *sc)
 		return true;
 
 	/* Compaction cannot yet proceed. Do reclaim. */
-	if (!compaction_suitable(zone, sc->order, sc->reclaim_idx))
+	if (!compaction_suitable(zone, sc->order, sc->reclaim_idx, ALLOC_CMA))
 		return false;
 
 	/*
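For readers tracing where the new alloc_flags argument comes from on the allocation side: ALLOC_CMA is only set for movable requests, so with this patch an unmovable allocation's compaction-suitability check stops counting free CMA pages, while kcompactd, proactive compaction and alloc_contig_range() keep ALLOC_CMA explicitly (as seen in the hunks above). Below is a sketch of that gfp-to-flag mapping, modeled on the allocator's gfp_to_alloc_flags_cma() helper; it is not verbatim kernel code.

/*
 * Sketch: only movable requests may dip into CMA pageblocks, so only they
 * carry ALLOC_CMA into the watermark checks and, with this patch, into
 * compaction_suitable().
 */
static unsigned int gfp_to_cma_flag_sketch(gfp_t gfp_mask)
{
	unsigned int alloc_flags = 0;

#ifdef CONFIG_CMA
	if (gfp_migratetype(gfp_mask) == MIGRATE_MOVABLE)
		alloc_flags |= ALLOC_CMA;
#endif
	return alloc_flags;
}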