Message ID | 20250116142242.615-1-justinjiang@vivo.com (mailing list archive) |
---|---|
State | New |
Series | mm: memcg supports freeing the specified zone's memory |

On Thu 16-01-25 22:22:42, Zhiguo Jiang wrote:
> Currently, the try_to_free_mem_cgroup_pages interface releases the
> memory occupied by the memcg, which defaults to all zones in the system.
> However, for multi zone systems, such as when there are both movable zone
> and normal zone, it is not possible to release memory that is only in
> the normal zone.
>
> This patch is used to implement the try_to_free_mem_cgroup_pages interface
> to support for releasing the specified zone's memory occupied by the
> memcg in a multi zone systems, in order to optimize the memory usage of
> multiple zones.

Could you elaborate more on the actual usecase please? Who is going to
control which zone to reclaim from, how and why?

On 2025/1/16 22:36, Michal Hocko wrote:
> On Thu 16-01-25 22:22:42, Zhiguo Jiang wrote:
>> Currently, the try_to_free_mem_cgroup_pages interface releases the
>> memory occupied by the memcg, which defaults to all zones in the system.
>> However, for multi zone systems, such as when there are both movable zone
>> and normal zone, it is not possible to release memory that is only in
>> the normal zone.
>>
>> This patch is used to implement the try_to_free_mem_cgroup_pages interface
>> to support for releasing the specified zone's memory occupied by the
>> memcg in a multi zone systems, in order to optimize the memory usage of
>> multiple zones.
> Could you elaborate more on the actual usecase please? Who is going to
> control which zone to reclaim from, how and why?

Hi Michal Hocko,

Thanks for your comments.

In the memory allocation process, the gfp flags of a request determine
which zones it is allowed to allocate memory from:

__alloc_frozen_pages_noprof
  --> prepare_alloc_pages
    --> ac->highest_zoneidx = gfp_zone(gfp_mask);

The order of allocation from zones is as follows:
MOVABLE=>HIGHMEM=>NORMAL=>DMA32=>DMA.

For example, in a dual zone system with both movable and normal zones,
GFP_ZONE_TABLE shows which zone each combination of gfp flags can
allocate memory from:

 * GFP_ZONE_TABLE
 *       bit       result
 *       =================
 *       0x0    => NORMAL
 *       0x1    => DMA or NORMAL
 *       0x2    => HIGHMEM or NORMAL
 *       0x3    => BAD (DMA+HIGHMEM)
 *       0x4    => DMA32 or NORMAL
 *       0x5    => BAD (DMA+DMA32)
 *       0x6    => BAD (HIGHMEM+DMA32)
 *       0x7    => BAD (HIGHMEM+DMA32+DMA)
 *       0x8    => NORMAL (MOVABLE+0)
 *       0x9    => DMA or NORMAL (MOVABLE+DMA)
 *       0xa    => MOVABLE (Movable is valid only if HIGHMEM is set too)
 *       0xb    => BAD (MOVABLE+HIGHMEM+DMA)
 *       0xc    => DMA32 or NORMAL (MOVABLE+DMA32)
 *       0xd    => BAD (MOVABLE+DMA32+DMA)
 *       0xe    => BAD (MOVABLE+DMA32+HIGHMEM)
 *       0xf    => BAD (MOVABLE+DMA32+HIGHMEM+DMA)

Allocations whose gfp flags contain __GFP_MOVABLE | __GFP_HIGHMEM can
allocate from both the movable zone and the normal zone, while other gfp
flags such as GFP_KERNEL can only allocate from the normal zone, even if
there is very little free memory in the normal zone and a lot of free
memory in the movable zone in the current system.

In response to the above situation, we need to reclaim only the normal
zone's memory occupied by the memcg through try_to_free_mem_cgroup_pages(),
in order to resolve allocation failures for gfp flags that are limited to
allocating memory from the normal zone. If the memcg memory reclaimed by
try_to_free_mem_cgroup_pages() comes mainly from the movable zone, it
cannot solve such problems.

In try_to_free_mem_cgroup_pages(), sc.reclaim_idx determines which zones
the memcg's memory is reclaimed from. The current sc.reclaim_idx is fixed
to MAX_NR_ZONES - 1, which means the memcg always reclaims the memory it
occupies from all zones.

Thanks
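
The flag-to-zone mapping described in the message above can be sketched as a small userspace model. This is only an illustration under stated assumptions, not the kernel's gfp_zone() implementation: the names `MODEL_GFP_*`, `model_zone` and `model_gfp_zone()` are hypothetical, the bit values simply mirror the 4-bit index used in the GFP_ZONE_TABLE comment, and the model assumes a dual zone system where only NORMAL and MOVABLE exist, so every "DMA or NORMAL", "HIGHMEM or NORMAL" and "DMA32 or NORMAL" row collapses to NORMAL.

```c
#include <assert.h>

/*
 * Hypothetical zone-modifier bits for this model only. The values follow
 * the 4-bit index in the GFP_ZONE_TABLE comment; they are not taken from
 * kernel headers.
 */
enum {
	MODEL_GFP_DMA     = 0x1,
	MODEL_GFP_HIGHMEM = 0x2,
	MODEL_GFP_DMA32   = 0x4,
	MODEL_GFP_MOVABLE = 0x8,
};

/* Assumed dual zone system: only NORMAL and MOVABLE are present. */
enum model_zone {
	MODEL_ZONE_BAD     = -1,
	MODEL_ZONE_NORMAL  = 0,
	MODEL_ZONE_MOVABLE = 1,
};

/* Map the zone-modifier bits of a request to the highest usable zone. */
static enum model_zone model_gfp_zone(unsigned int zone_bits)
{
	assert(zone_bits <= 0xf);

	switch (zone_bits) {
	case 0x0: case 0x1: case 0x2: case 0x4:
	case 0x8: case 0x9: case 0xc:
		/* Rows that resolve to NORMAL when no DMA/DMA32/HIGHMEM zone exists. */
		return MODEL_ZONE_NORMAL;
	case 0xa:
		/* MOVABLE is only valid together with HIGHMEM. */
		return MODEL_ZONE_MOVABLE;
	default:
		/* Combinations marked BAD in the table. */
		return MODEL_ZONE_BAD;
	}
}
```

In this model a request with no zone modifiers (as with GFP_KERNEL) yields `MODEL_ZONE_NORMAL`, while `MODEL_GFP_MOVABLE | MODEL_GFP_HIGHMEM` yields `MODEL_ZONE_MOVABLE`, which matches the asymmetry the message describes.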

On Fri 17-01-25 12:41:40, zhiguojiang wrote:
[...]
> In response to the above situation, we need reclaim only the normal
> zone's memory occupied by memcg by try_to_free_mem_cgroup_pages(), in
> order to solve the issues of the gfp flags allocations and failure due
> to gfp flags limited only to alloc memory from the normal zone. At this
> point, if the memcg memory reclaimed by try_to_free_mem_cgroup_pages()
> mainly comes from the movable zone, which cannot solve such problems.

Memory cgroup reclaim doesn't allocate the memory directly. This is done
by the page allocator called before the memory is charged. The memcg
charging is then responsible for reclaiming charges and that is not
really zone aware.

Could you describe problem that you are trying to solve?

On 2025/1/17 17:33, Michal Hocko wrote:
> On Fri 17-01-25 12:41:40, zhiguojiang wrote:
> [...]
>> In response to the above situation, we need reclaim only the normal
>> zone's memory occupied by memcg by try_to_free_mem_cgroup_pages(), in
>> order to solve the issues of the gfp flags allocations and failure due
>> to gfp flags limited only to alloc memory from the normal zone. At this
>> point, if the memcg memory reclaimed by try_to_free_mem_cgroup_pages()
>> mainly comes from the movable zone, which cannot solve such problems.
> Memory cgroup reclaim doesn't allocate the memory directly. This is done

Yes, what I mean is that we hope to reclaim precisely the specified zone's
memory occupied by the memcg through try_to_free_mem_cgroup_pages(), in
order to meet the system's current memory allocation requirements for that
zone on the memory allocation path.

> by the page allocator called before the memory is charged. The memcg
> charging is then responsible for reclaiming charges and that is not
> really zone aware.
>
> Could you describe problem that you are trying to solve?

In a dual zone system with both movable and normal zones, we encountered
a problem where a GFP_KERNEL allocation failed to get memory from the
normal zone and the system crashed. Analyzing the logs, we found that
there was very little free memory in the normal zone, but more free memory
in the movable zone at that time. Therefore, we want to reclaim precisely
the normal zone's memory occupied by the memcg through
try_to_free_mem_cgroup_pages().

Thanks

On Fri 17-01-25 18:25:13, zhiguojiang wrote:
[...]
> > Could you describe problem that you are trying to solve?
>
> In a dual zone system with both movable and normal zones, we encountered
> the problem where the GFP_KERNEL flag failed to allocate memory from the
> normal zone and crashed. Analyzing the logs, we found that there was
> very little free memory in the normal zone, but more free memory in the
> movable zone at this time. Therefore, we want to reclaim accurately
> the normal zone's memory occupied by memcg through
> try_to_free_mem_cgroup_pages().

Could you be more specific please? What was the allocation request. Has
the allocation or charge failed? Do you have allocation failure memory
info or oom killer report?

diff --git a/mm/memcontrol-v1.c b/mm/memcontrol-v1.c
index 2be6b9112808..9dc398e9d5f9
--- a/mm/memcontrol-v1.c
+++ b/mm/memcontrol-v1.c
@@ -1377,7 +1377,7 @@ static int mem_cgroup_resize_max(struct mem_cgroup *memcg,
 			continue;
 		}
 
-		if (!try_to_free_mem_cgroup_pages(memcg, 1, GFP_KERNEL,
+		if (!try_to_free_mem_cgroup_pages(memcg, 1, GFP_HIGHUSER_MOVABLE,
 				memsw ? 0 : MEMCG_RECLAIM_MAY_SWAP, NULL)) {
 			ret = -EBUSY;
 			break;
@@ -1409,7 +1409,7 @@ static int mem_cgroup_force_empty(struct mem_cgroup *memcg)
 		if (signal_pending(current))
 			return -EINTR;
 
-		if (!try_to_free_mem_cgroup_pages(memcg, 1, GFP_KERNEL,
+		if (!try_to_free_mem_cgroup_pages(memcg, 1, GFP_HIGHUSER_MOVABLE,
 						  MEMCG_RECLAIM_MAY_SWAP, NULL))
 			nr_retries--;
 	}
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 46f8b372d212..e0b92edb2f3e
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1945,7 +1945,7 @@ static unsigned long reclaim_high(struct mem_cgroup *memcg,
 
 		psi_memstall_enter(&pflags);
 		nr_reclaimed += try_to_free_mem_cgroup_pages(memcg, nr_pages,
-							gfp_mask,
+							gfp_mask | __GFP_MOVABLE | __GFP_HIGHMEM,
 							MEMCG_RECLAIM_MAY_SWAP,
 							NULL);
 		psi_memstall_leave(&pflags);
@@ -2253,7 +2253,8 @@ int try_charge_memcg(struct mem_cgroup *memcg, gfp_t gfp_mask,
 
 	psi_memstall_enter(&pflags);
 	nr_reclaimed = try_to_free_mem_cgroup_pages(mem_over_limit, nr_pages,
-						    gfp_mask, reclaim_options, NULL);
+						    gfp_mask | __GFP_MOVABLE | __GFP_HIGHMEM,
+						    reclaim_options, NULL);
 	psi_memstall_leave(&pflags);
 
 	if (mem_cgroup_margin(mem_over_limit) >= nr_pages)
@@ -4109,7 +4110,7 @@ static ssize_t memory_high_write(struct kernfs_open_file *of,
 		}
 
 		reclaimed = try_to_free_mem_cgroup_pages(memcg, nr_pages - high,
-					GFP_KERNEL, MEMCG_RECLAIM_MAY_SWAP, NULL);
+					GFP_HIGHUSER_MOVABLE, MEMCG_RECLAIM_MAY_SWAP, NULL);
 
 		if (!reclaimed && !nr_retries--)
 			break;
@@ -4158,7 +4159,7 @@ static ssize_t memory_max_write(struct kernfs_open_file *of,
 
 		if (nr_reclaims) {
 			if (!try_to_free_mem_cgroup_pages(memcg, nr_pages - max,
-					GFP_KERNEL, MEMCG_RECLAIM_MAY_SWAP, NULL))
+					GFP_HIGHUSER_MOVABLE, MEMCG_RECLAIM_MAY_SWAP, NULL))
 				nr_reclaims--;
 			continue;
 		}
@@ -4351,7 +4352,7 @@ static ssize_t memory_reclaim(struct kernfs_open_file *of, char *buf,
 			lru_add_drain_all();
 
 		reclaimed = try_to_free_mem_cgroup_pages(memcg,
-					batch_size, GFP_KERNEL,
+					batch_size, GFP_HIGHUSER_MOVABLE,
 					reclaim_options,
 					swappiness == -1 ? NULL : &swappiness);
 
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 5b626b4f38af..9d198bc4e543
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -6610,8 +6610,8 @@ unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *memcg,
 		.nr_to_reclaim = max(nr_pages, SWAP_CLUSTER_MAX),
 		.proactive_swappiness = swappiness,
 		.gfp_mask = (current_gfp_context(gfp_mask) & GFP_RECLAIM_MASK) |
-				(GFP_HIGHUSER_MOVABLE & ~GFP_RECLAIM_MASK),
-		.reclaim_idx = MAX_NR_ZONES - 1,
+				(gfp_mask & (__GFP_MOVABLE | __GFP_HIGHMEM)),
+		.reclaim_idx = gfp_zone(gfp_mask),
 		.target_mem_cgroup = memcg,
 		.priority = DEF_PRIORITY,
 		.may_writepage = !laptop_mode,
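
The effect of changing .reclaim_idx from MAX_NR_ZONES - 1 to gfp_zone(gfp_mask) can be illustrated with a rough userspace model. It is a sketch under assumptions, not kernel code: `sk_zone`, `sk_memcg` and `sk_reclaim_candidates()` are hypothetical names, the per-zone page counts are made-up inputs, and the only kernel behavior it relies on is that reclaim skips pages in zones above sc.reclaim_idx.

```c
#include <assert.h>

/* Assumed dual zone layout, lowest zone index first. */
enum sk_zone {
	SK_ZONE_NORMAL,
	SK_ZONE_MOVABLE,
	SK_MAX_NR_ZONES,
};

/* Hypothetical per-zone count of pages charged to one memcg. */
struct sk_memcg {
	unsigned long lru_pages[SK_MAX_NR_ZONES];
};

/*
 * Count the pages reclaim may consider: only zones whose index does not
 * exceed reclaim_idx are eligible, higher zones are skipped.
 */
static unsigned long sk_reclaim_candidates(const struct sk_memcg *memcg,
					   int reclaim_idx)
{
	unsigned long total = 0;
	int zid;

	assert(reclaim_idx >= 0 && reclaim_idx < SK_MAX_NR_ZONES);

	for (zid = 0; zid <= reclaim_idx; zid++)
		total += memcg->lru_pages[zid];

	return total;
}
```

With 100 pages in the normal zone and 900 in the movable zone, a reclaim index of `SK_MAX_NR_ZONES - 1` (the behavior before the patch) makes all 1000 pages candidates, whereas `SK_ZONE_NORMAL` (what a GFP_KERNEL caller would get with the patch) restricts candidates to the 100 normal zone pages.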

Currently, the try_to_free_mem_cgroup_pages interface releases the
memory occupied by the memcg from all zones in the system by default.
However, on multi zone systems, such as those with both a movable zone
and a normal zone, it is not possible to release only the memory that is
in the normal zone.

This patch extends the try_to_free_mem_cgroup_pages interface to support
releasing the memory occupied by the memcg in a specified zone on multi
zone systems, in order to optimize the memory usage of multiple zones.

Signed-off-by: Zhiguo Jiang <justinjiang@vivo.com>
---
 mm/memcontrol-v1.c |  4 ++--
 mm/memcontrol.c    | 11 ++++++-----
 mm/vmscan.c        |  4 ++--
 3 files changed, 10 insertions(+), 9 deletions(-)