
mm: memcg supports freeing the specified zone's memory

Message ID 20250116142242.615-1-justinjiang@vivo.com (mailing list archive)
State New
Series mm: memcg supports freeing the specified zone's memory

Commit Message

zhiguojiang Jan. 16, 2025, 2:22 p.m. UTC
Currently, the try_to_free_mem_cgroup_pages() interface reclaims memory
charged to a memcg from all zones in the system by default. However, on
multi-zone systems, such as those with both a movable zone and a normal
zone, it is not possible to reclaim only the normal zone's memory.

This patch extends the try_to_free_mem_cgroup_pages() interface to
support reclaiming a specified zone's memory charged to a memcg on
multi-zone systems, in order to optimize the memory usage of the
individual zones.

Signed-off-by: Zhiguo Jiang <justinjiang@vivo.com>
---
 mm/memcontrol-v1.c |  4 ++--
 mm/memcontrol.c    | 11 ++++++-----
 mm/vmscan.c        |  4 ++--
 3 files changed, 10 insertions(+), 9 deletions(-)

Comments

Michal Hocko Jan. 16, 2025, 2:36 p.m. UTC | #1
On Thu 16-01-25 22:22:42, Zhiguo Jiang wrote:
> Currently, the try_to_free_mem_cgroup_pages() interface reclaims memory
> charged to a memcg from all zones in the system by default. However, on
> multi-zone systems, such as those with both a movable zone and a normal
> zone, it is not possible to reclaim only the normal zone's memory.
> 
> This patch extends the try_to_free_mem_cgroup_pages() interface to
> support reclaiming a specified zone's memory charged to a memcg on
> multi-zone systems, in order to optimize the memory usage of the
> individual zones.

Could you elaborate more on the actual usecase please? Who is going to
control which zone to reclaim from, how and why?
zhiguojiang Jan. 17, 2025, 4:41 a.m. UTC | #2
On 2025/1/16 22:36, Michal Hocko wrote:
> On Thu 16-01-25 22:22:42, Zhiguo Jiang wrote:
>> Currently, the try_to_free_mem_cgroup_pages() interface reclaims memory
>> charged to a memcg from all zones in the system by default. However, on
>> multi-zone systems, such as those with both a movable zone and a normal
>> zone, it is not possible to reclaim only the normal zone's memory.
>>
>> This patch extends the try_to_free_mem_cgroup_pages() interface to
>> support reclaiming a specified zone's memory charged to a memcg on
>> multi-zone systems, in order to optimize the memory usage of the
>> individual zones.
> Could you elaborate more on the actual usecase please? Who is going to
> control which zone to reclaim from, how and why?
Hi Michal Hocko,

Thanks for your comments.

On the memory allocation path, the caller's gfp flags determine which
zones the allocation may be satisfied from:
__alloc_frozen_pages_noprof
  --> prepare_alloc_pages
      --> ac->highest_zoneidx = gfp_zone(gfp_mask);

Zones are tried in the following order:
MOVABLE => HIGHMEM => NORMAL => DMA32 => DMA.

For example, on a dual-zone system with both a movable zone and a
normal zone, the GFP_ZONE_TABLE shows which zone each combination of
gfp flags may allocate from, as follows:

*       GFP_ZONE_TABLE
*       bit       result
*       =================
*       0x0    => NORMAL
*       0x1    => DMA or NORMAL
*       0x2    => HIGHMEM or NORMAL
*       0x3    => BAD (DMA+HIGHMEM)
*       0x4    => DMA32 or NORMAL
*       0x5    => BAD (DMA+DMA32)
*       0x6    => BAD (HIGHMEM+DMA32)
*       0x7    => BAD (HIGHMEM+DMA32+DMA)
*       0x8    => NORMAL (MOVABLE+0)
*       0x9    => DMA or NORMAL (MOVABLE+DMA)
*       0xa    => MOVABLE (Movable is valid only if HIGHMEM is set too)
*       0xb    => BAD (MOVABLE+HIGHMEM+DMA)
*       0xc    => DMA32 or NORMAL (MOVABLE+DMA32)
*       0xd    => BAD (MOVABLE+DMA32+DMA)
*       0xe    => BAD (MOVABLE+DMA32+HIGHMEM)
*       0xf    => BAD (MOVABLE+DMA32+HIGHMEM+DMA)

Gfp masks containing __GFP_MOVABLE | __GFP_HIGHMEM can allocate from
both the movable zone and the normal zone, while other gfp masks such
as GFP_KERNEL can only allocate from the normal zone, even if the
normal zone has very little free memory and the movable zone has plenty
of free memory.

To handle this situation, we need try_to_free_mem_cgroup_pages() to
reclaim only the normal zone's memory charged to the memcg, so that
allocations whose gfp flags restrict them to the normal zone do not
fail. If the memory reclaimed by try_to_free_mem_cgroup_pages() comes
mainly from the movable zone, it cannot solve this problem.

In try_to_free_mem_cgroup_pages(), sc.reclaim_idx determines which
zones the memcg's memory is reclaimed from. Currently sc.reclaim_idx is
fixed at MAX_NR_ZONES - 1, which means the memcg always reclaims its
memory from all zones.

Thanks
>
Michal Hocko Jan. 17, 2025, 9:33 a.m. UTC | #3
On Fri 17-01-25 12:41:40, zhiguojiang wrote:
[...]
> To handle this situation, we need try_to_free_mem_cgroup_pages() to
> reclaim only the normal zone's memory charged to the memcg, so that
> allocations whose gfp flags restrict them to the normal zone do not
> fail. If the memory reclaimed by try_to_free_mem_cgroup_pages() comes
> mainly from the movable zone, it cannot solve this problem.

Memory cgroup reclaim doesn't allocate the memory directly. This is done
by the page allocator called before the memory is charged. The memcg
charging is then responsible for reclaiming charges and that is not
really zone aware.

Could you describe problem that you are trying to solve?
zhiguojiang Jan. 17, 2025, 10:25 a.m. UTC | #4
On 2025/1/17 17:33, Michal Hocko wrote:
> On Fri 17-01-25 12:41:40, zhiguojiang wrote:
> [...]
>> To handle this situation, we need try_to_free_mem_cgroup_pages() to
>> reclaim only the normal zone's memory charged to the memcg, so that
>> allocations whose gfp flags restrict them to the normal zone do not
>> fail. If the memory reclaimed by try_to_free_mem_cgroup_pages() comes
>> mainly from the movable zone, it cannot solve this problem.
> Memory cgroup reclaim doesn't allocate the memory directly. This is done
Yes. What I mean is that we want to reclaim precisely the specified
zone's memory charged to the memcg through
try_to_free_mem_cgroup_pages(), in order to meet the system's memory
allocation requirements for that zone on the allocation path.
> by the page allocator called before the memory is charged. The memcg
> charging is then responsible for reclaiming charges and that is not
> really zone aware.
>
> Could you describe problem that you are trying to solve?
On a dual-zone system with both movable and normal zones, we hit a
problem where a GFP_KERNEL allocation failed to get memory from the
normal zone and the system crashed. Analyzing the logs, we found very
little free memory in the normal zone but plenty of free memory in the
movable zone at the time. Therefore, we want to reclaim precisely the
normal zone's memory charged to the memcg through
try_to_free_mem_cgroup_pages().

Thanks
Michal Hocko Jan. 17, 2025, 11:43 a.m. UTC | #5
On Fri 17-01-25 18:25:13, zhiguojiang wrote:
[...]
> > Could you describe problem that you are trying to solve?
>
> On a dual-zone system with both movable and normal zones, we hit a
> problem where a GFP_KERNEL allocation failed to get memory from the
> normal zone and the system crashed. Analyzing the logs, we found very
> little free memory in the normal zone but plenty of free memory in the
> movable zone at the time. Therefore, we want to reclaim precisely the
> normal zone's memory charged to the memcg through
> try_to_free_mem_cgroup_pages().

Could you be more specific please? What was the allocation request? Did
the allocation or the charge fail? Do you have the allocation failure
memory info or an oom killer report?

Patch

diff --git a/mm/memcontrol-v1.c b/mm/memcontrol-v1.c
index 2be6b9112808..9dc398e9d5f9
--- a/mm/memcontrol-v1.c
+++ b/mm/memcontrol-v1.c
@@ -1377,7 +1377,7 @@  static int mem_cgroup_resize_max(struct mem_cgroup *memcg,
 			continue;
 		}
 
-		if (!try_to_free_mem_cgroup_pages(memcg, 1, GFP_KERNEL,
+		if (!try_to_free_mem_cgroup_pages(memcg, 1, GFP_HIGHUSER_MOVABLE,
 				memsw ? 0 : MEMCG_RECLAIM_MAY_SWAP, NULL)) {
 			ret = -EBUSY;
 			break;
@@ -1409,7 +1409,7 @@  static int mem_cgroup_force_empty(struct mem_cgroup *memcg)
 		if (signal_pending(current))
 			return -EINTR;
 
-		if (!try_to_free_mem_cgroup_pages(memcg, 1, GFP_KERNEL,
+		if (!try_to_free_mem_cgroup_pages(memcg, 1, GFP_HIGHUSER_MOVABLE,
 						  MEMCG_RECLAIM_MAY_SWAP, NULL))
 			nr_retries--;
 	}
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 46f8b372d212..e0b92edb2f3e
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1945,7 +1945,7 @@  static unsigned long reclaim_high(struct mem_cgroup *memcg,
 
 		psi_memstall_enter(&pflags);
 		nr_reclaimed += try_to_free_mem_cgroup_pages(memcg, nr_pages,
-							gfp_mask,
+							gfp_mask | __GFP_MOVABLE | __GFP_HIGHMEM,
 							MEMCG_RECLAIM_MAY_SWAP,
 							NULL);
 		psi_memstall_leave(&pflags);
@@ -2253,7 +2253,8 @@  int try_charge_memcg(struct mem_cgroup *memcg, gfp_t gfp_mask,
 
 	psi_memstall_enter(&pflags);
 	nr_reclaimed = try_to_free_mem_cgroup_pages(mem_over_limit, nr_pages,
-						    gfp_mask, reclaim_options, NULL);
+						    gfp_mask | __GFP_MOVABLE | __GFP_HIGHMEM,
+						    reclaim_options, NULL);
 	psi_memstall_leave(&pflags);
 
 	if (mem_cgroup_margin(mem_over_limit) >= nr_pages)
@@ -4109,7 +4110,7 @@  static ssize_t memory_high_write(struct kernfs_open_file *of,
 		}
 
 		reclaimed = try_to_free_mem_cgroup_pages(memcg, nr_pages - high,
-					GFP_KERNEL, MEMCG_RECLAIM_MAY_SWAP, NULL);
+					GFP_HIGHUSER_MOVABLE, MEMCG_RECLAIM_MAY_SWAP, NULL);
 
 		if (!reclaimed && !nr_retries--)
 			break;
@@ -4158,7 +4159,7 @@  static ssize_t memory_max_write(struct kernfs_open_file *of,
 
 		if (nr_reclaims) {
 			if (!try_to_free_mem_cgroup_pages(memcg, nr_pages - max,
-					GFP_KERNEL, MEMCG_RECLAIM_MAY_SWAP, NULL))
+					GFP_HIGHUSER_MOVABLE, MEMCG_RECLAIM_MAY_SWAP, NULL))
 				nr_reclaims--;
 			continue;
 		}
@@ -4351,7 +4352,7 @@  static ssize_t memory_reclaim(struct kernfs_open_file *of, char *buf,
 			lru_add_drain_all();
 
 		reclaimed = try_to_free_mem_cgroup_pages(memcg,
-					batch_size, GFP_KERNEL,
+					batch_size, GFP_HIGHUSER_MOVABLE,
 					reclaim_options,
 					swappiness == -1 ? NULL : &swappiness);
 
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 5b626b4f38af..9d198bc4e543
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -6610,8 +6610,8 @@  unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *memcg,
 		.nr_to_reclaim = max(nr_pages, SWAP_CLUSTER_MAX),
 		.proactive_swappiness = swappiness,
 		.gfp_mask = (current_gfp_context(gfp_mask) & GFP_RECLAIM_MASK) |
-				(GFP_HIGHUSER_MOVABLE & ~GFP_RECLAIM_MASK),
-		.reclaim_idx = MAX_NR_ZONES - 1,
+				(gfp_mask & (__GFP_MOVABLE | __GFP_HIGHMEM)),
+		.reclaim_idx = gfp_zone(gfp_mask),
 		.target_mem_cgroup = memcg,
 		.priority = DEF_PRIORITY,
 		.may_writepage = !laptop_mode,