From patchwork Sun Feb 25 11:42:04 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Yafang Shao
X-Patchwork-Id: 13570800
From: Yafang Shao
To: akpm@linux-foundation.org, hannes@cmpxchg.org, mhocko@kernel.org,
 roman.gushchin@linux.dev, shakeelb@google.com, muchun.song@linux.dev
Cc: linux-mm@kvack.org, Yafang Shao
Subject: [RFC PATCH] mm: Add reclaim type to memory.reclaim
Date: Sun, 25 Feb 2024 19:42:04 +0800
Message-Id: <20240225114204.50459-1-laoar.shao@gmail.com>
X-Mailer: git-send-email 2.30.1 (Apple Git-130)

In our container environment, we've observed that certain containers may
accumulate more than 40GB of slabs, predominantly negative dentries. These
negative dentries remain unreclaimed unless there is memory pressure. Even
after the containers exit, these negative dentries persist. To manage disk
storage efficiently, we employ an agent that identifies container images
eligible for destruction once all instances of that image exit. However,
during destruction, dealing with directories containing numerous negative
dentries can significantly impact performance.

To mitigate this issue, we aim to proactively reclaim these dentries using
a userspace agent. Extending the memory.reclaim functionality to
specifically target slabs aligns with our requirements.

We propose adding a "type=" parameter to memory.reclaim to allow
reclamation of pagecache pages only, slabs only, or both:

  - type=1: Reclaim pagecache pages only
  - type=2: Reclaim slabs only
  - type=3: Reclaim both pagecache pages and slabs

For instance:

  echo "1M type=1" > /sys/fs/cgroup/test/memory.reclaim

will perform the reclaim on the 'test' memcg to reclaim pagecache pages
only. Please note that due to the deferred freeing of slabs, the amount of
reclaimed slabs may be higher than the requested 1M during this process.

Signed-off-by: Yafang Shao
---
 Documentation/admin-guide/cgroup-v2.rst | 12 ++++++++++++
 include/linux/swap.h                    |  2 ++
 mm/memcontrol.c                         | 22 +++++++++++++++++++++-
 mm/vmscan.c                             | 13 ++++++++++---
 4 files changed, 45 insertions(+), 4 deletions(-)

diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index 0270517ade47..6807d0fa197d 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -1322,6 +1322,18 @@ The following nested keys are defined.
 	same semantics as vm.swappiness applied to memcg reclaim with
 	all the existing limitations and potential future extensions.
 
+	  ==== ==============================
+	  type Type of memory to reclaim with
+	  ==== ==============================
+
+	Specifying a memory type value instructs the kernel to perform
+	the reclaim with that memory type. The current supported
+	values are:
+
+	  1 - Reclaim pagecache pages only
+	  2 - Reclaim slabs only
+	  3 - Reclaim both pagecache pages and slabs
+
   memory.peak
 	A read-only single value file which exists on non-root
 	cgroups.
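
Note (not part of the patch): as a rough illustration of the interface
documented in the hunk above, a userspace agent could build the request
string along the following lines. This is only a sketch, assuming the
"type=" values 1, 2 and 3 are accepted exactly as this RFC proposes; the
helper name reclaim_request is hypothetical and exists only for this
example.

```shell
#!/bin/sh
# Hypothetical helper, not part of the patch: print the string an agent
# would write to memory.reclaim. The accepted type values (1, 2, 3) are
# taken from the table proposed in this RFC and are an assumption about
# the final interface.
reclaim_request() {
	req_amount=$1	# e.g. "1M"; passed through unchanged
	req_type=$2	# 1 = pagecache only, 2 = slabs only, 3 = both

	case "$req_type" in
	1|2|3)
		printf '%s type=%s\n' "$req_amount" "$req_type"
		;;
	*)
		# Reject anything else before it reaches the kernel.
		echo "invalid type: $req_type (expected 1, 2 or 3)" >&2
		return 1
		;;
	esac
}
```

An agent would then redirect the output into the cgroup's memory.reclaim
file, for example "reclaim_request 1M 2 > /sys/fs/cgroup/test/memory.reclaim",
mirroring the echo example in the commit message.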
diff --git a/include/linux/swap.h b/include/linux/swap.h
index 41e4b484bc34..27c432101032 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -404,6 +404,8 @@ extern unsigned long try_to_free_pages(struct zonelist *zonelist, int order,
 
 #define MEMCG_RECLAIM_MAY_SWAP (1 << 1)
 #define MEMCG_RECLAIM_PROACTIVE (1 << 2)
+#define MEMCG_RECLAIM_PAGECACHE_ONLY (1 << 3)
+#define MEMCG_RECLAIM_SLAB_ONLY (1 << 4)
 #define MIN_SWAPPINESS 0
 #define MAX_SWAPPINESS 200
 extern unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *memcg,
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 4070ba84b508..3dfdbf5782c8 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -54,6 +54,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -6930,11 +6931,13 @@ static ssize_t memory_oom_group_write(struct kernfs_open_file *of,
 
 enum {
 	MEMORY_RECLAIM_SWAPPINESS = 0,
+	MEMORY_RECLAIM_TYPE = 1,
 	MEMORY_RECLAIM_NULL,
 };
 
 static const match_table_t tokens = {
 	{ MEMORY_RECLAIM_SWAPPINESS, "swappiness=%d"},
+	{ MEMORY_RECLAIM_TYPE, "type=%d"},
 	{ MEMORY_RECLAIM_NULL, NULL },
 };
 
@@ -6944,7 +6947,7 @@ static ssize_t memory_reclaim(struct kernfs_open_file *of, char *buf,
 	struct mem_cgroup *memcg = mem_cgroup_from_css(of_css(of));
 	unsigned int nr_retries = MAX_RECLAIM_RETRIES;
 	unsigned long nr_to_reclaim, nr_reclaimed = 0;
-	int swappiness = -1;
+	int swappiness = -1, type = 0;
 	unsigned int reclaim_options;
 	char *old_buf, *start;
 	substring_t args[MAX_OPT_ARGS];
@@ -6968,12 +6971,29 @@ static ssize_t memory_reclaim(struct kernfs_open_file *of, char *buf,
 			if (swappiness < MIN_SWAPPINESS || swappiness > MAX_SWAPPINESS)
 				return -EINVAL;
 			break;
+		case MEMORY_RECLAIM_TYPE:
+			if (match_int(&args[0], &type))
+				return -EINVAL;
+			if (type > 3 || type <= 0)
+				return -EINVAL;
+			break;
 		default:
 			return -EINVAL;
 		}
 	}
 
 	reclaim_options = MEMCG_RECLAIM_MAY_SWAP | MEMCG_RECLAIM_PROACTIVE;
+	switch (type) {
+	case 1:
+		reclaim_options |= MEMCG_RECLAIM_PAGECACHE_ONLY;
+		break;
+	case 2:
+		reclaim_options |= MEMCG_RECLAIM_SLAB_ONLY;
+		break;
+	default:
+		break;
+	}
+
 	while (nr_reclaimed < nr_to_reclaim) {
 		/* Will converge on zero, but reclaim enforces a minimum */
 		unsigned long batch_size = (nr_to_reclaim - nr_reclaimed) / 4;
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 4b1a609755bb..53cea01a1742 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -141,6 +141,9 @@ struct scan_control {
 	/* Always discard instead of demoting to lower tier memory */
 	unsigned int no_demotion:1;
 
+	unsigned int pagecache_only:1;
+	unsigned int slab_only:1;
+
 	/* Allocation order */
 	s8 order;
 
@@ -5881,10 +5884,12 @@ static void shrink_node_memcgs(pg_data_t *pgdat, struct scan_control *sc)
 		reclaimed = sc->nr_reclaimed;
 		scanned = sc->nr_scanned;
 
-		shrink_lruvec(lruvec, sc);
+		if (!sc->slab_only)
+			shrink_lruvec(lruvec, sc);
 
-		shrink_slab(sc->gfp_mask, pgdat->node_id, memcg,
-			    sc->priority);
+		if (!sc->pagecache_only)
+			shrink_slab(sc->gfp_mask, pgdat->node_id, memcg,
+				    sc->priority);
 
 		/* Record the group's reclaim efficiency */
 		if (!sc->proactive)
@@ -6522,6 +6527,8 @@ unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *memcg,
 		.may_unmap = 1,
 		.may_swap = !!(reclaim_options & MEMCG_RECLAIM_MAY_SWAP),
 		.proactive = !!(reclaim_options & MEMCG_RECLAIM_PROACTIVE),
+		.pagecache_only = !!(reclaim_options & MEMCG_RECLAIM_PAGECACHE_ONLY),
+		.slab_only = !!(reclaim_options & MEMCG_RECLAIM_SLAB_ONLY),
 	};
 	/*
 	 * Traverse the ZONELIST_FALLBACK zonelist of the current node to put