From patchwork Wed Mar 19 04:15:36 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Zhongkun He
X-Patchwork-Id: 14022036
From: Zhongkun He
To: akpm@linux-foundation.org, hannes@cmpxchg.org, yosry.ahmed@linux.dev, yuzhao@google.com
Cc: mhocko@suse.com, muchun.song@linux.dev, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Zhongkun He
Subject: [PATCH V2] mm: add swappiness=max arg to memory.reclaim for only anon reclaim
Date: Wed, 19 Mar 2025 12:15:36 +0800
Message-Id:
 <20250319041536.520852-1-hezhongkun.hzk@bytedance.com>
X-Mailer: git-send-email 2.39.5

Since commit 68cd9050d871 ("mm: add swappiness= arg to memory.reclaim"),
we can pass an additional swappiness= argument to memory.reclaim. This
is very useful because we can dynamically adjust the reclamation ratio
based on the anonymous folios and file folios of each cgroup. For
example, when swappiness is set to 0, we only reclaim from file folios.

However, we have also encountered a new issue: when swappiness is set to
MAX_SWAPPINESS, reclaim may still only reclaim file folios.

So, we hope to add a new arg 'swappiness=max' to memory.reclaim, so that
proactive memory reclaim only reclaims from anonymous folios when
swappiness is set to max. The swappiness semantics from a user
perspective remain unchanged.

For example, something like this:

echo "2M swappiness=max" > /sys/fs/cgroup/memory.reclaim

will perform reclaim on the rootcg with a swappiness setting of 'max' (a
new mode) regardless of the file folios. Users have a more comprehensive
view of the application's memory distribution because there are many
metrics available. For example, if we find that a certain cgroup has a
large number of inactive anon folios, we can reclaim only those and skip
file folios, because with zram/zswap the IO tradeoff that
cache_trim_mode or other file-first logic is making doesn't hold: file
refaults will cause IO, whereas anon decompression will not.

With this patch, the swappiness argument of memory.reclaim has a new
mode 'max', which means reclaiming only from anonymous folios, in both
the traditional LRU and MGLRU.

Here is the previous discussion:
https://lore.kernel.org/all/20250314033350.1156370-1-hezhongkun.hzk@bytedance.com/
https://lore.kernel.org/all/20250312094337.2296278-1-hezhongkun.hzk@bytedance.com/

Suggested-by: Yosry Ahmed
Signed-off-by: Zhongkun He
---
V2: Make the code clearer, following Yosry's suggestions.

 Documentation/admin-guide/cgroup-v2.rst | 2 ++
 include/linux/swap.h                    | 4 ++++
 mm/memcontrol.c                         | 5 +++++
 mm/vmscan.c                             | 7 +++++++
 4 files changed, 18 insertions(+)

diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index cb1b4e759b7e..254cead74d62 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -1343,6 +1343,8 @@ The following nested keys are defined.
 	same semantics as vm.swappiness applied to memcg reclaim with
 	all the existing limitations and potential future extensions.
 
+	Setting swappiness=max exclusively reclaims anonymous memory.
+
   memory.peak
 	A read-write single value file which exists on non-root cgroups.
 
diff --git a/include/linux/swap.h b/include/linux/swap.h
index b13b72645db3..60370bf989c8 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -419,6 +419,10 @@ extern unsigned long try_to_free_pages(struct zonelist *zonelist, int order,
 #define MEMCG_RECLAIM_PROACTIVE (1 << 2)
 #define MIN_SWAPPINESS 0
 #define MAX_SWAPPINESS 200
+
+/* Just reclaim from anon folios in proactive memory reclaim */
+#define SWAPPINESS_ANON_ONLY (MAX_SWAPPINESS + 1)
+
 extern unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *memcg,
 						  unsigned long nr_pages,
 						  gfp_t gfp_mask,
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 4de6acb9b8ec..2e16d6b52fdd 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -4291,11 +4291,13 @@ static ssize_t memory_oom_group_write(struct kernfs_open_file *of,
 
 enum {
 	MEMORY_RECLAIM_SWAPPINESS = 0,
+	MEMORY_RECLAIM_SWAPPINESS_MAX,
 	MEMORY_RECLAIM_NULL,
 };
 
 static const match_table_t tokens = {
 	{ MEMORY_RECLAIM_SWAPPINESS, "swappiness=%d"},
+	{ MEMORY_RECLAIM_SWAPPINESS_MAX, "swappiness=max"},
 	{ MEMORY_RECLAIM_NULL, NULL },
 };
 
@@ -4329,6 +4331,9 @@ static ssize_t memory_reclaim(struct kernfs_open_file *of, char *buf,
 			if (swappiness < MIN_SWAPPINESS || swappiness > MAX_SWAPPINESS)
 				return -EINVAL;
 			break;
+		case MEMORY_RECLAIM_SWAPPINESS_MAX:
+			swappiness = SWAPPINESS_ANON_ONLY;
+			break;
 		default:
 			return -EINVAL;
 		}
diff --git a/mm/vmscan.c b/mm/vmscan.c
index c767d71c43d7..08fbb8da773b 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2438,6 +2438,13 @@ static void get_scan_count(struct lruvec *lruvec, struct scan_control *sc,
 		goto out;
 	}
 
+	/* Proactive reclaim initiated by userspace for anonymous memory only */
+	if (swappiness == SWAPPINESS_ANON_ONLY) {
+		WARN_ON_ONCE(!sc->proactive);
+		scan_balance = SCAN_ANON;
+		goto out;
+	}
+
 	/*
 	 * Do not apply any pressure balancing cleverness when the
 	 * system is close to OOM, scan both anon and file equally
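
For reviewers who want to reason about the argument handling outside the kernel, the mapping done in memory_reclaim() can be modelled roughly as below. This is only a userspace sketch, not kernel code: parse_swappiness_arg() is a hypothetical name, and it uses strcmp()/strtol() where the patch relies on match_token() with the tokens table. The three constants are copied from the include/linux/swap.h hunk above.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Values copied from the include/linux/swap.h hunk in this patch. */
#define MIN_SWAPPINESS 0
#define MAX_SWAPPINESS 200
#define SWAPPINESS_ANON_ONLY (MAX_SWAPPINESS + 1)

/*
 * Hypothetical userspace model of the swappiness argument handling.
 * Assumption: the real code uses match_token(); this sketch does not.
 *
 * On success returns 0 and stores the parsed value in *swappiness.
 * Returns -EINVAL for an unknown key, a malformed number, or a
 * numeric value outside [MIN_SWAPPINESS, MAX_SWAPPINESS].
 */
int parse_swappiness_arg(const char *arg, int *swappiness)
{
	static const char prefix[] = "swappiness=";
	const char *val;
	char *end;
	long n;

	if (strncmp(arg, prefix, sizeof(prefix) - 1) != 0)
		return -EINVAL;
	val = arg + sizeof(prefix) - 1;

	/* The new mode: only the literal "max" selects anon-only reclaim. */
	if (strcmp(val, "max") == 0) {
		*swappiness = SWAPPINESS_ANON_ONLY;
		return 0;
	}

	/* Numeric form: must be a complete decimal number within range. */
	errno = 0;
	n = strtol(val, &end, 10);
	if (errno || end == val || *end != '\0')
		return -EINVAL;
	if (n < MIN_SWAPPINESS || n > MAX_SWAPPINESS)
		return -EINVAL;

	*swappiness = (int)n;
	return 0;
}
```

In this model SWAPPINESS_ANON_ONLY is never reachable through the numeric form: "swappiness=201" fails the range check, so anon-only reclaim can only be requested with "swappiness=max".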