From patchwork Wed Dec 13 01:38:03 2023
X-Patchwork-Submitter: Dan Schatzberg
X-Patchwork-Id: 13490271
From: Dan Schatzberg
To: Johannes Weiner, Roman Gushchin, Yosry Ahmed, Huan Yang
Cc: linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, linux-mm@kvack.org,
 Tejun Heo, Zefan Li, Jonathan Corbet, Michal Hocko, Shakeel Butt,
 Muchun Song, Andrew Morton, David Hildenbrand, Matthew Wilcox,
 Kefeng Wang, Dan Schatzberg, Yue Zhao, Hugh Dickins
Subject: [PATCH V4 2/2] mm: add swappiness= arg to memory.reclaim
Date: Tue, 12 Dec 2023 17:38:03 -0800
Message-Id: <20231213013807.897742-3-schatzberg.dan@gmail.com>
In-Reply-To: <20231213013807.897742-1-schatzberg.dan@gmail.com>
References: <20231213013807.897742-1-schatzberg.dan@gmail.com>

Allow proactive reclaimers to submit an additional swappiness=
argument to memory.reclaim. This overrides the global or per-memcg
swappiness setting for that reclaim attempt.

For example:

echo "2M swappiness=0" > /sys/fs/cgroup/memory.reclaim

will perform reclaim on the rootcg with a swappiness setting of 0 (no
swap) regardless of the vm.swappiness sysctl setting.

Signed-off-by: Dan Schatzberg
---
 Documentation/admin-guide/cgroup-v2.rst | 19 +++++---
 include/linux/swap.h                    |  3 +-
 mm/memcontrol.c                         | 61 ++++++++++++++++++++-----
 mm/vmscan.c                             | 11 +++--
 4 files changed, 72 insertions(+), 22 deletions(-)

diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index 3f85254f3cef..06319349c072 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -1282,17 +1282,10 @@ PAGE_SIZE multiple when read back.
 	This is a simple interface to trigger memory reclaim in the
 	target cgroup.
 
-	This file accepts a single key, the number of bytes to reclaim.
-	No nested keys are currently supported.
-
 	Example::
 
 	  echo "1G" > memory.reclaim
 
-	The interface can be later extended with nested keys to
-	configure the reclaim behavior. For example, specify the
-	type of memory to reclaim from (anon, file, ..).
-
 	Please note that the kernel can over or under reclaim from
 	the target cgroup. If less bytes are reclaimed than the
 	specified amount, -EAGAIN is returned.
@@ -1304,6 +1297,18 @@ PAGE_SIZE multiple when read back.
 	This means that the networking layer will not adapt based on
 	reclaim induced by memory.reclaim.
 
+The following nested keys are defined.
+
+	  ==========		================================
+	  swappiness		Swappiness value to reclaim with
+	  ==========		================================
+
+	Specifying a swappiness value instructs the kernel to perform
+	the reclaim with that swappiness value. Note that this has the
+	same semantics as the vm.swappiness sysctl - it sets the
+	relative IO cost of reclaiming anon vs file memory but does
+	not allow for reclaiming specific amounts of anon or file memory.
+
   memory.peak
 	A read-only single value file which exists on non-root
 	cgroups.
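
For reviewers who want to exercise the documented interface from a test
program rather than a shell, here is a rough userspace sketch of a
proactive reclaimer issuing the request described above. It is not part
of the patch. The helper names format_reclaim_request() and
request_reclaim() are made up for illustration, and the 0..200 range is
an assumption taken from the vm.swappiness sysctl (the patch checks
against MIN_SWAPPINESS/MAX_SWAPPINESS, which are defined outside this
diff).

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Assumed bounds, mirroring the vm.swappiness sysctl range. */
#define SKETCH_MIN_SWAPPINESS 0
#define SKETCH_MAX_SWAPPINESS 200

/* Build "<amount> swappiness=<n>" in buf. Returns 0 or -EINVAL. */
static int format_reclaim_request(char *buf, size_t len,
				  const char *amount, int swappiness)
{
	int n;

	if (swappiness < SKETCH_MIN_SWAPPINESS ||
	    swappiness > SKETCH_MAX_SWAPPINESS)
		return -EINVAL;

	n = snprintf(buf, len, "%s swappiness=%d", amount, swappiness);
	if (n < 0 || (size_t)n >= len)
		return -EINVAL;	/* request did not fit in buf */

	return 0;
}

/*
 * Write one reclaim request to a memory.reclaim file.
 * Returns 0 on success or a negative errno. Per the documentation in
 * this patch, -EAGAIN means fewer bytes were reclaimed than requested.
 */
static int request_reclaim(const char *path, const char *amount,
			   int swappiness)
{
	char req[64];
	ssize_t written;
	int fd, ret;

	ret = format_reclaim_request(req, sizeof(req), amount, swappiness);
	if (ret)
		return ret;

	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -errno;

	written = write(fd, req, strlen(req));
	ret = written < 0 ? -errno : 0;	/* capture errno before close() */
	close(fd);

	return ret;
}
```

A caller would use something like
request_reclaim("/sys/fs/cgroup/memory.reclaim", "2M", 0) and treat
-EAGAIN as "partially reclaimed, retry later if needed".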
diff --git a/include/linux/swap.h b/include/linux/swap.h
index e2ab76c25b4a..690898c347de 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -412,7 +412,8 @@ extern unsigned long try_to_free_pages(struct zonelist *zonelist, int order,
 extern unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *memcg,
 						  unsigned long nr_pages,
 						  gfp_t gfp_mask,
-						  unsigned int reclaim_options);
+						  unsigned int reclaim_options,
+						  int swappiness);
 extern unsigned long mem_cgroup_shrink_node(struct mem_cgroup *mem,
 						gfp_t gfp_mask, bool noswap,
 						pg_data_t *pgdat,
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 9748a6b88bb8..be3d5797d9df 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -63,6 +63,7 @@
 #include
 #include
 #include
+#include
 #include
 #include "internal.h"
 #include
@@ -2449,7 +2450,8 @@ static unsigned long reclaim_high(struct mem_cgroup *memcg,
 		psi_memstall_enter(&pflags);
 		nr_reclaimed += try_to_free_mem_cgroup_pages(memcg, nr_pages,
 							gfp_mask,
-							MEMCG_RECLAIM_MAY_SWAP);
+							MEMCG_RECLAIM_MAY_SWAP,
+							mem_cgroup_swappiness(memcg));
 		psi_memstall_leave(&pflags);
 	} while ((memcg = parent_mem_cgroup(memcg)) &&
 		 !mem_cgroup_is_root(memcg));
@@ -2740,7 +2742,8 @@ static int try_charge_memcg(struct mem_cgroup *memcg, gfp_t gfp_mask,
 
 	psi_memstall_enter(&pflags);
 	nr_reclaimed = try_to_free_mem_cgroup_pages(mem_over_limit, nr_pages,
-						    gfp_mask, reclaim_options);
+						    gfp_mask, reclaim_options,
+						    mem_cgroup_swappiness(memcg));
 	psi_memstall_leave(&pflags);
 
 	if (mem_cgroup_margin(mem_over_limit) >= nr_pages)
@@ -3660,7 +3663,8 @@ static int mem_cgroup_resize_max(struct mem_cgroup *memcg,
 		}
 
 		if (!try_to_free_mem_cgroup_pages(memcg, 1, GFP_KERNEL,
-					memsw ? 0 : MEMCG_RECLAIM_MAY_SWAP)) {
+					memsw ? 0 : MEMCG_RECLAIM_MAY_SWAP,
+					mem_cgroup_swappiness(memcg))) {
 			ret = -EBUSY;
 			break;
 		}
@@ -3774,7 +3778,8 @@ static int mem_cgroup_force_empty(struct mem_cgroup *memcg)
 			return -EINTR;
 
 		if (!try_to_free_mem_cgroup_pages(memcg, 1, GFP_KERNEL,
-						  MEMCG_RECLAIM_MAY_SWAP))
+						  MEMCG_RECLAIM_MAY_SWAP,
+						  mem_cgroup_swappiness(memcg)))
 			nr_retries--;
 	}
 
@@ -6720,7 +6725,8 @@ static ssize_t memory_high_write(struct kernfs_open_file *of,
 		}
 
 		reclaimed = try_to_free_mem_cgroup_pages(memcg, nr_pages - high,
-					GFP_KERNEL, MEMCG_RECLAIM_MAY_SWAP);
+					GFP_KERNEL, MEMCG_RECLAIM_MAY_SWAP,
+					mem_cgroup_swappiness(memcg));
 
 		if (!reclaimed && !nr_retries--)
 			break;
@@ -6769,7 +6775,8 @@ static ssize_t memory_max_write(struct kernfs_open_file *of,
 
 		if (nr_reclaims) {
 			if (!try_to_free_mem_cgroup_pages(memcg, nr_pages - max,
-					GFP_KERNEL, MEMCG_RECLAIM_MAY_SWAP))
+					GFP_KERNEL, MEMCG_RECLAIM_MAY_SWAP,
+					mem_cgroup_swappiness(memcg)))
 				nr_reclaims--;
 			continue;
 		}
@@ -6895,6 +6902,16 @@ static ssize_t memory_oom_group_write(struct kernfs_open_file *of,
 	return nbytes;
 }
 
+enum {
+	MEMORY_RECLAIM_SWAPPINESS = 0,
+	MEMORY_RECLAIM_NULL,
+};
+
+static const match_table_t tokens = {
+	{ MEMORY_RECLAIM_SWAPPINESS, "swappiness=%d"},
+	{ MEMORY_RECLAIM_NULL, NULL },
+};
+
 static ssize_t memory_reclaim(struct kernfs_open_file *of, char *buf,
 			      size_t nbytes, loff_t off)
 {
@@ -6902,12 +6919,33 @@ static ssize_t memory_reclaim(struct kernfs_open_file *of, char *buf,
 	unsigned int nr_retries = MAX_RECLAIM_RETRIES;
 	unsigned long nr_to_reclaim, nr_reclaimed = 0;
 	unsigned int reclaim_options;
-	int err;
+	char *old_buf, *start;
+	substring_t args[MAX_OPT_ARGS];
+	int swappiness = mem_cgroup_swappiness(memcg);
 
 	buf = strstrip(buf);
-	err = page_counter_memparse(buf, "", &nr_to_reclaim);
-	if (err)
-		return err;
+
+	old_buf = buf;
+	nr_to_reclaim = memparse(buf, &buf) / PAGE_SIZE;
+	if (buf == old_buf)
+		return -EINVAL;
+
+	buf = strstrip(buf);
+
+	while ((start = strsep(&buf, " ")) != NULL) {
+		if (!strlen(start))
+			continue;
+		switch (match_token(start, tokens, args)) {
+		case MEMORY_RECLAIM_SWAPPINESS:
+			if (match_int(&args[0], &swappiness))
+				return -EINVAL;
+			if (swappiness < MIN_SWAPPINESS || swappiness > MAX_SWAPPINESS)
+				return -EINVAL;
+			break;
+		default:
+			return -EINVAL;
+		}
+	}
 
 	reclaim_options = MEMCG_RECLAIM_MAY_SWAP | MEMCG_RECLAIM_PROACTIVE;
 	while (nr_reclaimed < nr_to_reclaim) {
@@ -6926,7 +6964,8 @@ static ssize_t memory_reclaim(struct kernfs_open_file *of, char *buf,
 
 		reclaimed = try_to_free_mem_cgroup_pages(memcg,
 					min(nr_to_reclaim - nr_reclaimed, SWAP_CLUSTER_MAX),
-					GFP_KERNEL, reclaim_options);
+					GFP_KERNEL, reclaim_options,
+					swappiness);
 
 		if (!reclaimed && !nr_retries--)
 			return -EAGAIN;
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 2aa671fe938b..cfef06c7c23f 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -136,6 +136,9 @@ struct scan_control {
 	/* Always discard instead of demoting to lower tier memory */
 	unsigned int no_demotion:1;
 
+	/* Swappiness value for reclaim */
+	int swappiness;
+
 	/* Allocation order */
 	s8 order;
 
@@ -2327,7 +2330,7 @@ static void get_scan_count(struct lruvec *lruvec, struct scan_control *sc,
 	struct pglist_data *pgdat = lruvec_pgdat(lruvec);
 	struct mem_cgroup *memcg = lruvec_memcg(lruvec);
 	unsigned long anon_cost, file_cost, total_cost;
-	int swappiness = mem_cgroup_swappiness(memcg);
+	int swappiness = sc->swappiness;
 	u64 fraction[ANON_AND_FILE];
 	u64 denominator = 0;	/* gcc */
 	enum scan_balance scan_balance;
@@ -2608,7 +2611,7 @@ static int get_swappiness(struct lruvec *lruvec, struct scan_control *sc)
 	    mem_cgroup_get_nr_swap_pages(memcg) < MIN_LRU_BATCH)
 		return 0;
 
-	return mem_cgroup_swappiness(memcg);
+	return sc->swappiness;
 }
 
 static int get_nr_gens(struct lruvec *lruvec, int type)
@@ -6433,7 +6436,8 @@ unsigned long mem_cgroup_shrink_node(struct mem_cgroup *memcg,
 unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *memcg,
 					   unsigned long nr_pages,
 					   gfp_t gfp_mask,
-					   unsigned int reclaim_options)
+					   unsigned int reclaim_options,
+					   int swappiness)
 {
 	unsigned long nr_reclaimed;
 	unsigned int noreclaim_flag;
@@ -6448,6 +6452,7 @@ unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *memcg,
 		.may_unmap = 1,
 		.may_swap = !!(reclaim_options & MEMCG_RECLAIM_MAY_SWAP),
 		.proactive = !!(reclaim_options & MEMCG_RECLAIM_PROACTIVE),
+		.swappiness = swappiness,
 	};
 	/*
 	 * Traverse the ZONELIST_FALLBACK zonelist of the current node to put
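
To make the accepted input grammar of the new memory_reclaim() parsing
easier to reason about, here is a small userspace approximation of it:
a leading size, followed by space-separated keys, of which only
swappiness= is recognised. This is only a sketch for discussion, not
the kernel code. parse_size() is a simplified stand-in for memparse()
that handles just K/M/G suffixes, plain string functions replace
match_token()/match_int(), and the page size and swappiness bounds are
assumed values. All names below are hypothetical.

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Assumed values for the sketch; the kernel uses its own constants. */
#define SKETCH_PAGE_SIZE	4096UL
#define SKETCH_MIN_SWAPPINESS	0
#define SKETCH_MAX_SWAPPINESS	200

struct reclaim_request {
	unsigned long nr_pages;
	int swappiness;
};

/* Simplified stand-in for memparse(): a number plus optional K/M/G. */
static unsigned long long parse_size(const char *s, char **end)
{
	unsigned long long val = strtoull(s, end, 0);

	switch (**end) {
	case 'G': case 'g':
		val <<= 10;
		/* fall through */
	case 'M': case 'm':
		val <<= 10;
		/* fall through */
	case 'K': case 'k':
		val <<= 10;
		(*end)++;
		break;
	default:
		break;
	}
	return val;
}

/*
 * Parse "<size> [swappiness=<n>]..." in place (buf is modified).
 * Returns 0 on success, -EINVAL on a missing size, an unknown key or
 * an out-of-range swappiness value.
 */
static int parse_reclaim(char *buf, int default_swappiness,
			 struct reclaim_request *req)
{
	static const char key[] = "swappiness=";
	char *p;

	req->swappiness = default_swappiness;

	buf += strspn(buf, " \t\n");
	if (!isdigit((unsigned char)*buf))
		return -EINVAL;		/* the size is mandatory */
	req->nr_pages = parse_size(buf, &p) / SKETCH_PAGE_SIZE;

	for (;;) {
		char *tok, *endp;
		long val;

		p += strspn(p, " \n");	/* skip separators and empty tokens */
		if (!*p)
			break;
		tok = p;
		p += strcspn(p, " \n");
		if (*p)
			*p++ = '\0';	/* terminate this token */

		if (strncmp(tok, key, sizeof(key) - 1) != 0)
			return -EINVAL;	/* unknown key */

		errno = 0;
		val = strtol(tok + sizeof(key) - 1, &endp, 10);
		if (errno || endp == tok + sizeof(key) - 1 || *endp != '\0')
			return -EINVAL;
		if (val < SKETCH_MIN_SWAPPINESS || val > SKETCH_MAX_SWAPPINESS)
			return -EINVAL;
		req->swappiness = (int)val;
	}

	return 0;
}
```

As in the patch, a request without a swappiness= key keeps the default
passed in by the caller, and any unrecognised key rejects the whole
write instead of being silently ignored.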