From patchwork Tue May 28 20:20:53 2024
From: Roman Gushchin <roman.gushchin@linux.dev>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Muchun Song <muchun.song@linux.dev>, Johannes Weiner <hannes@cmpxchg.org>, Michal Hocko <mhocko@kernel.org>, Shakeel Butt <shakeel.butt@linux.dev>, Matthew Wilcox <willy@infradead.org>, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: [PATCH v1 01/14] mm: memcg: introduce memcontrol-v1.c
Date: Tue, 28 May 2024 13:20:53 -0700
Message-ID: <20240528202101.3099300-2-roman.gushchin@linux.dev>
In-Reply-To: <20240528202101.3099300-1-roman.gushchin@linux.dev>
References: <20240528202101.3099300-1-roman.gushchin@linux.dev>

This patch introduces the mm/memcontrol-v1.c source file, which will be used for all legacy (cgroup v1) memory cgroup code. It also introduces mm/memcontrol-v1.h to keep the declarations shared between mm/memcontrol.c and mm/memcontrol-v1.c.

For now, let's compile it whenever CONFIG_MEMCG is set, just like mm/memcontrol.c. Later on it can be switched to a separate config option, so that the legacy code isn't compiled when it's not required (a hypothetical sketch of such an option follows below).
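To make the last point concrete, a follow-up config option could look roughly like the sketch below. This is purely illustrative and not part of this series as posted; the option name CONFIG_MEMCG_V1, its prompt and its default are assumptions:

  config MEMCG_V1
          bool "Legacy cgroup v1 memory controller"
          depends on MEMCG
          default y

with the Makefile rule then becoming:

  obj-$(CONFIG_MEMCG_V1) += memcontrol-v1.o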
Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev>
Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
---
 mm/Makefile        | 3 ++-
 mm/memcontrol-v1.c | 3 +++
 mm/memcontrol-v1.h | 7 +++++++
 3 files changed, 12 insertions(+), 1 deletion(-)
 create mode 100644 mm/memcontrol-v1.c
 create mode 100644 mm/memcontrol-v1.h

diff --git a/mm/Makefile b/mm/Makefile
index 8fb85acda1b1c..124d4dea20351 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -26,6 +26,7 @@ KCOV_INSTRUMENT_page_alloc.o := n
 KCOV_INSTRUMENT_debug-pagealloc.o := n
 KCOV_INSTRUMENT_kmemleak.o := n
 KCOV_INSTRUMENT_memcontrol.o := n
+KCOV_INSTRUMENT_memcontrol-v1.o := n
 KCOV_INSTRUMENT_mmzone.o := n
 KCOV_INSTRUMENT_vmstat.o := n
 KCOV_INSTRUMENT_failslab.o := n
@@ -95,7 +96,7 @@ obj-$(CONFIG_NUMA) += memory-tiers.o
 obj-$(CONFIG_DEVICE_MIGRATION) += migrate_device.o
 obj-$(CONFIG_TRANSPARENT_HUGEPAGE) += huge_memory.o khugepaged.o
 obj-$(CONFIG_PAGE_COUNTER) += page_counter.o
-obj-$(CONFIG_MEMCG) += memcontrol.o vmpressure.o
+obj-$(CONFIG_MEMCG) += memcontrol.o memcontrol-v1.o vmpressure.o
 ifdef CONFIG_SWAP
 obj-$(CONFIG_MEMCG) += swap_cgroup.o
 endif
diff --git a/mm/memcontrol-v1.c b/mm/memcontrol-v1.c
new file mode 100644
index 0000000000000..a941446ba575b
--- /dev/null
+++ b/mm/memcontrol-v1.c
@@ -0,0 +1,3 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+
+#include "memcontrol-v1.h"
diff --git a/mm/memcontrol-v1.h b/mm/memcontrol-v1.h
new file mode 100644
index 0000000000000..7c5f094755ff8
--- /dev/null
+++ b/mm/memcontrol-v1.h
@@ -0,0 +1,7 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+
+#ifndef __MM_MEMCONTROL_V1_H
+#define __MM_MEMCONTROL_V1_H
+
+
+#endif /* __MM_MEMCONTROL_V1_H */

From patchwork Tue May 28 20:20:54 2024
From: Roman Gushchin <roman.gushchin@linux.dev>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Muchun Song <muchun.song@linux.dev>, Johannes Weiner <hannes@cmpxchg.org>, Michal Hocko <mhocko@kernel.org>, Shakeel Butt <shakeel.butt@linux.dev>, Matthew Wilcox <willy@infradead.org>, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: [PATCH v1 02/14] mm: memcg: move soft limit reclaim code to memcontrol-v1.c
Date: Tue, 28 May 2024 13:20:54 -0700
Message-ID: <20240528202101.3099300-3-roman.gushchin@linux.dev>
In-Reply-To: <20240528202101.3099300-1-roman.gushchin@linux.dev>
References: <20240528202101.3099300-1-roman.gushchin@linux.dev>

Soft limits are cgroup v1-specific and are not supported by cgroup v2, so let's move the corresponding code into memcontrol-v1.c.

Aside from simply moving the code, this commit introduces a trivial memcg1_soft_limit_reset() function to reset soft limits, and also moves the global soft limit tree initialization code into a new memcg1_init() function. It also moves the corresponding declarations shared between memcontrol.c and memcontrol-v1.c into mm/memcontrol-v1.h. (A toy illustration of the soft limit excess bookkeeping follows the diff below.)

Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev>
Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
---
 mm/memcontrol-v1.c | 342 +++++++++++++++++++++++++++++++++++++++++++++
 mm/memcontrol-v1.h |   7 +
 mm/memcontrol.c    | 337 +------------------------------------------
 3 files changed, 353 insertions(+), 333 deletions(-)

diff --git a/mm/memcontrol-v1.c b/mm/memcontrol-v1.c
index a941446ba575b..2ccb8406fa84e 100644
--- a/mm/memcontrol-v1.c
+++ b/mm/memcontrol-v1.c
@@ -1,3 +1,345 @@
 // SPDX-License-Identifier: GPL-2.0-or-later
 
+#include
+#include
+#include
+
 #include "memcontrol-v1.h"
+
+/*
+ * Cgroups above their limits are maintained in a RB-Tree, independent of
+ * their hierarchy representation
+ */
+
+struct mem_cgroup_tree_per_node {
+	struct rb_root rb_root;
+	struct rb_node *rb_rightmost;
+	spinlock_t lock;
+};
+
+struct mem_cgroup_tree {
+	struct mem_cgroup_tree_per_node *rb_tree_per_node[MAX_NUMNODES];
+};
+
+static struct mem_cgroup_tree soft_limit_tree __read_mostly;
+
+/*
+ * Maximum loops in mem_cgroup_soft_reclaim(), used for soft
+ * limit reclaim to prevent infinite loops, if they ever occur.
+ */ +#define MEM_CGROUP_MAX_RECLAIM_LOOPS 100 +#define MEM_CGROUP_MAX_SOFT_LIMIT_RECLAIM_LOOPS 2 + +static void __mem_cgroup_insert_exceeded(struct mem_cgroup_per_node *mz, + struct mem_cgroup_tree_per_node *mctz, + unsigned long new_usage_in_excess) +{ + struct rb_node **p = &mctz->rb_root.rb_node; + struct rb_node *parent = NULL; + struct mem_cgroup_per_node *mz_node; + bool rightmost = true; + + if (mz->on_tree) + return; + + mz->usage_in_excess = new_usage_in_excess; + if (!mz->usage_in_excess) + return; + while (*p) { + parent = *p; + mz_node = rb_entry(parent, struct mem_cgroup_per_node, + tree_node); + if (mz->usage_in_excess < mz_node->usage_in_excess) { + p = &(*p)->rb_left; + rightmost = false; + } else { + p = &(*p)->rb_right; + } + } + + if (rightmost) + mctz->rb_rightmost = &mz->tree_node; + + rb_link_node(&mz->tree_node, parent, p); + rb_insert_color(&mz->tree_node, &mctz->rb_root); + mz->on_tree = true; +} + +static void __mem_cgroup_remove_exceeded(struct mem_cgroup_per_node *mz, + struct mem_cgroup_tree_per_node *mctz) +{ + if (!mz->on_tree) + return; + + if (&mz->tree_node == mctz->rb_rightmost) + mctz->rb_rightmost = rb_prev(&mz->tree_node); + + rb_erase(&mz->tree_node, &mctz->rb_root); + mz->on_tree = false; +} + +static void mem_cgroup_remove_exceeded(struct mem_cgroup_per_node *mz, + struct mem_cgroup_tree_per_node *mctz) +{ + unsigned long flags; + + spin_lock_irqsave(&mctz->lock, flags); + __mem_cgroup_remove_exceeded(mz, mctz); + spin_unlock_irqrestore(&mctz->lock, flags); +} + +static unsigned long soft_limit_excess(struct mem_cgroup *memcg) +{ + unsigned long nr_pages = page_counter_read(&memcg->memory); + unsigned long soft_limit = READ_ONCE(memcg->soft_limit); + unsigned long excess = 0; + + if (nr_pages > soft_limit) + excess = nr_pages - soft_limit; + + return excess; +} + +void mem_cgroup_update_tree(struct mem_cgroup *memcg, int nid) +{ + unsigned long excess; + struct mem_cgroup_per_node *mz; + struct mem_cgroup_tree_per_node *mctz; + + if (lru_gen_enabled()) { + if (soft_limit_excess(memcg)) + lru_gen_soft_reclaim(memcg, nid); + return; + } + + mctz = soft_limit_tree.rb_tree_per_node[nid]; + if (!mctz) + return; + /* + * Necessary to update all ancestors when hierarchy is used. + * because their event counter is not touched. + */ + for (; memcg; memcg = parent_mem_cgroup(memcg)) { + mz = memcg->nodeinfo[nid]; + excess = soft_limit_excess(memcg); + /* + * We have to update the tree if mz is on RB-tree or + * mem is over its softlimit. + */ + if (excess || mz->on_tree) { + unsigned long flags; + + spin_lock_irqsave(&mctz->lock, flags); + /* if on-tree, remove it */ + if (mz->on_tree) + __mem_cgroup_remove_exceeded(mz, mctz); + /* + * Insert again. mz->usage_in_excess will be updated. + * If excess is 0, no tree ops. 
+ */ + __mem_cgroup_insert_exceeded(mz, mctz, excess); + spin_unlock_irqrestore(&mctz->lock, flags); + } + } +} + +void mem_cgroup_remove_from_trees(struct mem_cgroup *memcg) +{ + struct mem_cgroup_tree_per_node *mctz; + struct mem_cgroup_per_node *mz; + int nid; + + for_each_node(nid) { + mz = memcg->nodeinfo[nid]; + mctz = soft_limit_tree.rb_tree_per_node[nid]; + if (mctz) + mem_cgroup_remove_exceeded(mz, mctz); + } +} + +static struct mem_cgroup_per_node * +__mem_cgroup_largest_soft_limit_node(struct mem_cgroup_tree_per_node *mctz) +{ + struct mem_cgroup_per_node *mz; + +retry: + mz = NULL; + if (!mctz->rb_rightmost) + goto done; /* Nothing to reclaim from */ + + mz = rb_entry(mctz->rb_rightmost, + struct mem_cgroup_per_node, tree_node); + /* + * Remove the node now but someone else can add it back, + * we will to add it back at the end of reclaim to its correct + * position in the tree. + */ + __mem_cgroup_remove_exceeded(mz, mctz); + if (!soft_limit_excess(mz->memcg) || + !css_tryget(&mz->memcg->css)) + goto retry; +done: + return mz; +} + +static struct mem_cgroup_per_node * +mem_cgroup_largest_soft_limit_node(struct mem_cgroup_tree_per_node *mctz) +{ + struct mem_cgroup_per_node *mz; + + spin_lock_irq(&mctz->lock); + mz = __mem_cgroup_largest_soft_limit_node(mctz); + spin_unlock_irq(&mctz->lock); + return mz; +} + +static int mem_cgroup_soft_reclaim(struct mem_cgroup *root_memcg, + pg_data_t *pgdat, + gfp_t gfp_mask, + unsigned long *total_scanned) +{ + struct mem_cgroup *victim = NULL; + int total = 0; + int loop = 0; + unsigned long excess; + unsigned long nr_scanned; + struct mem_cgroup_reclaim_cookie reclaim = { + .pgdat = pgdat, + }; + + excess = soft_limit_excess(root_memcg); + + while (1) { + victim = mem_cgroup_iter(root_memcg, victim, &reclaim); + if (!victim) { + loop++; + if (loop >= 2) { + /* + * If we have not been able to reclaim + * anything, it might because there are + * no reclaimable pages under this hierarchy + */ + if (!total) + break; + /* + * We want to do more targeted reclaim. + * excess >> 2 is not to excessive so as to + * reclaim too much, nor too less that we keep + * coming back to reclaim from this cgroup + */ + if (total >= (excess >> 2) || + (loop > MEM_CGROUP_MAX_RECLAIM_LOOPS)) + break; + } + continue; + } + total += mem_cgroup_shrink_node(victim, gfp_mask, false, + pgdat, &nr_scanned); + *total_scanned += nr_scanned; + if (!soft_limit_excess(root_memcg)) + break; + } + mem_cgroup_iter_break(root_memcg, victim); + return total; +} + +unsigned long mem_cgroup_soft_limit_reclaim(pg_data_t *pgdat, int order, + gfp_t gfp_mask, + unsigned long *total_scanned) +{ + unsigned long nr_reclaimed = 0; + struct mem_cgroup_per_node *mz, *next_mz = NULL; + unsigned long reclaimed; + int loop = 0; + struct mem_cgroup_tree_per_node *mctz; + unsigned long excess; + + if (lru_gen_enabled()) + return 0; + + if (order > 0) + return 0; + + mctz = soft_limit_tree.rb_tree_per_node[pgdat->node_id]; + + /* + * Do not even bother to check the largest node if the root + * is empty. Do it lockless to prevent lock bouncing. Races + * are acceptable as soft limit is best effort anyway. 
+ */ + if (!mctz || RB_EMPTY_ROOT(&mctz->rb_root)) + return 0; + + /* + * This loop can run a while, specially if mem_cgroup's continuously + * keep exceeding their soft limit and putting the system under + * pressure + */ + do { + if (next_mz) + mz = next_mz; + else + mz = mem_cgroup_largest_soft_limit_node(mctz); + if (!mz) + break; + + reclaimed = mem_cgroup_soft_reclaim(mz->memcg, pgdat, + gfp_mask, total_scanned); + nr_reclaimed += reclaimed; + spin_lock_irq(&mctz->lock); + + /* + * If we failed to reclaim anything from this memory cgroup + * it is time to move on to the next cgroup + */ + next_mz = NULL; + if (!reclaimed) + next_mz = __mem_cgroup_largest_soft_limit_node(mctz); + + excess = soft_limit_excess(mz->memcg); + /* + * One school of thought says that we should not add + * back the node to the tree if reclaim returns 0. + * But our reclaim could return 0, simply because due + * to priority we are exposing a smaller subset of + * memory to reclaim from. Consider this as a longer + * term TODO. + */ + /* If excess == 0, no tree ops */ + __mem_cgroup_insert_exceeded(mz, mctz, excess); + spin_unlock_irq(&mctz->lock); + css_put(&mz->memcg->css); + loop++; + /* + * Could not reclaim anything and there are no more + * mem cgroups to try or we seem to be looping without + * reclaiming anything. + */ + if (!nr_reclaimed && + (next_mz == NULL || + loop > MEM_CGROUP_MAX_SOFT_LIMIT_RECLAIM_LOOPS)) + break; + } while (!nr_reclaimed); + if (next_mz) + css_put(&next_mz->memcg->css); + return nr_reclaimed; +} + +static int __init memcg1_init(void) +{ + int node; + + for_each_node(node) { + struct mem_cgroup_tree_per_node *rtpn; + + rtpn = kzalloc_node(sizeof(*rtpn), GFP_KERNEL, node); + + rtpn->rb_root = RB_ROOT; + rtpn->rb_rightmost = NULL; + spin_lock_init(&rtpn->lock); + soft_limit_tree.rb_tree_per_node[node] = rtpn; + } + + return 0; +} +subsys_initcall(memcg1_init); diff --git a/mm/memcontrol-v1.h b/mm/memcontrol-v1.h index 7c5f094755ff8..4da6fa561c6d4 100644 --- a/mm/memcontrol-v1.h +++ b/mm/memcontrol-v1.h @@ -3,5 +3,12 @@ #ifndef __MM_MEMCONTROL_V1_H #define __MM_MEMCONTROL_V1_H +void mem_cgroup_update_tree(struct mem_cgroup *memcg, int nid); +void mem_cgroup_remove_from_trees(struct mem_cgroup *memcg); + +static inline void memcg1_soft_limit_reset(struct mem_cgroup *memcg) +{ + WRITE_ONCE(memcg->soft_limit, PAGE_COUNTER_MAX); +} #endif /* __MM_MEMCONTROL_V1_H */ diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 352b6cda962ce..fa1df6e894787 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -72,6 +72,7 @@ #include #include "slab.h" #include "swap.h" +#include "memcontrol-v1.h" #include @@ -108,23 +109,6 @@ static bool do_memsw_account(void) #define THRESHOLDS_EVENTS_TARGET 128 #define SOFTLIMIT_EVENTS_TARGET 1024 -/* - * Cgroups above their limits are maintained in a RB-Tree, independent of - * their hierarchy representation - */ - -struct mem_cgroup_tree_per_node { - struct rb_root rb_root; - struct rb_node *rb_rightmost; - spinlock_t lock; -}; - -struct mem_cgroup_tree { - struct mem_cgroup_tree_per_node *rb_tree_per_node[MAX_NUMNODES]; -}; - -static struct mem_cgroup_tree soft_limit_tree __read_mostly; - /* for OOM */ struct mem_cgroup_eventfd_list { struct list_head list; @@ -199,13 +183,6 @@ static struct move_charge_struct { .waitq = __WAIT_QUEUE_HEAD_INITIALIZER(mc.waitq), }; -/* - * Maximum loops in mem_cgroup_soft_reclaim(), used for soft - * limit reclaim to prevent infinite loops, if they ever occur. 
- */ -#define MEM_CGROUP_MAX_RECLAIM_LOOPS 100 -#define MEM_CGROUP_MAX_SOFT_LIMIT_RECLAIM_LOOPS 2 - /* for encoding cft->private value on file */ enum res_type { _MEM, @@ -413,169 +390,6 @@ ino_t page_cgroup_ino(struct page *page) return ino; } -static void __mem_cgroup_insert_exceeded(struct mem_cgroup_per_node *mz, - struct mem_cgroup_tree_per_node *mctz, - unsigned long new_usage_in_excess) -{ - struct rb_node **p = &mctz->rb_root.rb_node; - struct rb_node *parent = NULL; - struct mem_cgroup_per_node *mz_node; - bool rightmost = true; - - if (mz->on_tree) - return; - - mz->usage_in_excess = new_usage_in_excess; - if (!mz->usage_in_excess) - return; - while (*p) { - parent = *p; - mz_node = rb_entry(parent, struct mem_cgroup_per_node, - tree_node); - if (mz->usage_in_excess < mz_node->usage_in_excess) { - p = &(*p)->rb_left; - rightmost = false; - } else { - p = &(*p)->rb_right; - } - } - - if (rightmost) - mctz->rb_rightmost = &mz->tree_node; - - rb_link_node(&mz->tree_node, parent, p); - rb_insert_color(&mz->tree_node, &mctz->rb_root); - mz->on_tree = true; -} - -static void __mem_cgroup_remove_exceeded(struct mem_cgroup_per_node *mz, - struct mem_cgroup_tree_per_node *mctz) -{ - if (!mz->on_tree) - return; - - if (&mz->tree_node == mctz->rb_rightmost) - mctz->rb_rightmost = rb_prev(&mz->tree_node); - - rb_erase(&mz->tree_node, &mctz->rb_root); - mz->on_tree = false; -} - -static void mem_cgroup_remove_exceeded(struct mem_cgroup_per_node *mz, - struct mem_cgroup_tree_per_node *mctz) -{ - unsigned long flags; - - spin_lock_irqsave(&mctz->lock, flags); - __mem_cgroup_remove_exceeded(mz, mctz); - spin_unlock_irqrestore(&mctz->lock, flags); -} - -static unsigned long soft_limit_excess(struct mem_cgroup *memcg) -{ - unsigned long nr_pages = page_counter_read(&memcg->memory); - unsigned long soft_limit = READ_ONCE(memcg->soft_limit); - unsigned long excess = 0; - - if (nr_pages > soft_limit) - excess = nr_pages - soft_limit; - - return excess; -} - -static void mem_cgroup_update_tree(struct mem_cgroup *memcg, int nid) -{ - unsigned long excess; - struct mem_cgroup_per_node *mz; - struct mem_cgroup_tree_per_node *mctz; - - if (lru_gen_enabled()) { - if (soft_limit_excess(memcg)) - lru_gen_soft_reclaim(memcg, nid); - return; - } - - mctz = soft_limit_tree.rb_tree_per_node[nid]; - if (!mctz) - return; - /* - * Necessary to update all ancestors when hierarchy is used. - * because their event counter is not touched. - */ - for (; memcg; memcg = parent_mem_cgroup(memcg)) { - mz = memcg->nodeinfo[nid]; - excess = soft_limit_excess(memcg); - /* - * We have to update the tree if mz is on RB-tree or - * mem is over its softlimit. - */ - if (excess || mz->on_tree) { - unsigned long flags; - - spin_lock_irqsave(&mctz->lock, flags); - /* if on-tree, remove it */ - if (mz->on_tree) - __mem_cgroup_remove_exceeded(mz, mctz); - /* - * Insert again. mz->usage_in_excess will be updated. - * If excess is 0, no tree ops. 
- */ - __mem_cgroup_insert_exceeded(mz, mctz, excess); - spin_unlock_irqrestore(&mctz->lock, flags); - } - } -} - -static void mem_cgroup_remove_from_trees(struct mem_cgroup *memcg) -{ - struct mem_cgroup_tree_per_node *mctz; - struct mem_cgroup_per_node *mz; - int nid; - - for_each_node(nid) { - mz = memcg->nodeinfo[nid]; - mctz = soft_limit_tree.rb_tree_per_node[nid]; - if (mctz) - mem_cgroup_remove_exceeded(mz, mctz); - } -} - -static struct mem_cgroup_per_node * -__mem_cgroup_largest_soft_limit_node(struct mem_cgroup_tree_per_node *mctz) -{ - struct mem_cgroup_per_node *mz; - -retry: - mz = NULL; - if (!mctz->rb_rightmost) - goto done; /* Nothing to reclaim from */ - - mz = rb_entry(mctz->rb_rightmost, - struct mem_cgroup_per_node, tree_node); - /* - * Remove the node now but someone else can add it back, - * we will to add it back at the end of reclaim to its correct - * position in the tree. - */ - __mem_cgroup_remove_exceeded(mz, mctz); - if (!soft_limit_excess(mz->memcg) || - !css_tryget(&mz->memcg->css)) - goto retry; -done: - return mz; -} - -static struct mem_cgroup_per_node * -mem_cgroup_largest_soft_limit_node(struct mem_cgroup_tree_per_node *mctz) -{ - struct mem_cgroup_per_node *mz; - - spin_lock_irq(&mctz->lock); - mz = __mem_cgroup_largest_soft_limit_node(mctz); - spin_unlock_irq(&mctz->lock); - return mz; -} - /* Subset of node_stat_item for memcg stats */ static const unsigned int memcg_node_stat_items[] = { NR_INACTIVE_ANON, @@ -1980,56 +1794,6 @@ static bool mem_cgroup_out_of_memory(struct mem_cgroup *memcg, gfp_t gfp_mask, return ret; } -static int mem_cgroup_soft_reclaim(struct mem_cgroup *root_memcg, - pg_data_t *pgdat, - gfp_t gfp_mask, - unsigned long *total_scanned) -{ - struct mem_cgroup *victim = NULL; - int total = 0; - int loop = 0; - unsigned long excess; - unsigned long nr_scanned; - struct mem_cgroup_reclaim_cookie reclaim = { - .pgdat = pgdat, - }; - - excess = soft_limit_excess(root_memcg); - - while (1) { - victim = mem_cgroup_iter(root_memcg, victim, &reclaim); - if (!victim) { - loop++; - if (loop >= 2) { - /* - * If we have not been able to reclaim - * anything, it might because there are - * no reclaimable pages under this hierarchy - */ - if (!total) - break; - /* - * We want to do more targeted reclaim. - * excess >> 2 is not to excessive so as to - * reclaim too much, nor too less that we keep - * coming back to reclaim from this cgroup - */ - if (total >= (excess >> 2) || - (loop > MEM_CGROUP_MAX_RECLAIM_LOOPS)) - break; - } - continue; - } - total += mem_cgroup_shrink_node(victim, gfp_mask, false, - pgdat, &nr_scanned); - *total_scanned += nr_scanned; - if (!soft_limit_excess(root_memcg)) - break; - } - mem_cgroup_iter_break(root_memcg, victim); - return total; -} - #ifdef CONFIG_LOCKDEP static struct lockdep_map memcg_oom_lock_dep_map = { .name = "memcg_oom_lock", @@ -3927,88 +3691,6 @@ static int mem_cgroup_resize_max(struct mem_cgroup *memcg, return ret; } -unsigned long mem_cgroup_soft_limit_reclaim(pg_data_t *pgdat, int order, - gfp_t gfp_mask, - unsigned long *total_scanned) -{ - unsigned long nr_reclaimed = 0; - struct mem_cgroup_per_node *mz, *next_mz = NULL; - unsigned long reclaimed; - int loop = 0; - struct mem_cgroup_tree_per_node *mctz; - unsigned long excess; - - if (lru_gen_enabled()) - return 0; - - if (order > 0) - return 0; - - mctz = soft_limit_tree.rb_tree_per_node[pgdat->node_id]; - - /* - * Do not even bother to check the largest node if the root - * is empty. Do it lockless to prevent lock bouncing. 
Races
-	 * are acceptable as soft limit is best effort anyway.
-	 */
-	if (!mctz || RB_EMPTY_ROOT(&mctz->rb_root))
-		return 0;
-
-	/*
-	 * This loop can run a while, specially if mem_cgroup's continuously
-	 * keep exceeding their soft limit and putting the system under
-	 * pressure
-	 */
-	do {
-		if (next_mz)
-			mz = next_mz;
-		else
-			mz = mem_cgroup_largest_soft_limit_node(mctz);
-		if (!mz)
-			break;
-
-		reclaimed = mem_cgroup_soft_reclaim(mz->memcg, pgdat,
-						    gfp_mask, total_scanned);
-		nr_reclaimed += reclaimed;
-		spin_lock_irq(&mctz->lock);
-
-		/*
-		 * If we failed to reclaim anything from this memory cgroup
-		 * it is time to move on to the next cgroup
-		 */
-		next_mz = NULL;
-		if (!reclaimed)
-			next_mz = __mem_cgroup_largest_soft_limit_node(mctz);
-
-		excess = soft_limit_excess(mz->memcg);
-		/*
-		 * One school of thought says that we should not add
-		 * back the node to the tree if reclaim returns 0.
-		 * But our reclaim could return 0, simply because due
-		 * to priority we are exposing a smaller subset of
-		 * memory to reclaim from. Consider this as a longer
-		 * term TODO.
-		 */
-		/* If excess == 0, no tree ops */
-		__mem_cgroup_insert_exceeded(mz, mctz, excess);
-		spin_unlock_irq(&mctz->lock);
-		css_put(&mz->memcg->css);
-		loop++;
-		/*
-		 * Could not reclaim anything and there are no more
-		 * mem cgroups to try or we seem to be looping without
-		 * reclaiming anything.
-		 */
-		if (!nr_reclaimed &&
-		    (next_mz == NULL ||
-		     loop > MEM_CGROUP_MAX_SOFT_LIMIT_RECLAIM_LOOPS))
-			break;
-	} while (!nr_reclaimed);
-	if (next_mz)
-		css_put(&next_mz->memcg->css);
-	return nr_reclaimed;
-}
-
 /*
  * Reclaims as many pages from the given memcg as possible.
  *
@@ -5786,7 +5468,7 @@ mem_cgroup_css_alloc(struct cgroup_subsys_state *parent_css)
 		return ERR_CAST(memcg);
 
 	page_counter_set_high(&memcg->memory, PAGE_COUNTER_MAX);
-	WRITE_ONCE(memcg->soft_limit, PAGE_COUNTER_MAX);
+	memcg1_soft_limit_reset(memcg);
 #if defined(CONFIG_MEMCG_KMEM) && defined(CONFIG_ZSWAP)
 	memcg->zswap_max = PAGE_COUNTER_MAX;
 	WRITE_ONCE(memcg->zswap_writeback,
@@ -5959,7 +5641,7 @@ static void mem_cgroup_css_reset(struct cgroup_subsys_state *css)
 	page_counter_set_min(&memcg->memory, 0);
 	page_counter_set_low(&memcg->memory, 0);
 	page_counter_set_high(&memcg->memory, PAGE_COUNTER_MAX);
-	WRITE_ONCE(memcg->soft_limit, PAGE_COUNTER_MAX);
+	memcg1_soft_limit_reset(memcg);
 	page_counter_set_high(&memcg->swap, PAGE_COUNTER_MAX);
 	memcg_wb_domain_size_changed(memcg);
 }
@@ -7986,7 +7668,7 @@ __setup("cgroup.memory=", cgroup_memory);
  */
 static int __init mem_cgroup_init(void)
 {
-	int cpu, node;
+	int cpu;
 
 	/*
 	 * Currently s32 type (can refer to struct batched_lruvec_stat) is
@@ -8003,17 +7685,6 @@ static int __init mem_cgroup_init(void)
 		INIT_WORK(&per_cpu_ptr(&memcg_stock, cpu)->work, drain_local_stock);
 
-	for_each_node(node) {
-		struct mem_cgroup_tree_per_node *rtpn;
-
-		rtpn = kzalloc_node(sizeof(*rtpn), GFP_KERNEL, node);
-
-		rtpn->rb_root = RB_ROOT;
-		rtpn->rb_rightmost = NULL;
-		spin_lock_init(&rtpn->lock);
-		soft_limit_tree.rb_tree_per_node[node] = rtpn;
-	}
-
 	return 0;
 }
 subsys_initcall(mem_cgroup_init);
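To illustrate the excess bookkeeping that was just moved, here is a standalone toy model (userspace C, illustrative only; all toy_* names are made up and this is not kernel code):

	#include <stdio.h>

	/* Mirrors soft_limit_excess(): pages charged above the soft limit. */
	struct toy_memcg {
		const char   *name;
		unsigned long usage;      /* cf. page_counter_read(&memcg->memory) */
		unsigned long soft_limit; /* cf. READ_ONCE(memcg->soft_limit) */
	};

	static unsigned long toy_soft_limit_excess(const struct toy_memcg *m)
	{
		return m->usage > m->soft_limit ? m->usage - m->soft_limit : 0;
	}

	int main(void)
	{
		/*
		 * The per-node RB-tree keys cgroups by this excess, and
		 * reclaim always starts from the rightmost (largest excess)
		 * node, so "A" below would be targeted first.
		 */
		const struct toy_memcg groups[] = {
			{ "A", 1000, 600 },	/* excess 400 */
			{ "B",  500, 450 },	/* excess  50 */
			{ "C",  300, 512 },	/* excess 0: never inserted */
		};

		for (int i = 0; i < 3; i++)
			printf("%s: excess=%lu pages\n", groups[i].name,
			       toy_soft_limit_excess(&groups[i]));
		return 0;
	}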
From patchwork Tue May 28 20:20:55 2024
From: Roman Gushchin <roman.gushchin@linux.dev>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Muchun Song <muchun.song@linux.dev>, Johannes Weiner <hannes@cmpxchg.org>, Michal Hocko <mhocko@kernel.org>, Shakeel Butt <shakeel.butt@linux.dev>, Matthew Wilcox <willy@infradead.org>, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: [PATCH v1 03/14] mm: memcg: rename soft limit reclaim-related functions
Date: Tue, 28 May 2024 13:20:55 -0700
Message-ID: <20240528202101.3099300-4-roman.gushchin@linux.dev>
In-Reply-To: <20240528202101.3099300-1-roman.gushchin@linux.dev>
References: <20240528202101.3099300-1-roman.gushchin@linux.dev>

Rename the exported functions related to soft limit reclaim so that they carry a memcg1_ prefix.
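For quick reference, the renamed entry points end up declared as follows (taken from the diff below):

	/* mm/memcontrol-v1.h */
	void memcg1_update_tree(struct mem_cgroup *memcg, int nid);
	void memcg1_remove_from_trees(struct mem_cgroup *memcg);

	/* include/linux/memcontrol.h */
	unsigned long memcg1_soft_limit_reclaim(pg_data_t *pgdat, int order,
						gfp_t gfp_mask,
						unsigned long *total_scanned);

Call sites change mechanically, e.g. in mm/vmscan.c:

	nr_soft_reclaimed = memcg1_soft_limit_reclaim(pgdat, sc.order,
						sc.gfp_mask, &nr_soft_scanned);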
Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev>
Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
---
 include/linux/memcontrol.h | 12 ++++++------
 mm/memcontrol-v1.c         |  6 +++---
 mm/memcontrol-v1.h         |  4 ++--
 mm/memcontrol.c            |  4 ++--
 mm/vmscan.c                | 10 +++++-----
 5 files changed, 18 insertions(+), 18 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 3d1599146afe4..04f9f7b0e7c2f 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -1115,9 +1115,9 @@ static inline void memcg_memory_event_mm(struct mm_struct *mm,
 
 void split_page_memcg(struct page *head, int old_order, int new_order);
 
-unsigned long mem_cgroup_soft_limit_reclaim(pg_data_t *pgdat, int order,
-					    gfp_t gfp_mask,
-					    unsigned long *total_scanned);
+unsigned long memcg1_soft_limit_reclaim(pg_data_t *pgdat, int order,
+					gfp_t gfp_mask,
+					unsigned long *total_scanned);
 
 #else /* CONFIG_MEMCG */
 
@@ -1566,9 +1566,9 @@ static inline void split_page_memcg(struct page *head, int old_order, int new_order)
 }
 
 static inline
-unsigned long mem_cgroup_soft_limit_reclaim(pg_data_t *pgdat, int order,
-					    gfp_t gfp_mask,
-					    unsigned long *total_scanned)
+unsigned long memcg1_soft_limit_reclaim(pg_data_t *pgdat, int order,
+					gfp_t gfp_mask,
+					unsigned long *total_scanned)
 {
 	return 0;
 }
diff --git a/mm/memcontrol-v1.c b/mm/memcontrol-v1.c
index 2ccb8406fa84e..68e2f1a718d36 100644
--- a/mm/memcontrol-v1.c
+++ b/mm/memcontrol-v1.c
@@ -100,7 +100,7 @@ static unsigned long soft_limit_excess(struct mem_cgroup *memcg)
 	return excess;
 }
 
-void mem_cgroup_update_tree(struct mem_cgroup *memcg, int nid)
+void memcg1_update_tree(struct mem_cgroup *memcg, int nid)
 {
 	unsigned long excess;
 	struct mem_cgroup_per_node *mz;
@@ -143,7 +143,7 @@ void mem_cgroup_update_tree(struct mem_cgroup *memcg, int nid)
 	}
 }
 
-void mem_cgroup_remove_from_trees(struct mem_cgroup *memcg)
+void memcg1_remove_from_trees(struct mem_cgroup *memcg)
 {
 	struct mem_cgroup_tree_per_node *mctz;
 	struct mem_cgroup_per_node *mz;
@@ -243,7 +243,7 @@ static int mem_cgroup_soft_reclaim(struct mem_cgroup *root_memcg,
 	return total;
 }
 
-unsigned long mem_cgroup_soft_limit_reclaim(pg_data_t *pgdat, int order,
+unsigned long memcg1_soft_limit_reclaim(pg_data_t *pgdat, int order,
 					gfp_t gfp_mask,
 					unsigned long *total_scanned)
 {
diff --git a/mm/memcontrol-v1.h b/mm/memcontrol-v1.h
index 4da6fa561c6d4..e37bc7e8d9558 100644
--- a/mm/memcontrol-v1.h
+++ b/mm/memcontrol-v1.h
@@ -3,8 +3,8 @@
 #ifndef __MM_MEMCONTROL_V1_H
 #define __MM_MEMCONTROL_V1_H
 
-void mem_cgroup_update_tree(struct mem_cgroup *memcg, int nid);
-void mem_cgroup_remove_from_trees(struct mem_cgroup *memcg);
+void memcg1_update_tree(struct mem_cgroup *memcg, int nid);
+void memcg1_remove_from_trees(struct mem_cgroup *memcg);
 
 static inline void memcg1_soft_limit_reset(struct mem_cgroup *memcg)
 {
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index fa1df6e894787..0fbc4e518222e 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1012,7 +1012,7 @@ static void memcg_check_events(struct mem_cgroup *memcg, int nid)
 						MEM_CGROUP_TARGET_SOFTLIMIT);
 		mem_cgroup_threshold(memcg);
 		if (unlikely(do_softlimit))
-			mem_cgroup_update_tree(memcg, nid);
+			memcg1_update_tree(memcg, nid);
 	}
 }
 
@@ -5612,7 +5612,7 @@ static void mem_cgroup_css_free(struct cgroup_subsys_state *css)
 
 	vmpressure_cleanup(&memcg->vmpressure);
 	cancel_work_sync(&memcg->high_work);
-	mem_cgroup_remove_from_trees(memcg);
+	memcg1_remove_from_trees(memcg);
 	free_shrinker_info(memcg);
 	mem_cgroup_free(memcg);
 }
diff --git a/mm/vmscan.c b/mm/vmscan.c
index b9170f767353d..3833b11c96f9b 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -6187,9 +6187,9 @@ static void shrink_zones(struct zonelist *zonelist, struct scan_control *sc)
 			 * and balancing, not for a memcg's limit.
 			 */
 			nr_soft_scanned = 0;
-			nr_soft_reclaimed = mem_cgroup_soft_limit_reclaim(zone->zone_pgdat,
-							sc->order, sc->gfp_mask,
-							&nr_soft_scanned);
+			nr_soft_reclaimed = memcg1_soft_limit_reclaim(zone->zone_pgdat,
+							sc->order, sc->gfp_mask,
+							&nr_soft_scanned);
 			sc->nr_reclaimed += nr_soft_reclaimed;
 			sc->nr_scanned += nr_soft_scanned;
 			/* need some check for avoid more shrink_zone() */
@@ -6952,8 +6952,8 @@ static int balance_pgdat(pg_data_t *pgdat, int order, int highest_zoneidx)
 		/* Call soft limit reclaim before calling shrink_node. */
 		sc.nr_scanned = 0;
 		nr_soft_scanned = 0;
-		nr_soft_reclaimed = mem_cgroup_soft_limit_reclaim(pgdat, sc.order,
-						sc.gfp_mask, &nr_soft_scanned);
+		nr_soft_reclaimed = memcg1_soft_limit_reclaim(pgdat, sc.order,
+						sc.gfp_mask, &nr_soft_scanned);
 		sc.nr_reclaimed += nr_soft_reclaimed;
 
 		/*

From patchwork Tue May 28 20:20:56 2024
From: Roman Gushchin <roman.gushchin@linux.dev>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Muchun Song <muchun.song@linux.dev>, Johannes Weiner <hannes@cmpxchg.org>, Michal Hocko <mhocko@kernel.org>, Shakeel Butt <shakeel.butt@linux.dev>, Matthew Wilcox <willy@infradead.org>, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: [PATCH v1 04/14] mm: memcg: move charge migration code to memcontrol-v1.c
Date: Tue, 28 May 2024 13:20:56 -0700
Message-ID: <20240528202101.3099300-5-roman.gushchin@linux.dev>
In-Reply-To: <20240528202101.3099300-1-roman.gushchin@linux.dev>
References: <20240528202101.3099300-1-roman.gushchin@linux.dev>

Unlike the legacy cgroup v1 memory controller, the cgroup v2 memory controller doesn't support moving charged pages between cgroups. It's a fairly large and complicated piece of code, which has created a number of problems in the past. Let's move this code into memcontrol-v1.c; it shaves off 1k lines from memcontrol.c and is another step towards making the legacy memory controller code optionally compiled.
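For context, the interface being quarantined here is the cgroup v1 memory.move_charge_at_immigrate knob. A typical legacy usage looks roughly like this (illustrative shell session; the mount point of the v1 memory hierarchy varies by system):

	# 0x1 moves anonymous pages, 0x2 moves file pages; 3 moves both
	echo 3 > /sys/fs/cgroup/memory/dst/memory.move_charge_at_immigrate
	# charges then follow the task when it migrates into "dst"
	echo $PID > /sys/fs/cgroup/memory/dst/cgroup.procs

These semantics match the MOVE_ANON/MOVE_FILE bits defined in the diff below.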
Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev>
Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
---
 mm/memcontrol-v1.c |  981 +++++++++++++++++++++++++++++++++++++++++++
 mm/memcontrol-v1.h |   30 ++
 mm/memcontrol.c    | 1004 +-------------------------------------------
 3 files changed, 1019 insertions(+), 996 deletions(-)

diff --git a/mm/memcontrol-v1.c b/mm/memcontrol-v1.c
index 68e2f1a718d36..f4c8bec5ae1b8 100644
--- a/mm/memcontrol-v1.c
+++ b/mm/memcontrol-v1.c
@@ -3,7 +3,12 @@
 #include
 #include
 #include
+#include
+#include
+#include
+
+#include "internal.h"
+#include "swap.h"
 #include "memcontrol-v1.h"
 
 /*
@@ -30,6 +35,31 @@ static struct mem_cgroup_tree soft_limit_tree __read_mostly;
 #define MEM_CGROUP_MAX_RECLAIM_LOOPS 100
 #define MEM_CGROUP_MAX_SOFT_LIMIT_RECLAIM_LOOPS 2
 
+/* Stuffs for move charges at task migration. */
+/*
+ * Types of charges to be moved.
+ */
+#define MOVE_ANON	0x1U
+#define MOVE_FILE	0x2U
+#define MOVE_MASK	(MOVE_ANON | MOVE_FILE)
+
+/* "mc" and its members are protected by cgroup_mutex */
+static struct move_charge_struct {
+	spinlock_t	  lock; /* for from, to */
+	struct mm_struct  *mm;
+	struct mem_cgroup *from;
+	struct mem_cgroup *to;
+	unsigned long flags;
+	unsigned long precharge;
+	unsigned long moved_charge;
+	unsigned long moved_swap;
+	struct task_struct *moving_task; /* a task moving charges */
+	wait_queue_head_t waitq; /* a waitq for other context */
+} mc = {
+	.lock = __SPIN_LOCK_UNLOCKED(mc.lock),
+	.waitq = __WAIT_QUEUE_HEAD_INITIALIZER(mc.waitq),
+};
+
 static void __mem_cgroup_insert_exceeded(struct mem_cgroup_per_node *mz,
 					 struct mem_cgroup_tree_per_node *mctz,
 					 unsigned long new_usage_in_excess)
@@ -325,6 +355,957 @@ unsigned long memcg1_soft_limit_reclaim(pg_data_t *pgdat, int order,
 	return nr_reclaimed;
 }
 
+/*
+ * A routine for checking "mem" is under move_account() or not.
+ *
+ * Checking a cgroup is mc.from or mc.to or under hierarchy of
+ * moving cgroups. This is for waiting at high-memory pressure
+ * caused by "move".
+ */
+static bool mem_cgroup_under_move(struct mem_cgroup *memcg)
+{
+	struct mem_cgroup *from;
+	struct mem_cgroup *to;
+	bool ret = false;
+	/*
+	 * Unlike task_move routines, we access mc.to, mc.from not under
+	 * mutual exclusion by cgroup_mutex. Here, we take spinlock instead.
+	 */
+	spin_lock(&mc.lock);
+	from = mc.from;
+	to = mc.to;
+	if (!from)
+		goto unlock;
+
+	ret = mem_cgroup_is_descendant(from, memcg) ||
+		mem_cgroup_is_descendant(to, memcg);
+unlock:
+	spin_unlock(&mc.lock);
+	return ret;
+}
+
+bool mem_cgroup_wait_acct_move(struct mem_cgroup *memcg)
+{
+	if (mc.moving_task && current != mc.moving_task) {
+		if (mem_cgroup_under_move(memcg)) {
+			DEFINE_WAIT(wait);
+			prepare_to_wait(&mc.waitq, &wait, TASK_INTERRUPTIBLE);
+			/* moving charge context might have finished. */
+			if (mc.moving_task)
+				schedule();
+			finish_wait(&mc.waitq, &wait);
+			return true;
+		}
+	}
+	return false;
+}
+
+/**
+ * folio_memcg_lock - Bind a folio to its memcg.
+ * @folio: The folio.
+ *
+ * This function prevents unlocked LRU folios from being moved to
+ * another cgroup.
+ *
+ * It ensures lifetime of the bound memcg. The caller is responsible
+ * for the lifetime of the folio.
+ */ +void folio_memcg_lock(struct folio *folio) +{ + struct mem_cgroup *memcg; + unsigned long flags; + + /* + * The RCU lock is held throughout the transaction. The fast + * path can get away without acquiring the memcg->move_lock + * because page moving starts with an RCU grace period. + */ + rcu_read_lock(); + + if (mem_cgroup_disabled()) + return; +again: + memcg = folio_memcg(folio); + if (unlikely(!memcg)) + return; + +#ifdef CONFIG_PROVE_LOCKING + local_irq_save(flags); + might_lock(&memcg->move_lock); + local_irq_restore(flags); +#endif + + if (atomic_read(&memcg->moving_account) <= 0) + return; + + spin_lock_irqsave(&memcg->move_lock, flags); + if (memcg != folio_memcg(folio)) { + spin_unlock_irqrestore(&memcg->move_lock, flags); + goto again; + } + + /* + * When charge migration first begins, we can have multiple + * critical sections holding the fast-path RCU lock and one + * holding the slowpath move_lock. Track the task who has the + * move_lock for folio_memcg_unlock(). + */ + memcg->move_lock_task = current; + memcg->move_lock_flags = flags; +} + +static void __folio_memcg_unlock(struct mem_cgroup *memcg) +{ + if (memcg && memcg->move_lock_task == current) { + unsigned long flags = memcg->move_lock_flags; + + memcg->move_lock_task = NULL; + memcg->move_lock_flags = 0; + + spin_unlock_irqrestore(&memcg->move_lock, flags); + } + + rcu_read_unlock(); +} + +/** + * folio_memcg_unlock - Release the binding between a folio and its memcg. + * @folio: The folio. + * + * This releases the binding created by folio_memcg_lock(). This does + * not change the accounting of this folio to its memcg, but it does + * permit others to change it. + */ +void folio_memcg_unlock(struct folio *folio) +{ + __folio_memcg_unlock(folio_memcg(folio)); +} + +#ifdef CONFIG_SWAP +/** + * mem_cgroup_move_swap_account - move swap charge and swap_cgroup's record. + * @entry: swap entry to be moved + * @from: mem_cgroup which the entry is moved from + * @to: mem_cgroup which the entry is moved to + * + * It succeeds only when the swap_cgroup's record for this entry is the same + * as the mem_cgroup's id of @from. + * + * Returns 0 on success, -EINVAL on failure. + * + * The caller must have charged to @to, IOW, called page_counter_charge() about + * both res and memsw, and called css_get(). + */ +static int mem_cgroup_move_swap_account(swp_entry_t entry, + struct mem_cgroup *from, struct mem_cgroup *to) +{ + unsigned short old_id, new_id; + + old_id = mem_cgroup_id(from); + new_id = mem_cgroup_id(to); + + if (swap_cgroup_cmpxchg(entry, old_id, new_id) == old_id) { + mod_memcg_state(from, MEMCG_SWAP, -1); + mod_memcg_state(to, MEMCG_SWAP, 1); + return 0; + } + return -EINVAL; +} +#else +static inline int mem_cgroup_move_swap_account(swp_entry_t entry, + struct mem_cgroup *from, struct mem_cgroup *to) +{ + return -EINVAL; +} +#endif + +u64 mem_cgroup_move_charge_read(struct cgroup_subsys_state *css, + struct cftype *cft) +{ + return mem_cgroup_from_css(css)->move_charge_at_immigrate; +} + +#ifdef CONFIG_MMU +int mem_cgroup_move_charge_write(struct cgroup_subsys_state *css, + struct cftype *cft, u64 val) +{ + struct mem_cgroup *memcg = mem_cgroup_from_css(css); + + pr_warn_once("Cgroup memory moving (move_charge_at_immigrate) is deprecated. 
" + "Please report your usecase to linux-mm@kvack.org if you " + "depend on this functionality.\n"); + + if (val & ~MOVE_MASK) + return -EINVAL; + + /* + * No kind of locking is needed in here, because ->can_attach() will + * check this value once in the beginning of the process, and then carry + * on with stale data. This means that changes to this value will only + * affect task migrations starting after the change. + */ + memcg->move_charge_at_immigrate = val; + return 0; +} +#else +int mem_cgroup_move_charge_write(struct cgroup_subsys_state *css, + struct cftype *cft, u64 val) +{ + return -ENOSYS; +} +#endif + +#ifdef CONFIG_MMU +/* Handlers for move charge at task migration. */ +static int mem_cgroup_do_precharge(unsigned long count) +{ + int ret; + + /* Try a single bulk charge without reclaim first, kswapd may wake */ + ret = try_charge(mc.to, GFP_KERNEL & ~__GFP_DIRECT_RECLAIM, count); + if (!ret) { + mc.precharge += count; + return ret; + } + + /* Try charges one by one with reclaim, but do not retry */ + while (count--) { + ret = try_charge(mc.to, GFP_KERNEL | __GFP_NORETRY, 1); + if (ret) + return ret; + mc.precharge++; + cond_resched(); + } + return 0; +} + +union mc_target { + struct folio *folio; + swp_entry_t ent; +}; + +enum mc_target_type { + MC_TARGET_NONE = 0, + MC_TARGET_PAGE, + MC_TARGET_SWAP, + MC_TARGET_DEVICE, +}; + +static struct page *mc_handle_present_pte(struct vm_area_struct *vma, + unsigned long addr, pte_t ptent) +{ + struct page *page = vm_normal_page(vma, addr, ptent); + + if (!page) + return NULL; + if (PageAnon(page)) { + if (!(mc.flags & MOVE_ANON)) + return NULL; + } else { + if (!(mc.flags & MOVE_FILE)) + return NULL; + } + get_page(page); + + return page; +} + +#if defined(CONFIG_SWAP) || defined(CONFIG_DEVICE_PRIVATE) +static struct page *mc_handle_swap_pte(struct vm_area_struct *vma, + pte_t ptent, swp_entry_t *entry) +{ + struct page *page = NULL; + swp_entry_t ent = pte_to_swp_entry(ptent); + + if (!(mc.flags & MOVE_ANON)) + return NULL; + + /* + * Handle device private pages that are not accessible by the CPU, but + * stored as special swap entries in the page table. + */ + if (is_device_private_entry(ent)) { + page = pfn_swap_entry_to_page(ent); + if (!get_page_unless_zero(page)) + return NULL; + return page; + } + + if (non_swap_entry(ent)) + return NULL; + + /* + * Because swap_cache_get_folio() updates some statistics counter, + * we call find_get_page() with swapper_space directly. + */ + page = find_get_page(swap_address_space(ent), swap_cache_index(ent)); + entry->val = ent.val; + + return page; +} +#else +static struct page *mc_handle_swap_pte(struct vm_area_struct *vma, + pte_t ptent, swp_entry_t *entry) +{ + return NULL; +} +#endif + +static struct page *mc_handle_file_pte(struct vm_area_struct *vma, + unsigned long addr, pte_t ptent) +{ + unsigned long index; + struct folio *folio; + + if (!vma->vm_file) /* anonymous vma */ + return NULL; + if (!(mc.flags & MOVE_FILE)) + return NULL; + + /* folio is moved even if it's not RSS of this task(page-faulted). */ + /* shmem/tmpfs may report page out on swap: account for that too. */ + index = linear_page_index(vma, addr); + folio = filemap_get_incore_folio(vma->vm_file->f_mapping, index); + if (IS_ERR(folio)) + return NULL; + return folio_file_page(folio, index); +} + +/** + * mem_cgroup_move_account - move account of the folio + * @folio: The folio. + * @compound: charge the page as compound or small page + * @from: mem_cgroup which the folio is moved from. 
+ * @to: mem_cgroup which the folio is moved to. @from != @to. + * + * The folio must be locked and not on the LRU. + * + * This function doesn't do "charge" to new cgroup and doesn't do "uncharge" + * from old cgroup. + */ +static int mem_cgroup_move_account(struct folio *folio, + bool compound, + struct mem_cgroup *from, + struct mem_cgroup *to) +{ + struct lruvec *from_vec, *to_vec; + struct pglist_data *pgdat; + unsigned int nr_pages = compound ? folio_nr_pages(folio) : 1; + int nid, ret; + + VM_BUG_ON(from == to); + VM_BUG_ON_FOLIO(!folio_test_locked(folio), folio); + VM_BUG_ON_FOLIO(folio_test_lru(folio), folio); + VM_BUG_ON(compound && !folio_test_large(folio)); + + ret = -EINVAL; + if (folio_memcg(folio) != from) + goto out; + + pgdat = folio_pgdat(folio); + from_vec = mem_cgroup_lruvec(from, pgdat); + to_vec = mem_cgroup_lruvec(to, pgdat); + + folio_memcg_lock(folio); + + if (folio_test_anon(folio)) { + if (folio_mapped(folio)) { + __mod_lruvec_state(from_vec, NR_ANON_MAPPED, -nr_pages); + __mod_lruvec_state(to_vec, NR_ANON_MAPPED, nr_pages); + if (folio_test_pmd_mappable(folio)) { + __mod_lruvec_state(from_vec, NR_ANON_THPS, + -nr_pages); + __mod_lruvec_state(to_vec, NR_ANON_THPS, + nr_pages); + } + } + } else { + __mod_lruvec_state(from_vec, NR_FILE_PAGES, -nr_pages); + __mod_lruvec_state(to_vec, NR_FILE_PAGES, nr_pages); + + if (folio_test_swapbacked(folio)) { + __mod_lruvec_state(from_vec, NR_SHMEM, -nr_pages); + __mod_lruvec_state(to_vec, NR_SHMEM, nr_pages); + } + + if (folio_mapped(folio)) { + __mod_lruvec_state(from_vec, NR_FILE_MAPPED, -nr_pages); + __mod_lruvec_state(to_vec, NR_FILE_MAPPED, nr_pages); + } + + if (folio_test_dirty(folio)) { + struct address_space *mapping = folio_mapping(folio); + + if (mapping_can_writeback(mapping)) { + __mod_lruvec_state(from_vec, NR_FILE_DIRTY, + -nr_pages); + __mod_lruvec_state(to_vec, NR_FILE_DIRTY, + nr_pages); + } + } + } + +#ifdef CONFIG_SWAP + if (folio_test_swapcache(folio)) { + __mod_lruvec_state(from_vec, NR_SWAPCACHE, -nr_pages); + __mod_lruvec_state(to_vec, NR_SWAPCACHE, nr_pages); + } +#endif + if (folio_test_writeback(folio)) { + __mod_lruvec_state(from_vec, NR_WRITEBACK, -nr_pages); + __mod_lruvec_state(to_vec, NR_WRITEBACK, nr_pages); + } + + /* + * All state has been migrated, let's switch to the new memcg. + * + * It is safe to change page's memcg here because the page + * is referenced, charged, isolated, and locked: we can't race + * with (un)charging, migration, LRU putback, or anything else + * that would rely on a stable page's memory cgroup. + * + * Note that folio_memcg_lock is a memcg lock, not a page lock, + * to save space. As soon as we switch page's memory cgroup to a + * new memcg that isn't locked, the above state can change + * concurrently again. Make sure we're truly done with it. 
+ */ + smp_mb(); + + css_get(&to->css); + css_put(&from->css); + + folio->memcg_data = (unsigned long)to; + + __folio_memcg_unlock(from); + + ret = 0; + nid = folio_nid(folio); + + local_irq_disable(); + mem_cgroup_charge_statistics(to, nr_pages); + memcg_check_events(to, nid); + mem_cgroup_charge_statistics(from, -nr_pages); + memcg_check_events(from, nid); + local_irq_enable(); +out: + return ret; +} + +/** + * get_mctgt_type - get target type of moving charge + * @vma: the vma to which the pte to be checked belongs + * @addr: the address corresponding to the pte to be checked + * @ptent: the pte to be checked + * @target: the pointer where the target page or swap entry will be stored (can be NULL) + * + * Context: Called with pte lock held. + * Return: + * * MC_TARGET_NONE - If the pte is not a target for move charge. + * * MC_TARGET_PAGE - If the page corresponding to this pte is a target for + * move charge. If @target is not NULL, the folio is stored in target->folio + * with extra refcnt taken (the caller should release it). + * * MC_TARGET_SWAP - If the swap entry corresponding to this pte is a + * target for charge migration. If @target is not NULL, the entry is + * stored in target->ent. + * * MC_TARGET_DEVICE - Like MC_TARGET_PAGE but page is device memory and + * thus not on the lru. For now such page is charged like a regular page + * would be as it is just special memory taking the place of a regular page. + * See Documentation/mm/hmm.rst and include/linux/hmm.h + */ +static enum mc_target_type get_mctgt_type(struct vm_area_struct *vma, + unsigned long addr, pte_t ptent, union mc_target *target) +{ + struct page *page = NULL; + struct folio *folio; + enum mc_target_type ret = MC_TARGET_NONE; + swp_entry_t ent = { .val = 0 }; + + if (pte_present(ptent)) + page = mc_handle_present_pte(vma, addr, ptent); + else if (pte_none_mostly(ptent)) + /* + * PTE markers should be treated as a none pte here, separated + * from other swap handling below. + */ + page = mc_handle_file_pte(vma, addr, ptent); + else if (is_swap_pte(ptent)) + page = mc_handle_swap_pte(vma, ptent, &ent); + + if (page) + folio = page_folio(page); + if (target && page) { + if (!folio_trylock(folio)) { + folio_put(folio); + return ret; + } + /* + * page_mapped() must be stable during the move. This + * pte is locked, so if it's present, the page cannot + * become unmapped. If it isn't, we have only partial + * control over the mapped state: the page lock will + * prevent new faults against pagecache and swapcache, + * so an unmapped page cannot become mapped. However, + * if the page is already mapped elsewhere, it can + * unmap, and there is nothing we can do about it. + * Alas, skip moving the page in this case. + */ + if (!pte_present(ptent) && page_mapped(page)) { + folio_unlock(folio); + folio_put(folio); + return ret; + } + } + + if (!page && !ent.val) + return ret; + if (page) { + /* + * Do only a loose check w/o serialization. + * mem_cgroup_move_account() checks whether the page is valid + * or not under LRU exclusion. + */ + if (folio_memcg(folio) == mc.from) { + ret = MC_TARGET_PAGE; + if (folio_is_device_private(folio) || + folio_is_device_coherent(folio)) + ret = MC_TARGET_DEVICE; + if (target) + target->folio = folio; + } + if (!ret || !target) { + if (target) + folio_unlock(folio); + folio_put(folio); + } + } + /* + * There is a swap entry and a page doesn't exist or isn't charged. + * But we cannot move a tail-page in a THP.
+ */ + if (ent.val && !ret && (!page || !PageTransCompound(page)) && + mem_cgroup_id(mc.from) == lookup_swap_cgroup_id(ent)) { + ret = MC_TARGET_SWAP; + if (target) + target->ent = ent; + } + return ret; +} + +#ifdef CONFIG_TRANSPARENT_HUGEPAGE +/* + * We don't consider PMD mapped swapping or file mapped pages because THP does + * not support them for now. + * The caller should make sure that pmd_trans_huge(pmd) is true. + */ +static enum mc_target_type get_mctgt_type_thp(struct vm_area_struct *vma, + unsigned long addr, pmd_t pmd, union mc_target *target) +{ + struct page *page = NULL; + struct folio *folio; + enum mc_target_type ret = MC_TARGET_NONE; + + if (unlikely(is_swap_pmd(pmd))) { + VM_BUG_ON(thp_migration_supported() && + !is_pmd_migration_entry(pmd)); + return ret; + } + page = pmd_page(pmd); + VM_BUG_ON_PAGE(!page || !PageHead(page), page); + folio = page_folio(page); + if (!(mc.flags & MOVE_ANON)) + return ret; + if (folio_memcg(folio) == mc.from) { + ret = MC_TARGET_PAGE; + if (target) { + folio_get(folio); + if (!folio_trylock(folio)) { + folio_put(folio); + return MC_TARGET_NONE; + } + target->folio = folio; + } + } + return ret; +} +#else +static inline enum mc_target_type get_mctgt_type_thp(struct vm_area_struct *vma, + unsigned long addr, pmd_t pmd, union mc_target *target) +{ + return MC_TARGET_NONE; +} +#endif + +static int mem_cgroup_count_precharge_pte_range(pmd_t *pmd, + unsigned long addr, unsigned long end, + struct mm_walk *walk) +{ + struct vm_area_struct *vma = walk->vma; + pte_t *pte; + spinlock_t *ptl; + + ptl = pmd_trans_huge_lock(pmd, vma); + if (ptl) { + /* + * Note there cannot be MC_TARGET_DEVICE for now as we do not + * support transparent huge pages with MEMORY_DEVICE_PRIVATE, + * but this might change. + */ + if (get_mctgt_type_thp(vma, addr, *pmd, NULL) == MC_TARGET_PAGE) + mc.precharge += HPAGE_PMD_NR; + spin_unlock(ptl); + return 0; + } + + pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl); + if (!pte) + return 0; + for (; addr != end; pte++, addr += PAGE_SIZE) + if (get_mctgt_type(vma, addr, ptep_get(pte), NULL)) + mc.precharge++; /* increment precharge temporarily */ + pte_unmap_unlock(pte - 1, ptl); + cond_resched(); + + return 0; +} + +static const struct mm_walk_ops precharge_walk_ops = { + .pmd_entry = mem_cgroup_count_precharge_pte_range, + .walk_lock = PGWALK_RDLOCK, +}; + +static unsigned long mem_cgroup_count_precharge(struct mm_struct *mm) +{ + unsigned long precharge; + + mmap_read_lock(mm); + walk_page_range(mm, 0, ULONG_MAX, &precharge_walk_ops, NULL); + mmap_read_unlock(mm); + + precharge = mc.precharge; + mc.precharge = 0; + + return precharge; +} + +static int mem_cgroup_precharge_mc(struct mm_struct *mm) +{ + unsigned long precharge = mem_cgroup_count_precharge(mm); + + VM_BUG_ON(mc.moving_task); + mc.moving_task = current; + return mem_cgroup_do_precharge(precharge); +} + +/* cancels all extra charges on mc.from and mc.to, and wakes up all waiters. */ +static void __mem_cgroup_clear_mc(void) +{ + struct mem_cgroup *from = mc.from; + struct mem_cgroup *to = mc.to; + + /* we must uncharge all the leftover precharges from mc.to */ + if (mc.precharge) { + mem_cgroup_cancel_charge(mc.to, mc.precharge); + mc.precharge = 0; + } + /* + * we didn't uncharge from mc.from at mem_cgroup_move_account(), so + * we must uncharge here.
+ */ + if (mc.moved_charge) { + mem_cgroup_cancel_charge(mc.from, mc.moved_charge); + mc.moved_charge = 0; + } + /* we must fixup refcnts and charges */ + if (mc.moved_swap) { + /* uncharge swap account from the old cgroup */ + if (!mem_cgroup_is_root(mc.from)) + page_counter_uncharge(&mc.from->memsw, mc.moved_swap); + + mem_cgroup_id_put_many(mc.from, mc.moved_swap); + + /* + * we charged both to->memory and to->memsw, so we + * should uncharge to->memory. + */ + if (!mem_cgroup_is_root(mc.to)) + page_counter_uncharge(&mc.to->memory, mc.moved_swap); + + mc.moved_swap = 0; + } + memcg_oom_recover(from); + memcg_oom_recover(to); + wake_up_all(&mc.waitq); +} + +static void mem_cgroup_clear_mc(void) +{ + struct mm_struct *mm = mc.mm; + + /* + * we must clear moving_task before waking up waiters at the end of + * task migration. + */ + mc.moving_task = NULL; + __mem_cgroup_clear_mc(); + spin_lock(&mc.lock); + mc.from = NULL; + mc.to = NULL; + mc.mm = NULL; + spin_unlock(&mc.lock); + + mmput(mm); +} + +int mem_cgroup_can_attach(struct cgroup_taskset *tset) +{ + struct cgroup_subsys_state *css; + struct mem_cgroup *memcg = NULL; /* unneeded init to make gcc happy */ + struct mem_cgroup *from; + struct task_struct *leader, *p; + struct mm_struct *mm; + unsigned long move_flags; + int ret = 0; + + /* charge immigration isn't supported on the default hierarchy */ + if (cgroup_subsys_on_dfl(memory_cgrp_subsys)) + return 0; + + /* + * Multi-process migrations only happen on the default hierarchy + * where charge immigration is not used. Perform charge + * immigration if @tset contains a leader and whine if there are + * multiple. + */ + p = NULL; + cgroup_taskset_for_each_leader(leader, css, tset) { + WARN_ON_ONCE(p); + p = leader; + memcg = mem_cgroup_from_css(css); + } + if (!p) + return 0; + + /* + * We are now committed to this value whatever it is. Changes in this + * tunable will only affect upcoming migrations, not the current one. + * So we need to save it, and keep it going. 
+ */ + move_flags = READ_ONCE(memcg->move_charge_at_immigrate); + if (!move_flags) + return 0; + + from = mem_cgroup_from_task(p); + + VM_BUG_ON(from == memcg); + + mm = get_task_mm(p); + if (!mm) + return 0; + /* We move charges only when we move an owner of the mm */ + if (mm->owner == p) { + VM_BUG_ON(mc.from); + VM_BUG_ON(mc.to); + VM_BUG_ON(mc.precharge); + VM_BUG_ON(mc.moved_charge); + VM_BUG_ON(mc.moved_swap); + + spin_lock(&mc.lock); + mc.mm = mm; + mc.from = from; + mc.to = memcg; + mc.flags = move_flags; + spin_unlock(&mc.lock); + /* We set mc.moving_task later */ + + ret = mem_cgroup_precharge_mc(mm); + if (ret) + mem_cgroup_clear_mc(); + } else { + mmput(mm); + } + return ret; +} + +void mem_cgroup_cancel_attach(struct cgroup_taskset *tset) +{ + if (mc.to) + mem_cgroup_clear_mc(); +} + +static int mem_cgroup_move_charge_pte_range(pmd_t *pmd, + unsigned long addr, unsigned long end, + struct mm_walk *walk) +{ + int ret = 0; + struct vm_area_struct *vma = walk->vma; + pte_t *pte; + spinlock_t *ptl; + enum mc_target_type target_type; + union mc_target target; + struct folio *folio; + + ptl = pmd_trans_huge_lock(pmd, vma); + if (ptl) { + if (mc.precharge < HPAGE_PMD_NR) { + spin_unlock(ptl); + return 0; + } + target_type = get_mctgt_type_thp(vma, addr, *pmd, &target); + if (target_type == MC_TARGET_PAGE) { + folio = target.folio; + if (folio_isolate_lru(folio)) { + if (!mem_cgroup_move_account(folio, true, + mc.from, mc.to)) { + mc.precharge -= HPAGE_PMD_NR; + mc.moved_charge += HPAGE_PMD_NR; + } + folio_putback_lru(folio); + } + folio_unlock(folio); + folio_put(folio); + } else if (target_type == MC_TARGET_DEVICE) { + folio = target.folio; + if (!mem_cgroup_move_account(folio, true, + mc.from, mc.to)) { + mc.precharge -= HPAGE_PMD_NR; + mc.moved_charge += HPAGE_PMD_NR; + } + folio_unlock(folio); + folio_put(folio); + } + spin_unlock(ptl); + return 0; + } + +retry: + pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl); + if (!pte) + return 0; + for (; addr != end; addr += PAGE_SIZE) { + pte_t ptent = ptep_get(pte++); + bool device = false; + swp_entry_t ent; + + if (!mc.precharge) + break; + + switch (get_mctgt_type(vma, addr, ptent, &target)) { + case MC_TARGET_DEVICE: + device = true; + fallthrough; + case MC_TARGET_PAGE: + folio = target.folio; + /* + * We can have a part of the split pmd here. Moving it + * can be done but it would be too convoluted so simply + * ignore such a partial THP and keep it in the original + * memcg. There should be somebody mapping the head. + */ + if (folio_test_large(folio)) + goto put; + if (!device && !folio_isolate_lru(folio)) + goto put; + if (!mem_cgroup_move_account(folio, false, + mc.from, mc.to)) { + mc.precharge--; + /* we uncharge from mc.from later. */ + mc.moved_charge++; + } + if (!device) + folio_putback_lru(folio); +put: /* get_mctgt_type() gets & locks the page */ + folio_unlock(folio); + folio_put(folio); + break; + case MC_TARGET_SWAP: + ent = target.ent; + if (!mem_cgroup_move_swap_account(ent, mc.from, mc.to)) { + mc.precharge--; + mem_cgroup_id_get_many(mc.to, 1); + /* we fixup other refcnts and charges later. */ + mc.moved_swap++; + } + break; + default: + break; + } + } + pte_unmap_unlock(pte - 1, ptl); + cond_resched(); + + if (addr != end) { + /* + * We have consumed all precharges we got in can_attach(). + * We try charging one by one, but don't do any additional + * charges to mc.to if we have failed in charge once in attach() + * phase.
+ */ + ret = mem_cgroup_do_precharge(1); + if (!ret) + goto retry; + } + + return ret; +} + +static const struct mm_walk_ops charge_walk_ops = { + .pmd_entry = mem_cgroup_move_charge_pte_range, + .walk_lock = PGWALK_RDLOCK, +}; + +static void mem_cgroup_move_charge(void) +{ + lru_add_drain_all(); + /* + * Signal folio_memcg_lock() to take the memcg's move_lock + * while we're moving its pages to another memcg. Then wait + * for already started RCU-only updates to finish. + */ + atomic_inc(&mc.from->moving_account); + synchronize_rcu(); +retry: + if (unlikely(!mmap_read_trylock(mc.mm))) { + /* + * Someone holding the mmap_lock might be waiting on the + * waitq. So we cancel all extra charges, wake up all waiters, + * and retry. Because we cancel precharges, we might not be able + * to move enough charges, but moving charge is a best-effort + * feature anyway, so it wouldn't be a big problem. + */ + __mem_cgroup_clear_mc(); + cond_resched(); + goto retry; + } + /* + * When we have consumed all precharges and failed to do an + * additional charge, the page walk just aborts. + */ + walk_page_range(mc.mm, 0, ULONG_MAX, &charge_walk_ops, NULL); + mmap_read_unlock(mc.mm); + atomic_dec(&mc.from->moving_account); +} + +void mem_cgroup_move_task(void) +{ + if (mc.to) { + mem_cgroup_move_charge(); + mem_cgroup_clear_mc(); + } +} + +#else /* !CONFIG_MMU */ +static int mem_cgroup_can_attach(struct cgroup_taskset *tset) +{ + return 0; +} +static void mem_cgroup_cancel_attach(struct cgroup_taskset *tset) +{ +} +static void mem_cgroup_move_task(void) +{ +} +#endif + static int __init memcg1_init(void) { int node; diff --git a/mm/memcontrol-v1.h b/mm/memcontrol-v1.h index e37bc7e8d9558..55e7c4f90c399 100644 --- a/mm/memcontrol-v1.h +++ b/mm/memcontrol-v1.h @@ -11,4 +11,34 @@ static inline void memcg1_soft_limit_reset(struct mem_cgroup *memcg) WRITE_ONCE(memcg->soft_limit, PAGE_COUNTER_MAX); } +void mem_cgroup_charge_statistics(struct mem_cgroup *memcg, int nr_pages); +void memcg_check_events(struct mem_cgroup *memcg, int nid); +void memcg_oom_recover(struct mem_cgroup *memcg); +int try_charge_memcg(struct mem_cgroup *memcg, gfp_t gfp_mask, + unsigned int nr_pages); + +static inline int try_charge(struct mem_cgroup *memcg, gfp_t gfp_mask, + unsigned int nr_pages) +{ + if (mem_cgroup_is_root(memcg)) + return 0; + + return try_charge_memcg(memcg, gfp_mask, nr_pages); +} + +void mem_cgroup_id_get_many(struct mem_cgroup *memcg, unsigned int n); +void mem_cgroup_id_put_many(struct mem_cgroup *memcg, unsigned int n); + +bool mem_cgroup_wait_acct_move(struct mem_cgroup *memcg); +struct cgroup_taskset; +int mem_cgroup_can_attach(struct cgroup_taskset *tset); +void mem_cgroup_cancel_attach(struct cgroup_taskset *tset); +void mem_cgroup_move_task(void); + +struct cftype; +u64 mem_cgroup_move_charge_read(struct cgroup_subsys_state *css, + struct cftype *cft); +int mem_cgroup_move_charge_write(struct cgroup_subsys_state *css, + struct cftype *cft, u64 val); + #endif /* __MM_MEMCONTROL_V1_H */ diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 0fbc4e518222e..ba538ab0a80ad 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -28,7 +28,6 @@ #include #include #include -#include #include #include #include @@ -45,7 +44,6 @@ #include #include #include -#include #include #include #include @@ -71,7 +69,6 @@ #include #include #include "slab.h" -#include "swap.h" #include "memcontrol-v1.h" #include @@ -158,31 +155,6 @@ struct mem_cgroup_event { static void mem_cgroup_threshold(struct mem_cgroup *memcg); static void
mem_cgroup_oom_notify(struct mem_cgroup *memcg); -/* Stuffs for move charges at task migration. */ -/* - * Types of charges to be moved. - */ -#define MOVE_ANON 0x1U -#define MOVE_FILE 0x2U -#define MOVE_MASK (MOVE_ANON | MOVE_FILE) - -/* "mc" and its members are protected by cgroup_mutex */ -static struct move_charge_struct { - spinlock_t lock; /* for from, to */ - struct mm_struct *mm; - struct mem_cgroup *from; - struct mem_cgroup *to; - unsigned long flags; - unsigned long precharge; - unsigned long moved_charge; - unsigned long moved_swap; - struct task_struct *moving_task; /* a task moving charges */ - wait_queue_head_t waitq; /* a waitq for other context */ -} mc = { - .lock = __SPIN_LOCK_UNLOCKED(mc.lock), - .waitq = __WAIT_QUEUE_HEAD_INITIALIZER(mc.waitq), -}; - /* for encoding cft->private value on file */ enum res_type { _MEM, @@ -955,8 +927,7 @@ static unsigned long memcg_events_local(struct mem_cgroup *memcg, int event) return READ_ONCE(memcg->vmstats->events_local[i]); } -static void mem_cgroup_charge_statistics(struct mem_cgroup *memcg, - int nr_pages) +void mem_cgroup_charge_statistics(struct mem_cgroup *memcg, int nr_pages) { /* pagein of a big page is an event. So, ignore page size */ if (nr_pages > 0) @@ -998,7 +969,7 @@ static bool mem_cgroup_event_ratelimit(struct mem_cgroup *memcg, * Check events in order. * */ -static void memcg_check_events(struct mem_cgroup *memcg, int nid) +void memcg_check_events(struct mem_cgroup *memcg, int nid) { if (IS_ENABLED(CONFIG_PREEMPT_RT)) return; @@ -1467,51 +1438,6 @@ static unsigned long mem_cgroup_margin(struct mem_cgroup *memcg) return margin; } -/* - * A routine for checking "mem" is under move_account() or not. - * - * Checking a cgroup is mc.from or mc.to or under hierarchy of - * moving cgroups. This is for waiting at high-memory pressure - * caused by "move". - */ -static bool mem_cgroup_under_move(struct mem_cgroup *memcg) -{ - struct mem_cgroup *from; - struct mem_cgroup *to; - bool ret = false; - /* - * Unlike task_move routines, we access mc.to, mc.from not under - * mutual exclusion by cgroup_mutex. Here, we take spinlock instead. - */ - spin_lock(&mc.lock); - from = mc.from; - to = mc.to; - if (!from) - goto unlock; - - ret = mem_cgroup_is_descendant(from, memcg) || - mem_cgroup_is_descendant(to, memcg); -unlock: - spin_unlock(&mc.lock); - return ret; -} - -static bool mem_cgroup_wait_acct_move(struct mem_cgroup *memcg) -{ - if (mc.moving_task && current != mc.moving_task) { - if (mem_cgroup_under_move(memcg)) { - DEFINE_WAIT(wait); - prepare_to_wait(&mc.waitq, &wait, TASK_INTERRUPTIBLE); - /* moving charge context might have finished. */ - if (mc.moving_task) - schedule(); - finish_wait(&mc.waitq, &wait); - return true; - } - } - return false; -} - struct memory_stat { const char *name; unsigned int idx; @@ -1904,7 +1830,7 @@ static int memcg_oom_wake_function(wait_queue_entry_t *wait, return autoremove_wake_function(wait, mode, sync, arg); } -static void memcg_oom_recover(struct mem_cgroup *memcg) +void memcg_oom_recover(struct mem_cgroup *memcg) { /* * For the following lockless ->under_oom test, the only required @@ -2093,87 +2019,6 @@ void mem_cgroup_print_oom_group(struct mem_cgroup *memcg) pr_cont(" are going to be killed due to memory.oom.group set\n"); } -/** - * folio_memcg_lock - Bind a folio to its memcg. - * @folio: The folio. - * - * This function prevents unlocked LRU folios from being moved to - * another cgroup. - * - * It ensures lifetime of the bound memcg. 
The caller is responsible - * for the lifetime of the folio. - */ -void folio_memcg_lock(struct folio *folio) -{ - struct mem_cgroup *memcg; - unsigned long flags; - - /* - * The RCU lock is held throughout the transaction. The fast - * path can get away without acquiring the memcg->move_lock - * because page moving starts with an RCU grace period. - */ - rcu_read_lock(); - - if (mem_cgroup_disabled()) - return; -again: - memcg = folio_memcg(folio); - if (unlikely(!memcg)) - return; - -#ifdef CONFIG_PROVE_LOCKING - local_irq_save(flags); - might_lock(&memcg->move_lock); - local_irq_restore(flags); -#endif - - if (atomic_read(&memcg->moving_account) <= 0) - return; - - spin_lock_irqsave(&memcg->move_lock, flags); - if (memcg != folio_memcg(folio)) { - spin_unlock_irqrestore(&memcg->move_lock, flags); - goto again; - } - - /* - * When charge migration first begins, we can have multiple - * critical sections holding the fast-path RCU lock and one - * holding the slowpath move_lock. Track the task who has the - * move_lock for folio_memcg_unlock(). - */ - memcg->move_lock_task = current; - memcg->move_lock_flags = flags; -} - -static void __folio_memcg_unlock(struct mem_cgroup *memcg) -{ - if (memcg && memcg->move_lock_task == current) { - unsigned long flags = memcg->move_lock_flags; - - memcg->move_lock_task = NULL; - memcg->move_lock_flags = 0; - - spin_unlock_irqrestore(&memcg->move_lock, flags); - } - - rcu_read_unlock(); -} - -/** - * folio_memcg_unlock - Release the binding between a folio and its memcg. - * @folio: The folio. - * - * This releases the binding created by folio_memcg_lock(). This does - * not change the accounting of this folio to its memcg, but it does - * permit others to change it. - */ -void folio_memcg_unlock(struct folio *folio) -{ - __folio_memcg_unlock(folio_memcg(folio)); -} - struct memcg_stock_pcp { local_lock_t stock_lock; struct mem_cgroup *cached; /* this never be root cgroup */ @@ -2653,8 +2498,8 @@ void mem_cgroup_handle_over_high(gfp_t gfp_mask) css_put(&memcg->css); } -static int try_charge_memcg(struct mem_cgroup *memcg, gfp_t gfp_mask, - unsigned int nr_pages) +int try_charge_memcg(struct mem_cgroup *memcg, gfp_t gfp_mask, + unsigned int nr_pages) { unsigned int batch = max(MEMCG_CHARGE_BATCH, nr_pages); int nr_retries = MAX_RECLAIM_RETRIES; @@ -2849,15 +2694,6 @@ static int try_charge_memcg(struct mem_cgroup *memcg, gfp_t gfp_mask, return 0; } -static inline int try_charge(struct mem_cgroup *memcg, gfp_t gfp_mask, - unsigned int nr_pages) -{ - if (mem_cgroup_is_root(memcg)) - return 0; - - return try_charge_memcg(memcg, gfp_mask, nr_pages); -} - /** * mem_cgroup_cancel_charge() - cancel an uncommitted try_charge() call. * @memcg: memcg previously charged. @@ -3597,43 +3433,6 @@ void split_page_memcg(struct page *head, int old_order, int new_order) css_get_many(&memcg->css, old_nr / new_nr - 1); } -#ifdef CONFIG_SWAP -/** - * mem_cgroup_move_swap_account - move swap charge and swap_cgroup's record. - * @entry: swap entry to be moved - * @from: mem_cgroup which the entry is moved from - * @to: mem_cgroup which the entry is moved to - * - * It succeeds only when the swap_cgroup's record for this entry is the same - * as the mem_cgroup's id of @from. - * - * Returns 0 on success, -EINVAL on failure. - * - * The caller must have charged to @to, IOW, called page_counter_charge() about - * both res and memsw, and called css_get(). 
- */ -static int mem_cgroup_move_swap_account(swp_entry_t entry, - struct mem_cgroup *from, struct mem_cgroup *to) -{ - unsigned short old_id, new_id; - - old_id = mem_cgroup_id(from); - new_id = mem_cgroup_id(to); - - if (swap_cgroup_cmpxchg(entry, old_id, new_id) == old_id) { - mod_memcg_state(from, MEMCG_SWAP, -1); - mod_memcg_state(to, MEMCG_SWAP, 1); - return 0; - } - return -EINVAL; -} -#else -static inline int mem_cgroup_move_swap_account(swp_entry_t entry, - struct mem_cgroup *from, struct mem_cgroup *to) -{ - return -EINVAL; -} -#endif static DEFINE_MUTEX(memcg_max_mutex); @@ -4017,42 +3816,6 @@ static ssize_t mem_cgroup_reset(struct kernfs_open_file *of, char *buf, return nbytes; } -static u64 mem_cgroup_move_charge_read(struct cgroup_subsys_state *css, - struct cftype *cft) -{ - return mem_cgroup_from_css(css)->move_charge_at_immigrate; -} - -#ifdef CONFIG_MMU -static int mem_cgroup_move_charge_write(struct cgroup_subsys_state *css, - struct cftype *cft, u64 val) -{ - struct mem_cgroup *memcg = mem_cgroup_from_css(css); - - pr_warn_once("Cgroup memory moving (move_charge_at_immigrate) is deprecated. " - "Please report your usecase to linux-mm@kvack.org if you " - "depend on this functionality.\n"); - - if (val & ~MOVE_MASK) - return -EINVAL; - - /* - * No kind of locking is needed in here, because ->can_attach() will - * check this value once in the beginning of the process, and then carry - * on with stale data. This means that changes to this value will only - * affect task migrations starting after the change. - */ - memcg->move_charge_at_immigrate = val; - return 0; -} -#else -static int mem_cgroup_move_charge_write(struct cgroup_subsys_state *css, - struct cftype *cft, u64 val) -{ - return -ENOSYS; -} -#endif - #ifdef CONFIG_NUMA #define LRU_ALL_FILE (BIT(LRU_INACTIVE_FILE) | BIT(LRU_ACTIVE_FILE)) @@ -5263,13 +5026,13 @@ static void mem_cgroup_id_remove(struct mem_cgroup *memcg) } } -static void __maybe_unused mem_cgroup_id_get_many(struct mem_cgroup *memcg, - unsigned int n) +void __maybe_unused mem_cgroup_id_get_many(struct mem_cgroup *memcg, + unsigned int n) { refcount_add(n, &memcg->id.ref); } -static void mem_cgroup_id_put_many(struct mem_cgroup *memcg, unsigned int n) +void mem_cgroup_id_put_many(struct mem_cgroup *memcg, unsigned int n) { if (refcount_sub_and_test(n, &memcg->id.ref)) { mem_cgroup_id_remove(memcg); @@ -5749,757 +5512,6 @@ static void mem_cgroup_css_rstat_flush(struct cgroup_subsys_state *css, int cpu) atomic64_set(&memcg->vmstats->stats_updates, 0); } -#ifdef CONFIG_MMU -/* Handlers for move charge at task migration. 
*/ -static int mem_cgroup_do_precharge(unsigned long count) -{ - int ret; - - /* Try a single bulk charge without reclaim first, kswapd may wake */ - ret = try_charge(mc.to, GFP_KERNEL & ~__GFP_DIRECT_RECLAIM, count); - if (!ret) { - mc.precharge += count; - return ret; - } - - /* Try charges one by one with reclaim, but do not retry */ - while (count--) { - ret = try_charge(mc.to, GFP_KERNEL | __GFP_NORETRY, 1); - if (ret) - return ret; - mc.precharge++; - cond_resched(); - } - return 0; -} - -union mc_target { - struct folio *folio; - swp_entry_t ent; -}; - -enum mc_target_type { - MC_TARGET_NONE = 0, - MC_TARGET_PAGE, - MC_TARGET_SWAP, - MC_TARGET_DEVICE, -}; - -static struct page *mc_handle_present_pte(struct vm_area_struct *vma, - unsigned long addr, pte_t ptent) -{ - struct page *page = vm_normal_page(vma, addr, ptent); - - if (!page) - return NULL; - if (PageAnon(page)) { - if (!(mc.flags & MOVE_ANON)) - return NULL; - } else { - if (!(mc.flags & MOVE_FILE)) - return NULL; - } - get_page(page); - - return page; -} - -#if defined(CONFIG_SWAP) || defined(CONFIG_DEVICE_PRIVATE) -static struct page *mc_handle_swap_pte(struct vm_area_struct *vma, - pte_t ptent, swp_entry_t *entry) -{ - struct page *page = NULL; - swp_entry_t ent = pte_to_swp_entry(ptent); - - if (!(mc.flags & MOVE_ANON)) - return NULL; - - /* - * Handle device private pages that are not accessible by the CPU, but - * stored as special swap entries in the page table. - */ - if (is_device_private_entry(ent)) { - page = pfn_swap_entry_to_page(ent); - if (!get_page_unless_zero(page)) - return NULL; - return page; - } - - if (non_swap_entry(ent)) - return NULL; - - /* - * Because swap_cache_get_folio() updates some statistics counter, - * we call find_get_page() with swapper_space directly. - */ - page = find_get_page(swap_address_space(ent), swap_cache_index(ent)); - entry->val = ent.val; - - return page; -} -#else -static struct page *mc_handle_swap_pte(struct vm_area_struct *vma, - pte_t ptent, swp_entry_t *entry) -{ - return NULL; -} -#endif - -static struct page *mc_handle_file_pte(struct vm_area_struct *vma, - unsigned long addr, pte_t ptent) -{ - unsigned long index; - struct folio *folio; - - if (!vma->vm_file) /* anonymous vma */ - return NULL; - if (!(mc.flags & MOVE_FILE)) - return NULL; - - /* folio is moved even if it's not RSS of this task(page-faulted). */ - /* shmem/tmpfs may report page out on swap: account for that too. */ - index = linear_page_index(vma, addr); - folio = filemap_get_incore_folio(vma->vm_file->f_mapping, index); - if (IS_ERR(folio)) - return NULL; - return folio_file_page(folio, index); -} - -/** - * mem_cgroup_move_account - move account of the folio - * @folio: The folio. - * @compound: charge the page as compound or small page - * @from: mem_cgroup which the folio is moved from. - * @to: mem_cgroup which the folio is moved to. @from != @to. - * - * The folio must be locked and not on the LRU. - * - * This function doesn't do "charge" to new cgroup and doesn't do "uncharge" - * from old cgroup. - */ -static int mem_cgroup_move_account(struct folio *folio, - bool compound, - struct mem_cgroup *from, - struct mem_cgroup *to) -{ - struct lruvec *from_vec, *to_vec; - struct pglist_data *pgdat; - unsigned int nr_pages = compound ? 
folio_nr_pages(folio) : 1; - int nid, ret; - - VM_BUG_ON(from == to); - VM_BUG_ON_FOLIO(!folio_test_locked(folio), folio); - VM_BUG_ON_FOLIO(folio_test_lru(folio), folio); - VM_BUG_ON(compound && !folio_test_large(folio)); - - ret = -EINVAL; - if (folio_memcg(folio) != from) - goto out; - - pgdat = folio_pgdat(folio); - from_vec = mem_cgroup_lruvec(from, pgdat); - to_vec = mem_cgroup_lruvec(to, pgdat); - - folio_memcg_lock(folio); - - if (folio_test_anon(folio)) { - if (folio_mapped(folio)) { - __mod_lruvec_state(from_vec, NR_ANON_MAPPED, -nr_pages); - __mod_lruvec_state(to_vec, NR_ANON_MAPPED, nr_pages); - if (folio_test_pmd_mappable(folio)) { - __mod_lruvec_state(from_vec, NR_ANON_THPS, - -nr_pages); - __mod_lruvec_state(to_vec, NR_ANON_THPS, - nr_pages); - } - } - } else { - __mod_lruvec_state(from_vec, NR_FILE_PAGES, -nr_pages); - __mod_lruvec_state(to_vec, NR_FILE_PAGES, nr_pages); - - if (folio_test_swapbacked(folio)) { - __mod_lruvec_state(from_vec, NR_SHMEM, -nr_pages); - __mod_lruvec_state(to_vec, NR_SHMEM, nr_pages); - } - - if (folio_mapped(folio)) { - __mod_lruvec_state(from_vec, NR_FILE_MAPPED, -nr_pages); - __mod_lruvec_state(to_vec, NR_FILE_MAPPED, nr_pages); - } - - if (folio_test_dirty(folio)) { - struct address_space *mapping = folio_mapping(folio); - - if (mapping_can_writeback(mapping)) { - __mod_lruvec_state(from_vec, NR_FILE_DIRTY, - -nr_pages); - __mod_lruvec_state(to_vec, NR_FILE_DIRTY, - nr_pages); - } - } - } - -#ifdef CONFIG_SWAP - if (folio_test_swapcache(folio)) { - __mod_lruvec_state(from_vec, NR_SWAPCACHE, -nr_pages); - __mod_lruvec_state(to_vec, NR_SWAPCACHE, nr_pages); - } -#endif - if (folio_test_writeback(folio)) { - __mod_lruvec_state(from_vec, NR_WRITEBACK, -nr_pages); - __mod_lruvec_state(to_vec, NR_WRITEBACK, nr_pages); - } - - /* - * All state has been migrated, let's switch to the new memcg. - * - * It is safe to change page's memcg here because the page - * is referenced, charged, isolated, and locked: we can't race - * with (un)charging, migration, LRU putback, or anything else - * that would rely on a stable page's memory cgroup. - * - * Note that folio_memcg_lock is a memcg lock, not a page lock, - * to save space. As soon as we switch page's memory cgroup to a - * new memcg that isn't locked, the above state can change - * concurrently again. Make sure we're truly done with it. - */ - smp_mb(); - - css_get(&to->css); - css_put(&from->css); - - folio->memcg_data = (unsigned long)to; - - __folio_memcg_unlock(from); - - ret = 0; - nid = folio_nid(folio); - - local_irq_disable(); - mem_cgroup_charge_statistics(to, nr_pages); - memcg_check_events(to, nid); - mem_cgroup_charge_statistics(from, -nr_pages); - memcg_check_events(from, nid); - local_irq_enable(); -out: - return ret; -} - -/** - * get_mctgt_type - get target type of moving charge - * @vma: the vma the pte to be checked belongs - * @addr: the address corresponding to the pte to be checked - * @ptent: the pte to be checked - * @target: the pointer the target page or swap ent will be stored(can be NULL) - * - * Context: Called with pte lock held. - * Return: - * * MC_TARGET_NONE - If the pte is not a target for move charge. - * * MC_TARGET_PAGE - If the page corresponding to this pte is a target for - * move charge. If @target is not NULL, the folio is stored in target->folio - * with extra refcnt taken (Caller should release it). - * * MC_TARGET_SWAP - If the swap entry corresponding to this pte is a - * target for charge migration. 
If @target is not NULL, the entry is - * stored in target->ent. - * * MC_TARGET_DEVICE - Like MC_TARGET_PAGE but page is device memory and - * thus not on the lru. For now such page is charged like a regular page - * would be as it is just special memory taking the place of a regular page. - * See Documentations/vm/hmm.txt and include/linux/hmm.h - */ -static enum mc_target_type get_mctgt_type(struct vm_area_struct *vma, - unsigned long addr, pte_t ptent, union mc_target *target) -{ - struct page *page = NULL; - struct folio *folio; - enum mc_target_type ret = MC_TARGET_NONE; - swp_entry_t ent = { .val = 0 }; - - if (pte_present(ptent)) - page = mc_handle_present_pte(vma, addr, ptent); - else if (pte_none_mostly(ptent)) - /* - * PTE markers should be treated as a none pte here, separated - * from other swap handling below. - */ - page = mc_handle_file_pte(vma, addr, ptent); - else if (is_swap_pte(ptent)) - page = mc_handle_swap_pte(vma, ptent, &ent); - - if (page) - folio = page_folio(page); - if (target && page) { - if (!folio_trylock(folio)) { - folio_put(folio); - return ret; - } - /* - * page_mapped() must be stable during the move. This - * pte is locked, so if it's present, the page cannot - * become unmapped. If it isn't, we have only partial - * control over the mapped state: the page lock will - * prevent new faults against pagecache and swapcache, - * so an unmapped page cannot become mapped. However, - * if the page is already mapped elsewhere, it can - * unmap, and there is nothing we can do about it. - * Alas, skip moving the page in this case. - */ - if (!pte_present(ptent) && page_mapped(page)) { - folio_unlock(folio); - folio_put(folio); - return ret; - } - } - - if (!page && !ent.val) - return ret; - if (page) { - /* - * Do only loose check w/o serialization. - * mem_cgroup_move_account() checks the page is valid or - * not under LRU exclusion. - */ - if (folio_memcg(folio) == mc.from) { - ret = MC_TARGET_PAGE; - if (folio_is_device_private(folio) || - folio_is_device_coherent(folio)) - ret = MC_TARGET_DEVICE; - if (target) - target->folio = folio; - } - if (!ret || !target) { - if (target) - folio_unlock(folio); - folio_put(folio); - } - } - /* - * There is a swap entry and a page doesn't exist or isn't charged. - * But we cannot move a tail-page in a THP. - */ - if (ent.val && !ret && (!page || !PageTransCompound(page)) && - mem_cgroup_id(mc.from) == lookup_swap_cgroup_id(ent)) { - ret = MC_TARGET_SWAP; - if (target) - target->ent = ent; - } - return ret; -} - -#ifdef CONFIG_TRANSPARENT_HUGEPAGE -/* - * We don't consider PMD mapped swapping or file mapped pages because THP does - * not support them for now. - * Caller should make sure that pmd_trans_huge(pmd) is true. 
- */ -static enum mc_target_type get_mctgt_type_thp(struct vm_area_struct *vma, - unsigned long addr, pmd_t pmd, union mc_target *target) -{ - struct page *page = NULL; - struct folio *folio; - enum mc_target_type ret = MC_TARGET_NONE; - - if (unlikely(is_swap_pmd(pmd))) { - VM_BUG_ON(thp_migration_supported() && - !is_pmd_migration_entry(pmd)); - return ret; - } - page = pmd_page(pmd); - VM_BUG_ON_PAGE(!page || !PageHead(page), page); - folio = page_folio(page); - if (!(mc.flags & MOVE_ANON)) - return ret; - if (folio_memcg(folio) == mc.from) { - ret = MC_TARGET_PAGE; - if (target) { - folio_get(folio); - if (!folio_trylock(folio)) { - folio_put(folio); - return MC_TARGET_NONE; - } - target->folio = folio; - } - } - return ret; -} -#else -static inline enum mc_target_type get_mctgt_type_thp(struct vm_area_struct *vma, - unsigned long addr, pmd_t pmd, union mc_target *target) -{ - return MC_TARGET_NONE; -} -#endif - -static int mem_cgroup_count_precharge_pte_range(pmd_t *pmd, - unsigned long addr, unsigned long end, - struct mm_walk *walk) -{ - struct vm_area_struct *vma = walk->vma; - pte_t *pte; - spinlock_t *ptl; - - ptl = pmd_trans_huge_lock(pmd, vma); - if (ptl) { - /* - * Note their can not be MC_TARGET_DEVICE for now as we do not - * support transparent huge page with MEMORY_DEVICE_PRIVATE but - * this might change. - */ - if (get_mctgt_type_thp(vma, addr, *pmd, NULL) == MC_TARGET_PAGE) - mc.precharge += HPAGE_PMD_NR; - spin_unlock(ptl); - return 0; - } - - pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl); - if (!pte) - return 0; - for (; addr != end; pte++, addr += PAGE_SIZE) - if (get_mctgt_type(vma, addr, ptep_get(pte), NULL)) - mc.precharge++; /* increment precharge temporarily */ - pte_unmap_unlock(pte - 1, ptl); - cond_resched(); - - return 0; -} - -static const struct mm_walk_ops precharge_walk_ops = { - .pmd_entry = mem_cgroup_count_precharge_pte_range, - .walk_lock = PGWALK_RDLOCK, -}; - -static unsigned long mem_cgroup_count_precharge(struct mm_struct *mm) -{ - unsigned long precharge; - - mmap_read_lock(mm); - walk_page_range(mm, 0, ULONG_MAX, &precharge_walk_ops, NULL); - mmap_read_unlock(mm); - - precharge = mc.precharge; - mc.precharge = 0; - - return precharge; -} - -static int mem_cgroup_precharge_mc(struct mm_struct *mm) -{ - unsigned long precharge = mem_cgroup_count_precharge(mm); - - VM_BUG_ON(mc.moving_task); - mc.moving_task = current; - return mem_cgroup_do_precharge(precharge); -} - -/* cancels all extra charges on mc.from and mc.to, and wakes up all waiters. */ -static void __mem_cgroup_clear_mc(void) -{ - struct mem_cgroup *from = mc.from; - struct mem_cgroup *to = mc.to; - - /* we must uncharge all the leftover precharges from mc.to */ - if (mc.precharge) { - mem_cgroup_cancel_charge(mc.to, mc.precharge); - mc.precharge = 0; - } - /* - * we didn't uncharge from mc.from at mem_cgroup_move_account(), so - * we must uncharge here. - */ - if (mc.moved_charge) { - mem_cgroup_cancel_charge(mc.from, mc.moved_charge); - mc.moved_charge = 0; - } - /* we must fixup refcnts and charges */ - if (mc.moved_swap) { - /* uncharge swap account from the old cgroup */ - if (!mem_cgroup_is_root(mc.from)) - page_counter_uncharge(&mc.from->memsw, mc.moved_swap); - - mem_cgroup_id_put_many(mc.from, mc.moved_swap); - - /* - * we charged both to->memory and to->memsw, so we - * should uncharge to->memory. 
- */ - if (!mem_cgroup_is_root(mc.to)) - page_counter_uncharge(&mc.to->memory, mc.moved_swap); - - mc.moved_swap = 0; - } - memcg_oom_recover(from); - memcg_oom_recover(to); - wake_up_all(&mc.waitq); -} - -static void mem_cgroup_clear_mc(void) -{ - struct mm_struct *mm = mc.mm; - - /* - * we must clear moving_task before waking up waiters at the end of - * task migration. - */ - mc.moving_task = NULL; - __mem_cgroup_clear_mc(); - spin_lock(&mc.lock); - mc.from = NULL; - mc.to = NULL; - mc.mm = NULL; - spin_unlock(&mc.lock); - - mmput(mm); -} - -static int mem_cgroup_can_attach(struct cgroup_taskset *tset) -{ - struct cgroup_subsys_state *css; - struct mem_cgroup *memcg = NULL; /* unneeded init to make gcc happy */ - struct mem_cgroup *from; - struct task_struct *leader, *p; - struct mm_struct *mm; - unsigned long move_flags; - int ret = 0; - - /* charge immigration isn't supported on the default hierarchy */ - if (cgroup_subsys_on_dfl(memory_cgrp_subsys)) - return 0; - - /* - * Multi-process migrations only happen on the default hierarchy - * where charge immigration is not used. Perform charge - * immigration if @tset contains a leader and whine if there are - * multiple. - */ - p = NULL; - cgroup_taskset_for_each_leader(leader, css, tset) { - WARN_ON_ONCE(p); - p = leader; - memcg = mem_cgroup_from_css(css); - } - if (!p) - return 0; - - /* - * We are now committed to this value whatever it is. Changes in this - * tunable will only affect upcoming migrations, not the current one. - * So we need to save it, and keep it going. - */ - move_flags = READ_ONCE(memcg->move_charge_at_immigrate); - if (!move_flags) - return 0; - - from = mem_cgroup_from_task(p); - - VM_BUG_ON(from == memcg); - - mm = get_task_mm(p); - if (!mm) - return 0; - /* We move charges only when we move a owner of the mm */ - if (mm->owner == p) { - VM_BUG_ON(mc.from); - VM_BUG_ON(mc.to); - VM_BUG_ON(mc.precharge); - VM_BUG_ON(mc.moved_charge); - VM_BUG_ON(mc.moved_swap); - - spin_lock(&mc.lock); - mc.mm = mm; - mc.from = from; - mc.to = memcg; - mc.flags = move_flags; - spin_unlock(&mc.lock); - /* We set mc.moving_task later */ - - ret = mem_cgroup_precharge_mc(mm); - if (ret) - mem_cgroup_clear_mc(); - } else { - mmput(mm); - } - return ret; -} - -static void mem_cgroup_cancel_attach(struct cgroup_taskset *tset) -{ - if (mc.to) - mem_cgroup_clear_mc(); -} - -static int mem_cgroup_move_charge_pte_range(pmd_t *pmd, - unsigned long addr, unsigned long end, - struct mm_walk *walk) -{ - int ret = 0; - struct vm_area_struct *vma = walk->vma; - pte_t *pte; - spinlock_t *ptl; - enum mc_target_type target_type; - union mc_target target; - struct folio *folio; - - ptl = pmd_trans_huge_lock(pmd, vma); - if (ptl) { - if (mc.precharge < HPAGE_PMD_NR) { - spin_unlock(ptl); - return 0; - } - target_type = get_mctgt_type_thp(vma, addr, *pmd, &target); - if (target_type == MC_TARGET_PAGE) { - folio = target.folio; - if (folio_isolate_lru(folio)) { - if (!mem_cgroup_move_account(folio, true, - mc.from, mc.to)) { - mc.precharge -= HPAGE_PMD_NR; - mc.moved_charge += HPAGE_PMD_NR; - } - folio_putback_lru(folio); - } - folio_unlock(folio); - folio_put(folio); - } else if (target_type == MC_TARGET_DEVICE) { - folio = target.folio; - if (!mem_cgroup_move_account(folio, true, - mc.from, mc.to)) { - mc.precharge -= HPAGE_PMD_NR; - mc.moved_charge += HPAGE_PMD_NR; - } - folio_unlock(folio); - folio_put(folio); - } - spin_unlock(ptl); - return 0; - } - -retry: - pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl); - if (!pte) - return 0; - for 
(; addr != end; addr += PAGE_SIZE) { - pte_t ptent = ptep_get(pte++); - bool device = false; - swp_entry_t ent; - - if (!mc.precharge) - break; - - switch (get_mctgt_type(vma, addr, ptent, &target)) { - case MC_TARGET_DEVICE: - device = true; - fallthrough; - case MC_TARGET_PAGE: - folio = target.folio; - /* - * We can have a part of the split pmd here. Moving it - * can be done but it would be too convoluted so simply - * ignore such a partial THP and keep it in original - * memcg. There should be somebody mapping the head. - */ - if (folio_test_large(folio)) - goto put; - if (!device && !folio_isolate_lru(folio)) - goto put; - if (!mem_cgroup_move_account(folio, false, - mc.from, mc.to)) { - mc.precharge--; - /* we uncharge from mc.from later. */ - mc.moved_charge++; - } - if (!device) - folio_putback_lru(folio); -put: /* get_mctgt_type() gets & locks the page */ - folio_unlock(folio); - folio_put(folio); - break; - case MC_TARGET_SWAP: - ent = target.ent; - if (!mem_cgroup_move_swap_account(ent, mc.from, mc.to)) { - mc.precharge--; - mem_cgroup_id_get_many(mc.to, 1); - /* we fixup other refcnts and charges later. */ - mc.moved_swap++; - } - break; - default: - break; - } - } - pte_unmap_unlock(pte - 1, ptl); - cond_resched(); - - if (addr != end) { - /* - * We have consumed all precharges we got in can_attach(). - * We try charge one by one, but don't do any additional - * charges to mc.to if we have failed in charge once in attach() - * phase. - */ - ret = mem_cgroup_do_precharge(1); - if (!ret) - goto retry; - } - - return ret; -} - -static const struct mm_walk_ops charge_walk_ops = { - .pmd_entry = mem_cgroup_move_charge_pte_range, - .walk_lock = PGWALK_RDLOCK, -}; - -static void mem_cgroup_move_charge(void) -{ - lru_add_drain_all(); - /* - * Signal folio_memcg_lock() to take the memcg's move_lock - * while we're moving its pages to another memcg. Then wait - * for already started RCU-only updates to finish. - */ - atomic_inc(&mc.from->moving_account); - synchronize_rcu(); -retry: - if (unlikely(!mmap_read_trylock(mc.mm))) { - /* - * Someone who are holding the mmap_lock might be waiting in - * waitq. So we cancel all extra charges, wake up all waiters, - * and retry. Because we cancel precharges, we might not be able - * to move enough charges, but moving charge is a best-effort - * feature anyway, so it wouldn't be a big problem. - */ - __mem_cgroup_clear_mc(); - cond_resched(); - goto retry; - } - /* - * When we have consumed all precharges and failed in doing - * additional charge, the page walk just aborts. 
- */ - walk_page_range(mc.mm, 0, ULONG_MAX, &charge_walk_ops, NULL); - mmap_read_unlock(mc.mm); - atomic_dec(&mc.from->moving_account); -} - -static void mem_cgroup_move_task(void) -{ - if (mc.to) { - mem_cgroup_move_charge(); - mem_cgroup_clear_mc(); - } -} - -#else /* !CONFIG_MMU */ -static int mem_cgroup_can_attach(struct cgroup_taskset *tset) -{ - return 0; -} -static void mem_cgroup_cancel_attach(struct cgroup_taskset *tset) -{ -} -static void mem_cgroup_move_task(void) -{ -} -#endif - #ifdef CONFIG_MEMCG_KMEM static void mem_cgroup_fork(struct task_struct *task) { From patchwork Tue May 28 20:20:57 2024 X-Patchwork-Submitter: Roman Gushchin X-Patchwork-Id: 13677353
From: Roman Gushchin To: Andrew Morton Cc: Muchun Song , Johannes Weiner , Michal Hocko , Shakeel Butt , Matthew Wilcox , linux-mm@kvack.org, linux-kernel@vger.kernel.org, Roman Gushchin Subject: [PATCH v1 05/14] mm: memcg: rename charge move-related functions Date: Tue, 28 May 2024 13:20:57 -0700 Message-ID: <20240528202101.3099300-6-roman.gushchin@linux.dev> In-Reply-To: <20240528202101.3099300-1-roman.gushchin@linux.dev> References: <20240528202101.3099300-1-roman.gushchin@linux.dev> MIME-Version: 1.0 Rename the exported functions related to charge moving to use the memcg1_ prefix.
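For quick reference while reading the diff below, this is the full set of renames in sketch form; the prototypes mirror how mm/memcontrol-v1.h reads after this patch, and only the names change, not the signatures:

	bool memcg1_wait_acct_move(struct mem_cgroup *memcg);   /* was mem_cgroup_wait_acct_move() */
	int memcg1_can_attach(struct cgroup_taskset *tset);     /* was mem_cgroup_can_attach() */
	void memcg1_cancel_attach(struct cgroup_taskset *tset); /* was mem_cgroup_cancel_attach() */
	void memcg1_move_task(void);                            /* was mem_cgroup_move_task() */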
Signed-off-by: Roman Gushchin
Acked-by: Shakeel Butt
---
 mm/memcontrol-v1.c | 8 ++++----
 mm/memcontrol-v1.h | 8 ++++----
 mm/memcontrol.c    | 8 ++++----
 3 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/mm/memcontrol-v1.c b/mm/memcontrol-v1.c
index f4c8bec5ae1b8..f3ea01a17eea9 100644
--- a/mm/memcontrol-v1.c
+++ b/mm/memcontrol-v1.c
@@ -384,7 +384,7 @@ static bool mem_cgroup_under_move(struct mem_cgroup *memcg)
 	return ret;
 }
 
-bool mem_cgroup_wait_acct_move(struct mem_cgroup *memcg)
+bool memcg1_wait_acct_move(struct mem_cgroup *memcg)
 {
 	if (mc.moving_task && current != mc.moving_task) {
 		if (mem_cgroup_under_move(memcg)) {
@@ -1056,7 +1056,7 @@ static void mem_cgroup_clear_mc(void)
 	mmput(mm);
 }
 
-int mem_cgroup_can_attach(struct cgroup_taskset *tset)
+int memcg1_can_attach(struct cgroup_taskset *tset)
 {
 	struct cgroup_subsys_state *css;
 	struct mem_cgroup *memcg = NULL;	/* unneeded init to make gcc happy */
@@ -1126,7 +1126,7 @@ int mem_cgroup_can_attach(struct cgroup_taskset *tset)
 	return ret;
 }
 
-void mem_cgroup_cancel_attach(struct cgroup_taskset *tset)
+void memcg1_cancel_attach(struct cgroup_taskset *tset)
 {
 	if (mc.to)
 		mem_cgroup_clear_mc();
@@ -1285,7 +1285,7 @@ static void mem_cgroup_move_charge(void)
 	atomic_dec(&mc.from->moving_account);
 }
 
-void mem_cgroup_move_task(void)
+void memcg1_move_task(void)
 {
 	if (mc.to) {
 		mem_cgroup_move_charge();
diff --git a/mm/memcontrol-v1.h b/mm/memcontrol-v1.h
index 55e7c4f90c399..d377c0be9880b 100644
--- a/mm/memcontrol-v1.h
+++ b/mm/memcontrol-v1.h
@@ -29,11 +29,11 @@ static inline int try_charge(struct mem_cgroup *memcg, gfp_t gfp_mask,
 void mem_cgroup_id_get_many(struct mem_cgroup *memcg, unsigned int n);
 void mem_cgroup_id_put_many(struct mem_cgroup *memcg, unsigned int n);
 
-bool mem_cgroup_wait_acct_move(struct mem_cgroup *memcg);
+bool memcg1_wait_acct_move(struct mem_cgroup *memcg);
 
 struct cgroup_taskset;
-int mem_cgroup_can_attach(struct cgroup_taskset *tset);
-void mem_cgroup_cancel_attach(struct cgroup_taskset *tset);
-void mem_cgroup_move_task(void);
+int memcg1_can_attach(struct cgroup_taskset *tset);
+void memcg1_cancel_attach(struct cgroup_taskset *tset);
+void memcg1_move_task(void);
 
 struct cftype;
 u64 mem_cgroup_move_charge_read(struct cgroup_subsys_state *css,
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index ba538ab0a80ad..b983cb6d1d228 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2582,7 +2582,7 @@ int try_charge_memcg(struct mem_cgroup *memcg, gfp_t gfp_mask,
 	 * At task move, charge accounts can be doubly counted. So, it's
 	 * better to wait until the end of task_move if something is going on.
 	 */
-	if (mem_cgroup_wait_acct_move(mem_over_limit))
+	if (memcg1_wait_acct_move(mem_over_limit))
 		goto retry;
 
 	if (nr_retries--)
@@ -6032,12 +6032,12 @@ struct cgroup_subsys memory_cgrp_subsys = {
 	.css_free = mem_cgroup_css_free,
 	.css_reset = mem_cgroup_css_reset,
 	.css_rstat_flush = mem_cgroup_css_rstat_flush,
-	.can_attach = mem_cgroup_can_attach,
+	.can_attach = memcg1_can_attach,
 #if defined(CONFIG_LRU_GEN) || defined(CONFIG_MEMCG_KMEM)
 	.attach = mem_cgroup_attach,
 #endif
-	.cancel_attach = mem_cgroup_cancel_attach,
-	.post_attach = mem_cgroup_move_task,
+	.cancel_attach = memcg1_cancel_attach,
+	.post_attach = memcg1_move_task,
 #ifdef CONFIG_MEMCG_KMEM
 	.fork = mem_cgroup_fork,
 	.exit = mem_cgroup_exit,
From patchwork Tue May 28 20:20:58 2024

From: Roman Gushchin
To: Andrew Morton
Cc: Muchun Song, Johannes Weiner, Michal Hocko, Shakeel Butt,
    Matthew Wilcox, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: [PATCH v1 06/14] mm: memcg: move legacy memcg event code into
 memcontrol-v1.c
Date: Tue, 28 May 2024 13:20:58 -0700
Message-ID: <20240528202101.3099300-7-roman.gushchin@linux.dev>

Cgroup v1's memory controller contains a fairly complicated event
notification mechanism which is not used by cgroup v2. Let's move the
corresponding code into memcontrol-v1.c.

Note that mem_cgroup_event_ratelimit() remains in memcontrol.c; moving
it too would require exporting too many details of memcg statistics
outside of memcontrol.c.
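From the user side, this mechanism is driven through cgroup.event_control,
which accepts a string of the form "<event_fd> <control_fd> [args]". A
hedged sketch of registering a memory usage threshold (the cgroup path
and the 50M value are illustrative assumptions; eventfd is signaled when
usage crosses the threshold in either direction):

	/* Hedged sketch: register a cgroup v1 memory usage threshold.
	 * The cgroup path and the threshold value are illustrative.
	 */
	#include <fcntl.h>
	#include <stdio.h>
	#include <stdint.h>
	#include <string.h>
	#include <sys/eventfd.h>
	#include <unistd.h>

	int main(void)
	{
		int efd = eventfd(0, 0);
		int cfd = open("/sys/fs/cgroup/memory/demo/memory.usage_in_bytes", O_RDONLY);
		int ecfd = open("/sys/fs/cgroup/memory/demo/cgroup.event_control", O_WRONLY);
		char cmd[64];
		uint64_t ticks;

		if (efd < 0 || cfd < 0 || ecfd < 0)
			return 1;

		/* "<event_fd> <control_fd> <threshold>" registers the event */
		snprintf(cmd, sizeof(cmd), "%d %d %llu", efd, cfd, 50ULL << 20);
		if (write(ecfd, cmd, strlen(cmd)) < 0)
			return 1;

		/* blocks until usage crosses the 50M threshold */
		if (read(efd, &ticks, sizeof(ticks)) != sizeof(ticks))
			return 1;
		printf("memory usage threshold crossed\n");
		return 0;
	}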
Signed-off-by: Roman Gushchin Acked-by: Shakeel Butt --- include/linux/memcontrol.h | 12 - mm/memcontrol-v1.c | 653 +++++++++++++++++++++++++++++++++++ mm/memcontrol-v1.h | 51 +++ mm/memcontrol.c | 687 +------------------------------------ 4 files changed, 709 insertions(+), 694 deletions(-) diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h index 04f9f7b0e7c2f..394b92b620d9c 100644 --- a/include/linux/memcontrol.h +++ b/include/linux/memcontrol.h @@ -69,18 +69,6 @@ struct mem_cgroup_id { refcount_t ref; }; -/* - * Per memcg event counter is incremented at every pagein/pageout. With THP, - * it will be incremented by the number of pages. This counter is used - * to trigger some periodic events. This is straightforward and better - * than using jiffies etc. to handle periodic memcg event. - */ -enum mem_cgroup_events_target { - MEM_CGROUP_TARGET_THRESH, - MEM_CGROUP_TARGET_SOFTLIMIT, - MEM_CGROUP_NTARGETS, -}; - struct memcg_vmstats_percpu; struct memcg_vmstats; struct lruvec_stats_percpu; diff --git a/mm/memcontrol-v1.c b/mm/memcontrol-v1.c index f3ea01a17eea9..c47ffb6105931 100644 --- a/mm/memcontrol-v1.c +++ b/mm/memcontrol-v1.c @@ -6,6 +6,10 @@ #include #include #include +#include +#include +#include +#include #include "internal.h" #include "swap.h" @@ -60,6 +64,54 @@ static struct move_charge_struct { .waitq = __WAIT_QUEUE_HEAD_INITIALIZER(mc.waitq), }; +/* for OOM */ +struct mem_cgroup_eventfd_list { + struct list_head list; + struct eventfd_ctx *eventfd; +}; + +/* + * cgroup_event represents events which userspace want to receive. + */ +struct mem_cgroup_event { + /* + * memcg which the event belongs to. + */ + struct mem_cgroup *memcg; + /* + * eventfd to signal userspace about the event. + */ + struct eventfd_ctx *eventfd; + /* + * Each of these stored in a list by the cgroup. + */ + struct list_head list; + /* + * register_event() callback will be used to add new userspace + * waiter for changes related to this event. Use eventfd_signal() + * on eventfd to send notification to userspace. + */ + int (*register_event)(struct mem_cgroup *memcg, + struct eventfd_ctx *eventfd, const char *args); + /* + * unregister_event() callback will be called when userspace closes + * the eventfd or on cgroup removing. This callback must be set, + * if you want provide notification functionality. + */ + void (*unregister_event)(struct mem_cgroup *memcg, + struct eventfd_ctx *eventfd); + /* + * All fields below needed to unregister event when + * userspace closes eventfd. + */ + poll_table pt; + wait_queue_head_t *wqh; + wait_queue_entry_t wait; + struct work_struct remove; +}; + +extern spinlock_t memcg_oom_lock; + static void __mem_cgroup_insert_exceeded(struct mem_cgroup_per_node *mz, struct mem_cgroup_tree_per_node *mctz, unsigned long new_usage_in_excess) @@ -1306,6 +1358,607 @@ static void mem_cgroup_move_task(void) } #endif +static void __mem_cgroup_threshold(struct mem_cgroup *memcg, bool swap) +{ + struct mem_cgroup_threshold_ary *t; + unsigned long usage; + int i; + + rcu_read_lock(); + if (!swap) + t = rcu_dereference(memcg->thresholds.primary); + else + t = rcu_dereference(memcg->memsw_thresholds.primary); + + if (!t) + goto unlock; + + usage = mem_cgroup_usage(memcg, swap); + + /* + * current_threshold points to threshold just below or equal to usage. + * If it's not true, a threshold was crossed after last + * call of __mem_cgroup_threshold(). 
+ */ + i = t->current_threshold; + + /* + * Iterate backward over array of thresholds starting from + * current_threshold and check if a threshold is crossed. + * If none of thresholds below usage is crossed, we read + * only one element of the array here. + */ + for (; i >= 0 && unlikely(t->entries[i].threshold > usage); i--) + eventfd_signal(t->entries[i].eventfd); + + /* i = current_threshold + 1 */ + i++; + + /* + * Iterate forward over array of thresholds starting from + * current_threshold+1 and check if a threshold is crossed. + * If none of thresholds above usage is crossed, we read + * only one element of the array here. + */ + for (; i < t->size && unlikely(t->entries[i].threshold <= usage); i++) + eventfd_signal(t->entries[i].eventfd); + + /* Update current_threshold */ + t->current_threshold = i - 1; +unlock: + rcu_read_unlock(); +} + +static void mem_cgroup_threshold(struct mem_cgroup *memcg) +{ + while (memcg) { + __mem_cgroup_threshold(memcg, false); + if (do_memsw_account()) + __mem_cgroup_threshold(memcg, true); + + memcg = parent_mem_cgroup(memcg); + } +} + +/* + * Check events in order. + * + */ +void memcg_check_events(struct mem_cgroup *memcg, int nid) +{ + if (IS_ENABLED(CONFIG_PREEMPT_RT)) + return; + + /* threshold event is triggered in finer grain than soft limit */ + if (unlikely(mem_cgroup_event_ratelimit(memcg, + MEM_CGROUP_TARGET_THRESH))) { + bool do_softlimit; + + do_softlimit = mem_cgroup_event_ratelimit(memcg, + MEM_CGROUP_TARGET_SOFTLIMIT); + mem_cgroup_threshold(memcg); + if (unlikely(do_softlimit)) + memcg1_update_tree(memcg, nid); + } +} + +static int compare_thresholds(const void *a, const void *b) +{ + const struct mem_cgroup_threshold *_a = a; + const struct mem_cgroup_threshold *_b = b; + + if (_a->threshold > _b->threshold) + return 1; + + if (_a->threshold < _b->threshold) + return -1; + + return 0; +} + +static int mem_cgroup_oom_notify_cb(struct mem_cgroup *memcg) +{ + struct mem_cgroup_eventfd_list *ev; + + spin_lock(&memcg_oom_lock); + + list_for_each_entry(ev, &memcg->oom_notify, list) + eventfd_signal(ev->eventfd); + + spin_unlock(&memcg_oom_lock); + return 0; +} + +void mem_cgroup_oom_notify(struct mem_cgroup *memcg) +{ + struct mem_cgroup *iter; + + for_each_mem_cgroup_tree(iter, memcg) + mem_cgroup_oom_notify_cb(iter); +} + +static int __mem_cgroup_usage_register_event(struct mem_cgroup *memcg, + struct eventfd_ctx *eventfd, const char *args, enum res_type type) +{ + struct mem_cgroup_thresholds *thresholds; + struct mem_cgroup_threshold_ary *new; + unsigned long threshold; + unsigned long usage; + int i, size, ret; + + ret = page_counter_memparse(args, "-1", &threshold); + if (ret) + return ret; + + mutex_lock(&memcg->thresholds_lock); + + if (type == _MEM) { + thresholds = &memcg->thresholds; + usage = mem_cgroup_usage(memcg, false); + } else if (type == _MEMSWAP) { + thresholds = &memcg->memsw_thresholds; + usage = mem_cgroup_usage(memcg, true); + } else + BUG(); + + /* Check if a threshold crossed before adding a new one */ + if (thresholds->primary) + __mem_cgroup_threshold(memcg, type == _MEMSWAP); + + size = thresholds->primary ? 
thresholds->primary->size + 1 : 1; + + /* Allocate memory for new array of thresholds */ + new = kmalloc(struct_size(new, entries, size), GFP_KERNEL); + if (!new) { + ret = -ENOMEM; + goto unlock; + } + new->size = size; + + /* Copy thresholds (if any) to new array */ + if (thresholds->primary) + memcpy(new->entries, thresholds->primary->entries, + flex_array_size(new, entries, size - 1)); + + /* Add new threshold */ + new->entries[size - 1].eventfd = eventfd; + new->entries[size - 1].threshold = threshold; + + /* Sort thresholds. Registering of new threshold isn't time-critical */ + sort(new->entries, size, sizeof(*new->entries), + compare_thresholds, NULL); + + /* Find current threshold */ + new->current_threshold = -1; + for (i = 0; i < size; i++) { + if (new->entries[i].threshold <= usage) { + /* + * new->current_threshold will not be used until + * rcu_assign_pointer(), so it's safe to increment + * it here. + */ + ++new->current_threshold; + } else + break; + } + + /* Free old spare buffer and save old primary buffer as spare */ + kfree(thresholds->spare); + thresholds->spare = thresholds->primary; + + rcu_assign_pointer(thresholds->primary, new); + + /* To be sure that nobody uses thresholds */ + synchronize_rcu(); + +unlock: + mutex_unlock(&memcg->thresholds_lock); + + return ret; +} + +static int mem_cgroup_usage_register_event(struct mem_cgroup *memcg, + struct eventfd_ctx *eventfd, const char *args) +{ + return __mem_cgroup_usage_register_event(memcg, eventfd, args, _MEM); +} + +static int memsw_cgroup_usage_register_event(struct mem_cgroup *memcg, + struct eventfd_ctx *eventfd, const char *args) +{ + return __mem_cgroup_usage_register_event(memcg, eventfd, args, _MEMSWAP); +} + +static void __mem_cgroup_usage_unregister_event(struct mem_cgroup *memcg, + struct eventfd_ctx *eventfd, enum res_type type) +{ + struct mem_cgroup_thresholds *thresholds; + struct mem_cgroup_threshold_ary *new; + unsigned long usage; + int i, j, size, entries; + + mutex_lock(&memcg->thresholds_lock); + + if (type == _MEM) { + thresholds = &memcg->thresholds; + usage = mem_cgroup_usage(memcg, false); + } else if (type == _MEMSWAP) { + thresholds = &memcg->memsw_thresholds; + usage = mem_cgroup_usage(memcg, true); + } else + BUG(); + + if (!thresholds->primary) + goto unlock; + + /* Check if a threshold crossed before removing */ + __mem_cgroup_threshold(memcg, type == _MEMSWAP); + + /* Calculate new number of threshold */ + size = entries = 0; + for (i = 0; i < thresholds->primary->size; i++) { + if (thresholds->primary->entries[i].eventfd != eventfd) + size++; + else + entries++; + } + + new = thresholds->spare; + + /* If no items related to eventfd have been cleared, nothing to do */ + if (!entries) + goto unlock; + + /* Set thresholds array to NULL if we don't have thresholds */ + if (!size) { + kfree(new); + new = NULL; + goto swap_buffers; + } + + new->size = size; + + /* Copy thresholds and find current threshold */ + new->current_threshold = -1; + for (i = 0, j = 0; i < thresholds->primary->size; i++) { + if (thresholds->primary->entries[i].eventfd == eventfd) + continue; + + new->entries[j] = thresholds->primary->entries[i]; + if (new->entries[j].threshold <= usage) { + /* + * new->current_threshold will not be used + * until rcu_assign_pointer(), so it's safe to increment + * it here. 
+ */ + ++new->current_threshold; + } + j++; + } + +swap_buffers: + /* Swap primary and spare array */ + thresholds->spare = thresholds->primary; + + rcu_assign_pointer(thresholds->primary, new); + + /* To be sure that nobody uses thresholds */ + synchronize_rcu(); + + /* If all events are unregistered, free the spare array */ + if (!new) { + kfree(thresholds->spare); + thresholds->spare = NULL; + } +unlock: + mutex_unlock(&memcg->thresholds_lock); +} + +static void mem_cgroup_usage_unregister_event(struct mem_cgroup *memcg, + struct eventfd_ctx *eventfd) +{ + return __mem_cgroup_usage_unregister_event(memcg, eventfd, _MEM); +} + +static void memsw_cgroup_usage_unregister_event(struct mem_cgroup *memcg, + struct eventfd_ctx *eventfd) +{ + return __mem_cgroup_usage_unregister_event(memcg, eventfd, _MEMSWAP); +} + +static int mem_cgroup_oom_register_event(struct mem_cgroup *memcg, + struct eventfd_ctx *eventfd, const char *args) +{ + struct mem_cgroup_eventfd_list *event; + + event = kmalloc(sizeof(*event), GFP_KERNEL); + if (!event) + return -ENOMEM; + + spin_lock(&memcg_oom_lock); + + event->eventfd = eventfd; + list_add(&event->list, &memcg->oom_notify); + + /* already in OOM ? */ + if (memcg->under_oom) + eventfd_signal(eventfd); + spin_unlock(&memcg_oom_lock); + + return 0; +} + +static void mem_cgroup_oom_unregister_event(struct mem_cgroup *memcg, + struct eventfd_ctx *eventfd) +{ + struct mem_cgroup_eventfd_list *ev, *tmp; + + spin_lock(&memcg_oom_lock); + + list_for_each_entry_safe(ev, tmp, &memcg->oom_notify, list) { + if (ev->eventfd == eventfd) { + list_del(&ev->list); + kfree(ev); + } + } + + spin_unlock(&memcg_oom_lock); +} + +/* + * DO NOT USE IN NEW FILES. + * + * "cgroup.event_control" implementation. + * + * This is way over-engineered. It tries to support fully configurable + * events for each user. Such level of flexibility is completely + * unnecessary especially in the light of the planned unified hierarchy. + * + * Please deprecate this and replace with something simpler if at all + * possible. + */ + +/* + * Unregister event and free resources. + * + * Gets called from workqueue. + */ +static void memcg_event_remove(struct work_struct *work) +{ + struct mem_cgroup_event *event = + container_of(work, struct mem_cgroup_event, remove); + struct mem_cgroup *memcg = event->memcg; + + remove_wait_queue(event->wqh, &event->wait); + + event->unregister_event(memcg, event->eventfd); + + /* Notify userspace the event is going away. */ + eventfd_signal(event->eventfd); + + eventfd_ctx_put(event->eventfd); + kfree(event); + css_put(&memcg->css); +} + +/* + * Gets called on EPOLLHUP on eventfd when user closes it. + * + * Called with wqh->lock held and interrupts disabled. + */ +static int memcg_event_wake(wait_queue_entry_t *wait, unsigned mode, + int sync, void *key) +{ + struct mem_cgroup_event *event = + container_of(wait, struct mem_cgroup_event, wait); + struct mem_cgroup *memcg = event->memcg; + __poll_t flags = key_to_poll(key); + + if (flags & EPOLLHUP) { + /* + * If the event has been detached at cgroup removal, we + * can simply return knowing the other side will cleanup + * for us. + * + * We can't race against event freeing since the other + * side will require wqh->lock via remove_wait_queue(), + * which we hold. + */ + spin_lock(&memcg->event_list_lock); + if (!list_empty(&event->list)) { + list_del_init(&event->list); + /* + * We are in atomic context, but cgroup_event_remove() + * may sleep, so we have to call it in workqueue. 
+ */ + schedule_work(&event->remove); + } + spin_unlock(&memcg->event_list_lock); + } + + return 0; +} + +static void memcg_event_ptable_queue_proc(struct file *file, + wait_queue_head_t *wqh, poll_table *pt) +{ + struct mem_cgroup_event *event = + container_of(pt, struct mem_cgroup_event, pt); + + event->wqh = wqh; + add_wait_queue(wqh, &event->wait); +} + +/* + * DO NOT USE IN NEW FILES. + * + * Parse input and register new cgroup event handler. + * + * Input must be in format ' '. + * Interpretation of args is defined by control file implementation. + */ +ssize_t memcg_write_event_control(struct kernfs_open_file *of, + char *buf, size_t nbytes, loff_t off) +{ + struct cgroup_subsys_state *css = of_css(of); + struct mem_cgroup *memcg = mem_cgroup_from_css(css); + struct mem_cgroup_event *event; + struct cgroup_subsys_state *cfile_css; + unsigned int efd, cfd; + struct fd efile; + struct fd cfile; + struct dentry *cdentry; + const char *name; + char *endp; + int ret; + + if (IS_ENABLED(CONFIG_PREEMPT_RT)) + return -EOPNOTSUPP; + + buf = strstrip(buf); + + efd = simple_strtoul(buf, &endp, 10); + if (*endp != ' ') + return -EINVAL; + buf = endp + 1; + + cfd = simple_strtoul(buf, &endp, 10); + if ((*endp != ' ') && (*endp != '\0')) + return -EINVAL; + buf = endp + 1; + + event = kzalloc(sizeof(*event), GFP_KERNEL); + if (!event) + return -ENOMEM; + + event->memcg = memcg; + INIT_LIST_HEAD(&event->list); + init_poll_funcptr(&event->pt, memcg_event_ptable_queue_proc); + init_waitqueue_func_entry(&event->wait, memcg_event_wake); + INIT_WORK(&event->remove, memcg_event_remove); + + efile = fdget(efd); + if (!efile.file) { + ret = -EBADF; + goto out_kfree; + } + + event->eventfd = eventfd_ctx_fileget(efile.file); + if (IS_ERR(event->eventfd)) { + ret = PTR_ERR(event->eventfd); + goto out_put_efile; + } + + cfile = fdget(cfd); + if (!cfile.file) { + ret = -EBADF; + goto out_put_eventfd; + } + + /* the process need read permission on control file */ + /* AV: shouldn't we check that it's been opened for read instead? */ + ret = file_permission(cfile.file, MAY_READ); + if (ret < 0) + goto out_put_cfile; + + /* + * The control file must be a regular cgroup1 file. As a regular cgroup + * file can't be renamed, it's safe to access its name afterwards. + */ + cdentry = cfile.file->f_path.dentry; + if (cdentry->d_sb->s_type != &cgroup_fs_type || !d_is_reg(cdentry)) { + ret = -EINVAL; + goto out_put_cfile; + } + + /* + * Determine the event callbacks and set them in @event. This used + * to be done via struct cftype but cgroup core no longer knows + * about these events. The following is crude but the whole thing + * is for compatibility anyway. + * + * DO NOT ADD NEW FILES. + */ + name = cdentry->d_name.name; + + if (!strcmp(name, "memory.usage_in_bytes")) { + event->register_event = mem_cgroup_usage_register_event; + event->unregister_event = mem_cgroup_usage_unregister_event; + } else if (!strcmp(name, "memory.oom_control")) { + event->register_event = mem_cgroup_oom_register_event; + event->unregister_event = mem_cgroup_oom_unregister_event; + } else if (!strcmp(name, "memory.pressure_level")) { + event->register_event = vmpressure_register_event; + event->unregister_event = vmpressure_unregister_event; + } else if (!strcmp(name, "memory.memsw.usage_in_bytes")) { + event->register_event = memsw_cgroup_usage_register_event; + event->unregister_event = memsw_cgroup_usage_unregister_event; + } else { + ret = -EINVAL; + goto out_put_cfile; + } + + /* + * Verify @cfile should belong to @css. 
Also, remaining events are + * automatically removed on cgroup destruction but the removal is + * asynchronous, so take an extra ref on @css. + */ + cfile_css = css_tryget_online_from_dir(cdentry->d_parent, + &memory_cgrp_subsys); + ret = -EINVAL; + if (IS_ERR(cfile_css)) + goto out_put_cfile; + if (cfile_css != css) { + css_put(cfile_css); + goto out_put_cfile; + } + + ret = event->register_event(memcg, event->eventfd, buf); + if (ret) + goto out_put_css; + + vfs_poll(efile.file, &event->pt); + + spin_lock_irq(&memcg->event_list_lock); + list_add(&event->list, &memcg->event_list); + spin_unlock_irq(&memcg->event_list_lock); + + fdput(cfile); + fdput(efile); + + return nbytes; + +out_put_css: + css_put(css); +out_put_cfile: + fdput(cfile); +out_put_eventfd: + eventfd_ctx_put(event->eventfd); +out_put_efile: + fdput(efile); +out_kfree: + kfree(event); + + return ret; +} + +void memcg1_css_offline(struct mem_cgroup *memcg) +{ + struct mem_cgroup_event *event, *tmp; + + /* + * Unregister events and notify userspace. + * Notify userspace about cgroup removing only after rmdir of cgroup + * directory to avoid race between userspace and kernelspace. + */ + spin_lock_irq(&memcg->event_list_lock); + list_for_each_entry_safe(event, tmp, &memcg->event_list, list) { + list_del_init(&event->list); + schedule_work(&event->remove); + } + spin_unlock_irq(&memcg->event_list_lock); +} + static int __init memcg1_init(void) { int node; diff --git a/mm/memcontrol-v1.h b/mm/memcontrol-v1.h index d377c0be9880b..524a2c76ffc97 100644 --- a/mm/memcontrol-v1.h +++ b/mm/memcontrol-v1.h @@ -41,4 +41,55 @@ u64 mem_cgroup_move_charge_read(struct cgroup_subsys_state *css, int mem_cgroup_move_charge_write(struct cgroup_subsys_state *css, struct cftype *cft, u64 val); +/* + * Per memcg event counter is incremented at every pagein/pageout. With THP, + * it will be incremented by the number of pages. This counter is used + * to trigger some periodic events. This is straightforward and better + * than using jiffies etc. to handle periodic memcg event. + */ +enum mem_cgroup_events_target { + MEM_CGROUP_TARGET_THRESH, + MEM_CGROUP_TARGET_SOFTLIMIT, + MEM_CGROUP_NTARGETS, +}; + +/* Whether legacy memory+swap accounting is active */ +static bool do_memsw_account(void) +{ + return !cgroup_subsys_on_dfl(memory_cgrp_subsys); +} + +/* + * Iteration constructs for visiting all cgroups (under a tree). If + * loops are exited prematurely (break), mem_cgroup_iter_break() must + * be used for reference counting. 
+ */ +#define for_each_mem_cgroup_tree(iter, root) \ + for (iter = mem_cgroup_iter(root, NULL, NULL); \ + iter != NULL; \ + iter = mem_cgroup_iter(root, iter, NULL)) + +#define for_each_mem_cgroup(iter) \ + for (iter = mem_cgroup_iter(NULL, NULL, NULL); \ + iter != NULL; \ + iter = mem_cgroup_iter(NULL, iter, NULL)) + +void memcg1_css_offline(struct mem_cgroup *memcg); + +/* for encoding cft->private value on file */ +enum res_type { + _MEM, + _MEMSWAP, + _KMEM, + _TCP, +}; + +bool mem_cgroup_event_ratelimit(struct mem_cgroup *memcg, + enum mem_cgroup_events_target target); +unsigned long mem_cgroup_usage(struct mem_cgroup *memcg, bool swap); +void mem_cgroup_oom_notify(struct mem_cgroup *memcg); +ssize_t memcg_write_event_control(struct kernfs_open_file *of, + char *buf, size_t nbytes, loff_t off); + + #endif /* __MM_MEMCONTROL_V1_H */ diff --git a/mm/memcontrol.c b/mm/memcontrol.c index b983cb6d1d228..6bc9009bee517 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -46,9 +46,6 @@ #include #include #include -#include -#include -#include #include #include #include @@ -59,7 +56,6 @@ #include #include #include -#include #include #include #include @@ -97,91 +93,13 @@ static bool cgroup_memory_nobpf __ro_after_init; static DECLARE_WAIT_QUEUE_HEAD(memcg_cgwb_frn_waitq); #endif -/* Whether legacy memory+swap accounting is active */ -static bool do_memsw_account(void) -{ - return !cgroup_subsys_on_dfl(memory_cgrp_subsys); -} - #define THRESHOLDS_EVENTS_TARGET 128 #define SOFTLIMIT_EVENTS_TARGET 1024 -/* for OOM */ -struct mem_cgroup_eventfd_list { - struct list_head list; - struct eventfd_ctx *eventfd; -}; - -/* - * cgroup_event represents events which userspace want to receive. - */ -struct mem_cgroup_event { - /* - * memcg which the event belongs to. - */ - struct mem_cgroup *memcg; - /* - * eventfd to signal userspace about the event. - */ - struct eventfd_ctx *eventfd; - /* - * Each of these stored in a list by the cgroup. - */ - struct list_head list; - /* - * register_event() callback will be used to add new userspace - * waiter for changes related to this event. Use eventfd_signal() - * on eventfd to send notification to userspace. - */ - int (*register_event)(struct mem_cgroup *memcg, - struct eventfd_ctx *eventfd, const char *args); - /* - * unregister_event() callback will be called when userspace closes - * the eventfd or on cgroup removing. This callback must be set, - * if you want provide notification functionality. - */ - void (*unregister_event)(struct mem_cgroup *memcg, - struct eventfd_ctx *eventfd); - /* - * All fields below needed to unregister event when - * userspace closes eventfd. - */ - poll_table pt; - wait_queue_head_t *wqh; - wait_queue_entry_t wait; - struct work_struct remove; -}; - -static void mem_cgroup_threshold(struct mem_cgroup *memcg); -static void mem_cgroup_oom_notify(struct mem_cgroup *memcg); - -/* for encoding cft->private value on file */ -enum res_type { - _MEM, - _MEMSWAP, - _KMEM, - _TCP, -}; - #define MEMFILE_PRIVATE(x, val) ((x) << 16 | (val)) #define MEMFILE_TYPE(val) ((val) >> 16 & 0xffff) #define MEMFILE_ATTR(val) ((val) & 0xffff) -/* - * Iteration constructs for visiting all cgroups (under a tree). If - * loops are exited prematurely (break), mem_cgroup_iter_break() must - * be used for reference counting. 
- */ -#define for_each_mem_cgroup_tree(iter, root) \ - for (iter = mem_cgroup_iter(root, NULL, NULL); \ - iter != NULL; \ - iter = mem_cgroup_iter(root, iter, NULL)) - -#define for_each_mem_cgroup(iter) \ - for (iter = mem_cgroup_iter(NULL, NULL, NULL); \ - iter != NULL; \ - iter = mem_cgroup_iter(NULL, iter, NULL)) - static inline bool task_is_dying(void) { return tsk_is_oom_victim(current) || fatal_signal_pending(current) || @@ -940,8 +858,8 @@ void mem_cgroup_charge_statistics(struct mem_cgroup *memcg, int nr_pages) __this_cpu_add(memcg->vmstats_percpu->nr_page_events, nr_pages); } -static bool mem_cgroup_event_ratelimit(struct mem_cgroup *memcg, - enum mem_cgroup_events_target target) +bool mem_cgroup_event_ratelimit(struct mem_cgroup *memcg, + enum mem_cgroup_events_target target) { unsigned long val, next; @@ -965,28 +883,6 @@ static bool mem_cgroup_event_ratelimit(struct mem_cgroup *memcg, return false; } -/* - * Check events in order. - * - */ -void memcg_check_events(struct mem_cgroup *memcg, int nid) -{ - if (IS_ENABLED(CONFIG_PREEMPT_RT)) - return; - - /* threshold event is triggered in finer grain than soft limit */ - if (unlikely(mem_cgroup_event_ratelimit(memcg, - MEM_CGROUP_TARGET_THRESH))) { - bool do_softlimit; - - do_softlimit = mem_cgroup_event_ratelimit(memcg, - MEM_CGROUP_TARGET_SOFTLIMIT); - mem_cgroup_threshold(memcg); - if (unlikely(do_softlimit)) - memcg1_update_tree(memcg, nid); - } -} - struct mem_cgroup *mem_cgroup_from_task(struct task_struct *p) { /* @@ -1726,7 +1622,7 @@ static struct lockdep_map memcg_oom_lock_dep_map = { }; #endif -static DEFINE_SPINLOCK(memcg_oom_lock); +DEFINE_SPINLOCK(memcg_oom_lock); /* * Check OOM-Killer is already running under our hierarchy. @@ -3547,7 +3443,7 @@ static int mem_cgroup_hierarchy_write(struct cgroup_subsys_state *css, return -EINVAL; } -static unsigned long mem_cgroup_usage(struct mem_cgroup *memcg, bool swap) +unsigned long mem_cgroup_usage(struct mem_cgroup *memcg, bool swap) { unsigned long val; @@ -4048,331 +3944,6 @@ static int mem_cgroup_swappiness_write(struct cgroup_subsys_state *css, return 0; } -static void __mem_cgroup_threshold(struct mem_cgroup *memcg, bool swap) -{ - struct mem_cgroup_threshold_ary *t; - unsigned long usage; - int i; - - rcu_read_lock(); - if (!swap) - t = rcu_dereference(memcg->thresholds.primary); - else - t = rcu_dereference(memcg->memsw_thresholds.primary); - - if (!t) - goto unlock; - - usage = mem_cgroup_usage(memcg, swap); - - /* - * current_threshold points to threshold just below or equal to usage. - * If it's not true, a threshold was crossed after last - * call of __mem_cgroup_threshold(). - */ - i = t->current_threshold; - - /* - * Iterate backward over array of thresholds starting from - * current_threshold and check if a threshold is crossed. - * If none of thresholds below usage is crossed, we read - * only one element of the array here. - */ - for (; i >= 0 && unlikely(t->entries[i].threshold > usage); i--) - eventfd_signal(t->entries[i].eventfd); - - /* i = current_threshold + 1 */ - i++; - - /* - * Iterate forward over array of thresholds starting from - * current_threshold+1 and check if a threshold is crossed. - * If none of thresholds above usage is crossed, we read - * only one element of the array here. 
- */ - for (; i < t->size && unlikely(t->entries[i].threshold <= usage); i++) - eventfd_signal(t->entries[i].eventfd); - - /* Update current_threshold */ - t->current_threshold = i - 1; -unlock: - rcu_read_unlock(); -} - -static void mem_cgroup_threshold(struct mem_cgroup *memcg) -{ - while (memcg) { - __mem_cgroup_threshold(memcg, false); - if (do_memsw_account()) - __mem_cgroup_threshold(memcg, true); - - memcg = parent_mem_cgroup(memcg); - } -} - -static int compare_thresholds(const void *a, const void *b) -{ - const struct mem_cgroup_threshold *_a = a; - const struct mem_cgroup_threshold *_b = b; - - if (_a->threshold > _b->threshold) - return 1; - - if (_a->threshold < _b->threshold) - return -1; - - return 0; -} - -static int mem_cgroup_oom_notify_cb(struct mem_cgroup *memcg) -{ - struct mem_cgroup_eventfd_list *ev; - - spin_lock(&memcg_oom_lock); - - list_for_each_entry(ev, &memcg->oom_notify, list) - eventfd_signal(ev->eventfd); - - spin_unlock(&memcg_oom_lock); - return 0; -} - -static void mem_cgroup_oom_notify(struct mem_cgroup *memcg) -{ - struct mem_cgroup *iter; - - for_each_mem_cgroup_tree(iter, memcg) - mem_cgroup_oom_notify_cb(iter); -} - -static int __mem_cgroup_usage_register_event(struct mem_cgroup *memcg, - struct eventfd_ctx *eventfd, const char *args, enum res_type type) -{ - struct mem_cgroup_thresholds *thresholds; - struct mem_cgroup_threshold_ary *new; - unsigned long threshold; - unsigned long usage; - int i, size, ret; - - ret = page_counter_memparse(args, "-1", &threshold); - if (ret) - return ret; - - mutex_lock(&memcg->thresholds_lock); - - if (type == _MEM) { - thresholds = &memcg->thresholds; - usage = mem_cgroup_usage(memcg, false); - } else if (type == _MEMSWAP) { - thresholds = &memcg->memsw_thresholds; - usage = mem_cgroup_usage(memcg, true); - } else - BUG(); - - /* Check if a threshold crossed before adding a new one */ - if (thresholds->primary) - __mem_cgroup_threshold(memcg, type == _MEMSWAP); - - size = thresholds->primary ? thresholds->primary->size + 1 : 1; - - /* Allocate memory for new array of thresholds */ - new = kmalloc(struct_size(new, entries, size), GFP_KERNEL); - if (!new) { - ret = -ENOMEM; - goto unlock; - } - new->size = size; - - /* Copy thresholds (if any) to new array */ - if (thresholds->primary) - memcpy(new->entries, thresholds->primary->entries, - flex_array_size(new, entries, size - 1)); - - /* Add new threshold */ - new->entries[size - 1].eventfd = eventfd; - new->entries[size - 1].threshold = threshold; - - /* Sort thresholds. Registering of new threshold isn't time-critical */ - sort(new->entries, size, sizeof(*new->entries), - compare_thresholds, NULL); - - /* Find current threshold */ - new->current_threshold = -1; - for (i = 0; i < size; i++) { - if (new->entries[i].threshold <= usage) { - /* - * new->current_threshold will not be used until - * rcu_assign_pointer(), so it's safe to increment - * it here. 
- */ - ++new->current_threshold; - } else - break; - } - - /* Free old spare buffer and save old primary buffer as spare */ - kfree(thresholds->spare); - thresholds->spare = thresholds->primary; - - rcu_assign_pointer(thresholds->primary, new); - - /* To be sure that nobody uses thresholds */ - synchronize_rcu(); - -unlock: - mutex_unlock(&memcg->thresholds_lock); - - return ret; -} - -static int mem_cgroup_usage_register_event(struct mem_cgroup *memcg, - struct eventfd_ctx *eventfd, const char *args) -{ - return __mem_cgroup_usage_register_event(memcg, eventfd, args, _MEM); -} - -static int memsw_cgroup_usage_register_event(struct mem_cgroup *memcg, - struct eventfd_ctx *eventfd, const char *args) -{ - return __mem_cgroup_usage_register_event(memcg, eventfd, args, _MEMSWAP); -} - -static void __mem_cgroup_usage_unregister_event(struct mem_cgroup *memcg, - struct eventfd_ctx *eventfd, enum res_type type) -{ - struct mem_cgroup_thresholds *thresholds; - struct mem_cgroup_threshold_ary *new; - unsigned long usage; - int i, j, size, entries; - - mutex_lock(&memcg->thresholds_lock); - - if (type == _MEM) { - thresholds = &memcg->thresholds; - usage = mem_cgroup_usage(memcg, false); - } else if (type == _MEMSWAP) { - thresholds = &memcg->memsw_thresholds; - usage = mem_cgroup_usage(memcg, true); - } else - BUG(); - - if (!thresholds->primary) - goto unlock; - - /* Check if a threshold crossed before removing */ - __mem_cgroup_threshold(memcg, type == _MEMSWAP); - - /* Calculate new number of threshold */ - size = entries = 0; - for (i = 0; i < thresholds->primary->size; i++) { - if (thresholds->primary->entries[i].eventfd != eventfd) - size++; - else - entries++; - } - - new = thresholds->spare; - - /* If no items related to eventfd have been cleared, nothing to do */ - if (!entries) - goto unlock; - - /* Set thresholds array to NULL if we don't have thresholds */ - if (!size) { - kfree(new); - new = NULL; - goto swap_buffers; - } - - new->size = size; - - /* Copy thresholds and find current threshold */ - new->current_threshold = -1; - for (i = 0, j = 0; i < thresholds->primary->size; i++) { - if (thresholds->primary->entries[i].eventfd == eventfd) - continue; - - new->entries[j] = thresholds->primary->entries[i]; - if (new->entries[j].threshold <= usage) { - /* - * new->current_threshold will not be used - * until rcu_assign_pointer(), so it's safe to increment - * it here. 
- */ - ++new->current_threshold; - } - j++; - } - -swap_buffers: - /* Swap primary and spare array */ - thresholds->spare = thresholds->primary; - - rcu_assign_pointer(thresholds->primary, new); - - /* To be sure that nobody uses thresholds */ - synchronize_rcu(); - - /* If all events are unregistered, free the spare array */ - if (!new) { - kfree(thresholds->spare); - thresholds->spare = NULL; - } -unlock: - mutex_unlock(&memcg->thresholds_lock); -} - -static void mem_cgroup_usage_unregister_event(struct mem_cgroup *memcg, - struct eventfd_ctx *eventfd) -{ - return __mem_cgroup_usage_unregister_event(memcg, eventfd, _MEM); -} - -static void memsw_cgroup_usage_unregister_event(struct mem_cgroup *memcg, - struct eventfd_ctx *eventfd) -{ - return __mem_cgroup_usage_unregister_event(memcg, eventfd, _MEMSWAP); -} - -static int mem_cgroup_oom_register_event(struct mem_cgroup *memcg, - struct eventfd_ctx *eventfd, const char *args) -{ - struct mem_cgroup_eventfd_list *event; - - event = kmalloc(sizeof(*event), GFP_KERNEL); - if (!event) - return -ENOMEM; - - spin_lock(&memcg_oom_lock); - - event->eventfd = eventfd; - list_add(&event->list, &memcg->oom_notify); - - /* already in OOM ? */ - if (memcg->under_oom) - eventfd_signal(eventfd); - spin_unlock(&memcg_oom_lock); - - return 0; -} - -static void mem_cgroup_oom_unregister_event(struct mem_cgroup *memcg, - struct eventfd_ctx *eventfd) -{ - struct mem_cgroup_eventfd_list *ev, *tmp; - - spin_lock(&memcg_oom_lock); - - list_for_each_entry_safe(ev, tmp, &memcg->oom_notify, list) { - if (ev->eventfd == eventfd) { - list_del(&ev->list); - kfree(ev); - } - } - - spin_unlock(&memcg_oom_lock); -} - static int mem_cgroup_oom_control_read(struct seq_file *sf, void *v) { struct mem_cgroup *memcg = mem_cgroup_from_seq(sf); @@ -4613,243 +4184,6 @@ static void memcg_wb_domain_size_changed(struct mem_cgroup *memcg) #endif /* CONFIG_CGROUP_WRITEBACK */ -/* - * DO NOT USE IN NEW FILES. - * - * "cgroup.event_control" implementation. - * - * This is way over-engineered. It tries to support fully configurable - * events for each user. Such level of flexibility is completely - * unnecessary especially in the light of the planned unified hierarchy. - * - * Please deprecate this and replace with something simpler if at all - * possible. - */ - -/* - * Unregister event and free resources. - * - * Gets called from workqueue. - */ -static void memcg_event_remove(struct work_struct *work) -{ - struct mem_cgroup_event *event = - container_of(work, struct mem_cgroup_event, remove); - struct mem_cgroup *memcg = event->memcg; - - remove_wait_queue(event->wqh, &event->wait); - - event->unregister_event(memcg, event->eventfd); - - /* Notify userspace the event is going away. */ - eventfd_signal(event->eventfd); - - eventfd_ctx_put(event->eventfd); - kfree(event); - css_put(&memcg->css); -} - -/* - * Gets called on EPOLLHUP on eventfd when user closes it. - * - * Called with wqh->lock held and interrupts disabled. - */ -static int memcg_event_wake(wait_queue_entry_t *wait, unsigned mode, - int sync, void *key) -{ - struct mem_cgroup_event *event = - container_of(wait, struct mem_cgroup_event, wait); - struct mem_cgroup *memcg = event->memcg; - __poll_t flags = key_to_poll(key); - - if (flags & EPOLLHUP) { - /* - * If the event has been detached at cgroup removal, we - * can simply return knowing the other side will cleanup - * for us. - * - * We can't race against event freeing since the other - * side will require wqh->lock via remove_wait_queue(), - * which we hold. 
- */ - spin_lock(&memcg->event_list_lock); - if (!list_empty(&event->list)) { - list_del_init(&event->list); - /* - * We are in atomic context, but cgroup_event_remove() - * may sleep, so we have to call it in workqueue. - */ - schedule_work(&event->remove); - } - spin_unlock(&memcg->event_list_lock); - } - - return 0; -} - -static void memcg_event_ptable_queue_proc(struct file *file, - wait_queue_head_t *wqh, poll_table *pt) -{ - struct mem_cgroup_event *event = - container_of(pt, struct mem_cgroup_event, pt); - - event->wqh = wqh; - add_wait_queue(wqh, &event->wait); -} - -/* - * DO NOT USE IN NEW FILES. - * - * Parse input and register new cgroup event handler. - * - * Input must be in format ' '. - * Interpretation of args is defined by control file implementation. - */ -static ssize_t memcg_write_event_control(struct kernfs_open_file *of, - char *buf, size_t nbytes, loff_t off) -{ - struct cgroup_subsys_state *css = of_css(of); - struct mem_cgroup *memcg = mem_cgroup_from_css(css); - struct mem_cgroup_event *event; - struct cgroup_subsys_state *cfile_css; - unsigned int efd, cfd; - struct fd efile; - struct fd cfile; - struct dentry *cdentry; - const char *name; - char *endp; - int ret; - - if (IS_ENABLED(CONFIG_PREEMPT_RT)) - return -EOPNOTSUPP; - - buf = strstrip(buf); - - efd = simple_strtoul(buf, &endp, 10); - if (*endp != ' ') - return -EINVAL; - buf = endp + 1; - - cfd = simple_strtoul(buf, &endp, 10); - if ((*endp != ' ') && (*endp != '\0')) - return -EINVAL; - buf = endp + 1; - - event = kzalloc(sizeof(*event), GFP_KERNEL); - if (!event) - return -ENOMEM; - - event->memcg = memcg; - INIT_LIST_HEAD(&event->list); - init_poll_funcptr(&event->pt, memcg_event_ptable_queue_proc); - init_waitqueue_func_entry(&event->wait, memcg_event_wake); - INIT_WORK(&event->remove, memcg_event_remove); - - efile = fdget(efd); - if (!efile.file) { - ret = -EBADF; - goto out_kfree; - } - - event->eventfd = eventfd_ctx_fileget(efile.file); - if (IS_ERR(event->eventfd)) { - ret = PTR_ERR(event->eventfd); - goto out_put_efile; - } - - cfile = fdget(cfd); - if (!cfile.file) { - ret = -EBADF; - goto out_put_eventfd; - } - - /* the process need read permission on control file */ - /* AV: shouldn't we check that it's been opened for read instead? */ - ret = file_permission(cfile.file, MAY_READ); - if (ret < 0) - goto out_put_cfile; - - /* - * The control file must be a regular cgroup1 file. As a regular cgroup - * file can't be renamed, it's safe to access its name afterwards. - */ - cdentry = cfile.file->f_path.dentry; - if (cdentry->d_sb->s_type != &cgroup_fs_type || !d_is_reg(cdentry)) { - ret = -EINVAL; - goto out_put_cfile; - } - - /* - * Determine the event callbacks and set them in @event. This used - * to be done via struct cftype but cgroup core no longer knows - * about these events. The following is crude but the whole thing - * is for compatibility anyway. - * - * DO NOT ADD NEW FILES. 
- */ - name = cdentry->d_name.name; - - if (!strcmp(name, "memory.usage_in_bytes")) { - event->register_event = mem_cgroup_usage_register_event; - event->unregister_event = mem_cgroup_usage_unregister_event; - } else if (!strcmp(name, "memory.oom_control")) { - event->register_event = mem_cgroup_oom_register_event; - event->unregister_event = mem_cgroup_oom_unregister_event; - } else if (!strcmp(name, "memory.pressure_level")) { - event->register_event = vmpressure_register_event; - event->unregister_event = vmpressure_unregister_event; - } else if (!strcmp(name, "memory.memsw.usage_in_bytes")) { - event->register_event = memsw_cgroup_usage_register_event; - event->unregister_event = memsw_cgroup_usage_unregister_event; - } else { - ret = -EINVAL; - goto out_put_cfile; - } - - /* - * Verify @cfile should belong to @css. Also, remaining events are - * automatically removed on cgroup destruction but the removal is - * asynchronous, so take an extra ref on @css. - */ - cfile_css = css_tryget_online_from_dir(cdentry->d_parent, - &memory_cgrp_subsys); - ret = -EINVAL; - if (IS_ERR(cfile_css)) - goto out_put_cfile; - if (cfile_css != css) { - css_put(cfile_css); - goto out_put_cfile; - } - - ret = event->register_event(memcg, event->eventfd, buf); - if (ret) - goto out_put_css; - - vfs_poll(efile.file, &event->pt); - - spin_lock_irq(&memcg->event_list_lock); - list_add(&event->list, &memcg->event_list); - spin_unlock_irq(&memcg->event_list_lock); - - fdput(cfile); - fdput(efile); - - return nbytes; - -out_put_css: - css_put(css); -out_put_cfile: - fdput(cfile); -out_put_eventfd: - eventfd_ctx_put(event->eventfd); -out_put_efile: - fdput(efile); -out_kfree: - kfree(event); - - return ret; -} - #if defined(CONFIG_MEMCG_KMEM) && defined(CONFIG_SLUB_DEBUG) static int mem_cgroup_slab_show(struct seq_file *m, void *p) { @@ -5316,19 +4650,8 @@ static int mem_cgroup_css_online(struct cgroup_subsys_state *css) static void mem_cgroup_css_offline(struct cgroup_subsys_state *css) { struct mem_cgroup *memcg = mem_cgroup_from_css(css); - struct mem_cgroup_event *event, *tmp; - /* - * Unregister events and notify userspace. - * Notify userspace about cgroup removing only after rmdir of cgroup - * directory to avoid race between userspace and kernelspace. 
-	 */
-	spin_lock_irq(&memcg->event_list_lock);
-	list_for_each_entry_safe(event, tmp, &memcg->event_list, list) {
-		list_del_init(&event->list);
-		schedule_work(&event->remove);
-	}
-	spin_unlock_irq(&memcg->event_list_lock);
+	memcg1_css_offline(memcg);
 
 	page_counter_set_min(&memcg->memory, 0);
 	page_counter_set_low(&memcg->memory, 0);
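The trickiest piece of the code moved above is the threshold scan in
__mem_cgroup_threshold(): the entries array is kept sorted, and
current_threshold caches the index of the largest threshold at or below
the last-seen usage, so each update only walks past the thresholds that
were actually crossed. A simplified, self-contained user-space model of
that invariant, with notify() standing in for eventfd_signal():

	/* Simplified model of the sorted-threshold scan; values are
	 * illustrative and notify() stands in for eventfd_signal().
	 */
	#include <stdio.h>

	struct threshold {
		unsigned long threshold;
		int id;
	};

	static struct threshold entries[] = {
		{ 10, 0 }, { 20, 1 }, { 40, 2 }, { 80, 3 },
	};
	static const int size = 4;
	/* index of the largest threshold <= last seen usage (-1 if none) */
	static int current_threshold = -1;

	static void notify(int id)
	{
		printf("signal eventfd for threshold %d\n", id);
	}

	static void check_thresholds(unsigned long usage)
	{
		int i = current_threshold;

		/* usage dropped: signal entries now above usage, right to left */
		for (; i >= 0 && entries[i].threshold > usage; i--)
			notify(entries[i].id);

		/* usage grew: signal entries now at or below usage, left to right */
		for (i++; i < size && entries[i].threshold <= usage; i++)
			notify(entries[i].id);

		current_threshold = i - 1;
	}

	int main(void)
	{
		check_thresholds(45);	/* crossed 10, 20, 40 upwards */
		check_thresholds(15);	/* crossed 40 and 20 downwards */
		return 0;
	}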
From patchwork Tue May 28 20:20:59 2024

From: Roman Gushchin
To: Andrew Morton
Cc: Muchun Song, Johannes Weiner, Michal Hocko, Shakeel Butt,
    Matthew Wilcox, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: [PATCH v1 07/14] mm: memcg: rename memcg_check_events()
Date: Tue, 28 May 2024 13:20:59 -0700
Message-ID: <20240528202101.3099300-8-roman.gushchin@linux.dev>

Rename memcg_check_events() to memcg1_check_events() for consistency
with other cgroup v1-specific functions.
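The renamed function is driven by mem_cgroup_event_ratelimit(), which
lets the threshold and soft-limit checks run only once per
THRESHOLDS_EVENTS_TARGET (128) and SOFTLIMIT_EVENTS_TARGET (1024) page
events, respectively. A simplified single-threaded model of that
pattern (the kernel tracks the counters per-cpu; this is a sketch, not
the kernel code):

	/* Simplified model of the ratelimiting behind memcg1_check_events().
	 * A global counter stands in for the kernel's per-cpu variables.
	 */
	#include <stdbool.h>
	#include <stdio.h>

	#define THRESHOLDS_EVENTS_TARGET	128
	#define SOFTLIMIT_EVENTS_TARGET		1024

	static unsigned long nr_page_events;
	static unsigned long next_thresh;
	static unsigned long next_softlimit;

	static bool event_ratelimit(unsigned long *next, unsigned long target)
	{
		/* wraparound-safe: fire once the counter passes *next */
		if ((long)(*next - nr_page_events) < 0) {
			*next = nr_page_events + target;
			return true;
		}
		return false;
	}

	static void check_events(void)
	{
		if (event_ratelimit(&next_thresh, THRESHOLDS_EVENTS_TARGET)) {
			bool do_softlimit = event_ratelimit(&next_softlimit,
							    SOFTLIMIT_EVENTS_TARGET);

			printf("recheck thresholds%s at %lu page events\n",
			       do_softlimit ? " and soft limit tree" : "",
			       nr_page_events);
		}
	}

	int main(void)
	{
		for (int i = 0; i < 4096; i++) {
			nr_page_events++;	/* one page charged or uncharged */
			check_events();
		}
		return 0;
	}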
Signed-off-by: Roman Gushchin
Acked-by: Shakeel Butt
---
 mm/memcontrol-v1.c | 6 +++---
 mm/memcontrol-v1.h | 2 +-
 mm/memcontrol.c    | 8 ++++----
 3 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/mm/memcontrol-v1.c b/mm/memcontrol-v1.c
index c47ffb6105931..8b8c2c9516349 100644
--- a/mm/memcontrol-v1.c
+++ b/mm/memcontrol-v1.c
@@ -835,9 +835,9 @@ static int mem_cgroup_move_account(struct folio *folio,
 	local_irq_disable();
 	mem_cgroup_charge_statistics(to, nr_pages);
-	memcg_check_events(to, nid);
+	memcg1_check_events(to, nid);
 	mem_cgroup_charge_statistics(from, -nr_pages);
-	memcg_check_events(from, nid);
+	memcg1_check_events(from, nid);
 	local_irq_enable();
 out:
 	return ret;
@@ -1424,7 +1424,7 @@ static void mem_cgroup_threshold(struct mem_cgroup *memcg)
  * Check events in order.
  *
  */
-void memcg_check_events(struct mem_cgroup *memcg, int nid)
+void memcg1_check_events(struct mem_cgroup *memcg, int nid)
 {
 	if (IS_ENABLED(CONFIG_PREEMPT_RT))
 		return;
diff --git a/mm/memcontrol-v1.h b/mm/memcontrol-v1.h
index 524a2c76ffc97..ef1b7037cbdcc 100644
--- a/mm/memcontrol-v1.h
+++ b/mm/memcontrol-v1.h
@@ -12,7 +12,7 @@ static inline void memcg1_soft_limit_reset(struct mem_cgroup *memcg)
 }
 
 void mem_cgroup_charge_statistics(struct mem_cgroup *memcg, int nr_pages);
-void memcg_check_events(struct mem_cgroup *memcg, int nid);
+void memcg1_check_events(struct mem_cgroup *memcg, int nid);
 void memcg_oom_recover(struct mem_cgroup *memcg);
 int try_charge_memcg(struct mem_cgroup *memcg, gfp_t gfp_mask,
 		     unsigned int nr_pages);
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 6bc9009bee517..c14b1b01bcf53 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2632,7 +2632,7 @@ void mem_cgroup_commit_charge(struct folio *folio, struct mem_cgroup *memcg)
 	local_irq_disable();
 	mem_cgroup_charge_statistics(memcg, folio_nr_pages(folio));
-	memcg_check_events(memcg, folio_nid(folio));
+	memcg1_check_events(memcg, folio_nid(folio));
 	local_irq_enable();
 }
@@ -5699,7 +5699,7 @@ static void uncharge_batch(const struct uncharge_gather *ug)
 	local_irq_save(flags);
 	__count_memcg_events(ug->memcg, PGPGOUT, ug->pgpgout);
 	__this_cpu_add(ug->memcg->vmstats_percpu->nr_page_events, ug->nr_memory);
-	memcg_check_events(ug->memcg, ug->nid);
+	memcg1_check_events(ug->memcg, ug->nid);
 	local_irq_restore(flags);
 
 	/* drop reference from uncharge_folio */
@@ -5839,7 +5839,7 @@ void mem_cgroup_replace_folio(struct folio *old, struct folio *new)
 	local_irq_save(flags);
 	mem_cgroup_charge_statistics(memcg, nr_pages);
-	memcg_check_events(memcg, folio_nid(new));
+	memcg1_check_events(memcg, folio_nid(new));
 	local_irq_restore(flags);
 }
@@ -6106,7 +6106,7 @@ void mem_cgroup_swapout(struct folio *folio, swp_entry_t entry)
 	memcg_stats_lock();
 	mem_cgroup_charge_statistics(memcg, -nr_entries);
 	memcg_stats_unlock();
-	memcg_check_events(memcg, folio_nid(folio));
+	memcg1_check_events(memcg, folio_nid(folio));
 
 	css_put(&memcg->css);
 }
From: Roman Gushchin <roman.gushchin@linux.dev>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Muchun Song <muchun.song@linux.dev>, Johannes Weiner <hannes@cmpxchg.org>, Michal Hocko <mhocko@kernel.org>, Shakeel Butt <shakeel.butt@linux.dev>, Matthew Wilcox <willy@infradead.org>, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Roman Gushchin <roman.gushchin@linux.dev>
Subject: [PATCH v1 08/14] mm: memcg: move cgroup v1 oom handling code into memcontrol-v1.c
Date: Tue, 28 May 2024 13:21:00 -0700
Message-ID: <20240528202101.3099300-9-roman.gushchin@linux.dev>
In-Reply-To: <20240528202101.3099300-1-roman.gushchin@linux.dev>
References: <20240528202101.3099300-1-roman.gushchin@linux.dev>

Cgroup v1 supports a complicated userspace OOM handling mechanism, which cgroup v2 does not. Let's move the corresponding code into memcontrol-v1.c.

Aside from the mechanical code movement, this patch introduces two new functions: memcg1_oom_prepare() and memcg1_oom_finish(). They implement the cgroup v1-specific parts of the common memcg OOM handling path.
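For readers who haven't used it, the sketch below shows the userspace half of the mechanism being moved: a daemon registers an eventfd against memory.oom_control, optionally disables the kernel OOM killer, and then blocks until the kernel reports an OOM in the group. This is only an illustration of the long-standing v1 interface, not part of the patch; the cgroup path and name are hypothetical and error handling is trimmed.

#include <stdio.h>
#include <stdint.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/eventfd.h>

int main(void)
{
	/* Hypothetical v1 hierarchy mount point and cgroup name. */
	const char *grp = "/sys/fs/cgroup/memory/example";
	char path[128];
	char buf[64];

	/* The control file the notification is registered against. */
	snprintf(path, sizeof(path), "%s/memory.oom_control", grp);
	int oom_fd = open(path, O_RDONLY);

	/*
	 * Disabling the kernel OOM killer makes allocating tasks block
	 * (see mem_cgroup_oom_synchronize()) until userspace reacts.
	 */
	int ctl_fd = open(path, O_WRONLY);
	write(ctl_fd, "1", 1);

	int efd = eventfd(0, 0);

	/*
	 * Register "<event_fd> <control_fd>" with cgroup.event_control;
	 * in the kernel this lands in memcg_write_event_control().
	 */
	snprintf(path, sizeof(path), "%s/cgroup.event_control", grp);
	int ev_ctl = open(path, O_WRONLY);
	int n = snprintf(buf, sizeof(buf), "%d %d", efd, oom_fd);
	write(ev_ctl, buf, n);

	/*
	 * Blocks until the group hits an OOM; the wakeup comes from
	 * mem_cgroup_oom_notify() on the charge path.
	 */
	uint64_t events;
	read(efd, &events, sizeof(events));
	printf("OOM events: %llu\n", (unsigned long long)events);

	/*
	 * A real handler would now free memory or raise the limit, then
	 * re-enable the OOM killer by writing "0" to memory.oom_control.
	 */
	return 0;
}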
Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev>
Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
---
 mm/memcontrol-v1.c | 229 ++++++++++++++++++++++++++++++++++++++++++++-
 mm/memcontrol-v1.h |   3 +-
 mm/memcontrol.c    | 216 +-----------------------------------------
 3 files changed, 231 insertions(+), 217 deletions(-)

diff --git a/mm/memcontrol-v1.c b/mm/memcontrol-v1.c
index 8b8c2c9516349..6188b33ba03db 100644
--- a/mm/memcontrol-v1.c
+++ b/mm/memcontrol-v1.c
@@ -110,7 +110,13 @@ struct mem_cgroup_event {
 	struct work_struct remove;
 };
 
-extern spinlock_t memcg_oom_lock;
+#ifdef CONFIG_LOCKDEP
+static struct lockdep_map memcg_oom_lock_dep_map = {
+	.name = "memcg_oom_lock",
+};
+#endif
+
+DEFINE_SPINLOCK(memcg_oom_lock);
 
 static void __mem_cgroup_insert_exceeded(struct mem_cgroup_per_node *mz,
 					 struct mem_cgroup_tree_per_node *mctz,
@@ -1469,7 +1475,7 @@ static int mem_cgroup_oom_notify_cb(struct mem_cgroup *memcg)
 	return 0;
 }
 
-void mem_cgroup_oom_notify(struct mem_cgroup *memcg)
+static void mem_cgroup_oom_notify(struct mem_cgroup *memcg)
 {
 	struct mem_cgroup *iter;
 
@@ -1959,6 +1965,225 @@ void memcg1_css_offline(struct mem_cgroup *memcg)
 	spin_unlock_irq(&memcg->event_list_lock);
 }
 
+/*
+ * Check OOM-Killer is already running under our hierarchy.
+ * If someone is running, return false.
+ */
+static bool mem_cgroup_oom_trylock(struct mem_cgroup *memcg)
+{
+	struct mem_cgroup *iter, *failed = NULL;
+
+	spin_lock(&memcg_oom_lock);
+
+	for_each_mem_cgroup_tree(iter, memcg) {
+		if (iter->oom_lock) {
+			/*
+			 * this subtree of our hierarchy is already locked
+			 * so we cannot give a lock.
+			 */
+			failed = iter;
+			mem_cgroup_iter_break(memcg, iter);
+			break;
+		} else
+			iter->oom_lock = true;
+	}
+
+	if (failed) {
+		/*
+		 * OK, we failed to lock the whole subtree so we have
+		 * to clean up what we set up to the failing subtree
+		 */
+		for_each_mem_cgroup_tree(iter, memcg) {
+			if (iter == failed) {
+				mem_cgroup_iter_break(memcg, iter);
+				break;
+			}
+			iter->oom_lock = false;
+		}
+	} else
+		mutex_acquire(&memcg_oom_lock_dep_map, 0, 1, _RET_IP_);
+
+	spin_unlock(&memcg_oom_lock);
+
+	return !failed;
+}
+
+static void mem_cgroup_oom_unlock(struct mem_cgroup *memcg)
+{
+	struct mem_cgroup *iter;
+
+	spin_lock(&memcg_oom_lock);
+	mutex_release(&memcg_oom_lock_dep_map, _RET_IP_);
+	for_each_mem_cgroup_tree(iter, memcg)
+		iter->oom_lock = false;
+	spin_unlock(&memcg_oom_lock);
+}
+
+static void mem_cgroup_mark_under_oom(struct mem_cgroup *memcg)
+{
+	struct mem_cgroup *iter;
+
+	spin_lock(&memcg_oom_lock);
+	for_each_mem_cgroup_tree(iter, memcg)
+		iter->under_oom++;
+	spin_unlock(&memcg_oom_lock);
+}
+
+static void mem_cgroup_unmark_under_oom(struct mem_cgroup *memcg)
+{
+	struct mem_cgroup *iter;
+
+	/*
+	 * Be careful about under_oom underflows because a child memcg
+	 * could have been added after mem_cgroup_mark_under_oom.
+	 */
+	spin_lock(&memcg_oom_lock);
+	for_each_mem_cgroup_tree(iter, memcg)
+		if (iter->under_oom > 0)
+			iter->under_oom--;
+	spin_unlock(&memcg_oom_lock);
+}
+
+static DECLARE_WAIT_QUEUE_HEAD(memcg_oom_waitq);
+
+struct oom_wait_info {
+	struct mem_cgroup *memcg;
+	wait_queue_entry_t wait;
+};
+
+static int memcg_oom_wake_function(wait_queue_entry_t *wait,
+				   unsigned mode, int sync, void *arg)
+{
+	struct mem_cgroup *wake_memcg = (struct mem_cgroup *)arg;
+	struct mem_cgroup *oom_wait_memcg;
+	struct oom_wait_info *oom_wait_info;
+
+	oom_wait_info = container_of(wait, struct oom_wait_info, wait);
+	oom_wait_memcg = oom_wait_info->memcg;
+
+	if (!mem_cgroup_is_descendant(wake_memcg, oom_wait_memcg) &&
+	    !mem_cgroup_is_descendant(oom_wait_memcg, wake_memcg))
+		return 0;
+	return autoremove_wake_function(wait, mode, sync, arg);
+}
+
+void memcg_oom_recover(struct mem_cgroup *memcg)
+{
+	/*
+	 * For the following lockless ->under_oom test, the only required
+	 * guarantee is that it must see the state asserted by an OOM when
+	 * this function is called as a result of userland actions
+	 * triggered by the notification of the OOM.  This is trivially
+	 * achieved by invoking mem_cgroup_mark_under_oom() before
+	 * triggering notification.
+	 */
+	if (memcg && memcg->under_oom)
+		__wake_up(&memcg_oom_waitq, TASK_NORMAL, 0, memcg);
+}
+
+/**
+ * mem_cgroup_oom_synchronize - complete memcg OOM handling
+ * @handle: actually kill/wait or just clean up the OOM state
+ *
+ * This has to be called at the end of a page fault if the memcg OOM
+ * handler was enabled.
+ *
+ * Memcg supports userspace OOM handling where failed allocations must
+ * sleep on a waitqueue until the userspace task resolves the
+ * situation.  Sleeping directly in the charge context with all kinds
+ * of locks held is not a good idea, instead we remember an OOM state
+ * in the task and mem_cgroup_oom_synchronize() has to be called at
+ * the end of the page fault to complete the OOM handling.
+ *
+ * Returns %true if an ongoing memcg OOM situation was detected and
+ * completed, %false otherwise.
+ */
+bool mem_cgroup_oom_synchronize(bool handle)
+{
+	struct mem_cgroup *memcg = current->memcg_in_oom;
+	struct oom_wait_info owait;
+	bool locked;
+
+	/* OOM is global, do not handle */
+	if (!memcg)
+		return false;
+
+	if (!handle)
+		goto cleanup;
+
+	owait.memcg = memcg;
+	owait.wait.flags = 0;
+	owait.wait.func = memcg_oom_wake_function;
+	owait.wait.private = current;
+	INIT_LIST_HEAD(&owait.wait.entry);
+
+	prepare_to_wait(&memcg_oom_waitq, &owait.wait, TASK_KILLABLE);
+	mem_cgroup_mark_under_oom(memcg);
+
+	locked = mem_cgroup_oom_trylock(memcg);
+
+	if (locked)
+		mem_cgroup_oom_notify(memcg);
+
+	schedule();
+	mem_cgroup_unmark_under_oom(memcg);
+	finish_wait(&memcg_oom_waitq, &owait.wait);
+
+	if (locked)
+		mem_cgroup_oom_unlock(memcg);
+cleanup:
+	current->memcg_in_oom = NULL;
+	css_put(&memcg->css);
+	return true;
+}
+
+
+bool memcg1_oom_prepare(struct mem_cgroup *memcg, bool *locked)
+{
+	/*
+	 * We are in the middle of the charge context here, so we
+	 * don't want to block when potentially sitting on a callstack
+	 * that holds all kinds of filesystem and mm locks.
+	 *
+	 * cgroup1 allows disabling the OOM killer and waiting for outside
+	 * handling until the charge can succeed; remember the context and put
+	 * the task to sleep at the end of the page fault when all locks are
+	 * released.
+	 *
+	 * On the other hand, in-kernel OOM killer allows for an async victim
+	 * memory reclaim (oom_reaper) and that means that we are not solely
+	 * relying on the oom victim to make a forward progress and we can
+	 * invoke the oom killer here.
+	 *
+	 * Please note that mem_cgroup_out_of_memory might fail to find a
+	 * victim and then we have to bail out from the charge path.
+	 */
+	if (READ_ONCE(memcg->oom_kill_disable)) {
+		if (current->in_user_fault) {
+			css_get(&memcg->css);
+			current->memcg_in_oom = memcg;
+		}
+		return false;
+	}
+
+	mem_cgroup_mark_under_oom(memcg);
+
+	*locked = mem_cgroup_oom_trylock(memcg);
+
+	if (*locked)
+		mem_cgroup_oom_notify(memcg);
+
+	mem_cgroup_unmark_under_oom(memcg);
+
+	return true;
+}
+
+void memcg1_oom_finish(struct mem_cgroup *memcg, bool locked)
+{
+	if (locked)
+		mem_cgroup_oom_unlock(memcg);
+}
+
 static int __init memcg1_init(void)
 {
 	int node;
diff --git a/mm/memcontrol-v1.h b/mm/memcontrol-v1.h
index ef1b7037cbdcc..3de956b2422f1 100644
--- a/mm/memcontrol-v1.h
+++ b/mm/memcontrol-v1.h
@@ -87,9 +87,10 @@ enum res_type {
 bool mem_cgroup_event_ratelimit(struct mem_cgroup *memcg,
 				enum mem_cgroup_events_target target);
 unsigned long mem_cgroup_usage(struct mem_cgroup *memcg, bool swap);
-void mem_cgroup_oom_notify(struct mem_cgroup *memcg);
 ssize_t memcg_write_event_control(struct kernfs_open_file *of,
 				  char *buf, size_t nbytes, loff_t off);
 
+bool memcg1_oom_prepare(struct mem_cgroup *memcg, bool *locked);
+void memcg1_oom_finish(struct mem_cgroup *memcg, bool locked);
+
 #endif /* __MM_MEMCONTROL_V1_H */
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index c14b1b01bcf53..3b088e0078821 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1616,130 +1616,6 @@ static bool mem_cgroup_out_of_memory(struct mem_cgroup *memcg, gfp_t gfp_mask,
 	return ret;
 }
 
-#ifdef CONFIG_LOCKDEP
-static struct lockdep_map memcg_oom_lock_dep_map = {
-	.name = "memcg_oom_lock",
-};
-#endif
-
-DEFINE_SPINLOCK(memcg_oom_lock);
-
-/*
- * Check OOM-Killer is already running under our hierarchy.
- * If someone is running, return false.
- */
-static bool mem_cgroup_oom_trylock(struct mem_cgroup *memcg)
-{
-	struct mem_cgroup *iter, *failed = NULL;
-
-	spin_lock(&memcg_oom_lock);
-
-	for_each_mem_cgroup_tree(iter, memcg) {
-		if (iter->oom_lock) {
-			/*
-			 * this subtree of our hierarchy is already locked
-			 * so we cannot give a lock.
-			 */
-			failed = iter;
-			mem_cgroup_iter_break(memcg, iter);
-			break;
-		} else
-			iter->oom_lock = true;
-	}
-
-	if (failed) {
-		/*
-		 * OK, we failed to lock the whole subtree so we have
-		 * to clean up what we set up to the failing subtree
-		 */
-		for_each_mem_cgroup_tree(iter, memcg) {
-			if (iter == failed) {
-				mem_cgroup_iter_break(memcg, iter);
-				break;
-			}
-			iter->oom_lock = false;
-		}
-	} else
-		mutex_acquire(&memcg_oom_lock_dep_map, 0, 1, _RET_IP_);
-
-	spin_unlock(&memcg_oom_lock);
-
-	return !failed;
-}
-
-static void mem_cgroup_oom_unlock(struct mem_cgroup *memcg)
-{
-	struct mem_cgroup *iter;
-
-	spin_lock(&memcg_oom_lock);
-	mutex_release(&memcg_oom_lock_dep_map, _RET_IP_);
-	for_each_mem_cgroup_tree(iter, memcg)
-		iter->oom_lock = false;
-	spin_unlock(&memcg_oom_lock);
-}
-
-static void mem_cgroup_mark_under_oom(struct mem_cgroup *memcg)
-{
-	struct mem_cgroup *iter;
-
-	spin_lock(&memcg_oom_lock);
-	for_each_mem_cgroup_tree(iter, memcg)
-		iter->under_oom++;
-	spin_unlock(&memcg_oom_lock);
-}
-
-static void mem_cgroup_unmark_under_oom(struct mem_cgroup *memcg)
-{
-	struct mem_cgroup *iter;
-
-	/*
-	 * Be careful about under_oom underflows because a child memcg
-	 * could have been added after mem_cgroup_mark_under_oom.
-	 */
-	spin_lock(&memcg_oom_lock);
-	for_each_mem_cgroup_tree(iter, memcg)
-		if (iter->under_oom > 0)
-			iter->under_oom--;
-	spin_unlock(&memcg_oom_lock);
-}
-
-static DECLARE_WAIT_QUEUE_HEAD(memcg_oom_waitq);
-
-struct oom_wait_info {
-	struct mem_cgroup *memcg;
-	wait_queue_entry_t wait;
-};
-
-static int memcg_oom_wake_function(wait_queue_entry_t *wait,
-				   unsigned mode, int sync, void *arg)
-{
-	struct mem_cgroup *wake_memcg = (struct mem_cgroup *)arg;
-	struct mem_cgroup *oom_wait_memcg;
-	struct oom_wait_info *oom_wait_info;
-
-	oom_wait_info = container_of(wait, struct oom_wait_info, wait);
-	oom_wait_memcg = oom_wait_info->memcg;
-
-	if (!mem_cgroup_is_descendant(wake_memcg, oom_wait_memcg) &&
-	    !mem_cgroup_is_descendant(oom_wait_memcg, wake_memcg))
-		return 0;
-	return autoremove_wake_function(wait, mode, sync, arg);
-}
-
-void memcg_oom_recover(struct mem_cgroup *memcg)
-{
-	/*
-	 * For the following lockless ->under_oom test, the only required
-	 * guarantee is that it must see the state asserted by an OOM when
-	 * this function is called as a result of userland actions
-	 * triggered by the notification of the OOM.  This is trivially
-	 * achieved by invoking mem_cgroup_mark_under_oom() before
-	 * triggering notification.
-	 */
-	if (memcg && memcg->under_oom)
-		__wake_up(&memcg_oom_waitq, TASK_NORMAL, 0, memcg);
-}
-
 /*
  * Returns true if successfully killed one or more processes. Though in some
  * corner cases it can return true even without killing any process.
@@ -1753,104 +1629,16 @@ static bool mem_cgroup_oom(struct mem_cgroup *memcg, gfp_t mask, int order)
 
 	memcg_memory_event(memcg, MEMCG_OOM);
 
-	/*
-	 * We are in the middle of the charge context here, so we
-	 * don't want to block when potentially sitting on a callstack
-	 * that holds all kinds of filesystem and mm locks.
-	 *
-	 * cgroup1 allows disabling the OOM killer and waiting for outside
-	 * handling until the charge can succeed; remember the context and put
-	 * the task to sleep at the end of the page fault when all locks are
-	 * released.
-	 *
-	 * On the other hand, in-kernel OOM killer allows for an async victim
-	 * memory reclaim (oom_reaper) and that means that we are not solely
-	 * relying on the oom victim to make a forward progress and we can
-	 * invoke the oom killer here.
-	 *
-	 * Please note that mem_cgroup_out_of_memory might fail to find a
-	 * victim and then we have to bail out from the charge path.
-	 */
-	if (READ_ONCE(memcg->oom_kill_disable)) {
-		if (current->in_user_fault) {
-			css_get(&memcg->css);
-			current->memcg_in_oom = memcg;
-		}
+	if (!memcg1_oom_prepare(memcg, &locked))
 		return false;
-	}
-
-	mem_cgroup_mark_under_oom(memcg);
 
-	locked = mem_cgroup_oom_trylock(memcg);
-
-	if (locked)
-		mem_cgroup_oom_notify(memcg);
-
-	mem_cgroup_unmark_under_oom(memcg);
 
 	ret = mem_cgroup_out_of_memory(memcg, mask, order);
 
-	if (locked)
-		mem_cgroup_oom_unlock(memcg);
+	memcg1_oom_finish(memcg, locked);
 
 	return ret;
 }
 
-/**
- * mem_cgroup_oom_synchronize - complete memcg OOM handling
- * @handle: actually kill/wait or just clean up the OOM state
- *
- * This has to be called at the end of a page fault if the memcg OOM
- * handler was enabled.
- *
- * Memcg supports userspace OOM handling where failed allocations must
- * sleep on a waitqueue until the userspace task resolves the
- * situation.  Sleeping directly in the charge context with all kinds
- * of locks held is not a good idea, instead we remember an OOM state
- * in the task and mem_cgroup_oom_synchronize() has to be called at
- * the end of the page fault to complete the OOM handling.
- *
- * Returns %true if an ongoing memcg OOM situation was detected and
- * completed, %false otherwise.
- */
-bool mem_cgroup_oom_synchronize(bool handle)
-{
-	struct mem_cgroup *memcg = current->memcg_in_oom;
-	struct oom_wait_info owait;
-	bool locked;
-
-	/* OOM is global, do not handle */
-	if (!memcg)
-		return false;
-
-	if (!handle)
-		goto cleanup;
-
-	owait.memcg = memcg;
-	owait.wait.flags = 0;
-	owait.wait.func = memcg_oom_wake_function;
-	owait.wait.private = current;
-	INIT_LIST_HEAD(&owait.wait.entry);
-
-	prepare_to_wait(&memcg_oom_waitq, &owait.wait, TASK_KILLABLE);
-	mem_cgroup_mark_under_oom(memcg);
-
-	locked = mem_cgroup_oom_trylock(memcg);
-
-	if (locked)
-		mem_cgroup_oom_notify(memcg);
-
-	schedule();
-	mem_cgroup_unmark_under_oom(memcg);
-	finish_wait(&memcg_oom_waitq, &owait.wait);
-
-	if (locked)
-		mem_cgroup_oom_unlock(memcg);
-cleanup:
-	current->memcg_in_oom = NULL;
-	css_put(&memcg->css);
-	return true;
-}
-
 /**
  * mem_cgroup_get_oom_group - get a memory cgroup to clean up after OOM
  * @victim: task to be killed by the OOM killer

From patchwork Tue May 28 20:21:01 2024
X-Patchwork-Submitter: Roman Gushchin
X-Patchwork-Id: 13677357
From: Roman Gushchin <roman.gushchin@linux.dev>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Muchun Song <muchun.song@linux.dev>, Johannes Weiner <hannes@cmpxchg.org>, Michal Hocko <mhocko@kernel.org>, Shakeel Butt <shakeel.butt@linux.dev>, Matthew Wilcox <willy@infradead.org>, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Roman Gushchin <roman.gushchin@linux.dev>
Subject: [PATCH v1 09/14] mm: memcg: rename memcg_oom_recover()
Date: Tue, 28 May 2024 13:21:01 -0700
Message-ID: <20240528202101.3099300-10-roman.gushchin@linux.dev>
In-Reply-To: <20240528202101.3099300-1-roman.gushchin@linux.dev>
References: <20240528202101.3099300-1-roman.gushchin@linux.dev>

Rename memcg_oom_recover() to memcg1_oom_recover() for consistency with the other memory cgroup v1-related functions, and move its declaration in mm/memcontrol-v1.h next to the other memcg v1 OOM handling functions.
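To make the renamed helper's role concrete: re-enabling the OOM killer from userspace lands in mem_cgroup_oom_control_write(), which calls memcg1_oom_recover() to wake any task still parked on the OOM waitqueue. A minimal sketch of that trigger (hypothetical cgroup path, error handling omitted):

#include <fcntl.h>
#include <unistd.h>

int main(void)
{
	/*
	 * Hypothetical v1 group; writing "0" clears oom_kill_disable
	 * and, via memcg1_oom_recover(), wakes up any OOM waiters.
	 */
	int fd = open("/sys/fs/cgroup/memory/example/memory.oom_control",
		      O_WRONLY);

	if (fd < 0)
		return 1;
	write(fd, "0", 1);
	close(fd);
	return 0;
}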
Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev>
Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
---
 mm/memcontrol-v1.c | 6 +++---
 mm/memcontrol-v1.h | 2 +-
 mm/memcontrol.c    | 6 +++---
 3 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/mm/memcontrol-v1.c b/mm/memcontrol-v1.c
index 6188b33ba03db..f73551ff8b2d1 100644
--- a/mm/memcontrol-v1.c
+++ b/mm/memcontrol-v1.c
@@ -1090,8 +1090,8 @@ static void __mem_cgroup_clear_mc(void)
 		mc.moved_swap = 0;
 	}
 
-	memcg_oom_recover(from);
-	memcg_oom_recover(to);
+	memcg1_oom_recover(from);
+	memcg1_oom_recover(to);
 	wake_up_all(&mc.waitq);
 }
@@ -2067,7 +2067,7 @@ static int memcg_oom_wake_function(wait_queue_entry_t *wait,
 	return autoremove_wake_function(wait, mode, sync, arg);
 }
 
-void memcg_oom_recover(struct mem_cgroup *memcg)
+void memcg1_oom_recover(struct mem_cgroup *memcg)
 {
 	/*
 	 * For the following lockless ->under_oom test, the only required
diff --git a/mm/memcontrol-v1.h b/mm/memcontrol-v1.h
index 3de956b2422f1..972c493a8ae32 100644
--- a/mm/memcontrol-v1.h
+++ b/mm/memcontrol-v1.h
@@ -13,7 +13,6 @@ static inline void memcg1_soft_limit_reset(struct mem_cgroup *memcg)
 
 void mem_cgroup_charge_statistics(struct mem_cgroup *memcg, int nr_pages);
 void memcg1_check_events(struct mem_cgroup *memcg, int nid);
-void memcg_oom_recover(struct mem_cgroup *memcg);
 
 int try_charge_memcg(struct mem_cgroup *memcg, gfp_t gfp_mask,
 		     unsigned int nr_pages);
@@ -92,5 +91,6 @@ ssize_t memcg_write_event_control(struct kernfs_open_file *of,
 
 bool memcg1_oom_prepare(struct mem_cgroup *memcg, bool *locked);
 void memcg1_oom_finish(struct mem_cgroup *memcg, bool locked);
+void memcg1_oom_recover(struct mem_cgroup *memcg);
 
 #endif /* __MM_MEMCONTROL_V1_H */
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 3b088e0078821..1622086cab274 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -3169,7 +3169,7 @@ static int mem_cgroup_resize_max(struct mem_cgroup *memcg,
 	} while (true);
 
 	if (!ret && enlarge)
-		memcg_oom_recover(memcg);
+		memcg1_oom_recover(memcg);
 
 	return ret;
 }
@@ -3754,7 +3754,7 @@ static int mem_cgroup_oom_control_write(struct cgroup_subsys_state *css,
 
 	WRITE_ONCE(memcg->oom_kill_disable, val);
 	if (!val)
-		memcg_oom_recover(memcg);
+		memcg1_oom_recover(memcg);
 
 	return 0;
 }
@@ -5481,7 +5481,7 @@ static void uncharge_batch(const struct uncharge_gather *ug)
 			page_counter_uncharge(&ug->memcg->memsw, ug->nr_memory);
 		if (ug->nr_kmem)
 			memcg_account_kmem(ug->memcg, -ug->nr_kmem);
-		memcg_oom_recover(ug->memcg);
+		memcg1_oom_recover(ug->memcg);
 	}
 
 	local_irq_save(flags);

From patchwork Tue May 28 21:44:31 2024
X-Patchwork-Submitter: Roman Gushchin
X-Patchwork-Id: 13677489
From: Roman Gushchin <roman.gushchin@linux.dev>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Muchun Song <muchun.song@linux.dev>, Johannes Weiner <hannes@cmpxchg.org>, Michal Hocko <mhocko@kernel.org>, Shakeel Butt <shakeel.butt@linux.dev>, Matthew Wilcox <willy@infradead.org>, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Roman Gushchin <roman.gushchin@linux.dev>
Subject: [PATCH v1 10/14] mm: memcg: move cgroup v1 interface files to memcontrol-v1.c
Date: Tue, 28 May 2024 14:44:31 -0700
Message-ID: <20240528214435.3125304-1-roman.gushchin@linux.dev>
In-Reply-To: <20240528202101.3099300-1-roman.gushchin@linux.dev>
References: <20240528202101.3099300-1-roman.gushchin@linux.dev>

Move the legacy cgroup v1 memory controller interface files and the corresponding code into memcontrol-v1.c.
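A detail worth calling out in the moved code is the MEMFILE_PRIVATE() packing used by the legacy control files: the resource type and the attribute share the 32-bit cftype->private value, with the type in the high 16 bits. Below is a standalone, userspace-compilable sketch of the round trip; the enum values are stand-ins, since in the kernel they come from enum res_type and the RES_* constants in memcontrol-v1:

#include <assert.h>
#include <stdio.h>

/* Same packing as the macros moved into memcontrol-v1.c. */
#define MEMFILE_PRIVATE(x, val)	((x) << 16 | (val))
#define MEMFILE_TYPE(val)	((val) >> 16 & 0xffff)
#define MEMFILE_ATTR(val)	((val) & 0xffff)

/* Stand-in values for the kernel's enum res_type and RES_* enums. */
enum { _MEM = 0, _MEMSWAP = 1 };
enum { RES_USAGE = 0, RES_LIMIT = 1 };

int main(void)
{
	int priv = MEMFILE_PRIVATE(_MEMSWAP, RES_LIMIT);

	/* The type lands in the high 16 bits, the attribute in the low. */
	assert(MEMFILE_TYPE(priv) == _MEMSWAP);
	assert(MEMFILE_ATTR(priv) == RES_LIMIT);
	printf("private=0x%x type=%d attr=%d\n",
	       priv, MEMFILE_TYPE(priv), MEMFILE_ATTR(priv));
	return 0;
}

This is why a single read handler such as mem_cgroup_read_u64() can serve every memory.*, memsw.*, kmem.* and kmem.tcp.* file: it just unpacks the private value to pick the right page_counter and attribute.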
Signed-off-by: Roman Gushchin Acked-by: Shakeel Butt --- mm/memcontrol-v1.c | 739 ++++++++++++++++++++++++++++++++++++++++++++- mm/memcontrol-v1.h | 29 +- mm/memcontrol.c | 721 +------------------------------------------ 3 files changed, 767 insertions(+), 722 deletions(-) diff --git a/mm/memcontrol-v1.c b/mm/memcontrol-v1.c index f73551ff8b2d1..3972172f83624 100644 --- a/mm/memcontrol-v1.c +++ b/mm/memcontrol-v1.c @@ -10,6 +10,7 @@ #include #include #include +#include #include "internal.h" #include "swap.h" @@ -110,6 +111,18 @@ struct mem_cgroup_event { struct work_struct remove; }; +#define MEMFILE_PRIVATE(x, val) ((x) << 16 | (val)) +#define MEMFILE_TYPE(val) ((val) >> 16 & 0xffff) +#define MEMFILE_ATTR(val) ((val) & 0xffff) + +enum { + RES_USAGE, + RES_LIMIT, + RES_MAX_USAGE, + RES_FAILCNT, + RES_SOFT_LIMIT, +}; + #ifdef CONFIG_LOCKDEP static struct lockdep_map memcg_oom_lock_dep_map = { .name = "memcg_oom_lock", @@ -577,14 +590,14 @@ static inline int mem_cgroup_move_swap_account(swp_entry_t entry, } #endif -u64 mem_cgroup_move_charge_read(struct cgroup_subsys_state *css, +static u64 mem_cgroup_move_charge_read(struct cgroup_subsys_state *css, struct cftype *cft) { return mem_cgroup_from_css(css)->move_charge_at_immigrate; } #ifdef CONFIG_MMU -int mem_cgroup_move_charge_write(struct cgroup_subsys_state *css, +static int mem_cgroup_move_charge_write(struct cgroup_subsys_state *css, struct cftype *cft, u64 val) { struct mem_cgroup *memcg = mem_cgroup_from_css(css); @@ -606,7 +619,7 @@ int mem_cgroup_move_charge_write(struct cgroup_subsys_state *css, return 0; } #else -int mem_cgroup_move_charge_write(struct cgroup_subsys_state *css, +static int mem_cgroup_move_charge_write(struct cgroup_subsys_state *css, struct cftype *cft, u64 val) { return -ENOSYS; @@ -1803,8 +1816,8 @@ static void memcg_event_ptable_queue_proc(struct file *file, * Input must be in format ' '. * Interpretation of args is defined by control file implementation. */ -ssize_t memcg_write_event_control(struct kernfs_open_file *of, - char *buf, size_t nbytes, loff_t off) +static ssize_t memcg_write_event_control(struct kernfs_open_file *of, + char *buf, size_t nbytes, loff_t off) { struct cgroup_subsys_state *css = of_css(of); struct mem_cgroup *memcg = mem_cgroup_from_css(css); @@ -2184,6 +2197,722 @@ void memcg1_oom_finish(struct mem_cgroup *memcg, bool locked) mem_cgroup_oom_unlock(memcg); } +static DEFINE_MUTEX(memcg_max_mutex); + +static int mem_cgroup_resize_max(struct mem_cgroup *memcg, + unsigned long max, bool memsw) +{ + bool enlarge = false; + bool drained = false; + int ret; + bool limits_invariant; + struct page_counter *counter = memsw ? &memcg->memsw : &memcg->memory; + + do { + if (signal_pending(current)) { + ret = -EINTR; + break; + } + + mutex_lock(&memcg_max_mutex); + /* + * Make sure that the new limit (memsw or memory limit) doesn't + * break our basic invariant rule memory.max <= memsw.max. + */ + limits_invariant = memsw ? max >= READ_ONCE(memcg->memory.max) : + max <= memcg->memsw.max; + if (!limits_invariant) { + mutex_unlock(&memcg_max_mutex); + ret = -EINVAL; + break; + } + if (max > counter->max) + enlarge = true; + ret = page_counter_set_max(counter, max); + mutex_unlock(&memcg_max_mutex); + + if (!ret) + break; + + if (!drained) { + drain_all_stock(memcg); + drained = true; + continue; + } + + if (!try_to_free_mem_cgroup_pages(memcg, 1, GFP_KERNEL, + memsw ? 
0 : MEMCG_RECLAIM_MAY_SWAP, NULL)) { + ret = -EBUSY; + break; + } + } while (true); + + if (!ret && enlarge) + memcg1_oom_recover(memcg); + + return ret; +} + +/* + * Reclaims as many pages from the given memcg as possible. + * + * Caller is responsible for holding css reference for memcg. + */ +static int mem_cgroup_force_empty(struct mem_cgroup *memcg) +{ + int nr_retries = MAX_RECLAIM_RETRIES; + + /* we call try-to-free pages for make this cgroup empty */ + lru_add_drain_all(); + + drain_all_stock(memcg); + + /* try to free all pages in this cgroup */ + while (nr_retries && page_counter_read(&memcg->memory)) { + if (signal_pending(current)) + return -EINTR; + + if (!try_to_free_mem_cgroup_pages(memcg, 1, GFP_KERNEL, + MEMCG_RECLAIM_MAY_SWAP, NULL)) + nr_retries--; + } + + return 0; +} + +static ssize_t mem_cgroup_force_empty_write(struct kernfs_open_file *of, + char *buf, size_t nbytes, + loff_t off) +{ + struct mem_cgroup *memcg = mem_cgroup_from_css(of_css(of)); + + if (mem_cgroup_is_root(memcg)) + return -EINVAL; + return mem_cgroup_force_empty(memcg) ?: nbytes; +} + +static u64 mem_cgroup_hierarchy_read(struct cgroup_subsys_state *css, + struct cftype *cft) +{ + return 1; +} + +static int mem_cgroup_hierarchy_write(struct cgroup_subsys_state *css, + struct cftype *cft, u64 val) +{ + if (val == 1) + return 0; + + pr_warn_once("Non-hierarchical mode is deprecated. " + "Please report your usecase to linux-mm@kvack.org if you " + "depend on this functionality.\n"); + + return -EINVAL; +} + +static u64 mem_cgroup_read_u64(struct cgroup_subsys_state *css, + struct cftype *cft) +{ + struct mem_cgroup *memcg = mem_cgroup_from_css(css); + struct page_counter *counter; + + switch (MEMFILE_TYPE(cft->private)) { + case _MEM: + counter = &memcg->memory; + break; + case _MEMSWAP: + counter = &memcg->memsw; + break; + case _KMEM: + counter = &memcg->kmem; + break; + case _TCP: + counter = &memcg->tcpmem; + break; + default: + BUG(); + } + + switch (MEMFILE_ATTR(cft->private)) { + case RES_USAGE: + if (counter == &memcg->memory) + return (u64)mem_cgroup_usage(memcg, false) * PAGE_SIZE; + if (counter == &memcg->memsw) + return (u64)mem_cgroup_usage(memcg, true) * PAGE_SIZE; + return (u64)page_counter_read(counter) * PAGE_SIZE; + case RES_LIMIT: + return (u64)counter->max * PAGE_SIZE; + case RES_MAX_USAGE: + return (u64)counter->watermark * PAGE_SIZE; + case RES_FAILCNT: + return counter->failcnt; + case RES_SOFT_LIMIT: + return (u64)READ_ONCE(memcg->soft_limit) * PAGE_SIZE; + default: + BUG(); + } +} + +/* + * This function doesn't do anything useful. Its only job is to provide a read + * handler for a file so that cgroup_file_mode() will add read permissions. + */ +static int mem_cgroup_dummy_seq_show(__always_unused struct seq_file *m, + __always_unused void *v) +{ + return -EINVAL; +} + +static int memcg_update_tcp_max(struct mem_cgroup *memcg, unsigned long max) +{ + int ret; + + mutex_lock(&memcg_max_mutex); + + ret = page_counter_set_max(&memcg->tcpmem, max); + if (ret) + goto out; + + if (!memcg->tcpmem_active) { + /* + * The active flag needs to be written after the static_key + * update. This is what guarantees that the socket activation + * function is the last one to run. See mem_cgroup_sk_alloc() + * for details, and note that we don't mark any socket as + * belonging to this memcg until that flag is up. + * + * We need to do this, because static_keys will span multiple + * sites, but we can't control their order. 
If we mark a socket
+	 * as accounted, but the accounting functions are not patched in
+	 * yet, we'll lose accounting.
+	 *
+	 * We never race with the readers in mem_cgroup_sk_alloc(),
+	 * because when this value change, the code to process it is not
+	 * patched in yet.
+	 */
+		static_branch_inc(&memcg_sockets_enabled_key);
+		memcg->tcpmem_active = true;
+	}
+out:
+	mutex_unlock(&memcg_max_mutex);
+	return ret;
+}
+
+/*
+ * The user of this function is...
+ * RES_LIMIT.
+ */
+static ssize_t mem_cgroup_write(struct kernfs_open_file *of,
+				char *buf, size_t nbytes, loff_t off)
+{
+	struct mem_cgroup *memcg = mem_cgroup_from_css(of_css(of));
+	unsigned long nr_pages;
+	int ret;
+
+	buf = strstrip(buf);
+	ret = page_counter_memparse(buf, "-1", &nr_pages);
+	if (ret)
+		return ret;
+
+	switch (MEMFILE_ATTR(of_cft(of)->private)) {
+	case RES_LIMIT:
+		if (mem_cgroup_is_root(memcg)) { /* Can't set limit on root */
+			ret = -EINVAL;
+			break;
+		}
+		switch (MEMFILE_TYPE(of_cft(of)->private)) {
+		case _MEM:
+			ret = mem_cgroup_resize_max(memcg, nr_pages, false);
+			break;
+		case _MEMSWAP:
+			ret = mem_cgroup_resize_max(memcg, nr_pages, true);
+			break;
+		case _KMEM:
+			pr_warn_once("kmem.limit_in_bytes is deprecated and will be removed. "
+				     "Writing any value to this file has no effect. "
+				     "Please report your usecase to linux-mm@kvack.org if you "
+				     "depend on this functionality.\n");
+			ret = 0;
+			break;
+		case _TCP:
+			ret = memcg_update_tcp_max(memcg, nr_pages);
+			break;
+		}
+		break;
+	case RES_SOFT_LIMIT:
+		if (IS_ENABLED(CONFIG_PREEMPT_RT)) {
+			ret = -EOPNOTSUPP;
+		} else {
+			WRITE_ONCE(memcg->soft_limit, nr_pages);
+			ret = 0;
+		}
+		break;
+	}
+	return ret ?: nbytes;
+}
+
+static ssize_t mem_cgroup_reset(struct kernfs_open_file *of, char *buf,
+				size_t nbytes, loff_t off)
+{
+	struct mem_cgroup *memcg = mem_cgroup_from_css(of_css(of));
+	struct page_counter *counter;
+
+	switch (MEMFILE_TYPE(of_cft(of)->private)) {
+	case _MEM:
+		counter = &memcg->memory;
+		break;
+	case _MEMSWAP:
+		counter = &memcg->memsw;
+		break;
+	case _KMEM:
+		counter = &memcg->kmem;
+		break;
+	case _TCP:
+		counter = &memcg->tcpmem;
+		break;
+	default:
+		BUG();
+	}
+
+	switch (MEMFILE_ATTR(of_cft(of)->private)) {
+	case RES_MAX_USAGE:
+		page_counter_reset_watermark(counter);
+		break;
+	case RES_FAILCNT:
+		counter->failcnt = 0;
+		break;
+	default:
+		BUG();
+	}
+
+	return nbytes;
+}
+
+#ifdef CONFIG_NUMA
+
+#define LRU_ALL_FILE (BIT(LRU_INACTIVE_FILE) | BIT(LRU_ACTIVE_FILE))
+#define LRU_ALL_ANON (BIT(LRU_INACTIVE_ANON) | BIT(LRU_ACTIVE_ANON))
+#define LRU_ALL	     ((1 << NR_LRU_LISTS) - 1)
+
+static int memcg_numa_stat_show(struct seq_file *m, void *v)
+{
+	struct numa_stat {
+		const char *name;
+		unsigned int lru_mask;
+	};
+
+	static const struct numa_stat stats[] = {
+		{ "total", LRU_ALL },
+		{ "file", LRU_ALL_FILE },
+		{ "anon", LRU_ALL_ANON },
+		{ "unevictable", BIT(LRU_UNEVICTABLE) },
+	};
+	const struct numa_stat *stat;
+	int nid;
+	struct mem_cgroup *memcg = mem_cgroup_from_seq(m);
+
+	mem_cgroup_flush_stats(memcg);
+
+	for (stat = stats; stat < stats + ARRAY_SIZE(stats); stat++) {
+		seq_printf(m, "%s=%lu", stat->name,
+			   mem_cgroup_nr_lru_pages(memcg, stat->lru_mask,
+						   false));
+		for_each_node_state(nid, N_MEMORY)
+			seq_printf(m, " N%d=%lu", nid,
+				   mem_cgroup_node_nr_lru_pages(memcg, nid,
+								stat->lru_mask, false));
+		seq_putc(m, '\n');
+	}
+
+	for (stat = stats; stat < stats + ARRAY_SIZE(stats); stat++) {
+
+		seq_printf(m, "hierarchical_%s=%lu", stat->name,
+			   mem_cgroup_nr_lru_pages(memcg, stat->lru_mask,
+						   true));
+		for_each_node_state(nid, N_MEMORY)
+			seq_printf(m, " N%d=%lu", nid,
+				   mem_cgroup_node_nr_lru_pages(memcg, nid,
+								stat->lru_mask, true));
+		seq_putc(m, '\n');
+	}
+
+	return 0;
+}
+#endif /* CONFIG_NUMA */
+
+static const unsigned int memcg1_stats[] = {
+	NR_FILE_PAGES,
+	NR_ANON_MAPPED,
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+	NR_ANON_THPS,
+#endif
+	NR_SHMEM,
+	NR_FILE_MAPPED,
+	NR_FILE_DIRTY,
+	NR_WRITEBACK,
+	WORKINGSET_REFAULT_ANON,
+	WORKINGSET_REFAULT_FILE,
+#ifdef CONFIG_SWAP
+	MEMCG_SWAP,
+	NR_SWAPCACHE,
+#endif
+};
+
+static const char *const memcg1_stat_names[] = {
+	"cache",
+	"rss",
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+	"rss_huge",
+#endif
+	"shmem",
+	"mapped_file",
+	"dirty",
+	"writeback",
+	"workingset_refault_anon",
+	"workingset_refault_file",
+#ifdef CONFIG_SWAP
+	"swap",
+	"swapcached",
+#endif
+};
+
+/* Universal VM events cgroup1 shows, original sort order */
+static const unsigned int memcg1_events[] = {
+	PGPGIN,
+	PGPGOUT,
+	PGFAULT,
+	PGMAJFAULT,
+};
+
+void memcg1_stat_format(struct mem_cgroup *memcg, struct seq_buf *s)
+{
+	unsigned long memory, memsw;
+	struct mem_cgroup *mi;
+	unsigned int i;
+
+	BUILD_BUG_ON(ARRAY_SIZE(memcg1_stat_names) != ARRAY_SIZE(memcg1_stats));
+
+	mem_cgroup_flush_stats(memcg);
+
+	for (i = 0; i < ARRAY_SIZE(memcg1_stats); i++) {
+		unsigned long nr;
+
+		nr = memcg_page_state_local_output(memcg, memcg1_stats[i]);
+		seq_buf_printf(s, "%s %lu\n", memcg1_stat_names[i], nr);
+	}
+
+	for (i = 0; i < ARRAY_SIZE(memcg1_events); i++)
+		seq_buf_printf(s, "%s %lu\n", vm_event_name(memcg1_events[i]),
+			       memcg_events_local(memcg, memcg1_events[i]));
+
+	for (i = 0; i < NR_LRU_LISTS; i++)
+		seq_buf_printf(s, "%s %lu\n", lru_list_name(i),
+			       memcg_page_state_local(memcg, NR_LRU_BASE + i) *
+			       PAGE_SIZE);
+
+	/* Hierarchical information */
+	memory = memsw = PAGE_COUNTER_MAX;
+	for (mi = memcg; mi; mi = parent_mem_cgroup(mi)) {
+		memory = min(memory, READ_ONCE(mi->memory.max));
+		memsw = min(memsw, READ_ONCE(mi->memsw.max));
+	}
+	seq_buf_printf(s, "hierarchical_memory_limit %llu\n",
+		       (u64)memory * PAGE_SIZE);
+	seq_buf_printf(s, "hierarchical_memsw_limit %llu\n",
+		       (u64)memsw * PAGE_SIZE);
+
+	for (i = 0; i < ARRAY_SIZE(memcg1_stats); i++) {
+		unsigned long nr;
+
+		nr = memcg_page_state_output(memcg, memcg1_stats[i]);
+		seq_buf_printf(s, "total_%s %llu\n", memcg1_stat_names[i],
+			       (u64)nr);
+	}
+
+	for (i = 0; i < ARRAY_SIZE(memcg1_events); i++)
+		seq_buf_printf(s, "total_%s %llu\n",
+			       vm_event_name(memcg1_events[i]),
+
(u64)memcg_events(memcg, memcg1_events[i])); + + for (i = 0; i < NR_LRU_LISTS; i++) + seq_buf_printf(s, "total_%s %llu\n", lru_list_name(i), + (u64)memcg_page_state(memcg, NR_LRU_BASE + i) * + PAGE_SIZE); + +#ifdef CONFIG_DEBUG_VM + { + pg_data_t *pgdat; + struct mem_cgroup_per_node *mz; + unsigned long anon_cost = 0; + unsigned long file_cost = 0; + + for_each_online_pgdat(pgdat) { + mz = memcg->nodeinfo[pgdat->node_id]; + + anon_cost += mz->lruvec.anon_cost; + file_cost += mz->lruvec.file_cost; + } + seq_buf_printf(s, "anon_cost %lu\n", anon_cost); + seq_buf_printf(s, "file_cost %lu\n", file_cost); + } +#endif +} + +static u64 mem_cgroup_swappiness_read(struct cgroup_subsys_state *css, + struct cftype *cft) +{ + struct mem_cgroup *memcg = mem_cgroup_from_css(css); + + return mem_cgroup_swappiness(memcg); +} + +static int mem_cgroup_swappiness_write(struct cgroup_subsys_state *css, + struct cftype *cft, u64 val) +{ + struct mem_cgroup *memcg = mem_cgroup_from_css(css); + + if (val > MAX_SWAPPINESS) + return -EINVAL; + + if (!mem_cgroup_is_root(memcg)) + WRITE_ONCE(memcg->swappiness, val); + else + WRITE_ONCE(vm_swappiness, val); + + return 0; +} + +static int mem_cgroup_oom_control_read(struct seq_file *sf, void *v) +{ + struct mem_cgroup *memcg = mem_cgroup_from_seq(sf); + + seq_printf(sf, "oom_kill_disable %d\n", READ_ONCE(memcg->oom_kill_disable)); + seq_printf(sf, "under_oom %d\n", (bool)memcg->under_oom); + seq_printf(sf, "oom_kill %lu\n", + atomic_long_read(&memcg->memory_events[MEMCG_OOM_KILL])); + return 0; +} + +static int mem_cgroup_oom_control_write(struct cgroup_subsys_state *css, + struct cftype *cft, u64 val) +{ + struct mem_cgroup *memcg = mem_cgroup_from_css(css); + + /* cannot set to root cgroup and only 0 and 1 are allowed */ + if (mem_cgroup_is_root(memcg) || !((val == 0) || (val == 1))) + return -EINVAL; + + WRITE_ONCE(memcg->oom_kill_disable, val); + if (!val) + memcg1_oom_recover(memcg); + + return 0; +} + +#if defined(CONFIG_MEMCG_KMEM) && defined(CONFIG_SLUB_DEBUG) +static int mem_cgroup_slab_show(struct seq_file *m, void *p) +{ + /* + * Deprecated. + * Please, take a look at tools/cgroup/memcg_slabinfo.py . 
+ */ + return 0; +} +#endif + +struct cftype mem_cgroup_legacy_files[] = { + { + .name = "usage_in_bytes", + .private = MEMFILE_PRIVATE(_MEM, RES_USAGE), + .read_u64 = mem_cgroup_read_u64, + }, + { + .name = "max_usage_in_bytes", + .private = MEMFILE_PRIVATE(_MEM, RES_MAX_USAGE), + .write = mem_cgroup_reset, + .read_u64 = mem_cgroup_read_u64, + }, + { + .name = "limit_in_bytes", + .private = MEMFILE_PRIVATE(_MEM, RES_LIMIT), + .write = mem_cgroup_write, + .read_u64 = mem_cgroup_read_u64, + }, + { + .name = "soft_limit_in_bytes", + .private = MEMFILE_PRIVATE(_MEM, RES_SOFT_LIMIT), + .write = mem_cgroup_write, + .read_u64 = mem_cgroup_read_u64, + }, + { + .name = "failcnt", + .private = MEMFILE_PRIVATE(_MEM, RES_FAILCNT), + .write = mem_cgroup_reset, + .read_u64 = mem_cgroup_read_u64, + }, + { + .name = "stat", + .seq_show = memory_stat_show, + }, + { + .name = "force_empty", + .write = mem_cgroup_force_empty_write, + }, + { + .name = "use_hierarchy", + .write_u64 = mem_cgroup_hierarchy_write, + .read_u64 = mem_cgroup_hierarchy_read, + }, + { + .name = "cgroup.event_control", /* XXX: for compat */ + .write = memcg_write_event_control, + .flags = CFTYPE_NO_PREFIX | CFTYPE_WORLD_WRITABLE, + }, + { + .name = "swappiness", + .read_u64 = mem_cgroup_swappiness_read, + .write_u64 = mem_cgroup_swappiness_write, + }, + { + .name = "move_charge_at_immigrate", + .read_u64 = mem_cgroup_move_charge_read, + .write_u64 = mem_cgroup_move_charge_write, + }, + { + .name = "oom_control", + .seq_show = mem_cgroup_oom_control_read, + .write_u64 = mem_cgroup_oom_control_write, + }, + { + .name = "pressure_level", + .seq_show = mem_cgroup_dummy_seq_show, + }, +#ifdef CONFIG_NUMA + { + .name = "numa_stat", + .seq_show = memcg_numa_stat_show, + }, +#endif + { + .name = "kmem.limit_in_bytes", + .private = MEMFILE_PRIVATE(_KMEM, RES_LIMIT), + .write = mem_cgroup_write, + .read_u64 = mem_cgroup_read_u64, + }, + { + .name = "kmem.usage_in_bytes", + .private = MEMFILE_PRIVATE(_KMEM, RES_USAGE), + .read_u64 = mem_cgroup_read_u64, + }, + { + .name = "kmem.failcnt", + .private = MEMFILE_PRIVATE(_KMEM, RES_FAILCNT), + .write = mem_cgroup_reset, + .read_u64 = mem_cgroup_read_u64, + }, + { + .name = "kmem.max_usage_in_bytes", + .private = MEMFILE_PRIVATE(_KMEM, RES_MAX_USAGE), + .write = mem_cgroup_reset, + .read_u64 = mem_cgroup_read_u64, + }, +#if defined(CONFIG_MEMCG_KMEM) && defined(CONFIG_SLUB_DEBUG) + { + .name = "kmem.slabinfo", + .seq_show = mem_cgroup_slab_show, + }, +#endif + { + .name = "kmem.tcp.limit_in_bytes", + .private = MEMFILE_PRIVATE(_TCP, RES_LIMIT), + .write = mem_cgroup_write, + .read_u64 = mem_cgroup_read_u64, + }, + { + .name = "kmem.tcp.usage_in_bytes", + .private = MEMFILE_PRIVATE(_TCP, RES_USAGE), + .read_u64 = mem_cgroup_read_u64, + }, + { + .name = "kmem.tcp.failcnt", + .private = MEMFILE_PRIVATE(_TCP, RES_FAILCNT), + .write = mem_cgroup_reset, + .read_u64 = mem_cgroup_read_u64, + }, + { + .name = "kmem.tcp.max_usage_in_bytes", + .private = MEMFILE_PRIVATE(_TCP, RES_MAX_USAGE), + .write = mem_cgroup_reset, + .read_u64 = mem_cgroup_read_u64, + }, + { }, /* terminate */ +}; + +struct cftype memsw_files[] = { + { + .name = "memsw.usage_in_bytes", + .private = MEMFILE_PRIVATE(_MEMSWAP, RES_USAGE), + .read_u64 = mem_cgroup_read_u64, + }, + { + .name = "memsw.max_usage_in_bytes", + .private = MEMFILE_PRIVATE(_MEMSWAP, RES_MAX_USAGE), + .write = mem_cgroup_reset, + .read_u64 = mem_cgroup_read_u64, + }, + { + .name = "memsw.limit_in_bytes", + .private = MEMFILE_PRIVATE(_MEMSWAP, RES_LIMIT), + .write = 
mem_cgroup_write, + .read_u64 = mem_cgroup_read_u64, + }, + { + .name = "memsw.failcnt", + .private = MEMFILE_PRIVATE(_MEMSWAP, RES_FAILCNT), + .write = mem_cgroup_reset, + .read_u64 = mem_cgroup_read_u64, + }, + { }, /* terminate */ +}; + static int __init memcg1_init(void) { int node; diff --git a/mm/memcontrol-v1.h b/mm/memcontrol-v1.h index 972c493a8ae32..7be4670d9abbb 100644 --- a/mm/memcontrol-v1.h +++ b/mm/memcontrol-v1.h @@ -3,6 +3,8 @@ #ifndef __MM_MEMCONTROL_V1_H #define __MM_MEMCONTROL_V1_H +#include + void memcg1_update_tree(struct mem_cgroup *memcg, int nid); void memcg1_remove_from_trees(struct mem_cgroup *memcg); @@ -34,12 +36,6 @@ int memcg1_can_attach(struct cgroup_taskset *tset); void memcg1_cancel_attach(struct cgroup_taskset *tset); void memcg1_move_task(void); -struct cftype; -u64 mem_cgroup_move_charge_read(struct cgroup_subsys_state *css, - struct cftype *cft); -int mem_cgroup_move_charge_write(struct cgroup_subsys_state *css, - struct cftype *cft, u64 val); - /* * Per memcg event counter is incremented at every pagein/pageout. With THP, * it will be incremented by the number of pages. This counter is used @@ -86,11 +82,28 @@ enum res_type { bool mem_cgroup_event_ratelimit(struct mem_cgroup *memcg, enum mem_cgroup_events_target target); unsigned long mem_cgroup_usage(struct mem_cgroup *memcg, bool swap); -ssize_t memcg_write_event_control(struct kernfs_open_file *of, - char *buf, size_t nbytes, loff_t off); bool memcg1_oom_prepare(struct mem_cgroup *memcg, bool *locked); void memcg1_oom_finish(struct mem_cgroup *memcg, bool locked); void memcg1_oom_recover(struct mem_cgroup *memcg); +void drain_all_stock(struct mem_cgroup *root_memcg); +unsigned long mem_cgroup_nr_lru_pages(struct mem_cgroup *memcg, + unsigned int lru_mask, bool tree); +unsigned long mem_cgroup_node_nr_lru_pages(struct mem_cgroup *memcg, + int nid, unsigned int lru_mask, + bool tree); + +unsigned long memcg_events(struct mem_cgroup *memcg, int event); +unsigned long memcg_events_local(struct mem_cgroup *memcg, int event); +unsigned long memcg_page_state_local(struct mem_cgroup *memcg, int idx); +unsigned long memcg_page_state_output(struct mem_cgroup *memcg, int item); +unsigned long memcg_page_state_local_output(struct mem_cgroup *memcg, int item); +int memory_stat_show(struct seq_file *m, void *v); + +void memcg1_stat_format(struct mem_cgroup *memcg, struct seq_buf *s); + +extern struct cftype memsw_files[]; +extern struct cftype mem_cgroup_legacy_files[]; + #endif /* __MM_MEMCONTROL_V1_H */ diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 1622086cab274..8a28ba4ca93ea 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -96,10 +96,6 @@ static DECLARE_WAIT_QUEUE_HEAD(memcg_cgwb_frn_waitq); #define THRESHOLDS_EVENTS_TARGET 128 #define SOFTLIMIT_EVENTS_TARGET 1024 -#define MEMFILE_PRIVATE(x, val) ((x) << 16 | (val)) -#define MEMFILE_TYPE(val) ((val) >> 16 & 0xffff) -#define MEMFILE_ATTR(val) ((val) & 0xffff) - static inline bool task_is_dying(void) { return tsk_is_oom_victim(current) || fatal_signal_pending(current) || @@ -676,7 +672,7 @@ void __mod_memcg_state(struct mem_cgroup *memcg, enum memcg_stat_item idx, } /* idx can be of type enum memcg_stat_item or node_stat_item. 
*/ -static unsigned long memcg_page_state_local(struct mem_cgroup *memcg, int idx) +unsigned long memcg_page_state_local(struct mem_cgroup *memcg, int idx) { long x; int i = memcg_stats_index(idx); @@ -825,7 +821,7 @@ void __count_memcg_events(struct mem_cgroup *memcg, enum vm_event_item idx, memcg_stats_unlock(); } -static unsigned long memcg_events(struct mem_cgroup *memcg, int event) +unsigned long memcg_events(struct mem_cgroup *memcg, int event) { int i = memcg_events_index(event); @@ -835,7 +831,7 @@ static unsigned long memcg_events(struct mem_cgroup *memcg, int event) return READ_ONCE(memcg->vmstats->events[i]); } -static unsigned long memcg_events_local(struct mem_cgroup *memcg, int event) +unsigned long memcg_events_local(struct mem_cgroup *memcg, int event) { int i = memcg_events_index(event); @@ -1420,15 +1416,13 @@ static int memcg_page_state_output_unit(int item) } } -static inline unsigned long memcg_page_state_output(struct mem_cgroup *memcg, - int item) +unsigned long memcg_page_state_output(struct mem_cgroup *memcg, int item) { return memcg_page_state(memcg, item) * memcg_page_state_output_unit(item); } -static inline unsigned long memcg_page_state_local_output( - struct mem_cgroup *memcg, int item) +unsigned long memcg_page_state_local_output(struct mem_cgroup *memcg, int item) { return memcg_page_state_local(memcg, item) * memcg_page_state_output_unit(item); @@ -1487,8 +1481,6 @@ static void memcg_stat_format(struct mem_cgroup *memcg, struct seq_buf *s) WARN_ON_ONCE(seq_buf_has_overflowed(s)); } -static void memcg1_stat_format(struct mem_cgroup *memcg, struct seq_buf *s); - static void memory_stat_format(struct mem_cgroup *memcg, struct seq_buf *s) { if (cgroup_subsys_on_dfl(memory_cgrp_subsys)) @@ -1861,7 +1853,7 @@ static void refill_stock(struct mem_cgroup *memcg, unsigned int nr_pages) * Drains all per-CPU charge caches for given root_memcg resp. subtree * of the hierarchy under it. */ -static void drain_all_stock(struct mem_cgroup *root_memcg) +void drain_all_stock(struct mem_cgroup *root_memcg) { int cpu, curcpu; @@ -3117,120 +3109,6 @@ void split_page_memcg(struct page *head, int old_order, int new_order) css_get_many(&memcg->css, old_nr / new_nr - 1); } - -static DEFINE_MUTEX(memcg_max_mutex); - -static int mem_cgroup_resize_max(struct mem_cgroup *memcg, - unsigned long max, bool memsw) -{ - bool enlarge = false; - bool drained = false; - int ret; - bool limits_invariant; - struct page_counter *counter = memsw ? &memcg->memsw : &memcg->memory; - - do { - if (signal_pending(current)) { - ret = -EINTR; - break; - } - - mutex_lock(&memcg_max_mutex); - /* - * Make sure that the new limit (memsw or memory limit) doesn't - * break our basic invariant rule memory.max <= memsw.max. - */ - limits_invariant = memsw ? max >= READ_ONCE(memcg->memory.max) : - max <= memcg->memsw.max; - if (!limits_invariant) { - mutex_unlock(&memcg_max_mutex); - ret = -EINVAL; - break; - } - if (max > counter->max) - enlarge = true; - ret = page_counter_set_max(counter, max); - mutex_unlock(&memcg_max_mutex); - - if (!ret) - break; - - if (!drained) { - drain_all_stock(memcg); - drained = true; - continue; - } - - if (!try_to_free_mem_cgroup_pages(memcg, 1, GFP_KERNEL, - memsw ? 0 : MEMCG_RECLAIM_MAY_SWAP, NULL)) { - ret = -EBUSY; - break; - } - } while (true); - - if (!ret && enlarge) - memcg1_oom_recover(memcg); - - return ret; -} - -/* - * Reclaims as many pages from the given memcg as possible. - * - * Caller is responsible for holding css reference for memcg. 
- */ -static int mem_cgroup_force_empty(struct mem_cgroup *memcg) -{ - int nr_retries = MAX_RECLAIM_RETRIES; - - /* we call try-to-free pages for make this cgroup empty */ - lru_add_drain_all(); - - drain_all_stock(memcg); - - /* try to free all pages in this cgroup */ - while (nr_retries && page_counter_read(&memcg->memory)) { - if (signal_pending(current)) - return -EINTR; - - if (!try_to_free_mem_cgroup_pages(memcg, 1, GFP_KERNEL, - MEMCG_RECLAIM_MAY_SWAP, NULL)) - nr_retries--; - } - - return 0; -} - -static ssize_t mem_cgroup_force_empty_write(struct kernfs_open_file *of, - char *buf, size_t nbytes, - loff_t off) -{ - struct mem_cgroup *memcg = mem_cgroup_from_css(of_css(of)); - - if (mem_cgroup_is_root(memcg)) - return -EINVAL; - return mem_cgroup_force_empty(memcg) ?: nbytes; -} - -static u64 mem_cgroup_hierarchy_read(struct cgroup_subsys_state *css, - struct cftype *cft) -{ - return 1; -} - -static int mem_cgroup_hierarchy_write(struct cgroup_subsys_state *css, - struct cftype *cft, u64 val) -{ - if (val == 1) - return 0; - - pr_warn_once("Non-hierarchical mode is deprecated. " - "Please report your usecase to linux-mm@kvack.org if you " - "depend on this functionality.\n"); - - return -EINVAL; -} - unsigned long mem_cgroup_usage(struct mem_cgroup *memcg, bool swap) { unsigned long val; @@ -3253,67 +3131,6 @@ unsigned long mem_cgroup_usage(struct mem_cgroup *memcg, bool swap) return val; } -enum { - RES_USAGE, - RES_LIMIT, - RES_MAX_USAGE, - RES_FAILCNT, - RES_SOFT_LIMIT, -}; - -static u64 mem_cgroup_read_u64(struct cgroup_subsys_state *css, - struct cftype *cft) -{ - struct mem_cgroup *memcg = mem_cgroup_from_css(css); - struct page_counter *counter; - - switch (MEMFILE_TYPE(cft->private)) { - case _MEM: - counter = &memcg->memory; - break; - case _MEMSWAP: - counter = &memcg->memsw; - break; - case _KMEM: - counter = &memcg->kmem; - break; - case _TCP: - counter = &memcg->tcpmem; - break; - default: - BUG(); - } - - switch (MEMFILE_ATTR(cft->private)) { - case RES_USAGE: - if (counter == &memcg->memory) - return (u64)mem_cgroup_usage(memcg, false) * PAGE_SIZE; - if (counter == &memcg->memsw) - return (u64)mem_cgroup_usage(memcg, true) * PAGE_SIZE; - return (u64)page_counter_read(counter) * PAGE_SIZE; - case RES_LIMIT: - return (u64)counter->max * PAGE_SIZE; - case RES_MAX_USAGE: - return (u64)counter->watermark * PAGE_SIZE; - case RES_FAILCNT: - return counter->failcnt; - case RES_SOFT_LIMIT: - return (u64)READ_ONCE(memcg->soft_limit) * PAGE_SIZE; - default: - BUG(); - } -} - -/* - * This function doesn't do anything useful. Its only job is to provide a read - * handler for a file so that cgroup_file_mode() will add read permissions. - */ -static int mem_cgroup_dummy_seq_show(__always_unused struct seq_file *m, - __always_unused void *v) -{ - return -EINVAL; -} - #ifdef CONFIG_MEMCG_KMEM static int memcg_online_kmem(struct mem_cgroup *memcg) { @@ -3375,139 +3192,9 @@ static void memcg_offline_kmem(struct mem_cgroup *memcg) } #endif /* CONFIG_MEMCG_KMEM */ -static int memcg_update_tcp_max(struct mem_cgroup *memcg, unsigned long max) -{ - int ret; - - mutex_lock(&memcg_max_mutex); - - ret = page_counter_set_max(&memcg->tcpmem, max); - if (ret) - goto out; - - if (!memcg->tcpmem_active) { - /* - * The active flag needs to be written after the static_key - * update. This is what guarantees that the socket activation - * function is the last one to run. 
See mem_cgroup_sk_alloc() - * for details, and note that we don't mark any socket as - * belonging to this memcg until that flag is up. - * - * We need to do this, because static_keys will span multiple - * sites, but we can't control their order. If we mark a socket - * as accounted, but the accounting functions are not patched in - * yet, we'll lose accounting. - * - * We never race with the readers in mem_cgroup_sk_alloc(), - * because when this value change, the code to process it is not - * patched in yet. - */ - static_branch_inc(&memcg_sockets_enabled_key); - memcg->tcpmem_active = true; - } -out: - mutex_unlock(&memcg_max_mutex); - return ret; -} - -/* - * The user of this function is... - * RES_LIMIT. - */ -static ssize_t mem_cgroup_write(struct kernfs_open_file *of, - char *buf, size_t nbytes, loff_t off) -{ - struct mem_cgroup *memcg = mem_cgroup_from_css(of_css(of)); - unsigned long nr_pages; - int ret; - - buf = strstrip(buf); - ret = page_counter_memparse(buf, "-1", &nr_pages); - if (ret) - return ret; - - switch (MEMFILE_ATTR(of_cft(of)->private)) { - case RES_LIMIT: - if (mem_cgroup_is_root(memcg)) { /* Can't set limit on root */ - ret = -EINVAL; - break; - } - switch (MEMFILE_TYPE(of_cft(of)->private)) { - case _MEM: - ret = mem_cgroup_resize_max(memcg, nr_pages, false); - break; - case _MEMSWAP: - ret = mem_cgroup_resize_max(memcg, nr_pages, true); - break; - case _KMEM: - pr_warn_once("kmem.limit_in_bytes is deprecated and will be removed. " - "Writing any value to this file has no effect. " - "Please report your usecase to linux-mm@kvack.org if you " - "depend on this functionality.\n"); - ret = 0; - break; - case _TCP: - ret = memcg_update_tcp_max(memcg, nr_pages); - break; - } - break; - case RES_SOFT_LIMIT: - if (IS_ENABLED(CONFIG_PREEMPT_RT)) { - ret = -EOPNOTSUPP; - } else { - WRITE_ONCE(memcg->soft_limit, nr_pages); - ret = 0; - } - break; - } - return ret ?: nbytes; -} - -static ssize_t mem_cgroup_reset(struct kernfs_open_file *of, char *buf, - size_t nbytes, loff_t off) -{ - struct mem_cgroup *memcg = mem_cgroup_from_css(of_css(of)); - struct page_counter *counter; - - switch (MEMFILE_TYPE(of_cft(of)->private)) { - case _MEM: - counter = &memcg->memory; - break; - case _MEMSWAP: - counter = &memcg->memsw; - break; - case _KMEM: - counter = &memcg->kmem; - break; - case _TCP: - counter = &memcg->tcpmem; - break; - default: - BUG(); - } - - switch (MEMFILE_ATTR(of_cft(of)->private)) { - case RES_MAX_USAGE: - page_counter_reset_watermark(counter); - break; - case RES_FAILCNT: - counter->failcnt = 0; - break; - default: - BUG(); - } - - return nbytes; -} - -#ifdef CONFIG_NUMA - -#define LRU_ALL_FILE (BIT(LRU_INACTIVE_FILE) | BIT(LRU_ACTIVE_FILE)) -#define LRU_ALL_ANON (BIT(LRU_INACTIVE_ANON) | BIT(LRU_ACTIVE_ANON)) -#define LRU_ALL ((1 << NR_LRU_LISTS) - 1) - -static unsigned long mem_cgroup_node_nr_lru_pages(struct mem_cgroup *memcg, - int nid, unsigned int lru_mask, bool tree) +unsigned long mem_cgroup_node_nr_lru_pages(struct mem_cgroup *memcg, + int nid, unsigned int lru_mask, + bool tree) { struct lruvec *lruvec = mem_cgroup_lruvec(memcg, NODE_DATA(nid)); unsigned long nr = 0; @@ -3526,9 +3213,8 @@ static unsigned long mem_cgroup_node_nr_lru_pages(struct mem_cgroup *memcg, return nr; } -static unsigned long mem_cgroup_nr_lru_pages(struct mem_cgroup *memcg, - unsigned int lru_mask, - bool tree) +unsigned long mem_cgroup_nr_lru_pages(struct mem_cgroup *memcg, + unsigned int lru_mask, bool tree) { unsigned long nr = 0; enum lru_list lru; @@ -3544,221 +3230,6 @@ 
static unsigned long mem_cgroup_nr_lru_pages(struct mem_cgroup *memcg, return nr; } -static int memcg_numa_stat_show(struct seq_file *m, void *v) -{ - struct numa_stat { - const char *name; - unsigned int lru_mask; - }; - - static const struct numa_stat stats[] = { - { "total", LRU_ALL }, - { "file", LRU_ALL_FILE }, - { "anon", LRU_ALL_ANON }, - { "unevictable", BIT(LRU_UNEVICTABLE) }, - }; - const struct numa_stat *stat; - int nid; - struct mem_cgroup *memcg = mem_cgroup_from_seq(m); - - mem_cgroup_flush_stats(memcg); - - for (stat = stats; stat < stats + ARRAY_SIZE(stats); stat++) { - seq_printf(m, "%s=%lu", stat->name, - mem_cgroup_nr_lru_pages(memcg, stat->lru_mask, - false)); - for_each_node_state(nid, N_MEMORY) - seq_printf(m, " N%d=%lu", nid, - mem_cgroup_node_nr_lru_pages(memcg, nid, - stat->lru_mask, false)); - seq_putc(m, '\n'); - } - - for (stat = stats; stat < stats + ARRAY_SIZE(stats); stat++) { - - seq_printf(m, "hierarchical_%s=%lu", stat->name, - mem_cgroup_nr_lru_pages(memcg, stat->lru_mask, - true)); - for_each_node_state(nid, N_MEMORY) - seq_printf(m, " N%d=%lu", nid, - mem_cgroup_node_nr_lru_pages(memcg, nid, - stat->lru_mask, true)); - seq_putc(m, '\n'); - } - - return 0; -} -#endif /* CONFIG_NUMA */ - -static const unsigned int memcg1_stats[] = { - NR_FILE_PAGES, - NR_ANON_MAPPED, -#ifdef CONFIG_TRANSPARENT_HUGEPAGE - NR_ANON_THPS, -#endif - NR_SHMEM, - NR_FILE_MAPPED, - NR_FILE_DIRTY, - NR_WRITEBACK, - WORKINGSET_REFAULT_ANON, - WORKINGSET_REFAULT_FILE, -#ifdef CONFIG_SWAP - MEMCG_SWAP, - NR_SWAPCACHE, -#endif -}; - -static const char *const memcg1_stat_names[] = { - "cache", - "rss", -#ifdef CONFIG_TRANSPARENT_HUGEPAGE - "rss_huge", -#endif - "shmem", - "mapped_file", - "dirty", - "writeback", - "workingset_refault_anon", - "workingset_refault_file", -#ifdef CONFIG_SWAP - "swap", - "swapcached", -#endif -}; - -/* Universal VM events cgroup1 shows, original sort order */ -static const unsigned int memcg1_events[] = { - PGPGIN, - PGPGOUT, - PGFAULT, - PGMAJFAULT, -}; - -static void memcg1_stat_format(struct mem_cgroup *memcg, struct seq_buf *s) -{ - unsigned long memory, memsw; - struct mem_cgroup *mi; - unsigned int i; - - BUILD_BUG_ON(ARRAY_SIZE(memcg1_stat_names) != ARRAY_SIZE(memcg1_stats)); - - mem_cgroup_flush_stats(memcg); - - for (i = 0; i < ARRAY_SIZE(memcg1_stats); i++) { - unsigned long nr; - - nr = memcg_page_state_local_output(memcg, memcg1_stats[i]); - seq_buf_printf(s, "%s %lu\n", memcg1_stat_names[i], nr); - } - - for (i = 0; i < ARRAY_SIZE(memcg1_events); i++) - seq_buf_printf(s, "%s %lu\n", vm_event_name(memcg1_events[i]), - memcg_events_local(memcg, memcg1_events[i])); - - for (i = 0; i < NR_LRU_LISTS; i++) - seq_buf_printf(s, "%s %lu\n", lru_list_name(i), - memcg_page_state_local(memcg, NR_LRU_BASE + i) * - PAGE_SIZE); - - /* Hierarchical information */ - memory = memsw = PAGE_COUNTER_MAX; - for (mi = memcg; mi; mi = parent_mem_cgroup(mi)) { - memory = min(memory, READ_ONCE(mi->memory.max)); - memsw = min(memsw, READ_ONCE(mi->memsw.max)); - } - seq_buf_printf(s, "hierarchical_memory_limit %llu\n", - (u64)memory * PAGE_SIZE); - seq_buf_printf(s, "hierarchical_memsw_limit %llu\n", - (u64)memsw * PAGE_SIZE); - - for (i = 0; i < ARRAY_SIZE(memcg1_stats); i++) { - unsigned long nr; - - nr = memcg_page_state_output(memcg, memcg1_stats[i]); - seq_buf_printf(s, "total_%s %llu\n", memcg1_stat_names[i], - (u64)nr); - } - - for (i = 0; i < ARRAY_SIZE(memcg1_events); i++) - seq_buf_printf(s, "total_%s %llu\n", - vm_event_name(memcg1_events[i]), - 
(u64)memcg_events(memcg, memcg1_events[i])); - - for (i = 0; i < NR_LRU_LISTS; i++) - seq_buf_printf(s, "total_%s %llu\n", lru_list_name(i), - (u64)memcg_page_state(memcg, NR_LRU_BASE + i) * - PAGE_SIZE); - -#ifdef CONFIG_DEBUG_VM - { - pg_data_t *pgdat; - struct mem_cgroup_per_node *mz; - unsigned long anon_cost = 0; - unsigned long file_cost = 0; - - for_each_online_pgdat(pgdat) { - mz = memcg->nodeinfo[pgdat->node_id]; - - anon_cost += mz->lruvec.anon_cost; - file_cost += mz->lruvec.file_cost; - } - seq_buf_printf(s, "anon_cost %lu\n", anon_cost); - seq_buf_printf(s, "file_cost %lu\n", file_cost); - } -#endif -} - -static u64 mem_cgroup_swappiness_read(struct cgroup_subsys_state *css, - struct cftype *cft) -{ - struct mem_cgroup *memcg = mem_cgroup_from_css(css); - - return mem_cgroup_swappiness(memcg); -} - -static int mem_cgroup_swappiness_write(struct cgroup_subsys_state *css, - struct cftype *cft, u64 val) -{ - struct mem_cgroup *memcg = mem_cgroup_from_css(css); - - if (val > MAX_SWAPPINESS) - return -EINVAL; - - if (!mem_cgroup_is_root(memcg)) - WRITE_ONCE(memcg->swappiness, val); - else - WRITE_ONCE(vm_swappiness, val); - - return 0; -} - -static int mem_cgroup_oom_control_read(struct seq_file *sf, void *v) -{ - struct mem_cgroup *memcg = mem_cgroup_from_seq(sf); - - seq_printf(sf, "oom_kill_disable %d\n", READ_ONCE(memcg->oom_kill_disable)); - seq_printf(sf, "under_oom %d\n", (bool)memcg->under_oom); - seq_printf(sf, "oom_kill %lu\n", - atomic_long_read(&memcg->memory_events[MEMCG_OOM_KILL])); - return 0; -} - -static int mem_cgroup_oom_control_write(struct cgroup_subsys_state *css, - struct cftype *cft, u64 val) -{ - struct mem_cgroup *memcg = mem_cgroup_from_css(css); - - /* cannot set to root cgroup and only 0 and 1 are allowed */ - if (mem_cgroup_is_root(memcg) || !((val == 0) || (val == 1))) - return -EINVAL; - - WRITE_ONCE(memcg->oom_kill_disable, val); - if (!val) - memcg1_oom_recover(memcg); - - return 0; -} - #ifdef CONFIG_CGROUP_WRITEBACK #include @@ -3972,147 +3443,6 @@ static void memcg_wb_domain_size_changed(struct mem_cgroup *memcg) #endif /* CONFIG_CGROUP_WRITEBACK */ -#if defined(CONFIG_MEMCG_KMEM) && defined(CONFIG_SLUB_DEBUG) -static int mem_cgroup_slab_show(struct seq_file *m, void *p) -{ - /* - * Deprecated. - * Please, take a look at tools/cgroup/memcg_slabinfo.py . 
- */ - return 0; -} -#endif - -static int memory_stat_show(struct seq_file *m, void *v); - -static struct cftype mem_cgroup_legacy_files[] = { - { - .name = "usage_in_bytes", - .private = MEMFILE_PRIVATE(_MEM, RES_USAGE), - .read_u64 = mem_cgroup_read_u64, - }, - { - .name = "max_usage_in_bytes", - .private = MEMFILE_PRIVATE(_MEM, RES_MAX_USAGE), - .write = mem_cgroup_reset, - .read_u64 = mem_cgroup_read_u64, - }, - { - .name = "limit_in_bytes", - .private = MEMFILE_PRIVATE(_MEM, RES_LIMIT), - .write = mem_cgroup_write, - .read_u64 = mem_cgroup_read_u64, - }, - { - .name = "soft_limit_in_bytes", - .private = MEMFILE_PRIVATE(_MEM, RES_SOFT_LIMIT), - .write = mem_cgroup_write, - .read_u64 = mem_cgroup_read_u64, - }, - { - .name = "failcnt", - .private = MEMFILE_PRIVATE(_MEM, RES_FAILCNT), - .write = mem_cgroup_reset, - .read_u64 = mem_cgroup_read_u64, - }, - { - .name = "stat", - .seq_show = memory_stat_show, - }, - { - .name = "force_empty", - .write = mem_cgroup_force_empty_write, - }, - { - .name = "use_hierarchy", - .write_u64 = mem_cgroup_hierarchy_write, - .read_u64 = mem_cgroup_hierarchy_read, - }, - { - .name = "cgroup.event_control", /* XXX: for compat */ - .write = memcg_write_event_control, - .flags = CFTYPE_NO_PREFIX | CFTYPE_WORLD_WRITABLE, - }, - { - .name = "swappiness", - .read_u64 = mem_cgroup_swappiness_read, - .write_u64 = mem_cgroup_swappiness_write, - }, - { - .name = "move_charge_at_immigrate", - .read_u64 = mem_cgroup_move_charge_read, - .write_u64 = mem_cgroup_move_charge_write, - }, - { - .name = "oom_control", - .seq_show = mem_cgroup_oom_control_read, - .write_u64 = mem_cgroup_oom_control_write, - }, - { - .name = "pressure_level", - .seq_show = mem_cgroup_dummy_seq_show, - }, -#ifdef CONFIG_NUMA - { - .name = "numa_stat", - .seq_show = memcg_numa_stat_show, - }, -#endif - { - .name = "kmem.limit_in_bytes", - .private = MEMFILE_PRIVATE(_KMEM, RES_LIMIT), - .write = mem_cgroup_write, - .read_u64 = mem_cgroup_read_u64, - }, - { - .name = "kmem.usage_in_bytes", - .private = MEMFILE_PRIVATE(_KMEM, RES_USAGE), - .read_u64 = mem_cgroup_read_u64, - }, - { - .name = "kmem.failcnt", - .private = MEMFILE_PRIVATE(_KMEM, RES_FAILCNT), - .write = mem_cgroup_reset, - .read_u64 = mem_cgroup_read_u64, - }, - { - .name = "kmem.max_usage_in_bytes", - .private = MEMFILE_PRIVATE(_KMEM, RES_MAX_USAGE), - .write = mem_cgroup_reset, - .read_u64 = mem_cgroup_read_u64, - }, -#if defined(CONFIG_MEMCG_KMEM) && defined(CONFIG_SLUB_DEBUG) - { - .name = "kmem.slabinfo", - .seq_show = mem_cgroup_slab_show, - }, -#endif - { - .name = "kmem.tcp.limit_in_bytes", - .private = MEMFILE_PRIVATE(_TCP, RES_LIMIT), - .write = mem_cgroup_write, - .read_u64 = mem_cgroup_read_u64, - }, - { - .name = "kmem.tcp.usage_in_bytes", - .private = MEMFILE_PRIVATE(_TCP, RES_USAGE), - .read_u64 = mem_cgroup_read_u64, - }, - { - .name = "kmem.tcp.failcnt", - .private = MEMFILE_PRIVATE(_TCP, RES_FAILCNT), - .write = mem_cgroup_reset, - .read_u64 = mem_cgroup_read_u64, - }, - { - .name = "kmem.tcp.max_usage_in_bytes", - .private = MEMFILE_PRIVATE(_TCP, RES_MAX_USAGE), - .write = mem_cgroup_reset, - .read_u64 = mem_cgroup_read_u64, - }, - { }, /* terminate */ -}; - /* * Private memory cgroup IDR * @@ -4904,7 +4234,7 @@ static int memory_events_local_show(struct seq_file *m, void *v) return 0; } -static int memory_stat_show(struct seq_file *m, void *v) +int memory_stat_show(struct seq_file *m, void *v) { struct mem_cgroup *memcg = mem_cgroup_from_seq(m); char *buf = kmalloc(PAGE_SIZE, GFP_KERNEL); @@ -6135,33 +5465,6 @@ 
static struct cftype swap_files[] = { { } /* terminate */ }; -static struct cftype memsw_files[] = { - { - .name = "memsw.usage_in_bytes", - .private = MEMFILE_PRIVATE(_MEMSWAP, RES_USAGE), - .read_u64 = mem_cgroup_read_u64, - }, - { - .name = "memsw.max_usage_in_bytes", - .private = MEMFILE_PRIVATE(_MEMSWAP, RES_MAX_USAGE), - .write = mem_cgroup_reset, - .read_u64 = mem_cgroup_read_u64, - }, - { - .name = "memsw.limit_in_bytes", - .private = MEMFILE_PRIVATE(_MEMSWAP, RES_LIMIT), - .write = mem_cgroup_write, - .read_u64 = mem_cgroup_read_u64, - }, - { - .name = "memsw.failcnt", - .private = MEMFILE_PRIVATE(_MEMSWAP, RES_FAILCNT), - .write = mem_cgroup_reset, - .read_u64 = mem_cgroup_read_u64, - }, - { }, /* terminate */ -}; - #if defined(CONFIG_MEMCG_KMEM) && defined(CONFIG_ZSWAP) /** * obj_cgroup_may_zswap - check if this cgroup can zswap

From patchwork Tue May 28 21:44:32 2024
X-Patchwork-Submitter: Roman Gushchin
X-Patchwork-Id: 13677488
From: Roman Gushchin
To: Andrew Morton
Cc: Muchun Song, Johannes Weiner, Michal Hocko, Shakeel Butt, Matthew Wilcox, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Roman Gushchin
Subject: [PATCH v1 11/14] mm: memcg: make memcg1_update_tree() static
Date: Tue, 28 May 2024 14:44:32 -0700
Message-ID: <20240528214435.3125304-2-roman.gushchin@linux.dev>
In-Reply-To: <20240528214435.3125304-1-roman.gushchin@linux.dev>
References: <20240528202101.3099300-1-roman.gushchin@linux.dev> <20240528214435.3125304-1-roman.gushchin@linux.dev>

memcg1_update_tree() is no longer used outside of mm/memcontrol-v1.c, so define it as static and remove the declaration from the header file.
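For readers less familiar with the idiom, here is a minimal sketch of what "make it static" buys (hypothetical file and function names, plain userspace C rather than the kernel sources): once the last outside caller is gone, the declaration leaves the shared header, the definition gains internal linkage, and the compiler can both optimize more aggressively and warn if the function ever becomes dead code.

/* util.c -- hypothetical example. Previously a shared header carried
 * "void helper(int v);" so that any file could call it. */
#include <stdio.h>

/* internal linkage: helper() is now visible only inside this file */
static void helper(int v)
{
	printf("updating tree for node %d\n", v);
}

int main(void)
{
	helper(0);	/* the sole remaining caller */
	return 0;
}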
Signed-off-by: Roman Gushchin
Acked-by: Shakeel Butt
---
 mm/memcontrol-v1.c | 2 +- mm/memcontrol-v1.h | 1 - 2 files changed, 1 insertion(+), 2 deletions(-) diff --git a/mm/memcontrol-v1.c b/mm/memcontrol-v1.c index 3972172f83624..b69181322fb18 100644 --- a/mm/memcontrol-v1.c +++ b/mm/memcontrol-v1.c @@ -201,7 +201,7 @@ static unsigned long soft_limit_excess(struct mem_cgroup *memcg) return excess; } -void memcg1_update_tree(struct mem_cgroup *memcg, int nid) +static void memcg1_update_tree(struct mem_cgroup *memcg, int nid) { unsigned long excess; struct mem_cgroup_per_node *mz; diff --git a/mm/memcontrol-v1.h b/mm/memcontrol-v1.h index 7be4670d9abbb..7d6ac4a4fb36d 100644 --- a/mm/memcontrol-v1.h +++ b/mm/memcontrol-v1.h @@ -5,7 +5,6 @@ #include -void memcg1_update_tree(struct mem_cgroup *memcg, int nid); void memcg1_remove_from_trees(struct mem_cgroup *memcg); static inline void memcg1_soft_limit_reset(struct mem_cgroup *memcg)

From patchwork Tue May 28 21:44:33 2024
X-Patchwork-Submitter: Roman Gushchin
X-Patchwork-Id: 13677490
From: Roman Gushchin
To: Andrew Morton
Cc: Muchun Song, Johannes Weiner, Michal Hocko, Shakeel Butt, Matthew Wilcox, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Roman Gushchin
Subject: [PATCH v1 12/14] mm: memcg: group cgroup v1 memcg related declarations
Date: Tue, 28 May 2024 14:44:33 -0700
Message-ID: <20240528214435.3125304-3-roman.gushchin@linux.dev>
In-Reply-To: <20240528214435.3125304-1-roman.gushchin@linux.dev>
References: <20240528202101.3099300-1-roman.gushchin@linux.dev> <20240528214435.3125304-1-roman.gushchin@linux.dev>

Group all cgroup v1-related declarations at the end of memcontrol.h and mm/memcontrol-v1.h, with the intention of putting them all under a config option later on. This should also make things easier to follow and maintain.
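As a rough illustration (hypothetical names, not the actual kernel headers), the layout this patch converges on looks like the sketch below: shared declarations first, v1-only declarations grouped at the end, so that a follow-up patch needs only a single #ifdef to fence the whole block off.

/* sketch.h -- hypothetical header mirroring the intended layout */
#ifndef SKETCH_H
#define SKETCH_H

/* Cgroup v1 and v2 common declarations */
unsigned long common_usage(void);
int common_stat_show(void);

/* Cgroup v1-specific declarations -- grouped on purpose, so one
 * "#ifdef CONFIG_FOO_V1 ... #endif" pair can later cover them all */
void v1_check_events(int nid);
void v1_oom_recover(void);

#endif /* SKETCH_H */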
Signed-off-by: Roman Gushchin Acked-by: Shakeel Butt --- include/linux/memcontrol.h | 144 +++++++++++++++++++------------------ mm/memcontrol-v1.h | 89 ++++++++++++----------- 2 files changed, 123 insertions(+), 110 deletions(-) diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h index 394b92b620d9c..7474b20a5d3f5 100644 --- a/include/linux/memcontrol.h +++ b/include/linux/memcontrol.h @@ -944,39 +944,13 @@ static inline void mem_cgroup_exit_user_fault(void) current->in_user_fault = 0; } -static inline bool task_in_memcg_oom(struct task_struct *p) -{ - return p->memcg_in_oom; -} - -bool mem_cgroup_oom_synchronize(bool wait); struct mem_cgroup *mem_cgroup_get_oom_group(struct task_struct *victim, struct mem_cgroup *oom_domain); void mem_cgroup_print_oom_group(struct mem_cgroup *memcg); -void folio_memcg_lock(struct folio *folio); -void folio_memcg_unlock(struct folio *folio); - void __mod_memcg_state(struct mem_cgroup *memcg, enum memcg_stat_item idx, int val); -/* try to stablize folio_memcg() for all the pages in a memcg */ -static inline bool mem_cgroup_trylock_pages(struct mem_cgroup *memcg) -{ - rcu_read_lock(); - - if (mem_cgroup_disabled() || !atomic_read(&memcg->moving_account)) - return true; - - rcu_read_unlock(); - return false; -} - -static inline void mem_cgroup_unlock_pages(void) -{ - rcu_read_unlock(); -} - /* idx can be of type enum memcg_stat_item or node_stat_item */ static inline void mod_memcg_state(struct mem_cgroup *memcg, enum memcg_stat_item idx, int val) @@ -1103,10 +1077,6 @@ static inline void memcg_memory_event_mm(struct mm_struct *mm, void split_page_memcg(struct page *head, int old_order, int new_order); -unsigned long memcg1_soft_limit_reclaim(pg_data_t *pgdat, int order, - gfp_t gfp_mask, - unsigned long *total_scanned); - #else /* CONFIG_MEMCG */ #define MEM_CGROUP_ID_SHIFT 0 @@ -1417,26 +1387,6 @@ mem_cgroup_print_oom_meminfo(struct mem_cgroup *memcg) { } -static inline void folio_memcg_lock(struct folio *folio) -{ -} - -static inline void folio_memcg_unlock(struct folio *folio) -{ -} - -static inline bool mem_cgroup_trylock_pages(struct mem_cgroup *memcg) -{ - /* to match folio_memcg_rcu() */ - rcu_read_lock(); - return true; -} - -static inline void mem_cgroup_unlock_pages(void) -{ - rcu_read_unlock(); -} - static inline void mem_cgroup_handle_over_high(gfp_t gfp_mask) { } @@ -1449,16 +1399,6 @@ static inline void mem_cgroup_exit_user_fault(void) { } -static inline bool task_in_memcg_oom(struct task_struct *p) -{ - return false; -} - -static inline bool mem_cgroup_oom_synchronize(bool wait) -{ - return false; -} - static inline struct mem_cgroup *mem_cgroup_get_oom_group( struct task_struct *victim, struct mem_cgroup *oom_domain) { @@ -1552,14 +1492,6 @@ void count_memcg_event_mm(struct mm_struct *mm, enum vm_event_item idx) static inline void split_page_memcg(struct page *head, int old_order, int new_order) { } - -static inline -unsigned long memcg1_soft_limit_reclaim(pg_data_t *pgdat, int order, - gfp_t gfp_mask, - unsigned long *total_scanned) -{ - return 0; -} #endif /* CONFIG_MEMCG */ /* @@ -1910,4 +1842,80 @@ static inline bool mem_cgroup_zswap_writeback_enabled(struct mem_cgroup *memcg) } #endif + +/* Cgroup v1-related declarations */ + +#ifdef CONFIG_MEMCG +unsigned long memcg1_soft_limit_reclaim(pg_data_t *pgdat, int order, + gfp_t gfp_mask, + unsigned long *total_scanned); + +bool mem_cgroup_oom_synchronize(bool wait); + +static inline bool task_in_memcg_oom(struct task_struct *p) +{ + return p->memcg_in_oom; +} + +void 
folio_memcg_lock(struct folio *folio); +void folio_memcg_unlock(struct folio *folio); + +/* try to stablize folio_memcg() for all the pages in a memcg */ +static inline bool mem_cgroup_trylock_pages(struct mem_cgroup *memcg) +{ + rcu_read_lock(); + + if (mem_cgroup_disabled() || !atomic_read(&memcg->moving_account)) + return true; + + rcu_read_unlock(); + return false; +} + +static inline void mem_cgroup_unlock_pages(void) +{ + rcu_read_unlock(); +} + +#else /* CONFIG_MEMCG */ +static inline +unsigned long memcg1_soft_limit_reclaim(pg_data_t *pgdat, int order, + gfp_t gfp_mask, + unsigned long *total_scanned) +{ + return 0; +} + +static inline void folio_memcg_lock(struct folio *folio) +{ +} + +static inline void folio_memcg_unlock(struct folio *folio) +{ +} + +static inline bool mem_cgroup_trylock_pages(struct mem_cgroup *memcg) +{ + /* to match folio_memcg_rcu() */ + rcu_read_lock(); + return true; +} + +static inline void mem_cgroup_unlock_pages(void) +{ + rcu_read_unlock(); +} + +static inline bool task_in_memcg_oom(struct task_struct *p) +{ + return false; +} + +static inline bool mem_cgroup_oom_synchronize(bool wait) +{ + return false; +} + +#endif /* CONFIG_MEMCG */ + #endif /* _LINUX_MEMCONTROL_H */ diff --git a/mm/memcontrol-v1.h b/mm/memcontrol-v1.h index 7d6ac4a4fb36d..89d4207930481 100644 --- a/mm/memcontrol-v1.h +++ b/mm/memcontrol-v1.h @@ -5,15 +5,9 @@ #include -void memcg1_remove_from_trees(struct mem_cgroup *memcg); - -static inline void memcg1_soft_limit_reset(struct mem_cgroup *memcg) -{ - WRITE_ONCE(memcg->soft_limit, PAGE_COUNTER_MAX); -} +/* Cgroup v1 and v2 common declarations */ void mem_cgroup_charge_statistics(struct mem_cgroup *memcg, int nr_pages); -void memcg1_check_events(struct mem_cgroup *memcg, int nid); int try_charge_memcg(struct mem_cgroup *memcg, gfp_t gfp_mask, unsigned int nr_pages); @@ -29,30 +23,6 @@ static inline int try_charge(struct mem_cgroup *memcg, gfp_t gfp_mask, void mem_cgroup_id_get_many(struct mem_cgroup *memcg, unsigned int n); void mem_cgroup_id_put_many(struct mem_cgroup *memcg, unsigned int n); -bool memcg1_wait_acct_move(struct mem_cgroup *memcg); -struct cgroup_taskset; -int memcg1_can_attach(struct cgroup_taskset *tset); -void memcg1_cancel_attach(struct cgroup_taskset *tset); -void memcg1_move_task(void); - -/* - * Per memcg event counter is incremented at every pagein/pageout. With THP, - * it will be incremented by the number of pages. This counter is used - * to trigger some periodic events. This is straightforward and better - * than using jiffies etc. to handle periodic memcg event. - */ -enum mem_cgroup_events_target { - MEM_CGROUP_TARGET_THRESH, - MEM_CGROUP_TARGET_SOFTLIMIT, - MEM_CGROUP_NTARGETS, -}; - -/* Whether legacy memory+swap accounting is active */ -static bool do_memsw_account(void) -{ - return !cgroup_subsys_on_dfl(memory_cgrp_subsys); -} - /* * Iteration constructs for visiting all cgroups (under a tree). If * loops are exited prematurely (break), mem_cgroup_iter_break() must @@ -68,24 +38,28 @@ static bool do_memsw_account(void) iter != NULL; \ iter = mem_cgroup_iter(NULL, iter, NULL)) -void memcg1_css_offline(struct mem_cgroup *memcg); +/* Whether legacy memory+swap accounting is active */ +static bool do_memsw_account(void) +{ + return !cgroup_subsys_on_dfl(memory_cgrp_subsys); +} -/* for encoding cft->private value on file */ -enum res_type { - _MEM, - _MEMSWAP, - _KMEM, - _TCP, +/* + * Per memcg event counter is incremented at every pagein/pageout. 
With THP, + it will be incremented by the number of pages. This counter is used + to trigger some periodic events. This is straightforward and better + than using jiffies etc. to handle periodic memcg event. + */ +enum mem_cgroup_events_target { + MEM_CGROUP_TARGET_THRESH, + MEM_CGROUP_TARGET_SOFTLIMIT, + MEM_CGROUP_NTARGETS, }; bool mem_cgroup_event_ratelimit(struct mem_cgroup *memcg, enum mem_cgroup_events_target target); unsigned long mem_cgroup_usage(struct mem_cgroup *memcg, bool swap); -bool memcg1_oom_prepare(struct mem_cgroup *memcg, bool *locked); -void memcg1_oom_finish(struct mem_cgroup *memcg, bool locked); -void memcg1_oom_recover(struct mem_cgroup *memcg); - void drain_all_stock(struct mem_cgroup *root_memcg); unsigned long mem_cgroup_nr_lru_pages(struct mem_cgroup *memcg, unsigned int lru_mask, bool tree); @@ -100,6 +74,37 @@ unsigned long memcg_page_state_output(struct mem_cgroup *memcg, int item); unsigned long memcg_page_state_local_output(struct mem_cgroup *memcg, int item); int memory_stat_show(struct seq_file *m, void *v); +/* Cgroup v1-specific declarations */ + +void memcg1_remove_from_trees(struct mem_cgroup *memcg); + +static inline void memcg1_soft_limit_reset(struct mem_cgroup *memcg) +{ + WRITE_ONCE(memcg->soft_limit, PAGE_COUNTER_MAX); +} + +bool memcg1_wait_acct_move(struct mem_cgroup *memcg); + +struct cgroup_taskset; +int memcg1_can_attach(struct cgroup_taskset *tset); +void memcg1_cancel_attach(struct cgroup_taskset *tset); +void memcg1_move_task(void); +void memcg1_css_offline(struct mem_cgroup *memcg); + +/* for encoding cft->private value on file */ +enum res_type { + _MEM, + _MEMSWAP, + _KMEM, + _TCP, +}; + +bool memcg1_oom_prepare(struct mem_cgroup *memcg, bool *locked); +void memcg1_oom_finish(struct mem_cgroup *memcg, bool locked); +void memcg1_oom_recover(struct mem_cgroup *memcg); + +void memcg1_check_events(struct mem_cgroup *memcg, int nid); + void memcg1_stat_format(struct mem_cgroup *memcg, struct seq_buf *s); extern struct cftype memsw_files[];

From patchwork Tue May 28 21:44:34 2024
X-Patchwork-Submitter: Roman Gushchin
X-Patchwork-Id: 13677491
From: Roman Gushchin
To: Andrew Morton
Cc: Muchun Song, Johannes Weiner, Michal Hocko, Shakeel Butt, Matthew Wilcox, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Roman Gushchin
Subject: [PATCH v1 13/14] mm: memcg: put cgroup v1-related members of task_struct under config option
Date: Tue, 28 May 2024 14:44:34 -0700
Message-ID: <20240528214435.3125304-4-roman.gushchin@linux.dev>
In-Reply-To: <20240528214435.3125304-1-roman.gushchin@linux.dev>
References: <20240528202101.3099300-1-roman.gushchin@linux.dev> <20240528214435.3125304-1-roman.gushchin@linux.dev>

Guard cgroup v1-related members of task_struct under the CONFIG_MEMCG_V1 config option, so that users who have adopted cgroup v2 don't have to waste memory on fields that are never accessed. A short sketch of the resulting compile-out pattern follows the diff below.

Signed-off-by: Roman Gushchin
Acked-by: Shakeel Butt
---
 include/linux/memcontrol.h | 2 +- init/Kconfig | 9 +++++++++ mm/Makefile | 3 ++- mm/memcontrol-v1.h | 21 ++++++++++++++++++++- mm/memcontrol.c | 10 +++++++--- 5 files changed, 39 insertions(+), 6 deletions(-) diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h index 7474b20a5d3f5..aaa97ec0634f7 100644 --- a/include/linux/memcontrol.h +++ b/include/linux/memcontrol.h @@ -1845,7 +1845,7 @@ static inline bool mem_cgroup_zswap_writeback_enabled(struct mem_cgroup *memcg) /* Cgroup v1-related declarations */ -#ifdef CONFIG_MEMCG +#ifdef CONFIG_MEMCG_V1 unsigned long memcg1_soft_limit_reclaim(pg_data_t *pgdat, int order, gfp_t gfp_mask, unsigned long *total_scanned); diff --git a/init/Kconfig b/init/Kconfig index febdea2afc3be..5191b6435b4e7 100644 --- a/init/Kconfig +++ b/init/Kconfig @@ -969,6 +969,15 @@ config MEMCG help Provides control over the memory footprint of tasks in a cgroup. +config MEMCG_V1 + bool "Legacy memory controller" + depends on MEMCG + default n + help + Legacy cgroup v1 memory controller. + + Say N if unsure.
+ config MEMCG_KMEM bool depends on MEMCG diff --git a/mm/Makefile b/mm/Makefile index 124d4dea20351..d2915f8c9dc01 100644 --- a/mm/Makefile +++ b/mm/Makefile @@ -96,7 +96,8 @@ obj-$(CONFIG_NUMA) += memory-tiers.o obj-$(CONFIG_DEVICE_MIGRATION) += migrate_device.o obj-$(CONFIG_TRANSPARENT_HUGEPAGE) += huge_memory.o khugepaged.o obj-$(CONFIG_PAGE_COUNTER) += page_counter.o -obj-$(CONFIG_MEMCG) += memcontrol.o memcontrol-v1.o vmpressure.o +obj-$(CONFIG_MEMCG_V1) += memcontrol-v1.o +obj-$(CONFIG_MEMCG) += memcontrol.o vmpressure.o ifdef CONFIG_SWAP obj-$(CONFIG_MEMCG) += swap_cgroup.o endif diff --git a/mm/memcontrol-v1.h b/mm/memcontrol-v1.h index 89d4207930481..64b053d7f131e 100644 --- a/mm/memcontrol-v1.h +++ b/mm/memcontrol-v1.h @@ -75,7 +75,7 @@ unsigned long memcg_page_state_local_output(struct mem_cgroup *memcg, int item); int memory_stat_show(struct seq_file *m, void *v); /* Cgroup v1-specific declarations */ - +#ifdef CONFIG_MEMCG_V1 void memcg1_remove_from_trees(struct mem_cgroup *memcg); static inline void memcg1_soft_limit_reset(struct mem_cgroup *memcg) @@ -110,4 +110,23 @@ void memcg1_stat_format(struct mem_cgroup *memcg, struct seq_buf *s); extern struct cftype memsw_files[]; extern struct cftype mem_cgroup_legacy_files[]; +#else /* CONFIG_MEMCG_V1 */ + +static inline void memcg1_remove_from_trees(struct mem_cgroup *memcg) {} +static inline void memcg1_soft_limit_reset(struct mem_cgroup *memcg) {} +static inline bool memcg1_wait_acct_move(struct mem_cgroup *memcg) { return false; } +static inline void memcg1_css_offline(struct mem_cgroup *memcg) {} + +static inline bool memcg1_oom_prepare(struct mem_cgroup *memcg, bool *locked) { return true; } +static inline void memcg1_oom_finish(struct mem_cgroup *memcg, bool locked) {} +static inline void memcg1_oom_recover(struct mem_cgroup *memcg) {} + +static inline void memcg1_check_events(struct mem_cgroup *memcg, int nid) {} + +static inline void memcg1_stat_format(struct mem_cgroup *memcg, struct seq_buf *s) {} + +extern struct cftype memsw_files[]; +extern struct cftype mem_cgroup_legacy_files[]; +#endif /* CONFIG_MEMCG_V1 */ + #endif /* __MM_MEMCONTROL_V1_H */ diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 8a28ba4ca93ea..8cbceeb9c7ac4 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -4473,18 +4473,20 @@ struct cgroup_subsys memory_cgrp_subsys = { .css_free = mem_cgroup_css_free, .css_reset = mem_cgroup_css_reset, .css_rstat_flush = mem_cgroup_css_rstat_flush, - .can_attach = memcg1_can_attach, #if defined(CONFIG_LRU_GEN) || defined(CONFIG_MEMCG_KMEM) .attach = mem_cgroup_attach, #endif - .cancel_attach = memcg1_cancel_attach, - .post_attach = memcg1_move_task, #ifdef CONFIG_MEMCG_KMEM .fork = mem_cgroup_fork, .exit = mem_cgroup_exit, #endif .dfl_cftypes = memory_files, +#ifdef CONFIG_MEMCG_V1 + .can_attach = memcg1_can_attach, + .cancel_attach = memcg1_cancel_attach, + .post_attach = memcg1_move_task, .legacy_cftypes = mem_cgroup_legacy_files, +#endif .early_init = 0, }; @@ -5655,7 +5657,9 @@ static int __init mem_cgroup_swap_init(void) return 0; WARN_ON(cgroup_add_dfl_cftypes(&memory_cgrp_subsys, swap_files)); +#ifdef CONFIG_MEMCG_V1 WARN_ON(cgroup_add_legacy_cftypes(&memory_cgrp_subsys, memsw_files)); +#endif #if defined(CONFIG_MEMCG_KMEM) && defined(CONFIG_ZSWAP) WARN_ON(cgroup_add_dfl_cftypes(&memory_cgrp_subsys, zswap_files)); #endif
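As referenced in the commit message above, here is a minimal, self-contained sketch of the compile-out pattern used in the memcontrol-v1.h hunk (CONFIG_FOO_V1 and all function names are hypothetical stand-ins, not the kernel's): the real definition sits under the config option, while the #else branch provides an empty static inline stub, so call sites build unchanged whether the option is on or off.

#include <stdio.h>

/* #define CONFIG_FOO_V1 1 */	/* uncomment to "enable" the legacy code */

#ifdef CONFIG_FOO_V1
/* real implementation, compiled only when the option is set */
void foo_v1_check_events(int nid)
{
	printf("legacy event check, nid=%d\n", nid);
}
#else
/* empty stub: the optimizer removes the call entirely */
static inline void foo_v1_check_events(int nid) { (void)nid; }
#endif

int main(void)
{
	foo_v1_check_events(0);	/* no #ifdef needed at the call site */
	return 0;
}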
From patchwork Tue May 28 21:44:35 2024
X-Patchwork-Submitter: Roman Gushchin
X-Patchwork-Id: 13677492
From: Roman Gushchin
To: Andrew Morton
Cc: Muchun Song, Johannes Weiner, Michal Hocko, Shakeel Butt, Matthew Wilcox, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Roman Gushchin
Subject: [PATCH v1 14/14] MAINTAINERS: add mm/memcontrol-v1.c/h to the list of maintained files
Date: Tue, 28 May 2024 14:44:35 -0700
Message-ID: <20240528214435.3125304-5-roman.gushchin@linux.dev>
In-Reply-To: <20240528214435.3125304-1-roman.gushchin@linux.dev>
References: <20240528202101.3099300-1-roman.gushchin@linux.dev> <20240528214435.3125304-1-roman.gushchin@linux.dev>

Signed-off-by: Roman Gushchin
Acked-by: Shakeel Butt
---
 MAINTAINERS | 2 ++ 1 file changed, 2 insertions(+) diff --git a/MAINTAINERS b/MAINTAINERS index 7af431daf77bc..cccaf2bcedecd 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -5584,6 +5584,8 @@ L: linux-mm@kvack.org S: Maintained F: include/linux/memcontrol.h F: mm/memcontrol.c +F: mm/memcontrol-v1.c +F: mm/memcontrol-v1.h F: mm/swap_cgroup.c F: samples/cgroup/* F: tools/testing/selftests/cgroup/memcg_protection.m