From patchwork Sat Feb 26 20:41:39 2022
From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
To: cgroups@vger.kernel.org, linux-mm@kvack.org
Cc: Andrew Morton, Johannes Weiner, Michal Hocko, Michal Koutný, Peter Zijlstra, Thomas Gleixner, Vladimir Davydov, Waiman Long, Roman Gushchin, Shakeel Butt
Subject: [PATCH v5 1/6] mm/memcg: Revert ("mm/memcg: optimize user context object stock access")
Date: Sat, 26 Feb 2022 21:41:39 +0100
Message-Id: <20220226204144.1008339-2-bigeasy@linutronix.de>
In-Reply-To: <20220226204144.1008339-1-bigeasy@linutronix.de>
References: <20220226204144.1008339-1-bigeasy@linutronix.de>
header.b="zsBH9L/9"; dmarc=pass (policy=none) header.from=linutronix.de; spf=pass (imf05.hostedemail.com: domain of bigeasy@linutronix.de designates 193.142.43.55 as permitted sender) smtp.mailfrom=bigeasy@linutronix.de X-HE-Tag: 1645908112-138066 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Michal Hocko The optimisation is based on a micro benchmark where local_irq_save() is more expensive than a preempt_disable(). There is no evidence that it is visible in a real-world workload and there are CPUs where the opposite is true (local_irq_save() is cheaper than preempt_disable()). Based on micro benchmarks, the optimisation makes sense on PREEMPT_NONE where preempt_disable() is optimized away. There is no improvement with PREEMPT_DYNAMIC since the preemption counter is always available. The optimization makes also the PREEMPT_RT integration more complicated since most of the assumption are not true on PREEMPT_RT. Revert the optimisation since it complicates the PREEMPT_RT integration and the improvement is hardly visible. [ bigeasy: Patch body around Michal's diff ] Link: https://lore.kernel.org/all/YgOGkXXCrD%2F1k+p4@dhcp22.suse.cz Link: https://lkml.kernel.org/r/YdX+INO9gQje6d0S@linutronix.de Signed-off-by: Michal Hocko Signed-off-by: Sebastian Andrzej Siewior Acked-by: Roman Gushchin Acked-by: Johannes Weiner Reviewed-by: Shakeel Butt Acked-by: Michal Hocko --- mm/memcontrol.c | 94 ++++++++++++++----------------------------------- 1 file changed, 27 insertions(+), 67 deletions(-) diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 3c4816147273a..8ab2dc75e70ec 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -2078,23 +2078,17 @@ void unlock_page_memcg(struct page *page) folio_memcg_unlock(page_folio(page)); } -struct obj_stock { +struct memcg_stock_pcp { + struct mem_cgroup *cached; /* this never be root cgroup */ + unsigned int nr_pages; + #ifdef CONFIG_MEMCG_KMEM struct obj_cgroup *cached_objcg; struct pglist_data *cached_pgdat; unsigned int nr_bytes; int nr_slab_reclaimable_b; int nr_slab_unreclaimable_b; -#else - int dummy[0]; #endif -}; - -struct memcg_stock_pcp { - struct mem_cgroup *cached; /* this never be root cgroup */ - unsigned int nr_pages; - struct obj_stock task_obj; - struct obj_stock irq_obj; struct work_struct work; unsigned long flags; @@ -2104,13 +2098,13 @@ static DEFINE_PER_CPU(struct memcg_stock_pcp, memcg_stock); static DEFINE_MUTEX(percpu_charge_mutex); #ifdef CONFIG_MEMCG_KMEM -static void drain_obj_stock(struct obj_stock *stock); +static void drain_obj_stock(struct memcg_stock_pcp *stock); static bool obj_stock_flush_required(struct memcg_stock_pcp *stock, struct mem_cgroup *root_memcg); static void memcg_account_kmem(struct mem_cgroup *memcg, int nr_pages); #else -static inline void drain_obj_stock(struct obj_stock *stock) +static inline void drain_obj_stock(struct memcg_stock_pcp *stock) { } static bool obj_stock_flush_required(struct memcg_stock_pcp *stock, @@ -2190,9 +2184,7 @@ static void drain_local_stock(struct work_struct *dummy) local_irq_save(flags); stock = this_cpu_ptr(&memcg_stock); - drain_obj_stock(&stock->irq_obj); - if (in_task()) - drain_obj_stock(&stock->task_obj); + drain_obj_stock(stock); drain_stock(stock); clear_bit(FLUSHING_CACHED_CHARGE, &stock->flags); @@ -2767,41 +2759,6 @@ static struct mem_cgroup *get_mem_cgroup_from_objcg(struct obj_cgroup *objcg) */ #define OBJCGS_CLEAR_MASK (__GFP_DMA | __GFP_RECLAIMABLE | 
				 __GFP_ACCOUNT)
 
-/*
- * Most kmem_cache_alloc() calls are from user context. The irq disable/enable
- * sequence used in this case to access content from object stock is slow.
- * To optimize for user context access, there are now two object stocks for
- * task context and interrupt context access respectively.
- *
- * The task context object stock can be accessed by disabling preemption only
- * which is cheap in non-preempt kernel. The interrupt context object stock
- * can only be accessed after disabling interrupt. User context code can
- * access interrupt object stock, but not vice versa.
- */
-static inline struct obj_stock *get_obj_stock(unsigned long *pflags)
-{
-	struct memcg_stock_pcp *stock;
-
-	if (likely(in_task())) {
-		*pflags = 0UL;
-		preempt_disable();
-		stock = this_cpu_ptr(&memcg_stock);
-		return &stock->task_obj;
-	}
-
-	local_irq_save(*pflags);
-	stock = this_cpu_ptr(&memcg_stock);
-	return &stock->irq_obj;
-}
-
-static inline void put_obj_stock(unsigned long flags)
-{
-	if (likely(in_task()))
-		preempt_enable();
-	else
-		local_irq_restore(flags);
-}
-
 /*
  * mod_objcg_mlstate() may be called with irq enabled, so
  * mod_memcg_lruvec_state() should be used.
@@ -3082,10 +3039,13 @@ void __memcg_kmem_uncharge_page(struct page *page, int order)
 void mod_objcg_state(struct obj_cgroup *objcg, struct pglist_data *pgdat,
 		     enum node_stat_item idx, int nr)
 {
+	struct memcg_stock_pcp *stock;
 	unsigned long flags;
-	struct obj_stock *stock = get_obj_stock(&flags);
 	int *bytes;
 
+	local_irq_save(flags);
+	stock = this_cpu_ptr(&memcg_stock);
+
 	/*
 	 * Save vmstat data in stock and skip vmstat array update unless
 	 * accumulating over a page of vmstat data or when pgdat or idx
@@ -3136,26 +3096,29 @@ void mod_objcg_state(struct obj_cgroup *objcg, struct pglist_data *pgdat,
 	if (nr)
 		mod_objcg_mlstate(objcg, pgdat, idx, nr);
 
-	put_obj_stock(flags);
+	local_irq_restore(flags);
 }
 
 static bool consume_obj_stock(struct obj_cgroup *objcg, unsigned int nr_bytes)
 {
+	struct memcg_stock_pcp *stock;
 	unsigned long flags;
-	struct obj_stock *stock = get_obj_stock(&flags);
 	bool ret = false;
 
+	local_irq_save(flags);
+
+	stock = this_cpu_ptr(&memcg_stock);
 	if (objcg == stock->cached_objcg && stock->nr_bytes >= nr_bytes) {
 		stock->nr_bytes -= nr_bytes;
 		ret = true;
 	}
 
-	put_obj_stock(flags);
+	local_irq_restore(flags);
 
 	return ret;
 }
 
-static void drain_obj_stock(struct obj_stock *stock)
+static void drain_obj_stock(struct memcg_stock_pcp *stock)
 {
 	struct obj_cgroup *old = stock->cached_objcg;
 
@@ -3211,13 +3174,8 @@ static bool obj_stock_flush_required(struct memcg_stock_pcp *stock,
 {
 	struct mem_cgroup *memcg;
 
-	if (in_task() && stock->task_obj.cached_objcg) {
-		memcg = obj_cgroup_memcg(stock->task_obj.cached_objcg);
-		if (memcg && mem_cgroup_is_descendant(memcg, root_memcg))
-			return true;
-	}
-	if (stock->irq_obj.cached_objcg) {
-		memcg = obj_cgroup_memcg(stock->irq_obj.cached_objcg);
+	if (stock->cached_objcg) {
+		memcg = obj_cgroup_memcg(stock->cached_objcg);
 		if (memcg && mem_cgroup_is_descendant(memcg, root_memcg))
 			return true;
 	}
@@ -3228,10 +3186,13 @@ static bool obj_stock_flush_required(struct memcg_stock_pcp *stock,
 static void refill_obj_stock(struct obj_cgroup *objcg, unsigned int nr_bytes,
 			     bool allow_uncharge)
 {
+	struct memcg_stock_pcp *stock;
 	unsigned long flags;
-	struct obj_stock *stock = get_obj_stock(&flags);
 	unsigned int nr_pages = 0;
 
+	local_irq_save(flags);
+
+	stock = this_cpu_ptr(&memcg_stock);
 	if (stock->cached_objcg != objcg) { /* reset if necessary */
 		drain_obj_stock(stock);
 		obj_cgroup_get(objcg);
@@ -3247,7 +3208,7 @@ static void refill_obj_stock(struct obj_cgroup *objcg, unsigned int nr_bytes,
 		stock->nr_bytes &= (PAGE_SIZE - 1);
 	}
 
-	put_obj_stock(flags);
+	local_irq_restore(flags);
 
 	if (nr_pages)
 		obj_cgroup_uncharge_pages(objcg, nr_pages);
@@ -6812,7 +6773,6 @@ static void uncharge_folio(struct folio *folio, struct uncharge_gather *ug)
 	long nr_pages;
 	struct mem_cgroup *memcg;
 	struct obj_cgroup *objcg;
-	bool use_objcg = folio_memcg_kmem(folio);
 
 	VM_BUG_ON_FOLIO(folio_test_lru(folio), folio);
 
@@ -6821,7 +6781,7 @@ static void uncharge_folio(struct folio *folio, struct uncharge_gather *ug)
 	 * folio memcg or objcg at this point, we have fully
 	 * exclusive access to the folio.
 	 */
-	if (use_objcg) {
+	if (folio_memcg_kmem(folio)) {
 		objcg = __folio_objcg(folio);
 		/*
 		 * This get matches the put at the end of the function and
@@ -6849,7 +6809,7 @@ static void uncharge_folio(struct folio *folio, struct uncharge_gather *ug)
 
 	nr_pages = folio_nr_pages(folio);
 
-	if (use_objcg) {
+	if (folio_memcg_kmem(folio)) {
 		ug->nr_memory += nr_pages;
 		ug->nr_kmem += nr_pages;
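For readers new to the trade-off this revert undoes, the following standalone C sketch models the two protection schemes. It is illustrative only: the types, stub functions and comments stand in for the kernel's per-CPU, preemption and interrupt primitives, and none of it is code from the patch.

#include <stdbool.h>

struct stock { unsigned int nr_bytes; };

/* Reverted scheme: two stocks, picked by context. */
struct stock task_obj, irq_obj;

struct stock *get_stock_preempt_only(bool in_task_ctx)
{
	/* preempt_disable() would go here: cheap, but an interrupt can
	 * still arrive and touch a stock, hence the task/irq split. */
	return in_task_ctx ? &task_obj : &irq_obj;
}

/* Restored scheme: one stock, interrupts disabled around the access. */
struct stock the_stock;

struct stock *get_stock_irq_off(void)
{
	/* local_irq_save() would go here: dearer on some CPUs, but now
	 * neither another task nor an interrupt can race with us, so a
	 * single stock suffices for all contexts. */
	return &the_stock;
}

int main(void)
{
	get_stock_preempt_only(true)->nr_bytes += 64;
	get_stock_irq_off()->nr_bytes += 64;
	return 0;
}

With interrupts disabled, task and interrupt context can share one stock, which is what lets the revert delete struct obj_stock entirely.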
From patchwork Sat Feb 26 20:41:40 2022
From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
To: cgroups@vger.kernel.org, linux-mm@kvack.org
Cc: Andrew Morton, Johannes Weiner, Michal Hocko, Michal Koutný, Peter Zijlstra, Thomas Gleixner, Vladimir Davydov, Waiman Long, Roman Gushchin, Shakeel Butt
Subject: [PATCH v5 2/6] mm/memcg: Disable threshold event handlers on PREEMPT_RT
Date: Sat, 26 Feb 2022 21:41:40 +0100
Message-Id: <20220226204144.1008339-3-bigeasy@linutronix.de>
In-Reply-To: <20220226204144.1008339-1-bigeasy@linutronix.de>
References: <20220226204144.1008339-1-bigeasy@linutronix.de>

During the integration of PREEMPT_RT support, the code flow around
memcg_check_events() resulted in `twisted code'. Moving the code around
to avoid this would in turn require an additional local-irq-save section
within memcg_check_events(). While that looks better, it adds a
local-irq-save section to a code flow which is usually within a
local-irq-off block on non-PREEMPT_RT configurations.

The threshold event handler is a deprecated memcg v1 feature. Instead of
trying to get it to work under PREEMPT_RT, just disable it. There should
be no users on PREEMPT_RT; from that perspective it makes even less sense
to get it to work under PREEMPT_RT while having zero users.

Make memory.soft_limit_in_bytes and cgroup.event_control return
-EOPNOTSUPP on PREEMPT_RT. Make memcg_check_events() empty and have
memcg_write_event_control() return -EOPNOTSUPP on PREEMPT_RT. Document
that the two knobs are disabled on PREEMPT_RT.

Suggested-by: Michal Hocko
Suggested-by: Michal Koutný
Signed-off-by: Sebastian Andrzej Siewior
Acked-by: Roman Gushchin
Acked-by: Johannes Weiner
Reviewed-by: Shakeel Butt
Acked-by: Michal Hocko
---
 Documentation/admin-guide/cgroup-v1/memory.rst |  2 ++
 mm/memcontrol.c                                | 14 ++++++++++++--
 2 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/Documentation/admin-guide/cgroup-v1/memory.rst b/Documentation/admin-guide/cgroup-v1/memory.rst
index faac50149a222..2cc502a75ef64 100644
--- a/Documentation/admin-guide/cgroup-v1/memory.rst
+++ b/Documentation/admin-guide/cgroup-v1/memory.rst
@@ -64,6 +64,7 @@ Brief summary of control files.
 				     threads
 cgroup.procs			     show list of processes
 cgroup.event_control		     an interface for event_fd()
+				     This knob is not available on CONFIG_PREEMPT_RT systems.
 memory.usage_in_bytes		     show current usage for memory
 				     (See 5.5 for details)
 memory.memsw.usage_in_bytes	     show current usage for memory+Swap
@@ -75,6 +76,7 @@
 memory.max_usage_in_bytes	     show max memory usage recorded
 memory.memsw.max_usage_in_bytes     show max memory+Swap usage recorded
 memory.soft_limit_in_bytes	     set/show soft limit of memory usage
+				     This knob is not available on CONFIG_PREEMPT_RT systems.
 memory.stat			     show various statistics
 memory.use_hierarchy		     set/show hierarchical account enabled
 				     This knob is deprecated and shouldn't be
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 8ab2dc75e70ec..0b5117ed2ae08 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -859,6 +859,9 @@ static bool mem_cgroup_event_ratelimit(struct mem_cgroup *memcg,
  */
 static void memcg_check_events(struct mem_cgroup *memcg, int nid)
 {
+	if (IS_ENABLED(CONFIG_PREEMPT_RT))
+		return;
+
 	/* threshold event is triggered in finer grain than soft limit */
 	if (unlikely(mem_cgroup_event_ratelimit(memcg,
 						MEM_CGROUP_TARGET_THRESH))) {
@@ -3731,8 +3734,12 @@ static ssize_t mem_cgroup_write(struct kernfs_open_file *of,
 		}
 		break;
 	case RES_SOFT_LIMIT:
-		memcg->soft_limit = nr_pages;
-		ret = 0;
+		if (IS_ENABLED(CONFIG_PREEMPT_RT)) {
+			ret = -EOPNOTSUPP;
+		} else {
+			memcg->soft_limit = nr_pages;
+			ret = 0;
+		}
 		break;
 	}
 	return ret ?: nbytes;
@@ -4708,6 +4715,9 @@ static ssize_t memcg_write_event_control(struct kernfs_open_file *of,
 	char *endp;
 	int ret;
 
+	if (IS_ENABLED(CONFIG_PREEMPT_RT))
+		return -EOPNOTSUPP;
+
 	buf = strstrip(buf);
 
 	efd = simple_strtoul(buf, &endp, 10);
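The IS_ENABLED(CONFIG_PREEMPT_RT) guards above compile to constant conditions, so the optimizer drops the dead branch entirely on non-RT builds. A rough userspace approximation of the pattern follows; PREEMPT_RT_BUILD and write_soft_limit() are made-up stand-ins for the kernel's config machinery and the memcg write handler, not real API.

#include <errno.h>
#include <stdio.h>

/* Stand-in for CONFIG_PREEMPT_RT/IS_ENABLED(): build with
 * -DPREEMPT_RT_BUILD=1 to model a PREEMPT_RT kernel. */
#ifndef PREEMPT_RT_BUILD
#define PREEMPT_RT_BUILD 0
#endif

int write_soft_limit(unsigned long nr_pages, unsigned long *soft_limit)
{
	if (PREEMPT_RT_BUILD)	/* constant: dead code on !RT builds */
		return -EOPNOTSUPP;
	*soft_limit = nr_pages;
	return 0;
}

int main(void)
{
	unsigned long limit = 0;
	int ret = write_soft_limit(128, &limit);

	printf("ret=%d limit=%lu\n", ret, limit);
	return 0;
}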
From patchwork Sat Feb 26 20:41:41 2022
From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
To: cgroups@vger.kernel.org, linux-mm@kvack.org
Cc: Andrew Morton, Johannes Weiner, Michal Hocko, Michal Koutný, Peter Zijlstra, Thomas Gleixner, Vladimir Davydov, Waiman Long, Roman Gushchin
Subject: [PATCH v5 3/6] mm/memcg: Protect per-CPU counter by disabling preemption on PREEMPT_RT where needed.
Date: Sat, 26 Feb 2022 21:41:41 +0100
Message-Id: <20220226204144.1008339-4-bigeasy@linutronix.de>
In-Reply-To: <20220226204144.1008339-1-bigeasy@linutronix.de>
References: <20220226204144.1008339-1-bigeasy@linutronix.de>

The per-CPU counters are modified with the non-atomic modifier. The
consistency is ensured by disabling interrupts for the update. On
non-PREEMPT_RT configurations this works because acquiring a spinlock_t
typed lock with the _irq() suffix disables interrupts. On PREEMPT_RT
configurations the RMW operation can be interrupted.

Another problem is that mem_cgroup_swapout() expects to be invoked with
disabled interrupts because the caller has to acquire a spinlock_t which
is acquired with disabled interrupts. Since spinlock_t never disables
interrupts on PREEMPT_RT, the interrupts are never disabled at this
point.

The code is never called from in_irq() context on PREEMPT_RT, therefore
disabling preemption during the update is sufficient on PREEMPT_RT. The
sections which explicitly disable interrupts can remain on PREEMPT_RT
because the sections remain short and they don't involve sleeping locks
(memcg_check_events() is doing nothing on PREEMPT_RT).

Disable preemption during the update of the per-CPU variables which do
not explicitly disable interrupts.

Signed-off-by: Sebastian Andrzej Siewior
Acked-by: Roman Gushchin
Reviewed-by: Shakeel Butt
---
 mm/memcontrol.c | 56 ++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 55 insertions(+), 1 deletion(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 0b5117ed2ae08..238ea77aade5d 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -630,6 +630,35 @@ static DEFINE_SPINLOCK(stats_flush_lock);
 static DEFINE_PER_CPU(unsigned int, stats_updates);
 static atomic_t stats_flush_threshold = ATOMIC_INIT(0);
 
+/*
+ * Accessors to ensure that preemption is disabled on PREEMPT_RT because it can
+ * not rely on this as part of an acquired spinlock_t lock. These functions are
+ * never used in hardirq context on PREEMPT_RT and therefore disabling
+ * preemption is sufficient.
+ */
+static void memcg_stats_lock(void)
+{
+#ifdef CONFIG_PREEMPT_RT
+	preempt_disable();
+#else
+	VM_BUG_ON(!irqs_disabled());
+#endif
+}
+
+static void __memcg_stats_lock(void)
+{
+#ifdef CONFIG_PREEMPT_RT
+	preempt_disable();
+#endif
+}
+
+static void memcg_stats_unlock(void)
+{
+#ifdef CONFIG_PREEMPT_RT
+	preempt_enable();
+#endif
+}
+
 static inline void memcg_rstat_updated(struct mem_cgroup *memcg, int val)
 {
 	unsigned int x;
@@ -706,6 +735,27 @@ void __mod_memcg_lruvec_state(struct lruvec *lruvec, enum node_stat_item idx,
 	pn = container_of(lruvec, struct mem_cgroup_per_node, lruvec);
 	memcg = pn->memcg;
 
+	/*
+	 * Callers from rmap rely on disabled preemption because they never
+	 * update their counter from in-interrupt context. For these counters
+	 * we check that the update is never performed from an interrupt
+	 * context while other callers need to have interrupts disabled.
+	 */
+	__memcg_stats_lock();
+	if (IS_ENABLED(CONFIG_DEBUG_VM) && !IS_ENABLED(CONFIG_PREEMPT_RT)) {
+		switch (idx) {
+		case NR_ANON_MAPPED:
+		case NR_FILE_MAPPED:
+		case NR_ANON_THPS:
+		case NR_SHMEM_PMDMAPPED:
+		case NR_FILE_PMDMAPPED:
+			WARN_ON_ONCE(!in_task());
+			break;
+		default:
+			WARN_ON_ONCE(!irqs_disabled());
+		}
+	}
+
 	/* Update memcg */
 	__this_cpu_add(memcg->vmstats_percpu->state[idx], val);
 
@@ -713,6 +763,7 @@ void __mod_memcg_lruvec_state(struct lruvec *lruvec, enum node_stat_item idx,
 	__this_cpu_add(pn->lruvec_stats_percpu->state[idx], val);
 
 	memcg_rstat_updated(memcg, val);
+	memcg_stats_unlock();
 }
 
 /**
@@ -795,8 +846,10 @@ void __count_memcg_events(struct mem_cgroup *memcg, enum vm_event_item idx,
 	if (mem_cgroup_disabled())
 		return;
 
+	memcg_stats_lock();
 	__this_cpu_add(memcg->vmstats_percpu->events[idx], count);
 	memcg_rstat_updated(memcg, count);
+	memcg_stats_unlock();
 }
 
 static unsigned long memcg_events(struct mem_cgroup *memcg, int event)
@@ -7140,8 +7193,9 @@ void mem_cgroup_swapout(struct page *page, swp_entry_t entry)
 	 * important here to have the interrupts disabled because it is the
 	 * only synchronisation we have for updating the per-CPU variables.
 	 */
-	VM_BUG_ON(!irqs_disabled());
+	memcg_stats_lock();
 	mem_cgroup_charge_statistics(memcg, -nr_entries);
+	memcg_stats_unlock();
 	memcg_check_events(memcg, page_to_nid(page));
 
 	css_put(&memcg->css);
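The semantics of the new memcg_stats_lock()/memcg_stats_unlock() pair can be modeled outside the kernel as below; the stubbed primitives are stand-ins for the real preemption and interrupt APIs, and the sketch only illustrates the RT/non-RT split described in the changelog.

#include <assert.h>

/* Stubs standing in for the kernel primitives. */
int preempt_count_stub, irqs_off_stub;
void preempt_disable(void) { preempt_count_stub++; }
void preempt_enable(void)  { preempt_count_stub--; }
int  irqs_disabled(void)   { return irqs_off_stub; }

#define PREEMPT_RT_BUILD 0	/* flip to 1 to model PREEMPT_RT */

void memcg_stats_lock(void)
{
#if PREEMPT_RT_BUILD
	preempt_disable();	/* RT: preemption off suffices; never in hardirq */
#else
	assert(irqs_disabled());	/* !RT: caller already disabled irqs */
#endif
}

void memcg_stats_unlock(void)
{
#if PREEMPT_RT_BUILD
	preempt_enable();
#endif
}

int main(void)
{
	irqs_off_stub = 1;	/* model a caller holding an _irq spinlock */
	memcg_stats_lock();
	/* the non-atomic per-CPU RMW update would happen here */
	memcg_stats_unlock();
	return 0;
}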
From patchwork Sat Feb 26 20:41:42 2022
From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
To: cgroups@vger.kernel.org, linux-mm@kvack.org
Cc: Andrew Morton, Johannes Weiner, Michal Hocko, Michal Koutný, Peter Zijlstra, Thomas Gleixner, Vladimir Davydov, Waiman Long, Shakeel Butt, Roman Gushchin
Subject: [PATCH v5 4/6] mm/memcg: Opencode the inner part of obj_cgroup_uncharge_pages() in drain_obj_stock()
Date: Sat, 26 Feb 2022 21:41:42 +0100
Message-Id: <20220226204144.1008339-5-bigeasy@linutronix.de>
In-Reply-To: <20220226204144.1008339-1-bigeasy@linutronix.de>
References: <20220226204144.1008339-1-bigeasy@linutronix.de>

From: Johannes Weiner

Provide the inner part of refill_stock() as __refill_stock() without
disabling interrupts. This eases the integration of local_lock_t, where
recursive locking must be avoided.

Open code obj_cgroup_uncharge_pages() in drain_obj_stock() and use
__refill_stock(). The caller of drain_obj_stock() already disables
interrupts.

[bigeasy: Patch body around Johannes' diff ]

Signed-off-by: Johannes Weiner
Signed-off-by: Sebastian Andrzej Siewior
Reviewed-by: Shakeel Butt
Reviewed-by: Roman Gushchin
Acked-by: Michal Hocko
---
 mm/memcontrol.c | 24 ++++++++++++++++++------
 1 file changed, 18 insertions(+), 6 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 238ea77aade5d..4d049b4691afd 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2251,12 +2251,9 @@ static void drain_local_stock(struct work_struct *dummy)
  * Cache charges(val) to local per_cpu area.
  * This will be consumed by consume_stock() function, later.
  */
-static void refill_stock(struct mem_cgroup *memcg, unsigned int nr_pages)
+static void __refill_stock(struct mem_cgroup *memcg, unsigned int nr_pages)
 {
 	struct memcg_stock_pcp *stock;
-	unsigned long flags;
-
-	local_irq_save(flags);
 
 	stock = this_cpu_ptr(&memcg_stock);
 	if (stock->cached != memcg) { /* reset if necessary */
@@ -2268,7 +2265,14 @@ static void refill_stock(struct mem_cgroup *memcg, unsigned int nr_pages)
 
 	if (stock->nr_pages > MEMCG_CHARGE_BATCH)
 		drain_stock(stock);
+}
 
+static void refill_stock(struct mem_cgroup *memcg, unsigned int nr_pages)
+{
+	unsigned long flags;
+
+	local_irq_save(flags);
+	__refill_stock(memcg, nr_pages);
 	local_irq_restore(flags);
 }
 
@@ -3185,8 +3189,16 @@ static void drain_obj_stock(struct memcg_stock_pcp *stock)
 		unsigned int nr_pages = stock->nr_bytes >> PAGE_SHIFT;
 		unsigned int nr_bytes = stock->nr_bytes & (PAGE_SIZE - 1);
 
-		if (nr_pages)
-			obj_cgroup_uncharge_pages(old, nr_pages);
+		if (nr_pages) {
+			struct mem_cgroup *memcg;
+
+			memcg = get_mem_cgroup_from_objcg(old);
+
+			memcg_account_kmem(memcg, -nr_pages);
+			__refill_stock(memcg, nr_pages);
+
+			css_put(&memcg->css);
+		}
 
 		/*
 		 * The leftover is flushed to the centralized per-memcg value.
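The __refill_stock()/refill_stock() split follows the usual kernel double-underscore convention: the inner function assumes the caller already holds the protection, the plain-named wrapper takes and drops it. A generic, compilable illustration of the idiom with a pthread mutex standing in for the IRQ-off section (the names are illustrative, not the memcg functions):

#include <pthread.h>
#include <stdio.h>

pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
unsigned int stock_pages;

/* Inner part: caller must already hold `lock`. */
void __refill(unsigned int nr_pages)
{
	stock_pages += nr_pages;
}

/* Outer part: takes the protection itself. */
void refill(unsigned int nr_pages)
{
	pthread_mutex_lock(&lock);
	__refill(nr_pages);
	pthread_mutex_unlock(&lock);
}

/* A drain path that already holds `lock` can call __refill() without
 * recursively taking the protection - the recursion the patch is
 * preparing to avoid. */
void drain_locked(void)
{
	pthread_mutex_lock(&lock);
	__refill(1);
	pthread_mutex_unlock(&lock);
}

int main(void)
{
	refill(4);
	drain_locked();
	printf("%u\n", stock_pages);
	return 0;
}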
From patchwork Sat Feb 26 20:41:43 2022
From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
To: cgroups@vger.kernel.org, linux-mm@kvack.org
Cc: Andrew Morton, Johannes Weiner, Michal Hocko, Michal Koutný, Peter Zijlstra, Thomas Gleixner, Vladimir Davydov, Waiman Long, kernel test robot
Subject: [PATCH v5 5/6] mm/memcg: Protect memcg_stock with a local_lock_t
Date: Sat, 26 Feb 2022 21:41:43 +0100
Message-Id: <20220226204144.1008339-6-bigeasy@linutronix.de>
In-Reply-To: <20220226204144.1008339-1-bigeasy@linutronix.de>
References: <20220226204144.1008339-1-bigeasy@linutronix.de>
The members of the per-CPU structure memcg_stock_pcp are protected by
disabling interrupts. This does not work on PREEMPT_RT because it
creates atomic context, in which actions are performed which require
preemptible context. One example is obj_cgroup_release().

The IRQ-disable sections can be replaced with local_lock_t, which
preserves the explicit disabling of interrupts while keeping the code
preemptible on PREEMPT_RT.

drain_obj_stock() drops a reference on obj_cgroup, which leads to an
invocation of obj_cgroup_release() if it is the last object. This in
turn leads to recursive locking of the local_lock_t. To avoid this,
obj_cgroup_release() is invoked outside of the locked section.

obj_cgroup_uncharge_pages() can be invoked with the local_lock_t
acquired and without it. This later leads to a recursion in
refill_stock(). To avoid the locking recursion, provide
obj_cgroup_uncharge_pages_locked(), which uses the locked version of
refill_stock().

- Replace disabling interrupts for memcg_stock with a local_lock_t.
- Let drain_obj_stock() return the old struct obj_cgroup, which is
  passed to obj_cgroup_put() outside of the locked section.
- Provide obj_cgroup_uncharge_pages_locked(), which uses the locked
  version of refill_stock() to avoid recursive locking in
  drain_obj_stock().

Link: https://lkml.kernel.org/r/20220209014709.GA26885@xsang-OptiPlex-9020
Reported-by: kernel test robot
Signed-off-by: Sebastian Andrzej Siewior
Acked-by: Michal Hocko
---
 mm/memcontrol.c | 59 +++++++++++++++++++++++++++++++------------------
 1 file changed, 38 insertions(+), 21 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 4d049b4691afd..6439b0089d392 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2135,6 +2135,7 @@ void unlock_page_memcg(struct page *page)
 }
 
 struct memcg_stock_pcp {
+	local_lock_t stock_lock;
 	struct mem_cgroup *cached; /* this never be root cgroup */
 	unsigned int nr_pages;
 
@@ -2150,18 +2151,21 @@ struct memcg_stock_pcp {
 	unsigned long flags;
 #define FLUSHING_CACHED_CHARGE	0
 };
-static DEFINE_PER_CPU(struct memcg_stock_pcp, memcg_stock);
+static DEFINE_PER_CPU(struct memcg_stock_pcp, memcg_stock) = {
+	.stock_lock = INIT_LOCAL_LOCK(stock_lock),
+};
 static DEFINE_MUTEX(percpu_charge_mutex);
 
 #ifdef CONFIG_MEMCG_KMEM
-static void drain_obj_stock(struct memcg_stock_pcp *stock);
+static struct obj_cgroup *drain_obj_stock(struct memcg_stock_pcp *stock);
 static bool obj_stock_flush_required(struct memcg_stock_pcp *stock,
 				     struct mem_cgroup *root_memcg);
 static void memcg_account_kmem(struct mem_cgroup *memcg, int nr_pages);
 
 #else
-static inline void drain_obj_stock(struct memcg_stock_pcp *stock)
+static inline struct obj_cgroup *drain_obj_stock(struct memcg_stock_pcp *stock)
 {
+	return NULL;
 }
 static bool obj_stock_flush_required(struct memcg_stock_pcp *stock,
 				     struct mem_cgroup *root_memcg)
@@ -2193,7 +2197,7 @@ static bool consume_stock(struct mem_cgroup *memcg, unsigned int nr_pages)
 	if (nr_pages > MEMCG_CHARGE_BATCH)
 		return ret;
 
-	local_irq_save(flags);
+	local_lock_irqsave(&memcg_stock.stock_lock, flags);
 
 	stock = this_cpu_ptr(&memcg_stock);
 	if (memcg == stock->cached && stock->nr_pages >= nr_pages) {
@@ -2201,7 +2205,7 @@ static bool consume_stock(struct mem_cgroup *memcg, unsigned int nr_pages)
 		ret = true;
 	}
 
-	local_irq_restore(flags);
+	local_unlock_irqrestore(&memcg_stock.stock_lock, flags);
 
 	return ret;
 }
@@ -2230,6 +2234,7 @@ static void drain_stock(struct memcg_stock_pcp *stock)
 static void drain_local_stock(struct work_struct *dummy)
 {
 	struct memcg_stock_pcp *stock;
+	struct obj_cgroup *old = NULL;
 	unsigned long flags;
 
 	/*
@@ -2237,14 +2242,16 @@ static void drain_local_stock(struct work_struct *dummy)
 	 * drain_stock races is that we always operate on local CPU stock
 	 * here with IRQ disabled
 	 */
-	local_irq_save(flags);
+	local_lock_irqsave(&memcg_stock.stock_lock, flags);
 
 	stock = this_cpu_ptr(&memcg_stock);
-	drain_obj_stock(stock);
+	old = drain_obj_stock(stock);
 	drain_stock(stock);
 	clear_bit(FLUSHING_CACHED_CHARGE, &stock->flags);
 
-	local_irq_restore(flags);
+	local_unlock_irqrestore(&memcg_stock.stock_lock, flags);
+	if (old)
+		obj_cgroup_put(old);
 }
 
 /*
@@ -2271,9 +2278,9 @@ static void refill_stock(struct mem_cgroup *memcg, unsigned int nr_pages)
 {
 	unsigned long flags;
 
-	local_irq_save(flags);
+	local_lock_irqsave(&memcg_stock.stock_lock, flags);
 	__refill_stock(memcg, nr_pages);
-	local_irq_restore(flags);
+	local_unlock_irqrestore(&memcg_stock.stock_lock, flags);
 }
 
 /*
@@ -3100,10 +3107,11 @@ void mod_objcg_state(struct obj_cgroup *objcg, struct pglist_data *pgdat,
 		     enum node_stat_item idx, int nr)
 {
 	struct memcg_stock_pcp *stock;
+	struct obj_cgroup *old = NULL;
 	unsigned long flags;
 	int *bytes;
 
-	local_irq_save(flags);
+	local_lock_irqsave(&memcg_stock.stock_lock, flags);
 	stock = this_cpu_ptr(&memcg_stock);
 
 	/*
@@ -3112,7 +3120,7 @@ void mod_objcg_state(struct obj_cgroup *objcg, struct pglist_data *pgdat,
 	 * changes.
 	 */
 	if (stock->cached_objcg != objcg) {
-		drain_obj_stock(stock);
+		old = drain_obj_stock(stock);
 		obj_cgroup_get(objcg);
 		stock->nr_bytes = atomic_read(&objcg->nr_charged_bytes)
 				? atomic_xchg(&objcg->nr_charged_bytes, 0) : 0;
@@ -3156,7 +3164,9 @@ void mod_objcg_state(struct obj_cgroup *objcg, struct pglist_data *pgdat,
 	if (nr)
 		mod_objcg_mlstate(objcg, pgdat, idx, nr);
 
-	local_irq_restore(flags);
+	local_unlock_irqrestore(&memcg_stock.stock_lock, flags);
+	if (old)
+		obj_cgroup_put(old);
 }
 
 static bool consume_obj_stock(struct obj_cgroup *objcg, unsigned int nr_bytes)
@@ -3165,7 +3175,7 @@ static bool consume_obj_stock(struct obj_cgroup *objcg, unsigned int nr_bytes)
 	unsigned long flags;
 	bool ret = false;
 
-	local_irq_save(flags);
+	local_lock_irqsave(&memcg_stock.stock_lock, flags);
 
 	stock = this_cpu_ptr(&memcg_stock);
 	if (objcg == stock->cached_objcg && stock->nr_bytes >= nr_bytes) {
@@ -3173,17 +3183,17 @@ static bool consume_obj_stock(struct obj_cgroup *objcg, unsigned int nr_bytes)
 		ret = true;
 	}
 
-	local_irq_restore(flags);
+	local_unlock_irqrestore(&memcg_stock.stock_lock, flags);
 
 	return ret;
 }
 
-static void drain_obj_stock(struct memcg_stock_pcp *stock)
+static struct obj_cgroup *drain_obj_stock(struct memcg_stock_pcp *stock)
 {
 	struct obj_cgroup *old = stock->cached_objcg;
 
 	if (!old)
-		return;
+		return NULL;
 
 	if (stock->nr_bytes) {
 		unsigned int nr_pages = stock->nr_bytes >> PAGE_SHIFT;
@@ -3233,8 +3243,12 @@ static void drain_obj_stock(struct memcg_stock_pcp *stock)
 		stock->cached_pgdat = NULL;
 	}
 
-	obj_cgroup_put(old);
 	stock->cached_objcg = NULL;
+	/*
+	 * The `old' object needs to be released by the caller via
+	 * obj_cgroup_put() outside of memcg_stock_pcp::stock_lock.
+	 */
+	return old;
 }
 
 static bool obj_stock_flush_required(struct memcg_stock_pcp *stock,
@@ -3255,14 +3269,15 @@ static void refill_obj_stock(struct obj_cgroup *objcg, unsigned int nr_bytes,
 			     bool allow_uncharge)
 {
 	struct memcg_stock_pcp *stock;
+	struct obj_cgroup *old = NULL;
 	unsigned long flags;
 	unsigned int nr_pages = 0;
 
-	local_irq_save(flags);
+	local_lock_irqsave(&memcg_stock.stock_lock, flags);
 
 	stock = this_cpu_ptr(&memcg_stock);
 	if (stock->cached_objcg != objcg) { /* reset if necessary */
-		drain_obj_stock(stock);
+		old = drain_obj_stock(stock);
 		obj_cgroup_get(objcg);
 		stock->cached_objcg = objcg;
 		stock->nr_bytes = atomic_read(&objcg->nr_charged_bytes)
@@ -3276,7 +3291,9 @@ static void refill_obj_stock(struct obj_cgroup *objcg, unsigned int nr_bytes,
 		stock->nr_bytes &= (PAGE_SIZE - 1);
 	}
 
-	local_irq_restore(flags);
+	local_unlock_irqrestore(&memcg_stock.stock_lock, flags);
+	if (old)
+		obj_cgroup_put(old);
 
 	if (nr_pages)
 		obj_cgroup_uncharge_pages(objcg, nr_pages);
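The central pattern of this patch — detach the stale object under the lock, drop the reference only after unlocking — can be condensed into the following standalone model. A pthread mutex and a plain refcount stand in for local_lock_t and obj_cgroup; this is a sketch of the pattern, not the kernel code.

#include <pthread.h>
#include <stdlib.h>

struct objcg { int refcnt; };

pthread_mutex_t stock_lock = PTHREAD_MUTEX_INITIALIZER;
struct objcg *cached;

/* Must be called WITHOUT stock_lock held: releasing the last reference
 * may itself need the lock (the recursion the patch avoids). */
void objcg_put(struct objcg *o)
{
	if (--o->refcnt == 0)
		free(o);
}

/* Runs under stock_lock; only detaches and hands back the old object. */
struct objcg *drain_stock_locked(void)
{
	struct objcg *old = cached;

	cached = NULL;
	return old;	/* caller puts it after unlocking */
}

int main(void)
{
	struct objcg *old;

	cached = calloc(1, sizeof(*cached));
	cached->refcnt = 1;

	pthread_mutex_lock(&stock_lock);
	old = drain_stock_locked();
	pthread_mutex_unlock(&stock_lock);

	if (old)
		objcg_put(old);	/* safe: lock no longer held */
	return 0;
}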
From patchwork Sat Feb 26 20:41:44 2022
From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
To: cgroups@vger.kernel.org, linux-mm@kvack.org
Cc: Andrew Morton, Johannes Weiner, Michal Hocko, Michal Koutný, Peter Zijlstra, Thomas Gleixner, Vladimir Davydov, Waiman Long
Subject: [PATCH v5 6/6] mm/memcg: Disable migration instead of preemption in drain_all_stock().
Date: Sat, 26 Feb 2022 21:41:44 +0100
Message-Id: <20220226204144.1008339-7-bigeasy@linutronix.de>
In-Reply-To: <20220226204144.1008339-1-bigeasy@linutronix.de>
References: <20220226204144.1008339-1-bigeasy@linutronix.de>

Before the for-each-CPU loop, preemption is disabled so that
drain_local_stock() can be invoked directly instead of scheduling a
worker. Ensuring that drain_local_stock() completed on the local CPU is
not a correctness problem. It _could_ be that the charging path will be
forced to reclaim memory because cached charges are still waiting for
their draining.

Disabling preemption before invoking drain_local_stock() is problematic
on PREEMPT_RT due to the sleeping locks involved. To ensure that no CPU
migration happens across for_each_online_cpu(), it is enough to use
migrate_disable(), which disables migration and keeps the context
preemptible so that a sleeping lock can be acquired.

A race with CPU hotplug is not a problem because pcp data is not going
away. In the worst case we just schedule draining of an empty stock.

Use migrate_disable() instead of get_cpu() around the
for_each_online_cpu() loop.

Signed-off-by: Sebastian Andrzej Siewior
Acked-by: Michal Hocko
---
 mm/memcontrol.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 6439b0089d392..89664d8094bc0 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2300,7 +2300,8 @@ static void drain_all_stock(struct mem_cgroup *root_memcg)
 	 * as well as workers from this path always operate on the local
 	 * per-cpu data. CPU up doesn't touch memcg_stock at all.
 	 */
-	curcpu = get_cpu();
+	migrate_disable();
+	curcpu = smp_processor_id();
 	for_each_online_cpu(cpu) {
 		struct memcg_stock_pcp *stock = &per_cpu(memcg_stock, cpu);
 		struct mem_cgroup *memcg;
@@ -2323,7 +2324,7 @@ static void drain_all_stock(struct mem_cgroup *root_memcg)
 			schedule_work_on(cpu, &stock->work);
 		}
 	}
-	put_cpu();
+	migrate_enable();
 	mutex_unlock(&percpu_charge_mutex);
 }
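To spell out the difference between the swapped primitives: get_cpu() disables preemption, which on PREEMPT_RT forbids taking sleeping locks, while migrate_disable() only pins the task to its current CPU and leaves it preemptible. A stubbed standalone sketch of the resulting loop shape follows; all primitives and the CPU iteration are stand-ins, not the kernel API.

#include <stdio.h>

#define NR_CPUS 4

/* Stand-ins for the kernel primitives. */
int migration_disabled;
void migrate_disable(void) { migration_disabled++; }	/* pin to CPU, stay preemptible */
void migrate_enable(void)  { migration_disabled--; }
int  smp_processor_id(void) { return 0; }	/* stable while pinned */

void drain_local(void)         { puts("drain locally"); }
void schedule_work_on(int cpu) { printf("worker on cpu%d\n", cpu); }

void drain_all_stock(void)
{
	int cpu, curcpu;

	migrate_disable();		/* was: curcpu = get_cpu(); */
	curcpu = smp_processor_id();
	for (cpu = 0; cpu < NR_CPUS; cpu++) {
		if (cpu == curcpu)
			drain_local();	/* may take a sleeping lock on RT */
		else
			schedule_work_on(cpu);
	}
	migrate_enable();		/* was: put_cpu(); */
}

int main(void)
{
	drain_all_stock();
	return 0;
}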