From patchwork Thu Mar 23 04:00:34 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Yosry Ahmed
X-Patchwork-Id: 13184907
Date: Thu, 23 Mar 2023 04:00:34 +0000
In-Reply-To: <20230323040037.2389095-1-yosryahmed@google.com>
References: <20230323040037.2389095-1-yosryahmed@google.com>
X-Mailer: git-send-email 2.40.0.rc1.284.g88254d51c5-goog
Message-ID: <20230323040037.2389095-5-yosryahmed@google.com>
Subject: [RFC PATCH 4/7] memcg: sleep during flushing stats in safe contexts
From: Yosry Ahmed
To: Tejun Heo, Josef Bacik, Jens Axboe, Zefan Li, Johannes Weiner,
 Michal Hocko, Roman Gushchin, Shakeel Butt, Muchun Song, Andrew Morton
Cc: Vasily Averin, cgroups@vger.kernel.org, linux-block@vger.kernel.org,
 linux-kernel@vger.kernel.org, linux-mm@kvack.org, bpf@vger.kernel.org,
 Yosry Ahmed
Sender: owner-linux-mm@kvack.org
Precedence: bulk

Currently, all contexts that flush memcg stats do so with sleeping not allowed.
Some of these contexts are perfectly safe to sleep in, such as reading
cgroup files from userspace or the background periodic flusher.

Enable choosing whether sleeping is allowed or not when flushing memcg
stats, and allow sleeping in safe contexts to avoid unnecessarily
performing a lot of work without sleeping.

Signed-off-by: Yosry Ahmed
---
 include/linux/memcontrol.h |  8 ++++----
 mm/memcontrol.c            | 35 ++++++++++++++++++++++-------------
 mm/vmscan.c                |  2 +-
 mm/workingset.c            |  3 ++-
 4 files changed, 29 insertions(+), 19 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index b6eda2ab205d..0c7b286f2caf 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -1036,8 +1036,8 @@ static inline unsigned long lruvec_page_state_local(struct lruvec *lruvec,
 	return x;
 }
 
-void mem_cgroup_flush_stats(void);
-void mem_cgroup_flush_stats_delayed(void);
+void mem_cgroup_flush_stats(bool may_sleep);
+void mem_cgroup_flush_stats_delayed(bool may_sleep);
 
 void __mod_memcg_lruvec_state(struct lruvec *lruvec, enum node_stat_item idx,
 			      int val);
@@ -1531,11 +1531,11 @@ static inline unsigned long lruvec_page_state_local(struct lruvec *lruvec,
 	return node_page_state(lruvec_pgdat(lruvec), idx);
 }
 
-static inline void mem_cgroup_flush_stats(void)
+static inline void mem_cgroup_flush_stats(bool may_sleep)
 {
 }
 
-static inline void mem_cgroup_flush_stats_delayed(void)
+static inline void mem_cgroup_flush_stats_delayed(bool may_sleep)
 {
 }
 
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 72cd44f88d97..39a9c7a978ae 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -634,7 +634,7 @@ static inline void memcg_rstat_updated(struct mem_cgroup *memcg, int val)
 	}
 }
 
-static void __mem_cgroup_flush_stats(void)
+static void __mem_cgroup_flush_stats(bool may_sleep)
 {
 	/*
 	 * This lock can be acquired from interrupt context, but we only acquire
@@ -644,26 +644,26 @@ static void __mem_cgroup_flush_stats(void)
 		return;
 
 	flush_next_time = jiffies_64 + 2*FLUSH_TIME;
-	cgroup_rstat_flush(root_mem_cgroup->css.cgroup, false);
+	cgroup_rstat_flush(root_mem_cgroup->css.cgroup, may_sleep);
 	atomic_set(&stats_flush_threshold, 0);
 	spin_unlock(&stats_flush_lock);
 }
 
-void mem_cgroup_flush_stats(void)
+void mem_cgroup_flush_stats(bool may_sleep)
 {
 	if (atomic_read(&stats_flush_threshold) > num_online_cpus())
-		__mem_cgroup_flush_stats();
+		__mem_cgroup_flush_stats(may_sleep);
 }
 
-void mem_cgroup_flush_stats_delayed(void)
+void mem_cgroup_flush_stats_delayed(bool may_sleep)
 {
 	if (time_after64(jiffies_64, flush_next_time))
-		mem_cgroup_flush_stats();
+		mem_cgroup_flush_stats(may_sleep);
 }
 
 static void flush_memcg_stats_dwork(struct work_struct *w)
 {
-	__mem_cgroup_flush_stats();
+	__mem_cgroup_flush_stats(true);
 	queue_delayed_work(system_unbound_wq, &stats_flush_dwork, FLUSH_TIME);
 }
 
@@ -1570,7 +1570,7 @@ static void memory_stat_format(struct mem_cgroup *memcg, char *buf, int bufsize)
 	 *
 	 * Current memory state:
 	 */
-	mem_cgroup_flush_stats();
+	mem_cgroup_flush_stats(true);
 
 	for (i = 0; i < ARRAY_SIZE(memory_stats); i++) {
 		u64 size;
@@ -3671,7 +3671,11 @@ static unsigned long mem_cgroup_usage(struct mem_cgroup *memcg, bool swap)
 	unsigned long val;
 
 	if (mem_cgroup_is_root(memcg)) {
-		mem_cgroup_flush_stats();
+		/*
+		 * mem_cgroup_threshold() calls here from irqsafe context.
+		 * Don't sleep.
+		 */
+		mem_cgroup_flush_stats(false);
 		val = memcg_page_state(memcg, NR_FILE_PAGES) +
 			memcg_page_state(memcg, NR_ANON_MAPPED);
 		if (swap)
@@ -4014,7 +4018,7 @@ static int memcg_numa_stat_show(struct seq_file *m, void *v)
 	int nid;
 	struct mem_cgroup *memcg = mem_cgroup_from_seq(m);
 
-	mem_cgroup_flush_stats();
+	mem_cgroup_flush_stats(true);
 
 	for (stat = stats; stat < stats + ARRAY_SIZE(stats); stat++) {
 		seq_printf(m, "%s=%lu", stat->name,
@@ -4090,7 +4094,7 @@ static int memcg_stat_show(struct seq_file *m, void *v)
 
 	BUILD_BUG_ON(ARRAY_SIZE(memcg1_stat_names) != ARRAY_SIZE(memcg1_stats));
 
-	mem_cgroup_flush_stats();
+	mem_cgroup_flush_stats(true);
 
 	for (i = 0; i < ARRAY_SIZE(memcg1_stats); i++) {
 		unsigned long nr;
@@ -4594,7 +4598,12 @@ void mem_cgroup_wb_stats(struct bdi_writeback *wb, unsigned long *pfilepages,
 	struct mem_cgroup *memcg = mem_cgroup_from_css(wb->memcg_css);
 	struct mem_cgroup *parent;
 
-	mem_cgroup_flush_stats();
+	/*
+	 * wb_writeback() takes a spinlock and calls
+	 * wb_over_bg_thresh()->mem_cgroup_wb_stats().
+	 * Do not sleep.
+	 */
+	mem_cgroup_flush_stats(false);
 
 	*pdirty = memcg_page_state(memcg, NR_FILE_DIRTY);
 	*pwriteback = memcg_page_state(memcg, NR_WRITEBACK);
@@ -6596,7 +6605,7 @@ static int memory_numa_stat_show(struct seq_file *m, void *v)
 	int i;
 	struct mem_cgroup *memcg = mem_cgroup_from_seq(m);
 
-	mem_cgroup_flush_stats();
+	mem_cgroup_flush_stats(true);
 
 	for (i = 0; i < ARRAY_SIZE(memory_stats); i++) {
 		int nid;
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 9c1c5e8b24b8..59d1830d08ac 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2845,7 +2845,7 @@ static void prepare_scan_count(pg_data_t *pgdat, struct scan_control *sc)
 	 * Flush the memory cgroup stats, so that we read accurate per-memcg
 	 * lruvec stats for heuristics.
 	 */
-	mem_cgroup_flush_stats();
+	mem_cgroup_flush_stats(false);
 
 	/*
 	 * Determine the scan balance between anon and file LRUs.
diff --git a/mm/workingset.c b/mm/workingset.c
index 00c6f4d9d9be..042eabbb43f6 100644
--- a/mm/workingset.c
+++ b/mm/workingset.c
@@ -462,7 +462,8 @@ void workingset_refault(struct folio *folio, void *shadow)
 
 	mod_lruvec_state(lruvec, WORKINGSET_REFAULT_BASE + file, nr);
 
-	mem_cgroup_flush_stats_delayed();
+	/* Do not sleep with RCU lock held */
+	mem_cgroup_flush_stats_delayed(false);
 	/*
 	 * Compare the distance to the existing workingset size. We
 	 * don't activate pages that couldn't stay resident even if
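
Illustration only, not part of the patch: a small userspace model of the
may_sleep idea, in case it helps reviewers reason about the call sites.
Every name here (model_stats, model_yield, model_flush, MODEL_NR_CPUS) is
hypothetical and does not exist in the kernel. The patch does not show what
the rstat layer does with the flag; the sketch assumes it may give up the
CPU between per-CPU batches when sleeping is allowed, and runs to
completion otherwise.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names exist in the kernel. */
#define MODEL_NR_CPUS 4

struct model_stats {
	long percpu_pending[MODEL_NR_CPUS];	/* unflushed per-CPU deltas */
	long total;				/* aggregated value */
	int resched_points;			/* times the flusher yielded */
};

/* Stand-in for a voluntary reschedule point (assumption, see lead-in). */
static void model_yield(struct model_stats *s)
{
	s->resched_points++;
}

/*
 * Fold every per-CPU delta into the total. With may_sleep set, the
 * flusher yields between CPUs; with it clear, it never yields.
 */
static long model_flush(struct model_stats *s, bool may_sleep)
{
	assert(s != NULL);

	for (size_t cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
		s->total += s->percpu_pending[cpu];
		s->percpu_pending[cpu] = 0;
		if (may_sleep && cpu + 1 < MODEL_NR_CPUS)
			model_yield(s);
	}
	return s->total;
}
```

In this model a caller that holds a spinlock or an RCU read lock would pass
false, as the reclaim, writeback and refault paths do in the patch, while a
caller such as a userspace read of a stat file would pass true. Both calls
produce the same total; only the opportunity to yield differs.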