From patchwork Tue Mar 28 06:16:35 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Yosry Ahmed
X-Patchwork-Id: 13190559
Date: Tue, 28 Mar 2023 06:16:35 +0000
In-Reply-To: <20230328061638.203420-1-yosryahmed@google.com>
References: <20230328061638.203420-1-yosryahmed@google.com>
X-Mailer: git-send-email 2.40.0.348.gf938b09366-goog
Message-ID: <20230328061638.203420-7-yosryahmed@google.com>
Subject: [PATCH v1 6/9] memcg: sleep during flushing stats in safe contexts
From: Yosry Ahmed
To: Tejun Heo, Josef Bacik, Jens Axboe, Zefan Li, Johannes Weiner,
 Michal Hocko, Roman Gushchin, Shakeel Butt, Muchun Song, Andrew Morton,
 Michal Koutný
Cc: Vasily Averin, cgroups@vger.kernel.org, linux-block@vger.kernel.org,
 linux-kernel@vger.kernel.org, linux-mm@kvack.org, bpf@vger.kernel.org,
 Yosry Ahmed

Currently, all contexts that flush memcg stats do so with sleeping not
allowed.
Some of these contexts are perfectly safe to sleep in, such as
reading cgroup files from userspace or the background periodic flusher.

Refactor the code to make mem_cgroup_flush_stats() non-atomic (aka
sleepable), and provide a separate atomic version. The atomic version is
used in reclaim, refault, writeback, and in mem_cgroup_usage(). All other
code paths are left to use the non-atomic version. This includes
callbacks for userspace reads and the periodic flusher.

Since refault is the only caller of mem_cgroup_flush_stats_ratelimited(),
this function is changed to call the atomic version of
mem_cgroup_flush_stats(). Reclaim and refault code paths are modified to
do non-atomic flushing in separate later patches -- so
mem_cgroup_flush_stats_ratelimited() will eventually become non-atomic.

Signed-off-by: Yosry Ahmed
Acked-by: Shakeel Butt
---
 include/linux/memcontrol.h |  5 ++++
 mm/memcontrol.c            | 58 ++++++++++++++++++++++++++++++++------
 mm/vmscan.c                |  2 +-
 3 files changed, 55 insertions(+), 10 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index ac3f3b3a45e2..a4bc3910a2eb 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -1037,6 +1037,7 @@ static inline unsigned long lruvec_page_state_local(struct lruvec *lruvec,
 }
 
 void mem_cgroup_flush_stats(void);
+void mem_cgroup_flush_stats_atomic(void);
 void mem_cgroup_flush_stats_ratelimited(void);
 
 void __mod_memcg_lruvec_state(struct lruvec *lruvec, enum node_stat_item idx,
@@ -1535,6 +1536,10 @@ static inline void mem_cgroup_flush_stats(void)
 {
 }
 
+static inline void mem_cgroup_flush_stats_atomic(void)
+{
+}
+
 static inline void mem_cgroup_flush_stats_ratelimited(void)
 {
 }
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 64ff33e02c96..57e8cbf701f3 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -634,7 +634,7 @@ static inline void memcg_rstat_updated(struct mem_cgroup *memcg, int val)
 	}
 }
 
-static void __mem_cgroup_flush_stats(void)
+static bool mem_cgroup_pre_stats_flush(void)
 {
 	/*
 	 * We always flush the entire tree, so concurrent flushers can just
@@ -642,24 +642,57 @@ static void __mem_cgroup_flush_stats(void)
 	 * from memcg flushers (e.g. reclaim, refault, etc).
 	 */
 	if (atomic_xchg(&stats_flush_ongoing, 1))
-		return;
+		return false;
 
 	WRITE_ONCE(flush_next_time, jiffies_64 + 2*FLUSH_TIME);
-	cgroup_rstat_flush_atomic(root_mem_cgroup->css.cgroup);
+	return true;
+}
+
+static void mem_cgroup_post_stats_flush(void)
+{
 	atomic_set(&stats_flush_threshold, 0);
 	atomic_set(&stats_flush_ongoing, 0);
 }
 
-void mem_cgroup_flush_stats(void)
+static bool mem_cgroup_should_flush_stats(void)
 {
-	if (atomic_read(&stats_flush_threshold) > num_online_cpus())
-		__mem_cgroup_flush_stats();
+	return atomic_read(&stats_flush_threshold) > num_online_cpus();
+}
+
+/* atomic functions, safe to call from any context */
+static void __mem_cgroup_flush_stats_atomic(void)
+{
+	if (mem_cgroup_pre_stats_flush()) {
+		cgroup_rstat_flush_atomic(root_mem_cgroup->css.cgroup);
+		mem_cgroup_post_stats_flush();
+	}
+}
+
+void mem_cgroup_flush_stats_atomic(void)
+{
+	if (mem_cgroup_should_flush_stats())
+		__mem_cgroup_flush_stats_atomic();
 }
 
 void mem_cgroup_flush_stats_ratelimited(void)
 {
 	if (time_after64(jiffies_64, READ_ONCE(flush_next_time)))
-		mem_cgroup_flush_stats();
+		mem_cgroup_flush_stats_atomic();
+}
+
+/* non-atomic functions, only safe from sleepable contexts */
+static void __mem_cgroup_flush_stats(void)
+{
+	if (mem_cgroup_pre_stats_flush()) {
+		cgroup_rstat_flush(root_mem_cgroup->css.cgroup);
+		mem_cgroup_post_stats_flush();
+	}
+}
+
+void mem_cgroup_flush_stats(void)
+{
+	if (mem_cgroup_should_flush_stats())
+		__mem_cgroup_flush_stats();
 }
 
 static void flush_memcg_stats_dwork(struct work_struct *w)
@@ -3684,9 +3717,12 @@ static unsigned long mem_cgroup_usage(struct mem_cgroup *memcg, bool swap)
 		 * done from irq context; use stale stats in this case.
 		 * Arguably, usage threshold events are not reliable on the root
 		 * memcg anyway since its usage is ill-defined.
+		 *
+		 * Additionally, other call paths through memcg_check_events()
+		 * disable irqs, so make sure we are flushing stats atomically.
 		 */
 		if (in_task())
-			mem_cgroup_flush_stats();
+			mem_cgroup_flush_stats_atomic();
 		val = memcg_page_state(memcg, NR_FILE_PAGES) +
 			memcg_page_state(memcg, NR_ANON_MAPPED);
 		if (swap)
@@ -4609,7 +4645,11 @@ void mem_cgroup_wb_stats(struct bdi_writeback *wb, unsigned long *pfilepages,
 	struct mem_cgroup *memcg = mem_cgroup_from_css(wb->memcg_css);
 	struct mem_cgroup *parent;
 
-	mem_cgroup_flush_stats();
+	/*
+	 * wb_writeback() takes a spinlock and calls
+	 * wb_over_bg_thresh()->mem_cgroup_wb_stats(). Do not sleep.
+	 */
+	mem_cgroup_flush_stats_atomic();
 
 	*pdirty = memcg_page_state(memcg, NR_FILE_DIRTY);
 	*pwriteback = memcg_page_state(memcg, NR_WRITEBACK);
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 9c1c5e8b24b8..a9511ccb936f 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2845,7 +2845,7 @@ static void prepare_scan_count(pg_data_t *pgdat, struct scan_control *sc)
 	 * Flush the memory cgroup stats, so that we read accurate per-memcg
 	 * lruvec stats for heuristics.
 	 */
-	mem_cgroup_flush_stats();
+	mem_cgroup_flush_stats_atomic();
 
 	/*
 	 * Determine the scan balance between anon and file LRUs.
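
For readers who want to poke at the pre/post split outside the kernel,
the following is a minimal userspace sketch of the same shape, not the
kernel code itself. It assumes C11 atomics in place of the kernel's
atomic_t helpers. The names NR_CPUS_ASSUMED, do_flush_atomic(),
do_flush_sleepable() and the two counters are hypothetical stand-ins
(for num_online_cpus(), cgroup_rstat_flush_atomic() and
cgroup_rstat_flush() respectively), and the flush_next_time rate
limiting is left out for brevity.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Userspace stand-ins for the kernel state touched by the patch. */
static atomic_int stats_flush_ongoing;
static atomic_int stats_flush_threshold;

/* Hypothetical constant standing in for num_online_cpus(). */
#define NR_CPUS_ASSUMED 4

/* Hypothetical counters so the two flush flavors can be observed. */
static int atomic_flushes;
static int sleepable_flushes;

/* Stand-in for cgroup_rstat_flush_atomic(): must not sleep. */
static void do_flush_atomic(void) { atomic_flushes++; }

/* Stand-in for cgroup_rstat_flush(): allowed to sleep. */
static void do_flush_sleepable(void) { sleepable_flushes++; }

/* Claim the flush; returns false if another flusher already holds it. */
static bool pre_stats_flush(void)
{
	return atomic_exchange(&stats_flush_ongoing, 1) == 0;
}

/* Reset the pending-update counter and release the claim. */
static void post_stats_flush(void)
{
	atomic_store(&stats_flush_threshold, 0);
	atomic_store(&stats_flush_ongoing, 0);
}

/* Flush only when enough updates have accumulated. */
static bool should_flush_stats(void)
{
	return atomic_load(&stats_flush_threshold) > NR_CPUS_ASSUMED;
}

/* Atomic variant: for callers that cannot sleep. */
static void flush_stats_atomic(void)
{
	if (should_flush_stats() && pre_stats_flush()) {
		do_flush_atomic();
		post_stats_flush();
	}
}

/* Sleepable variant: for callers that may sleep. */
static void flush_stats(void)
{
	if (should_flush_stats() && pre_stats_flush()) {
		do_flush_sleepable();
		post_stats_flush();
	}
}
```

In this sketch both variants share the same claim/release helpers and
differ only in which flush routine runs between them, which is the
property the refactoring above relies on. A caller that loses the
exchange in pre_stats_flush() simply returns and reads the existing,
possibly stale, stats.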