From patchwork Mon Aug 21 20:54:58 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Yosry Ahmed
X-Patchwork-Id: 13359808
Date: Mon, 21 Aug 2023 20:54:58 +0000
In-Reply-To: <20230821205458.1764662-1-yosryahmed@google.com>
References: <20230821205458.1764662-1-yosryahmed@google.com>
X-Mailer: git-send-email 2.42.0.rc1.204.g551eb34607-goog
Message-ID: <20230821205458.1764662-4-yosryahmed@google.com>
Subject: [PATCH 3/3] mm: memcg: use non-unified stats flushing for userspace reads
From: Yosry Ahmed
To: Andrew Morton
Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Shakeel Butt,
 Muchun Song, Ivan Babrou, Tejun Heo, linux-mm@kvack.org,
 cgroups@vger.kernel.org, linux-kernel@vger.kernel.org, Yosry Ahmed

Unified flushing allows for great concurrency for paths that attempt to
flush the stats, at the expense of
potential staleness and a single flusher paying the extra cost of
flushing the full tree. This tradeoff makes sense for in-kernel flushers
that may observe high concurrency (e.g. reclaim, refault). For userspace
readers, stale stats may be unexpected and problematic, especially when
such stats are used for critical paths such as userspace OOM handling.
Additionally, a userspace reader will occasionally pay the cost of
flushing the entire hierarchy, which also causes problems in some cases
[1].

Opt userspace reads out of unified flushing. This makes both the cost of
reading the stats (proportional to the size of the subtree) and the
freshness of the stats more predictable. Since userspace readers are not
expected to have similar concurrency to in-kernel flushers, serializing
them among themselves and among in-kernel flushers should be okay.

This was tested on a machine with 256 cpus by running a synthetic test
script that creates 50 top-level cgroups, each with 5 children (250 leaf
cgroups). Each leaf cgroup has 10 processes running that allocate memory
beyond the cgroup limit, invoking reclaim (which is an in-kernel unified
flusher). Concurrently, one thread is spawned per-cgroup to read the
stats every second (including root, top-level, and leaf cgroups -- so
total 251 threads).
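
For reference, a single iteration of a reader like the ones described
above could be sketched as follows. This is not the script used for the
numbers in this patch. `timed_read_ns` is a hypothetical helper, and the
path passed to it (for example a cgroup's memory.stat under
/sys/fs/cgroup) is an assumption about where cgroup v2 is mounted.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <stddef.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper: read the whole file at `path` once and return the
 * elapsed wall-clock time in nanoseconds, or -1 on error. If `total` is
 * non-NULL it receives the number of bytes read.
 *
 * For a memcg stats file the flush happens while the read is served, so
 * timing the read loop captures the flush cost seen by userspace.
 */
int64_t timed_read_ns(const char *path, size_t *total)
{
	char buf[4096];
	struct timespec start, end;
	size_t sum = 0;
	ssize_t n;
	int fd;

	assert(path != NULL);

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;

	clock_gettime(CLOCK_MONOTONIC, &start);
	while ((n = read(fd, buf, sizeof(buf))) > 0)
		sum += (size_t)n;
	clock_gettime(CLOCK_MONOTONIC, &end);
	close(fd);

	if (n < 0)
		return -1;
	if (total)
		*total = sum;

	return (int64_t)(end.tv_sec - start.tv_sec) * 1000000000LL +
	       (int64_t)(end.tv_nsec - start.tv_nsec);
}
```

A reader thread would call this once per second on its cgroup's stats
file and accumulate the returned values to obtain an average latency.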

No regressions were observed in the total running time, which means that
non-unified userspace readers are not slowing down in-kernel unified
flushers:

Base (mm-unstable):

real	0m18.228s
user	0m9.463s
sys	60m15.879s

real	0m20.828s
user	0m8.535s
sys	70m12.364s

real	0m19.789s
user	0m9.177s
sys	66m10.798s

With this patch:

real	0m19.632s
user	0m8.608s
sys	64m23.483s

real	0m18.463s
user	0m7.465s
sys	60m34.089s

real	0m20.309s
user	0m7.754s
sys	68m2.392s

Additionally, the average latency for reading stats went from roughly
40 ms to 5 ms, because we mostly read the stats of leaf cgroups in this
script, so we only have to flush one cgroup, instead of *sometimes*
flushing the entire tree with unified flushing.

[1] https://lore.kernel.org/lkml/CABWYdi0c6__rh-K7dcM_pkf9BJdTRtAU08M43KO9ME4-dsgfoQ@mail.gmail.com/

Signed-off-by: Yosry Ahmed
---
 mm/memcontrol.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 90f08b35fa77..d3b13a06224c 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1606,7 +1606,7 @@ static void memcg_stat_format(struct mem_cgroup *memcg, struct seq_buf *s)
 	 *
 	 * Current memory state:
 	 */
-	mem_cgroup_try_flush_stats();
+	do_stats_flush(memcg);
 
 	for (i = 0; i < ARRAY_SIZE(memory_stats); i++) {
 		u64 size;
@@ -4048,7 +4048,7 @@ static int memcg_numa_stat_show(struct seq_file *m, void *v)
 	int nid;
 	struct mem_cgroup *memcg = mem_cgroup_from_seq(m);
 
-	mem_cgroup_try_flush_stats();
+	do_stats_flush(memcg);
 
 	for (stat = stats; stat < stats + ARRAY_SIZE(stats); stat++) {
 		seq_printf(m, "%s=%lu", stat->name,
@@ -4123,7 +4123,7 @@ static void memcg1_stat_format(struct mem_cgroup *memcg, struct seq_buf *s)
 
 	BUILD_BUG_ON(ARRAY_SIZE(memcg1_stat_names) != ARRAY_SIZE(memcg1_stats));
 
-	mem_cgroup_try_flush_stats();
+	do_stats_flush(memcg);
 
 	for (i = 0; i < ARRAY_SIZE(memcg1_stats); i++) {
 		unsigned long nr;
@@ -4625,7 +4625,7 @@ void mem_cgroup_wb_stats(struct bdi_writeback *wb, unsigned long *pfilepages,
 	struct mem_cgroup *memcg = mem_cgroup_from_css(wb->memcg_css);
 	struct mem_cgroup *parent;
 
-	mem_cgroup_try_flush_stats();
+	do_stats_flush(memcg);
 
 	*pdirty = memcg_page_state(memcg, NR_FILE_DIRTY);
 	*pwriteback = memcg_page_state(memcg, NR_WRITEBACK);
@@ -6640,7 +6640,7 @@ static int memory_numa_stat_show(struct seq_file *m, void *v)
 	int i;
 	struct mem_cgroup *memcg = mem_cgroup_from_seq(m);
 
-	mem_cgroup_try_flush_stats();
+	do_stats_flush(memcg);
 
 	for (i = 0; i < ARRAY_SIZE(memory_stats); i++) {
 		int nid;
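
As a rough illustration of why passing the reader's memcg to the flush
bounds its cost, the toy userspace model below counts how many nodes a
flush has to visit for a given subtree root. It is only a cost model
under stated assumptions: every node is assumed to have pending updates,
nothing is propagated to parents, and none of the `toy_*` names exist in
the kernel. The tree shape follows the test description (root, 50
top-level cgroups, 5 children each).

```c
#include <assert.h>
#include <stddef.h>

#define MAX_NODES 512

/* Toy stand-in for a cgroup: parent index plus stat deltas not yet folded in. */
struct toy_cgroup {
	int parent;	/* -1 for the root */
	long pending;	/* unflushed delta */
	long flushed;	/* value visible to readers */
};

struct toy_tree {
	struct toy_cgroup node[MAX_NODES];
	int nr;
};

/* Add a node under `parent` (-1 creates the root); returns its index. */
int toy_add(struct toy_tree *t, int parent)
{
	assert(t->nr < MAX_NODES);
	assert(parent >= -1 && parent < t->nr);
	t->node[t->nr] = (struct toy_cgroup){ .parent = parent };
	return t->nr++;
}

/* True if `n` is `root` or one of its descendants. */
static int toy_in_subtree(const struct toy_tree *t, int n, int root)
{
	for (; n != -1; n = t->node[n].parent)
		if (n == root)
			return 1;
	return 0;
}

/* Flush every node in the subtree rooted at `root`; returns nodes visited. */
int toy_flush(struct toy_tree *t, int root)
{
	int i, visited = 0;

	for (i = 0; i < t->nr; i++) {
		if (!toy_in_subtree(t, i, root))
			continue;
		t->node[i].flushed += t->node[i].pending;
		t->node[i].pending = 0;
		visited++;
	}
	return visited;
}

/* Build root + 50 top-level + 5 children each; returns the root index. */
int toy_build(struct toy_tree *t, int *first_leaf)
{
	int i, j, root, top;

	t->nr = 0;
	root = toy_add(t, -1);
	for (i = 0; i < 50; i++) {
		top = toy_add(t, root);
		for (j = 0; j < 5; j++) {
			int leaf = toy_add(t, top);

			if (i == 0 && j == 0)
				*first_leaf = leaf;
		}
	}
	return root;
}
```

In this model a flush rooted at a leaf visits one node and a flush
rooted at a top-level cgroup visits six, while a flush from the root
visits every node in the tree, which is the cost a unified flush
occasionally pushes onto a single reader.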