From patchwork Sat Sep 12 15:40:18 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Muchun Song
X-Patchwork-Id: 11771947
From: Muchun Song
To: tj@kernel.org, lizefan@huawei.com, hannes@cmpxchg.org, corbet@lwn.net,
 mhocko@kernel.org, vdavydov.dev@gmail.com, akpm@linux-foundation.org,
 shakeelb@google.com, guro@fb.com
Cc: cgroups@vger.kernel.org, linux-doc@vger.kernel.org,
 linux-kernel@vger.kernel.org, linux-mm@kvack.org, Muchun Song
Subject: [PATCH v2] mm: memcontrol: Add the missing numa_stat interface for
 cgroup v2
Date: Sat, 12 Sep 2020 23:40:18 +0800
Message-Id: <20200912154018.24910-1-songmuchun@bytedance.com>
X-Mailer: git-send-email 2.21.0 (Apple Git-122)

In cgroup v1, we have a numa_stat interface. This is useful for
providing visibility into the NUMA locality information within a memcg,
since pages are allowed to be allocated from any physical node. One of
the use cases is evaluating application performance by combining this
information with the application's CPU allocation. cgroup v2 has no
such interface, so this patch adds the missing memory.numa_stat file.

Suggested-by: Shakeel Butt
Signed-off-by: Muchun Song
---
 changelog in v2:
 1. Add memory.numa_stat interface in cgroup v2.

 Documentation/admin-guide/cgroup-v2.rst | 72 +++++++++++++++++++++
 mm/memcontrol.c                         | 84 +++++++++++++++++++++++++
 2 files changed, 156 insertions(+)

diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index 6be43781ec7f..92207f0012e4 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -1368,6 +1368,78 @@ PAGE_SIZE multiple when read back.
 	  collapsing an existing range of pages. This counter is not
 	  present when CONFIG_TRANSPARENT_HUGEPAGE is not set.
 
+  memory.numa_stat
+	A read-only nested-keyed file which exists on non-root cgroups.
+
+	This breaks down the cgroup's memory footprint into different
+	types of memory, type-specific details, and other information
+	per node on the state of the memory management system.
+
+	This is useful for providing visibility into the NUMA locality
+	information within a memcg since the pages are allowed to be
+	allocated from any physical node. One of the use cases is evaluating
+	application performance by combining this information with the
+	application's CPU allocation.
+
+	All memory amounts are in bytes.
+
+	The output format of memory.numa_stat is::
+
+	  type N0= N1= ...
+
+	The entries are ordered to be human readable, and new entries
+	can show up in the middle. Don't rely on items remaining in a
+	fixed position; use the keys to look up specific values!
+
+	  anon
+		Amount of memory per node used in anonymous mappings such
+		as brk(), sbrk(), and mmap(MAP_ANONYMOUS)
+
+	  file
+		Amount of memory per node used to cache filesystem data,
+		including tmpfs and shared memory.
+
+	  kernel_stack
+		Amount of memory per node allocated to kernel stacks.
+
+	  shmem
+		Amount of cached filesystem data per node that is swap-backed,
+		such as tmpfs, shm segments, shared anonymous mmap()s
+
+	  file_mapped
+		Amount of cached filesystem data per node mapped with mmap()
+
+	  file_dirty
+		Amount of cached filesystem data per node that was modified but
+		not yet written back to disk
+
+	  file_writeback
+		Amount of cached filesystem data per node that was modified and
+		is currently being written back to disk
+
+	  anon_thp
+		Amount of memory per node used in anonymous mappings backed by
+		transparent hugepages
+
+	  inactive_anon, active_anon, inactive_file, active_file, unevictable
+		Amount of memory, swap-backed and filesystem-backed,
+		per node on the internal memory management lists used
+		by the page reclaim algorithm.
+
+		As these represent internal list state (eg. shmem pages are on anon
+		memory management lists), inactive_foo + active_foo may not be equal to
+		the value for the foo counter, since the foo counter is type-based, not
+		list-based.
+
+	  slab_reclaimable
+		Amount of memory per node used for storing in-kernel data
+		structures which might be reclaimed, such as dentries and
+		inodes.
+
+	  slab_unreclaimable
+		Amount of memory per node used for storing in-kernel data
+		structures which cannot be reclaimed on memory pressure.
+
   memory.swap.current
 	A read-only single value file which exists on non-root
 	cgroups.
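Not part of the patch: the documentation hunk above asks readers of
memory.numa_stat to look values up by key rather than by position. A minimal
userspace sketch of such a lookup follows. The helper name numa_stat_lookup()
and its signature are hypothetical, and it only assumes the
"type N0=... N1=..." line layout described in the hunk.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not from the patch): find the value for node `nid`
 * on the line whose key is `type` in a buffer holding memory.numa_stat-style
 * text. Returns 0 and stores the value in *out on success, -1 if the key or
 * the node is absent. Lines are matched by key, never by position.
 */
static int numa_stat_lookup(const char *text, const char *type, int nid,
			    unsigned long long *out)
{
	size_t type_len = strlen(type);
	char node_key[32];
	size_t node_key_len;
	const char *line = text;

	node_key_len = (size_t)snprintf(node_key, sizeof(node_key), "N%d=", nid);

	while (line && *line) {
		const char *eol = strchr(line, '\n');
		const char *end = eol ? eol : line + strlen(line);

		/* The key must be the whole first token ("anon" != "anon_thp"). */
		if ((size_t)(end - line) > type_len &&
		    strncmp(line, type, type_len) == 0 &&
		    line[type_len] == ' ') {
			const char *p = line + type_len;

			while (p < end) {
				/* Skip separators, then compare the "N<nid>=" prefix. */
				while (p < end && *p == ' ')
					p++;
				if ((size_t)(end - p) > node_key_len &&
				    strncmp(p, node_key, node_key_len) == 0) {
					*out = strtoull(p + node_key_len, NULL, 10);
					return 0;
				}
				/* No match: advance to the end of this token. */
				while (p < end && *p != ' ')
					p++;
			}
			return -1;	/* key found, node missing */
		}
		line = eol ? eol + 1 : NULL;
	}
	return -1;	/* key not found */
}
```

A caller would read the whole file into a NUL-terminated buffer first (the
path depends on where cgroup2 is mounted and on the cgroup name, so it is left
out here) and then call, for example, numa_stat_lookup(buf, "anon", 0, &val).
Matching the full first token keeps "anon" from matching "anon_thp", and
matching "N<nid>=" including the '=' keeps N1 from matching N10.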
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 75cd1a1e66c8..f2ef9a770eeb 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -6425,6 +6425,84 @@ static int memory_stat_show(struct seq_file *m, void *v)
 	return 0;
 }
 
+#ifdef CONFIG_NUMA
+static unsigned long memcg_node_page_state(struct mem_cgroup *memcg,
+					   unsigned int nid,
+					   enum node_stat_item idx)
+{
+	VM_BUG_ON(nid >= nr_node_ids);
+	return lruvec_page_state(mem_cgroup_lruvec(memcg, NODE_DATA(nid)), idx);
+}
+
+static const char *memory_numa_stat_format(struct mem_cgroup *memcg)
+{
+	struct numa_stat {
+		const char *name;
+		unsigned int ratio;
+		enum node_stat_item idx;
+	};
+
+	static const struct numa_stat stats[] = {
+		{ "anon", PAGE_SIZE, NR_ANON_MAPPED },
+		{ "file", PAGE_SIZE, NR_FILE_PAGES },
+		{ "kernel_stack", 1024, NR_KERNEL_STACK_KB },
+		{ "shmem", PAGE_SIZE, NR_SHMEM },
+		{ "file_mapped", PAGE_SIZE, NR_FILE_MAPPED },
+		{ "file_dirty", PAGE_SIZE, NR_FILE_DIRTY },
+		{ "file_writeback", PAGE_SIZE, NR_WRITEBACK },
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+		{ "anon_thp", HPAGE_PMD_SIZE, NR_ANON_THPS },
+#endif
+		{ "inactive_anon", PAGE_SIZE, NR_INACTIVE_ANON },
+		{ "active_anon", PAGE_SIZE, NR_ACTIVE_ANON },
+		{ "inactive_file", PAGE_SIZE, NR_INACTIVE_FILE },
+		{ "active_file", PAGE_SIZE, NR_ACTIVE_FILE },
+		{ "unevictable", PAGE_SIZE, NR_UNEVICTABLE },
+		{ "slab_reclaimable", 1, NR_SLAB_RECLAIMABLE_B },
+		{ "slab_unreclaimable", 1, NR_SLAB_UNRECLAIMABLE_B },
+	};
+
+	int i, nid;
+	struct seq_buf s;
+
+	/* Reserve a byte for the trailing null */
+	seq_buf_init(&s, kmalloc(PAGE_SIZE, GFP_KERNEL), PAGE_SIZE - 1);
+	if (!s.buffer)
+		return NULL;
+
+	for (i = 0; i < ARRAY_SIZE(stats); i++) {
+		seq_buf_printf(&s, "%s", stats[i].name);
+		for_each_node_state(nid, N_MEMORY) {
+			u64 size;
+
+			size = memcg_node_page_state(memcg, nid, stats[i].idx);
+			size *= stats[i].ratio;
+			seq_buf_printf(&s, " N%d=%llu", nid, size);
+		}
+		seq_buf_putc(&s, '\n');
+	}
+
+	/* The above should easily fit into one page */
+	if (WARN_ON_ONCE(seq_buf_putc(&s, '\0')))
+		s.buffer[PAGE_SIZE - 1] = '\0';
+
+	return s.buffer;
+}
+
+static int memory_numa_stat_show(struct seq_file *m, void *v)
+{
+	struct mem_cgroup *memcg = mem_cgroup_from_seq(m);
+	const char *buf;
+
+	buf = memory_numa_stat_format(memcg);
+	if (!buf)
+		return -ENOMEM;
+	seq_puts(m, buf);
+	kfree(buf);
+	return 0;
+}
+#endif
+
 static int memory_oom_group_show(struct seq_file *m, void *v)
 {
 	struct mem_cgroup *memcg = mem_cgroup_from_seq(m);
@@ -6502,6 +6580,12 @@ static struct cftype memory_files[] = {
 		.name = "stat",
 		.seq_show = memory_stat_show,
 	},
+#ifdef CONFIG_NUMA
+	{
+		.name = "numa_stat",
+		.seq_show = memory_numa_stat_show,
+	},
+#endif
 	{
 		.name = "oom.group",
 		.flags = CFTYPE_NOT_ON_ROOT | CFTYPE_NS_DELEGATABLE,