From patchwork Wed Sep 21 17:00:02 2022
X-Patchwork-Submitter: Yafang Shao
X-Patchwork-Id: 12984032
From: Yafang Shao <laoar.shao@gmail.com>
To: ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org, kafai@fb.com,
	songliubraving@fb.com, yhs@fb.com, john.fastabend@gmail.com,
	kpsingh@kernel.org, sdf@google.com, haoluo@google.com, jolsa@kernel.org,
	hannes@cmpxchg.org, mhocko@kernel.org, roman.gushchin@linux.dev,
	shakeelb@google.com, songmuchun@bytedance.com, akpm@linux-foundation.org,
	tj@kernel.org, lizefan.x@bytedance.com
Cc: cgroups@vger.kernel.org, netdev@vger.kernel.org, bpf@vger.kernel.org,
	linux-mm@kvack.org, Yafang Shao <laoar.shao@gmail.com>
Subject: [RFC PATCH bpf-next 10/10] bpf, memcg: Add new item bpf into memory.stat
Date: Wed, 21 Sep 2022 17:00:02 +0000
Message-Id: <20220921170002.29557-11-laoar.shao@gmail.com>
X-Mailer: git-send-email 2.31.1
In-Reply-To: <20220921170002.29557-1-laoar.shao@gmail.com>
References: <20220921170002.29557-1-laoar.shao@gmail.com>
MIME-Version: 1.0

A new item 'bpf' is introduced into memory.stat, so that we can get the
memory consumed by bpf. Currently only the memory of bpf maps is
accounted.

The accounting of this new item is implemented with scope-based
accounting, similar to set_active_memcg(). Within the scope, the memory
allocated will be accounted or unaccounted to a specific item, which is
specified by set_active_memcg_item().

In cgroup v1 the result is as follows:

$ cat /sys/fs/cgroup/memory/foo/memory.stat | grep bpf
bpf 109056000
total_bpf 109056000

After the map is removed, the counter becomes zero again:

$ cat /sys/fs/cgroup/memory/foo/memory.stat | grep bpf
bpf 0
total_bpf 0

Note that 'bpf' may not drop to 0 immediately after a bpf map is
destroyed, because there may still be cached objects.

Also note that there is no kmemcg in the root memory cgroup, so the item
'bpf' will always be 0 in the root memory cgroup. If a bpf map is
charged into the root memcg directly, its memory size will not be
accounted, so 'total_bpf' can't be used to monitor system-wide bpf
memory consumption yet.

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
---
 include/linux/bpf.h        | 10 ++++++++--
 include/linux/memcontrol.h |  1 +
 include/linux/sched.h      |  1 +
 include/linux/sched/mm.h   | 24 ++++++++++++++++++++++++
 kernel/bpf/memalloc.c      | 10 ++++++++++
 kernel/bpf/ringbuf.c       |  4 ++++
 kernel/bpf/syscall.c       | 40 ++++++++++++++++++++++++++++++++++++++--
 kernel/fork.c              |  1 +
 mm/memcontrol.c            | 20 ++++++++++++++++++++
 9 files changed, 107 insertions(+), 4 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index f7a4cfc..9eda143 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -1725,7 +1725,13 @@ void __percpu *bpf_map_alloc_percpu(const struct bpf_map *map, size_t size,
 void bpf_map_kvfree(const void *ptr);
 void bpf_map_free_percpu(void __percpu *ptr);
 
-#define bpf_map_kfree_rcu(ptr, rhf...) kvfree_rcu(ptr, ## rhf)
+#define bpf_map_kfree_rcu(ptr, rhf...) { \
+	int old_item; \
+	\
+	old_item = set_active_memcg_item(MEMCG_BPF); \
+	kvfree_rcu(ptr, ## rhf); \
+	set_active_memcg_item(old_item); \
+}
 
 #else
 
 static inline void *
@@ -1771,7 +1777,7 @@ static inline void bpf_map_free_percpu(void __percpu *ptr)
 
 #define bpf_map_kfree_rcu(ptr, rhf...) kvfree_rcu(ptr, ## rhf)
 
-#endif
+#endif /* CONFIG_MEMCG_KMEM */
 
 extern int sysctl_unprivileged_bpf_disabled;
 
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index d4a0ad3..f345467 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -37,6 +37,7 @@ enum memcg_stat_item {
 	MEMCG_KMEM,
 	MEMCG_ZSWAP_B,
 	MEMCG_ZSWAPPED,
+	MEMCG_BPF,
 	MEMCG_NR_STAT,
 };
 
diff --git a/include/linux/sched.h b/include/linux/sched.h
index e7b2f8a..79362da 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1423,6 +1423,7 @@ struct task_struct {
 
 	/* Used by memcontrol for targeted memcg charge: */
 	struct mem_cgroup		*active_memcg;
+	int				active_item;
 #endif
 
 #ifdef CONFIG_BLK_CGROUP
diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
index 2a24361..3a334c7 100644
--- a/include/linux/sched/mm.h
+++ b/include/linux/sched/mm.h
@@ -363,6 +363,7 @@ static inline void memalloc_pin_restore(unsigned int flags)
 
 #ifdef CONFIG_MEMCG
 DECLARE_PER_CPU(struct mem_cgroup *, int_active_memcg);
+DECLARE_PER_CPU(int, int_active_item);
 /**
  * set_active_memcg - Starts the remote memcg charging scope.
  * @memcg: memcg to charge.
@@ -389,12 +390,35 @@ static inline void memalloc_pin_restore(unsigned int flags)
 	return old;
 }
 
+static inline int
+set_active_memcg_item(int item)
+{
+	int old_item;
+
+	if (!in_task()) {
+		old_item = this_cpu_read(int_active_item);
+		this_cpu_write(int_active_item, item);
+	} else {
+		old_item = current->active_item;
+		current->active_item = item;
+	}
+
+	return old_item;
+}
+
 #else
 
 static inline struct mem_cgroup *
 set_active_memcg(struct mem_cgroup *memcg)
 {
 	return NULL;
 }
+
+static inline int
+set_active_memcg_item(int item)
+{
+	return MEMCG_NR_STAT;
+}
 #endif
 
 #ifdef CONFIG_MEMBARRIER
diff --git a/kernel/bpf/memalloc.c b/kernel/bpf/memalloc.c
index 5f83be1..51d59d4 100644
--- a/kernel/bpf/memalloc.c
+++ b/kernel/bpf/memalloc.c
@@ -165,11 +165,14 @@ static void alloc_bulk(struct bpf_mem_cache *c, int cnt, int node)
 {
 	struct mem_cgroup *memcg = NULL, *old_memcg;
 	unsigned long flags;
+	int old_item;
 	void *obj;
 	int i;
 
 	memcg = get_memcg(c);
 	old_memcg = set_active_memcg(memcg);
+	old_item = set_active_memcg_item(MEMCG_BPF);
+
 	for (i = 0; i < cnt; i++) {
 		obj = __alloc(c, node);
 		if (!obj)
@@ -194,19 +197,26 @@ static void alloc_bulk(struct bpf_mem_cache *c, int cnt, int node)
 		if (IS_ENABLED(CONFIG_PREEMPT_RT))
 			local_irq_restore(flags);
 	}
+
+	set_active_memcg_item(old_item);
 	set_active_memcg(old_memcg);
 	mem_cgroup_put(memcg);
 }
 
 static void free_one(struct bpf_mem_cache *c, void *obj)
 {
+	int old_item;
+
+	old_item = set_active_memcg_item(MEMCG_BPF);
 	if (c->percpu_size) {
 		free_percpu(((void **)obj)[1]);
 		kfree(obj);
+		set_active_memcg_item(old_item);
 		return;
 	}
 
 	kfree(obj);
+	set_active_memcg_item(old_item);
 }
 
 static void __free_rcu(struct rcu_head *head)
diff --git a/kernel/bpf/ringbuf.c b/kernel/bpf/ringbuf.c
index 535e440..72435bd 100644
--- a/kernel/bpf/ringbuf.c
+++ b/kernel/bpf/ringbuf.c
@@ -61,7 +61,11 @@ struct bpf_ringbuf_hdr {
 
 static inline void bpf_map_free_page(struct page *page)
 {
+	int old_item;
+
+	old_item = set_active_memcg_item(MEMCG_BPF);
 	__free_page(page);
+	set_active_memcg_item(old_item);
 }
 
 static void bpf_ringbuf_pages_free(struct page **pages, int nr_pages)
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index b9250c8..703aa6a 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -340,11 +340,14 @@ static void *__bpf_map_area_alloc(u64 size, int numa_node, bool mmapable)
 	const gfp_t gfp = __GFP_NOWARN | __GFP_ZERO | __GFP_ACCOUNT;
 	unsigned int flags = 0;
 	unsigned long align = 1;
+	int old_item;
 	void *area;
+	void *ptr;
 
 	if (size >= SIZE_MAX)
 		return NULL;
 
+	old_item = set_active_memcg_item(MEMCG_BPF);
 	/* kmalloc()'ed memory can't be mmap()'ed */
 	if (mmapable) {
 		BUG_ON(!PAGE_ALIGNED(size));
@@ -353,13 +356,18 @@ static void *__bpf_map_area_alloc(u64 size, int numa_node, bool mmapable)
 	} else if (size <= (PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER)) {
 		area = kmalloc_node(size, gfp | GFP_USER | __GFP_NORETRY,
 				    numa_node);
-		if (area != NULL)
+		if (area != NULL) {
+			set_active_memcg_item(old_item);
 			return area;
+		}
 	}
 
-	return __vmalloc_node_range(size, align, VMALLOC_START, VMALLOC_END,
+	ptr = __vmalloc_node_range(size, align, VMALLOC_START, VMALLOC_END,
 			gfp | GFP_KERNEL | __GFP_RETRY_MAYFAIL, PAGE_KERNEL,
 			flags, numa_node, __builtin_return_address(0));
+
+	set_active_memcg_item(old_item);
+	return ptr;
 }
 
 void *bpf_map_area_alloc(u64 size, int numa_node, struct bpf_map *map)
@@ -386,9 +394,13 @@ void *bpf_map_area_mmapable_alloc(u64 size, int numa_node)
 
 void bpf_map_area_free(void *area, struct bpf_map *map)
 {
+	int old_item;
+
 	if (map)
 		bpf_map_release_memcg(map);
 
+	old_item = set_active_memcg_item(MEMCG_BPF);
 	kvfree(area);
+	set_active_memcg_item(old_item);
 }
 
 static u32 bpf_map_flags_retain_permanent(u32 flags)
@@ -464,11 +476,14 @@ void *bpf_map_kmalloc_node(const struct bpf_map *map, size_t size, gfp_t flags,
 			   int node)
 {
 	struct mem_cgroup *memcg, *old_memcg;
+	int old_item;
 	void *ptr;
 
 	memcg = bpf_map_get_memcg(map);
 	old_memcg = set_active_memcg(memcg);
+	old_item = set_active_memcg_item(MEMCG_BPF);
 	ptr = kmalloc_node(size, flags | __GFP_ACCOUNT, node);
+	set_active_memcg_item(old_item);
 	set_active_memcg(old_memcg);
 	bpf_map_put_memcg(memcg);
 
@@ -479,10 +494,13 @@ void *bpf_map_kzalloc(const struct bpf_map *map, size_t size, gfp_t flags)
 {
 	struct mem_cgroup *memcg, *old_memcg;
 	void *ptr;
+	int old_item;
 
 	memcg = bpf_map_get_memcg(map);
 	old_memcg = set_active_memcg(memcg);
+	old_item = set_active_memcg_item(MEMCG_BPF);
 	ptr = kzalloc(size, flags | __GFP_ACCOUNT);
+	set_active_memcg_item(old_item);
 	set_active_memcg(old_memcg);
 	bpf_map_put_memcg(memcg);
 
@@ -494,11 +512,14 @@ void *bpf_map_kvcalloc(struct bpf_map *map, size_t n, size_t size,
 {
 	struct mem_cgroup *memcg, *old_memcg;
 	void *ptr;
+	int old_item;
 
 	memcg = bpf_map_get_memcg(map);
 	old_memcg = set_active_memcg(memcg);
+	old_item = set_active_memcg_item(MEMCG_BPF);
 	ptr = kvcalloc(n, size, flags | __GFP_ACCOUNT);
 	set_active_memcg(old_memcg);
+	set_active_memcg_item(old_item);
 	bpf_map_put_memcg(memcg);
 
 	return ptr;
@@ -509,10 +530,13 @@ void __percpu *bpf_map_alloc_percpu(const struct bpf_map *map, size_t size,
 {
 	struct mem_cgroup *memcg, *old_memcg;
 	void __percpu *ptr;
+	int old_item;
 
 	memcg = bpf_map_get_memcg(map);
 	old_memcg = set_active_memcg(memcg);
+	old_item = set_active_memcg_item(MEMCG_BPF);
 	ptr = __alloc_percpu_gfp(size, align, flags | __GFP_ACCOUNT);
+	set_active_memcg_item(old_item);
 	set_active_memcg(old_memcg);
 	bpf_map_put_memcg(memcg);
 
@@ -521,17 +545,29 @@ void __percpu *bpf_map_alloc_percpu(const struct bpf_map *map, size_t size,
 
 void bpf_map_kfree(const void *ptr)
 {
+	int old_item;
+
+	old_item = set_active_memcg_item(MEMCG_BPF);
 	kfree(ptr);
+	set_active_memcg_item(old_item);
 }
 
 void bpf_map_kvfree(const void *ptr)
 {
+	int old_item;
+
+	old_item = set_active_memcg_item(MEMCG_BPF);
 	kvfree(ptr);
+	set_active_memcg_item(old_item);
 }
 
 void bpf_map_free_percpu(void __percpu *ptr)
 {
+	int old_item;
+
+	old_item = set_active_memcg_item(MEMCG_BPF);
 	free_percpu(ptr);
+	set_active_memcg_item(old_item);
 }
 
 #endif
diff --git a/kernel/fork.c b/kernel/fork.c
index 90c85b1..dac2429 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1043,6 +1043,7 @@ static struct task_struct *dup_task_struct(struct task_struct *orig, int node)
 
 #ifdef CONFIG_MEMCG
 	tsk->active_memcg = NULL;
+	tsk->active_item = 0;
 #endif
 
 #ifdef CONFIG_CPU_SUP_INTEL
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index b69979c..9008417 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -82,6 +82,10 @@
 DEFINE_PER_CPU(struct mem_cgroup *, int_active_memcg);
 EXPORT_PER_CPU_SYMBOL_GPL(int_active_memcg);
 
+/* Active memory item to use from an interrupt context */
+DEFINE_PER_CPU(int, int_active_item);
+EXPORT_PER_CPU_SYMBOL_GPL(int_active_item);
+
 /* Socket memory accounting disabled? */
 static bool cgroup_memory_nosocket __ro_after_init;
 
@@ -923,6 +927,14 @@ static __always_inline struct mem_cgroup *active_memcg(void)
 	return current->active_memcg;
 }
 
+static __always_inline int active_memcg_item(void)
+{
+	if (!in_task())
+		return this_cpu_read(int_active_item);
+
+	return current->active_item;
+}
+
 /**
  * get_mem_cgroup_from_mm: Obtain a reference on given mm_struct's memcg.
  * @mm: mm from which memcg should be extracted. It can be NULL.
@@ -1436,6 +1448,7 @@ struct memory_stat {
 	{ "workingset_restore_anon",	WORKINGSET_RESTORE_ANON		},
 	{ "workingset_restore_file",	WORKINGSET_RESTORE_FILE		},
 	{ "workingset_nodereclaim",	WORKINGSET_NODERECLAIM		},
+	{ "bpf",			MEMCG_BPF			},
 };
 
 /* Translate stat items to the correct unit for memory.stat output */
@@ -2993,6 +3006,11 @@ struct obj_cgroup *get_obj_cgroup_from_page(struct page *page)
 
 static void memcg_account_kmem(struct mem_cgroup *memcg, int nr_pages)
 {
+	int item = active_memcg_item();
+
+	WARN_ON_ONCE(item != 0 && (item < MEMCG_SWAP || item >= MEMCG_NR_STAT));
+	if (item)
+		mod_memcg_state(memcg, item, nr_pages);
 	mod_memcg_state(memcg, MEMCG_KMEM, nr_pages);
 	if (!cgroup_subsys_on_dfl(memory_cgrp_subsys)) {
 		if (nr_pages > 0)
@@ -3976,6 +3994,7 @@ static int memcg_numa_stat_show(struct seq_file *m, void *v)
 	NR_FILE_DIRTY,
 	NR_WRITEBACK,
 	MEMCG_SWAP,
+	MEMCG_BPF,
 };
 
 static const char *const memcg1_stat_names[] = {
@@ -3989,6 +4008,7 @@ static int memcg_numa_stat_show(struct seq_file *m, void *v)
 	"dirty",
 	"writeback",
 	"swap",
+	"bpf",
 };
 
 /* Universal VM events cgroup1 shows, original sort order */