From patchwork Fri Aug 21 15:01:07 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Roman Gushchin
X-Patchwork-Id: 11729683
From: Roman Gushchin
CC: Alexei Starovoitov, Daniel Borkmann, Johannes Weiner, Shakeel Butt,
 Roman Gushchin
Subject: [PATCH bpf-next v4 03/30] bpf: memcg-based memory accounting for bpf maps
Date: Fri, 21 Aug 2020 08:01:07 -0700
Message-ID: <20200821150134.2581465-4-guro@fb.com>
X-Mailer: git-send-email 2.24.1
In-Reply-To: <20200821150134.2581465-1-guro@fb.com>
References: <20200821150134.2581465-1-guro@fb.com>
Sender: owner-linux-mm@kvack.org
Precedence: bulk

This patch enables memcg-based memory accounting for memory allocated
by __bpf_map_area_alloc(), which is used by most map types for large
allocations.

If a map is updated from an interrupt context and the update results in
a memory allocation, the memory cgroup can't be determined from the
context of the current process. To address this case, the bpf map
preserves a pointer to the memory cgroup of the process that created
the map. This memory cgroup is charged for allocations from interrupt
context.

The following patches in the series will refine the accounting for some
map types.

Signed-off-by: Roman Gushchin
Reported-by: kernel test robot
---
 include/linux/bpf.h  |  4 ++++
 kernel/bpf/helpers.c | 37 ++++++++++++++++++++++++++++++++++++-
 kernel/bpf/syscall.c | 27 ++++++++++++++++++++++++++-
 3 files changed, 66 insertions(+), 2 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index a9b7185a6b37..b5f178afde94 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -34,6 +34,7 @@ struct btf_type;
 struct exception_table_entry;
 struct seq_operations;
 struct bpf_iter_aux_info;
+struct mem_cgroup;
 
 extern struct idr btf_idr;
 extern spinlock_t btf_idr_lock;
@@ -138,6 +139,9 @@ struct bpf_map {
 	u32 btf_value_type_id;
 	struct btf *btf;
 	struct bpf_map_memory memory;
+#ifdef CONFIG_MEMCG_KMEM
+	struct mem_cgroup *memcg;
+#endif
 	char name[BPF_OBJ_NAME_LEN];
 	u32 btf_vmlinux_value_type_id;
 	bool bypass_spec_v1;
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index be43ab3e619f..f8ce7bc7003f 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -14,6 +14,7 @@
 #include
 #include
 #include
+#include
 
 #include "../../lib/kstrtox.h"
 
@@ -41,11 +42,45 @@ const struct bpf_func_proto bpf_map_lookup_elem_proto = {
 	.arg2_type	= ARG_PTR_TO_MAP_KEY,
 };
 
+#ifdef CONFIG_MEMCG_KMEM
+static __always_inline int __bpf_map_update_elem(struct bpf_map *map, void *key,
+						 void *value, u64 flags)
+{
+	struct mem_cgroup *old_memcg;
+	bool in_interrupt;
+	int ret;
+
+	/*
+	 * If update from an interrupt context results in a memory allocation,
+	 * the memory cgroup to charge can't be determined from the context
+	 * of the current task. Instead, we charge the memory cgroup that
+	 * contained the process which created the map.
+	 */
+	in_interrupt = in_interrupt();
+	if (in_interrupt)
+		old_memcg = memalloc_use_memcg(map->memcg);
+
+	ret = map->ops->map_update_elem(map, key, value, flags);
+
+	if (in_interrupt)
+		memalloc_use_memcg(old_memcg);
+
+	return ret;
+}
+#else
+static __always_inline int __bpf_map_update_elem(struct bpf_map *map, void *key,
+						 void *value, u64 flags)
+{
+	return map->ops->map_update_elem(map, key, value, flags);
+}
+#endif
+
 BPF_CALL_4(bpf_map_update_elem, struct bpf_map *, map, void *, key,
 	   void *, value, u64, flags)
 {
 	WARN_ON_ONCE(!rcu_read_lock_held());
-	return map->ops->map_update_elem(map, key, value, flags);
+
+	return __bpf_map_update_elem(map, key, value, flags);
 }
 
 const struct bpf_func_proto bpf_map_update_elem_proto = {
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 689d736b6904..683614c17a95 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -29,6 +29,7 @@
 #include
 #include
 #include
+#include
 
 #define IS_FD_ARRAY(map) ((map)->map_type == BPF_MAP_TYPE_PERF_EVENT_ARRAY || \
 			  (map)->map_type == BPF_MAP_TYPE_CGROUP_ARRAY || \
@@ -275,7 +276,7 @@ static void *__bpf_map_area_alloc(u64 size, int numa_node, bool mmapable)
 	 * __GFP_RETRY_MAYFAIL to avoid such situations.
 	 */
 
-	const gfp_t gfp = __GFP_NOWARN | __GFP_ZERO;
+	const gfp_t gfp = __GFP_NOWARN | __GFP_ZERO | __GFP_ACCOUNT;
 	unsigned int flags = 0;
 	unsigned long align = 1;
 	void *area;
@@ -452,6 +453,27 @@ void bpf_map_free_id(struct bpf_map *map, bool do_idr_lock)
 		__release(&map_idr_lock);
 }
 
+#ifdef CONFIG_MEMCG_KMEM
+static void bpf_map_save_memcg(struct bpf_map *map)
+{
+	map->memcg = get_mem_cgroup_from_mm(current->mm);
+}
+
+static void bpf_map_release_memcg(struct bpf_map *map)
+{
+	mem_cgroup_put(map->memcg);
+}
+
+#else
+static void bpf_map_save_memcg(struct bpf_map *map)
+{
+}
+
+static void bpf_map_release_memcg(struct bpf_map *map)
+{
+}
+#endif
+
 /* called from workqueue */
 static void bpf_map_free_deferred(struct work_struct *work)
 {
@@ -463,6 +485,7 @@ static void bpf_map_free_deferred(struct work_struct *work)
 	/* implementation dependent freeing */
 	map->ops->map_free(map);
 	bpf_map_charge_finish(&mem);
+	bpf_map_release_memcg(map);
 }
 
 static void bpf_map_put_uref(struct bpf_map *map)
@@ -869,6 +892,8 @@ static int map_create(union bpf_attr *attr)
 	if (err)
 		goto free_map_sec;
 
+	bpf_map_save_memcg(map);
+
 	err = bpf_map_new_fd(map, f_flags);
 	if (err < 0) {
 		/* failed to allocate fd.