From patchwork Wed Aug 10 15:18:40 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yafang Shao X-Patchwork-Id: 12940674 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 750C1C25B07 for ; Wed, 10 Aug 2022 15:19:15 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 1A5546B0081; Wed, 10 Aug 2022 11:19:15 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 12C838E0001; Wed, 10 Aug 2022 11:19:15 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id DD5206B0083; Wed, 10 Aug 2022 11:19:14 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0012.hostedemail.com [216.40.44.12]) by kanga.kvack.org (Postfix) with ESMTP id C56A06B0081 for ; Wed, 10 Aug 2022 11:19:14 -0400 (EDT) Received: from smtpin22.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay08.hostedemail.com (Postfix) with ESMTP id 9A792140EE3 for ; Wed, 10 Aug 2022 15:19:14 +0000 (UTC) X-FDA: 79784041428.22.AEDDED7 Received: from mail-pl1-f169.google.com (mail-pl1-f169.google.com [209.85.214.169]) by imf24.hostedemail.com (Postfix) with ESMTP id 3FDAB18019D for ; Wed, 10 Aug 2022 15:19:14 +0000 (UTC) Received: by mail-pl1-f169.google.com with SMTP id m2so14536285pls.4 for ; Wed, 10 Aug 2022 08:19:14 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc; bh=j0vArd5kCmiXrID4XqlmDliohUWxtZHgIa+7cLj7rn4=; b=MMqCZhZzn3sPBSjB7ZDkJJIzBlptDVnfvWBJBi7wzuaf50HQkzpq0n1tWWQQphwNyJ ZklkxnQA8koYN0EJXc+3L66T4Z7Q4/utfybyMzW/726CwjF6qxJRRhjGxZLL/UFOCJpy TbLLn8wPu5HHYALnaIL6Zr1cVcQhfLODb3U5fKDoct69VMSPVdkMR02c4MZ4pbaOGL88 bsrNxcSN9J2IbjjHDyneHYAO1vR1Fk8CzPkrwABFfjOsMeoee/lSNk1rOo7iZrmr9mwX neOuRzkK/8bEk/+EYUnY/xmdqOrtEX7z889sH2Ud4PBHNpxb5htGNjQTuNIdLiCtBobn EDYA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc; bh=j0vArd5kCmiXrID4XqlmDliohUWxtZHgIa+7cLj7rn4=; b=qTLH4Aqu125cVCJ27icpET+bkgsL5QL2cBDVr9ceCXLSuBBYOfdSE9ymJlGBfgYsar +URly3YFfMvSmLcL6qEIIofQY41vDJrsx4YgG2P53spemf0zNe96kgngBaddmR9rNcF8 HnY4by0BDEvLZyiXoPuKnolPVtj9Tw/rlvY+rZN6V+/M6L+wmlI1EKeppI9KRseOa3Fr 3/IO/iuHlvP0VbwRA+i0cb+mFT+da3Alf19rnfAqev9/AfaajPJ+9ZnupoGYFKMNN8Na xHEwFyvhpu/z1cuw4wyOtqXSTESedBLJD/n1r7ZzzvDcEDuaeTK8ObiAFPJC3z1lEav4 +ZRw== X-Gm-Message-State: ACgBeo3f2ANe9yEQx28JqwMWF/N4N9aNax7g7+buZQw7ZhfcyNFV7IPA 5sE4EB26Wu6PtwxBf8P6d1s= X-Google-Smtp-Source: AA6agR7KJ1PJ7r6sNN9BQifYoxntxi66pFLvj/s4h7cp7gKDAa5B4ZqS0oSQrdeDzsOIO2LhhJwkrA== X-Received: by 2002:a17:90b:33c6:b0:1f4:f595:b0f1 with SMTP id lk6-20020a17090b33c600b001f4f595b0f1mr4349009pjb.230.1660144753765; Wed, 10 Aug 2022 08:19:13 -0700 (PDT) Received: from vultr.guest ([2001:19f0:6001:5c3e:5400:4ff:fe19:c3bc]) by smtp.gmail.com with ESMTPSA id 85-20020a621558000000b0052b6ed5ca40sm2071935pfv.192.2022.08.10.08.19.11 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 10 Aug 2022 08:19:12 -0700 (PDT) From: Yafang Shao To: ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org, kafai@fb.com, songliubraving@fb.com, yhs@fb.com, john.fastabend@gmail.com, kpsingh@kernel.org, sdf@google.com, haoluo@google.com, jolsa@kernel.org, hannes@cmpxchg.org, mhocko@kernel.org, roman.gushchin@linux.dev, shakeelb@google.com, songmuchun@bytedance.com, akpm@linux-foundation.org Cc: netdev@vger.kernel.org, bpf@vger.kernel.org, linux-mm@kvack.org, Yafang Shao Subject: [PATCH bpf-next 15/15] bpf: Introduce selectable memcg for bpf map Date: Wed, 10 Aug 2022 15:18:40 +0000 Message-Id: <20220810151840.16394-16-laoar.shao@gmail.com> X-Mailer: git-send-email 2.31.1 In-Reply-To: <20220810151840.16394-1-laoar.shao@gmail.com> References: <20220810151840.16394-1-laoar.shao@gmail.com> MIME-Version: 1.0 ARC-Authentication-Results: i=1; imf24.hostedemail.com; dkim=pass header.d=gmail.com header.s=20210112 header.b=MMqCZhZz; dmarc=pass (policy=none) header.from=gmail.com; spf=pass (imf24.hostedemail.com: domain of laoar.shao@gmail.com designates 209.85.214.169 as permitted sender) smtp.mailfrom=laoar.shao@gmail.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1660144754; a=rsa-sha256; cv=none; b=6sbo32RqXYCcHJ76CDFyV7BFli428hQo1A1FSEZtmV1Rg8oG5Ri21a3i1CrfuBmiIK+T/G IfK3o/Bc4QwtLaRjtTbXTLGsaImOq9rcf9MZDOqnLJsC4nBghTF6kUZi0J/aaUVf7IYoX1 7uhLQ0+HeEg+Qqg8SEXXLsr3wL3MRME= ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1660144754; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=j0vArd5kCmiXrID4XqlmDliohUWxtZHgIa+7cLj7rn4=; b=TjHuy6LEMhZO/5AP2+LmXpkWbJ3IUNhfvzSAES5EAxMck2K/Yyo6Ina2A8YcbU1gtNu2i6 oa+Y6w6OsNTHN8VgRnY9n7wYkTQox+yKf3R6jd2WmvlVau8/6qpM0jL4YHmppZKDcU18KW XezaAjCsjRtf0yY5QsduLsYjPAFr0XY= X-Rspamd-Server: rspam10 X-Rspam-User: Authentication-Results: imf24.hostedemail.com; dkim=pass header.d=gmail.com header.s=20210112 header.b=MMqCZhZz; dmarc=pass (policy=none) header.from=gmail.com; spf=pass (imf24.hostedemail.com: domain of laoar.shao@gmail.com designates 209.85.214.169 as permitted sender) smtp.mailfrom=laoar.shao@gmail.com X-Rspamd-Queue-Id: 3FDAB18019D X-Stat-Signature: bhowng59xt9i5byyr6nk8ymy6kumbg7n X-HE-Tag: 1660144754-679634 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: A new bpf attr memcg_fd is introduced for map creation. The map creation path in libbpf is changed consequently to set this attr. A new member memcg_fd is introduced into bpf attr of BPF_MAP_CREATE command, which is the fd of an opened cgroup directory. In this cgroup, the memory subsystem must be enabled. The valid memcg_fd must be a postive number, that means it can't be zero(a valid return value of open(2)). Once the kernel get the memory cgroup from this fd, it will set this memcg into bpf map, then all the subsequent memory allocation of this map will be charged to the memcg. The map creation paths in libbpf are also changed consequently. Currently it is only supported for cgroup2 directory. The usage of this new member as follows, struct bpf_map_create_opts map_opts = { .sz = sizeof(map_opts), }; int memcg_fd, map_fd, old_fd; int key, value; memcg_fd = open("/cgroup2", O_DIRECTORY); if (memcg_fd < 0) { perror("memcg dir open"); return -1; } /* 0 is a invalid fd */ if (memcg_fd == 0) { old_fd = memcg_fd; memcg_fd = fcntl(memcg_fd, F_DUPFD_CLOEXEC, 3); close(old_fd); if (memcg_fd < 0) { perror("fcntl"); return -1; } } map_opts.memcg_fd = memcg_fd; map_fd = bpf_map_create(BPF_MAP_TYPE_HASH, "map_for_memcg", sizeof(key), sizeof(value), 1024, &map_opts); if (map_fd <= 0) { close(memcg_fd); perror("map create"); return -1; } Signed-off-by: Yafang Shao --- include/uapi/linux/bpf.h | 1 + kernel/bpf/syscall.c | 42 ++++++++++++++++++++++++++++++++---------- tools/include/uapi/linux/bpf.h | 1 + tools/lib/bpf/bpf.c | 3 ++- tools/lib/bpf/bpf.h | 3 ++- tools/lib/bpf/gen_loader.c | 2 +- tools/lib/bpf/libbpf.c | 2 ++ tools/lib/bpf/skel_internal.h | 2 +- 8 files changed, 42 insertions(+), 14 deletions(-) diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index 534e33f..d46acbb 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -1300,6 +1300,7 @@ struct bpf_stack_build_id { * to using 5 hash functions). */ __u64 map_extra; + __u32 memcg_fd; /* selectable memcg */ }; struct { /* anonymous struct used by BPF_MAP_*_ELEM commands */ diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c index 5f5cade4..91a6e15 100644 --- a/kernel/bpf/syscall.c +++ b/kernel/bpf/syscall.c @@ -294,14 +294,30 @@ static int bpf_map_copy_value(struct bpf_map *map, void *key, void *value, } #ifdef CONFIG_MEMCG_KMEM -static void bpf_map_save_memcg(struct bpf_map *map) +static int bpf_map_save_memcg(struct bpf_map *map, u32 memcg_fd) { - /* Currently if a map is created by a process belonging to the root - * memory cgroup, get_obj_cgroup_from_current() will return NULL. - * So we have to check map->objcg for being NULL each time it's - * being used. - */ - map->objcg = get_obj_cgroup_from_current(); + struct obj_cgroup *objcg; + struct cgroup *cgrp; + + if (memcg_fd) { + cgrp = cgroup_get_from_fd(memcg_fd); + if (IS_ERR(cgrp)) + return -EINVAL; + + objcg = get_obj_cgroup_from_cgroup(cgrp); + if (IS_ERR(objcg)) + return PTR_ERR(objcg); + } else { + /* Currently if a map is created by a process belonging to the root + * memory cgroup, get_obj_cgroup_from_current() will return NULL. + * So we have to check map->objcg for being NULL each time it's + * being used. + */ + objcg = get_obj_cgroup_from_current(); + } + + map->objcg = objcg; + return 0; } static void bpf_map_release_memcg(struct bpf_map *map) @@ -311,8 +327,9 @@ static void bpf_map_release_memcg(struct bpf_map *map) } #else -static void bpf_map_save_memcg(struct bpf_map *map) +static int bpf_map_save_memcg(struct bpf_map *map, u32 memcg_fd) { + return 0; } static void bpf_map_release_memcg(struct bpf_map *map) @@ -405,7 +422,12 @@ static u32 bpf_map_flags_retain_permanent(u32 flags) int bpf_map_init_from_attr(struct bpf_map *map, union bpf_attr *attr) { - bpf_map_save_memcg(map); + int err; + + err = bpf_map_save_memcg(map, attr->memcg_fd); + if (err) + return err; + map->map_type = attr->map_type; map->key_size = attr->key_size; map->value_size = attr->value_size; @@ -1091,7 +1113,7 @@ static int map_check_btf(struct bpf_map *map, const struct btf *btf, return ret; } -#define BPF_MAP_CREATE_LAST_FIELD map_extra +#define BPF_MAP_CREATE_LAST_FIELD memcg_fd /* called via syscall */ static int map_create(union bpf_attr *attr) { diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h index f58d58e..12203406 100644 --- a/tools/include/uapi/linux/bpf.h +++ b/tools/include/uapi/linux/bpf.h @@ -1300,6 +1300,7 @@ struct bpf_stack_build_id { * to using 5 hash functions). */ __u64 map_extra; + __u32 memcg_fd; /* selectable memcg */ }; struct { /* anonymous struct used by BPF_MAP_*_ELEM commands */ diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c index efcc06d..9c613b2 100644 --- a/tools/lib/bpf/bpf.c +++ b/tools/lib/bpf/bpf.c @@ -171,7 +171,7 @@ int bpf_map_create(enum bpf_map_type map_type, __u32 max_entries, const struct bpf_map_create_opts *opts) { - const size_t attr_sz = offsetofend(union bpf_attr, map_extra); + const size_t attr_sz = offsetofend(union bpf_attr, memcg_fd); union bpf_attr attr; int fd; @@ -199,6 +199,7 @@ int bpf_map_create(enum bpf_map_type map_type, attr.map_extra = OPTS_GET(opts, map_extra, 0); attr.numa_node = OPTS_GET(opts, numa_node, 0); attr.map_ifindex = OPTS_GET(opts, map_ifindex, 0); + attr.memcg_fd = OPTS_GET(opts, memcg_fd, 0); fd = sys_bpf_fd(BPF_MAP_CREATE, &attr, attr_sz); return libbpf_err_errno(fd); diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h index 9c50bea..dd0d929 100644 --- a/tools/lib/bpf/bpf.h +++ b/tools/lib/bpf/bpf.h @@ -51,8 +51,9 @@ struct bpf_map_create_opts { __u32 numa_node; __u32 map_ifindex; + __u32 memcg_fd; }; -#define bpf_map_create_opts__last_field map_ifindex +#define bpf_map_create_opts__last_field memcg_fd LIBBPF_API int bpf_map_create(enum bpf_map_type map_type, const char *map_name, diff --git a/tools/lib/bpf/gen_loader.c b/tools/lib/bpf/gen_loader.c index 23f5c46..f35b014 100644 --- a/tools/lib/bpf/gen_loader.c +++ b/tools/lib/bpf/gen_loader.c @@ -451,7 +451,7 @@ void bpf_gen__map_create(struct bpf_gen *gen, __u32 key_size, __u32 value_size, __u32 max_entries, struct bpf_map_create_opts *map_attr, int map_idx) { - int attr_size = offsetofend(union bpf_attr, map_extra); + int attr_size = offsetofend(union bpf_attr, memcg_fd); bool close_inner_map_fd = false; int map_create_attr, idx; union bpf_attr attr; diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c index f7364ea..88c65e5 100644 --- a/tools/lib/bpf/libbpf.c +++ b/tools/lib/bpf/libbpf.c @@ -506,6 +506,7 @@ struct bpf_map { bool reused; bool autocreate; __u64 map_extra; + __u32 memcg_fd; }; enum extern_type { @@ -4931,6 +4932,7 @@ static int bpf_object__create_map(struct bpf_object *obj, struct bpf_map *map, b create_attr.map_flags = def->map_flags; create_attr.numa_node = map->numa_node; create_attr.map_extra = map->map_extra; + create_attr.memcg_fd = map->memcg_fd; if (bpf_map__is_struct_ops(map)) create_attr.btf_vmlinux_value_type_id = map->btf_vmlinux_value_type_id; diff --git a/tools/lib/bpf/skel_internal.h b/tools/lib/bpf/skel_internal.h index bd6f450..83403cc 100644 --- a/tools/lib/bpf/skel_internal.h +++ b/tools/lib/bpf/skel_internal.h @@ -222,7 +222,7 @@ static inline int skel_map_create(enum bpf_map_type map_type, __u32 value_size, __u32 max_entries) { - const size_t attr_sz = offsetofend(union bpf_attr, map_extra); + const size_t attr_sz = offsetofend(union bpf_attr, memcg_fd); union bpf_attr attr; memset(&attr, 0, attr_sz);