From patchwork Fri Jul 29 15:23:16 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Yafang Shao
X-Patchwork-Id: 12932545
From: Yafang Shao
To: ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org, kafai@fb.com,
 songliubraving@fb.com, yhs@fb.com, john.fastabend@gmail.com,
 kpsingh@kernel.org, sdf@google.com, haoluo@google.com, jolsa@kernel.org,
 hannes@cmpxchg.org, mhocko@kernel.org, roman.gushchin@linux.dev,
 shakeelb@google.com, songmuchun@bytedance.com, akpm@linux-foundation.org
Cc: netdev@vger.kernel.org, bpf@vger.kernel.org, linux-mm@kvack.org,
 Yafang Shao
Subject: [RFC PATCH bpf-next 15/15] bpf: Introduce selectable memcg for bpf map
Date: Fri, 29 Jul 2022 15:23:16 +0000
Message-Id: <20220729152316.58205-16-laoar.shao@gmail.com>
X-Mailer: git-send-email 2.31.1
In-Reply-To: <20220729152316.58205-1-laoar.shao@gmail.com>
References: <20220729152316.58205-1-laoar.shao@gmail.com>

A new member, memcg_fd, is introduced into the bpf attr of the
BPF_MAP_CREATE command. It is the fd of an opened cgroup directory, and
the memory subsystem must be enabled in that cgroup. The value is only
valid when BPF_F_SELECTABLE_MEMCG is set in map_flags. Once the kernel
gets the memory cgroup from this fd, it sets that memcg in the bpf map,
and all subsequent memory allocations of the map are charged to that
memcg. The map creation paths in libbpf are changed accordingly.

Currently only cgroup2 directories are supported.

The new member is used as follows:

	struct bpf_map_create_opts map_opts = {
		.sz = sizeof(map_opts),
		.map_flags = BPF_F_SELECTABLE_MEMCG,
	};
	int memcg_fd, map_fd;
	int key, value;

	memcg_fd = open("/cgroup2", O_DIRECTORY);
	if (memcg_fd < 0) {
		perror("memcg dir open");
		return -1;
	}

	map_opts.memcg_fd = memcg_fd;
	map_fd = bpf_map_create(BPF_MAP_TYPE_HASH, "map_for_memcg",
				sizeof(key), sizeof(value),
				1024, &map_opts);
	if (map_fd < 0) {
		perror("map create");
		return -1;
	}

Signed-off-by: Yafang Shao
---
 include/uapi/linux/bpf.h       |  2 ++
 kernel/bpf/syscall.c           | 47 ++++++++++++++++++++++++++--------
 tools/include/uapi/linux/bpf.h |  2 ++
 tools/lib/bpf/bpf.c            |  1 +
 tools/lib/bpf/bpf.h            |  3 ++-
 tools/lib/bpf/libbpf.c         |  2 ++
 6 files changed, 46 insertions(+), 11 deletions(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index d5fc1ea70b59..a6e02c8be924 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -1296,6 +1296,8 @@ union bpf_attr {
 						   * struct stored as the
 						   * map value
 						   */
+		__s32	memcg_fd;	/* selectable memcg */
+		__s32	:32;		/* hole */
 		/* Any per-map-type extra fields
 		 *
 		 * BPF_MAP_TYPE_BLOOM_FILTER - the lowest 4 bits indicate the
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 6401cc417fa9..9900e2b87315 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -402,14 +402,30 @@ void bpf_map_free_id(struct bpf_map *map, bool do_idr_lock)
 }
 
 #ifdef CONFIG_MEMCG_KMEM
-static void bpf_map_save_memcg(struct bpf_map *map)
+static int bpf_map_save_memcg(struct bpf_map *map, union bpf_attr *attr)
 {
-	/* Currently if a map is created by a process belonging to the root
-	 * memory cgroup, get_obj_cgroup_from_current() will return NULL.
-	 * So we have to check map->objcg for being NULL each time it's
-	 * being used.
-	 */
-	map->objcg = get_obj_cgroup_from_current();
+	struct obj_cgroup *objcg;
+	struct cgroup *cgrp;
+
+	if (attr->map_flags & BPF_F_SELECTABLE_MEMCG) {
+		cgrp = cgroup_get_from_fd(attr->memcg_fd);
+		if (IS_ERR(cgrp))
+			return -EINVAL;
+
+		objcg = get_obj_cgroup_from_cgroup(cgrp);
+		if (IS_ERR(objcg))
+			return PTR_ERR(objcg);
+	} else {
+		/* Currently if a map is created by a process belonging to the root
+		 * memory cgroup, get_obj_cgroup_from_current() will return NULL.
+		 * So we have to check map->objcg for being NULL each time it's
+		 * being used.
+		 */
+		objcg = get_obj_cgroup_from_current();
+	}
+
+	map->objcg = objcg;
+	return 0;
 }
 
 static void bpf_map_release_memcg(struct bpf_map *map)
@@ -485,8 +501,9 @@ void __percpu *bpf_map_alloc_percpu(const struct bpf_map *map, size_t size,
 }
 
 #else
-static void bpf_map_save_memcg(struct bpf_map *map)
+static int bpf_map_save_memcg(struct bpf_map *map, union bpf_attr *attr)
 {
+	return 0;
 }
 
 static void bpf_map_release_memcg(struct bpf_map *map)
@@ -530,13 +547,18 @@ void *bpf_map_container_alloc(union bpf_attr *attr, u64 size, int numa_node)
 {
 	struct bpf_map *map;
 	void *container;
+	int ret;
 
 	container = __bpf_map_area_alloc(size, numa_node, false);
 	if (!container)
 		return ERR_PTR(-ENOMEM);
 
 	map = (struct bpf_map *)container;
-	bpf_map_save_memcg(map);
+	ret = bpf_map_save_memcg(map, attr);
+	if (ret) {
+		bpf_map_area_free(container);
+		return ERR_PTR(ret);
+	}
 
 	return container;
 }
@@ -547,6 +569,7 @@ void *bpf_map_container_mmapable_alloc(union bpf_attr *attr, u64 size,
 	struct bpf_map *map;
 	void *container;
 	void *ptr;
+	int ret;
 
 	/* kmalloc'ed memory can't be mmap'ed, use explicit vmalloc */
 	ptr = __bpf_map_area_alloc(size, numa_node, true);
@@ -555,7 +578,11 @@ void *bpf_map_container_mmapable_alloc(union bpf_attr *attr, u64 size,
 
 	container = ptr + align - offset;
 	map = (struct bpf_map *)container;
-	bpf_map_save_memcg(map);
+	ret = bpf_map_save_memcg(map, attr);
+	if (ret) {
+		bpf_map_area_free(ptr);
+		return ERR_PTR(ret);
+	}
 
 	return ptr;
 }
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index d5fc1ea70b59..a6e02c8be924 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -1296,6 +1296,8 @@ union bpf_attr {
 						   * struct stored as the
 						   * map value
 						   */
+		__s32	memcg_fd;	/* selectable memcg */
+		__s32	:32;		/* hole */
 		/* Any per-map-type extra fields
 		 *
 		 * BPF_MAP_TYPE_BLOOM_FILTER - the lowest 4 bits indicate the
diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
index 5eb0df90eb2b..662ce5808386 100644
--- a/tools/lib/bpf/bpf.c
+++ b/tools/lib/bpf/bpf.c
@@ -199,6 +199,7 @@ int bpf_map_create(enum bpf_map_type map_type,
 	attr.map_extra = OPTS_GET(opts, map_extra, 0);
 	attr.numa_node = OPTS_GET(opts, numa_node, 0);
 	attr.map_ifindex = OPTS_GET(opts, map_ifindex, 0);
+	attr.memcg_fd = OPTS_GET(opts, memcg_fd, 0);
 
 	fd = sys_bpf_fd(BPF_MAP_CREATE, &attr, attr_sz);
 	return libbpf_err_errno(fd);
diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h
index 88a7cc4bd76f..481aad49422b 100644
--- a/tools/lib/bpf/bpf.h
+++ b/tools/lib/bpf/bpf.h
@@ -51,8 +51,9 @@ struct bpf_map_create_opts {
 
 	__u32 numa_node;
 	__u32 map_ifindex;
+	__u32 memcg_fd;
 };
-#define bpf_map_create_opts__last_field map_ifindex
+#define bpf_map_create_opts__last_field memcg_fd
 
 LIBBPF_API int bpf_map_create(enum bpf_map_type map_type,
 			      const char *map_name,
diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 50d41815f431..86916d550031 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -505,6 +505,7 @@ struct bpf_map {
 	bool pinned;
 	bool reused;
 	bool autocreate;
+	__s32 memcg_fd;
 	__u64 map_extra;
 };
 
@@ -4928,6 +4929,7 @@ static int bpf_object__create_map(struct bpf_object *obj, struct bpf_map *map, b
 	create_attr.map_ifindex = map->map_ifindex;
 	create_attr.map_flags = def->map_flags;
 	create_attr.numa_node = map->numa_node;
+	create_attr.memcg_fd = map->memcg_fd;
 	create_attr.map_extra = map->map_extra;
 
 	if (bpf_map__is_struct_ops(map))
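
The commit message requires the memory subsystem to be enabled in the
cgroup behind memcg_fd. A caller may want to check this in userspace
before creating the map. The following is only a sketch and not part of
this series: the helper name cgroup_has_memory_controller() is
hypothetical, and it assumes that "enabled" corresponds to "memory"
being listed in the directory's cgroup.controllers file on a cgroup2
hierarchy.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper, not part of this patch.
 *
 * Returns 1 if "memory" is listed in <cgroup_dir>/cgroup.controllers,
 * 0 if it is not listed, and -1 if the file cannot be read.
 */
static int cgroup_has_memory_controller(const char *cgroup_dir)
{
	char path[4096];
	char line[4096];
	char *tok;
	FILE *f;
	int found = 0;
	int n;

	n = snprintf(path, sizeof(path), "%s/cgroup.controllers", cgroup_dir);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;

	f = fopen(path, "r");
	if (!f)
		return -1;

	/* The file holds one line of space-separated controller names. */
	if (fgets(line, sizeof(line), f)) {
		for (tok = strtok(line, " \n"); tok; tok = strtok(NULL, " \n")) {
			if (!strcmp(tok, "memory")) {
				found = 1;
				break;
			}
		}
	}

	fclose(f);
	return found;
}
```

A caller would presumably run this on the same path it later opens with
O_DIRECTORY, and set BPF_F_SELECTABLE_MEMCG only when it returns 1.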