From patchwork Fri Aug 19 21:42:18 2022
From: Alexei Starovoitov
To: davem@davemloft.net
Cc: daniel@iogearbox.net, andrii@kernel.org, tj@kernel.org,
 memxor@gmail.com, delyank@fb.com, linux-mm@kvack.org, bpf@vger.kernel.org, kernel-team@fb.com
Subject: [PATCH v3 bpf-next 01/15] bpf: Introduce any context BPF specific memory allocator.
Date: Fri, 19 Aug 2022 14:42:18 -0700
Message-Id: <20220819214232.18784-2-alexei.starovoitov@gmail.com>
In-Reply-To: <20220819214232.18784-1-alexei.starovoitov@gmail.com>

From: Alexei Starovoitov

Tracing BPF programs can attach to kprobe and fentry. Hence they run in
unknown context where calling plain kmalloc() might not be safe.

Front-end kmalloc() with a minimal per-cpu cache of free elements.
Refill this cache asynchronously from irq_work.

BPF programs always run with migration disabled. It's safe to allocate
from the cache of the current cpu with irqs disabled. Freeing is always
done into the bucket of the current cpu as well. irq_work trims extra
free elements from the buckets with kfree and refills them with kmalloc,
so the global kmalloc logic takes care of freeing objects allocated by
one cpu and freed on another.

struct bpf_mem_alloc supports two modes:
- When size != 0 create kmem_cache and bpf_mem_cache for each cpu.
  This is the typical bpf hash map use case when all elements have
  equal size.
- When size == 0 allocate 11 bpf_mem_cache-s for each cpu, then rely on
  kmalloc/kfree. Max allocation size is 4096 in this case.
  This is the bpf_dynptr and bpf_kptr use case.

bpf_mem_alloc/bpf_mem_free are bpf specific 'wrappers' of kmalloc/kfree.
bpf_mem_cache_alloc/bpf_mem_cache_free are 'wrappers' of
kmem_cache_alloc/kmem_cache_free.

The allocators are NMI-safe from bpf programs only. They are not
NMI-safe in general.
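For readers new to this interface, the following is a minimal illustrative sketch (not part of the patch) of how a kernel-side user could drive the two modes with the API declared in include/linux/bpf_mem_alloc.h below; the wrapper names and sizes are hypothetical:

#include <linux/bpf_mem_alloc.h>

/* Fixed-size mode (size != 0): one kmem_cache-backed bpf_mem_cache per cpu,
 * every allocation returns exactly elem_size bytes. Hypothetical wrapper.
 */
struct my_map {
	struct bpf_mem_alloc ma;
};

static int my_map_init(struct my_map *m, int elem_size)
{
	return bpf_mem_alloc_init(&m->ma, elem_size);
}

static void *my_map_new_elem(struct my_map *m)
{
	/* kmem_cache_alloc equivalent; usable from the contexts BPF programs
	 * run in (per the commit log, NMI-safe from bpf programs only).
	 */
	return bpf_mem_cache_alloc(&m->ma);
}

static void my_map_free_elem(struct my_map *m, void *elem)
{
	bpf_mem_cache_free(&m->ma, elem);
}

/* Variable-size mode (size == 0): 11 size buckets per cpu, kmalloc/kfree
 * style, allocations up to 4096 bytes. Hypothetical usage.
 */
static int example_dynptr_style(struct bpf_mem_alloc *ma)
{
	void *p;
	int err;

	err = bpf_mem_alloc_init(ma, 0);
	if (err)
		return err;

	p = bpf_mem_alloc(ma, 200);	/* rounded up to a bucket size */
	if (p)
		bpf_mem_free(ma, p);

	bpf_mem_alloc_destroy(ma);
	return 0;
}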
Signed-off-by: Alexei Starovoitov --- include/linux/bpf_mem_alloc.h | 26 ++ kernel/bpf/Makefile | 2 +- kernel/bpf/memalloc.c | 475 ++++++++++++++++++++++++++++++++++ 3 files changed, 502 insertions(+), 1 deletion(-) create mode 100644 include/linux/bpf_mem_alloc.h create mode 100644 kernel/bpf/memalloc.c diff --git a/include/linux/bpf_mem_alloc.h b/include/linux/bpf_mem_alloc.h new file mode 100644 index 000000000000..804733070f8d --- /dev/null +++ b/include/linux/bpf_mem_alloc.h @@ -0,0 +1,26 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* Copyright (c) 2022 Meta Platforms, Inc. and affiliates. */ +#ifndef _BPF_MEM_ALLOC_H +#define _BPF_MEM_ALLOC_H +#include + +struct bpf_mem_cache; +struct bpf_mem_caches; + +struct bpf_mem_alloc { + struct bpf_mem_caches __percpu *caches; + struct bpf_mem_cache __percpu *cache; +}; + +int bpf_mem_alloc_init(struct bpf_mem_alloc *ma, int size); +void bpf_mem_alloc_destroy(struct bpf_mem_alloc *ma); + +/* kmalloc/kfree equivalent: */ +void *bpf_mem_alloc(struct bpf_mem_alloc *ma, size_t size); +void bpf_mem_free(struct bpf_mem_alloc *ma, void *ptr); + +/* kmem_cache_alloc/free equivalent: */ +void *bpf_mem_cache_alloc(struct bpf_mem_alloc *ma); +void bpf_mem_cache_free(struct bpf_mem_alloc *ma, void *ptr); + +#endif /* _BPF_MEM_ALLOC_H */ diff --git a/kernel/bpf/Makefile b/kernel/bpf/Makefile index 057ba8e01e70..11fb9220909b 100644 --- a/kernel/bpf/Makefile +++ b/kernel/bpf/Makefile @@ -13,7 +13,7 @@ obj-$(CONFIG_BPF_SYSCALL) += bpf_local_storage.o bpf_task_storage.o obj-${CONFIG_BPF_LSM} += bpf_inode_storage.o obj-$(CONFIG_BPF_SYSCALL) += disasm.o obj-$(CONFIG_BPF_JIT) += trampoline.o -obj-$(CONFIG_BPF_SYSCALL) += btf.o +obj-$(CONFIG_BPF_SYSCALL) += btf.o memalloc.o obj-$(CONFIG_BPF_JIT) += dispatcher.o ifeq ($(CONFIG_NET),y) obj-$(CONFIG_BPF_SYSCALL) += devmap.o diff --git a/kernel/bpf/memalloc.c b/kernel/bpf/memalloc.c new file mode 100644 index 000000000000..293380eaea41 --- /dev/null +++ b/kernel/bpf/memalloc.c @@ -0,0 +1,475 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* Copyright (c) 2022 Meta Platforms, Inc. and affiliates. */ +#include +#include +#include +#include +#include +#include + +/* Any context (including NMI) BPF specific memory allocator. + * + * Tracing BPF programs can attach to kprobe and fentry. Hence they + * run in unknown context where calling plain kmalloc() might not be safe. + * + * Front-end kmalloc() with per-cpu per-bucket cache of free elements. + * Refill this cache asynchronously from irq_work. + * + * CPU_0 buckets + * 16 32 64 96 128 196 256 512 1024 2048 4096 + * ... + * CPU_N buckets + * 16 32 64 96 128 196 256 512 1024 2048 4096 + * + * The buckets are prefilled at the start. + * BPF programs always run with migration disabled. + * It's safe to allocate from cache of the current cpu with irqs disabled. + * Free-ing is always done into bucket of the current cpu as well. + * irq_work trims extra free elements from buckets with kfree + * and refills them with kmalloc, so global kmalloc logic takes care + * of freeing objects allocated by one cpu and freed on another. + * + * Every allocated objected is padded with extra 8 bytes that contains + * struct llist_node. 
+ */ +#define LLIST_NODE_SZ sizeof(struct llist_node) + +/* similar to kmalloc, but sizeof == 8 bucket is gone */ +static u8 size_index[24] __ro_after_init = { + 3, /* 8 */ + 3, /* 16 */ + 4, /* 24 */ + 4, /* 32 */ + 5, /* 40 */ + 5, /* 48 */ + 5, /* 56 */ + 5, /* 64 */ + 1, /* 72 */ + 1, /* 80 */ + 1, /* 88 */ + 1, /* 96 */ + 6, /* 104 */ + 6, /* 112 */ + 6, /* 120 */ + 6, /* 128 */ + 2, /* 136 */ + 2, /* 144 */ + 2, /* 152 */ + 2, /* 160 */ + 2, /* 168 */ + 2, /* 176 */ + 2, /* 184 */ + 2 /* 192 */ +}; + +static int bpf_mem_cache_idx(size_t size) +{ + if (!size || size > 4096) + return -1; + + if (size <= 192) + return size_index[(size - 1) / 8] - 1; + + return fls(size - 1) - 1; +} + +#define NUM_CACHES 11 + +struct bpf_mem_cache { + /* per-cpu list of free objects of size 'unit_size'. + * All accesses are done with interrupts disabled and 'active' counter + * protection with __llist_add() and __llist_del_first(). + */ + struct llist_head free_llist; + local_t active; + + /* Operations on the free_list from unit_alloc/unit_free/bpf_mem_refill + * are sequenced by per-cpu 'active' counter. But unit_free() cannot + * fail. When 'active' is busy the unit_free() will add an object to + * free_llist_extra. + */ + struct llist_head free_llist_extra; + + /* kmem_cache != NULL when bpf_mem_alloc was created for specific + * element size. + */ + struct kmem_cache *kmem_cache; + struct irq_work refill_work; + struct obj_cgroup *objcg; + int unit_size; + /* count of objects in free_llist */ + int free_cnt; +}; + +struct bpf_mem_caches { + struct bpf_mem_cache cache[NUM_CACHES]; +}; + +static struct llist_node notrace *__llist_del_first(struct llist_head *head) +{ + struct llist_node *entry, *next; + + entry = head->first; + if (!entry) + return NULL; + next = entry->next; + head->first = next; + return entry; +} + +#define BATCH 48 +#define LOW_WATERMARK 32 +#define HIGH_WATERMARK 96 +/* Assuming the average number of elements per bucket is 64, when all buckets + * are used the total memory will be: 64*16*32 + 64*32*32 + 64*64*32 + ... + + * 64*4096*32 ~ 20Mbyte + */ + +static void *__alloc(struct bpf_mem_cache *c, int node) +{ + /* Allocate, but don't deplete atomic reserves that typical + * GFP_ATOMIC would do. irq_work runs on this cpu and kmalloc + * will allocate from the current numa node which is what we + * want here. + */ + gfp_t flags = GFP_NOWAIT | __GFP_NOWARN | __GFP_ACCOUNT; + + if (c->kmem_cache) + return kmem_cache_alloc_node(c->kmem_cache, flags, node); + + return kmalloc_node(c->unit_size, flags, node); +} + +static struct mem_cgroup *get_memcg(const struct bpf_mem_cache *c) +{ +#ifdef CONFIG_MEMCG_KMEM + if (c->objcg) + return get_mem_cgroup_from_objcg(c->objcg); +#endif + +#ifdef CONFIG_MEMCG + return root_mem_cgroup; +#else + return NULL; +#endif +} + +/* Mostly runs from irq_work except __init phase. */ +static void alloc_bulk(struct bpf_mem_cache *c, int cnt, int node) +{ + struct mem_cgroup *memcg = NULL, *old_memcg; + unsigned long flags; + void *obj; + int i; + + memcg = get_memcg(c); + old_memcg = set_active_memcg(memcg); + for (i = 0; i < cnt; i++) { + obj = __alloc(c, node); + if (!obj) + break; + if (IS_ENABLED(CONFIG_PREEMPT_RT)) + /* In RT irq_work runs in per-cpu kthread, so disable + * interrupts to avoid preemption and interrupts and + * reduce the chance of bpf prog executing on this cpu + * when active counter is busy. 
+ */ + local_irq_save(flags); + if (local_inc_return(&c->active) == 1) { + __llist_add(obj, &c->free_llist); + c->free_cnt++; + } + local_dec(&c->active); + if (IS_ENABLED(CONFIG_PREEMPT_RT)) + local_irq_restore(flags); + } + set_active_memcg(old_memcg); + mem_cgroup_put(memcg); +} + +static void free_one(struct bpf_mem_cache *c, void *obj) +{ + if (c->kmem_cache) + kmem_cache_free(c->kmem_cache, obj); + else + kfree(obj); +} + +static void free_bulk(struct bpf_mem_cache *c) +{ + struct llist_node *llnode, *t; + unsigned long flags; + int cnt; + + do { + if (IS_ENABLED(CONFIG_PREEMPT_RT)) + local_irq_save(flags); + if (local_inc_return(&c->active) == 1) { + llnode = __llist_del_first(&c->free_llist); + if (llnode) + cnt = --c->free_cnt; + else + cnt = 0; + } + local_dec(&c->active); + if (IS_ENABLED(CONFIG_PREEMPT_RT)) + local_irq_restore(flags); + free_one(c, llnode); + } while (cnt > (HIGH_WATERMARK + LOW_WATERMARK) / 2); + + /* and drain free_llist_extra */ + llist_for_each_safe(llnode, t, llist_del_all(&c->free_llist_extra)) + free_one(c, llnode); +} + +static void bpf_mem_refill(struct irq_work *work) +{ + struct bpf_mem_cache *c = container_of(work, struct bpf_mem_cache, refill_work); + int cnt; + + /* Racy access to free_cnt. It doesn't need to be 100% accurate */ + cnt = c->free_cnt; + if (cnt < LOW_WATERMARK) + /* irq_work runs on this cpu and kmalloc will allocate + * from the current numa node which is what we want here. + */ + alloc_bulk(c, BATCH, NUMA_NO_NODE); + else if (cnt > HIGH_WATERMARK) + free_bulk(c); +} + +static void notrace irq_work_raise(struct bpf_mem_cache *c) +{ + irq_work_queue(&c->refill_work); +} + +static void prefill_mem_cache(struct bpf_mem_cache *c, int cpu) +{ + init_irq_work(&c->refill_work, bpf_mem_refill); + /* To avoid consuming memory assume that 1st run of bpf + * prog won't be doing more than 4 map_update_elem from + * irq disabled region + */ + alloc_bulk(c, c->unit_size <= 256 ? 4 : 1, cpu_to_node(cpu)); +} + +/* When size != 0 create kmem_cache and bpf_mem_cache for each cpu. + * This is typical bpf hash map use case when all elements have equal size. + * + * When size == 0 allocate 11 bpf_mem_cache-s for each cpu, then rely on + * kmalloc/kfree. Max allocation size is 4096 in this case. + * This is bpf_dynptr and bpf_kptr use case. 
+ */ +int bpf_mem_alloc_init(struct bpf_mem_alloc *ma, int size) +{ + static u16 sizes[NUM_CACHES] = {96, 192, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096}; + struct bpf_mem_caches *cc, __percpu *pcc; + struct bpf_mem_cache *c, __percpu *pc; + struct kmem_cache *kmem_cache; + struct obj_cgroup *objcg = NULL; + char buf[32]; + int cpu, i; + + if (size) { + pc = __alloc_percpu_gfp(sizeof(*pc), 8, GFP_KERNEL); + if (!pc) + return -ENOMEM; + size += LLIST_NODE_SZ; /* room for llist_node */ + snprintf(buf, sizeof(buf), "bpf-%u", size); + kmem_cache = kmem_cache_create(buf, size, 8, 0, NULL); + if (!kmem_cache) { + free_percpu(pc); + return -ENOMEM; + } +#ifdef CONFIG_MEMCG_KMEM + objcg = get_obj_cgroup_from_current(); +#endif + for_each_possible_cpu(cpu) { + c = per_cpu_ptr(pc, cpu); + c->kmem_cache = kmem_cache; + c->unit_size = size; + c->objcg = objcg; + prefill_mem_cache(c, cpu); + } + ma->cache = pc; + return 0; + } + + pcc = __alloc_percpu_gfp(sizeof(*cc), 8, GFP_KERNEL); + if (!pcc) + return -ENOMEM; +#ifdef CONFIG_MEMCG_KMEM + objcg = get_obj_cgroup_from_current(); +#endif + for_each_possible_cpu(cpu) { + cc = per_cpu_ptr(pcc, cpu); + for (i = 0; i < NUM_CACHES; i++) { + c = &cc->cache[i]; + c->unit_size = sizes[i]; + c->objcg = objcg; + prefill_mem_cache(c, cpu); + } + } + ma->caches = pcc; + return 0; +} + +static void drain_mem_cache(struct bpf_mem_cache *c) +{ + struct llist_node *llnode, *t; + + llist_for_each_safe(llnode, t, llist_del_all(&c->free_llist)) + free_one(c, llnode); + llist_for_each_safe(llnode, t, llist_del_all(&c->free_llist_extra)) + free_one(c, llnode); +} + +void bpf_mem_alloc_destroy(struct bpf_mem_alloc *ma) +{ + struct bpf_mem_caches *cc; + struct bpf_mem_cache *c; + int cpu, i; + + if (ma->cache) { + for_each_possible_cpu(cpu) { + c = per_cpu_ptr(ma->cache, cpu); + drain_mem_cache(c); + } + /* kmem_cache and memcg are the same across cpus */ + kmem_cache_destroy(c->kmem_cache); + if (c->objcg) + obj_cgroup_put(c->objcg); + free_percpu(ma->cache); + ma->cache = NULL; + } + if (ma->caches) { + for_each_possible_cpu(cpu) { + cc = per_cpu_ptr(ma->caches, cpu); + for (i = 0; i < NUM_CACHES; i++) { + c = &cc->cache[i]; + drain_mem_cache(c); + } + } + if (c->objcg) + obj_cgroup_put(c->objcg); + free_percpu(ma->caches); + ma->caches = NULL; + } +} + +/* notrace is necessary here and in other functions to make sure + * bpf programs cannot attach to them and cause llist corruptions. + */ +static void notrace *unit_alloc(struct bpf_mem_cache *c) +{ + struct llist_node *llnode = NULL; + unsigned long flags; + int cnt = 0; + + /* Disable irqs to prevent the following race for majority of prog types: + * prog_A + * bpf_mem_alloc + * preemption or irq -> prog_B + * bpf_mem_alloc + * + * but prog_B could be a perf_event NMI prog. + * Use per-cpu 'active' counter to order free_list access between + * unit_alloc/unit_free/bpf_mem_refill. + */ + local_irq_save(flags); + if (local_inc_return(&c->active) == 1) { + llnode = __llist_del_first(&c->free_llist); + if (llnode) + cnt = --c->free_cnt; + } + local_dec(&c->active); + local_irq_restore(flags); + + WARN_ON(cnt < 0); + + if (cnt < LOW_WATERMARK) + irq_work_raise(c); + return llnode; +} + +/* Though 'ptr' object could have been allocated on a different cpu + * add it to the free_llist of the current cpu. + * Let kfree() logic deal with it when it's later called from irq_work. 
+ */
+static void notrace unit_free(struct bpf_mem_cache *c, void *ptr)
+{
+	struct llist_node *llnode = ptr - LLIST_NODE_SZ;
+	unsigned long flags;
+	int cnt = 0;
+
+	BUILD_BUG_ON(LLIST_NODE_SZ > 8);
+
+	local_irq_save(flags);
+	if (local_inc_return(&c->active) == 1) {
+		__llist_add(llnode, &c->free_llist);
+		cnt = ++c->free_cnt;
+	} else {
+		/* unit_free() cannot fail. Therefore add an object to atomic
+		 * llist. free_bulk() will drain it. Though free_llist_extra is
+		 * a per-cpu list we have to use atomic llist_add here, since
+		 * it also can be interrupted by bpf nmi prog that does another
+		 * unit_free() into the same free_llist_extra.
+		 */
+		llist_add(llnode, &c->free_llist_extra);
+	}
+	local_dec(&c->active);
+	local_irq_restore(flags);
+
+	if (cnt > HIGH_WATERMARK)
+		/* free few objects from current cpu into global kmalloc pool */
+		irq_work_raise(c);
+}
+
+/* Called from BPF program or from sys_bpf syscall.
+ * In both cases migration is disabled.
+ */
+void notrace *bpf_mem_alloc(struct bpf_mem_alloc *ma, size_t size)
+{
+	int idx;
+	void *ret;
+
+	if (!size)
+		return ZERO_SIZE_PTR;
+
+	idx = bpf_mem_cache_idx(size + LLIST_NODE_SZ);
+	if (idx < 0)
+		return NULL;
+
+	ret = unit_alloc(this_cpu_ptr(ma->caches)->cache + idx);
+	return !ret ? NULL : ret + LLIST_NODE_SZ;
+}
+
+void notrace bpf_mem_free(struct bpf_mem_alloc *ma, void *ptr)
+{
+	int idx;
+
+	if (!ptr)
+		return;
+
+	idx = bpf_mem_cache_idx(__ksize(ptr - LLIST_NODE_SZ));
+	if (idx < 0)
+		return;
+
+	unit_free(this_cpu_ptr(ma->caches)->cache + idx, ptr);
+}
+
+void notrace *bpf_mem_cache_alloc(struct bpf_mem_alloc *ma)
+{
+	void *ret;
+
+	ret = unit_alloc(this_cpu_ptr(ma->cache));
+	return !ret ? NULL : ret + LLIST_NODE_SZ;
+}
+
+void notrace bpf_mem_cache_free(struct bpf_mem_alloc *ma, void *ptr)
+{
+	if (!ptr)
+		return;
+
+	unit_free(this_cpu_ptr(ma->cache), ptr);
+}

From patchwork Fri Aug 19 21:42:19 2022
From: Alexei Starovoitov
To: davem@davemloft.net
Cc: daniel@iogearbox.net, andrii@kernel.org, tj@kernel.org, memxor@gmail.com, delyank@fb.com, linux-mm@kvack.org, bpf@vger.kernel.org, kernel-team@fb.com
Subject: [PATCH v3 bpf-next 02/15] bpf: Convert hash map to bpf_mem_alloc.
Date: Fri, 19 Aug 2022 14:42:19 -0700
Message-Id: <20220819214232.18784-3-alexei.starovoitov@gmail.com>
In-Reply-To: <20220819214232.18784-1-alexei.starovoitov@gmail.com>

From: Alexei Starovoitov

Convert bpf hash map to use bpf memory allocator.
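For orientation, the hunks below amount to the element lifecycle sketched here (a condensed illustration with hypothetical helper names, not literal code from the patch):

/* Sketch of the conversion: the htab embeds a bpf_mem_alloc and routes
 * element allocation/freeing through it instead of kmalloc/kfree.
 */
struct bpf_htab_sketch {
	struct bpf_map map;
	struct bpf_mem_alloc ma;	/* new field added by this patch */
};

static int sketch_map_alloc(struct bpf_htab_sketch *htab, int elem_size)
{
	/* only for non-preallocated maps, from htab_map_alloc() */
	return bpf_mem_alloc_init(&htab->ma, elem_size);
}

static void *sketch_alloc_elem(struct bpf_htab_sketch *htab)
{
	/* replaces bpf_map_kmalloc_node(..., GFP_NOWAIT | __GFP_NOWARN, ...) */
	return bpf_mem_cache_alloc(&htab->ma);
}

static void sketch_free_elem(struct bpf_htab_sketch *htab, void *l)
{
	/* replaces kfree(l) in htab_elem_free() */
	bpf_mem_cache_free(&htab->ma, l);
}

static void sketch_map_free(struct bpf_htab_sketch *htab)
{
	/* from htab_map_free() and the htab_map_alloc() error path */
	bpf_mem_alloc_destroy(&htab->ma);
}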
Signed-off-by: Alexei Starovoitov
---
 kernel/bpf/hashtab.c | 16 +++++++++++-----
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index b301a63afa2f..bd23c8830d49 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -14,6 +14,7 @@
 #include "percpu_freelist.h"
 #include "bpf_lru_list.h"
 #include "map_in_map.h"
+#include <linux/bpf_mem_alloc.h>
 
 #define HTAB_CREATE_FLAG_MASK						\
 	(BPF_F_NO_PREALLOC | BPF_F_NO_COMMON_LRU | BPF_F_NUMA_NODE |	\
@@ -92,6 +93,7 @@ struct bucket {
 
 struct bpf_htab {
 	struct bpf_map map;
+	struct bpf_mem_alloc ma;
 	struct bucket *buckets;
 	void *elems;
 	union {
@@ -563,6 +565,10 @@ static struct bpf_map *htab_map_alloc(union bpf_attr *attr)
 			if (err)
 				goto free_prealloc;
 		}
+	} else {
+		err = bpf_mem_alloc_init(&htab->ma, htab->elem_size);
+		if (err)
+			goto free_map_locked;
 	}
 
 	return &htab->map;
@@ -573,6 +579,7 @@ static struct bpf_map *htab_map_alloc(union bpf_attr *attr)
 	for (i = 0; i < HASHTAB_MAP_LOCK_COUNT; i++)
 		free_percpu(htab->map_locked[i]);
 	bpf_map_area_free(htab->buckets);
+	bpf_mem_alloc_destroy(&htab->ma);
 free_htab:
 	lockdep_unregister_key(&htab->lockdep_key);
 	bpf_map_area_free(htab);
@@ -849,7 +856,7 @@ static void htab_elem_free(struct bpf_htab *htab, struct htab_elem *l)
 	if (htab->map.map_type == BPF_MAP_TYPE_PERCPU_HASH)
 		free_percpu(htab_elem_get_ptr(l, htab->map.key_size));
 	check_and_free_fields(htab, l);
-	kfree(l);
+	bpf_mem_cache_free(&htab->ma, l);
 }
 
 static void htab_elem_free_rcu(struct rcu_head *head)
@@ -973,9 +980,7 @@ static struct htab_elem *alloc_htab_elem(struct bpf_htab *htab, void *key,
 				l_new = ERR_PTR(-E2BIG);
 				goto dec_count;
 			}
-		l_new = bpf_map_kmalloc_node(&htab->map, htab->elem_size,
-					     GFP_NOWAIT | __GFP_NOWARN,
-					     htab->map.numa_node);
+		l_new = bpf_mem_cache_alloc(&htab->ma);
 		if (!l_new) {
 			l_new = ERR_PTR(-ENOMEM);
 			goto dec_count;
 		}
@@ -994,7 +999,7 @@ static struct htab_elem *alloc_htab_elem(struct bpf_htab *htab, void *key,
 		pptr = bpf_map_alloc_percpu(&htab->map, size, 8,
 					    GFP_NOWAIT | __GFP_NOWARN);
 		if (!pptr) {
-			kfree(l_new);
+			bpf_mem_cache_free(&htab->ma, l_new);
 			l_new = ERR_PTR(-ENOMEM);
 			goto dec_count;
 		}
@@ -1489,6 +1494,7 @@ static void htab_map_free(struct bpf_map *map)
 	bpf_map_free_kptr_off_tab(map);
 	free_percpu(htab->extra_elems);
 	bpf_map_area_free(htab->buckets);
+	bpf_mem_alloc_destroy(&htab->ma);
 	for (i = 0; i < HASHTAB_MAP_LOCK_COUNT; i++)
 		free_percpu(htab->map_locked[i]);
 	lockdep_unregister_key(&htab->lockdep_key);

From patchwork Fri Aug 19 21:42:20 2022
From: Alexei Starovoitov
To: davem@davemloft.net
Cc: daniel@iogearbox.net, andrii@kernel.org, tj@kernel.org, memxor@gmail.com, delyank@fb.com, linux-mm@kvack.org, bpf@vger.kernel.org, kernel-team@fb.com
Subject: [PATCH v3 bpf-next 03/15] selftests/bpf: Improve test coverage of test_maps
Date: Fri, 19 Aug 2022 14:42:20 -0700
Message-Id: <20220819214232.18784-4-alexei.starovoitov@gmail.com>
In-Reply-To: <20220819214232.18784-1-alexei.starovoitov@gmail.com>
From: Alexei Starovoitov

Make test_maps more stressful with more parallelism in
update/delete/lookup/walk including different value sizes.

Signed-off-by: Alexei Starovoitov
---
 tools/testing/selftests/bpf/test_maps.c | 38 ++++++++++++++++---------
 1 file changed, 24 insertions(+), 14 deletions(-)

diff --git a/tools/testing/selftests/bpf/test_maps.c b/tools/testing/selftests/bpf/test_maps.c
index cbebfaa7c1e8..d1ffc76814d9 100644
--- a/tools/testing/selftests/bpf/test_maps.c
+++ b/tools/testing/selftests/bpf/test_maps.c
@@ -264,10 +264,11 @@ static void test_hashmap_percpu(unsigned int task, void *data)
 	close(fd);
 }
 
+#define VALUE_SIZE 3
 static int helper_fill_hashmap(int max_entries)
 {
 	int i, fd, ret;
-	long long key, value;
+	long long key, value[VALUE_SIZE] = {};
 
 	fd = bpf_map_create(BPF_MAP_TYPE_HASH, NULL, sizeof(key), sizeof(value),
 			    max_entries, &map_opts);
@@ -276,8 +277,8 @@ static int helper_fill_hashmap(int max_entries)
 	      "err: %s, flags: 0x%x\n", strerror(errno), map_opts.map_flags);
 
 	for (i = 0; i < max_entries; i++) {
-		key = i; value = key;
-		ret = bpf_map_update_elem(fd, &key, &value, BPF_NOEXIST);
+		key = i; value[0] = key;
+		ret = bpf_map_update_elem(fd, &key, value, BPF_NOEXIST);
 		CHECK(ret != 0, "can't update hashmap",
 		      "err: %s\n", strerror(ret));
@@ -288,8 +289,8 @@ static int helper_fill_hashmap(int max_entries)
 
 static void test_hashmap_walk(unsigned int task, void *data)
 {
-	int fd, i, max_entries = 1000;
-	long long key, value, next_key;
+	int fd, i, max_entries = 10000;
+	long long key, value[VALUE_SIZE], next_key;
 	bool next_key_valid = true;
 
 	fd = helper_fill_hashmap(max_entries);
@@ -297,7 +298,7 @@ static void test_hashmap_walk(unsigned int task, void *data)
 	for (i = 0; bpf_map_get_next_key(fd, !i ? NULL : &key,
 					 &next_key) == 0; i++) {
 		key = next_key;
-		assert(bpf_map_lookup_elem(fd, &key, &value) == 0);
+		assert(bpf_map_lookup_elem(fd, &key, value) == 0);
 	}
 
 	assert(i == max_entries);
@@ -305,9 +306,9 @@ static void test_hashmap_walk(unsigned int task, void *data)
 	assert(bpf_map_get_next_key(fd, NULL, &key) == 0);
 	for (i = 0; next_key_valid; i++) {
 		next_key_valid = bpf_map_get_next_key(fd, &key, &next_key) == 0;
-		assert(bpf_map_lookup_elem(fd, &key, &value) == 0);
-		value++;
-		assert(bpf_map_update_elem(fd, &key, &value, BPF_EXIST) == 0);
+		assert(bpf_map_lookup_elem(fd, &key, value) == 0);
+		value[0]++;
+		assert(bpf_map_update_elem(fd, &key, value, BPF_EXIST) == 0);
 		key = next_key;
 	}
 
@@ -316,8 +317,8 @@ static void test_hashmap_walk(unsigned int task, void *data)
 	for (i = 0; bpf_map_get_next_key(fd, !i ? NULL : &key,
 					 &next_key) == 0; i++) {
 		key = next_key;
-		assert(bpf_map_lookup_elem(fd, &key, &value) == 0);
-		assert(value - 1 == key);
+		assert(bpf_map_lookup_elem(fd, &key, value) == 0);
+		assert(value[0] - 1 == key);
 	}
 
 	assert(i == max_entries);
@@ -1371,16 +1372,16 @@ static void __run_parallel(unsigned int tasks,
 
 static void test_map_stress(void)
 {
+	run_parallel(100, test_hashmap_walk, NULL);
 	run_parallel(100, test_hashmap, NULL);
 	run_parallel(100, test_hashmap_percpu, NULL);
 	run_parallel(100, test_hashmap_sizes, NULL);
-	run_parallel(100, test_hashmap_walk, NULL);
 
 	run_parallel(100, test_arraymap, NULL);
 	run_parallel(100, test_arraymap_percpu, NULL);
 }
 
-#define TASKS 1024
+#define TASKS 100
 
 #define DO_UPDATE 1
 #define DO_DELETE 0
@@ -1432,6 +1433,8 @@ static void test_update_delete(unsigned int fn, void *data)
 	int fd = ((int *)data)[0];
 	int i, key, value, err;
 
+	if (fn & 1)
+		test_hashmap_walk(fn, NULL);
 	for (i = fn; i < MAP_SIZE; i += TASKS) {
 		key = value = i;
 
@@ -1455,7 +1458,7 @@ static void test_update_delete(unsigned int fn, void *data)
 
 static void test_map_parallel(void)
 {
-	int i, fd, key = 0, value = 0;
+	int i, fd, key = 0, value = 0, j = 0;
 	int data[2];
 
 	fd = bpf_map_create(BPF_MAP_TYPE_HASH, NULL, sizeof(key), sizeof(value),
@@ -1466,6 +1469,7 @@ static void test_map_parallel(void)
 		exit(1);
 	}
 
+again:
 	/* Use the same fd in children to add elements to this map:
 	 * child_0 adds key=0, key=1024, key=2048, ...
 	 * child_1 adds key=1, key=1025, key=2049, ...
@@ -1502,6 +1506,12 @@ static void test_map_parallel(void)
 	key = -1;
 	assert(bpf_map_get_next_key(fd, NULL, &key) < 0 && errno == ENOENT);
 	assert(bpf_map_get_next_key(fd, &key, &key) < 0 && errno == ENOENT);
+
+	key = 0;
+	bpf_map_delete_elem(fd, &key);
+	if (j++ < 5)
+		goto again;
+
 	close(fd);
 }
 
 static void test_map_rdonly(void)

From patchwork Fri Aug 19 21:42:21 2022
From: Alexei Starovoitov
To: davem@davemloft.net
Cc: daniel@iogearbox.net, andrii@kernel.org, tj@kernel.org, memxor@gmail.com, delyank@fb.com, linux-mm@kvack.org, bpf@vger.kernel.org, kernel-team@fb.com
Subject: [PATCH v3 bpf-next 04/15] samples/bpf: Reduce syscall overhead in map_perf_test.
Date: Fri, 19 Aug 2022 14:42:21 -0700
Message-Id: <20220819214232.18784-5-alexei.starovoitov@gmail.com>
In-Reply-To: <20220819214232.18784-1-alexei.starovoitov@gmail.com>

From: Alexei Starovoitov

Make map_perf_test for preallocated and non-preallocated hash map
spend more time inside the bpf program to focus performance analysis
on the speed of update/lookup/delete operations performed by the bpf
program.

It makes 'perf report' of bpf_mem_alloc look like:
 11.76%  map_perf_test    [k] _raw_spin_lock_irqsave
 11.26%  map_perf_test    [k] htab_map_update_elem
  9.70%  map_perf_test    [k] _raw_spin_lock
  9.47%  map_perf_test    [k] htab_map_delete_elem
  8.57%  map_perf_test    [k] memcpy_erms
  5.58%  map_perf_test    [k] alloc_htab_elem
  4.09%  map_perf_test    [k] __htab_map_lookup_elem
  3.44%  map_perf_test    [k] syscall_exit_to_user_mode
  3.13%  map_perf_test    [k] lookup_nulls_elem_raw
  3.05%  map_perf_test    [k] migrate_enable
  3.04%  map_perf_test    [k] memcmp
  2.67%  map_perf_test    [k] unit_free
  2.39%  map_perf_test    [k] lookup_elem_raw

Reduce the default iteration count as well to make 'map_perf_test'
quick enough even on debug kernels.
Signed-off-by: Alexei Starovoitov
---
 samples/bpf/map_perf_test_kern.c | 44 ++++++++++++++++++++------------
 samples/bpf/map_perf_test_user.c |  2 +-
 2 files changed, 29 insertions(+), 17 deletions(-)

diff --git a/samples/bpf/map_perf_test_kern.c b/samples/bpf/map_perf_test_kern.c
index 8773f22b6a98..7342c5b2f278 100644
--- a/samples/bpf/map_perf_test_kern.c
+++ b/samples/bpf/map_perf_test_kern.c
@@ -108,11 +108,14 @@ int stress_hmap(struct pt_regs *ctx)
 	u32 key = bpf_get_current_pid_tgid();
 	long init_val = 1;
 	long *value;
+	int i;
 
-	bpf_map_update_elem(&hash_map, &key, &init_val, BPF_ANY);
-	value = bpf_map_lookup_elem(&hash_map, &key);
-	if (value)
-		bpf_map_delete_elem(&hash_map, &key);
+	for (i = 0; i < 10; i++) {
+		bpf_map_update_elem(&hash_map, &key, &init_val, BPF_ANY);
+		value = bpf_map_lookup_elem(&hash_map, &key);
+		if (value)
+			bpf_map_delete_elem(&hash_map, &key);
+	}
 
 	return 0;
 }
@@ -123,11 +126,14 @@ int stress_percpu_hmap(struct pt_regs *ctx)
 	u32 key = bpf_get_current_pid_tgid();
 	long init_val = 1;
 	long *value;
+	int i;
 
-	bpf_map_update_elem(&percpu_hash_map, &key, &init_val, BPF_ANY);
-	value = bpf_map_lookup_elem(&percpu_hash_map, &key);
-	if (value)
-		bpf_map_delete_elem(&percpu_hash_map, &key);
+	for (i = 0; i < 10; i++) {
+		bpf_map_update_elem(&percpu_hash_map, &key, &init_val, BPF_ANY);
+		value = bpf_map_lookup_elem(&percpu_hash_map, &key);
+		if (value)
+			bpf_map_delete_elem(&percpu_hash_map, &key);
+	}
 
 	return 0;
 }
@@ -137,11 +143,14 @@ int stress_hmap_alloc(struct pt_regs *ctx)
 	u32 key = bpf_get_current_pid_tgid();
 	long init_val = 1;
 	long *value;
+	int i;
 
-	bpf_map_update_elem(&hash_map_alloc, &key, &init_val, BPF_ANY);
-	value = bpf_map_lookup_elem(&hash_map_alloc, &key);
-	if (value)
-		bpf_map_delete_elem(&hash_map_alloc, &key);
+	for (i = 0; i < 10; i++) {
+		bpf_map_update_elem(&hash_map_alloc, &key, &init_val, BPF_ANY);
+		value = bpf_map_lookup_elem(&hash_map_alloc, &key);
+		if (value)
+			bpf_map_delete_elem(&hash_map_alloc, &key);
+	}
 
 	return 0;
 }
@@ -151,11 +160,14 @@ int stress_percpu_hmap_alloc(struct pt_regs *ctx)
 	u32 key = bpf_get_current_pid_tgid();
 	long init_val = 1;
 	long *value;
+	int i;
 
-	bpf_map_update_elem(&percpu_hash_map_alloc, &key, &init_val, BPF_ANY);
-	value = bpf_map_lookup_elem(&percpu_hash_map_alloc, &key);
-	if (value)
-		bpf_map_delete_elem(&percpu_hash_map_alloc, &key);
+	for (i = 0; i < 10; i++) {
+		bpf_map_update_elem(&percpu_hash_map_alloc, &key, &init_val, BPF_ANY);
+		value = bpf_map_lookup_elem(&percpu_hash_map_alloc, &key);
+		if (value)
+			bpf_map_delete_elem(&percpu_hash_map_alloc, &key);
+	}
 
 	return 0;
 }
diff --git a/samples/bpf/map_perf_test_user.c b/samples/bpf/map_perf_test_user.c
index b6fc174ab1f2..1bb53f4b29e1 100644
--- a/samples/bpf/map_perf_test_user.c
+++ b/samples/bpf/map_perf_test_user.c
@@ -72,7 +72,7 @@ static int test_flags = ~0;
 static uint32_t num_map_entries;
 static uint32_t inner_lru_hash_size;
 static int lru_hash_lookup_test_entries = 32;
-static uint32_t max_cnt = 1000000;
+static uint32_t max_cnt = 10000;
 
 static int check_test_flags(enum test_type t)
 {

From patchwork Fri Aug 19 21:42:22 2022
From: Alexei Starovoitov
To: davem@davemloft.net
Cc: daniel@iogearbox.net, andrii@kernel.org, tj@kernel.org, memxor@gmail.com, delyank@fb.com, linux-mm@kvack.org, bpf@vger.kernel.org, kernel-team@fb.com
Subject: [PATCH v3 bpf-next 05/15] bpf: Relax the requirement to use preallocated hash maps in tracing progs.
Date: Fri, 19 Aug 2022 14:42:22 -0700 Message-Id: <20220819214232.18784-6-alexei.starovoitov@gmail.com> X-Mailer: git-send-email 2.36.1 In-Reply-To: <20220819214232.18784-1-alexei.starovoitov@gmail.com> References: <20220819214232.18784-1-alexei.starovoitov@gmail.com> MIME-Version: 1.0 ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1660945374; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=KAWNhWE/WD3wFZacJ2ko17zYo7gsMhvffUTmRG8how0=; b=b7/DDLTNqR3M37yqoW5SANGOSgrBZ92AspkXv0kgtwgcCO5lMNDziTCstO4DG4OjCMPqij R+Af7fVnrwwEgF7rj2CDAVSFEtS5e51SPRDMYfVCzLSYfjjEmipZsfVq+nzZ3FxXtafL23 jSHyzSPeEnB4VRWeaOWFhS0TPmw0vlA= ARC-Authentication-Results: i=1; imf30.hostedemail.com; dkim=pass header.d=gmail.com header.s=20210112 header.b=IJU+tAgI; spf=pass (imf30.hostedemail.com: domain of alexei.starovoitov@gmail.com designates 209.85.214.174 as permitted sender) smtp.mailfrom=alexei.starovoitov@gmail.com; dmarc=pass (policy=none) header.from=gmail.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1660945374; a=rsa-sha256; cv=none; b=WjaL040ceR93m37r3g/ZFattpRnOAT9gEgCqsyg6pLBfko91sDEdW6O9C1q6Eosk7zSC8y 5cZtshb6Y+hJKaV67AJOpRpGersUsTrD2Yj1+wx3F48QUphaYNP6z/eWP4sKMT+/HFqVFC feOQ0Vy4x035AwvfVrW82wJYLFXSuko= Authentication-Results: imf30.hostedemail.com; dkim=pass header.d=gmail.com header.s=20210112 header.b=IJU+tAgI; spf=pass (imf30.hostedemail.com: domain of alexei.starovoitov@gmail.com designates 209.85.214.174 as permitted sender) smtp.mailfrom=alexei.starovoitov@gmail.com; dmarc=pass (policy=none) header.from=gmail.com X-Rspamd-Server: rspam05 X-Rspamd-Queue-Id: 9C9A280006 X-Stat-Signature: fmobs159ytbysqo86pojgcaaky35btcj X-Rspam-User: X-HE-Tag: 1660945374-176749 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Alexei Starovoitov Since bpf hash map was converted to use bpf_mem_alloc it is safe to use from tracing programs and in RT kernels. But per-cpu hash map is still using dynamic allocation for per-cpu map values, hence keep the warning for this map type. In the future alloc_percpu_gfp can be front-end-ed with bpf_mem_cache and this restriction will be completely lifted. perf_event (NMI) bpf programs have to use preallocated hash maps, because free_htab_elem() is using call_rcu which might crash if re-entered. Sleepable bpf programs have to use preallocated hash maps, because life time of the map elements is not protected by rcu_read_lock/unlock. This restriction can be lifted in the future as well. Signed-off-by: Alexei Starovoitov --- kernel/bpf/verifier.c | 31 ++++++++++++++++++++++--------- 1 file changed, 22 insertions(+), 9 deletions(-) diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index 2c1f8069f7b7..d785f29047d7 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -12605,10 +12605,12 @@ static int check_map_prog_compatibility(struct bpf_verifier_env *env, * For programs attached to PERF events this is mandatory as the * perf NMI can hit any arbitrary code sequence. 
* - * All other trace types using preallocated hash maps are unsafe as - * well because tracepoint or kprobes can be inside locked regions - * of the memory allocator or at a place where a recursion into the - * memory allocator would see inconsistent state. + * All other trace types using non-preallocated per-cpu hash maps are + * unsafe as well because tracepoint or kprobes can be inside locked + * regions of the per-cpu memory allocator or at a place where a + * recursion into the per-cpu memory allocator would see inconsistent + * state. Non per-cpu hash maps are using bpf_mem_alloc-tor which is + * safe to use from kprobe/fentry and in RT. * * On RT enabled kernels run-time allocation of all trace type * programs is strictly prohibited due to lock type constraints. On @@ -12618,15 +12620,26 @@ static int check_map_prog_compatibility(struct bpf_verifier_env *env, */ if (is_tracing_prog_type(prog_type) && !is_preallocated_map(map)) { if (prog_type == BPF_PROG_TYPE_PERF_EVENT) { + /* perf_event bpf progs have to use preallocated hash maps + * because non-prealloc is still relying on call_rcu to free + * elements. + */ verbose(env, "perf_event programs can only use preallocated hash map\n"); return -EINVAL; } - if (IS_ENABLED(CONFIG_PREEMPT_RT)) { - verbose(env, "trace type programs can only use preallocated hash map\n"); - return -EINVAL; + if (map->map_type == BPF_MAP_TYPE_PERCPU_HASH || + (map->inner_map_meta && + map->inner_map_meta->map_type == BPF_MAP_TYPE_PERCPU_HASH)) { + if (IS_ENABLED(CONFIG_PREEMPT_RT)) { + verbose(env, + "trace type programs can only use preallocated per-cpu hash map\n"); + return -EINVAL; + } + WARN_ONCE(1, "trace type BPF program uses run-time allocation\n"); + verbose(env, + "trace type programs with run-time allocated per-cpu hash maps are unsafe." + " Switch to preallocated hash maps.\n"); } - WARN_ONCE(1, "trace type BPF program uses run-time allocation\n"); - verbose(env, "trace type programs with run-time allocated hash maps are unsafe. 
Switch to preallocated hash maps.\n"); } if (map_value_has_spin_lock(map)) { From patchwork Fri Aug 19 21:42:23 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alexei Starovoitov X-Patchwork-Id: 12949272 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id CA54AC32771 for ; Fri, 19 Aug 2022 21:42:59 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 32CFC6B007D; Fri, 19 Aug 2022 17:42:59 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 2B65B6B007E; Fri, 19 Aug 2022 17:42:59 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 12FBF8D0001; Fri, 19 Aug 2022 17:42:59 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0012.hostedemail.com [216.40.44.12]) by kanga.kvack.org (Postfix) with ESMTP id 01A166B007D for ; Fri, 19 Aug 2022 17:42:58 -0400 (EDT) Received: from smtpin09.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay07.hostedemail.com (Postfix) with ESMTP id CD207161B60 for ; Fri, 19 Aug 2022 21:42:58 +0000 (UTC) X-FDA: 79817667636.09.F083EA2 Received: from mail-pj1-f48.google.com (mail-pj1-f48.google.com [209.85.216.48]) by imf24.hostedemail.com (Postfix) with ESMTP id 77C4918001D for ; Fri, 19 Aug 2022 21:42:58 +0000 (UTC) Received: by mail-pj1-f48.google.com with SMTP id bf22so5766792pjb.4 for ; Fri, 19 Aug 2022 14:42:58 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc; bh=1ZPsTJvD2SD8oJE2LdgDjHUXXNxOa1YZX3bUXze9YDM=; b=h6OnE731x3+bxw84Yrk5LMRet9tgUrjwY5zJZW8txa5IWmma0MDPxhZILEnikbrjg8 eVfNMIi1UAXY83whNiOXlYxS509Df73jTWadxLUfDiHPze06amQlR2r0aTXqintkqpvz n1RBrmne9LZb6nRBs75CyBwz65jYr1D6EPsw5WnJ7J7/NaXkEWvpKfX0Ce2jMEcLeP4h sz+tF9zGKIA1FTwn0AgwR5MhRFyTlbXWrTuj2YBcx4D+41FHMX15BessF5z8u6VdKCg3 IuA/DPNsV6iyZv299egKdbL/6uQPjl7a0PqUJEWAwVJs/SZJdGSTmDo/9KBT4dIDmWZS 1PZg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc; bh=1ZPsTJvD2SD8oJE2LdgDjHUXXNxOa1YZX3bUXze9YDM=; b=fxc4ThiMg0uFyx+o7NlazX/CuPLnUSdph1Z/Por22nWcDFbpv6HgCn78i/oEINsqkT hTMOkx8w5HslgfLPJNG8d9G33/Agd7IUds25+s5OPAkXBcVqeqpYCOmrc0xN38MJTVu/ uAqTAjMpncdBYZYJQlLeauuCaSlV2BomEhfzJSz9maVaUN06zDtmtFDmKqs2/zMaTfDj J3ZIbKYvdEwm9WwBxA9zubnedCiMR+h/cMmez08fpJGrPejfse993T8Y6+ciTSEIQVJ1 TaSfvpIz0ka8BgljTZYXyrifEqdLr9deGjdGKxzTtzchWs1PACWY6dN68VAa9uMJTMkq TnYg== X-Gm-Message-State: ACgBeo3rnj4EqrE+CHBi/rwsmHytKem5l8/YvAfF6HoRZfDq7uMW2u4j CNHyquACwfMd2bz31ARkf58= X-Google-Smtp-Source: AA6agR54vdsWqFFAZbvuRKIsTnvg6U9ayUFsbInKCU7GqGHP0S6D0wOdeMreN6Cxi4vbQYrkfXOY5w== X-Received: by 2002:a17:902:c945:b0:16e:d24d:1b27 with SMTP id i5-20020a170902c94500b0016ed24d1b27mr9298199pla.51.1660945377401; Fri, 19 Aug 2022 14:42:57 -0700 (PDT) Received: from localhost.localdomain ([2620:10d:c090:500::1:c4b1]) by smtp.gmail.com with ESMTPSA id k10-20020aa79d0a000000b0053612ec8859sm1656032pfp.209.2022.08.19.14.42.55 (version=TLS1_3 cipher=TLS_CHACHA20_POLY1305_SHA256 bits=256/256); Fri, 19 Aug 2022 14:42:56 -0700 (PDT) From: Alexei Starovoitov To: 
davem@davemloft.net Cc: daniel@iogearbox.net, andrii@kernel.org, tj@kernel.org, memxor@gmail.com, delyank@fb.com, linux-mm@kvack.org, bpf@vger.kernel.org, kernel-team@fb.com Subject: [PATCH v3 bpf-next 06/15] bpf: Optimize element count in non-preallocated hash map. Date: Fri, 19 Aug 2022 14:42:23 -0700 Message-Id: <20220819214232.18784-7-alexei.starovoitov@gmail.com> X-Mailer: git-send-email 2.36.1 In-Reply-To: <20220819214232.18784-1-alexei.starovoitov@gmail.com> References: <20220819214232.18784-1-alexei.starovoitov@gmail.com> MIME-Version: 1.0 ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1660945378; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=1ZPsTJvD2SD8oJE2LdgDjHUXXNxOa1YZX3bUXze9YDM=; b=5x2Cm4glopde4u6KyBay9xkIWPlvbPBKVLGzHQ20iwQomWR1S5wPS73J9YdUq8Yp8VAFyJ 2N7UAqONB2fYGzcgabE+7Oynrl0A2IjZukhWG2sDRB1EoZTD+ZW8CSbmmdwWR0y2StsNoh 2rZCOIM7e52MK+A7FuksW8fUZtnaWSk= ARC-Authentication-Results: i=1; imf24.hostedemail.com; dkim=pass header.d=gmail.com header.s=20210112 header.b=h6OnE731; dmarc=pass (policy=none) header.from=gmail.com; spf=pass (imf24.hostedemail.com: domain of alexei.starovoitov@gmail.com designates 209.85.216.48 as permitted sender) smtp.mailfrom=alexei.starovoitov@gmail.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1660945378; a=rsa-sha256; cv=none; b=sHT5NHF6jtGzkQmMQe8aKKHigRuHTlLkIfmAOZ2lTzPjWvzTdFRWxBK2i/6sWvnCBmR3EU 4jXdwqE++S199W5wC9EkrvzXhCEkuiiqkyGWs4fwPdgiGCK5C+wOhsP2pbMHjxlbRz1XZc ZPGjaXq2lLm+38XVlnnFrBN6NZWJy9Q= Authentication-Results: imf24.hostedemail.com; dkim=pass header.d=gmail.com header.s=20210112 header.b=h6OnE731; dmarc=pass (policy=none) header.from=gmail.com; spf=pass (imf24.hostedemail.com: domain of alexei.starovoitov@gmail.com designates 209.85.216.48 as permitted sender) smtp.mailfrom=alexei.starovoitov@gmail.com X-Rspamd-Server: rspam08 X-Rspamd-Queue-Id: 77C4918001D X-Stat-Signature: iyptu3qsbyn6g138fpu9aqi35e1dgu8s X-Rspam-User: X-HE-Tag: 1660945378-560278 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Alexei Starovoitov The atomic_inc/dec might cause extreme cache line bouncing when multiple cpus access the same bpf map. Based on specified max_entries for the hash map calculate when percpu_counter becomes faster than atomic_t and use it for such maps. For example samples/bpf/map_perf_test is using hash map with max_entries 1000. On a system with 16 cpus the 'map_perf_test 4' shows 14k events per second using atomic_t. On a system with 15 cpus it shows 100k events per second using percpu. map_perf_test is an extreme case where all cpus colliding on atomic_t which causes extreme cache bouncing. Note that the slow path of percpu_counter is 5k events per secound vs 14k for atomic, so the heuristic is necessary. See comment in the code why the heuristic is based on num_online_cpus(). 
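[Editor's note, not part of the patch: to make the heuristic concrete, here is a minimal standalone sketch of the check this patch adds to htab_map_alloc(); the function name is made up for illustration, and the real code uses attr->max_entries and num_online_cpus().]

  #include <stdbool.h>

  #define PERCPU_COUNTER_BATCH 32

  /* Mirrors the condition added in this patch: percpu_counter is only
   * worth it when the map can absorb one batch of per-cpu inaccuracy
   * without __percpu_counter_compare() hitting the slow path.
   */
  static bool would_use_percpu_counter(unsigned int max_entries,
                                       unsigned int online_cpus)
  {
          return max_entries / 2 > online_cpus * PERCPU_COUNTER_BATCH;
  }

  /* max_entries = 1000, 16 online cpus:  500 > 16 * 32 = 512 -> false, atomic_t
   * max_entries = 1000, 15 online cpus:  500 > 15 * 32 = 480 -> true,  percpu_counter
   */

This is why the commit message quotes one number for a 16-cpu system (atomic_t path) and another for a 15-cpu system (percpu_counter path) with the same max_entries of 1000.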
Signed-off-by: Alexei Starovoitov --- kernel/bpf/hashtab.c | 70 +++++++++++++++++++++++++++++++++++++++----- 1 file changed, 62 insertions(+), 8 deletions(-) diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c index bd23c8830d49..8f68c6e13339 100644 --- a/kernel/bpf/hashtab.c +++ b/kernel/bpf/hashtab.c @@ -101,7 +101,12 @@ struct bpf_htab { struct bpf_lru lru; }; struct htab_elem *__percpu *extra_elems; - atomic_t count; /* number of elements in this hashtable */ + /* number of elements in non-preallocated hashtable are kept + * in either pcount or count + */ + struct percpu_counter pcount; + atomic_t count; + bool use_percpu_counter; u32 n_buckets; /* number of hash buckets */ u32 elem_size; /* size of each element in bytes */ u32 hashrnd; @@ -552,6 +557,29 @@ static struct bpf_map *htab_map_alloc(union bpf_attr *attr) htab_init_buckets(htab); +/* compute_batch_value() computes batch value as num_online_cpus() * 2 + * and __percpu_counter_compare() needs + * htab->max_entries - cur_number_of_elems to be more than batch * num_online_cpus() + * for percpu_counter to be faster than atomic_t. In practice the average bpf + * hash map size is 10k, which means that a system with 64 cpus will fill + * hashmap to 20% of 10k before percpu_counter becomes ineffective. Therefore + * define our own batch count as 32 then 10k hash map can be filled up to 80%: + * 10k - 8k > 32 _batch_ * 64 _cpus_ + * and __percpu_counter_compare() will still be fast. At that point hash map + * collisions will dominate its performance anyway. Assume that hash map filled + * to 50+% isn't going to be O(1) and use the following formula to choose + * between percpu_counter and atomic_t. + */ +#define PERCPU_COUNTER_BATCH 32 + if (attr->max_entries / 2 > num_online_cpus() * PERCPU_COUNTER_BATCH) + htab->use_percpu_counter = true; + + if (htab->use_percpu_counter) { + err = percpu_counter_init(&htab->pcount, 0, GFP_KERNEL); + if (err) + goto free_map_locked; + } + if (prealloc) { err = prealloc_init(htab); if (err) @@ -878,6 +906,31 @@ static void htab_put_fd_value(struct bpf_htab *htab, struct htab_elem *l) } } +static bool is_map_full(struct bpf_htab *htab) +{ + if (htab->use_percpu_counter) + return __percpu_counter_compare(&htab->pcount, htab->map.max_entries, + PERCPU_COUNTER_BATCH) >= 0; + return atomic_read(&htab->count) >= htab->map.max_entries; +} + +static void inc_elem_count(struct bpf_htab *htab) +{ + if (htab->use_percpu_counter) + percpu_counter_add_batch(&htab->pcount, 1, PERCPU_COUNTER_BATCH); + else + atomic_inc(&htab->count); +} + +static void dec_elem_count(struct bpf_htab *htab) +{ + if (htab->use_percpu_counter) + percpu_counter_add_batch(&htab->pcount, -1, PERCPU_COUNTER_BATCH); + else + atomic_dec(&htab->count); +} + + static void free_htab_elem(struct bpf_htab *htab, struct htab_elem *l) { htab_put_fd_value(htab, l); @@ -886,7 +939,7 @@ static void free_htab_elem(struct bpf_htab *htab, struct htab_elem *l) check_and_free_fields(htab, l); __pcpu_freelist_push(&htab->freelist, &l->fnode); } else { - atomic_dec(&htab->count); + dec_elem_count(htab); l->htab = htab; call_rcu(&l->rcu, htab_elem_free_rcu); } @@ -970,16 +1023,15 @@ static struct htab_elem *alloc_htab_elem(struct bpf_htab *htab, void *key, l_new = container_of(l, struct htab_elem, fnode); } } else { - if (atomic_inc_return(&htab->count) > htab->map.max_entries) - if (!old_elem) { + if (is_map_full(htab)) + if (!old_elem) /* when map is full and update() is replacing * old element, it's ok to allocate, since * old element will be freed 
immediately. * Otherwise return an error */ - l_new = ERR_PTR(-E2BIG); - goto dec_count; - } + return ERR_PTR(-E2BIG); + inc_elem_count(htab); l_new = bpf_mem_cache_alloc(&htab->ma); if (!l_new) { l_new = ERR_PTR(-ENOMEM); @@ -1021,7 +1073,7 @@ static struct htab_elem *alloc_htab_elem(struct bpf_htab *htab, void *key, l_new->hash = hash; return l_new; dec_count: - atomic_dec(&htab->count); + dec_elem_count(htab); return l_new; } @@ -1495,6 +1547,8 @@ static void htab_map_free(struct bpf_map *map) free_percpu(htab->extra_elems); bpf_map_area_free(htab->buckets); bpf_mem_alloc_destroy(&htab->ma); + if (htab->use_percpu_counter) + percpu_counter_destroy(&htab->pcount); for (i = 0; i < HASHTAB_MAP_LOCK_COUNT; i++) free_percpu(htab->map_locked[i]); lockdep_unregister_key(&htab->lockdep_key); From patchwork Fri Aug 19 21:42:24 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alexei Starovoitov X-Patchwork-Id: 12949281 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 2B05FC32771 for ; Fri, 19 Aug 2022 21:45:08 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id B92D26B0073; Fri, 19 Aug 2022 17:45:07 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id B41708D0005; Fri, 19 Aug 2022 17:45:07 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 9E3278D0003; Fri, 19 Aug 2022 17:45:07 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0015.hostedemail.com [216.40.44.15]) by kanga.kvack.org (Postfix) with ESMTP id 88E626B0073 for ; Fri, 19 Aug 2022 17:45:07 -0400 (EDT) Received: from smtpin22.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay04.hostedemail.com (Postfix) with ESMTP id 6DBCC1A1B45 for ; Fri, 19 Aug 2022 21:45:07 +0000 (UTC) X-FDA: 79817673054.22.2A2DBE2 Received: from mail-pg1-f172.google.com (mail-pg1-f172.google.com [209.85.215.172]) by imf05.hostedemail.com (Postfix) with ESMTP id F134C100019 for ; Fri, 19 Aug 2022 21:43:01 +0000 (UTC) Received: by mail-pg1-f172.google.com with SMTP id 24so4646704pgr.7 for ; Fri, 19 Aug 2022 14:43:01 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc; bh=R7CCPPKWPgzyGKsplgrBfSdH/nOmKmQ7KDErBAIDa18=; b=W109X9RhtblogGbY/aVlPs1Y4LZp8SnbcvVkgcoF8tC4PawP3w35jFiqmf99IA4VfR lNl6t9GtQBMfrrLFXxkGSfCj5BGXJg/Afiw91kdLkjh44GEkUw3CyszIGlkf2dAxLOV8 IsTsIhRf6AIjxIS6Or2OPut3RHRKf/6bv0PxJYrg5tNq8EQvIRtTqGZDMwEegLFAjJlu R7EXthDU0ZmhTptdSHzFdiW57EwqU0bMHyHzcgiV96oBiqNly4eCk+bDi753HCZ6s8cF XSzINLFY2+KEGJonLOROpd4rFrRv8rvHafNS0iH3L/JZrlU8AtjdKMZ35S/7t/AdBMuV dhvA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc; bh=R7CCPPKWPgzyGKsplgrBfSdH/nOmKmQ7KDErBAIDa18=; b=g38F/m/algpRX8oLJW0fgYqDC6d099Fdtp0EseyV2qvpiaY/qcLaAo2uWW1zIHdwYx SY6n4HLlw8TRxQf/1Wag2lMYesB7nJld/SX1ibMr5w1xs7CYmRPBA+KaOlpCUgjtpZKr 9injzMyaUqZdLCWyhT5Y15/rux4oWSSFhdLSeGjC3PU7+z6184TdKUTI3NrisCn3TjcR JHoD2+Q1mp4C6b2jk4CtD5B1z0Z9SxuW2LEWCKNQxGXLXq8afbv4UcaCmFQDdlMg4iF5 
xAhGLSZKQ9s5ZfgUxj7zC47oaYHtT++tiNF52XAl54wQNW/shfcp4w6FYPT6i0BBo1tR Potg== X-Gm-Message-State: ACgBeo2W0xZnDJLD4DshbKy9rvlq6FeZ5V2fHHWM/4hZCR0gFu7M8xVC 9PhXYn32i6D4deaLNybHWMQ= X-Google-Smtp-Source: AA6agR601MFqy59cpbHMqgqpRSfKpiJoe5v62VbSTaZ9Yb5MeIfvR/bq2hylOcfwGkv2xgu4gTHZ/Q== X-Received: by 2002:a62:1c81:0:b0:52f:ccb5:9de1 with SMTP id c123-20020a621c81000000b0052fccb59de1mr9648921pfc.45.1660945380983; Fri, 19 Aug 2022 14:43:00 -0700 (PDT) Received: from localhost.localdomain ([2620:10d:c090:500::1:c4b1]) by smtp.gmail.com with ESMTPSA id e13-20020a17090301cd00b0016ef6c2375fsm3620437plh.217.2022.08.19.14.42.59 (version=TLS1_3 cipher=TLS_CHACHA20_POLY1305_SHA256 bits=256/256); Fri, 19 Aug 2022 14:43:00 -0700 (PDT) From: Alexei Starovoitov To: davem@davemloft.net Cc: daniel@iogearbox.net, andrii@kernel.org, tj@kernel.org, memxor@gmail.com, delyank@fb.com, linux-mm@kvack.org, bpf@vger.kernel.org, kernel-team@fb.com Subject: [PATCH v3 bpf-next 07/15] bpf: Optimize call_rcu in non-preallocated hash map. Date: Fri, 19 Aug 2022 14:42:24 -0700 Message-Id: <20220819214232.18784-8-alexei.starovoitov@gmail.com> X-Mailer: git-send-email 2.36.1 In-Reply-To: <20220819214232.18784-1-alexei.starovoitov@gmail.com> References: <20220819214232.18784-1-alexei.starovoitov@gmail.com> MIME-Version: 1.0 ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1660945382; a=rsa-sha256; cv=none; b=pJMAtlaX/XnsZco/4LDiUnp425pV/MRM5oa3CObGf6wKO+4dWvrsnQFNRTsKSLnhbmxu2c 4825dEORpBFHWQzRxdlTnZe276/elpQHkVV44PckBsPh7itGtHZhKuhwZ7mn+j3QsZyZQn ZCdaeajpmZtrGbEP9LNgpDRqxaxkJRw= ARC-Authentication-Results: i=1; imf05.hostedemail.com; dkim=pass header.d=gmail.com header.s=20210112 header.b=W109X9Rh; dmarc=pass (policy=none) header.from=gmail.com; spf=pass (imf05.hostedemail.com: domain of alexei.starovoitov@gmail.com designates 209.85.215.172 as permitted sender) smtp.mailfrom=alexei.starovoitov@gmail.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1660945381; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=R7CCPPKWPgzyGKsplgrBfSdH/nOmKmQ7KDErBAIDa18=; b=1oX6RwzeD+X8VZMQTrmAK/SEcSVurD5RMzi87IUHaDm+NVisgjBeqz12xj1yw5wZyOMFy5 ujeltl4uyu72zNHhICB3pIp/vhvhCuvG7RaSX2fRCtof5iLKXShi6eeCJAMlhzTvvb3MwJ q8u2OYm9UPT7ffTn+oyU+qv3egATu9s= X-Stat-Signature: hwd1ppoen8acmagsfjq6rt7rmpj8yrj4 X-Rspamd-Queue-Id: F134C100019 X-Rspam-User: Authentication-Results: imf05.hostedemail.com; dkim=pass header.d=gmail.com header.s=20210112 header.b=W109X9Rh; dmarc=pass (policy=none) header.from=gmail.com; spf=pass (imf05.hostedemail.com: domain of alexei.starovoitov@gmail.com designates 209.85.215.172 as permitted sender) smtp.mailfrom=alexei.starovoitov@gmail.com X-Rspamd-Server: rspam11 X-HE-Tag: 1660945381-258557 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Alexei Starovoitov Doing call_rcu() million times a second becomes a bottle neck. Convert non-preallocated hash map from call_rcu to SLAB_TYPESAFE_BY_RCU. The rcu critical section is no longer observed for one htab element which makes non-preallocated hash map behave just like preallocated hash map. The map elements are released back to kernel memory after observing rcu critical section. 
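[Editor's note, not part of the patch: SLAB_TYPESAFE_BY_RCU guarantees that, inside an RCU read-side critical section, a freed object's memory remains a valid object of the same type even if it is immediately reused for another element; it does not guarantee the element is still the one that was looked up. A minimal userspace-style sketch of the resulting reader rule follows; the struct and function names are illustrative, but the hash table lookup path already re-checks hash and key in exactly this way.]

  #include <stdbool.h>
  #include <stddef.h>
  #include <string.h>

  struct elem {
          unsigned int hash;
          char key[16];
          /* value follows */
  };

  /* A candidate found via RCU-protected traversal may have been freed and
   * reused for a different key, but it is still a 'struct elem'.
   * Re-validating hash and key before trusting it keeps the lookup correct.
   */
  static bool elem_matches(const struct elem *candidate, unsigned int hash,
                           const void *key, size_t key_size)
  {
          return candidate && candidate->hash == hash &&
                 memcmp(candidate->key, key, key_size) == 0;
  }

This reuse-before-grace-period behavior is also why the timer.c selftest hunk below drops the assumption that a deleted element stays valid for the full RCU grace period.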
This improves 'map_perf_test 4' performance from 100k events per second to 250k events per second. bpf_mem_alloc + percpu_counter + typesafe_by_rcu provide 10x performance boost to non-preallocated hash map and make it within few % of preallocated map while consuming fraction of memory. Signed-off-by: Alexei Starovoitov --- kernel/bpf/hashtab.c | 8 ++++++-- kernel/bpf/memalloc.c | 2 +- tools/testing/selftests/bpf/progs/timer.c | 11 ----------- 3 files changed, 7 insertions(+), 14 deletions(-) diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c index 8f68c6e13339..299ab98f9811 100644 --- a/kernel/bpf/hashtab.c +++ b/kernel/bpf/hashtab.c @@ -940,8 +940,12 @@ static void free_htab_elem(struct bpf_htab *htab, struct htab_elem *l) __pcpu_freelist_push(&htab->freelist, &l->fnode); } else { dec_elem_count(htab); - l->htab = htab; - call_rcu(&l->rcu, htab_elem_free_rcu); + if (htab->map.map_type == BPF_MAP_TYPE_PERCPU_HASH) { + l->htab = htab; + call_rcu(&l->rcu, htab_elem_free_rcu); + } else { + htab_elem_free(htab, l); + } } } diff --git a/kernel/bpf/memalloc.c b/kernel/bpf/memalloc.c index 293380eaea41..cfa07f539eda 100644 --- a/kernel/bpf/memalloc.c +++ b/kernel/bpf/memalloc.c @@ -276,7 +276,7 @@ int bpf_mem_alloc_init(struct bpf_mem_alloc *ma, int size) return -ENOMEM; size += LLIST_NODE_SZ; /* room for llist_node */ snprintf(buf, sizeof(buf), "bpf-%u", size); - kmem_cache = kmem_cache_create(buf, size, 8, 0, NULL); + kmem_cache = kmem_cache_create(buf, size, 8, SLAB_TYPESAFE_BY_RCU, NULL); if (!kmem_cache) { free_percpu(pc); return -ENOMEM; diff --git a/tools/testing/selftests/bpf/progs/timer.c b/tools/testing/selftests/bpf/progs/timer.c index 5f5309791649..0053c5402173 100644 --- a/tools/testing/selftests/bpf/progs/timer.c +++ b/tools/testing/selftests/bpf/progs/timer.c @@ -208,17 +208,6 @@ static int timer_cb2(void *map, int *key, struct hmap_elem *val) */ bpf_map_delete_elem(map, key); - /* in non-preallocated hashmap both 'key' and 'val' are RCU - * protected and still valid though this element was deleted - * from the map. Arm this timer for ~35 seconds. When callback - * finishes the call_rcu will invoke: - * htab_elem_free_rcu - * check_and_free_timer - * bpf_timer_cancel_and_free - * to cancel this 35 second sleep and delete the timer for real. 
- */ - if (bpf_timer_start(&val->timer, 1ull << 35, 0) != 0) - err |= 256; ok |= 4; } return 0; From patchwork Fri Aug 19 21:42:25 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alexei Starovoitov X-Patchwork-Id: 12949273 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id C0E92C32793 for ; Fri, 19 Aug 2022 21:43:06 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 4B47E6B007E; Fri, 19 Aug 2022 17:43:06 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 43CF46B0080; Fri, 19 Aug 2022 17:43:06 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 2B6B38D0001; Fri, 19 Aug 2022 17:43:06 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0016.hostedemail.com [216.40.44.16]) by kanga.kvack.org (Postfix) with ESMTP id 1A2E06B007E for ; Fri, 19 Aug 2022 17:43:06 -0400 (EDT) Received: from smtpin16.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay10.hostedemail.com (Postfix) with ESMTP id C6FFEC1B6E for ; Fri, 19 Aug 2022 21:43:05 +0000 (UTC) X-FDA: 79817667930.16.FF4FC14 Received: from mail-pj1-f47.google.com (mail-pj1-f47.google.com [209.85.216.47]) by imf29.hostedemail.com (Postfix) with ESMTP id 85BCF120126 for ; Fri, 19 Aug 2022 21:43:05 +0000 (UTC) Received: by mail-pj1-f47.google.com with SMTP id t2-20020a17090a4e4200b001f21572f3a4so6053869pjl.0 for ; Fri, 19 Aug 2022 14:43:05 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc; bh=9NJuZZGi61ivxe7o1wakU220SGtbaFMb+lW1/WXuBSo=; b=GXP7gjWcLzkdZc//4cZi2WcfReF8h2ed3jYt6HNXUljAIxDeVaMXDydRzNLfxKuXCk 9YsdNRLZrWtA+/Cu5ZoK7ww2AV3xfK3ONHZESILERBYJxRSrlskhbp4JdgFh/XlXfXYl gUaEiq4hkIOwSPhQdQgqjLxQ7M2rDwxY2zO1XjhFRrmraSgDaG9aw3tRVw3R5VlpiAkm EQ1q3/UYsfoqupXJ5v3ZogbfXjWpnvKfVVdb9jt6pluoDwa39Jh3cZ8jbDoY3nIPX/85 1ZpfJvL4xHrHUb/rr6e/BjDduz1bvoBZ6hX14siv03bSTsvO+mxOWwoBTvSPm0PXlbfv x+xg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc; bh=9NJuZZGi61ivxe7o1wakU220SGtbaFMb+lW1/WXuBSo=; b=5683nkJgLUN0pl8Eodul4k5SppVLKz9der6XlfhWSfINu0r1Fi+EfD9d4elU9gXY9R xBG9EWz+15C0eKlKuiOjQGHGeXP9TNq06aCZ66D6vWM7N6Ias2esxR4p8Y47HeqWGIwh cgWZ8Gjpcfcm2fr/rryhnZZst26ONjcbLY843V/kmswEdJrbB5pfcoU5bmIW8WrLfPix 6kLISHLZmSJ29AZev5ehJPK5Ge7TjmQv0GC7z96jd6YbWVZ/T46Br/ukGVPbzj1BSvec fxkNjw93bZp1cIIdIDsUSx/n8mDwkybURdfu/U5xQ+yq7JmJkpXC9dyJlNGSsVRhXJEz Q2Jg== X-Gm-Message-State: ACgBeo1iiEHekfok+oowZZtMP3rwCAuyi5GBkpsyh6FdHCtHYwLDioU5 QNBbD3xcaYzjuM+Kyhta15U= X-Google-Smtp-Source: AA6agR6Y/FL1isCq/nfuK7x5R2qTj7SNzOmwS6LqoSR2m4C9z+rfVY4THDenlt8EtdaWG9doWa2+fg== X-Received: by 2002:a17:902:e5cc:b0:16f:1153:c519 with SMTP id u12-20020a170902e5cc00b0016f1153c519mr9393770plf.151.1660945384628; Fri, 19 Aug 2022 14:43:04 -0700 (PDT) Received: from localhost.localdomain ([2620:10d:c090:500::1:c4b1]) by smtp.gmail.com with ESMTPSA id x24-20020a17090a789800b001f312e7665asm3582549pjk.47.2022.08.19.14.43.03 (version=TLS1_3 cipher=TLS_CHACHA20_POLY1305_SHA256 bits=256/256); Fri, 19 Aug 2022 14:43:04 
-0700 (PDT) From: Alexei Starovoitov To: davem@davemloft.net Cc: daniel@iogearbox.net, andrii@kernel.org, tj@kernel.org, memxor@gmail.com, delyank@fb.com, linux-mm@kvack.org, bpf@vger.kernel.org, kernel-team@fb.com Subject: [PATCH v3 bpf-next 08/15] bpf: Adjust low/high watermarks in bpf_mem_cache Date: Fri, 19 Aug 2022 14:42:25 -0700 Message-Id: <20220819214232.18784-9-alexei.starovoitov@gmail.com> X-Mailer: git-send-email 2.36.1 In-Reply-To: <20220819214232.18784-1-alexei.starovoitov@gmail.com> References: <20220819214232.18784-1-alexei.starovoitov@gmail.com> MIME-Version: 1.0 ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1660945385; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=9NJuZZGi61ivxe7o1wakU220SGtbaFMb+lW1/WXuBSo=; b=u+iLEXmqhZbM6A0Hg1b3Eq+03xA6SeoFT1r19kVXnTfTYSlA4FR3bU6no8Py3Ua1/TYtYS xReU1fw9hTAICSZNRX3VQDHbAjGeflUi271fSnx574ZJkY6KEywGmVgyseHonwWZ1y5qDs oFp/LkejcwSGbBYGGi0pMeetXoA5jPo= ARC-Authentication-Results: i=1; imf29.hostedemail.com; dkim=pass header.d=gmail.com header.s=20210112 header.b=GXP7gjWc; dmarc=pass (policy=none) header.from=gmail.com; spf=pass (imf29.hostedemail.com: domain of alexei.starovoitov@gmail.com designates 209.85.216.47 as permitted sender) smtp.mailfrom=alexei.starovoitov@gmail.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1660945385; a=rsa-sha256; cv=none; b=QM3FXYJz4O76ABFO31ZHxlhQS0sJAAmfb4SHIee2OPsuWJ9iEEVBXIIJUYnxZGfIBdomCU 6DiFejdzqMkoSNHeSdKcQa9sEDNrpmdH3hHDEZl9l/o+4+pbKiurJjj/2AYc2A8/xVKdLF pNH2EvwrWL6kQqy+jXqZ0OCZZdjpOEs= Authentication-Results: imf29.hostedemail.com; dkim=pass header.d=gmail.com header.s=20210112 header.b=GXP7gjWc; dmarc=pass (policy=none) header.from=gmail.com; spf=pass (imf29.hostedemail.com: domain of alexei.starovoitov@gmail.com designates 209.85.216.47 as permitted sender) smtp.mailfrom=alexei.starovoitov@gmail.com X-Rspamd-Server: rspam09 X-Rspamd-Queue-Id: 85BCF120126 X-Stat-Signature: 9shc4j43ts9zjj9ypr46xmg4t6y3wtiu X-Rspam-User: X-HE-Tag: 1660945385-277550 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Alexei Starovoitov The same low/high watermarks for every bucket in bpf_mem_cache consume significant amount of memory. Preallocating 64 elements of 4096 bytes each in the free list is not efficient. Make low/high watermarks and batching value dependent on element size. This change brings significant memory savings. Signed-off-by: Alexei Starovoitov --- kernel/bpf/memalloc.c | 50 +++++++++++++++++++++++++++++++------------ 1 file changed, 36 insertions(+), 14 deletions(-) diff --git a/kernel/bpf/memalloc.c b/kernel/bpf/memalloc.c index cfa07f539eda..22b729914afe 100644 --- a/kernel/bpf/memalloc.c +++ b/kernel/bpf/memalloc.c @@ -99,6 +99,7 @@ struct bpf_mem_cache { int unit_size; /* count of objects in free_llist */ int free_cnt; + int low_watermark, high_watermark, batch; }; struct bpf_mem_caches { @@ -117,14 +118,6 @@ static struct llist_node notrace *__llist_del_first(struct llist_head *head) return entry; } -#define BATCH 48 -#define LOW_WATERMARK 32 -#define HIGH_WATERMARK 96 -/* Assuming the average number of elements per bucket is 64, when all buckets - * are used the total memory will be: 64*16*32 + 64*32*32 + 64*64*32 + ... 
+ - * 64*4096*32 ~ 20Mbyte - */ - static void *__alloc(struct bpf_mem_cache *c, int node) { /* Allocate, but don't deplete atomic reserves that typical @@ -215,7 +208,7 @@ static void free_bulk(struct bpf_mem_cache *c) if (IS_ENABLED(CONFIG_PREEMPT_RT)) local_irq_restore(flags); free_one(c, llnode); - } while (cnt > (HIGH_WATERMARK + LOW_WATERMARK) / 2); + } while (cnt > (c->high_watermark + c->low_watermark) / 2); /* and drain free_llist_extra */ llist_for_each_safe(llnode, t, llist_del_all(&c->free_llist_extra)) @@ -229,12 +222,12 @@ static void bpf_mem_refill(struct irq_work *work) /* Racy access to free_cnt. It doesn't need to be 100% accurate */ cnt = c->free_cnt; - if (cnt < LOW_WATERMARK) + if (cnt < c->low_watermark) /* irq_work runs on this cpu and kmalloc will allocate * from the current numa node which is what we want here. */ - alloc_bulk(c, BATCH, NUMA_NO_NODE); - else if (cnt > HIGH_WATERMARK) + alloc_bulk(c, c->batch, NUMA_NO_NODE); + else if (cnt > c->high_watermark) free_bulk(c); } @@ -243,9 +236,38 @@ static void notrace irq_work_raise(struct bpf_mem_cache *c) irq_work_queue(&c->refill_work); } +/* For typical bpf map case that uses bpf_mem_cache_alloc and single bucket + * the freelist cache will be elem_size * 64 (or less) on each cpu. + * + * For bpf programs that don't have statically known allocation sizes and + * assuming (low_mark + high_mark) / 2 as an average number of elements per + * bucket and all buckets are used the total amount of memory in freelists + * on each cpu will be: + * 64*16 + 64*32 + 64*64 + 64*96 + 64*128 + 64*196 + 64*256 + 32*512 + 16*1024 + 8*2048 + 4*4096 + * == ~ 116 Kbyte using below heuristic. + * Initialized, but unused bpf allocator (not bpf map specific one) will + * consume ~ 11 Kbyte per cpu. + * Typical case will be between 11K and 116K closer to 11K. + * bpf progs can and should share bpf_mem_cache when possible. + */ + static void prefill_mem_cache(struct bpf_mem_cache *c, int cpu) { init_irq_work(&c->refill_work, bpf_mem_refill); + if (c->unit_size <= 256) { + c->low_watermark = 32; + c->high_watermark = 96; + } else { + /* When page_size == 4k, order-0 cache will have low_mark == 2 + * and high_mark == 6 with batch alloc of 3 individual pages at + * a time. + * 8k allocs and above low == 1, high == 3, batch == 1. 
+ */ + c->low_watermark = max(32 * 256 / c->unit_size, 1); + c->high_watermark = max(96 * 256 / c->unit_size, 3); + } + c->batch = max((c->high_watermark - c->low_watermark) / 4 * 3, 1); + /* To avoid consuming memory assume that 1st run of bpf * prog won't be doing more than 4 map_update_elem from * irq disabled region @@ -387,7 +409,7 @@ static void notrace *unit_alloc(struct bpf_mem_cache *c) WARN_ON(cnt < 0); - if (cnt < LOW_WATERMARK) + if (cnt < c->low_watermark) irq_work_raise(c); return llnode; } @@ -420,7 +442,7 @@ static void notrace unit_free(struct bpf_mem_cache *c, void *ptr) local_dec(&c->active); local_irq_restore(flags); - if (cnt > HIGH_WATERMARK) + if (cnt > c->high_watermark) /* free few objects from current cpu into global kmalloc pool */ irq_work_raise(c); } From patchwork Fri Aug 19 21:42:26 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alexei Starovoitov X-Patchwork-Id: 12949274 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 892B0C32771 for ; Fri, 19 Aug 2022 21:43:10 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 291F76B0075; Fri, 19 Aug 2022 17:43:10 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 21ABE6B0080; Fri, 19 Aug 2022 17:43:10 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 06DD88D0001; Fri, 19 Aug 2022 17:43:09 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0012.hostedemail.com [216.40.44.12]) by kanga.kvack.org (Postfix) with ESMTP id E6D646B0075 for ; Fri, 19 Aug 2022 17:43:09 -0400 (EDT) Received: from smtpin23.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay06.hostedemail.com (Postfix) with ESMTP id BADBCAC8B0 for ; Fri, 19 Aug 2022 21:43:09 +0000 (UTC) X-FDA: 79817668098.23.C8C004A Received: from mail-pf1-f180.google.com (mail-pf1-f180.google.com [209.85.210.180]) by imf31.hostedemail.com (Postfix) with ESMTP id 38A882013B for ; Fri, 19 Aug 2022 21:43:09 +0000 (UTC) Received: by mail-pf1-f180.google.com with SMTP id x19so2610680pfq.1 for ; Fri, 19 Aug 2022 14:43:09 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc; bh=/LUlIl8/5rbkJ1GhT0HIpYGvpg/iLs+2Zu7ChkNbcZY=; b=noM6NjPmtdgPGNZpJxNJ4NTplHUPKy9WIfXxStlcvWXDRZs+iivdCrHx3TjgyXRdAC AMFHv549t4qk2SgUbX6S8xfkf2DU6O/2qPkS5EKdb13AAkiEBToMiuZBM6wsa8hQ3IR9 oshhzgInuWKervtLgCDdfPqYNpa6gP1WoSS+xn3aUiK2A4BZFkyCjEqWT6le7apA1wge NgcQrh/rsEc3Bf1rLK8vNa7Dt3OwZVbCgxKvTq2UIII4W+bk4BWiLS/V34n1O1q0eSwN Y5NswloYL61b4qo8TrEpReA2+AoJMK1+lGA3/N4DN284N+ZBPt2dGdajMqdffLVwzOFB qlYA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc; bh=/LUlIl8/5rbkJ1GhT0HIpYGvpg/iLs+2Zu7ChkNbcZY=; b=pFROirgl8AuVxd2VDjkq5XtVwH/DiSjLHzMBJGGysDaGIhAWBehCQtl/thcNPiONnn QOrcqDb76aqmmGRZ4+ykL4omG4uWKm0lhx/dMhe0M5QNl6GxEP7Hw1tzxYEPtMPrNgRQ +i5K0oUVlwHv7w42ALgKxsjnu0Gxw+YZPTITYCR3OWUehQry4EeCFUdWSwNf3bAopGih cne2PKwLktHw2cDRGVE+jl9pQgjb/zpaXRH+66h3ojAx1/mb842gfIrB2Gpp4Ovnj47r 
ogcO+WjfBEorGS5Cjiju3OCDXYhlOZ8WjDOtn6g1HVI8i6GVgw6gDQ40Vweb2qsyuCJB CQtg== X-Gm-Message-State: ACgBeo3KcbZZywA1s77IzejDRGbQCq/ybPTmrADyx6h+lHj7TzVgtDqQ tK4cHVtGX+GI2e69PLbCMq0= X-Google-Smtp-Source: AA6agR5LcCM9K7nbcsM4+E2/yf+LAM29F8RtUacOiamCyVK/Dsdv8chGlwYQE4X191qckGVe0hZnHw== X-Received: by 2002:a65:6385:0:b0:429:f03c:d5e with SMTP id h5-20020a656385000000b00429f03c0d5emr7817818pgv.322.1660945388255; Fri, 19 Aug 2022 14:43:08 -0700 (PDT) Received: from localhost.localdomain ([2620:10d:c090:500::1:c4b1]) by smtp.gmail.com with ESMTPSA id r18-20020aa79ed2000000b0052d4cb47339sm3884576pfq.151.2022.08.19.14.43.06 (version=TLS1_3 cipher=TLS_CHACHA20_POLY1305_SHA256 bits=256/256); Fri, 19 Aug 2022 14:43:07 -0700 (PDT) From: Alexei Starovoitov To: davem@davemloft.net Cc: daniel@iogearbox.net, andrii@kernel.org, tj@kernel.org, memxor@gmail.com, delyank@fb.com, linux-mm@kvack.org, bpf@vger.kernel.org, kernel-team@fb.com Subject: [PATCH v3 bpf-next 09/15] bpf: Batch call_rcu callbacks instead of SLAB_TYPESAFE_BY_RCU. Date: Fri, 19 Aug 2022 14:42:26 -0700 Message-Id: <20220819214232.18784-10-alexei.starovoitov@gmail.com> X-Mailer: git-send-email 2.36.1 In-Reply-To: <20220819214232.18784-1-alexei.starovoitov@gmail.com> References: <20220819214232.18784-1-alexei.starovoitov@gmail.com> MIME-Version: 1.0 ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1660945389; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=/LUlIl8/5rbkJ1GhT0HIpYGvpg/iLs+2Zu7ChkNbcZY=; b=76wa5bkgH2uo1/BoSKGGZUhOZwGrjwjTce575UWb19EqfkvLo9xs9BRpNpCMr25vBLbrO/ C8EwnlJ+RlQMgPTNpZ+ZCEMjTXxK5CTi9g2rSWfb1GTxwfM16EriO7vqvK+uflI3a5M44B c1jygmVuQtaFqVfW52ps2g15TUqCaPc= ARC-Authentication-Results: i=1; imf31.hostedemail.com; dkim=pass header.d=gmail.com header.s=20210112 header.b=noM6NjPm; dmarc=pass (policy=none) header.from=gmail.com; spf=pass (imf31.hostedemail.com: domain of alexei.starovoitov@gmail.com designates 209.85.210.180 as permitted sender) smtp.mailfrom=alexei.starovoitov@gmail.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1660945389; a=rsa-sha256; cv=none; b=Mh+w+Jw6KM0Q4jCbPoThHPngza6t7QZwOySyOQThw035oQkLUnOzUGGWEnbkVbubhi/Nga s9x6skUMZbk4vnNTOwAT6ueAw2y4EPBB4K0fDCX9LArM5fdeSLF6/ktfGzzm7kOFy7PnnJ yxHFpOkC3MlFpNHbhdNg18IfJSjF+WU= Authentication-Results: imf31.hostedemail.com; dkim=pass header.d=gmail.com header.s=20210112 header.b=noM6NjPm; dmarc=pass (policy=none) header.from=gmail.com; spf=pass (imf31.hostedemail.com: domain of alexei.starovoitov@gmail.com designates 209.85.210.180 as permitted sender) smtp.mailfrom=alexei.starovoitov@gmail.com X-Rspamd-Server: rspam08 X-Rspamd-Queue-Id: 38A882013B X-Stat-Signature: hoycjpx5hiut9d5g37kknakh9dikdh6p X-Rspam-User: X-HE-Tag: 1660945389-205210 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Alexei Starovoitov SLAB_TYPESAFE_BY_RCU makes kmem_caches non mergeable and slows down kmem_cache_destroy. All bpf_mem_cache are safe to share across different maps and programs. Convert SLAB_TYPESAFE_BY_RCU to batched call_rcu. This change solves the memory consumption issue, avoids kmem_cache_destroy latency and keeps bpf hash map performance the same. 
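[Editor's note, not part of the patch: the idea is to issue one call_rcu() per batch of freed objects instead of one per object. A condensed sketch of that pattern is below; the 'free_batch' container is illustrative only — the patch embeds the equivalent fields in struct bpf_mem_cache and runs the enqueue side from irq_work, so the list manipulation needs no extra locking. It also assumes the llist_node sits at the start of each object, as the allocator in this series arranges.]

  #include <linux/atomic.h>
  #include <linux/kernel.h>
  #include <linux/llist.h>
  #include <linux/rcupdate.h>
  #include <linux/slab.h>

  /* Illustrative container; in the patch these fields live in bpf_mem_cache. */
  struct free_batch {
          struct llist_head free_by_rcu;    /* freed, grace period not started */
          struct llist_head waiting_for_gp; /* freed, waiting for grace period */
          struct rcu_head rcu;
          atomic_t call_rcu_in_progress;
  };

  static void batch_free_rcu(struct rcu_head *head)
  {
          struct free_batch *b = container_of(head, struct free_batch, rcu);
          struct llist_node *pos, *t;

          /* One grace period has elapsed for the whole batch. */
          llist_for_each_safe(pos, t, llist_del_all(&b->waiting_for_gp))
                  kfree(pos); /* llist_node is at the start of the object */
          atomic_set(&b->call_rcu_in_progress, 0);
  }

  /* Called from a serialized context (irq_work in the patch). */
  static void batch_queue_free(struct free_batch *b, void *obj)
  {
          struct llist_node *pos, *t;

          llist_add(obj, &b->free_by_rcu);

          if (atomic_xchg(&b->call_rcu_in_progress, 1))
                  return; /* previous batch still waits for its grace period */

          llist_for_each_safe(pos, t, llist_del_all(&b->free_by_rcu))
                  llist_add(pos, &b->waiting_for_gp);
          call_rcu(&b->rcu, batch_free_rcu);
  }

Compared to SLAB_TYPESAFE_BY_RCU this keeps the kmem_caches mergeable and avoids the kmem_cache_destroy latency, while still amortizing the call_rcu() cost across many elements.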
Signed-off-by: Alexei Starovoitov --- kernel/bpf/memalloc.c | 64 +++++++++++++++++++++++++++++++++++++++++-- kernel/bpf/syscall.c | 5 +++- 2 files changed, 65 insertions(+), 4 deletions(-) diff --git a/kernel/bpf/memalloc.c b/kernel/bpf/memalloc.c index 22b729914afe..d765a5cb24b4 100644 --- a/kernel/bpf/memalloc.c +++ b/kernel/bpf/memalloc.c @@ -100,6 +100,11 @@ struct bpf_mem_cache { /* count of objects in free_llist */ int free_cnt; int low_watermark, high_watermark, batch; + + struct rcu_head rcu; + struct llist_head free_by_rcu; + struct llist_head waiting_for_gp; + atomic_t call_rcu_in_progress; }; struct bpf_mem_caches { @@ -188,6 +193,45 @@ static void free_one(struct bpf_mem_cache *c, void *obj) kfree(obj); } +static void __free_rcu(struct rcu_head *head) +{ + struct bpf_mem_cache *c = container_of(head, struct bpf_mem_cache, rcu); + struct llist_node *llnode = llist_del_all(&c->waiting_for_gp); + struct llist_node *pos, *t; + + llist_for_each_safe(pos, t, llnode) + free_one(c, pos); + atomic_set(&c->call_rcu_in_progress, 0); +} + +static void enque_to_free(struct bpf_mem_cache *c, void *obj) +{ + struct llist_node *llnode = obj; + + /* bpf_mem_cache is a per-cpu object. Freeing happens in irq_work. + * Nothing races to add to free_by_rcu list. + */ + __llist_add(llnode, &c->free_by_rcu); +} + +static void do_call_rcu(struct bpf_mem_cache *c) +{ + struct llist_node *llnode, *t; + + if (atomic_xchg(&c->call_rcu_in_progress, 1)) + return; + + WARN_ON_ONCE(!llist_empty(&c->waiting_for_gp)); + llist_for_each_safe(llnode, t, __llist_del_all(&c->free_by_rcu)) + /* There is no concurrent __llist_add(waiting_for_gp) access. + * It doesn't race with llist_del_all either. + * But there could be two concurrent llist_del_all(waiting_for_gp): + * from __free_rcu() and from drain_mem_cache(). + */ + __llist_add(llnode, &c->waiting_for_gp); + call_rcu(&c->rcu, __free_rcu); +} + static void free_bulk(struct bpf_mem_cache *c) { struct llist_node *llnode, *t; @@ -207,12 +251,13 @@ static void free_bulk(struct bpf_mem_cache *c) local_dec(&c->active); if (IS_ENABLED(CONFIG_PREEMPT_RT)) local_irq_restore(flags); - free_one(c, llnode); + enque_to_free(c, llnode); } while (cnt > (c->high_watermark + c->low_watermark) / 2); /* and drain free_llist_extra */ llist_for_each_safe(llnode, t, llist_del_all(&c->free_llist_extra)) - free_one(c, llnode); + enque_to_free(c, llnode); + do_call_rcu(c); } static void bpf_mem_refill(struct irq_work *work) @@ -298,7 +343,7 @@ int bpf_mem_alloc_init(struct bpf_mem_alloc *ma, int size) return -ENOMEM; size += LLIST_NODE_SZ; /* room for llist_node */ snprintf(buf, sizeof(buf), "bpf-%u", size); - kmem_cache = kmem_cache_create(buf, size, 8, SLAB_TYPESAFE_BY_RCU, NULL); + kmem_cache = kmem_cache_create(buf, size, 8, 0, NULL); if (!kmem_cache) { free_percpu(pc); return -ENOMEM; @@ -340,6 +385,15 @@ static void drain_mem_cache(struct bpf_mem_cache *c) { struct llist_node *llnode, *t; + /* The caller has done rcu_barrier() and no progs are using this + * bpf_mem_cache, but htab_map_free() called bpf_mem_cache_free() for + * all remaining elements and they can be in free_by_rcu or in + * waiting_for_gp lists, so drain those lists now. 
+ */ + llist_for_each_safe(llnode, t, __llist_del_all(&c->free_by_rcu)) + free_one(c, llnode); + llist_for_each_safe(llnode, t, llist_del_all(&c->waiting_for_gp)) + free_one(c, llnode); llist_for_each_safe(llnode, t, llist_del_all(&c->free_llist)) free_one(c, llnode); llist_for_each_safe(llnode, t, llist_del_all(&c->free_llist_extra)) @@ -361,6 +415,10 @@ void bpf_mem_alloc_destroy(struct bpf_mem_alloc *ma) kmem_cache_destroy(c->kmem_cache); if (c->objcg) obj_cgroup_put(c->objcg); + /* c->waiting_for_gp list was drained, but __free_rcu might + * still execute. Wait for it now before we free 'c'. + */ + rcu_barrier(); free_percpu(ma->cache); ma->cache = NULL; } diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c index a4d40d98428a..850270a72350 100644 --- a/kernel/bpf/syscall.c +++ b/kernel/bpf/syscall.c @@ -638,7 +638,10 @@ static void __bpf_map_put(struct bpf_map *map, bool do_idr_lock) bpf_map_free_id(map, do_idr_lock); btf_put(map->btf); INIT_WORK(&map->work, bpf_map_free_deferred); - schedule_work(&map->work); + /* Avoid spawning kworkers, since they all might contend + * for the same mutex like slab_mutex. + */ + queue_work(system_unbound_wq, &map->work); } } From patchwork Fri Aug 19 21:42:27 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alexei Starovoitov X-Patchwork-Id: 12949275 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id E6B44C32771 for ; Fri, 19 Aug 2022 21:43:13 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 74DA66B0080; Fri, 19 Aug 2022 17:43:13 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 6D4618D0001; Fri, 19 Aug 2022 17:43:13 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 527486B0082; Fri, 19 Aug 2022 17:43:13 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0010.hostedemail.com [216.40.44.10]) by kanga.kvack.org (Postfix) with ESMTP id 41DAC6B0080 for ; Fri, 19 Aug 2022 17:43:13 -0400 (EDT) Received: from smtpin07.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay05.hostedemail.com (Postfix) with ESMTP id 1F60741A58 for ; Fri, 19 Aug 2022 21:43:13 +0000 (UTC) X-FDA: 79817668266.07.17B402F Received: from mail-pj1-f44.google.com (mail-pj1-f44.google.com [209.85.216.44]) by imf06.hostedemail.com (Postfix) with ESMTP id C4CD518000E for ; Fri, 19 Aug 2022 21:43:12 +0000 (UTC) Received: by mail-pj1-f44.google.com with SMTP id m10-20020a17090a730a00b001fa986fd8eeso8723328pjk.0 for ; Fri, 19 Aug 2022 14:43:12 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc; bh=MuR2M968m3Rb7scXd/K38G3WpnkvQO3q5t0rZJwN3iE=; b=YOv2Wy26SSHNh71HEPW9jbkpq5O9aiHT5H6SJcbnNqdMSCbM/kAYUON8ztUGyBHvk+ tMdYJsIqz86miajmEDfXzm/GLHUyMxaqfsqMaHXmWAIbDP31WYJXwgTP81aEZg+y/kVN eSO6tflHFDv/a1GPm3/Up021xmuWCIC2JKV8UhxD8WHU91TWpzTnLTHOHTONyq1bBJsa aSPiqzj5GpabASWecdOhHL+70yUinqGR+ih6OkgDVRwhaaUbIkvmjLya1UG0EXWSm8cN Hm6hdBkdQLxXNu2T8L5d7m8InZ+26AC9anjhtutCBcvfn6eG+5GrCjMpchFkSS1BHmJx zXpw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=content-transfer-encoding:mime-version:references:in-reply-to 
:message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc; bh=MuR2M968m3Rb7scXd/K38G3WpnkvQO3q5t0rZJwN3iE=; b=DpUbejcJ4bubl5X6dGJSo7MI95TOt559qOVUv+Hu4Fa9fCOkWgZaDd5HzjmuyQJ6YY 9PmA6tkUzl5O+D0RSxR1lCHIRgdlXqbcnT/QEgIHvXbeIzGujJl10/W+bcOvhMSc5xKZ 0YsmukYrpnqKW4QBvJTmm1rQ33IUegDhlw2Q5fY0vNP9YAQ88nEQIwhGUqMNx+87dJmv xnDjmuhIGmIfZB3sMimGWjZwLDibVxilYYmvB7D9tuI3gGeSEWka1TQ5SZrOhdObSN/W GHsNhNpXLDaWob3kS+TvfVrCJixsALNpn0lvVzTCN8Rq0B7/gOhE+lV1xBXMljCjBq5g YqMw== X-Gm-Message-State: ACgBeo2xw80tIznKvklo3RXx16Vm5KNxtrGFP5kl1Noftk0/xPOpm/Il y+568G6wiw3yV5fbZp8+Eo4= X-Google-Smtp-Source: AA6agR5QWH72pEQusZfgDyL322vkzz47reSqzMWcrIDa5nVAtwXw49X5SlB106HKBrWK7vLurxJmrg== X-Received: by 2002:a17:902:e94c:b0:171:3d5d:2d00 with SMTP id b12-20020a170902e94c00b001713d5d2d00mr8969363pll.2.1660945391835; Fri, 19 Aug 2022 14:43:11 -0700 (PDT) Received: from localhost.localdomain ([2620:10d:c090:500::1:c4b1]) by smtp.gmail.com with ESMTPSA id y15-20020a17090264cf00b00172a8e628e7sm3620953pli.190.2022.08.19.14.43.10 (version=TLS1_3 cipher=TLS_CHACHA20_POLY1305_SHA256 bits=256/256); Fri, 19 Aug 2022 14:43:11 -0700 (PDT) From: Alexei Starovoitov To: davem@davemloft.net Cc: daniel@iogearbox.net, andrii@kernel.org, tj@kernel.org, memxor@gmail.com, delyank@fb.com, linux-mm@kvack.org, bpf@vger.kernel.org, kernel-team@fb.com Subject: [PATCH v3 bpf-next 10/15] bpf: Add percpu allocation support to bpf_mem_alloc. Date: Fri, 19 Aug 2022 14:42:27 -0700 Message-Id: <20220819214232.18784-11-alexei.starovoitov@gmail.com> X-Mailer: git-send-email 2.36.1 In-Reply-To: <20220819214232.18784-1-alexei.starovoitov@gmail.com> References: <20220819214232.18784-1-alexei.starovoitov@gmail.com> MIME-Version: 1.0 ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1660945392; a=rsa-sha256; cv=none; b=x8q2y86CBTVIm45Aw2pKh9Zx0+SkuNfQVZICPSICc0dMK+ayvRBKmucqXwpxvfxnf97zoa /UZ3MN0SvTW9eywkKl12zYUP0U+vWqBEyon5xsvRS07Fm3YXHoHNYEjW6bXQJKY0CNuRFd KXiZ0cApRVRCX2i9oPwjUUCdPynckE4= ARC-Authentication-Results: i=1; imf06.hostedemail.com; dkim=pass header.d=gmail.com header.s=20210112 header.b=YOv2Wy26; dmarc=pass (policy=none) header.from=gmail.com; spf=pass (imf06.hostedemail.com: domain of alexei.starovoitov@gmail.com designates 209.85.216.44 as permitted sender) smtp.mailfrom=alexei.starovoitov@gmail.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1660945392; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=MuR2M968m3Rb7scXd/K38G3WpnkvQO3q5t0rZJwN3iE=; b=PD6JkiLgAOo+DHEjDMBGs9nr6m0HqkTKbaXs5kSlClfjuyfqjg6qQIW9247Jtn7oykLyYt 4m6XPFmoGkBNff5NsgH5lh5OSd6JHGb0+Wn02fSk5ygANXNS+QFVswzvts0DsFFymtqbXY 05q6r/di2QweMIBwcDZ4cbR1XTYBPDE= X-Rspamd-Server: rspam06 X-Rspam-User: X-Stat-Signature: yxttnuwed4aee3nhe6qep5a1rh9fa9my Authentication-Results: imf06.hostedemail.com; dkim=pass header.d=gmail.com header.s=20210112 header.b=YOv2Wy26; dmarc=pass (policy=none) header.from=gmail.com; spf=pass (imf06.hostedemail.com: domain of alexei.starovoitov@gmail.com designates 209.85.216.44 as permitted sender) smtp.mailfrom=alexei.starovoitov@gmail.com X-Rspamd-Queue-Id: C4CD518000E X-HE-Tag: 1660945392-12904 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Alexei Starovoitov Extend bpf_mem_alloc to 
cache free list of fixed size per-cpu allocations. Once such cache is created bpf_mem_cache_alloc() will return per-cpu objects. bpf_mem_cache_free() will free them back into global per-cpu pool after observing RCU grace period. per-cpu flavor of bpf_mem_alloc is going to be used by per-cpu hash maps. The free list cache consists of tuples { llist_node, per-cpu pointer } Unlike alloc_percpu() that returns per-cpu pointer the bpf_mem_cache_alloc() returns a pointer to per-cpu pointer and bpf_mem_cache_free() expects to receive it back. Signed-off-by: Alexei Starovoitov --- include/linux/bpf_mem_alloc.h | 2 +- kernel/bpf/hashtab.c | 2 +- kernel/bpf/memalloc.c | 44 +++++++++++++++++++++++++++++++---- 3 files changed, 41 insertions(+), 7 deletions(-) diff --git a/include/linux/bpf_mem_alloc.h b/include/linux/bpf_mem_alloc.h index 804733070f8d..653ed1584a03 100644 --- a/include/linux/bpf_mem_alloc.h +++ b/include/linux/bpf_mem_alloc.h @@ -12,7 +12,7 @@ struct bpf_mem_alloc { struct bpf_mem_cache __percpu *cache; }; -int bpf_mem_alloc_init(struct bpf_mem_alloc *ma, int size); +int bpf_mem_alloc_init(struct bpf_mem_alloc *ma, int size, bool percpu); void bpf_mem_alloc_destroy(struct bpf_mem_alloc *ma); /* kmalloc/kfree equivalent: */ diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c index 299ab98f9811..8daa1132d43c 100644 --- a/kernel/bpf/hashtab.c +++ b/kernel/bpf/hashtab.c @@ -594,7 +594,7 @@ static struct bpf_map *htab_map_alloc(union bpf_attr *attr) goto free_prealloc; } } else { - err = bpf_mem_alloc_init(&htab->ma, htab->elem_size); + err = bpf_mem_alloc_init(&htab->ma, htab->elem_size, false); if (err) goto free_map_locked; } diff --git a/kernel/bpf/memalloc.c b/kernel/bpf/memalloc.c index d765a5cb24b4..9e5ad7dc4dc7 100644 --- a/kernel/bpf/memalloc.c +++ b/kernel/bpf/memalloc.c @@ -100,6 +100,7 @@ struct bpf_mem_cache { /* count of objects in free_llist */ int free_cnt; int low_watermark, high_watermark, batch; + bool percpu; struct rcu_head rcu; struct llist_head free_by_rcu; @@ -132,6 +133,19 @@ static void *__alloc(struct bpf_mem_cache *c, int node) */ gfp_t flags = GFP_NOWAIT | __GFP_NOWARN | __GFP_ACCOUNT; + if (c->percpu) { + void **obj = kmem_cache_alloc_node(c->kmem_cache, flags, node); + void *pptr = __alloc_percpu_gfp(c->unit_size, 8, flags); + + if (!obj || !pptr) { + free_percpu(pptr); + kfree(obj); + return NULL; + } + obj[1] = pptr; + return obj; + } + if (c->kmem_cache) return kmem_cache_alloc_node(c->kmem_cache, flags, node); @@ -187,6 +201,12 @@ static void alloc_bulk(struct bpf_mem_cache *c, int cnt, int node) static void free_one(struct bpf_mem_cache *c, void *obj) { + if (c->percpu) { + free_percpu(((void **)obj)[1]); + kmem_cache_free(c->kmem_cache, obj); + return; + } + if (c->kmem_cache) kmem_cache_free(c->kmem_cache, obj); else @@ -327,21 +347,30 @@ static void prefill_mem_cache(struct bpf_mem_cache *c, int cpu) * kmalloc/kfree. Max allocation size is 4096 in this case. * This is bpf_dynptr and bpf_kptr use case. 
*/ -int bpf_mem_alloc_init(struct bpf_mem_alloc *ma, int size) +int bpf_mem_alloc_init(struct bpf_mem_alloc *ma, int size, bool percpu) { static u16 sizes[NUM_CACHES] = {96, 192, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096}; struct bpf_mem_caches *cc, __percpu *pcc; struct bpf_mem_cache *c, __percpu *pc; - struct kmem_cache *kmem_cache; + struct kmem_cache *kmem_cache = NULL; struct obj_cgroup *objcg = NULL; char buf[32]; - int cpu, i; + int cpu, i, unit_size; if (size) { pc = __alloc_percpu_gfp(sizeof(*pc), 8, GFP_KERNEL); if (!pc) return -ENOMEM; - size += LLIST_NODE_SZ; /* room for llist_node */ + + if (percpu) { + unit_size = size; + /* room for llist_node and per-cpu pointer */ + size = LLIST_NODE_SZ + sizeof(void *); + } else { + size += LLIST_NODE_SZ; /* room for llist_node */ + unit_size = size; + } + snprintf(buf, sizeof(buf), "bpf-%u", size); kmem_cache = kmem_cache_create(buf, size, 8, 0, NULL); if (!kmem_cache) { @@ -354,14 +383,19 @@ int bpf_mem_alloc_init(struct bpf_mem_alloc *ma, int size) for_each_possible_cpu(cpu) { c = per_cpu_ptr(pc, cpu); c->kmem_cache = kmem_cache; - c->unit_size = size; + c->unit_size = unit_size; c->objcg = objcg; + c->percpu = percpu; prefill_mem_cache(c, cpu); } ma->cache = pc; return 0; } + /* size == 0 && percpu is an invalid combination */ + if (WARN_ON_ONCE(percpu)) + return -EINVAL; + pcc = __alloc_percpu_gfp(sizeof(*cc), 8, GFP_KERNEL); if (!pcc) return -ENOMEM; From patchwork Fri Aug 19 21:42:28 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alexei Starovoitov X-Patchwork-Id: 12949276 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 79553C32771 for ; Fri, 19 Aug 2022 21:43:17 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 131AA8D0001; Fri, 19 Aug 2022 17:43:17 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 0BB3D6B0082; Fri, 19 Aug 2022 17:43:17 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id E4F718D0001; Fri, 19 Aug 2022 17:43:16 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0014.hostedemail.com [216.40.44.14]) by kanga.kvack.org (Postfix) with ESMTP id CE70F6B0081 for ; Fri, 19 Aug 2022 17:43:16 -0400 (EDT) Received: from smtpin24.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay05.hostedemail.com (Postfix) with ESMTP id AB84841A64 for ; Fri, 19 Aug 2022 21:43:16 +0000 (UTC) X-FDA: 79817668392.24.7DABAF9 Received: from mail-pf1-f179.google.com (mail-pf1-f179.google.com [209.85.210.179]) by imf09.hostedemail.com (Postfix) with ESMTP id 65931140010 for ; Fri, 19 Aug 2022 21:43:16 +0000 (UTC) Received: by mail-pf1-f179.google.com with SMTP id g129so2071732pfb.8 for ; Fri, 19 Aug 2022 14:43:16 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc; bh=W43Nyf7DOecFDBnlT5N/MtBJxTqydKGvqtvjlb9SmvI=; b=Xt7Q5IfeDwF0iF0l5X2gYqx3a7MU/52LJ3FXuy8D7md2G+9wc401eu9P9mNfbcHcfH eEkai+QrPtyJfFlD7J+9cZrc1b16Bgyxm4TMp4f+kEzGLmQJD3jKMGGhYXEA+LbwVUD1 2JltVfG0aXMWUHJGyu9Rxsl60crrTrEhjZ+watrDfzxBoPVamMIO56jvuuhuNqNfQFeR kOh+cVRmQQG8h32yrtUUGah3S9UU0UWwhwpCNn7XpGzRBJ9ynDdO+vdSGkRQPORGeKp2 
YZeHw4PZ0WJnxfAGXNvuO67mVWBqNCGsB7n5SxI2QdI6TlX2BQvEwWQGo0V1LKZgjgg5 4Rcw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc; bh=W43Nyf7DOecFDBnlT5N/MtBJxTqydKGvqtvjlb9SmvI=; b=pZ66TExfchegkjRaaoWGyy3vy0nUxSwK9i2qsev7wPTZyWN0DJtIf7ESHRggob95b8 OupyCHt0UQrUgKhqy8CJqvOcVYRNxf/OqwV6wzDbKk8Q8ul6M5lHG3DCIlfA4S5DJsH6 JX1ndFMRheFaoqzsYGCeu7jelNkWRknlpzqzbtECtKSWIn08RIfOzVGgyNGHA5oMNeFh BR3TZIo33vmixoaj9TzDMGcE496deuV7NFyvPWuQ3PCCuhOodK9yUwr+62xk3wX6rCM9 xksEMtLfbRG33R/i18TS1/xsZiIOu5n0P4aaiVnADxpBdoLKqwN/86kcVdYeLNqe+cDL t9SA== X-Gm-Message-State: ACgBeo1MfDzhpuv9oB5cCRDt/sE9JwSmua1wYDrmdtFsKXlhoth7hnTT CTTWMdYQiDOOHk3mzawEl4I= X-Google-Smtp-Source: AA6agR7I1Wxd+o9NTQ/+nSz8ZI/dZ7eFMEtjoRW2LrBgT9NV57WIdWDB6Ouz5GrJ2m+cY0OdANJT/w== X-Received: by 2002:a65:6216:0:b0:41d:8248:3d05 with SMTP id d22-20020a656216000000b0041d82483d05mr8149260pgv.36.1660945395427; Fri, 19 Aug 2022 14:43:15 -0700 (PDT) Received: from localhost.localdomain ([2620:10d:c090:500::1:c4b1]) by smtp.gmail.com with ESMTPSA id e11-20020a170902784b00b0016dd6929af5sm3618454pln.206.2022.08.19.14.43.14 (version=TLS1_3 cipher=TLS_CHACHA20_POLY1305_SHA256 bits=256/256); Fri, 19 Aug 2022 14:43:14 -0700 (PDT) From: Alexei Starovoitov To: davem@davemloft.net Cc: daniel@iogearbox.net, andrii@kernel.org, tj@kernel.org, memxor@gmail.com, delyank@fb.com, linux-mm@kvack.org, bpf@vger.kernel.org, kernel-team@fb.com Subject: [PATCH v3 bpf-next 11/15] bpf: Convert percpu hash map to per-cpu bpf_mem_alloc. Date: Fri, 19 Aug 2022 14:42:28 -0700 Message-Id: <20220819214232.18784-12-alexei.starovoitov@gmail.com> X-Mailer: git-send-email 2.36.1 In-Reply-To: <20220819214232.18784-1-alexei.starovoitov@gmail.com> References: <20220819214232.18784-1-alexei.starovoitov@gmail.com> MIME-Version: 1.0 ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1660945396; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=W43Nyf7DOecFDBnlT5N/MtBJxTqydKGvqtvjlb9SmvI=; b=k25+Rrtf7jXKNrb/Jge5UW/ydGpQlaHfVOinxd00CLxjFWCicwveV0szCbmbOme/2hld8S ABB0cDGt+qspX9l+4k4zpYUvOhCe3JgCh5ivkKQ3GuPe873nRH7dS4wSS3asZjcX1iAw1S eOvZ2QAC3VgjaTUKq4HnVOkq3e6ROx4= ARC-Authentication-Results: i=1; imf09.hostedemail.com; dkim=pass header.d=gmail.com header.s=20210112 header.b=Xt7Q5Ife; dmarc=pass (policy=none) header.from=gmail.com; spf=pass (imf09.hostedemail.com: domain of alexei.starovoitov@gmail.com designates 209.85.210.179 as permitted sender) smtp.mailfrom=alexei.starovoitov@gmail.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1660945396; a=rsa-sha256; cv=none; b=0kK65S98WWInd455PMxdYDSOHnNqF69J0WuWvngqE0C2tUFGSFMcjLc6c1oP0bTfYZKSwj T7U3SqtujUTYptMD+g6OIQb438aaSS25zyfRWMjFkwaB/rFoRYPaMSyL70JI1g+dqWbppx 2XPpzX2jMwxSqm9OsmYGUY5FNvKovio= X-Rspamd-Queue-Id: 65931140010 X-Rspam-User: Authentication-Results: imf09.hostedemail.com; dkim=pass header.d=gmail.com header.s=20210112 header.b=Xt7Q5Ife; dmarc=pass (policy=none) header.from=gmail.com; spf=pass (imf09.hostedemail.com: domain of alexei.starovoitov@gmail.com designates 209.85.210.179 as permitted sender) smtp.mailfrom=alexei.starovoitov@gmail.com X-Rspamd-Server: rspam04 X-Stat-Signature: 5tbaty87y86an4y56bxudfr65f1fojjs 
X-HE-Tag: 1660945396-635244 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Alexei Starovoitov Convert dynamic allocations in percpu hash map from alloc_percpu() to bpf_mem_cache_alloc() from per-cpu bpf_mem_alloc. Since bpf_mem_alloc frees objects after RCU gp the call_rcu() is removed. pcpu_init_value() now needs to zero-fill per-cpu allocations, since dynamically allocated map elements are now similar to full prealloc, since alloc_percpu() is not called inline and the elements are reused in the freelist. Signed-off-by: Alexei Starovoitov --- kernel/bpf/hashtab.c | 45 +++++++++++++++++++------------------------- 1 file changed, 19 insertions(+), 26 deletions(-) diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c index 8daa1132d43c..89f26cbddef5 100644 --- a/kernel/bpf/hashtab.c +++ b/kernel/bpf/hashtab.c @@ -94,6 +94,7 @@ struct bucket { struct bpf_htab { struct bpf_map map; struct bpf_mem_alloc ma; + struct bpf_mem_alloc pcpu_ma; struct bucket *buckets; void *elems; union { @@ -121,14 +122,14 @@ struct htab_elem { struct { void *padding; union { - struct bpf_htab *htab; struct pcpu_freelist_node fnode; struct htab_elem *batch_flink; }; }; }; union { - struct rcu_head rcu; + /* pointer to per-cpu pointer */ + void *ptr_to_pptr; struct bpf_lru_node lru_node; }; u32 hash; @@ -435,8 +436,6 @@ static int htab_map_alloc_check(union bpf_attr *attr) bool zero_seed = (attr->map_flags & BPF_F_ZERO_SEED); int numa_node = bpf_map_attr_numa_node(attr); - BUILD_BUG_ON(offsetof(struct htab_elem, htab) != - offsetof(struct htab_elem, hash_node.pprev)); BUILD_BUG_ON(offsetof(struct htab_elem, fnode.next) != offsetof(struct htab_elem, hash_node.pprev)); @@ -597,6 +596,12 @@ static struct bpf_map *htab_map_alloc(union bpf_attr *attr) err = bpf_mem_alloc_init(&htab->ma, htab->elem_size, false); if (err) goto free_map_locked; + if (percpu) { + err = bpf_mem_alloc_init(&htab->pcpu_ma, + round_up(htab->map.value_size, 8), true); + if (err) + goto free_map_locked; + } } return &htab->map; @@ -607,6 +612,7 @@ static struct bpf_map *htab_map_alloc(union bpf_attr *attr) for (i = 0; i < HASHTAB_MAP_LOCK_COUNT; i++) free_percpu(htab->map_locked[i]); bpf_map_area_free(htab->buckets); + bpf_mem_alloc_destroy(&htab->pcpu_ma); bpf_mem_alloc_destroy(&htab->ma); free_htab: lockdep_unregister_key(&htab->lockdep_key); @@ -882,19 +888,11 @@ static int htab_map_get_next_key(struct bpf_map *map, void *key, void *next_key) static void htab_elem_free(struct bpf_htab *htab, struct htab_elem *l) { if (htab->map.map_type == BPF_MAP_TYPE_PERCPU_HASH) - free_percpu(htab_elem_get_ptr(l, htab->map.key_size)); + bpf_mem_cache_free(&htab->pcpu_ma, l->ptr_to_pptr); check_and_free_fields(htab, l); bpf_mem_cache_free(&htab->ma, l); } -static void htab_elem_free_rcu(struct rcu_head *head) -{ - struct htab_elem *l = container_of(head, struct htab_elem, rcu); - struct bpf_htab *htab = l->htab; - - htab_elem_free(htab, l); -} - static void htab_put_fd_value(struct bpf_htab *htab, struct htab_elem *l) { struct bpf_map *map = &htab->map; @@ -940,12 +938,7 @@ static void free_htab_elem(struct bpf_htab *htab, struct htab_elem *l) __pcpu_freelist_push(&htab->freelist, &l->fnode); } else { dec_elem_count(htab); - if (htab->map.map_type == BPF_MAP_TYPE_PERCPU_HASH) { - l->htab = htab; - call_rcu(&l->rcu, htab_elem_free_rcu); - } else { - htab_elem_free(htab, l); - } + htab_elem_free(htab, l); } } @@ -970,13 +963,12 @@ static 
void pcpu_copy_value(struct bpf_htab *htab, void __percpu *pptr, static void pcpu_init_value(struct bpf_htab *htab, void __percpu *pptr, void *value, bool onallcpus) { - /* When using prealloc and not setting the initial value on all cpus, - * zero-fill element values for other cpus (just as what happens when - * not using prealloc). Otherwise, bpf program has no way to ensure + /* When not setting the initial value on all cpus, zero-fill element + * values for other cpus. Otherwise, bpf program has no way to ensure * known initial values for cpus other than current one * (onallcpus=false always when coming from bpf prog). */ - if (htab_is_prealloc(htab) && !onallcpus) { + if (!onallcpus) { u32 size = round_up(htab->map.value_size, 8); int current_cpu = raw_smp_processor_id(); int cpu; @@ -1047,18 +1039,18 @@ static struct htab_elem *alloc_htab_elem(struct bpf_htab *htab, void *key, memcpy(l_new->key, key, key_size); if (percpu) { - size = round_up(size, 8); if (prealloc) { pptr = htab_elem_get_ptr(l_new, key_size); } else { /* alloc_percpu zero-fills */ - pptr = bpf_map_alloc_percpu(&htab->map, size, 8, - GFP_NOWAIT | __GFP_NOWARN); + pptr = bpf_mem_cache_alloc(&htab->pcpu_ma); if (!pptr) { bpf_mem_cache_free(&htab->ma, l_new); l_new = ERR_PTR(-ENOMEM); goto dec_count; } + l_new->ptr_to_pptr = pptr; + pptr = *(void **)pptr; } pcpu_init_value(htab, pptr, value, onallcpus); @@ -1550,6 +1542,7 @@ static void htab_map_free(struct bpf_map *map) bpf_map_free_kptr_off_tab(map); free_percpu(htab->extra_elems); bpf_map_area_free(htab->buckets); + bpf_mem_alloc_destroy(&htab->pcpu_ma); bpf_mem_alloc_destroy(&htab->ma); if (htab->use_percpu_counter) percpu_counter_destroy(&htab->pcount); From patchwork Fri Aug 19 21:42:29 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alexei Starovoitov X-Patchwork-Id: 12949277 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 30816C32771 for ; Fri, 19 Aug 2022 21:43:21 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id B6B988D0003; Fri, 19 Aug 2022 17:43:20 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id AF3806B007D; Fri, 19 Aug 2022 17:43:20 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 96D336B0081; Fri, 19 Aug 2022 17:43:20 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0010.hostedemail.com [216.40.44.10]) by kanga.kvack.org (Postfix) with ESMTP id 7E2D96B0074 for ; Fri, 19 Aug 2022 17:43:20 -0400 (EDT) Received: from smtpin12.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay09.hostedemail.com (Postfix) with ESMTP id 6172781AAE for ; Fri, 19 Aug 2022 21:43:20 +0000 (UTC) X-FDA: 79817668560.12.C0A27D3 Received: from mail-pg1-f172.google.com (mail-pg1-f172.google.com [209.85.215.172]) by imf29.hostedemail.com (Postfix) with ESMTP id 223F2120124 for ; Fri, 19 Aug 2022 21:43:19 +0000 (UTC) Received: by mail-pg1-f172.google.com with SMTP id l64so4677598pge.0 for ; Fri, 19 Aug 2022 14:43:19 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc; bh=ZBI+fwy76bkYY8LVV14e2AjKMWsYkt0VzLIv3m+2oj8=; 
b=ZrGKkLUP6ZJKXIZPS65dM/dVlcFh5D42Kebha2HBWqrRjunPYa8phJu/LZ6s2Eykhe zMVvKesptbxuK4TjFFxmkfk39Q2PAMttuQLmkIVyZVwXdOsttmDcjpWQ/W51LhcXGeQw eWs5kWF1ymCJ0Pm7jev0AjfgeRp6CncHmEKS6M36J9/jO6HKHg3xJJGwd1ncmyQLvLn4 fWLQPjDGvrHHErxTsPTpIcX5Jbq9U+zBwl0Ebr4ekzakyu8KQKfXOe6Y2KKbZBhp+WD7 XpW6c339wN77gce7pVAqWQ6pOHHYwl15UbBmJpsFeHJxQPYe6SuRGBoJhs8Wv7uQfgzv 3JiQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc; bh=ZBI+fwy76bkYY8LVV14e2AjKMWsYkt0VzLIv3m+2oj8=; b=bxodVerItFvCPsnxqZp0w5ddTmt462fXkklQXy2XXnqn/ttmmLvWPRNgFSiKhoD6Sj OVXordjqyKjwHgjruUwFd24R0w5WHfGke/6ptnf3dH5ItMNeGc1K5x3bEb6r0Bzbr1/E epB/R9q3DLg7rxM0JDzIonKrdSdcdgcYEx9kKSuX1cKbBDb7BRCOa966P2ya7cKI709O IHy44dPPe3qtuJ93FqjIeTVBkvrdCD6Xzhc+MSBGJG/SxMi7TET4eK5pVnu2a9OLEUEj +YU5kR8Lffh7LN6KE0+DI5Y96oYhfk3uHHDA5Zfp+MxCuMzqnXVAXb7tf6bj9KXhjxCG NtMA== X-Gm-Message-State: ACgBeo2RBvBHu+C9expPUmgPpZ2M4rghFrRMvdIiObo7agKjExqEfmQ9 v+px5kaJ/jVoasG1DzDJTGg= X-Google-Smtp-Source: AA6agR7oDkmuQzOgW9F0m5UMAkfxkclSWKTAKhyoF1nqkS99BcY1GM7gcN1QX1nxGpw4wZpR9y4iiA== X-Received: by 2002:a05:6a00:1588:b0:52f:a5bb:b992 with SMTP id u8-20020a056a00158800b0052fa5bbb992mr9616631pfk.38.1660945399062; Fri, 19 Aug 2022 14:43:19 -0700 (PDT) Received: from localhost.localdomain ([2620:10d:c090:500::1:c4b1]) by smtp.gmail.com with ESMTPSA id n14-20020a170903110e00b0016d6963cb12sm3562742plh.304.2022.08.19.14.43.17 (version=TLS1_3 cipher=TLS_CHACHA20_POLY1305_SHA256 bits=256/256); Fri, 19 Aug 2022 14:43:18 -0700 (PDT) From: Alexei Starovoitov To: davem@davemloft.net Cc: daniel@iogearbox.net, andrii@kernel.org, tj@kernel.org, memxor@gmail.com, delyank@fb.com, linux-mm@kvack.org, bpf@vger.kernel.org, kernel-team@fb.com Subject: [PATCH v3 bpf-next 12/15] bpf: Remove tracing program restriction on map types Date: Fri, 19 Aug 2022 14:42:29 -0700 Message-Id: <20220819214232.18784-13-alexei.starovoitov@gmail.com> X-Mailer: git-send-email 2.36.1 In-Reply-To: <20220819214232.18784-1-alexei.starovoitov@gmail.com> References: <20220819214232.18784-1-alexei.starovoitov@gmail.com> MIME-Version: 1.0 ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1660945400; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=ZBI+fwy76bkYY8LVV14e2AjKMWsYkt0VzLIv3m+2oj8=; b=GP2Nsoy4Kjo2542lftzLB9jreiewaluBL1v46ew+RiE1XCmT83xBsOux/ejfOlT9NAZbRe 2t+GwR1GWRNlvuJSv0P6qxRblhGuTs2bWiPXM1DWE1jrfFxXuyOYYg/1tj/pA/6y3F10hI q8nc/MtfLIP4fEiUttk76byAjt2cmI0= ARC-Authentication-Results: i=1; imf29.hostedemail.com; dkim=pass header.d=gmail.com header.s=20210112 header.b=ZrGKkLUP; spf=pass (imf29.hostedemail.com: domain of alexei.starovoitov@gmail.com designates 209.85.215.172 as permitted sender) smtp.mailfrom=alexei.starovoitov@gmail.com; dmarc=pass (policy=none) header.from=gmail.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1660945400; a=rsa-sha256; cv=none; b=4MChJRPXt7/uAnvPxe/NNPK0/4verps6U9v+imn67tCKdh6pXCDAAU7DvTaYgUkdZAsVx5 nTCD0FCHcC+eAS1OLtN6GE3alY2/EI6CkqClTB+3f6IIENgrgZjdUnWd07R5GBDcj7nLeS cBDoo7L6b8NPdIFkbz32krYMQQVX+mU= Authentication-Results: imf29.hostedemail.com; dkim=pass header.d=gmail.com header.s=20210112 header.b=ZrGKkLUP; spf=pass (imf29.hostedemail.com: domain of 
alexei.starovoitov@gmail.com designates 209.85.215.172 as permitted sender) smtp.mailfrom=alexei.starovoitov@gmail.com; dmarc=pass (policy=none) header.from=gmail.com X-Rspamd-Server: rspam05 X-Rspamd-Queue-Id: 223F2120124 X-Stat-Signature: x146u6x7x5iwmswtckqo91ezhzawd6pw X-Rspam-User: X-HE-Tag: 1660945399-31270 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Alexei Starovoitov The hash map is now fully converted to bpf_mem_alloc. Its implementation is not allocating synchronously and not calling call_rcu() directly. It's now safe to use non-preallocated hash maps in all types of tracing programs including BPF_PROG_TYPE_PERF_EVENT that runs out of NMI context. Signed-off-by: Alexei Starovoitov --- kernel/bpf/verifier.c | 42 ------------------------------------------ 1 file changed, 42 deletions(-) diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index d785f29047d7..a1ada707c57c 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -12599,48 +12599,6 @@ static int check_map_prog_compatibility(struct bpf_verifier_env *env, { enum bpf_prog_type prog_type = resolve_prog_type(prog); - /* - * Validate that trace type programs use preallocated hash maps. - * - * For programs attached to PERF events this is mandatory as the - * perf NMI can hit any arbitrary code sequence. - * - * All other trace types using non-preallocated per-cpu hash maps are - * unsafe as well because tracepoint or kprobes can be inside locked - * regions of the per-cpu memory allocator or at a place where a - * recursion into the per-cpu memory allocator would see inconsistent - * state. Non per-cpu hash maps are using bpf_mem_alloc-tor which is - * safe to use from kprobe/fentry and in RT. - * - * On RT enabled kernels run-time allocation of all trace type - * programs is strictly prohibited due to lock type constraints. On - * !RT kernels it is allowed for backwards compatibility reasons for - * now, but warnings are emitted so developers are made aware of - * the unsafety and can fix their programs before this is enforced. - */ - if (is_tracing_prog_type(prog_type) && !is_preallocated_map(map)) { - if (prog_type == BPF_PROG_TYPE_PERF_EVENT) { - /* perf_event bpf progs have to use preallocated hash maps - * because non-prealloc is still relying on call_rcu to free - * elements. - */ - verbose(env, "perf_event programs can only use preallocated hash map\n"); - return -EINVAL; - } - if (map->map_type == BPF_MAP_TYPE_PERCPU_HASH || - (map->inner_map_meta && - map->inner_map_meta->map_type == BPF_MAP_TYPE_PERCPU_HASH)) { - if (IS_ENABLED(CONFIG_PREEMPT_RT)) { - verbose(env, - "trace type programs can only use preallocated per-cpu hash map\n"); - return -EINVAL; - } - WARN_ONCE(1, "trace type BPF program uses run-time allocation\n"); - verbose(env, - "trace type programs with run-time allocated per-cpu hash maps are unsafe." 
- " Switch to preallocated hash maps.\n"); - } - } if (map_value_has_spin_lock(map)) { if (prog_type == BPF_PROG_TYPE_SOCKET_FILTER) { From patchwork Fri Aug 19 21:42:30 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alexei Starovoitov X-Patchwork-Id: 12949280 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id AF069C32771 for ; Fri, 19 Aug 2022 21:44:43 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 4B4C26B0073; Fri, 19 Aug 2022 17:44:43 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 43D338D0005; Fri, 19 Aug 2022 17:44:43 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 2B6DA8D0003; Fri, 19 Aug 2022 17:44:43 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0016.hostedemail.com [216.40.44.16]) by kanga.kvack.org (Postfix) with ESMTP id 193786B0073 for ; Fri, 19 Aug 2022 17:44:43 -0400 (EDT) Received: from smtpin25.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay10.hostedemail.com (Postfix) with ESMTP id E9A26C1B6E for ; Fri, 19 Aug 2022 21:44:42 +0000 (UTC) X-FDA: 79817672004.25.47D8488 Received: from mail-pj1-f54.google.com (mail-pj1-f54.google.com [209.85.216.54]) by imf02.hostedemail.com (Postfix) with ESMTP id 964AC8008B for ; Fri, 19 Aug 2022 21:43:23 +0000 (UTC) Received: by mail-pj1-f54.google.com with SMTP id w11-20020a17090a380b00b001f73f75a1feso8691366pjb.2 for ; Fri, 19 Aug 2022 14:43:23 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc; bh=5M011h9hGg1xDHL/wlpIOyvMja4lICeITJBWcCRlji4=; b=AVNDNwVR26ADMyXqQdvKmTsJ2WGpvzN+dngyNzHqfLplmjNWJqTmLst3NChyK67sN9 fqxXyzs/06uFzhwexRSeSHQnrcSMT7Jpv7SbYM1vijQu9XY1gHvWJRCqRpjszUFQjyQL 5KExSAuxa4mBSZP9GN1LpNwXjQDeg/KvomeUe01OMG3zHHeshspvUjU1otMnSFLyB2sG 61amaQBTgBYMQ/wAwSa29D4BjRdEaCFSurKIkNVA6A4/+qy+3nLxloAmTo2wcq2xMoFj i9ubuV4S5RjjTqlvaIrgJi6RrKY7aG9XLM1Nq51Umn6IWth3rmCGlEhZ+CplWUvcIxpP bUVg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc; bh=5M011h9hGg1xDHL/wlpIOyvMja4lICeITJBWcCRlji4=; b=RtDoIQ74VLcCUojI/vA+2Be4B9BmFTK/9FVpcDSv4/k49mm1PMo9Q2PQtvrjY2LHcm PSd1qD4T1Uf+j7wuO4PfXHLgCGO09HgxkWotRigQ2FO0vNtswhOlyAiIdvooPf+1V8O5 4EPfCjD2Vd0iTOC0XqvqI2YTZwEAVi/xohiJy2R3F29WQs5rmDW7Lk+YKrS3Iz6GYbf6 42SKViO6NOGeTGtdNB6xZJwS7078d8fOWRto/rSDxFeHtHXyfZh36XPdODKQLuX33q/g pbyk1KsIkjUvSds8PfOxrDiokSXQJn6YGW+HWsJPfxayCKituBaAAmauaDjtO6ID/J0g EwKg== X-Gm-Message-State: ACgBeo0QF6Op3u/HXLo3dCGRmE7XD1yhkeZvLZG1+dOPHF9fYiNpiOKv ExgQyVX9LTH+kcO9SOzo1Hs= X-Google-Smtp-Source: AA6agR4BY/cxIN0Y0cw8+3sADm8k8hA8ZoCpOczqfNw4A0fjtlT+RFhRwtJYbWAFPYYhHRriqsv1RA== X-Received: by 2002:a17:902:e80e:b0:16f:14ea:897b with SMTP id u14-20020a170902e80e00b0016f14ea897bmr9361460plg.6.1660945402633; Fri, 19 Aug 2022 14:43:22 -0700 (PDT) Received: from localhost.localdomain ([2620:10d:c090:500::1:c4b1]) by smtp.gmail.com with ESMTPSA id l11-20020a170902f68b00b001709aea1516sm3624376plg.276.2022.08.19.14.43.21 (version=TLS1_3 cipher=TLS_CHACHA20_POLY1305_SHA256 
bits=256/256); Fri, 19 Aug 2022 14:43:22 -0700 (PDT) From: Alexei Starovoitov To: davem@davemloft.net Cc: daniel@iogearbox.net, andrii@kernel.org, tj@kernel.org, memxor@gmail.com, delyank@fb.com, linux-mm@kvack.org, bpf@vger.kernel.org, kernel-team@fb.com Subject: [PATCH v3 bpf-next 13/15] bpf: Prepare bpf_mem_alloc to be used by sleepable bpf programs. Date: Fri, 19 Aug 2022 14:42:30 -0700 Message-Id: <20220819214232.18784-14-alexei.starovoitov@gmail.com> X-Mailer: git-send-email 2.36.1 In-Reply-To: <20220819214232.18784-1-alexei.starovoitov@gmail.com> References: <20220819214232.18784-1-alexei.starovoitov@gmail.com> MIME-Version: 1.0 ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1660945403; a=rsa-sha256; cv=none; b=0OHrkzipw0uO9JxBNH4JDvMwPjpYxfW1Zycifd0joeTUmoD8ig7NRqWqa3xCaf5LccvYnn zJFtLt+fJ/AfX81l9lePPRj5f5HWHFcVZZZ8dT2WGETvi2M/Unf45x9lun5bEiDkSwjyrm s2aWRTK+kPDX3yKuB/iAN77e+wO53Ik= ARC-Authentication-Results: i=1; imf02.hostedemail.com; dkim=pass header.d=gmail.com header.s=20210112 header.b=AVNDNwVR; dmarc=pass (policy=none) header.from=gmail.com; spf=pass (imf02.hostedemail.com: domain of alexei.starovoitov@gmail.com designates 209.85.216.54 as permitted sender) smtp.mailfrom=alexei.starovoitov@gmail.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1660945403; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=5M011h9hGg1xDHL/wlpIOyvMja4lICeITJBWcCRlji4=; b=6PEKTgTgFoNc6YnTelUTBYWLqy1F+2FfHMXil1WenLweiE56REtzCRlF5tz1zRhOFEDQzf mpKxkPi23dK6cuA8T+f8qdsUs/FgL/jiZhD4AnL52IXB7L3H3HkSn8qgSFTn9c8G4Wagsk NE421RGUaskf+9ovDRmLa43AWr9jG1M= X-Stat-Signature: awzmokr8d3nrcrj35p87csoha5cpzgyp X-Rspamd-Queue-Id: 964AC8008B Authentication-Results: imf02.hostedemail.com; dkim=pass header.d=gmail.com header.s=20210112 header.b=AVNDNwVR; dmarc=pass (policy=none) header.from=gmail.com; spf=pass (imf02.hostedemail.com: domain of alexei.starovoitov@gmail.com designates 209.85.216.54 as permitted sender) smtp.mailfrom=alexei.starovoitov@gmail.com X-Rspamd-Server: rspam03 X-Rspam-User: X-HE-Tag: 1660945403-689582 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000076, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Alexei Starovoitov Use call_rcu_tasks_trace() to wait for sleepable progs to finish. Then use call_rcu() to wait for normal progs to finish and finally do free_one() on each element when freeing objects into global memory pool. Signed-off-by: Alexei Starovoitov --- kernel/bpf/memalloc.c | 14 +++++++++++++- 1 file changed, 13 insertions(+), 1 deletion(-) diff --git a/kernel/bpf/memalloc.c b/kernel/bpf/memalloc.c index 9e5ad7dc4dc7..d34383dc12d9 100644 --- a/kernel/bpf/memalloc.c +++ b/kernel/bpf/memalloc.c @@ -224,6 +224,13 @@ static void __free_rcu(struct rcu_head *head) atomic_set(&c->call_rcu_in_progress, 0); } +static void __free_rcu_tasks_trace(struct rcu_head *head) +{ + struct bpf_mem_cache *c = container_of(head, struct bpf_mem_cache, rcu); + + call_rcu(&c->rcu, __free_rcu); +} + static void enque_to_free(struct bpf_mem_cache *c, void *obj) { struct llist_node *llnode = obj; @@ -249,7 +256,11 @@ static void do_call_rcu(struct bpf_mem_cache *c) * from __free_rcu() and from drain_mem_cache(). 
*/ __llist_add(llnode, &c->waiting_for_gp); - call_rcu(&c->rcu, __free_rcu); + /* Use call_rcu_tasks_trace() to wait for sleepable progs to finish. + * Then use call_rcu() to wait for normal progs to finish + * and finally do free_one() on each element. + */ + call_rcu_tasks_trace(&c->rcu, __free_rcu_tasks_trace); } static void free_bulk(struct bpf_mem_cache *c) @@ -452,6 +463,7 @@ void bpf_mem_alloc_destroy(struct bpf_mem_alloc *ma) /* c->waiting_for_gp list was drained, but __free_rcu might * still execute. Wait for it now before we free 'c'. */ + rcu_barrier_tasks_trace(); rcu_barrier(); free_percpu(ma->cache); ma->cache = NULL; From patchwork Fri Aug 19 21:42:31 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alexei Starovoitov X-Patchwork-Id: 12949282 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0217DC32771 for ; Fri, 19 Aug 2022 21:45:23 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 8D2F36B0073; Fri, 19 Aug 2022 17:45:23 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 882998D0005; Fri, 19 Aug 2022 17:45:23 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 7239E8D0003; Fri, 19 Aug 2022 17:45:23 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0011.hostedemail.com [216.40.44.11]) by kanga.kvack.org (Postfix) with ESMTP id 611B16B0073 for ; Fri, 19 Aug 2022 17:45:23 -0400 (EDT) Received: from smtpin17.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay09.hostedemail.com (Postfix) with ESMTP id 3670B81AB3 for ; Fri, 19 Aug 2022 21:45:23 +0000 (UTC) X-FDA: 79817673726.17.5D675BB Received: from mail-pl1-f176.google.com (mail-pl1-f176.google.com [209.85.214.176]) by imf06.hostedemail.com (Postfix) with ESMTP id 2565418002C for ; Fri, 19 Aug 2022 21:43:26 +0000 (UTC) Received: by mail-pl1-f176.google.com with SMTP id 2so5186917pll.0 for ; Fri, 19 Aug 2022 14:43:26 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc; bh=bW5ocW6PzISfaCgKiPdCkbtq/RA2U8OKQTKMdTGnVOA=; b=QL0G/EHoJfksQ4p6mh6jETpl0vwiWZswZj/MQI0CXx3GQ3XVocxIglIesp1mjt6wig liFF8jPt0KXoJ+PurPcIH41o7V+QkbIadjk4Okh9eyYvPwKeN2XF0bHWtiHpL1MKxE7P C+ePoBtSmYEBPgR/F159rvLFaFgWqmrHo47CKrrYjAlH+WEPNOb3DdMvOn6m4kjqltc7 iD9zOt6GSKu1/Bu2Ld5wdUAS/2exFqYpqD3DLVdb4K9K446n4ZPAdmO8B4ZYdubYD6C5 NUVTtVRkpKRUm8k1SelZZIDYmuadXTy+z0Bk3bHXaqzkOhAZg8nc0nglkMGRxm6bub6P JJxA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc; bh=bW5ocW6PzISfaCgKiPdCkbtq/RA2U8OKQTKMdTGnVOA=; b=1PLUmDISL5Mo2f9jIk3mQ/SpHd0WTE5UrNb2J46idrh6ep8OU9mjBCt2KKETN6+upy JaxVnC/ogCpe2wFPI0t9juqqPzBPYvP/sJbBipKLMeCHEIjQ7LVPQWfmXa5QveBzzcQB KeoHcMkQn0YAexwrhv5G6zACSX7JffXlOdxYVhjf+4nJBYS0A8s7epOutba/ERgv+Yu8 fGJpF9ljUfO79hGMjkkUocpriSKlgGfCNfytQXprgm9gv8F6fq6DsUqtBnL6YlcSvfV9 tNzhrYRVWBQf/k2FMQ0vH/FYa34bcParz+JCymU0MLsX7JpApjGC+HQFsIYWLH//49rs JF6w== X-Gm-Message-State: ACgBeo1WsU9KZnLJacEf5Df+i2WmCDilDWIgumfHYCn3UjytfUdDCqPb ayyunFG3eYAjOF5Uid5NGNc= X-Google-Smtp-Source: 
AA6agR56faXSMcH0PAhNAx9r0g5F4js3qHybSTN70KIu5Hh8X5AOPg7E/K1Wj532E/7UrKfXAIftFw== X-Received: by 2002:a17:90b:4a4d:b0:1f5:431c:54fa with SMTP id lb13-20020a17090b4a4d00b001f5431c54famr10378826pjb.199.1660945406270; Fri, 19 Aug 2022 14:43:26 -0700 (PDT) Received: from localhost.localdomain ([2620:10d:c090:500::1:c4b1]) by smtp.gmail.com with ESMTPSA id y23-20020a17090264d700b0016b81679c1fsm3587099pli.216.2022.08.19.14.43.24 (version=TLS1_3 cipher=TLS_CHACHA20_POLY1305_SHA256 bits=256/256); Fri, 19 Aug 2022 14:43:25 -0700 (PDT) From: Alexei Starovoitov To: davem@davemloft.net Cc: daniel@iogearbox.net, andrii@kernel.org, tj@kernel.org, memxor@gmail.com, delyank@fb.com, linux-mm@kvack.org, bpf@vger.kernel.org, kernel-team@fb.com Subject: [PATCH v3 bpf-next 14/15] bpf: Remove prealloc-only restriction for sleepable bpf programs. Date: Fri, 19 Aug 2022 14:42:31 -0700 Message-Id: <20220819214232.18784-15-alexei.starovoitov@gmail.com> X-Mailer: git-send-email 2.36.1 In-Reply-To: <20220819214232.18784-1-alexei.starovoitov@gmail.com> References: <20220819214232.18784-1-alexei.starovoitov@gmail.com> MIME-Version: 1.0 ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1660945407; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=bW5ocW6PzISfaCgKiPdCkbtq/RA2U8OKQTKMdTGnVOA=; b=cd4X3YErsh4iddgF3zpsDh+K8jF5Xx7LXY+aoBN6X7/kKrJA2rKOsHMqzjQ4LqsGTgNTcV RnL4l+qsFIuZ+CZ4nfhR++6laq5U2218JhUIcR5r53sTxRCjF1hfr9ggPIXK8mgMCAEZJf ovMvWX3EEesJ/+QZ0Kh9mc65ZulDTTQ= ARC-Authentication-Results: i=1; imf06.hostedemail.com; dkim=pass header.d=gmail.com header.s=20210112 header.b="QL0G/EHo"; dmarc=pass (policy=none) header.from=gmail.com; spf=pass (imf06.hostedemail.com: domain of alexei.starovoitov@gmail.com designates 209.85.214.176 as permitted sender) smtp.mailfrom=alexei.starovoitov@gmail.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1660945407; a=rsa-sha256; cv=none; b=BP7IjdVB7YS0ECPL9aJpUlqQf2CVTvSnlyf6yH3b8aqzjx/lmNjX4NzahgSfffh1xQrK7b FOlh/F9j3u+5Ss9yODfkkrUzOxEMXvKDJc4q/q+2B6C8aYWICno032UN1sZXg5XuOhAV3d bY5MUM3egaOsr20femVNNcRbTI4SXRw= X-Rspamd-Queue-Id: 2565418002C X-Rspam-User: Authentication-Results: imf06.hostedemail.com; dkim=pass header.d=gmail.com header.s=20210112 header.b="QL0G/EHo"; dmarc=pass (policy=none) header.from=gmail.com; spf=pass (imf06.hostedemail.com: domain of alexei.starovoitov@gmail.com designates 209.85.214.176 as permitted sender) smtp.mailfrom=alexei.starovoitov@gmail.com X-Rspamd-Server: rspam04 X-Stat-Signature: fjt5xaanzfokk9ekoaiqfwammpasgu3e X-HE-Tag: 1660945406-266201 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Alexei Starovoitov Since hash map is now converted to bpf_mem_alloc and it's waiting for rcu and rcu_tasks_trace GPs before freeing elements into global memory slabs it's safe to use dynamically allocated hash maps in sleepable bpf programs. 
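For illustration only (not part of this patch): with this restriction lifted, a sleepable tracing program can pair with a non-preallocated hash map. A minimal sketch, assuming a vmlinux.h/libbpf build environment; the map name, attach point and counters are made up for the example:

#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

struct {
	__uint(type, BPF_MAP_TYPE_HASH);
	__uint(map_flags, BPF_F_NO_PREALLOC);	/* dynamic allocation via bpf_mem_alloc */
	__uint(max_entries, 1024);
	__type(key, u32);
	__type(value, u64);
} counts SEC(".maps");

SEC("fentry.s/do_unlinkat")	/* ".s" marks the program as sleepable */
int BPF_PROG(count_unlink)
{
	u32 key = bpf_get_current_pid_tgid() >> 32;
	u64 init = 1, *val;

	val = bpf_map_lookup_elem(&counts, &key);
	if (val)
		__sync_fetch_and_add(val, 1);
	else
		/* may allocate a new element from the per-cpu bpf_mem_alloc cache */
		bpf_map_update_elem(&counts, &key, &init, BPF_ANY);
	return 0;
}

char LICENSE[] SEC("license") = "GPL";

Before this change the verifier rejected loading such a program with the "Sleepable programs can only use preallocated maps" error removed below.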
Signed-off-by: Alexei Starovoitov --- kernel/bpf/verifier.c | 23 ----------------------- 1 file changed, 23 deletions(-) diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index a1ada707c57c..dcbcf876b886 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -12562,14 +12562,6 @@ static int check_pseudo_btf_id(struct bpf_verifier_env *env, return err; } -static int check_map_prealloc(struct bpf_map *map) -{ - return (map->map_type != BPF_MAP_TYPE_HASH && - map->map_type != BPF_MAP_TYPE_PERCPU_HASH && - map->map_type != BPF_MAP_TYPE_HASH_OF_MAPS) || - !(map->map_flags & BPF_F_NO_PREALLOC); -} - static bool is_tracing_prog_type(enum bpf_prog_type type) { switch (type) { @@ -12584,15 +12576,6 @@ static bool is_tracing_prog_type(enum bpf_prog_type type) } } -static bool is_preallocated_map(struct bpf_map *map) -{ - if (!check_map_prealloc(map)) - return false; - if (map->inner_map_meta && !check_map_prealloc(map->inner_map_meta)) - return false; - return true; -} - static int check_map_prog_compatibility(struct bpf_verifier_env *env, struct bpf_map *map, struct bpf_prog *prog) @@ -12645,12 +12628,6 @@ static int check_map_prog_compatibility(struct bpf_verifier_env *env, case BPF_MAP_TYPE_LRU_PERCPU_HASH: case BPF_MAP_TYPE_ARRAY_OF_MAPS: case BPF_MAP_TYPE_HASH_OF_MAPS: - if (!is_preallocated_map(map)) { - verbose(env, - "Sleepable programs can only use preallocated maps\n"); - return -EINVAL; - } - break; case BPF_MAP_TYPE_RINGBUF: case BPF_MAP_TYPE_INODE_STORAGE: case BPF_MAP_TYPE_SK_STORAGE: From patchwork Fri Aug 19 21:42:32 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alexei Starovoitov X-Patchwork-Id: 12949278 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id E161BC32771 for ; Fri, 19 Aug 2022 21:43:31 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 7E8688D0005; Fri, 19 Aug 2022 17:43:31 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 7716C6B0074; Fri, 19 Aug 2022 17:43:31 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 5EAE38D0005; Fri, 19 Aug 2022 17:43:31 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0017.hostedemail.com [216.40.44.17]) by kanga.kvack.org (Postfix) with ESMTP id 499D76B0073 for ; Fri, 19 Aug 2022 17:43:31 -0400 (EDT) Received: from smtpin19.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay03.hostedemail.com (Postfix) with ESMTP id 26C78A1AFA for ; Fri, 19 Aug 2022 21:43:31 +0000 (UTC) X-FDA: 79817669022.19.9DA3228 Received: from mail-pj1-f42.google.com (mail-pj1-f42.google.com [209.85.216.42]) by imf24.hostedemail.com (Postfix) with ESMTP id E4A0D18000B for ; Fri, 19 Aug 2022 21:43:30 +0000 (UTC) Received: by mail-pj1-f42.google.com with SMTP id o14-20020a17090a0a0e00b001fabfd3369cso6008325pjo.5 for ; Fri, 19 Aug 2022 14:43:30 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc; bh=hlhfOcaAISwN7vZVN/qowrmOmS21KIl/i8kYcPwYBWw=; b=Bo7ngEGAgqshf64Hk3UhRCye2nuSJtKiFuIEWtkcQrpI3Bp5Rc0Ivs8kuR2tm/Uqd4 Gq9K4X9mkcd9GoK8DhXBxcwPbDrVnU/IZgkLWZ2ZY3VX3/XqDtPo0PSDXQME6kYZZgy9 
SOWxP+QrgzwqNI0KD36PGnUNtRaDAz+5fGRCUvwek0erbNqC9c75KYwk5SUtf55Y6nrV h2qnCbxyWH48zBO8tEMoguAl2Fmh8iIzmNFTTLBxXxBzr4SfhxDkx9DTTdX6FBUw2cbv Q6bcs+bT8f5dNRU4Tjt2soXAl/EaPj6paYfRtmM/zBqPNzW2oYcCQptEQzA3Y0IclSiv u8lQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc; bh=hlhfOcaAISwN7vZVN/qowrmOmS21KIl/i8kYcPwYBWw=; b=YemBcmghRjqTz3Z95tFWKSusVznO7Q1DgvXWNjXOUKn3sLxXZZYgLFRFtCo8QhT3z1 OCcZBB7Iu/CfGl0zf7hDCxmq6qhj2DmDVYDH/MuGBr2MOFnlYVn7KOBFHiut9FyDus5o KM5geaMpA56LE1bYK+mn+zdBNAoSVPDNUflSQGojbdyCIrhGK2Hz3gF2EQlqQ3eaRsoa LMq3G5hgYZ/6/0SVb1jD/5awtJtswnIBmLQwslzg4zJs58PKCXAaSzfdYFOLW8N2B9IN 5vDxSn0R7sI9nyASw6K1PU2sbFRN/1z5bM8593y4E+NyGiTk8a4oPOuqLqJvSO3whxYw fGTw== X-Gm-Message-State: ACgBeo1q8Zjak2XJTb0ewT5U+UYptD7mP9/7zTjMPTtnGSoNFdL1aqhz /isq01ImJ6irqaUq+DeySm8= X-Google-Smtp-Source: AA6agR7dFh/IrUInZSWKnwoEMLFomgZNUtWGBrCJIt/X6KdkfB0oFBN+QytwAbcGDFIfY/yhtxOZVQ== X-Received: by 2002:a17:90a:9f96:b0:1fa:b4fb:6297 with SMTP id o22-20020a17090a9f9600b001fab4fb6297mr10349716pjp.80.1660945409961; Fri, 19 Aug 2022 14:43:29 -0700 (PDT) Received: from localhost.localdomain ([2620:10d:c090:500::1:c4b1]) by smtp.gmail.com with ESMTPSA id d135-20020a621d8d000000b0052d4b0d0c74sm3893099pfd.70.2022.08.19.14.43.28 (version=TLS1_3 cipher=TLS_CHACHA20_POLY1305_SHA256 bits=256/256); Fri, 19 Aug 2022 14:43:29 -0700 (PDT) From: Alexei Starovoitov To: davem@davemloft.net Cc: daniel@iogearbox.net, andrii@kernel.org, tj@kernel.org, memxor@gmail.com, delyank@fb.com, linux-mm@kvack.org, bpf@vger.kernel.org, kernel-team@fb.com Subject: [PATCH v3 bpf-next 15/15] bpf: Introduce sysctl kernel.bpf_force_dyn_alloc. Date: Fri, 19 Aug 2022 14:42:32 -0700 Message-Id: <20220819214232.18784-16-alexei.starovoitov@gmail.com> X-Mailer: git-send-email 2.36.1 In-Reply-To: <20220819214232.18784-1-alexei.starovoitov@gmail.com> References: <20220819214232.18784-1-alexei.starovoitov@gmail.com> MIME-Version: 1.0 ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1660945410; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=hlhfOcaAISwN7vZVN/qowrmOmS21KIl/i8kYcPwYBWw=; b=Cu/8uQaaae1gzgsRlYZf5yh0fLoxX3tWZ+D1xDu0j374dG3E1IPoV9bhp6+88jBkOKMeEa giyTSYB1uqKGpqWWRzON4jQpDB29SFRsVjABBTc+eGv3seOQrRDWiPb1VELhfKF2MaC25z BbwPMe/ABQEBdqf9+24JcDDUefkrW5U= ARC-Authentication-Results: i=1; imf24.hostedemail.com; dkim=pass header.d=gmail.com header.s=20210112 header.b=Bo7ngEGA; dmarc=pass (policy=none) header.from=gmail.com; spf=pass (imf24.hostedemail.com: domain of alexei.starovoitov@gmail.com designates 209.85.216.42 as permitted sender) smtp.mailfrom=alexei.starovoitov@gmail.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1660945410; a=rsa-sha256; cv=none; b=zAnuFS090/N71qhTckfqDhXbOokcA1JYYvWqbaWhjF0Ik09xZ+MvlcVHqOdYLfUQztXaKi WU24BikWwULs/OS6j6UVUwnAr7TLx/ltxuxFjdoy14d2WQxMvv9w1UxefFiLUyH1KQwYIS Z4SEOn5wYqTvDq0dIzAzkFMlxTZmjWY= Authentication-Results: imf24.hostedemail.com; dkim=pass header.d=gmail.com header.s=20210112 header.b=Bo7ngEGA; dmarc=pass (policy=none) header.from=gmail.com; spf=pass (imf24.hostedemail.com: domain of alexei.starovoitov@gmail.com designates 209.85.216.42 as permitted sender) 
smtp.mailfrom=alexei.starovoitov@gmail.com X-Rspamd-Server: rspam08 X-Rspamd-Queue-Id: E4A0D18000B X-Stat-Signature: bn19681b7fmp4qeajc19p9bpffbjgmt6 X-Rspam-User: X-HE-Tag: 1660945410-711894 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Alexei Starovoitov Introduce sysctl kernel.bpf_force_dyn_alloc to force dynamic allocation in bpf hash map. All selftests/bpf should pass with bpf_force_dyn_alloc 0 or 1 and all bpf programs (both sleepable and not) should not see any functional difference. The sysctl's observable behavior should only be improved memory usage. Signed-off-by: Alexei Starovoitov --- include/linux/filter.h | 2 ++ kernel/bpf/core.c | 2 ++ kernel/bpf/hashtab.c | 5 +++++ kernel/bpf/syscall.c | 9 +++++++++ 4 files changed, 18 insertions(+) diff --git a/include/linux/filter.h b/include/linux/filter.h index a5f21dc3c432..eb4d4a0c0bde 100644 --- a/include/linux/filter.h +++ b/include/linux/filter.h @@ -1009,6 +1009,8 @@ bpf_run_sk_reuseport(struct sock_reuseport *reuse, struct sock *sk, } #endif +extern int bpf_force_dyn_alloc; + #ifdef CONFIG_BPF_JIT extern int bpf_jit_enable; extern int bpf_jit_harden; diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c index 639437f36928..a13e78ea4b90 100644 --- a/kernel/bpf/core.c +++ b/kernel/bpf/core.c @@ -533,6 +533,8 @@ void bpf_prog_kallsyms_del_all(struct bpf_prog *fp) bpf_prog_kallsyms_del(fp); } +int bpf_force_dyn_alloc __read_mostly; + #ifdef CONFIG_BPF_JIT /* All BPF JIT sysctl knobs here. */ int bpf_jit_enable __read_mostly = IS_BUILTIN(CONFIG_BPF_JIT_DEFAULT_ON); diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c index 89f26cbddef5..f68a3400939e 100644 --- a/kernel/bpf/hashtab.c +++ b/kernel/bpf/hashtab.c @@ -505,6 +505,11 @@ static struct bpf_map *htab_map_alloc(union bpf_attr *attr) bpf_map_init_from_attr(&htab->map, attr); + if (!lru && bpf_force_dyn_alloc) { + prealloc = false; + htab->map.map_flags |= BPF_F_NO_PREALLOC; + } + if (percpu_lru) { /* ensure each CPU's lru list has >=1 elements. * since we are at it, make each lru list has the same diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c index 850270a72350..c201796f4997 100644 --- a/kernel/bpf/syscall.c +++ b/kernel/bpf/syscall.c @@ -5297,6 +5297,15 @@ static struct ctl_table bpf_syscall_table[] = { .mode = 0644, .proc_handler = bpf_stats_handler, }, + { + .procname = "bpf_force_dyn_alloc", + .data = &bpf_force_dyn_alloc, + .maxlen = sizeof(int), + .mode = 0600, + .proc_handler = proc_dointvec_minmax, + .extra1 = SYSCTL_ZERO, + .extra2 = SYSCTL_ONE, + }, { } };
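For illustration only (not part of the patch): a root-only test could flip the new knob before creating maps. A minimal userspace sketch; the /proc path follows from the kernel.bpf_force_dyn_alloc name and the 0600 mode registered above:

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	/* kernel.bpf_force_dyn_alloc is exposed at /proc/sys/kernel/... */
	int fd = open("/proc/sys/kernel/bpf_force_dyn_alloc", O_WRONLY);

	if (fd < 0) {
		perror("open");	/* pre-patch kernel or insufficient privilege */
		return 1;
	}
	/* "1" forces BPF_F_NO_PREALLOC on subsequently created non-LRU hash maps */
	if (write(fd, "1", 1) != 1)
		perror("write");
	close(fd);
	return 0;
}

With the value set to 1, hash maps created without BPF_F_NO_PREALLOC behave as if the flag had been passed, which matches the commit message's claim that the only observable difference should be memory usage.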