From patchwork Fri Dec 30 04:11:46 2022
X-Patchwork-Submitter: Hou Tao
X-Patchwork-Id: 13084061
X-Mailing-List: rcu@vger.kernel.org
From: Hou Tao
To: bpf@vger.kernel.org
Cc: Martin KaFai Lau, Andrii Nakryiko, Song Liu, Hao Luo, Yonghong Song, Alexei Starovoitov, Daniel Borkmann, KP Singh, Stanislav Fomichev, Jiri Olsa, John Fastabend, "Paul E. McKenney", rcu@vger.kernel.org, houtao1@huawei.com
Subject: [RFC PATCH bpf-next 1/6] bpf: Support ctor in bpf memory allocator
Date: Fri, 30 Dec 2022 12:11:46 +0800
Message-Id: <20221230041151.1231169-2-houtao@huaweicloud.com>
In-Reply-To: <20221230041151.1231169-1-houtao@huaweicloud.com>
References: <20221230041151.1231169-1-houtao@huaweicloud.com>

From: Hou Tao

Currently a freed element in the bpf memory allocator may be reused
immediately. For htab maps, the reuse reinitializes the special fields in
the map value (e.g., bpf_spin_lock) while the lookup procedure may still be
accessing those fields, which can lead to a hard lockup as shown below:
NMI backtrace for cpu 16
CPU: 16 PID: 2574 Comm: htab.bin Tainted: G L 6.1.0+ #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)
RIP: 0010:queued_spin_lock_slowpath+0x283/0x2c0
......
Call Trace:
 copy_map_value_locked+0xb7/0x170
 bpf_map_copy_value+0x113/0x3c0
 __sys_bpf+0x1c67/0x2780
 __x64_sys_bpf+0x1c/0x20
 do_syscall_64+0x30/0x60
 entry_SYSCALL_64_after_hwframe+0x46/0xb0
......

For htab map, just like the preallocated case, there is no need to
initialize these special fields in the map value again once they have been
initialized, but currently only the bpf memory allocator knows whether an
allocated object is newly allocated or reused. So introduce ctor support in
the bpf memory allocator and call the ctor for an allocated object only when
it is newly allocated.

Fixes: 0fd7c5d43339 ("bpf: Optimize call_rcu in non-preallocated hash map.")
Signed-off-by: Hou Tao
--- include/linux/bpf_mem_alloc.h | 4 +++- kernel/bpf/core.c | 2 +- kernel/bpf/hashtab.c | 16 ++++++++++++---- kernel/bpf/memalloc.c | 10 +++++++++- 4 files changed, 25 insertions(+), 7 deletions(-)
diff --git a/include/linux/bpf_mem_alloc.h b/include/linux/bpf_mem_alloc.h index 3e164b8efaa9..3c287db087e7 100644 --- a/include/linux/bpf_mem_alloc.h +++ b/include/linux/bpf_mem_alloc.h @@ -12,9 +12,11 @@ struct bpf_mem_alloc { struct bpf_mem_caches __percpu *caches; struct bpf_mem_cache __percpu *cache; struct work_struct work; + void (*ctor)(struct bpf_mem_alloc *ma, void *obj); }; -int bpf_mem_alloc_init(struct bpf_mem_alloc *ma, int size, bool percpu); +int bpf_mem_alloc_init(struct bpf_mem_alloc *ma, int size, bool percpu, + void (*ctor)(struct bpf_mem_alloc *, void *)); void bpf_mem_alloc_destroy(struct bpf_mem_alloc *ma); /* kmalloc/kfree equivalent: */
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c index 7f98dec6e90f..6da2f9a6b085 100644 --- a/kernel/bpf/core.c +++ b/kernel/bpf/core.c @@ -2755,7 +2755,7 @@ static int __init bpf_global_ma_init(void) { int ret; - ret = bpf_mem_alloc_init(&bpf_global_ma, 0, false); + ret = bpf_mem_alloc_init(&bpf_global_ma, 0, false, NULL); bpf_global_ma_set = !ret; return ret; }
diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c index 5aa2b5525f79..3d6557ec4b92 100644 --- a/kernel/bpf/hashtab.c +++ b/kernel/bpf/hashtab.c @@ -453,6 +453,15 @@ static int htab_map_alloc_check(union bpf_attr *attr) return 0; } +static void htab_elem_ctor(struct bpf_mem_alloc *ma, void *obj) +{ + struct bpf_htab *htab = container_of(ma, struct bpf_htab, ma); + struct htab_elem *elem = obj; + + check_and_init_map_value(&htab->map, + elem->key + round_up(htab->map.key_size, 8)); +} + static struct bpf_map *htab_map_alloc(union bpf_attr *attr) { bool percpu = (attr->map_type == BPF_MAP_TYPE_PERCPU_HASH || @@ -565,12 +574,13 @@ static struct bpf_map *htab_map_alloc(union bpf_attr *attr) goto free_prealloc; } } else { - err = bpf_mem_alloc_init(&htab->ma, htab->elem_size, false); + err = bpf_mem_alloc_init(&htab->ma, htab->elem_size, false, + htab_elem_ctor); if (err) goto free_map_locked; if (percpu) { err = bpf_mem_alloc_init(&htab->pcpu_ma, - round_up(htab->map.value_size, 8), true); + round_up(htab->map.value_size, 8), true, NULL); if (err) goto free_map_locked; } @@ -1004,8 +1014,6 @@ static struct htab_elem *alloc_htab_elem(struct bpf_htab *htab, void *key, l_new = ERR_PTR(-ENOMEM); goto dec_count; } - check_and_init_map_value(&htab->map, - l_new->key + round_up(key_size, 8)); } memcpy(l_new->key, key, key_size);
diff --git a/kernel/bpf/memalloc.c b/kernel/bpf/memalloc.c index ebcc3dd0fa19..ac5b92fece14 100644 ---
a/kernel/bpf/memalloc.c +++ b/kernel/bpf/memalloc.c @@ -98,6 +98,7 @@ struct bpf_mem_cache { int free_cnt; int low_watermark, high_watermark, batch; int percpu_size; + struct bpf_mem_alloc *ma; struct rcu_head rcu; struct llist_head free_by_rcu; @@ -188,6 +189,9 @@ static void alloc_bulk(struct bpf_mem_cache *c, int cnt, int node) obj = __alloc(c, node); if (!obj) break; + /* Only do initialize for newly allocated object */ + if (c->ma->ctor) + c->ma->ctor(c->ma, obj); } if (IS_ENABLED(CONFIG_PREEMPT_RT)) /* In RT irq_work runs in per-cpu kthread, so disable @@ -374,7 +378,8 @@ static void prefill_mem_cache(struct bpf_mem_cache *c, int cpu) * kmalloc/kfree. Max allocation size is 4096 in this case. * This is bpf_dynptr and bpf_kptr use case. */ -int bpf_mem_alloc_init(struct bpf_mem_alloc *ma, int size, bool percpu) +int bpf_mem_alloc_init(struct bpf_mem_alloc *ma, int size, bool percpu, + void (*ctor)(struct bpf_mem_alloc *, void *)) { static u16 sizes[NUM_CACHES] = {96, 192, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096}; struct bpf_mem_caches *cc, __percpu *pcc; @@ -382,6 +387,7 @@ int bpf_mem_alloc_init(struct bpf_mem_alloc *ma, int size, bool percpu) struct obj_cgroup *objcg = NULL; int cpu, i, unit_size, percpu_size = 0; + ma->ctor = ctor; if (size) { pc = __alloc_percpu_gfp(sizeof(*pc), 8, GFP_KERNEL); if (!pc) @@ -402,6 +408,7 @@ int bpf_mem_alloc_init(struct bpf_mem_alloc *ma, int size, bool percpu) c->unit_size = unit_size; c->objcg = objcg; c->percpu_size = percpu_size; + c->ma = ma; prefill_mem_cache(c, cpu); } ma->cache = pc; @@ -424,6 +431,7 @@ int bpf_mem_alloc_init(struct bpf_mem_alloc *ma, int size, bool percpu) c = &cc->cache[i]; c->unit_size = sizes[i]; c->objcg = objcg; + c->ma = ma; prefill_mem_cache(c, cpu); } } From patchwork Fri Dec 30 04:11:47 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Hou Tao X-Patchwork-Id: 13084065 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3F6ACC3DA7C for ; Fri, 30 Dec 2022 04:12:21 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229655AbiL3EMS (ORCPT ); Thu, 29 Dec 2022 23:12:18 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:37678 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234158AbiL3EMJ (ORCPT ); Thu, 29 Dec 2022 23:12:09 -0500 Received: from dggsgout11.his.huawei.com (dggsgout11.his.huawei.com [45.249.212.51]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 59880186E2; Thu, 29 Dec 2022 20:12:07 -0800 (PST) Received: from mail02.huawei.com (unknown [172.30.67.153]) by dggsgout11.his.huawei.com (SkyGuard) with ESMTP id 4NjsKC4w7sz4f3lXL; Fri, 30 Dec 2022 12:11:59 +0800 (CST) Received: from huaweicloud.com (unknown [10.175.124.27]) by APP4 (Coremail) with SMTP id gCh0CgAHcLMPZa5j3H4SAw--.35465S6; Fri, 30 Dec 2022 12:12:02 +0800 (CST) From: Hou Tao To: bpf@vger.kernel.org Cc: Martin KaFai Lau , Andrii Nakryiko , Song Liu , Hao Luo , Yonghong Song , Alexei Starovoitov , Daniel Borkmann , KP Singh , Stanislav Fomichev , Jiri Olsa , John Fastabend , "Paul E . 
McKenney", rcu@vger.kernel.org, houtao1@huawei.com
Subject: [RFC PATCH bpf-next 2/6] bpf: Factor out a common helper free_llist()
Date: Fri, 30 Dec 2022 12:11:47 +0800
Message-Id: <20221230041151.1231169-3-houtao@huaweicloud.com>
In-Reply-To: <20221230041151.1231169-1-houtao@huaweicloud.com>
References: <20221230041151.1231169-1-houtao@huaweicloud.com>
X-Mailing-List: rcu@vger.kernel.org

From: Hou Tao

Factor out a common helper free_llist() to free normal elements or per-cpu
elements on a lock-less list.
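The refactoring relies on the "safe" form of list iteration: the next
pointer is read before the current node is handed to free_one(), so freeing
a node cannot break the traversal. Below is a minimal, self-contained
userspace sketch of that pattern in plain C; the plain singly linked list
and the names are illustrative stand-ins, not the kernel's llist API.

#include <stdlib.h>

struct node {
	struct node *next;
	int val;
};

/* Free every node on a singly linked list. The next pointer is saved
 * before free(), so the walk survives the node being released, mirroring
 * what llist_for_each_safe() does for free_llist(). */
static void free_list(struct node *head)
{
	struct node *pos = head, *next;

	while (pos) {
		next = pos->next;	/* save before freeing */
		free(pos);
		pos = next;
	}
}

int main(void)
{
	struct node *head = NULL;

	for (int i = 0; i < 3; i++) {
		struct node *n = malloc(sizeof(*n));

		n->val = i;
		n->next = head;
		head = n;
	}
	free_list(head);
	return 0;
}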
Signed-off-by: Hou Tao --- kernel/bpf/memalloc.c | 31 ++++++++++++++++--------------- 1 file changed, 16 insertions(+), 15 deletions(-) diff --git a/kernel/bpf/memalloc.c b/kernel/bpf/memalloc.c index ac5b92fece14..3ad2e25946b5 100644 --- a/kernel/bpf/memalloc.c +++ b/kernel/bpf/memalloc.c @@ -217,9 +217,9 @@ static void alloc_bulk(struct bpf_mem_cache *c, int cnt, int node) mem_cgroup_put(memcg); } -static void free_one(struct bpf_mem_cache *c, void *obj) +static void free_one(void *obj, bool percpu) { - if (c->percpu_size) { + if (percpu) { free_percpu(((void **)obj)[1]); kfree(obj); return; @@ -228,14 +228,19 @@ static void free_one(struct bpf_mem_cache *c, void *obj) kfree(obj); } -static void __free_rcu(struct rcu_head *head) +static void free_llist(struct llist_node *llnode, bool percpu) { - struct bpf_mem_cache *c = container_of(head, struct bpf_mem_cache, rcu); - struct llist_node *llnode = llist_del_all(&c->waiting_for_gp); struct llist_node *pos, *t; llist_for_each_safe(pos, t, llnode) - free_one(c, pos); + free_one(pos, percpu); +} + +static void __free_rcu(struct rcu_head *head) +{ + struct bpf_mem_cache *c = container_of(head, struct bpf_mem_cache, rcu); + + free_llist(llist_del_all(&c->waiting_for_gp), !!c->percpu_size); atomic_set(&c->call_rcu_in_progress, 0); } @@ -441,7 +446,7 @@ int bpf_mem_alloc_init(struct bpf_mem_alloc *ma, int size, bool percpu, static void drain_mem_cache(struct bpf_mem_cache *c) { - struct llist_node *llnode, *t; + bool percpu = !!c->percpu_size; /* No progs are using this bpf_mem_cache, but htab_map_free() called * bpf_mem_cache_free() for all remaining elements and they can be in @@ -450,14 +455,10 @@ static void drain_mem_cache(struct bpf_mem_cache *c) * Except for waiting_for_gp list, there are no concurrent operations * on these lists, so it is safe to use __llist_del_all(). 
*/ - llist_for_each_safe(llnode, t, __llist_del_all(&c->free_by_rcu)) - free_one(c, llnode); - llist_for_each_safe(llnode, t, llist_del_all(&c->waiting_for_gp)) - free_one(c, llnode); - llist_for_each_safe(llnode, t, __llist_del_all(&c->free_llist)) - free_one(c, llnode); - llist_for_each_safe(llnode, t, __llist_del_all(&c->free_llist_extra)) - free_one(c, llnode); + free_llist(__llist_del_all(&c->free_by_rcu), percpu); + free_llist(llist_del_all(&c->waiting_for_gp), percpu); + free_llist(__llist_del_all(&c->free_llist), percpu); + free_llist(__llist_del_all(&c->free_llist_extra), percpu); } static void free_mem_alloc_no_barrier(struct bpf_mem_alloc *ma) From patchwork Fri Dec 30 04:11:48 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Hou Tao X-Patchwork-Id: 13084062 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id EB160C4708E for ; Fri, 30 Dec 2022 04:12:15 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234234AbiL3EMO (ORCPT ); Thu, 29 Dec 2022 23:12:14 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:37676 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234024AbiL3EMJ (ORCPT ); Thu, 29 Dec 2022 23:12:09 -0500 Received: from dggsgout11.his.huawei.com (dggsgout11.his.huawei.com [45.249.212.51]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 59821186DF; Thu, 29 Dec 2022 20:12:07 -0800 (PST) Received: from mail02.huawei.com (unknown [172.30.67.153]) by dggsgout11.his.huawei.com (SkyGuard) with ESMTP id 4NjsKD1stSz4f3k5d; Fri, 30 Dec 2022 12:12:00 +0800 (CST) Received: from huaweicloud.com (unknown [10.175.124.27]) by APP4 (Coremail) with SMTP id gCh0CgAHcLMPZa5j3H4SAw--.35465S7; Fri, 30 Dec 2022 12:12:02 +0800 (CST) From: Hou Tao To: bpf@vger.kernel.org Cc: Martin KaFai Lau , Andrii Nakryiko , Song Liu , Hao Luo , Yonghong Song , Alexei Starovoitov , Daniel Borkmann , KP Singh , Stanislav Fomichev , Jiri Olsa , John Fastabend , "Paul E . 
McKenney", rcu@vger.kernel.org, houtao1@huawei.com
Subject: [RFC PATCH bpf-next 3/6] bpf: Pass bitwise flags to bpf_mem_alloc_init()
Date: Fri, 30 Dec 2022 12:11:48 +0800
Message-Id: <20221230041151.1231169-4-houtao@huaweicloud.com>
In-Reply-To: <20221230041151.1231169-1-houtao@huaweicloud.com>
References: <20221230041151.1231169-1-houtao@huaweicloud.com>
X-Mailing-List: rcu@vger.kernel.org

From: Hou Tao

Extend the boolean percpu argument of bpf_mem_alloc_init() into a bitwise
flags argument, so that new flags can be added later.
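This is the usual bool-to-flags conversion: the old boolean becomes one bit
in an unsigned flags word and is recovered inside the function, so future
flags can share the same parameter. A small self-contained sketch of the
pattern is shown below; the names (ma_init, MA_PERCPU) are illustrative
stand-ins rather than the real kernel symbols.

#include <stdio.h>

/* Flag bits, standing in for the enum added to bpf_mem_alloc.h. */
enum {
	MA_PERCPU = 1,	/* was: bool percpu */
};

/* Before: int ma_init(int size, bool percpu);
 * After:  int ma_init(int size, unsigned int flags);
 * The old boolean is recovered from the flags word inside the function. */
static int ma_init(int size, unsigned int flags)
{
	int percpu = !!(flags & MA_PERCPU);

	printf("size=%d percpu=%d\n", size, percpu);
	return 0;
}

int main(void)
{
	ma_init(64, 0);			/* former "false" caller */
	ma_init(64, MA_PERCPU);		/* former "true" caller */
	return 0;
}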
Signed-off-by: Hou Tao --- include/linux/bpf_mem_alloc.h | 8 +++++++- kernel/bpf/core.c | 2 +- kernel/bpf/hashtab.c | 5 +++-- kernel/bpf/memalloc.c | 4 +++- 4 files changed, 14 insertions(+), 5 deletions(-) diff --git a/include/linux/bpf_mem_alloc.h b/include/linux/bpf_mem_alloc.h index 3c287db087e7..b9f6b9155fa5 100644 --- a/include/linux/bpf_mem_alloc.h +++ b/include/linux/bpf_mem_alloc.h @@ -13,9 +13,15 @@ struct bpf_mem_alloc { struct bpf_mem_cache __percpu *cache; struct work_struct work; void (*ctor)(struct bpf_mem_alloc *ma, void *obj); + unsigned int flags; }; -int bpf_mem_alloc_init(struct bpf_mem_alloc *ma, int size, bool percpu, +/* flags for bpf_mem_alloc_init() */ +enum { + BPF_MA_PERCPU = 1, +}; + +int bpf_mem_alloc_init(struct bpf_mem_alloc *ma, int size, unsigned int flags, void (*ctor)(struct bpf_mem_alloc *, void *)); void bpf_mem_alloc_destroy(struct bpf_mem_alloc *ma); diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c index 6da2f9a6b085..ca9a698c3f08 100644 --- a/kernel/bpf/core.c +++ b/kernel/bpf/core.c @@ -2755,7 +2755,7 @@ static int __init bpf_global_ma_init(void) { int ret; - ret = bpf_mem_alloc_init(&bpf_global_ma, 0, false, NULL); + ret = bpf_mem_alloc_init(&bpf_global_ma, 0, 0, NULL); bpf_global_ma_set = !ret; return ret; } diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c index 3d6557ec4b92..623111d4276d 100644 --- a/kernel/bpf/hashtab.c +++ b/kernel/bpf/hashtab.c @@ -574,13 +574,14 @@ static struct bpf_map *htab_map_alloc(union bpf_attr *attr) goto free_prealloc; } } else { - err = bpf_mem_alloc_init(&htab->ma, htab->elem_size, false, + err = bpf_mem_alloc_init(&htab->ma, htab->elem_size, 0, htab_elem_ctor); if (err) goto free_map_locked; if (percpu) { err = bpf_mem_alloc_init(&htab->pcpu_ma, - round_up(htab->map.value_size, 8), true, NULL); + round_up(htab->map.value_size, 8), + BPF_MA_PERCPU, NULL); if (err) goto free_map_locked; } diff --git a/kernel/bpf/memalloc.c b/kernel/bpf/memalloc.c index 3ad2e25946b5..454c86596111 100644 --- a/kernel/bpf/memalloc.c +++ b/kernel/bpf/memalloc.c @@ -383,7 +383,7 @@ static void prefill_mem_cache(struct bpf_mem_cache *c, int cpu) * kmalloc/kfree. Max allocation size is 4096 in this case. * This is bpf_dynptr and bpf_kptr use case. 
*/ -int bpf_mem_alloc_init(struct bpf_mem_alloc *ma, int size, bool percpu, +int bpf_mem_alloc_init(struct bpf_mem_alloc *ma, int size, unsigned int flags, void (*ctor)(struct bpf_mem_alloc *, void *)) { static u16 sizes[NUM_CACHES] = {96, 192, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096}; @@ -391,7 +391,9 @@ int bpf_mem_alloc_init(struct bpf_mem_alloc *ma, int size, bool percpu, struct bpf_mem_cache *c, __percpu *pc; struct obj_cgroup *objcg = NULL; int cpu, i, unit_size, percpu_size = 0; + bool percpu = (flags & BPF_MA_PERCPU); + ma->flags = flags; ma->ctor = ctor; if (size) { pc = __alloc_percpu_gfp(sizeof(*pc), 8, GFP_KERNEL); From patchwork Fri Dec 30 04:11:49 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Hou Tao X-Patchwork-Id: 13084060 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 2A8ECC10F1B for ; Fri, 30 Dec 2022 04:12:13 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234374AbiL3EML (ORCPT ); Thu, 29 Dec 2022 23:12:11 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:37672 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234007AbiL3EMJ (ORCPT ); Thu, 29 Dec 2022 23:12:09 -0500 Received: from dggsgout12.his.huawei.com (unknown [45.249.212.56]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 21B7418695; Thu, 29 Dec 2022 20:12:06 -0800 (PST) Received: from mail02.huawei.com (unknown [172.30.67.153]) by dggsgout12.his.huawei.com (SkyGuard) with ESMTP id 4NjsKD5F9Gz4f3nq1; Fri, 30 Dec 2022 12:12:00 +0800 (CST) Received: from huaweicloud.com (unknown [10.175.124.27]) by APP4 (Coremail) with SMTP id gCh0CgAHcLMPZa5j3H4SAw--.35465S8; Fri, 30 Dec 2022 12:12:03 +0800 (CST) From: Hou Tao To: bpf@vger.kernel.org Cc: Martin KaFai Lau , Andrii Nakryiko , Song Liu , Hao Luo , Yonghong Song , Alexei Starovoitov , Daniel Borkmann , KP Singh , Stanislav Fomichev , Jiri Olsa , John Fastabend , "Paul E . 
McKenney", rcu@vger.kernel.org, houtao1@huawei.com
Subject: [RFC PATCH bpf-next 4/6] bpf: Introduce BPF_MA_NO_REUSE for bpf memory allocator
Date: Fri, 30 Dec 2022 12:11:49 +0800
Message-Id: <20221230041151.1231169-5-houtao@huaweicloud.com>
In-Reply-To: <20221230041151.1231169-1-houtao@huaweicloud.com>
References: <20221230041151.1231169-1-houtao@huaweicloud.com>
X-Mailing-List: rcu@vger.kernel.org

From: Hou Tao

Currently a freed element in the bpf memory allocator may be reused by a new
allocation, and the reuse can lead to two problems. First, the lookup
procedure may return an incorrect result if the found element has been freed
and then reused. Second, the lookup procedure may still be using special
fields in the map value or allocated object while these fields are being
reinitialized by a new allocation. The latter problem can be mitigated by
using a ctor in the bpf memory allocator, but that only works for the case
in which all elements have the same type.

So introduce BPF_MA_NO_REUSE to disable the immediate reuse of freed
elements. Instead, freed elements are moved onto a global per-cpu free list;
once the number of freed elements reaches a threshold, they are moved into a
dynamically allocated batch object and freed by a global per-cpu worker
through call_rcu_tasks_trace().
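Before reading the diff, the deferred-free scheme may be easier to follow
from a simplified model: a free operation only queues the element on a
per-context list, and once the number of queued elements crosses a threshold
the whole list is detached as a batch and released later. The
single-threaded userspace sketch below models only that flow; all names are
illustrative, and the immediate drain_batch() call stands in for the
RCU-tasks-trace grace period and per-cpu worker used by the real code.

#include <stdlib.h>

struct obj {
	struct obj *next;
};

/* Stand-in for the per-cpu free context: a list of freed objects plus a
 * counter, drained as one batch once the threshold is reached. */
struct free_ctx {
	struct obj *to_free;
	unsigned int to_free_cnt;
};

#define FREE_THRESHOLD 4

/* Stand-in for the deferred worker: release every object in the batch. */
static void drain_batch(struct obj *batch)
{
	while (batch) {
		struct obj *next = batch->next;

		free(batch);
		batch = next;
	}
}

/* "Free" an object without allowing reuse: queue it and defer the real
 * free until a whole batch has accumulated. */
static void no_reuse_free(struct free_ctx *ctx, struct obj *o)
{
	o->next = ctx->to_free;
	ctx->to_free = o;
	if (++ctx->to_free_cnt >= FREE_THRESHOLD) {
		struct obj *batch = ctx->to_free;

		ctx->to_free = NULL;
		ctx->to_free_cnt = 0;
		drain_batch(batch);	/* deferred in the real code */
	}
}

int main(void)
{
	struct free_ctx ctx = {0};

	for (int i = 0; i < 10; i++)
		no_reuse_free(&ctx, calloc(1, sizeof(struct obj)));
	drain_batch(ctx.to_free);	/* flush the remainder */
	return 0;
}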
Signed-off-by: Hou Tao --- include/linux/bpf_mem_alloc.h | 2 + kernel/bpf/memalloc.c | 175 +++++++++++++++++++++++++++++++++- 2 files changed, 173 insertions(+), 4 deletions(-) diff --git a/include/linux/bpf_mem_alloc.h b/include/linux/bpf_mem_alloc.h index b9f6b9155fa5..2a10b721832d 100644 --- a/include/linux/bpf_mem_alloc.h +++ b/include/linux/bpf_mem_alloc.h @@ -19,6 +19,8 @@ struct bpf_mem_alloc { /* flags for bpf_mem_alloc_init() */ enum { BPF_MA_PERCPU = 1, + /* Don't reuse freed elements during allocation */ + BPF_MA_NO_REUSE = 2, }; int bpf_mem_alloc_init(struct bpf_mem_alloc *ma, int size, unsigned int flags, diff --git a/kernel/bpf/memalloc.c b/kernel/bpf/memalloc.c index 454c86596111..e5eaf765624b 100644 --- a/kernel/bpf/memalloc.c +++ b/kernel/bpf/memalloc.c @@ -35,6 +35,23 @@ */ #define LLIST_NODE_SZ sizeof(struct llist_node) +#define BPF_MA_FREE_TYPE_NR 2 + +struct bpf_ma_free_context { + raw_spinlock_t lock; + local_t active; + /* For both no per-cpu and per-cpu */ + struct llist_head to_free[BPF_MA_FREE_TYPE_NR]; + unsigned int to_free_cnt[BPF_MA_FREE_TYPE_NR]; + struct llist_head to_free_extra[BPF_MA_FREE_TYPE_NR]; + struct delayed_work dwork; +}; + +struct bpf_ma_free_batch { + struct rcu_head rcu; + struct llist_node *to_free[BPF_MA_FREE_TYPE_NR]; +}; + /* similar to kmalloc, but sizeof == 8 bucket is gone */ static u8 size_index[24] __ro_after_init = { 3, /* 8 */ @@ -63,6 +80,9 @@ static u8 size_index[24] __ro_after_init = { 2 /* 192 */ }; +static DEFINE_PER_CPU(struct bpf_ma_free_context, percpu_free_ctx); +static struct workqueue_struct *bpf_ma_free_wq; + static int bpf_mem_cache_idx(size_t size) { if (!size || size > 4096) @@ -609,14 +629,11 @@ static void notrace *unit_alloc(struct bpf_mem_cache *c) * add it to the free_llist of the current cpu. * Let kfree() logic deal with it when it's later called from irq_work. */ -static void notrace unit_free(struct bpf_mem_cache *c, void *ptr) +static void notrace reuse_free(struct bpf_mem_cache *c, struct llist_node *llnode) { - struct llist_node *llnode = ptr - LLIST_NODE_SZ; unsigned long flags; int cnt = 0; - BUILD_BUG_ON(LLIST_NODE_SZ > 8); - local_irq_save(flags); if (local_inc_return(&c->active) == 1) { __llist_add(llnode, &c->free_llist); @@ -638,6 +655,137 @@ static void notrace unit_free(struct bpf_mem_cache *c, void *ptr) irq_work_raise(c); } +static void batch_free_rcu(struct rcu_head *rcu) +{ + struct bpf_ma_free_batch *batch = container_of(rcu, struct bpf_ma_free_batch, rcu); + + free_llist(batch->to_free[0], false); + free_llist(batch->to_free[1], true); + kfree(batch); +} + +static void batch_free_rcu_tasks_trace(struct rcu_head *rcu) +{ + if (rcu_trace_implies_rcu_gp()) + batch_free_rcu(rcu); + else + call_rcu(rcu, batch_free_rcu); +} + +static void bpf_ma_schedule_free_dwork(struct bpf_ma_free_context *ctx) +{ + long delay, left; + u64 to_free_cnt; + + to_free_cnt = ctx->to_free_cnt[0] + ctx->to_free_cnt[1]; + delay = to_free_cnt >= 256 ? 
1 : HZ; + if (delayed_work_pending(&ctx->dwork)) { + left = ctx->dwork.timer.expires - jiffies; + if (delay < left) + mod_delayed_work(bpf_ma_free_wq, &ctx->dwork, delay); + return; + } + queue_delayed_work(bpf_ma_free_wq, &ctx->dwork, delay); +} + +static void bpf_ma_splice_to_free_list(struct bpf_ma_free_context *ctx, struct llist_node **to_free) +{ + struct llist_node *tmp[BPF_MA_FREE_TYPE_NR]; + unsigned long flags; + unsigned int i; + + raw_spin_lock_irqsave(&ctx->lock, flags); + for (i = 0; i < ARRAY_SIZE(tmp); i++) { + tmp[i] = __llist_del_all(&ctx->to_free[i]); + ctx->to_free_cnt[i] = 0; + } + raw_spin_unlock_irqrestore(&ctx->lock, flags); + + for (i = 0; i < ARRAY_SIZE(tmp); i++) { + struct llist_node *first, *last; + + first = llist_del_all(&ctx->to_free_extra[i]); + if (!first) { + to_free[i] = tmp[i]; + continue; + } + last = first; + while (last->next) + last = last->next; + to_free[i] = first; + last->next = tmp[i]; + } +} + +static inline bool bpf_ma_has_to_free(const struct bpf_ma_free_context *ctx) +{ + return !llist_empty(&ctx->to_free[0]) || !llist_empty(&ctx->to_free[1]) || + !llist_empty(&ctx->to_free_extra[0]) || !llist_empty(&ctx->to_free_extra[1]); +} + +static void bpf_ma_free_dwork(struct work_struct *work) +{ + struct bpf_ma_free_context *ctx = container_of(to_delayed_work(work), + struct bpf_ma_free_context, dwork); + struct llist_node *to_free[BPF_MA_FREE_TYPE_NR]; + struct bpf_ma_free_batch *batch; + unsigned long flags; + + bpf_ma_splice_to_free_list(ctx, to_free); + + batch = kmalloc(sizeof(*batch), GFP_NOWAIT | __GFP_NOWARN); + if (!batch) { + /* TODO: handle ENOMEM case better ? */ + rcu_barrier_tasks_trace(); + rcu_barrier(); + free_llist(to_free[0], false); + free_llist(to_free[1], true); + goto check; + } + + batch->to_free[0] = to_free[0]; + batch->to_free[1] = to_free[1]; + call_rcu_tasks_trace(&batch->rcu, batch_free_rcu_tasks_trace); +check: + raw_spin_lock_irqsave(&ctx->lock, flags); + if (bpf_ma_has_to_free(ctx)) + bpf_ma_schedule_free_dwork(ctx); + raw_spin_unlock_irqrestore(&ctx->lock, flags); +} + +static void notrace direct_free(struct bpf_mem_cache *c, struct llist_node *llnode) +{ + struct bpf_ma_free_context *ctx; + bool percpu = !!c->percpu_size; + unsigned long flags; + + local_irq_save(flags); + ctx = this_cpu_ptr(&percpu_free_ctx); + if (local_inc_return(&ctx->active) == 1) { + raw_spin_lock(&ctx->lock); + __llist_add(llnode, &ctx->to_free[percpu]); + ctx->to_free_cnt[percpu] += 1; + bpf_ma_schedule_free_dwork(ctx); + raw_spin_unlock(&ctx->lock); + } else { + llist_add(llnode, &ctx->to_free_extra[percpu]); + } + local_dec(&ctx->active); + local_irq_restore(flags); +} + +static inline void unit_free(struct bpf_mem_cache *c, void *ptr) +{ + struct llist_node *llnode = ptr - LLIST_NODE_SZ; + + BUILD_BUG_ON(LLIST_NODE_SZ > 8); + + if (c->ma->flags & BPF_MA_NO_REUSE) + direct_free(c, llnode); + else + reuse_free(c, llnode); +} + /* Called from BPF program or from sys_bpf syscall. * In both cases migration is disabled. 
*/ @@ -686,3 +834,22 @@ void notrace bpf_mem_cache_free(struct bpf_mem_alloc *ma, void *ptr) unit_free(this_cpu_ptr(ma->cache), ptr); } + +static int __init bpf_ma_init(void) +{ + int cpu; + + bpf_ma_free_wq = alloc_workqueue("bpf_ma_free", WQ_MEM_RECLAIM, 0); + BUG_ON(!bpf_ma_free_wq); + + for_each_possible_cpu(cpu) { + struct bpf_ma_free_context *ctx; + + ctx = per_cpu_ptr(&percpu_free_ctx, cpu); + raw_spin_lock_init(&ctx->lock); + INIT_DELAYED_WORK(&ctx->dwork, bpf_ma_free_dwork); + } + + return 0; +} +fs_initcall(bpf_ma_init); From patchwork Fri Dec 30 04:11:50 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Hou Tao X-Patchwork-Id: 13084064 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 092D5C10F1B for ; Fri, 30 Dec 2022 04:12:21 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234095AbiL3EMQ (ORCPT ); Thu, 29 Dec 2022 23:12:16 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:37674 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234204AbiL3EMJ (ORCPT ); Thu, 29 Dec 2022 23:12:09 -0500 Received: from dggsgout11.his.huawei.com (dggsgout11.his.huawei.com [45.249.212.51]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 597D4DFC6; Thu, 29 Dec 2022 20:12:07 -0800 (PST) Received: from mail02.huawei.com (unknown [172.30.67.153]) by dggsgout11.his.huawei.com (SkyGuard) with ESMTP id 4NjsKF32Yfz4f3lXl; Fri, 30 Dec 2022 12:12:01 +0800 (CST) Received: from huaweicloud.com (unknown [10.175.124.27]) by APP4 (Coremail) with SMTP id gCh0CgAHcLMPZa5j3H4SAw--.35465S9; Fri, 30 Dec 2022 12:12:04 +0800 (CST) From: Hou Tao To: bpf@vger.kernel.org Cc: Martin KaFai Lau , Andrii Nakryiko , Song Liu , Hao Luo , Yonghong Song , Alexei Starovoitov , Daniel Borkmann , KP Singh , Stanislav Fomichev , Jiri Olsa , John Fastabend , "Paul E . 
McKenney", rcu@vger.kernel.org, houtao1@huawei.com
Subject: [RFC PATCH bpf-next 5/6] bpf: Use BPF_MA_NO_REUSE in htab map
Date: Fri, 30 Dec 2022 12:11:50 +0800
Message-Id: <20221230041151.1231169-6-houtao@huaweicloud.com>
In-Reply-To: <20221230041151.1231169-1-houtao@huaweicloud.com>
References: <20221230041151.1231169-1-houtao@huaweicloud.com>
X-Mailing-List: rcu@vger.kernel.org

From: Hou Tao

Use BPF_MA_NO_REUSE in the htab map to disable the immediate reuse of freed
elements, so the lookup procedure will not return an incorrect result. After
the change, the performance of "./map_perf_test 4 18 8192" drops from 520K
to 330K events per second on one CPU.
Signed-off-by: Hou Tao --- kernel/bpf/hashtab.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c index 623111d4276d..e1636c5d0051 100644 --- a/kernel/bpf/hashtab.c +++ b/kernel/bpf/hashtab.c @@ -574,14 +574,14 @@ static struct bpf_map *htab_map_alloc(union bpf_attr *attr) goto free_prealloc; } } else { - err = bpf_mem_alloc_init(&htab->ma, htab->elem_size, 0, + err = bpf_mem_alloc_init(&htab->ma, htab->elem_size, BPF_MA_NO_REUSE, htab_elem_ctor); if (err) goto free_map_locked; if (percpu) { err = bpf_mem_alloc_init(&htab->pcpu_ma, round_up(htab->map.value_size, 8), - BPF_MA_PERCPU, NULL); + BPF_MA_PERCPU | BPF_MA_NO_REUSE, NULL); if (err) goto free_map_locked; } From patchwork Fri Dec 30 04:11:51 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Hou Tao X-Patchwork-Id: 13084063 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8D5AEC4332F for ; Fri, 30 Dec 2022 04:12:16 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234024AbiL3EMP (ORCPT ); Thu, 29 Dec 2022 23:12:15 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:37680 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234095AbiL3EMJ (ORCPT ); Thu, 29 Dec 2022 23:12:09 -0500 Received: from dggsgout12.his.huawei.com (unknown [45.249.212.56]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 7522A186C5; Thu, 29 Dec 2022 20:12:07 -0800 (PST) Received: from mail02.huawei.com (unknown [172.30.67.153]) by dggsgout12.his.huawei.com (SkyGuard) with ESMTP id 4NjsKF6DnKz4f3nqS; Fri, 30 Dec 2022 12:12:01 +0800 (CST) Received: from huaweicloud.com (unknown [10.175.124.27]) by APP4 (Coremail) with SMTP id gCh0CgAHcLMPZa5j3H4SAw--.35465S10; Fri, 30 Dec 2022 12:12:04 +0800 (CST) From: Hou Tao To: bpf@vger.kernel.org Cc: Martin KaFai Lau , Andrii Nakryiko , Song Liu , Hao Luo , Yonghong Song , Alexei Starovoitov , Daniel Borkmann , KP Singh , Stanislav Fomichev , Jiri Olsa , John Fastabend , "Paul E . 
McKenney", rcu@vger.kernel.org, houtao1@huawei.com
Subject: [RFC PATCH bpf-next 6/6] selftests/bpf: Add test case for element reuse in htab map
Date: Fri, 30 Dec 2022 12:11:51 +0800
Message-Id: <20221230041151.1231169-7-houtao@huaweicloud.com>
In-Reply-To: <20221230041151.1231169-1-houtao@huaweicloud.com>
References: <20221230041151.1231169-1-houtao@huaweicloud.com>
X-Mailing-List: rcu@vger.kernel.org

From: Hou Tao

The immediate reuse of freed htab elements can lead to two problems:

1) a lookup may return an unexpected map value if the found element has
   been deleted and then reused.
2) the reinitialization of the spin-lock in the map value after reuse may
   corrupt a lookup done with the BPF_F_LOCK flag and result in a hard
   lock-up.

So add one test case to demonstrate these two problems.

Signed-off-by: Hou Tao
--- .../selftests/bpf/prog_tests/htab_reuse.c | 111 ++++++++++++++++++ .../testing/selftests/bpf/progs/htab_reuse.c | 19 +++ 2 files changed, 130 insertions(+) create mode 100644 tools/testing/selftests/bpf/prog_tests/htab_reuse.c create mode 100644 tools/testing/selftests/bpf/progs/htab_reuse.c
diff --git a/tools/testing/selftests/bpf/prog_tests/htab_reuse.c b/tools/testing/selftests/bpf/prog_tests/htab_reuse.c new file mode 100644 index 000000000000..995972958d1d --- /dev/null +++ b/tools/testing/selftests/bpf/prog_tests/htab_reuse.c @@ -0,0 +1,111 @@ +// SPDX-License-Identifier: GPL-2.0 +/* Copyright (C) 2022. Huawei Technologies Co., Ltd */ +#define _GNU_SOURCE +#include +#include +#include +#include "htab_reuse.skel.h" + +struct htab_op_ctx { + int fd; + int loop; + bool stop; +}; + +struct htab_val { + unsigned int lock; + unsigned int data; +}; + +static void *htab_lookup_fn(void *arg) +{ + struct htab_op_ctx *ctx = arg; + int ret = 0, i = 0; + + while (i++ < ctx->loop && !ctx->stop) { + struct htab_val value; + unsigned int key; + int err; + + /* + * Use BPF_F_LOCK to use spin-lock in map value. And also + * check whether or not an unexpected value is returned.
+ */ + key = 7; + err = bpf_map_lookup_elem_flags(ctx->fd, &key, &value, BPF_F_LOCK); + if (!err && key != value.data) + ret = EINVAL; + } + + return (void *)(long)ret; +} + +static void *htab_update_fn(void *arg) +{ + struct htab_op_ctx *ctx = arg; + int i = 0; + + while (i++ < ctx->loop && !ctx->stop) { + struct htab_val value; + unsigned int key; + + key = 7; + value.lock = 0; + value.data = key; + bpf_map_update_elem(ctx->fd, &key, &value, BPF_F_LOCK); + bpf_map_delete_elem(ctx->fd, &key); + + key = 24; + value.lock = 0; + value.data = key; + bpf_map_update_elem(ctx->fd, &key, &value, BPF_F_LOCK); + bpf_map_delete_elem(ctx->fd, &key); + } + + return NULL; +} + +void test_htab_reuse(void) +{ + unsigned int i, wr_nr = 1, rd_nr = 4; + pthread_t tids[wr_nr + rd_nr]; + struct htab_reuse *skel; + struct htab_op_ctx ctx; + int err; + + skel = htab_reuse__open_and_load(); + if (!ASSERT_OK_PTR(skel, "htab_reuse__open_and_load")) + return; + + ctx.fd = bpf_map__fd(skel->maps.htab); + ctx.loop = 500; + ctx.stop = false; + + memset(tids, 0, sizeof(tids)); + for (i = 0; i < wr_nr; i++) { + err = pthread_create(&tids[i], NULL, htab_update_fn, &ctx); + if (!ASSERT_OK(err, "pthread_create")) { + ctx.stop = true; + goto reap; + } + } + for (i = 0; i < rd_nr; i++) { + err = pthread_create(&tids[i + wr_nr], NULL, htab_lookup_fn, &ctx); + if (!ASSERT_OK(err, "pthread_create")) { + ctx.stop = true; + goto reap; + } + } + +reap: + for (i = 0; i < wr_nr + rd_nr; i++) { + void *thread_err; + + if (!tids[i]) + continue; + thread_err = NULL; + pthread_join(tids[i], &thread_err); + ASSERT_NULL(thread_err, "thread error"); + } + htab_reuse__destroy(skel); +} diff --git a/tools/testing/selftests/bpf/progs/htab_reuse.c b/tools/testing/selftests/bpf/progs/htab_reuse.c new file mode 100644 index 000000000000..e6dcc70517f9 --- /dev/null +++ b/tools/testing/selftests/bpf/progs/htab_reuse.c @@ -0,0 +1,19 @@ +// SPDX-License-Identifier: GPL-2.0 +/* Copyright (C) 2022. Huawei Technologies Co., Ltd */ +#include +#include + +char _license[] SEC("license") = "GPL"; + +struct htab_val { + struct bpf_spin_lock lock; + unsigned int data; +}; + +struct { + __uint(type, BPF_MAP_TYPE_HASH); + __uint(max_entries, 64); + __type(key, unsigned int); + __type(value, struct htab_val); + __uint(map_flags, BPF_F_NO_PREALLOC); +} htab SEC(".maps");