From patchwork Wed Jun 21 02:32:27 2023
X-Patchwork-Submitter: Alexei Starovoitov
X-Patchwork-Id: 13286580
From: Alexei Starovoitov
To: daniel@iogearbox.net, andrii@kernel.org, void@manifault.com,
    houtao@huaweicloud.com, paulmck@kernel.org
Cc: tj@kernel.org, rcu@vger.kernel.org, netdev@vger.kernel.org,
    bpf@vger.kernel.org, kernel-team@fb.com
Subject: [PATCH bpf-next 01/12] bpf: Rename few bpf_mem_alloc fields.
Date: Tue, 20 Jun 2023 19:32:27 -0700
Message-Id: <20230621023238.87079-2-alexei.starovoitov@gmail.com>
In-Reply-To: <20230621023238.87079-1-alexei.starovoitov@gmail.com>
References: <20230621023238.87079-1-alexei.starovoitov@gmail.com>

From: Alexei Starovoitov

Rename:
- struct rcu_head rcu;
- struct llist_head free_by_rcu;
- struct llist_head waiting_for_gp;
- atomic_t call_rcu_in_progress;
+ struct llist_head free_by_rcu_ttrace;
+ struct llist_head waiting_for_gp_ttrace;
+ struct rcu_head rcu_ttrace;
+ atomic_t call_rcu_ttrace_in_progress;
...
- static void do_call_rcu(struct bpf_mem_cache *c)
+ static void do_call_rcu_ttrace(struct bpf_mem_cache *c)

to better indicate the intended use. 'Tasks trace' is shortened to
'ttrace' to reduce verbosity. No functional changes.

Later patches will add free_by_rcu/waiting_for_gp fields to be used
with normal RCU.

Signed-off-by: Alexei Starovoitov
---
 kernel/bpf/memalloc.c | 57 ++++++++++++++++++++++---------------------
 1 file changed, 29 insertions(+), 28 deletions(-)

diff --git a/kernel/bpf/memalloc.c b/kernel/bpf/memalloc.c
index 0668bcd7c926..cc5b8adb4c83 100644
--- a/kernel/bpf/memalloc.c
+++ b/kernel/bpf/memalloc.c
@@ -99,10 +99,11 @@ struct bpf_mem_cache {
 	int low_watermark, high_watermark, batch;
 	int percpu_size;
 
-	struct rcu_head rcu;
-	struct llist_head free_by_rcu;
-	struct llist_head waiting_for_gp;
-	atomic_t call_rcu_in_progress;
+	/* list of objects to be freed after RCU tasks trace GP */
+	struct llist_head free_by_rcu_ttrace;
+	struct llist_head waiting_for_gp_ttrace;
+	struct rcu_head rcu_ttrace;
+	atomic_t call_rcu_ttrace_in_progress;
 };
 
 struct bpf_mem_caches {
@@ -165,18 +166,18 @@ static void alloc_bulk(struct bpf_mem_cache *c, int cnt, int node)
 	old_memcg = set_active_memcg(memcg);
 	for (i = 0; i < cnt; i++) {
 		/*
-		 * free_by_rcu is only manipulated by irq work refill_work().
+		 * free_by_rcu_ttrace is only manipulated by irq work refill_work().
 		 * IRQ works on the same CPU are called sequentially, so it is
 		 * safe to use __llist_del_first() here. If alloc_bulk() is
 		 * invoked by the initial prefill, there will be no running
 		 * refill_work(), so __llist_del_first() is fine as well.
 		 *
-		 * In most cases, objects on free_by_rcu are from the same CPU.
+		 * In most cases, objects on free_by_rcu_ttrace are from the same CPU.
 		 * If some objects come from other CPUs, it doesn't incur any
 		 * harm because NUMA_NO_NODE means the preference for current
 		 * numa node and it is not a guarantee.
 		 */
-		obj = __llist_del_first(&c->free_by_rcu);
+		obj = __llist_del_first(&c->free_by_rcu_ttrace);
 		if (!obj) {
 			/* Allocate, but don't deplete atomic reserves that typical
 			 * GFP_ATOMIC would do. irq_work runs on this cpu and kmalloc
@@ -232,10 +233,10 @@ static void free_all(struct llist_node *llnode, bool percpu)
 
 static void __free_rcu(struct rcu_head *head)
 {
-	struct bpf_mem_cache *c = container_of(head, struct bpf_mem_cache, rcu);
+	struct bpf_mem_cache *c = container_of(head, struct bpf_mem_cache, rcu_ttrace);
 
-	free_all(llist_del_all(&c->waiting_for_gp), !!c->percpu_size);
-	atomic_set(&c->call_rcu_in_progress, 0);
+	free_all(llist_del_all(&c->waiting_for_gp_ttrace), !!c->percpu_size);
+	atomic_set(&c->call_rcu_ttrace_in_progress, 0);
 }
 
 static void __free_rcu_tasks_trace(struct rcu_head *head)
@@ -254,32 +255,32 @@ static void enque_to_free(struct bpf_mem_cache *c, void *obj)
 	struct llist_node *llnode = obj;
 
 	/* bpf_mem_cache is a per-cpu object. Freeing happens in irq_work.
-	 * Nothing races to add to free_by_rcu list.
+	 * Nothing races to add to free_by_rcu_ttrace list.
 	 */
-	__llist_add(llnode, &c->free_by_rcu);
+	__llist_add(llnode, &c->free_by_rcu_ttrace);
 }
 
-static void do_call_rcu(struct bpf_mem_cache *c)
+static void do_call_rcu_ttrace(struct bpf_mem_cache *c)
 {
 	struct llist_node *llnode, *t;
 
-	if (atomic_xchg(&c->call_rcu_in_progress, 1))
+	if (atomic_xchg(&c->call_rcu_ttrace_in_progress, 1))
 		return;
 
-	WARN_ON_ONCE(!llist_empty(&c->waiting_for_gp));
-	llist_for_each_safe(llnode, t, __llist_del_all(&c->free_by_rcu))
-		/* There is no concurrent __llist_add(waiting_for_gp) access.
+	WARN_ON_ONCE(!llist_empty(&c->waiting_for_gp_ttrace));
+	llist_for_each_safe(llnode, t, __llist_del_all(&c->free_by_rcu_ttrace))
+		/* There is no concurrent __llist_add(waiting_for_gp_ttrace) access.
 		 * It doesn't race with llist_del_all either.
-		 * But there could be two concurrent llist_del_all(waiting_for_gp):
+		 * But there could be two concurrent llist_del_all(waiting_for_gp_ttrace):
 		 * from __free_rcu() and from drain_mem_cache().
 		 */
-		__llist_add(llnode, &c->waiting_for_gp);
+		__llist_add(llnode, &c->waiting_for_gp_ttrace);
 
 	/* Use call_rcu_tasks_trace() to wait for sleepable progs to finish.
 	 * If RCU Tasks Trace grace period implies RCU grace period, free
 	 * these elements directly, else use call_rcu() to wait for normal
 	 * progs to finish and finally do free_one() on each element.
 	 */
-	call_rcu_tasks_trace(&c->rcu, __free_rcu_tasks_trace);
+	call_rcu_tasks_trace(&c->rcu_ttrace, __free_rcu_tasks_trace);
 }
 
 static void free_bulk(struct bpf_mem_cache *c)
@@ -307,7 +308,7 @@ static void free_bulk(struct bpf_mem_cache *c)
 	/* and drain free_llist_extra */
 	llist_for_each_safe(llnode, t, llist_del_all(&c->free_llist_extra))
 		enque_to_free(c, llnode);
-	do_call_rcu(c);
+	do_call_rcu_ttrace(c);
 }
 
 static void bpf_mem_refill(struct irq_work *work)
@@ -441,13 +442,13 @@ static void drain_mem_cache(struct bpf_mem_cache *c)
 
 	/* No progs are using this bpf_mem_cache, but htab_map_free() called
 	 * bpf_mem_cache_free() for all remaining elements and they can be in
-	 * free_by_rcu or in waiting_for_gp lists, so drain those lists now.
+	 * free_by_rcu_ttrace or in waiting_for_gp_ttrace lists, so drain those lists now.
 	 *
-	 * Except for waiting_for_gp list, there are no concurrent operations
+	 * Except for waiting_for_gp_ttrace list, there are no concurrent operations
 	 * on these lists, so it is safe to use __llist_del_all().
 	 */
-	free_all(__llist_del_all(&c->free_by_rcu), percpu);
-	free_all(llist_del_all(&c->waiting_for_gp), percpu);
+	free_all(__llist_del_all(&c->free_by_rcu_ttrace), percpu);
+	free_all(llist_del_all(&c->waiting_for_gp_ttrace), percpu);
 	free_all(__llist_del_all(&c->free_llist), percpu);
 	free_all(__llist_del_all(&c->free_llist_extra), percpu);
 }
@@ -462,7 +463,7 @@ static void free_mem_alloc_no_barrier(struct bpf_mem_alloc *ma)
 
 static void free_mem_alloc(struct bpf_mem_alloc *ma)
 {
-	/* waiting_for_gp lists was drained, but __free_rcu might
+	/* waiting_for_gp_ttrace lists was drained, but __free_rcu might
 	 * still execute. Wait for it now before we freeing percpu caches.
 	 *
 	 * rcu_barrier_tasks_trace() doesn't imply synchronize_rcu_tasks_trace(),
@@ -535,7 +536,7 @@ void bpf_mem_alloc_destroy(struct bpf_mem_alloc *ma)
 		 */
 		irq_work_sync(&c->refill_work);
 		drain_mem_cache(c);
-		rcu_in_progress += atomic_read(&c->call_rcu_in_progress);
+		rcu_in_progress += atomic_read(&c->call_rcu_ttrace_in_progress);
 	}
 	/* objcg is the same across cpus */
 	if (c->objcg)
@@ -550,7 +551,7 @@ void bpf_mem_alloc_destroy(struct bpf_mem_alloc *ma)
 			c = &cc->cache[i];
 			irq_work_sync(&c->refill_work);
 			drain_mem_cache(c);
-			rcu_in_progress += atomic_read(&c->call_rcu_in_progress);
+			rcu_in_progress += atomic_read(&c->call_rcu_ttrace_in_progress);
 		}
 	}
 	if (c->objcg)
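The ttrace lists above implement a two-stage deferred free: objects
accumulate on free_by_rcu_ttrace, move wholesale to waiting_for_gp_ttrace
when a grace period is requested, and are freed once it elapses. Below is
a minimal single-threaded userspace sketch of that flow; the struct and
function names are illustrative stand-ins, not the kernel's llist/RCU API.

#include <stdio.h>
#include <stdlib.h>

struct node { struct node *next; int v; };

struct cache {
	struct node *free_by_rcu_ttrace;    /* filled by the freeing side */
	struct node *waiting_for_gp_ttrace; /* snapshot awaiting a GP */
};

static void enqueue_free(struct cache *c, struct node *n)
{
	n->next = c->free_by_rcu_ttrace;
	c->free_by_rcu_ttrace = n;
}

/* Models do_call_rcu_ttrace(): move the whole pending list onto the
 * waiting list, then (after a simulated grace period) free it all. */
static void start_gp_and_reclaim(struct cache *c)
{
	c->waiting_for_gp_ttrace = c->free_by_rcu_ttrace;
	c->free_by_rcu_ttrace = NULL;
	/* ... a tasks-trace grace period elapses here in the real scheme ... */
	while (c->waiting_for_gp_ttrace) {
		struct node *n = c->waiting_for_gp_ttrace;
		c->waiting_for_gp_ttrace = n->next;
		free(n);
	}
}

int main(void)
{
	struct cache c = {0};
	for (int i = 0; i < 3; i++) {
		struct node *n = malloc(sizeof(*n));
		if (!n)
			return 1;
		n->v = i;
		enqueue_free(&c, n);
	}
	start_gp_and_reclaim(&c);
	puts("reclaimed after simulated GP");
	return 0;
}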
From patchwork Wed Jun 21 02:32:28 2023
X-Patchwork-Submitter: Alexei Starovoitov
X-Patchwork-Id: 13286581
From: Alexei Starovoitov
To: daniel@iogearbox.net, andrii@kernel.org, void@manifault.com,
    houtao@huaweicloud.com, paulmck@kernel.org
Cc: tj@kernel.org, rcu@vger.kernel.org, netdev@vger.kernel.org,
    bpf@vger.kernel.org, kernel-team@fb.com
Subject: [PATCH bpf-next 02/12] bpf: Simplify code of destroy_mem_alloc() with kmemdup().
Date: Tue, 20 Jun 2023 19:32:28 -0700
Message-Id: <20230621023238.87079-3-alexei.starovoitov@gmail.com>
In-Reply-To: <20230621023238.87079-1-alexei.starovoitov@gmail.com>
References: <20230621023238.87079-1-alexei.starovoitov@gmail.com>

From: Alexei Starovoitov

Use kmemdup() to simplify the code.

Signed-off-by: Alexei Starovoitov
---
 kernel/bpf/memalloc.c | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/kernel/bpf/memalloc.c b/kernel/bpf/memalloc.c
index cc5b8adb4c83..b0011217be6c 100644
--- a/kernel/bpf/memalloc.c
+++ b/kernel/bpf/memalloc.c
@@ -499,7 +499,7 @@ static void destroy_mem_alloc(struct bpf_mem_alloc *ma, int rcu_in_progress)
 		return;
 	}
 
-	copy = kmalloc(sizeof(*ma), GFP_KERNEL);
+	copy = kmemdup(ma, sizeof(*ma), GFP_KERNEL);
 	if (!copy) {
 		/* Slow path with inline barrier-s */
 		free_mem_alloc(ma);
@@ -507,10 +507,7 @@ static void destroy_mem_alloc(struct bpf_mem_alloc *ma, int rcu_in_progress)
 	}
 
 	/* Defer barriers into worker to let the rest of map memory to be freed */
-	copy->cache = ma->cache;
-	ma->cache = NULL;
-	copy->caches = ma->caches;
-	ma->caches = NULL;
+	memset(ma, 0, sizeof(*ma));
 	INIT_WORK(&copy->work, free_mem_alloc_deferred);
 	queue_work(system_unbound_wq, &copy->work);
 }
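The shape of this simplification is easy to model outside the kernel:
one duplicate-and-zero step replaces allocating and hand-copying each
field. In the sketch below, xmemdup() is a hypothetical userspace
stand-in for kmemdup(), and the struct is a reduced analogue of
bpf_mem_alloc; none of it is the kernel API.

#include <stdlib.h>
#include <string.h>

struct mem_alloc_like { void *cache; void *caches; /* work, ... */ };

/* Userspace analogue of kmemdup(): allocate and copy in one call. */
static void *xmemdup(const void *src, size_t len)
{
	void *p = malloc(len);
	if (p)
		memcpy(p, src, len);
	return p;
}

static int defer_destroy(struct mem_alloc_like *ma)
{
	struct mem_alloc_like *copy = xmemdup(ma, sizeof(*ma));
	if (!copy)
		return -1;              /* fall back to the slow inline path */
	/* Ownership moved wholesale; zero the source in one memset
	 * instead of NULLing each field individually, as in the patch. */
	memset(ma, 0, sizeof(*ma));
	/* queue_work(copy) would happen here in the kernel */
	free(copy);
	return 0;
}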
From patchwork Wed Jun 21 02:32:29 2023
X-Patchwork-Submitter: Alexei Starovoitov
X-Patchwork-Id: 13286582
From: Alexei Starovoitov
To: daniel@iogearbox.net, andrii@kernel.org, void@manifault.com,
    houtao@huaweicloud.com, paulmck@kernel.org
Cc: tj@kernel.org, rcu@vger.kernel.org, netdev@vger.kernel.org,
    bpf@vger.kernel.org, kernel-team@fb.com
Subject: [PATCH bpf-next 03/12] bpf: Let free_all() return the number of freed elements.
Date: Tue, 20 Jun 2023 19:32:29 -0700
Message-Id: <20230621023238.87079-4-alexei.starovoitov@gmail.com>
In-Reply-To: <20230621023238.87079-1-alexei.starovoitov@gmail.com>
References: <20230621023238.87079-1-alexei.starovoitov@gmail.com>

From: Alexei Starovoitov

Let the free_all() helper return the number of freed elements. The
count is not used in this patch, but it helps in debug/development of
bpf_mem_alloc. For example, this diff for __free_rcu():

- free_all(llist_del_all(&c->waiting_for_gp_ttrace), !!c->percpu_size);
+ printk("cpu %d freed %d objs after tasks trace\n", raw_smp_processor_id(),
+        free_all(llist_del_all(&c->waiting_for_gp_ttrace), !!c->percpu_size));

would show how busy RCU tasks trace is. In an artificial benchmark
where one cpu is allocating and a different cpu is freeing, RCU tasks
trace cannot keep up and the list of objects keeps growing from
thousands to millions, eventually OOMing.

Signed-off-by: Alexei Starovoitov
---
 kernel/bpf/memalloc.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/kernel/bpf/memalloc.c b/kernel/bpf/memalloc.c
index b0011217be6c..693651d2648b 100644
--- a/kernel/bpf/memalloc.c
+++ b/kernel/bpf/memalloc.c
@@ -223,12 +223,16 @@ static void free_one(void *obj, bool percpu)
 	kfree(obj);
 }
 
-static void free_all(struct llist_node *llnode, bool percpu)
+static int free_all(struct llist_node *llnode, bool percpu)
 {
 	struct llist_node *pos, *t;
+	int cnt = 0;
 
-	llist_for_each_safe(pos, t, llnode)
+	llist_for_each_safe(pos, t, llnode) {
 		free_one(pos, percpu);
+		cnt++;
+	}
+	return cnt;
 }
 
 static void __free_rcu(struct rcu_head *head)
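The new free_all() contract (count returned, callers free to ignore it)
can be modeled in standalone C, with plain pointers standing in for the
kernel's llist; this is an illustrative sketch, not the kernel code.

#include <stdio.h>
#include <stdlib.h>

struct node { struct node *next; };

/* Returns how many nodes it freed, so callers can cheaply instrument
 * reclaim pressure, as the commit message shows with a printk. */
static int free_all(struct node *head)
{
	int cnt = 0;
	while (head) {
		struct node *t = head->next;
		free(head);
		head = t;
		cnt++;
	}
	return cnt;
}

int main(void)
{
	struct node *head = NULL;
	for (int i = 0; i < 5; i++) {
		struct node *n = malloc(sizeof(*n));
		if (!n)
			return 1;
		n->next = head;
		head = n;
	}
	printf("freed %d objs\n", free_all(head));
	return 0;
}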
From patchwork Wed Jun 21 02:32:30 2023
X-Patchwork-Submitter: Alexei Starovoitov
X-Patchwork-Id: 13286583
From: Alexei Starovoitov
To: daniel@iogearbox.net, andrii@kernel.org, void@manifault.com,
    houtao@huaweicloud.com, paulmck@kernel.org
Cc: tj@kernel.org, rcu@vger.kernel.org, netdev@vger.kernel.org,
    bpf@vger.kernel.org, kernel-team@fb.com
Subject: [PATCH bpf-next 04/12] bpf: Refactor alloc_bulk().
Date: Tue, 20 Jun 2023 19:32:30 -0700
Message-Id: <20230621023238.87079-5-alexei.starovoitov@gmail.com>
In-Reply-To: <20230621023238.87079-1-alexei.starovoitov@gmail.com>
References: <20230621023238.87079-1-alexei.starovoitov@gmail.com>

From: Alexei Starovoitov

Factor out the inner body of alloc_bulk() into a separate helper.
No functional changes.

Signed-off-by: Alexei Starovoitov
---
 kernel/bpf/memalloc.c | 46 ++++++++++++++++++++++++-------------------
 1 file changed, 26 insertions(+), 20 deletions(-)

diff --git a/kernel/bpf/memalloc.c b/kernel/bpf/memalloc.c
index 693651d2648b..9693b1f8cbda 100644
--- a/kernel/bpf/memalloc.c
+++ b/kernel/bpf/memalloc.c
@@ -154,11 +154,35 @@ static struct mem_cgroup *get_memcg(const struct bpf_mem_cache *c)
 #endif
 }
 
+static void add_obj_to_free_list(struct bpf_mem_cache *c, void *obj)
+{
+	unsigned long flags;
+
+	if (IS_ENABLED(CONFIG_PREEMPT_RT))
+		/* In RT irq_work runs in per-cpu kthread, so disable
+		 * interrupts to avoid preemption and interrupts and
+		 * reduce the chance of bpf prog executing on this cpu
+		 * when active counter is busy.
+		 */
+		local_irq_save(flags);
+	/* alloc_bulk runs from irq_work which will not preempt a bpf
+	 * program that does unit_alloc/unit_free since IRQs are
+	 * disabled there. There is no race to increment 'active'
+	 * counter. It protects free_llist from corruption in case NMI
+	 * bpf prog preempted this loop.
+	 */
+	WARN_ON_ONCE(local_inc_return(&c->active) != 1);
+	__llist_add(obj, &c->free_llist);
+	c->free_cnt++;
+	local_dec(&c->active);
+	if (IS_ENABLED(CONFIG_PREEMPT_RT))
+		local_irq_restore(flags);
+}
+
 /* Mostly runs from irq_work except __init phase. */
 static void alloc_bulk(struct bpf_mem_cache *c, int cnt, int node)
 {
 	struct mem_cgroup *memcg = NULL, *old_memcg;
-	unsigned long flags;
 	void *obj;
 	int i;
 
@@ -188,25 +212,7 @@ static void alloc_bulk(struct bpf_mem_cache *c, int cnt, int node)
 			if (!obj)
 				break;
 		}
-		if (IS_ENABLED(CONFIG_PREEMPT_RT))
-			/* In RT irq_work runs in per-cpu kthread, so disable
-			 * interrupts to avoid preemption and interrupts and
-			 * reduce the chance of bpf prog executing on this cpu
-			 * when active counter is busy.
-			 */
-			local_irq_save(flags);
-		/* alloc_bulk runs from irq_work which will not preempt a bpf
-		 * program that does unit_alloc/unit_free since IRQs are
-		 * disabled there. There is no race to increment 'active'
-		 * counter. It protects free_llist from corruption in case NMI
-		 * bpf prog preempted this loop.
-		 */
-		WARN_ON_ONCE(local_inc_return(&c->active) != 1);
-		__llist_add(obj, &c->free_llist);
-		c->free_cnt++;
-		local_dec(&c->active);
-		if (IS_ENABLED(CONFIG_PREEMPT_RT))
-			local_irq_restore(flags);
+		add_obj_to_free_list(c, obj);
 	}
 	set_active_memcg(old_memcg);
 	mem_cgroup_put(memcg);
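The value of the refactor is that the push-under-protection sequence now
lives in one helper that later patches can call from several loops. A
userspace analogue follows; the mutex is only a stand-in for the kernel's
IRQ-off plus local 'active' counter protocol, not the real mechanism.

#include <pthread.h>
#include <stdio.h>

struct node { struct node *next; };

struct cache {
	pthread_mutex_t lock;
	struct node *free_llist;
	int free_cnt;
};

/* The factored-out helper in isolation: one place that knows how to
 * push onto the per-cpu freelist under protection. */
static void add_obj_to_free_list(struct cache *c, struct node *obj)
{
	pthread_mutex_lock(&c->lock);
	obj->next = c->free_llist;
	c->free_llist = obj;
	c->free_cnt++;
	pthread_mutex_unlock(&c->lock);
}

int main(void)
{
	struct cache c = { .lock = PTHREAD_MUTEX_INITIALIZER };
	struct node n[3] = {0};

	for (int i = 0; i < 3; i++)
		add_obj_to_free_list(&c, &n[i]);
	printf("free_cnt=%d\n", c.free_cnt);
	return 0;
}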
From patchwork Wed Jun 21 02:32:31 2023
X-Patchwork-Submitter: Alexei Starovoitov
X-Patchwork-Id: 13286584
From: Alexei Starovoitov
To: daniel@iogearbox.net, andrii@kernel.org, void@manifault.com,
    houtao@huaweicloud.com, paulmck@kernel.org
Cc: tj@kernel.org, rcu@vger.kernel.org, netdev@vger.kernel.org,
    bpf@vger.kernel.org, kernel-team@fb.com
Subject: [PATCH bpf-next 05/12] bpf: Further refactor alloc_bulk().
Date: Tue, 20 Jun 2023 19:32:31 -0700
Message-Id: <20230621023238.87079-6-alexei.starovoitov@gmail.com>
In-Reply-To: <20230621023238.87079-1-alexei.starovoitov@gmail.com>
References: <20230621023238.87079-1-alexei.starovoitov@gmail.com>

From: Alexei Starovoitov

In certain scenarios alloc_bulk() might be taking free objects mainly
from the free_by_rcu_ttrace list. In that case get_memcg() and
set_active_memcg() are redundant, but they show up in the perf profile.
Split the loop and only set memcg when allocating from slab.
No performance difference from this patch alone, but it helps in
combination with further patches.

Signed-off-by: Alexei Starovoitov
---
 kernel/bpf/memalloc.c | 30 ++++++++++++++++++------------
 1 file changed, 18 insertions(+), 12 deletions(-)

diff --git a/kernel/bpf/memalloc.c b/kernel/bpf/memalloc.c
index 9693b1f8cbda..b07368d77343 100644
--- a/kernel/bpf/memalloc.c
+++ b/kernel/bpf/memalloc.c
@@ -186,8 +186,6 @@ static void alloc_bulk(struct bpf_mem_cache *c, int cnt, int node)
 	void *obj;
 	int i;
 
-	memcg = get_memcg(c);
-	old_memcg = set_active_memcg(memcg);
 	for (i = 0; i < cnt; i++) {
 		/*
 		 * free_by_rcu_ttrace is only manipulated by irq work refill_work().
@@ -202,16 +200,24 @@ static void alloc_bulk(struct bpf_mem_cache *c, int cnt, int node)
 		 * numa node and it is not a guarantee.
 		 */
 		obj = __llist_del_first(&c->free_by_rcu_ttrace);
-		if (!obj) {
-			/* Allocate, but don't deplete atomic reserves that typical
-			 * GFP_ATOMIC would do. irq_work runs on this cpu and kmalloc
-			 * will allocate from the current numa node which is what we
-			 * want here.
-			 */
-			obj = __alloc(c, node, GFP_NOWAIT | __GFP_NOWARN | __GFP_ACCOUNT);
-			if (!obj)
-				break;
-		}
+		if (!obj)
+			break;
+		add_obj_to_free_list(c, obj);
+	}
+	if (i >= cnt)
+		return;
+
+	memcg = get_memcg(c);
+	old_memcg = set_active_memcg(memcg);
+	for (; i < cnt; i++) {
+		/* Allocate, but don't deplete atomic reserves that typical
+		 * GFP_ATOMIC would do. irq_work runs on this cpu and kmalloc
+		 * will allocate from the current numa node which is what we
+		 * want here.
+		 */
+		obj = __alloc(c, node, GFP_NOWAIT | __GFP_NOWARN | __GFP_ACCOUNT);
+		if (!obj)
+			break;
 		add_obj_to_free_list(c, obj);
 	}
 	set_active_memcg(old_memcg);
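The loop-splitting pattern in isolation: drain the cheap recycled source
first, and only pay the setup cost (memcg selection in the kernel) if the
slow path is actually reached. The userspace model below is illustrative;
all names are stand-ins for the kernel functions they mimic.

#include <stdio.h>

struct cache { int recycled; int filled; };

static int take_recycled(struct cache *c)
{
	if (!c->recycled)
		return 0;
	c->recycled--;
	return 1;
}

static int refill(struct cache *c, int cnt)
{
	int i;

	for (i = 0; i < cnt; i++) {
		if (!take_recycled(c))          /* free_by_rcu_ttrace analogue */
			break;
		c->filled++;
	}
	if (i >= cnt)
		return i;                       /* setup cost never paid */

	printf("setup: set_active_memcg() analogue\n");
	for (; i < cnt; i++)
		c->filled++;                    /* __alloc() analogue */
	printf("teardown: restore old memcg\n");
	return i;
}

int main(void)
{
	struct cache c = { .recycled = 2 };
	printf("refilled %d (filled=%d)\n", refill(&c, 5), c.filled);
	return 0;
}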
From patchwork Wed Jun 21 02:32:32 2023
X-Patchwork-Submitter: Alexei Starovoitov
X-Patchwork-Id: 13286585
From: Alexei Starovoitov
To: daniel@iogearbox.net, andrii@kernel.org, void@manifault.com,
    houtao@huaweicloud.com, paulmck@kernel.org
Cc: tj@kernel.org, rcu@vger.kernel.org, netdev@vger.kernel.org,
    bpf@vger.kernel.org, kernel-team@fb.com
Subject: [PATCH bpf-next 06/12] bpf: Optimize moving objects from free_by_rcu_ttrace to waiting_for_gp_ttrace.
Date: Tue, 20 Jun 2023 19:32:32 -0700
Message-Id: <20230621023238.87079-7-alexei.starovoitov@gmail.com>
In-Reply-To: <20230621023238.87079-1-alexei.starovoitov@gmail.com>
References: <20230621023238.87079-1-alexei.starovoitov@gmail.com>

From: Alexei Starovoitov

Optimize moving objects from free_by_rcu_ttrace to
waiting_for_gp_ttrace by remembering the tail.

Signed-off-by: Alexei Starovoitov
---
 kernel/bpf/memalloc.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/kernel/bpf/memalloc.c b/kernel/bpf/memalloc.c
index b07368d77343..4fd79bd51f5a 100644
--- a/kernel/bpf/memalloc.c
+++ b/kernel/bpf/memalloc.c
@@ -101,6 +101,7 @@ struct bpf_mem_cache {
 
 	/* list of objects to be freed after RCU tasks trace GP */
 	struct llist_head free_by_rcu_ttrace;
+	struct llist_node *free_by_rcu_ttrace_tail;
 	struct llist_head waiting_for_gp_ttrace;
 	struct rcu_head rcu_ttrace;
 	atomic_t call_rcu_ttrace_in_progress;
@@ -273,24 +274,27 @@ static void enque_to_free(struct bpf_mem_cache *c, void *obj)
 	/* bpf_mem_cache is a per-cpu object. Freeing happens in irq_work.
 	 * Nothing races to add to free_by_rcu_ttrace list.
 	 */
-	__llist_add(llnode, &c->free_by_rcu_ttrace);
+	if (__llist_add(llnode, &c->free_by_rcu_ttrace))
+		c->free_by_rcu_ttrace_tail = llnode;
 }
 
 static void do_call_rcu_ttrace(struct bpf_mem_cache *c)
 {
-	struct llist_node *llnode, *t;
+	struct llist_node *llnode;
 
 	if (atomic_xchg(&c->call_rcu_ttrace_in_progress, 1))
 		return;
 
 	WARN_ON_ONCE(!llist_empty(&c->waiting_for_gp_ttrace));
-	llist_for_each_safe(llnode, t, __llist_del_all(&c->free_by_rcu_ttrace))
+	llnode = __llist_del_all(&c->free_by_rcu_ttrace);
+	if (llnode)
 		/* There is no concurrent __llist_add(waiting_for_gp_ttrace) access.
 		 * It doesn't race with llist_del_all either.
 		 * But there could be two concurrent llist_del_all(waiting_for_gp_ttrace):
 		 * from __free_rcu() and from drain_mem_cache().
 		 */
-		__llist_add(llnode, &c->waiting_for_gp_ttrace);
+		__llist_add_batch(llnode, c->free_by_rcu_ttrace_tail,
+				  &c->waiting_for_gp_ttrace);
 
 	/* Use call_rcu_tasks_trace() to wait for sleepable progs to finish.
 	 * If RCU Tasks Trace grace period implies RCU grace period, free
 	 * these elements directly, else use call_rcu() to wait for normal
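The trick relies on __llist_add() reporting whether the list was empty:
the first node pushed onto an empty list is the tail of the eventual
batch, so the whole list can later be spliced onto another list in O(1)
instead of being walked node by node. A self-contained model with a toy
list (not the kernel's lock-free llist):

#include <stdio.h>
#include <stdlib.h>

struct node { struct node *next; };
struct list { struct node *first; };

/* Returns nonzero if the list was empty before the add,
 * mirroring __llist_add()'s return value. */
static int list_add(struct node *n, struct list *l)
{
	int was_empty = (l->first == NULL);

	n->next = l->first;
	l->first = n;
	return was_empty;
}

/* __llist_add_batch() analogue: splice [head..tail] in one step. */
static void list_add_batch(struct node *head, struct node *tail, struct list *l)
{
	tail->next = l->first;
	l->first = head;
}

int main(void)
{
	struct list pending = {0}, waiting = {0};
	struct node *tail = NULL;

	for (int i = 0; i < 4; i++) {
		struct node *n = calloc(1, sizeof(*n));
		if (!n)
			return 1;
		if (list_add(n, &pending))
			tail = n;               /* first node == final tail */
	}
	list_add_batch(pending.first, tail, &waiting);
	pending.first = NULL;
	puts("spliced batch in O(1)");
	while (waiting.first) {
		struct node *n = waiting.first;
		waiting.first = n->next;
		free(n);
	}
	return 0;
}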
From patchwork Wed Jun 21 02:32:33 2023
X-Patchwork-Submitter: Alexei Starovoitov
X-Patchwork-Id: 13286586
From: Alexei Starovoitov
To: daniel@iogearbox.net, andrii@kernel.org, void@manifault.com,
    houtao@huaweicloud.com, paulmck@kernel.org
Cc: tj@kernel.org, rcu@vger.kernel.org, netdev@vger.kernel.org,
    bpf@vger.kernel.org, kernel-team@fb.com
Subject: [PATCH bpf-next 07/12] bpf: Add a hint to allocated objects.
Date: Tue, 20 Jun 2023 19:32:33 -0700
Message-Id: <20230621023238.87079-8-alexei.starovoitov@gmail.com>
In-Reply-To: <20230621023238.87079-1-alexei.starovoitov@gmail.com>
References: <20230621023238.87079-1-alexei.starovoitov@gmail.com>

From: Alexei Starovoitov

To address the OOM issue when one cpu is allocating and another cpu
is freeing, add a target bpf_mem_cache hint to allocated objects, and
when the local cpu free_llist overflows, free into that bpf_mem_cache.
The hint addresses the OOM while maintaining the same performance for
the common case when alloc/free are done on the same cpu.

Signed-off-by: Alexei Starovoitov
---
 kernel/bpf/memalloc.c | 46 ++++++++++++++++++++++++-----------------
 1 file changed, 28 insertions(+), 18 deletions(-)

diff --git a/kernel/bpf/memalloc.c b/kernel/bpf/memalloc.c
index 4fd79bd51f5a..8b7645bffd1a 100644
--- a/kernel/bpf/memalloc.c
+++ b/kernel/bpf/memalloc.c
@@ -98,6 +98,7 @@ struct bpf_mem_cache {
 	int free_cnt;
 	int low_watermark, high_watermark, batch;
 	int percpu_size;
+	struct bpf_mem_cache *tgt;
 
 	/* list of objects to be freed after RCU tasks trace GP */
 	struct llist_head free_by_rcu_ttrace;
@@ -189,18 +190,11 @@ static void alloc_bulk(struct bpf_mem_cache *c, int cnt, int node)
 
 	for (i = 0; i < cnt; i++) {
 		/*
-		 * free_by_rcu_ttrace is only manipulated by irq work refill_work().
-		 * IRQ works on the same CPU are called sequentially, so it is
-		 * safe to use __llist_del_first() here. If alloc_bulk() is
-		 * invoked by the initial prefill, there will be no running
-		 * refill_work(), so __llist_del_first() is fine as well.
-		 *
-		 * In most cases, objects on free_by_rcu_ttrace are from the same CPU.
-		 * If some objects come from other CPUs, it doesn't incur any
-		 * harm because NUMA_NO_NODE means the preference for current
-		 * numa node and it is not a guarantee.
+		 * For every 'c' llist_del_first(&c->free_by_rcu_ttrace); is
+		 * done only by one CPU == current CPU. Other CPUs might
+		 * llist_add() and llist_del_all() in parallel.
 		 */
-		obj = __llist_del_first(&c->free_by_rcu_ttrace);
+		obj = llist_del_first(&c->free_by_rcu_ttrace);
 		if (!obj)
 			break;
 		add_obj_to_free_list(c, obj);
@@ -274,7 +268,7 @@ static void enque_to_free(struct bpf_mem_cache *c, void *obj)
 	/* bpf_mem_cache is a per-cpu object. Freeing happens in irq_work.
 	 * Nothing races to add to free_by_rcu_ttrace list.
 	 */
-	if (__llist_add(llnode, &c->free_by_rcu_ttrace))
+	if (llist_add(llnode, &c->free_by_rcu_ttrace))
 		c->free_by_rcu_ttrace_tail = llnode;
 }
 
@@ -286,7 +280,7 @@ static void do_call_rcu_ttrace(struct bpf_mem_cache *c)
 		return;
 
 	WARN_ON_ONCE(!llist_empty(&c->waiting_for_gp_ttrace));
-	llnode = __llist_del_all(&c->free_by_rcu_ttrace);
+	llnode = llist_del_all(&c->free_by_rcu_ttrace);
 	if (llnode)
 		/* There is no concurrent __llist_add(waiting_for_gp_ttrace) access.
 		 * It doesn't race with llist_del_all either.
@@ -299,16 +293,22 @@ static void do_call_rcu_ttrace(struct bpf_mem_cache *c)
 	 * If RCU Tasks Trace grace period implies RCU grace period, free
 	 * these elements directly, else use call_rcu() to wait for normal
 	 * progs to finish and finally do free_one() on each element.
+	 *
+	 * call_rcu_tasks_trace() enqueues to a global queue, so it's ok
+	 * that current cpu bpf_mem_cache != target bpf_mem_cache.
 	 */
 	call_rcu_tasks_trace(&c->rcu_ttrace, __free_rcu_tasks_trace);
 }
 
 static void free_bulk(struct bpf_mem_cache *c)
 {
+	struct bpf_mem_cache *tgt = c->tgt;
 	struct llist_node *llnode, *t;
 	unsigned long flags;
 	int cnt;
 
+	WARN_ON_ONCE(tgt->unit_size != c->unit_size);
+
 	do {
 		if (IS_ENABLED(CONFIG_PREEMPT_RT))
 			local_irq_save(flags);
@@ -322,13 +322,13 @@ static void free_bulk(struct bpf_mem_cache *c)
 		if (IS_ENABLED(CONFIG_PREEMPT_RT))
 			local_irq_restore(flags);
 		if (llnode)
-			enque_to_free(c, llnode);
+			enque_to_free(tgt, llnode);
 	} while (cnt > (c->high_watermark + c->low_watermark) / 2);
 
 	/* and drain free_llist_extra */
 	llist_for_each_safe(llnode, t, llist_del_all(&c->free_llist_extra))
-		enque_to_free(c, llnode);
-	do_call_rcu_ttrace(c);
+		enque_to_free(tgt, llnode);
+	do_call_rcu_ttrace(tgt);
 }
 
 static void bpf_mem_refill(struct irq_work *work)
@@ -427,6 +427,7 @@ int bpf_mem_alloc_init(struct bpf_mem_alloc *ma, int size, bool percpu)
 			c->unit_size = unit_size;
 			c->objcg = objcg;
 			c->percpu_size = percpu_size;
+			c->tgt = c;
 			prefill_mem_cache(c, cpu);
 		}
 		ma->cache = pc;
@@ -449,6 +450,7 @@ int bpf_mem_alloc_init(struct bpf_mem_alloc *ma, int size, bool percpu)
 			c = &cc->cache[i];
 			c->unit_size = sizes[i];
 			c->objcg = objcg;
+			c->tgt = c;
 			prefill_mem_cache(c, cpu);
 		}
 	}
@@ -467,7 +469,7 @@ static void drain_mem_cache(struct bpf_mem_cache *c)
 	 * Except for waiting_for_gp_ttrace list, there are no concurrent operations
 	 * on these lists, so it is safe to use __llist_del_all().
 	 */
-	free_all(__llist_del_all(&c->free_by_rcu_ttrace), percpu);
+	free_all(llist_del_all(&c->free_by_rcu_ttrace), percpu);
 	free_all(llist_del_all(&c->waiting_for_gp_ttrace), percpu);
 	free_all(__llist_del_all(&c->free_llist), percpu);
 	free_all(__llist_del_all(&c->free_llist_extra), percpu);
@@ -599,8 +601,10 @@ static void notrace *unit_alloc(struct bpf_mem_cache *c)
 	local_irq_save(flags);
 	if (local_inc_return(&c->active) == 1) {
 		llnode = __llist_del_first(&c->free_llist);
-		if (llnode)
+		if (llnode) {
 			cnt = --c->free_cnt;
+			*(struct bpf_mem_cache **)llnode = c;
+		}
 	}
 	local_dec(&c->active);
 	local_irq_restore(flags);
@@ -624,6 +628,12 @@ static void notrace unit_free(struct bpf_mem_cache *c, void *ptr)
 
 	BUILD_BUG_ON(LLIST_NODE_SZ > 8);
 
+	/*
+	 * Remember bpf_mem_cache that allocated this object.
+	 * The hint is not accurate.
+	 */
+	c->tgt = *(struct bpf_mem_cache **)llnode;
+
 	local_irq_save(flags);
 	if (local_inc_return(&c->active) == 1) {
 		__llist_add(llnode, &c->free_llist);
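The hint works because a freed object's first word holds the llist node,
so while the object is live that space is free to carry metadata. The
userspace model below shows the store-on-alloc, read-on-free protocol in
a simplified single-threaded form; the kernel tolerates the hint being
stale, and all names here are illustrative.

#include <stdio.h>
#include <stdlib.h>

struct cache { const char *name; struct cache *tgt; };

static void *cache_alloc(struct cache *c)
{
	void *obj = malloc(64);

	if (obj)
		*(struct cache **)obj = c;      /* remember the allocator */
	return obj;
}

static void cache_free(struct cache *me, void *obj)
{
	me->tgt = *(struct cache **)obj;        /* may differ from 'me' */
	printf("freed on %s, owner hint %s\n", me->name, me->tgt->name);
	free(obj);
}

int main(void)
{
	struct cache a = { "cache-A", NULL }, b = { "cache-B", NULL };
	void *obj = cache_alloc(&a);

	if (!obj)
		return 1;
	cache_free(&b, obj);                    /* cross-cache free */
	return 0;
}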
From patchwork Wed Jun 21 02:32:34 2023
X-Patchwork-Submitter: Alexei Starovoitov
X-Patchwork-Id: 13286587
From: Alexei Starovoitov
To: daniel@iogearbox.net, andrii@kernel.org, void@manifault.com,
    houtao@huaweicloud.com, paulmck@kernel.org
Cc: tj@kernel.org, rcu@vger.kernel.org, netdev@vger.kernel.org,
    bpf@vger.kernel.org, kernel-team@fb.com
Subject: [PATCH bpf-next 08/12] bpf: Allow reuse from waiting_for_gp_ttrace list.
Date: Tue, 20 Jun 2023 19:32:34 -0700
Message-Id: <20230621023238.87079-9-alexei.starovoitov@gmail.com>
In-Reply-To: <20230621023238.87079-1-alexei.starovoitov@gmail.com>
References: <20230621023238.87079-1-alexei.starovoitov@gmail.com>

From: Alexei Starovoitov

alloc_bulk() can reuse elements from free_by_rcu_ttrace.
Let it reuse from waiting_for_gp_ttrace as well to avoid
unnecessary kmalloc().

Signed-off-by: Alexei Starovoitov
---
 kernel/bpf/memalloc.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/kernel/bpf/memalloc.c b/kernel/bpf/memalloc.c
index 8b7645bffd1a..10d027674743 100644
--- a/kernel/bpf/memalloc.c
+++ b/kernel/bpf/memalloc.c
@@ -202,6 +202,15 @@ static void alloc_bulk(struct bpf_mem_cache *c, int cnt, int node)
 	if (i >= cnt)
 		return;
 
+	for (; i < cnt; i++) {
+		obj = llist_del_first(&c->waiting_for_gp_ttrace);
+		if (!obj)
+			break;
+		add_obj_to_free_list(c, obj);
+	}
+	if (i >= cnt)
+		return;
+
 	memcg = get_memcg(c);
 	old_memcg = set_active_memcg(memcg);
 	for (; i < cnt; i++) {
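With this patch alloc_bulk() cascades through three sources in order and
reaches the allocator only when both recycle lists run dry. A toy model
of that ordering, with counters standing in for the two llists and a
counter for the slab path (illustrative only, not the kernel code):

#include <stdio.h>

struct cache { int free_by_rcu_ttrace; int waiting_for_gp_ttrace; };

static int take(int *n)
{
	if (!*n)
		return 0;
	(*n)--;
	return 1;
}

static int refill(struct cache *c, int cnt)
{
	int i = 0, from_alloc = 0;

	for (; i < cnt && take(&c->free_by_rcu_ttrace); i++)
		;                               /* cheapest source first */
	for (; i < cnt && take(&c->waiting_for_gp_ttrace); i++)
		;                               /* new in this patch */
	for (; i < cnt; i++)
		from_alloc++;                   /* kmalloc analogue */
	printf("%d objects, %d from the allocator\n", i, from_alloc);
	return i;
}

int main(void)
{
	struct cache c = { .free_by_rcu_ttrace = 1, .waiting_for_gp_ttrace = 2 };

	refill(&c, 5);
	return 0;
}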
KOBt54tjOrzymgN0U3UAWsqdHuKRxTYLgo8GTHM0IkQZvNXo5uqghoyp4hpfVJPHLYy6 f6FaqmNKIwf5xKaYYjfB3q9SUT8ut8lBldlGiFK0yVJcN5fAq+tC8ei5qTIU9pFY7bmU KjmkPup42MYAgRmYNTN3Nz/R8rGCJMgq5/hjZvHTHNgFjJqk2gGaNUrsXxnnmMtLNQVn b3fQ== X-Gm-Message-State: AC+VfDy0woEFBt2OiUW90e/e+8F790b4AHWFFpUoUnS9nA9PhHOYwgIj zgwafroPG52vkJxMRDzeYwQ= X-Google-Smtp-Source: ACHHUZ76+u1Qgg4DLJ6cVJswS48hyHGNQJ1ZvA7sgFImciRpXkia92T++/EuLi7ytgn24KKIS3/i7Q== X-Received: by 2002:a05:6a00:1a56:b0:658:c1a9:becc with SMTP id h22-20020a056a001a5600b00658c1a9beccmr20215081pfv.12.1687314798237; Tue, 20 Jun 2023 19:33:18 -0700 (PDT) Received: from localhost.localdomain ([2620:10d:c090:400::5:e719]) by smtp.gmail.com with ESMTPSA id d20-20020aa78154000000b00666a83bd544sm1894485pfn.23.2023.06.20.19.33.16 (version=TLS1_3 cipher=TLS_CHACHA20_POLY1305_SHA256 bits=256/256); Tue, 20 Jun 2023 19:33:17 -0700 (PDT) From: Alexei Starovoitov To: daniel@iogearbox.net, andrii@kernel.org, void@manifault.com, houtao@huaweicloud.com, paulmck@kernel.org Cc: tj@kernel.org, rcu@vger.kernel.org, netdev@vger.kernel.org, bpf@vger.kernel.org, kernel-team@fb.com Subject: [PATCH bpf-next 09/12] rcu: Export rcu_request_urgent_qs_task() Date: Tue, 20 Jun 2023 19:32:35 -0700 Message-Id: <20230621023238.87079-10-alexei.starovoitov@gmail.com> X-Mailer: git-send-email 2.39.2 (Apple Git-143) In-Reply-To: <20230621023238.87079-1-alexei.starovoitov@gmail.com> References: <20230621023238.87079-1-alexei.starovoitov@gmail.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: rcu@vger.kernel.org From: "Paul E. McKenney" If a CPU is executing a long series of non-sleeping system calls, RCU grace periods can be delayed for on the order of a couple hundred milliseconds. This is normally not a problem, but if each system call does a call_rcu(), those callbacks can stack up. RCU will eventually notice this callback storm, but use of rcu_request_urgent_qs_task() allows the code invoking call_rcu() to give RCU a heads up. This function is not for general use, not yet, anyway. Reported-by: Alexei Starovoitov Signed-off-by: Paul E. McKenney Signed-off-by: Alexei Starovoitov --- include/linux/rcutiny.h | 2 ++ include/linux/rcutree.h | 1 + kernel/rcu/rcu.h | 2 -- 3 files changed, 3 insertions(+), 2 deletions(-) diff --git a/include/linux/rcutiny.h b/include/linux/rcutiny.h index 7f17acf29dda..7b949292908a 100644 --- a/include/linux/rcutiny.h +++ b/include/linux/rcutiny.h @@ -138,6 +138,8 @@ static inline int rcu_needs_cpu(void) return 0; } +static inline void rcu_request_urgent_qs_task(struct task_struct *t) { } + /* * Take advantage of the fact that there is only one CPU, which * allows us to ignore virtualization-based context switches. diff --git a/include/linux/rcutree.h b/include/linux/rcutree.h index 56bccb5a8fde..126f6b418f6a 100644 --- a/include/linux/rcutree.h +++ b/include/linux/rcutree.h @@ -21,6 +21,7 @@ void rcu_softirq_qs(void); void rcu_note_context_switch(bool preempt); int rcu_needs_cpu(void); void rcu_cpu_stall_reset(void); +void rcu_request_urgent_qs_task(struct task_struct *t); /* * Note a virtualization-based context switch. 
diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h
index 4a1b9622598b..6f5fb3f7ebf3 100644
--- a/kernel/rcu/rcu.h
+++ b/kernel/rcu/rcu.h
@@ -493,7 +493,6 @@ static inline void rcu_expedite_gp(void) { }
 static inline void rcu_unexpedite_gp(void) { }
 static inline void rcu_async_hurry(void) { }
 static inline void rcu_async_relax(void) { }
-static inline void rcu_request_urgent_qs_task(struct task_struct *t) { }
 #else /* #ifdef CONFIG_TINY_RCU */
 bool rcu_gp_is_normal(void);     /* Internal RCU use. */
 bool rcu_gp_is_expedited(void);  /* Internal RCU use. */
@@ -508,7 +507,6 @@ void show_rcu_tasks_gp_kthreads(void);
 #else /* #ifdef CONFIG_TASKS_RCU_GENERIC */
 static inline void show_rcu_tasks_gp_kthreads(void) {}
 #endif /* #else #ifdef CONFIG_TASKS_RCU_GENERIC */
-void rcu_request_urgent_qs_task(struct task_struct *t);
 #endif /* #else #ifdef CONFIG_TINY_RCU */
 
 #define RCU_SCHEDULER_INACTIVE 0
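The intended calling pattern shows up in check_free_by_rcu() in patch 11 of this series; condensed from that patch, with the surrounding list handling elided (this is a fragment, not a complete function):

	/* Condensed from check_free_by_rcu() later in this series. */
	if (atomic_xchg(&c->call_rcu_in_progress, 1)) {
		/* An earlier callback is still in flight. Rather than
		 * queueing thousands of additional rcu_head callbacks just
		 * to make RCU notice the overload, give it an explicit
		 * nudge toward the next quiescent state.
		 */
		rcu_request_urgent_qs_task(current);
		return;
	}
	call_rcu_hurry(&c->rcu, __free_by_rcu);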
From patchwork Wed Jun 21 02:32:36 2023

From: Alexei Starovoitov
To: daniel@iogearbox.net, andrii@kernel.org, void@manifault.com, houtao@huaweicloud.com, paulmck@kernel.org
Cc: tj@kernel.org, rcu@vger.kernel.org, netdev@vger.kernel.org, bpf@vger.kernel.org, kernel-team@fb.com
Subject: [PATCH bpf-next 10/12] selftests/bpf: Improve test coverage of bpf_mem_alloc.
Date: Tue, 20 Jun 2023 19:32:36 -0700
Message-Id: <20230621023238.87079-11-alexei.starovoitov@gmail.com>
In-Reply-To: <20230621023238.87079-1-alexei.starovoitov@gmail.com>
References: <20230621023238.87079-1-alexei.starovoitov@gmail.com>

From: Alexei Starovoitov

bpf_obj_new() calls bpf_mem_alloc(), but allocating and freeing only 8
elements does not trigger the watermark conditions in bpf_mem_alloc.
Increase the count to 200 elements to make sure alloc_bulk()/free_bulk()
are exercised.

Signed-off-by: Alexei Starovoitov
---
 tools/testing/selftests/bpf/progs/linked_list.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/testing/selftests/bpf/progs/linked_list.c b/tools/testing/selftests/bpf/progs/linked_list.c
index 57440a554304..84d1777a9e6c 100644
--- a/tools/testing/selftests/bpf/progs/linked_list.c
+++ b/tools/testing/selftests/bpf/progs/linked_list.c
@@ -96,7 +96,7 @@ static __always_inline
 int list_push_pop_multiple(struct bpf_spin_lock *lock, struct bpf_list_head *head, bool leave_in_map)
 {
 	struct bpf_list_node *n;
-	struct foo *f[8], *pf;
+	struct foo *f[200], *pf;
 	int i;
 
 	/* Loop following this check adds nodes 2-at-a-time in order to
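For context, the watermark logic this test now exercises lives in bpf_mem_refill() (the full hunk appears in the next patch). A condensed outline, with an illustrative helper name and the IRQ-work plumbing elided:

/* Condensed outline of bpf_mem_refill() in kernel/bpf/memalloc.c.
 * The watermark values depend on the per-cache unit size, so the
 * selftest needs enough elements to cross them on each CPU.
 */
static void bpf_mem_refill_outline(struct bpf_mem_cache *c)
{
	/* Racy read of free_cnt; it does not need to be exact. */
	int cnt = c->free_cnt;

	if (cnt < c->low_watermark)
		/* Too few cached objects: refill from RCU lists or slab. */
		alloc_bulk(c, c->batch, NUMA_NO_NODE);
	else if (cnt > c->high_watermark)
		/* Too many cached objects: flush a batch back. */
		free_bulk(c);
}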
From patchwork Wed Jun 21 02:32:37 2023

From: Alexei Starovoitov
To: daniel@iogearbox.net, andrii@kernel.org, void@manifault.com, houtao@huaweicloud.com, paulmck@kernel.org
Cc: tj@kernel.org, rcu@vger.kernel.org, netdev@vger.kernel.org, bpf@vger.kernel.org, kernel-team@fb.com
Subject: [PATCH bpf-next 11/12] bpf: Introduce bpf_mem_free_rcu() similar to kfree_rcu().
Date: Tue, 20 Jun 2023 19:32:37 -0700
Message-Id: <20230621023238.87079-12-alexei.starovoitov@gmail.com>
In-Reply-To: <20230621023238.87079-1-alexei.starovoitov@gmail.com>
References: <20230621023238.87079-1-alexei.starovoitov@gmail.com>

From: Alexei Starovoitov

Introduce bpf_mem_[cache_]free_rcu(), similar to kfree_rcu(). Unlike
bpf_mem_[cache_]free(), which links objects into a per-cpu free list
for immediate reuse, the _rcu() flavor waits for an RCU grace period
and then moves objects onto the free_by_rcu_ttrace list, where they
wait for an RCU tasks trace grace period before being freed into slab.
The life cycle of objects:

alloc: dequeue free_llist
free: enqueue free_llist
free_rcu: enqueue free_by_rcu -> waiting_for_gp
free_llist above high watermark -> free_by_rcu_ttrace
after RCU GP waiting_for_gp -> free_by_rcu_ttrace
free_by_rcu_ttrace -> waiting_for_gp_ttrace -> slab

Signed-off-by: Alexei Starovoitov
---
 include/linux/bpf_mem_alloc.h |   2 +
 kernel/bpf/memalloc.c         | 118 ++++++++++++++++++++++++++++++++--
 2 files changed, 116 insertions(+), 4 deletions(-)

diff --git a/include/linux/bpf_mem_alloc.h b/include/linux/bpf_mem_alloc.h
index 3929be5743f4..d644bbb298af 100644
--- a/include/linux/bpf_mem_alloc.h
+++ b/include/linux/bpf_mem_alloc.h
@@ -27,10 +27,12 @@ void bpf_mem_alloc_destroy(struct bpf_mem_alloc *ma);
 /* kmalloc/kfree equivalent: */
 void *bpf_mem_alloc(struct bpf_mem_alloc *ma, size_t size);
 void bpf_mem_free(struct bpf_mem_alloc *ma, void *ptr);
+void bpf_mem_free_rcu(struct bpf_mem_alloc *ma, void *ptr);
 
 /* kmem_cache_alloc/free equivalent: */
 void *bpf_mem_cache_alloc(struct bpf_mem_alloc *ma);
 void bpf_mem_cache_free(struct bpf_mem_alloc *ma, void *ptr);
+void bpf_mem_cache_free_rcu(struct bpf_mem_alloc *ma, void *ptr);
 void bpf_mem_cache_raw_free(void *ptr);
 void *bpf_mem_cache_alloc_flags(struct bpf_mem_alloc *ma, gfp_t flags);

diff --git a/kernel/bpf/memalloc.c b/kernel/bpf/memalloc.c
index 10d027674743..4d1002e7b4b5 100644
--- a/kernel/bpf/memalloc.c
+++ b/kernel/bpf/memalloc.c
@@ -100,6 +100,15 @@ struct bpf_mem_cache {
 	int percpu_size;
 	struct bpf_mem_cache *tgt;
 
+	/* list of objects to be freed after RCU GP */
+	struct llist_head free_by_rcu;
+	struct llist_node *free_by_rcu_tail;
+	struct llist_head waiting_for_gp;
+	struct llist_node *waiting_for_gp_tail;
+	struct rcu_head rcu;
+	atomic_t call_rcu_in_progress;
+	struct llist_head free_llist_extra_rcu;
+
 	/* list of objects to be freed after RCU tasks trace GP */
 	struct llist_head free_by_rcu_ttrace;
 	struct llist_node *free_by_rcu_ttrace_tail;
@@ -340,6 +349,56 @@ static void free_bulk(struct bpf_mem_cache *c)
 	do_call_rcu_ttrace(tgt);
 }
 
+static void __free_by_rcu(struct rcu_head *head)
+{
+	struct bpf_mem_cache *c = container_of(head, struct bpf_mem_cache, rcu);
+	struct bpf_mem_cache *tgt = c->tgt;
+	struct llist_node *llnode = llist_del_all(&c->waiting_for_gp);
+
+	if (!llnode)
+		goto out;
+
+	if (llist_add_batch(llnode, c->waiting_for_gp_tail, &tgt->free_by_rcu_ttrace))
+		tgt->free_by_rcu_ttrace_tail = c->waiting_for_gp_tail;
+
+	/* Objects went through regular RCU GP. Send them to RCU tasks trace */
+	do_call_rcu_ttrace(tgt);
+out:
+	atomic_set(&c->call_rcu_in_progress, 0);
+}
+
+static void check_free_by_rcu(struct bpf_mem_cache *c)
+{
+	struct llist_node *llnode, *t;
+
+	if (llist_empty(&c->free_by_rcu) && llist_empty(&c->free_llist_extra_rcu))
+		return;
+
+	/* drain free_llist_extra_rcu */
+	llist_for_each_safe(llnode, t, llist_del_all(&c->free_llist_extra_rcu))
+		if (__llist_add(llnode, &c->free_by_rcu))
+			c->free_by_rcu_tail = llnode;
+
+	if (atomic_xchg(&c->call_rcu_in_progress, 1)) {
+		/*
+		 * Instead of kmalloc-ing a new rcu_head and triggering 10k
+		 * call_rcu() to hit rcutree.qhimark and force RCU to notice
+		 * the overload, just ask RCU to hurry up. There could be many
+		 * objects in the free_by_rcu list.
+		 * This hint reduces memory consumption for an artificial
+		 * benchmark from 2 Gbyte to 150 Mbyte.
+		 */
+		rcu_request_urgent_qs_task(current);
+		return;
+	}
+
+	WARN_ON_ONCE(!llist_empty(&c->waiting_for_gp));
+
+	WRITE_ONCE(c->waiting_for_gp.first, __llist_del_all(&c->free_by_rcu));
+	c->waiting_for_gp_tail = c->free_by_rcu_tail;
+	call_rcu_hurry(&c->rcu, __free_by_rcu);
+}
+
 static void bpf_mem_refill(struct irq_work *work)
 {
 	struct bpf_mem_cache *c = container_of(work, struct bpf_mem_cache, refill_work);
@@ -354,6 +413,8 @@ static void bpf_mem_refill(struct irq_work *work)
 		alloc_bulk(c, c->batch, NUMA_NO_NODE);
 	else if (cnt > c->high_watermark)
 		free_bulk(c);
+
+	check_free_by_rcu(c);
 }
 
 static void notrace irq_work_raise(struct bpf_mem_cache *c)
@@ -482,6 +543,9 @@ static void drain_mem_cache(struct bpf_mem_cache *c)
 	free_all(llist_del_all(&c->waiting_for_gp_ttrace), percpu);
 	free_all(__llist_del_all(&c->free_llist), percpu);
 	free_all(__llist_del_all(&c->free_llist_extra), percpu);
+	free_all(__llist_del_all(&c->free_by_rcu), percpu);
+	free_all(__llist_del_all(&c->free_llist_extra_rcu), percpu);
+	free_all(llist_del_all(&c->waiting_for_gp), percpu);
 }
 
 static void free_mem_alloc_no_barrier(struct bpf_mem_alloc *ma)
@@ -494,8 +558,8 @@ static void free_mem_alloc_no_barrier(struct bpf_mem_alloc *ma)
 
 static void free_mem_alloc(struct bpf_mem_alloc *ma)
 {
-	/* waiting_for_gp_ttrace lists was drained, but __free_rcu might
-	 * still execute. Wait for it now before we freeing percpu caches.
+	/* waiting_for_gp[_ttrace] lists were drained, but RCU callbacks
+	 * might still execute. Wait for them.
 	 *
 	 * rcu_barrier_tasks_trace() doesn't imply synchronize_rcu_tasks_trace(),
 	 * but rcu_barrier_tasks_trace() and rcu_barrier() below are only used
@@ -504,9 +568,10 @@ static void free_mem_alloc(struct bpf_mem_alloc *ma)
 	 * rcu_trace_implies_rcu_gp(), it will be OK to skip rcu_barrier() by
 	 * using rcu_trace_implies_rcu_gp() as well.
 	 */
-	rcu_barrier_tasks_trace();
+	rcu_barrier(); /* wait for __free_by_rcu() */
+	rcu_barrier_tasks_trace(); /* wait for __free_rcu() via call_rcu_tasks_trace */
 	if (!rcu_trace_implies_rcu_gp())
-		rcu_barrier();
+		rcu_barrier(); /* wait for __free_rcu() via call_rcu */
 	free_mem_alloc_no_barrier(ma);
 }
@@ -565,6 +630,7 @@ void bpf_mem_alloc_destroy(struct bpf_mem_alloc *ma)
 			irq_work_sync(&c->refill_work);
 			drain_mem_cache(c);
 			rcu_in_progress += atomic_read(&c->call_rcu_ttrace_in_progress);
+			rcu_in_progress += atomic_read(&c->call_rcu_in_progress);
 		}
 		/* objcg is the same across cpus */
 		if (c->objcg)
@@ -580,6 +646,7 @@ void bpf_mem_alloc_destroy(struct bpf_mem_alloc *ma)
 				irq_work_sync(&c->refill_work);
 				drain_mem_cache(c);
 				rcu_in_progress += atomic_read(&c->call_rcu_ttrace_in_progress);
+				rcu_in_progress += atomic_read(&c->call_rcu_in_progress);
 			}
 		}
 		if (c->objcg)
@@ -664,6 +731,27 @@ static void notrace unit_free(struct bpf_mem_cache *c, void *ptr)
 		irq_work_raise(c);
 }
 
+static void notrace unit_free_rcu(struct bpf_mem_cache *c, void *ptr)
+{
+	struct llist_node *llnode = ptr - LLIST_NODE_SZ;
+	unsigned long flags;
+
+	c->tgt = *(struct bpf_mem_cache **)llnode;
+
+	local_irq_save(flags);
+	if (local_inc_return(&c->active) == 1) {
+		if (__llist_add(llnode, &c->free_by_rcu))
+			c->free_by_rcu_tail = llnode;
+	} else {
+		llist_add(llnode, &c->free_llist_extra_rcu);
+	}
+	local_dec(&c->active);
+	local_irq_restore(flags);
+
+	if (!atomic_read(&c->call_rcu_in_progress))
+		irq_work_raise(c);
+}
+
 /* Called from BPF program or from sys_bpf syscall.
  * In both cases migration is disabled.
 */
@@ -697,6 +785,20 @@ void notrace bpf_mem_free(struct bpf_mem_alloc *ma, void *ptr)
 	unit_free(this_cpu_ptr(ma->caches)->cache + idx, ptr);
 }
 
+void notrace bpf_mem_free_rcu(struct bpf_mem_alloc *ma, void *ptr)
+{
+	int idx;
+
+	if (!ptr)
+		return;
+
+	idx = bpf_mem_cache_idx(ksize(ptr - LLIST_NODE_SZ));
+	if (idx < 0)
+		return;
+
+	unit_free_rcu(this_cpu_ptr(ma->caches)->cache + idx, ptr);
+}
+
 void notrace *bpf_mem_cache_alloc(struct bpf_mem_alloc *ma)
 {
 	void *ret;
@@ -713,6 +815,14 @@ void notrace bpf_mem_cache_free(struct bpf_mem_alloc *ma, void *ptr)
 	unit_free(this_cpu_ptr(ma->cache), ptr);
 }
 
+void notrace bpf_mem_cache_free_rcu(struct bpf_mem_alloc *ma, void *ptr)
+{
+	if (!ptr)
+		return;
+
+	unit_free_rcu(this_cpu_ptr(ma->cache), ptr);
+}
+
 /* Directly does a kfree() without putting 'ptr' back to the free_llist
  * for reuse and without waiting for a rcu_tasks_trace gp.
  * The caller must first go through the rcu_tasks_trace gp for 'ptr'
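To make the list-hopping in the commit message concrete, here is a small stand-alone user-space model of the _rcu() lifecycle. It is plain C with illustrative names; it models only the stages an object visits, not the kernel's llist handling or grace-period machinery.

#include <stdio.h>

/* Stages an object visits in the _rcu() flavor, per the commit message. */
enum stage {
	FREE_BY_RCU,           /* queued by bpf_mem_free_rcu() */
	WAITING_FOR_GP,        /* parked until the regular RCU GP elapses */
	FREE_BY_RCU_TTRACE,    /* regular GP done; handed to tasks trace */
	WAITING_FOR_GP_TTRACE, /* parked until the tasks trace GP elapses */
	SLAB,                  /* both GPs done; memory returned to slab */
};

static enum stage advance(enum stage s)
{
	switch (s) {
	case FREE_BY_RCU:           return WAITING_FOR_GP;
	case WAITING_FOR_GP:        return FREE_BY_RCU_TTRACE;    /* RCU GP */
	case FREE_BY_RCU_TTRACE:    return WAITING_FOR_GP_TTRACE;
	case WAITING_FOR_GP_TTRACE: return SLAB;                  /* ttrace GP */
	default:                    return SLAB;
	}
}

int main(void)
{
	static const char *names[] = {
		"free_by_rcu", "waiting_for_gp", "free_by_rcu_ttrace",
		"waiting_for_gp_ttrace", "slab",
	};
	enum stage s = FREE_BY_RCU;

	while (s != SLAB) {
		printf("%s -> ", names[s]);
		s = advance(s);
	}
	printf("%s\n", names[SLAB]);
	return 0;
}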
From patchwork Wed Jun 21 02:32:38 2023

From: Alexei Starovoitov
To: daniel@iogearbox.net, andrii@kernel.org, void@manifault.com, houtao@huaweicloud.com, paulmck@kernel.org
Cc: tj@kernel.org, rcu@vger.kernel.org, netdev@vger.kernel.org, bpf@vger.kernel.org, kernel-team@fb.com
Subject: [PATCH bpf-next 12/12] bpf: Convert bpf_cpumask to bpf_mem_cache_free_rcu.
Date: Tue, 20 Jun 2023 19:32:38 -0700
Message-Id: <20230621023238.87079-13-alexei.starovoitov@gmail.com>
In-Reply-To: <20230621023238.87079-1-alexei.starovoitov@gmail.com>
References: <20230621023238.87079-1-alexei.starovoitov@gmail.com>

From: Alexei Starovoitov

Convert bpf_cpumask to bpf_mem_cache_free_rcu(). This drops the private
rcu_head and the cpumask_free_cb() callback, since the allocator now
provides the RCU safety itself.

Signed-off-by: Alexei Starovoitov
---
 kernel/bpf/cpumask.c | 20 ++++++--------------
 1 file changed, 6 insertions(+), 14 deletions(-)

diff --git a/kernel/bpf/cpumask.c b/kernel/bpf/cpumask.c
index 938a60ff4295..6983af8e093c 100644
--- a/kernel/bpf/cpumask.c
+++ b/kernel/bpf/cpumask.c
@@ -9,7 +9,6 @@
 /**
  * struct bpf_cpumask - refcounted BPF cpumask wrapper structure
  * @cpumask: The actual cpumask embedded in the struct.
- * @rcu: The RCU head used to free the cpumask with RCU safety.
  * @usage: Object reference counter. When the refcount goes to 0, the
  *	   memory is released back to the BPF allocator, which provides
  *	   RCU safety.
@@ -25,7 +24,6 @@
  */
 struct bpf_cpumask {
 	cpumask_t cpumask;
-	struct rcu_head rcu;
 	refcount_t usage;
 };
 
@@ -82,16 +80,6 @@ __bpf_kfunc struct bpf_cpumask *bpf_cpumask_acquire(struct bpf_cpumask *cpumask)
 	return cpumask;
 }
 
-static void cpumask_free_cb(struct rcu_head *head)
-{
-	struct bpf_cpumask *cpumask;
-
-	cpumask = container_of(head, struct bpf_cpumask, rcu);
-	migrate_disable();
-	bpf_mem_cache_free(&bpf_cpumask_ma, cpumask);
-	migrate_enable();
-}
-
 /**
  * bpf_cpumask_release() - Release a previously acquired BPF cpumask.
  * @cpumask: The cpumask being released.
@@ -102,8 +90,12 @@ static void cpumask_free_cb(struct rcu_head *head)
  */
 __bpf_kfunc void bpf_cpumask_release(struct bpf_cpumask *cpumask)
 {
-	if (refcount_dec_and_test(&cpumask->usage))
-		call_rcu(&cpumask->rcu, cpumask_free_cb);
+	if (!refcount_dec_and_test(&cpumask->usage))
+		return;
+
+	migrate_disable();
+	bpf_mem_cache_free_rcu(&bpf_cpumask_ma, cpumask);
+	migrate_enable();
 }
 
 /**
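The same conversion recipe should apply to other objects that embed an rcu_head only to defer their free. A hypothetical sketch: 'struct foo', 'foo_ma', and 'foo_release()' are illustrative names, not part of this series; only bpf_mem_cache_free_rcu(), refcount_dec_and_test(), and migrate_disable()/migrate_enable() are the real kernel APIs used above.

/* Hypothetical object following the bpf_cpumask conversion recipe.
 * foo_ma would be set up at init time with
 * bpf_mem_alloc_init(&foo_ma, sizeof(struct foo), false), as
 * bpf_cpumask_ma is in kernel/bpf/cpumask.c.
 */
static struct bpf_mem_alloc foo_ma;

struct foo {
	refcount_t usage;
	/* struct rcu_head rcu;  <- no longer needed */
};

static void foo_release(struct foo *f)
{
	if (!refcount_dec_and_test(&f->usage))
		return;

	/* The allocator defers reuse past both the regular RCU and the
	 * RCU tasks trace grace periods, so no private call_rcu()
	 * callback is required.
	 */
	migrate_disable();
	bpf_mem_cache_free_rcu(&foo_ma, f);
	migrate_enable();
}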