From patchwork Fri Sep 2 21:10:53 2022
X-Patchwork-Submitter: Alexei Starovoitov
X-Patchwork-Id: 12964689
From: Alexei Starovoitov
To: davem@davemloft.net
Cc: daniel@iogearbox.net, andrii@kernel.org,
	tj@kernel.org, memxor@gmail.com, delyank@fb.com, linux-mm@kvack.org,
	bpf@vger.kernel.org, kernel-team@fb.com
Subject: [PATCH v6 bpf-next 11/16] bpf: Convert percpu hash map to per-cpu bpf_mem_alloc.
Date: Fri, 2 Sep 2022 14:10:53 -0700
Message-Id: <20220902211058.60789-12-alexei.starovoitov@gmail.com>
X-Mailer: git-send-email 2.36.1
In-Reply-To: <20220902211058.60789-1-alexei.starovoitov@gmail.com>
References: <20220902211058.60789-1-alexei.starovoitov@gmail.com>

From: Alexei Starovoitov

Convert dynamic allocations in the percpu hash map from alloc_percpu() to
bpf_mem_cache_alloc() from a per-cpu bpf_mem_alloc. Since bpf_mem_alloc frees
objects after an RCU grace period, the call_rcu() path is removed.

pcpu_init_value() now needs to zero-fill per-cpu allocations, because
dynamically allocated map elements now behave like fully preallocated ones:
alloc_percpu() is no longer called inline and the elements are reused from
the freelist.
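To make the zero-fill requirement concrete, this is roughly what the
pcpu_init_value() path looks like after the change. This is a sketch only:
the for_each_possible_cpu() body is unchanged context that the hunk below
does not show.

static void pcpu_init_value(struct bpf_htab *htab, void __percpu *pptr,
			    void *value, bool onallcpus)
{
	/* onallcpus == false whenever the update comes from a bpf program,
	 * so only the current cpu receives the caller-supplied value and
	 * every other cpu must be explicitly zeroed.
	 */
	if (!onallcpus) {
		u32 size = round_up(htab->map.value_size, 8);
		int current_cpu = raw_smp_processor_id();
		int cpu;

		for_each_possible_cpu(cpu) {
			if (cpu == current_cpu)
				/* value supplied by the caller lands on this cpu */
				bpf_long_memcpy(per_cpu_ptr(pptr, cpu), value, size);
			else
				/* all other cpus see a known, zeroed initial value */
				memset(per_cpu_ptr(pptr, cpu), 0, size);
		}
	} else {
		pcpu_copy_value(htab, pptr, value, onallcpus);
	}
}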
Acked-by: Kumar Kartikeya Dwivedi
Acked-by: Andrii Nakryiko
Signed-off-by: Alexei Starovoitov
---
 kernel/bpf/hashtab.c | 45 +++++++++++++++++++--------------------------
 1 file changed, 19 insertions(+), 26 deletions(-)

diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index 70b02ff4445e..a77b9c4a4e48 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -94,6 +94,7 @@ struct bucket {
 struct bpf_htab {
 	struct bpf_map map;
 	struct bpf_mem_alloc ma;
+	struct bpf_mem_alloc pcpu_ma;
 	struct bucket *buckets;
 	void *elems;
 	union {
@@ -121,14 +122,14 @@ struct htab_elem {
 		struct {
 			void *padding;
 			union {
-				struct bpf_htab *htab;
 				struct pcpu_freelist_node fnode;
 				struct htab_elem *batch_flink;
 			};
 		};
 	};
 	union {
-		struct rcu_head rcu;
+		/* pointer to per-cpu pointer */
+		void *ptr_to_pptr;
 		struct bpf_lru_node lru_node;
 	};
 	u32 hash;
@@ -448,8 +449,6 @@ static int htab_map_alloc_check(union bpf_attr *attr)
 	bool zero_seed = (attr->map_flags & BPF_F_ZERO_SEED);
 	int numa_node = bpf_map_attr_numa_node(attr);
 
-	BUILD_BUG_ON(offsetof(struct htab_elem, htab) !=
-		     offsetof(struct htab_elem, hash_node.pprev));
 	BUILD_BUG_ON(offsetof(struct htab_elem, fnode.next) !=
 		     offsetof(struct htab_elem, hash_node.pprev));
 
@@ -610,6 +609,12 @@ static struct bpf_map *htab_map_alloc(union bpf_attr *attr)
 		err = bpf_mem_alloc_init(&htab->ma, htab->elem_size, false);
 		if (err)
 			goto free_map_locked;
+		if (percpu) {
+			err = bpf_mem_alloc_init(&htab->pcpu_ma,
+						 round_up(htab->map.value_size, 8), true);
+			if (err)
+				goto free_map_locked;
+		}
 	}
 
 	return &htab->map;
@@ -620,6 +625,7 @@ static struct bpf_map *htab_map_alloc(union bpf_attr *attr)
 	for (i = 0; i < HASHTAB_MAP_LOCK_COUNT; i++)
 		free_percpu(htab->map_locked[i]);
 	bpf_map_area_free(htab->buckets);
+	bpf_mem_alloc_destroy(&htab->pcpu_ma);
 	bpf_mem_alloc_destroy(&htab->ma);
 free_htab:
 	lockdep_unregister_key(&htab->lockdep_key);
@@ -895,19 +901,11 @@ static int htab_map_get_next_key(struct bpf_map *map, void *key, void *next_key)
 static void htab_elem_free(struct bpf_htab *htab, struct htab_elem *l)
 {
 	if (htab->map.map_type == BPF_MAP_TYPE_PERCPU_HASH)
-		free_percpu(htab_elem_get_ptr(l, htab->map.key_size));
+		bpf_mem_cache_free(&htab->pcpu_ma, l->ptr_to_pptr);
 	check_and_free_fields(htab, l);
 	bpf_mem_cache_free(&htab->ma, l);
 }
 
-static void htab_elem_free_rcu(struct rcu_head *head)
-{
-	struct htab_elem *l = container_of(head, struct htab_elem, rcu);
-	struct bpf_htab *htab = l->htab;
-
-	htab_elem_free(htab, l);
-}
-
 static void htab_put_fd_value(struct bpf_htab *htab, struct htab_elem *l)
 {
 	struct bpf_map *map = &htab->map;
@@ -953,12 +951,7 @@ static void free_htab_elem(struct bpf_htab *htab, struct htab_elem *l)
 		__pcpu_freelist_push(&htab->freelist, &l->fnode);
 	} else {
 		dec_elem_count(htab);
-		if (htab->map.map_type == BPF_MAP_TYPE_PERCPU_HASH) {
-			l->htab = htab;
-			call_rcu(&l->rcu, htab_elem_free_rcu);
-		} else {
-			htab_elem_free(htab, l);
-		}
+		htab_elem_free(htab, l);
 	}
 }
 
@@ -983,13 +976,12 @@ static void pcpu_copy_value(struct bpf_htab *htab, void __percpu *pptr,
 static void pcpu_init_value(struct bpf_htab *htab, void __percpu *pptr,
 			    void *value, bool onallcpus)
 {
-	/* When using prealloc and not setting the initial value on all cpus,
-	 * zero-fill element values for other cpus (just as what happens when
-	 * not using prealloc). Otherwise, bpf program has no way to ensure
+	/* When not setting the initial value on all cpus, zero-fill element
+	 * values for other cpus. Otherwise, bpf program has no way to ensure
 	 * known initial values for cpus other than current one
 	 * (onallcpus=false always when coming from bpf prog).
 	 */
-	if (htab_is_prealloc(htab) && !onallcpus) {
+	if (!onallcpus) {
 		u32 size = round_up(htab->map.value_size, 8);
 		int current_cpu = raw_smp_processor_id();
 		int cpu;
@@ -1060,18 +1052,18 @@ static struct htab_elem *alloc_htab_elem(struct bpf_htab *htab, void *key,
 
 	memcpy(l_new->key, key, key_size);
 	if (percpu) {
-		size = round_up(size, 8);
 		if (prealloc) {
 			pptr = htab_elem_get_ptr(l_new, key_size);
 		} else {
 			/* alloc_percpu zero-fills */
-			pptr = bpf_map_alloc_percpu(&htab->map, size, 8,
-						    GFP_NOWAIT | __GFP_NOWARN);
+			pptr = bpf_mem_cache_alloc(&htab->pcpu_ma);
 			if (!pptr) {
 				bpf_mem_cache_free(&htab->ma, l_new);
 				l_new = ERR_PTR(-ENOMEM);
 				goto dec_count;
 			}
+			l_new->ptr_to_pptr = pptr;
+			pptr = *(void **)pptr;
 		}
 
 		pcpu_init_value(htab, pptr, value, onallcpus);
@@ -1568,6 +1560,7 @@ static void htab_map_free(struct bpf_map *map)
 	bpf_map_free_kptr_off_tab(map);
 	free_percpu(htab->extra_elems);
 	bpf_map_area_free(htab->buckets);
+	bpf_mem_alloc_destroy(&htab->pcpu_ma);
 	bpf_mem_alloc_destroy(&htab->ma);
 	if (htab->use_percpu_counter)
 		percpu_counter_destroy(&htab->pcount);