From patchwork Tue Jul 26 13:00:03 2022
X-Patchwork-Submitter: Hou Tao
X-Patchwork-Id: 12929244
X-Patchwork-Delegate: bpf@iogearbox.net
X-Patchwork-State: RFC
X-Mailing-List: bpf@vger.kernel.org
From: Hou Tao
To: Andrii Nakryiko, Alexei Starovoitov
CC: Daniel Borkmann, Martin KaFai Lau, Yonghong Song, Song Liu, KP Singh,
 "David S . Miller", Jakub Kicinski, Stanislav Fomichev, Hao Luo,
 Jiri Olsa, John Fastabend
Subject: [RFC PATCH bpf-next 1/3] bpf: Add support for qp-trie map
Date: Tue, 26 Jul 2022 21:00:03 +0800
Message-ID: <20220726130005.3102470-2-houtao1@huawei.com>
In-Reply-To: <20220726130005.3102470-1-houtao1@huawei.com>
References: <20220726130005.3102470-1-houtao1@huawei.com>

The initial motivation for the qp-trie map is to reduce memory usage for
string keys, especially those with large differences in length. Moreover,
as a big-endian lexicographically-ordered map, qp-trie can also be used
for arbitrary binary data with fixed or variable length.

The memory efficiency of qp-trie comes partly from its design: a branch
node does not store the key, and child nodes are kept in a sparse array;
and partly from its support for variable-length keys: only the used part
of a key is stored. But the memory efficiency and ordered keys come at a
cost: a qp-trie lookup is about 30% or more slower than a hash-table
lookup.
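As a usage sketch (illustrative, not part of this patch): user space
passes a key that starts with struct bpf_qp_trie_key, i.e. a 32-bit
length followed by that many key bytes, and the map's key_size only
bounds the maximum length. The helper below shows this under those
assumptions; example_key, MAX_DATA and qp_trie_insert_str are made-up
names for the example, and the map is assumed to be created as
BPF_MAP_TYPE_QP_TRIE with BPF_F_NO_PREALLOC and
key_size == sizeof(struct example_key):

  #include <errno.h>
  #include <string.h>
  #include <linux/bpf.h>
  #include <bpf/bpf.h>	/* libbpf's bpf_map_update_elem() wrapper */

  #define MAX_DATA 64	/* example capacity: key_size = 4 + MAX_DATA */

  /* Mirrors struct bpf_qp_trie_key plus storage for the variable part.
   * Only data[0..len-1] is significant; the kernel ignores the rest.
   */
  struct example_key {
  	__u32 len;
  	__u8 data[MAX_DATA];
  };

  static int qp_trie_insert_str(int map_fd, const char *s, __u32 value)
  {
  	struct example_key key = {};
  	size_t len = strlen(s);

  	/* the kernel accepts 1 <= len <= key_size - sizeof(__u32) */
  	if (!len || len > sizeof(key.data))
  		return -EINVAL;

  	key.len = len;
  	memcpy(key.data, s, len);
  	return bpf_map_update_elem(map_fd, &key, &value, BPF_NOEXIST);
  }

Lookup and delete take the same key layout, and any bytes past len are
ignored; the selftest in patch 2 relies on exactly this property.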
Signed-off-by: Hou Tao
---
 include/linux/bpf_types.h      |    1 +
 include/uapi/linux/bpf.h       |    8 +
 kernel/bpf/Makefile            |    1 +
 kernel/bpf/bpf_qp_trie.c       | 1064 ++++++++++++++++++++++++++++++++
 tools/include/uapi/linux/bpf.h |    8 +
 5 files changed, 1082 insertions(+)
 create mode 100644 kernel/bpf/bpf_qp_trie.c

diff --git a/include/linux/bpf_types.h b/include/linux/bpf_types.h
index 2b9112b80171..a73d47f83d74 100644
--- a/include/linux/bpf_types.h
+++ b/include/linux/bpf_types.h
@@ -126,6 +126,7 @@ BPF_MAP_TYPE(BPF_MAP_TYPE_STRUCT_OPS, bpf_struct_ops_map_ops)
 #endif
 BPF_MAP_TYPE(BPF_MAP_TYPE_RINGBUF, ringbuf_map_ops)
 BPF_MAP_TYPE(BPF_MAP_TYPE_BLOOM_FILTER, bloom_filter_map_ops)
+BPF_MAP_TYPE(BPF_MAP_TYPE_QP_TRIE, qp_trie_map_ops)

 BPF_LINK_TYPE(BPF_LINK_TYPE_RAW_TRACEPOINT, raw_tracepoint)
 BPF_LINK_TYPE(BPF_LINK_TYPE_TRACING, tracing)
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index ffcbf79a556b..39a65bf0d9f4 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -82,6 +82,13 @@ struct bpf_lpm_trie_key {
 	__u8	data[0];	/* Arbitrary size */
 };

+struct bpf_qp_trie_key {
+	/* the length of blob data */
+	__u32 len;
+	/* blob data */
+	__u8 data[0];
+};
+
 struct bpf_cgroup_storage_key {
 	__u64	cgroup_inode_id;	/* cgroup inode id */
 	__u32	attach_type;		/* program attach type (enum bpf_attach_type) */
@@ -909,6 +916,7 @@ enum bpf_map_type {
 	BPF_MAP_TYPE_INODE_STORAGE,
 	BPF_MAP_TYPE_TASK_STORAGE,
 	BPF_MAP_TYPE_BLOOM_FILTER,
+	BPF_MAP_TYPE_QP_TRIE,
 };

 /* Note that tracing related programs such as
diff --git a/kernel/bpf/Makefile b/kernel/bpf/Makefile
index 057ba8e01e70..99fd5b10d544 100644
--- a/kernel/bpf/Makefile
+++ b/kernel/bpf/Makefile
@@ -10,6 +10,7 @@ obj-$(CONFIG_BPF_SYSCALL) += syscall.o verifier.o inode.o helpers.o tnum.o bpf_i
 obj-$(CONFIG_BPF_SYSCALL) += hashtab.o arraymap.o percpu_freelist.o bpf_lru_list.o lpm_trie.o map_in_map.o bloom_filter.o
 obj-$(CONFIG_BPF_SYSCALL) += local_storage.o queue_stack_maps.o ringbuf.o
 obj-$(CONFIG_BPF_SYSCALL) += bpf_local_storage.o bpf_task_storage.o
+obj-$(CONFIG_BPF_SYSCALL) += bpf_qp_trie.o
 obj-${CONFIG_BPF_LSM} += bpf_inode_storage.o
 obj-$(CONFIG_BPF_SYSCALL) += disasm.o
 obj-$(CONFIG_BPF_JIT) += trampoline.o
diff --git a/kernel/bpf/bpf_qp_trie.c b/kernel/bpf/bpf_qp_trie.c
new file mode 100644
index 000000000000..6b2672e67c87
--- /dev/null
+++ b/kernel/bpf/bpf_qp_trie.c
@@ -0,0 +1,1064 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Derived from qp.c in https://github.com/fanf2/qp.git
+ *
+ * Copyright (C) 2022. Huawei Technologies Co., Ltd
+ */
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+/* qp-trie (quadbit popcount trie) is a memory-efficient trie. Unlike a
+ * normal trie, which uses one byte of the key per lookup step, qp-trie
+ * interprets its keys as a quadbit/nibble array and uses one nibble each
+ * time during lookup. The most significant nibble (upper nibble) of byte
+ * N in the key is element 2*N of the nibble array, and the least
+ * significant nibble (lower nibble) of byte N is element 2*N+1.
+ *
+ * A normal trie node may have up to 256 child nodes; a qp-trie branch
+ * node may have up to 17 child nodes. Child node #0 is special: it must
+ * be a leaf node and its key equals the key prefix of the branch node.
+ * Child nodes #1~#16 are leaf or branch nodes whose keys differ from the
+ * parent node. The key prefix of a branch node is the common prefix of
+ * these child nodes, and a child node's index minus one is the value of
+ * the first nibble that differs between these child nodes.
+ *
+ * qp-trie reduces memory usage through two methods:
+ * (1) A branch node doesn't store the key. It only stores the position
+ *     of the first nibble which differentiates its child nodes.
+ * (2) A branch node doesn't store all 17 child nodes. It uses a bitmap
+ *     and popcount() to implement a sparse array and only allocates
+ *     memory for the present children.
+ *
+ * Like a normal trie, qp-trie is ordered: keys are kept in big-endian
+ * lexicographic order, so a depth-first traversal returns the keys in
+ * sorted order.
+ *
+ * The following diagrams show the construction of a tiny qp-trie:
+ *
+ * (1) insert abc
+ *
+ *          [ leaf node: abc ]
+ *
+ * (2) insert abc_d
+ *
+ * The first different nibble between "abc" and "abc_d" is the upper
+ * nibble of character '_' (0x5), and its position in the nibble array
+ * is 6 (counting from 0).
+ *
+ *          [ branch node ] bitmap: 0x41 diff pos: 6
+ *                 |
+ *                 *
+ *              children
+ *          [0]         [6]
+ *           |           |
+ *      [leaf: abc] [leaf: abc_d]
+ *
+ * (3) insert abc_e
+ *
+ * The first different nibble between "abc_d" and "abc_e" is the lower
+ * nibble of character 'd'/'e', and its position in the array is 9.
+ *
+ *          [ branch node ] bitmap: 0x41 diff pos: 6
+ *                 |
+ *                 *
+ *              children
+ *          [0]         [6]
+ *           |           |
+ *      [leaf: abc]      |
+ *                       *
+ *               [ branch node ] bitmap: 0x60 diff pos: 9
+ *                       |
+ *                       *
+ *                    children
+ *               [5]           [6]
+ *                |             |
+ *         [leaf: abc_d] [leaf: abc_e]
+ */
+
+#define QP_TRIE_CREATE_FLAG_MASK	(BPF_F_NO_PREALLOC | BPF_F_NUMA_NODE | \
+					 BPF_F_ACCESS_MASK)
+
+/* bit[0] of nodes in qp_trie_branch is used to tell node type:
+ *
+ * bit[0]: 0-branch node
+ * bit[0]: 1-leaf node
+ *
+ * Size of qp_trie_branch is already 2-bytes aligned, so only need to make
+ * allocation of leaf node to be 2-bytes aligned.
+ */
+#define QP_TRIE_LEAF_NODE_MASK 1UL
+#define QP_TRIE_LEAF_ALLOC_ALIGN 2
+
+/* To reduce memory usage, only qp_trie_branch is RCU-freed. To handle
+ * freeing of the last leaf node, an extra qp_trie_branch node is
+ * allocated. The branch node has only one child and its index is 0. It
+ * is set as root node after adding the first leaf node.
+ */
+#define QP_TRIE_ROOT_NODE_INDEX 0
+#define QP_TRIE_NON_ROOT_NODE_MASK 1
+
+#define QP_TRIE_NIBBLE_SHIFT 1
+#define QP_TRIE_BYTE_INDEX_SHIFT 2
+
+#define QP_TRIE_MIN_KEY_DATA_SIZE 1
+#define QP_TRIE_MAX_KEY_DATA_SIZE (1U << 30)
+#define QP_TRIE_MIN_KEY_SIZE (sizeof(struct bpf_qp_trie_key) + QP_TRIE_MIN_KEY_DATA_SIZE)
+#define QP_TRIE_MAX_KEY_SIZE (sizeof(struct bpf_qp_trie_key) + QP_TRIE_MAX_KEY_DATA_SIZE)
+
+#define QP_TRIE_TWIGS_FREE_NONE_IDX 17
+
+struct qp_trie_branch {
+	/* The bottom two bits of index are used as special flags:
+	 *
+	 * bit[0]: 0-root, 1-not root
+	 * bit[1]: 0-upper nibble, 1-lower nibble
+	 *
+	 * bit[2:31]: byte index for key
+	 */
+	unsigned int index;
+	/* 17 bits are used to accommodate arbitrary keys, even when there are
+	 * zero-bytes in these keys.
+ * + * bit[0]: a leaf node has the same key as the prefix of parent node + * bit[N]: a child node with the value of nibble at index as (N - 1) + */ + unsigned int bitmap:17; + /* The index of leaf node will be RCU-freed together */ + unsigned int to_free_idx:5; + struct qp_trie_branch __rcu *parent; + struct rcu_head rcu; + void __rcu *nodes[0]; +}; + +#define QP_TRIE_NR_SUBTREE 256 + +struct qp_trie { + struct bpf_map map; + atomic_t entries; + void __rcu *roots[QP_TRIE_NR_SUBTREE]; + spinlock_t locks[QP_TRIE_NR_SUBTREE]; +}; + +struct qp_trie_diff { + unsigned int index; + unsigned int sibling_bm; + unsigned int new_bm; +}; + +static inline bool is_valid_key_data_len(const struct bpf_qp_trie_key *key, + unsigned int key_size) +{ + return key->len >= QP_TRIE_MIN_KEY_DATA_SIZE && + key->len <= key_size - sizeof(*key); +} + +static inline void *to_child_node(struct bpf_qp_trie_key *key) +{ + return (void *)((long)key | QP_TRIE_LEAF_NODE_MASK); +} + +static inline struct bpf_qp_trie_key *to_leaf_node(void *node) +{ + return (void *)((long)node & ~QP_TRIE_LEAF_NODE_MASK); +} + +static inline bool is_branch_node(void *node) +{ + return !((long)node & QP_TRIE_LEAF_NODE_MASK); +} + +static inline bool is_same_key(const struct bpf_qp_trie_key *l, const struct bpf_qp_trie_key *r) +{ + return l->len == r->len && !memcmp(l->data, r->data, r->len); +} + +static inline void *qp_trie_leaf_value(const struct bpf_qp_trie_key *key) +{ + return (void *)key + sizeof(*key) + key->len; +} + +static inline unsigned int calc_twig_index(unsigned int mask, unsigned int bitmap) +{ + return hweight32(mask & (bitmap - 1)); +} + +static inline unsigned int calc_twig_nr(unsigned int bitmap) +{ + return hweight32(bitmap); +} + +static inline unsigned int nibble_to_bitmap(unsigned char nibble) +{ + return 1U << (nibble + 1); +} + +static inline unsigned int index_to_byte_index(unsigned int index) +{ + return index >> QP_TRIE_BYTE_INDEX_SHIFT; +} + +static inline unsigned int calc_br_bitmap(unsigned int index, const struct bpf_qp_trie_key *key) +{ + unsigned int byte; + unsigned char nibble; + + if (index == QP_TRIE_ROOT_NODE_INDEX) + return 1; + + byte = index_to_byte_index(index); + if (byte >= key->len) + return 1; + + nibble = key->data[byte]; + /* lower nibble */ + if ((index >> QP_TRIE_NIBBLE_SHIFT) & 1) + nibble &= 0xf; + else + nibble >>= 4; + return nibble_to_bitmap(nibble); +} + +static void qp_trie_free_twigs_rcu(struct rcu_head *rcu) +{ + struct qp_trie_branch *twigs = container_of(rcu, struct qp_trie_branch, rcu); + unsigned int idx = twigs->to_free_idx; + + if (idx != QP_TRIE_TWIGS_FREE_NONE_IDX) + kfree(to_leaf_node(twigs->nodes[idx])); + kfree(twigs); +} + +static void qp_trie_branch_free(struct qp_trie_branch *twigs, unsigned int to_free_idx) +{ + twigs->to_free_idx = to_free_idx; + call_rcu(&twigs->rcu, qp_trie_free_twigs_rcu); +} + +static inline struct qp_trie_branch * +qp_trie_branch_new(struct bpf_map *map, unsigned int nr) +{ + struct qp_trie_branch *a; + + a = bpf_map_kmalloc_node(map, sizeof(*a) + nr * sizeof(*a->nodes), + GFP_NOWAIT | __GFP_NOWARN, map->numa_node); + return a; +} + +static inline void qp_trie_assign_parent(struct qp_trie_branch *parent, void *node) +{ + if (is_branch_node(node)) + rcu_assign_pointer(((struct qp_trie_branch *)node)->parent, parent); +} + +static void qp_trie_update_parent(struct qp_trie_branch *parent, unsigned int nr) +{ + unsigned int i; + + for (i = 0; i < nr; i++) + qp_trie_assign_parent(parent, parent->nodes[i]); +} + +/* new_node can be either a leaf node 
or a branch node */ +static struct qp_trie_branch * +qp_trie_branch_replace(struct bpf_map *map, struct qp_trie_branch *old, unsigned int bitmap, + void *new_node) +{ + unsigned int nr = calc_twig_nr(old->bitmap); + unsigned int p = calc_twig_index(old->bitmap, bitmap); + struct qp_trie_branch *twigs; + + twigs = qp_trie_branch_new(map, nr); + if (!twigs) + return NULL; + + if (p) + memcpy(twigs->nodes, old->nodes, p * sizeof(*twigs->nodes)); + + rcu_assign_pointer(twigs->nodes[p], new_node); + + if (nr - 1 > p) + memcpy(&twigs->nodes[p+1], &old->nodes[p+1], (nr - 1 - p) * sizeof(*twigs->nodes)); + + twigs->index = old->index; + twigs->bitmap = old->bitmap; + /* Initialize parent of upper-level node first, + * then update parent for child nodes after parent node is + * fully initialized + */ + RCU_INIT_POINTER(twigs->parent, old->parent); + + qp_trie_update_parent(twigs, nr); + + return twigs; +} + +static struct qp_trie_branch * +qp_trie_branch_insert(struct bpf_map *map, struct qp_trie_branch *old, unsigned int bitmap, + struct bpf_qp_trie_key *new) +{ + unsigned int nr = calc_twig_nr(old->bitmap); + unsigned int p = calc_twig_index(old->bitmap, bitmap); + struct qp_trie_branch *twigs; + + twigs = qp_trie_branch_new(map, nr + 1); + if (!twigs) + return NULL; + + if (p) + memcpy(twigs->nodes, old->nodes, p * sizeof(*twigs->nodes)); + + rcu_assign_pointer(twigs->nodes[p], to_child_node(new)); + + if (nr > p) + memcpy(&twigs->nodes[p+1], &old->nodes[p], (nr - p) * sizeof(*twigs->nodes)); + + twigs->bitmap = old->bitmap | bitmap; + twigs->index = old->index; + RCU_INIT_POINTER(twigs->parent, old->parent); + + qp_trie_update_parent(twigs, nr + 1); + + return twigs; +} + +static struct qp_trie_branch * +qp_trie_branch_remove(struct bpf_map *map, struct qp_trie_branch *old, unsigned int bitmap) +{ + unsigned int nr = calc_twig_nr(old->bitmap); + unsigned int p = calc_twig_index(old->bitmap, bitmap); + struct qp_trie_branch *twigs; + + twigs = qp_trie_branch_new(map, nr - 1); + if (!twigs) + return NULL; + + if (p) + memcpy(twigs->nodes, old->nodes, p * sizeof(*twigs->nodes)); + if (nr - 1 > p) + memcpy(&twigs->nodes[p], &old->nodes[p+1], (nr - 1 - p) * sizeof(*twigs->nodes)); + + twigs->bitmap = old->bitmap & ~bitmap; + twigs->index = old->index; + RCU_INIT_POINTER(twigs->parent, old->parent); + + qp_trie_update_parent(twigs, nr - 1); + + return twigs; +} + +static struct bpf_qp_trie_key *qp_trie_init_leaf_node(struct bpf_map *map, void *k, void *v) +{ + struct bpf_qp_trie_key *key = k, *new; + unsigned int key_size, total_size; + + key_size = sizeof(*key) + key->len; + total_size = round_up(key_size + map->value_size, QP_TRIE_LEAF_ALLOC_ALIGN); + new = bpf_map_kmalloc_node(map, total_size, GFP_NOWAIT | __GFP_NOWARN, map->numa_node); + if (!new) + return NULL; + + memcpy(new, k, key_size); + memcpy((void *)new + key_size, v, map->value_size); + + return new; +} + +static bool calc_prefix_len(const struct bpf_qp_trie_key *s_key, + const struct bpf_qp_trie_key *n_key, unsigned int *index) +{ + unsigned int len = min(s_key->len, n_key->len); + unsigned int i; + unsigned char diff = 0; + + for (i = 0; i < len; i++) { + diff = s_key->data[i] ^ n_key->data[i]; + if (diff) + break; + } + + *index = (i << QP_TRIE_BYTE_INDEX_SHIFT) | QP_TRIE_NON_ROOT_NODE_MASK; + if (!diff) + return s_key->len == n_key->len; + + *index += (diff & 0xf0) ? 
0 : (1U << QP_TRIE_NIBBLE_SHIFT); + return false; +} + +static int qp_trie_new_branch(struct qp_trie *trie, struct qp_trie_branch **parent, + unsigned int bitmap, void *sibling, struct qp_trie_diff *d, + void *key, void *value) +{ + struct qp_trie_branch *new_child_twigs, *new_twigs, *old_twigs; + struct bpf_qp_trie_key *leaf; + struct bpf_map *map; + unsigned int iip; + int err; + + map = &trie->map; + if (atomic_inc_return(&trie->entries) > map->max_entries) { + err = -ENOSPC; + goto dec_entries; + } + + leaf = qp_trie_init_leaf_node(map, key, value); + if (!leaf) { + err = -ENOMEM; + goto dec_entries; + } + + new_child_twigs = qp_trie_branch_new(map, 2); + if (!new_child_twigs) { + err = -ENOMEM; + goto free_leaf; + } + + new_child_twigs->index = d->index; + new_child_twigs->bitmap = d->sibling_bm | d->new_bm; + + iip = calc_twig_index(new_child_twigs->bitmap, d->sibling_bm); + RCU_INIT_POINTER(new_child_twigs->nodes[iip], sibling); + rcu_assign_pointer(new_child_twigs->nodes[!iip], to_child_node(leaf)); + RCU_INIT_POINTER(new_child_twigs->parent, NULL); + + old_twigs = *parent; + new_twigs = qp_trie_branch_replace(map, old_twigs, bitmap, new_child_twigs); + if (!new_twigs) { + err = -ENOMEM; + goto free_child_twigs; + } + + qp_trie_assign_parent(new_child_twigs, sibling); + rcu_assign_pointer(*parent, new_twigs); + qp_trie_branch_free(old_twigs, QP_TRIE_TWIGS_FREE_NONE_IDX); + + return 0; + +free_child_twigs: + kfree(new_child_twigs); +free_leaf: + kfree(leaf); +dec_entries: + atomic_dec(&trie->entries); + return err; +} + +static int qp_trie_ext_branch(struct qp_trie *trie, struct qp_trie_branch **parent, + void *key, void *value, unsigned int bitmap) +{ + struct qp_trie_branch *old_twigs, *new_twigs; + struct bpf_qp_trie_key *new; + struct bpf_map *map; + int err; + + map = &trie->map; + if (atomic_inc_return(&trie->entries) > map->max_entries) { + err = -ENOSPC; + goto dec_entries; + } + + new = qp_trie_init_leaf_node(map, key, value); + if (!new) { + err = -ENOMEM; + goto dec_entries; + } + + old_twigs = *parent; + new_twigs = qp_trie_branch_insert(map, old_twigs, bitmap, new); + if (!new_twigs) { + err = -ENOMEM; + goto free_leaf; + } + + rcu_assign_pointer(*parent, new_twigs); + qp_trie_branch_free(old_twigs, QP_TRIE_TWIGS_FREE_NONE_IDX); + + return 0; + +free_leaf: + kfree(new); +dec_entries: + atomic_dec(&trie->entries); + return err; +} + +static int qp_trie_add_leaf_node(struct qp_trie *trie, struct qp_trie_branch **parent, + void *key, void *value) +{ + struct bpf_map *map = &trie->map; + struct qp_trie_branch *twigs; + struct bpf_qp_trie_key *new; + int err; + + if (atomic_inc_return(&trie->entries) > map->max_entries) { + err = -ENOSPC; + goto dec_entries; + } + + new = qp_trie_init_leaf_node(map, key, value); + if (!new) { + err = -ENOMEM; + goto dec_entries; + } + + twigs = qp_trie_branch_new(map, 1); + if (!twigs) { + err = -ENOMEM; + goto free_leaf; + } + twigs->index = QP_TRIE_ROOT_NODE_INDEX; + twigs->bitmap = 1; + RCU_INIT_POINTER(twigs->parent, NULL); + rcu_assign_pointer(twigs->nodes[0], to_child_node(new)); + + rcu_assign_pointer(*parent, twigs); + + return 0; +free_leaf: + kfree(new); +dec_entries: + atomic_dec(&trie->entries); + return err; +} + +static int qp_trie_rep_leaf_node(struct qp_trie *trie, struct qp_trie_branch **parent, + void *key, void *value, struct bpf_qp_trie_key *leaf, + unsigned int bitmap) +{ + struct qp_trie_branch *old_twigs, *new_twigs; + struct bpf_map *map = &trie->map; + struct bpf_qp_trie_key *new; + int err; + + new = 
qp_trie_init_leaf_node(map, key, value);
+	if (!new)
+		return -ENOMEM;
+
+	old_twigs = *parent;
+	new_twigs = qp_trie_branch_replace(map, old_twigs, bitmap, to_child_node(new));
+	if (!new_twigs) {
+		err = -ENOMEM;
+		goto free_leaf;
+	}
+	rcu_assign_pointer(*parent, new_twigs);
+
+	qp_trie_branch_free(old_twigs, calc_twig_index(old_twigs->bitmap, bitmap));
+
+	return 0;
+free_leaf:
+	kfree(new);
+	return err;
+}
+
+static int qp_trie_remove_leaf(struct qp_trie *trie, struct qp_trie_branch **parent,
+			       unsigned int bitmap, const struct bpf_qp_trie_key *node)
+{
+	struct bpf_map *map = &trie->map;
+	struct qp_trie_branch *new, *old;
+	unsigned int nr;
+
+	old = *parent;
+	nr = calc_twig_nr(old->bitmap);
+	if (nr > 2) {
+		new = qp_trie_branch_remove(map, old, bitmap);
+		if (!new)
+			return -ENOMEM;
+	} else {
+		new = NULL;
+	}
+
+	rcu_assign_pointer(*parent, new);
+
+	qp_trie_branch_free(old, calc_twig_index(old->bitmap, bitmap));
+
+	atomic_dec(&trie->entries);
+
+	return 0;
+}
+
+static int qp_trie_merge_node(struct qp_trie *trie, struct qp_trie_branch **grand_parent,
+			      struct qp_trie_branch *parent, unsigned int parent_bitmap,
+			      unsigned int bitmap)
+{
+	struct qp_trie_branch *old_twigs, *new_twigs;
+	struct bpf_map *map = &trie->map;
+	void *new_sibling;
+	unsigned int iip;
+
+	iip = calc_twig_index(parent->bitmap, bitmap);
+	new_sibling = parent->nodes[!iip];
+
+	old_twigs = *grand_parent;
+	new_twigs = qp_trie_branch_replace(map, old_twigs, parent_bitmap, new_sibling);
+	if (!new_twigs)
+		return -ENOMEM;
+
+	rcu_assign_pointer(*grand_parent, new_twigs);
+
+	qp_trie_branch_free(old_twigs, QP_TRIE_TWIGS_FREE_NONE_IDX);
+	qp_trie_branch_free(parent, iip);
+
+	atomic_dec(&trie->entries);
+
+	return 0;
+}
+
+/* key and value are allocated together in qp_trie_init_leaf_node() */
+static inline bool is_valid_k_v_size(unsigned int key_size, unsigned int value_size)
+{
+	return round_up((u64)key_size + value_size, QP_TRIE_LEAF_ALLOC_ALIGN) <=
+	       KMALLOC_MAX_SIZE;
+}
+
+static int qp_trie_alloc_check(union bpf_attr *attr)
+{
+	if (!bpf_capable())
+		return -EPERM;
+
+	/* BPF_F_NO_PREALLOC must be set for qp-trie */
+	if (!(attr->map_flags & BPF_F_NO_PREALLOC) ||
+	    attr->map_flags & ~QP_TRIE_CREATE_FLAG_MASK ||
+	    !bpf_map_flags_access_ok(attr->map_flags))
+		return -EINVAL;
+
+	if (!attr->max_entries || attr->key_size < QP_TRIE_MIN_KEY_SIZE ||
+	    !attr->value_size)
+		return -EINVAL;
+
+	if (attr->key_size > QP_TRIE_MAX_KEY_SIZE ||
+	    !is_valid_k_v_size(attr->key_size, attr->value_size))
+		return -E2BIG;
+
+	return 0;
+}
+
+static struct bpf_map *qp_trie_alloc(union bpf_attr *attr)
+{
+	struct qp_trie *trie;
+	unsigned int i;
+
+	trie = bpf_map_area_alloc(sizeof(*trie), bpf_map_attr_numa_node(attr));
+	if (!trie)
+		return ERR_PTR(-ENOMEM);
+
+	/* roots are zeroed by bpf_map_area_alloc() */
+	for (i = 0; i < ARRAY_SIZE(trie->locks); i++)
+		spin_lock_init(&trie->locks[i]);
+
+	atomic_set(&trie->entries, 0);
+	bpf_map_init_from_attr(&trie->map, attr);
+
+	return &trie->map;
+}
+
+static void qp_trie_free_subtree(void *root)
+{
+	struct qp_trie_branch *parent = NULL;
+	struct bpf_qp_trie_key *cur = NULL;
+	void *node = root;
+
+	/*
+	 * Depth-first deletion
+	 *
+	 * 1. find left-most key and its parent
+	 * 2.
get next sibling Y from parent + * (a) Y is leaf node: continue + * (b) Y is branch node: goto step 1 + * (c) no more sibling: backtrace upwards if parent is not NULL and + * goto step 1 + */ + do { + while (is_branch_node(node)) { + parent = node; + node = rcu_dereference_protected(parent->nodes[0], 1); + } + + cur = to_leaf_node(node); + while (parent) { + unsigned int iip, bitmap, nr; + void *ancestor; + + bitmap = calc_br_bitmap(parent->index, cur); + iip = calc_twig_index(parent->bitmap, bitmap) + 1; + nr = calc_twig_nr(parent->bitmap); + + for (; iip < nr; iip++) { + kfree(cur); + + node = rcu_dereference_protected(parent->nodes[iip], 1); + if (is_branch_node(node)) + break; + + cur = to_leaf_node(node); + } + if (iip < nr) + break; + + ancestor = rcu_dereference_protected(parent->parent, 1); + kfree(parent); + parent = ancestor; + } + } while (parent); + + kfree(cur); +} + +static void qp_trie_free(struct bpf_map *map) +{ + struct qp_trie *trie = container_of(map, struct qp_trie, map); + unsigned int i; + + for (i = 0; i < ARRAY_SIZE(trie->roots); i++) { + void *root = rcu_dereference_protected(trie->roots[i], 1); + + if (root) + qp_trie_free_subtree(root); + } + bpf_map_area_free(trie); +} + +static inline void qp_trie_copy_leaf(struct bpf_qp_trie_key *leaf, void *key, unsigned int key_size) +{ + unsigned int used = sizeof(*leaf) + leaf->len; + + memcpy(key, leaf, used); + memset(key + used, 0, key_size - used); +} + +static void qp_trie_copy_min_key_from(void *root, void *next_key, unsigned int key_size) +{ + void *node; + + node = root; + while (is_branch_node(node)) + node = rcu_dereference_raw(((struct qp_trie_branch *)node)->nodes[0]); + + qp_trie_copy_leaf(to_leaf_node(node), next_key, key_size); +} + +static int qp_trie_lookup_min_key(struct qp_trie *trie, unsigned int from, void *next_key) +{ + unsigned int i; + + for (i = from; i < ARRAY_SIZE(trie->roots); i++) { + void *root = rcu_dereference_raw(trie->roots[i]); + + if (root) { + qp_trie_copy_min_key_from(root, next_key, trie->map.key_size); + return 0; + } + } + + return -ENOENT; +} + +static int qp_trie_next_twigs_index(struct qp_trie_branch *twigs, unsigned int bitmap) +{ + unsigned int idx, nr, next; + + /* bitmap may not in twigs->bitmap */ + idx = calc_twig_index(twigs->bitmap, bitmap); + nr = calc_twig_nr(twigs->bitmap); + + next = idx; + if (twigs->bitmap & bitmap) + next += 1; + + if (next >= nr) + return -1; + return next; +} + +static int qp_trie_lookup_next_node(struct qp_trie *trie, struct bpf_qp_trie_key *key, + void *next_key) +{ + struct qp_trie_branch *parent; + struct bpf_qp_trie_key *found; + void *node, *next; + unsigned char c; + + if (!is_valid_key_data_len(key, trie->map.key_size)) + return -EINVAL; + + /* Non-exist key, so restart from the beginning */ + c = key->data[0]; + node = rcu_dereference_raw(trie->roots[c]); + if (!node) + return qp_trie_lookup_min_key(trie, 0, next_key); + + parent = NULL; + while (is_branch_node(node)) { + struct qp_trie_branch *br = node; + unsigned int iip, bitmap; + + bitmap = calc_br_bitmap(br->index, key); + if (bitmap & br->bitmap) + iip = calc_twig_index(br->bitmap, bitmap); + else + iip = 0; + + parent = br; + node = rcu_dereference_raw(br->nodes[iip]); + } + found = to_leaf_node(node); + if (!is_same_key(found, key)) + return qp_trie_lookup_min_key(trie, 0, next_key); + + /* Pair with store release in rcu_assign_pointer(*parent, twigs) to + * ensure reading node->parent will not return the old parent if + * the node is found by following the newly-created 
parent. + */ + smp_rmb(); + + next = NULL; + while (parent) { + unsigned int bitmap; + int next_idx; + + bitmap = calc_br_bitmap(parent->index, key); + next_idx = qp_trie_next_twigs_index(parent, bitmap); + if (next_idx >= 0) { + next = rcu_dereference_raw(parent->nodes[next_idx]); + break; + } + parent = rcu_dereference_raw(parent->parent); + } + + /* Goto next sub-tree */ + if (!next) + return qp_trie_lookup_min_key(trie, c + 1, next_key); + + if (!is_branch_node(next)) + qp_trie_copy_leaf(to_leaf_node(next), next_key, trie->map.key_size); + else + qp_trie_copy_min_key_from(next, next_key, trie->map.key_size); + + return 0; +} + +static int qp_trie_get_next_key(struct bpf_map *map, void *key, void *next_key) +{ + struct qp_trie *trie = container_of(map, struct qp_trie, map); + struct bpf_qp_trie_key *cur_key = key; + int err; + + if (!cur_key || !cur_key->len) + err = qp_trie_lookup_min_key(trie, 0, next_key); + else + err = qp_trie_lookup_next_node(trie, cur_key, next_key); + return err; +} + +static void *qp_trie_lookup_elem(struct bpf_map *map, void *key) +{ + struct qp_trie *trie = container_of(map, struct qp_trie, map); + struct bpf_qp_trie_key *trie_key = key, *found; + unsigned char c = trie_key->data[0]; + void *node, *value; + + node = rcu_dereference_raw(trie->roots[c]); + if (!node) + return NULL; + + value = NULL; + while (is_branch_node(node)) { + struct qp_trie_branch *br = node; + unsigned int bitmap; + unsigned int iip; + + /* When byte index equals with key len, the target key + * may be in twigs->nodes[0]. + */ + if (index_to_byte_index(br->index) > trie_key->len) + goto done; + + bitmap = calc_br_bitmap(br->index, trie_key); + if (!(bitmap & br->bitmap)) + goto done; + + iip = calc_twig_index(br->bitmap, bitmap); + node = rcu_dereference_raw(br->nodes[iip]); + } + + found = to_leaf_node(node); + if (is_same_key(found, trie_key)) + value = qp_trie_leaf_value(found); +done: + return value; +} + +static int qp_trie_update_elem(struct bpf_map *map, void *key, void *value, u64 flags) +{ + struct qp_trie *trie = container_of(map, struct qp_trie, map); + struct bpf_qp_trie_key *new_key = key, *leaf_key; + struct qp_trie_branch **parent; + struct qp_trie_diff d; + unsigned int bitmap; + spinlock_t *lock; + void **node; + unsigned char c; + bool equal; + int err; + + if (!is_valid_key_data_len(new_key, map->key_size) || flags > BPF_EXIST) + return -EINVAL; + + c = new_key->data[0]; + lock = &trie->locks[c]; + spin_lock(lock); + parent = (struct qp_trie_branch **)&trie->roots[c]; + if (!*parent) { + if (flags == BPF_EXIST) { + err = -ENOENT; + goto unlock; + } + err = qp_trie_add_leaf_node(trie, parent, key, value); + goto unlock; + } + + bitmap = 1; + node = &(*parent)->nodes[0]; + while (is_branch_node(*node)) { + struct qp_trie_branch *br = *node; + unsigned int iip; + + bitmap = calc_br_bitmap(br->index, new_key); + if (bitmap & br->bitmap) + iip = calc_twig_index(br->bitmap, bitmap); + else + iip = 0; + parent = (struct qp_trie_branch **)node; + node = &br->nodes[iip]; + } + + leaf_key = to_leaf_node(*node); + equal = calc_prefix_len(leaf_key, new_key, &d.index); + if (equal) { + if (flags == BPF_NOEXIST) { + err = -EEXIST; + goto unlock; + } + err = qp_trie_rep_leaf_node(trie, parent, key, value, leaf_key, bitmap); + goto unlock; + } + + d.sibling_bm = calc_br_bitmap(d.index, leaf_key); + d.new_bm = calc_br_bitmap(d.index, new_key); + + bitmap = 1; + parent = (struct qp_trie_branch **)&trie->roots[c]; + node = &(*parent)->nodes[0]; + while (is_branch_node(*node)) { + 
struct qp_trie_branch *br = *node; + unsigned int iip; + + if (d.index < br->index) + goto new_branch; + + parent = (struct qp_trie_branch **)node; + if (d.index == br->index) { + if (flags == BPF_EXIST) { + err = -ENOENT; + goto unlock; + } + err = qp_trie_ext_branch(trie, parent, key, value, d.new_bm); + goto unlock; + } + + bitmap = calc_br_bitmap(br->index, new_key); + iip = calc_twig_index(br->bitmap, bitmap); + node = &br->nodes[iip]; + } + +new_branch: + if (flags == BPF_EXIST) { + err = -ENOENT; + goto unlock; + } + err = qp_trie_new_branch(trie, parent, bitmap, *node, &d, key, value); +unlock: + spin_unlock(lock); + return err; +} + +static int qp_trie_delete_elem(struct bpf_map *map, void *key) +{ + struct qp_trie *trie = container_of(map, struct qp_trie, map); + struct qp_trie_branch **parent, **grand_parent; + unsigned int bitmap = 1, parent_bitmap = 1, nr; + struct bpf_qp_trie_key *trie_key = key, *found; + spinlock_t *lock; + void **node; + unsigned char c; + int err; + + if (!is_valid_key_data_len(trie_key, map->key_size)) + return -EINVAL; + + err = -ENOENT; + grand_parent = NULL; + c = trie_key->data[0]; + lock = &trie->locks[c]; + spin_lock(lock); + parent = (struct qp_trie_branch **)&trie->roots[c]; + if (!*parent) + goto unlock; + + node = &(*parent)->nodes[0]; + while (is_branch_node(*node)) { + struct qp_trie_branch *br = *node; + unsigned int iip; + + if (index_to_byte_index(br->index) > trie_key->len) + goto unlock; + + parent_bitmap = bitmap; + bitmap = calc_br_bitmap(br->index, trie_key); + if (!(bitmap & br->bitmap)) + goto unlock; + + grand_parent = parent; + parent = (struct qp_trie_branch **)node; + iip = calc_twig_index(br->bitmap, bitmap); + node = &br->nodes[iip]; + } + + found = to_leaf_node(*node); + if (!is_same_key(found, trie_key)) + goto unlock; + + nr = calc_twig_nr((*parent)->bitmap); + if (nr != 2) + err = qp_trie_remove_leaf(trie, parent, bitmap, found); + else + err = qp_trie_merge_node(trie, grand_parent, *parent, parent_bitmap, bitmap); +unlock: + spin_unlock(lock); + return err; +} + +static int qp_trie_check_btf(const struct bpf_map *map, + const struct btf *btf, + const struct btf_type *key_type, + const struct btf_type *value_type) +{ + /* TODO: key_type embeds bpf_qp_trie_key as its first member */ + return 0; +} + +BTF_ID_LIST_SINGLE(qp_trie_map_btf_ids, struct, qp_trie) +const struct bpf_map_ops qp_trie_map_ops = { + .map_alloc_check = qp_trie_alloc_check, + .map_alloc = qp_trie_alloc, + .map_free = qp_trie_free, + .map_get_next_key = qp_trie_get_next_key, + .map_lookup_elem = qp_trie_lookup_elem, + .map_update_elem = qp_trie_update_elem, + .map_delete_elem = qp_trie_delete_elem, + .map_meta_equal = bpf_map_meta_equal, + .map_check_btf = qp_trie_check_btf, + .map_btf_id = &qp_trie_map_btf_ids[0], +}; diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h index ffcbf79a556b..39a65bf0d9f4 100644 --- a/tools/include/uapi/linux/bpf.h +++ b/tools/include/uapi/linux/bpf.h @@ -82,6 +82,13 @@ struct bpf_lpm_trie_key { __u8 data[0]; /* Arbitrary size */ }; +struct bpf_qp_trie_key { + /* the length of blob data */ + __u32 len; + /* blob data */ + __u8 data[0]; +}; + struct bpf_cgroup_storage_key { __u64 cgroup_inode_id; /* cgroup inode id */ __u32 attach_type; /* program attach type (enum bpf_attach_type) */ @@ -909,6 +916,7 @@ enum bpf_map_type { BPF_MAP_TYPE_INODE_STORAGE, BPF_MAP_TYPE_TASK_STORAGE, BPF_MAP_TYPE_BLOOM_FILTER, + BPF_MAP_TYPE_QP_TRIE, }; /* Note that tracing related programs such as From patchwork Tue Jul 26 
13:00:04 2022
X-Patchwork-Submitter: Hou Tao
X-Patchwork-Id: 12929245
X-Patchwork-Delegate: bpf@iogearbox.net
X-Patchwork-State: RFC
X-Mailing-List: bpf@vger.kernel.org
From: Hou Tao
To: Andrii Nakryiko, Alexei Starovoitov
CC: Daniel Borkmann, Martin KaFai Lau, Yonghong Song, Song Liu, KP Singh,
 "David S . Miller", Jakub Kicinski, Stanislav Fomichev, Hao Luo,
 Jiri Olsa, John Fastabend
Subject: [RFC PATCH bpf-next 2/3] selftests/bpf: add a simple test for qp-trie
Date: Tue, 26 Jul 2022 21:00:04 +0800
Message-ID: <20220726130005.3102470-3-houtao1@huawei.com>
In-Reply-To: <20220726130005.3102470-1-houtao1@huawei.com>
References: <20220726130005.3102470-1-houtao1@huawei.com>

Add a test to demonstrate that qp-trie doesn't care about the unused
part of the key: the qp-trie lookup succeeds while the lookup in the
fixed-size-key hash table fails.

Signed-off-by: Hou Tao
---
 .../selftests/bpf/prog_tests/str_key.c      | 69 +++++++++++++++
 tools/testing/selftests/bpf/progs/str_key.c | 85 +++++++++++++++++++
 2 files changed, 154 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/str_key.c
 create mode 100644 tools/testing/selftests/bpf/progs/str_key.c

diff --git a/tools/testing/selftests/bpf/prog_tests/str_key.c b/tools/testing/selftests/bpf/prog_tests/str_key.c
new file mode 100644
index 000000000000..8f134dd45902
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/str_key.c
@@ -0,0 +1,69 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (C) 2022.
Huawei Technologies Co., Ltd */ +#include +#include +#include +#include +#include +#include "str_key.skel.h" + +#define FILE_PATH_SIZE 64 + +struct file_path_str { + unsigned int len; + char raw[FILE_PATH_SIZE]; +}; + +static int setup_maps(struct str_key *skel, const char *name, unsigned int value) +{ + struct file_path_str key; + int fd, err; + + memset(&key, 0, sizeof(key)); + strncpy(key.raw, name, sizeof(key.raw) - 1); + key.len = strlen(name) + 1; + + fd = bpf_map__fd(skel->maps.trie); + err = bpf_map_update_elem(fd, &key, &value, BPF_NOEXIST); + if (!ASSERT_OK(err, "trie add")) + return -EINVAL; + + fd = bpf_map__fd(skel->maps.htab); + err = bpf_map_update_elem(fd, key.raw, &value, BPF_NOEXIST); + if (!ASSERT_OK(err, "htab add")) + return -EINVAL; + + return 0; +} + +void test_str_key(void) +{ + const char *name = "/tmp/str_key_test"; + struct str_key *skel; + unsigned int value; + int err, fd; + + skel = str_key__open_and_load(); + if (!ASSERT_OK_PTR(skel, "open_load str key")) + return; + + value = time(NULL); + if (setup_maps(skel, name, value)) + goto out; + + skel->bss->pid = getpid(); + err = str_key__attach(skel); + if (!ASSERT_OK(err, "attach")) + goto out; + + fd = open(name, O_RDONLY | O_CREAT, 0644); + if (!ASSERT_GE(fd, 0, "open tmp file")) + goto out; + close(fd); + unlink(name); + + ASSERT_EQ(skel->bss->trie_value, value, "trie lookup str"); + ASSERT_EQ(skel->bss->htab_value, -1, "htab lookup str"); +out: + str_key__destroy(skel); +} diff --git a/tools/testing/selftests/bpf/progs/str_key.c b/tools/testing/selftests/bpf/progs/str_key.c new file mode 100644 index 000000000000..2e8becdd58c6 --- /dev/null +++ b/tools/testing/selftests/bpf/progs/str_key.c @@ -0,0 +1,85 @@ +// SPDX-License-Identifier: GPL-2.0 +/* Copyright (C) 2022. Huawei Technologies Co., Ltd */ +#include +#include +#include +#include +#include + +char _license[] SEC("license") = "GPL"; + +struct path { +} __attribute__((preserve_access_index)); + +struct file { + struct path f_path; +} __attribute__((preserve_access_index)); + +#define FILE_PATH_SIZE 64 + +struct file_path_str { + unsigned int len; + char raw[FILE_PATH_SIZE]; +}; + +struct { + __uint(type, BPF_MAP_TYPE_ARRAY); + __uint(max_entries, 1); + __type(key, int); + __type(value, struct file_path_str); +} array SEC(".maps"); + +struct { + __uint(type, BPF_MAP_TYPE_QP_TRIE); + __uint(max_entries, 1); + __type(key, struct file_path_str); + __type(value, __u32); + __uint(map_flags, BPF_F_NO_PREALLOC); +} trie SEC(".maps"); + +struct { + __uint(type, BPF_MAP_TYPE_HASH); + __uint(max_entries, 1); + __uint(key_size, FILE_PATH_SIZE); + __uint(value_size, sizeof(__u32)); +} htab SEC(".maps"); + +int pid = 0; +unsigned int trie_value = 0; +unsigned int htab_value = 0; + +SEC("fentry/filp_close") +int BPF_PROG(filp_close, struct file *filp) +{ + struct path *p = &filp->f_path; + struct file_path_str *str; + unsigned int *value; + int idx, len; + + if (bpf_get_current_pid_tgid() >> 32 != pid) + return 0; + + idx = 0; + str = bpf_map_lookup_elem(&array, &idx); + if (!str) + return 0; + + len = bpf_d_path(p, str->raw, sizeof(str->raw)); + if (len < 0 || len > sizeof(str->raw)) + return 0; + + str->len = len; + value = bpf_map_lookup_elem(&trie, str); + if (value) + trie_value = *value; + else + trie_value = -1; + + value = bpf_map_lookup_elem(&htab, str->raw); + if (value) + htab_value = *value; + else + htab_value = -1; + + return 0; +} From patchwork Tue Jul 26 13:00:05 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 
X-Patchwork-Submitter: Hou Tao
X-Patchwork-Id: 12929246
X-Patchwork-Delegate: bpf@iogearbox.net
X-Patchwork-State: RFC
X-Mailing-List: bpf@vger.kernel.org
From: Hou Tao
To: Andrii Nakryiko, Alexei Starovoitov
CC: Daniel Borkmann, Martin KaFai Lau, Yonghong Song, Song Liu, KP Singh,
 "David S . Miller", Jakub Kicinski, Stanislav Fomichev, Hao Luo,
 Jiri Olsa, John Fastabend
Subject: [RFC PATCH bpf-next 3/3] selftests/bpf: add benchmark for qp-trie map
Date: Tue, 26 Jul 2022 21:00:05 +0800
Message-ID: <20220726130005.3102470-4-houtao1@huawei.com>
In-Reply-To: <20220726130005.3102470-1-houtao1@huawei.com>
References: <20220726130005.3102470-1-houtao1@huawei.com>

Add a benchmark for qp-trie map to compare lookup, update/delete
performance and memory usage with hash table.
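For reference, the numbers below were collected with the new bench
targets. Going by the options registered in bench_qp_trie.c and the
run_bench_qp_trie.sh helper added by this patch, typical invocations
look roughly like the following (the thread counts and file path are
examples only, not part of the series):

  # randomly-generated keys, 16K entries, 4 producer threads
  ./bench qp-trie-lookup --entries 16384 -p 4
  # keys read from a file whose first line holds the number of entries
  ./bench htab-update --file /tmp/kallsyms.txt -p 4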
When jhash() used by the bpf hash table produces too many hash collisions,
qp-trie lookup or update may perform better than htab, as shown below:

Randomly-generated binary data (length range: [1, 256], entries=16K)
htab lookup      (1  thread)    5.062 ± 0.004M/s (drops 0.002 ± 0.000M/s mem 8.125 MiB)
htab lookup      (2  thread)   10.256 ± 0.017M/s (drops 0.006 ± 0.000M/s mem 8.114 MiB)
htab lookup      (4  thread)   20.383 ± 0.006M/s (drops 0.009 ± 0.000M/s mem 8.117 MiB)
htab lookup      (8  thread)   40.727 ± 0.093M/s (drops 0.010 ± 0.000M/s mem 8.123 MiB)
htab lookup      (16 thread)   81.333 ± 0.311M/s (drops 0.020 ± 0.000M/s mem 8.122 MiB)
htab update      (1  thread)    2.404 ± 0.012M/s (drops 0.000 ± 0.000M/s mem 8.117 MiB)
htab update      (2  thread)    3.874 ± 0.010M/s (drops 0.000 ± 0.000M/s mem 8.106 MiB)
htab update      (4  thread)    5.895 ± 0.062M/s (drops 0.000 ± 0.000M/s mem 8.118 MiB)
htab update      (8  thread)    7.000 ± 0.088M/s (drops 0.000 ± 0.000M/s mem 8.116 MiB)
htab update      (16 thread)    7.076 ± 0.043M/s (drops 0.000 ± 0.000M/s mem 8.120 MiB)
qp-trie lookup   (1  thread)   10.161 ± 0.008M/s (drops 0.006 ± 0.000M/s mem 4.847 MiB)
qp-trie lookup   (2  thread)   20.287 ± 0.024M/s (drops 0.007 ± 0.000M/s mem 4.828 MiB)
qp-trie lookup   (4  thread)   40.784 ± 0.020M/s (drops 0.015 ± 0.000M/s mem 4.071 MiB)
qp-trie lookup   (8  thread)   81.165 ± 0.013M/s (drops 0.040 ± 0.000M/s mem 4.045 MiB)
qp-trie lookup   (16 thread)  159.955 ± 0.014M/s (drops 0.108 ± 0.000M/s mem 4.495 MiB)
qp-trie update   (1  thread)    1.621 ± 0.017M/s (drops 0.000 ± 0.000M/s mem 3.939 MiB)
qp-trie update   (2  thread)    2.615 ± 0.003M/s (drops 0.000 ± 0.000M/s mem 3.914 MiB)
qp-trie update   (4  thread)    4.017 ± 0.070M/s (drops 0.000 ± 0.000M/s mem 3.585 MiB)
qp-trie update   (8  thread)    6.857 ± 0.146M/s (drops 0.000 ± 0.000M/s mem 4.490 MiB)
qp-trie update   (16 thread)    9.871 ± 0.527M/s (drops 0.000 ± 0.000M/s mem 4.209 MiB)

But for the strings in /proc/kallsyms, the lookup and update/delete
performance of qp-trie is 40% or more slower compared with the hash table:

Strings in /proc/kallsyms (entries=186898)
htab lookup      (1  thread)    6.684 ± 0.005M/s (drops 0.458 ± 0.002M/s mem 33.614 MiB)
htab lookup      (2  thread)   12.644 ± 0.006M/s (drops 0.867 ± 0.003M/s mem 33.563 MiB)
htab lookup      (4  thread)   22.505 ± 0.012M/s (drops 1.542 ± 0.007M/s mem 33.565 MiB)
htab lookup      (8  thread)   45.302 ± 0.048M/s (drops 3.105 ± 0.015M/s mem 33.599 MiB)
htab lookup      (16 thread)   90.828 ± 0.069M/s (drops 6.225 ± 0.021M/s mem 33.564 MiB)
htab update      (1  thread)    2.850 ± 0.129M/s (drops 0.000 ± 0.000M/s mem 33.564 MiB)
htab update      (2  thread)    4.363 ± 0.031M/s (drops 0.000 ± 0.000M/s mem 33.563 MiB)
htab update      (4  thread)    6.306 ± 0.096M/s (drops 0.000 ± 0.000M/s mem 33.718 MiB)
htab update      (8  thread)    6.611 ± 0.026M/s (drops 0.000 ± 0.000M/s mem 33.627 MiB)
htab update      (16 thread)    6.390 ± 0.015M/s (drops 0.000 ± 0.000M/s mem 33.564 MiB)
qp-trie lookup   (1  thread)    4.122 ± 0.004M/s (drops 0.282 ± 0.001M/s mem 18.579 MiB)
qp-trie lookup   (2  thread)    8.280 ± 0.011M/s (drops 0.568 ± 0.004M/s mem 18.388 MiB)
qp-trie lookup   (4  thread)   15.927 ± 0.013M/s (drops 1.092 ± 0.007M/s mem 18.613 MiB)
qp-trie lookup   (8  thread)   31.412 ± 0.026M/s (drops 2.153 ± 0.008M/s mem 18.475 MiB)
qp-trie lookup   (16 thread)   63.807 ± 0.071M/s (drops 4.375 ± 0.025M/s mem 18.276 MiB)
qp-trie update   (1  thread)    1.157 ± 0.099M/s (drops 0.000 ± 0.000M/s mem 18.333 MiB)
qp-trie update   (2  thread)    1.920 ± 0.062M/s (drops 0.000 ± 0.000M/s mem 18.293 MiB)
qp-trie update   (4  thread)    2.630 ± 0.050M/s (drops 0.000 ± 0.000M/s mem 18.472 MiB)
qp-trie update   (8  thread)    3.171 ± 0.027M/s (drops 0.000 ± 0.000M/s mem 18.301 MiB)
qp-trie update   (16 thread)    3.782 ± 0.036M/s (drops 0.000 ± 0.000M/s mem 19.040 MiB)

For strings in the BTF string section, the results are similar, except
that the memory savings of qp-trie decrease:

Sorted strings in BTF string sections (entries=115980)
htab lookup      (1  thread)    9.792 ± 0.017M/s (drops 0.000 ± 0.000M/s mem 15.069 MiB)
htab lookup      (2  thread)   19.581 ± 0.008M/s (drops 0.000 ± 0.000M/s mem 15.069 MiB)
htab lookup      (4  thread)   36.652 ± 0.022M/s (drops 0.000 ± 0.000M/s mem 15.069 MiB)
htab lookup      (8  thread)   73.799 ± 0.053M/s (drops 0.000 ± 0.000M/s mem 15.068 MiB)
htab lookup      (16 thread)  146.957 ± 0.054M/s (drops 0.000 ± 0.000M/s mem 15.067 MiB)
htab update      (1  thread)    3.497 ± 0.051M/s (drops 0.000 ± 0.000M/s mem 15.069 MiB)
htab update      (2  thread)    4.976 ± 0.039M/s (drops 0.000 ± 0.000M/s mem 15.067 MiB)
htab update      (4  thread)    6.550 ± 0.025M/s (drops 0.000 ± 0.000M/s mem 15.068 MiB)
htab update      (8  thread)    6.821 ± 0.015M/s (drops 0.000 ± 0.000M/s mem 15.067 MiB)
htab update      (16 thread)    7.163 ± 0.057M/s (drops 0.000 ± 0.000M/s mem 15.069 MiB)
qp-trie lookup   (1  thread)    6.144 ± 0.003M/s (drops 0.000 ± 0.000M/s mem 11.361 MiB)
qp-trie lookup   (2  thread)   12.461 ± 0.003M/s (drops 0.000 ± 0.000M/s mem 11.031 MiB)
qp-trie lookup   (4  thread)   25.098 ± 0.008M/s (drops 0.000 ± 0.000M/s mem 11.082 MiB)
qp-trie lookup   (8  thread)   49.734 ± 0.015M/s (drops 0.000 ± 0.000M/s mem 11.187 MiB)
qp-trie lookup   (16 thread)   97.829 ± 0.033M/s (drops 0.000 ± 0.000M/s mem 10.803 MiB)
qp-trie update   (1  thread)    1.473 ± 0.112M/s (drops 0.000 ± 0.000M/s mem 10.848 MiB)
qp-trie update   (2  thread)    2.649 ± 0.033M/s (drops 0.000 ± 0.000M/s mem 10.817 MiB)
qp-trie update   (4  thread)    5.129 ± 0.071M/s (drops 0.000 ± 0.000M/s mem 10.901 MiB)
qp-trie update   (8  thread)    9.191 ± 0.049M/s (drops 0.000 ± 0.000M/s mem 10.753 MiB)
qp-trie update   (16 thread)   11.095 ± 0.167M/s (drops 0.000 ± 0.000M/s mem 10.918 MiB)

Signed-off-by: Hou Tao
---
 tools/testing/selftests/bpf/Makefile          |   5 +-
 tools/testing/selftests/bpf/bench.c           |  10 +
 .../selftests/bpf/benchs/bench_qp_trie.c      | 499 ++++++++++++++++++
 .../selftests/bpf/benchs/run_bench_qp_trie.sh |  55 ++
 .../selftests/bpf/progs/qp_trie_bench.c       | 218 ++++++++
 5 files changed, 786 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/bpf/benchs/bench_qp_trie.c
 create mode 100755 tools/testing/selftests/bpf/benchs/run_bench_qp_trie.sh
 create mode 100644 tools/testing/selftests/bpf/progs/qp_trie_bench.c

diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
index 1d1a9f15c140..67a1a79f460d 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -575,11 +575,13 @@ $(OUTPUT)/bench_strncmp.o: $(OUTPUT)/strncmp_bench.skel.h
 $(OUTPUT)/bench_bpf_hashmap_full_update.o: $(OUTPUT)/bpf_hashmap_full_update_bench.skel.h
 $(OUTPUT)/bench_local_storage.o: $(OUTPUT)/local_storage_bench.skel.h
 $(OUTPUT)/bench_local_storage_rcu_tasks_trace.o: $(OUTPUT)/local_storage_rcu_tasks_trace_bench.skel.h
+$(OUTPUT)/bench_qp_trie.o: $(OUTPUT)/qp_trie_bench.skel.h
 $(OUTPUT)/bench.o: bench.h testing_helpers.h $(BPFOBJ)
 $(OUTPUT)/bench: LDLIBS += -lm
 $(OUTPUT)/bench: $(OUTPUT)/bench.o \
		 $(TESTING_HELPERS) \
		 $(TRACE_HELPERS) \
+		 $(CGROUP_HELPERS) \
		 $(OUTPUT)/bench_count.o \
		 $(OUTPUT)/bench_rename.o \
		 $(OUTPUT)/bench_trigger.o \
@@ -589,7 +591,8 @@ $(OUTPUT)/bench: $(OUTPUT)/bench.o \
		 $(OUTPUT)/bench_strncmp.o \
		 $(OUTPUT)/bench_bpf_hashmap_full_update.o \
		 $(OUTPUT)/bench_local_storage.o \
-
$(OUTPUT)/bench_local_storage_rcu_tasks_trace.o \ + $(OUTPUT)/bench_qp_trie.o $(call msg,BINARY,,$@) $(Q)$(CC) $(CFLAGS) $(LDFLAGS) $(filter %.a %.o,$^) $(LDLIBS) -o $@ diff --git a/tools/testing/selftests/bpf/bench.c b/tools/testing/selftests/bpf/bench.c index c1f20a147462..618f45fbe6e2 100644 --- a/tools/testing/selftests/bpf/bench.c +++ b/tools/testing/selftests/bpf/bench.c @@ -275,6 +275,7 @@ extern struct argp bench_bpf_loop_argp; extern struct argp bench_local_storage_argp; extern struct argp bench_local_storage_rcu_tasks_trace_argp; extern struct argp bench_strncmp_argp; +extern struct argp bench_qp_trie_argp; static const struct argp_child bench_parsers[] = { { &bench_ringbufs_argp, 0, "Ring buffers benchmark", 0 }, @@ -284,6 +285,7 @@ static const struct argp_child bench_parsers[] = { { &bench_strncmp_argp, 0, "bpf_strncmp helper benchmark", 0 }, { &bench_local_storage_rcu_tasks_trace_argp, 0, "local_storage RCU Tasks Trace slowdown benchmark", 0 }, + { &bench_qp_trie_argp, 0, "qp-trie benchmark", 0 }, {}, }; @@ -490,6 +492,10 @@ extern const struct bench bench_local_storage_cache_seq_get; extern const struct bench bench_local_storage_cache_interleaved_get; extern const struct bench bench_local_storage_cache_hashmap_control; extern const struct bench bench_local_storage_tasks_trace; +extern const struct bench bench_htab_lookup; +extern const struct bench bench_qp_trie_lookup; +extern const struct bench bench_htab_update; +extern const struct bench bench_qp_trie_update; static const struct bench *benchs[] = { &bench_count_global, @@ -529,6 +535,10 @@ static const struct bench *benchs[] = { &bench_local_storage_cache_interleaved_get, &bench_local_storage_cache_hashmap_control, &bench_local_storage_tasks_trace, + &bench_htab_lookup, + &bench_qp_trie_lookup, + &bench_htab_update, + &bench_qp_trie_update, }; static void setup_benchmark() diff --git a/tools/testing/selftests/bpf/benchs/bench_qp_trie.c b/tools/testing/selftests/bpf/benchs/bench_qp_trie.c new file mode 100644 index 000000000000..35359e5e442b --- /dev/null +++ b/tools/testing/selftests/bpf/benchs/bench_qp_trie.c @@ -0,0 +1,499 @@ +// SPDX-License-Identifier: GPL-2.0 +/* Copyright (C) 2022. 
Huawei Technologies Co., Ltd */ +#include +#include +#include +#include +#include "bench.h" +#include "bpf_util.h" +#include "cgroup_helpers.h" + +#include "qp_trie_bench.skel.h" + +enum { + FOR_HTAB = 0, + FOR_TRIE, +}; + +static struct qp_trie_ctx { + struct qp_trie_bench *skel; + int cgrp_dfd; + u64 map_slab_mem; +} ctx; + +static struct { + const char *file; + __u32 entries; +} args; + +struct run_stat { + __u64 stats[2]; +}; + +enum { + ARG_DATA_FILE = 8001, + ARG_DATA_ENTRIES = 8002, +}; + +static const struct argp_option opts[] = { + { "file", ARG_DATA_FILE, "DATA-FILE", 0, "Set data file" }, + { "entries", ARG_DATA_ENTRIES, "DATA-ENTRIES", 0, "Set data entries" }, + {}, +}; + +static error_t qp_trie_parse_arg(int key, char *arg, struct argp_state *state) +{ + switch (key) { + case ARG_DATA_FILE: + args.file = strdup(arg); + break; + case ARG_DATA_ENTRIES: + args.entries = strtoul(arg, NULL, 10); + break; + default: + return ARGP_ERR_UNKNOWN; + } + + return 0; +} + +const struct argp bench_qp_trie_argp = { + .options = opts, + .parser = qp_trie_parse_arg, +}; + +static int parse_data_set(const char *name, struct bpf_qp_trie_key ***set, unsigned int *nr, + unsigned int *max_len) +{ +#define INT_MAX_DATA_SIZE 1024 + unsigned int i, nr_items, item_max_len; + char line[INT_MAX_DATA_SIZE + 1]; + struct bpf_qp_trie_key **items; + struct bpf_qp_trie_key *cur; + int err = 0; + FILE *file; + char *got; + + file = fopen(name, "rb"); + if (!file) { + fprintf(stderr, "open %s err %s\n", name, strerror(errno)); + return -1; + } + + got = fgets(line, sizeof(line), file); + if (!got) { + fprintf(stderr, "empty file ?\n"); + err = -1; + goto out; + } + if (sscanf(line, "%u", &nr_items) != 1) { + fprintf(stderr, "the first line must be the number of items\n"); + err = -1; + goto out; + } + + fprintf(stdout, "item %u\n", nr_items); + + items = (struct bpf_qp_trie_key **)calloc(nr_items, sizeof(*items) + INT_MAX_DATA_SIZE); + if (!items) { + fprintf(stderr, "no mem for items\n"); + err = -1; + goto out; + } + + i = 0; + item_max_len = 0; + cur = (void *)items + sizeof(*items) * nr_items; + while (true) { + unsigned int len; + + got = fgets(line, sizeof(line), file); + if (!got) { + if (!feof(file)) { + fprintf(stderr, "read file %s error\n", name); + err = -1; + } + break; + } + + len = strlen(got); + if (len && got[len - 1] == '\n') { + got[len - 1] = 0; + len -= 1; + } + if (!len) { + fprintf(stdout, "#%u empty line\n", i + 2); + continue; + } + + if (i >= nr_items) { + fprintf(stderr, "too many line in %s\n", name); + break; + } + + if (len > item_max_len) + item_max_len = len; + cur->len = len; + memcpy(cur->data, got, len); + items[i++] = cur; + cur = (void *)cur + INT_MAX_DATA_SIZE; + } + + if (!err) { + if (i != nr_items) + fprintf(stdout, "few lines in %s (exp %u got %u)\n", name, nr_items, i); + *nr = i; + *set = items; + *max_len = item_max_len; + } else { + free(items); + } + +out: + fclose(file); + return err; +} + +static int gen_data_set(struct bpf_qp_trie_key ***set, unsigned int *nr, unsigned int *max_len) +{ +#define RND_MAX_DATA_SIZE 256 + struct bpf_qp_trie_key **items; + struct bpf_qp_trie_key *cur; + size_t ptr_size, data_size; + unsigned int i, nr_items; + ssize_t got; + int err = 0; + + ptr_size = *nr * sizeof(*items); + data_size = *nr * (sizeof(*cur) + RND_MAX_DATA_SIZE); + items = (struct bpf_qp_trie_key **)malloc(ptr_size + data_size); + if (!items) { + fprintf(stderr, "no mem for items\n"); + err = -1; + goto out; + } + + cur = (void *)items + ptr_size; + got = 
syscall(__NR_getrandom, cur, data_size, 0); + if (got != data_size) { + fprintf(stderr, "getrandom error %s\n", strerror(errno)); + err = -1; + goto out; + } + + nr_items = 0; + for (i = 0; i < *nr; i++) { + cur->len &= 0xff; + if (cur->len) { + items[nr_items++] = cur; + memset(cur->data + cur->len, 0, RND_MAX_DATA_SIZE - cur->len); + } + cur = (void *)cur + (sizeof(*cur) + RND_MAX_DATA_SIZE); + } + if (!nr_items) { + fprintf(stderr, "no valid key in random data\n"); + err = -1; + goto out; + } + fprintf(stdout, "generate %u random keys\n", nr_items); + + *nr = nr_items; + *set = items; + *max_len = RND_MAX_DATA_SIZE; +out: + if (err && items) + free(items); + return err; +} + +static void qp_trie_validate(void) +{ + if (env.consumer_cnt != 1) { + fprintf(stderr, "qp_trie_map benchmark doesn't support multi-consumer!\n"); + exit(1); + } + + if (!args.file && !args.entries) { + fprintf(stderr, "must specify entries when use random generated data set\n"); + exit(1); + } + + if (args.file && access(args.file, R_OK)) { + fprintf(stderr, "data file is un-accessible\n"); + exit(1); + } +} + +static void qp_trie_init_map_opts(struct qp_trie_bench *skel, unsigned int data_size, + unsigned int nr) +{ + unsigned int key_size = data_size + sizeof(struct bpf_qp_trie_key); + + bpf_map__set_value_size(skel->maps.htab_array, data_size); + bpf_map__set_max_entries(skel->maps.htab_array, nr); + + bpf_map__set_key_size(skel->maps.htab, data_size); + bpf_map__set_max_entries(skel->maps.htab, nr); + + bpf_map__set_value_size(skel->maps.trie_array, key_size); + bpf_map__set_max_entries(skel->maps.trie_array, nr); + + bpf_map__set_key_size(skel->maps.qp_trie, key_size); + bpf_map__set_max_entries(skel->maps.qp_trie, nr); +} + +static void qp_trie_setup_key_map(struct bpf_map *map, unsigned int map_type, + struct bpf_qp_trie_key **set, unsigned int nr) +{ + int fd = bpf_map__fd(map); + unsigned int i; + + for (i = 0; i < nr; i++) { + void *value; + int err; + + value = (map_type != FOR_HTAB) ? (void *)set[i] : (void *)set[i]->data; + err = bpf_map_update_elem(fd, &i, value, 0); + if (err) { + fprintf(stderr, "add #%u key (%s) on %s error %d\n", + i, set[i]->data, bpf_map__name(map), err); + exit(1); + } + } +} + +static u64 qp_trie_get_slab_mem(int dfd) +{ + const char *magic = "slab "; + const char *name = "memory.stat"; + int fd; + ssize_t nr; + char buf[4096]; + char *from; + + fd = openat(dfd, name, 0); + if (fd < 0) { + fprintf(stderr, "no %s\n", name); + exit(1); + } + + nr = read(fd, buf, sizeof(buf)); + if (nr <= 0) { + fprintf(stderr, "empty %s ?\n", name); + exit(1); + } + buf[nr - 1] = 0; + + close(fd); + + from = strstr(buf, magic); + if (!from) { + fprintf(stderr, "no slab in %s\n", name); + exit(1); + } + + return strtoull(from + strlen(magic), NULL, 10); +} + +static void qp_trie_setup_lookup_map(struct bpf_map *map, unsigned int map_type, + struct bpf_qp_trie_key **set, unsigned int nr) +{ + int fd = bpf_map__fd(map); + unsigned int i; + + for (i = 0; i < nr; i++) { + void *key; + int err; + + key = (map_type != FOR_HTAB) ? 
(void *)set[i] : (void *)set[i]->data; + err = bpf_map_update_elem(fd, key, &i, 0); + if (err) { + fprintf(stderr, "add #%u key (%s) on %s error %d\n", + i, set[i]->data, bpf_map__name(map), err); + exit(1); + } + } +} + +static void qp_trie_setup(unsigned int map_type) +{ + struct bpf_qp_trie_key **set = NULL; + struct qp_trie_bench *skel; + unsigned int nr = 0, max_len = 0; + struct bpf_map *map; + u64 before, after; + int dfd; + int err; + + if (!args.file) { + nr = args.entries; + err = gen_data_set(&set, &nr, &max_len); + } else { + err = parse_data_set(args.file, &set, &nr, &max_len); + } + if (err < 0) + exit(1); + + if (args.entries && args.entries < nr) + nr = args.entries; + + dfd = cgroup_setup_and_join("/qp_trie"); + if (dfd < 0) { + fprintf(stderr, "failed to setup cgroup env\n"); + exit(1); + } + + setup_libbpf(); + + before = qp_trie_get_slab_mem(dfd); + + skel = qp_trie_bench__open(); + if (!skel) { + fprintf(stderr, "failed to open skeleton\n"); + exit(1); + } + + qp_trie_init_map_opts(skel, max_len, nr); + + skel->bss->update_nr = nr; + skel->bss->update_chunk = nr / env.producer_cnt; + + err = qp_trie_bench__load(skel); + if (err) { + fprintf(stderr, "failed to load skeleton\n"); + exit(1); + } + + map = (map_type == FOR_HTAB) ? skel->maps.htab_array : skel->maps.trie_array; + qp_trie_setup_key_map(map, map_type, set, nr); + + map = (map_type == FOR_HTAB) ? skel->maps.htab : skel->maps.qp_trie; + qp_trie_setup_lookup_map(map, map_type, set, nr); + + after = qp_trie_get_slab_mem(dfd); + + ctx.skel = skel; + ctx.cgrp_dfd = dfd; + ctx.map_slab_mem = after - before; +} + +static void qp_trie_attach_prog(struct bpf_program *prog) +{ + struct bpf_link *link; + + link = bpf_program__attach(prog); + if (!link) { + fprintf(stderr, "failed to attach program!\n"); + exit(1); + } +} + +static void htab_lookup_setup(void) +{ + qp_trie_setup(FOR_HTAB); + qp_trie_attach_prog(ctx.skel->progs.htab_lookup); +} + +static void qp_trie_lookup_setup(void) +{ + qp_trie_setup(FOR_TRIE); + qp_trie_attach_prog(ctx.skel->progs.qp_trie_lookup); +} + +static void htab_update_setup(void) +{ + qp_trie_setup(FOR_HTAB); + qp_trie_attach_prog(ctx.skel->progs.htab_update); +} + +static void qp_trie_update_setup(void) +{ + qp_trie_setup(FOR_TRIE); + qp_trie_attach_prog(ctx.skel->progs.qp_trie_update); +} + +static void *qp_trie_producer(void *ctx) +{ + while (true) + (void)syscall(__NR_getpgid); + return NULL; +} + +static void *qp_trie_consumer(void *ctx) +{ + return NULL; +} + +static void qp_trie_measure(struct bench_res *res) +{ + static __u64 last_hits, last_drops; + __u64 total_hits = 0, total_drops = 0; + unsigned int i, nr_cpus; + + nr_cpus = bpf_num_possible_cpus(); + for (i = 0; i < nr_cpus; i++) { + struct run_stat *s = (void *)&ctx.skel->bss->percpu_stats[i & 255]; + + total_hits += s->stats[0]; + total_drops += s->stats[1]; + } + + res->hits = total_hits - last_hits; + res->drops = total_drops - last_drops; + + last_hits = total_hits; + last_drops = total_drops; +} + +static void qp_trie_report_final(struct bench_res res[], int res_cnt) +{ + close(ctx.cgrp_dfd); + cleanup_cgroup_environment(); + + fprintf(stdout, "Slab: %.3f MiB\n", (float)ctx.map_slab_mem / 1024 / 1024); + hits_drops_report_final(res, res_cnt); +} + +const struct bench bench_htab_lookup = { + .name = "htab-lookup", + .validate = qp_trie_validate, + .setup = htab_lookup_setup, + .producer_thread = qp_trie_producer, + .consumer_thread = qp_trie_consumer, + .measure = qp_trie_measure, + .report_progress = 
hits_drops_report_progress, + .report_final = qp_trie_report_final, +}; + +const struct bench bench_qp_trie_lookup = { + .name = "qp-trie-lookup", + .validate = qp_trie_validate, + .setup = qp_trie_lookup_setup, + .producer_thread = qp_trie_producer, + .consumer_thread = qp_trie_consumer, + .measure = qp_trie_measure, + .report_progress = hits_drops_report_progress, + .report_final = qp_trie_report_final, +}; + +const struct bench bench_htab_update = { + .name = "htab-update", + .validate = qp_trie_validate, + .setup = htab_update_setup, + .producer_thread = qp_trie_producer, + .consumer_thread = qp_trie_consumer, + .measure = qp_trie_measure, + .report_progress = hits_drops_report_progress, + .report_final = qp_trie_report_final, +}; + +const struct bench bench_qp_trie_update = { + .name = "qp-trie-update", + .validate = qp_trie_validate, + .setup = qp_trie_update_setup, + .producer_thread = qp_trie_producer, + .consumer_thread = qp_trie_consumer, + .measure = qp_trie_measure, + .report_progress = hits_drops_report_progress, + .report_final = qp_trie_report_final, +}; diff --git a/tools/testing/selftests/bpf/benchs/run_bench_qp_trie.sh b/tools/testing/selftests/bpf/benchs/run_bench_qp_trie.sh new file mode 100755 index 000000000000..0cbcb5bc9292 --- /dev/null +++ b/tools/testing/selftests/bpf/benchs/run_bench_qp_trie.sh @@ -0,0 +1,55 @@ +#!/bin/bash +# SPDX-License-Identifier: GPL-2.0 +# Copyright (C) 2022. Huawei Technologies Co., Ltd + +source ./benchs/run_common.sh + +set -eufo pipefail + +mem() +{ + echo "$*" | sed -E "s/.*Slab: ([0-9]+\.[0-9]+ MiB).*/\1/" +} + +run_qp_trie_bench() +{ + local title=$1 + local summary + + shift 1 + summary=$($RUN_BENCH "$@" | grep "Summary\|Slab:") + printf "%s %20s (drops %-16s mem %s)\n" "$title" "$(hits $summary)" \ + "$(drops $summary)" "$(mem $summary)" +} + +run_qp_trie_benchs() +{ + local p + local m + local b + local title + + for m in htab qp-trie + do + for b in lookup update + do + for p in 1 2 4 8 16 + do + title=$(printf "%-16s (%-2d thread)" "$m $b" $p) + run_qp_trie_bench "$title" ${m}-${b} -p $p "$@" + done + done + done + echo +} + +echo "Randomly-generated binary data (16K)" +run_qp_trie_benchs --entries 16384 + +echo "Strings in /proc/kallsyms" +TMP_FILE=/tmp/kallsyms.txt +SRC_FILE=/proc/kallsyms +trap 'rm -f $TMP_FILE' EXIT +wc -l $SRC_FILE | awk '{ print $1}' > $TMP_FILE +awk '{ print $3 }' $SRC_FILE >> $TMP_FILE +run_qp_trie_benchs --file $TMP_FILE diff --git a/tools/testing/selftests/bpf/progs/qp_trie_bench.c b/tools/testing/selftests/bpf/progs/qp_trie_bench.c new file mode 100644 index 000000000000..21202c578105 --- /dev/null +++ b/tools/testing/selftests/bpf/progs/qp_trie_bench.c @@ -0,0 +1,218 @@ +// SPDX-License-Identifier: GPL-2.0 +/* Copyright (C) 2022. 
Huawei Technologies Co., Ltd */ +#include <linux/types.h> +#include <linux/bpf.h> +#include <bpf/bpf_helpers.h> +#include <bpf/bpf_tracing.h> + +struct bpf_map; + +/* value_size will be set by benchmark */ +struct { + __uint(type, BPF_MAP_TYPE_ARRAY); + __uint(key_size, 4); +} htab_array SEC(".maps"); + +/* value_size will be set by benchmark */ +struct { + __uint(type, BPF_MAP_TYPE_ARRAY); + __uint(key_size, 4); +} trie_array SEC(".maps"); + +/* key_size will be set by benchmark */ +struct { + __uint(type, BPF_MAP_TYPE_HASH); + __uint(value_size, 4); + __uint(map_flags, BPF_F_NO_PREALLOC); +} htab SEC(".maps"); + +/* key_size will be set by benchmark */ +struct { + __uint(type, BPF_MAP_TYPE_QP_TRIE); + __uint(value_size, 4); + __uint(map_flags, BPF_F_NO_PREALLOC); +} qp_trie SEC(".maps"); + +char _license[] SEC("license") = "GPL"; + +struct { + __u64 stats[2]; +} __attribute__((__aligned__(128))) percpu_stats[256]; + +struct update_ctx { + unsigned int max; + unsigned int from; +}; + +unsigned int update_nr; +unsigned int update_chunk; + +static __always_inline void update_stats(int idx) +{ + __u32 cpu = bpf_get_smp_processor_id(); + + percpu_stats[cpu & 255].stats[idx]++; +} + +static int lookup_htab(struct bpf_map *map, __u32 *key, void *value, void *data) +{ + __u32 *index; + + index = bpf_map_lookup_elem(&htab, value); + if (index && *index == *key) + update_stats(0); + else + update_stats(1); + return 0; +} + +static int update_htab_loop(unsigned int i, void *ctx) +{ + struct update_ctx *update = ctx; + void *value; + int err; + + if (update->from >= update->max) + update->from = 0; + value = bpf_map_lookup_elem(&htab_array, &update->from); + if (!value) + return 1; + + err = bpf_map_update_elem(&htab, value, &update->from, 0); + if (!err) + update_stats(0); + else + update_stats(1); + update->from++; + + return 0; +} + +static int delete_htab_loop(unsigned int i, void *ctx) +{ + struct update_ctx *update = ctx; + void *value; + int err; + + if (update->from >= update->max) + update->from = 0; + value = bpf_map_lookup_elem(&htab_array, &update->from); + if (!value) + return 1; + + err = bpf_map_delete_elem(&htab, value); + if (!err) + update_stats(0); + update->from++; + + return 0; +} + +static int lookup_qp_trie(struct bpf_map *map, __u32 *key, void *value, void *data) +{ + __u32 *index; + + index = bpf_map_lookup_elem(&qp_trie, value); + if (index && *index == *key) + update_stats(0); + else + update_stats(1); + return 0; +} + +static int update_qp_trie_loop(unsigned int i, void *ctx) +{ + struct update_ctx *update = ctx; + void *value; + int err; + + if (update->from >= update->max) + update->from = 0; + value = bpf_map_lookup_elem(&trie_array, &update->from); + if (!value) + return 1; + + err = bpf_map_update_elem(&qp_trie, value, &update->from, 0); + if (!err) + update_stats(0); + else + update_stats(1); + update->from++; + + return 0; +} + +static int delete_qp_trie_loop(unsigned int i, void *ctx) +{ + struct update_ctx *update = ctx; + void *value; + int err; + + if (update->from >= update->max) + update->from = 0; + value = bpf_map_lookup_elem(&trie_array, &update->from); + if (!value) + return 1; + + err = bpf_map_delete_elem(&qp_trie, value); + if (!err) + update_stats(0); + update->from++; + + return 0; +} + +SEC("tp/syscalls/sys_enter_getpgid") +int htab_lookup(void *ctx) +{ + bpf_for_each_map_elem(&htab_array, lookup_htab, NULL, 0); + return 0; +} + +SEC("tp/syscalls/sys_enter_getpgid") +int qp_trie_lookup(void *ctx) +{ + bpf_for_each_map_elem(&trie_array, lookup_qp_trie, NULL, 0); + return 0; +} + 
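+/* The update benchmarks below shard the key space by CPU: user space sets + * update_chunk = nr_entries / producer_cnt, each producer thread spins on + * getpgid() to fire this tracepoint, and every invocation first updates and + * then deletes the chunk of keys starting at cpu_id * update_chunk (wrapped + * modulo update_nr). + */ + 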
+SEC("tp/syscalls/sys_enter_getpgid") +int htab_update(void *ctx) +{ + unsigned int index = bpf_get_smp_processor_id() * update_chunk; + struct update_ctx update; + + update.max = update_nr; + if (update.max && index >= update.max) + index %= update.max; + + /* Only operate on a chunk of the keys, selected by CPU id */ + update.from = index; + bpf_loop(update_chunk, update_htab_loop, &update, 0); + + update.from = index; + bpf_loop(update_chunk, delete_htab_loop, &update, 0); + + return 0; +} + +SEC("tp/syscalls/sys_enter_getpgid") +int qp_trie_update(void *ctx) +{ + unsigned int index = bpf_get_smp_processor_id() * update_chunk; + struct update_ctx update; + + update.max = update_nr; + if (update.max && index >= update.max) + index %= update.max; + + /* Only operate on a chunk of the keys, selected by CPU id */ + update.from = index; + bpf_loop(update_chunk, update_qp_trie_loop, &update, 0); + + update.from = index; + bpf_loop(update_chunk, delete_qp_trie_loop, &update, 0); + + return 0; +}
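
For reference, a minimal usage sketch, assuming the selftests bench runner built in tools/testing/selftests/bpf: the qp-trie-*/htab-* benchmark names and the --file/--entries options are the ones registered above, while the -p (producer count) option comes from the existing bench framework; the exact build step may differ per tree.

    cd tools/testing/selftests/bpf
    # 4 producer threads over 16384 randomly generated keys
    ./bench qp-trie-lookup --entries 16384 -p 4
    # keys read from a file whose first line is the number of items
    ./bench htab-update --file /tmp/kallsyms.txt -p 2
    # full htab vs qp-trie comparison matrix
    ./benchs/run_bench_qp_trie.sh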