[bpf-next,v4,09/24] bpf: Support bpf_list_head in map values

Message ID	20221103191013.1236066-10-memxor@gmail.com (mailing list archive)
State	Superseded
Delegated to:	BPF
Headers	show Return-Path: <bpf-owner@kernel.org> From: Kumar Kartikeya Dwivedi <memxor@gmail.com> To: bpf@vger.kernel.org Cc: Alexei Starovoitov <ast@kernel.org>, Andrii Nakryiko <andrii@kernel.org>, Daniel Borkmann <daniel@iogearbox.net>, Martin KaFai Lau <martin.lau@kernel.org>, Dave Marchevsky <davemarchevsky@meta.com>, Delyan Kratunov <delyank@meta.com> Subject: [PATCH bpf-next v4 09/24] bpf: Support bpf_list_head in map values Date: Fri, 4 Nov 2022 00:39:58 +0530 Message-Id: <20221103191013.1236066-10-memxor@gmail.com> In-Reply-To: <20221103191013.1236066-1-memxor@gmail.com> References: <20221103191013.1236066-1-memxor@gmail.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Precedence: bulk
Series	Local kptrs, BPF linked lists \| expand [bpf-next,v4,00/24] Local kptrs, BPF linked lists [bpf-next,v4,01/24] bpf: Document UAPI details for special BPF types [bpf-next,v4,02/24] bpf: Allow specifying volatile type modifier for kptrs [bpf-next,v4,03/24] bpf: Clobber stack slot when writing over spilled PTR_TO_BTF_ID [bpf-next,v4,04/24] bpf: Fix slot type check in check_stack_write_var_off [bpf-next,v4,05/24] bpf: Drop reg_type_may_be_refcounted_or_null [bpf-next,v4,06/24] bpf: Refactor kptr_off_tab into btf_record [bpf-next,v4,07/24] bpf: Consolidate spin_lock, timer management into btf_record [bpf-next,v4,08/24] bpf: Refactor map->off_arr handling [bpf-next,v4,09/24] bpf: Support bpf_list_head in map values [bpf-next,v4,10/24] bpf: Introduce local kptrs [bpf-next,v4,11/24] bpf: Recognize bpf_{spin_lock,list_head,list_node} in local kptrs [bpf-next,v4,12/24] bpf: Verify ownership relationships for user BTF types [bpf-next,v4,13/24] bpf: Support locking bpf_spin_lock in local kptr [bpf-next,v4,14/24] bpf: Allow locking bpf_spin_lock global variables [bpf-next,v4,15/24] bpf: Rewrite kfunc argument handling [bpf-next,v4,16/24] bpf: Drop kfunc bits from btf_check_func_arg_match [bpf-next,v4,17/24] bpf: Support constant scalar arguments for kfuncs [bpf-next,v4,18/24] bpf: Teach verifier about non-size constant arguments [bpf-next,v4,19/24] bpf: Introduce bpf_obj_new [bpf-next,v4,20/24] bpf: Introduce bpf_obj_drop [bpf-next,v4,21/24] bpf: Permit NULL checking pointer with non-zero fixed offset [bpf-next,v4,22/24] bpf: Introduce single ownership BPF linked list API [bpf-next,v4,23/24] selftests/bpf: Add __contains macro to bpf_experimental.h [bpf-next,v4,24/24] selftests/bpf: Add BPF linked list API tests

Context	Check	Description
netdev/tree_selection	success	Clearly marked for bpf-next, async
netdev/fixes_present	success	Fixes tag not required for -next series
netdev/subject_prefix	success	Link
netdev/cover_letter	success	Series has a cover letter
netdev/patch_count	fail	Series longer than 15 patches (and no cover letter)
netdev/header_inline	success	No static functions without inline keyword in header files
netdev/build_32bit	success	Errors and warnings before: 1692 this patch: 1692
netdev/cc_maintainers	warning	8 maintainers not CCed: sdf@google.com kpsingh@kernel.org haoluo@google.com yhs@fb.com jolsa@kernel.org martin.lau@linux.dev song@kernel.org john.fastabend@gmail.com
netdev/build_clang	success	Errors and warnings before: 169 this patch: 169
netdev/module_param	success	Was 0 now: 0
netdev/verify_signedoff	success	Signed-off-by tag matches author and committer
netdev/check_selftest	success	No net selftest shell script
netdev/verify_fixes	success	No Fixes tag
netdev/build_allmodconfig_warn	success	Errors and warnings before: 1684 this patch: 1684
netdev/checkpatch	fail	ERROR: space prohibited before that ':' (ctx:WxV) WARNING: line length of 81 exceeds 80 columns WARNING: line length of 82 exceeds 80 columns WARNING: line length of 84 exceeds 80 columns WARNING: line length of 85 exceeds 80 columns WARNING: line length of 86 exceeds 80 columns WARNING: line length of 87 exceeds 80 columns WARNING: line length of 88 exceeds 80 columns WARNING: line length of 92 exceeds 80 columns WARNING: line length of 93 exceeds 80 columns WARNING: line length of 99 exceeds 80 columns
netdev/kdoc	success	Errors and warnings before: 0 this patch: 0
netdev/source_inline	success	Was 0 now: 0
bpf/vmtest-bpf-next-VM_Test-2	fail	Logs for build for s390x with gcc
bpf/vmtest-bpf-next-VM_Test-1	pending	Logs for ${{ matrix.test }} on ${{ matrix.arch }} with ${{ matrix.toolchain }}
bpf/vmtest-bpf-next-VM_Test-3	fail	Logs for build for s390x with gcc
bpf/vmtest-bpf-next-VM_Test-4	fail	Logs for build for x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-5	fail	Logs for build for x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-6	success	Logs for llvm-toolchain
bpf/vmtest-bpf-next-VM_Test-7	success	Logs for set-matrix
bpf/vmtest-bpf-next-PR	fail	merge-conflict

diff --git a/include/linux/bpf.h b/include/linux/bpf.h index bb96bf947e53..2d9ebe9efcad 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -176,6 +176,7 @@ enum btf_field_type { BPF_KPTR_UNREF = (1 << 2), BPF_KPTR_REF = (1 << 3), BPF_KPTR = BPF_KPTR_UNREF | BPF_KPTR_REF, + BPF_LIST_HEAD = (1 << 4), }; struct btf_field_kptr { @@ -185,11 +186,18 @@ struct btf_field_kptr { u32 btf_id; }; +struct btf_field_list_head { + struct btf *btf; + u32 value_btf_id; + u32 node_offset; +}; + struct btf_field { u32 offset; enum btf_field_type type; union { struct btf_field_kptr kptr; + struct btf_field_list_head list_head; }; }; @@ -267,6 +275,8 @@ static inline const char *btf_field_type_name(enum btf_field_type type) case BPF_KPTR_UNREF: case BPF_KPTR_REF: return "kptr"; + case BPF_LIST_HEAD: + return "bpf_list_head"; default: WARN_ON_ONCE(1); return "unknown"; @@ -283,6 +293,8 @@ static inline u32 btf_field_type_size(enum btf_field_type type) case BPF_KPTR_UNREF: case BPF_KPTR_REF: return sizeof(u64); + case BPF_LIST_HEAD: + return sizeof(struct bpf_list_head); default: WARN_ON_ONCE(1); return 0; @@ -299,6 +311,8 @@ static inline u32 btf_field_type_align(enum btf_field_type type) case BPF_KPTR_UNREF: case BPF_KPTR_REF: return __alignof__(u64); + case BPF_LIST_HEAD: + return __alignof__(struct bpf_list_head); default: WARN_ON_ONCE(1); return 0; @@ -404,6 +418,9 @@ static inline void zero_map_value(struct bpf_map *map, void *dst) void copy_map_value_locked(struct bpf_map *map, void *dst, void *src, bool lock_src); void bpf_timer_cancel_and_free(void *timer); +void bpf_list_head_free(const struct btf_field *field, void *list_head, + struct bpf_spin_lock *spin_lock); + int bpf_obj_name_cpy(char *dst, const char *src, unsigned int size); struct bpf_offload_dev; diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index 94659f6b3395..dd381086bad9 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -6887,6 +6887,16 @@ struct bpf_dynptr { __u64 :64; } __attribute__((aligned(8))); +struct bpf_list_head { + __u64 :64; + __u64 :64; +} __attribute__((aligned(8))); + +struct bpf_list_node { + __u64 :64; + __u64 :64; +} __attribute__((aligned(8))); + struct bpf_sysctl { __u32 write; /* Sysctl is being read (= 0) or written (= 1). * Allows 1,2,4-byte read, but no write. diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c index 197687c86dc1..e56025505467 100644 --- a/kernel/bpf/btf.c +++ b/kernel/bpf/btf.c @@ -3205,9 +3205,15 @@ enum { struct btf_field_info { enum btf_field_type type; u32 off; - struct { - u32 type_id; - } kptr; + union { + struct { + u32 type_id; + } kptr; + struct { + const char *node_name; + u32 value_btf_id; + } list_head; + }; }; static int btf_find_struct(const struct btf *btf, const struct btf_type *t, @@ -3261,6 +3267,66 @@ static int btf_find_kptr(const struct btf *btf, const struct btf_type *t, return BTF_FIELD_FOUND; } +static const char *btf_find_decl_tag_value(const struct btf *btf, + const struct btf_type *pt, + int comp_idx, const char *tag_key) +{ + int i; + + for (i = 1; i < btf_nr_types(btf); i++) { + const struct btf_type *t = btf_type_by_id(btf, i); + int len = strlen(tag_key); + + if (!btf_type_is_decl_tag(t)) + continue; + /* TODO: Instead of btf_type pt, it would be much better if we had BTF + * ID of the map value type. This would avoid btf_type_by_id call here. + */ + if (pt != btf_type_by_id(btf, t->type) || + btf_type_decl_tag(t)->component_idx != comp_idx) + continue; + if (strncmp(__btf_name_by_offset(btf, t->name_off), tag_key, len)) + continue; + return __btf_name_by_offset(btf, t->name_off) + len; + } + return NULL; +} + +static int btf_find_list_head(const struct btf *btf, const struct btf_type *pt, + const struct btf_type *t, int comp_idx, + u32 off, int sz, struct btf_field_info *info) +{ + const char *value_type; + const char *list_node; + s32 id; + + if (!__btf_type_is_struct(t)) + return BTF_FIELD_IGNORE; + if (t->size != sz) + return BTF_FIELD_IGNORE; + value_type = btf_find_decl_tag_value(btf, pt, comp_idx, "contains:"); + if (!value_type) + return -EINVAL; + list_node = strstr(value_type, ":"); + if (!list_node) + return -EINVAL; + value_type = kstrndup(value_type, list_node - value_type, GFP_KERNEL | __GFP_NOWARN); + if (!value_type) + return -ENOMEM; + id = btf_find_by_name_kind(btf, value_type, BTF_KIND_STRUCT); + kfree(value_type); + if (id < 0) + return id; + list_node++; + if (str_is_empty(list_node)) + return -EINVAL; + info->type = BPF_LIST_HEAD; + info->off = off; + info->list_head.value_btf_id = id; + info->list_head.node_name = list_node; + return BTF_FIELD_FOUND; +} + static int btf_get_field_type(const char *name, u32 field_mask, u32 *seen_mask, int *align, int *sz) { @@ -3284,6 +3350,12 @@ static int btf_get_field_type(const char *name, u32 field_mask, u32 *seen_mask, goto end; } } + if (field_mask & BPF_LIST_HEAD) { + if (!strcmp(name, "bpf_list_head")) { + type = BPF_LIST_HEAD; + goto end; + } + } /* Only return BPF_KPTR when all other types with matchable names fail */ if (field_mask & BPF_KPTR) { type = BPF_KPTR_REF; @@ -3317,6 +3389,8 @@ static int btf_find_struct_field(const struct btf *btf, return field_type; off = __btf_member_bit_offset(t, member); + if (i && !off) + return -EFAULT; if (off % 8) /* valid C code cannot generate such BTF */ return -EINVAL; @@ -3339,6 +3413,12 @@ static int btf_find_struct_field(const struct btf *btf, if (ret < 0) return ret; break; + case BPF_LIST_HEAD: + ret = btf_find_list_head(btf, t, member_type, i, off, sz, + idx < info_cnt ? &info[idx] : &tmp); + if (ret < 0) + return ret; + break; default: return -EFAULT; } @@ -3373,6 +3453,8 @@ static int btf_find_datasec_var(const struct btf *btf, const struct btf_type *t, return field_type; off = vsi->offset; + if (i && !off) + return -EFAULT; if (vsi->size != sz) continue; if (off % align) @@ -3393,6 +3475,12 @@ static int btf_find_datasec_var(const struct btf *btf, const struct btf_type *t, if (ret < 0) return ret; break; + case BPF_LIST_HEAD: + ret = btf_find_list_head(btf, var, var_type, -1, off, sz, + idx < info_cnt ? &info[idx] : &tmp); + if (ret < 0) + return ret; + break; default: return -EFAULT; } @@ -3491,6 +3579,44 @@ static int btf_parse_kptr(const struct btf *btf, struct btf_field *field, return ret; } +static int btf_parse_list_head(const struct btf *btf, struct btf_field *field, + struct btf_field_info *info) +{ + const struct btf_type *t, *n = NULL; + const struct btf_member *member; + u32 offset; + int i; + + t = btf_type_by_id(btf, info->list_head.value_btf_id); + /* We've already checked that value_btf_id is a struct type. We + * just need to figure out the offset of the list_node, and + * verify its type. + */ + for_each_member(i, t, member) { + if (strcmp(info->list_head.node_name, __btf_name_by_offset(btf, member->name_off))) + continue; + /* Invalid BTF, two members with same name */ + if (n) + return -EINVAL; + n = btf_type_by_id(btf, member->type); + if (!__btf_type_is_struct(n)) + return -EINVAL; + if (strcmp("bpf_list_node", __btf_name_by_offset(btf, n->name_off))) + return -EINVAL; + offset = __btf_member_bit_offset(n, member); + if (offset % 8) + return -EINVAL; + offset /= 8; + if (offset % __alignof__(struct bpf_list_node)) + return -EINVAL; + + field->list_head.btf = (struct btf *)btf; + field->list_head.value_btf_id = info->list_head.value_btf_id; + field->list_head.node_offset = offset; + } + return 0; +} + struct btf_record *btf_parse_fields(const struct btf *btf, const struct btf_type *t, u32 field_mask, u32 value_size) { @@ -3539,12 +3665,24 @@ struct btf_record *btf_parse_fields(const struct btf *btf, const struct btf_type if (ret < 0) goto end; break; + case BPF_LIST_HEAD: + ret = btf_parse_list_head(btf, &rec->fields[i], &info_arr[i]); + if (ret < 0) + goto end; + break; default: ret = -EFAULT; goto end; } rec->cnt++; } + + /* bpf_list_head requires bpf_spin_lock */ + if (btf_record_has_field(rec, BPF_LIST_HEAD) && rec->spin_lock_off < 0) { + ret = -EINVAL; + goto end; + } + return rec; end: btf_record_free(rec); diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c index 283f55bbeb70..339cce94b408 100644 --- a/kernel/bpf/helpers.c +++ b/kernel/bpf/helpers.c @@ -1706,6 +1706,38 @@ bpf_base_func_proto(enum bpf_func_id func_id) } } +void bpf_list_head_free(const struct btf_field *field, void *list_head, + struct bpf_spin_lock *spin_lock) +{ + struct list_head *head = list_head, *orig_head = head; + unsigned long flags; + + BUILD_BUG_ON(sizeof(struct list_head) > sizeof(struct bpf_list_head)); + BUILD_BUG_ON(__alignof__(struct list_head) > __alignof__(struct bpf_list_head)); + + /* __bpf_spin_lock_irqsave cannot be used here, as we may take a spin + * lock again when we call bpf_obj_free_fields in the loop, and it will + * overwrite the per-CPU local_irq_save state. + */ + local_irq_save(flags); + __bpf_spin_lock(spin_lock); + if (!head->next || list_empty(head)) + goto unlock; + head = head->next; + while (head != orig_head) { + void *obj = head; + + obj -= field->list_head.node_offset; + head = head->next; + /* TODO: Rework later */ + kfree(obj); + } +unlock: + INIT_LIST_HEAD(head); + __bpf_spin_unlock(spin_lock); + local_irq_restore(flags); +} + BTF_SET8_START(tracing_btf_ids) #ifdef CONFIG_KEXEC_CORE BTF_ID_FLAGS(func, crash_kexec, KF_DESTRUCTIVE) diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c index 85532d301124..fdbae52f463f 100644 --- a/kernel/bpf/syscall.c +++ b/kernel/bpf/syscall.c @@ -536,6 +536,9 @@ void btf_record_free(struct btf_record *rec) module_put(rec->fields[i].kptr.module); btf_put(rec->fields[i].kptr.btf); break; + case BPF_LIST_HEAD: + /* Nothing to release for bpf_list_head */ + break; default: WARN_ON_ONCE(1); continue; @@ -578,6 +581,9 @@ struct btf_record *btf_record_dup(const struct btf_record *rec) goto free; } break; + case BPF_LIST_HEAD: + /* Nothing to acquire for bpf_list_head */ + break; default: ret = -EFAULT; WARN_ON_ONCE(1); @@ -637,6 +643,11 @@ void bpf_obj_free_fields(const struct btf_record *rec, void *obj) case BPF_KPTR_REF: field->kptr.dtor((void *)xchg((unsigned long *)field_ptr, 0)); break; + case BPF_LIST_HEAD: + if (WARN_ON_ONCE(rec->spin_lock_off < 0)) + continue; + bpf_list_head_free(field, field_ptr, obj + rec->spin_lock_off); + break; default: WARN_ON_ONCE(1); continue; @@ -965,7 +976,8 @@ static int map_check_btf(struct bpf_map *map, const struct btf *btf, if (!value_type || value_size != map->value_size) return -EINVAL; - map->record = btf_parse_fields(btf, value_type, BPF_SPIN_LOCK | BPF_TIMER | BPF_KPTR, + map->record = btf_parse_fields(btf, value_type, + BPF_SPIN_LOCK | BPF_TIMER | BPF_KPTR | BPF_LIST_HEAD, map->value_size); if (!IS_ERR_OR_NULL(map->record)) { int i; @@ -1012,6 +1024,14 @@ static int map_check_btf(struct bpf_map *map, const struct btf *btf, goto free_map_tab; } break; + case BPF_LIST_HEAD: + if (map->map_type != BPF_MAP_TYPE_HASH && + map->map_type != BPF_MAP_TYPE_LRU_HASH && + map->map_type != BPF_MAP_TYPE_ARRAY) { + ret = -EOPNOTSUPP; + goto free_map_tab; + } + break; default: /* Fail if map_type checks are missing for a field type */ ret = -EOPNOTSUPP; diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index 73a3516f1a48..168cd8bb9fd6 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -12643,6 +12643,13 @@ static int check_map_prog_compatibility(struct bpf_verifier_env *env, } } + if (btf_record_has_field(map->record, BPF_LIST_HEAD)) { + if (is_tracing_prog_type(prog_type)) { + verbose(env, "tracing progs cannot use bpf_list_head yet\n"); + return -EINVAL; + } + } + if ((bpf_prog_is_dev_bound(prog->aux) || bpf_map_is_dev_bound(map)) && !bpf_offload_prog_map_match(prog, map)) { verbose(env, "offload device mismatch between prog and map\n"); diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h index 94659f6b3395..dd381086bad9 100644 --- a/tools/include/uapi/linux/bpf.h +++ b/tools/include/uapi/linux/bpf.h @@ -6887,6 +6887,16 @@ struct bpf_dynptr { __u64 :64; } __attribute__((aligned(8))); +struct bpf_list_head { + __u64 :64; + __u64 :64; +} __attribute__((aligned(8))); + +struct bpf_list_node { + __u64 :64; + __u64 :64; +} __attribute__((aligned(8))); + struct bpf_sysctl { __u32 write; /* Sysctl is being read (= 0) or written (= 1). * Allows 1,2,4-byte read, but no write.

[bpf-next,v4,09/24] bpf: Support bpf_list_head in map values

Checks

Commit Message

Patch