From patchwork Fri Jul 22 18:34:28 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dave Marchevsky X-Patchwork-Id: 12926739 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 292D6C43334 for ; Fri, 22 Jul 2022 18:34:59 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232157AbiGVSe6 (ORCPT ); Fri, 22 Jul 2022 14:34:58 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48168 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S236320AbiGVSe5 (ORCPT ); Fri, 22 Jul 2022 14:34:57 -0400 Received: from mx0a-00082601.pphosted.com (mx0b-00082601.pphosted.com [67.231.153.30]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 8D6A881485 for ; Fri, 22 Jul 2022 11:34:56 -0700 (PDT) Received: from pps.filterd (m0001303.ppops.net [127.0.0.1]) by m0001303.ppops.net (8.17.1.5/8.17.1.5) with ESMTP id 26MHoqBh024211 for ; Fri, 22 Jul 2022 11:34:55 -0700 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fb.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : content-type; s=facebook; bh=0U6H34c0tdqJjcCxGt7q7fxnft7oVj9bPnjTSrQQtMM=; b=qyAkphqgbTtQKH9PZXC1PzCr34DX+YDrDa2JO8HHJOPkGGZIeL6Bq7WCDPmmKEuXwODX Yr34oYJUU0x69cG6yXsZKCOIwS+jtSG/qqj5ehUoE9BI2JA7scCnJA4DPi4h6MER1lc1 Q4k/z5YDja/y7cAayfVgyFSGWz0yYZILMxQ= Received: from mail.thefacebook.com ([163.114.132.120]) by m0001303.ppops.net (PPS) with ESMTPS id 3hg0n708yq-2 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT) for ; Fri, 22 Jul 2022 11:34:55 -0700 Received: from twshared39111.03.ash8.facebook.com (2620:10d:c085:208::f) by mail.thefacebook.com (2620:10d:c085:11d::7) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2375.28; Fri, 22 Jul 2022 11:34:53 -0700 Received: by devbig077.ldc1.facebook.com (Postfix, from userid 158236) id 5C631AB6F193; Fri, 22 Jul 2022 11:34:45 -0700 (PDT) From: Dave Marchevsky To: CC: Alexei Starovoitov , Daniel Borkmann , Andrii Nakryiko , Kernel Team , Tejun Heo , Dave Marchevsky Subject: [RFC PATCH bpf-next 01/11] bpf: Pull repeated reg access bounds check into helper fn Date: Fri, 22 Jul 2022 11:34:28 -0700 Message-ID: <20220722183438.3319790-2-davemarchevsky@fb.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20220722183438.3319790-1-davemarchevsky@fb.com> References: <20220722183438.3319790-1-davemarchevsky@fb.com> MIME-Version: 1.0 X-FB-Internal: Safe X-Proofpoint-ORIG-GUID: 5EZfN1u7Ms3iJ0St8dfPnDwQOD7SypYm X-Proofpoint-GUID: 5EZfN1u7Ms3iJ0St8dfPnDwQOD7SypYm X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.883,Hydra:6.0.517,FMLib:17.11.122.1 definitions=2022-07-22_06,2022-07-21_02,2022-06-22_01 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net X-Patchwork-State: RFC Verifier's check_map_access function has some duplicated bounds checking arithmetic to determine whether a map element access might touch a protected field (bpf_spin_lock and bpf_timer). Pull this logic out into new reg_access_may_touch_field helper function to improve readability and understandability of the code. 
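
[ Editor's illustration, not part of the patch: the check being factored out into reg_access_may_touch_field() below is a standard half-open interval overlap test. The standalone sketch here just restates that predicate with concrete numbers; the verifier helper additionally folds the register's smin/umax bounds into the access range. ]

#include <stdbool.h>
#include <stdint.h>

/* [x1, x2) overlaps [y1, y2) iff x1 < y2 && y1 < x2. In the verifier
 * helper, x1/x2 come from the register's smin/umax bounds plus the
 * access offset and size; y1/y2 from the protected field's offset and
 * size (sizeof(struct bpf_spin_lock) or sizeof(struct bpf_timer)).
 */
static bool ranges_overlap(uint64_t x1, uint64_t x2,
                           uint64_t y1, uint64_t y2)
{
        return x1 < y2 && y1 < x2;
}

/* Example: with a 4-byte bpf_spin_lock at offset 16 (range [16, 20)),
 * a 4-byte access at offset 12 covers [12, 16) and does not overlap,
 * while an 8-byte access at offset 12 covers [12, 20), overlaps the
 * lock, and must be rejected.
 */
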
Signed-off-by: Dave Marchevsky --- kernel/bpf/verifier.c | 26 ++++++++++++++++---------- 1 file changed, 16 insertions(+), 10 deletions(-) diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index 8ceccc420db3..c5fbaa9f025a 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -3783,6 +3783,19 @@ static int check_map_kptr_access(struct bpf_verifier_env *env, u32 regno, return 0; } +/* if any part of struct field can be touched by + * load/store reject this program. + * To check that [x1, x2) overlaps with [y1, y2) + * it is sufficient to check x1 < y2 && y1 < x2. + */ +static bool reg_access_may_touch_field(const struct bpf_reg_state *reg, + u32 access_off, size_t access_sz, + u32 field_off, size_t field_sz) +{ + return reg->smin_value + access_off < field_off + field_sz && + field_off < reg->umax_value + access_off + access_sz; +} + /* check read/write into a map element with possible variable offset */ static int check_map_access(struct bpf_verifier_env *env, u32 regno, int off, int size, bool zero_size_allowed, @@ -3800,15 +3813,9 @@ static int check_map_access(struct bpf_verifier_env *env, u32 regno, return err; if (map_value_has_spin_lock(map)) { - u32 lock = map->spin_lock_off; + u32 t = map->spin_lock_off; - /* if any part of struct bpf_spin_lock can be touched by - * load/store reject this program. - * To check that [x1, x2) overlaps with [y1, y2) - * it is sufficient to check x1 < y2 && y1 < x2. - */ - if (reg->smin_value + off < lock + sizeof(struct bpf_spin_lock) && - lock < reg->umax_value + off + size) { + if (reg_access_may_touch_field(reg, off, size, t, sizeof(struct bpf_spin_lock))) { verbose(env, "bpf_spin_lock cannot be accessed directly by load/store\n"); return -EACCES; } @@ -3816,8 +3823,7 @@ static int check_map_access(struct bpf_verifier_env *env, u32 regno, if (map_value_has_timer(map)) { u32 t = map->timer_off; - if (reg->smin_value + off < t + sizeof(struct bpf_timer) && - t < reg->umax_value + off + size) { + if (reg_access_may_touch_field(reg, off, size, t, sizeof(struct bpf_timer))) { verbose(env, "bpf_timer cannot be accessed directly by load/store\n"); return -EACCES; } From patchwork Fri Jul 22 18:34:29 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dave Marchevsky X-Patchwork-Id: 12926741 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id F3287C433EF for ; Fri, 22 Jul 2022 18:35:03 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S236369AbiGVSfD (ORCPT ); Fri, 22 Jul 2022 14:35:03 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48250 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S236337AbiGVSfC (ORCPT ); Fri, 22 Jul 2022 14:35:02 -0400 Received: from mx0b-00082601.pphosted.com (mx0b-00082601.pphosted.com [67.231.153.30]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id CB3EB7D1D0 for ; Fri, 22 Jul 2022 11:35:01 -0700 (PDT) Received: from pps.filterd (m0109332.ppops.net [127.0.0.1]) by mx0a-00082601.pphosted.com (8.17.1.5/8.17.1.5) with ESMTP id 26MC8QaD022622 for ; Fri, 22 Jul 2022 11:35:01 -0700 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fb.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : 
content-transfer-encoding : content-type; s=facebook; bh=jqn5vwtwL09kaoTyW2RH9jm0IjV+NApjPiL6U3y5slE=; b=M35ZnRAglbchLkwNCD9rkiG5sxJbZKJw6QrgK7Ipe335F/b9IrofYyrljVLBuS9JUTjL LxFo8QDHor8dUCsahW3b0PglPlld0lVt08Y8U/NoNOfiJLsfgb06VBh8gTo82xaEX/V0 IwivRifMOLxvK1fv1Mn4tUmcaPzSUxaRrQA= Received: from maileast.thefacebook.com ([163.114.130.16]) by mx0a-00082601.pphosted.com (PPS) with ESMTPS id 3hf7fc10es-2 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT) for ; Fri, 22 Jul 2022 11:35:00 -0700 Received: from twshared22934.08.ash9.facebook.com (2620:10d:c0a8:1b::d) by mail.thefacebook.com (2620:10d:c0a8:82::c) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2375.28; Fri, 22 Jul 2022 11:34:59 -0700 Received: by devbig077.ldc1.facebook.com (Postfix, from userid 158236) id 256DFAB6F196; Fri, 22 Jul 2022 11:34:47 -0700 (PDT) From: Dave Marchevsky To: CC: Alexei Starovoitov , Daniel Borkmann , Andrii Nakryiko , Kernel Team , Tejun Heo , Dave Marchevsky Subject: [RFC PATCH bpf-next 02/11] bpf: Add verifier support for custom callback return range Date: Fri, 22 Jul 2022 11:34:29 -0700 Message-ID: <20220722183438.3319790-3-davemarchevsky@fb.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20220722183438.3319790-1-davemarchevsky@fb.com> References: <20220722183438.3319790-1-davemarchevsky@fb.com> MIME-Version: 1.0 X-FB-Internal: Safe X-Proofpoint-GUID: udvVe_VaQrm_ktf-Nkqrjr2mlp44skz8 X-Proofpoint-ORIG-GUID: udvVe_VaQrm_ktf-Nkqrjr2mlp44skz8 X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.883,Hydra:6.0.517,FMLib:17.11.122.1 definitions=2022-07-22_06,2022-07-21_02,2022-06-22_01 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net X-Patchwork-State: RFC Verifier logic to confirm that a callback function returns 0 or 1 was added in commit 69c087ba6225b ("bpf: Add bpf_for_each_map_elem() helper"). At the time, callback return value was only used to continue or stop iteration. In order to support callbacks with a broader return value range, such as those added further in this series, add a callback_ret_range to bpf_func_state. Verifier's helpers which set in_callback_fn will also set the new field, which the verifier will later use to check return value bounds. Default to tnum_range(0, 1) instead of using tnum_unknown as a sentinel value as the latter would prevent the valid range (0, U64_MAX) being used. Signed-off-by: Dave Marchevsky --- include/linux/bpf_verifier.h | 1 + kernel/bpf/verifier.c | 4 +++- 2 files changed, 4 insertions(+), 1 deletion(-) diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h index 2e3bad8640dc..9c017575c034 100644 --- a/include/linux/bpf_verifier.h +++ b/include/linux/bpf_verifier.h @@ -237,6 +237,7 @@ struct bpf_func_state { */ u32 async_entry_cnt; bool in_callback_fn; + struct tnum callback_ret_range; bool in_async_callback_fn; /* The following fields should be last. 
See copy_func_state() */ diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index c5fbaa9f025a..1f50becce141 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -1745,6 +1745,7 @@ static void init_func_state(struct bpf_verifier_env *env, state->callsite = callsite; state->frameno = frameno; state->subprogno = subprogno; + state->callback_ret_range = tnum_range(0, 1); init_reg_state(env, state); mark_verifier_state_scratched(env); } @@ -6880,6 +6881,7 @@ static int set_find_vma_callback_state(struct bpf_verifier_env *env, __mark_reg_not_init(env, &callee->regs[BPF_REG_4]); __mark_reg_not_init(env, &callee->regs[BPF_REG_5]); callee->in_callback_fn = true; + callee->callback_ret_range = tnum_range(0, 1); return 0; } @@ -6907,7 +6909,7 @@ static int prepare_func_exit(struct bpf_verifier_env *env, int *insn_idx) caller = state->frame[state->curframe]; if (callee->in_callback_fn) { /* enforce R0 return value range [0, 1]. */ - struct tnum range = tnum_range(0, 1); + struct tnum range = callee->callback_ret_range; if (r0->type != SCALAR_VALUE) { verbose(env, "R0 not a scalar value\n"); From patchwork Fri Jul 22 18:34:30 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dave Marchevsky X-Patchwork-Id: 12926743 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 96E10CCA47C for ; Fri, 22 Jul 2022 18:35:05 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233737AbiGVSfE (ORCPT ); Fri, 22 Jul 2022 14:35:04 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48294 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S236366AbiGVSfD (ORCPT ); Fri, 22 Jul 2022 14:35:03 -0400 Received: from mx0b-00082601.pphosted.com (mx0b-00082601.pphosted.com [67.231.153.30]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 77C6C9FE1A for ; Fri, 22 Jul 2022 11:35:02 -0700 (PDT) Received: from pps.filterd (m0109332.ppops.net [127.0.0.1]) by mx0a-00082601.pphosted.com (8.17.1.5/8.17.1.5) with ESMTP id 26MC8QaF022622 for ; Fri, 22 Jul 2022 11:35:01 -0700 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fb.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : content-type; s=facebook; bh=1y1r6kY8FMLTpTnnl89OSJNgwHtRLzSDIWIoWncIqwU=; b=ITFhu94pamM3xYoIQSICqVo/Irp96ZUCw1iPYfgiYhAq0i6OLx6y3GoGG0mcVZXJA2jP 0dovpB9oWv0wWoxgFuqukVjxz14OZ12m8GNlgI9OcX8iKDidTj9Cl+hqC6Me0jjIA+6c Ob8nSXL6xANgBwxRCFpHXKoHSfe9CmRrjpY= Received: from maileast.thefacebook.com ([163.114.130.16]) by mx0a-00082601.pphosted.com (PPS) with ESMTPS id 3hf7fc10es-4 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT) for ; Fri, 22 Jul 2022 11:35:01 -0700 Received: from twshared25107.07.ash9.facebook.com (2620:10d:c0a8:1b::d) by mail.thefacebook.com (2620:10d:c0a8:82::c) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2375.28; Fri, 22 Jul 2022 11:35:00 -0700 Received: by devbig077.ldc1.facebook.com (Postfix, from userid 158236) id D66F6AB6F199; Fri, 22 Jul 2022 11:34:47 -0700 (PDT) From: Dave Marchevsky To: CC: Alexei Starovoitov , Daniel Borkmann , Andrii Nakryiko , Kernel Team , Tejun Heo , Dave Marchevsky Subject: [RFC 
PATCH bpf-next 03/11] bpf: Add rb_node_off to bpf_map Date: Fri, 22 Jul 2022 11:34:30 -0700 Message-ID: <20220722183438.3319790-4-davemarchevsky@fb.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20220722183438.3319790-1-davemarchevsky@fb.com> References: <20220722183438.3319790-1-davemarchevsky@fb.com> MIME-Version: 1.0 X-FB-Internal: Safe X-Proofpoint-GUID: RrDwewK-_3DYQtZudeDukARh_oHwys-y X-Proofpoint-ORIG-GUID: RrDwewK-_3DYQtZudeDukARh_oHwys-y X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.883,Hydra:6.0.517,FMLib:17.11.122.1 definitions=2022-07-22_06,2022-07-21_02,2022-06-22_01 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net X-Patchwork-State: RFC Similarly to other protected fields which might be in a map value - bpf_spin_lock and bpf_timer - keep track of existence and offset of struct rb_node within map value struct. This will allow later patches in this series to prevent writes to rb_node field. Allowing bpf programs to modify the rbtree node struct's rb_node field would allow parent and children node pointers to be changed, which could corrupt an otherwise-valid rbtree. Signed-off-by: Dave Marchevsky --- include/linux/bpf.h | 6 ++++++ include/linux/btf.h | 1 + kernel/bpf/btf.c | 21 +++++++++++++++++++++ kernel/bpf/syscall.c | 3 +++ 4 files changed, 31 insertions(+) diff --git a/include/linux/bpf.h b/include/linux/bpf.h index a97751d845c9..eb8c550db0e2 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -214,6 +214,7 @@ struct bpf_map { int spin_lock_off; /* >=0 valid offset, <0 error */ struct bpf_map_value_off *kptr_off_tab; int timer_off; /* >=0 valid offset, <0 error */ + int rb_node_off; /* >=0 valid offset, <0 error */ u32 id; int numa_node; u32 btf_key_type_id; @@ -263,6 +264,11 @@ static inline bool map_value_has_kptrs(const struct bpf_map *map) return !IS_ERR_OR_NULL(map->kptr_off_tab); } +static inline bool map_value_has_rb_node(const struct bpf_map *map) +{ + return map->rb_node_off >= 0; +} + static inline void check_and_init_map_value(struct bpf_map *map, void *dst) { if (unlikely(map_value_has_spin_lock(map))) diff --git a/include/linux/btf.h b/include/linux/btf.h index cdb376d53238..1d8b1bcf0396 100644 --- a/include/linux/btf.h +++ b/include/linux/btf.h @@ -152,6 +152,7 @@ bool btf_member_is_reg_int(const struct btf *btf, const struct btf_type *s, u32 expected_offset, u32 expected_size); int btf_find_spin_lock(const struct btf *btf, const struct btf_type *t); int btf_find_timer(const struct btf *btf, const struct btf_type *t); +int btf_find_rb_node(const struct btf *btf, const struct btf_type *t); struct bpf_map_value_off *btf_parse_kptrs(const struct btf *btf, const struct btf_type *t); bool btf_type_is_void(const struct btf_type *t); diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c index 7ac971ea98d1..3a7096da7f20 100644 --- a/kernel/bpf/btf.c +++ b/kernel/bpf/btf.c @@ -3195,6 +3195,7 @@ enum btf_field_type { BTF_FIELD_SPIN_LOCK, BTF_FIELD_TIMER, BTF_FIELD_KPTR, + BTF_FIELD_RB_NODE, }; enum { @@ -3282,6 +3283,7 @@ static int btf_find_struct_field(const struct btf *btf, const struct btf_type *t switch (field_type) { case BTF_FIELD_SPIN_LOCK: case BTF_FIELD_TIMER: + case BTF_FIELD_RB_NODE: ret = btf_find_struct(btf, member_type, off, sz, idx < info_cnt ? 
&info[idx] : &tmp); if (ret < 0) @@ -3332,6 +3334,7 @@ static int btf_find_datasec_var(const struct btf *btf, const struct btf_type *t, switch (field_type) { case BTF_FIELD_SPIN_LOCK: case BTF_FIELD_TIMER: + case BTF_FIELD_RB_NODE: ret = btf_find_struct(btf, var_type, off, sz, idx < info_cnt ? &info[idx] : &tmp); if (ret < 0) @@ -3374,6 +3377,11 @@ static int btf_find_field(const struct btf *btf, const struct btf_type *t, sz = sizeof(struct bpf_timer); align = __alignof__(struct bpf_timer); break; + case BTF_FIELD_RB_NODE: + name = "rb_node"; + sz = sizeof(struct rb_node); + align = __alignof__(struct rb_node); + break; case BTF_FIELD_KPTR: name = NULL; sz = sizeof(u64); @@ -3420,6 +3428,19 @@ int btf_find_timer(const struct btf *btf, const struct btf_type *t) return info.off; } +int btf_find_rb_node(const struct btf *btf, const struct btf_type *t) +{ + struct btf_field_info info; + int ret; + + ret = btf_find_field(btf, t, BTF_FIELD_RB_NODE, &info, 1); + if (ret < 0) + return ret; + if (!ret) + return -ENOENT; + return info.off; +} + struct bpf_map_value_off *btf_parse_kptrs(const struct btf *btf, const struct btf_type *t) { diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c index 83c7136c5788..3947fbd137af 100644 --- a/kernel/bpf/syscall.c +++ b/kernel/bpf/syscall.c @@ -1052,6 +1052,8 @@ static int map_check_btf(struct bpf_map *map, const struct btf *btf, } } + map->rb_node_off = btf_find_rb_node(btf, value_type); + if (map->ops->map_check_btf) { ret = map->ops->map_check_btf(map, btf, key_type, value_type); if (ret < 0) @@ -1115,6 +1117,7 @@ static int map_create(union bpf_attr *attr) map->spin_lock_off = -EINVAL; map->timer_off = -EINVAL; + map->rb_node_off = -EINVAL; if (attr->btf_key_type_id || attr->btf_value_type_id || /* Even the map's value is a kernel's struct, * the bpf_prog.o must have BTF to begin with From patchwork Fri Jul 22 18:34:31 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dave Marchevsky X-Patchwork-Id: 12926746 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id AED0BC43334 for ; Fri, 22 Jul 2022 18:35:07 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235780AbiGVSfF (ORCPT ); Fri, 22 Jul 2022 14:35:05 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48310 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S236337AbiGVSfE (ORCPT ); Fri, 22 Jul 2022 14:35:04 -0400 Received: from mx0b-00082601.pphosted.com (mx0b-00082601.pphosted.com [67.231.153.30]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 21DC981485 for ; Fri, 22 Jul 2022 11:35:02 -0700 (PDT) Received: from pps.filterd (m0148460.ppops.net [127.0.0.1]) by mx0a-00082601.pphosted.com (8.17.1.5/8.17.1.5) with ESMTP id 26MAR3Kr022379 for ; Fri, 22 Jul 2022 11:35:01 -0700 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fb.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : content-type; s=facebook; bh=w+9TPSsRaIPSTDNSwxbhwzSK1J9ruCyz0N0/4MzdIwc=; b=nMP2mnJEOwsIR5gCzfHukjWahhbZygR8ivD813II3Mi85LkJUYgFnHUKIH8BoiNqUYy2 AySYgssfq3//kuOYJ7PfujXaplL28UX3TnhDuyVNyP40eGm8hZEah2uWsnmX+RMY6ARe aYUY+L9Kg83ASWiFyMue67Z8cNgVQ2RCYss= Received: from 
maileast.thefacebook.com ([163.114.130.16]) by mx0a-00082601.pphosted.com (PPS) with ESMTPS id 3hes8vmye8-2 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT) for ; Fri, 22 Jul 2022 11:35:01 -0700 Received: from twshared22934.08.ash9.facebook.com (2620:10d:c0a8:1b::d) by mail.thefacebook.com (2620:10d:c0a8:83::7) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2375.28; Fri, 22 Jul 2022 11:34:59 -0700 Received: by devbig077.ldc1.facebook.com (Postfix, from userid 158236) id 668F6AB6F19C; Fri, 22 Jul 2022 11:34:48 -0700 (PDT) From: Dave Marchevsky To: CC: Alexei Starovoitov , Daniel Borkmann , Andrii Nakryiko , Kernel Team , Tejun Heo , Dave Marchevsky Subject: [RFC PATCH bpf-next 04/11] bpf: Add rbtree map Date: Fri, 22 Jul 2022 11:34:31 -0700 Message-ID: <20220722183438.3319790-5-davemarchevsky@fb.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20220722183438.3319790-1-davemarchevsky@fb.com> References: <20220722183438.3319790-1-davemarchevsky@fb.com> MIME-Version: 1.0 X-FB-Internal: Safe X-Proofpoint-GUID: Mx618AHTMT2JEBY_yz-SZjvca37MgMAk X-Proofpoint-ORIG-GUID: Mx618AHTMT2JEBY_yz-SZjvca37MgMAk X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.883,Hydra:6.0.517,FMLib:17.11.122.1 definitions=2022-07-22_06,2022-07-21_02,2022-06-22_01 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net X-Patchwork-State: RFC Introduce a new map type, bpf_rbtree_map, allowing BPF programs to create and manipulate rbtree data structures. bpf_rbtree_map differs from 'classic' BPF map patterns in a few important ways: * The map does not allocate its own elements. Instead, BPF programs must call bpf_rbtree_alloc_node helper to allocate and bpf_rbtree_add to add map elem - referred to as 'node' from now on - to the rbtree. This means that rbtree maps can grow dynamically, do not preallocate, and that 'max_entries' has no meaning for rbtree maps. * Separating allocation and insertion allows alloc_node call to occur in contexts where it's safe to allocate. * It's possible to remove a node from a rbtree map with bpf_rbtree_remove helper. * Goal here is to allow a node to be removed from one rbtree and added to another [ NOTE: This functionality is still in progress ] Helpers are added to manipulate nodes and trees: * bpf_rbtree_{alloc,free}_node: Allocate / free node structs * bpf_rbtree_{add,remove}: Add / remove nodes from rbtree maps * A comparator function is passed to bpf_rbtree_add in order to find the correct place to add the node. * bpf_rbtree_find: Find a node matching some condition in the rbtree * A comparator function is passed in order to determine whether a node matches what's being searched for. bpf_rbtree_add is very similar to the 'map_push_elem' builtin, but since verifier needs special logic to setup the comparator callback a new helper is added. Same for bpf_rbtree_find and 'map_lookup_elem' builtin. In order to safely support separate allocation / insertion and passing nodes between rbtrees, some invariants must hold: * A node that is not in a rbtree map must either be free'd or added to a rbtree map before the program terminates * Nodes are in this state when returned from bpf_rbtree_alloc_node or bpf_rbtree_remove. If a node is in a rbtree map it is 'owned' by the map, otherwise it is owned by the BPF program which holds a reference to it. Owner is responsible for the lifetime of the node. 
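
[ Editor's sketch of the intended BPF-program-side flow, to make the ownership rules concrete. Only the helper names come from this patch; the node_data struct, the less() callback, the map definition, and the program section are hypothetical, and the helper declarations are assumed to be regenerated from this patch's UAPI additions. ]

/* Value type for the rbtree map; struct rb_node must sit at offset 0. */
struct node_data {
        struct rb_node node;
        __u32 key;
};

struct {
        __uint(type, BPF_MAP_TYPE_RBTREE);
        __type(value, struct node_data);
} rbtree SEC(".maps");

/* Comparator passed to bpf_rbtree_add(): true if a sorts before b. */
static bool less(struct rb_node *a, const struct rb_node *b)
{
        return ((struct node_data *)a)->key < ((struct node_data *)b)->key;
}

SEC("tc")
int rbtree_add_example(struct __sk_buff *ctx)
{
        struct node_data *n;

        /* acquire: the program owns the node and must add or free it
         * before it terminates
         */
        n = bpf_rbtree_alloc_node(&rbtree, sizeof(struct node_data));
        if (!n)
                return 0;
        n->key = 42;

        /* release: ownership transfers to the map. bpf_rbtree_remove()
         * would later re-acquire the node, after which it must again be
         * released via bpf_rbtree_free_node() or another bpf_rbtree_add().
         */
        bpf_rbtree_add(&rbtree, n, (void *)less);
        return 0;
}

char _license[] SEC("license") = "GPL";
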
This matches existing acquire / release semantics well. node_alloc and remove operations 'acquire' a node while add and node_free operations 'release' the node. The verifier enforces that acquired nodes are released before program terminates. Some other implementation details: * The value type of an rbtree map is expected to be a struct which contains 'struct rb_node' at offset 0. * BPF programs may not modify the node struct's rb_node field. * Otherwise the tree could be left in corrupted state * Rbtree map is value-only. Keys have no meaning * Since the value type is not known until the rbtree map is instantiated, helper protos have input and return type 'struct rb_node *' which verifier replaces with actual map value type. * [ TODO: Existing logic prevents any writes to PTR_TO_BTF_ID. This broadly turned off in this patch and replaced with "no writes to struct rb_node is PTR_TO_BTF_ID struct has one". This is a hack and needs to be replaced. ] Signed-off-by: Dave Marchevsky --- include/linux/bpf_types.h | 1 + include/uapi/linux/bpf.h | 62 ++++++++ kernel/bpf/Makefile | 2 +- kernel/bpf/helpers.c | 15 ++ kernel/bpf/rbtree.c | 256 +++++++++++++++++++++++++++++++++ kernel/bpf/verifier.c | 118 ++++++++++++++- tools/include/uapi/linux/bpf.h | 62 ++++++++ 7 files changed, 511 insertions(+), 5 deletions(-) create mode 100644 kernel/bpf/rbtree.c diff --git a/include/linux/bpf_types.h b/include/linux/bpf_types.h index 2b9112b80171..78e9b5253983 100644 --- a/include/linux/bpf_types.h +++ b/include/linux/bpf_types.h @@ -126,6 +126,7 @@ BPF_MAP_TYPE(BPF_MAP_TYPE_STRUCT_OPS, bpf_struct_ops_map_ops) #endif BPF_MAP_TYPE(BPF_MAP_TYPE_RINGBUF, ringbuf_map_ops) BPF_MAP_TYPE(BPF_MAP_TYPE_BLOOM_FILTER, bloom_filter_map_ops) +BPF_MAP_TYPE(BPF_MAP_TYPE_RBTREE, rbtree_map_ops) BPF_LINK_TYPE(BPF_LINK_TYPE_RAW_TRACEPOINT, raw_tracepoint) BPF_LINK_TYPE(BPF_LINK_TYPE_TRACING, tracing) diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index ffcbf79a556b..4688ce88caf4 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -909,6 +909,7 @@ enum bpf_map_type { BPF_MAP_TYPE_INODE_STORAGE, BPF_MAP_TYPE_TASK_STORAGE, BPF_MAP_TYPE_BLOOM_FILTER, + BPF_MAP_TYPE_RBTREE, }; /* Note that tracing related programs such as @@ -5328,6 +5329,62 @@ union bpf_attr { * **-EACCES** if the SYN cookie is not valid. * * **-EPROTONOSUPPORT** if CONFIG_IPV6 is not builtin. + * + * void *bpf_rbtree_alloc_node(struct bpf_map *map, u32 sz) + * Description + * Allocate a node of size *sz* bytes for use in rbtree *map*. + * + * *sz* must be >= sizeof(struct rb_node) + * Return + * A pointer to the allocated node if successful, otherwise NULL. + * + * The verifier considers the type of the returned pointer to be + * the BTF id of *map*'s value type. + * + * void *bpf_rbtree_add(struct bpf_map *map, void *node, void *cb) + * Description + * Add *node* to rbtree *map* using *cb* comparator callback to + * walk the rbtree. + * + * *cb* must take (struct rb_node \*, const struct rb_node \*) as + * input and return a bool signifying whether the first rb_node's + * struct is less than the second. + * + * Return + * If success, returns a pointer to the added node in the rbtree. + * + * If add fails, returns NULL + * + * long bpf_rbtree_find(struct bpf_map *map, void *key, void *cb) + * Description + * Find *key* in rbtree *map* using *cb* comparator callback to walk the + * rbtree. + * + * *cb* must take (const void \*key, const struct rb_node \*n) as + * input and return an int. 
If *cb* determines *n* to match *key*, *cb* must + * return 0. If larger, a positive int, and a negative int if smaller. + * + * *key* does not need to be a rbtree node struct. + * + * Return + * Ptr to rbtree node if found, otherwise NULL. + * + * void *bpf_rbtree_remove(struct bpf_map *map, void *elem) + * Description + * Remove *elem* from rbtree *map*. + * + * Return + * If success, returns a pointer to the removed node. + * + * If remove fails, returns NULL + * + * long bpf_rbtree_free_node(struct bpf_map *map, void *elem) + * Description + * Free rb_node that isn't associated w/ a tree. + * + * Return + * 0 + * */ #define __BPF_FUNC_MAPPER(FN) \ FN(unspec), \ @@ -5538,6 +5595,11 @@ union bpf_attr { FN(tcp_raw_gen_syncookie_ipv6), \ FN(tcp_raw_check_syncookie_ipv4), \ FN(tcp_raw_check_syncookie_ipv6), \ + FN(rbtree_alloc_node), \ + FN(rbtree_add), \ + FN(rbtree_find), \ + FN(rbtree_remove), \ + FN(rbtree_free_node), \ /* */ /* integer value in 'imm' field of BPF_CALL instruction selects which helper diff --git a/kernel/bpf/Makefile b/kernel/bpf/Makefile index 057ba8e01e70..00eedab3ad53 100644 --- a/kernel/bpf/Makefile +++ b/kernel/bpf/Makefile @@ -7,7 +7,7 @@ endif CFLAGS_core.o += $(call cc-disable-warning, override-init) $(cflags-nogcse-yy) obj-$(CONFIG_BPF_SYSCALL) += syscall.o verifier.o inode.o helpers.o tnum.o bpf_iter.o map_iter.o task_iter.o prog_iter.o link_iter.o -obj-$(CONFIG_BPF_SYSCALL) += hashtab.o arraymap.o percpu_freelist.o bpf_lru_list.o lpm_trie.o map_in_map.o bloom_filter.o +obj-$(CONFIG_BPF_SYSCALL) += hashtab.o arraymap.o percpu_freelist.o bpf_lru_list.o lpm_trie.o map_in_map.o bloom_filter.o rbtree.o obj-$(CONFIG_BPF_SYSCALL) += local_storage.o queue_stack_maps.o ringbuf.o obj-$(CONFIG_BPF_SYSCALL) += bpf_local_storage.o bpf_task_storage.o obj-${CONFIG_BPF_LSM} += bpf_inode_storage.o diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c index a1c84d256f83..35eb66d11bf6 100644 --- a/kernel/bpf/helpers.c +++ b/kernel/bpf/helpers.c @@ -1582,6 +1582,11 @@ const struct bpf_func_proto bpf_probe_read_user_str_proto __weak; const struct bpf_func_proto bpf_probe_read_kernel_proto __weak; const struct bpf_func_proto bpf_probe_read_kernel_str_proto __weak; const struct bpf_func_proto bpf_task_pt_regs_proto __weak; +const struct bpf_func_proto bpf_rbtree_alloc_node_proto __weak; +const struct bpf_func_proto bpf_rbtree_add_proto __weak; +const struct bpf_func_proto bpf_rbtree_find_proto __weak; +const struct bpf_func_proto bpf_rbtree_remove_proto __weak; +const struct bpf_func_proto bpf_rbtree_free_node_proto __weak; const struct bpf_func_proto * bpf_base_func_proto(enum bpf_func_id func_id) @@ -1671,6 +1676,16 @@ bpf_base_func_proto(enum bpf_func_id func_id) return &bpf_timer_cancel_proto; case BPF_FUNC_kptr_xchg: return &bpf_kptr_xchg_proto; + case BPF_FUNC_rbtree_alloc_node: + return &bpf_rbtree_alloc_node_proto; + case BPF_FUNC_rbtree_add: + return &bpf_rbtree_add_proto; + case BPF_FUNC_rbtree_find: + return &bpf_rbtree_find_proto; + case BPF_FUNC_rbtree_remove: + return &bpf_rbtree_remove_proto; + case BPF_FUNC_rbtree_free_node: + return &bpf_rbtree_free_node_proto; default: break; } diff --git a/kernel/bpf/rbtree.c b/kernel/bpf/rbtree.c new file mode 100644 index 000000000000..250d62210804 --- /dev/null +++ b/kernel/bpf/rbtree.c @@ -0,0 +1,256 @@ +// SPDX-License-Identifier: GPL-2.0 +/* Copyright (c) 2022 Meta Platforms, Inc. and affiliates. 
*/ + +#include +#include +#include +#include + +struct bpf_rbtree { + struct bpf_map map; + struct rb_root_cached root; +}; + +BTF_ID_LIST_SINGLE(bpf_rbtree_btf_ids, struct, rb_node); + +static int rbtree_map_alloc_check(union bpf_attr *attr) +{ + if (attr->max_entries || !attr->btf_value_type_id) + return -EINVAL; + + return 0; +} + +static struct bpf_map *rbtree_map_alloc(union bpf_attr *attr) +{ + struct bpf_rbtree *tree; + int numa_node; + + if (!bpf_capable()) + return ERR_PTR(-EPERM); + + if (attr->value_size == 0) + return ERR_PTR(-EINVAL); + + numa_node = bpf_map_attr_numa_node(attr); + tree = bpf_map_area_alloc(sizeof(*tree), numa_node); + if (!tree) + return ERR_PTR(-ENOMEM); + + tree->root = RB_ROOT_CACHED; + bpf_map_init_from_attr(&tree->map, attr); + return &tree->map; +} + +static struct rb_node *rbtree_map_alloc_node(struct bpf_map *map, size_t sz) +{ + struct rb_node *node; + + node = bpf_map_kmalloc_node(map, sz, GFP_KERNEL, map->numa_node); + if (!node) + return NULL; + RB_CLEAR_NODE(node); + return node; +} + +BPF_CALL_2(bpf_rbtree_alloc_node, struct bpf_map *, map, u32, sz) +{ + struct rb_node *node; + + if (map->map_type != BPF_MAP_TYPE_RBTREE) + return (u64)NULL; + + if (sz < sizeof(*node)) + return (u64)NULL; + + node = rbtree_map_alloc_node(map, (size_t)sz); + if (!node) + return (u64)NULL; + + return (u64)node; +} + +const struct bpf_func_proto bpf_rbtree_alloc_node_proto = { + .func = bpf_rbtree_alloc_node, + .gpl_only = true, + .ret_type = RET_PTR_TO_BTF_ID_OR_NULL, + .ret_btf_id = &bpf_rbtree_btf_ids[0], + .arg1_type = ARG_CONST_MAP_PTR, + .arg2_type = ARG_CONST_ALLOC_SIZE_OR_ZERO, +}; + +BPF_CALL_3(bpf_rbtree_add, struct bpf_map *, map, void *, value, void *, cb) +{ + struct bpf_rbtree *tree = container_of(map, struct bpf_rbtree, map); + struct rb_node *node = (struct rb_node *)value; + + if (WARN_ON_ONCE(!RB_EMPTY_NODE(node))) + return (u64)NULL; + + rb_add_cached(node, &tree->root, (bool (*)(struct rb_node *, const struct rb_node *))cb); + return (u64)node; +} + +const struct bpf_func_proto bpf_rbtree_add_proto = { + .func = bpf_rbtree_add, + .gpl_only = true, + .ret_type = RET_PTR_TO_BTF_ID_OR_NULL, + .arg1_type = ARG_CONST_MAP_PTR, + .arg2_type = ARG_PTR_TO_BTF_ID | OBJ_RELEASE, + .arg2_btf_id = &bpf_rbtree_btf_ids[0], + .arg3_type = ARG_PTR_TO_FUNC, +}; + +BPF_CALL_3(bpf_rbtree_find, struct bpf_map *, map, void *, key, void *, cb) +{ + struct bpf_rbtree *tree = container_of(map, struct bpf_rbtree, map); + + return (u64)rb_find(key, &tree->root.rb_root, + (int (*)(const void *key, + const struct rb_node *))cb); +} + +const struct bpf_func_proto bpf_rbtree_find_proto = { + .func = bpf_rbtree_find, + .gpl_only = true, + .ret_type = RET_PTR_TO_BTF_ID_OR_NULL, + .ret_btf_id = &bpf_rbtree_btf_ids[0], + .arg1_type = ARG_CONST_MAP_PTR, + .arg2_type = ARG_ANYTHING, + .arg3_type = ARG_PTR_TO_FUNC, +}; + +/* Like rbtree_postorder_for_each_entry_safe, but 'pos' and 'n' are + * 'rb_node *', so field name of rb_node within containing struct is not + * needed. 
+ * + * Since bpf_rb_tree's node always has 'struct rb_node' at offset 0 it's + * not necessary to know field name or type of node struct + */ +#define bpf_rbtree_postorder_for_each_entry_safe(pos, n, root) \ + for (pos = rb_first_postorder(root); \ + pos && ({ n = rb_next_postorder(pos); 1; }); \ + pos = n) + +static void rbtree_map_free(struct bpf_map *map) +{ + struct rb_node *pos, *n; + struct bpf_rbtree *tree = container_of(map, struct bpf_rbtree, map); + + bpf_rbtree_postorder_for_each_entry_safe(pos, n, &tree->root.rb_root) + kfree(pos); + bpf_map_area_free(tree); +} + +static int rbtree_map_check_btf(const struct bpf_map *map, + const struct btf *btf, + const struct btf_type *key_type, + const struct btf_type *value_type) +{ + if (!map_value_has_rb_node(map)) + return -EINVAL; + + if (map->rb_node_off > 0) + return -EINVAL; + + return 0; +} + +static int rbtree_map_push_elem(struct bpf_map *map, void *value, u64 flags) +{ + /* Use bpf_rbtree_add helper instead + */ + return -ENOTSUPP; +} + +static int rbtree_map_pop_elem(struct bpf_map *map, void *value) +{ + return -ENOTSUPP; +} + +static int rbtree_map_peek_elem(struct bpf_map *map, void *value) +{ + return -ENOTSUPP; +} + +static void *rbtree_map_lookup_elem(struct bpf_map *map, void *value) +{ + /* Use bpf_rbtree_find helper instead + */ + return ERR_PTR(-ENOTSUPP); +} + +static int rbtree_map_update_elem(struct bpf_map *map, void *key, void *value, + u64 flags) +{ + return -ENOTSUPP; +} + +static int rbtree_map_delete_elem(struct bpf_map *map, void *value) +{ + return -ENOTSUPP; +} + +BPF_CALL_2(bpf_rbtree_remove, struct bpf_map *, map, void *, value) +{ + struct bpf_rbtree *tree = container_of(map, struct bpf_rbtree, map); + struct rb_node *node = (struct rb_node *)value; + + if (WARN_ON_ONCE(RB_EMPTY_NODE(node))) + return (u64)NULL; + + rb_erase_cached(node, &tree->root); + RB_CLEAR_NODE(node); + return (u64)node; +} + +const struct bpf_func_proto bpf_rbtree_remove_proto = { + .func = bpf_rbtree_remove, + .gpl_only = true, + .ret_type = RET_PTR_TO_BTF_ID_OR_NULL, + .ret_btf_id = &bpf_rbtree_btf_ids[0], + .arg1_type = ARG_CONST_MAP_PTR, + .arg2_type = ARG_PTR_TO_BTF_ID, + .arg2_btf_id = &bpf_rbtree_btf_ids[0], +}; + +BPF_CALL_2(bpf_rbtree_free_node, struct bpf_map *, map, void *, value) +{ + struct rb_node *node = (struct rb_node *)value; + + WARN_ON_ONCE(!RB_EMPTY_NODE(node)); + kfree(node); + return 0; +} + +const struct bpf_func_proto bpf_rbtree_free_node_proto = { + .func = bpf_rbtree_free_node, + .gpl_only = true, + .ret_type = RET_INTEGER, + .arg1_type = ARG_CONST_MAP_PTR, + .arg2_type = ARG_PTR_TO_BTF_ID | OBJ_RELEASE, + .arg2_btf_id = &bpf_rbtree_btf_ids[0], +}; + +static int rbtree_map_get_next_key(struct bpf_map *map, void *key, + void *next_key) +{ + return -ENOTSUPP; +} + +BTF_ID_LIST_SINGLE(bpf_rbtree_map_btf_ids, struct, bpf_rbtree) +const struct bpf_map_ops rbtree_map_ops = { + .map_meta_equal = bpf_map_meta_equal, + .map_alloc_check = rbtree_map_alloc_check, + .map_alloc = rbtree_map_alloc, + .map_free = rbtree_map_free, + .map_get_next_key = rbtree_map_get_next_key, + .map_push_elem = rbtree_map_push_elem, + .map_peek_elem = rbtree_map_peek_elem, + .map_pop_elem = rbtree_map_pop_elem, + .map_lookup_elem = rbtree_map_lookup_elem, + .map_update_elem = rbtree_map_update_elem, + .map_delete_elem = rbtree_map_delete_elem, + .map_check_btf = rbtree_map_check_btf, + .map_btf_id = &bpf_rbtree_map_btf_ids[0], +}; diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index 1f50becce141..535f673882cd 100644 --- 
a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -481,7 +481,9 @@ static bool is_acquire_function(enum bpf_func_id func_id, func_id == BPF_FUNC_sk_lookup_udp || func_id == BPF_FUNC_skc_lookup_tcp || func_id == BPF_FUNC_ringbuf_reserve || - func_id == BPF_FUNC_kptr_xchg) + func_id == BPF_FUNC_kptr_xchg || + func_id == BPF_FUNC_rbtree_alloc_node || + func_id == BPF_FUNC_rbtree_remove) return true; if (func_id == BPF_FUNC_map_lookup_elem && @@ -531,6 +533,20 @@ static bool is_cmpxchg_insn(const struct bpf_insn *insn) insn->imm == BPF_CMPXCHG; } +static bool function_manipulates_rbtree_node(enum bpf_func_id func_id) +{ + return func_id == BPF_FUNC_rbtree_add || + func_id == BPF_FUNC_rbtree_remove || + func_id == BPF_FUNC_rbtree_free_node; +} + +static bool function_returns_rbtree_node(enum bpf_func_id func_id) +{ + return func_id == BPF_FUNC_rbtree_alloc_node || + func_id == BPF_FUNC_rbtree_add || + func_id == BPF_FUNC_rbtree_remove; +} + /* string representation of 'enum bpf_reg_type' * * Note that reg_type_str() can not appear more than once in a single verbose() @@ -3784,6 +3800,13 @@ static int check_map_kptr_access(struct bpf_verifier_env *env, u32 regno, return 0; } +static bool access_may_touch_field(u32 access_off, size_t access_sz, + u32 field_off, size_t field_sz) +{ + return access_off < field_off + field_sz && + field_off < access_off + access_sz; +} + /* if any part of struct field can be touched by * load/store reject this program. * To check that [x1, x2) overlaps with [y1, y2) @@ -4490,7 +4513,7 @@ static int check_ptr_to_btf_access(struct bpf_verifier_env *env, const char *tname = btf_name_by_offset(reg->btf, t->name_off); enum bpf_type_flag flag = 0; u32 btf_id; - int ret; + int ret, rb_node_off; if (off < 0) { verbose(env, @@ -4527,8 +4550,13 @@ static int check_ptr_to_btf_access(struct bpf_verifier_env *env, off, size, atype, &btf_id, &flag); } else { if (atype != BPF_READ) { - verbose(env, "only read is supported\n"); - return -EACCES; + rb_node_off = btf_find_rb_node(reg->btf, t); + if (rb_node_off < 0 || + access_may_touch_field(off, size, rb_node_off, + sizeof(struct rb_node))) { + verbose(env, "only read is supported\n"); + return -EACCES; + } } ret = btf_struct_access(&env->log, reg->btf, t, off, size, @@ -5764,6 +5792,17 @@ static int check_reg_type(struct bpf_verifier_env *env, u32 regno, if (meta->func_id == BPF_FUNC_kptr_xchg) { if (map_kptr_match_type(env, meta->kptr_off_desc, reg, regno)) return -EACCES; + } else if (function_manipulates_rbtree_node(meta->func_id)) { + if (!btf_struct_ids_match(&env->log, reg->btf, reg->btf_id, reg->off, + meta->map_ptr->btf, + meta->map_ptr->btf_value_type_id, + strict_type_match)) { + verbose(env, "rbtree: R%d is of type %s but %s is expected\n", + regno, kernel_type_name(reg->btf, reg->btf_id), + kernel_type_name(meta->map_ptr->btf, + meta->map_ptr->btf_value_type_id)); + return -EACCES; + } } else if (!btf_struct_ids_match(&env->log, reg->btf, reg->btf_id, reg->off, btf_vmlinux, *arg_btf_id, strict_type_match)) { @@ -6369,10 +6408,17 @@ static int check_map_func_compatibility(struct bpf_verifier_env *env, break; case BPF_FUNC_map_pop_elem: if (map->map_type != BPF_MAP_TYPE_QUEUE && + map->map_type != BPF_MAP_TYPE_RBTREE && map->map_type != BPF_MAP_TYPE_STACK) goto error; break; case BPF_FUNC_map_peek_elem: + if (map->map_type != BPF_MAP_TYPE_QUEUE && + map->map_type != BPF_MAP_TYPE_STACK && + map->map_type != BPF_MAP_TYPE_RBTREE && + map->map_type != BPF_MAP_TYPE_BLOOM_FILTER) + goto error; + break; case 
BPF_FUNC_map_push_elem: if (map->map_type != BPF_MAP_TYPE_QUEUE && map->map_type != BPF_MAP_TYPE_STACK && @@ -6828,6 +6874,57 @@ static int set_loop_callback_state(struct bpf_verifier_env *env, return 0; } +static int set_rbtree_add_callback_state(struct bpf_verifier_env *env, + struct bpf_func_state *caller, + struct bpf_func_state *callee, + int insn_idx) +{ + struct bpf_map *map_ptr = caller->regs[BPF_REG_1].map_ptr; + + /* bpf_rbtree_add(struct bpf_map *map, void *value, void *cb) + * cb(struct rb_node *a, const struct rb_node *b); + */ + callee->regs[BPF_REG_1].type = PTR_TO_MAP_VALUE; + __mark_reg_known_zero(&callee->regs[BPF_REG_1]); + callee->regs[BPF_REG_1].map_ptr = map_ptr; + + callee->regs[BPF_REG_2].type = PTR_TO_MAP_VALUE; + __mark_reg_known_zero(&callee->regs[BPF_REG_2]); + callee->regs[BPF_REG_2].map_ptr = map_ptr; + + __mark_reg_not_init(env, &callee->regs[BPF_REG_3]); + __mark_reg_not_init(env, &callee->regs[BPF_REG_4]); + __mark_reg_not_init(env, &callee->regs[BPF_REG_5]); + callee->in_callback_fn = true; + return 0; +} + +static int set_rbtree_find_callback_state(struct bpf_verifier_env *env, + struct bpf_func_state *caller, + struct bpf_func_state *callee, + int insn_idx) +{ + struct bpf_map *map_ptr = caller->regs[BPF_REG_1].map_ptr; + + /* bpf_rbtree_find(struct bpf_map *map, void *key, void *cb) + * cb(void *key, const struct rb_node *b); + */ + callee->regs[BPF_REG_1].type = PTR_TO_MAP_VALUE; + __mark_reg_known_zero(&callee->regs[BPF_REG_1]); + callee->regs[BPF_REG_1].map_ptr = map_ptr; + + callee->regs[BPF_REG_2].type = PTR_TO_MAP_VALUE; + __mark_reg_known_zero(&callee->regs[BPF_REG_2]); + callee->regs[BPF_REG_2].map_ptr = map_ptr; + + __mark_reg_not_init(env, &callee->regs[BPF_REG_3]); + __mark_reg_not_init(env, &callee->regs[BPF_REG_4]); + __mark_reg_not_init(env, &callee->regs[BPF_REG_5]); + callee->in_callback_fn = true; + callee->callback_ret_range = tnum_range(0, U64_MAX); + return 0; +} + static int set_timer_callback_state(struct bpf_verifier_env *env, struct bpf_func_state *caller, struct bpf_func_state *callee, @@ -7310,6 +7407,14 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn err = __check_func_call(env, insn, insn_idx_p, meta.subprogno, set_loop_callback_state); break; + case BPF_FUNC_rbtree_add: + err = __check_func_call(env, insn, insn_idx_p, meta.subprogno, + set_rbtree_add_callback_state); + break; + case BPF_FUNC_rbtree_find: + err = __check_func_call(env, insn, insn_idx_p, meta.subprogno, + set_rbtree_find_callback_state); + break; case BPF_FUNC_dynptr_from_mem: if (regs[BPF_REG_1].type != PTR_TO_MAP_VALUE) { verbose(env, "Unsupported reg type %s for bpf_dynptr_from_mem data\n", @@ -7424,6 +7529,9 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn if (func_id == BPF_FUNC_kptr_xchg) { ret_btf = meta.kptr_off_desc->kptr.btf; ret_btf_id = meta.kptr_off_desc->kptr.btf_id; + } else if (function_returns_rbtree_node(func_id)) { + ret_btf = meta.map_ptr->btf; + ret_btf_id = meta.map_ptr->btf_value_type_id; } else { ret_btf = btf_vmlinux; ret_btf_id = *fn->ret_btf_id; @@ -13462,8 +13570,10 @@ static int convert_ctx_accesses(struct bpf_verifier_env *env) BPF_SIZE((insn)->code); env->prog->aux->num_exentries++; } else if (resolve_prog_type(env->prog) != BPF_PROG_TYPE_STRUCT_OPS) { + /*TODO: Not sure what to do here verbose(env, "Writes through BTF pointers are not allowed\n"); return -EINVAL; + */ } continue; default: diff --git a/tools/include/uapi/linux/bpf.h 
b/tools/include/uapi/linux/bpf.h index ffcbf79a556b..4688ce88caf4 100644 --- a/tools/include/uapi/linux/bpf.h +++ b/tools/include/uapi/linux/bpf.h @@ -909,6 +909,7 @@ enum bpf_map_type { BPF_MAP_TYPE_INODE_STORAGE, BPF_MAP_TYPE_TASK_STORAGE, BPF_MAP_TYPE_BLOOM_FILTER, + BPF_MAP_TYPE_RBTREE, }; /* Note that tracing related programs such as @@ -5328,6 +5329,62 @@ union bpf_attr { * **-EACCES** if the SYN cookie is not valid. * * **-EPROTONOSUPPORT** if CONFIG_IPV6 is not builtin. + * + * void *bpf_rbtree_alloc_node(struct bpf_map *map, u32 sz) + * Description + * Allocate a node of size *sz* bytes for use in rbtree *map*. + * + * *sz* must be >= sizeof(struct rb_node) + * Return + * A pointer to the allocated node if successful, otherwise NULL. + * + * The verifier considers the type of the returned pointer to be + * the BTF id of *map*'s value type. + * + * void *bpf_rbtree_add(struct bpf_map *map, void *node, void *cb) + * Description + * Add *node* to rbtree *map* using *cb* comparator callback to + * walk the rbtree. + * + * *cb* must take (struct rb_node \*, const struct rb_node \*) as + * input and return a bool signifying whether the first rb_node's + * struct is less than the second. + * + * Return + * If success, returns a pointer to the added node in the rbtree. + * + * If add fails, returns NULL + * + * long bpf_rbtree_find(struct bpf_map *map, void *key, void *cb) + * Description + * Find *key* in rbtree *map* using *cb* comparator callback to walk the + * rbtree. + * + * *cb* must take (const void \*key, const struct rb_node \*n) as + * input and return an int. If *cb* determines *n* to match *key*, *cb* must + * return 0. If larger, a positive int, and a negative int if smaller. + * + * *key* does not need to be a rbtree node struct. + * + * Return + * Ptr to rbtree node if found, otherwise NULL. + * + * void *bpf_rbtree_remove(struct bpf_map *map, void *elem) + * Description + * Remove *elem* from rbtree *map*. + * + * Return + * If success, returns a pointer to the removed node. + * + * If remove fails, returns NULL + * + * long bpf_rbtree_free_node(struct bpf_map *map, void *elem) + * Description + * Free rb_node that isn't associated w/ a tree. 
+ * + * Return + * 0 + * */ #define __BPF_FUNC_MAPPER(FN) \ FN(unspec), \ @@ -5538,6 +5595,11 @@ union bpf_attr { FN(tcp_raw_gen_syncookie_ipv6), \ FN(tcp_raw_check_syncookie_ipv4), \ FN(tcp_raw_check_syncookie_ipv6), \ + FN(rbtree_alloc_node), \ + FN(rbtree_add), \ + FN(rbtree_find), \ + FN(rbtree_remove), \ + FN(rbtree_free_node), \ /* */ /* integer value in 'imm' field of BPF_CALL instruction selects which helper From patchwork Fri Jul 22 18:34:32 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dave Marchevsky X-Patchwork-Id: 12926740 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 4D383C433EF for ; Fri, 22 Jul 2022 18:35:01 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S236320AbiGVSfA (ORCPT ); Fri, 22 Jul 2022 14:35:00 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48188 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233737AbiGVSe6 (ORCPT ); Fri, 22 Jul 2022 14:34:58 -0400 Received: from mx0a-00082601.pphosted.com (mx0b-00082601.pphosted.com [67.231.153.30]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 6866E9F046 for ; Fri, 22 Jul 2022 11:34:57 -0700 (PDT) Received: from pps.filterd (m0001303.ppops.net [127.0.0.1]) by m0001303.ppops.net (8.17.1.5/8.17.1.5) with ESMTP id 26MHojlG021276 for ; Fri, 22 Jul 2022 11:34:56 -0700 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fb.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : content-type; s=facebook; bh=M6+bsJAOH/HHolRF0OT3Ro2hADTYCMnHs8b1X75Oi/I=; b=RT0RypMVK5u7EcFBWJdwD9g/wDj6h5J5GST1RQUIBDZdtgy90dbcVfvh3rWgFF9xpDlL BBRbbrVePKWTtV1Ea3ZdGirOzrRBgpiVp9nRPDokz8XYwM1nIk1Omn7NeJhmC6GNj3zN P+R98fpo/Hy1h33WY/luX1AGAjyVGHiyS+Q= Received: from mail.thefacebook.com ([163.114.132.120]) by m0001303.ppops.net (PPS) with ESMTPS id 3hg0n708ym-5 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT) for ; Fri, 22 Jul 2022 11:34:56 -0700 Received: from twshared6324.05.ash7.facebook.com (2620:10d:c085:108::8) by mail.thefacebook.com (2620:10d:c085:11d::5) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2375.28; Fri, 22 Jul 2022 11:34:53 -0700 Received: by devbig077.ldc1.facebook.com (Postfix, from userid 158236) id 00C5DAB6F19E; Fri, 22 Jul 2022 11:34:48 -0700 (PDT) From: Dave Marchevsky To: CC: Alexei Starovoitov , Daniel Borkmann , Andrii Nakryiko , Kernel Team , Tejun Heo , Dave Marchevsky Subject: [RFC PATCH bpf-next 05/11] bpf: Add bpf_spin_lock member to rbtree Date: Fri, 22 Jul 2022 11:34:32 -0700 Message-ID: <20220722183438.3319790-6-davemarchevsky@fb.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20220722183438.3319790-1-davemarchevsky@fb.com> References: <20220722183438.3319790-1-davemarchevsky@fb.com> MIME-Version: 1.0 X-FB-Internal: Safe X-Proofpoint-ORIG-GUID: ZJTQRiz7jNL5l7jbM8XuwWKOhNqOArMd X-Proofpoint-GUID: ZJTQRiz7jNL5l7jbM8XuwWKOhNqOArMd X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.883,Hydra:6.0.517,FMLib:17.11.122.1 definitions=2022-07-22_06,2022-07-21_02,2022-06-22_01 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: 
bpf@iogearbox.net X-Patchwork-State: RFC This patch adds a struct bpf_spin_lock *lock member to bpf_rbtree, as well as a bpf_rbtree_get_lock helper which allows bpf programs to access the lock. Ideally the bpf_spin_lock would be created independently oustide of the tree and associated with it before the tree is used, either as part of map definition or via some call like rbtree_init(&rbtree, &lock). Doing this in an ergonomic way is proving harder than expected, so for now use this workaround. Why is creating the bpf_spin_lock independently and associating it with the tree preferable? Because we want to be able to transfer nodes between trees atomically, and for this to work need same lock associated with 2 trees. Further locking-related patches will make it possible for the lock to be used in BPF programs and add code which enforces that the lock is held when doing any operation on the tree. Signed-off-by: Dave Marchevsky --- include/uapi/linux/bpf.h | 7 +++++++ kernel/bpf/helpers.c | 3 +++ kernel/bpf/rbtree.c | 24 ++++++++++++++++++++++++ tools/include/uapi/linux/bpf.h | 7 +++++++ 4 files changed, 41 insertions(+) diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index 4688ce88caf4..c677d92de3bc 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -5385,6 +5385,12 @@ union bpf_attr { * Return * 0 * + * void *bpf_rbtree_get_lock(struct bpf_map *map) + * Description + * Return the bpf_spin_lock associated with the rbtree + * + * Return + * Ptr to lock */ #define __BPF_FUNC_MAPPER(FN) \ FN(unspec), \ @@ -5600,6 +5606,7 @@ union bpf_attr { FN(rbtree_find), \ FN(rbtree_remove), \ FN(rbtree_free_node), \ + FN(rbtree_get_lock), \ /* */ /* integer value in 'imm' field of BPF_CALL instruction selects which helper diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c index 35eb66d11bf6..257a808bb767 100644 --- a/kernel/bpf/helpers.c +++ b/kernel/bpf/helpers.c @@ -1587,6 +1587,7 @@ const struct bpf_func_proto bpf_rbtree_add_proto __weak; const struct bpf_func_proto bpf_rbtree_find_proto __weak; const struct bpf_func_proto bpf_rbtree_remove_proto __weak; const struct bpf_func_proto bpf_rbtree_free_node_proto __weak; +const struct bpf_func_proto bpf_rbtree_get_lock_proto __weak; const struct bpf_func_proto * bpf_base_func_proto(enum bpf_func_id func_id) @@ -1686,6 +1687,8 @@ bpf_base_func_proto(enum bpf_func_id func_id) return &bpf_rbtree_remove_proto; case BPF_FUNC_rbtree_free_node: return &bpf_rbtree_free_node_proto; + case BPF_FUNC_rbtree_get_lock: + return &bpf_rbtree_get_lock_proto; default: break; } diff --git a/kernel/bpf/rbtree.c b/kernel/bpf/rbtree.c index 250d62210804..c6f0a2a083f6 100644 --- a/kernel/bpf/rbtree.c +++ b/kernel/bpf/rbtree.c @@ -9,6 +9,7 @@ struct bpf_rbtree { struct bpf_map map; struct rb_root_cached root; + struct bpf_spin_lock *lock; }; BTF_ID_LIST_SINGLE(bpf_rbtree_btf_ids, struct, rb_node); @@ -39,6 +40,14 @@ static struct bpf_map *rbtree_map_alloc(union bpf_attr *attr) tree->root = RB_ROOT_CACHED; bpf_map_init_from_attr(&tree->map, attr); + + tree->lock = bpf_map_kzalloc(&tree->map, sizeof(struct bpf_spin_lock), + GFP_KERNEL | __GFP_NOWARN); + if (!tree->lock) { + bpf_map_area_free(tree); + return ERR_PTR(-ENOMEM); + } + return &tree->map; } @@ -139,6 +148,7 @@ static void rbtree_map_free(struct bpf_map *map) bpf_rbtree_postorder_for_each_entry_safe(pos, n, &tree->root.rb_root) kfree(pos); + kfree(tree->lock); bpf_map_area_free(tree); } @@ -238,6 +248,20 @@ static int rbtree_map_get_next_key(struct bpf_map *map, void *key, return 
-ENOTSUPP; } +BPF_CALL_1(bpf_rbtree_get_lock, struct bpf_map *, map) +{ + struct bpf_rbtree *tree = container_of(map, struct bpf_rbtree, map); + + return (u64)tree->lock; +} + +const struct bpf_func_proto bpf_rbtree_get_lock_proto = { + .func = bpf_rbtree_get_lock, + .gpl_only = true, + .ret_type = RET_PTR_TO_MAP_VALUE, + .arg1_type = ARG_CONST_MAP_PTR, +}; + BTF_ID_LIST_SINGLE(bpf_rbtree_map_btf_ids, struct, bpf_rbtree) const struct bpf_map_ops rbtree_map_ops = { .map_meta_equal = bpf_map_meta_equal, diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h index 4688ce88caf4..c677d92de3bc 100644 --- a/tools/include/uapi/linux/bpf.h +++ b/tools/include/uapi/linux/bpf.h @@ -5385,6 +5385,12 @@ union bpf_attr { * Return * 0 * + * void *bpf_rbtree_get_lock(struct bpf_map *map) + * Description + * Return the bpf_spin_lock associated with the rbtree + * + * Return + * Ptr to lock */ #define __BPF_FUNC_MAPPER(FN) \ FN(unspec), \ @@ -5600,6 +5606,7 @@ union bpf_attr { FN(rbtree_find), \ FN(rbtree_remove), \ FN(rbtree_free_node), \ + FN(rbtree_get_lock), \ /* */ /* integer value in 'imm' field of BPF_CALL instruction selects which helper From patchwork Fri Jul 22 18:34:33 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dave Marchevsky X-Patchwork-Id: 12926744 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 52DE4CCA473 for ; Fri, 22 Jul 2022 18:35:06 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229567AbiGVSfF (ORCPT ); Fri, 22 Jul 2022 14:35:05 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48312 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S236377AbiGVSfE (ORCPT ); Fri, 22 Jul 2022 14:35:04 -0400 Received: from mx0b-00082601.pphosted.com (mx0b-00082601.pphosted.com [67.231.153.30]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 3A11C89A72 for ; Fri, 22 Jul 2022 11:35:03 -0700 (PDT) Received: from pps.filterd (m0109332.ppops.net [127.0.0.1]) by mx0a-00082601.pphosted.com (8.17.1.5/8.17.1.5) with ESMTP id 26MC8QaH022622 for ; Fri, 22 Jul 2022 11:35:02 -0700 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fb.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : content-type; s=facebook; bh=j5IY34dMDkvyjKrSadECn8GfqDBNSp5UxYVCEHBrmXE=; b=GgeMBdTtLBkX9JxRQgLWOUjXFDYrJWj3+COZqxXwE44rVw6JAOw6XSYDNvZwqa8QWhaD KIH5hDizDfs4uYGXGiz7I9NZlVoxupNP1zixTIDt8Djv/I7tLMFgwTlKnGKWhmhrNwey wye2Q49cByLjQQ3VDdhv65OBV7l3WLfbXWg= Received: from maileast.thefacebook.com ([163.114.130.16]) by mx0a-00082601.pphosted.com (PPS) with ESMTPS id 3hf7fc10es-6 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT) for ; Fri, 22 Jul 2022 11:35:02 -0700 Received: from twshared25107.07.ash9.facebook.com (2620:10d:c0a8:1b::d) by mail.thefacebook.com (2620:10d:c0a8:82::c) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2375.28; Fri, 22 Jul 2022 11:35:00 -0700 Received: by devbig077.ldc1.facebook.com (Postfix, from userid 158236) id 9966FAB6F1A1; Fri, 22 Jul 2022 11:34:49 -0700 (PDT) From: Dave Marchevsky To: CC: Alexei Starovoitov , Daniel Borkmann , Andrii Nakryiko , Kernel 
Team , Tejun Heo , Dave Marchevsky Subject: [RFC PATCH bpf-next 06/11] bpf: Add bpf_rbtree_{lock,unlock} helpers Date: Fri, 22 Jul 2022 11:34:33 -0700 Message-ID: <20220722183438.3319790-7-davemarchevsky@fb.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20220722183438.3319790-1-davemarchevsky@fb.com> References: <20220722183438.3319790-1-davemarchevsky@fb.com> MIME-Version: 1.0 X-FB-Internal: Safe X-Proofpoint-GUID: Yhrjpf-jfd84HC1dEsP_5VMWiCxg-Mdq X-Proofpoint-ORIG-GUID: Yhrjpf-jfd84HC1dEsP_5VMWiCxg-Mdq X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.883,Hydra:6.0.517,FMLib:17.11.122.1 definitions=2022-07-22_06,2022-07-21_02,2022-06-22_01 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net X-Patchwork-State: RFC These helpers are equivalent to bpf_spin_{lock,unlock}, but the verifier doesn't try to enforce that no helper calls occur when there's an active spin lock. [ TODO: Currently the verifier doesn't do _anything_ spinlock related when it sees one of these, including setting active_spin_lock. This is probably too lenient. Also, EXPORT_SYMBOL for internal lock helpers might not be the best code structure. ] Future patches will add enforcement of "rbtree helpers must always be called when lock is held" constraint. Signed-off-by: Dave Marchevsky --- include/uapi/linux/bpf.h | 20 ++++++++++++++++++++ kernel/bpf/helpers.c | 12 ++++++++++-- kernel/bpf/rbtree.c | 29 +++++++++++++++++++++++++++++ kernel/bpf/verifier.c | 2 ++ tools/include/uapi/linux/bpf.h | 20 ++++++++++++++++++++ 5 files changed, 81 insertions(+), 2 deletions(-) diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index c677d92de3bc..d21e2c99ea14 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -5391,6 +5391,24 @@ union bpf_attr { * * Return * Ptr to lock + * + * void *bpf_rbtree_lock(struct bpf_spin_lock *lock) + * Description + * Like bpf_spin_lock helper, but use separate helper for now + * as we don't want this helper to have special meaning to the verifier + * so that we can do rbtree helper calls between rbtree_lock/unlock + * + * Return + * 0 + * + * void *bpf_rbtree_unlock(struct bpf_spin_lock *lock) + * Description + * Like bpf_spin_unlock helper, but use separate helper for now + * as we don't want this helper to have special meaning to the verifier + * so that we can do rbtree helper calls between rbtree_lock/unlock + * + * Return + * 0 */ #define __BPF_FUNC_MAPPER(FN) \ FN(unspec), \ @@ -5607,6 +5625,8 @@ union bpf_attr { FN(rbtree_remove), \ FN(rbtree_free_node), \ FN(rbtree_get_lock), \ + FN(rbtree_lock), \ + FN(rbtree_unlock), \ /* */ /* integer value in 'imm' field of BPF_CALL instruction selects which helper diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c index 257a808bb767..fa2dba1dcec8 100644 --- a/kernel/bpf/helpers.c +++ b/kernel/bpf/helpers.c @@ -303,7 +303,7 @@ static inline void __bpf_spin_unlock(struct bpf_spin_lock *lock) static DEFINE_PER_CPU(unsigned long, irqsave_flags); -static inline void __bpf_spin_lock_irqsave(struct bpf_spin_lock *lock) +inline void __bpf_spin_lock_irqsave(struct bpf_spin_lock *lock) { unsigned long flags; @@ -311,6 +311,7 @@ static inline void __bpf_spin_lock_irqsave(struct bpf_spin_lock *lock) __bpf_spin_lock(lock); __this_cpu_write(irqsave_flags, flags); } +EXPORT_SYMBOL(__bpf_spin_lock_irqsave); notrace BPF_CALL_1(bpf_spin_lock, struct bpf_spin_lock *, lock) { @@ -325,7 +326,7 @@ const struct bpf_func_proto bpf_spin_lock_proto = { .arg1_type = 
ARG_PTR_TO_SPIN_LOCK, }; -static inline void __bpf_spin_unlock_irqrestore(struct bpf_spin_lock *lock) +inline void __bpf_spin_unlock_irqrestore(struct bpf_spin_lock *lock) { unsigned long flags; @@ -333,6 +334,7 @@ static inline void __bpf_spin_unlock_irqrestore(struct bpf_spin_lock *lock) __bpf_spin_unlock(lock); local_irq_restore(flags); } +EXPORT_SYMBOL(__bpf_spin_unlock_irqrestore); notrace BPF_CALL_1(bpf_spin_unlock, struct bpf_spin_lock *, lock) { @@ -1588,6 +1590,8 @@ const struct bpf_func_proto bpf_rbtree_find_proto __weak; const struct bpf_func_proto bpf_rbtree_remove_proto __weak; const struct bpf_func_proto bpf_rbtree_free_node_proto __weak; const struct bpf_func_proto bpf_rbtree_get_lock_proto __weak; +const struct bpf_func_proto bpf_rbtree_lock_proto __weak; +const struct bpf_func_proto bpf_rbtree_unlock_proto __weak; const struct bpf_func_proto * bpf_base_func_proto(enum bpf_func_id func_id) @@ -1689,6 +1693,10 @@ bpf_base_func_proto(enum bpf_func_id func_id) return &bpf_rbtree_free_node_proto; case BPF_FUNC_rbtree_get_lock: return &bpf_rbtree_get_lock_proto; + case BPF_FUNC_rbtree_lock: + return &bpf_rbtree_lock_proto; + case BPF_FUNC_rbtree_unlock: + return &bpf_rbtree_unlock_proto; default: break; } diff --git a/kernel/bpf/rbtree.c b/kernel/bpf/rbtree.c index c6f0a2a083f6..bf2e30af82ec 100644 --- a/kernel/bpf/rbtree.c +++ b/kernel/bpf/rbtree.c @@ -262,6 +262,35 @@ const struct bpf_func_proto bpf_rbtree_get_lock_proto = { .arg1_type = ARG_CONST_MAP_PTR, }; +extern void __bpf_spin_unlock_irqrestore(struct bpf_spin_lock *lock); +extern void __bpf_spin_lock_irqsave(struct bpf_spin_lock *lock); + +BPF_CALL_1(bpf_rbtree_lock, void *, lock) +{ + __bpf_spin_lock_irqsave((struct bpf_spin_lock *)lock); + return 0; +} + +const struct bpf_func_proto bpf_rbtree_lock_proto = { + .func = bpf_rbtree_lock, + .gpl_only = true, + .ret_type = RET_INTEGER, + .arg1_type = ARG_PTR_TO_SPIN_LOCK, +}; + +BPF_CALL_1(bpf_rbtree_unlock, void *, lock) +{ + __bpf_spin_unlock_irqrestore((struct bpf_spin_lock *)lock); + return 0; +} + +const struct bpf_func_proto bpf_rbtree_unlock_proto = { + .func = bpf_rbtree_unlock, + .gpl_only = true, + .ret_type = RET_INTEGER, + .arg1_type = ARG_PTR_TO_SPIN_LOCK, +}; + BTF_ID_LIST_SINGLE(bpf_rbtree_map_btf_ids, struct, bpf_rbtree) const struct bpf_map_ops rbtree_map_ops = { .map_meta_equal = bpf_map_meta_equal, diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index 535f673882cd..174a355d97fd 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -6047,6 +6047,8 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 arg, } else if (meta->func_id == BPF_FUNC_spin_unlock) { if (process_spin_lock(env, regno, false)) return -EACCES; + } else if (meta->func_id == BPF_FUNC_rbtree_lock || + meta->func_id == BPF_FUNC_rbtree_unlock) { // Do nothing for now } else { verbose(env, "verifier internal error\n"); return -EFAULT; diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h index c677d92de3bc..d21e2c99ea14 100644 --- a/tools/include/uapi/linux/bpf.h +++ b/tools/include/uapi/linux/bpf.h @@ -5391,6 +5391,24 @@ union bpf_attr { * * Return * Ptr to lock + * + * void *bpf_rbtree_lock(struct bpf_spin_lock *lock) + * Description + * Like bpf_spin_lock helper, but use separate helper for now + * as we don't want this helper to have special meaning to the verifier + * so that we can do rbtree helper calls between rbtree_lock/unlock + * + * Return + * 0 + * + * void *bpf_rbtree_unlock(struct bpf_spin_lock *lock) + * Description + * 
Like bpf_spin_unlock helper, but use separate helper for now + * as we don't want this helper to have special meaning to the verifier + * so that we can do rbtree helper calls between rbtree_lock/unlock + * + * Return + * 0 */ #define __BPF_FUNC_MAPPER(FN) \ FN(unspec), \ @@ -5607,6 +5625,8 @@ union bpf_attr { FN(rbtree_remove), \ FN(rbtree_free_node), \ FN(rbtree_get_lock), \ + FN(rbtree_lock), \ + FN(rbtree_unlock), \ /* */ /* integer value in 'imm' field of BPF_CALL instruction selects which helper From patchwork Fri Jul 22 18:34:34 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dave Marchevsky X-Patchwork-Id: 12926742 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 03DEAC43334 for ; Fri, 22 Jul 2022 18:35:05 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S236370AbiGVSfE (ORCPT ); Fri, 22 Jul 2022 14:35:04 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48284 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S236337AbiGVSfD (ORCPT ); Fri, 22 Jul 2022 14:35:03 -0400 Received: from mx0b-00082601.pphosted.com (mx0b-00082601.pphosted.com [67.231.153.30]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 924409FE28 for ; Fri, 22 Jul 2022 11:35:02 -0700 (PDT) Received: from pps.filterd (m0109331.ppops.net [127.0.0.1]) by mx0a-00082601.pphosted.com (8.17.1.5/8.17.1.5) with ESMTP id 26MC5lv5009346 for ; Fri, 22 Jul 2022 11:35:01 -0700 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fb.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : content-type; s=facebook; bh=El6VgYnyfGvGeh+dXySVuzhUz+VLV/enjODNirwpF9o=; b=S0S//wXvt3CDLh/8usnfVlm0gr4kyTDyvhv/MejExuljYQmMK7oq2dXROQiTAWdwFL61 M9/z2Wk0qbkKybqB5kSDpWHrTI1SMDvD6eAwK1SCN6sXOUfzPQCluT2G3Rps9a3w2loI azg3gEfnsExYcELsWrQyPWDyxpDWfHf5Stw= Received: from mail.thefacebook.com ([163.114.132.120]) by mx0a-00082601.pphosted.com (PPS) with ESMTPS id 3hej1w09cd-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT) for ; Fri, 22 Jul 2022 11:35:01 -0700 Received: from twshared10560.18.frc3.facebook.com (2620:10d:c085:208::f) by mail.thefacebook.com (2620:10d:c085:21d::4) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2375.28; Fri, 22 Jul 2022 11:35:00 -0700 Received: by devbig077.ldc1.facebook.com (Postfix, from userid 158236) id 2F95BAB6F1A3; Fri, 22 Jul 2022 11:34:50 -0700 (PDT) From: Dave Marchevsky To: CC: Alexei Starovoitov , Daniel Borkmann , Andrii Nakryiko , Kernel Team , Tejun Heo , Dave Marchevsky Subject: [RFC PATCH bpf-next 07/11] bpf: Enforce spinlock hold for bpf_rbtree_{add,remove,find} Date: Fri, 22 Jul 2022 11:34:34 -0700 Message-ID: <20220722183438.3319790-8-davemarchevsky@fb.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20220722183438.3319790-1-davemarchevsky@fb.com> References: <20220722183438.3319790-1-davemarchevsky@fb.com> MIME-Version: 1.0 X-FB-Internal: Safe X-Proofpoint-GUID: jODMBEAaQGOE6mjf3Np9aqlP5c5WjPym X-Proofpoint-ORIG-GUID: jODMBEAaQGOE6mjf3Np9aqlP5c5WjPym X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.883,Hydra:6.0.517,FMLib:17.11.122.1 
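[ Example (illustrative, not from the patch): the intended critical-section pattern from the program side, combining the get_lock helper from earlier in the series with the lock/unlock helpers added in the previous patch. 'node' is assumed to come from bpf_rbtree_alloc_node() and 'less' is the usual rb_add-style comparator, as in the selftests later in the series:

        struct node_data *ret;

        bpf_rbtree_lock(bpf_rbtree_get_lock(&rbtree));

        ret = (struct node_data *)bpf_rbtree_add(&rbtree, node, less);
        if (!ret)
                /* add failed; the program still owns node and must free it */
                bpf_rbtree_free_node(&rbtree, node);

        bpf_rbtree_unlock(bpf_rbtree_get_lock(&rbtree));
]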
definitions=2022-07-22_06,2022-07-21_02,2022-06-22_01 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net X-Patchwork-State: RFC The bpf program calling these helpers must hold the spinlock associated with the rbtree map when doing so. Otherwise, a concurrent add/remove operation could corrupt the tree while {add,remove,find} are walking it with callback or pivoting after update. Signed-off-by: Dave Marchevsky --- kernel/bpf/rbtree.c | 14 ++++++++++++++ 1 file changed, 14 insertions(+) diff --git a/kernel/bpf/rbtree.c b/kernel/bpf/rbtree.c index bf2e30af82ec..5b1ab73e164f 100644 --- a/kernel/bpf/rbtree.c +++ b/kernel/bpf/rbtree.c @@ -14,6 +14,11 @@ struct bpf_rbtree { BTF_ID_LIST_SINGLE(bpf_rbtree_btf_ids, struct, rb_node); +static bool __rbtree_lock_held(struct bpf_rbtree *tree) +{ + return spin_is_locked((spinlock_t *)tree->lock); +} + static int rbtree_map_alloc_check(union bpf_attr *attr) { if (attr->max_entries || !attr->btf_value_type_id) @@ -93,6 +98,9 @@ BPF_CALL_3(bpf_rbtree_add, struct bpf_map *, map, void *, value, void *, cb) struct bpf_rbtree *tree = container_of(map, struct bpf_rbtree, map); struct rb_node *node = (struct rb_node *)value; + if (!__rbtree_lock_held(tree)) + return (u64)NULL; + if (WARN_ON_ONCE(!RB_EMPTY_NODE(node))) return (u64)NULL; @@ -114,6 +122,9 @@ BPF_CALL_3(bpf_rbtree_find, struct bpf_map *, map, void *, key, void *, cb) { struct bpf_rbtree *tree = container_of(map, struct bpf_rbtree, map); + if (!__rbtree_lock_held(tree)) + return (u64)NULL; + return (u64)rb_find(key, &tree->root.rb_root, (int (*)(const void *key, const struct rb_node *))cb); @@ -206,6 +217,9 @@ BPF_CALL_2(bpf_rbtree_remove, struct bpf_map *, map, void *, value) struct bpf_rbtree *tree = container_of(map, struct bpf_rbtree, map); struct rb_node *node = (struct rb_node *)value; + if (!__rbtree_lock_held(tree)) + return (u64)NULL; + if (WARN_ON_ONCE(RB_EMPTY_NODE(node))) return (u64)NULL; From patchwork Fri Jul 22 18:34:35 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dave Marchevsky X-Patchwork-Id: 12926745 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id F27D0C433EF for ; Fri, 22 Jul 2022 18:35:07 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S236001AbiGVSfG (ORCPT ); Fri, 22 Jul 2022 14:35:06 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48358 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S236378AbiGVSfE (ORCPT ); Fri, 22 Jul 2022 14:35:04 -0400 Received: from mx0a-00082601.pphosted.com (mx0a-00082601.pphosted.com [67.231.145.42]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 3B130A025A for ; Fri, 22 Jul 2022 11:35:03 -0700 (PDT) Received: from pps.filterd (m0044010.ppops.net [127.0.0.1]) by mx0a-00082601.pphosted.com (8.17.1.5/8.17.1.5) with ESMTP id 26MHthDO002238 for ; Fri, 22 Jul 2022 11:35:03 -0700 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fb.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : content-type; s=facebook; bh=Okld1U3koE437WXUbnhi7hQXZaMdwPbVTzDBZgZIIe4=; b=gs5se/nT/jhb7HMxCbW66QhCPYUns1wW9CmgIzP0qcwqw8FuENaqKg7E6ZP9Nxf//9XX 
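[ Example (illustrative, not from the patch): in program terms, the effect of the previous patch's __rbtree_lock_held() checks is that forgetting to take the lock shows up as a NULL return at runtime instead of a corrupted tree; this is the situation the add_node__no_lock selftest at the end of the series exercises:

        struct node_data *ret;

        /* note: no bpf_rbtree_lock() call was made */
        ret = (struct node_data *)bpf_rbtree_add(&rbtree, node, less);
        if (!ret) {
                /* the helper bailed out because the lock is not held; the
                 * node was not inserted and still belongs to the program
                 */
                bpf_rbtree_free_node(&rbtree, node);
                return 0;
        }
]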
M8sfJn2wAbMGkfaS39qpO6HC7J2uIUXo8o+vplsT4BROiI2OPSgx/jXUEd+Exi3vZHJ1 OiDEw+GmqAoPA5ba+Js2SUxbrnKRKMOznJg= Received: from maileast.thefacebook.com ([163.114.130.16]) by mx0a-00082601.pphosted.com (PPS) with ESMTPS id 3hg0qp87hj-4 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT) for ; Fri, 22 Jul 2022 11:35:02 -0700 Received: from twshared25107.07.ash9.facebook.com (2620:10d:c0a8:1b::d) by mail.thefacebook.com (2620:10d:c0a8:82::d) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2375.28; Fri, 22 Jul 2022 11:35:00 -0700 Received: by devbig077.ldc1.facebook.com (Postfix, from userid 158236) id B7EF9AB6F1A6; Fri, 22 Jul 2022 11:34:50 -0700 (PDT) From: Dave Marchevsky To: CC: Alexei Starovoitov , Daniel Borkmann , Andrii Nakryiko , Kernel Team , Tejun Heo , Dave Marchevsky Subject: [RFC PATCH bpf-next 08/11] bpf: Add OBJ_NON_OWNING_REF type flag Date: Fri, 22 Jul 2022 11:34:35 -0700 Message-ID: <20220722183438.3319790-9-davemarchevsky@fb.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20220722183438.3319790-1-davemarchevsky@fb.com> References: <20220722183438.3319790-1-davemarchevsky@fb.com> MIME-Version: 1.0 X-FB-Internal: Safe X-Proofpoint-ORIG-GUID: QZZxCtEzstJbnleAKk11Ha3dievshZsV X-Proofpoint-GUID: QZZxCtEzstJbnleAKk11Ha3dievshZsV X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.883,Hydra:6.0.517,FMLib:17.11.122.1 definitions=2022-07-22_06,2022-07-21_02,2022-06-22_01 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net X-Patchwork-State: RFC Consider a pointer to a type that would normally need acquire / release semantics to be safely held. There may be scenarios where such a pointer can be safely held without the need to acquire a reference. For example, although a PTR_TO_BTF_ID for a rbtree_map node is released via bpf_rbtree_add helper, the helper doesn't change the address of the node and must be called with the rbtree_map's spinlock held. Since the only way to remove a node from the rbtree - bpf_rbtree_remove helper - requires the same lock, the newly-added node cannot be removed by a concurrently-running program until the lock is released. Therefore it is safe to hold a reference to this node until bpf_rbtree_unlock is called. This patch introduces a new type flag and associated verifier logic to handle such "non-owning" references. Currently the only usecase I have is the rbtree example above, so the verifier logic is straightforward: * Tag return types of bpf_rbtree_{add,find} with OBJ_NON_OWNING_REF * These both require the rbtree lock to be held to return anything non-NULL * Since ret type for both is PTR_TO_BTF_ID_OR_NULL, if lock is not held and NULL is returned, existing mark_ptr_or_null_reg logic will clear reg type. * So if mark_ptr_or_null_reg logic turns the returned reg into a PTR_TO_BTF_ID | OBJ_NON_OWNING_REF, verifier knows lock is held. * When the lock is released the verifier invalidates any regs holding non owning refs similarly to existing release_reference logic - but no need to clear ref_obj_id as an 'owning' reference was never acquired. 
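As a sketch of the program pattern this enables (names follow the selftests later in the series; the field read is just illustrative):

        struct node_data *n;
        __u32 val = 0;

        bpf_rbtree_lock(bpf_rbtree_get_lock(&rbtree));

        n = (struct node_data *)bpf_rbtree_add(&rbtree, node, less);
        if (n)
                /* n is PTR_TO_BTF_ID | OBJ_NON_OWNING_REF here: readable
                 * while the lock is held, with no reference to release
                 */
                val = n->one;

        bpf_rbtree_unlock(bpf_rbtree_get_lock(&rbtree));
        /* the unlock invalidated n; dereferencing it past this point is
         * rejected
         */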
[ TODO: Currently the invalidation logic in clear_rbtree_node_non_owning_refs is not parametrized by map so unlocking any rbtree lock will invalidate all non-owning refs ] Signed-off-by: Dave Marchevsky --- include/linux/bpf.h | 1 + kernel/bpf/rbtree.c | 4 +-- kernel/bpf/verifier.c | 63 +++++++++++++++++++++++++++++++++++++++---- 3 files changed, 61 insertions(+), 7 deletions(-) diff --git a/include/linux/bpf.h b/include/linux/bpf.h index eb8c550db0e2..c9c4b4fb019c 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -412,6 +412,7 @@ enum bpf_type_flag { /* Size is known at compile time. */ MEM_FIXED_SIZE = BIT(10 + BPF_BASE_TYPE_BITS), + OBJ_NON_OWNING_REF = BIT(11 + BPF_BASE_TYPE_BITS), __BPF_TYPE_FLAG_MAX, __BPF_TYPE_LAST_FLAG = __BPF_TYPE_FLAG_MAX - 1, }; diff --git a/kernel/bpf/rbtree.c b/kernel/bpf/rbtree.c index 5b1ab73e164f..34864fc83209 100644 --- a/kernel/bpf/rbtree.c +++ b/kernel/bpf/rbtree.c @@ -111,7 +111,7 @@ BPF_CALL_3(bpf_rbtree_add, struct bpf_map *, map, void *, value, void *, cb) const struct bpf_func_proto bpf_rbtree_add_proto = { .func = bpf_rbtree_add, .gpl_only = true, - .ret_type = RET_PTR_TO_BTF_ID_OR_NULL, + .ret_type = RET_PTR_TO_BTF_ID_OR_NULL | OBJ_NON_OWNING_REF, .arg1_type = ARG_CONST_MAP_PTR, .arg2_type = ARG_PTR_TO_BTF_ID | OBJ_RELEASE, .arg2_btf_id = &bpf_rbtree_btf_ids[0], @@ -133,7 +133,7 @@ BPF_CALL_3(bpf_rbtree_find, struct bpf_map *, map, void *, key, void *, cb) const struct bpf_func_proto bpf_rbtree_find_proto = { .func = bpf_rbtree_find, .gpl_only = true, - .ret_type = RET_PTR_TO_BTF_ID_OR_NULL, + .ret_type = RET_PTR_TO_BTF_ID_OR_NULL | OBJ_NON_OWNING_REF, .ret_btf_id = &bpf_rbtree_btf_ids[0], .arg1_type = ARG_CONST_MAP_PTR, .arg2_type = ARG_ANYTHING, diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index 174a355d97fd..4f46b2dfbc4b 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -467,6 +467,11 @@ static bool type_is_rdonly_mem(u32 type) return type & MEM_RDONLY; } +static bool type_is_non_owning_ref(u32 type) +{ + return type & OBJ_NON_OWNING_REF; +} + static bool type_may_be_null(u32 type) { return type & PTR_MAYBE_NULL; @@ -555,7 +560,9 @@ static bool function_returns_rbtree_node(enum bpf_func_id func_id) static const char *reg_type_str(struct bpf_verifier_env *env, enum bpf_reg_type type) { - char postfix[16] = {0}, prefix[32] = {0}; + char postfix[32] = {0}, prefix[32] = {0}; + unsigned int postfix_idx = 0; + static const char * const str[] = { [NOT_INIT] = "?", [SCALAR_VALUE] = "scalar", @@ -579,11 +586,18 @@ static const char *reg_type_str(struct bpf_verifier_env *env, [PTR_TO_MAP_KEY] = "map_key", }; - if (type & PTR_MAYBE_NULL) { + if (type_may_be_null(type)) { if (base_type(type) == PTR_TO_BTF_ID) - strncpy(postfix, "or_null_", 16); + postfix_idx += strlcpy(postfix + postfix_idx, "or_null_", 32 - postfix_idx); else - strncpy(postfix, "_or_null", 16); + postfix_idx += strlcpy(postfix + postfix_idx, "_or_null", 32 - postfix_idx); + } + + if (type_is_non_owning_ref(type)) { + if (base_type(type) == PTR_TO_BTF_ID) + postfix_idx += strlcpy(postfix + postfix_idx, "non_own_", 32 - postfix_idx); + else + postfix_idx += strlcpy(postfix + postfix_idx, "_non_own", 32 - postfix_idx); } if (type & MEM_RDONLY) @@ -5684,12 +5698,18 @@ static const struct bpf_reg_types int_ptr_types = { }, }; +static const struct bpf_reg_types btf_ptr_types = { + .types = { + PTR_TO_BTF_ID, + PTR_TO_BTF_ID | OBJ_NON_OWNING_REF, + }, +}; + static const struct bpf_reg_types fullsock_types = { .types = { PTR_TO_SOCKET } }; static const 
struct bpf_reg_types scalar_types = { .types = { SCALAR_VALUE } }; static const struct bpf_reg_types context_types = { .types = { PTR_TO_CTX } }; static const struct bpf_reg_types alloc_mem_types = { .types = { PTR_TO_MEM | MEM_ALLOC } }; static const struct bpf_reg_types const_map_ptr_types = { .types = { CONST_PTR_TO_MAP } }; -static const struct bpf_reg_types btf_ptr_types = { .types = { PTR_TO_BTF_ID } }; static const struct bpf_reg_types spin_lock_types = { .types = { PTR_TO_MAP_VALUE } }; static const struct bpf_reg_types percpu_btf_ptr_types = { .types = { PTR_TO_BTF_ID | MEM_PERCPU } }; static const struct bpf_reg_types func_ptr_types = { .types = { PTR_TO_FUNC } }; @@ -6635,6 +6655,33 @@ static int release_reference(struct bpf_verifier_env *env, return 0; } +static void clear_non_owning_ref_regs(struct bpf_verifier_env *env, + struct bpf_func_state *state) +{ + struct bpf_reg_state *regs = state->regs, *reg; + int i; + + for (i = 0; i < MAX_BPF_REG; i++) + if (type_is_non_owning_ref(regs[i].type)) + mark_reg_unknown(env, regs, i); + + bpf_for_each_spilled_reg(i, state, reg) { + if (!reg) + continue; + if (type_is_non_owning_ref(reg->type)) + __mark_reg_unknown(env, reg); + } +} + +static void clear_rbtree_node_non_owning_refs(struct bpf_verifier_env *env) +{ + struct bpf_verifier_state *vstate = env->cur_state; + int i; + + for (i = 0; i <= vstate->curframe; i++) + clear_non_owning_ref_regs(env, vstate->frame[i]); +} + static void clear_caller_saved_regs(struct bpf_verifier_env *env, struct bpf_reg_state *regs) { @@ -7436,6 +7483,12 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn } } break; + case BPF_FUNC_rbtree_unlock: + /* TODO clear_rbtree_node_non_owning_refs calls should be + * parametrized by base_type or ideally by owning map + */ + clear_rbtree_node_non_owning_refs(env); + break; } if (err) From patchwork Fri Jul 22 18:34:36 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dave Marchevsky X-Patchwork-Id: 12926748 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id EBCF6C43334 for ; Fri, 22 Jul 2022 18:35:15 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234547AbiGVSfO (ORCPT ); Fri, 22 Jul 2022 14:35:14 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48612 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S236366AbiGVSfM (ORCPT ); Fri, 22 Jul 2022 14:35:12 -0400 Received: from mx0a-00082601.pphosted.com (mx0a-00082601.pphosted.com [67.231.145.42]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 5422DB1CA for ; Fri, 22 Jul 2022 11:35:10 -0700 (PDT) Received: from pps.filterd (m0044010.ppops.net [127.0.0.1]) by mx0a-00082601.pphosted.com (8.17.1.5/8.17.1.5) with ESMTP id 26MHu43L003229 for ; Fri, 22 Jul 2022 11:35:10 -0700 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fb.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : content-type; s=facebook; bh=EZUgqNntwmRMvAGD1nbOqMdiHHe+FBcGIfNMuR3ugLk=; b=JuX3sEVXgdZ4XaCv05dubBattM1PKfyuZER4f1dROlXxE5aGgcy6VlWJij0DmlZWVDWt CpLG4gn3VmcG4Nx+CvpcVums9oZS11SABoNNeG0w5d7K2U7gmiLbTfxRGh1QSlWjgI+G vG/0JkBALQ0OBs41E9GZwNGQqTGoZQ+7orI= 
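[ Example (illustrative, not from the patch): what the previous patch's invalidation TODO means in practice, sketched with two hypothetical rbtree maps (rbtree_a and rbtree_b, each BPF_MAP_TYPE_RBTREE with its own lock; key and cmp as in the selftest's bpf_rbtree_find() usage):

        struct node_data *a;

        bpf_rbtree_lock(bpf_rbtree_get_lock(&rbtree_a));
        bpf_rbtree_lock(bpf_rbtree_get_lock(&rbtree_b));

        a = (struct node_data *)bpf_rbtree_find(&rbtree_a, &key, cmp);
        if (a) {
                bpf_rbtree_unlock(bpf_rbtree_get_lock(&rbtree_b));
                /* only rbtree_b's lock was dropped, but the rbtree_unlock
                 * case clears every non-owning ref, so 'a' is marked unknown
                 * and a->one is rejected here even though rbtree_a is still
                 * locked
                 */
        }
        bpf_rbtree_unlock(bpf_rbtree_get_lock(&rbtree_a));
]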
Received: from maileast.thefacebook.com ([163.114.130.16]) by mx0a-00082601.pphosted.com (PPS) with ESMTPS id 3hg0qp87jh-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT) for ; Fri, 22 Jul 2022 11:35:09 -0700 Received: from twshared1866.09.ash9.facebook.com (2620:10d:c0a8:1b::d) by mail.thefacebook.com (2620:10d:c0a8:82::d) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2375.28; Fri, 22 Jul 2022 11:35:08 -0700 Received: by devbig077.ldc1.facebook.com (Postfix, from userid 158236) id 3FE93AB6F1A8; Fri, 22 Jul 2022 11:34:51 -0700 (PDT) From: Dave Marchevsky To: CC: Alexei Starovoitov , Daniel Borkmann , Andrii Nakryiko , Kernel Team , Tejun Heo , Dave Marchevsky Subject: [RFC PATCH bpf-next 09/11] bpf: Add CONDITIONAL_RELEASE type flag Date: Fri, 22 Jul 2022 11:34:36 -0700 Message-ID: <20220722183438.3319790-10-davemarchevsky@fb.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20220722183438.3319790-1-davemarchevsky@fb.com> References: <20220722183438.3319790-1-davemarchevsky@fb.com> MIME-Version: 1.0 X-FB-Internal: Safe X-Proofpoint-ORIG-GUID: VrT7hGowEHTA2RIAJ0aJ1Yf-MrCb0CwC X-Proofpoint-GUID: VrT7hGowEHTA2RIAJ0aJ1Yf-MrCb0CwC X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.883,Hydra:6.0.517,FMLib:17.11.122.1 definitions=2022-07-22_06,2022-07-21_02,2022-06-22_01 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net X-Patchwork-State: RFC Currently if a helper proto has an arg with OBJ_RELEASE flag, the verifier assumes the 'release' logic in the helper will always succeed, and therefore always clears the arg reg, other regs w/ same ref_obj_id, and releases the reference. This poses a problem for 'release' helpers which may not always succeed. For example, bpf_rbtree_add will fail to add the passed-in node to a bpf_rbtree if the rbtree's lock is not held when the helper is called. In this case the helper returns NULL and the calling bpf program must release the node in another way before terminating to avoid leaking memory. An example of such logic: 1 struct node_data *node = bpf_rbtree_alloc_node(&rbtree, ...); 2 struct node_data *ret = bpf_rbtree_add(&rbtree, node); 3 if (!ret) { 4 bpf_rbtree_free_node(node); 5 return 0; 6 } 7 bpf_trace_printk("%d\n", ret->id); However, current verifier logic does not allow this: after line 2, ref_obj_id of reg holding 'node' (BPF_REG_2) will be released via release_reference function, which will mark BPF_REG_2 and any other reg with same ref_obj_id as unknown. As a result neither ret nor node will point to anything useful if line 3's check passes. Additionally, since the reference is unconditionally released, the program would pass the verifier without the null check. This patch adds 'conditional release' semantics so that the verifier can handle the above example correctly. The CONDITIONAL_RELEASE type flag works in concert with the existing OBJ_RELEASE flag - the latter is used to tag an argument, while the new type flag is used to tag return type. If a helper has an OBJ_RELEASE arg type and CONDITIONAL_RELEASE return type, the helper is considered to use its return value to indicate success or failure of the release operation. NULL is returned if release fails, non-null otherwise. For my concrete usecase - bpf_rbtree_add - CONDITIONAL_RELEASE works in concert with OBJ_NON_OWNING_REF: successful release results in a non-owning reference being returned, allowing line 7 in above example. 
Instead of unconditionally releasing the OBJ_RELEASE reference when doing check_helper_call, for CONDITIONAL_RELEASE helpers the verifier will wait until the return value is checked for null. If not null, the reference is released; if null, no reference is released. Since other regs w/ same ref_obj_id were not marked unknown by check_helper_call, they can be used to release the reference via other means (line 4 above). It's necessary to prevent conditionally-released ref_obj_id regs from being used between the release helper and null check. For example:

  1  struct node_data *node = bpf_rbtree_alloc_node(&rbtree, ...);
  2  struct node_data *ret = bpf_rbtree_add(&rbtree, node);
  3  do_something_with_a_node(node);
  4  if (!ret) {
  5          bpf_rbtree_free_node(node);
  6          return 0;
  7  }

Line 3 shouldn't be allowed since node may have been released. The verifier tags all regs with the ref_obj_id of the conditionally-released arg in the period between the helper call and the null check for this reason.

Why no matching CONDITIONAL_ACQUIRE type flag? Existing verifier logic already treats acquire of an _OR_NULL type as a conditional acquire. Consider this code:

  1  struct thing *i = acquire_helper_that_returns_thing_or_null();
  2  if (!i)
  3          return 0;
  4  manipulate_thing(i);
  5  release_thing(i);

After line 1, BPF_REG_0 will have an _OR_NULL type and a ref_obj_id set. When the verifier sees line 2's conditional jump, existing logic in mark_ptr_or_null_regs, specifically the if:

        if (ref_obj_id && ref_obj_id == id && is_null)
                /* regs[regno] is in the " == NULL" branch.
                 * No one could have freed the reference state before
                 * doing the NULL check.
                 */
                WARN_ON_ONCE(release_reference_state(state, id));

will release the reference in the is_null state.

[ TODO: Either need to remove WARN_ON_ONCE there without adding CONDITIONAL_ACQUIRE flag or add the flag and don't WARN_ON_ONCE if it's set. Left out of first pass for simplicity's sake.
] Signed-off-by: Dave Marchevsky --- include/linux/bpf.h | 3 + include/linux/bpf_verifier.h | 1 + kernel/bpf/rbtree.c | 2 +- kernel/bpf/verifier.c | 122 +++++++++++++++++++++++++++++++---- 4 files changed, 113 insertions(+), 15 deletions(-) diff --git a/include/linux/bpf.h b/include/linux/bpf.h index c9c4b4fb019c..a601ab30a2b1 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -413,6 +413,9 @@ enum bpf_type_flag { MEM_FIXED_SIZE = BIT(10 + BPF_BASE_TYPE_BITS), OBJ_NON_OWNING_REF = BIT(11 + BPF_BASE_TYPE_BITS), + + CONDITIONAL_RELEASE = BIT(12 + BPF_BASE_TYPE_BITS), + __BPF_TYPE_FLAG_MAX, __BPF_TYPE_LAST_FLAG = __BPF_TYPE_FLAG_MAX - 1, }; diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h index 9c017575c034..bdc8c48c2343 100644 --- a/include/linux/bpf_verifier.h +++ b/include/linux/bpf_verifier.h @@ -313,6 +313,7 @@ struct bpf_verifier_state { u32 insn_idx; u32 curframe; u32 active_spin_lock; + u32 active_cond_ref_obj_id; bool speculative; /* first and last insn idx of this verifier state */ diff --git a/kernel/bpf/rbtree.c b/kernel/bpf/rbtree.c index 34864fc83209..dcf8f69d4ada 100644 --- a/kernel/bpf/rbtree.c +++ b/kernel/bpf/rbtree.c @@ -111,7 +111,7 @@ BPF_CALL_3(bpf_rbtree_add, struct bpf_map *, map, void *, value, void *, cb) const struct bpf_func_proto bpf_rbtree_add_proto = { .func = bpf_rbtree_add, .gpl_only = true, - .ret_type = RET_PTR_TO_BTF_ID_OR_NULL | OBJ_NON_OWNING_REF, + .ret_type = RET_PTR_TO_BTF_ID_OR_NULL | OBJ_NON_OWNING_REF | CONDITIONAL_RELEASE, .arg1_type = ARG_CONST_MAP_PTR, .arg2_type = ARG_PTR_TO_BTF_ID | OBJ_RELEASE, .arg2_btf_id = &bpf_rbtree_btf_ids[0], diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index 4f46b2dfbc4b..f80e161170de 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -472,6 +472,11 @@ static bool type_is_non_owning_ref(u32 type) return type & OBJ_NON_OWNING_REF; } +static bool type_is_cond_release(u32 type) +{ + return type & CONDITIONAL_RELEASE; +} + static bool type_may_be_null(u32 type) { return type & PTR_MAYBE_NULL; @@ -600,6 +605,15 @@ static const char *reg_type_str(struct bpf_verifier_env *env, postfix_idx += strlcpy(postfix + postfix_idx, "_non_own", 32 - postfix_idx); } + if (type_is_cond_release(type)) { + if (base_type(type) == PTR_TO_BTF_ID) + postfix_idx += strlcpy(postfix + postfix_idx, "cond_rel_", + 32 - postfix_idx); + else + postfix_idx += strlcpy(postfix + postfix_idx, "_cond_rel", + 32 - postfix_idx); + } + if (type & MEM_RDONLY) strncpy(prefix, "rdonly_", 32); if (type & MEM_ALLOC) @@ -1211,6 +1225,7 @@ static int copy_verifier_state(struct bpf_verifier_state *dst_state, dst_state->speculative = src->speculative; dst_state->curframe = src->curframe; dst_state->active_spin_lock = src->active_spin_lock; + dst_state->active_cond_ref_obj_id = src->active_cond_ref_obj_id; dst_state->branches = src->branches; dst_state->parent = src->parent; dst_state->first_insn_idx = src->first_insn_idx; @@ -1418,6 +1433,7 @@ static void mark_ptr_not_null_reg(struct bpf_reg_state *reg) return; } + reg->type &= ~CONDITIONAL_RELEASE; reg->type &= ~PTR_MAYBE_NULL; } @@ -6635,24 +6651,78 @@ static void release_reg_references(struct bpf_verifier_env *env, } } +static int __release_reference(struct bpf_verifier_env *env, struct bpf_verifier_state *vstate, + int ref_obj_id) +{ + int err; + int i; + + err = release_reference_state(vstate->frame[vstate->curframe], ref_obj_id); + if (err) + return err; + + for (i = 0; i <= vstate->curframe; i++) + release_reg_references(env, vstate->frame[i], 
ref_obj_id); + return 0; +} + /* The pointer with the specified id has released its reference to kernel * resources. Identify all copies of the same pointer and clear the reference. */ static int release_reference(struct bpf_verifier_env *env, int ref_obj_id) { - struct bpf_verifier_state *vstate = env->cur_state; - int err; + return __release_reference(env, env->cur_state, ref_obj_id); +} + +static void tag_reference_cond_release_regs(struct bpf_verifier_env *env, + struct bpf_func_state *state, + int ref_obj_id, + bool remove) +{ + struct bpf_reg_state *regs = state->regs, *reg; int i; - err = release_reference_state(cur_func(env), ref_obj_id); - if (err) - return err; + for (i = 0; i < MAX_BPF_REG; i++) + if (regs[i].ref_obj_id == ref_obj_id) { + if (remove) + regs[i].type &= ~CONDITIONAL_RELEASE; + else + regs[i].type |= CONDITIONAL_RELEASE; + } + + bpf_for_each_spilled_reg(i, state, reg) { + if (!reg) + continue; + if (reg->ref_obj_id == ref_obj_id) { + if (remove) + reg->type &= ~CONDITIONAL_RELEASE; + else + reg->type |= CONDITIONAL_RELEASE; + } + } +} + +static void tag_reference_cond_release(struct bpf_verifier_env *env, + int ref_obj_id) +{ + struct bpf_verifier_state *vstate = env->cur_state; + int i; for (i = 0; i <= vstate->curframe; i++) - release_reg_references(env, vstate->frame[i], ref_obj_id); + tag_reference_cond_release_regs(env, vstate->frame[i], + ref_obj_id, false); +} - return 0; +static void untag_reference_cond_release(struct bpf_verifier_env *env, + struct bpf_verifier_state *vstate, + int ref_obj_id) +{ + int i; + + for (i = 0; i <= vstate->curframe; i++) + tag_reference_cond_release_regs(env, vstate->frame[i], + ref_obj_id, true); } static void clear_non_owning_ref_regs(struct bpf_verifier_env *env, @@ -7406,7 +7476,17 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn if (arg_type_is_dynptr(fn->arg_type[meta.release_regno - BPF_REG_1])) err = unmark_stack_slots_dynptr(env, ®s[meta.release_regno]); else if (meta.ref_obj_id) - err = release_reference(env, meta.ref_obj_id); + if (type_is_cond_release(fn->ret_type)) { + if (env->cur_state->active_cond_ref_obj_id) { + verbose(env, "can't handle >1 cond_release\n"); + return err; + } + env->cur_state->active_cond_ref_obj_id = meta.ref_obj_id; + tag_reference_cond_release(env, meta.ref_obj_id); + err = 0; + } else { + err = release_reference(env, meta.ref_obj_id); + } /* meta.ref_obj_id can only be 0 if register that is meant to be * released is NULL, which must be > R0. */ @@ -10040,8 +10120,8 @@ static void __mark_ptr_or_null_regs(struct bpf_func_state *state, u32 id, /* The logic is similar to find_good_pkt_pointers(), both could eventually * be folded together at some point. 
*/ -static void mark_ptr_or_null_regs(struct bpf_verifier_state *vstate, u32 regno, - bool is_null) +static int mark_ptr_or_null_regs(struct bpf_verifier_state *vstate, u32 regno, + bool is_null, struct bpf_verifier_env *env) { struct bpf_func_state *state = vstate->frame[vstate->curframe]; struct bpf_reg_state *regs = state->regs; @@ -10056,8 +10136,19 @@ static void mark_ptr_or_null_regs(struct bpf_verifier_state *vstate, u32 regno, */ WARN_ON_ONCE(release_reference_state(state, id)); + if (type_is_cond_release(regs[regno].type)) { + if (!is_null) { + __release_reference(env, vstate, vstate->active_cond_ref_obj_id); + vstate->active_cond_ref_obj_id = 0; + } else { + untag_reference_cond_release(env, vstate, vstate->active_cond_ref_obj_id); + vstate->active_cond_ref_obj_id = 0; + } + } for (i = 0; i <= vstate->curframe; i++) __mark_ptr_or_null_regs(vstate->frame[i], id, is_null); + + return 0; } static bool try_match_pkt_pointers(const struct bpf_insn *insn, @@ -10365,10 +10456,10 @@ static int check_cond_jmp_op(struct bpf_verifier_env *env, /* Mark all identical registers in each branch as either * safe or unknown depending R == 0 or R != 0 conditional. */ - mark_ptr_or_null_regs(this_branch, insn->dst_reg, - opcode == BPF_JNE); - mark_ptr_or_null_regs(other_branch, insn->dst_reg, - opcode == BPF_JEQ); + err = mark_ptr_or_null_regs(this_branch, insn->dst_reg, + opcode == BPF_JNE, env); + err = mark_ptr_or_null_regs(other_branch, insn->dst_reg, + opcode == BPF_JEQ, env); } else if (!try_match_pkt_pointers(insn, dst_reg, ®s[insn->src_reg], this_branch, other_branch) && is_pointer_value(env, insn->dst_reg)) { @@ -11809,6 +11900,9 @@ static bool states_equal(struct bpf_verifier_env *env, if (old->active_spin_lock != cur->active_spin_lock) return false; + if (old->active_cond_ref_obj_id != cur->active_cond_ref_obj_id) + return false; + /* for states to be equal callsites have to be the same * and all frame states need to be equivalent */ From patchwork Fri Jul 22 18:34:37 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dave Marchevsky X-Patchwork-Id: 12926747 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3389EC433EF for ; Fri, 22 Jul 2022 18:35:10 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S236016AbiGVSfI (ORCPT ); Fri, 22 Jul 2022 14:35:08 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48566 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S236232AbiGVSfH (ORCPT ); Fri, 22 Jul 2022 14:35:07 -0400 Received: from mx0a-00082601.pphosted.com (mx0a-00082601.pphosted.com [67.231.145.42]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id CECB29FE28 for ; Fri, 22 Jul 2022 11:35:03 -0700 (PDT) Received: from pps.filterd (m0044010.ppops.net [127.0.0.1]) by mx0a-00082601.pphosted.com (8.17.1.5/8.17.1.5) with ESMTP id 26MHtoeG002335 for ; Fri, 22 Jul 2022 11:35:03 -0700 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fb.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : content-type; s=facebook; bh=9jUdTz2V+EjxQdQYojBl1SW1YXrs04wGf0Gm8IONLJ0=; b=ZqxsljRYOG8HAolqFQM9O4gcnc+0yg39P7PYVpP5IN5cdymFfJNVQm5c5ssFCxCC7hFT 
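[ Example (illustrative, not from the patch): how the two branches of the null check are treated under the conditional-release flow added in the previous patch, using the helper names from this series:

        struct node_data *ret;

        ret = (struct node_data *)bpf_rbtree_add(&rbtree, node, less);
        /* between the call and the null check, every reg sharing node's
         * ref_obj_id is tagged CONDITIONAL_RELEASE and may not be used
         */
        if (!ret) {
                /* null branch: the release did not happen, the program
                 * still owns node and must dispose of it, e.g.:
                 */
                bpf_rbtree_free_node(&rbtree, node);
        } else {
                /* non-null branch: the verifier drops node's ref_obj_id,
                 * and ret stays usable as a non-owning ref while the lock
                 * is held
                 */
        }
]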
eJ/i7gIs5nlYwM/fv5DXDgd8v6Wb7PYX7c5IyQ9HuDq10GptWWWmx7bIgdAw3EbligB2 FyU1QB7cs9rosUhfWQcjrQC5WD+LKhPioCU= Received: from mail.thefacebook.com ([163.114.132.120]) by mx0a-00082601.pphosted.com (PPS) with ESMTPS id 3hg0qp87hu-4 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT) for ; Fri, 22 Jul 2022 11:35:03 -0700 Received: from twshared10560.18.frc3.facebook.com (2620:10d:c085:208::f) by mail.thefacebook.com (2620:10d:c085:11d::4) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2375.28; Fri, 22 Jul 2022 11:35:02 -0700 Received: by devbig077.ldc1.facebook.com (Postfix, from userid 158236) id BBFC6AB6F1AB; Fri, 22 Jul 2022 11:34:51 -0700 (PDT) From: Dave Marchevsky To: CC: Alexei Starovoitov , Daniel Borkmann , Andrii Nakryiko , Kernel Team , Tejun Heo , Dave Marchevsky Subject: [RFC PATCH bpf-next 10/11] bpf: Introduce PTR_ITER and PTR_ITER_END type flags Date: Fri, 22 Jul 2022 11:34:37 -0700 Message-ID: <20220722183438.3319790-11-davemarchevsky@fb.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20220722183438.3319790-1-davemarchevsky@fb.com> References: <20220722183438.3319790-1-davemarchevsky@fb.com> MIME-Version: 1.0 X-FB-Internal: Safe X-Proofpoint-ORIG-GUID: CTAYvy-Gp9E05K1NLO0EgCQJvKofno1y X-Proofpoint-GUID: CTAYvy-Gp9E05K1NLO0EgCQJvKofno1y X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.883,Hydra:6.0.517,FMLib:17.11.122.1 definitions=2022-07-22_06,2022-07-21_02,2022-06-22_01 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net X-Patchwork-State: RFC Add two type flags, PTR_ITER and PTR_ITER_END, meant to support the following pattern for iterating data structures: struct node *elem = data_structure_iter_first(&some_map) while (elem) { do_something(); elem = data_structure_iter_next(&some_map, elem); } Currently the ret_type of both _first() and _next() helpers would be PTR_TO_BTF_ID_OR_NULL and the verifier would consider the loop to be infinite as it knows nothing about an arbitrary PTR_TO_BTF_ID value. However, if we can express "this PTR_TO_BTF_ID will eventually be NULL if used in iteration helpers" via a new type flag, the verifier can use this information to determine that the loop will terminate while still verifying the loop body. So for our contrived example above, _first() now returns a PTR_TO_BTF_ID_OR_NULL | PTR_ITER, which _next() expects as input, itself returning PTR_TO_BTF_ID_OR_NULL | PTR_ITER_END. The verifier does nothing special for PTR_ITER, so the first iteration of the example loop will result in both elem == NULL and elem != NULL branches being verified. When verifying the latter branch, elem will become a PTR_TO_BTF_ID_OR_NULL | PTR_ITER_END. When the verifier sees a reg holding PTR_ITER_END in a conditional jump it knows the reg type and val can be replaced with SCALAR 0. So in the example loop on 2nd iteration elem == NULL will be assumed and verification will continue with that branch only. [ TODO: * PTR_ITER needs to be associated with the helper that produced it, to prevent something like: struct node *elem = data_structure_iter_last(&some_map) while (elem) { do_something(); elem = data_structure_iter_next(&some_map, elem); } i.e. _first() and _next() should be used together, same with _last() and _prev(). * This is currently very unsafe. Per multiple conversations w/ Alexei and Andrii, there are probably some ways forward here, but may be more complex than it's worth. 
If so, can migrate to a callback-based approach with similar functionality . ] Signed-off-by: Dave Marchevsky --- include/linux/bpf.h | 3 ++ include/uapi/linux/bpf.h | 34 ++++++++++++++- kernel/bpf/helpers.c | 12 ++++++ kernel/bpf/rbtree.c | 78 ++++++++++++++++++++++++++++++++++ kernel/bpf/verifier.c | 53 +++++++++++++++++++++-- tools/include/uapi/linux/bpf.h | 34 ++++++++++++++- 6 files changed, 209 insertions(+), 5 deletions(-) diff --git a/include/linux/bpf.h b/include/linux/bpf.h index a601ab30a2b1..819778e05060 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -416,6 +416,9 @@ enum bpf_type_flag { CONDITIONAL_RELEASE = BIT(12 + BPF_BASE_TYPE_BITS), + PTR_ITER = BIT(13 + BPF_BASE_TYPE_BITS), + PTR_ITER_END = BIT(14 + BPF_BASE_TYPE_BITS), + __BPF_TYPE_FLAG_MAX, __BPF_TYPE_LAST_FLAG = __BPF_TYPE_FLAG_MAX - 1, }; diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index d21e2c99ea14..3869653c6aeb 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -5409,6 +5409,34 @@ union bpf_attr { * * Return * 0 + * + * long bpf_rbtree_first(struct bpf_map *map) + * Description + * Return the first node in the tree according to sort order + * + * Return + * If found, ptr to node, otherwise NULL + * + * long bpf_rbtree_last(struct bpf_map *map) + * Description + * Return the last node in the tree according to sort order + * + * Return + * If found, ptr to node, otherwise NULL + * + * long bpf_rbtree_next(struct bpf_map *map, void *cur) + * Description + * Return the next node in the tree according to sort order + * + * Return + * If found, ptr to node, otherwise NULL + * + * long bpf_rbtree_prev(struct bpf_map *map, void *cur) + * Description + * Return the previous node in the tree according to sort order + * + * Return + * If found, ptr to node, otherwise NULL */ #define __BPF_FUNC_MAPPER(FN) \ FN(unspec), \ @@ -5620,13 +5648,17 @@ union bpf_attr { FN(tcp_raw_check_syncookie_ipv4), \ FN(tcp_raw_check_syncookie_ipv6), \ FN(rbtree_alloc_node), \ - FN(rbtree_add), \ + FN(rbtree_add), \ FN(rbtree_find), \ FN(rbtree_remove), \ FN(rbtree_free_node), \ FN(rbtree_get_lock), \ FN(rbtree_lock), \ FN(rbtree_unlock), \ + FN(rbtree_first), \ + FN(rbtree_last), \ + FN(rbtree_next), \ + FN(rbtree_prev), \ /* */ /* integer value in 'imm' field of BPF_CALL instruction selects which helper diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c index fa2dba1dcec8..fdc505139b99 100644 --- a/kernel/bpf/helpers.c +++ b/kernel/bpf/helpers.c @@ -1592,6 +1592,10 @@ const struct bpf_func_proto bpf_rbtree_free_node_proto __weak; const struct bpf_func_proto bpf_rbtree_get_lock_proto __weak; const struct bpf_func_proto bpf_rbtree_lock_proto __weak; const struct bpf_func_proto bpf_rbtree_unlock_proto __weak; +const struct bpf_func_proto bpf_rbtree_first_proto __weak; +const struct bpf_func_proto bpf_rbtree_last_proto __weak; +const struct bpf_func_proto bpf_rbtree_next_proto __weak; +const struct bpf_func_proto bpf_rbtree_prev_proto __weak; const struct bpf_func_proto * bpf_base_func_proto(enum bpf_func_id func_id) @@ -1697,6 +1701,14 @@ bpf_base_func_proto(enum bpf_func_id func_id) return &bpf_rbtree_lock_proto; case BPF_FUNC_rbtree_unlock: return &bpf_rbtree_unlock_proto; + case BPF_FUNC_rbtree_first: + return &bpf_rbtree_first_proto; + case BPF_FUNC_rbtree_last: + return &bpf_rbtree_last_proto; + case BPF_FUNC_rbtree_next: + return &bpf_rbtree_next_proto; + case BPF_FUNC_rbtree_prev: + return &bpf_rbtree_prev_proto; default: break; } diff --git a/kernel/bpf/rbtree.c 
b/kernel/bpf/rbtree.c index dcf8f69d4ada..f1a06c340d88 100644 --- a/kernel/bpf/rbtree.c +++ b/kernel/bpf/rbtree.c @@ -140,6 +140,84 @@ const struct bpf_func_proto bpf_rbtree_find_proto = { .arg3_type = ARG_PTR_TO_FUNC, }; +BPF_CALL_1(bpf_rbtree_first, struct bpf_map *, map) +{ + struct bpf_rbtree *tree = container_of(map, struct bpf_rbtree, map); + + if (!__rbtree_lock_held(tree)) + return (u64)NULL; + + return (u64)rb_first_cached(&tree->root); +} + +const struct bpf_func_proto bpf_rbtree_first_proto = { + .func = bpf_rbtree_first, + .gpl_only = true, + .ret_type = RET_PTR_TO_BTF_ID_OR_NULL | PTR_ITER | OBJ_NON_OWNING_REF, + .ret_btf_id = &bpf_rbtree_btf_ids[0], + .arg1_type = ARG_CONST_MAP_PTR, +}; + +BPF_CALL_1(bpf_rbtree_last, struct bpf_map *, map) +{ + struct bpf_rbtree *tree = container_of(map, struct bpf_rbtree, map); + + if (!__rbtree_lock_held(tree)) + return (u64)NULL; + + return (u64)rb_last(&tree->root.rb_root); +} + +const struct bpf_func_proto bpf_rbtree_last_proto = { + .func = bpf_rbtree_last, + .gpl_only = true, + .ret_type = RET_PTR_TO_BTF_ID_OR_NULL | PTR_ITER | OBJ_NON_OWNING_REF, + .ret_btf_id = &bpf_rbtree_btf_ids[0], + .arg1_type = ARG_CONST_MAP_PTR, +}; + +BPF_CALL_2(bpf_rbtree_next, struct bpf_map *, map, void *, cur) +{ + struct rb_node *next = rb_next((struct rb_node *)cur); + struct bpf_rbtree *tree = container_of(map, struct bpf_rbtree, map); + + if (!__rbtree_lock_held(tree)) + return (u64)NULL; + + return (u64)next; +} + +const struct bpf_func_proto bpf_rbtree_next_proto = { + .func = bpf_rbtree_next, + .gpl_only = true, + .ret_type = RET_PTR_TO_BTF_ID_OR_NULL | PTR_ITER_END | OBJ_NON_OWNING_REF, + .ret_btf_id = &bpf_rbtree_btf_ids[0], + .arg1_type = ARG_CONST_MAP_PTR, + .arg2_type = ARG_PTR_TO_BTF_ID | PTR_ITER, + .arg2_btf_id = &bpf_rbtree_btf_ids[0], +}; + +BPF_CALL_2(bpf_rbtree_prev, struct bpf_map *, map, void *, cur) +{ + struct rb_node *node = (struct rb_node *)cur; + struct bpf_rbtree *tree = container_of(map, struct bpf_rbtree, map); + + if (!__rbtree_lock_held(tree)) + return (u64)NULL; + + return (u64)rb_prev(node); +} + +const struct bpf_func_proto bpf_rbtree_prev_proto = { + .func = bpf_rbtree_prev, + .gpl_only = true, + .ret_type = RET_PTR_TO_BTF_ID_OR_NULL | PTR_ITER_END | OBJ_NON_OWNING_REF, + .ret_btf_id = &bpf_rbtree_btf_ids[0], + .arg1_type = ARG_CONST_MAP_PTR, + .arg2_type = ARG_PTR_TO_BTF_ID | PTR_ITER, + .arg2_btf_id = &bpf_rbtree_btf_ids[0], +}; + /* Like rbtree_postorder_for_each_entry_safe, but 'pos' and 'n' are * 'rb_node *', so field name of rb_node within containing struct is not * needed. 
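[ Example (illustrative, not part of the diff): how these iterator helpers compose with the lock helpers in a program; this mirrors the iteration loop in the selftest added at the end of the series:

        int node_ct = 0;
        struct node_data *iter;

        bpf_rbtree_lock(bpf_rbtree_get_lock(&rbtree));

        iter = (struct node_data *)bpf_rbtree_first(&rbtree);
        while (iter) {
                /* iter comes back as PTR_ITER from _first() and as
                 * PTR_ITER_END from _next(), which is what lets the
                 * verifier conclude the loop terminates
                 */
                node_ct++;
                iter = (struct node_data *)bpf_rbtree_next(&rbtree, iter);
        }

        bpf_rbtree_unlock(bpf_rbtree_get_lock(&rbtree));
]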
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index f80e161170de..e1580a01cfe3 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -482,6 +482,11 @@ static bool type_may_be_null(u32 type) return type & PTR_MAYBE_NULL; } +static bool type_is_iter(u32 type) +{ + return type & PTR_ITER || type & PTR_ITER_END; +} + static bool is_acquire_function(enum bpf_func_id func_id, const struct bpf_map *map) { @@ -547,14 +552,20 @@ static bool function_manipulates_rbtree_node(enum bpf_func_id func_id) { return func_id == BPF_FUNC_rbtree_add || func_id == BPF_FUNC_rbtree_remove || - func_id == BPF_FUNC_rbtree_free_node; + func_id == BPF_FUNC_rbtree_free_node || + func_id == BPF_FUNC_rbtree_next || + func_id == BPF_FUNC_rbtree_prev; } static bool function_returns_rbtree_node(enum bpf_func_id func_id) { return func_id == BPF_FUNC_rbtree_alloc_node || func_id == BPF_FUNC_rbtree_add || - func_id == BPF_FUNC_rbtree_remove; + func_id == BPF_FUNC_rbtree_remove || + func_id == BPF_FUNC_rbtree_first || + func_id == BPF_FUNC_rbtree_last || + func_id == BPF_FUNC_rbtree_next || + func_id == BPF_FUNC_rbtree_prev; } /* string representation of 'enum bpf_reg_type' @@ -614,6 +625,12 @@ static const char *reg_type_str(struct bpf_verifier_env *env, 32 - postfix_idx); } + if (type_is_iter(type)) { + postfix_idx += strlcpy(postfix + postfix_idx, "_iter", 32 - postfix_idx); + if (type & PTR_ITER_END) + postfix_idx += strlcpy(postfix + postfix_idx, "_end", 32 - postfix_idx); + } + if (type & MEM_RDONLY) strncpy(prefix, "rdonly_", 32); if (type & MEM_ALLOC) @@ -1428,7 +1445,7 @@ static void mark_ptr_not_null_reg(struct bpf_reg_state *reg) map->map_type == BPF_MAP_TYPE_SOCKHASH) { reg->type = PTR_TO_SOCKET; } else { - reg->type = PTR_TO_MAP_VALUE; + reg->type &= ~PTR_MAYBE_NULL; } return; } @@ -3021,6 +3038,11 @@ static bool __is_pointer_value(bool allow_ptr_leaks, return reg->type != SCALAR_VALUE; } +static bool __is_iter_end(const struct bpf_reg_state *reg) +{ + return reg->type & PTR_ITER_END; +} + static void save_register_state(struct bpf_func_state *state, int spi, struct bpf_reg_state *reg, int size) @@ -5666,6 +5688,8 @@ static const struct bpf_reg_types map_key_value_types = { PTR_TO_PACKET_META, PTR_TO_MAP_KEY, PTR_TO_MAP_VALUE, + PTR_TO_MAP_VALUE | PTR_ITER, + PTR_TO_MAP_VALUE | PTR_ITER_END, }, }; @@ -5793,6 +5817,17 @@ static int check_reg_type(struct bpf_verifier_env *env, u32 regno, if (arg_type & PTR_MAYBE_NULL) type &= ~PTR_MAYBE_NULL; + /* TYPE | PTR_ITER is valid input for helpers that expect TYPE + * TYPE is not valid input for helpers that expect TYPE | PTR_ITER + */ + if (type_is_iter(arg_type)) { + if (!type_is_iter(type)) + goto not_found; + + type &= ~PTR_ITER; + type &= ~PTR_ITER_END; + } + for (i = 0; i < ARRAY_SIZE(compatible->types); i++) { expected = compatible->types[i]; if (expected == NOT_INIT) @@ -5802,6 +5837,7 @@ static int check_reg_type(struct bpf_verifier_env *env, u32 regno, goto found; } +not_found: verbose(env, "R%d type=%s expected=", regno, reg_type_str(env, reg->type)); for (j = 0; j + 1 < i; j++) verbose(env, "%s, ", reg_type_str(env, compatible->types[j])); @@ -9762,6 +9798,17 @@ static int is_branch_taken(struct bpf_reg_state *reg, u64 val, u8 opcode, bool is_jmp32) { if (__is_pointer_value(false, reg)) { + if (__is_iter_end(reg) && val == 0) { + __mark_reg_const_zero(reg); + switch (opcode) { + case BPF_JEQ: + return 1; + case BPF_JNE: + return 0; + default: + return -1; + } + } if (!reg_type_not_null(reg->type)) return -1; diff --git 
a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h index d21e2c99ea14..3869653c6aeb 100644 --- a/tools/include/uapi/linux/bpf.h +++ b/tools/include/uapi/linux/bpf.h @@ -5409,6 +5409,34 @@ union bpf_attr { * * Return * 0 + * + * long bpf_rbtree_first(struct bpf_map *map) + * Description + * Return the first node in the tree according to sort order + * + * Return + * If found, ptr to node, otherwise NULL + * + * long bpf_rbtree_last(struct bpf_map *map) + * Description + * Return the last node in the tree according to sort order + * + * Return + * If found, ptr to node, otherwise NULL + * + * long bpf_rbtree_next(struct bpf_map *map, void *cur) + * Description + * Return the next node in the tree according to sort order + * + * Return + * If found, ptr to node, otherwise NULL + * + * long bpf_rbtree_prev(struct bpf_map *map, void *cur) + * Description + * Return the previous node in the tree according to sort order + * + * Return + * If found, ptr to node, otherwise NULL */ #define __BPF_FUNC_MAPPER(FN) \ FN(unspec), \ @@ -5620,13 +5648,17 @@ union bpf_attr { FN(tcp_raw_check_syncookie_ipv4), \ FN(tcp_raw_check_syncookie_ipv6), \ FN(rbtree_alloc_node), \ - FN(rbtree_add), \ + FN(rbtree_add), \ FN(rbtree_find), \ FN(rbtree_remove), \ FN(rbtree_free_node), \ FN(rbtree_get_lock), \ FN(rbtree_lock), \ FN(rbtree_unlock), \ + FN(rbtree_first), \ + FN(rbtree_last), \ + FN(rbtree_next), \ + FN(rbtree_prev), \ /* */ /* integer value in 'imm' field of BPF_CALL instruction selects which helper From patchwork Fri Jul 22 18:34:38 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dave Marchevsky X-Patchwork-Id: 12926749 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8ED97C433EF for ; Fri, 22 Jul 2022 18:35:17 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235934AbiGVSfQ (ORCPT ); Fri, 22 Jul 2022 14:35:16 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48612 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235739AbiGVSfP (ORCPT ); Fri, 22 Jul 2022 14:35:15 -0400 Received: from mx0a-00082601.pphosted.com (mx0a-00082601.pphosted.com [67.231.145.42]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id AA2B2222A7 for ; Fri, 22 Jul 2022 11:35:13 -0700 (PDT) Received: from pps.filterd (m0109333.ppops.net [127.0.0.1]) by mx0a-00082601.pphosted.com (8.17.1.5/8.17.1.5) with ESMTP id 26M6LSkC011327 for ; Fri, 22 Jul 2022 11:35:13 -0700 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fb.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : content-type; s=facebook; bh=vWJ04e9HlRIugBxVyk5DZsLK/c/mhW8RV268HwXg76A=; b=ddT5W0rZ1pHh/+L3rmzt2RETlcJHPigdf1BtkN4kfVlUTwNxPx98zaNsB84ieC6BP+eV l+7T/Nwbywgoz7N364BmGlDiDMePaybK3Y49Z4yOw/gjX6YSlqQZ51as42C4bAmkX4/2 dsIwm7zlW7+dFQ/r0/JNNsZHAw4MGIo6g0Q= Received: from maileast.thefacebook.com ([163.114.130.16]) by mx0a-00082601.pphosted.com (PPS) with ESMTPS id 3hfphrbkut-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT) for ; Fri, 22 Jul 2022 11:35:13 -0700 Received: from twshared25107.07.ash9.facebook.com (2620:10d:c0a8:1b::d) by mail.thefacebook.com (2620:10d:c0a8:83::6) with 
Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2375.28; Fri, 22 Jul 2022 11:35:12 -0700 Received: by devbig077.ldc1.facebook.com (Postfix, from userid 158236) id 480DAAB6F1AD; Fri, 22 Jul 2022 11:34:52 -0700 (PDT) From: Dave Marchevsky To: CC: Alexei Starovoitov , Daniel Borkmann , Andrii Nakryiko , Kernel Team , Tejun Heo , Dave Marchevsky Subject: [RFC PATCH bpf-next 11/11] selftests/bpf: Add rbtree map tests Date: Fri, 22 Jul 2022 11:34:38 -0700 Message-ID: <20220722183438.3319790-12-davemarchevsky@fb.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20220722183438.3319790-1-davemarchevsky@fb.com> References: <20220722183438.3319790-1-davemarchevsky@fb.com> MIME-Version: 1.0 X-FB-Internal: Safe X-Proofpoint-ORIG-GUID: ldDKxP-4pYW3dnpEVLIzUFYbEywDLCpo X-Proofpoint-GUID: ldDKxP-4pYW3dnpEVLIzUFYbEywDLCpo X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.883,Hydra:6.0.517,FMLib:17.11.122.1 definitions=2022-07-22_06,2022-07-21_02,2022-06-22_01 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net X-Patchwork-State: RFC Add tests demonstrating happy path of rbtree map usage as well as exercising numerous failure paths and conditions. Structure of failing test runner is based on dynptr tests. Signed-off-by: Dave Marchevsky --- .../selftests/bpf/prog_tests/rbtree_map.c | 164 ++++++++++++ .../testing/selftests/bpf/progs/rbtree_map.c | 111 ++++++++ .../selftests/bpf/progs/rbtree_map_fail.c | 236 ++++++++++++++++++ .../bpf/progs/rbtree_map_load_fail.c | 24 ++ 4 files changed, 535 insertions(+) create mode 100644 tools/testing/selftests/bpf/prog_tests/rbtree_map.c create mode 100644 tools/testing/selftests/bpf/progs/rbtree_map.c create mode 100644 tools/testing/selftests/bpf/progs/rbtree_map_fail.c create mode 100644 tools/testing/selftests/bpf/progs/rbtree_map_load_fail.c diff --git a/tools/testing/selftests/bpf/prog_tests/rbtree_map.c b/tools/testing/selftests/bpf/prog_tests/rbtree_map.c new file mode 100644 index 000000000000..17cadcd05ee4 --- /dev/null +++ b/tools/testing/selftests/bpf/prog_tests/rbtree_map.c @@ -0,0 +1,164 @@ +// SPDX-License-Identifier: GPL-2.0 +/* Copyright (c) 2022 Meta Platforms, Inc. and affiliates. 
*/ + +#include +#include +#include "rbtree_map.skel.h" +#include "rbtree_map_fail.skel.h" +#include "rbtree_map_load_fail.skel.h" + +static size_t log_buf_sz = 1048576; /* 1 MB */ +static char obj_log_buf[1048576]; + +static struct { + const char *prog_name; + const char *expected_err_msg; +} rbtree_prog_load_fail_tests[] = { + {"rb_node__field_store", "only read is supported"}, + {"rb_node__alloc_no_add", "Unreleased reference id=2 alloc_insn=3"}, + {"rb_node__two_alloc_one_add", "Unreleased reference id=2 alloc_insn=3"}, + {"rb_node__remove_no_free", "Unreleased reference id=5 alloc_insn=28"}, + {"rb_tree__add_wrong_type", "rbtree: R2 is of type task_struct but node_data is expected"}, + {"rb_tree__conditional_release_helper_usage", + "R2 type=ptr_cond_rel_ expected=ptr_"}, +}; + +void test_rbtree_map_load_fail(void) +{ + struct rbtree_map_load_fail *skel; + + skel = rbtree_map_load_fail__open_and_load(); + if (!ASSERT_ERR_PTR(skel, "rbtree_map_load_fail__open_and_load")) + rbtree_map_load_fail__destroy(skel); +} + +static void verify_fail(const char *prog_name, const char *expected_err_msg) +{ + LIBBPF_OPTS(bpf_object_open_opts, opts); + struct rbtree_map_fail *skel; + struct bpf_program *prog; + int err; + + opts.kernel_log_buf = obj_log_buf; + opts.kernel_log_size = log_buf_sz; + opts.kernel_log_level = 1; + + skel = rbtree_map_fail__open_opts(&opts); + if (!ASSERT_OK_PTR(skel, "rbtree_map_fail__open_opts")) + goto cleanup; + + prog = bpf_object__find_program_by_name(skel->obj, prog_name); + if (!ASSERT_OK_PTR(prog, "bpf_object__find_program_by_name")) + goto cleanup; + + bpf_program__set_autoload(prog, true); + err = rbtree_map_fail__load(skel); + if (!ASSERT_ERR(err, "unexpected load success")) + goto cleanup; + + if (!ASSERT_OK_PTR(strstr(obj_log_buf, expected_err_msg), "expected_err_msg")) { + fprintf(stderr, "Expected err_msg: %s\n", expected_err_msg); + fprintf(stderr, "Verifier output: %s\n", obj_log_buf); + } + +cleanup: + rbtree_map_fail__destroy(skel); +} + +void test_rbtree_map_alloc_node__size_too_small(void) +{ + struct rbtree_map_fail *skel; + struct bpf_program *prog; + struct bpf_link *link; + int err; + + skel = rbtree_map_fail__open(); + if (!ASSERT_OK_PTR(skel, "rbtree_map_fail__open")) + goto cleanup; + + prog = skel->progs.alloc_node__size_too_small; + bpf_program__set_autoload(prog, true); + + err = rbtree_map_fail__load(skel); + if (!ASSERT_OK(err, "unexpected load fail")) + goto cleanup; + + link = bpf_program__attach(skel->progs.alloc_node__size_too_small); + if (!ASSERT_OK_PTR(link, "link")) + goto cleanup; + + syscall(SYS_getpgid); + + ASSERT_EQ(skel->bss->size_too_small__alloc_fail, 1, "alloc_fail"); + + bpf_link__destroy(link); +cleanup: + rbtree_map_fail__destroy(skel); +} + +void test_rbtree_map_add_node__no_lock(void) +{ + struct rbtree_map_fail *skel; + struct bpf_program *prog; + struct bpf_link *link; + int err; + + skel = rbtree_map_fail__open(); + if (!ASSERT_OK_PTR(skel, "rbtree_map_fail__open")) + goto cleanup; + + prog = skel->progs.add_node__no_lock; + bpf_program__set_autoload(prog, true); + + err = rbtree_map_fail__load(skel); + if (!ASSERT_OK(err, "unexpected load fail")) + goto cleanup; + + link = bpf_program__attach(skel->progs.add_node__no_lock); + if (!ASSERT_OK_PTR(link, "link")) + goto cleanup; + + syscall(SYS_getpgid); + + ASSERT_EQ(skel->bss->no_lock_add__fail, 1, "alloc_fail"); + + bpf_link__destroy(link); +cleanup: + rbtree_map_fail__destroy(skel); +} + +void test_rbtree_map_prog_load_fail(void) +{ + int i; + + for (i = 0; i < 
ARRAY_SIZE(rbtree_prog_load_fail_tests); i++) { + if (!test__start_subtest(rbtree_prog_load_fail_tests[i].prog_name)) + continue; + + verify_fail(rbtree_prog_load_fail_tests[i].prog_name, + rbtree_prog_load_fail_tests[i].expected_err_msg); + } +} + +void test_rbtree_map(void) +{ + struct rbtree_map *skel; + struct bpf_link *link; + + skel = rbtree_map__open_and_load(); + if (!ASSERT_OK_PTR(skel, "rbtree_map__open_and_load")) + goto cleanup; + + link = bpf_program__attach(skel->progs.check_rbtree); + if (!ASSERT_OK_PTR(link, "link")) + goto cleanup; + + for (int i = 0; i < 100; i++) + syscall(SYS_getpgid); + + ASSERT_EQ(skel->bss->calls, 100, "calls_equal"); + + bpf_link__destroy(link); +cleanup: + rbtree_map__destroy(skel); +} diff --git a/tools/testing/selftests/bpf/progs/rbtree_map.c b/tools/testing/selftests/bpf/progs/rbtree_map.c new file mode 100644 index 000000000000..0cd467838f6e --- /dev/null +++ b/tools/testing/selftests/bpf/progs/rbtree_map.c @@ -0,0 +1,111 @@ +// SPDX-License-Identifier: GPL-2.0 +/* Copyright (c) 2022 Meta Platforms, Inc. and affiliates. */ + +#include "vmlinux.h" +#include +#include "bpf_misc.h" + +struct node_data { + struct rb_node node; + __u32 one; + __u32 two; +}; + +struct { + __uint(type, BPF_MAP_TYPE_RBTREE); + __type(value, struct node_data); +} rbtree SEC(".maps"); + +long calls; + +static bool less(struct rb_node *a, const struct rb_node *b) +{ + struct node_data *node_a; + struct node_data *node_b; + + node_a = container_of(a, struct node_data, node); + node_b = container_of(b, struct node_data, node); + + return node_a->one < node_b->one; +} + +// Key = node_datq +static int cmp(const void *key, const struct rb_node *b) +{ + struct node_data *node_a; + struct node_data *node_b; + + node_a = container_of(key, struct node_data, node); + node_b = container_of(b, struct node_data, node); + + return node_b->one - node_a->one; +} + +// Key = just node_data.one +static int cmp2(const void *key, const struct rb_node *b) +{ + __u32 one; + struct node_data *node_b; + + one = *(__u32 *)key; + node_b = container_of(b, struct node_data, node); + + return node_b->one - one; +} + +SEC("fentry/" SYS_PREFIX "sys_getpgid") +int check_rbtree(void *ctx) +{ + struct node_data *node, *found, *ret; + struct node_data popped; + struct node_data search; + __u32 search2; + + node = bpf_rbtree_alloc_node(&rbtree, sizeof(struct node_data)); + if (!node) + return 0; + + node->one = calls; + node->two = 6; + bpf_rbtree_lock(bpf_rbtree_get_lock(&rbtree)); + + ret = (struct node_data *)bpf_rbtree_add(&rbtree, node, less); + if (!ret) { + bpf_rbtree_free_node(&rbtree, node); + goto unlock_ret; + } + + bpf_rbtree_unlock(bpf_rbtree_get_lock(&rbtree)); + + bpf_rbtree_lock(bpf_rbtree_get_lock(&rbtree)); + + search.one = calls; + found = (struct node_data *)bpf_rbtree_find(&rbtree, &search, cmp); + if (!found) + goto unlock_ret; + + int node_ct = 0; + struct node_data *iter = (struct node_data *)bpf_rbtree_first(&rbtree); + + while (iter) { + node_ct++; + iter = (struct node_data *)bpf_rbtree_next(&rbtree, iter); + } + + ret = (struct node_data *)bpf_rbtree_remove(&rbtree, found); + if (!ret) + goto unlock_ret; + + bpf_rbtree_unlock(bpf_rbtree_get_lock(&rbtree)); + + bpf_rbtree_free_node(&rbtree, ret); + + __sync_fetch_and_add(&calls, 1); + return 0; + +unlock_ret: + bpf_rbtree_unlock(bpf_rbtree_get_lock(&rbtree)); + return 0; +} + +char _license[] SEC("license") = "GPL"; diff --git a/tools/testing/selftests/bpf/progs/rbtree_map_fail.c 
b/tools/testing/selftests/bpf/progs/rbtree_map_fail.c new file mode 100644 index 000000000000..287c8db080d8 --- /dev/null +++ b/tools/testing/selftests/bpf/progs/rbtree_map_fail.c @@ -0,0 +1,236 @@ +// SPDX-License-Identifier: GPL-2.0 +/* Copyright (c) 2022 Meta Platforms, Inc. and affiliates. */ + +#include "vmlinux.h" +#include +#include "bpf_misc.h" + +struct node_data { + struct rb_node node; + __u32 one; + __u32 two; +}; + +struct { + __uint(type, BPF_MAP_TYPE_RBTREE); + __type(value, struct node_data); +} rbtree SEC(".maps"); + +long calls; + +static bool less(struct rb_node *a, const struct rb_node *b) +{ + struct node_data *node_a; + struct node_data *node_b; + + node_a = container_of(a, struct node_data, node); + node_b = container_of(b, struct node_data, node); + + return node_a->one < node_b->one; +} + +// Key = node_datq +static int cmp(const void *key, const struct rb_node *b) +{ + struct node_data *node_a; + struct node_data *node_b; + + node_a = container_of(key, struct node_data, node); + node_b = container_of(b, struct node_data, node); + + return node_b->one - node_a->one; +} + +long size_too_small__alloc_fail; + +SEC("?fentry/" SYS_PREFIX "sys_getpgid") +int alloc_node__size_too_small(void *ctx) +{ + struct node_data *node, *ret; + + node = bpf_rbtree_alloc_node(&rbtree, sizeof(char)); + if (!node) { + size_too_small__alloc_fail++; + return 0; + } + + bpf_rbtree_lock(bpf_rbtree_get_lock(&rbtree)); + /* will never execute, alloc_node should fail */ + node->one = 1; + ret = bpf_rbtree_add(&rbtree, node, less); + if (!ret) { + bpf_rbtree_free_node(&rbtree, node); + goto unlock_ret; + } + +unlock_ret: + bpf_rbtree_unlock(bpf_rbtree_get_lock(&rbtree)); + return 0; +} + +long no_lock_add__fail; + +SEC("?fentry/" SYS_PREFIX "sys_getpgid") +int add_node__no_lock(void *ctx) +{ + struct node_data *node, *ret; + + node = bpf_rbtree_alloc_node(&rbtree, sizeof(struct node_data)); + if (!node) + return 0; + + node->one = 1; + ret = bpf_rbtree_add(&rbtree, node, less); + if (!ret) { + no_lock_add__fail++; + /* will always execute, rbtree_add should fail + * because no lock held + */ + bpf_rbtree_free_node(&rbtree, node); + } + +unlock_ret: + return 0; +} + +SEC("?fentry/" SYS_PREFIX "sys_getpgid") +int rb_node__field_store(void *ctx) +{ + struct node_data *node; + + node = bpf_rbtree_alloc_node(&rbtree, sizeof(struct node_data)); + if (!node) + return 0; + + /* Only rbtree_map helpers can modify rb_node field */ + node->node.rb_left = NULL; + return 0; +} + +SEC("?fentry/" SYS_PREFIX "sys_getpgid") +int rb_node__alloc_no_add(void *ctx) +{ + struct node_data *node; + + node = bpf_rbtree_alloc_node(&rbtree, sizeof(struct node_data)); + if (!node) + return 0; + /* The node alloc'd above is never added to the rbtree. It must be + * added or free'd before prog terminates. + */ + + node->one = 42; + return 0; +} + +SEC("?fentry/" SYS_PREFIX "sys_getpgid") +int rb_node__two_alloc_one_add(void *ctx) +{ + struct node_data *node, *ret; + + node = bpf_rbtree_alloc_node(&rbtree, sizeof(struct node_data)); + if (!node) + return 0; + node->one = 1; + /* The node alloc'd above is never added to the rbtree. It must be + * added or free'd before prog terminates. 
+ */ + + node = bpf_rbtree_alloc_node(&rbtree, sizeof(struct node_data)); + if (!node) + return 0; + node->one = 42; + + bpf_rbtree_lock(bpf_rbtree_get_lock(&rbtree)); + + ret = bpf_rbtree_add(&rbtree, node, less); + if (!ret) { + bpf_rbtree_free_node(&rbtree, node); + goto unlock_ret; + } + +unlock_ret: + bpf_rbtree_unlock(bpf_rbtree_get_lock(&rbtree)); + return 0; +} + +SEC("?fentry/" SYS_PREFIX "sys_getpgid") +int rb_node__remove_no_free(void *ctx) +{ + struct node_data *node, *ret; + + node = bpf_rbtree_alloc_node(&rbtree, sizeof(struct node_data)); + if (!node) + return 0; + node->one = 42; + + bpf_rbtree_lock(bpf_rbtree_get_lock(&rbtree)); + + ret = bpf_rbtree_add(&rbtree, node, less); + if (!ret) { + bpf_rbtree_free_node(&rbtree, node); + goto unlock_ret; + } + + ret = bpf_rbtree_remove(&rbtree, ret); + if (!ret) + goto unlock_ret; + /* At this point we've successfully acquired a reference from + * bpf_rbtree_remove. It must be released via rbtree_add or + * rbtree_free_node before prog terminates. + */ + +unlock_ret: + bpf_rbtree_unlock(bpf_rbtree_get_lock(&rbtree)); + return 0; +} + +SEC("?fentry/" SYS_PREFIX "sys_getpgid") +int rb_tree__add_wrong_type(void *ctx) +{ + /* Can't add a task_struct to rbtree + */ + struct task_struct *task; + struct node_data *ret; + + task = bpf_get_current_task_btf(); + + bpf_rbtree_lock(bpf_rbtree_get_lock(&rbtree)); + + ret = bpf_rbtree_add(&rbtree, task, less); + /* Verifier should fail at bpf_rbtree_add, so don't bother handling + * failure. + */ + + bpf_rbtree_unlock(bpf_rbtree_get_lock(&rbtree)); + return 0; +} + +SEC("?fentry/" SYS_PREFIX "sys_getpgid") +int rb_tree__conditional_release_helper_usage(void *ctx) +{ + struct node_data *node, *ret; + + node = bpf_rbtree_alloc_node(&rbtree, sizeof(struct node_data)); + if (!node) + return 0; + node->one = 42; + + bpf_rbtree_lock(bpf_rbtree_get_lock(&rbtree)); + + ret = bpf_rbtree_add(&rbtree, node, less); + /* Verifier should fail when trying to use CONDITIONAL_RELEASE + * type in a helper + */ + bpf_rbtree_free_node(&rbtree, node); + if (!ret) { + bpf_rbtree_free_node(&rbtree, node); + goto unlock_ret; + } + +unlock_ret: + bpf_rbtree_unlock(bpf_rbtree_get_lock(&rbtree)); + return 0; +} + +char _license[] SEC("license") = "GPL"; diff --git a/tools/testing/selftests/bpf/progs/rbtree_map_load_fail.c b/tools/testing/selftests/bpf/progs/rbtree_map_load_fail.c new file mode 100644 index 000000000000..036558eedd66 --- /dev/null +++ b/tools/testing/selftests/bpf/progs/rbtree_map_load_fail.c @@ -0,0 +1,24 @@ +// SPDX-License-Identifier: GPL-2.0 +/* Copyright (c) 2022 Meta Platforms, Inc. and affiliates. */ + +#include "vmlinux.h" +#include + +struct node_data_no_rb_node { + __u64 one; + __u64 two; + __u64 three; + __u64 four; + __u64 five; + __u64 six; + __u64 seven; +}; + +/* Should fail because value struct has no rb_node + */ +struct { + __uint(type, BPF_MAP_TYPE_RBTREE); + __type(value, struct node_data_no_rb_node); +} rbtree_fail_no_rb_node SEC(".maps"); + +char _license[] SEC("license") = "GPL";
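For reference, the happy-path traversal that check_rbtree exercises above can be reduced to a short BPF program that walks the tree in sort order with the new bpf_rbtree_first()/bpf_rbtree_next() helpers. This is a minimal sketch, not part of the series: it assumes the headers patched earlier in this series (BPF_MAP_TYPE_RBTREE and the bpf_rbtree_* helper declarations) plus the selftests' bpf_misc.h for SYS_PREFIX; count_nodes and total_nodes are illustrative names, not identifiers from the patches.

#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include "bpf_misc.h"

struct node_data {
	struct rb_node node;
	__u32 one;
	__u32 two;
};

struct {
	__uint(type, BPF_MAP_TYPE_RBTREE);
	__type(value, struct node_data);
} rbtree SEC(".maps");

long total_nodes;

SEC("fentry/" SYS_PREFIX "sys_getpgid")
int count_nodes(void *ctx)
{
	struct node_data *iter;
	long cnt = 0;

	/* As in check_rbtree, traversal helpers are called with the
	 * tree's lock held.
	 */
	bpf_rbtree_lock(bpf_rbtree_get_lock(&rbtree));

	iter = (struct node_data *)bpf_rbtree_first(&rbtree);
	while (iter) {
		cnt++;
		iter = (struct node_data *)bpf_rbtree_next(&rbtree, iter);
	}

	bpf_rbtree_unlock(bpf_rbtree_get_lock(&rbtree));

	total_nodes = cnt;
	return 0;
}

char _license[] SEC("license") = "GPL";

Note that, as check_rbtree demonstrates, nodes returned by the traversal helpers are only read and are not released by the program; the release obligations exercised by the failure tests apply to nodes obtained from bpf_rbtree_alloc_node() and bpf_rbtree_remove(), which must reach bpf_rbtree_add() or bpf_rbtree_free_node() before the program exits.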