From patchwork Tue Aug 30 17:27:42 2022
From: Dave Marchevsky
Subject: [RFCv2 PATCH bpf-next 01/18] bpf: Add verifier support for custom callback return range
Date: Tue, 30 Aug 2022 10:27:42 -0700
Message-ID: <20220830172759.4069786-2-davemarchevsky@fb.com>
In-Reply-To: <20220830172759.4069786-1-davemarchevsky@fb.com>

Verifier logic to confirm that a callback function returns 0 or 1 was added in commit 69c087ba6225b ("bpf: Add bpf_for_each_map_elem() helper"). At the time, callback return value was only used to continue or stop iteration.
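For context, a minimal sketch of the contract this patch generalizes (callback and variable names are illustrative, not from the kernel tree): a bpf_for_each_map_elem() callback today may only return 0 or 1, and the verifier enforces that range when the callback exits.

/* Existing contract: return 0 to continue iterating, 1 to stop.
 * Any other R0 value at callback exit is rejected by the verifier.
 */
static long stop_after_ten(struct bpf_map *map, __u32 *key, __u64 *val, void *ctx)
{
	__u64 *seen = ctx;

	if (++(*seen) >= 10)
		return 1;	/* non-zero: stop iterating */
	return 0;		/* zero: continue to the next element */
}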
In order to support callbacks with a broader return value range, such as those added further in this series, add a callback_ret_range to bpf_func_state. Verifier's helpers which set in_callback_fn will also set the new field, which the verifier will later use to check return value bounds. Default to tnum_range(0, 1) instead of using tnum_unknown as a sentinel value as the latter would prevent the valid range (0, U64_MAX) being used. Signed-off-by: Dave Marchevsky --- include/linux/bpf_verifier.h | 1 + kernel/bpf/verifier.c | 4 +++- 2 files changed, 4 insertions(+), 1 deletion(-) diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h index 2e3bad8640dc..9c017575c034 100644 --- a/include/linux/bpf_verifier.h +++ b/include/linux/bpf_verifier.h @@ -237,6 +237,7 @@ struct bpf_func_state { */ u32 async_entry_cnt; bool in_callback_fn; + struct tnum callback_ret_range; bool in_async_callback_fn; /* The following fields should be last. See copy_func_state() */ diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index 9bef8b41e737..68bfa7c28048 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -1745,6 +1745,7 @@ static void init_func_state(struct bpf_verifier_env *env, state->callsite = callsite; state->frameno = frameno; state->subprogno = subprogno; + state->callback_ret_range = tnum_range(0, 1); init_reg_state(env, state); mark_verifier_state_scratched(env); } @@ -6879,6 +6880,7 @@ static int set_find_vma_callback_state(struct bpf_verifier_env *env, __mark_reg_not_init(env, &callee->regs[BPF_REG_4]); __mark_reg_not_init(env, &callee->regs[BPF_REG_5]); callee->in_callback_fn = true; + callee->callback_ret_range = tnum_range(0, 1); return 0; } @@ -6906,7 +6908,7 @@ static int prepare_func_exit(struct bpf_verifier_env *env, int *insn_idx) caller = state->frame[state->curframe]; if (callee->in_callback_fn) { /* enforce R0 return value range [0, 1]. 
*/ - struct tnum range = tnum_range(0, 1); + struct tnum range = callee->callback_ret_range; if (r0->type != SCALAR_VALUE) { verbose(env, "R0 not a scalar value\n");
From patchwork Tue Aug 30 17:27:43 2022
From: Dave Marchevsky
Subject: [RFCv2 PATCH bpf-next 02/18] bpf: Add verifier check for BPF_PTR_POISON retval and arg
Date: Tue, 30 Aug 2022 10:27:43 -0700
Message-ID: <20220830172759.4069786-3-davemarchevsky@fb.com>
In-Reply-To: <20220830172759.4069786-1-davemarchevsky@fb.com>

BPF_PTR_POISON was added in commit c0a5a21c25f37 ("bpf: Allow storing referenced kptr in map") to denote a bpf_func_proto btf_id which the verifier will replace with a
dynamically-determined one at verification time. This patch adds 'poison' functionality to BPF_PTR_POISON in order to prepare for expanded use of the value to poison ret- and arg-btf_id in further patches of this series. Specifically, when the verifier checks helper calls, it assumes that BPF_PTR_POISON'ed ret type will be replaced with a valid type before - or in lieu of - the default ret_btf_id logic. Similarly for argument btf_id. If poisoned btf_id reaches default handling block for either, consider this a verifier internal error and fail verification. Otherwise a helper w/ poisoned btf_id but no verifier logic replacing the type will cause a crash as the invalid pointer is dereferenced. Also move BPF_PTR_POISON to existing header and replace the hardcoded value with something that looks slightly like BPF (0xB4F). Signed-off-by: Dave Marchevsky --- include/linux/poison.h | 3 +++ kernel/bpf/helpers.c | 6 +++--- kernel/bpf/verifier.c | 28 ++++++++++++++++++++++------ 3 files changed, 28 insertions(+), 9 deletions(-) diff --git a/include/linux/poison.h b/include/linux/poison.h index d62ef5a6b4e9..599995d29fe4 100644 --- a/include/linux/poison.h +++ b/include/linux/poison.h @@ -81,4 +81,7 @@ /********** net/core/page_pool.c **********/ #define PP_SIGNATURE (0x40 + POISON_POINTER_DELTA) +/********** kernel/bpf/ **********/ +#define BPF_PTR_POISON ((void *)(0xB4FUL + POISON_POINTER_DELTA)) + #endif diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c index 3c1b9bbcf971..6de4dedf45c0 100644 --- a/kernel/bpf/helpers.c +++ b/kernel/bpf/helpers.c @@ -15,6 +15,7 @@ #include #include #include +#include #include #include #include @@ -1410,10 +1411,9 @@ BPF_CALL_2(bpf_kptr_xchg, void *, map_value, void *, ptr) } /* Unlike other PTR_TO_BTF_ID helpers the btf_id in bpf_kptr_xchg() - * helper is determined dynamically by the verifier. + * helper is determined dynamically by the verifier. Use BPF_PTR_POISON to + * denote type that verifier will determine. 
*/ -#define BPF_PTR_POISON ((void *)((0xeB9FUL << 2) + POISON_POINTER_DELTA)) - static const struct bpf_func_proto bpf_kptr_xchg_proto = { .func = bpf_kptr_xchg, .gpl_only = false, diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index 68bfa7c28048..ee5b57140c73 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -23,6 +23,7 @@ #include #include #include +#include #include "disasm.h" @@ -5764,13 +5765,21 @@ static int check_reg_type(struct bpf_verifier_env *env, u32 regno, if (meta->func_id == BPF_FUNC_kptr_xchg) { if (map_kptr_match_type(env, meta->kptr_off_desc, reg, regno)) return -EACCES; - } else if (!btf_struct_ids_match(&env->log, reg->btf, reg->btf_id, reg->off, + } else { + if (arg_btf_id == BPF_PTR_POISON) { + verbose(env, "verifier internal error: R%d has " + "non-overwritten BPF_PTR_POISON type\n", regno); + return -EACCES; + } + + if (!btf_struct_ids_match(&env->log, reg->btf, reg->btf_id, reg->off, btf_vmlinux, *arg_btf_id, strict_type_match)) { - verbose(env, "R%d is of type %s but %s is expected\n", - regno, kernel_type_name(reg->btf, reg->btf_id), - kernel_type_name(btf_vmlinux, *arg_btf_id)); - return -EACCES; + verbose(env, "R%d is of type %s but %s is expected\n", + regno, kernel_type_name(reg->btf, reg->btf_id), + kernel_type_name(btf_vmlinux, *arg_btf_id)); + return -EACCES; + } } } @@ -7454,7 +7463,14 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn ret_btf = meta.kptr_off_desc->kptr.btf; ret_btf_id = meta.kptr_off_desc->kptr.btf_id; } else { - ret_btf = btf_vmlinux; + if (fn->ret_btf_id == BPF_PTR_POISON) { + verbose(env, "verifier internal error: func %s " + "has non-overwritten " + "BPF_PTR_POISON return type\n", + func_id_name(func_id)); + return -EINVAL; + } + ret_btf = btf_vmlinux; ret_btf_id = *fn->ret_btf_id; } if (ret_btf_id == 0) { From patchwork Tue Aug 30 17:27:44 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dave Marchevsky X-Patchwork-Id: 12959624 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 4FBA9ECAAA1 for ; Tue, 30 Aug 2022 17:34:25 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231877AbiH3ReX (ORCPT ); Tue, 30 Aug 2022 13:34:23 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:37214 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232159AbiH3Rdi (ORCPT ); Tue, 30 Aug 2022 13:33:38 -0400 Received: from mx0b-00082601.pphosted.com (mx0b-00082601.pphosted.com [67.231.153.30]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id AB88F21822 for ; Tue, 30 Aug 2022 10:29:50 -0700 (PDT) Received: from pps.filterd (m0148460.ppops.net [127.0.0.1]) by mx0a-00082601.pphosted.com (8.17.1.5/8.17.1.5) with ESMTP id 27UGLbJF003220 for ; Tue, 30 Aug 2022 10:28:17 -0700 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fb.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : content-type; s=facebook; bh=JU5cKcj7B9+S3bnzt1CW7989cHdDS6j/uiSS1TdNCBU=; b=oiy0Mx4uan5X/xe8AZHzrTWqY3LyuwDCdqxu7vvu7B3ojbmlEX9N++bcaA7vUKj3lgGt YyWOtevLO1ytjCwJcmQ0szdTKbmqZBD/G+XSBxADTYSOKAyWR6GxOC46Q3RxhBo/chn7 M0p43uJI3FBADPxZeW0dVfrqEe+3UK3zgOA= Received: from 
maileast.thefacebook.com ([163.114.130.16]) by mx0a-00082601.pphosted.com (PPS) with ESMTPS id 3j9h5djqgr-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT) for ; Tue, 30 Aug 2022 10:28:17 -0700 Received: from twshared6324.05.ash7.facebook.com (2620:10d:c0a8:1b::d) by mail.thefacebook.com (2620:10d:c0a8:82::e) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2375.31; Tue, 30 Aug 2022 10:28:16 -0700 Received: by devbig077.ldc1.facebook.com (Postfix, from userid 158236) id AC7BDCAD076C; Tue, 30 Aug 2022 10:28:09 -0700 (PDT) From: Dave Marchevsky To: CC: Alexei Starovoitov , Daniel Borkmann , Andrii Nakryiko , Kernel Team , Dave Marchevsky Subject: [RFCv2 PATCH bpf-next 03/18] bpf: Add rb_node_off to bpf_map Date: Tue, 30 Aug 2022 10:27:44 -0700 Message-ID: <20220830172759.4069786-4-davemarchevsky@fb.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20220830172759.4069786-1-davemarchevsky@fb.com> References: <20220830172759.4069786-1-davemarchevsky@fb.com> MIME-Version: 1.0 X-FB-Internal: Safe X-Proofpoint-GUID: 5hD5fbwMgylQnTaIt-CXkzvuXYjpMVcV X-Proofpoint-ORIG-GUID: 5hD5fbwMgylQnTaIt-CXkzvuXYjpMVcV X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.895,Hydra:6.0.517,FMLib:17.11.122.1 definitions=2022-08-30_10,2022-08-30_01,2022-06-22_01 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net X-Patchwork-State: RFC Similarly to other protected fields which might be in a map value - bpf_spin_lock and bpf_timer - keep track of existence and offset of struct rb_node within map value struct. This will allow later patches in this series to prevent writes to rb_node field. Allowing bpf programs to modify the rbtree node struct's rb_node field would allow parent and children node pointers to be changed, which could corrupt an otherwise-valid rbtree. 
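As a concrete illustration of the kind of map value this tracking targets (struct and field names are hypothetical, not part of this patch): the embedded struct rb_node is what btf_find_rb_node() locates and records in map->rb_node_off, and later patches in the series reject any program write that overlaps it.

struct node_data {
	struct rb_node node;	/* offset recorded in map->rb_node_off; writes overlapping it are rejected */
	__u64 key;		/* program-owned fields live outside the rb_node */
	__u64 payload;
};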
Signed-off-by: Dave Marchevsky --- include/linux/bpf.h | 6 ++++++ include/linux/btf.h | 1 + kernel/bpf/btf.c | 21 +++++++++++++++++++++ kernel/bpf/syscall.c | 3 +++ 4 files changed, 31 insertions(+) diff --git a/include/linux/bpf.h b/include/linux/bpf.h index a627a02cf8ab..b4a44ffb0d6c 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -215,6 +215,7 @@ struct bpf_map { int spin_lock_off; /* >=0 valid offset, <0 error */ struct bpf_map_value_off *kptr_off_tab; int timer_off; /* >=0 valid offset, <0 error */ + int rb_node_off; /* >=0 valid offset, <0 error */ u32 id; int numa_node; u32 btf_key_type_id; @@ -264,6 +265,11 @@ static inline bool map_value_has_kptrs(const struct bpf_map *map) return !IS_ERR_OR_NULL(map->kptr_off_tab); } +static inline bool map_value_has_rb_node(const struct bpf_map *map) +{ + return map->rb_node_off >= 0; +} + static inline void check_and_init_map_value(struct bpf_map *map, void *dst) { if (unlikely(map_value_has_spin_lock(map))) diff --git a/include/linux/btf.h b/include/linux/btf.h index ad93c2d9cc1c..cbf10b6b4ada 100644 --- a/include/linux/btf.h +++ b/include/linux/btf.h @@ -154,6 +154,7 @@ bool btf_member_is_reg_int(const struct btf *btf, const struct btf_type *s, u32 expected_offset, u32 expected_size); int btf_find_spin_lock(const struct btf *btf, const struct btf_type *t); int btf_find_timer(const struct btf *btf, const struct btf_type *t); +int btf_find_rb_node(const struct btf *btf, const struct btf_type *t); struct bpf_map_value_off *btf_parse_kptrs(const struct btf *btf, const struct btf_type *t); bool btf_type_is_void(const struct btf_type *t); diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c index 903719b89238..f020efad8d9b 100644 --- a/kernel/bpf/btf.c +++ b/kernel/bpf/btf.c @@ -3195,6 +3195,7 @@ enum btf_field_type { BTF_FIELD_SPIN_LOCK, BTF_FIELD_TIMER, BTF_FIELD_KPTR, + BTF_FIELD_RB_NODE, }; enum { @@ -3282,6 +3283,7 @@ static int btf_find_struct_field(const struct btf *btf, const struct btf_type *t switch (field_type) { case BTF_FIELD_SPIN_LOCK: case BTF_FIELD_TIMER: + case BTF_FIELD_RB_NODE: ret = btf_find_struct(btf, member_type, off, sz, idx < info_cnt ? &info[idx] : &tmp); if (ret < 0) @@ -3332,6 +3334,7 @@ static int btf_find_datasec_var(const struct btf *btf, const struct btf_type *t, switch (field_type) { case BTF_FIELD_SPIN_LOCK: case BTF_FIELD_TIMER: + case BTF_FIELD_RB_NODE: ret = btf_find_struct(btf, var_type, off, sz, idx < info_cnt ? 
&info[idx] : &tmp); if (ret < 0) @@ -3374,6 +3377,11 @@ static int btf_find_field(const struct btf *btf, const struct btf_type *t, sz = sizeof(struct bpf_timer); align = __alignof__(struct bpf_timer); break; + case BTF_FIELD_RB_NODE: + name = "rb_node"; + sz = sizeof(struct rb_node); + align = __alignof__(struct rb_node); + break; case BTF_FIELD_KPTR: name = NULL; sz = sizeof(u64); @@ -3420,6 +3428,19 @@ int btf_find_timer(const struct btf *btf, const struct btf_type *t) return info.off; } +int btf_find_rb_node(const struct btf *btf, const struct btf_type *t) +{ + struct btf_field_info info; + int ret; + + ret = btf_find_field(btf, t, BTF_FIELD_RB_NODE, &info, 1); + if (ret < 0) + return ret; + if (!ret) + return -ENOENT; + return info.off; +} + struct bpf_map_value_off *btf_parse_kptrs(const struct btf *btf, const struct btf_type *t) { diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c index 83c7136c5788..3947fbd137af 100644 --- a/kernel/bpf/syscall.c +++ b/kernel/bpf/syscall.c @@ -1052,6 +1052,8 @@ static int map_check_btf(struct bpf_map *map, const struct btf *btf, } } + map->rb_node_off = btf_find_rb_node(btf, value_type); + if (map->ops->map_check_btf) { ret = map->ops->map_check_btf(map, btf, key_type, value_type); if (ret < 0) @@ -1115,6 +1117,7 @@ static int map_create(union bpf_attr *attr) map->spin_lock_off = -EINVAL; map->timer_off = -EINVAL; + map->rb_node_off = -EINVAL; if (attr->btf_key_type_id || attr->btf_value_type_id || /* Even the map's value is a kernel's struct, * the bpf_prog.o must have BTF to begin with From patchwork Tue Aug 30 17:27:45 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dave Marchevsky X-Patchwork-Id: 12959625 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 60325ECAAD1 for ; Tue, 30 Aug 2022 17:34:27 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231958AbiH3ReY (ORCPT ); Tue, 30 Aug 2022 13:34:24 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:60686 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232125AbiH3Rde (ORCPT ); Tue, 30 Aug 2022 13:33:34 -0400 Received: from mx0b-00082601.pphosted.com (mx0b-00082601.pphosted.com [67.231.153.30]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id AA9C6DFC5 for ; Tue, 30 Aug 2022 10:29:48 -0700 (PDT) Received: from pps.filterd (m0148460.ppops.net [127.0.0.1]) by mx0a-00082601.pphosted.com (8.17.1.5/8.17.1.5) with ESMTP id 27UAokuL029866 for ; Tue, 30 Aug 2022 10:28:21 -0700 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fb.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : content-type; s=facebook; bh=eR9mlFfdzMWPc67yqjfGgzjsQjMloaGr3A1UFqXyH7U=; b=X/H/myxb7GvDqak5+t4YquyAeP1lZJpnLGmVlUzuGaoouYxqONq2wtYogWke9hXZvr5W x5gFvO7JKztBta+BoZlZAtZ7NKYJRIF4zBCEnXbBdA4T95vP2AYkU94uhWvJa4LwQLif STNGlIbo2A64UzKDsDTdSgGFNdW1nbZ/Ws8= Received: from mail.thefacebook.com ([163.114.132.120]) by mx0a-00082601.pphosted.com (PPS) with ESMTPS id 3j9h5djqhr-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT) for ; Tue, 30 Aug 2022 10:28:20 -0700 Received: from twshared10711.09.ash9.facebook.com (2620:10d:c085:108::8) by 
mail.thefacebook.com (2620:10d:c085:21d::5) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2375.31; Tue, 30 Aug 2022 10:28:19 -0700 Received: by devbig077.ldc1.facebook.com (Postfix, from userid 158236) id C381DCAD076F; Tue, 30 Aug 2022 10:28:09 -0700 (PDT) From: Dave Marchevsky To: CC: Alexei Starovoitov , Daniel Borkmann , Andrii Nakryiko , Kernel Team , Dave Marchevsky Subject: [RFCv2 PATCH bpf-next 04/18] bpf: Add rbtree map Date: Tue, 30 Aug 2022 10:27:45 -0700 Message-ID: <20220830172759.4069786-5-davemarchevsky@fb.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20220830172759.4069786-1-davemarchevsky@fb.com> References: <20220830172759.4069786-1-davemarchevsky@fb.com> MIME-Version: 1.0 X-FB-Internal: Safe X-Proofpoint-GUID: HYcPykogI_33kG5OJX79InGcjyQr0Hui X-Proofpoint-ORIG-GUID: HYcPykogI_33kG5OJX79InGcjyQr0Hui X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.895,Hydra:6.0.517,FMLib:17.11.122.1 definitions=2022-08-30_10,2022-08-30_01,2022-06-22_01 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net X-Patchwork-State: RFC Introduce a new map type, bpf_rbtree_map, allowing BPF programs to create and manipulate rbtree data structures. bpf_rbtree_map differs from 'classic' BPF map patterns in a few important ways: * The map does not allocate its own elements. Instead, BPF programs must call bpf_rbtree_alloc_node helper to allocate and bpf_rbtree_add to add map elem - referred to as 'node' from now on - to the rbtree. This means that rbtree maps can grow dynamically, do not preallocate, and that 'max_entries' has no meaning for rbtree maps. * Separating allocation and insertion allows alloc_node call to occur in contexts where it's safe to allocate. * It's possible to remove a node from a rbtree map with bpf_rbtree_remove helper. * Goal here is to allow a node to be removed from one rbtree and added to another [ NOTE: This functionality is still in progress ] Helpers are added to manipulate nodes and trees: * bpf_rbtree_{alloc,free}_node: Allocate / free node structs * bpf_rbtree_{add,remove}: Add / remove nodes from rbtree maps * A comparator function is passed to bpf_rbtree_add in order to find the correct place to add the node. * bpf_rbtree_find: Find a node matching some condition in the rbtree * A comparator function is passed in order to determine whether a node matches what's being searched for. bpf_rbtree_add is very similar to the 'map_push_elem' builtin, but since verifier needs special logic to setup the comparator callback a new helper is added. Same for bpf_rbtree_find and 'map_lookup_elem' builtin. In order to safely support separate allocation / insertion and passing nodes between rbtrees, some invariants must hold: * A node that is not in a rbtree map must either be free'd or added to a rbtree map before the program terminates * Nodes are in this state when returned from bpf_rbtree_alloc_node or bpf_rbtree_remove. If a node is in a rbtree map it is 'owned' by the map, otherwise it is owned by the BPF program which holds a reference to it. Owner is responsible for the lifetime of the node. This matches existing acquire / release semantics well. node_alloc and remove operations 'acquire' a node while add and node_free operations 'release' the node. The verifier enforces that acquired nodes are released before program terminates. 
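A rough usage sketch from a BPF program's point of view, based on the helper descriptions added below. The value struct, map name, program section, and error handling are illustrative only, and the locking introduced later in this series is omitted here.

#include <vmlinux.h>
#include <bpf/bpf_helpers.h>

char LICENSE[] SEC("license") = "GPL";	/* rbtree helpers are gpl_only */

struct node_data {
	struct rb_node node;	/* must be at offset 0 of the value type */
	__u64 key;
};

struct {
	__uint(type, BPF_MAP_TYPE_RBTREE);
	__type(value, struct node_data);
} rbtree SEC(".maps");

/* Comparator passed to bpf_rbtree_add: returns true if a sorts before b.
 * struct rb_node sits at offset 0 of node_data, so a plain cast suffices.
 */
static bool node_less(struct rb_node *a, const struct rb_node *b)
{
	return ((struct node_data *)a)->key < ((const struct node_data *)b)->key;
}

SEC("tc")
int add_one_node(struct __sk_buff *skb)
{
	struct node_data *n;

	/* alloc acquires a reference to the node */
	n = bpf_rbtree_alloc_node(&rbtree, sizeof(*n));
	if (!n)
		return 0;

	n->key = bpf_ktime_get_ns();
	/* add releases the reference; the tree now owns the node */
	bpf_rbtree_add(&rbtree, n, (void *)node_less);
	return 0;
}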
Some other implementation details: * The value type of an rbtree map is expected to be a struct which contains 'struct rb_node' at offset 0. * BPF programs may not modify the node struct's rb_node field. * Otherwise the tree could be left in corrupted state * Rbtree map is value-only. Keys have no meaning * Since the value type is not known until the rbtree map is instantiated, helper protos have input and return type 'struct rb_node *' which verifier replaces with actual map value type. * [ TODO: Existing logic prevents any writes to PTR_TO_BTF_ID. This broadly turned off in this patch and replaced with "no writes to struct rb_node is PTR_TO_BTF_ID struct has one". This is a hack and needs to be replaced. ] Signed-off-by: Dave Marchevsky --- include/linux/bpf_types.h | 1 + include/uapi/linux/bpf.h | 61 ++++++++ kernel/bpf/Makefile | 2 +- kernel/bpf/helpers.c | 15 ++ kernel/bpf/rbtree.c | 256 +++++++++++++++++++++++++++++++++ kernel/bpf/verifier.c | 121 +++++++++++++++- tools/include/uapi/linux/bpf.h | 61 ++++++++ 7 files changed, 511 insertions(+), 6 deletions(-) create mode 100644 kernel/bpf/rbtree.c diff --git a/include/linux/bpf_types.h b/include/linux/bpf_types.h index 2b9112b80171..78e9b5253983 100644 --- a/include/linux/bpf_types.h +++ b/include/linux/bpf_types.h @@ -126,6 +126,7 @@ BPF_MAP_TYPE(BPF_MAP_TYPE_STRUCT_OPS, bpf_struct_ops_map_ops) #endif BPF_MAP_TYPE(BPF_MAP_TYPE_RINGBUF, ringbuf_map_ops) BPF_MAP_TYPE(BPF_MAP_TYPE_BLOOM_FILTER, bloom_filter_map_ops) +BPF_MAP_TYPE(BPF_MAP_TYPE_RBTREE, rbtree_map_ops) BPF_LINK_TYPE(BPF_LINK_TYPE_RAW_TRACEPOINT, raw_tracepoint) BPF_LINK_TYPE(BPF_LINK_TYPE_TRACING, tracing) diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index e174ad28aeb7..1af17b27d34f 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -909,6 +909,7 @@ enum bpf_map_type { BPF_MAP_TYPE_INODE_STORAGE, BPF_MAP_TYPE_TASK_STORAGE, BPF_MAP_TYPE_BLOOM_FILTER, + BPF_MAP_TYPE_RBTREE, }; /* Note that tracing related programs such as @@ -5353,6 +5354,61 @@ union bpf_attr { * Return * Current *ktime*. * + * void *bpf_rbtree_alloc_node(struct bpf_map *map, u32 sz) + * Description + * Allocate a node of size *sz* bytes for use in rbtree *map*. + * + * *sz* must be >= sizeof(struct rb_node) + * Return + * A pointer to the allocated node if successful, otherwise NULL. + * + * The verifier considers the type of the returned pointer to be + * the BTF id of *map*'s value type. + * + * void *bpf_rbtree_add(struct bpf_map *map, void *node, void *cb) + * Description + * Add *node* to rbtree *map* using *cb* comparator callback to + * walk the rbtree. + * + * *cb* must take (struct rb_node \*, const struct rb_node \*) as + * input and return a bool signifying whether the first rb_node's + * struct is less than the second. + * + * Return + * If success, returns a pointer to the added node in the rbtree. + * + * If add fails, returns NULL + * + * long bpf_rbtree_find(struct bpf_map *map, void *key, void *cb) + * Description + * Find *key* in rbtree *map* using *cb* comparator callback to walk the + * rbtree. + * + * *cb* must take (const void \*key, const struct rb_node \*n) as + * input and return an int. If *cb* determines *n* to match *key*, *cb* must + * return 0. If larger, a positive int, and a negative int if smaller. + * + * *key* does not need to be a rbtree node struct. + * + * Return + * Ptr to rbtree node if found, otherwise NULL. + * + * void *bpf_rbtree_remove(struct bpf_map *map, void *elem) + * Description + * Remove *elem* from rbtree *map*. 
+ * + * Return + * If success, returns a pointer to the removed node. + * + * If remove fails, returns NULL + * + * long bpf_rbtree_free_node(struct bpf_map *map, void *elem) + * Description + * Free rb_node that isn't associated w/ a tree. + * + * Return + * 0 + * */ #define __BPF_FUNC_MAPPER(FN) \ FN(unspec), \ @@ -5564,6 +5620,11 @@ union bpf_attr { FN(tcp_raw_check_syncookie_ipv4), \ FN(tcp_raw_check_syncookie_ipv6), \ FN(ktime_get_tai_ns), \ + FN(rbtree_alloc_node), \ + FN(rbtree_add), \ + FN(rbtree_find), \ + FN(rbtree_remove), \ + FN(rbtree_free_node), \ /* */ /* integer value in 'imm' field of BPF_CALL instruction selects which helper diff --git a/kernel/bpf/Makefile b/kernel/bpf/Makefile index 057ba8e01e70..00eedab3ad53 100644 --- a/kernel/bpf/Makefile +++ b/kernel/bpf/Makefile @@ -7,7 +7,7 @@ endif CFLAGS_core.o += $(call cc-disable-warning, override-init) $(cflags-nogcse-yy) obj-$(CONFIG_BPF_SYSCALL) += syscall.o verifier.o inode.o helpers.o tnum.o bpf_iter.o map_iter.o task_iter.o prog_iter.o link_iter.o -obj-$(CONFIG_BPF_SYSCALL) += hashtab.o arraymap.o percpu_freelist.o bpf_lru_list.o lpm_trie.o map_in_map.o bloom_filter.o +obj-$(CONFIG_BPF_SYSCALL) += hashtab.o arraymap.o percpu_freelist.o bpf_lru_list.o lpm_trie.o map_in_map.o bloom_filter.o rbtree.o obj-$(CONFIG_BPF_SYSCALL) += local_storage.o queue_stack_maps.o ringbuf.o obj-$(CONFIG_BPF_SYSCALL) += bpf_local_storage.o bpf_task_storage.o obj-${CONFIG_BPF_LSM} += bpf_inode_storage.o diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c index 6de4dedf45c0..d18d4d8ca1e2 100644 --- a/kernel/bpf/helpers.c +++ b/kernel/bpf/helpers.c @@ -1598,6 +1598,11 @@ const struct bpf_func_proto bpf_probe_read_user_str_proto __weak; const struct bpf_func_proto bpf_probe_read_kernel_proto __weak; const struct bpf_func_proto bpf_probe_read_kernel_str_proto __weak; const struct bpf_func_proto bpf_task_pt_regs_proto __weak; +const struct bpf_func_proto bpf_rbtree_alloc_node_proto __weak; +const struct bpf_func_proto bpf_rbtree_add_proto __weak; +const struct bpf_func_proto bpf_rbtree_find_proto __weak; +const struct bpf_func_proto bpf_rbtree_remove_proto __weak; +const struct bpf_func_proto bpf_rbtree_free_node_proto __weak; const struct bpf_func_proto * bpf_base_func_proto(enum bpf_func_id func_id) @@ -1689,6 +1694,16 @@ bpf_base_func_proto(enum bpf_func_id func_id) return &bpf_timer_cancel_proto; case BPF_FUNC_kptr_xchg: return &bpf_kptr_xchg_proto; + case BPF_FUNC_rbtree_alloc_node: + return &bpf_rbtree_alloc_node_proto; + case BPF_FUNC_rbtree_add: + return &bpf_rbtree_add_proto; + case BPF_FUNC_rbtree_find: + return &bpf_rbtree_find_proto; + case BPF_FUNC_rbtree_remove: + return &bpf_rbtree_remove_proto; + case BPF_FUNC_rbtree_free_node: + return &bpf_rbtree_free_node_proto; default: break; } diff --git a/kernel/bpf/rbtree.c b/kernel/bpf/rbtree.c new file mode 100644 index 000000000000..7d50574e4d57 --- /dev/null +++ b/kernel/bpf/rbtree.c @@ -0,0 +1,256 @@ +// SPDX-License-Identifier: GPL-2.0 +/* Copyright (c) 2022 Meta Platforms, Inc. and affiliates. 
*/ + +#include +#include +#include +#include +#include + +struct bpf_rbtree { + struct bpf_map map; + struct rb_root_cached root; +}; + +static int rbtree_map_alloc_check(union bpf_attr *attr) +{ + if (attr->max_entries || !attr->btf_value_type_id) + return -EINVAL; + + return 0; +} + +static struct bpf_map *rbtree_map_alloc(union bpf_attr *attr) +{ + struct bpf_rbtree *tree; + int numa_node; + + if (!bpf_capable()) + return ERR_PTR(-EPERM); + + if (attr->value_size == 0) + return ERR_PTR(-EINVAL); + + numa_node = bpf_map_attr_numa_node(attr); + tree = bpf_map_area_alloc(sizeof(*tree), numa_node); + if (!tree) + return ERR_PTR(-ENOMEM); + + tree->root = RB_ROOT_CACHED; + bpf_map_init_from_attr(&tree->map, attr); + return &tree->map; +} + +static struct rb_node *rbtree_map_alloc_node(struct bpf_map *map, size_t sz) +{ + struct rb_node *node; + + node = bpf_map_kmalloc_node(map, sz, GFP_NOWAIT, map->numa_node); + if (!node) + return NULL; + RB_CLEAR_NODE(node); + return node; +} + +BPF_CALL_2(bpf_rbtree_alloc_node, struct bpf_map *, map, u32, sz) +{ + struct rb_node *node; + + if (map->map_type != BPF_MAP_TYPE_RBTREE) + return (u64)NULL; + + if (sz < sizeof(*node)) + return (u64)NULL; + + node = rbtree_map_alloc_node(map, (size_t)sz); + if (!node) + return (u64)NULL; + + return (u64)node; +} + +const struct bpf_func_proto bpf_rbtree_alloc_node_proto = { + .func = bpf_rbtree_alloc_node, + .gpl_only = true, + .ret_type = RET_PTR_TO_BTF_ID_OR_NULL, + .ret_btf_id = BPF_PTR_POISON, + .arg1_type = ARG_CONST_MAP_PTR, + .arg2_type = ARG_CONST_ALLOC_SIZE_OR_ZERO, +}; + +BPF_CALL_3(bpf_rbtree_add, struct bpf_map *, map, void *, value, void *, cb) +{ + struct bpf_rbtree *tree = container_of(map, struct bpf_rbtree, map); + struct rb_node *node = (struct rb_node *)value; + + if (WARN_ON_ONCE(!RB_EMPTY_NODE(node))) + return (u64)NULL; + + rb_add_cached(node, &tree->root, (bool (*)(struct rb_node *, const struct rb_node *))cb); + return (u64)node; +} + +const struct bpf_func_proto bpf_rbtree_add_proto = { + .func = bpf_rbtree_add, + .gpl_only = true, + .ret_type = RET_PTR_TO_BTF_ID_OR_NULL, + .ret_btf_id = BPF_PTR_POISON, + .arg1_type = ARG_CONST_MAP_PTR, + .arg2_type = ARG_PTR_TO_BTF_ID | OBJ_RELEASE, + .arg2_btf_id = BPF_PTR_POISON, + .arg3_type = ARG_PTR_TO_FUNC, +}; + +BPF_CALL_3(bpf_rbtree_find, struct bpf_map *, map, void *, key, void *, cb) +{ + struct bpf_rbtree *tree = container_of(map, struct bpf_rbtree, map); + + return (u64)rb_find(key, &tree->root.rb_root, + (int (*)(const void *key, + const struct rb_node *))cb); +} + +const struct bpf_func_proto bpf_rbtree_find_proto = { + .func = bpf_rbtree_find, + .gpl_only = true, + .ret_type = RET_PTR_TO_BTF_ID_OR_NULL, + .ret_btf_id = BPF_PTR_POISON, + .arg1_type = ARG_CONST_MAP_PTR, + .arg2_type = ARG_ANYTHING, + .arg3_type = ARG_PTR_TO_FUNC, +}; + +/* Like rbtree_postorder_for_each_entry_safe, but 'pos' and 'n' are + * 'rb_node *', so field name of rb_node within containing struct is not + * needed. 
+ * + * Since bpf_rb_tree's node always has 'struct rb_node' at offset 0 it's + * not necessary to know field name or type of node struct + */ +#define bpf_rbtree_postorder_for_each_entry_safe(pos, n, root) \ + for (pos = rb_first_postorder(root); \ + pos && ({ n = rb_next_postorder(pos); 1; }); \ + pos = n) + +static void rbtree_map_free(struct bpf_map *map) +{ + struct rb_node *pos, *n; + struct bpf_rbtree *tree = container_of(map, struct bpf_rbtree, map); + + bpf_rbtree_postorder_for_each_entry_safe(pos, n, &tree->root.rb_root) + kfree(pos); + bpf_map_area_free(tree); +} + +static int rbtree_map_check_btf(const struct bpf_map *map, + const struct btf *btf, + const struct btf_type *key_type, + const struct btf_type *value_type) +{ + if (!map_value_has_rb_node(map)) + return -EINVAL; + + if (map->rb_node_off > 0) + return -EINVAL; + + return 0; +} + +static int rbtree_map_push_elem(struct bpf_map *map, void *value, u64 flags) +{ + /* Use bpf_rbtree_add helper instead + */ + return -ENOTSUPP; +} + +static int rbtree_map_pop_elem(struct bpf_map *map, void *value) +{ + return -ENOTSUPP; +} + +static int rbtree_map_peek_elem(struct bpf_map *map, void *value) +{ + return -ENOTSUPP; +} + +static void *rbtree_map_lookup_elem(struct bpf_map *map, void *value) +{ + /* Use bpf_rbtree_find helper instead + */ + return ERR_PTR(-ENOTSUPP); +} + +static int rbtree_map_update_elem(struct bpf_map *map, void *key, void *value, + u64 flags) +{ + return -ENOTSUPP; +} + +static int rbtree_map_delete_elem(struct bpf_map *map, void *value) +{ + return -ENOTSUPP; +} + +BPF_CALL_2(bpf_rbtree_remove, struct bpf_map *, map, void *, value) +{ + struct bpf_rbtree *tree = container_of(map, struct bpf_rbtree, map); + struct rb_node *node = (struct rb_node *)value; + + if (WARN_ON_ONCE(RB_EMPTY_NODE(node))) + return (u64)NULL; + + rb_erase_cached(node, &tree->root); + RB_CLEAR_NODE(node); + return (u64)node; +} + +const struct bpf_func_proto bpf_rbtree_remove_proto = { + .func = bpf_rbtree_remove, + .gpl_only = true, + .ret_type = RET_PTR_TO_BTF_ID_OR_NULL, + .ret_btf_id = BPF_PTR_POISON, + .arg1_type = ARG_CONST_MAP_PTR, + .arg2_type = ARG_PTR_TO_BTF_ID, + .arg2_btf_id = BPF_PTR_POISON, +}; + +BPF_CALL_2(bpf_rbtree_free_node, struct bpf_map *, map, void *, value) +{ + struct rb_node *node = (struct rb_node *)value; + + WARN_ON_ONCE(!RB_EMPTY_NODE(node)); + kfree(node); + return 0; +} + +const struct bpf_func_proto bpf_rbtree_free_node_proto = { + .func = bpf_rbtree_free_node, + .gpl_only = true, + .ret_type = RET_INTEGER, + .arg1_type = ARG_CONST_MAP_PTR, + .arg2_type = ARG_PTR_TO_BTF_ID | OBJ_RELEASE, + .arg2_btf_id = BPF_PTR_POISON, +}; + +static int rbtree_map_get_next_key(struct bpf_map *map, void *key, + void *next_key) +{ + return -ENOTSUPP; +} + +BTF_ID_LIST_SINGLE(bpf_rbtree_map_btf_ids, struct, bpf_rbtree) +const struct bpf_map_ops rbtree_map_ops = { + .map_meta_equal = bpf_map_meta_equal, + .map_alloc_check = rbtree_map_alloc_check, + .map_alloc = rbtree_map_alloc, + .map_free = rbtree_map_free, + .map_get_next_key = rbtree_map_get_next_key, + .map_push_elem = rbtree_map_push_elem, + .map_peek_elem = rbtree_map_peek_elem, + .map_pop_elem = rbtree_map_pop_elem, + .map_lookup_elem = rbtree_map_lookup_elem, + .map_update_elem = rbtree_map_update_elem, + .map_delete_elem = rbtree_map_delete_elem, + .map_check_btf = rbtree_map_check_btf, + .map_btf_id = &bpf_rbtree_map_btf_ids[0], +}; diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index ee5b57140c73..0a2e958ddca8 100644 --- a/kernel/bpf/verifier.c +++ 
b/kernel/bpf/verifier.c @@ -482,7 +482,9 @@ static bool is_acquire_function(enum bpf_func_id func_id, func_id == BPF_FUNC_sk_lookup_udp || func_id == BPF_FUNC_skc_lookup_tcp || func_id == BPF_FUNC_ringbuf_reserve || - func_id == BPF_FUNC_kptr_xchg) + func_id == BPF_FUNC_kptr_xchg || + func_id == BPF_FUNC_rbtree_alloc_node || + func_id == BPF_FUNC_rbtree_remove) return true; if (func_id == BPF_FUNC_map_lookup_elem && @@ -532,6 +534,21 @@ static bool is_cmpxchg_insn(const struct bpf_insn *insn) insn->imm == BPF_CMPXCHG; } +static bool function_manipulates_rbtree_node(enum bpf_func_id func_id) +{ + return func_id == BPF_FUNC_rbtree_add || + func_id == BPF_FUNC_rbtree_remove || + func_id == BPF_FUNC_rbtree_free_node; +} + +static bool function_returns_rbtree_node(enum bpf_func_id func_id) +{ + return func_id == BPF_FUNC_rbtree_alloc_node || + func_id == BPF_FUNC_rbtree_find || + func_id == BPF_FUNC_rbtree_add || + func_id == BPF_FUNC_rbtree_remove; +} + /* string representation of 'enum bpf_reg_type' * * Note that reg_type_str() can not appear more than once in a single verbose() @@ -3785,6 +3802,13 @@ static int check_map_kptr_access(struct bpf_verifier_env *env, u32 regno, return 0; } +static bool access_can_write_field(u32 access_off, size_t access_sz, + u32 field_off, size_t field_sz) +{ + return access_off < field_off + field_sz && + field_off < access_off + access_sz; +} + /* if any part of struct field can be touched by * load/store reject this program. * To check that [x1, x2) overlaps with [y1, y2) @@ -4491,7 +4515,7 @@ static int check_ptr_to_btf_access(struct bpf_verifier_env *env, const char *tname = btf_name_by_offset(reg->btf, t->name_off); enum bpf_type_flag flag = 0; u32 btf_id; - int ret; + int ret, rb_node_off; if (off < 0) { verbose(env, @@ -4528,8 +4552,13 @@ static int check_ptr_to_btf_access(struct bpf_verifier_env *env, off, size, atype, &btf_id, &flag); } else { if (atype != BPF_READ) { - verbose(env, "only read is supported\n"); - return -EACCES; + rb_node_off = btf_find_rb_node(reg->btf, t); + if (rb_node_off < 0 || + access_can_write_field(off, size, rb_node_off, + sizeof(struct rb_node))) { + verbose(env, "only read is supported\n"); + return -EACCES; + } } ret = btf_struct_access(&env->log, reg->btf, t, off, size, @@ -5746,7 +5775,7 @@ static int check_reg_type(struct bpf_verifier_env *env, u32 regno, return -EACCES; found: - if (reg->type == PTR_TO_BTF_ID) { + if (base_type(reg->type) == PTR_TO_BTF_ID) { /* For bpf_sk_release, it needs to match against first member * 'struct sock_common', hence make an exception for it. This * allows bpf_sk_release to work for multiple socket types. 
@@ -5765,6 +5794,17 @@ static int check_reg_type(struct bpf_verifier_env *env, u32 regno, if (meta->func_id == BPF_FUNC_kptr_xchg) { if (map_kptr_match_type(env, meta->kptr_off_desc, reg, regno)) return -EACCES; + } else if (function_manipulates_rbtree_node(meta->func_id)) { + if (!btf_struct_ids_match(&env->log, reg->btf, reg->btf_id, reg->off, + meta->map_ptr->btf, + meta->map_ptr->btf_value_type_id, + strict_type_match)) { + verbose(env, "rbtree: R%d is of type %s but %s is expected\n", + regno, kernel_type_name(reg->btf, reg->btf_id), + kernel_type_name(meta->map_ptr->btf, + meta->map_ptr->btf_value_type_id)); + return -EACCES; + } } else { if (arg_btf_id == BPF_PTR_POISON) { verbose(env, "verifier internal error: R%d has " @@ -6378,10 +6418,17 @@ static int check_map_func_compatibility(struct bpf_verifier_env *env, break; case BPF_FUNC_map_pop_elem: if (map->map_type != BPF_MAP_TYPE_QUEUE && + map->map_type != BPF_MAP_TYPE_RBTREE && map->map_type != BPF_MAP_TYPE_STACK) goto error; break; case BPF_FUNC_map_peek_elem: + if (map->map_type != BPF_MAP_TYPE_QUEUE && + map->map_type != BPF_MAP_TYPE_STACK && + map->map_type != BPF_MAP_TYPE_RBTREE && + map->map_type != BPF_MAP_TYPE_BLOOM_FILTER) + goto error; + break; case BPF_FUNC_map_push_elem: if (map->map_type != BPF_MAP_TYPE_QUEUE && map->map_type != BPF_MAP_TYPE_STACK && @@ -6836,6 +6883,57 @@ static int set_loop_callback_state(struct bpf_verifier_env *env, return 0; } +static int set_rbtree_add_callback_state(struct bpf_verifier_env *env, + struct bpf_func_state *caller, + struct bpf_func_state *callee, + int insn_idx) +{ + struct bpf_map *map_ptr = caller->regs[BPF_REG_1].map_ptr; + + /* bpf_rbtree_add(struct bpf_map *map, void *value, void *cb) + * cb(struct rb_node *a, const struct rb_node *b); + */ + callee->regs[BPF_REG_1].type = PTR_TO_MAP_VALUE; + __mark_reg_known_zero(&callee->regs[BPF_REG_1]); + callee->regs[BPF_REG_1].map_ptr = map_ptr; + + callee->regs[BPF_REG_2].type = PTR_TO_MAP_VALUE; + __mark_reg_known_zero(&callee->regs[BPF_REG_2]); + callee->regs[BPF_REG_2].map_ptr = map_ptr; + + __mark_reg_not_init(env, &callee->regs[BPF_REG_3]); + __mark_reg_not_init(env, &callee->regs[BPF_REG_4]); + __mark_reg_not_init(env, &callee->regs[BPF_REG_5]); + callee->in_callback_fn = true; + return 0; +} + +static int set_rbtree_find_callback_state(struct bpf_verifier_env *env, + struct bpf_func_state *caller, + struct bpf_func_state *callee, + int insn_idx) +{ + struct bpf_map *map_ptr = caller->regs[BPF_REG_1].map_ptr; + + /* bpf_rbtree_find(struct bpf_map *map, void *key, void *cb) + * cb(void *key, const struct rb_node *b); + */ + callee->regs[BPF_REG_1].type = PTR_TO_MAP_VALUE; + __mark_reg_known_zero(&callee->regs[BPF_REG_1]); + callee->regs[BPF_REG_1].map_ptr = map_ptr; + + callee->regs[BPF_REG_2].type = PTR_TO_MAP_VALUE; + __mark_reg_known_zero(&callee->regs[BPF_REG_2]); + callee->regs[BPF_REG_2].map_ptr = map_ptr; + + __mark_reg_not_init(env, &callee->regs[BPF_REG_3]); + __mark_reg_not_init(env, &callee->regs[BPF_REG_4]); + __mark_reg_not_init(env, &callee->regs[BPF_REG_5]); + callee->in_callback_fn = true; + callee->callback_ret_range = tnum_range(0, U64_MAX); + return 0; +} + static int set_timer_callback_state(struct bpf_verifier_env *env, struct bpf_func_state *caller, struct bpf_func_state *callee, @@ -7318,6 +7416,14 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn err = __check_func_call(env, insn, insn_idx_p, meta.subprogno, set_loop_callback_state); break; + case BPF_FUNC_rbtree_add: + 
err = __check_func_call(env, insn, insn_idx_p, meta.subprogno, + set_rbtree_add_callback_state); + break; + case BPF_FUNC_rbtree_find: + err = __check_func_call(env, insn, insn_idx_p, meta.subprogno, + set_rbtree_find_callback_state); + break; case BPF_FUNC_dynptr_from_mem: if (regs[BPF_REG_1].type != PTR_TO_MAP_VALUE) { verbose(env, "Unsupported reg type %s for bpf_dynptr_from_mem data\n", @@ -7462,6 +7568,9 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn if (func_id == BPF_FUNC_kptr_xchg) { ret_btf = meta.kptr_off_desc->kptr.btf; ret_btf_id = meta.kptr_off_desc->kptr.btf_id; + } else if (function_returns_rbtree_node(func_id)) { + ret_btf = meta.map_ptr->btf; + ret_btf_id = meta.map_ptr->btf_value_type_id; } else { if (fn->ret_btf_id == BPF_PTR_POISON) { verbose(env, "verifier internal error: func %s " @@ -13499,8 +13608,10 @@ static int convert_ctx_accesses(struct bpf_verifier_env *env) BPF_SIZE((insn)->code); env->prog->aux->num_exentries++; } else if (resolve_prog_type(env->prog) != BPF_PROG_TYPE_STRUCT_OPS) { + /*TODO: Not sure what to do here verbose(env, "Writes through BTF pointers are not allowed\n"); return -EINVAL; + */ } continue; default: diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h index e174ad28aeb7..1af17b27d34f 100644 --- a/tools/include/uapi/linux/bpf.h +++ b/tools/include/uapi/linux/bpf.h @@ -909,6 +909,7 @@ enum bpf_map_type { BPF_MAP_TYPE_INODE_STORAGE, BPF_MAP_TYPE_TASK_STORAGE, BPF_MAP_TYPE_BLOOM_FILTER, + BPF_MAP_TYPE_RBTREE, }; /* Note that tracing related programs such as @@ -5353,6 +5354,61 @@ union bpf_attr { * Return * Current *ktime*. * + * void *bpf_rbtree_alloc_node(struct bpf_map *map, u32 sz) + * Description + * Allocate a node of size *sz* bytes for use in rbtree *map*. + * + * *sz* must be >= sizeof(struct rb_node) + * Return + * A pointer to the allocated node if successful, otherwise NULL. + * + * The verifier considers the type of the returned pointer to be + * the BTF id of *map*'s value type. + * + * void *bpf_rbtree_add(struct bpf_map *map, void *node, void *cb) + * Description + * Add *node* to rbtree *map* using *cb* comparator callback to + * walk the rbtree. + * + * *cb* must take (struct rb_node \*, const struct rb_node \*) as + * input and return a bool signifying whether the first rb_node's + * struct is less than the second. + * + * Return + * If success, returns a pointer to the added node in the rbtree. + * + * If add fails, returns NULL + * + * long bpf_rbtree_find(struct bpf_map *map, void *key, void *cb) + * Description + * Find *key* in rbtree *map* using *cb* comparator callback to walk the + * rbtree. + * + * *cb* must take (const void \*key, const struct rb_node \*n) as + * input and return an int. If *cb* determines *n* to match *key*, *cb* must + * return 0. If larger, a positive int, and a negative int if smaller. + * + * *key* does not need to be a rbtree node struct. + * + * Return + * Ptr to rbtree node if found, otherwise NULL. + * + * void *bpf_rbtree_remove(struct bpf_map *map, void *elem) + * Description + * Remove *elem* from rbtree *map*. + * + * Return + * If success, returns a pointer to the removed node. + * + * If remove fails, returns NULL + * + * long bpf_rbtree_free_node(struct bpf_map *map, void *elem) + * Description + * Free rb_node that isn't associated w/ a tree. 
+ * + * Return + * 0 + * */ #define __BPF_FUNC_MAPPER(FN) \ FN(unspec), \ @@ -5564,6 +5620,11 @@ union bpf_attr { FN(tcp_raw_check_syncookie_ipv4), \ FN(tcp_raw_check_syncookie_ipv6), \ FN(ktime_get_tai_ns), \ + FN(rbtree_alloc_node), \ + FN(rbtree_add), \ + FN(rbtree_find), \ + FN(rbtree_remove), \ + FN(rbtree_free_node), \ /* */ /* integer value in 'imm' field of BPF_CALL instruction selects which helper From patchwork Tue Aug 30 17:27:46 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dave Marchevsky X-Patchwork-Id: 12959626 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id CC7DCC0502C for ; Tue, 30 Aug 2022 17:34:28 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231978AbiH3Re0 (ORCPT ); Tue, 30 Aug 2022 13:34:26 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:33428 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232184AbiH3Rdl (ORCPT ); Tue, 30 Aug 2022 13:33:41 -0400 Received: from mx0b-00082601.pphosted.com (mx0b-00082601.pphosted.com [67.231.153.30]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D8492558DD for ; Tue, 30 Aug 2022 10:29:51 -0700 (PDT) Received: from pps.filterd (m0148460.ppops.net [127.0.0.1]) by mx0a-00082601.pphosted.com (8.17.1.5/8.17.1.5) with ESMTP id 27UGLbJG003220 for ; Tue, 30 Aug 2022 10:28:17 -0700 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fb.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : content-type; s=facebook; bh=Zu3eUoS3YFs3fC9lOiVyLm6LTawbtUPSRWdjUY2mbAE=; b=pPdxXzd9ZsNXtmKVNM+jnJkJ4u4sTQgU9tWJsJ7D21bueBPGS1dbEvh1sR9ObU6+NZeu q+cwZHfVq/IJD/Eev2p2EepuAWJlbD+7WV8ximMyLpl+eyYAFZRJJ6gpy/we5eVUtf0R AtYjd1tdWDrtqvv/ELzCCnFgUonjJMS/au8= Received: from maileast.thefacebook.com ([163.114.130.16]) by mx0a-00082601.pphosted.com (PPS) with ESMTPS id 3j9h5djqgr-2 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT) for ; Tue, 30 Aug 2022 10:28:17 -0700 Received: from twshared6324.05.ash7.facebook.com (2620:10d:c0a8:1b::d) by mail.thefacebook.com (2620:10d:c0a8:82::e) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2375.31; Tue, 30 Aug 2022 10:28:16 -0700 Received: by devbig077.ldc1.facebook.com (Postfix, from userid 158236) id DB31BCAD0772; Tue, 30 Aug 2022 10:28:09 -0700 (PDT) From: Dave Marchevsky To: CC: Alexei Starovoitov , Daniel Borkmann , Andrii Nakryiko , Kernel Team , Dave Marchevsky Subject: [RFCv2 PATCH bpf-next 05/18] libbpf: Add support for private BSS map section Date: Tue, 30 Aug 2022 10:27:46 -0700 Message-ID: <20220830172759.4069786-6-davemarchevsky@fb.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20220830172759.4069786-1-davemarchevsky@fb.com> References: <20220830172759.4069786-1-davemarchevsky@fb.com> MIME-Version: 1.0 X-FB-Internal: Safe X-Proofpoint-GUID: l-76MBjYaZfJ_YOK9YAFc_0aHakB766G X-Proofpoint-ORIG-GUID: l-76MBjYaZfJ_YOK9YAFc_0aHakB766G X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.895,Hydra:6.0.517,FMLib:17.11.122.1 definitions=2022-08-30_10,2022-08-30_01,2022-06-22_01 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org 
X-Patchwork-Delegate: bpf@iogearbox.net X-Patchwork-State: RFC Currently libbpf does not allow declaration of a struct bpf_spin_lock in global scope. Attempting to do so results in "failed to re-mmap" error, as .bss arraymap containing spinlock is not allowed to be mmap'd. This patch adds support for a .bss.private section. The maps contained in this section will not be mmaped into userspace by libbpf, nor will they be exposed via bpftool-generated skeleton. Intent here is to allow more natural programming pattern for global-scope spinlocks which will be used by rbtree locking mechanism in further patches in this series. [ RFC Notes: * Initially I called the section .bss.no_mmap, but the broader 'private' term better indicates that skeleton shouldn't expose these maps at all, IMO. * bpftool/gen.c's is_internal_mmapable_map function checks whether the map flags have BPF_F_MMAPABLE, so no bpftool changes were necessary to remove .bss.private maps from skeleton ] Signed-off-by: Dave Marchevsky --- tools/lib/bpf/libbpf.c | 65 ++++++++++++++++++++++++++++-------------- 1 file changed, 44 insertions(+), 21 deletions(-) diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c index 3f01f5cd8a4c..a6dd53e0c4b4 100644 --- a/tools/lib/bpf/libbpf.c +++ b/tools/lib/bpf/libbpf.c @@ -463,6 +463,7 @@ struct bpf_struct_ops { #define KCONFIG_SEC ".kconfig" #define KSYMS_SEC ".ksyms" #define STRUCT_OPS_SEC ".struct_ops" +#define BSS_SEC_PRIVATE ".bss.private" enum libbpf_map_type { LIBBPF_MAP_UNSPEC, @@ -576,6 +577,7 @@ enum sec_type { SEC_BSS, SEC_DATA, SEC_RODATA, + SEC_BSS_PRIVATE, }; struct elf_sec_desc { @@ -1578,7 +1580,8 @@ bpf_map_find_btf_info(struct bpf_object *obj, struct bpf_map *map); static int bpf_object__init_internal_map(struct bpf_object *obj, enum libbpf_map_type type, - const char *real_name, int sec_idx, void *data, size_t data_sz) + const char *real_name, int sec_idx, void *data, + size_t data_sz, bool do_mmap) { struct bpf_map_def *def; struct bpf_map *map; @@ -1606,27 +1609,31 @@ bpf_object__init_internal_map(struct bpf_object *obj, enum libbpf_map_type type, def->max_entries = 1; def->map_flags = type == LIBBPF_MAP_RODATA || type == LIBBPF_MAP_KCONFIG ? 
BPF_F_RDONLY_PROG : 0; - def->map_flags |= BPF_F_MMAPABLE; + if (do_mmap) + def->map_flags |= BPF_F_MMAPABLE; pr_debug("map '%s' (global data): at sec_idx %d, offset %zu, flags %x.\n", map->name, map->sec_idx, map->sec_offset, def->map_flags); - map->mmaped = mmap(NULL, bpf_map_mmap_sz(map), PROT_READ | PROT_WRITE, - MAP_SHARED | MAP_ANONYMOUS, -1, 0); - if (map->mmaped == MAP_FAILED) { - err = -errno; - map->mmaped = NULL; - pr_warn("failed to alloc map '%s' content buffer: %d\n", - map->name, err); - zfree(&map->real_name); - zfree(&map->name); - return err; + map->mmaped = NULL; + if (do_mmap) { + map->mmaped = mmap(NULL, bpf_map_mmap_sz(map), PROT_READ | PROT_WRITE, + MAP_SHARED | MAP_ANONYMOUS, -1, 0); + if (map->mmaped == MAP_FAILED) { + err = -errno; + map->mmaped = NULL; + pr_warn("failed to alloc map '%s' content buffer: %d\n", + map->name, err); + zfree(&map->real_name); + zfree(&map->name); + return err; + } } /* failures are fine because of maps like .rodata.str1.1 */ (void) bpf_map_find_btf_info(obj, map); - if (data) + if (do_mmap && data) memcpy(map->mmaped, data, data_sz); pr_debug("map %td is \"%s\"\n", map - obj->maps, map->name); @@ -1638,12 +1645,14 @@ static int bpf_object__init_global_data_maps(struct bpf_object *obj) struct elf_sec_desc *sec_desc; const char *sec_name; int err = 0, sec_idx; + bool do_mmap; /* * Populate obj->maps with libbpf internal maps. */ for (sec_idx = 1; sec_idx < obj->efile.sec_cnt; sec_idx++) { sec_desc = &obj->efile.secs[sec_idx]; + do_mmap = true; /* Skip recognized sections with size 0. */ if (sec_desc->data && sec_desc->data->d_size == 0) @@ -1655,7 +1664,8 @@ static int bpf_object__init_global_data_maps(struct bpf_object *obj) err = bpf_object__init_internal_map(obj, LIBBPF_MAP_DATA, sec_name, sec_idx, sec_desc->data->d_buf, - sec_desc->data->d_size); + sec_desc->data->d_size, + do_mmap); break; case SEC_RODATA: obj->has_rodata = true; @@ -1663,14 +1673,18 @@ static int bpf_object__init_global_data_maps(struct bpf_object *obj) err = bpf_object__init_internal_map(obj, LIBBPF_MAP_RODATA, sec_name, sec_idx, sec_desc->data->d_buf, - sec_desc->data->d_size); + sec_desc->data->d_size, + do_mmap); break; + case SEC_BSS_PRIVATE: + do_mmap = false; case SEC_BSS: sec_name = elf_sec_name(obj, elf_sec_by_idx(obj, sec_idx)); err = bpf_object__init_internal_map(obj, LIBBPF_MAP_BSS, sec_name, sec_idx, NULL, - sec_desc->data->d_size); + sec_desc->data->d_size, + do_mmap); break; default: /* skip */ @@ -1984,7 +1998,7 @@ static int bpf_object__init_kconfig_map(struct bpf_object *obj) map_sz = last_ext->kcfg.data_off + last_ext->kcfg.sz; err = bpf_object__init_internal_map(obj, LIBBPF_MAP_KCONFIG, ".kconfig", obj->efile.symbols_shndx, - NULL, map_sz); + NULL, map_sz, true); if (err) return err; @@ -3428,6 +3442,10 @@ static int bpf_object__elf_collect(struct bpf_object *obj) sec_desc->sec_type = SEC_BSS; sec_desc->shdr = sh; sec_desc->data = data; + } else if (sh->sh_type == SHT_NOBITS && strcmp(name, BSS_SEC_PRIVATE) == 0) { + sec_desc->sec_type = SEC_BSS_PRIVATE; + sec_desc->shdr = sh; + sec_desc->data = data; } else { pr_info("elf: skipping section(%d) %s (size %zu)\n", idx, name, (size_t)sh->sh_size); @@ -3890,6 +3908,7 @@ static bool bpf_object__shndx_is_data(const struct bpf_object *obj, case SEC_BSS: case SEC_DATA: case SEC_RODATA: + case SEC_BSS_PRIVATE: return true; default: return false; @@ -3909,6 +3928,7 @@ bpf_object__section_to_libbpf_map_type(const struct bpf_object *obj, int shndx) return LIBBPF_MAP_KCONFIG; switch 
(obj->efile.secs[shndx].sec_type) { + case SEC_BSS_PRIVATE: case SEC_BSS: return LIBBPF_MAP_BSS; case SEC_DATA: @@ -4889,16 +4909,19 @@ bpf_object__populate_internal_map(struct bpf_object *obj, struct bpf_map *map) { enum libbpf_map_type map_type = map->libbpf_type; char *cp, errmsg[STRERR_BUFSIZE]; - int err, zero = 0; + int err = 0, zero = 0; if (obj->gen_loader) { - bpf_gen__map_update_elem(obj->gen_loader, map - obj->maps, - map->mmaped, map->def.value_size); + if (map->mmaped) + bpf_gen__map_update_elem(obj->gen_loader, map - obj->maps, + map->mmaped, map->def.value_size); if (map_type == LIBBPF_MAP_RODATA || map_type == LIBBPF_MAP_KCONFIG) bpf_gen__map_freeze(obj->gen_loader, map - obj->maps); return 0; } - err = bpf_map_update_elem(map->fd, &zero, map->mmaped, 0); + + if (map->mmaped) + err = bpf_map_update_elem(map->fd, &zero, map->mmaped, 0); if (err) { err = -errno; cp = libbpf_strerror_r(err, errmsg, sizeof(errmsg)); From patchwork Tue Aug 30 17:27:47 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dave Marchevsky X-Patchwork-Id: 12959636 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 1BE67ECAAD8 for ; Tue, 30 Aug 2022 17:46:55 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231564AbiH3Rqx (ORCPT ); Tue, 30 Aug 2022 13:46:53 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:47552 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231691AbiH3RqY (ORCPT ); Tue, 30 Aug 2022 13:46:24 -0400 Received: from mx0b-00082601.pphosted.com (mx0b-00082601.pphosted.com [67.231.153.30]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id F151D1123B8 for ; Tue, 30 Aug 2022 10:42:53 -0700 (PDT) Received: from pps.filterd (m0148460.ppops.net [127.0.0.1]) by mx0a-00082601.pphosted.com (8.17.1.5/8.17.1.5) with ESMTP id 27UAp7Ua031207 for ; Tue, 30 Aug 2022 10:31:16 -0700 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fb.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : content-type; s=facebook; bh=S5kVZFZOHJILeUAI/zBt9pkwv3jAC1n2cVZ9GzheBLU=; b=mnr/mIIU3Pt8GPHQzTSt0jznC8iHtMGFmpuRJkUJI2b7PP0AzTFYq/F/Y/NyMK6TTZea dc6Gc/wFGKEp+fuN1O6Ofk6WvpNeiAXPS17BDwGcJj8MuDuQkdhvs02x9au93GY3zwDA DLd6EnBAqN8PoGp4Yub8APwXppwhE9+4nCs= Received: from maileast.thefacebook.com ([163.114.130.16]) by mx0a-00082601.pphosted.com (PPS) with ESMTPS id 3j9h5djr9u-9 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT) for ; Tue, 30 Aug 2022 10:31:16 -0700 Received: from twshared11415.03.ash7.facebook.com (2620:10d:c0a8:1b::d) by mail.thefacebook.com (2620:10d:c0a8:82::f) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2375.31; Tue, 30 Aug 2022 10:31:12 -0700 Received: by devbig077.ldc1.facebook.com (Postfix, from userid 158236) id F39E3CAD0775; Tue, 30 Aug 2022 10:28:09 -0700 (PDT) From: Dave Marchevsky To: CC: Alexei Starovoitov , Daniel Borkmann , Andrii Nakryiko , Kernel Team , Dave Marchevsky Subject: [RFCv2 PATCH bpf-next 06/18] bpf: Add bpf_spin_lock member to rbtree Date: Tue, 30 Aug 2022 10:27:47 -0700 Message-ID: <20220830172759.4069786-7-davemarchevsky@fb.com> X-Mailer: git-send-email 
2.30.2 In-Reply-To: <20220830172759.4069786-1-davemarchevsky@fb.com> References: <20220830172759.4069786-1-davemarchevsky@fb.com> MIME-Version: 1.0 X-FB-Internal: Safe X-Proofpoint-GUID: Cr--XODqY70Gf7S4Sg2Qms62rtOAF34J X-Proofpoint-ORIG-GUID: Cr--XODqY70Gf7S4Sg2Qms62rtOAF34J X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.895,Hydra:6.0.517,FMLib:17.11.122.1 definitions=2022-08-30_10,2022-08-30_01,2022-06-22_01 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net X-Patchwork-State: RFC This patch adds a struct bpf_spin_lock *lock member to bpf_rbtree, as well as a bpf_rbtree_get_lock helper which allows bpf programs to access the lock. Ideally the bpf_spin_lock would be created independently oustide of the tree and associated with it before the tree is used, either as part of map definition or via some call like rbtree_init(&rbtree, &lock). Doing this in an ergonomic way is proving harder than expected, so for now use this workaround. Why is creating the bpf_spin_lock independently and associating it with the tree preferable? Because we want to be able to transfer nodes between trees atomically, and for this to work need same lock associated with 2 trees. Further locking-related patches will make it possible for the lock to be used in BPF programs and add code which enforces that the lock is held when doing any operation on the tree. Signed-off-by: Dave Marchevsky --- include/uapi/linux/bpf.h | 7 +++++++ kernel/bpf/helpers.c | 3 +++ kernel/bpf/rbtree.c | 24 ++++++++++++++++++++++++ tools/include/uapi/linux/bpf.h | 7 +++++++ 4 files changed, 41 insertions(+) diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index 1af17b27d34f..06d71207de0b 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -5409,6 +5409,12 @@ union bpf_attr { * Return * 0 * + * void *bpf_rbtree_get_lock(struct bpf_map *map) + * Description + * Return the bpf_spin_lock associated with the rbtree + * + * Return + * Ptr to lock */ #define __BPF_FUNC_MAPPER(FN) \ FN(unspec), \ @@ -5625,6 +5631,7 @@ union bpf_attr { FN(rbtree_find), \ FN(rbtree_remove), \ FN(rbtree_free_node), \ + FN(rbtree_get_lock), \ /* */ /* integer value in 'imm' field of BPF_CALL instruction selects which helper diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c index d18d4d8ca1e2..ae974d0aa70d 100644 --- a/kernel/bpf/helpers.c +++ b/kernel/bpf/helpers.c @@ -1603,6 +1603,7 @@ const struct bpf_func_proto bpf_rbtree_add_proto __weak; const struct bpf_func_proto bpf_rbtree_find_proto __weak; const struct bpf_func_proto bpf_rbtree_remove_proto __weak; const struct bpf_func_proto bpf_rbtree_free_node_proto __weak; +const struct bpf_func_proto bpf_rbtree_get_lock_proto __weak; const struct bpf_func_proto * bpf_base_func_proto(enum bpf_func_id func_id) @@ -1704,6 +1705,8 @@ bpf_base_func_proto(enum bpf_func_id func_id) return &bpf_rbtree_remove_proto; case BPF_FUNC_rbtree_free_node: return &bpf_rbtree_free_node_proto; + case BPF_FUNC_rbtree_get_lock: + return &bpf_rbtree_get_lock_proto; default: break; } diff --git a/kernel/bpf/rbtree.c b/kernel/bpf/rbtree.c index 7d50574e4d57..0cc495b7cb26 100644 --- a/kernel/bpf/rbtree.c +++ b/kernel/bpf/rbtree.c @@ -10,6 +10,7 @@ struct bpf_rbtree { struct bpf_map map; struct rb_root_cached root; + struct bpf_spin_lock *lock; }; static int rbtree_map_alloc_check(union bpf_attr *attr) @@ -38,6 +39,14 @@ static struct bpf_map *rbtree_map_alloc(union bpf_attr *attr) tree->root = RB_ROOT_CACHED; 
bpf_map_init_from_attr(&tree->map, attr); + + tree->lock = bpf_map_kzalloc(&tree->map, sizeof(struct bpf_spin_lock), + GFP_KERNEL | __GFP_NOWARN); + if (!tree->lock) { + bpf_map_area_free(tree); + return ERR_PTR(-ENOMEM); + } + return &tree->map; } @@ -139,6 +148,7 @@ static void rbtree_map_free(struct bpf_map *map) bpf_rbtree_postorder_for_each_entry_safe(pos, n, &tree->root.rb_root) kfree(pos); + kfree(tree->lock); bpf_map_area_free(tree); } @@ -238,6 +248,20 @@ static int rbtree_map_get_next_key(struct bpf_map *map, void *key, return -ENOTSUPP; } +BPF_CALL_1(bpf_rbtree_get_lock, struct bpf_map *, map) +{ + struct bpf_rbtree *tree = container_of(map, struct bpf_rbtree, map); + + return (u64)tree->lock; +} + +const struct bpf_func_proto bpf_rbtree_get_lock_proto = { + .func = bpf_rbtree_get_lock, + .gpl_only = true, + .ret_type = RET_PTR_TO_MAP_VALUE, + .arg1_type = ARG_CONST_MAP_PTR, +}; + BTF_ID_LIST_SINGLE(bpf_rbtree_map_btf_ids, struct, bpf_rbtree) const struct bpf_map_ops rbtree_map_ops = { .map_meta_equal = bpf_map_meta_equal, diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h index 1af17b27d34f..06d71207de0b 100644 --- a/tools/include/uapi/linux/bpf.h +++ b/tools/include/uapi/linux/bpf.h @@ -5409,6 +5409,12 @@ union bpf_attr { * Return * 0 * + * void *bpf_rbtree_get_lock(struct bpf_map *map) + * Description + * Return the bpf_spin_lock associated with the rbtree + * + * Return + * Ptr to lock */ #define __BPF_FUNC_MAPPER(FN) \ FN(unspec), \ @@ -5625,6 +5631,7 @@ union bpf_attr { FN(rbtree_find), \ FN(rbtree_remove), \ FN(rbtree_free_node), \ + FN(rbtree_get_lock), \ /* */ /* integer value in 'imm' field of BPF_CALL instruction selects which helper From patchwork Tue Aug 30 17:27:48 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dave Marchevsky X-Patchwork-Id: 12959630 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 86BBEECAAD8 for ; Tue, 30 Aug 2022 17:39:41 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230398AbiH3Rjj (ORCPT ); Tue, 30 Aug 2022 13:39:39 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:46490 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232401AbiH3RjG (ORCPT ); Tue, 30 Aug 2022 13:39:06 -0400 Received: from mx0b-00082601.pphosted.com (mx0b-00082601.pphosted.com [67.231.153.30]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 92DDA136B20 for ; Tue, 30 Aug 2022 10:36:16 -0700 (PDT) Received: from pps.filterd (m0148460.ppops.net [127.0.0.1]) by mx0a-00082601.pphosted.com (8.17.1.5/8.17.1.5) with ESMTP id 27UAp7Ud031207 for ; Tue, 30 Aug 2022 10:31:17 -0700 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fb.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : content-type; s=facebook; bh=U3nZu1a49RK7Lk9vLSqUFpl1hmbrlApYlzGKZcCEpqA=; b=ec0TnLOYMtpYnRWhspHbg/zGU5KTqRY3fW2m7PlAtv1rQdZuPdKoDdMGRLWE2ZVy1qAc 9x35tEWqLX1a9R3tfq/KS6NHIVWYDHYzLsc67AFWahgdVzYs+PjRykYh7+JEHjt8Hwmv QnEdT32DV6X/YiuGCeSoTF0Pi5auuxjP0iY= Received: from maileast.thefacebook.com ([163.114.130.16]) by mx0a-00082601.pphosted.com (PPS) with ESMTPS id 3j9h5djr9u-12 (version=TLSv1.2 
cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT) for ; Tue, 30 Aug 2022 10:31:17 -0700 Received: from twshared11415.03.ash7.facebook.com (2620:10d:c0a8:1b::d) by mail.thefacebook.com (2620:10d:c0a8:82::f) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2375.31; Tue, 30 Aug 2022 10:31:12 -0700 Received: by devbig077.ldc1.facebook.com (Postfix, from userid 158236) id 200B1CAD0778; Tue, 30 Aug 2022 10:28:09 -0700 (PDT) From: Dave Marchevsky To: CC: Alexei Starovoitov , Daniel Borkmann , Andrii Nakryiko , Kernel Team , Dave Marchevsky Subject: [RFCv2 PATCH bpf-next 07/18] bpf: Add bpf_rbtree_{lock,unlock} helpers Date: Tue, 30 Aug 2022 10:27:48 -0700 Message-ID: <20220830172759.4069786-8-davemarchevsky@fb.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20220830172759.4069786-1-davemarchevsky@fb.com> References: <20220830172759.4069786-1-davemarchevsky@fb.com> MIME-Version: 1.0 X-FB-Internal: Safe X-Proofpoint-GUID: jMQaguVmKrUhBBvTyNFIixen1dlwksnA X-Proofpoint-ORIG-GUID: jMQaguVmKrUhBBvTyNFIixen1dlwksnA X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.895,Hydra:6.0.517,FMLib:17.11.122.1 definitions=2022-08-30_10,2022-08-30_01,2022-06-22_01 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net X-Patchwork-State: RFC These helpers are equivalent to bpf_spin_{lock,unlock}, but the verifier doesn't try to enforce that no helper calls occur when there's an active spin lock. [ TODO: Currently the verifier doesn't do _anything_ spinlock related when it sees one of these, including setting active_spin_lock. This is probably too lenient. Also, EXPORT_SYMBOL for internal lock helpers might not be the best code structure. ] Future patches will add enforcement of "rbtree helpers must always be called when lock is held" constraint. 
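For illustration, a rough BPF-side sketch of the usage these helpers are meant to enable. struct node_data, the node-allocation helper and the comparator shape come from other patches in this series and are only assumed here; the section name, includes and lack of error handling are illustrative, not part of this patch:

    #include "vmlinux.h"
    #include <bpf/bpf_helpers.h>

    /* Sketch only: BPF_MAP_TYPE_RBTREE and the alloc/add helpers are
     * introduced elsewhere in this series; prototypes are assumed to
     * come from a regenerated bpf_helper_defs.h.
     */
    struct node_data {
            struct rb_node node;    /* assumed first member; layout illustrative */
            __u32 key;
    };

    struct {
            __uint(type, BPF_MAP_TYPE_RBTREE);
            __type(value, struct node_data);
    } rbtree SEC(".maps");

    static bool node_less(struct rb_node *a, const struct rb_node *b)
    {
            return ((struct node_data *)a)->key < ((struct node_data *)b)->key;
    }

    SEC("tc")
    int add_one_node(void *ctx)
    {
            struct bpf_spin_lock *lock;
            struct node_data *node;

            node = bpf_rbtree_alloc_node(&rbtree, sizeof(*node));
            if (!node)
                    return 0;
            node->key = bpf_get_prandom_u32();

            lock = bpf_rbtree_get_lock(&rbtree);
            bpf_rbtree_lock(lock);
            /* Unlike bpf_spin_lock(), other helper calls are allowed here */
            bpf_rbtree_add(&rbtree, node, (void *)node_less);
            bpf_rbtree_unlock(lock);
            return 0;
    }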
Signed-off-by: Dave Marchevsky --- include/uapi/linux/bpf.h | 20 ++++++++++++++++++++ kernel/bpf/helpers.c | 12 ++++++++++-- kernel/bpf/rbtree.c | 29 +++++++++++++++++++++++++++++ kernel/bpf/verifier.c | 2 ++ tools/include/uapi/linux/bpf.h | 20 ++++++++++++++++++++ 5 files changed, 81 insertions(+), 2 deletions(-) diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index 06d71207de0b..f4c615fbf64f 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -5415,6 +5415,24 @@ union bpf_attr { * * Return * Ptr to lock + * + * void *bpf_rbtree_lock(struct bpf_spin_lock *lock) + * Description + * Like bpf_spin_lock helper, but use separate helper for now + * as we don't want this helper to have special meaning to the verifier + * so that we can do rbtree helper calls between rbtree_lock/unlock + * + * Return + * 0 + * + * void *bpf_rbtree_unlock(struct bpf_spin_lock *lock) + * Description + * Like bpf_spin_unlock helper, but use separate helper for now + * as we don't want this helper to have special meaning to the verifier + * so that we can do rbtree helper calls between rbtree_lock/unlock + * + * Return + * 0 */ #define __BPF_FUNC_MAPPER(FN) \ FN(unspec), \ @@ -5632,6 +5650,8 @@ union bpf_attr { FN(rbtree_remove), \ FN(rbtree_free_node), \ FN(rbtree_get_lock), \ + FN(rbtree_lock), \ + FN(rbtree_unlock), \ /* */ /* integer value in 'imm' field of BPF_CALL instruction selects which helper diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c index ae974d0aa70d..0ca5fed1013b 100644 --- a/kernel/bpf/helpers.c +++ b/kernel/bpf/helpers.c @@ -316,7 +316,7 @@ static inline void __bpf_spin_unlock(struct bpf_spin_lock *lock) static DEFINE_PER_CPU(unsigned long, irqsave_flags); -static inline void __bpf_spin_lock_irqsave(struct bpf_spin_lock *lock) +inline void __bpf_spin_lock_irqsave(struct bpf_spin_lock *lock) { unsigned long flags; @@ -324,6 +324,7 @@ static inline void __bpf_spin_lock_irqsave(struct bpf_spin_lock *lock) __bpf_spin_lock(lock); __this_cpu_write(irqsave_flags, flags); } +EXPORT_SYMBOL(__bpf_spin_lock_irqsave); notrace BPF_CALL_1(bpf_spin_lock, struct bpf_spin_lock *, lock) { @@ -338,7 +339,7 @@ const struct bpf_func_proto bpf_spin_lock_proto = { .arg1_type = ARG_PTR_TO_SPIN_LOCK, }; -static inline void __bpf_spin_unlock_irqrestore(struct bpf_spin_lock *lock) +inline void __bpf_spin_unlock_irqrestore(struct bpf_spin_lock *lock) { unsigned long flags; @@ -346,6 +347,7 @@ static inline void __bpf_spin_unlock_irqrestore(struct bpf_spin_lock *lock) __bpf_spin_unlock(lock); local_irq_restore(flags); } +EXPORT_SYMBOL(__bpf_spin_unlock_irqrestore); notrace BPF_CALL_1(bpf_spin_unlock, struct bpf_spin_lock *, lock) { @@ -1604,6 +1606,8 @@ const struct bpf_func_proto bpf_rbtree_find_proto __weak; const struct bpf_func_proto bpf_rbtree_remove_proto __weak; const struct bpf_func_proto bpf_rbtree_free_node_proto __weak; const struct bpf_func_proto bpf_rbtree_get_lock_proto __weak; +const struct bpf_func_proto bpf_rbtree_lock_proto __weak; +const struct bpf_func_proto bpf_rbtree_unlock_proto __weak; const struct bpf_func_proto * bpf_base_func_proto(enum bpf_func_id func_id) @@ -1707,6 +1711,10 @@ bpf_base_func_proto(enum bpf_func_id func_id) return &bpf_rbtree_free_node_proto; case BPF_FUNC_rbtree_get_lock: return &bpf_rbtree_get_lock_proto; + case BPF_FUNC_rbtree_lock: + return &bpf_rbtree_lock_proto; + case BPF_FUNC_rbtree_unlock: + return &bpf_rbtree_unlock_proto; default: break; } diff --git a/kernel/bpf/rbtree.c b/kernel/bpf/rbtree.c index 
0cc495b7cb26..641821ee1a7f 100644 --- a/kernel/bpf/rbtree.c +++ b/kernel/bpf/rbtree.c @@ -262,6 +262,35 @@ const struct bpf_func_proto bpf_rbtree_get_lock_proto = { .arg1_type = ARG_CONST_MAP_PTR, }; +extern void __bpf_spin_unlock_irqrestore(struct bpf_spin_lock *lock); +extern void __bpf_spin_lock_irqsave(struct bpf_spin_lock *lock); + +BPF_CALL_1(bpf_rbtree_lock, void *, lock) +{ + __bpf_spin_lock_irqsave((struct bpf_spin_lock *)lock); + return 0; +} + +const struct bpf_func_proto bpf_rbtree_lock_proto = { + .func = bpf_rbtree_lock, + .gpl_only = true, + .ret_type = RET_INTEGER, + .arg1_type = ARG_PTR_TO_SPIN_LOCK, +}; + +BPF_CALL_1(bpf_rbtree_unlock, void *, lock) +{ + __bpf_spin_unlock_irqrestore((struct bpf_spin_lock *)lock); + return 0; +} + +const struct bpf_func_proto bpf_rbtree_unlock_proto = { + .func = bpf_rbtree_unlock, + .gpl_only = true, + .ret_type = RET_INTEGER, + .arg1_type = ARG_PTR_TO_SPIN_LOCK, +}; + BTF_ID_LIST_SINGLE(bpf_rbtree_map_btf_ids, struct, bpf_rbtree) const struct bpf_map_ops rbtree_map_ops = { .map_meta_equal = bpf_map_meta_equal, diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index 0a2e958ddca8..b9e5d87fe323 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -6057,6 +6057,8 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 arg, } else if (meta->func_id == BPF_FUNC_spin_unlock) { if (process_spin_lock(env, regno, false)) return -EACCES; + } else if (meta->func_id == BPF_FUNC_rbtree_lock || + meta->func_id == BPF_FUNC_rbtree_unlock) { // Do nothing for now } else { verbose(env, "verifier internal error\n"); return -EFAULT; diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h index 06d71207de0b..f4c615fbf64f 100644 --- a/tools/include/uapi/linux/bpf.h +++ b/tools/include/uapi/linux/bpf.h @@ -5415,6 +5415,24 @@ union bpf_attr { * * Return * Ptr to lock + * + * void *bpf_rbtree_lock(struct bpf_spin_lock *lock) + * Description + * Like bpf_spin_lock helper, but use separate helper for now + * as we don't want this helper to have special meaning to the verifier + * so that we can do rbtree helper calls between rbtree_lock/unlock + * + * Return + * 0 + * + * void *bpf_rbtree_unlock(struct bpf_spin_lock *lock) + * Description + * Like bpf_spin_unlock helper, but use separate helper for now + * as we don't want this helper to have special meaning to the verifier + * so that we can do rbtree helper calls between rbtree_lock/unlock + * + * Return + * 0 */ #define __BPF_FUNC_MAPPER(FN) \ FN(unspec), \ @@ -5632,6 +5650,8 @@ union bpf_attr { FN(rbtree_remove), \ FN(rbtree_free_node), \ FN(rbtree_get_lock), \ + FN(rbtree_lock), \ + FN(rbtree_unlock), \ /* */ /* integer value in 'imm' field of BPF_CALL instruction selects which helper From patchwork Tue Aug 30 17:27:49 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dave Marchevsky X-Patchwork-Id: 12959656 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 54C46ECAAA1 for ; Tue, 30 Aug 2022 17:59:00 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230502AbiH3R67 (ORCPT ); Tue, 30 Aug 2022 13:58:59 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:42006 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by 
vger.kernel.org with ESMTP id S232222AbiH3R6m (ORCPT ); Tue, 30 Aug 2022 13:58:42 -0400 Received: from mx0a-00082601.pphosted.com (mx0a-00082601.pphosted.com [67.231.145.42]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 51FD213F23 for ; Tue, 30 Aug 2022 10:57:39 -0700 (PDT) Received: from pps.filterd (m0148461.ppops.net [127.0.0.1]) by mx0a-00082601.pphosted.com (8.17.1.5/8.17.1.5) with ESMTP id 27UG282E009482 for ; Tue, 30 Aug 2022 10:31:25 -0700 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fb.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : content-type; s=facebook; bh=YdWUK1qM5R14Nw6HCmmY+/qSJX79IZx3jRfdpH798jU=; b=gzpVcEBBPNErZgbWCqWc+3GVtPXfInhsTZJLYnLpTZd1ou4EMN6JqWJxUWKTJvUiC4v5 gWirXzBRJXIxSipxpMHhZHeYMFtpU0k9IEmeeknRFmeQIDPti1Jix8Op8BRlQfvYYtOe c+z9PN+dyYmVtlo5qIY8xD8xbtH3wfJeH2M= Received: from mail.thefacebook.com ([163.114.132.120]) by mx0a-00082601.pphosted.com (PPS) with ESMTPS id 3j94gye2d7-4 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT) for ; Tue, 30 Aug 2022 10:31:25 -0700 Received: from twshared10711.09.ash9.facebook.com (2620:10d:c085:108::8) by mail.thefacebook.com (2620:10d:c085:11d::6) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2375.31; Tue, 30 Aug 2022 10:31:19 -0700 Received: by devbig077.ldc1.facebook.com (Postfix, from userid 158236) id 34646CAD077A; Tue, 30 Aug 2022 10:28:09 -0700 (PDT) From: Dave Marchevsky To: CC: Alexei Starovoitov , Daniel Borkmann , Andrii Nakryiko , Kernel Team , Dave Marchevsky Subject: [RFCv2 PATCH bpf-next 08/18] bpf: Enforce spinlock hold for bpf_rbtree_{add,remove,find} Date: Tue, 30 Aug 2022 10:27:49 -0700 Message-ID: <20220830172759.4069786-9-davemarchevsky@fb.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20220830172759.4069786-1-davemarchevsky@fb.com> References: <20220830172759.4069786-1-davemarchevsky@fb.com> MIME-Version: 1.0 X-FB-Internal: Safe X-Proofpoint-ORIG-GUID: CNSOtJr5Dv2JQ9OeBpisxVUNAyb9MAxs X-Proofpoint-GUID: CNSOtJr5Dv2JQ9OeBpisxVUNAyb9MAxs X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.895,Hydra:6.0.517,FMLib:17.11.122.1 definitions=2022-08-30_10,2022-08-30_01,2022-06-22_01 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net X-Patchwork-State: RFC The bpf program calling these helpers must hold the spinlock associated with the rbtree map when doing so. Otherwise, a concurrent add/remove operation could corrupt the tree while {add,remove,find} are walking it with callback or pivoting after update. 
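Since the check is dynamic, a helper called without the lock does not trap the program; it returns NULL and leaves the tree untouched, so programs must treat a NULL return from add/find/remove as "refused or not found". A minimal sketch of that defensive pattern, reusing the map and node type assumed in the earlier usage sketch (node_cmp is an assumed rb_find-style comparator defined elsewhere):

    SEC("tc")
    int lookup_node(void *ctx)
    {
            struct bpf_spin_lock *lock = bpf_rbtree_get_lock(&rbtree);
            struct node_data *found;
            __u32 key = 42;

            bpf_rbtree_lock(lock);
            found = bpf_rbtree_find(&rbtree, &key, (void *)node_cmp);
            /* NULL covers both "key not present" and "helper refused
             * because the tree's lock was not held".
             */
            if (found)
                    bpf_printk("key %u present", found->key);
            bpf_rbtree_unlock(lock);
            return 0;
    }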
Signed-off-by: Dave Marchevsky --- kernel/bpf/rbtree.c | 14 ++++++++++++++ 1 file changed, 14 insertions(+) diff --git a/kernel/bpf/rbtree.c b/kernel/bpf/rbtree.c index 641821ee1a7f..85a1d35818d0 100644 --- a/kernel/bpf/rbtree.c +++ b/kernel/bpf/rbtree.c @@ -13,6 +13,11 @@ struct bpf_rbtree { struct bpf_spin_lock *lock; }; +static bool __rbtree_lock_held(struct bpf_rbtree *tree) +{ + return spin_is_locked((spinlock_t *)tree->lock); +} + static int rbtree_map_alloc_check(union bpf_attr *attr) { if (attr->max_entries || !attr->btf_value_type_id) @@ -92,6 +97,9 @@ BPF_CALL_3(bpf_rbtree_add, struct bpf_map *, map, void *, value, void *, cb) struct bpf_rbtree *tree = container_of(map, struct bpf_rbtree, map); struct rb_node *node = (struct rb_node *)value; + if (!__rbtree_lock_held(tree)) + return (u64)NULL; + if (WARN_ON_ONCE(!RB_EMPTY_NODE(node))) return (u64)NULL; @@ -114,6 +122,9 @@ BPF_CALL_3(bpf_rbtree_find, struct bpf_map *, map, void *, key, void *, cb) { struct bpf_rbtree *tree = container_of(map, struct bpf_rbtree, map); + if (!__rbtree_lock_held(tree)) + return (u64)NULL; + return (u64)rb_find(key, &tree->root.rb_root, (int (*)(const void *key, const struct rb_node *))cb); @@ -206,6 +217,9 @@ BPF_CALL_2(bpf_rbtree_remove, struct bpf_map *, map, void *, value) struct bpf_rbtree *tree = container_of(map, struct bpf_rbtree, map); struct rb_node *node = (struct rb_node *)value; + if (!__rbtree_lock_held(tree)) + return (u64)NULL; + if (WARN_ON_ONCE(RB_EMPTY_NODE(node))) return (u64)NULL; From patchwork Tue Aug 30 17:27:50 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dave Marchevsky X-Patchwork-Id: 12959631 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 32696ECAAD1 for ; Tue, 30 Aug 2022 17:40:26 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232131AbiH3RkY (ORCPT ); Tue, 30 Aug 2022 13:40:24 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:46674 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232139AbiH3Rj6 (ORCPT ); Tue, 30 Aug 2022 13:39:58 -0400 Received: from mx0a-00082601.pphosted.com (mx0b-00082601.pphosted.com [67.231.153.30]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id CB43B8B2E2 for ; Tue, 30 Aug 2022 10:37:13 -0700 (PDT) Received: from pps.filterd (m0089730.ppops.net [127.0.0.1]) by m0089730.ppops.net (8.17.1.5/8.17.1.5) with ESMTP id 27UBSIJ3010941 for ; Tue, 30 Aug 2022 10:35:12 -0700 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fb.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : content-type; s=facebook; bh=8N08rjWdYAwkipbOIudqIE5mbw3dqNFIvt/ESVcSr+g=; b=npAN54olPuuNpxLR0Hj+eGOTb/pHMOF/QAG0+uYH1n4pa40uVG8H1R017U9S6v8yW7DU 2XMeMAvf8ObUEKL5H6uSEs42L/rnuJy6h8G057GmR3wAsLoRpeLqEzb7Uc+WwNgsYjbH 4fIrrwLLXhBxSJMWR8lFYmgcKjvvpLBju1g= Received: from mail.thefacebook.com ([163.114.132.120]) by m0089730.ppops.net (PPS) with ESMTPS id 3j9hpwjmcg-2 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT) for ; Tue, 30 Aug 2022 10:35:12 -0700 Received: from twshared6324.05.ash7.facebook.com (2620:10d:c085:108::8) by mail.thefacebook.com (2620:10d:c085:11d::4) with Microsoft SMTP Server 
(version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2375.31; Tue, 30 Aug 2022 10:35:11 -0700 Received: by devbig077.ldc1.facebook.com (Postfix, from userid 158236) id 3E8E2CAD077C; Tue, 30 Aug 2022 10:28:09 -0700 (PDT) From: Dave Marchevsky To: CC: Alexei Starovoitov , Daniel Borkmann , Andrii Nakryiko , Kernel Team , Dave Marchevsky Subject: [RFCv2 PATCH bpf-next 09/18] bpf: Support declarative association of lock with rbtree map Date: Tue, 30 Aug 2022 10:27:50 -0700 Message-ID: <20220830172759.4069786-10-davemarchevsky@fb.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20220830172759.4069786-1-davemarchevsky@fb.com> References: <20220830172759.4069786-1-davemarchevsky@fb.com> MIME-Version: 1.0 X-FB-Internal: Safe X-Proofpoint-ORIG-GUID: Hjxg4e1Ny2uCNQ6YKUpKKOlKdxQluVbN X-Proofpoint-GUID: Hjxg4e1Ny2uCNQ6YKUpKKOlKdxQluVbN X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.895,Hydra:6.0.517,FMLib:17.11.122.1 definitions=2022-08-30_10,2022-08-30_01,2022-06-22_01 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net X-Patchwork-State: RFC This patch adds support for association of a bpf_spin_lock in an internal arraymap with an rbtree, using the following pattern: struct bpf_spin_lock rbtree_lock SEC(".bss.private"); struct { __uint(type, BPF_MAP_TYPE_RBTREE); __type(value, struct node_data); __array(lock, struct bpf_spin_lock); } rbtree SEC(".maps") = { .lock = { [0] = &rbtree_lock, }, }; There are a few benefits of this pattern over existing "init lock as part of map, use bpf_rbtree_get_lock" logic: * Multiple rbtrees, potentially in different compilation units, may share the same lock * Lock lifetime does not need to be tied to map lifetime, aside from being strictly greater than map's lifetime * Can move from bpf_rbtree_{lock,unlock} to more generic bpf_{lock,unlock} helpers while still retaining static verification of locking behavior * struct bpf_rbtree still keeps lock pointer and this declarative association guarantees that the pointer address is known at rbtree map creation time Implementation notes: The mechanics of declarative lock association are heavily inspired by declarative map-in-map inner map definition using __array(values,... . Similarly to map-in-map "values", libbpf's bpf_map init_slots is used to hold the target map, which in the above example is the .bss.private internal arraymap. However, unlike "values" inner maps, the lock is a map value within this map, so it's necessary to also pass the offset of the lock symbol within the internal arraymap. No uapi changes are necessary as existing BPF_MAP_CREATE map_extra is used to pass the pair (fd of lock map, offset of symbol within lock map) to the kernel when creating the rbtree map. rbtree_map_alloc can then use this pair to find the address of the lock. Logic here is equivalent to verifier's rewrite of BPF_PSEUDO_MAP_FD in check_ld_imm and resolve_pseudo_ldimm64 - get actual struct bpf_map using fd, get address of map's value region using map_direct_value_addr, add offset to get actual lock addr. This does introduce a dependency on the internal map containing the lock ("lock map") on both the libbpf- and kernel-side. For the former, it's now necessary to init internal global data maps before user btf maps, as the fd of the lock map must be known when data relo on rbtree map is performed. For the latter, rbtree now holds refcount to the lock map to ensure it cannot be freed until after the rbtree map is. 
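To make the encoding explicit: the pair travels in the existing 64-bit map_extra field, with the lock map's fd in the upper 32 bits and the offset of the lock symbol within that map's value in the lower 32 bits. The two helpers below do not exist in the patch; they merely restate, under assumed names, what libbpf's map-creation path and rbtree_map_alloc() do inline:

    #include <linux/types.h>

    /* upper 32 bits: fd of the map holding the lock (the ".bss.private" arraymap)
     * lower 32 bits: offset of the lock symbol within that map's value
     */
    static inline __u64 pack_lock_ref(int lock_map_fd, __u32 lock_off)
    {
            return ((__u64)lock_map_fd << 32) | lock_off;
    }

    static inline void unpack_lock_ref(__u64 map_extra, __u32 *lock_map_fd,
                                       __u32 *lock_off)
    {
            *lock_map_fd = (__u32)(map_extra >> 32);
            *lock_off = (__u32)map_extra;
    }

The kernel side then resolves the unpacked pair to an address via the lock map's map_direct_value_addr() callback plus the offset.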
[ RFC NOTE: This patch doesn't change verifier behavior, it's still doing dynamic lock checking. Further patches in the series change this behavior. This patch and the rest of the locking patches should be squashed, leaving in this state for now to make it easier to include or toss things piecemeal after feedback ] Signed-off-by: Dave Marchevsky --- kernel/bpf/rbtree.c | 47 +++++++++++++++++--- kernel/bpf/syscall.c | 8 +++- tools/lib/bpf/libbpf.c | 99 +++++++++++++++++++++++++++++++++++++----- 3 files changed, 136 insertions(+), 18 deletions(-) diff --git a/kernel/bpf/rbtree.c b/kernel/bpf/rbtree.c index 85a1d35818d0..c61662822511 100644 --- a/kernel/bpf/rbtree.c +++ b/kernel/bpf/rbtree.c @@ -11,6 +11,7 @@ struct bpf_rbtree { struct bpf_map map; struct rb_root_cached root; struct bpf_spin_lock *lock; + struct bpf_map *lock_map; }; static bool __rbtree_lock_held(struct bpf_rbtree *tree) @@ -26,10 +27,22 @@ static int rbtree_map_alloc_check(union bpf_attr *attr) return 0; } +static void __rbtree_map_free(struct bpf_rbtree *tree) +{ + if (tree->lock_map) + bpf_map_put(tree->lock_map); + else if (tree->lock) + kfree(tree->lock); + bpf_map_area_free(tree); +} + static struct bpf_map *rbtree_map_alloc(union bpf_attr *attr) { + u32 lock_map_ufd, lock_map_offset; struct bpf_rbtree *tree; + u64 lock_map_addr; int numa_node; + int err; if (!bpf_capable()) return ERR_PTR(-EPERM); @@ -45,14 +58,35 @@ static struct bpf_map *rbtree_map_alloc(union bpf_attr *attr) tree->root = RB_ROOT_CACHED; bpf_map_init_from_attr(&tree->map, attr); - tree->lock = bpf_map_kzalloc(&tree->map, sizeof(struct bpf_spin_lock), - GFP_KERNEL | __GFP_NOWARN); - if (!tree->lock) { - bpf_map_area_free(tree); - return ERR_PTR(-ENOMEM); + if (!attr->map_extra) { + tree->lock = bpf_map_kzalloc(&tree->map, sizeof(struct bpf_spin_lock), + GFP_KERNEL | __GFP_NOWARN); + if (!tree->lock) { + err = -ENOMEM; + goto err_free; + } + } else { + lock_map_ufd = (u32)(attr->map_extra >> 32); + lock_map_offset = (u32)attr->map_extra; + tree->lock_map = bpf_map_get(lock_map_ufd); + if (IS_ERR(tree->lock_map) || !tree->lock_map->ops->map_direct_value_addr) { + err = PTR_ERR(tree->lock_map); + tree->lock_map = NULL; + goto err_free; + } + + err = tree->lock_map->ops->map_direct_value_addr(tree->lock_map, &lock_map_addr, + lock_map_offset); + if (err) + goto err_free; + + tree->lock = (struct bpf_spin_lock *)(lock_map_addr + lock_map_offset); } return &tree->map; +err_free: + __rbtree_map_free(tree); + return ERR_PTR(err); } static struct rb_node *rbtree_map_alloc_node(struct bpf_map *map, size_t sz) @@ -159,8 +193,7 @@ static void rbtree_map_free(struct bpf_map *map) bpf_rbtree_postorder_for_each_entry_safe(pos, n, &tree->root.rb_root) kfree(pos); - kfree(tree->lock); - bpf_map_area_free(tree); + __rbtree_map_free(tree); } static int rbtree_map_check_btf(const struct bpf_map *map, diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c index 3947fbd137af..fa1220394462 100644 --- a/kernel/bpf/syscall.c +++ b/kernel/bpf/syscall.c @@ -1066,6 +1066,12 @@ static int map_check_btf(struct bpf_map *map, const struct btf *btf, return ret; } +static bool map_uses_map_extra(enum bpf_map_type type) +{ + return type == BPF_MAP_TYPE_BLOOM_FILTER || + type == BPF_MAP_TYPE_RBTREE; +} + #define BPF_MAP_CREATE_LAST_FIELD map_extra /* called via syscall */ static int map_create(union bpf_attr *attr) @@ -1087,7 +1093,7 @@ static int map_create(union bpf_attr *attr) return -EINVAL; } - if (attr->map_type != BPF_MAP_TYPE_BLOOM_FILTER && + if 
(!map_uses_map_extra(attr->map_type) && attr->map_extra != 0) return -EINVAL; diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c index a6dd53e0c4b4..10c840137bac 100644 --- a/tools/lib/bpf/libbpf.c +++ b/tools/lib/bpf/libbpf.c @@ -1429,6 +1429,11 @@ bpf_object__init_kversion(struct bpf_object *obj, void *data, size_t size) return 0; } +static bool bpf_map_type__uses_lock_def(enum bpf_map_type type) +{ + return type == BPF_MAP_TYPE_RBTREE; +} + static bool bpf_map_type__is_map_in_map(enum bpf_map_type type) { if (type == BPF_MAP_TYPE_ARRAY_OF_MAPS || @@ -1517,6 +1522,16 @@ static size_t bpf_map_mmap_sz(const struct bpf_map *map) return map_sz; } +static bool internal_map_in_custom_section(const char *real_name) +{ + if (strchr(real_name + 1, '.') != NULL) { + if (strcmp(real_name, BSS_SEC_PRIVATE) == 0) + return false; + return true; + } + return false; +} + static char *internal_map_name(struct bpf_object *obj, const char *real_name) { char map_name[BPF_OBJ_NAME_LEN], *p; @@ -1559,7 +1574,7 @@ static char *internal_map_name(struct bpf_object *obj, const char *real_name) sfx_len = BPF_OBJ_NAME_LEN - 1; /* if there are two or more dots in map name, it's a custom dot map */ - if (strchr(real_name + 1, '.') != NULL) + if (internal_map_in_custom_section(real_name)) pfx_len = 0; else pfx_len = min((size_t)BPF_OBJ_NAME_LEN - sfx_len - 1, strlen(obj->name)); @@ -2331,10 +2346,23 @@ int parse_btf_map_def(const char *map_name, struct btf *btf, } else if (strcmp(name, "map_extra") == 0) { __u32 map_extra; + if (bpf_map_type__uses_lock_def(map_def->map_type)) { + pr_warn("map '%s': can't set map_extra for map using 'lock' def.\n", + map_name); + return -EINVAL; + } + if (!get_map_field_int(map_name, btf, m, &map_extra)) return -EINVAL; map_def->map_extra = map_extra; map_def->parts |= MAP_DEF_MAP_EXTRA; + } else if (strcmp(name, "lock") == 0) { + if (!bpf_map_type__uses_lock_def(map_def->map_type)) { + pr_warn("map '%s': can't set 'lock' for map.\n", map_name); + return -ENOTSUP; + } + /* TODO: More sanity checking + */ } else { if (strict) { pr_warn("map '%s': unknown field '%s'.\n", map_name, name); @@ -2603,8 +2631,8 @@ static int bpf_object__init_maps(struct bpf_object *obj, strict = !OPTS_GET(opts, relaxed_maps, false); pin_root_path = OPTS_GET(opts, pin_root_path, NULL); - err = err ?: bpf_object__init_user_btf_maps(obj, strict, pin_root_path); err = err ?: bpf_object__init_global_data_maps(obj); + err = err ?: bpf_object__init_user_btf_maps(obj, strict, pin_root_path); err = err ?: bpf_object__init_kconfig_map(obj); err = err ?: bpf_object__init_struct_ops_maps(obj); @@ -4865,6 +4893,25 @@ static bool map_is_reuse_compat(const struct bpf_map *map, int map_fd) map_info.map_extra == map->map_extra); } +static struct bpf_map *find_internal_map_by_shndx(struct bpf_object *obj, + int shndx) +{ + struct bpf_map *map; + int i; + + for (i = 0; i < obj->nr_maps; i++) { + map = &obj->maps[i]; + + if (!bpf_map__is_internal(map)) + continue; + + if (map->sec_idx == shndx) + return map; + } + + return NULL; +} + static int bpf_object__reuse_map(struct bpf_map *map) { @@ -4981,6 +5028,19 @@ static int bpf_object__create_map(struct bpf_object *obj, struct bpf_map *map, b } if (map->inner_map_fd >= 0) create_attr.inner_map_fd = map->inner_map_fd; + } else if (bpf_map_type__uses_lock_def(def->type)) { + if (map->init_slots_sz != 1) { + pr_warn("map '%s': expecting single lock def, actual count %d\n", + map->name, map->init_slots_sz); + return -EINVAL; + } + + if (bpf_map__fd(map->init_slots[0]) < 0) { 
+ pr_warn("map '%s': failed to find lock map fd\n", map->name); + return -EINVAL; + } + + create_attr.map_extra |= (__u64)bpf_map__fd(map->init_slots[0]) << 32; } switch (def->type) { @@ -5229,8 +5289,7 @@ bpf_object__create_maps(struct bpf_object *obj) goto err_out; } } - - if (map->init_slots_sz && map->def.type != BPF_MAP_TYPE_PROG_ARRAY) { + if (map->init_slots_sz && bpf_map_type__is_map_in_map(map->def.type)) { err = init_map_in_map_slots(obj, map); if (err < 0) { zclose(map->fd); @@ -6424,9 +6483,9 @@ static int bpf_object__collect_map_relos(struct bpf_object *obj, const struct btf_type *sec, *var, *def; struct bpf_map *map = NULL, *targ_map = NULL; struct bpf_program *targ_prog = NULL; - bool is_prog_array, is_map_in_map; const struct btf_member *member; const char *name, *mname, *type; + bool is_prog_array; unsigned int moff; Elf64_Sym *sym; Elf64_Rel *rel; @@ -6474,10 +6533,12 @@ static int bpf_object__collect_map_relos(struct bpf_object *obj, return -EINVAL; } - is_map_in_map = bpf_map_type__is_map_in_map(map->def.type); + /* PROG_ARRAY passes prog pointers using init_slots, other map + * types pass map pointers + */ is_prog_array = map->def.type == BPF_MAP_TYPE_PROG_ARRAY; - type = is_map_in_map ? "map" : "prog"; - if (is_map_in_map) { + type = is_prog_array ? "prog" : "map"; + if (bpf_map_type__is_map_in_map(map->def.type)) { if (sym->st_shndx != obj->efile.btf_maps_shndx) { pr_warn(".maps relo #%d: '%s' isn't a BTF-defined map\n", i, name); @@ -6509,6 +6570,24 @@ static int bpf_object__collect_map_relos(struct bpf_object *obj, i, name); return -LIBBPF_ERRNO__RELOC; } + } else if (bpf_map_type__uses_lock_def(map->def.type)) { + targ_map = find_internal_map_by_shndx(obj, sym->st_shndx); + if (!targ_map) { + pr_warn(".maps relo #%d: '%s' isn't a valid map reference\n", + i, name); + return -LIBBPF_ERRNO__RELOC; + } + + /* This shouldn't happen, check in parse_btf_map_def + * should catch this, but to be safe let's prevent + * map_extra overwrite + */ + if (map->map_extra) { + pr_warn(".maps rbtree relo #%d: map '%s' has ", i, map->name); + pr_warn("map_extra, can't relo lock, internal error.\n"); + return -EINVAL; + } + map->map_extra = sym->st_value; } else { return -EINVAL; } @@ -6519,7 +6598,7 @@ static int bpf_object__collect_map_relos(struct bpf_object *obj, return -EINVAL; member = btf_members(def) + btf_vlen(def) - 1; mname = btf__name_by_offset(obj->btf, member->name_off); - if (strcmp(mname, "values")) + if (strcmp(mname, "values") && strcmp(mname, "lock")) return -EINVAL; moff = btf_member_bit_offset(def, btf_vlen(def) - 1) / 8; @@ -6543,7 +6622,7 @@ static int bpf_object__collect_map_relos(struct bpf_object *obj, (new_sz - map->init_slots_sz) * host_ptr_sz); map->init_slots_sz = new_sz; } - map->init_slots[moff] = is_map_in_map ? (void *)targ_map : (void *)targ_prog; + map->init_slots[moff] = is_prog_array ? 
(void *)targ_prog : (void *)targ_map; pr_debug(".maps relo #%d: map '%s' slot [%d] points to %s '%s'\n", i, map->name, moff, type, name); From patchwork Tue Aug 30 17:27:51 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dave Marchevsky X-Patchwork-Id: 12959646 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 1E8F1ECAAA1 for ; Tue, 30 Aug 2022 17:53:01 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229902AbiH3Rw7 (ORCPT ); Tue, 30 Aug 2022 13:52:59 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:55200 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231808AbiH3Rwn (ORCPT ); Tue, 30 Aug 2022 13:52:43 -0400 Received: from mx0a-00082601.pphosted.com (mx0a-00082601.pphosted.com [67.231.145.42]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id EBE02A9C38 for ; Tue, 30 Aug 2022 10:49:19 -0700 (PDT) Received: from pps.filterd (m0109333.ppops.net [127.0.0.1]) by mx0a-00082601.pphosted.com (8.17.1.5/8.17.1.5) with ESMTP id 27UFrVfi007679 for ; Tue, 30 Aug 2022 10:35:15 -0700 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fb.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : content-type; s=facebook; bh=TPNw2gPY2Pxjdru4n0HetYVA0UxIqST4v7qkVNdkt9E=; b=iJMeQo+IMZ3aLt8OE2B+VMfEKe3BSbFv/53LjLDs5PNoS8X/3pFcAsFj8yOjVW/gnROL FqdKBOH6zO+nxY5nDBjynME4mrE6mEGf6zAvqxufhzF+g7DUa2bo92D7OVIdqGnpTPlE 4oLQ/3GNl7zkJSkj+xkRc61kXuROGEmPZ1A= Received: from mail.thefacebook.com ([163.114.132.120]) by mx0a-00082601.pphosted.com (PPS) with ESMTPS id 3j9e9yk9df-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT) for ; Tue, 30 Aug 2022 10:35:15 -0700 Received: from twshared10711.09.ash9.facebook.com (2620:10d:c085:208::f) by mail.thefacebook.com (2620:10d:c085:11d::4) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2375.31; Tue, 30 Aug 2022 10:35:14 -0700 Received: by devbig077.ldc1.facebook.com (Postfix, from userid 158236) id 5F1B3CAD0781; Tue, 30 Aug 2022 10:28:10 -0700 (PDT) From: Dave Marchevsky To: CC: Alexei Starovoitov , Daniel Borkmann , Andrii Nakryiko , Kernel Team , Dave Marchevsky Subject: [RFCv2 PATCH bpf-next 10/18] bpf: Verifier tracking of rbtree_spin_lock held Date: Tue, 30 Aug 2022 10:27:51 -0700 Message-ID: <20220830172759.4069786-11-davemarchevsky@fb.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20220830172759.4069786-1-davemarchevsky@fb.com> References: <20220830172759.4069786-1-davemarchevsky@fb.com> MIME-Version: 1.0 X-FB-Internal: Safe X-Proofpoint-ORIG-GUID: ebW7U4wkAjWabDPwk_GD9Ne9aMa0bLPq X-Proofpoint-GUID: ebW7U4wkAjWabDPwk_GD9Ne9aMa0bLPq X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.895,Hydra:6.0.517,FMLib:17.11.122.1 definitions=2022-08-30_10,2022-08-30_01,2022-06-22_01 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net X-Patchwork-State: RFC This patch teaches the verifier that rbtree_{lock,unlock} interact with locks and eases use of rbtree_lock(&some_lock) where &some_lock is in a global internal map. 
The verifier now tracks lock id for rbtree_{lock,unlock} and understands that lock can / must be held when calling various other rbtree helpers. Logic is also added to ease this pattern: /* internal map */ struct bpf_spin_lock lock SEC(".bss.private"); /* In bpf prog */ rbtree_lock(&lock); /* ... use all registers for other work */ rbtree_unlock(&lock); In the above example, the prog compiles down to something like: r1 = 0x12345 call rbtree_lock /* begin other work r1 = some_unrelated_value r2 = r1 or similar never happens */ r1 = 0x12345 call rbtree_unlock Each time "r1 = 0x12345" happens, verifier's check_ld_imm will assign a new id to the lock, which will result in it later complaining that the incorrect lock is being unlocked when checking rbtree_unlock. To help with this pattern, bpf_verifier_state now has a maybe_active_spin_lock_addr field. If this field is nonzero and bpf_verifier_state's active_spin_lock is also nonzero, then maybe_active_spin_lock_addr contains the address of the active spin lock (corresponding to active_spin_lock's id). This allows the verifier to avoid assigning a new lock id when it sees the second "r1 = 0x12345", since it can recognize that the address matches an existing lock id. [ RFC Notes: * rbtree_process_spin_lock should be merged w/ normal process_spin_lock, same with rbtree_lock and normal lock helpers. Left separate for now to highlight the logic differences. * The hacky maybe_active_spin_lock_addr logic can be improved by adding support to a custom .lock section similar to existing use of .bss.private. The new section type would function like .bss.private, but the verifier would know that locks in .lock are likely to be used like bpf_spin_lock(&lock), and could track the address of each map value for deduping, instead of just tracking single address. For multiple-lock scenario this is probably necessary. ] Signed-off-by: Dave Marchevsky --- include/linux/bpf.h | 2 + include/linux/bpf_verifier.h | 1 + kernel/bpf/rbtree.c | 2 +- kernel/bpf/verifier.c | 136 ++++++++++++++++++++++++++++++++--- 4 files changed, 129 insertions(+), 12 deletions(-) diff --git a/include/linux/bpf.h b/include/linux/bpf.h index b4a44ffb0d6c..d6458aa7b79c 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -497,6 +497,7 @@ enum bpf_return_type { RET_PTR_TO_ALLOC_MEM, /* returns a pointer to dynamically allocated memory */ RET_PTR_TO_MEM_OR_BTF_ID, /* returns a pointer to a valid memory or a btf_id */ RET_PTR_TO_BTF_ID, /* returns a pointer to a btf_id */ + RET_PTR_TO_SPIN_LOCK, /* returns a pointer to a struct bpf_spin_lock */ __BPF_RET_TYPE_MAX, /* Extended ret_types. */ @@ -612,6 +613,7 @@ enum bpf_reg_type { PTR_TO_MEM, /* reg points to valid memory region */ PTR_TO_BUF, /* reg points to a read/write buffer */ PTR_TO_FUNC, /* reg points to a bpf program function */ + PTR_TO_SPIN_LOCK, /* reg points to a struct bpf_spin_lock */ __BPF_REG_TYPE_MAX, /* Extended reg_types. 
*/ diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h index 9c017575c034..f81638844a4d 100644 --- a/include/linux/bpf_verifier.h +++ b/include/linux/bpf_verifier.h @@ -313,6 +313,7 @@ struct bpf_verifier_state { u32 insn_idx; u32 curframe; u32 active_spin_lock; + void *maybe_active_spin_lock_addr; bool speculative; /* first and last insn idx of this verifier state */ diff --git a/kernel/bpf/rbtree.c b/kernel/bpf/rbtree.c index c61662822511..0821e841a518 100644 --- a/kernel/bpf/rbtree.c +++ b/kernel/bpf/rbtree.c @@ -305,7 +305,7 @@ BPF_CALL_1(bpf_rbtree_get_lock, struct bpf_map *, map) const struct bpf_func_proto bpf_rbtree_get_lock_proto = { .func = bpf_rbtree_get_lock, .gpl_only = true, - .ret_type = RET_PTR_TO_MAP_VALUE, + .ret_type = RET_PTR_TO_SPIN_LOCK, .arg1_type = ARG_CONST_MAP_PTR, }; diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index b9e5d87fe323..f8ba381f1327 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -452,8 +452,9 @@ static bool reg_type_not_null(enum bpf_reg_type type) static bool reg_may_point_to_spin_lock(const struct bpf_reg_state *reg) { - return reg->type == PTR_TO_MAP_VALUE && - map_value_has_spin_lock(reg->map_ptr); + return (reg->type == PTR_TO_MAP_VALUE && + map_value_has_spin_lock(reg->map_ptr)) || + reg->type == PTR_TO_SPIN_LOCK; } static bool reg_type_may_be_refcounted_or_null(enum bpf_reg_type type) @@ -507,6 +508,34 @@ static bool is_ptr_cast_function(enum bpf_func_id func_id) func_id == BPF_FUNC_skc_to_tcp_request_sock; } +/* These functions can only be called when spinlock associated with rbtree + * is held. If they have a callback argument, that callback is not required + * to release active_spin_lock before exiting + */ +static bool is_rbtree_lock_required_function(enum bpf_func_id func_id) +{ + return func_id == BPF_FUNC_rbtree_add || + func_id == BPF_FUNC_rbtree_remove || + func_id == BPF_FUNC_rbtree_find || + func_id == BPF_FUNC_rbtree_unlock; +} + +/* These functions are OK to call when spinlock associated with rbtree + * is held. 
+ */ +static bool is_rbtree_lock_ok_function(enum bpf_func_id func_id) +{ + return func_id == BPF_FUNC_rbtree_alloc_node || + func_id == BPF_FUNC_rbtree_free_node || + is_rbtree_lock_required_function(func_id); +} + +static bool is_lock_allowed_function(enum bpf_func_id func_id) +{ + return func_id == BPF_FUNC_spin_unlock || + is_rbtree_lock_ok_function(func_id); +} + static bool is_dynptr_ref_function(enum bpf_func_id func_id) { return func_id == BPF_FUNC_dynptr_data; @@ -579,6 +608,7 @@ static const char *reg_type_str(struct bpf_verifier_env *env, [PTR_TO_BUF] = "buf", [PTR_TO_FUNC] = "func", [PTR_TO_MAP_KEY] = "map_key", + [PTR_TO_SPIN_LOCK] = "spin_lock", }; if (type & PTR_MAYBE_NULL) { @@ -1199,6 +1229,7 @@ static int copy_verifier_state(struct bpf_verifier_state *dst_state, dst_state->speculative = src->speculative; dst_state->curframe = src->curframe; dst_state->active_spin_lock = src->active_spin_lock; + dst_state->maybe_active_spin_lock_addr = src->maybe_active_spin_lock_addr; dst_state->branches = src->branches; dst_state->parent = src->parent; dst_state->first_insn_idx = src->first_insn_idx; @@ -5471,6 +5502,35 @@ static int process_spin_lock(struct bpf_verifier_env *env, int regno, return -EINVAL; } cur->active_spin_lock = 0; + cur->maybe_active_spin_lock_addr = 0; + } + return 0; +} + +static int rbtree_process_spin_lock(struct bpf_verifier_env *env, int regno, + bool is_lock) +{ + struct bpf_reg_state *regs = cur_regs(env), *reg = ®s[regno]; + struct bpf_verifier_state *cur = env->cur_state; + + if (is_lock) { + if (cur->active_spin_lock) { + verbose(env, + "Locking two bpf_spin_locks are not allowed\n"); + return -EINVAL; + } + cur->active_spin_lock = reg->id; + } else { + if (!cur->active_spin_lock) { + verbose(env, "rbtree_spin_unlock without taking a lock\n"); + return -EINVAL; + } + if (cur->active_spin_lock != reg->id) { + verbose(env, "rbtree_spin_unlock of different lock\n"); + return -EINVAL; + } + cur->active_spin_lock = 0; + cur->maybe_active_spin_lock_addr = 0; } return 0; } @@ -5686,12 +5746,18 @@ static const struct bpf_reg_types int_ptr_types = { }, }; +static const struct bpf_reg_types spin_lock_types = { + .types = { + PTR_TO_MAP_VALUE, + PTR_TO_SPIN_LOCK + }, +}; + static const struct bpf_reg_types fullsock_types = { .types = { PTR_TO_SOCKET } }; static const struct bpf_reg_types scalar_types = { .types = { SCALAR_VALUE } }; static const struct bpf_reg_types context_types = { .types = { PTR_TO_CTX } }; static const struct bpf_reg_types alloc_mem_types = { .types = { PTR_TO_MEM | MEM_ALLOC } }; static const struct bpf_reg_types const_map_ptr_types = { .types = { CONST_PTR_TO_MAP } }; -static const struct bpf_reg_types btf_ptr_types = { .types = { PTR_TO_BTF_ID } }; static const struct bpf_reg_types spin_lock_types = { .types = { PTR_TO_MAP_VALUE } }; static const struct bpf_reg_types percpu_btf_ptr_types = { .types = { PTR_TO_BTF_ID | MEM_PERCPU } }; static const struct bpf_reg_types func_ptr_types = { .types = { PTR_TO_FUNC } }; @@ -6057,8 +6123,12 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 arg, } else if (meta->func_id == BPF_FUNC_spin_unlock) { if (process_spin_lock(env, regno, false)) return -EACCES; - } else if (meta->func_id == BPF_FUNC_rbtree_lock || - meta->func_id == BPF_FUNC_rbtree_unlock) { // Do nothing for now + } else if (meta->func_id == BPF_FUNC_rbtree_lock) { + if (rbtree_process_spin_lock(env, regno, true)) + return -EACCES; + } else if (meta->func_id == BPF_FUNC_rbtree_unlock) { + if (rbtree_process_spin_lock(env, regno, 
false)) + return -EACCES; } else { verbose(env, "verifier internal error\n"); return -EFAULT; @@ -6993,6 +7063,29 @@ static int set_find_vma_callback_state(struct bpf_verifier_env *env, return 0; } +/* Are we currently verifying the callback for a rbtree helper that must + * be called with lock held? If so, no need to complain about unreleased + * lock + */ +static bool in_rbtree_lock_required_cb(struct bpf_verifier_env *env) +{ + struct bpf_verifier_state *state = env->cur_state; + struct bpf_insn *insn = env->prog->insnsi; + struct bpf_func_state *callee; + int func_id; + + if (!state->curframe) + return false; + + callee = state->frame[state->curframe]; + + if (!callee->in_callback_fn) + return false; + + func_id = insn[callee->callsite].imm; + return is_rbtree_lock_required_function(func_id); +} + static int prepare_func_exit(struct bpf_verifier_env *env, int *insn_idx) { struct bpf_verifier_state *state = env->cur_state; @@ -7508,6 +7601,11 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn regs[BPF_REG_0].id = ++env->id_gen; } break; + case RET_PTR_TO_SPIN_LOCK: + mark_reg_known_zero(env, regs, BPF_REG_0); + regs[BPF_REG_0].type = PTR_TO_SPIN_LOCK | ret_flag; + regs[BPF_REG_0].id = ++env->id_gen; + break; case RET_PTR_TO_SOCKET: mark_reg_known_zero(env, regs, BPF_REG_0); regs[BPF_REG_0].type = PTR_TO_SOCKET | ret_flag; @@ -10366,6 +10464,20 @@ static int check_cond_jmp_op(struct bpf_verifier_env *env, return 0; } +static unsigned int ld_imm_lock_id_gen(struct bpf_verifier_env *env, + void *imm) +{ + struct bpf_verifier_state *cur = env->cur_state; + + if (cur->active_spin_lock && cur->maybe_active_spin_lock_addr && + cur->maybe_active_spin_lock_addr == imm) + return cur->active_spin_lock; + + if (!cur->active_spin_lock) + cur->maybe_active_spin_lock_addr = imm; + return ++env->id_gen; +} + /* verify BPF_LD_IMM64 instruction */ static int check_ld_imm(struct bpf_verifier_env *env, struct bpf_insn *insn) { @@ -10373,6 +10485,7 @@ static int check_ld_imm(struct bpf_verifier_env *env, struct bpf_insn *insn) struct bpf_reg_state *regs = cur_regs(env); struct bpf_reg_state *dst_reg; struct bpf_map *map; + u64 imm; int err; if (BPF_SIZE(insn->code) != BPF_DW) { @@ -10390,7 +10503,7 @@ static int check_ld_imm(struct bpf_verifier_env *env, struct bpf_insn *insn) dst_reg = ®s[insn->dst_reg]; if (insn->src_reg == 0) { - u64 imm = ((u64)(insn + 1)->imm << 32) | (u32)insn->imm; + imm = ((u64)(insn + 1)->imm << 32) | (u32)insn->imm; dst_reg->type = SCALAR_VALUE; __mark_reg_known(®s[insn->dst_reg], imm); @@ -10441,13 +10554,14 @@ static int check_ld_imm(struct bpf_verifier_env *env, struct bpf_insn *insn) map = env->used_maps[aux->map_index]; dst_reg->map_ptr = map; - if (insn->src_reg == BPF_PSEUDO_MAP_VALUE || insn->src_reg == BPF_PSEUDO_MAP_IDX_VALUE) { dst_reg->type = PTR_TO_MAP_VALUE; dst_reg->off = aux->map_off; - if (map_value_has_spin_lock(map)) - dst_reg->id = ++env->id_gen; + if (map_value_has_spin_lock(map)) { + imm = ((u64)(insn + 1)->imm << 32) | (u32)insn->imm; + dst_reg->id = ld_imm_lock_id_gen(env, (void *)imm); + } } else if (insn->src_reg == BPF_PSEUDO_MAP_FD || insn->src_reg == BPF_PSEUDO_MAP_IDX) { dst_reg->type = CONST_PTR_TO_MAP; @@ -12432,7 +12546,7 @@ static int do_check(struct bpf_verifier_env *env) if (env->cur_state->active_spin_lock && (insn->src_reg == BPF_PSEUDO_CALL || - insn->imm != BPF_FUNC_spin_unlock)) { + !is_lock_allowed_function(insn->imm))) { verbose(env, "function calls are not allowed while holding a lock\n"); return -EINVAL; 
} @@ -12467,7 +12581,7 @@ static int do_check(struct bpf_verifier_env *env) return -EINVAL; } - if (env->cur_state->active_spin_lock) { + if (state->active_spin_lock && !in_rbtree_lock_required_cb(env)) { verbose(env, "bpf_spin_unlock is missing\n"); return -EINVAL; } From patchwork Tue Aug 30 17:27:52 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dave Marchevsky X-Patchwork-Id: 12959632 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 5B9B9ECAAA1 for ; Tue, 30 Aug 2022 17:40:32 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232137AbiH3Rka (ORCPT ); Tue, 30 Aug 2022 13:40:30 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:47104 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231314AbiH3RkG (ORCPT ); Tue, 30 Aug 2022 13:40:06 -0400 Received: from mx0a-00082601.pphosted.com (mx0b-00082601.pphosted.com [67.231.153.30]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 6F12AA8318 for ; Tue, 30 Aug 2022 10:37:08 -0700 (PDT) Received: from pps.filterd (m0089730.ppops.net [127.0.0.1]) by m0089730.ppops.net (8.17.1.5/8.17.1.5) with ESMTP id 27UBSjjn011705 for ; Tue, 30 Aug 2022 10:35:18 -0700 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fb.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : content-type; s=facebook; bh=5uFV82NmSqXGw01d54oYp2JuDAhn5np5/kV4LmVjmm0=; b=EkOkHeaxmQv2MFn3PQcWxzJ1uwUxCcwoBXev8LpK/zYQ2MbodxqAT4Lg7LRgeildo79z yj7iMSqWO6GsjyXun82OveE9xgMo5E6WgOHh0DlbIXRBeu0Nv5tEMw0uKWJniIzkznF6 caF/oX1pNjBOcCnLlTpKlJbPpIW3sqUXptg= Received: from maileast.thefacebook.com ([163.114.130.16]) by m0089730.ppops.net (PPS) with ESMTPS id 3j9hpwjmdc-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT) for ; Tue, 30 Aug 2022 10:35:18 -0700 Received: from twshared14074.07.ash9.facebook.com (2620:10d:c0a8:1b::d) by mail.thefacebook.com (2620:10d:c0a8:82::d) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2375.31; Tue, 30 Aug 2022 10:35:17 -0700 Received: by devbig077.ldc1.facebook.com (Postfix, from userid 158236) id 7B94ACAD078A; Tue, 30 Aug 2022 10:28:10 -0700 (PDT) From: Dave Marchevsky To: CC: Alexei Starovoitov , Daniel Borkmann , Andrii Nakryiko , Kernel Team , Dave Marchevsky Subject: [RFCv2 PATCH bpf-next 11/18] bpf: Check rbtree lock held during verification Date: Tue, 30 Aug 2022 10:27:52 -0700 Message-ID: <20220830172759.4069786-12-davemarchevsky@fb.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20220830172759.4069786-1-davemarchevsky@fb.com> References: <20220830172759.4069786-1-davemarchevsky@fb.com> MIME-Version: 1.0 X-FB-Internal: Safe X-Proofpoint-ORIG-GUID: orJ_8fvw40SS0E_NtpbBN6reP4OoOajS X-Proofpoint-GUID: orJ_8fvw40SS0E_NtpbBN6reP4OoOajS X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.895,Hydra:6.0.517,FMLib:17.11.122.1 definitions=2022-08-30_10,2022-08-30_01,2022-06-22_01 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net X-Patchwork-State: RFC This patch builds on the previous few locking patches, teaching the verifier to check whether rbtree's lock is 
held. [ RFC Notes: * After this patch, could change helpers which only can fail due to dynamic lock check to always succeed, likely allowing us to get rid of CONDITIONAL_RELEASE logic. But since dynamic lock checking is almost certainly going to be necessary regardless, didn't do so. ] Signed-off-by: Dave Marchevsky --- include/linux/bpf.h | 2 ++ kernel/bpf/rbtree.c | 8 ++++++++ kernel/bpf/verifier.c | 43 +++++++++++++++++++++++++++++++++++++++---- 3 files changed, 49 insertions(+), 4 deletions(-) diff --git a/include/linux/bpf.h b/include/linux/bpf.h index d6458aa7b79c..b762c6b3dcfb 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -155,6 +155,8 @@ struct bpf_map_ops { bpf_callback_t callback_fn, void *callback_ctx, u64 flags); + bool (*map_lock_held)(struct bpf_map *map, void *current_lock); + /* BTF id of struct allocated by map_alloc */ int *map_btf_id; diff --git a/kernel/bpf/rbtree.c b/kernel/bpf/rbtree.c index 0821e841a518..b5d158254de6 100644 --- a/kernel/bpf/rbtree.c +++ b/kernel/bpf/rbtree.c @@ -245,6 +245,13 @@ static int rbtree_map_delete_elem(struct bpf_map *map, void *value) return -ENOTSUPP; } +static bool rbtree_map_lock_held(struct bpf_map *map, void *current_lock) +{ + struct bpf_rbtree *tree = container_of(map, struct bpf_rbtree, map); + + return tree->lock == current_lock; +} + BPF_CALL_2(bpf_rbtree_remove, struct bpf_map *, map, void *, value) { struct bpf_rbtree *tree = container_of(map, struct bpf_rbtree, map); @@ -353,4 +360,5 @@ const struct bpf_map_ops rbtree_map_ops = { .map_delete_elem = rbtree_map_delete_elem, .map_check_btf = rbtree_map_check_btf, .map_btf_id = &bpf_rbtree_map_btf_ids[0], + .map_lock_held = rbtree_map_lock_held, }; diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index f8ba381f1327..3c9af1047d80 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -508,16 +508,25 @@ static bool is_ptr_cast_function(enum bpf_func_id func_id) func_id == BPF_FUNC_skc_to_tcp_request_sock; } +/* These functions can only be called when spinlock associated with rbtree + * is held. They all take struct bpf_map ptr to rbtree as their first argument, + * so we can verify that the correct lock is held before loading program + */ +static bool is_rbtree_lock_check_function(enum bpf_func_id func_id) +{ + return func_id == BPF_FUNC_rbtree_add || + func_id == BPF_FUNC_rbtree_remove || + func_id == BPF_FUNC_rbtree_find; +} + /* These functions can only be called when spinlock associated with rbtree * is held. 
If they have a callback argument, that callback is not required * to release active_spin_lock before exiting */ static bool is_rbtree_lock_required_function(enum bpf_func_id func_id) { - return func_id == BPF_FUNC_rbtree_add || - func_id == BPF_FUNC_rbtree_remove || - func_id == BPF_FUNC_rbtree_find || - func_id == BPF_FUNC_rbtree_unlock; + return func_id == BPF_FUNC_rbtree_unlock || + is_rbtree_lock_check_function(func_id); } /* These functions are OK to call when spinlock associated with rbtree @@ -7354,6 +7363,26 @@ static void update_loop_inline_state(struct bpf_verifier_env *env, u32 subprogno state->callback_subprogno == subprogno); } +static int check_rbtree_lock_held(struct bpf_verifier_env *env, + struct bpf_map *map) +{ + struct bpf_verifier_state *cur = env->cur_state; + + if (!map) + return -1; + + if (!cur->active_spin_lock || !cur->maybe_active_spin_lock_addr || + !map || !map->ops->map_lock_held) + return -1; + + if (!map->ops->map_lock_held(map, cur->maybe_active_spin_lock_addr)) { + verbose(env, "active spin lock doesn't match rbtree's lock\n"); + return -1; + } + + return 0; +} + static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn, int *insn_idx_p) { @@ -7560,6 +7589,12 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn if (err) return err; + if (is_rbtree_lock_check_function(func_id) && + check_rbtree_lock_held(env, meta.map_ptr)) { + verbose(env, "lock associated with rbtree is not held\n"); + return -EINVAL; + } + /* reset caller saved regs */ for (i = 0; i < CALLER_SAVED_REGS; i++) { mark_reg_not_init(env, regs, caller_saved[i]); From patchwork Tue Aug 30 17:27:53 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dave Marchevsky X-Patchwork-Id: 12959640 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id F3648ECAAA1 for ; Tue, 30 Aug 2022 17:49:06 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232172AbiH3RtF (ORCPT ); Tue, 30 Aug 2022 13:49:05 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48562 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232246AbiH3Rsn (ORCPT ); Tue, 30 Aug 2022 13:48:43 -0400 Received: from mx0a-00082601.pphosted.com (mx0a-00082601.pphosted.com [67.231.145.42]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id A9D3425E9B for ; Tue, 30 Aug 2022 10:45:42 -0700 (PDT) Received: from pps.filterd (m0044010.ppops.net [127.0.0.1]) by mx0a-00082601.pphosted.com (8.17.1.5/8.17.1.5) with ESMTP id 27UG2F9Y025015 for ; Tue, 30 Aug 2022 10:31:21 -0700 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fb.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : content-type; s=facebook; bh=xjs6vWKiLzieNqi5jZB2GEaVN9g15z7SKNRg+eGLHrg=; b=rRmgZcD9Rg/MM+Q8CKuJRgWzzTop7+/tLpvmpKZVRrsL+2Jg9rNUJKpNvXegdVCOLJF4 n181tmtVxiIQ9GVXwYdyWgAxbU3bRzsi08/fJFo6V9EuLLpYpu2Jal00LPhqT9KUf4zr cwLKUfWOmhkQoxWY+EBwUGHyp02jzNQ0DG0= Received: from maileast.thefacebook.com ([163.114.130.16]) by mx0a-00082601.pphosted.com (PPS) with ESMTPS id 3j9ae4c5ht-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT) for ; Tue, 30 Aug 2022 10:31:20 -0700 
Received: from twshared0646.06.ash9.facebook.com (2620:10d:c0a8:1b::d) by mail.thefacebook.com (2620:10d:c0a8:82::c) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2375.31; Tue, 30 Aug 2022 10:31:19 -0700 Received: by devbig077.ldc1.facebook.com (Postfix, from userid 158236) id F19EDCAD079B; Tue, 30 Aug 2022 10:28:10 -0700 (PDT) From: Dave Marchevsky To: CC: Alexei Starovoitov , Daniel Borkmann , Andrii Nakryiko , Kernel Team , Dave Marchevsky Subject: [RFCv2 PATCH bpf-next 12/18] bpf: Add OBJ_NON_OWNING_REF type flag Date: Tue, 30 Aug 2022 10:27:53 -0700 Message-ID: <20220830172759.4069786-13-davemarchevsky@fb.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20220830172759.4069786-1-davemarchevsky@fb.com> References: <20220830172759.4069786-1-davemarchevsky@fb.com> MIME-Version: 1.0 X-FB-Internal: Safe X-Proofpoint-ORIG-GUID: xeFsjwL0cdoTY6sXGyAuwjam8L_KzDm0 X-Proofpoint-GUID: xeFsjwL0cdoTY6sXGyAuwjam8L_KzDm0 X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.895,Hydra:6.0.517,FMLib:17.11.122.1 definitions=2022-08-30_10,2022-08-30_01,2022-06-22_01 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net X-Patchwork-State: RFC Consider a pointer to a type that would normally need acquire / release semantics to be safely held. There may be scenarios where such a pointer can be safely held without the need to acquire a reference. For example, although a PTR_TO_BTF_ID for a rbtree_map node is released via bpf_rbtree_add helper, the helper doesn't change the address of the node and must be called with the rbtree_map's spinlock held. Since the only way to remove a node from the rbtree - bpf_rbtree_remove helper - requires the same lock, the newly-added node cannot be removed by a concurrently-running program until the lock is released. Therefore it is safe to hold a reference to this node until bpf_rbtree_unlock is called. This patch introduces a new type flag and associated verifier logic to handle such "non-owning" references. Currently the only usecase I have is the rbtree example above, so the verifier logic is straightforward: * Tag return types of bpf_rbtree_{add,find} with OBJ_NON_OWNING_REF * These both require the rbtree lock to be held to return anything non-NULL * Since ret type for both is PTR_TO_BTF_ID_OR_NULL, if lock is not held and NULL is returned, existing mark_ptr_or_null_reg logic will clear reg type. * So if mark_ptr_or_null_reg logic turns the returned reg into a PTR_TO_BTF_ID | OBJ_NON_OWNING_REF, verifier knows lock is held. * When the lock is released the verifier invalidates any regs holding non owning refs similarly to existing release_reference logic - but no need to clear ref_obj_id as an 'owning' reference was never acquired. [ TODO: Currently the invalidation logic in clear_rbtree_node_non_owning_refs is not parametrized by map so unlocking any rbtree lock will invalidate all non-owning refs ] Signed-off-by: Dave Marchevsky --- include/linux/bpf.h | 1 + kernel/bpf/rbtree.c | 4 +-- kernel/bpf/verifier.c | 65 +++++++++++++++++++++++++++++++++++++++---- 3 files changed, 62 insertions(+), 8 deletions(-) diff --git a/include/linux/bpf.h b/include/linux/bpf.h index b762c6b3dcfb..f164bd6e2f3a 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -415,6 +415,7 @@ enum bpf_type_flag { /* Size is known at compile time. 
*/ MEM_FIXED_SIZE = BIT(10 + BPF_BASE_TYPE_BITS), + OBJ_NON_OWNING_REF = BIT(11 + BPF_BASE_TYPE_BITS), __BPF_TYPE_FLAG_MAX, __BPF_TYPE_LAST_FLAG = __BPF_TYPE_FLAG_MAX - 1, }; diff --git a/kernel/bpf/rbtree.c b/kernel/bpf/rbtree.c index b5d158254de6..cc89639df8a2 100644 --- a/kernel/bpf/rbtree.c +++ b/kernel/bpf/rbtree.c @@ -144,7 +144,7 @@ BPF_CALL_3(bpf_rbtree_add, struct bpf_map *, map, void *, value, void *, cb) const struct bpf_func_proto bpf_rbtree_add_proto = { .func = bpf_rbtree_add, .gpl_only = true, - .ret_type = RET_PTR_TO_BTF_ID_OR_NULL, + .ret_type = RET_PTR_TO_BTF_ID_OR_NULL | OBJ_NON_OWNING_REF, .ret_btf_id = BPF_PTR_POISON, .arg1_type = ARG_CONST_MAP_PTR, .arg2_type = ARG_PTR_TO_BTF_ID | OBJ_RELEASE, @@ -167,7 +167,7 @@ BPF_CALL_3(bpf_rbtree_find, struct bpf_map *, map, void *, key, void *, cb) const struct bpf_func_proto bpf_rbtree_find_proto = { .func = bpf_rbtree_find, .gpl_only = true, - .ret_type = RET_PTR_TO_BTF_ID_OR_NULL, + .ret_type = RET_PTR_TO_BTF_ID_OR_NULL | OBJ_NON_OWNING_REF, .ret_btf_id = BPF_PTR_POISON, .arg1_type = ARG_CONST_MAP_PTR, .arg2_type = ARG_ANYTHING, diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index 3c9af1047d80..26aa228fa860 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -469,6 +469,11 @@ static bool type_is_rdonly_mem(u32 type) return type & MEM_RDONLY; } +static bool type_is_non_owning_ref(u32 type) +{ + return type & OBJ_NON_OWNING_REF; +} + static bool type_may_be_null(u32 type) { return type & PTR_MAYBE_NULL; @@ -595,7 +600,9 @@ static bool function_returns_rbtree_node(enum bpf_func_id func_id) static const char *reg_type_str(struct bpf_verifier_env *env, enum bpf_reg_type type) { - char postfix[16] = {0}, prefix[32] = {0}; + char postfix[32] = {0}, prefix[32] = {0}; + unsigned int postfix_idx = 0; + static const char * const str[] = { [NOT_INIT] = "?", [SCALAR_VALUE] = "scalar", @@ -620,11 +627,18 @@ static const char *reg_type_str(struct bpf_verifier_env *env, [PTR_TO_SPIN_LOCK] = "spin_lock", }; - if (type & PTR_MAYBE_NULL) { + if (type_may_be_null(type)) { if (base_type(type) == PTR_TO_BTF_ID) - strncpy(postfix, "or_null_", 16); + postfix_idx += strlcpy(postfix + postfix_idx, "or_null_", 32 - postfix_idx); else - strncpy(postfix, "_or_null", 16); + postfix_idx += strlcpy(postfix + postfix_idx, "_or_null", 32 - postfix_idx); + } + + if (type_is_non_owning_ref(type)) { + if (base_type(type) == PTR_TO_BTF_ID) + postfix_idx += strlcpy(postfix + postfix_idx, "non_own_", 32 - postfix_idx); + else + postfix_idx += strlcpy(postfix + postfix_idx, "_non_own", 32 - postfix_idx); } if (type & MEM_RDONLY) @@ -5758,7 +5772,14 @@ static const struct bpf_reg_types int_ptr_types = { static const struct bpf_reg_types spin_lock_types = { .types = { PTR_TO_MAP_VALUE, - PTR_TO_SPIN_LOCK + PTR_TO_SPIN_LOCK, + }, +}; + +static const struct bpf_reg_types btf_ptr_types = { + .types = { + PTR_TO_BTF_ID, + PTR_TO_BTF_ID | OBJ_NON_OWNING_REF, }, }; @@ -5767,7 +5788,6 @@ static const struct bpf_reg_types scalar_types = { .types = { SCALAR_VALUE } }; static const struct bpf_reg_types context_types = { .types = { PTR_TO_CTX } }; static const struct bpf_reg_types alloc_mem_types = { .types = { PTR_TO_MEM | MEM_ALLOC } }; static const struct bpf_reg_types const_map_ptr_types = { .types = { CONST_PTR_TO_MAP } }; -static const struct bpf_reg_types spin_lock_types = { .types = { PTR_TO_MAP_VALUE } }; static const struct bpf_reg_types percpu_btf_ptr_types = { .types = { PTR_TO_BTF_ID | MEM_PERCPU } }; static const struct bpf_reg_types 
func_ptr_types = { .types = { PTR_TO_FUNC } }; static const struct bpf_reg_types stack_ptr_types = { .types = { PTR_TO_STACK } }; @@ -6723,6 +6743,33 @@ static int release_reference(struct bpf_verifier_env *env, return 0; } +static void clear_non_owning_ref_regs(struct bpf_verifier_env *env, + struct bpf_func_state *state) +{ + struct bpf_reg_state *regs = state->regs, *reg; + int i; + + for (i = 0; i < MAX_BPF_REG; i++) + if (type_is_non_owning_ref(regs[i].type)) + mark_reg_unknown(env, regs, i); + + bpf_for_each_spilled_reg(i, state, reg) { + if (!reg) + continue; + if (type_is_non_owning_ref(reg->type)) + __mark_reg_unknown(env, reg); + } +} + +static void clear_rbtree_node_non_owning_refs(struct bpf_verifier_env *env) +{ + struct bpf_verifier_state *vstate = env->cur_state; + int i; + + for (i = 0; i <= vstate->curframe; i++) + clear_non_owning_ref_regs(env, vstate->frame[i]); +} + static void clear_caller_saved_regs(struct bpf_verifier_env *env, struct bpf_reg_state *regs) { @@ -7584,6 +7631,12 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn return -EFAULT; } break; + case BPF_FUNC_rbtree_unlock: + /* TODO clear_rbtree_node_non_owning_refs calls should be + * parametrized by base_type or ideally by owning map + */ + clear_rbtree_node_non_owning_refs(env); + break; } if (err) From patchwork Tue Aug 30 17:27:54 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dave Marchevsky X-Patchwork-Id: 12959647 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id D9A90ECAAD1 for ; Tue, 30 Aug 2022 17:53:01 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230163AbiH3Rw7 (ORCPT ); Tue, 30 Aug 2022 13:52:59 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:55210 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231869AbiH3Rwn (ORCPT ); Tue, 30 Aug 2022 13:52:43 -0400 Received: from mx0b-00082601.pphosted.com (mx0b-00082601.pphosted.com [67.231.153.30]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 1CA57AE87A for ; Tue, 30 Aug 2022 10:49:20 -0700 (PDT) Received: from pps.filterd (m0109331.ppops.net [127.0.0.1]) by mx0a-00082601.pphosted.com (8.17.1.5/8.17.1.5) with ESMTP id 27U7aBf9009095 for ; Tue, 30 Aug 2022 10:35:14 -0700 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fb.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : content-type; s=facebook; bh=wP8wWcO6Fpe5apg5LunaNHP01cFXqJAJFd3opF3Tb64=; b=RGCu9tcWtrlIf0TX4vGj8DcPI5uJ3PBZY+vprjcvc2lEzlHNSf7t6LRgKNIzsg2//2+j TETvtiDB+msRE/n2FVOIhayNNz9JqDqTN1Yc7ztwsh2BJo8tdKsweM0sKAErtjRPI6HS I9GHEvWrHaOAgPZJjlulRtIcomAUdeq8ziA= Received: from maileast.thefacebook.com ([163.114.130.16]) by mx0a-00082601.pphosted.com (PPS) with ESMTPS id 3j9e9yuhb8-3 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT) for ; Tue, 30 Aug 2022 10:35:14 -0700 Received: from twshared14074.07.ash9.facebook.com (2620:10d:c0a8:1b::d) by mail.thefacebook.com (2620:10d:c0a8:83::7) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2375.31; Tue, 30 Aug 2022 10:35:13 -0700 Received: by devbig077.ldc1.facebook.com (Postfix, 
from userid 158236) id 09A34CAD079C; Tue, 30 Aug 2022 10:28:10 -0700 (PDT) From: Dave Marchevsky To: CC: Alexei Starovoitov , Daniel Borkmann , Andrii Nakryiko , Kernel Team , Dave Marchevsky Subject: [RFCv2 PATCH bpf-next 13/18] bpf: Add CONDITIONAL_RELEASE type flag Date: Tue, 30 Aug 2022 10:27:54 -0700 Message-ID: <20220830172759.4069786-14-davemarchevsky@fb.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20220830172759.4069786-1-davemarchevsky@fb.com> References: <20220830172759.4069786-1-davemarchevsky@fb.com> MIME-Version: 1.0 X-FB-Internal: Safe X-Proofpoint-ORIG-GUID: iMBGzDY_BXoDmQLNTQXd87BykWnYscSv X-Proofpoint-GUID: iMBGzDY_BXoDmQLNTQXd87BykWnYscSv X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.895,Hydra:6.0.517,FMLib:17.11.122.1 definitions=2022-08-30_10,2022-08-30_01,2022-06-22_01 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net X-Patchwork-State: RFC Currently if a helper proto has an arg with OBJ_RELEASE flag, the verifier assumes the 'release' logic in the helper will always succeed, and therefore always clears the arg reg, other regs w/ same ref_obj_id, and releases the reference. This poses a problem for 'release' helpers which may not always succeed. For example, bpf_rbtree_add will fail to add the passed-in node to a bpf_rbtree if the rbtree's lock is not held when the helper is called. In this case the helper returns NULL and the calling bpf program must release the node in another way before terminating to avoid leaking memory. An example of such logic: 1 struct node_data *node = bpf_rbtree_alloc_node(&rbtree, ...); 2 struct node_data *ret = bpf_rbtree_add(&rbtree, node); 3 if (!ret) { 4 bpf_rbtree_free_node(node); 5 return 0; 6 } 7 bpf_trace_printk("%d\n", ret->id); However, current verifier logic does not allow this: after line 2, ref_obj_id of reg holding 'node' (BPF_REG_2) will be released via release_reference function, which will mark BPF_REG_2 and any other reg with same ref_obj_id as unknown. As a result neither ret nor node will point to anything useful if line 3's check passes. Additionally, since the reference is unconditionally released, the program would pass the verifier without the null check. This patch adds 'conditional release' semantics so that the verifier can handle the above example correctly. The CONDITIONAL_RELEASE type flag works in concert with the existing OBJ_RELEASE flag - the latter is used to tag an argument, while the new type flag is used to tag return type. If a helper has an OBJ_RELEASE arg type and CONDITIONAL_RELEASE return type, the helper is considered to use its return value to indicate success or failure of the release operation. NULL is returned if release fails, non-null otherwise. For my concrete usecase - bpf_rbtree_add - CONDITIONAL_RELEASE works in concert with OBJ_NON_OWNING_REF: successful release results in a non-owning reference being returned, allowing line 7 in above example. Instead of unconditionally releasing the OBJ_RELEASE reference when doing check_helper_call, for CONDITIONAL_RELEASE helpers the verifier will wait until the return value is checked for null. If not null: the reference is released If null: no reference is released. Since other regs w/ same ref_obj_id were not marked unknown by check_helper_call, they can be used to release the reference via other means (line 4 above), It's necessary to prevent conditionally-released ref_obj_id regs from being used between the release helper and null check. 
For example: 1 struct node_data *node = bpf_rbtree_alloc_node(&rbtree, ...); 2 struct node_data *ret = bpf_rbtree_add(&rbtree, node); 3 do_something_with_a_node(node); 4 if (!ret) { 5 bpf_rbtree_free_node(node); 6 return 0; 7 } Line 3 shouldn't be allowed since node may have been released. The verifier tags all regs with ref_obj_id of the conditionally-released arg in the period between the helper call and null check for this reason. Why no matching CONDITIONAL_ACQUIRE type flag? Existing verifier logic already treats acquire of an _OR_NULL type as a conditional acquire. Consider this code: 1 struct thing *i = acquire_helper_that_returns_thing_or_null(); 2 if (!i) 3 return 0; 4 manipulate_thing(i); 5 release_thing(i); After line 1, BPF_REG_0 will have an _OR_NULL type and a ref_obj_id set. When the verifier sees line 2's conditional jump, existing logic in mark_ptr_or_null_regs, specifically the if: if (ref_obj_id && ref_obj_id == id && is_null) /* regs[regno] is in the " == NULL" branch. * No one could have freed the reference state before * doing the NULL check. */ WARN_ON_ONCE(release_reference_state(state, id)); will release the reference in the is_null state. [ TODO: Either need to remove WARN_ON_ONCE there without adding CONDITIONAL_ACQUIRE flag or add the flag and don't WARN_ON_ONCE if it's set. Left out of first pass for simplicity's sake. ] Signed-off-by: Dave Marchevsky --- include/linux/bpf.h | 3 + include/linux/bpf_verifier.h | 1 + kernel/bpf/rbtree.c | 2 +- kernel/bpf/verifier.c | 114 +++++++++++++++++++++++++++++++---- 4 files changed, 108 insertions(+), 12 deletions(-) diff --git a/include/linux/bpf.h b/include/linux/bpf.h index f164bd6e2f3a..83b8d63545e3 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -416,6 +416,9 @@ enum bpf_type_flag { MEM_FIXED_SIZE = BIT(10 + BPF_BASE_TYPE_BITS), OBJ_NON_OWNING_REF = BIT(11 + BPF_BASE_TYPE_BITS), + + CONDITIONAL_RELEASE = BIT(12 + BPF_BASE_TYPE_BITS), + __BPF_TYPE_FLAG_MAX, __BPF_TYPE_LAST_FLAG = __BPF_TYPE_FLAG_MAX - 1, }; diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h index f81638844a4d..e5f967d17bb9 100644 --- a/include/linux/bpf_verifier.h +++ b/include/linux/bpf_verifier.h @@ -314,6 +314,7 @@ struct bpf_verifier_state { u32 curframe; u32 active_spin_lock; void *maybe_active_spin_lock_addr; + u32 active_cond_ref_obj_id; bool speculative; /* first and last insn idx of this verifier state */ diff --git a/kernel/bpf/rbtree.c b/kernel/bpf/rbtree.c index cc89639df8a2..e6f51c27661c 100644 --- a/kernel/bpf/rbtree.c +++ b/kernel/bpf/rbtree.c @@ -144,7 +144,7 @@ BPF_CALL_3(bpf_rbtree_add, struct bpf_map *, map, void *, value, void *, cb) const struct bpf_func_proto bpf_rbtree_add_proto = { .func = bpf_rbtree_add, .gpl_only = true, - .ret_type = RET_PTR_TO_BTF_ID_OR_NULL | OBJ_NON_OWNING_REF, + .ret_type = RET_PTR_TO_BTF_ID_OR_NULL | OBJ_NON_OWNING_REF | CONDITIONAL_RELEASE, .ret_btf_id = BPF_PTR_POISON, .arg1_type = ARG_CONST_MAP_PTR, .arg2_type = ARG_PTR_TO_BTF_ID | OBJ_RELEASE, diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index 26aa228fa860..3da8959e5f7a 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -474,6 +474,11 @@ static bool type_is_non_owning_ref(u32 type) return type & OBJ_NON_OWNING_REF; } +static bool type_is_cond_release(u32 type) +{ + return type & CONDITIONAL_RELEASE; +} + static bool type_may_be_null(u32 type) { return type & PTR_MAYBE_NULL; @@ -641,6 +646,15 @@ static const char *reg_type_str(struct bpf_verifier_env *env, postfix_idx += strlcpy(postfix 
+ postfix_idx, "_non_own", 32 - postfix_idx); } + if (type_is_cond_release(type)) { + if (base_type(type) == PTR_TO_BTF_ID) + postfix_idx += strlcpy(postfix + postfix_idx, "cond_rel_", + 32 - postfix_idx); + else + postfix_idx += strlcpy(postfix + postfix_idx, "_cond_rel", + 32 - postfix_idx); + } + if (type & MEM_RDONLY) strncpy(prefix, "rdonly_", 32); if (type & MEM_ALLOC) @@ -1253,6 +1267,7 @@ static int copy_verifier_state(struct bpf_verifier_state *dst_state, dst_state->curframe = src->curframe; dst_state->active_spin_lock = src->active_spin_lock; dst_state->maybe_active_spin_lock_addr = src->maybe_active_spin_lock_addr; + dst_state->active_cond_ref_obj_id = src->active_cond_ref_obj_id; dst_state->branches = src->branches; dst_state->parent = src->parent; dst_state->first_insn_idx = src->first_insn_idx; @@ -1460,6 +1475,7 @@ static void mark_ptr_not_null_reg(struct bpf_reg_state *reg) return; } + reg->type &= ~CONDITIONAL_RELEASE; reg->type &= ~PTR_MAYBE_NULL; } @@ -6723,24 +6739,78 @@ static void release_reg_references(struct bpf_verifier_env *env, } } +static int __release_reference(struct bpf_verifier_env *env, struct bpf_verifier_state *vstate, + int ref_obj_id) +{ + int err; + int i; + + err = release_reference_state(vstate->frame[vstate->curframe], ref_obj_id); + if (err) + return err; + + for (i = 0; i <= vstate->curframe; i++) + release_reg_references(env, vstate->frame[i], ref_obj_id); + return 0; +} + /* The pointer with the specified id has released its reference to kernel * resources. Identify all copies of the same pointer and clear the reference. */ static int release_reference(struct bpf_verifier_env *env, int ref_obj_id) { - struct bpf_verifier_state *vstate = env->cur_state; - int err; + return __release_reference(env, env->cur_state, ref_obj_id); +} + +static void tag_reference_cond_release_regs(struct bpf_verifier_env *env, + struct bpf_func_state *state, + int ref_obj_id, + bool remove) +{ + struct bpf_reg_state *regs = state->regs, *reg; int i; - err = release_reference_state(cur_func(env), ref_obj_id); - if (err) - return err; + for (i = 0; i < MAX_BPF_REG; i++) + if (regs[i].ref_obj_id == ref_obj_id) { + if (remove) + regs[i].type &= ~CONDITIONAL_RELEASE; + else + regs[i].type |= CONDITIONAL_RELEASE; + } + + bpf_for_each_spilled_reg(i, state, reg) { + if (!reg) + continue; + if (reg->ref_obj_id == ref_obj_id) { + if (remove) + reg->type &= ~CONDITIONAL_RELEASE; + else + reg->type |= CONDITIONAL_RELEASE; + } + } +} + +static void tag_reference_cond_release(struct bpf_verifier_env *env, + int ref_obj_id) +{ + struct bpf_verifier_state *vstate = env->cur_state; + int i; for (i = 0; i <= vstate->curframe; i++) - release_reg_references(env, vstate->frame[i], ref_obj_id); + tag_reference_cond_release_regs(env, vstate->frame[i], + ref_obj_id, false); +} - return 0; +static void untag_reference_cond_release(struct bpf_verifier_env *env, + struct bpf_verifier_state *vstate, + int ref_obj_id) +{ + int i; + + for (i = 0; i <= vstate->curframe; i++) + tag_reference_cond_release_regs(env, vstate->frame[i], + ref_obj_id, true); } static void clear_non_owning_ref_regs(struct bpf_verifier_env *env, @@ -7537,7 +7607,17 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn if (arg_type_is_dynptr(fn->arg_type[meta.release_regno - BPF_REG_1])) err = unmark_stack_slots_dynptr(env, ®s[meta.release_regno]); else if (meta.ref_obj_id) - err = release_reference(env, meta.ref_obj_id); + if (type_is_cond_release(fn->ret_type)) { + if 
(env->cur_state->active_cond_ref_obj_id) { + verbose(env, "can't handle >1 cond_release\n"); + return err; + } + env->cur_state->active_cond_ref_obj_id = meta.ref_obj_id; + tag_reference_cond_release(env, meta.ref_obj_id); + err = 0; + } else { + err = release_reference(env, meta.ref_obj_id); + } /* meta.ref_obj_id can only be 0 if register that is meant to be * released is NULL, which must be > R0. */ @@ -10212,7 +10292,7 @@ static void __mark_ptr_or_null_regs(struct bpf_func_state *state, u32 id, * be folded together at some point. */ static void mark_ptr_or_null_regs(struct bpf_verifier_state *vstate, u32 regno, - bool is_null) + bool is_null, struct bpf_verifier_env *env) { struct bpf_func_state *state = vstate->frame[vstate->curframe]; struct bpf_reg_state *regs = state->regs; @@ -10227,6 +10307,15 @@ static void mark_ptr_or_null_regs(struct bpf_verifier_state *vstate, u32 regno, */ WARN_ON_ONCE(release_reference_state(state, id)); + if (type_is_cond_release(regs[regno].type)) { + if (!is_null) { + __release_reference(env, vstate, vstate->active_cond_ref_obj_id); + vstate->active_cond_ref_obj_id = 0; + } else { + untag_reference_cond_release(env, vstate, vstate->active_cond_ref_obj_id); + vstate->active_cond_ref_obj_id = 0; + } + } for (i = 0; i <= vstate->curframe; i++) __mark_ptr_or_null_regs(vstate->frame[i], id, is_null); } @@ -10537,9 +10626,9 @@ static int check_cond_jmp_op(struct bpf_verifier_env *env, * safe or unknown depending R == 0 or R != 0 conditional. */ mark_ptr_or_null_regs(this_branch, insn->dst_reg, - opcode == BPF_JNE); + opcode == BPF_JNE, env); mark_ptr_or_null_regs(other_branch, insn->dst_reg, - opcode == BPF_JEQ); + opcode == BPF_JEQ, env); } else if (!try_match_pkt_pointers(insn, dst_reg, ®s[insn->src_reg], this_branch, other_branch) && is_pointer_value(env, insn->dst_reg)) { @@ -11996,6 +12085,9 @@ static bool states_equal(struct bpf_verifier_env *env, if (old->active_spin_lock != cur->active_spin_lock) return false; + if (old->active_cond_ref_obj_id != cur->active_cond_ref_obj_id) + return false; + /* for states to be equal callsites have to be the same * and all frame states need to be equivalent */ From patchwork Tue Aug 30 17:27:55 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dave Marchevsky X-Patchwork-Id: 12959635 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 65366ECAAD1 for ; Tue, 30 Aug 2022 17:45:42 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230093AbiH3Rpk (ORCPT ); Tue, 30 Aug 2022 13:45:40 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:60296 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231883AbiH3RpN (ORCPT ); Tue, 30 Aug 2022 13:45:13 -0400 Received: from mx0b-00082601.pphosted.com (mx0b-00082601.pphosted.com [67.231.153.30]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 7101EBCC02 for ; Tue, 30 Aug 2022 10:41:48 -0700 (PDT) Received: from pps.filterd (m0148460.ppops.net [127.0.0.1]) by mx0a-00082601.pphosted.com (8.17.1.5/8.17.1.5) with ESMTP id 27UGLbLQ003220 for ; Tue, 30 Aug 2022 10:35:28 -0700 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fb.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : 
mime-version : content-transfer-encoding : content-type; s=facebook; bh=XbKTLtkRtqi5g/CEyYGaDoZsNgdHirfzPIeoXWRfhG4=; b=mS6848Albcv1Y8RpeJZrheKbqSezI2uukZMiy1scfCFmPpx95ITqyc5W0zEtpG0ENWK/ R0sJR5Kwwh3RvBCXTEtUO9/EnCz0ExHZRT9SigF5eDTPQFfTZs0UZUo1KM/6BNNFuFfu uj2DlzcjiUvjAN0wqpRBqowmE5epaHiToT4= Received: from maileast.thefacebook.com ([163.114.130.16]) by mx0a-00082601.pphosted.com (PPS) with ESMTPS id 3j9h5djsgd-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT) for ; Tue, 30 Aug 2022 10:35:28 -0700 Received: from twshared0823.18.frc3.facebook.com (2620:10d:c0a8:1b::d) by mail.thefacebook.com (2620:10d:c0a8:82::e) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2375.31; Tue, 30 Aug 2022 10:35:27 -0700 Received: by devbig077.ldc1.facebook.com (Postfix, from userid 158236) id 15D77CAD079D; Tue, 30 Aug 2022 10:28:10 -0700 (PDT) From: Dave Marchevsky To: CC: Alexei Starovoitov , Daniel Borkmann , Andrii Nakryiko , Kernel Team , Dave Marchevsky Subject: [RFCv2 PATCH bpf-next 14/18] bpf: Introduce PTR_ITER and PTR_ITER_END type flags Date: Tue, 30 Aug 2022 10:27:55 -0700 Message-ID: <20220830172759.4069786-15-davemarchevsky@fb.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20220830172759.4069786-1-davemarchevsky@fb.com> References: <20220830172759.4069786-1-davemarchevsky@fb.com> MIME-Version: 1.0 X-FB-Internal: Safe X-Proofpoint-GUID: fmuqmzsrWcGiuyIBeWYyXUxDXDZ8V6Rh X-Proofpoint-ORIG-GUID: fmuqmzsrWcGiuyIBeWYyXUxDXDZ8V6Rh X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.895,Hydra:6.0.517,FMLib:17.11.122.1 definitions=2022-08-30_10,2022-08-30_01,2022-06-22_01 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net X-Patchwork-State: RFC Add two type flags, PTR_ITER and PTR_ITER_END, meant to support the following pattern for iterating data structures: struct node *elem = data_structure_iter_first(&some_map) while (elem) { do_something(); elem = data_structure_iter_next(&some_map, elem); } Currently the ret_type of both _first() and _next() helpers would be PTR_TO_BTF_ID_OR_NULL and the verifier would consider the loop to be infinite as it knows nothing about an arbitrary PTR_TO_BTF_ID value. However, if we can express "this PTR_TO_BTF_ID will eventually be NULL if used in iteration helpers" via a new type flag, the verifier can use this information to determine that the loop will terminate while still verifying the loop body. So for our contrived example above, _first() now returns a PTR_TO_BTF_ID_OR_NULL | PTR_ITER, which _next() expects as input, itself returning PTR_TO_BTF_ID_OR_NULL | PTR_ITER_END. The verifier does nothing special for PTR_ITER, so the first iteration of the example loop will result in both elem == NULL and elem != NULL branches being verified. When verifying the latter branch, elem will become a PTR_TO_BTF_ID_OR_NULL | PTR_ITER_END. When the verifier sees a reg holding PTR_ITER_END in a conditional jump it knows the reg type and val can be replaced with SCALAR 0. So in the example loop on 2nd iteration elem == NULL will be assumed and verification will continue with that branch only. [ TODO: * PTR_ITER needs to be associated with the helper that produced it, to prevent something like: struct node *elem = data_structure_iter_last(&some_map) while (elem) { do_something(); elem = data_structure_iter_next(&some_map, elem); } i.e. _first() and _next() should be used together, same with _last() and _prev(). 
* This is currently very unsafe. Per multiple conversations w/ Alexei and Andrii, there are probably some ways forward here, but may be more complex than it's worth. If so, can migrate to a callback-based approach with similar functionality . ] Signed-off-by: Dave Marchevsky --- include/linux/bpf.h | 3 ++ include/uapi/linux/bpf.h | 32 ++++++++++++++ kernel/bpf/helpers.c | 12 ++++++ kernel/bpf/rbtree.c | 78 ++++++++++++++++++++++++++++++++++ kernel/bpf/verifier.c | 53 +++++++++++++++++++++-- tools/include/uapi/linux/bpf.h | 32 ++++++++++++++ 6 files changed, 207 insertions(+), 3 deletions(-) diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 83b8d63545e3..f4aabfa943c1 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -419,6 +419,9 @@ enum bpf_type_flag { CONDITIONAL_RELEASE = BIT(12 + BPF_BASE_TYPE_BITS), + PTR_ITER = BIT(13 + BPF_BASE_TYPE_BITS), + PTR_ITER_END = BIT(14 + BPF_BASE_TYPE_BITS), + __BPF_TYPE_FLAG_MAX, __BPF_TYPE_LAST_FLAG = __BPF_TYPE_FLAG_MAX - 1, }; diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index f4c615fbf64f..bb556c449cf0 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -5433,6 +5433,34 @@ union bpf_attr { * * Return * 0 + * + * long bpf_rbtree_first(struct bpf_map *map) + * Description + * Return the first node in the tree according to sort order + * + * Return + * If found, ptr to node, otherwise NULL + * + * long bpf_rbtree_last(struct bpf_map *map) + * Description + * Return the last node in the tree according to sort order + * + * Return + * If found, ptr to node, otherwise NULL + * + * long bpf_rbtree_next(struct bpf_map *map, void *cur) + * Description + * Return the next node in the tree according to sort order + * + * Return + * If found, ptr to node, otherwise NULL + * + * long bpf_rbtree_prev(struct bpf_map *map, void *cur) + * Description + * Return the previous node in the tree according to sort order + * + * Return + * If found, ptr to node, otherwise NULL */ #define __BPF_FUNC_MAPPER(FN) \ FN(unspec), \ @@ -5652,6 +5680,10 @@ union bpf_attr { FN(rbtree_get_lock), \ FN(rbtree_lock), \ FN(rbtree_unlock), \ + FN(rbtree_first), \ + FN(rbtree_last), \ + FN(rbtree_next), \ + FN(rbtree_prev), \ /* */ /* integer value in 'imm' field of BPF_CALL instruction selects which helper diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c index 0ca5fed1013b..32e245c559c4 100644 --- a/kernel/bpf/helpers.c +++ b/kernel/bpf/helpers.c @@ -1608,6 +1608,10 @@ const struct bpf_func_proto bpf_rbtree_free_node_proto __weak; const struct bpf_func_proto bpf_rbtree_get_lock_proto __weak; const struct bpf_func_proto bpf_rbtree_lock_proto __weak; const struct bpf_func_proto bpf_rbtree_unlock_proto __weak; +const struct bpf_func_proto bpf_rbtree_first_proto __weak; +const struct bpf_func_proto bpf_rbtree_last_proto __weak; +const struct bpf_func_proto bpf_rbtree_next_proto __weak; +const struct bpf_func_proto bpf_rbtree_prev_proto __weak; const struct bpf_func_proto * bpf_base_func_proto(enum bpf_func_id func_id) @@ -1715,6 +1719,14 @@ bpf_base_func_proto(enum bpf_func_id func_id) return &bpf_rbtree_lock_proto; case BPF_FUNC_rbtree_unlock: return &bpf_rbtree_unlock_proto; + case BPF_FUNC_rbtree_first: + return &bpf_rbtree_first_proto; + case BPF_FUNC_rbtree_last: + return &bpf_rbtree_last_proto; + case BPF_FUNC_rbtree_next: + return &bpf_rbtree_next_proto; + case BPF_FUNC_rbtree_prev: + return &bpf_rbtree_prev_proto; default: break; } diff --git a/kernel/bpf/rbtree.c b/kernel/bpf/rbtree.c index 
e6f51c27661c..4794a50adbca 100644 --- a/kernel/bpf/rbtree.c +++ b/kernel/bpf/rbtree.c @@ -174,6 +174,84 @@ const struct bpf_func_proto bpf_rbtree_find_proto = { .arg3_type = ARG_PTR_TO_FUNC, }; +BPF_CALL_1(bpf_rbtree_first, struct bpf_map *, map) +{ + struct bpf_rbtree *tree = container_of(map, struct bpf_rbtree, map); + + if (!__rbtree_lock_held(tree)) + return (u64)NULL; + + return (u64)rb_first_cached(&tree->root); +} + +const struct bpf_func_proto bpf_rbtree_first_proto = { + .func = bpf_rbtree_first, + .gpl_only = true, + .ret_type = RET_PTR_TO_BTF_ID_OR_NULL | PTR_ITER | OBJ_NON_OWNING_REF, + .ret_btf_id = BPF_PTR_POISON, + .arg1_type = ARG_CONST_MAP_PTR, +}; + +BPF_CALL_1(bpf_rbtree_last, struct bpf_map *, map) +{ + struct bpf_rbtree *tree = container_of(map, struct bpf_rbtree, map); + + if (!__rbtree_lock_held(tree)) + return (u64)NULL; + + return (u64)rb_last(&tree->root.rb_root); +} + +const struct bpf_func_proto bpf_rbtree_last_proto = { + .func = bpf_rbtree_last, + .gpl_only = true, + .ret_type = RET_PTR_TO_BTF_ID_OR_NULL | PTR_ITER | OBJ_NON_OWNING_REF, + .ret_btf_id = BPF_PTR_POISON, + .arg1_type = ARG_CONST_MAP_PTR, +}; + +BPF_CALL_2(bpf_rbtree_next, struct bpf_map *, map, void *, cur) +{ + struct rb_node *next = rb_next((struct rb_node *)cur); + struct bpf_rbtree *tree = container_of(map, struct bpf_rbtree, map); + + if (!__rbtree_lock_held(tree)) + return (u64)NULL; + + return (u64)next; +} + +const struct bpf_func_proto bpf_rbtree_next_proto = { + .func = bpf_rbtree_next, + .gpl_only = true, + .ret_type = RET_PTR_TO_BTF_ID_OR_NULL | PTR_ITER_END | OBJ_NON_OWNING_REF, + .ret_btf_id = BPF_PTR_POISON, + .arg1_type = ARG_CONST_MAP_PTR, + .arg2_type = ARG_PTR_TO_BTF_ID | PTR_ITER, + .arg2_btf_id = BPF_PTR_POISON, +}; + +BPF_CALL_2(bpf_rbtree_prev, struct bpf_map *, map, void *, cur) +{ + struct rb_node *node = (struct rb_node *)cur; + struct bpf_rbtree *tree = container_of(map, struct bpf_rbtree, map); + + if (!__rbtree_lock_held(tree)) + return (u64)NULL; + + return (u64)rb_prev(node); +} + +const struct bpf_func_proto bpf_rbtree_prev_proto = { + .func = bpf_rbtree_prev, + .gpl_only = true, + .ret_type = RET_PTR_TO_BTF_ID_OR_NULL | PTR_ITER_END | OBJ_NON_OWNING_REF, + .ret_btf_id = BPF_PTR_POISON, + .arg1_type = ARG_CONST_MAP_PTR, + .arg2_type = ARG_PTR_TO_BTF_ID | PTR_ITER, + .arg2_btf_id = BPF_PTR_POISON, +}; + /* Like rbtree_postorder_for_each_entry_safe, but 'pos' and 'n' are * 'rb_node *', so field name of rb_node within containing struct is not * needed. 
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index 3da8959e5f7a..6354d3a2217d 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -484,6 +484,11 @@ static bool type_may_be_null(u32 type) return type & PTR_MAYBE_NULL; } +static bool type_is_iter(u32 type) +{ + return type & PTR_ITER || type & PTR_ITER_END; +} + static bool is_acquire_function(enum bpf_func_id func_id, const struct bpf_map *map) { @@ -586,7 +591,9 @@ static bool function_manipulates_rbtree_node(enum bpf_func_id func_id) { return func_id == BPF_FUNC_rbtree_add || func_id == BPF_FUNC_rbtree_remove || - func_id == BPF_FUNC_rbtree_free_node; + func_id == BPF_FUNC_rbtree_free_node || + func_id == BPF_FUNC_rbtree_next || + func_id == BPF_FUNC_rbtree_prev; } static bool function_returns_rbtree_node(enum bpf_func_id func_id) @@ -594,7 +601,11 @@ static bool function_returns_rbtree_node(enum bpf_func_id func_id) return func_id == BPF_FUNC_rbtree_alloc_node || func_id == BPF_FUNC_rbtree_find || func_id == BPF_FUNC_rbtree_add || - func_id == BPF_FUNC_rbtree_remove; + func_id == BPF_FUNC_rbtree_remove || + func_id == BPF_FUNC_rbtree_first || + func_id == BPF_FUNC_rbtree_last || + func_id == BPF_FUNC_rbtree_next || + func_id == BPF_FUNC_rbtree_prev; } /* string representation of 'enum bpf_reg_type' @@ -655,6 +666,12 @@ static const char *reg_type_str(struct bpf_verifier_env *env, 32 - postfix_idx); } + if (type_is_iter(type)) { + postfix_idx += strlcpy(postfix + postfix_idx, "_iter", 32 - postfix_idx); + if (type & PTR_ITER_END) + postfix_idx += strlcpy(postfix + postfix_idx, "_end", 32 - postfix_idx); + } + if (type & MEM_RDONLY) strncpy(prefix, "rdonly_", 32); if (type & MEM_ALLOC) @@ -1470,7 +1487,7 @@ static void mark_ptr_not_null_reg(struct bpf_reg_state *reg) map->map_type == BPF_MAP_TYPE_SOCKHASH) { reg->type = PTR_TO_SOCKET; } else { - reg->type = PTR_TO_MAP_VALUE; + reg->type &= ~PTR_MAYBE_NULL; } return; } @@ -3063,6 +3080,11 @@ static bool __is_pointer_value(bool allow_ptr_leaks, return reg->type != SCALAR_VALUE; } +static bool __is_iter_end(const struct bpf_reg_state *reg) +{ + return reg->type & PTR_ITER_END; +} + static void save_register_state(struct bpf_func_state *state, int spi, struct bpf_reg_state *reg, int size) @@ -5737,6 +5759,8 @@ static const struct bpf_reg_types map_key_value_types = { PTR_TO_PACKET_META, PTR_TO_MAP_KEY, PTR_TO_MAP_VALUE, + PTR_TO_MAP_VALUE | PTR_ITER, + PTR_TO_MAP_VALUE | PTR_ITER_END, }, }; @@ -5870,6 +5894,17 @@ static int check_reg_type(struct bpf_verifier_env *env, u32 regno, if (arg_type & PTR_MAYBE_NULL) type &= ~PTR_MAYBE_NULL; + /* TYPE | PTR_ITER is valid input for helpers that expect TYPE + * TYPE is not valid input for helpers that expect TYPE | PTR_ITER + */ + if (type_is_iter(arg_type)) { + if (!type_is_iter(type)) + goto not_found; + + type &= ~PTR_ITER; + type &= ~PTR_ITER_END; + } + for (i = 0; i < ARRAY_SIZE(compatible->types); i++) { expected = compatible->types[i]; if (expected == NOT_INIT) @@ -5879,6 +5914,7 @@ static int check_reg_type(struct bpf_verifier_env *env, u32 regno, goto found; } +not_found: verbose(env, "R%d type=%s expected=", regno, reg_type_str(env, reg->type)); for (j = 0; j + 1 < i; j++) verbose(env, "%s, ", reg_type_str(env, compatible->types[j])); @@ -9933,6 +9969,17 @@ static int is_branch_taken(struct bpf_reg_state *reg, u64 val, u8 opcode, bool is_jmp32) { if (__is_pointer_value(false, reg)) { + if (__is_iter_end(reg) && val == 0) { + __mark_reg_const_zero(reg); + switch (opcode) { + case BPF_JEQ: + return 1; + case BPF_JNE: 
+ return 0; + default: + return -1; + } + } if (!reg_type_not_null(reg->type)) return -1; diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h index f4c615fbf64f..bb556c449cf0 100644 --- a/tools/include/uapi/linux/bpf.h +++ b/tools/include/uapi/linux/bpf.h @@ -5433,6 +5433,34 @@ union bpf_attr { * * Return * 0 + * + * long bpf_rbtree_first(struct bpf_map *map) + * Description + * Return the first node in the tree according to sort order + * + * Return + * If found, ptr to node, otherwise NULL + * + * long bpf_rbtree_last(struct bpf_map *map) + * Description + * Return the last node in the tree according to sort order + * + * Return + * If found, ptr to node, otherwise NULL + * + * long bpf_rbtree_next(struct bpf_map *map, void *cur) + * Description + * Return the next node in the tree according to sort order + * + * Return + * If found, ptr to node, otherwise NULL + * + * long bpf_rbtree_prev(struct bpf_map *map, void *cur) + * Description + * Return the previous node in the tree according to sort order + * + * Return + * If found, ptr to node, otherwise NULL */ #define __BPF_FUNC_MAPPER(FN) \ FN(unspec), \ @@ -5652,6 +5680,10 @@ union bpf_attr { FN(rbtree_get_lock), \ FN(rbtree_lock), \ FN(rbtree_unlock), \ + FN(rbtree_first), \ + FN(rbtree_last), \ + FN(rbtree_next), \ + FN(rbtree_prev), \ /* */ /* integer value in 'imm' field of BPF_CALL instruction selects which helper From patchwork Tue Aug 30 17:27:56 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dave Marchevsky X-Patchwork-Id: 12959629 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id CC2A8ECAAD1 for ; Tue, 30 Aug 2022 17:37:13 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232086AbiH3RhM (ORCPT ); Tue, 30 Aug 2022 13:37:12 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:47052 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231830AbiH3Rgm (ORCPT ); Tue, 30 Aug 2022 13:36:42 -0400 Received: from mx0a-00082601.pphosted.com (mx0a-00082601.pphosted.com [67.231.145.42]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id A10C212A5E5 for ; Tue, 30 Aug 2022 10:33:21 -0700 (PDT) Received: from pps.filterd (m0148461.ppops.net [127.0.0.1]) by mx0a-00082601.pphosted.com (8.17.1.5/8.17.1.5) with ESMTP id 27UG2NEe010140 for ; Tue, 30 Aug 2022 10:31:26 -0700 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fb.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : content-type; s=facebook; bh=vWJ04e9HlRIugBxVyk5DZsLK/c/mhW8RV268HwXg76A=; b=GDEZNpJ5HkTwAFEtjYLeT1nOktHmH7sv/fdlbwfJ0wyaA+bDkVbe+otu17/w+Xi4/cLC ih9r0ZhnzROOCrkHRt94fGWRiofszULIJM201cgq/n0a9GkacXM2yivuv34hkY0fGZ7r NFUxdJUAPSBBJ0cPaOnGtmvnCjJXTLjjW9I= Received: from maileast.thefacebook.com ([163.114.130.16]) by mx0a-00082601.pphosted.com (PPS) with ESMTPS id 3j94gye2dv-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT) for ; Tue, 30 Aug 2022 10:31:25 -0700 Received: from twshared30313.14.frc2.facebook.com (2620:10d:c0a8:1b::d) by mail.thefacebook.com (2620:10d:c0a8:83::7) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2375.31; 
Tue, 30 Aug 2022 10:31:24 -0700 Received: by devbig077.ldc1.facebook.com (Postfix, from userid 158236) id 20058CAD079E; Tue, 30 Aug 2022 10:28:10 -0700 (PDT) From: Dave Marchevsky To: CC: Alexei Starovoitov , Daniel Borkmann , Andrii Nakryiko , Kernel Team , Dave Marchevsky Subject: [RFCv2 PATCH bpf-next 15/18] selftests/bpf: Add rbtree map tests Date: Tue, 30 Aug 2022 10:27:56 -0700 Message-ID: <20220830172759.4069786-16-davemarchevsky@fb.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20220830172759.4069786-1-davemarchevsky@fb.com> References: <20220830172759.4069786-1-davemarchevsky@fb.com> MIME-Version: 1.0 X-FB-Internal: Safe X-Proofpoint-ORIG-GUID: dqCM8F0dzbDXFzS1FX02HGebw7YFEbj3 X-Proofpoint-GUID: dqCM8F0dzbDXFzS1FX02HGebw7YFEbj3 X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.895,Hydra:6.0.517,FMLib:17.11.122.1 definitions=2022-08-30_10,2022-08-30_01,2022-06-22_01 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net X-Patchwork-State: RFC Add tests demonstrating happy path of rbtree map usage as well as exercising numerous failure paths and conditions. Structure of failing test runner is based on dynptr tests. Signed-off-by: Dave Marchevsky --- .../selftests/bpf/prog_tests/rbtree_map.c | 164 ++++++++++++ .../testing/selftests/bpf/progs/rbtree_map.c | 111 ++++++++ .../selftests/bpf/progs/rbtree_map_fail.c | 236 ++++++++++++++++++ .../bpf/progs/rbtree_map_load_fail.c | 24 ++ 4 files changed, 535 insertions(+) create mode 100644 tools/testing/selftests/bpf/prog_tests/rbtree_map.c create mode 100644 tools/testing/selftests/bpf/progs/rbtree_map.c create mode 100644 tools/testing/selftests/bpf/progs/rbtree_map_fail.c create mode 100644 tools/testing/selftests/bpf/progs/rbtree_map_load_fail.c diff --git a/tools/testing/selftests/bpf/prog_tests/rbtree_map.c b/tools/testing/selftests/bpf/prog_tests/rbtree_map.c new file mode 100644 index 000000000000..17cadcd05ee4 --- /dev/null +++ b/tools/testing/selftests/bpf/prog_tests/rbtree_map.c @@ -0,0 +1,164 @@ +// SPDX-License-Identifier: GPL-2.0 +/* Copyright (c) 2022 Meta Platforms, Inc. and affiliates. 
*/ + +#include +#include +#include "rbtree_map.skel.h" +#include "rbtree_map_fail.skel.h" +#include "rbtree_map_load_fail.skel.h" + +static size_t log_buf_sz = 1048576; /* 1 MB */ +static char obj_log_buf[1048576]; + +static struct { + const char *prog_name; + const char *expected_err_msg; +} rbtree_prog_load_fail_tests[] = { + {"rb_node__field_store", "only read is supported"}, + {"rb_node__alloc_no_add", "Unreleased reference id=2 alloc_insn=3"}, + {"rb_node__two_alloc_one_add", "Unreleased reference id=2 alloc_insn=3"}, + {"rb_node__remove_no_free", "Unreleased reference id=5 alloc_insn=28"}, + {"rb_tree__add_wrong_type", "rbtree: R2 is of type task_struct but node_data is expected"}, + {"rb_tree__conditional_release_helper_usage", + "R2 type=ptr_cond_rel_ expected=ptr_"}, +}; + +void test_rbtree_map_load_fail(void) +{ + struct rbtree_map_load_fail *skel; + + skel = rbtree_map_load_fail__open_and_load(); + if (!ASSERT_ERR_PTR(skel, "rbtree_map_load_fail__open_and_load")) + rbtree_map_load_fail__destroy(skel); +} + +static void verify_fail(const char *prog_name, const char *expected_err_msg) +{ + LIBBPF_OPTS(bpf_object_open_opts, opts); + struct rbtree_map_fail *skel; + struct bpf_program *prog; + int err; + + opts.kernel_log_buf = obj_log_buf; + opts.kernel_log_size = log_buf_sz; + opts.kernel_log_level = 1; + + skel = rbtree_map_fail__open_opts(&opts); + if (!ASSERT_OK_PTR(skel, "rbtree_map_fail__open_opts")) + goto cleanup; + + prog = bpf_object__find_program_by_name(skel->obj, prog_name); + if (!ASSERT_OK_PTR(prog, "bpf_object__find_program_by_name")) + goto cleanup; + + bpf_program__set_autoload(prog, true); + err = rbtree_map_fail__load(skel); + if (!ASSERT_ERR(err, "unexpected load success")) + goto cleanup; + + if (!ASSERT_OK_PTR(strstr(obj_log_buf, expected_err_msg), "expected_err_msg")) { + fprintf(stderr, "Expected err_msg: %s\n", expected_err_msg); + fprintf(stderr, "Verifier output: %s\n", obj_log_buf); + } + +cleanup: + rbtree_map_fail__destroy(skel); +} + +void test_rbtree_map_alloc_node__size_too_small(void) +{ + struct rbtree_map_fail *skel; + struct bpf_program *prog; + struct bpf_link *link; + int err; + + skel = rbtree_map_fail__open(); + if (!ASSERT_OK_PTR(skel, "rbtree_map_fail__open")) + goto cleanup; + + prog = skel->progs.alloc_node__size_too_small; + bpf_program__set_autoload(prog, true); + + err = rbtree_map_fail__load(skel); + if (!ASSERT_OK(err, "unexpected load fail")) + goto cleanup; + + link = bpf_program__attach(skel->progs.alloc_node__size_too_small); + if (!ASSERT_OK_PTR(link, "link")) + goto cleanup; + + syscall(SYS_getpgid); + + ASSERT_EQ(skel->bss->size_too_small__alloc_fail, 1, "alloc_fail"); + + bpf_link__destroy(link); +cleanup: + rbtree_map_fail__destroy(skel); +} + +void test_rbtree_map_add_node__no_lock(void) +{ + struct rbtree_map_fail *skel; + struct bpf_program *prog; + struct bpf_link *link; + int err; + + skel = rbtree_map_fail__open(); + if (!ASSERT_OK_PTR(skel, "rbtree_map_fail__open")) + goto cleanup; + + prog = skel->progs.add_node__no_lock; + bpf_program__set_autoload(prog, true); + + err = rbtree_map_fail__load(skel); + if (!ASSERT_OK(err, "unexpected load fail")) + goto cleanup; + + link = bpf_program__attach(skel->progs.add_node__no_lock); + if (!ASSERT_OK_PTR(link, "link")) + goto cleanup; + + syscall(SYS_getpgid); + + ASSERT_EQ(skel->bss->no_lock_add__fail, 1, "alloc_fail"); + + bpf_link__destroy(link); +cleanup: + rbtree_map_fail__destroy(skel); +} + +void test_rbtree_map_prog_load_fail(void) +{ + int i; + + for (i = 0; i < 
ARRAY_SIZE(rbtree_prog_load_fail_tests); i++) { + if (!test__start_subtest(rbtree_prog_load_fail_tests[i].prog_name)) + continue; + + verify_fail(rbtree_prog_load_fail_tests[i].prog_name, + rbtree_prog_load_fail_tests[i].expected_err_msg); + } +} + +void test_rbtree_map(void) +{ + struct rbtree_map *skel; + struct bpf_link *link; + + skel = rbtree_map__open_and_load(); + if (!ASSERT_OK_PTR(skel, "rbtree_map__open_and_load")) + goto cleanup; + + link = bpf_program__attach(skel->progs.check_rbtree); + if (!ASSERT_OK_PTR(link, "link")) + goto cleanup; + + for (int i = 0; i < 100; i++) + syscall(SYS_getpgid); + + ASSERT_EQ(skel->bss->calls, 100, "calls_equal"); + + bpf_link__destroy(link); +cleanup: + rbtree_map__destroy(skel); +} diff --git a/tools/testing/selftests/bpf/progs/rbtree_map.c b/tools/testing/selftests/bpf/progs/rbtree_map.c new file mode 100644 index 000000000000..0cd467838f6e --- /dev/null +++ b/tools/testing/selftests/bpf/progs/rbtree_map.c @@ -0,0 +1,111 @@ +// SPDX-License-Identifier: GPL-2.0 +/* Copyright (c) 2022 Meta Platforms, Inc. and affiliates. */ + +#include "vmlinux.h" +#include +#include "bpf_misc.h" + +struct node_data { + struct rb_node node; + __u32 one; + __u32 two; +}; + +struct { + __uint(type, BPF_MAP_TYPE_RBTREE); + __type(value, struct node_data); +} rbtree SEC(".maps"); + +long calls; + +static bool less(struct rb_node *a, const struct rb_node *b) +{ + struct node_data *node_a; + struct node_data *node_b; + + node_a = container_of(a, struct node_data, node); + node_b = container_of(b, struct node_data, node); + + return node_a->one < node_b->one; +} + +// Key = node_datq +static int cmp(const void *key, const struct rb_node *b) +{ + struct node_data *node_a; + struct node_data *node_b; + + node_a = container_of(key, struct node_data, node); + node_b = container_of(b, struct node_data, node); + + return node_b->one - node_a->one; +} + +// Key = just node_data.one +static int cmp2(const void *key, const struct rb_node *b) +{ + __u32 one; + struct node_data *node_b; + + one = *(__u32 *)key; + node_b = container_of(b, struct node_data, node); + + return node_b->one - one; +} + +SEC("fentry/" SYS_PREFIX "sys_getpgid") +int check_rbtree(void *ctx) +{ + struct node_data *node, *found, *ret; + struct node_data popped; + struct node_data search; + __u32 search2; + + node = bpf_rbtree_alloc_node(&rbtree, sizeof(struct node_data)); + if (!node) + return 0; + + node->one = calls; + node->two = 6; + bpf_rbtree_lock(bpf_rbtree_get_lock(&rbtree)); + + ret = (struct node_data *)bpf_rbtree_add(&rbtree, node, less); + if (!ret) { + bpf_rbtree_free_node(&rbtree, node); + goto unlock_ret; + } + + bpf_rbtree_unlock(bpf_rbtree_get_lock(&rbtree)); + + bpf_rbtree_lock(bpf_rbtree_get_lock(&rbtree)); + + search.one = calls; + found = (struct node_data *)bpf_rbtree_find(&rbtree, &search, cmp); + if (!found) + goto unlock_ret; + + int node_ct = 0; + struct node_data *iter = (struct node_data *)bpf_rbtree_first(&rbtree); + + while (iter) { + node_ct++; + iter = (struct node_data *)bpf_rbtree_next(&rbtree, iter); + } + + ret = (struct node_data *)bpf_rbtree_remove(&rbtree, found); + if (!ret) + goto unlock_ret; + + bpf_rbtree_unlock(bpf_rbtree_get_lock(&rbtree)); + + bpf_rbtree_free_node(&rbtree, ret); + + __sync_fetch_and_add(&calls, 1); + return 0; + +unlock_ret: + bpf_rbtree_unlock(bpf_rbtree_get_lock(&rbtree)); + return 0; +} + +char _license[] SEC("license") = "GPL"; diff --git a/tools/testing/selftests/bpf/progs/rbtree_map_fail.c 
b/tools/testing/selftests/bpf/progs/rbtree_map_fail.c new file mode 100644 index 000000000000..287c8db080d8 --- /dev/null +++ b/tools/testing/selftests/bpf/progs/rbtree_map_fail.c @@ -0,0 +1,236 @@ +// SPDX-License-Identifier: GPL-2.0 +/* Copyright (c) 2022 Meta Platforms, Inc. and affiliates. */ + +#include "vmlinux.h" +#include +#include "bpf_misc.h" + +struct node_data { + struct rb_node node; + __u32 one; + __u32 two; +}; + +struct { + __uint(type, BPF_MAP_TYPE_RBTREE); + __type(value, struct node_data); +} rbtree SEC(".maps"); + +long calls; + +static bool less(struct rb_node *a, const struct rb_node *b) +{ + struct node_data *node_a; + struct node_data *node_b; + + node_a = container_of(a, struct node_data, node); + node_b = container_of(b, struct node_data, node); + + return node_a->one < node_b->one; +} + +// Key = node_datq +static int cmp(const void *key, const struct rb_node *b) +{ + struct node_data *node_a; + struct node_data *node_b; + + node_a = container_of(key, struct node_data, node); + node_b = container_of(b, struct node_data, node); + + return node_b->one - node_a->one; +} + +long size_too_small__alloc_fail; + +SEC("?fentry/" SYS_PREFIX "sys_getpgid") +int alloc_node__size_too_small(void *ctx) +{ + struct node_data *node, *ret; + + node = bpf_rbtree_alloc_node(&rbtree, sizeof(char)); + if (!node) { + size_too_small__alloc_fail++; + return 0; + } + + bpf_rbtree_lock(bpf_rbtree_get_lock(&rbtree)); + /* will never execute, alloc_node should fail */ + node->one = 1; + ret = bpf_rbtree_add(&rbtree, node, less); + if (!ret) { + bpf_rbtree_free_node(&rbtree, node); + goto unlock_ret; + } + +unlock_ret: + bpf_rbtree_unlock(bpf_rbtree_get_lock(&rbtree)); + return 0; +} + +long no_lock_add__fail; + +SEC("?fentry/" SYS_PREFIX "sys_getpgid") +int add_node__no_lock(void *ctx) +{ + struct node_data *node, *ret; + + node = bpf_rbtree_alloc_node(&rbtree, sizeof(struct node_data)); + if (!node) + return 0; + + node->one = 1; + ret = bpf_rbtree_add(&rbtree, node, less); + if (!ret) { + no_lock_add__fail++; + /* will always execute, rbtree_add should fail + * because no lock held + */ + bpf_rbtree_free_node(&rbtree, node); + } + +unlock_ret: + return 0; +} + +SEC("?fentry/" SYS_PREFIX "sys_getpgid") +int rb_node__field_store(void *ctx) +{ + struct node_data *node; + + node = bpf_rbtree_alloc_node(&rbtree, sizeof(struct node_data)); + if (!node) + return 0; + + /* Only rbtree_map helpers can modify rb_node field */ + node->node.rb_left = NULL; + return 0; +} + +SEC("?fentry/" SYS_PREFIX "sys_getpgid") +int rb_node__alloc_no_add(void *ctx) +{ + struct node_data *node; + + node = bpf_rbtree_alloc_node(&rbtree, sizeof(struct node_data)); + if (!node) + return 0; + /* The node alloc'd above is never added to the rbtree. It must be + * added or free'd before prog terminates. + */ + + node->one = 42; + return 0; +} + +SEC("?fentry/" SYS_PREFIX "sys_getpgid") +int rb_node__two_alloc_one_add(void *ctx) +{ + struct node_data *node, *ret; + + node = bpf_rbtree_alloc_node(&rbtree, sizeof(struct node_data)); + if (!node) + return 0; + node->one = 1; + /* The node alloc'd above is never added to the rbtree. It must be + * added or free'd before prog terminates. 
+ */ + + node = bpf_rbtree_alloc_node(&rbtree, sizeof(struct node_data)); + if (!node) + return 0; + node->one = 42; + + bpf_rbtree_lock(bpf_rbtree_get_lock(&rbtree)); + + ret = bpf_rbtree_add(&rbtree, node, less); + if (!ret) { + bpf_rbtree_free_node(&rbtree, node); + goto unlock_ret; + } + +unlock_ret: + bpf_rbtree_unlock(bpf_rbtree_get_lock(&rbtree)); + return 0; +} + +SEC("?fentry/" SYS_PREFIX "sys_getpgid") +int rb_node__remove_no_free(void *ctx) +{ + struct node_data *node, *ret; + + node = bpf_rbtree_alloc_node(&rbtree, sizeof(struct node_data)); + if (!node) + return 0; + node->one = 42; + + bpf_rbtree_lock(bpf_rbtree_get_lock(&rbtree)); + + ret = bpf_rbtree_add(&rbtree, node, less); + if (!ret) { + bpf_rbtree_free_node(&rbtree, node); + goto unlock_ret; + } + + ret = bpf_rbtree_remove(&rbtree, ret); + if (!ret) + goto unlock_ret; + /* At this point we've successfully acquired a reference from + * bpf_rbtree_remove. It must be released via rbtree_add or + * rbtree_free_node before prog terminates. + */ + +unlock_ret: + bpf_rbtree_unlock(bpf_rbtree_get_lock(&rbtree)); + return 0; +} + +SEC("?fentry/" SYS_PREFIX "sys_getpgid") +int rb_tree__add_wrong_type(void *ctx) +{ + /* Can't add a task_struct to rbtree + */ + struct task_struct *task; + struct node_data *ret; + + task = bpf_get_current_task_btf(); + + bpf_rbtree_lock(bpf_rbtree_get_lock(&rbtree)); + + ret = bpf_rbtree_add(&rbtree, task, less); + /* Verifier should fail at bpf_rbtree_add, so don't bother handling + * failure. + */ + + bpf_rbtree_unlock(bpf_rbtree_get_lock(&rbtree)); + return 0; +} + +SEC("?fentry/" SYS_PREFIX "sys_getpgid") +int rb_tree__conditional_release_helper_usage(void *ctx) +{ + struct node_data *node, *ret; + + node = bpf_rbtree_alloc_node(&rbtree, sizeof(struct node_data)); + if (!node) + return 0; + node->one = 42; + + bpf_rbtree_lock(bpf_rbtree_get_lock(&rbtree)); + + ret = bpf_rbtree_add(&rbtree, node, less); + /* Verifier should fail when trying to use CONDITIONAL_RELEASE + * type in a helper + */ + bpf_rbtree_free_node(&rbtree, node); + if (!ret) { + bpf_rbtree_free_node(&rbtree, node); + goto unlock_ret; + } + +unlock_ret: + bpf_rbtree_unlock(bpf_rbtree_get_lock(&rbtree)); + return 0; +} + +char _license[] SEC("license") = "GPL"; diff --git a/tools/testing/selftests/bpf/progs/rbtree_map_load_fail.c b/tools/testing/selftests/bpf/progs/rbtree_map_load_fail.c new file mode 100644 index 000000000000..036558eedd66 --- /dev/null +++ b/tools/testing/selftests/bpf/progs/rbtree_map_load_fail.c @@ -0,0 +1,24 @@ +// SPDX-License-Identifier: GPL-2.0 +/* Copyright (c) 2022 Meta Platforms, Inc. and affiliates. 
*/ + +#include "vmlinux.h" +#include + +struct node_data_no_rb_node { + __u64 one; + __u64 two; + __u64 three; + __u64 four; + __u64 five; + __u64 six; + __u64 seven; +}; + +/* Should fail because value struct has no rb_node + */ +struct { + __uint(type, BPF_MAP_TYPE_RBTREE); + __type(value, struct node_data_no_rb_node); +} rbtree_fail_no_rb_node SEC(".maps"); + +char _license[] SEC("license") = "GPL"; From patchwork Tue Aug 30 17:27:57 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dave Marchevsky X-Patchwork-Id: 12959643 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 4FD9FECAAD8 for ; Tue, 30 Aug 2022 17:52:03 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229766AbiH3RwB (ORCPT ); Tue, 30 Aug 2022 13:52:01 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:60350 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229732AbiH3Rvl (ORCPT ); Tue, 30 Aug 2022 13:51:41 -0400 Received: from mx0b-00082601.pphosted.com (mx0b-00082601.pphosted.com [67.231.153.30]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 9F84A12E4C7 for ; Tue, 30 Aug 2022 10:48:13 -0700 (PDT) Received: from pps.filterd (m0148460.ppops.net [127.0.0.1]) by mx0a-00082601.pphosted.com (8.17.1.5/8.17.1.5) with ESMTP id 27UApEhV031288 for ; Tue, 30 Aug 2022 10:35:19 -0700 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fb.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : content-type; s=facebook; bh=Kp7Gg9qS97Vp+LREyK38K5z0NO8den/G9Yf0ejc2J7g=; b=C4ZdScggXsYb7vzwpsCzP5ybmpXOK1hJTXhf5Q1MIgKae/0Kdg4bhtBHFUfooXs88y3T rQri01gww/oYKHaAW+OvUDb/79mC4+mx9osWd/rfmAKhBbYN1ZRkq6m9wYiGfDMN/PvL 7LpthAtGgwV5txLgRIy88AY4Hx/o8CCoHeg= Received: from maileast.thefacebook.com ([163.114.130.16]) by mx0a-00082601.pphosted.com (PPS) with ESMTPS id 3j9h5djsfe-2 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT) for ; Tue, 30 Aug 2022 10:35:18 -0700 Received: from twshared8288.05.ash9.facebook.com (2620:10d:c0a8:1b::d) by mail.thefacebook.com (2620:10d:c0a8:82::f) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2375.31; Tue, 30 Aug 2022 10:35:18 -0700 Received: by devbig077.ldc1.facebook.com (Postfix, from userid 158236) id 6F6C5CAD079F; Tue, 30 Aug 2022 10:28:10 -0700 (PDT) From: Dave Marchevsky To: CC: Alexei Starovoitov , Daniel Borkmann , Andrii Nakryiko , Kernel Team , Dave Marchevsky Subject: [RFCv2 PATCH bpf-next 16/18] selftests/bpf: Declarative lock definition test changes Date: Tue, 30 Aug 2022 10:27:57 -0700 Message-ID: <20220830172759.4069786-17-davemarchevsky@fb.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20220830172759.4069786-1-davemarchevsky@fb.com> References: <20220830172759.4069786-1-davemarchevsky@fb.com> MIME-Version: 1.0 X-FB-Internal: Safe X-Proofpoint-GUID: ZlTbxFTYQ4RUHukQX92iamhu44OWxoY0 X-Proofpoint-ORIG-GUID: ZlTbxFTYQ4RUHukQX92iamhu44OWxoY0 X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.895,Hydra:6.0.517,FMLib:17.11.122.1 definitions=2022-08-30_10,2022-08-30_01,2022-06-22_01 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org 
X-Patchwork-Delegate: bpf@iogearbox.net X-Patchwork-State: RFC This patch contains test changes corresponding to the functional changes in patch "bpf: Support declarative association of lock with rbtree map". It will be squashed with other test patches, leaving in this state for RFCv2 feedback. Signed-off-by: Dave Marchevsky --- tools/testing/selftests/bpf/progs/rbtree_map.c | 12 +++++++++--- tools/testing/selftests/bpf/progs/rbtree_map_fail.c | 9 ++++++++- .../selftests/bpf/progs/rbtree_map_load_fail.c | 9 ++++++++- 3 files changed, 25 insertions(+), 5 deletions(-) diff --git a/tools/testing/selftests/bpf/progs/rbtree_map.c b/tools/testing/selftests/bpf/progs/rbtree_map.c index 0cd467838f6e..50f29b9a5b82 100644 --- a/tools/testing/selftests/bpf/progs/rbtree_map.c +++ b/tools/testing/selftests/bpf/progs/rbtree_map.c @@ -11,12 +11,18 @@ struct node_data { __u32 two; }; +long calls; +struct bpf_spin_lock rbtree_lock SEC(".bss.private"); + struct { __uint(type, BPF_MAP_TYPE_RBTREE); __type(value, struct node_data); -} rbtree SEC(".maps"); - -long calls; + __array(lock, struct bpf_spin_lock); +} rbtree SEC(".maps") = { + .lock = { + [0] = &rbtree_lock, + }, +}; static bool less(struct rb_node *a, const struct rb_node *b) { diff --git a/tools/testing/selftests/bpf/progs/rbtree_map_fail.c b/tools/testing/selftests/bpf/progs/rbtree_map_fail.c index 287c8db080d8..ab4002a8211c 100644 --- a/tools/testing/selftests/bpf/progs/rbtree_map_fail.c +++ b/tools/testing/selftests/bpf/progs/rbtree_map_fail.c @@ -11,10 +11,17 @@ struct node_data { __u32 two; }; +struct bpf_spin_lock rbtree_lock SEC(".bss.private"); + struct { __uint(type, BPF_MAP_TYPE_RBTREE); __type(value, struct node_data); -} rbtree SEC(".maps"); + __array(lock, struct bpf_spin_lock); +} rbtree SEC(".maps") = { + .lock = { + [0] = &rbtree_lock, + }, +}; long calls; diff --git a/tools/testing/selftests/bpf/progs/rbtree_map_load_fail.c b/tools/testing/selftests/bpf/progs/rbtree_map_load_fail.c index 036558eedd66..5578769efa2f 100644 --- a/tools/testing/selftests/bpf/progs/rbtree_map_load_fail.c +++ b/tools/testing/selftests/bpf/progs/rbtree_map_load_fail.c @@ -14,11 +14,18 @@ struct node_data_no_rb_node { __u64 seven; }; +struct bpf_spin_lock rbtree_lock SEC(".bss.private"); + /* Should fail because value struct has no rb_node */ struct { __uint(type, BPF_MAP_TYPE_RBTREE); __type(value, struct node_data_no_rb_node); -} rbtree_fail_no_rb_node SEC(".maps"); + __array(lock, struct bpf_spin_lock); +} rbtree_fail_no_rb_node SEC(".maps") = { + .lock = { + [0] = &rbtree_lock, + }, +}; char _license[] SEC("license") = "GPL"; From patchwork Tue Aug 30 17:27:58 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dave Marchevsky X-Patchwork-Id: 12959642 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id CF07AECAAA1 for ; Tue, 30 Aug 2022 17:51:14 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231415AbiH3RvN (ORCPT ); Tue, 30 Aug 2022 13:51:13 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:58094 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231707AbiH3Ruj (ORCPT ); Tue, 30 Aug 2022 13:50:39 -0400 Received: from mx0a-00082601.pphosted.com (mx0a-00082601.pphosted.com 
[67.231.145.42]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 3E2762C654 for ; Tue, 30 Aug 2022 10:47:32 -0700 (PDT) Received: from pps.filterd (m0044012.ppops.net [127.0.0.1]) by mx0a-00082601.pphosted.com (8.17.1.5/8.17.1.5) with ESMTP id 27UG2pTM028613 for ; Tue, 30 Aug 2022 10:35:19 -0700 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fb.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : content-type; s=facebook; bh=QjYiXkalxW1LLJ8pZsb13vQbAqc4bBRpNGuIiYuECnc=; b=ckPGQPAKVTGcXE6ameuZqM9IQPKc+e13RBlBgAVIwGEYMZhZ3dZg0umv+nmm74qjXWUF EiflyIiqv4jvm6n62bzjy0pawrs9chQsdA6GnLpDVRgtlehwHQ7dUI9Yo9ZjWqbeY3SB UrFF7H5X1z58zYd1XyB0AVhDLFB1YhGLHDY= Received: from maileast.thefacebook.com ([163.114.130.16]) by mx0a-00082601.pphosted.com (PPS) with ESMTPS id 3j9a6j4521-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT) for ; Tue, 30 Aug 2022 10:35:19 -0700 Received: from twshared8288.05.ash9.facebook.com (2620:10d:c0a8:1b::d) by mail.thefacebook.com (2620:10d:c0a8:83::7) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2375.31; Tue, 30 Aug 2022 10:35:18 -0700 Received: by devbig077.ldc1.facebook.com (Postfix, from userid 158236) id 7FBCFCAD07A1; Tue, 30 Aug 2022 10:28:10 -0700 (PDT) From: Dave Marchevsky To: CC: Alexei Starovoitov , Daniel Borkmann , Andrii Nakryiko , Kernel Team , Dave Marchevsky Subject: [RFCv2 PATCH bpf-next 17/18] selftests/bpf: Lock tracking test changes Date: Tue, 30 Aug 2022 10:27:58 -0700 Message-ID: <20220830172759.4069786-18-davemarchevsky@fb.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20220830172759.4069786-1-davemarchevsky@fb.com> References: <20220830172759.4069786-1-davemarchevsky@fb.com> MIME-Version: 1.0 X-FB-Internal: Safe X-Proofpoint-ORIG-GUID: Y6roqS7oS3IeJvFYsDNV8VP0dFkSCkbv X-Proofpoint-GUID: Y6roqS7oS3IeJvFYsDNV8VP0dFkSCkbv X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.895,Hydra:6.0.517,FMLib:17.11.122.1 definitions=2022-08-30_10,2022-08-30_01,2022-06-22_01 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net X-Patchwork-State: RFC This patch contains test changes corresponding to the functional changes in "bpf: Verifier tracking of rbtree_spin_lock held". It will be squashed with the other test patches; it is left in this state for RFCv2 feedback. The iter section of the rbtree_map.c prog is commented out because the iter helpers will be tossed anyway.
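For context, the locking pattern these tests move to can be sketched roughly as follows (an illustrative sketch using the rbtree / rbtree_lock names from the selftests below, not the exact diff):

    /* Before: the lock is fetched through the map at runtime, so the
     * verifier cannot tell which lock protects the rbtree.
     */
    bpf_rbtree_lock(bpf_rbtree_get_lock(&rbtree));
    ret = bpf_rbtree_add(&rbtree, node, less);
    bpf_rbtree_unlock(bpf_rbtree_get_lock(&rbtree));

    /* After: the statically associated lock is taken directly, so the
     * verifier can check that the lock tied to &rbtree is held around
     * the add.
     */
    bpf_rbtree_lock(&rbtree_lock);
    ret = bpf_rbtree_add(&rbtree, node, less);
    bpf_rbtree_unlock(&rbtree_lock);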
Signed-off-by: Dave Marchevsky --- .../selftests/bpf/prog_tests/rbtree_map.c | 2 +- .../testing/selftests/bpf/progs/rbtree_map.c | 16 ++++++++------- .../selftests/bpf/progs/rbtree_map_fail.c | 20 +++++++++---------- 3 files changed, 20 insertions(+), 18 deletions(-) diff --git a/tools/testing/selftests/bpf/prog_tests/rbtree_map.c b/tools/testing/selftests/bpf/prog_tests/rbtree_map.c index 17cadcd05ee4..7634a2d93f0b 100644 --- a/tools/testing/selftests/bpf/prog_tests/rbtree_map.c +++ b/tools/testing/selftests/bpf/prog_tests/rbtree_map.c @@ -17,7 +17,7 @@ static struct { {"rb_node__field_store", "only read is supported"}, {"rb_node__alloc_no_add", "Unreleased reference id=2 alloc_insn=3"}, {"rb_node__two_alloc_one_add", "Unreleased reference id=2 alloc_insn=3"}, - {"rb_node__remove_no_free", "Unreleased reference id=5 alloc_insn=28"}, + {"rb_node__remove_no_free", "Unreleased reference id=6 alloc_insn=26"}, {"rb_tree__add_wrong_type", "rbtree: R2 is of type task_struct but node_data is expected"}, {"rb_tree__conditional_release_helper_usage", "R2 type=ptr_cond_rel_ expected=ptr_"}, diff --git a/tools/testing/selftests/bpf/progs/rbtree_map.c b/tools/testing/selftests/bpf/progs/rbtree_map.c index 50f29b9a5b82..957672cce82a 100644 --- a/tools/testing/selftests/bpf/progs/rbtree_map.c +++ b/tools/testing/selftests/bpf/progs/rbtree_map.c @@ -65,6 +65,7 @@ int check_rbtree(void *ctx) struct node_data *node, *found, *ret; struct node_data popped; struct node_data search; + struct bpf_spin_lock *lock; __u32 search2; node = bpf_rbtree_alloc_node(&rbtree, sizeof(struct node_data)); @@ -73,7 +74,8 @@ int check_rbtree(void *ctx) node->one = calls; node->two = 6; - bpf_rbtree_lock(bpf_rbtree_get_lock(&rbtree)); + lock = &rbtree_lock; + bpf_rbtree_lock(lock); ret = (struct node_data *)bpf_rbtree_add(&rbtree, node, less); if (!ret) { @@ -81,28 +83,28 @@ int check_rbtree(void *ctx) goto unlock_ret; } - bpf_rbtree_unlock(bpf_rbtree_get_lock(&rbtree)); + bpf_rbtree_unlock(lock); - bpf_rbtree_lock(bpf_rbtree_get_lock(&rbtree)); + bpf_rbtree_lock(lock); search.one = calls; found = (struct node_data *)bpf_rbtree_find(&rbtree, &search, cmp); if (!found) goto unlock_ret; - int node_ct = 0; + /*int node_ct = 0; struct node_data *iter = (struct node_data *)bpf_rbtree_first(&rbtree); while (iter) { node_ct++; iter = (struct node_data *)bpf_rbtree_next(&rbtree, iter); - } + }*/ ret = (struct node_data *)bpf_rbtree_remove(&rbtree, found); if (!ret) goto unlock_ret; - bpf_rbtree_unlock(bpf_rbtree_get_lock(&rbtree)); + bpf_rbtree_unlock(lock); bpf_rbtree_free_node(&rbtree, ret); @@ -110,7 +112,7 @@ int check_rbtree(void *ctx) return 0; unlock_ret: - bpf_rbtree_unlock(bpf_rbtree_get_lock(&rbtree)); + bpf_rbtree_unlock(&rbtree_lock); return 0; } diff --git a/tools/testing/selftests/bpf/progs/rbtree_map_fail.c b/tools/testing/selftests/bpf/progs/rbtree_map_fail.c index ab4002a8211c..779b85294f37 100644 --- a/tools/testing/selftests/bpf/progs/rbtree_map_fail.c +++ b/tools/testing/selftests/bpf/progs/rbtree_map_fail.c @@ -61,7 +61,7 @@ int alloc_node__size_too_small(void *ctx) return 0; } - bpf_rbtree_lock(bpf_rbtree_get_lock(&rbtree)); + bpf_rbtree_lock(&rbtree_lock); /* will never execute, alloc_node should fail */ node->one = 1; ret = bpf_rbtree_add(&rbtree, node, less); @@ -71,7 +71,7 @@ int alloc_node__size_too_small(void *ctx) } unlock_ret: - bpf_rbtree_unlock(bpf_rbtree_get_lock(&rbtree)); + bpf_rbtree_unlock(&rbtree_lock); return 0; } @@ -148,7 +148,7 @@ int rb_node__two_alloc_one_add(void *ctx) return 0; node->one = 
42; - bpf_rbtree_lock(bpf_rbtree_get_lock(&rbtree)); + bpf_rbtree_lock(&rbtree_lock); ret = bpf_rbtree_add(&rbtree, node, less); if (!ret) { @@ -157,7 +157,7 @@ int rb_node__two_alloc_one_add(void *ctx) } unlock_ret: - bpf_rbtree_unlock(bpf_rbtree_get_lock(&rbtree)); + bpf_rbtree_unlock(&rbtree_lock); return 0; } @@ -171,7 +171,7 @@ int rb_node__remove_no_free(void *ctx) return 0; node->one = 42; - bpf_rbtree_lock(bpf_rbtree_get_lock(&rbtree)); + bpf_rbtree_lock(&rbtree_lock); ret = bpf_rbtree_add(&rbtree, node, less); if (!ret) { @@ -188,7 +188,7 @@ int rb_node__remove_no_free(void *ctx) */ unlock_ret: - bpf_rbtree_unlock(bpf_rbtree_get_lock(&rbtree)); + bpf_rbtree_unlock(&rbtree_lock); return 0; } @@ -202,14 +202,14 @@ int rb_tree__add_wrong_type(void *ctx) task = bpf_get_current_task_btf(); - bpf_rbtree_lock(bpf_rbtree_get_lock(&rbtree)); + bpf_rbtree_lock(&rbtree_lock); ret = bpf_rbtree_add(&rbtree, task, less); /* Verifier should fail at bpf_rbtree_add, so don't bother handling * failure. */ - bpf_rbtree_unlock(bpf_rbtree_get_lock(&rbtree)); + bpf_rbtree_unlock(&rbtree_lock); return 0; } @@ -223,7 +223,7 @@ int rb_tree__conditional_release_helper_usage(void *ctx) return 0; node->one = 42; - bpf_rbtree_lock(bpf_rbtree_get_lock(&rbtree)); + bpf_rbtree_lock(&rbtree_lock); ret = bpf_rbtree_add(&rbtree, node, less); /* Verifier should fail when trying to use CONDITIONAL_RELEASE @@ -236,7 +236,7 @@ int rb_tree__conditional_release_helper_usage(void *ctx) } unlock_ret: - bpf_rbtree_unlock(bpf_rbtree_get_lock(&rbtree)); + bpf_rbtree_unlock(&rbtree_lock); return 0; } From patchwork Tue Aug 30 17:27:59 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dave Marchevsky X-Patchwork-Id: 12959627 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id AEEF8ECAAA1 for ; Tue, 30 Aug 2022 17:36:33 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231788AbiH3Rgb (ORCPT ); Tue, 30 Aug 2022 13:36:31 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48208 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231866AbiH3RgG (ORCPT ); Tue, 30 Aug 2022 13:36:06 -0400 Received: from mx0a-00082601.pphosted.com (mx0a-00082601.pphosted.com [67.231.145.42]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id A7209163B4E for ; Tue, 30 Aug 2022 10:32:44 -0700 (PDT) Received: from pps.filterd (m0148461.ppops.net [127.0.0.1]) by mx0a-00082601.pphosted.com (8.17.1.5/8.17.1.5) with ESMTP id 27UG2Hxw010065 for ; Tue, 30 Aug 2022 10:31:23 -0700 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fb.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : content-type; s=facebook; bh=wCvW/O2CjKKP6QQJwp4qzYhnTiB7xx+PP49P3exZYb4=; b=NfWlK1DnShkPY12VbZ/NH/TpGkThXP1L57RR5Xhj31jZsJig9639sSOmDRGYF8tod1Xj P9DZ7/qbe4myaRGBL8NCNGqObftMRJlDqE5PxrlTazxVVOJ2d2hGx/AnadXdZow4W5V5 350bLDvYXOeTOS5Y3cBA6gXnsYeYupvW9Sk= Received: from maileast.thefacebook.com ([163.114.130.16]) by mx0a-00082601.pphosted.com (PPS) with ESMTPS id 3j94gye2dm-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT) for ; Tue, 30 Aug 2022 10:31:23 -0700 Received: from twshared25017.14.frc2.facebook.com 
(2620:10d:c0a8:1b::d) by mail.thefacebook.com (2620:10d:c0a8:83::4) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2375.31; Tue, 30 Aug 2022 10:31:22 -0700 Received: by devbig077.ldc1.facebook.com (Postfix, from userid 158236) id CB421CAD07AA; Tue, 30 Aug 2022 10:28:10 -0700 (PDT) From: Dave Marchevsky To: CC: Alexei Starovoitov , Daniel Borkmann , Andrii Nakryiko , Kernel Team , Dave Marchevsky Subject: [RFCv2 PATCH bpf-next 18/18] selftests/bpf: Rbtree static lock verification test changes Date: Tue, 30 Aug 2022 10:27:59 -0700 Message-ID: <20220830172759.4069786-19-davemarchevsky@fb.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20220830172759.4069786-1-davemarchevsky@fb.com> References: <20220830172759.4069786-1-davemarchevsky@fb.com> MIME-Version: 1.0 X-FB-Internal: Safe X-Proofpoint-ORIG-GUID: POZU6oMc1vCgKr2HIJO73IdMlppZflSi X-Proofpoint-GUID: POZU6oMc1vCgKr2HIJO73IdMlppZflSi X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.895,Hydra:6.0.517,FMLib:17.11.122.1 definitions=2022-08-30_10,2022-08-30_01,2022-06-22_01 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net X-Patchwork-State: RFC These are test changes corresponding to commit "bpf: Check rbtree lock held during verification". They should be squashed with other test changes, but are left unsquashed as part of RFCv2 to ease tossing them piecemeal if associated functionality is dropped. Signed-off-by: Dave Marchevsky --- .../selftests/bpf/prog_tests/rbtree_map.c | 32 +------------------ 1 file changed, 1 insertion(+), 31 deletions(-) diff --git a/tools/testing/selftests/bpf/prog_tests/rbtree_map.c b/tools/testing/selftests/bpf/prog_tests/rbtree_map.c index 7634a2d93f0b..07bc2780854c 100644 --- a/tools/testing/selftests/bpf/prog_tests/rbtree_map.c +++ b/tools/testing/selftests/bpf/prog_tests/rbtree_map.c @@ -19,6 +19,7 @@ static struct { {"rb_node__two_alloc_one_add", "Unreleased reference id=2 alloc_insn=3"}, {"rb_node__remove_no_free", "Unreleased reference id=6 alloc_insn=26"}, {"rb_tree__add_wrong_type", "rbtree: R2 is of type task_struct but node_data is expected"}, + {"add_node__no_lock", "lock associated with rbtree is not held"}, {"rb_tree__conditional_release_helper_usage", "R2 type=ptr_cond_rel_ expected=ptr_"}, }; @@ -96,37 +97,6 @@ void test_rbtree_map_alloc_node__size_too_small(void) rbtree_map_fail__destroy(skel); } -void test_rbtree_map_add_node__no_lock(void) -{ - struct rbtree_map_fail *skel; - struct bpf_program *prog; - struct bpf_link *link; - int err; - - skel = rbtree_map_fail__open(); - if (!ASSERT_OK_PTR(skel, "rbtree_map_fail__open")) - goto cleanup; - - prog = skel->progs.add_node__no_lock; - bpf_program__set_autoload(prog, true); - - err = rbtree_map_fail__load(skel); - if (!ASSERT_OK(err, "unexpected load fail")) - goto cleanup; - - link = bpf_program__attach(skel->progs.add_node__no_lock); - if (!ASSERT_OK_PTR(link, "link")) - goto cleanup; - - syscall(SYS_getpgid); - - ASSERT_EQ(skel->bss->no_lock_add__fail, 1, "alloc_fail"); - - bpf_link__destroy(link); -cleanup: - rbtree_map_fail__destroy(skel); -} - void test_rbtree_map_prog_load_fail(void) { int i;