From patchwork Thu May 4 05:33:30 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dave Marchevsky X-Patchwork-Id: 13230772 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id B9CB4C77B7C for ; Thu, 4 May 2023 05:34:08 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229446AbjEDFeH (ORCPT ); Thu, 4 May 2023 01:34:07 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:50108 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229639AbjEDFeF (ORCPT ); Thu, 4 May 2023 01:34:05 -0400 Received: from mx0a-00082601.pphosted.com (mx0b-00082601.pphosted.com [67.231.153.30]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 6CE72212F for ; Wed, 3 May 2023 22:34:04 -0700 (PDT) Received: from pps.filterd (m0001303.ppops.net [127.0.0.1]) by m0001303.ppops.net (8.17.1.19/8.17.1.19) with ESMTP id 3440rqmg028570 for ; Wed, 3 May 2023 22:34:03 -0700 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fb.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : content-type; s=facebook; bh=HZIAX8MckQZy4H8kf69h4gro+xRaKTsFPHnD7RW0cjE=; b=MnbQ5xG57Xh1x2oHRgiDilHXdugvKE9mEXGTFWyE+Nxjq12JYWDc86elRhNYZHUtcQ6h vQHIaZui5CUCvDYUG7wTJWcAaHH5+9isIUSV1WhPWlRvjK1FfWvQgvEFa2z/9FnpBOCd JVxj6AsQH6ilgirKszLSqpfbTpIrVhf+Klg= Received: from mail.thefacebook.com ([163.114.132.120]) by m0001303.ppops.net (PPS) with ESMTPS id 3qbkxdqn0n-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT) for ; Wed, 03 May 2023 22:34:03 -0700 Received: from twshared35445.38.frc1.facebook.com (2620:10d:c085:108::4) by mail.thefacebook.com (2620:10d:c085:11d::7) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2507.23; Wed, 3 May 2023 22:34:02 -0700 Received: by devbig077.ldc1.facebook.com (Postfix, from userid 158236) id 2DFCA1D7BFC4F; Wed, 3 May 2023 22:33:45 -0700 (PDT) From: Dave Marchevsky To: CC: Alexei Starovoitov , Daniel Borkmann , Andrii Nakryiko , Martin KaFai Lau , Kernel Team , Kumar Kartikeya Dwivedi , Dave Marchevsky Subject: [PATCH v1 bpf-next 1/9] [DONOTAPPLY] Revert "bpf: Disable bpf_refcount_acquire kfunc calls until race conditions are fixed" Date: Wed, 3 May 2023 22:33:30 -0700 Message-ID: <20230504053338.1778690-2-davemarchevsky@fb.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230504053338.1778690-1-davemarchevsky@fb.com> References: <20230504053338.1778690-1-davemarchevsky@fb.com> MIME-Version: 1.0 X-FB-Internal: Safe X-Proofpoint-GUID: vJjlVfWCrNGCXhS4bFL87lnGBoD2bXbl X-Proofpoint-ORIG-GUID: vJjlVfWCrNGCXhS4bFL87lnGBoD2bXbl X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.254,Aquarius:18.0.942,Hydra:6.0.573,FMLib:17.11.170.22 definitions=2023-05-04_02,2023-05-03_01,2023-02-09_01 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net This patch should not be applied with the rest of the series as the series doesn't fix all bpf_refcount issues. It is included here to allow CI to run the test added later in the series. Followup series which fixes the remaining problems will include a revert that should be applied. 
Signed-off-by: Dave Marchevsky --- kernel/bpf/verifier.c | 5 +---- tools/testing/selftests/bpf/prog_tests/refcounted_kptr.c | 2 ++ 2 files changed, 3 insertions(+), 4 deletions(-) diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index ff4a8ab99f08..6f39534ded2e 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -10572,10 +10572,7 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_ verbose(env, "arg#%d doesn't point to a type with bpf_refcount field\n", i); return -EINVAL; } - if (rec->refcount_off >= 0) { - verbose(env, "bpf_refcount_acquire calls are disabled for now\n"); - return -EINVAL; - } + meta->arg_refcount_acquire.btf = reg->btf; meta->arg_refcount_acquire.btf_id = reg->btf_id; break; diff --git a/tools/testing/selftests/bpf/prog_tests/refcounted_kptr.c b/tools/testing/selftests/bpf/prog_tests/refcounted_kptr.c index 595cbf92bff5..2ab23832062d 100644 --- a/tools/testing/selftests/bpf/prog_tests/refcounted_kptr.c +++ b/tools/testing/selftests/bpf/prog_tests/refcounted_kptr.c @@ -9,8 +9,10 @@ void test_refcounted_kptr(void) { + RUN_TESTS(refcounted_kptr); } void test_refcounted_kptr_fail(void) { + RUN_TESTS(refcounted_kptr_fail); } From patchwork Thu May 4 05:33:31 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dave Marchevsky X-Patchwork-Id: 13230773 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id D358BC77B78 for ; Thu, 4 May 2023 05:34:10 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229610AbjEDFeJ (ORCPT ); Thu, 4 May 2023 01:34:09 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:50118 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229522AbjEDFeH (ORCPT ); Thu, 4 May 2023 01:34:07 -0400 Received: from mx0a-00082601.pphosted.com (mx0b-00082601.pphosted.com [67.231.153.30]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C54C71FC9 for ; Wed, 3 May 2023 22:34:06 -0700 (PDT) Received: from pps.filterd (m0089730.ppops.net [127.0.0.1]) by m0089730.ppops.net (8.17.1.19/8.17.1.19) with ESMTP id 3440qUlf017028 for ; Wed, 3 May 2023 22:34:06 -0700 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fb.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : content-type; s=facebook; bh=r9s6tunHtJxz3JCJdvJIt+aUgRn9CEJNZ/7pL2SWZYU=; b=e/p3Yc+m5p1LdFMKRnFoVU1XbpKO0vO/S+4cb/RgpQfYjfidTbRjVlh8hi8pNXGtGXW/ vBZeD5nDkUL/C8nPMztNLzDEZYjYie8cDkt4T1L8FTpKaa/CeR1j7nyLtuRyZ21siXpH TadXKDfD0lt4Qg70Esho64G1sZndyg6dTKQ= Received: from mail.thefacebook.com ([163.114.132.120]) by m0089730.ppops.net (PPS) with ESMTPS id 3qbkgnfnxj-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT) for ; Wed, 03 May 2023 22:34:05 -0700 Received: from twshared1349.05.ash8.facebook.com (2620:10d:c085:108::8) by mail.thefacebook.com (2620:10d:c085:11d::4) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2507.23; Wed, 3 May 2023 22:34:04 -0700 Received: by devbig077.ldc1.facebook.com (Postfix, from userid 158236) id 4AF8C1D7BFC54; Wed, 3 May 2023 22:33:46 -0700 (PDT) From: Dave Marchevsky To: CC: Alexei Starovoitov , Daniel Borkmann , 
Andrii Nakryiko , Martin KaFai Lau , Kernel Team , Kumar Kartikeya Dwivedi , Dave Marchevsky Subject: [PATCH v1 bpf-next 2/9] bpf: Set kptr_struct_meta for node param to list and rbtree insert funcs Date: Wed, 3 May 2023 22:33:31 -0700 Message-ID: <20230504053338.1778690-3-davemarchevsky@fb.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230504053338.1778690-1-davemarchevsky@fb.com> References: <20230504053338.1778690-1-davemarchevsky@fb.com> MIME-Version: 1.0 X-FB-Internal: Safe X-Proofpoint-GUID: JIuR-41nxpviloKh3NuGUTP-I-bXk02n X-Proofpoint-ORIG-GUID: JIuR-41nxpviloKh3NuGUTP-I-bXk02n X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.254,Aquarius:18.0.942,Hydra:6.0.573,FMLib:17.11.170.22 definitions=2023-05-04_02,2023-05-03_01,2023-02-09_01 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net In verifier.c, fixup_kfunc_call uses struct bpf_insn_aux_data's kptr_struct_meta field to pass information about local kptr types to various helpers and kfuncs at runtime. The recent bpf_refcount series added a few functions to the set that need this information: * bpf_refcount_acquire * Needs to know where the refcount field is in order to increment * Graph collection insert kfuncs: bpf_rbtree_add, bpf_list_push_{front,back} * Were migrated to possibly fail by the bpf_refcount series. If insert fails, the input node is bpf_obj_drop'd. bpf_obj_drop needs the kptr_struct_meta in order to decr refcount and properly free special fields. Unfortunately the verifier handling of collection insert kfuncs was not modified to actually populate kptr_struct_meta. Accordingly, when the node input to those kfuncs is passed to bpf_obj_drop, it is done so without the information necessary to decr refcount. This patch fixes the issue by populating kptr_struct_meta for those kfuncs. 
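To illustrate the path being fixed, below is a sketch of a program whose insert can fail. The program name is made up, and node_data, map_value, stashed_nodes, less(), lock and root are assumed to be defined as in the refcounted_kptr selftests later in this series:

  SEC("tc")
  long unstash_and_insert(void *ctx)
  {
          struct map_value *mapval;
          struct node_data *n;
          int idx = 0;
          long err;

          mapval = bpf_map_lookup_elem(&stashed_nodes, &idx);
          if (!mapval)
                  return -1;

          n = bpf_kptr_xchg(&mapval->node, NULL);
          if (!n)
                  return -2;

          bpf_spin_lock(&lock);
          /* Fails if n is already in a tree or list; on failure the kfunc
           * bpf_obj_drop's n internally, and that internal drop is what
           * consumes the kptr_struct_meta populated by this patch
           */
          err = bpf_rbtree_add(&root, &n->r, less);
          bpf_spin_unlock(&lock);
          return err;
  }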
Fixes: d2dcc67df910 ("bpf: Migrate bpf_rbtree_add and bpf_list_push_{front,back} to possibly fail") Signed-off-by: Dave Marchevsky --- kernel/bpf/verifier.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index 6f39534ded2e..26c072e34834 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -299,6 +299,7 @@ struct bpf_kfunc_call_arg_meta { union { struct btf_and_id arg_obj_drop; struct btf_and_id arg_refcount_acquire; + struct btf_and_id arg_graph_node; }; struct { struct btf_field *field; @@ -10158,6 +10159,8 @@ __process_kf_arg_ptr_to_graph_node(struct bpf_verifier_env *env, node_off, btf_name_by_offset(reg->btf, t->name_off)); return -EINVAL; } + meta->arg_graph_node.btf = reg->btf; + meta->arg_graph_node.btf_id = reg->btf_id; if (node_off != field->graph_root.node_offset) { verbose(env, "arg#1 offset=%d, but expected %s at offset=%d in struct %s\n", @@ -10720,6 +10723,8 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn, meta.func_id == special_kfunc_list[KF_bpf_rbtree_add_impl]) { release_ref_obj_id = regs[BPF_REG_2].ref_obj_id; insn_aux->insert_off = regs[BPF_REG_2].off; + insn_aux->kptr_struct_meta = btf_find_struct_meta(meta.arg_graph_node.btf, + meta.arg_graph_node.btf_id); err = ref_convert_owning_non_owning(env, release_ref_obj_id); if (err) { verbose(env, "kfunc %s#%d conversion of owning ref to non-owning failed\n", From patchwork Thu May 4 05:33:32 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dave Marchevsky X-Patchwork-Id: 13230770 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 927F8C77B78 for ; Thu, 4 May 2023 05:34:01 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229564AbjEDFd7 (ORCPT ); Thu, 4 May 2023 01:33:59 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:50036 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229514AbjEDFd7 (ORCPT ); Thu, 4 May 2023 01:33:59 -0400 Received: from mx0b-00082601.pphosted.com (mx0b-00082601.pphosted.com [67.231.153.30]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 9951B1BEB for ; Wed, 3 May 2023 22:33:57 -0700 (PDT) Received: from pps.filterd (m0109332.ppops.net [127.0.0.1]) by mx0a-00082601.pphosted.com (8.17.1.19/8.17.1.19) with ESMTP id 3440qRFG023172 for ; Wed, 3 May 2023 22:33:56 -0700 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fb.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : content-type; s=facebook; bh=TabE7jwAh6oZYRmGnSUkJ6+20VLHXm/JpOuJLqf7+wE=; b=K3gPGid3pgkzome5qKSQt7q7v0yCWQxVnLT34+D9IA22gBe+INobWTAuaPjqcUtlmidH Lnw+DkgGpTSdoE0FwpWAR/1tfFjCZn8qJLFtwxAa1+/Gr24dPz5XVZl8K+ACW9Qn++91 Kyctwn0krDsS6Nt0tbOtp0gq66dBxq3ttQc= Received: from maileast.thefacebook.com ([163.114.130.16]) by mx0a-00082601.pphosted.com (PPS) with ESMTPS id 3qbhx28a31-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT) for ; Wed, 03 May 2023 22:33:56 -0700 Received: from twshared4902.04.ash8.facebook.com (2620:10d:c0a8:1b::d) by mail.thefacebook.com (2620:10d:c0a8:83::5) with Microsoft SMTP Server (version=TLS1_2, 
cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2507.23; Wed, 3 May 2023 22:33:55 -0700 Received: by devbig077.ldc1.facebook.com (Postfix, from userid 158236) id 6CBA91D7BFC5B; Wed, 3 May 2023 22:33:47 -0700 (PDT) From: Dave Marchevsky To: CC: Alexei Starovoitov , Daniel Borkmann , Andrii Nakryiko , Martin KaFai Lau , Kernel Team , Kumar Kartikeya Dwivedi , Dave Marchevsky Subject: [PATCH v1 bpf-next 3/9] bpf: Fix __bpf_{list,rbtree}_add's beginning-of-node calculation Date: Wed, 3 May 2023 22:33:32 -0700 Message-ID: <20230504053338.1778690-4-davemarchevsky@fb.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230504053338.1778690-1-davemarchevsky@fb.com> References: <20230504053338.1778690-1-davemarchevsky@fb.com> MIME-Version: 1.0 X-FB-Internal: Safe X-Proofpoint-ORIG-GUID: FDJri5Z8EkUiWEqeRkred-cgVCMbiU-7 X-Proofpoint-GUID: FDJri5Z8EkUiWEqeRkred-cgVCMbiU-7 X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.254,Aquarius:18.0.942,Hydra:6.0.573,FMLib:17.11.170.22 definitions=2023-05-04_02,2023-05-03_01,2023-02-09_01 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net Given the pointer to struct bpf_{rb,list}_node within a local kptr and the byte offset of that field within the kptr struct, the calculation changed by this patch is meant to find the beginning of the kptr so that it can be passed to bpf_obj_drop. Unfortunately instead of doing ptr_to_kptr = ptr_to_node_field - offset_bytes the calculation is erroneously doing ptr_to_kptr = ptr_to_node_field - (offset_bytes * sizeof(struct bpf_rb_node)) or the bpf_list_node equivalent. This patch fixes the calculation. Fixes: d2dcc67df910 ("bpf: Migrate bpf_rbtree_add and bpf_list_push_{front,back} to possibly fail") Signed-off-by: Dave Marchevsky --- kernel/bpf/helpers.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c index bb6b4637ebf2..7a8968839e91 100644 --- a/kernel/bpf/helpers.c +++ b/kernel/bpf/helpers.c @@ -1950,7 +1950,7 @@ static int __bpf_list_add(struct bpf_list_node *node, struct bpf_list_head *head INIT_LIST_HEAD(h); if (!list_empty(n)) { /* Only called from BPF prog, no need to migrate_disable */ - __bpf_obj_drop_impl(n - off, rec); + __bpf_obj_drop_impl((void *)n - off, rec); return -EINVAL; } @@ -2032,7 +2032,7 @@ static int __bpf_rbtree_add(struct bpf_rb_root *root, struct bpf_rb_node *node, if (!RB_EMPTY_NODE(n)) { /* Only called from BPF prog, no need to migrate_disable */ - __bpf_obj_drop_impl(n - off, rec); + __bpf_obj_drop_impl((void *)n - off, rec); return -EINVAL; } From patchwork Thu May 4 05:33:33 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dave Marchevsky X-Patchwork-Id: 13230775 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 24017C77B78 for ; Thu, 4 May 2023 05:34:15 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229645AbjEDFeM (ORCPT ); Thu, 4 May 2023 01:34:12 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:50166 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229607AbjEDFeL (ORCPT ); Thu, 4 May 2023 01:34:11 -0400 Received: from mx0a-00082601.pphosted.com (mx0b-00082601.pphosted.com
[67.231.153.30]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 7DCBE2122 for ; Wed, 3 May 2023 22:34:09 -0700 (PDT) Received: from pps.filterd (m0089730.ppops.net [127.0.0.1]) by m0089730.ppops.net (8.17.1.19/8.17.1.19) with ESMTP id 3440qUPK016990 for ; Wed, 3 May 2023 22:34:08 -0700 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fb.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : content-type; s=facebook; bh=XFFAf5dfjl+Rt3jWqHNrw4NZbYLGA1/35aiH7fT0Mn0=; b=LGL6+yVo2Zu6dAIGJRiPcwUWIb0NecK/bAq5ybDKo9EPtnGyHin0aPkU/aX0z0sorLtv qhFw7uIz8akefFfr9Q6KunN2Kdj5ZERAjMuhOKyYkZMq6KB0rDgu845rFNjHsJOYCtvK 2fhgFQHEck9dJ3KoYmWpmBC7cELoAGdpXbI= Received: from mail.thefacebook.com ([163.114.132.120]) by m0089730.ppops.net (PPS) with ESMTPS id 3qbkgnfnxh-6 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT) for ; Wed, 03 May 2023 22:34:08 -0700 Received: from twshared1349.05.ash8.facebook.com (2620:10d:c085:108::4) by mail.thefacebook.com (2620:10d:c085:11d::5) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2507.23; Wed, 3 May 2023 22:34:04 -0700 Received: by devbig077.ldc1.facebook.com (Postfix, from userid 158236) id 614701D7BFC6C; Wed, 3 May 2023 22:33:49 -0700 (PDT) From: Dave Marchevsky To: CC: Alexei Starovoitov , Daniel Borkmann , Andrii Nakryiko , Martin KaFai Lau , Kernel Team , Kumar Kartikeya Dwivedi , Dave Marchevsky Subject: [PATCH v1 bpf-next 4/9] bpf: Allow KF_DESTRUCTIVE-flagged kfuncs to be called under spinlock Date: Wed, 3 May 2023 22:33:33 -0700 Message-ID: <20230504053338.1778690-5-davemarchevsky@fb.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230504053338.1778690-1-davemarchevsky@fb.com> References: <20230504053338.1778690-1-davemarchevsky@fb.com> MIME-Version: 1.0 X-FB-Internal: Safe X-Proofpoint-GUID: 7J7ELa3wHV6aY8qaOWy5s64NjyIg0TWv X-Proofpoint-ORIG-GUID: 7J7ELa3wHV6aY8qaOWy5s64NjyIg0TWv X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.254,Aquarius:18.0.942,Hydra:6.0.573,FMLib:17.11.170.22 definitions=2023-05-04_02,2023-05-03_01,2023-02-09_01 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net In order to prevent deadlock the verifier currently disallows any function calls under bpf_spin_lock save for a small set of allowlisted helpers/kfuncs. A BPF program that calls destructive kfuncs might be trying to cause deadlock, and regardless is understood to be capable of causing system breakage of similar severity. Per kfuncs.rst: The KF_DESTRUCTIVE flag is used to indicate functions calling which is destructive to the system. For example such a call can result in system rebooting or panicking. Due to this additional restrictions apply to these calls. Preventing BPF programs from crashing or otherwise blowing up the system is generally the verifier's goal, but destructive kfuncs might have such a state be their intended result. Preventing KF_DESTRUCTIVE kfunc calls under spinlock with the goal of safety is therefore unnecessarily strict. This patch modifies the "function calls are not allowed while holding a lock" check to allow calling destructive kfuncs with an active_lock. The motivating usecase for this change - unsafe locking of bpf_spin_locks for easy testing of race conditions - is implemented in the next two patches in the series. Note that the removed insn->off check was rejecting any calls to kfuncs defined in non-vmlinux BTF. 
In order to get the system in a broken or otherwise interesting state for inspection, developers might load a module implementing destructive kfuncs particular to their usecase. The unsafe_spin_{lock, unlock} kfuncs later in this series are a good example: there's no clear reason for them to be in vmlinux as they're specifically for BPF selftests, so they live in bpf_testmod. The check is removed in favor of a newly-added helper function to enable such usecases. Signed-off-by: Dave Marchevsky --- kernel/bpf/verifier.c | 22 +++++++++++++++++++++- 1 file changed, 21 insertions(+), 1 deletion(-) diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index 26c072e34834..f96e5b9c790b 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -319,6 +319,11 @@ struct bpf_kfunc_call_arg_meta { u64 mem_size; }; +static int fetch_kfunc_meta(struct bpf_verifier_env *env, + struct bpf_insn *insn, + struct bpf_kfunc_call_arg_meta *meta, + const char **kfunc_name); + struct btf *btf_vmlinux; static DEFINE_MUTEX(bpf_verifier_lock); @@ -9989,6 +9994,21 @@ static bool is_rbtree_lock_required_kfunc(u32 btf_id) return is_bpf_rbtree_api_kfunc(btf_id); } +static bool is_kfunc_callable_in_spinlock(struct bpf_verifier_env *env, + struct bpf_insn *insn) +{ + struct bpf_kfunc_call_arg_meta meta; + + /* insn->off is idx into btf fd_array - 0 for vmlinux btf, else nonzero */ + if (!insn->off && is_bpf_graph_api_kfunc(insn->imm)) + return true; + + if (fetch_kfunc_meta(env, insn, &meta, NULL)) + return false; + + return is_kfunc_destructive(&meta); +} + static bool check_kfunc_is_graph_root_api(struct bpf_verifier_env *env, enum btf_field_type head_field_type, u32 kfunc_btf_id) @@ -15875,7 +15895,7 @@ static int do_check(struct bpf_verifier_env *env) if ((insn->src_reg == BPF_REG_0 && insn->imm != BPF_FUNC_spin_unlock) || (insn->src_reg == BPF_PSEUDO_CALL) || (insn->src_reg == BPF_PSEUDO_KFUNC_CALL && - (insn->off != 0 || !is_bpf_graph_api_kfunc(insn->imm)))) { + !is_kfunc_callable_in_spinlock(env, insn))) { verbose(env, "function calls are not allowed while holding a lock\n"); return -EINVAL; } From patchwork Thu May 4 05:33:34 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dave Marchevsky X-Patchwork-Id: 13230774 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8C44AC77B7C for ; Thu, 4 May 2023 05:34:12 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229499AbjEDFeK (ORCPT ); Thu, 4 May 2023 01:34:10 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:50142 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229619AbjEDFeJ (ORCPT ); Thu, 4 May 2023 01:34:09 -0400 Received: from mx0b-00082601.pphosted.com (mx0b-00082601.pphosted.com [67.231.153.30]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id CA7BB1BFA for ; Wed, 3 May 2023 22:34:07 -0700 (PDT) Received: from pps.filterd (m0148460.ppops.net [127.0.0.1]) by mx0a-00082601.pphosted.com (8.17.1.19/8.17.1.19) with ESMTP id 3440qXot025763 for ; Wed, 3 May 2023 22:34:07 -0700 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fb.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : content-type; 
s=facebook; bh=MVo/iJTFExwB1IxoyyzSzjn1v9oyIJf2ejBZHESgeMY=; b=LuJiujg0WYeeNLEWh3FLIowPGeejTm+KaMrpAYxbgvEt9ypaj6MkjR9nmKQBCyatEGot lXE+ULDTFTeTdvakD2E1Mut8mUcMBK0RuuSi6qsUFjMEOeokzw0Le31v+mWPc8Hwh15V NWBX0R2Uyz3pZuN2e8fYAzua08QA9gP9LjI= Received: from mail.thefacebook.com ([163.114.132.120]) by mx0a-00082601.pphosted.com (PPS) with ESMTPS id 3qbjd080mp-2 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT) for ; Wed, 03 May 2023 22:34:06 -0700 Received: from twshared1349.05.ash8.facebook.com (2620:10d:c085:208::11) by mail.thefacebook.com (2620:10d:c085:11d::7) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2507.23; Wed, 3 May 2023 22:34:04 -0700 Received: by devbig077.ldc1.facebook.com (Postfix, from userid 158236) id C06DA1D7BFC78; Wed, 3 May 2023 22:33:51 -0700 (PDT) From: Dave Marchevsky To: CC: Alexei Starovoitov , Daniel Borkmann , Andrii Nakryiko , Martin KaFai Lau , Kernel Team , Kumar Kartikeya Dwivedi , Dave Marchevsky Subject: [PATCH v1 bpf-next 5/9] [RFC] selftests/bpf: Add unsafe lock/unlock and refcount_read kfuncs to bpf_testmod Date: Wed, 3 May 2023 22:33:34 -0700 Message-ID: <20230504053338.1778690-6-davemarchevsky@fb.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230504053338.1778690-1-davemarchevsky@fb.com> References: <20230504053338.1778690-1-davemarchevsky@fb.com> MIME-Version: 1.0 X-FB-Internal: Safe X-Proofpoint-GUID: R-d0hvytAthzaasEiw0wyNcEH7quiZUh X-Proofpoint-ORIG-GUID: R-d0hvytAthzaasEiw0wyNcEH7quiZUh X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.254,Aquarius:18.0.942,Hydra:6.0.573,FMLib:17.11.170.22 definitions=2023-05-04_02,2023-05-03_01,2023-02-09_01 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net [ RFC: This patch currently copies static inline helpers: __bpf_spin_lock __bpf_spin_unlock __bpf_spin_lock_irqsave __bpf_spin_unlock_irqrestore from kernel/bpf/helpers.c . The definition of these helpers is config-dependent and they're not meant to be called from a module, so not sure how to proceed here. ] This patch adds three unsafe kfuncs to bpf_testmod for use in selftests: - bpf__unsafe_spin_lock - bpf__unsafe_spin_unlock - bpf_refcount_read The first two are equivalent to bpf_spin_{lock, unlock}, except without any special treatment from the verifier, which allows them to be used in tests to guarantee a specific interleaving of program execution. This will simplify testing race conditions in BPF programs, as demonstrated in further patches in the series. The kfuncs are marked KF_DESTRUCTIVE as they can easily cause deadlock, and are only intended to be used in tests. bpf_refcount_read simply reads the refcount from the uapi-opaque bpf_refcount struct and returns it. This allows more precise testing of specific bpf_refcount scenarios, also demonstrated in further patches in the series. Although this kfunc can't break the system as catastrophically as the unsafe locking kfuncs, it's also marked KF_DESTRUCTIVE as it relies on bpf_refcount implementation details, and shouldn't be used outside of tests regardless.
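As a usage sketch (prog and lock names here are illustrative; the extern declarations mirror how the selftests later in this series consume the kfuncs), a test BPF object can take and release the unsafe lock from separate programs to gate another program's progress:

  extern void bpf__unsafe_spin_lock(void *lock__ign) __ksym;
  extern void bpf__unsafe_spin_unlock(void *lock__ign) __ksym;
  extern volatile int bpf_refcount_read(void *refcount__ign) __ksym;

  private(A) struct bpf_spin_lock gate;

  SEC("tc")
  long grab_gate(void *ctx)
  {
          /* Nothing forces a matching unlock in this program, so another
           * program spinning on 'gate' stays blocked until release_gate
           * runs - easy deadlock, hence KF_DESTRUCTIVE
           */
          bpf__unsafe_spin_lock(&gate);
          return 0;
  }

  SEC("tc")
  long release_gate(void *ctx)
  {
          bpf__unsafe_spin_unlock(&gate);
          return 0;
  }

A program holding a pointer n to a node with a bpf_refcount field can then sample the refcount at chosen points with bpf_refcount_read(&n->ref), as the race test added later in the series does.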
Signed-off-by: Dave Marchevsky --- .../selftests/bpf/bpf_testmod/bpf_testmod.c | 61 +++++++++++++++++++ 1 file changed, 61 insertions(+) diff --git a/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c b/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c index 52785ba671e6..30762558b16f 100644 --- a/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c +++ b/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c @@ -108,6 +108,64 @@ __bpf_kfunc void bpf_iter_testmod_seq_destroy(struct bpf_iter_testmod_seq *it) it->cnt = 0; } +/* BEGIN copied from kernel/bpf/helpers.c */ +static DEFINE_PER_CPU(unsigned long, irqsave_flags); + +static inline void __bpf_spin_lock(struct bpf_spin_lock *lock) +{ + arch_spinlock_t *l = (void *)lock; + union { + __u32 val; + arch_spinlock_t lock; + } u = { .lock = __ARCH_SPIN_LOCK_UNLOCKED }; + + compiletime_assert(u.val == 0, "__ARCH_SPIN_LOCK_UNLOCKED not 0"); + BUILD_BUG_ON(sizeof(*l) != sizeof(__u32)); + BUILD_BUG_ON(sizeof(*lock) != sizeof(__u32)); + arch_spin_lock(l); +} + +static inline void __bpf_spin_unlock(struct bpf_spin_lock *lock) +{ + arch_spinlock_t *l = (void *)lock; + + arch_spin_unlock(l); +} + +static inline void __bpf_spin_lock_irqsave(struct bpf_spin_lock *lock) +{ + unsigned long flags; + + local_irq_save(flags); + __bpf_spin_lock(lock); + __this_cpu_write(irqsave_flags, flags); +} + +static inline void __bpf_spin_unlock_irqrestore(struct bpf_spin_lock *lock) +{ + unsigned long flags; + + flags = __this_cpu_read(irqsave_flags); + __bpf_spin_unlock(lock); + local_irq_restore(flags); +} +/* END copied from kernel/bpf/helpers.c */ + +__bpf_kfunc void bpf__unsafe_spin_lock(void *lock__ign) +{ + __bpf_spin_lock_irqsave((struct bpf_spin_lock *)lock__ign); +} + +__bpf_kfunc void bpf__unsafe_spin_unlock(void *lock__ign) +{ + __bpf_spin_unlock_irqrestore((struct bpf_spin_lock *)lock__ign); +} + +__bpf_kfunc int bpf_refcount_read(void *refcount__ign) +{ + return refcount_read((refcount_t *)refcount__ign); +} + struct bpf_testmod_btf_type_tag_1 { int a; }; @@ -282,6 +340,9 @@ BTF_SET8_START(bpf_testmod_common_kfunc_ids) BTF_ID_FLAGS(func, bpf_iter_testmod_seq_new, KF_ITER_NEW) BTF_ID_FLAGS(func, bpf_iter_testmod_seq_next, KF_ITER_NEXT | KF_RET_NULL) BTF_ID_FLAGS(func, bpf_iter_testmod_seq_destroy, KF_ITER_DESTROY) +BTF_ID_FLAGS(func, bpf__unsafe_spin_lock, KF_DESTRUCTIVE) +BTF_ID_FLAGS(func, bpf__unsafe_spin_unlock, KF_DESTRUCTIVE) +BTF_ID_FLAGS(func, bpf_refcount_read, KF_DESTRUCTIVE) BTF_SET8_END(bpf_testmod_common_kfunc_ids) static const struct btf_kfunc_id_set bpf_testmod_common_kfunc_set = { From patchwork Thu May 4 05:33:35 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dave Marchevsky X-Patchwork-Id: 13230779 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 7E7E1C77B7C for ; Thu, 4 May 2023 05:34:32 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229620AbjEDFeb (ORCPT ); Thu, 4 May 2023 01:34:31 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:50462 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229607AbjEDFea (ORCPT ); Thu, 4 May 2023 01:34:30 -0400 Received: from mx0b-00082601.pphosted.com (mx0b-00082601.pphosted.com [67.231.153.30]) by 
lindbergh.monkeyblade.net (Postfix) with ESMTPS id 93DDF1BEB for ; Wed, 3 May 2023 22:34:28 -0700 (PDT) Received: from pps.filterd (m0109331.ppops.net [127.0.0.1]) by mx0a-00082601.pphosted.com (8.17.1.19/8.17.1.19) with ESMTP id 3440qVEo010198 for ; Wed, 3 May 2023 22:34:27 -0700 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fb.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : content-type : content-transfer-encoding : mime-version; s=facebook; bh=/uNsAYWheVXY6SXC1mviQxgXCaD8CtmcDJ0GeIh2YFE=; b=q/15U/gxNGff0sZpfPvUrUEoc4gcgmGg2Rv/0THOuKNrbDh0rRVgYUjV5cHo79iImQZw q9rBAERKSKkQ18doRpKF5wN6GaKxBtiDH41B1MOukQk/4YBBUp2eja8u/YUx+5RnOOQ8 SH/76jR/wYirqNBFv1kHakrBnaZ7mNyEyfE= Received: from maileast.thefacebook.com ([163.114.130.16]) by mx0a-00082601.pphosted.com (PPS) with ESMTPS id 3qc1ua9jc5-5 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT) for ; Wed, 03 May 2023 22:34:27 -0700 Received: from twshared17808.08.ash9.facebook.com (2620:10d:c0a8:1b::d) by mail.thefacebook.com (2620:10d:c0a8:83::7) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2507.23; Wed, 3 May 2023 22:34:07 -0700 Received: by devbig077.ldc1.facebook.com (Postfix, from userid 158236) id 3EB611D7BFC7B; Wed, 3 May 2023 22:33:53 -0700 (PDT) From: Dave Marchevsky To: CC: Alexei Starovoitov , Daniel Borkmann , Andrii Nakryiko , Martin KaFai Lau , Kernel Team , Kumar Kartikeya Dwivedi , Dave Marchevsky Subject: [PATCH v1 bpf-next 6/9] bpf: Make bpf_refcount_acquire fallible for non-owning refs Date: Wed, 3 May 2023 22:33:35 -0700 Message-ID: <20230504053338.1778690-7-davemarchevsky@fb.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230504053338.1778690-1-davemarchevsky@fb.com> References: <20230504053338.1778690-1-davemarchevsky@fb.com> X-FB-Internal: Safe X-Proofpoint-ORIG-GUID: PKsp526I9_RHyo0SUjaLWkadBS7i5eDf X-Proofpoint-GUID: PKsp526I9_RHyo0SUjaLWkadBS7i5eDf X-Proofpoint-UnRewURL: 0 URL was un-rewritten MIME-Version: 1.0 X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.254,Aquarius:18.0.942,Hydra:6.0.573,FMLib:17.11.170.22 definitions=2023-05-04_02,2023-05-03_01,2023-02-09_01 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net This patch fixes an incorrect assumption made in the original bpf_refcount series [0], specifically that the BPF program calling bpf_refcount_acquire on some node can always guarantee that the node is alive. In that series, the patch adding failure behavior to rbtree_add and list_push_{front, back} breaks this assumption for non-owning references. Consider the following program: n = bpf_kptr_xchg(&mapval, NULL); /* skip error checking */ bpf_spin_lock(&l); if(bpf_rbtree_add(&t, &n->rb, less)) { bpf_refcount_acquire(n); /* Failed to add, do something else with the node */ } bpf_spin_unlock(&l); It's incorrect to assume that bpf_refcount_acquire will always succeed in this scenario. bpf_refcount_acquire is being called in a critical section here, but the lock being held is associated with rbtree t, which isn't necessarily the lock associated with the tree that the node is already in. So after bpf_rbtree_add fails to add the node and calls bpf_obj_drop in it, the program has no ownership of the node's lifetime. Therefore the node's refcount can be decr'd to 0 at any time after the failing rbtree_add. 
If this happens before the refcount_acquire above, the node might be free'd, and regardless refcount_acquire will be incrementing a 0 refcount. Later patches in the series exercise this scenario, resulting in the expected complaint from the kernel (without this patch's changes): refcount_t: addition on 0; use-after-free. WARNING: CPU: 1 PID: 207 at lib/refcount.c:25 refcount_warn_saturate+0xbc/0x110 Modules linked in: bpf_testmod(O) CPU: 1 PID: 207 Comm: test_progs Tainted: G O 6.3.0-rc7-02231-g723de1a718a2-dirty #371 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.15.0-0-g2dd4b9b3f840-prebuilt.qemu.org 04/01/2014 RIP: 0010:refcount_warn_saturate+0xbc/0x110 Code: 6f 64 f6 02 01 e8 84 a3 5c ff 0f 0b eb 9d 80 3d 5e 64 f6 02 00 75 94 48 c7 c7 e0 13 d2 82 c6 05 4e 64 f6 02 01 e8 64 a3 5c ff <0f> 0b e9 7a ff ff ff 80 3d 38 64 f6 02 00 0f 85 6d ff ff ff 48 c7 RSP: 0018:ffff88810b9179b0 EFLAGS: 00010082 RAX: 0000000000000000 RBX: 0000000000000002 RCX: 0000000000000000 RDX: 0000000000000202 RSI: 0000000000000008 RDI: ffffffff857c3680 RBP: ffff88810027d3c0 R08: ffffffff8125f2a4 R09: ffff88810b9176e7 R10: ffffed1021722edc R11: 746e756f63666572 R12: ffff88810027d388 R13: ffff88810027d3c0 R14: ffffc900005fe030 R15: ffffc900005fe048 FS: 00007fee0584a700(0000) GS:ffff88811b280000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00005634a96f6c58 CR3: 0000000108ce9002 CR4: 0000000000770ee0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: bpf_refcount_acquire_impl+0xb5/0xc0 (rest of output snipped) The patch addresses this by changing bpf_refcount_acquire_impl to use refcount_inc_not_zero instead of refcount_inc and marking bpf_refcount_acquire KF_RET_NULL. For owning references, though, we know the above scenario is not possible and thus that bpf_refcount_acquire will always succeed. Some verifier bookkeeping is added to track "is input owning ref?" for bpf_refcount_acquire calls and return false from is_kfunc_ret_null for bpf_refcount_acquire on owning refs despite it being marked KF_RET_NULL. Existing selftests using bpf_refcount_acquire are modified where necessary to NULL-check its return value. 
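Concretely, for the non-owning-ref path in the pseudocode above, callers now have to handle a NULL return before touching the node (sketch, with the same elided error checking as above):

  n = bpf_kptr_xchg(&mapval, NULL); /* skip error checking */
  bpf_spin_lock(&l);
  if (bpf_rbtree_add(&t, &n->rb, less)) {
          /* n is a non-owning ref here, so the acquire can fail */
          m = bpf_refcount_acquire(n);
          bpf_spin_unlock(&l);
          if (!m)
                  return 1; /* node may already be free'able; don't touch it */
          /* m is an owning ref: use it, then bpf_obj_drop it or add it
           * to a collection before returning
           */
          bpf_obj_drop(m);
          return 0;
  }
  bpf_spin_unlock(&l);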
[0]: https://lore.kernel.org/bpf/20230415201811.343116-1-davemarchevsky@fb.com/ Fixes: d2dcc67df910 ("bpf: Migrate bpf_rbtree_add and bpf_list_push_{front,back} to possibly fail") Reported-by: Kumar Kartikeya Dwivedi Signed-off-by: Dave Marchevsky --- kernel/bpf/helpers.c | 8 +++- kernel/bpf/verifier.c | 38 ++++++++++++------- .../selftests/bpf/progs/refcounted_kptr.c | 2 + .../bpf/progs/refcounted_kptr_fail.c | 4 +- 4 files changed, 35 insertions(+), 17 deletions(-) diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c index 7a8968839e91..8283069349f4 100644 --- a/kernel/bpf/helpers.c +++ b/kernel/bpf/helpers.c @@ -1933,8 +1933,12 @@ __bpf_kfunc void *bpf_refcount_acquire_impl(void *p__refcounted_kptr, void *meta * bpf_refcount type so that it is emitted in vmlinux BTF */ ref = (struct bpf_refcount *)(p__refcounted_kptr + meta->record->refcount_off); + if (!refcount_inc_not_zero((refcount_t *)ref)) + return NULL; - refcount_inc((refcount_t *)ref); + /* Verifier strips KF_RET_NULL if input is owned ref, see is_kfunc_ret_null + * in verifier.c + */ return (void *)p__refcounted_kptr; } @@ -2384,7 +2388,7 @@ BTF_ID_FLAGS(func, crash_kexec, KF_DESTRUCTIVE) #endif BTF_ID_FLAGS(func, bpf_obj_new_impl, KF_ACQUIRE | KF_RET_NULL) BTF_ID_FLAGS(func, bpf_obj_drop_impl, KF_RELEASE) -BTF_ID_FLAGS(func, bpf_refcount_acquire_impl, KF_ACQUIRE) +BTF_ID_FLAGS(func, bpf_refcount_acquire_impl, KF_ACQUIRE | KF_RET_NULL) BTF_ID_FLAGS(func, bpf_list_push_front_impl) BTF_ID_FLAGS(func, bpf_list_push_back_impl) BTF_ID_FLAGS(func, bpf_list_pop_front, KF_ACQUIRE | KF_RET_NULL) diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index f96e5b9c790b..34ff68842ece 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -298,8 +298,11 @@ struct bpf_kfunc_call_arg_meta { } arg_constant; union { struct btf_and_id arg_obj_drop; - struct btf_and_id arg_refcount_acquire; struct btf_and_id arg_graph_node; + struct { + struct btf_and_id btf_and_id; + bool owning_ref; + } arg_refcount_acquire; }; struct { struct btf_field *field; @@ -9372,11 +9375,6 @@ static bool is_kfunc_acquire(struct bpf_kfunc_call_arg_meta *meta) return meta->kfunc_flags & KF_ACQUIRE; } -static bool is_kfunc_ret_null(struct bpf_kfunc_call_arg_meta *meta) -{ - return meta->kfunc_flags & KF_RET_NULL; -} - static bool is_kfunc_release(struct bpf_kfunc_call_arg_meta *meta) { return meta->kfunc_flags & KF_RELEASE; @@ -9687,6 +9685,16 @@ BTF_ID(func, bpf_dynptr_slice) BTF_ID(func, bpf_dynptr_slice_rdwr) BTF_ID(func, bpf_dynptr_clone) +static bool is_kfunc_ret_null(struct bpf_kfunc_call_arg_meta *meta) +{ + if (meta->func_id == special_kfunc_list[KF_bpf_refcount_acquire_impl] && + meta->arg_refcount_acquire.owning_ref) { + return false; + } + + return meta->kfunc_flags & KF_RET_NULL; +} + static bool is_kfunc_bpf_rcu_read_lock(struct bpf_kfunc_call_arg_meta *meta) { return meta->func_id == special_kfunc_list[KF_bpf_rcu_read_lock]; @@ -10580,10 +10588,12 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_ meta->subprogno = reg->subprogno; break; case KF_ARG_PTR_TO_REFCOUNTED_KPTR: - if (!type_is_ptr_alloc_obj(reg->type) && !type_is_non_owning_ref(reg->type)) { + if (!type_is_ptr_alloc_obj(reg->type)) { verbose(env, "arg#%d is neither owning or non-owning ref\n", i); return -EINVAL; } + if (!type_is_non_owning_ref(reg->type)) + meta->arg_refcount_acquire.owning_ref = true; rec = reg_btf_record(reg); if (!rec) { @@ -10596,8 +10606,8 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_ 
return -EINVAL; } - meta->arg_refcount_acquire.btf = reg->btf; - meta->arg_refcount_acquire.btf_id = reg->btf_id; + meta->arg_refcount_acquire.btf_and_id.btf = reg->btf; + meta->arg_refcount_acquire.btf_and_id.btf_id = reg->btf_id; break; } } @@ -10829,14 +10839,14 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn, insn_aux->kptr_struct_meta = btf_find_struct_meta(ret_btf, ret_btf_id); } else if (meta.func_id == special_kfunc_list[KF_bpf_refcount_acquire_impl]) { + struct btf_and_id *b = &meta.arg_refcount_acquire.btf_and_id; + mark_reg_known_zero(env, regs, BPF_REG_0); regs[BPF_REG_0].type = PTR_TO_BTF_ID | MEM_ALLOC; - regs[BPF_REG_0].btf = meta.arg_refcount_acquire.btf; - regs[BPF_REG_0].btf_id = meta.arg_refcount_acquire.btf_id; + regs[BPF_REG_0].btf = b->btf; + regs[BPF_REG_0].btf_id = b->btf_id; - insn_aux->kptr_struct_meta = - btf_find_struct_meta(meta.arg_refcount_acquire.btf, - meta.arg_refcount_acquire.btf_id); + insn_aux->kptr_struct_meta = btf_find_struct_meta(b->btf, b->btf_id); } else if (meta.func_id == special_kfunc_list[KF_bpf_list_pop_front] || meta.func_id == special_kfunc_list[KF_bpf_list_pop_back]) { struct btf_field *field = meta.arg_list_head.field; diff --git a/tools/testing/selftests/bpf/progs/refcounted_kptr.c b/tools/testing/selftests/bpf/progs/refcounted_kptr.c index 1d348a225140..a3da610b1e6b 100644 --- a/tools/testing/selftests/bpf/progs/refcounted_kptr.c +++ b/tools/testing/selftests/bpf/progs/refcounted_kptr.c @@ -375,6 +375,8 @@ long rbtree_refcounted_node_ref_escapes(void *ctx) bpf_rbtree_add(&aroot, &n->node, less_a); m = bpf_refcount_acquire(n); bpf_spin_unlock(&alock); + if (!m) + return 2; m->key = 2; bpf_obj_drop(m); diff --git a/tools/testing/selftests/bpf/progs/refcounted_kptr_fail.c b/tools/testing/selftests/bpf/progs/refcounted_kptr_fail.c index efcb308f80ad..0b09e5c915b1 100644 --- a/tools/testing/selftests/bpf/progs/refcounted_kptr_fail.c +++ b/tools/testing/selftests/bpf/progs/refcounted_kptr_fail.c @@ -29,7 +29,7 @@ static bool less(struct bpf_rb_node *a, const struct bpf_rb_node *b) } SEC("?tc") -__failure __msg("Unreleased reference id=3 alloc_insn=21") +__failure __msg("Unreleased reference id=4 alloc_insn=21") long rbtree_refcounted_node_ref_escapes(void *ctx) { struct node_acquire *n, *m; @@ -43,6 +43,8 @@ long rbtree_refcounted_node_ref_escapes(void *ctx) /* m becomes an owning ref but is never drop'd or added to a tree */ m = bpf_refcount_acquire(n); bpf_spin_unlock(&glock); + if (!m) + return 2; m->key = 2; return 0; From patchwork Thu May 4 05:33:36 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dave Marchevsky X-Patchwork-Id: 13230776 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 5B94EC77B7C for ; Thu, 4 May 2023 05:34:16 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229639AbjEDFeO (ORCPT ); Thu, 4 May 2023 01:34:14 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:50192 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229619AbjEDFeM (ORCPT ); Thu, 4 May 2023 01:34:12 -0400 Received: from mx0b-00082601.pphosted.com (mx0b-00082601.pphosted.com [67.231.153.30]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 
680A12120 for ; Wed, 3 May 2023 22:34:09 -0700 (PDT) Received: from pps.filterd (m0148460.ppops.net [127.0.0.1]) by mx0a-00082601.pphosted.com (8.17.1.19/8.17.1.19) with ESMTP id 3440qV82025505 for ; Wed, 3 May 2023 22:34:08 -0700 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fb.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : content-type : content-transfer-encoding : mime-version; s=facebook; bh=sMkWbAkq5YkcH2ZwIaxp411OtvXYjEWVas3ZsCgZ9uk=; b=BLokR3fwz6cXDdRKV8aDcpruyp5eHbKXo1Idp3LXwac1mMk/WT4Z0DeOi7LWXt9YPuJY Tg5hWlZsilHgQIJcvewScwn+SKeL8A/cKCZQ4Ndxk89T7HrQWomCGmuJI3VsMqvrw7A2 Royvx5UaBTi8MCBsvaRQFTMWW4go9TpZbv0= Received: from maileast.thefacebook.com ([163.114.130.16]) by mx0a-00082601.pphosted.com (PPS) with ESMTPS id 3qbjd080mx-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT) for ; Wed, 03 May 2023 22:34:08 -0700 Received: from twshared4902.04.ash8.facebook.com (2620:10d:c0a8:1b::d) by mail.thefacebook.com (2620:10d:c0a8:83::6) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2507.23; Wed, 3 May 2023 22:34:07 -0700 Received: by devbig077.ldc1.facebook.com (Postfix, from userid 158236) id 66F111D7BFC82; Wed, 3 May 2023 22:33:55 -0700 (PDT) From: Dave Marchevsky To: CC: Alexei Starovoitov , Daniel Borkmann , Andrii Nakryiko , Martin KaFai Lau , Kernel Team , Kumar Kartikeya Dwivedi , Dave Marchevsky Subject: [PATCH v1 bpf-next 7/9] selftests/bpf: Add test exercising bpf_refcount_acquire race condition Date: Wed, 3 May 2023 22:33:36 -0700 Message-ID: <20230504053338.1778690-8-davemarchevsky@fb.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230504053338.1778690-1-davemarchevsky@fb.com> References: <20230504053338.1778690-1-davemarchevsky@fb.com> X-FB-Internal: Safe X-Proofpoint-GUID: OXX5ZJlHc9pckDGTSJb9TSU_sPsAyp6X X-Proofpoint-ORIG-GUID: OXX5ZJlHc9pckDGTSJb9TSU_sPsAyp6X X-Proofpoint-UnRewURL: 0 URL was un-rewritten MIME-Version: 1.0 X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.254,Aquarius:18.0.942,Hydra:6.0.573,FMLib:17.11.170.22 definitions=2023-05-04_02,2023-05-03_01,2023-02-09_01 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net The selftest added in this patch is the exact scenario described by Kumar in [0] and fixed by earlier patches in this series. The long comment added in progs/refcounted_kptr.c restates the use-after-free scenario. The added test uses bpf__unsafe_spin_{lock, unlock} to force the specific problematic interleaving we're interested in testing, and bpf_refcount_read to confirm refcount incr/decr work as expected. [0]: https://lore.kernel.org/bpf/atfviesiidev4hu53hzravmtlau3wdodm2vqs7rd7tnwft34e3@xktodqeqevir/ Signed-off-by: Dave Marchevsky --- .../bpf/prog_tests/refcounted_kptr.c | 104 +++++++++++- .../selftests/bpf/progs/refcounted_kptr.c | 158 ++++++++++++++++++ 2 files changed, 261 insertions(+), 1 deletion(-) diff --git a/tools/testing/selftests/bpf/prog_tests/refcounted_kptr.c b/tools/testing/selftests/bpf/prog_tests/refcounted_kptr.c index 2ab23832062d..e7fcc1dd8864 100644 --- a/tools/testing/selftests/bpf/prog_tests/refcounted_kptr.c +++ b/tools/testing/selftests/bpf/prog_tests/refcounted_kptr.c @@ -1,8 +1,10 @@ // SPDX-License-Identifier: GPL-2.0 /* Copyright (c) 2023 Meta Platforms, Inc. and affiliates. 
*/ - +#define _GNU_SOURCE #include #include +#include +#include #include "refcounted_kptr.skel.h" #include "refcounted_kptr_fail.skel.h" @@ -16,3 +18,103 @@ void test_refcounted_kptr_fail(void) { RUN_TESTS(refcounted_kptr_fail); } + +static void force_cpu(pthread_t thread, int cpunum) +{ + cpu_set_t cpuset; + int err; + + CPU_ZERO(&cpuset); + CPU_SET(cpunum, &cpuset); + err = pthread_setaffinity_np(thread, sizeof(cpuset), &cpuset); + if (!ASSERT_OK(err, "pthread_setaffinity_np")) + return; +} + +struct refcounted_kptr *skel; + +static void *run_unstash_acq_ref(void *unused) +{ + LIBBPF_OPTS(bpf_test_run_opts, opts, + .data_in = &pkt_v4, + .data_size_in = sizeof(pkt_v4), + .repeat = 1, + ); + long ret, unstash_acq_ref_fd; + force_cpu(pthread_self(), 1); + + unstash_acq_ref_fd = bpf_program__fd(skel->progs.unstash_add_and_acquire_refcount); + + ret = bpf_prog_test_run_opts(unstash_acq_ref_fd, &opts); + ASSERT_EQ(opts.retval, 0, "unstash_add_and_acquire_refcount retval"); + ASSERT_EQ(skel->bss->ref_check_3, 2, "ref_check_3"); + ASSERT_EQ(skel->bss->ref_check_4, 1, "ref_check_4"); + ASSERT_EQ(skel->bss->ref_check_5, 0, "ref_check_5"); + pthread_exit((void *)ret); +} + +void test_refcounted_kptr_races(void) +{ + LIBBPF_OPTS(bpf_test_run_opts, opts, + .data_in = &pkt_v4, + .data_size_in = sizeof(pkt_v4), + .repeat = 1, + ); + int ref_acq_lock_fd, ref_acq_unlock_fd, rem_node_lock_fd; + int add_stash_fd, remove_tree_fd; + pthread_t thread_id; + int ret; + + force_cpu(pthread_self(), 0); + skel = refcounted_kptr__open_and_load(); + if (!ASSERT_OK_PTR(skel, "refcounted_kptr__open_and_load")) + return; + + add_stash_fd = bpf_program__fd(skel->progs.add_refcounted_node_to_tree_and_stash); + remove_tree_fd = bpf_program__fd(skel->progs.remove_refcounted_node_from_tree); + ref_acq_lock_fd = bpf_program__fd(skel->progs.unsafe_ref_acq_lock); + ref_acq_unlock_fd = bpf_program__fd(skel->progs.unsafe_ref_acq_unlock); + rem_node_lock_fd = bpf_program__fd(skel->progs.unsafe_rem_node_lock); + + ret = bpf_prog_test_run_opts(rem_node_lock_fd, &opts); + if (!ASSERT_OK(ret, "rem_node_lock")) + return; + + ret = bpf_prog_test_run_opts(ref_acq_lock_fd, &opts); + if (!ASSERT_OK(ret, "ref_acq_lock")) + return; + + ret = bpf_prog_test_run_opts(add_stash_fd, &opts); + if (!ASSERT_OK(ret, "add_stash")) + return; + if (!ASSERT_OK(opts.retval, "add_stash retval")) + return; + + ret = pthread_create(&thread_id, NULL, &run_unstash_acq_ref, NULL); + if (!ASSERT_OK(ret, "pthread_create")) + goto cleanup; + + force_cpu(thread_id, 1); + + /* This program will execute before unstash_acq_ref's refcount_acquire, then + * unstash_acq_ref can proceed after unsafe_unlock + */ + ret = bpf_prog_test_run_opts(remove_tree_fd, &opts); + if (!ASSERT_OK(ret, "remove_tree")) + goto cleanup; + + ret = bpf_prog_test_run_opts(ref_acq_unlock_fd, &opts); + if (!ASSERT_OK(ret, "ref_acq_unlock")) + goto cleanup; + + ret = pthread_join(thread_id, NULL); + if (!ASSERT_OK(ret, "pthread_join")) + goto cleanup; + + refcounted_kptr__destroy(skel); + return; +cleanup: + bpf_prog_test_run_opts(ref_acq_unlock_fd, &opts); + refcounted_kptr__destroy(skel); + return; +} diff --git a/tools/testing/selftests/bpf/progs/refcounted_kptr.c b/tools/testing/selftests/bpf/progs/refcounted_kptr.c index a3da610b1e6b..2951f45291c1 100644 --- a/tools/testing/selftests/bpf/progs/refcounted_kptr.c +++ b/tools/testing/selftests/bpf/progs/refcounted_kptr.c @@ -39,9 +39,20 @@ private(A) struct bpf_spin_lock lock; private(A) struct bpf_rb_root root __contains(node_data, r); 
private(A) struct bpf_list_head head __contains(node_data, l); +private(C) struct bpf_spin_lock lock2; +private(C) struct bpf_rb_root root2 __contains(node_data, r); + private(B) struct bpf_spin_lock alock; private(B) struct bpf_rb_root aroot __contains(node_acquire, node); +private(D) struct bpf_spin_lock ref_acq_lock; +private(E) struct bpf_spin_lock rem_node_lock; + +/* Provided by bpf_testmod */ +extern void bpf__unsafe_spin_lock(void *lock__ign) __ksym; +extern void bpf__unsafe_spin_unlock(void *lock__ign) __ksym; +extern volatile int bpf_refcount_read(void *refcount__ign) __ksym; + static bool less(struct bpf_rb_node *node_a, const struct bpf_rb_node *node_b) { struct node_data *a; @@ -405,4 +416,151 @@ long rbtree_refcounted_node_ref_escapes_owning_input(void *ctx) return 0; } +SEC("tc") +long unsafe_ref_acq_lock(void *ctx) +{ + bpf__unsafe_spin_lock(&ref_acq_lock); + return 0; +} + +SEC("tc") +long unsafe_ref_acq_unlock(void *ctx) +{ + bpf__unsafe_spin_unlock(&ref_acq_lock); + return 0; +} + +SEC("tc") +long unsafe_rem_node_lock(void *ctx) +{ + bpf__unsafe_spin_lock(&rem_node_lock); + return 0; +} + +/* The following 3 progs are used in concert to test a bpf_refcount-related + * race. Consider the following pseudocode interleaving of rbtree operations: + * + * (Assumptions: n, m, o, p, q are pointers to nodes, t1 and t2 are different + * rbtrees, l1 and l2 are locks accompanying the trees, mapval is some + * kptr_xchg'able ptr_to_map_value. A single node is being manipulated by both + * programs. Irrelevant error-checking and casting is omitted.) + * + * CPU O CPU 1 + * ----------------------------------|--------------------------- + * n = bpf_obj_new [0] | + * lock(l1) | + * bpf_rbtree_add(t1, &n->r, less) | + * m = bpf_refcount_acquire(n) [1] | + * unlock(l1) | + * kptr_xchg(mapval, m) [2] | + * -------------------------------------------------------------- + * | o = kptr_xchg(mapval, NULL) [3] + * | lock(l2) + * | rbtree_add(t2, &o->r, less) [4] + * -------------------------------------------------------------- + * lock(l1) | + * p = rbtree_first(t1) | + * p = rbtree_remove(t1, p) | + * unlock(l1) | + * if (p) | + * bpf_obj_drop(p) [5] | + * -------------------------------------------------------------- + * | q = bpf_refcount_acquire(o) [6] + * | unlock(l2) + * + * If bpf_refcount_acquire can't fail, the sequence of operations on the node's + * refcount is: + * [0] - refcount initialized to 1 + * [1] - refcount bumped to 2 + * [2] - refcount is still 2, but m's ownership passed to mapval + * [3] - refcount is still 2, mapval's ownership passed to o + * [4] - refcount is decr'd to 1, rbtree_add fails, node is already in t1 + * o is converted to non-owning reference + * [5] - refcount is decr'd to 0, node free'd + * [6] - refcount is incr'd to 1 from 0, ERROR + * + * To prevent [6] bpf_refcount_acquire was made failable. This interleaving is + * used to test failable refcount_acquire. + * + * The two halves of CPU 0's operations are implemented by + * add_refcounted_node_to_tree_and_stash and remove_refcounted_node_from_tree. + * We can't do the same for CPU 1's operations due to l2 critical section. + * Instead, bpf__unsafe_spin_{lock, unlock} are used to ensure the expected + * order of operations. 
+ */ + +SEC("tc") +long add_refcounted_node_to_tree_and_stash(void *ctx) +{ + long err; + + err = __stash_map_insert_tree(0, 42, &root, &lock); + if (err) + return err; + + return 0; +} + +SEC("tc") +long remove_refcounted_node_from_tree(void *ctx) +{ + long ret = 0; + + /* rem_node_lock is held by another program to force race */ + bpf__unsafe_spin_lock(&rem_node_lock); + ret = __read_from_tree(&root, &lock, true); + if (ret != 42) + return ret; + + bpf__unsafe_spin_unlock(&rem_node_lock); + return 0; +} + +/* ref_check_n numbers correspond to refcount operation points in comment above */ +int ref_check_3, ref_check_4, ref_check_5; + +SEC("tc") +long unstash_add_and_acquire_refcount(void *ctx) +{ + struct map_value *mapval; + struct node_data *n, *m; + int idx = 0; + + mapval = bpf_map_lookup_elem(&stashed_nodes, &idx); + if (!mapval) + return -1; + + n = bpf_kptr_xchg(&mapval->node, NULL); + if (!n) + return -2; + ref_check_3 = bpf_refcount_read(&n->ref); + + bpf_spin_lock(&lock2); + bpf_rbtree_add(&root2, &n->r, less); + ref_check_4 = bpf_refcount_read(&n->ref); + + /* Let CPU 0 do first->remove->drop */ + bpf__unsafe_spin_unlock(&rem_node_lock); + + /* ref_acq_lock is held by another program to force race + * when this program holds the lock, remove_refcounted_node_from_tree + * has finished + */ + bpf__unsafe_spin_lock(&ref_acq_lock); + ref_check_5 = bpf_refcount_read(&n->ref); + + /* Error-causing use-after-free incr ([6] in long comment above) */ + m = bpf_refcount_acquire(n); + bpf__unsafe_spin_unlock(&ref_acq_lock); + + bpf_spin_unlock(&lock2); + + if (m) { + bpf_obj_drop(m); + return -3; + } + + return !!m; +} + char _license[] SEC("license") = "GPL"; From patchwork Thu May 4 05:33:37 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dave Marchevsky X-Patchwork-Id: 13230778 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id E29F9C77B78 for ; Thu, 4 May 2023 05:34:29 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229522AbjEDFe2 (ORCPT ); Thu, 4 May 2023 01:34:28 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:50440 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229607AbjEDFe1 (ORCPT ); Thu, 4 May 2023 01:34:27 -0400 Received: from mx0b-00082601.pphosted.com (mx0b-00082601.pphosted.com [67.231.153.30]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 7CCB31BEB for ; Wed, 3 May 2023 22:34:26 -0700 (PDT) Received: from pps.filterd (m0109331.ppops.net [127.0.0.1]) by mx0a-00082601.pphosted.com (8.17.1.19/8.17.1.19) with ESMTP id 3440qY3w010485 for ; Wed, 3 May 2023 22:34:25 -0700 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fb.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : content-type; s=facebook; bh=XWrSlb+AG1qaHY7bnOaCSLx8bqGpMVGegcEPyDdpsRg=; b=RbQwNueuSMDDapeKtppTAjhL3v/gv3r99BTEud1f3mOCk1M2DUEKaJNfAx76vEJhn2uH edWcRppmOmdLoTSndoZNvMnEEx3Kkn/pmIhYedT3XYLDQU7ZJCIcECfWl7pRsMfz9cI4 b7TVilDoIkfWwc7lhjiJIUu4ghQeV8vLLF0= Received: from mail.thefacebook.com ([163.114.132.120]) by mx0a-00082601.pphosted.com (PPS) with ESMTPS id 3qc1ua9jdy-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT) 
for ; Wed, 03 May 2023 22:34:25 -0700 Received: from twshared35445.38.frc1.facebook.com (2620:10d:c085:208::11) by mail.thefacebook.com (2620:10d:c085:21d::5) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2507.23; Wed, 3 May 2023 22:34:13 -0700 Received: by devbig077.ldc1.facebook.com (Postfix, from userid 158236) id 28BAA1D7BFC88; Wed, 3 May 2023 22:33:55 -0700 (PDT) From: Dave Marchevsky To: CC: Alexei Starovoitov , Daniel Borkmann , Andrii Nakryiko , Martin KaFai Lau , Kernel Team , Kumar Kartikeya Dwivedi , Dave Marchevsky Subject: [PATCH v1 bpf-next 8/9] selftests/bpf: Disable newly-added refcounted_kptr_races test Date: Wed, 3 May 2023 22:33:37 -0700 Message-ID: <20230504053338.1778690-9-davemarchevsky@fb.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230504053338.1778690-1-davemarchevsky@fb.com> References: <20230504053338.1778690-1-davemarchevsky@fb.com> MIME-Version: 1.0 X-FB-Internal: Safe X-Proofpoint-ORIG-GUID: UzeuhV0Kkat3ln9eTV5WGUdavadTRV8v X-Proofpoint-GUID: UzeuhV0Kkat3ln9eTV5WGUdavadTRV8v X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.254,Aquarius:18.0.942,Hydra:6.0.573,FMLib:17.11.170.22 definitions=2023-05-04_02,2023-05-03_01,2023-02-09_01 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net The previous patch added a new test exercising a race condition which was fixed earlier in the series. Similarly to other tests in this file, the new test should not run while bpf_refcount_acquire is disabled as it requires that kfunc. Signed-off-by: Dave Marchevsky --- .../bpf/prog_tests/refcounted_kptr.c | 100 ------------------ 1 file changed, 100 deletions(-) diff --git a/tools/testing/selftests/bpf/prog_tests/refcounted_kptr.c b/tools/testing/selftests/bpf/prog_tests/refcounted_kptr.c index e7fcc1dd8864..6a53f304f3e4 100644 --- a/tools/testing/selftests/bpf/prog_tests/refcounted_kptr.c +++ b/tools/testing/selftests/bpf/prog_tests/refcounted_kptr.c @@ -18,103 +18,3 @@ void test_refcounted_kptr_fail(void) { RUN_TESTS(refcounted_kptr_fail); } - -static void force_cpu(pthread_t thread, int cpunum) -{ - cpu_set_t cpuset; - int err; - - CPU_ZERO(&cpuset); - CPU_SET(cpunum, &cpuset); - err = pthread_setaffinity_np(thread, sizeof(cpuset), &cpuset); - if (!ASSERT_OK(err, "pthread_setaffinity_np")) - return; -} - -struct refcounted_kptr *skel; - -static void *run_unstash_acq_ref(void *unused) -{ - LIBBPF_OPTS(bpf_test_run_opts, opts, - .data_in = &pkt_v4, - .data_size_in = sizeof(pkt_v4), - .repeat = 1, - ); - long ret, unstash_acq_ref_fd; - force_cpu(pthread_self(), 1); - - unstash_acq_ref_fd = bpf_program__fd(skel->progs.unstash_add_and_acquire_refcount); - - ret = bpf_prog_test_run_opts(unstash_acq_ref_fd, &opts); - ASSERT_EQ(opts.retval, 0, "unstash_add_and_acquire_refcount retval"); - ASSERT_EQ(skel->bss->ref_check_3, 2, "ref_check_3"); - ASSERT_EQ(skel->bss->ref_check_4, 1, "ref_check_4"); - ASSERT_EQ(skel->bss->ref_check_5, 0, "ref_check_5"); - pthread_exit((void *)ret); -} - -void test_refcounted_kptr_races(void) -{ - LIBBPF_OPTS(bpf_test_run_opts, opts, - .data_in = &pkt_v4, - .data_size_in = sizeof(pkt_v4), - .repeat = 1, - ); - int ref_acq_lock_fd, ref_acq_unlock_fd, rem_node_lock_fd; - int add_stash_fd, remove_tree_fd; - pthread_t thread_id; - int ret; - - force_cpu(pthread_self(), 0); - skel = refcounted_kptr__open_and_load(); - if (!ASSERT_OK_PTR(skel, "refcounted_kptr__open_and_load")) - return; - - add_stash_fd = 
-	add_stash_fd = bpf_program__fd(skel->progs.add_refcounted_node_to_tree_and_stash);
-	remove_tree_fd = bpf_program__fd(skel->progs.remove_refcounted_node_from_tree);
-	ref_acq_lock_fd = bpf_program__fd(skel->progs.unsafe_ref_acq_lock);
-	ref_acq_unlock_fd = bpf_program__fd(skel->progs.unsafe_ref_acq_unlock);
-	rem_node_lock_fd = bpf_program__fd(skel->progs.unsafe_rem_node_lock);
-
-	ret = bpf_prog_test_run_opts(rem_node_lock_fd, &opts);
-	if (!ASSERT_OK(ret, "rem_node_lock"))
-		return;
-
-	ret = bpf_prog_test_run_opts(ref_acq_lock_fd, &opts);
-	if (!ASSERT_OK(ret, "ref_acq_lock"))
-		return;
-
-	ret = bpf_prog_test_run_opts(add_stash_fd, &opts);
-	if (!ASSERT_OK(ret, "add_stash"))
-		return;
-	if (!ASSERT_OK(opts.retval, "add_stash retval"))
-		return;
-
-	ret = pthread_create(&thread_id, NULL, &run_unstash_acq_ref, NULL);
-	if (!ASSERT_OK(ret, "pthread_create"))
-		goto cleanup;
-
-	force_cpu(thread_id, 1);
-
-	/* This program will execute before unstash_acq_ref's refcount_acquire, then
-	 * unstash_acq_ref can proceed after unsafe_unlock
-	 */
-	ret = bpf_prog_test_run_opts(remove_tree_fd, &opts);
-	if (!ASSERT_OK(ret, "remove_tree"))
-		goto cleanup;
-
-	ret = bpf_prog_test_run_opts(ref_acq_unlock_fd, &opts);
-	if (!ASSERT_OK(ret, "ref_acq_unlock"))
-		goto cleanup;
-
-	ret = pthread_join(thread_id, NULL);
-	if (!ASSERT_OK(ret, "pthread_join"))
-		goto cleanup;
-
-	refcounted_kptr__destroy(skel);
-	return;
-cleanup:
-	bpf_prog_test_run_opts(ref_acq_unlock_fd, &opts);
-	refcounted_kptr__destroy(skel);
-	return;
-}
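
The runner deleted above (and restored by the next patch) sequences the race entirely through tiny lock-holder programs that it looks up by name: unsafe_rem_node_lock, unsafe_ref_acq_lock and unsafe_ref_acq_unlock. Their bodies are part of the BPF object added earlier in the series; a plausible minimal shape, shown purely as a hedged sketch (the names match the runner, the bodies are assumed):

/* Hedged sketch, not taken from this series' diffs: each lock-holder program
 * simply takes or releases one global test lock so userspace controls exactly
 * when the racing programs may proceed.
 */
SEC("tc")
long unsafe_rem_node_lock(void *ctx)
{
	/* delay remove_refcounted_node_from_tree until the other side unlocks */
	bpf__unsafe_spin_lock(&rem_node_lock);
	return 0;
}

SEC("tc")
long unsafe_ref_acq_lock(void *ctx)
{
	/* hold off the bpf_refcount_acquire() step in unstash_add_and_acquire_refcount */
	bpf__unsafe_spin_lock(&ref_acq_lock);
	return 0;
}

SEC("tc")
long unsafe_ref_acq_unlock(void *ctx)
{
	/* run by userspace once remove_refcounted_node_from_tree has finished */
	bpf__unsafe_spin_unlock(&ref_acq_lock);
	return 0;
}
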
From patchwork Thu May 4 05:33:38 2023
From: Dave Marchevsky
CC: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau,
    Kernel Team, Kumar Kartikeya Dwivedi, Dave Marchevsky
Subject: [PATCH v1 bpf-next 9/9] [DONOTAPPLY] Revert "selftests/bpf: Disable newly-added refcounted_kptr_races test"
Date: Wed, 3 May 2023 22:33:38 -0700
Message-ID: <20230504053338.1778690-10-davemarchevsky@fb.com>
In-Reply-To: <20230504053338.1778690-1-davemarchevsky@fb.com>
References: <20230504053338.1778690-1-davemarchevsky@fb.com>
X-Mailing-List: bpf@vger.kernel.org

This patch reverts the previous patch's disabling of the refcounted_kptr_races
selftest. It is included in the series so that BPF CI can run the test. This
patch should not be applied; followups fixing the remaining bpf_refcount
issues will re-enable the test.

Signed-off-by: Dave Marchevsky
---
 .../bpf/prog_tests/refcounted_kptr.c          | 100 ++++++++++++++++++
 1 file changed, 100 insertions(+)

diff --git a/tools/testing/selftests/bpf/prog_tests/refcounted_kptr.c b/tools/testing/selftests/bpf/prog_tests/refcounted_kptr.c
index 6a53f304f3e4..e7fcc1dd8864 100644
--- a/tools/testing/selftests/bpf/prog_tests/refcounted_kptr.c
+++ b/tools/testing/selftests/bpf/prog_tests/refcounted_kptr.c
@@ -18,3 +18,103 @@ void test_refcounted_kptr_fail(void)
 {
 	RUN_TESTS(refcounted_kptr_fail);
 }
+
+static void force_cpu(pthread_t thread, int cpunum)
+{
+	cpu_set_t cpuset;
+	int err;
+
+	CPU_ZERO(&cpuset);
+	CPU_SET(cpunum, &cpuset);
+	err = pthread_setaffinity_np(thread, sizeof(cpuset), &cpuset);
+	if (!ASSERT_OK(err, "pthread_setaffinity_np"))
+		return;
+}
+
+struct refcounted_kptr *skel;
+
+static void *run_unstash_acq_ref(void *unused)
+{
+	LIBBPF_OPTS(bpf_test_run_opts, opts,
+		    .data_in = &pkt_v4,
+		    .data_size_in = sizeof(pkt_v4),
+		    .repeat = 1,
+	);
+	long ret, unstash_acq_ref_fd;
+	force_cpu(pthread_self(), 1);
+
+	unstash_acq_ref_fd = bpf_program__fd(skel->progs.unstash_add_and_acquire_refcount);
+
+	ret = bpf_prog_test_run_opts(unstash_acq_ref_fd, &opts);
+	ASSERT_EQ(opts.retval, 0, "unstash_add_and_acquire_refcount retval");
+	ASSERT_EQ(skel->bss->ref_check_3, 2, "ref_check_3");
+	ASSERT_EQ(skel->bss->ref_check_4, 1, "ref_check_4");
+	ASSERT_EQ(skel->bss->ref_check_5, 0, "ref_check_5");
+	pthread_exit((void *)ret);
+}
+
+void test_refcounted_kptr_races(void)
+{
+	LIBBPF_OPTS(bpf_test_run_opts, opts,
+		    .data_in = &pkt_v4,
+		    .data_size_in = sizeof(pkt_v4),
+		    .repeat = 1,
+	);
+	int ref_acq_lock_fd, ref_acq_unlock_fd, rem_node_lock_fd;
+	int add_stash_fd, remove_tree_fd;
+	pthread_t thread_id;
+	int ret;
+
+	force_cpu(pthread_self(), 0);
+	skel = refcounted_kptr__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "refcounted_kptr__open_and_load"))
+		return;
+
+	add_stash_fd = bpf_program__fd(skel->progs.add_refcounted_node_to_tree_and_stash);
+	remove_tree_fd = bpf_program__fd(skel->progs.remove_refcounted_node_from_tree);
+	ref_acq_lock_fd = bpf_program__fd(skel->progs.unsafe_ref_acq_lock);
+	ref_acq_unlock_fd = bpf_program__fd(skel->progs.unsafe_ref_acq_unlock);
+	rem_node_lock_fd = bpf_program__fd(skel->progs.unsafe_rem_node_lock);
+
+	ret = bpf_prog_test_run_opts(rem_node_lock_fd, &opts);
+	if (!ASSERT_OK(ret, "rem_node_lock"))
+		return;
+
+	ret = bpf_prog_test_run_opts(ref_acq_lock_fd, &opts);
+	if (!ASSERT_OK(ret, "ref_acq_lock"))
+		return;
+
+	ret = bpf_prog_test_run_opts(add_stash_fd, &opts);
+	if (!ASSERT_OK(ret, "add_stash"))
+		return;
+	if (!ASSERT_OK(opts.retval, "add_stash retval"))
+		return;
+
+	ret = pthread_create(&thread_id, NULL, &run_unstash_acq_ref, NULL);
+	if (!ASSERT_OK(ret, "pthread_create"))
+		goto cleanup;
+
+	force_cpu(thread_id, 1);
+
+	/* This program will execute before unstash_acq_ref's refcount_acquire, then
+	 * unstash_acq_ref can proceed after unsafe_unlock
+	 */
+	ret = bpf_prog_test_run_opts(remove_tree_fd, &opts);
+	if (!ASSERT_OK(ret, "remove_tree"))
+		goto cleanup;
+
+	ret = bpf_prog_test_run_opts(ref_acq_unlock_fd, &opts);
+	if (!ASSERT_OK(ret, "ref_acq_unlock"))
+		goto cleanup;
+
+	ret = pthread_join(thread_id, NULL);
+	if (!ASSERT_OK(ret, "pthread_join"))
+		goto cleanup;
+
+	refcounted_kptr__destroy(skel);
+	return;
+cleanup:
+	bpf_prog_test_run_opts(ref_acq_unlock_fd, &opts);
+	refcounted_kptr__destroy(skel);
+	return;
+}
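
A practical note on the restored runner: it pins the main thread to CPU 0 and the helper thread to CPU 1, so the interleaving can only be exercised on a machine with at least two online CPUs. A guard along these lines (illustrative only, not part of the series; test__skip() is the usual skip hook in test_progs) would make that assumption explicit:

/* Illustrative sketch, not from the patch: skip the race test when a second
 * CPU is unavailable instead of failing in pthread_setaffinity_np().
 */
#include <stdbool.h>
#include <unistd.h>

static bool can_pin_two_cpus(void)
{
	/* number of CPUs currently online */
	return sysconf(_SC_NPROCESSORS_ONLN) >= 2;
}

For example, test_refcounted_kptr_races() could start with "if (!can_pin_two_cpus()) { test__skip(); return; }".
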