From patchwork Mon Mar 3 15:22:41 2025
From: Kumar Kartikeya Dwivedi
Subject: [PATCH bpf-next v3 01/25] locking: Move MCS struct definition to public header
Date: Mon, 3 Mar 2025 07:22:41 -0800
Message-ID: <20250303152305.3195648-2-memxor@gmail.com>
In-Reply-To: <20250303152305.3195648-1-memxor@gmail.com>

Move the definition of struct mcs_spinlock from the private
mcs_spinlock.h header in kernel/locking to the asm-generic
mcs_spinlock.h header, since subsequent commits will need to reference
it from the qspinlock.h header.
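For context, a minimal MCS-style lock in userspace C11 atomics (an
illustrative sketch assuming the classic MCS algorithm, not the kernel
implementation) looks like the following; the only per-waiter state it
needs is exactly what struct mcs_spinlock exposes: a next pointer and a
locked flag.

	/* Illustrative sketch only: a classic MCS queue lock in C11 atomics. */
	#include <stdatomic.h>
	#include <stddef.h>

	struct mcs_node {                       /* conceptually mirrors struct mcs_spinlock */
		struct mcs_node *_Atomic next;
		atomic_int locked;              /* set to 1 when the lock is handed to us */
	};

	static struct mcs_node *_Atomic mcs_tail;

	static void mcs_lock(struct mcs_node *node)
	{
		struct mcs_node *prev;

		atomic_store_explicit(&node->next, NULL, memory_order_relaxed);
		atomic_store_explicit(&node->locked, 0, memory_order_relaxed);

		/* Swap ourselves in as the new tail of the wait queue. */
		prev = atomic_exchange(&mcs_tail, node);
		if (!prev)
			return;                 /* queue was empty: lock acquired */

		/* Publish our node to the predecessor, then spin on our own flag. */
		atomic_store_explicit(&prev->next, node, memory_order_release);
		while (!atomic_load_explicit(&node->locked, memory_order_acquire))
			;
	}

	static void mcs_unlock(struct mcs_node *node)
	{
		struct mcs_node *next = atomic_load_explicit(&node->next, memory_order_acquire);

		if (!next) {
			struct mcs_node *expected = node;

			/* No successor visible: try to mark the queue empty. */
			if (atomic_compare_exchange_strong(&mcs_tail, &expected, NULL))
				return;
			/* A successor is enqueueing; wait for it to link itself. */
			while (!(next = atomic_load_explicit(&node->next, memory_order_acquire)))
				;
		}
		/* Hand the lock over by setting the successor's locked flag. */
		atomic_store_explicit(&next->locked, 1, memory_order_release);
	}

The kernel's qspinlock builds on this queue while compressing the tail
and hand-off state into a 32-bit lock word, which is why later patches
need the structure visible from qspinlock.h.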
Reviewed-by: Barret Rhoden
Signed-off-by: Kumar Kartikeya Dwivedi
---
 include/asm-generic/mcs_spinlock.h | 6 ++++++
 kernel/locking/mcs_spinlock.h      | 6 ------
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/include/asm-generic/mcs_spinlock.h b/include/asm-generic/mcs_spinlock.h
index 10cd4ffc6ba2..39c94012b88a 100644
--- a/include/asm-generic/mcs_spinlock.h
+++ b/include/asm-generic/mcs_spinlock.h
@@ -1,6 +1,12 @@
 #ifndef __ASM_MCS_SPINLOCK_H
 #define __ASM_MCS_SPINLOCK_H
 
+struct mcs_spinlock {
+	struct mcs_spinlock *next;
+	int locked; /* 1 if lock acquired */
+	int count;  /* nesting count, see qspinlock.c */
+};
+
 /*
  * Architectures can define their own:
  *
diff --git a/kernel/locking/mcs_spinlock.h b/kernel/locking/mcs_spinlock.h
index 85251d8771d9..16160ca8907f 100644
--- a/kernel/locking/mcs_spinlock.h
+++ b/kernel/locking/mcs_spinlock.h
@@ -15,12 +15,6 @@
 
 #include <asm/mcs_spinlock.h>
 
-struct mcs_spinlock {
-	struct mcs_spinlock *next;
-	int locked; /* 1 if lock acquired */
-	int count;  /* nesting count, see qspinlock.c */
-};
-
 #ifndef arch_mcs_spin_lock_contended
 /*
  * Using smp_cond_load_acquire() provides the acquire semantics

From patchwork Mon Mar 3 15:22:42 2025
From: Kumar Kartikeya Dwivedi
Subject: [PATCH bpf-next v3 02/25] locking: Move common qspinlock helpers to a private header
Date: Mon, 3 Mar 2025 07:22:42 -0800
Message-ID: <20250303152305.3195648-3-memxor@gmail.com>
In-Reply-To: <20250303152305.3195648-1-memxor@gmail.com>

Move the qspinlock helper functions that encode and decode the tail
word, set and clear the pending and locked bits, and other miscellaneous
definitions and macros into a private header. To this end, create a
qspinlock.h header file in kernel/locking. Subsequent commits will
introduce a modified qspinlock slow path function, so moving the shared
code into a private header helps minimize unnecessary code duplication.
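Among the helpers being moved are encode_tail() and decode_tail(). As a
quick illustration, the standalone userspace program below shows how the
tail packs a (cpu + 1, nesting index) pair into the upper bits of the
32-bit lock word; the bit offsets here are assumptions for the common
8-bit-pending layout, and the authoritative values live in the qspinlock
headers.

	#include <assert.h>
	#include <stdint.h>
	#include <stdio.h>

	/* Assumed layout: 8-bit locked byte, 8-bit pending byte, 2-bit idx, 14-bit cpu. */
	#define Q_TAIL_IDX_OFFSET	16
	#define Q_TAIL_IDX_MASK		(0x3U << Q_TAIL_IDX_OFFSET)
	#define Q_TAIL_CPU_OFFSET	18

	static uint32_t encode_tail(int cpu, int idx)
	{
		/* cpu is stored as cpu + 1 so "no tail" (0) differs from cpu 0, idx 0 */
		return ((uint32_t)(cpu + 1) << Q_TAIL_CPU_OFFSET) |
		       ((uint32_t)idx << Q_TAIL_IDX_OFFSET);
	}

	static void decode_tail(uint32_t tail, int *cpu, int *idx)
	{
		*cpu = (int)(tail >> Q_TAIL_CPU_OFFSET) - 1;
		*idx = (int)((tail & Q_TAIL_IDX_MASK) >> Q_TAIL_IDX_OFFSET);
	}

	int main(void)
	{
		int cpu, idx;
		uint32_t tail = encode_tail(5, 2);

		decode_tail(tail, &cpu, &idx);
		assert(cpu == 5 && idx == 2);
		printf("tail=0x%08x -> cpu=%d idx=%d\n", tail, cpu, idx);
		return 0;
	}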
McKenney" , Tejun Heo , Josh Don , Dohyun Kim , linux-arm-kernel@lists.infradead.org, kkd@meta.com, kernel-team@meta.com Subject: [PATCH bpf-next v3 02/25] locking: Move common qspinlock helpers to a private header Date: Mon, 3 Mar 2025 07:22:42 -0800 Message-ID: <20250303152305.3195648-3-memxor@gmail.com> X-Mailer: git-send-email 2.43.5 In-Reply-To: <20250303152305.3195648-1-memxor@gmail.com> References: <20250303152305.3195648-1-memxor@gmail.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=13562; h=from:subject; bh=eKJ3qxGBtRJg8l1rvSHjaQtIHqaeLQcTbgGFSWnocak=; b=owEBbQKS/ZANAwAIAUzgyIZIvxHKAcsmYgBnxcWVkVfZqT1tCitfFNFTby5Hz/Q0Ls5KtoFEDTCL cZHH7UOJAjMEAAEIAB0WIQRLvip+Buz51YI8YRFM4MiGSL8RygUCZ8XFlQAKCRBM4MiGSL8RypysEA C3leDFcx+MonrV13UNP/a1KQjzZBifx/rS8N1isQIHzlulO71sBTAbTJDuZM6yq/0bU/qUs8hwXLq/ w8imvbbOhV0sg2v0XcsUu+Qqaji1dlaEl8+Yzo1SsBEL0NAFxDJtE2OgpBRNEITRSoUsbzwYGVJItt ptZni9OCm9cqgf2GOBiN2XXULRyhnYmdK/eUHMcHrEQvOy2rOdJNpf2dSxW8CcYd49oZaMAqF319OI 19D1ra8czLwFo1Y3MCrEQuT73bn0YZIwfZYdmwgoihZWMjvvZ6rNwRLzpDN5XJLhZLUZEK6pUn5Xpe GcTDtEFLciWRXU6QhN2gu0ZAm9XKtRaR4Hp5yOy3qf2tkMxt5L5ad1BnhyPm8drcY8LQPc/zB5fuIX t029FoAw1yTCn//bVcO6KrFEyGMCMgZ8/UtZpPrcIB70+R1GFk0CB/u0iyKykO1ee5ke5aVFFPZMsb KGBfAxmEcDwbj/LlC/q7xnfODH0cObC073WRxVzQMIOOBYdONvuAD5d1+YlWUOSAUNWrB8olMJS67I sIUraNIODLQ7Rw8YCl2Ekg1SRQA0pvBono+BLof7ZvXRM/L1GUVhc8kNk6XpOToWnQf5XX5OU35vVG 7APJ8S46OuAh06qCXM9teIlkPaNqlA0TdWqHhyfcftt9TvzHsXib5eD7UN6A== X-Developer-Key: i=memxor@gmail.com; a=openpgp; fpr=4BBE2A7E06ECF9D5823C61114CE0C88648BF11CA X-Patchwork-Delegate: bpf@iogearbox.net Move qspinlock helper functions that encode, decode tail word, set and clear the pending and locked bits, and other miscellaneous definitions and macros to a private header. To this end, create a qspinlock.h header file in kernel/locking. Subsequent commits will introduce a modified qspinlock slow path function, thus moving shared code to a private header will help minimize unnecessary code duplication. Reviewed-by: Barret Rhoden Signed-off-by: Kumar Kartikeya Dwivedi --- kernel/locking/qspinlock.c | 193 +---------------------------------- kernel/locking/qspinlock.h | 200 +++++++++++++++++++++++++++++++++++++ 2 files changed, 205 insertions(+), 188 deletions(-) create mode 100644 kernel/locking/qspinlock.h diff --git a/kernel/locking/qspinlock.c b/kernel/locking/qspinlock.c index 7d96bed718e4..af8d122bb649 100644 --- a/kernel/locking/qspinlock.c +++ b/kernel/locking/qspinlock.c @@ -25,8 +25,9 @@ #include /* - * Include queued spinlock statistics code + * Include queued spinlock definitions and statistics code */ +#include "qspinlock.h" #include "qspinlock_stat.h" /* @@ -67,36 +68,6 @@ */ #include "mcs_spinlock.h" -#define MAX_NODES 4 - -/* - * On 64-bit architectures, the mcs_spinlock structure will be 16 bytes in - * size and four of them will fit nicely in one 64-byte cacheline. For - * pvqspinlock, however, we need more space for extra data. To accommodate - * that, we insert two more long words to pad it up to 32 bytes. IOW, only - * two of them can fit in a cacheline in this case. That is OK as it is rare - * to have more than 2 levels of slowpath nesting in actual use. We don't - * want to penalize pvqspinlocks to optimize for a rare case in native - * qspinlocks. - */ -struct qnode { - struct mcs_spinlock mcs; -#ifdef CONFIG_PARAVIRT_SPINLOCKS - long reserved[2]; -#endif -}; - -/* - * The pending bit spinning loop count. 
- * This heuristic is used to limit the number of lockword accesses - * made by atomic_cond_read_relaxed when waiting for the lock to - * transition out of the "== _Q_PENDING_VAL" state. We don't spin - * indefinitely because there's no guarantee that we'll make forward - * progress. - */ -#ifndef _Q_PENDING_LOOPS -#define _Q_PENDING_LOOPS 1 -#endif /* * Per-CPU queue node structures; we can never have more than 4 nested @@ -106,161 +77,7 @@ struct qnode { * * PV doubles the storage and uses the second cacheline for PV state. */ -static DEFINE_PER_CPU_ALIGNED(struct qnode, qnodes[MAX_NODES]); - -/* - * We must be able to distinguish between no-tail and the tail at 0:0, - * therefore increment the cpu number by one. - */ - -static inline __pure u32 encode_tail(int cpu, int idx) -{ - u32 tail; - - tail = (cpu + 1) << _Q_TAIL_CPU_OFFSET; - tail |= idx << _Q_TAIL_IDX_OFFSET; /* assume < 4 */ - - return tail; -} - -static inline __pure struct mcs_spinlock *decode_tail(u32 tail) -{ - int cpu = (tail >> _Q_TAIL_CPU_OFFSET) - 1; - int idx = (tail & _Q_TAIL_IDX_MASK) >> _Q_TAIL_IDX_OFFSET; - - return per_cpu_ptr(&qnodes[idx].mcs, cpu); -} - -static inline __pure -struct mcs_spinlock *grab_mcs_node(struct mcs_spinlock *base, int idx) -{ - return &((struct qnode *)base + idx)->mcs; -} - -#define _Q_LOCKED_PENDING_MASK (_Q_LOCKED_MASK | _Q_PENDING_MASK) - -#if _Q_PENDING_BITS == 8 -/** - * clear_pending - clear the pending bit. - * @lock: Pointer to queued spinlock structure - * - * *,1,* -> *,0,* - */ -static __always_inline void clear_pending(struct qspinlock *lock) -{ - WRITE_ONCE(lock->pending, 0); -} - -/** - * clear_pending_set_locked - take ownership and clear the pending bit. - * @lock: Pointer to queued spinlock structure - * - * *,1,0 -> *,0,1 - * - * Lock stealing is not allowed if this function is used. - */ -static __always_inline void clear_pending_set_locked(struct qspinlock *lock) -{ - WRITE_ONCE(lock->locked_pending, _Q_LOCKED_VAL); -} - -/* - * xchg_tail - Put in the new queue tail code word & retrieve previous one - * @lock : Pointer to queued spinlock structure - * @tail : The new queue tail code word - * Return: The previous queue tail code word - * - * xchg(lock, tail), which heads an address dependency - * - * p,*,* -> n,*,* ; prev = xchg(lock, node) - */ -static __always_inline u32 xchg_tail(struct qspinlock *lock, u32 tail) -{ - /* - * We can use relaxed semantics since the caller ensures that the - * MCS node is properly initialized before updating the tail. - */ - return (u32)xchg_relaxed(&lock->tail, - tail >> _Q_TAIL_OFFSET) << _Q_TAIL_OFFSET; -} - -#else /* _Q_PENDING_BITS == 8 */ - -/** - * clear_pending - clear the pending bit. - * @lock: Pointer to queued spinlock structure - * - * *,1,* -> *,0,* - */ -static __always_inline void clear_pending(struct qspinlock *lock) -{ - atomic_andnot(_Q_PENDING_VAL, &lock->val); -} - -/** - * clear_pending_set_locked - take ownership and clear the pending bit. 
- * @lock: Pointer to queued spinlock structure - * - * *,1,0 -> *,0,1 - */ -static __always_inline void clear_pending_set_locked(struct qspinlock *lock) -{ - atomic_add(-_Q_PENDING_VAL + _Q_LOCKED_VAL, &lock->val); -} - -/** - * xchg_tail - Put in the new queue tail code word & retrieve previous one - * @lock : Pointer to queued spinlock structure - * @tail : The new queue tail code word - * Return: The previous queue tail code word - * - * xchg(lock, tail) - * - * p,*,* -> n,*,* ; prev = xchg(lock, node) - */ -static __always_inline u32 xchg_tail(struct qspinlock *lock, u32 tail) -{ - u32 old, new; - - old = atomic_read(&lock->val); - do { - new = (old & _Q_LOCKED_PENDING_MASK) | tail; - /* - * We can use relaxed semantics since the caller ensures that - * the MCS node is properly initialized before updating the - * tail. - */ - } while (!atomic_try_cmpxchg_relaxed(&lock->val, &old, new)); - - return old; -} -#endif /* _Q_PENDING_BITS == 8 */ - -/** - * queued_fetch_set_pending_acquire - fetch the whole lock value and set pending - * @lock : Pointer to queued spinlock structure - * Return: The previous lock value - * - * *,*,* -> *,1,* - */ -#ifndef queued_fetch_set_pending_acquire -static __always_inline u32 queued_fetch_set_pending_acquire(struct qspinlock *lock) -{ - return atomic_fetch_or_acquire(_Q_PENDING_VAL, &lock->val); -} -#endif - -/** - * set_locked - Set the lock bit and own the lock - * @lock: Pointer to queued spinlock structure - * - * *,*,0 -> *,0,1 - */ -static __always_inline void set_locked(struct qspinlock *lock) -{ - WRITE_ONCE(lock->locked, _Q_LOCKED_VAL); -} - +static DEFINE_PER_CPU_ALIGNED(struct qnode, qnodes[_Q_MAX_NODES]); /* * Generate the native code for queued_spin_unlock_slowpath(); provide NOPs for @@ -410,7 +227,7 @@ void __lockfunc queued_spin_lock_slowpath(struct qspinlock *lock, u32 val) * any MCS node. This is not the most elegant solution, but is * simple enough. */ - if (unlikely(idx >= MAX_NODES)) { + if (unlikely(idx >= _Q_MAX_NODES)) { lockevent_inc(lock_no_node); while (!queued_spin_trylock(lock)) cpu_relax(); @@ -465,7 +282,7 @@ void __lockfunc queued_spin_lock_slowpath(struct qspinlock *lock, u32 val) * head of the waitqueue. */ if (old & _Q_TAIL_MASK) { - prev = decode_tail(old); + prev = decode_tail(old, qnodes); /* Link @node into the waitqueue. */ WRITE_ONCE(prev->next, node); diff --git a/kernel/locking/qspinlock.h b/kernel/locking/qspinlock.h new file mode 100644 index 000000000000..d4ceb9490365 --- /dev/null +++ b/kernel/locking/qspinlock.h @@ -0,0 +1,200 @@ +/* SPDX-License-Identifier: GPL-2.0-or-later */ +/* + * Queued spinlock defines + * + * This file contains macro definitions and functions shared between different + * qspinlock slow path implementations. + */ +#ifndef __LINUX_QSPINLOCK_H +#define __LINUX_QSPINLOCK_H + +#include +#include +#include +#include + +#define _Q_MAX_NODES 4 + +/* + * The pending bit spinning loop count. + * This heuristic is used to limit the number of lockword accesses + * made by atomic_cond_read_relaxed when waiting for the lock to + * transition out of the "== _Q_PENDING_VAL" state. We don't spin + * indefinitely because there's no guarantee that we'll make forward + * progress. + */ +#ifndef _Q_PENDING_LOOPS +#define _Q_PENDING_LOOPS 1 +#endif + +/* + * On 64-bit architectures, the mcs_spinlock structure will be 16 bytes in + * size and four of them will fit nicely in one 64-byte cacheline. For + * pvqspinlock, however, we need more space for extra data. 
To accommodate + * that, we insert two more long words to pad it up to 32 bytes. IOW, only + * two of them can fit in a cacheline in this case. That is OK as it is rare + * to have more than 2 levels of slowpath nesting in actual use. We don't + * want to penalize pvqspinlocks to optimize for a rare case in native + * qspinlocks. + */ +struct qnode { + struct mcs_spinlock mcs; +#ifdef CONFIG_PARAVIRT_SPINLOCKS + long reserved[2]; +#endif +}; + +/* + * We must be able to distinguish between no-tail and the tail at 0:0, + * therefore increment the cpu number by one. + */ + +static inline __pure u32 encode_tail(int cpu, int idx) +{ + u32 tail; + + tail = (cpu + 1) << _Q_TAIL_CPU_OFFSET; + tail |= idx << _Q_TAIL_IDX_OFFSET; /* assume < 4 */ + + return tail; +} + +static inline __pure struct mcs_spinlock *decode_tail(u32 tail, struct qnode *qnodes) +{ + int cpu = (tail >> _Q_TAIL_CPU_OFFSET) - 1; + int idx = (tail & _Q_TAIL_IDX_MASK) >> _Q_TAIL_IDX_OFFSET; + + return per_cpu_ptr(&qnodes[idx].mcs, cpu); +} + +static inline __pure +struct mcs_spinlock *grab_mcs_node(struct mcs_spinlock *base, int idx) +{ + return &((struct qnode *)base + idx)->mcs; +} + +#define _Q_LOCKED_PENDING_MASK (_Q_LOCKED_MASK | _Q_PENDING_MASK) + +#if _Q_PENDING_BITS == 8 +/** + * clear_pending - clear the pending bit. + * @lock: Pointer to queued spinlock structure + * + * *,1,* -> *,0,* + */ +static __always_inline void clear_pending(struct qspinlock *lock) +{ + WRITE_ONCE(lock->pending, 0); +} + +/** + * clear_pending_set_locked - take ownership and clear the pending bit. + * @lock: Pointer to queued spinlock structure + * + * *,1,0 -> *,0,1 + * + * Lock stealing is not allowed if this function is used. + */ +static __always_inline void clear_pending_set_locked(struct qspinlock *lock) +{ + WRITE_ONCE(lock->locked_pending, _Q_LOCKED_VAL); +} + +/* + * xchg_tail - Put in the new queue tail code word & retrieve previous one + * @lock : Pointer to queued spinlock structure + * @tail : The new queue tail code word + * Return: The previous queue tail code word + * + * xchg(lock, tail), which heads an address dependency + * + * p,*,* -> n,*,* ; prev = xchg(lock, node) + */ +static __always_inline u32 xchg_tail(struct qspinlock *lock, u32 tail) +{ + /* + * We can use relaxed semantics since the caller ensures that the + * MCS node is properly initialized before updating the tail. + */ + return (u32)xchg_relaxed(&lock->tail, + tail >> _Q_TAIL_OFFSET) << _Q_TAIL_OFFSET; +} + +#else /* _Q_PENDING_BITS == 8 */ + +/** + * clear_pending - clear the pending bit. + * @lock: Pointer to queued spinlock structure + * + * *,1,* -> *,0,* + */ +static __always_inline void clear_pending(struct qspinlock *lock) +{ + atomic_andnot(_Q_PENDING_VAL, &lock->val); +} + +/** + * clear_pending_set_locked - take ownership and clear the pending bit. 
+ * @lock: Pointer to queued spinlock structure + * + * *,1,0 -> *,0,1 + */ +static __always_inline void clear_pending_set_locked(struct qspinlock *lock) +{ + atomic_add(-_Q_PENDING_VAL + _Q_LOCKED_VAL, &lock->val); +} + +/** + * xchg_tail - Put in the new queue tail code word & retrieve previous one + * @lock : Pointer to queued spinlock structure + * @tail : The new queue tail code word + * Return: The previous queue tail code word + * + * xchg(lock, tail) + * + * p,*,* -> n,*,* ; prev = xchg(lock, node) + */ +static __always_inline u32 xchg_tail(struct qspinlock *lock, u32 tail) +{ + u32 old, new; + + old = atomic_read(&lock->val); + do { + new = (old & _Q_LOCKED_PENDING_MASK) | tail; + /* + * We can use relaxed semantics since the caller ensures that + * the MCS node is properly initialized before updating the + * tail. + */ + } while (!atomic_try_cmpxchg_relaxed(&lock->val, &old, new)); + + return old; +} +#endif /* _Q_PENDING_BITS == 8 */ + +/** + * queued_fetch_set_pending_acquire - fetch the whole lock value and set pending + * @lock : Pointer to queued spinlock structure + * Return: The previous lock value + * + * *,*,* -> *,1,* + */ +#ifndef queued_fetch_set_pending_acquire +static __always_inline u32 queued_fetch_set_pending_acquire(struct qspinlock *lock) +{ + return atomic_fetch_or_acquire(_Q_PENDING_VAL, &lock->val); +} +#endif + +/** + * set_locked - Set the lock bit and own the lock + * @lock: Pointer to queued spinlock structure + * + * *,*,0 -> *,0,1 + */ +static __always_inline void set_locked(struct qspinlock *lock) +{ + WRITE_ONCE(lock->locked, _Q_LOCKED_VAL); +} + +#endif /* __LINUX_QSPINLOCK_H */ From patchwork Mon Mar 3 15:22:43 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Kumar Kartikeya Dwivedi X-Patchwork-Id: 13999021 X-Patchwork-Delegate: bpf@iogearbox.net Received: from mail-wr1-f67.google.com (mail-wr1-f67.google.com [209.85.221.67]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 116B1214A9F; Mon, 3 Mar 2025 15:23:15 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.221.67 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1741015397; cv=none; b=XsuF3zjpYwC8DWuv9B6/dM8NQZxj3fET3S9mwP63jI4OYnjn1EE2GZCZEgjCo4EKM00IHlrA07nn26/qUrMGoZcXfqcXhtlTVZp6x9HXv1+9EuugxD8d6SSJ4COe4exkK9CLDgH6n/8nHVqhdRlFHFPDJQLIPqTmNXVrAcaOwwo= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1741015397; c=relaxed/simple; bh=WHhXMqIdalfkSexlY5e1BRqspbIYdrDmDQT3rX3AKP8=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=XMsz5NTwCHgjjcHc5kj1vRhLV3xaxQNKs4pk6jdBSSSIgUFQonICox/nC/cEPANi6Re8Td5I7oje3HKLWPWvknYL4HE7FZdoBtpnJbaNj5pbM0X89JbiBCGom1fHO65FrskiNOmQWz9YB/LrcrfAJ9BDOIcHuFyzu05OiA2A4LM= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=FF+3ClHU; arc=none smtp.client-ip=209.85.221.67 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="FF+3ClHU" Received: by 
From: Kumar Kartikeya Dwivedi
Subject: [PATCH bpf-next v3 03/25] locking: Allow obtaining result of arch_mcs_spin_lock_contended
Date: Mon, 3 Mar 2025 07:22:43 -0800
Message-ID: <20250303152305.3195648-4-memxor@gmail.com>
In-Reply-To: <20250303152305.3195648-1-memxor@gmail.com>

To support upcoming changes that need to inspect the value observed when
the conditional waiting loop in arch_mcs_spin_lock_contended terminates,
modify the macro to preserve the result of smp_cond_load_acquire. This
lets callers check the return value as needed, which will help
disambiguate the MCS node's locked state in future patches.
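The practical difference is easiest to see outside the kernel. In the
compilable stand-in below (the helper and the stored value are
hypothetical; only the macro shape mirrors the patch), an
expression-style macro lets the waiter capture whatever value terminated
the wait loop, which a do { } while (0) statement macro cannot.

	#include <stdatomic.h>
	#include <stdio.h>

	/* Stand-in for smp_cond_load_acquire(l, VAL): spin until *l is non-zero, return it. */
	static int cond_load_acquire(atomic_int *l)
	{
		int v;

		while (!(v = atomic_load_explicit(l, memory_order_acquire)))
			;
		return v;
	}

	/* After the patch the macro expands to the load expression itself. */
	#define mcs_spin_lock_contended(l)	cond_load_acquire(l)

	int main(void)
	{
		atomic_int locked;
		int val;

		atomic_init(&locked, 2);	/* pretend a waker stored a non-default value */

		val = mcs_spin_lock_contended(&locked);
		printf("wait loop ended with locked == %d\n", val);
		return 0;
	}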
McKenney" , Tejun Heo , Josh Don , Dohyun Kim , linux-arm-kernel@lists.infradead.org, kkd@meta.com, kernel-team@meta.com Subject: [PATCH bpf-next v3 03/25] locking: Allow obtaining result of arch_mcs_spin_lock_contended Date: Mon, 3 Mar 2025 07:22:43 -0800 Message-ID: <20250303152305.3195648-4-memxor@gmail.com> X-Mailer: git-send-email 2.43.5 In-Reply-To: <20250303152305.3195648-1-memxor@gmail.com> References: <20250303152305.3195648-1-memxor@gmail.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=1052; h=from:subject; bh=WHhXMqIdalfkSexlY5e1BRqspbIYdrDmDQT3rX3AKP8=; b=owEBbQKS/ZANAwAIAUzgyIZIvxHKAcsmYgBnxcWVvzlsIk8Mh+hFnelZKKgCgtqU9iOBLKbPXk+b Hr7ixKqJAjMEAAEIAB0WIQRLvip+Buz51YI8YRFM4MiGSL8RygUCZ8XFlQAKCRBM4MiGSL8Ryow+EA C1eFvCJZOUlVvCcUeprQTPLwv72Q6ykc0vZYM6C30cB2mh8RDYCi22FekqJDDuIBP3ggLIEkBn++iL HayE8E7tcJD+XUme2bPMlycWLuboqWEkhYbEUmsxEiHDMF0tIGq3CXtTX/EFmdPiqhrfYjK2/U0WMS NOVAt9VBBJ2Gkr/Ahg/bKs76BmL83Wf4QvULiFNYpGucOQLCWNYYcOf3+zvSDaQnE1PLiL4lwcEWgJ BWhLal7zLGBO37rSCp8pXZxAjZbOmlIMd7El/c3QBiJ/AUbfRk2SI2aoPiCG7z+vImKhFG9QUUxpHv FXE64wiHpCMTGwLOtsovyQ7+cDSbyd9myY4DYnoOFearxpOGJS6oDKVpKL+YUs0/lrw0EOGtKl8onS nwiFb0cJg40JCw+qXxsA3WdiI5uIK9cMKMra5mIr7h6U3ffX7Tt02zhl0EZujgfUYOKA9Jtck4Hd1x mkS1YmJ/dZEejK5aHgZ5LEBPw9S9BHwJj17/s5GOZti6Ra8ymdC+LLaxSZUDhaBNfI2G2cQJ70ISBx MVd2vcRakzCnM/Eax/CdmqnGby9zEaPOamB6GxfsXaYvqAw9KSqQjVShNIdzUnP/UWD4l5rcaZ50ZC /DEcBbUVqWlsEJY2xKVzRGMI5YAnPpqIb2pZParSp7t5oTMcgQWd2IuyPnAA== X-Developer-Key: i=memxor@gmail.com; a=openpgp; fpr=4BBE2A7E06ECF9D5823C61114CE0C88648BF11CA X-Patchwork-Delegate: bpf@iogearbox.net To support upcoming changes that require inspecting the return value once the conditional waiting loop in arch_mcs_spin_lock_contended terminates, modify the macro to preserve the result of smp_cond_load_acquire. This enables checking the return value as needed, which will help disambiguate the MCS node’s locked state in future patches. Reviewed-by: Barret Rhoden Signed-off-by: Kumar Kartikeya Dwivedi --- kernel/locking/mcs_spinlock.h | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/kernel/locking/mcs_spinlock.h b/kernel/locking/mcs_spinlock.h index 16160ca8907f..5c92ba199b90 100644 --- a/kernel/locking/mcs_spinlock.h +++ b/kernel/locking/mcs_spinlock.h @@ -24,9 +24,7 @@ * spinning, and smp_cond_load_acquire() provides that behavior. 
From patchwork Mon Mar 3 15:22:44 2025
From: Kumar Kartikeya Dwivedi
Subject: [PATCH bpf-next v3 04/25] locking: Copy out qspinlock.c to rqspinlock.c
Date: Mon, 3 Mar 2025 07:22:44 -0800
Message-ID: <20250303152305.3195648-5-memxor@gmail.com>
In-Reply-To: <20250303152305.3195648-1-memxor@gmail.com>

In preparation for introducing a new lock implementation, the Resilient
Queued Spin Lock (rqspinlock), begin by using the existing qspinlock.c
code as the base. Simply copy the code to a new file and rename functions
and variables from 'queued' to 'resilient_queued'. This helps each
subsequent commit clearly show how and where the code is being changed.
The only change after a literal copy in this commit is renaming the functions where necessary, and rename qnodes to rqnodes. Reviewed-by: Barret Rhoden Signed-off-by: Kumar Kartikeya Dwivedi --- kernel/locking/rqspinlock.c | 410 ++++++++++++++++++++++++++++++++++++ 1 file changed, 410 insertions(+) create mode 100644 kernel/locking/rqspinlock.c diff --git a/kernel/locking/rqspinlock.c b/kernel/locking/rqspinlock.c new file mode 100644 index 000000000000..143d9dda36f9 --- /dev/null +++ b/kernel/locking/rqspinlock.c @@ -0,0 +1,410 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* + * Resilient Queued Spin Lock + * + * (C) Copyright 2013-2015 Hewlett-Packard Development Company, L.P. + * (C) Copyright 2013-2014,2018 Red Hat, Inc. + * (C) Copyright 2015 Intel Corp. + * (C) Copyright 2015 Hewlett-Packard Enterprise Development LP + * + * Authors: Waiman Long + * Peter Zijlstra + */ + +#ifndef _GEN_PV_LOCK_SLOWPATH + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +/* + * Include queued spinlock definitions and statistics code + */ +#include "qspinlock.h" +#include "qspinlock_stat.h" + +/* + * The basic principle of a queue-based spinlock can best be understood + * by studying a classic queue-based spinlock implementation called the + * MCS lock. A copy of the original MCS lock paper ("Algorithms for Scalable + * Synchronization on Shared-Memory Multiprocessors by Mellor-Crummey and + * Scott") is available at + * + * https://bugzilla.kernel.org/show_bug.cgi?id=206115 + * + * This queued spinlock implementation is based on the MCS lock, however to + * make it fit the 4 bytes we assume spinlock_t to be, and preserve its + * existing API, we must modify it somehow. + * + * In particular; where the traditional MCS lock consists of a tail pointer + * (8 bytes) and needs the next pointer (another 8 bytes) of its own node to + * unlock the next pending (next->locked), we compress both these: {tail, + * next->locked} into a single u32 value. + * + * Since a spinlock disables recursion of its own context and there is a limit + * to the contexts that can nest; namely: task, softirq, hardirq, nmi. As there + * are at most 4 nesting levels, it can be encoded by a 2-bit number. Now + * we can encode the tail by combining the 2-bit nesting level with the cpu + * number. With one byte for the lock value and 3 bytes for the tail, only a + * 32-bit word is now needed. Even though we only need 1 bit for the lock, + * we extend it to a full byte to achieve better performance for architectures + * that support atomic byte write. + * + * We also change the first spinner to spin on the lock bit instead of its + * node; whereby avoiding the need to carry a node from lock to unlock, and + * preserving existing lock API. This also makes the unlock code simpler and + * faster. + * + * N.B. The current implementation only supports architectures that allow + * atomic operations on smaller 8-bit and 16-bit data types. + * + */ + +#include "mcs_spinlock.h" + +/* + * Per-CPU queue node structures; we can never have more than 4 nested + * contexts: task, softirq, hardirq, nmi. + * + * Exactly fits one 64-byte cacheline on a 64-bit architecture. + * + * PV doubles the storage and uses the second cacheline for PV state. + */ +static DEFINE_PER_CPU_ALIGNED(struct qnode, rqnodes[_Q_MAX_NODES]); + +/* + * Generate the native code for resilient_queued_spin_unlock_slowpath(); provide NOPs + * for all the PV callbacks. 
+ */ + +static __always_inline void __pv_init_node(struct mcs_spinlock *node) { } +static __always_inline void __pv_wait_node(struct mcs_spinlock *node, + struct mcs_spinlock *prev) { } +static __always_inline void __pv_kick_node(struct qspinlock *lock, + struct mcs_spinlock *node) { } +static __always_inline u32 __pv_wait_head_or_lock(struct qspinlock *lock, + struct mcs_spinlock *node) + { return 0; } + +#define pv_enabled() false + +#define pv_init_node __pv_init_node +#define pv_wait_node __pv_wait_node +#define pv_kick_node __pv_kick_node +#define pv_wait_head_or_lock __pv_wait_head_or_lock + +#ifdef CONFIG_PARAVIRT_SPINLOCKS +#define resilient_queued_spin_lock_slowpath native_resilient_queued_spin_lock_slowpath +#endif + +#endif /* _GEN_PV_LOCK_SLOWPATH */ + +/** + * resilient_queued_spin_lock_slowpath - acquire the queued spinlock + * @lock: Pointer to queued spinlock structure + * @val: Current value of the queued spinlock 32-bit word + * + * (queue tail, pending bit, lock value) + * + * fast : slow : unlock + * : : + * uncontended (0,0,0) -:--> (0,0,1) ------------------------------:--> (*,*,0) + * : | ^--------.------. / : + * : v \ \ | : + * pending : (0,1,1) +--> (0,1,0) \ | : + * : | ^--' | | : + * : v | | : + * uncontended : (n,x,y) +--> (n,0,0) --' | : + * queue : | ^--' | : + * : v | : + * contended : (*,x,y) +--> (*,0,0) ---> (*,0,1) -' : + * queue : ^--' : + */ +void __lockfunc resilient_queued_spin_lock_slowpath(struct qspinlock *lock, u32 val) +{ + struct mcs_spinlock *prev, *next, *node; + u32 old, tail; + int idx; + + BUILD_BUG_ON(CONFIG_NR_CPUS >= (1U << _Q_TAIL_CPU_BITS)); + + if (pv_enabled()) + goto pv_queue; + + if (virt_spin_lock(lock)) + return; + + /* + * Wait for in-progress pending->locked hand-overs with a bounded + * number of spins so that we guarantee forward progress. + * + * 0,1,0 -> 0,0,1 + */ + if (val == _Q_PENDING_VAL) { + int cnt = _Q_PENDING_LOOPS; + val = atomic_cond_read_relaxed(&lock->val, + (VAL != _Q_PENDING_VAL) || !cnt--); + } + + /* + * If we observe any contention; queue. + */ + if (val & ~_Q_LOCKED_MASK) + goto queue; + + /* + * trylock || pending + * + * 0,0,* -> 0,1,* -> 0,0,1 pending, trylock + */ + val = queued_fetch_set_pending_acquire(lock); + + /* + * If we observe contention, there is a concurrent locker. + * + * Undo and queue; our setting of PENDING might have made the + * n,0,0 -> 0,0,0 transition fail and it will now be waiting + * on @next to become !NULL. + */ + if (unlikely(val & ~_Q_LOCKED_MASK)) { + + /* Undo PENDING if we set it. */ + if (!(val & _Q_PENDING_MASK)) + clear_pending(lock); + + goto queue; + } + + /* + * We're pending, wait for the owner to go away. + * + * 0,1,1 -> *,1,0 + * + * this wait loop must be a load-acquire such that we match the + * store-release that clears the locked bit and create lock + * sequentiality; this is because not all + * clear_pending_set_locked() implementations imply full + * barriers. + */ + if (val & _Q_LOCKED_MASK) + smp_cond_load_acquire(&lock->locked, !VAL); + + /* + * take ownership and clear the pending bit. + * + * 0,1,0 -> 0,0,1 + */ + clear_pending_set_locked(lock); + lockevent_inc(lock_pending); + return; + + /* + * End of pending bit optimistic spinning and beginning of MCS + * queuing. 
+ */ +queue: + lockevent_inc(lock_slowpath); +pv_queue: + node = this_cpu_ptr(&rqnodes[0].mcs); + idx = node->count++; + tail = encode_tail(smp_processor_id(), idx); + + trace_contention_begin(lock, LCB_F_SPIN); + + /* + * 4 nodes are allocated based on the assumption that there will + * not be nested NMIs taking spinlocks. That may not be true in + * some architectures even though the chance of needing more than + * 4 nodes will still be extremely unlikely. When that happens, + * we fall back to spinning on the lock directly without using + * any MCS node. This is not the most elegant solution, but is + * simple enough. + */ + if (unlikely(idx >= _Q_MAX_NODES)) { + lockevent_inc(lock_no_node); + while (!queued_spin_trylock(lock)) + cpu_relax(); + goto release; + } + + node = grab_mcs_node(node, idx); + + /* + * Keep counts of non-zero index values: + */ + lockevent_cond_inc(lock_use_node2 + idx - 1, idx); + + /* + * Ensure that we increment the head node->count before initialising + * the actual node. If the compiler is kind enough to reorder these + * stores, then an IRQ could overwrite our assignments. + */ + barrier(); + + node->locked = 0; + node->next = NULL; + pv_init_node(node); + + /* + * We touched a (possibly) cold cacheline in the per-cpu queue node; + * attempt the trylock once more in the hope someone let go while we + * weren't watching. + */ + if (queued_spin_trylock(lock)) + goto release; + + /* + * Ensure that the initialisation of @node is complete before we + * publish the updated tail via xchg_tail() and potentially link + * @node into the waitqueue via WRITE_ONCE(prev->next, node) below. + */ + smp_wmb(); + + /* + * Publish the updated tail. + * We have already touched the queueing cacheline; don't bother with + * pending stuff. + * + * p,*,* -> n,*,* + */ + old = xchg_tail(lock, tail); + next = NULL; + + /* + * if there was a previous node; link it and wait until reaching the + * head of the waitqueue. + */ + if (old & _Q_TAIL_MASK) { + prev = decode_tail(old, rqnodes); + + /* Link @node into the waitqueue. */ + WRITE_ONCE(prev->next, node); + + pv_wait_node(node, prev); + arch_mcs_spin_lock_contended(&node->locked); + + /* + * While waiting for the MCS lock, the next pointer may have + * been set by another lock waiter. We optimistically load + * the next pointer & prefetch the cacheline for writing + * to reduce latency in the upcoming MCS unlock operation. + */ + next = READ_ONCE(node->next); + if (next) + prefetchw(next); + } + + /* + * we're at the head of the waitqueue, wait for the owner & pending to + * go away. + * + * *,x,y -> *,0,0 + * + * this wait loop must use a load-acquire such that we match the + * store-release that clears the locked bit and create lock + * sequentiality; this is because the set_locked() function below + * does not imply a full barrier. + * + * The PV pv_wait_head_or_lock function, if active, will acquire + * the lock and return a non-zero value. So we have to skip the + * atomic_cond_read_acquire() call. As the next PV queue head hasn't + * been designated yet, there is no way for the locked value to become + * _Q_SLOW_VAL. So both the set_locked() and the + * atomic_cmpxchg_relaxed() calls will be safe. + * + * If PV isn't active, 0 will be returned instead. 
+ * + */ + if ((val = pv_wait_head_or_lock(lock, node))) + goto locked; + + val = atomic_cond_read_acquire(&lock->val, !(VAL & _Q_LOCKED_PENDING_MASK)); + +locked: + /* + * claim the lock: + * + * n,0,0 -> 0,0,1 : lock, uncontended + * *,*,0 -> *,*,1 : lock, contended + * + * If the queue head is the only one in the queue (lock value == tail) + * and nobody is pending, clear the tail code and grab the lock. + * Otherwise, we only need to grab the lock. + */ + + /* + * In the PV case we might already have _Q_LOCKED_VAL set, because + * of lock stealing; therefore we must also allow: + * + * n,0,1 -> 0,0,1 + * + * Note: at this point: (val & _Q_PENDING_MASK) == 0, because of the + * above wait condition, therefore any concurrent setting of + * PENDING will make the uncontended transition fail. + */ + if ((val & _Q_TAIL_MASK) == tail) { + if (atomic_try_cmpxchg_relaxed(&lock->val, &val, _Q_LOCKED_VAL)) + goto release; /* No contention */ + } + + /* + * Either somebody is queued behind us or _Q_PENDING_VAL got set + * which will then detect the remaining tail and queue behind us + * ensuring we'll see a @next. + */ + set_locked(lock); + + /* + * contended path; wait for next if not observed yet, release. + */ + if (!next) + next = smp_cond_load_relaxed(&node->next, (VAL)); + + arch_mcs_spin_unlock_contended(&next->locked); + pv_kick_node(lock, next); + +release: + trace_contention_end(lock, 0); + + /* + * release the node + */ + __this_cpu_dec(rqnodes[0].mcs.count); +} +EXPORT_SYMBOL(resilient_queued_spin_lock_slowpath); + +/* + * Generate the paravirt code for resilient_queued_spin_unlock_slowpath(). + */ +#if !defined(_GEN_PV_LOCK_SLOWPATH) && defined(CONFIG_PARAVIRT_SPINLOCKS) +#define _GEN_PV_LOCK_SLOWPATH + +#undef pv_enabled +#define pv_enabled() true + +#undef pv_init_node +#undef pv_wait_node +#undef pv_kick_node +#undef pv_wait_head_or_lock + +#undef resilient_queued_spin_lock_slowpath +#define resilient_queued_spin_lock_slowpath __pv_resilient_queued_spin_lock_slowpath + +#include "qspinlock_paravirt.h" +#include "rqspinlock.c" + +bool nopvspin; +static __init int parse_nopvspin(char *arg) +{ + nopvspin = true; + return 0; +} +early_param("nopvspin", parse_nopvspin); +#endif From patchwork Mon Mar 3 15:22:45 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kumar Kartikeya Dwivedi X-Patchwork-Id: 13999023 X-Patchwork-Delegate: bpf@iogearbox.net Received: from mail-wm1-f67.google.com (mail-wm1-f67.google.com [209.85.128.67]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id C61F621577E; Mon, 3 Mar 2025 15:23:18 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.128.67 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1741015400; cv=none; b=Oyl9jJ3xhlfk2frrOUnZggBdpZd3mpbgQFOveUFc7gtQDEyX/e91v8hNT/0HqjRtEvl4crxPoXejyuGmyzNzmvcLWUYNqbOem3iLKLBejg5gjDNKski8c3bgiuMzs6VXt02KKwrIgCyTU4c7JpHeHO40qPkow6avxQk105DW1TQ= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1741015400; c=relaxed/simple; bh=C0ciCNeduyvvQFAO0DwlqwtcddwN6zNDgdODlI5CgNw=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=V1VIvNd2CFJev7WdfDLJsRPDaBpZOxY4Tk43Un5+zGQTc8Yk/ui1Y85kvFzhnnlmMUFp7WA0u7u5D0kke25u6ZYPx0me2NPHd3NduMJrDjnjdTwnYbSEIc7+0QJVbyDfXLLeFwwHG2UZ3uN8qxFFI/hHZ0vOrwmFWiM5KrzgNwY= ARC-Authentication-Results: i=1; 
From: Kumar Kartikeya Dwivedi
Subject: [PATCH bpf-next v3 05/25] rqspinlock: Add rqspinlock.h header
Date: Mon, 3 Mar 2025 07:22:45 -0800
Message-ID: <20250303152305.3195648-6-memxor@gmail.com>
In-Reply-To: <20250303152305.3195648-1-memxor@gmail.com>

This header contains the public declarations usable in the rest of the
kernel for rqspinlock. Also type-alias struct qspinlock to rqspinlock_t
to ensure consistent use of the new lock type. Later patches need to
remove the dependence on the qspinlock type in order to provide a
test-and-set fallback, so begin abstracting it away now.
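To make the aliasing concrete, here is a small compilable userspace
sketch (not the kernel header; the struct body, stub slowpath, and
bucket_lock user are placeholders): callers only ever name rqspinlock_t,
so the backing type can later be swapped, e.g. for the test-and-set
fallback mentioned above, without touching any call site.

	#include <stdio.h>

	/* Stand-in for the kernel's struct qspinlock; only the aliasing pattern matters. */
	struct qspinlock { unsigned int val; };

	/* What the new header provides: an opaque alias plus the slowpath declaration. */
	typedef struct qspinlock rqspinlock_t;

	static void resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, unsigned int val)
	{
		lock->val = val;	/* placeholder body for the sketch */
	}

	static rqspinlock_t bucket_lock;	/* hypothetical user written against rqspinlock_t */

	int main(void)
	{
		resilient_queued_spin_lock_slowpath(&bucket_lock, 1);
		printf("lock word is now %u\n", bucket_lock.val);
		return 0;
	}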
McKenney" , Tejun Heo , Josh Don , Dohyun Kim , linux-arm-kernel@lists.infradead.org, kkd@meta.com, kernel-team@meta.com Subject: [PATCH bpf-next v3 05/25] rqspinlock: Add rqspinlock.h header Date: Mon, 3 Mar 2025 07:22:45 -0800 Message-ID: <20250303152305.3195648-6-memxor@gmail.com> X-Mailer: git-send-email 2.43.5 In-Reply-To: <20250303152305.3195648-1-memxor@gmail.com> References: <20250303152305.3195648-1-memxor@gmail.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=2297; h=from:subject; bh=C0ciCNeduyvvQFAO0DwlqwtcddwN6zNDgdODlI5CgNw=; b=owEBbQKS/ZANAwAIAUzgyIZIvxHKAcsmYgBnxcWWmeJIFZGQqQJ9zuXTndNKetm3e9meOOzs99f5 56FqOzGJAjMEAAEIAB0WIQRLvip+Buz51YI8YRFM4MiGSL8RygUCZ8XFlgAKCRBM4MiGSL8RyoR+D/ 0VwsuuvPBO31kFGb0t61y+N4gbFPQVPDJBrXCTRuRgO0fOw5WCQWwcpin+e1kMbSgYKAZzVwfLS3fl USxgEUjPhUFvS788CF15UqpS6n/tM0irZYhiD/4t5EWU0g0ioKfKj6gj6yBwqsSMBTsIRoIbfdAtpW A1sQkzl5Q8TKau41XUFL5/fRetjHPVeJCUKabg5SUIG5I9iY6ZjqHLHvD+LFUnOo8bFfHwkb1uWXh3 hFp6LuX4Ip5Q0OVOMs+ec1b0966SfNEsHIk3+Z8XY4f0eqnk7Ez83Zc/Hjz7csGn6uh8bcb6eG7ZYC +/A6aSKaqeZbUC4ssfb84IHdHW7hVWFUk4czo8NMacOAr3Qy1tlY/IIaFasBI7NmhQiZgLc8EPbgHi 7TMVD7hKBCP+NHjd0dPY2vu1dxsGHJ7glfhHL4X9QrkiqZi5/Mkl3n7BQmfN/AKLVD7Azctv4JBglx VxQbreb1HGCpiW2R+/1KbXrT472iuSH/6kefuuYYh1hQbdVrvkEdNJfBXlKOJBdRGO9hA4i5b/tbBn 1CQPYg9cYRVAZ147OeK3geUwV37j9ZjgLrI0+k1TkwXMQtUgriZg5GETC7ZgO+AO2BGZkTI4gFTPr6 Jo2JmjrdfCTWQ/LMn3yeXbbAnc4me8NdPXySZtWjaqKhp7V4WbNvPwwg8qPw== X-Developer-Key: i=memxor@gmail.com; a=openpgp; fpr=4BBE2A7E06ECF9D5823C61114CE0C88648BF11CA X-Patchwork-Delegate: bpf@iogearbox.net This header contains the public declarations usable in the rest of the kernel for rqspinlock. Let's also type alias qspinlock to rqspinlock_t to ensure consistent use of the new lock type. We want to remove dependence on the qspinlock type in later patches as we need to provide a test-and-set fallback, hence begin abstracting away from now onwards. Reviewed-by: Barret Rhoden Signed-off-by: Kumar Kartikeya Dwivedi --- include/asm-generic/rqspinlock.h | 19 +++++++++++++++++++ kernel/locking/rqspinlock.c | 3 ++- 2 files changed, 21 insertions(+), 1 deletion(-) create mode 100644 include/asm-generic/rqspinlock.h diff --git a/include/asm-generic/rqspinlock.h b/include/asm-generic/rqspinlock.h new file mode 100644 index 000000000000..54860b519571 --- /dev/null +++ b/include/asm-generic/rqspinlock.h @@ -0,0 +1,19 @@ +/* SPDX-License-Identifier: GPL-2.0-or-later */ +/* + * Resilient Queued Spin Lock + * + * (C) Copyright 2024 Meta Platforms, Inc. and affiliates. 
+ *
+ * Authors: Kumar Kartikeya Dwivedi
+ */
+#ifndef __ASM_GENERIC_RQSPINLOCK_H
+#define __ASM_GENERIC_RQSPINLOCK_H
+
+#include
+
+struct qspinlock;
+typedef struct qspinlock rqspinlock_t;
+
+extern void resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val);
+
+#endif /* __ASM_GENERIC_RQSPINLOCK_H */
diff --git a/kernel/locking/rqspinlock.c b/kernel/locking/rqspinlock.c
index 143d9dda36f9..414a3ec8cf70 100644
--- a/kernel/locking/rqspinlock.c
+++ b/kernel/locking/rqspinlock.c
@@ -23,6 +23,7 @@
 #include
 #include
 #include
+#include
 
 /*
  * Include queued spinlock definitions and statistics code
@@ -127,7 +128,7 @@ static __always_inline u32 __pv_wait_head_or_lock(struct qspinlock *lock,
  *   contended : (*,x,y) +--> (*,0,0) ---> (*,0,1) -'  :
  *   queue     :         ^--'                          :
  */
-void __lockfunc resilient_queued_spin_lock_slowpath(struct qspinlock *lock, u32 val)
+void __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val)
 {
 	struct mcs_spinlock *prev, *next, *node;
 	u32 old, tail;

From patchwork Mon Mar 3 15:22:46 2025
X-Patchwork-Submitter: Kumar Kartikeya Dwivedi
X-Patchwork-Id: 13999024
X-Patchwork-Delegate: bpf@iogearbox.net
From: Kumar Kartikeya Dwivedi
To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Barret Rhoden, Linus Torvalds, Peter Zijlstra, Will Deacon, Waiman Long,
    Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, Martin KaFai Lau,
    Eduard Zingerman, "Paul E.
McKenney" , Tejun Heo , Josh Don , Dohyun Kim , linux-arm-kernel@lists.infradead.org, kkd@meta.com, kernel-team@meta.com Subject: [PATCH bpf-next v3 06/25] rqspinlock: Drop PV and virtualization support Date: Mon, 3 Mar 2025 07:22:46 -0800 Message-ID: <20250303152305.3195648-7-memxor@gmail.com> X-Mailer: git-send-email 2.43.5 In-Reply-To: <20250303152305.3195648-1-memxor@gmail.com> References: <20250303152305.3195648-1-memxor@gmail.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=6795; h=from:subject; bh=x1Z+pB9xhs3i58kf9Kac9b92frWP4j0eHxNENfgFcNU=; b=owEBbQKS/ZANAwAIAUzgyIZIvxHKAcsmYgBnxcWWAhlIoArRUBi9+QCUAqMzXrVAKF0JtHZdQgoX fPFJU92JAjMEAAEIAB0WIQRLvip+Buz51YI8YRFM4MiGSL8RygUCZ8XFlgAKCRBM4MiGSL8RygwXEA DAmIu9BajEdITHz1mKPuYwgeMXa7iRtPHxfqCpPmzf/TH+bsToJCV02MBZgALM6kWCm/5rAUSraLhw BnMrlVK/RcAw8Kxjwu1xmv4nZtV3SYbVUGx9WVMII5Oeyew86x2PDffmVG2n1obhHHjW3irYV8YsdE Yp1hHfinMz7/BSq/yV3BC5t/Xnfyeqm/J8YZY2QWNJBC4EKexPFCjmswgCSAaSFCcxhApWXXJzknMU MFRWSRvdZ44k9m+DhbcggcjpFSytSET3RxyTw4yuiGnXVqdQBBg0JBdMGJz/TatR2aEVIb7cJrYFAg j53Yifny9409xdRc2gl46en9AmpDm3/WgCsa0MG4u2UZyRfxyJFucVeD8D37S0ybhHJqnoRd8Idhi7 QmQz4h/wP2rzNMb9gJ5XgohE5pZ2V72I2JHpPuL7p4uQ5QTqHm+xQyirFjDGGmfxNDIuH/1w//0thM DL6ypO4I4NPgIyNpKI0IVWdOcJ4sbxDX2seupxCRksDYYMwlRTYnEPgbMhqobCiAvBuEuRRvINLHkc gdFHigGRIq+ZNPJc74y2xJvhNd4dehMSg8IlVcZDLDmYHHHUCDiHbmlnJLWJ+hw/5iLNlSJtmRadiM JCiKxpddvVm6tqQTMDIdqvcgLcDn7u2gAg21mmdVh/8SvCqLxQsKxzk73jWA== X-Developer-Key: i=memxor@gmail.com; a=openpgp; fpr=4BBE2A7E06ECF9D5823C61114CE0C88648BF11CA X-Patchwork-Delegate: bpf@iogearbox.net Changes to rqspinlock in subsequent commits will be algorithmic modifications, which won't remain in agreement with the implementations of paravirt spin lock and virt_spin_lock support. These future changes include measures for terminating waiting loops in slow path after a certain point. While using a fair lock like qspinlock directly inside virtual machines leads to suboptimal performance under certain conditions, we cannot use the existing virtualization support before we make it resilient as well. Therefore, drop it for now. Note that we need to drop qspinlock_stat.h, as it's only relevant in case of CONFIG_PARAVIRT_SPINLOCKS=y, but we need to keep lock_events.h in the includes, which was indirectly pulled in before. Reviewed-by: Barret Rhoden Signed-off-by: Kumar Kartikeya Dwivedi --- kernel/locking/rqspinlock.c | 91 +------------------------------------ 1 file changed, 1 insertion(+), 90 deletions(-) diff --git a/kernel/locking/rqspinlock.c b/kernel/locking/rqspinlock.c index 414a3ec8cf70..98cdcc5f1784 100644 --- a/kernel/locking/rqspinlock.c +++ b/kernel/locking/rqspinlock.c @@ -11,8 +11,6 @@ * Peter Zijlstra */ -#ifndef _GEN_PV_LOCK_SLOWPATH - #include #include #include @@ -29,7 +27,7 @@ * Include queued spinlock definitions and statistics code */ #include "qspinlock.h" -#include "qspinlock_stat.h" +#include "lock_events.h" /* * The basic principle of a queue-based spinlock can best be understood @@ -75,38 +73,9 @@ * contexts: task, softirq, hardirq, nmi. * * Exactly fits one 64-byte cacheline on a 64-bit architecture. - * - * PV doubles the storage and uses the second cacheline for PV state. */ static DEFINE_PER_CPU_ALIGNED(struct qnode, rqnodes[_Q_MAX_NODES]); -/* - * Generate the native code for resilient_queued_spin_unlock_slowpath(); provide NOPs - * for all the PV callbacks. 
- */ - -static __always_inline void __pv_init_node(struct mcs_spinlock *node) { } -static __always_inline void __pv_wait_node(struct mcs_spinlock *node, - struct mcs_spinlock *prev) { } -static __always_inline void __pv_kick_node(struct qspinlock *lock, - struct mcs_spinlock *node) { } -static __always_inline u32 __pv_wait_head_or_lock(struct qspinlock *lock, - struct mcs_spinlock *node) - { return 0; } - -#define pv_enabled() false - -#define pv_init_node __pv_init_node -#define pv_wait_node __pv_wait_node -#define pv_kick_node __pv_kick_node -#define pv_wait_head_or_lock __pv_wait_head_or_lock - -#ifdef CONFIG_PARAVIRT_SPINLOCKS -#define resilient_queued_spin_lock_slowpath native_resilient_queued_spin_lock_slowpath -#endif - -#endif /* _GEN_PV_LOCK_SLOWPATH */ - /** * resilient_queued_spin_lock_slowpath - acquire the queued spinlock * @lock: Pointer to queued spinlock structure @@ -136,12 +105,6 @@ void __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val) BUILD_BUG_ON(CONFIG_NR_CPUS >= (1U << _Q_TAIL_CPU_BITS)); - if (pv_enabled()) - goto pv_queue; - - if (virt_spin_lock(lock)) - return; - /* * Wait for in-progress pending->locked hand-overs with a bounded * number of spins so that we guarantee forward progress. @@ -212,7 +175,6 @@ void __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val) */ queue: lockevent_inc(lock_slowpath); -pv_queue: node = this_cpu_ptr(&rqnodes[0].mcs); idx = node->count++; tail = encode_tail(smp_processor_id(), idx); @@ -251,7 +213,6 @@ void __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val) node->locked = 0; node->next = NULL; - pv_init_node(node); /* * We touched a (possibly) cold cacheline in the per-cpu queue node; @@ -288,7 +249,6 @@ void __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val) /* Link @node into the waitqueue. */ WRITE_ONCE(prev->next, node); - pv_wait_node(node, prev); arch_mcs_spin_lock_contended(&node->locked); /* @@ -312,23 +272,9 @@ void __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val) * store-release that clears the locked bit and create lock * sequentiality; this is because the set_locked() function below * does not imply a full barrier. - * - * The PV pv_wait_head_or_lock function, if active, will acquire - * the lock and return a non-zero value. So we have to skip the - * atomic_cond_read_acquire() call. As the next PV queue head hasn't - * been designated yet, there is no way for the locked value to become - * _Q_SLOW_VAL. So both the set_locked() and the - * atomic_cmpxchg_relaxed() calls will be safe. - * - * If PV isn't active, 0 will be returned instead. - * */ - if ((val = pv_wait_head_or_lock(lock, node))) - goto locked; - val = atomic_cond_read_acquire(&lock->val, !(VAL & _Q_LOCKED_PENDING_MASK)); -locked: /* * claim the lock: * @@ -341,11 +287,6 @@ void __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val) */ /* - * In the PV case we might already have _Q_LOCKED_VAL set, because - * of lock stealing; therefore we must also allow: - * - * n,0,1 -> 0,0,1 - * * Note: at this point: (val & _Q_PENDING_MASK) == 0, because of the * above wait condition, therefore any concurrent setting of * PENDING will make the uncontended transition fail. 
@@ -369,7 +310,6 @@ void __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val)
 		next = smp_cond_load_relaxed(&node->next, (VAL));
 
 	arch_mcs_spin_unlock_contended(&next->locked);
-	pv_kick_node(lock, next);
 
 release:
 	trace_contention_end(lock, 0);
@@ -380,32 +320,3 @@ void __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val)
 	__this_cpu_dec(rqnodes[0].mcs.count);
 }
 EXPORT_SYMBOL(resilient_queued_spin_lock_slowpath);
-
-/*
- * Generate the paravirt code for resilient_queued_spin_unlock_slowpath().
- */
-#if !defined(_GEN_PV_LOCK_SLOWPATH) && defined(CONFIG_PARAVIRT_SPINLOCKS)
-#define _GEN_PV_LOCK_SLOWPATH
-
-#undef pv_enabled
-#define pv_enabled()	true
-
-#undef pv_init_node
-#undef pv_wait_node
-#undef pv_kick_node
-#undef pv_wait_head_or_lock
-
-#undef resilient_queued_spin_lock_slowpath
-#define resilient_queued_spin_lock_slowpath	__pv_resilient_queued_spin_lock_slowpath
-
-#include "qspinlock_paravirt.h"
-#include "rqspinlock.c"
-
-bool nopvspin;
-static __init int parse_nopvspin(char *arg)
-{
-	nopvspin = true;
-	return 0;
-}
-early_param("nopvspin", parse_nopvspin);
-#endif

From patchwork Mon Mar 3 15:22:47 2025
X-Patchwork-Submitter: Kumar Kartikeya Dwivedi
X-Patchwork-Id: 13999025
X-Patchwork-Delegate: bpf@iogearbox.net
From: Kumar Kartikeya Dwivedi
To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Barret Rhoden, Linus Torvalds, Peter Zijlstra, Will Deacon, Waiman Long,
    Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, Martin KaFai Lau,
    Eduard Zingerman, "Paul E.
McKenney" , Tejun Heo , Josh Don , Dohyun Kim , linux-arm-kernel@lists.infradead.org, kkd@meta.com, kernel-team@meta.com Subject: [PATCH bpf-next v3 07/25] rqspinlock: Add support for timeouts Date: Mon, 3 Mar 2025 07:22:47 -0800 Message-ID: <20250303152305.3195648-8-memxor@gmail.com> X-Mailer: git-send-email 2.43.5 In-Reply-To: <20250303152305.3195648-1-memxor@gmail.com> References: <20250303152305.3195648-1-memxor@gmail.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=4618; h=from:subject; bh=yvqL/a4hlJcETEVluNNRDX4NLx5YZM2izb98JHL7vqQ=; b=owEBbQKS/ZANAwAIAUzgyIZIvxHKAcsmYgBnxcWW5YAHky9QnaXijfLpbPfmrcJbz9mGABaMN7Gq 0o42/62JAjMEAAEIAB0WIQRLvip+Buz51YI8YRFM4MiGSL8RygUCZ8XFlgAKCRBM4MiGSL8RyhFZD/ 98oVQe8PFJhIllEShh4nhwUHn8NyXfjHgZA8EYD/afmXUtbe6cebWH5AkecePI/ENfeIJGVq2k75qO bvWe4RtiqLDWBsJl0E7U+s2KeDZ0DSk9f2WEvtzxSmCp/7UvUNNCppI/KwpLK+0e7hAe+GARXNg60S gvFuMulaOvXckRa9tpI+Hr6QKTNcN2LRmRExBXLd9T3LEoZTDJmyajfWO+Rfz5YU0g5KANdcy5e0qn GSzgVdRNgcSQOk3Zbp4619EukeKa9f23Tg02x5DCOUpx6mZ5nLqck/vFZ9oU8Webx+07h2RX91o2Kv TN1pq4jrCnQVVKWpOXrlnqPje97GpGVOCULGt1HNSZjpiEHi1aaxH6G0Y04RUnW0tHWAoxe7UDJk0o fCSuU0DhZpfltTY7/WE+Vf0bJFidSjLAaFUBjfaGSjVpQiuT/NaM6RgcLOAfFZ+kZKor4iNdDigsyS uMikiOXPVSq5YpOAmNxGz++xsEFagK0j1pVs6TqaC0dd7hKb57Ww60taY11XX+2jebXLUb/Ujov7TM eHo97ARRBCdiHc7+5UtfXPPw64OraEi2LfbonjvDU0tqtoL2yij/h7RNtLLUQP0RCqFJfuaR1AezSv BHNMKZgxnulhR86H/4dzv/v/FYXTwu9j3CZEhgrEpd9sdWsCL1hOwLpR5Hcg== X-Developer-Key: i=memxor@gmail.com; a=openpgp; fpr=4BBE2A7E06ECF9D5823C61114CE0C88648BF11CA X-Patchwork-Delegate: bpf@iogearbox.net Introduce policy macro RES_CHECK_TIMEOUT which can be used to detect when the timeout has expired for the slow path to return an error. It depends on being passed two variables initialized to 0: ts, ret. The 'ts' parameter is of type rqspinlock_timeout. This macro resolves to the (ret) expression so that it can be used in statements like smp_cond_load_acquire to break the waiting loop condition. The 'spin' member is used to amortize the cost of checking time by dispatching to the implementation every 64k iterations. The 'timeout_end' member is used to keep track of the timestamp that denotes the end of the waiting period. The 'ret' parameter denotes the status of the timeout, and can be checked in the slow path to detect timeouts after waiting loops. The 'duration' member is used to store the timeout duration for each waiting loop. The default timeout value defined in the header (RES_DEF_TIMEOUT) is 0.25 seconds. This macro will be used as a condition for waiting loops in the slow path. Since each waiting loop applies a fresh timeout using the same rqspinlock_timeout, we add a new RES_RESET_TIMEOUT as well to ensure the values can be easily reinitialized to the default state. 
Reviewed-by: Barret Rhoden Signed-off-by: Kumar Kartikeya Dwivedi --- include/asm-generic/rqspinlock.h | 6 +++++ kernel/locking/rqspinlock.c | 45 ++++++++++++++++++++++++++++++++ 2 files changed, 51 insertions(+) diff --git a/include/asm-generic/rqspinlock.h b/include/asm-generic/rqspinlock.h index 54860b519571..96cea871fdd2 100644 --- a/include/asm-generic/rqspinlock.h +++ b/include/asm-generic/rqspinlock.h @@ -10,10 +10,16 @@ #define __ASM_GENERIC_RQSPINLOCK_H #include +#include struct qspinlock; typedef struct qspinlock rqspinlock_t; extern void resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val); +/* + * Default timeout for waiting loops is 0.25 seconds + */ +#define RES_DEF_TIMEOUT (NSEC_PER_SEC / 4) + #endif /* __ASM_GENERIC_RQSPINLOCK_H */ diff --git a/kernel/locking/rqspinlock.c b/kernel/locking/rqspinlock.c index 98cdcc5f1784..6b547f85fa95 100644 --- a/kernel/locking/rqspinlock.c +++ b/kernel/locking/rqspinlock.c @@ -6,9 +6,11 @@ * (C) Copyright 2013-2014,2018 Red Hat, Inc. * (C) Copyright 2015 Intel Corp. * (C) Copyright 2015 Hewlett-Packard Enterprise Development LP + * (C) Copyright 2024 Meta Platforms, Inc. and affiliates. * * Authors: Waiman Long * Peter Zijlstra + * Kumar Kartikeya Dwivedi */ #include @@ -22,6 +24,7 @@ #include #include #include +#include /* * Include queued spinlock definitions and statistics code @@ -68,6 +71,45 @@ #include "mcs_spinlock.h" +struct rqspinlock_timeout { + u64 timeout_end; + u64 duration; + u16 spin; +}; + +static noinline int check_timeout(struct rqspinlock_timeout *ts) +{ + u64 time = ktime_get_mono_fast_ns(); + + if (!ts->timeout_end) { + ts->timeout_end = time + ts->duration; + return 0; + } + + if (time > ts->timeout_end) + return -ETIMEDOUT; + + return 0; +} + +#define RES_CHECK_TIMEOUT(ts, ret) \ + ({ \ + if (!(ts).spin++) \ + (ret) = check_timeout(&(ts)); \ + (ret); \ + }) + +/* + * Initialize the 'spin' member. + */ +#define RES_INIT_TIMEOUT(ts) ({ (ts).spin = 1; }) + +/* + * We only need to reset 'timeout_end', 'spin' will just wrap around as necessary. + * Duration is defined for each spin attempt, so set it here. + */ +#define RES_RESET_TIMEOUT(ts, _duration) ({ (ts).timeout_end = 0; (ts).duration = _duration; }) + /* * Per-CPU queue node structures; we can never have more than 4 nested * contexts: task, softirq, hardirq, nmi. @@ -100,11 +142,14 @@ static DEFINE_PER_CPU_ALIGNED(struct qnode, rqnodes[_Q_MAX_NODES]); void __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val) { struct mcs_spinlock *prev, *next, *node; + struct rqspinlock_timeout ts; u32 old, tail; int idx; BUILD_BUG_ON(CONFIG_NR_CPUS >= (1U << _Q_TAIL_CPU_BITS)); + RES_INIT_TIMEOUT(ts); + /* * Wait for in-progress pending->locked hand-overs with a bounded * number of spins so that we guarantee forward progress. 
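To illustrate the design outside the kernel, here is a self-contained
user-space analogue of the amortized check above; it mirrors check_timeout's
lazy deadline capture and the 16-bit 'spin' counter, but the program, its
names and its constants are illustrative only, not kernel code.

#include <stdint.h>
#include <stdio.h>
#include <time.h>

struct timeout_state {
	uint64_t timeout_end;	/* 0 until the first check captures a deadline */
	uint64_t duration;	/* budget in nanoseconds */
	uint16_t spin;		/* wraps every 65536 calls, gating the clock read */
};

static uint64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ull + ts.tv_nsec;
}

static int check_timeout(struct timeout_state *ts)
{
	uint64_t time = now_ns();

	if (!ts->timeout_end) {		/* first check: arm the deadline */
		ts->timeout_end = time + ts->duration;
		return 0;
	}
	return time > ts->timeout_end ? -1 : 0;	/* -1 stands in for -ETIMEDOUT */
}

int main(void)
{
	struct timeout_state ts = { .duration = 250000000ull, .spin = 1 };
	unsigned long iters = 0;
	int ret = 0;

	/* Busy loop: only 1 in 65536 iterations pays for a clock read. */
	while (!ret) {
		iters++;
		if (!ts.spin++)
			ret = check_timeout(&ts);
	}
	printf("timed out after %lu iterations\n", iters);
	return 0;
}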
From patchwork Mon Mar 3 15:22:48 2025
X-Patchwork-Submitter: Kumar Kartikeya Dwivedi
X-Patchwork-Id: 13999026
X-Patchwork-Delegate: bpf@iogearbox.net
From: Kumar Kartikeya Dwivedi
To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Ankur Arora, Linus Torvalds, Peter Zijlstra, Will Deacon, Waiman Long,
    Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, Martin KaFai Lau,
    Eduard Zingerman, "Paul E. McKenney", Tejun Heo, Barret Rhoden, Josh Don,
    Dohyun Kim, linux-arm-kernel@lists.infradead.org, kkd@meta.com,
    kernel-team@meta.com
Subject: [PATCH bpf-next v3 08/25] rqspinlock: Hardcode cond_acquire loops for arm64
Date: Mon, 3 Mar 2025 07:22:48 -0800
Message-ID: <20250303152305.3195648-9-memxor@gmail.com>
In-Reply-To: <20250303152305.3195648-1-memxor@gmail.com>
References: <20250303152305.3195648-1-memxor@gmail.com>

Currently, for rqspinlock usage, the implementations of smp_cond_load_acquire
(and thus, atomic_cond_read_acquire) are susceptible to stalls on arm64,
because they do not guarantee that the conditional expression will be
repeatedly invoked if the address being loaded from is not written to by
other CPUs. When support for event-streams is absent (which unblocks stuck
WFE-based loops every ~100us), we may end up being stuck forever. This
causes a problem for us, as we need to repeatedly invoke RES_CHECK_TIMEOUT
in the spin loop to break out when the timeout expires.
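To make the fallback shape concrete before the import below, here is a
self-contained user-space sketch of a spin-wait that re-evaluates its
condition on every iteration but samples the clock only every few hundred
spins, which is the structure the arm64 helper falls back to when the event
stream is unavailable. All names and constants are local to the sketch; it
is not kernel code.

#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

#define TIME_CHECK_COUNT 200	/* mirrors the spirit of smp_cond_time_check_count */

static uint64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ull + ts.tv_nsec;
}

/* Wait for *flag to become non-zero, or for the deadline to pass. */
static int spinwait_until(_Atomic int *flag, uint64_t time_limit_ns)
{
	unsigned int count = 0;

	for (;;) {
		if (atomic_load_explicit(flag, memory_order_acquire))
			return 0;		/* condition became true */
		if (count++ < TIME_CHECK_COUNT)
			continue;		/* amortize the clock read */
		if (now_ns() >= time_limit_ns)
			return -1;		/* timed out */
		count = 0;
	}
}

int main(void)
{
	_Atomic int flag = 0;	/* never set: forces the timeout path */

	if (spinwait_until(&flag, now_ns() + 100000000ull))
		puts("gave up after ~100ms without a store to the flag");
	return 0;
}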
Let us import the smp_cond_load_acquire_timewait implementation Ankur is proposing in [0], and then fallback to it once it is merged. While we rely on the implementation to amortize the cost of sampling check_timeout for us, it will not happen when event stream support is unavailable. This is not the common case, and it would be difficult to fit our logic in the time_expr_ns >= time_limit_ns comparison, hence just let it be. [0]: https://lore.kernel.org/lkml/20250203214911.898276-1-ankur.a.arora@oracle.com Cc: Ankur Arora Signed-off-by: Kumar Kartikeya Dwivedi --- arch/arm64/include/asm/rqspinlock.h | 93 +++++++++++++++++++++++++++++ kernel/locking/rqspinlock.c | 15 +++++ 2 files changed, 108 insertions(+) create mode 100644 arch/arm64/include/asm/rqspinlock.h diff --git a/arch/arm64/include/asm/rqspinlock.h b/arch/arm64/include/asm/rqspinlock.h new file mode 100644 index 000000000000..5b80785324b6 --- /dev/null +++ b/arch/arm64/include/asm/rqspinlock.h @@ -0,0 +1,93 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _ASM_RQSPINLOCK_H +#define _ASM_RQSPINLOCK_H + +#include + +/* + * Hardcode res_smp_cond_load_acquire implementations for arm64 to a custom + * version based on [0]. In rqspinlock code, our conditional expression involves + * checking the value _and_ additionally a timeout. However, on arm64, the + * WFE-based implementation may never spin again if no stores occur to the + * locked byte in the lock word. As such, we may be stuck forever if + * event-stream based unblocking is not available on the platform for WFE spin + * loops (arch_timer_evtstrm_available). + * + * Once support for smp_cond_load_acquire_timewait [0] lands, we can drop this + * copy-paste. + * + * While we rely on the implementation to amortize the cost of sampling + * cond_expr for us, it will not happen when event stream support is + * unavailable, time_expr check is amortized. This is not the common case, and + * it would be difficult to fit our logic in the time_expr_ns >= time_limit_ns + * comparison, hence just let it be. In case of event-stream, the loop is woken + * up at microsecond granularity. 
+ * + * [0]: https://lore.kernel.org/lkml/20250203214911.898276-1-ankur.a.arora@oracle.com + */ + +#ifndef smp_cond_load_acquire_timewait + +#define smp_cond_time_check_count 200 + +#define __smp_cond_load_relaxed_spinwait(ptr, cond_expr, time_expr_ns, \ + time_limit_ns) ({ \ + typeof(ptr) __PTR = (ptr); \ + __unqual_scalar_typeof(*ptr) VAL; \ + unsigned int __count = 0; \ + for (;;) { \ + VAL = READ_ONCE(*__PTR); \ + if (cond_expr) \ + break; \ + cpu_relax(); \ + if (__count++ < smp_cond_time_check_count) \ + continue; \ + if ((time_expr_ns) >= (time_limit_ns)) \ + break; \ + __count = 0; \ + } \ + (typeof(*ptr))VAL; \ +}) + +#define __smp_cond_load_acquire_timewait(ptr, cond_expr, \ + time_expr_ns, time_limit_ns) \ +({ \ + typeof(ptr) __PTR = (ptr); \ + __unqual_scalar_typeof(*ptr) VAL; \ + for (;;) { \ + VAL = smp_load_acquire(__PTR); \ + if (cond_expr) \ + break; \ + __cmpwait_relaxed(__PTR, VAL); \ + if ((time_expr_ns) >= (time_limit_ns)) \ + break; \ + } \ + (typeof(*ptr))VAL; \ +}) + +#define smp_cond_load_acquire_timewait(ptr, cond_expr, \ + time_expr_ns, time_limit_ns) \ +({ \ + __unqual_scalar_typeof(*ptr) _val; \ + int __wfe = arch_timer_evtstrm_available(); \ + \ + if (likely(__wfe)) { \ + _val = __smp_cond_load_acquire_timewait(ptr, cond_expr, \ + time_expr_ns, \ + time_limit_ns); \ + } else { \ + _val = __smp_cond_load_relaxed_spinwait(ptr, cond_expr, \ + time_expr_ns, \ + time_limit_ns); \ + smp_acquire__after_ctrl_dep(); \ + } \ + (typeof(*ptr))_val; \ +}) + +#endif + +#define res_smp_cond_load_acquire_timewait(v, c) smp_cond_load_acquire_timewait(v, c, 0, 1) + +#include + +#endif /* _ASM_RQSPINLOCK_H */ diff --git a/kernel/locking/rqspinlock.c b/kernel/locking/rqspinlock.c index 6b547f85fa95..efa937ea80d9 100644 --- a/kernel/locking/rqspinlock.c +++ b/kernel/locking/rqspinlock.c @@ -92,12 +92,21 @@ static noinline int check_timeout(struct rqspinlock_timeout *ts) return 0; } +/* + * Do not amortize with spins when res_smp_cond_load_acquire is defined, + * as the macro does internal amortization for us. + */ +#ifndef res_smp_cond_load_acquire #define RES_CHECK_TIMEOUT(ts, ret) \ ({ \ if (!(ts).spin++) \ (ret) = check_timeout(&(ts)); \ (ret); \ }) +#else +#define RES_CHECK_TIMEOUT(ts, ret, mask) \ + ({ (ret) = check_timeout(&(ts)); }) +#endif /* * Initialize the 'spin' member. 
@@ -118,6 +127,12 @@ static noinline int check_timeout(struct rqspinlock_timeout *ts)
  */
 static DEFINE_PER_CPU_ALIGNED(struct qnode, rqnodes[_Q_MAX_NODES]);
 
+#ifndef res_smp_cond_load_acquire
+#define res_smp_cond_load_acquire(v, c) smp_cond_load_acquire(v, c)
+#endif
+
+#define res_atomic_cond_read_acquire(v, c) res_smp_cond_load_acquire(&(v)->counter, (c))
+
 /**
  * resilient_queued_spin_lock_slowpath - acquire the queued spinlock
  * @lock: Pointer to queued spinlock structure

From patchwork Mon Mar 3 15:22:49 2025
X-Patchwork-Submitter: Kumar Kartikeya Dwivedi
X-Patchwork-Id: 13999027
X-Patchwork-Delegate: bpf@iogearbox.net
From: Kumar Kartikeya Dwivedi
To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Barret Rhoden, Linus Torvalds, Peter Zijlstra, Will Deacon, Waiman Long,
    Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, Martin KaFai Lau,
    Eduard Zingerman, "Paul E. McKenney", Tejun Heo, Josh Don, Dohyun Kim,
    linux-arm-kernel@lists.infradead.org, kkd@meta.com, kernel-team@meta.com
Subject: [PATCH bpf-next v3 09/25] rqspinlock: Protect pending bit owners from stalls
Date: Mon, 3 Mar 2025 07:22:49 -0800
Message-ID: <20250303152305.3195648-10-memxor@gmail.com>
In-Reply-To: <20250303152305.3195648-1-memxor@gmail.com>
References: <20250303152305.3195648-1-memxor@gmail.com>

The pending bit is used to avoid queueing in case the lock is uncontended,
and has demonstrated benefits for the 2 contender scenario, esp.
on x86. In case the pending bit is acquired and we wait for the locked bit to disappear, we may get stuck due to the lock owner not making progress. Hence, this waiting loop must be protected with a timeout check. To perform a graceful recovery once we decide to abort our lock acquisition attempt in this case, we must unset the pending bit since we own it. All waiters undoing their changes and exiting gracefully allows the lock word to be restored to the unlocked state once all participants (owner, waiters) have been recovered, and the lock remains usable. Hence, set the pending bit back to zero before returning to the caller. Introduce a lockevent (rqspinlock_lock_timeout) to capture timeout event statistics. Reviewed-by: Barret Rhoden Signed-off-by: Kumar Kartikeya Dwivedi --- include/asm-generic/rqspinlock.h | 2 +- kernel/locking/lock_events_list.h | 5 +++++ kernel/locking/rqspinlock.c | 28 +++++++++++++++++++++++----- 3 files changed, 29 insertions(+), 6 deletions(-) diff --git a/include/asm-generic/rqspinlock.h b/include/asm-generic/rqspinlock.h index 96cea871fdd2..d23793d8e64d 100644 --- a/include/asm-generic/rqspinlock.h +++ b/include/asm-generic/rqspinlock.h @@ -15,7 +15,7 @@ struct qspinlock; typedef struct qspinlock rqspinlock_t; -extern void resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val); +extern int resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val); /* * Default timeout for waiting loops is 0.25 seconds diff --git a/kernel/locking/lock_events_list.h b/kernel/locking/lock_events_list.h index 97fb6f3f840a..c5286249994d 100644 --- a/kernel/locking/lock_events_list.h +++ b/kernel/locking/lock_events_list.h @@ -49,6 +49,11 @@ LOCK_EVENT(lock_use_node4) /* # of locking ops that use 4th percpu node */ LOCK_EVENT(lock_no_node) /* # of locking ops w/o using percpu node */ #endif /* CONFIG_QUEUED_SPINLOCKS */ +/* + * Locking events for Resilient Queued Spin Lock + */ +LOCK_EVENT(rqspinlock_lock_timeout) /* # of locking ops that timeout */ + /* * Locking events for rwsem */ diff --git a/kernel/locking/rqspinlock.c b/kernel/locking/rqspinlock.c index efa937ea80d9..6be36798ded9 100644 --- a/kernel/locking/rqspinlock.c +++ b/kernel/locking/rqspinlock.c @@ -154,12 +154,12 @@ static DEFINE_PER_CPU_ALIGNED(struct qnode, rqnodes[_Q_MAX_NODES]); * contended : (*,x,y) +--> (*,0,0) ---> (*,0,1) -' : * queue : ^--' : */ -void __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val) +int __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val) { struct mcs_spinlock *prev, *next, *node; struct rqspinlock_timeout ts; + int idx, ret = 0; u32 old, tail; - int idx; BUILD_BUG_ON(CONFIG_NR_CPUS >= (1U << _Q_TAIL_CPU_BITS)); @@ -217,8 +217,25 @@ void __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val) * clear_pending_set_locked() implementations imply full * barriers. */ - if (val & _Q_LOCKED_MASK) - smp_cond_load_acquire(&lock->locked, !VAL); + if (val & _Q_LOCKED_MASK) { + RES_RESET_TIMEOUT(ts, RES_DEF_TIMEOUT); + res_smp_cond_load_acquire(&lock->locked, !VAL || RES_CHECK_TIMEOUT(ts, ret)); + } + + if (ret) { + /* + * We waited for the locked bit to go back to 0, as the pending + * waiter, but timed out. We need to clear the pending bit since + * we own it. Once a stuck owner has been recovered, the lock + * must be restored to a valid state, hence removing the pending + * bit is necessary. 
+		 *
+		 * *,1,* -> *,0,*
+		 */
+		clear_pending(lock);
+		lockevent_inc(rqspinlock_lock_timeout);
+		return ret;
+	}
 
 	/*
 	 * take ownership and clear the pending bit.
@@ -227,7 +244,7 @@ void __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val)
 	 */
 	clear_pending_set_locked(lock);
 	lockevent_inc(lock_pending);
-	return;
+	return 0;
 
 	/*
 	 * End of pending bit optimistic spinning and beginning of MCS
@@ -378,5 +395,6 @@ void __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val)
 	 * release the node
 	 */
 	__this_cpu_dec(rqnodes[0].mcs.count);
+	return 0;
 }
 EXPORT_SYMBOL(resilient_queued_spin_lock_slowpath);

From patchwork Mon Mar 3 15:22:50 2025
X-Patchwork-Submitter: Kumar Kartikeya Dwivedi
X-Patchwork-Id: 13999028
X-Patchwork-Delegate: bpf@iogearbox.net
From: Kumar Kartikeya Dwivedi
To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Barret Rhoden, Linus Torvalds, Peter Zijlstra, Will Deacon, Waiman Long,
    Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, Martin KaFai Lau,
    Eduard Zingerman, "Paul E.
McKenney" , Tejun Heo , Josh Don , Dohyun Kim , linux-arm-kernel@lists.infradead.org, kkd@meta.com, kernel-team@meta.com Subject: [PATCH bpf-next v3 10/25] rqspinlock: Protect waiters in queue from stalls Date: Mon, 3 Mar 2025 07:22:50 -0800 Message-ID: <20250303152305.3195648-11-memxor@gmail.com> X-Mailer: git-send-email 2.43.5 In-Reply-To: <20250303152305.3195648-1-memxor@gmail.com> References: <20250303152305.3195648-1-memxor@gmail.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=8548; h=from:subject; bh=/tO7WsHRnQ1y7o7jGLLRZ75jx2rvQksn4PRPhYxulGw=; b=owEBbQKS/ZANAwAIAUzgyIZIvxHKAcsmYgBnxcWXBmmTn9Q/JpHcUWagL6D0LQj7D9XnazfmRv20 F3ZP162JAjMEAAEIAB0WIQRLvip+Buz51YI8YRFM4MiGSL8RygUCZ8XFlwAKCRBM4MiGSL8RyoaOD/ 9RNH1I8JHE99sQbIlMiKmtHTFHuJCcV0nvMTdA9wCT6zW9DamNJ7yjyl81IYTGgDbSp7m2QaZu788r wMj5fcd2iwhWcBBc3PlDuAKaTQwSz8B5BN1i6UV7vcK1E/AtpCEpvKXpqZH1SSiPx2YkgVYq8KmZxt Z/dIeZESDo9b9d1uSbJmJosgaNhZz1AbbuYEp6kniedp3jozmo3YFBcOue3YFKPe2Kg4NTJhwyRrXv tULCtm9RRxbu6QxV+kLPyqhYo74CyHJaolrRbLzf8we5rzaMwdJnBzFvKd7i2ZpNaoWQxykdCanhPC kPtXLBOdzzqKzwxaC1t4g3BJ7UIeouteWNdMi5qZqLhvMLByPg/i0nacGi4CD64Lriiq0FL+AMSlTM GZPhan3dSlPjsN45LIuBIPnN5XtrmgKV629AYm01KKITf7cSzqgl2iJ3dpqSasXs//53rgJxbnJaVq VOv1pbHmn+VLbJiV2X88Q43nhN1CwA5HGLBirqzzZ4K0hRzxuo9oGT6S2r76PZFDucgeMDRJB9c7Mm GUAgAQ0RCNDP+ehlhqzyWE0+v7m1qVYImUE0vh4QbgRxK8wEBGaupTPCZ0wHn0ruv4PtHa7PABtP+B PRV0YKX9hGtJ//vt5pVqTc7HjT1GByFMSa0MhSbB3s6F+v0Qkm6vjB3b7DBQ== X-Developer-Key: i=memxor@gmail.com; a=openpgp; fpr=4BBE2A7E06ECF9D5823C61114CE0C88648BF11CA X-Patchwork-Delegate: bpf@iogearbox.net Implement the wait queue cleanup algorithm for rqspinlock. There are three forms of waiters in the original queued spin lock algorithm. The first is the waiter which acquires the pending bit and spins on the lock word without forming a wait queue. The second is the head waiter that is the first waiter heading the wait queue. The third form is of all the non-head waiters queued behind the head, waiting to be signalled through their MCS node to overtake the responsibility of the head. In this commit, we are concerned with the second and third kind. First, we augment the waiting loop of the head of the wait queue with a timeout. When this timeout happens, all waiters part of the wait queue will abort their lock acquisition attempts. This happens in three steps. First, the head breaks out of its loop waiting for pending and locked bits to turn to 0, and non-head waiters break out of their MCS node spin (more on that later). Next, every waiter (head or non-head) attempts to check whether they are also the tail waiter, in such a case they attempt to zero out the tail word and allow a new queue to be built up for this lock. If they succeed, they have no one to signal next in the queue to stop spinning. Otherwise, they signal the MCS node of the next waiter to break out of its spin and try resetting the tail word back to 0. This goes on until the tail waiter is found. In case of races, the new tail will be responsible for performing the same task, as the old tail will then fail to reset the tail word and wait for its next pointer to be updated before it signals the new tail to do the same. We terminate the whole wait queue because of two main reasons. Firstly, we eschew per-waiter timeouts with one applied at the head of the wait queue. 
This allows everyone to break out faster once we've seen the owner / pending waiter not responding for the timeout duration from the head. Secondly, it avoids complicated synchronization, because when not leaving in FIFO order, prev's next pointer needs to be fixed up etc. Lastly, all of these waiters release the rqnode and return to the caller. This patch underscores the point that rqspinlock's timeout does not apply to each waiter individually, and cannot be relied upon as an upper bound. It is possible for the rqspinlock waiters to return early from a failed lock acquisition attempt as soon as stalls are detected. The head waiter cannot directly WRITE_ONCE the tail to zero, as it may race with a concurrent xchg and a non-head waiter linking its MCS node to the head's MCS node through 'prev->next' assignment. One notable thing is that we must use RES_DEF_TIMEOUT * 2 as our maximum duration for the waiting loop (for the wait queue head), since we may have both the owner and pending bit waiter ahead of us, and in the worst case, need to span their maximum permitted critical section lengths. Reviewed-by: Barret Rhoden Signed-off-by: Kumar Kartikeya Dwivedi --- kernel/locking/rqspinlock.c | 55 +++++++++++++++++++++++++++++++++++-- kernel/locking/rqspinlock.h | 48 ++++++++++++++++++++++++++++++++ 2 files changed, 100 insertions(+), 3 deletions(-) create mode 100644 kernel/locking/rqspinlock.h diff --git a/kernel/locking/rqspinlock.c b/kernel/locking/rqspinlock.c index 6be36798ded9..9ad18b3c46f7 100644 --- a/kernel/locking/rqspinlock.c +++ b/kernel/locking/rqspinlock.c @@ -77,6 +77,8 @@ struct rqspinlock_timeout { u16 spin; }; +#define RES_TIMEOUT_VAL 2 + static noinline int check_timeout(struct rqspinlock_timeout *ts) { u64 time = ktime_get_mono_fast_ns(); @@ -321,12 +323,18 @@ int __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val) * head of the waitqueue. */ if (old & _Q_TAIL_MASK) { + int val; + prev = decode_tail(old, rqnodes); /* Link @node into the waitqueue. */ WRITE_ONCE(prev->next, node); - arch_mcs_spin_lock_contended(&node->locked); + val = arch_mcs_spin_lock_contended(&node->locked); + if (val == RES_TIMEOUT_VAL) { + ret = -EDEADLK; + goto waitq_timeout; + } /* * While waiting for the MCS lock, the next pointer may have @@ -349,8 +357,49 @@ int __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val) * store-release that clears the locked bit and create lock * sequentiality; this is because the set_locked() function below * does not imply a full barrier. + * + * We use RES_DEF_TIMEOUT * 2 as the duration, as RES_DEF_TIMEOUT is + * meant to span maximum allowed time per critical section, and we may + * have both the owner of the lock and the pending bit waiter ahead of + * us. */ - val = atomic_cond_read_acquire(&lock->val, !(VAL & _Q_LOCKED_PENDING_MASK)); + RES_RESET_TIMEOUT(ts, RES_DEF_TIMEOUT * 2); + val = res_atomic_cond_read_acquire(&lock->val, !(VAL & _Q_LOCKED_PENDING_MASK) || + RES_CHECK_TIMEOUT(ts, ret)); + +waitq_timeout: + if (ret) { + /* + * If the tail is still pointing to us, then we are the final waiter, + * and are responsible for resetting the tail back to 0. Otherwise, if + * the cmpxchg operation fails, we signal the next waiter to take exit + * and try the same. For a waiter with tail node 'n': + * + * n,*,* -> 0,*,* + * + * When performing cmpxchg for the whole word (NR_CPUS > 16k), it is + * possible locked/pending bits keep changing and we see failures even + * when we remain the head of wait queue. 
However, eventually, + * pending bit owner will unset the pending bit, and new waiters + * will queue behind us. This will leave the lock owner in + * charge, and it will eventually either set locked bit to 0, or + * leave it as 1, allowing us to make progress. + * + * We terminate the whole wait queue for two reasons. Firstly, + * we eschew per-waiter timeouts with one applied at the head of + * the wait queue. This allows everyone to break out faster + * once we've seen the owner / pending waiter not responding for + * the timeout duration from the head. Secondly, it avoids + * complicated synchronization, because when not leaving in FIFO + * order, prev's next pointer needs to be fixed up etc. + */ + if (!try_cmpxchg_tail(lock, tail, 0)) { + next = smp_cond_load_relaxed(&node->next, VAL); + WRITE_ONCE(next->locked, RES_TIMEOUT_VAL); + } + lockevent_inc(rqspinlock_lock_timeout); + goto release; + } /* * claim the lock: @@ -395,6 +444,6 @@ int __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val) * release the node */ __this_cpu_dec(rqnodes[0].mcs.count); - return 0; + return ret; } EXPORT_SYMBOL(resilient_queued_spin_lock_slowpath); diff --git a/kernel/locking/rqspinlock.h b/kernel/locking/rqspinlock.h new file mode 100644 index 000000000000..3cec3a0f2d7e --- /dev/null +++ b/kernel/locking/rqspinlock.h @@ -0,0 +1,48 @@ +/* SPDX-License-Identifier: GPL-2.0-or-later */ +/* + * Resilient Queued Spin Lock defines + * + * (C) Copyright 2024 Meta Platforms, Inc. and affiliates. + * + * Authors: Kumar Kartikeya Dwivedi + */ +#ifndef __LINUX_RQSPINLOCK_H +#define __LINUX_RQSPINLOCK_H + +#include "qspinlock.h" + +/* + * try_cmpxchg_tail - Return result of cmpxchg of tail word with a new value + * @lock: Pointer to queued spinlock structure + * @tail: The tail to compare against + * @new_tail: The new queue tail code word + * Return: Bool to indicate whether the cmpxchg operation succeeded + * + * This is used by the head of the wait queue to clean up the queue. + * Provides relaxed ordering, since observers only rely on initialized + * state of the node which was made visible through the xchg_tail operation, + * i.e. through the smp_wmb preceding xchg_tail. + * + * We avoid using 16-bit cmpxchg, which is not available on all architectures. + */ +static __always_inline bool try_cmpxchg_tail(struct qspinlock *lock, u32 tail, u32 new_tail) +{ + u32 old, new; + + old = atomic_read(&lock->val); + do { + /* + * Is the tail part we compare to already stale? Fail. + */ + if ((old & _Q_TAIL_MASK) != tail) + return false; + /* + * Encode latest locked/pending state for new tail. 
+ */ + new = (old & _Q_LOCKED_PENDING_MASK) | new_tail; + } while (!atomic_try_cmpxchg_relaxed(&lock->val, &old, new)); + + return true; +} + +#endif /* __LINUX_RQSPINLOCK_H */ From patchwork Mon Mar 3 15:22:51 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kumar Kartikeya Dwivedi X-Patchwork-Id: 13999029 X-Patchwork-Delegate: bpf@iogearbox.net Received: from mail-wr1-f68.google.com (mail-wr1-f68.google.com [209.85.221.68]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 6E44122689C; Mon, 3 Mar 2025 15:23:26 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.221.68 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1741015408; cv=none; b=lBa6jqcrgh6pk8oD93xTvvOMo4bYAZPVFsFADYjWadhH17C8QK7/tL2PU+gI74c3oooX/64JrxkH/o+lo8bU8XPEgyeuyiHTPZ5mWzJoP82cVl9ir8OwasQLdHAN5fQJkuBGDCOTNAiWN7x/H8IsQgLbAbbX5hGXlYn9pi08Qu4= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1741015408; c=relaxed/simple; bh=Q9a38BvImlUP8RKT7WE0lh2B2I6Iy4KpJSDwsgIePN4=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=XCvQqvA4BLUq9vFTKhqbkiglTir4NSX3aRb8wxUgxlpU4WFFJX4CAv3Ag5NUMhLRwpb3nIhl7MAZYCb4VtKXieimOPQMkn0vUx2eRtkYLH/TN0z+WYFxY4yOegQTjb0tRKlTmB8uHkL9Rp/orB7knWnkNuzOSCjkA/eMArqMICo= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=OJseEvj3; arc=none smtp.client-ip=209.85.221.68 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="OJseEvj3" Received: by mail-wr1-f68.google.com with SMTP id ffacd0b85a97d-38a25d4b9d4so2800841f8f.0; Mon, 03 Mar 2025 07:23:26 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1741015404; x=1741620204; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=3HqVASaJxCqTvezKR1/7pImOEJNTa/8ze2+bN0p0ybM=; b=OJseEvj3XZ/R5cm5xFxU41w/qS4yftAmuBkTGibF9q7vkW1wx7hAtGwbTaV97shn81 OO0YspiKCRyqIKU45QR/uhfBDO98FjaurMo2cblur57AigQBPOuD6CbJpn9XajQ3HHgQ ywkXLcr3DB1fXTSY2JpfRoMG7ySMpMnmwWt6OdsFJfcis+3vBreIdl74mX+k0H+3AMhD +Imxht8gr/0ugpjvQ7N0E97pugH+yrRVRDThr0zxCkc1tXghfvXkpit0KyhthTjbpWfx NHP4+/7d7eJd5lAFsPtXveNyA/wkkAmioBhQ6tOslmOGBDHWyvadj6i4N8TvITFMUSyc NVjg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1741015404; x=1741620204; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=3HqVASaJxCqTvezKR1/7pImOEJNTa/8ze2+bN0p0ybM=; b=YRt0e29N05C9pEuSbWyzEy6PctiXbo0pFivHcISG+Bjkqfk17m4WID5QPW6ZmW/YKf /zgGzeoBAb70W52Kmj/CeCFXnxU5QMdl7y9ACwDwfcIy1WXz4WlHIGRiDaIq6Hg4HP8V 635yWfDkr+7CDfwd++VQUtrtA9M1IG/MbKYtPk3QCV1QE0EMvwJc0XlKUeearJ1Oib/1 Zj7JitRhfmm02kiVWfUZTYJ0vTVGCp7pMyAeh2D7mrsYakxqe7gqdQHURB4h8E5n9UHd bVTV68bFIn4pL3p+/1xP9nGXi95FcQs3Rrp3wq1XXjCn504L9qvU0XURITbzLxKOgElH NRmg== 
X-Forwarded-Encrypted: i=1; AJvYcCWxVa5ad8Bn5Zz4wu2w+VIHkjItAiI1dW82POIbNXCw41zZWgoAbJfdYrdD4AowYC2Ks/huSqjf0fpBfS0=@vger.kernel.org X-Gm-Message-State: AOJu0YwVuCQ7HSR3IChRSdfhI8WufXp03rJoF6rzxR/QcwpxEJhqVN1V FY74/JrL/mO1FZhRszRqP41QUINjZDzsNBOuKZjp5Xf4Fv8X1Un3VSGK9BL4cgQ= X-Gm-Gg: ASbGncsWVl1mZu9OKKkuHsUGmv4l1JyIMaMEguUE1sVNW75aC/khFFH+i5a/SjsRWbH JJWs+7VpZHfA78jTU9eWsciWjjXnCSV20HS+ZqS7fwnnKOxRyXEjTuPIvOa1rPR/0KJzbcGfwEC B8nC/axVyUM61ioQATXk7sQo4RrDQ9EvXi/BXIi3ddLY7/Z2vqHGLOOYNNtNtvP59cZOuQhVEX5 12SPAq2uGhMb2DOBTPnRYVZVt3xuUopH9vt6rK6OpBM0xkhW4VFrUW/uGdS2VWwSfKu4iSE273u ssB83VdqGwitJnBCOIBIQ5DkekxFCsmrcTc= X-Google-Smtp-Source: AGHT+IGJtvDm6QNWOZ/Cpj2mx/hKnagicnAIvDMSVdaEmiufPdG1r9WWQB7409X5ARNMd+GMAA5o0g== X-Received: by 2002:a5d:5c84:0:b0:390:f1cb:286e with SMTP id ffacd0b85a97d-390f1cb2ceemr5781942f8f.27.1741015404157; Mon, 03 Mar 2025 07:23:24 -0800 (PST) Received: from localhost ([2a03:2880:31ff:44::]) by smtp.gmail.com with ESMTPSA id ffacd0b85a97d-390e47a7473sm14977125f8f.38.2025.03.03.07.23.23 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 03 Mar 2025 07:23:23 -0800 (PST) From: Kumar Kartikeya Dwivedi To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org Cc: Barret Rhoden , Linus Torvalds , Peter Zijlstra , Will Deacon , Waiman Long , Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , Martin KaFai Lau , Eduard Zingerman , "Paul E. McKenney" , Tejun Heo , Josh Don , Dohyun Kim , linux-arm-kernel@lists.infradead.org, kkd@meta.com, kernel-team@meta.com Subject: [PATCH bpf-next v3 11/25] rqspinlock: Protect waiters in trylock fallback from stalls Date: Mon, 3 Mar 2025 07:22:51 -0800 Message-ID: <20250303152305.3195648-12-memxor@gmail.com> X-Mailer: git-send-email 2.43.5 In-Reply-To: <20250303152305.3195648-1-memxor@gmail.com> References: <20250303152305.3195648-1-memxor@gmail.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=1845; h=from:subject; bh=Q9a38BvImlUP8RKT7WE0lh2B2I6Iy4KpJSDwsgIePN4=; b=owEBbQKS/ZANAwAIAUzgyIZIvxHKAcsmYgBnxcWXG8w8qbawJw3OszHwUL3Z+OsS55BpLQ/c3hWE /xTGsM2JAjMEAAEIAB0WIQRLvip+Buz51YI8YRFM4MiGSL8RygUCZ8XFlwAKCRBM4MiGSL8Ryn5oD/ 96RVXMaF0ne5BLzCBhyNtkJBr6CAlauxQ3pblR3jmHlcP2GkKE00BZqXSuo5K0M0W8hzmyVBm8WZpv QmAprnJvjPvw3bTmZO232VjrI6ST5tJSd6XooAlRFPv85J3lQFS6z5IVGVtrJ/t6kbOg+xfwatWJVH JpGj9jJXyCIiXjHxaK91EcYxqzLOnLljC6WmGxE1N0lmh870+qkXn0vCZjMQ0lyW8wHEBbZ6Ywy8Gz 55T4SsC+beWbwhBNCJn02UhnONKtZqi0rm69Mb6AY+eRzPn9ZAI/n8F3jYqkY+yDCFq1Z4mdsWAsU+ UcxEA0xxZEA1gBVbVzAhkYvCHyFYnA/ZKE/8EZl6+ARIPGiQVPEqYETfeYZTVMxznw060kQAguI7WX WEFHRA8OzFTHycOX3/U8m+f3+9dgBhhVfpf0wJJHSMzNPR8J1we3rcdGBFiA3d8MdoQPth0iHEU2zP AGaEkhSEo4gXrUXNZnLq3ZpLEB+nn1ImiZ31A2o/3uJ4Vw9/x4a0iPKa01RJjV83Kjms2TGGOvbW2N GPkv4fOWnkSZv94YXDW/lzuzDng6iHB/gMJzFDkesuEjSysN2zSjMlyomMe3ja5puRwyFe2tE80jLM qADm2HxuhDZkxotSHNpoBoyUppVvy95LQycs0czfCa2Iqw1nW/eGWv4P5IYQ== X-Developer-Key: i=memxor@gmail.com; a=openpgp; fpr=4BBE2A7E06ECF9D5823C61114CE0C88648BF11CA X-Patchwork-Delegate: bpf@iogearbox.net When we run out of maximum rqnodes, the original queued spin lock slow path falls back to a try lock. In such a case, we are again susceptible to stalls in case the lock owner fails to make progress. We use the timeout as a fallback to break out of this loop and return to the caller. This is a fallback for an extreme edge case, when on the same CPU we run out of all 4 qnodes. When could this happen? 
We are in the slow path in task context and get interrupted by an IRQ, which while in the slow path gets interrupted by an NMI, which while in the slow path gets interrupted by another nested NMI, which then enters the slow path. All of the interruptions happen after node->count++. We use RES_DEF_TIMEOUT as our spinning duration, but in the case of this fallback, no fairness is guaranteed, so the duration may be too small for contended cases, as the waiting time is not bounded. Since this is an extreme corner case, let's just prefer timing out instead of attempting to spin for longer. Reviewed-by: Barret Rhoden Signed-off-by: Kumar Kartikeya Dwivedi --- kernel/locking/rqspinlock.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/kernel/locking/rqspinlock.c b/kernel/locking/rqspinlock.c index 9ad18b3c46f7..16ec1b9eb005 100644 --- a/kernel/locking/rqspinlock.c +++ b/kernel/locking/rqspinlock.c @@ -271,8 +271,14 @@ int __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val) */ if (unlikely(idx >= _Q_MAX_NODES)) { lockevent_inc(lock_no_node); - while (!queued_spin_trylock(lock)) + RES_RESET_TIMEOUT(ts, RES_DEF_TIMEOUT); + while (!queued_spin_trylock(lock)) { + if (RES_CHECK_TIMEOUT(ts, ret)) { + lockevent_inc(rqspinlock_lock_timeout); + break; + } cpu_relax(); + } goto release; } From patchwork Mon Mar 3 15:22:52 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kumar Kartikeya Dwivedi X-Patchwork-Id: 13999030 X-Patchwork-Delegate: bpf@iogearbox.net Received: from mail-wm1-f66.google.com (mail-wm1-f66.google.com [209.85.128.66]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id E8C9522A817; Mon, 3 Mar 2025 15:23:27 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.128.66 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1741015410; cv=none; b=dgL0eEVqSw+R+GDGYSAwFum8k/1ySlk+YWJe+gJom5b2rZVKPMI05/tIY/ZBIy1vmDy/Hkcg5wYUdaNM6Y2Xa3zyTNGaiTcboG2V6hOak9yuVbInG47D/s5dQcmIWRQxItld/PzV8H6XPffvICi9BM9RT8hNbQNkQ3l4t+azZoo= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1741015410; c=relaxed/simple; bh=q+PlHVYmXyQy+isvTJhD2W+ttvErhONLhj2AWs0Sbuk=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=ZtIKLl7G6Mbvm9g1hm/V6jDWNpc6DciUsvD1rLpHF4+9mULPBLD0KUp1Rx8qDiPLCtlBNiWTo33zhRqXSmrFCs4hLC+TXVNJ4jxq2q29eYDExwWeV2MPYN9t98GBsBzsCGb5EOHApfVX8MVKwzWO1WN6QrywR/UImyEP9QD0Lk0= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=Gfpo1w9S; arc=none smtp.client-ip=209.85.128.66 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="Gfpo1w9S" Received: by mail-wm1-f66.google.com with SMTP id 5b1f17b1804b1-4399a1eada3so42083525e9.2; Mon, 03 Mar 2025 07:23:27 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1741015406; x=1741620206; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to
:message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=tKa/wmg3ittWkt3o2YJLMmVqh431E62mqFSXV7PYA3Q=; b=Gfpo1w9SY0es2ojVK4tXzK9oaUkJRiBDd6tTAqhqmWLz2H9h1Nqya9MzeILzaFKZaK uiU95nTuLiwxT5Z87flL7E3QcPD7jO5Y5ITXc3UCbOnEKWkUGD7x617JoDI8Y+McPdO7 cLZef0P+k4rQAC4M1rSF1nWMPDvgacY8agL/Ql87Ll/Q7hRNQrtJ0TOUY9w8KxRzVa5+ FRZEPGEHlmMfyB51ZQmQZmMPSvh3dEk29WjrhXiCxtzVTLEofna2sxmmWbE+1n6oN4DN HF4e15uYVtLx/+dnDCPPiq5XIPVLDGzrOu7mAfVWbG1Wl50q6XkQJ+JPnT9BkFqLvpSf II6A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1741015406; x=1741620206; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=tKa/wmg3ittWkt3o2YJLMmVqh431E62mqFSXV7PYA3Q=; b=rbuFCSJneXsZejfAqGcwCRrSwPfbflpeyVe2Jmsw+Y5Fqk9mz+trjtDsHEGup2hpGY t5ZAAQreYwCkad3184U2qE2vzTJVO8SLyuIpAgY0z4RS9h27ZFGGDFWEOHfe7ar0ik+U VPM97dg0Y+It0Rj6983g1zTB/U7gFv2P3OGQFW+iBlWktM9RTqTiKQGh1XBU+bpq4xe/ RpxkddAUelqW9hvk44O5YE8d/ieeQrrKuq9E0LWtkb3Ezg21mwugaW+I1f3/ty0XBUg9 Z5u2n/4LhJ62YI060xhNsad6NScV2o2HZE1rN5pc6Em4Ug9iwMeFjbH02rGBpL3cuu+i ecPA== X-Forwarded-Encrypted: i=1; AJvYcCUpy1RdemLSndtY064snxmvnR0anS5d6V25B/UY7ix6P+DWh5+2+APoI2hX2gnBuzkeZS7o1Q7rPXatksY=@vger.kernel.org X-Gm-Message-State: AOJu0YxWIOQ0vCAlHTufBBd38X0oy5yqDRbMoX+Pc8yxKSiSzZbPIp6j qfUJHje5ezZAjjtEXNtGrMXZd6dIlEkbIKGKnJw+1o+cHlrG1nPDzuZL7pfVdJA= X-Gm-Gg: ASbGnct1RNwXHqDpkqYsPw3L8TW3YbSlzZZ1zm97LrB2DDJJY+bQeZHibK4QUUxVPP0 yMq9U82CEEmCy4cTzjXDoBmSMJRIr0prjZLc3FLg2jAe6PlyaVtn6EijSXMIhVyJZHYlycuwqYA H6OlbKPbJpJ0g7r+mY5VrWjWfH+DHdRSoTceDBYL4gCAFfi9jKE3auiBlWSKzoYEMDLH8vH1c21 znA2FaiVa12+iYZPmJsyBtHjTPqscHawhJvo3gZ37KmLRjKGbTvtFHRGqkJl33K1agOm2o0Do7o oJzOZpPOnlpJQNWUX4giML/7lKyJ+Q+IEHQ= X-Google-Smtp-Source: AGHT+IG1blRJwAMlXeHRPq+X+kGTGI57ZLlk81V9zn4NhYMCeonnkf5jNU1HzLwhFv1wOWVdopS8gA== X-Received: by 2002:a5d:64cf:0:b0:38f:30a3:51fe with SMTP id ffacd0b85a97d-390eca53071mr9776559f8f.42.1741015405695; Mon, 03 Mar 2025 07:23:25 -0800 (PST) Received: from localhost ([2a03:2880:31ff:4f::]) by smtp.gmail.com with ESMTPSA id ffacd0b85a97d-390e4844ac6sm14636626f8f.71.2025.03.03.07.23.24 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 03 Mar 2025 07:23:24 -0800 (PST) From: Kumar Kartikeya Dwivedi To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org Cc: Linus Torvalds , Peter Zijlstra , Will Deacon , Waiman Long , Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , Martin KaFai Lau , Eduard Zingerman , "Paul E. 
McKenney" , Tejun Heo , Barret Rhoden , Josh Don , Dohyun Kim , linux-arm-kernel@lists.infradead.org, kkd@meta.com, kernel-team@meta.com Subject: [PATCH bpf-next v3 12/25] rqspinlock: Add deadlock detection and recovery Date: Mon, 3 Mar 2025 07:22:52 -0800 Message-ID: <20250303152305.3195648-13-memxor@gmail.com> X-Mailer: git-send-email 2.43.5 In-Reply-To: <20250303152305.3195648-1-memxor@gmail.com> References: <20250303152305.3195648-1-memxor@gmail.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=16419; h=from:subject; bh=q+PlHVYmXyQy+isvTJhD2W+ttvErhONLhj2AWs0Sbuk=; b=owEBbQKS/ZANAwAIAUzgyIZIvxHKAcsmYgBnxcWX+cvAWmiPyeTrpA7wf0kdH6RoTzfmmo3/HOrt rkmH2BWJAjMEAAEIAB0WIQRLvip+Buz51YI8YRFM4MiGSL8RygUCZ8XFlwAKCRBM4MiGSL8RylcSD/ 98d0x1nLa8rNVvdG3GA2itnPRdooLYI4T2fCCWt7Ts3e/+FELsL6YecnuYvTRcC8lgfNog49Q61vR9 ZbdiJhRYp0foaE6nkdJ6ZNdOUUqB7XNQ/5a9wspJwtac9mvnLvvflyBBpMra1iGmSbkeC1INTGs0k9 d4LX43Pt6LiEThANydEY2aB/KLWgJ8yJ0GI8rF9cmw9HZVEiXBI+DnpCcdFXfOXk4JFOIsM6J3+9uR gF1XAmwR9+OPBLj3ew1jqeOvImHAC6Q3KhH0fM2Khxp54yewJN3fdK27ecmcZ7dFD955LNup8Jwhc9 6No4sxHtKn3TRa233wc0E2c0WN41u0ANO2hskwdvDQrpBuGt1oekXg4lIEhasIAqkmRUwda8Se6FsW 917K8r3V5eRVzYlOuc2QnlSi1Z7UwvssgtZzlONowb11BDRvYdc6n30bdv8j5j5CkOWLS8QRFw1R2d 02L7M5THvXBlDnXEF17/fIPhaBsUgYWj9Tb6Te0pcUFTth8/hsrrqoLxYgdezymUP+YaKk4gSaxwYS UKsOuzcvXTm/Ole1m5oquCmbxoW+PiysOrhmj6Bjrhr2L2rTOE7Il4KP1WZyiYF3CZJJblxuxIzJD9 ftOVHvT1bfBqlCs5q/Ga8rae8DZswqhS7fMfgOHnwnPkrgWBo+l+BmoWa58A== X-Developer-Key: i=memxor@gmail.com; a=openpgp; fpr=4BBE2A7E06ECF9D5823C61114CE0C88648BF11CA X-Patchwork-Delegate: bpf@iogearbox.net While the timeout logic provides guarantees for the waiter's forward progress, the time until a stalling waiter unblocks can still be long. The default timeout of 1/2 sec can be excessively long for some use cases. Additionally, custom timeouts may exacerbate recovery time. Introduce logic to detect common cases of deadlocks and perform quicker recovery. This is done by dividing the time from entry into the locking slow path until the timeout into intervals of 1 ms. Then, after each interval elapses, deadlock detection is performed, while also polling the lock word to ensure we can quickly break out of the detection logic and proceed with lock acquisition. A 'held_locks' table is maintained per-CPU where the entry at the bottom denotes a lock being waited for or already taken. Entries coming before it denote locks that are already held. The current CPU's table can thus be looked at to detect AA deadlocks. The tables from other CPUs can be looked at to discover ABBA situations. Finally, when a matching entry for the lock being taken on the current CPU is found on some other CPU, a deadlock situation is detected. This function can take a long time, therefore the lock word is constantly polled in each loop iteration to ensure we can preempt detection and proceed with lock acquisition, using the is_lock_released check. We set 'spin' member of rqspinlock_timeout struct to 0 to trigger deadlock checks immediately to perform faster recovery. Note: Extending lock word size by 4 bytes to record owner CPU can allow faster detection for ABBA. It is typically the owner which participates in a ABBA situation. However, to keep compatibility with existing lock words in the kernel (struct qspinlock), and given deadlocks are a rare event triggered by bugs, we choose to favor compatibility over faster detection. 
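To make the scheme described above concrete, here is a simplified, illustrative sketch of the per-CPU held-locks table and the AA check; names carrying a "sketch" prefix are invented for illustration, and the patch's actual code (which additionally scans remote CPUs' tables for ABBA cycles and re-polls the lock word so detection can be abandoned once the lock frees up) appears in the diff below.

	/* Simplified sketch only -- not the patch's implementation. */
	#include <linux/percpu.h>
	#include <linux/minmax.h>
	#include <linux/errno.h>

	#define SKETCH_NR_HELD 31

	struct held_locks_sketch {
		int cnt;
		/*
		 * locks[cnt - 1] is the lock currently being acquired; earlier
		 * entries are locks already held by this CPU.
		 */
		void *locks[SKETCH_NR_HELD];
	};

	static DEFINE_PER_CPU(struct held_locks_sketch, held_locks_sketch);

	/* AA check: are we waiting on a lock this CPU already holds? */
	static int sketch_check_aa(void *lock)
	{
		struct held_locks_sketch *h = this_cpu_ptr(&held_locks_sketch);
		int cnt = min(SKETCH_NR_HELD, h->cnt);

		/* Skip the topmost entry, which is this acquisition attempt itself. */
		for (int i = 0; i < cnt - 1; i++) {
			if (h->locks[i] == lock)
				return -EDEADLK;
		}
		return 0;
	}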
The release_held_lock_entry function requires an smp_wmb, while the release store on unlock will provide the necessary ordering for us. Add comments to document the subtleties of why this is correct. It is possible for stores to be reordered still, but in the context of the deadlock detection algorithm, a release barrier is sufficient and needn't be stronger for unlock's case. Signed-off-by: Kumar Kartikeya Dwivedi --- include/asm-generic/rqspinlock.h | 100 +++++++++++++++++ kernel/locking/rqspinlock.c | 185 ++++++++++++++++++++++++++++--- 2 files changed, 271 insertions(+), 14 deletions(-) diff --git a/include/asm-generic/rqspinlock.h b/include/asm-generic/rqspinlock.h index d23793d8e64d..b685f243cf96 100644 --- a/include/asm-generic/rqspinlock.h +++ b/include/asm-generic/rqspinlock.h @@ -11,6 +11,7 @@ #include #include +#include struct qspinlock; typedef struct qspinlock rqspinlock_t; @@ -22,4 +23,103 @@ extern int resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val); */ #define RES_DEF_TIMEOUT (NSEC_PER_SEC / 4) +/* + * Choose 31 as it makes rqspinlock_held cacheline-aligned. + */ +#define RES_NR_HELD 31 + +struct rqspinlock_held { + int cnt; + void *locks[RES_NR_HELD]; +}; + +DECLARE_PER_CPU_ALIGNED(struct rqspinlock_held, rqspinlock_held_locks); + +static __always_inline void grab_held_lock_entry(void *lock) +{ + int cnt = this_cpu_inc_return(rqspinlock_held_locks.cnt); + + if (unlikely(cnt > RES_NR_HELD)) { + /* Still keep the inc so we decrement later. */ + return; + } + + /* + * Implied compiler barrier in per-CPU operations; otherwise we can have + * the compiler reorder inc with write to table, allowing interrupts to + * overwrite and erase our write to the table (as on interrupt exit it + * will be reset to NULL). + * + * It is fine for cnt inc to be reordered wrt remote readers though, + * they won't observe our entry until the cnt update is visible, that's + * all. + */ + this_cpu_write(rqspinlock_held_locks.locks[cnt - 1], lock); +} + +/* + * We simply don't support out-of-order unlocks, and keep the logic simple here. + * The verifier prevents BPF programs from unlocking out-of-order, and the same + * holds for in-kernel users. + * + * It is possible to run into misdetection scenarios of AA deadlocks on the same + * CPU, and missed ABBA deadlocks on remote CPUs if this function pops entries + * out of order (due to lock A, lock B, unlock A, unlock B) pattern. The correct + * logic to preserve right entries in the table would be to walk the array of + * held locks and swap and clear out-of-order entries, but that's too + * complicated and we don't have a compelling use case for out of order unlocking. + */ +static __always_inline void release_held_lock_entry(void) +{ + struct rqspinlock_held *rqh = this_cpu_ptr(&rqspinlock_held_locks); + + if (unlikely(rqh->cnt > RES_NR_HELD)) + goto dec; + WRITE_ONCE(rqh->locks[rqh->cnt - 1], NULL); +dec: + /* + * Reordering of clearing above with inc and its write in + * grab_held_lock_entry that came before us (in same acquisition + * attempt) is ok, we either see a valid entry or NULL when it's + * visible. + * + * But this helper is invoked when we unwind upon failing to acquire the + * lock. Unlike the unlock path which constitutes a release store after + * we clear the entry, we need to emit a write barrier here. 
Otherwise, + * we may have a situation as follows: + * + * for lock B + * release_held_lock_entry + * + * try_cmpxchg_acquire for lock A + * grab_held_lock_entry + * + * Lack of any ordering means reordering may occur such that dec, inc + * are done before entry is overwritten. This permits a remote lock + * holder of lock B (which this CPU failed to acquire) to now observe it + * as being attempted on this CPU, and may lead to misdetection (if this + * CPU holds a lock it is attempting to acquire, leading to false ABBA + * diagnosis). + * + * In case of unlock, we will always do a release on the lock word after + * releasing the entry, ensuring that other CPUs cannot hold the lock + * (and make conclusions about deadlocks) until the entry has been + * cleared on the local CPU, preventing any anomalies. Reordering is + * still possible there, but a remote CPU cannot observe a lock in our + * table which it is already holding, since visibility entails our + * release store for the said lock has not retired. + * + * In theory we don't have a problem if the dec and WRITE_ONCE above get + * reordered with each other, we either notice an empty NULL entry on + * top (if dec succeeds WRITE_ONCE), or a potentially stale entry which + * cannot be observed (if dec precedes WRITE_ONCE). + * + * Emit the write barrier _before_ the dec, this permits dec-inc + * reordering but that is harmless as we'd have new entry set to NULL + * already, i.e. they cannot precede the NULL store above. + */ + smp_wmb(); + this_cpu_dec(rqspinlock_held_locks.cnt); +} + #endif /* __ASM_GENERIC_RQSPINLOCK_H */ diff --git a/kernel/locking/rqspinlock.c b/kernel/locking/rqspinlock.c index 16ec1b9eb005..ce2bc0a85a07 100644 --- a/kernel/locking/rqspinlock.c +++ b/kernel/locking/rqspinlock.c @@ -31,6 +31,7 @@ */ #include "qspinlock.h" #include "lock_events.h" +#include "rqspinlock.h" /* * The basic principle of a queue-based spinlock can best be understood @@ -74,16 +75,146 @@ struct rqspinlock_timeout { u64 timeout_end; u64 duration; + u64 cur; u16 spin; }; #define RES_TIMEOUT_VAL 2 -static noinline int check_timeout(struct rqspinlock_timeout *ts) +DEFINE_PER_CPU_ALIGNED(struct rqspinlock_held, rqspinlock_held_locks); + +static bool is_lock_released(rqspinlock_t *lock, u32 mask, struct rqspinlock_timeout *ts) +{ + if (!(atomic_read_acquire(&lock->val) & (mask))) + return true; + return false; +} + +static noinline int check_deadlock_AA(rqspinlock_t *lock, u32 mask, + struct rqspinlock_timeout *ts) +{ + struct rqspinlock_held *rqh = this_cpu_ptr(&rqspinlock_held_locks); + int cnt = min(RES_NR_HELD, rqh->cnt); + + /* + * Return an error if we hold the lock we are attempting to acquire. + * We'll iterate over max 32 locks; no need to do is_lock_released. + */ + for (int i = 0; i < cnt - 1; i++) { + if (rqh->locks[i] == lock) + return -EDEADLK; + } + return 0; +} + +/* + * This focuses on the most common case of ABBA deadlocks (or ABBA involving + * more locks, which reduce to ABBA). This is not exhaustive, and we rely on + * timeouts as the final line of defense. + */ +static noinline int check_deadlock_ABBA(rqspinlock_t *lock, u32 mask, + struct rqspinlock_timeout *ts) +{ + struct rqspinlock_held *rqh = this_cpu_ptr(&rqspinlock_held_locks); + int rqh_cnt = min(RES_NR_HELD, rqh->cnt); + void *remote_lock; + int cpu; + + /* + * Find the CPU holding the lock that we want to acquire. If there is a + * deadlock scenario, we will read a stable set on the remote CPU and + * find the target. 
This would be a constant time operation instead of + * O(NR_CPUS) if we could determine the owning CPU from a lock value, but + * that requires increasing the size of the lock word. + */ + for_each_possible_cpu(cpu) { + struct rqspinlock_held *rqh_cpu = per_cpu_ptr(&rqspinlock_held_locks, cpu); + int real_cnt = READ_ONCE(rqh_cpu->cnt); + int cnt = min(RES_NR_HELD, real_cnt); + + /* + * Let's ensure to break out of this loop if the lock is available for + * us to potentially acquire. + */ + if (is_lock_released(lock, mask, ts)) + return 0; + + /* + * Skip ourselves, and CPUs whose count is less than 2, as they need at + * least one held lock and one acquisition attempt (reflected as top + * most entry) to participate in an ABBA deadlock. + * + * If cnt is more than RES_NR_HELD, it means the current lock being + * acquired won't appear in the table, and other locks in the table are + * already held, so we can't determine ABBA. + */ + if (cpu == smp_processor_id() || real_cnt < 2 || real_cnt > RES_NR_HELD) + continue; + + /* + * Obtain the entry at the top, this corresponds to the lock the + * remote CPU is attempting to acquire in a deadlock situation, + * and would be one of the locks we hold on the current CPU. + */ + remote_lock = READ_ONCE(rqh_cpu->locks[cnt - 1]); + /* + * If it is NULL, we've raced and cannot determine a deadlock + * conclusively, skip this CPU. + */ + if (!remote_lock) + continue; + /* + * Find if the lock we're attempting to acquire is held by this CPU. + * Don't consider the topmost entry, as that must be the latest lock + * being held or acquired. For a deadlock, the target CPU must also + * attempt to acquire a lock we hold, so for this search only 'cnt - 1' + * entries are important. + */ + for (int i = 0; i < cnt - 1; i++) { + if (READ_ONCE(rqh_cpu->locks[i]) != lock) + continue; + /* + * We found our lock as held on the remote CPU. Is the + * acquisition attempt on the remote CPU for a lock held + * by us? If so, we have a deadlock situation, and need + * to recover. + */ + for (int i = 0; i < rqh_cnt - 1; i++) { + if (rqh->locks[i] == remote_lock) + return -EDEADLK; + } + /* + * Inconclusive; retry again later. + */ + return 0; + } + } + return 0; +} + +static noinline int check_deadlock(rqspinlock_t *lock, u32 mask, + struct rqspinlock_timeout *ts) +{ + int ret; + + ret = check_deadlock_AA(lock, mask, ts); + if (ret) + return ret; + ret = check_deadlock_ABBA(lock, mask, ts); + if (ret) + return ret; + + return 0; +} + +static noinline int check_timeout(rqspinlock_t *lock, u32 mask, + struct rqspinlock_timeout *ts) { u64 time = ktime_get_mono_fast_ns(); + u64 prev = ts->cur; if (!ts->timeout_end) { + ts->cur = time; ts->timeout_end = time + ts->duration; return 0; } @@ -91,6 +222,15 @@ static noinline int check_timeout(struct rqspinlock_timeout *ts) if (time > ts->timeout_end) return -ETIMEDOUT; + /* + * A millisecond interval passed from last time? Trigger deadlock + * checks. + */ + if (prev + NSEC_PER_MSEC < time) { + ts->cur = time; + return check_deadlock(lock, mask, ts); + } + return 0; } @@ -99,21 +239,22 @@ static noinline int check_timeout(struct rqspinlock_timeout *ts) * as the macro does internal amortization for us. 
*/ #ifndef res_smp_cond_load_acquire -#define RES_CHECK_TIMEOUT(ts, ret) \ - ({ \ - if (!(ts).spin++) \ - (ret) = check_timeout(&(ts)); \ - (ret); \ +#define RES_CHECK_TIMEOUT(ts, ret, mask) \ + ({ \ + if (!(ts).spin++) \ + (ret) = check_timeout((lock), (mask), &(ts)); \ + (ret); \ }) #else -#define RES_CHECK_TIMEOUT(ts, ret, mask) \ +#define RES_CHECK_TIMEOUT(ts, ret, mask) \ ({ (ret) = check_timeout(&(ts)); }) #endif /* * Initialize the 'spin' member. + * Set spin member to 0 to trigger AA/ABBA checks immediately. */ -#define RES_INIT_TIMEOUT(ts) ({ (ts).spin = 1; }) +#define RES_INIT_TIMEOUT(ts) ({ (ts).spin = 0; }) /* * We only need to reset 'timeout_end', 'spin' will just wrap around as necessary. @@ -208,6 +349,11 @@ int __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val) goto queue; } + /* + * Grab an entry in the held locks array, to enable deadlock detection. + */ + grab_held_lock_entry(lock); + /* * We're pending, wait for the owner to go away. * @@ -221,7 +367,7 @@ int __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val) */ if (val & _Q_LOCKED_MASK) { RES_RESET_TIMEOUT(ts, RES_DEF_TIMEOUT); - res_smp_cond_load_acquire(&lock->locked, !VAL || RES_CHECK_TIMEOUT(ts, ret)); + res_smp_cond_load_acquire(&lock->locked, !VAL || RES_CHECK_TIMEOUT(ts, ret, _Q_LOCKED_MASK)); } if (ret) { @@ -236,7 +382,7 @@ int __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val) */ clear_pending(lock); lockevent_inc(rqspinlock_lock_timeout); - return ret; + goto err_release_entry; } /* @@ -254,6 +400,11 @@ int __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val) */ queue: lockevent_inc(lock_slowpath); + /* + * Grab deadlock detection entry for the queue path. + */ + grab_held_lock_entry(lock); + node = this_cpu_ptr(&rqnodes[0].mcs); idx = node->count++; tail = encode_tail(smp_processor_id(), idx); @@ -273,9 +424,9 @@ int __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val) lockevent_inc(lock_no_node); RES_RESET_TIMEOUT(ts, RES_DEF_TIMEOUT); while (!queued_spin_trylock(lock)) { - if (RES_CHECK_TIMEOUT(ts, ret)) { + if (RES_CHECK_TIMEOUT(ts, ret, ~0u)) { lockevent_inc(rqspinlock_lock_timeout); - break; + goto err_release_node; } cpu_relax(); } @@ -371,7 +522,7 @@ int __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val) */ RES_RESET_TIMEOUT(ts, RES_DEF_TIMEOUT * 2); val = res_atomic_cond_read_acquire(&lock->val, !(VAL & _Q_LOCKED_PENDING_MASK) || - RES_CHECK_TIMEOUT(ts, ret)); + RES_CHECK_TIMEOUT(ts, ret, _Q_LOCKED_PENDING_MASK)); waitq_timeout: if (ret) { @@ -404,7 +555,7 @@ int __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val) WRITE_ONCE(next->locked, RES_TIMEOUT_VAL); } lockevent_inc(rqspinlock_lock_timeout); - goto release; + goto err_release_node; } /* @@ -451,5 +602,11 @@ int __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val) */ __this_cpu_dec(rqnodes[0].mcs.count); return ret; +err_release_node: + trace_contention_end(lock, ret); + __this_cpu_dec(rqnodes[0].mcs.count); +err_release_entry: + release_held_lock_entry(); + return ret; } EXPORT_SYMBOL(resilient_queued_spin_lock_slowpath); From patchwork Mon Mar 3 15:22:53 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kumar Kartikeya Dwivedi X-Patchwork-Id: 13999031 X-Patchwork-Delegate: bpf@iogearbox.net Received: from mail-wr1-f67.google.com (mail-wr1-f67.google.com [209.85.221.67]) (using TLSv1.2 
with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 767242144DA; Mon, 3 Mar 2025 15:23:29 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.221.67 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1741015411; cv=none; b=L74UweeZsizcJtdM2e2TVeXp8HbGXOpfQiXppnaN/EVqBiHbiAGfGGW5VItyvce7q7zFSZy4aGSeez7Tqi2l6dPO0Nx6deIzdQbJSYsotSsL0Nl50EdqvFJZhI1hnY++2xAHZ4Kn4dIECvDyBohgsLhvxdd29UYZJ4ZucUB4uTw= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1741015411; c=relaxed/simple; bh=o1MKDNUhE9EFH4RwMPASaZvJKuJrMCZ8uocCQm849Ik=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=r8K1Ci0Qf+JbVPG7rNJA3eEamLwcvhem4/r4xxZ42QxukbL+Eq6bIz2atjjxlL+5LPVCFhpTdTH2FAmwN8qoQHq7YM5hMWb+nC0spNXikebfxtcIy1+KFaq5IixgPVIsPdQVuVfLlb7hHQme39Bpx7vSCasCbSKD0ssupgU/GPQ= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=if2UKnKk; arc=none smtp.client-ip=209.85.221.67 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="if2UKnKk" Received: by mail-wr1-f67.google.com with SMTP id ffacd0b85a97d-390ec7c2cd8so2125332f8f.1; Mon, 03 Mar 2025 07:23:29 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1741015407; x=1741620207; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=Narwm03ehkHlcNbrRom2a3WOE/jJ7XHKhd4vuGjJYfQ=; b=if2UKnKkmKtTN+nRd8KBMZL9tSyE2nMh+MkKabwRqw6KdaCqGAyZoNWiw1CK2x5T+o O3b3EDVoq96b/fVgqHyH0YFJ86YAcOZh0giiN6vx9XhMKyZUrTtgXQ1yTn50+8jqTMfl K4rrPzkA1mEAxTQDHW1gLTay4rDwfAUVP6SNDkzsdI48weZL9J6VuTxS67fe6O907LwJ y7UtDts0jga0c8zhJ8NskTuolRfwEgWyMhmDzb8cNIs3UiWh8ALOYrLI6ZhE1IAQVtK2 3Rz4QxM8BNPMSoqbdJni1pxvUi44shZuu2IzltI2eGlVFjkuCH8h7tR7dcolKn75/bu8 3u6g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1741015407; x=1741620207; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=Narwm03ehkHlcNbrRom2a3WOE/jJ7XHKhd4vuGjJYfQ=; b=gcBgEKjbq5qGU6kBI564fZB4I1N1rM4cwOzo2WN1c6y6AyZdFVakxRK2Lb7OXjjR+7 eoiOdj4DseEiDL8Bxf1T9Z/7z5sXdcXFPHQjgS/Yao3yXzcqES9hrDMykz9aZBuvLRuj R6MrU+i9d9KoGecGtXhPCs5hnIDFiyilwPiVkuBp6zlJoZgODaj9ieSnX7CNd8xCAb6f hOErFqNYcSdTYk45HmxxvuCHJ9YWxkBNZ7F8RiABe0r6wpbVj4SG0v04EnOhxNSwDL7s VgqBJc1W/D+e07DvY2l1S9sBVUMMVweKVynnlDtPTk7M8YTOCw0fdXl9Batluzfnclta 42Og== X-Forwarded-Encrypted: i=1; AJvYcCUC4uCpoNSbC0tjc5dwu/ld1ijwxCbKT1aJzSo6J127pa8cem8nweSUtHzEZqyxNktOuQca8ib7CzFUPTQ=@vger.kernel.org X-Gm-Message-State: AOJu0YxuSdGDWoaZRgZjLij3I1V5KJg6ksW2Unf2SfsfJsi3RSkpSWFX cmKuypz6tsbIk17vIr5O1+iMfoqPc1JmM+g3sTlAE+AdwfYZ7tnBSWlT4/eJZs4= X-Gm-Gg: ASbGncsd7frVPbtj1TvU+puz76lPdgONbTsZ6rHoue/CkpCICjevifreYQ+k8H60lCe 0LH+k2uf1I+d96kCludPzlFfvn3q2Kqc0bUv3WVMUrVbGMK+bAE0yEtmNoOUAOayAcrIrSkOmi6 Yv6LF1e+czDUOXQPjnLm7fdahMy759XRUiSl8FTmO9fiKSk5exGwcY0qE3sCQNgN3HkbGZKMPKY 
WERbJTbeDnDsYjY/nC4sqkd6OHFuXNlZkIq83jxhoj+WHfxlck3zYGuBv9yZUETlaB/k9O4HijT QglEiE/bfClGSMdbebIQNIOJwv3jTdw9mCQ= X-Google-Smtp-Source: AGHT+IHQPjApFZs3HhsmlujTPNCDLSejJ0ltemem59qnL0CWos33mq2nn38IumeJ2bsiNZQ67lldHQ== X-Received: by 2002:a05:6000:1f8d:b0:390:fb37:1bd with SMTP id ffacd0b85a97d-390fb370470mr6319805f8f.46.1741015407243; Mon, 03 Mar 2025 07:23:27 -0800 (PST) Received: from localhost ([2a03:2880:31ff:4e::]) by smtp.gmail.com with ESMTPSA id ffacd0b85a97d-390e485db82sm14509941f8f.88.2025.03.03.07.23.26 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 03 Mar 2025 07:23:26 -0800 (PST) From: Kumar Kartikeya Dwivedi To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org Cc: Linus Torvalds , Peter Zijlstra , Will Deacon , Waiman Long , Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , Martin KaFai Lau , Eduard Zingerman , "Paul E. McKenney" , Tejun Heo , Barret Rhoden , Josh Don , Dohyun Kim , linux-arm-kernel@lists.infradead.org, kkd@meta.com, kernel-team@meta.com Subject: [PATCH bpf-next v3 13/25] rqspinlock: Add a test-and-set fallback Date: Mon, 3 Mar 2025 07:22:53 -0800 Message-ID: <20250303152305.3195648-14-memxor@gmail.com> X-Mailer: git-send-email 2.43.5 In-Reply-To: <20250303152305.3195648-1-memxor@gmail.com> References: <20250303152305.3195648-1-memxor@gmail.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=4064; h=from:subject; bh=o1MKDNUhE9EFH4RwMPASaZvJKuJrMCZ8uocCQm849Ik=; b=owEBbQKS/ZANAwAIAUzgyIZIvxHKAcsmYgBnxcWXyKg4wB+6V7Ple/tkprfUEWpR0Hl6dgveKapl mmrrb5WJAjMEAAEIAB0WIQRLvip+Buz51YI8YRFM4MiGSL8RygUCZ8XFlwAKCRBM4MiGSL8RyrORD/ 9X6zKnZ50odRHn/aTMB5eerXFYbZYCFEA0zhWN7wOz/mXTLzA+cu1Kx16cILZLnewteP+spWp6TfeG oO/uEEbmEQvn0We0uAupBYIt8VRyjDPQs99CWCu4QGxc3xxkMS5+mQTJ2ons155TFMD85j8Vt7WBeN Gp3Q7X7RM7uzhQ4y+EgVeTAa8/W6GwRYnV/14RG89yraqvQIzLJtA/BHAHhJwp3um17ldvOg30GGTG ljTKl/gZXoBY1wmcLDkkukm03I2d3VP267EdahhXrctgXkA3dPZCyVzJg/lHJF41jU28pRqEqdU1fZ NB5PW9KYd1XzxvhBhwPdgXR2qunX1i8LP/jJn9SDb6JtxKugtHFL9y+ODarv6pIP3Of+NZLd7dwdtO cLPFwRPp5fOfnOfrSQBkPYD+CksR9o5ApDj8rMEimbPEIeKSfuHi4JgiExIi0TmcmH2AHxW+M+Jons MY9cSj6MOAtu7gTUDpDFMkHrkip9TL0zZhiW9CJHxYfOVZLWIFlgX00dUWHbzWPbhiRM6gc8tnmL4e gYYeCoy81wYg2HhX5/dCjB6beylFJO8aHhgT+X4oqKJx85F7LbFa6OameusA/gU2YMVg/nDYQKZMxs TIUKm5h4/2zgx4el81Kce6UG+hDPxefexA5tXAA1BD6DWZNiRUU8p7hniQGw== X-Developer-Key: i=memxor@gmail.com; a=openpgp; fpr=4BBE2A7E06ECF9D5823C61114CE0C88648BF11CA X-Patchwork-Delegate: bpf@iogearbox.net Include a test-and-set fallback when queued spinlock support is not available. Introduce a rqspinlock type to act as a fallback when qspinlock support is absent. Include ifdef guards to ensure the slow path in this file is only compiled when CONFIG_QUEUED_SPINLOCKS=y. Subsequent patches will add further logic to ensure fallback to the test-and-set implementation when queued spinlock support is unavailable on an architecture. Unlike other waiting loops in rqspinlock code, the one for test-and-set has no theoretical upper bound under contention, therefore we need a longer timeout than usual. Bump it up to a second in this case. 
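Since the fallback can now fail instead of spinning forever, callers are expected to check its return value. A minimal, hypothetical usage sketch follows; the lock and function names are placeholders and not part of the patch, and the unlock helper is introduced elsewhere in the series.

	/* Hypothetical usage sketch, assuming the generic header added earlier in the series. */
	#include <asm-generic/rqspinlock.h>

	static rqspinlock_t demo_lock;	/* zero-initialized, i.e. unlocked */

	static int demo_critical_section(void)
	{
		int ret;

		ret = resilient_tas_spin_lock(&demo_lock);
		if (ret)	/* -ETIMEDOUT on a stall, -EDEADLK if a deadlock was detected */
			return ret;

		/* ... short critical section ... */

		/*
		 * Unlock via the series' unlock helper (a release store on the lock
		 * word plus removal of the held-locks entry); not shown in this patch.
		 */
		return 0;
	}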
Signed-off-by: Kumar Kartikeya Dwivedi --- include/asm-generic/rqspinlock.h | 17 ++++++++++++ kernel/locking/rqspinlock.c | 45 ++++++++++++++++++++++++++++++-- 2 files changed, 60 insertions(+), 2 deletions(-) diff --git a/include/asm-generic/rqspinlock.h b/include/asm-generic/rqspinlock.h index b685f243cf96..b30a86abad7b 100644 --- a/include/asm-generic/rqspinlock.h +++ b/include/asm-generic/rqspinlock.h @@ -12,11 +12,28 @@ #include #include #include +#ifdef CONFIG_QUEUED_SPINLOCKS +#include +#endif + +struct rqspinlock { + union { + atomic_t val; + u32 locked; + }; +}; struct qspinlock; +#ifdef CONFIG_QUEUED_SPINLOCKS typedef struct qspinlock rqspinlock_t; +#else +typedef struct rqspinlock rqspinlock_t; +#endif +extern int resilient_tas_spin_lock(rqspinlock_t *lock); +#ifdef CONFIG_QUEUED_SPINLOCKS extern int resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val); +#endif /* * Default timeout for waiting loops is 0.25 seconds diff --git a/kernel/locking/rqspinlock.c b/kernel/locking/rqspinlock.c index ce2bc0a85a07..27ab4642f894 100644 --- a/kernel/locking/rqspinlock.c +++ b/kernel/locking/rqspinlock.c @@ -21,7 +21,9 @@ #include #include #include +#ifdef CONFIG_QUEUED_SPINLOCKS #include +#endif #include #include #include @@ -29,9 +31,12 @@ /* * Include queued spinlock definitions and statistics code */ +#ifdef CONFIG_QUEUED_SPINLOCKS #include "qspinlock.h" #include "lock_events.h" #include "rqspinlock.h" +#include "mcs_spinlock.h" +#endif /* * The basic principle of a queue-based spinlock can best be understood @@ -70,8 +75,6 @@ * */ -#include "mcs_spinlock.h" - struct rqspinlock_timeout { u64 timeout_end; u64 duration; @@ -262,6 +265,42 @@ static noinline int check_timeout(rqspinlock_t *lock, u32 mask, */ #define RES_RESET_TIMEOUT(ts, _duration) ({ (ts).timeout_end = 0; (ts).duration = _duration; }) +/* + * Provide a test-and-set fallback for cases when queued spin lock support is + * absent from the architecture. + */ +int __lockfunc resilient_tas_spin_lock(rqspinlock_t *lock) +{ + struct rqspinlock_timeout ts; + int val, ret = 0; + + RES_INIT_TIMEOUT(ts); + grab_held_lock_entry(lock); + + /* + * Since the waiting loop's time is dependent on the amount of + * contention, a short timeout unlike rqspinlock waiting loops + * isn't enough. Choose a second as the timeout value. + */ + RES_RESET_TIMEOUT(ts, NSEC_PER_SEC); +retry: + val = atomic_read(&lock->val); + + if (val || !atomic_try_cmpxchg(&lock->val, &val, 1)) { + if (RES_CHECK_TIMEOUT(ts, ret, ~0u)) + goto out; + cpu_relax(); + goto retry; + } + + return 0; +out: + release_held_lock_entry(); + return ret; +} + +#ifdef CONFIG_QUEUED_SPINLOCKS + /* * Per-CPU queue node structures; we can never have more than 4 nested * contexts: task, softirq, hardirq, nmi. 
@@ -610,3 +649,5 @@ int __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val) return ret; } EXPORT_SYMBOL(resilient_queued_spin_lock_slowpath); + +#endif /* CONFIG_QUEUED_SPINLOCKS */ From patchwork Mon Mar 3 15:22:54 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kumar Kartikeya Dwivedi X-Patchwork-Id: 13999032 X-Patchwork-Delegate: bpf@iogearbox.net Received: from mail-wm1-f67.google.com (mail-wm1-f67.google.com [209.85.128.67]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 65EB922D7A3; Mon, 3 Mar 2025 15:23:30 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.128.67 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1741015413; cv=none; b=Bsa259LOdbz6/dah/kTmNpwVEGrbtQcE83gAVPBSKNMxxm5WAk4SPfOagRSGVE6vPbWTcuHqPUL9Ue3WbKkRptDARgoItXTYkUxw5QB7sRr0HxB5mtRxA9Axx22Pg8manHXjeGaYdu5Htz/o8mR15AL/ZOFKxhUS8w4tiqrjE8A= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1741015413; c=relaxed/simple; bh=SCTiw9WUUkbOGd1OOljQ8eDMmea9HElqFJBZYwx5dAE=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=asV59swEBzK4V2MdrL7XvJxH6Zae+lFT29zHPjeGwJPICH5e0Mh4JvdJx6wl1bqME+zZo9Wc6lf8U3Otmq8+8vh867xNI62MxpoUtYrqhfWTlg30n0mezY4Uxt170yr6wHmoJD1DOQx4qdxaQTVSRujphtGUhseKl0InMPLFNQ8= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=Fe2xwk52; arc=none smtp.client-ip=209.85.128.67 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="Fe2xwk52" Received: by mail-wm1-f67.google.com with SMTP id 5b1f17b1804b1-4394a0c65fcso49255555e9.1; Mon, 03 Mar 2025 07:23:30 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1741015408; x=1741620208; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=hbiXkX+sz+mKWhiZ2JD9Yjh7XLLMoN8o2kVSbNXCy0Y=; b=Fe2xwk522exit2mCEy2efsBasggJQNI7xwjAK9hBxseMbR/fvkvXQiC9IosGpavzX+ ysav9vOyN5Hgo8wlLGpFH5wNadK5JnYWOWww9paSlSmgEZv+BJXv+8zPVe9xx3cjQXIY lAzq8hEGZmFadaZNGcAzYYnbOZNso7eKKoWpBWx9O/bRBFaIItZmGnM7ZPlfs8ExKvXa 1Bjzrxyi4HDfISeIFKzJFvGHuyf8eQoiiGZLRnsJnc4ujJag7Cu549I2mp/DJcJjzkDr z80p84YJoeu3ukc+jJnwGKx8pqIG4bieUst8BKaANa9Vt6nmddYBVnle15ng0LsbpCfX 2E0A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1741015408; x=1741620208; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=hbiXkX+sz+mKWhiZ2JD9Yjh7XLLMoN8o2kVSbNXCy0Y=; b=PcsX0M8PkXazU0kpddiMOqcg7uRnheQnUDsGEgAPi7C6G70EDHUlxFDXBpqFePffr+ V6Z1N+rGkisXxMtLTMEFwRkZbP2vYIrxD75XY97fHc+AUp4T5f3gPgsql1vMa1Wz7NNE 1gtfzKz7WJkRYWOp+3Q8g7nR+ikbnY2CpQrJhnvfW40nzzCXRSEVNWZEBhWFCXdcD1uH rHPr0EPOwMy6waJqjEKaxro7Jm53wtpF1pvoZC4itYBWJZ4/5nlqtDaXxssP/Az7YND6 
6NQ1kVgsd1jEPLI9APVxQKZ0UlgOgwlwy32X+OAQGSaahz3Azy/WGHiRKzSdLd5f6ScW lvvQ== X-Forwarded-Encrypted: i=1; AJvYcCWhuh8dbieIzPoQglfKHfSDSHkAiCtGk/JI763W8wFFhVJ2ps9Vp4Cm+Sm4ba/iyXhynQUfekUlWGScZPo=@vger.kernel.org X-Gm-Message-State: AOJu0Yxv+WelQgsT4liVx49WKifQuOeo/pM/6+kgH/vjOwzBZ/rfBMex FHz9AWWoxanuEePc9l8urRUftGgEy3hntiliUhYK6c2KnsIXnihtfc9POrwbtbQ= X-Gm-Gg: ASbGncsfmvG1ScNKacg5H+HT0xCLS4x+ecb1VeZMBsX8nqs/+R80Sja/RHyXjzOVgvF vmGicZRTa1LQ3GStM5J9pGXpmQliBV40VrRCX/lk/OmfDsFCg2stfAk3hfhkuJvP1JMIEJbJOVe pI+toWk4UsSBbZPbSOjte9e3CWxvqpAzF680fEie9buorXAgMfg6D5303uw9k4PpwK1WHDSsjvM 3mflQ8gQlXZPwMerekFuQR/rP8k+HpYiJ8j4T8hfaDSXcZD/tM/EGiXoE48nWz9XNfjK3uGG+0/ Bu6nb/cjQ4fm+ai42raOYhEATesnxJAxi0Y= X-Google-Smtp-Source: AGHT+IGKz1oo1UHRACNo+L3weWdPcL06paS7sy9ejxox0qqfekJgP8kl/qKZCrzxii+jUg8vEiUMxA== X-Received: by 2002:a05:600c:4685:b0:439:a0a3:a15 with SMTP id 5b1f17b1804b1-43ba67045camr146732535e9.14.1741015408419; Mon, 03 Mar 2025 07:23:28 -0800 (PST) Received: from localhost ([2a03:2880:31ff:74::]) by smtp.gmail.com with ESMTPSA id ffacd0b85a97d-390e485d8e4sm14531679f8f.85.2025.03.03.07.23.27 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 03 Mar 2025 07:23:27 -0800 (PST) From: Kumar Kartikeya Dwivedi To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org Cc: Linus Torvalds , Peter Zijlstra , Will Deacon , Waiman Long , Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , Martin KaFai Lau , Eduard Zingerman , "Paul E. McKenney" , Tejun Heo , Barret Rhoden , Josh Don , Dohyun Kim , linux-arm-kernel@lists.infradead.org, kkd@meta.com, kernel-team@meta.com Subject: [PATCH bpf-next v3 14/25] rqspinlock: Add basic support for CONFIG_PARAVIRT Date: Mon, 3 Mar 2025 07:22:54 -0800 Message-ID: <20250303152305.3195648-15-memxor@gmail.com> X-Mailer: git-send-email 2.43.5 In-Reply-To: <20250303152305.3195648-1-memxor@gmail.com> References: <20250303152305.3195648-1-memxor@gmail.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=3277; h=from:subject; bh=SCTiw9WUUkbOGd1OOljQ8eDMmea9HElqFJBZYwx5dAE=; b=owEBbQKS/ZANAwAIAUzgyIZIvxHKAcsmYgBnxcWXfTMwb5GMWPgrjqrFJA/0gyw9UdNiWN2qmLrL O0hijiGJAjMEAAEIAB0WIQRLvip+Buz51YI8YRFM4MiGSL8RygUCZ8XFlwAKCRBM4MiGSL8Ryu1NEA CTUuGHPTrmc3ouIq7xk/7vEotpGLUGgeucb8GQwgQo6rcM9OI5bBvMW9JZfYy/los+WuIlwT/5IdCo AYY6qGCzJtoH464Llu8bFT1g6BAyOhZoDkX30C4FLyMD7aeKOCsD4w4X1KQJ0ITOJjLgSiO02ZBcsk G+vSU3qCtMZHbeLI9Q3/47WprV1GDIVEhOPmoC4z80iJULvFyEEv1KPAsKo+TyvXw2oYlSbxG+1gBi 1wmztdcoMQy6Y25bbcqgDlSLphn/5jKPVmwA/PqJCSlHPCcKKDENr5iG+htXq3ifKoKap5E/XN3j1P jGGqMu1j7i2P7YfUmNzvcqd+995DkNkOql0XTtz5YXJU70HWgDDjw2yhkdCHu8HgF+xp8nM1Y5yZHx UhdzTefDYMPL5JVyOInrepgDvHYQkN52ytGBNLKZt6tBQmXSeSSKJxhmpHya3TBUlZ78CPzq9vCVut qCdY7gXM8RU5a13RYFgs3bRptzVLJiuo+wnUZRwM7Cob+PtqcIIrMV1tm6IdmQBmSY8EzRJM447VxP ERz7W85ZSGfHanbBjbzfofA1Q57J4Au8euTPc7e/mx0ZR2sk/gqk632LbaDQKoQcxFXXSO+DEafFhX F8BMJap/MTZa0kNF5S+kGk3gqFtzwIqXvRBWM8HCOLgpW8KCnOFSK/bHk0mw== X-Developer-Key: i=memxor@gmail.com; a=openpgp; fpr=4BBE2A7E06ECF9D5823C61114CE0C88648BF11CA X-Patchwork-Delegate: bpf@iogearbox.net We ripped out PV and virtualization related bits from rqspinlock in an earlier commit, however, a fair lock performs poorly within a virtual machine when the lock holder is preempted. As such, retain the virt_spin_lock fallback to test and set lock, but with timeout and deadlock detection. We can do this by simply depending on the resilient_tas_spin_lock implementation from the previous patch. 
We don't integrate support for CONFIG_PARAVIRT_SPINLOCKS yet, as that requires more involved algorithmic changes and introduces more complexity. It can be done when the need arises in the future. Signed-off-by: Kumar Kartikeya Dwivedi --- arch/x86/include/asm/rqspinlock.h | 33 +++++++++++++++++++++++++++++++ include/asm-generic/rqspinlock.h | 14 +++++++++++++ kernel/locking/rqspinlock.c | 3 +++ 3 files changed, 50 insertions(+) create mode 100644 arch/x86/include/asm/rqspinlock.h diff --git a/arch/x86/include/asm/rqspinlock.h b/arch/x86/include/asm/rqspinlock.h new file mode 100644 index 000000000000..24a885449ee6 --- /dev/null +++ b/arch/x86/include/asm/rqspinlock.h @@ -0,0 +1,33 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _ASM_X86_RQSPINLOCK_H +#define _ASM_X86_RQSPINLOCK_H + +#include + +#ifdef CONFIG_PARAVIRT +DECLARE_STATIC_KEY_FALSE(virt_spin_lock_key); + +#define resilient_virt_spin_lock_enabled resilient_virt_spin_lock_enabled +static __always_inline bool resilient_virt_spin_lock_enabled(void) +{ + return static_branch_likely(&virt_spin_lock_key); +} + +#ifdef CONFIG_QUEUED_SPINLOCKS +typedef struct qspinlock rqspinlock_t; +#else +typedef struct rqspinlock rqspinlock_t; +#endif +extern int resilient_tas_spin_lock(rqspinlock_t *lock); + +#define resilient_virt_spin_lock resilient_virt_spin_lock +static inline int resilient_virt_spin_lock(rqspinlock_t *lock) +{ + return resilient_tas_spin_lock(lock); +} + +#endif /* CONFIG_PARAVIRT */ + +#include + +#endif /* _ASM_X86_RQSPINLOCK_H */ diff --git a/include/asm-generic/rqspinlock.h b/include/asm-generic/rqspinlock.h index b30a86abad7b..f8850f09d0d6 100644 --- a/include/asm-generic/rqspinlock.h +++ b/include/asm-generic/rqspinlock.h @@ -35,6 +35,20 @@ extern int resilient_tas_spin_lock(rqspinlock_t *lock); extern int resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val); #endif +#ifndef resilient_virt_spin_lock_enabled +static __always_inline bool resilient_virt_spin_lock_enabled(void) +{ + return false; +} +#endif + +#ifndef resilient_virt_spin_lock +static __always_inline int resilient_virt_spin_lock(rqspinlock_t *lock) +{ + return 0; +} +#endif + /* * Default timeout for waiting loops is 0.25 seconds */ diff --git a/kernel/locking/rqspinlock.c b/kernel/locking/rqspinlock.c index 27ab4642f894..b06256bb16f4 100644 --- a/kernel/locking/rqspinlock.c +++ b/kernel/locking/rqspinlock.c @@ -345,6 +345,9 @@ int __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val) BUILD_BUG_ON(CONFIG_NR_CPUS >= (1U << _Q_TAIL_CPU_BITS)); + if (resilient_virt_spin_lock_enabled()) + return resilient_virt_spin_lock(lock); + RES_INIT_TIMEOUT(ts); /* From patchwork Mon Mar 3 15:22:55 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kumar Kartikeya Dwivedi X-Patchwork-Id: 13999034 X-Patchwork-Delegate: bpf@iogearbox.net Received: from mail-wm1-f65.google.com (mail-wm1-f65.google.com [209.85.128.65]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id AED5A22E402; Mon, 3 Mar 2025 15:23:31 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.128.65 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1741015414; cv=none; b=hMbKWq04gZI43a/xoB6N/jt5+q7HnPPMimyEANKu8+ueZSmz0G96TkNygq8VgcK19uC+OaiUZN7VD0yqc89Ip9NzcDld1oSDjVz06eONIin392iP4kFBCzXTHqhYVnekD2FtMj0ggNohKZM1X5C2LoBwfXzJlU0zastE5AyERLo= 
ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1741015414; c=relaxed/simple; bh=XzehY2DPwsFHlB5VuQsc3kNRx10DtgHBazvwnaqKM4M=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=ZtMh2OLNoAjAE3zIEnDpcSqBAzEXqWBSlUtlhu+AIJL8Jo3PL//f5GgNFM/ZUqFxZUYxHbJ7xlU9RLrLfPeRQMCQ4kyXa0Z3CI800RkUrU/ZbjeDSojj/UR73C+1s+Og+0yvwkm/hG3QfdAL/bHtO7ydA0hciGLQPuy3UhN0qwE= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=OmHHiovI; arc=none smtp.client-ip=209.85.128.65 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="OmHHiovI" Received: by mail-wm1-f65.google.com with SMTP id 5b1f17b1804b1-43995b907cfso29199445e9.3; Mon, 03 Mar 2025 07:23:31 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1741015409; x=1741620209; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=Oib9X6F+jD32qthRhftfxOrIXcfGV0ymse+3a+fep28=; b=OmHHiovIYiqBqXja5o/ADuRyU/yF2u8LZyen7tD6rnTSZSpd0zeU3u0j8gOi0c8B4r rXfMjn6JNaMnMqm1WS89uFz+Fw5MNS2omOxIBUdbxmHkwRNbQfeYmCGvflsgybe4B33p 133FUlXGCCI6bzHWz7znMNdFFEH4TdMc94VOdSwmJwpKssQJEfqJAhTC3omSyES10Ws5 P36yD3/5ALeMX4aPR0no1mOcHPy94pHLRziDdewgdjR2eKtE7wi50lgw01ko9J8SSU5D B58U1zd/6EXJs0TgtbzhM95Gf1SQ16co2xlnPKpFvuvc5VWkWtNrCci4PhICRs/25Iao NngQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1741015409; x=1741620209; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=Oib9X6F+jD32qthRhftfxOrIXcfGV0ymse+3a+fep28=; b=VToX3WE3WC631tBwXCx88A6wfTLjx0Ia7Fx/K04vS5INXI4xZ84B567Zas7USsbUS1 vMIsDwt+qof7cI8ofvA4hrBCatfQ50qSksabxowVMke5Y7e8CyPLQvs8Q9WG3VQoghSp 373CbFxLoZRorly2bjlcY6c1j2+W0DIHPmNSws5q58wvPW4tdseue04Hf+3MO71C5nNP NGT2dcw3O8YRo8wfUO25TXeZJ1fkFnZVY2+begZptCM9vft8NxfqltDE4DUP1HWD5SFE KaVGOaEk81gaw3GUYRlUmqgsv24bO3PbIDCBOD1N0HpuOiJ5LUg7vu03p22JLs6lXeld TJ2A== X-Forwarded-Encrypted: i=1; AJvYcCWEw2AC+RhQCzWoxv/Z6EA3vahXuNIXQKteZFCVW/AmL3jCmJ2aOuC8I1kca0vk94skpVKBipBmWsRvKts=@vger.kernel.org X-Gm-Message-State: AOJu0YzWRVHw9i+rONpLzwp/xDqjca0WvLf5y59M9za9xSmshSRfSduc IX9JCUhvLb8w8ub45zuode+MYo4UYwQUTAyK8FXfPvV2ScL/vFUQOkptUpIQSJE= X-Gm-Gg: ASbGncs+UYqzICcPGwkXz8CCJURjxtHhZgcvb0+R8qkjCUv69jJxmjEMZ40HwQVByx4 WkvPBZVUJcvfq2B9g+uzS3GxXNqjR4ngK5sk0X5HKaEdztabQsBD0Fk/fmuUb5NaZxaIIO3lgij X/gWbjF7/YGHgB7jCh3iq8zSiqibermIbLlhhDlAlOekkCq20Ar7YUgLfL2lmV5AApj/vXpB3Z5 bktSLxHunlgNasv+qvMIzNhkeOiyvXTSOoCGNVXQedG6DwTUigbmSjMOOhUtxCGkplJuVxhpysV tTEGM4H9++JSO+Y7cpugCGpVgakes6vs2yI= X-Google-Smtp-Source: AGHT+IHK0E0NT4rN+IBrKPzoxmISlpoRx++6wRPzHRPKMbke27YkpGiSQnpSIDIa7J8yfk3LY6QLQA== X-Received: by 2002:a05:600c:4e8b:b0:439:a1c7:7b29 with SMTP id 5b1f17b1804b1-43ba6703c35mr123515565e9.17.1741015409574; Mon, 03 Mar 2025 07:23:29 -0800 (PST) Received: from localhost ([2a03:2880:31ff:44::]) by smtp.gmail.com with ESMTPSA id ffacd0b85a97d-390e485db82sm14510061f8f.88.2025.03.03.07.23.28 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 
bits=256/256); Mon, 03 Mar 2025 07:23:29 -0800 (PST) From: Kumar Kartikeya Dwivedi To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org Cc: Linus Torvalds , Peter Zijlstra , Will Deacon , Waiman Long , Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , Martin KaFai Lau , Eduard Zingerman , "Paul E. McKenney" , Tejun Heo , Barret Rhoden , Josh Don , Dohyun Kim , linux-arm-kernel@lists.infradead.org, kkd@meta.com, kernel-team@meta.com Subject: [PATCH bpf-next v3 15/25] rqspinlock: Add helper to print a splat on timeout or deadlock Date: Mon, 3 Mar 2025 07:22:55 -0800 Message-ID: <20250303152305.3195648-16-memxor@gmail.com> X-Mailer: git-send-email 2.43.5 In-Reply-To: <20250303152305.3195648-1-memxor@gmail.com> References: <20250303152305.3195648-1-memxor@gmail.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=2125; h=from:subject; bh=XzehY2DPwsFHlB5VuQsc3kNRx10DtgHBazvwnaqKM4M=; b=owEBbQKS/ZANAwAIAUzgyIZIvxHKAcsmYgBnxcWXYD8EVbB4Iydj+LTkfn6cdW8csGFE6TObIbKx E0nmgrCJAjMEAAEIAB0WIQRLvip+Buz51YI8YRFM4MiGSL8RygUCZ8XFlwAKCRBM4MiGSL8RyrkJEA Ct9v2S2uZ5k2kQeAFIPGD+gS68B4nVheOj37Umc8z+EwrGBzZ82mY8MkklYthiJE2AQBdHrtWAwoqG tZbTvmL6+lzWfEYRZt2xi60n3hTJMO3dlntsG+nTWnBWpIQZX5PNvzwPOl1/8OQMO+69prEjQItSyu eMEd38+aF8KI6iN85h1WM0KeOubpLLpUWfdNufr7Gsq0Oi1qImSnF/ilMPpOkoWQnvjxBU2ZKp75De P5Whx7I1WBvsuPW2lX4rzXBrMWbAWPQpojlxod4Ls4y3A61yRekC7LremgMkSs6gvc7Tnw/Z9lQWCU CgtczCyFYGLMz+mNBIEEbLqTQdgdrvV3fFCY78DcBB7xUvGlqD6CIZGr6DZljaV2SoWRHRCmv5Ft8H gVx7H5tqZgvWVo2SoMK4dnjxpn4/df+FCKxss6T2T6c60eFLmTgvI0DME2DNvOKFHgBhQpy6p3Cxsl H4g6MJFPnCJsqycg5cOVJKjPPgeoTGZlXrmAyPL60whVruu8QiWcCAFv4taoKU7M5yTeAGfAYdFU8l 6Ws3FsgbVuQ7DFWtfrLU7462oedcdeEAj81i6KLZmyfucZ073rF8/ghmFeBJtrIiheHs7mCqctGJjH +MDX6Nid6DYpDuM6VXjrvhhN8QfGnRpqsnGqUV9O6Cf6rbVzKauRdtvFhKgA== X-Developer-Key: i=memxor@gmail.com; a=openpgp; fpr=4BBE2A7E06ECF9D5823C61114CE0C88648BF11CA X-Patchwork-Delegate: bpf@iogearbox.net Whenever a timeout and a deadlock occurs, we would want to print a message to the dmesg console, including the CPU where the event occurred, the list of locks in the held locks table, and the stack trace of the caller, which allows determining where exactly in the slow path the waiter timed out or detected a deadlock. Splats are limited to atmost one per-CPU during machine uptime, and a lock is acquired to ensure that no interleaving occurs when a concurrent set of CPUs conflict and enter a deadlock situation and start printing data. Later patches will use this to inspect return value of rqspinlock API and then report a violation if necessary. 
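As a hedged illustration of the intended usage (not part of this patch: the hook point, error value, and message text below are assumptions, the real call sites are wired up by later patches in the series), a detected violation in the slow path could be fed into the helper roughly as follows:

	int ret;

	/*
	 * Hypothetical sketch only: assume check_deadlock() returns a negative
	 * error (e.g. -EDEADLK) when it detects a violation, and hand the
	 * offending lock to the reporting helper introduced below.
	 */
	ret = check_deadlock(lock, mask, ts);
	if (ret)
		rqspinlock_report_violation("rqspinlock: deadlock or timeout detected\n", lock);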
Signed-off-by: Kumar Kartikeya Dwivedi --- kernel/locking/rqspinlock.c | 29 +++++++++++++++++++++++++++++ 1 file changed, 29 insertions(+) diff --git a/kernel/locking/rqspinlock.c b/kernel/locking/rqspinlock.c index b06256bb16f4..3b4fdb183588 100644 --- a/kernel/locking/rqspinlock.c +++ b/kernel/locking/rqspinlock.c @@ -195,6 +195,35 @@ static noinline int check_deadlock_ABBA(rqspinlock_t *lock, u32 mask, return 0; } +static DEFINE_PER_CPU(int, report_nest_cnt); +static DEFINE_PER_CPU(bool, report_flag); +static arch_spinlock_t report_lock; + +static void rqspinlock_report_violation(const char *s, void *lock) +{ + struct rqspinlock_held *rqh = this_cpu_ptr(&rqspinlock_held_locks); + + if (this_cpu_inc_return(report_nest_cnt) != 1) { + this_cpu_dec(report_nest_cnt); + return; + } + if (this_cpu_read(report_flag)) + goto end; + this_cpu_write(report_flag, true); + arch_spin_lock(&report_lock); + + pr_err("CPU %d: %s", smp_processor_id(), s); + pr_info("Held locks: %d\n", rqh->cnt + 1); + pr_info("Held lock[%2d] = 0x%px\n", 0, lock); + for (int i = 0; i < min(RES_NR_HELD, rqh->cnt); i++) + pr_info("Held lock[%2d] = 0x%px\n", i + 1, rqh->locks[i]); + dump_stack(); + + arch_spin_unlock(&report_lock); +end: + this_cpu_dec(report_nest_cnt); +} + static noinline int check_deadlock(rqspinlock_t *lock, u32 mask, struct rqspinlock_timeout *ts) { From patchwork Mon Mar 3 15:22:56 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kumar Kartikeya Dwivedi X-Patchwork-Id: 13999033 X-Patchwork-Delegate: bpf@iogearbox.net Received: from mail-wm1-f68.google.com (mail-wm1-f68.google.com [209.85.128.68]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id BBFA222F39C; Mon, 3 Mar 2025 15:23:32 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.128.68 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1741015414; cv=none; b=lPmo1nEou5ToT3KmkeM6ua2qP84ORePFEaMgPX+2U9FG2q9Mh8aaMY+HJ3HG6OLKH8aP9aX/A3xwPb+VHjVNSeHsu8RZz7DQm5btQOeOPa3zbDRFwNAyK7lMqlxLQVWcAo2x3LivhW+z6hHP9HTFY/iOhe+9c1f6BDrdcBMyfjs= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1741015414; c=relaxed/simple; bh=1vN5D7C4tfoaawMuUjp7ZN+fQy3wYGzp7nrXDJHdVAo=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=tbcPSTU82yOuyh4VdiVtBdZEEC2bkcj6uZLApt12b0ca/g8RBikIO3w2bedPNPpv5NXXF0g8bbQr8+kk54o2mKEygUkcTFdtPYmvZEvHZfKJVpijYECRF0a55VW9DRnx14NHHagoK/XFSj9QKtomQ5VE9WYVcVaD5vUw9pplt9Q= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=esHjuS0K; arc=none smtp.client-ip=209.85.128.68 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="esHjuS0K" Received: by mail-wm1-f68.google.com with SMTP id 5b1f17b1804b1-43994ef3872so29737585e9.2; Mon, 03 Mar 2025 07:23:32 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1741015411; x=1741620211; darn=vger.kernel.org; 
h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=eIFRymfR26T1V4pI+hLoHtvRY/1A7N6xjkuyA7wDGHE=; b=esHjuS0KUnRgpTrqDEV5y+TmxAwcmkjCUu9AkzwFK2Pk+WhPBiCOYwT4MrDO6Hs/RL P2YhT+kzmLdmHfP/OF1irzYkQwNiGS4IwX+7wAWa0LIq/LuxP+DE+IpK3sCfLs4UPvzi zdmDYGVgpalFeUBO91K9NxfyrQ3T7MA+O+OiPFPR7CEkdvzudPdoze9VkIJOkFcmx5cs UaHtCg5ebXkSpWYsUP6llSGIkf6652Yn4f51caXIyc1CE5vCXN1ldAh9UE/1b2zh1WZ6 cPW8I3TcW9MfrVoPVAVUumNfB5vVc8josPWR/wjYU6B4oCFlrXt5ZsVyx5Ic9Z/dJI8k xyoA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1741015411; x=1741620211; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=eIFRymfR26T1V4pI+hLoHtvRY/1A7N6xjkuyA7wDGHE=; b=fzG3TadPoQEStJiCIiwzZQc/GMxF3JyfUVUHxHP1ZdlWZxI8h1kbEGqe9M5NlER/Id sK/x7LzMekZ1hnn1hZEt25DD7ClQkmOhZBo7W6vhmslavY8s95kNRzu2DPaBnd6rNYxX Ct8sC43pSEIdKw+aN9CnJ+yJWzRmRMu4QpeedlrMSP7kWa1FBAJY+tOICjwsXJ/lFFwD cEV4TNtoIWB9nIbuhwR7EbtjgslAAfgL10ymArzjujCpdKlyZ36a7xgCckfc5PJAFrji q3mylCp4xgvYvIv8zRC29sFa82RhZ/iwyq1JPYsMEZrOThZJO+sftaU07zFZU9tglBqh u0bg== X-Forwarded-Encrypted: i=1; AJvYcCUdV55WfzqjYBNADJ8hg5UgZhtfO3YNWojbiOlw23sc06Vpooy6C7+LgEzxlOukLwAnSy+1r91cTgH58jU=@vger.kernel.org X-Gm-Message-State: AOJu0YxJLpQHRqMeKVBlwtTPRrenp8NbidG03yhwwfoxcvloJL60xUHE IApFHsk77cvaN8k8RXIp65z0iWwHplA3/+YWbLyrO2pFbZgdh6edwFYcI6JdtOg= X-Gm-Gg: ASbGncvXuQzR4QSBBk1DhmDQ+85RJQvVZuw2+9R3ucw23TtLQnhn8UUpBLYQ/98O/pm iqE/SOE+N+914e+F3Mnr1EgRu2YeOkU0XnIjEzC4JF2usYK/BaUlogbfHlhZKqud+QBF+5Vb5qw 5p3YJu101uuE2P3PDaS84ROaYPYvBQIVRRYD4vqrK3ICKGxDMxkzM7OxBNqih7zUYwvZNM9qV8E okmpSy23si6d3AQDJPXTXr1sg9oalHSC9Sq2pgrNylTC6n0d4f2qugdnqxUmckyoFu4xs2Sz8yB fD83Wo9iNJs+DBZdDgJ25x33xsmocuY6jg== X-Google-Smtp-Source: AGHT+IHhLXGjzf0udnyNXtByFw5OFSp0rn67LfLvjV3YqWbj0wvdktay+P0e8RZUkjfn4MvH3tUgiw== X-Received: by 2002:a5d:5886:0:b0:390:efa5:9f6 with SMTP id ffacd0b85a97d-390efa50b8dmr10698291f8f.51.1741015410709; Mon, 03 Mar 2025 07:23:30 -0800 (PST) Received: from localhost ([2a03:2880:31ff:b::]) by smtp.gmail.com with ESMTPSA id ffacd0b85a97d-390e485dba4sm15037247f8f.92.2025.03.03.07.23.30 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 03 Mar 2025 07:23:30 -0800 (PST) From: Kumar Kartikeya Dwivedi To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org Cc: Linus Torvalds , Peter Zijlstra , Will Deacon , Waiman Long , Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , Martin KaFai Lau , Eduard Zingerman , "Paul E. 
McKenney" , Tejun Heo , Barret Rhoden , Josh Don , Dohyun Kim , linux-arm-kernel@lists.infradead.org, kkd@meta.com, kernel-team@meta.com Subject: [PATCH bpf-next v3 16/25] rqspinlock: Add macros for rqspinlock usage Date: Mon, 3 Mar 2025 07:22:56 -0800 Message-ID: <20250303152305.3195648-17-memxor@gmail.com> X-Mailer: git-send-email 2.43.5 In-Reply-To: <20250303152305.3195648-1-memxor@gmail.com> References: <20250303152305.3195648-1-memxor@gmail.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=3988; h=from:subject; bh=1vN5D7C4tfoaawMuUjp7ZN+fQy3wYGzp7nrXDJHdVAo=; b=owEBbQKS/ZANAwAIAUzgyIZIvxHKAcsmYgBnxcWYD7W48Zy4MCJ4hC7Laq1GOwnd+65hIihP1XlI Lb5c/6OJAjMEAAEIAB0WIQRLvip+Buz51YI8YRFM4MiGSL8RygUCZ8XFmAAKCRBM4MiGSL8RyvneD/ 9CSceH24BxXpHlqBvfIHao0z3PNag/hh6hxWdzMJtXnIupGNQZXotn7KA2u9LloczdUDTDU6i5vyel pyxYLhk/EO2NcmqNvsUhMnV5AQmt3i0G4X/wKGhsKXL1avILr8mSIPCqUG6RFfbDQB7MpnGHT9wsVp cQasSdKg8ZNoIJzBhiV64wzVF1LDmN5meStwPmrIfdXHD02M3uHizakkrT2k6ueYJNn2a4tr/vVKoU iRwh0w3G9w3jgipmvotAi+BjklTirlbzwjKDUdCWzM60WlU06NkoPlkUgR2q9m+804yz5qTMaenuPO mDGBty0syn+pois9BdJv0UeaAwoksZN74YC6vuHDh/6BnSK4mOisuHLP/D7iby5FHVL4tFHia6FYXq 0+DkRs/a2OqWFTk+o+cN5A+NCwpDIUbZfEc88aigLqitRC4eJuZQEHywIriAiRmy9ImnyJodlammr7 s9AjvCAzf340XzmF6Gumn6IbMHZb4p5hLT33cHlh7X1AjJ8RCiWQU9k8Q+zrAYThzoK/jY88XCaIjI FPn+NFQHv4b/Llf94bia06KjI0S5GgJJFxeSqmSE2wqAdn52bTxowHvRwOUwjeeTpXHaxYnYDXqa7m FXdZOR4Qg1a7dlf/djjyVfSDMq679lEDbrl1aOKGxNSRnLgsYeiMckHq9q8Q== X-Developer-Key: i=memxor@gmail.com; a=openpgp; fpr=4BBE2A7E06ECF9D5823C61114CE0C88648BF11CA X-Patchwork-Delegate: bpf@iogearbox.net Introduce helper macros that wrap around the rqspinlock slow path and provide an interface analogous to the raw_spin_lock API. Note that in case of error conditions, preemption and IRQ disabling is automatically unrolled before returning the error back to the caller. Ensure that in absence of CONFIG_QUEUED_SPINLOCKS support, we fallback to the test-and-set implementation. Add some comments describing the subtle memory ordering logic during unlock, and why it's safe. Signed-off-by: Kumar Kartikeya Dwivedi --- include/asm-generic/rqspinlock.h | 82 ++++++++++++++++++++++++++++++++ 1 file changed, 82 insertions(+) diff --git a/include/asm-generic/rqspinlock.h b/include/asm-generic/rqspinlock.h index f8850f09d0d6..418b652e0249 100644 --- a/include/asm-generic/rqspinlock.h +++ b/include/asm-generic/rqspinlock.h @@ -153,4 +153,86 @@ static __always_inline void release_held_lock_entry(void) this_cpu_dec(rqspinlock_held_locks.cnt); } +#ifdef CONFIG_QUEUED_SPINLOCKS + +/** + * res_spin_lock - acquire a queued spinlock + * @lock: Pointer to queued spinlock structure + */ +static __always_inline int res_spin_lock(rqspinlock_t *lock) +{ + int val = 0; + + if (likely(atomic_try_cmpxchg_acquire(&lock->val, &val, _Q_LOCKED_VAL))) { + grab_held_lock_entry(lock); + return 0; + } + return resilient_queued_spin_lock_slowpath(lock, val); +} + +#else + +#define res_spin_lock(lock) resilient_tas_spin_lock(lock) + +#endif /* CONFIG_QUEUED_SPINLOCKS */ + +static __always_inline void res_spin_unlock(rqspinlock_t *lock) +{ + struct rqspinlock_held *rqh = this_cpu_ptr(&rqspinlock_held_locks); + + if (unlikely(rqh->cnt > RES_NR_HELD)) + goto unlock; + WRITE_ONCE(rqh->locks[rqh->cnt - 1], NULL); +unlock: + /* + * Release barrier, ensures correct ordering. See release_held_lock_entry + * for details. 
Perform release store instead of queued_spin_unlock, + * since we use this function for test-and-set fallback as well. When we + * have CONFIG_QUEUED_SPINLOCKS=n, we clear the full 4-byte lockword. + * + * Like release_held_lock_entry, we can do the release before the dec. + * We simply care about not seeing the 'lock' in our table from a remote + * CPU once the lock has been released, which doesn't rely on the dec. + * + * Unlike smp_wmb(), release is not a two way fence, hence it is + * possible for a inc to move up and reorder with our clearing of the + * entry. This isn't a problem however, as for a misdiagnosis of ABBA, + * the remote CPU needs to hold this lock, which won't be released until + * the store below is done, which would ensure the entry is overwritten + * to NULL, etc. + */ + smp_store_release(&lock->locked, 0); + this_cpu_dec(rqspinlock_held_locks.cnt); +} + +#ifdef CONFIG_QUEUED_SPINLOCKS +#define raw_res_spin_lock_init(lock) ({ *(lock) = (rqspinlock_t)__ARCH_SPIN_LOCK_UNLOCKED; }) +#else +#define raw_res_spin_lock_init(lock) ({ *(lock) = (rqspinlock_t){0}; }) +#endif + +#define raw_res_spin_lock(lock) \ + ({ \ + int __ret; \ + preempt_disable(); \ + __ret = res_spin_lock(lock); \ + if (__ret) \ + preempt_enable(); \ + __ret; \ + }) + +#define raw_res_spin_unlock(lock) ({ res_spin_unlock(lock); preempt_enable(); }) + +#define raw_res_spin_lock_irqsave(lock, flags) \ + ({ \ + int __ret; \ + local_irq_save(flags); \ + __ret = raw_res_spin_lock(lock); \ + if (__ret) \ + local_irq_restore(flags); \ + __ret; \ + }) + +#define raw_res_spin_unlock_irqrestore(lock, flags) ({ raw_res_spin_unlock(lock); local_irq_restore(flags); }) + #endif /* __ASM_GENERIC_RQSPINLOCK_H */ From patchwork Mon Mar 3 15:22:57 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kumar Kartikeya Dwivedi X-Patchwork-Id: 13999035 X-Patchwork-Delegate: bpf@iogearbox.net Received: from mail-wm1-f66.google.com (mail-wm1-f66.google.com [209.85.128.66]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id ACE2122FF4F; Mon, 3 Mar 2025 15:23:33 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.128.66 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1741015415; cv=none; b=SikjabbnpQOL4SIZXILE99OK4O2ECii48BztuHBtT6rn7SfTSCHDyIAtbT6gLXohEa776xS9WYkz/r9E2qlJnVtn0zTsEuyqRJ+uxGqXcyR1oFaSfdkZO9hlU+xazO1KIMmVhn2+mUm6y8c3Y+b/2Agpn+obgAvViNs+Z+rLUvo= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1741015415; c=relaxed/simple; bh=bthuxBncjGOs4mky1mKdWZk3nTI3RWDlXR+RsRusmm0=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=QFdaGRqwKfnKQQSS1YBSBUws9L0Kr//zZtU1Qyw7Cs9ggDSozkExC3S64nPLjt5bEo7alyG34/C23eJCzZZFD2CC8XsBQBOtFLNWcsD/mvW9tcZrB+TnJKg5Ln/9HYbjg8+ouQXppJ3/0MD3dflgHp3RUnxNDWGJ5ZhjAk/BNHo= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=JbMHeOSh; arc=none smtp.client-ip=209.85.128.66 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com 
header.i=@gmail.com header.b="JbMHeOSh" Received: by mail-wm1-f66.google.com with SMTP id 5b1f17b1804b1-4394036c0efso29344875e9.2; Mon, 03 Mar 2025 07:23:33 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1741015412; x=1741620212; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=MKS1T/XQ/tWs8ecRTlu3cl4kKPJmu8O7rwuQ99x/CTQ=; b=JbMHeOShC0Meih8JAt1gcghndF+ApNyi7ZpsZf4kwqWvY31rCG/VKgVJsX+kPGt1It MwV7IfZSCTnQjZwTyfhVm0Q05DEFWH6AjqcREWV6ixLI81SM98HYt+QBkoW929thymdi c7H2AaIeagt+wh4H0GnKlgwMs+h/Vd5AKrdBzr2mlNKzF94MVHJ+AnaCKsOqDqBvuZji /t+Rg86uzVI1S0mLpBjXiFohfduRLo3NPScfPZelqIH7PbSn4iQcvABkCv5s7t/5jKp0 DL1XCOoE9VLst1xoGolpsV+LQ2m0AhDigfv5MbIu6YJfUWtU8l3FOuaBdAtP0iW4zRBO gQkA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1741015412; x=1741620212; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=MKS1T/XQ/tWs8ecRTlu3cl4kKPJmu8O7rwuQ99x/CTQ=; b=YJxd3qT+/pEyuSw9zrxVtt8aR5cLWvHNRLjmcJ+Zp0CMIlNXjzhfzKfD+indHC+jga SbDSpLZE6hVdmieVf618u8BsUcY/g1aqC/9YJp9Fw09hUAD3DQODEJ8w5p2WzaOJ2OMx 2ZV4iSlenIrt3YX5V0GIi8YKPKx3VgEiTvPlkIvCkshMKQ79TV53ZCdspLEr49l29Og+ HyA+f4TYLY/41cdx9BlWPHtWgdVAM+BocvaOzWfCLS4sVtHzSJ3pNnbCk4oheSJep0d/ 47d6v/08IAfBxWfYFkuN2ucWu0TP62ImYKhbpvHOrzUMQXEcPY5BpPWCLziqd7d6lu1j LOug== X-Forwarded-Encrypted: i=1; AJvYcCWhOGx6d/1Vk2nBXyij6YHm/ldVKjNh6L0GV9phX8lN8RVh2/yU2Ggvepo7XXaO1ZGZrWThMVi8jU+In/k=@vger.kernel.org X-Gm-Message-State: AOJu0YwZ2yncphtmBYRwQK4Yeb41r3MNQt3Y2zkxNYs7RegTmIwBKzwt 5azNNhQdojn5CTyWOtBnxMoKFf4zWp6BKZPzdkT5WsBNyIhxhV6sBMIdyJUWhmE= X-Gm-Gg: ASbGnctM8nRsCwExO2rKBwUVTFiMFWF8oQb+H84Xzttl1WYebBdy1OnfInNRRuRKdpi FtD/9www8lmpDpTKd1HF7nP6Qb8/vne2tWac5PYfAmtrVa2QX0EVJ0Tioraoki9P2vbrakQcNn1 HkfazlklT4g4geWkU1CFMwKN1stgd9d7bfbuOPZj1r3iKbj0u5S0lWeCrl4yxTkK5KExfwqUJtB pUo8PYcYQV+/L7JR1b3OyVyCQyFlnO3OwI8orRh+Z+S92ZIYvJ8ihIiSKsKuZbqK0SpIZFeJpD/ wuSyyskwR1GiDzVitLRTzRuxH8XQyk8+aPQ= X-Google-Smtp-Source: AGHT+IGlH16ifEyb4PlL+eWM8UXuzJH348pjxTrHJU3GO0ZnngxsPxkDQyxyCAS30KE/L9MfeFB7Sw== X-Received: by 2002:a05:600c:511e:b0:439:6017:6689 with SMTP id 5b1f17b1804b1-43ba66e0bf5mr109140235e9.9.1741015411800; Mon, 03 Mar 2025 07:23:31 -0800 (PST) Received: from localhost ([2a03:2880:31ff:74::]) by smtp.gmail.com with ESMTPSA id ffacd0b85a97d-390e4796084sm15030999f8f.19.2025.03.03.07.23.31 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 03 Mar 2025 07:23:31 -0800 (PST) From: Kumar Kartikeya Dwivedi To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org Cc: Linus Torvalds , Peter Zijlstra , Will Deacon , Waiman Long , Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , Martin KaFai Lau , Eduard Zingerman , "Paul E. 
McKenney" , Tejun Heo , Barret Rhoden , Josh Don , Dohyun Kim , linux-arm-kernel@lists.infradead.org, kkd@meta.com, kernel-team@meta.com Subject: [PATCH bpf-next v3 17/25] rqspinlock: Add locktorture support Date: Mon, 3 Mar 2025 07:22:57 -0800 Message-ID: <20250303152305.3195648-18-memxor@gmail.com> X-Mailer: git-send-email 2.43.5 In-Reply-To: <20250303152305.3195648-1-memxor@gmail.com> References: <20250303152305.3195648-1-memxor@gmail.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=3149; h=from:subject; bh=bthuxBncjGOs4mky1mKdWZk3nTI3RWDlXR+RsRusmm0=; b=owEBbQKS/ZANAwAIAUzgyIZIvxHKAcsmYgBnxcWYc+Sgz0ORRtjBX9jRAAhbYoGw0HwGVEnt1lkK 0s8wahWJAjMEAAEIAB0WIQRLvip+Buz51YI8YRFM4MiGSL8RygUCZ8XFmAAKCRBM4MiGSL8RylTSD/ 9TP8RtAKolLrilNYH10VbArS3ra9jOCKUgFIRYMn7uqajviRSi77CSgnaVlYnmA9EfkXqnPw6r53W8 inm8T7K1fkgOmvIbYxtsKTKVPyW6c0loQf/rxpgCr7uK34o5YRBK/iSgFyLS4uAE5QbB/Pf9jJ28lv Wtgk/Ey42frf7Uf5RQNdrzPGWDnh/n2149+2agrmt9UNjkHTYYsrn6TOvYBRgxX0kiN60RLQbmXTjC nHD31GAomXGCnNxzEDhY1G2vG7r5C67GzOWT3UMHmKgQdaRcF/Lsw8fkI6g612CABQio63//Nm0QSA Myxwcedn9UdnGHGvM0F3IlXZs2Ub9fwmfoi6C1gmJ3MJiii8TYzxMoVYTXMM5yktxCqQG11WnTNq0l b3CpjATOtu5zhfGJH4mG717bInDxZ1ZmPKWEEGT4Hz0ePktKsfs56WLmTMhKVkRnC8Fa8ztfYGJLdG bQg5bPK0unpb/3Z3pIjZUHVpqVYWKGFYCmKd8AYmN3D7RlJn7HRmZCf56bovKsEgHPk9ZlLamM213o m1cj0p0dLPhfrYItv1LDuMe8Xkd/8Dzp3oWLhMa097fZFVsUhzB6TnuCE56VdlV2ZEvObJHft6x3OR yX8L4HXgOOdKq1WkjxYTYQtu9ZbqUVfB5ojTHrx1qJR9u7pdx19dUvyvMZQg== X-Developer-Key: i=memxor@gmail.com; a=openpgp; fpr=4BBE2A7E06ECF9D5823C61114CE0C88648BF11CA X-Patchwork-Delegate: bpf@iogearbox.net Introduce locktorture support for rqspinlock using the newly added macros as the first in-kernel user and consumer. Guard the code with CONFIG_BPF_SYSCALL ifdef since rqspinlock is not available otherwise. 
Signed-off-by: Kumar Kartikeya Dwivedi --- kernel/locking/locktorture.c | 57 ++++++++++++++++++++++++++++++++++++ kernel/locking/rqspinlock.c | 1 + 2 files changed, 58 insertions(+) diff --git a/kernel/locking/locktorture.c b/kernel/locking/locktorture.c index cc33470f4de9..ce0362f0a871 100644 --- a/kernel/locking/locktorture.c +++ b/kernel/locking/locktorture.c @@ -362,6 +362,60 @@ static struct lock_torture_ops raw_spin_lock_irq_ops = { .name = "raw_spin_lock_irq" }; +#ifdef CONFIG_BPF_SYSCALL + +#include +static rqspinlock_t rqspinlock; + +static int torture_raw_res_spin_write_lock(int tid __maybe_unused) +{ + raw_res_spin_lock(&rqspinlock); + return 0; +} + +static void torture_raw_res_spin_write_unlock(int tid __maybe_unused) +{ + raw_res_spin_unlock(&rqspinlock); +} + +static struct lock_torture_ops raw_res_spin_lock_ops = { + .writelock = torture_raw_res_spin_write_lock, + .write_delay = torture_spin_lock_write_delay, + .task_boost = torture_rt_boost, + .writeunlock = torture_raw_res_spin_write_unlock, + .readlock = NULL, + .read_delay = NULL, + .readunlock = NULL, + .name = "raw_res_spin_lock" +}; + +static int torture_raw_res_spin_write_lock_irq(int tid __maybe_unused) +{ + unsigned long flags; + + raw_res_spin_lock_irqsave(&rqspinlock, flags); + cxt.cur_ops->flags = flags; + return 0; +} + +static void torture_raw_res_spin_write_unlock_irq(int tid __maybe_unused) +{ + raw_res_spin_unlock_irqrestore(&rqspinlock, cxt.cur_ops->flags); +} + +static struct lock_torture_ops raw_res_spin_lock_irq_ops = { + .writelock = torture_raw_res_spin_write_lock_irq, + .write_delay = torture_spin_lock_write_delay, + .task_boost = torture_rt_boost, + .writeunlock = torture_raw_res_spin_write_unlock_irq, + .readlock = NULL, + .read_delay = NULL, + .readunlock = NULL, + .name = "raw_res_spin_lock_irq" +}; + +#endif + static DEFINE_RWLOCK(torture_rwlock); static int torture_rwlock_write_lock(int tid __maybe_unused) @@ -1168,6 +1222,9 @@ static int __init lock_torture_init(void) &lock_busted_ops, &spin_lock_ops, &spin_lock_irq_ops, &raw_spin_lock_ops, &raw_spin_lock_irq_ops, +#ifdef CONFIG_BPF_SYSCALL + &raw_res_spin_lock_ops, &raw_res_spin_lock_irq_ops, +#endif &rw_lock_ops, &rw_lock_irq_ops, &mutex_lock_ops, &ww_mutex_lock_ops, diff --git a/kernel/locking/rqspinlock.c b/kernel/locking/rqspinlock.c index 3b4fdb183588..0031a1bfbd4e 100644 --- a/kernel/locking/rqspinlock.c +++ b/kernel/locking/rqspinlock.c @@ -85,6 +85,7 @@ struct rqspinlock_timeout { #define RES_TIMEOUT_VAL 2 DEFINE_PER_CPU_ALIGNED(struct rqspinlock_held, rqspinlock_held_locks); +EXPORT_SYMBOL_GPL(rqspinlock_held_locks); static bool is_lock_released(rqspinlock_t *lock, u32 mask, struct rqspinlock_timeout *ts) { From patchwork Mon Mar 3 15:22:58 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kumar Kartikeya Dwivedi X-Patchwork-Id: 13999036 X-Patchwork-Delegate: bpf@iogearbox.net Received: from mail-wr1-f68.google.com (mail-wr1-f68.google.com [209.85.221.68]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 47167231CAE; Mon, 3 Mar 2025 15:23:35 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.221.68 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1741015417; cv=none; 
b=JvCPiSpQpr2myJxr3vNmr6Jtb3JAYsouwbs8AVhf8uKBhVEoZgm/jsJkAD6YGxKlkbQCMuFO7ojZw2JjMvnkSwRqZOhtCxUYiwDX6I+bS0gS0ZAuRmzsxDQkJ29vu0EG3BdJxJVfndo8fT09bxsEopjFp2WhXMgr+jQmZNeQq9E= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1741015417; c=relaxed/simple; bh=qSIKPInVzZ0EbqaTg/PJPPvBSQOZHHVdC+CsMy/ps90=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=BLCOhj+aI+MAbRb89H8F1BZ/Bhw71RG7PNwrhT1hKllfeUgkcGAD7061ZoBvLAR4EhVqYYaKnAkzVz0fwJ5eXGM441XGkyo2WKmGBEDL4X3IaPMYsCOH+5BSUDP/3bgKRmrXTcGIYeBplP9Lvse5ioha6qucs5T+SVnJBxS2WnE= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=TnBKoaiU; arc=none smtp.client-ip=209.85.221.68 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="TnBKoaiU" Received: by mail-wr1-f68.google.com with SMTP id ffacd0b85a97d-390ec7c2cd8so2125421f8f.1; Mon, 03 Mar 2025 07:23:35 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1741015413; x=1741620213; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=7Uc01JwRLTx4LvOAU9cv3DgvL5O0RF1rRGwSEjckiRA=; b=TnBKoaiUodbGA0uOVg4hxvU7srRK28WThtS/o9qLwC/0es2dBI14SDdv+SIkMcSDTO 1DkAcYhnM5L9NB4lgQLPInEHQWStaVwO5+VYMZkjvOIGyBTH0OW1xzgoJOcMYXb1kpwt oOO9PGlaLBB1hymbXGnKLGHSY/7d05UgoXy1uCXGe1jRdKc4Ik5Y/DPvCVhg4HBJy5pk ifCF6z7ckKETo0SVgdTtvl3fMxuQ68uWBM2gIyOxGFARWa2mwubUySI40Nae5g31UQha LkF3l/Tl/Kpylj6811E10PKPtANvTt0pt0Kyc7lZk++nEn14ti8Jpz9CRBq8DYL01tTf bofg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1741015413; x=1741620213; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=7Uc01JwRLTx4LvOAU9cv3DgvL5O0RF1rRGwSEjckiRA=; b=opNI5Kfqr4F6+co2uAT0bLkCbP4tvt6vK/K9cI1eKmKvqEtrIZGb5bVqC2CHd2VuPE qHj2q5/T+EaaewLsWE5U9kJfsDiBQ4SGGOvhjyTlm7/W+5IKUgLkl/eeSQlN6e7E73sz xkmU+1qpLueJaMOpriGkKYWAttlCVRlQm59qTx9nU0bLvGjAIg39zT6Qxvzc0HpPQk9D 9VI56ztbD+r9nJ4gN2Hc/3H1d9K6Xjl6VcTs2WnF+E7Fq462pj/nUSrXVXfS/4qAB3Uh KA+ejEEVmmxkYurwwl3wxeehwi+rD5bOvUNV1pUluRWk1vuvelxvTyFvxVANFrejSB9A /dSg== X-Forwarded-Encrypted: i=1; AJvYcCU64FhroOy9q5SbfqrjxWIjms0k78Rc4eBuKierI6HpevI6iwLGVXcUwaCJMlr0AdIMZFTjJo5WDvzzBFk=@vger.kernel.org X-Gm-Message-State: AOJu0Yyqqroey/eRynecUXY5/a/YlgFOzHvd9lizgIryqrr4cY5svPXP MEXvkbjI7YnIaKTA+L5SNsRnggAIb0UpnVQMOPVElDULFBxm6DCq7BkOMKPRRXs= X-Gm-Gg: ASbGnct7ZHyxTB/N/xeLYgLJvL2ISnBTKe3KPme8y1coIGPmgi3BQ5OVK/d7pnYL130 +QC0wLnmFS0LeZHR70CpIJeef1/Ll6lNOTUZNoQfa9/FNDxFZNw+rUlE+WH49ddjbNzvxHzf7MU xOVpgdUjD6M7EShQBemCbyTfwcXQOArsjVyUpFuSeKPw7jgUl6W1iYBYW93agXwZAKnmPEjq42R OcK7wXu4jTOyp0NPvSfLJTMusIdSpnaoKSSwwlotKWDQjngRxXZv/GhrtKcqeOE/IKG2dp+hWuT NMfHbkXBd3wmcU0oQi+r0kdfxminMhbqFQ== X-Google-Smtp-Source: AGHT+IE38gK5wJECZxFlCYL7vqyBrElWlJLLivqEZ8lVMAciRT4pe2f0X2OlAMW3qevmIMhBs6VcWg== X-Received: by 2002:a5d:6489:0:b0:390:f641:d8bb with SMTP id ffacd0b85a97d-390f641d990mr8952175f8f.36.1741015412881; Mon, 03 Mar 2025 07:23:32 -0800 (PST) Received: from 
localhost ([2a03:2880:31ff:1::]) by smtp.gmail.com with ESMTPSA id ffacd0b85a97d-390e4795d1asm14571262f8f.4.2025.03.03.07.23.32 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 03 Mar 2025 07:23:32 -0800 (PST) From: Kumar Kartikeya Dwivedi To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org Cc: Linus Torvalds , Peter Zijlstra , Will Deacon , Waiman Long , Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , Martin KaFai Lau , Eduard Zingerman , "Paul E. McKenney" , Tejun Heo , Barret Rhoden , Josh Don , Dohyun Kim , linux-arm-kernel@lists.infradead.org, kkd@meta.com, kernel-team@meta.com Subject: [PATCH bpf-next v3 18/25] rqspinlock: Add entry to Makefile, MAINTAINERS Date: Mon, 3 Mar 2025 07:22:58 -0800 Message-ID: <20250303152305.3195648-19-memxor@gmail.com> X-Mailer: git-send-email 2.43.5 In-Reply-To: <20250303152305.3195648-1-memxor@gmail.com> References: <20250303152305.3195648-1-memxor@gmail.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=2083; h=from:subject; bh=qSIKPInVzZ0EbqaTg/PJPPvBSQOZHHVdC+CsMy/ps90=; b=owEBbQKS/ZANAwAIAUzgyIZIvxHKAcsmYgBnxcWYI9HLNzIFosVfpo/eAGZLrsdbAXI02uiFuv5j gAn43fCJAjMEAAEIAB0WIQRLvip+Buz51YI8YRFM4MiGSL8RygUCZ8XFmAAKCRBM4MiGSL8RyraqD/ 429SlE3+raZFobBUUclVgrmx/5S5ghp0W6qHlpRJg15mqb/wQ7Nm7MfnP8Mt/cd/yugjieC5xXaIZZ 2gRGy1o4SiUDG91z7FYRxs5REy+twKo6/1wchd494MGNac2afJXVeogt88nnikDfs98ML1W3w4kR3r djShNfhU1YlT5UB9SixxXZsaVabsgXdlD37rG/rItcXLa2S3En21E7gq3ApgqdOWZE8a6/JpEG8kW4 1oXfdGyOuubLdfXTC9F1nuSlIdMSGnFINmAGPeE0mrtgS74cGYxE1pMpplfJMozQabn7YjTuU03xNZ 11p1zrVmvfoac73XycC6N+KM+AcpskEIKpjnp+SKHkhbf4PG513KLKQYTz3VNWwfUHh5AKOH6FYBPn sZ9LhuZ5FHOn+bfFHRURJQOjSlOHS088r968BsyQsVY1qv9RVyVM8VbqXS/C+frx/mSypWU0V8xVte r5NQyi+CfLnuOjZe5fyJo34b4gNNn+J/hqFR486kfOAD9hBD9pjSNnaD7P8U1z/36+m57yipbTlO3B QKtFBqPHAvNwGSaCkhb+JWZfcH7pd2jC83wadWyvMeixxn0WY4FwIP1tLyCP6lxkcbqNUICVYdOfqn LQ9KXmI+sQGaXI4NJ2OV88iI9YE445UWW9V+2WXlW683DUBbYbB8AjtUHjIg== X-Developer-Key: i=memxor@gmail.com; a=openpgp; fpr=4BBE2A7E06ECF9D5823C61114CE0C88648BF11CA X-Patchwork-Delegate: bpf@iogearbox.net Ensure that rqspinlock is built when qspinlock support and BPF subsystem is enabled. Also, add the file under the BPF MAINTAINERS entry so that all patches changing code in the file end up Cc'ing bpf@vger and the maintainers/reviewers. Ensure that the rqspinlock code is only built when the BPF subsystem is compiled in. Depending on queued spinlock support, we may or may not end up building the queued spinlock slowpath, and instead fallback to the test-and-set implementation. 
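Restating the resulting build matrix in one place (a summary of the text above and the Makefile hunk below, not new behaviour):

	CONFIG_BPF_SYSCALL=n                               rqspinlock.o is not built
	CONFIG_BPF_SYSCALL=y, CONFIG_QUEUED_SPINLOCKS=y    resilient queued slowpath is built
	CONFIG_BPF_SYSCALL=y, CONFIG_QUEUED_SPINLOCKS=n    only the test-and-set fallback is built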
Signed-off-by: Kumar Kartikeya Dwivedi --- MAINTAINERS | 3 +++ include/asm-generic/Kbuild | 1 + kernel/locking/Makefile | 1 + 3 files changed, 5 insertions(+) diff --git a/MAINTAINERS b/MAINTAINERS index 3864d473f52f..b0179ef867eb 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -4297,6 +4297,9 @@ F: include/uapi/linux/filter.h F: kernel/bpf/ F: kernel/trace/bpf_trace.c F: lib/buildid.c +F: arch/*/include/asm/rqspinlock.h +F: include/asm-generic/rqspinlock.h +F: kernel/locking/rqspinlock.c F: lib/test_bpf.c F: net/bpf/ F: net/core/filter.c diff --git a/include/asm-generic/Kbuild b/include/asm-generic/Kbuild index 1b43c3a77012..8675b7b4ad23 100644 --- a/include/asm-generic/Kbuild +++ b/include/asm-generic/Kbuild @@ -45,6 +45,7 @@ mandatory-y += pci.h mandatory-y += percpu.h mandatory-y += pgalloc.h mandatory-y += preempt.h +mandatory-y += rqspinlock.h mandatory-y += runtime-const.h mandatory-y += rwonce.h mandatory-y += sections.h diff --git a/kernel/locking/Makefile b/kernel/locking/Makefile index 0db4093d17b8..5645e9029bc0 100644 --- a/kernel/locking/Makefile +++ b/kernel/locking/Makefile @@ -24,6 +24,7 @@ obj-$(CONFIG_SMP) += spinlock.o obj-$(CONFIG_LOCK_SPIN_ON_OWNER) += osq_lock.o obj-$(CONFIG_PROVE_LOCKING) += spinlock.o obj-$(CONFIG_QUEUED_SPINLOCKS) += qspinlock.o +obj-$(CONFIG_BPF_SYSCALL) += rqspinlock.o obj-$(CONFIG_RT_MUTEXES) += rtmutex_api.o obj-$(CONFIG_PREEMPT_RT) += spinlock_rt.o ww_rt_mutex.o obj-$(CONFIG_DEBUG_SPINLOCK) += spinlock.o From patchwork Mon Mar 3 15:22:59 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kumar Kartikeya Dwivedi X-Patchwork-Id: 13999037 X-Patchwork-Delegate: bpf@iogearbox.net Received: from mail-wm1-f68.google.com (mail-wm1-f68.google.com [209.85.128.68]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 9BCB8233D8C; Mon, 3 Mar 2025 15:23:36 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.128.68 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1741015418; cv=none; b=QGQMVXH3l2hgiQmHmXvCqAz5Xj4MvXx9+nVVppe77aqZqqGO3yj7y7lbbXOSaSyNkab1bdkJhoSqZPa+SUjGE51CN351e9iGDIXTT2UPO3ywP3ESaBgNzuMcah3f517FcGOV3bsCU/Z9rtXnXfSgbCJqFu3a+KUaWlwotFGVhbQ= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1741015418; c=relaxed/simple; bh=4ih0G/0286de3X2HOwCgeQ8ZZwbokSdXkcTQRkadVk8=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=h45zHW78kfpDOEQ4j+AjLBMXS1EuTQ2ZXCe7MwsYg4/OICSM/UrHvaZInmaD7XKp6B+XTu3e71cn7S/KXwlWyN+EGoIkg9glfZO6RRaJlQ8o7x/WmG7tzdeg9z4oS1f4RnLxEGFmuG0QqNbbPdOHm3cYl46R47y8zjAeEOAQW90= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=MZYk0jzh; arc=none smtp.client-ip=209.85.128.68 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="MZYk0jzh" Received: by mail-wm1-f68.google.com with SMTP id 5b1f17b1804b1-4394a823036so43939805e9.0; Mon, 03 Mar 2025 07:23:36 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; 
s=20230601; t=1741015414; x=1741620214; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=jp6uG/E/DcwB9A5vS09y6QoqRf0wNeVJjI9fKDMNnPg=; b=MZYk0jzhbj9BL4/oHqAaDY9k0ixarFj+A9KBZzJ2IOIFajWn8p3t0oOtLcr6EQiGD/ 8yelWg8C9D18uqpR82g8AsySYnpmjw0TAegVKsCJxJPrZjSflzMd8WDh0Gsns/iwTM1t tNXRDY6PgLvG+/rwME7GKKM8M6o15TQOr8EGjLTsytjKJ/IaSKOqcvNqYFoaK69BwaNL obPgxNy24YpW+QCNRn3e5U/PBAd5TrMDV/nqFum7yr2xwFgOKgvIABMKDT418tn0nSOP BfcTWptL3anBd3njGDFbHtnsOrweJc0amCzLvcliCAGyMsZr8QhqjhoNNhXvuAnBMGVA l/2w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1741015414; x=1741620214; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=jp6uG/E/DcwB9A5vS09y6QoqRf0wNeVJjI9fKDMNnPg=; b=QlNyQJNPl+/x9pSZCO9FpJSftDTWk4zvXBFg2K4qTJnNlfCDGtOGOL0+o0gcEY+jCl T30hhbM5EveD53ZJX4sAiOeXgm1JmgJON0RpHc4V/DOUcv3MmNcuqVOeHa6gMcoMTkPM EPXN3EucYzhQXndnx/ADK0pAFyJAvdeuN27YFRDRCisA7pXJ8A6+cwsSo0tw4d1KsUs6 zZkrvZRwCAicOGPdO1WEysaWHKzG6lc+eAfJLR5SVrduLFlEa9t97gRZaGtryRQTvTF6 DN+zJmSMqtni4l9nvzLl9w6mAiSs7SFlgoT+j+D7cYhwmt/hfKvA2r9AnuQPden7aXwr M2/Q== X-Forwarded-Encrypted: i=1; AJvYcCU+54MumYyjdaka5MAAneN7ppSSKis4TmbU7HBCCo7QuD6gjpXpLZjQitG61QVS3RNSjEv0qai7WPE4MTs=@vger.kernel.org X-Gm-Message-State: AOJu0YzapEr8gCGAvskdTUMUGkD90u6LLntcP8tp+ExpQtRzxhpdbjx5 qt/gKvI3WliNV4fCF6YuUUu9g6lC/Ey6W1W9gfLnIU7TAUr44fyJ2iRKZSHcY3E= X-Gm-Gg: ASbGncsVArEx3JCMwAEf+pwqCzfPhr5+2GvoeMODu7fAFyXsmmbjDKYsbrnnEFKzrdQ Z0D3z3kBlCw863LRcGE1ZE6O0M9FTcCKLIw9zPgLr+n2wdOdj/n+nkZGwBJEyqMvPv1fAGrB1U/ m6ZVWfd0i7PcS0n1fC+6zXAzp1lZcVjmR+DGAd1M7gS9LkZ4xXKupHJsaYpPNB6quo6eM2pMF7Q WCYI3wzfLBtAqNJmrGkSCQIr9fJsr+a5bxJ6afOdK8xQdN9wC0Iwo5trvfL+RWZZgmfY2GNdF6a uhKna14cqRANBGH+7dtBCXX5HC5oVl4= X-Google-Smtp-Source: AGHT+IEbl3LTjrdqPSH5d3kkpLqYi842EMyLleWiCvMkmyw1mm4oqWLsvTK8TF5jjGJQgV7w+8Emkg== X-Received: by 2002:a05:600c:19ce:b0:439:99e6:2ab with SMTP id 5b1f17b1804b1-43ba6766b44mr95692715e9.28.1741015414338; Mon, 03 Mar 2025 07:23:34 -0800 (PST) Received: from localhost ([2a03:2880:31ff::]) by smtp.gmail.com with ESMTPSA id 5b1f17b1804b1-43bad347823sm110368705e9.0.2025.03.03.07.23.33 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 03 Mar 2025 07:23:33 -0800 (PST) From: Kumar Kartikeya Dwivedi To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org Cc: Linus Torvalds , Peter Zijlstra , Will Deacon , Waiman Long , Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , Martin KaFai Lau , Eduard Zingerman , "Paul E. 
McKenney" , Tejun Heo , Barret Rhoden , Josh Don , Dohyun Kim , linux-arm-kernel@lists.infradead.org, kkd@meta.com, kernel-team@meta.com Subject: [PATCH bpf-next v3 19/25] bpf: Convert hashtab.c to rqspinlock Date: Mon, 3 Mar 2025 07:22:59 -0800 Message-ID: <20250303152305.3195648-20-memxor@gmail.com> X-Mailer: git-send-email 2.43.5 In-Reply-To: <20250303152305.3195648-1-memxor@gmail.com> References: <20250303152305.3195648-1-memxor@gmail.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=11131; h=from:subject; bh=4ih0G/0286de3X2HOwCgeQ8ZZwbokSdXkcTQRkadVk8=; b=owEBbQKS/ZANAwAIAUzgyIZIvxHKAcsmYgBnxcWYUEPdLFj8sZgqLb5zn7VbigpDGWSBgEbqM3XA x/fwLJeJAjMEAAEIAB0WIQRLvip+Buz51YI8YRFM4MiGSL8RygUCZ8XFmAAKCRBM4MiGSL8Ryv8LD/ 97jXIjkq7Lj++KTQTcQoBHp5m3OgRssDYOiFDubICyCP6RJL9F3I1YPvZ3V7aYFcJJBEuTYglkhD3z vdEzq0g+0G/b/2vgbS8q/Q6tlI4W54PihyS+qwGrnVz3uJPq2Vh8nK0kcPwLJzfLzM9O0weSfEcFz3 WN5AmPXgB759SWiegOPd5FlfEdvcgoDzQHqKNy5Oc4QGYqMMxfoe1BajpDdGIKckYE1k/Q3jD4Ika7 0Ocsr3XMKJdKzWaZ0++mS0Xg1WPgVlB28K08SYP/VcJv8ymSdntzx3T2FCD5ezdoG+8lXnrnhFKkSq JPUZizhxMzGDID+hebcv5VroikizRXHUOZP+OdlHq3FVRtQcu9YmttvUI/AYkCqVIlchD1WcW7nPg3 wBl+QpZfzhAmz3RlwPTacFVnw92AeQLJ/wEgyufOlTdQkFBwITR04JWL46ibiZugmnV5nF/HRctNPR z8cV3U0H2O39bk3dQ7m7Gb17vyUwD+31Bu/9WsIU+oFd9vj3uWEWCMCk7SbIHD+lTSHwZ9j90+KCKm 2aohPllCBBFhVPuDHR3hlKDDoNV1hA1lVZZIbepftZGRQYKUeoQQdxjHTha7P81H3VVeD0xhU+HmJN uqimoBk7BWk8BQ4E3HYoNBJ1UHaIZO4UP5kjM4Il4c+PzynID4cM955BCL0A== X-Developer-Key: i=memxor@gmail.com; a=openpgp; fpr=4BBE2A7E06ECF9D5823C61114CE0C88648BF11CA X-Patchwork-Delegate: bpf@iogearbox.net Convert hashtab.c from raw_spinlock to rqspinlock, and drop the hashed per-cpu counter crud from the code base which is no longer necessary. 
Closes: https://lore.kernel.org/bpf/675302fd.050a0220.2477f.0004.GAE@google.com Closes: https://lore.kernel.org/bpf/000000000000b3e63e061eed3f6b@google.com Signed-off-by: Kumar Kartikeya Dwivedi --- kernel/bpf/hashtab.c | 102 ++++++++++++++----------------------------- 1 file changed, 32 insertions(+), 70 deletions(-) diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c index c308300fc72f..93d45812bb6a 100644 --- a/kernel/bpf/hashtab.c +++ b/kernel/bpf/hashtab.c @@ -16,6 +16,7 @@ #include "bpf_lru_list.h" #include "map_in_map.h" #include +#include #define HTAB_CREATE_FLAG_MASK \ (BPF_F_NO_PREALLOC | BPF_F_NO_COMMON_LRU | BPF_F_NUMA_NODE | \ @@ -78,7 +79,7 @@ */ struct bucket { struct hlist_nulls_head head; - raw_spinlock_t raw_lock; + rqspinlock_t raw_lock; }; #define HASHTAB_MAP_LOCK_COUNT 8 @@ -104,8 +105,6 @@ struct bpf_htab { u32 n_buckets; /* number of hash buckets */ u32 elem_size; /* size of each element in bytes */ u32 hashrnd; - struct lock_class_key lockdep_key; - int __percpu *map_locked[HASHTAB_MAP_LOCK_COUNT]; }; /* each htab element is struct htab_elem + key + value */ @@ -140,45 +139,26 @@ static void htab_init_buckets(struct bpf_htab *htab) for (i = 0; i < htab->n_buckets; i++) { INIT_HLIST_NULLS_HEAD(&htab->buckets[i].head, i); - raw_spin_lock_init(&htab->buckets[i].raw_lock); - lockdep_set_class(&htab->buckets[i].raw_lock, - &htab->lockdep_key); + raw_res_spin_lock_init(&htab->buckets[i].raw_lock); cond_resched(); } } -static inline int htab_lock_bucket(const struct bpf_htab *htab, - struct bucket *b, u32 hash, - unsigned long *pflags) +static inline int htab_lock_bucket(struct bucket *b, unsigned long *pflags) { unsigned long flags; + int ret; - hash = hash & min_t(u32, HASHTAB_MAP_LOCK_MASK, htab->n_buckets - 1); - - preempt_disable(); - local_irq_save(flags); - if (unlikely(__this_cpu_inc_return(*(htab->map_locked[hash])) != 1)) { - __this_cpu_dec(*(htab->map_locked[hash])); - local_irq_restore(flags); - preempt_enable(); - return -EBUSY; - } - - raw_spin_lock(&b->raw_lock); + ret = raw_res_spin_lock_irqsave(&b->raw_lock, flags); + if (ret) + return ret; *pflags = flags; - return 0; } -static inline void htab_unlock_bucket(const struct bpf_htab *htab, - struct bucket *b, u32 hash, - unsigned long flags) +static inline void htab_unlock_bucket(struct bucket *b, unsigned long flags) { - hash = hash & min_t(u32, HASHTAB_MAP_LOCK_MASK, htab->n_buckets - 1); - raw_spin_unlock(&b->raw_lock); - __this_cpu_dec(*(htab->map_locked[hash])); - local_irq_restore(flags); - preempt_enable(); + raw_res_spin_unlock_irqrestore(&b->raw_lock, flags); } static bool htab_lru_map_delete_node(void *arg, struct bpf_lru_node *node); @@ -483,14 +463,12 @@ static struct bpf_map *htab_map_alloc(union bpf_attr *attr) bool percpu_lru = (attr->map_flags & BPF_F_NO_COMMON_LRU); bool prealloc = !(attr->map_flags & BPF_F_NO_PREALLOC); struct bpf_htab *htab; - int err, i; + int err; htab = bpf_map_area_alloc(sizeof(*htab), NUMA_NO_NODE); if (!htab) return ERR_PTR(-ENOMEM); - lockdep_register_key(&htab->lockdep_key); - bpf_map_init_from_attr(&htab->map, attr); if (percpu_lru) { @@ -536,15 +514,6 @@ static struct bpf_map *htab_map_alloc(union bpf_attr *attr) if (!htab->buckets) goto free_elem_count; - for (i = 0; i < HASHTAB_MAP_LOCK_COUNT; i++) { - htab->map_locked[i] = bpf_map_alloc_percpu(&htab->map, - sizeof(int), - sizeof(int), - GFP_USER); - if (!htab->map_locked[i]) - goto free_map_locked; - } - if (htab->map.map_flags & BPF_F_ZERO_SEED) htab->hashrnd = 0; else @@ -607,15 +576,12 @@ static struct 
bpf_map *htab_map_alloc(union bpf_attr *attr) free_map_locked: if (htab->use_percpu_counter) percpu_counter_destroy(&htab->pcount); - for (i = 0; i < HASHTAB_MAP_LOCK_COUNT; i++) - free_percpu(htab->map_locked[i]); bpf_map_area_free(htab->buckets); bpf_mem_alloc_destroy(&htab->pcpu_ma); bpf_mem_alloc_destroy(&htab->ma); free_elem_count: bpf_map_free_elem_count(&htab->map); free_htab: - lockdep_unregister_key(&htab->lockdep_key); bpf_map_area_free(htab); return ERR_PTR(err); } @@ -817,7 +783,7 @@ static bool htab_lru_map_delete_node(void *arg, struct bpf_lru_node *node) b = __select_bucket(htab, tgt_l->hash); head = &b->head; - ret = htab_lock_bucket(htab, b, tgt_l->hash, &flags); + ret = htab_lock_bucket(b, &flags); if (ret) return false; @@ -828,7 +794,7 @@ static bool htab_lru_map_delete_node(void *arg, struct bpf_lru_node *node) break; } - htab_unlock_bucket(htab, b, tgt_l->hash, flags); + htab_unlock_bucket(b, flags); if (l == tgt_l) check_and_free_fields(htab, l); @@ -1147,7 +1113,7 @@ static long htab_map_update_elem(struct bpf_map *map, void *key, void *value, */ } - ret = htab_lock_bucket(htab, b, hash, &flags); + ret = htab_lock_bucket(b, &flags); if (ret) return ret; @@ -1198,7 +1164,7 @@ static long htab_map_update_elem(struct bpf_map *map, void *key, void *value, check_and_free_fields(htab, l_old); } } - htab_unlock_bucket(htab, b, hash, flags); + htab_unlock_bucket(b, flags); if (l_old) { if (old_map_ptr) map->ops->map_fd_put_ptr(map, old_map_ptr, true); @@ -1207,7 +1173,7 @@ static long htab_map_update_elem(struct bpf_map *map, void *key, void *value, } return 0; err: - htab_unlock_bucket(htab, b, hash, flags); + htab_unlock_bucket(b, flags); return ret; } @@ -1254,7 +1220,7 @@ static long htab_lru_map_update_elem(struct bpf_map *map, void *key, void *value copy_map_value(&htab->map, l_new->key + round_up(map->key_size, 8), value); - ret = htab_lock_bucket(htab, b, hash, &flags); + ret = htab_lock_bucket(b, &flags); if (ret) goto err_lock_bucket; @@ -1275,7 +1241,7 @@ static long htab_lru_map_update_elem(struct bpf_map *map, void *key, void *value ret = 0; err: - htab_unlock_bucket(htab, b, hash, flags); + htab_unlock_bucket(b, flags); err_lock_bucket: if (ret) @@ -1312,7 +1278,7 @@ static long __htab_percpu_map_update_elem(struct bpf_map *map, void *key, b = __select_bucket(htab, hash); head = &b->head; - ret = htab_lock_bucket(htab, b, hash, &flags); + ret = htab_lock_bucket(b, &flags); if (ret) return ret; @@ -1337,7 +1303,7 @@ static long __htab_percpu_map_update_elem(struct bpf_map *map, void *key, } ret = 0; err: - htab_unlock_bucket(htab, b, hash, flags); + htab_unlock_bucket(b, flags); return ret; } @@ -1378,7 +1344,7 @@ static long __htab_lru_percpu_map_update_elem(struct bpf_map *map, void *key, return -ENOMEM; } - ret = htab_lock_bucket(htab, b, hash, &flags); + ret = htab_lock_bucket(b, &flags); if (ret) goto err_lock_bucket; @@ -1402,7 +1368,7 @@ static long __htab_lru_percpu_map_update_elem(struct bpf_map *map, void *key, } ret = 0; err: - htab_unlock_bucket(htab, b, hash, flags); + htab_unlock_bucket(b, flags); err_lock_bucket: if (l_new) { bpf_map_dec_elem_count(&htab->map); @@ -1444,7 +1410,7 @@ static long htab_map_delete_elem(struct bpf_map *map, void *key) b = __select_bucket(htab, hash); head = &b->head; - ret = htab_lock_bucket(htab, b, hash, &flags); + ret = htab_lock_bucket(b, &flags); if (ret) return ret; @@ -1454,7 +1420,7 @@ static long htab_map_delete_elem(struct bpf_map *map, void *key) else ret = -ENOENT; - htab_unlock_bucket(htab, b, hash, 
flags); + htab_unlock_bucket(b, flags); if (l) free_htab_elem(htab, l); @@ -1480,7 +1446,7 @@ static long htab_lru_map_delete_elem(struct bpf_map *map, void *key) b = __select_bucket(htab, hash); head = &b->head; - ret = htab_lock_bucket(htab, b, hash, &flags); + ret = htab_lock_bucket(b, &flags); if (ret) return ret; @@ -1491,7 +1457,7 @@ static long htab_lru_map_delete_elem(struct bpf_map *map, void *key) else ret = -ENOENT; - htab_unlock_bucket(htab, b, hash, flags); + htab_unlock_bucket(b, flags); if (l) htab_lru_push_free(htab, l); return ret; @@ -1558,7 +1524,6 @@ static void htab_map_free_timers_and_wq(struct bpf_map *map) static void htab_map_free(struct bpf_map *map) { struct bpf_htab *htab = container_of(map, struct bpf_htab, map); - int i; /* bpf_free_used_maps() or close(map_fd) will trigger this map_free callback. * bpf_free_used_maps() is called after bpf prog is no longer executing. @@ -1583,9 +1548,6 @@ static void htab_map_free(struct bpf_map *map) bpf_mem_alloc_destroy(&htab->ma); if (htab->use_percpu_counter) percpu_counter_destroy(&htab->pcount); - for (i = 0; i < HASHTAB_MAP_LOCK_COUNT; i++) - free_percpu(htab->map_locked[i]); - lockdep_unregister_key(&htab->lockdep_key); bpf_map_area_free(htab); } @@ -1628,7 +1590,7 @@ static int __htab_map_lookup_and_delete_elem(struct bpf_map *map, void *key, b = __select_bucket(htab, hash); head = &b->head; - ret = htab_lock_bucket(htab, b, hash, &bflags); + ret = htab_lock_bucket(b, &bflags); if (ret) return ret; @@ -1665,7 +1627,7 @@ static int __htab_map_lookup_and_delete_elem(struct bpf_map *map, void *key, hlist_nulls_del_rcu(&l->hash_node); out_unlock: - htab_unlock_bucket(htab, b, hash, bflags); + htab_unlock_bucket(b, bflags); if (l) { if (is_lru_map) @@ -1787,7 +1749,7 @@ __htab_map_lookup_and_delete_batch(struct bpf_map *map, head = &b->head; /* do not grab the lock unless need it (bucket_cnt > 0). */ if (locked) { - ret = htab_lock_bucket(htab, b, batch, &flags); + ret = htab_lock_bucket(b, &flags); if (ret) { rcu_read_unlock(); bpf_enable_instrumentation(); @@ -1810,7 +1772,7 @@ __htab_map_lookup_and_delete_batch(struct bpf_map *map, /* Note that since bucket_cnt > 0 here, it is implicit * that the locked was grabbed, so release it. */ - htab_unlock_bucket(htab, b, batch, flags); + htab_unlock_bucket(b, flags); rcu_read_unlock(); bpf_enable_instrumentation(); goto after_loop; @@ -1821,7 +1783,7 @@ __htab_map_lookup_and_delete_batch(struct bpf_map *map, /* Note that since bucket_cnt > 0 here, it is implicit * that the locked was grabbed, so release it. 
*/ - htab_unlock_bucket(htab, b, batch, flags); + htab_unlock_bucket(b, flags); rcu_read_unlock(); bpf_enable_instrumentation(); kvfree(keys); @@ -1884,7 +1846,7 @@ __htab_map_lookup_and_delete_batch(struct bpf_map *map, dst_val += value_size; } - htab_unlock_bucket(htab, b, batch, flags); + htab_unlock_bucket(b, flags); locked = false; while (node_to_free) { From patchwork Mon Mar 3 15:23:00 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kumar Kartikeya Dwivedi X-Patchwork-Id: 13999038 X-Patchwork-Delegate: bpf@iogearbox.net Received: from mail-wm1-f68.google.com (mail-wm1-f68.google.com [209.85.128.68]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id DABAF2356A6; Mon, 3 Mar 2025 15:23:37 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.128.68 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1741015419; cv=none; b=UkqY6Px38VXsDl9draMxZwDGXYxh5xLxmdnQyzdSsvHKbmkh6ml0CILcYzOBm4J6nCD9YlVCWTg5YLxbVFu41LccP23cC3BqxSN4WQidkew9p7JoN2f0+YGFFxCUc/5A9Oodu0SLKNk3WQSX34ikCxYQafdVuAw/O7PIkHfs8oQ= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1741015419; c=relaxed/simple; bh=at8Ekv6S0aGbB9KEki4TPKeyEB0J4bUZMvZ4q+vMLKI=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=I0yiLLz68ziMo+7935BgbMBPZ6rmrLjWq4sDmLZrbSVdOTJSymLvrVML3sNaIPl8G/T9y5uqRkiqpo7JT7NZDuTS/VAR62xdAEi8QIrzGaGJR64CJjofSyoW872sTUhZecbLgH+/0xiNSAbuzrsW5uEMF0HvFrfhikytITUVzoA= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=MnFEf66B; arc=none smtp.client-ip=209.85.128.68 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="MnFEf66B" Received: by mail-wm1-f68.google.com with SMTP id 5b1f17b1804b1-43948021a45so42175175e9.1; Mon, 03 Mar 2025 07:23:37 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1741015416; x=1741620216; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=e8dD4/Kp++x7YqbDF8EB2j5tk6yfejIzgpWScMmcz5s=; b=MnFEf66BIAGpvy3dHyStWHajoA4i6JOty4nZJ8hAPBs88ToLIr+VOQ7Ox7Gr9Mzew9 sf9l9ABpVA+ORLagjSAmXSLAQsgOtRgfUyNmzcr+LzZZwA6ngnOwYGHEgEBILpkTl0Zz z15VpoblZVp2QMOBgzx+uHJdxtbEsyBxOFC9xOiIoyJPuHd+vtKSuzFY7d+CZv72EHKu Uu3D7jWZ1bvDNDEykdHnCMwcwIG5QRtUCLjywWNTNTwdksPMt1qd5JpxOLeMtKpTEVQz VCAp+vCFQQ7fDul2IIJmgfliHkPAZtL7LPn7xI2ISboX0X9DEBYo6/u19aTXjS+lDnKi o+4Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1741015416; x=1741620216; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=e8dD4/Kp++x7YqbDF8EB2j5tk6yfejIzgpWScMmcz5s=; b=BCk/fe2CvR4Lr/uhhm0vs6owyIGBgV4N4gp3asepA6zXxgNMLyPXNdkz+tU5aIZUKC KIexCJmb3TjRAmbwsP4qbZOPE4GMIzmpXNKlkcyOWg4pzdtQncrvu9SG9/aiBDO5xbQG 
XnuQ3mQlnwpgToaSuD5IeJp+XtodvAHGhVihJmEcznVQE9RsVPz/Mvb17EqeUyNdmHdr JUwZjnNSAaMTPTiofmr6QWlZ4u/PNo7DnMFLPAvNlUJ4OIGfSPGNi01MOKAMlP0epPZG RwGAzGViPje90BJNGnJhr1RAxkdE/HhhZR4mHTYwrYfvm/mprNTweBFwZSTkAB1ruU08 K/yw== X-Forwarded-Encrypted: i=1; AJvYcCU/QfailE7pUwLwtKDWZpRu6U84RoXrRJ8jxxlFwaFZhpdXib+k+ljHaVax5dYjY6XrxCPChf8JZLTYQWs=@vger.kernel.org X-Gm-Message-State: AOJu0Yw5XEcWsAhsqbOeRCjkVuFeBeXBsVlUV2vhfqMFpN/4EZOQ+Q7e XV+tY8rfJ9j+4itS70pq/J3Gd8p1ZBMNUYl6lnmbsS1iuuLlEoWlKktEqX6RwyA= X-Gm-Gg: ASbGncty5kCBBqkBHvKF2cPng1+7ZTIxPPBixbu8SMjxd5+F13OJCs2dMYEYuIBfRTW vh4EvyDjPirNtFbBpI6vm8BYc8W7YQlQCxrJJr/zAPt/AI3WKLPkvdBxrmQur0pUsvIoIhlicCH BuKnrq9XUXVfZJjwTEUwNiqkZNb7u/wCsms+LFLwvq+ApHa5XfbAuRo5a2fo0K9A6Owq9QOeWHE 4i9LcX+rof5iCIJUAFd2xQdx+HgD8D61IZcb1Uk/LA2Ya+ld+Ly+WOL/AVQOOmutWcUHbSyCLFk 0O9KKn9Sitzi14UEDUvza9GKgkPQ2bDInFM= X-Google-Smtp-Source: AGHT+IFM6Zq1v0jmMwXwYqRoD9oDMqR8dF4x86BBEAU0OdxfdV7k9GzF2/Hh85klPoaqsXWBJydz9g== X-Received: by 2002:a05:600c:138c:b0:439:5da7:8e0 with SMTP id 5b1f17b1804b1-43ba6710819mr131963315e9.16.1741015415569; Mon, 03 Mar 2025 07:23:35 -0800 (PST) Received: from localhost ([2a03:2880:31ff:52::]) by smtp.gmail.com with ESMTPSA id 5b1f17b1804b1-43aba5329besm191031045e9.15.2025.03.03.07.23.34 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 03 Mar 2025 07:23:34 -0800 (PST) From: Kumar Kartikeya Dwivedi To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org Cc: Linus Torvalds , Peter Zijlstra , Will Deacon , Waiman Long , Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , Martin KaFai Lau , Eduard Zingerman , "Paul E. McKenney" , Tejun Heo , Barret Rhoden , Josh Don , Dohyun Kim , linux-arm-kernel@lists.infradead.org, kkd@meta.com, kernel-team@meta.com Subject: [PATCH bpf-next v3 20/25] bpf: Convert percpu_freelist.c to rqspinlock Date: Mon, 3 Mar 2025 07:23:00 -0800 Message-ID: <20250303152305.3195648-21-memxor@gmail.com> X-Mailer: git-send-email 2.43.5 In-Reply-To: <20250303152305.3195648-1-memxor@gmail.com> References: <20250303152305.3195648-1-memxor@gmail.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=6720; h=from:subject; bh=at8Ekv6S0aGbB9KEki4TPKeyEB0J4bUZMvZ4q+vMLKI=; b=owEBbQKS/ZANAwAIAUzgyIZIvxHKAcsmYgBnxcWYU15SR6rg3ngy0KboyZkXB1I7iXAgTU+4Zv4a q8n+lJ+JAjMEAAEIAB0WIQRLvip+Buz51YI8YRFM4MiGSL8RygUCZ8XFmAAKCRBM4MiGSL8Ryp6rD/ 9okBvTAvR2PTgrgAzFgfUFBu77V0INCzcE3PwQfEwD6rcvTnQUIlryGruEBawLfwn0wxW24wRL/dRy nGL+yuYrEX9YWrs+Pqsz7o2vt7yJfza0S2lZUFT62qMjwUQg9BarfBMcmdWT9afoe0wpXqzQJZc6jh N6kk4VGDj6B7Px2V2Q0EmPXTnjMNmTYmW+3aIZ54OWksD6UxJnflDMq5xbEWu/GF4B5K4mbSvFMCII lU78oDBb2buz6I6WKt9BQukjzIuQSRCcH/r1FEJvzfoQ4A3TWk18Y1lxWqoLywUa4Ts7ff2I7yus+1 Zj/YRvGo//Lcbv35gHIqrjLfvvKXZaW20bcvjfIhytSshAaX3innu2cR2sqFFrVmcb9QwjjqIKzOCa 5FGA1xCE5jK6UkBwTmMKx4rg1e1fUi1vRIbBAjX86NNJ0GuOCdFjKuYocy4yxL30V9Ik680i0XeSlr QnkJNq/7VLZRNEjs54b4gDrXYDTwnRE6WzcfhV7ZQ4Ju0vanEBetGxwfUjmZM0M0ajtqIWGYnzgiFS R5C7BadAXVRz5lKKGAlQxBfZ6nS2erxfMxeR6kGXDbOMO6k1byllETTKxBijhGc5WApPvSUKF8aOUK sGKE2uv8eHTZLfSfjrYm32k98H6hP2O73mx941c2I3ASB3LXurQ/pSaQGQRQ== X-Developer-Key: i=memxor@gmail.com; a=openpgp; fpr=4BBE2A7E06ECF9D5823C61114CE0C88648BF11CA X-Patchwork-Delegate: bpf@iogearbox.net Convert the percpu_freelist.c code to use rqspinlock, and remove the extralist fallback and trylock-based acquisitions to avoid deadlocks. Key thing to note is the retained while (true) loop to search through other CPUs when failing to push a node due to locking errors. 
This retains the behavior of the old code, where it would keep trying until it would be able to successfully push the node back into the freelist of a CPU. Technically, we should start iteration for this loop from raw_smp_processor_id() + 1, but to avoid hitting the edge of nr_cpus, we skip execution in the loop body instead. Closes: https://lore.kernel.org/bpf/CAPPBnEa1_pZ6W24+WwtcNFvTUHTHO7KUmzEbOcMqxp+m2o15qQ@mail.gmail.com Closes: https://lore.kernel.org/bpf/CAPPBnEYm+9zduStsZaDnq93q1jPLqO-PiKX9jy0MuL8LCXmCrQ@mail.gmail.com Signed-off-by: Kumar Kartikeya Dwivedi --- kernel/bpf/percpu_freelist.c | 113 ++++++++--------------------------- kernel/bpf/percpu_freelist.h | 4 +- 2 files changed, 27 insertions(+), 90 deletions(-) diff --git a/kernel/bpf/percpu_freelist.c b/kernel/bpf/percpu_freelist.c index 034cf87b54e9..632762b57299 100644 --- a/kernel/bpf/percpu_freelist.c +++ b/kernel/bpf/percpu_freelist.c @@ -14,11 +14,9 @@ int pcpu_freelist_init(struct pcpu_freelist *s) for_each_possible_cpu(cpu) { struct pcpu_freelist_head *head = per_cpu_ptr(s->freelist, cpu); - raw_spin_lock_init(&head->lock); + raw_res_spin_lock_init(&head->lock); head->first = NULL; } - raw_spin_lock_init(&s->extralist.lock); - s->extralist.first = NULL; return 0; } @@ -34,58 +32,39 @@ static inline void pcpu_freelist_push_node(struct pcpu_freelist_head *head, WRITE_ONCE(head->first, node); } -static inline void ___pcpu_freelist_push(struct pcpu_freelist_head *head, +static inline bool ___pcpu_freelist_push(struct pcpu_freelist_head *head, struct pcpu_freelist_node *node) { - raw_spin_lock(&head->lock); - pcpu_freelist_push_node(head, node); - raw_spin_unlock(&head->lock); -} - -static inline bool pcpu_freelist_try_push_extra(struct pcpu_freelist *s, - struct pcpu_freelist_node *node) -{ - if (!raw_spin_trylock(&s->extralist.lock)) + if (raw_res_spin_lock(&head->lock)) return false; - - pcpu_freelist_push_node(&s->extralist, node); - raw_spin_unlock(&s->extralist.lock); + pcpu_freelist_push_node(head, node); + raw_res_spin_unlock(&head->lock); return true; } -static inline void ___pcpu_freelist_push_nmi(struct pcpu_freelist *s, - struct pcpu_freelist_node *node) +void __pcpu_freelist_push(struct pcpu_freelist *s, + struct pcpu_freelist_node *node) { - int cpu, orig_cpu; + struct pcpu_freelist_head *head; + int cpu; - orig_cpu = raw_smp_processor_id(); - while (1) { - for_each_cpu_wrap(cpu, cpu_possible_mask, orig_cpu) { - struct pcpu_freelist_head *head; + if (___pcpu_freelist_push(this_cpu_ptr(s->freelist), node)) + return; + while (true) { + for_each_cpu_wrap(cpu, cpu_possible_mask, raw_smp_processor_id()) { + if (cpu == raw_smp_processor_id()) + continue; head = per_cpu_ptr(s->freelist, cpu); - if (raw_spin_trylock(&head->lock)) { - pcpu_freelist_push_node(head, node); - raw_spin_unlock(&head->lock); - return; - } - } - - /* cannot lock any per cpu lock, try extralist */ - if (pcpu_freelist_try_push_extra(s, node)) + if (raw_res_spin_lock(&head->lock)) + continue; + pcpu_freelist_push_node(head, node); + raw_res_spin_unlock(&head->lock); return; + } } } -void __pcpu_freelist_push(struct pcpu_freelist *s, - struct pcpu_freelist_node *node) -{ - if (in_nmi()) - ___pcpu_freelist_push_nmi(s, node); - else - ___pcpu_freelist_push(this_cpu_ptr(s->freelist), node); -} - void pcpu_freelist_push(struct pcpu_freelist *s, struct pcpu_freelist_node *node) { @@ -120,71 +99,29 @@ void pcpu_freelist_populate(struct pcpu_freelist *s, void *buf, u32 elem_size, static struct pcpu_freelist_node *___pcpu_freelist_pop(struct 
pcpu_freelist *s) { + struct pcpu_freelist_node *node = NULL; struct pcpu_freelist_head *head; - struct pcpu_freelist_node *node; int cpu; for_each_cpu_wrap(cpu, cpu_possible_mask, raw_smp_processor_id()) { head = per_cpu_ptr(s->freelist, cpu); if (!READ_ONCE(head->first)) continue; - raw_spin_lock(&head->lock); + if (raw_res_spin_lock(&head->lock)) + continue; node = head->first; if (node) { WRITE_ONCE(head->first, node->next); - raw_spin_unlock(&head->lock); + raw_res_spin_unlock(&head->lock); return node; } - raw_spin_unlock(&head->lock); + raw_res_spin_unlock(&head->lock); } - - /* per cpu lists are all empty, try extralist */ - if (!READ_ONCE(s->extralist.first)) - return NULL; - raw_spin_lock(&s->extralist.lock); - node = s->extralist.first; - if (node) - WRITE_ONCE(s->extralist.first, node->next); - raw_spin_unlock(&s->extralist.lock); - return node; -} - -static struct pcpu_freelist_node * -___pcpu_freelist_pop_nmi(struct pcpu_freelist *s) -{ - struct pcpu_freelist_head *head; - struct pcpu_freelist_node *node; - int cpu; - - for_each_cpu_wrap(cpu, cpu_possible_mask, raw_smp_processor_id()) { - head = per_cpu_ptr(s->freelist, cpu); - if (!READ_ONCE(head->first)) - continue; - if (raw_spin_trylock(&head->lock)) { - node = head->first; - if (node) { - WRITE_ONCE(head->first, node->next); - raw_spin_unlock(&head->lock); - return node; - } - raw_spin_unlock(&head->lock); - } - } - - /* cannot pop from per cpu lists, try extralist */ - if (!READ_ONCE(s->extralist.first) || !raw_spin_trylock(&s->extralist.lock)) - return NULL; - node = s->extralist.first; - if (node) - WRITE_ONCE(s->extralist.first, node->next); - raw_spin_unlock(&s->extralist.lock); return node; } struct pcpu_freelist_node *__pcpu_freelist_pop(struct pcpu_freelist *s) { - if (in_nmi()) - return ___pcpu_freelist_pop_nmi(s); return ___pcpu_freelist_pop(s); } diff --git a/kernel/bpf/percpu_freelist.h b/kernel/bpf/percpu_freelist.h index 3c76553cfe57..914798b74967 100644 --- a/kernel/bpf/percpu_freelist.h +++ b/kernel/bpf/percpu_freelist.h @@ -5,15 +5,15 @@ #define __PERCPU_FREELIST_H__ #include #include +#include struct pcpu_freelist_head { struct pcpu_freelist_node *first; - raw_spinlock_t lock; + rqspinlock_t lock; }; struct pcpu_freelist { struct pcpu_freelist_head __percpu *freelist; - struct pcpu_freelist_head extralist; }; struct pcpu_freelist_node { From patchwork Mon Mar 3 15:23:01 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kumar Kartikeya Dwivedi X-Patchwork-Id: 13999039 X-Patchwork-Delegate: bpf@iogearbox.net Received: from mail-wm1-f65.google.com (mail-wm1-f65.google.com [209.85.128.65]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 124B42356D2; Mon, 3 Mar 2025 15:23:38 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.128.65 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1741015420; cv=none; b=WIDvM1dvgOKov/ZLHCFarZEB7FxAuzA695RQQM2TttIZV0PHNIZpPtlgxO4+bmj8l1eNrWVXpOiu/r02pImPRClIsE78txGWwggbdrvJa2fO24XGDe1TJiK7w27TewIvKEfcquiYBcoWNT4T1QyquofAPMlpUAGZcVLaGOXmvhw= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1741015420; c=relaxed/simple; bh=H+4fvDbeMKpg9kdHFVQF9EirB/DOLCaq4vd16HHmH14=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; 
b=eRXZjT9Wsj+gf5JjDh0q4/1XHX4w+3FU5soHhHtNwvfFj03mUF12/RV1+OWGqmUY+BimeM/uO0ZYA+DSy6rr5cKgyAtt78wWgOdZIO24nbelqogUjSezSoQOPMCoObkLfibHC/4pxQgCjCOvubIBz8r8RdmycXsEKF3hizVS7IQ= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=TkDugpBn; arc=none smtp.client-ip=209.85.128.65 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="TkDugpBn" Received: by mail-wm1-f65.google.com with SMTP id 5b1f17b1804b1-43bc0b8520cso7209845e9.1; Mon, 03 Mar 2025 07:23:38 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1741015417; x=1741620217; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=ign20YEmiqhrw2u8+y6jgpbQIECAjRoqEHqNv09q2OE=; b=TkDugpBnxQ1istzcieqRvVSmwrgdVo5Om7HfAezQOioKJ0ohdJVw6vK3TH/GKYQ9vZ suKW6m+VMMRGmnrEraNh8lB3EsUe29ly8uvn3qKSYtHbngTEgt0pWazTs4nhOPLWspGf Yz3tREYLMNvNmz1MacKAW2AyAdlE1SmvMwkfqB3LN8xduC9NDvm55rXPOS+WzGON3Nvi IL7NYCFoimHwB+T/c6+AUVbZ0yhKyInwJYzOdb6OLN3uICjSkiuc0vnS8LDHdSJvD8kg 6fW8dygGbm9BhztKxsjr8rNYudrj03vmOUzFcfDzcZ5bZHQ5lRrsYTgI6m2ca9Nw722o TLOA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1741015417; x=1741620217; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=ign20YEmiqhrw2u8+y6jgpbQIECAjRoqEHqNv09q2OE=; b=q1WxqiJ3yq+WqpC9F9QKI4cx37iCYpoW/PG0ieQIQel7sgwnwgfNxmxcS4P8Z/VsVG hOI0hnIpq4zZucr9RMVfQNuM5W5ixyRRD68gk+vjffnVebqpmOQk1CbbrvWz/zHsiFio 4dRlTA8V7otislpIjPWHSU6cN0ANX1xj0hZmF6qbueB9DPUxxkBH3xF8saBlvCNEZVp6 ZP51JRUWQDtHjE5bfI98D6py6Jg7umajk2THiQzIIvXQZN266oFJwjMdqgqFxpW8SsIk Yc8cFbwLJHvcJytxJSHfMuBreytx9Z8VU8rhSAtAejdL9DqiCJLlJeLCJGlGFls5ep+m TMvQ== X-Forwarded-Encrypted: i=1; AJvYcCWe1J8pLeA94aS61BYnxTQ76YUsjqOAoQ3tmxxBieKkR/dMcozTpqHddC3DtR38yjHrruh5wNHPosNpODU=@vger.kernel.org X-Gm-Message-State: AOJu0YywLsuCHhZxNDrT34+yRKJpkw7AK1Z6Hj8y8FYBAoFwmbCHdDBI uf8mB5Fl/g+Y+b+TByCxOhHFXu0hcdF7Sq3HnGXMVb1Ezg1+xXCnhA/RsOL8a1g= X-Gm-Gg: ASbGncsSAfjCkpCLZ3VPubugkjDjXk9Bl+z5ZOeIaKkpURkezU4AYMO70xEsBTw9/im JyRZrnj1ZKup7eOpV3NYZHA9IWP5rnrbKKXbHQA+SxVeNblS8IYA6KkdSnfTGYQRklEnJ9L2pGx dpjXwJi1hXcnWeUA4K10xs7BJ8sneal5BwApDDEaG3Mzy9eJx3qIPJfslMv2Tp4ok+U3KHYL9YD angsj34WKDC9C4LorwYjNG79Yybq+7HCoairO3zydD8oLgw3+QnACxMfEQwEhL0v/ZpLZ1aA4SL f10lkXXmSalk3xEp9nhXv/aD0jQ0aT3PzA== X-Google-Smtp-Source: AGHT+IGlleKe6Z0c4L94quE3jCAiGhg6NJQ9iC0AaFH2JLOqqs7KqiXhmqbOtJsTuHI4jJYfhIdEnw== X-Received: by 2002:a05:600c:a03:b0:439:9b19:9e2d with SMTP id 5b1f17b1804b1-43ba6702becmr131087705e9.16.1741015416963; Mon, 03 Mar 2025 07:23:36 -0800 (PST) Received: from localhost ([2a03:2880:31ff:7::]) by smtp.gmail.com with ESMTPSA id 5b1f17b1804b1-43bbfece041sm37242865e9.1.2025.03.03.07.23.36 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 03 Mar 2025 07:23:36 -0800 (PST) From: Kumar Kartikeya Dwivedi To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org Cc: Linus Torvalds , Peter Zijlstra , Will Deacon , Waiman Long , Alexei Starovoitov , Andrii Nakryiko , 
Daniel Borkmann , Martin KaFai Lau , Eduard Zingerman , "Paul E. McKenney" , Tejun Heo , Barret Rhoden , Josh Don , Dohyun Kim , linux-arm-kernel@lists.infradead.org, kkd@meta.com, kernel-team@meta.com Subject: [PATCH bpf-next v3 21/25] bpf: Convert lpm_trie.c to rqspinlock Date: Mon, 3 Mar 2025 07:23:01 -0800 Message-ID: <20250303152305.3195648-22-memxor@gmail.com> X-Mailer: git-send-email 2.43.5 In-Reply-To: <20250303152305.3195648-1-memxor@gmail.com> References: <20250303152305.3195648-1-memxor@gmail.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=3875; h=from:subject; bh=H+4fvDbeMKpg9kdHFVQF9EirB/DOLCaq4vd16HHmH14=; b=owEBbQKS/ZANAwAIAUzgyIZIvxHKAcsmYgBnxcWYzhMQMRB+v202mYygVFkG5s/se6GlCLJdFF8d 3MnU/7aJAjMEAAEIAB0WIQRLvip+Buz51YI8YRFM4MiGSL8RygUCZ8XFmAAKCRBM4MiGSL8Rym2eEA CegI2sLVgC2H3/ckPAgaXiYSYSB/uZkXAnww6a6gZc4jNT9kbnHO5KY2RaymVBLci89GDlmnX3v+wp WH5WovKA4m3qH12EjfliSemjg4XIf5M+9Ubag8HHRATXQ0yaW0/Qm4ssV5nbBuntkgRgngLkiyA5C8 6fjsJiKGdQIahwdn/67q1wa9oR78sDXdVF+AOJsCO2QuhaU5qq0dltH27b3gJqWKSyc1OZcC5r1l61 elIyx2qtvgu8OqcZZqaRnEqrlW0x0BpkkO7Dgku2PsEA2SH7M+1DMl5mqtd3J2A0MZrUOT6OjSXmg4 p9NU7L83xsKPB1L/pYutorTTzi6toA8506vLLki/zJPux9cMC5JYuSLqcpz0pb1Q9FKfd8pB/9TfSu pbO9LSlidNGUkw43mfpYyTaDXU14Ud+GC+0SLrUZPMF5UwGNMJuS7N5MfGYxZ+Cwmw5fKGhqPOl9iz 9LGmjGVn6Kv7wo6LX+4HFS3gLLUM/eXMg+u9WWdTUBvFK+ksxSCkHM5fghBlgZS02NQfzLCdz6uO07 4a4aGj9rfzYsf+X9gWldSoL3VrCmnfaj9BW6RTJpdYDI5j73EYdqt946jfnpVYlit0hY5aV7DI4ZiI +0MFK7IDXVAmFcgWYbJJ2fnguEXpNKxH7KrBlkDDpqfs578zqAeL7aGfqhUg== X-Developer-Key: i=memxor@gmail.com; a=openpgp; fpr=4BBE2A7E06ECF9D5823C61114CE0C88648BF11CA X-Patchwork-Delegate: bpf@iogearbox.net Convert all LPM trie usage of raw_spinlock to rqspinlock. Note that rcu_dereference_protected in trie_delete_elem is switched over to plain rcu_dereference, the RCU read lock should be held from BPF program side or eBPF syscall path, and the trie->lock is just acquired before the dereference. It is not clear the reason the protected variant was used from the commit history, but the above reasoning makes sense so switch over. 
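For context, the conversion below follows the simple pattern shown in this sketch (illustrative only, with a made-up function name): the irqsave variant of the resilient lock returns an error instead of spinning forever, and that error is propagated to the caller so the operation fails gracefully rather than proceeding without the lock held.

#include <asm-generic/rqspinlock.h>

/* Sketch of the error-propagation pattern applied to trie_update_elem()
 * and trie_delete_elem() in the diff below.
 */
static long example_modify(rqspinlock_t *lock)
{
	unsigned long flags;
	long ret;

	ret = raw_res_spin_lock_irqsave(lock, flags);
	if (ret)	/* timeout or deadlock detected: do not touch the structure */
		return ret;

	/* ... modify the data structure under the lock ... */

	raw_res_spin_unlock_irqrestore(lock, flags);
	return 0;
}
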
Closes: https://lore.kernel.org/lkml/000000000000adb08b061413919e@google.com Signed-off-by: Kumar Kartikeya Dwivedi --- kernel/bpf/lpm_trie.c | 25 ++++++++++++++----------- 1 file changed, 14 insertions(+), 11 deletions(-) diff --git a/kernel/bpf/lpm_trie.c b/kernel/bpf/lpm_trie.c index e8a772e64324..be66d7e520e0 100644 --- a/kernel/bpf/lpm_trie.c +++ b/kernel/bpf/lpm_trie.c @@ -15,6 +15,7 @@ #include #include #include +#include #include /* Intermediate node */ @@ -36,7 +37,7 @@ struct lpm_trie { size_t n_entries; size_t max_prefixlen; size_t data_size; - raw_spinlock_t lock; + rqspinlock_t lock; }; /* This trie implements a longest prefix match algorithm that can be used to @@ -342,7 +343,9 @@ static long trie_update_elem(struct bpf_map *map, if (!new_node) return -ENOMEM; - raw_spin_lock_irqsave(&trie->lock, irq_flags); + ret = raw_res_spin_lock_irqsave(&trie->lock, irq_flags); + if (ret) + goto out_free; new_node->prefixlen = key->prefixlen; RCU_INIT_POINTER(new_node->child[0], NULL); @@ -356,8 +359,7 @@ static long trie_update_elem(struct bpf_map *map, */ slot = &trie->root; - while ((node = rcu_dereference_protected(*slot, - lockdep_is_held(&trie->lock)))) { + while ((node = rcu_dereference(*slot))) { matchlen = longest_prefix_match(trie, node, key); if (node->prefixlen != matchlen || @@ -442,8 +444,8 @@ static long trie_update_elem(struct bpf_map *map, rcu_assign_pointer(*slot, im_node); out: - raw_spin_unlock_irqrestore(&trie->lock, irq_flags); - + raw_res_spin_unlock_irqrestore(&trie->lock, irq_flags); +out_free: if (ret) bpf_mem_cache_free(&trie->ma, new_node); bpf_mem_cache_free_rcu(&trie->ma, free_node); @@ -467,7 +469,9 @@ static long trie_delete_elem(struct bpf_map *map, void *_key) if (key->prefixlen > trie->max_prefixlen) return -EINVAL; - raw_spin_lock_irqsave(&trie->lock, irq_flags); + ret = raw_res_spin_lock_irqsave(&trie->lock, irq_flags); + if (ret) + return ret; /* Walk the tree looking for an exact key/length match and keeping * track of the path we traverse. 
We will need to know the node @@ -478,8 +482,7 @@ static long trie_delete_elem(struct bpf_map *map, void *_key) trim = &trie->root; trim2 = trim; parent = NULL; - while ((node = rcu_dereference_protected( - *trim, lockdep_is_held(&trie->lock)))) { + while ((node = rcu_dereference(*trim))) { matchlen = longest_prefix_match(trie, node, key); if (node->prefixlen != matchlen || @@ -543,7 +546,7 @@ static long trie_delete_elem(struct bpf_map *map, void *_key) free_node = node; out: - raw_spin_unlock_irqrestore(&trie->lock, irq_flags); + raw_res_spin_unlock_irqrestore(&trie->lock, irq_flags); bpf_mem_cache_free_rcu(&trie->ma, free_parent); bpf_mem_cache_free_rcu(&trie->ma, free_node); @@ -592,7 +595,7 @@ static struct bpf_map *trie_alloc(union bpf_attr *attr) offsetof(struct bpf_lpm_trie_key_u8, data); trie->max_prefixlen = trie->data_size * 8; - raw_spin_lock_init(&trie->lock); + raw_res_spin_lock_init(&trie->lock); /* Allocate intermediate and leaf nodes from the same allocator */ leaf_size = sizeof(struct lpm_trie_node) + trie->data_size + From patchwork Mon Mar 3 15:23:02 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kumar Kartikeya Dwivedi X-Patchwork-Id: 13999040 X-Patchwork-Delegate: bpf@iogearbox.net Received: from mail-wm1-f68.google.com (mail-wm1-f68.google.com [209.85.128.68]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 3C169236442; Mon, 3 Mar 2025 15:23:40 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.128.68 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1741015422; cv=none; b=C8Xt9BoRkZn01kTLtjeNAgm4YmsBijMeQpDfBGXNOORp7Ufnez6mnKlGC7Wf162iItbs/VmQD/BMoKMVIXM7uFvVHw3ZBPfvsFmRvt3o/ChGT6sfpuhiV8EyZKGH2adNX6c/HKnH1TY5zz+vy4D9sMSFMysrvieAM/QHCCrNoJ0= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1741015422; c=relaxed/simple; bh=S1zZ1+V5HXfDNJ9sHSSvG8gCJgMqlpTvJbjPruz8ZRU=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=kAdkvDynE0yoY21tC8qTOmeraaLweZGJ9O60D7jGci0ZPXAz/VK2H9qHeYPdIwSwZF0hwxxETbB2JH05C5atnKEScxtzCYDEHC5jXhu3FWzK1AMRtmvGDto+r7Ta99o9BDuRrQw/X7E6ZyogTJ75BrMyZ50rxk+EltStrfO/QXg= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=ArKuszSL; arc=none smtp.client-ip=209.85.128.68 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="ArKuszSL" Received: by mail-wm1-f68.google.com with SMTP id 5b1f17b1804b1-43bc31227ecso6734085e9.1; Mon, 03 Mar 2025 07:23:40 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1741015418; x=1741620218; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=ftGzO2s1hk5aoPF/5zj2qPXt6ZSl0PVETns25Hb5ycw=; b=ArKuszSLaiojNWM50p+lrlj2rYkb6lonEJgjF7+QNsXlWXSlicVRKd+5mH0pHJWl1F JXdbU64bGe6n7qXYycvL2uQyfWiTZXSJGZz3ANXtFSK6JsjQVylb5LXxM85Es1eSDDeO 
WKQLGk48ClITjHB9/ZksyOxLVP0UiT3dW3Ohry/MTgND0rrvPvrzhJ2gTR7lab21SUg+ vPV9cgf2frIapzjo971MfBYTk+qbO7HCByqlYPym/6m2jRNkaOy+25hPweNFo4821YTS SFItmqLpccOI59wRPoFZ1+F2meNrUTLpTO6TjIFHmP9FiRV0ArbDaIHJO8D7yK5wt6jP HPhw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1741015418; x=1741620218; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=ftGzO2s1hk5aoPF/5zj2qPXt6ZSl0PVETns25Hb5ycw=; b=tM+eKIL4gSd6Q9B20y97oMT1b1GDu+CyPdVLFBYerOrNr1sljhe2W6qTgp/L2auYXo nejL4G2x/mTOzdgMR21a80DVzYgcUJtktxkOAkmYlFXigk6V5NMheBeIpY3tZAskD4X8 IQ93LU2br25v/SXQc97eZzwdYT7QswFtJqO/qJu7Y2ykRAKN6JXjeZNI6CUPnUDq3DLT d4KgCLHdS3gqtgMBDT/mlXD0X5TC3GAtMhWiYf8UDguvGhY5My/fsHvKP2kP1UGyJrlc JTdQX108Qz8u2/DtVNuUW/6lRPTisSr7hM5eNUizj8s1daJ1CkfxLwmjLvTSlNcP7/hQ FGfg== X-Forwarded-Encrypted: i=1; AJvYcCUkzQ2+3lUMTs89UuayyrYLaTmjuf/KDtCvRqfTUIcEoESLyux9r4olVxbcNIKiklzddtaZ30I0IreFWw4=@vger.kernel.org X-Gm-Message-State: AOJu0Yx6QBop8TST4KX/+d1uHqqlRJ3Ksl+H+ReIkyWNtBfnhpHZdB6c t1shr/R06atMLYpPsv+PW6wb2sqd+7cwJX3sHVzdqj4m1831GVmqv7Es5yS/ZC4= X-Gm-Gg: ASbGncvkxdVf9QLy4xdaboZU2rOJ2c9UQ72oib2i6tNy7DlMCBs2y1CuOcBCLbbYE07 LcLmceLtNE8emHhOytfCK/nx8+/f/96BwnkirjZrnHdQCCYbaAFL0qMRIloYBbaOHfGqT/13YC2 dInupfBjZfoZxy6it2aJuw3oN9mn9MTPwn1frBewGGBvAiVNywMPYLDERRUMKGIBQi8REkFHIh5 d1uAS/7+W/x+mEWGCeoIQd1jyoJDWbxtH3mWKw2UbYO2hfhbAXIUWaHJUkt3g8SrxGUnHUBSF7b 88lP/vx0w0AV9DrHYF2rWCKooXI+v00brQ== X-Google-Smtp-Source: AGHT+IFUgzbddleUlZT0ehYk/Y59B0PJMzJQ++IN3Hp+ieXBjwlFjh5Q5E9wkdn7XokNgE0Z1HE0+g== X-Received: by 2002:a05:600c:190b:b0:439:955d:7ad9 with SMTP id 5b1f17b1804b1-43ba66fe855mr116664525e9.14.1741015418083; Mon, 03 Mar 2025 07:23:38 -0800 (PST) Received: from localhost ([2a03:2880:31ff:b::]) by smtp.gmail.com with ESMTPSA id 5b1f17b1804b1-43bc63bcaafsm23440385e9.28.2025.03.03.07.23.37 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 03 Mar 2025 07:23:37 -0800 (PST) From: Kumar Kartikeya Dwivedi To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org Cc: Linus Torvalds , Peter Zijlstra , Will Deacon , Waiman Long , Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , Martin KaFai Lau , Eduard Zingerman , "Paul E. 
McKenney" , Tejun Heo , Barret Rhoden , Josh Don , Dohyun Kim , linux-arm-kernel@lists.infradead.org, kkd@meta.com, kernel-team@meta.com Subject: [PATCH bpf-next v3 22/25] bpf: Introduce rqspinlock kfuncs Date: Mon, 3 Mar 2025 07:23:02 -0800 Message-ID: <20250303152305.3195648-23-memxor@gmail.com> X-Mailer: git-send-email 2.43.5 In-Reply-To: <20250303152305.3195648-1-memxor@gmail.com> References: <20250303152305.3195648-1-memxor@gmail.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=5071; h=from:subject; bh=S1zZ1+V5HXfDNJ9sHSSvG8gCJgMqlpTvJbjPruz8ZRU=; b=owEBbQKS/ZANAwAIAUzgyIZIvxHKAcsmYgBnxcWZSnBmS5ietCtuPE+4pmKWtJULnErXEc4IC9nN w/N0+8KJAjMEAAEIAB0WIQRLvip+Buz51YI8YRFM4MiGSL8RygUCZ8XFmQAKCRBM4MiGSL8RyvpbD/ 9n03epztxKLKXs5mq2JtSowibMlL/cxUOqIT/Er6iBycLJW6QrV68Gu4EXHFaJ9MZk5E3g2pzTcM3D JO8iLC+CLg/IwzMs+VQkPJBZpTbzrjX8NXMFOVdKOCPuTyBon/w6QQHN+dI0ms1DMMWkcHXpNgvkCp uHYwYc6JsQ4WUGaza1Vx3zZFXhgnXCN7o74lDcriARO1+lcB8Se5+bLCMqU6hcPK03ccyXEFk3RrYH c5lUO3C9t8W0edpQE1LWCGNLwrhhNivyOOOk7vLbdm4Exk01CJuRc5bDzvHgykGNsJg/UDR2EMGFIK 4uj6moeuOAxJcYm832dS62gCEz5OxXOr2LOg1Ee2XdP2obxhEA9pFLKdvbXo59xMfmioEC2XKuP0T3 WjHNHXhqtKaYrns10R6ilDge5CWxKksBPL30/9UkZDy9ERjZkfyD/SDxhmVdNy5ExqHKZCanzFCnMv iDoBAewmWR2hj287w0kBdw7CHaoPg9/HIyKMCv0MImIHZlzDasPBMAXnJaYyY0V7LM2fwr3F0UuOFF DrPYoPvey+vLaRgS1kIfcBQHd6v9z4fLn9JPWdc5ggzlJdRI3a8rLp2gDF14/LcQZJzK3T86HKBGUn Sv9e4xwbZoBil2FPHRU7/R9ghWKfT+cmbi6Iu1+TkNXFlGRHV+TPloE9KwMg== X-Developer-Key: i=memxor@gmail.com; a=openpgp; fpr=4BBE2A7E06ECF9D5823C61114CE0C88648BF11CA X-Patchwork-Delegate: bpf@iogearbox.net Introduce four new kfuncs, bpf_res_spin_lock, and bpf_res_spin_unlock, and their irqsave/irqrestore variants, which wrap the rqspinlock APIs. bpf_res_spin_lock returns a conditional result, depending on whether the lock was acquired (NULL is returned when lock acquisition succeeds, non-NULL upon failure). The memory pointed to by the returned pointer upon failure can be dereferenced after the NULL check to obtain the error code. Instead of using the old bpf_spin_lock type, introduce a new type with the same layout, and the same alignment, but a different name to avoid type confusion. Preemption is disabled upon successful lock acquisition, however IRQs are not. Special kfuncs can be introduced later to allow disabling IRQs when taking a spin lock. Resilient locks are safe against AA deadlocks, hence not disabling IRQs currently does not allow violation of kernel safety. __irq_flag annotation is used to accept IRQ flags for the IRQ-variants, with the same semantics as existing bpf_local_irq_{save, restore}. These kfuncs will require additional verifier-side support in subsequent commits, to allow programs to hold multiple locks at the same time. Signed-off-by: Kumar Kartikeya Dwivedi --- include/asm-generic/rqspinlock.h | 7 +++ include/linux/bpf.h | 1 + kernel/locking/rqspinlock.c | 78 ++++++++++++++++++++++++++++++++ 3 files changed, 86 insertions(+) diff --git a/include/asm-generic/rqspinlock.h b/include/asm-generic/rqspinlock.h index 418b652e0249..06906489d9ba 100644 --- a/include/asm-generic/rqspinlock.h +++ b/include/asm-generic/rqspinlock.h @@ -23,6 +23,13 @@ struct rqspinlock { }; }; +/* Even though this is same as struct rqspinlock, we need to emit a distinct + * type in BTF for BPF programs. 
+ */ +struct bpf_res_spin_lock { + u32 val; +}; + struct qspinlock; #ifdef CONFIG_QUEUED_SPINLOCKS typedef struct qspinlock rqspinlock_t; diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 4c4028d865ee..aa47e11371b3 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -30,6 +30,7 @@ #include #include #include +#include struct bpf_verifier_env; struct bpf_verifier_log; diff --git a/kernel/locking/rqspinlock.c b/kernel/locking/rqspinlock.c index 0031a1bfbd4e..0c53d36e2f6c 100644 --- a/kernel/locking/rqspinlock.c +++ b/kernel/locking/rqspinlock.c @@ -15,6 +15,8 @@ #include #include +#include +#include #include #include #include @@ -684,3 +686,79 @@ int __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val) EXPORT_SYMBOL(resilient_queued_spin_lock_slowpath); #endif /* CONFIG_QUEUED_SPINLOCKS */ + +__bpf_kfunc_start_defs(); + +#define REPORT_STR(ret) ({ ret == -ETIMEDOUT ? "Timeout detected" : "AA or ABBA deadlock detected"; }) + +__bpf_kfunc int bpf_res_spin_lock(struct bpf_res_spin_lock *lock) +{ + int ret; + + BUILD_BUG_ON(sizeof(rqspinlock_t) != sizeof(struct bpf_res_spin_lock)); + BUILD_BUG_ON(__alignof__(rqspinlock_t) != __alignof__(struct bpf_res_spin_lock)); + + preempt_disable(); + ret = res_spin_lock((rqspinlock_t *)lock); + if (unlikely(ret)) { + preempt_enable(); + rqspinlock_report_violation(REPORT_STR(ret), lock); + return ret; + } + return 0; +} + +__bpf_kfunc void bpf_res_spin_unlock(struct bpf_res_spin_lock *lock) +{ + res_spin_unlock((rqspinlock_t *)lock); + preempt_enable(); +} + +__bpf_kfunc int bpf_res_spin_lock_irqsave(struct bpf_res_spin_lock *lock, unsigned long *flags__irq_flag) +{ + u64 *ptr = (u64 *)flags__irq_flag; + unsigned long flags; + int ret; + + preempt_disable(); + local_irq_save(flags); + ret = res_spin_lock((rqspinlock_t *)lock); + if (unlikely(ret)) { + local_irq_restore(flags); + preempt_enable(); + rqspinlock_report_violation(REPORT_STR(ret), lock); + return ret; + } + *ptr = flags; + return 0; +} + +__bpf_kfunc void bpf_res_spin_unlock_irqrestore(struct bpf_res_spin_lock *lock, unsigned long *flags__irq_flag) +{ + u64 *ptr = (u64 *)flags__irq_flag; + unsigned long flags = *ptr; + + res_spin_unlock((rqspinlock_t *)lock); + local_irq_restore(flags); + preempt_enable(); +} + +__bpf_kfunc_end_defs(); + +BTF_KFUNCS_START(rqspinlock_kfunc_ids) +BTF_ID_FLAGS(func, bpf_res_spin_lock, KF_RET_NULL) +BTF_ID_FLAGS(func, bpf_res_spin_unlock) +BTF_ID_FLAGS(func, bpf_res_spin_lock_irqsave, KF_RET_NULL) +BTF_ID_FLAGS(func, bpf_res_spin_unlock_irqrestore) +BTF_KFUNCS_END(rqspinlock_kfunc_ids) + +static const struct btf_kfunc_id_set rqspinlock_kfunc_set = { + .owner = THIS_MODULE, + .set = &rqspinlock_kfunc_ids, +}; + +static __init int rqspinlock_register_kfuncs(void) +{ + return register_btf_kfunc_id_set(BPF_PROG_TYPE_UNSPEC, &rqspinlock_kfunc_set); +} +late_initcall(rqspinlock_register_kfuncs); From patchwork Mon Mar 3 15:23:03 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kumar Kartikeya Dwivedi X-Patchwork-Id: 13999043 X-Patchwork-Delegate: bpf@iogearbox.net Received: from mail-wm1-f68.google.com (mail-wm1-f68.google.com [209.85.128.68]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 1268923908C; Mon, 3 Mar 2025 15:23:41 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.128.68 ARC-Seal: i=1; 
a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1741015428; cv=none; b=iAlGwwjpOuoRml2fNcTHbUak4/UQyf7V4J9p8kk74u6gw2wzhk2T0xs4Fi9AYLhO0xF7XjHFOLBcJuGbC5IZR0A+WZHVy+IKgs2WqSM89rfaNKFIAlFZzu6nIwPyqBbxcyILWN3bNpxOdSwEuoQe4FbAsX4RcUq6ZRqg3s0TUIg= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1741015428; c=relaxed/simple; bh=MO7GJTSenU9MV6IQtiRHmjgSgn53O+a0eeEhZHGj470=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=WDp7mOY2bsn2F5NiXcxV7MCF7HTWvPMcCVbj0yS9rJbrBpoYBMUm/rjN/EcTmj/kxOIv8w5frPUbAIa5zmM1MEUwN+mh6IiUjtuc2kQ6Qqkfo6Zi2u89ho2aLJDVVouIne8tIL6UOLoKff8S5aSR4tiRocaBY3rIj/oEEgfYwsc= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=Pyh3rxiO; arc=none smtp.client-ip=209.85.128.68 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="Pyh3rxiO" Received: by mail-wm1-f68.google.com with SMTP id 5b1f17b1804b1-43bbd711eedso11284795e9.3; Mon, 03 Mar 2025 07:23:41 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1741015420; x=1741620220; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=+4h+ukFvLyEJPpccEjjwwvawjNKj/Xrf/LqUnWOEd4Y=; b=Pyh3rxiOQSnrOSxW+Dw79e5FHhpcNOwIwHrI6AgsAcAUvLXwm9yZgRxcZpuCY91+43 TzEjVTpE23gq2nn17b7jVNjvXOvbthW1i3TxylKmxJSODE/aowgTNH+rQlInt/hnkqSd +gOStsjjV219VoVErI2hEI3Hvx1cGEe7PbJdFkw+EetOHQ6AJes0/U+tuXHxKtUENd42 rYinUFUu9psfuy6lvgJWR6sWbLI/qwzZ9gkb1MTDAeCrCBpeFzXXhWUPh1Z5Lwq30s0m YQP82Ax5WWAbfwXwBesyqJdsKQ/Ho1Yd9sYsPjKoCVKZZwLqcoG+Pm4bu73c33euyp/W t8dQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1741015420; x=1741620220; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=+4h+ukFvLyEJPpccEjjwwvawjNKj/Xrf/LqUnWOEd4Y=; b=lOAwSw5oHa++4gpAAZZTzkRb/8qOJg4QRBF3szafaNJIIYmuK++mJP3jj+kgHVUiGf O8K/VLYchBYUH8zvo4zD2lxJf7eEQ3PGvM/vVUNatd1cRPOJFpNGbbb4EiXIjHhyadEU EGCeZ/ryJX0lsu6zOCfVJ2q3joBEldTDvkQs71LFKwTluSdpXxER38bZ4+LbElxjt6ao qwIgI0TJteeFVotO4CgNOh6E8WcsW9LkOHBxnrkfniq4RnNClZvvHSlSL+a7DctU1RVD wID0AA2miJwypDnfpunZ8EFKlJnFCM5F0BWlbvz6t5sgU27VZnlz46aDUlbcd3rVuicO u36w== X-Forwarded-Encrypted: i=1; AJvYcCVyE1UA2o+X2KGUmetSsE6Vgqa7YcSMr1BkkwdSFhR/F4ZPj5rBw3SJo7MRRh/Hp3Gy4E9EUaTovKvu2bg=@vger.kernel.org X-Gm-Message-State: AOJu0YxpEx2R/YnS5pGYAAAmY5n3myZFmeXDqm8sBbIDkiDdMQpqKsdJ dNmxGa8UGOpNdpWKJYLfEwW2eCltHX9Pidbs4fRi70hi8DWxKaN+INWP5GMF5/c= X-Gm-Gg: ASbGnctoBxHLlzyJx3iPIZ7Tyry5Z4KUZlOJM3EjPQ7HrexGulFGuaz/hDPd8U8bo6s W4mWAAlmtSrghJB6DcVYUNCLz+58vBKNeNmD7K5auUBG7MdHO+MVJduJkiijyK6MtUpj2xYzmKU zkqHU1Uida8VrFIYsqoSM3/S8nEb5ezXo1k7z7ivsORXECfSHm4rta/J8A7pBgnJSCPHDTElVJt Y7RiBUB/qlizYACQfKoC5w8yCRwWauPHxs+e43574QKMOW9ICbCij7VmFYInvWroGrfDv2YeQmC Z1l2zlHmv9pdV5XbiWzvVYHl8KlsaahGRYs= X-Google-Smtp-Source: AGHT+IEdAQLbk3XWgP5dxzDcU8Ji7pMN7/8gM3fi+1QJQlX4tiY+pFVN015rHmkdZZHHkiT9FlYiGA== X-Received: by 2002:a05:600c:a4b:b0:439:8185:4ad4 with SMTP id 
5b1f17b1804b1-43ba6747082mr109032045e9.27.1741015419506; Mon, 03 Mar 2025 07:23:39 -0800 (PST) Received: from localhost ([2a03:2880:31ff:5f::]) by smtp.gmail.com with ESMTPSA id ffacd0b85a97d-390e47b7c4fsm14905417f8f.52.2025.03.03.07.23.38 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 03 Mar 2025 07:23:38 -0800 (PST) From: Kumar Kartikeya Dwivedi To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org Cc: Eduard Zingerman , Linus Torvalds , Peter Zijlstra , Will Deacon , Waiman Long , Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , Martin KaFai Lau , "Paul E. McKenney" , Tejun Heo , Barret Rhoden , Josh Don , Dohyun Kim , linux-arm-kernel@lists.infradead.org, kkd@meta.com, kernel-team@meta.com Subject: [PATCH bpf-next v3 23/25] bpf: Implement verifier support for rqspinlock Date: Mon, 3 Mar 2025 07:23:03 -0800 Message-ID: <20250303152305.3195648-24-memxor@gmail.com> X-Mailer: git-send-email 2.43.5 In-Reply-To: <20250303152305.3195648-1-memxor@gmail.com> References: <20250303152305.3195648-1-memxor@gmail.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=28432; h=from:subject; bh=MO7GJTSenU9MV6IQtiRHmjgSgn53O+a0eeEhZHGj470=; b=owEBbQKS/ZANAwAIAUzgyIZIvxHKAcsmYgBnxcWZN6Fo9q/LgMjUiSwqqQrTQMGCUIN7wbtUmcfA +jv1xeaJAjMEAAEIAB0WIQRLvip+Buz51YI8YRFM4MiGSL8RygUCZ8XFmQAKCRBM4MiGSL8Ryk4iD/ 965nxBXYR4dDVLmBm4StFtOJqi96rYjPW14Nx0sXbgZqtredefGx6C9t3nso+ga5OFs6Yx+xMoDQRg YWbhYxO3Ah7+hOpG/d6L0s+xnArh3d/iZccG+ZZBrwtDmkTir95j6qzGzg5Yih5QvE+voTDuUY53Em Ksftg/Dy4mTbp9WeCZgzvl9cgXSCuOmkDBPQVStIqV0sD7Cc2djPPY5bziy8LAbFRzIg9fLWLDQsOP i/7Edl/jMnt26/1s8G2GQHZmONpUOMz1+EbKo5yUnb2FDMLMN3cWT77SMGKxAf3WGBuM8dh7aETlfc Qwa3IlRRmJlIT2V7aaq5wpCTglKdk7GcrwmBLAZwPa9E2OdnrsE0xbfgKwyadRTkKi/g+YDjuX62oK bxvAMKv1+mMNQ2R2NWpZSi77gVZSJN8/IMzAifanV8irD72FhR30XzZ6eKed/e6z7Q2b4Vh9CK6/X+ 5cTtfh+RGvbKXZ2/rpoirQMG6Mn8kwxoChEbw8Qg1FSoZt6b1R81aAfJyxksexQhZaz0g9/KRtau4E isCtpZec3tmqmAfuHSAs7azUWnmw/YBFVByhGB8eK5zHHNdoDBerhK5Et15S+Ds3zyUv2wL9MxvyPs McFMK4xYH+u7FbG+hORXqQ2cp/vKO8Oaanj5qSwrZI4qmZ63JRCdJp0VULrQ== X-Developer-Key: i=memxor@gmail.com; a=openpgp; fpr=4BBE2A7E06ECF9D5823C61114CE0C88648BF11CA X-Patchwork-Delegate: bpf@iogearbox.net Introduce verifier-side support for rqspinlock kfuncs. The first step is allowing bpf_res_spin_lock type to be defined in map values and allocated objects, so BTF-side is updated with a new BPF_RES_SPIN_LOCK field to recognize and validate. Any object cannot have both bpf_spin_lock and bpf_res_spin_lock, only one of them (and at most one of them per-object, like before) must be present. The bpf_res_spin_lock can also be used to protect objects that require lock protection for their kfuncs, like BPF rbtree and linked list. The verifier plumbing to simulate success and failure cases when calling the kfuncs is done by pushing a new verifier state to the verifier state stack which will verify the failure case upon calling the kfunc. The path where success is indicated creates all lock reference state and IRQ state (if necessary for irqsave variants). In the case of failure, the state clears the registers r0-r5, sets the return value, and skips kfunc processing, proceeding to the next instruction. When marking the return value for success case, the value is marked as 0, and for the failure case as [-MAX_ERRNO, -1]. 
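To make the success and failure branches concrete, here is a hedged sketch of how a program might use these kfuncs once this support lands; the kfunc names and struct bpf_res_spin_lock come from the previous patch, but the extern declaration style, map layout, section and program names are illustrative and not taken from this series.

// SPDX-License-Identifier: GPL-2.0
#include <vmlinux.h>
#include <bpf/bpf_helpers.h>

/* Illustrative declarations; the kernel-side signatures are added in the
 * rqspinlock kfunc patch, the __ksym extern style here is an assumption.
 */
extern int bpf_res_spin_lock(struct bpf_res_spin_lock *lock) __ksym;
extern void bpf_res_spin_unlock(struct bpf_res_spin_lock *lock) __ksym;

struct val_elem {
	struct bpf_res_spin_lock lock;
	u64 counter;
};

struct {
	__uint(type, BPF_MAP_TYPE_ARRAY);
	__uint(max_entries, 1);
	__type(key, int);
	__type(value, struct val_elem);
} counters SEC(".maps");

SEC("tc")
int inc_counter(struct __sk_buff *ctx)
{
	struct val_elem *v;
	int key = 0, ret;

	v = bpf_map_lookup_elem(&counters, &key);
	if (!v)
		return 0;

	ret = bpf_res_spin_lock(&v->lock);
	if (ret)		/* failure branch: lock not held, ret in [-MAX_ERRNO, -1] */
		return 0;

	v->counter++;		/* success branch: lock held, ret known to be 0 */
	bpf_res_spin_unlock(&v->lock);
	return 0;
}

char _license[] SEC("license") = "GPL";
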
Then, in the program, whenever user checks the return value as 'if (ret)' or 'if (ret < 0)' the verifier never traverses such branches for success cases, and would be aware that the lock is not held in such cases. We push the kfunc state in check_kfunc_call whenever rqspinlock kfuncs are invoked. We introduce a kfunc_class state to avoid mixing lock irqrestore kfuncs with IRQ state created by bpf_local_irq_save. With all this infrastructure, these kfuncs become usable in programs while satisfying all safety properties required by the kernel. Acked-by: Eduard Zingerman Signed-off-by: Kumar Kartikeya Dwivedi --- include/linux/bpf.h | 9 ++ include/linux/bpf_verifier.h | 16 ++- kernel/bpf/btf.c | 26 ++++- kernel/bpf/syscall.c | 6 +- kernel/bpf/verifier.c | 219 ++++++++++++++++++++++++++++------- 5 files changed, 231 insertions(+), 45 deletions(-) diff --git a/include/linux/bpf.h b/include/linux/bpf.h index aa47e11371b3..ad4468422770 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -205,6 +205,7 @@ enum btf_field_type { BPF_REFCOUNT = (1 << 9), BPF_WORKQUEUE = (1 << 10), BPF_UPTR = (1 << 11), + BPF_RES_SPIN_LOCK = (1 << 12), }; typedef void (*btf_dtor_kfunc_t)(void *); @@ -240,6 +241,7 @@ struct btf_record { u32 cnt; u32 field_mask; int spin_lock_off; + int res_spin_lock_off; int timer_off; int wq_off; int refcount_off; @@ -315,6 +317,8 @@ static inline const char *btf_field_type_name(enum btf_field_type type) switch (type) { case BPF_SPIN_LOCK: return "bpf_spin_lock"; + case BPF_RES_SPIN_LOCK: + return "bpf_res_spin_lock"; case BPF_TIMER: return "bpf_timer"; case BPF_WORKQUEUE: @@ -347,6 +351,8 @@ static inline u32 btf_field_type_size(enum btf_field_type type) switch (type) { case BPF_SPIN_LOCK: return sizeof(struct bpf_spin_lock); + case BPF_RES_SPIN_LOCK: + return sizeof(struct bpf_res_spin_lock); case BPF_TIMER: return sizeof(struct bpf_timer); case BPF_WORKQUEUE: @@ -377,6 +383,8 @@ static inline u32 btf_field_type_align(enum btf_field_type type) switch (type) { case BPF_SPIN_LOCK: return __alignof__(struct bpf_spin_lock); + case BPF_RES_SPIN_LOCK: + return __alignof__(struct bpf_res_spin_lock); case BPF_TIMER: return __alignof__(struct bpf_timer); case BPF_WORKQUEUE: @@ -420,6 +428,7 @@ static inline void bpf_obj_init_field(const struct btf_field *field, void *addr) case BPF_RB_ROOT: /* RB_ROOT_CACHED 0-inits, no need to do anything after memset */ case BPF_SPIN_LOCK: + case BPF_RES_SPIN_LOCK: case BPF_TIMER: case BPF_WORKQUEUE: case BPF_KPTR_UNREF: diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h index d338f2a96bba..269449363f78 100644 --- a/include/linux/bpf_verifier.h +++ b/include/linux/bpf_verifier.h @@ -115,6 +115,14 @@ struct bpf_reg_state { int depth:30; } iter; + /* For irq stack slots */ + struct { + enum { + IRQ_NATIVE_KFUNC, + IRQ_LOCK_KFUNC, + } kfunc_class; + } irq; + /* Max size from any of the above. */ struct { unsigned long raw1; @@ -255,9 +263,11 @@ struct bpf_reference_state { * default to pointer reference on zero initialization of a state. */ enum ref_state_type { - REF_TYPE_PTR = 1, - REF_TYPE_IRQ = 2, - REF_TYPE_LOCK = 3, + REF_TYPE_PTR = (1 << 1), + REF_TYPE_IRQ = (1 << 2), + REF_TYPE_LOCK = (1 << 3), + REF_TYPE_RES_LOCK = (1 << 4), + REF_TYPE_RES_LOCK_IRQ = (1 << 5), } type; /* Track each reference created with a unique id, even if the same * instruction creates the reference multiple times (eg, via CALL). 
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c index 519e3f5e9c10..f7a2bfb0c11a 100644 --- a/kernel/bpf/btf.c +++ b/kernel/bpf/btf.c @@ -3481,6 +3481,15 @@ static int btf_get_field_type(const struct btf *btf, const struct btf_type *var_ goto end; } } + if (field_mask & BPF_RES_SPIN_LOCK) { + if (!strcmp(name, "bpf_res_spin_lock")) { + if (*seen_mask & BPF_RES_SPIN_LOCK) + return -E2BIG; + *seen_mask |= BPF_RES_SPIN_LOCK; + type = BPF_RES_SPIN_LOCK; + goto end; + } + } if (field_mask & BPF_TIMER) { if (!strcmp(name, "bpf_timer")) { if (*seen_mask & BPF_TIMER) @@ -3659,6 +3668,7 @@ static int btf_find_field_one(const struct btf *btf, switch (field_type) { case BPF_SPIN_LOCK: + case BPF_RES_SPIN_LOCK: case BPF_TIMER: case BPF_WORKQUEUE: case BPF_LIST_NODE: @@ -3952,6 +3962,7 @@ struct btf_record *btf_parse_fields(const struct btf *btf, const struct btf_type return ERR_PTR(-ENOMEM); rec->spin_lock_off = -EINVAL; + rec->res_spin_lock_off = -EINVAL; rec->timer_off = -EINVAL; rec->wq_off = -EINVAL; rec->refcount_off = -EINVAL; @@ -3979,6 +3990,11 @@ struct btf_record *btf_parse_fields(const struct btf *btf, const struct btf_type /* Cache offset for faster lookup at runtime */ rec->spin_lock_off = rec->fields[i].offset; break; + case BPF_RES_SPIN_LOCK: + WARN_ON_ONCE(rec->spin_lock_off >= 0); + /* Cache offset for faster lookup at runtime */ + rec->res_spin_lock_off = rec->fields[i].offset; + break; case BPF_TIMER: WARN_ON_ONCE(rec->timer_off >= 0); /* Cache offset for faster lookup at runtime */ @@ -4022,9 +4038,15 @@ struct btf_record *btf_parse_fields(const struct btf *btf, const struct btf_type rec->cnt++; } + if (rec->spin_lock_off >= 0 && rec->res_spin_lock_off >= 0) { + ret = -EINVAL; + goto end; + } + /* bpf_{list_head, rb_node} require bpf_spin_lock */ if ((btf_record_has_field(rec, BPF_LIST_HEAD) || - btf_record_has_field(rec, BPF_RB_ROOT)) && rec->spin_lock_off < 0) { + btf_record_has_field(rec, BPF_RB_ROOT)) && + (rec->spin_lock_off < 0 && rec->res_spin_lock_off < 0)) { ret = -EINVAL; goto end; } @@ -5637,7 +5659,7 @@ btf_parse_struct_metas(struct bpf_verifier_log *log, struct btf *btf) type = &tab->types[tab->cnt]; type->btf_id = i; - record = btf_parse_fields(btf, t, BPF_SPIN_LOCK | BPF_LIST_HEAD | BPF_LIST_NODE | + record = btf_parse_fields(btf, t, BPF_SPIN_LOCK | BPF_RES_SPIN_LOCK | BPF_LIST_HEAD | BPF_LIST_NODE | BPF_RB_ROOT | BPF_RB_NODE | BPF_REFCOUNT | BPF_KPTR, t->size); /* The record cannot be unset, treat it as an error if so */ diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c index 57a438706215..5cf017e37d7d 100644 --- a/kernel/bpf/syscall.c +++ b/kernel/bpf/syscall.c @@ -665,6 +665,7 @@ void btf_record_free(struct btf_record *rec) case BPF_RB_ROOT: case BPF_RB_NODE: case BPF_SPIN_LOCK: + case BPF_RES_SPIN_LOCK: case BPF_TIMER: case BPF_REFCOUNT: case BPF_WORKQUEUE: @@ -717,6 +718,7 @@ struct btf_record *btf_record_dup(const struct btf_record *rec) case BPF_RB_ROOT: case BPF_RB_NODE: case BPF_SPIN_LOCK: + case BPF_RES_SPIN_LOCK: case BPF_TIMER: case BPF_REFCOUNT: case BPF_WORKQUEUE: @@ -794,6 +796,7 @@ void bpf_obj_free_fields(const struct btf_record *rec, void *obj) switch (fields[i].type) { case BPF_SPIN_LOCK: + case BPF_RES_SPIN_LOCK: break; case BPF_TIMER: bpf_timer_cancel_and_free(field_ptr); @@ -1229,7 +1232,7 @@ static int map_check_btf(struct bpf_map *map, struct bpf_token *token, return -EINVAL; map->record = btf_parse_fields(btf, value_type, - BPF_SPIN_LOCK | BPF_TIMER | BPF_KPTR | BPF_LIST_HEAD | + BPF_SPIN_LOCK | BPF_RES_SPIN_LOCK | BPF_TIMER | 
BPF_KPTR | BPF_LIST_HEAD | BPF_RB_ROOT | BPF_REFCOUNT | BPF_WORKQUEUE | BPF_UPTR, map->value_size); if (!IS_ERR_OR_NULL(map->record)) { @@ -1248,6 +1251,7 @@ static int map_check_btf(struct bpf_map *map, struct bpf_token *token, case 0: continue; case BPF_SPIN_LOCK: + case BPF_RES_SPIN_LOCK: if (map->map_type != BPF_MAP_TYPE_HASH && map->map_type != BPF_MAP_TYPE_ARRAY && map->map_type != BPF_MAP_TYPE_CGROUP_STORAGE && diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index eb1624f6e743..6c8ef72ee6bc 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -456,7 +456,7 @@ static bool subprog_is_exc_cb(struct bpf_verifier_env *env, int subprog) static bool reg_may_point_to_spin_lock(const struct bpf_reg_state *reg) { - return btf_record_has_field(reg_btf_record(reg), BPF_SPIN_LOCK); + return btf_record_has_field(reg_btf_record(reg), BPF_SPIN_LOCK | BPF_RES_SPIN_LOCK); } static bool type_is_rdonly_mem(u32 type) @@ -1148,7 +1148,8 @@ static int release_irq_state(struct bpf_verifier_state *state, int id); static int mark_stack_slot_irq_flag(struct bpf_verifier_env *env, struct bpf_kfunc_call_arg_meta *meta, - struct bpf_reg_state *reg, int insn_idx) + struct bpf_reg_state *reg, int insn_idx, + int kfunc_class) { struct bpf_func_state *state = func(env, reg); struct bpf_stack_state *slot; @@ -1170,6 +1171,7 @@ static int mark_stack_slot_irq_flag(struct bpf_verifier_env *env, st->type = PTR_TO_STACK; /* we don't have dedicated reg type */ st->live |= REG_LIVE_WRITTEN; st->ref_obj_id = id; + st->irq.kfunc_class = kfunc_class; for (i = 0; i < BPF_REG_SIZE; i++) slot->slot_type[i] = STACK_IRQ_FLAG; @@ -1178,7 +1180,8 @@ static int mark_stack_slot_irq_flag(struct bpf_verifier_env *env, return 0; } -static int unmark_stack_slot_irq_flag(struct bpf_verifier_env *env, struct bpf_reg_state *reg) +static int unmark_stack_slot_irq_flag(struct bpf_verifier_env *env, struct bpf_reg_state *reg, + int kfunc_class) { struct bpf_func_state *state = func(env, reg); struct bpf_stack_state *slot; @@ -1192,6 +1195,15 @@ static int unmark_stack_slot_irq_flag(struct bpf_verifier_env *env, struct bpf_r slot = &state->stack[spi]; st = &slot->spilled_ptr; + if (st->irq.kfunc_class != kfunc_class) { + const char *flag_kfunc = st->irq.kfunc_class == IRQ_NATIVE_KFUNC ? "native" : "lock"; + const char *used_kfunc = kfunc_class == IRQ_NATIVE_KFUNC ? "native" : "lock"; + + verbose(env, "irq flag acquired by %s kfuncs cannot be restored with %s kfuncs\n", + flag_kfunc, used_kfunc); + return -EINVAL; + } + err = release_irq_state(env->cur_state, st->ref_obj_id); WARN_ON_ONCE(err && err != -EACCES); if (err) { @@ -1602,7 +1614,7 @@ static struct bpf_reference_state *find_lock_state(struct bpf_verifier_state *st for (i = 0; i < state->acquired_refs; i++) { struct bpf_reference_state *s = &state->refs[i]; - if (s->type != type) + if (!(s->type & type)) continue; if (s->id == id && s->ptr == ptr) @@ -8063,6 +8075,12 @@ static int check_kfunc_mem_size_reg(struct bpf_verifier_env *env, struct bpf_reg return err; } +enum { + PROCESS_SPIN_LOCK = (1 << 0), + PROCESS_RES_LOCK = (1 << 1), + PROCESS_LOCK_IRQ = (1 << 2), +}; + /* Implementation details: * bpf_map_lookup returns PTR_TO_MAP_VALUE_OR_NULL. * bpf_obj_new returns PTR_TO_BTF_ID | MEM_ALLOC | PTR_MAYBE_NULL. @@ -8085,30 +8103,33 @@ static int check_kfunc_mem_size_reg(struct bpf_verifier_env *env, struct bpf_reg * env->cur_state->active_locks remembers which map value element or allocated * object got locked and clears it after bpf_spin_unlock. 
*/ -static int process_spin_lock(struct bpf_verifier_env *env, int regno, - bool is_lock) +static int process_spin_lock(struct bpf_verifier_env *env, int regno, int flags) { + bool is_lock = flags & PROCESS_SPIN_LOCK, is_res_lock = flags & PROCESS_RES_LOCK; + const char *lock_str = is_res_lock ? "bpf_res_spin" : "bpf_spin"; struct bpf_reg_state *regs = cur_regs(env), *reg = ®s[regno]; struct bpf_verifier_state *cur = env->cur_state; bool is_const = tnum_is_const(reg->var_off); + bool is_irq = flags & PROCESS_LOCK_IRQ; u64 val = reg->var_off.value; struct bpf_map *map = NULL; struct btf *btf = NULL; struct btf_record *rec; + u32 spin_lock_off; int err; if (!is_const) { verbose(env, - "R%d doesn't have constant offset. bpf_spin_lock has to be at the constant offset\n", - regno); + "R%d doesn't have constant offset. %s_lock has to be at the constant offset\n", + regno, lock_str); return -EINVAL; } if (reg->type == PTR_TO_MAP_VALUE) { map = reg->map_ptr; if (!map->btf) { verbose(env, - "map '%s' has to have BTF in order to use bpf_spin_lock\n", - map->name); + "map '%s' has to have BTF in order to use %s_lock\n", + map->name, lock_str); return -EINVAL; } } else { @@ -8116,36 +8137,53 @@ static int process_spin_lock(struct bpf_verifier_env *env, int regno, } rec = reg_btf_record(reg); - if (!btf_record_has_field(rec, BPF_SPIN_LOCK)) { - verbose(env, "%s '%s' has no valid bpf_spin_lock\n", map ? "map" : "local", - map ? map->name : "kptr"); + if (!btf_record_has_field(rec, is_res_lock ? BPF_RES_SPIN_LOCK : BPF_SPIN_LOCK)) { + verbose(env, "%s '%s' has no valid %s_lock\n", map ? "map" : "local", + map ? map->name : "kptr", lock_str); return -EINVAL; } - if (rec->spin_lock_off != val + reg->off) { - verbose(env, "off %lld doesn't point to 'struct bpf_spin_lock' that is at %d\n", - val + reg->off, rec->spin_lock_off); + spin_lock_off = is_res_lock ? 
rec->res_spin_lock_off : rec->spin_lock_off; + if (spin_lock_off != val + reg->off) { + verbose(env, "off %lld doesn't point to 'struct %s_lock' that is at %d\n", + val + reg->off, lock_str, spin_lock_off); return -EINVAL; } if (is_lock) { void *ptr; + int type; if (map) ptr = map; else ptr = btf; - if (cur->active_locks) { - verbose(env, - "Locking two bpf_spin_locks are not allowed\n"); - return -EINVAL; + if (!is_res_lock && cur->active_locks) { + if (find_lock_state(env->cur_state, REF_TYPE_LOCK, 0, NULL)) { + verbose(env, + "Locking two bpf_spin_locks are not allowed\n"); + return -EINVAL; + } + } else if (is_res_lock && cur->active_locks) { + if (find_lock_state(env->cur_state, REF_TYPE_RES_LOCK | REF_TYPE_RES_LOCK_IRQ, reg->id, ptr)) { + verbose(env, "Acquiring the same lock again, AA deadlock detected\n"); + return -EINVAL; + } } - err = acquire_lock_state(env, env->insn_idx, REF_TYPE_LOCK, reg->id, ptr); + + if (is_res_lock && is_irq) + type = REF_TYPE_RES_LOCK_IRQ; + else if (is_res_lock) + type = REF_TYPE_RES_LOCK; + else + type = REF_TYPE_LOCK; + err = acquire_lock_state(env, env->insn_idx, type, reg->id, ptr); if (err < 0) { verbose(env, "Failed to acquire lock state\n"); return err; } } else { void *ptr; + int type; if (map) ptr = map; @@ -8153,12 +8191,18 @@ static int process_spin_lock(struct bpf_verifier_env *env, int regno, ptr = btf; if (!cur->active_locks) { - verbose(env, "bpf_spin_unlock without taking a lock\n"); + verbose(env, "%s_unlock without taking a lock\n", lock_str); return -EINVAL; } - if (release_lock_state(env->cur_state, REF_TYPE_LOCK, reg->id, ptr)) { - verbose(env, "bpf_spin_unlock of different lock\n"); + if (is_res_lock && is_irq) + type = REF_TYPE_RES_LOCK_IRQ; + else if (is_res_lock) + type = REF_TYPE_RES_LOCK; + else + type = REF_TYPE_LOCK; + if (release_lock_state(cur, type, reg->id, ptr)) { + verbose(env, "%s_unlock of different lock\n", lock_str); return -EINVAL; } @@ -9484,11 +9528,11 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 arg, return -EACCES; } if (meta->func_id == BPF_FUNC_spin_lock) { - err = process_spin_lock(env, regno, true); + err = process_spin_lock(env, regno, PROCESS_SPIN_LOCK); if (err) return err; } else if (meta->func_id == BPF_FUNC_spin_unlock) { - err = process_spin_lock(env, regno, false); + err = process_spin_lock(env, regno, 0); if (err) return err; } else { @@ -11370,7 +11414,7 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn regs[BPF_REG_0].map_uid = meta.map_uid; regs[BPF_REG_0].type = PTR_TO_MAP_VALUE | ret_flag; if (!type_may_be_null(ret_flag) && - btf_record_has_field(meta.map_ptr->record, BPF_SPIN_LOCK)) { + btf_record_has_field(meta.map_ptr->record, BPF_SPIN_LOCK | BPF_RES_SPIN_LOCK)) { regs[BPF_REG_0].id = ++env->id_gen; } break; @@ -11542,10 +11586,10 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn /* mark_btf_func_reg_size() is used when the reg size is determined by * the BTF func_proto's return value size and argument. 
*/ -static void mark_btf_func_reg_size(struct bpf_verifier_env *env, u32 regno, - size_t reg_size) +static void __mark_btf_func_reg_size(struct bpf_verifier_env *env, struct bpf_reg_state *regs, + u32 regno, size_t reg_size) { - struct bpf_reg_state *reg = &cur_regs(env)[regno]; + struct bpf_reg_state *reg = ®s[regno]; if (regno == BPF_REG_0) { /* Function return value */ @@ -11563,6 +11607,12 @@ static void mark_btf_func_reg_size(struct bpf_verifier_env *env, u32 regno, } } +static void mark_btf_func_reg_size(struct bpf_verifier_env *env, u32 regno, + size_t reg_size) +{ + return __mark_btf_func_reg_size(env, cur_regs(env), regno, reg_size); +} + static bool is_kfunc_acquire(struct bpf_kfunc_call_arg_meta *meta) { return meta->kfunc_flags & KF_ACQUIRE; @@ -11700,6 +11750,7 @@ enum { KF_ARG_RB_ROOT_ID, KF_ARG_RB_NODE_ID, KF_ARG_WORKQUEUE_ID, + KF_ARG_RES_SPIN_LOCK_ID, }; BTF_ID_LIST(kf_arg_btf_ids) @@ -11709,6 +11760,7 @@ BTF_ID(struct, bpf_list_node) BTF_ID(struct, bpf_rb_root) BTF_ID(struct, bpf_rb_node) BTF_ID(struct, bpf_wq) +BTF_ID(struct, bpf_res_spin_lock) static bool __is_kfunc_ptr_arg_type(const struct btf *btf, const struct btf_param *arg, int type) @@ -11757,6 +11809,11 @@ static bool is_kfunc_arg_wq(const struct btf *btf, const struct btf_param *arg) return __is_kfunc_ptr_arg_type(btf, arg, KF_ARG_WORKQUEUE_ID); } +static bool is_kfunc_arg_res_spin_lock(const struct btf *btf, const struct btf_param *arg) +{ + return __is_kfunc_ptr_arg_type(btf, arg, KF_ARG_RES_SPIN_LOCK_ID); +} + static bool is_kfunc_arg_callback(struct bpf_verifier_env *env, const struct btf *btf, const struct btf_param *arg) { @@ -11828,6 +11885,7 @@ enum kfunc_ptr_arg_type { KF_ARG_PTR_TO_MAP, KF_ARG_PTR_TO_WORKQUEUE, KF_ARG_PTR_TO_IRQ_FLAG, + KF_ARG_PTR_TO_RES_SPIN_LOCK, }; enum special_kfunc_type { @@ -11866,6 +11924,10 @@ enum special_kfunc_type { KF_bpf_iter_num_destroy, KF_bpf_set_dentry_xattr, KF_bpf_remove_dentry_xattr, + KF_bpf_res_spin_lock, + KF_bpf_res_spin_unlock, + KF_bpf_res_spin_lock_irqsave, + KF_bpf_res_spin_unlock_irqrestore, }; BTF_SET_START(special_kfunc_set) @@ -11955,6 +12017,10 @@ BTF_ID(func, bpf_remove_dentry_xattr) BTF_ID_UNUSED BTF_ID_UNUSED #endif +BTF_ID(func, bpf_res_spin_lock) +BTF_ID(func, bpf_res_spin_unlock) +BTF_ID(func, bpf_res_spin_lock_irqsave) +BTF_ID(func, bpf_res_spin_unlock_irqrestore) static bool is_kfunc_ret_null(struct bpf_kfunc_call_arg_meta *meta) { @@ -12048,6 +12114,9 @@ get_kfunc_ptr_arg_type(struct bpf_verifier_env *env, if (is_kfunc_arg_irq_flag(meta->btf, &args[argno])) return KF_ARG_PTR_TO_IRQ_FLAG; + if (is_kfunc_arg_res_spin_lock(meta->btf, &args[argno])) + return KF_ARG_PTR_TO_RES_SPIN_LOCK; + if ((base_type(reg->type) == PTR_TO_BTF_ID || reg2btf_ids[base_type(reg->type)])) { if (!btf_type_is_struct(ref_t)) { verbose(env, "kernel function %s args#%d pointer type %s %s is not supported\n", @@ -12155,13 +12224,19 @@ static int process_irq_flag(struct bpf_verifier_env *env, int regno, struct bpf_kfunc_call_arg_meta *meta) { struct bpf_reg_state *regs = cur_regs(env), *reg = ®s[regno]; + int err, kfunc_class = IRQ_NATIVE_KFUNC; bool irq_save; - int err; - if (meta->func_id == special_kfunc_list[KF_bpf_local_irq_save]) { + if (meta->func_id == special_kfunc_list[KF_bpf_local_irq_save] || + meta->func_id == special_kfunc_list[KF_bpf_res_spin_lock_irqsave]) { irq_save = true; - } else if (meta->func_id == special_kfunc_list[KF_bpf_local_irq_restore]) { + if (meta->func_id == special_kfunc_list[KF_bpf_res_spin_lock_irqsave]) + kfunc_class = IRQ_LOCK_KFUNC; + } 
else if (meta->func_id == special_kfunc_list[KF_bpf_local_irq_restore] || + meta->func_id == special_kfunc_list[KF_bpf_res_spin_unlock_irqrestore]) { irq_save = false; + if (meta->func_id == special_kfunc_list[KF_bpf_res_spin_unlock_irqrestore]) + kfunc_class = IRQ_LOCK_KFUNC; } else { verbose(env, "verifier internal error: unknown irq flags kfunc\n"); return -EFAULT; @@ -12177,7 +12252,7 @@ static int process_irq_flag(struct bpf_verifier_env *env, int regno, if (err) return err; - err = mark_stack_slot_irq_flag(env, meta, reg, env->insn_idx); + err = mark_stack_slot_irq_flag(env, meta, reg, env->insn_idx, kfunc_class); if (err) return err; } else { @@ -12191,7 +12266,7 @@ static int process_irq_flag(struct bpf_verifier_env *env, int regno, if (err) return err; - err = unmark_stack_slot_irq_flag(env, reg); + err = unmark_stack_slot_irq_flag(env, reg, kfunc_class); if (err) return err; } @@ -12318,7 +12393,8 @@ static int check_reg_allocation_locked(struct bpf_verifier_env *env, struct bpf_ if (!env->cur_state->active_locks) return -EINVAL; - s = find_lock_state(env->cur_state, REF_TYPE_LOCK, id, ptr); + s = find_lock_state(env->cur_state, REF_TYPE_LOCK | REF_TYPE_RES_LOCK | REF_TYPE_RES_LOCK_IRQ, + id, ptr); if (!s) { verbose(env, "held lock and object are not in the same allocation\n"); return -EINVAL; @@ -12354,9 +12430,18 @@ static bool is_bpf_graph_api_kfunc(u32 btf_id) btf_id == special_kfunc_list[KF_bpf_refcount_acquire_impl]; } +static bool is_bpf_res_spin_lock_kfunc(u32 btf_id) +{ + return btf_id == special_kfunc_list[KF_bpf_res_spin_lock] || + btf_id == special_kfunc_list[KF_bpf_res_spin_unlock] || + btf_id == special_kfunc_list[KF_bpf_res_spin_lock_irqsave] || + btf_id == special_kfunc_list[KF_bpf_res_spin_unlock_irqrestore]; +} + static bool kfunc_spin_allowed(u32 btf_id) { - return is_bpf_graph_api_kfunc(btf_id) || is_bpf_iter_num_api_kfunc(btf_id); + return is_bpf_graph_api_kfunc(btf_id) || is_bpf_iter_num_api_kfunc(btf_id) || + is_bpf_res_spin_lock_kfunc(btf_id); } static bool is_sync_callback_calling_kfunc(u32 btf_id) @@ -12788,6 +12873,7 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_ case KF_ARG_PTR_TO_CONST_STR: case KF_ARG_PTR_TO_WORKQUEUE: case KF_ARG_PTR_TO_IRQ_FLAG: + case KF_ARG_PTR_TO_RES_SPIN_LOCK: break; default: WARN_ON_ONCE(1); @@ -13086,6 +13172,28 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_ if (ret < 0) return ret; break; + case KF_ARG_PTR_TO_RES_SPIN_LOCK: + { + int flags = PROCESS_RES_LOCK; + + if (reg->type != PTR_TO_MAP_VALUE && reg->type != (PTR_TO_BTF_ID | MEM_ALLOC)) { + verbose(env, "arg#%d doesn't point to map value or allocated object\n", i); + return -EINVAL; + } + + if (!is_bpf_res_spin_lock_kfunc(meta->func_id)) + return -EFAULT; + if (meta->func_id == special_kfunc_list[KF_bpf_res_spin_lock] || + meta->func_id == special_kfunc_list[KF_bpf_res_spin_lock_irqsave]) + flags |= PROCESS_SPIN_LOCK; + if (meta->func_id == special_kfunc_list[KF_bpf_res_spin_lock_irqsave] || + meta->func_id == special_kfunc_list[KF_bpf_res_spin_unlock_irqrestore]) + flags |= PROCESS_LOCK_IRQ; + ret = process_spin_lock(env, regno, flags); + if (ret < 0) + return ret; + break; + } } } @@ -13171,6 +13279,33 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn, insn_aux->is_iter_next = is_iter_next_kfunc(&meta); + if (!insn->off && + (insn->imm == special_kfunc_list[KF_bpf_res_spin_lock] || + insn->imm == special_kfunc_list[KF_bpf_res_spin_lock_irqsave])) { + struct 
bpf_verifier_state *branch; + struct bpf_reg_state *regs; + + branch = push_stack(env, env->insn_idx + 1, env->insn_idx, false); + if (!branch) { + verbose(env, "failed to push state for failed lock acquisition\n"); + return -ENOMEM; + } + + regs = branch->frame[branch->curframe]->regs; + + /* Clear r0-r5 registers in forked state */ + for (i = 0; i < CALLER_SAVED_REGS; i++) + mark_reg_not_init(env, regs, caller_saved[i]); + + mark_reg_unknown(env, regs, BPF_REG_0); + err = __mark_reg_s32_range(env, regs, BPF_REG_0, -MAX_ERRNO, -1); + if (err) { + verbose(env, "failed to mark s32 range for retval in forked state for lock\n"); + return err; + } + __mark_btf_func_reg_size(env, regs, BPF_REG_0, sizeof(u32)); + } + if (is_kfunc_destructive(&meta) && !capable(CAP_SYS_BOOT)) { verbose(env, "destructive kfunc calls require CAP_SYS_BOOT capability\n"); return -EACCES; @@ -13341,6 +13476,9 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn, if (btf_type_is_scalar(t)) { mark_reg_unknown(env, regs, BPF_REG_0); + if (meta.btf == btf_vmlinux && (meta.func_id == special_kfunc_list[KF_bpf_res_spin_lock] || + meta.func_id == special_kfunc_list[KF_bpf_res_spin_lock_irqsave])) + __mark_reg_const_zero(env, ®s[BPF_REG_0]); mark_btf_func_reg_size(env, BPF_REG_0, t->size); } else if (btf_type_is_ptr(t)) { ptr_type = btf_type_skip_modifiers(desc_btf, t->type, &ptr_type_id); @@ -18275,7 +18413,8 @@ static bool stacksafe(struct bpf_verifier_env *env, struct bpf_func_state *old, case STACK_IRQ_FLAG: old_reg = &old->stack[spi].spilled_ptr; cur_reg = &cur->stack[spi].spilled_ptr; - if (!check_ids(old_reg->ref_obj_id, cur_reg->ref_obj_id, idmap)) + if (!check_ids(old_reg->ref_obj_id, cur_reg->ref_obj_id, idmap) || + old_reg->irq.kfunc_class != cur_reg->irq.kfunc_class) return false; break; case STACK_MISC: @@ -18319,6 +18458,8 @@ static bool refsafe(struct bpf_verifier_state *old, struct bpf_verifier_state *c case REF_TYPE_IRQ: break; case REF_TYPE_LOCK: + case REF_TYPE_RES_LOCK: + case REF_TYPE_RES_LOCK_IRQ: if (old->refs[i].ptr != cur->refs[i].ptr) return false; break; @@ -19641,7 +19782,7 @@ static int check_map_prog_compatibility(struct bpf_verifier_env *env, } } - if (btf_record_has_field(map->record, BPF_SPIN_LOCK)) { + if (btf_record_has_field(map->record, BPF_SPIN_LOCK | BPF_RES_SPIN_LOCK)) { if (prog_type == BPF_PROG_TYPE_SOCKET_FILTER) { verbose(env, "socket filter progs cannot use bpf_spin_lock yet\n"); return -EINVAL; From patchwork Mon Mar 3 15:23:04 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kumar Kartikeya Dwivedi X-Patchwork-Id: 13999041 X-Patchwork-Delegate: bpf@iogearbox.net Received: from mail-wr1-f68.google.com (mail-wr1-f68.google.com [209.85.221.68]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 5DC9B23909C; Mon, 3 Mar 2025 15:23:43 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.221.68 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1741015426; cv=none; b=fv5mVwBTXsoj7nZHw+quwegW+ziYjxHOt07vah5u51YcMZTIvBgKgEdWo4blwkdAl9RgKijtSFst23u8KQW/oA5rkcs8MGQDkN7rJsDd1WsYeSL3x3eo77FPSON7coh3TNeY53mkJPYlNJX61zdaNgnxziwLfJHRLL47GJW0qgs= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1741015426; c=relaxed/simple; bh=yd3gRpKNHABqLGxFwmo4qitDCfszyenvxRvQDwJYTjI=; 
h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=LkjQokhZaovb6BWZwvXOJeLKxjCi1DO4nUP7bVqy96w62XyaaRFnZYwHuBOTyWHZSzZNIEuUFxVpVYxQ5q22NJHl2M4rJsareYbEK+JwqFJBkDHFIohkzo9A+dlRkgQlajLybYurzKXHoktSXZYjSTO5dukZGBm2QqjnoncB53s= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=gDl/zvub; arc=none smtp.client-ip=209.85.221.68 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="gDl/zvub" Received: by mail-wr1-f68.google.com with SMTP id ffacd0b85a97d-390effd3e85so3257382f8f.0; Mon, 03 Mar 2025 07:23:43 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1741015421; x=1741620221; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=JK06538PoT0hdERtpqSLZ9dXgVnOMpofNScplUny+Pg=; b=gDl/zvub9tYQEpfE8tziR5SbZ72TGGOg4J6glV0DxW4iF/MreCWZfVVmaE+XujCFvK j1ZjIPSTdiBCQcRndtg7lE2Kd3strHDyH8lmMQAbQm1pHNI4tFa5mqfGfiPN0YZyB4ur NLpAx8zHe/Q06Z6yPXfXCXq5tD7wFXKc/LWtdwgyUYucDeGRcZ7Gf31GS5sk6HS4AqLI m+WPDU7s+1CfqIwaOZ/HmWjvGTjRcmMIkyGsN+rX0bRNtWSPOxKl1XZLle3oK1tOmEoA gyV1YKm7fk2xJaKvVfzaDYLCyjWUh1ody+tlfrf6RE7DsdU9Ak3eN/uE9A8JLP+qodCi pGog== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1741015421; x=1741620221; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=JK06538PoT0hdERtpqSLZ9dXgVnOMpofNScplUny+Pg=; b=pAeG0yzcg7MdoUyRp5M1urStVy30ct4qDrIgOyES5r07axv2J9G7cip5/ME0vdco5Z 8z4NbQwZBFBgq0gJxC7YOw+LCpo7x2VFNVKzkIZ4GaJrrYuuoqkAgvTI2g0O94MaePSX XUxq3C+tukQ2Gs6VNUY6w+6H92kEEFS9rNbwdpRl+rNfJkiCAglHFZQVMvpMwlICLXj+ N5kqzjE2JUYfaC3ncthVpldI8O33GWs6e8xg0ACm+D/gGIMIJ0Camh190iEBlkdKptwQ owuDq3QhPlIVqt5CMD5SSRblYg0wARAGJD25dll2nKpBuu0tk0r5nDjNeFVA3Nt0GM9m Uy/Q== X-Forwarded-Encrypted: i=1; AJvYcCXfSg8NQxqF+i754jFCvUFbWjCsMZiGpnHCWCAnQ4AlK+HQPhjYuWSxtJOdlnhCAM7GeljPo7SRTEVDYPw=@vger.kernel.org X-Gm-Message-State: AOJu0YyOOtQcKKLxasqsnX+rqof2Iz1CRmdfA3XfJSYvSoPm0DH8U0Ce XpRUnw/aKDJ1tu+SKltGxQcofRqzK6gH7now7Rw/DuTzmg/Vt2aNlwrn+cwXXWE= X-Gm-Gg: ASbGncvV4XMuxuAouIb4Q71ZcbwcymYwIHHAs7CZh8nuLcgYQV/orq9yMMbOrX+4LaW xitjdPIcAj86dZGzskj/ABRpm24Uh9pmekx9QsrKGeEgNiSVtt9EzeF6I3GHpIC9C2yFHrnQhQB KeAyMvrEbhYulwdEPyuiWxi7glOvuTBy7a7IIr365+uc2B1wIRYbLeAg4YocrK6Xr2cjNDXthrK vNh92aHMI/IAYQJ0/pkq7vj08bEypnp3+ning44bA86l46UQShgEn7c9iWLRQvVRV/4YirdmSuG GpiqTPgTsOJHMOo88ekNVF18tJlr+DmrS0M= X-Google-Smtp-Source: AGHT+IEWrOLXEnoxplL0eXEi/SpthtQwbZ6y7pKL2e59tZ8s3JQadQxULhRUvfXdSLXbDwyc9fQvQw== X-Received: by 2002:a5d:5f42:0:b0:391:5f:fa4e with SMTP id ffacd0b85a97d-391005ffe69mr4807658f8f.29.1741015421039; Mon, 03 Mar 2025 07:23:41 -0800 (PST) Received: from localhost ([2a03:2880:31ff:74::]) by smtp.gmail.com with ESMTPSA id ffacd0b85a97d-390e47a7868sm14700001f8f.24.2025.03.03.07.23.40 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 03 Mar 2025 07:23:40 -0800 (PST) From: Kumar Kartikeya Dwivedi To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org Cc: Linus Torvalds , Peter 
Zijlstra , Will Deacon , Waiman Long , Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , Martin KaFai Lau , Eduard Zingerman , "Paul E. McKenney" , Tejun Heo , Barret Rhoden , Josh Don , Dohyun Kim , linux-arm-kernel@lists.infradead.org, kkd@meta.com, kernel-team@meta.com Subject: [PATCH bpf-next v3 24/25] bpf: Maintain FIFO property for rqspinlock unlock Date: Mon, 3 Mar 2025 07:23:04 -0800 Message-ID: <20250303152305.3195648-25-memxor@gmail.com> X-Mailer: git-send-email 2.43.5 In-Reply-To: <20250303152305.3195648-1-memxor@gmail.com> References: <20250303152305.3195648-1-memxor@gmail.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=4773; h=from:subject; bh=yd3gRpKNHABqLGxFwmo4qitDCfszyenvxRvQDwJYTjI=; b=owEBbQKS/ZANAwAIAUzgyIZIvxHKAcsmYgBnxcWZ7k4vMGesNz3gUHT6LOfeGtXRbGBf+aKUvuv/ umGpqdqJAjMEAAEIAB0WIQRLvip+Buz51YI8YRFM4MiGSL8RygUCZ8XFmQAKCRBM4MiGSL8RygluD/ 4uWKUHAjRffEJ86xP+I5MHqRuOHQ9jEAhSSdV7VPagDzTfZS8/JrswTTGLd4rS2xgwcFBwODkZ2aBs ayN+aLN1VPfzRRauqqYTO74IQCjSQVlRfrUc7QPeFDKF5iD9QPiLL3exq4eW9AGB7e+PuYaWH0wp3Q Wo/2jS/gXUhBTx2P5Iv9Lj56om69Hq7TGuUFrQVyvaC0CeinRJzaROx754KgWhnijwvOYUHsFZNFWS BR9qzsCLUTJhZhlEoX1z+ckbFZlKaMumFJaLjFUi3LW0r0wppTr+j6iVZ29CPh4p0x4H1/YOVIOehV mrk58kirt5EbcalsufoTGdl152sD0qdKJHO0pIefGYB9QVp9zzHJp+dGg5erc8UIcmn6J/IJuehSWC KgKZF6l00P5i0xwwz7h3GxNqSOfKGemo+nMTASSCL5HQGjzyWrrTOl3Yt5Sztu7F+qo4kTW554DaT6 z+9LE7AOC45QooYOFu15kG+QAhLeo+5nrl38cyPi60IHkU1JQ74e4y/92wb+i1BJsmtndDZAcwh48I YcPMi8nG85nDdwOaRNx4VNvVKDPLz3FdagXogkzXKN9BtKEUy/7x5cVgwICmIOCmHC3fqFTyvAimEK 8q9RN8RRT/ciTibX4PZBm9gB4uF8rSuHE+89XzSMRgzUjf0vXy/Qf6oVA84w== X-Developer-Key: i=memxor@gmail.com; a=openpgp; fpr=4BBE2A7E06ECF9D5823C61114CE0C88648BF11CA X-Patchwork-Delegate: bpf@iogearbox.net Since out-of-order unlocks are unsupported for rqspinlock, and irqsave variants enforce strict FIFO ordering anyway, make the same change for normal non-irqsave variants, such that FIFO ordering is enforced. Two new verifier state fields (active_lock_id, active_lock_ptr) are used to denote the top of the stack, and prev_id and prev_ptr are ascertained whenever popping the topmost entry through an unlock. Take special care to make these fields part of the state comparison in refsafe. Signed-off-by: Kumar Kartikeya Dwivedi --- include/linux/bpf_verifier.h | 3 +++ kernel/bpf/verifier.c | 33 ++++++++++++++++++++++++++++----- 2 files changed, 31 insertions(+), 5 deletions(-) diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h index 269449363f78..7348bd824e16 100644 --- a/include/linux/bpf_verifier.h +++ b/include/linux/bpf_verifier.h @@ -268,6 +268,7 @@ struct bpf_reference_state { REF_TYPE_LOCK = (1 << 3), REF_TYPE_RES_LOCK = (1 << 4), REF_TYPE_RES_LOCK_IRQ = (1 << 5), + REF_TYPE_LOCK_MASK = REF_TYPE_LOCK | REF_TYPE_RES_LOCK | REF_TYPE_RES_LOCK_IRQ, } type; /* Track each reference created with a unique id, even if the same * instruction creates the reference multiple times (eg, via CALL). 
@@ -434,6 +435,8 @@ struct bpf_verifier_state { u32 active_locks; u32 active_preempt_locks; u32 active_irq_id; + u32 active_lock_id; + void *active_lock_ptr; bool active_rcu_lock; bool speculative; diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index 6c8ef72ee6bc..d3be8932abe4 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -1421,6 +1421,8 @@ static int copy_reference_state(struct bpf_verifier_state *dst, const struct bpf dst->active_preempt_locks = src->active_preempt_locks; dst->active_rcu_lock = src->active_rcu_lock; dst->active_irq_id = src->active_irq_id; + dst->active_lock_id = src->active_lock_id; + dst->active_lock_ptr = src->active_lock_ptr; return 0; } @@ -1520,6 +1522,8 @@ static int acquire_lock_state(struct bpf_verifier_env *env, int insn_idx, enum r s->ptr = ptr; state->active_locks++; + state->active_lock_id = id; + state->active_lock_ptr = ptr; return 0; } @@ -1570,16 +1574,24 @@ static bool find_reference_state(struct bpf_verifier_state *state, int ptr_id) static int release_lock_state(struct bpf_verifier_state *state, int type, int id, void *ptr) { + void *prev_ptr = NULL; + u32 prev_id = 0; int i; for (i = 0; i < state->acquired_refs; i++) { - if (state->refs[i].type != type) - continue; - if (state->refs[i].id == id && state->refs[i].ptr == ptr) { + if (state->refs[i].type == type && state->refs[i].id == id && + state->refs[i].ptr == ptr) { release_reference_state(state, i); state->active_locks--; + /* Reassign active lock (id, ptr). */ + state->active_lock_id = prev_id; + state->active_lock_ptr = prev_ptr; return 0; } + if (state->refs[i].type & REF_TYPE_LOCK_MASK) { + prev_id = state->refs[i].id; + prev_ptr = state->refs[i].ptr; + } } return -EINVAL; } @@ -8201,6 +8213,14 @@ static int process_spin_lock(struct bpf_verifier_env *env, int regno, int flags) type = REF_TYPE_RES_LOCK; else type = REF_TYPE_LOCK; + if (!find_lock_state(cur, type, reg->id, ptr)) { + verbose(env, "%s_unlock of different lock\n", lock_str); + return -EINVAL; + } + if (reg->id != cur->active_lock_id || ptr != cur->active_lock_ptr) { + verbose(env, "%s_unlock cannot be out of order\n", lock_str); + return -EINVAL; + } if (release_lock_state(cur, type, reg->id, ptr)) { verbose(env, "%s_unlock of different lock\n", lock_str); return -EINVAL; @@ -12393,8 +12413,7 @@ static int check_reg_allocation_locked(struct bpf_verifier_env *env, struct bpf_ if (!env->cur_state->active_locks) return -EINVAL; - s = find_lock_state(env->cur_state, REF_TYPE_LOCK | REF_TYPE_RES_LOCK | REF_TYPE_RES_LOCK_IRQ, - id, ptr); + s = find_lock_state(env->cur_state, REF_TYPE_LOCK_MASK, id, ptr); if (!s) { verbose(env, "held lock and object are not in the same allocation\n"); return -EINVAL; @@ -18449,6 +18468,10 @@ static bool refsafe(struct bpf_verifier_state *old, struct bpf_verifier_state *c if (!check_ids(old->active_irq_id, cur->active_irq_id, idmap)) return false; + if (!check_ids(old->active_lock_id, cur->active_lock_id, idmap) || + old->active_lock_ptr != cur->active_lock_ptr) + return false; + for (i = 0; i < old->acquired_refs; i++) { if (!check_ids(old->refs[i].id, cur->refs[i].id, idmap) || old->refs[i].type != cur->refs[i].type) From patchwork Mon Mar 3 15:23:05 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kumar Kartikeya Dwivedi X-Patchwork-Id: 13999042 X-Patchwork-Delegate: bpf@iogearbox.net Received: from mail-wm1-f68.google.com (mail-wm1-f68.google.com [209.85.128.68]) (using TLSv1.2 with cipher 
k61ZFo3lXbFuS/BIJsxp5xNql9BqWsWJ0elV08GxRLbMHWQP7kGy/x+KmhLohTNagG7G3jm9Fz5 VYzVa/SgI1D2+IFFYY9f6wN5Rv6Jf7zrWXE= X-Google-Smtp-Source: AGHT+IF1BwyKpBxufZL15FHfV1K+xkaoboqbhKYSSMajj4aWR8VsvlX6AMmLz83+PD1s+XMRyoVIXQ== X-Received: by 2002:a05:600c:ca:b0:439:91c7:895a with SMTP id 5b1f17b1804b1-43afddc6489mr144850385e9.7.1741015422319; Mon, 03 Mar 2025 07:23:42 -0800 (PST) Received: from localhost ([2a03:2880:31ff:73::]) by smtp.gmail.com with ESMTPSA id 5b1f17b1804b1-43bc24a51bfsm38651415e9.10.2025.03.03.07.23.41 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 03 Mar 2025 07:23:41 -0800 (PST) From: Kumar Kartikeya Dwivedi To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org Cc: Linus Torvalds , Peter Zijlstra , Will Deacon , Waiman Long , Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , Martin KaFai Lau , Eduard Zingerman , "Paul E. McKenney" , Tejun Heo , Barret Rhoden , Josh Don , Dohyun Kim , linux-arm-kernel@lists.infradead.org, kkd@meta.com, kernel-team@meta.com Subject: [PATCH bpf-next v3 25/25] selftests/bpf: Add tests for rqspinlock Date: Mon, 3 Mar 2025 07:23:05 -0800 Message-ID: <20250303152305.3195648-26-memxor@gmail.com> X-Mailer: git-send-email 2.43.5 In-Reply-To: <20250303152305.3195648-1-memxor@gmail.com> References: <20250303152305.3195648-1-memxor@gmail.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=15728; h=from:subject; bh=U3o4FRMgSYlK94f6p6XtgMmHxSWNXlOtPMuGZSejx2w=; b=owEBbQKS/ZANAwAIAUzgyIZIvxHKAcsmYgBnxcWZzKIrFm1iuCUUDDtfLRnhLPc+SCt9cT5eXF6o 6N+GSmqJAjMEAAEIAB0WIQRLvip+Buz51YI8YRFM4MiGSL8RygUCZ8XFmQAKCRBM4MiGSL8RysxCD/ 0USidd7H00hBTlGhkNe1qL1P8T3pcVx1ejxkBbgrAma8ybJKZyercWVHVXCW8eQg784Ri97C4xBbFo 3V9qUCkJUJZUX4AnPiNsAdiAQCHKECTLYm0p0KqI9BcdgZKQv+HVHH54QDArzyN9nBuG4Uks81oq90 LbGJLfxQ/EP7x8KvPJqEa3auefyGtsq6f6d5JT+n+8gWtbOpnhncsq5NPk+Cfh4XjdHGw1ij0f6kb8 sYkYXwCPxGNbt7GuxA7DUA28RLTd+CWeLMnLz8sqlSHEqT3geqqTQ7zCGroosld1WZxYffYop/FpWT CK438ZkEWr6oW0Z9sokHWAwu5YnrGNcsZCNzUGyAnTFnYrhqn4p2oXPPWxdJP4MkivrQbM7nFp9g0W oGzGvCmn61ww7IdcCEK09nEtye3GvefEpTNPahCH5W1538XTcMV+ivRdfq2TzV48irQGjQEOLSEamT opLsWsX/ngZ8TSUM8rHQd6R1Xn/nxAehi9yFiEr8Afb4nBBoSL5SIDxtqDnDYjMBMZv/cHK/f0yOuL QwW/1Dd+qnhEq3YSKQmvGN8vnM1bubpAJGBgYYiv5mH/KDdvzmV0IA20+gJ8g+d9P9Z/RnqFZnIFPw ciMScGjt5UN3k63SRNESddfekGf3gq1yUKZbJrWdML5Zi9tRdfNR7DtyAqgg== X-Developer-Key: i=memxor@gmail.com; a=openpgp; fpr=4BBE2A7E06ECF9D5823C61114CE0C88648BF11CA X-Patchwork-Delegate: bpf@iogearbox.net Introduce selftests that trigger AA, ABBA deadlocks, and test the edge case where the held locks table runs out of entries, since we then fallback to the timeout as the final line of defense. Also exercise verifier's AA detection where applicable. 
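All of the tests build on the same error-checked locking pattern: the lock kfuncs return 0 on success and a negative errno (e.g. -EDEADLK when AA/ABBA deadlock is detected, -ETIMEDOUT when the timeout fires), and the critical section may only be entered, and the lock released, when acquisition succeeded. A minimal sketch of that pattern, assuming the kfunc declarations pulled in by the headers the programs below include (the struct and field names here are illustrative only):

struct elem {
	struct bpf_res_spin_lock lock;
	long counter;
};

static int bump_counter(struct elem *e)
{
	int ret;

	ret = bpf_res_spin_lock(&e->lock);
	if (ret)
		/* Acquisition was aborted (deadlock or timeout); do not
		 * touch the protected field and do not unlock.
		 */
		return ret;
	e->counter++;
	bpf_res_spin_unlock(&e->lock);
	return 0;
}
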
Signed-off-by: Kumar Kartikeya Dwivedi --- .../selftests/bpf/prog_tests/res_spin_lock.c | 92 +++++++ tools/testing/selftests/bpf/progs/irq.c | 53 ++++ .../selftests/bpf/progs/res_spin_lock.c | 143 ++++++++++ .../selftests/bpf/progs/res_spin_lock_fail.c | 244 ++++++++++++++++++ 4 files changed, 532 insertions(+) create mode 100644 tools/testing/selftests/bpf/prog_tests/res_spin_lock.c create mode 100644 tools/testing/selftests/bpf/progs/res_spin_lock.c create mode 100644 tools/testing/selftests/bpf/progs/res_spin_lock_fail.c diff --git a/tools/testing/selftests/bpf/prog_tests/res_spin_lock.c b/tools/testing/selftests/bpf/prog_tests/res_spin_lock.c new file mode 100644 index 000000000000..563d0d2801bb --- /dev/null +++ b/tools/testing/selftests/bpf/prog_tests/res_spin_lock.c @@ -0,0 +1,92 @@ +// SPDX-License-Identifier: GPL-2.0 +/* Copyright (c) 2024 Meta Platforms, Inc. and affiliates. */ +#include +#include + +#include "res_spin_lock.skel.h" +#include "res_spin_lock_fail.skel.h" + +void test_res_spin_lock_failure(void) +{ + RUN_TESTS(res_spin_lock_fail); +} + +static volatile int skip; + +static void *spin_lock_thread(void *arg) +{ + int err, prog_fd = *(u32 *) arg; + LIBBPF_OPTS(bpf_test_run_opts, topts, + .data_in = &pkt_v4, + .data_size_in = sizeof(pkt_v4), + .repeat = 10000, + ); + + while (!READ_ONCE(skip)) { + err = bpf_prog_test_run_opts(prog_fd, &topts); + ASSERT_OK(err, "test_run"); + ASSERT_OK(topts.retval, "test_run retval"); + } + pthread_exit(arg); +} + +void test_res_spin_lock_success(void) +{ + LIBBPF_OPTS(bpf_test_run_opts, topts, + .data_in = &pkt_v4, + .data_size_in = sizeof(pkt_v4), + .repeat = 1, + ); + struct res_spin_lock *skel; + pthread_t thread_id[16]; + int prog_fd, i, err; + void *ret; + + skel = res_spin_lock__open_and_load(); + if (!ASSERT_OK_PTR(skel, "res_spin_lock__open_and_load")) + return; + /* AA deadlock */ + prog_fd = bpf_program__fd(skel->progs.res_spin_lock_test); + err = bpf_prog_test_run_opts(prog_fd, &topts); + ASSERT_OK(err, "error"); + ASSERT_OK(topts.retval, "retval"); + + prog_fd = bpf_program__fd(skel->progs.res_spin_lock_test_held_lock_max); + err = bpf_prog_test_run_opts(prog_fd, &topts); + ASSERT_OK(err, "error"); + ASSERT_OK(topts.retval, "retval"); + + /* Multi-threaded ABBA deadlock. 
*/ + + prog_fd = bpf_program__fd(skel->progs.res_spin_lock_test_AB); + for (i = 0; i < 16; i++) { + int err; + + err = pthread_create(&thread_id[i], NULL, &spin_lock_thread, &prog_fd); + if (!ASSERT_OK(err, "pthread_create")) + goto end; + } + + topts.retval = 0; + topts.repeat = 1000; + int fd = bpf_program__fd(skel->progs.res_spin_lock_test_BA); + while (!topts.retval && !err && !READ_ONCE(skel->bss->err)) { + err = bpf_prog_test_run_opts(fd, &topts); + } + + WRITE_ONCE(skip, true); + + for (i = 0; i < 16; i++) { + if (!ASSERT_OK(pthread_join(thread_id[i], &ret), "pthread_join")) + goto end; + if (!ASSERT_EQ(ret, &prog_fd, "ret == prog_fd")) + goto end; + } + + ASSERT_EQ(READ_ONCE(skel->bss->err), -EDEADLK, "timeout err"); + ASSERT_OK(err, "err"); + ASSERT_EQ(topts.retval, -EDEADLK, "timeout"); +end: + res_spin_lock__destroy(skel); + return; +} diff --git a/tools/testing/selftests/bpf/progs/irq.c b/tools/testing/selftests/bpf/progs/irq.c index 298d48d7886d..74d912b22de9 100644 --- a/tools/testing/selftests/bpf/progs/irq.c +++ b/tools/testing/selftests/bpf/progs/irq.c @@ -11,6 +11,9 @@ extern void bpf_local_irq_save(unsigned long *) __weak __ksym; extern void bpf_local_irq_restore(unsigned long *) __weak __ksym; extern int bpf_copy_from_user_str(void *dst, u32 dst__sz, const void *unsafe_ptr__ign, u64 flags) __weak __ksym; +struct bpf_res_spin_lock lockA __hidden SEC(".data.A"); +struct bpf_res_spin_lock lockB __hidden SEC(".data.B"); + SEC("?tc") __failure __msg("arg#0 doesn't point to an irq flag on stack") int irq_save_bad_arg(struct __sk_buff *ctx) @@ -510,4 +513,54 @@ int irq_sleepable_global_subprog_indirect(void *ctx) return 0; } +SEC("?tc") +__failure __msg("cannot restore irq state out of order") +int irq_ooo_lock_cond_inv(struct __sk_buff *ctx) +{ + unsigned long flags1, flags2; + + if (bpf_res_spin_lock_irqsave(&lockA, &flags1)) + return 0; + if (bpf_res_spin_lock_irqsave(&lockB, &flags2)) { + bpf_res_spin_unlock_irqrestore(&lockA, &flags1); + return 0; + } + + bpf_res_spin_unlock_irqrestore(&lockB, &flags1); + bpf_res_spin_unlock_irqrestore(&lockA, &flags2); + return 0; +} + +SEC("?tc") +__failure __msg("function calls are not allowed") +int irq_wrong_kfunc_class_1(struct __sk_buff *ctx) +{ + unsigned long flags1; + + if (bpf_res_spin_lock_irqsave(&lockA, &flags1)) + return 0; + /* For now, bpf_local_irq_restore is not allowed in critical section, + * but this test ensures error will be caught with kfunc_class when it's + * opened up. Tested by temporarily permitting this kfunc in critical + * section. + */ + bpf_local_irq_restore(&flags1); + bpf_res_spin_unlock_irqrestore(&lockA, &flags1); + return 0; +} + +SEC("?tc") +__failure __msg("function calls are not allowed") +int irq_wrong_kfunc_class_2(struct __sk_buff *ctx) +{ + unsigned long flags1, flags2; + + bpf_local_irq_save(&flags1); + if (bpf_res_spin_lock_irqsave(&lockA, &flags2)) + return 0; + bpf_local_irq_restore(&flags2); + bpf_res_spin_unlock_irqrestore(&lockA, &flags1); + return 0; +} + char _license[] SEC("license") = "GPL"; diff --git a/tools/testing/selftests/bpf/progs/res_spin_lock.c b/tools/testing/selftests/bpf/progs/res_spin_lock.c new file mode 100644 index 000000000000..40ac06c91779 --- /dev/null +++ b/tools/testing/selftests/bpf/progs/res_spin_lock.c @@ -0,0 +1,143 @@ +// SPDX-License-Identifier: GPL-2.0 +/* Copyright (c) 2024 Meta Platforms, Inc. and affiliates. 
*/ +#include +#include +#include +#include "bpf_misc.h" + +#define EDEADLK 35 +#define ETIMEDOUT 110 + +struct arr_elem { + struct bpf_res_spin_lock lock; +}; + +struct { + __uint(type, BPF_MAP_TYPE_ARRAY); + __uint(max_entries, 64); + __type(key, int); + __type(value, struct arr_elem); +} arrmap SEC(".maps"); + +struct bpf_res_spin_lock lockA __hidden SEC(".data.A"); +struct bpf_res_spin_lock lockB __hidden SEC(".data.B"); + +SEC("tc") +int res_spin_lock_test(struct __sk_buff *ctx) +{ + struct arr_elem *elem1, *elem2; + int r; + + elem1 = bpf_map_lookup_elem(&arrmap, &(int){0}); + if (!elem1) + return -1; + elem2 = bpf_map_lookup_elem(&arrmap, &(int){0}); + if (!elem2) + return -1; + + r = bpf_res_spin_lock(&elem1->lock); + if (r) + return r; + if (!bpf_res_spin_lock(&elem2->lock)) { + bpf_res_spin_unlock(&elem2->lock); + bpf_res_spin_unlock(&elem1->lock); + return -1; + } + bpf_res_spin_unlock(&elem1->lock); + return 0; +} + +SEC("tc") +int res_spin_lock_test_AB(struct __sk_buff *ctx) +{ + int r; + + r = bpf_res_spin_lock(&lockA); + if (r) + return !r; + /* Only unlock if we took the lock. */ + if (!bpf_res_spin_lock(&lockB)) + bpf_res_spin_unlock(&lockB); + bpf_res_spin_unlock(&lockA); + return 0; +} + +int err; + +SEC("tc") +int res_spin_lock_test_BA(struct __sk_buff *ctx) +{ + int r; + + r = bpf_res_spin_lock(&lockB); + if (r) + return !r; + if (!bpf_res_spin_lock(&lockA)) + bpf_res_spin_unlock(&lockA); + else + err = -EDEADLK; + bpf_res_spin_unlock(&lockB); + return err ?: 0; +} + +SEC("tc") +int res_spin_lock_test_held_lock_max(struct __sk_buff *ctx) +{ + struct bpf_res_spin_lock *locks[48] = {}; + struct arr_elem *e; + u64 time_beg, time; + int ret = 0, i; + + _Static_assert(ARRAY_SIZE(((struct rqspinlock_held){}).locks) == 31, + "RES_NR_HELD assumed to be 31"); + + for (i = 0; i < 34; i++) { + int key = i; + + /* We cannot pass in i as it will get spilled/filled by the compiler and + * loses bounds in verifier state. + */ + e = bpf_map_lookup_elem(&arrmap, &key); + if (!e) + return 1; + locks[i] = &e->lock; + } + + for (; i < 48; i++) { + int key = i - 2; + + /* We cannot pass in i as it will get spilled/filled by the compiler and + * loses bounds in verifier state. + */ + e = bpf_map_lookup_elem(&arrmap, &key); + if (!e) + return 1; + locks[i] = &e->lock; + } + + time_beg = bpf_ktime_get_ns(); + for (i = 0; i < 34; i++) { + if (bpf_res_spin_lock(locks[i])) + goto end; + } + + /* Trigger AA, after exhausting entries in the held lock table. This + * time, only the timeout can save us, as AA detection won't succeed. + */ + if (!bpf_res_spin_lock(locks[34])) { + bpf_res_spin_unlock(locks[34]); + ret = 1; + goto end; + } + +end: + for (i = i - 1; i >= 0; i--) + bpf_res_spin_unlock(locks[i]); + time = bpf_ktime_get_ns() - time_beg; + /* Time spent should be easily above our limit (1/4 s), since AA + * detection won't be expedited due to lack of held lock entry. + */ + return ret ?: (time > 1000000000 / 4 ? 0 : 1); +} + +char _license[] SEC("license") = "GPL"; diff --git a/tools/testing/selftests/bpf/progs/res_spin_lock_fail.c b/tools/testing/selftests/bpf/progs/res_spin_lock_fail.c new file mode 100644 index 000000000000..3222e9283c78 --- /dev/null +++ b/tools/testing/selftests/bpf/progs/res_spin_lock_fail.c @@ -0,0 +1,244 @@ +// SPDX-License-Identifier: GPL-2.0 +/* Copyright (c) 2024 Meta Platforms, Inc. and affiliates. 
*/ +#include +#include +#include +#include +#include "bpf_misc.h" +#include "bpf_experimental.h" + +struct arr_elem { + struct bpf_res_spin_lock lock; +}; + +struct { + __uint(type, BPF_MAP_TYPE_ARRAY); + __uint(max_entries, 1); + __type(key, int); + __type(value, struct arr_elem); +} arrmap SEC(".maps"); + +long value; + +struct bpf_spin_lock lock __hidden SEC(".data.A"); +struct bpf_res_spin_lock res_lock __hidden SEC(".data.B"); + +SEC("?tc") +__failure __msg("point to map value or allocated object") +int res_spin_lock_arg(struct __sk_buff *ctx) +{ + struct arr_elem *elem; + + elem = bpf_map_lookup_elem(&arrmap, &(int){0}); + if (!elem) + return 0; + bpf_res_spin_lock((struct bpf_res_spin_lock *)bpf_core_cast(&elem->lock, struct __sk_buff)); + bpf_res_spin_lock(&elem->lock); + return 0; +} + +SEC("?tc") +__failure __msg("AA deadlock detected") +int res_spin_lock_AA(struct __sk_buff *ctx) +{ + struct arr_elem *elem; + + elem = bpf_map_lookup_elem(&arrmap, &(int){0}); + if (!elem) + return 0; + bpf_res_spin_lock(&elem->lock); + bpf_res_spin_lock(&elem->lock); + return 0; +} + +SEC("?tc") +__failure __msg("AA deadlock detected") +int res_spin_lock_cond_AA(struct __sk_buff *ctx) +{ + struct arr_elem *elem; + + elem = bpf_map_lookup_elem(&arrmap, &(int){0}); + if (!elem) + return 0; + if (bpf_res_spin_lock(&elem->lock)) + return 0; + bpf_res_spin_lock(&elem->lock); + return 0; +} + +SEC("?tc") +__failure __msg("unlock of different lock") +int res_spin_lock_mismatch_1(struct __sk_buff *ctx) +{ + struct arr_elem *elem; + + elem = bpf_map_lookup_elem(&arrmap, &(int){0}); + if (!elem) + return 0; + if (bpf_res_spin_lock(&elem->lock)) + return 0; + bpf_res_spin_unlock(&res_lock); + return 0; +} + +SEC("?tc") +__failure __msg("unlock of different lock") +int res_spin_lock_mismatch_2(struct __sk_buff *ctx) +{ + struct arr_elem *elem; + + elem = bpf_map_lookup_elem(&arrmap, &(int){0}); + if (!elem) + return 0; + if (bpf_res_spin_lock(&res_lock)) + return 0; + bpf_res_spin_unlock(&elem->lock); + return 0; +} + +SEC("?tc") +__failure __msg("unlock of different lock") +int res_spin_lock_irq_mismatch_1(struct __sk_buff *ctx) +{ + struct arr_elem *elem; + unsigned long f1; + + elem = bpf_map_lookup_elem(&arrmap, &(int){0}); + if (!elem) + return 0; + bpf_local_irq_save(&f1); + if (bpf_res_spin_lock(&res_lock)) + return 0; + bpf_res_spin_unlock_irqrestore(&res_lock, &f1); + return 0; +} + +SEC("?tc") +__failure __msg("unlock of different lock") +int res_spin_lock_irq_mismatch_2(struct __sk_buff *ctx) +{ + struct arr_elem *elem; + unsigned long f1; + + elem = bpf_map_lookup_elem(&arrmap, &(int){0}); + if (!elem) + return 0; + if (bpf_res_spin_lock_irqsave(&res_lock, &f1)) + return 0; + bpf_res_spin_unlock(&res_lock); + return 0; +} + +SEC("?tc") +__success +int res_spin_lock_ooo(struct __sk_buff *ctx) +{ + struct arr_elem *elem; + + elem = bpf_map_lookup_elem(&arrmap, &(int){0}); + if (!elem) + return 0; + if (bpf_res_spin_lock(&res_lock)) + return 0; + if (bpf_res_spin_lock(&elem->lock)) { + bpf_res_spin_unlock(&res_lock); + return 0; + } + bpf_res_spin_unlock(&elem->lock); + bpf_res_spin_unlock(&res_lock); + return 0; +} + +SEC("?tc") +__success +int res_spin_lock_ooo_irq(struct __sk_buff *ctx) +{ + struct arr_elem *elem; + unsigned long f1, f2; + + elem = bpf_map_lookup_elem(&arrmap, &(int){0}); + if (!elem) + return 0; + if (bpf_res_spin_lock_irqsave(&res_lock, &f1)) + return 0; + if (bpf_res_spin_lock_irqsave(&elem->lock, &f2)) { + bpf_res_spin_unlock_irqrestore(&res_lock, &f1); + /* We won't have a 
unreleased IRQ flag error here. */ + return 0; + } + bpf_res_spin_unlock_irqrestore(&elem->lock, &f2); + bpf_res_spin_unlock_irqrestore(&res_lock, &f1); + return 0; +} + +struct bpf_res_spin_lock lock1 __hidden SEC(".data.OO1"); +struct bpf_res_spin_lock lock2 __hidden SEC(".data.OO2"); + +SEC("?tc") +__failure __msg("bpf_res_spin_unlock cannot be out of order") +int res_spin_lock_ooo_unlock(struct __sk_buff *ctx) +{ + if (bpf_res_spin_lock(&lock1)) + return 0; + if (bpf_res_spin_lock(&lock2)) { + bpf_res_spin_unlock(&lock1); + return 0; + } + bpf_res_spin_unlock(&lock1); + bpf_res_spin_unlock(&lock2); + return 0; +} + +SEC("?tc") +__failure __msg("off 1 doesn't point to 'struct bpf_res_spin_lock' that is at 0") +int res_spin_lock_bad_off(struct __sk_buff *ctx) +{ + struct arr_elem *elem; + + elem = bpf_map_lookup_elem(&arrmap, &(int){0}); + if (!elem) + return 0; + bpf_res_spin_lock((void *)&elem->lock + 1); + return 0; +} + +SEC("?tc") +__failure __msg("R1 doesn't have constant offset. bpf_res_spin_lock has to be at the constant offset") +int res_spin_lock_var_off(struct __sk_buff *ctx) +{ + struct arr_elem *elem; + u64 val = value; + + elem = bpf_map_lookup_elem(&arrmap, &(int){0}); + if (!elem) { + // FIXME: Only inline assembly use in assert macro doesn't emit + // BTF definition. + bpf_throw(0); + return 0; + } + bpf_assert_range(val, 0, 40); + bpf_res_spin_lock((void *)&value + val); + return 0; +} + +SEC("?tc") +__failure __msg("map 'res_spin.bss' has no valid bpf_res_spin_lock") +int res_spin_lock_no_lock_map(struct __sk_buff *ctx) +{ + bpf_res_spin_lock((void *)&value + 1); + return 0; +} + +SEC("?tc") +__failure __msg("local 'kptr' has no valid bpf_res_spin_lock") +int res_spin_lock_no_lock_kptr(struct __sk_buff *ctx) +{ + struct { int i; } *p = bpf_obj_new(typeof(*p)); + + if (!p) + return 0; + bpf_res_spin_lock((void *)p); + return 0; +} + +char _license[] SEC("license") = "GPL";
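
To round off the out-of-order cases above: the verifier (as of the previous patch) requires that the most recently acquired resilient lock be released first, which is why res_spin_lock_ooo_unlock is rejected. A minimal sketch of the accepted ordering, reusing the lock1/lock2 globals declared in the file above (illustrative only):

SEC("?tc")
int res_spin_lock_nested_ok(struct __sk_buff *ctx)
{
	if (bpf_res_spin_lock(&lock1))
		return 0;
	if (bpf_res_spin_lock(&lock2)) {
		/* lock2 was never taken, so lock1 is still the most
		 * recently acquired lock and may be released here.
		 */
		bpf_res_spin_unlock(&lock1);
		return 0;
	}
	/* Release in reverse order of acquisition: lock2, then lock1. */
	bpf_res_spin_unlock(&lock2);
	bpf_res_spin_unlock(&lock1);
	return 0;
}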