From patchwork Mon Mar 3 15:22:41 2025
From: Kumar Kartikeya Dwivedi
Subject: [PATCH bpf-next v3 01/25] locking: Move MCS struct definition to public header
Date: Mon, 3 Mar 2025 07:22:41 -0800
Message-ID: <20250303152305.3195648-2-memxor@gmail.com>
In-Reply-To: <20250303152305.3195648-1-memxor@gmail.com>

Move the definition of struct mcs_spinlock from the private
mcs_spinlock.h header in kernel/locking to the asm-generic
mcs_spinlock.h header, since subsequent commits will need to reference
it from the qspinlock.h header.
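For context, a minimal MCS-style lock in userspace C11 atomics (an
illustrative sketch assuming the classic MCS algorithm, not the kernel
implementation) looks like the following; the only per-waiter state it
needs is exactly what struct mcs_spinlock exposes: a next pointer and a
locked flag.

	/* Illustrative sketch only: a classic MCS queue lock in C11 atomics. */
	#include <stdatomic.h>
	#include <stddef.h>

	struct mcs_node {                       /* conceptually mirrors struct mcs_spinlock */
		struct mcs_node *_Atomic next;
		atomic_int locked;              /* set to 1 when the lock is handed to us */
	};

	static struct mcs_node *_Atomic mcs_tail;

	static void mcs_lock(struct mcs_node *node)
	{
		struct mcs_node *prev;

		atomic_store_explicit(&node->next, NULL, memory_order_relaxed);
		atomic_store_explicit(&node->locked, 0, memory_order_relaxed);

		/* Swap ourselves in as the new tail of the wait queue. */
		prev = atomic_exchange(&mcs_tail, node);
		if (!prev)
			return;                 /* queue was empty: lock acquired */

		/* Publish our node to the predecessor, then spin on our own flag. */
		atomic_store_explicit(&prev->next, node, memory_order_release);
		while (!atomic_load_explicit(&node->locked, memory_order_acquire))
			;
	}

	static void mcs_unlock(struct mcs_node *node)
	{
		struct mcs_node *next = atomic_load_explicit(&node->next, memory_order_acquire);

		if (!next) {
			struct mcs_node *expected = node;

			/* No successor visible: try to mark the queue empty. */
			if (atomic_compare_exchange_strong(&mcs_tail, &expected, NULL))
				return;
			/* A successor is enqueueing; wait for it to link itself. */
			while (!(next = atomic_load_explicit(&node->next, memory_order_acquire)))
				;
		}
		/* Hand the lock over by setting the successor's locked flag. */
		atomic_store_explicit(&next->locked, 1, memory_order_release);
	}

The kernel's qspinlock builds on this queue while compressing the tail
and hand-off state into a 32-bit lock word, which is why later patches
need the structure visible from qspinlock.h.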
Reviewed-by: Barret Rhoden
Signed-off-by: Kumar Kartikeya Dwivedi
---
 include/asm-generic/mcs_spinlock.h | 6 ++++++
 kernel/locking/mcs_spinlock.h      | 6 ------
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/include/asm-generic/mcs_spinlock.h b/include/asm-generic/mcs_spinlock.h
index 10cd4ffc6ba2..39c94012b88a 100644
--- a/include/asm-generic/mcs_spinlock.h
+++ b/include/asm-generic/mcs_spinlock.h
@@ -1,6 +1,12 @@
 #ifndef __ASM_MCS_SPINLOCK_H
 #define __ASM_MCS_SPINLOCK_H
 
+struct mcs_spinlock {
+	struct mcs_spinlock *next;
+	int locked; /* 1 if lock acquired */
+	int count;  /* nesting count, see qspinlock.c */
+};
+
 /*
  * Architectures can define their own:
  *
diff --git a/kernel/locking/mcs_spinlock.h b/kernel/locking/mcs_spinlock.h
index 85251d8771d9..16160ca8907f 100644
--- a/kernel/locking/mcs_spinlock.h
+++ b/kernel/locking/mcs_spinlock.h
@@ -15,12 +15,6 @@
 
 #include <asm/mcs_spinlock.h>
 
-struct mcs_spinlock {
-	struct mcs_spinlock *next;
-	int locked; /* 1 if lock acquired */
-	int count;  /* nesting count, see qspinlock.c */
-};
-
 #ifndef arch_mcs_spin_lock_contended
 /*
  * Using smp_cond_load_acquire() provides the acquire semantics

From patchwork Mon Mar 3 15:22:42 2025
From: Kumar Kartikeya Dwivedi
Subject: [PATCH bpf-next v3 02/25] locking: Move common qspinlock helpers to a private header
Date: Mon, 3 Mar 2025 07:22:42 -0800
Message-ID: <20250303152305.3195648-3-memxor@gmail.com>
In-Reply-To: <20250303152305.3195648-1-memxor@gmail.com>

Move the qspinlock helper functions that encode and decode the tail
word, set and clear the pending and locked bits, and other miscellaneous
definitions and macros into a private header. To this end, create a
qspinlock.h header file in kernel/locking. Subsequent commits will
introduce a modified qspinlock slow path function, so moving the shared
code into a private header helps minimize unnecessary code duplication.
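Among the helpers being moved are encode_tail() and decode_tail(). As a
quick illustration, the standalone userspace program below shows how the
tail packs a (cpu + 1, nesting index) pair into the upper bits of the
32-bit lock word; the bit offsets here are assumptions for the common
8-bit-pending layout, and the authoritative values live in the qspinlock
headers.

	#include <assert.h>
	#include <stdint.h>
	#include <stdio.h>

	/* Assumed layout: 8-bit locked byte, 8-bit pending byte, 2-bit idx, 14-bit cpu. */
	#define Q_TAIL_IDX_OFFSET	16
	#define Q_TAIL_IDX_MASK		(0x3U << Q_TAIL_IDX_OFFSET)
	#define Q_TAIL_CPU_OFFSET	18

	static uint32_t encode_tail(int cpu, int idx)
	{
		/* cpu is stored as cpu + 1 so "no tail" (0) differs from cpu 0, idx 0 */
		return ((uint32_t)(cpu + 1) << Q_TAIL_CPU_OFFSET) |
		       ((uint32_t)idx << Q_TAIL_IDX_OFFSET);
	}

	static void decode_tail(uint32_t tail, int *cpu, int *idx)
	{
		*cpu = (int)(tail >> Q_TAIL_CPU_OFFSET) - 1;
		*idx = (int)((tail & Q_TAIL_IDX_MASK) >> Q_TAIL_IDX_OFFSET);
	}

	int main(void)
	{
		int cpu, idx;
		uint32_t tail = encode_tail(5, 2);

		decode_tail(tail, &cpu, &idx);
		assert(cpu == 5 && idx == 2);
		printf("tail=0x%08x -> cpu=%d idx=%d\n", tail, cpu, idx);
		return 0;
	}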
McKenney" , Tejun Heo , Josh Don , Dohyun Kim , linux-arm-kernel@lists.infradead.org, kkd@meta.com, kernel-team@meta.com Subject: [PATCH bpf-next v3 02/25] locking: Move common qspinlock helpers to a private header Date: Mon, 3 Mar 2025 07:22:42 -0800 Message-ID: <20250303152305.3195648-3-memxor@gmail.com> X-Mailer: git-send-email 2.43.5 In-Reply-To: <20250303152305.3195648-1-memxor@gmail.com> References: <20250303152305.3195648-1-memxor@gmail.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=13562; h=from:subject; bh=eKJ3qxGBtRJg8l1rvSHjaQtIHqaeLQcTbgGFSWnocak=; b=owEBbQKS/ZANAwAIAUzgyIZIvxHKAcsmYgBnxcWVkVfZqT1tCitfFNFTby5Hz/Q0Ls5KtoFEDTCL cZHH7UOJAjMEAAEIAB0WIQRLvip+Buz51YI8YRFM4MiGSL8RygUCZ8XFlQAKCRBM4MiGSL8RypysEA C3leDFcx+MonrV13UNP/a1KQjzZBifx/rS8N1isQIHzlulO71sBTAbTJDuZM6yq/0bU/qUs8hwXLq/ w8imvbbOhV0sg2v0XcsUu+Qqaji1dlaEl8+Yzo1SsBEL0NAFxDJtE2OgpBRNEITRSoUsbzwYGVJItt ptZni9OCm9cqgf2GOBiN2XXULRyhnYmdK/eUHMcHrEQvOy2rOdJNpf2dSxW8CcYd49oZaMAqF319OI 19D1ra8czLwFo1Y3MCrEQuT73bn0YZIwfZYdmwgoihZWMjvvZ6rNwRLzpDN5XJLhZLUZEK6pUn5Xpe GcTDtEFLciWRXU6QhN2gu0ZAm9XKtRaR4Hp5yOy3qf2tkMxt5L5ad1BnhyPm8drcY8LQPc/zB5fuIX t029FoAw1yTCn//bVcO6KrFEyGMCMgZ8/UtZpPrcIB70+R1GFk0CB/u0iyKykO1ee5ke5aVFFPZMsb KGBfAxmEcDwbj/LlC/q7xnfODH0cObC073WRxVzQMIOOBYdONvuAD5d1+YlWUOSAUNWrB8olMJS67I sIUraNIODLQ7Rw8YCl2Ekg1SRQA0pvBono+BLof7ZvXRM/L1GUVhc8kNk6XpOToWnQf5XX5OU35vVG 7APJ8S46OuAh06qCXM9teIlkPaNqlA0TdWqHhyfcftt9TvzHsXib5eD7UN6A== X-Developer-Key: i=memxor@gmail.com; a=openpgp; fpr=4BBE2A7E06ECF9D5823C61114CE0C88648BF11CA X-Patchwork-Delegate: bpf@iogearbox.net Move qspinlock helper functions that encode, decode tail word, set and clear the pending and locked bits, and other miscellaneous definitions and macros to a private header. To this end, create a qspinlock.h header file in kernel/locking. Subsequent commits will introduce a modified qspinlock slow path function, thus moving shared code to a private header will help minimize unnecessary code duplication. Reviewed-by: Barret Rhoden Signed-off-by: Kumar Kartikeya Dwivedi --- kernel/locking/qspinlock.c | 193 +---------------------------------- kernel/locking/qspinlock.h | 200 +++++++++++++++++++++++++++++++++++++ 2 files changed, 205 insertions(+), 188 deletions(-) create mode 100644 kernel/locking/qspinlock.h diff --git a/kernel/locking/qspinlock.c b/kernel/locking/qspinlock.c index 7d96bed718e4..af8d122bb649 100644 --- a/kernel/locking/qspinlock.c +++ b/kernel/locking/qspinlock.c @@ -25,8 +25,9 @@ #include /* - * Include queued spinlock statistics code + * Include queued spinlock definitions and statistics code */ +#include "qspinlock.h" #include "qspinlock_stat.h" /* @@ -67,36 +68,6 @@ */ #include "mcs_spinlock.h" -#define MAX_NODES 4 - -/* - * On 64-bit architectures, the mcs_spinlock structure will be 16 bytes in - * size and four of them will fit nicely in one 64-byte cacheline. For - * pvqspinlock, however, we need more space for extra data. To accommodate - * that, we insert two more long words to pad it up to 32 bytes. IOW, only - * two of them can fit in a cacheline in this case. That is OK as it is rare - * to have more than 2 levels of slowpath nesting in actual use. We don't - * want to penalize pvqspinlocks to optimize for a rare case in native - * qspinlocks. - */ -struct qnode { - struct mcs_spinlock mcs; -#ifdef CONFIG_PARAVIRT_SPINLOCKS - long reserved[2]; -#endif -}; - -/* - * The pending bit spinning loop count. 
- * This heuristic is used to limit the number of lockword accesses - * made by atomic_cond_read_relaxed when waiting for the lock to - * transition out of the "== _Q_PENDING_VAL" state. We don't spin - * indefinitely because there's no guarantee that we'll make forward - * progress. - */ -#ifndef _Q_PENDING_LOOPS -#define _Q_PENDING_LOOPS 1 -#endif /* * Per-CPU queue node structures; we can never have more than 4 nested @@ -106,161 +77,7 @@ struct qnode { * * PV doubles the storage and uses the second cacheline for PV state. */ -static DEFINE_PER_CPU_ALIGNED(struct qnode, qnodes[MAX_NODES]); - -/* - * We must be able to distinguish between no-tail and the tail at 0:0, - * therefore increment the cpu number by one. - */ - -static inline __pure u32 encode_tail(int cpu, int idx) -{ - u32 tail; - - tail = (cpu + 1) << _Q_TAIL_CPU_OFFSET; - tail |= idx << _Q_TAIL_IDX_OFFSET; /* assume < 4 */ - - return tail; -} - -static inline __pure struct mcs_spinlock *decode_tail(u32 tail) -{ - int cpu = (tail >> _Q_TAIL_CPU_OFFSET) - 1; - int idx = (tail & _Q_TAIL_IDX_MASK) >> _Q_TAIL_IDX_OFFSET; - - return per_cpu_ptr(&qnodes[idx].mcs, cpu); -} - -static inline __pure -struct mcs_spinlock *grab_mcs_node(struct mcs_spinlock *base, int idx) -{ - return &((struct qnode *)base + idx)->mcs; -} - -#define _Q_LOCKED_PENDING_MASK (_Q_LOCKED_MASK | _Q_PENDING_MASK) - -#if _Q_PENDING_BITS == 8 -/** - * clear_pending - clear the pending bit. - * @lock: Pointer to queued spinlock structure - * - * *,1,* -> *,0,* - */ -static __always_inline void clear_pending(struct qspinlock *lock) -{ - WRITE_ONCE(lock->pending, 0); -} - -/** - * clear_pending_set_locked - take ownership and clear the pending bit. - * @lock: Pointer to queued spinlock structure - * - * *,1,0 -> *,0,1 - * - * Lock stealing is not allowed if this function is used. - */ -static __always_inline void clear_pending_set_locked(struct qspinlock *lock) -{ - WRITE_ONCE(lock->locked_pending, _Q_LOCKED_VAL); -} - -/* - * xchg_tail - Put in the new queue tail code word & retrieve previous one - * @lock : Pointer to queued spinlock structure - * @tail : The new queue tail code word - * Return: The previous queue tail code word - * - * xchg(lock, tail), which heads an address dependency - * - * p,*,* -> n,*,* ; prev = xchg(lock, node) - */ -static __always_inline u32 xchg_tail(struct qspinlock *lock, u32 tail) -{ - /* - * We can use relaxed semantics since the caller ensures that the - * MCS node is properly initialized before updating the tail. - */ - return (u32)xchg_relaxed(&lock->tail, - tail >> _Q_TAIL_OFFSET) << _Q_TAIL_OFFSET; -} - -#else /* _Q_PENDING_BITS == 8 */ - -/** - * clear_pending - clear the pending bit. - * @lock: Pointer to queued spinlock structure - * - * *,1,* -> *,0,* - */ -static __always_inline void clear_pending(struct qspinlock *lock) -{ - atomic_andnot(_Q_PENDING_VAL, &lock->val); -} - -/** - * clear_pending_set_locked - take ownership and clear the pending bit. 
- * @lock: Pointer to queued spinlock structure - * - * *,1,0 -> *,0,1 - */ -static __always_inline void clear_pending_set_locked(struct qspinlock *lock) -{ - atomic_add(-_Q_PENDING_VAL + _Q_LOCKED_VAL, &lock->val); -} - -/** - * xchg_tail - Put in the new queue tail code word & retrieve previous one - * @lock : Pointer to queued spinlock structure - * @tail : The new queue tail code word - * Return: The previous queue tail code word - * - * xchg(lock, tail) - * - * p,*,* -> n,*,* ; prev = xchg(lock, node) - */ -static __always_inline u32 xchg_tail(struct qspinlock *lock, u32 tail) -{ - u32 old, new; - - old = atomic_read(&lock->val); - do { - new = (old & _Q_LOCKED_PENDING_MASK) | tail; - /* - * We can use relaxed semantics since the caller ensures that - * the MCS node is properly initialized before updating the - * tail. - */ - } while (!atomic_try_cmpxchg_relaxed(&lock->val, &old, new)); - - return old; -} -#endif /* _Q_PENDING_BITS == 8 */ - -/** - * queued_fetch_set_pending_acquire - fetch the whole lock value and set pending - * @lock : Pointer to queued spinlock structure - * Return: The previous lock value - * - * *,*,* -> *,1,* - */ -#ifndef queued_fetch_set_pending_acquire -static __always_inline u32 queued_fetch_set_pending_acquire(struct qspinlock *lock) -{ - return atomic_fetch_or_acquire(_Q_PENDING_VAL, &lock->val); -} -#endif - -/** - * set_locked - Set the lock bit and own the lock - * @lock: Pointer to queued spinlock structure - * - * *,*,0 -> *,0,1 - */ -static __always_inline void set_locked(struct qspinlock *lock) -{ - WRITE_ONCE(lock->locked, _Q_LOCKED_VAL); -} - +static DEFINE_PER_CPU_ALIGNED(struct qnode, qnodes[_Q_MAX_NODES]); /* * Generate the native code for queued_spin_unlock_slowpath(); provide NOPs for @@ -410,7 +227,7 @@ void __lockfunc queued_spin_lock_slowpath(struct qspinlock *lock, u32 val) * any MCS node. This is not the most elegant solution, but is * simple enough. */ - if (unlikely(idx >= MAX_NODES)) { + if (unlikely(idx >= _Q_MAX_NODES)) { lockevent_inc(lock_no_node); while (!queued_spin_trylock(lock)) cpu_relax(); @@ -465,7 +282,7 @@ void __lockfunc queued_spin_lock_slowpath(struct qspinlock *lock, u32 val) * head of the waitqueue. */ if (old & _Q_TAIL_MASK) { - prev = decode_tail(old); + prev = decode_tail(old, qnodes); /* Link @node into the waitqueue. */ WRITE_ONCE(prev->next, node); diff --git a/kernel/locking/qspinlock.h b/kernel/locking/qspinlock.h new file mode 100644 index 000000000000..d4ceb9490365 --- /dev/null +++ b/kernel/locking/qspinlock.h @@ -0,0 +1,200 @@ +/* SPDX-License-Identifier: GPL-2.0-or-later */ +/* + * Queued spinlock defines + * + * This file contains macro definitions and functions shared between different + * qspinlock slow path implementations. + */ +#ifndef __LINUX_QSPINLOCK_H +#define __LINUX_QSPINLOCK_H + +#include +#include +#include +#include + +#define _Q_MAX_NODES 4 + +/* + * The pending bit spinning loop count. + * This heuristic is used to limit the number of lockword accesses + * made by atomic_cond_read_relaxed when waiting for the lock to + * transition out of the "== _Q_PENDING_VAL" state. We don't spin + * indefinitely because there's no guarantee that we'll make forward + * progress. + */ +#ifndef _Q_PENDING_LOOPS +#define _Q_PENDING_LOOPS 1 +#endif + +/* + * On 64-bit architectures, the mcs_spinlock structure will be 16 bytes in + * size and four of them will fit nicely in one 64-byte cacheline. For + * pvqspinlock, however, we need more space for extra data. 
To accommodate + * that, we insert two more long words to pad it up to 32 bytes. IOW, only + * two of them can fit in a cacheline in this case. That is OK as it is rare + * to have more than 2 levels of slowpath nesting in actual use. We don't + * want to penalize pvqspinlocks to optimize for a rare case in native + * qspinlocks. + */ +struct qnode { + struct mcs_spinlock mcs; +#ifdef CONFIG_PARAVIRT_SPINLOCKS + long reserved[2]; +#endif +}; + +/* + * We must be able to distinguish between no-tail and the tail at 0:0, + * therefore increment the cpu number by one. + */ + +static inline __pure u32 encode_tail(int cpu, int idx) +{ + u32 tail; + + tail = (cpu + 1) << _Q_TAIL_CPU_OFFSET; + tail |= idx << _Q_TAIL_IDX_OFFSET; /* assume < 4 */ + + return tail; +} + +static inline __pure struct mcs_spinlock *decode_tail(u32 tail, struct qnode *qnodes) +{ + int cpu = (tail >> _Q_TAIL_CPU_OFFSET) - 1; + int idx = (tail & _Q_TAIL_IDX_MASK) >> _Q_TAIL_IDX_OFFSET; + + return per_cpu_ptr(&qnodes[idx].mcs, cpu); +} + +static inline __pure +struct mcs_spinlock *grab_mcs_node(struct mcs_spinlock *base, int idx) +{ + return &((struct qnode *)base + idx)->mcs; +} + +#define _Q_LOCKED_PENDING_MASK (_Q_LOCKED_MASK | _Q_PENDING_MASK) + +#if _Q_PENDING_BITS == 8 +/** + * clear_pending - clear the pending bit. + * @lock: Pointer to queued spinlock structure + * + * *,1,* -> *,0,* + */ +static __always_inline void clear_pending(struct qspinlock *lock) +{ + WRITE_ONCE(lock->pending, 0); +} + +/** + * clear_pending_set_locked - take ownership and clear the pending bit. + * @lock: Pointer to queued spinlock structure + * + * *,1,0 -> *,0,1 + * + * Lock stealing is not allowed if this function is used. + */ +static __always_inline void clear_pending_set_locked(struct qspinlock *lock) +{ + WRITE_ONCE(lock->locked_pending, _Q_LOCKED_VAL); +} + +/* + * xchg_tail - Put in the new queue tail code word & retrieve previous one + * @lock : Pointer to queued spinlock structure + * @tail : The new queue tail code word + * Return: The previous queue tail code word + * + * xchg(lock, tail), which heads an address dependency + * + * p,*,* -> n,*,* ; prev = xchg(lock, node) + */ +static __always_inline u32 xchg_tail(struct qspinlock *lock, u32 tail) +{ + /* + * We can use relaxed semantics since the caller ensures that the + * MCS node is properly initialized before updating the tail. + */ + return (u32)xchg_relaxed(&lock->tail, + tail >> _Q_TAIL_OFFSET) << _Q_TAIL_OFFSET; +} + +#else /* _Q_PENDING_BITS == 8 */ + +/** + * clear_pending - clear the pending bit. + * @lock: Pointer to queued spinlock structure + * + * *,1,* -> *,0,* + */ +static __always_inline void clear_pending(struct qspinlock *lock) +{ + atomic_andnot(_Q_PENDING_VAL, &lock->val); +} + +/** + * clear_pending_set_locked - take ownership and clear the pending bit. 
+ * @lock: Pointer to queued spinlock structure + * + * *,1,0 -> *,0,1 + */ +static __always_inline void clear_pending_set_locked(struct qspinlock *lock) +{ + atomic_add(-_Q_PENDING_VAL + _Q_LOCKED_VAL, &lock->val); +} + +/** + * xchg_tail - Put in the new queue tail code word & retrieve previous one + * @lock : Pointer to queued spinlock structure + * @tail : The new queue tail code word + * Return: The previous queue tail code word + * + * xchg(lock, tail) + * + * p,*,* -> n,*,* ; prev = xchg(lock, node) + */ +static __always_inline u32 xchg_tail(struct qspinlock *lock, u32 tail) +{ + u32 old, new; + + old = atomic_read(&lock->val); + do { + new = (old & _Q_LOCKED_PENDING_MASK) | tail; + /* + * We can use relaxed semantics since the caller ensures that + * the MCS node is properly initialized before updating the + * tail. + */ + } while (!atomic_try_cmpxchg_relaxed(&lock->val, &old, new)); + + return old; +} +#endif /* _Q_PENDING_BITS == 8 */ + +/** + * queued_fetch_set_pending_acquire - fetch the whole lock value and set pending + * @lock : Pointer to queued spinlock structure + * Return: The previous lock value + * + * *,*,* -> *,1,* + */ +#ifndef queued_fetch_set_pending_acquire +static __always_inline u32 queued_fetch_set_pending_acquire(struct qspinlock *lock) +{ + return atomic_fetch_or_acquire(_Q_PENDING_VAL, &lock->val); +} +#endif + +/** + * set_locked - Set the lock bit and own the lock + * @lock: Pointer to queued spinlock structure + * + * *,*,0 -> *,0,1 + */ +static __always_inline void set_locked(struct qspinlock *lock) +{ + WRITE_ONCE(lock->locked, _Q_LOCKED_VAL); +} + +#endif /* __LINUX_QSPINLOCK_H */ From patchwork Mon Mar 3 15:22:43 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Kumar Kartikeya Dwivedi X-Patchwork-Id: 13999021 X-Patchwork-Delegate: bpf@iogearbox.net Received: from mail-wr1-f67.google.com (mail-wr1-f67.google.com [209.85.221.67]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 116B1214A9F; Mon, 3 Mar 2025 15:23:15 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.221.67 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1741015397; cv=none; b=XsuF3zjpYwC8DWuv9B6/dM8NQZxj3fET3S9mwP63jI4OYnjn1EE2GZCZEgjCo4EKM00IHlrA07nn26/qUrMGoZcXfqcXhtlTVZp6x9HXv1+9EuugxD8d6SSJ4COe4exkK9CLDgH6n/8nHVqhdRlFHFPDJQLIPqTmNXVrAcaOwwo= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1741015397; c=relaxed/simple; bh=WHhXMqIdalfkSexlY5e1BRqspbIYdrDmDQT3rX3AKP8=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=XMsz5NTwCHgjjcHc5kj1vRhLV3xaxQNKs4pk6jdBSSSIgUFQonICox/nC/cEPANi6Re8Td5I7oje3HKLWPWvknYL4HE7FZdoBtpnJbaNj5pbM0X89JbiBCGom1fHO65FrskiNOmQWz9YB/LrcrfAJ9BDOIcHuFyzu05OiA2A4LM= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=FF+3ClHU; arc=none smtp.client-ip=209.85.221.67 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="FF+3ClHU" Received: by 
From: Kumar Kartikeya Dwivedi
Subject: [PATCH bpf-next v3 03/25] locking: Allow obtaining result of arch_mcs_spin_lock_contended
Date: Mon, 3 Mar 2025 07:22:43 -0800
Message-ID: <20250303152305.3195648-4-memxor@gmail.com>
In-Reply-To: <20250303152305.3195648-1-memxor@gmail.com>

To support upcoming changes that need to inspect the value observed when
the conditional waiting loop in arch_mcs_spin_lock_contended terminates,
modify the macro to preserve the result of smp_cond_load_acquire. This
lets callers check the return value as needed, which will help
disambiguate the MCS node's locked state in future patches.
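The practical difference is easiest to see outside the kernel. In the
compilable stand-in below (the helper and the stored value are
hypothetical; only the macro shape mirrors the patch), an
expression-style macro lets the waiter capture whatever value terminated
the wait loop, which a do { } while (0) statement macro cannot.

	#include <stdatomic.h>
	#include <stdio.h>

	/* Stand-in for smp_cond_load_acquire(l, VAL): spin until *l is non-zero, return it. */
	static int cond_load_acquire(atomic_int *l)
	{
		int v;

		while (!(v = atomic_load_explicit(l, memory_order_acquire)))
			;
		return v;
	}

	/* After the patch the macro expands to the load expression itself. */
	#define mcs_spin_lock_contended(l)	cond_load_acquire(l)

	int main(void)
	{
		atomic_int locked;
		int val;

		atomic_init(&locked, 2);	/* pretend a waker stored a non-default value */

		val = mcs_spin_lock_contended(&locked);
		printf("wait loop ended with locked == %d\n", val);
		return 0;
	}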
McKenney" , Tejun Heo , Josh Don , Dohyun Kim , linux-arm-kernel@lists.infradead.org, kkd@meta.com, kernel-team@meta.com Subject: [PATCH bpf-next v3 03/25] locking: Allow obtaining result of arch_mcs_spin_lock_contended Date: Mon, 3 Mar 2025 07:22:43 -0800 Message-ID: <20250303152305.3195648-4-memxor@gmail.com> X-Mailer: git-send-email 2.43.5 In-Reply-To: <20250303152305.3195648-1-memxor@gmail.com> References: <20250303152305.3195648-1-memxor@gmail.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=1052; h=from:subject; bh=WHhXMqIdalfkSexlY5e1BRqspbIYdrDmDQT3rX3AKP8=; b=owEBbQKS/ZANAwAIAUzgyIZIvxHKAcsmYgBnxcWVvzlsIk8Mh+hFnelZKKgCgtqU9iOBLKbPXk+b Hr7ixKqJAjMEAAEIAB0WIQRLvip+Buz51YI8YRFM4MiGSL8RygUCZ8XFlQAKCRBM4MiGSL8Ryow+EA C1eFvCJZOUlVvCcUeprQTPLwv72Q6ykc0vZYM6C30cB2mh8RDYCi22FekqJDDuIBP3ggLIEkBn++iL HayE8E7tcJD+XUme2bPMlycWLuboqWEkhYbEUmsxEiHDMF0tIGq3CXtTX/EFmdPiqhrfYjK2/U0WMS NOVAt9VBBJ2Gkr/Ahg/bKs76BmL83Wf4QvULiFNYpGucOQLCWNYYcOf3+zvSDaQnE1PLiL4lwcEWgJ BWhLal7zLGBO37rSCp8pXZxAjZbOmlIMd7El/c3QBiJ/AUbfRk2SI2aoPiCG7z+vImKhFG9QUUxpHv FXE64wiHpCMTGwLOtsovyQ7+cDSbyd9myY4DYnoOFearxpOGJS6oDKVpKL+YUs0/lrw0EOGtKl8onS nwiFb0cJg40JCw+qXxsA3WdiI5uIK9cMKMra5mIr7h6U3ffX7Tt02zhl0EZujgfUYOKA9Jtck4Hd1x mkS1YmJ/dZEejK5aHgZ5LEBPw9S9BHwJj17/s5GOZti6Ra8ymdC+LLaxSZUDhaBNfI2G2cQJ70ISBx MVd2vcRakzCnM/Eax/CdmqnGby9zEaPOamB6GxfsXaYvqAw9KSqQjVShNIdzUnP/UWD4l5rcaZ50ZC /DEcBbUVqWlsEJY2xKVzRGMI5YAnPpqIb2pZParSp7t5oTMcgQWd2IuyPnAA== X-Developer-Key: i=memxor@gmail.com; a=openpgp; fpr=4BBE2A7E06ECF9D5823C61114CE0C88648BF11CA X-Patchwork-Delegate: bpf@iogearbox.net To support upcoming changes that require inspecting the return value once the conditional waiting loop in arch_mcs_spin_lock_contended terminates, modify the macro to preserve the result of smp_cond_load_acquire. This enables checking the return value as needed, which will help disambiguate the MCS node’s locked state in future patches. Reviewed-by: Barret Rhoden Signed-off-by: Kumar Kartikeya Dwivedi --- kernel/locking/mcs_spinlock.h | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/kernel/locking/mcs_spinlock.h b/kernel/locking/mcs_spinlock.h index 16160ca8907f..5c92ba199b90 100644 --- a/kernel/locking/mcs_spinlock.h +++ b/kernel/locking/mcs_spinlock.h @@ -24,9 +24,7 @@ * spinning, and smp_cond_load_acquire() provides that behavior. 
From patchwork Mon Mar 3 15:22:44 2025
From: Kumar Kartikeya Dwivedi
Subject: [PATCH bpf-next v3 04/25] locking: Copy out qspinlock.c to rqspinlock.c
Date: Mon, 3 Mar 2025 07:22:44 -0800
Message-ID: <20250303152305.3195648-5-memxor@gmail.com>
In-Reply-To: <20250303152305.3195648-1-memxor@gmail.com>

In preparation for introducing a new lock implementation, the Resilient
Queued Spin Lock (rqspinlock), begin by using the existing qspinlock.c
code as the base. Simply copy the code to a new file and rename functions
and variables from 'queued' to 'resilient_queued'. This helps each
subsequent commit clearly show how and where the code is being changed.
The only change after a literal copy in this commit is renaming the functions where necessary, and rename qnodes to rqnodes. Reviewed-by: Barret Rhoden Signed-off-by: Kumar Kartikeya Dwivedi --- kernel/locking/rqspinlock.c | 410 ++++++++++++++++++++++++++++++++++++ 1 file changed, 410 insertions(+) create mode 100644 kernel/locking/rqspinlock.c diff --git a/kernel/locking/rqspinlock.c b/kernel/locking/rqspinlock.c new file mode 100644 index 000000000000..143d9dda36f9 --- /dev/null +++ b/kernel/locking/rqspinlock.c @@ -0,0 +1,410 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* + * Resilient Queued Spin Lock + * + * (C) Copyright 2013-2015 Hewlett-Packard Development Company, L.P. + * (C) Copyright 2013-2014,2018 Red Hat, Inc. + * (C) Copyright 2015 Intel Corp. + * (C) Copyright 2015 Hewlett-Packard Enterprise Development LP + * + * Authors: Waiman Long + * Peter Zijlstra + */ + +#ifndef _GEN_PV_LOCK_SLOWPATH + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +/* + * Include queued spinlock definitions and statistics code + */ +#include "qspinlock.h" +#include "qspinlock_stat.h" + +/* + * The basic principle of a queue-based spinlock can best be understood + * by studying a classic queue-based spinlock implementation called the + * MCS lock. A copy of the original MCS lock paper ("Algorithms for Scalable + * Synchronization on Shared-Memory Multiprocessors by Mellor-Crummey and + * Scott") is available at + * + * https://bugzilla.kernel.org/show_bug.cgi?id=206115 + * + * This queued spinlock implementation is based on the MCS lock, however to + * make it fit the 4 bytes we assume spinlock_t to be, and preserve its + * existing API, we must modify it somehow. + * + * In particular; where the traditional MCS lock consists of a tail pointer + * (8 bytes) and needs the next pointer (another 8 bytes) of its own node to + * unlock the next pending (next->locked), we compress both these: {tail, + * next->locked} into a single u32 value. + * + * Since a spinlock disables recursion of its own context and there is a limit + * to the contexts that can nest; namely: task, softirq, hardirq, nmi. As there + * are at most 4 nesting levels, it can be encoded by a 2-bit number. Now + * we can encode the tail by combining the 2-bit nesting level with the cpu + * number. With one byte for the lock value and 3 bytes for the tail, only a + * 32-bit word is now needed. Even though we only need 1 bit for the lock, + * we extend it to a full byte to achieve better performance for architectures + * that support atomic byte write. + * + * We also change the first spinner to spin on the lock bit instead of its + * node; whereby avoiding the need to carry a node from lock to unlock, and + * preserving existing lock API. This also makes the unlock code simpler and + * faster. + * + * N.B. The current implementation only supports architectures that allow + * atomic operations on smaller 8-bit and 16-bit data types. + * + */ + +#include "mcs_spinlock.h" + +/* + * Per-CPU queue node structures; we can never have more than 4 nested + * contexts: task, softirq, hardirq, nmi. + * + * Exactly fits one 64-byte cacheline on a 64-bit architecture. + * + * PV doubles the storage and uses the second cacheline for PV state. + */ +static DEFINE_PER_CPU_ALIGNED(struct qnode, rqnodes[_Q_MAX_NODES]); + +/* + * Generate the native code for resilient_queued_spin_unlock_slowpath(); provide NOPs + * for all the PV callbacks. 
+ */ + +static __always_inline void __pv_init_node(struct mcs_spinlock *node) { } +static __always_inline void __pv_wait_node(struct mcs_spinlock *node, + struct mcs_spinlock *prev) { } +static __always_inline void __pv_kick_node(struct qspinlock *lock, + struct mcs_spinlock *node) { } +static __always_inline u32 __pv_wait_head_or_lock(struct qspinlock *lock, + struct mcs_spinlock *node) + { return 0; } + +#define pv_enabled() false + +#define pv_init_node __pv_init_node +#define pv_wait_node __pv_wait_node +#define pv_kick_node __pv_kick_node +#define pv_wait_head_or_lock __pv_wait_head_or_lock + +#ifdef CONFIG_PARAVIRT_SPINLOCKS +#define resilient_queued_spin_lock_slowpath native_resilient_queued_spin_lock_slowpath +#endif + +#endif /* _GEN_PV_LOCK_SLOWPATH */ + +/** + * resilient_queued_spin_lock_slowpath - acquire the queued spinlock + * @lock: Pointer to queued spinlock structure + * @val: Current value of the queued spinlock 32-bit word + * + * (queue tail, pending bit, lock value) + * + * fast : slow : unlock + * : : + * uncontended (0,0,0) -:--> (0,0,1) ------------------------------:--> (*,*,0) + * : | ^--------.------. / : + * : v \ \ | : + * pending : (0,1,1) +--> (0,1,0) \ | : + * : | ^--' | | : + * : v | | : + * uncontended : (n,x,y) +--> (n,0,0) --' | : + * queue : | ^--' | : + * : v | : + * contended : (*,x,y) +--> (*,0,0) ---> (*,0,1) -' : + * queue : ^--' : + */ +void __lockfunc resilient_queued_spin_lock_slowpath(struct qspinlock *lock, u32 val) +{ + struct mcs_spinlock *prev, *next, *node; + u32 old, tail; + int idx; + + BUILD_BUG_ON(CONFIG_NR_CPUS >= (1U << _Q_TAIL_CPU_BITS)); + + if (pv_enabled()) + goto pv_queue; + + if (virt_spin_lock(lock)) + return; + + /* + * Wait for in-progress pending->locked hand-overs with a bounded + * number of spins so that we guarantee forward progress. + * + * 0,1,0 -> 0,0,1 + */ + if (val == _Q_PENDING_VAL) { + int cnt = _Q_PENDING_LOOPS; + val = atomic_cond_read_relaxed(&lock->val, + (VAL != _Q_PENDING_VAL) || !cnt--); + } + + /* + * If we observe any contention; queue. + */ + if (val & ~_Q_LOCKED_MASK) + goto queue; + + /* + * trylock || pending + * + * 0,0,* -> 0,1,* -> 0,0,1 pending, trylock + */ + val = queued_fetch_set_pending_acquire(lock); + + /* + * If we observe contention, there is a concurrent locker. + * + * Undo and queue; our setting of PENDING might have made the + * n,0,0 -> 0,0,0 transition fail and it will now be waiting + * on @next to become !NULL. + */ + if (unlikely(val & ~_Q_LOCKED_MASK)) { + + /* Undo PENDING if we set it. */ + if (!(val & _Q_PENDING_MASK)) + clear_pending(lock); + + goto queue; + } + + /* + * We're pending, wait for the owner to go away. + * + * 0,1,1 -> *,1,0 + * + * this wait loop must be a load-acquire such that we match the + * store-release that clears the locked bit and create lock + * sequentiality; this is because not all + * clear_pending_set_locked() implementations imply full + * barriers. + */ + if (val & _Q_LOCKED_MASK) + smp_cond_load_acquire(&lock->locked, !VAL); + + /* + * take ownership and clear the pending bit. + * + * 0,1,0 -> 0,0,1 + */ + clear_pending_set_locked(lock); + lockevent_inc(lock_pending); + return; + + /* + * End of pending bit optimistic spinning and beginning of MCS + * queuing. 
+ */ +queue: + lockevent_inc(lock_slowpath); +pv_queue: + node = this_cpu_ptr(&rqnodes[0].mcs); + idx = node->count++; + tail = encode_tail(smp_processor_id(), idx); + + trace_contention_begin(lock, LCB_F_SPIN); + + /* + * 4 nodes are allocated based on the assumption that there will + * not be nested NMIs taking spinlocks. That may not be true in + * some architectures even though the chance of needing more than + * 4 nodes will still be extremely unlikely. When that happens, + * we fall back to spinning on the lock directly without using + * any MCS node. This is not the most elegant solution, but is + * simple enough. + */ + if (unlikely(idx >= _Q_MAX_NODES)) { + lockevent_inc(lock_no_node); + while (!queued_spin_trylock(lock)) + cpu_relax(); + goto release; + } + + node = grab_mcs_node(node, idx); + + /* + * Keep counts of non-zero index values: + */ + lockevent_cond_inc(lock_use_node2 + idx - 1, idx); + + /* + * Ensure that we increment the head node->count before initialising + * the actual node. If the compiler is kind enough to reorder these + * stores, then an IRQ could overwrite our assignments. + */ + barrier(); + + node->locked = 0; + node->next = NULL; + pv_init_node(node); + + /* + * We touched a (possibly) cold cacheline in the per-cpu queue node; + * attempt the trylock once more in the hope someone let go while we + * weren't watching. + */ + if (queued_spin_trylock(lock)) + goto release; + + /* + * Ensure that the initialisation of @node is complete before we + * publish the updated tail via xchg_tail() and potentially link + * @node into the waitqueue via WRITE_ONCE(prev->next, node) below. + */ + smp_wmb(); + + /* + * Publish the updated tail. + * We have already touched the queueing cacheline; don't bother with + * pending stuff. + * + * p,*,* -> n,*,* + */ + old = xchg_tail(lock, tail); + next = NULL; + + /* + * if there was a previous node; link it and wait until reaching the + * head of the waitqueue. + */ + if (old & _Q_TAIL_MASK) { + prev = decode_tail(old, rqnodes); + + /* Link @node into the waitqueue. */ + WRITE_ONCE(prev->next, node); + + pv_wait_node(node, prev); + arch_mcs_spin_lock_contended(&node->locked); + + /* + * While waiting for the MCS lock, the next pointer may have + * been set by another lock waiter. We optimistically load + * the next pointer & prefetch the cacheline for writing + * to reduce latency in the upcoming MCS unlock operation. + */ + next = READ_ONCE(node->next); + if (next) + prefetchw(next); + } + + /* + * we're at the head of the waitqueue, wait for the owner & pending to + * go away. + * + * *,x,y -> *,0,0 + * + * this wait loop must use a load-acquire such that we match the + * store-release that clears the locked bit and create lock + * sequentiality; this is because the set_locked() function below + * does not imply a full barrier. + * + * The PV pv_wait_head_or_lock function, if active, will acquire + * the lock and return a non-zero value. So we have to skip the + * atomic_cond_read_acquire() call. As the next PV queue head hasn't + * been designated yet, there is no way for the locked value to become + * _Q_SLOW_VAL. So both the set_locked() and the + * atomic_cmpxchg_relaxed() calls will be safe. + * + * If PV isn't active, 0 will be returned instead. 
+ * + */ + if ((val = pv_wait_head_or_lock(lock, node))) + goto locked; + + val = atomic_cond_read_acquire(&lock->val, !(VAL & _Q_LOCKED_PENDING_MASK)); + +locked: + /* + * claim the lock: + * + * n,0,0 -> 0,0,1 : lock, uncontended + * *,*,0 -> *,*,1 : lock, contended + * + * If the queue head is the only one in the queue (lock value == tail) + * and nobody is pending, clear the tail code and grab the lock. + * Otherwise, we only need to grab the lock. + */ + + /* + * In the PV case we might already have _Q_LOCKED_VAL set, because + * of lock stealing; therefore we must also allow: + * + * n,0,1 -> 0,0,1 + * + * Note: at this point: (val & _Q_PENDING_MASK) == 0, because of the + * above wait condition, therefore any concurrent setting of + * PENDING will make the uncontended transition fail. + */ + if ((val & _Q_TAIL_MASK) == tail) { + if (atomic_try_cmpxchg_relaxed(&lock->val, &val, _Q_LOCKED_VAL)) + goto release; /* No contention */ + } + + /* + * Either somebody is queued behind us or _Q_PENDING_VAL got set + * which will then detect the remaining tail and queue behind us + * ensuring we'll see a @next. + */ + set_locked(lock); + + /* + * contended path; wait for next if not observed yet, release. + */ + if (!next) + next = smp_cond_load_relaxed(&node->next, (VAL)); + + arch_mcs_spin_unlock_contended(&next->locked); + pv_kick_node(lock, next); + +release: + trace_contention_end(lock, 0); + + /* + * release the node + */ + __this_cpu_dec(rqnodes[0].mcs.count); +} +EXPORT_SYMBOL(resilient_queued_spin_lock_slowpath); + +/* + * Generate the paravirt code for resilient_queued_spin_unlock_slowpath(). + */ +#if !defined(_GEN_PV_LOCK_SLOWPATH) && defined(CONFIG_PARAVIRT_SPINLOCKS) +#define _GEN_PV_LOCK_SLOWPATH + +#undef pv_enabled +#define pv_enabled() true + +#undef pv_init_node +#undef pv_wait_node +#undef pv_kick_node +#undef pv_wait_head_or_lock + +#undef resilient_queued_spin_lock_slowpath +#define resilient_queued_spin_lock_slowpath __pv_resilient_queued_spin_lock_slowpath + +#include "qspinlock_paravirt.h" +#include "rqspinlock.c" + +bool nopvspin; +static __init int parse_nopvspin(char *arg) +{ + nopvspin = true; + return 0; +} +early_param("nopvspin", parse_nopvspin); +#endif From patchwork Mon Mar 3 15:22:45 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kumar Kartikeya Dwivedi X-Patchwork-Id: 13999023 X-Patchwork-Delegate: bpf@iogearbox.net Received: from mail-wm1-f67.google.com (mail-wm1-f67.google.com [209.85.128.67]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id C61F621577E; Mon, 3 Mar 2025 15:23:18 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.128.67 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1741015400; cv=none; b=Oyl9jJ3xhlfk2frrOUnZggBdpZd3mpbgQFOveUFc7gtQDEyX/e91v8hNT/0HqjRtEvl4crxPoXejyuGmyzNzmvcLWUYNqbOem3iLKLBejg5gjDNKski8c3bgiuMzs6VXt02KKwrIgCyTU4c7JpHeHO40qPkow6avxQk105DW1TQ= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1741015400; c=relaxed/simple; bh=C0ciCNeduyvvQFAO0DwlqwtcddwN6zNDgdODlI5CgNw=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=V1VIvNd2CFJev7WdfDLJsRPDaBpZOxY4Tk43Un5+zGQTc8Yk/ui1Y85kvFzhnnlmMUFp7WA0u7u5D0kke25u6ZYPx0me2NPHd3NduMJrDjnjdTwnYbSEIc7+0QJVbyDfXLLeFwwHG2UZ3uN8qxFFI/hHZ0vOrwmFWiM5KrzgNwY= ARC-Authentication-Results: i=1; 
From: Kumar Kartikeya Dwivedi
Subject: [PATCH bpf-next v3 05/25] rqspinlock: Add rqspinlock.h header
Date: Mon, 3 Mar 2025 07:22:45 -0800
Message-ID: <20250303152305.3195648-6-memxor@gmail.com>
In-Reply-To: <20250303152305.3195648-1-memxor@gmail.com>

This header contains the public declarations usable in the rest of the
kernel for rqspinlock. Also type-alias struct qspinlock to rqspinlock_t
to ensure consistent use of the new lock type. Later patches need to
remove the dependence on the qspinlock type in order to provide a
test-and-set fallback, so begin abstracting it away now.
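To make the aliasing concrete, here is a small compilable userspace
sketch (not the kernel header; the struct body, stub slowpath, and
bucket_lock user are placeholders): callers only ever name rqspinlock_t,
so the backing type can later be swapped, e.g. for the test-and-set
fallback mentioned above, without touching any call site.

	#include <stdio.h>

	/* Stand-in for the kernel's struct qspinlock; only the aliasing pattern matters. */
	struct qspinlock { unsigned int val; };

	/* What the new header provides: an opaque alias plus the slowpath declaration. */
	typedef struct qspinlock rqspinlock_t;

	static void resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, unsigned int val)
	{
		lock->val = val;	/* placeholder body for the sketch */
	}

	static rqspinlock_t bucket_lock;	/* hypothetical user written against rqspinlock_t */

	int main(void)
	{
		resilient_queued_spin_lock_slowpath(&bucket_lock, 1);
		printf("lock word is now %u\n", bucket_lock.val);
		return 0;
	}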
McKenney" , Tejun Heo , Josh Don , Dohyun Kim , linux-arm-kernel@lists.infradead.org, kkd@meta.com, kernel-team@meta.com Subject: [PATCH bpf-next v3 05/25] rqspinlock: Add rqspinlock.h header Date: Mon, 3 Mar 2025 07:22:45 -0800 Message-ID: <20250303152305.3195648-6-memxor@gmail.com> X-Mailer: git-send-email 2.43.5 In-Reply-To: <20250303152305.3195648-1-memxor@gmail.com> References: <20250303152305.3195648-1-memxor@gmail.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=2297; h=from:subject; bh=C0ciCNeduyvvQFAO0DwlqwtcddwN6zNDgdODlI5CgNw=; b=owEBbQKS/ZANAwAIAUzgyIZIvxHKAcsmYgBnxcWWmeJIFZGQqQJ9zuXTndNKetm3e9meOOzs99f5 56FqOzGJAjMEAAEIAB0WIQRLvip+Buz51YI8YRFM4MiGSL8RygUCZ8XFlgAKCRBM4MiGSL8RyoR+D/ 0VwsuuvPBO31kFGb0t61y+N4gbFPQVPDJBrXCTRuRgO0fOw5WCQWwcpin+e1kMbSgYKAZzVwfLS3fl USxgEUjPhUFvS788CF15UqpS6n/tM0irZYhiD/4t5EWU0g0ioKfKj6gj6yBwqsSMBTsIRoIbfdAtpW A1sQkzl5Q8TKau41XUFL5/fRetjHPVeJCUKabg5SUIG5I9iY6ZjqHLHvD+LFUnOo8bFfHwkb1uWXh3 hFp6LuX4Ip5Q0OVOMs+ec1b0966SfNEsHIk3+Z8XY4f0eqnk7Ez83Zc/Hjz7csGn6uh8bcb6eG7ZYC +/A6aSKaqeZbUC4ssfb84IHdHW7hVWFUk4czo8NMacOAr3Qy1tlY/IIaFasBI7NmhQiZgLc8EPbgHi 7TMVD7hKBCP+NHjd0dPY2vu1dxsGHJ7glfhHL4X9QrkiqZi5/Mkl3n7BQmfN/AKLVD7Azctv4JBglx VxQbreb1HGCpiW2R+/1KbXrT472iuSH/6kefuuYYh1hQbdVrvkEdNJfBXlKOJBdRGO9hA4i5b/tbBn 1CQPYg9cYRVAZ147OeK3geUwV37j9ZjgLrI0+k1TkwXMQtUgriZg5GETC7ZgO+AO2BGZkTI4gFTPr6 Jo2JmjrdfCTWQ/LMn3yeXbbAnc4me8NdPXySZtWjaqKhp7V4WbNvPwwg8qPw== X-Developer-Key: i=memxor@gmail.com; a=openpgp; fpr=4BBE2A7E06ECF9D5823C61114CE0C88648BF11CA X-Patchwork-Delegate: bpf@iogearbox.net This header contains the public declarations usable in the rest of the kernel for rqspinlock. Let's also type alias qspinlock to rqspinlock_t to ensure consistent use of the new lock type. We want to remove dependence on the qspinlock type in later patches as we need to provide a test-and-set fallback, hence begin abstracting away from now onwards. Reviewed-by: Barret Rhoden Signed-off-by: Kumar Kartikeya Dwivedi --- include/asm-generic/rqspinlock.h | 19 +++++++++++++++++++ kernel/locking/rqspinlock.c | 3 ++- 2 files changed, 21 insertions(+), 1 deletion(-) create mode 100644 include/asm-generic/rqspinlock.h diff --git a/include/asm-generic/rqspinlock.h b/include/asm-generic/rqspinlock.h new file mode 100644 index 000000000000..54860b519571 --- /dev/null +++ b/include/asm-generic/rqspinlock.h @@ -0,0 +1,19 @@ +/* SPDX-License-Identifier: GPL-2.0-or-later */ +/* + * Resilient Queued Spin Lock + * + * (C) Copyright 2024 Meta Platforms, Inc. and affiliates. 
+ *
+ * Authors: Kumar Kartikeya Dwivedi
+ */
+#ifndef __ASM_GENERIC_RQSPINLOCK_H
+#define __ASM_GENERIC_RQSPINLOCK_H
+
+#include
+
+struct qspinlock;
+typedef struct qspinlock rqspinlock_t;
+
+extern void resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val);
+
+#endif /* __ASM_GENERIC_RQSPINLOCK_H */
diff --git a/kernel/locking/rqspinlock.c b/kernel/locking/rqspinlock.c
index 143d9dda36f9..414a3ec8cf70 100644
--- a/kernel/locking/rqspinlock.c
+++ b/kernel/locking/rqspinlock.c
@@ -23,6 +23,7 @@
 #include
 #include
 #include
+#include
 
 /*
  * Include queued spinlock definitions and statistics code
@@ -127,7 +128,7 @@ static __always_inline u32 __pv_wait_head_or_lock(struct qspinlock *lock,
  *   contended : (*,x,y) +--> (*,0,0) ---> (*,0,1) -'  :
  *   queue     :         ^--'                          :
  */
-void __lockfunc resilient_queued_spin_lock_slowpath(struct qspinlock *lock, u32 val)
+void __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val)
 {
 	struct mcs_spinlock *prev, *next, *node;
 	u32 old, tail;

From patchwork Mon Mar 3 15:22:46 2025
X-Patchwork-Submitter: Kumar Kartikeya Dwivedi
X-Patchwork-Id: 13999024
X-Patchwork-Delegate: bpf@iogearbox.net
From: Kumar Kartikeya Dwivedi
To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Barret Rhoden, Linus Torvalds, Peter Zijlstra, Will Deacon, Waiman Long,
    Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, Martin KaFai Lau,
    Eduard Zingerman, "Paul E.
McKenney" , Tejun Heo , Josh Don , Dohyun Kim , linux-arm-kernel@lists.infradead.org, kkd@meta.com, kernel-team@meta.com Subject: [PATCH bpf-next v3 06/25] rqspinlock: Drop PV and virtualization support Date: Mon, 3 Mar 2025 07:22:46 -0800 Message-ID: <20250303152305.3195648-7-memxor@gmail.com> X-Mailer: git-send-email 2.43.5 In-Reply-To: <20250303152305.3195648-1-memxor@gmail.com> References: <20250303152305.3195648-1-memxor@gmail.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=6795; h=from:subject; bh=x1Z+pB9xhs3i58kf9Kac9b92frWP4j0eHxNENfgFcNU=; b=owEBbQKS/ZANAwAIAUzgyIZIvxHKAcsmYgBnxcWWAhlIoArRUBi9+QCUAqMzXrVAKF0JtHZdQgoX fPFJU92JAjMEAAEIAB0WIQRLvip+Buz51YI8YRFM4MiGSL8RygUCZ8XFlgAKCRBM4MiGSL8RygwXEA DAmIu9BajEdITHz1mKPuYwgeMXa7iRtPHxfqCpPmzf/TH+bsToJCV02MBZgALM6kWCm/5rAUSraLhw BnMrlVK/RcAw8Kxjwu1xmv4nZtV3SYbVUGx9WVMII5Oeyew86x2PDffmVG2n1obhHHjW3irYV8YsdE Yp1hHfinMz7/BSq/yV3BC5t/Xnfyeqm/J8YZY2QWNJBC4EKexPFCjmswgCSAaSFCcxhApWXXJzknMU MFRWSRvdZ44k9m+DhbcggcjpFSytSET3RxyTw4yuiGnXVqdQBBg0JBdMGJz/TatR2aEVIb7cJrYFAg j53Yifny9409xdRc2gl46en9AmpDm3/WgCsa0MG4u2UZyRfxyJFucVeD8D37S0ybhHJqnoRd8Idhi7 QmQz4h/wP2rzNMb9gJ5XgohE5pZ2V72I2JHpPuL7p4uQ5QTqHm+xQyirFjDGGmfxNDIuH/1w//0thM DL6ypO4I4NPgIyNpKI0IVWdOcJ4sbxDX2seupxCRksDYYMwlRTYnEPgbMhqobCiAvBuEuRRvINLHkc gdFHigGRIq+ZNPJc74y2xJvhNd4dehMSg8IlVcZDLDmYHHHUCDiHbmlnJLWJ+hw/5iLNlSJtmRadiM JCiKxpddvVm6tqQTMDIdqvcgLcDn7u2gAg21mmdVh/8SvCqLxQsKxzk73jWA== X-Developer-Key: i=memxor@gmail.com; a=openpgp; fpr=4BBE2A7E06ECF9D5823C61114CE0C88648BF11CA X-Patchwork-Delegate: bpf@iogearbox.net Changes to rqspinlock in subsequent commits will be algorithmic modifications, which won't remain in agreement with the implementations of paravirt spin lock and virt_spin_lock support. These future changes include measures for terminating waiting loops in slow path after a certain point. While using a fair lock like qspinlock directly inside virtual machines leads to suboptimal performance under certain conditions, we cannot use the existing virtualization support before we make it resilient as well. Therefore, drop it for now. Note that we need to drop qspinlock_stat.h, as it's only relevant in case of CONFIG_PARAVIRT_SPINLOCKS=y, but we need to keep lock_events.h in the includes, which was indirectly pulled in before. Reviewed-by: Barret Rhoden Signed-off-by: Kumar Kartikeya Dwivedi --- kernel/locking/rqspinlock.c | 91 +------------------------------------ 1 file changed, 1 insertion(+), 90 deletions(-) diff --git a/kernel/locking/rqspinlock.c b/kernel/locking/rqspinlock.c index 414a3ec8cf70..98cdcc5f1784 100644 --- a/kernel/locking/rqspinlock.c +++ b/kernel/locking/rqspinlock.c @@ -11,8 +11,6 @@ * Peter Zijlstra */ -#ifndef _GEN_PV_LOCK_SLOWPATH - #include #include #include @@ -29,7 +27,7 @@ * Include queued spinlock definitions and statistics code */ #include "qspinlock.h" -#include "qspinlock_stat.h" +#include "lock_events.h" /* * The basic principle of a queue-based spinlock can best be understood @@ -75,38 +73,9 @@ * contexts: task, softirq, hardirq, nmi. * * Exactly fits one 64-byte cacheline on a 64-bit architecture. - * - * PV doubles the storage and uses the second cacheline for PV state. */ static DEFINE_PER_CPU_ALIGNED(struct qnode, rqnodes[_Q_MAX_NODES]); -/* - * Generate the native code for resilient_queued_spin_unlock_slowpath(); provide NOPs - * for all the PV callbacks. 
- */ - -static __always_inline void __pv_init_node(struct mcs_spinlock *node) { } -static __always_inline void __pv_wait_node(struct mcs_spinlock *node, - struct mcs_spinlock *prev) { } -static __always_inline void __pv_kick_node(struct qspinlock *lock, - struct mcs_spinlock *node) { } -static __always_inline u32 __pv_wait_head_or_lock(struct qspinlock *lock, - struct mcs_spinlock *node) - { return 0; } - -#define pv_enabled() false - -#define pv_init_node __pv_init_node -#define pv_wait_node __pv_wait_node -#define pv_kick_node __pv_kick_node -#define pv_wait_head_or_lock __pv_wait_head_or_lock - -#ifdef CONFIG_PARAVIRT_SPINLOCKS -#define resilient_queued_spin_lock_slowpath native_resilient_queued_spin_lock_slowpath -#endif - -#endif /* _GEN_PV_LOCK_SLOWPATH */ - /** * resilient_queued_spin_lock_slowpath - acquire the queued spinlock * @lock: Pointer to queued spinlock structure @@ -136,12 +105,6 @@ void __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val) BUILD_BUG_ON(CONFIG_NR_CPUS >= (1U << _Q_TAIL_CPU_BITS)); - if (pv_enabled()) - goto pv_queue; - - if (virt_spin_lock(lock)) - return; - /* * Wait for in-progress pending->locked hand-overs with a bounded * number of spins so that we guarantee forward progress. @@ -212,7 +175,6 @@ void __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val) */ queue: lockevent_inc(lock_slowpath); -pv_queue: node = this_cpu_ptr(&rqnodes[0].mcs); idx = node->count++; tail = encode_tail(smp_processor_id(), idx); @@ -251,7 +213,6 @@ void __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val) node->locked = 0; node->next = NULL; - pv_init_node(node); /* * We touched a (possibly) cold cacheline in the per-cpu queue node; @@ -288,7 +249,6 @@ void __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val) /* Link @node into the waitqueue. */ WRITE_ONCE(prev->next, node); - pv_wait_node(node, prev); arch_mcs_spin_lock_contended(&node->locked); /* @@ -312,23 +272,9 @@ void __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val) * store-release that clears the locked bit and create lock * sequentiality; this is because the set_locked() function below * does not imply a full barrier. - * - * The PV pv_wait_head_or_lock function, if active, will acquire - * the lock and return a non-zero value. So we have to skip the - * atomic_cond_read_acquire() call. As the next PV queue head hasn't - * been designated yet, there is no way for the locked value to become - * _Q_SLOW_VAL. So both the set_locked() and the - * atomic_cmpxchg_relaxed() calls will be safe. - * - * If PV isn't active, 0 will be returned instead. - * */ - if ((val = pv_wait_head_or_lock(lock, node))) - goto locked; - val = atomic_cond_read_acquire(&lock->val, !(VAL & _Q_LOCKED_PENDING_MASK)); -locked: /* * claim the lock: * @@ -341,11 +287,6 @@ void __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val) */ /* - * In the PV case we might already have _Q_LOCKED_VAL set, because - * of lock stealing; therefore we must also allow: - * - * n,0,1 -> 0,0,1 - * * Note: at this point: (val & _Q_PENDING_MASK) == 0, because of the * above wait condition, therefore any concurrent setting of * PENDING will make the uncontended transition fail. 
@@ -369,7 +310,6 @@ void __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val)
 		next = smp_cond_load_relaxed(&node->next, (VAL));
 
 	arch_mcs_spin_unlock_contended(&next->locked);
-	pv_kick_node(lock, next);
 
 release:
 	trace_contention_end(lock, 0);
@@ -380,32 +320,3 @@ void __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val)
 	__this_cpu_dec(rqnodes[0].mcs.count);
 }
 EXPORT_SYMBOL(resilient_queued_spin_lock_slowpath);
-
-/*
- * Generate the paravirt code for resilient_queued_spin_unlock_slowpath().
- */
-#if !defined(_GEN_PV_LOCK_SLOWPATH) && defined(CONFIG_PARAVIRT_SPINLOCKS)
-#define _GEN_PV_LOCK_SLOWPATH
-
-#undef pv_enabled
-#define pv_enabled()	true
-
-#undef pv_init_node
-#undef pv_wait_node
-#undef pv_kick_node
-#undef pv_wait_head_or_lock
-
-#undef resilient_queued_spin_lock_slowpath
-#define resilient_queued_spin_lock_slowpath	__pv_resilient_queued_spin_lock_slowpath
-
-#include "qspinlock_paravirt.h"
-#include "rqspinlock.c"
-
-bool nopvspin;
-static __init int parse_nopvspin(char *arg)
-{
-	nopvspin = true;
-	return 0;
-}
-early_param("nopvspin", parse_nopvspin);
-#endif

From patchwork Mon Mar 3 15:22:47 2025
X-Patchwork-Submitter: Kumar Kartikeya Dwivedi
X-Patchwork-Id: 13999025
X-Patchwork-Delegate: bpf@iogearbox.net
From: Kumar Kartikeya Dwivedi
To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Barret Rhoden, Linus Torvalds, Peter Zijlstra, Will Deacon, Waiman Long,
    Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, Martin KaFai Lau,
    Eduard Zingerman, "Paul E.
McKenney" , Tejun Heo , Josh Don , Dohyun Kim , linux-arm-kernel@lists.infradead.org, kkd@meta.com, kernel-team@meta.com Subject: [PATCH bpf-next v3 07/25] rqspinlock: Add support for timeouts Date: Mon, 3 Mar 2025 07:22:47 -0800 Message-ID: <20250303152305.3195648-8-memxor@gmail.com> X-Mailer: git-send-email 2.43.5 In-Reply-To: <20250303152305.3195648-1-memxor@gmail.com> References: <20250303152305.3195648-1-memxor@gmail.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=4618; h=from:subject; bh=yvqL/a4hlJcETEVluNNRDX4NLx5YZM2izb98JHL7vqQ=; b=owEBbQKS/ZANAwAIAUzgyIZIvxHKAcsmYgBnxcWW5YAHky9QnaXijfLpbPfmrcJbz9mGABaMN7Gq 0o42/62JAjMEAAEIAB0WIQRLvip+Buz51YI8YRFM4MiGSL8RygUCZ8XFlgAKCRBM4MiGSL8RyhFZD/ 98oVQe8PFJhIllEShh4nhwUHn8NyXfjHgZA8EYD/afmXUtbe6cebWH5AkecePI/ENfeIJGVq2k75qO bvWe4RtiqLDWBsJl0E7U+s2KeDZ0DSk9f2WEvtzxSmCp/7UvUNNCppI/KwpLK+0e7hAe+GARXNg60S gvFuMulaOvXckRa9tpI+Hr6QKTNcN2LRmRExBXLd9T3LEoZTDJmyajfWO+Rfz5YU0g5KANdcy5e0qn GSzgVdRNgcSQOk3Zbp4619EukeKa9f23Tg02x5DCOUpx6mZ5nLqck/vFZ9oU8Webx+07h2RX91o2Kv TN1pq4jrCnQVVKWpOXrlnqPje97GpGVOCULGt1HNSZjpiEHi1aaxH6G0Y04RUnW0tHWAoxe7UDJk0o fCSuU0DhZpfltTY7/WE+Vf0bJFidSjLAaFUBjfaGSjVpQiuT/NaM6RgcLOAfFZ+kZKor4iNdDigsyS uMikiOXPVSq5YpOAmNxGz++xsEFagK0j1pVs6TqaC0dd7hKb57Ww60taY11XX+2jebXLUb/Ujov7TM eHo97ARRBCdiHc7+5UtfXPPw64OraEi2LfbonjvDU0tqtoL2yij/h7RNtLLUQP0RCqFJfuaR1AezSv BHNMKZgxnulhR86H/4dzv/v/FYXTwu9j3CZEhgrEpd9sdWsCL1hOwLpR5Hcg== X-Developer-Key: i=memxor@gmail.com; a=openpgp; fpr=4BBE2A7E06ECF9D5823C61114CE0C88648BF11CA X-Patchwork-Delegate: bpf@iogearbox.net Introduce policy macro RES_CHECK_TIMEOUT which can be used to detect when the timeout has expired for the slow path to return an error. It depends on being passed two variables initialized to 0: ts, ret. The 'ts' parameter is of type rqspinlock_timeout. This macro resolves to the (ret) expression so that it can be used in statements like smp_cond_load_acquire to break the waiting loop condition. The 'spin' member is used to amortize the cost of checking time by dispatching to the implementation every 64k iterations. The 'timeout_end' member is used to keep track of the timestamp that denotes the end of the waiting period. The 'ret' parameter denotes the status of the timeout, and can be checked in the slow path to detect timeouts after waiting loops. The 'duration' member is used to store the timeout duration for each waiting loop. The default timeout value defined in the header (RES_DEF_TIMEOUT) is 0.25 seconds. This macro will be used as a condition for waiting loops in the slow path. Since each waiting loop applies a fresh timeout using the same rqspinlock_timeout, we add a new RES_RESET_TIMEOUT as well to ensure the values can be easily reinitialized to the default state. 
Reviewed-by: Barret Rhoden Signed-off-by: Kumar Kartikeya Dwivedi --- include/asm-generic/rqspinlock.h | 6 +++++ kernel/locking/rqspinlock.c | 45 ++++++++++++++++++++++++++++++++ 2 files changed, 51 insertions(+) diff --git a/include/asm-generic/rqspinlock.h b/include/asm-generic/rqspinlock.h index 54860b519571..96cea871fdd2 100644 --- a/include/asm-generic/rqspinlock.h +++ b/include/asm-generic/rqspinlock.h @@ -10,10 +10,16 @@ #define __ASM_GENERIC_RQSPINLOCK_H #include +#include struct qspinlock; typedef struct qspinlock rqspinlock_t; extern void resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val); +/* + * Default timeout for waiting loops is 0.25 seconds + */ +#define RES_DEF_TIMEOUT (NSEC_PER_SEC / 4) + #endif /* __ASM_GENERIC_RQSPINLOCK_H */ diff --git a/kernel/locking/rqspinlock.c b/kernel/locking/rqspinlock.c index 98cdcc5f1784..6b547f85fa95 100644 --- a/kernel/locking/rqspinlock.c +++ b/kernel/locking/rqspinlock.c @@ -6,9 +6,11 @@ * (C) Copyright 2013-2014,2018 Red Hat, Inc. * (C) Copyright 2015 Intel Corp. * (C) Copyright 2015 Hewlett-Packard Enterprise Development LP + * (C) Copyright 2024 Meta Platforms, Inc. and affiliates. * * Authors: Waiman Long * Peter Zijlstra + * Kumar Kartikeya Dwivedi */ #include @@ -22,6 +24,7 @@ #include #include #include +#include /* * Include queued spinlock definitions and statistics code @@ -68,6 +71,45 @@ #include "mcs_spinlock.h" +struct rqspinlock_timeout { + u64 timeout_end; + u64 duration; + u16 spin; +}; + +static noinline int check_timeout(struct rqspinlock_timeout *ts) +{ + u64 time = ktime_get_mono_fast_ns(); + + if (!ts->timeout_end) { + ts->timeout_end = time + ts->duration; + return 0; + } + + if (time > ts->timeout_end) + return -ETIMEDOUT; + + return 0; +} + +#define RES_CHECK_TIMEOUT(ts, ret) \ + ({ \ + if (!(ts).spin++) \ + (ret) = check_timeout(&(ts)); \ + (ret); \ + }) + +/* + * Initialize the 'spin' member. + */ +#define RES_INIT_TIMEOUT(ts) ({ (ts).spin = 1; }) + +/* + * We only need to reset 'timeout_end', 'spin' will just wrap around as necessary. + * Duration is defined for each spin attempt, so set it here. + */ +#define RES_RESET_TIMEOUT(ts, _duration) ({ (ts).timeout_end = 0; (ts).duration = _duration; }) + /* * Per-CPU queue node structures; we can never have more than 4 nested * contexts: task, softirq, hardirq, nmi. @@ -100,11 +142,14 @@ static DEFINE_PER_CPU_ALIGNED(struct qnode, rqnodes[_Q_MAX_NODES]); void __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val) { struct mcs_spinlock *prev, *next, *node; + struct rqspinlock_timeout ts; u32 old, tail; int idx; BUILD_BUG_ON(CONFIG_NR_CPUS >= (1U << _Q_TAIL_CPU_BITS)); + RES_INIT_TIMEOUT(ts); + /* * Wait for in-progress pending->locked hand-overs with a bounded * number of spins so that we guarantee forward progress. 
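To illustrate the design outside the kernel, here is a self-contained
user-space analogue of the amortized check above; it mirrors check_timeout's
lazy deadline capture and the 16-bit 'spin' counter, but the program, its
names and its constants are illustrative only, not kernel code.

#include <stdint.h>
#include <stdio.h>
#include <time.h>

struct timeout_state {
	uint64_t timeout_end;	/* 0 until the first check captures a deadline */
	uint64_t duration;	/* budget in nanoseconds */
	uint16_t spin;		/* wraps every 65536 calls, gating the clock read */
};

static uint64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ull + ts.tv_nsec;
}

static int check_timeout(struct timeout_state *ts)
{
	uint64_t time = now_ns();

	if (!ts->timeout_end) {		/* first check: arm the deadline */
		ts->timeout_end = time + ts->duration;
		return 0;
	}
	return time > ts->timeout_end ? -1 : 0;	/* -1 stands in for -ETIMEDOUT */
}

int main(void)
{
	struct timeout_state ts = { .duration = 250000000ull, .spin = 1 };
	unsigned long iters = 0;
	int ret = 0;

	/* Busy loop: only 1 in 65536 iterations pays for a clock read. */
	while (!ret) {
		iters++;
		if (!ts.spin++)
			ret = check_timeout(&ts);
	}
	printf("timed out after %lu iterations\n", iters);
	return 0;
}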
From patchwork Mon Mar 3 15:22:48 2025
X-Patchwork-Submitter: Kumar Kartikeya Dwivedi
X-Patchwork-Id: 13999026
X-Patchwork-Delegate: bpf@iogearbox.net
From: Kumar Kartikeya Dwivedi
To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Ankur Arora, Linus Torvalds, Peter Zijlstra, Will Deacon, Waiman Long,
    Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, Martin KaFai Lau,
    Eduard Zingerman, "Paul E. McKenney", Tejun Heo, Barret Rhoden, Josh Don,
    Dohyun Kim, linux-arm-kernel@lists.infradead.org, kkd@meta.com,
    kernel-team@meta.com
Subject: [PATCH bpf-next v3 08/25] rqspinlock: Hardcode cond_acquire loops for arm64
Date: Mon, 3 Mar 2025 07:22:48 -0800
Message-ID: <20250303152305.3195648-9-memxor@gmail.com>
In-Reply-To: <20250303152305.3195648-1-memxor@gmail.com>
References: <20250303152305.3195648-1-memxor@gmail.com>

Currently, for rqspinlock usage, the implementations of smp_cond_load_acquire
(and thus, atomic_cond_read_acquire) are susceptible to stalls on arm64,
because they do not guarantee that the conditional expression will be
repeatedly invoked if the address being loaded from is not written to by
other CPUs. When support for event-streams is absent (which unblocks stuck
WFE-based loops every ~100us), we may end up being stuck forever. This
causes a problem for us, as we need to repeatedly invoke RES_CHECK_TIMEOUT
in the spin loop to break out when the timeout expires.
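To make the fallback shape concrete before the import below, here is a
self-contained user-space sketch of a spin-wait that re-evaluates its
condition on every iteration but samples the clock only every few hundred
spins, which is the structure the arm64 helper falls back to when the event
stream is unavailable. All names and constants are local to the sketch; it
is not kernel code.

#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

#define TIME_CHECK_COUNT 200	/* mirrors the spirit of smp_cond_time_check_count */

static uint64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ull + ts.tv_nsec;
}

/* Wait for *flag to become non-zero, or for the deadline to pass. */
static int spinwait_until(_Atomic int *flag, uint64_t time_limit_ns)
{
	unsigned int count = 0;

	for (;;) {
		if (atomic_load_explicit(flag, memory_order_acquire))
			return 0;		/* condition became true */
		if (count++ < TIME_CHECK_COUNT)
			continue;		/* amortize the clock read */
		if (now_ns() >= time_limit_ns)
			return -1;		/* timed out */
		count = 0;
	}
}

int main(void)
{
	_Atomic int flag = 0;	/* never set: forces the timeout path */

	if (spinwait_until(&flag, now_ns() + 100000000ull))
		puts("gave up after ~100ms without a store to the flag");
	return 0;
}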
Let us import the smp_cond_load_acquire_timewait implementation Ankur is proposing in [0], and then fallback to it once it is merged. While we rely on the implementation to amortize the cost of sampling check_timeout for us, it will not happen when event stream support is unavailable. This is not the common case, and it would be difficult to fit our logic in the time_expr_ns >= time_limit_ns comparison, hence just let it be. [0]: https://lore.kernel.org/lkml/20250203214911.898276-1-ankur.a.arora@oracle.com Cc: Ankur Arora Signed-off-by: Kumar Kartikeya Dwivedi --- arch/arm64/include/asm/rqspinlock.h | 93 +++++++++++++++++++++++++++++ kernel/locking/rqspinlock.c | 15 +++++ 2 files changed, 108 insertions(+) create mode 100644 arch/arm64/include/asm/rqspinlock.h diff --git a/arch/arm64/include/asm/rqspinlock.h b/arch/arm64/include/asm/rqspinlock.h new file mode 100644 index 000000000000..5b80785324b6 --- /dev/null +++ b/arch/arm64/include/asm/rqspinlock.h @@ -0,0 +1,93 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _ASM_RQSPINLOCK_H +#define _ASM_RQSPINLOCK_H + +#include + +/* + * Hardcode res_smp_cond_load_acquire implementations for arm64 to a custom + * version based on [0]. In rqspinlock code, our conditional expression involves + * checking the value _and_ additionally a timeout. However, on arm64, the + * WFE-based implementation may never spin again if no stores occur to the + * locked byte in the lock word. As such, we may be stuck forever if + * event-stream based unblocking is not available on the platform for WFE spin + * loops (arch_timer_evtstrm_available). + * + * Once support for smp_cond_load_acquire_timewait [0] lands, we can drop this + * copy-paste. + * + * While we rely on the implementation to amortize the cost of sampling + * cond_expr for us, it will not happen when event stream support is + * unavailable, time_expr check is amortized. This is not the common case, and + * it would be difficult to fit our logic in the time_expr_ns >= time_limit_ns + * comparison, hence just let it be. In case of event-stream, the loop is woken + * up at microsecond granularity. 
+ * + * [0]: https://lore.kernel.org/lkml/20250203214911.898276-1-ankur.a.arora@oracle.com + */ + +#ifndef smp_cond_load_acquire_timewait + +#define smp_cond_time_check_count 200 + +#define __smp_cond_load_relaxed_spinwait(ptr, cond_expr, time_expr_ns, \ + time_limit_ns) ({ \ + typeof(ptr) __PTR = (ptr); \ + __unqual_scalar_typeof(*ptr) VAL; \ + unsigned int __count = 0; \ + for (;;) { \ + VAL = READ_ONCE(*__PTR); \ + if (cond_expr) \ + break; \ + cpu_relax(); \ + if (__count++ < smp_cond_time_check_count) \ + continue; \ + if ((time_expr_ns) >= (time_limit_ns)) \ + break; \ + __count = 0; \ + } \ + (typeof(*ptr))VAL; \ +}) + +#define __smp_cond_load_acquire_timewait(ptr, cond_expr, \ + time_expr_ns, time_limit_ns) \ +({ \ + typeof(ptr) __PTR = (ptr); \ + __unqual_scalar_typeof(*ptr) VAL; \ + for (;;) { \ + VAL = smp_load_acquire(__PTR); \ + if (cond_expr) \ + break; \ + __cmpwait_relaxed(__PTR, VAL); \ + if ((time_expr_ns) >= (time_limit_ns)) \ + break; \ + } \ + (typeof(*ptr))VAL; \ +}) + +#define smp_cond_load_acquire_timewait(ptr, cond_expr, \ + time_expr_ns, time_limit_ns) \ +({ \ + __unqual_scalar_typeof(*ptr) _val; \ + int __wfe = arch_timer_evtstrm_available(); \ + \ + if (likely(__wfe)) { \ + _val = __smp_cond_load_acquire_timewait(ptr, cond_expr, \ + time_expr_ns, \ + time_limit_ns); \ + } else { \ + _val = __smp_cond_load_relaxed_spinwait(ptr, cond_expr, \ + time_expr_ns, \ + time_limit_ns); \ + smp_acquire__after_ctrl_dep(); \ + } \ + (typeof(*ptr))_val; \ +}) + +#endif + +#define res_smp_cond_load_acquire_timewait(v, c) smp_cond_load_acquire_timewait(v, c, 0, 1) + +#include + +#endif /* _ASM_RQSPINLOCK_H */ diff --git a/kernel/locking/rqspinlock.c b/kernel/locking/rqspinlock.c index 6b547f85fa95..efa937ea80d9 100644 --- a/kernel/locking/rqspinlock.c +++ b/kernel/locking/rqspinlock.c @@ -92,12 +92,21 @@ static noinline int check_timeout(struct rqspinlock_timeout *ts) return 0; } +/* + * Do not amortize with spins when res_smp_cond_load_acquire is defined, + * as the macro does internal amortization for us. + */ +#ifndef res_smp_cond_load_acquire #define RES_CHECK_TIMEOUT(ts, ret) \ ({ \ if (!(ts).spin++) \ (ret) = check_timeout(&(ts)); \ (ret); \ }) +#else +#define RES_CHECK_TIMEOUT(ts, ret, mask) \ + ({ (ret) = check_timeout(&(ts)); }) +#endif /* * Initialize the 'spin' member. 
@@ -118,6 +127,12 @@ static noinline int check_timeout(struct rqspinlock_timeout *ts)
  */
 static DEFINE_PER_CPU_ALIGNED(struct qnode, rqnodes[_Q_MAX_NODES]);
 
+#ifndef res_smp_cond_load_acquire
+#define res_smp_cond_load_acquire(v, c) smp_cond_load_acquire(v, c)
+#endif
+
+#define res_atomic_cond_read_acquire(v, c) res_smp_cond_load_acquire(&(v)->counter, (c))
+
 /**
  * resilient_queued_spin_lock_slowpath - acquire the queued spinlock
  * @lock: Pointer to queued spinlock structure

From patchwork Mon Mar 3 15:22:49 2025
X-Patchwork-Submitter: Kumar Kartikeya Dwivedi
X-Patchwork-Id: 13999027
X-Patchwork-Delegate: bpf@iogearbox.net
From: Kumar Kartikeya Dwivedi
To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Barret Rhoden, Linus Torvalds, Peter Zijlstra, Will Deacon, Waiman Long,
    Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, Martin KaFai Lau,
    Eduard Zingerman, "Paul E. McKenney", Tejun Heo, Josh Don, Dohyun Kim,
    linux-arm-kernel@lists.infradead.org, kkd@meta.com, kernel-team@meta.com
Subject: [PATCH bpf-next v3 09/25] rqspinlock: Protect pending bit owners from stalls
Date: Mon, 3 Mar 2025 07:22:49 -0800
Message-ID: <20250303152305.3195648-10-memxor@gmail.com>
In-Reply-To: <20250303152305.3195648-1-memxor@gmail.com>
References: <20250303152305.3195648-1-memxor@gmail.com>

The pending bit is used to avoid queueing in case the lock is uncontended,
and has demonstrated benefits for the 2 contender scenario, esp.
on x86. In case the pending bit is acquired and we wait for the locked bit to disappear, we may get stuck due to the lock owner not making progress. Hence, this waiting loop must be protected with a timeout check. To perform a graceful recovery once we decide to abort our lock acquisition attempt in this case, we must unset the pending bit since we own it. All waiters undoing their changes and exiting gracefully allows the lock word to be restored to the unlocked state once all participants (owner, waiters) have been recovered, and the lock remains usable. Hence, set the pending bit back to zero before returning to the caller. Introduce a lockevent (rqspinlock_lock_timeout) to capture timeout event statistics. Reviewed-by: Barret Rhoden Signed-off-by: Kumar Kartikeya Dwivedi --- include/asm-generic/rqspinlock.h | 2 +- kernel/locking/lock_events_list.h | 5 +++++ kernel/locking/rqspinlock.c | 28 +++++++++++++++++++++++----- 3 files changed, 29 insertions(+), 6 deletions(-) diff --git a/include/asm-generic/rqspinlock.h b/include/asm-generic/rqspinlock.h index 96cea871fdd2..d23793d8e64d 100644 --- a/include/asm-generic/rqspinlock.h +++ b/include/asm-generic/rqspinlock.h @@ -15,7 +15,7 @@ struct qspinlock; typedef struct qspinlock rqspinlock_t; -extern void resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val); +extern int resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val); /* * Default timeout for waiting loops is 0.25 seconds diff --git a/kernel/locking/lock_events_list.h b/kernel/locking/lock_events_list.h index 97fb6f3f840a..c5286249994d 100644 --- a/kernel/locking/lock_events_list.h +++ b/kernel/locking/lock_events_list.h @@ -49,6 +49,11 @@ LOCK_EVENT(lock_use_node4) /* # of locking ops that use 4th percpu node */ LOCK_EVENT(lock_no_node) /* # of locking ops w/o using percpu node */ #endif /* CONFIG_QUEUED_SPINLOCKS */ +/* + * Locking events for Resilient Queued Spin Lock + */ +LOCK_EVENT(rqspinlock_lock_timeout) /* # of locking ops that timeout */ + /* * Locking events for rwsem */ diff --git a/kernel/locking/rqspinlock.c b/kernel/locking/rqspinlock.c index efa937ea80d9..6be36798ded9 100644 --- a/kernel/locking/rqspinlock.c +++ b/kernel/locking/rqspinlock.c @@ -154,12 +154,12 @@ static DEFINE_PER_CPU_ALIGNED(struct qnode, rqnodes[_Q_MAX_NODES]); * contended : (*,x,y) +--> (*,0,0) ---> (*,0,1) -' : * queue : ^--' : */ -void __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val) +int __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val) { struct mcs_spinlock *prev, *next, *node; struct rqspinlock_timeout ts; + int idx, ret = 0; u32 old, tail; - int idx; BUILD_BUG_ON(CONFIG_NR_CPUS >= (1U << _Q_TAIL_CPU_BITS)); @@ -217,8 +217,25 @@ void __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val) * clear_pending_set_locked() implementations imply full * barriers. */ - if (val & _Q_LOCKED_MASK) - smp_cond_load_acquire(&lock->locked, !VAL); + if (val & _Q_LOCKED_MASK) { + RES_RESET_TIMEOUT(ts, RES_DEF_TIMEOUT); + res_smp_cond_load_acquire(&lock->locked, !VAL || RES_CHECK_TIMEOUT(ts, ret)); + } + + if (ret) { + /* + * We waited for the locked bit to go back to 0, as the pending + * waiter, but timed out. We need to clear the pending bit since + * we own it. Once a stuck owner has been recovered, the lock + * must be restored to a valid state, hence removing the pending + * bit is necessary. 
+		 *
+		 * *,1,* -> *,0,*
+		 */
+		clear_pending(lock);
+		lockevent_inc(rqspinlock_lock_timeout);
+		return ret;
+	}
 
 	/*
 	 * take ownership and clear the pending bit.
@@ -227,7 +244,7 @@ void __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val)
 	 */
 	clear_pending_set_locked(lock);
 	lockevent_inc(lock_pending);
-	return;
+	return 0;
 
 	/*
 	 * End of pending bit optimistic spinning and beginning of MCS
@@ -378,5 +395,6 @@ void __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val)
 	 * release the node
 	 */
 	__this_cpu_dec(rqnodes[0].mcs.count);
+	return 0;
 }
 EXPORT_SYMBOL(resilient_queued_spin_lock_slowpath);

From patchwork Mon Mar 3 15:22:50 2025
X-Patchwork-Submitter: Kumar Kartikeya Dwivedi
X-Patchwork-Id: 13999028
X-Patchwork-Delegate: bpf@iogearbox.net
From: Kumar Kartikeya Dwivedi
To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Barret Rhoden, Linus Torvalds, Peter Zijlstra, Will Deacon, Waiman Long,
    Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, Martin KaFai Lau,
    Eduard Zingerman, "Paul E.
McKenney" , Tejun Heo , Josh Don , Dohyun Kim , linux-arm-kernel@lists.infradead.org, kkd@meta.com, kernel-team@meta.com Subject: [PATCH bpf-next v3 10/25] rqspinlock: Protect waiters in queue from stalls Date: Mon, 3 Mar 2025 07:22:50 -0800 Message-ID: <20250303152305.3195648-11-memxor@gmail.com> X-Mailer: git-send-email 2.43.5 In-Reply-To: <20250303152305.3195648-1-memxor@gmail.com> References: <20250303152305.3195648-1-memxor@gmail.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=8548; h=from:subject; bh=/tO7WsHRnQ1y7o7jGLLRZ75jx2rvQksn4PRPhYxulGw=; b=owEBbQKS/ZANAwAIAUzgyIZIvxHKAcsmYgBnxcWXBmmTn9Q/JpHcUWagL6D0LQj7D9XnazfmRv20 F3ZP162JAjMEAAEIAB0WIQRLvip+Buz51YI8YRFM4MiGSL8RygUCZ8XFlwAKCRBM4MiGSL8RyoaOD/ 9RNH1I8JHE99sQbIlMiKmtHTFHuJCcV0nvMTdA9wCT6zW9DamNJ7yjyl81IYTGgDbSp7m2QaZu788r wMj5fcd2iwhWcBBc3PlDuAKaTQwSz8B5BN1i6UV7vcK1E/AtpCEpvKXpqZH1SSiPx2YkgVYq8KmZxt Z/dIeZESDo9b9d1uSbJmJosgaNhZz1AbbuYEp6kniedp3jozmo3YFBcOue3YFKPe2Kg4NTJhwyRrXv tULCtm9RRxbu6QxV+kLPyqhYo74CyHJaolrRbLzf8we5rzaMwdJnBzFvKd7i2ZpNaoWQxykdCanhPC kPtXLBOdzzqKzwxaC1t4g3BJ7UIeouteWNdMi5qZqLhvMLByPg/i0nacGi4CD64Lriiq0FL+AMSlTM GZPhan3dSlPjsN45LIuBIPnN5XtrmgKV629AYm01KKITf7cSzqgl2iJ3dpqSasXs//53rgJxbnJaVq VOv1pbHmn+VLbJiV2X88Q43nhN1CwA5HGLBirqzzZ4K0hRzxuo9oGT6S2r76PZFDucgeMDRJB9c7Mm GUAgAQ0RCNDP+ehlhqzyWE0+v7m1qVYImUE0vh4QbgRxK8wEBGaupTPCZ0wHn0ruv4PtHa7PABtP+B PRV0YKX9hGtJ//vt5pVqTc7HjT1GByFMSa0MhSbB3s6F+v0Qkm6vjB3b7DBQ== X-Developer-Key: i=memxor@gmail.com; a=openpgp; fpr=4BBE2A7E06ECF9D5823C61114CE0C88648BF11CA X-Patchwork-Delegate: bpf@iogearbox.net Implement the wait queue cleanup algorithm for rqspinlock. There are three forms of waiters in the original queued spin lock algorithm. The first is the waiter which acquires the pending bit and spins on the lock word without forming a wait queue. The second is the head waiter that is the first waiter heading the wait queue. The third form is of all the non-head waiters queued behind the head, waiting to be signalled through their MCS node to overtake the responsibility of the head. In this commit, we are concerned with the second and third kind. First, we augment the waiting loop of the head of the wait queue with a timeout. When this timeout happens, all waiters part of the wait queue will abort their lock acquisition attempts. This happens in three steps. First, the head breaks out of its loop waiting for pending and locked bits to turn to 0, and non-head waiters break out of their MCS node spin (more on that later). Next, every waiter (head or non-head) attempts to check whether they are also the tail waiter, in such a case they attempt to zero out the tail word and allow a new queue to be built up for this lock. If they succeed, they have no one to signal next in the queue to stop spinning. Otherwise, they signal the MCS node of the next waiter to break out of its spin and try resetting the tail word back to 0. This goes on until the tail waiter is found. In case of races, the new tail will be responsible for performing the same task, as the old tail will then fail to reset the tail word and wait for its next pointer to be updated before it signals the new tail to do the same. We terminate the whole wait queue because of two main reasons. Firstly, we eschew per-waiter timeouts with one applied at the head of the wait queue. 
This allows everyone to break out faster once we've seen the owner / pending waiter not responding for the timeout duration from the head. Secondly, it avoids complicated synchronization, because when not leaving in FIFO order, prev's next pointer needs to be fixed up etc. Lastly, all of these waiters release the rqnode and return to the caller. This patch underscores the point that rqspinlock's timeout does not apply to each waiter individually, and cannot be relied upon as an upper bound. It is possible for the rqspinlock waiters to return early from a failed lock acquisition attempt as soon as stalls are detected. The head waiter cannot directly WRITE_ONCE the tail to zero, as it may race with a concurrent xchg and a non-head waiter linking its MCS node to the head's MCS node through 'prev->next' assignment. One notable thing is that we must use RES_DEF_TIMEOUT * 2 as our maximum duration for the waiting loop (for the wait queue head), since we may have both the owner and pending bit waiter ahead of us, and in the worst case, need to span their maximum permitted critical section lengths. Reviewed-by: Barret Rhoden Signed-off-by: Kumar Kartikeya Dwivedi --- kernel/locking/rqspinlock.c | 55 +++++++++++++++++++++++++++++++++++-- kernel/locking/rqspinlock.h | 48 ++++++++++++++++++++++++++++++++ 2 files changed, 100 insertions(+), 3 deletions(-) create mode 100644 kernel/locking/rqspinlock.h diff --git a/kernel/locking/rqspinlock.c b/kernel/locking/rqspinlock.c index 6be36798ded9..9ad18b3c46f7 100644 --- a/kernel/locking/rqspinlock.c +++ b/kernel/locking/rqspinlock.c @@ -77,6 +77,8 @@ struct rqspinlock_timeout { u16 spin; }; +#define RES_TIMEOUT_VAL 2 + static noinline int check_timeout(struct rqspinlock_timeout *ts) { u64 time = ktime_get_mono_fast_ns(); @@ -321,12 +323,18 @@ int __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val) * head of the waitqueue. */ if (old & _Q_TAIL_MASK) { + int val; + prev = decode_tail(old, rqnodes); /* Link @node into the waitqueue. */ WRITE_ONCE(prev->next, node); - arch_mcs_spin_lock_contended(&node->locked); + val = arch_mcs_spin_lock_contended(&node->locked); + if (val == RES_TIMEOUT_VAL) { + ret = -EDEADLK; + goto waitq_timeout; + } /* * While waiting for the MCS lock, the next pointer may have @@ -349,8 +357,49 @@ int __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val) * store-release that clears the locked bit and create lock * sequentiality; this is because the set_locked() function below * does not imply a full barrier. + * + * We use RES_DEF_TIMEOUT * 2 as the duration, as RES_DEF_TIMEOUT is + * meant to span maximum allowed time per critical section, and we may + * have both the owner of the lock and the pending bit waiter ahead of + * us. */ - val = atomic_cond_read_acquire(&lock->val, !(VAL & _Q_LOCKED_PENDING_MASK)); + RES_RESET_TIMEOUT(ts, RES_DEF_TIMEOUT * 2); + val = res_atomic_cond_read_acquire(&lock->val, !(VAL & _Q_LOCKED_PENDING_MASK) || + RES_CHECK_TIMEOUT(ts, ret)); + +waitq_timeout: + if (ret) { + /* + * If the tail is still pointing to us, then we are the final waiter, + * and are responsible for resetting the tail back to 0. Otherwise, if + * the cmpxchg operation fails, we signal the next waiter to take exit + * and try the same. For a waiter with tail node 'n': + * + * n,*,* -> 0,*,* + * + * When performing cmpxchg for the whole word (NR_CPUS > 16k), it is + * possible locked/pending bits keep changing and we see failures even + * when we remain the head of wait queue. 
However, eventually, + * pending bit owner will unset the pending bit, and new waiters + * will queue behind us. This will leave the lock owner in + * charge, and it will eventually either set locked bit to 0, or + * leave it as 1, allowing us to make progress. + * + * We terminate the whole wait queue for two reasons. Firstly, + * we eschew per-waiter timeouts with one applied at the head of + * the wait queue. This allows everyone to break out faster + * once we've seen the owner / pending waiter not responding for + * the timeout duration from the head. Secondly, it avoids + * complicated synchronization, because when not leaving in FIFO + * order, prev's next pointer needs to be fixed up etc. + */ + if (!try_cmpxchg_tail(lock, tail, 0)) { + next = smp_cond_load_relaxed(&node->next, VAL); + WRITE_ONCE(next->locked, RES_TIMEOUT_VAL); + } + lockevent_inc(rqspinlock_lock_timeout); + goto release; + } /* * claim the lock: @@ -395,6 +444,6 @@ int __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val) * release the node */ __this_cpu_dec(rqnodes[0].mcs.count); - return 0; + return ret; } EXPORT_SYMBOL(resilient_queued_spin_lock_slowpath); diff --git a/kernel/locking/rqspinlock.h b/kernel/locking/rqspinlock.h new file mode 100644 index 000000000000..3cec3a0f2d7e --- /dev/null +++ b/kernel/locking/rqspinlock.h @@ -0,0 +1,48 @@ +/* SPDX-License-Identifier: GPL-2.0-or-later */ +/* + * Resilient Queued Spin Lock defines + * + * (C) Copyright 2024 Meta Platforms, Inc. and affiliates. + * + * Authors: Kumar Kartikeya Dwivedi + */ +#ifndef __LINUX_RQSPINLOCK_H +#define __LINUX_RQSPINLOCK_H + +#include "qspinlock.h" + +/* + * try_cmpxchg_tail - Return result of cmpxchg of tail word with a new value + * @lock: Pointer to queued spinlock structure + * @tail: The tail to compare against + * @new_tail: The new queue tail code word + * Return: Bool to indicate whether the cmpxchg operation succeeded + * + * This is used by the head of the wait queue to clean up the queue. + * Provides relaxed ordering, since observers only rely on initialized + * state of the node which was made visible through the xchg_tail operation, + * i.e. through the smp_wmb preceding xchg_tail. + * + * We avoid using 16-bit cmpxchg, which is not available on all architectures. + */ +static __always_inline bool try_cmpxchg_tail(struct qspinlock *lock, u32 tail, u32 new_tail) +{ + u32 old, new; + + old = atomic_read(&lock->val); + do { + /* + * Is the tail part we compare to already stale? Fail. + */ + if ((old & _Q_TAIL_MASK) != tail) + return false; + /* + * Encode latest locked/pending state for new tail. 
+ */ + new = (old & _Q_LOCKED_PENDING_MASK) | new_tail; + } while (!atomic_try_cmpxchg_relaxed(&lock->val, &old, new)); + + return true; +} + +#endif /* __LINUX_RQSPINLOCK_H */ From patchwork Mon Mar 3 15:22:51 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kumar Kartikeya Dwivedi X-Patchwork-Id: 13999029 X-Patchwork-Delegate: bpf@iogearbox.net Received: from mail-wr1-f68.google.com (mail-wr1-f68.google.com [209.85.221.68]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 6E44122689C; Mon, 3 Mar 2025 15:23:26 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.221.68 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1741015408; cv=none; b=lBa6jqcrgh6pk8oD93xTvvOMo4bYAZPVFsFADYjWadhH17C8QK7/tL2PU+gI74c3oooX/64JrxkH/o+lo8bU8XPEgyeuyiHTPZ5mWzJoP82cVl9ir8OwasQLdHAN5fQJkuBGDCOTNAiWN7x/H8IsQgLbAbbX5hGXlYn9pi08Qu4= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1741015408; c=relaxed/simple; bh=Q9a38BvImlUP8RKT7WE0lh2B2I6Iy4KpJSDwsgIePN4=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=XCvQqvA4BLUq9vFTKhqbkiglTir4NSX3aRb8wxUgxlpU4WFFJX4CAv3Ag5NUMhLRwpb3nIhl7MAZYCb4VtKXieimOPQMkn0vUx2eRtkYLH/TN0z+WYFxY4yOegQTjb0tRKlTmB8uHkL9Rp/orB7knWnkNuzOSCjkA/eMArqMICo= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=OJseEvj3; arc=none smtp.client-ip=209.85.221.68 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="OJseEvj3" Received: by mail-wr1-f68.google.com with SMTP id ffacd0b85a97d-38a25d4b9d4so2800841f8f.0; Mon, 03 Mar 2025 07:23:26 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1741015404; x=1741620204; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=3HqVASaJxCqTvezKR1/7pImOEJNTa/8ze2+bN0p0ybM=; b=OJseEvj3XZ/R5cm5xFxU41w/qS4yftAmuBkTGibF9q7vkW1wx7hAtGwbTaV97shn81 OO0YspiKCRyqIKU45QR/uhfBDO98FjaurMo2cblur57AigQBPOuD6CbJpn9XajQ3HHgQ ywkXLcr3DB1fXTSY2JpfRoMG7ySMpMnmwWt6OdsFJfcis+3vBreIdl74mX+k0H+3AMhD +Imxht8gr/0ugpjvQ7N0E97pugH+yrRVRDThr0zxCkc1tXghfvXkpit0KyhthTjbpWfx NHP4+/7d7eJd5lAFsPtXveNyA/wkkAmioBhQ6tOslmOGBDHWyvadj6i4N8TvITFMUSyc NVjg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1741015404; x=1741620204; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=3HqVASaJxCqTvezKR1/7pImOEJNTa/8ze2+bN0p0ybM=; b=YRt0e29N05C9pEuSbWyzEy6PctiXbo0pFivHcISG+Bjkqfk17m4WID5QPW6ZmW/YKf /zgGzeoBAb70W52Kmj/CeCFXnxU5QMdl7y9ACwDwfcIy1WXz4WlHIGRiDaIq6Hg4HP8V 635yWfDkr+7CDfwd++VQUtrtA9M1IG/MbKYtPk3QCV1QE0EMvwJc0XlKUeearJ1Oib/1 Zj7JitRhfmm02kiVWfUZTYJ0vTVGCp7pMyAeh2D7mrsYakxqe7gqdQHURB4h8E5n9UHd bVTV68bFIn4pL3p+/1xP9nGXi95FcQs3Rrp3wq1XXjCn504L9qvU0XURITbzLxKOgElH NRmg== 
X-Forwarded-Encrypted: i=1; AJvYcCWxVa5ad8Bn5Zz4wu2w+VIHkjItAiI1dW82POIbNXCw41zZWgoAbJfdYrdD4AowYC2Ks/huSqjf0fpBfS0=@vger.kernel.org X-Gm-Message-State: AOJu0YwVuCQ7HSR3IChRSdfhI8WufXp03rJoF6rzxR/QcwpxEJhqVN1V FY74/JrL/mO1FZhRszRqP41QUINjZDzsNBOuKZjp5Xf4Fv8X1Un3VSGK9BL4cgQ= X-Gm-Gg: ASbGncsWVl1mZu9OKKkuHsUGmv4l1JyIMaMEguUE1sVNW75aC/khFFH+i5a/SjsRWbH JJWs+7VpZHfA78jTU9eWsciWjjXnCSV20HS+ZqS7fwnnKOxRyXEjTuPIvOa1rPR/0KJzbcGfwEC B8nC/axVyUM61ioQATXk7sQo4RrDQ9EvXi/BXIi3ddLY7/Z2vqHGLOOYNNtNtvP59cZOuQhVEX5 12SPAq2uGhMb2DOBTPnRYVZVt3xuUopH9vt6rK6OpBM0xkhW4VFrUW/uGdS2VWwSfKu4iSE273u ssB83VdqGwitJnBCOIBIQ5DkekxFCsmrcTc= X-Google-Smtp-Source: AGHT+IGJtvDm6QNWOZ/Cpj2mx/hKnagicnAIvDMSVdaEmiufPdG1r9WWQB7409X5ARNMd+GMAA5o0g== X-Received: by 2002:a5d:5c84:0:b0:390:f1cb:286e with SMTP id ffacd0b85a97d-390f1cb2ceemr5781942f8f.27.1741015404157; Mon, 03 Mar 2025 07:23:24 -0800 (PST) Received: from localhost ([2a03:2880:31ff:44::]) by smtp.gmail.com with ESMTPSA id ffacd0b85a97d-390e47a7473sm14977125f8f.38.2025.03.03.07.23.23 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 03 Mar 2025 07:23:23 -0800 (PST) From: Kumar Kartikeya Dwivedi To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org Cc: Barret Rhoden , Linus Torvalds , Peter Zijlstra , Will Deacon , Waiman Long , Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , Martin KaFai Lau , Eduard Zingerman , "Paul E. McKenney" , Tejun Heo , Josh Don , Dohyun Kim , linux-arm-kernel@lists.infradead.org, kkd@meta.com, kernel-team@meta.com Subject: [PATCH bpf-next v3 11/25] rqspinlock: Protect waiters in trylock fallback from stalls Date: Mon, 3 Mar 2025 07:22:51 -0800 Message-ID: <20250303152305.3195648-12-memxor@gmail.com> X-Mailer: git-send-email 2.43.5 In-Reply-To: <20250303152305.3195648-1-memxor@gmail.com> References: <20250303152305.3195648-1-memxor@gmail.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=1845; h=from:subject; bh=Q9a38BvImlUP8RKT7WE0lh2B2I6Iy4KpJSDwsgIePN4=; b=owEBbQKS/ZANAwAIAUzgyIZIvxHKAcsmYgBnxcWXG8w8qbawJw3OszHwUL3Z+OsS55BpLQ/c3hWE /xTGsM2JAjMEAAEIAB0WIQRLvip+Buz51YI8YRFM4MiGSL8RygUCZ8XFlwAKCRBM4MiGSL8Ryn5oD/ 96RVXMaF0ne5BLzCBhyNtkJBr6CAlauxQ3pblR3jmHlcP2GkKE00BZqXSuo5K0M0W8hzmyVBm8WZpv QmAprnJvjPvw3bTmZO232VjrI6ST5tJSd6XooAlRFPv85J3lQFS6z5IVGVtrJ/t6kbOg+xfwatWJVH JpGj9jJXyCIiXjHxaK91EcYxqzLOnLljC6WmGxE1N0lmh870+qkXn0vCZjMQ0lyW8wHEBbZ6Ywy8Gz 55T4SsC+beWbwhBNCJn02UhnONKtZqi0rm69Mb6AY+eRzPn9ZAI/n8F3jYqkY+yDCFq1Z4mdsWAsU+ UcxEA0xxZEA1gBVbVzAhkYvCHyFYnA/ZKE/8EZl6+ARIPGiQVPEqYETfeYZTVMxznw060kQAguI7WX WEFHRA8OzFTHycOX3/U8m+f3+9dgBhhVfpf0wJJHSMzNPR8J1we3rcdGBFiA3d8MdoQPth0iHEU2zP AGaEkhSEo4gXrUXNZnLq3ZpLEB+nn1ImiZ31A2o/3uJ4Vw9/x4a0iPKa01RJjV83Kjms2TGGOvbW2N GPkv4fOWnkSZv94YXDW/lzuzDng6iHB/gMJzFDkesuEjSysN2zSjMlyomMe3ja5puRwyFe2tE80jLM qADm2HxuhDZkxotSHNpoBoyUppVvy95LQycs0czfCa2Iqw1nW/eGWv4P5IYQ== X-Developer-Key: i=memxor@gmail.com; a=openpgp; fpr=4BBE2A7E06ECF9D5823C61114CE0C88648BF11CA X-Patchwork-Delegate: bpf@iogearbox.net When we run out of maximum rqnodes, the original queued spin lock slow path falls back to a try lock. In such a case, we are again susceptible to stalls in case the lock owner fails to make progress. We use the timeout as a fallback to break out of this loop and return to the caller. This is a fallback for an extreme edge case, when on the same CPU we run out of all 4 qnodes. When could this happen? 
We are in the slow path in task context and get interrupted by an IRQ, which while in the slow path gets interrupted by an NMI, which while in the slow path gets interrupted by another nested NMI, which then enters the slow path. All of the interruptions happen after node->count++. We use RES_DEF_TIMEOUT as our spinning duration, but in the case of this fallback, no fairness is guaranteed, so the duration may be too small for contended cases, as the waiting time is not bounded. Since this is an extreme corner case, let's just prefer timing out instead of attempting to spin for longer. Reviewed-by: Barret Rhoden Signed-off-by: Kumar Kartikeya Dwivedi --- kernel/locking/rqspinlock.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/kernel/locking/rqspinlock.c b/kernel/locking/rqspinlock.c index 9ad18b3c46f7..16ec1b9eb005 100644 --- a/kernel/locking/rqspinlock.c +++ b/kernel/locking/rqspinlock.c @@ -271,8 +271,14 @@ int __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val) */ if (unlikely(idx >= _Q_MAX_NODES)) { lockevent_inc(lock_no_node); - while (!queued_spin_trylock(lock)) + RES_RESET_TIMEOUT(ts, RES_DEF_TIMEOUT); + while (!queued_spin_trylock(lock)) { + if (RES_CHECK_TIMEOUT(ts, ret)) { + lockevent_inc(rqspinlock_lock_timeout); + break; + } cpu_relax(); + } goto release; } From patchwork Mon Mar 3 15:22:52 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kumar Kartikeya Dwivedi X-Patchwork-Id: 13999030 X-Patchwork-Delegate: bpf@iogearbox.net Received: from mail-wm1-f66.google.com (mail-wm1-f66.google.com [209.85.128.66]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id E8C9522A817; Mon, 3 Mar 2025 15:23:27 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.128.66 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1741015410; cv=none; b=dgL0eEVqSw+R+GDGYSAwFum8k/1ySlk+YWJe+gJom5b2rZVKPMI05/tIY/ZBIy1vmDy/Hkcg5wYUdaNM6Y2Xa3zyTNGaiTcboG2V6hOak9yuVbInG47D/s5dQcmIWRQxItld/PzV8H6XPffvICi9BM9RT8hNbQNkQ3l4t+azZoo= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1741015410; c=relaxed/simple; bh=q+PlHVYmXyQy+isvTJhD2W+ttvErhONLhj2AWs0Sbuk=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=ZtIKLl7G6Mbvm9g1hm/V6jDWNpc6DciUsvD1rLpHF4+9mULPBLD0KUp1Rx8qDiPLCtlBNiWTo33zhRqXSmrFCs4hLC+TXVNJ4jxq2q29eYDExwWeV2MPYN9t98GBsBzsCGb5EOHApfVX8MVKwzWO1WN6QrywR/UImyEP9QD0Lk0= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=Gfpo1w9S; arc=none smtp.client-ip=209.85.128.66 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="Gfpo1w9S" Received: by mail-wm1-f66.google.com with SMTP id 5b1f17b1804b1-4399a1eada3so42083525e9.2; Mon, 03 Mar 2025 07:23:27 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1741015406; x=1741620206; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to
:message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=tKa/wmg3ittWkt3o2YJLMmVqh431E62mqFSXV7PYA3Q=; b=Gfpo1w9SY0es2ojVK4tXzK9oaUkJRiBDd6tTAqhqmWLz2H9h1Nqya9MzeILzaFKZaK uiU95nTuLiwxT5Z87flL7E3QcPD7jO5Y5ITXc3UCbOnEKWkUGD7x617JoDI8Y+McPdO7 cLZef0P+k4rQAC4M1rSF1nWMPDvgacY8agL/Ql87Ll/Q7hRNQrtJ0TOUY9w8KxRzVa5+ FRZEPGEHlmMfyB51ZQmQZmMPSvh3dEk29WjrhXiCxtzVTLEofna2sxmmWbE+1n6oN4DN HF4e15uYVtLx/+dnDCPPiq5XIPVLDGzrOu7mAfVWbG1Wl50q6XkQJ+JPnT9BkFqLvpSf II6A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1741015406; x=1741620206; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=tKa/wmg3ittWkt3o2YJLMmVqh431E62mqFSXV7PYA3Q=; b=rbuFCSJneXsZejfAqGcwCRrSwPfbflpeyVe2Jmsw+Y5Fqk9mz+trjtDsHEGup2hpGY t5ZAAQreYwCkad3184U2qE2vzTJVO8SLyuIpAgY0z4RS9h27ZFGGDFWEOHfe7ar0ik+U VPM97dg0Y+It0Rj6983g1zTB/U7gFv2P3OGQFW+iBlWktM9RTqTiKQGh1XBU+bpq4xe/ RpxkddAUelqW9hvk44O5YE8d/ieeQrrKuq9E0LWtkb3Ezg21mwugaW+I1f3/ty0XBUg9 Z5u2n/4LhJ62YI060xhNsad6NScV2o2HZE1rN5pc6Em4Ug9iwMeFjbH02rGBpL3cuu+i ecPA== X-Forwarded-Encrypted: i=1; AJvYcCUpy1RdemLSndtY064snxmvnR0anS5d6V25B/UY7ix6P+DWh5+2+APoI2hX2gnBuzkeZS7o1Q7rPXatksY=@vger.kernel.org X-Gm-Message-State: AOJu0YxWIOQ0vCAlHTufBBd38X0oy5yqDRbMoX+Pc8yxKSiSzZbPIp6j qfUJHje5ezZAjjtEXNtGrMXZd6dIlEkbIKGKnJw+1o+cHlrG1nPDzuZL7pfVdJA= X-Gm-Gg: ASbGnct1RNwXHqDpkqYsPw3L8TW3YbSlzZZ1zm97LrB2DDJJY+bQeZHibK4QUUxVPP0 yMq9U82CEEmCy4cTzjXDoBmSMJRIr0prjZLc3FLg2jAe6PlyaVtn6EijSXMIhVyJZHYlycuwqYA H6OlbKPbJpJ0g7r+mY5VrWjWfH+DHdRSoTceDBYL4gCAFfi9jKE3auiBlWSKzoYEMDLH8vH1c21 znA2FaiVa12+iYZPmJsyBtHjTPqscHawhJvo3gZ37KmLRjKGbTvtFHRGqkJl33K1agOm2o0Do7o oJzOZpPOnlpJQNWUX4giML/7lKyJ+Q+IEHQ= X-Google-Smtp-Source: AGHT+IG1blRJwAMlXeHRPq+X+kGTGI57ZLlk81V9zn4NhYMCeonnkf5jNU1HzLwhFv1wOWVdopS8gA== X-Received: by 2002:a5d:64cf:0:b0:38f:30a3:51fe with SMTP id ffacd0b85a97d-390eca53071mr9776559f8f.42.1741015405695; Mon, 03 Mar 2025 07:23:25 -0800 (PST) Received: from localhost ([2a03:2880:31ff:4f::]) by smtp.gmail.com with ESMTPSA id ffacd0b85a97d-390e4844ac6sm14636626f8f.71.2025.03.03.07.23.24 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 03 Mar 2025 07:23:24 -0800 (PST) From: Kumar Kartikeya Dwivedi To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org Cc: Linus Torvalds , Peter Zijlstra , Will Deacon , Waiman Long , Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , Martin KaFai Lau , Eduard Zingerman , "Paul E. 
McKenney" , Tejun Heo , Barret Rhoden , Josh Don , Dohyun Kim , linux-arm-kernel@lists.infradead.org, kkd@meta.com, kernel-team@meta.com Subject: [PATCH bpf-next v3 12/25] rqspinlock: Add deadlock detection and recovery Date: Mon, 3 Mar 2025 07:22:52 -0800 Message-ID: <20250303152305.3195648-13-memxor@gmail.com> X-Mailer: git-send-email 2.43.5 In-Reply-To: <20250303152305.3195648-1-memxor@gmail.com> References: <20250303152305.3195648-1-memxor@gmail.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=16419; h=from:subject; bh=q+PlHVYmXyQy+isvTJhD2W+ttvErhONLhj2AWs0Sbuk=; b=owEBbQKS/ZANAwAIAUzgyIZIvxHKAcsmYgBnxcWX+cvAWmiPyeTrpA7wf0kdH6RoTzfmmo3/HOrt rkmH2BWJAjMEAAEIAB0WIQRLvip+Buz51YI8YRFM4MiGSL8RygUCZ8XFlwAKCRBM4MiGSL8RylcSD/ 98d0x1nLa8rNVvdG3GA2itnPRdooLYI4T2fCCWt7Ts3e/+FELsL6YecnuYvTRcC8lgfNog49Q61vR9 ZbdiJhRYp0foaE6nkdJ6ZNdOUUqB7XNQ/5a9wspJwtac9mvnLvvflyBBpMra1iGmSbkeC1INTGs0k9 d4LX43Pt6LiEThANydEY2aB/KLWgJ8yJ0GI8rF9cmw9HZVEiXBI+DnpCcdFXfOXk4JFOIsM6J3+9uR gF1XAmwR9+OPBLj3ew1jqeOvImHAC6Q3KhH0fM2Khxp54yewJN3fdK27ecmcZ7dFD955LNup8Jwhc9 6No4sxHtKn3TRa233wc0E2c0WN41u0ANO2hskwdvDQrpBuGt1oekXg4lIEhasIAqkmRUwda8Se6FsW 917K8r3V5eRVzYlOuc2QnlSi1Z7UwvssgtZzlONowb11BDRvYdc6n30bdv8j5j5CkOWLS8QRFw1R2d 02L7M5THvXBlDnXEF17/fIPhaBsUgYWj9Tb6Te0pcUFTth8/hsrrqoLxYgdezymUP+YaKk4gSaxwYS UKsOuzcvXTm/Ole1m5oquCmbxoW+PiysOrhmj6Bjrhr2L2rTOE7Il4KP1WZyiYF3CZJJblxuxIzJD9 ftOVHvT1bfBqlCs5q/Ga8rae8DZswqhS7fMfgOHnwnPkrgWBo+l+BmoWa58A== X-Developer-Key: i=memxor@gmail.com; a=openpgp; fpr=4BBE2A7E06ECF9D5823C61114CE0C88648BF11CA X-Patchwork-Delegate: bpf@iogearbox.net While the timeout logic provides guarantees for the waiter's forward progress, the time until a stalling waiter unblocks can still be long. The default timeout of 1/2 sec can be excessively long for some use cases. Additionally, custom timeouts may exacerbate recovery time. Introduce logic to detect common cases of deadlocks and perform quicker recovery. This is done by dividing the time from entry into the locking slow path until the timeout into intervals of 1 ms. Then, after each interval elapses, deadlock detection is performed, while also polling the lock word to ensure we can quickly break out of the detection logic and proceed with lock acquisition. A 'held_locks' table is maintained per-CPU where the entry at the bottom denotes a lock being waited for or already taken. Entries coming before it denote locks that are already held. The current CPU's table can thus be looked at to detect AA deadlocks. The tables from other CPUs can be looked at to discover ABBA situations. Finally, when a matching entry for the lock being taken on the current CPU is found on some other CPU, a deadlock situation is detected. This function can take a long time, therefore the lock word is constantly polled in each loop iteration to ensure we can preempt detection and proceed with lock acquisition, using the is_lock_released check. We set 'spin' member of rqspinlock_timeout struct to 0 to trigger deadlock checks immediately to perform faster recovery. Note: Extending lock word size by 4 bytes to record owner CPU can allow faster detection for ABBA. It is typically the owner which participates in a ABBA situation. However, to keep compatibility with existing lock words in the kernel (struct qspinlock), and given deadlocks are a rare event triggered by bugs, we choose to favor compatibility over faster detection. 
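To make the scheme described above concrete, here is a simplified, illustrative sketch of the per-CPU held-locks table and the AA check; names carrying a "sketch" prefix are invented for illustration, and the patch's actual code (which additionally scans remote CPUs' tables for ABBA cycles and re-polls the lock word so detection can be abandoned once the lock frees up) appears in the diff below.

	/* Simplified sketch only -- not the patch's implementation. */
	#include <linux/percpu.h>
	#include <linux/minmax.h>
	#include <linux/errno.h>

	#define SKETCH_NR_HELD 31

	struct held_locks_sketch {
		int cnt;
		/*
		 * locks[cnt - 1] is the lock currently being acquired; earlier
		 * entries are locks already held by this CPU.
		 */
		void *locks[SKETCH_NR_HELD];
	};

	static DEFINE_PER_CPU(struct held_locks_sketch, held_locks_sketch);

	/* AA check: are we waiting on a lock this CPU already holds? */
	static int sketch_check_aa(void *lock)
	{
		struct held_locks_sketch *h = this_cpu_ptr(&held_locks_sketch);
		int cnt = min(SKETCH_NR_HELD, h->cnt);

		/* Skip the topmost entry, which is this acquisition attempt itself. */
		for (int i = 0; i < cnt - 1; i++) {
			if (h->locks[i] == lock)
				return -EDEADLK;
		}
		return 0;
	}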
The release_held_lock_entry function requires an smp_wmb, while the release store on unlock will provide the necessary ordering for us. Add comments to document the subtleties of why this is correct. It is possible for stores to be reordered still, but in the context of the deadlock detection algorithm, a release barrier is sufficient and needn't be stronger for unlock's case. Signed-off-by: Kumar Kartikeya Dwivedi --- include/asm-generic/rqspinlock.h | 100 +++++++++++++++++ kernel/locking/rqspinlock.c | 185 ++++++++++++++++++++++++++++--- 2 files changed, 271 insertions(+), 14 deletions(-) diff --git a/include/asm-generic/rqspinlock.h b/include/asm-generic/rqspinlock.h index d23793d8e64d..b685f243cf96 100644 --- a/include/asm-generic/rqspinlock.h +++ b/include/asm-generic/rqspinlock.h @@ -11,6 +11,7 @@ #include #include +#include struct qspinlock; typedef struct qspinlock rqspinlock_t; @@ -22,4 +23,103 @@ extern int resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val); */ #define RES_DEF_TIMEOUT (NSEC_PER_SEC / 4) +/* + * Choose 31 as it makes rqspinlock_held cacheline-aligned. + */ +#define RES_NR_HELD 31 + +struct rqspinlock_held { + int cnt; + void *locks[RES_NR_HELD]; +}; + +DECLARE_PER_CPU_ALIGNED(struct rqspinlock_held, rqspinlock_held_locks); + +static __always_inline void grab_held_lock_entry(void *lock) +{ + int cnt = this_cpu_inc_return(rqspinlock_held_locks.cnt); + + if (unlikely(cnt > RES_NR_HELD)) { + /* Still keep the inc so we decrement later. */ + return; + } + + /* + * Implied compiler barrier in per-CPU operations; otherwise we can have + * the compiler reorder inc with write to table, allowing interrupts to + * overwrite and erase our write to the table (as on interrupt exit it + * will be reset to NULL). + * + * It is fine for cnt inc to be reordered wrt remote readers though, + * they won't observe our entry until the cnt update is visible, that's + * all. + */ + this_cpu_write(rqspinlock_held_locks.locks[cnt - 1], lock); +} + +/* + * We simply don't support out-of-order unlocks, and keep the logic simple here. + * The verifier prevents BPF programs from unlocking out-of-order, and the same + * holds for in-kernel users. + * + * It is possible to run into misdetection scenarios of AA deadlocks on the same + * CPU, and missed ABBA deadlocks on remote CPUs if this function pops entries + * out of order (due to lock A, lock B, unlock A, unlock B) pattern. The correct + * logic to preserve right entries in the table would be to walk the array of + * held locks and swap and clear out-of-order entries, but that's too + * complicated and we don't have a compelling use case for out of order unlocking. + */ +static __always_inline void release_held_lock_entry(void) +{ + struct rqspinlock_held *rqh = this_cpu_ptr(&rqspinlock_held_locks); + + if (unlikely(rqh->cnt > RES_NR_HELD)) + goto dec; + WRITE_ONCE(rqh->locks[rqh->cnt - 1], NULL); +dec: + /* + * Reordering of clearing above with inc and its write in + * grab_held_lock_entry that came before us (in same acquisition + * attempt) is ok, we either see a valid entry or NULL when it's + * visible. + * + * But this helper is invoked when we unwind upon failing to acquire the + * lock. Unlike the unlock path which constitutes a release store after + * we clear the entry, we need to emit a write barrier here. 
Otherwise, + * we may have a situation as follows: + * + * for lock B + * release_held_lock_entry + * + * try_cmpxchg_acquire for lock A + * grab_held_lock_entry + * + * Lack of any ordering means reordering may occur such that dec, inc + * are done before entry is overwritten. This permits a remote lock + * holder of lock B (which this CPU failed to acquire) to now observe it + * as being attempted on this CPU, and may lead to misdetection (if this + * CPU holds a lock it is attempting to acquire, leading to false ABBA + * diagnosis). + * + * In case of unlock, we will always do a release on the lock word after + * releasing the entry, ensuring that other CPUs cannot hold the lock + * (and make conclusions about deadlocks) until the entry has been + * cleared on the local CPU, preventing any anomalies. Reordering is + * still possible there, but a remote CPU cannot observe a lock in our + * table which it is already holding, since visibility entails our + * release store for the said lock has not retired. + * + * In theory we don't have a problem if the dec and WRITE_ONCE above get + * reordered with each other, we either notice an empty NULL entry on + * top (if dec succeeds WRITE_ONCE), or a potentially stale entry which + * cannot be observed (if dec precedes WRITE_ONCE). + * + * Emit the write barrier _before_ the dec, this permits dec-inc + * reordering but that is harmless as we'd have new entry set to NULL + * already, i.e. they cannot precede the NULL store above. + */ + smp_wmb(); + this_cpu_dec(rqspinlock_held_locks.cnt); +} + #endif /* __ASM_GENERIC_RQSPINLOCK_H */ diff --git a/kernel/locking/rqspinlock.c b/kernel/locking/rqspinlock.c index 16ec1b9eb005..ce2bc0a85a07 100644 --- a/kernel/locking/rqspinlock.c +++ b/kernel/locking/rqspinlock.c @@ -31,6 +31,7 @@ */ #include "qspinlock.h" #include "lock_events.h" +#include "rqspinlock.h" /* * The basic principle of a queue-based spinlock can best be understood @@ -74,16 +75,146 @@ struct rqspinlock_timeout { u64 timeout_end; u64 duration; + u64 cur; u16 spin; }; #define RES_TIMEOUT_VAL 2 -static noinline int check_timeout(struct rqspinlock_timeout *ts) +DEFINE_PER_CPU_ALIGNED(struct rqspinlock_held, rqspinlock_held_locks); + +static bool is_lock_released(rqspinlock_t *lock, u32 mask, struct rqspinlock_timeout *ts) +{ + if (!(atomic_read_acquire(&lock->val) & (mask))) + return true; + return false; +} + +static noinline int check_deadlock_AA(rqspinlock_t *lock, u32 mask, + struct rqspinlock_timeout *ts) +{ + struct rqspinlock_held *rqh = this_cpu_ptr(&rqspinlock_held_locks); + int cnt = min(RES_NR_HELD, rqh->cnt); + + /* + * Return an error if we hold the lock we are attempting to acquire. + * We'll iterate over max 32 locks; no need to do is_lock_released. + */ + for (int i = 0; i < cnt - 1; i++) { + if (rqh->locks[i] == lock) + return -EDEADLK; + } + return 0; +} + +/* + * This focuses on the most common case of ABBA deadlocks (or ABBA involving + * more locks, which reduce to ABBA). This is not exhaustive, and we rely on + * timeouts as the final line of defense. + */ +static noinline int check_deadlock_ABBA(rqspinlock_t *lock, u32 mask, + struct rqspinlock_timeout *ts) +{ + struct rqspinlock_held *rqh = this_cpu_ptr(&rqspinlock_held_locks); + int rqh_cnt = min(RES_NR_HELD, rqh->cnt); + void *remote_lock; + int cpu; + + /* + * Find the CPU holding the lock that we want to acquire. If there is a + * deadlock scenario, we will read a stable set on the remote CPU and + * find the target. 
This would be a constant time operation instead of + * O(NR_CPUS) if we could determine the owning CPU from a lock value, but + * that requires increasing the size of the lock word. + */ + for_each_possible_cpu(cpu) { + struct rqspinlock_held *rqh_cpu = per_cpu_ptr(&rqspinlock_held_locks, cpu); + int real_cnt = READ_ONCE(rqh_cpu->cnt); + int cnt = min(RES_NR_HELD, real_cnt); + + /* + * Let's ensure to break out of this loop if the lock is available for + * us to potentially acquire. + */ + if (is_lock_released(lock, mask, ts)) + return 0; + + /* + * Skip ourselves, and CPUs whose count is less than 2, as they need at + * least one held lock and one acquisition attempt (reflected as top + * most entry) to participate in an ABBA deadlock. + * + * If cnt is more than RES_NR_HELD, it means the current lock being + * acquired won't appear in the table, and other locks in the table are + * already held, so we can't determine ABBA. + */ + if (cpu == smp_processor_id() || real_cnt < 2 || real_cnt > RES_NR_HELD) + continue; + + /* + * Obtain the entry at the top, this corresponds to the lock the + * remote CPU is attempting to acquire in a deadlock situation, + * and would be one of the locks we hold on the current CPU. + */ + remote_lock = READ_ONCE(rqh_cpu->locks[cnt - 1]); + /* + * If it is NULL, we've raced and cannot determine a deadlock + * conclusively, skip this CPU. + */ + if (!remote_lock) + continue; + /* + * Find if the lock we're attempting to acquire is held by this CPU. + * Don't consider the topmost entry, as that must be the latest lock + * being held or acquired. For a deadlock, the target CPU must also + * attempt to acquire a lock we hold, so for this search only 'cnt - 1' + * entries are important. + */ + for (int i = 0; i < cnt - 1; i++) { + if (READ_ONCE(rqh_cpu->locks[i]) != lock) + continue; + /* + * We found our lock as held on the remote CPU. Is the + * acquisition attempt on the remote CPU for a lock held + * by us? If so, we have a deadlock situation, and need + * to recover. + */ + for (int i = 0; i < rqh_cnt - 1; i++) { + if (rqh->locks[i] == remote_lock) + return -EDEADLK; + } + /* + * Inconclusive; retry again later. + */ + return 0; + } + } + return 0; +} + +static noinline int check_deadlock(rqspinlock_t *lock, u32 mask, + struct rqspinlock_timeout *ts) +{ + int ret; + + ret = check_deadlock_AA(lock, mask, ts); + if (ret) + return ret; + ret = check_deadlock_ABBA(lock, mask, ts); + if (ret) + return ret; + + return 0; +} + +static noinline int check_timeout(rqspinlock_t *lock, u32 mask, + struct rqspinlock_timeout *ts) { u64 time = ktime_get_mono_fast_ns(); + u64 prev = ts->cur; if (!ts->timeout_end) { + ts->cur = time; ts->timeout_end = time + ts->duration; return 0; } @@ -91,6 +222,15 @@ static noinline int check_timeout(struct rqspinlock_timeout *ts) if (time > ts->timeout_end) return -ETIMEDOUT; + /* + * A millisecond interval passed from last time? Trigger deadlock + * checks. + */ + if (prev + NSEC_PER_MSEC < time) { + ts->cur = time; + return check_deadlock(lock, mask, ts); + } + return 0; } @@ -99,21 +239,22 @@ static noinline int check_timeout(struct rqspinlock_timeout *ts) * as the macro does internal amortization for us. 
*/ #ifndef res_smp_cond_load_acquire -#define RES_CHECK_TIMEOUT(ts, ret) \ - ({ \ - if (!(ts).spin++) \ - (ret) = check_timeout(&(ts)); \ - (ret); \ +#define RES_CHECK_TIMEOUT(ts, ret, mask) \ + ({ \ + if (!(ts).spin++) \ + (ret) = check_timeout((lock), (mask), &(ts)); \ + (ret); \ }) #else -#define RES_CHECK_TIMEOUT(ts, ret, mask) \ +#define RES_CHECK_TIMEOUT(ts, ret, mask) \ ({ (ret) = check_timeout(&(ts)); }) #endif /* * Initialize the 'spin' member. + * Set spin member to 0 to trigger AA/ABBA checks immediately. */ -#define RES_INIT_TIMEOUT(ts) ({ (ts).spin = 1; }) +#define RES_INIT_TIMEOUT(ts) ({ (ts).spin = 0; }) /* * We only need to reset 'timeout_end', 'spin' will just wrap around as necessary. @@ -208,6 +349,11 @@ int __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val) goto queue; } + /* + * Grab an entry in the held locks array, to enable deadlock detection. + */ + grab_held_lock_entry(lock); + /* * We're pending, wait for the owner to go away. * @@ -221,7 +367,7 @@ int __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val) */ if (val & _Q_LOCKED_MASK) { RES_RESET_TIMEOUT(ts, RES_DEF_TIMEOUT); - res_smp_cond_load_acquire(&lock->locked, !VAL || RES_CHECK_TIMEOUT(ts, ret)); + res_smp_cond_load_acquire(&lock->locked, !VAL || RES_CHECK_TIMEOUT(ts, ret, _Q_LOCKED_MASK)); } if (ret) { @@ -236,7 +382,7 @@ int __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val) */ clear_pending(lock); lockevent_inc(rqspinlock_lock_timeout); - return ret; + goto err_release_entry; } /* @@ -254,6 +400,11 @@ int __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val) */ queue: lockevent_inc(lock_slowpath); + /* + * Grab deadlock detection entry for the queue path. + */ + grab_held_lock_entry(lock); + node = this_cpu_ptr(&rqnodes[0].mcs); idx = node->count++; tail = encode_tail(smp_processor_id(), idx); @@ -273,9 +424,9 @@ int __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val) lockevent_inc(lock_no_node); RES_RESET_TIMEOUT(ts, RES_DEF_TIMEOUT); while (!queued_spin_trylock(lock)) { - if (RES_CHECK_TIMEOUT(ts, ret)) { + if (RES_CHECK_TIMEOUT(ts, ret, ~0u)) { lockevent_inc(rqspinlock_lock_timeout); - break; + goto err_release_node; } cpu_relax(); } @@ -371,7 +522,7 @@ int __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val) */ RES_RESET_TIMEOUT(ts, RES_DEF_TIMEOUT * 2); val = res_atomic_cond_read_acquire(&lock->val, !(VAL & _Q_LOCKED_PENDING_MASK) || - RES_CHECK_TIMEOUT(ts, ret)); + RES_CHECK_TIMEOUT(ts, ret, _Q_LOCKED_PENDING_MASK)); waitq_timeout: if (ret) { @@ -404,7 +555,7 @@ int __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val) WRITE_ONCE(next->locked, RES_TIMEOUT_VAL); } lockevent_inc(rqspinlock_lock_timeout); - goto release; + goto err_release_node; } /* @@ -451,5 +602,11 @@ int __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val) */ __this_cpu_dec(rqnodes[0].mcs.count); return ret; +err_release_node: + trace_contention_end(lock, ret); + __this_cpu_dec(rqnodes[0].mcs.count); +err_release_entry: + release_held_lock_entry(); + return ret; } EXPORT_SYMBOL(resilient_queued_spin_lock_slowpath); From patchwork Mon Mar 3 15:22:53 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kumar Kartikeya Dwivedi X-Patchwork-Id: 13999031 X-Patchwork-Delegate: bpf@iogearbox.net Received: from mail-wr1-f67.google.com (mail-wr1-f67.google.com [209.85.221.67]) (using TLSv1.2 
with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 767242144DA; Mon, 3 Mar 2025 15:23:29 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.221.67 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1741015411; cv=none; b=L74UweeZsizcJtdM2e2TVeXp8HbGXOpfQiXppnaN/EVqBiHbiAGfGGW5VItyvce7q7zFSZy4aGSeez7Tqi2l6dPO0Nx6deIzdQbJSYsotSsL0Nl50EdqvFJZhI1hnY++2xAHZ4Kn4dIECvDyBohgsLhvxdd29UYZJ4ZucUB4uTw= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1741015411; c=relaxed/simple; bh=o1MKDNUhE9EFH4RwMPASaZvJKuJrMCZ8uocCQm849Ik=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=r8K1Ci0Qf+JbVPG7rNJA3eEamLwcvhem4/r4xxZ42QxukbL+Eq6bIz2atjjxlL+5LPVCFhpTdTH2FAmwN8qoQHq7YM5hMWb+nC0spNXikebfxtcIy1+KFaq5IixgPVIsPdQVuVfLlb7hHQme39Bpx7vSCasCbSKD0ssupgU/GPQ= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=if2UKnKk; arc=none smtp.client-ip=209.85.221.67 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="if2UKnKk" Received: by mail-wr1-f67.google.com with SMTP id ffacd0b85a97d-390ec7c2cd8so2125332f8f.1; Mon, 03 Mar 2025 07:23:29 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1741015407; x=1741620207; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=Narwm03ehkHlcNbrRom2a3WOE/jJ7XHKhd4vuGjJYfQ=; b=if2UKnKkmKtTN+nRd8KBMZL9tSyE2nMh+MkKabwRqw6KdaCqGAyZoNWiw1CK2x5T+o O3b3EDVoq96b/fVgqHyH0YFJ86YAcOZh0giiN6vx9XhMKyZUrTtgXQ1yTn50+8jqTMfl K4rrPzkA1mEAxTQDHW1gLTay4rDwfAUVP6SNDkzsdI48weZL9J6VuTxS67fe6O907LwJ y7UtDts0jga0c8zhJ8NskTuolRfwEgWyMhmDzb8cNIs3UiWh8ALOYrLI6ZhE1IAQVtK2 3Rz4QxM8BNPMSoqbdJni1pxvUi44shZuu2IzltI2eGlVFjkuCH8h7tR7dcolKn75/bu8 3u6g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1741015407; x=1741620207; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=Narwm03ehkHlcNbrRom2a3WOE/jJ7XHKhd4vuGjJYfQ=; b=gcBgEKjbq5qGU6kBI564fZB4I1N1rM4cwOzo2WN1c6y6AyZdFVakxRK2Lb7OXjjR+7 eoiOdj4DseEiDL8Bxf1T9Z/7z5sXdcXFPHQjgS/Yao3yXzcqES9hrDMykz9aZBuvLRuj R6MrU+i9d9KoGecGtXhPCs5hnIDFiyilwPiVkuBp6zlJoZgODaj9ieSnX7CNd8xCAb6f hOErFqNYcSdTYk45HmxxvuCHJ9YWxkBNZ7F8RiABe0r6wpbVj4SG0v04EnOhxNSwDL7s VgqBJc1W/D+e07DvY2l1S9sBVUMMVweKVynnlDtPTk7M8YTOCw0fdXl9Batluzfnclta 42Og== X-Forwarded-Encrypted: i=1; AJvYcCUC4uCpoNSbC0tjc5dwu/ld1ijwxCbKT1aJzSo6J127pa8cem8nweSUtHzEZqyxNktOuQca8ib7CzFUPTQ=@vger.kernel.org X-Gm-Message-State: AOJu0YxuSdGDWoaZRgZjLij3I1V5KJg6ksW2Unf2SfsfJsi3RSkpSWFX cmKuypz6tsbIk17vIr5O1+iMfoqPc1JmM+g3sTlAE+AdwfYZ7tnBSWlT4/eJZs4= X-Gm-Gg: ASbGncsd7frVPbtj1TvU+puz76lPdgONbTsZ6rHoue/CkpCICjevifreYQ+k8H60lCe 0LH+k2uf1I+d96kCludPzlFfvn3q2Kqc0bUv3WVMUrVbGMK+bAE0yEtmNoOUAOayAcrIrSkOmi6 Yv6LF1e+czDUOXQPjnLm7fdahMy759XRUiSl8FTmO9fiKSk5exGwcY0qE3sCQNgN3HkbGZKMPKY 
WERbJTbeDnDsYjY/nC4sqkd6OHFuXNlZkIq83jxhoj+WHfxlck3zYGuBv9yZUETlaB/k9O4HijT QglEiE/bfClGSMdbebIQNIOJwv3jTdw9mCQ= X-Google-Smtp-Source: AGHT+IHQPjApFZs3HhsmlujTPNCDLSejJ0ltemem59qnL0CWos33mq2nn38IumeJ2bsiNZQ67lldHQ== X-Received: by 2002:a05:6000:1f8d:b0:390:fb37:1bd with SMTP id ffacd0b85a97d-390fb370470mr6319805f8f.46.1741015407243; Mon, 03 Mar 2025 07:23:27 -0800 (PST) Received: from localhost ([2a03:2880:31ff:4e::]) by smtp.gmail.com with ESMTPSA id ffacd0b85a97d-390e485db82sm14509941f8f.88.2025.03.03.07.23.26 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 03 Mar 2025 07:23:26 -0800 (PST) From: Kumar Kartikeya Dwivedi To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org Cc: Linus Torvalds , Peter Zijlstra , Will Deacon , Waiman Long , Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , Martin KaFai Lau , Eduard Zingerman , "Paul E. McKenney" , Tejun Heo , Barret Rhoden , Josh Don , Dohyun Kim , linux-arm-kernel@lists.infradead.org, kkd@meta.com, kernel-team@meta.com Subject: [PATCH bpf-next v3 13/25] rqspinlock: Add a test-and-set fallback Date: Mon, 3 Mar 2025 07:22:53 -0800 Message-ID: <20250303152305.3195648-14-memxor@gmail.com> X-Mailer: git-send-email 2.43.5 In-Reply-To: <20250303152305.3195648-1-memxor@gmail.com> References: <20250303152305.3195648-1-memxor@gmail.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=4064; h=from:subject; bh=o1MKDNUhE9EFH4RwMPASaZvJKuJrMCZ8uocCQm849Ik=; b=owEBbQKS/ZANAwAIAUzgyIZIvxHKAcsmYgBnxcWXyKg4wB+6V7Ple/tkprfUEWpR0Hl6dgveKapl mmrrb5WJAjMEAAEIAB0WIQRLvip+Buz51YI8YRFM4MiGSL8RygUCZ8XFlwAKCRBM4MiGSL8RyrORD/ 9X6zKnZ50odRHn/aTMB5eerXFYbZYCFEA0zhWN7wOz/mXTLzA+cu1Kx16cILZLnewteP+spWp6TfeG oO/uEEbmEQvn0We0uAupBYIt8VRyjDPQs99CWCu4QGxc3xxkMS5+mQTJ2ons155TFMD85j8Vt7WBeN Gp3Q7X7RM7uzhQ4y+EgVeTAa8/W6GwRYnV/14RG89yraqvQIzLJtA/BHAHhJwp3um17ldvOg30GGTG ljTKl/gZXoBY1wmcLDkkukm03I2d3VP267EdahhXrctgXkA3dPZCyVzJg/lHJF41jU28pRqEqdU1fZ NB5PW9KYd1XzxvhBhwPdgXR2qunX1i8LP/jJn9SDb6JtxKugtHFL9y+ODarv6pIP3Of+NZLd7dwdtO cLPFwRPp5fOfnOfrSQBkPYD+CksR9o5ApDj8rMEimbPEIeKSfuHi4JgiExIi0TmcmH2AHxW+M+Jons MY9cSj6MOAtu7gTUDpDFMkHrkip9TL0zZhiW9CJHxYfOVZLWIFlgX00dUWHbzWPbhiRM6gc8tnmL4e gYYeCoy81wYg2HhX5/dCjB6beylFJO8aHhgT+X4oqKJx85F7LbFa6OameusA/gU2YMVg/nDYQKZMxs TIUKm5h4/2zgx4el81Kce6UG+hDPxefexA5tXAA1BD6DWZNiRUU8p7hniQGw== X-Developer-Key: i=memxor@gmail.com; a=openpgp; fpr=4BBE2A7E06ECF9D5823C61114CE0C88648BF11CA X-Patchwork-Delegate: bpf@iogearbox.net Include a test-and-set fallback when queued spinlock support is not available. Introduce a rqspinlock type to act as a fallback when qspinlock support is absent. Include ifdef guards to ensure the slow path in this file is only compiled when CONFIG_QUEUED_SPINLOCKS=y. Subsequent patches will add further logic to ensure fallback to the test-and-set implementation when queued spinlock support is unavailable on an architecture. Unlike other waiting loops in rqspinlock code, the one for test-and-set has no theoretical upper bound under contention, therefore we need a longer timeout than usual. Bump it up to a second in this case. 
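Since the fallback can now fail instead of spinning forever, callers are expected to check its return value. A minimal, hypothetical usage sketch follows; the lock and function names are placeholders and not part of the patch, and the unlock helper is introduced elsewhere in the series.

	/* Hypothetical usage sketch, assuming the generic header added earlier in the series. */
	#include <asm-generic/rqspinlock.h>

	static rqspinlock_t demo_lock;	/* zero-initialized, i.e. unlocked */

	static int demo_critical_section(void)
	{
		int ret;

		ret = resilient_tas_spin_lock(&demo_lock);
		if (ret)	/* -ETIMEDOUT on a stall, -EDEADLK if a deadlock was detected */
			return ret;

		/* ... short critical section ... */

		/*
		 * Unlock via the series' unlock helper (a release store on the lock
		 * word plus removal of the held-locks entry); not shown in this patch.
		 */
		return 0;
	}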
Signed-off-by: Kumar Kartikeya Dwivedi --- include/asm-generic/rqspinlock.h | 17 ++++++++++++ kernel/locking/rqspinlock.c | 45 ++++++++++++++++++++++++++++++-- 2 files changed, 60 insertions(+), 2 deletions(-) diff --git a/include/asm-generic/rqspinlock.h b/include/asm-generic/rqspinlock.h index b685f243cf96..b30a86abad7b 100644 --- a/include/asm-generic/rqspinlock.h +++ b/include/asm-generic/rqspinlock.h @@ -12,11 +12,28 @@ #include #include #include +#ifdef CONFIG_QUEUED_SPINLOCKS +#include +#endif + +struct rqspinlock { + union { + atomic_t val; + u32 locked; + }; +}; struct qspinlock; +#ifdef CONFIG_QUEUED_SPINLOCKS typedef struct qspinlock rqspinlock_t; +#else +typedef struct rqspinlock rqspinlock_t; +#endif +extern int resilient_tas_spin_lock(rqspinlock_t *lock); +#ifdef CONFIG_QUEUED_SPINLOCKS extern int resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val); +#endif /* * Default timeout for waiting loops is 0.25 seconds diff --git a/kernel/locking/rqspinlock.c b/kernel/locking/rqspinlock.c index ce2bc0a85a07..27ab4642f894 100644 --- a/kernel/locking/rqspinlock.c +++ b/kernel/locking/rqspinlock.c @@ -21,7 +21,9 @@ #include #include #include +#ifdef CONFIG_QUEUED_SPINLOCKS #include +#endif #include #include #include @@ -29,9 +31,12 @@ /* * Include queued spinlock definitions and statistics code */ +#ifdef CONFIG_QUEUED_SPINLOCKS #include "qspinlock.h" #include "lock_events.h" #include "rqspinlock.h" +#include "mcs_spinlock.h" +#endif /* * The basic principle of a queue-based spinlock can best be understood @@ -70,8 +75,6 @@ * */ -#include "mcs_spinlock.h" - struct rqspinlock_timeout { u64 timeout_end; u64 duration; @@ -262,6 +265,42 @@ static noinline int check_timeout(rqspinlock_t *lock, u32 mask, */ #define RES_RESET_TIMEOUT(ts, _duration) ({ (ts).timeout_end = 0; (ts).duration = _duration; }) +/* + * Provide a test-and-set fallback for cases when queued spin lock support is + * absent from the architecture. + */ +int __lockfunc resilient_tas_spin_lock(rqspinlock_t *lock) +{ + struct rqspinlock_timeout ts; + int val, ret = 0; + + RES_INIT_TIMEOUT(ts); + grab_held_lock_entry(lock); + + /* + * Since the waiting loop's time is dependent on the amount of + * contention, a short timeout unlike rqspinlock waiting loops + * isn't enough. Choose a second as the timeout value. + */ + RES_RESET_TIMEOUT(ts, NSEC_PER_SEC); +retry: + val = atomic_read(&lock->val); + + if (val || !atomic_try_cmpxchg(&lock->val, &val, 1)) { + if (RES_CHECK_TIMEOUT(ts, ret, ~0u)) + goto out; + cpu_relax(); + goto retry; + } + + return 0; +out: + release_held_lock_entry(); + return ret; +} + +#ifdef CONFIG_QUEUED_SPINLOCKS + /* * Per-CPU queue node structures; we can never have more than 4 nested * contexts: task, softirq, hardirq, nmi. 
@@ -610,3 +649,5 @@ int __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val) return ret; } EXPORT_SYMBOL(resilient_queued_spin_lock_slowpath); + +#endif /* CONFIG_QUEUED_SPINLOCKS */ From patchwork Mon Mar 3 15:22:54 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kumar Kartikeya Dwivedi X-Patchwork-Id: 13999032 X-Patchwork-Delegate: bpf@iogearbox.net Received: from mail-wm1-f67.google.com (mail-wm1-f67.google.com [209.85.128.67]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 65EB922D7A3; Mon, 3 Mar 2025 15:23:30 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.128.67 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1741015413; cv=none; b=Bsa259LOdbz6/dah/kTmNpwVEGrbtQcE83gAVPBSKNMxxm5WAk4SPfOagRSGVE6vPbWTcuHqPUL9Ue3WbKkRptDARgoItXTYkUxw5QB7sRr0HxB5mtRxA9Axx22Pg8manHXjeGaYdu5Htz/o8mR15AL/ZOFKxhUS8w4tiqrjE8A= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1741015413; c=relaxed/simple; bh=SCTiw9WUUkbOGd1OOljQ8eDMmea9HElqFJBZYwx5dAE=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=asV59swEBzK4V2MdrL7XvJxH6Zae+lFT29zHPjeGwJPICH5e0Mh4JvdJx6wl1bqME+zZo9Wc6lf8U3Otmq8+8vh867xNI62MxpoUtYrqhfWTlg30n0mezY4Uxt170yr6wHmoJD1DOQx4qdxaQTVSRujphtGUhseKl0InMPLFNQ8= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=Fe2xwk52; arc=none smtp.client-ip=209.85.128.67 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="Fe2xwk52" Received: by mail-wm1-f67.google.com with SMTP id 5b1f17b1804b1-4394a0c65fcso49255555e9.1; Mon, 03 Mar 2025 07:23:30 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1741015408; x=1741620208; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=hbiXkX+sz+mKWhiZ2JD9Yjh7XLLMoN8o2kVSbNXCy0Y=; b=Fe2xwk522exit2mCEy2efsBasggJQNI7xwjAK9hBxseMbR/fvkvXQiC9IosGpavzX+ ysav9vOyN5Hgo8wlLGpFH5wNadK5JnYWOWww9paSlSmgEZv+BJXv+8zPVe9xx3cjQXIY lAzq8hEGZmFadaZNGcAzYYnbOZNso7eKKoWpBWx9O/bRBFaIItZmGnM7ZPlfs8ExKvXa 1Bjzrxyi4HDfISeIFKzJFvGHuyf8eQoiiGZLRnsJnc4ujJag7Cu549I2mp/DJcJjzkDr z80p84YJoeu3ukc+jJnwGKx8pqIG4bieUst8BKaANa9Vt6nmddYBVnle15ng0LsbpCfX 2E0A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1741015408; x=1741620208; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=hbiXkX+sz+mKWhiZ2JD9Yjh7XLLMoN8o2kVSbNXCy0Y=; b=PcsX0M8PkXazU0kpddiMOqcg7uRnheQnUDsGEgAPi7C6G70EDHUlxFDXBpqFePffr+ V6Z1N+rGkisXxMtLTMEFwRkZbP2vYIrxD75XY97fHc+AUp4T5f3gPgsql1vMa1Wz7NNE 1gtfzKz7WJkRYWOp+3Q8g7nR+ikbnY2CpQrJhnvfW40nzzCXRSEVNWZEBhWFCXdcD1uH rHPr0EPOwMy6waJqjEKaxro7Jm53wtpF1pvoZC4itYBWJZ4/5nlqtDaXxssP/Az7YND6 
6NQ1kVgsd1jEPLI9APVxQKZ0UlgOgwlwy32X+OAQGSaahz3Azy/WGHiRKzSdLd5f6ScW lvvQ== X-Forwarded-Encrypted: i=1; AJvYcCWhuh8dbieIzPoQglfKHfSDSHkAiCtGk/JI763W8wFFhVJ2ps9Vp4Cm+Sm4ba/iyXhynQUfekUlWGScZPo=@vger.kernel.org X-Gm-Message-State: AOJu0Yxv+WelQgsT4liVx49WKifQuOeo/pM/6+kgH/vjOwzBZ/rfBMex FHz9AWWoxanuEePc9l8urRUftGgEy3hntiliUhYK6c2KnsIXnihtfc9POrwbtbQ= X-Gm-Gg: ASbGncsfmvG1ScNKacg5H+HT0xCLS4x+ecb1VeZMBsX8nqs/+R80Sja/RHyXjzOVgvF vmGicZRTa1LQ3GStM5J9pGXpmQliBV40VrRCX/lk/OmfDsFCg2stfAk3hfhkuJvP1JMIEJbJOVe pI+toWk4UsSBbZPbSOjte9e3CWxvqpAzF680fEie9buorXAgMfg6D5303uw9k4PpwK1WHDSsjvM 3mflQ8gQlXZPwMerekFuQR/rP8k+HpYiJ8j4T8hfaDSXcZD/tM/EGiXoE48nWz9XNfjK3uGG+0/ Bu6nb/cjQ4fm+ai42raOYhEATesnxJAxi0Y= X-Google-Smtp-Source: AGHT+IGKz1oo1UHRACNo+L3weWdPcL06paS7sy9ejxox0qqfekJgP8kl/qKZCrzxii+jUg8vEiUMxA== X-Received: by 2002:a05:600c:4685:b0:439:a0a3:a15 with SMTP id 5b1f17b1804b1-43ba67045camr146732535e9.14.1741015408419; Mon, 03 Mar 2025 07:23:28 -0800 (PST) Received: from localhost ([2a03:2880:31ff:74::]) by smtp.gmail.com with ESMTPSA id ffacd0b85a97d-390e485d8e4sm14531679f8f.85.2025.03.03.07.23.27 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 03 Mar 2025 07:23:27 -0800 (PST) From: Kumar Kartikeya Dwivedi To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org Cc: Linus Torvalds , Peter Zijlstra , Will Deacon , Waiman Long , Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , Martin KaFai Lau , Eduard Zingerman , "Paul E. McKenney" , Tejun Heo , Barret Rhoden , Josh Don , Dohyun Kim , linux-arm-kernel@lists.infradead.org, kkd@meta.com, kernel-team@meta.com Subject: [PATCH bpf-next v3 14/25] rqspinlock: Add basic support for CONFIG_PARAVIRT Date: Mon, 3 Mar 2025 07:22:54 -0800 Message-ID: <20250303152305.3195648-15-memxor@gmail.com> X-Mailer: git-send-email 2.43.5 In-Reply-To: <20250303152305.3195648-1-memxor@gmail.com> References: <20250303152305.3195648-1-memxor@gmail.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=3277; h=from:subject; bh=SCTiw9WUUkbOGd1OOljQ8eDMmea9HElqFJBZYwx5dAE=; b=owEBbQKS/ZANAwAIAUzgyIZIvxHKAcsmYgBnxcWXfTMwb5GMWPgrjqrFJA/0gyw9UdNiWN2qmLrL O0hijiGJAjMEAAEIAB0WIQRLvip+Buz51YI8YRFM4MiGSL8RygUCZ8XFlwAKCRBM4MiGSL8Ryu1NEA CTUuGHPTrmc3ouIq7xk/7vEotpGLUGgeucb8GQwgQo6rcM9OI5bBvMW9JZfYy/los+WuIlwT/5IdCo AYY6qGCzJtoH464Llu8bFT1g6BAyOhZoDkX30C4FLyMD7aeKOCsD4w4X1KQJ0ITOJjLgSiO02ZBcsk G+vSU3qCtMZHbeLI9Q3/47WprV1GDIVEhOPmoC4z80iJULvFyEEv1KPAsKo+TyvXw2oYlSbxG+1gBi 1wmztdcoMQy6Y25bbcqgDlSLphn/5jKPVmwA/PqJCSlHPCcKKDENr5iG+htXq3ifKoKap5E/XN3j1P jGGqMu1j7i2P7YfUmNzvcqd+995DkNkOql0XTtz5YXJU70HWgDDjw2yhkdCHu8HgF+xp8nM1Y5yZHx UhdzTefDYMPL5JVyOInrepgDvHYQkN52ytGBNLKZt6tBQmXSeSSKJxhmpHya3TBUlZ78CPzq9vCVut qCdY7gXM8RU5a13RYFgs3bRptzVLJiuo+wnUZRwM7Cob+PtqcIIrMV1tm6IdmQBmSY8EzRJM447VxP ERz7W85ZSGfHanbBjbzfofA1Q57J4Au8euTPc7e/mx0ZR2sk/gqk632LbaDQKoQcxFXXSO+DEafFhX F8BMJap/MTZa0kNF5S+kGk3gqFtzwIqXvRBWM8HCOLgpW8KCnOFSK/bHk0mw== X-Developer-Key: i=memxor@gmail.com; a=openpgp; fpr=4BBE2A7E06ECF9D5823C61114CE0C88648BF11CA X-Patchwork-Delegate: bpf@iogearbox.net We ripped out PV and virtualization related bits from rqspinlock in an earlier commit, however, a fair lock performs poorly within a virtual machine when the lock holder is preempted. As such, retain the virt_spin_lock fallback to test and set lock, but with timeout and deadlock detection. We can do this by simply depending on the resilient_tas_spin_lock implementation from the previous patch. 
We don't integrate support for CONFIG_PARAVIRT_SPINLOCKS yet, as that requires more involved algorithmic changes and introduces more complexity. It can be done when the need arises in the future. Signed-off-by: Kumar Kartikeya Dwivedi --- arch/x86/include/asm/rqspinlock.h | 33 +++++++++++++++++++++++++++++++ include/asm-generic/rqspinlock.h | 14 +++++++++++++ kernel/locking/rqspinlock.c | 3 +++ 3 files changed, 50 insertions(+) create mode 100644 arch/x86/include/asm/rqspinlock.h diff --git a/arch/x86/include/asm/rqspinlock.h b/arch/x86/include/asm/rqspinlock.h new file mode 100644 index 000000000000..24a885449ee6 --- /dev/null +++ b/arch/x86/include/asm/rqspinlock.h @@ -0,0 +1,33 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _ASM_X86_RQSPINLOCK_H +#define _ASM_X86_RQSPINLOCK_H + +#include + +#ifdef CONFIG_PARAVIRT +DECLARE_STATIC_KEY_FALSE(virt_spin_lock_key); + +#define resilient_virt_spin_lock_enabled resilient_virt_spin_lock_enabled +static __always_inline bool resilient_virt_spin_lock_enabled(void) +{ + return static_branch_likely(&virt_spin_lock_key); +} + +#ifdef CONFIG_QUEUED_SPINLOCKS +typedef struct qspinlock rqspinlock_t; +#else +typedef struct rqspinlock rqspinlock_t; +#endif +extern int resilient_tas_spin_lock(rqspinlock_t *lock); + +#define resilient_virt_spin_lock resilient_virt_spin_lock +static inline int resilient_virt_spin_lock(rqspinlock_t *lock) +{ + return resilient_tas_spin_lock(lock); +} + +#endif /* CONFIG_PARAVIRT */ + +#include + +#endif /* _ASM_X86_RQSPINLOCK_H */ diff --git a/include/asm-generic/rqspinlock.h b/include/asm-generic/rqspinlock.h index b30a86abad7b..f8850f09d0d6 100644 --- a/include/asm-generic/rqspinlock.h +++ b/include/asm-generic/rqspinlock.h @@ -35,6 +35,20 @@ extern int resilient_tas_spin_lock(rqspinlock_t *lock); extern int resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val); #endif +#ifndef resilient_virt_spin_lock_enabled +static __always_inline bool resilient_virt_spin_lock_enabled(void) +{ + return false; +} +#endif + +#ifndef resilient_virt_spin_lock +static __always_inline int resilient_virt_spin_lock(rqspinlock_t *lock) +{ + return 0; +} +#endif + /* * Default timeout for waiting loops is 0.25 seconds */ diff --git a/kernel/locking/rqspinlock.c b/kernel/locking/rqspinlock.c index 27ab4642f894..b06256bb16f4 100644 --- a/kernel/locking/rqspinlock.c +++ b/kernel/locking/rqspinlock.c @@ -345,6 +345,9 @@ int __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val) BUILD_BUG_ON(CONFIG_NR_CPUS >= (1U << _Q_TAIL_CPU_BITS)); + if (resilient_virt_spin_lock_enabled()) + return resilient_virt_spin_lock(lock); + RES_INIT_TIMEOUT(ts); /* From patchwork Mon Mar 3 15:22:55 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kumar Kartikeya Dwivedi X-Patchwork-Id: 13999034 X-Patchwork-Delegate: bpf@iogearbox.net Received: from mail-wm1-f65.google.com (mail-wm1-f65.google.com [209.85.128.65]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id AED5A22E402; Mon, 3 Mar 2025 15:23:31 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.128.65 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1741015414; cv=none; b=hMbKWq04gZI43a/xoB6N/jt5+q7HnPPMimyEANKu8+ueZSmz0G96TkNygq8VgcK19uC+OaiUZN7VD0yqc89Ip9NzcDld1oSDjVz06eONIin392iP4kFBCzXTHqhYVnekD2FtMj0ggNohKZM1X5C2LoBwfXzJlU0zastE5AyERLo= 
ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1741015414; c=relaxed/simple; bh=XzehY2DPwsFHlB5VuQsc3kNRx10DtgHBazvwnaqKM4M=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=ZtMh2OLNoAjAE3zIEnDpcSqBAzEXqWBSlUtlhu+AIJL8Jo3PL//f5GgNFM/ZUqFxZUYxHbJ7xlU9RLrLfPeRQMCQ4kyXa0Z3CI800RkUrU/ZbjeDSojj/UR73C+1s+Og+0yvwkm/hG3QfdAL/bHtO7ydA0hciGLQPuy3UhN0qwE= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=OmHHiovI; arc=none smtp.client-ip=209.85.128.65 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="OmHHiovI" Received: by mail-wm1-f65.google.com with SMTP id 5b1f17b1804b1-43995b907cfso29199445e9.3; Mon, 03 Mar 2025 07:23:31 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1741015409; x=1741620209; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=Oib9X6F+jD32qthRhftfxOrIXcfGV0ymse+3a+fep28=; b=OmHHiovIYiqBqXja5o/ADuRyU/yF2u8LZyen7tD6rnTSZSpd0zeU3u0j8gOi0c8B4r rXfMjn6JNaMnMqm1WS89uFz+Fw5MNS2omOxIBUdbxmHkwRNbQfeYmCGvflsgybe4B33p 133FUlXGCCI6bzHWz7znMNdFFEH4TdMc94VOdSwmJwpKssQJEfqJAhTC3omSyES10Ws5 P36yD3/5ALeMX4aPR0no1mOcHPy94pHLRziDdewgdjR2eKtE7wi50lgw01ko9J8SSU5D B58U1zd/6EXJs0TgtbzhM95Gf1SQ16co2xlnPKpFvuvc5VWkWtNrCci4PhICRs/25Iao NngQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1741015409; x=1741620209; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=Oib9X6F+jD32qthRhftfxOrIXcfGV0ymse+3a+fep28=; b=VToX3WE3WC631tBwXCx88A6wfTLjx0Ia7Fx/K04vS5INXI4xZ84B567Zas7USsbUS1 vMIsDwt+qof7cI8ofvA4hrBCatfQ50qSksabxowVMke5Y7e8CyPLQvs8Q9WG3VQoghSp 373CbFxLoZRorly2bjlcY6c1j2+W0DIHPmNSws5q58wvPW4tdseue04Hf+3MO71C5nNP NGT2dcw3O8YRo8wfUO25TXeZJ1fkFnZVY2+begZptCM9vft8NxfqltDE4DUP1HWD5SFE KaVGOaEk81gaw3GUYRlUmqgsv24bO3PbIDCBOD1N0HpuOiJ5LUg7vu03p22JLs6lXeld TJ2A== X-Forwarded-Encrypted: i=1; AJvYcCWEw2AC+RhQCzWoxv/Z6EA3vahXuNIXQKteZFCVW/AmL3jCmJ2aOuC8I1kca0vk94skpVKBipBmWsRvKts=@vger.kernel.org X-Gm-Message-State: AOJu0YzWRVHw9i+rONpLzwp/xDqjca0WvLf5y59M9za9xSmshSRfSduc IX9JCUhvLb8w8ub45zuode+MYo4UYwQUTAyK8FXfPvV2ScL/vFUQOkptUpIQSJE= X-Gm-Gg: ASbGncs+UYqzICcPGwkXz8CCJURjxtHhZgcvb0+R8qkjCUv69jJxmjEMZ40HwQVByx4 WkvPBZVUJcvfq2B9g+uzS3GxXNqjR4ngK5sk0X5HKaEdztabQsBD0Fk/fmuUb5NaZxaIIO3lgij X/gWbjF7/YGHgB7jCh3iq8zSiqibermIbLlhhDlAlOekkCq20Ar7YUgLfL2lmV5AApj/vXpB3Z5 bktSLxHunlgNasv+qvMIzNhkeOiyvXTSOoCGNVXQedG6DwTUigbmSjMOOhUtxCGkplJuVxhpysV tTEGM4H9++JSO+Y7cpugCGpVgakes6vs2yI= X-Google-Smtp-Source: AGHT+IHK0E0NT4rN+IBrKPzoxmISlpoRx++6wRPzHRPKMbke27YkpGiSQnpSIDIa7J8yfk3LY6QLQA== X-Received: by 2002:a05:600c:4e8b:b0:439:a1c7:7b29 with SMTP id 5b1f17b1804b1-43ba6703c35mr123515565e9.17.1741015409574; Mon, 03 Mar 2025 07:23:29 -0800 (PST) Received: from localhost ([2a03:2880:31ff:44::]) by smtp.gmail.com with ESMTPSA id ffacd0b85a97d-390e485db82sm14510061f8f.88.2025.03.03.07.23.28 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 
bits=256/256); Mon, 03 Mar 2025 07:23:29 -0800 (PST) From: Kumar Kartikeya Dwivedi To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org Cc: Linus Torvalds , Peter Zijlstra , Will Deacon , Waiman Long , Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , Martin KaFai Lau , Eduard Zingerman , "Paul E. McKenney" , Tejun Heo , Barret Rhoden , Josh Don , Dohyun Kim , linux-arm-kernel@lists.infradead.org, kkd@meta.com, kernel-team@meta.com Subject: [PATCH bpf-next v3 15/25] rqspinlock: Add helper to print a splat on timeout or deadlock Date: Mon, 3 Mar 2025 07:22:55 -0800 Message-ID: <20250303152305.3195648-16-memxor@gmail.com> X-Mailer: git-send-email 2.43.5 In-Reply-To: <20250303152305.3195648-1-memxor@gmail.com> References: <20250303152305.3195648-1-memxor@gmail.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=2125; h=from:subject; bh=XzehY2DPwsFHlB5VuQsc3kNRx10DtgHBazvwnaqKM4M=; b=owEBbQKS/ZANAwAIAUzgyIZIvxHKAcsmYgBnxcWXYD8EVbB4Iydj+LTkfn6cdW8csGFE6TObIbKx E0nmgrCJAjMEAAEIAB0WIQRLvip+Buz51YI8YRFM4MiGSL8RygUCZ8XFlwAKCRBM4MiGSL8RyrkJEA Ct9v2S2uZ5k2kQeAFIPGD+gS68B4nVheOj37Umc8z+EwrGBzZ82mY8MkklYthiJE2AQBdHrtWAwoqG tZbTvmL6+lzWfEYRZt2xi60n3hTJMO3dlntsG+nTWnBWpIQZX5PNvzwPOl1/8OQMO+69prEjQItSyu eMEd38+aF8KI6iN85h1WM0KeOubpLLpUWfdNufr7Gsq0Oi1qImSnF/ilMPpOkoWQnvjxBU2ZKp75De P5Whx7I1WBvsuPW2lX4rzXBrMWbAWPQpojlxod4Ls4y3A61yRekC7LremgMkSs6gvc7Tnw/Z9lQWCU CgtczCyFYGLMz+mNBIEEbLqTQdgdrvV3fFCY78DcBB7xUvGlqD6CIZGr6DZljaV2SoWRHRCmv5Ft8H gVx7H5tqZgvWVo2SoMK4dnjxpn4/df+FCKxss6T2T6c60eFLmTgvI0DME2DNvOKFHgBhQpy6p3Cxsl H4g6MJFPnCJsqycg5cOVJKjPPgeoTGZlXrmAyPL60whVruu8QiWcCAFv4taoKU7M5yTeAGfAYdFU8l 6Ws3FsgbVuQ7DFWtfrLU7462oedcdeEAj81i6KLZmyfucZ073rF8/ghmFeBJtrIiheHs7mCqctGJjH +MDX6Nid6DYpDuM6VXjrvhhN8QfGnRpqsnGqUV9O6Cf6rbVzKauRdtvFhKgA== X-Developer-Key: i=memxor@gmail.com; a=openpgp; fpr=4BBE2A7E06ECF9D5823C61114CE0C88648BF11CA X-Patchwork-Delegate: bpf@iogearbox.net Whenever a timeout and a deadlock occurs, we would want to print a message to the dmesg console, including the CPU where the event occurred, the list of locks in the held locks table, and the stack trace of the caller, which allows determining where exactly in the slow path the waiter timed out or detected a deadlock. Splats are limited to atmost one per-CPU during machine uptime, and a lock is acquired to ensure that no interleaving occurs when a concurrent set of CPUs conflict and enter a deadlock situation and start printing data. Later patches will use this to inspect return value of rqspinlock API and then report a violation if necessary. 
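As a hedged illustration of the intended usage (not part of this patch: the hook point, error value, and message text below are assumptions, the real call sites are wired up by later patches in the series), a detected violation in the slow path could be fed into the helper roughly as follows:

	int ret;

	/*
	 * Hypothetical sketch only: assume check_deadlock() returns a negative
	 * error (e.g. -EDEADLK) when it detects a violation, and hand the
	 * offending lock to the reporting helper introduced below.
	 */
	ret = check_deadlock(lock, mask, ts);
	if (ret)
		rqspinlock_report_violation("rqspinlock: deadlock or timeout detected\n", lock);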
Signed-off-by: Kumar Kartikeya Dwivedi --- kernel/locking/rqspinlock.c | 29 +++++++++++++++++++++++++++++ 1 file changed, 29 insertions(+) diff --git a/kernel/locking/rqspinlock.c b/kernel/locking/rqspinlock.c index b06256bb16f4..3b4fdb183588 100644 --- a/kernel/locking/rqspinlock.c +++ b/kernel/locking/rqspinlock.c @@ -195,6 +195,35 @@ static noinline int check_deadlock_ABBA(rqspinlock_t *lock, u32 mask, return 0; } +static DEFINE_PER_CPU(int, report_nest_cnt); +static DEFINE_PER_CPU(bool, report_flag); +static arch_spinlock_t report_lock; + +static void rqspinlock_report_violation(const char *s, void *lock) +{ + struct rqspinlock_held *rqh = this_cpu_ptr(&rqspinlock_held_locks); + + if (this_cpu_inc_return(report_nest_cnt) != 1) { + this_cpu_dec(report_nest_cnt); + return; + } + if (this_cpu_read(report_flag)) + goto end; + this_cpu_write(report_flag, true); + arch_spin_lock(&report_lock); + + pr_err("CPU %d: %s", smp_processor_id(), s); + pr_info("Held locks: %d\n", rqh->cnt + 1); + pr_info("Held lock[%2d] = 0x%px\n", 0, lock); + for (int i = 0; i < min(RES_NR_HELD, rqh->cnt); i++) + pr_info("Held lock[%2d] = 0x%px\n", i + 1, rqh->locks[i]); + dump_stack(); + + arch_spin_unlock(&report_lock); +end: + this_cpu_dec(report_nest_cnt); +} + static noinline int check_deadlock(rqspinlock_t *lock, u32 mask, struct rqspinlock_timeout *ts) { From patchwork Mon Mar 3 15:22:56 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kumar Kartikeya Dwivedi X-Patchwork-Id: 13999033 X-Patchwork-Delegate: bpf@iogearbox.net Received: from mail-wm1-f68.google.com (mail-wm1-f68.google.com [209.85.128.68]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id BBFA222F39C; Mon, 3 Mar 2025 15:23:32 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.128.68 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1741015414; cv=none; b=lPmo1nEou5ToT3KmkeM6ua2qP84ORePFEaMgPX+2U9FG2q9Mh8aaMY+HJ3HG6OLKH8aP9aX/A3xwPb+VHjVNSeHsu8RZz7DQm5btQOeOPa3zbDRFwNAyK7lMqlxLQVWcAo2x3LivhW+z6hHP9HTFY/iOhe+9c1f6BDrdcBMyfjs= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1741015414; c=relaxed/simple; bh=1vN5D7C4tfoaawMuUjp7ZN+fQy3wYGzp7nrXDJHdVAo=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=tbcPSTU82yOuyh4VdiVtBdZEEC2bkcj6uZLApt12b0ca/g8RBikIO3w2bedPNPpv5NXXF0g8bbQr8+kk54o2mKEygUkcTFdtPYmvZEvHZfKJVpijYECRF0a55VW9DRnx14NHHagoK/XFSj9QKtomQ5VE9WYVcVaD5vUw9pplt9Q= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=esHjuS0K; arc=none smtp.client-ip=209.85.128.68 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="esHjuS0K" Received: by mail-wm1-f68.google.com with SMTP id 5b1f17b1804b1-43994ef3872so29737585e9.2; Mon, 03 Mar 2025 07:23:32 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1741015411; x=1741620211; darn=vger.kernel.org; 
h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=eIFRymfR26T1V4pI+hLoHtvRY/1A7N6xjkuyA7wDGHE=; b=esHjuS0KUnRgpTrqDEV5y+TmxAwcmkjCUu9AkzwFK2Pk+WhPBiCOYwT4MrDO6Hs/RL P2YhT+kzmLdmHfP/OF1irzYkQwNiGS4IwX+7wAWa0LIq/LuxP+DE+IpK3sCfLs4UPvzi zdmDYGVgpalFeUBO91K9NxfyrQ3T7MA+O+OiPFPR7CEkdvzudPdoze9VkIJOkFcmx5cs UaHtCg5ebXkSpWYsUP6llSGIkf6652Yn4f51caXIyc1CE5vCXN1ldAh9UE/1b2zh1WZ6 cPW8I3TcW9MfrVoPVAVUumNfB5vVc8josPWR/wjYU6B4oCFlrXt5ZsVyx5Ic9Z/dJI8k xyoA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1741015411; x=1741620211; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=eIFRymfR26T1V4pI+hLoHtvRY/1A7N6xjkuyA7wDGHE=; b=fzG3TadPoQEStJiCIiwzZQc/GMxF3JyfUVUHxHP1ZdlWZxI8h1kbEGqe9M5NlER/Id sK/x7LzMekZ1hnn1hZEt25DD7ClQkmOhZBo7W6vhmslavY8s95kNRzu2DPaBnd6rNYxX Ct8sC43pSEIdKw+aN9CnJ+yJWzRmRMu4QpeedlrMSP7kWa1FBAJY+tOICjwsXJ/lFFwD cEV4TNtoIWB9nIbuhwR7EbtjgslAAfgL10ymArzjujCpdKlyZ36a7xgCckfc5PJAFrji q3mylCp4xgvYvIv8zRC29sFa82RhZ/iwyq1JPYsMEZrOThZJO+sftaU07zFZU9tglBqh u0bg== X-Forwarded-Encrypted: i=1; AJvYcCUdV55WfzqjYBNADJ8hg5UgZhtfO3YNWojbiOlw23sc06Vpooy6C7+LgEzxlOukLwAnSy+1r91cTgH58jU=@vger.kernel.org X-Gm-Message-State: AOJu0YxJLpQHRqMeKVBlwtTPRrenp8NbidG03yhwwfoxcvloJL60xUHE IApFHsk77cvaN8k8RXIp65z0iWwHplA3/+YWbLyrO2pFbZgdh6edwFYcI6JdtOg= X-Gm-Gg: ASbGncvXuQzR4QSBBk1DhmDQ+85RJQvVZuw2+9R3ucw23TtLQnhn8UUpBLYQ/98O/pm iqE/SOE+N+914e+F3Mnr1EgRu2YeOkU0XnIjEzC4JF2usYK/BaUlogbfHlhZKqud+QBF+5Vb5qw 5p3YJu101uuE2P3PDaS84ROaYPYvBQIVRRYD4vqrK3ICKGxDMxkzM7OxBNqih7zUYwvZNM9qV8E okmpSy23si6d3AQDJPXTXr1sg9oalHSC9Sq2pgrNylTC6n0d4f2qugdnqxUmckyoFu4xs2Sz8yB fD83Wo9iNJs+DBZdDgJ25x33xsmocuY6jg== X-Google-Smtp-Source: AGHT+IHhLXGjzf0udnyNXtByFw5OFSp0rn67LfLvjV3YqWbj0wvdktay+P0e8RZUkjfn4MvH3tUgiw== X-Received: by 2002:a5d:5886:0:b0:390:efa5:9f6 with SMTP id ffacd0b85a97d-390efa50b8dmr10698291f8f.51.1741015410709; Mon, 03 Mar 2025 07:23:30 -0800 (PST) Received: from localhost ([2a03:2880:31ff:b::]) by smtp.gmail.com with ESMTPSA id ffacd0b85a97d-390e485dba4sm15037247f8f.92.2025.03.03.07.23.30 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 03 Mar 2025 07:23:30 -0800 (PST) From: Kumar Kartikeya Dwivedi To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org Cc: Linus Torvalds , Peter Zijlstra , Will Deacon , Waiman Long , Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , Martin KaFai Lau , Eduard Zingerman , "Paul E. 
McKenney" , Tejun Heo , Barret Rhoden , Josh Don , Dohyun Kim , linux-arm-kernel@lists.infradead.org, kkd@meta.com, kernel-team@meta.com Subject: [PATCH bpf-next v3 16/25] rqspinlock: Add macros for rqspinlock usage Date: Mon, 3 Mar 2025 07:22:56 -0800 Message-ID: <20250303152305.3195648-17-memxor@gmail.com> X-Mailer: git-send-email 2.43.5 In-Reply-To: <20250303152305.3195648-1-memxor@gmail.com> References: <20250303152305.3195648-1-memxor@gmail.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=3988; h=from:subject; bh=1vN5D7C4tfoaawMuUjp7ZN+fQy3wYGzp7nrXDJHdVAo=; b=owEBbQKS/ZANAwAIAUzgyIZIvxHKAcsmYgBnxcWYD7W48Zy4MCJ4hC7Laq1GOwnd+65hIihP1XlI Lb5c/6OJAjMEAAEIAB0WIQRLvip+Buz51YI8YRFM4MiGSL8RygUCZ8XFmAAKCRBM4MiGSL8RyvneD/ 9CSceH24BxXpHlqBvfIHao0z3PNag/hh6hxWdzMJtXnIupGNQZXotn7KA2u9LloczdUDTDU6i5vyel pyxYLhk/EO2NcmqNvsUhMnV5AQmt3i0G4X/wKGhsKXL1avILr8mSIPCqUG6RFfbDQB7MpnGHT9wsVp cQasSdKg8ZNoIJzBhiV64wzVF1LDmN5meStwPmrIfdXHD02M3uHizakkrT2k6ueYJNn2a4tr/vVKoU iRwh0w3G9w3jgipmvotAi+BjklTirlbzwjKDUdCWzM60WlU06NkoPlkUgR2q9m+804yz5qTMaenuPO mDGBty0syn+pois9BdJv0UeaAwoksZN74YC6vuHDh/6BnSK4mOisuHLP/D7iby5FHVL4tFHia6FYXq 0+DkRs/a2OqWFTk+o+cN5A+NCwpDIUbZfEc88aigLqitRC4eJuZQEHywIriAiRmy9ImnyJodlammr7 s9AjvCAzf340XzmF6Gumn6IbMHZb4p5hLT33cHlh7X1AjJ8RCiWQU9k8Q+zrAYThzoK/jY88XCaIjI FPn+NFQHv4b/Llf94bia06KjI0S5GgJJFxeSqmSE2wqAdn52bTxowHvRwOUwjeeTpXHaxYnYDXqa7m FXdZOR4Qg1a7dlf/djjyVfSDMq679lEDbrl1aOKGxNSRnLgsYeiMckHq9q8Q== X-Developer-Key: i=memxor@gmail.com; a=openpgp; fpr=4BBE2A7E06ECF9D5823C61114CE0C88648BF11CA X-Patchwork-Delegate: bpf@iogearbox.net Introduce helper macros that wrap around the rqspinlock slow path and provide an interface analogous to the raw_spin_lock API. Note that in case of error conditions, preemption and IRQ disabling is automatically unrolled before returning the error back to the caller. Ensure that in absence of CONFIG_QUEUED_SPINLOCKS support, we fallback to the test-and-set implementation. Add some comments describing the subtle memory ordering logic during unlock, and why it's safe. Signed-off-by: Kumar Kartikeya Dwivedi --- include/asm-generic/rqspinlock.h | 82 ++++++++++++++++++++++++++++++++ 1 file changed, 82 insertions(+) diff --git a/include/asm-generic/rqspinlock.h b/include/asm-generic/rqspinlock.h index f8850f09d0d6..418b652e0249 100644 --- a/include/asm-generic/rqspinlock.h +++ b/include/asm-generic/rqspinlock.h @@ -153,4 +153,86 @@ static __always_inline void release_held_lock_entry(void) this_cpu_dec(rqspinlock_held_locks.cnt); } +#ifdef CONFIG_QUEUED_SPINLOCKS + +/** + * res_spin_lock - acquire a queued spinlock + * @lock: Pointer to queued spinlock structure + */ +static __always_inline int res_spin_lock(rqspinlock_t *lock) +{ + int val = 0; + + if (likely(atomic_try_cmpxchg_acquire(&lock->val, &val, _Q_LOCKED_VAL))) { + grab_held_lock_entry(lock); + return 0; + } + return resilient_queued_spin_lock_slowpath(lock, val); +} + +#else + +#define res_spin_lock(lock) resilient_tas_spin_lock(lock) + +#endif /* CONFIG_QUEUED_SPINLOCKS */ + +static __always_inline void res_spin_unlock(rqspinlock_t *lock) +{ + struct rqspinlock_held *rqh = this_cpu_ptr(&rqspinlock_held_locks); + + if (unlikely(rqh->cnt > RES_NR_HELD)) + goto unlock; + WRITE_ONCE(rqh->locks[rqh->cnt - 1], NULL); +unlock: + /* + * Release barrier, ensures correct ordering. See release_held_lock_entry + * for details. 
Perform release store instead of queued_spin_unlock, + * since we use this function for test-and-set fallback as well. When we + * have CONFIG_QUEUED_SPINLOCKS=n, we clear the full 4-byte lockword. + * + * Like release_held_lock_entry, we can do the release before the dec. + * We simply care about not seeing the 'lock' in our table from a remote + * CPU once the lock has been released, which doesn't rely on the dec. + * + * Unlike smp_wmb(), release is not a two way fence, hence it is + * possible for a inc to move up and reorder with our clearing of the + * entry. This isn't a problem however, as for a misdiagnosis of ABBA, + * the remote CPU needs to hold this lock, which won't be released until + * the store below is done, which would ensure the entry is overwritten + * to NULL, etc. + */ + smp_store_release(&lock->locked, 0); + this_cpu_dec(rqspinlock_held_locks.cnt); +} + +#ifdef CONFIG_QUEUED_SPINLOCKS +#define raw_res_spin_lock_init(lock) ({ *(lock) = (rqspinlock_t)__ARCH_SPIN_LOCK_UNLOCKED; }) +#else +#define raw_res_spin_lock_init(lock) ({ *(lock) = (rqspinlock_t){0}; }) +#endif + +#define raw_res_spin_lock(lock) \ + ({ \ + int __ret; \ + preempt_disable(); \ + __ret = res_spin_lock(lock); \ + if (__ret) \ + preempt_enable(); \ + __ret; \ + }) + +#define raw_res_spin_unlock(lock) ({ res_spin_unlock(lock); preempt_enable(); }) + +#define raw_res_spin_lock_irqsave(lock, flags) \ + ({ \ + int __ret; \ + local_irq_save(flags); \ + __ret = raw_res_spin_lock(lock); \ + if (__ret) \ + local_irq_restore(flags); \ + __ret; \ + }) + +#define raw_res_spin_unlock_irqrestore(lock, flags) ({ raw_res_spin_unlock(lock); local_irq_restore(flags); }) + #endif /* __ASM_GENERIC_RQSPINLOCK_H */ From patchwork Mon Mar 3 15:22:57 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kumar Kartikeya Dwivedi X-Patchwork-Id: 13999035 X-Patchwork-Delegate: bpf@iogearbox.net Received: from mail-wm1-f66.google.com (mail-wm1-f66.google.com [209.85.128.66]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id ACE2122FF4F; Mon, 3 Mar 2025 15:23:33 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.128.66 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1741015415; cv=none; b=SikjabbnpQOL4SIZXILE99OK4O2ECii48BztuHBtT6rn7SfTSCHDyIAtbT6gLXohEa776xS9WYkz/r9E2qlJnVtn0zTsEuyqRJ+uxGqXcyR1oFaSfdkZO9hlU+xazO1KIMmVhn2+mUm6y8c3Y+b/2Agpn+obgAvViNs+Z+rLUvo= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1741015415; c=relaxed/simple; bh=bthuxBncjGOs4mky1mKdWZk3nTI3RWDlXR+RsRusmm0=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=QFdaGRqwKfnKQQSS1YBSBUws9L0Kr//zZtU1Qyw7Cs9ggDSozkExC3S64nPLjt5bEo7alyG34/C23eJCzZZFD2CC8XsBQBOtFLNWcsD/mvW9tcZrB+TnJKg5Ln/9HYbjg8+ouQXppJ3/0MD3dflgHp3RUnxNDWGJ5ZhjAk/BNHo= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=JbMHeOSh; arc=none smtp.client-ip=209.85.128.66 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com 
header.i=@gmail.com header.b="JbMHeOSh" Received: by mail-wm1-f66.google.com with SMTP id 5b1f17b1804b1-4394036c0efso29344875e9.2; Mon, 03 Mar 2025 07:23:33 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1741015412; x=1741620212; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=MKS1T/XQ/tWs8ecRTlu3cl4kKPJmu8O7rwuQ99x/CTQ=; b=JbMHeOShC0Meih8JAt1gcghndF+ApNyi7ZpsZf4kwqWvY31rCG/VKgVJsX+kPGt1It MwV7IfZSCTnQjZwTyfhVm0Q05DEFWH6AjqcREWV6ixLI81SM98HYt+QBkoW929thymdi c7H2AaIeagt+wh4H0GnKlgwMs+h/Vd5AKrdBzr2mlNKzF94MVHJ+AnaCKsOqDqBvuZji /t+Rg86uzVI1S0mLpBjXiFohfduRLo3NPScfPZelqIH7PbSn4iQcvABkCv5s7t/5jKp0 DL1XCOoE9VLst1xoGolpsV+LQ2m0AhDigfv5MbIu6YJfUWtU8l3FOuaBdAtP0iW4zRBO gQkA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1741015412; x=1741620212; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=MKS1T/XQ/tWs8ecRTlu3cl4kKPJmu8O7rwuQ99x/CTQ=; b=YJxd3qT+/pEyuSw9zrxVtt8aR5cLWvHNRLjmcJ+Zp0CMIlNXjzhfzKfD+indHC+jga SbDSpLZE6hVdmieVf618u8BsUcY/g1aqC/9YJp9Fw09hUAD3DQODEJ8w5p2WzaOJ2OMx 2ZV4iSlenIrt3YX5V0GIi8YKPKx3VgEiTvPlkIvCkshMKQ79TV53ZCdspLEr49l29Og+ HyA+f4TYLY/41cdx9BlWPHtWgdVAM+BocvaOzWfCLS4sVtHzSJ3pNnbCk4oheSJep0d/ 47d6v/08IAfBxWfYFkuN2ucWu0TP62ImYKhbpvHOrzUMQXEcPY5BpPWCLziqd7d6lu1j LOug== X-Forwarded-Encrypted: i=1; AJvYcCWhOGx6d/1Vk2nBXyij6YHm/ldVKjNh6L0GV9phX8lN8RVh2/yU2Ggvepo7XXaO1ZGZrWThMVi8jU+In/k=@vger.kernel.org X-Gm-Message-State: AOJu0YwZ2yncphtmBYRwQK4Yeb41r3MNQt3Y2zkxNYs7RegTmIwBKzwt 5azNNhQdojn5CTyWOtBnxMoKFf4zWp6BKZPzdkT5WsBNyIhxhV6sBMIdyJUWhmE= X-Gm-Gg: ASbGnctM8nRsCwExO2rKBwUVTFiMFWF8oQb+H84Xzttl1WYebBdy1OnfInNRRuRKdpi FtD/9www8lmpDpTKd1HF7nP6Qb8/vne2tWac5PYfAmtrVa2QX0EVJ0Tioraoki9P2vbrakQcNn1 HkfazlklT4g4geWkU1CFMwKN1stgd9d7bfbuOPZj1r3iKbj0u5S0lWeCrl4yxTkK5KExfwqUJtB pUo8PYcYQV+/L7JR1b3OyVyCQyFlnO3OwI8orRh+Z+S92ZIYvJ8ihIiSKsKuZbqK0SpIZFeJpD/ wuSyyskwR1GiDzVitLRTzRuxH8XQyk8+aPQ= X-Google-Smtp-Source: AGHT+IGlH16ifEyb4PlL+eWM8UXuzJH348pjxTrHJU3GO0ZnngxsPxkDQyxyCAS30KE/L9MfeFB7Sw== X-Received: by 2002:a05:600c:511e:b0:439:6017:6689 with SMTP id 5b1f17b1804b1-43ba66e0bf5mr109140235e9.9.1741015411800; Mon, 03 Mar 2025 07:23:31 -0800 (PST) Received: from localhost ([2a03:2880:31ff:74::]) by smtp.gmail.com with ESMTPSA id ffacd0b85a97d-390e4796084sm15030999f8f.19.2025.03.03.07.23.31 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 03 Mar 2025 07:23:31 -0800 (PST) From: Kumar Kartikeya Dwivedi To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org Cc: Linus Torvalds , Peter Zijlstra , Will Deacon , Waiman Long , Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , Martin KaFai Lau , Eduard Zingerman , "Paul E. 
McKenney" , Tejun Heo , Barret Rhoden , Josh Don , Dohyun Kim , linux-arm-kernel@lists.infradead.org, kkd@meta.com, kernel-team@meta.com Subject: [PATCH bpf-next v3 17/25] rqspinlock: Add locktorture support Date: Mon, 3 Mar 2025 07:22:57 -0800 Message-ID: <20250303152305.3195648-18-memxor@gmail.com> X-Mailer: git-send-email 2.43.5 In-Reply-To: <20250303152305.3195648-1-memxor@gmail.com> References: <20250303152305.3195648-1-memxor@gmail.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=3149; h=from:subject; bh=bthuxBncjGOs4mky1mKdWZk3nTI3RWDlXR+RsRusmm0=; b=owEBbQKS/ZANAwAIAUzgyIZIvxHKAcsmYgBnxcWYc+Sgz0ORRtjBX9jRAAhbYoGw0HwGVEnt1lkK 0s8wahWJAjMEAAEIAB0WIQRLvip+Buz51YI8YRFM4MiGSL8RygUCZ8XFmAAKCRBM4MiGSL8RylTSD/ 9TP8RtAKolLrilNYH10VbArS3ra9jOCKUgFIRYMn7uqajviRSi77CSgnaVlYnmA9EfkXqnPw6r53W8 inm8T7K1fkgOmvIbYxtsKTKVPyW6c0loQf/rxpgCr7uK34o5YRBK/iSgFyLS4uAE5QbB/Pf9jJ28lv Wtgk/Ey42frf7Uf5RQNdrzPGWDnh/n2149+2agrmt9UNjkHTYYsrn6TOvYBRgxX0kiN60RLQbmXTjC nHD31GAomXGCnNxzEDhY1G2vG7r5C67GzOWT3UMHmKgQdaRcF/Lsw8fkI6g612CABQio63//Nm0QSA Myxwcedn9UdnGHGvM0F3IlXZs2Ub9fwmfoi6C1gmJ3MJiii8TYzxMoVYTXMM5yktxCqQG11WnTNq0l b3CpjATOtu5zhfGJH4mG717bInDxZ1ZmPKWEEGT4Hz0ePktKsfs56WLmTMhKVkRnC8Fa8ztfYGJLdG bQg5bPK0unpb/3Z3pIjZUHVpqVYWKGFYCmKd8AYmN3D7RlJn7HRmZCf56bovKsEgHPk9ZlLamM213o m1cj0p0dLPhfrYItv1LDuMe8Xkd/8Dzp3oWLhMa097fZFVsUhzB6TnuCE56VdlV2ZEvObJHft6x3OR yX8L4HXgOOdKq1WkjxYTYQtu9ZbqUVfB5ojTHrx1qJR9u7pdx19dUvyvMZQg== X-Developer-Key: i=memxor@gmail.com; a=openpgp; fpr=4BBE2A7E06ECF9D5823C61114CE0C88648BF11CA X-Patchwork-Delegate: bpf@iogearbox.net Introduce locktorture support for rqspinlock using the newly added macros as the first in-kernel user and consumer. Guard the code with CONFIG_BPF_SYSCALL ifdef since rqspinlock is not available otherwise. 
Signed-off-by: Kumar Kartikeya Dwivedi --- kernel/locking/locktorture.c | 57 ++++++++++++++++++++++++++++++++++++ kernel/locking/rqspinlock.c | 1 + 2 files changed, 58 insertions(+) diff --git a/kernel/locking/locktorture.c b/kernel/locking/locktorture.c index cc33470f4de9..ce0362f0a871 100644 --- a/kernel/locking/locktorture.c +++ b/kernel/locking/locktorture.c @@ -362,6 +362,60 @@ static struct lock_torture_ops raw_spin_lock_irq_ops = { .name = "raw_spin_lock_irq" }; +#ifdef CONFIG_BPF_SYSCALL + +#include +static rqspinlock_t rqspinlock; + +static int torture_raw_res_spin_write_lock(int tid __maybe_unused) +{ + raw_res_spin_lock(&rqspinlock); + return 0; +} + +static void torture_raw_res_spin_write_unlock(int tid __maybe_unused) +{ + raw_res_spin_unlock(&rqspinlock); +} + +static struct lock_torture_ops raw_res_spin_lock_ops = { + .writelock = torture_raw_res_spin_write_lock, + .write_delay = torture_spin_lock_write_delay, + .task_boost = torture_rt_boost, + .writeunlock = torture_raw_res_spin_write_unlock, + .readlock = NULL, + .read_delay = NULL, + .readunlock = NULL, + .name = "raw_res_spin_lock" +}; + +static int torture_raw_res_spin_write_lock_irq(int tid __maybe_unused) +{ + unsigned long flags; + + raw_res_spin_lock_irqsave(&rqspinlock, flags); + cxt.cur_ops->flags = flags; + return 0; +} + +static void torture_raw_res_spin_write_unlock_irq(int tid __maybe_unused) +{ + raw_res_spin_unlock_irqrestore(&rqspinlock, cxt.cur_ops->flags); +} + +static struct lock_torture_ops raw_res_spin_lock_irq_ops = { + .writelock = torture_raw_res_spin_write_lock_irq, + .write_delay = torture_spin_lock_write_delay, + .task_boost = torture_rt_boost, + .writeunlock = torture_raw_res_spin_write_unlock_irq, + .readlock = NULL, + .read_delay = NULL, + .readunlock = NULL, + .name = "raw_res_spin_lock_irq" +}; + +#endif + static DEFINE_RWLOCK(torture_rwlock); static int torture_rwlock_write_lock(int tid __maybe_unused) @@ -1168,6 +1222,9 @@ static int __init lock_torture_init(void) &lock_busted_ops, &spin_lock_ops, &spin_lock_irq_ops, &raw_spin_lock_ops, &raw_spin_lock_irq_ops, +#ifdef CONFIG_BPF_SYSCALL + &raw_res_spin_lock_ops, &raw_res_spin_lock_irq_ops, +#endif &rw_lock_ops, &rw_lock_irq_ops, &mutex_lock_ops, &ww_mutex_lock_ops, diff --git a/kernel/locking/rqspinlock.c b/kernel/locking/rqspinlock.c index 3b4fdb183588..0031a1bfbd4e 100644 --- a/kernel/locking/rqspinlock.c +++ b/kernel/locking/rqspinlock.c @@ -85,6 +85,7 @@ struct rqspinlock_timeout { #define RES_TIMEOUT_VAL 2 DEFINE_PER_CPU_ALIGNED(struct rqspinlock_held, rqspinlock_held_locks); +EXPORT_SYMBOL_GPL(rqspinlock_held_locks); static bool is_lock_released(rqspinlock_t *lock, u32 mask, struct rqspinlock_timeout *ts) { From patchwork Mon Mar 3 15:22:58 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kumar Kartikeya Dwivedi X-Patchwork-Id: 13999036 X-Patchwork-Delegate: bpf@iogearbox.net Received: from mail-wr1-f68.google.com (mail-wr1-f68.google.com [209.85.221.68]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 47167231CAE; Mon, 3 Mar 2025 15:23:35 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.221.68 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1741015417; cv=none; 
b=JvCPiSpQpr2myJxr3vNmr6Jtb3JAYsouwbs8AVhf8uKBhVEoZgm/jsJkAD6YGxKlkbQCMuFO7ojZw2JjMvnkSwRqZOhtCxUYiwDX6I+bS0gS0ZAuRmzsxDQkJ29vu0EG3BdJxJVfndo8fT09bxsEopjFp2WhXMgr+jQmZNeQq9E= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1741015417; c=relaxed/simple; bh=qSIKPInVzZ0EbqaTg/PJPPvBSQOZHHVdC+CsMy/ps90=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=BLCOhj+aI+MAbRb89H8F1BZ/Bhw71RG7PNwrhT1hKllfeUgkcGAD7061ZoBvLAR4EhVqYYaKnAkzVz0fwJ5eXGM441XGkyo2WKmGBEDL4X3IaPMYsCOH+5BSUDP/3bgKRmrXTcGIYeBplP9Lvse5ioha6qucs5T+SVnJBxS2WnE= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=TnBKoaiU; arc=none smtp.client-ip=209.85.221.68 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="TnBKoaiU" Received: by mail-wr1-f68.google.com with SMTP id ffacd0b85a97d-390ec7c2cd8so2125421f8f.1; Mon, 03 Mar 2025 07:23:35 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1741015413; x=1741620213; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=7Uc01JwRLTx4LvOAU9cv3DgvL5O0RF1rRGwSEjckiRA=; b=TnBKoaiUodbGA0uOVg4hxvU7srRK28WThtS/o9qLwC/0es2dBI14SDdv+SIkMcSDTO 1DkAcYhnM5L9NB4lgQLPInEHQWStaVwO5+VYMZkjvOIGyBTH0OW1xzgoJOcMYXb1kpwt oOO9PGlaLBB1hymbXGnKLGHSY/7d05UgoXy1uCXGe1jRdKc4Ik5Y/DPvCVhg4HBJy5pk ifCF6z7ckKETo0SVgdTtvl3fMxuQ68uWBM2gIyOxGFARWa2mwubUySI40Nae5g31UQha LkF3l/Tl/Kpylj6811E10PKPtANvTt0pt0Kyc7lZk++nEn14ti8Jpz9CRBq8DYL01tTf bofg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1741015413; x=1741620213; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=7Uc01JwRLTx4LvOAU9cv3DgvL5O0RF1rRGwSEjckiRA=; b=opNI5Kfqr4F6+co2uAT0bLkCbP4tvt6vK/K9cI1eKmKvqEtrIZGb5bVqC2CHd2VuPE qHj2q5/T+EaaewLsWE5U9kJfsDiBQ4SGGOvhjyTlm7/W+5IKUgLkl/eeSQlN6e7E73sz xkmU+1qpLueJaMOpriGkKYWAttlCVRlQm59qTx9nU0bLvGjAIg39zT6Qxvzc0HpPQk9D 9VI56ztbD+r9nJ4gN2Hc/3H1d9K6Xjl6VcTs2WnF+E7Fq462pj/nUSrXVXfS/4qAB3Uh KA+ejEEVmmxkYurwwl3wxeehwi+rD5bOvUNV1pUluRWk1vuvelxvTyFvxVANFrejSB9A /dSg== X-Forwarded-Encrypted: i=1; AJvYcCU64FhroOy9q5SbfqrjxWIjms0k78Rc4eBuKierI6HpevI6iwLGVXcUwaCJMlr0AdIMZFTjJo5WDvzzBFk=@vger.kernel.org X-Gm-Message-State: AOJu0Yyqqroey/eRynecUXY5/a/YlgFOzHvd9lizgIryqrr4cY5svPXP MEXvkbjI7YnIaKTA+L5SNsRnggAIb0UpnVQMOPVElDULFBxm6DCq7BkOMKPRRXs= X-Gm-Gg: ASbGnct7ZHyxTB/N/xeLYgLJvL2ISnBTKe3KPme8y1coIGPmgi3BQ5OVK/d7pnYL130 +QC0wLnmFS0LeZHR70CpIJeef1/Ll6lNOTUZNoQfa9/FNDxFZNw+rUlE+WH49ddjbNzvxHzf7MU xOVpgdUjD6M7EShQBemCbyTfwcXQOArsjVyUpFuSeKPw7jgUl6W1iYBYW93agXwZAKnmPEjq42R OcK7wXu4jTOyp0NPvSfLJTMusIdSpnaoKSSwwlotKWDQjngRxXZv/GhrtKcqeOE/IKG2dp+hWuT NMfHbkXBd3wmcU0oQi+r0kdfxminMhbqFQ== X-Google-Smtp-Source: AGHT+IE38gK5wJECZxFlCYL7vqyBrElWlJLLivqEZ8lVMAciRT4pe2f0X2OlAMW3qevmIMhBs6VcWg== X-Received: by 2002:a5d:6489:0:b0:390:f641:d8bb with SMTP id ffacd0b85a97d-390f641d990mr8952175f8f.36.1741015412881; Mon, 03 Mar 2025 07:23:32 -0800 (PST) Received: from 
localhost ([2a03:2880:31ff:1::]) by smtp.gmail.com with ESMTPSA id ffacd0b85a97d-390e4795d1asm14571262f8f.4.2025.03.03.07.23.32 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 03 Mar 2025 07:23:32 -0800 (PST) From: Kumar Kartikeya Dwivedi To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org Cc: Linus Torvalds , Peter Zijlstra , Will Deacon , Waiman Long , Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , Martin KaFai Lau , Eduard Zingerman , "Paul E. McKenney" , Tejun Heo , Barret Rhoden , Josh Don , Dohyun Kim , linux-arm-kernel@lists.infradead.org, kkd@meta.com, kernel-team@meta.com Subject: [PATCH bpf-next v3 18/25] rqspinlock: Add entry to Makefile, MAINTAINERS Date: Mon, 3 Mar 2025 07:22:58 -0800 Message-ID: <20250303152305.3195648-19-memxor@gmail.com> X-Mailer: git-send-email 2.43.5 In-Reply-To: <20250303152305.3195648-1-memxor@gmail.com> References: <20250303152305.3195648-1-memxor@gmail.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=2083; h=from:subject; bh=qSIKPInVzZ0EbqaTg/PJPPvBSQOZHHVdC+CsMy/ps90=; b=owEBbQKS/ZANAwAIAUzgyIZIvxHKAcsmYgBnxcWYI9HLNzIFosVfpo/eAGZLrsdbAXI02uiFuv5j gAn43fCJAjMEAAEIAB0WIQRLvip+Buz51YI8YRFM4MiGSL8RygUCZ8XFmAAKCRBM4MiGSL8RyraqD/ 429SlE3+raZFobBUUclVgrmx/5S5ghp0W6qHlpRJg15mqb/wQ7Nm7MfnP8Mt/cd/yugjieC5xXaIZZ 2gRGy1o4SiUDG91z7FYRxs5REy+twKo6/1wchd494MGNac2afJXVeogt88nnikDfs98ML1W3w4kR3r djShNfhU1YlT5UB9SixxXZsaVabsgXdlD37rG/rItcXLa2S3En21E7gq3ApgqdOWZE8a6/JpEG8kW4 1oXfdGyOuubLdfXTC9F1nuSlIdMSGnFINmAGPeE0mrtgS74cGYxE1pMpplfJMozQabn7YjTuU03xNZ 11p1zrVmvfoac73XycC6N+KM+AcpskEIKpjnp+SKHkhbf4PG513KLKQYTz3VNWwfUHh5AKOH6FYBPn sZ9LhuZ5FHOn+bfFHRURJQOjSlOHS088r968BsyQsVY1qv9RVyVM8VbqXS/C+frx/mSypWU0V8xVte r5NQyi+CfLnuOjZe5fyJo34b4gNNn+J/hqFR486kfOAD9hBD9pjSNnaD7P8U1z/36+m57yipbTlO3B QKtFBqPHAvNwGSaCkhb+JWZfcH7pd2jC83wadWyvMeixxn0WY4FwIP1tLyCP6lxkcbqNUICVYdOfqn LQ9KXmI+sQGaXI4NJ2OV88iI9YE445UWW9V+2WXlW683DUBbYbB8AjtUHjIg== X-Developer-Key: i=memxor@gmail.com; a=openpgp; fpr=4BBE2A7E06ECF9D5823C61114CE0C88648BF11CA X-Patchwork-Delegate: bpf@iogearbox.net Ensure that rqspinlock is built when qspinlock support and BPF subsystem is enabled. Also, add the file under the BPF MAINTAINERS entry so that all patches changing code in the file end up Cc'ing bpf@vger and the maintainers/reviewers. Ensure that the rqspinlock code is only built when the BPF subsystem is compiled in. Depending on queued spinlock support, we may or may not end up building the queued spinlock slowpath, and instead fallback to the test-and-set implementation. 
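Restating the resulting build matrix in one place (a summary of the text above and the Makefile hunk below, not new behaviour):

	CONFIG_BPF_SYSCALL=n                               rqspinlock.o is not built
	CONFIG_BPF_SYSCALL=y, CONFIG_QUEUED_SPINLOCKS=y    resilient queued slowpath is built
	CONFIG_BPF_SYSCALL=y, CONFIG_QUEUED_SPINLOCKS=n    only the test-and-set fallback is built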
Signed-off-by: Kumar Kartikeya Dwivedi --- MAINTAINERS | 3 +++ include/asm-generic/Kbuild | 1 + kernel/locking/Makefile | 1 + 3 files changed, 5 insertions(+) diff --git a/MAINTAINERS b/MAINTAINERS index 3864d473f52f..b0179ef867eb 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -4297,6 +4297,9 @@ F: include/uapi/linux/filter.h F: kernel/bpf/ F: kernel/trace/bpf_trace.c F: lib/buildid.c +F: arch/*/include/asm/rqspinlock.h +F: include/asm-generic/rqspinlock.h +F: kernel/locking/rqspinlock.c F: lib/test_bpf.c F: net/bpf/ F: net/core/filter.c diff --git a/include/asm-generic/Kbuild b/include/asm-generic/Kbuild index 1b43c3a77012..8675b7b4ad23 100644 --- a/include/asm-generic/Kbuild +++ b/include/asm-generic/Kbuild @@ -45,6 +45,7 @@ mandatory-y += pci.h mandatory-y += percpu.h mandatory-y += pgalloc.h mandatory-y += preempt.h +mandatory-y += rqspinlock.h mandatory-y += runtime-const.h mandatory-y += rwonce.h mandatory-y += sections.h diff --git a/kernel/locking/Makefile b/kernel/locking/Makefile index 0db4093d17b8..5645e9029bc0 100644 --- a/kernel/locking/Makefile +++ b/kernel/locking/Makefile @@ -24,6 +24,7 @@ obj-$(CONFIG_SMP) += spinlock.o obj-$(CONFIG_LOCK_SPIN_ON_OWNER) += osq_lock.o obj-$(CONFIG_PROVE_LOCKING) += spinlock.o obj-$(CONFIG_QUEUED_SPINLOCKS) += qspinlock.o +obj-$(CONFIG_BPF_SYSCALL) += rqspinlock.o obj-$(CONFIG_RT_MUTEXES) += rtmutex_api.o obj-$(CONFIG_PREEMPT_RT) += spinlock_rt.o ww_rt_mutex.o obj-$(CONFIG_DEBUG_SPINLOCK) += spinlock.o From patchwork Mon Mar 3 15:22:59 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kumar Kartikeya Dwivedi X-Patchwork-Id: 13999037 X-Patchwork-Delegate: bpf@iogearbox.net Received: from mail-wm1-f68.google.com (mail-wm1-f68.google.com [209.85.128.68]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 9BCB8233D8C; Mon, 3 Mar 2025 15:23:36 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.128.68 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1741015418; cv=none; b=QGQMVXH3l2hgiQmHmXvCqAz5Xj4MvXx9+nVVppe77aqZqqGO3yj7y7lbbXOSaSyNkab1bdkJhoSqZPa+SUjGE51CN351e9iGDIXTT2UPO3ywP3ESaBgNzuMcah3f517FcGOV3bsCU/Z9rtXnXfSgbCJqFu3a+KUaWlwotFGVhbQ= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1741015418; c=relaxed/simple; bh=4ih0G/0286de3X2HOwCgeQ8ZZwbokSdXkcTQRkadVk8=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=h45zHW78kfpDOEQ4j+AjLBMXS1EuTQ2ZXCe7MwsYg4/OICSM/UrHvaZInmaD7XKp6B+XTu3e71cn7S/KXwlWyN+EGoIkg9glfZO6RRaJlQ8o7x/WmG7tzdeg9z4oS1f4RnLxEGFmuG0QqNbbPdOHm3cYl46R47y8zjAeEOAQW90= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=MZYk0jzh; arc=none smtp.client-ip=209.85.128.68 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="MZYk0jzh" Received: by mail-wm1-f68.google.com with SMTP id 5b1f17b1804b1-4394a823036so43939805e9.0; Mon, 03 Mar 2025 07:23:36 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; 
s=20230601; t=1741015414; x=1741620214; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=jp6uG/E/DcwB9A5vS09y6QoqRf0wNeVJjI9fKDMNnPg=; b=MZYk0jzhbj9BL4/oHqAaDY9k0ixarFj+A9KBZzJ2IOIFajWn8p3t0oOtLcr6EQiGD/ 8yelWg8C9D18uqpR82g8AsySYnpmjw0TAegVKsCJxJPrZjSflzMd8WDh0Gsns/iwTM1t tNXRDY6PgLvG+/rwME7GKKM8M6o15TQOr8EGjLTsytjKJ/IaSKOqcvNqYFoaK69BwaNL obPgxNy24YpW+QCNRn3e5U/PBAd5TrMDV/nqFum7yr2xwFgOKgvIABMKDT418tn0nSOP BfcTWptL3anBd3njGDFbHtnsOrweJc0amCzLvcliCAGyMsZr8QhqjhoNNhXvuAnBMGVA l/2w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1741015414; x=1741620214; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=jp6uG/E/DcwB9A5vS09y6QoqRf0wNeVJjI9fKDMNnPg=; b=QlNyQJNPl+/x9pSZCO9FpJSftDTWk4zvXBFg2K4qTJnNlfCDGtOGOL0+o0gcEY+jCl T30hhbM5EveD53ZJX4sAiOeXgm1JmgJON0RpHc4V/DOUcv3MmNcuqVOeHa6gMcoMTkPM EPXN3EucYzhQXndnx/ADK0pAFyJAvdeuN27YFRDRCisA7pXJ8A6+cwsSo0tw4d1KsUs6 zZkrvZRwCAicOGPdO1WEysaWHKzG6lc+eAfJLR5SVrduLFlEa9t97gRZaGtryRQTvTF6 DN+zJmSMqtni4l9nvzLl9w6mAiSs7SFlgoT+j+D7cYhwmt/hfKvA2r9AnuQPden7aXwr M2/Q== X-Forwarded-Encrypted: i=1; AJvYcCU+54MumYyjdaka5MAAneN7ppSSKis4TmbU7HBCCo7QuD6gjpXpLZjQitG61QVS3RNSjEv0qai7WPE4MTs=@vger.kernel.org X-Gm-Message-State: AOJu0YzapEr8gCGAvskdTUMUGkD90u6LLntcP8tp+ExpQtRzxhpdbjx5 qt/gKvI3WliNV4fCF6YuUUu9g6lC/Ey6W1W9gfLnIU7TAUr44fyJ2iRKZSHcY3E= X-Gm-Gg: ASbGncsVArEx3JCMwAEf+pwqCzfPhr5+2GvoeMODu7fAFyXsmmbjDKYsbrnnEFKzrdQ Z0D3z3kBlCw863LRcGE1ZE6O0M9FTcCKLIw9zPgLr+n2wdOdj/n+nkZGwBJEyqMvPv1fAGrB1U/ m6ZVWfd0i7PcS0n1fC+6zXAzp1lZcVjmR+DGAd1M7gS9LkZ4xXKupHJsaYpPNB6quo6eM2pMF7Q WCYI3wzfLBtAqNJmrGkSCQIr9fJsr+a5bxJ6afOdK8xQdN9wC0Iwo5trvfL+RWZZgmfY2GNdF6a uhKna14cqRANBGH+7dtBCXX5HC5oVl4= X-Google-Smtp-Source: AGHT+IEbl3LTjrdqPSH5d3kkpLqYi842EMyLleWiCvMkmyw1mm4oqWLsvTK8TF5jjGJQgV7w+8Emkg== X-Received: by 2002:a05:600c:19ce:b0:439:99e6:2ab with SMTP id 5b1f17b1804b1-43ba6766b44mr95692715e9.28.1741015414338; Mon, 03 Mar 2025 07:23:34 -0800 (PST) Received: from localhost ([2a03:2880:31ff::]) by smtp.gmail.com with ESMTPSA id 5b1f17b1804b1-43bad347823sm110368705e9.0.2025.03.03.07.23.33 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 03 Mar 2025 07:23:33 -0800 (PST) From: Kumar Kartikeya Dwivedi To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org Cc: Linus Torvalds , Peter Zijlstra , Will Deacon , Waiman Long , Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , Martin KaFai Lau , Eduard Zingerman , "Paul E. 
McKenney" , Tejun Heo , Barret Rhoden , Josh Don , Dohyun Kim , linux-arm-kernel@lists.infradead.org, kkd@meta.com, kernel-team@meta.com Subject: [PATCH bpf-next v3 19/25] bpf: Convert hashtab.c to rqspinlock Date: Mon, 3 Mar 2025 07:22:59 -0800 Message-ID: <20250303152305.3195648-20-memxor@gmail.com> X-Mailer: git-send-email 2.43.5 In-Reply-To: <20250303152305.3195648-1-memxor@gmail.com> References: <20250303152305.3195648-1-memxor@gmail.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=11131; h=from:subject; bh=4ih0G/0286de3X2HOwCgeQ8ZZwbokSdXkcTQRkadVk8=; b=owEBbQKS/ZANAwAIAUzgyIZIvxHKAcsmYgBnxcWYUEPdLFj8sZgqLb5zn7VbigpDGWSBgEbqM3XA x/fwLJeJAjMEAAEIAB0WIQRLvip+Buz51YI8YRFM4MiGSL8RygUCZ8XFmAAKCRBM4MiGSL8Ryv8LD/ 97jXIjkq7Lj++KTQTcQoBHp5m3OgRssDYOiFDubICyCP6RJL9F3I1YPvZ3V7aYFcJJBEuTYglkhD3z vdEzq0g+0G/b/2vgbS8q/Q6tlI4W54PihyS+qwGrnVz3uJPq2Vh8nK0kcPwLJzfLzM9O0weSfEcFz3 WN5AmPXgB759SWiegOPd5FlfEdvcgoDzQHqKNy5Oc4QGYqMMxfoe1BajpDdGIKckYE1k/Q3jD4Ika7 0Ocsr3XMKJdKzWaZ0++mS0Xg1WPgVlB28K08SYP/VcJv8ymSdntzx3T2FCD5ezdoG+8lXnrnhFKkSq JPUZizhxMzGDID+hebcv5VroikizRXHUOZP+OdlHq3FVRtQcu9YmttvUI/AYkCqVIlchD1WcW7nPg3 wBl+QpZfzhAmz3RlwPTacFVnw92AeQLJ/wEgyufOlTdQkFBwITR04JWL46ibiZugmnV5nF/HRctNPR z8cV3U0H2O39bk3dQ7m7Gb17vyUwD+31Bu/9WsIU+oFd9vj3uWEWCMCk7SbIHD+lTSHwZ9j90+KCKm 2aohPllCBBFhVPuDHR3hlKDDoNV1hA1lVZZIbepftZGRQYKUeoQQdxjHTha7P81H3VVeD0xhU+HmJN uqimoBk7BWk8BQ4E3HYoNBJ1UHaIZO4UP5kjM4Il4c+PzynID4cM955BCL0A== X-Developer-Key: i=memxor@gmail.com; a=openpgp; fpr=4BBE2A7E06ECF9D5823C61114CE0C88648BF11CA X-Patchwork-Delegate: bpf@iogearbox.net Convert hashtab.c from raw_spinlock to rqspinlock, and drop the hashed per-cpu counter crud from the code base which is no longer necessary. 
Closes: https://lore.kernel.org/bpf/675302fd.050a0220.2477f.0004.GAE@google.com Closes: https://lore.kernel.org/bpf/000000000000b3e63e061eed3f6b@google.com Signed-off-by: Kumar Kartikeya Dwivedi --- kernel/bpf/hashtab.c | 102 ++++++++++++++----------------------------- 1 file changed, 32 insertions(+), 70 deletions(-) diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c index c308300fc72f..93d45812bb6a 100644 --- a/kernel/bpf/hashtab.c +++ b/kernel/bpf/hashtab.c @@ -16,6 +16,7 @@ #include "bpf_lru_list.h" #include "map_in_map.h" #include +#include #define HTAB_CREATE_FLAG_MASK \ (BPF_F_NO_PREALLOC | BPF_F_NO_COMMON_LRU | BPF_F_NUMA_NODE | \ @@ -78,7 +79,7 @@ */ struct bucket { struct hlist_nulls_head head; - raw_spinlock_t raw_lock; + rqspinlock_t raw_lock; }; #define HASHTAB_MAP_LOCK_COUNT 8 @@ -104,8 +105,6 @@ struct bpf_htab { u32 n_buckets; /* number of hash buckets */ u32 elem_size; /* size of each element in bytes */ u32 hashrnd; - struct lock_class_key lockdep_key; - int __percpu *map_locked[HASHTAB_MAP_LOCK_COUNT]; }; /* each htab element is struct htab_elem + key + value */ @@ -140,45 +139,26 @@ static void htab_init_buckets(struct bpf_htab *htab) for (i = 0; i < htab->n_buckets; i++) { INIT_HLIST_NULLS_HEAD(&htab->buckets[i].head, i); - raw_spin_lock_init(&htab->buckets[i].raw_lock); - lockdep_set_class(&htab->buckets[i].raw_lock, - &htab->lockdep_key); + raw_res_spin_lock_init(&htab->buckets[i].raw_lock); cond_resched(); } } -static inline int htab_lock_bucket(const struct bpf_htab *htab, - struct bucket *b, u32 hash, - unsigned long *pflags) +static inline int htab_lock_bucket(struct bucket *b, unsigned long *pflags) { unsigned long flags; + int ret; - hash = hash & min_t(u32, HASHTAB_MAP_LOCK_MASK, htab->n_buckets - 1); - - preempt_disable(); - local_irq_save(flags); - if (unlikely(__this_cpu_inc_return(*(htab->map_locked[hash])) != 1)) { - __this_cpu_dec(*(htab->map_locked[hash])); - local_irq_restore(flags); - preempt_enable(); - return -EBUSY; - } - - raw_spin_lock(&b->raw_lock); + ret = raw_res_spin_lock_irqsave(&b->raw_lock, flags); + if (ret) + return ret; *pflags = flags; - return 0; } -static inline void htab_unlock_bucket(const struct bpf_htab *htab, - struct bucket *b, u32 hash, - unsigned long flags) +static inline void htab_unlock_bucket(struct bucket *b, unsigned long flags) { - hash = hash & min_t(u32, HASHTAB_MAP_LOCK_MASK, htab->n_buckets - 1); - raw_spin_unlock(&b->raw_lock); - __this_cpu_dec(*(htab->map_locked[hash])); - local_irq_restore(flags); - preempt_enable(); + raw_res_spin_unlock_irqrestore(&b->raw_lock, flags); } static bool htab_lru_map_delete_node(void *arg, struct bpf_lru_node *node); @@ -483,14 +463,12 @@ static struct bpf_map *htab_map_alloc(union bpf_attr *attr) bool percpu_lru = (attr->map_flags & BPF_F_NO_COMMON_LRU); bool prealloc = !(attr->map_flags & BPF_F_NO_PREALLOC); struct bpf_htab *htab; - int err, i; + int err; htab = bpf_map_area_alloc(sizeof(*htab), NUMA_NO_NODE); if (!htab) return ERR_PTR(-ENOMEM); - lockdep_register_key(&htab->lockdep_key); - bpf_map_init_from_attr(&htab->map, attr); if (percpu_lru) { @@ -536,15 +514,6 @@ static struct bpf_map *htab_map_alloc(union bpf_attr *attr) if (!htab->buckets) goto free_elem_count; - for (i = 0; i < HASHTAB_MAP_LOCK_COUNT; i++) { - htab->map_locked[i] = bpf_map_alloc_percpu(&htab->map, - sizeof(int), - sizeof(int), - GFP_USER); - if (!htab->map_locked[i]) - goto free_map_locked; - } - if (htab->map.map_flags & BPF_F_ZERO_SEED) htab->hashrnd = 0; else @@ -607,15 +576,12 @@ static struct 
bpf_map *htab_map_alloc(union bpf_attr *attr) free_map_locked: if (htab->use_percpu_counter) percpu_counter_destroy(&htab->pcount); - for (i = 0; i < HASHTAB_MAP_LOCK_COUNT; i++) - free_percpu(htab->map_locked[i]); bpf_map_area_free(htab->buckets); bpf_mem_alloc_destroy(&htab->pcpu_ma); bpf_mem_alloc_destroy(&htab->ma); free_elem_count: bpf_map_free_elem_count(&htab->map); free_htab: - lockdep_unregister_key(&htab->lockdep_key); bpf_map_area_free(htab); return ERR_PTR(err); } @@ -817,7 +783,7 @@ static bool htab_lru_map_delete_node(void *arg, struct bpf_lru_node *node) b = __select_bucket(htab, tgt_l->hash); head = &b->head; - ret = htab_lock_bucket(htab, b, tgt_l->hash, &flags); + ret = htab_lock_bucket(b, &flags); if (ret) return false; @@ -828,7 +794,7 @@ static bool htab_lru_map_delete_node(void *arg, struct bpf_lru_node *node) break; } - htab_unlock_bucket(htab, b, tgt_l->hash, flags); + htab_unlock_bucket(b, flags); if (l == tgt_l) check_and_free_fields(htab, l); @@ -1147,7 +1113,7 @@ static long htab_map_update_elem(struct bpf_map *map, void *key, void *value, */ } - ret = htab_lock_bucket(htab, b, hash, &flags); + ret = htab_lock_bucket(b, &flags); if (ret) return ret; @@ -1198,7 +1164,7 @@ static long htab_map_update_elem(struct bpf_map *map, void *key, void *value, check_and_free_fields(htab, l_old); } } - htab_unlock_bucket(htab, b, hash, flags); + htab_unlock_bucket(b, flags); if (l_old) { if (old_map_ptr) map->ops->map_fd_put_ptr(map, old_map_ptr, true); @@ -1207,7 +1173,7 @@ static long htab_map_update_elem(struct bpf_map *map, void *key, void *value, } return 0; err: - htab_unlock_bucket(htab, b, hash, flags); + htab_unlock_bucket(b, flags); return ret; } @@ -1254,7 +1220,7 @@ static long htab_lru_map_update_elem(struct bpf_map *map, void *key, void *value copy_map_value(&htab->map, l_new->key + round_up(map->key_size, 8), value); - ret = htab_lock_bucket(htab, b, hash, &flags); + ret = htab_lock_bucket(b, &flags); if (ret) goto err_lock_bucket; @@ -1275,7 +1241,7 @@ static long htab_lru_map_update_elem(struct bpf_map *map, void *key, void *value ret = 0; err: - htab_unlock_bucket(htab, b, hash, flags); + htab_unlock_bucket(b, flags); err_lock_bucket: if (ret) @@ -1312,7 +1278,7 @@ static long __htab_percpu_map_update_elem(struct bpf_map *map, void *key, b = __select_bucket(htab, hash); head = &b->head; - ret = htab_lock_bucket(htab, b, hash, &flags); + ret = htab_lock_bucket(b, &flags); if (ret) return ret; @@ -1337,7 +1303,7 @@ static long __htab_percpu_map_update_elem(struct bpf_map *map, void *key, } ret = 0; err: - htab_unlock_bucket(htab, b, hash, flags); + htab_unlock_bucket(b, flags); return ret; } @@ -1378,7 +1344,7 @@ static long __htab_lru_percpu_map_update_elem(struct bpf_map *map, void *key, return -ENOMEM; } - ret = htab_lock_bucket(htab, b, hash, &flags); + ret = htab_lock_bucket(b, &flags); if (ret) goto err_lock_bucket; @@ -1402,7 +1368,7 @@ static long __htab_lru_percpu_map_update_elem(struct bpf_map *map, void *key, } ret = 0; err: - htab_unlock_bucket(htab, b, hash, flags); + htab_unlock_bucket(b, flags); err_lock_bucket: if (l_new) { bpf_map_dec_elem_count(&htab->map); @@ -1444,7 +1410,7 @@ static long htab_map_delete_elem(struct bpf_map *map, void *key) b = __select_bucket(htab, hash); head = &b->head; - ret = htab_lock_bucket(htab, b, hash, &flags); + ret = htab_lock_bucket(b, &flags); if (ret) return ret; @@ -1454,7 +1420,7 @@ static long htab_map_delete_elem(struct bpf_map *map, void *key) else ret = -ENOENT; - htab_unlock_bucket(htab, b, hash, 
flags); + htab_unlock_bucket(b, flags); if (l) free_htab_elem(htab, l); @@ -1480,7 +1446,7 @@ static long htab_lru_map_delete_elem(struct bpf_map *map, void *key) b = __select_bucket(htab, hash); head = &b->head; - ret = htab_lock_bucket(htab, b, hash, &flags); + ret = htab_lock_bucket(b, &flags); if (ret) return ret; @@ -1491,7 +1457,7 @@ static long htab_lru_map_delete_elem(struct bpf_map *map, void *key) else ret = -ENOENT; - htab_unlock_bucket(htab, b, hash, flags); + htab_unlock_bucket(b, flags); if (l) htab_lru_push_free(htab, l); return ret; @@ -1558,7 +1524,6 @@ static void htab_map_free_timers_and_wq(struct bpf_map *map) static void htab_map_free(struct bpf_map *map) { struct bpf_htab *htab = container_of(map, struct bpf_htab, map); - int i; /* bpf_free_used_maps() or close(map_fd) will trigger this map_free callback. * bpf_free_used_maps() is called after bpf prog is no longer executing. @@ -1583,9 +1548,6 @@ static void htab_map_free(struct bpf_map *map) bpf_mem_alloc_destroy(&htab->ma); if (htab->use_percpu_counter) percpu_counter_destroy(&htab->pcount); - for (i = 0; i < HASHTAB_MAP_LOCK_COUNT; i++) - free_percpu(htab->map_locked[i]); - lockdep_unregister_key(&htab->lockdep_key); bpf_map_area_free(htab); } @@ -1628,7 +1590,7 @@ static int __htab_map_lookup_and_delete_elem(struct bpf_map *map, void *key, b = __select_bucket(htab, hash); head = &b->head; - ret = htab_lock_bucket(htab, b, hash, &bflags); + ret = htab_lock_bucket(b, &bflags); if (ret) return ret; @@ -1665,7 +1627,7 @@ static int __htab_map_lookup_and_delete_elem(struct bpf_map *map, void *key, hlist_nulls_del_rcu(&l->hash_node); out_unlock: - htab_unlock_bucket(htab, b, hash, bflags); + htab_unlock_bucket(b, bflags); if (l) { if (is_lru_map) @@ -1787,7 +1749,7 @@ __htab_map_lookup_and_delete_batch(struct bpf_map *map, head = &b->head; /* do not grab the lock unless need it (bucket_cnt > 0). */ if (locked) { - ret = htab_lock_bucket(htab, b, batch, &flags); + ret = htab_lock_bucket(b, &flags); if (ret) { rcu_read_unlock(); bpf_enable_instrumentation(); @@ -1810,7 +1772,7 @@ __htab_map_lookup_and_delete_batch(struct bpf_map *map, /* Note that since bucket_cnt > 0 here, it is implicit * that the locked was grabbed, so release it. */ - htab_unlock_bucket(htab, b, batch, flags); + htab_unlock_bucket(b, flags); rcu_read_unlock(); bpf_enable_instrumentation(); goto after_loop; @@ -1821,7 +1783,7 @@ __htab_map_lookup_and_delete_batch(struct bpf_map *map, /* Note that since bucket_cnt > 0 here, it is implicit * that the locked was grabbed, so release it. 
*/ - htab_unlock_bucket(htab, b, batch, flags); + htab_unlock_bucket(b, flags); rcu_read_unlock(); bpf_enable_instrumentation(); kvfree(keys); @@ -1884,7 +1846,7 @@ __htab_map_lookup_and_delete_batch(struct bpf_map *map, dst_val += value_size; } - htab_unlock_bucket(htab, b, batch, flags); + htab_unlock_bucket(b, flags); locked = false; while (node_to_free) { From patchwork Mon Mar 3 15:23:00 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kumar Kartikeya Dwivedi X-Patchwork-Id: 13999038 X-Patchwork-Delegate: bpf@iogearbox.net Received: from mail-wm1-f68.google.com (mail-wm1-f68.google.com [209.85.128.68]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id DABAF2356A6; Mon, 3 Mar 2025 15:23:37 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.128.68 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1741015419; cv=none; b=UkqY6Px38VXsDl9draMxZwDGXYxh5xLxmdnQyzdSsvHKbmkh6ml0CILcYzOBm4J6nCD9YlVCWTg5YLxbVFu41LccP23cC3BqxSN4WQidkew9p7JoN2f0+YGFFxCUc/5A9Oodu0SLKNk3WQSX34ikCxYQafdVuAw/O7PIkHfs8oQ= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1741015419; c=relaxed/simple; bh=at8Ekv6S0aGbB9KEki4TPKeyEB0J4bUZMvZ4q+vMLKI=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=I0yiLLz68ziMo+7935BgbMBPZ6rmrLjWq4sDmLZrbSVdOTJSymLvrVML3sNaIPl8G/T9y5uqRkiqpo7JT7NZDuTS/VAR62xdAEi8QIrzGaGJR64CJjofSyoW872sTUhZecbLgH+/0xiNSAbuzrsW5uEMF0HvFrfhikytITUVzoA= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=MnFEf66B; arc=none smtp.client-ip=209.85.128.68 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="MnFEf66B" Received: by mail-wm1-f68.google.com with SMTP id 5b1f17b1804b1-43948021a45so42175175e9.1; Mon, 03 Mar 2025 07:23:37 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1741015416; x=1741620216; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=e8dD4/Kp++x7YqbDF8EB2j5tk6yfejIzgpWScMmcz5s=; b=MnFEf66BIAGpvy3dHyStWHajoA4i6JOty4nZJ8hAPBs88ToLIr+VOQ7Ox7Gr9Mzew9 sf9l9ABpVA+ORLagjSAmXSLAQsgOtRgfUyNmzcr+LzZZwA6ngnOwYGHEgEBILpkTl0Zz z15VpoblZVp2QMOBgzx+uHJdxtbEsyBxOFC9xOiIoyJPuHd+vtKSuzFY7d+CZv72EHKu Uu3D7jWZ1bvDNDEykdHnCMwcwIG5QRtUCLjywWNTNTwdksPMt1qd5JpxOLeMtKpTEVQz VCAp+vCFQQ7fDul2IIJmgfliHkPAZtL7LPn7xI2ISboX0X9DEBYo6/u19aTXjS+lDnKi o+4Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1741015416; x=1741620216; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=e8dD4/Kp++x7YqbDF8EB2j5tk6yfejIzgpWScMmcz5s=; b=BCk/fe2CvR4Lr/uhhm0vs6owyIGBgV4N4gp3asepA6zXxgNMLyPXNdkz+tU5aIZUKC KIexCJmb3TjRAmbwsP4qbZOPE4GMIzmpXNKlkcyOWg4pzdtQncrvu9SG9/aiBDO5xbQG 
XnuQ3mQlnwpgToaSuD5IeJp+XtodvAHGhVihJmEcznVQE9RsVPz/Mvb17EqeUyNdmHdr JUwZjnNSAaMTPTiofmr6QWlZ4u/PNo7DnMFLPAvNlUJ4OIGfSPGNi01MOKAMlP0epPZG RwGAzGViPje90BJNGnJhr1RAxkdE/HhhZR4mHTYwrYfvm/mprNTweBFwZSTkAB1ruU08 K/yw== X-Forwarded-Encrypted: i=1; AJvYcCU/QfailE7pUwLwtKDWZpRu6U84RoXrRJ8jxxlFwaFZhpdXib+k+ljHaVax5dYjY6XrxCPChf8JZLTYQWs=@vger.kernel.org X-Gm-Message-State: AOJu0Yw5XEcWsAhsqbOeRCjkVuFeBeXBsVlUV2vhfqMFpN/4EZOQ+Q7e XV+tY8rfJ9j+4itS70pq/J3Gd8p1ZBMNUYl6lnmbsS1iuuLlEoWlKktEqX6RwyA= X-Gm-Gg: ASbGncty5kCBBqkBHvKF2cPng1+7ZTIxPPBixbu8SMjxd5+F13OJCs2dMYEYuIBfRTW vh4EvyDjPirNtFbBpI6vm8BYc8W7YQlQCxrJJr/zAPt/AI3WKLPkvdBxrmQur0pUsvIoIhlicCH BuKnrq9XUXVfZJjwTEUwNiqkZNb7u/wCsms+LFLwvq+ApHa5XfbAuRo5a2fo0K9A6Owq9QOeWHE 4i9LcX+rof5iCIJUAFd2xQdx+HgD8D61IZcb1Uk/LA2Ya+ld+Ly+WOL/AVQOOmutWcUHbSyCLFk 0O9KKn9Sitzi14UEDUvza9GKgkPQ2bDInFM= X-Google-Smtp-Source: AGHT+IFM6Zq1v0jmMwXwYqRoD9oDMqR8dF4x86BBEAU0OdxfdV7k9GzF2/Hh85klPoaqsXWBJydz9g== X-Received: by 2002:a05:600c:138c:b0:439:5da7:8e0 with SMTP id 5b1f17b1804b1-43ba6710819mr131963315e9.16.1741015415569; Mon, 03 Mar 2025 07:23:35 -0800 (PST) Received: from localhost ([2a03:2880:31ff:52::]) by smtp.gmail.com with ESMTPSA id 5b1f17b1804b1-43aba5329besm191031045e9.15.2025.03.03.07.23.34 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 03 Mar 2025 07:23:34 -0800 (PST) From: Kumar Kartikeya Dwivedi To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org Cc: Linus Torvalds , Peter Zijlstra , Will Deacon , Waiman Long , Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , Martin KaFai Lau , Eduard Zingerman , "Paul E. McKenney" , Tejun Heo , Barret Rhoden , Josh Don , Dohyun Kim , linux-arm-kernel@lists.infradead.org, kkd@meta.com, kernel-team@meta.com Subject: [PATCH bpf-next v3 20/25] bpf: Convert percpu_freelist.c to rqspinlock Date: Mon, 3 Mar 2025 07:23:00 -0800 Message-ID: <20250303152305.3195648-21-memxor@gmail.com> X-Mailer: git-send-email 2.43.5 In-Reply-To: <20250303152305.3195648-1-memxor@gmail.com> References: <20250303152305.3195648-1-memxor@gmail.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=6720; h=from:subject; bh=at8Ekv6S0aGbB9KEki4TPKeyEB0J4bUZMvZ4q+vMLKI=; b=owEBbQKS/ZANAwAIAUzgyIZIvxHKAcsmYgBnxcWYU15SR6rg3ngy0KboyZkXB1I7iXAgTU+4Zv4a q8n+lJ+JAjMEAAEIAB0WIQRLvip+Buz51YI8YRFM4MiGSL8RygUCZ8XFmAAKCRBM4MiGSL8Ryp6rD/ 9okBvTAvR2PTgrgAzFgfUFBu77V0INCzcE3PwQfEwD6rcvTnQUIlryGruEBawLfwn0wxW24wRL/dRy nGL+yuYrEX9YWrs+Pqsz7o2vt7yJfza0S2lZUFT62qMjwUQg9BarfBMcmdWT9afoe0wpXqzQJZc6jh N6kk4VGDj6B7Px2V2Q0EmPXTnjMNmTYmW+3aIZ54OWksD6UxJnflDMq5xbEWu/GF4B5K4mbSvFMCII lU78oDBb2buz6I6WKt9BQukjzIuQSRCcH/r1FEJvzfoQ4A3TWk18Y1lxWqoLywUa4Ts7ff2I7yus+1 Zj/YRvGo//Lcbv35gHIqrjLfvvKXZaW20bcvjfIhytSshAaX3innu2cR2sqFFrVmcb9QwjjqIKzOCa 5FGA1xCE5jK6UkBwTmMKx4rg1e1fUi1vRIbBAjX86NNJ0GuOCdFjKuYocy4yxL30V9Ik680i0XeSlr QnkJNq/7VLZRNEjs54b4gDrXYDTwnRE6WzcfhV7ZQ4Ju0vanEBetGxwfUjmZM0M0ajtqIWGYnzgiFS R5C7BadAXVRz5lKKGAlQxBfZ6nS2erxfMxeR6kGXDbOMO6k1byllETTKxBijhGc5WApPvSUKF8aOUK sGKE2uv8eHTZLfSfjrYm32k98H6hP2O73mx941c2I3ASB3LXurQ/pSaQGQRQ== X-Developer-Key: i=memxor@gmail.com; a=openpgp; fpr=4BBE2A7E06ECF9D5823C61114CE0C88648BF11CA X-Patchwork-Delegate: bpf@iogearbox.net Convert the percpu_freelist.c code to use rqspinlock, and remove the extralist fallback and trylock-based acquisitions to avoid deadlocks. Key thing to note is the retained while (true) loop to search through other CPUs when failing to push a node due to locking errors. 
This retains the behavior of the old code, where it would keep trying until it would be able to successfully push the node back into the freelist of a CPU. Technically, we should start iteration for this loop from raw_smp_processor_id() + 1, but to avoid hitting the edge of nr_cpus, we skip execution in the loop body instead. Closes: https://lore.kernel.org/bpf/CAPPBnEa1_pZ6W24+WwtcNFvTUHTHO7KUmzEbOcMqxp+m2o15qQ@mail.gmail.com Closes: https://lore.kernel.org/bpf/CAPPBnEYm+9zduStsZaDnq93q1jPLqO-PiKX9jy0MuL8LCXmCrQ@mail.gmail.com Signed-off-by: Kumar Kartikeya Dwivedi --- kernel/bpf/percpu_freelist.c | 113 ++++++++--------------------------- kernel/bpf/percpu_freelist.h | 4 +- 2 files changed, 27 insertions(+), 90 deletions(-) diff --git a/kernel/bpf/percpu_freelist.c b/kernel/bpf/percpu_freelist.c index 034cf87b54e9..632762b57299 100644 --- a/kernel/bpf/percpu_freelist.c +++ b/kernel/bpf/percpu_freelist.c @@ -14,11 +14,9 @@ int pcpu_freelist_init(struct pcpu_freelist *s) for_each_possible_cpu(cpu) { struct pcpu_freelist_head *head = per_cpu_ptr(s->freelist, cpu); - raw_spin_lock_init(&head->lock); + raw_res_spin_lock_init(&head->lock); head->first = NULL; } - raw_spin_lock_init(&s->extralist.lock); - s->extralist.first = NULL; return 0; } @@ -34,58 +32,39 @@ static inline void pcpu_freelist_push_node(struct pcpu_freelist_head *head, WRITE_ONCE(head->first, node); } -static inline void ___pcpu_freelist_push(struct pcpu_freelist_head *head, +static inline bool ___pcpu_freelist_push(struct pcpu_freelist_head *head, struct pcpu_freelist_node *node) { - raw_spin_lock(&head->lock); - pcpu_freelist_push_node(head, node); - raw_spin_unlock(&head->lock); -} - -static inline bool pcpu_freelist_try_push_extra(struct pcpu_freelist *s, - struct pcpu_freelist_node *node) -{ - if (!raw_spin_trylock(&s->extralist.lock)) + if (raw_res_spin_lock(&head->lock)) return false; - - pcpu_freelist_push_node(&s->extralist, node); - raw_spin_unlock(&s->extralist.lock); + pcpu_freelist_push_node(head, node); + raw_res_spin_unlock(&head->lock); return true; } -static inline void ___pcpu_freelist_push_nmi(struct pcpu_freelist *s, - struct pcpu_freelist_node *node) +void __pcpu_freelist_push(struct pcpu_freelist *s, + struct pcpu_freelist_node *node) { - int cpu, orig_cpu; + struct pcpu_freelist_head *head; + int cpu; - orig_cpu = raw_smp_processor_id(); - while (1) { - for_each_cpu_wrap(cpu, cpu_possible_mask, orig_cpu) { - struct pcpu_freelist_head *head; + if (___pcpu_freelist_push(this_cpu_ptr(s->freelist), node)) + return; + while (true) { + for_each_cpu_wrap(cpu, cpu_possible_mask, raw_smp_processor_id()) { + if (cpu == raw_smp_processor_id()) + continue; head = per_cpu_ptr(s->freelist, cpu); - if (raw_spin_trylock(&head->lock)) { - pcpu_freelist_push_node(head, node); - raw_spin_unlock(&head->lock); - return; - } - } - - /* cannot lock any per cpu lock, try extralist */ - if (pcpu_freelist_try_push_extra(s, node)) + if (raw_res_spin_lock(&head->lock)) + continue; + pcpu_freelist_push_node(head, node); + raw_res_spin_unlock(&head->lock); return; + } } } -void __pcpu_freelist_push(struct pcpu_freelist *s, - struct pcpu_freelist_node *node) -{ - if (in_nmi()) - ___pcpu_freelist_push_nmi(s, node); - else - ___pcpu_freelist_push(this_cpu_ptr(s->freelist), node); -} - void pcpu_freelist_push(struct pcpu_freelist *s, struct pcpu_freelist_node *node) { @@ -120,71 +99,29 @@ void pcpu_freelist_populate(struct pcpu_freelist *s, void *buf, u32 elem_size, static struct pcpu_freelist_node *___pcpu_freelist_pop(struct 
pcpu_freelist *s) { + struct pcpu_freelist_node *node = NULL; struct pcpu_freelist_head *head; - struct pcpu_freelist_node *node; int cpu; for_each_cpu_wrap(cpu, cpu_possible_mask, raw_smp_processor_id()) { head = per_cpu_ptr(s->freelist, cpu); if (!READ_ONCE(head->first)) continue; - raw_spin_lock(&head->lock); + if (raw_res_spin_lock(&head->lock)) + continue; node = head->first; if (node) { WRITE_ONCE(head->first, node->next); - raw_spin_unlock(&head->lock); + raw_res_spin_unlock(&head->lock); return node; } - raw_spin_unlock(&head->lock); + raw_res_spin_unlock(&head->lock); } - - /* per cpu lists are all empty, try extralist */ - if (!READ_ONCE(s->extralist.first)) - return NULL; - raw_spin_lock(&s->extralist.lock); - node = s->extralist.first; - if (node) - WRITE_ONCE(s->extralist.first, node->next); - raw_spin_unlock(&s->extralist.lock); - return node; -} - -static struct pcpu_freelist_node * -___pcpu_freelist_pop_nmi(struct pcpu_freelist *s) -{ - struct pcpu_freelist_head *head; - struct pcpu_freelist_node *node; - int cpu; - - for_each_cpu_wrap(cpu, cpu_possible_mask, raw_smp_processor_id()) { - head = per_cpu_ptr(s->freelist, cpu); - if (!READ_ONCE(head->first)) - continue; - if (raw_spin_trylock(&head->lock)) { - node = head->first; - if (node) { - WRITE_ONCE(head->first, node->next); - raw_spin_unlock(&head->lock); - return node; - } - raw_spin_unlock(&head->lock); - } - } - - /* cannot pop from per cpu lists, try extralist */ - if (!READ_ONCE(s->extralist.first) || !raw_spin_trylock(&s->extralist.lock)) - return NULL; - node = s->extralist.first; - if (node) - WRITE_ONCE(s->extralist.first, node->next); - raw_spin_unlock(&s->extralist.lock); return node; } struct pcpu_freelist_node *__pcpu_freelist_pop(struct pcpu_freelist *s) { - if (in_nmi()) - return ___pcpu_freelist_pop_nmi(s); return ___pcpu_freelist_pop(s); } diff --git a/kernel/bpf/percpu_freelist.h b/kernel/bpf/percpu_freelist.h index 3c76553cfe57..914798b74967 100644 --- a/kernel/bpf/percpu_freelist.h +++ b/kernel/bpf/percpu_freelist.h @@ -5,15 +5,15 @@ #define __PERCPU_FREELIST_H__ #include #include +#include struct pcpu_freelist_head { struct pcpu_freelist_node *first; - raw_spinlock_t lock; + rqspinlock_t lock; }; struct pcpu_freelist { struct pcpu_freelist_head __percpu *freelist; - struct pcpu_freelist_head extralist; }; struct pcpu_freelist_node { From patchwork Mon Mar 3 15:23:01 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kumar Kartikeya Dwivedi X-Patchwork-Id: 13999039 X-Patchwork-Delegate: bpf@iogearbox.net Received: from mail-wm1-f65.google.com (mail-wm1-f65.google.com [209.85.128.65]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 124B42356D2; Mon, 3 Mar 2025 15:23:38 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.128.65 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1741015420; cv=none; b=WIDvM1dvgOKov/ZLHCFarZEB7FxAuzA695RQQM2TttIZV0PHNIZpPtlgxO4+bmj8l1eNrWVXpOiu/r02pImPRClIsE78txGWwggbdrvJa2fO24XGDe1TJiK7w27TewIvKEfcquiYBcoWNT4T1QyquofAPMlpUAGZcVLaGOXmvhw= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1741015420; c=relaxed/simple; bh=H+4fvDbeMKpg9kdHFVQF9EirB/DOLCaq4vd16HHmH14=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; 
b=eRXZjT9Wsj+gf5JjDh0q4/1XHX4w+3FU5soHhHtNwvfFj03mUF12/RV1+OWGqmUY+BimeM/uO0ZYA+DSy6rr5cKgyAtt78wWgOdZIO24nbelqogUjSezSoQOPMCoObkLfibHC/4pxQgCjCOvubIBz8r8RdmycXsEKF3hizVS7IQ= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=TkDugpBn; arc=none smtp.client-ip=209.85.128.65 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="TkDugpBn" Received: by mail-wm1-f65.google.com with SMTP id 5b1f17b1804b1-43bc0b8520cso7209845e9.1; Mon, 03 Mar 2025 07:23:38 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1741015417; x=1741620217; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=ign20YEmiqhrw2u8+y6jgpbQIECAjRoqEHqNv09q2OE=; b=TkDugpBnxQ1istzcieqRvVSmwrgdVo5Om7HfAezQOioKJ0ohdJVw6vK3TH/GKYQ9vZ suKW6m+VMMRGmnrEraNh8lB3EsUe29ly8uvn3qKSYtHbngTEgt0pWazTs4nhOPLWspGf Yz3tREYLMNvNmz1MacKAW2AyAdlE1SmvMwkfqB3LN8xduC9NDvm55rXPOS+WzGON3Nvi IL7NYCFoimHwB+T/c6+AUVbZ0yhKyInwJYzOdb6OLN3uICjSkiuc0vnS8LDHdSJvD8kg 6fW8dygGbm9BhztKxsjr8rNYudrj03vmOUzFcfDzcZ5bZHQ5lRrsYTgI6m2ca9Nw722o TLOA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1741015417; x=1741620217; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=ign20YEmiqhrw2u8+y6jgpbQIECAjRoqEHqNv09q2OE=; b=q1WxqiJ3yq+WqpC9F9QKI4cx37iCYpoW/PG0ieQIQel7sgwnwgfNxmxcS4P8Z/VsVG hOI0hnIpq4zZucr9RMVfQNuM5W5ixyRRD68gk+vjffnVebqpmOQk1CbbrvWz/zHsiFio 4dRlTA8V7otislpIjPWHSU6cN0ANX1xj0hZmF6qbueB9DPUxxkBH3xF8saBlvCNEZVp6 ZP51JRUWQDtHjE5bfI98D6py6Jg7umajk2THiQzIIvXQZN266oFJwjMdqgqFxpW8SsIk Yc8cFbwLJHvcJytxJSHfMuBreytx9Z8VU8rhSAtAejdL9DqiCJLlJeLCJGlGFls5ep+m TMvQ== X-Forwarded-Encrypted: i=1; AJvYcCWe1J8pLeA94aS61BYnxTQ76YUsjqOAoQ3tmxxBieKkR/dMcozTpqHddC3DtR38yjHrruh5wNHPosNpODU=@vger.kernel.org X-Gm-Message-State: AOJu0YywLsuCHhZxNDrT34+yRKJpkw7AK1Z6Hj8y8FYBAoFwmbCHdDBI uf8mB5Fl/g+Y+b+TByCxOhHFXu0hcdF7Sq3HnGXMVb1Ezg1+xXCnhA/RsOL8a1g= X-Gm-Gg: ASbGncsSAfjCkpCLZ3VPubugkjDjXk9Bl+z5ZOeIaKkpURkezU4AYMO70xEsBTw9/im JyRZrnj1ZKup7eOpV3NYZHA9IWP5rnrbKKXbHQA+SxVeNblS8IYA6KkdSnfTGYQRklEnJ9L2pGx dpjXwJi1hXcnWeUA4K10xs7BJ8sneal5BwApDDEaG3Mzy9eJx3qIPJfslMv2Tp4ok+U3KHYL9YD angsj34WKDC9C4LorwYjNG79Yybq+7HCoairO3zydD8oLgw3+QnACxMfEQwEhL0v/ZpLZ1aA4SL f10lkXXmSalk3xEp9nhXv/aD0jQ0aT3PzA== X-Google-Smtp-Source: AGHT+IGlleKe6Z0c4L94quE3jCAiGhg6NJQ9iC0AaFH2JLOqqs7KqiXhmqbOtJsTuHI4jJYfhIdEnw== X-Received: by 2002:a05:600c:a03:b0:439:9b19:9e2d with SMTP id 5b1f17b1804b1-43ba6702becmr131087705e9.16.1741015416963; Mon, 03 Mar 2025 07:23:36 -0800 (PST) Received: from localhost ([2a03:2880:31ff:7::]) by smtp.gmail.com with ESMTPSA id 5b1f17b1804b1-43bbfece041sm37242865e9.1.2025.03.03.07.23.36 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 03 Mar 2025 07:23:36 -0800 (PST) From: Kumar Kartikeya Dwivedi To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org Cc: Linus Torvalds , Peter Zijlstra , Will Deacon , Waiman Long , Alexei Starovoitov , Andrii Nakryiko , 
Daniel Borkmann , Martin KaFai Lau , Eduard Zingerman , "Paul E. McKenney" , Tejun Heo , Barret Rhoden , Josh Don , Dohyun Kim , linux-arm-kernel@lists.infradead.org, kkd@meta.com, kernel-team@meta.com Subject: [PATCH bpf-next v3 21/25] bpf: Convert lpm_trie.c to rqspinlock Date: Mon, 3 Mar 2025 07:23:01 -0800 Message-ID: <20250303152305.3195648-22-memxor@gmail.com> X-Mailer: git-send-email 2.43.5 In-Reply-To: <20250303152305.3195648-1-memxor@gmail.com> References: <20250303152305.3195648-1-memxor@gmail.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=3875; h=from:subject; bh=H+4fvDbeMKpg9kdHFVQF9EirB/DOLCaq4vd16HHmH14=; b=owEBbQKS/ZANAwAIAUzgyIZIvxHKAcsmYgBnxcWYzhMQMRB+v202mYygVFkG5s/se6GlCLJdFF8d 3MnU/7aJAjMEAAEIAB0WIQRLvip+Buz51YI8YRFM4MiGSL8RygUCZ8XFmAAKCRBM4MiGSL8Rym2eEA CegI2sLVgC2H3/ckPAgaXiYSYSB/uZkXAnww6a6gZc4jNT9kbnHO5KY2RaymVBLci89GDlmnX3v+wp WH5WovKA4m3qH12EjfliSemjg4XIf5M+9Ubag8HHRATXQ0yaW0/Qm4ssV5nbBuntkgRgngLkiyA5C8 6fjsJiKGdQIahwdn/67q1wa9oR78sDXdVF+AOJsCO2QuhaU5qq0dltH27b3gJqWKSyc1OZcC5r1l61 elIyx2qtvgu8OqcZZqaRnEqrlW0x0BpkkO7Dgku2PsEA2SH7M+1DMl5mqtd3J2A0MZrUOT6OjSXmg4 p9NU7L83xsKPB1L/pYutorTTzi6toA8506vLLki/zJPux9cMC5JYuSLqcpz0pb1Q9FKfd8pB/9TfSu pbO9LSlidNGUkw43mfpYyTaDXU14Ud+GC+0SLrUZPMF5UwGNMJuS7N5MfGYxZ+Cwmw5fKGhqPOl9iz 9LGmjGVn6Kv7wo6LX+4HFS3gLLUM/eXMg+u9WWdTUBvFK+ksxSCkHM5fghBlgZS02NQfzLCdz6uO07 4a4aGj9rfzYsf+X9gWldSoL3VrCmnfaj9BW6RTJpdYDI5j73EYdqt946jfnpVYlit0hY5aV7DI4ZiI +0MFK7IDXVAmFcgWYbJJ2fnguEXpNKxH7KrBlkDDpqfs578zqAeL7aGfqhUg== X-Developer-Key: i=memxor@gmail.com; a=openpgp; fpr=4BBE2A7E06ECF9D5823C61114CE0C88648BF11CA X-Patchwork-Delegate: bpf@iogearbox.net Convert all LPM trie usage of raw_spinlock to rqspinlock. Note that rcu_dereference_protected in trie_delete_elem is switched over to plain rcu_dereference, the RCU read lock should be held from BPF program side or eBPF syscall path, and the trie->lock is just acquired before the dereference. It is not clear the reason the protected variant was used from the commit history, but the above reasoning makes sense so switch over. 
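For context, the conversion below follows the simple pattern shown in this sketch (illustrative only, with a made-up function name): the irqsave variant of the resilient lock returns an error instead of spinning forever, and that error is propagated to the caller so the operation fails gracefully rather than proceeding without the lock held.

#include <asm-generic/rqspinlock.h>

/* Sketch of the error-propagation pattern applied to trie_update_elem()
 * and trie_delete_elem() in the diff below.
 */
static long example_modify(rqspinlock_t *lock)
{
	unsigned long flags;
	long ret;

	ret = raw_res_spin_lock_irqsave(lock, flags);
	if (ret)	/* timeout or deadlock detected: do not touch the structure */
		return ret;

	/* ... modify the data structure under the lock ... */

	raw_res_spin_unlock_irqrestore(lock, flags);
	return 0;
}
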
Closes: https://lore.kernel.org/lkml/000000000000adb08b061413919e@google.com Signed-off-by: Kumar Kartikeya Dwivedi --- kernel/bpf/lpm_trie.c | 25 ++++++++++++++----------- 1 file changed, 14 insertions(+), 11 deletions(-) diff --git a/kernel/bpf/lpm_trie.c b/kernel/bpf/lpm_trie.c index e8a772e64324..be66d7e520e0 100644 --- a/kernel/bpf/lpm_trie.c +++ b/kernel/bpf/lpm_trie.c @@ -15,6 +15,7 @@ #include #include #include +#include #include /* Intermediate node */ @@ -36,7 +37,7 @@ struct lpm_trie { size_t n_entries; size_t max_prefixlen; size_t data_size; - raw_spinlock_t lock; + rqspinlock_t lock; }; /* This trie implements a longest prefix match algorithm that can be used to @@ -342,7 +343,9 @@ static long trie_update_elem(struct bpf_map *map, if (!new_node) return -ENOMEM; - raw_spin_lock_irqsave(&trie->lock, irq_flags); + ret = raw_res_spin_lock_irqsave(&trie->lock, irq_flags); + if (ret) + goto out_free; new_node->prefixlen = key->prefixlen; RCU_INIT_POINTER(new_node->child[0], NULL); @@ -356,8 +359,7 @@ static long trie_update_elem(struct bpf_map *map, */ slot = &trie->root; - while ((node = rcu_dereference_protected(*slot, - lockdep_is_held(&trie->lock)))) { + while ((node = rcu_dereference(*slot))) { matchlen = longest_prefix_match(trie, node, key); if (node->prefixlen != matchlen || @@ -442,8 +444,8 @@ static long trie_update_elem(struct bpf_map *map, rcu_assign_pointer(*slot, im_node); out: - raw_spin_unlock_irqrestore(&trie->lock, irq_flags); - + raw_res_spin_unlock_irqrestore(&trie->lock, irq_flags); +out_free: if (ret) bpf_mem_cache_free(&trie->ma, new_node); bpf_mem_cache_free_rcu(&trie->ma, free_node); @@ -467,7 +469,9 @@ static long trie_delete_elem(struct bpf_map *map, void *_key) if (key->prefixlen > trie->max_prefixlen) return -EINVAL; - raw_spin_lock_irqsave(&trie->lock, irq_flags); + ret = raw_res_spin_lock_irqsave(&trie->lock, irq_flags); + if (ret) + return ret; /* Walk the tree looking for an exact key/length match and keeping * track of the path we traverse. 
We will need to know the node @@ -478,8 +482,7 @@ static long trie_delete_elem(struct bpf_map *map, void *_key) trim = &trie->root; trim2 = trim; parent = NULL; - while ((node = rcu_dereference_protected( - *trim, lockdep_is_held(&trie->lock)))) { + while ((node = rcu_dereference(*trim))) { matchlen = longest_prefix_match(trie, node, key); if (node->prefixlen != matchlen || @@ -543,7 +546,7 @@ static long trie_delete_elem(struct bpf_map *map, void *_key) free_node = node; out: - raw_spin_unlock_irqrestore(&trie->lock, irq_flags); + raw_res_spin_unlock_irqrestore(&trie->lock, irq_flags); bpf_mem_cache_free_rcu(&trie->ma, free_parent); bpf_mem_cache_free_rcu(&trie->ma, free_node); @@ -592,7 +595,7 @@ static struct bpf_map *trie_alloc(union bpf_attr *attr) offsetof(struct bpf_lpm_trie_key_u8, data); trie->max_prefixlen = trie->data_size * 8; - raw_spin_lock_init(&trie->lock); + raw_res_spin_lock_init(&trie->lock); /* Allocate intermediate and leaf nodes from the same allocator */ leaf_size = sizeof(struct lpm_trie_node) + trie->data_size + From patchwork Mon Mar 3 15:23:02 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kumar Kartikeya Dwivedi X-Patchwork-Id: 13999040 X-Patchwork-Delegate: bpf@iogearbox.net Received: from mail-wm1-f68.google.com (mail-wm1-f68.google.com [209.85.128.68]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 3C169236442; Mon, 3 Mar 2025 15:23:40 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.128.68 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1741015422; cv=none; b=C8Xt9BoRkZn01kTLtjeNAgm4YmsBijMeQpDfBGXNOORp7Ufnez6mnKlGC7Wf162iItbs/VmQD/BMoKMVIXM7uFvVHw3ZBPfvsFmRvt3o/ChGT6sfpuhiV8EyZKGH2adNX6c/HKnH1TY5zz+vy4D9sMSFMysrvieAM/QHCCrNoJ0= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1741015422; c=relaxed/simple; bh=S1zZ1+V5HXfDNJ9sHSSvG8gCJgMqlpTvJbjPruz8ZRU=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=kAdkvDynE0yoY21tC8qTOmeraaLweZGJ9O60D7jGci0ZPXAz/VK2H9qHeYPdIwSwZF0hwxxETbB2JH05C5atnKEScxtzCYDEHC5jXhu3FWzK1AMRtmvGDto+r7Ta99o9BDuRrQw/X7E6ZyogTJ75BrMyZ50rxk+EltStrfO/QXg= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=ArKuszSL; arc=none smtp.client-ip=209.85.128.68 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="ArKuszSL" Received: by mail-wm1-f68.google.com with SMTP id 5b1f17b1804b1-43bc31227ecso6734085e9.1; Mon, 03 Mar 2025 07:23:40 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1741015418; x=1741620218; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=ftGzO2s1hk5aoPF/5zj2qPXt6ZSl0PVETns25Hb5ycw=; b=ArKuszSLaiojNWM50p+lrlj2rYkb6lonEJgjF7+QNsXlWXSlicVRKd+5mH0pHJWl1F JXdbU64bGe6n7qXYycvL2uQyfWiTZXSJGZz3ANXtFSK6JsjQVylb5LXxM85Es1eSDDeO 
WKQLGk48ClITjHB9/ZksyOxLVP0UiT3dW3Ohry/MTgND0rrvPvrzhJ2gTR7lab21SUg+ vPV9cgf2frIapzjo971MfBYTk+qbO7HCByqlYPym/6m2jRNkaOy+25hPweNFo4821YTS SFItmqLpccOI59wRPoFZ1+F2meNrUTLpTO6TjIFHmP9FiRV0ArbDaIHJO8D7yK5wt6jP HPhw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1741015418; x=1741620218; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=ftGzO2s1hk5aoPF/5zj2qPXt6ZSl0PVETns25Hb5ycw=; b=tM+eKIL4gSd6Q9B20y97oMT1b1GDu+CyPdVLFBYerOrNr1sljhe2W6qTgp/L2auYXo nejL4G2x/mTOzdgMR21a80DVzYgcUJtktxkOAkmYlFXigk6V5NMheBeIpY3tZAskD4X8 IQ93LU2br25v/SXQc97eZzwdYT7QswFtJqO/qJu7Y2ykRAKN6JXjeZNI6CUPnUDq3DLT d4KgCLHdS3gqtgMBDT/mlXD0X5TC3GAtMhWiYf8UDguvGhY5My/fsHvKP2kP1UGyJrlc JTdQX108Qz8u2/DtVNuUW/6lRPTisSr7hM5eNUizj8s1daJ1CkfxLwmjLvTSlNcP7/hQ FGfg== X-Forwarded-Encrypted: i=1; AJvYcCUkzQ2+3lUMTs89UuayyrYLaTmjuf/KDtCvRqfTUIcEoESLyux9r4olVxbcNIKiklzddtaZ30I0IreFWw4=@vger.kernel.org X-Gm-Message-State: AOJu0Yx6QBop8TST4KX/+d1uHqqlRJ3Ksl+H+ReIkyWNtBfnhpHZdB6c t1shr/R06atMLYpPsv+PW6wb2sqd+7cwJX3sHVzdqj4m1831GVmqv7Es5yS/ZC4= X-Gm-Gg: ASbGncvkxdVf9QLy4xdaboZU2rOJ2c9UQ72oib2i6tNy7DlMCBs2y1CuOcBCLbbYE07 LcLmceLtNE8emHhOytfCK/nx8+/f/96BwnkirjZrnHdQCCYbaAFL0qMRIloYBbaOHfGqT/13YC2 dInupfBjZfoZxy6it2aJuw3oN9mn9MTPwn1frBewGGBvAiVNywMPYLDERRUMKGIBQi8REkFHIh5 d1uAS/7+W/x+mEWGCeoIQd1jyoJDWbxtH3mWKw2UbYO2hfhbAXIUWaHJUkt3g8SrxGUnHUBSF7b 88lP/vx0w0AV9DrHYF2rWCKooXI+v00brQ== X-Google-Smtp-Source: AGHT+IFUgzbddleUlZT0ehYk/Y59B0PJMzJQ++IN3Hp+ieXBjwlFjh5Q5E9wkdn7XokNgE0Z1HE0+g== X-Received: by 2002:a05:600c:190b:b0:439:955d:7ad9 with SMTP id 5b1f17b1804b1-43ba66fe855mr116664525e9.14.1741015418083; Mon, 03 Mar 2025 07:23:38 -0800 (PST) Received: from localhost ([2a03:2880:31ff:b::]) by smtp.gmail.com with ESMTPSA id 5b1f17b1804b1-43bc63bcaafsm23440385e9.28.2025.03.03.07.23.37 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 03 Mar 2025 07:23:37 -0800 (PST) From: Kumar Kartikeya Dwivedi To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org Cc: Linus Torvalds , Peter Zijlstra , Will Deacon , Waiman Long , Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , Martin KaFai Lau , Eduard Zingerman , "Paul E. 
McKenney" , Tejun Heo , Barret Rhoden , Josh Don , Dohyun Kim , linux-arm-kernel@lists.infradead.org, kkd@meta.com, kernel-team@meta.com Subject: [PATCH bpf-next v3 22/25] bpf: Introduce rqspinlock kfuncs Date: Mon, 3 Mar 2025 07:23:02 -0800 Message-ID: <20250303152305.3195648-23-memxor@gmail.com> X-Mailer: git-send-email 2.43.5 In-Reply-To: <20250303152305.3195648-1-memxor@gmail.com> References: <20250303152305.3195648-1-memxor@gmail.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=5071; h=from:subject; bh=S1zZ1+V5HXfDNJ9sHSSvG8gCJgMqlpTvJbjPruz8ZRU=; b=owEBbQKS/ZANAwAIAUzgyIZIvxHKAcsmYgBnxcWZSnBmS5ietCtuPE+4pmKWtJULnErXEc4IC9nN w/N0+8KJAjMEAAEIAB0WIQRLvip+Buz51YI8YRFM4MiGSL8RygUCZ8XFmQAKCRBM4MiGSL8RyvpbD/ 9n03epztxKLKXs5mq2JtSowibMlL/cxUOqIT/Er6iBycLJW6QrV68Gu4EXHFaJ9MZk5E3g2pzTcM3D JO8iLC+CLg/IwzMs+VQkPJBZpTbzrjX8NXMFOVdKOCPuTyBon/w6QQHN+dI0ms1DMMWkcHXpNgvkCp uHYwYc6JsQ4WUGaza1Vx3zZFXhgnXCN7o74lDcriARO1+lcB8Se5+bLCMqU6hcPK03ccyXEFk3RrYH c5lUO3C9t8W0edpQE1LWCGNLwrhhNivyOOOk7vLbdm4Exk01CJuRc5bDzvHgykGNsJg/UDR2EMGFIK 4uj6moeuOAxJcYm832dS62gCEz5OxXOr2LOg1Ee2XdP2obxhEA9pFLKdvbXo59xMfmioEC2XKuP0T3 WjHNHXhqtKaYrns10R6ilDge5CWxKksBPL30/9UkZDy9ERjZkfyD/SDxhmVdNy5ExqHKZCanzFCnMv iDoBAewmWR2hj287w0kBdw7CHaoPg9/HIyKMCv0MImIHZlzDasPBMAXnJaYyY0V7LM2fwr3F0UuOFF DrPYoPvey+vLaRgS1kIfcBQHd6v9z4fLn9JPWdc5ggzlJdRI3a8rLp2gDF14/LcQZJzK3T86HKBGUn Sv9e4xwbZoBil2FPHRU7/R9ghWKfT+cmbi6Iu1+TkNXFlGRHV+TPloE9KwMg== X-Developer-Key: i=memxor@gmail.com; a=openpgp; fpr=4BBE2A7E06ECF9D5823C61114CE0C88648BF11CA X-Patchwork-Delegate: bpf@iogearbox.net Introduce four new kfuncs, bpf_res_spin_lock, and bpf_res_spin_unlock, and their irqsave/irqrestore variants, which wrap the rqspinlock APIs. bpf_res_spin_lock returns a conditional result, depending on whether the lock was acquired (NULL is returned when lock acquisition succeeds, non-NULL upon failure). The memory pointed to by the returned pointer upon failure can be dereferenced after the NULL check to obtain the error code. Instead of using the old bpf_spin_lock type, introduce a new type with the same layout, and the same alignment, but a different name to avoid type confusion. Preemption is disabled upon successful lock acquisition, however IRQs are not. Special kfuncs can be introduced later to allow disabling IRQs when taking a spin lock. Resilient locks are safe against AA deadlocks, hence not disabling IRQs currently does not allow violation of kernel safety. __irq_flag annotation is used to accept IRQ flags for the IRQ-variants, with the same semantics as existing bpf_local_irq_{save, restore}. These kfuncs will require additional verifier-side support in subsequent commits, to allow programs to hold multiple locks at the same time. Signed-off-by: Kumar Kartikeya Dwivedi --- include/asm-generic/rqspinlock.h | 7 +++ include/linux/bpf.h | 1 + kernel/locking/rqspinlock.c | 78 ++++++++++++++++++++++++++++++++ 3 files changed, 86 insertions(+) diff --git a/include/asm-generic/rqspinlock.h b/include/asm-generic/rqspinlock.h index 418b652e0249..06906489d9ba 100644 --- a/include/asm-generic/rqspinlock.h +++ b/include/asm-generic/rqspinlock.h @@ -23,6 +23,13 @@ struct rqspinlock { }; }; +/* Even though this is same as struct rqspinlock, we need to emit a distinct + * type in BTF for BPF programs. 
+ */ +struct bpf_res_spin_lock { + u32 val; +}; + struct qspinlock; #ifdef CONFIG_QUEUED_SPINLOCKS typedef struct qspinlock rqspinlock_t; diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 4c4028d865ee..aa47e11371b3 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -30,6 +30,7 @@ #include #include #include +#include struct bpf_verifier_env; struct bpf_verifier_log; diff --git a/kernel/locking/rqspinlock.c b/kernel/locking/rqspinlock.c index 0031a1bfbd4e..0c53d36e2f6c 100644 --- a/kernel/locking/rqspinlock.c +++ b/kernel/locking/rqspinlock.c @@ -15,6 +15,8 @@ #include #include +#include +#include #include #include #include @@ -684,3 +686,79 @@ int __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val) EXPORT_SYMBOL(resilient_queued_spin_lock_slowpath); #endif /* CONFIG_QUEUED_SPINLOCKS */ + +__bpf_kfunc_start_defs(); + +#define REPORT_STR(ret) ({ ret == -ETIMEDOUT ? "Timeout detected" : "AA or ABBA deadlock detected"; }) + +__bpf_kfunc int bpf_res_spin_lock(struct bpf_res_spin_lock *lock) +{ + int ret; + + BUILD_BUG_ON(sizeof(rqspinlock_t) != sizeof(struct bpf_res_spin_lock)); + BUILD_BUG_ON(__alignof__(rqspinlock_t) != __alignof__(struct bpf_res_spin_lock)); + + preempt_disable(); + ret = res_spin_lock((rqspinlock_t *)lock); + if (unlikely(ret)) { + preempt_enable(); + rqspinlock_report_violation(REPORT_STR(ret), lock); + return ret; + } + return 0; +} + +__bpf_kfunc void bpf_res_spin_unlock(struct bpf_res_spin_lock *lock) +{ + res_spin_unlock((rqspinlock_t *)lock); + preempt_enable(); +} + +__bpf_kfunc int bpf_res_spin_lock_irqsave(struct bpf_res_spin_lock *lock, unsigned long *flags__irq_flag) +{ + u64 *ptr = (u64 *)flags__irq_flag; + unsigned long flags; + int ret; + + preempt_disable(); + local_irq_save(flags); + ret = res_spin_lock((rqspinlock_t *)lock); + if (unlikely(ret)) { + local_irq_restore(flags); + preempt_enable(); + rqspinlock_report_violation(REPORT_STR(ret), lock); + return ret; + } + *ptr = flags; + return 0; +} + +__bpf_kfunc void bpf_res_spin_unlock_irqrestore(struct bpf_res_spin_lock *lock, unsigned long *flags__irq_flag) +{ + u64 *ptr = (u64 *)flags__irq_flag; + unsigned long flags = *ptr; + + res_spin_unlock((rqspinlock_t *)lock); + local_irq_restore(flags); + preempt_enable(); +} + +__bpf_kfunc_end_defs(); + +BTF_KFUNCS_START(rqspinlock_kfunc_ids) +BTF_ID_FLAGS(func, bpf_res_spin_lock, KF_RET_NULL) +BTF_ID_FLAGS(func, bpf_res_spin_unlock) +BTF_ID_FLAGS(func, bpf_res_spin_lock_irqsave, KF_RET_NULL) +BTF_ID_FLAGS(func, bpf_res_spin_unlock_irqrestore) +BTF_KFUNCS_END(rqspinlock_kfunc_ids) + +static const struct btf_kfunc_id_set rqspinlock_kfunc_set = { + .owner = THIS_MODULE, + .set = &rqspinlock_kfunc_ids, +}; + +static __init int rqspinlock_register_kfuncs(void) +{ + return register_btf_kfunc_id_set(BPF_PROG_TYPE_UNSPEC, &rqspinlock_kfunc_set); +} +late_initcall(rqspinlock_register_kfuncs); From patchwork Mon Mar 3 15:23:03 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kumar Kartikeya Dwivedi X-Patchwork-Id: 13999043 X-Patchwork-Delegate: bpf@iogearbox.net Received: from mail-wm1-f68.google.com (mail-wm1-f68.google.com [209.85.128.68]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 1268923908C; Mon, 3 Mar 2025 15:23:41 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.128.68 ARC-Seal: i=1; 
a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1741015428; cv=none; b=iAlGwwjpOuoRml2fNcTHbUak4/UQyf7V4J9p8kk74u6gw2wzhk2T0xs4Fi9AYLhO0xF7XjHFOLBcJuGbC5IZR0A+WZHVy+IKgs2WqSM89rfaNKFIAlFZzu6nIwPyqBbxcyILWN3bNpxOdSwEuoQe4FbAsX4RcUq6ZRqg3s0TUIg= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1741015428; c=relaxed/simple; bh=MO7GJTSenU9MV6IQtiRHmjgSgn53O+a0eeEhZHGj470=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=WDp7mOY2bsn2F5NiXcxV7MCF7HTWvPMcCVbj0yS9rJbrBpoYBMUm/rjN/EcTmj/kxOIv8w5frPUbAIa5zmM1MEUwN+mh6IiUjtuc2kQ6Qqkfo6Zi2u89ho2aLJDVVouIne8tIL6UOLoKff8S5aSR4tiRocaBY3rIj/oEEgfYwsc= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=Pyh3rxiO; arc=none smtp.client-ip=209.85.128.68 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="Pyh3rxiO" Received: by mail-wm1-f68.google.com with SMTP id 5b1f17b1804b1-43bbd711eedso11284795e9.3; Mon, 03 Mar 2025 07:23:41 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1741015420; x=1741620220; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=+4h+ukFvLyEJPpccEjjwwvawjNKj/Xrf/LqUnWOEd4Y=; b=Pyh3rxiOQSnrOSxW+Dw79e5FHhpcNOwIwHrI6AgsAcAUvLXwm9yZgRxcZpuCY91+43 TzEjVTpE23gq2nn17b7jVNjvXOvbthW1i3TxylKmxJSODE/aowgTNH+rQlInt/hnkqSd +gOStsjjV219VoVErI2hEI3Hvx1cGEe7PbJdFkw+EetOHQ6AJes0/U+tuXHxKtUENd42 rYinUFUu9psfuy6lvgJWR6sWbLI/qwzZ9gkb1MTDAeCrCBpeFzXXhWUPh1Z5Lwq30s0m YQP82Ax5WWAbfwXwBesyqJdsKQ/Ho1Yd9sYsPjKoCVKZZwLqcoG+Pm4bu73c33euyp/W t8dQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1741015420; x=1741620220; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=+4h+ukFvLyEJPpccEjjwwvawjNKj/Xrf/LqUnWOEd4Y=; b=lOAwSw5oHa++4gpAAZZTzkRb/8qOJg4QRBF3szafaNJIIYmuK++mJP3jj+kgHVUiGf O8K/VLYchBYUH8zvo4zD2lxJf7eEQ3PGvM/vVUNatd1cRPOJFpNGbbb4EiXIjHhyadEU EGCeZ/ryJX0lsu6zOCfVJ2q3joBEldTDvkQs71LFKwTluSdpXxER38bZ4+LbElxjt6ao qwIgI0TJteeFVotO4CgNOh6E8WcsW9LkOHBxnrkfniq4RnNClZvvHSlSL+a7DctU1RVD wID0AA2miJwypDnfpunZ8EFKlJnFCM5F0BWlbvz6t5sgU27VZnlz46aDUlbcd3rVuicO u36w== X-Forwarded-Encrypted: i=1; AJvYcCVyE1UA2o+X2KGUmetSsE6Vgqa7YcSMr1BkkwdSFhR/F4ZPj5rBw3SJo7MRRh/Hp3Gy4E9EUaTovKvu2bg=@vger.kernel.org X-Gm-Message-State: AOJu0YxpEx2R/YnS5pGYAAAmY5n3myZFmeXDqm8sBbIDkiDdMQpqKsdJ dNmxGa8UGOpNdpWKJYLfEwW2eCltHX9Pidbs4fRi70hi8DWxKaN+INWP5GMF5/c= X-Gm-Gg: ASbGnctoBxHLlzyJx3iPIZ7Tyry5Z4KUZlOJM3EjPQ7HrexGulFGuaz/hDPd8U8bo6s W4mWAAlmtSrghJB6DcVYUNCLz+58vBKNeNmD7K5auUBG7MdHO+MVJduJkiijyK6MtUpj2xYzmKU zkqHU1Uida8VrFIYsqoSM3/S8nEb5ezXo1k7z7ivsORXECfSHm4rta/J8A7pBgnJSCPHDTElVJt Y7RiBUB/qlizYACQfKoC5w8yCRwWauPHxs+e43574QKMOW9ICbCij7VmFYInvWroGrfDv2YeQmC Z1l2zlHmv9pdV5XbiWzvVYHl8KlsaahGRYs= X-Google-Smtp-Source: AGHT+IEdAQLbk3XWgP5dxzDcU8Ji7pMN7/8gM3fi+1QJQlX4tiY+pFVN015rHmkdZZHHkiT9FlYiGA== X-Received: by 2002:a05:600c:a4b:b0:439:8185:4ad4 with SMTP id 
5b1f17b1804b1-43ba6747082mr109032045e9.27.1741015419506; Mon, 03 Mar 2025 07:23:39 -0800 (PST) Received: from localhost ([2a03:2880:31ff:5f::]) by smtp.gmail.com with ESMTPSA id ffacd0b85a97d-390e47b7c4fsm14905417f8f.52.2025.03.03.07.23.38 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 03 Mar 2025 07:23:38 -0800 (PST) From: Kumar Kartikeya Dwivedi To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org Cc: Eduard Zingerman , Linus Torvalds , Peter Zijlstra , Will Deacon , Waiman Long , Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , Martin KaFai Lau , "Paul E. McKenney" , Tejun Heo , Barret Rhoden , Josh Don , Dohyun Kim , linux-arm-kernel@lists.infradead.org, kkd@meta.com, kernel-team@meta.com Subject: [PATCH bpf-next v3 23/25] bpf: Implement verifier support for rqspinlock Date: Mon, 3 Mar 2025 07:23:03 -0800 Message-ID: <20250303152305.3195648-24-memxor@gmail.com> X-Mailer: git-send-email 2.43.5 In-Reply-To: <20250303152305.3195648-1-memxor@gmail.com> References: <20250303152305.3195648-1-memxor@gmail.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=28432; h=from:subject; bh=MO7GJTSenU9MV6IQtiRHmjgSgn53O+a0eeEhZHGj470=; b=owEBbQKS/ZANAwAIAUzgyIZIvxHKAcsmYgBnxcWZN6Fo9q/LgMjUiSwqqQrTQMGCUIN7wbtUmcfA +jv1xeaJAjMEAAEIAB0WIQRLvip+Buz51YI8YRFM4MiGSL8RygUCZ8XFmQAKCRBM4MiGSL8Ryk4iD/ 965nxBXYR4dDVLmBm4StFtOJqi96rYjPW14Nx0sXbgZqtredefGx6C9t3nso+ga5OFs6Yx+xMoDQRg YWbhYxO3Ah7+hOpG/d6L0s+xnArh3d/iZccG+ZZBrwtDmkTir95j6qzGzg5Yih5QvE+voTDuUY53Em Ksftg/Dy4mTbp9WeCZgzvl9cgXSCuOmkDBPQVStIqV0sD7Cc2djPPY5bziy8LAbFRzIg9fLWLDQsOP i/7Edl/jMnt26/1s8G2GQHZmONpUOMz1+EbKo5yUnb2FDMLMN3cWT77SMGKxAf3WGBuM8dh7aETlfc Qwa3IlRRmJlIT2V7aaq5wpCTglKdk7GcrwmBLAZwPa9E2OdnrsE0xbfgKwyadRTkKi/g+YDjuX62oK bxvAMKv1+mMNQ2R2NWpZSi77gVZSJN8/IMzAifanV8irD72FhR30XzZ6eKed/e6z7Q2b4Vh9CK6/X+ 5cTtfh+RGvbKXZ2/rpoirQMG6Mn8kwxoChEbw8Qg1FSoZt6b1R81aAfJyxksexQhZaz0g9/KRtau4E isCtpZec3tmqmAfuHSAs7azUWnmw/YBFVByhGB8eK5zHHNdoDBerhK5Et15S+Ds3zyUv2wL9MxvyPs McFMK4xYH+u7FbG+hORXqQ2cp/vKO8Oaanj5qSwrZI4qmZ63JRCdJp0VULrQ== X-Developer-Key: i=memxor@gmail.com; a=openpgp; fpr=4BBE2A7E06ECF9D5823C61114CE0C88648BF11CA X-Patchwork-Delegate: bpf@iogearbox.net Introduce verifier-side support for rqspinlock kfuncs. The first step is allowing bpf_res_spin_lock type to be defined in map values and allocated objects, so BTF-side is updated with a new BPF_RES_SPIN_LOCK field to recognize and validate. Any object cannot have both bpf_spin_lock and bpf_res_spin_lock, only one of them (and at most one of them per-object, like before) must be present. The bpf_res_spin_lock can also be used to protect objects that require lock protection for their kfuncs, like BPF rbtree and linked list. The verifier plumbing to simulate success and failure cases when calling the kfuncs is done by pushing a new verifier state to the verifier state stack which will verify the failure case upon calling the kfunc. The path where success is indicated creates all lock reference state and IRQ state (if necessary for irqsave variants). In the case of failure, the state clears the registers r0-r5, sets the return value, and skips kfunc processing, proceeding to the next instruction. When marking the return value for success case, the value is marked as 0, and for the failure case as [-MAX_ERRNO, -1]. 
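To make the success and failure branches concrete, here is a hedged sketch of how a program might use these kfuncs once this support lands; the kfunc names and struct bpf_res_spin_lock come from the previous patch, but the extern declaration style, map layout, section and program names are illustrative and not taken from this series.

// SPDX-License-Identifier: GPL-2.0
#include <vmlinux.h>
#include <bpf/bpf_helpers.h>

/* Illustrative declarations; the kernel-side signatures are added in the
 * rqspinlock kfunc patch, the __ksym extern style here is an assumption.
 */
extern int bpf_res_spin_lock(struct bpf_res_spin_lock *lock) __ksym;
extern void bpf_res_spin_unlock(struct bpf_res_spin_lock *lock) __ksym;

struct val_elem {
	struct bpf_res_spin_lock lock;
	u64 counter;
};

struct {
	__uint(type, BPF_MAP_TYPE_ARRAY);
	__uint(max_entries, 1);
	__type(key, int);
	__type(value, struct val_elem);
} counters SEC(".maps");

SEC("tc")
int inc_counter(struct __sk_buff *ctx)
{
	struct val_elem *v;
	int key = 0, ret;

	v = bpf_map_lookup_elem(&counters, &key);
	if (!v)
		return 0;

	ret = bpf_res_spin_lock(&v->lock);
	if (ret)		/* failure branch: lock not held, ret in [-MAX_ERRNO, -1] */
		return 0;

	v->counter++;		/* success branch: lock held, ret known to be 0 */
	bpf_res_spin_unlock(&v->lock);
	return 0;
}

char _license[] SEC("license") = "GPL";
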
Then, in the program, whenever user checks the return value as 'if (ret)' or 'if (ret < 0)' the verifier never traverses such branches for success cases, and would be aware that the lock is not held in such cases. We push the kfunc state in check_kfunc_call whenever rqspinlock kfuncs are invoked. We introduce a kfunc_class state to avoid mixing lock irqrestore kfuncs with IRQ state created by bpf_local_irq_save. With all this infrastructure, these kfuncs become usable in programs while satisfying all safety properties required by the kernel. Acked-by: Eduard Zingerman Signed-off-by: Kumar Kartikeya Dwivedi --- include/linux/bpf.h | 9 ++ include/linux/bpf_verifier.h | 16 ++- kernel/bpf/btf.c | 26 ++++- kernel/bpf/syscall.c | 6 +- kernel/bpf/verifier.c | 219 ++++++++++++++++++++++++++++------- 5 files changed, 231 insertions(+), 45 deletions(-) diff --git a/include/linux/bpf.h b/include/linux/bpf.h index aa47e11371b3..ad4468422770 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -205,6 +205,7 @@ enum btf_field_type { BPF_REFCOUNT = (1 << 9), BPF_WORKQUEUE = (1 << 10), BPF_UPTR = (1 << 11), + BPF_RES_SPIN_LOCK = (1 << 12), }; typedef void (*btf_dtor_kfunc_t)(void *); @@ -240,6 +241,7 @@ struct btf_record { u32 cnt; u32 field_mask; int spin_lock_off; + int res_spin_lock_off; int timer_off; int wq_off; int refcount_off; @@ -315,6 +317,8 @@ static inline const char *btf_field_type_name(enum btf_field_type type) switch (type) { case BPF_SPIN_LOCK: return "bpf_spin_lock"; + case BPF_RES_SPIN_LOCK: + return "bpf_res_spin_lock"; case BPF_TIMER: return "bpf_timer"; case BPF_WORKQUEUE: @@ -347,6 +351,8 @@ static inline u32 btf_field_type_size(enum btf_field_type type) switch (type) { case BPF_SPIN_LOCK: return sizeof(struct bpf_spin_lock); + case BPF_RES_SPIN_LOCK: + return sizeof(struct bpf_res_spin_lock); case BPF_TIMER: return sizeof(struct bpf_timer); case BPF_WORKQUEUE: @@ -377,6 +383,8 @@ static inline u32 btf_field_type_align(enum btf_field_type type) switch (type) { case BPF_SPIN_LOCK: return __alignof__(struct bpf_spin_lock); + case BPF_RES_SPIN_LOCK: + return __alignof__(struct bpf_res_spin_lock); case BPF_TIMER: return __alignof__(struct bpf_timer); case BPF_WORKQUEUE: @@ -420,6 +428,7 @@ static inline void bpf_obj_init_field(const struct btf_field *field, void *addr) case BPF_RB_ROOT: /* RB_ROOT_CACHED 0-inits, no need to do anything after memset */ case BPF_SPIN_LOCK: + case BPF_RES_SPIN_LOCK: case BPF_TIMER: case BPF_WORKQUEUE: case BPF_KPTR_UNREF: diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h index d338f2a96bba..269449363f78 100644 --- a/include/linux/bpf_verifier.h +++ b/include/linux/bpf_verifier.h @@ -115,6 +115,14 @@ struct bpf_reg_state { int depth:30; } iter; + /* For irq stack slots */ + struct { + enum { + IRQ_NATIVE_KFUNC, + IRQ_LOCK_KFUNC, + } kfunc_class; + } irq; + /* Max size from any of the above. */ struct { unsigned long raw1; @@ -255,9 +263,11 @@ struct bpf_reference_state { * default to pointer reference on zero initialization of a state. */ enum ref_state_type { - REF_TYPE_PTR = 1, - REF_TYPE_IRQ = 2, - REF_TYPE_LOCK = 3, + REF_TYPE_PTR = (1 << 1), + REF_TYPE_IRQ = (1 << 2), + REF_TYPE_LOCK = (1 << 3), + REF_TYPE_RES_LOCK = (1 << 4), + REF_TYPE_RES_LOCK_IRQ = (1 << 5), } type; /* Track each reference created with a unique id, even if the same * instruction creates the reference multiple times (eg, via CALL). 
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c index 519e3f5e9c10..f7a2bfb0c11a 100644 --- a/kernel/bpf/btf.c +++ b/kernel/bpf/btf.c @@ -3481,6 +3481,15 @@ static int btf_get_field_type(const struct btf *btf, const struct btf_type *var_ goto end; } } + if (field_mask & BPF_RES_SPIN_LOCK) { + if (!strcmp(name, "bpf_res_spin_lock")) { + if (*seen_mask & BPF_RES_SPIN_LOCK) + return -E2BIG; + *seen_mask |= BPF_RES_SPIN_LOCK; + type = BPF_RES_SPIN_LOCK; + goto end; + } + } if (field_mask & BPF_TIMER) { if (!strcmp(name, "bpf_timer")) { if (*seen_mask & BPF_TIMER) @@ -3659,6 +3668,7 @@ static int btf_find_field_one(const struct btf *btf, switch (field_type) { case BPF_SPIN_LOCK: + case BPF_RES_SPIN_LOCK: case BPF_TIMER: case BPF_WORKQUEUE: case BPF_LIST_NODE: @@ -3952,6 +3962,7 @@ struct btf_record *btf_parse_fields(const struct btf *btf, const struct btf_type return ERR_PTR(-ENOMEM); rec->spin_lock_off = -EINVAL; + rec->res_spin_lock_off = -EINVAL; rec->timer_off = -EINVAL; rec->wq_off = -EINVAL; rec->refcount_off = -EINVAL; @@ -3979,6 +3990,11 @@ struct btf_record *btf_parse_fields(const struct btf *btf, const struct btf_type /* Cache offset for faster lookup at runtime */ rec->spin_lock_off = rec->fields[i].offset; break; + case BPF_RES_SPIN_LOCK: + WARN_ON_ONCE(rec->spin_lock_off >= 0); + /* Cache offset for faster lookup at runtime */ + rec->res_spin_lock_off = rec->fields[i].offset; + break; case BPF_TIMER: WARN_ON_ONCE(rec->timer_off >= 0); /* Cache offset for faster lookup at runtime */ @@ -4022,9 +4038,15 @@ struct btf_record *btf_parse_fields(const struct btf *btf, const struct btf_type rec->cnt++; } + if (rec->spin_lock_off >= 0 && rec->res_spin_lock_off >= 0) { + ret = -EINVAL; + goto end; + } + /* bpf_{list_head, rb_node} require bpf_spin_lock */ if ((btf_record_has_field(rec, BPF_LIST_HEAD) || - btf_record_has_field(rec, BPF_RB_ROOT)) && rec->spin_lock_off < 0) { + btf_record_has_field(rec, BPF_RB_ROOT)) && + (rec->spin_lock_off < 0 && rec->res_spin_lock_off < 0)) { ret = -EINVAL; goto end; } @@ -5637,7 +5659,7 @@ btf_parse_struct_metas(struct bpf_verifier_log *log, struct btf *btf) type = &tab->types[tab->cnt]; type->btf_id = i; - record = btf_parse_fields(btf, t, BPF_SPIN_LOCK | BPF_LIST_HEAD | BPF_LIST_NODE | + record = btf_parse_fields(btf, t, BPF_SPIN_LOCK | BPF_RES_SPIN_LOCK | BPF_LIST_HEAD | BPF_LIST_NODE | BPF_RB_ROOT | BPF_RB_NODE | BPF_REFCOUNT | BPF_KPTR, t->size); /* The record cannot be unset, treat it as an error if so */ diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c index 57a438706215..5cf017e37d7d 100644 --- a/kernel/bpf/syscall.c +++ b/kernel/bpf/syscall.c @@ -665,6 +665,7 @@ void btf_record_free(struct btf_record *rec) case BPF_RB_ROOT: case BPF_RB_NODE: case BPF_SPIN_LOCK: + case BPF_RES_SPIN_LOCK: case BPF_TIMER: case BPF_REFCOUNT: case BPF_WORKQUEUE: @@ -717,6 +718,7 @@ struct btf_record *btf_record_dup(const struct btf_record *rec) case BPF_RB_ROOT: case BPF_RB_NODE: case BPF_SPIN_LOCK: + case BPF_RES_SPIN_LOCK: case BPF_TIMER: case BPF_REFCOUNT: case BPF_WORKQUEUE: @@ -794,6 +796,7 @@ void bpf_obj_free_fields(const struct btf_record *rec, void *obj) switch (fields[i].type) { case BPF_SPIN_LOCK: + case BPF_RES_SPIN_LOCK: break; case BPF_TIMER: bpf_timer_cancel_and_free(field_ptr); @@ -1229,7 +1232,7 @@ static int map_check_btf(struct bpf_map *map, struct bpf_token *token, return -EINVAL; map->record = btf_parse_fields(btf, value_type, - BPF_SPIN_LOCK | BPF_TIMER | BPF_KPTR | BPF_LIST_HEAD | + BPF_SPIN_LOCK | BPF_RES_SPIN_LOCK | BPF_TIMER | 
BPF_KPTR | BPF_LIST_HEAD | BPF_RB_ROOT | BPF_REFCOUNT | BPF_WORKQUEUE | BPF_UPTR, map->value_size); if (!IS_ERR_OR_NULL(map->record)) { @@ -1248,6 +1251,7 @@ static int map_check_btf(struct bpf_map *map, struct bpf_token *token, case 0: continue; case BPF_SPIN_LOCK: + case BPF_RES_SPIN_LOCK: if (map->map_type != BPF_MAP_TYPE_HASH && map->map_type != BPF_MAP_TYPE_ARRAY && map->map_type != BPF_MAP_TYPE_CGROUP_STORAGE && diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index eb1624f6e743..6c8ef72ee6bc 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -456,7 +456,7 @@ static bool subprog_is_exc_cb(struct bpf_verifier_env *env, int subprog) static bool reg_may_point_to_spin_lock(const struct bpf_reg_state *reg) { - return btf_record_has_field(reg_btf_record(reg), BPF_SPIN_LOCK); + return btf_record_has_field(reg_btf_record(reg), BPF_SPIN_LOCK | BPF_RES_SPIN_LOCK); } static bool type_is_rdonly_mem(u32 type) @@ -1148,7 +1148,8 @@ static int release_irq_state(struct bpf_verifier_state *state, int id); static int mark_stack_slot_irq_flag(struct bpf_verifier_env *env, struct bpf_kfunc_call_arg_meta *meta, - struct bpf_reg_state *reg, int insn_idx) + struct bpf_reg_state *reg, int insn_idx, + int kfunc_class) { struct bpf_func_state *state = func(env, reg); struct bpf_stack_state *slot; @@ -1170,6 +1171,7 @@ static int mark_stack_slot_irq_flag(struct bpf_verifier_env *env, st->type = PTR_TO_STACK; /* we don't have dedicated reg type */ st->live |= REG_LIVE_WRITTEN; st->ref_obj_id = id; + st->irq.kfunc_class = kfunc_class; for (i = 0; i < BPF_REG_SIZE; i++) slot->slot_type[i] = STACK_IRQ_FLAG; @@ -1178,7 +1180,8 @@ static int mark_stack_slot_irq_flag(struct bpf_verifier_env *env, return 0; } -static int unmark_stack_slot_irq_flag(struct bpf_verifier_env *env, struct bpf_reg_state *reg) +static int unmark_stack_slot_irq_flag(struct bpf_verifier_env *env, struct bpf_reg_state *reg, + int kfunc_class) { struct bpf_func_state *state = func(env, reg); struct bpf_stack_state *slot; @@ -1192,6 +1195,15 @@ static int unmark_stack_slot_irq_flag(struct bpf_verifier_env *env, struct bpf_r slot = &state->stack[spi]; st = &slot->spilled_ptr; + if (st->irq.kfunc_class != kfunc_class) { + const char *flag_kfunc = st->irq.kfunc_class == IRQ_NATIVE_KFUNC ? "native" : "lock"; + const char *used_kfunc = kfunc_class == IRQ_NATIVE_KFUNC ? "native" : "lock"; + + verbose(env, "irq flag acquired by %s kfuncs cannot be restored with %s kfuncs\n", + flag_kfunc, used_kfunc); + return -EINVAL; + } + err = release_irq_state(env->cur_state, st->ref_obj_id); WARN_ON_ONCE(err && err != -EACCES); if (err) { @@ -1602,7 +1614,7 @@ static struct bpf_reference_state *find_lock_state(struct bpf_verifier_state *st for (i = 0; i < state->acquired_refs; i++) { struct bpf_reference_state *s = &state->refs[i]; - if (s->type != type) + if (!(s->type & type)) continue; if (s->id == id && s->ptr == ptr) @@ -8063,6 +8075,12 @@ static int check_kfunc_mem_size_reg(struct bpf_verifier_env *env, struct bpf_reg return err; } +enum { + PROCESS_SPIN_LOCK = (1 << 0), + PROCESS_RES_LOCK = (1 << 1), + PROCESS_LOCK_IRQ = (1 << 2), +}; + /* Implementation details: * bpf_map_lookup returns PTR_TO_MAP_VALUE_OR_NULL. * bpf_obj_new returns PTR_TO_BTF_ID | MEM_ALLOC | PTR_MAYBE_NULL. @@ -8085,30 +8103,33 @@ static int check_kfunc_mem_size_reg(struct bpf_verifier_env *env, struct bpf_reg * env->cur_state->active_locks remembers which map value element or allocated * object got locked and clears it after bpf_spin_unlock. 
*/ -static int process_spin_lock(struct bpf_verifier_env *env, int regno, - bool is_lock) +static int process_spin_lock(struct bpf_verifier_env *env, int regno, int flags) { + bool is_lock = flags & PROCESS_SPIN_LOCK, is_res_lock = flags & PROCESS_RES_LOCK; + const char *lock_str = is_res_lock ? "bpf_res_spin" : "bpf_spin"; struct bpf_reg_state *regs = cur_regs(env), *reg = ®s[regno]; struct bpf_verifier_state *cur = env->cur_state; bool is_const = tnum_is_const(reg->var_off); + bool is_irq = flags & PROCESS_LOCK_IRQ; u64 val = reg->var_off.value; struct bpf_map *map = NULL; struct btf *btf = NULL; struct btf_record *rec; + u32 spin_lock_off; int err; if (!is_const) { verbose(env, - "R%d doesn't have constant offset. bpf_spin_lock has to be at the constant offset\n", - regno); + "R%d doesn't have constant offset. %s_lock has to be at the constant offset\n", + regno, lock_str); return -EINVAL; } if (reg->type == PTR_TO_MAP_VALUE) { map = reg->map_ptr; if (!map->btf) { verbose(env, - "map '%s' has to have BTF in order to use bpf_spin_lock\n", - map->name); + "map '%s' has to have BTF in order to use %s_lock\n", + map->name, lock_str); return -EINVAL; } } else { @@ -8116,36 +8137,53 @@ static int process_spin_lock(struct bpf_verifier_env *env, int regno, } rec = reg_btf_record(reg); - if (!btf_record_has_field(rec, BPF_SPIN_LOCK)) { - verbose(env, "%s '%s' has no valid bpf_spin_lock\n", map ? "map" : "local", - map ? map->name : "kptr"); + if (!btf_record_has_field(rec, is_res_lock ? BPF_RES_SPIN_LOCK : BPF_SPIN_LOCK)) { + verbose(env, "%s '%s' has no valid %s_lock\n", map ? "map" : "local", + map ? map->name : "kptr", lock_str); return -EINVAL; } - if (rec->spin_lock_off != val + reg->off) { - verbose(env, "off %lld doesn't point to 'struct bpf_spin_lock' that is at %d\n", - val + reg->off, rec->spin_lock_off); + spin_lock_off = is_res_lock ? 
rec->res_spin_lock_off : rec->spin_lock_off; + if (spin_lock_off != val + reg->off) { + verbose(env, "off %lld doesn't point to 'struct %s_lock' that is at %d\n", + val + reg->off, lock_str, spin_lock_off); return -EINVAL; } if (is_lock) { void *ptr; + int type; if (map) ptr = map; else ptr = btf; - if (cur->active_locks) { - verbose(env, - "Locking two bpf_spin_locks are not allowed\n"); - return -EINVAL; + if (!is_res_lock && cur->active_locks) { + if (find_lock_state(env->cur_state, REF_TYPE_LOCK, 0, NULL)) { + verbose(env, + "Locking two bpf_spin_locks are not allowed\n"); + return -EINVAL; + } + } else if (is_res_lock && cur->active_locks) { + if (find_lock_state(env->cur_state, REF_TYPE_RES_LOCK | REF_TYPE_RES_LOCK_IRQ, reg->id, ptr)) { + verbose(env, "Acquiring the same lock again, AA deadlock detected\n"); + return -EINVAL; + } } - err = acquire_lock_state(env, env->insn_idx, REF_TYPE_LOCK, reg->id, ptr); + + if (is_res_lock && is_irq) + type = REF_TYPE_RES_LOCK_IRQ; + else if (is_res_lock) + type = REF_TYPE_RES_LOCK; + else + type = REF_TYPE_LOCK; + err = acquire_lock_state(env, env->insn_idx, type, reg->id, ptr); if (err < 0) { verbose(env, "Failed to acquire lock state\n"); return err; } } else { void *ptr; + int type; if (map) ptr = map; @@ -8153,12 +8191,18 @@ static int process_spin_lock(struct bpf_verifier_env *env, int regno, ptr = btf; if (!cur->active_locks) { - verbose(env, "bpf_spin_unlock without taking a lock\n"); + verbose(env, "%s_unlock without taking a lock\n", lock_str); return -EINVAL; } - if (release_lock_state(env->cur_state, REF_TYPE_LOCK, reg->id, ptr)) { - verbose(env, "bpf_spin_unlock of different lock\n"); + if (is_res_lock && is_irq) + type = REF_TYPE_RES_LOCK_IRQ; + else if (is_res_lock) + type = REF_TYPE_RES_LOCK; + else + type = REF_TYPE_LOCK; + if (release_lock_state(cur, type, reg->id, ptr)) { + verbose(env, "%s_unlock of different lock\n", lock_str); return -EINVAL; } @@ -9484,11 +9528,11 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 arg, return -EACCES; } if (meta->func_id == BPF_FUNC_spin_lock) { - err = process_spin_lock(env, regno, true); + err = process_spin_lock(env, regno, PROCESS_SPIN_LOCK); if (err) return err; } else if (meta->func_id == BPF_FUNC_spin_unlock) { - err = process_spin_lock(env, regno, false); + err = process_spin_lock(env, regno, 0); if (err) return err; } else { @@ -11370,7 +11414,7 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn regs[BPF_REG_0].map_uid = meta.map_uid; regs[BPF_REG_0].type = PTR_TO_MAP_VALUE | ret_flag; if (!type_may_be_null(ret_flag) && - btf_record_has_field(meta.map_ptr->record, BPF_SPIN_LOCK)) { + btf_record_has_field(meta.map_ptr->record, BPF_SPIN_LOCK | BPF_RES_SPIN_LOCK)) { regs[BPF_REG_0].id = ++env->id_gen; } break; @@ -11542,10 +11586,10 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn /* mark_btf_func_reg_size() is used when the reg size is determined by * the BTF func_proto's return value size and argument. 
*/ -static void mark_btf_func_reg_size(struct bpf_verifier_env *env, u32 regno, - size_t reg_size) +static void __mark_btf_func_reg_size(struct bpf_verifier_env *env, struct bpf_reg_state *regs, + u32 regno, size_t reg_size) { - struct bpf_reg_state *reg = &cur_regs(env)[regno]; + struct bpf_reg_state *reg = ®s[regno]; if (regno == BPF_REG_0) { /* Function return value */ @@ -11563,6 +11607,12 @@ static void mark_btf_func_reg_size(struct bpf_verifier_env *env, u32 regno, } } +static void mark_btf_func_reg_size(struct bpf_verifier_env *env, u32 regno, + size_t reg_size) +{ + return __mark_btf_func_reg_size(env, cur_regs(env), regno, reg_size); +} + static bool is_kfunc_acquire(struct bpf_kfunc_call_arg_meta *meta) { return meta->kfunc_flags & KF_ACQUIRE; @@ -11700,6 +11750,7 @@ enum { KF_ARG_RB_ROOT_ID, KF_ARG_RB_NODE_ID, KF_ARG_WORKQUEUE_ID, + KF_ARG_RES_SPIN_LOCK_ID, }; BTF_ID_LIST(kf_arg_btf_ids) @@ -11709,6 +11760,7 @@ BTF_ID(struct, bpf_list_node) BTF_ID(struct, bpf_rb_root) BTF_ID(struct, bpf_rb_node) BTF_ID(struct, bpf_wq) +BTF_ID(struct, bpf_res_spin_lock) static bool __is_kfunc_ptr_arg_type(const struct btf *btf, const struct btf_param *arg, int type) @@ -11757,6 +11809,11 @@ static bool is_kfunc_arg_wq(const struct btf *btf, const struct btf_param *arg) return __is_kfunc_ptr_arg_type(btf, arg, KF_ARG_WORKQUEUE_ID); } +static bool is_kfunc_arg_res_spin_lock(const struct btf *btf, const struct btf_param *arg) +{ + return __is_kfunc_ptr_arg_type(btf, arg, KF_ARG_RES_SPIN_LOCK_ID); +} + static bool is_kfunc_arg_callback(struct bpf_verifier_env *env, const struct btf *btf, const struct btf_param *arg) { @@ -11828,6 +11885,7 @@ enum kfunc_ptr_arg_type { KF_ARG_PTR_TO_MAP, KF_ARG_PTR_TO_WORKQUEUE, KF_ARG_PTR_TO_IRQ_FLAG, + KF_ARG_PTR_TO_RES_SPIN_LOCK, }; enum special_kfunc_type { @@ -11866,6 +11924,10 @@ enum special_kfunc_type { KF_bpf_iter_num_destroy, KF_bpf_set_dentry_xattr, KF_bpf_remove_dentry_xattr, + KF_bpf_res_spin_lock, + KF_bpf_res_spin_unlock, + KF_bpf_res_spin_lock_irqsave, + KF_bpf_res_spin_unlock_irqrestore, }; BTF_SET_START(special_kfunc_set) @@ -11955,6 +12017,10 @@ BTF_ID(func, bpf_remove_dentry_xattr) BTF_ID_UNUSED BTF_ID_UNUSED #endif +BTF_ID(func, bpf_res_spin_lock) +BTF_ID(func, bpf_res_spin_unlock) +BTF_ID(func, bpf_res_spin_lock_irqsave) +BTF_ID(func, bpf_res_spin_unlock_irqrestore) static bool is_kfunc_ret_null(struct bpf_kfunc_call_arg_meta *meta) { @@ -12048,6 +12114,9 @@ get_kfunc_ptr_arg_type(struct bpf_verifier_env *env, if (is_kfunc_arg_irq_flag(meta->btf, &args[argno])) return KF_ARG_PTR_TO_IRQ_FLAG; + if (is_kfunc_arg_res_spin_lock(meta->btf, &args[argno])) + return KF_ARG_PTR_TO_RES_SPIN_LOCK; + if ((base_type(reg->type) == PTR_TO_BTF_ID || reg2btf_ids[base_type(reg->type)])) { if (!btf_type_is_struct(ref_t)) { verbose(env, "kernel function %s args#%d pointer type %s %s is not supported\n", @@ -12155,13 +12224,19 @@ static int process_irq_flag(struct bpf_verifier_env *env, int regno, struct bpf_kfunc_call_arg_meta *meta) { struct bpf_reg_state *regs = cur_regs(env), *reg = ®s[regno]; + int err, kfunc_class = IRQ_NATIVE_KFUNC; bool irq_save; - int err; - if (meta->func_id == special_kfunc_list[KF_bpf_local_irq_save]) { + if (meta->func_id == special_kfunc_list[KF_bpf_local_irq_save] || + meta->func_id == special_kfunc_list[KF_bpf_res_spin_lock_irqsave]) { irq_save = true; - } else if (meta->func_id == special_kfunc_list[KF_bpf_local_irq_restore]) { + if (meta->func_id == special_kfunc_list[KF_bpf_res_spin_lock_irqsave]) + kfunc_class = IRQ_LOCK_KFUNC; + } 
else if (meta->func_id == special_kfunc_list[KF_bpf_local_irq_restore] || + meta->func_id == special_kfunc_list[KF_bpf_res_spin_unlock_irqrestore]) { irq_save = false; + if (meta->func_id == special_kfunc_list[KF_bpf_res_spin_unlock_irqrestore]) + kfunc_class = IRQ_LOCK_KFUNC; } else { verbose(env, "verifier internal error: unknown irq flags kfunc\n"); return -EFAULT; @@ -12177,7 +12252,7 @@ static int process_irq_flag(struct bpf_verifier_env *env, int regno, if (err) return err; - err = mark_stack_slot_irq_flag(env, meta, reg, env->insn_idx); + err = mark_stack_slot_irq_flag(env, meta, reg, env->insn_idx, kfunc_class); if (err) return err; } else { @@ -12191,7 +12266,7 @@ static int process_irq_flag(struct bpf_verifier_env *env, int regno, if (err) return err; - err = unmark_stack_slot_irq_flag(env, reg); + err = unmark_stack_slot_irq_flag(env, reg, kfunc_class); if (err) return err; } @@ -12318,7 +12393,8 @@ static int check_reg_allocation_locked(struct bpf_verifier_env *env, struct bpf_ if (!env->cur_state->active_locks) return -EINVAL; - s = find_lock_state(env->cur_state, REF_TYPE_LOCK, id, ptr); + s = find_lock_state(env->cur_state, REF_TYPE_LOCK | REF_TYPE_RES_LOCK | REF_TYPE_RES_LOCK_IRQ, + id, ptr); if (!s) { verbose(env, "held lock and object are not in the same allocation\n"); return -EINVAL; @@ -12354,9 +12430,18 @@ static bool is_bpf_graph_api_kfunc(u32 btf_id) btf_id == special_kfunc_list[KF_bpf_refcount_acquire_impl]; } +static bool is_bpf_res_spin_lock_kfunc(u32 btf_id) +{ + return btf_id == special_kfunc_list[KF_bpf_res_spin_lock] || + btf_id == special_kfunc_list[KF_bpf_res_spin_unlock] || + btf_id == special_kfunc_list[KF_bpf_res_spin_lock_irqsave] || + btf_id == special_kfunc_list[KF_bpf_res_spin_unlock_irqrestore]; +} + static bool kfunc_spin_allowed(u32 btf_id) { - return is_bpf_graph_api_kfunc(btf_id) || is_bpf_iter_num_api_kfunc(btf_id); + return is_bpf_graph_api_kfunc(btf_id) || is_bpf_iter_num_api_kfunc(btf_id) || + is_bpf_res_spin_lock_kfunc(btf_id); } static bool is_sync_callback_calling_kfunc(u32 btf_id) @@ -12788,6 +12873,7 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_ case KF_ARG_PTR_TO_CONST_STR: case KF_ARG_PTR_TO_WORKQUEUE: case KF_ARG_PTR_TO_IRQ_FLAG: + case KF_ARG_PTR_TO_RES_SPIN_LOCK: break; default: WARN_ON_ONCE(1); @@ -13086,6 +13172,28 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_ if (ret < 0) return ret; break; + case KF_ARG_PTR_TO_RES_SPIN_LOCK: + { + int flags = PROCESS_RES_LOCK; + + if (reg->type != PTR_TO_MAP_VALUE && reg->type != (PTR_TO_BTF_ID | MEM_ALLOC)) { + verbose(env, "arg#%d doesn't point to map value or allocated object\n", i); + return -EINVAL; + } + + if (!is_bpf_res_spin_lock_kfunc(meta->func_id)) + return -EFAULT; + if (meta->func_id == special_kfunc_list[KF_bpf_res_spin_lock] || + meta->func_id == special_kfunc_list[KF_bpf_res_spin_lock_irqsave]) + flags |= PROCESS_SPIN_LOCK; + if (meta->func_id == special_kfunc_list[KF_bpf_res_spin_lock_irqsave] || + meta->func_id == special_kfunc_list[KF_bpf_res_spin_unlock_irqrestore]) + flags |= PROCESS_LOCK_IRQ; + ret = process_spin_lock(env, regno, flags); + if (ret < 0) + return ret; + break; + } } } @@ -13171,6 +13279,33 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn, insn_aux->is_iter_next = is_iter_next_kfunc(&meta); + if (!insn->off && + (insn->imm == special_kfunc_list[KF_bpf_res_spin_lock] || + insn->imm == special_kfunc_list[KF_bpf_res_spin_lock_irqsave])) { + struct 
bpf_verifier_state *branch; + struct bpf_reg_state *regs; + + branch = push_stack(env, env->insn_idx + 1, env->insn_idx, false); + if (!branch) { + verbose(env, "failed to push state for failed lock acquisition\n"); + return -ENOMEM; + } + + regs = branch->frame[branch->curframe]->regs; + + /* Clear r0-r5 registers in forked state */ + for (i = 0; i < CALLER_SAVED_REGS; i++) + mark_reg_not_init(env, regs, caller_saved[i]); + + mark_reg_unknown(env, regs, BPF_REG_0); + err = __mark_reg_s32_range(env, regs, BPF_REG_0, -MAX_ERRNO, -1); + if (err) { + verbose(env, "failed to mark s32 range for retval in forked state for lock\n"); + return err; + } + __mark_btf_func_reg_size(env, regs, BPF_REG_0, sizeof(u32)); + } + if (is_kfunc_destructive(&meta) && !capable(CAP_SYS_BOOT)) { verbose(env, "destructive kfunc calls require CAP_SYS_BOOT capability\n"); return -EACCES; @@ -13341,6 +13476,9 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn, if (btf_type_is_scalar(t)) { mark_reg_unknown(env, regs, BPF_REG_0); + if (meta.btf == btf_vmlinux && (meta.func_id == special_kfunc_list[KF_bpf_res_spin_lock] || + meta.func_id == special_kfunc_list[KF_bpf_res_spin_lock_irqsave])) + __mark_reg_const_zero(env, ®s[BPF_REG_0]); mark_btf_func_reg_size(env, BPF_REG_0, t->size); } else if (btf_type_is_ptr(t)) { ptr_type = btf_type_skip_modifiers(desc_btf, t->type, &ptr_type_id); @@ -18275,7 +18413,8 @@ static bool stacksafe(struct bpf_verifier_env *env, struct bpf_func_state *old, case STACK_IRQ_FLAG: old_reg = &old->stack[spi].spilled_ptr; cur_reg = &cur->stack[spi].spilled_ptr; - if (!check_ids(old_reg->ref_obj_id, cur_reg->ref_obj_id, idmap)) + if (!check_ids(old_reg->ref_obj_id, cur_reg->ref_obj_id, idmap) || + old_reg->irq.kfunc_class != cur_reg->irq.kfunc_class) return false; break; case STACK_MISC: @@ -18319,6 +18458,8 @@ static bool refsafe(struct bpf_verifier_state *old, struct bpf_verifier_state *c case REF_TYPE_IRQ: break; case REF_TYPE_LOCK: + case REF_TYPE_RES_LOCK: + case REF_TYPE_RES_LOCK_IRQ: if (old->refs[i].ptr != cur->refs[i].ptr) return false; break; @@ -19641,7 +19782,7 @@ static int check_map_prog_compatibility(struct bpf_verifier_env *env, } } - if (btf_record_has_field(map->record, BPF_SPIN_LOCK)) { + if (btf_record_has_field(map->record, BPF_SPIN_LOCK | BPF_RES_SPIN_LOCK)) { if (prog_type == BPF_PROG_TYPE_SOCKET_FILTER) { verbose(env, "socket filter progs cannot use bpf_spin_lock yet\n"); return -EINVAL; From patchwork Mon Mar 3 15:23:04 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kumar Kartikeya Dwivedi X-Patchwork-Id: 13999041 X-Patchwork-Delegate: bpf@iogearbox.net Received: from mail-wr1-f68.google.com (mail-wr1-f68.google.com [209.85.221.68]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 5DC9B23909C; Mon, 3 Mar 2025 15:23:43 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.221.68 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1741015426; cv=none; b=fv5mVwBTXsoj7nZHw+quwegW+ziYjxHOt07vah5u51YcMZTIvBgKgEdWo4blwkdAl9RgKijtSFst23u8KQW/oA5rkcs8MGQDkN7rJsDd1WsYeSL3x3eo77FPSON7coh3TNeY53mkJPYlNJX61zdaNgnxziwLfJHRLL47GJW0qgs= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1741015426; c=relaxed/simple; bh=yd3gRpKNHABqLGxFwmo4qitDCfszyenvxRvQDwJYTjI=; 
h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=LkjQokhZaovb6BWZwvXOJeLKxjCi1DO4nUP7bVqy96w62XyaaRFnZYwHuBOTyWHZSzZNIEuUFxVpVYxQ5q22NJHl2M4rJsareYbEK+JwqFJBkDHFIohkzo9A+dlRkgQlajLybYurzKXHoktSXZYjSTO5dukZGBm2QqjnoncB53s= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=gDl/zvub; arc=none smtp.client-ip=209.85.221.68 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="gDl/zvub" Received: by mail-wr1-f68.google.com with SMTP id ffacd0b85a97d-390effd3e85so3257382f8f.0; Mon, 03 Mar 2025 07:23:43 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1741015421; x=1741620221; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=JK06538PoT0hdERtpqSLZ9dXgVnOMpofNScplUny+Pg=; b=gDl/zvub9tYQEpfE8tziR5SbZ72TGGOg4J6glV0DxW4iF/MreCWZfVVmaE+XujCFvK j1ZjIPSTdiBCQcRndtg7lE2Kd3strHDyH8lmMQAbQm1pHNI4tFa5mqfGfiPN0YZyB4ur NLpAx8zHe/Q06Z6yPXfXCXq5tD7wFXKc/LWtdwgyUYucDeGRcZ7Gf31GS5sk6HS4AqLI m+WPDU7s+1CfqIwaOZ/HmWjvGTjRcmMIkyGsN+rX0bRNtWSPOxKl1XZLle3oK1tOmEoA gyV1YKm7fk2xJaKvVfzaDYLCyjWUh1ody+tlfrf6RE7DsdU9Ak3eN/uE9A8JLP+qodCi pGog== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1741015421; x=1741620221; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=JK06538PoT0hdERtpqSLZ9dXgVnOMpofNScplUny+Pg=; b=pAeG0yzcg7MdoUyRp5M1urStVy30ct4qDrIgOyES5r07axv2J9G7cip5/ME0vdco5Z 8z4NbQwZBFBgq0gJxC7YOw+LCpo7x2VFNVKzkIZ4GaJrrYuuoqkAgvTI2g0O94MaePSX XUxq3C+tukQ2Gs6VNUY6w+6H92kEEFS9rNbwdpRl+rNfJkiCAglHFZQVMvpMwlICLXj+ N5kqzjE2JUYfaC3ncthVpldI8O33GWs6e8xg0ACm+D/gGIMIJ0Camh190iEBlkdKptwQ owuDq3QhPlIVqt5CMD5SSRblYg0wARAGJD25dll2nKpBuu0tk0r5nDjNeFVA3Nt0GM9m Uy/Q== X-Forwarded-Encrypted: i=1; AJvYcCXfSg8NQxqF+i754jFCvUFbWjCsMZiGpnHCWCAnQ4AlK+HQPhjYuWSxtJOdlnhCAM7GeljPo7SRTEVDYPw=@vger.kernel.org X-Gm-Message-State: AOJu0YyOOtQcKKLxasqsnX+rqof2Iz1CRmdfA3XfJSYvSoPm0DH8U0Ce XpRUnw/aKDJ1tu+SKltGxQcofRqzK6gH7now7Rw/DuTzmg/Vt2aNlwrn+cwXXWE= X-Gm-Gg: ASbGncvV4XMuxuAouIb4Q71ZcbwcymYwIHHAs7CZh8nuLcgYQV/orq9yMMbOrX+4LaW xitjdPIcAj86dZGzskj/ABRpm24Uh9pmekx9QsrKGeEgNiSVtt9EzeF6I3GHpIC9C2yFHrnQhQB KeAyMvrEbhYulwdEPyuiWxi7glOvuTBy7a7IIr365+uc2B1wIRYbLeAg4YocrK6Xr2cjNDXthrK vNh92aHMI/IAYQJ0/pkq7vj08bEypnp3+ning44bA86l46UQShgEn7c9iWLRQvVRV/4YirdmSuG GpiqTPgTsOJHMOo88ekNVF18tJlr+DmrS0M= X-Google-Smtp-Source: AGHT+IEWrOLXEnoxplL0eXEi/SpthtQwbZ6y7pKL2e59tZ8s3JQadQxULhRUvfXdSLXbDwyc9fQvQw== X-Received: by 2002:a5d:5f42:0:b0:391:5f:fa4e with SMTP id ffacd0b85a97d-391005ffe69mr4807658f8f.29.1741015421039; Mon, 03 Mar 2025 07:23:41 -0800 (PST) Received: from localhost ([2a03:2880:31ff:74::]) by smtp.gmail.com with ESMTPSA id ffacd0b85a97d-390e47a7868sm14700001f8f.24.2025.03.03.07.23.40 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 03 Mar 2025 07:23:40 -0800 (PST) From: Kumar Kartikeya Dwivedi To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org Cc: Linus Torvalds , Peter 
Zijlstra , Will Deacon , Waiman Long , Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , Martin KaFai Lau , Eduard Zingerman , "Paul E. McKenney" , Tejun Heo , Barret Rhoden , Josh Don , Dohyun Kim , linux-arm-kernel@lists.infradead.org, kkd@meta.com, kernel-team@meta.com Subject: [PATCH bpf-next v3 24/25] bpf: Maintain FIFO property for rqspinlock unlock Date: Mon, 3 Mar 2025 07:23:04 -0800 Message-ID: <20250303152305.3195648-25-memxor@gmail.com> X-Mailer: git-send-email 2.43.5 In-Reply-To: <20250303152305.3195648-1-memxor@gmail.com> References: <20250303152305.3195648-1-memxor@gmail.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=4773; h=from:subject; bh=yd3gRpKNHABqLGxFwmo4qitDCfszyenvxRvQDwJYTjI=; b=owEBbQKS/ZANAwAIAUzgyIZIvxHKAcsmYgBnxcWZ7k4vMGesNz3gUHT6LOfeGtXRbGBf+aKUvuv/ umGpqdqJAjMEAAEIAB0WIQRLvip+Buz51YI8YRFM4MiGSL8RygUCZ8XFmQAKCRBM4MiGSL8RygluD/ 4uWKUHAjRffEJ86xP+I5MHqRuOHQ9jEAhSSdV7VPagDzTfZS8/JrswTTGLd4rS2xgwcFBwODkZ2aBs ayN+aLN1VPfzRRauqqYTO74IQCjSQVlRfrUc7QPeFDKF5iD9QPiLL3exq4eW9AGB7e+PuYaWH0wp3Q Wo/2jS/gXUhBTx2P5Iv9Lj56om69Hq7TGuUFrQVyvaC0CeinRJzaROx754KgWhnijwvOYUHsFZNFWS BR9qzsCLUTJhZhlEoX1z+ckbFZlKaMumFJaLjFUi3LW0r0wppTr+j6iVZ29CPh4p0x4H1/YOVIOehV mrk58kirt5EbcalsufoTGdl152sD0qdKJHO0pIefGYB9QVp9zzHJp+dGg5erc8UIcmn6J/IJuehSWC KgKZF6l00P5i0xwwz7h3GxNqSOfKGemo+nMTASSCL5HQGjzyWrrTOl3Yt5Sztu7F+qo4kTW554DaT6 z+9LE7AOC45QooYOFu15kG+QAhLeo+5nrl38cyPi60IHkU1JQ74e4y/92wb+i1BJsmtndDZAcwh48I YcPMi8nG85nDdwOaRNx4VNvVKDPLz3FdagXogkzXKN9BtKEUy/7x5cVgwICmIOCmHC3fqFTyvAimEK 8q9RN8RRT/ciTibX4PZBm9gB4uF8rSuHE+89XzSMRgzUjf0vXy/Qf6oVA84w== X-Developer-Key: i=memxor@gmail.com; a=openpgp; fpr=4BBE2A7E06ECF9D5823C61114CE0C88648BF11CA X-Patchwork-Delegate: bpf@iogearbox.net Since out-of-order unlocks are unsupported for rqspinlock, and irqsave variants enforce strict FIFO ordering anyway, make the same change for normal non-irqsave variants, such that FIFO ordering is enforced. Two new verifier state fields (active_lock_id, active_lock_ptr) are used to denote the top of the stack, and prev_id and prev_ptr are ascertained whenever popping the topmost entry through an unlock. Take special care to make these fields part of the state comparison in refsafe. Signed-off-by: Kumar Kartikeya Dwivedi --- include/linux/bpf_verifier.h | 3 +++ kernel/bpf/verifier.c | 33 ++++++++++++++++++++++++++++----- 2 files changed, 31 insertions(+), 5 deletions(-) diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h index 269449363f78..7348bd824e16 100644 --- a/include/linux/bpf_verifier.h +++ b/include/linux/bpf_verifier.h @@ -268,6 +268,7 @@ struct bpf_reference_state { REF_TYPE_LOCK = (1 << 3), REF_TYPE_RES_LOCK = (1 << 4), REF_TYPE_RES_LOCK_IRQ = (1 << 5), + REF_TYPE_LOCK_MASK = REF_TYPE_LOCK | REF_TYPE_RES_LOCK | REF_TYPE_RES_LOCK_IRQ, } type; /* Track each reference created with a unique id, even if the same * instruction creates the reference multiple times (eg, via CALL). 
@@ -434,6 +435,8 @@ struct bpf_verifier_state { u32 active_locks; u32 active_preempt_locks; u32 active_irq_id; + u32 active_lock_id; + void *active_lock_ptr; bool active_rcu_lock; bool speculative; diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index 6c8ef72ee6bc..d3be8932abe4 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -1421,6 +1421,8 @@ static int copy_reference_state(struct bpf_verifier_state *dst, const struct bpf dst->active_preempt_locks = src->active_preempt_locks; dst->active_rcu_lock = src->active_rcu_lock; dst->active_irq_id = src->active_irq_id; + dst->active_lock_id = src->active_lock_id; + dst->active_lock_ptr = src->active_lock_ptr; return 0; } @@ -1520,6 +1522,8 @@ static int acquire_lock_state(struct bpf_verifier_env *env, int insn_idx, enum r s->ptr = ptr; state->active_locks++; + state->active_lock_id = id; + state->active_lock_ptr = ptr; return 0; } @@ -1570,16 +1574,24 @@ static bool find_reference_state(struct bpf_verifier_state *state, int ptr_id) static int release_lock_state(struct bpf_verifier_state *state, int type, int id, void *ptr) { + void *prev_ptr = NULL; + u32 prev_id = 0; int i; for (i = 0; i < state->acquired_refs; i++) { - if (state->refs[i].type != type) - continue; - if (state->refs[i].id == id && state->refs[i].ptr == ptr) { + if (state->refs[i].type == type && state->refs[i].id == id && + state->refs[i].ptr == ptr) { release_reference_state(state, i); state->active_locks--; + /* Reassign active lock (id, ptr). */ + state->active_lock_id = prev_id; + state->active_lock_ptr = prev_ptr; return 0; } + if (state->refs[i].type & REF_TYPE_LOCK_MASK) { + prev_id = state->refs[i].id; + prev_ptr = state->refs[i].ptr; + } } return -EINVAL; } @@ -8201,6 +8213,14 @@ static int process_spin_lock(struct bpf_verifier_env *env, int regno, int flags) type = REF_TYPE_RES_LOCK; else type = REF_TYPE_LOCK; + if (!find_lock_state(cur, type, reg->id, ptr)) { + verbose(env, "%s_unlock of different lock\n", lock_str); + return -EINVAL; + } + if (reg->id != cur->active_lock_id || ptr != cur->active_lock_ptr) { + verbose(env, "%s_unlock cannot be out of order\n", lock_str); + return -EINVAL; + } if (release_lock_state(cur, type, reg->id, ptr)) { verbose(env, "%s_unlock of different lock\n", lock_str); return -EINVAL; @@ -12393,8 +12413,7 @@ static int check_reg_allocation_locked(struct bpf_verifier_env *env, struct bpf_ if (!env->cur_state->active_locks) return -EINVAL; - s = find_lock_state(env->cur_state, REF_TYPE_LOCK | REF_TYPE_RES_LOCK | REF_TYPE_RES_LOCK_IRQ, - id, ptr); + s = find_lock_state(env->cur_state, REF_TYPE_LOCK_MASK, id, ptr); if (!s) { verbose(env, "held lock and object are not in the same allocation\n"); return -EINVAL; @@ -18449,6 +18468,10 @@ static bool refsafe(struct bpf_verifier_state *old, struct bpf_verifier_state *c if (!check_ids(old->active_irq_id, cur->active_irq_id, idmap)) return false; + if (!check_ids(old->active_lock_id, cur->active_lock_id, idmap) || + old->active_lock_ptr != cur->active_lock_ptr) + return false; + for (i = 0; i < old->acquired_refs; i++) { if (!check_ids(old->refs[i].id, cur->refs[i].id, idmap) || old->refs[i].type != cur->refs[i].type) From patchwork Mon Mar 3 15:23:05 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kumar Kartikeya Dwivedi X-Patchwork-Id: 13999042 X-Patchwork-Delegate: bpf@iogearbox.net Received: from mail-wm1-f68.google.com (mail-wm1-f68.google.com [209.85.128.68]) (using TLSv1.2 with cipher 
k61ZFo3lXbFuS/BIJsxp5xNql9BqWsWJ0elV08GxRLbMHWQP7kGy/x+KmhLohTNagG7G3jm9Fz5 VYzVa/SgI1D2+IFFYY9f6wN5Rv6Jf7zrWXE= X-Google-Smtp-Source: AGHT+IF1BwyKpBxufZL15FHfV1K+xkaoboqbhKYSSMajj4aWR8VsvlX6AMmLz83+PD1s+XMRyoVIXQ== X-Received: by 2002:a05:600c:ca:b0:439:91c7:895a with SMTP id 5b1f17b1804b1-43afddc6489mr144850385e9.7.1741015422319; Mon, 03 Mar 2025 07:23:42 -0800 (PST) Received: from localhost ([2a03:2880:31ff:73::]) by smtp.gmail.com with ESMTPSA id 5b1f17b1804b1-43bc24a51bfsm38651415e9.10.2025.03.03.07.23.41 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 03 Mar 2025 07:23:41 -0800 (PST) From: Kumar Kartikeya Dwivedi To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org Cc: Linus Torvalds , Peter Zijlstra , Will Deacon , Waiman Long , Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , Martin KaFai Lau , Eduard Zingerman , "Paul E. McKenney" , Tejun Heo , Barret Rhoden , Josh Don , Dohyun Kim , linux-arm-kernel@lists.infradead.org, kkd@meta.com, kernel-team@meta.com Subject: [PATCH bpf-next v3 25/25] selftests/bpf: Add tests for rqspinlock Date: Mon, 3 Mar 2025 07:23:05 -0800 Message-ID: <20250303152305.3195648-26-memxor@gmail.com> X-Mailer: git-send-email 2.43.5 In-Reply-To: <20250303152305.3195648-1-memxor@gmail.com> References: <20250303152305.3195648-1-memxor@gmail.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=15728; h=from:subject; bh=U3o4FRMgSYlK94f6p6XtgMmHxSWNXlOtPMuGZSejx2w=; b=owEBbQKS/ZANAwAIAUzgyIZIvxHKAcsmYgBnxcWZzKIrFm1iuCUUDDtfLRnhLPc+SCt9cT5eXF6o 6N+GSmqJAjMEAAEIAB0WIQRLvip+Buz51YI8YRFM4MiGSL8RygUCZ8XFmQAKCRBM4MiGSL8RysxCD/ 0USidd7H00hBTlGhkNe1qL1P8T3pcVx1ejxkBbgrAma8ybJKZyercWVHVXCW8eQg784Ri97C4xBbFo 3V9qUCkJUJZUX4AnPiNsAdiAQCHKECTLYm0p0KqI9BcdgZKQv+HVHH54QDArzyN9nBuG4Uks81oq90 LbGJLfxQ/EP7x8KvPJqEa3auefyGtsq6f6d5JT+n+8gWtbOpnhncsq5NPk+Cfh4XjdHGw1ij0f6kb8 sYkYXwCPxGNbt7GuxA7DUA28RLTd+CWeLMnLz8sqlSHEqT3geqqTQ7zCGroosld1WZxYffYop/FpWT CK438ZkEWr6oW0Z9sokHWAwu5YnrGNcsZCNzUGyAnTFnYrhqn4p2oXPPWxdJP4MkivrQbM7nFp9g0W oGzGvCmn61ww7IdcCEK09nEtye3GvefEpTNPahCH5W1538XTcMV+ivRdfq2TzV48irQGjQEOLSEamT opLsWsX/ngZ8TSUM8rHQd6R1Xn/nxAehi9yFiEr8Afb4nBBoSL5SIDxtqDnDYjMBMZv/cHK/f0yOuL QwW/1Dd+qnhEq3YSKQmvGN8vnM1bubpAJGBgYYiv5mH/KDdvzmV0IA20+gJ8g+d9P9Z/RnqFZnIFPw ciMScGjt5UN3k63SRNESddfekGf3gq1yUKZbJrWdML5Zi9tRdfNR7DtyAqgg== X-Developer-Key: i=memxor@gmail.com; a=openpgp; fpr=4BBE2A7E06ECF9D5823C61114CE0C88648BF11CA X-Patchwork-Delegate: bpf@iogearbox.net Introduce selftests that trigger AA, ABBA deadlocks, and test the edge case where the held locks table runs out of entries, since we then fallback to the timeout as the final line of defense. Also exercise verifier's AA detection where applicable. 
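All of the tests build on the same error-checked locking pattern: the lock kfuncs return 0 on success and a negative errno (e.g. -EDEADLK when AA/ABBA deadlock is detected, -ETIMEDOUT when the timeout fires), and the critical section may only be entered, and the lock released, when acquisition succeeded. A minimal sketch of that pattern, assuming the kfunc declarations pulled in by the headers the programs below include (the struct and field names here are illustrative only):

struct elem {
	struct bpf_res_spin_lock lock;
	long counter;
};

static int bump_counter(struct elem *e)
{
	int ret;

	ret = bpf_res_spin_lock(&e->lock);
	if (ret)
		/* Acquisition was aborted (deadlock or timeout); do not
		 * touch the protected field and do not unlock.
		 */
		return ret;
	e->counter++;
	bpf_res_spin_unlock(&e->lock);
	return 0;
}
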
Signed-off-by: Kumar Kartikeya Dwivedi --- .../selftests/bpf/prog_tests/res_spin_lock.c | 92 +++++++ tools/testing/selftests/bpf/progs/irq.c | 53 ++++ .../selftests/bpf/progs/res_spin_lock.c | 143 ++++++++++ .../selftests/bpf/progs/res_spin_lock_fail.c | 244 ++++++++++++++++++ 4 files changed, 532 insertions(+) create mode 100644 tools/testing/selftests/bpf/prog_tests/res_spin_lock.c create mode 100644 tools/testing/selftests/bpf/progs/res_spin_lock.c create mode 100644 tools/testing/selftests/bpf/progs/res_spin_lock_fail.c diff --git a/tools/testing/selftests/bpf/prog_tests/res_spin_lock.c b/tools/testing/selftests/bpf/prog_tests/res_spin_lock.c new file mode 100644 index 000000000000..563d0d2801bb --- /dev/null +++ b/tools/testing/selftests/bpf/prog_tests/res_spin_lock.c @@ -0,0 +1,92 @@ +// SPDX-License-Identifier: GPL-2.0 +/* Copyright (c) 2024 Meta Platforms, Inc. and affiliates. */ +#include +#include + +#include "res_spin_lock.skel.h" +#include "res_spin_lock_fail.skel.h" + +void test_res_spin_lock_failure(void) +{ + RUN_TESTS(res_spin_lock_fail); +} + +static volatile int skip; + +static void *spin_lock_thread(void *arg) +{ + int err, prog_fd = *(u32 *) arg; + LIBBPF_OPTS(bpf_test_run_opts, topts, + .data_in = &pkt_v4, + .data_size_in = sizeof(pkt_v4), + .repeat = 10000, + ); + + while (!READ_ONCE(skip)) { + err = bpf_prog_test_run_opts(prog_fd, &topts); + ASSERT_OK(err, "test_run"); + ASSERT_OK(topts.retval, "test_run retval"); + } + pthread_exit(arg); +} + +void test_res_spin_lock_success(void) +{ + LIBBPF_OPTS(bpf_test_run_opts, topts, + .data_in = &pkt_v4, + .data_size_in = sizeof(pkt_v4), + .repeat = 1, + ); + struct res_spin_lock *skel; + pthread_t thread_id[16]; + int prog_fd, i, err; + void *ret; + + skel = res_spin_lock__open_and_load(); + if (!ASSERT_OK_PTR(skel, "res_spin_lock__open_and_load")) + return; + /* AA deadlock */ + prog_fd = bpf_program__fd(skel->progs.res_spin_lock_test); + err = bpf_prog_test_run_opts(prog_fd, &topts); + ASSERT_OK(err, "error"); + ASSERT_OK(topts.retval, "retval"); + + prog_fd = bpf_program__fd(skel->progs.res_spin_lock_test_held_lock_max); + err = bpf_prog_test_run_opts(prog_fd, &topts); + ASSERT_OK(err, "error"); + ASSERT_OK(topts.retval, "retval"); + + /* Multi-threaded ABBA deadlock. 
*/ + + prog_fd = bpf_program__fd(skel->progs.res_spin_lock_test_AB); + for (i = 0; i < 16; i++) { + int err; + + err = pthread_create(&thread_id[i], NULL, &spin_lock_thread, &prog_fd); + if (!ASSERT_OK(err, "pthread_create")) + goto end; + } + + topts.retval = 0; + topts.repeat = 1000; + int fd = bpf_program__fd(skel->progs.res_spin_lock_test_BA); + while (!topts.retval && !err && !READ_ONCE(skel->bss->err)) { + err = bpf_prog_test_run_opts(fd, &topts); + } + + WRITE_ONCE(skip, true); + + for (i = 0; i < 16; i++) { + if (!ASSERT_OK(pthread_join(thread_id[i], &ret), "pthread_join")) + goto end; + if (!ASSERT_EQ(ret, &prog_fd, "ret == prog_fd")) + goto end; + } + + ASSERT_EQ(READ_ONCE(skel->bss->err), -EDEADLK, "timeout err"); + ASSERT_OK(err, "err"); + ASSERT_EQ(topts.retval, -EDEADLK, "timeout"); +end: + res_spin_lock__destroy(skel); + return; +} diff --git a/tools/testing/selftests/bpf/progs/irq.c b/tools/testing/selftests/bpf/progs/irq.c index 298d48d7886d..74d912b22de9 100644 --- a/tools/testing/selftests/bpf/progs/irq.c +++ b/tools/testing/selftests/bpf/progs/irq.c @@ -11,6 +11,9 @@ extern void bpf_local_irq_save(unsigned long *) __weak __ksym; extern void bpf_local_irq_restore(unsigned long *) __weak __ksym; extern int bpf_copy_from_user_str(void *dst, u32 dst__sz, const void *unsafe_ptr__ign, u64 flags) __weak __ksym; +struct bpf_res_spin_lock lockA __hidden SEC(".data.A"); +struct bpf_res_spin_lock lockB __hidden SEC(".data.B"); + SEC("?tc") __failure __msg("arg#0 doesn't point to an irq flag on stack") int irq_save_bad_arg(struct __sk_buff *ctx) @@ -510,4 +513,54 @@ int irq_sleepable_global_subprog_indirect(void *ctx) return 0; } +SEC("?tc") +__failure __msg("cannot restore irq state out of order") +int irq_ooo_lock_cond_inv(struct __sk_buff *ctx) +{ + unsigned long flags1, flags2; + + if (bpf_res_spin_lock_irqsave(&lockA, &flags1)) + return 0; + if (bpf_res_spin_lock_irqsave(&lockB, &flags2)) { + bpf_res_spin_unlock_irqrestore(&lockA, &flags1); + return 0; + } + + bpf_res_spin_unlock_irqrestore(&lockB, &flags1); + bpf_res_spin_unlock_irqrestore(&lockA, &flags2); + return 0; +} + +SEC("?tc") +__failure __msg("function calls are not allowed") +int irq_wrong_kfunc_class_1(struct __sk_buff *ctx) +{ + unsigned long flags1; + + if (bpf_res_spin_lock_irqsave(&lockA, &flags1)) + return 0; + /* For now, bpf_local_irq_restore is not allowed in critical section, + * but this test ensures error will be caught with kfunc_class when it's + * opened up. Tested by temporarily permitting this kfunc in critical + * section. + */ + bpf_local_irq_restore(&flags1); + bpf_res_spin_unlock_irqrestore(&lockA, &flags1); + return 0; +} + +SEC("?tc") +__failure __msg("function calls are not allowed") +int irq_wrong_kfunc_class_2(struct __sk_buff *ctx) +{ + unsigned long flags1, flags2; + + bpf_local_irq_save(&flags1); + if (bpf_res_spin_lock_irqsave(&lockA, &flags2)) + return 0; + bpf_local_irq_restore(&flags2); + bpf_res_spin_unlock_irqrestore(&lockA, &flags1); + return 0; +} + char _license[] SEC("license") = "GPL"; diff --git a/tools/testing/selftests/bpf/progs/res_spin_lock.c b/tools/testing/selftests/bpf/progs/res_spin_lock.c new file mode 100644 index 000000000000..40ac06c91779 --- /dev/null +++ b/tools/testing/selftests/bpf/progs/res_spin_lock.c @@ -0,0 +1,143 @@ +// SPDX-License-Identifier: GPL-2.0 +/* Copyright (c) 2024 Meta Platforms, Inc. and affiliates. 
*/ +#include +#include +#include +#include "bpf_misc.h" + +#define EDEADLK 35 +#define ETIMEDOUT 110 + +struct arr_elem { + struct bpf_res_spin_lock lock; +}; + +struct { + __uint(type, BPF_MAP_TYPE_ARRAY); + __uint(max_entries, 64); + __type(key, int); + __type(value, struct arr_elem); +} arrmap SEC(".maps"); + +struct bpf_res_spin_lock lockA __hidden SEC(".data.A"); +struct bpf_res_spin_lock lockB __hidden SEC(".data.B"); + +SEC("tc") +int res_spin_lock_test(struct __sk_buff *ctx) +{ + struct arr_elem *elem1, *elem2; + int r; + + elem1 = bpf_map_lookup_elem(&arrmap, &(int){0}); + if (!elem1) + return -1; + elem2 = bpf_map_lookup_elem(&arrmap, &(int){0}); + if (!elem2) + return -1; + + r = bpf_res_spin_lock(&elem1->lock); + if (r) + return r; + if (!bpf_res_spin_lock(&elem2->lock)) { + bpf_res_spin_unlock(&elem2->lock); + bpf_res_spin_unlock(&elem1->lock); + return -1; + } + bpf_res_spin_unlock(&elem1->lock); + return 0; +} + +SEC("tc") +int res_spin_lock_test_AB(struct __sk_buff *ctx) +{ + int r; + + r = bpf_res_spin_lock(&lockA); + if (r) + return !r; + /* Only unlock if we took the lock. */ + if (!bpf_res_spin_lock(&lockB)) + bpf_res_spin_unlock(&lockB); + bpf_res_spin_unlock(&lockA); + return 0; +} + +int err; + +SEC("tc") +int res_spin_lock_test_BA(struct __sk_buff *ctx) +{ + int r; + + r = bpf_res_spin_lock(&lockB); + if (r) + return !r; + if (!bpf_res_spin_lock(&lockA)) + bpf_res_spin_unlock(&lockA); + else + err = -EDEADLK; + bpf_res_spin_unlock(&lockB); + return err ?: 0; +} + +SEC("tc") +int res_spin_lock_test_held_lock_max(struct __sk_buff *ctx) +{ + struct bpf_res_spin_lock *locks[48] = {}; + struct arr_elem *e; + u64 time_beg, time; + int ret = 0, i; + + _Static_assert(ARRAY_SIZE(((struct rqspinlock_held){}).locks) == 31, + "RES_NR_HELD assumed to be 31"); + + for (i = 0; i < 34; i++) { + int key = i; + + /* We cannot pass in i as it will get spilled/filled by the compiler and + * loses bounds in verifier state. + */ + e = bpf_map_lookup_elem(&arrmap, &key); + if (!e) + return 1; + locks[i] = &e->lock; + } + + for (; i < 48; i++) { + int key = i - 2; + + /* We cannot pass in i as it will get spilled/filled by the compiler and + * loses bounds in verifier state. + */ + e = bpf_map_lookup_elem(&arrmap, &key); + if (!e) + return 1; + locks[i] = &e->lock; + } + + time_beg = bpf_ktime_get_ns(); + for (i = 0; i < 34; i++) { + if (bpf_res_spin_lock(locks[i])) + goto end; + } + + /* Trigger AA, after exhausting entries in the held lock table. This + * time, only the timeout can save us, as AA detection won't succeed. + */ + if (!bpf_res_spin_lock(locks[34])) { + bpf_res_spin_unlock(locks[34]); + ret = 1; + goto end; + } + +end: + for (i = i - 1; i >= 0; i--) + bpf_res_spin_unlock(locks[i]); + time = bpf_ktime_get_ns() - time_beg; + /* Time spent should be easily above our limit (1/4 s), since AA + * detection won't be expedited due to lack of held lock entry. + */ + return ret ?: (time > 1000000000 / 4 ? 0 : 1); +} + +char _license[] SEC("license") = "GPL"; diff --git a/tools/testing/selftests/bpf/progs/res_spin_lock_fail.c b/tools/testing/selftests/bpf/progs/res_spin_lock_fail.c new file mode 100644 index 000000000000..3222e9283c78 --- /dev/null +++ b/tools/testing/selftests/bpf/progs/res_spin_lock_fail.c @@ -0,0 +1,244 @@ +// SPDX-License-Identifier: GPL-2.0 +/* Copyright (c) 2024 Meta Platforms, Inc. and affiliates. 
*/ +#include +#include +#include +#include +#include "bpf_misc.h" +#include "bpf_experimental.h" + +struct arr_elem { + struct bpf_res_spin_lock lock; +}; + +struct { + __uint(type, BPF_MAP_TYPE_ARRAY); + __uint(max_entries, 1); + __type(key, int); + __type(value, struct arr_elem); +} arrmap SEC(".maps"); + +long value; + +struct bpf_spin_lock lock __hidden SEC(".data.A"); +struct bpf_res_spin_lock res_lock __hidden SEC(".data.B"); + +SEC("?tc") +__failure __msg("point to map value or allocated object") +int res_spin_lock_arg(struct __sk_buff *ctx) +{ + struct arr_elem *elem; + + elem = bpf_map_lookup_elem(&arrmap, &(int){0}); + if (!elem) + return 0; + bpf_res_spin_lock((struct bpf_res_spin_lock *)bpf_core_cast(&elem->lock, struct __sk_buff)); + bpf_res_spin_lock(&elem->lock); + return 0; +} + +SEC("?tc") +__failure __msg("AA deadlock detected") +int res_spin_lock_AA(struct __sk_buff *ctx) +{ + struct arr_elem *elem; + + elem = bpf_map_lookup_elem(&arrmap, &(int){0}); + if (!elem) + return 0; + bpf_res_spin_lock(&elem->lock); + bpf_res_spin_lock(&elem->lock); + return 0; +} + +SEC("?tc") +__failure __msg("AA deadlock detected") +int res_spin_lock_cond_AA(struct __sk_buff *ctx) +{ + struct arr_elem *elem; + + elem = bpf_map_lookup_elem(&arrmap, &(int){0}); + if (!elem) + return 0; + if (bpf_res_spin_lock(&elem->lock)) + return 0; + bpf_res_spin_lock(&elem->lock); + return 0; +} + +SEC("?tc") +__failure __msg("unlock of different lock") +int res_spin_lock_mismatch_1(struct __sk_buff *ctx) +{ + struct arr_elem *elem; + + elem = bpf_map_lookup_elem(&arrmap, &(int){0}); + if (!elem) + return 0; + if (bpf_res_spin_lock(&elem->lock)) + return 0; + bpf_res_spin_unlock(&res_lock); + return 0; +} + +SEC("?tc") +__failure __msg("unlock of different lock") +int res_spin_lock_mismatch_2(struct __sk_buff *ctx) +{ + struct arr_elem *elem; + + elem = bpf_map_lookup_elem(&arrmap, &(int){0}); + if (!elem) + return 0; + if (bpf_res_spin_lock(&res_lock)) + return 0; + bpf_res_spin_unlock(&elem->lock); + return 0; +} + +SEC("?tc") +__failure __msg("unlock of different lock") +int res_spin_lock_irq_mismatch_1(struct __sk_buff *ctx) +{ + struct arr_elem *elem; + unsigned long f1; + + elem = bpf_map_lookup_elem(&arrmap, &(int){0}); + if (!elem) + return 0; + bpf_local_irq_save(&f1); + if (bpf_res_spin_lock(&res_lock)) + return 0; + bpf_res_spin_unlock_irqrestore(&res_lock, &f1); + return 0; +} + +SEC("?tc") +__failure __msg("unlock of different lock") +int res_spin_lock_irq_mismatch_2(struct __sk_buff *ctx) +{ + struct arr_elem *elem; + unsigned long f1; + + elem = bpf_map_lookup_elem(&arrmap, &(int){0}); + if (!elem) + return 0; + if (bpf_res_spin_lock_irqsave(&res_lock, &f1)) + return 0; + bpf_res_spin_unlock(&res_lock); + return 0; +} + +SEC("?tc") +__success +int res_spin_lock_ooo(struct __sk_buff *ctx) +{ + struct arr_elem *elem; + + elem = bpf_map_lookup_elem(&arrmap, &(int){0}); + if (!elem) + return 0; + if (bpf_res_spin_lock(&res_lock)) + return 0; + if (bpf_res_spin_lock(&elem->lock)) { + bpf_res_spin_unlock(&res_lock); + return 0; + } + bpf_res_spin_unlock(&elem->lock); + bpf_res_spin_unlock(&res_lock); + return 0; +} + +SEC("?tc") +__success +int res_spin_lock_ooo_irq(struct __sk_buff *ctx) +{ + struct arr_elem *elem; + unsigned long f1, f2; + + elem = bpf_map_lookup_elem(&arrmap, &(int){0}); + if (!elem) + return 0; + if (bpf_res_spin_lock_irqsave(&res_lock, &f1)) + return 0; + if (bpf_res_spin_lock_irqsave(&elem->lock, &f2)) { + bpf_res_spin_unlock_irqrestore(&res_lock, &f1); + /* We won't have a 
unreleased IRQ flag error here. */ + return 0; + } + bpf_res_spin_unlock_irqrestore(&elem->lock, &f2); + bpf_res_spin_unlock_irqrestore(&res_lock, &f1); + return 0; +} + +struct bpf_res_spin_lock lock1 __hidden SEC(".data.OO1"); +struct bpf_res_spin_lock lock2 __hidden SEC(".data.OO2"); + +SEC("?tc") +__failure __msg("bpf_res_spin_unlock cannot be out of order") +int res_spin_lock_ooo_unlock(struct __sk_buff *ctx) +{ + if (bpf_res_spin_lock(&lock1)) + return 0; + if (bpf_res_spin_lock(&lock2)) { + bpf_res_spin_unlock(&lock1); + return 0; + } + bpf_res_spin_unlock(&lock1); + bpf_res_spin_unlock(&lock2); + return 0; +} + +SEC("?tc") +__failure __msg("off 1 doesn't point to 'struct bpf_res_spin_lock' that is at 0") +int res_spin_lock_bad_off(struct __sk_buff *ctx) +{ + struct arr_elem *elem; + + elem = bpf_map_lookup_elem(&arrmap, &(int){0}); + if (!elem) + return 0; + bpf_res_spin_lock((void *)&elem->lock + 1); + return 0; +} + +SEC("?tc") +__failure __msg("R1 doesn't have constant offset. bpf_res_spin_lock has to be at the constant offset") +int res_spin_lock_var_off(struct __sk_buff *ctx) +{ + struct arr_elem *elem; + u64 val = value; + + elem = bpf_map_lookup_elem(&arrmap, &(int){0}); + if (!elem) { + // FIXME: Only inline assembly use in assert macro doesn't emit + // BTF definition. + bpf_throw(0); + return 0; + } + bpf_assert_range(val, 0, 40); + bpf_res_spin_lock((void *)&value + val); + return 0; +} + +SEC("?tc") +__failure __msg("map 'res_spin.bss' has no valid bpf_res_spin_lock") +int res_spin_lock_no_lock_map(struct __sk_buff *ctx) +{ + bpf_res_spin_lock((void *)&value + 1); + return 0; +} + +SEC("?tc") +__failure __msg("local 'kptr' has no valid bpf_res_spin_lock") +int res_spin_lock_no_lock_kptr(struct __sk_buff *ctx) +{ + struct { int i; } *p = bpf_obj_new(typeof(*p)); + + if (!p) + return 0; + bpf_res_spin_lock((void *)p); + return 0; +} + +char _license[] SEC("license") = "GPL";
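
To round off the out-of-order cases above: the verifier (as of the previous patch) requires that the most recently acquired resilient lock be released first, which is why res_spin_lock_ooo_unlock is rejected. A minimal sketch of the accepted ordering, reusing the lock1/lock2 globals declared in the file above (illustrative only):

SEC("?tc")
int res_spin_lock_nested_ok(struct __sk_buff *ctx)
{
	if (bpf_res_spin_lock(&lock1))
		return 0;
	if (bpf_res_spin_lock(&lock2)) {
		/* lock2 was never taken, so lock1 is still the most
		 * recently acquired lock and may be released here.
		 */
		bpf_res_spin_unlock(&lock1);
		return 0;
	}
	/* Release in reverse order of acquisition: lock2, then lock1. */
	bpf_res_spin_unlock(&lock2);
	bpf_res_spin_unlock(&lock1);
	return 0;
}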