From patchwork Mon Mar 3 15:22:41 2025
X-Patchwork-Submitter: Kumar Kartikeya Dwivedi
X-Patchwork-Id: 13999048
From: Kumar Kartikeya Dwivedi
To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Barret Rhoden, Linus Torvalds, Peter Zijlstra, Will Deacon, Waiman Long,
    Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, Martin KaFai Lau,
    Eduard Zingerman, "Paul E. McKenney", Tejun Heo, Josh Don, Dohyun Kim,
    linux-arm-kernel@lists.infradead.org, kkd@meta.com, kernel-team@meta.com
Subject: [PATCH bpf-next v3 01/25] locking: Move MCS struct definition to public header
Date: Mon, 3 Mar 2025 07:22:41 -0800
Message-ID: <20250303152305.3195648-2-memxor@gmail.com>
In-Reply-To: <20250303152305.3195648-1-memxor@gmail.com>
References: <20250303152305.3195648-1-memxor@gmail.com>

Move the definition of struct mcs_spinlock from the private mcs_spinlock.h
header in kernel/locking to the asm-generic mcs_spinlock.h header, since we
will need to reference it from the qspinlock.h header in subsequent commits.
Reviewed-by: Barret Rhoden
Signed-off-by: Kumar Kartikeya Dwivedi
---
 include/asm-generic/mcs_spinlock.h | 6 ++++++
 kernel/locking/mcs_spinlock.h      | 6 ------
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/include/asm-generic/mcs_spinlock.h b/include/asm-generic/mcs_spinlock.h
index 10cd4ffc6ba2..39c94012b88a 100644
--- a/include/asm-generic/mcs_spinlock.h
+++ b/include/asm-generic/mcs_spinlock.h
@@ -1,6 +1,12 @@
 #ifndef __ASM_MCS_SPINLOCK_H
 #define __ASM_MCS_SPINLOCK_H
 
+struct mcs_spinlock {
+	struct mcs_spinlock *next;
+	int locked; /* 1 if lock acquired */
+	int count;  /* nesting count, see qspinlock.c */
+};
+
 /*
  * Architectures can define their own:
  *
diff --git a/kernel/locking/mcs_spinlock.h b/kernel/locking/mcs_spinlock.h
index 85251d8771d9..16160ca8907f 100644
--- a/kernel/locking/mcs_spinlock.h
+++ b/kernel/locking/mcs_spinlock.h
@@ -15,12 +15,6 @@
 
 #include
 
-struct mcs_spinlock {
-	struct mcs_spinlock *next;
-	int locked; /* 1 if lock acquired */
-	int count;  /* nesting count, see qspinlock.c */
-};
-
 #ifndef arch_mcs_spin_lock_contended
 /*
  * Using smp_cond_load_acquire() provides the acquire semantics
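With struct mcs_spinlock visible from the asm-generic header, other headers can
embed or reference MCS nodes directly instead of relying on the copy private to
kernel/locking. A minimal, hypothetical consumer sketch (the names and include
path below are illustrative, not part of this series):

	#include <asm/mcs_spinlock.h>	/* illustrative include of the public definition */

	struct my_qnode {
		struct mcs_spinlock mcs;	/* MCS queue node, now a public type */
		long extra[2];			/* illustrative per-node payload */
	};

	static inline bool my_node_is_queue_tail(struct mcs_spinlock *node)
	{
		/* a NULL next pointer means no waiter is queued behind this node */
		return READ_ONCE(node->next) == NULL;
	}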
From patchwork Mon Mar 3 15:22:42 2025
X-Patchwork-Submitter: Kumar Kartikeya Dwivedi
X-Patchwork-Id: 13999053
From: Kumar Kartikeya Dwivedi
To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Barret Rhoden, Linus Torvalds, Peter Zijlstra, Will Deacon, Waiman Long,
    Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, Martin KaFai Lau,
    Eduard Zingerman, "Paul E. McKenney", Tejun Heo, Josh Don, Dohyun Kim,
    linux-arm-kernel@lists.infradead.org, kkd@meta.com, kernel-team@meta.com
Subject: [PATCH bpf-next v3 02/25] locking: Move common qspinlock helpers to a private header
Date: Mon, 3 Mar 2025 07:22:42 -0800
Message-ID: <20250303152305.3195648-3-memxor@gmail.com>
In-Reply-To: <20250303152305.3195648-1-memxor@gmail.com>
References: <20250303152305.3195648-1-memxor@gmail.com>

Move the qspinlock helper functions that encode and decode the tail word,
set and clear the pending and locked bits, plus other miscellaneous
definitions and macros, to a private header. To this end, create a
qspinlock.h header file in kernel/locking. Subsequent commits will introduce
a modified qspinlock slow path function, so moving the shared code to a
private header helps minimize unnecessary code duplication.

Reviewed-by: Barret Rhoden
Signed-off-by: Kumar Kartikeya Dwivedi
---
 kernel/locking/qspinlock.c | 193 +----------------------------------
 kernel/locking/qspinlock.h | 200 +++++++++++++++++++++++++++++++++++++
 2 files changed, 205 insertions(+), 188 deletions(-)
 create mode 100644 kernel/locking/qspinlock.h

diff --git a/kernel/locking/qspinlock.c b/kernel/locking/qspinlock.c
index 7d96bed718e4..af8d122bb649 100644
--- a/kernel/locking/qspinlock.c
+++ b/kernel/locking/qspinlock.c
@@ -25,8 +25,9 @@
 #include
 
 /*
- * Include queued spinlock statistics code
+ * Include queued spinlock definitions and statistics code
  */
+#include "qspinlock.h"
 #include "qspinlock_stat.h"
 
 /*
@@ -67,36 +68,6 @@
  */
 #include "mcs_spinlock.h"
 
-#define MAX_NODES	4
-
-/*
- * On 64-bit architectures, the mcs_spinlock structure will be 16 bytes in
- * size and four of them will fit nicely in one 64-byte cacheline. For
- * pvqspinlock, however, we need more space for extra data. To accommodate
- * that, we insert two more long words to pad it up to 32 bytes. IOW, only
- * two of them can fit in a cacheline in this case.
That is OK as it is rare - * to have more than 2 levels of slowpath nesting in actual use. We don't - * want to penalize pvqspinlocks to optimize for a rare case in native - * qspinlocks. - */ -struct qnode { - struct mcs_spinlock mcs; -#ifdef CONFIG_PARAVIRT_SPINLOCKS - long reserved[2]; -#endif -}; - -/* - * The pending bit spinning loop count. - * This heuristic is used to limit the number of lockword accesses - * made by atomic_cond_read_relaxed when waiting for the lock to - * transition out of the "== _Q_PENDING_VAL" state. We don't spin - * indefinitely because there's no guarantee that we'll make forward - * progress. - */ -#ifndef _Q_PENDING_LOOPS -#define _Q_PENDING_LOOPS 1 -#endif /* * Per-CPU queue node structures; we can never have more than 4 nested @@ -106,161 +77,7 @@ struct qnode { * * PV doubles the storage and uses the second cacheline for PV state. */ -static DEFINE_PER_CPU_ALIGNED(struct qnode, qnodes[MAX_NODES]); - -/* - * We must be able to distinguish between no-tail and the tail at 0:0, - * therefore increment the cpu number by one. - */ - -static inline __pure u32 encode_tail(int cpu, int idx) -{ - u32 tail; - - tail = (cpu + 1) << _Q_TAIL_CPU_OFFSET; - tail |= idx << _Q_TAIL_IDX_OFFSET; /* assume < 4 */ - - return tail; -} - -static inline __pure struct mcs_spinlock *decode_tail(u32 tail) -{ - int cpu = (tail >> _Q_TAIL_CPU_OFFSET) - 1; - int idx = (tail & _Q_TAIL_IDX_MASK) >> _Q_TAIL_IDX_OFFSET; - - return per_cpu_ptr(&qnodes[idx].mcs, cpu); -} - -static inline __pure -struct mcs_spinlock *grab_mcs_node(struct mcs_spinlock *base, int idx) -{ - return &((struct qnode *)base + idx)->mcs; -} - -#define _Q_LOCKED_PENDING_MASK (_Q_LOCKED_MASK | _Q_PENDING_MASK) - -#if _Q_PENDING_BITS == 8 -/** - * clear_pending - clear the pending bit. - * @lock: Pointer to queued spinlock structure - * - * *,1,* -> *,0,* - */ -static __always_inline void clear_pending(struct qspinlock *lock) -{ - WRITE_ONCE(lock->pending, 0); -} - -/** - * clear_pending_set_locked - take ownership and clear the pending bit. - * @lock: Pointer to queued spinlock structure - * - * *,1,0 -> *,0,1 - * - * Lock stealing is not allowed if this function is used. - */ -static __always_inline void clear_pending_set_locked(struct qspinlock *lock) -{ - WRITE_ONCE(lock->locked_pending, _Q_LOCKED_VAL); -} - -/* - * xchg_tail - Put in the new queue tail code word & retrieve previous one - * @lock : Pointer to queued spinlock structure - * @tail : The new queue tail code word - * Return: The previous queue tail code word - * - * xchg(lock, tail), which heads an address dependency - * - * p,*,* -> n,*,* ; prev = xchg(lock, node) - */ -static __always_inline u32 xchg_tail(struct qspinlock *lock, u32 tail) -{ - /* - * We can use relaxed semantics since the caller ensures that the - * MCS node is properly initialized before updating the tail. - */ - return (u32)xchg_relaxed(&lock->tail, - tail >> _Q_TAIL_OFFSET) << _Q_TAIL_OFFSET; -} - -#else /* _Q_PENDING_BITS == 8 */ - -/** - * clear_pending - clear the pending bit. - * @lock: Pointer to queued spinlock structure - * - * *,1,* -> *,0,* - */ -static __always_inline void clear_pending(struct qspinlock *lock) -{ - atomic_andnot(_Q_PENDING_VAL, &lock->val); -} - -/** - * clear_pending_set_locked - take ownership and clear the pending bit. 
- * @lock: Pointer to queued spinlock structure - * - * *,1,0 -> *,0,1 - */ -static __always_inline void clear_pending_set_locked(struct qspinlock *lock) -{ - atomic_add(-_Q_PENDING_VAL + _Q_LOCKED_VAL, &lock->val); -} - -/** - * xchg_tail - Put in the new queue tail code word & retrieve previous one - * @lock : Pointer to queued spinlock structure - * @tail : The new queue tail code word - * Return: The previous queue tail code word - * - * xchg(lock, tail) - * - * p,*,* -> n,*,* ; prev = xchg(lock, node) - */ -static __always_inline u32 xchg_tail(struct qspinlock *lock, u32 tail) -{ - u32 old, new; - - old = atomic_read(&lock->val); - do { - new = (old & _Q_LOCKED_PENDING_MASK) | tail; - /* - * We can use relaxed semantics since the caller ensures that - * the MCS node is properly initialized before updating the - * tail. - */ - } while (!atomic_try_cmpxchg_relaxed(&lock->val, &old, new)); - - return old; -} -#endif /* _Q_PENDING_BITS == 8 */ - -/** - * queued_fetch_set_pending_acquire - fetch the whole lock value and set pending - * @lock : Pointer to queued spinlock structure - * Return: The previous lock value - * - * *,*,* -> *,1,* - */ -#ifndef queued_fetch_set_pending_acquire -static __always_inline u32 queued_fetch_set_pending_acquire(struct qspinlock *lock) -{ - return atomic_fetch_or_acquire(_Q_PENDING_VAL, &lock->val); -} -#endif - -/** - * set_locked - Set the lock bit and own the lock - * @lock: Pointer to queued spinlock structure - * - * *,*,0 -> *,0,1 - */ -static __always_inline void set_locked(struct qspinlock *lock) -{ - WRITE_ONCE(lock->locked, _Q_LOCKED_VAL); -} - +static DEFINE_PER_CPU_ALIGNED(struct qnode, qnodes[_Q_MAX_NODES]); /* * Generate the native code for queued_spin_unlock_slowpath(); provide NOPs for @@ -410,7 +227,7 @@ void __lockfunc queued_spin_lock_slowpath(struct qspinlock *lock, u32 val) * any MCS node. This is not the most elegant solution, but is * simple enough. */ - if (unlikely(idx >= MAX_NODES)) { + if (unlikely(idx >= _Q_MAX_NODES)) { lockevent_inc(lock_no_node); while (!queued_spin_trylock(lock)) cpu_relax(); @@ -465,7 +282,7 @@ void __lockfunc queued_spin_lock_slowpath(struct qspinlock *lock, u32 val) * head of the waitqueue. */ if (old & _Q_TAIL_MASK) { - prev = decode_tail(old); + prev = decode_tail(old, qnodes); /* Link @node into the waitqueue. */ WRITE_ONCE(prev->next, node); diff --git a/kernel/locking/qspinlock.h b/kernel/locking/qspinlock.h new file mode 100644 index 000000000000..d4ceb9490365 --- /dev/null +++ b/kernel/locking/qspinlock.h @@ -0,0 +1,200 @@ +/* SPDX-License-Identifier: GPL-2.0-or-later */ +/* + * Queued spinlock defines + * + * This file contains macro definitions and functions shared between different + * qspinlock slow path implementations. + */ +#ifndef __LINUX_QSPINLOCK_H +#define __LINUX_QSPINLOCK_H + +#include +#include +#include +#include + +#define _Q_MAX_NODES 4 + +/* + * The pending bit spinning loop count. + * This heuristic is used to limit the number of lockword accesses + * made by atomic_cond_read_relaxed when waiting for the lock to + * transition out of the "== _Q_PENDING_VAL" state. We don't spin + * indefinitely because there's no guarantee that we'll make forward + * progress. + */ +#ifndef _Q_PENDING_LOOPS +#define _Q_PENDING_LOOPS 1 +#endif + +/* + * On 64-bit architectures, the mcs_spinlock structure will be 16 bytes in + * size and four of them will fit nicely in one 64-byte cacheline. For + * pvqspinlock, however, we need more space for extra data. 
To accommodate + * that, we insert two more long words to pad it up to 32 bytes. IOW, only + * two of them can fit in a cacheline in this case. That is OK as it is rare + * to have more than 2 levels of slowpath nesting in actual use. We don't + * want to penalize pvqspinlocks to optimize for a rare case in native + * qspinlocks. + */ +struct qnode { + struct mcs_spinlock mcs; +#ifdef CONFIG_PARAVIRT_SPINLOCKS + long reserved[2]; +#endif +}; + +/* + * We must be able to distinguish between no-tail and the tail at 0:0, + * therefore increment the cpu number by one. + */ + +static inline __pure u32 encode_tail(int cpu, int idx) +{ + u32 tail; + + tail = (cpu + 1) << _Q_TAIL_CPU_OFFSET; + tail |= idx << _Q_TAIL_IDX_OFFSET; /* assume < 4 */ + + return tail; +} + +static inline __pure struct mcs_spinlock *decode_tail(u32 tail, struct qnode *qnodes) +{ + int cpu = (tail >> _Q_TAIL_CPU_OFFSET) - 1; + int idx = (tail & _Q_TAIL_IDX_MASK) >> _Q_TAIL_IDX_OFFSET; + + return per_cpu_ptr(&qnodes[idx].mcs, cpu); +} + +static inline __pure +struct mcs_spinlock *grab_mcs_node(struct mcs_spinlock *base, int idx) +{ + return &((struct qnode *)base + idx)->mcs; +} + +#define _Q_LOCKED_PENDING_MASK (_Q_LOCKED_MASK | _Q_PENDING_MASK) + +#if _Q_PENDING_BITS == 8 +/** + * clear_pending - clear the pending bit. + * @lock: Pointer to queued spinlock structure + * + * *,1,* -> *,0,* + */ +static __always_inline void clear_pending(struct qspinlock *lock) +{ + WRITE_ONCE(lock->pending, 0); +} + +/** + * clear_pending_set_locked - take ownership and clear the pending bit. + * @lock: Pointer to queued spinlock structure + * + * *,1,0 -> *,0,1 + * + * Lock stealing is not allowed if this function is used. + */ +static __always_inline void clear_pending_set_locked(struct qspinlock *lock) +{ + WRITE_ONCE(lock->locked_pending, _Q_LOCKED_VAL); +} + +/* + * xchg_tail - Put in the new queue tail code word & retrieve previous one + * @lock : Pointer to queued spinlock structure + * @tail : The new queue tail code word + * Return: The previous queue tail code word + * + * xchg(lock, tail), which heads an address dependency + * + * p,*,* -> n,*,* ; prev = xchg(lock, node) + */ +static __always_inline u32 xchg_tail(struct qspinlock *lock, u32 tail) +{ + /* + * We can use relaxed semantics since the caller ensures that the + * MCS node is properly initialized before updating the tail. + */ + return (u32)xchg_relaxed(&lock->tail, + tail >> _Q_TAIL_OFFSET) << _Q_TAIL_OFFSET; +} + +#else /* _Q_PENDING_BITS == 8 */ + +/** + * clear_pending - clear the pending bit. + * @lock: Pointer to queued spinlock structure + * + * *,1,* -> *,0,* + */ +static __always_inline void clear_pending(struct qspinlock *lock) +{ + atomic_andnot(_Q_PENDING_VAL, &lock->val); +} + +/** + * clear_pending_set_locked - take ownership and clear the pending bit. 
+ * @lock: Pointer to queued spinlock structure
+ *
+ * *,1,0 -> *,0,1
+ */
+static __always_inline void clear_pending_set_locked(struct qspinlock *lock)
+{
+	atomic_add(-_Q_PENDING_VAL + _Q_LOCKED_VAL, &lock->val);
+}
+
+/**
+ * xchg_tail - Put in the new queue tail code word & retrieve previous one
+ * @lock : Pointer to queued spinlock structure
+ * @tail : The new queue tail code word
+ * Return: The previous queue tail code word
+ *
+ * xchg(lock, tail)
+ *
+ * p,*,* -> n,*,* ; prev = xchg(lock, node)
+ */
+static __always_inline u32 xchg_tail(struct qspinlock *lock, u32 tail)
+{
+	u32 old, new;
+
+	old = atomic_read(&lock->val);
+	do {
+		new = (old & _Q_LOCKED_PENDING_MASK) | tail;
+		/*
+		 * We can use relaxed semantics since the caller ensures that
+		 * the MCS node is properly initialized before updating the
+		 * tail.
+		 */
+	} while (!atomic_try_cmpxchg_relaxed(&lock->val, &old, new));
+
+	return old;
+}
+#endif /* _Q_PENDING_BITS == 8 */
+
+/**
+ * queued_fetch_set_pending_acquire - fetch the whole lock value and set pending
+ * @lock : Pointer to queued spinlock structure
+ * Return: The previous lock value
+ *
+ * *,*,* -> *,1,*
+ */
+#ifndef queued_fetch_set_pending_acquire
+static __always_inline u32 queued_fetch_set_pending_acquire(struct qspinlock *lock)
+{
+	return atomic_fetch_or_acquire(_Q_PENDING_VAL, &lock->val);
+}
+#endif
+
+/**
+ * set_locked - Set the lock bit and own the lock
+ * @lock: Pointer to queued spinlock structure
+ *
+ * *,*,0 -> *,0,1
+ */
+static __always_inline void set_locked(struct qspinlock *lock)
+{
+	WRITE_ONCE(lock->locked, _Q_LOCKED_VAL);
+}
+
+#endif /* __LINUX_QSPINLOCK_H */
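The tail helpers this header now shares encode (cpu + 1) together with the
per-CPU nesting index into the upper bits of the 32-bit lock word, so that an
empty tail is distinguishable from CPU 0 at index 0. A condensed sketch of the
round trip, as the existing slow path (and the modified one to come) uses it;
this restates code already in the series rather than adding anything new:

	int idx = node->count++;				/* per-CPU nesting level */
	u32 tail = encode_tail(smp_processor_id(), idx);
	u32 old = xchg_tail(lock, tail);			/* publish node as queue tail */

	if (old & _Q_TAIL_MASK) {
		/* a previous waiter exists: recover its node and link behind it */
		struct mcs_spinlock *prev = decode_tail(old, qnodes);
		WRITE_ONCE(prev->next, node);
	}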
From patchwork Mon Mar 3 15:22:43 2025
X-Patchwork-Submitter: Kumar Kartikeya Dwivedi
X-Patchwork-Id: 13999052
From: Kumar Kartikeya Dwivedi
To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Barret Rhoden, Linus Torvalds, Peter Zijlstra, Will Deacon, Waiman Long,
    Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, Martin KaFai Lau,
    Eduard Zingerman, "Paul E. McKenney", Tejun Heo, Josh Don, Dohyun Kim,
    linux-arm-kernel@lists.infradead.org, kkd@meta.com, kernel-team@meta.com
Subject: [PATCH bpf-next v3 03/25] locking: Allow obtaining result of arch_mcs_spin_lock_contended
Date: Mon, 3 Mar 2025 07:22:43 -0800
Message-ID: <20250303152305.3195648-4-memxor@gmail.com>
In-Reply-To: <20250303152305.3195648-1-memxor@gmail.com>
References: <20250303152305.3195648-1-memxor@gmail.com>

To support upcoming changes that require inspecting the return value once
the conditional waiting loop in arch_mcs_spin_lock_contended terminates,
modify the macro to preserve the result of smp_cond_load_acquire. This
enables checking the return value as needed, which will help disambiguate
the MCS node's locked state in future patches.

Reviewed-by: Barret Rhoden
Signed-off-by: Kumar Kartikeya Dwivedi
---
 kernel/locking/mcs_spinlock.h | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/kernel/locking/mcs_spinlock.h b/kernel/locking/mcs_spinlock.h
index 16160ca8907f..5c92ba199b90 100644
--- a/kernel/locking/mcs_spinlock.h
+++ b/kernel/locking/mcs_spinlock.h
@@ -24,9 +24,7 @@
  * spinning, and smp_cond_load_acquire() provides that behavior.
  */
 #define arch_mcs_spin_lock_contended(l)					\
-do {									\
-	smp_cond_load_acquire(l, VAL);					\
-} while (0)
+	smp_cond_load_acquire(l, VAL)
 #endif
 
 #ifndef arch_mcs_spin_unlock_contended
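Because the macro now evaluates to the value returned by
smp_cond_load_acquire(), a waiter can capture what it observed in
node->locked when the wait loop terminated. A hypothetical use, sketching the
kind of check the later patches in this series rely on:

	int val = arch_mcs_spin_lock_contended(&node->locked);
	if (val != 1) {
		/* hypothetical: a value other than the usual 1 could signal an
		 * abort or timeout hand-off rather than a normal lock transfer */
	}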
From patchwork Mon Mar 3 15:22:44 2025
X-Patchwork-Submitter: Kumar Kartikeya Dwivedi
X-Patchwork-Id: 13999057
From: Kumar Kartikeya Dwivedi
To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Barret Rhoden, Linus Torvalds, Peter Zijlstra, Will Deacon, Waiman Long,
    Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, Martin KaFai Lau,
    Eduard Zingerman, "Paul E. McKenney", Tejun Heo, Josh Don, Dohyun Kim,
    linux-arm-kernel@lists.infradead.org, kkd@meta.com, kernel-team@meta.com
Subject: [PATCH bpf-next v3 04/25] locking: Copy out qspinlock.c to rqspinlock.c
Date: Mon, 3 Mar 2025 07:22:44 -0800
Message-ID: <20250303152305.3195648-5-memxor@gmail.com>
In-Reply-To: <20250303152305.3195648-1-memxor@gmail.com>
References: <20250303152305.3195648-1-memxor@gmail.com>

In preparation for introducing a new lock implementation, Resilient Queued
Spin Lock, or rqspinlock, begin by using the existing qspinlock.c code as
the base. Simply copy the code to a new file and rename functions and
variables from 'queued' to 'resilient_queued'. This lets each subsequent
commit clearly show how and where the code is being changed. The only
changes beyond a literal copy in this commit are renaming functions where
necessary and renaming qnodes to rqnodes.

Reviewed-by: Barret Rhoden
Signed-off-by: Kumar Kartikeya Dwivedi
---
 kernel/locking/rqspinlock.c | 410 ++++++++++++++++++++++++++++++++++++
 1 file changed, 410 insertions(+)
 create mode 100644 kernel/locking/rqspinlock.c

diff --git a/kernel/locking/rqspinlock.c b/kernel/locking/rqspinlock.c
new file mode 100644
index 000000000000..143d9dda36f9
--- /dev/null
+++ b/kernel/locking/rqspinlock.c
@@ -0,0 +1,410 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Resilient Queued Spin Lock
+ *
+ * (C) Copyright 2013-2015 Hewlett-Packard Development Company, L.P.
+ * (C) Copyright 2013-2014,2018 Red Hat, Inc.
+ * (C) Copyright 2015 Intel Corp.
+ * (C) Copyright 2015 Hewlett-Packard Enterprise Development LP + * + * Authors: Waiman Long + * Peter Zijlstra + */ + +#ifndef _GEN_PV_LOCK_SLOWPATH + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +/* + * Include queued spinlock definitions and statistics code + */ +#include "qspinlock.h" +#include "qspinlock_stat.h" + +/* + * The basic principle of a queue-based spinlock can best be understood + * by studying a classic queue-based spinlock implementation called the + * MCS lock. A copy of the original MCS lock paper ("Algorithms for Scalable + * Synchronization on Shared-Memory Multiprocessors by Mellor-Crummey and + * Scott") is available at + * + * https://bugzilla.kernel.org/show_bug.cgi?id=206115 + * + * This queued spinlock implementation is based on the MCS lock, however to + * make it fit the 4 bytes we assume spinlock_t to be, and preserve its + * existing API, we must modify it somehow. + * + * In particular; where the traditional MCS lock consists of a tail pointer + * (8 bytes) and needs the next pointer (another 8 bytes) of its own node to + * unlock the next pending (next->locked), we compress both these: {tail, + * next->locked} into a single u32 value. + * + * Since a spinlock disables recursion of its own context and there is a limit + * to the contexts that can nest; namely: task, softirq, hardirq, nmi. As there + * are at most 4 nesting levels, it can be encoded by a 2-bit number. Now + * we can encode the tail by combining the 2-bit nesting level with the cpu + * number. With one byte for the lock value and 3 bytes for the tail, only a + * 32-bit word is now needed. Even though we only need 1 bit for the lock, + * we extend it to a full byte to achieve better performance for architectures + * that support atomic byte write. + * + * We also change the first spinner to spin on the lock bit instead of its + * node; whereby avoiding the need to carry a node from lock to unlock, and + * preserving existing lock API. This also makes the unlock code simpler and + * faster. + * + * N.B. The current implementation only supports architectures that allow + * atomic operations on smaller 8-bit and 16-bit data types. + * + */ + +#include "mcs_spinlock.h" + +/* + * Per-CPU queue node structures; we can never have more than 4 nested + * contexts: task, softirq, hardirq, nmi. + * + * Exactly fits one 64-byte cacheline on a 64-bit architecture. + * + * PV doubles the storage and uses the second cacheline for PV state. + */ +static DEFINE_PER_CPU_ALIGNED(struct qnode, rqnodes[_Q_MAX_NODES]); + +/* + * Generate the native code for resilient_queued_spin_unlock_slowpath(); provide NOPs + * for all the PV callbacks. 
+ */ + +static __always_inline void __pv_init_node(struct mcs_spinlock *node) { } +static __always_inline void __pv_wait_node(struct mcs_spinlock *node, + struct mcs_spinlock *prev) { } +static __always_inline void __pv_kick_node(struct qspinlock *lock, + struct mcs_spinlock *node) { } +static __always_inline u32 __pv_wait_head_or_lock(struct qspinlock *lock, + struct mcs_spinlock *node) + { return 0; } + +#define pv_enabled() false + +#define pv_init_node __pv_init_node +#define pv_wait_node __pv_wait_node +#define pv_kick_node __pv_kick_node +#define pv_wait_head_or_lock __pv_wait_head_or_lock + +#ifdef CONFIG_PARAVIRT_SPINLOCKS +#define resilient_queued_spin_lock_slowpath native_resilient_queued_spin_lock_slowpath +#endif + +#endif /* _GEN_PV_LOCK_SLOWPATH */ + +/** + * resilient_queued_spin_lock_slowpath - acquire the queued spinlock + * @lock: Pointer to queued spinlock structure + * @val: Current value of the queued spinlock 32-bit word + * + * (queue tail, pending bit, lock value) + * + * fast : slow : unlock + * : : + * uncontended (0,0,0) -:--> (0,0,1) ------------------------------:--> (*,*,0) + * : | ^--------.------. / : + * : v \ \ | : + * pending : (0,1,1) +--> (0,1,0) \ | : + * : | ^--' | | : + * : v | | : + * uncontended : (n,x,y) +--> (n,0,0) --' | : + * queue : | ^--' | : + * : v | : + * contended : (*,x,y) +--> (*,0,0) ---> (*,0,1) -' : + * queue : ^--' : + */ +void __lockfunc resilient_queued_spin_lock_slowpath(struct qspinlock *lock, u32 val) +{ + struct mcs_spinlock *prev, *next, *node; + u32 old, tail; + int idx; + + BUILD_BUG_ON(CONFIG_NR_CPUS >= (1U << _Q_TAIL_CPU_BITS)); + + if (pv_enabled()) + goto pv_queue; + + if (virt_spin_lock(lock)) + return; + + /* + * Wait for in-progress pending->locked hand-overs with a bounded + * number of spins so that we guarantee forward progress. + * + * 0,1,0 -> 0,0,1 + */ + if (val == _Q_PENDING_VAL) { + int cnt = _Q_PENDING_LOOPS; + val = atomic_cond_read_relaxed(&lock->val, + (VAL != _Q_PENDING_VAL) || !cnt--); + } + + /* + * If we observe any contention; queue. + */ + if (val & ~_Q_LOCKED_MASK) + goto queue; + + /* + * trylock || pending + * + * 0,0,* -> 0,1,* -> 0,0,1 pending, trylock + */ + val = queued_fetch_set_pending_acquire(lock); + + /* + * If we observe contention, there is a concurrent locker. + * + * Undo and queue; our setting of PENDING might have made the + * n,0,0 -> 0,0,0 transition fail and it will now be waiting + * on @next to become !NULL. + */ + if (unlikely(val & ~_Q_LOCKED_MASK)) { + + /* Undo PENDING if we set it. */ + if (!(val & _Q_PENDING_MASK)) + clear_pending(lock); + + goto queue; + } + + /* + * We're pending, wait for the owner to go away. + * + * 0,1,1 -> *,1,0 + * + * this wait loop must be a load-acquire such that we match the + * store-release that clears the locked bit and create lock + * sequentiality; this is because not all + * clear_pending_set_locked() implementations imply full + * barriers. + */ + if (val & _Q_LOCKED_MASK) + smp_cond_load_acquire(&lock->locked, !VAL); + + /* + * take ownership and clear the pending bit. + * + * 0,1,0 -> 0,0,1 + */ + clear_pending_set_locked(lock); + lockevent_inc(lock_pending); + return; + + /* + * End of pending bit optimistic spinning and beginning of MCS + * queuing. 
+ */ +queue: + lockevent_inc(lock_slowpath); +pv_queue: + node = this_cpu_ptr(&rqnodes[0].mcs); + idx = node->count++; + tail = encode_tail(smp_processor_id(), idx); + + trace_contention_begin(lock, LCB_F_SPIN); + + /* + * 4 nodes are allocated based on the assumption that there will + * not be nested NMIs taking spinlocks. That may not be true in + * some architectures even though the chance of needing more than + * 4 nodes will still be extremely unlikely. When that happens, + * we fall back to spinning on the lock directly without using + * any MCS node. This is not the most elegant solution, but is + * simple enough. + */ + if (unlikely(idx >= _Q_MAX_NODES)) { + lockevent_inc(lock_no_node); + while (!queued_spin_trylock(lock)) + cpu_relax(); + goto release; + } + + node = grab_mcs_node(node, idx); + + /* + * Keep counts of non-zero index values: + */ + lockevent_cond_inc(lock_use_node2 + idx - 1, idx); + + /* + * Ensure that we increment the head node->count before initialising + * the actual node. If the compiler is kind enough to reorder these + * stores, then an IRQ could overwrite our assignments. + */ + barrier(); + + node->locked = 0; + node->next = NULL; + pv_init_node(node); + + /* + * We touched a (possibly) cold cacheline in the per-cpu queue node; + * attempt the trylock once more in the hope someone let go while we + * weren't watching. + */ + if (queued_spin_trylock(lock)) + goto release; + + /* + * Ensure that the initialisation of @node is complete before we + * publish the updated tail via xchg_tail() and potentially link + * @node into the waitqueue via WRITE_ONCE(prev->next, node) below. + */ + smp_wmb(); + + /* + * Publish the updated tail. + * We have already touched the queueing cacheline; don't bother with + * pending stuff. + * + * p,*,* -> n,*,* + */ + old = xchg_tail(lock, tail); + next = NULL; + + /* + * if there was a previous node; link it and wait until reaching the + * head of the waitqueue. + */ + if (old & _Q_TAIL_MASK) { + prev = decode_tail(old, rqnodes); + + /* Link @node into the waitqueue. */ + WRITE_ONCE(prev->next, node); + + pv_wait_node(node, prev); + arch_mcs_spin_lock_contended(&node->locked); + + /* + * While waiting for the MCS lock, the next pointer may have + * been set by another lock waiter. We optimistically load + * the next pointer & prefetch the cacheline for writing + * to reduce latency in the upcoming MCS unlock operation. + */ + next = READ_ONCE(node->next); + if (next) + prefetchw(next); + } + + /* + * we're at the head of the waitqueue, wait for the owner & pending to + * go away. + * + * *,x,y -> *,0,0 + * + * this wait loop must use a load-acquire such that we match the + * store-release that clears the locked bit and create lock + * sequentiality; this is because the set_locked() function below + * does not imply a full barrier. + * + * The PV pv_wait_head_or_lock function, if active, will acquire + * the lock and return a non-zero value. So we have to skip the + * atomic_cond_read_acquire() call. As the next PV queue head hasn't + * been designated yet, there is no way for the locked value to become + * _Q_SLOW_VAL. So both the set_locked() and the + * atomic_cmpxchg_relaxed() calls will be safe. + * + * If PV isn't active, 0 will be returned instead. 
+ *
+ */
+	if ((val = pv_wait_head_or_lock(lock, node)))
+		goto locked;
+
+	val = atomic_cond_read_acquire(&lock->val, !(VAL & _Q_LOCKED_PENDING_MASK));
+
+locked:
+	/*
+	 * claim the lock:
+	 *
+	 * n,0,0 -> 0,0,1 : lock, uncontended
+	 * *,*,0 -> *,*,1 : lock, contended
+	 *
+	 * If the queue head is the only one in the queue (lock value == tail)
+	 * and nobody is pending, clear the tail code and grab the lock.
+	 * Otherwise, we only need to grab the lock.
+	 */
+
+	/*
+	 * In the PV case we might already have _Q_LOCKED_VAL set, because
+	 * of lock stealing; therefore we must also allow:
+	 *
+	 * n,0,1 -> 0,0,1
+	 *
+	 * Note: at this point: (val & _Q_PENDING_MASK) == 0, because of the
+	 * above wait condition, therefore any concurrent setting of
+	 * PENDING will make the uncontended transition fail.
+	 */
+	if ((val & _Q_TAIL_MASK) == tail) {
+		if (atomic_try_cmpxchg_relaxed(&lock->val, &val, _Q_LOCKED_VAL))
+			goto release; /* No contention */
+	}
+
+	/*
+	 * Either somebody is queued behind us or _Q_PENDING_VAL got set
+	 * which will then detect the remaining tail and queue behind us
+	 * ensuring we'll see a @next.
+	 */
+	set_locked(lock);
+
+	/*
+	 * contended path; wait for next if not observed yet, release.
+	 */
+	if (!next)
+		next = smp_cond_load_relaxed(&node->next, (VAL));
+
+	arch_mcs_spin_unlock_contended(&next->locked);
+	pv_kick_node(lock, next);
+
+release:
+	trace_contention_end(lock, 0);
+
+	/*
+	 * release the node
+	 */
+	__this_cpu_dec(rqnodes[0].mcs.count);
+}
+EXPORT_SYMBOL(resilient_queued_spin_lock_slowpath);
+
+/*
+ * Generate the paravirt code for resilient_queued_spin_unlock_slowpath().
+ */
+#if !defined(_GEN_PV_LOCK_SLOWPATH) && defined(CONFIG_PARAVIRT_SPINLOCKS)
+#define _GEN_PV_LOCK_SLOWPATH
+
+#undef pv_enabled
+#define pv_enabled()	true
+
+#undef pv_init_node
+#undef pv_wait_node
+#undef pv_kick_node
+#undef pv_wait_head_or_lock
+
+#undef resilient_queued_spin_lock_slowpath
+#define resilient_queued_spin_lock_slowpath	__pv_resilient_queued_spin_lock_slowpath
+
+#include "qspinlock_paravirt.h"
+#include "rqspinlock.c"
+
+bool nopvspin;
+static __init int parse_nopvspin(char *arg)
+{
+	nopvspin = true;
+	return 0;
+}
+early_param("nopvspin", parse_nopvspin);
+#endif
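As with qspinlock, the copied slow path is only meant to be entered after a
fast-path cmpxchg on the lock word fails. A hypothetical wrapper mirroring
queued_spin_lock(), purely for orientation (the actual rqspinlock lock and
trylock API is introduced later in the series):

	static __always_inline void res_spin_lock_sketch(struct qspinlock *lock)
	{
		u32 val = 0;

		/* fast path: uncontended 0,0,0 -> 0,0,1 */
		if (likely(atomic_try_cmpxchg_acquire(&lock->val, &val, _Q_LOCKED_VAL)))
			return;
		/* contended: fall back to the copied slow path */
		resilient_queued_spin_lock_slowpath(lock, val);
	}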
From: Kumar Kartikeya Dwivedi To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org Cc: Barret Rhoden , Linus Torvalds , Peter Zijlstra , Will Deacon , Waiman Long , Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , Martin KaFai Lau , Eduard Zingerman , "Paul E.
McKenney" , Tejun Heo , Josh Don , Dohyun Kim , linux-arm-kernel@lists.infradead.org, kkd@meta.com, kernel-team@meta.com Subject: [PATCH bpf-next v3 05/25] rqspinlock: Add rqspinlock.h header Date: Mon, 3 Mar 2025 07:22:45 -0800 Message-ID: <20250303152305.3195648-6-memxor@gmail.com> X-Mailer: git-send-email 2.43.5 In-Reply-To: <20250303152305.3195648-1-memxor@gmail.com> References: <20250303152305.3195648-1-memxor@gmail.com> MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=2297; h=from:subject; bh=C0ciCNeduyvvQFAO0DwlqwtcddwN6zNDgdODlI5CgNw=; b=owEBbQKS/ZANAwAIAUzgyIZIvxHKAcsmYgBnxcWWmeJIFZGQqQJ9zuXTndNKetm3e9meOOzs99f5 56FqOzGJAjMEAAEIAB0WIQRLvip+Buz51YI8YRFM4MiGSL8RygUCZ8XFlgAKCRBM4MiGSL8RyoR+D/ 0VwsuuvPBO31kFGb0t61y+N4gbFPQVPDJBrXCTRuRgO0fOw5WCQWwcpin+e1kMbSgYKAZzVwfLS3fl USxgEUjPhUFvS788CF15UqpS6n/tM0irZYhiD/4t5EWU0g0ioKfKj6gj6yBwqsSMBTsIRoIbfdAtpW A1sQkzl5Q8TKau41XUFL5/fRetjHPVeJCUKabg5SUIG5I9iY6ZjqHLHvD+LFUnOo8bFfHwkb1uWXh3 hFp6LuX4Ip5Q0OVOMs+ec1b0966SfNEsHIk3+Z8XY4f0eqnk7Ez83Zc/Hjz7csGn6uh8bcb6eG7ZYC +/A6aSKaqeZbUC4ssfb84IHdHW7hVWFUk4czo8NMacOAr3Qy1tlY/IIaFasBI7NmhQiZgLc8EPbgHi 7TMVD7hKBCP+NHjd0dPY2vu1dxsGHJ7glfhHL4X9QrkiqZi5/Mkl3n7BQmfN/AKLVD7Azctv4JBglx VxQbreb1HGCpiW2R+/1KbXrT472iuSH/6kefuuYYh1hQbdVrvkEdNJfBXlKOJBdRGO9hA4i5b/tbBn 1CQPYg9cYRVAZ147OeK3geUwV37j9ZjgLrI0+k1TkwXMQtUgriZg5GETC7ZgO+AO2BGZkTI4gFTPr6 Jo2JmjrdfCTWQ/LMn3yeXbbAnc4me8NdPXySZtWjaqKhp7V4WbNvPwwg8qPw== X-Developer-Key: i=memxor@gmail.com; a=openpgp; fpr=4BBE2A7E06ECF9D5823C61114CE0C88648BF11CA X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20250303_072318_529971_4F8AABC2 X-CRM114-Status: GOOD ( 15.25 ) X-BeenThere: linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: "linux-arm-kernel" Errors-To: linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org This header contains the public declarations usable in the rest of the kernel for rqspinlock. Let's also type alias qspinlock to rqspinlock_t to ensure consistent use of the new lock type. We want to remove dependence on the qspinlock type in later patches as we need to provide a test-and-set fallback, hence begin abstracting away from now onwards. Reviewed-by: Barret Rhoden Signed-off-by: Kumar Kartikeya Dwivedi --- include/asm-generic/rqspinlock.h | 19 +++++++++++++++++++ kernel/locking/rqspinlock.c | 3 ++- 2 files changed, 21 insertions(+), 1 deletion(-) create mode 100644 include/asm-generic/rqspinlock.h diff --git a/include/asm-generic/rqspinlock.h b/include/asm-generic/rqspinlock.h new file mode 100644 index 000000000000..54860b519571 --- /dev/null +++ b/include/asm-generic/rqspinlock.h @@ -0,0 +1,19 @@ +/* SPDX-License-Identifier: GPL-2.0-or-later */ +/* + * Resilient Queued Spin Lock + * + * (C) Copyright 2024 Meta Platforms, Inc. and affiliates. 
+ * + * Authors: Kumar Kartikeya Dwivedi + */ +#ifndef __ASM_GENERIC_RQSPINLOCK_H +#define __ASM_GENERIC_RQSPINLOCK_H + +#include + +struct qspinlock; +typedef struct qspinlock rqspinlock_t; + +extern void resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val); + +#endif /* __ASM_GENERIC_RQSPINLOCK_H */ diff --git a/kernel/locking/rqspinlock.c b/kernel/locking/rqspinlock.c index 143d9dda36f9..414a3ec8cf70 100644 --- a/kernel/locking/rqspinlock.c +++ b/kernel/locking/rqspinlock.c @@ -23,6 +23,7 @@ #include #include #include +#include /* * Include queued spinlock definitions and statistics code @@ -127,7 +128,7 @@ static __always_inline u32 __pv_wait_head_or_lock(struct qspinlock *lock, * contended : (*,x,y) +--> (*,0,0) ---> (*,0,1) -' : * queue : ^--' : */ -void __lockfunc resilient_queued_spin_lock_slowpath(struct qspinlock *lock, u32 val) +void __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val) { struct mcs_spinlock *prev, *next, *node; u32 old, tail; From patchwork Mon Mar 3 15:22:46 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kumar Kartikeya Dwivedi X-Patchwork-Id: 13999060 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 3CA0EC282CD for ; Mon, 3 Mar 2025 15:37:51 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=bombadil.20210309; h=Sender:List-Subscribe:List-Help :List-Post:List-Archive:List-Unsubscribe:List-Id:Content-Transfer-Encoding: MIME-Version:References:In-Reply-To:Message-ID:Date:Subject:Cc:To:From: Reply-To:Content-Type:Content-ID:Content-Description:Resent-Date:Resent-From: Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID:List-Owner; bh=VWj7fRL2wjv9R4mziQW8bTTqGP2D1A3sv3W8wRB2IlU=; b=nRSrk5Sg3ANwn6I2FAb6F7mTmM 2ESZ3V/9t0uJ9PzDrak+Mro+k3xWS6Tk8vRSCsrM9jHSYIHMxQC5GFS9ThOxuNrVyUYiq5O3UJaK/ kC4WopE6HqNRIBw1IvHmlDchjogr13jZzebIQX1mUlatYoYUibP3FO/UR3XY2huipdEw6+timCOgF t8VZrsyr3I7ECvbzX/Not5xJptpPoYtKKj5uhn4ksK56ED7RUYDZQVxC57xndk3jyBHecy0yidA8y JP+gXHkZyNeyTf2g8yyEBYAfl4THF1/QcDZgOEOcgNnZ2zGF+QmXDmDjzegAxMPAPVehbkRZyCklV ZokyGMtg==; Received: from localhost ([::1] helo=bombadil.infradead.org) by bombadil.infradead.org with esmtp (Exim 4.98 #2 (Red Hat Linux)) id 1tp7rS-00000001Lok-1yM8; Mon, 03 Mar 2025 15:37:42 +0000 Received: from desiato.infradead.org ([2001:8b0:10b:1:d65d:64ff:fe57:4e05]) by bombadil.infradead.org with esmtps (Exim 4.98 #2 (Red Hat Linux)) id 1tp7dc-00000001IEa-14f4 for linux-arm-kernel@bombadil.infradead.org; Mon, 03 Mar 2025 15:23:24 +0000 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=desiato.20200630; h=Content-Transfer-Encoding:MIME-Version :References:In-Reply-To:Message-ID:Date:Subject:Cc:To:From:Sender:Reply-To: Content-Type:Content-ID:Content-Description; bh=VWj7fRL2wjv9R4mziQW8bTTqGP2D1A3sv3W8wRB2IlU=; b=mtvOR/lLTkGp9N60TOrGEYtejh 1xcWcZ3yIYzqw6r85ES+aL0saiqfRaj8fDvV4G0I3/xX+PfxuEzhd+gfHHqXZ9PqyGmZUP/F7OwrM b34fucCS42Td8nYoiHa53sum1eL+gUYvyTl539rx9A5OrYTvSzz34PO0vCpCUvIVSv8kI9Z+u/Lid U2kLsl62ZasaLHHdUy9CMxYvRz2rGGSnA6R3vhYGBFg/KpJpTJac1yTuz5bgvyhmsS8pc4gJUqqn+ wqrH86xXP0mZ/s/I65jrIy8mdShCGsxBzcFCB8AM2H7OicHhqOe1JjfmTaiLBPt2Vlk0usgDJXOQk pj6nWRYg==; 
From: Kumar Kartikeya Dwivedi To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org Cc: Barret Rhoden , Linus Torvalds , Peter Zijlstra , Will Deacon , Waiman Long , Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , Martin KaFai Lau , Eduard Zingerman , "Paul E.
McKenney" , Tejun Heo , Josh Don , Dohyun Kim , linux-arm-kernel@lists.infradead.org, kkd@meta.com, kernel-team@meta.com Subject: [PATCH bpf-next v3 06/25] rqspinlock: Drop PV and virtualization support Date: Mon, 3 Mar 2025 07:22:46 -0800 Message-ID: <20250303152305.3195648-7-memxor@gmail.com> X-Mailer: git-send-email 2.43.5 In-Reply-To: <20250303152305.3195648-1-memxor@gmail.com> References: <20250303152305.3195648-1-memxor@gmail.com> MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=6795; h=from:subject; bh=x1Z+pB9xhs3i58kf9Kac9b92frWP4j0eHxNENfgFcNU=; b=owEBbQKS/ZANAwAIAUzgyIZIvxHKAcsmYgBnxcWWAhlIoArRUBi9+QCUAqMzXrVAKF0JtHZdQgoX fPFJU92JAjMEAAEIAB0WIQRLvip+Buz51YI8YRFM4MiGSL8RygUCZ8XFlgAKCRBM4MiGSL8RygwXEA DAmIu9BajEdITHz1mKPuYwgeMXa7iRtPHxfqCpPmzf/TH+bsToJCV02MBZgALM6kWCm/5rAUSraLhw BnMrlVK/RcAw8Kxjwu1xmv4nZtV3SYbVUGx9WVMII5Oeyew86x2PDffmVG2n1obhHHjW3irYV8YsdE Yp1hHfinMz7/BSq/yV3BC5t/Xnfyeqm/J8YZY2QWNJBC4EKexPFCjmswgCSAaSFCcxhApWXXJzknMU MFRWSRvdZ44k9m+DhbcggcjpFSytSET3RxyTw4yuiGnXVqdQBBg0JBdMGJz/TatR2aEVIb7cJrYFAg j53Yifny9409xdRc2gl46en9AmpDm3/WgCsa0MG4u2UZyRfxyJFucVeD8D37S0ybhHJqnoRd8Idhi7 QmQz4h/wP2rzNMb9gJ5XgohE5pZ2V72I2JHpPuL7p4uQ5QTqHm+xQyirFjDGGmfxNDIuH/1w//0thM DL6ypO4I4NPgIyNpKI0IVWdOcJ4sbxDX2seupxCRksDYYMwlRTYnEPgbMhqobCiAvBuEuRRvINLHkc gdFHigGRIq+ZNPJc74y2xJvhNd4dehMSg8IlVcZDLDmYHHHUCDiHbmlnJLWJ+hw/5iLNlSJtmRadiM JCiKxpddvVm6tqQTMDIdqvcgLcDn7u2gAg21mmdVh/8SvCqLxQsKxzk73jWA== X-Developer-Key: i=memxor@gmail.com; a=openpgp; fpr=4BBE2A7E06ECF9D5823C61114CE0C88648BF11CA X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20250303_152320_532843_068BE536 X-CRM114-Status: GOOD ( 19.60 ) X-BeenThere: linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: "linux-arm-kernel" Errors-To: linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org Changes to rqspinlock in subsequent commits will be algorithmic modifications, which won't remain in agreement with the implementations of paravirt spin lock and virt_spin_lock support. These future changes include measures for terminating waiting loops in slow path after a certain point. While using a fair lock like qspinlock directly inside virtual machines leads to suboptimal performance under certain conditions, we cannot use the existing virtualization support before we make it resilient as well. Therefore, drop it for now. Note that we need to drop qspinlock_stat.h, as it's only relevant in case of CONFIG_PARAVIRT_SPINLOCKS=y, but we need to keep lock_events.h in the includes, which was indirectly pulled in before. Reviewed-by: Barret Rhoden Signed-off-by: Kumar Kartikeya Dwivedi --- kernel/locking/rqspinlock.c | 91 +------------------------------------ 1 file changed, 1 insertion(+), 90 deletions(-) diff --git a/kernel/locking/rqspinlock.c b/kernel/locking/rqspinlock.c index 414a3ec8cf70..98cdcc5f1784 100644 --- a/kernel/locking/rqspinlock.c +++ b/kernel/locking/rqspinlock.c @@ -11,8 +11,6 @@ * Peter Zijlstra */ -#ifndef _GEN_PV_LOCK_SLOWPATH - #include #include #include @@ -29,7 +27,7 @@ * Include queued spinlock definitions and statistics code */ #include "qspinlock.h" -#include "qspinlock_stat.h" +#include "lock_events.h" /* * The basic principle of a queue-based spinlock can best be understood @@ -75,38 +73,9 @@ * contexts: task, softirq, hardirq, nmi. * * Exactly fits one 64-byte cacheline on a 64-bit architecture. 
- * - * PV doubles the storage and uses the second cacheline for PV state. */ static DEFINE_PER_CPU_ALIGNED(struct qnode, rqnodes[_Q_MAX_NODES]); -/* - * Generate the native code for resilient_queued_spin_unlock_slowpath(); provide NOPs - * for all the PV callbacks. - */ - -static __always_inline void __pv_init_node(struct mcs_spinlock *node) { } -static __always_inline void __pv_wait_node(struct mcs_spinlock *node, - struct mcs_spinlock *prev) { } -static __always_inline void __pv_kick_node(struct qspinlock *lock, - struct mcs_spinlock *node) { } -static __always_inline u32 __pv_wait_head_or_lock(struct qspinlock *lock, - struct mcs_spinlock *node) - { return 0; } - -#define pv_enabled() false - -#define pv_init_node __pv_init_node -#define pv_wait_node __pv_wait_node -#define pv_kick_node __pv_kick_node -#define pv_wait_head_or_lock __pv_wait_head_or_lock - -#ifdef CONFIG_PARAVIRT_SPINLOCKS -#define resilient_queued_spin_lock_slowpath native_resilient_queued_spin_lock_slowpath -#endif - -#endif /* _GEN_PV_LOCK_SLOWPATH */ - /** * resilient_queued_spin_lock_slowpath - acquire the queued spinlock * @lock: Pointer to queued spinlock structure @@ -136,12 +105,6 @@ void __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val) BUILD_BUG_ON(CONFIG_NR_CPUS >= (1U << _Q_TAIL_CPU_BITS)); - if (pv_enabled()) - goto pv_queue; - - if (virt_spin_lock(lock)) - return; - /* * Wait for in-progress pending->locked hand-overs with a bounded * number of spins so that we guarantee forward progress. @@ -212,7 +175,6 @@ void __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val) */ queue: lockevent_inc(lock_slowpath); -pv_queue: node = this_cpu_ptr(&rqnodes[0].mcs); idx = node->count++; tail = encode_tail(smp_processor_id(), idx); @@ -251,7 +213,6 @@ void __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val) node->locked = 0; node->next = NULL; - pv_init_node(node); /* * We touched a (possibly) cold cacheline in the per-cpu queue node; @@ -288,7 +249,6 @@ void __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val) /* Link @node into the waitqueue. */ WRITE_ONCE(prev->next, node); - pv_wait_node(node, prev); arch_mcs_spin_lock_contended(&node->locked); /* @@ -312,23 +272,9 @@ void __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val) * store-release that clears the locked bit and create lock * sequentiality; this is because the set_locked() function below * does not imply a full barrier. - * - * The PV pv_wait_head_or_lock function, if active, will acquire - * the lock and return a non-zero value. So we have to skip the - * atomic_cond_read_acquire() call. As the next PV queue head hasn't - * been designated yet, there is no way for the locked value to become - * _Q_SLOW_VAL. So both the set_locked() and the - * atomic_cmpxchg_relaxed() calls will be safe. - * - * If PV isn't active, 0 will be returned instead. 
- * */ - if ((val = pv_wait_head_or_lock(lock, node))) - goto locked; - val = atomic_cond_read_acquire(&lock->val, !(VAL & _Q_LOCKED_PENDING_MASK)); -locked: /* * claim the lock: * @@ -341,11 +287,6 @@ void __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val) */ /* - * In the PV case we might already have _Q_LOCKED_VAL set, because - * of lock stealing; therefore we must also allow: - * - * n,0,1 -> 0,0,1 - * * Note: at this point: (val & _Q_PENDING_MASK) == 0, because of the * above wait condition, therefore any concurrent setting of * PENDING will make the uncontended transition fail. @@ -369,7 +310,6 @@ void __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val) next = smp_cond_load_relaxed(&node->next, (VAL)); arch_mcs_spin_unlock_contended(&next->locked); - pv_kick_node(lock, next); release: trace_contention_end(lock, 0); @@ -380,32 +320,3 @@ void __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val) __this_cpu_dec(rqnodes[0].mcs.count); } EXPORT_SYMBOL(resilient_queued_spin_lock_slowpath); - -/* - * Generate the paravirt code for resilient_queued_spin_unlock_slowpath(). - */ -#if !defined(_GEN_PV_LOCK_SLOWPATH) && defined(CONFIG_PARAVIRT_SPINLOCKS) -#define _GEN_PV_LOCK_SLOWPATH - -#undef pv_enabled -#define pv_enabled() true - -#undef pv_init_node -#undef pv_wait_node -#undef pv_kick_node -#undef pv_wait_head_or_lock - -#undef resilient_queued_spin_lock_slowpath -#define resilient_queued_spin_lock_slowpath __pv_resilient_queued_spin_lock_slowpath - -#include "qspinlock_paravirt.h" -#include "rqspinlock.c" - -bool nopvspin; -static __init int parse_nopvspin(char *arg) -{ - nopvspin = true; - return 0; -} -early_param("nopvspin", parse_nopvspin); -#endif From patchwork Mon Mar 3 15:22:47 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kumar Kartikeya Dwivedi X-Patchwork-Id: 13999058 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 1E450C282CD for ; Mon, 3 Mar 2025 15:34:39 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=bombadil.20210309; h=Sender:List-Subscribe:List-Help :List-Post:List-Archive:List-Unsubscribe:List-Id:Content-Transfer-Encoding: MIME-Version:References:In-Reply-To:Message-ID:Date:Subject:Cc:To:From: Reply-To:Content-Type:Content-ID:Content-Description:Resent-Date:Resent-From: Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID:List-Owner; bh=N/HTxEZqZcqCwShvGdM3eA0fWGvVAu5JcH2SiIADZmE=; b=UDwuSfjpbA2u8lHDWMu4TquMtC I8/BD/s3HSGwU7ybo+LOMubRBg1Y4e4VM0jXPVZu/oCBsRzgNdQ5dZ7/MxCOQTZx9wPEPzGy1jJYQ M2wq8cC1IW5p1Xi2n90Eksl0JcmN32l9/+izQhxwQIro2fCzvTuUkHlR/C2edXnML1XWefq+sEPmT YEhJiL9pxI8Dld250nmB3wwnTlzfvUQXyDNAPST1WLUUzgWFSnb+ixx2Zldrh4nGRxO6LarDEJc0v KMLE6KYW2DYzRFbPN3U7nCRN5+wQjVHa72MQ8N8SkFjq2Gf4jOHEiJMRmE3F6jWmlYmvV7nkoZWVI aXRBrsXg==; Received: from localhost ([::1] helo=bombadil.infradead.org) by bombadil.infradead.org with esmtp (Exim 4.98 #2 (Red Hat Linux)) id 1tp7oL-00000001Kqv-4Aoe; Mon, 03 Mar 2025 15:34:29 +0000 Received: from mail-wm1-x343.google.com ([2a00:1450:4864:20::343]) by bombadil.infradead.org with esmtps (Exim 4.98 #2 (Red Hat Linux)) id 1tp7dY-00000001IC2-3jbh for 
From: Kumar Kartikeya Dwivedi To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org Cc: Barret Rhoden , Linus Torvalds , Peter Zijlstra , Will Deacon , Waiman Long , Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , Martin KaFai Lau , Eduard Zingerman , "Paul E.
McKenney" , Tejun Heo , Josh Don , Dohyun Kim , linux-arm-kernel@lists.infradead.org, kkd@meta.com, kernel-team@meta.com Subject: [PATCH bpf-next v3 07/25] rqspinlock: Add support for timeouts Date: Mon, 3 Mar 2025 07:22:47 -0800 Message-ID: <20250303152305.3195648-8-memxor@gmail.com> X-Mailer: git-send-email 2.43.5 In-Reply-To: <20250303152305.3195648-1-memxor@gmail.com> References: <20250303152305.3195648-1-memxor@gmail.com> MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=4618; h=from:subject; bh=yvqL/a4hlJcETEVluNNRDX4NLx5YZM2izb98JHL7vqQ=; b=owEBbQKS/ZANAwAIAUzgyIZIvxHKAcsmYgBnxcWW5YAHky9QnaXijfLpbPfmrcJbz9mGABaMN7Gq 0o42/62JAjMEAAEIAB0WIQRLvip+Buz51YI8YRFM4MiGSL8RygUCZ8XFlgAKCRBM4MiGSL8RyhFZD/ 98oVQe8PFJhIllEShh4nhwUHn8NyXfjHgZA8EYD/afmXUtbe6cebWH5AkecePI/ENfeIJGVq2k75qO bvWe4RtiqLDWBsJl0E7U+s2KeDZ0DSk9f2WEvtzxSmCp/7UvUNNCppI/KwpLK+0e7hAe+GARXNg60S gvFuMulaOvXckRa9tpI+Hr6QKTNcN2LRmRExBXLd9T3LEoZTDJmyajfWO+Rfz5YU0g5KANdcy5e0qn GSzgVdRNgcSQOk3Zbp4619EukeKa9f23Tg02x5DCOUpx6mZ5nLqck/vFZ9oU8Webx+07h2RX91o2Kv TN1pq4jrCnQVVKWpOXrlnqPje97GpGVOCULGt1HNSZjpiEHi1aaxH6G0Y04RUnW0tHWAoxe7UDJk0o fCSuU0DhZpfltTY7/WE+Vf0bJFidSjLAaFUBjfaGSjVpQiuT/NaM6RgcLOAfFZ+kZKor4iNdDigsyS uMikiOXPVSq5YpOAmNxGz++xsEFagK0j1pVs6TqaC0dd7hKb57Ww60taY11XX+2jebXLUb/Ujov7TM eHo97ARRBCdiHc7+5UtfXPPw64OraEi2LfbonjvDU0tqtoL2yij/h7RNtLLUQP0RCqFJfuaR1AezSv BHNMKZgxnulhR86H/4dzv/v/FYXTwu9j3CZEhgrEpd9sdWsCL1hOwLpR5Hcg== X-Developer-Key: i=memxor@gmail.com; a=openpgp; fpr=4BBE2A7E06ECF9D5823C61114CE0C88648BF11CA X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20250303_072320_928474_395FD6F3 X-CRM114-Status: GOOD ( 23.78 ) X-BeenThere: linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: "linux-arm-kernel" Errors-To: linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org Introduce policy macro RES_CHECK_TIMEOUT which can be used to detect when the timeout has expired for the slow path to return an error. It depends on being passed two variables initialized to 0: ts, ret. The 'ts' parameter is of type rqspinlock_timeout. This macro resolves to the (ret) expression so that it can be used in statements like smp_cond_load_acquire to break the waiting loop condition. The 'spin' member is used to amortize the cost of checking time by dispatching to the implementation every 64k iterations. The 'timeout_end' member is used to keep track of the timestamp that denotes the end of the waiting period. The 'ret' parameter denotes the status of the timeout, and can be checked in the slow path to detect timeouts after waiting loops. The 'duration' member is used to store the timeout duration for each waiting loop. The default timeout value defined in the header (RES_DEF_TIMEOUT) is 0.25 seconds. This macro will be used as a condition for waiting loops in the slow path. Since each waiting loop applies a fresh timeout using the same rqspinlock_timeout, we add a new RES_RESET_TIMEOUT as well to ensure the values can be easily reinitialized to the default state. 
Reviewed-by: Barret Rhoden Signed-off-by: Kumar Kartikeya Dwivedi --- include/asm-generic/rqspinlock.h | 6 +++++ kernel/locking/rqspinlock.c | 45 ++++++++++++++++++++++++++++++++ 2 files changed, 51 insertions(+) diff --git a/include/asm-generic/rqspinlock.h b/include/asm-generic/rqspinlock.h index 54860b519571..96cea871fdd2 100644 --- a/include/asm-generic/rqspinlock.h +++ b/include/asm-generic/rqspinlock.h @@ -10,10 +10,16 @@ #define __ASM_GENERIC_RQSPINLOCK_H #include +#include struct qspinlock; typedef struct qspinlock rqspinlock_t; extern void resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val); +/* + * Default timeout for waiting loops is 0.25 seconds + */ +#define RES_DEF_TIMEOUT (NSEC_PER_SEC / 4) + #endif /* __ASM_GENERIC_RQSPINLOCK_H */ diff --git a/kernel/locking/rqspinlock.c b/kernel/locking/rqspinlock.c index 98cdcc5f1784..6b547f85fa95 100644 --- a/kernel/locking/rqspinlock.c +++ b/kernel/locking/rqspinlock.c @@ -6,9 +6,11 @@ * (C) Copyright 2013-2014,2018 Red Hat, Inc. * (C) Copyright 2015 Intel Corp. * (C) Copyright 2015 Hewlett-Packard Enterprise Development LP + * (C) Copyright 2024 Meta Platforms, Inc. and affiliates. * * Authors: Waiman Long * Peter Zijlstra + * Kumar Kartikeya Dwivedi */ #include @@ -22,6 +24,7 @@ #include #include #include +#include /* * Include queued spinlock definitions and statistics code @@ -68,6 +71,45 @@ #include "mcs_spinlock.h" +struct rqspinlock_timeout { + u64 timeout_end; + u64 duration; + u16 spin; +}; + +static noinline int check_timeout(struct rqspinlock_timeout *ts) +{ + u64 time = ktime_get_mono_fast_ns(); + + if (!ts->timeout_end) { + ts->timeout_end = time + ts->duration; + return 0; + } + + if (time > ts->timeout_end) + return -ETIMEDOUT; + + return 0; +} + +#define RES_CHECK_TIMEOUT(ts, ret) \ + ({ \ + if (!(ts).spin++) \ + (ret) = check_timeout(&(ts)); \ + (ret); \ + }) + +/* + * Initialize the 'spin' member. + */ +#define RES_INIT_TIMEOUT(ts) ({ (ts).spin = 1; }) + +/* + * We only need to reset 'timeout_end', 'spin' will just wrap around as necessary. + * Duration is defined for each spin attempt, so set it here. + */ +#define RES_RESET_TIMEOUT(ts, _duration) ({ (ts).timeout_end = 0; (ts).duration = _duration; }) + /* * Per-CPU queue node structures; we can never have more than 4 nested * contexts: task, softirq, hardirq, nmi. @@ -100,11 +142,14 @@ static DEFINE_PER_CPU_ALIGNED(struct qnode, rqnodes[_Q_MAX_NODES]); void __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val) { struct mcs_spinlock *prev, *next, *node; + struct rqspinlock_timeout ts; u32 old, tail; int idx; BUILD_BUG_ON(CONFIG_NR_CPUS >= (1U << _Q_TAIL_CPU_BITS)); + RES_INIT_TIMEOUT(ts); + /* * Wait for in-progress pending->locked hand-overs with a bounded * number of spins so that we guarantee forward progress. 
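As a side note for readers following the series outside the kernel tree, the snippet below is a small userspace approximation of the timeout policy added in the patch above. It mirrors the check_timeout() helper and the RES_INIT_TIMEOUT/RES_RESET_TIMEOUT/RES_CHECK_TIMEOUT macros from this patch, with clock_gettime() standing in for ktime_get_mono_fast_ns(); it only illustrates how the 16-bit 'spin' counter amortizes the clock reads and how the deadline is armed lazily on the first check. It is a sketch for illustration, not the in-tree code.

/*
 * Userspace sketch (not kernel code) of the timeout policy above:
 * the 16-bit 'spin' counter wraps every 64k calls, and only on wrap
 * do we pay for reading the clock; the first clock read arms the
 * deadline. clock_gettime() stands in for ktime_get_mono_fast_ns().
 */
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

struct rqspinlock_timeout {
	uint64_t timeout_end;
	uint64_t duration;
	uint16_t spin;
};

static uint64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ull + ts.tv_nsec;
}

/* First invocation arms the deadline; later ones report expiry. */
static int check_timeout(struct rqspinlock_timeout *ts)
{
	uint64_t time = now_ns();

	if (!ts->timeout_end) {
		ts->timeout_end = time + ts->duration;
		return 0;
	}
	return time > ts->timeout_end ? -ETIMEDOUT : 0;
}

#define RES_CHECK_TIMEOUT(ts, ret)				\
	({							\
		if (!(ts).spin++)				\
			(ret) = check_timeout(&(ts));		\
		(ret);						\
	})

#define RES_INIT_TIMEOUT(ts)	({ (ts).spin = 1; })
#define RES_RESET_TIMEOUT(ts, _duration) \
	({ (ts).timeout_end = 0; (ts).duration = (_duration); })

int main(void)
{
	struct rqspinlock_timeout ts;
	int ret = 0;

	RES_INIT_TIMEOUT(ts);
	RES_RESET_TIMEOUT(ts, 250000000ull);	/* 0.25 s, like RES_DEF_TIMEOUT */

	/* Stand-in for a waiting loop such as smp_cond_load_acquire(). */
	while (!RES_CHECK_TIMEOUT(ts, ret))
		;	/* spin, as the lock's waiting loop would */

	printf("waiting loop ended: ret=%d (-ETIMEDOUT is %d)\n", ret, -ETIMEDOUT);
	return 0;
}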
From patchwork Mon Mar 3 15:22:48 2025 X-Patchwork-Submitter: Kumar Kartikeya Dwivedi X-Patchwork-Id: 13999059
AOJu0Yy2T71P17ZLcsdcQqFGAqEbFmQjE6yn7ruX49/zjmNQ/DXQdL/M 8ms6FPLJpe9qGVaA89mbvJJ43PP0Ho4vxXWoCljoLxmbPUd3bTe/ X-Gm-Gg: ASbGncs9csc99tp7rb9tHnO3hg9X0h1Nw0rySl9ynpr1aiWw8jZFifSCNOPJlcH7azt XI6Uz9dZJfyZUxirADUzEcirPfhlZNgP1jtLqZDaQ+6dGa0KcIsHV86BUXWxTfevoJZnEpnC7Oi qaDkRl5HtciCpfos2U3sFHmMczcLSfWW073mOxCharD0r/pNK71IFfBvBiYfBSlw2GlB4eCx3Lp 0zUbPNNT7hm3+qeRrwq04HL+SFFJcuMXNWNxFOmZByyk98zQAlms37bzrzJS+2JneFdMKVLsbd3 vsMZUI7UPpj+Riu5LGkfq76VqVcdcJa2zQ== X-Google-Smtp-Source: AGHT+IGX9Uo3KWT7ZP1Tg6bcJFL7kn9d7FzFa8+K2r3RPDuxVc4WDWiD0YgV40rXuhBZGtIqZf8VUA== X-Received: by 2002:a05:600c:6b65:b0:439:a6db:1824 with SMTP id 5b1f17b1804b1-43bb3c30d77mr71469755e9.16.1741015400470; Mon, 03 Mar 2025 07:23:20 -0800 (PST) Received: from localhost ([2a03:2880:31ff:b::]) by smtp.gmail.com with ESMTPSA id 5b1f17b1804b1-43bc11e1c8esm42306695e9.32.2025.03.03.07.23.19 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 03 Mar 2025 07:23:19 -0800 (PST) From: Kumar Kartikeya Dwivedi To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org Cc: Ankur Arora , Linus Torvalds , Peter Zijlstra , Will Deacon , Waiman Long , Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , Martin KaFai Lau , Eduard Zingerman , "Paul E. McKenney" , Tejun Heo , Barret Rhoden , Josh Don , Dohyun Kim , linux-arm-kernel@lists.infradead.org, kkd@meta.com, kernel-team@meta.com Subject: [PATCH bpf-next v3 08/25] rqspinlock: Hardcode cond_acquire loops for arm64 Date: Mon, 3 Mar 2025 07:22:48 -0800 Message-ID: <20250303152305.3195648-9-memxor@gmail.com> X-Mailer: git-send-email 2.43.5 In-Reply-To: <20250303152305.3195648-1-memxor@gmail.com> References: <20250303152305.3195648-1-memxor@gmail.com> MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=6192; h=from:subject; bh=OXZjfkj9vlmQB5DKUaVgZP2eUXPIscjbGs4w8nWWb1Y=; b=owEBbQKS/ZANAwAIAUzgyIZIvxHKAcsmYgBnxcWWOoYAiRmhF6y065QEj1LrKSfWllbpNyyYNZSa qSwKiLWJAjMEAAEIAB0WIQRLvip+Buz51YI8YRFM4MiGSL8RygUCZ8XFlgAKCRBM4MiGSL8RyoQ1D/ wLlVGYocGxdNg4s1QyAFfYsSQi2WqV1eBWYZSjpNeeklY5NGh1Tkry86bADuiPA41U+6sKcb/xJFWh P98uXD3GNRpR3Qc0ZlcFxJ8UNRa1pt3WjaL+WU2IF7ab7pFZ78BILJH5IehQy38rq89MVWCIwd2y8a RHZEOFbn9h2lwC+gVyaVp6R/8EqwozY7peLmtmdEFyLzkanBkkKXXc+vmf2LUSP//uz18W5Dvlk/u0 Ij4Dh4XIXBZs+zio7QVmeQ1DgFpVZqqqb4UgpoAdO7QiQV0VZszEUsbM5tyg0NeNseWkS9u9vn89cy o5IFAwdAf7z+FqxhLOTysBwrma0uhfKDfzVTbqISED2s8YCgwTcFTVLcOnpawilUTon7ijZDzJQQza 5cEcDi2MdF3WOvcWqxiMLcgBJpoJwzQQ6MByaMhPTBx+dbyWoJwX54M+fwqGKqjL1nIy4s38v/fflb 5KSHn5vrHcWHNLNI4f1kxITi+5bQsaMeifIC5ltUctjwP0uh/WVhq+B63kLgKyxgW8zDe5RkYxvVVZ W2bMlEgbU71tVLuA5VUZTOizoax2TL2hJ1Ox21/A0q4t3m/Th739SY3jusfPle/dtSkQURdfj5CVS+ 7WI5z2oPcdvV5Fu3sRfrtb1EwqRYq3nUqzxzQ781Ej823z3Nf0EW8WW0F5Dg== X-Developer-Key: i=memxor@gmail.com; a=openpgp; fpr=4BBE2A7E06ECF9D5823C61114CE0C88648BF11CA X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20250303_072322_075827_4637E55F X-CRM114-Status: GOOD ( 23.28 ) X-BeenThere: linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: "linux-arm-kernel" Errors-To: linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org Currently, for rqspinlock usage, the implementation of smp_cond_load_acquire (and thus, atomic_cond_read_acquire) are susceptible to stalls on arm64, because they do not guarantee that the conditional expression will be repeatedly invoked if the address being loaded from is not written to by other CPUs. 
When support for event-streams is absent (which unblocks stuck WFE-based loops every ~100us), we may end up being stuck forever. This causes a problem for us, as we need to repeatedly invoke the RES_CHECK_TIMEOUT in the spin loop to break out when the timeout expires. Let us import the smp_cond_load_acquire_timewait implementation Ankur is proposing in [0], and then fallback to it once it is merged. While we rely on the implementation to amortize the cost of sampling check_timeout for us, it will not happen when event stream support is unavailable. This is not the common case, and it would be difficult to fit our logic in the time_expr_ns >= time_limit_ns comparison, hence just let it be. [0]: https://lore.kernel.org/lkml/20250203214911.898276-1-ankur.a.arora@oracle.com Cc: Ankur Arora Signed-off-by: Kumar Kartikeya Dwivedi --- arch/arm64/include/asm/rqspinlock.h | 93 +++++++++++++++++++++++++++++ kernel/locking/rqspinlock.c | 15 +++++ 2 files changed, 108 insertions(+) create mode 100644 arch/arm64/include/asm/rqspinlock.h diff --git a/arch/arm64/include/asm/rqspinlock.h b/arch/arm64/include/asm/rqspinlock.h new file mode 100644 index 000000000000..5b80785324b6 --- /dev/null +++ b/arch/arm64/include/asm/rqspinlock.h @@ -0,0 +1,93 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _ASM_RQSPINLOCK_H +#define _ASM_RQSPINLOCK_H + +#include + +/* + * Hardcode res_smp_cond_load_acquire implementations for arm64 to a custom + * version based on [0]. In rqspinlock code, our conditional expression involves + * checking the value _and_ additionally a timeout. However, on arm64, the + * WFE-based implementation may never spin again if no stores occur to the + * locked byte in the lock word. As such, we may be stuck forever if + * event-stream based unblocking is not available on the platform for WFE spin + * loops (arch_timer_evtstrm_available). + * + * Once support for smp_cond_load_acquire_timewait [0] lands, we can drop this + * copy-paste. + * + * While we rely on the implementation to amortize the cost of sampling + * cond_expr for us, it will not happen when event stream support is + * unavailable, time_expr check is amortized. This is not the common case, and + * it would be difficult to fit our logic in the time_expr_ns >= time_limit_ns + * comparison, hence just let it be. In case of event-stream, the loop is woken + * up at microsecond granularity. 
+ * + * [0]: https://lore.kernel.org/lkml/20250203214911.898276-1-ankur.a.arora@oracle.com + */ + +#ifndef smp_cond_load_acquire_timewait + +#define smp_cond_time_check_count 200 + +#define __smp_cond_load_relaxed_spinwait(ptr, cond_expr, time_expr_ns, \ + time_limit_ns) ({ \ + typeof(ptr) __PTR = (ptr); \ + __unqual_scalar_typeof(*ptr) VAL; \ + unsigned int __count = 0; \ + for (;;) { \ + VAL = READ_ONCE(*__PTR); \ + if (cond_expr) \ + break; \ + cpu_relax(); \ + if (__count++ < smp_cond_time_check_count) \ + continue; \ + if ((time_expr_ns) >= (time_limit_ns)) \ + break; \ + __count = 0; \ + } \ + (typeof(*ptr))VAL; \ +}) + +#define __smp_cond_load_acquire_timewait(ptr, cond_expr, \ + time_expr_ns, time_limit_ns) \ +({ \ + typeof(ptr) __PTR = (ptr); \ + __unqual_scalar_typeof(*ptr) VAL; \ + for (;;) { \ + VAL = smp_load_acquire(__PTR); \ + if (cond_expr) \ + break; \ + __cmpwait_relaxed(__PTR, VAL); \ + if ((time_expr_ns) >= (time_limit_ns)) \ + break; \ + } \ + (typeof(*ptr))VAL; \ +}) + +#define smp_cond_load_acquire_timewait(ptr, cond_expr, \ + time_expr_ns, time_limit_ns) \ +({ \ + __unqual_scalar_typeof(*ptr) _val; \ + int __wfe = arch_timer_evtstrm_available(); \ + \ + if (likely(__wfe)) { \ + _val = __smp_cond_load_acquire_timewait(ptr, cond_expr, \ + time_expr_ns, \ + time_limit_ns); \ + } else { \ + _val = __smp_cond_load_relaxed_spinwait(ptr, cond_expr, \ + time_expr_ns, \ + time_limit_ns); \ + smp_acquire__after_ctrl_dep(); \ + } \ + (typeof(*ptr))_val; \ +}) + +#endif + +#define res_smp_cond_load_acquire_timewait(v, c) smp_cond_load_acquire_timewait(v, c, 0, 1) + +#include + +#endif /* _ASM_RQSPINLOCK_H */ diff --git a/kernel/locking/rqspinlock.c b/kernel/locking/rqspinlock.c index 6b547f85fa95..efa937ea80d9 100644 --- a/kernel/locking/rqspinlock.c +++ b/kernel/locking/rqspinlock.c @@ -92,12 +92,21 @@ static noinline int check_timeout(struct rqspinlock_timeout *ts) return 0; } +/* + * Do not amortize with spins when res_smp_cond_load_acquire is defined, + * as the macro does internal amortization for us. + */ +#ifndef res_smp_cond_load_acquire #define RES_CHECK_TIMEOUT(ts, ret) \ ({ \ if (!(ts).spin++) \ (ret) = check_timeout(&(ts)); \ (ret); \ }) +#else +#define RES_CHECK_TIMEOUT(ts, ret, mask) \ + ({ (ret) = check_timeout(&(ts)); }) +#endif /* * Initialize the 'spin' member. 
@@ -118,6 +127,12 @@ static noinline int check_timeout(struct rqspinlock_timeout *ts) */ static DEFINE_PER_CPU_ALIGNED(struct qnode, rqnodes[_Q_MAX_NODES]); +#ifndef res_smp_cond_load_acquire +#define res_smp_cond_load_acquire(v, c) smp_cond_load_acquire(v, c) +#endif + +#define res_atomic_cond_read_acquire(v, c) res_smp_cond_load_acquire(&(v)->counter, (c)) + /** * resilient_queued_spin_lock_slowpath - acquire the queued spinlock * @lock: Pointer to queued spinlock structure From patchwork Mon Mar 3 15:22:49 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kumar Kartikeya Dwivedi X-Patchwork-Id: 13999061 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 3CAE4C282CD for ; Mon, 3 Mar 2025 15:39:29 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=bombadil.20210309; h=Sender:List-Subscribe:List-Help :List-Post:List-Archive:List-Unsubscribe:List-Id:Content-Transfer-Encoding: MIME-Version:References:In-Reply-To:Message-ID:Date:Subject:Cc:To:From: Reply-To:Content-Type:Content-ID:Content-Description:Resent-Date:Resent-From: Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID:List-Owner; bh=GSHeNmV46kRt/dp4H0+hnJsEV9KJXQ8UwiwQeMlZ2+c=; b=Zxz74QIyMdnf59lOTMj8c5V3KP zaC6Lmel4OS69zCey6SCnWV9grhdtCQwdR/XGgu0jjoMDVjrTRZEW0bzUFBlTvqPsoKAVxHUjsTxn ep4I5IoLOj4OL8+B2TyolWOJpCfMFKPDm+7S+1DH+A7Bq5tLYgqs8ll6Cxp32MZYrkVZBGI71wEBW cmoX8S8SGd9jF7YCy4GcFMRF/cECBEpH56DmelOKnSWI4yqXmlMETyzWJfXaNFDooHzGQOnRY64cA hPe8grpzPN0AzNZDwSshGceIeLfBoa251O+qzhWFMxu+ojmHmV0om69Cxq/c9MkdKal5eydPgAY9P 1Slmnrpg==; Received: from localhost ([::1] helo=bombadil.infradead.org) by bombadil.infradead.org with esmtp (Exim 4.98 #2 (Red Hat Linux)) id 1tp7t1-00000001M6H-0XLX; Mon, 03 Mar 2025 15:39:19 +0000 Received: from mail-wr1-x444.google.com ([2a00:1450:4864:20::444]) by bombadil.infradead.org with esmtps (Exim 4.98 #2 (Red Hat Linux)) id 1tp7db-00000001IDh-176l for linux-arm-kernel@lists.infradead.org; Mon, 03 Mar 2025 15:23:24 +0000 Received: by mail-wr1-x444.google.com with SMTP id ffacd0b85a97d-390df0138beso2442171f8f.0 for ; Mon, 03 Mar 2025 07:23:23 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1741015402; x=1741620202; darn=lists.infradead.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=GSHeNmV46kRt/dp4H0+hnJsEV9KJXQ8UwiwQeMlZ2+c=; b=aPvXjnW/c3TMk49yrhiwJB/+7gKaJrIXU5oBwVfm8cJKWgQBBwDYQS3M9D8ADs4CFX 4ARm5hFFWy/ztrmLqvltXgEDlmFeO7XbZn/Qk5h+uTrVtRWiIMJJpcu9JR8pwtD+8rMg /C9nFtMGxf+swxBX08qw1Ui+Y4TLgDamKa85wH+v7C3USjhmWMm3gEdNF+yVupSJkvLt KmspVZadKQwmh5LntnFGyG/pUGu0DnQr5DhVserOySsxrstZ12JFy0QTfJcFL2PYjyc1 0+Yd6tXinHA69ICk1f9Z3Aq/8DQR3q2U8f+pbPgSJM1yoBW64tq4SfbyRxFET6Id/87b M24Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1741015402; x=1741620202; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=GSHeNmV46kRt/dp4H0+hnJsEV9KJXQ8UwiwQeMlZ2+c=; 
From: Kumar Kartikeya Dwivedi To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org Cc: Barret Rhoden , Linus Torvalds , Peter Zijlstra , Will Deacon , Waiman Long , Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , Martin KaFai Lau , Eduard Zingerman , "Paul E. McKenney" , Tejun Heo , Josh Don , Dohyun Kim , linux-arm-kernel@lists.infradead.org, kkd@meta.com, kernel-team@meta.com Subject: [PATCH bpf-next v3 09/25] rqspinlock: Protect pending bit owners from stalls Date: Mon, 3 Mar 2025 07:22:49 -0800 Message-ID: <20250303152305.3195648-10-memxor@gmail.com> X-Mailer: git-send-email 2.43.5 In-Reply-To: <20250303152305.3195648-1-memxor@gmail.com> References: <20250303152305.3195648-1-memxor@gmail.com> MIME-Version: 1.0
List-Archive: List-Post: List-Help: List-Subscribe: , Sender: "linux-arm-kernel" Errors-To: linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org The pending bit is used to avoid queueing in case the lock is uncontended, and has demonstrated benefits for the 2 contender scenario, esp. on x86. In case the pending bit is acquired and we wait for the locked bit to disappear, we may get stuck due to the lock owner not making progress. Hence, this waiting loop must be protected with a timeout check. To perform a graceful recovery once we decide to abort our lock acquisition attempt in this case, we must unset the pending bit since we own it. All waiters undoing their changes and exiting gracefully allows the lock word to be restored to the unlocked state once all participants (owner, waiters) have been recovered, and the lock remains usable. Hence, set the pending bit back to zero before returning to the caller. Introduce a lockevent (rqspinlock_lock_timeout) to capture timeout event statistics. Reviewed-by: Barret Rhoden Signed-off-by: Kumar Kartikeya Dwivedi --- include/asm-generic/rqspinlock.h | 2 +- kernel/locking/lock_events_list.h | 5 +++++ kernel/locking/rqspinlock.c | 28 +++++++++++++++++++++++----- 3 files changed, 29 insertions(+), 6 deletions(-) diff --git a/include/asm-generic/rqspinlock.h b/include/asm-generic/rqspinlock.h index 96cea871fdd2..d23793d8e64d 100644 --- a/include/asm-generic/rqspinlock.h +++ b/include/asm-generic/rqspinlock.h @@ -15,7 +15,7 @@ struct qspinlock; typedef struct qspinlock rqspinlock_t; -extern void resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val); +extern int resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val); /* * Default timeout for waiting loops is 0.25 seconds diff --git a/kernel/locking/lock_events_list.h b/kernel/locking/lock_events_list.h index 97fb6f3f840a..c5286249994d 100644 --- a/kernel/locking/lock_events_list.h +++ b/kernel/locking/lock_events_list.h @@ -49,6 +49,11 @@ LOCK_EVENT(lock_use_node4) /* # of locking ops that use 4th percpu node */ LOCK_EVENT(lock_no_node) /* # of locking ops w/o using percpu node */ #endif /* CONFIG_QUEUED_SPINLOCKS */ +/* + * Locking events for Resilient Queued Spin Lock + */ +LOCK_EVENT(rqspinlock_lock_timeout) /* # of locking ops that timeout */ + /* * Locking events for rwsem */ diff --git a/kernel/locking/rqspinlock.c b/kernel/locking/rqspinlock.c index efa937ea80d9..6be36798ded9 100644 --- a/kernel/locking/rqspinlock.c +++ b/kernel/locking/rqspinlock.c @@ -154,12 +154,12 @@ static DEFINE_PER_CPU_ALIGNED(struct qnode, rqnodes[_Q_MAX_NODES]); * contended : (*,x,y) +--> (*,0,0) ---> (*,0,1) -' : * queue : ^--' : */ -void __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val) +int __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val) { struct mcs_spinlock *prev, *next, *node; struct rqspinlock_timeout ts; + int idx, ret = 0; u32 old, tail; - int idx; BUILD_BUG_ON(CONFIG_NR_CPUS >= (1U << _Q_TAIL_CPU_BITS)); @@ -217,8 +217,25 @@ void __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val) * clear_pending_set_locked() implementations imply full * barriers. */ - if (val & _Q_LOCKED_MASK) - smp_cond_load_acquire(&lock->locked, !VAL); + if (val & _Q_LOCKED_MASK) { + RES_RESET_TIMEOUT(ts, RES_DEF_TIMEOUT); + res_smp_cond_load_acquire(&lock->locked, !VAL || RES_CHECK_TIMEOUT(ts, ret)); + } + + if (ret) { + /* + * We waited for the locked bit to go back to 0, as the pending + * waiter, but timed out. 
We need to clear the pending bit since + * we own it. Once a stuck owner has been recovered, the lock + * must be restored to a valid state, hence removing the pending + * bit is necessary. + * + * *,1,* -> *,0,* + */ + clear_pending(lock); + lockevent_inc(rqspinlock_lock_timeout); + return ret; + } /* * take ownership and clear the pending bit. @@ -227,7 +244,7 @@ void __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val) */ clear_pending_set_locked(lock); lockevent_inc(lock_pending); - return; + return 0; /* * End of pending bit optimistic spinning and beginning of MCS @@ -378,5 +395,6 @@ void __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val) * release the node */ __this_cpu_dec(rqnodes[0].mcs.count); + return 0; } EXPORT_SYMBOL(resilient_queued_spin_lock_slowpath); From patchwork Mon Mar 3 15:22:50 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kumar Kartikeya Dwivedi X-Patchwork-Id: 13999062 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id C11AAC282CD for ; Mon, 3 Mar 2025 15:41:03 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=bombadil.20210309; h=Sender:List-Subscribe:List-Help :List-Post:List-Archive:List-Unsubscribe:List-Id:Content-Transfer-Encoding: MIME-Version:References:In-Reply-To:Message-ID:Date:Subject:Cc:To:From: Reply-To:Content-Type:Content-ID:Content-Description:Resent-Date:Resent-From: Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID:List-Owner; bh=TdqbAA6BbOHNHxPeedO3e4MUz1jdw47Vp3CnPvreE48=; b=AtvVEENlmfE6SIHVjCt4rhk0ys NhB4IL5mxjCyLLzWEUrIRiDEq1fWYgoN/KJh1eD5cqFjYw0LVeZGEa7W4a8R8IJpuEot9QahL24qS /CtSitZWB4sz93YAICrGfc6tX2sRHTSxV5QBt8hgR+5xzjseyyPWkwsoqLmX1mAlQgN14IFGN3Wtb Xvgr7q/mHmF92HgfBUtOvGFIMaB4Qz4b7MUPMmyDBM2tYUAQuWsnAV6p1vJtXAcm0H9vi/txWMqpG /bV9OzNW5VDQVQWP7v/ZiiCCyw5ivHl2Uaiako2uI70HL/KDVsPt6ZQt+cGuz0LOPbatm4UQ1inG2 vxDH4o0w==; Received: from localhost ([::1] helo=bombadil.infradead.org) by bombadil.infradead.org with esmtp (Exim 4.98 #2 (Red Hat Linux)) id 1tp7uY-00000001MME-3ZTE; Mon, 03 Mar 2025 15:40:54 +0000 Received: from mail-wm1-f67.google.com ([209.85.128.67]) by bombadil.infradead.org with esmtps (Exim 4.98 #2 (Red Hat Linux)) id 1tp7dd-00000001IF1-0yTd for linux-arm-kernel@lists.infradead.org; Mon, 03 Mar 2025 15:23:26 +0000 Received: by mail-wm1-f67.google.com with SMTP id 5b1f17b1804b1-43bbb440520so14147045e9.2 for ; Mon, 03 Mar 2025 07:23:24 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1741015403; x=1741620203; darn=lists.infradead.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=TdqbAA6BbOHNHxPeedO3e4MUz1jdw47Vp3CnPvreE48=; b=YpyfxFhyeb2LCNr5X509ou23qx35VU8AMZivuPwkOTtec3m8+9VJxj+v2ci3T/kNLk 5dpU5v86dC9iK0xBSruusrzpX223mVeA5z7lmXQLUn7yqX7qVEKfHVtGzTyDltuvQ/wF GiBV/KlUbbGUl1vGU9kkNyL5cYeDXR8fVoNYwMB0dM1qbI7L/HyNxqNe68Bdoi87NVit fAKFpGsCoHZOVtqKIuTrlbRS2MrcT9d6zSZwNnhKkbpUKqFXr3lZXPPG6fTojtd6ZFla NKseu5Lrf/1HqZCwnjNiSiDzQa58WwmSraW4UJ8kR3CROFUASvOMsdAp8uZobY4Jmwzq RNMg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; 
c=relaxed/relaxed; d=1e100.net; s=20230601; t=1741015403; x=1741620203; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=TdqbAA6BbOHNHxPeedO3e4MUz1jdw47Vp3CnPvreE48=; b=ABMnEVOazxDgeNrTLO6a4zYKru3I/LXj+QZAELvF/idc44bSMZj1zytINnHuMORAGG 3LsZa1jerTkvO8sGw6MIMo0SrK8XulU2oakRTkh+DQJBGkSkVAXge8BncghYQa5fWrMX oDPbdEgssNCGhZOtUbbZtUQ6uAIBh6Y2mI2SN3bUJsIh1djANIoljLcTA4dLoiWuXOvj 7fqyuhgwuaWNairt2PlremuMLYPnMn3KfvM77soBrBLq9m3l/g78HTeMJJEyPXVkr5eh dXmackdtEP4v2fT74SoI6ecP0UPj67bETmD0HxREQmRd9qNYxqZrnpSaqQuNvD1cpr9J 3Ueg== X-Forwarded-Encrypted: i=1; AJvYcCX9i2m7n5KzKxkE5IzM4+Oy+OSm1dgcT3IELO9k7gQ03Y8CnvkM5F5vBaLYv+kbDBP2CqoqCp8fTDwk6ZWv863f@lists.infradead.org X-Gm-Message-State: AOJu0YxQOUI0zrXQGtFnTg7SNZuy7s8aHkUbHZeR60U1ls48u+eJ7T8E qHWQd7Duhjp0qbTIOE9aY5d5Yk0uSStEma+z3Tv38E2PGvUvW1bE X-Gm-Gg: ASbGncsWCA2B3c8jpBis8kLDpvNck7wYYV20cOdCvbvvMz3kNybpmHGteD55ClZWTbu Xon8agNVEdHfdcWyrXb0c1Q3ywqEplOqFPBL316lsK/DjQLY3QFrBw54ja/28V3zUADo5Wv+8nv 6eK+2ogn/7UoF/ULPiT3SXhFz8MoCoTf8VYJfq4ZPMws9q3lpzJeLynfMUMUbQFuJP8SxhkkUqq RyFYJCnJHfOXBrDuL8NXaElfUCgRcMOYbsjphaqD8ucKc29Pu5RCCw3rDTlOquSBUkQCvNRCmez rjMUFc0iDJE3d9oSFQEgaRDIdVGqzCaPde8= X-Google-Smtp-Source: AGHT+IGFxxitWQfJahw973p0I2xV7/kZwy8gicFDigUlKkHKzWL+qRQIH+fD/xVRKEKwh58Qm9F6xw== X-Received: by 2002:a05:600c:5014:b0:439:89d1:30dc with SMTP id 5b1f17b1804b1-43ba730d5b6mr142469095e9.10.1741015402850; Mon, 03 Mar 2025 07:23:22 -0800 (PST) Received: from localhost ([2a03:2880:31ff:54::]) by smtp.gmail.com with ESMTPSA id 5b1f17b1804b1-43bc6a8ff01sm23489995e9.39.2025.03.03.07.23.22 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 03 Mar 2025 07:23:22 -0800 (PST) From: Kumar Kartikeya Dwivedi To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org Cc: Barret Rhoden , Linus Torvalds , Peter Zijlstra , Will Deacon , Waiman Long , Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , Martin KaFai Lau , Eduard Zingerman , "Paul E. 
McKenney" , Tejun Heo , Josh Don , Dohyun Kim , linux-arm-kernel@lists.infradead.org, kkd@meta.com, kernel-team@meta.com Subject: [PATCH bpf-next v3 10/25] rqspinlock: Protect waiters in queue from stalls Date: Mon, 3 Mar 2025 07:22:50 -0800 Message-ID: <20250303152305.3195648-11-memxor@gmail.com> X-Mailer: git-send-email 2.43.5 In-Reply-To: <20250303152305.3195648-1-memxor@gmail.com> References: <20250303152305.3195648-1-memxor@gmail.com> MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=8548; h=from:subject; bh=/tO7WsHRnQ1y7o7jGLLRZ75jx2rvQksn4PRPhYxulGw=; b=owEBbQKS/ZANAwAIAUzgyIZIvxHKAcsmYgBnxcWXBmmTn9Q/JpHcUWagL6D0LQj7D9XnazfmRv20 F3ZP162JAjMEAAEIAB0WIQRLvip+Buz51YI8YRFM4MiGSL8RygUCZ8XFlwAKCRBM4MiGSL8RyoaOD/ 9RNH1I8JHE99sQbIlMiKmtHTFHuJCcV0nvMTdA9wCT6zW9DamNJ7yjyl81IYTGgDbSp7m2QaZu788r wMj5fcd2iwhWcBBc3PlDuAKaTQwSz8B5BN1i6UV7vcK1E/AtpCEpvKXpqZH1SSiPx2YkgVYq8KmZxt Z/dIeZESDo9b9d1uSbJmJosgaNhZz1AbbuYEp6kniedp3jozmo3YFBcOue3YFKPe2Kg4NTJhwyRrXv tULCtm9RRxbu6QxV+kLPyqhYo74CyHJaolrRbLzf8we5rzaMwdJnBzFvKd7i2ZpNaoWQxykdCanhPC kPtXLBOdzzqKzwxaC1t4g3BJ7UIeouteWNdMi5qZqLhvMLByPg/i0nacGi4CD64Lriiq0FL+AMSlTM GZPhan3dSlPjsN45LIuBIPnN5XtrmgKV629AYm01KKITf7cSzqgl2iJ3dpqSasXs//53rgJxbnJaVq VOv1pbHmn+VLbJiV2X88Q43nhN1CwA5HGLBirqzzZ4K0hRzxuo9oGT6S2r76PZFDucgeMDRJB9c7Mm GUAgAQ0RCNDP+ehlhqzyWE0+v7m1qVYImUE0vh4QbgRxK8wEBGaupTPCZ0wHn0ruv4PtHa7PABtP+B PRV0YKX9hGtJ//vt5pVqTc7HjT1GByFMSa0MhSbB3s6F+v0Qkm6vjB3b7DBQ== X-Developer-Key: i=memxor@gmail.com; a=openpgp; fpr=4BBE2A7E06ECF9D5823C61114CE0C88648BF11CA X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20250303_072325_267848_6EFEBAA4 X-CRM114-Status: GOOD ( 38.26 ) X-BeenThere: linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: "linux-arm-kernel" Errors-To: linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org Implement the wait queue cleanup algorithm for rqspinlock. There are three forms of waiters in the original queued spin lock algorithm. The first is the waiter which acquires the pending bit and spins on the lock word without forming a wait queue. The second is the head waiter that is the first waiter heading the wait queue. The third form is of all the non-head waiters queued behind the head, waiting to be signalled through their MCS node to overtake the responsibility of the head. In this commit, we are concerned with the second and third kind. First, we augment the waiting loop of the head of the wait queue with a timeout. When this timeout happens, all waiters part of the wait queue will abort their lock acquisition attempts. This happens in three steps. First, the head breaks out of its loop waiting for pending and locked bits to turn to 0, and non-head waiters break out of their MCS node spin (more on that later). Next, every waiter (head or non-head) attempts to check whether they are also the tail waiter, in such a case they attempt to zero out the tail word and allow a new queue to be built up for this lock. If they succeed, they have no one to signal next in the queue to stop spinning. Otherwise, they signal the MCS node of the next waiter to break out of its spin and try resetting the tail word back to 0. This goes on until the tail waiter is found. 
In case of races, the new tail will be responsible for performing the same task, as the old tail will then fail to reset the tail word and wait for its next pointer to be updated before it signals the new tail to do the same. We terminate the whole wait queue because of two main reasons. Firstly, we eschew per-waiter timeouts with one applied at the head of the wait queue. This allows everyone to break out faster once we've seen the owner / pending waiter not responding for the timeout duration from the head. Secondly, it avoids complicated synchronization, because when not leaving in FIFO order, prev's next pointer needs to be fixed up etc. Lastly, all of these waiters release the rqnode and return to the caller. This patch underscores the point that rqspinlock's timeout does not apply to each waiter individually, and cannot be relied upon as an upper bound. It is possible for the rqspinlock waiters to return early from a failed lock acquisition attempt as soon as stalls are detected. The head waiter cannot directly WRITE_ONCE the tail to zero, as it may race with a concurrent xchg and a non-head waiter linking its MCS node to the head's MCS node through 'prev->next' assignment. One notable thing is that we must use RES_DEF_TIMEOUT * 2 as our maximum duration for the waiting loop (for the wait queue head), since we may have both the owner and pending bit waiter ahead of us, and in the worst case, need to span their maximum permitted critical section lengths. Reviewed-by: Barret Rhoden Signed-off-by: Kumar Kartikeya Dwivedi --- kernel/locking/rqspinlock.c | 55 +++++++++++++++++++++++++++++++++++-- kernel/locking/rqspinlock.h | 48 ++++++++++++++++++++++++++++++++ 2 files changed, 100 insertions(+), 3 deletions(-) create mode 100644 kernel/locking/rqspinlock.h diff --git a/kernel/locking/rqspinlock.c b/kernel/locking/rqspinlock.c index 6be36798ded9..9ad18b3c46f7 100644 --- a/kernel/locking/rqspinlock.c +++ b/kernel/locking/rqspinlock.c @@ -77,6 +77,8 @@ struct rqspinlock_timeout { u16 spin; }; +#define RES_TIMEOUT_VAL 2 + static noinline int check_timeout(struct rqspinlock_timeout *ts) { u64 time = ktime_get_mono_fast_ns(); @@ -321,12 +323,18 @@ int __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val) * head of the waitqueue. */ if (old & _Q_TAIL_MASK) { + int val; + prev = decode_tail(old, rqnodes); /* Link @node into the waitqueue. */ WRITE_ONCE(prev->next, node); - arch_mcs_spin_lock_contended(&node->locked); + val = arch_mcs_spin_lock_contended(&node->locked); + if (val == RES_TIMEOUT_VAL) { + ret = -EDEADLK; + goto waitq_timeout; + } /* * While waiting for the MCS lock, the next pointer may have @@ -349,8 +357,49 @@ int __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val) * store-release that clears the locked bit and create lock * sequentiality; this is because the set_locked() function below * does not imply a full barrier. + * + * We use RES_DEF_TIMEOUT * 2 as the duration, as RES_DEF_TIMEOUT is + * meant to span maximum allowed time per critical section, and we may + * have both the owner of the lock and the pending bit waiter ahead of + * us. 
*/ - val = atomic_cond_read_acquire(&lock->val, !(VAL & _Q_LOCKED_PENDING_MASK)); + RES_RESET_TIMEOUT(ts, RES_DEF_TIMEOUT * 2); + val = res_atomic_cond_read_acquire(&lock->val, !(VAL & _Q_LOCKED_PENDING_MASK) || + RES_CHECK_TIMEOUT(ts, ret)); + +waitq_timeout: + if (ret) { + /* + * If the tail is still pointing to us, then we are the final waiter, + * and are responsible for resetting the tail back to 0. Otherwise, if + * the cmpxchg operation fails, we signal the next waiter to take exit + * and try the same. For a waiter with tail node 'n': + * + * n,*,* -> 0,*,* + * + * When performing cmpxchg for the whole word (NR_CPUS > 16k), it is + * possible locked/pending bits keep changing and we see failures even + * when we remain the head of wait queue. However, eventually, + * pending bit owner will unset the pending bit, and new waiters + * will queue behind us. This will leave the lock owner in + * charge, and it will eventually either set locked bit to 0, or + * leave it as 1, allowing us to make progress. + * + * We terminate the whole wait queue for two reasons. Firstly, + * we eschew per-waiter timeouts with one applied at the head of + * the wait queue. This allows everyone to break out faster + * once we've seen the owner / pending waiter not responding for + * the timeout duration from the head. Secondly, it avoids + * complicated synchronization, because when not leaving in FIFO + * order, prev's next pointer needs to be fixed up etc. + */ + if (!try_cmpxchg_tail(lock, tail, 0)) { + next = smp_cond_load_relaxed(&node->next, VAL); + WRITE_ONCE(next->locked, RES_TIMEOUT_VAL); + } + lockevent_inc(rqspinlock_lock_timeout); + goto release; + } /* * claim the lock: @@ -395,6 +444,6 @@ int __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val) * release the node */ __this_cpu_dec(rqnodes[0].mcs.count); - return 0; + return ret; } EXPORT_SYMBOL(resilient_queued_spin_lock_slowpath); diff --git a/kernel/locking/rqspinlock.h b/kernel/locking/rqspinlock.h new file mode 100644 index 000000000000..3cec3a0f2d7e --- /dev/null +++ b/kernel/locking/rqspinlock.h @@ -0,0 +1,48 @@ +/* SPDX-License-Identifier: GPL-2.0-or-later */ +/* + * Resilient Queued Spin Lock defines + * + * (C) Copyright 2024 Meta Platforms, Inc. and affiliates. + * + * Authors: Kumar Kartikeya Dwivedi + */ +#ifndef __LINUX_RQSPINLOCK_H +#define __LINUX_RQSPINLOCK_H + +#include "qspinlock.h" + +/* + * try_cmpxchg_tail - Return result of cmpxchg of tail word with a new value + * @lock: Pointer to queued spinlock structure + * @tail: The tail to compare against + * @new_tail: The new queue tail code word + * Return: Bool to indicate whether the cmpxchg operation succeeded + * + * This is used by the head of the wait queue to clean up the queue. + * Provides relaxed ordering, since observers only rely on initialized + * state of the node which was made visible through the xchg_tail operation, + * i.e. through the smp_wmb preceding xchg_tail. + * + * We avoid using 16-bit cmpxchg, which is not available on all architectures. + */ +static __always_inline bool try_cmpxchg_tail(struct qspinlock *lock, u32 tail, u32 new_tail) +{ + u32 old, new; + + old = atomic_read(&lock->val); + do { + /* + * Is the tail part we compare to already stale? Fail. + */ + if ((old & _Q_TAIL_MASK) != tail) + return false; + /* + * Encode latest locked/pending state for new tail. 
+ */ + new = (old & _Q_LOCKED_PENDING_MASK) | new_tail; + } while (!atomic_try_cmpxchg_relaxed(&lock->val, &old, new)); + + return true; +} + +#endif /* __LINUX_RQSPINLOCK_H */ From patchwork Mon Mar 3 15:22:51 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kumar Kartikeya Dwivedi X-Patchwork-Id: 13999063 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 9ED15C282CD for ; Mon, 3 Mar 2025 15:42:40 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=bombadil.20210309; h=Sender:List-Subscribe:List-Help :List-Post:List-Archive:List-Unsubscribe:List-Id:Content-Transfer-Encoding: MIME-Version:References:In-Reply-To:Message-ID:Date:Subject:Cc:To:From: Reply-To:Content-Type:Content-ID:Content-Description:Resent-Date:Resent-From: Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID:List-Owner; bh=3HqVASaJxCqTvezKR1/7pImOEJNTa/8ze2+bN0p0ybM=; b=zfo9976TZJcr4l1aPTf4HKVlSi C9BgqaWLTDN2m0+XNX7FqpXmiqIU2nGvDWyNMXgTakU3PFP7uKGedYn+Q5wiLzwLMzFcsYcHOqZ12 2ltSBCzVTO6m6sbBRtWlGz51fRSsiaDucarRQSahzwzKHbl5qkQkorEQEiDMNRz4FE6v22+v7bY7m axVfL2t7OUUttq+QblAXNqjujxKsafwJGGopGJy0TDz+DtVXri9+2XHVJO0d9Z5SZ37X++AaqdZVW mPXBW6uEzeZrRWKrXEwaj//1zeUrVLOXNBR34HzsFwYOboEOLrXck3ouKCECWUIpkj2SlmGopyVMZ AKaXJa4A==; Received: from localhost ([::1] helo=bombadil.infradead.org) by bombadil.infradead.org with esmtp (Exim 4.98 #2 (Red Hat Linux)) id 1tp7w6-00000001Mey-25py; Mon, 03 Mar 2025 15:42:30 +0000 Received: from mail-wr1-x441.google.com ([2a00:1450:4864:20::441]) by bombadil.infradead.org with esmtps (Exim 4.98 #2 (Red Hat Linux)) id 1tp7dd-00000001IFV-2x2a for linux-arm-kernel@lists.infradead.org; Mon, 03 Mar 2025 15:23:26 +0000 Received: by mail-wr1-x441.google.com with SMTP id ffacd0b85a97d-390dc0a7605so2500898f8f.1 for ; Mon, 03 Mar 2025 07:23:25 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1741015404; x=1741620204; darn=lists.infradead.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=3HqVASaJxCqTvezKR1/7pImOEJNTa/8ze2+bN0p0ybM=; b=EYEVHDIr9jRkMvNvq3ivV7jHqHIM/QPOL1RDvZnOyIKUUKPO1KZSQNtdFZoyJ9/FSV 2r8rUWtBa0uhbeGEnPSAeAUJ7U2GuFAR2cy6eMBmNU9IKm/fGCI0ovqh4wlDBtwwf0ok VC6RsUTiF0L58CKXlWl+0g7t4Yk20uMqmqRQ1ArxZZeUSxmr14Gtj0DiU/n+EsRAkwzk 7pRUBUX2CRZeKA/RnPW0c1bNCJjOINCdYDeFGCTXYZvM0YAE71CncEim6OCSFtNGBUmX ASH6lyB5Io6550GWgAAiHuAAdv55/Jew9uiJSdB/7oedIYz5expxUmXkuI31UHqvBocW L8aQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1741015404; x=1741620204; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=3HqVASaJxCqTvezKR1/7pImOEJNTa/8ze2+bN0p0ybM=; b=IzOIUSl5qgVxZ41t3wk26xFGG/P4N1qHl66F19jMxbk8WAkkGqatuMJ6TRzLnaVj/L ZfTaa5K9obCQjs73QRh39T/lQfeCrLJoiAJkuE80DwbklAVdQCMyGIZrhmF90vPeEK5O FPCbEhMvYb+mE+GpTABnxN+B9E7RV9nwhmo7CqmQrmZ9lldV/ZoG4BXxRrGw6FJfKzAG JWZypXSBaayuZ1GlUROilH/4tF/Ys3H7HhHBrxN1PZSTuba71mhQ5yos8XziurGO/Xqi b8DMMGEjIMvJCTMXGGWUzRLurJoUA1pT2twCCg/YeQxMQcVJfVBcaLp81/wZXiWTEC0y gAkA== 
X-Forwarded-Encrypted: i=1; AJvYcCWeI2Fc13Wu5NpBbf9dVnf3Y6/ET/S2B6VcLO2Aan8GQxWMVSH3ItP0k3jS1kTlub5bE5+5M4QrbA0/hQ1iih2N@lists.infradead.org X-Gm-Message-State: AOJu0YyywCJepdmH2ticY3nvKKkUACrqEIabtH7LkAhukBcFQ4YIE68H b8bsApY/qM6XWa77SM84h3JbHjIyIXUsMOM9ov+yCs9Eon2uhsFF X-Gm-Gg: ASbGncvsXiqHUlblccV4apWras+BSoK6lc1KELDr5q7nw/E2GIiv34HE+RWpXucII6y zfiUkbTe61csQG/Gx1c+RILdA997VEeQs+6E2gM3qdRXOrnmapde3xR+brd7y+po7KJsxMYdeY1 mAd9YCl0x6ch4kkmh7lyTsM/1Eyp5/BNip2QV9aqrTT3RIDCxRiW37QgNtWMac2i6qOmBbOs/Ae SP22ye5pkrPq86Df6JItYR/OHAFIVlMu18dvDasWYiIhwBW7lPK+iE14XhW/A1xTKcqa9CR2Aeh I2CJquyTwnyr/xs3ZLKOAePuI+rUUOsChPE= X-Google-Smtp-Source: AGHT+IGJtvDm6QNWOZ/Cpj2mx/hKnagicnAIvDMSVdaEmiufPdG1r9WWQB7409X5ARNMd+GMAA5o0g== X-Received: by 2002:a5d:5c84:0:b0:390:f1cb:286e with SMTP id ffacd0b85a97d-390f1cb2ceemr5781942f8f.27.1741015404157; Mon, 03 Mar 2025 07:23:24 -0800 (PST) Received: from localhost ([2a03:2880:31ff:44::]) by smtp.gmail.com with ESMTPSA id ffacd0b85a97d-390e47a7473sm14977125f8f.38.2025.03.03.07.23.23 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 03 Mar 2025 07:23:23 -0800 (PST) From: Kumar Kartikeya Dwivedi To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org Cc: Barret Rhoden , Linus Torvalds , Peter Zijlstra , Will Deacon , Waiman Long , Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , Martin KaFai Lau , Eduard Zingerman , "Paul E. McKenney" , Tejun Heo , Josh Don , Dohyun Kim , linux-arm-kernel@lists.infradead.org, kkd@meta.com, kernel-team@meta.com Subject: [PATCH bpf-next v3 11/25] rqspinlock: Protect waiters in trylock fallback from stalls Date: Mon, 3 Mar 2025 07:22:51 -0800 Message-ID: <20250303152305.3195648-12-memxor@gmail.com> X-Mailer: git-send-email 2.43.5 In-Reply-To: <20250303152305.3195648-1-memxor@gmail.com> References: <20250303152305.3195648-1-memxor@gmail.com> MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=1845; h=from:subject; bh=Q9a38BvImlUP8RKT7WE0lh2B2I6Iy4KpJSDwsgIePN4=; b=owEBbQKS/ZANAwAIAUzgyIZIvxHKAcsmYgBnxcWXG8w8qbawJw3OszHwUL3Z+OsS55BpLQ/c3hWE /xTGsM2JAjMEAAEIAB0WIQRLvip+Buz51YI8YRFM4MiGSL8RygUCZ8XFlwAKCRBM4MiGSL8Ryn5oD/ 96RVXMaF0ne5BLzCBhyNtkJBr6CAlauxQ3pblR3jmHlcP2GkKE00BZqXSuo5K0M0W8hzmyVBm8WZpv QmAprnJvjPvw3bTmZO232VjrI6ST5tJSd6XooAlRFPv85J3lQFS6z5IVGVtrJ/t6kbOg+xfwatWJVH JpGj9jJXyCIiXjHxaK91EcYxqzLOnLljC6WmGxE1N0lmh870+qkXn0vCZjMQ0lyW8wHEBbZ6Ywy8Gz 55T4SsC+beWbwhBNCJn02UhnONKtZqi0rm69Mb6AY+eRzPn9ZAI/n8F3jYqkY+yDCFq1Z4mdsWAsU+ UcxEA0xxZEA1gBVbVzAhkYvCHyFYnA/ZKE/8EZl6+ARIPGiQVPEqYETfeYZTVMxznw060kQAguI7WX WEFHRA8OzFTHycOX3/U8m+f3+9dgBhhVfpf0wJJHSMzNPR8J1we3rcdGBFiA3d8MdoQPth0iHEU2zP AGaEkhSEo4gXrUXNZnLq3ZpLEB+nn1ImiZ31A2o/3uJ4Vw9/x4a0iPKa01RJjV83Kjms2TGGOvbW2N GPkv4fOWnkSZv94YXDW/lzuzDng6iHB/gMJzFDkesuEjSysN2zSjMlyomMe3ja5puRwyFe2tE80jLM qADm2HxuhDZkxotSHNpoBoyUppVvy95LQycs0czfCa2Iqw1nW/eGWv4P5IYQ== X-Developer-Key: i=memxor@gmail.com; a=openpgp; fpr=4BBE2A7E06ECF9D5823C61114CE0C88648BF11CA X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20250303_072325_755968_7272B211 X-CRM114-Status: GOOD ( 15.12 ) X-BeenThere: linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: "linux-arm-kernel" Errors-To: linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org When we run out of maximum rqnodes, the original queued spin lock slow path falls back to a try lock. 
In such a case, we are again susceptible to stalls in case the lock owner fails to make progress. We use the timeout as a fallback to break out of this loop and return to the caller. This is a fallback for an extreme edge case, when on the same CPU we run out of all 4 qnodes. When could this happen? We are in slow path in task context, we get interrupted by an IRQ, which while in the slow path gets interrupted by an NMI, whcih in the slow path gets another nested NMI, which enters the slow path. All of the interruptions happen after node->count++. We use RES_DEF_TIMEOUT as our spinning duration, but in the case of this fallback, no fairness is guaranteed, so the duration may be too small for contended cases, as the waiting time is not bounded. Since this is an extreme corner case, let's just prefer timing out instead of attempting to spin for longer. Reviewed-by: Barret Rhoden Signed-off-by: Kumar Kartikeya Dwivedi --- kernel/locking/rqspinlock.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/kernel/locking/rqspinlock.c b/kernel/locking/rqspinlock.c index 9ad18b3c46f7..16ec1b9eb005 100644 --- a/kernel/locking/rqspinlock.c +++ b/kernel/locking/rqspinlock.c @@ -271,8 +271,14 @@ int __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val) */ if (unlikely(idx >= _Q_MAX_NODES)) { lockevent_inc(lock_no_node); - while (!queued_spin_trylock(lock)) + RES_RESET_TIMEOUT(ts, RES_DEF_TIMEOUT); + while (!queued_spin_trylock(lock)) { + if (RES_CHECK_TIMEOUT(ts, ret)) { + lockevent_inc(rqspinlock_lock_timeout); + break; + } cpu_relax(); + } goto release; } From patchwork Mon Mar 3 15:22:52 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kumar Kartikeya Dwivedi X-Patchwork-Id: 13999064 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 2B843C282CD for ; Mon, 3 Mar 2025 15:44:17 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=bombadil.20210309; h=Sender:List-Subscribe:List-Help :List-Post:List-Archive:List-Unsubscribe:List-Id:Content-Transfer-Encoding: MIME-Version:References:In-Reply-To:Message-ID:Date:Subject:Cc:To:From: Reply-To:Content-Type:Content-ID:Content-Description:Resent-Date:Resent-From: Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID:List-Owner; bh=tKa/wmg3ittWkt3o2YJLMmVqh431E62mqFSXV7PYA3Q=; b=Z2FGlivXwj6Kgai+ATd3b0hGhL IjUr+qCwW3wOuuqiBmqtRzCPGNqh3bfWR3D914uw1QBGJcH9aFdo3MetLeNsQVSbwRZyqhnCrWlNI d8qFs5boNwxD9eCzdiYqfqbgdZq9W8lMGKXNVN+7VZWP+CN5o4F0/z51Bn5RQSWKLBN1MRTnlcgoT xRsux+Acb9PD7Oa0OuyQ7j6P5BDnJUvYJVR21XxpEzHhWcKfzhQaE7+vXxrVkGa7lf1ZnmahH0fI4 /BiPp6mEHfKrYo+SUU30X7TS10u/WwTrtz76wSL9dkCNXmrK2Ujm+6Gv5/1zs7KiRnJ4DjEqWlaFx QbK4e4vw==; Received: from localhost ([::1] helo=bombadil.infradead.org) by bombadil.infradead.org with esmtp (Exim 4.98 #2 (Red Hat Linux)) id 1tp7xe-00000001Mvw-1Q1c; Mon, 03 Mar 2025 15:44:06 +0000 Received: from mail-wr1-x443.google.com ([2a00:1450:4864:20::443]) by bombadil.infradead.org with esmtps (Exim 4.98 #2 (Red Hat Linux)) id 1tp7df-00000001IGO-0mAp for linux-arm-kernel@lists.infradead.org; Mon, 03 Mar 2025 15:23:29 +0000 Received: by mail-wr1-x443.google.com with SMTP id 
ffacd0b85a97d-390df942558so3625012f8f.2 for ; Mon, 03 Mar 2025 07:23:27 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1741015406; x=1741620206; darn=lists.infradead.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=tKa/wmg3ittWkt3o2YJLMmVqh431E62mqFSXV7PYA3Q=; b=Iz07KREfpMPPM8YzPTrY2sX/p2RbXbpvtmx5jOgWQgEYUMoa7teEgMfcmkf4N2HjzX okCnwWvHqtCnuFBmQXu6PWsCSPTBpa/JYglqjgrqRwl6kVeIgnE0lz9KNwsmJU5alnQ8 WFKx1TCeFQ8KH5pfH8PFsB5/Lh//yPmI64O96X/AAPW82ZPgKHULRx6GJdj9sJHJ8zKC 6EPC0+HdlOCtAs8S4vZlRzWdDbeO5PhbHGZc/VYWgpmAFZk0wlVsqwnI3B/1Q8FIVz1R uqzP1BhZto3PHyaoK91l+j1BmNTaU4rbxcpxZJy+/dspbuFZsxxV8dh3MRKYLui5RuW1 eMxQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1741015406; x=1741620206; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=tKa/wmg3ittWkt3o2YJLMmVqh431E62mqFSXV7PYA3Q=; b=RK9j+i7S6/0hklAlm+cwEmC7aaTs/5TsV4mQcPnl2qMj6OuWoj46+ixNALZr882I4G TcrjE5/AqbQYTL3VuNqxwW5vqX7GZ2dbBwZulCRQs16GBlTVOYb7u6tRNo0CtqFQkO3K a12O/3eWZZm/OY/ZLSvfROGV/urgR+Y9aIUI3xQgaF9CpBA4F8s6k7JsXGEtv9jEB/5q AsAhAet66FrlJUYaH99g773UkSq8hofaEW+bc0ZuWggUNP+NgEwSl36bjBv/b4b9Ptw+ GLMpa9znZNsfqN+P0+PgugIWfjCKZ63ANNWzaXurLjkyfR6IMjsAqxQZ1iEVrlBB/nkh u7xg== X-Forwarded-Encrypted: i=1; AJvYcCWM9tFrknM+mq7OLM66KXqXBjfKHvpR4YBE7KEgy79GH6kbo/y01o9u3JoOT9ZF2HlwiPRMFBtDOy7s9G0G3a21@lists.infradead.org X-Gm-Message-State: AOJu0Yxf86lc2opaDRDMWJHG4bNF4ox4lTyPhlcRwwFoGEBlz3y1LlnQ Fh2tbyDX6r9K1/MKacKdclBUsatCoStgOmkYhYTx0CpVMwwi1XaZ X-Gm-Gg: ASbGncsnL79Xy+o87/9d4veDrd5Leh7JVLOd5vGNIFxNaHY4kLs87aSgGZ2TNz7+n2C UShgvH25IsJzscaHKFwdw4OkRRu3PzkeGOmGORTzKkTvgv3i8UBRDCsiCHG/4QRj9JdxyBJ/NNb HLF/ojIJBatC23dyddqiQrQ7D30E8Oru2iwfJVjDdp8s8I0McqNr9PLzlgpHL8clgE7CV1TfaXf CRO/P9n/sM2jv/YhsY5mUV7FvuMPOe8chCh0gZY9TBlfs9mxJARjbS1cYs1hrgGGYl7j7Ov5F33 2TUYvws1jjr3njb75bo75X7Y9cRUFBmUPag= X-Google-Smtp-Source: AGHT+IG1blRJwAMlXeHRPq+X+kGTGI57ZLlk81V9zn4NhYMCeonnkf5jNU1HzLwhFv1wOWVdopS8gA== X-Received: by 2002:a5d:64cf:0:b0:38f:30a3:51fe with SMTP id ffacd0b85a97d-390eca53071mr9776559f8f.42.1741015405695; Mon, 03 Mar 2025 07:23:25 -0800 (PST) Received: from localhost ([2a03:2880:31ff:4f::]) by smtp.gmail.com with ESMTPSA id ffacd0b85a97d-390e4844ac6sm14636626f8f.71.2025.03.03.07.23.24 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 03 Mar 2025 07:23:24 -0800 (PST) From: Kumar Kartikeya Dwivedi To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org Cc: Linus Torvalds , Peter Zijlstra , Will Deacon , Waiman Long , Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , Martin KaFai Lau , Eduard Zingerman , "Paul E. 
McKenney" , Tejun Heo , Barret Rhoden , Josh Don , Dohyun Kim , linux-arm-kernel@lists.infradead.org, kkd@meta.com, kernel-team@meta.com Subject: [PATCH bpf-next v3 12/25] rqspinlock: Add deadlock detection and recovery Date: Mon, 3 Mar 2025 07:22:52 -0800 Message-ID: <20250303152305.3195648-13-memxor@gmail.com> X-Mailer: git-send-email 2.43.5 In-Reply-To: <20250303152305.3195648-1-memxor@gmail.com> References: <20250303152305.3195648-1-memxor@gmail.com> MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=16419; h=from:subject; bh=q+PlHVYmXyQy+isvTJhD2W+ttvErhONLhj2AWs0Sbuk=; b=owEBbQKS/ZANAwAIAUzgyIZIvxHKAcsmYgBnxcWX+cvAWmiPyeTrpA7wf0kdH6RoTzfmmo3/HOrt rkmH2BWJAjMEAAEIAB0WIQRLvip+Buz51YI8YRFM4MiGSL8RygUCZ8XFlwAKCRBM4MiGSL8RylcSD/ 98d0x1nLa8rNVvdG3GA2itnPRdooLYI4T2fCCWt7Ts3e/+FELsL6YecnuYvTRcC8lgfNog49Q61vR9 ZbdiJhRYp0foaE6nkdJ6ZNdOUUqB7XNQ/5a9wspJwtac9mvnLvvflyBBpMra1iGmSbkeC1INTGs0k9 d4LX43Pt6LiEThANydEY2aB/KLWgJ8yJ0GI8rF9cmw9HZVEiXBI+DnpCcdFXfOXk4JFOIsM6J3+9uR gF1XAmwR9+OPBLj3ew1jqeOvImHAC6Q3KhH0fM2Khxp54yewJN3fdK27ecmcZ7dFD955LNup8Jwhc9 6No4sxHtKn3TRa233wc0E2c0WN41u0ANO2hskwdvDQrpBuGt1oekXg4lIEhasIAqkmRUwda8Se6FsW 917K8r3V5eRVzYlOuc2QnlSi1Z7UwvssgtZzlONowb11BDRvYdc6n30bdv8j5j5CkOWLS8QRFw1R2d 02L7M5THvXBlDnXEF17/fIPhaBsUgYWj9Tb6Te0pcUFTth8/hsrrqoLxYgdezymUP+YaKk4gSaxwYS UKsOuzcvXTm/Ole1m5oquCmbxoW+PiysOrhmj6Bjrhr2L2rTOE7Il4KP1WZyiYF3CZJJblxuxIzJD9 ftOVHvT1bfBqlCs5q/Ga8rae8DZswqhS7fMfgOHnwnPkrgWBo+l+BmoWa58A== X-Developer-Key: i=memxor@gmail.com; a=openpgp; fpr=4BBE2A7E06ECF9D5823C61114CE0C88648BF11CA X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20250303_072327_382657_EE1FE61B X-CRM114-Status: GOOD ( 38.53 ) X-BeenThere: linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: "linux-arm-kernel" Errors-To: linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org While the timeout logic provides guarantees for the waiter's forward progress, the time until a stalling waiter unblocks can still be long. The default timeout of 1/2 sec can be excessively long for some use cases. Additionally, custom timeouts may exacerbate recovery time. Introduce logic to detect common cases of deadlocks and perform quicker recovery. This is done by dividing the time from entry into the locking slow path until the timeout into intervals of 1 ms. Then, after each interval elapses, deadlock detection is performed, while also polling the lock word to ensure we can quickly break out of the detection logic and proceed with lock acquisition. A 'held_locks' table is maintained per-CPU where the entry at the bottom denotes a lock being waited for or already taken. Entries coming before it denote locks that are already held. The current CPU's table can thus be looked at to detect AA deadlocks. The tables from other CPUs can be looked at to discover ABBA situations. Finally, when a matching entry for the lock being taken on the current CPU is found on some other CPU, a deadlock situation is detected. This function can take a long time, therefore the lock word is constantly polled in each loop iteration to ensure we can preempt detection and proceed with lock acquisition, using the is_lock_released check. We set 'spin' member of rqspinlock_timeout struct to 0 to trigger deadlock checks immediately to perform faster recovery. 
Note: Extending lock word size by 4 bytes to record owner CPU can allow faster detection for ABBA. It is typically the owner which participates in a ABBA situation. However, to keep compatibility with existing lock words in the kernel (struct qspinlock), and given deadlocks are a rare event triggered by bugs, we choose to favor compatibility over faster detection. The release_held_lock_entry function requires an smp_wmb, while the release store on unlock will provide the necessary ordering for us. Add comments to document the subtleties of why this is correct. It is possible for stores to be reordered still, but in the context of the deadlock detection algorithm, a release barrier is sufficient and needn't be stronger for unlock's case. Signed-off-by: Kumar Kartikeya Dwivedi --- include/asm-generic/rqspinlock.h | 100 +++++++++++++++++ kernel/locking/rqspinlock.c | 185 ++++++++++++++++++++++++++++--- 2 files changed, 271 insertions(+), 14 deletions(-) diff --git a/include/asm-generic/rqspinlock.h b/include/asm-generic/rqspinlock.h index d23793d8e64d..b685f243cf96 100644 --- a/include/asm-generic/rqspinlock.h +++ b/include/asm-generic/rqspinlock.h @@ -11,6 +11,7 @@ #include #include +#include struct qspinlock; typedef struct qspinlock rqspinlock_t; @@ -22,4 +23,103 @@ extern int resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val); */ #define RES_DEF_TIMEOUT (NSEC_PER_SEC / 4) +/* + * Choose 31 as it makes rqspinlock_held cacheline-aligned. + */ +#define RES_NR_HELD 31 + +struct rqspinlock_held { + int cnt; + void *locks[RES_NR_HELD]; +}; + +DECLARE_PER_CPU_ALIGNED(struct rqspinlock_held, rqspinlock_held_locks); + +static __always_inline void grab_held_lock_entry(void *lock) +{ + int cnt = this_cpu_inc_return(rqspinlock_held_locks.cnt); + + if (unlikely(cnt > RES_NR_HELD)) { + /* Still keep the inc so we decrement later. */ + return; + } + + /* + * Implied compiler barrier in per-CPU operations; otherwise we can have + * the compiler reorder inc with write to table, allowing interrupts to + * overwrite and erase our write to the table (as on interrupt exit it + * will be reset to NULL). + * + * It is fine for cnt inc to be reordered wrt remote readers though, + * they won't observe our entry until the cnt update is visible, that's + * all. + */ + this_cpu_write(rqspinlock_held_locks.locks[cnt - 1], lock); +} + +/* + * We simply don't support out-of-order unlocks, and keep the logic simple here. + * The verifier prevents BPF programs from unlocking out-of-order, and the same + * holds for in-kernel users. + * + * It is possible to run into misdetection scenarios of AA deadlocks on the same + * CPU, and missed ABBA deadlocks on remote CPUs if this function pops entries + * out of order (due to lock A, lock B, unlock A, unlock B) pattern. The correct + * logic to preserve right entries in the table would be to walk the array of + * held locks and swap and clear out-of-order entries, but that's too + * complicated and we don't have a compelling use case for out of order unlocking. + */ +static __always_inline void release_held_lock_entry(void) +{ + struct rqspinlock_held *rqh = this_cpu_ptr(&rqspinlock_held_locks); + + if (unlikely(rqh->cnt > RES_NR_HELD)) + goto dec; + WRITE_ONCE(rqh->locks[rqh->cnt - 1], NULL); +dec: + /* + * Reordering of clearing above with inc and its write in + * grab_held_lock_entry that came before us (in same acquisition + * attempt) is ok, we either see a valid entry or NULL when it's + * visible. 
+ * + * But this helper is invoked when we unwind upon failing to acquire the + * lock. Unlike the unlock path which constitutes a release store after + * we clear the entry, we need to emit a write barrier here. Otherwise, + * we may have a situation as follows: + * + * for lock B + * release_held_lock_entry + * + * try_cmpxchg_acquire for lock A + * grab_held_lock_entry + * + * Lack of any ordering means reordering may occur such that dec, inc + * are done before entry is overwritten. This permits a remote lock + * holder of lock B (which this CPU failed to acquire) to now observe it + * as being attempted on this CPU, and may lead to misdetection (if this + * CPU holds a lock it is attempting to acquire, leading to false ABBA + * diagnosis). + * + * In case of unlock, we will always do a release on the lock word after + * releasing the entry, ensuring that other CPUs cannot hold the lock + * (and make conclusions about deadlocks) until the entry has been + * cleared on the local CPU, preventing any anomalies. Reordering is + * still possible there, but a remote CPU cannot observe a lock in our + * table which it is already holding, since visibility entails our + * release store for the said lock has not retired. + * + * In theory we don't have a problem if the dec and WRITE_ONCE above get + * reordered with each other, we either notice an empty NULL entry on + * top (if dec succeeds WRITE_ONCE), or a potentially stale entry which + * cannot be observed (if dec precedes WRITE_ONCE). + * + * Emit the write barrier _before_ the dec, this permits dec-inc + * reordering but that is harmless as we'd have new entry set to NULL + * already, i.e. they cannot precede the NULL store above. + */ + smp_wmb(); + this_cpu_dec(rqspinlock_held_locks.cnt); +} + #endif /* __ASM_GENERIC_RQSPINLOCK_H */ diff --git a/kernel/locking/rqspinlock.c b/kernel/locking/rqspinlock.c index 16ec1b9eb005..ce2bc0a85a07 100644 --- a/kernel/locking/rqspinlock.c +++ b/kernel/locking/rqspinlock.c @@ -31,6 +31,7 @@ */ #include "qspinlock.h" #include "lock_events.h" +#include "rqspinlock.h" /* * The basic principle of a queue-based spinlock can best be understood @@ -74,16 +75,146 @@ struct rqspinlock_timeout { u64 timeout_end; u64 duration; + u64 cur; u16 spin; }; #define RES_TIMEOUT_VAL 2 -static noinline int check_timeout(struct rqspinlock_timeout *ts) +DEFINE_PER_CPU_ALIGNED(struct rqspinlock_held, rqspinlock_held_locks); + +static bool is_lock_released(rqspinlock_t *lock, u32 mask, struct rqspinlock_timeout *ts) +{ + if (!(atomic_read_acquire(&lock->val) & (mask))) + return true; + return false; +} + +static noinline int check_deadlock_AA(rqspinlock_t *lock, u32 mask, + struct rqspinlock_timeout *ts) +{ + struct rqspinlock_held *rqh = this_cpu_ptr(&rqspinlock_held_locks); + int cnt = min(RES_NR_HELD, rqh->cnt); + + /* + * Return an error if we hold the lock we are attempting to acquire. + * We'll iterate over max 32 locks; no need to do is_lock_released. + */ + for (int i = 0; i < cnt - 1; i++) { + if (rqh->locks[i] == lock) + return -EDEADLK; + } + return 0; +} + +/* + * This focuses on the most common case of ABBA deadlocks (or ABBA involving + * more locks, which reduce to ABBA). This is not exhaustive, and we rely on + * timeouts as the final line of defense. 
+ */ +static noinline int check_deadlock_ABBA(rqspinlock_t *lock, u32 mask, + struct rqspinlock_timeout *ts) +{ + struct rqspinlock_held *rqh = this_cpu_ptr(&rqspinlock_held_locks); + int rqh_cnt = min(RES_NR_HELD, rqh->cnt); + void *remote_lock; + int cpu; + + /* + * Find the CPU holding the lock that we want to acquire. If there is a + * deadlock scenario, we will read a stable set on the remote CPU and + * find the target. This would be a constant time operation instead of + * O(NR_CPUS) if we could determine the owning CPU from a lock value, but + * that requires increasing the size of the lock word. + */ + for_each_possible_cpu(cpu) { + struct rqspinlock_held *rqh_cpu = per_cpu_ptr(&rqspinlock_held_locks, cpu); + int real_cnt = READ_ONCE(rqh_cpu->cnt); + int cnt = min(RES_NR_HELD, real_cnt); + + /* + * Let's ensure to break out of this loop if the lock is available for + * us to potentially acquire. + */ + if (is_lock_released(lock, mask, ts)) + return 0; + + /* + * Skip ourselves, and CPUs whose count is less than 2, as they need at + * least one held lock and one acquisition attempt (reflected as top + * most entry) to participate in an ABBA deadlock. + * + * If cnt is more than RES_NR_HELD, it means the current lock being + * acquired won't appear in the table, and other locks in the table are + * already held, so we can't determine ABBA. + */ + if (cpu == smp_processor_id() || real_cnt < 2 || real_cnt > RES_NR_HELD) + continue; + + /* + * Obtain the entry at the top, this corresponds to the lock the + * remote CPU is attempting to acquire in a deadlock situation, + * and would be one of the locks we hold on the current CPU. + */ + remote_lock = READ_ONCE(rqh_cpu->locks[cnt - 1]); + /* + * If it is NULL, we've raced and cannot determine a deadlock + * conclusively, skip this CPU. + */ + if (!remote_lock) + continue; + /* + * Find if the lock we're attempting to acquire is held by this CPU. + * Don't consider the topmost entry, as that must be the latest lock + * being held or acquired. For a deadlock, the target CPU must also + * attempt to acquire a lock we hold, so for this search only 'cnt - 1' + * entries are important. + */ + for (int i = 0; i < cnt - 1; i++) { + if (READ_ONCE(rqh_cpu->locks[i]) != lock) + continue; + /* + * We found our lock as held on the remote CPU. Is the + * acquisition attempt on the remote CPU for a lock held + * by us? If so, we have a deadlock situation, and need + * to recover. + */ + for (int i = 0; i < rqh_cnt - 1; i++) { + if (rqh->locks[i] == remote_lock) + return -EDEADLK; + } + /* + * Inconclusive; retry again later. + */ + return 0; + } + } + return 0; +} + +static noinline int check_deadlock(rqspinlock_t *lock, u32 mask, + struct rqspinlock_timeout *ts) +{ + int ret; + + ret = check_deadlock_AA(lock, mask, ts); + if (ret) + return ret; + ret = check_deadlock_ABBA(lock, mask, ts); + if (ret) + return ret; + + return 0; +} + +static noinline int check_timeout(rqspinlock_t *lock, u32 mask, + struct rqspinlock_timeout *ts) { u64 time = ktime_get_mono_fast_ns(); + u64 prev = ts->cur; if (!ts->timeout_end) { + ts->cur = time; ts->timeout_end = time + ts->duration; return 0; } @@ -91,6 +222,15 @@ static noinline int check_timeout(struct rqspinlock_timeout *ts) if (time > ts->timeout_end) return -ETIMEDOUT; + /* + * A millisecond interval passed from last time? Trigger deadlock + * checks. 
+ */ + if (prev + NSEC_PER_MSEC < time) { + ts->cur = time; + return check_deadlock(lock, mask, ts); + } + return 0; } @@ -99,21 +239,22 @@ static noinline int check_timeout(struct rqspinlock_timeout *ts) * as the macro does internal amortization for us. */ #ifndef res_smp_cond_load_acquire -#define RES_CHECK_TIMEOUT(ts, ret) \ - ({ \ - if (!(ts).spin++) \ - (ret) = check_timeout(&(ts)); \ - (ret); \ +#define RES_CHECK_TIMEOUT(ts, ret, mask) \ + ({ \ + if (!(ts).spin++) \ + (ret) = check_timeout((lock), (mask), &(ts)); \ + (ret); \ }) #else -#define RES_CHECK_TIMEOUT(ts, ret, mask) \ +#define RES_CHECK_TIMEOUT(ts, ret, mask) \ ({ (ret) = check_timeout(&(ts)); }) #endif /* * Initialize the 'spin' member. + * Set spin member to 0 to trigger AA/ABBA checks immediately. */ -#define RES_INIT_TIMEOUT(ts) ({ (ts).spin = 1; }) +#define RES_INIT_TIMEOUT(ts) ({ (ts).spin = 0; }) /* * We only need to reset 'timeout_end', 'spin' will just wrap around as necessary. @@ -208,6 +349,11 @@ int __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val) goto queue; } + /* + * Grab an entry in the held locks array, to enable deadlock detection. + */ + grab_held_lock_entry(lock); + /* * We're pending, wait for the owner to go away. * @@ -221,7 +367,7 @@ int __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val) */ if (val & _Q_LOCKED_MASK) { RES_RESET_TIMEOUT(ts, RES_DEF_TIMEOUT); - res_smp_cond_load_acquire(&lock->locked, !VAL || RES_CHECK_TIMEOUT(ts, ret)); + res_smp_cond_load_acquire(&lock->locked, !VAL || RES_CHECK_TIMEOUT(ts, ret, _Q_LOCKED_MASK)); } if (ret) { @@ -236,7 +382,7 @@ int __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val) */ clear_pending(lock); lockevent_inc(rqspinlock_lock_timeout); - return ret; + goto err_release_entry; } /* @@ -254,6 +400,11 @@ int __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val) */ queue: lockevent_inc(lock_slowpath); + /* + * Grab deadlock detection entry for the queue path. 
+ */ + grab_held_lock_entry(lock); + node = this_cpu_ptr(&rqnodes[0].mcs); idx = node->count++; tail = encode_tail(smp_processor_id(), idx); @@ -273,9 +424,9 @@ int __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val) lockevent_inc(lock_no_node); RES_RESET_TIMEOUT(ts, RES_DEF_TIMEOUT); while (!queued_spin_trylock(lock)) { - if (RES_CHECK_TIMEOUT(ts, ret)) { + if (RES_CHECK_TIMEOUT(ts, ret, ~0u)) { lockevent_inc(rqspinlock_lock_timeout); - break; + goto err_release_node; } cpu_relax(); } @@ -371,7 +522,7 @@ int __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val) */ RES_RESET_TIMEOUT(ts, RES_DEF_TIMEOUT * 2); val = res_atomic_cond_read_acquire(&lock->val, !(VAL & _Q_LOCKED_PENDING_MASK) || - RES_CHECK_TIMEOUT(ts, ret)); + RES_CHECK_TIMEOUT(ts, ret, _Q_LOCKED_PENDING_MASK)); waitq_timeout: if (ret) { @@ -404,7 +555,7 @@ int __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val) WRITE_ONCE(next->locked, RES_TIMEOUT_VAL); } lockevent_inc(rqspinlock_lock_timeout); - goto release; + goto err_release_node; } /* @@ -451,5 +602,11 @@ int __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val) */ __this_cpu_dec(rqnodes[0].mcs.count); return ret; +err_release_node: + trace_contention_end(lock, ret); + __this_cpu_dec(rqnodes[0].mcs.count); +err_release_entry: + release_held_lock_entry(); + return ret; } EXPORT_SYMBOL(resilient_queued_spin_lock_slowpath); From patchwork Mon Mar 3 15:22:53 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kumar Kartikeya Dwivedi X-Patchwork-Id: 13999065 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 9C4DDC282CD for ; Mon, 3 Mar 2025 15:45:53 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=bombadil.20210309; h=Sender:List-Subscribe:List-Help :List-Post:List-Archive:List-Unsubscribe:List-Id:Content-Transfer-Encoding: MIME-Version:References:In-Reply-To:Message-ID:Date:Subject:Cc:To:From: Reply-To:Content-Type:Content-ID:Content-Description:Resent-Date:Resent-From: Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID:List-Owner; bh=Narwm03ehkHlcNbrRom2a3WOE/jJ7XHKhd4vuGjJYfQ=; b=qz8rnl7YZPnDqv4D6457Vmj4u7 VagvAtfYtRjSpfNmIGH/BEZ96xrximkrHTLJdVzTE447ALIo1bghxJcZRWh8878eM65R9ZNVrsjnA vUptpqCFjWhw0MeaTfGyQ/yd1LuqLOt/ein7z6+MKI7ZTpsqrJW1IseAyqOLh2OX8qEdG31fGV9Cs zqE6kCCTxSbqK9S3VlQ3jb/ExzvjGdFcSkP9NzrmxOoHYrE4AhbrsEaazldhTkP/VOOf2CQHioyYN FLI5gGJuqkoVdtuiL2ujhVzxBta6hha24I0JQ7ZiI1pcAD8S8NAGfzACHt8yW00AvjwCcvIHHazQY 9SEq2y2w==; Received: from localhost ([::1] helo=bombadil.infradead.org) by bombadil.infradead.org with esmtp (Exim 4.98 #2 (Red Hat Linux)) id 1tp7zB-00000001N8D-47na; Mon, 03 Mar 2025 15:45:41 +0000 Received: from mail-wm1-x344.google.com ([2a00:1450:4864:20::344]) by bombadil.infradead.org with esmtps (Exim 4.98 #2 (Red Hat Linux)) id 1tp7dg-00000001IHo-2pVf for linux-arm-kernel@lists.infradead.org; Mon, 03 Mar 2025 15:23:30 +0000 Received: by mail-wm1-x344.google.com with SMTP id 5b1f17b1804b1-4394036c0efso29343985e9.2 for ; Mon, 03 Mar 2025 07:23:28 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1741015407; 
x=1741620207; darn=lists.infradead.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=Narwm03ehkHlcNbrRom2a3WOE/jJ7XHKhd4vuGjJYfQ=; b=H0v+dV99KB/Sq1m1JGsVU8wY4uplI/zIUWck365nXWgrkB8HGoEKSIjYqKD8DWC2sD 9gMirr7+JJZVXWptVGkAW+uPn7EX2oNJ08NePpXT/+cDaogsN2D3wGbQF9VfgrEeYiSa q1DX9n6ZfMhQeGeSotQLgXnmxTJ4SPd7c6jSduL61ReWsT/YRRvV3kY6srHbBgTMkqyq 5ZP9O1weg05O0SgW7TUo341ldVNVzHFyXIsUMjq6xzcBgvlIzxl11HGwviUNv0ofpUi2 mPD0yXQ7qgrgqbo8MVgCAArC5camG395a2DBtdEYCjqeBYGFFGhvOM64X8qAw2/9MHcZ N8gA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1741015407; x=1741620207; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=Narwm03ehkHlcNbrRom2a3WOE/jJ7XHKhd4vuGjJYfQ=; b=TTOn89JuXsxNaOeCTDE31UVRbeBY8/CeWcCJ3SM95kHIofPMkCWOQjNgJMgiNi0oEf 6J5cL/qHWN1tmNZQ7ScUFwYdjLwSCvH5qP0HWBAQvLgL6yx84qT8jIec5ektxBFbjOlU tpjFLbXFhc9DN+MArpNQAEeBkQEohruqY1JXzF+lP/A+Jv3AI9UBzwWp55Q/ylN543cK 3gUa9KWU6I3voWklW4lMqEvtmIGYlHPuFbjxdymF68jgwNw8x+NEgNqedpaDGvUVLRHU P2/ULlsALTAchnbIP8xviwHwjPo7pk+gdoGfoAor0DhEdg269QJr+t0OScJLqKH/quWh 8nEQ== X-Forwarded-Encrypted: i=1; AJvYcCUTnvfHoBSSPHQn8aoHRYdei7xAN+ymITFaH+2AsdhhLeu3QJk6wONziZseeo5zebJsZsl7SIgxoUoz4btW3h/a@lists.infradead.org X-Gm-Message-State: AOJu0Yz3XjUYif8zPgCyvfvH7luWU+gtfRm1vvsmB1pNYb1ik79RGQTw myYy2T9NjUa5QGclv8JMickKNWWUQeQ9XGR+6QUd+BP1r9T1vEADqiEEABgy5ek= X-Gm-Gg: ASbGncvGYZVX1ZlQR0x3O0YPjKRYMD9vFHWpAjj3KzvpLUiCMEpW+WVBL1TANYrMTUr iahr/jszUkMLdXnTmoNdX9ByOzjBEhm4YF/RI8XHUGu/URAOwtWB0AVc98eZozfvi99Do7t/1/k cJ2jvCc6cOGGk9afOw+Vs4/2N2dLGPHaCDfjuc2u5Q0HZ3TZ0gpYCu6dLymozI7i8Ry+sOyCsFe OPmB2pciBV5o4ll/Fu6kmDZ26442+CX+GwZkzNGNIXPDh8vZsMsvG3Kovxqc+OyPbUh0fQIbaDC U2BZkCgcB52yvZeW2p+YhXz58pQ2HeXa+7Q= X-Google-Smtp-Source: AGHT+IHQPjApFZs3HhsmlujTPNCDLSejJ0ltemem59qnL0CWos33mq2nn38IumeJ2bsiNZQ67lldHQ== X-Received: by 2002:a05:6000:1f8d:b0:390:fb37:1bd with SMTP id ffacd0b85a97d-390fb370470mr6319805f8f.46.1741015407243; Mon, 03 Mar 2025 07:23:27 -0800 (PST) Received: from localhost ([2a03:2880:31ff:4e::]) by smtp.gmail.com with ESMTPSA id ffacd0b85a97d-390e485db82sm14509941f8f.88.2025.03.03.07.23.26 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 03 Mar 2025 07:23:26 -0800 (PST) From: Kumar Kartikeya Dwivedi To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org Cc: Linus Torvalds , Peter Zijlstra , Will Deacon , Waiman Long , Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , Martin KaFai Lau , Eduard Zingerman , "Paul E. 
McKenney" , Tejun Heo , Barret Rhoden , Josh Don , Dohyun Kim , linux-arm-kernel@lists.infradead.org, kkd@meta.com, kernel-team@meta.com Subject: [PATCH bpf-next v3 13/25] rqspinlock: Add a test-and-set fallback Date: Mon, 3 Mar 2025 07:22:53 -0800 Message-ID: <20250303152305.3195648-14-memxor@gmail.com> X-Mailer: git-send-email 2.43.5 In-Reply-To: <20250303152305.3195648-1-memxor@gmail.com> References: <20250303152305.3195648-1-memxor@gmail.com> MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=4064; h=from:subject; bh=o1MKDNUhE9EFH4RwMPASaZvJKuJrMCZ8uocCQm849Ik=; b=owEBbQKS/ZANAwAIAUzgyIZIvxHKAcsmYgBnxcWXyKg4wB+6V7Ple/tkprfUEWpR0Hl6dgveKapl mmrrb5WJAjMEAAEIAB0WIQRLvip+Buz51YI8YRFM4MiGSL8RygUCZ8XFlwAKCRBM4MiGSL8RyrORD/ 9X6zKnZ50odRHn/aTMB5eerXFYbZYCFEA0zhWN7wOz/mXTLzA+cu1Kx16cILZLnewteP+spWp6TfeG oO/uEEbmEQvn0We0uAupBYIt8VRyjDPQs99CWCu4QGxc3xxkMS5+mQTJ2ons155TFMD85j8Vt7WBeN Gp3Q7X7RM7uzhQ4y+EgVeTAa8/W6GwRYnV/14RG89yraqvQIzLJtA/BHAHhJwp3um17ldvOg30GGTG ljTKl/gZXoBY1wmcLDkkukm03I2d3VP267EdahhXrctgXkA3dPZCyVzJg/lHJF41jU28pRqEqdU1fZ NB5PW9KYd1XzxvhBhwPdgXR2qunX1i8LP/jJn9SDb6JtxKugtHFL9y+ODarv6pIP3Of+NZLd7dwdtO cLPFwRPp5fOfnOfrSQBkPYD+CksR9o5ApDj8rMEimbPEIeKSfuHi4JgiExIi0TmcmH2AHxW+M+Jons MY9cSj6MOAtu7gTUDpDFMkHrkip9TL0zZhiW9CJHxYfOVZLWIFlgX00dUWHbzWPbhiRM6gc8tnmL4e gYYeCoy81wYg2HhX5/dCjB6beylFJO8aHhgT+X4oqKJx85F7LbFa6OameusA/gU2YMVg/nDYQKZMxs TIUKm5h4/2zgx4el81Kce6UG+hDPxefexA5tXAA1BD6DWZNiRUU8p7hniQGw== X-Developer-Key: i=memxor@gmail.com; a=openpgp; fpr=4BBE2A7E06ECF9D5823C61114CE0C88648BF11CA X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20250303_072328_728577_C028AFFD X-CRM114-Status: GOOD ( 18.17 ) X-BeenThere: linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: "linux-arm-kernel" Errors-To: linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org Include a test-and-set fallback when queued spinlock support is not available. Introduce a rqspinlock type to act as a fallback when qspinlock support is absent. Include ifdef guards to ensure the slow path in this file is only compiled when CONFIG_QUEUED_SPINLOCKS=y. Subsequent patches will add further logic to ensure fallback to the test-and-set implementation when queued spinlock support is unavailable on an architecture. Unlike other waiting loops in rqspinlock code, the one for test-and-set has no theoretical upper bound under contention, therefore we need a longer timeout than usual. Bump it up to a second in this case. 
Signed-off-by: Kumar Kartikeya Dwivedi --- include/asm-generic/rqspinlock.h | 17 ++++++++++++ kernel/locking/rqspinlock.c | 45 ++++++++++++++++++++++++++++++-- 2 files changed, 60 insertions(+), 2 deletions(-) diff --git a/include/asm-generic/rqspinlock.h b/include/asm-generic/rqspinlock.h index b685f243cf96..b30a86abad7b 100644 --- a/include/asm-generic/rqspinlock.h +++ b/include/asm-generic/rqspinlock.h @@ -12,11 +12,28 @@ #include #include #include +#ifdef CONFIG_QUEUED_SPINLOCKS +#include +#endif + +struct rqspinlock { + union { + atomic_t val; + u32 locked; + }; +}; struct qspinlock; +#ifdef CONFIG_QUEUED_SPINLOCKS typedef struct qspinlock rqspinlock_t; +#else +typedef struct rqspinlock rqspinlock_t; +#endif +extern int resilient_tas_spin_lock(rqspinlock_t *lock); +#ifdef CONFIG_QUEUED_SPINLOCKS extern int resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val); +#endif /* * Default timeout for waiting loops is 0.25 seconds diff --git a/kernel/locking/rqspinlock.c b/kernel/locking/rqspinlock.c index ce2bc0a85a07..27ab4642f894 100644 --- a/kernel/locking/rqspinlock.c +++ b/kernel/locking/rqspinlock.c @@ -21,7 +21,9 @@ #include #include #include +#ifdef CONFIG_QUEUED_SPINLOCKS #include +#endif #include #include #include @@ -29,9 +31,12 @@ /* * Include queued spinlock definitions and statistics code */ +#ifdef CONFIG_QUEUED_SPINLOCKS #include "qspinlock.h" #include "lock_events.h" #include "rqspinlock.h" +#include "mcs_spinlock.h" +#endif /* * The basic principle of a queue-based spinlock can best be understood @@ -70,8 +75,6 @@ * */ -#include "mcs_spinlock.h" - struct rqspinlock_timeout { u64 timeout_end; u64 duration; @@ -262,6 +265,42 @@ static noinline int check_timeout(rqspinlock_t *lock, u32 mask, */ #define RES_RESET_TIMEOUT(ts, _duration) ({ (ts).timeout_end = 0; (ts).duration = _duration; }) +/* + * Provide a test-and-set fallback for cases when queued spin lock support is + * absent from the architecture. + */ +int __lockfunc resilient_tas_spin_lock(rqspinlock_t *lock) +{ + struct rqspinlock_timeout ts; + int val, ret = 0; + + RES_INIT_TIMEOUT(ts); + grab_held_lock_entry(lock); + + /* + * Since the waiting loop's time is dependent on the amount of + * contention, a short timeout unlike rqspinlock waiting loops + * isn't enough. Choose a second as the timeout value. + */ + RES_RESET_TIMEOUT(ts, NSEC_PER_SEC); +retry: + val = atomic_read(&lock->val); + + if (val || !atomic_try_cmpxchg(&lock->val, &val, 1)) { + if (RES_CHECK_TIMEOUT(ts, ret, ~0u)) + goto out; + cpu_relax(); + goto retry; + } + + return 0; +out: + release_held_lock_entry(); + return ret; +} + +#ifdef CONFIG_QUEUED_SPINLOCKS + /* * Per-CPU queue node structures; we can never have more than 4 nested * contexts: task, softirq, hardirq, nmi. 
@@ -610,3 +649,5 @@ int __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val) return ret; } EXPORT_SYMBOL(resilient_queued_spin_lock_slowpath); + +#endif /* CONFIG_QUEUED_SPINLOCKS */ From patchwork Mon Mar 3 15:22:54 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kumar Kartikeya Dwivedi X-Patchwork-Id: 13999077 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 69F52C282CD for ; Mon, 3 Mar 2025 15:50:38 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=bombadil.20210309; h=Sender:List-Subscribe:List-Help :List-Post:List-Archive:List-Unsubscribe:List-Id:Content-Transfer-Encoding: MIME-Version:References:In-Reply-To:Message-ID:Date:Subject:Cc:To:From: Reply-To:Content-Type:Content-ID:Content-Description:Resent-Date:Resent-From: Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID:List-Owner; bh=hbiXkX+sz+mKWhiZ2JD9Yjh7XLLMoN8o2kVSbNXCy0Y=; b=myM9v3CY38IRYLcPJTtxH4qKoQ WLXpL1S9zIxMC2ezkyOeXS5xUepaOpvkDNQbYVspV5izyIZzoQqO89LQQgcADBAY49zV/e5412ss7 GuwlTMIHXD8JlkFfhz6FQAeB1v/lUwdNwDGT2qjrRN0jXSQL7l00NCR58Z6tuZ49TJb/UVryTz2UO ZI9sXnASOhiJe6taw2nj3ykD/V5PqaA/hnQisVpEYk1gro9uZmKO+OHFYQoByf8AI3y02UqCtXWUg BXWYiGVX52Lf8P2N4RZrePu1pllxA19xrSPtY/t/iaIRYEF09BuYfuYuqE5Q2An39LDJc2xpF32nN Efhyx8ww==; Received: from localhost ([::1] helo=bombadil.infradead.org) by bombadil.infradead.org with esmtp (Exim 4.98 #2 (Red Hat Linux)) id 1tp83q-00000001O2B-0jbF; Mon, 03 Mar 2025 15:50:30 +0000 Received: from desiato.infradead.org ([2001:8b0:10b:1:d65d:64ff:fe57:4e05]) by bombadil.infradead.org with esmtps (Exim 4.98 #2 (Red Hat Linux)) id 1tp7dl-00000001IL2-3oGA for linux-arm-kernel@bombadil.infradead.org; Mon, 03 Mar 2025 15:23:34 +0000 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=desiato.20200630; h=Content-Transfer-Encoding:MIME-Version :References:In-Reply-To:Message-ID:Date:Subject:Cc:To:From:Sender:Reply-To: Content-Type:Content-ID:Content-Description; bh=hbiXkX+sz+mKWhiZ2JD9Yjh7XLLMoN8o2kVSbNXCy0Y=; b=OjJX0Eu8UOalTZ25uaJsoQqaqM 1lCAhxwIzxbHkGAENxszzVK5oMmNZ0qHQOetS4YOiGsGpd3VFPTe/HDW9jxxkqSUXK17U+/dIm0yZ idpZVfJ8XhI92tdhQFivR9nP03dcrylRAkEAkw70yoUi0LnhjH7i843TQ9N9GHJCSBHkCvngtF/B/ tary/G0sfoOi2+VrXgt/DxaTbz0MwsjW3CWFwam2wwBM/F5bdxow7TIMsp3KIXu5lU1R48xPpJmPQ jFmr273mADVVZCmtoYV24IkIuY12AfGTsYfPjIOtbIipMxiCK5zkjfzd9xUsecz01dQZmEGoVx2AU gm5ElKUA==; Received: from mail-wm1-x341.google.com ([2a00:1450:4864:20::341]) by desiato.infradead.org with esmtps (Exim 4.98 #2 (Red Hat Linux)) id 1tp7di-00000004Zzb-3VJj for linux-arm-kernel@lists.infradead.org; Mon, 03 Mar 2025 15:23:32 +0000 Received: by mail-wm1-x341.google.com with SMTP id 5b1f17b1804b1-43bbc8b7c65so13970075e9.0 for ; Mon, 03 Mar 2025 07:23:29 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1741015408; x=1741620208; darn=lists.infradead.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=hbiXkX+sz+mKWhiZ2JD9Yjh7XLLMoN8o2kVSbNXCy0Y=; b=jN5EfvZJBcO+blCOpiqFDWAOJ5tjQGL1HjR4Asj2/t9FHgcUMKlobVxTSXj/dCv3mh 
+NqBp2/7xkL7SKRhLstAw6iUkqtISMTg0qjJgfkzgKbrh0oeFZmxdGsLPvmeB2AOPsfq wqN1rZZQQZJML++X72doaB+k7BpvimSuOFiwbdUIu66S8Upt4+7sw21PWXvuLxEcIhn/ Yv/jGC9oNNs/mqbegr8AqyVWKOdjklg4HJyIANQBc4/JAcpMWbdYSteNKdjavrfKy4Sn HoJRZD0UM66BgSotiObEsyqEMMsU5acdtlYtdqOZwODbOjKx80E8uGVFRajf/Duudnya wO6Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1741015408; x=1741620208; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=hbiXkX+sz+mKWhiZ2JD9Yjh7XLLMoN8o2kVSbNXCy0Y=; b=fgXohOgIMO63X8N5vZGP3K1LzJ7wwse1BZqj6l/KnQoUJ2nawX8zQHZLqyHg5Eh+/5 8Uw/Qy44G6pceI1K4BBlrAQu0IVecqlHy0MfI17Bm9RJs+KlrgWRJBsyxQ4nG/HpIPKV GNgUXWCuGeaqo0V1lM1cEGQd/L/WlUcNrpNWV9Cnm2VUqnAnTJBkFK7XRZnzfEiiugYS 6D1HtS2SlQ+pD6B8502cfyKmo9Acf/O9gcTUlCd2GbIH9CUKfwCQ98OZj7o4Vz9xHNJY pPmD0ZTp7vljlsNktIC74qzcn0F9Qi+jFKoT9aOi6AGGi+31/moNxToEScyIGIMTxe5O UNtg== X-Forwarded-Encrypted: i=1; AJvYcCWdQq2UkGft7zTe7VtIF3lbCwpIOwKtjXArHbHfTYz8k0QTmqjH3KWPG79H7mZNJVkQ2Oyj2VgazQzloYNp7jmH@lists.infradead.org X-Gm-Message-State: AOJu0YxLVsTFS5BhqoHl6lLonUAtGw2oJvj+j7FYQVx6apTQsOsUwydx PXKxCKWlw5X57TcYWPipyzWDKf6+lNd/dDvsQ1Ttr1GRDe0W4OSb X-Gm-Gg: ASbGnct2IEcZp7m3RkhW0vXGXWpA6RaBmbWI/P6V3pfCzGkQDETFsfWrYCReGxgT9UI Rqu3/QDdx2zIU04OBBpVTH1npOXqRpZaZ8pyAnqPkkoJ0QBS/m/Vhpzc7I2BKa3ao8jnQhUUUWH RcXycrSA+kF+zsY9mjoms41BK3xiSOl4wj5gQ9Ced3hoSBvrwN3Jt58vj5/FaaBdn/erbqFQrTg wZ5ODIiF4+sWg6eTbi+YmL8uirbzsy0YGGu8oPGLqM4O6ipyF4geoofkuzifNgZO4cUG6w6i9LQ ULsEGdkRH6mjq5S7orMllWed6r16wH9hSJE= X-Google-Smtp-Source: AGHT+IGKz1oo1UHRACNo+L3weWdPcL06paS7sy9ejxox0qqfekJgP8kl/qKZCrzxii+jUg8vEiUMxA== X-Received: by 2002:a05:600c:4685:b0:439:a0a3:a15 with SMTP id 5b1f17b1804b1-43ba67045camr146732535e9.14.1741015408419; Mon, 03 Mar 2025 07:23:28 -0800 (PST) Received: from localhost ([2a03:2880:31ff:74::]) by smtp.gmail.com with ESMTPSA id ffacd0b85a97d-390e485d8e4sm14531679f8f.85.2025.03.03.07.23.27 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 03 Mar 2025 07:23:27 -0800 (PST) From: Kumar Kartikeya Dwivedi To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org Cc: Linus Torvalds , Peter Zijlstra , Will Deacon , Waiman Long , Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , Martin KaFai Lau , Eduard Zingerman , "Paul E. 
McKenney" , Tejun Heo , Barret Rhoden , Josh Don , Dohyun Kim , linux-arm-kernel@lists.infradead.org, kkd@meta.com, kernel-team@meta.com Subject: [PATCH bpf-next v3 14/25] rqspinlock: Add basic support for CONFIG_PARAVIRT Date: Mon, 3 Mar 2025 07:22:54 -0800 Message-ID: <20250303152305.3195648-15-memxor@gmail.com> X-Mailer: git-send-email 2.43.5 In-Reply-To: <20250303152305.3195648-1-memxor@gmail.com> References: <20250303152305.3195648-1-memxor@gmail.com> MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=3277; h=from:subject; bh=SCTiw9WUUkbOGd1OOljQ8eDMmea9HElqFJBZYwx5dAE=; b=owEBbQKS/ZANAwAIAUzgyIZIvxHKAcsmYgBnxcWXfTMwb5GMWPgrjqrFJA/0gyw9UdNiWN2qmLrL O0hijiGJAjMEAAEIAB0WIQRLvip+Buz51YI8YRFM4MiGSL8RygUCZ8XFlwAKCRBM4MiGSL8Ryu1NEA CTUuGHPTrmc3ouIq7xk/7vEotpGLUGgeucb8GQwgQo6rcM9OI5bBvMW9JZfYy/los+WuIlwT/5IdCo AYY6qGCzJtoH464Llu8bFT1g6BAyOhZoDkX30C4FLyMD7aeKOCsD4w4X1KQJ0ITOJjLgSiO02ZBcsk G+vSU3qCtMZHbeLI9Q3/47WprV1GDIVEhOPmoC4z80iJULvFyEEv1KPAsKo+TyvXw2oYlSbxG+1gBi 1wmztdcoMQy6Y25bbcqgDlSLphn/5jKPVmwA/PqJCSlHPCcKKDENr5iG+htXq3ifKoKap5E/XN3j1P jGGqMu1j7i2P7YfUmNzvcqd+995DkNkOql0XTtz5YXJU70HWgDDjw2yhkdCHu8HgF+xp8nM1Y5yZHx UhdzTefDYMPL5JVyOInrepgDvHYQkN52ytGBNLKZt6tBQmXSeSSKJxhmpHya3TBUlZ78CPzq9vCVut qCdY7gXM8RU5a13RYFgs3bRptzVLJiuo+wnUZRwM7Cob+PtqcIIrMV1tm6IdmQBmSY8EzRJM447VxP ERz7W85ZSGfHanbBjbzfofA1Q57J4Au8euTPc7e/mx0ZR2sk/gqk632LbaDQKoQcxFXXSO+DEafFhX F8BMJap/MTZa0kNF5S+kGk3gqFtzwIqXvRBWM8HCOLgpW8KCnOFSK/bHk0mw== X-Developer-Key: i=memxor@gmail.com; a=openpgp; fpr=4BBE2A7E06ECF9D5823C61114CE0C88648BF11CA X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20250303_152331_321650_1206E663 X-CRM114-Status: GOOD ( 16.95 ) X-BeenThere: linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: "linux-arm-kernel" Errors-To: linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org We ripped out PV and virtualization related bits from rqspinlock in an earlier commit, however, a fair lock performs poorly within a virtual machine when the lock holder is preempted. As such, retain the virt_spin_lock fallback to test and set lock, but with timeout and deadlock detection. We can do this by simply depending on the resilient_tas_spin_lock implementation from the previous patch. We don't integrate support for CONFIG_PARAVIRT_SPINLOCKS yet, as that requires more involved algorithmic changes and introduces more complexity. It can be done when the need arises in the future. 
Signed-off-by: Kumar Kartikeya Dwivedi --- arch/x86/include/asm/rqspinlock.h | 33 +++++++++++++++++++++++++++++++ include/asm-generic/rqspinlock.h | 14 +++++++++++++ kernel/locking/rqspinlock.c | 3 +++ 3 files changed, 50 insertions(+) create mode 100644 arch/x86/include/asm/rqspinlock.h diff --git a/arch/x86/include/asm/rqspinlock.h b/arch/x86/include/asm/rqspinlock.h new file mode 100644 index 000000000000..24a885449ee6 --- /dev/null +++ b/arch/x86/include/asm/rqspinlock.h @@ -0,0 +1,33 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _ASM_X86_RQSPINLOCK_H +#define _ASM_X86_RQSPINLOCK_H + +#include + +#ifdef CONFIG_PARAVIRT +DECLARE_STATIC_KEY_FALSE(virt_spin_lock_key); + +#define resilient_virt_spin_lock_enabled resilient_virt_spin_lock_enabled +static __always_inline bool resilient_virt_spin_lock_enabled(void) +{ + return static_branch_likely(&virt_spin_lock_key); +} + +#ifdef CONFIG_QUEUED_SPINLOCKS +typedef struct qspinlock rqspinlock_t; +#else +typedef struct rqspinlock rqspinlock_t; +#endif +extern int resilient_tas_spin_lock(rqspinlock_t *lock); + +#define resilient_virt_spin_lock resilient_virt_spin_lock +static inline int resilient_virt_spin_lock(rqspinlock_t *lock) +{ + return resilient_tas_spin_lock(lock); +} + +#endif /* CONFIG_PARAVIRT */ + +#include + +#endif /* _ASM_X86_RQSPINLOCK_H */ diff --git a/include/asm-generic/rqspinlock.h b/include/asm-generic/rqspinlock.h index b30a86abad7b..f8850f09d0d6 100644 --- a/include/asm-generic/rqspinlock.h +++ b/include/asm-generic/rqspinlock.h @@ -35,6 +35,20 @@ extern int resilient_tas_spin_lock(rqspinlock_t *lock); extern int resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val); #endif +#ifndef resilient_virt_spin_lock_enabled +static __always_inline bool resilient_virt_spin_lock_enabled(void) +{ + return false; +} +#endif + +#ifndef resilient_virt_spin_lock +static __always_inline int resilient_virt_spin_lock(rqspinlock_t *lock) +{ + return 0; +} +#endif + /* * Default timeout for waiting loops is 0.25 seconds */ diff --git a/kernel/locking/rqspinlock.c b/kernel/locking/rqspinlock.c index 27ab4642f894..b06256bb16f4 100644 --- a/kernel/locking/rqspinlock.c +++ b/kernel/locking/rqspinlock.c @@ -345,6 +345,9 @@ int __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val) BUILD_BUG_ON(CONFIG_NR_CPUS >= (1U << _Q_TAIL_CPU_BITS)); + if (resilient_virt_spin_lock_enabled()) + return resilient_virt_spin_lock(lock); + RES_INIT_TIMEOUT(ts); /* From patchwork Mon Mar 3 15:22:55 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kumar Kartikeya Dwivedi X-Patchwork-Id: 13999075 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id C3702C282CD for ; Mon, 3 Mar 2025 15:47:26 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=bombadil.20210309; h=Sender:List-Subscribe:List-Help :List-Post:List-Archive:List-Unsubscribe:List-Id:Content-Transfer-Encoding: MIME-Version:References:In-Reply-To:Message-ID:Date:Subject:Cc:To:From: Reply-To:Content-Type:Content-ID:Content-Description:Resent-Date:Resent-From: Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID:List-Owner; 
From: Kumar Kartikeya Dwivedi
To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Linus Torvalds, Peter Zijlstra, Will Deacon, Waiman Long, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, Martin KaFai Lau, Eduard Zingerman, "Paul E. McKenney", Tejun Heo, Barret Rhoden, Josh Don, Dohyun Kim, linux-arm-kernel@lists.infradead.org, kkd@meta.com, kernel-team@meta.com
Subject: [PATCH bpf-next v3 15/25] rqspinlock: Add helper to print a splat on timeout or deadlock
Date: Mon, 3 Mar 2025 07:22:55 -0800
Message-ID: <20250303152305.3195648-16-memxor@gmail.com>
In-Reply-To: <20250303152305.3195648-1-memxor@gmail.com>
References: <20250303152305.3195648-1-memxor@gmail.com>

Whenever a timeout or a deadlock occurs, we want to print a message to the dmesg console, including the CPU where the event occurred, the list of locks in the held locks table, and the stack trace of the caller, which allows determining where exactly in the slow path the waiter timed out or detected a deadlock. Splats are limited to at most one per CPU during machine uptime, and a lock is acquired to ensure that no interleaving occurs when a concurrent set of CPUs conflict, enter a deadlock situation, and start printing data. Later patches will use this helper to inspect the return value of the rqspinlock API and report a violation if necessary.
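A condensed sketch of the reporting discipline just described, with the rqspinlock-specific bookkeeping stripped out (the names mirror the patch, but the body is simplified for illustration): a per-CPU nesting counter rejects re-entrant calls, a per-CPU flag limits output to one splat per CPU per boot, and a raw arch spinlock keeps concurrent CPUs from interleaving their output.

static DEFINE_PER_CPU(int, report_nest_cnt);
static DEFINE_PER_CPU(bool, report_flag);
static arch_spinlock_t report_lock = __ARCH_SPIN_LOCK_UNLOCKED;

static void report_once_per_cpu(const char *msg)
{
	/* Reject re-entrant invocations, e.g. from an interrupting context. */
	if (this_cpu_inc_return(report_nest_cnt) != 1)
		goto out;
	/* Only one splat per CPU for the lifetime of the machine. */
	if (this_cpu_read(report_flag))
		goto out;
	this_cpu_write(report_flag, true);

	arch_spin_lock(&report_lock);	/* serialize output across CPUs */
	pr_err("CPU %d: %s\n", smp_processor_id(), msg);
	dump_stack();
	arch_spin_unlock(&report_lock);
out:
	this_cpu_dec(report_nest_cnt);
}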
Signed-off-by: Kumar Kartikeya Dwivedi --- kernel/locking/rqspinlock.c | 29 +++++++++++++++++++++++++++++ 1 file changed, 29 insertions(+) diff --git a/kernel/locking/rqspinlock.c b/kernel/locking/rqspinlock.c index b06256bb16f4..3b4fdb183588 100644 --- a/kernel/locking/rqspinlock.c +++ b/kernel/locking/rqspinlock.c @@ -195,6 +195,35 @@ static noinline int check_deadlock_ABBA(rqspinlock_t *lock, u32 mask, return 0; } +static DEFINE_PER_CPU(int, report_nest_cnt); +static DEFINE_PER_CPU(bool, report_flag); +static arch_spinlock_t report_lock; + +static void rqspinlock_report_violation(const char *s, void *lock) +{ + struct rqspinlock_held *rqh = this_cpu_ptr(&rqspinlock_held_locks); + + if (this_cpu_inc_return(report_nest_cnt) != 1) { + this_cpu_dec(report_nest_cnt); + return; + } + if (this_cpu_read(report_flag)) + goto end; + this_cpu_write(report_flag, true); + arch_spin_lock(&report_lock); + + pr_err("CPU %d: %s", smp_processor_id(), s); + pr_info("Held locks: %d\n", rqh->cnt + 1); + pr_info("Held lock[%2d] = 0x%px\n", 0, lock); + for (int i = 0; i < min(RES_NR_HELD, rqh->cnt); i++) + pr_info("Held lock[%2d] = 0x%px\n", i + 1, rqh->locks[i]); + dump_stack(); + + arch_spin_unlock(&report_lock); +end: + this_cpu_dec(report_nest_cnt); +} + static noinline int check_deadlock(rqspinlock_t *lock, u32 mask, struct rqspinlock_timeout *ts) { From patchwork Mon Mar 3 15:22:56 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kumar Kartikeya Dwivedi X-Patchwork-Id: 13999076 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id B6E54C282D1 for ; Mon, 3 Mar 2025 15:49:04 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=bombadil.20210309; h=Sender:List-Subscribe:List-Help :List-Post:List-Archive:List-Unsubscribe:List-Id:Content-Transfer-Encoding: MIME-Version:References:In-Reply-To:Message-ID:Date:Subject:Cc:To:From: Reply-To:Content-Type:Content-ID:Content-Description:Resent-Date:Resent-From: Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID:List-Owner; bh=eIFRymfR26T1V4pI+hLoHtvRY/1A7N6xjkuyA7wDGHE=; b=J8mET7eSWmQ+qLUPd85N3zfpF3 A4Wv4nQyEhJ85mC5pWO5SzjlfTAy9+Lk/GaquOQ1PlyGv8KxfJ8TSKwVI1QpPQtpYeXSe3bsrFgqn On3nRj7wMvwv0v6Ju/8YBsdz5fxH5JH+lqK9/viA984DpugN0oH46wwQPtissl5M3DhVzihZxSQa7 awoty3exkWI3DfVMXpK9zGpQ4rS5QGzLcWikfwXT9m9ClKhRcQwnbezSRWVTcoA5Z7YTYKTbfyoel IlwDMEs8J67nesNW3wCE0RaKM7twLR/MilwRGTR6eURuGy5YR+GLQZ6HocgjTGvysfE1WdUgDePWT ckKni++w==; Received: from localhost ([::1] helo=bombadil.infradead.org) by bombadil.infradead.org with esmtp (Exim 4.98 #2 (Red Hat Linux)) id 1tp82I-00000001Njm-1SGE; Mon, 03 Mar 2025 15:48:54 +0000 Received: from mail-wm1-x344.google.com ([2a00:1450:4864:20::344]) by bombadil.infradead.org with esmtps (Exim 4.98 #2 (Red Hat Linux)) id 1tp7dk-00000001IJw-0YNL for linux-arm-kernel@lists.infradead.org; Mon, 03 Mar 2025 15:23:33 +0000 Received: by mail-wm1-x344.google.com with SMTP id 5b1f17b1804b1-43bc38bb6baso6275965e9.3 for ; Mon, 03 Mar 2025 07:23:32 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1741015411; x=1741620211; darn=lists.infradead.org; 
McKenney" , Tejun Heo , Barret Rhoden , Josh Don , Dohyun Kim , linux-arm-kernel@lists.infradead.org, kkd@meta.com, kernel-team@meta.com Subject: [PATCH bpf-next v3 16/25] rqspinlock: Add macros for rqspinlock usage Date: Mon, 3 Mar 2025 07:22:56 -0800 Message-ID: <20250303152305.3195648-17-memxor@gmail.com> X-Mailer: git-send-email 2.43.5 In-Reply-To: <20250303152305.3195648-1-memxor@gmail.com> References: <20250303152305.3195648-1-memxor@gmail.com> MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=3988; h=from:subject; bh=1vN5D7C4tfoaawMuUjp7ZN+fQy3wYGzp7nrXDJHdVAo=; b=owEBbQKS/ZANAwAIAUzgyIZIvxHKAcsmYgBnxcWYD7W48Zy4MCJ4hC7Laq1GOwnd+65hIihP1XlI Lb5c/6OJAjMEAAEIAB0WIQRLvip+Buz51YI8YRFM4MiGSL8RygUCZ8XFmAAKCRBM4MiGSL8RyvneD/ 9CSceH24BxXpHlqBvfIHao0z3PNag/hh6hxWdzMJtXnIupGNQZXotn7KA2u9LloczdUDTDU6i5vyel pyxYLhk/EO2NcmqNvsUhMnV5AQmt3i0G4X/wKGhsKXL1avILr8mSIPCqUG6RFfbDQB7MpnGHT9wsVp cQasSdKg8ZNoIJzBhiV64wzVF1LDmN5meStwPmrIfdXHD02M3uHizakkrT2k6ueYJNn2a4tr/vVKoU iRwh0w3G9w3jgipmvotAi+BjklTirlbzwjKDUdCWzM60WlU06NkoPlkUgR2q9m+804yz5qTMaenuPO mDGBty0syn+pois9BdJv0UeaAwoksZN74YC6vuHDh/6BnSK4mOisuHLP/D7iby5FHVL4tFHia6FYXq 0+DkRs/a2OqWFTk+o+cN5A+NCwpDIUbZfEc88aigLqitRC4eJuZQEHywIriAiRmy9ImnyJodlammr7 s9AjvCAzf340XzmF6Gumn6IbMHZb4p5hLT33cHlh7X1AjJ8RCiWQU9k8Q+zrAYThzoK/jY88XCaIjI FPn+NFQHv4b/Llf94bia06KjI0S5GgJJFxeSqmSE2wqAdn52bTxowHvRwOUwjeeTpXHaxYnYDXqa7m FXdZOR4Qg1a7dlf/djjyVfSDMq679lEDbrl1aOKGxNSRnLgsYeiMckHq9q8Q== X-Developer-Key: i=memxor@gmail.com; a=openpgp; fpr=4BBE2A7E06ECF9D5823C61114CE0C88648BF11CA X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20250303_072332_169400_8687CC89 X-CRM114-Status: GOOD ( 20.17 ) X-BeenThere: linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: "linux-arm-kernel" Errors-To: linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org Introduce helper macros that wrap around the rqspinlock slow path and provide an interface analogous to the raw_spin_lock API. Note that in case of error conditions, preemption and IRQ disabling is automatically unrolled before returning the error back to the caller. Ensure that in absence of CONFIG_QUEUED_SPINLOCKS support, we fallback to the test-and-set implementation. Add some comments describing the subtle memory ordering logic during unlock, and why it's safe. 
Signed-off-by: Kumar Kartikeya Dwivedi --- include/asm-generic/rqspinlock.h | 82 ++++++++++++++++++++++++++++++++ 1 file changed, 82 insertions(+) diff --git a/include/asm-generic/rqspinlock.h b/include/asm-generic/rqspinlock.h index f8850f09d0d6..418b652e0249 100644 --- a/include/asm-generic/rqspinlock.h +++ b/include/asm-generic/rqspinlock.h @@ -153,4 +153,86 @@ static __always_inline void release_held_lock_entry(void) this_cpu_dec(rqspinlock_held_locks.cnt); } +#ifdef CONFIG_QUEUED_SPINLOCKS + +/** + * res_spin_lock - acquire a queued spinlock + * @lock: Pointer to queued spinlock structure + */ +static __always_inline int res_spin_lock(rqspinlock_t *lock) +{ + int val = 0; + + if (likely(atomic_try_cmpxchg_acquire(&lock->val, &val, _Q_LOCKED_VAL))) { + grab_held_lock_entry(lock); + return 0; + } + return resilient_queued_spin_lock_slowpath(lock, val); +} + +#else + +#define res_spin_lock(lock) resilient_tas_spin_lock(lock) + +#endif /* CONFIG_QUEUED_SPINLOCKS */ + +static __always_inline void res_spin_unlock(rqspinlock_t *lock) +{ + struct rqspinlock_held *rqh = this_cpu_ptr(&rqspinlock_held_locks); + + if (unlikely(rqh->cnt > RES_NR_HELD)) + goto unlock; + WRITE_ONCE(rqh->locks[rqh->cnt - 1], NULL); +unlock: + /* + * Release barrier, ensures correct ordering. See release_held_lock_entry + * for details. Perform release store instead of queued_spin_unlock, + * since we use this function for test-and-set fallback as well. When we + * have CONFIG_QUEUED_SPINLOCKS=n, we clear the full 4-byte lockword. + * + * Like release_held_lock_entry, we can do the release before the dec. + * We simply care about not seeing the 'lock' in our table from a remote + * CPU once the lock has been released, which doesn't rely on the dec. + * + * Unlike smp_wmb(), release is not a two way fence, hence it is + * possible for a inc to move up and reorder with our clearing of the + * entry. This isn't a problem however, as for a misdiagnosis of ABBA, + * the remote CPU needs to hold this lock, which won't be released until + * the store below is done, which would ensure the entry is overwritten + * to NULL, etc. 
+ */ + smp_store_release(&lock->locked, 0); + this_cpu_dec(rqspinlock_held_locks.cnt); +} + +#ifdef CONFIG_QUEUED_SPINLOCKS +#define raw_res_spin_lock_init(lock) ({ *(lock) = (rqspinlock_t)__ARCH_SPIN_LOCK_UNLOCKED; }) +#else +#define raw_res_spin_lock_init(lock) ({ *(lock) = (rqspinlock_t){0}; }) +#endif + +#define raw_res_spin_lock(lock) \ + ({ \ + int __ret; \ + preempt_disable(); \ + __ret = res_spin_lock(lock); \ + if (__ret) \ + preempt_enable(); \ + __ret; \ + }) + +#define raw_res_spin_unlock(lock) ({ res_spin_unlock(lock); preempt_enable(); }) + +#define raw_res_spin_lock_irqsave(lock, flags) \ + ({ \ + int __ret; \ + local_irq_save(flags); \ + __ret = raw_res_spin_lock(lock); \ + if (__ret) \ + local_irq_restore(flags); \ + __ret; \ + }) + +#define raw_res_spin_unlock_irqrestore(lock, flags) ({ raw_res_spin_unlock(lock); local_irq_restore(flags); }) + #endif /* __ASM_GENERIC_RQSPINLOCK_H */

From patchwork Mon Mar 3 15:22:57 2025
X-Patchwork-Submitter: Kumar Kartikeya Dwivedi
X-Patchwork-Id: 13999081
McKenney" , Tejun Heo , Barret Rhoden , Josh Don , Dohyun Kim , linux-arm-kernel@lists.infradead.org, kkd@meta.com, kernel-team@meta.com Subject: [PATCH bpf-next v3 17/25] rqspinlock: Add locktorture support Date: Mon, 3 Mar 2025 07:22:57 -0800 Message-ID: <20250303152305.3195648-18-memxor@gmail.com> X-Mailer: git-send-email 2.43.5 In-Reply-To: <20250303152305.3195648-1-memxor@gmail.com> References: <20250303152305.3195648-1-memxor@gmail.com> MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=3149; h=from:subject; bh=bthuxBncjGOs4mky1mKdWZk3nTI3RWDlXR+RsRusmm0=; b=owEBbQKS/ZANAwAIAUzgyIZIvxHKAcsmYgBnxcWYc+Sgz0ORRtjBX9jRAAhbYoGw0HwGVEnt1lkK 0s8wahWJAjMEAAEIAB0WIQRLvip+Buz51YI8YRFM4MiGSL8RygUCZ8XFmAAKCRBM4MiGSL8RylTSD/ 9TP8RtAKolLrilNYH10VbArS3ra9jOCKUgFIRYMn7uqajviRSi77CSgnaVlYnmA9EfkXqnPw6r53W8 inm8T7K1fkgOmvIbYxtsKTKVPyW6c0loQf/rxpgCr7uK34o5YRBK/iSgFyLS4uAE5QbB/Pf9jJ28lv Wtgk/Ey42frf7Uf5RQNdrzPGWDnh/n2149+2agrmt9UNjkHTYYsrn6TOvYBRgxX0kiN60RLQbmXTjC nHD31GAomXGCnNxzEDhY1G2vG7r5C67GzOWT3UMHmKgQdaRcF/Lsw8fkI6g612CABQio63//Nm0QSA Myxwcedn9UdnGHGvM0F3IlXZs2Ub9fwmfoi6C1gmJ3MJiii8TYzxMoVYTXMM5yktxCqQG11WnTNq0l b3CpjATOtu5zhfGJH4mG717bInDxZ1ZmPKWEEGT4Hz0ePktKsfs56WLmTMhKVkRnC8Fa8ztfYGJLdG bQg5bPK0unpb/3Z3pIjZUHVpqVYWKGFYCmKd8AYmN3D7RlJn7HRmZCf56bovKsEgHPk9ZlLamM213o m1cj0p0dLPhfrYItv1LDuMe8Xkd/8Dzp3oWLhMa097fZFVsUhzB6TnuCE56VdlV2ZEvObJHft6x3OR yX8L4HXgOOdKq1WkjxYTYQtu9ZbqUVfB5ojTHrx1qJR9u7pdx19dUvyvMZQg== X-Developer-Key: i=memxor@gmail.com; a=openpgp; fpr=4BBE2A7E06ECF9D5823C61114CE0C88648BF11CA X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20250303_072333_289485_5FEC1A11 X-CRM114-Status: GOOD ( 14.38 ) X-BeenThere: linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: "linux-arm-kernel" Errors-To: linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org Introduce locktorture support for rqspinlock using the newly added macros as the first in-kernel user and consumer. Guard the code with CONFIG_BPF_SYSCALL ifdef since rqspinlock is not available otherwise. 
Signed-off-by: Kumar Kartikeya Dwivedi --- kernel/locking/locktorture.c | 57 ++++++++++++++++++++++++++++++++++++ kernel/locking/rqspinlock.c | 1 + 2 files changed, 58 insertions(+) diff --git a/kernel/locking/locktorture.c b/kernel/locking/locktorture.c index cc33470f4de9..ce0362f0a871 100644 --- a/kernel/locking/locktorture.c +++ b/kernel/locking/locktorture.c @@ -362,6 +362,60 @@ static struct lock_torture_ops raw_spin_lock_irq_ops = { .name = "raw_spin_lock_irq" }; +#ifdef CONFIG_BPF_SYSCALL + +#include +static rqspinlock_t rqspinlock; + +static int torture_raw_res_spin_write_lock(int tid __maybe_unused) +{ + raw_res_spin_lock(&rqspinlock); + return 0; +} + +static void torture_raw_res_spin_write_unlock(int tid __maybe_unused) +{ + raw_res_spin_unlock(&rqspinlock); +} + +static struct lock_torture_ops raw_res_spin_lock_ops = { + .writelock = torture_raw_res_spin_write_lock, + .write_delay = torture_spin_lock_write_delay, + .task_boost = torture_rt_boost, + .writeunlock = torture_raw_res_spin_write_unlock, + .readlock = NULL, + .read_delay = NULL, + .readunlock = NULL, + .name = "raw_res_spin_lock" +}; + +static int torture_raw_res_spin_write_lock_irq(int tid __maybe_unused) +{ + unsigned long flags; + + raw_res_spin_lock_irqsave(&rqspinlock, flags); + cxt.cur_ops->flags = flags; + return 0; +} + +static void torture_raw_res_spin_write_unlock_irq(int tid __maybe_unused) +{ + raw_res_spin_unlock_irqrestore(&rqspinlock, cxt.cur_ops->flags); +} + +static struct lock_torture_ops raw_res_spin_lock_irq_ops = { + .writelock = torture_raw_res_spin_write_lock_irq, + .write_delay = torture_spin_lock_write_delay, + .task_boost = torture_rt_boost, + .writeunlock = torture_raw_res_spin_write_unlock_irq, + .readlock = NULL, + .read_delay = NULL, + .readunlock = NULL, + .name = "raw_res_spin_lock_irq" +}; + +#endif + static DEFINE_RWLOCK(torture_rwlock); static int torture_rwlock_write_lock(int tid __maybe_unused) @@ -1168,6 +1222,9 @@ static int __init lock_torture_init(void) &lock_busted_ops, &spin_lock_ops, &spin_lock_irq_ops, &raw_spin_lock_ops, &raw_spin_lock_irq_ops, +#ifdef CONFIG_BPF_SYSCALL + &raw_res_spin_lock_ops, &raw_res_spin_lock_irq_ops, +#endif &rw_lock_ops, &rw_lock_irq_ops, &mutex_lock_ops, &ww_mutex_lock_ops, diff --git a/kernel/locking/rqspinlock.c b/kernel/locking/rqspinlock.c index 3b4fdb183588..0031a1bfbd4e 100644 --- a/kernel/locking/rqspinlock.c +++ b/kernel/locking/rqspinlock.c @@ -85,6 +85,7 @@ struct rqspinlock_timeout { #define RES_TIMEOUT_VAL 2 DEFINE_PER_CPU_ALIGNED(struct rqspinlock_held, rqspinlock_held_locks); +EXPORT_SYMBOL_GPL(rqspinlock_held_locks); static bool is_lock_released(rqspinlock_t *lock, u32 mask, struct rqspinlock_timeout *ts) { From patchwork Mon Mar 3 15:22:58 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kumar Kartikeya Dwivedi X-Patchwork-Id: 13999083 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 36D4FC282CD for ; Mon, 3 Mar 2025 15:55:26 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=bombadil.20210309; h=Sender:List-Subscribe:List-Help :List-Post:List-Archive:List-Unsubscribe:List-Id:Content-Transfer-Encoding: 
From: Kumar Kartikeya Dwivedi
To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Linus Torvalds, Peter Zijlstra, Will Deacon, Waiman Long, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, Martin KaFai Lau, Eduard Zingerman, "Paul E. McKenney", Tejun Heo, Barret Rhoden, Josh Don, Dohyun Kim, linux-arm-kernel@lists.infradead.org, kkd@meta.com, kernel-team@meta.com
Subject: [PATCH bpf-next v3 18/25] rqspinlock: Add entry to Makefile, MAINTAINERS
Date: Mon, 3 Mar 2025 07:22:58 -0800
Message-ID: <20250303152305.3195648-19-memxor@gmail.com>
In-Reply-To: <20250303152305.3195648-1-memxor@gmail.com>
References: <20250303152305.3195648-1-memxor@gmail.com>

Ensure that rqspinlock is built when qspinlock support and the BPF subsystem are enabled.
Also, add the file under the BPF MAINTAINERS entry so that all patches changing code in the file end up Cc'ing bpf@vger and the maintainers/reviewers. Ensure that the rqspinlock code is only built when the BPF subsystem is compiled in. Depending on queued spinlock support, we may or may not end up building the queued spinlock slowpath, and instead fall back to the test-and-set implementation. Signed-off-by: Kumar Kartikeya Dwivedi --- MAINTAINERS | 3 +++ include/asm-generic/Kbuild | 1 + kernel/locking/Makefile | 1 + 3 files changed, 5 insertions(+) diff --git a/MAINTAINERS b/MAINTAINERS index 3864d473f52f..b0179ef867eb 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -4297,6 +4297,9 @@ F: include/uapi/linux/filter.h F: kernel/bpf/ F: kernel/trace/bpf_trace.c F: lib/buildid.c +F: arch/*/include/asm/rqspinlock.h +F: include/asm-generic/rqspinlock.h +F: kernel/locking/rqspinlock.c F: lib/test_bpf.c F: net/bpf/ F: net/core/filter.c diff --git a/include/asm-generic/Kbuild b/include/asm-generic/Kbuild index 1b43c3a77012..8675b7b4ad23 100644 --- a/include/asm-generic/Kbuild +++ b/include/asm-generic/Kbuild @@ -45,6 +45,7 @@ mandatory-y += pci.h mandatory-y += percpu.h mandatory-y += pgalloc.h mandatory-y += preempt.h +mandatory-y += rqspinlock.h mandatory-y += runtime-const.h mandatory-y += rwonce.h mandatory-y += sections.h diff --git a/kernel/locking/Makefile b/kernel/locking/Makefile index 0db4093d17b8..5645e9029bc0 100644 --- a/kernel/locking/Makefile +++ b/kernel/locking/Makefile @@ -24,6 +24,7 @@ obj-$(CONFIG_SMP) += spinlock.o obj-$(CONFIG_LOCK_SPIN_ON_OWNER) += osq_lock.o obj-$(CONFIG_PROVE_LOCKING) += spinlock.o obj-$(CONFIG_QUEUED_SPINLOCKS) += qspinlock.o +obj-$(CONFIG_BPF_SYSCALL) += rqspinlock.o obj-$(CONFIG_RT_MUTEXES) += rtmutex_api.o obj-$(CONFIG_PREEMPT_RT) += spinlock_rt.o ww_rt_mutex.o obj-$(CONFIG_DEBUG_SPINLOCK) += spinlock.o

From patchwork Mon Mar 3 15:22:59 2025
X-Patchwork-Submitter: Kumar Kartikeya Dwivedi
X-Patchwork-Id: 13999082
McKenney" , Tejun Heo , Barret Rhoden , Josh Don , Dohyun Kim , linux-arm-kernel@lists.infradead.org, kkd@meta.com, kernel-team@meta.com Subject: [PATCH bpf-next v3 19/25] bpf: Convert hashtab.c to rqspinlock Date: Mon, 3 Mar 2025 07:22:59 -0800 Message-ID: <20250303152305.3195648-20-memxor@gmail.com> X-Mailer: git-send-email 2.43.5 In-Reply-To: <20250303152305.3195648-1-memxor@gmail.com> References: <20250303152305.3195648-1-memxor@gmail.com> MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=11131; h=from:subject; bh=4ih0G/0286de3X2HOwCgeQ8ZZwbokSdXkcTQRkadVk8=; b=owEBbQKS/ZANAwAIAUzgyIZIvxHKAcsmYgBnxcWYUEPdLFj8sZgqLb5zn7VbigpDGWSBgEbqM3XA x/fwLJeJAjMEAAEIAB0WIQRLvip+Buz51YI8YRFM4MiGSL8RygUCZ8XFmAAKCRBM4MiGSL8Ryv8LD/ 97jXIjkq7Lj++KTQTcQoBHp5m3OgRssDYOiFDubICyCP6RJL9F3I1YPvZ3V7aYFcJJBEuTYglkhD3z vdEzq0g+0G/b/2vgbS8q/Q6tlI4W54PihyS+qwGrnVz3uJPq2Vh8nK0kcPwLJzfLzM9O0weSfEcFz3 WN5AmPXgB759SWiegOPd5FlfEdvcgoDzQHqKNy5Oc4QGYqMMxfoe1BajpDdGIKckYE1k/Q3jD4Ika7 0Ocsr3XMKJdKzWaZ0++mS0Xg1WPgVlB28K08SYP/VcJv8ymSdntzx3T2FCD5ezdoG+8lXnrnhFKkSq JPUZizhxMzGDID+hebcv5VroikizRXHUOZP+OdlHq3FVRtQcu9YmttvUI/AYkCqVIlchD1WcW7nPg3 wBl+QpZfzhAmz3RlwPTacFVnw92AeQLJ/wEgyufOlTdQkFBwITR04JWL46ibiZugmnV5nF/HRctNPR z8cV3U0H2O39bk3dQ7m7Gb17vyUwD+31Bu/9WsIU+oFd9vj3uWEWCMCk7SbIHD+lTSHwZ9j90+KCKm 2aohPllCBBFhVPuDHR3hlKDDoNV1hA1lVZZIbepftZGRQYKUeoQQdxjHTha7P81H3VVeD0xhU+HmJN uqimoBk7BWk8BQ4E3HYoNBJ1UHaIZO4UP5kjM4Il4c+PzynID4cM955BCL0A== X-Developer-Key: i=memxor@gmail.com; a=openpgp; fpr=4BBE2A7E06ECF9D5823C61114CE0C88648BF11CA X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20250303_072336_070576_9B41AED7 X-CRM114-Status: GOOD ( 21.91 ) X-BeenThere: linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: "linux-arm-kernel" Errors-To: linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org Convert hashtab.c from raw_spinlock to rqspinlock, and drop the hashed per-cpu counter crud from the code base which is no longer necessary. 
Closes: https://lore.kernel.org/bpf/675302fd.050a0220.2477f.0004.GAE@google.com Closes: https://lore.kernel.org/bpf/000000000000b3e63e061eed3f6b@google.com Signed-off-by: Kumar Kartikeya Dwivedi --- kernel/bpf/hashtab.c | 102 ++++++++++++++----------------------------- 1 file changed, 32 insertions(+), 70 deletions(-) diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c index c308300fc72f..93d45812bb6a 100644 --- a/kernel/bpf/hashtab.c +++ b/kernel/bpf/hashtab.c @@ -16,6 +16,7 @@ #include "bpf_lru_list.h" #include "map_in_map.h" #include +#include #define HTAB_CREATE_FLAG_MASK \ (BPF_F_NO_PREALLOC | BPF_F_NO_COMMON_LRU | BPF_F_NUMA_NODE | \ @@ -78,7 +79,7 @@ */ struct bucket { struct hlist_nulls_head head; - raw_spinlock_t raw_lock; + rqspinlock_t raw_lock; }; #define HASHTAB_MAP_LOCK_COUNT 8 @@ -104,8 +105,6 @@ struct bpf_htab { u32 n_buckets; /* number of hash buckets */ u32 elem_size; /* size of each element in bytes */ u32 hashrnd; - struct lock_class_key lockdep_key; - int __percpu *map_locked[HASHTAB_MAP_LOCK_COUNT]; }; /* each htab element is struct htab_elem + key + value */ @@ -140,45 +139,26 @@ static void htab_init_buckets(struct bpf_htab *htab) for (i = 0; i < htab->n_buckets; i++) { INIT_HLIST_NULLS_HEAD(&htab->buckets[i].head, i); - raw_spin_lock_init(&htab->buckets[i].raw_lock); - lockdep_set_class(&htab->buckets[i].raw_lock, - &htab->lockdep_key); + raw_res_spin_lock_init(&htab->buckets[i].raw_lock); cond_resched(); } } -static inline int htab_lock_bucket(const struct bpf_htab *htab, - struct bucket *b, u32 hash, - unsigned long *pflags) +static inline int htab_lock_bucket(struct bucket *b, unsigned long *pflags) { unsigned long flags; + int ret; - hash = hash & min_t(u32, HASHTAB_MAP_LOCK_MASK, htab->n_buckets - 1); - - preempt_disable(); - local_irq_save(flags); - if (unlikely(__this_cpu_inc_return(*(htab->map_locked[hash])) != 1)) { - __this_cpu_dec(*(htab->map_locked[hash])); - local_irq_restore(flags); - preempt_enable(); - return -EBUSY; - } - - raw_spin_lock(&b->raw_lock); + ret = raw_res_spin_lock_irqsave(&b->raw_lock, flags); + if (ret) + return ret; *pflags = flags; - return 0; } -static inline void htab_unlock_bucket(const struct bpf_htab *htab, - struct bucket *b, u32 hash, - unsigned long flags) +static inline void htab_unlock_bucket(struct bucket *b, unsigned long flags) { - hash = hash & min_t(u32, HASHTAB_MAP_LOCK_MASK, htab->n_buckets - 1); - raw_spin_unlock(&b->raw_lock); - __this_cpu_dec(*(htab->map_locked[hash])); - local_irq_restore(flags); - preempt_enable(); + raw_res_spin_unlock_irqrestore(&b->raw_lock, flags); } static bool htab_lru_map_delete_node(void *arg, struct bpf_lru_node *node); @@ -483,14 +463,12 @@ static struct bpf_map *htab_map_alloc(union bpf_attr *attr) bool percpu_lru = (attr->map_flags & BPF_F_NO_COMMON_LRU); bool prealloc = !(attr->map_flags & BPF_F_NO_PREALLOC); struct bpf_htab *htab; - int err, i; + int err; htab = bpf_map_area_alloc(sizeof(*htab), NUMA_NO_NODE); if (!htab) return ERR_PTR(-ENOMEM); - lockdep_register_key(&htab->lockdep_key); - bpf_map_init_from_attr(&htab->map, attr); if (percpu_lru) { @@ -536,15 +514,6 @@ static struct bpf_map *htab_map_alloc(union bpf_attr *attr) if (!htab->buckets) goto free_elem_count; - for (i = 0; i < HASHTAB_MAP_LOCK_COUNT; i++) { - htab->map_locked[i] = bpf_map_alloc_percpu(&htab->map, - sizeof(int), - sizeof(int), - GFP_USER); - if (!htab->map_locked[i]) - goto free_map_locked; - } - if (htab->map.map_flags & BPF_F_ZERO_SEED) htab->hashrnd = 0; else @@ -607,15 +576,12 @@ static struct 
bpf_map *htab_map_alloc(union bpf_attr *attr) free_map_locked: if (htab->use_percpu_counter) percpu_counter_destroy(&htab->pcount); - for (i = 0; i < HASHTAB_MAP_LOCK_COUNT; i++) - free_percpu(htab->map_locked[i]); bpf_map_area_free(htab->buckets); bpf_mem_alloc_destroy(&htab->pcpu_ma); bpf_mem_alloc_destroy(&htab->ma); free_elem_count: bpf_map_free_elem_count(&htab->map); free_htab: - lockdep_unregister_key(&htab->lockdep_key); bpf_map_area_free(htab); return ERR_PTR(err); } @@ -817,7 +783,7 @@ static bool htab_lru_map_delete_node(void *arg, struct bpf_lru_node *node) b = __select_bucket(htab, tgt_l->hash); head = &b->head; - ret = htab_lock_bucket(htab, b, tgt_l->hash, &flags); + ret = htab_lock_bucket(b, &flags); if (ret) return false; @@ -828,7 +794,7 @@ static bool htab_lru_map_delete_node(void *arg, struct bpf_lru_node *node) break; } - htab_unlock_bucket(htab, b, tgt_l->hash, flags); + htab_unlock_bucket(b, flags); if (l == tgt_l) check_and_free_fields(htab, l); @@ -1147,7 +1113,7 @@ static long htab_map_update_elem(struct bpf_map *map, void *key, void *value, */ } - ret = htab_lock_bucket(htab, b, hash, &flags); + ret = htab_lock_bucket(b, &flags); if (ret) return ret; @@ -1198,7 +1164,7 @@ static long htab_map_update_elem(struct bpf_map *map, void *key, void *value, check_and_free_fields(htab, l_old); } } - htab_unlock_bucket(htab, b, hash, flags); + htab_unlock_bucket(b, flags); if (l_old) { if (old_map_ptr) map->ops->map_fd_put_ptr(map, old_map_ptr, true); @@ -1207,7 +1173,7 @@ static long htab_map_update_elem(struct bpf_map *map, void *key, void *value, } return 0; err: - htab_unlock_bucket(htab, b, hash, flags); + htab_unlock_bucket(b, flags); return ret; } @@ -1254,7 +1220,7 @@ static long htab_lru_map_update_elem(struct bpf_map *map, void *key, void *value copy_map_value(&htab->map, l_new->key + round_up(map->key_size, 8), value); - ret = htab_lock_bucket(htab, b, hash, &flags); + ret = htab_lock_bucket(b, &flags); if (ret) goto err_lock_bucket; @@ -1275,7 +1241,7 @@ static long htab_lru_map_update_elem(struct bpf_map *map, void *key, void *value ret = 0; err: - htab_unlock_bucket(htab, b, hash, flags); + htab_unlock_bucket(b, flags); err_lock_bucket: if (ret) @@ -1312,7 +1278,7 @@ static long __htab_percpu_map_update_elem(struct bpf_map *map, void *key, b = __select_bucket(htab, hash); head = &b->head; - ret = htab_lock_bucket(htab, b, hash, &flags); + ret = htab_lock_bucket(b, &flags); if (ret) return ret; @@ -1337,7 +1303,7 @@ static long __htab_percpu_map_update_elem(struct bpf_map *map, void *key, } ret = 0; err: - htab_unlock_bucket(htab, b, hash, flags); + htab_unlock_bucket(b, flags); return ret; } @@ -1378,7 +1344,7 @@ static long __htab_lru_percpu_map_update_elem(struct bpf_map *map, void *key, return -ENOMEM; } - ret = htab_lock_bucket(htab, b, hash, &flags); + ret = htab_lock_bucket(b, &flags); if (ret) goto err_lock_bucket; @@ -1402,7 +1368,7 @@ static long __htab_lru_percpu_map_update_elem(struct bpf_map *map, void *key, } ret = 0; err: - htab_unlock_bucket(htab, b, hash, flags); + htab_unlock_bucket(b, flags); err_lock_bucket: if (l_new) { bpf_map_dec_elem_count(&htab->map); @@ -1444,7 +1410,7 @@ static long htab_map_delete_elem(struct bpf_map *map, void *key) b = __select_bucket(htab, hash); head = &b->head; - ret = htab_lock_bucket(htab, b, hash, &flags); + ret = htab_lock_bucket(b, &flags); if (ret) return ret; @@ -1454,7 +1420,7 @@ static long htab_map_delete_elem(struct bpf_map *map, void *key) else ret = -ENOENT; - htab_unlock_bucket(htab, b, hash, 
flags); + htab_unlock_bucket(b, flags); if (l) free_htab_elem(htab, l); @@ -1480,7 +1446,7 @@ static long htab_lru_map_delete_elem(struct bpf_map *map, void *key) b = __select_bucket(htab, hash); head = &b->head; - ret = htab_lock_bucket(htab, b, hash, &flags); + ret = htab_lock_bucket(b, &flags); if (ret) return ret; @@ -1491,7 +1457,7 @@ static long htab_lru_map_delete_elem(struct bpf_map *map, void *key) else ret = -ENOENT; - htab_unlock_bucket(htab, b, hash, flags); + htab_unlock_bucket(b, flags); if (l) htab_lru_push_free(htab, l); return ret; @@ -1558,7 +1524,6 @@ static void htab_map_free_timers_and_wq(struct bpf_map *map) static void htab_map_free(struct bpf_map *map) { struct bpf_htab *htab = container_of(map, struct bpf_htab, map); - int i; /* bpf_free_used_maps() or close(map_fd) will trigger this map_free callback. * bpf_free_used_maps() is called after bpf prog is no longer executing. @@ -1583,9 +1548,6 @@ static void htab_map_free(struct bpf_map *map) bpf_mem_alloc_destroy(&htab->ma); if (htab->use_percpu_counter) percpu_counter_destroy(&htab->pcount); - for (i = 0; i < HASHTAB_MAP_LOCK_COUNT; i++) - free_percpu(htab->map_locked[i]); - lockdep_unregister_key(&htab->lockdep_key); bpf_map_area_free(htab); } @@ -1628,7 +1590,7 @@ static int __htab_map_lookup_and_delete_elem(struct bpf_map *map, void *key, b = __select_bucket(htab, hash); head = &b->head; - ret = htab_lock_bucket(htab, b, hash, &bflags); + ret = htab_lock_bucket(b, &bflags); if (ret) return ret; @@ -1665,7 +1627,7 @@ static int __htab_map_lookup_and_delete_elem(struct bpf_map *map, void *key, hlist_nulls_del_rcu(&l->hash_node); out_unlock: - htab_unlock_bucket(htab, b, hash, bflags); + htab_unlock_bucket(b, bflags); if (l) { if (is_lru_map) @@ -1787,7 +1749,7 @@ __htab_map_lookup_and_delete_batch(struct bpf_map *map, head = &b->head; /* do not grab the lock unless need it (bucket_cnt > 0). */ if (locked) { - ret = htab_lock_bucket(htab, b, batch, &flags); + ret = htab_lock_bucket(b, &flags); if (ret) { rcu_read_unlock(); bpf_enable_instrumentation(); @@ -1810,7 +1772,7 @@ __htab_map_lookup_and_delete_batch(struct bpf_map *map, /* Note that since bucket_cnt > 0 here, it is implicit * that the locked was grabbed, so release it. */ - htab_unlock_bucket(htab, b, batch, flags); + htab_unlock_bucket(b, flags); rcu_read_unlock(); bpf_enable_instrumentation(); goto after_loop; @@ -1821,7 +1783,7 @@ __htab_map_lookup_and_delete_batch(struct bpf_map *map, /* Note that since bucket_cnt > 0 here, it is implicit * that the locked was grabbed, so release it. 
*/ - htab_unlock_bucket(htab, b, batch, flags); + htab_unlock_bucket(b, flags); rcu_read_unlock(); bpf_enable_instrumentation(); kvfree(keys); @@ -1884,7 +1846,7 @@ __htab_map_lookup_and_delete_batch(struct bpf_map *map, dst_val += value_size; } - htab_unlock_bucket(htab, b, batch, flags); + htab_unlock_bucket(b, flags); locked = false; while (node_to_free) { From patchwork Mon Mar 3 15:23:00 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kumar Kartikeya Dwivedi X-Patchwork-Id: 13999085 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 6902DC282CD for ; Mon, 3 Mar 2025 15:57:01 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=bombadil.20210309; h=Sender:List-Subscribe:List-Help :List-Post:List-Archive:List-Unsubscribe:List-Id:Content-Transfer-Encoding: MIME-Version:References:In-Reply-To:Message-ID:Date:Subject:Cc:To:From: Reply-To:Content-Type:Content-ID:Content-Description:Resent-Date:Resent-From: Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID:List-Owner; bh=e8dD4/Kp++x7YqbDF8EB2j5tk6yfejIzgpWScMmcz5s=; b=urGRr29LaWYQz0LuBPROjQm0wK mcfG5vmdyqS5LT2dIfy3teQNU0cxh4kQrBlrf62Lj9fR80iZUKJN51SW8E/Oh6Hwhj5+dDpDcPXfv Wp33NV9qS4fEHmYeBPrmQirA/R3N/9/SeQOTcC/56Sc/I3I/ckSVXJ4WrdTN+LWYz/BgEgY/WvdpI xOcXuYwTReDa6islL4ixmUlRXeGNMzU0GbJ/bsBwSs/0MySZsmRoZUsw0MopG6xWY9jtcP+v0bXT6 hCl2/LZjjtxEauer+3N3Mhy1VlxZjVXhZGH0LntyCMPapTvKmWVHHmRvpAD15O+5tWiVZKXAt0ixA hMbBghMw==; Received: from localhost ([::1] helo=bombadil.infradead.org) by bombadil.infradead.org with esmtp (Exim 4.98 #2 (Red Hat Linux)) id 1tp8A0-00000001P6Z-412t; Mon, 03 Mar 2025 15:56:52 +0000 Received: from mail-wm1-f66.google.com ([209.85.128.66]) by bombadil.infradead.org with esmtps (Exim 4.98 #2 (Red Hat Linux)) id 1tp7dp-00000001INn-1tI1 for linux-arm-kernel@lists.infradead.org; Mon, 03 Mar 2025 15:23:38 +0000 Received: by mail-wm1-f66.google.com with SMTP id 5b1f17b1804b1-4399a1eada3so42085095e9.2 for ; Mon, 03 Mar 2025 07:23:37 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1741015416; x=1741620216; darn=lists.infradead.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=e8dD4/Kp++x7YqbDF8EB2j5tk6yfejIzgpWScMmcz5s=; b=HZ5RTEOOHn62QyWCGtmiGuqMwz6tsBjIQ1uXE4Ud11O5xfcpethMNsbPq+++pN5Ywv st0eoyisG+U6rg7ajaMETOelpBBLU7Onvtmbh5o8AorvXuRPyHkoTDkhzPJbC2ulI7QW zIGfkycj74q1e2nLFmE/3y5EkCp5fL2dOStn5WHfToQKdwGXlhGRZybPbq/hWd7fa/s1 nN+itU8a8AvwUSBeUcaLJ17BuMZ1b4IRVCeR8rRgsTxLB+ZmrUmQZsRYHA2QP+HY0Vtx g1vvtkpRs7WT6HttWVcUHLAa/miCTVF20NB/Wkbsk18boPBJu5S0TVYpCMCl8Bv4xJaW ZJtA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1741015416; x=1741620216; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=e8dD4/Kp++x7YqbDF8EB2j5tk6yfejIzgpWScMmcz5s=; b=f9TE8lybQD4pvsMUuTHhPVAhJS/Z9rdeFx401t7P+pxjT24SyjCgYokjL/V0FPnfW4 PXLw1KGMQB63PAzfyf3fkyorNhxW3hZPr1NFrbqYOVN6vfkTmgMPfWta7diNM9krJwp1 
From: Kumar Kartikeya Dwivedi
To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Linus Torvalds , Peter Zijlstra , Will Deacon , Waiman Long , Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , Martin KaFai Lau , Eduard Zingerman , "Paul E. McKenney" , Tejun Heo , Barret Rhoden , Josh Don , Dohyun Kim , linux-arm-kernel@lists.infradead.org, kkd@meta.com, kernel-team@meta.com
Subject: [PATCH bpf-next v3 20/25] bpf: Convert percpu_freelist.c to rqspinlock
Date: Mon, 3 Mar 2025 07:23:00 -0800
Message-ID: <20250303152305.3195648-21-memxor@gmail.com>
In-Reply-To: <20250303152305.3195648-1-memxor@gmail.com>
References: <20250303152305.3195648-1-memxor@gmail.com>
Errors-To:
linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org Convert the percpu_freelist.c code to use rqspinlock, and remove the extralist fallback and trylock-based acquisitions to avoid deadlocks. Key thing to note is the retained while (true) loop to search through other CPUs when failing to push a node due to locking errors. This retains the behavior of the old code, where it would keep trying until it would be able to successfully push the node back into the freelist of a CPU. Technically, we should start iteration for this loop from raw_smp_processor_id() + 1, but to avoid hitting the edge of nr_cpus, we skip execution in the loop body instead. Closes: https://lore.kernel.org/bpf/CAPPBnEa1_pZ6W24+WwtcNFvTUHTHO7KUmzEbOcMqxp+m2o15qQ@mail.gmail.com Closes: https://lore.kernel.org/bpf/CAPPBnEYm+9zduStsZaDnq93q1jPLqO-PiKX9jy0MuL8LCXmCrQ@mail.gmail.com Signed-off-by: Kumar Kartikeya Dwivedi --- kernel/bpf/percpu_freelist.c | 113 ++++++++--------------------------- kernel/bpf/percpu_freelist.h | 4 +- 2 files changed, 27 insertions(+), 90 deletions(-) diff --git a/kernel/bpf/percpu_freelist.c b/kernel/bpf/percpu_freelist.c index 034cf87b54e9..632762b57299 100644 --- a/kernel/bpf/percpu_freelist.c +++ b/kernel/bpf/percpu_freelist.c @@ -14,11 +14,9 @@ int pcpu_freelist_init(struct pcpu_freelist *s) for_each_possible_cpu(cpu) { struct pcpu_freelist_head *head = per_cpu_ptr(s->freelist, cpu); - raw_spin_lock_init(&head->lock); + raw_res_spin_lock_init(&head->lock); head->first = NULL; } - raw_spin_lock_init(&s->extralist.lock); - s->extralist.first = NULL; return 0; } @@ -34,58 +32,39 @@ static inline void pcpu_freelist_push_node(struct pcpu_freelist_head *head, WRITE_ONCE(head->first, node); } -static inline void ___pcpu_freelist_push(struct pcpu_freelist_head *head, +static inline bool ___pcpu_freelist_push(struct pcpu_freelist_head *head, struct pcpu_freelist_node *node) { - raw_spin_lock(&head->lock); - pcpu_freelist_push_node(head, node); - raw_spin_unlock(&head->lock); -} - -static inline bool pcpu_freelist_try_push_extra(struct pcpu_freelist *s, - struct pcpu_freelist_node *node) -{ - if (!raw_spin_trylock(&s->extralist.lock)) + if (raw_res_spin_lock(&head->lock)) return false; - - pcpu_freelist_push_node(&s->extralist, node); - raw_spin_unlock(&s->extralist.lock); + pcpu_freelist_push_node(head, node); + raw_res_spin_unlock(&head->lock); return true; } -static inline void ___pcpu_freelist_push_nmi(struct pcpu_freelist *s, - struct pcpu_freelist_node *node) +void __pcpu_freelist_push(struct pcpu_freelist *s, + struct pcpu_freelist_node *node) { - int cpu, orig_cpu; + struct pcpu_freelist_head *head; + int cpu; - orig_cpu = raw_smp_processor_id(); - while (1) { - for_each_cpu_wrap(cpu, cpu_possible_mask, orig_cpu) { - struct pcpu_freelist_head *head; + if (___pcpu_freelist_push(this_cpu_ptr(s->freelist), node)) + return; + while (true) { + for_each_cpu_wrap(cpu, cpu_possible_mask, raw_smp_processor_id()) { + if (cpu == raw_smp_processor_id()) + continue; head = per_cpu_ptr(s->freelist, cpu); - if (raw_spin_trylock(&head->lock)) { - pcpu_freelist_push_node(head, node); - raw_spin_unlock(&head->lock); - return; - } - } - - /* cannot lock any per cpu lock, try extralist */ - if (pcpu_freelist_try_push_extra(s, node)) + if (raw_res_spin_lock(&head->lock)) + continue; + pcpu_freelist_push_node(head, node); + raw_res_spin_unlock(&head->lock); return; + } } } -void __pcpu_freelist_push(struct pcpu_freelist *s, - struct pcpu_freelist_node *node) -{ - if (in_nmi()) - 
___pcpu_freelist_push_nmi(s, node); - else - ___pcpu_freelist_push(this_cpu_ptr(s->freelist), node); -} - void pcpu_freelist_push(struct pcpu_freelist *s, struct pcpu_freelist_node *node) { @@ -120,71 +99,29 @@ void pcpu_freelist_populate(struct pcpu_freelist *s, void *buf, u32 elem_size, static struct pcpu_freelist_node *___pcpu_freelist_pop(struct pcpu_freelist *s) { + struct pcpu_freelist_node *node = NULL; struct pcpu_freelist_head *head; - struct pcpu_freelist_node *node; int cpu; for_each_cpu_wrap(cpu, cpu_possible_mask, raw_smp_processor_id()) { head = per_cpu_ptr(s->freelist, cpu); if (!READ_ONCE(head->first)) continue; - raw_spin_lock(&head->lock); + if (raw_res_spin_lock(&head->lock)) + continue; node = head->first; if (node) { WRITE_ONCE(head->first, node->next); - raw_spin_unlock(&head->lock); + raw_res_spin_unlock(&head->lock); return node; } - raw_spin_unlock(&head->lock); + raw_res_spin_unlock(&head->lock); } - - /* per cpu lists are all empty, try extralist */ - if (!READ_ONCE(s->extralist.first)) - return NULL; - raw_spin_lock(&s->extralist.lock); - node = s->extralist.first; - if (node) - WRITE_ONCE(s->extralist.first, node->next); - raw_spin_unlock(&s->extralist.lock); - return node; -} - -static struct pcpu_freelist_node * -___pcpu_freelist_pop_nmi(struct pcpu_freelist *s) -{ - struct pcpu_freelist_head *head; - struct pcpu_freelist_node *node; - int cpu; - - for_each_cpu_wrap(cpu, cpu_possible_mask, raw_smp_processor_id()) { - head = per_cpu_ptr(s->freelist, cpu); - if (!READ_ONCE(head->first)) - continue; - if (raw_spin_trylock(&head->lock)) { - node = head->first; - if (node) { - WRITE_ONCE(head->first, node->next); - raw_spin_unlock(&head->lock); - return node; - } - raw_spin_unlock(&head->lock); - } - } - - /* cannot pop from per cpu lists, try extralist */ - if (!READ_ONCE(s->extralist.first) || !raw_spin_trylock(&s->extralist.lock)) - return NULL; - node = s->extralist.first; - if (node) - WRITE_ONCE(s->extralist.first, node->next); - raw_spin_unlock(&s->extralist.lock); return node; } struct pcpu_freelist_node *__pcpu_freelist_pop(struct pcpu_freelist *s) { - if (in_nmi()) - return ___pcpu_freelist_pop_nmi(s); return ___pcpu_freelist_pop(s); } diff --git a/kernel/bpf/percpu_freelist.h b/kernel/bpf/percpu_freelist.h index 3c76553cfe57..914798b74967 100644 --- a/kernel/bpf/percpu_freelist.h +++ b/kernel/bpf/percpu_freelist.h @@ -5,15 +5,15 @@ #define __PERCPU_FREELIST_H__ #include #include +#include struct pcpu_freelist_head { struct pcpu_freelist_node *first; - raw_spinlock_t lock; + rqspinlock_t lock; }; struct pcpu_freelist { struct pcpu_freelist_head __percpu *freelist; - struct pcpu_freelist_head extralist; }; struct pcpu_freelist_node { From patchwork Mon Mar 3 15:23:01 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kumar Kartikeya Dwivedi X-Patchwork-Id: 13999086 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 785F1C282CD for ; Mon, 3 Mar 2025 15:58:37 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=bombadil.20210309; h=Sender:List-Subscribe:List-Help :List-Post:List-Archive:List-Unsubscribe:List-Id:Content-Transfer-Encoding: 
From: Kumar Kartikeya Dwivedi
To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Linus Torvalds , Peter Zijlstra , Will Deacon , Waiman Long , Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , Martin KaFai Lau , Eduard Zingerman , "Paul E. McKenney" , Tejun Heo , Barret Rhoden , Josh Don , Dohyun Kim , linux-arm-kernel@lists.infradead.org, kkd@meta.com, kernel-team@meta.com
Subject: [PATCH bpf-next v3 21/25] bpf: Convert lpm_trie.c to rqspinlock
Date: Mon, 3 Mar 2025 07:23:01 -0800
Message-ID: <20250303152305.3195648-22-memxor@gmail.com>
In-Reply-To: <20250303152305.3195648-1-memxor@gmail.com>
References: <20250303152305.3195648-1-memxor@gmail.com>

Convert all LPM trie usage of raw_spinlock to rqspinlock. Note that rcu_dereference_protected in trie_delete_elem is switched over to plain rcu_dereference: the RCU read lock should be held from the BPF program side or the eBPF syscall path, and trie->lock is acquired just before the dereference. It is not clear from the commit history why the protected variant was used, but the above reasoning makes sense, so switch over.
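The locking change follows the same error-propagating shape as the other rqspinlock conversions in this series. A minimal sketch of that shape is below; struct foo and foo_update() are hypothetical stand-ins (header includes elided), and only rqspinlock_t plus the raw_res_spin_lock_irqsave()/raw_res_spin_unlock_irqrestore() calls come from the API added earlier in the series:

struct foo {
	rqspinlock_t lock;
	int value;
};

/* Unlike raw_spin_lock_irqsave(), the resilient acquisition can fail
 * (e.g. -ETIMEDOUT), so the error must be propagated to the caller
 * instead of assuming the lock is always taken.
 */
static int foo_update(struct foo *f, int value)
{
	unsigned long flags;
	int ret;

	ret = raw_res_spin_lock_irqsave(&f->lock, flags);
	if (ret)
		return ret;

	f->value = value;	/* critical section */

	raw_res_spin_unlock_irqrestore(&f->lock, flags);
	return 0;
}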
Closes: https://lore.kernel.org/lkml/000000000000adb08b061413919e@google.com Signed-off-by: Kumar Kartikeya Dwivedi --- kernel/bpf/lpm_trie.c | 25 ++++++++++++++----------- 1 file changed, 14 insertions(+), 11 deletions(-) diff --git a/kernel/bpf/lpm_trie.c b/kernel/bpf/lpm_trie.c index e8a772e64324..be66d7e520e0 100644 --- a/kernel/bpf/lpm_trie.c +++ b/kernel/bpf/lpm_trie.c @@ -15,6 +15,7 @@ #include #include #include +#include #include /* Intermediate node */ @@ -36,7 +37,7 @@ struct lpm_trie { size_t n_entries; size_t max_prefixlen; size_t data_size; - raw_spinlock_t lock; + rqspinlock_t lock; }; /* This trie implements a longest prefix match algorithm that can be used to @@ -342,7 +343,9 @@ static long trie_update_elem(struct bpf_map *map, if (!new_node) return -ENOMEM; - raw_spin_lock_irqsave(&trie->lock, irq_flags); + ret = raw_res_spin_lock_irqsave(&trie->lock, irq_flags); + if (ret) + goto out_free; new_node->prefixlen = key->prefixlen; RCU_INIT_POINTER(new_node->child[0], NULL); @@ -356,8 +359,7 @@ static long trie_update_elem(struct bpf_map *map, */ slot = &trie->root; - while ((node = rcu_dereference_protected(*slot, - lockdep_is_held(&trie->lock)))) { + while ((node = rcu_dereference(*slot))) { matchlen = longest_prefix_match(trie, node, key); if (node->prefixlen != matchlen || @@ -442,8 +444,8 @@ static long trie_update_elem(struct bpf_map *map, rcu_assign_pointer(*slot, im_node); out: - raw_spin_unlock_irqrestore(&trie->lock, irq_flags); - + raw_res_spin_unlock_irqrestore(&trie->lock, irq_flags); +out_free: if (ret) bpf_mem_cache_free(&trie->ma, new_node); bpf_mem_cache_free_rcu(&trie->ma, free_node); @@ -467,7 +469,9 @@ static long trie_delete_elem(struct bpf_map *map, void *_key) if (key->prefixlen > trie->max_prefixlen) return -EINVAL; - raw_spin_lock_irqsave(&trie->lock, irq_flags); + ret = raw_res_spin_lock_irqsave(&trie->lock, irq_flags); + if (ret) + return ret; /* Walk the tree looking for an exact key/length match and keeping * track of the path we traverse. 
We will need to know the node @@ -478,8 +482,7 @@ static long trie_delete_elem(struct bpf_map *map, void *_key) trim = &trie->root; trim2 = trim; parent = NULL; - while ((node = rcu_dereference_protected( - *trim, lockdep_is_held(&trie->lock)))) { + while ((node = rcu_dereference(*trim))) { matchlen = longest_prefix_match(trie, node, key); if (node->prefixlen != matchlen || @@ -543,7 +546,7 @@ static long trie_delete_elem(struct bpf_map *map, void *_key) free_node = node; out: - raw_spin_unlock_irqrestore(&trie->lock, irq_flags); + raw_res_spin_unlock_irqrestore(&trie->lock, irq_flags); bpf_mem_cache_free_rcu(&trie->ma, free_parent); bpf_mem_cache_free_rcu(&trie->ma, free_node); @@ -592,7 +595,7 @@ static struct bpf_map *trie_alloc(union bpf_attr *attr) offsetof(struct bpf_lpm_trie_key_u8, data); trie->max_prefixlen = trie->data_size * 8; - raw_spin_lock_init(&trie->lock); + raw_res_spin_lock_init(&trie->lock); /* Allocate intermediate and leaf nodes from the same allocator */ leaf_size = sizeof(struct lpm_trie_node) + trie->data_size + From patchwork Mon Mar 3 15:23:02 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kumar Kartikeya Dwivedi X-Patchwork-Id: 13999087 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 7EAF1C282D2 for ; Mon, 3 Mar 2025 16:00:19 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=bombadil.20210309; h=Sender:List-Subscribe:List-Help :List-Post:List-Archive:List-Unsubscribe:List-Id:Content-Transfer-Encoding: MIME-Version:References:In-Reply-To:Message-ID:Date:Subject:Cc:To:From: Reply-To:Content-Type:Content-ID:Content-Description:Resent-Date:Resent-From: Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID:List-Owner; bh=ftGzO2s1hk5aoPF/5zj2qPXt6ZSl0PVETns25Hb5ycw=; b=wsTcTLTPkq5zEnbaBWpJfBprzE BuIySytu3LOi+jy1CMKNjwP5IYSwvbgGEUgoliV9lG+vN6SDK2nj+VBwjGmpyBbhlSYmjNeUHa8W4 VvNfSxJ+YHpy0uhcot1aq68BqQpLS6YLzfa2ZZB2obu+leBzNOaUgbkuMUkatNPCX/qcumqn2iOif 64F5Xa3Kfji5W1egUfgA7iwELUTLaGIw/j4puOJDzTPoPOlEVYSkLxJoroMntf8CbD3MmqjsNJ2NJ GHayZ+n5c1I4z3T2fTbYHCkblTomJB5qiTU3apl0lhp/E2dy+aYVBm6g3hnzN1cGd+Ud1jvvQJjmm qRA3qBGQ==; Received: from localhost ([::1] helo=bombadil.infradead.org) by bombadil.infradead.org with esmtp (Exim 4.98 #2 (Red Hat Linux)) id 1tp8D7-00000001PdG-3X5m; Mon, 03 Mar 2025 16:00:05 +0000 Received: from desiato.infradead.org ([2001:8b0:10b:1:d65d:64ff:fe57:4e05]) by bombadil.infradead.org with esmtps (Exim 4.98 #2 (Red Hat Linux)) id 1tp7dv-00000001IRG-20Kq for linux-arm-kernel@bombadil.infradead.org; Mon, 03 Mar 2025 15:23:43 +0000 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=desiato.20200630; h=Content-Transfer-Encoding:MIME-Version :References:In-Reply-To:Message-ID:Date:Subject:Cc:To:From:Sender:Reply-To: Content-Type:Content-ID:Content-Description; bh=ftGzO2s1hk5aoPF/5zj2qPXt6ZSl0PVETns25Hb5ycw=; b=Sknx1sw2QOs/SBYNhEaVOKsBLM Nypnl1AM550OE//nxxe0bCOl8rVelptcQrdLhJuK7lidOLKRC57E8mcQQQR3mImL0F52zPVo8SXpl NxGiE4QGRDB1zp7KGggsjHidZjzoHOiuKNkzZ9mkwy56Er8SOldhDsz16Hv7fRmyDoGFRZWCQSbPj 5/vEZXpj6XlZRle0yjoo3pDo2VLK9ieK4ODWHwHNPEg57uIyed0NNWUWYKd5bO5w603GrGFEzS33/ 
From: Kumar Kartikeya Dwivedi
To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Linus Torvalds , Peter Zijlstra , Will Deacon , Waiman Long , Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , Martin KaFai Lau , Eduard Zingerman , "Paul E. McKenney" , Tejun Heo , Barret Rhoden , Josh Don , Dohyun Kim , linux-arm-kernel@lists.infradead.org, kkd@meta.com, kernel-team@meta.com
Subject: [PATCH bpf-next v3 22/25] bpf: Introduce rqspinlock kfuncs
Date: Mon, 3 Mar 2025 07:23:02 -0800
Message-ID: <20250303152305.3195648-23-memxor@gmail.com>
In-Reply-To: <20250303152305.3195648-1-memxor@gmail.com>
References: <20250303152305.3195648-1-memxor@gmail.com>

Introduce four new kfuncs: bpf_res_spin_lock and bpf_res_spin_unlock, plus their irqsave/irqrestore variants, which wrap the rqspinlock APIs. bpf_res_spin_lock returns a conditional result depending on whether the lock was acquired (NULL is returned when lock acquisition succeeds, non-NULL upon failure). The memory pointed to by the returned pointer upon failure can be dereferenced after the NULL check to obtain the error code.

Instead of using the old bpf_spin_lock type, introduce a new type with the same layout and the same alignment, but a different name, to avoid type confusion.

Preemption is disabled upon successful lock acquisition; however, IRQs are not. Special kfuncs can be introduced later to allow disabling IRQs when taking a spin lock. Resilient locks are safe against AA deadlocks, hence not disabling IRQs currently does not allow violation of kernel safety.

The __irq_flag annotation is used to accept IRQ flags for the IRQ variants, with the same semantics as the existing bpf_local_irq_{save, restore}.

These kfuncs will require additional verifier-side support in subsequent commits, to allow programs to hold multiple locks at the same time.
Signed-off-by: Kumar Kartikeya Dwivedi --- include/asm-generic/rqspinlock.h | 7 +++ include/linux/bpf.h | 1 + kernel/locking/rqspinlock.c | 78 ++++++++++++++++++++++++++++++++ 3 files changed, 86 insertions(+) diff --git a/include/asm-generic/rqspinlock.h b/include/asm-generic/rqspinlock.h index 418b652e0249..06906489d9ba 100644 --- a/include/asm-generic/rqspinlock.h +++ b/include/asm-generic/rqspinlock.h @@ -23,6 +23,13 @@ struct rqspinlock { }; }; +/* Even though this is same as struct rqspinlock, we need to emit a distinct + * type in BTF for BPF programs. + */ +struct bpf_res_spin_lock { + u32 val; +}; + struct qspinlock; #ifdef CONFIG_QUEUED_SPINLOCKS typedef struct qspinlock rqspinlock_t; diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 4c4028d865ee..aa47e11371b3 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -30,6 +30,7 @@ #include #include #include +#include struct bpf_verifier_env; struct bpf_verifier_log; diff --git a/kernel/locking/rqspinlock.c b/kernel/locking/rqspinlock.c index 0031a1bfbd4e..0c53d36e2f6c 100644 --- a/kernel/locking/rqspinlock.c +++ b/kernel/locking/rqspinlock.c @@ -15,6 +15,8 @@ #include #include +#include +#include #include #include #include @@ -684,3 +686,79 @@ int __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val) EXPORT_SYMBOL(resilient_queued_spin_lock_slowpath); #endif /* CONFIG_QUEUED_SPINLOCKS */ + +__bpf_kfunc_start_defs(); + +#define REPORT_STR(ret) ({ ret == -ETIMEDOUT ? "Timeout detected" : "AA or ABBA deadlock detected"; }) + +__bpf_kfunc int bpf_res_spin_lock(struct bpf_res_spin_lock *lock) +{ + int ret; + + BUILD_BUG_ON(sizeof(rqspinlock_t) != sizeof(struct bpf_res_spin_lock)); + BUILD_BUG_ON(__alignof__(rqspinlock_t) != __alignof__(struct bpf_res_spin_lock)); + + preempt_disable(); + ret = res_spin_lock((rqspinlock_t *)lock); + if (unlikely(ret)) { + preempt_enable(); + rqspinlock_report_violation(REPORT_STR(ret), lock); + return ret; + } + return 0; +} + +__bpf_kfunc void bpf_res_spin_unlock(struct bpf_res_spin_lock *lock) +{ + res_spin_unlock((rqspinlock_t *)lock); + preempt_enable(); +} + +__bpf_kfunc int bpf_res_spin_lock_irqsave(struct bpf_res_spin_lock *lock, unsigned long *flags__irq_flag) +{ + u64 *ptr = (u64 *)flags__irq_flag; + unsigned long flags; + int ret; + + preempt_disable(); + local_irq_save(flags); + ret = res_spin_lock((rqspinlock_t *)lock); + if (unlikely(ret)) { + local_irq_restore(flags); + preempt_enable(); + rqspinlock_report_violation(REPORT_STR(ret), lock); + return ret; + } + *ptr = flags; + return 0; +} + +__bpf_kfunc void bpf_res_spin_unlock_irqrestore(struct bpf_res_spin_lock *lock, unsigned long *flags__irq_flag) +{ + u64 *ptr = (u64 *)flags__irq_flag; + unsigned long flags = *ptr; + + res_spin_unlock((rqspinlock_t *)lock); + local_irq_restore(flags); + preempt_enable(); +} + +__bpf_kfunc_end_defs(); + +BTF_KFUNCS_START(rqspinlock_kfunc_ids) +BTF_ID_FLAGS(func, bpf_res_spin_lock, KF_RET_NULL) +BTF_ID_FLAGS(func, bpf_res_spin_unlock) +BTF_ID_FLAGS(func, bpf_res_spin_lock_irqsave, KF_RET_NULL) +BTF_ID_FLAGS(func, bpf_res_spin_unlock_irqrestore) +BTF_KFUNCS_END(rqspinlock_kfunc_ids) + +static const struct btf_kfunc_id_set rqspinlock_kfunc_set = { + .owner = THIS_MODULE, + .set = &rqspinlock_kfunc_ids, +}; + +static __init int rqspinlock_register_kfuncs(void) +{ + return register_btf_kfunc_id_set(BPF_PROG_TYPE_UNSPEC, &rqspinlock_kfunc_set); +} +late_initcall(rqspinlock_register_kfuncs); From patchwork Mon Mar 3 15:23:03 2025 Content-Type: text/plain; charset="utf-8" 
From: Kumar Kartikeya Dwivedi
To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Eduard Zingerman , Linus Torvalds , Peter Zijlstra , Will Deacon , Waiman Long , Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , Martin KaFai Lau , "Paul E. McKenney" , Tejun Heo , Barret Rhoden , Josh Don , Dohyun Kim , linux-arm-kernel@lists.infradead.org, kkd@meta.com, kernel-team@meta.com
Subject: [PATCH bpf-next v3 23/25] bpf: Implement verifier support for rqspinlock
Date: Mon, 3 Mar 2025 07:23:03 -0800
Message-ID: <20250303152305.3195648-24-memxor@gmail.com>
In-Reply-To: <20250303152305.3195648-1-memxor@gmail.com>
References: <20250303152305.3195648-1-memxor@gmail.com>

Introduce verifier-side support for rqspinlock kfuncs. The first step is allowing the bpf_res_spin_lock type to be defined in map values and allocated objects, so the BTF side is updated with a new BPF_RES_SPIN_LOCK field to recognize and validate it. An object cannot have both bpf_spin_lock and bpf_res_spin_lock; only one of them (and at most one per object, as before) may be present.
The bpf_res_spin_lock can also be used to protect objects that require lock protection for their kfuncs, like BPF rbtree and linked list. The verifier plumbing to simulate success and failure cases when calling the kfuncs is done by pushing a new verifier state to the verifier state stack which will verify the failure case upon calling the kfunc. The path where success is indicated creates all lock reference state and IRQ state (if necessary for irqsave variants). In the case of failure, the state clears the registers r0-r5, sets the return value, and skips kfunc processing, proceeding to the next instruction. When marking the return value for success case, the value is marked as 0, and for the failure case as [-MAX_ERRNO, -1]. Then, in the program, whenever user checks the return value as 'if (ret)' or 'if (ret < 0)' the verifier never traverses such branches for success cases, and would be aware that the lock is not held in such cases. We push the kfunc state in check_kfunc_call whenever rqspinlock kfuncs are invoked. We introduce a kfunc_class state to avoid mixing lock irqrestore kfuncs with IRQ state created by bpf_local_irq_save. With all this infrastructure, these kfuncs become usable in programs while satisfying all safety properties required by the kernel. Acked-by: Eduard Zingerman Signed-off-by: Kumar Kartikeya Dwivedi --- include/linux/bpf.h | 9 ++ include/linux/bpf_verifier.h | 16 ++- kernel/bpf/btf.c | 26 ++++- kernel/bpf/syscall.c | 6 +- kernel/bpf/verifier.c | 219 ++++++++++++++++++++++++++++------- 5 files changed, 231 insertions(+), 45 deletions(-) diff --git a/include/linux/bpf.h b/include/linux/bpf.h index aa47e11371b3..ad4468422770 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -205,6 +205,7 @@ enum btf_field_type { BPF_REFCOUNT = (1 << 9), BPF_WORKQUEUE = (1 << 10), BPF_UPTR = (1 << 11), + BPF_RES_SPIN_LOCK = (1 << 12), }; typedef void (*btf_dtor_kfunc_t)(void *); @@ -240,6 +241,7 @@ struct btf_record { u32 cnt; u32 field_mask; int spin_lock_off; + int res_spin_lock_off; int timer_off; int wq_off; int refcount_off; @@ -315,6 +317,8 @@ static inline const char *btf_field_type_name(enum btf_field_type type) switch (type) { case BPF_SPIN_LOCK: return "bpf_spin_lock"; + case BPF_RES_SPIN_LOCK: + return "bpf_res_spin_lock"; case BPF_TIMER: return "bpf_timer"; case BPF_WORKQUEUE: @@ -347,6 +351,8 @@ static inline u32 btf_field_type_size(enum btf_field_type type) switch (type) { case BPF_SPIN_LOCK: return sizeof(struct bpf_spin_lock); + case BPF_RES_SPIN_LOCK: + return sizeof(struct bpf_res_spin_lock); case BPF_TIMER: return sizeof(struct bpf_timer); case BPF_WORKQUEUE: @@ -377,6 +383,8 @@ static inline u32 btf_field_type_align(enum btf_field_type type) switch (type) { case BPF_SPIN_LOCK: return __alignof__(struct bpf_spin_lock); + case BPF_RES_SPIN_LOCK: + return __alignof__(struct bpf_res_spin_lock); case BPF_TIMER: return __alignof__(struct bpf_timer); case BPF_WORKQUEUE: @@ -420,6 +428,7 @@ static inline void bpf_obj_init_field(const struct btf_field *field, void *addr) case BPF_RB_ROOT: /* RB_ROOT_CACHED 0-inits, no need to do anything after memset */ case BPF_SPIN_LOCK: + case BPF_RES_SPIN_LOCK: case BPF_TIMER: case BPF_WORKQUEUE: case BPF_KPTR_UNREF: diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h index d338f2a96bba..269449363f78 100644 --- a/include/linux/bpf_verifier.h +++ b/include/linux/bpf_verifier.h @@ -115,6 +115,14 @@ struct bpf_reg_state { int depth:30; } iter; + /* For irq stack slots */ + struct { + enum { + 
IRQ_NATIVE_KFUNC, + IRQ_LOCK_KFUNC, + } kfunc_class; + } irq; + /* Max size from any of the above. */ struct { unsigned long raw1; @@ -255,9 +263,11 @@ struct bpf_reference_state { * default to pointer reference on zero initialization of a state. */ enum ref_state_type { - REF_TYPE_PTR = 1, - REF_TYPE_IRQ = 2, - REF_TYPE_LOCK = 3, + REF_TYPE_PTR = (1 << 1), + REF_TYPE_IRQ = (1 << 2), + REF_TYPE_LOCK = (1 << 3), + REF_TYPE_RES_LOCK = (1 << 4), + REF_TYPE_RES_LOCK_IRQ = (1 << 5), } type; /* Track each reference created with a unique id, even if the same * instruction creates the reference multiple times (eg, via CALL). diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c index 519e3f5e9c10..f7a2bfb0c11a 100644 --- a/kernel/bpf/btf.c +++ b/kernel/bpf/btf.c @@ -3481,6 +3481,15 @@ static int btf_get_field_type(const struct btf *btf, const struct btf_type *var_ goto end; } } + if (field_mask & BPF_RES_SPIN_LOCK) { + if (!strcmp(name, "bpf_res_spin_lock")) { + if (*seen_mask & BPF_RES_SPIN_LOCK) + return -E2BIG; + *seen_mask |= BPF_RES_SPIN_LOCK; + type = BPF_RES_SPIN_LOCK; + goto end; + } + } if (field_mask & BPF_TIMER) { if (!strcmp(name, "bpf_timer")) { if (*seen_mask & BPF_TIMER) @@ -3659,6 +3668,7 @@ static int btf_find_field_one(const struct btf *btf, switch (field_type) { case BPF_SPIN_LOCK: + case BPF_RES_SPIN_LOCK: case BPF_TIMER: case BPF_WORKQUEUE: case BPF_LIST_NODE: @@ -3952,6 +3962,7 @@ struct btf_record *btf_parse_fields(const struct btf *btf, const struct btf_type return ERR_PTR(-ENOMEM); rec->spin_lock_off = -EINVAL; + rec->res_spin_lock_off = -EINVAL; rec->timer_off = -EINVAL; rec->wq_off = -EINVAL; rec->refcount_off = -EINVAL; @@ -3979,6 +3990,11 @@ struct btf_record *btf_parse_fields(const struct btf *btf, const struct btf_type /* Cache offset for faster lookup at runtime */ rec->spin_lock_off = rec->fields[i].offset; break; + case BPF_RES_SPIN_LOCK: + WARN_ON_ONCE(rec->spin_lock_off >= 0); + /* Cache offset for faster lookup at runtime */ + rec->res_spin_lock_off = rec->fields[i].offset; + break; case BPF_TIMER: WARN_ON_ONCE(rec->timer_off >= 0); /* Cache offset for faster lookup at runtime */ @@ -4022,9 +4038,15 @@ struct btf_record *btf_parse_fields(const struct btf *btf, const struct btf_type rec->cnt++; } + if (rec->spin_lock_off >= 0 && rec->res_spin_lock_off >= 0) { + ret = -EINVAL; + goto end; + } + /* bpf_{list_head, rb_node} require bpf_spin_lock */ if ((btf_record_has_field(rec, BPF_LIST_HEAD) || - btf_record_has_field(rec, BPF_RB_ROOT)) && rec->spin_lock_off < 0) { + btf_record_has_field(rec, BPF_RB_ROOT)) && + (rec->spin_lock_off < 0 && rec->res_spin_lock_off < 0)) { ret = -EINVAL; goto end; } @@ -5637,7 +5659,7 @@ btf_parse_struct_metas(struct bpf_verifier_log *log, struct btf *btf) type = &tab->types[tab->cnt]; type->btf_id = i; - record = btf_parse_fields(btf, t, BPF_SPIN_LOCK | BPF_LIST_HEAD | BPF_LIST_NODE | + record = btf_parse_fields(btf, t, BPF_SPIN_LOCK | BPF_RES_SPIN_LOCK | BPF_LIST_HEAD | BPF_LIST_NODE | BPF_RB_ROOT | BPF_RB_NODE | BPF_REFCOUNT | BPF_KPTR, t->size); /* The record cannot be unset, treat it as an error if so */ diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c index 57a438706215..5cf017e37d7d 100644 --- a/kernel/bpf/syscall.c +++ b/kernel/bpf/syscall.c @@ -665,6 +665,7 @@ void btf_record_free(struct btf_record *rec) case BPF_RB_ROOT: case BPF_RB_NODE: case BPF_SPIN_LOCK: + case BPF_RES_SPIN_LOCK: case BPF_TIMER: case BPF_REFCOUNT: case BPF_WORKQUEUE: @@ -717,6 +718,7 @@ struct btf_record *btf_record_dup(const struct btf_record *rec) 
case BPF_RB_ROOT: case BPF_RB_NODE: case BPF_SPIN_LOCK: + case BPF_RES_SPIN_LOCK: case BPF_TIMER: case BPF_REFCOUNT: case BPF_WORKQUEUE: @@ -794,6 +796,7 @@ void bpf_obj_free_fields(const struct btf_record *rec, void *obj) switch (fields[i].type) { case BPF_SPIN_LOCK: + case BPF_RES_SPIN_LOCK: break; case BPF_TIMER: bpf_timer_cancel_and_free(field_ptr); @@ -1229,7 +1232,7 @@ static int map_check_btf(struct bpf_map *map, struct bpf_token *token, return -EINVAL; map->record = btf_parse_fields(btf, value_type, - BPF_SPIN_LOCK | BPF_TIMER | BPF_KPTR | BPF_LIST_HEAD | + BPF_SPIN_LOCK | BPF_RES_SPIN_LOCK | BPF_TIMER | BPF_KPTR | BPF_LIST_HEAD | BPF_RB_ROOT | BPF_REFCOUNT | BPF_WORKQUEUE | BPF_UPTR, map->value_size); if (!IS_ERR_OR_NULL(map->record)) { @@ -1248,6 +1251,7 @@ static int map_check_btf(struct bpf_map *map, struct bpf_token *token, case 0: continue; case BPF_SPIN_LOCK: + case BPF_RES_SPIN_LOCK: if (map->map_type != BPF_MAP_TYPE_HASH && map->map_type != BPF_MAP_TYPE_ARRAY && map->map_type != BPF_MAP_TYPE_CGROUP_STORAGE && diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index eb1624f6e743..6c8ef72ee6bc 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -456,7 +456,7 @@ static bool subprog_is_exc_cb(struct bpf_verifier_env *env, int subprog) static bool reg_may_point_to_spin_lock(const struct bpf_reg_state *reg) { - return btf_record_has_field(reg_btf_record(reg), BPF_SPIN_LOCK); + return btf_record_has_field(reg_btf_record(reg), BPF_SPIN_LOCK | BPF_RES_SPIN_LOCK); } static bool type_is_rdonly_mem(u32 type) @@ -1148,7 +1148,8 @@ static int release_irq_state(struct bpf_verifier_state *state, int id); static int mark_stack_slot_irq_flag(struct bpf_verifier_env *env, struct bpf_kfunc_call_arg_meta *meta, - struct bpf_reg_state *reg, int insn_idx) + struct bpf_reg_state *reg, int insn_idx, + int kfunc_class) { struct bpf_func_state *state = func(env, reg); struct bpf_stack_state *slot; @@ -1170,6 +1171,7 @@ static int mark_stack_slot_irq_flag(struct bpf_verifier_env *env, st->type = PTR_TO_STACK; /* we don't have dedicated reg type */ st->live |= REG_LIVE_WRITTEN; st->ref_obj_id = id; + st->irq.kfunc_class = kfunc_class; for (i = 0; i < BPF_REG_SIZE; i++) slot->slot_type[i] = STACK_IRQ_FLAG; @@ -1178,7 +1180,8 @@ static int mark_stack_slot_irq_flag(struct bpf_verifier_env *env, return 0; } -static int unmark_stack_slot_irq_flag(struct bpf_verifier_env *env, struct bpf_reg_state *reg) +static int unmark_stack_slot_irq_flag(struct bpf_verifier_env *env, struct bpf_reg_state *reg, + int kfunc_class) { struct bpf_func_state *state = func(env, reg); struct bpf_stack_state *slot; @@ -1192,6 +1195,15 @@ static int unmark_stack_slot_irq_flag(struct bpf_verifier_env *env, struct bpf_r slot = &state->stack[spi]; st = &slot->spilled_ptr; + if (st->irq.kfunc_class != kfunc_class) { + const char *flag_kfunc = st->irq.kfunc_class == IRQ_NATIVE_KFUNC ? "native" : "lock"; + const char *used_kfunc = kfunc_class == IRQ_NATIVE_KFUNC ? 
"native" : "lock"; + + verbose(env, "irq flag acquired by %s kfuncs cannot be restored with %s kfuncs\n", + flag_kfunc, used_kfunc); + return -EINVAL; + } + err = release_irq_state(env->cur_state, st->ref_obj_id); WARN_ON_ONCE(err && err != -EACCES); if (err) { @@ -1602,7 +1614,7 @@ static struct bpf_reference_state *find_lock_state(struct bpf_verifier_state *st for (i = 0; i < state->acquired_refs; i++) { struct bpf_reference_state *s = &state->refs[i]; - if (s->type != type) + if (!(s->type & type)) continue; if (s->id == id && s->ptr == ptr) @@ -8063,6 +8075,12 @@ static int check_kfunc_mem_size_reg(struct bpf_verifier_env *env, struct bpf_reg return err; } +enum { + PROCESS_SPIN_LOCK = (1 << 0), + PROCESS_RES_LOCK = (1 << 1), + PROCESS_LOCK_IRQ = (1 << 2), +}; + /* Implementation details: * bpf_map_lookup returns PTR_TO_MAP_VALUE_OR_NULL. * bpf_obj_new returns PTR_TO_BTF_ID | MEM_ALLOC | PTR_MAYBE_NULL. @@ -8085,30 +8103,33 @@ static int check_kfunc_mem_size_reg(struct bpf_verifier_env *env, struct bpf_reg * env->cur_state->active_locks remembers which map value element or allocated * object got locked and clears it after bpf_spin_unlock. */ -static int process_spin_lock(struct bpf_verifier_env *env, int regno, - bool is_lock) +static int process_spin_lock(struct bpf_verifier_env *env, int regno, int flags) { + bool is_lock = flags & PROCESS_SPIN_LOCK, is_res_lock = flags & PROCESS_RES_LOCK; + const char *lock_str = is_res_lock ? "bpf_res_spin" : "bpf_spin"; struct bpf_reg_state *regs = cur_regs(env), *reg = ®s[regno]; struct bpf_verifier_state *cur = env->cur_state; bool is_const = tnum_is_const(reg->var_off); + bool is_irq = flags & PROCESS_LOCK_IRQ; u64 val = reg->var_off.value; struct bpf_map *map = NULL; struct btf *btf = NULL; struct btf_record *rec; + u32 spin_lock_off; int err; if (!is_const) { verbose(env, - "R%d doesn't have constant offset. bpf_spin_lock has to be at the constant offset\n", - regno); + "R%d doesn't have constant offset. %s_lock has to be at the constant offset\n", + regno, lock_str); return -EINVAL; } if (reg->type == PTR_TO_MAP_VALUE) { map = reg->map_ptr; if (!map->btf) { verbose(env, - "map '%s' has to have BTF in order to use bpf_spin_lock\n", - map->name); + "map '%s' has to have BTF in order to use %s_lock\n", + map->name, lock_str); return -EINVAL; } } else { @@ -8116,36 +8137,53 @@ static int process_spin_lock(struct bpf_verifier_env *env, int regno, } rec = reg_btf_record(reg); - if (!btf_record_has_field(rec, BPF_SPIN_LOCK)) { - verbose(env, "%s '%s' has no valid bpf_spin_lock\n", map ? "map" : "local", - map ? map->name : "kptr"); + if (!btf_record_has_field(rec, is_res_lock ? BPF_RES_SPIN_LOCK : BPF_SPIN_LOCK)) { + verbose(env, "%s '%s' has no valid %s_lock\n", map ? "map" : "local", + map ? map->name : "kptr", lock_str); return -EINVAL; } - if (rec->spin_lock_off != val + reg->off) { - verbose(env, "off %lld doesn't point to 'struct bpf_spin_lock' that is at %d\n", - val + reg->off, rec->spin_lock_off); + spin_lock_off = is_res_lock ? 
rec->res_spin_lock_off : rec->spin_lock_off; + if (spin_lock_off != val + reg->off) { + verbose(env, "off %lld doesn't point to 'struct %s_lock' that is at %d\n", + val + reg->off, lock_str, spin_lock_off); return -EINVAL; } if (is_lock) { void *ptr; + int type; if (map) ptr = map; else ptr = btf; - if (cur->active_locks) { - verbose(env, - "Locking two bpf_spin_locks are not allowed\n"); - return -EINVAL; + if (!is_res_lock && cur->active_locks) { + if (find_lock_state(env->cur_state, REF_TYPE_LOCK, 0, NULL)) { + verbose(env, + "Locking two bpf_spin_locks are not allowed\n"); + return -EINVAL; + } + } else if (is_res_lock && cur->active_locks) { + if (find_lock_state(env->cur_state, REF_TYPE_RES_LOCK | REF_TYPE_RES_LOCK_IRQ, reg->id, ptr)) { + verbose(env, "Acquiring the same lock again, AA deadlock detected\n"); + return -EINVAL; + } } - err = acquire_lock_state(env, env->insn_idx, REF_TYPE_LOCK, reg->id, ptr); + + if (is_res_lock && is_irq) + type = REF_TYPE_RES_LOCK_IRQ; + else if (is_res_lock) + type = REF_TYPE_RES_LOCK; + else + type = REF_TYPE_LOCK; + err = acquire_lock_state(env, env->insn_idx, type, reg->id, ptr); if (err < 0) { verbose(env, "Failed to acquire lock state\n"); return err; } } else { void *ptr; + int type; if (map) ptr = map; @@ -8153,12 +8191,18 @@ static int process_spin_lock(struct bpf_verifier_env *env, int regno, ptr = btf; if (!cur->active_locks) { - verbose(env, "bpf_spin_unlock without taking a lock\n"); + verbose(env, "%s_unlock without taking a lock\n", lock_str); return -EINVAL; } - if (release_lock_state(env->cur_state, REF_TYPE_LOCK, reg->id, ptr)) { - verbose(env, "bpf_spin_unlock of different lock\n"); + if (is_res_lock && is_irq) + type = REF_TYPE_RES_LOCK_IRQ; + else if (is_res_lock) + type = REF_TYPE_RES_LOCK; + else + type = REF_TYPE_LOCK; + if (release_lock_state(cur, type, reg->id, ptr)) { + verbose(env, "%s_unlock of different lock\n", lock_str); return -EINVAL; } @@ -9484,11 +9528,11 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 arg, return -EACCES; } if (meta->func_id == BPF_FUNC_spin_lock) { - err = process_spin_lock(env, regno, true); + err = process_spin_lock(env, regno, PROCESS_SPIN_LOCK); if (err) return err; } else if (meta->func_id == BPF_FUNC_spin_unlock) { - err = process_spin_lock(env, regno, false); + err = process_spin_lock(env, regno, 0); if (err) return err; } else { @@ -11370,7 +11414,7 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn regs[BPF_REG_0].map_uid = meta.map_uid; regs[BPF_REG_0].type = PTR_TO_MAP_VALUE | ret_flag; if (!type_may_be_null(ret_flag) && - btf_record_has_field(meta.map_ptr->record, BPF_SPIN_LOCK)) { + btf_record_has_field(meta.map_ptr->record, BPF_SPIN_LOCK | BPF_RES_SPIN_LOCK)) { regs[BPF_REG_0].id = ++env->id_gen; } break; @@ -11542,10 +11586,10 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn /* mark_btf_func_reg_size() is used when the reg size is determined by * the BTF func_proto's return value size and argument. 
*/ -static void mark_btf_func_reg_size(struct bpf_verifier_env *env, u32 regno, - size_t reg_size) +static void __mark_btf_func_reg_size(struct bpf_verifier_env *env, struct bpf_reg_state *regs, + u32 regno, size_t reg_size) { - struct bpf_reg_state *reg = &cur_regs(env)[regno]; + struct bpf_reg_state *reg = ®s[regno]; if (regno == BPF_REG_0) { /* Function return value */ @@ -11563,6 +11607,12 @@ static void mark_btf_func_reg_size(struct bpf_verifier_env *env, u32 regno, } } +static void mark_btf_func_reg_size(struct bpf_verifier_env *env, u32 regno, + size_t reg_size) +{ + return __mark_btf_func_reg_size(env, cur_regs(env), regno, reg_size); +} + static bool is_kfunc_acquire(struct bpf_kfunc_call_arg_meta *meta) { return meta->kfunc_flags & KF_ACQUIRE; @@ -11700,6 +11750,7 @@ enum { KF_ARG_RB_ROOT_ID, KF_ARG_RB_NODE_ID, KF_ARG_WORKQUEUE_ID, + KF_ARG_RES_SPIN_LOCK_ID, }; BTF_ID_LIST(kf_arg_btf_ids) @@ -11709,6 +11760,7 @@ BTF_ID(struct, bpf_list_node) BTF_ID(struct, bpf_rb_root) BTF_ID(struct, bpf_rb_node) BTF_ID(struct, bpf_wq) +BTF_ID(struct, bpf_res_spin_lock) static bool __is_kfunc_ptr_arg_type(const struct btf *btf, const struct btf_param *arg, int type) @@ -11757,6 +11809,11 @@ static bool is_kfunc_arg_wq(const struct btf *btf, const struct btf_param *arg) return __is_kfunc_ptr_arg_type(btf, arg, KF_ARG_WORKQUEUE_ID); } +static bool is_kfunc_arg_res_spin_lock(const struct btf *btf, const struct btf_param *arg) +{ + return __is_kfunc_ptr_arg_type(btf, arg, KF_ARG_RES_SPIN_LOCK_ID); +} + static bool is_kfunc_arg_callback(struct bpf_verifier_env *env, const struct btf *btf, const struct btf_param *arg) { @@ -11828,6 +11885,7 @@ enum kfunc_ptr_arg_type { KF_ARG_PTR_TO_MAP, KF_ARG_PTR_TO_WORKQUEUE, KF_ARG_PTR_TO_IRQ_FLAG, + KF_ARG_PTR_TO_RES_SPIN_LOCK, }; enum special_kfunc_type { @@ -11866,6 +11924,10 @@ enum special_kfunc_type { KF_bpf_iter_num_destroy, KF_bpf_set_dentry_xattr, KF_bpf_remove_dentry_xattr, + KF_bpf_res_spin_lock, + KF_bpf_res_spin_unlock, + KF_bpf_res_spin_lock_irqsave, + KF_bpf_res_spin_unlock_irqrestore, }; BTF_SET_START(special_kfunc_set) @@ -11955,6 +12017,10 @@ BTF_ID(func, bpf_remove_dentry_xattr) BTF_ID_UNUSED BTF_ID_UNUSED #endif +BTF_ID(func, bpf_res_spin_lock) +BTF_ID(func, bpf_res_spin_unlock) +BTF_ID(func, bpf_res_spin_lock_irqsave) +BTF_ID(func, bpf_res_spin_unlock_irqrestore) static bool is_kfunc_ret_null(struct bpf_kfunc_call_arg_meta *meta) { @@ -12048,6 +12114,9 @@ get_kfunc_ptr_arg_type(struct bpf_verifier_env *env, if (is_kfunc_arg_irq_flag(meta->btf, &args[argno])) return KF_ARG_PTR_TO_IRQ_FLAG; + if (is_kfunc_arg_res_spin_lock(meta->btf, &args[argno])) + return KF_ARG_PTR_TO_RES_SPIN_LOCK; + if ((base_type(reg->type) == PTR_TO_BTF_ID || reg2btf_ids[base_type(reg->type)])) { if (!btf_type_is_struct(ref_t)) { verbose(env, "kernel function %s args#%d pointer type %s %s is not supported\n", @@ -12155,13 +12224,19 @@ static int process_irq_flag(struct bpf_verifier_env *env, int regno, struct bpf_kfunc_call_arg_meta *meta) { struct bpf_reg_state *regs = cur_regs(env), *reg = ®s[regno]; + int err, kfunc_class = IRQ_NATIVE_KFUNC; bool irq_save; - int err; - if (meta->func_id == special_kfunc_list[KF_bpf_local_irq_save]) { + if (meta->func_id == special_kfunc_list[KF_bpf_local_irq_save] || + meta->func_id == special_kfunc_list[KF_bpf_res_spin_lock_irqsave]) { irq_save = true; - } else if (meta->func_id == special_kfunc_list[KF_bpf_local_irq_restore]) { + if (meta->func_id == special_kfunc_list[KF_bpf_res_spin_lock_irqsave]) + kfunc_class = IRQ_LOCK_KFUNC; + } 
else if (meta->func_id == special_kfunc_list[KF_bpf_local_irq_restore] || + meta->func_id == special_kfunc_list[KF_bpf_res_spin_unlock_irqrestore]) { irq_save = false; + if (meta->func_id == special_kfunc_list[KF_bpf_res_spin_unlock_irqrestore]) + kfunc_class = IRQ_LOCK_KFUNC; } else { verbose(env, "verifier internal error: unknown irq flags kfunc\n"); return -EFAULT; @@ -12177,7 +12252,7 @@ static int process_irq_flag(struct bpf_verifier_env *env, int regno, if (err) return err; - err = mark_stack_slot_irq_flag(env, meta, reg, env->insn_idx); + err = mark_stack_slot_irq_flag(env, meta, reg, env->insn_idx, kfunc_class); if (err) return err; } else { @@ -12191,7 +12266,7 @@ static int process_irq_flag(struct bpf_verifier_env *env, int regno, if (err) return err; - err = unmark_stack_slot_irq_flag(env, reg); + err = unmark_stack_slot_irq_flag(env, reg, kfunc_class); if (err) return err; } @@ -12318,7 +12393,8 @@ static int check_reg_allocation_locked(struct bpf_verifier_env *env, struct bpf_ if (!env->cur_state->active_locks) return -EINVAL; - s = find_lock_state(env->cur_state, REF_TYPE_LOCK, id, ptr); + s = find_lock_state(env->cur_state, REF_TYPE_LOCK | REF_TYPE_RES_LOCK | REF_TYPE_RES_LOCK_IRQ, + id, ptr); if (!s) { verbose(env, "held lock and object are not in the same allocation\n"); return -EINVAL; @@ -12354,9 +12430,18 @@ static bool is_bpf_graph_api_kfunc(u32 btf_id) btf_id == special_kfunc_list[KF_bpf_refcount_acquire_impl]; } +static bool is_bpf_res_spin_lock_kfunc(u32 btf_id) +{ + return btf_id == special_kfunc_list[KF_bpf_res_spin_lock] || + btf_id == special_kfunc_list[KF_bpf_res_spin_unlock] || + btf_id == special_kfunc_list[KF_bpf_res_spin_lock_irqsave] || + btf_id == special_kfunc_list[KF_bpf_res_spin_unlock_irqrestore]; +} + static bool kfunc_spin_allowed(u32 btf_id) { - return is_bpf_graph_api_kfunc(btf_id) || is_bpf_iter_num_api_kfunc(btf_id); + return is_bpf_graph_api_kfunc(btf_id) || is_bpf_iter_num_api_kfunc(btf_id) || + is_bpf_res_spin_lock_kfunc(btf_id); } static bool is_sync_callback_calling_kfunc(u32 btf_id) @@ -12788,6 +12873,7 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_ case KF_ARG_PTR_TO_CONST_STR: case KF_ARG_PTR_TO_WORKQUEUE: case KF_ARG_PTR_TO_IRQ_FLAG: + case KF_ARG_PTR_TO_RES_SPIN_LOCK: break; default: WARN_ON_ONCE(1); @@ -13086,6 +13172,28 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_ if (ret < 0) return ret; break; + case KF_ARG_PTR_TO_RES_SPIN_LOCK: + { + int flags = PROCESS_RES_LOCK; + + if (reg->type != PTR_TO_MAP_VALUE && reg->type != (PTR_TO_BTF_ID | MEM_ALLOC)) { + verbose(env, "arg#%d doesn't point to map value or allocated object\n", i); + return -EINVAL; + } + + if (!is_bpf_res_spin_lock_kfunc(meta->func_id)) + return -EFAULT; + if (meta->func_id == special_kfunc_list[KF_bpf_res_spin_lock] || + meta->func_id == special_kfunc_list[KF_bpf_res_spin_lock_irqsave]) + flags |= PROCESS_SPIN_LOCK; + if (meta->func_id == special_kfunc_list[KF_bpf_res_spin_lock_irqsave] || + meta->func_id == special_kfunc_list[KF_bpf_res_spin_unlock_irqrestore]) + flags |= PROCESS_LOCK_IRQ; + ret = process_spin_lock(env, regno, flags); + if (ret < 0) + return ret; + break; + } } } @@ -13171,6 +13279,33 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn, insn_aux->is_iter_next = is_iter_next_kfunc(&meta); + if (!insn->off && + (insn->imm == special_kfunc_list[KF_bpf_res_spin_lock] || + insn->imm == special_kfunc_list[KF_bpf_res_spin_lock_irqsave])) { + struct 
bpf_verifier_state *branch; + struct bpf_reg_state *regs; + + branch = push_stack(env, env->insn_idx + 1, env->insn_idx, false); + if (!branch) { + verbose(env, "failed to push state for failed lock acquisition\n"); + return -ENOMEM; + } + + regs = branch->frame[branch->curframe]->regs; + + /* Clear r0-r5 registers in forked state */ + for (i = 0; i < CALLER_SAVED_REGS; i++) + mark_reg_not_init(env, regs, caller_saved[i]); + + mark_reg_unknown(env, regs, BPF_REG_0); + err = __mark_reg_s32_range(env, regs, BPF_REG_0, -MAX_ERRNO, -1); + if (err) { + verbose(env, "failed to mark s32 range for retval in forked state for lock\n"); + return err; + } + __mark_btf_func_reg_size(env, regs, BPF_REG_0, sizeof(u32)); + } + if (is_kfunc_destructive(&meta) && !capable(CAP_SYS_BOOT)) { verbose(env, "destructive kfunc calls require CAP_SYS_BOOT capability\n"); return -EACCES; @@ -13341,6 +13476,9 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn, if (btf_type_is_scalar(t)) { mark_reg_unknown(env, regs, BPF_REG_0); + if (meta.btf == btf_vmlinux && (meta.func_id == special_kfunc_list[KF_bpf_res_spin_lock] || + meta.func_id == special_kfunc_list[KF_bpf_res_spin_lock_irqsave])) + __mark_reg_const_zero(env, ®s[BPF_REG_0]); mark_btf_func_reg_size(env, BPF_REG_0, t->size); } else if (btf_type_is_ptr(t)) { ptr_type = btf_type_skip_modifiers(desc_btf, t->type, &ptr_type_id); @@ -18275,7 +18413,8 @@ static bool stacksafe(struct bpf_verifier_env *env, struct bpf_func_state *old, case STACK_IRQ_FLAG: old_reg = &old->stack[spi].spilled_ptr; cur_reg = &cur->stack[spi].spilled_ptr; - if (!check_ids(old_reg->ref_obj_id, cur_reg->ref_obj_id, idmap)) + if (!check_ids(old_reg->ref_obj_id, cur_reg->ref_obj_id, idmap) || + old_reg->irq.kfunc_class != cur_reg->irq.kfunc_class) return false; break; case STACK_MISC: @@ -18319,6 +18458,8 @@ static bool refsafe(struct bpf_verifier_state *old, struct bpf_verifier_state *c case REF_TYPE_IRQ: break; case REF_TYPE_LOCK: + case REF_TYPE_RES_LOCK: + case REF_TYPE_RES_LOCK_IRQ: if (old->refs[i].ptr != cur->refs[i].ptr) return false; break; @@ -19641,7 +19782,7 @@ static int check_map_prog_compatibility(struct bpf_verifier_env *env, } } - if (btf_record_has_field(map->record, BPF_SPIN_LOCK)) { + if (btf_record_has_field(map->record, BPF_SPIN_LOCK | BPF_RES_SPIN_LOCK)) { if (prog_type == BPF_PROG_TYPE_SOCKET_FILTER) { verbose(env, "socket filter progs cannot use bpf_spin_lock yet\n"); return -EINVAL; From patchwork Mon Mar 3 15:23:04 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kumar Kartikeya Dwivedi X-Patchwork-Id: 13999115 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id AE93FC282CD for ; Mon, 3 Mar 2025 16:03:26 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=bombadil.20210309; h=Sender:List-Subscribe:List-Help :List-Post:List-Archive:List-Unsubscribe:List-Id:Content-Transfer-Encoding: MIME-Version:References:In-Reply-To:Message-ID:Date:Subject:Cc:To:From: Reply-To:Content-Type:Content-ID:Content-Description:Resent-Date:Resent-From: Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID:List-Owner; 
From: Kumar Kartikeya Dwivedi
To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Linus Torvalds , Peter
Zijlstra , Will Deacon , Waiman Long , Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , Martin KaFai Lau , Eduard Zingerman , "Paul E. McKenney" , Tejun Heo , Barret Rhoden , Josh Don , Dohyun Kim , linux-arm-kernel@lists.infradead.org, kkd@meta.com, kernel-team@meta.com Subject: [PATCH bpf-next v3 24/25] bpf: Maintain FIFO property for rqspinlock unlock Date: Mon, 3 Mar 2025 07:23:04 -0800 Message-ID: <20250303152305.3195648-25-memxor@gmail.com> X-Mailer: git-send-email 2.43.5 In-Reply-To: <20250303152305.3195648-1-memxor@gmail.com> References: <20250303152305.3195648-1-memxor@gmail.com> MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=4773; h=from:subject; bh=yd3gRpKNHABqLGxFwmo4qitDCfszyenvxRvQDwJYTjI=; b=owEBbQKS/ZANAwAIAUzgyIZIvxHKAcsmYgBnxcWZ7k4vMGesNz3gUHT6LOfeGtXRbGBf+aKUvuv/ umGpqdqJAjMEAAEIAB0WIQRLvip+Buz51YI8YRFM4MiGSL8RygUCZ8XFmQAKCRBM4MiGSL8RygluD/ 4uWKUHAjRffEJ86xP+I5MHqRuOHQ9jEAhSSdV7VPagDzTfZS8/JrswTTGLd4rS2xgwcFBwODkZ2aBs ayN+aLN1VPfzRRauqqYTO74IQCjSQVlRfrUc7QPeFDKF5iD9QPiLL3exq4eW9AGB7e+PuYaWH0wp3Q Wo/2jS/gXUhBTx2P5Iv9Lj56om69Hq7TGuUFrQVyvaC0CeinRJzaROx754KgWhnijwvOYUHsFZNFWS BR9qzsCLUTJhZhlEoX1z+ckbFZlKaMumFJaLjFUi3LW0r0wppTr+j6iVZ29CPh4p0x4H1/YOVIOehV mrk58kirt5EbcalsufoTGdl152sD0qdKJHO0pIefGYB9QVp9zzHJp+dGg5erc8UIcmn6J/IJuehSWC KgKZF6l00P5i0xwwz7h3GxNqSOfKGemo+nMTASSCL5HQGjzyWrrTOl3Yt5Sztu7F+qo4kTW554DaT6 z+9LE7AOC45QooYOFu15kG+QAhLeo+5nrl38cyPi60IHkU1JQ74e4y/92wb+i1BJsmtndDZAcwh48I YcPMi8nG85nDdwOaRNx4VNvVKDPLz3FdagXogkzXKN9BtKEUy/7x5cVgwICmIOCmHC3fqFTyvAimEK 8q9RN8RRT/ciTibX4PZBm9gB4uF8rSuHE+89XzSMRgzUjf0vXy/Qf6oVA84w== X-Developer-Key: i=memxor@gmail.com; a=openpgp; fpr=4BBE2A7E06ECF9D5823C61114CE0C88648BF11CA X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20250303_072342_725281_EB763419 X-CRM114-Status: GOOD ( 20.29 ) X-BeenThere: linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: "linux-arm-kernel" Errors-To: linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org Since out-of-order unlocks are unsupported for rqspinlock, and irqsave variants enforce strict FIFO ordering anyway, make the same change for normal non-irqsave variants, such that FIFO ordering is enforced. Two new verifier state fields (active_lock_id, active_lock_ptr) are used to denote the top of the stack, and prev_id and prev_ptr are ascertained whenever popping the topmost entry through an unlock. Take special care to make these fields part of the state comparison in refsafe. Signed-off-by: Kumar Kartikeya Dwivedi --- include/linux/bpf_verifier.h | 3 +++ kernel/bpf/verifier.c | 33 ++++++++++++++++++++++++++++----- 2 files changed, 31 insertions(+), 5 deletions(-) diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h index 269449363f78..7348bd824e16 100644 --- a/include/linux/bpf_verifier.h +++ b/include/linux/bpf_verifier.h @@ -268,6 +268,7 @@ struct bpf_reference_state { REF_TYPE_LOCK = (1 << 3), REF_TYPE_RES_LOCK = (1 << 4), REF_TYPE_RES_LOCK_IRQ = (1 << 5), + REF_TYPE_LOCK_MASK = REF_TYPE_LOCK | REF_TYPE_RES_LOCK | REF_TYPE_RES_LOCK_IRQ, } type; /* Track each reference created with a unique id, even if the same * instruction creates the reference multiple times (eg, via CALL). 
@@ -434,6 +435,8 @@ struct bpf_verifier_state { u32 active_locks; u32 active_preempt_locks; u32 active_irq_id; + u32 active_lock_id; + void *active_lock_ptr; bool active_rcu_lock; bool speculative; diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index 6c8ef72ee6bc..d3be8932abe4 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -1421,6 +1421,8 @@ static int copy_reference_state(struct bpf_verifier_state *dst, const struct bpf dst->active_preempt_locks = src->active_preempt_locks; dst->active_rcu_lock = src->active_rcu_lock; dst->active_irq_id = src->active_irq_id; + dst->active_lock_id = src->active_lock_id; + dst->active_lock_ptr = src->active_lock_ptr; return 0; } @@ -1520,6 +1522,8 @@ static int acquire_lock_state(struct bpf_verifier_env *env, int insn_idx, enum r s->ptr = ptr; state->active_locks++; + state->active_lock_id = id; + state->active_lock_ptr = ptr; return 0; } @@ -1570,16 +1574,24 @@ static bool find_reference_state(struct bpf_verifier_state *state, int ptr_id) static int release_lock_state(struct bpf_verifier_state *state, int type, int id, void *ptr) { + void *prev_ptr = NULL; + u32 prev_id = 0; int i; for (i = 0; i < state->acquired_refs; i++) { - if (state->refs[i].type != type) - continue; - if (state->refs[i].id == id && state->refs[i].ptr == ptr) { + if (state->refs[i].type == type && state->refs[i].id == id && + state->refs[i].ptr == ptr) { release_reference_state(state, i); state->active_locks--; + /* Reassign active lock (id, ptr). */ + state->active_lock_id = prev_id; + state->active_lock_ptr = prev_ptr; return 0; } + if (state->refs[i].type & REF_TYPE_LOCK_MASK) { + prev_id = state->refs[i].id; + prev_ptr = state->refs[i].ptr; + } } return -EINVAL; } @@ -8201,6 +8213,14 @@ static int process_spin_lock(struct bpf_verifier_env *env, int regno, int flags) type = REF_TYPE_RES_LOCK; else type = REF_TYPE_LOCK; + if (!find_lock_state(cur, type, reg->id, ptr)) { + verbose(env, "%s_unlock of different lock\n", lock_str); + return -EINVAL; + } + if (reg->id != cur->active_lock_id || ptr != cur->active_lock_ptr) { + verbose(env, "%s_unlock cannot be out of order\n", lock_str); + return -EINVAL; + } if (release_lock_state(cur, type, reg->id, ptr)) { verbose(env, "%s_unlock of different lock\n", lock_str); return -EINVAL; @@ -12393,8 +12413,7 @@ static int check_reg_allocation_locked(struct bpf_verifier_env *env, struct bpf_ if (!env->cur_state->active_locks) return -EINVAL; - s = find_lock_state(env->cur_state, REF_TYPE_LOCK | REF_TYPE_RES_LOCK | REF_TYPE_RES_LOCK_IRQ, - id, ptr); + s = find_lock_state(env->cur_state, REF_TYPE_LOCK_MASK, id, ptr); if (!s) { verbose(env, "held lock and object are not in the same allocation\n"); return -EINVAL; @@ -18449,6 +18468,10 @@ static bool refsafe(struct bpf_verifier_state *old, struct bpf_verifier_state *c if (!check_ids(old->active_irq_id, cur->active_irq_id, idmap)) return false; + if (!check_ids(old->active_lock_id, cur->active_lock_id, idmap) || + old->active_lock_ptr != cur->active_lock_ptr) + return false; + for (i = 0; i < old->acquired_refs; i++) { if (!check_ids(old->refs[i].id, cur->refs[i].id, idmap) || old->refs[i].type != cur->refs[i].type) From patchwork Mon Mar 3 15:23:05 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kumar Kartikeya Dwivedi X-Patchwork-Id: 13999116 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from bombadil.infradead.org 
From: Kumar Kartikeya Dwivedi
To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Linus Torvalds , Peter Zijlstra , Will Deacon , Waiman Long , Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , Martin KaFai Lau , Eduard Zingerman , "Paul E. McKenney" , Tejun Heo , Barret Rhoden , Josh Don , Dohyun Kim , linux-arm-kernel@lists.infradead.org, kkd@meta.com, kernel-team@meta.com
Subject: [PATCH bpf-next v3 25/25] selftests/bpf: Add tests for rqspinlock
Date: Mon, 3 Mar 2025 07:23:05 -0800
Message-ID: <20250303152305.3195648-26-memxor@gmail.com>
X-Mailer: git-send-email 2.43.5
In-Reply-To: <20250303152305.3195648-1-memxor@gmail.com>
References: <20250303152305.3195648-1-memxor@gmail.com>
MIME-Version: 1.0

Introduce selftests that trigger AA and ABBA deadlocks, and test the edge case where the held locks table runs out of entries, since we then fall back to the timeout as the final line of defense. Also exercise the verifier's AA detection where applicable.
Signed-off-by: Kumar Kartikeya Dwivedi --- .../selftests/bpf/prog_tests/res_spin_lock.c | 92 +++++++ tools/testing/selftests/bpf/progs/irq.c | 53 ++++ .../selftests/bpf/progs/res_spin_lock.c | 143 ++++++++++ .../selftests/bpf/progs/res_spin_lock_fail.c | 244 ++++++++++++++++++ 4 files changed, 532 insertions(+) create mode 100644 tools/testing/selftests/bpf/prog_tests/res_spin_lock.c create mode 100644 tools/testing/selftests/bpf/progs/res_spin_lock.c create mode 100644 tools/testing/selftests/bpf/progs/res_spin_lock_fail.c diff --git a/tools/testing/selftests/bpf/prog_tests/res_spin_lock.c b/tools/testing/selftests/bpf/prog_tests/res_spin_lock.c new file mode 100644 index 000000000000..563d0d2801bb --- /dev/null +++ b/tools/testing/selftests/bpf/prog_tests/res_spin_lock.c @@ -0,0 +1,92 @@ +// SPDX-License-Identifier: GPL-2.0 +/* Copyright (c) 2024 Meta Platforms, Inc. and affiliates. */ +#include +#include + +#include "res_spin_lock.skel.h" +#include "res_spin_lock_fail.skel.h" + +void test_res_spin_lock_failure(void) +{ + RUN_TESTS(res_spin_lock_fail); +} + +static volatile int skip; + +static void *spin_lock_thread(void *arg) +{ + int err, prog_fd = *(u32 *) arg; + LIBBPF_OPTS(bpf_test_run_opts, topts, + .data_in = &pkt_v4, + .data_size_in = sizeof(pkt_v4), + .repeat = 10000, + ); + + while (!READ_ONCE(skip)) { + err = bpf_prog_test_run_opts(prog_fd, &topts); + ASSERT_OK(err, "test_run"); + ASSERT_OK(topts.retval, "test_run retval"); + } + pthread_exit(arg); +} + +void test_res_spin_lock_success(void) +{ + LIBBPF_OPTS(bpf_test_run_opts, topts, + .data_in = &pkt_v4, + .data_size_in = sizeof(pkt_v4), + .repeat = 1, + ); + struct res_spin_lock *skel; + pthread_t thread_id[16]; + int prog_fd, i, err; + void *ret; + + skel = res_spin_lock__open_and_load(); + if (!ASSERT_OK_PTR(skel, "res_spin_lock__open_and_load")) + return; + /* AA deadlock */ + prog_fd = bpf_program__fd(skel->progs.res_spin_lock_test); + err = bpf_prog_test_run_opts(prog_fd, &topts); + ASSERT_OK(err, "error"); + ASSERT_OK(topts.retval, "retval"); + + prog_fd = bpf_program__fd(skel->progs.res_spin_lock_test_held_lock_max); + err = bpf_prog_test_run_opts(prog_fd, &topts); + ASSERT_OK(err, "error"); + ASSERT_OK(topts.retval, "retval"); + + /* Multi-threaded ABBA deadlock. 
*/ + + prog_fd = bpf_program__fd(skel->progs.res_spin_lock_test_AB); + for (i = 0; i < 16; i++) { + int err; + + err = pthread_create(&thread_id[i], NULL, &spin_lock_thread, &prog_fd); + if (!ASSERT_OK(err, "pthread_create")) + goto end; + } + + topts.retval = 0; + topts.repeat = 1000; + int fd = bpf_program__fd(skel->progs.res_spin_lock_test_BA); + while (!topts.retval && !err && !READ_ONCE(skel->bss->err)) { + err = bpf_prog_test_run_opts(fd, &topts); + } + + WRITE_ONCE(skip, true); + + for (i = 0; i < 16; i++) { + if (!ASSERT_OK(pthread_join(thread_id[i], &ret), "pthread_join")) + goto end; + if (!ASSERT_EQ(ret, &prog_fd, "ret == prog_fd")) + goto end; + } + + ASSERT_EQ(READ_ONCE(skel->bss->err), -EDEADLK, "timeout err"); + ASSERT_OK(err, "err"); + ASSERT_EQ(topts.retval, -EDEADLK, "timeout"); +end: + res_spin_lock__destroy(skel); + return; +} diff --git a/tools/testing/selftests/bpf/progs/irq.c b/tools/testing/selftests/bpf/progs/irq.c index 298d48d7886d..74d912b22de9 100644 --- a/tools/testing/selftests/bpf/progs/irq.c +++ b/tools/testing/selftests/bpf/progs/irq.c @@ -11,6 +11,9 @@ extern void bpf_local_irq_save(unsigned long *) __weak __ksym; extern void bpf_local_irq_restore(unsigned long *) __weak __ksym; extern int bpf_copy_from_user_str(void *dst, u32 dst__sz, const void *unsafe_ptr__ign, u64 flags) __weak __ksym; +struct bpf_res_spin_lock lockA __hidden SEC(".data.A"); +struct bpf_res_spin_lock lockB __hidden SEC(".data.B"); + SEC("?tc") __failure __msg("arg#0 doesn't point to an irq flag on stack") int irq_save_bad_arg(struct __sk_buff *ctx) @@ -510,4 +513,54 @@ int irq_sleepable_global_subprog_indirect(void *ctx) return 0; } +SEC("?tc") +__failure __msg("cannot restore irq state out of order") +int irq_ooo_lock_cond_inv(struct __sk_buff *ctx) +{ + unsigned long flags1, flags2; + + if (bpf_res_spin_lock_irqsave(&lockA, &flags1)) + return 0; + if (bpf_res_spin_lock_irqsave(&lockB, &flags2)) { + bpf_res_spin_unlock_irqrestore(&lockA, &flags1); + return 0; + } + + bpf_res_spin_unlock_irqrestore(&lockB, &flags1); + bpf_res_spin_unlock_irqrestore(&lockA, &flags2); + return 0; +} + +SEC("?tc") +__failure __msg("function calls are not allowed") +int irq_wrong_kfunc_class_1(struct __sk_buff *ctx) +{ + unsigned long flags1; + + if (bpf_res_spin_lock_irqsave(&lockA, &flags1)) + return 0; + /* For now, bpf_local_irq_restore is not allowed in critical section, + * but this test ensures error will be caught with kfunc_class when it's + * opened up. Tested by temporarily permitting this kfunc in critical + * section. + */ + bpf_local_irq_restore(&flags1); + bpf_res_spin_unlock_irqrestore(&lockA, &flags1); + return 0; +} + +SEC("?tc") +__failure __msg("function calls are not allowed") +int irq_wrong_kfunc_class_2(struct __sk_buff *ctx) +{ + unsigned long flags1, flags2; + + bpf_local_irq_save(&flags1); + if (bpf_res_spin_lock_irqsave(&lockA, &flags2)) + return 0; + bpf_local_irq_restore(&flags2); + bpf_res_spin_unlock_irqrestore(&lockA, &flags1); + return 0; +} + char _license[] SEC("license") = "GPL"; diff --git a/tools/testing/selftests/bpf/progs/res_spin_lock.c b/tools/testing/selftests/bpf/progs/res_spin_lock.c new file mode 100644 index 000000000000..40ac06c91779 --- /dev/null +++ b/tools/testing/selftests/bpf/progs/res_spin_lock.c @@ -0,0 +1,143 @@ +// SPDX-License-Identifier: GPL-2.0 +/* Copyright (c) 2024 Meta Platforms, Inc. and affiliates. 
*/ +#include +#include +#include +#include "bpf_misc.h" + +#define EDEADLK 35 +#define ETIMEDOUT 110 + +struct arr_elem { + struct bpf_res_spin_lock lock; +}; + +struct { + __uint(type, BPF_MAP_TYPE_ARRAY); + __uint(max_entries, 64); + __type(key, int); + __type(value, struct arr_elem); +} arrmap SEC(".maps"); + +struct bpf_res_spin_lock lockA __hidden SEC(".data.A"); +struct bpf_res_spin_lock lockB __hidden SEC(".data.B"); + +SEC("tc") +int res_spin_lock_test(struct __sk_buff *ctx) +{ + struct arr_elem *elem1, *elem2; + int r; + + elem1 = bpf_map_lookup_elem(&arrmap, &(int){0}); + if (!elem1) + return -1; + elem2 = bpf_map_lookup_elem(&arrmap, &(int){0}); + if (!elem2) + return -1; + + r = bpf_res_spin_lock(&elem1->lock); + if (r) + return r; + if (!bpf_res_spin_lock(&elem2->lock)) { + bpf_res_spin_unlock(&elem2->lock); + bpf_res_spin_unlock(&elem1->lock); + return -1; + } + bpf_res_spin_unlock(&elem1->lock); + return 0; +} + +SEC("tc") +int res_spin_lock_test_AB(struct __sk_buff *ctx) +{ + int r; + + r = bpf_res_spin_lock(&lockA); + if (r) + return !r; + /* Only unlock if we took the lock. */ + if (!bpf_res_spin_lock(&lockB)) + bpf_res_spin_unlock(&lockB); + bpf_res_spin_unlock(&lockA); + return 0; +} + +int err; + +SEC("tc") +int res_spin_lock_test_BA(struct __sk_buff *ctx) +{ + int r; + + r = bpf_res_spin_lock(&lockB); + if (r) + return !r; + if (!bpf_res_spin_lock(&lockA)) + bpf_res_spin_unlock(&lockA); + else + err = -EDEADLK; + bpf_res_spin_unlock(&lockB); + return err ?: 0; +} + +SEC("tc") +int res_spin_lock_test_held_lock_max(struct __sk_buff *ctx) +{ + struct bpf_res_spin_lock *locks[48] = {}; + struct arr_elem *e; + u64 time_beg, time; + int ret = 0, i; + + _Static_assert(ARRAY_SIZE(((struct rqspinlock_held){}).locks) == 31, + "RES_NR_HELD assumed to be 31"); + + for (i = 0; i < 34; i++) { + int key = i; + + /* We cannot pass in i as it will get spilled/filled by the compiler and + * loses bounds in verifier state. + */ + e = bpf_map_lookup_elem(&arrmap, &key); + if (!e) + return 1; + locks[i] = &e->lock; + } + + for (; i < 48; i++) { + int key = i - 2; + + /* We cannot pass in i as it will get spilled/filled by the compiler and + * loses bounds in verifier state. + */ + e = bpf_map_lookup_elem(&arrmap, &key); + if (!e) + return 1; + locks[i] = &e->lock; + } + + time_beg = bpf_ktime_get_ns(); + for (i = 0; i < 34; i++) { + if (bpf_res_spin_lock(locks[i])) + goto end; + } + + /* Trigger AA, after exhausting entries in the held lock table. This + * time, only the timeout can save us, as AA detection won't succeed. + */ + if (!bpf_res_spin_lock(locks[34])) { + bpf_res_spin_unlock(locks[34]); + ret = 1; + goto end; + } + +end: + for (i = i - 1; i >= 0; i--) + bpf_res_spin_unlock(locks[i]); + time = bpf_ktime_get_ns() - time_beg; + /* Time spent should be easily above our limit (1/4 s), since AA + * detection won't be expedited due to lack of held lock entry. + */ + return ret ?: (time > 1000000000 / 4 ? 0 : 1); +} + +char _license[] SEC("license") = "GPL"; diff --git a/tools/testing/selftests/bpf/progs/res_spin_lock_fail.c b/tools/testing/selftests/bpf/progs/res_spin_lock_fail.c new file mode 100644 index 000000000000..3222e9283c78 --- /dev/null +++ b/tools/testing/selftests/bpf/progs/res_spin_lock_fail.c @@ -0,0 +1,244 @@ +// SPDX-License-Identifier: GPL-2.0 +/* Copyright (c) 2024 Meta Platforms, Inc. and affiliates. 
*/ +#include +#include +#include +#include +#include "bpf_misc.h" +#include "bpf_experimental.h" + +struct arr_elem { + struct bpf_res_spin_lock lock; +}; + +struct { + __uint(type, BPF_MAP_TYPE_ARRAY); + __uint(max_entries, 1); + __type(key, int); + __type(value, struct arr_elem); +} arrmap SEC(".maps"); + +long value; + +struct bpf_spin_lock lock __hidden SEC(".data.A"); +struct bpf_res_spin_lock res_lock __hidden SEC(".data.B"); + +SEC("?tc") +__failure __msg("point to map value or allocated object") +int res_spin_lock_arg(struct __sk_buff *ctx) +{ + struct arr_elem *elem; + + elem = bpf_map_lookup_elem(&arrmap, &(int){0}); + if (!elem) + return 0; + bpf_res_spin_lock((struct bpf_res_spin_lock *)bpf_core_cast(&elem->lock, struct __sk_buff)); + bpf_res_spin_lock(&elem->lock); + return 0; +} + +SEC("?tc") +__failure __msg("AA deadlock detected") +int res_spin_lock_AA(struct __sk_buff *ctx) +{ + struct arr_elem *elem; + + elem = bpf_map_lookup_elem(&arrmap, &(int){0}); + if (!elem) + return 0; + bpf_res_spin_lock(&elem->lock); + bpf_res_spin_lock(&elem->lock); + return 0; +} + +SEC("?tc") +__failure __msg("AA deadlock detected") +int res_spin_lock_cond_AA(struct __sk_buff *ctx) +{ + struct arr_elem *elem; + + elem = bpf_map_lookup_elem(&arrmap, &(int){0}); + if (!elem) + return 0; + if (bpf_res_spin_lock(&elem->lock)) + return 0; + bpf_res_spin_lock(&elem->lock); + return 0; +} + +SEC("?tc") +__failure __msg("unlock of different lock") +int res_spin_lock_mismatch_1(struct __sk_buff *ctx) +{ + struct arr_elem *elem; + + elem = bpf_map_lookup_elem(&arrmap, &(int){0}); + if (!elem) + return 0; + if (bpf_res_spin_lock(&elem->lock)) + return 0; + bpf_res_spin_unlock(&res_lock); + return 0; +} + +SEC("?tc") +__failure __msg("unlock of different lock") +int res_spin_lock_mismatch_2(struct __sk_buff *ctx) +{ + struct arr_elem *elem; + + elem = bpf_map_lookup_elem(&arrmap, &(int){0}); + if (!elem) + return 0; + if (bpf_res_spin_lock(&res_lock)) + return 0; + bpf_res_spin_unlock(&elem->lock); + return 0; +} + +SEC("?tc") +__failure __msg("unlock of different lock") +int res_spin_lock_irq_mismatch_1(struct __sk_buff *ctx) +{ + struct arr_elem *elem; + unsigned long f1; + + elem = bpf_map_lookup_elem(&arrmap, &(int){0}); + if (!elem) + return 0; + bpf_local_irq_save(&f1); + if (bpf_res_spin_lock(&res_lock)) + return 0; + bpf_res_spin_unlock_irqrestore(&res_lock, &f1); + return 0; +} + +SEC("?tc") +__failure __msg("unlock of different lock") +int res_spin_lock_irq_mismatch_2(struct __sk_buff *ctx) +{ + struct arr_elem *elem; + unsigned long f1; + + elem = bpf_map_lookup_elem(&arrmap, &(int){0}); + if (!elem) + return 0; + if (bpf_res_spin_lock_irqsave(&res_lock, &f1)) + return 0; + bpf_res_spin_unlock(&res_lock); + return 0; +} + +SEC("?tc") +__success +int res_spin_lock_ooo(struct __sk_buff *ctx) +{ + struct arr_elem *elem; + + elem = bpf_map_lookup_elem(&arrmap, &(int){0}); + if (!elem) + return 0; + if (bpf_res_spin_lock(&res_lock)) + return 0; + if (bpf_res_spin_lock(&elem->lock)) { + bpf_res_spin_unlock(&res_lock); + return 0; + } + bpf_res_spin_unlock(&elem->lock); + bpf_res_spin_unlock(&res_lock); + return 0; +} + +SEC("?tc") +__success +int res_spin_lock_ooo_irq(struct __sk_buff *ctx) +{ + struct arr_elem *elem; + unsigned long f1, f2; + + elem = bpf_map_lookup_elem(&arrmap, &(int){0}); + if (!elem) + return 0; + if (bpf_res_spin_lock_irqsave(&res_lock, &f1)) + return 0; + if (bpf_res_spin_lock_irqsave(&elem->lock, &f2)) { + bpf_res_spin_unlock_irqrestore(&res_lock, &f1); + /* We won't have a 
unreleased IRQ flag error here. */ + return 0; + } + bpf_res_spin_unlock_irqrestore(&elem->lock, &f2); + bpf_res_spin_unlock_irqrestore(&res_lock, &f1); + return 0; +} + +struct bpf_res_spin_lock lock1 __hidden SEC(".data.OO1"); +struct bpf_res_spin_lock lock2 __hidden SEC(".data.OO2"); + +SEC("?tc") +__failure __msg("bpf_res_spin_unlock cannot be out of order") +int res_spin_lock_ooo_unlock(struct __sk_buff *ctx) +{ + if (bpf_res_spin_lock(&lock1)) + return 0; + if (bpf_res_spin_lock(&lock2)) { + bpf_res_spin_unlock(&lock1); + return 0; + } + bpf_res_spin_unlock(&lock1); + bpf_res_spin_unlock(&lock2); + return 0; +} + +SEC("?tc") +__failure __msg("off 1 doesn't point to 'struct bpf_res_spin_lock' that is at 0") +int res_spin_lock_bad_off(struct __sk_buff *ctx) +{ + struct arr_elem *elem; + + elem = bpf_map_lookup_elem(&arrmap, &(int){0}); + if (!elem) + return 0; + bpf_res_spin_lock((void *)&elem->lock + 1); + return 0; +} + +SEC("?tc") +__failure __msg("R1 doesn't have constant offset. bpf_res_spin_lock has to be at the constant offset") +int res_spin_lock_var_off(struct __sk_buff *ctx) +{ + struct arr_elem *elem; + u64 val = value; + + elem = bpf_map_lookup_elem(&arrmap, &(int){0}); + if (!elem) { + // FIXME: Only inline assembly use in assert macro doesn't emit + // BTF definition. + bpf_throw(0); + return 0; + } + bpf_assert_range(val, 0, 40); + bpf_res_spin_lock((void *)&value + val); + return 0; +} + +SEC("?tc") +__failure __msg("map 'res_spin.bss' has no valid bpf_res_spin_lock") +int res_spin_lock_no_lock_map(struct __sk_buff *ctx) +{ + bpf_res_spin_lock((void *)&value + 1); + return 0; +} + +SEC("?tc") +__failure __msg("local 'kptr' has no valid bpf_res_spin_lock") +int res_spin_lock_no_lock_kptr(struct __sk_buff *ctx) +{ + struct { int i; } *p = bpf_obj_new(typeof(*p)); + + if (!p) + return 0; + bpf_res_spin_lock((void *)p); + return 0; +} + +char _license[] SEC("license") = "GPL";
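For readers trying out the new API, what follows is a minimal, self-contained sketch of the expected usage pattern, distilled from the selftests above. It assumes the kfunc prototypes and struct bpf_res_spin_lock are visible to the program the same way the in-tree selftests obtain them (vmlinux.h plus the selftest headers); the element struct, map, and program names are illustrative only and not part of this series.

// SPDX-License-Identifier: GPL-2.0
/* Hedged illustration, not part of the series: minimal usage of the
 * resilient spin lock kfuncs. Names below (val_elem, vals,
 * res_lock_usage) are invented for the example.
 */
#include <vmlinux.h>
#include <bpf/bpf_helpers.h>

/* Assumed extern declarations; in-tree programs pick these up from headers. */
extern int bpf_res_spin_lock_irqsave(struct bpf_res_spin_lock *lock,
				     unsigned long *flags) __weak __ksym;
extern void bpf_res_spin_unlock_irqrestore(struct bpf_res_spin_lock *lock,
					   unsigned long *flags) __weak __ksym;

struct val_elem {
	struct bpf_res_spin_lock lock;	/* must live at a constant offset */
	long counter;
};

struct {
	__uint(type, BPF_MAP_TYPE_ARRAY);
	__uint(max_entries, 1);
	__type(key, int);
	__type(value, struct val_elem);
} vals SEC(".maps");

SEC("tc")
int res_lock_usage(struct __sk_buff *ctx)
{
	struct val_elem *e;
	unsigned long flags;
	int key = 0;

	e = bpf_map_lookup_elem(&vals, &key);
	if (!e)
		return 0;

	/* Unlike bpf_spin_lock(), acquisition can fail (e.g. on a detected
	 * deadlock or a timeout), and the verifier forks a state for the
	 * failure branch, so the error path has to be handled.
	 */
	if (bpf_res_spin_lock_irqsave(&e->lock, &flags))
		return 0;

	e->counter++;

	/* Unlocks must use the matching irqsave/irqrestore flavour and, when
	 * multiple locks are held, release them in reverse order of
	 * acquisition; out-of-order unlocks are rejected at load time.
	 */
	bpf_res_spin_unlock_irqrestore(&e->lock, &flags);
	return 0;
}

char _license[] SEC("license") = "GPL";

The mandatory check on the return value mirrors what check_kfunc_call enforces earlier in the series: a forked verifier state in which R0 holds a negative errno, so a program that ignores a failed acquisition does not pass verification.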