From patchwork Sun Mar 16 04:05:17 2025
X-Patchwork-Submitter: Kumar Kartikeya Dwivedi
X-Patchwork-Id: 14018309
From: Kumar Kartikeya Dwivedi
To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Barret Rhoden , Linus Torvalds , Peter Zijlstra , Will Deacon ,
    Waiman Long , Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann ,
    Martin KaFai Lau , Eduard Zingerman , "Paul E. McKenney" , Tejun Heo ,
    Josh Don , Dohyun Kim , linux-arm-kernel@lists.infradead.org,
    kkd@meta.com, kernel-team@meta.com
Subject: [PATCH bpf-next v4 01/25] locking: Move MCS struct definition to public header
Date: Sat, 15 Mar 2025 21:05:17 -0700
Message-ID: <20250316040541.108729-2-memxor@gmail.com>
In-Reply-To: <20250316040541.108729-1-memxor@gmail.com>
References: <20250316040541.108729-1-memxor@gmail.com>

Move the definition of the struct mcs_spinlock from the private
mcs_spinlock.h header in kernel/locking to the mcs_spinlock.h asm-generic
header, since we will need to reference it from the qspinlock.h header in
subsequent commits.
Reviewed-by: Barret Rhoden
Signed-off-by: Kumar Kartikeya Dwivedi
---
 include/asm-generic/mcs_spinlock.h | 6 ++++++
 kernel/locking/mcs_spinlock.h      | 6 ------
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/include/asm-generic/mcs_spinlock.h b/include/asm-generic/mcs_spinlock.h
index 10cd4ffc6ba2..39c94012b88a 100644
--- a/include/asm-generic/mcs_spinlock.h
+++ b/include/asm-generic/mcs_spinlock.h
@@ -1,6 +1,12 @@
 #ifndef __ASM_MCS_SPINLOCK_H
 #define __ASM_MCS_SPINLOCK_H
 
+struct mcs_spinlock {
+	struct mcs_spinlock *next;
+	int locked; /* 1 if lock acquired */
+	int count;  /* nesting count, see qspinlock.c */
+};
+
 /*
  * Architectures can define their own:
  *
diff --git a/kernel/locking/mcs_spinlock.h b/kernel/locking/mcs_spinlock.h
index 85251d8771d9..16160ca8907f 100644
--- a/kernel/locking/mcs_spinlock.h
+++ b/kernel/locking/mcs_spinlock.h
@@ -15,12 +15,6 @@
 
 #include
 
-struct mcs_spinlock {
-	struct mcs_spinlock *next;
-	int locked; /* 1 if lock acquired */
-	int count;  /* nesting count, see qspinlock.c */
-};
-
 #ifndef arch_mcs_spin_lock_contended
 /*
  * Using smp_cond_load_acquire() provides the acquire semantics
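For readers new to MCS locks, the role of the three fields just made public
can be seen in a textbook MCS acquire/release. The sketch below is a
stand-alone userspace rendering using GCC/Clang __atomic builtins; it is
illustrative only and is not the kernel's implementation (which lives in
kernel/locking/mcs_spinlock.h and qspinlock.c):

#include <stddef.h>

struct mcs_spinlock {
	struct mcs_spinlock *next;
	int locked;		/* 1 if lock acquired */
	int count;		/* nesting count, unused in this sketch */
};

static struct mcs_spinlock *mcs_tail;

static void mcs_lock(struct mcs_spinlock *node)
{
	struct mcs_spinlock *prev;

	node->next = NULL;
	node->locked = 0;

	/* Swap ourselves in as the new queue tail; the old tail is our predecessor. */
	prev = __atomic_exchange_n(&mcs_tail, node, __ATOMIC_ACQ_REL);
	if (!prev)
		return;		/* queue was empty: lock acquired */

	/* Link behind the predecessor, then spin only on our own node->locked. */
	__atomic_store_n(&prev->next, node, __ATOMIC_RELEASE);
	while (!__atomic_load_n(&node->locked, __ATOMIC_ACQUIRE))
		;
}

static void mcs_unlock(struct mcs_spinlock *node)
{
	struct mcs_spinlock *expected = node, *next;

	/* No known successor: try to clear the tail and be done. */
	if (!__atomic_load_n(&node->next, __ATOMIC_ACQUIRE) &&
	    __atomic_compare_exchange_n(&mcs_tail, &expected, NULL, 0,
					__ATOMIC_RELEASE, __ATOMIC_RELAXED))
		return;

	/* A successor exists (or is enqueueing); wait for it, then hand off. */
	while (!(next = __atomic_load_n(&node->next, __ATOMIC_ACQUIRE)))
		;
	__atomic_store_n(&next->locked, 1, __ATOMIC_RELEASE);
}

Each CPU/context supplies its own node (in the kernel these come from per-CPU
qnode arrays); a waiter spins only on its private node->locked, which is what
gives MCS its cache-friendly queueing behaviour.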
From patchwork Sun Mar 16 04:05:18 2025
X-Patchwork-Submitter: Kumar Kartikeya Dwivedi
X-Patchwork-Id: 14018311
From: Kumar Kartikeya Dwivedi
To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Barret Rhoden , Linus Torvalds , Peter Zijlstra , Will Deacon ,
    Waiman Long , Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann ,
    Martin KaFai Lau , Eduard Zingerman , "Paul E.
McKenney" , Tejun Heo , Josh Don , Dohyun Kim , linux-arm-kernel@lists.infradead.org, kkd@meta.com, kernel-team@meta.com Subject: [PATCH bpf-next v4 02/25] locking: Move common qspinlock helpers to a private header Date: Sat, 15 Mar 2025 21:05:18 -0700 Message-ID: <20250316040541.108729-3-memxor@gmail.com> X-Mailer: git-send-email 2.47.1 In-Reply-To: <20250316040541.108729-1-memxor@gmail.com> References: <20250316040541.108729-1-memxor@gmail.com> MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=13562; h=from:subject; bh=P6WGJXH3MTYAaWinW8eboOniR3lR+VqqyTTeTx9qsto=; b=owEBbQKS/ZANAwAIAUzgyIZIvxHKAcsmYgBn1k3bMavRJM1KjtMppAwdQp0y3W5tnfeN56xhPQqP M1qWb9WJAjMEAAEIAB0WIQRLvip+Buz51YI8YRFM4MiGSL8RygUCZ9ZN2wAKCRBM4MiGSL8RyreTD/ 4hVzlupZxK+mXnZN8k0vM3l0bfdhXAjegYO6MB4u/i91194228LN6CqHYReD3NrKgIY7Ue/lOtB5I8 pe6xiAeiXsjhwaxVw+eERShrPtXscrLjqVkPbhYaUpztghP5FcpkEiHrtf0Ovl1mV3AioL8+6Yud4a 855N6w8ocvsmrAjpnk7ZG3LlTm2/sqyQ/oSATI/ru4AY+Npf46eYKJPRQgZmUCHnTHycySFMZ/0O9y 4sZpNEcRRxc5gekVdr1W5QxoZ8exvppkXShi14L8nOQtu07c7MIvXstpHDOjpgBiUp2S2lEaoSxwuJ e3PMjaLA59vDdLR+cjJ0ONxQDARw/hXHz0hJZ0Y6KtzdJ6YHyQrMQyboYuuk68RN9xyKHCziUOmdLw 3QB7tcInWuPYX0gO9t5CjDcd5uYjUE+jBNb5ZVL3I9eLcaDXPaydxo37NnezFHGPOAKrgVt2OpwekQ 3Ydi+cTB2XWKxI8kABZ6BZPbW2LrCGMQbmad22jcHBlWed26N8fAYEIehmtbwSCE1AlSuX+jTR0jLw wIWBpXB9Tsrio20/PuUDpgbem6EDL2zHfZhFYrLtNTtbLpI8ttEz/hBqaeOveace8E/95yjGlAzosm me8B3MJ2NYiphGOt8QwTmzvU3+qxGbwXERjjtIv0R2DUcIe8gjvYD7V85LEg== X-Developer-Key: i=memxor@gmail.com; a=openpgp; fpr=4BBE2A7E06ECF9D5823C61114CE0C88648BF11CA X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20250315_210547_534915_C384B70B X-CRM114-Status: GOOD ( 28.27 ) X-BeenThere: linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: "linux-arm-kernel" Errors-To: linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org Move qspinlock helper functions that encode, decode tail word, set and clear the pending and locked bits, and other miscellaneous definitions and macros to a private header. To this end, create a qspinlock.h header file in kernel/locking. Subsequent commits will introduce a modified qspinlock slow path function, thus moving shared code to a private header will help minimize unnecessary code duplication. Reviewed-by: Barret Rhoden Signed-off-by: Kumar Kartikeya Dwivedi --- kernel/locking/qspinlock.c | 193 +---------------------------------- kernel/locking/qspinlock.h | 200 +++++++++++++++++++++++++++++++++++++ 2 files changed, 205 insertions(+), 188 deletions(-) create mode 100644 kernel/locking/qspinlock.h diff --git a/kernel/locking/qspinlock.c b/kernel/locking/qspinlock.c index 7d96bed718e4..af8d122bb649 100644 --- a/kernel/locking/qspinlock.c +++ b/kernel/locking/qspinlock.c @@ -25,8 +25,9 @@ #include /* - * Include queued spinlock statistics code + * Include queued spinlock definitions and statistics code */ +#include "qspinlock.h" #include "qspinlock_stat.h" /* @@ -67,36 +68,6 @@ */ #include "mcs_spinlock.h" -#define MAX_NODES 4 - -/* - * On 64-bit architectures, the mcs_spinlock structure will be 16 bytes in - * size and four of them will fit nicely in one 64-byte cacheline. For - * pvqspinlock, however, we need more space for extra data. To accommodate - * that, we insert two more long words to pad it up to 32 bytes. IOW, only - * two of them can fit in a cacheline in this case. 
That is OK as it is rare - * to have more than 2 levels of slowpath nesting in actual use. We don't - * want to penalize pvqspinlocks to optimize for a rare case in native - * qspinlocks. - */ -struct qnode { - struct mcs_spinlock mcs; -#ifdef CONFIG_PARAVIRT_SPINLOCKS - long reserved[2]; -#endif -}; - -/* - * The pending bit spinning loop count. - * This heuristic is used to limit the number of lockword accesses - * made by atomic_cond_read_relaxed when waiting for the lock to - * transition out of the "== _Q_PENDING_VAL" state. We don't spin - * indefinitely because there's no guarantee that we'll make forward - * progress. - */ -#ifndef _Q_PENDING_LOOPS -#define _Q_PENDING_LOOPS 1 -#endif /* * Per-CPU queue node structures; we can never have more than 4 nested @@ -106,161 +77,7 @@ struct qnode { * * PV doubles the storage and uses the second cacheline for PV state. */ -static DEFINE_PER_CPU_ALIGNED(struct qnode, qnodes[MAX_NODES]); - -/* - * We must be able to distinguish between no-tail and the tail at 0:0, - * therefore increment the cpu number by one. - */ - -static inline __pure u32 encode_tail(int cpu, int idx) -{ - u32 tail; - - tail = (cpu + 1) << _Q_TAIL_CPU_OFFSET; - tail |= idx << _Q_TAIL_IDX_OFFSET; /* assume < 4 */ - - return tail; -} - -static inline __pure struct mcs_spinlock *decode_tail(u32 tail) -{ - int cpu = (tail >> _Q_TAIL_CPU_OFFSET) - 1; - int idx = (tail & _Q_TAIL_IDX_MASK) >> _Q_TAIL_IDX_OFFSET; - - return per_cpu_ptr(&qnodes[idx].mcs, cpu); -} - -static inline __pure -struct mcs_spinlock *grab_mcs_node(struct mcs_spinlock *base, int idx) -{ - return &((struct qnode *)base + idx)->mcs; -} - -#define _Q_LOCKED_PENDING_MASK (_Q_LOCKED_MASK | _Q_PENDING_MASK) - -#if _Q_PENDING_BITS == 8 -/** - * clear_pending - clear the pending bit. - * @lock: Pointer to queued spinlock structure - * - * *,1,* -> *,0,* - */ -static __always_inline void clear_pending(struct qspinlock *lock) -{ - WRITE_ONCE(lock->pending, 0); -} - -/** - * clear_pending_set_locked - take ownership and clear the pending bit. - * @lock: Pointer to queued spinlock structure - * - * *,1,0 -> *,0,1 - * - * Lock stealing is not allowed if this function is used. - */ -static __always_inline void clear_pending_set_locked(struct qspinlock *lock) -{ - WRITE_ONCE(lock->locked_pending, _Q_LOCKED_VAL); -} - -/* - * xchg_tail - Put in the new queue tail code word & retrieve previous one - * @lock : Pointer to queued spinlock structure - * @tail : The new queue tail code word - * Return: The previous queue tail code word - * - * xchg(lock, tail), which heads an address dependency - * - * p,*,* -> n,*,* ; prev = xchg(lock, node) - */ -static __always_inline u32 xchg_tail(struct qspinlock *lock, u32 tail) -{ - /* - * We can use relaxed semantics since the caller ensures that the - * MCS node is properly initialized before updating the tail. - */ - return (u32)xchg_relaxed(&lock->tail, - tail >> _Q_TAIL_OFFSET) << _Q_TAIL_OFFSET; -} - -#else /* _Q_PENDING_BITS == 8 */ - -/** - * clear_pending - clear the pending bit. - * @lock: Pointer to queued spinlock structure - * - * *,1,* -> *,0,* - */ -static __always_inline void clear_pending(struct qspinlock *lock) -{ - atomic_andnot(_Q_PENDING_VAL, &lock->val); -} - -/** - * clear_pending_set_locked - take ownership and clear the pending bit. 
- * @lock: Pointer to queued spinlock structure - * - * *,1,0 -> *,0,1 - */ -static __always_inline void clear_pending_set_locked(struct qspinlock *lock) -{ - atomic_add(-_Q_PENDING_VAL + _Q_LOCKED_VAL, &lock->val); -} - -/** - * xchg_tail - Put in the new queue tail code word & retrieve previous one - * @lock : Pointer to queued spinlock structure - * @tail : The new queue tail code word - * Return: The previous queue tail code word - * - * xchg(lock, tail) - * - * p,*,* -> n,*,* ; prev = xchg(lock, node) - */ -static __always_inline u32 xchg_tail(struct qspinlock *lock, u32 tail) -{ - u32 old, new; - - old = atomic_read(&lock->val); - do { - new = (old & _Q_LOCKED_PENDING_MASK) | tail; - /* - * We can use relaxed semantics since the caller ensures that - * the MCS node is properly initialized before updating the - * tail. - */ - } while (!atomic_try_cmpxchg_relaxed(&lock->val, &old, new)); - - return old; -} -#endif /* _Q_PENDING_BITS == 8 */ - -/** - * queued_fetch_set_pending_acquire - fetch the whole lock value and set pending - * @lock : Pointer to queued spinlock structure - * Return: The previous lock value - * - * *,*,* -> *,1,* - */ -#ifndef queued_fetch_set_pending_acquire -static __always_inline u32 queued_fetch_set_pending_acquire(struct qspinlock *lock) -{ - return atomic_fetch_or_acquire(_Q_PENDING_VAL, &lock->val); -} -#endif - -/** - * set_locked - Set the lock bit and own the lock - * @lock: Pointer to queued spinlock structure - * - * *,*,0 -> *,0,1 - */ -static __always_inline void set_locked(struct qspinlock *lock) -{ - WRITE_ONCE(lock->locked, _Q_LOCKED_VAL); -} - +static DEFINE_PER_CPU_ALIGNED(struct qnode, qnodes[_Q_MAX_NODES]); /* * Generate the native code for queued_spin_unlock_slowpath(); provide NOPs for @@ -410,7 +227,7 @@ void __lockfunc queued_spin_lock_slowpath(struct qspinlock *lock, u32 val) * any MCS node. This is not the most elegant solution, but is * simple enough. */ - if (unlikely(idx >= MAX_NODES)) { + if (unlikely(idx >= _Q_MAX_NODES)) { lockevent_inc(lock_no_node); while (!queued_spin_trylock(lock)) cpu_relax(); @@ -465,7 +282,7 @@ void __lockfunc queued_spin_lock_slowpath(struct qspinlock *lock, u32 val) * head of the waitqueue. */ if (old & _Q_TAIL_MASK) { - prev = decode_tail(old); + prev = decode_tail(old, qnodes); /* Link @node into the waitqueue. */ WRITE_ONCE(prev->next, node); diff --git a/kernel/locking/qspinlock.h b/kernel/locking/qspinlock.h new file mode 100644 index 000000000000..d4ceb9490365 --- /dev/null +++ b/kernel/locking/qspinlock.h @@ -0,0 +1,200 @@ +/* SPDX-License-Identifier: GPL-2.0-or-later */ +/* + * Queued spinlock defines + * + * This file contains macro definitions and functions shared between different + * qspinlock slow path implementations. + */ +#ifndef __LINUX_QSPINLOCK_H +#define __LINUX_QSPINLOCK_H + +#include +#include +#include +#include + +#define _Q_MAX_NODES 4 + +/* + * The pending bit spinning loop count. + * This heuristic is used to limit the number of lockword accesses + * made by atomic_cond_read_relaxed when waiting for the lock to + * transition out of the "== _Q_PENDING_VAL" state. We don't spin + * indefinitely because there's no guarantee that we'll make forward + * progress. + */ +#ifndef _Q_PENDING_LOOPS +#define _Q_PENDING_LOOPS 1 +#endif + +/* + * On 64-bit architectures, the mcs_spinlock structure will be 16 bytes in + * size and four of them will fit nicely in one 64-byte cacheline. For + * pvqspinlock, however, we need more space for extra data. 
To accommodate + * that, we insert two more long words to pad it up to 32 bytes. IOW, only + * two of them can fit in a cacheline in this case. That is OK as it is rare + * to have more than 2 levels of slowpath nesting in actual use. We don't + * want to penalize pvqspinlocks to optimize for a rare case in native + * qspinlocks. + */ +struct qnode { + struct mcs_spinlock mcs; +#ifdef CONFIG_PARAVIRT_SPINLOCKS + long reserved[2]; +#endif +}; + +/* + * We must be able to distinguish between no-tail and the tail at 0:0, + * therefore increment the cpu number by one. + */ + +static inline __pure u32 encode_tail(int cpu, int idx) +{ + u32 tail; + + tail = (cpu + 1) << _Q_TAIL_CPU_OFFSET; + tail |= idx << _Q_TAIL_IDX_OFFSET; /* assume < 4 */ + + return tail; +} + +static inline __pure struct mcs_spinlock *decode_tail(u32 tail, struct qnode *qnodes) +{ + int cpu = (tail >> _Q_TAIL_CPU_OFFSET) - 1; + int idx = (tail & _Q_TAIL_IDX_MASK) >> _Q_TAIL_IDX_OFFSET; + + return per_cpu_ptr(&qnodes[idx].mcs, cpu); +} + +static inline __pure +struct mcs_spinlock *grab_mcs_node(struct mcs_spinlock *base, int idx) +{ + return &((struct qnode *)base + idx)->mcs; +} + +#define _Q_LOCKED_PENDING_MASK (_Q_LOCKED_MASK | _Q_PENDING_MASK) + +#if _Q_PENDING_BITS == 8 +/** + * clear_pending - clear the pending bit. + * @lock: Pointer to queued spinlock structure + * + * *,1,* -> *,0,* + */ +static __always_inline void clear_pending(struct qspinlock *lock) +{ + WRITE_ONCE(lock->pending, 0); +} + +/** + * clear_pending_set_locked - take ownership and clear the pending bit. + * @lock: Pointer to queued spinlock structure + * + * *,1,0 -> *,0,1 + * + * Lock stealing is not allowed if this function is used. + */ +static __always_inline void clear_pending_set_locked(struct qspinlock *lock) +{ + WRITE_ONCE(lock->locked_pending, _Q_LOCKED_VAL); +} + +/* + * xchg_tail - Put in the new queue tail code word & retrieve previous one + * @lock : Pointer to queued spinlock structure + * @tail : The new queue tail code word + * Return: The previous queue tail code word + * + * xchg(lock, tail), which heads an address dependency + * + * p,*,* -> n,*,* ; prev = xchg(lock, node) + */ +static __always_inline u32 xchg_tail(struct qspinlock *lock, u32 tail) +{ + /* + * We can use relaxed semantics since the caller ensures that the + * MCS node is properly initialized before updating the tail. + */ + return (u32)xchg_relaxed(&lock->tail, + tail >> _Q_TAIL_OFFSET) << _Q_TAIL_OFFSET; +} + +#else /* _Q_PENDING_BITS == 8 */ + +/** + * clear_pending - clear the pending bit. + * @lock: Pointer to queued spinlock structure + * + * *,1,* -> *,0,* + */ +static __always_inline void clear_pending(struct qspinlock *lock) +{ + atomic_andnot(_Q_PENDING_VAL, &lock->val); +} + +/** + * clear_pending_set_locked - take ownership and clear the pending bit. 
+ * @lock: Pointer to queued spinlock structure
+ *
+ * *,1,0 -> *,0,1
+ */
+static __always_inline void clear_pending_set_locked(struct qspinlock *lock)
+{
+	atomic_add(-_Q_PENDING_VAL + _Q_LOCKED_VAL, &lock->val);
+}
+
+/**
+ * xchg_tail - Put in the new queue tail code word & retrieve previous one
+ * @lock : Pointer to queued spinlock structure
+ * @tail : The new queue tail code word
+ * Return: The previous queue tail code word
+ *
+ * xchg(lock, tail)
+ *
+ * p,*,* -> n,*,* ; prev = xchg(lock, node)
+ */
+static __always_inline u32 xchg_tail(struct qspinlock *lock, u32 tail)
+{
+	u32 old, new;
+
+	old = atomic_read(&lock->val);
+	do {
+		new = (old & _Q_LOCKED_PENDING_MASK) | tail;
+		/*
+		 * We can use relaxed semantics since the caller ensures that
+		 * the MCS node is properly initialized before updating the
+		 * tail.
+		 */
+	} while (!atomic_try_cmpxchg_relaxed(&lock->val, &old, new));
+
+	return old;
+}
+#endif /* _Q_PENDING_BITS == 8 */
+
+/**
+ * queued_fetch_set_pending_acquire - fetch the whole lock value and set pending
+ * @lock : Pointer to queued spinlock structure
+ * Return: The previous lock value
+ *
+ * *,*,* -> *,1,*
+ */
+#ifndef queued_fetch_set_pending_acquire
+static __always_inline u32 queued_fetch_set_pending_acquire(struct qspinlock *lock)
+{
+	return atomic_fetch_or_acquire(_Q_PENDING_VAL, &lock->val);
+}
+#endif
+
+/**
+ * set_locked - Set the lock bit and own the lock
+ * @lock: Pointer to queued spinlock structure
+ *
+ * *,*,0 -> *,0,1
+ */
+static __always_inline void set_locked(struct qspinlock *lock)
+{
+	WRITE_ONCE(lock->locked, _Q_LOCKED_VAL);
+}
+
+#endif /* __LINUX_QSPINLOCK_H */
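As a worked example of the tail code word that encode_tail()/decode_tail()
manipulate: assuming the common qspinlock_types.h layout (NR_CPUS < 16K, so
_Q_TAIL_IDX_OFFSET is 16 and _Q_TAIL_CPU_OFFSET is 18), CPU 5 at nesting
level 2 encodes to 0x1A0000. A small stand-alone check of the round trip,
illustrative only and with those offsets hard-coded under that assumption:

#include <assert.h>
#include <stdint.h>

#define TAIL_IDX_OFFSET	16			/* _Q_TAIL_IDX_OFFSET */
#define TAIL_IDX_MASK	(0x3U << TAIL_IDX_OFFSET)	/* 2-bit nesting index */
#define TAIL_CPU_OFFSET	18			/* _Q_TAIL_CPU_OFFSET */

static uint32_t encode_tail(int cpu, int idx)
{
	/* cpu is stored +1 so that "no tail" (0) differs from cpu 0, idx 0 */
	return ((uint32_t)(cpu + 1) << TAIL_CPU_OFFSET) |
	       ((uint32_t)idx << TAIL_IDX_OFFSET);
}

int main(void)
{
	uint32_t tail = encode_tail(5, 2);	/* cpu 5, nesting level 2 */

	assert(tail == 0x1A0000);
	assert(((int)(tail >> TAIL_CPU_OFFSET) - 1) == 5);	/* decode cpu */
	assert(((tail & TAIL_IDX_MASK) >> TAIL_IDX_OFFSET) == 2); /* decode idx */
	return 0;
}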
From patchwork Sun Mar 16 04:05:19 2025
X-Patchwork-Submitter: Kumar Kartikeya Dwivedi
X-Patchwork-Id: 14018312
From: Kumar Kartikeya Dwivedi
To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Barret Rhoden , Linus Torvalds , Peter Zijlstra , Will Deacon ,
    Waiman Long , Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann ,
    Martin KaFai Lau , Eduard Zingerman , "Paul E.
McKenney" , Tejun Heo , Josh Don , Dohyun Kim , linux-arm-kernel@lists.infradead.org, kkd@meta.com, kernel-team@meta.com Subject: [PATCH bpf-next v4 03/25] locking: Allow obtaining result of arch_mcs_spin_lock_contended Date: Sat, 15 Mar 2025 21:05:19 -0700 Message-ID: <20250316040541.108729-4-memxor@gmail.com> X-Mailer: git-send-email 2.47.1 In-Reply-To: <20250316040541.108729-1-memxor@gmail.com> References: <20250316040541.108729-1-memxor@gmail.com> MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=1052; h=from:subject; bh=bCYnbE+K73lTBoOLttYSRrRShne2EKo6ypzNdER5dO0=; b=owEBbQKS/ZANAwAIAUzgyIZIvxHKAcsmYgBn1k3bZVJO+UtaXh2ksd2ygsxMkhcgERBwBjs9FllI XcQQD02JAjMEAAEIAB0WIQRLvip+Buz51YI8YRFM4MiGSL8RygUCZ9ZN2wAKCRBM4MiGSL8RyrE+D/ 9mak4NXwqkExtA6v4P4pv81udcNOxwDMXUb35o6/CJjxArzw0HvdY428jcentyE0WrkYShVveSLa8v gbZpOY+R4DxGLAG9A7vhsHWyBPTG4Nm/fz4K6mql1dG8Rn86G+pi7KNm9DUbA8dhS79edorwxQ74w2 ZgfwIKcHyHKk//0Zzo+3TqqGOY647CW2eelFr/fOUbVF3aHHtj3H9SF2Cn5TYkGwX37s0nQT08NeRC 6lC3cRBakQV16GrBRvnzuxwKCr0riv4WOvH+At23JAXrPoKxRRCJyHthJUB5AjLXXVpWjd3vQRZ84e m4j04JXHSKMor4vL6CJNdPbQ45sjOyPq0IX85pP4AbEnoRaoWA17Z6Sc8YfUIE7vO6miyigFrZ/abj THdDEqULYZgI9XRQlWB8yDm/grvuD87SrkKzJWdCqZOqs254FuPsbYcY7KBd395/gAGmRcGdmPdBEn vM+vkNfbds7UMoaPYMZ25ck7vCMuPGGuBlKCj7EkeTPCiIB0tGsLZGXd2YsjABChEPH7pPPLS7CnTm TU+UtlJRzlhncgkJdmlb8xaZyibSXEjKB9b3pd4fVNDotHhyB0wT74fhCqGrDkuRX7vV7yNHq1zr5J hr44cNlreYhdVZJm7c3j0JwwYDA2zXpv5B2N5U9pbKtklVtFtNJa4aT0axtw== X-Developer-Key: i=memxor@gmail.com; a=openpgp; fpr=4BBE2A7E06ECF9D5823C61114CE0C88648BF11CA X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20250315_210548_880345_A62195B6 X-CRM114-Status: GOOD ( 11.04 ) X-BeenThere: linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: "linux-arm-kernel" Errors-To: linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org To support upcoming changes that require inspecting the return value once the conditional waiting loop in arch_mcs_spin_lock_contended terminates, modify the macro to preserve the result of smp_cond_load_acquire. This enables checking the return value as needed, which will help disambiguate the MCS node’s locked state in future patches. Reviewed-by: Barret Rhoden Signed-off-by: Kumar Kartikeya Dwivedi --- kernel/locking/mcs_spinlock.h | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/kernel/locking/mcs_spinlock.h b/kernel/locking/mcs_spinlock.h index 16160ca8907f..5c92ba199b90 100644 --- a/kernel/locking/mcs_spinlock.h +++ b/kernel/locking/mcs_spinlock.h @@ -24,9 +24,7 @@ * spinning, and smp_cond_load_acquire() provides that behavior. 
  */
 #define arch_mcs_spin_lock_contended(l)				\
-do {									\
-	smp_cond_load_acquire(l, VAL);					\
-} while (0)
+	smp_cond_load_acquire(l, VAL)
 #endif
 
 #ifndef arch_mcs_spin_unlock_contended
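With arch_mcs_spin_lock_contended() now being an expression rather than a
do/while statement, a waiter can branch on the value that terminated the
wait. An illustrative fragment of a hypothetical caller follows; the real
consumer of the return value appears later in this series, and
handle_waiter_wakeup() is a made-up name used only for illustration:

	int val;

	/* Evaluates to the final value observed by smp_cond_load_acquire(),
	 * not just "done waiting". */
	val = arch_mcs_spin_lock_contended(&node->locked);
	if (val != 1)
		handle_waiter_wakeup(node, val);	/* hypothetical helper */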
From patchwork Sun Mar 16 04:05:20 2025
X-Patchwork-Submitter: Kumar Kartikeya Dwivedi
X-Patchwork-Id: 14018313
From: Kumar Kartikeya Dwivedi
To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Barret Rhoden , Linus Torvalds , Peter Zijlstra , Will Deacon ,
    Waiman Long , Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann ,
    Martin KaFai Lau , Eduard Zingerman , "Paul E. McKenney" , Tejun Heo ,
    Josh Don , Dohyun Kim , linux-arm-kernel@lists.infradead.org,
    kkd@meta.com, kernel-team@meta.com
Subject: [PATCH bpf-next v4 04/25] locking: Copy out qspinlock.c to kernel/bpf/rqspinlock.c
Date: Sat, 15 Mar 2025 21:05:20 -0700
Message-ID: <20250316040541.108729-5-memxor@gmail.com>
In-Reply-To: <20250316040541.108729-1-memxor@gmail.com>
References: <20250316040541.108729-1-memxor@gmail.com>

In preparation for introducing a new lock implementation, Resilient Queued
Spin Lock, or rqspinlock, we first begin our modifications by using the
existing qspinlock.c code as
the base. Simply copy the code to a new file and rename functions and variables from 'queued' to 'resilient_queued'. Since we place the file in kernel/bpf, include needs to be relative. This helps each subsequent commit in clearly showing how and where the code is being changed. The only change after a literal copy in this commit is renaming the functions where necessary, and rename qnodes to rqnodes. Let's also use EXPORT_SYMBOL_GPL for rqspinlock slowpath. Reviewed-by: Barret Rhoden Signed-off-by: Kumar Kartikeya Dwivedi --- kernel/bpf/rqspinlock.c | 410 ++++++++++++++++++++++++++++++++++++++++ 1 file changed, 410 insertions(+) create mode 100644 kernel/bpf/rqspinlock.c diff --git a/kernel/bpf/rqspinlock.c b/kernel/bpf/rqspinlock.c new file mode 100644 index 000000000000..762108cb0f38 --- /dev/null +++ b/kernel/bpf/rqspinlock.c @@ -0,0 +1,410 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* + * Resilient Queued Spin Lock + * + * (C) Copyright 2013-2015 Hewlett-Packard Development Company, L.P. + * (C) Copyright 2013-2014,2018 Red Hat, Inc. + * (C) Copyright 2015 Intel Corp. + * (C) Copyright 2015 Hewlett-Packard Enterprise Development LP + * + * Authors: Waiman Long + * Peter Zijlstra + */ + +#ifndef _GEN_PV_LOCK_SLOWPATH + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +/* + * Include queued spinlock definitions and statistics code + */ +#include "../locking/qspinlock.h" +#include "../locking/qspinlock_stat.h" + +/* + * The basic principle of a queue-based spinlock can best be understood + * by studying a classic queue-based spinlock implementation called the + * MCS lock. A copy of the original MCS lock paper ("Algorithms for Scalable + * Synchronization on Shared-Memory Multiprocessors by Mellor-Crummey and + * Scott") is available at + * + * https://bugzilla.kernel.org/show_bug.cgi?id=206115 + * + * This queued spinlock implementation is based on the MCS lock, however to + * make it fit the 4 bytes we assume spinlock_t to be, and preserve its + * existing API, we must modify it somehow. + * + * In particular; where the traditional MCS lock consists of a tail pointer + * (8 bytes) and needs the next pointer (another 8 bytes) of its own node to + * unlock the next pending (next->locked), we compress both these: {tail, + * next->locked} into a single u32 value. + * + * Since a spinlock disables recursion of its own context and there is a limit + * to the contexts that can nest; namely: task, softirq, hardirq, nmi. As there + * are at most 4 nesting levels, it can be encoded by a 2-bit number. Now + * we can encode the tail by combining the 2-bit nesting level with the cpu + * number. With one byte for the lock value and 3 bytes for the tail, only a + * 32-bit word is now needed. Even though we only need 1 bit for the lock, + * we extend it to a full byte to achieve better performance for architectures + * that support atomic byte write. + * + * We also change the first spinner to spin on the lock bit instead of its + * node; whereby avoiding the need to carry a node from lock to unlock, and + * preserving existing lock API. This also makes the unlock code simpler and + * faster. + * + * N.B. The current implementation only supports architectures that allow + * atomic operations on smaller 8-bit and 16-bit data types. + * + */ + +#include "../locking/mcs_spinlock.h" + +/* + * Per-CPU queue node structures; we can never have more than 4 nested + * contexts: task, softirq, hardirq, nmi. 
+ * + * Exactly fits one 64-byte cacheline on a 64-bit architecture. + * + * PV doubles the storage and uses the second cacheline for PV state. + */ +static DEFINE_PER_CPU_ALIGNED(struct qnode, rqnodes[_Q_MAX_NODES]); + +/* + * Generate the native code for resilient_queued_spin_unlock_slowpath(); provide NOPs + * for all the PV callbacks. + */ + +static __always_inline void __pv_init_node(struct mcs_spinlock *node) { } +static __always_inline void __pv_wait_node(struct mcs_spinlock *node, + struct mcs_spinlock *prev) { } +static __always_inline void __pv_kick_node(struct qspinlock *lock, + struct mcs_spinlock *node) { } +static __always_inline u32 __pv_wait_head_or_lock(struct qspinlock *lock, + struct mcs_spinlock *node) + { return 0; } + +#define pv_enabled() false + +#define pv_init_node __pv_init_node +#define pv_wait_node __pv_wait_node +#define pv_kick_node __pv_kick_node +#define pv_wait_head_or_lock __pv_wait_head_or_lock + +#ifdef CONFIG_PARAVIRT_SPINLOCKS +#define resilient_queued_spin_lock_slowpath native_resilient_queued_spin_lock_slowpath +#endif + +#endif /* _GEN_PV_LOCK_SLOWPATH */ + +/** + * resilient_queued_spin_lock_slowpath - acquire the queued spinlock + * @lock: Pointer to queued spinlock structure + * @val: Current value of the queued spinlock 32-bit word + * + * (queue tail, pending bit, lock value) + * + * fast : slow : unlock + * : : + * uncontended (0,0,0) -:--> (0,0,1) ------------------------------:--> (*,*,0) + * : | ^--------.------. / : + * : v \ \ | : + * pending : (0,1,1) +--> (0,1,0) \ | : + * : | ^--' | | : + * : v | | : + * uncontended : (n,x,y) +--> (n,0,0) --' | : + * queue : | ^--' | : + * : v | : + * contended : (*,x,y) +--> (*,0,0) ---> (*,0,1) -' : + * queue : ^--' : + */ +void __lockfunc resilient_queued_spin_lock_slowpath(struct qspinlock *lock, u32 val) +{ + struct mcs_spinlock *prev, *next, *node; + u32 old, tail; + int idx; + + BUILD_BUG_ON(CONFIG_NR_CPUS >= (1U << _Q_TAIL_CPU_BITS)); + + if (pv_enabled()) + goto pv_queue; + + if (virt_spin_lock(lock)) + return; + + /* + * Wait for in-progress pending->locked hand-overs with a bounded + * number of spins so that we guarantee forward progress. + * + * 0,1,0 -> 0,0,1 + */ + if (val == _Q_PENDING_VAL) { + int cnt = _Q_PENDING_LOOPS; + val = atomic_cond_read_relaxed(&lock->val, + (VAL != _Q_PENDING_VAL) || !cnt--); + } + + /* + * If we observe any contention; queue. + */ + if (val & ~_Q_LOCKED_MASK) + goto queue; + + /* + * trylock || pending + * + * 0,0,* -> 0,1,* -> 0,0,1 pending, trylock + */ + val = queued_fetch_set_pending_acquire(lock); + + /* + * If we observe contention, there is a concurrent locker. + * + * Undo and queue; our setting of PENDING might have made the + * n,0,0 -> 0,0,0 transition fail and it will now be waiting + * on @next to become !NULL. + */ + if (unlikely(val & ~_Q_LOCKED_MASK)) { + + /* Undo PENDING if we set it. */ + if (!(val & _Q_PENDING_MASK)) + clear_pending(lock); + + goto queue; + } + + /* + * We're pending, wait for the owner to go away. + * + * 0,1,1 -> *,1,0 + * + * this wait loop must be a load-acquire such that we match the + * store-release that clears the locked bit and create lock + * sequentiality; this is because not all + * clear_pending_set_locked() implementations imply full + * barriers. + */ + if (val & _Q_LOCKED_MASK) + smp_cond_load_acquire(&lock->locked, !VAL); + + /* + * take ownership and clear the pending bit. 
+ * + * 0,1,0 -> 0,0,1 + */ + clear_pending_set_locked(lock); + lockevent_inc(lock_pending); + return; + + /* + * End of pending bit optimistic spinning and beginning of MCS + * queuing. + */ +queue: + lockevent_inc(lock_slowpath); +pv_queue: + node = this_cpu_ptr(&rqnodes[0].mcs); + idx = node->count++; + tail = encode_tail(smp_processor_id(), idx); + + trace_contention_begin(lock, LCB_F_SPIN); + + /* + * 4 nodes are allocated based on the assumption that there will + * not be nested NMIs taking spinlocks. That may not be true in + * some architectures even though the chance of needing more than + * 4 nodes will still be extremely unlikely. When that happens, + * we fall back to spinning on the lock directly without using + * any MCS node. This is not the most elegant solution, but is + * simple enough. + */ + if (unlikely(idx >= _Q_MAX_NODES)) { + lockevent_inc(lock_no_node); + while (!queued_spin_trylock(lock)) + cpu_relax(); + goto release; + } + + node = grab_mcs_node(node, idx); + + /* + * Keep counts of non-zero index values: + */ + lockevent_cond_inc(lock_use_node2 + idx - 1, idx); + + /* + * Ensure that we increment the head node->count before initialising + * the actual node. If the compiler is kind enough to reorder these + * stores, then an IRQ could overwrite our assignments. + */ + barrier(); + + node->locked = 0; + node->next = NULL; + pv_init_node(node); + + /* + * We touched a (possibly) cold cacheline in the per-cpu queue node; + * attempt the trylock once more in the hope someone let go while we + * weren't watching. + */ + if (queued_spin_trylock(lock)) + goto release; + + /* + * Ensure that the initialisation of @node is complete before we + * publish the updated tail via xchg_tail() and potentially link + * @node into the waitqueue via WRITE_ONCE(prev->next, node) below. + */ + smp_wmb(); + + /* + * Publish the updated tail. + * We have already touched the queueing cacheline; don't bother with + * pending stuff. + * + * p,*,* -> n,*,* + */ + old = xchg_tail(lock, tail); + next = NULL; + + /* + * if there was a previous node; link it and wait until reaching the + * head of the waitqueue. + */ + if (old & _Q_TAIL_MASK) { + prev = decode_tail(old, rqnodes); + + /* Link @node into the waitqueue. */ + WRITE_ONCE(prev->next, node); + + pv_wait_node(node, prev); + arch_mcs_spin_lock_contended(&node->locked); + + /* + * While waiting for the MCS lock, the next pointer may have + * been set by another lock waiter. We optimistically load + * the next pointer & prefetch the cacheline for writing + * to reduce latency in the upcoming MCS unlock operation. + */ + next = READ_ONCE(node->next); + if (next) + prefetchw(next); + } + + /* + * we're at the head of the waitqueue, wait for the owner & pending to + * go away. + * + * *,x,y -> *,0,0 + * + * this wait loop must use a load-acquire such that we match the + * store-release that clears the locked bit and create lock + * sequentiality; this is because the set_locked() function below + * does not imply a full barrier. + * + * The PV pv_wait_head_or_lock function, if active, will acquire + * the lock and return a non-zero value. So we have to skip the + * atomic_cond_read_acquire() call. As the next PV queue head hasn't + * been designated yet, there is no way for the locked value to become + * _Q_SLOW_VAL. So both the set_locked() and the + * atomic_cmpxchg_relaxed() calls will be safe. + * + * If PV isn't active, 0 will be returned instead. 
+ * + */ + if ((val = pv_wait_head_or_lock(lock, node))) + goto locked; + + val = atomic_cond_read_acquire(&lock->val, !(VAL & _Q_LOCKED_PENDING_MASK)); + +locked: + /* + * claim the lock: + * + * n,0,0 -> 0,0,1 : lock, uncontended + * *,*,0 -> *,*,1 : lock, contended + * + * If the queue head is the only one in the queue (lock value == tail) + * and nobody is pending, clear the tail code and grab the lock. + * Otherwise, we only need to grab the lock. + */ + + /* + * In the PV case we might already have _Q_LOCKED_VAL set, because + * of lock stealing; therefore we must also allow: + * + * n,0,1 -> 0,0,1 + * + * Note: at this point: (val & _Q_PENDING_MASK) == 0, because of the + * above wait condition, therefore any concurrent setting of + * PENDING will make the uncontended transition fail. + */ + if ((val & _Q_TAIL_MASK) == tail) { + if (atomic_try_cmpxchg_relaxed(&lock->val, &val, _Q_LOCKED_VAL)) + goto release; /* No contention */ + } + + /* + * Either somebody is queued behind us or _Q_PENDING_VAL got set + * which will then detect the remaining tail and queue behind us + * ensuring we'll see a @next. + */ + set_locked(lock); + + /* + * contended path; wait for next if not observed yet, release. + */ + if (!next) + next = smp_cond_load_relaxed(&node->next, (VAL)); + + arch_mcs_spin_unlock_contended(&next->locked); + pv_kick_node(lock, next); + +release: + trace_contention_end(lock, 0); + + /* + * release the node + */ + __this_cpu_dec(rqnodes[0].mcs.count); +} +EXPORT_SYMBOL_GPL(resilient_queued_spin_lock_slowpath); + +/* + * Generate the paravirt code for resilient_queued_spin_unlock_slowpath(). + */ +#if !defined(_GEN_PV_LOCK_SLOWPATH) && defined(CONFIG_PARAVIRT_SPINLOCKS) +#define _GEN_PV_LOCK_SLOWPATH + +#undef pv_enabled +#define pv_enabled() true + +#undef pv_init_node +#undef pv_wait_node +#undef pv_kick_node +#undef pv_wait_head_or_lock + +#undef resilient_queued_spin_lock_slowpath +#define resilient_queued_spin_lock_slowpath __pv_resilient_queued_spin_lock_slowpath + +#include "../locking/qspinlock_paravirt.h" +#include "rqspinlock.c" + +bool nopvspin; +static __init int parse_nopvspin(char *arg) +{ + nopvspin = true; + return 0; +} +early_param("nopvspin", parse_nopvspin); +#endif From patchwork Sun Mar 16 04:05:21 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kumar Kartikeya Dwivedi X-Patchwork-Id: 14018314 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 7C725C28B30 for ; Sun, 16 Mar 2025 04:16:10 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=bombadil.20210309; h=Sender:List-Subscribe:List-Help :List-Post:List-Archive:List-Unsubscribe:List-Id:Content-Transfer-Encoding: MIME-Version:References:In-Reply-To:Message-ID:Date:Subject:Cc:To:From: Reply-To:Content-Type:Content-ID:Content-Description:Resent-Date:Resent-From: Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID:List-Owner; bh=beVjeo/VucQmznYM//yRk+lY9Xj8BUazd5x3dTjfvms=; b=BYzm+qEb8EWHrj/efeNBwogymS bw6qORP96UNfSlmp8JLvq6GrRVznYsQC2GMNxInl5ZezHvIvfTt3K3PGt9XldwNP8SbZLmeniv6RI hHRBXG4+y4DQm8fIkkQ8Q7yeysPzUfYzSPkKvANX/2YxdxmBFR6ph1SAlJuj5H18DcII0yERNG7I5 
From: Kumar Kartikeya Dwivedi
To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Barret Rhoden , Linus Torvalds , Peter Zijlstra , Will Deacon ,
    Waiman Long , Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann ,
    Martin KaFai Lau , Eduard Zingerman , "Paul E.
McKenney" , Tejun Heo , Josh Don , Dohyun Kim , linux-arm-kernel@lists.infradead.org, kkd@meta.com, kernel-team@meta.com Subject: [PATCH bpf-next v4 05/25] rqspinlock: Add rqspinlock.h header Date: Sat, 15 Mar 2025 21:05:21 -0700 Message-ID: <20250316040541.108729-6-memxor@gmail.com> X-Mailer: git-send-email 2.47.1 In-Reply-To: <20250316040541.108729-1-memxor@gmail.com> References: <20250316040541.108729-1-memxor@gmail.com> MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=2286; h=from:subject; bh=J879+EgbuyTebrgMR7yNECE6EUzDiFxlkQMxluMZnf4=; b=owEBbQKS/ZANAwAIAUzgyIZIvxHKAcsmYgBn1k3b9elCAd3YQ236M81hMVLjSuOppZtbhch+VA6/ +rsNVFWJAjMEAAEIAB0WIQRLvip+Buz51YI8YRFM4MiGSL8RygUCZ9ZN2wAKCRBM4MiGSL8RyshkEA CCv98+4lY1p9/sFVO0M2tPH7eTtOTASKJhakQbPqh1VQV36TyCzSaQMxrKctCpN8C7owWTC01bHGqc uI+Z9UudyPSj0Cm7RYxCrP93Lb3Uayvcj3YxfZ4ECgMrBwLS3EOpGMhRJd60Dcb6XFtUQOMUQvrve4 cuGnRz6yfsa+vQNLSW1t0H2BODbKRzlFEu1WGD42YiEzjN4knrh/owAlZ1Av8cUPJpxMNzueQASucb eGyQ18wsYNVZjVcc/wp4dif0cQLiLc7c+6E9bxeuTzbgRHtAxzROq9WCxfhDOPYLlZe5m/aFW2HojL 7ilY58wpGW0Rhs3XUzv4sGUT9tj3GcDBl0dm7dPpLB4d0JrX4YkowIj4b/07SIASKj3ovG7wFW+o0m 5qe+QjTAhKQ1RgSIfkCaRVQGQBhPFqBL5Zf1/JTJ1VmHf0uuxx8/7CvgRM+p5KETo8myeIC5Tt9//M B+5VUrfWUAdSF6KWRzKCTFpqBHuJS6T+4JbZX2he0aHaQZOHxX2qn3lDC6VzuHL5BR5XX5LdizGzBV f/EiK7VPNweuCM6BurIbBqvfKkiq86dQe15jvoRriwpQuGOlaOaeUayhNQDPOQpAlufShTTNKO/MiT z5CdzlZbqINK5Uj9m28uAQkGmSkxfPYss0RS+pN4X5EuZINLNuJyoxi2UMzA== X-Developer-Key: i=memxor@gmail.com; a=openpgp; fpr=4BBE2A7E06ECF9D5823C61114CE0C88648BF11CA X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20250315_210550_946620_005AB190 X-CRM114-Status: GOOD ( 15.45 ) X-BeenThere: linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: "linux-arm-kernel" Errors-To: linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org This header contains the public declarations usable in the rest of the kernel for rqspinlock. Let's also type alias qspinlock to rqspinlock_t to ensure consistent use of the new lock type. We want to remove dependence on the qspinlock type in later patches as we need to provide a test-and-set fallback, hence begin abstracting away from now onwards. Reviewed-by: Barret Rhoden Signed-off-by: Kumar Kartikeya Dwivedi --- include/asm-generic/rqspinlock.h | 19 +++++++++++++++++++ kernel/bpf/rqspinlock.c | 3 ++- 2 files changed, 21 insertions(+), 1 deletion(-) create mode 100644 include/asm-generic/rqspinlock.h diff --git a/include/asm-generic/rqspinlock.h b/include/asm-generic/rqspinlock.h new file mode 100644 index 000000000000..22f8094d0550 --- /dev/null +++ b/include/asm-generic/rqspinlock.h @@ -0,0 +1,19 @@ +/* SPDX-License-Identifier: GPL-2.0-or-later */ +/* + * Resilient Queued Spin Lock + * + * (C) Copyright 2024-2025 Meta Platforms, Inc. and affiliates. 
+ * + * Authors: Kumar Kartikeya Dwivedi + */ +#ifndef __ASM_GENERIC_RQSPINLOCK_H +#define __ASM_GENERIC_RQSPINLOCK_H + +#include + +struct qspinlock; +typedef struct qspinlock rqspinlock_t; + +extern void resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val); + +#endif /* __ASM_GENERIC_RQSPINLOCK_H */ diff --git a/kernel/bpf/rqspinlock.c b/kernel/bpf/rqspinlock.c index 762108cb0f38..93e31633c2aa 100644 --- a/kernel/bpf/rqspinlock.c +++ b/kernel/bpf/rqspinlock.c @@ -23,6 +23,7 @@ #include #include #include +#include /* * Include queued spinlock definitions and statistics code @@ -127,7 +128,7 @@ static __always_inline u32 __pv_wait_head_or_lock(struct qspinlock *lock, * contended : (*,x,y) +--> (*,0,0) ---> (*,0,1) -' : * queue : ^--' : */ -void __lockfunc resilient_queued_spin_lock_slowpath(struct qspinlock *lock, u32 val) +void __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val) { struct mcs_spinlock *prev, *next, *node; u32 old, tail; From patchwork Sun Mar 16 04:05:22 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kumar Kartikeya Dwivedi X-Patchwork-Id: 14018315 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 55824C282DE for ; Sun, 16 Mar 2025 04:17:54 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=bombadil.20210309; h=Sender:List-Subscribe:List-Help :List-Post:List-Archive:List-Unsubscribe:List-Id:Content-Transfer-Encoding: MIME-Version:References:In-Reply-To:Message-ID:Date:Subject:Cc:To:From: Reply-To:Content-Type:Content-ID:Content-Description:Resent-Date:Resent-From: Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID:List-Owner; bh=Fh+KgT9rgROAc0Y/x+RNT9BL+c7q6+gqaABwWSSeoyU=; b=nI8lMbtLBk41puM42ao39WSeFx /Z+hYo5q/0OlXU2RNS/oXZXT1/XgJztfGra7tC7CxD1O+p36EoR3J940L7XHxxqKBCGVyI3k4kdiV WSTjKSHgF3hHD4k1SPtfQ5CUccruyphISHIWcz1gvtIc90SF0187QVlKLIvW+PbIkN4veUQD0VUMR WQQQ67H9qk2AFNKfdUA2ots9EgwOKF4anNWwvjBGw1GmHh6lJf92Y2g42+Io9ePqRG5McfDH+Ps6o mGgrrYyF3EMjTj922Awe7+4CVfpd1T9/TL1ToeO0UMd6UkfI8yx4fQcPlTGVfrVtfW8YZvJwy2kSJ VoyGSMZQ==; Received: from localhost ([::1] helo=bombadil.infradead.org) by bombadil.infradead.org with esmtp (Exim 4.98 #2 (Red Hat Linux)) id 1ttfRX-0000000HEQ9-3vQM; Sun, 16 Mar 2025 04:17:43 +0000 Received: from mail-wm1-x343.google.com ([2a00:1450:4864:20::343]) by bombadil.infradead.org with esmtps (Exim 4.98 #2 (Red Hat Linux)) id 1ttfG4-0000000HCCX-0Tnf for linux-arm-kernel@lists.infradead.org; Sun, 16 Mar 2025 04:05:53 +0000 Received: by mail-wm1-x343.google.com with SMTP id 5b1f17b1804b1-43cfebc343dso6753645e9.2 for ; Sat, 15 Mar 2025 21:05:52 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1742097951; x=1742702751; darn=lists.infradead.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=Fh+KgT9rgROAc0Y/x+RNT9BL+c7q6+gqaABwWSSeoyU=; b=bEtdf8owsoVrFbIRL8rQ5dTBwfEL3uE3j6CFOnyFTRIP6rhEG+uVakVEj8yJemmrO2 2bZAhmPaNfpzbfRrOXMLkpY+v/umo8aGmjXzZYAT+lC4pGrAc9fpHnIzuVpF2JzqkOY9 E3MlOqT7sjiltV6ME6Qgkk/LLZvId2Z1y99Uw9i9LiLwynX1gVMY7Q3s5pp4gS9NrAOx 
From: Kumar Kartikeya Dwivedi To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org Cc: Barret Rhoden , Linus Torvalds , Peter Zijlstra , Will Deacon , Waiman Long , Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , Martin KaFai Lau , Eduard Zingerman , "Paul E.
McKenney" , Tejun Heo , Josh Don , Dohyun Kim , linux-arm-kernel@lists.infradead.org, kkd@meta.com, kernel-team@meta.com Subject: [PATCH bpf-next v4 06/25] rqspinlock: Drop PV and virtualization support Date: Sat, 15 Mar 2025 21:05:22 -0700 Message-ID: <20250316040541.108729-7-memxor@gmail.com> X-Mailer: git-send-email 2.47.1 In-Reply-To: <20250316040541.108729-1-memxor@gmail.com> References: <20250316040541.108729-1-memxor@gmail.com> MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=6827; h=from:subject; bh=UZ8naMS1hT+7LGCLfEmfT5uEbBX3f9Jyevr2RnHN8ec=; b=owEBbQKS/ZANAwAIAUzgyIZIvxHKAcsmYgBn1k3cNxB8Y7ML8wYp3OrW2rUAHgS5cApFt293sVR+ FUNcGiOJAjMEAAEIAB0WIQRLvip+Buz51YI8YRFM4MiGSL8RygUCZ9ZN3AAKCRBM4MiGSL8Ryo0MD/ 97ReRi3PswH9DGAYd8NdunrsJZxOckjHjAmgCpfUiaRZykVSnDzlH/qc51eaWr0NnIeisWZ8UIFUkI jzIM3cz8kFKlXhwFkC5OIr8zKUy1nLd1KZ2VvWvTGtfiNK7FcKf4UnaPtm2h7FR7ymOEqkfRwa6Sv4 KDpMy2fYSfBwlS2zzE8lvtID6NrSQMO0nG9hZI5Vlq1gZCjr6wPcg1I4W8Yg2A3HVdq9TJesAs459g +kngV/eR+MZhuO1EKoqAhoyCPZCd04dcd4H0/wCcPMCioozsjnZ0Pa/DN9InNXE1fB2owf7rCR5D5e IDZglzG1xyPXgfH8+VJFxlCX5hKTz84v/Ew/3CT6rx8xeARZY2r2Jf9awUQd+pYCqBL8Ugzs+zgQI8 AuUcemjQLAR/CPa+I8ZnSbCaZMFh10dPUjejN/Iy/kmM0UrW++NXO3QeJwX9dDQBlV/O4GVAYCmH0V QmKi1sVhk4Ai7I32j5292bI9eoweCYwn57ZCmo/xRT/DXKexPr/MiFRPmqpaEWk4UY9ls3MvPYS9mZ Nj1/QhALuONxN9ys6+jE6eMSXgQ6Bly0zRkRx4vhY/FbDWv43MLGyjayc7Mf/dHyABAf66kMw1z4wM PNcxJIsebXTOhOAMPJvATfcKgOhdcWe0Hs5dZLIc37/5kP9BpbmTuPiGZdvg== X-Developer-Key: i=memxor@gmail.com; a=openpgp; fpr=4BBE2A7E06ECF9D5823C61114CE0C88648BF11CA X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20250315_210552_170713_547EB753 X-CRM114-Status: GOOD ( 20.16 ) X-BeenThere: linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: "linux-arm-kernel" Errors-To: linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org Changes to rqspinlock in subsequent commits will be algorithmic modifications, which won't remain in agreement with the implementations of paravirt spin lock and virt_spin_lock support. These future changes include measures for terminating waiting loops in slow path after a certain point. While using a fair lock like qspinlock directly inside virtual machines leads to suboptimal performance under certain conditions, we cannot use the existing virtualization support before we make it resilient as well. Therefore, drop it for now. Note that we need to drop qspinlock_stat.h, as it's only relevant in case of CONFIG_PARAVIRT_SPINLOCKS=y, but we need to keep lock_events.h in the includes, which was indirectly pulled in before. Reviewed-by: Barret Rhoden Signed-off-by: Kumar Kartikeya Dwivedi --- kernel/bpf/rqspinlock.c | 91 +---------------------------------------- 1 file changed, 1 insertion(+), 90 deletions(-) diff --git a/kernel/bpf/rqspinlock.c b/kernel/bpf/rqspinlock.c index 93e31633c2aa..c2646cffc59e 100644 --- a/kernel/bpf/rqspinlock.c +++ b/kernel/bpf/rqspinlock.c @@ -11,8 +11,6 @@ * Peter Zijlstra */ -#ifndef _GEN_PV_LOCK_SLOWPATH - #include #include #include @@ -29,7 +27,7 @@ * Include queued spinlock definitions and statistics code */ #include "../locking/qspinlock.h" -#include "../locking/qspinlock_stat.h" +#include "../locking/lock_events.h" /* * The basic principle of a queue-based spinlock can best be understood @@ -75,38 +73,9 @@ * contexts: task, softirq, hardirq, nmi. * * Exactly fits one 64-byte cacheline on a 64-bit architecture. 
- * - * PV doubles the storage and uses the second cacheline for PV state. */ static DEFINE_PER_CPU_ALIGNED(struct qnode, rqnodes[_Q_MAX_NODES]); -/* - * Generate the native code for resilient_queued_spin_unlock_slowpath(); provide NOPs - * for all the PV callbacks. - */ - -static __always_inline void __pv_init_node(struct mcs_spinlock *node) { } -static __always_inline void __pv_wait_node(struct mcs_spinlock *node, - struct mcs_spinlock *prev) { } -static __always_inline void __pv_kick_node(struct qspinlock *lock, - struct mcs_spinlock *node) { } -static __always_inline u32 __pv_wait_head_or_lock(struct qspinlock *lock, - struct mcs_spinlock *node) - { return 0; } - -#define pv_enabled() false - -#define pv_init_node __pv_init_node -#define pv_wait_node __pv_wait_node -#define pv_kick_node __pv_kick_node -#define pv_wait_head_or_lock __pv_wait_head_or_lock - -#ifdef CONFIG_PARAVIRT_SPINLOCKS -#define resilient_queued_spin_lock_slowpath native_resilient_queued_spin_lock_slowpath -#endif - -#endif /* _GEN_PV_LOCK_SLOWPATH */ - /** * resilient_queued_spin_lock_slowpath - acquire the queued spinlock * @lock: Pointer to queued spinlock structure @@ -136,12 +105,6 @@ void __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val) BUILD_BUG_ON(CONFIG_NR_CPUS >= (1U << _Q_TAIL_CPU_BITS)); - if (pv_enabled()) - goto pv_queue; - - if (virt_spin_lock(lock)) - return; - /* * Wait for in-progress pending->locked hand-overs with a bounded * number of spins so that we guarantee forward progress. @@ -212,7 +175,6 @@ void __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val) */ queue: lockevent_inc(lock_slowpath); -pv_queue: node = this_cpu_ptr(&rqnodes[0].mcs); idx = node->count++; tail = encode_tail(smp_processor_id(), idx); @@ -251,7 +213,6 @@ void __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val) node->locked = 0; node->next = NULL; - pv_init_node(node); /* * We touched a (possibly) cold cacheline in the per-cpu queue node; @@ -288,7 +249,6 @@ void __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val) /* Link @node into the waitqueue. */ WRITE_ONCE(prev->next, node); - pv_wait_node(node, prev); arch_mcs_spin_lock_contended(&node->locked); /* @@ -312,23 +272,9 @@ void __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val) * store-release that clears the locked bit and create lock * sequentiality; this is because the set_locked() function below * does not imply a full barrier. - * - * The PV pv_wait_head_or_lock function, if active, will acquire - * the lock and return a non-zero value. So we have to skip the - * atomic_cond_read_acquire() call. As the next PV queue head hasn't - * been designated yet, there is no way for the locked value to become - * _Q_SLOW_VAL. So both the set_locked() and the - * atomic_cmpxchg_relaxed() calls will be safe. - * - * If PV isn't active, 0 will be returned instead. 
- * */ - if ((val = pv_wait_head_or_lock(lock, node))) - goto locked; - val = atomic_cond_read_acquire(&lock->val, !(VAL & _Q_LOCKED_PENDING_MASK)); -locked: /* * claim the lock: * @@ -341,11 +287,6 @@ void __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val) */ /* - * In the PV case we might already have _Q_LOCKED_VAL set, because - * of lock stealing; therefore we must also allow: - * - * n,0,1 -> 0,0,1 - * * Note: at this point: (val & _Q_PENDING_MASK) == 0, because of the * above wait condition, therefore any concurrent setting of * PENDING will make the uncontended transition fail. @@ -369,7 +310,6 @@ void __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val) next = smp_cond_load_relaxed(&node->next, (VAL)); arch_mcs_spin_unlock_contended(&next->locked); - pv_kick_node(lock, next); release: trace_contention_end(lock, 0); @@ -380,32 +320,3 @@ void __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val) __this_cpu_dec(rqnodes[0].mcs.count); } EXPORT_SYMBOL_GPL(resilient_queued_spin_lock_slowpath); - -/* - * Generate the paravirt code for resilient_queued_spin_unlock_slowpath(). - */ -#if !defined(_GEN_PV_LOCK_SLOWPATH) && defined(CONFIG_PARAVIRT_SPINLOCKS) -#define _GEN_PV_LOCK_SLOWPATH - -#undef pv_enabled -#define pv_enabled() true - -#undef pv_init_node -#undef pv_wait_node -#undef pv_kick_node -#undef pv_wait_head_or_lock - -#undef resilient_queued_spin_lock_slowpath -#define resilient_queued_spin_lock_slowpath __pv_resilient_queued_spin_lock_slowpath - -#include "../locking/qspinlock_paravirt.h" -#include "rqspinlock.c" - -bool nopvspin; -static __init int parse_nopvspin(char *arg) -{ - nopvspin = true; - return 0; -} -early_param("nopvspin", parse_nopvspin); -#endif From patchwork Sun Mar 16 04:05:23 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kumar Kartikeya Dwivedi X-Patchwork-Id: 14018316 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 0670BC282DE for ; Sun, 16 Mar 2025 04:19:33 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=bombadil.20210309; h=Sender:List-Subscribe:List-Help :List-Post:List-Archive:List-Unsubscribe:List-Id:Content-Transfer-Encoding: MIME-Version:References:In-Reply-To:Message-ID:Date:Subject:Cc:To:From: Reply-To:Content-Type:Content-ID:Content-Description:Resent-Date:Resent-From: Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID:List-Owner; bh=9VRGS9P+jyMhEgNXbehIAgq9LwV+0ThEZLQ6VCSjK78=; b=Timk0yNEwg1oFR4sYYucB4QaI9 9tHZwsY5ZevOEM/9vDINgNllNeyZzJ/6TtdfTNTVqv3Yfb7cELDQbyn8zV42mzlqzf0FdIUJBC8W3 fZHC9pyH/fPC/rDix87su4KXfFsLrKBFBLETIyFjX2G4gl/Yoxxmu2vZFxGJVgYBdba8IWm7uHnPJ kO5B8bz2YYfUTu7UQkzxx90CGOIc8CrBe/o/BloNmBoQfxGvCSbxVWlrp+jyBMff9eUfQ8hyAuIRw VKiYutu/Tnoyn/VnnvSO2yHtu2xh6syl/hA0kEuixqerx4ahW7i5E4rrBEkHhbdK9sY68zT87Hvel zkuLs86Q==; Received: from localhost ([::1] helo=bombadil.infradead.org) by bombadil.infradead.org with esmtp (Exim 4.98 #2 (Red Hat Linux)) id 1ttfTB-0000000HEW4-2dLz; Sun, 16 Mar 2025 04:19:25 +0000 Received: from mail-wm1-x342.google.com ([2a00:1450:4864:20::342]) by bombadil.infradead.org with esmtps (Exim 4.98 #2 (Red Hat Linux)) id 
From: Kumar Kartikeya Dwivedi To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org Cc: Barret Rhoden , Linus Torvalds , Peter Zijlstra , Will Deacon , Waiman Long , Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , Martin KaFai Lau , Eduard Zingerman , "Paul E.
McKenney" , Tejun Heo , Josh Don , Dohyun Kim , linux-arm-kernel@lists.infradead.org, kkd@meta.com, kernel-team@meta.com Subject: [PATCH bpf-next v4 07/25] rqspinlock: Add support for timeouts Date: Sat, 15 Mar 2025 21:05:23 -0700 Message-ID: <20250316040541.108729-8-memxor@gmail.com> X-Mailer: git-send-email 2.47.1 In-Reply-To: <20250316040541.108729-1-memxor@gmail.com> References: <20250316040541.108729-1-memxor@gmail.com> MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=4618; h=from:subject; bh=y9/H+/hm8Whgobd7BHik2UQJXH7cEJPayXzZNcl5svk=; b=owEBbQKS/ZANAwAIAUzgyIZIvxHKAcsmYgBn1k3c971BRmRrbtZ27c2Isb9sJ77Is1kCbCdZsV6t 1TF9L6+JAjMEAAEIAB0WIQRLvip+Buz51YI8YRFM4MiGSL8RygUCZ9ZN3AAKCRBM4MiGSL8RysI2EA CWS2QokbV6D30J2uxyFyVfcIaONPfZhICR7q5XYFdtmvsNs0zVjHot9yAnnznptXBAiSS+cpG5Aawg z2RsgVOs+pD0nZva8ZR6I0u3fDlBkTQkxzWXNevfnkDzWf6uUiNYOYliNQaW5nWk7Xw7ZhjvCiKRPX 5A+tArW1n6UuDrD81t1KOunXMvupCUoGslJanF+8gDOt4ww2O9a2OMUAEYlDam8nqkz9cM0CzCd7lj QMx+mD23eRdYJPFBaZ0DwuFWjSBsFEv1inLqSt9M2aRZyPyhN91AjHkP7RVP+M9mho4L+KwU5R0cXs ze0x9jqRuyREct7IZSep7fJflgyK3LtyZ98G5SfXEQFt45kdMIOrFHPRuBnQm4E6sCs4aJuA0iCmAf dZlv2k2m5XIg3dN0guZFl2s1WPD8cB9W0MpgJTmOTnOl4cdR5tCYbuQwVZ8AuFNtt9E5raXpwdig8A 3cVwVvesOvJ3R3L0nMCpCVvkK2xGt3HgFD3qTNJiFpVZ1DYfR37iPcxY/fvZQlkrC//giZDQ8bcvxJ GzzbGGJVJzdeQQcjyz/aBTh+kag7lpu9O1tyMEIIWMI7H0OfiLPjxnjWKbSoAZGBcrjRyr2gPN67vS 42Su7CQ/jdGZQj58ck+gkVIG4s5JTH4rC9is1NwbNmozjCDyE+65OdLS++Gg== X-Developer-Key: i=memxor@gmail.com; a=openpgp; fpr=4BBE2A7E06ECF9D5823C61114CE0C88648BF11CA X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20250315_210553_534601_12F0F04F X-CRM114-Status: GOOD ( 24.08 ) X-BeenThere: linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: "linux-arm-kernel" Errors-To: linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org Introduce policy macro RES_CHECK_TIMEOUT which can be used to detect when the timeout has expired for the slow path to return an error. It depends on being passed two variables initialized to 0: ts, ret. The 'ts' parameter is of type rqspinlock_timeout. This macro resolves to the (ret) expression so that it can be used in statements like smp_cond_load_acquire to break the waiting loop condition. The 'spin' member is used to amortize the cost of checking time by dispatching to the implementation every 64k iterations. The 'timeout_end' member is used to keep track of the timestamp that denotes the end of the waiting period. The 'ret' parameter denotes the status of the timeout, and can be checked in the slow path to detect timeouts after waiting loops. The 'duration' member is used to store the timeout duration for each waiting loop. The default timeout value defined in the header (RES_DEF_TIMEOUT) is 0.25 seconds. This macro will be used as a condition for waiting loops in the slow path. Since each waiting loop applies a fresh timeout using the same rqspinlock_timeout, we add a new RES_RESET_TIMEOUT as well to ensure the values can be easily reinitialized to the default state. 
Reviewed-by: Barret Rhoden Signed-off-by: Kumar Kartikeya Dwivedi --- include/asm-generic/rqspinlock.h | 6 +++++ kernel/bpf/rqspinlock.c | 45 ++++++++++++++++++++++++++++++++ 2 files changed, 51 insertions(+) diff --git a/include/asm-generic/rqspinlock.h b/include/asm-generic/rqspinlock.h index 22f8094d0550..5dd4dd8aee69 100644 --- a/include/asm-generic/rqspinlock.h +++ b/include/asm-generic/rqspinlock.h @@ -10,10 +10,16 @@ #define __ASM_GENERIC_RQSPINLOCK_H #include +#include struct qspinlock; typedef struct qspinlock rqspinlock_t; extern void resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val); +/* + * Default timeout for waiting loops is 0.25 seconds + */ +#define RES_DEF_TIMEOUT (NSEC_PER_SEC / 4) + #endif /* __ASM_GENERIC_RQSPINLOCK_H */ diff --git a/kernel/bpf/rqspinlock.c b/kernel/bpf/rqspinlock.c index c2646cffc59e..0d8964b4d44a 100644 --- a/kernel/bpf/rqspinlock.c +++ b/kernel/bpf/rqspinlock.c @@ -6,9 +6,11 @@ * (C) Copyright 2013-2014,2018 Red Hat, Inc. * (C) Copyright 2015 Intel Corp. * (C) Copyright 2015 Hewlett-Packard Enterprise Development LP + * (C) Copyright 2024-2025 Meta Platforms, Inc. and affiliates. * * Authors: Waiman Long * Peter Zijlstra + * Kumar Kartikeya Dwivedi */ #include @@ -22,6 +24,7 @@ #include #include #include +#include /* * Include queued spinlock definitions and statistics code @@ -68,6 +71,45 @@ #include "../locking/mcs_spinlock.h" +struct rqspinlock_timeout { + u64 timeout_end; + u64 duration; + u16 spin; +}; + +static noinline int check_timeout(struct rqspinlock_timeout *ts) +{ + u64 time = ktime_get_mono_fast_ns(); + + if (!ts->timeout_end) { + ts->timeout_end = time + ts->duration; + return 0; + } + + if (time > ts->timeout_end) + return -ETIMEDOUT; + + return 0; +} + +#define RES_CHECK_TIMEOUT(ts, ret) \ + ({ \ + if (!(ts).spin++) \ + (ret) = check_timeout(&(ts)); \ + (ret); \ + }) + +/* + * Initialize the 'spin' member. + */ +#define RES_INIT_TIMEOUT(ts) ({ (ts).spin = 1; }) + +/* + * We only need to reset 'timeout_end', 'spin' will just wrap around as necessary. + * Duration is defined for each spin attempt, so set it here. + */ +#define RES_RESET_TIMEOUT(ts, _duration) ({ (ts).timeout_end = 0; (ts).duration = _duration; }) + /* * Per-CPU queue node structures; we can never have more than 4 nested * contexts: task, softirq, hardirq, nmi. @@ -100,11 +142,14 @@ static DEFINE_PER_CPU_ALIGNED(struct qnode, rqnodes[_Q_MAX_NODES]); void __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val) { struct mcs_spinlock *prev, *next, *node; + struct rqspinlock_timeout ts; u32 old, tail; int idx; BUILD_BUG_ON(CONFIG_NR_CPUS >= (1U << _Q_TAIL_CPU_BITS)); + RES_INIT_TIMEOUT(ts); + /* * Wait for in-progress pending->locked hand-overs with a bounded * number of spins so that we guarantee forward progress. 
From patchwork Sun Mar 16 04:05:24 2025 X-Patchwork-Submitter: Kumar Kartikeya Dwivedi X-Patchwork-Id: 14018317
From: Kumar Kartikeya Dwivedi To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org Cc: Ankur Arora , Linus Torvalds , Peter Zijlstra , Will Deacon , Waiman Long , Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , Martin KaFai Lau , Eduard Zingerman , "Paul E. McKenney" , Tejun Heo , Barret Rhoden , Josh Don , Dohyun Kim , linux-arm-kernel@lists.infradead.org, kkd@meta.com, kernel-team@meta.com Subject: [PATCH bpf-next v4 08/25] rqspinlock: Hardcode cond_acquire loops for arm64 Date: Sat, 15 Mar 2025 21:05:24 -0700 Message-ID: <20250316040541.108729-9-memxor@gmail.com> In-Reply-To: <20250316040541.108729-1-memxor@gmail.com> References: <20250316040541.108729-1-memxor@gmail.com>

Currently, for rqspinlock usage, the implementations of smp_cond_load_acquire (and thus atomic_cond_read_acquire) are susceptible to stalls on arm64, because they do not guarantee that the conditional expression will be repeatedly invoked if the address being loaded from is not written to by other CPUs.
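As a rough illustration of the loop structure this patch falls back to when WFE-based waiting cannot be used (see the diff below), here is a userspace sketch with assumed helper names, not the arm64 code: the condition is re-read on every iteration, while the comparatively expensive time check is only evaluated once every couple of hundred spins, mirroring smp_cond_time_check_count in the patch.

#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

static inline void cpu_relax(void) { }	/* a pause/yield hint on real hardware */

static uint64_t now_ns(void)
{
	struct timespec t;

	clock_gettime(CLOCK_MONOTONIC, &t);
	return (uint64_t)t.tv_sec * 1000000000ull + t.tv_nsec;
}

/* Spin until *ptr == want or the deadline passes; returns the last value seen. */
static int spinwait_until(_Atomic int *ptr, int want, uint64_t deadline_ns)
{
	const unsigned int check_count = 200;	/* mirrors smp_cond_time_check_count */
	unsigned int count = 0;
	int val;

	for (;;) {
		val = atomic_load_explicit(ptr, memory_order_relaxed);
		if (val == want)
			break;
		cpu_relax();
		if (count++ < check_count)
			continue;		/* amortize the clock read */
		if (now_ns() >= deadline_ns)
			break;			/* timed out */
		count = 0;
	}
	return val;
}

int main(void)
{
	_Atomic int flag = 0;	/* never set, so we rely on the deadline */

	printf("final value = %d\n", spinwait_until(&flag, 1, now_ns() + 1000000));
	return 0;
}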
When support for event-streams is absent (which unblocks stuck WFE-based loops every ~100us), we may end up being stuck forever. This causes a problem for us, as we need to repeatedly invoke the RES_CHECK_TIMEOUT in the spin loop to break out when the timeout expires. Let us import the smp_cond_load_acquire_timewait implementation Ankur is proposing in [0], and then fallback to it once it is merged. While we rely on the implementation to amortize the cost of sampling check_timeout for us, it will not happen when event stream support is unavailable. This is not the common case, and it would be difficult to fit our logic in the time_expr_ns >= time_limit_ns comparison, hence just let it be. [0]: https://lore.kernel.org/lkml/20250203214911.898276-1-ankur.a.arora@oracle.com Cc: Ankur Arora Signed-off-by: Kumar Kartikeya Dwivedi --- arch/arm64/include/asm/rqspinlock.h | 93 +++++++++++++++++++++++++++++ kernel/bpf/rqspinlock.c | 15 +++++ 2 files changed, 108 insertions(+) create mode 100644 arch/arm64/include/asm/rqspinlock.h diff --git a/arch/arm64/include/asm/rqspinlock.h b/arch/arm64/include/asm/rqspinlock.h new file mode 100644 index 000000000000..5b80785324b6 --- /dev/null +++ b/arch/arm64/include/asm/rqspinlock.h @@ -0,0 +1,93 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _ASM_RQSPINLOCK_H +#define _ASM_RQSPINLOCK_H + +#include + +/* + * Hardcode res_smp_cond_load_acquire implementations for arm64 to a custom + * version based on [0]. In rqspinlock code, our conditional expression involves + * checking the value _and_ additionally a timeout. However, on arm64, the + * WFE-based implementation may never spin again if no stores occur to the + * locked byte in the lock word. As such, we may be stuck forever if + * event-stream based unblocking is not available on the platform for WFE spin + * loops (arch_timer_evtstrm_available). + * + * Once support for smp_cond_load_acquire_timewait [0] lands, we can drop this + * copy-paste. + * + * While we rely on the implementation to amortize the cost of sampling + * cond_expr for us, it will not happen when event stream support is + * unavailable, time_expr check is amortized. This is not the common case, and + * it would be difficult to fit our logic in the time_expr_ns >= time_limit_ns + * comparison, hence just let it be. In case of event-stream, the loop is woken + * up at microsecond granularity. 
+ * + * [0]: https://lore.kernel.org/lkml/20250203214911.898276-1-ankur.a.arora@oracle.com + */ + +#ifndef smp_cond_load_acquire_timewait + +#define smp_cond_time_check_count 200 + +#define __smp_cond_load_relaxed_spinwait(ptr, cond_expr, time_expr_ns, \ + time_limit_ns) ({ \ + typeof(ptr) __PTR = (ptr); \ + __unqual_scalar_typeof(*ptr) VAL; \ + unsigned int __count = 0; \ + for (;;) { \ + VAL = READ_ONCE(*__PTR); \ + if (cond_expr) \ + break; \ + cpu_relax(); \ + if (__count++ < smp_cond_time_check_count) \ + continue; \ + if ((time_expr_ns) >= (time_limit_ns)) \ + break; \ + __count = 0; \ + } \ + (typeof(*ptr))VAL; \ +}) + +#define __smp_cond_load_acquire_timewait(ptr, cond_expr, \ + time_expr_ns, time_limit_ns) \ +({ \ + typeof(ptr) __PTR = (ptr); \ + __unqual_scalar_typeof(*ptr) VAL; \ + for (;;) { \ + VAL = smp_load_acquire(__PTR); \ + if (cond_expr) \ + break; \ + __cmpwait_relaxed(__PTR, VAL); \ + if ((time_expr_ns) >= (time_limit_ns)) \ + break; \ + } \ + (typeof(*ptr))VAL; \ +}) + +#define smp_cond_load_acquire_timewait(ptr, cond_expr, \ + time_expr_ns, time_limit_ns) \ +({ \ + __unqual_scalar_typeof(*ptr) _val; \ + int __wfe = arch_timer_evtstrm_available(); \ + \ + if (likely(__wfe)) { \ + _val = __smp_cond_load_acquire_timewait(ptr, cond_expr, \ + time_expr_ns, \ + time_limit_ns); \ + } else { \ + _val = __smp_cond_load_relaxed_spinwait(ptr, cond_expr, \ + time_expr_ns, \ + time_limit_ns); \ + smp_acquire__after_ctrl_dep(); \ + } \ + (typeof(*ptr))_val; \ +}) + +#endif + +#define res_smp_cond_load_acquire_timewait(v, c) smp_cond_load_acquire_timewait(v, c, 0, 1) + +#include + +#endif /* _ASM_RQSPINLOCK_H */ diff --git a/kernel/bpf/rqspinlock.c b/kernel/bpf/rqspinlock.c index 0d8964b4d44a..d429b923b58f 100644 --- a/kernel/bpf/rqspinlock.c +++ b/kernel/bpf/rqspinlock.c @@ -92,12 +92,21 @@ static noinline int check_timeout(struct rqspinlock_timeout *ts) return 0; } +/* + * Do not amortize with spins when res_smp_cond_load_acquire is defined, + * as the macro does internal amortization for us. + */ +#ifndef res_smp_cond_load_acquire #define RES_CHECK_TIMEOUT(ts, ret) \ ({ \ if (!(ts).spin++) \ (ret) = check_timeout(&(ts)); \ (ret); \ }) +#else +#define RES_CHECK_TIMEOUT(ts, ret, mask) \ + ({ (ret) = check_timeout(&(ts)); }) +#endif /* * Initialize the 'spin' member. 
@@ -118,6 +127,12 @@ static noinline int check_timeout(struct rqspinlock_timeout *ts) */ static DEFINE_PER_CPU_ALIGNED(struct qnode, rqnodes[_Q_MAX_NODES]); +#ifndef res_smp_cond_load_acquire +#define res_smp_cond_load_acquire(v, c) smp_cond_load_acquire(v, c) +#endif + +#define res_atomic_cond_read_acquire(v, c) res_smp_cond_load_acquire(&(v)->counter, (c)) + /** * resilient_queued_spin_lock_slowpath - acquire the queued spinlock * @lock: Pointer to queued spinlock structure From patchwork Sun Mar 16 04:05:25 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kumar Kartikeya Dwivedi X-Patchwork-Id: 14018318 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id C817EC282DE for ; Sun, 16 Mar 2025 04:22:57 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=bombadil.20210309; h=Sender:List-Subscribe:List-Help :List-Post:List-Archive:List-Unsubscribe:List-Id:Content-Transfer-Encoding: MIME-Version:References:In-Reply-To:Message-ID:Date:Subject:Cc:To:From: Reply-To:Content-Type:Content-ID:Content-Description:Resent-Date:Resent-From: Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID:List-Owner; bh=DZSlxIaXESbScuJYNNT7oHLcn/yU5dVEPTh1HXSGjKY=; b=hipk3bKMAvcDORmGg51VLx4SSF BLQcF7uBKCB2wq8PD9jmvSRmi5bYE9FfdE3pYJ+4A/mDao+ZgQbfRJUcezvLEPKHqjVnoANKsO2VL cL6O1RBTSoCIyZUFHzO8gyyWwMED8YmoPbZRhm0/7BBHo/+SpYDSLKYbXnfWpK4FJGxhSeJWW2gaO Ksgkzb6ZHGiO7i4Ha7rmtWUpNj93o3M4CPwbcmx4qOXFr64t81sjDot6zo1fLh7wiNo7JYcAOAj78 94deQc3is3loWDiEPdWVP3yuzdo2fR0GcE/hzTn/Yi/7Kswag3CUJj5GXxy8xJP+NaMzNDrEiLfjo d62PLtlw==; Received: from localhost ([::1] helo=bombadil.infradead.org) by bombadil.infradead.org with esmtp (Exim 4.98 #2 (Red Hat Linux)) id 1ttfWT-0000000HEoS-0aOM; Sun, 16 Mar 2025 04:22:49 +0000 Received: from mail-wr1-x444.google.com ([2a00:1450:4864:20::444]) by bombadil.infradead.org with esmtps (Exim 4.98 #2 (Red Hat Linux)) id 1ttfG8-0000000HCF4-1tCB for linux-arm-kernel@lists.infradead.org; Sun, 16 Mar 2025 04:05:58 +0000 Received: by mail-wr1-x444.google.com with SMTP id ffacd0b85a97d-39149bccb69so3174232f8f.2 for ; Sat, 15 Mar 2025 21:05:56 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1742097955; x=1742702755; darn=lists.infradead.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=DZSlxIaXESbScuJYNNT7oHLcn/yU5dVEPTh1HXSGjKY=; b=BEdQjyayBBPFNvNZmM5EE+9d5j5Sk3+ZXqY0EdNPHSlFXjirv5rPojKjEW6PFix5ub iUcj12djxCL3Eybm9kehRoan5SiKBLxsIUoi5xCSJFuybsjRb2W0fvYk0eynDo9WB12q +2BozOGkEjM4UxgLpvhIRg9TZ+sCmJMN3K+Bi+WhFa+8J3xa8Wn9FsEwHofO3YU8pqaM hpJxUrGucKSJks6SsJGNUL7j5edMMSIDMCvL/Tsg3S2PBHiw84aSwJmDv8lzeqBZCrL8 KgGBlhitU67bICX9s+1NYBBONgfXVRzCOCjxwYubjdKNR1IqnD8nyJ/0RIcLGc4lA+ur Sj+A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1742097955; x=1742702755; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=DZSlxIaXESbScuJYNNT7oHLcn/yU5dVEPTh1HXSGjKY=; 
From: Kumar Kartikeya Dwivedi To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org Cc: Barret Rhoden , Linus Torvalds , Peter Zijlstra , Will Deacon , Waiman Long , Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , Martin KaFai Lau , Eduard Zingerman , "Paul E. McKenney" , Tejun Heo , Josh Don , Dohyun Kim , linux-arm-kernel@lists.infradead.org, kkd@meta.com, kernel-team@meta.com Subject: [PATCH bpf-next v4 09/25] rqspinlock: Protect pending bit owners from stalls Date: Sat, 15 Mar 2025 21:05:25 -0700 Message-ID: <20250316040541.108729-10-memxor@gmail.com> In-Reply-To: <20250316040541.108729-1-memxor@gmail.com> References: <20250316040541.108729-1-memxor@gmail.com>
List-Archive: List-Post: List-Help: List-Subscribe: , Sender: "linux-arm-kernel" Errors-To: linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org The pending bit is used to avoid queueing in case the lock is uncontended, and has demonstrated benefits for the 2 contender scenario, esp. on x86. In case the pending bit is acquired and we wait for the locked bit to disappear, we may get stuck due to the lock owner not making progress. Hence, this waiting loop must be protected with a timeout check. To perform a graceful recovery once we decide to abort our lock acquisition attempt in this case, we must unset the pending bit since we own it. All waiters undoing their changes and exiting gracefully allows the lock word to be restored to the unlocked state once all participants (owner, waiters) have been recovered, and the lock remains usable. Hence, set the pending bit back to zero before returning to the caller. Introduce a lockevent (rqspinlock_lock_timeout) to capture timeout event statistics. Reviewed-by: Barret Rhoden Signed-off-by: Kumar Kartikeya Dwivedi --- include/asm-generic/rqspinlock.h | 2 +- kernel/bpf/rqspinlock.c | 32 ++++++++++++++++++++++++++----- kernel/locking/lock_events_list.h | 5 +++++ 3 files changed, 33 insertions(+), 6 deletions(-) diff --git a/include/asm-generic/rqspinlock.h b/include/asm-generic/rqspinlock.h index 5dd4dd8aee69..9bd11cb7acd6 100644 --- a/include/asm-generic/rqspinlock.h +++ b/include/asm-generic/rqspinlock.h @@ -15,7 +15,7 @@ struct qspinlock; typedef struct qspinlock rqspinlock_t; -extern void resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val); +extern int resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val); /* * Default timeout for waiting loops is 0.25 seconds diff --git a/kernel/bpf/rqspinlock.c b/kernel/bpf/rqspinlock.c index d429b923b58f..262294cfd36f 100644 --- a/kernel/bpf/rqspinlock.c +++ b/kernel/bpf/rqspinlock.c @@ -138,6 +138,10 @@ static DEFINE_PER_CPU_ALIGNED(struct qnode, rqnodes[_Q_MAX_NODES]); * @lock: Pointer to queued spinlock structure * @val: Current value of the queued spinlock 32-bit word * + * Return: + * * 0 - Lock was acquired successfully. + * * -ETIMEDOUT - Lock acquisition failed because of timeout. + * * (queue tail, pending bit, lock value) * * fast : slow : unlock @@ -154,12 +158,12 @@ static DEFINE_PER_CPU_ALIGNED(struct qnode, rqnodes[_Q_MAX_NODES]); * contended : (*,x,y) +--> (*,0,0) ---> (*,0,1) -' : * queue : ^--' : */ -void __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val) +int __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val) { struct mcs_spinlock *prev, *next, *node; struct rqspinlock_timeout ts; + int idx, ret = 0; u32 old, tail; - int idx; BUILD_BUG_ON(CONFIG_NR_CPUS >= (1U << _Q_TAIL_CPU_BITS)); @@ -217,8 +221,25 @@ void __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val) * clear_pending_set_locked() implementations imply full * barriers. */ - if (val & _Q_LOCKED_MASK) - smp_cond_load_acquire(&lock->locked, !VAL); + if (val & _Q_LOCKED_MASK) { + RES_RESET_TIMEOUT(ts, RES_DEF_TIMEOUT); + res_smp_cond_load_acquire(&lock->locked, !VAL || RES_CHECK_TIMEOUT(ts, ret)); + } + + if (ret) { + /* + * We waited for the locked bit to go back to 0, as the pending + * waiter, but timed out. We need to clear the pending bit since + * we own it. Once a stuck owner has been recovered, the lock + * must be restored to a valid state, hence removing the pending + * bit is necessary. 
+ * + * *,1,* -> *,0,* + */ + clear_pending(lock); + lockevent_inc(rqspinlock_lock_timeout); + return ret; + } /* * take ownership and clear the pending bit. @@ -227,7 +248,7 @@ void __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val) */ clear_pending_set_locked(lock); lockevent_inc(lock_pending); - return; + return 0; /* * End of pending bit optimistic spinning and beginning of MCS @@ -378,5 +399,6 @@ void __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val) * release the node */ __this_cpu_dec(rqnodes[0].mcs.count); + return 0; } EXPORT_SYMBOL_GPL(resilient_queued_spin_lock_slowpath); diff --git a/kernel/locking/lock_events_list.h b/kernel/locking/lock_events_list.h index 97fb6f3f840a..c5286249994d 100644 --- a/kernel/locking/lock_events_list.h +++ b/kernel/locking/lock_events_list.h @@ -49,6 +49,11 @@ LOCK_EVENT(lock_use_node4) /* # of locking ops that use 4th percpu node */ LOCK_EVENT(lock_no_node) /* # of locking ops w/o using percpu node */ #endif /* CONFIG_QUEUED_SPINLOCKS */ +/* + * Locking events for Resilient Queued Spin Lock + */ +LOCK_EVENT(rqspinlock_lock_timeout) /* # of locking ops that timeout */ + /* * Locking events for rwsem */ From patchwork Sun Mar 16 04:05:26 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kumar Kartikeya Dwivedi X-Patchwork-Id: 14018319 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 2A6A6C282DE for ; Sun, 16 Mar 2025 04:24:42 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=bombadil.20210309; h=Sender:List-Subscribe:List-Help :List-Post:List-Archive:List-Unsubscribe:List-Id:Content-Transfer-Encoding: MIME-Version:References:In-Reply-To:Message-ID:Date:Subject:Cc:To:From: Reply-To:Content-Type:Content-ID:Content-Description:Resent-Date:Resent-From: Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID:List-Owner; bh=P9oRTQ+VV5E61dUbGRZRpYYl6d2ZXF0CN+Od4Rtwxwc=; b=a3zEQPt0BMSEq37A6REJxT3Ftx 51mq0V4l26vldbuMJGxZVSD2y9jD6mDveM5XuOPOZtsuxXbqcHAHCAv1kt0JMoCq8tsDVg8jTNAOl b1g7bvOcPVaL/YV1fnaLjlUj39+Qld3/QQ1lrQnafB4DhNuLv7udwPbgIy/B5aUIh+VmIPyqjl8+g f0lhd4C9fFp6P+spsiLFVfzjyVfRKGmumxhiUWUBuRhNOypqNhdk/7+P6FUfRnl2knMuMBmQjOlyJ Vo5pYDK87M0JcycMl/0MKwZVX4HOz3z3lZxU7r3DcSls8DQxYgHRz1zqmaOhtBOhNsabbG37U8gxX UaJiapaQ==; Received: from localhost ([::1] helo=bombadil.infradead.org) by bombadil.infradead.org with esmtp (Exim 4.98 #2 (Red Hat Linux)) id 1ttfY7-0000000HEwj-3Uo4; Sun, 16 Mar 2025 04:24:31 +0000 Received: from mail-wm1-x342.google.com ([2a00:1450:4864:20::342]) by bombadil.infradead.org with esmtps (Exim 4.98 #2 (Red Hat Linux)) id 1ttfG9-0000000HCGA-2GMN for linux-arm-kernel@lists.infradead.org; Sun, 16 Mar 2025 04:05:58 +0000 Received: by mail-wm1-x342.google.com with SMTP id 5b1f17b1804b1-43cf0d787eeso11500115e9.3 for ; Sat, 15 Mar 2025 21:05:57 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1742097956; x=1742702756; darn=lists.infradead.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=P9oRTQ+VV5E61dUbGRZRpYYl6d2ZXF0CN+Od4Rtwxwc=; 
From: Kumar Kartikeya Dwivedi To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org Cc: Barret Rhoden , Linus Torvalds , Peter Zijlstra , Will Deacon , Waiman Long , Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , Martin KaFai Lau , Eduard Zingerman , "Paul E.
McKenney" , Tejun Heo , Josh Don , Dohyun Kim , linux-arm-kernel@lists.infradead.org, kkd@meta.com, kernel-team@meta.com Subject: [PATCH bpf-next v4 10/25] rqspinlock: Protect waiters in queue from stalls Date: Sat, 15 Mar 2025 21:05:26 -0700 Message-ID: <20250316040541.108729-11-memxor@gmail.com> X-Mailer: git-send-email 2.47.1 In-Reply-To: <20250316040541.108729-1-memxor@gmail.com> References: <20250316040541.108729-1-memxor@gmail.com> MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=8535; h=from:subject; bh=vVjQGLLHhTQCMBetlOFIdU+ojPl1vLkRKHm30dVzCfs=; b=owEBbQKS/ZANAwAIAUzgyIZIvxHKAcsmYgBn1k3cSLlo4tmtO0ItN2MrwskyKIwLLs6w+ebaLBXo 7AIxPteJAjMEAAEIAB0WIQRLvip+Buz51YI8YRFM4MiGSL8RygUCZ9ZN3AAKCRBM4MiGSL8RysmSD/ 4gAxpbVRAMKnrlkkFfN4Ga5MkaT9kFKIB75NzYnTksK7CH8hhGDYv+JLi+RdymiKCyY9nzzCusLCvH DgKiy8dOd7jd4kK2asNPQv83NeMlK0Y7ez2xIW0aheacqs2xHVRzHNpXJrhlXMFk9AOjed7lRqaZkk MHYjtNrFJYw2buxWBArDplbtplJ6ZFnH/R4X9150luydwS58JO6v1dpAXhDRtZug46Mf3bEhF2OxqC jOYtEwXKFE02GDoCLR+Ux1Tq9HAULxytmj+cG4B/5lN3ArOwmML81960DEOmDRtiBuM13th+wrnsBv j5ht8UeNorxeqAqlS0DKTVL5GwgKQ0J/gMt567PrzjsdttH9cQ2U+GS1RvleETvs2fRov1kvHQlCBh zWbay++IImUzfsbrUbJsWtbQz5zd7WoeMVXwycELimgnu6pMDDfOmXnPyktSfkP+Y2afRyImMjuGmz onfCAORf8i939MF0Z78DKXqnm5ooMaXPCja7iWalvBWDsRQVJXMA4eNMfbKC582zqFpuWWwa5V8NWm VhoIH2++de0gRcjcmh/Wg9q2PogVDyiUpdPW0390l2OIviAlmCmDbOMeMMvmSAOonJmYnfqgmklT82 erZE+jj8vv/93kMS/3ldvy0UA/t4FMfRnggOei1u5ghurEjEm+jmBceT6Qrg== X-Developer-Key: i=memxor@gmail.com; a=openpgp; fpr=4BBE2A7E06ECF9D5823C61114CE0C88648BF11CA X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20250315_210557_700313_5CF20A48 X-CRM114-Status: GOOD ( 38.16 ) X-BeenThere: linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: "linux-arm-kernel" Errors-To: linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org Implement the wait queue cleanup algorithm for rqspinlock. There are three forms of waiters in the original queued spin lock algorithm. The first is the waiter which acquires the pending bit and spins on the lock word without forming a wait queue. The second is the head waiter that is the first waiter heading the wait queue. The third form is of all the non-head waiters queued behind the head, waiting to be signalled through their MCS node to overtake the responsibility of the head. In this commit, we are concerned with the second and third kind. First, we augment the waiting loop of the head of the wait queue with a timeout. When this timeout happens, all waiters part of the wait queue will abort their lock acquisition attempts. This happens in three steps. First, the head breaks out of its loop waiting for pending and locked bits to turn to 0, and non-head waiters break out of their MCS node spin (more on that later). Next, every waiter (head or non-head) attempts to check whether they are also the tail waiter, in such a case they attempt to zero out the tail word and allow a new queue to be built up for this lock. If they succeed, they have no one to signal next in the queue to stop spinning. Otherwise, they signal the MCS node of the next waiter to break out of its spin and try resetting the tail word back to 0. This goes on until the tail waiter is found. 
In case of races, the new tail will be responsible for performing the same task, as the old tail will then fail to reset the tail word and wait for its next pointer to be updated before it signals the new tail to do the same. We terminate the whole wait queue because of two main reasons. Firstly, we eschew per-waiter timeouts with one applied at the head of the wait queue. This allows everyone to break out faster once we've seen the owner / pending waiter not responding for the timeout duration from the head. Secondly, it avoids complicated synchronization, because when not leaving in FIFO order, prev's next pointer needs to be fixed up etc. Lastly, all of these waiters release the rqnode and return to the caller. This patch underscores the point that rqspinlock's timeout does not apply to each waiter individually, and cannot be relied upon as an upper bound. It is possible for the rqspinlock waiters to return early from a failed lock acquisition attempt as soon as stalls are detected. The head waiter cannot directly WRITE_ONCE the tail to zero, as it may race with a concurrent xchg and a non-head waiter linking its MCS node to the head's MCS node through 'prev->next' assignment. One notable thing is that we must use RES_DEF_TIMEOUT * 2 as our maximum duration for the waiting loop (for the wait queue head), since we may have both the owner and pending bit waiter ahead of us, and in the worst case, need to span their maximum permitted critical section lengths. Reviewed-by: Barret Rhoden Signed-off-by: Kumar Kartikeya Dwivedi --- kernel/bpf/rqspinlock.c | 55 ++++++++++++++++++++++++++++++++++++++--- kernel/bpf/rqspinlock.h | 48 +++++++++++++++++++++++++++++++++++ 2 files changed, 100 insertions(+), 3 deletions(-) create mode 100644 kernel/bpf/rqspinlock.h diff --git a/kernel/bpf/rqspinlock.c b/kernel/bpf/rqspinlock.c index 262294cfd36f..65c2b41d8937 100644 --- a/kernel/bpf/rqspinlock.c +++ b/kernel/bpf/rqspinlock.c @@ -77,6 +77,8 @@ struct rqspinlock_timeout { u16 spin; }; +#define RES_TIMEOUT_VAL 2 + static noinline int check_timeout(struct rqspinlock_timeout *ts) { u64 time = ktime_get_mono_fast_ns(); @@ -325,12 +327,18 @@ int __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val) * head of the waitqueue. */ if (old & _Q_TAIL_MASK) { + int val; + prev = decode_tail(old, rqnodes); /* Link @node into the waitqueue. */ WRITE_ONCE(prev->next, node); - arch_mcs_spin_lock_contended(&node->locked); + val = arch_mcs_spin_lock_contended(&node->locked); + if (val == RES_TIMEOUT_VAL) { + ret = -EDEADLK; + goto waitq_timeout; + } /* * While waiting for the MCS lock, the next pointer may have @@ -353,8 +361,49 @@ int __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val) * store-release that clears the locked bit and create lock * sequentiality; this is because the set_locked() function below * does not imply a full barrier. + * + * We use RES_DEF_TIMEOUT * 2 as the duration, as RES_DEF_TIMEOUT is + * meant to span maximum allowed time per critical section, and we may + * have both the owner of the lock and the pending bit waiter ahead of + * us. */ - val = atomic_cond_read_acquire(&lock->val, !(VAL & _Q_LOCKED_PENDING_MASK)); + RES_RESET_TIMEOUT(ts, RES_DEF_TIMEOUT * 2); + val = res_atomic_cond_read_acquire(&lock->val, !(VAL & _Q_LOCKED_PENDING_MASK) || + RES_CHECK_TIMEOUT(ts, ret)); + +waitq_timeout: + if (ret) { + /* + * If the tail is still pointing to us, then we are the final waiter, + * and are responsible for resetting the tail back to 0. 
Otherwise, if + * the cmpxchg operation fails, we signal the next waiter to take exit + * and try the same. For a waiter with tail node 'n': + * + * n,*,* -> 0,*,* + * + * When performing cmpxchg for the whole word (NR_CPUS > 16k), it is + * possible locked/pending bits keep changing and we see failures even + * when we remain the head of wait queue. However, eventually, + * pending bit owner will unset the pending bit, and new waiters + * will queue behind us. This will leave the lock owner in + * charge, and it will eventually either set locked bit to 0, or + * leave it as 1, allowing us to make progress. + * + * We terminate the whole wait queue for two reasons. Firstly, + * we eschew per-waiter timeouts with one applied at the head of + * the wait queue. This allows everyone to break out faster + * once we've seen the owner / pending waiter not responding for + * the timeout duration from the head. Secondly, it avoids + * complicated synchronization, because when not leaving in FIFO + * order, prev's next pointer needs to be fixed up etc. + */ + if (!try_cmpxchg_tail(lock, tail, 0)) { + next = smp_cond_load_relaxed(&node->next, VAL); + WRITE_ONCE(next->locked, RES_TIMEOUT_VAL); + } + lockevent_inc(rqspinlock_lock_timeout); + goto release; + } /* * claim the lock: @@ -399,6 +448,6 @@ int __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val) * release the node */ __this_cpu_dec(rqnodes[0].mcs.count); - return 0; + return ret; } EXPORT_SYMBOL_GPL(resilient_queued_spin_lock_slowpath); diff --git a/kernel/bpf/rqspinlock.h b/kernel/bpf/rqspinlock.h new file mode 100644 index 000000000000..5d8cb1b1aab4 --- /dev/null +++ b/kernel/bpf/rqspinlock.h @@ -0,0 +1,48 @@ +/* SPDX-License-Identifier: GPL-2.0-or-later */ +/* + * Resilient Queued Spin Lock defines + * + * (C) Copyright 2024-2025 Meta Platforms, Inc. and affiliates. + * + * Authors: Kumar Kartikeya Dwivedi + */ +#ifndef __LINUX_RQSPINLOCK_H +#define __LINUX_RQSPINLOCK_H + +#include "../locking/qspinlock.h" + +/* + * try_cmpxchg_tail - Return result of cmpxchg of tail word with a new value + * @lock: Pointer to queued spinlock structure + * @tail: The tail to compare against + * @new_tail: The new queue tail code word + * Return: Bool to indicate whether the cmpxchg operation succeeded + * + * This is used by the head of the wait queue to clean up the queue. + * Provides relaxed ordering, since observers only rely on initialized + * state of the node which was made visible through the xchg_tail operation, + * i.e. through the smp_wmb preceding xchg_tail. + * + * We avoid using 16-bit cmpxchg, which is not available on all architectures. + */ +static __always_inline bool try_cmpxchg_tail(struct qspinlock *lock, u32 tail, u32 new_tail) +{ + u32 old, new; + + old = atomic_read(&lock->val); + do { + /* + * Is the tail part we compare to already stale? Fail. + */ + if ((old & _Q_TAIL_MASK) != tail) + return false; + /* + * Encode latest locked/pending state for new tail. 
+ */ + new = (old & _Q_LOCKED_PENDING_MASK) | new_tail; + } while (!atomic_try_cmpxchg_relaxed(&lock->val, &old, new)); + + return true; +} + +#endif /* __LINUX_RQSPINLOCK_H */ From patchwork Sun Mar 16 04:05:27 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kumar Kartikeya Dwivedi X-Patchwork-Id: 14018320 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id A7EABC282DE for ; Sun, 16 Mar 2025 04:26:21 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=bombadil.20210309; h=Sender:List-Subscribe:List-Help :List-Post:List-Archive:List-Unsubscribe:List-Id:Content-Transfer-Encoding: MIME-Version:References:In-Reply-To:Message-ID:Date:Subject:Cc:To:From: Reply-To:Content-Type:Content-ID:Content-Description:Resent-Date:Resent-From: Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID:List-Owner; bh=HX6HuMB01wWAUhOciNQt+DcJpa4mtBn0KQKhBNdUTd8=; b=BQgoT29Vi4fRoybWUfHlrpESZG 1QKTIneNKaJVF2c1Pnmms6vHc8cvkq0ecaRT/Tz7GjluSiFjexYDshutCIbTowfaNQw0ecvyxXxwl euRzlGVZlD6bOt2DRtPm1qXmOHc3b5rMhkX7pTuamfQCkLxu+Vb1dc/w1okBUIkHBI+YXyzqHxqcX AR5S7uqnGOK3xzq+bXByEjo28Mos5vtvZ1j6/RMB+S2PblVEKTIHnjFiLk+8B4HhaR+arD21KN734 TQ2hOb/FLbCSK4Tvs7+9uNioYpvEfLXrDMGu6lC7C4/O+mQUEdcqDV0MG+nycR7/z69EAMgFenSUG XpyU9R4w==; Received: from localhost ([::1] helo=bombadil.infradead.org) by bombadil.infradead.org with esmtp (Exim 4.98 #2 (Red Hat Linux)) id 1ttfZl-0000000HF5D-285O; Sun, 16 Mar 2025 04:26:13 +0000 Received: from mail-wm1-x342.google.com ([2a00:1450:4864:20::342]) by bombadil.infradead.org with esmtps (Exim 4.98 #2 (Red Hat Linux)) id 1ttfGA-0000000HCGm-1z53 for linux-arm-kernel@lists.infradead.org; Sun, 16 Mar 2025 04:05:59 +0000 Received: by mail-wm1-x342.google.com with SMTP id 5b1f17b1804b1-43d04ea9d9aso3923145e9.3 for ; Sat, 15 Mar 2025 21:05:58 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1742097957; x=1742702757; darn=lists.infradead.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=HX6HuMB01wWAUhOciNQt+DcJpa4mtBn0KQKhBNdUTd8=; b=enOdna0yVBWP5aKiY+KtxDxEhdH3gf3z7YGuh4gVOjfEDQGn/ZzUgYbKWzG3RcUwmD VgeMp90yQKrPCNPOkdpGIqTbYHKLnsY2tNUiLhrLuytyOdDppRUz8vo6PrC1355JV+fp y0+RT0OYBzrxB00AEG6onYzenVPFSFm5r00dV4hHgOx4oxfXVMr2wsfW/0lkJd9+yg1j Kdma1o81Yv7qni7h+Py1Nw/mcdti5rcIeLQYCS0xkDRLNCV5NxwAkS35Qjie92qX0E53 Rd+aL7oncHgwzFErS5c0su0syZDzKkSz7e4egs00WgCgM5EWCI7vkwa9ORvPGZ8rK86y 4KqQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1742097957; x=1742702757; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=HX6HuMB01wWAUhOciNQt+DcJpa4mtBn0KQKhBNdUTd8=; b=VtWEmweonhepNeTS/A7XshDCT1y7Of51iEDOVUwPZZ4mMPW7uFr4AvWMKcBth4qeCL hZd+PTmOeEZnMtrVYCZRXj2l93JOPbFoULj8Qw0ju/OeK4wc4Ex/lIZc+OjRMayU78ng QeZ+0xHOjFWxzsTJvoJa5mmEL/cuykyo/OhOM2JErCgoryNncp224Aw67FFh/AZOGXqj KBwfpH4aEIvMEw4QYhZ1vDVzGs/O4zOfb0yQI+j8OlLt36ywQR8TqmtcFiATOaKtBcO/ sju0BP9nJpmH4GIumD2kStPUzMsWuYwlLZhUCvZ4Aq8Qw1+JvJ+jAmkoBb2/CHYlJ0nq 0dZw== 
X-Forwarded-Encrypted: i=1; AJvYcCWA+wxYmPcuhRjOlbFs8wOeJQdXFcu93Xc1Noawg4wkbfc/AYBV7nOmr01LmRjnr1dZfB/GWeTbQ8qp1kMCa/si@lists.infradead.org X-Gm-Message-State: AOJu0Yz3nqCb/mi/6a5UWpUEECN3AEMwXiSmjgL4wuojcWqHydL/fRpr Zp5tSwojzVVNDmP9rUshNhKvdM8PX0INHvUixi6WXtWsKqdnkWfnqmH7VsBE+w8= X-Gm-Gg: ASbGnctUPnKzMMj2zLk8oeiw9qKLmSo/Puh8/J7Tog+Hk/eYdThww/wZ7OoMYBBwpdW kD6r/QCGbYRW1Z5yGMQtQ906A11L3v1rQ/26Sm/mKzBvR+4uZDhmevz0lxSLzKfm8g6hb+tFuAm MkTBXzdhK/2WPBzSw0SDLsIfAvcoK/h7lr1ydben4/ZA68W28YAAfjM6wNOE4vkq9CDRpf4ZbuW YKUAaDuR9aUiitOV8lW7fghvMeGLzHm+S5avI00PzcId/NTh9Mm3OqkTR/NdCx79SJwRcGtbHeR n3kF/pb9jWzaXVXQFmn7LxIth52OKirWeuIgahsI3PU2ow== X-Google-Smtp-Source: AGHT+IGdjZLS03B/i/oyj0FVCH6ak251OqqszbVDTcp6yx3FeG6rdnMDu6XNNYRan/DL0Mz7Gi3+Xw== X-Received: by 2002:a05:600c:1548:b0:43c:e2dd:98f3 with SMTP id 5b1f17b1804b1-43d1ecff3d0mr77112145e9.21.1742097957130; Sat, 15 Mar 2025 21:05:57 -0700 (PDT) Received: from localhost ([2a03:2880:31ff:4c::]) by smtp.gmail.com with ESMTPSA id 5b1f17b1804b1-43d200fad59sm67783415e9.26.2025.03.15.21.05.56 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sat, 15 Mar 2025 21:05:56 -0700 (PDT) From: Kumar Kartikeya Dwivedi To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org Cc: Barret Rhoden , Linus Torvalds , Peter Zijlstra , Will Deacon , Waiman Long , Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , Martin KaFai Lau , Eduard Zingerman , "Paul E. McKenney" , Tejun Heo , Josh Don , Dohyun Kim , linux-arm-kernel@lists.infradead.org, kkd@meta.com, kernel-team@meta.com Subject: [PATCH bpf-next v4 11/25] rqspinlock: Protect waiters in trylock fallback from stalls Date: Sat, 15 Mar 2025 21:05:27 -0700 Message-ID: <20250316040541.108729-12-memxor@gmail.com> X-Mailer: git-send-email 2.47.1 In-Reply-To: <20250316040541.108729-1-memxor@gmail.com> References: <20250316040541.108729-1-memxor@gmail.com> MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=1825; h=from:subject; bh=6ga9NaPnm2qscV5e908A787vKJcwTZL0L+jNyYJdaTA=; b=owEBbQKS/ZANAwAIAUzgyIZIvxHKAcsmYgBn1k3cd+lc2mT7eAYTtSCeru7PXMbtZTLl1rVKe2Na zqq61WeJAjMEAAEIAB0WIQRLvip+Buz51YI8YRFM4MiGSL8RygUCZ9ZN3AAKCRBM4MiGSL8RymCDEA C4K937bgxXiBFwWg7wVsj1Ouwcy1m1SvtfIZLVr3xtAqcbiYKXGG0i5pXm3XBUM2mgmI9CropTshbQ dQWs7kFH1UBO0zy35y69RMSbrW+XYvOIUv+szj8w1OnAXfbt5cxjZY2oWdwBfSiPYvDI8qqWAJq92x Eu9XxhjsiU3DKxEIhABz7oovXElnhcqiaJX0bMqx1PJqHQEUx+v8FQX4j/K/bNuVyKsahTVF0QLweA VdrXkytJDmq29EvwZiID/KwMdFSAlK8imRvbwx38oDpj5kOvmKd2YPf+9yf+Q1X6A2Om6Z42ulZ+ju 44r5dNM2UD3qahjawRGUazaB+18OIHX1yD++NqUAjsQFympNzvBFjjeeC8ffwxmNJkaXt7zT43YAqL /uOkIADmjS54logpdlm10XVwYzPrtEOTELnZknQ8zGIG8h4on+rvZfgi++0tV7gs78T0qEEknIsIYB 0uXO7Y1CGP+IevSyeQ3EJvHV2IYztylGYEHyA9GKqpmo5nBFslUK7B3Y9WsiuQJFs3DzzTDWNjIOsc 8Y37rSMW+XsAoDPAbL+h6fkt5rkhyDfpkpnfY7TEZ9vp3KcnIvAS4dh/o691vcK4ZXIZx+LGjvb7WN VxSTT/cmkbB6g757jZk6EG/qXsjkzmmz6wFs7p7bSxl0HNH1488vWapQ0xWw== X-Developer-Key: i=memxor@gmail.com; a=openpgp; fpr=4BBE2A7E06ECF9D5823C61114CE0C88648BF11CA X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20250315_210558_527222_1FA9224F X-CRM114-Status: GOOD ( 15.22 ) X-BeenThere: linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: "linux-arm-kernel" Errors-To: linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org When we run out of maximum rqnodes, the original queued spin lock slow path falls back to a try lock. 
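As a rough sketch of the change made here (the exact hunk is in the diff below), the node-exhaustion path becomes a trylock loop bounded by the RES_RESET_TIMEOUT()/RES_CHECK_TIMEOUT() helpers added earlier in this series:

	if (unlikely(idx >= _Q_MAX_NODES)) {
		lockevent_inc(lock_no_node);
		/*
		 * Out of per-CPU MCS nodes: spin on trylock, but give up once
		 * RES_DEF_TIMEOUT expires instead of spinning indefinitely.
		 */
		RES_RESET_TIMEOUT(ts, RES_DEF_TIMEOUT);
		while (!queued_spin_trylock(lock)) {
			if (RES_CHECK_TIMEOUT(ts, ret))
				break;
			cpu_relax();
		}
		goto release;
	}
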
In such a case, we are again susceptible to stalls in case the lock owner fails to make progress. We use the timeout as a fallback to break out of this loop and return to the caller. This is a fallback for an extreme edge case, when on the same CPU we run out of all 4 qnodes. When could this happen? We are in the slow path in task context, we get interrupted by an IRQ, which while in the slow path gets interrupted by an NMI, which in the slow path gets another nested NMI, which enters the slow path. All of the interruptions happen after node->count++. We use RES_DEF_TIMEOUT as our spinning duration, but in the case of this fallback, no fairness is guaranteed, so the duration may be too small for contended cases, as the waiting time is not bounded. Since this is an extreme corner case, let's just prefer timing out instead of attempting to spin for longer. Reviewed-by: Barret Rhoden Signed-off-by: Kumar Kartikeya Dwivedi --- kernel/bpf/rqspinlock.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/kernel/bpf/rqspinlock.c b/kernel/bpf/rqspinlock.c index 65c2b41d8937..361d452f027c 100644 --- a/kernel/bpf/rqspinlock.c +++ b/kernel/bpf/rqspinlock.c @@ -275,8 +275,14 @@ int __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val) */ if (unlikely(idx >= _Q_MAX_NODES)) { lockevent_inc(lock_no_node); - while (!queued_spin_trylock(lock)) + RES_RESET_TIMEOUT(ts, RES_DEF_TIMEOUT); + while (!queued_spin_trylock(lock)) { + if (RES_CHECK_TIMEOUT(ts, ret)) { + lockevent_inc(rqspinlock_lock_timeout); + break; + } cpu_relax(); + } goto release; } From patchwork Sun Mar 16 04:05:28 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kumar Kartikeya Dwivedi X-Patchwork-Id: 14018335 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 4C3DDC282DE for ; Sun, 16 Mar 2025 04:28:06 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=bombadil.20210309; h=Sender:List-Subscribe:List-Help :List-Post:List-Archive:List-Unsubscribe:List-Id:Content-Transfer-Encoding: MIME-Version:References:In-Reply-To:Message-ID:Date:Subject:Cc:To:From: Reply-To:Content-Type:Content-ID:Content-Description:Resent-Date:Resent-From: Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID:List-Owner; bh=Trt2PZaoDvW1qF5TKGphGKbeL9d09FYulI4YnHalG2E=; b=A+/xU7iiR8MAL/CUUvaXWjDceN 1pWORZPEVV6KS6EfBJnEwi6gqOePC2W5JBSDEH3U99AfKmC5Qdy1g6DQ4ehVX7Mn13oBVvPbT7zlE 2o8OTqkqb67PMm9SK2lCDUFV4LGZtQFyNsAhzjZKrB/jQfAorOddLBJGY7y+7QZ23TUIOG62EOEpn C2evAFCcRXsM1LC+WNnPb3h1tFJSVpu+8YNevYa5YZgu4thMbRUgtyuUxizolIrV7sOt2AHJLtbeN fBwElQ9dHu1u8pZnEXl3dqMNi61wRoKAKVTtbV4s96AG/qh5a6NKh5DNn4IuRqRuW2de4r8yhFRe5 qnWanqQQ==; Received: from localhost ([::1] helo=bombadil.infradead.org) by bombadil.infradead.org with esmtp (Exim 4.98 #2 (Red Hat Linux)) id 1ttfbP-0000000HFGO-1G48; Sun, 16 Mar 2025 04:27:55 +0000 Received: from mail-wm1-x343.google.com ([2a00:1450:4864:20::343]) by bombadil.infradead.org with esmtps (Exim 4.98 #2 (Red Hat Linux)) id 1ttfGC-0000000HCHc-1ZLk for linux-arm-kernel@lists.infradead.org; Sun, 16 Mar 2025 04:06:02 +0000 Received: by mail-wm1-x343.google.com with SMTP id 5b1f17b1804b1-43d0618746bso7043755e9.2
for ; Sat, 15 Mar 2025 21:06:00 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1742097959; x=1742702759; darn=lists.infradead.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=Trt2PZaoDvW1qF5TKGphGKbeL9d09FYulI4YnHalG2E=; b=E8cqzRDl4ZNoYPyohZgjQpyIA7A3hbJUYe4O3xAMPDD0LAqv9E71ghaF8VTN5sROqO ce5rpC4+g7pKebYpd4ljyQhzf2u4tdWP+SNHX1vJomtZcHHW2lcbzcXd4HlujZ+0T4Bp DNVEKRMruW5ykAIt5Ui0TmvOcllhRpcreJnMjtRiDReNVaQRTm6B9hJCIDmnxjX0aVj6 j5J07xS2DerrjA6774AtrdesMGOnoof2nGYAthEBl8voAlDD6K2BA8c/kjK10gczWBK9 3XO0OfBlAHwT+RN7NZNYfmhS1HgjKWH2I6OWjjxwjK4d3YQnttXqoAIOhMAwCMP8rTfR bBxg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1742097959; x=1742702759; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=Trt2PZaoDvW1qF5TKGphGKbeL9d09FYulI4YnHalG2E=; b=Q6/pJFyf+0wVZhff4ef/mdad1G7vBvv8ZmWMIqVesV/9cjuvsXIH7RmlGFlIZk5ykV /aKpZGflZZ1h0zSLOboFlM3IxWlxBwFYOcJzeQC2tC5ci/4UTWdN0tYUtIfRI0qKMBmr RN01rtTBdik7rr0pkjCfRIxmoh6qf7uwts+ZHjXthMBeM3E8MNNRaESJQF2dVwUyhgN1 dOBITfTLNNcNsZjKx6hwbYgDCrbcixrI33LXFVimjRK6OD7U/a8IZep4+vzyjiL98Fkr hfTLeso+pYHrVSi4pcizOZ1r6N8f2xkkswyel7HgDMDMfL3oZBwjjwWXumeE5iDT8jRp nlvw== X-Forwarded-Encrypted: i=1; AJvYcCXmJfvuo3evzJUwhLP/nn1xobDuIQHRlyV0i7WixwprZpctyBJvSp1Fz8n4kbT2z3fKqyP01aHaq6ksRSakSuVd@lists.infradead.org X-Gm-Message-State: AOJu0Yz5es1VgtSuI1WFwIQCnGLsg9FuO8UQoUiHKQq2A8et42K3gNJ2 RXcPGFu7ZWrFKGd1Wfu5hrJY//JeAbTsPQHqW3gwGD3NHGo2jIIeJlN43m7FWfs= X-Gm-Gg: ASbGnctSYlUi175P2KtA5up0RfKYHZE2sr+MlyyTlQhFY12HHLmifBKAwzLkqhjcDdY aVdT3BDJ6pmylA0Kdd2euG/rOr9QZt3SuetAmEOYkfYV4mrjTJzAxKyGee0XfxGiP/zAgJFKva3 3zZLZ3LD3dcImjsB3BW5HBUTtz5P/DJDo2Mzdu1exaBiciO8eRp57bp6Za3njupZOzA1Blv7LmL xy+quRkOE1XMjqPNIIVBDXzc0bjt9E2wQ0XC66MHNaOXs8wzq3mJW/GUsKthqjL/jONUS+1F1Kr 3fVsZeI6UuSChiB5qly6/RxocauAZLSeHpY= X-Google-Smtp-Source: AGHT+IELCRC7Npj53fsWLLPGTTMW2S8mtuRNJw4+tQrj7Ej7EcmIuAdgGibINSTvjMReh3aiCtM8KA== X-Received: by 2002:a5d:6da1:0:b0:391:4389:f363 with SMTP id ffacd0b85a97d-3971ee44e17mr9518660f8f.21.1742097958853; Sat, 15 Mar 2025 21:05:58 -0700 (PDT) Received: from localhost ([2a03:2880:31ff:72::]) by smtp.gmail.com with ESMTPSA id ffacd0b85a97d-395c8975b90sm11082741f8f.53.2025.03.15.21.05.57 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sat, 15 Mar 2025 21:05:57 -0700 (PDT) From: Kumar Kartikeya Dwivedi To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org Cc: Linus Torvalds , Peter Zijlstra , Will Deacon , Waiman Long , Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , Martin KaFai Lau , Eduard Zingerman , "Paul E. 
McKenney" , Tejun Heo , Barret Rhoden , Josh Don , Dohyun Kim , linux-arm-kernel@lists.infradead.org, kkd@meta.com, kernel-team@meta.com Subject: [PATCH bpf-next v4 12/25] rqspinlock: Add deadlock detection and recovery Date: Sat, 15 Mar 2025 21:05:28 -0700 Message-ID: <20250316040541.108729-13-memxor@gmail.com> X-Mailer: git-send-email 2.47.1 In-Reply-To: <20250316040541.108729-1-memxor@gmail.com> References: <20250316040541.108729-1-memxor@gmail.com> MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=16807; h=from:subject; bh=R8rK3hAtLu0/qV+H7D4vZfaR/OW4OI0d017EzOpkC74=; b=owEBbQKS/ZANAwAIAUzgyIZIvxHKAcsmYgBn1k3d3cpOPE+JD/6A6QOyXue338yygczvobcBs9p0 t9lKfEyJAjMEAAEIAB0WIQRLvip+Buz51YI8YRFM4MiGSL8RygUCZ9ZN3QAKCRBM4MiGSL8RyulYD/ oCAFAcDJwUGqxq0HokrhAu3NOXVZPUQzrD1Fd/bAV3pkN73meLLVeovkYVI0ve7Kjx2/hcMJyYvJbr jkpwrffoXO8Xr768OKNR4mBc6jrytY5czg/+e0BKRXct/n5IFosZ8+tHgjaY9j5RcQcq7eRwUQnxnn Q7S84XghkBroVcQtWuJM98v/KONhHgJCpGMQAAUU32SNWJkc28V4HIWt9hoQsoIi6CJTX8Uc88a/Qf FVOQlNaBXGiYKMb9K3kOjA41VLmhe565kuFeGvXUnJi+Cn9xac01lMqZRoHRIVKeHgz/mo0i+F7abp eofFVG0gbpmw0Gb+GlDHtD6UztvzXQW6TUTZqpfs9TXc5ti4+Nv12bATZP9jhNTd3dmKxw+z7JScl4 BMgrqxtE/WrA7jZVALBqwJ2kPbQolbz8sbNHH5z0796JP71TrPx7mfKFi2Us1I0Nv+hUkuwarjUzIt VJhlvGVJXd9BRHbCHCSIq+sh7i5y3bcjHmfFhim9zewk0L8xWupkFUohlI78gsDmgMlQl7f9w8VVza 8hOQDu9DZMx4wjyXZL5LKr4XOZiXlowtWpcR++9jgQoihCpQXlPZJouhUpu2Gt9rOQdgIMp/8Ws4U2 ep59HLE49pV/ul4ULzQIO5O665ynyxzgE/f/3H6pSPpftXxAxajnOTA5uYRw== X-Developer-Key: i=memxor@gmail.com; a=openpgp; fpr=4BBE2A7E06ECF9D5823C61114CE0C88648BF11CA X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20250315_210600_568117_3409E6B9 X-CRM114-Status: GOOD ( 38.34 ) X-BeenThere: linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: "linux-arm-kernel" Errors-To: linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org While the timeout logic provides guarantees for the waiter's forward progress, the time until a stalling waiter unblocks can still be long. The default timeout of 1/4 sec can be excessively long for some use cases. Additionally, custom timeouts may exacerbate recovery time. Introduce logic to detect common cases of deadlocks and perform quicker recovery. This is done by dividing the time from entry into the locking slow path until the timeout into intervals of 1 ms. Then, after each interval elapses, deadlock detection is performed, while also polling the lock word to ensure we can quickly break out of the detection logic and proceed with lock acquisition. A 'held_locks' table is maintained per-CPU where the entry at the bottom denotes a lock being waited for or already taken. Entries coming before it denote locks that are already held. The current CPU's table can thus be looked at to detect AA deadlocks. The tables from other CPUs can be looked at to discover ABBA situations. Finally, when a matching entry for the lock being taken on the current CPU is found on some other CPU, a deadlock situation is detected. This function can take a long time, therefore the lock word is constantly polled in each loop iteration to ensure we can preempt detection and proceed with lock acquisition, using the is_lock_released check. We set 'spin' member of rqspinlock_timeout struct to 0 to trigger deadlock checks immediately to perform faster recovery. 
Note: Extending lock word size by 4 bytes to record owner CPU can allow faster detection for ABBA. It is typically the owner which participates in a ABBA situation. However, to keep compatibility with existing lock words in the kernel (struct qspinlock), and given deadlocks are a rare event triggered by bugs, we choose to favor compatibility over faster detection. The release_held_lock_entry function requires an smp_wmb, while the release store on unlock will provide the necessary ordering for us. Add comments to document the subtleties of why this is correct. It is possible for stores to be reordered still, but in the context of the deadlock detection algorithm, a release barrier is sufficient and needn't be stronger for unlock's case. Signed-off-by: Kumar Kartikeya Dwivedi --- include/asm-generic/rqspinlock.h | 100 +++++++++++++++++ kernel/bpf/rqspinlock.c | 187 ++++++++++++++++++++++++++++--- 2 files changed, 273 insertions(+), 14 deletions(-) diff --git a/include/asm-generic/rqspinlock.h b/include/asm-generic/rqspinlock.h index 9bd11cb7acd6..34c3dcb4299e 100644 --- a/include/asm-generic/rqspinlock.h +++ b/include/asm-generic/rqspinlock.h @@ -11,6 +11,7 @@ #include #include +#include struct qspinlock; typedef struct qspinlock rqspinlock_t; @@ -22,4 +23,103 @@ extern int resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val); */ #define RES_DEF_TIMEOUT (NSEC_PER_SEC / 4) +/* + * Choose 31 as it makes rqspinlock_held cacheline-aligned. + */ +#define RES_NR_HELD 31 + +struct rqspinlock_held { + int cnt; + void *locks[RES_NR_HELD]; +}; + +DECLARE_PER_CPU_ALIGNED(struct rqspinlock_held, rqspinlock_held_locks); + +static __always_inline void grab_held_lock_entry(void *lock) +{ + int cnt = this_cpu_inc_return(rqspinlock_held_locks.cnt); + + if (unlikely(cnt > RES_NR_HELD)) { + /* Still keep the inc so we decrement later. */ + return; + } + + /* + * Implied compiler barrier in per-CPU operations; otherwise we can have + * the compiler reorder inc with write to table, allowing interrupts to + * overwrite and erase our write to the table (as on interrupt exit it + * will be reset to NULL). + * + * It is fine for cnt inc to be reordered wrt remote readers though, + * they won't observe our entry until the cnt update is visible, that's + * all. + */ + this_cpu_write(rqspinlock_held_locks.locks[cnt - 1], lock); +} + +/* + * We simply don't support out-of-order unlocks, and keep the logic simple here. + * The verifier prevents BPF programs from unlocking out-of-order, and the same + * holds for in-kernel users. + * + * It is possible to run into misdetection scenarios of AA deadlocks on the same + * CPU, and missed ABBA deadlocks on remote CPUs if this function pops entries + * out of order (due to lock A, lock B, unlock A, unlock B) pattern. The correct + * logic to preserve right entries in the table would be to walk the array of + * held locks and swap and clear out-of-order entries, but that's too + * complicated and we don't have a compelling use case for out of order unlocking. + */ +static __always_inline void release_held_lock_entry(void) +{ + struct rqspinlock_held *rqh = this_cpu_ptr(&rqspinlock_held_locks); + + if (unlikely(rqh->cnt > RES_NR_HELD)) + goto dec; + WRITE_ONCE(rqh->locks[rqh->cnt - 1], NULL); +dec: + /* + * Reordering of clearing above with inc and its write in + * grab_held_lock_entry that came before us (in same acquisition + * attempt) is ok, we either see a valid entry or NULL when it's + * visible. 
+ * + * But this helper is invoked when we unwind upon failing to acquire the + * lock. Unlike the unlock path which constitutes a release store after + * we clear the entry, we need to emit a write barrier here. Otherwise, + * we may have a situation as follows: + * + * for lock B + * release_held_lock_entry + * + * try_cmpxchg_acquire for lock A + * grab_held_lock_entry + * + * Lack of any ordering means reordering may occur such that dec, inc + * are done before entry is overwritten. This permits a remote lock + * holder of lock B (which this CPU failed to acquire) to now observe it + * as being attempted on this CPU, and may lead to misdetection (if this + * CPU holds a lock it is attempting to acquire, leading to false ABBA + * diagnosis). + * + * In case of unlock, we will always do a release on the lock word after + * releasing the entry, ensuring that other CPUs cannot hold the lock + * (and make conclusions about deadlocks) until the entry has been + * cleared on the local CPU, preventing any anomalies. Reordering is + * still possible there, but a remote CPU cannot observe a lock in our + * table which it is already holding, since visibility entails our + * release store for the said lock has not retired. + * + * In theory we don't have a problem if the dec and WRITE_ONCE above get + * reordered with each other, we either notice an empty NULL entry on + * top (if dec succeeds WRITE_ONCE), or a potentially stale entry which + * cannot be observed (if dec precedes WRITE_ONCE). + * + * Emit the write barrier _before_ the dec, this permits dec-inc + * reordering but that is harmless as we'd have new entry set to NULL + * already, i.e. they cannot precede the NULL store above. + */ + smp_wmb(); + this_cpu_dec(rqspinlock_held_locks.cnt); +} + #endif /* __ASM_GENERIC_RQSPINLOCK_H */ diff --git a/kernel/bpf/rqspinlock.c b/kernel/bpf/rqspinlock.c index 361d452f027c..bddbcc47d38f 100644 --- a/kernel/bpf/rqspinlock.c +++ b/kernel/bpf/rqspinlock.c @@ -31,6 +31,7 @@ */ #include "../locking/qspinlock.h" #include "../locking/lock_events.h" +#include "rqspinlock.h" /* * The basic principle of a queue-based spinlock can best be understood @@ -74,16 +75,147 @@ struct rqspinlock_timeout { u64 timeout_end; u64 duration; + u64 cur; u16 spin; }; #define RES_TIMEOUT_VAL 2 -static noinline int check_timeout(struct rqspinlock_timeout *ts) +DEFINE_PER_CPU_ALIGNED(struct rqspinlock_held, rqspinlock_held_locks); +EXPORT_SYMBOL_GPL(rqspinlock_held_locks); + +static bool is_lock_released(rqspinlock_t *lock, u32 mask, struct rqspinlock_timeout *ts) +{ + if (!(atomic_read_acquire(&lock->val) & (mask))) + return true; + return false; +} + +static noinline int check_deadlock_AA(rqspinlock_t *lock, u32 mask, + struct rqspinlock_timeout *ts) +{ + struct rqspinlock_held *rqh = this_cpu_ptr(&rqspinlock_held_locks); + int cnt = min(RES_NR_HELD, rqh->cnt); + + /* + * Return an error if we hold the lock we are attempting to acquire. + * We'll iterate over max 32 locks; no need to do is_lock_released. + */ + for (int i = 0; i < cnt - 1; i++) { + if (rqh->locks[i] == lock) + return -EDEADLK; + } + return 0; +} + +/* + * This focuses on the most common case of ABBA deadlocks (or ABBA involving + * more locks, which reduce to ABBA). This is not exhaustive, and we rely on + * timeouts as the final line of defense. 
+ */ +static noinline int check_deadlock_ABBA(rqspinlock_t *lock, u32 mask, + struct rqspinlock_timeout *ts) +{ + struct rqspinlock_held *rqh = this_cpu_ptr(&rqspinlock_held_locks); + int rqh_cnt = min(RES_NR_HELD, rqh->cnt); + void *remote_lock; + int cpu; + + /* + * Find the CPU holding the lock that we want to acquire. If there is a + * deadlock scenario, we will read a stable set on the remote CPU and + * find the target. This would be a constant time operation instead of + * O(NR_CPUS) if we could determine the owning CPU from a lock value, but + * that requires increasing the size of the lock word. + */ + for_each_possible_cpu(cpu) { + struct rqspinlock_held *rqh_cpu = per_cpu_ptr(&rqspinlock_held_locks, cpu); + int real_cnt = READ_ONCE(rqh_cpu->cnt); + int cnt = min(RES_NR_HELD, real_cnt); + + /* + * Let's ensure to break out of this loop if the lock is available for + * us to potentially acquire. + */ + if (is_lock_released(lock, mask, ts)) + return 0; + + /* + * Skip ourselves, and CPUs whose count is less than 2, as they need at + * least one held lock and one acquisition attempt (reflected as top + * most entry) to participate in an ABBA deadlock. + * + * If cnt is more than RES_NR_HELD, it means the current lock being + * acquired won't appear in the table, and other locks in the table are + * already held, so we can't determine ABBA. + */ + if (cpu == smp_processor_id() || real_cnt < 2 || real_cnt > RES_NR_HELD) + continue; + + /* + * Obtain the entry at the top, this corresponds to the lock the + * remote CPU is attempting to acquire in a deadlock situation, + * and would be one of the locks we hold on the current CPU. + */ + remote_lock = READ_ONCE(rqh_cpu->locks[cnt - 1]); + /* + * If it is NULL, we've raced and cannot determine a deadlock + * conclusively, skip this CPU. + */ + if (!remote_lock) + continue; + /* + * Find if the lock we're attempting to acquire is held by this CPU. + * Don't consider the topmost entry, as that must be the latest lock + * being held or acquired. For a deadlock, the target CPU must also + * attempt to acquire a lock we hold, so for this search only 'cnt - 1' + * entries are important. + */ + for (int i = 0; i < cnt - 1; i++) { + if (READ_ONCE(rqh_cpu->locks[i]) != lock) + continue; + /* + * We found our lock as held on the remote CPU. Is the + * acquisition attempt on the remote CPU for a lock held + * by us? If so, we have a deadlock situation, and need + * to recover. + */ + for (int i = 0; i < rqh_cnt - 1; i++) { + if (rqh->locks[i] == remote_lock) + return -EDEADLK; + } + /* + * Inconclusive; retry again later. + */ + return 0; + } + } + return 0; +} + +static noinline int check_deadlock(rqspinlock_t *lock, u32 mask, + struct rqspinlock_timeout *ts) +{ + int ret; + + ret = check_deadlock_AA(lock, mask, ts); + if (ret) + return ret; + ret = check_deadlock_ABBA(lock, mask, ts); + if (ret) + return ret; + + return 0; +} + +static noinline int check_timeout(rqspinlock_t *lock, u32 mask, + struct rqspinlock_timeout *ts) { u64 time = ktime_get_mono_fast_ns(); + u64 prev = ts->cur; if (!ts->timeout_end) { + ts->cur = time; ts->timeout_end = time + ts->duration; return 0; } @@ -91,6 +223,15 @@ static noinline int check_timeout(struct rqspinlock_timeout *ts) if (time > ts->timeout_end) return -ETIMEDOUT; + /* + * A millisecond interval passed from last time? Trigger deadlock + * checks. 
+ */ + if (prev + NSEC_PER_MSEC < time) { + ts->cur = time; + return check_deadlock(lock, mask, ts); + } + return 0; } @@ -99,21 +240,22 @@ static noinline int check_timeout(struct rqspinlock_timeout *ts) * as the macro does internal amortization for us. */ #ifndef res_smp_cond_load_acquire -#define RES_CHECK_TIMEOUT(ts, ret) \ - ({ \ - if (!(ts).spin++) \ - (ret) = check_timeout(&(ts)); \ - (ret); \ +#define RES_CHECK_TIMEOUT(ts, ret, mask) \ + ({ \ + if (!(ts).spin++) \ + (ret) = check_timeout((lock), (mask), &(ts)); \ + (ret); \ }) #else -#define RES_CHECK_TIMEOUT(ts, ret, mask) \ +#define RES_CHECK_TIMEOUT(ts, ret, mask) \ ({ (ret) = check_timeout(&(ts)); }) #endif /* * Initialize the 'spin' member. + * Set spin member to 0 to trigger AA/ABBA checks immediately. */ -#define RES_INIT_TIMEOUT(ts) ({ (ts).spin = 1; }) +#define RES_INIT_TIMEOUT(ts) ({ (ts).spin = 0; }) /* * We only need to reset 'timeout_end', 'spin' will just wrap around as necessary. @@ -142,6 +284,7 @@ static DEFINE_PER_CPU_ALIGNED(struct qnode, rqnodes[_Q_MAX_NODES]); * * Return: * * 0 - Lock was acquired successfully. + * * -EDEADLK - Lock acquisition failed because of AA/ABBA deadlock. * * -ETIMEDOUT - Lock acquisition failed because of timeout. * * (queue tail, pending bit, lock value) @@ -212,6 +355,11 @@ int __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val) goto queue; } + /* + * Grab an entry in the held locks array, to enable deadlock detection. + */ + grab_held_lock_entry(lock); + /* * We're pending, wait for the owner to go away. * @@ -225,7 +373,7 @@ int __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val) */ if (val & _Q_LOCKED_MASK) { RES_RESET_TIMEOUT(ts, RES_DEF_TIMEOUT); - res_smp_cond_load_acquire(&lock->locked, !VAL || RES_CHECK_TIMEOUT(ts, ret)); + res_smp_cond_load_acquire(&lock->locked, !VAL || RES_CHECK_TIMEOUT(ts, ret, _Q_LOCKED_MASK)); } if (ret) { @@ -240,7 +388,7 @@ int __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val) */ clear_pending(lock); lockevent_inc(rqspinlock_lock_timeout); - return ret; + goto err_release_entry; } /* @@ -258,6 +406,11 @@ int __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val) */ queue: lockevent_inc(lock_slowpath); + /* + * Grab deadlock detection entry for the queue path. 
+ */ + grab_held_lock_entry(lock); + node = this_cpu_ptr(&rqnodes[0].mcs); idx = node->count++; tail = encode_tail(smp_processor_id(), idx); @@ -277,9 +430,9 @@ int __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val) lockevent_inc(lock_no_node); RES_RESET_TIMEOUT(ts, RES_DEF_TIMEOUT); while (!queued_spin_trylock(lock)) { - if (RES_CHECK_TIMEOUT(ts, ret)) { + if (RES_CHECK_TIMEOUT(ts, ret, ~0u)) { lockevent_inc(rqspinlock_lock_timeout); - break; + goto err_release_node; } cpu_relax(); } @@ -375,7 +528,7 @@ int __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val) */ RES_RESET_TIMEOUT(ts, RES_DEF_TIMEOUT * 2); val = res_atomic_cond_read_acquire(&lock->val, !(VAL & _Q_LOCKED_PENDING_MASK) || - RES_CHECK_TIMEOUT(ts, ret)); + RES_CHECK_TIMEOUT(ts, ret, _Q_LOCKED_PENDING_MASK)); waitq_timeout: if (ret) { @@ -408,7 +561,7 @@ int __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val) WRITE_ONCE(next->locked, RES_TIMEOUT_VAL); } lockevent_inc(rqspinlock_lock_timeout); - goto release; + goto err_release_node; } /* @@ -455,5 +608,11 @@ int __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val) */ __this_cpu_dec(rqnodes[0].mcs.count); return ret; +err_release_node: + trace_contention_end(lock, ret); + __this_cpu_dec(rqnodes[0].mcs.count); +err_release_entry: + release_held_lock_entry(); + return ret; } EXPORT_SYMBOL_GPL(resilient_queued_spin_lock_slowpath); From patchwork Sun Mar 16 04:05:29 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kumar Kartikeya Dwivedi X-Patchwork-Id: 14018336 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 51EC1C282DE for ; Sun, 16 Mar 2025 04:29:46 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=bombadil.20210309; h=Sender:List-Subscribe:List-Help :List-Post:List-Archive:List-Unsubscribe:List-Id:Content-Transfer-Encoding: MIME-Version:References:In-Reply-To:Message-ID:Date:Subject:Cc:To:From: Reply-To:Content-Type:Content-ID:Content-Description:Resent-Date:Resent-From: Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID:List-Owner; bh=ClBpfryF0l26LJucqaea+hnSd7y5tppZSgQBloKDt3A=; b=MGnhYh5wH8UrbzzWoi5IicaRDx tsTqrwPTch2dSHEG14T8CtNH01uvuEUusHRZ5wEQ4bVfug/beM3aMGvDF+IIu9R+bg5vUPRsPgjMU 2bqN79/wwh0nX1k1NJPcb7dYJP4uPIzvjYuSAa51Md0j4yDQBTC//Rkph4a4c3fz8ENCfIjEJ63bM Nz8btP2R3hk/NjLJ8tCtNRbWkGKBvteRE8tSoDzGvGiXvx6e+KyGX3Ss8/9Tz04EWFPB4eoO0UgyU kunC0fZ5gmenQZutDpuoGk+Tt1ZxU1FWOdjYDE0M/FAq+QREZQcN23PhVR0Fx86KgBUzk8z7raloU 82Cswo7Q==; Received: from localhost ([::1] helo=bombadil.infradead.org) by bombadil.infradead.org with esmtp (Exim 4.98 #2 (Red Hat Linux)) id 1ttfd4-0000000HFUw-0VGe; Sun, 16 Mar 2025 04:29:38 +0000 Received: from mail-wm1-x344.google.com ([2a00:1450:4864:20::344]) by bombadil.infradead.org with esmtps (Exim 4.98 #2 (Red Hat Linux)) id 1ttfGD-0000000HCIL-1lhe for linux-arm-kernel@lists.infradead.org; Sun, 16 Mar 2025 04:06:02 +0000 Received: by mail-wm1-x344.google.com with SMTP id 5b1f17b1804b1-43cf628cb14so5252345e9.1 for ; Sat, 15 Mar 2025 21:06:01 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; 
t=1742097960; x=1742702760; darn=lists.infradead.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=ClBpfryF0l26LJucqaea+hnSd7y5tppZSgQBloKDt3A=; b=UJfaO4QDK/3TRriW65ZaZFNL11A/1/j1Q++M0CwRFxRXQJ6eLYdS5VVwMYxbn8+gJS zSYw1wKmfVpBuhN7RkKHt6kujQ9EFZiKBF0y7mNblgF4PpQKpAY48cJ1Hdnt67iZJbSo VTYwPcOMBCWVwwF+YUjAlgTe/f5oQVIR5MclLMGDLzezOnep/Ct1a7b1pr1FkpxHLhY0 x3GPTaMUV/BWl/SADUUM0fOJYmwd7fAaWOVw/eykOPAw3NemnFTH71KLjnX5alGTRCpN nqJI6AKbNdukwmTyLXogSKhzVGyp9sZyMH0DmNSr1X2xssgKMfQsAqIFcY1QBt+GeDij 83mg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1742097960; x=1742702760; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=ClBpfryF0l26LJucqaea+hnSd7y5tppZSgQBloKDt3A=; b=U04ivbI6/fGhSkHwYZyJVRLLM8JquSU79vhmi+/TFf2FE5IzD5/jpMXEsI2GBTKgBn lPD8Yk0zGQI4i29sjFFxcKnE48mKHSFSGmto/VlypcVgfWpNdor8l9mefkrz9m+U1n7M 4+cfdipWMD1o9fJvwhVJSo4jWwI2YWacnVBD4LZ62p551M2ZJmiTeSgErf+h2daAb5kv McKVIGC1fs5K7qgXBkAJmBkc+qJO04FbbYfcm9+BYY7GkMEvNGMqoI3wSn5PwLduB0D8 JRxO7wgJCzXVQzn5gDtOJ+XLEfK1MUlQ8aPufeKJlCHl/l/8KBkUCkM8MSbfQb0uhjsL 47kQ== X-Forwarded-Encrypted: i=1; AJvYcCUU6+pRnQBWSYXMYS9nxoSAr4+gP3jk1vX61LxKHJXImlVwT2ypcdHUX39BqAbyT+v0l8vqxKBEuwnS+2ZtlFPC@lists.infradead.org X-Gm-Message-State: AOJu0YxTOZGvzlBBMk8uH7uzu5XWu2EbHYBeMO+ysP5368YMcDSnGpPr xJtOFxgBSfwKqQXrPOb+FZleSImhB+lzy0CPBOTfnWMtZ8DNOaIh X-Gm-Gg: ASbGncv45GVAU2vlCEHCaXD+h53yGxN5OVb2bpzFsUyPn7PDhr4ZZkUO0zIw/AHXy5E YeMSat8e7p7tZisOL8+giEUa3nHK1Ni+PcDSOOJtUelZPep45E3dnJ1/UcNj9TSMonMV20Uszzk K3B1eIRAK21CtF0t6xh8Lnzm+F8dXKGFMytDmpSjvIGX5xLzjsQiE53XDaOqFRI+pu0PweNA5TD xC18GZdy5Ngyeb5scdtRnUwGeo8+EmH7IH6kVmo1/8WbdRH4GgMWfGyqBJWiYYwRz0tRw4idpAy HyjvdaGO2sIZeV+fxomUMAw1Xn7rz/Qxk0Y= X-Google-Smtp-Source: AGHT+IH/e6sr7Q8NtaxNCMMeZqQ+z9X8uI7I1YSl5ZwD64++FRmjeUrFOkqAexRXD/7ze8hpT1OmbA== X-Received: by 2002:a05:6000:1562:b0:38d:dc03:a3d6 with SMTP id ffacd0b85a97d-395b70b7668mr13114610f8f.4.1742097960004; Sat, 15 Mar 2025 21:06:00 -0700 (PDT) Received: from localhost ([2a03:2880:31ff:4f::]) by smtp.gmail.com with ESMTPSA id 5b1f17b1804b1-43d1fdda30esm68369765e9.5.2025.03.15.21.05.59 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sat, 15 Mar 2025 21:05:59 -0700 (PDT) From: Kumar Kartikeya Dwivedi To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org Cc: Linus Torvalds , Peter Zijlstra , Will Deacon , Waiman Long , Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , Martin KaFai Lau , Eduard Zingerman , "Paul E. 
McKenney" , Tejun Heo , Barret Rhoden , Josh Don , Dohyun Kim , linux-arm-kernel@lists.infradead.org, kkd@meta.com, kernel-team@meta.com Subject: [PATCH bpf-next v4 13/25] rqspinlock: Add a test-and-set fallback Date: Sat, 15 Mar 2025 21:05:29 -0700 Message-ID: <20250316040541.108729-14-memxor@gmail.com> X-Mailer: git-send-email 2.47.1 In-Reply-To: <20250316040541.108729-1-memxor@gmail.com> References: <20250316040541.108729-1-memxor@gmail.com> MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=4142; h=from:subject; bh=6EUILfgiZohkAPsyvtNzcRk40CzXPZuYHAK+ukHSlzU=; b=owEBbQKS/ZANAwAIAUzgyIZIvxHKAcsmYgBn1k3dWZVH4cGT4jygyYaRMgYBKoSMntnk4oFzqRKP +TJ3LmiJAjMEAAEIAB0WIQRLvip+Buz51YI8YRFM4MiGSL8RygUCZ9ZN3QAKCRBM4MiGSL8RyhmDEA C1zcUyxSUagfAmjyxuovyTwYJTtWqoL5BpkcoMeatgpoXvSFS4tL7BP+3f57t/dG51OVYvBhT2RXN3 nhHuP1IuXSpAdZG+q3swIHEVfIED9SS3PCn0geldDUhkAzxMkpbZqWPwJhh4IcqZB+IWBsN7BZYtW+ eiq/JKntCpyChduNnwrkkDRMWiws3F5N+JllmYp8XC54/J48uYmiFyJaYYEUR7ONy+IL/f21dHKQrT 8xCYRQblvr2tXiLe/rj9uJrq5h9J1j+Qw16WsNZ8WDBUWx52xoX1xV1UdOS/8vQX9HCEOJ8KDbo/db spk5jmSQFHBWJjYK39tihe0CXejyi/J+FpiRzMRNopnL44+efFbw10U5IgaVjMEP9LkozIVmnowmrI jCZ5a0OfO+eZf6xAeHlknuB9N5Zl0SNEZhWh3ayRtPeF/OpHs9sZrE+CaTsjnHnh2g8vIQt/VzzuU9 bJyTo/zGSOiYSFmfYQPN3JfpgR58GeJTqb15WlpNy21ivZWR9dptXHGTECX7xeEoJ7ns+MuNnYAC9e miR4qDevVacWqmiWfTKfc8CbfAmZI//+1NKlJgfhOomrfHwGNvSEeKxTnMs0pANCDx/wKI/kHLuWqC rR72s0G1l2FIBNETNQNhbftHp/sZqx1oEyYMlCsrRgBNyzGuhAHxNLnCg7Ww== X-Developer-Key: i=memxor@gmail.com; a=openpgp; fpr=4BBE2A7E06ECF9D5823C61114CE0C88648BF11CA X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20250315_210601_467976_486A4966 X-CRM114-Status: GOOD ( 18.39 ) X-BeenThere: linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: "linux-arm-kernel" Errors-To: linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org Include a test-and-set fallback when queued spinlock support is not available. Introduce a rqspinlock type to act as a fallback when qspinlock support is absent. Include ifdef guards to ensure the slow path in this file is only compiled when CONFIG_QUEUED_SPINLOCKS=y. Subsequent patches will add further logic to ensure fallback to the test-and-set implementation when queued spinlock support is unavailable on an architecture. Unlike other waiting loops in rqspinlock code, the one for test-and-set has no theoretical upper bound under contention, therefore we need a longer timeout than usual. Bump it up to a second in this case. 
Signed-off-by: Kumar Kartikeya Dwivedi --- include/asm-generic/rqspinlock.h | 17 ++++++++++++ kernel/bpf/rqspinlock.c | 46 ++++++++++++++++++++++++++++++-- 2 files changed, 61 insertions(+), 2 deletions(-) diff --git a/include/asm-generic/rqspinlock.h b/include/asm-generic/rqspinlock.h index 34c3dcb4299e..12f72c4a97cd 100644 --- a/include/asm-generic/rqspinlock.h +++ b/include/asm-generic/rqspinlock.h @@ -12,11 +12,28 @@ #include #include #include +#ifdef CONFIG_QUEUED_SPINLOCKS +#include +#endif + +struct rqspinlock { + union { + atomic_t val; + u32 locked; + }; +}; struct qspinlock; +#ifdef CONFIG_QUEUED_SPINLOCKS typedef struct qspinlock rqspinlock_t; +#else +typedef struct rqspinlock rqspinlock_t; +#endif +extern int resilient_tas_spin_lock(rqspinlock_t *lock); +#ifdef CONFIG_QUEUED_SPINLOCKS extern int resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val); +#endif /* * Default timeout for waiting loops is 0.25 seconds diff --git a/kernel/bpf/rqspinlock.c b/kernel/bpf/rqspinlock.c index bddbcc47d38f..714dfab5caa8 100644 --- a/kernel/bpf/rqspinlock.c +++ b/kernel/bpf/rqspinlock.c @@ -21,7 +21,9 @@ #include #include #include +#ifdef CONFIG_QUEUED_SPINLOCKS #include +#endif #include #include #include @@ -29,9 +31,12 @@ /* * Include queued spinlock definitions and statistics code */ +#ifdef CONFIG_QUEUED_SPINLOCKS #include "../locking/qspinlock.h" #include "../locking/lock_events.h" #include "rqspinlock.h" +#include "../locking/mcs_spinlock.h" +#endif /* * The basic principle of a queue-based spinlock can best be understood @@ -70,8 +75,6 @@ * */ -#include "../locking/mcs_spinlock.h" - struct rqspinlock_timeout { u64 timeout_end; u64 duration; @@ -263,6 +266,43 @@ static noinline int check_timeout(rqspinlock_t *lock, u32 mask, */ #define RES_RESET_TIMEOUT(ts, _duration) ({ (ts).timeout_end = 0; (ts).duration = _duration; }) +/* + * Provide a test-and-set fallback for cases when queued spin lock support is + * absent from the architecture. + */ +int __lockfunc resilient_tas_spin_lock(rqspinlock_t *lock) +{ + struct rqspinlock_timeout ts; + int val, ret = 0; + + RES_INIT_TIMEOUT(ts); + grab_held_lock_entry(lock); + + /* + * Since the waiting loop's time is dependent on the amount of + * contention, a short timeout unlike rqspinlock waiting loops + * isn't enough. Choose a second as the timeout value. + */ + RES_RESET_TIMEOUT(ts, NSEC_PER_SEC); +retry: + val = atomic_read(&lock->val); + + if (val || !atomic_try_cmpxchg(&lock->val, &val, 1)) { + if (RES_CHECK_TIMEOUT(ts, ret, ~0u)) + goto out; + cpu_relax(); + goto retry; + } + + return 0; +out: + release_held_lock_entry(); + return ret; +} +EXPORT_SYMBOL_GPL(resilient_tas_spin_lock); + +#ifdef CONFIG_QUEUED_SPINLOCKS + /* * Per-CPU queue node structures; we can never have more than 4 nested * contexts: task, softirq, hardirq, nmi. 
@@ -616,3 +656,5 @@ int __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val) return ret; } EXPORT_SYMBOL_GPL(resilient_queued_spin_lock_slowpath); + +#endif /* CONFIG_QUEUED_SPINLOCKS */ From patchwork Sun Mar 16 04:05:30 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kumar Kartikeya Dwivedi X-Patchwork-Id: 14018337 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id D516EC282DE for ; Sun, 16 Mar 2025 04:31:30 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=bombadil.20210309; h=Sender:List-Subscribe:List-Help :List-Post:List-Archive:List-Unsubscribe:List-Id:Content-Transfer-Encoding: MIME-Version:References:In-Reply-To:Message-ID:Date:Subject:Cc:To:From: Reply-To:Content-Type:Content-ID:Content-Description:Resent-Date:Resent-From: Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID:List-Owner; bh=2SJ4rKlnRTNLC7EmgTg9GusoyyGplCB9SKvBAwvhMho=; b=QfZbQX1zekUyZdNqwZ6+2JjbFB KO2xK+CuLQ8oNB8zriwExj5ot161HU9uf/WRlsRVdfyw5nh3vDRbmbGoahKYyOTe+11PgYYrrJl27 PeJebTsztjhwAtSNwhDB2pd+clC7IF/PSV1gxqa2kuSwkVr88z9qiY4vhQbkBRjr9/aCGudrfzw1q G8J3IxlfLUbNIST5Ij2XXdV+7wfGI/I3YQohrFXgWJ/KQY4Nrwcrr4Up7e3pY2XRRw878isve66lT 4Cexk6fe0RLKUAhp9Fa33yDLkTd3zE+WmP9NtBDzOokARINs/Q7ShW4JWcwXKtI8Lud6jmQIq6Sjv 8JYfkpUg==; Received: from localhost ([::1] helo=bombadil.infradead.org) by bombadil.infradead.org with esmtp (Exim 4.98 #2 (Red Hat Linux)) id 1ttfei-0000000HG0B-3q5x; Sun, 16 Mar 2025 04:31:20 +0000 Received: from mail-wm1-x342.google.com ([2a00:1450:4864:20::342]) by bombadil.infradead.org with esmtps (Exim 4.98 #2 (Red Hat Linux)) id 1ttfGF-0000000HCJF-0Lfe for linux-arm-kernel@lists.infradead.org; Sun, 16 Mar 2025 04:06:04 +0000 Received: by mail-wm1-x342.google.com with SMTP id 5b1f17b1804b1-4393dc02b78so5089765e9.3 for ; Sat, 15 Mar 2025 21:06:02 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1742097962; x=1742702762; darn=lists.infradead.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=2SJ4rKlnRTNLC7EmgTg9GusoyyGplCB9SKvBAwvhMho=; b=WCeXW5ugTKckp1EO5XXkpb7joOKwJbhMYupici7E/4/hyHXIZ17lI9Rnddy8mCLHy+ jETCr7EqH6WLgzA9z339ujGCXtGDbpRvXmDToZB5mEUyos+y2fP2fRNnaBTKA2FbwKAL Vb6Kk3kFp+xZgbgZ7J5IK1Md3CBsTIEFni7TMjTb1mbKMBbkXWP91KNzFn60Z7ryeBQX DA5uP0m0+tlcdauV1hgquGkfwjo2Z2VsyAFBbEoYLTewI0pctj0pDYtMEd3XA+Cay/Ea m8WzcyCmeke7udW5m7z3Baxh8vzfL9d2VFbvISDpR8RIDoN+e43430l+yfBvW+dgGeXb xVMQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1742097962; x=1742702762; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=2SJ4rKlnRTNLC7EmgTg9GusoyyGplCB9SKvBAwvhMho=; b=md0TtHVQKFmOPkfpRfHCutbngga+0b2c6iOz+9hpC2+TGjwbC3xNFnHPIxsJVvw6vH 7s+5kS1tuoVIEfTu/SaCMKbg4Gdt4JENatZR1uRiIjOvoySC/0bPN80t9jxiWHZcBNxq cIh6Zb4YsLS+q7QeAGwc2MtS8yiwZM4TwO0IpwqsM+IEVPEAPD+eb0sa9+qZPgKErT2/ 19/y1MDRT/ehUzh95dNQDVFnWFFxYK0EpLOLVVvjJbpp31quk0gFZlSlodzoZE3N6Yei 
yp5do+xI6R2SXw+WH/xvTNXIeKBPaFV8taRoxZU5uqM86l5XfPo/fHxEgPb4I0zLUpIZ fVgg== X-Forwarded-Encrypted: i=1; AJvYcCXiSH+onDEVEmGjqSGaD7+2LMICGrYPKnrmwpYj6sH/6FDao9msDPL4rh5qk5QpcOsZs5vUCS52mv9TMoEfIxGH@lists.infradead.org X-Gm-Message-State: AOJu0YytEDlic6j/bDltI55ITANsqiDG/5Fi0XECW8TZnu2SvhMRjLB1 /4TkO6vh+paMFLK4FgsrSMNuNH2Nk6xKwLEtjwWZ3CssqIxQ3QdX X-Gm-Gg: ASbGncuM9zOvOD1jYsFHi8NMIgEGcUV162OqccNh8iiEIECZa6tCMK8i2bYXRFe4Kof ZU75dC3CUhbCkVtMQmGaF9orgS9WEvff3UXyjjx4SThOWmhhNVoAte8jTitbdTjm+IGyiBcesa/ LCbk+b+xHZo3G8KuoejnsQf6oHMGa+Kf48hb5q+RHAredDNJTbvMkaLWTu0tQSvkXo5a1qO6QKn 98jTvUsKH2IUlx1FjhufqC4VUd6QdUQkqPIbDf5FHGcqaxoF3YpjChxeWE96N2To+qc4pf9yzAp 3waeFv2w9PbGG8OXZKDdjGI6ZbQPamgZcmM= X-Google-Smtp-Source: AGHT+IEG8+7fAz+0OnMkkxzP6+jI+ColvkT10MRmYk8Gri8AF6ZxFi7vPhA4DMmTFfNK0LsRQARGzA== X-Received: by 2002:a05:600c:1d1a:b0:43b:cc3c:60bc with SMTP id 5b1f17b1804b1-43d1ec87be2mr100575065e9.15.1742097961450; Sat, 15 Mar 2025 21:06:01 -0700 (PDT) Received: from localhost ([2a03:2880:31ff:70::]) by smtp.gmail.com with ESMTPSA id 5b1f17b1804b1-43d1fe609dasm66578265e9.28.2025.03.15.21.06.00 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sat, 15 Mar 2025 21:06:00 -0700 (PDT) From: Kumar Kartikeya Dwivedi To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org Cc: Linus Torvalds , Peter Zijlstra , Will Deacon , Waiman Long , Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , Martin KaFai Lau , Eduard Zingerman , "Paul E. McKenney" , Tejun Heo , Barret Rhoden , Josh Don , Dohyun Kim , linux-arm-kernel@lists.infradead.org, kkd@meta.com, kernel-team@meta.com Subject: [PATCH bpf-next v4 14/25] rqspinlock: Add basic support for CONFIG_PARAVIRT Date: Sat, 15 Mar 2025 21:05:30 -0700 Message-ID: <20250316040541.108729-15-memxor@gmail.com> X-Mailer: git-send-email 2.47.1 In-Reply-To: <20250316040541.108729-1-memxor@gmail.com> References: <20250316040541.108729-1-memxor@gmail.com> MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=3261; h=from:subject; bh=9lJX2DtyRTnUplpFVsYE5ziAxwiiosjb1D9cDEXkG+8=; b=owEBbQKS/ZANAwAIAUzgyIZIvxHKAcsmYgBn1k3dEsMhP+74YqrqyvnIh88VKcNnAxL0cvJ1gkJa n7xGLWyJAjMEAAEIAB0WIQRLvip+Buz51YI8YRFM4MiGSL8RygUCZ9ZN3QAKCRBM4MiGSL8Rynp7D/ 9jwO0LqsAd6i0H1M1SNcFjzNyEgzms+NwhZBtBuTdTvqAT3wwJj/F4EEK9EJQFykdau+00yKY/pgkR PEVdRrxlAgs2crbDzjqJYh+q9G8WNGP+ThpQ+Zt+aw/TVeYukHpS1pR9iEmD9srnDE2dwebiAGPgMM vTFWGqOETpp80HUL8s2G0XCRaH0zfVXr66xapYIetDVzQF3xUHsS3DDcFORGeinrUfxhoMZBRNkRpt xHq2rF0Oll2LPziNq1W4U5lK75qNdpx3oRkhj+J0GxrLo2VYmf1jZjs1Pu6Tjls7sPAuCKGc1sNfpS IDLbiiBhtgpaN8QlAzXc1TliiueghupnCo+aVENDvLEHG4QQhpeyGROKQPIJ0/cABmhk8YeMQ6eg1t NBNIZblT+x0GbPMC1SZ6cVNewgGcNju7EedsTRHf/SrbrIuDhXY0XZ2aZOzt8gKa4kk1obC3tFP+GE k1ddu1M9BmjHiqEhAX6NpIf1U2Rwsg3tljPLO06eQOH/TWhlonNlZYWV4s35e2UG0C3gGcj5Uk7p1E h5WyV089i3EAdv/Bmul3PVl64TmvbJSqqDNMSyPFFUS3n3IjFeEhF6GX8NEeNGPNzpYJAgQvnGibRB k7tvZ6agwzjq0rg4VW9szOQG84Ei1XQvpZiRUTuhGSeEo0cdLmuN9TNcntsA== X-Developer-Key: i=memxor@gmail.com; a=openpgp; fpr=4BBE2A7E06ECF9D5823C61114CE0C88648BF11CA X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20250315_210603_196041_CB651241 X-CRM114-Status: GOOD ( 17.19 ) X-BeenThere: linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: "linux-arm-kernel" Errors-To: linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org We ripped out PV and virtualization related bits from rqspinlock in an earlier commit, however, a fair lock 
performs poorly within a virtual machine when the lock holder is preempted. As such, retain the virt_spin_lock fallback to test and set lock, but with timeout and deadlock detection. We can do this by simply depending on the resilient_tas_spin_lock implementation from the previous patch. We don't integrate support for CONFIG_PARAVIRT_SPINLOCKS yet, as that requires more involved algorithmic changes and introduces more complexity. It can be done when the need arises in the future. Signed-off-by: Kumar Kartikeya Dwivedi --- arch/x86/include/asm/rqspinlock.h | 33 +++++++++++++++++++++++++++++++ include/asm-generic/rqspinlock.h | 14 +++++++++++++ kernel/bpf/rqspinlock.c | 3 +++ 3 files changed, 50 insertions(+) create mode 100644 arch/x86/include/asm/rqspinlock.h diff --git a/arch/x86/include/asm/rqspinlock.h b/arch/x86/include/asm/rqspinlock.h new file mode 100644 index 000000000000..24a885449ee6 --- /dev/null +++ b/arch/x86/include/asm/rqspinlock.h @@ -0,0 +1,33 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _ASM_X86_RQSPINLOCK_H +#define _ASM_X86_RQSPINLOCK_H + +#include + +#ifdef CONFIG_PARAVIRT +DECLARE_STATIC_KEY_FALSE(virt_spin_lock_key); + +#define resilient_virt_spin_lock_enabled resilient_virt_spin_lock_enabled +static __always_inline bool resilient_virt_spin_lock_enabled(void) +{ + return static_branch_likely(&virt_spin_lock_key); +} + +#ifdef CONFIG_QUEUED_SPINLOCKS +typedef struct qspinlock rqspinlock_t; +#else +typedef struct rqspinlock rqspinlock_t; +#endif +extern int resilient_tas_spin_lock(rqspinlock_t *lock); + +#define resilient_virt_spin_lock resilient_virt_spin_lock +static inline int resilient_virt_spin_lock(rqspinlock_t *lock) +{ + return resilient_tas_spin_lock(lock); +} + +#endif /* CONFIG_PARAVIRT */ + +#include + +#endif /* _ASM_X86_RQSPINLOCK_H */ diff --git a/include/asm-generic/rqspinlock.h b/include/asm-generic/rqspinlock.h index 12f72c4a97cd..a837c6b6abd9 100644 --- a/include/asm-generic/rqspinlock.h +++ b/include/asm-generic/rqspinlock.h @@ -35,6 +35,20 @@ extern int resilient_tas_spin_lock(rqspinlock_t *lock); extern int resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val); #endif +#ifndef resilient_virt_spin_lock_enabled +static __always_inline bool resilient_virt_spin_lock_enabled(void) +{ + return false; +} +#endif + +#ifndef resilient_virt_spin_lock +static __always_inline int resilient_virt_spin_lock(rqspinlock_t *lock) +{ + return 0; +} +#endif + /* * Default timeout for waiting loops is 0.25 seconds */ diff --git a/kernel/bpf/rqspinlock.c b/kernel/bpf/rqspinlock.c index 714dfab5caa8..ed21ee010063 100644 --- a/kernel/bpf/rqspinlock.c +++ b/kernel/bpf/rqspinlock.c @@ -352,6 +352,9 @@ int __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val) BUILD_BUG_ON(CONFIG_NR_CPUS >= (1U << _Q_TAIL_CPU_BITS)); + if (resilient_virt_spin_lock_enabled()) + return resilient_virt_spin_lock(lock); + RES_INIT_TIMEOUT(ts); /* From patchwork Sun Mar 16 04:05:31 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kumar Kartikeya Dwivedi X-Patchwork-Id: 14018338 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id B7953C282DE for ; Sun, 16 Mar 2025 04:33:10 +0000 (UTC) 
From: Kumar Kartikeya Dwivedi
To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Linus Torvalds, Peter Zijlstra, Will Deacon, Waiman Long, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, Martin KaFai Lau, Eduard Zingerman, "Paul E. McKenney", Tejun Heo, Barret Rhoden, Josh Don, Dohyun Kim, linux-arm-kernel@lists.infradead.org, kkd@meta.com, kernel-team@meta.com
Subject: [PATCH bpf-next v4 15/25] rqspinlock: Add helper to print a splat on timeout or deadlock
Date: Sat, 15 Mar 2025 21:05:31 -0700
Message-ID: <20250316040541.108729-16-memxor@gmail.com>
In-Reply-To: <20250316040541.108729-1-memxor@gmail.com>
References: <20250316040541.108729-1-memxor@gmail.com>

Whenever a timeout or a deadlock occurs, we want to print a message to the dmesg console, including the CPU where the event occurred, the list of locks in the held-locks table, and the stack trace of the caller, which allows determining where exactly in the slow path the waiter timed out or detected a deadlock. Splats are limited to at most one per CPU during machine uptime, and a lock is acquired to ensure that the output does not interleave when several CPUs conflict, enter a deadlock situation, and start printing at the same time. Later patches will use this to inspect the return value of the rqspinlock API and report a violation if necessary.
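To show how callers are meant to consume these diagnostics, here is a minimal sketch (illustration only, not part of this patch): demo_lock and demo_update() are made-up names, the include path is assumed, and the raw_res_spin_lock_irqsave()/raw_res_spin_unlock_irqrestore() wrappers are only introduced by a later patch in this series.

#include <asm/rqspinlock.h>

static rqspinlock_t demo_lock;	/* hypothetical lock, for illustration only */

static int demo_update(void)
{
	unsigned long flags;
	int ret;

	ret = raw_res_spin_lock_irqsave(&demo_lock, flags);
	if (ret) {
		/*
		 * -EDEADLK or -ETIMEDOUT: rqspinlock has already emitted at
		 * most one splat on this CPU (CPU number, held-locks table,
		 * stack trace), so the caller only propagates the error.
		 */
		return ret;
	}
	/* ... critical section ... */
	raw_res_spin_unlock_irqrestore(&demo_lock, flags);
	return 0;
}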
Signed-off-by: Kumar Kartikeya Dwivedi --- kernel/bpf/rqspinlock.c | 29 +++++++++++++++++++++++++++++ 1 file changed, 29 insertions(+) diff --git a/kernel/bpf/rqspinlock.c b/kernel/bpf/rqspinlock.c index ed21ee010063..ad0fc35c647e 100644 --- a/kernel/bpf/rqspinlock.c +++ b/kernel/bpf/rqspinlock.c @@ -196,6 +196,35 @@ static noinline int check_deadlock_ABBA(rqspinlock_t *lock, u32 mask, return 0; } +static DEFINE_PER_CPU(int, report_nest_cnt); +static DEFINE_PER_CPU(bool, report_flag); +static arch_spinlock_t report_lock; + +static void rqspinlock_report_violation(const char *s, void *lock) +{ + struct rqspinlock_held *rqh = this_cpu_ptr(&rqspinlock_held_locks); + + if (this_cpu_inc_return(report_nest_cnt) != 1) { + this_cpu_dec(report_nest_cnt); + return; + } + if (this_cpu_read(report_flag)) + goto end; + this_cpu_write(report_flag, true); + arch_spin_lock(&report_lock); + + pr_err("CPU %d: %s", smp_processor_id(), s); + pr_info("Held locks: %d\n", rqh->cnt + 1); + pr_info("Held lock[%2d] = 0x%px\n", 0, lock); + for (int i = 0; i < min(RES_NR_HELD, rqh->cnt); i++) + pr_info("Held lock[%2d] = 0x%px\n", i + 1, rqh->locks[i]); + dump_stack(); + + arch_spin_unlock(&report_lock); +end: + this_cpu_dec(report_nest_cnt); +} + static noinline int check_deadlock(rqspinlock_t *lock, u32 mask, struct rqspinlock_timeout *ts) { From patchwork Sun Mar 16 04:05:32 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kumar Kartikeya Dwivedi X-Patchwork-Id: 14018339 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id D64C9C282DE for ; Sun, 16 Mar 2025 04:34:53 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=bombadil.20210309; h=Sender:List-Subscribe:List-Help :List-Post:List-Archive:List-Unsubscribe:List-Id:Content-Transfer-Encoding: MIME-Version:References:In-Reply-To:Message-ID:Date:Subject:Cc:To:From: Reply-To:Content-Type:Content-ID:Content-Description:Resent-Date:Resent-From: Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID:List-Owner; bh=lVJp3+JT7y8+ub6buLRsxOUyy8d6vFpxA5yFbRAEMPI=; b=B3lGWl8vt/PbHS0pOn1Ftp5vHE UatEOYuMWGND/8dPlhuU4G6UKNkXhHS0GlKibqdO51L7Nk171zQ1YOhVYY2olY9F+tDO8lI4qSgL1 uL7F75yLTSV34SxkeGXpp6rUnYadK1FrGkhargrVD4wYrnAcCIvvkb3dWQ3F4xPQbLtZB2XFOu2Fv MilIuk9cTA8VTmTzlzTpyNIUSDNPdvFOzvlyflQziYVgn1Tqnspg7KlJfz+/x3iXgPynyssw4qd5U ySkAz7e4purIX8yii9cwaWNXyHUEOVlaWWhdhuv6QpWkifW9eDMTrHOnuaMeAmTb6MUgbHnGCAvG8 +xUayFeA==; Received: from localhost ([::1] helo=bombadil.infradead.org) by bombadil.infradead.org with esmtp (Exim 4.98 #2 (Red Hat Linux)) id 1ttfi0-0000000HGMi-2B4J; Sun, 16 Mar 2025 04:34:44 +0000 Received: from mail-wm1-x341.google.com ([2a00:1450:4864:20::341]) by bombadil.infradead.org with esmtps (Exim 4.98 #2 (Red Hat Linux)) id 1ttfGH-0000000HCL3-2S6S for linux-arm-kernel@lists.infradead.org; Sun, 16 Mar 2025 04:06:06 +0000 Received: by mail-wm1-x341.google.com with SMTP id 5b1f17b1804b1-43cfb6e9031so8918355e9.0 for ; Sat, 15 Mar 2025 21:06:05 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1742097964; x=1742702764; darn=lists.infradead.org; h=content-transfer-encoding:mime-version:references:in-reply-to 
:message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=lVJp3+JT7y8+ub6buLRsxOUyy8d6vFpxA5yFbRAEMPI=; b=dCUUcTv3uJXTXLjsg9Vz6Gq87VVDYDitkDMF74JhdHO4jp4nDRAbFUWKEEMMHwCQRI q4paBW9+PkdtgyOvWfBEnj/u11991PoiZ1kh1zALdfUuE2i+UP1ZUbfNCeNxmEXmtJBu X48rMFgDbQmt0vHZUhPHesftzQA9/5jYULK6ylXBscGA+8e3BSEy+lGI9Yil+E/M2uxa 0Fx5f+3GK+1fMAmSAurKs7hSuF5JYTR7aX27+ohGeQWv2nZVHvSKhITZBdUAHKD73pVi D/N7AAtE0dxsiwuIzF9c4OI3G2fcbWK5LXnTNPhfEVBWiOA8PyUv1vhdp7KA0A3mZ3oe 7M+A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1742097964; x=1742702764; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=lVJp3+JT7y8+ub6buLRsxOUyy8d6vFpxA5yFbRAEMPI=; b=n3ge0Zcdj9Z4AmYciynduAjSwLsLCASaelj8fzEcRAkgySW0wuudDd87enfmCDlpH2 mkmkfQjHTl7O5ZZWwtEfhdpr49LpUtERUJ08kfG3Rrq9eL3a1bkZg2z5c9C5NQ8KbduM IPCn1DZngTyGI3xZJcngOAAdSXua+IlCgaU7pdLAR2cwddeDx9nbiHJ2kSj5xWQhRYOo gfr0GF1bZnsYPLBpX5MadwmAgW+Z7TWa+kXtkQiFs+Zm3AFavdtEwEBC1ZNmvY76cui7 ImPY/8TfJ+Apet6xi40fDdTuF134tGtHm8hA61GoQP8mksBtilAoWQCAwZL+6dw9IJqC dKBg== X-Forwarded-Encrypted: i=1; AJvYcCWU9p6+J2v2fjGM01i+mPOgSOfUY5EVrFB6LHlOD3fknpOM8EY66GEaW8QRMFlAmjO3g7cQIFhSEK6nIZlwZe7z@lists.infradead.org X-Gm-Message-State: AOJu0YxfPL/8r1yRaQQ9yIXfgXWhQD1dYDE0xlTUDZezAyo+MXy6bUsj fhZhohZP6NBfSEuKNG0nqjALM4I3lS5naNGEyCDbosX0fhzLNMry X-Gm-Gg: ASbGnctVa8oBDoPHytzzUwRffzCoMiWObw9LZFCDwhf2JENHqZVoIJ+GlbK3Cs64CZF oeDfZJZ/BBbg+ZdVSn8JOTNgEFKffW4RbOrvIyX3Wj4aoDKmA0M+EtK2xx9DG8C359yv64yQnrA qEX8V9yRCbEDoiwNtwtH9CQbkwrNc+zcXdhl+s7y6wDNxdAkGs82fAnA9M0bDseA1QA5bCcTZkr RfxlvBNXUmbH+b0CI5b+NhXxaFapZLOTIuxc2wlNMpgJ8heLWzXK8xcbP3Xbd1uVIG0joA7hYyM HpCigoD2kUtvUqmui183UaSqxzwceyNdeNg= X-Google-Smtp-Source: AGHT+IEqbu6eKdKbUR9hLIUzBRTM7W+fO2MaRKuwN5PjghpeobTfOg76MZ+yqvlOP3B3sgZbmvRoKw== X-Received: by 2002:a05:600c:3c89:b0:43c:ea36:9840 with SMTP id 5b1f17b1804b1-43d1ecd7926mr80182815e9.22.1742097964115; Sat, 15 Mar 2025 21:06:04 -0700 (PDT) Received: from localhost ([2a03:2880:31ff:70::]) by smtp.gmail.com with ESMTPSA id 5b1f17b1804b1-43d25593a94sm21073705e9.3.2025.03.15.21.06.03 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sat, 15 Mar 2025 21:06:03 -0700 (PDT) From: Kumar Kartikeya Dwivedi To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org Cc: Linus Torvalds , Peter Zijlstra , Will Deacon , Waiman Long , Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , Martin KaFai Lau , Eduard Zingerman , "Paul E. 
McKenney" , Tejun Heo , Barret Rhoden , Josh Don , Dohyun Kim , linux-arm-kernel@lists.infradead.org, kkd@meta.com, kernel-team@meta.com Subject: [PATCH bpf-next v4 16/25] rqspinlock: Add macros for rqspinlock usage Date: Sat, 15 Mar 2025 21:05:32 -0700 Message-ID: <20250316040541.108729-17-memxor@gmail.com> X-Mailer: git-send-email 2.47.1 In-Reply-To: <20250316040541.108729-1-memxor@gmail.com> References: <20250316040541.108729-1-memxor@gmail.com> MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=4185; h=from:subject; bh=Yp6VTxKtpBDQhUgfKGSHMWzJmZu3deRDpy/hLhkjXfk=; b=owEBbQKS/ZANAwAIAUzgyIZIvxHKAcsmYgBn1k3d7ne1febSzud04+mu0CKRHqx+H8Jq+gCO2ZsR MN8cyy6JAjMEAAEIAB0WIQRLvip+Buz51YI8YRFM4MiGSL8RygUCZ9ZN3QAKCRBM4MiGSL8RykcmD/ 9gTc/RWOMn5cBBm9fdR7z3Whng8H/b1v47vd75XWWl34N0yLT/YWBxTTQ+sPVZzGGuo6B9xsO7eKc9 SZp8I4okxcB2aM74N8m344h+WeCA3ZDdOAN1PStTNACyJB8JKCSwhxf5/MD4VnGmAAvRd7JcgDsl3T lkua13O2hBIxLJEBPb7jFRav4gDcQhaWZZY8+I8wIj4D2LsynzlcQxPMMg/IOGoojbmx1IZpibvTSS cOab748UkW42TGSTq4cdoSqNEyQkllc5IG3CoplNecSOYw91jN/9/nfpRQLlkeNk4MMkIpW1wkerkH stNypWjI3+zMxxxWOpnatrs1AhcbL9xdZy/yI6HeaBpRLCWE7EP2tyRtxMLNP0xZTzDUbhelU5s2xS ISufWNdpb9hhDr2p7Az4+QCAW0/IGguxT1YSSJ3VIvP0PSGYdPGn7h5UGMmLAsKRteUQPiIinHVfjq jswG4p2GalFrgmjtSAlgCOgfnnBgdFSBZ8KrUhAEDyoSkn3oC7EfkzE4RurRDNaK7Dla53h3swFtN6 ebHJiW8K6xEarRowvHg21BJJhRkrGgUz7TcTos8teOYQ7GScBu5AHx4/hAgO+A+G8EAYCC6p9bLmWT uhQX3Azu8Y8rim3+JoGB/tvJ9vAcTAUH3AyLU5MWGMfUmvqRvuAqdg2eMQjQ== X-Developer-Key: i=memxor@gmail.com; a=openpgp; fpr=4BBE2A7E06ECF9D5823C61114CE0C88648BF11CA X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20250315_210605_635098_9EF3F1B5 X-CRM114-Status: GOOD ( 20.57 ) X-BeenThere: linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: "linux-arm-kernel" Errors-To: linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org Introduce helper macros that wrap around the rqspinlock slow path and provide an interface analogous to the raw_spin_lock API. Note that in case of error conditions, preemption and IRQ disabling is automatically unrolled before returning the error back to the caller. Ensure that in absence of CONFIG_QUEUED_SPINLOCKS support, we fallback to the test-and-set implementation. Add some comments describing the subtle memory ordering logic during unlock, and why it's safe. Signed-off-by: Kumar Kartikeya Dwivedi --- include/asm-generic/rqspinlock.h | 87 ++++++++++++++++++++++++++++++++ 1 file changed, 87 insertions(+) diff --git a/include/asm-generic/rqspinlock.h b/include/asm-generic/rqspinlock.h index a837c6b6abd9..23abd0b8d0f9 100644 --- a/include/asm-generic/rqspinlock.h +++ b/include/asm-generic/rqspinlock.h @@ -153,4 +153,91 @@ static __always_inline void release_held_lock_entry(void) this_cpu_dec(rqspinlock_held_locks.cnt); } +#ifdef CONFIG_QUEUED_SPINLOCKS + +/** + * res_spin_lock - acquire a queued spinlock + * @lock: Pointer to queued spinlock structure + * + * Return: + * * 0 - Lock was acquired successfully. + * * -EDEADLK - Lock acquisition failed because of AA/ABBA deadlock. + * * -ETIMEDOUT - Lock acquisition failed because of timeout. 
+ */ +static __always_inline int res_spin_lock(rqspinlock_t *lock) +{ + int val = 0; + + if (likely(atomic_try_cmpxchg_acquire(&lock->val, &val, _Q_LOCKED_VAL))) { + grab_held_lock_entry(lock); + return 0; + } + return resilient_queued_spin_lock_slowpath(lock, val); +} + +#else + +#define res_spin_lock(lock) resilient_tas_spin_lock(lock) + +#endif /* CONFIG_QUEUED_SPINLOCKS */ + +static __always_inline void res_spin_unlock(rqspinlock_t *lock) +{ + struct rqspinlock_held *rqh = this_cpu_ptr(&rqspinlock_held_locks); + + if (unlikely(rqh->cnt > RES_NR_HELD)) + goto unlock; + WRITE_ONCE(rqh->locks[rqh->cnt - 1], NULL); +unlock: + /* + * Release barrier, ensures correct ordering. See release_held_lock_entry + * for details. Perform release store instead of queued_spin_unlock, + * since we use this function for test-and-set fallback as well. When we + * have CONFIG_QUEUED_SPINLOCKS=n, we clear the full 4-byte lockword. + * + * Like release_held_lock_entry, we can do the release before the dec. + * We simply care about not seeing the 'lock' in our table from a remote + * CPU once the lock has been released, which doesn't rely on the dec. + * + * Unlike smp_wmb(), release is not a two way fence, hence it is + * possible for a inc to move up and reorder with our clearing of the + * entry. This isn't a problem however, as for a misdiagnosis of ABBA, + * the remote CPU needs to hold this lock, which won't be released until + * the store below is done, which would ensure the entry is overwritten + * to NULL, etc. + */ + smp_store_release(&lock->locked, 0); + this_cpu_dec(rqspinlock_held_locks.cnt); +} + +#ifdef CONFIG_QUEUED_SPINLOCKS +#define raw_res_spin_lock_init(lock) ({ *(lock) = (rqspinlock_t)__ARCH_SPIN_LOCK_UNLOCKED; }) +#else +#define raw_res_spin_lock_init(lock) ({ *(lock) = (rqspinlock_t){0}; }) +#endif + +#define raw_res_spin_lock(lock) \ + ({ \ + int __ret; \ + preempt_disable(); \ + __ret = res_spin_lock(lock); \ + if (__ret) \ + preempt_enable(); \ + __ret; \ + }) + +#define raw_res_spin_unlock(lock) ({ res_spin_unlock(lock); preempt_enable(); }) + +#define raw_res_spin_lock_irqsave(lock, flags) \ + ({ \ + int __ret; \ + local_irq_save(flags); \ + __ret = raw_res_spin_lock(lock); \ + if (__ret) \ + local_irq_restore(flags); \ + __ret; \ + }) + +#define raw_res_spin_unlock_irqrestore(lock, flags) ({ raw_res_spin_unlock(lock); local_irq_restore(flags); }) + #endif /* __ASM_GENERIC_RQSPINLOCK_H */ From patchwork Sun Mar 16 04:05:33 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kumar Kartikeya Dwivedi X-Patchwork-Id: 14018340 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id EEDE4C282DE for ; Sun, 16 Mar 2025 04:36:34 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=bombadil.20210309; h=Sender:List-Subscribe:List-Help :List-Post:List-Archive:List-Unsubscribe:List-Id:Content-Transfer-Encoding: MIME-Version:References:In-Reply-To:Message-ID:Date:Subject:Cc:To:From: Reply-To:Content-Type:Content-ID:Content-Description:Resent-Date:Resent-From: Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID:List-Owner; bh=VFS019JG0RFjz5io9ra87jX+S2pDR+I2Zduhbgoaa20=; 
b=NpA15wZ37SRVtJQkn1vU0ad01U 5NxA3o+z4oJ5MqvBFEVny0eIBit0NPcUC14PdOi+B2yyiNP2iUHCOXyVhmxhwZP6/iMOwma+lGX1c 7iJXDFRIdZsjfMD02u/EfdL5W7iVsUOhsjm1aTXZgX/gnoo0DeAv/CReRHymdUy+mYqLv1QTb+MWW vZfppniFBf416QzGj+/3wWAHozE2Ha/kfDPEfe0lei4UWhAIL+zx9c8KViNQ8OUR29TQsl9vBzGiV bzRo9wabO+FBR/wMSnhFI1wVPfjzaaipWTTzrSnXD31l/wHN/+kZFmVul3EogyMQXYbgSAB7iusUp j77szcpg==; Received: from localhost ([::1] helo=bombadil.infradead.org) by bombadil.infradead.org with esmtp (Exim 4.98 #2 (Red Hat Linux)) id 1ttfje-0000000HGVW-0iAa; Sun, 16 Mar 2025 04:36:26 +0000 Received: from mail-wm1-x342.google.com ([2a00:1450:4864:20::342]) by bombadil.infradead.org with esmtps (Exim 4.98 #2 (Red Hat Linux)) id 1ttfGI-0000000HCLm-2h36 for linux-arm-kernel@lists.infradead.org; Sun, 16 Mar 2025 04:06:08 +0000 Received: by mail-wm1-x342.google.com with SMTP id 5b1f17b1804b1-43cef035a3bso6727445e9.1 for ; Sat, 15 Mar 2025 21:06:06 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1742097965; x=1742702765; darn=lists.infradead.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=VFS019JG0RFjz5io9ra87jX+S2pDR+I2Zduhbgoaa20=; b=FAx9zTK52Y/63/B4jkg8hZrn2dXcNVUeR345KM4a7eF8R1dUhTyUYwhy5dOvtruVj3 ywaBJiCsscLyU0j5gT1ru/B6K8n9l6LEcdXKi6tiSCsVqKfvR7tPNn3Jk9cegwduWsdk j9OqSSxrVXMmN0psDZ+GvP+fPYtGbZWYVPyf+40kWhCwhy3hZstkp1ZKCtme+ll9quFx eQ+qc49EEZSW0jcQ13nVoZj54oRys5xcahzmTRr/DHLXey4aaRv11/aSnf89IYSsYVyR U6cV4ilmSFDym/A2KG0g7VqPoCtwL3136WfpROonC1S8rklwkaF4nGrUz6eRl7RvQOXz g5tA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1742097965; x=1742702765; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=VFS019JG0RFjz5io9ra87jX+S2pDR+I2Zduhbgoaa20=; b=D4MhQMsksmDexy5J+rtdFpAPYWO0gI8p4Unmc8J76x4KnieiPEF9I2KZJxCYP18Ve/ 1RCJSZHlpoPV2yWrLGqcMBOH/ulLRouWBAknPf0qdLSpDvohbQ6dPqA9/29gmb0ILqpq bYMxGJUoaA/rw5tKUEJXKqJPrIu/ktu47rPqmvtRWH5g2mI28gOmFxq6nreZbY0lSZG1 gMjaGyX8KlFcJqAMHJILmUviX8sbz8KeZ5ztxLC/kxd+zwHzooVcikpSTVPVkN3IaaVm K4+oB/DBxSCLlHPSOnfrb054TpNKPPMw+1co7Ii+kpoEuUHGbDU/p3lB5BoBSe5oCcCU XUvQ== X-Forwarded-Encrypted: i=1; AJvYcCVKcbFoDBlbeGB58fxyRkEITbWzYTY29kvM7bTUpNr/qc63FdyGaPC+mGlxTd31o7eaKQr5t1hUAWb/ExNbboEq@lists.infradead.org X-Gm-Message-State: AOJu0YxgR3gDMzGuROYtDi/mlaxOkSytwP5D/gxx/lr+wHq0H+edUt9M Af1TvyFJbLQrcJ0+tT2Nl2DX8ndv66lbxHDzopir4/OtDn6giJcB X-Gm-Gg: ASbGncvt09SPx1De4qAi3lb0JiNC9SYroiyoV2dk4/cS69VPPlWe7xc/CLWC6kOc9FB U5Q+C+YqdlhlqJ0AOTAj+0d6jcDVhnpjvgu0Xjd5a/rniPTQHIMG4QtmK7bQryPK7Cm06v0v+An z1ifUBCVMs2WNvEejtMwwG+SjYOhJSGyMr4OJ4MaLav9V9QnM/seqzQC29WqaTIl4TiCHY4lrPX 6TNDA8pzPstSE9GWLrotUAZYwdMQ/Noe1AjEsVNqux38XoNKHAHyYcsSTS0IS9Y0e+U2BXGMLuU eODwRdgs+4MqpiETTPEzF2ov87Hq5Fvbpg== X-Google-Smtp-Source: AGHT+IHBIpDn5I6oDUPVr9V5ItvG6mUgYbh8JUY0xjPJvkMd3lQIrG6tia0tYndOnZLCB6ubQR+rjQ== X-Received: by 2002:a05:600c:56c5:b0:43c:fffc:7886 with SMTP id 5b1f17b1804b1-43d1ef4b074mr78421855e9.8.1742097965220; Sat, 15 Mar 2025 21:06:05 -0700 (PDT) Received: from localhost ([2a03:2880:31ff:1::]) by smtp.gmail.com with ESMTPSA id 5b1f17b1804b1-43d2010e618sm67780255e9.40.2025.03.15.21.06.04 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sat, 15 Mar 2025 21:06:04 -0700 (PDT) From: Kumar Kartikeya Dwivedi To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org Cc: Linus Torvalds , Peter Zijlstra , Will Deacon , Waiman Long , Alexei 
Starovoitov , Andrii Nakryiko , Daniel Borkmann , Martin KaFai Lau , Eduard Zingerman , "Paul E. McKenney" , Tejun Heo , Barret Rhoden , Josh Don , Dohyun Kim , linux-arm-kernel@lists.infradead.org, kkd@meta.com, kernel-team@meta.com Subject: [PATCH bpf-next v4 17/25] rqspinlock: Add entry to Makefile, MAINTAINERS Date: Sat, 15 Mar 2025 21:05:33 -0700 Message-ID: <20250316040541.108729-18-memxor@gmail.com> X-Mailer: git-send-email 2.47.1 In-Reply-To: <20250316040541.108729-1-memxor@gmail.com> References: <20250316040541.108729-1-memxor@gmail.com> MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=1896; h=from:subject; bh=TAwmD3mE9IWzqTEdxgzO+SkBgGc9QW4lOFTMCWQX6XQ=; b=owEBbQKS/ZANAwAIAUzgyIZIvxHKAcsmYgBn1k3dm8uyglj3VmEv2htJ2FLhnf1JsYuPf2RZsbxU w2qldUiJAjMEAAEIAB0WIQRLvip+Buz51YI8YRFM4MiGSL8RygUCZ9ZN3QAKCRBM4MiGSL8RytukD/ 9AxDBzDg2jelKivmQxOWC/3ddLF46czI6LpjRZJqkS8CF+Rtgtq1ntcZfd3DwB1YlWNi59wKsFGBnJ dvHyPQHl0wf8x/+YSIKnWOfRTtXXm7HO9EOcusWDuo7nYRoA6dOki8cc0dHPcr7GEZguYUvNkT0I48 RiCeb9ouXdF+9R7dZDI+htv3iB6XrR4vhmfLw+DsE6xSAwFvvJafz+GmaF+szclNho4RF6EOzxUl+s 9fggq12ESIsqk1CtgtcCSL6XBKYoZejR5xun8ldsowdPo5VU+FkBaYLjflcljs+9humS8ea8kzLQdP tBQO13yb3uzxTs0rzzQ5/m2QAPHnV4C/XBS1o7/1f/CZhhWnZfWCaezDqmj0DFHBrWFw2t2BYMKyOt pEej2vsoOJjeV8O2P9XN9/Iu3qJeRZnrOSzeuVMQefuxjD1W0fZFj6Xwgp1NI5FVcn/7XfvNSAOiu/ DO6EIdHh69dWmTxrN+gImrK3Wzp51GSrlSwfasmj7JZIFiKeByLszlkhWluDw0sXtGBsmACfkKy+sM T3FsOfpMqz/WkgyhFoHDnsNMix7+9iuhgQVHa+2cRw3AIbEwsgvPZpdR/BwhcqlQHmSCPw7fcEmu1a biNUSz51QbPHYHW/PBBsMPvZawZXKGycs82Q07EQPrq2bm8G4sFtEpjZMzyA== X-Developer-Key: i=memxor@gmail.com; a=openpgp; fpr=4BBE2A7E06ECF9D5823C61114CE0C88648BF11CA X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20250315_210606_688882_F0862DFE X-CRM114-Status: GOOD ( 13.35 ) X-BeenThere: linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: "linux-arm-kernel" Errors-To: linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org Ensure that the rqspinlock code is only built when the BPF subsystem is compiled in. Depending on queued spinlock support, we may or may not end up building the queued spinlock slowpath, and instead fallback to the test-and-set implementation. Also add entries to MAINTAINERS file. 
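Because rqspinlock.o is only built for CONFIG_BPF_SYSCALL=y, any consumer outside kernel/bpf has to guard both the include and its usage. A minimal sketch of such a consumer is shown below (hypothetical names; the <asm/rqspinlock.h> path is an assumption that follows from the mandatory-y Kbuild entry added here), mirroring what the locktorture patch that follows does.

#ifdef CONFIG_BPF_SYSCALL
#include <asm/rqspinlock.h>

static rqspinlock_t demo_lock;		/* hypothetical consumer state */

static void demo_consumer_init(void)
{
	/*
	 * rqspinlock_t is struct qspinlock (CONFIG_QUEUED_SPINLOCKS=y) or a
	 * plain test-and-set lock word otherwise; the init macro handles both.
	 */
	raw_res_spin_lock_init(&demo_lock);
}
#endif /* CONFIG_BPF_SYSCALL */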
Signed-off-by: Kumar Kartikeya Dwivedi --- MAINTAINERS | 2 ++ include/asm-generic/Kbuild | 1 + kernel/bpf/Makefile | 2 +- 3 files changed, 4 insertions(+), 1 deletion(-) diff --git a/MAINTAINERS b/MAINTAINERS index 3864d473f52f..c545cd149cd1 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -4297,6 +4297,8 @@ F: include/uapi/linux/filter.h F: kernel/bpf/ F: kernel/trace/bpf_trace.c F: lib/buildid.c +F: arch/*/include/asm/rqspinlock.h +F: include/asm-generic/rqspinlock.h F: lib/test_bpf.c F: net/bpf/ F: net/core/filter.c diff --git a/include/asm-generic/Kbuild b/include/asm-generic/Kbuild index 1b43c3a77012..8675b7b4ad23 100644 --- a/include/asm-generic/Kbuild +++ b/include/asm-generic/Kbuild @@ -45,6 +45,7 @@ mandatory-y += pci.h mandatory-y += percpu.h mandatory-y += pgalloc.h mandatory-y += preempt.h +mandatory-y += rqspinlock.h mandatory-y += runtime-const.h mandatory-y += rwonce.h mandatory-y += sections.h diff --git a/kernel/bpf/Makefile b/kernel/bpf/Makefile index 410028633621..70502f038b92 100644 --- a/kernel/bpf/Makefile +++ b/kernel/bpf/Makefile @@ -14,7 +14,7 @@ obj-$(CONFIG_BPF_SYSCALL) += bpf_local_storage.o bpf_task_storage.o obj-${CONFIG_BPF_LSM} += bpf_inode_storage.o obj-$(CONFIG_BPF_SYSCALL) += disasm.o mprog.o obj-$(CONFIG_BPF_JIT) += trampoline.o -obj-$(CONFIG_BPF_SYSCALL) += btf.o memalloc.o +obj-$(CONFIG_BPF_SYSCALL) += btf.o memalloc.o rqspinlock.o ifeq ($(CONFIG_MMU)$(CONFIG_64BIT),yy) obj-$(CONFIG_BPF_SYSCALL) += arena.o range_tree.o endif From patchwork Sun Mar 16 04:05:34 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kumar Kartikeya Dwivedi X-Patchwork-Id: 14018341 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id A4241C282DE for ; Sun, 16 Mar 2025 04:38:19 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=bombadil.20210309; h=Sender:List-Subscribe:List-Help :List-Post:List-Archive:List-Unsubscribe:List-Id:Content-Transfer-Encoding: MIME-Version:References:In-Reply-To:Message-ID:Date:Subject:Cc:To:From: Reply-To:Content-Type:Content-ID:Content-Description:Resent-Date:Resent-From: Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID:List-Owner; bh=elS4VXsAjSZVIDIoCNxyLW7RW0fHjd1SmHmDu6oNDNQ=; b=yIxpInBeoI7mzQye6ajvQB3ZCv gaEVbZnKF9wh06bFIIOSais97gvHqVZ1QOjpXwfsjToDaE1Sekz+6JZFHVGDEUFYKPoCSKV4Y+Xpi pKTow2fnk5uLPXx3m/5FAYo7gVnUZxDl6N6n/Of9yjHS9VDgZSZlgpoSexUzBPfgQU3+L4jkP91Wu 6uRPTdMgYMBr9RazZPVcBg4cQjVNUnlqBQN1LIC8CMsk9z3Oz0ApMmoXdzT7Go1iB/de5sF1BuPZO i9FWUwgO6AGOgwOQLp8pglgWdlpV7RSyqDY2TyQFajV4lpaC162KP5uuYtlD3VU2Z3Qr8E5FV0OjH 2hlZNb4A==; Received: from localhost ([::1] helo=bombadil.infradead.org) by bombadil.infradead.org with esmtp (Exim 4.98 #2 (Red Hat Linux)) id 1ttflH-0000000HGdE-3xRf; Sun, 16 Mar 2025 04:38:07 +0000 Received: from mail-wm1-x343.google.com ([2a00:1450:4864:20::343]) by bombadil.infradead.org with esmtps (Exim 4.98 #2 (Red Hat Linux)) id 1ttfGJ-0000000HCMf-3Fsm for linux-arm-kernel@lists.infradead.org; Sun, 16 Mar 2025 04:06:09 +0000 Received: by mail-wm1-x343.google.com with SMTP id 5b1f17b1804b1-43690d4605dso6607905e9.0 for ; Sat, 15 Mar 2025 21:06:07 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; 
d=gmail.com; s=20230601; t=1742097966; x=1742702766; darn=lists.infradead.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=elS4VXsAjSZVIDIoCNxyLW7RW0fHjd1SmHmDu6oNDNQ=; b=II5XEAfQDqoQKimpQO3T8ly9n2o7Lc0wSEZKFsf8Tl65LEDI1bryUFeaOxU1Jj8EDB 65DqgcNoqL9K3VH1xa+depruoo9XovUChsFG1iXjXwqGJNIBr8rFeGhdVSON4iBz2KwY C+gqvSaMbhY7Oy13o04CHFIY3ShTygK1CsuZz7VoahuMdxg/iTY6lg1hZgjJc8hXz8ie XLQR849aHm5wkliGsRNJClFDiMwnXXKnOG9/uVYt37eZ9XO/Q8890wT5ireMpT88877H m3Nf1xx89+3v7Mtkcjf4Z3rekgYLvds8Z0N5AsXAjWt1MGmaggIOnVcQwvA/9QHMNTM6 CmqA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1742097966; x=1742702766; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=elS4VXsAjSZVIDIoCNxyLW7RW0fHjd1SmHmDu6oNDNQ=; b=giBXACnISM6nAs6/E/JHpIX/Qwog8J/8GxvB/jw27s/TwROhJZ8RBQlL0hNO8gAbM2 JS4oW9yeP4cctMHQlKZ1LwrgAMwnpnA46CYaT3nxj/4WPRJ93Fi4bPeen4+vPIG1wRud I9cJLp0zSvvDawZDC/8MB71NMyKilGq2srA+3edakBntnH9Jt30T2f9kNb0u/8Um+R0a Cn+xsKF5rxj6MAtOlMWFC2KDJr9Yc6JDyRSjkb9+ow+t8XYYaWHch6qBswxFLyNVCJsV X1F7CTL8N6X7d1NlfnbdOvQVZURUaKjorrKkF78Snz4j+xzrIuwkFog74cRyQFHK3NTw Q9Bg== X-Forwarded-Encrypted: i=1; AJvYcCWlJxFd85wPmP+54IMNRUOClcyLAFQ5Doh/SHZK4s6fCvy9r1RAgpwU6MPTFHxchuokq6aFbMsPBR3IxMGjX1CU@lists.infradead.org X-Gm-Message-State: AOJu0YwI8JAd6ggXciP0MnOLPYN2C+9q8n6bJsa1u1ALUreW2Rk5ZOdC gZzTk5SzcqaGLpdLbvpGkEomABwJ+3rz867LWr0PhWSdESEtDCj2 X-Gm-Gg: ASbGnctKG1Mi/Q76lR8T0AOGO0+CgJzzDpTVN7dFu5Z4Ghnc8XnlsF2S7sepjkuQI/H t/6KnnVy+Q8JU38I6NaRDhZ/J1GeqIMCeZYlf4IsGWTlPC/mKV23O7tizrutOPpHGAX2l+Nn4Cj R2lZn03k+090+jqEpmn3mHF6jPrhsQJR6ZjwVWOGaMpe8PemK4vjmRgil++Rrewy3ML2EIKdyCq QnnfMgRYy2qmCscdUFN1/3+sZxcfk1Os/6Gc9hnk3HOW4AD9D3ymSby8SJ3zd3bCRHcaUOMHTlm ZyA6mPfONn00++TEIT3fcH026p5K4zzhKw== X-Google-Smtp-Source: AGHT+IEnbdVB/nE2Pi33KFYYtnULRWUI98TYw/MBJWQBTX6XEUdt0SDKhZrXk7vHOjd1S2LDfZ7qnQ== X-Received: by 2002:adf:c790:0:b0:391:41fb:89ff with SMTP id ffacd0b85a97d-3971f60b104mr8750937f8f.27.1742097966388; Sat, 15 Mar 2025 21:06:06 -0700 (PDT) Received: from localhost ([2a03:2880:31ff:b::]) by smtp.gmail.com with ESMTPSA id ffacd0b85a97d-395cb7eb9ccsm11053346f8f.96.2025.03.15.21.06.05 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sat, 15 Mar 2025 21:06:05 -0700 (PDT) From: Kumar Kartikeya Dwivedi To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org Cc: Linus Torvalds , Peter Zijlstra , Will Deacon , Waiman Long , Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , Martin KaFai Lau , Eduard Zingerman , "Paul E. 
McKenney" , Tejun Heo , Barret Rhoden , Josh Don , Dohyun Kim , linux-arm-kernel@lists.infradead.org, kkd@meta.com, kernel-team@meta.com Subject: [PATCH bpf-next v4 18/25] rqspinlock: Add locktorture support Date: Sat, 15 Mar 2025 21:05:34 -0700 Message-ID: <20250316040541.108729-19-memxor@gmail.com> X-Mailer: git-send-email 2.47.1 In-Reply-To: <20250316040541.108729-1-memxor@gmail.com> References: <20250316040541.108729-1-memxor@gmail.com> MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=2633; h=from:subject; bh=SBe9SrR1gqB5bS55WnFfwdHWzM4LavhATA7RzB99U+Y=; b=owEBbQKS/ZANAwAIAUzgyIZIvxHKAcsmYgBn1k3eNpMeA1g8bEkNbE5LoMw6rWgMD0iqfFTREsw3 1ViDmC2JAjMEAAEIAB0WIQRLvip+Buz51YI8YRFM4MiGSL8RygUCZ9ZN3gAKCRBM4MiGSL8RyuqtD/ 9/IGQPEqqIYM45Wbzz/zxdRnXzdviyqlrvI07Exouh0vJd+riQkKUn0fvtxOmYGiW0hf+KSPes3ypJ tSGUhUEHoy4KZ6aHeBpDd+Atw6aeLia+nCh/xXga/cb8an5pVO+3oHinEiNot2dfTsznbw2rgqLI/o 18KlBPsGuCz9o9fNDbrrYW89iCAR62qZm/ELBGZ5tUXVLKJTJU5+/FUG2EM+D6yyeWllgXPFCL9B// ySaQANZ8Bhy9vbrVe4wW67L581XJr2ML9um5yEfmYDqkTanYBBHCU+e7kguhEs2Q0bjT9F9AO609C7 uL1NTzcDsH6c0Avd6a/ITdoNp1E4ZvVp6HQqo/pBk6PXgu1gBcudXCyPs+Igx8NV4FiaP31yQ5e01S 9UbxJSSzxs2+fgRnO/i6pqNrO8Mktz8buhxlr5r2gqkDOXqWeWZEDHrj32fD3WgY7j99ENTR9y98bp L0wcKbx1en8AJckwWYMKtSC5TbsAU1rBvKW1jJtjXsifakOYMtf+dASlhIY/fIoefIOQ98NgY3wJDS RwY9kcDNvp8LkYBgR0/kRIpMAkUqpoon7lKnqgV4ZsxKpi8p5BN3ZSvGMbHMrJs+4gaGIeSnBJFaNM 5Lj0WTWrsYOJXYBXA0iaogmJAwZ8UtN5+FNxg+SHFESqpNMmvjruoQOLu8vg== X-Developer-Key: i=memxor@gmail.com; a=openpgp; fpr=4BBE2A7E06ECF9D5823C61114CE0C88648BF11CA X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20250315_210607_805699_ED1F1637 X-CRM114-Status: GOOD ( 13.42 ) X-BeenThere: linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: "linux-arm-kernel" Errors-To: linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org Introduce locktorture support for rqspinlock using the newly added macros as the first in-kernel user and consumer. Guard the code with CONFIG_BPF_SYSCALL ifdef since rqspinlock is not available otherwise. 
Signed-off-by: Kumar Kartikeya Dwivedi --- kernel/locking/locktorture.c | 57 ++++++++++++++++++++++++++++++++++++ 1 file changed, 57 insertions(+) diff --git a/kernel/locking/locktorture.c b/kernel/locking/locktorture.c index cc33470f4de9..ce0362f0a871 100644 --- a/kernel/locking/locktorture.c +++ b/kernel/locking/locktorture.c @@ -362,6 +362,60 @@ static struct lock_torture_ops raw_spin_lock_irq_ops = { .name = "raw_spin_lock_irq" }; +#ifdef CONFIG_BPF_SYSCALL + +#include +static rqspinlock_t rqspinlock; + +static int torture_raw_res_spin_write_lock(int tid __maybe_unused) +{ + raw_res_spin_lock(&rqspinlock); + return 0; +} + +static void torture_raw_res_spin_write_unlock(int tid __maybe_unused) +{ + raw_res_spin_unlock(&rqspinlock); +} + +static struct lock_torture_ops raw_res_spin_lock_ops = { + .writelock = torture_raw_res_spin_write_lock, + .write_delay = torture_spin_lock_write_delay, + .task_boost = torture_rt_boost, + .writeunlock = torture_raw_res_spin_write_unlock, + .readlock = NULL, + .read_delay = NULL, + .readunlock = NULL, + .name = "raw_res_spin_lock" +}; + +static int torture_raw_res_spin_write_lock_irq(int tid __maybe_unused) +{ + unsigned long flags; + + raw_res_spin_lock_irqsave(&rqspinlock, flags); + cxt.cur_ops->flags = flags; + return 0; +} + +static void torture_raw_res_spin_write_unlock_irq(int tid __maybe_unused) +{ + raw_res_spin_unlock_irqrestore(&rqspinlock, cxt.cur_ops->flags); +} + +static struct lock_torture_ops raw_res_spin_lock_irq_ops = { + .writelock = torture_raw_res_spin_write_lock_irq, + .write_delay = torture_spin_lock_write_delay, + .task_boost = torture_rt_boost, + .writeunlock = torture_raw_res_spin_write_unlock_irq, + .readlock = NULL, + .read_delay = NULL, + .readunlock = NULL, + .name = "raw_res_spin_lock_irq" +}; + +#endif + static DEFINE_RWLOCK(torture_rwlock); static int torture_rwlock_write_lock(int tid __maybe_unused) @@ -1168,6 +1222,9 @@ static int __init lock_torture_init(void) &lock_busted_ops, &spin_lock_ops, &spin_lock_irq_ops, &raw_spin_lock_ops, &raw_spin_lock_irq_ops, +#ifdef CONFIG_BPF_SYSCALL + &raw_res_spin_lock_ops, &raw_res_spin_lock_irq_ops, +#endif &rw_lock_ops, &rw_lock_irq_ops, &mutex_lock_ops, &ww_mutex_lock_ops, From patchwork Sun Mar 16 04:05:35 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kumar Kartikeya Dwivedi X-Patchwork-Id: 14018342 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 1E1ACC28B2F for ; Sun, 16 Mar 2025 04:39:58 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=bombadil.20210309; h=Sender:List-Subscribe:List-Help :List-Post:List-Archive:List-Unsubscribe:List-Id:Content-Transfer-Encoding: MIME-Version:References:In-Reply-To:Message-ID:Date:Subject:Cc:To:From: Reply-To:Content-Type:Content-ID:Content-Description:Resent-Date:Resent-From: Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID:List-Owner; bh=CF4Uk2MYBA/hkoA+QSy5dWTlz17aW/WlIb/dNVcZIBA=; b=Vz+rvHxDr6F6YMTykaCQ2UmRhW iqRPxHkF2ZtZuNaaur7G3wiJ4vPrTh3H6L0LzDXp9wb14VF2KVFP/cIpM98Cfo5W5L1vBpOAO1vfN nYjDZS5gWfQJYOIFPZEa/NoKazE5UcSmMYwdylNIg8eoPBIPjDAUO2Ne+CNHRos+IKPpN8OF6E4ET 
luwRv+AfAZXlky+T9x8vlIYYvKn5TAvpJucz2gwxfHPejUS/AyJoGCcFoZewYYXZAu5yW6WfM+t+C 1qRsMwvUrLf5h0X7A+k3hzFuHv90O7QA8fY1j0fdl/ZT5s3VjQFI3vJO0s0Hap7l+jxaM4C1x2kTq ltDXOyKQ==; Received: from localhost ([::1] helo=bombadil.infradead.org) by bombadil.infradead.org with esmtp (Exim 4.98 #2 (Red Hat Linux)) id 1ttfmv-0000000HGkz-2cFH; Sun, 16 Mar 2025 04:39:49 +0000 Received: from mail-wr1-x442.google.com ([2a00:1450:4864:20::442]) by bombadil.infradead.org with esmtps (Exim 4.98 #2 (Red Hat Linux)) id 1ttfGL-0000000HCNQ-1R7R for linux-arm-kernel@lists.infradead.org; Sun, 16 Mar 2025 04:06:11 +0000 Received: by mail-wr1-x442.google.com with SMTP id ffacd0b85a97d-3965c995151so2199423f8f.1 for ; Sat, 15 Mar 2025 21:06:08 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1742097968; x=1742702768; darn=lists.infradead.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=CF4Uk2MYBA/hkoA+QSy5dWTlz17aW/WlIb/dNVcZIBA=; b=IFCGbjTcCc0kQn0/8Tj8h9c3SphohCzwa3MrAFv7u5SGk++NpSC6ch3ipHpkqNwfje 7hWdVUbq0iPxsA5f3lVRmA5FNZq0pciyIniv1Va/pD31cE6QycJMmtjTAGjxID3LBBUl H8GzR4jrO0XeqbNb3FjR+ISY7ghJEKrAc29PdqD2RY6DAEKq0SxPQ0vJ0KVRDwPew6Zn chbdDX9OnV2C30F3Glhq6SiTVJPKpuPh2eT7Je7m+1VNtq6lVw1lInUYcjb9MAy6MZqg epZOkW2a7mvcA/bCldyVxC1y8Hkj/0HXDnsZzhYNrkZFUMIm63iRV7oztIIjOryStD9C eZYw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1742097968; x=1742702768; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=CF4Uk2MYBA/hkoA+QSy5dWTlz17aW/WlIb/dNVcZIBA=; b=gk6BVJNTkrmVt6dfKAeUnPplgC8ELb76PQJI+AZfySxDIgyEoThx1SZqy8pCMWhkh/ rjbQQsV02qk9pitCMn/23k8d8HcnbbXzjMwyNwgbLUjucy5LA5wsdGgnD7BvMNEOxrEY xUDtiMUb4p6fNZoV4iun/DldPvBu3ZjbdPnzt85jpmXmfPmBzaF4cOUUM/mMXXMzfsaO UmRNl2wA58LxR0Xjmjozt1+cCdDlKJ3LL1gJuZ7EUPDeKEjFvdjDDpeaGtwwq3LwyUze OjymM/xTf7FWJe091z7YQEuxQLO9if/pV2EXVuHiz+p5vOlg2tz62DjmDlUMCnMO0YfP 1deQ== X-Forwarded-Encrypted: i=1; AJvYcCWgGWhlwENy2TjPfp34Lr76NnrBSsbLYUI+a7bFJkm637c7dHyQ7UX7IuWw0VJPjXi+1hRq0VoZSOusscX/0w8v@lists.infradead.org X-Gm-Message-State: AOJu0YzHuV720TlsEuH3SB+e+4Nm3yLs/U4KiYzA4ySALAu3Jw1O9Sle H6CQ6Gk39bJjlsDh3e22UlDjpjJT6AXmqzI05hOaJwkSGcsQz0wa X-Gm-Gg: ASbGncsS4hydLPytvK04hjPA8Sot6TqIHbQI4CtuM6rKI1zeLF9u9T74FS79sBhCBvP BH+wXLSOJx9NcYsJlSNUKhIBHm64/Uzerp1SBrcnsrK1LkmYkh+63Ygmmdytjf4zMPc7RgwnSkv Pz1Eb+lPE4Jrs7UzgP/xuhv5JbDocwA5l77jrMfuLroeUJpk7nMlzZMTcjQKtxOIjDRwhsMJc0I 6Vqtj1qzv+Vbg/0ejFhhwH86d70qmkdShJtI2tqDzjDUtaTWbp/I5gFeEPhoYuno25hRB5AWDxQ rulWIRjrPtp+xuLYGlm6QxieN6xmNKDeuKQ= X-Google-Smtp-Source: AGHT+IG/cybe/X/vKmWywaeOPxKG+ExqzgfSAMkufyXQIte9dsBNmp17N0O0pQGVtXSWVe9+PQcBvQ== X-Received: by 2002:a5d:64a2:0:b0:38f:28dc:ec23 with SMTP id ffacd0b85a97d-3971d23799cmr10157067f8f.19.1742097967532; Sat, 15 Mar 2025 21:06:07 -0700 (PDT) Received: from localhost ([2a03:2880:31ff:48::]) by smtp.gmail.com with ESMTPSA id ffacd0b85a97d-395cb7eb9d7sm10695515f8f.89.2025.03.15.21.06.06 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sat, 15 Mar 2025 21:06:07 -0700 (PDT) From: Kumar Kartikeya Dwivedi To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org Cc: Linus Torvalds , Peter Zijlstra , Will Deacon , Waiman Long , Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , Martin KaFai Lau , Eduard Zingerman , "Paul E. 
McKenney" , Tejun Heo , Barret Rhoden , Josh Don , Dohyun Kim , linux-arm-kernel@lists.infradead.org, kkd@meta.com, kernel-team@meta.com Subject: [PATCH bpf-next v4 19/25] bpf: Convert hashtab.c to rqspinlock Date: Sat, 15 Mar 2025 21:05:35 -0700 Message-ID: <20250316040541.108729-20-memxor@gmail.com> X-Mailer: git-send-email 2.47.1 In-Reply-To: <20250316040541.108729-1-memxor@gmail.com> References: <20250316040541.108729-1-memxor@gmail.com> MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=11131; h=from:subject; bh=VwAQAxwAow7TrrvxCboQzb6cYiUbz63qPyaMUR2Lneg=; b=owEBbQKS/ZANAwAIAUzgyIZIvxHKAcsmYgBn1k3eGs+bJjtEdeiJjw+Cd9PjBR7SOiQpae9Vzm/W Hk3IcQyJAjMEAAEIAB0WIQRLvip+Buz51YI8YRFM4MiGSL8RygUCZ9ZN3gAKCRBM4MiGSL8RymFdD/ 4guFTOgG2ezQtX/qNWMi70RSbmgFndzWKODlNByC2n50hmYhhX9QlUKP+tsvZw793omMVa4G7ivzR8 dj195Z7rzAQlMEA+Y2fHGnBQqQWjPQhXQk2bQ1Mvvs4/C/zBpkrWLeXIrs3l4L0VhKDvg+nTGxmJGQ 5pe6Dz+d15FajBJ5NTRVGVX6/BLShHrT3NpUFLx1iGqGO6ilzcz0SKrJXuIX1thW1XMvRbyacIxc8d c4rraPbRxMyYAuYdgHX6O2iJCeDXsVnlCJ9w4YSuWCUoiRrt8AF+73hKjEtO7yhk59ZCUsvojBLWQ6 7OaL3u0YLBcxSyG4OsaCu27eyiDMBSYTJQVfjnq63hJ00wz+6TReX4DDpYAn/KUjeQLxVBgjJ2HRYh JMe4RZjVlfiK51f7sGm9Oy5JzPDbkpQSBBg9DDeMWd+aOkMDTzERynTnkWImFKroy/83XrcICnFQDC uSy6RpJt69kFqYBIjZ3E6vOhQOsNPO2Yrf53s+utbaMKhRLyMdMWYrgSiP2r0CSyFlpEWI3ZWhW9WF mfskIl0ctWP39nXAdkW/yj8vNd3DeBVAAoWJp35wU3L1cBjMrTAyXac4M8lv5Ua+lc7vYEe85mI1ry qx+it9Hd826i/HpocK9iaHfGwuWdtF0b2JNeAuwKceiyeIVeKMW5ebSgsc3w== X-Developer-Key: i=memxor@gmail.com; a=openpgp; fpr=4BBE2A7E06ECF9D5823C61114CE0C88648BF11CA X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20250315_210609_415753_B5989732 X-CRM114-Status: GOOD ( 22.29 ) X-BeenThere: linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: "linux-arm-kernel" Errors-To: linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org Convert hashtab.c from raw_spinlock to rqspinlock, and drop the hashed per-cpu counter crud from the code base which is no longer necessary. 
Closes: https://lore.kernel.org/bpf/675302fd.050a0220.2477f.0004.GAE@google.com Closes: https://lore.kernel.org/bpf/000000000000b3e63e061eed3f6b@google.com Signed-off-by: Kumar Kartikeya Dwivedi --- kernel/bpf/hashtab.c | 102 ++++++++++++++----------------------------- 1 file changed, 32 insertions(+), 70 deletions(-) diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c index 877298133fda..5a5adc66b8e2 100644 --- a/kernel/bpf/hashtab.c +++ b/kernel/bpf/hashtab.c @@ -16,6 +16,7 @@ #include "bpf_lru_list.h" #include "map_in_map.h" #include +#include #define HTAB_CREATE_FLAG_MASK \ (BPF_F_NO_PREALLOC | BPF_F_NO_COMMON_LRU | BPF_F_NUMA_NODE | \ @@ -78,7 +79,7 @@ */ struct bucket { struct hlist_nulls_head head; - raw_spinlock_t raw_lock; + rqspinlock_t raw_lock; }; #define HASHTAB_MAP_LOCK_COUNT 8 @@ -104,8 +105,6 @@ struct bpf_htab { u32 n_buckets; /* number of hash buckets */ u32 elem_size; /* size of each element in bytes */ u32 hashrnd; - struct lock_class_key lockdep_key; - int __percpu *map_locked[HASHTAB_MAP_LOCK_COUNT]; }; /* each htab element is struct htab_elem + key + value */ @@ -140,45 +139,26 @@ static void htab_init_buckets(struct bpf_htab *htab) for (i = 0; i < htab->n_buckets; i++) { INIT_HLIST_NULLS_HEAD(&htab->buckets[i].head, i); - raw_spin_lock_init(&htab->buckets[i].raw_lock); - lockdep_set_class(&htab->buckets[i].raw_lock, - &htab->lockdep_key); + raw_res_spin_lock_init(&htab->buckets[i].raw_lock); cond_resched(); } } -static inline int htab_lock_bucket(const struct bpf_htab *htab, - struct bucket *b, u32 hash, - unsigned long *pflags) +static inline int htab_lock_bucket(struct bucket *b, unsigned long *pflags) { unsigned long flags; + int ret; - hash = hash & min_t(u32, HASHTAB_MAP_LOCK_MASK, htab->n_buckets - 1); - - preempt_disable(); - local_irq_save(flags); - if (unlikely(__this_cpu_inc_return(*(htab->map_locked[hash])) != 1)) { - __this_cpu_dec(*(htab->map_locked[hash])); - local_irq_restore(flags); - preempt_enable(); - return -EBUSY; - } - - raw_spin_lock(&b->raw_lock); + ret = raw_res_spin_lock_irqsave(&b->raw_lock, flags); + if (ret) + return ret; *pflags = flags; - return 0; } -static inline void htab_unlock_bucket(const struct bpf_htab *htab, - struct bucket *b, u32 hash, - unsigned long flags) +static inline void htab_unlock_bucket(struct bucket *b, unsigned long flags) { - hash = hash & min_t(u32, HASHTAB_MAP_LOCK_MASK, htab->n_buckets - 1); - raw_spin_unlock(&b->raw_lock); - __this_cpu_dec(*(htab->map_locked[hash])); - local_irq_restore(flags); - preempt_enable(); + raw_res_spin_unlock_irqrestore(&b->raw_lock, flags); } static bool htab_lru_map_delete_node(void *arg, struct bpf_lru_node *node); @@ -483,14 +463,12 @@ static struct bpf_map *htab_map_alloc(union bpf_attr *attr) bool percpu_lru = (attr->map_flags & BPF_F_NO_COMMON_LRU); bool prealloc = !(attr->map_flags & BPF_F_NO_PREALLOC); struct bpf_htab *htab; - int err, i; + int err; htab = bpf_map_area_alloc(sizeof(*htab), NUMA_NO_NODE); if (!htab) return ERR_PTR(-ENOMEM); - lockdep_register_key(&htab->lockdep_key); - bpf_map_init_from_attr(&htab->map, attr); if (percpu_lru) { @@ -536,15 +514,6 @@ static struct bpf_map *htab_map_alloc(union bpf_attr *attr) if (!htab->buckets) goto free_elem_count; - for (i = 0; i < HASHTAB_MAP_LOCK_COUNT; i++) { - htab->map_locked[i] = bpf_map_alloc_percpu(&htab->map, - sizeof(int), - sizeof(int), - GFP_USER); - if (!htab->map_locked[i]) - goto free_map_locked; - } - if (htab->map.map_flags & BPF_F_ZERO_SEED) htab->hashrnd = 0; else @@ -607,15 +576,12 @@ static struct 
bpf_map *htab_map_alloc(union bpf_attr *attr) free_map_locked: if (htab->use_percpu_counter) percpu_counter_destroy(&htab->pcount); - for (i = 0; i < HASHTAB_MAP_LOCK_COUNT; i++) - free_percpu(htab->map_locked[i]); bpf_map_area_free(htab->buckets); bpf_mem_alloc_destroy(&htab->pcpu_ma); bpf_mem_alloc_destroy(&htab->ma); free_elem_count: bpf_map_free_elem_count(&htab->map); free_htab: - lockdep_unregister_key(&htab->lockdep_key); bpf_map_area_free(htab); return ERR_PTR(err); } @@ -820,7 +786,7 @@ static bool htab_lru_map_delete_node(void *arg, struct bpf_lru_node *node) b = __select_bucket(htab, tgt_l->hash); head = &b->head; - ret = htab_lock_bucket(htab, b, tgt_l->hash, &flags); + ret = htab_lock_bucket(b, &flags); if (ret) return false; @@ -831,7 +797,7 @@ static bool htab_lru_map_delete_node(void *arg, struct bpf_lru_node *node) break; } - htab_unlock_bucket(htab, b, tgt_l->hash, flags); + htab_unlock_bucket(b, flags); if (l == tgt_l) check_and_free_fields(htab, l); @@ -1150,7 +1116,7 @@ static long htab_map_update_elem(struct bpf_map *map, void *key, void *value, */ } - ret = htab_lock_bucket(htab, b, hash, &flags); + ret = htab_lock_bucket(b, &flags); if (ret) return ret; @@ -1201,7 +1167,7 @@ static long htab_map_update_elem(struct bpf_map *map, void *key, void *value, check_and_free_fields(htab, l_old); } } - htab_unlock_bucket(htab, b, hash, flags); + htab_unlock_bucket(b, flags); if (l_old) { if (old_map_ptr) map->ops->map_fd_put_ptr(map, old_map_ptr, true); @@ -1210,7 +1176,7 @@ static long htab_map_update_elem(struct bpf_map *map, void *key, void *value, } return 0; err: - htab_unlock_bucket(htab, b, hash, flags); + htab_unlock_bucket(b, flags); return ret; } @@ -1257,7 +1223,7 @@ static long htab_lru_map_update_elem(struct bpf_map *map, void *key, void *value copy_map_value(&htab->map, l_new->key + round_up(map->key_size, 8), value); - ret = htab_lock_bucket(htab, b, hash, &flags); + ret = htab_lock_bucket(b, &flags); if (ret) goto err_lock_bucket; @@ -1278,7 +1244,7 @@ static long htab_lru_map_update_elem(struct bpf_map *map, void *key, void *value ret = 0; err: - htab_unlock_bucket(htab, b, hash, flags); + htab_unlock_bucket(b, flags); err_lock_bucket: if (ret) @@ -1315,7 +1281,7 @@ static long __htab_percpu_map_update_elem(struct bpf_map *map, void *key, b = __select_bucket(htab, hash); head = &b->head; - ret = htab_lock_bucket(htab, b, hash, &flags); + ret = htab_lock_bucket(b, &flags); if (ret) return ret; @@ -1340,7 +1306,7 @@ static long __htab_percpu_map_update_elem(struct bpf_map *map, void *key, } ret = 0; err: - htab_unlock_bucket(htab, b, hash, flags); + htab_unlock_bucket(b, flags); return ret; } @@ -1381,7 +1347,7 @@ static long __htab_lru_percpu_map_update_elem(struct bpf_map *map, void *key, return -ENOMEM; } - ret = htab_lock_bucket(htab, b, hash, &flags); + ret = htab_lock_bucket(b, &flags); if (ret) goto err_lock_bucket; @@ -1405,7 +1371,7 @@ static long __htab_lru_percpu_map_update_elem(struct bpf_map *map, void *key, } ret = 0; err: - htab_unlock_bucket(htab, b, hash, flags); + htab_unlock_bucket(b, flags); err_lock_bucket: if (l_new) { bpf_map_dec_elem_count(&htab->map); @@ -1447,7 +1413,7 @@ static long htab_map_delete_elem(struct bpf_map *map, void *key) b = __select_bucket(htab, hash); head = &b->head; - ret = htab_lock_bucket(htab, b, hash, &flags); + ret = htab_lock_bucket(b, &flags); if (ret) return ret; @@ -1457,7 +1423,7 @@ static long htab_map_delete_elem(struct bpf_map *map, void *key) else ret = -ENOENT; - htab_unlock_bucket(htab, b, hash, 
flags); + htab_unlock_bucket(b, flags); if (l) free_htab_elem(htab, l); @@ -1483,7 +1449,7 @@ static long htab_lru_map_delete_elem(struct bpf_map *map, void *key) b = __select_bucket(htab, hash); head = &b->head; - ret = htab_lock_bucket(htab, b, hash, &flags); + ret = htab_lock_bucket(b, &flags); if (ret) return ret; @@ -1494,7 +1460,7 @@ static long htab_lru_map_delete_elem(struct bpf_map *map, void *key) else ret = -ENOENT; - htab_unlock_bucket(htab, b, hash, flags); + htab_unlock_bucket(b, flags); if (l) htab_lru_push_free(htab, l); return ret; @@ -1561,7 +1527,6 @@ static void htab_map_free_timers_and_wq(struct bpf_map *map) static void htab_map_free(struct bpf_map *map) { struct bpf_htab *htab = container_of(map, struct bpf_htab, map); - int i; /* bpf_free_used_maps() or close(map_fd) will trigger this map_free callback. * bpf_free_used_maps() is called after bpf prog is no longer executing. @@ -1586,9 +1551,6 @@ static void htab_map_free(struct bpf_map *map) bpf_mem_alloc_destroy(&htab->ma); if (htab->use_percpu_counter) percpu_counter_destroy(&htab->pcount); - for (i = 0; i < HASHTAB_MAP_LOCK_COUNT; i++) - free_percpu(htab->map_locked[i]); - lockdep_unregister_key(&htab->lockdep_key); bpf_map_area_free(htab); } @@ -1631,7 +1593,7 @@ static int __htab_map_lookup_and_delete_elem(struct bpf_map *map, void *key, b = __select_bucket(htab, hash); head = &b->head; - ret = htab_lock_bucket(htab, b, hash, &bflags); + ret = htab_lock_bucket(b, &bflags); if (ret) return ret; @@ -1668,7 +1630,7 @@ static int __htab_map_lookup_and_delete_elem(struct bpf_map *map, void *key, hlist_nulls_del_rcu(&l->hash_node); out_unlock: - htab_unlock_bucket(htab, b, hash, bflags); + htab_unlock_bucket(b, bflags); if (l) { if (is_lru_map) @@ -1790,7 +1752,7 @@ __htab_map_lookup_and_delete_batch(struct bpf_map *map, head = &b->head; /* do not grab the lock unless need it (bucket_cnt > 0). */ if (locked) { - ret = htab_lock_bucket(htab, b, batch, &flags); + ret = htab_lock_bucket(b, &flags); if (ret) { rcu_read_unlock(); bpf_enable_instrumentation(); @@ -1813,7 +1775,7 @@ __htab_map_lookup_and_delete_batch(struct bpf_map *map, /* Note that since bucket_cnt > 0 here, it is implicit * that the locked was grabbed, so release it. */ - htab_unlock_bucket(htab, b, batch, flags); + htab_unlock_bucket(b, flags); rcu_read_unlock(); bpf_enable_instrumentation(); goto after_loop; @@ -1824,7 +1786,7 @@ __htab_map_lookup_and_delete_batch(struct bpf_map *map, /* Note that since bucket_cnt > 0 here, it is implicit * that the locked was grabbed, so release it. 
*/ - htab_unlock_bucket(htab, b, batch, flags); + htab_unlock_bucket(b, flags); rcu_read_unlock(); bpf_enable_instrumentation(); kvfree(keys); @@ -1887,7 +1849,7 @@ __htab_map_lookup_and_delete_batch(struct bpf_map *map, dst_val += value_size; } - htab_unlock_bucket(htab, b, batch, flags); + htab_unlock_bucket(b, flags); locked = false; while (node_to_free) { From patchwork Sun Mar 16 04:05:36 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kumar Kartikeya Dwivedi X-Patchwork-Id: 14018343 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 859D6C282DE for ; Sun, 16 Mar 2025 04:41:41 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=bombadil.20210309; h=Sender:List-Subscribe:List-Help :List-Post:List-Archive:List-Unsubscribe:List-Id:Content-Transfer-Encoding: MIME-Version:References:In-Reply-To:Message-ID:Date:Subject:Cc:To:From: Reply-To:Content-Type:Content-ID:Content-Description:Resent-Date:Resent-From: Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID:List-Owner; bh=tjwed54Vqji5j5yoD9/wFQ/aRgNfk0MsQVfeIXpdSXQ=; b=3m5U9Q5RkciD5Lz3jbJVpbw1OR uTYFVzwUhARv9gnqoq5LVe9yvkTJB2GVq1Zd85cSVyBxx/I6eowwdh+W1tsYhfG0UT1SA/B8WRgGe i+MA7ryn3TeVKgzz1/cIibUpO9/EtA1lqYPfcRGuANkgY6fmHodJc1NvmAYVbobVMgYdfVFPQkpaN l2WJzocwyY/rov4wTDWBhpMhn+CEZzn0W7JRuf5M5HY6QNq+sYCUvL409oCVNe3ppxx+za62zf7Tn 2LN/K49v1s9Rlfm60lZ2gyBrH7A0RkIc8r42s7nZrcC1PziDetrY1IBZOZs4QAzpkENlMV9gVIAkb x1YI3/9w==; Received: from localhost ([::1] helo=bombadil.infradead.org) by bombadil.infradead.org with esmtp (Exim 4.98 #2 (Red Hat Linux)) id 1ttfoa-0000000HGsq-1YrK; Sun, 16 Mar 2025 04:41:32 +0000 Received: from mail-wr1-x442.google.com ([2a00:1450:4864:20::442]) by bombadil.infradead.org with esmtps (Exim 4.98 #2 (Red Hat Linux)) id 1ttfGM-0000000HCOh-1jma for linux-arm-kernel@lists.infradead.org; Sun, 16 Mar 2025 04:06:12 +0000 Received: by mail-wr1-x442.google.com with SMTP id ffacd0b85a97d-39104c1cbbdso1885958f8f.3 for ; Sat, 15 Mar 2025 21:06:10 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1742097969; x=1742702769; darn=lists.infradead.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=tjwed54Vqji5j5yoD9/wFQ/aRgNfk0MsQVfeIXpdSXQ=; b=IFITeoEwBJXYwaABHQgU0SHkHBJ3rXE6mOnOo6EXgIR8a0QMHEIjx6jN50q62W7QMR CO5FTgO3mH2a+DonkZd/SmumFH1DKY0+VuU3Vf7QELbAAdMxvGf80JiGnnJBYCtgw9q2 uGfZDl9b9CIsNnSlycJJUmyiLffQS1PRuEK6MQ8QcqehDzEK3IT+h1yammCbXUjQMY7d QI/uxhc/SiVTjfdEAni7I+Q2hQAtc8zgEGwBFZ5BPXSo+ehvSGCMPwS5NWEupD9QiSrK 9sFQRJE0/MwNWws98HqcG2hAvGc3gWT1hqr+unH+HoGzl2ehAJ89vHo/enTGJMyJPW4N wQmA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1742097969; x=1742702769; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=tjwed54Vqji5j5yoD9/wFQ/aRgNfk0MsQVfeIXpdSXQ=; b=HEv1RRmJRDnsIbjiZz6rycit1VN6L9raJPRAfzJFtIKFgcAYnRbOH7kL9hB+Y8PM+Y VO4qwGP4ErGD39JIjjeF7BMG0eXVSfuzZXBG3Zfjk8AJeSqFX9NNAfZDc+83gmEP/ETm 
OP7YwifA7Z/1lQt8uSCHRnnV6AIvo7GVDjfJrmjuiLHk5+E2R5o/I4lfuDUwR1+hHghH f2EW8Adupei9Qpf7E1cf2RRKMEAObLg7LGNYlEskAFYlrFPhytTaWFbdr+tTMhbRIDCg TfDE+u1R8tx+r7Gg26Uvp0HVAKZhb2s65xtQ44DFBYKLqlK4twvVbmUem7LIHgdIyFlz NGOw== X-Forwarded-Encrypted: i=1; AJvYcCXGTkdKTBZ6uvLL71GMVnfvzOPrtr4es3AtpkSFxHAVobgkpMAxjaeGo+pVfqXD/lFoNdMI19ru0k4OdLnbU/N0@lists.infradead.org X-Gm-Message-State: AOJu0YywkdiM6IMX3tx1opPRAt9VS6UqcAjzKCJIumml+b5R2y1AYEf5 QvO018lysTolCutG2aWoFpweHBbGCzMmmAuHTgCv4bGotAnIIf/D X-Gm-Gg: ASbGnctsK2Xmvp2Vhz87kqLClQ+9SNTgm7fn9R88IW+gDshC81LJLnrFwGEtPYO/9CC 7Zelw+HTqEtC1W2C0zIa+6D6+DhKpl0iDxZ9gSTD1psr6NsZ+jKwn4jwv6YRk14+YcF3wdiuuz4 LHnXkOyfzPZJOdO/OTgXzOByGCjqROMyrpiqE1nScH3XnU5GGcAR7Ms/Zx9xONwwH/fuY7W4PSD K7m5l9spVA4Qykb41Pgdcs3Bw6lTcngquwcDkLzSzmqDQ28dpa2hhSviwr/2boGc9Vj4YdauM/0 9GcBHQBjwh6NeYSiUipvQZQsVDmeMnY3Qw== X-Google-Smtp-Source: AGHT+IGK4qXm9ArrLCXgXNAYLraEfl7iQZEVMYeN5XftPU/x8Co4B5Nq48EhLKMbp6h132WBFeRBuQ== X-Received: by 2002:a5d:5f8f:0:b0:38f:38eb:fcfc with SMTP id ffacd0b85a97d-3971d136069mr9762245f8f.7.1742097968994; Sat, 15 Mar 2025 21:06:08 -0700 (PDT) Received: from localhost ([2a03:2880:31ff:5::]) by smtp.gmail.com with ESMTPSA id 5b1f17b1804b1-43d1fdda152sm67916635e9.1.2025.03.15.21.06.08 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sat, 15 Mar 2025 21:06:08 -0700 (PDT) From: Kumar Kartikeya Dwivedi To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org Cc: Linus Torvalds , Peter Zijlstra , Will Deacon , Waiman Long , Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , Martin KaFai Lau , Eduard Zingerman , "Paul E. McKenney" , Tejun Heo , Barret Rhoden , Josh Don , Dohyun Kim , linux-arm-kernel@lists.infradead.org, kkd@meta.com, kernel-team@meta.com Subject: [PATCH bpf-next v4 20/25] bpf: Convert percpu_freelist.c to rqspinlock Date: Sat, 15 Mar 2025 21:05:36 -0700 Message-ID: <20250316040541.108729-21-memxor@gmail.com> X-Mailer: git-send-email 2.47.1 In-Reply-To: <20250316040541.108729-1-memxor@gmail.com> References: <20250316040541.108729-1-memxor@gmail.com> MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=6720; h=from:subject; bh=9q6tMvL0KIAbFk+sY/U9KRjErNQz0NSPRbrt2C3ZJkE=; b=owEBbQKS/ZANAwAIAUzgyIZIvxHKAcsmYgBn1k3eRwoCpVBivHRVCIfYOVJ3bSQqELoQ3YdqIHaZ 0Qn/pVmJAjMEAAEIAB0WIQRLvip+Buz51YI8YRFM4MiGSL8RygUCZ9ZN3gAKCRBM4MiGSL8RyhG8D/ 4gZK23dPYHGv97XCNumvumPpE5iAC1RvxTgRMG3f8G3jsmtWgXSKJJX0I/LPV9SIjEVENK69dF8OYo JtJzi6M0frKXWOH3i348qYosuOyV2drlvwfNUUEFB30EDCZyh15S5rPPv1UpNngps6KckmlscR7wKN MXqabQVbUGrYpUvjSK5gYCr7BOGqDYpZ1XagCXPKfXCOsOOn1H3cDaPCfaf8XxgcflHHLVxFBMvW9y +YzbVai2PpwEalAolVi6/DC77pJhgfX7rwqhMncu4/xkyFnH87t5JONBFJHZf45QeGx0A4agc9nOg8 rxjniya94z/TfDYqZLAfQEZIV9kAGOUMvPwhEnXyYuDHDDRec7tKNSO+P2IUb2hYefVM3Ghm0n4p0P CmZc9N/Y2GIRXvVl4VLGwfu/ensS1wCFj82w76Ig1qpUYwppwWkH9o5bWb9dv3U8lmN3c8F9NHEx1o OIsPoR47YLMoasxqRtXiWe2dvZbilJdVqJelZipX4dCEyIoC+1YYy6r2/ftW3XakSo6UWBre8hFohm bGJu1Qo3BxBDSej8CFT3E3KsVzZ5XVrF/RPr0liFKY5TyEV1kO33w/oEzrui9Z/irDz8gIF9aF4Y8z u+gd5bcIJtqR98WDZhEnXhXlVzZ3ZFtHlkSpmLfDqE0N5VNrh6rqLtwffGVA== X-Developer-Key: i=memxor@gmail.com; a=openpgp; fpr=4BBE2A7E06ECF9D5823C61114CE0C88648BF11CA X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20250315_210610_479951_C05AD7CF X-CRM114-Status: GOOD ( 20.34 ) X-BeenThere: linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: "linux-arm-kernel" Errors-To: 
linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org Convert the percpu_freelist.c code to use rqspinlock, and remove the extralist fallback and trylock-based acquisitions to avoid deadlocks. Key thing to note is the retained while (true) loop to search through other CPUs when failing to push a node due to locking errors. This retains the behavior of the old code, where it would keep trying until it would be able to successfully push the node back into the freelist of a CPU. Technically, we should start iteration for this loop from raw_smp_processor_id() + 1, but to avoid hitting the edge of nr_cpus, we skip execution in the loop body instead. Closes: https://lore.kernel.org/bpf/CAPPBnEa1_pZ6W24+WwtcNFvTUHTHO7KUmzEbOcMqxp+m2o15qQ@mail.gmail.com Closes: https://lore.kernel.org/bpf/CAPPBnEYm+9zduStsZaDnq93q1jPLqO-PiKX9jy0MuL8LCXmCrQ@mail.gmail.com Signed-off-by: Kumar Kartikeya Dwivedi --- kernel/bpf/percpu_freelist.c | 113 ++++++++--------------------------- kernel/bpf/percpu_freelist.h | 4 +- 2 files changed, 27 insertions(+), 90 deletions(-) diff --git a/kernel/bpf/percpu_freelist.c b/kernel/bpf/percpu_freelist.c index 034cf87b54e9..632762b57299 100644 --- a/kernel/bpf/percpu_freelist.c +++ b/kernel/bpf/percpu_freelist.c @@ -14,11 +14,9 @@ int pcpu_freelist_init(struct pcpu_freelist *s) for_each_possible_cpu(cpu) { struct pcpu_freelist_head *head = per_cpu_ptr(s->freelist, cpu); - raw_spin_lock_init(&head->lock); + raw_res_spin_lock_init(&head->lock); head->first = NULL; } - raw_spin_lock_init(&s->extralist.lock); - s->extralist.first = NULL; return 0; } @@ -34,58 +32,39 @@ static inline void pcpu_freelist_push_node(struct pcpu_freelist_head *head, WRITE_ONCE(head->first, node); } -static inline void ___pcpu_freelist_push(struct pcpu_freelist_head *head, +static inline bool ___pcpu_freelist_push(struct pcpu_freelist_head *head, struct pcpu_freelist_node *node) { - raw_spin_lock(&head->lock); - pcpu_freelist_push_node(head, node); - raw_spin_unlock(&head->lock); -} - -static inline bool pcpu_freelist_try_push_extra(struct pcpu_freelist *s, - struct pcpu_freelist_node *node) -{ - if (!raw_spin_trylock(&s->extralist.lock)) + if (raw_res_spin_lock(&head->lock)) return false; - - pcpu_freelist_push_node(&s->extralist, node); - raw_spin_unlock(&s->extralist.lock); + pcpu_freelist_push_node(head, node); + raw_res_spin_unlock(&head->lock); return true; } -static inline void ___pcpu_freelist_push_nmi(struct pcpu_freelist *s, - struct pcpu_freelist_node *node) +void __pcpu_freelist_push(struct pcpu_freelist *s, + struct pcpu_freelist_node *node) { - int cpu, orig_cpu; + struct pcpu_freelist_head *head; + int cpu; - orig_cpu = raw_smp_processor_id(); - while (1) { - for_each_cpu_wrap(cpu, cpu_possible_mask, orig_cpu) { - struct pcpu_freelist_head *head; + if (___pcpu_freelist_push(this_cpu_ptr(s->freelist), node)) + return; + while (true) { + for_each_cpu_wrap(cpu, cpu_possible_mask, raw_smp_processor_id()) { + if (cpu == raw_smp_processor_id()) + continue; head = per_cpu_ptr(s->freelist, cpu); - if (raw_spin_trylock(&head->lock)) { - pcpu_freelist_push_node(head, node); - raw_spin_unlock(&head->lock); - return; - } - } - - /* cannot lock any per cpu lock, try extralist */ - if (pcpu_freelist_try_push_extra(s, node)) + if (raw_res_spin_lock(&head->lock)) + continue; + pcpu_freelist_push_node(head, node); + raw_res_spin_unlock(&head->lock); return; + } } } -void __pcpu_freelist_push(struct pcpu_freelist *s, - struct pcpu_freelist_node *node) -{ - if (in_nmi()) - 
___pcpu_freelist_push_nmi(s, node); - else - ___pcpu_freelist_push(this_cpu_ptr(s->freelist), node); -} - void pcpu_freelist_push(struct pcpu_freelist *s, struct pcpu_freelist_node *node) { @@ -120,71 +99,29 @@ void pcpu_freelist_populate(struct pcpu_freelist *s, void *buf, u32 elem_size, static struct pcpu_freelist_node *___pcpu_freelist_pop(struct pcpu_freelist *s) { + struct pcpu_freelist_node *node = NULL; struct pcpu_freelist_head *head; - struct pcpu_freelist_node *node; int cpu; for_each_cpu_wrap(cpu, cpu_possible_mask, raw_smp_processor_id()) { head = per_cpu_ptr(s->freelist, cpu); if (!READ_ONCE(head->first)) continue; - raw_spin_lock(&head->lock); + if (raw_res_spin_lock(&head->lock)) + continue; node = head->first; if (node) { WRITE_ONCE(head->first, node->next); - raw_spin_unlock(&head->lock); + raw_res_spin_unlock(&head->lock); return node; } - raw_spin_unlock(&head->lock); + raw_res_spin_unlock(&head->lock); } - - /* per cpu lists are all empty, try extralist */ - if (!READ_ONCE(s->extralist.first)) - return NULL; - raw_spin_lock(&s->extralist.lock); - node = s->extralist.first; - if (node) - WRITE_ONCE(s->extralist.first, node->next); - raw_spin_unlock(&s->extralist.lock); - return node; -} - -static struct pcpu_freelist_node * -___pcpu_freelist_pop_nmi(struct pcpu_freelist *s) -{ - struct pcpu_freelist_head *head; - struct pcpu_freelist_node *node; - int cpu; - - for_each_cpu_wrap(cpu, cpu_possible_mask, raw_smp_processor_id()) { - head = per_cpu_ptr(s->freelist, cpu); - if (!READ_ONCE(head->first)) - continue; - if (raw_spin_trylock(&head->lock)) { - node = head->first; - if (node) { - WRITE_ONCE(head->first, node->next); - raw_spin_unlock(&head->lock); - return node; - } - raw_spin_unlock(&head->lock); - } - } - - /* cannot pop from per cpu lists, try extralist */ - if (!READ_ONCE(s->extralist.first) || !raw_spin_trylock(&s->extralist.lock)) - return NULL; - node = s->extralist.first; - if (node) - WRITE_ONCE(s->extralist.first, node->next); - raw_spin_unlock(&s->extralist.lock); return node; } struct pcpu_freelist_node *__pcpu_freelist_pop(struct pcpu_freelist *s) { - if (in_nmi()) - return ___pcpu_freelist_pop_nmi(s); return ___pcpu_freelist_pop(s); } diff --git a/kernel/bpf/percpu_freelist.h b/kernel/bpf/percpu_freelist.h index 3c76553cfe57..914798b74967 100644 --- a/kernel/bpf/percpu_freelist.h +++ b/kernel/bpf/percpu_freelist.h @@ -5,15 +5,15 @@ #define __PERCPU_FREELIST_H__ #include #include +#include struct pcpu_freelist_head { struct pcpu_freelist_node *first; - raw_spinlock_t lock; + rqspinlock_t lock; }; struct pcpu_freelist { struct pcpu_freelist_head __percpu *freelist; - struct pcpu_freelist_head extralist; }; struct pcpu_freelist_node { From patchwork Sun Mar 16 04:05:37 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kumar Kartikeya Dwivedi X-Patchwork-Id: 14018344 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id BF6E2C282DE for ; Sun, 16 Mar 2025 04:43:22 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=bombadil.20210309; h=Sender:List-Subscribe:List-Help :List-Post:List-Archive:List-Unsubscribe:List-Id:Content-Transfer-Encoding: 
ffacd0b85a97d-395c82c2690sm10726584f8f.25.2025.03.15.21.06.09 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sat, 15 Mar 2025 21:06:09 -0700 (PDT) From: Kumar Kartikeya Dwivedi To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org Cc: Linus Torvalds , Peter Zijlstra , Will Deacon , Waiman Long , Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , Martin KaFai Lau , Eduard Zingerman , "Paul E. McKenney" , Tejun Heo , Barret Rhoden , Josh Don , Dohyun Kim , linux-arm-kernel@lists.infradead.org, kkd@meta.com, kernel-team@meta.com Subject: [PATCH bpf-next v4 21/25] bpf: Convert lpm_trie.c to rqspinlock Date: Sat, 15 Mar 2025 21:05:37 -0700 Message-ID: <20250316040541.108729-22-memxor@gmail.com> X-Mailer: git-send-email 2.47.1 In-Reply-To: <20250316040541.108729-1-memxor@gmail.com> References: <20250316040541.108729-1-memxor@gmail.com> MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=3875; h=from:subject; bh=JAC4N7kXKK3pnjvfn/+uPUJ4VSIRwkWi681l7P20D6I=; b=owEBbQKS/ZANAwAIAUzgyIZIvxHKAcsmYgBn1k3ePndVFVpKoLfCEr0LeaQTth201gHenOAJqnlp 4yxo4/+JAjMEAAEIAB0WIQRLvip+Buz51YI8YRFM4MiGSL8RygUCZ9ZN3gAKCRBM4MiGSL8RyqETD/ wPY/kdxYtWMwPuTaaDUV/B3x37peOYAWOdkQcKpbuywBe0gInjoXHOEs+9kkyNps+nUXY3V1aFV1sh HCKWqCMPO7iBeiB4f45fMwvNewTenIvKf+TNJluOSt/v1CzNEil4pBSqsDlCXmZr8B7JXHb796aCm+ 5WfEq7kYFsgD21IEcJQNF4uTNry5R1ceqOAZvdd4y1NmNljt/7G8QS0IJv3zlu/0+BIunPtpqFbpVP 5z55MtMIE1CGFkAHVq48XJVIB9TfWzMIoyD6LKaRKQbrABVSTvT2LTCRcJkvJFKXpEKGbuYcmjo0pK O1yf/FMseMc11MT7/8ZDm/ezGZao7FjQfCb3U94APSqklR8acBNVZlQeLBwTSY1KIAnQu5C4zjoEme yPoSD0GDwU1vuNhQZvVvL5JSlqw6LXR5JgtO2hm1HOvelVuwOon0Mimv8uG/tiH9K7vPJr83ZdKO6X G4BmtnD77NsUAbSGZsNDHKg0c8/kyezVplqtG2Su21wBajRyeNwORZRBpxsxuP6d6oi3H2Fw43nklC +4rCMj1HNzhTA2XLuXrVuqohGykMqtcQZZNUzk4T6xxGMDuQYGV53FMQZWK9zFKZckR/KT1y/gf3zD rVdO2lI+Ki076kbGMvCDeNlQEq7beSPEriiu/ij7C2wMIRHqnyK28WWFR79w== X-Developer-Key: i=memxor@gmail.com; a=openpgp; fpr=4BBE2A7E06ECF9D5823C61114CE0C88648BF11CA X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20250315_210611_472191_46125D39 X-CRM114-Status: GOOD ( 18.02 ) X-BeenThere: linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: "linux-arm-kernel" Errors-To: linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org Convert all LPM trie usage of raw_spinlock to rqspinlock. Note that rcu_dereference_protected in trie_delete_elem is switched over to plain rcu_dereference, the RCU read lock should be held from BPF program side or eBPF syscall path, and the trie->lock is just acquired before the dereference. It is not clear the reason the protected variant was used from the commit history, but the above reasoning makes sense so switch over. 
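The shape of the locking sequence after this conversion is roughly as follows (a minimal sketch, assuming the rqspinlock API added earlier in this series; the function name and the elided body are illustrative only, mirroring the error handling used in trie_update_elem() and trie_delete_elem() below):

#include <asm-generic/rqspinlock.h>

static int lpm_style_update(rqspinlock_t *lock)
{
        unsigned long flags;
        int ret;

        /* Acquisition may now fail (deadlock or timeout), so the return
         * value must be checked; on failure the lock is not held and
         * nothing must be unlocked.
         */
        ret = raw_res_spin_lock_irqsave(lock, flags);
        if (ret)
                return ret;

        /* ... walk and update the trie under the lock ... */

        raw_res_spin_unlock_irqrestore(lock, flags);
        return 0;
}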
Closes: https://lore.kernel.org/lkml/000000000000adb08b061413919e@google.com Signed-off-by: Kumar Kartikeya Dwivedi --- kernel/bpf/lpm_trie.c | 25 ++++++++++++++----------- 1 file changed, 14 insertions(+), 11 deletions(-) diff --git a/kernel/bpf/lpm_trie.c b/kernel/bpf/lpm_trie.c index e8a772e64324..be66d7e520e0 100644 --- a/kernel/bpf/lpm_trie.c +++ b/kernel/bpf/lpm_trie.c @@ -15,6 +15,7 @@ #include #include #include +#include #include /* Intermediate node */ @@ -36,7 +37,7 @@ struct lpm_trie { size_t n_entries; size_t max_prefixlen; size_t data_size; - raw_spinlock_t lock; + rqspinlock_t lock; }; /* This trie implements a longest prefix match algorithm that can be used to @@ -342,7 +343,9 @@ static long trie_update_elem(struct bpf_map *map, if (!new_node) return -ENOMEM; - raw_spin_lock_irqsave(&trie->lock, irq_flags); + ret = raw_res_spin_lock_irqsave(&trie->lock, irq_flags); + if (ret) + goto out_free; new_node->prefixlen = key->prefixlen; RCU_INIT_POINTER(new_node->child[0], NULL); @@ -356,8 +359,7 @@ static long trie_update_elem(struct bpf_map *map, */ slot = &trie->root; - while ((node = rcu_dereference_protected(*slot, - lockdep_is_held(&trie->lock)))) { + while ((node = rcu_dereference(*slot))) { matchlen = longest_prefix_match(trie, node, key); if (node->prefixlen != matchlen || @@ -442,8 +444,8 @@ static long trie_update_elem(struct bpf_map *map, rcu_assign_pointer(*slot, im_node); out: - raw_spin_unlock_irqrestore(&trie->lock, irq_flags); - + raw_res_spin_unlock_irqrestore(&trie->lock, irq_flags); +out_free: if (ret) bpf_mem_cache_free(&trie->ma, new_node); bpf_mem_cache_free_rcu(&trie->ma, free_node); @@ -467,7 +469,9 @@ static long trie_delete_elem(struct bpf_map *map, void *_key) if (key->prefixlen > trie->max_prefixlen) return -EINVAL; - raw_spin_lock_irqsave(&trie->lock, irq_flags); + ret = raw_res_spin_lock_irqsave(&trie->lock, irq_flags); + if (ret) + return ret; /* Walk the tree looking for an exact key/length match and keeping * track of the path we traverse. 
We will need to know the node @@ -478,8 +482,7 @@ static long trie_delete_elem(struct bpf_map *map, void *_key) trim = &trie->root; trim2 = trim; parent = NULL; - while ((node = rcu_dereference_protected( - *trim, lockdep_is_held(&trie->lock)))) { + while ((node = rcu_dereference(*trim))) { matchlen = longest_prefix_match(trie, node, key); if (node->prefixlen != matchlen || @@ -543,7 +546,7 @@ static long trie_delete_elem(struct bpf_map *map, void *_key) free_node = node; out: - raw_spin_unlock_irqrestore(&trie->lock, irq_flags); + raw_res_spin_unlock_irqrestore(&trie->lock, irq_flags); bpf_mem_cache_free_rcu(&trie->ma, free_parent); bpf_mem_cache_free_rcu(&trie->ma, free_node); @@ -592,7 +595,7 @@ static struct bpf_map *trie_alloc(union bpf_attr *attr) offsetof(struct bpf_lpm_trie_key_u8, data); trie->max_prefixlen = trie->data_size * 8; - raw_spin_lock_init(&trie->lock); + raw_res_spin_lock_init(&trie->lock); /* Allocate intermediate and leaf nodes from the same allocator */ leaf_size = sizeof(struct lpm_trie_node) + trie->data_size + From patchwork Sun Mar 16 04:05:38 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kumar Kartikeya Dwivedi X-Patchwork-Id: 14018345 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 06620C282DE for ; Sun, 16 Mar 2025 04:45:05 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=bombadil.20210309; h=Sender:List-Subscribe:List-Help :List-Post:List-Archive:List-Unsubscribe:List-Id:Content-Transfer-Encoding: MIME-Version:References:In-Reply-To:Message-ID:Date:Subject:Cc:To:From: Reply-To:Content-Type:Content-ID:Content-Description:Resent-Date:Resent-From: Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID:List-Owner; bh=1VnxgBQ3rAJeGHFIGyB+ChpGIi/1tYmBGxWhiMsjK7I=; b=LODjpst1dolycWQYSOeexltXbY BWuTuumSye3S9OR/5HhxLI5PzgqcACKgV1MpYsP48xLp+RVQLxYwn2QgIm7eDWrMYTZG7DC3CXDzu 3gaYTKQBu93903T8Nt/Z3HgQMW9aIZDZvRI8djiHd4jhbuGcXHGpIZPl8EejwzEfQd/ClEv4SrczY GqmqLKtchqXWPazKC5TqdQ8UmchcGiRqLASLyucqdJmx/U9GX4IKzNi+AuGhHWZol9egogd6hiLKk Bxj3kNkQoyXk5EBcHBHR977asMQtAkHDp8Mcxla2uv0WflI5VAdNIb4w0tZOWTHMPtSJvbZ5avcDZ /yTPMLnw==; Received: from localhost ([::1] helo=bombadil.infradead.org) by bombadil.infradead.org with esmtp (Exim 4.98 #2 (Red Hat Linux)) id 1ttfrr-0000000HH9L-3mAI; Sun, 16 Mar 2025 04:44:55 +0000 Received: from mail-wm1-x342.google.com ([2a00:1450:4864:20::342]) by bombadil.infradead.org with esmtps (Exim 4.98 #2 (Red Hat Linux)) id 1ttfGO-0000000HCQc-2wwp for linux-arm-kernel@lists.infradead.org; Sun, 16 Mar 2025 04:06:13 +0000 Received: by mail-wm1-x342.google.com with SMTP id 5b1f17b1804b1-43d0359b1fcso5885745e9.0 for ; Sat, 15 Mar 2025 21:06:12 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1742097971; x=1742702771; darn=lists.infradead.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=1VnxgBQ3rAJeGHFIGyB+ChpGIi/1tYmBGxWhiMsjK7I=; b=DGasknkeUgABWJ0ivInfb5pHcwGHrSS160Qkaont6y+xonbuzxGJHYOdas6fUCnFHy Upy+9wsSTnsZHdoRc62YiaWKkba9lNN2LLzDzg5dkUFCwlFBGWoVdx3M3Cd5vfVW4WgA 
yqOg4tyFzO1MfcdpL/A7ClqgPfEstpo6EGFyObYQu5XfH4BJuKSrSK9hNbn3Ie95JqvI RHxYMPtNUlOeLQRTSSVpa2IojbiTZNLdlJpSOzzK0ip+U5twml7xw1oPJTptpSRcnkxA kZY7GUTbmOS8EQR/t+tQzXJOIiuaDaa2k+QrfEXWpuhhGZS0Qaz8rZ70WdLf9iLTTW7Y Pg9w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1742097971; x=1742702771; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=1VnxgBQ3rAJeGHFIGyB+ChpGIi/1tYmBGxWhiMsjK7I=; b=vtkoN/D4aFJikbR8vTsD0KaCg/WqmiVnU4Ijy6wXE7iadKSS23js93rAMoBe+N3W5x dcQaWgTypklsaVvJP20dKQRWuvIEFyElg//JjKndpvaETS0wXgfzcjLPp7O5sFiPBkn3 GoPokAfiqXCxIOx0gwVB4lBnZEHLAMhGcFxdbWC+Phd6JIaxBXT1+83ET29KbUqmwrJ4 Nf40+3pP4Ywuyi2sRsrNg6BOBPm1AfNCftYb9ywDSpwEFTTekP94rBNzCrdfxMIXGwtA Z96tgOuWPskf/dNk4cM21DIMAqfV2WbnrZm1rmzsD+lpqm2mAK+TrzzrrbEMAwhIQ/t4 mlpQ== X-Forwarded-Encrypted: i=1; AJvYcCWzj8whWhQQwFWqlYND6VP+meeQ/q2whb/Ex0HdDjCDJfZBYbk2D+XwJI6jou89UHhNb5dVkHUWLxlBAUBTPdJE@lists.infradead.org X-Gm-Message-State: AOJu0YxGMagjT/bODJC5p0gSGD5zXnqhMadHPlswX/mm3N6luD0CvYcv A8unWYShiQL+fBZWcMgdjef8ZqS7aKHpNpahIImFAYd5azIK0CPw X-Gm-Gg: ASbGnctAR1uYcoPr/LUST5WMjPG4GiAnI5ijG4bLvCIwM6iF5WAb7c7dKcJoTryjgTB Xgoo1C+PJo9KwvLDR1zYkmy/kCbB1VU+YEbhtFEwPfrPnLi88TGVQHUmtpOMCsViKYiKUc1WWmh 6Ua+uP2laDTGAdAQXedUr76n56N1njbIJh6lYF2VbCVlLeM2NFCRZHyL0S05M0C8lL3Jsx3lJ1U 6VQUOZHyKvptc4NQW5OmWZFTfZlV6Wqd/f1o37yhZuheLHb7bP07Q8ELzo21bSzFUO7ZIXZ003K 8Qgraypaor3T72WE+209mH9DdzvMOW4K4oGPYGxxHXuoKA== X-Google-Smtp-Source: AGHT+IHykLJ5NvwHyRbYtDv/1bZEgjZY12X6M0CYxwI6OW/IzydFKfy0Lmd5AEeKjh0K2mfT3jL26Q== X-Received: by 2002:a7b:c2a9:0:b0:43b:bb72:1dce with SMTP id 5b1f17b1804b1-43d1806bfc6mr111911795e9.5.1742097971213; Sat, 15 Mar 2025 21:06:11 -0700 (PDT) Received: from localhost ([2a03:2880:31ff:72::]) by smtp.gmail.com with ESMTPSA id ffacd0b85a97d-395cb7eb92csm11124023f8f.91.2025.03.15.21.06.10 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sat, 15 Mar 2025 21:06:10 -0700 (PDT) From: Kumar Kartikeya Dwivedi To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org Cc: Linus Torvalds , Peter Zijlstra , Will Deacon , Waiman Long , Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , Martin KaFai Lau , Eduard Zingerman , "Paul E. 
McKenney" , Tejun Heo , Barret Rhoden , Josh Don , Dohyun Kim , linux-arm-kernel@lists.infradead.org, kkd@meta.com, kernel-team@meta.com Subject: [PATCH bpf-next v4 22/25] bpf: Introduce rqspinlock kfuncs Date: Sat, 15 Mar 2025 21:05:38 -0700 Message-ID: <20250316040541.108729-23-memxor@gmail.com> X-Mailer: git-send-email 2.47.1 In-Reply-To: <20250316040541.108729-1-memxor@gmail.com> References: <20250316040541.108729-1-memxor@gmail.com> MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=5059; h=from:subject; bh=5BKl9hYgLyro6F1bzOCMfYjd/K5FyedLnNqGmCHan+A=; b=owEBbQKS/ZANAwAIAUzgyIZIvxHKAcsmYgBn1k3e+fudrbbc/TgS+/ixztDuZ5UbJUeXYBElGNak SEdzC0OJAjMEAAEIAB0WIQRLvip+Buz51YI8YRFM4MiGSL8RygUCZ9ZN3gAKCRBM4MiGSL8Ryqz4D/ oCflyjgtF5/HlKZj4F2D61xd+E+vDVH2+zf1tMkPdHH8lyU299BGDRQqY2WtOTwcBLUY9OqorBhQ8H rjwEGY2fs77qokXPHkKwGo9ZCV9KoFXcx2t1akp8zymB9tkkxVrbMvKrD4LJLkmJp4+0Gk0P4dmTT6 uL5NE4sFMWsPdIdSevyjlGvL/98JDGWrIzfqzVSbHA9M6AJbeyaL5upuniadERgB4b7/UNXbBeSzt4 ZLFS9zpB8NYwZecwzgOydM6/jWOQPjeh0pUyf8lwSq17ZzhYsHp2945rJhtVwSxt82T3WDRPib5MTU 7t4sgoBzUKN9xFg23oq0eZbuWRdF0BXps6RORdd5fgrSHtEYt5KHaSYff51NxW87b+eNlFzdKM57pA Pez01gQmSKdydkbA0jpu+VYH962Jkoigu2ho3Khwrr2JuWqOcAz+5Y/SnXDhbU8W/UpTH3SK2Wy4kF QMpW9/D0CW2bAdIgChF/jU9dRdCKMMLFH+OpADrQQfn4zvU7cWdzA/1BPNh0bLBYEX7K2va7cXSA+g uHGz5ivgqNl9TakLfq+7exi7BFGrnYDGJZ9qjvtAdu3EDnpEX3iX5MFWcOfOHBmWnOTiGMi5x2ks/7 zQL3CJIhr4Uc+LeK+wpPS9WJnwsrk8CuEMSTyIC9XeeQUQWEeb4pfptwOM1A== X-Developer-Key: i=memxor@gmail.com; a=openpgp; fpr=4BBE2A7E06ECF9D5823C61114CE0C88648BF11CA X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20250315_210612_842050_BE294CF5 X-CRM114-Status: GOOD ( 19.41 ) X-BeenThere: linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: "linux-arm-kernel" Errors-To: linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org Introduce four new kfuncs, bpf_res_spin_lock, and bpf_res_spin_unlock, and their irqsave/irqrestore variants, which wrap the rqspinlock APIs. bpf_res_spin_lock returns a conditional result, depending on whether the lock was acquired (NULL is returned when lock acquisition succeeds, non-NULL upon failure). The memory pointed to by the returned pointer upon failure can be dereferenced after the NULL check to obtain the error code. Instead of using the old bpf_spin_lock type, introduce a new type with the same layout, and the same alignment, but a different name to avoid type confusion. Preemption is disabled upon successful lock acquisition, however IRQs are not. Special kfuncs can be introduced later to allow disabling IRQs when taking a spin lock. Resilient locks are safe against AA deadlocks, hence not disabling IRQs currently does not allow violation of kernel safety. __irq_flag annotation is used to accept IRQ flags for the IRQ-variants, with the same semantics as existing bpf_local_irq_{save, restore}. These kfuncs will require additional verifier-side support in subsequent commits, to allow programs to hold multiple locks at the same time. 
Signed-off-by: Kumar Kartikeya Dwivedi --- include/asm-generic/rqspinlock.h | 7 +++ include/linux/bpf.h | 1 + kernel/bpf/rqspinlock.c | 78 ++++++++++++++++++++++++++++++++ 3 files changed, 86 insertions(+) diff --git a/include/asm-generic/rqspinlock.h b/include/asm-generic/rqspinlock.h index 23abd0b8d0f9..6d4244d643df 100644 --- a/include/asm-generic/rqspinlock.h +++ b/include/asm-generic/rqspinlock.h @@ -23,6 +23,13 @@ struct rqspinlock { }; }; +/* Even though this is same as struct rqspinlock, we need to emit a distinct + * type in BTF for BPF programs. + */ +struct bpf_res_spin_lock { + u32 val; +}; + struct qspinlock; #ifdef CONFIG_QUEUED_SPINLOCKS typedef struct qspinlock rqspinlock_t; diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 0d7b70124d81..a6bc687d6300 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -30,6 +30,7 @@ #include #include #include +#include struct bpf_verifier_env; struct bpf_verifier_log; diff --git a/kernel/bpf/rqspinlock.c b/kernel/bpf/rqspinlock.c index ad0fc35c647e..cf417a736559 100644 --- a/kernel/bpf/rqspinlock.c +++ b/kernel/bpf/rqspinlock.c @@ -15,6 +15,8 @@ #include #include +#include +#include #include #include #include @@ -690,3 +692,79 @@ int __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val) EXPORT_SYMBOL_GPL(resilient_queued_spin_lock_slowpath); #endif /* CONFIG_QUEUED_SPINLOCKS */ + +__bpf_kfunc_start_defs(); + +#define REPORT_STR(ret) ({ ret == -ETIMEDOUT ? "Timeout detected" : "AA or ABBA deadlock detected"; }) + +__bpf_kfunc int bpf_res_spin_lock(struct bpf_res_spin_lock *lock) +{ + int ret; + + BUILD_BUG_ON(sizeof(rqspinlock_t) != sizeof(struct bpf_res_spin_lock)); + BUILD_BUG_ON(__alignof__(rqspinlock_t) != __alignof__(struct bpf_res_spin_lock)); + + preempt_disable(); + ret = res_spin_lock((rqspinlock_t *)lock); + if (unlikely(ret)) { + preempt_enable(); + rqspinlock_report_violation(REPORT_STR(ret), lock); + return ret; + } + return 0; +} + +__bpf_kfunc void bpf_res_spin_unlock(struct bpf_res_spin_lock *lock) +{ + res_spin_unlock((rqspinlock_t *)lock); + preempt_enable(); +} + +__bpf_kfunc int bpf_res_spin_lock_irqsave(struct bpf_res_spin_lock *lock, unsigned long *flags__irq_flag) +{ + u64 *ptr = (u64 *)flags__irq_flag; + unsigned long flags; + int ret; + + preempt_disable(); + local_irq_save(flags); + ret = res_spin_lock((rqspinlock_t *)lock); + if (unlikely(ret)) { + local_irq_restore(flags); + preempt_enable(); + rqspinlock_report_violation(REPORT_STR(ret), lock); + return ret; + } + *ptr = flags; + return 0; +} + +__bpf_kfunc void bpf_res_spin_unlock_irqrestore(struct bpf_res_spin_lock *lock, unsigned long *flags__irq_flag) +{ + u64 *ptr = (u64 *)flags__irq_flag; + unsigned long flags = *ptr; + + res_spin_unlock((rqspinlock_t *)lock); + local_irq_restore(flags); + preempt_enable(); +} + +__bpf_kfunc_end_defs(); + +BTF_KFUNCS_START(rqspinlock_kfunc_ids) +BTF_ID_FLAGS(func, bpf_res_spin_lock, KF_RET_NULL) +BTF_ID_FLAGS(func, bpf_res_spin_unlock) +BTF_ID_FLAGS(func, bpf_res_spin_lock_irqsave, KF_RET_NULL) +BTF_ID_FLAGS(func, bpf_res_spin_unlock_irqrestore) +BTF_KFUNCS_END(rqspinlock_kfunc_ids) + +static const struct btf_kfunc_id_set rqspinlock_kfunc_set = { + .owner = THIS_MODULE, + .set = &rqspinlock_kfunc_ids, +}; + +static __init int rqspinlock_register_kfuncs(void) +{ + return register_btf_kfunc_id_set(BPF_PROG_TYPE_UNSPEC, &rqspinlock_kfunc_set); +} +late_initcall(rqspinlock_register_kfuncs); From patchwork Sun Mar 16 04:05:39 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 
X-Patchwork-Submitter: Kumar Kartikeya Dwivedi X-Patchwork-Id: 14018347
ASbGncuXaWqd1jRfBs+rsz1nI6liit5EicpMRS7GvXS4NIZe1GBiRfHprSIO+0507K3 vIxTEVa1O6Qolrc1NdBFLgGT/lmJQYzLUiL0qUICWVZt7N3G7360EAwH5rBAUKj1Be94YHctJs8 H4p+bkI1WGHTESbpcT6Lo1Ir21TTOOEY874JkdfBdwemgtITRIpTN70gmKypbUjJfThPgMJMVXx 8JpPFhsarIT9x6ecRIzZQqoMFq2pHWfIr246gb9M07Wq2iIyNqrvgUNSAsXhkGAnqNoDVLvDRf0 iMM77YQHcExZpIWqa7HQ5SFpitN3aFai4A== X-Google-Smtp-Source: AGHT+IHDSQIlwm4Q3qq0hAantsqR4ybygRtjWMix0xCMm33UfTOrzlQ66jTk8DoOk1JpkKBXfsyQsw== X-Received: by 2002:a05:6000:1567:b0:38f:2b77:a9f3 with SMTP id ffacd0b85a97d-3971f12c847mr8060318f8f.43.1742097972749; Sat, 15 Mar 2025 21:06:12 -0700 (PDT) Received: from localhost ([2a03:2880:31ff:a::]) by smtp.gmail.com with ESMTPSA id ffacd0b85a97d-3974d771160sm7104025f8f.19.2025.03.15.21.06.11 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sat, 15 Mar 2025 21:06:11 -0700 (PDT) From: Kumar Kartikeya Dwivedi To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org Cc: Eduard Zingerman , Linus Torvalds , Peter Zijlstra , Will Deacon , Waiman Long , Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , Martin KaFai Lau , "Paul E. McKenney" , Tejun Heo , Barret Rhoden , Josh Don , Dohyun Kim , linux-arm-kernel@lists.infradead.org, kkd@meta.com, kernel-team@meta.com Subject: [PATCH bpf-next v4 23/25] bpf: Implement verifier support for rqspinlock Date: Sat, 15 Mar 2025 21:05:39 -0700 Message-ID: <20250316040541.108729-24-memxor@gmail.com> X-Mailer: git-send-email 2.47.1 In-Reply-To: <20250316040541.108729-1-memxor@gmail.com> References: <20250316040541.108729-1-memxor@gmail.com> MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=28432; h=from:subject; bh=TKYj1e85LfDF4On+2xNv88oL+YiC+nakNwpx1a84p3o=; b=owEBbQKS/ZANAwAIAUzgyIZIvxHKAcsmYgBn1k3e0js0FKGxFyeBABFuS0o37z2OuJlkez0uq7nl JIfbb/iJAjMEAAEIAB0WIQRLvip+Buz51YI8YRFM4MiGSL8RygUCZ9ZN3gAKCRBM4MiGSL8RylgaD/ 4gq3/KWE9tWIFsZftxkV2ziOGJF2+y4efQgWE6NGbCPAZl+2BQ85qW1ZHSSSWFdPXrzrIFfVl6ZwHB LlEYaOCZbFGhOgKi/b+iieH6Aoc4KNpyDf1mXEAxn7pihQ0+2ZF8eRz6O66TRN8haCppKvOZnEzrki osg6NktnS9tCmv5IiQMPqqrqD4cPaGZR7Y6UNn/i9v74BmR6vXqlxr84Lkc3uBGJf+LOdbYR3qacdw cGppfL4fio0YHfLel60vHLOkjf2yeNItsPjX0QI4eqaUUXfqLfRW8LkdCreXy8RBTIbukWvWQpxEvX F+SiXwBIh9saPuHgaV318ANQn0N8d5xtVSEDbKRe5kW/uIneCbMaueU/BrlmqHxuhNoxpWY1WqvygW Tog1lnvdlH2SsqH8TjEOqhFyg45aYOx/5Nt8tDxcpPNJMdViuQ/oeoYibj+FpMJELDoqVPmRE9wEGP 5Y6Q67KPCin96clyIvx1iafakxonUkclXA3rOQRXZjTFGi7W8PLNd04vLFj3aacP7xnjUN9zCqyJrQ pWJzQPzaBQNyN9hlH8AljtP30WR+zwopNhio3xHnde4DwSa+Qpsfith29As5MZDTZYRKem0Dq35YAx QVfi/9UIyJdC0fxZBK4aByFH15DKwg8pTPKIgcq9dM8uYQHQ5RTCntYjIDNQ== X-Developer-Key: i=memxor@gmail.com; a=openpgp; fpr=4BBE2A7E06ECF9D5823C61114CE0C88648BF11CA X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20250315_210614_514269_DCB58771 X-CRM114-Status: GOOD ( 29.64 ) X-BeenThere: linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: "linux-arm-kernel" Errors-To: linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org Introduce verifier-side support for rqspinlock kfuncs. The first step is allowing bpf_res_spin_lock type to be defined in map values and allocated objects, so BTF-side is updated with a new BPF_RES_SPIN_LOCK field to recognize and validate. Any object cannot have both bpf_spin_lock and bpf_res_spin_lock, only one of them (and at most one of them per-object, like before) must be present. 
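Concretely (an illustrative sketch, not taken from this patch), a map value or allocated object may embed the new lock type, while declaring both lock types in the same object is rejected when the BTF fields are parsed:

/* Accepted: a single bpf_res_spin_lock in the value. */
struct elem {
        struct bpf_res_spin_lock lock;
        __u64 data;
};

/* Rejected with -EINVAL: bpf_spin_lock and bpf_res_spin_lock together. */
struct bad_elem {
        struct bpf_spin_lock lock;
        struct bpf_res_spin_lock res_lock;
        __u64 data;
};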
The bpf_res_spin_lock can also be used to protect objects that require lock protection for their kfuncs, like BPF rbtree and linked list. The verifier plumbing to simulate success and failure cases when calling the kfuncs is done by pushing a new verifier state to the verifier state stack which will verify the failure case upon calling the kfunc. The path where success is indicated creates all lock reference state and IRQ state (if necessary for irqsave variants). In the case of failure, the state clears the registers r0-r5, sets the return value, and skips kfunc processing, proceeding to the next instruction. When marking the return value for success case, the value is marked as 0, and for the failure case as [-MAX_ERRNO, -1]. Then, in the program, whenever user checks the return value as 'if (ret)' or 'if (ret < 0)' the verifier never traverses such branches for success cases, and would be aware that the lock is not held in such cases. We push the kfunc state in check_kfunc_call whenever rqspinlock kfuncs are invoked. We introduce a kfunc_class state to avoid mixing lock irqrestore kfuncs with IRQ state created by bpf_local_irq_save. With all this infrastructure, these kfuncs become usable in programs while satisfying all safety properties required by the kernel. Acked-by: Eduard Zingerman Signed-off-by: Kumar Kartikeya Dwivedi --- include/linux/bpf.h | 9 ++ include/linux/bpf_verifier.h | 16 ++- kernel/bpf/btf.c | 26 ++++- kernel/bpf/syscall.c | 6 +- kernel/bpf/verifier.c | 219 ++++++++++++++++++++++++++++------- 5 files changed, 231 insertions(+), 45 deletions(-) diff --git a/include/linux/bpf.h b/include/linux/bpf.h index a6bc687d6300..c59384f62da0 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -205,6 +205,7 @@ enum btf_field_type { BPF_REFCOUNT = (1 << 9), BPF_WORKQUEUE = (1 << 10), BPF_UPTR = (1 << 11), + BPF_RES_SPIN_LOCK = (1 << 12), }; typedef void (*btf_dtor_kfunc_t)(void *); @@ -240,6 +241,7 @@ struct btf_record { u32 cnt; u32 field_mask; int spin_lock_off; + int res_spin_lock_off; int timer_off; int wq_off; int refcount_off; @@ -315,6 +317,8 @@ static inline const char *btf_field_type_name(enum btf_field_type type) switch (type) { case BPF_SPIN_LOCK: return "bpf_spin_lock"; + case BPF_RES_SPIN_LOCK: + return "bpf_res_spin_lock"; case BPF_TIMER: return "bpf_timer"; case BPF_WORKQUEUE: @@ -347,6 +351,8 @@ static inline u32 btf_field_type_size(enum btf_field_type type) switch (type) { case BPF_SPIN_LOCK: return sizeof(struct bpf_spin_lock); + case BPF_RES_SPIN_LOCK: + return sizeof(struct bpf_res_spin_lock); case BPF_TIMER: return sizeof(struct bpf_timer); case BPF_WORKQUEUE: @@ -377,6 +383,8 @@ static inline u32 btf_field_type_align(enum btf_field_type type) switch (type) { case BPF_SPIN_LOCK: return __alignof__(struct bpf_spin_lock); + case BPF_RES_SPIN_LOCK: + return __alignof__(struct bpf_res_spin_lock); case BPF_TIMER: return __alignof__(struct bpf_timer); case BPF_WORKQUEUE: @@ -420,6 +428,7 @@ static inline void bpf_obj_init_field(const struct btf_field *field, void *addr) case BPF_RB_ROOT: /* RB_ROOT_CACHED 0-inits, no need to do anything after memset */ case BPF_SPIN_LOCK: + case BPF_RES_SPIN_LOCK: case BPF_TIMER: case BPF_WORKQUEUE: case BPF_KPTR_UNREF: diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h index d6cfc4ee6820..bc073a48aed9 100644 --- a/include/linux/bpf_verifier.h +++ b/include/linux/bpf_verifier.h @@ -115,6 +115,14 @@ struct bpf_reg_state { int depth:30; } iter; + /* For irq stack slots */ + struct { + enum { + 
IRQ_NATIVE_KFUNC, + IRQ_LOCK_KFUNC, + } kfunc_class; + } irq; + /* Max size from any of the above. */ struct { unsigned long raw1; @@ -255,9 +263,11 @@ struct bpf_reference_state { * default to pointer reference on zero initialization of a state. */ enum ref_state_type { - REF_TYPE_PTR = 1, - REF_TYPE_IRQ = 2, - REF_TYPE_LOCK = 3, + REF_TYPE_PTR = (1 << 1), + REF_TYPE_IRQ = (1 << 2), + REF_TYPE_LOCK = (1 << 3), + REF_TYPE_RES_LOCK = (1 << 4), + REF_TYPE_RES_LOCK_IRQ = (1 << 5), } type; /* Track each reference created with a unique id, even if the same * instruction creates the reference multiple times (eg, via CALL). diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c index 519e3f5e9c10..f7a2bfb0c11a 100644 --- a/kernel/bpf/btf.c +++ b/kernel/bpf/btf.c @@ -3481,6 +3481,15 @@ static int btf_get_field_type(const struct btf *btf, const struct btf_type *var_ goto end; } } + if (field_mask & BPF_RES_SPIN_LOCK) { + if (!strcmp(name, "bpf_res_spin_lock")) { + if (*seen_mask & BPF_RES_SPIN_LOCK) + return -E2BIG; + *seen_mask |= BPF_RES_SPIN_LOCK; + type = BPF_RES_SPIN_LOCK; + goto end; + } + } if (field_mask & BPF_TIMER) { if (!strcmp(name, "bpf_timer")) { if (*seen_mask & BPF_TIMER) @@ -3659,6 +3668,7 @@ static int btf_find_field_one(const struct btf *btf, switch (field_type) { case BPF_SPIN_LOCK: + case BPF_RES_SPIN_LOCK: case BPF_TIMER: case BPF_WORKQUEUE: case BPF_LIST_NODE: @@ -3952,6 +3962,7 @@ struct btf_record *btf_parse_fields(const struct btf *btf, const struct btf_type return ERR_PTR(-ENOMEM); rec->spin_lock_off = -EINVAL; + rec->res_spin_lock_off = -EINVAL; rec->timer_off = -EINVAL; rec->wq_off = -EINVAL; rec->refcount_off = -EINVAL; @@ -3979,6 +3990,11 @@ struct btf_record *btf_parse_fields(const struct btf *btf, const struct btf_type /* Cache offset for faster lookup at runtime */ rec->spin_lock_off = rec->fields[i].offset; break; + case BPF_RES_SPIN_LOCK: + WARN_ON_ONCE(rec->spin_lock_off >= 0); + /* Cache offset for faster lookup at runtime */ + rec->res_spin_lock_off = rec->fields[i].offset; + break; case BPF_TIMER: WARN_ON_ONCE(rec->timer_off >= 0); /* Cache offset for faster lookup at runtime */ @@ -4022,9 +4038,15 @@ struct btf_record *btf_parse_fields(const struct btf *btf, const struct btf_type rec->cnt++; } + if (rec->spin_lock_off >= 0 && rec->res_spin_lock_off >= 0) { + ret = -EINVAL; + goto end; + } + /* bpf_{list_head, rb_node} require bpf_spin_lock */ if ((btf_record_has_field(rec, BPF_LIST_HEAD) || - btf_record_has_field(rec, BPF_RB_ROOT)) && rec->spin_lock_off < 0) { + btf_record_has_field(rec, BPF_RB_ROOT)) && + (rec->spin_lock_off < 0 && rec->res_spin_lock_off < 0)) { ret = -EINVAL; goto end; } @@ -5637,7 +5659,7 @@ btf_parse_struct_metas(struct bpf_verifier_log *log, struct btf *btf) type = &tab->types[tab->cnt]; type->btf_id = i; - record = btf_parse_fields(btf, t, BPF_SPIN_LOCK | BPF_LIST_HEAD | BPF_LIST_NODE | + record = btf_parse_fields(btf, t, BPF_SPIN_LOCK | BPF_RES_SPIN_LOCK | BPF_LIST_HEAD | BPF_LIST_NODE | BPF_RB_ROOT | BPF_RB_NODE | BPF_REFCOUNT | BPF_KPTR, t->size); /* The record cannot be unset, treat it as an error if so */ diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c index 6a8f20ee2851..dba2628fe9a5 100644 --- a/kernel/bpf/syscall.c +++ b/kernel/bpf/syscall.c @@ -648,6 +648,7 @@ void btf_record_free(struct btf_record *rec) case BPF_RB_ROOT: case BPF_RB_NODE: case BPF_SPIN_LOCK: + case BPF_RES_SPIN_LOCK: case BPF_TIMER: case BPF_REFCOUNT: case BPF_WORKQUEUE: @@ -700,6 +701,7 @@ struct btf_record *btf_record_dup(const struct btf_record *rec) 
case BPF_RB_ROOT: case BPF_RB_NODE: case BPF_SPIN_LOCK: + case BPF_RES_SPIN_LOCK: case BPF_TIMER: case BPF_REFCOUNT: case BPF_WORKQUEUE: @@ -777,6 +779,7 @@ void bpf_obj_free_fields(const struct btf_record *rec, void *obj) switch (fields[i].type) { case BPF_SPIN_LOCK: + case BPF_RES_SPIN_LOCK: break; case BPF_TIMER: bpf_timer_cancel_and_free(field_ptr); @@ -1212,7 +1215,7 @@ static int map_check_btf(struct bpf_map *map, struct bpf_token *token, return -EINVAL; map->record = btf_parse_fields(btf, value_type, - BPF_SPIN_LOCK | BPF_TIMER | BPF_KPTR | BPF_LIST_HEAD | + BPF_SPIN_LOCK | BPF_RES_SPIN_LOCK | BPF_TIMER | BPF_KPTR | BPF_LIST_HEAD | BPF_RB_ROOT | BPF_REFCOUNT | BPF_WORKQUEUE | BPF_UPTR, map->value_size); if (!IS_ERR_OR_NULL(map->record)) { @@ -1231,6 +1234,7 @@ static int map_check_btf(struct bpf_map *map, struct bpf_token *token, case 0: continue; case BPF_SPIN_LOCK: + case BPF_RES_SPIN_LOCK: if (map->map_type != BPF_MAP_TYPE_HASH && map->map_type != BPF_MAP_TYPE_ARRAY && map->map_type != BPF_MAP_TYPE_CGROUP_STORAGE && diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index 3303a3605ee8..29121ad32a89 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -456,7 +456,7 @@ static bool subprog_is_exc_cb(struct bpf_verifier_env *env, int subprog) static bool reg_may_point_to_spin_lock(const struct bpf_reg_state *reg) { - return btf_record_has_field(reg_btf_record(reg), BPF_SPIN_LOCK); + return btf_record_has_field(reg_btf_record(reg), BPF_SPIN_LOCK | BPF_RES_SPIN_LOCK); } static bool type_is_rdonly_mem(u32 type) @@ -1155,7 +1155,8 @@ static int release_irq_state(struct bpf_verifier_state *state, int id); static int mark_stack_slot_irq_flag(struct bpf_verifier_env *env, struct bpf_kfunc_call_arg_meta *meta, - struct bpf_reg_state *reg, int insn_idx) + struct bpf_reg_state *reg, int insn_idx, + int kfunc_class) { struct bpf_func_state *state = func(env, reg); struct bpf_stack_state *slot; @@ -1177,6 +1178,7 @@ static int mark_stack_slot_irq_flag(struct bpf_verifier_env *env, st->type = PTR_TO_STACK; /* we don't have dedicated reg type */ st->live |= REG_LIVE_WRITTEN; st->ref_obj_id = id; + st->irq.kfunc_class = kfunc_class; for (i = 0; i < BPF_REG_SIZE; i++) slot->slot_type[i] = STACK_IRQ_FLAG; @@ -1185,7 +1187,8 @@ static int mark_stack_slot_irq_flag(struct bpf_verifier_env *env, return 0; } -static int unmark_stack_slot_irq_flag(struct bpf_verifier_env *env, struct bpf_reg_state *reg) +static int unmark_stack_slot_irq_flag(struct bpf_verifier_env *env, struct bpf_reg_state *reg, + int kfunc_class) { struct bpf_func_state *state = func(env, reg); struct bpf_stack_state *slot; @@ -1199,6 +1202,15 @@ static int unmark_stack_slot_irq_flag(struct bpf_verifier_env *env, struct bpf_r slot = &state->stack[spi]; st = &slot->spilled_ptr; + if (st->irq.kfunc_class != kfunc_class) { + const char *flag_kfunc = st->irq.kfunc_class == IRQ_NATIVE_KFUNC ? "native" : "lock"; + const char *used_kfunc = kfunc_class == IRQ_NATIVE_KFUNC ? 
"native" : "lock"; + + verbose(env, "irq flag acquired by %s kfuncs cannot be restored with %s kfuncs\n", + flag_kfunc, used_kfunc); + return -EINVAL; + } + err = release_irq_state(env->cur_state, st->ref_obj_id); WARN_ON_ONCE(err && err != -EACCES); if (err) { @@ -1609,7 +1621,7 @@ static struct bpf_reference_state *find_lock_state(struct bpf_verifier_state *st for (i = 0; i < state->acquired_refs; i++) { struct bpf_reference_state *s = &state->refs[i]; - if (s->type != type) + if (!(s->type & type)) continue; if (s->id == id && s->ptr == ptr) @@ -8204,6 +8216,12 @@ static int check_kfunc_mem_size_reg(struct bpf_verifier_env *env, struct bpf_reg return err; } +enum { + PROCESS_SPIN_LOCK = (1 << 0), + PROCESS_RES_LOCK = (1 << 1), + PROCESS_LOCK_IRQ = (1 << 2), +}; + /* Implementation details: * bpf_map_lookup returns PTR_TO_MAP_VALUE_OR_NULL. * bpf_obj_new returns PTR_TO_BTF_ID | MEM_ALLOC | PTR_MAYBE_NULL. @@ -8226,30 +8244,33 @@ static int check_kfunc_mem_size_reg(struct bpf_verifier_env *env, struct bpf_reg * env->cur_state->active_locks remembers which map value element or allocated * object got locked and clears it after bpf_spin_unlock. */ -static int process_spin_lock(struct bpf_verifier_env *env, int regno, - bool is_lock) +static int process_spin_lock(struct bpf_verifier_env *env, int regno, int flags) { + bool is_lock = flags & PROCESS_SPIN_LOCK, is_res_lock = flags & PROCESS_RES_LOCK; + const char *lock_str = is_res_lock ? "bpf_res_spin" : "bpf_spin"; struct bpf_reg_state *regs = cur_regs(env), *reg = ®s[regno]; struct bpf_verifier_state *cur = env->cur_state; bool is_const = tnum_is_const(reg->var_off); + bool is_irq = flags & PROCESS_LOCK_IRQ; u64 val = reg->var_off.value; struct bpf_map *map = NULL; struct btf *btf = NULL; struct btf_record *rec; + u32 spin_lock_off; int err; if (!is_const) { verbose(env, - "R%d doesn't have constant offset. bpf_spin_lock has to be at the constant offset\n", - regno); + "R%d doesn't have constant offset. %s_lock has to be at the constant offset\n", + regno, lock_str); return -EINVAL; } if (reg->type == PTR_TO_MAP_VALUE) { map = reg->map_ptr; if (!map->btf) { verbose(env, - "map '%s' has to have BTF in order to use bpf_spin_lock\n", - map->name); + "map '%s' has to have BTF in order to use %s_lock\n", + map->name, lock_str); return -EINVAL; } } else { @@ -8257,36 +8278,53 @@ static int process_spin_lock(struct bpf_verifier_env *env, int regno, } rec = reg_btf_record(reg); - if (!btf_record_has_field(rec, BPF_SPIN_LOCK)) { - verbose(env, "%s '%s' has no valid bpf_spin_lock\n", map ? "map" : "local", - map ? map->name : "kptr"); + if (!btf_record_has_field(rec, is_res_lock ? BPF_RES_SPIN_LOCK : BPF_SPIN_LOCK)) { + verbose(env, "%s '%s' has no valid %s_lock\n", map ? "map" : "local", + map ? map->name : "kptr", lock_str); return -EINVAL; } - if (rec->spin_lock_off != val + reg->off) { - verbose(env, "off %lld doesn't point to 'struct bpf_spin_lock' that is at %d\n", - val + reg->off, rec->spin_lock_off); + spin_lock_off = is_res_lock ? 
rec->res_spin_lock_off : rec->spin_lock_off; + if (spin_lock_off != val + reg->off) { + verbose(env, "off %lld doesn't point to 'struct %s_lock' that is at %d\n", + val + reg->off, lock_str, spin_lock_off); return -EINVAL; } if (is_lock) { void *ptr; + int type; if (map) ptr = map; else ptr = btf; - if (cur->active_locks) { - verbose(env, - "Locking two bpf_spin_locks are not allowed\n"); - return -EINVAL; + if (!is_res_lock && cur->active_locks) { + if (find_lock_state(env->cur_state, REF_TYPE_LOCK, 0, NULL)) { + verbose(env, + "Locking two bpf_spin_locks are not allowed\n"); + return -EINVAL; + } + } else if (is_res_lock && cur->active_locks) { + if (find_lock_state(env->cur_state, REF_TYPE_RES_LOCK | REF_TYPE_RES_LOCK_IRQ, reg->id, ptr)) { + verbose(env, "Acquiring the same lock again, AA deadlock detected\n"); + return -EINVAL; + } } - err = acquire_lock_state(env, env->insn_idx, REF_TYPE_LOCK, reg->id, ptr); + + if (is_res_lock && is_irq) + type = REF_TYPE_RES_LOCK_IRQ; + else if (is_res_lock) + type = REF_TYPE_RES_LOCK; + else + type = REF_TYPE_LOCK; + err = acquire_lock_state(env, env->insn_idx, type, reg->id, ptr); if (err < 0) { verbose(env, "Failed to acquire lock state\n"); return err; } } else { void *ptr; + int type; if (map) ptr = map; @@ -8294,12 +8332,18 @@ static int process_spin_lock(struct bpf_verifier_env *env, int regno, ptr = btf; if (!cur->active_locks) { - verbose(env, "bpf_spin_unlock without taking a lock\n"); + verbose(env, "%s_unlock without taking a lock\n", lock_str); return -EINVAL; } - if (release_lock_state(env->cur_state, REF_TYPE_LOCK, reg->id, ptr)) { - verbose(env, "bpf_spin_unlock of different lock\n"); + if (is_res_lock && is_irq) + type = REF_TYPE_RES_LOCK_IRQ; + else if (is_res_lock) + type = REF_TYPE_RES_LOCK; + else + type = REF_TYPE_LOCK; + if (release_lock_state(cur, type, reg->id, ptr)) { + verbose(env, "%s_unlock of different lock\n", lock_str); return -EINVAL; } @@ -9625,11 +9669,11 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 arg, return -EACCES; } if (meta->func_id == BPF_FUNC_spin_lock) { - err = process_spin_lock(env, regno, true); + err = process_spin_lock(env, regno, PROCESS_SPIN_LOCK); if (err) return err; } else if (meta->func_id == BPF_FUNC_spin_unlock) { - err = process_spin_lock(env, regno, false); + err = process_spin_lock(env, regno, 0); if (err) return err; } else { @@ -11511,7 +11555,7 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn regs[BPF_REG_0].map_uid = meta.map_uid; regs[BPF_REG_0].type = PTR_TO_MAP_VALUE | ret_flag; if (!type_may_be_null(ret_flag) && - btf_record_has_field(meta.map_ptr->record, BPF_SPIN_LOCK)) { + btf_record_has_field(meta.map_ptr->record, BPF_SPIN_LOCK | BPF_RES_SPIN_LOCK)) { regs[BPF_REG_0].id = ++env->id_gen; } break; @@ -11683,10 +11727,10 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn /* mark_btf_func_reg_size() is used when the reg size is determined by * the BTF func_proto's return value size and argument. 
*/ -static void mark_btf_func_reg_size(struct bpf_verifier_env *env, u32 regno, - size_t reg_size) +static void __mark_btf_func_reg_size(struct bpf_verifier_env *env, struct bpf_reg_state *regs, + u32 regno, size_t reg_size) { - struct bpf_reg_state *reg = &cur_regs(env)[regno]; + struct bpf_reg_state *reg = ®s[regno]; if (regno == BPF_REG_0) { /* Function return value */ @@ -11704,6 +11748,12 @@ static void mark_btf_func_reg_size(struct bpf_verifier_env *env, u32 regno, } } +static void mark_btf_func_reg_size(struct bpf_verifier_env *env, u32 regno, + size_t reg_size) +{ + return __mark_btf_func_reg_size(env, cur_regs(env), regno, reg_size); +} + static bool is_kfunc_acquire(struct bpf_kfunc_call_arg_meta *meta) { return meta->kfunc_flags & KF_ACQUIRE; @@ -11841,6 +11891,7 @@ enum { KF_ARG_RB_ROOT_ID, KF_ARG_RB_NODE_ID, KF_ARG_WORKQUEUE_ID, + KF_ARG_RES_SPIN_LOCK_ID, }; BTF_ID_LIST(kf_arg_btf_ids) @@ -11850,6 +11901,7 @@ BTF_ID(struct, bpf_list_node) BTF_ID(struct, bpf_rb_root) BTF_ID(struct, bpf_rb_node) BTF_ID(struct, bpf_wq) +BTF_ID(struct, bpf_res_spin_lock) static bool __is_kfunc_ptr_arg_type(const struct btf *btf, const struct btf_param *arg, int type) @@ -11898,6 +11950,11 @@ static bool is_kfunc_arg_wq(const struct btf *btf, const struct btf_param *arg) return __is_kfunc_ptr_arg_type(btf, arg, KF_ARG_WORKQUEUE_ID); } +static bool is_kfunc_arg_res_spin_lock(const struct btf *btf, const struct btf_param *arg) +{ + return __is_kfunc_ptr_arg_type(btf, arg, KF_ARG_RES_SPIN_LOCK_ID); +} + static bool is_kfunc_arg_callback(struct bpf_verifier_env *env, const struct btf *btf, const struct btf_param *arg) { @@ -11969,6 +12026,7 @@ enum kfunc_ptr_arg_type { KF_ARG_PTR_TO_MAP, KF_ARG_PTR_TO_WORKQUEUE, KF_ARG_PTR_TO_IRQ_FLAG, + KF_ARG_PTR_TO_RES_SPIN_LOCK, }; enum special_kfunc_type { @@ -12007,6 +12065,10 @@ enum special_kfunc_type { KF_bpf_iter_num_destroy, KF_bpf_set_dentry_xattr, KF_bpf_remove_dentry_xattr, + KF_bpf_res_spin_lock, + KF_bpf_res_spin_unlock, + KF_bpf_res_spin_lock_irqsave, + KF_bpf_res_spin_unlock_irqrestore, }; BTF_SET_START(special_kfunc_set) @@ -12096,6 +12158,10 @@ BTF_ID(func, bpf_remove_dentry_xattr) BTF_ID_UNUSED BTF_ID_UNUSED #endif +BTF_ID(func, bpf_res_spin_lock) +BTF_ID(func, bpf_res_spin_unlock) +BTF_ID(func, bpf_res_spin_lock_irqsave) +BTF_ID(func, bpf_res_spin_unlock_irqrestore) static bool is_kfunc_ret_null(struct bpf_kfunc_call_arg_meta *meta) { @@ -12189,6 +12255,9 @@ get_kfunc_ptr_arg_type(struct bpf_verifier_env *env, if (is_kfunc_arg_irq_flag(meta->btf, &args[argno])) return KF_ARG_PTR_TO_IRQ_FLAG; + if (is_kfunc_arg_res_spin_lock(meta->btf, &args[argno])) + return KF_ARG_PTR_TO_RES_SPIN_LOCK; + if ((base_type(reg->type) == PTR_TO_BTF_ID || reg2btf_ids[base_type(reg->type)])) { if (!btf_type_is_struct(ref_t)) { verbose(env, "kernel function %s args#%d pointer type %s %s is not supported\n", @@ -12296,13 +12365,19 @@ static int process_irq_flag(struct bpf_verifier_env *env, int regno, struct bpf_kfunc_call_arg_meta *meta) { struct bpf_reg_state *regs = cur_regs(env), *reg = ®s[regno]; + int err, kfunc_class = IRQ_NATIVE_KFUNC; bool irq_save; - int err; - if (meta->func_id == special_kfunc_list[KF_bpf_local_irq_save]) { + if (meta->func_id == special_kfunc_list[KF_bpf_local_irq_save] || + meta->func_id == special_kfunc_list[KF_bpf_res_spin_lock_irqsave]) { irq_save = true; - } else if (meta->func_id == special_kfunc_list[KF_bpf_local_irq_restore]) { + if (meta->func_id == special_kfunc_list[KF_bpf_res_spin_lock_irqsave]) + kfunc_class = IRQ_LOCK_KFUNC; + } 
else if (meta->func_id == special_kfunc_list[KF_bpf_local_irq_restore] || + meta->func_id == special_kfunc_list[KF_bpf_res_spin_unlock_irqrestore]) { irq_save = false; + if (meta->func_id == special_kfunc_list[KF_bpf_res_spin_unlock_irqrestore]) + kfunc_class = IRQ_LOCK_KFUNC; } else { verbose(env, "verifier internal error: unknown irq flags kfunc\n"); return -EFAULT; @@ -12318,7 +12393,7 @@ static int process_irq_flag(struct bpf_verifier_env *env, int regno, if (err) return err; - err = mark_stack_slot_irq_flag(env, meta, reg, env->insn_idx); + err = mark_stack_slot_irq_flag(env, meta, reg, env->insn_idx, kfunc_class); if (err) return err; } else { @@ -12332,7 +12407,7 @@ static int process_irq_flag(struct bpf_verifier_env *env, int regno, if (err) return err; - err = unmark_stack_slot_irq_flag(env, reg); + err = unmark_stack_slot_irq_flag(env, reg, kfunc_class); if (err) return err; } @@ -12459,7 +12534,8 @@ static int check_reg_allocation_locked(struct bpf_verifier_env *env, struct bpf_ if (!env->cur_state->active_locks) return -EINVAL; - s = find_lock_state(env->cur_state, REF_TYPE_LOCK, id, ptr); + s = find_lock_state(env->cur_state, REF_TYPE_LOCK | REF_TYPE_RES_LOCK | REF_TYPE_RES_LOCK_IRQ, + id, ptr); if (!s) { verbose(env, "held lock and object are not in the same allocation\n"); return -EINVAL; @@ -12495,9 +12571,18 @@ static bool is_bpf_graph_api_kfunc(u32 btf_id) btf_id == special_kfunc_list[KF_bpf_refcount_acquire_impl]; } +static bool is_bpf_res_spin_lock_kfunc(u32 btf_id) +{ + return btf_id == special_kfunc_list[KF_bpf_res_spin_lock] || + btf_id == special_kfunc_list[KF_bpf_res_spin_unlock] || + btf_id == special_kfunc_list[KF_bpf_res_spin_lock_irqsave] || + btf_id == special_kfunc_list[KF_bpf_res_spin_unlock_irqrestore]; +} + static bool kfunc_spin_allowed(u32 btf_id) { - return is_bpf_graph_api_kfunc(btf_id) || is_bpf_iter_num_api_kfunc(btf_id); + return is_bpf_graph_api_kfunc(btf_id) || is_bpf_iter_num_api_kfunc(btf_id) || + is_bpf_res_spin_lock_kfunc(btf_id); } static bool is_sync_callback_calling_kfunc(u32 btf_id) @@ -12929,6 +13014,7 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_ case KF_ARG_PTR_TO_CONST_STR: case KF_ARG_PTR_TO_WORKQUEUE: case KF_ARG_PTR_TO_IRQ_FLAG: + case KF_ARG_PTR_TO_RES_SPIN_LOCK: break; default: WARN_ON_ONCE(1); @@ -13227,6 +13313,28 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_ if (ret < 0) return ret; break; + case KF_ARG_PTR_TO_RES_SPIN_LOCK: + { + int flags = PROCESS_RES_LOCK; + + if (reg->type != PTR_TO_MAP_VALUE && reg->type != (PTR_TO_BTF_ID | MEM_ALLOC)) { + verbose(env, "arg#%d doesn't point to map value or allocated object\n", i); + return -EINVAL; + } + + if (!is_bpf_res_spin_lock_kfunc(meta->func_id)) + return -EFAULT; + if (meta->func_id == special_kfunc_list[KF_bpf_res_spin_lock] || + meta->func_id == special_kfunc_list[KF_bpf_res_spin_lock_irqsave]) + flags |= PROCESS_SPIN_LOCK; + if (meta->func_id == special_kfunc_list[KF_bpf_res_spin_lock_irqsave] || + meta->func_id == special_kfunc_list[KF_bpf_res_spin_unlock_irqrestore]) + flags |= PROCESS_LOCK_IRQ; + ret = process_spin_lock(env, regno, flags); + if (ret < 0) + return ret; + break; + } } } @@ -13312,6 +13420,33 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn, insn_aux->is_iter_next = is_iter_next_kfunc(&meta); + if (!insn->off && + (insn->imm == special_kfunc_list[KF_bpf_res_spin_lock] || + insn->imm == special_kfunc_list[KF_bpf_res_spin_lock_irqsave])) { + struct 
bpf_verifier_state *branch; + struct bpf_reg_state *regs; + + branch = push_stack(env, env->insn_idx + 1, env->insn_idx, false); + if (!branch) { + verbose(env, "failed to push state for failed lock acquisition\n"); + return -ENOMEM; + } + + regs = branch->frame[branch->curframe]->regs; + + /* Clear r0-r5 registers in forked state */ + for (i = 0; i < CALLER_SAVED_REGS; i++) + mark_reg_not_init(env, regs, caller_saved[i]); + + mark_reg_unknown(env, regs, BPF_REG_0); + err = __mark_reg_s32_range(env, regs, BPF_REG_0, -MAX_ERRNO, -1); + if (err) { + verbose(env, "failed to mark s32 range for retval in forked state for lock\n"); + return err; + } + __mark_btf_func_reg_size(env, regs, BPF_REG_0, sizeof(u32)); + } + if (is_kfunc_destructive(&meta) && !capable(CAP_SYS_BOOT)) { verbose(env, "destructive kfunc calls require CAP_SYS_BOOT capability\n"); return -EACCES; @@ -13482,6 +13617,9 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn, if (btf_type_is_scalar(t)) { mark_reg_unknown(env, regs, BPF_REG_0); + if (meta.btf == btf_vmlinux && (meta.func_id == special_kfunc_list[KF_bpf_res_spin_lock] || + meta.func_id == special_kfunc_list[KF_bpf_res_spin_lock_irqsave])) + __mark_reg_const_zero(env, ®s[BPF_REG_0]); mark_btf_func_reg_size(env, BPF_REG_0, t->size); } else if (btf_type_is_ptr(t)) { ptr_type = btf_type_skip_modifiers(desc_btf, t->type, &ptr_type_id); @@ -18417,7 +18555,8 @@ static bool stacksafe(struct bpf_verifier_env *env, struct bpf_func_state *old, case STACK_IRQ_FLAG: old_reg = &old->stack[spi].spilled_ptr; cur_reg = &cur->stack[spi].spilled_ptr; - if (!check_ids(old_reg->ref_obj_id, cur_reg->ref_obj_id, idmap)) + if (!check_ids(old_reg->ref_obj_id, cur_reg->ref_obj_id, idmap) || + old_reg->irq.kfunc_class != cur_reg->irq.kfunc_class) return false; break; case STACK_MISC: @@ -18461,6 +18600,8 @@ static bool refsafe(struct bpf_verifier_state *old, struct bpf_verifier_state *c case REF_TYPE_IRQ: break; case REF_TYPE_LOCK: + case REF_TYPE_RES_LOCK: + case REF_TYPE_RES_LOCK_IRQ: if (old->refs[i].ptr != cur->refs[i].ptr) return false; break; @@ -19746,7 +19887,7 @@ static int check_map_prog_compatibility(struct bpf_verifier_env *env, } } - if (btf_record_has_field(map->record, BPF_SPIN_LOCK)) { + if (btf_record_has_field(map->record, BPF_SPIN_LOCK | BPF_RES_SPIN_LOCK)) { if (prog_type == BPF_PROG_TYPE_SOCKET_FILTER) { verbose(env, "socket filter progs cannot use bpf_spin_lock yet\n"); return -EINVAL; From patchwork Sun Mar 16 04:05:40 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kumar Kartikeya Dwivedi X-Patchwork-Id: 14018348 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id E3680C282DE for ; Sun, 16 Mar 2025 04:48:29 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=bombadil.20210309; h=Sender:List-Subscribe:List-Help :List-Post:List-Archive:List-Unsubscribe:List-Id:Content-Transfer-Encoding: MIME-Version:References:In-Reply-To:Message-ID:Date:Subject:Cc:To:From: Reply-To:Content-Type:Content-ID:Content-Description:Resent-Date:Resent-From: Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID:List-Owner; 
From: Kumar Kartikeya Dwivedi To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org Cc: Linus Torvalds , Peter
Zijlstra , Will Deacon , Waiman Long , Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , Martin KaFai Lau , Eduard Zingerman , "Paul E. McKenney" , Tejun Heo , Barret Rhoden , Josh Don , Dohyun Kim , linux-arm-kernel@lists.infradead.org, kkd@meta.com, kernel-team@meta.com
Subject: [PATCH bpf-next v4 24/25] bpf: Maintain FIFO property for rqspinlock unlock
Date: Sat, 15 Mar 2025 21:05:40 -0700
Message-ID: <20250316040541.108729-25-memxor@gmail.com>
X-Mailer: git-send-email 2.47.1
In-Reply-To: <20250316040541.108729-1-memxor@gmail.com>
References: <20250316040541.108729-1-memxor@gmail.com>
MIME-Version: 1.0

Out-of-order unlocks are unsupported for rqspinlock, and the irqsave variants already enforce strict FIFO ordering, so enforce the same ordering for the normal non-irqsave variants: a lock may only be unlocked if it is the most recently acquired lock still held. Two new verifier state fields (active_lock_id, active_lock_ptr) denote the top of the lock stack, and the previous entry's id and pointer are restored whenever the topmost entry is popped by an unlock. Take special care to make these fields part of the state comparison in refsafe.

Signed-off-by: Kumar Kartikeya Dwivedi --- include/linux/bpf_verifier.h | 3 +++ kernel/bpf/verifier.c | 33 ++++++++++++++++++++++++++++----- 2 files changed, 31 insertions(+), 5 deletions(-) diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h index bc073a48aed9..9734544b6957 100644 --- a/include/linux/bpf_verifier.h +++ b/include/linux/bpf_verifier.h @@ -268,6 +268,7 @@ struct bpf_reference_state { REF_TYPE_LOCK = (1 << 3), REF_TYPE_RES_LOCK = (1 << 4), REF_TYPE_RES_LOCK_IRQ = (1 << 5), + REF_TYPE_LOCK_MASK = REF_TYPE_LOCK | REF_TYPE_RES_LOCK | REF_TYPE_RES_LOCK_IRQ, } type; /* Track each reference created with a unique id, even if the same * instruction creates the reference multiple times (eg, via CALL).
@@ -434,6 +435,8 @@ struct bpf_verifier_state { u32 active_locks; u32 active_preempt_locks; u32 active_irq_id; + u32 active_lock_id; + void *active_lock_ptr; bool active_rcu_lock; bool speculative; diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index 29121ad32a89..4057081e996f 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -1428,6 +1428,8 @@ static int copy_reference_state(struct bpf_verifier_state *dst, const struct bpf dst->active_preempt_locks = src->active_preempt_locks; dst->active_rcu_lock = src->active_rcu_lock; dst->active_irq_id = src->active_irq_id; + dst->active_lock_id = src->active_lock_id; + dst->active_lock_ptr = src->active_lock_ptr; return 0; } @@ -1527,6 +1529,8 @@ static int acquire_lock_state(struct bpf_verifier_env *env, int insn_idx, enum r s->ptr = ptr; state->active_locks++; + state->active_lock_id = id; + state->active_lock_ptr = ptr; return 0; } @@ -1577,16 +1581,24 @@ static bool find_reference_state(struct bpf_verifier_state *state, int ptr_id) static int release_lock_state(struct bpf_verifier_state *state, int type, int id, void *ptr) { + void *prev_ptr = NULL; + u32 prev_id = 0; int i; for (i = 0; i < state->acquired_refs; i++) { - if (state->refs[i].type != type) - continue; - if (state->refs[i].id == id && state->refs[i].ptr == ptr) { + if (state->refs[i].type == type && state->refs[i].id == id && + state->refs[i].ptr == ptr) { release_reference_state(state, i); state->active_locks--; + /* Reassign active lock (id, ptr). */ + state->active_lock_id = prev_id; + state->active_lock_ptr = prev_ptr; return 0; } + if (state->refs[i].type & REF_TYPE_LOCK_MASK) { + prev_id = state->refs[i].id; + prev_ptr = state->refs[i].ptr; + } } return -EINVAL; } @@ -8342,6 +8354,14 @@ static int process_spin_lock(struct bpf_verifier_env *env, int regno, int flags) type = REF_TYPE_RES_LOCK; else type = REF_TYPE_LOCK; + if (!find_lock_state(cur, type, reg->id, ptr)) { + verbose(env, "%s_unlock of different lock\n", lock_str); + return -EINVAL; + } + if (reg->id != cur->active_lock_id || ptr != cur->active_lock_ptr) { + verbose(env, "%s_unlock cannot be out of order\n", lock_str); + return -EINVAL; + } if (release_lock_state(cur, type, reg->id, ptr)) { verbose(env, "%s_unlock of different lock\n", lock_str); return -EINVAL; @@ -12534,8 +12554,7 @@ static int check_reg_allocation_locked(struct bpf_verifier_env *env, struct bpf_ if (!env->cur_state->active_locks) return -EINVAL; - s = find_lock_state(env->cur_state, REF_TYPE_LOCK | REF_TYPE_RES_LOCK | REF_TYPE_RES_LOCK_IRQ, - id, ptr); + s = find_lock_state(env->cur_state, REF_TYPE_LOCK_MASK, id, ptr); if (!s) { verbose(env, "held lock and object are not in the same allocation\n"); return -EINVAL; @@ -18591,6 +18610,10 @@ static bool refsafe(struct bpf_verifier_state *old, struct bpf_verifier_state *c if (!check_ids(old->active_irq_id, cur->active_irq_id, idmap)) return false; + if (!check_ids(old->active_lock_id, cur->active_lock_id, idmap) || + old->active_lock_ptr != cur->active_lock_ptr) + return false; + for (i = 0; i < old->acquired_refs; i++) { if (!check_ids(old->refs[i].id, cur->refs[i].id, idmap) || old->refs[i].type != cur->refs[i].type) From patchwork Sun Mar 16 04:05:41 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kumar Kartikeya Dwivedi X-Patchwork-Id: 14018349 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from bombadil.infradead.org 
From: Kumar Kartikeya Dwivedi To: bpf@vger.kernel.org, linux-kernel@vger.kernel.org Cc: Linus Torvalds , Peter Zijlstra , Will Deacon , Waiman Long , Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , Martin KaFai Lau , Eduard Zingerman , "Paul E. McKenney" , Tejun Heo , Barret Rhoden , Josh Don , Dohyun Kim , linux-arm-kernel@lists.infradead.org, kkd@meta.com, kernel-team@meta.com
Subject: [PATCH bpf-next v4 25/25] selftests/bpf: Add tests for rqspinlock
Date: Sat, 15 Mar 2025 21:05:41 -0700
Message-ID: <20250316040541.108729-26-memxor@gmail.com>
X-Mailer: git-send-email 2.47.1
In-Reply-To: <20250316040541.108729-1-memxor@gmail.com>
References: <20250316040541.108729-1-memxor@gmail.com>
MIME-Version: 1.0

Introduce selftests that trigger AA and ABBA deadlocks, and test the edge case where the held locks table runs out of entries: once the table is full, a new acquisition can no longer be recorded, deadlock detection cannot see it, and we fall back to the timeout as the final line of defense. Also exercise the verifier's AA detection where applicable.
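As a condensed sketch of the ABBA scenario the multi-threaded test below constructs (the program and lock names here are illustrative; the actual programs, includes, and kfunc declarations are in the res_spin_lock.c additions below), two programs take the same pair of rqspinlocks in opposite order, and under contention one side receives -EDEADLK (or -ETIMEDOUT) from the inner acquisition instead of hanging:

struct bpf_res_spin_lock lockA __hidden SEC(".data.A");
struct bpf_res_spin_lock lockB __hidden SEC(".data.B");

SEC("tc")
int sketch_take_A_then_B(struct __sk_buff *ctx)
{
	/* Check the acquisition result and only unlock what was taken. */
	if (bpf_res_spin_lock(&lockA))
		return 0;
	if (!bpf_res_spin_lock(&lockB))
		bpf_res_spin_unlock(&lockB);
	bpf_res_spin_unlock(&lockA);
	return 0;
}

SEC("tc")
int sketch_take_B_then_A(struct __sk_buff *ctx)
{
	if (bpf_res_spin_lock(&lockB))
		return 0;
	if (!bpf_res_spin_lock(&lockA))
		bpf_res_spin_unlock(&lockA);
	bpf_res_spin_unlock(&lockB);
	return 0;
}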
Signed-off-by: Kumar Kartikeya Dwivedi --- .../selftests/bpf/prog_tests/res_spin_lock.c | 98 +++++++ tools/testing/selftests/bpf/progs/irq.c | 53 ++++ .../selftests/bpf/progs/res_spin_lock.c | 143 ++++++++++ .../selftests/bpf/progs/res_spin_lock_fail.c | 244 ++++++++++++++++++ 4 files changed, 538 insertions(+) create mode 100644 tools/testing/selftests/bpf/prog_tests/res_spin_lock.c create mode 100644 tools/testing/selftests/bpf/progs/res_spin_lock.c create mode 100644 tools/testing/selftests/bpf/progs/res_spin_lock_fail.c diff --git a/tools/testing/selftests/bpf/prog_tests/res_spin_lock.c b/tools/testing/selftests/bpf/prog_tests/res_spin_lock.c new file mode 100644 index 000000000000..115287ba441b --- /dev/null +++ b/tools/testing/selftests/bpf/prog_tests/res_spin_lock.c @@ -0,0 +1,98 @@ +// SPDX-License-Identifier: GPL-2.0 +/* Copyright (c) 2024-2025 Meta Platforms, Inc. and affiliates. */ +#include +#include +#include + +#include "res_spin_lock.skel.h" +#include "res_spin_lock_fail.skel.h" + +void test_res_spin_lock_failure(void) +{ + RUN_TESTS(res_spin_lock_fail); +} + +static volatile int skip; + +static void *spin_lock_thread(void *arg) +{ + int err, prog_fd = *(u32 *) arg; + LIBBPF_OPTS(bpf_test_run_opts, topts, + .data_in = &pkt_v4, + .data_size_in = sizeof(pkt_v4), + .repeat = 10000, + ); + + while (!READ_ONCE(skip)) { + err = bpf_prog_test_run_opts(prog_fd, &topts); + ASSERT_OK(err, "test_run"); + ASSERT_OK(topts.retval, "test_run retval"); + } + pthread_exit(arg); +} + +void test_res_spin_lock_success(void) +{ + LIBBPF_OPTS(bpf_test_run_opts, topts, + .data_in = &pkt_v4, + .data_size_in = sizeof(pkt_v4), + .repeat = 1, + ); + struct res_spin_lock *skel; + pthread_t thread_id[16]; + int prog_fd, i, err; + void *ret; + + if (get_nprocs() < 2) { + test__skip(); + return; + } + + skel = res_spin_lock__open_and_load(); + if (!ASSERT_OK_PTR(skel, "res_spin_lock__open_and_load")) + return; + /* AA deadlock */ + prog_fd = bpf_program__fd(skel->progs.res_spin_lock_test); + err = bpf_prog_test_run_opts(prog_fd, &topts); + ASSERT_OK(err, "error"); + ASSERT_OK(topts.retval, "retval"); + + prog_fd = bpf_program__fd(skel->progs.res_spin_lock_test_held_lock_max); + err = bpf_prog_test_run_opts(prog_fd, &topts); + ASSERT_OK(err, "error"); + ASSERT_OK(topts.retval, "retval"); + + /* Multi-threaded ABBA deadlock. 
*/ + + prog_fd = bpf_program__fd(skel->progs.res_spin_lock_test_AB); + for (i = 0; i < 16; i++) { + int err; + + err = pthread_create(&thread_id[i], NULL, &spin_lock_thread, &prog_fd); + if (!ASSERT_OK(err, "pthread_create")) + goto end; + } + + topts.retval = 0; + topts.repeat = 1000; + int fd = bpf_program__fd(skel->progs.res_spin_lock_test_BA); + while (!topts.retval && !err && !READ_ONCE(skel->bss->err)) { + err = bpf_prog_test_run_opts(fd, &topts); + } + + WRITE_ONCE(skip, true); + + for (i = 0; i < 16; i++) { + if (!ASSERT_OK(pthread_join(thread_id[i], &ret), "pthread_join")) + goto end; + if (!ASSERT_EQ(ret, &prog_fd, "ret == prog_fd")) + goto end; + } + + ASSERT_EQ(READ_ONCE(skel->bss->err), -EDEADLK, "timeout err"); + ASSERT_OK(err, "err"); + ASSERT_EQ(topts.retval, -EDEADLK, "timeout"); +end: + res_spin_lock__destroy(skel); + return; +} diff --git a/tools/testing/selftests/bpf/progs/irq.c b/tools/testing/selftests/bpf/progs/irq.c index 298d48d7886d..74d912b22de9 100644 --- a/tools/testing/selftests/bpf/progs/irq.c +++ b/tools/testing/selftests/bpf/progs/irq.c @@ -11,6 +11,9 @@ extern void bpf_local_irq_save(unsigned long *) __weak __ksym; extern void bpf_local_irq_restore(unsigned long *) __weak __ksym; extern int bpf_copy_from_user_str(void *dst, u32 dst__sz, const void *unsafe_ptr__ign, u64 flags) __weak __ksym; +struct bpf_res_spin_lock lockA __hidden SEC(".data.A"); +struct bpf_res_spin_lock lockB __hidden SEC(".data.B"); + SEC("?tc") __failure __msg("arg#0 doesn't point to an irq flag on stack") int irq_save_bad_arg(struct __sk_buff *ctx) @@ -510,4 +513,54 @@ int irq_sleepable_global_subprog_indirect(void *ctx) return 0; } +SEC("?tc") +__failure __msg("cannot restore irq state out of order") +int irq_ooo_lock_cond_inv(struct __sk_buff *ctx) +{ + unsigned long flags1, flags2; + + if (bpf_res_spin_lock_irqsave(&lockA, &flags1)) + return 0; + if (bpf_res_spin_lock_irqsave(&lockB, &flags2)) { + bpf_res_spin_unlock_irqrestore(&lockA, &flags1); + return 0; + } + + bpf_res_spin_unlock_irqrestore(&lockB, &flags1); + bpf_res_spin_unlock_irqrestore(&lockA, &flags2); + return 0; +} + +SEC("?tc") +__failure __msg("function calls are not allowed") +int irq_wrong_kfunc_class_1(struct __sk_buff *ctx) +{ + unsigned long flags1; + + if (bpf_res_spin_lock_irqsave(&lockA, &flags1)) + return 0; + /* For now, bpf_local_irq_restore is not allowed in critical section, + * but this test ensures error will be caught with kfunc_class when it's + * opened up. Tested by temporarily permitting this kfunc in critical + * section. + */ + bpf_local_irq_restore(&flags1); + bpf_res_spin_unlock_irqrestore(&lockA, &flags1); + return 0; +} + +SEC("?tc") +__failure __msg("function calls are not allowed") +int irq_wrong_kfunc_class_2(struct __sk_buff *ctx) +{ + unsigned long flags1, flags2; + + bpf_local_irq_save(&flags1); + if (bpf_res_spin_lock_irqsave(&lockA, &flags2)) + return 0; + bpf_local_irq_restore(&flags2); + bpf_res_spin_unlock_irqrestore(&lockA, &flags1); + return 0; +} + char _license[] SEC("license") = "GPL"; diff --git a/tools/testing/selftests/bpf/progs/res_spin_lock.c b/tools/testing/selftests/bpf/progs/res_spin_lock.c new file mode 100644 index 000000000000..b33385dfbd35 --- /dev/null +++ b/tools/testing/selftests/bpf/progs/res_spin_lock.c @@ -0,0 +1,143 @@ +// SPDX-License-Identifier: GPL-2.0 +/* Copyright (c) 2024-2025 Meta Platforms, Inc. and affiliates. 
*/ +#include +#include +#include +#include "bpf_misc.h" + +#define EDEADLK 35 +#define ETIMEDOUT 110 + +struct arr_elem { + struct bpf_res_spin_lock lock; +}; + +struct { + __uint(type, BPF_MAP_TYPE_ARRAY); + __uint(max_entries, 64); + __type(key, int); + __type(value, struct arr_elem); +} arrmap SEC(".maps"); + +struct bpf_res_spin_lock lockA __hidden SEC(".data.A"); +struct bpf_res_spin_lock lockB __hidden SEC(".data.B"); + +SEC("tc") +int res_spin_lock_test(struct __sk_buff *ctx) +{ + struct arr_elem *elem1, *elem2; + int r; + + elem1 = bpf_map_lookup_elem(&arrmap, &(int){0}); + if (!elem1) + return -1; + elem2 = bpf_map_lookup_elem(&arrmap, &(int){0}); + if (!elem2) + return -1; + + r = bpf_res_spin_lock(&elem1->lock); + if (r) + return r; + if (!bpf_res_spin_lock(&elem2->lock)) { + bpf_res_spin_unlock(&elem2->lock); + bpf_res_spin_unlock(&elem1->lock); + return -1; + } + bpf_res_spin_unlock(&elem1->lock); + return 0; +} + +SEC("tc") +int res_spin_lock_test_AB(struct __sk_buff *ctx) +{ + int r; + + r = bpf_res_spin_lock(&lockA); + if (r) + return !r; + /* Only unlock if we took the lock. */ + if (!bpf_res_spin_lock(&lockB)) + bpf_res_spin_unlock(&lockB); + bpf_res_spin_unlock(&lockA); + return 0; +} + +int err; + +SEC("tc") +int res_spin_lock_test_BA(struct __sk_buff *ctx) +{ + int r; + + r = bpf_res_spin_lock(&lockB); + if (r) + return !r; + if (!bpf_res_spin_lock(&lockA)) + bpf_res_spin_unlock(&lockA); + else + err = -EDEADLK; + bpf_res_spin_unlock(&lockB); + return err ?: 0; +} + +SEC("tc") +int res_spin_lock_test_held_lock_max(struct __sk_buff *ctx) +{ + struct bpf_res_spin_lock *locks[48] = {}; + struct arr_elem *e; + u64 time_beg, time; + int ret = 0, i; + + _Static_assert(ARRAY_SIZE(((struct rqspinlock_held){}).locks) == 31, + "RES_NR_HELD assumed to be 31"); + + for (i = 0; i < 34; i++) { + int key = i; + + /* We cannot pass in i as it will get spilled/filled by the compiler and + * loses bounds in verifier state. + */ + e = bpf_map_lookup_elem(&arrmap, &key); + if (!e) + return 1; + locks[i] = &e->lock; + } + + for (; i < 48; i++) { + int key = i - 2; + + /* We cannot pass in i as it will get spilled/filled by the compiler and + * loses bounds in verifier state. + */ + e = bpf_map_lookup_elem(&arrmap, &key); + if (!e) + return 1; + locks[i] = &e->lock; + } + + time_beg = bpf_ktime_get_ns(); + for (i = 0; i < 34; i++) { + if (bpf_res_spin_lock(locks[i])) + goto end; + } + + /* Trigger AA, after exhausting entries in the held lock table. This + * time, only the timeout can save us, as AA detection won't succeed. + */ + if (!bpf_res_spin_lock(locks[34])) { + bpf_res_spin_unlock(locks[34]); + ret = 1; + goto end; + } + +end: + for (i = i - 1; i >= 0; i--) + bpf_res_spin_unlock(locks[i]); + time = bpf_ktime_get_ns() - time_beg; + /* Time spent should be easily above our limit (1/4 s), since AA + * detection won't be expedited due to lack of held lock entry. + */ + return ret ?: (time > 1000000000 / 4 ? 0 : 1); +} + +char _license[] SEC("license") = "GPL"; diff --git a/tools/testing/selftests/bpf/progs/res_spin_lock_fail.c b/tools/testing/selftests/bpf/progs/res_spin_lock_fail.c new file mode 100644 index 000000000000..330682a88c16 --- /dev/null +++ b/tools/testing/selftests/bpf/progs/res_spin_lock_fail.c @@ -0,0 +1,244 @@ +// SPDX-License-Identifier: GPL-2.0 +/* Copyright (c) 2024-2025 Meta Platforms, Inc. and affiliates. 
*/ +#include +#include +#include +#include +#include "bpf_misc.h" +#include "bpf_experimental.h" + +struct arr_elem { + struct bpf_res_spin_lock lock; +}; + +struct { + __uint(type, BPF_MAP_TYPE_ARRAY); + __uint(max_entries, 1); + __type(key, int); + __type(value, struct arr_elem); +} arrmap SEC(".maps"); + +long value; + +struct bpf_spin_lock lock __hidden SEC(".data.A"); +struct bpf_res_spin_lock res_lock __hidden SEC(".data.B"); + +SEC("?tc") +__failure __msg("point to map value or allocated object") +int res_spin_lock_arg(struct __sk_buff *ctx) +{ + struct arr_elem *elem; + + elem = bpf_map_lookup_elem(&arrmap, &(int){0}); + if (!elem) + return 0; + bpf_res_spin_lock((struct bpf_res_spin_lock *)bpf_core_cast(&elem->lock, struct __sk_buff)); + bpf_res_spin_lock(&elem->lock); + return 0; +} + +SEC("?tc") +__failure __msg("AA deadlock detected") +int res_spin_lock_AA(struct __sk_buff *ctx) +{ + struct arr_elem *elem; + + elem = bpf_map_lookup_elem(&arrmap, &(int){0}); + if (!elem) + return 0; + bpf_res_spin_lock(&elem->lock); + bpf_res_spin_lock(&elem->lock); + return 0; +} + +SEC("?tc") +__failure __msg("AA deadlock detected") +int res_spin_lock_cond_AA(struct __sk_buff *ctx) +{ + struct arr_elem *elem; + + elem = bpf_map_lookup_elem(&arrmap, &(int){0}); + if (!elem) + return 0; + if (bpf_res_spin_lock(&elem->lock)) + return 0; + bpf_res_spin_lock(&elem->lock); + return 0; +} + +SEC("?tc") +__failure __msg("unlock of different lock") +int res_spin_lock_mismatch_1(struct __sk_buff *ctx) +{ + struct arr_elem *elem; + + elem = bpf_map_lookup_elem(&arrmap, &(int){0}); + if (!elem) + return 0; + if (bpf_res_spin_lock(&elem->lock)) + return 0; + bpf_res_spin_unlock(&res_lock); + return 0; +} + +SEC("?tc") +__failure __msg("unlock of different lock") +int res_spin_lock_mismatch_2(struct __sk_buff *ctx) +{ + struct arr_elem *elem; + + elem = bpf_map_lookup_elem(&arrmap, &(int){0}); + if (!elem) + return 0; + if (bpf_res_spin_lock(&res_lock)) + return 0; + bpf_res_spin_unlock(&elem->lock); + return 0; +} + +SEC("?tc") +__failure __msg("unlock of different lock") +int res_spin_lock_irq_mismatch_1(struct __sk_buff *ctx) +{ + struct arr_elem *elem; + unsigned long f1; + + elem = bpf_map_lookup_elem(&arrmap, &(int){0}); + if (!elem) + return 0; + bpf_local_irq_save(&f1); + if (bpf_res_spin_lock(&res_lock)) + return 0; + bpf_res_spin_unlock_irqrestore(&res_lock, &f1); + return 0; +} + +SEC("?tc") +__failure __msg("unlock of different lock") +int res_spin_lock_irq_mismatch_2(struct __sk_buff *ctx) +{ + struct arr_elem *elem; + unsigned long f1; + + elem = bpf_map_lookup_elem(&arrmap, &(int){0}); + if (!elem) + return 0; + if (bpf_res_spin_lock_irqsave(&res_lock, &f1)) + return 0; + bpf_res_spin_unlock(&res_lock); + return 0; +} + +SEC("?tc") +__success +int res_spin_lock_ooo(struct __sk_buff *ctx) +{ + struct arr_elem *elem; + + elem = bpf_map_lookup_elem(&arrmap, &(int){0}); + if (!elem) + return 0; + if (bpf_res_spin_lock(&res_lock)) + return 0; + if (bpf_res_spin_lock(&elem->lock)) { + bpf_res_spin_unlock(&res_lock); + return 0; + } + bpf_res_spin_unlock(&elem->lock); + bpf_res_spin_unlock(&res_lock); + return 0; +} + +SEC("?tc") +__success +int res_spin_lock_ooo_irq(struct __sk_buff *ctx) +{ + struct arr_elem *elem; + unsigned long f1, f2; + + elem = bpf_map_lookup_elem(&arrmap, &(int){0}); + if (!elem) + return 0; + if (bpf_res_spin_lock_irqsave(&res_lock, &f1)) + return 0; + if (bpf_res_spin_lock_irqsave(&elem->lock, &f2)) { + bpf_res_spin_unlock_irqrestore(&res_lock, &f1); + /* We won't have a 
unreleased IRQ flag error here. */ + return 0; + } + bpf_res_spin_unlock_irqrestore(&elem->lock, &f2); + bpf_res_spin_unlock_irqrestore(&res_lock, &f1); + return 0; +} + +struct bpf_res_spin_lock lock1 __hidden SEC(".data.OO1"); +struct bpf_res_spin_lock lock2 __hidden SEC(".data.OO2"); + +SEC("?tc") +__failure __msg("bpf_res_spin_unlock cannot be out of order") +int res_spin_lock_ooo_unlock(struct __sk_buff *ctx) +{ + if (bpf_res_spin_lock(&lock1)) + return 0; + if (bpf_res_spin_lock(&lock2)) { + bpf_res_spin_unlock(&lock1); + return 0; + } + bpf_res_spin_unlock(&lock1); + bpf_res_spin_unlock(&lock2); + return 0; +} + +SEC("?tc") +__failure __msg("off 1 doesn't point to 'struct bpf_res_spin_lock' that is at 0") +int res_spin_lock_bad_off(struct __sk_buff *ctx) +{ + struct arr_elem *elem; + + elem = bpf_map_lookup_elem(&arrmap, &(int){0}); + if (!elem) + return 0; + bpf_res_spin_lock((void *)&elem->lock + 1); + return 0; +} + +SEC("?tc") +__failure __msg("R1 doesn't have constant offset. bpf_res_spin_lock has to be at the constant offset") +int res_spin_lock_var_off(struct __sk_buff *ctx) +{ + struct arr_elem *elem; + u64 val = value; + + elem = bpf_map_lookup_elem(&arrmap, &(int){0}); + if (!elem) { + // FIXME: Only inline assembly use in assert macro doesn't emit + // BTF definition. + bpf_throw(0); + return 0; + } + bpf_assert_range(val, 0, 40); + bpf_res_spin_lock((void *)&value + val); + return 0; +} + +SEC("?tc") +__failure __msg("map 'res_spin.bss' has no valid bpf_res_spin_lock") +int res_spin_lock_no_lock_map(struct __sk_buff *ctx) +{ + bpf_res_spin_lock((void *)&value + 1); + return 0; +} + +SEC("?tc") +__failure __msg("local 'kptr' has no valid bpf_res_spin_lock") +int res_spin_lock_no_lock_kptr(struct __sk_buff *ctx) +{ + struct { int i; } *p = bpf_obj_new(typeof(*p)); + + if (!p) + return 0; + bpf_res_spin_lock((void *)p); + return 0; +} + +char _license[] SEC("license") = "GPL";
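As a closing illustration of the usage contract the series establishes, here is a minimal sketch that is not part of the patches: the map, value type, section names, and program name are invented for illustration, the kfunc extern declarations are inferred from how the selftests above call them, and vmlinux.h is assumed to come from a kernel with this series applied. It shows the two rules the verifier enforces for these kfuncs: every acquisition's return value is checked before entering the critical section, and nested locks are released in the reverse order they were acquired (the ordering rule added in patch 24).

#include <vmlinux.h>
#include <bpf/bpf_helpers.h>

/* Kfunc declarations, inferred from the selftests' usage above. */
extern int bpf_res_spin_lock(struct bpf_res_spin_lock *lock) __weak __ksym;
extern void bpf_res_spin_unlock(struct bpf_res_spin_lock *lock) __weak __ksym;

struct elem {
	struct bpf_res_spin_lock lock;
	long counter;
};

struct {
	__uint(type, BPF_MAP_TYPE_ARRAY);
	__uint(max_entries, 1);
	__type(key, int);
	__type(value, struct elem);
} lock_map SEC(".maps");

struct bpf_res_spin_lock global_lock __hidden SEC(".data.lock");

SEC("tc")
int res_spin_lock_usage(struct __sk_buff *ctx)
{
	struct elem *e;

	e = bpf_map_lookup_elem(&lock_map, &(int){0});
	if (!e)
		return 0;
	/* Acquisition can fail (-EDEADLK on a detected cycle, -ETIMEDOUT as
	 * the last resort); the failure path must not enter the critical
	 * section or unlock.
	 */
	if (bpf_res_spin_lock(&global_lock))
		return 0;
	if (bpf_res_spin_lock(&e->lock)) {
		bpf_res_spin_unlock(&global_lock);
		return 0;
	}
	e->counter++;
	/* Unlock in reverse acquisition order; unlocking &global_lock first
	 * here would be rejected by the verifier as an out-of-order unlock.
	 */
	bpf_res_spin_unlock(&e->lock);
	bpf_res_spin_unlock(&global_lock);
	return 0;
}

char _license[] SEC("license") = "GPL";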