From: Vlastimil Babka
Subject: [PATCH RFC v3 0/8] SLUB percpu sheaves
Date: Mon, 17 Mar 2025 15:33:01 +0100
Message-Id: <20250317-slub-percpu-caches-v3-0-9d9884d8b643@suse.cz>
To: Suren Baghdasaryan, "Liam R. Howlett", Christoph Lameter, David Rientjes
Cc: Roman Gushchin, Harry Yoo, Uladzislau Rezki, linux-mm@kvack.org, linux-kernel@vger.kernel.org, rcu@vger.kernel.org, maple-tree@lists.infradead.org, vbabka@suse.cz, Sebastian Andrzej Siewior, Davidlohr Bueso, Alexei Starovoitov, "Liam R. Howlett"

Hi,

This is the v3 RFC to add an opt-in percpu array-based caching layer to SLUB. It publishes the fixes accumulated since v2 ahead of LSF/MM.

I have also squashed the conversion to localtry_lock so that it is used immediately (rather than converted in a separate patch), as the lock should be going into 6.15 at this point. This RFC still includes a patch introducing it, to avoid depending on the bpf-next tree.

The name "sheaf" was invented by Matthew so that we don't call it a "magazine" as in the original Bonwick paper. The per-NUMA-node cache of sheaves is thus called a "barn".
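
To make the sheaf/barn terminology more concrete, below is a deliberately simplified, single-threaded userspace model in C. It is only a sketch under stated assumptions, not the code from this series: all names (toy_sheaf, toy_barn, toy_pcs, toy_alloc(), toy_free()) and both capacities are hypothetical, and it omits the percpu locking, the barn spinlock, NUMA handling and the kfree_rcu() integration. It illustrates the idea that allocations and frees normally touch only a per-CPU array of object pointers, and that a CPU trades a whole empty or full sheaf with the per-node barn when its local sheaf runs out or fills up.

```c
/*
 * Toy, single-threaded model of a per-CPU sheaf backed by a per-node barn.
 * All names and sizes are hypothetical; this is not the SLUB API.
 */
#include <assert.h>
#include <stdlib.h>

#define TOY_SHEAF_CAPACITY 4 /* assumed per-cache sheaf capacity */
#define TOY_BARN_MAX       8 /* assumed limit on sheaves kept in the barn */

/* A sheaf: a fixed-capacity array of cached object pointers. */
struct toy_sheaf {
	unsigned int size; /* number of objects currently cached */
	void *objects[TOY_SHEAF_CAPACITY];
};

/* Per-node pool of full and empty sheaves shared by all CPUs. */
struct toy_barn {
	struct toy_sheaf *full[TOY_BARN_MAX];
	unsigned int nr_full;
	struct toy_sheaf *empty[TOY_BARN_MAX];
	unsigned int nr_empty;
};

/* Per-CPU state: the sheaf currently used for allocations and frees. */
struct toy_pcs {
	struct toy_sheaf *main;
	struct toy_barn *barn;
	size_t obj_size; /* used only by the slow-path fallback */
};

void *toy_alloc(struct toy_pcs *pcs)
{
	struct toy_sheaf *s = pcs->main;

	/* Fast path: pop from the local array, no atomic operations. */
	if (s->size > 0)
		return s->objects[--s->size];

	/* Local sheaf is empty: trade it for a full one from the barn. */
	if (pcs->barn->nr_full > 0 && pcs->barn->nr_empty < TOY_BARN_MAX) {
		pcs->barn->empty[pcs->barn->nr_empty++] = s;
		pcs->main = pcs->barn->full[--pcs->barn->nr_full];
		return pcs->main->objects[--pcs->main->size];
	}

	/* Slow path: malloc() stands in for allocating from the slabs. */
	return malloc(pcs->obj_size);
}

void toy_free(struct toy_pcs *pcs, void *obj)
{
	struct toy_sheaf *s = pcs->main;

	assert(obj != NULL);

	/* Fast path: push onto the local array. */
	if (s->size < TOY_SHEAF_CAPACITY) {
		s->objects[s->size++] = obj;
		return;
	}

	/* Local sheaf is full: trade it for an empty one from the barn. */
	if (pcs->barn->nr_empty > 0 && pcs->barn->nr_full < TOY_BARN_MAX) {
		pcs->barn->full[pcs->barn->nr_full++] = s;
		pcs->main = pcs->barn->empty[--pcs->barn->nr_empty];
		pcs->main->objects[pcs->main->size++] = obj;
		return;
	}

	/* Slow path: free() stands in for freeing back to the slabs. */
	free(obj);
}
```

For example, with a capacity of 4 and one empty sheaf in the barn, the fifth consecutive free on a CPU moves its full sheaf into the barn and continues with the empty one; an allocation that later finds the local sheaf empty takes that full sheaf back. The model keeps the invariant that the barn only holds completely full or completely empty sheaves, which is what makes each exchange a constant-time pointer swap.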

This may seem similar to the arrays in SLAB, but the main differences are:
- opt-in, not used for every cache
- does not distinguish NUMA locality, thus there are no "shared" arrays and no "alien" arrays that would need periodic flushing
- improved kfree_rcu() handling
- an API for obtaining a preallocated sheaf that can be used for guaranteed and efficient allocations in a restricted context, when the upper bound on the number of needed objects is known but rarely reached

The motivation comes mainly from the ongoing work on VMA scalability and the related maple tree operations. This is why maple tree nodes are sheaf-enabled in this RFC, but it is not a full conversion that would take advantage of the improved preallocation API. The VMA part is currently left out, as Suren is expected to land the VMA TYPESAFE_BY_RCU conversion [3] soon and there would be a conflict with it. With both series applied, enabling sheaves for VMAs means just adding a line to kmem_cache_args in proc_caches_init().

Some performance benefits were measured by Suren and Liam in previous versions. Suren's results in [5] look promising so far, with the exception of the preallocation support as used by the refactored maple tree code.

A sheaf-enabled cache has the following expected advantages:
- Cheaper fast paths. For allocations, instead of a local double cmpxchg, after Patch 5 it is just preempt_disable() and no atomic operations. The same applies to freeing, which without sheaves is a local double cmpxchg only for short-term allocations (where the same slab is still active on the same cpu when the object is freed) and a more costly locked double cmpxchg otherwise. The downside is the lack of NUMA locality guarantees for the allocated objects.
- kfree_rcu() batching and recycling. kfree_rcu() puts objects into a separate percpu sheaf and only submits the whole sheaf to call_rcu() when it is full.
  After the grace period, the sheaf can be used for allocations, which is more efficient than freeing and reallocating individual slab objects (even with the batching done by the kfree_rcu() implementation itself). In case only some cpus are allowed to handle rcu callbacks, the sheaf can still be made available to other cpus on the same node via the shared barn. The maple_node cache uses kfree_rcu() and thus can benefit from this.
- Preallocation support. A prefilled sheaf can be privately borrowed for a short-term operation that is not allowed to block in the middle and may need to allocate some objects. If an upper bound (worst case) on the number of allocations is known, but far fewer allocations are actually needed on average, borrowing and returning a sheaf is much more efficient than a bulk allocation for the worst case followed by a bulk free of the many unused objects. Maple tree write operations should benefit from this.

Patch 1 is copied from the series "bpf, mm: Introduce try_alloc_pages()" [2] to introduce a variant of local_lock that has a trylock operation.

Patch 2 implements the basic sheaf functionality as a caching layer on top of (per-cpu) slabs.

Patch 3 adds the sheaf kfree_rcu() support.

Patch 4 implements borrowing prefilled sheaves, with maple tree being the anticipated user.

Patch 5 seeks to reduce barn spinlock contention. It is kept separate for possible evaluation.

Patches 6 and 7 by Liam add testing stubs that maple tree will use in its userspace tests.

Patch 8 enables sheaves for the maple tree node cache, but does not take advantage of prefilling yet.

(RFC) LIMITATIONS:
- with slub_debug enabled, objects in sheaves are considered allocated, so allocation/free stacktraces may become imprecise and checking of e.g.
  redzone violations may be delayed

GIT TREES:

this series: https://git.kernel.org/vbabka/l/slub-percpu-sheaves-v3

To avoid conflicts, the series requires (and the branch above is based on) the kfree_rcu() code refactoring scheduled for 6.15:

https://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab.git/log/?h=slab/for-6.15/kfree_rcu_tiny

Vlastimil

[1] https://lore.kernel.org/all/20241111205506.3404479-1-surenb@google.com/
[2] https://lore.kernel.org/all/20250222024427.30294-2-alexei.starovoitov@gmail.com/
[3] https://lore.kernel.org/all/20250213224655.1680278-1-surenb@google.com/
[4] https://www.infradead.org/git/?p=users/jedix/linux-maple.git;a=shortlog;h=refs/heads/slub-percpu-sheaves-v2
[5] https://lore.kernel.org/all/CAJuCfpFVopL%2BsMdU4bLRxs%2BHS_WPCmFZBdCmwE8qV2Dpa5WZnA@mail.gmail.com/

---
Changes in v3:
- Squash localtry_lock conversion so it's used immediately.
- Incorporate feedback and add tags from Suren and Harry - thanks!
- Mostly adding comments and some refactoring.
- Fixes for kfree_rcu_sheaf() vmalloc handling, cpu hotremove flushing.
- Fix wrong condition in kmem_cache_return_sheaf() that may have affected performance negatively.
- Refactoring of free_to_pcs()
- Link to v2: https://lore.kernel.org/r/20250214-slub-percpu-caches-v2-0-88592ee0966a@suse.cz

Changes in v2:
- Removed kfree_rcu() destructors support as VMAs will not need it anymore after [3] is merged.
- Changed to localtry_lock_t borrowed from [2] instead of our own implementation of the same idea.
- Many fixes and improvements thanks to Liam's adoption of sheaves for maple tree nodes.
- Userspace testing stubs by Liam.
- Reduced limitations/todos - hooking to kfree_rcu() is complete, prefilled sheaves can exceed the cache's sheaf_capacity.
- Link to v1: https://lore.kernel.org/r/20241112-slub-percpu-caches-v1-0-ddc0bdc27e05@suse.cz

---
Liam R. Howlett (2):
      tools: Add testing support for changes to rcu and slab for sheaves
      tools: Add sheafs support to testing infrastructure

Sebastian Andrzej Siewior (1):
      locking/local_lock: Introduce localtry_lock_t

Vlastimil Babka (5):
      slab: add opt-in caching layer of percpu sheaves
      slab: add sheaf support for batching kfree_rcu() operations
      slab: sheaf prefilling for guaranteed allocations
      slab: determine barn status racily outside of lock
      maple_tree: use percpu sheaves for maple_node_cache

 include/linux/local_lock.h            |   70 ++
 include/linux/local_lock_internal.h   |  146 ++++
 include/linux/slab.h                  |   50 ++
 lib/maple_tree.c                      |   11 +-
 mm/slab.h                             |    4 +
 mm/slab_common.c                      |   29 +-
 mm/slub.c                             | 1463 +++++++++++++++++++++++++++++++--
 tools/include/linux/slab.h            |   65 +-
 tools/testing/shared/linux.c          |  108 ++-
 tools/testing/shared/linux/rcupdate.h |   22 +
 10 files changed, 1891 insertions(+), 77 deletions(-)
---
base-commit: 379487e17ca406b47392e7ab6cf35d1c3bacb371
change-id: 20231128-slub-percpu-caches-9441892011d7
prerequisite-message-id: 20250203-slub-tiny-kfree_rcu-v1-0-d4428bf9a8a1@suse.cz
prerequisite-patch-id: 1a4af92b5eb1b8bfc86bac8d7fc1ef0963e7d9d6
prerequisite-patch-id: f24a39c38103b7e09fbf2e6f84e6108499ab7980
prerequisite-patch-id: 23e90b23482f4775c95295821dd779ba4e3712e9
prerequisite-patch-id: 5c53a619477acdce07071abec0f40e79501ea40b

Best regards,