From patchwork Mon Nov 20 18:34:31 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Vlastimil Babka
X-Patchwork-Id: 13461891
From: Vlastimil Babka
Date: Mon, 20 Nov 2023 19:34:31 +0100
Subject: [PATCH v2 20/21] mm/slub: optimize alloc fastpath code layout
Message-Id: <20231120-slab-remove-slab-v2-20-9c9c70177183@suse.cz>
References: <20231120-slab-remove-slab-v2-0-9c9c70177183@suse.cz>
In-Reply-To: <20231120-slab-remove-slab-v2-0-9c9c70177183@suse.cz>
To: David Rientjes, Christoph Lameter, Pekka Enberg, Joonsoo Kim
Cc: Andrew Morton, Hyeonggon Yoo <42.hyeyoo@gmail.com>, Roman Gushchin,
 Andrey Ryabinin, Alexander Potapenko, Andrey Konovalov, Dmitry Vyukov,
 Vincenzo Frascino, Marco Elver, Johannes Weiner, Michal Hocko,
 Shakeel Butt, Muchun Song, Kees Cook, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org, kasan-dev@googlegroups.com,
 cgroups@vger.kernel.org, linux-hardening@vger.kernel.org,
 Vlastimil Babka
X-Mailer: b4 0.12.4
With allocation fastpaths no longer divided between two .c files, we
have better inlining. However, checking the disassembly of
kmem_cache_alloc() reveals we can do better to make the fastpaths
smaller and move the less common situations out of line or to separate
functions, to reduce instruction cache pressure.

- split memcg pre/post alloc hooks into inlined checks that use likely()
  to assume there will be no objcg handling necessary, and non-inline
  functions doing the actual handling (a standalone sketch of this
  pattern follows the diffstat below)

- add some more likely/unlikely() to pre/post alloc hooks to indicate
  which scenarios should be out of line

- change gfp_allowed_mask handling in slab_post_alloc_hook() so the
  code can be optimized away when kasan/kmsan/kmemleak is configured out

bloat-o-meter shows:

add/remove: 4/2 grow/shrink: 1/8 up/down: 521/-2924 (-2403)
Function                                     old     new   delta
__memcg_slab_post_alloc_hook                   -     461    +461
kmem_cache_alloc_bulk                        775     791     +16
__pfx_should_failslab.constprop                -      16     +16
__pfx___memcg_slab_post_alloc_hook             -      16     +16
should_failslab.constprop                      -      12     +12
__pfx_memcg_slab_post_alloc_hook              16       -     -16
kmem_cache_alloc_lru                        1295    1023    -272
kmem_cache_alloc_node                       1118     817    -301
kmem_cache_alloc                            1076     772    -304
kmalloc_node_trace                          1149     838    -311
kmalloc_trace                               1102     789    -313
__kmalloc_node_track_caller                 1393    1080    -313
__kmalloc_node                              1397    1082    -315
__kmalloc                                   1374    1059    -315
memcg_slab_post_alloc_hook                   464       -    -464

Note that gcc still decided to inline __memcg_slab_pre_alloc_hook(), but
the code is out of line. Forcing noinline did not improve the results.

As a result the fastpaths are shorter and overall code size is reduced.

Signed-off-by: Vlastimil Babka
Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
---
 mm/slub.c | 89 ++++++++++++++++++++++++++++++++++++++-------------------------
 1 file changed, 54 insertions(+), 35 deletions(-)
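For illustration, here is a minimal userspace sketch of the pattern the
first two points describe: a forced-inline wrapper performs only the
cheap, well-predicted checks, and the rare accounting case is delegated
to a noinline function. All names below (struct cache, cache_pre_hook(),
the flag bits) are invented for the sketch and are not kernel APIs; only
likely()/unlikely() mirror the kernel macros, which expand to
__builtin_expect(). This is not code from the patch.

#include <stdbool.h>
#include <stdio.h>

#define likely(x)	__builtin_expect(!!(x), 1)
#define unlikely(x)	__builtin_expect(!!(x), 0)
/* stand-in for the kernel's __fastpath_inline */
#define fastpath_inline	inline __attribute__((always_inline))

/* invented flag bits, standing in for __GFP_ACCOUNT / SLAB_ACCOUNT */
#define GFP_ACCOUNT	0x1UL
#define CACHE_ACCOUNT	0x2UL

struct cache {
	unsigned long flags;
};

/*
 * Out-of-line slow path: only reached when accounting is actually
 * needed, so its (potentially large) body stays out of the callers'
 * instruction stream.
 */
static __attribute__((noinline)) bool cache_pre_hook_slow(struct cache *c,
							  unsigned long gfp)
{
	(void)c;
	(void)gfp;
	/* the expensive objcg-style bookkeeping would live here */
	return true;
}

/*
 * Fast path: forced inline into every caller, reduced to one
 * predictable test that falls through in the common case.
 */
static fastpath_inline bool cache_pre_hook(struct cache *c, unsigned long gfp)
{
	if (likely(!(gfp & GFP_ACCOUNT) && !(c->flags & CACHE_ACCOUNT)))
		return true;

	return likely(cache_pre_hook_slow(c, gfp));
}

int main(void)
{
	struct cache c = { .flags = 0 };

	printf("fast path taken: %d\n", cache_pre_hook(&c, 0));
	printf("slow path taken: %d\n", cache_pre_hook(&c, GFP_ACCOUNT));
	return 0;
}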
diff --git a/mm/slub.c b/mm/slub.c
index 5683f1d02e4f..77d259f3d592 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1866,25 +1866,17 @@ static inline size_t obj_full_size(struct kmem_cache *s)
 /*
  * Returns false if the allocation should fail.
  */
-static inline bool memcg_slab_pre_alloc_hook(struct kmem_cache *s,
-					     struct list_lru *lru,
-					     struct obj_cgroup **objcgp,
-					     size_t objects, gfp_t flags)
+static bool __memcg_slab_pre_alloc_hook(struct kmem_cache *s,
+					struct list_lru *lru,
+					struct obj_cgroup **objcgp,
+					size_t objects, gfp_t flags)
 {
-	struct obj_cgroup *objcg;
-
-	if (!memcg_kmem_online())
-		return true;
-
-	if (!(flags & __GFP_ACCOUNT) && !(s->flags & SLAB_ACCOUNT))
-		return true;
-
 	/*
 	 * The obtained objcg pointer is safe to use within the current scope,
 	 * defined by current task or set_active_memcg() pair.
 	 * obj_cgroup_get() is used to get a permanent reference.
 	 */
-	objcg = current_obj_cgroup();
+	struct obj_cgroup *objcg = current_obj_cgroup();
 	if (!objcg)
 		return true;
 
@@ -1907,17 +1899,34 @@ static inline bool memcg_slab_pre_alloc_hook(struct kmem_cache *s,
 	return true;
 }
 
-static inline void memcg_slab_post_alloc_hook(struct kmem_cache *s,
-					      struct obj_cgroup *objcg,
-					      gfp_t flags, size_t size,
-					      void **p)
+/*
+ * Returns false if the allocation should fail.
+ */
+static __fastpath_inline
+bool memcg_slab_pre_alloc_hook(struct kmem_cache *s, struct list_lru *lru,
+			       struct obj_cgroup **objcgp, size_t objects,
+			       gfp_t flags)
+{
+	if (!memcg_kmem_online())
+		return true;
+
+	if (likely(!(flags & __GFP_ACCOUNT) && !(s->flags & SLAB_ACCOUNT)))
+		return true;
+
+	return likely(__memcg_slab_pre_alloc_hook(s, lru, objcgp, objects,
+						  flags));
+}
+
+static void __memcg_slab_post_alloc_hook(struct kmem_cache *s,
+					 struct obj_cgroup *objcg,
+					 gfp_t flags, size_t size,
+					 void **p)
 {
 	struct slab *slab;
 	unsigned long off;
 	size_t i;
 
-	if (!memcg_kmem_online() || !objcg)
-		return;
+	flags &= gfp_allowed_mask;
 
 	for (i = 0; i < size; i++) {
 		if (likely(p[i])) {
@@ -1940,6 +1949,16 @@ static inline void memcg_slab_post_alloc_hook(struct kmem_cache *s,
 	}
 }
 
+static __fastpath_inline
+void memcg_slab_post_alloc_hook(struct kmem_cache *s, struct obj_cgroup *objcg,
+				gfp_t flags, size_t size, void **p)
+{
+	if (likely(!memcg_kmem_online() || !objcg))
+		return;
+
+	return __memcg_slab_post_alloc_hook(s, objcg, flags, size, p);
+}
+
 static inline void memcg_slab_free_hook(struct kmem_cache *s, struct slab *slab,
 					void **p, int objects)
 {
@@ -3709,34 +3728,34 @@ noinline int should_failslab(struct kmem_cache *s, gfp_t gfpflags)
 }
 ALLOW_ERROR_INJECTION(should_failslab, ERRNO);
 
-static inline struct kmem_cache *slab_pre_alloc_hook(struct kmem_cache *s,
-						     struct list_lru *lru,
-						     struct obj_cgroup **objcgp,
-						     size_t size, gfp_t flags)
+static __fastpath_inline
+struct kmem_cache *slab_pre_alloc_hook(struct kmem_cache *s,
+				       struct list_lru *lru,
+				       struct obj_cgroup **objcgp,
+				       size_t size, gfp_t flags)
 {
 	flags &= gfp_allowed_mask;
 
 	might_alloc(flags);
 
-	if (should_failslab(s, flags))
+	if (unlikely(should_failslab(s, flags)))
 		return NULL;
 
-	if (!memcg_slab_pre_alloc_hook(s, lru, objcgp, size, flags))
+	if (unlikely(!memcg_slab_pre_alloc_hook(s, lru, objcgp, size, flags)))
 		return NULL;
 
 	return s;
 }
 
-static inline void slab_post_alloc_hook(struct kmem_cache *s,
-					struct obj_cgroup *objcg, gfp_t flags,
-					size_t size, void **p, bool init,
-					unsigned int orig_size)
+static __fastpath_inline
+void slab_post_alloc_hook(struct kmem_cache *s, struct obj_cgroup *objcg,
+			  gfp_t flags, size_t size, void **p, bool init,
+			  unsigned int orig_size)
 {
 	unsigned int zero_size = s->object_size;
 	bool kasan_init = init;
 	size_t i;
-
-	flags &= gfp_allowed_mask;
+	gfp_t init_flags = flags & gfp_allowed_mask;
 
 	/*
 	 * For kmalloc object, the allocated memory size(object_size) is likely
@@ -3769,13 +3788,13 @@ static inline void slab_post_alloc_hook(struct kmem_cache *s,
 	 * As p[i] might get tagged, memset and kmemleak hook come after KASAN.
 	 */
 	for (i = 0; i < size; i++) {
-		p[i] = kasan_slab_alloc(s, p[i], flags, kasan_init);
+		p[i] = kasan_slab_alloc(s, p[i], init_flags, kasan_init);
 		if (p[i] && init && (!kasan_init ||
 				     !kasan_has_integrated_init()))
 			memset(p[i], 0, zero_size);
 		kmemleak_alloc_recursive(p[i], s->object_size, 1,
-					 s->flags, flags);
-		kmsan_slab_alloc(s, p[i], flags);
+					 s->flags, init_flags);
+		kmsan_slab_alloc(s, p[i], init_flags);
 	}
 
 	memcg_slab_post_alloc_hook(s, objcg, flags, size, p);
@@ -3799,7 +3818,7 @@ static __fastpath_inline void *slab_alloc_node(struct kmem_cache *s, struct list
 	bool init = false;
 
 	s = slab_pre_alloc_hook(s, lru, &objcg, 1, gfpflags);
-	if (!s)
+	if (unlikely(!s))
 		return NULL;
 
 	object = kfence_alloc(s, orig_size, gfpflags);
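As a final note on the likely()/unlikely() hints added above: the kernel
macros expand to __builtin_expect(), which compilers typically use to
lay out the expected outcome as straight-line fall-through code and to
move the cold branch out of the hot path. A small standalone sketch
follows; process() and its NULL check are invented for this demo, not
taken from the patch. Comparing `gcc -O2 -S` output with and without the
hint shows the error block moving out of the fall-through path.

#include <stdio.h>

#define unlikely(x)	__builtin_expect(!!(x), 0)

/* invented example function, not a kernel API */
int process(const int *p)
{
	if (unlikely(p == NULL)) {
		/*
		 * Cold error path: with the hint, compilers typically
		 * place this block away from the hot code.
		 */
		fprintf(stderr, "NULL input\n");
		return -1;
	}
	return *p + 1;	/* hot path: falls through with no taken branch */
}

int main(void)
{
	int v = 41;

	printf("%d\n", process(&v));
	return 0;
}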