From: Uros Bizjak
To: x86@kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Uros Bizjak, Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, "H. Peter Anvin", Dennis Zhou, Tejun Heo, Christoph Lameter, "Peter Zijlstra (Intel)"
Subject: [PATCH v2 2/2] x86/locking: Use asm_inline for {,try_}cmpxchg{64,128} emulations
Date: Fri, 14 Feb 2025 16:07:47 +0100
Message-ID: <20250214150929.5780-2-ubizjak@gmail.com>
In-Reply-To: <20250214150929.5780-1-ubizjak@gmail.com>

According to [1], asm pseudo directives in an asm template can confuse the compiler into misestimating the size of the generated code.

The ALTERNATIVE macro expands to several asm pseudo directives. Its use in {,try_}cmpxchg{64,128} therefore throws the instruction length estimate off by an order of magnitude: a specially instrumented compiler reports an estimated length of more than 20 instructions for these asm templates. This wrong estimate in turn leads to suboptimal inlining decisions, suboptimal instruction scheduling and suboptimal code block alignment in functions that use these locking primitives.

Use asm_inline [2] instead of plain asm. This feature makes GCC treat some inline assembler code as tiny when it would otherwise consider it huge. For code size estimation, the size of the asm is then taken as the minimum size of one instruction, regardless of how many instructions the compiler thinks it contains.

The effect of this patch on the x86_64 target is minor, since 128-bit functions are rarely used on this target.
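The sketch below gives a rough illustration of the problem described in [1]. It models the size heuristic from the GCC documentation: an asm statement is estimated at one instruction per logical line, where lines are separated by newlines or by the assembler's statement separator (';' is assumed here). With asm inline, the statement is counted at the minimum size. The names model_asm_insn_count(), model_estimate() and alt_like_tmpl are hypothetical. The template only loosely resembles what ALTERNATIVE() expands to and is not its exact expansion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical, simplified model of the heuristic in [1]: one instruction
 * per logical line, lines separated by '\n' or ';' (assumed separator).
 */
static int model_asm_insn_count(const char *tmpl)
{
	int count = 1;

	assert(tmpl != NULL);
	if (*tmpl == '\0')
		return 0;
	for (; *tmpl != '\0'; tmpl++)
		if (*tmpl == '\n' || *tmpl == ';')
			count++;
	return count;
}

/* With asm_inline the size is taken as the minimum: one instruction. */
static int model_estimate(const char *tmpl, bool is_asm_inline)
{
	return is_asm_inline ? 1 : model_asm_insn_count(tmpl);
}

/*
 * Hypothetical template loosely resembling an ALTERNATIVE() expansion:
 * only the call is emitted in place; everything else is labels and pseudo
 * directives that fill other sections.
 */
static const char alt_like_tmpl[] =
	"661:\n"
	"\tcall cmpxchg8b_emu\n"
	"662:\n"
	".pushsection .altinstructions,\"a\"\n"
	"\t.long 661b - .\n"
	"\t.long 6641f - .\n"
	"\t.4byte 0\n"
	"\t.byte 662b-661b\n"
	"\t.byte 6651f-6641f\n"
	".popsection\n"
	".pushsection .altinstr_replacement,\"ax\"\n"
	"6641:\n"
	"\tlock cmpxchg8b (%esi)\n"
	"6651:\n"
	".popsection";
```

Under this model, plain asm counts the template as 15 instructions even though only one instruction lands in the function body, while asm_inline counts it as one. The instrumented compiler mentioned above reports more than 20 for the real templates.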
On x86_64, the code size of the resulting defconfig object file stays the same:

      text     data     bss      dec     hex filename
  27456612  4638523  814148 32909283 1f627e3 vmlinux-old.o
  27456612  4638523  814148 32909283 1f627e3 vmlinux-new.o

However, the patch has a minor effect on code layout because scheduling decisions differ in functions that use the changed macros.

There is no effect on the x86_32 target. Both the code size of the resulting defconfig object file and the code layout stay the same:

      text     data     bss      dec     hex filename
  18883870  2679275 1707916 23271061 1631695 vmlinux-old.o
  18883870  2679275 1707916 23271061 1631695 vmlinux-new.o

[1] https://gcc.gnu.org/onlinedocs/gcc/Size-of-an-asm.html
[2] https://gcc.gnu.org/pipermail/gcc-patches/2018-December/512349.html

Signed-off-by: Uros Bizjak
Cc: Thomas Gleixner
Cc: Ingo Molnar
Cc: Borislav Petkov
Cc: Dave Hansen
Cc: "H. Peter Anvin"
Cc: Dennis Zhou
Cc: Tejun Heo
Cc: Christoph Lameter
Cc: "Peter Zijlstra (Intel)"
---
v2: Expand the commit message with an explanation of asm_inline, an explanation of the benefits of the patch, and some code size measurements.
---
 arch/x86/include/asm/cmpxchg_32.h | 32 +++++++------
 arch/x86/include/asm/percpu.h     | 77 +++++++++++++++----------------
 2 files changed, 55 insertions(+), 54 deletions(-)

diff --git a/arch/x86/include/asm/cmpxchg_32.h b/arch/x86/include/asm/cmpxchg_32.h
index fd1282a783dd..95b5f990ca88 100644
--- a/arch/x86/include/asm/cmpxchg_32.h
+++ b/arch/x86/include/asm/cmpxchg_32.h
@@ -91,12 +91,14 @@ static __always_inline bool __try_cmpxchg64_local(volatile u64 *ptr, u64 *oldp,
 	union __u64_halves o = { .full = (_old), }, \
 			   n = { .full = (_new), }; \
 	\
-	asm volatile(ALTERNATIVE(_lock_loc \
-				 "call cmpxchg8b_emu", \
-				 _lock "cmpxchg8b %a[ptr]", X86_FEATURE_CX8) \
-		     : ALT_OUTPUT_SP("+a" (o.low), "+d" (o.high)) \
-		     : "b" (n.low), "c" (n.high), [ptr] "S" (_ptr) \
-		     : "memory"); \
+	asm_inline volatile( \
+		ALTERNATIVE(_lock_loc \
+			    "call cmpxchg8b_emu", \
+			    _lock "cmpxchg8b %a[ptr]", X86_FEATURE_CX8) \
+		: ALT_OUTPUT_SP("+a" (o.low), "+d" (o.high)) \
+		: "b" (n.low), "c" (n.high), \
+		  [ptr] "S" (_ptr) \
+		: "memory"); \
 	\
 	o.full; \
 })
@@ -119,14 +121,16 @@ static __always_inline u64 arch_cmpxchg64_local(volatile u64 *ptr, u64 old, u64
 			   n = { .full = (_new), }; \
 	bool ret; \
 	\
-	asm volatile(ALTERNATIVE(_lock_loc \
-				 "call cmpxchg8b_emu", \
-				 _lock "cmpxchg8b %a[ptr]", X86_FEATURE_CX8) \
-		     CC_SET(e) \
-		     : ALT_OUTPUT_SP(CC_OUT(e) (ret), \
-				     "+a" (o.low), "+d" (o.high)) \
-		     : "b" (n.low), "c" (n.high), [ptr] "S" (_ptr) \
-		     : "memory"); \
+	asm_inline volatile( \
+		ALTERNATIVE(_lock_loc \
+			    "call cmpxchg8b_emu", \
+			    _lock "cmpxchg8b %a[ptr]", X86_FEATURE_CX8) \
+		CC_SET(e) \
+		: ALT_OUTPUT_SP(CC_OUT(e) (ret), \
+				"+a" (o.low), "+d" (o.high)) \
+		: "b" (n.low), "c" (n.high), \
+		  [ptr] "S" (_ptr) \
+		: "memory"); \
 	\
 	if (unlikely(!ret)) \
 		*(_oldp) = o.full; \
diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
index 0ab991fba7de..08f5f61690b7 100644
--- a/arch/x86/include/asm/percpu.h
+++ b/arch/x86/include/asm/percpu.h
@@ -348,15 +348,14 @@ do { \
 	old__.var = _oval; \
 	new__.var = _nval; \
 	\
-	asm qual (ALTERNATIVE("call this_cpu_cmpxchg8b_emu", \
-			      "cmpxchg8b " __percpu_arg([var]), X86_FEATURE_CX8) \
-		  : ALT_OUTPUT_SP([var] "+m" (__my_cpu_var(_var)), \
-				  "+a" (old__.low), \
-				  "+d" (old__.high)) \
-		  : "b" (new__.low), \
-		    "c" (new__.high), \
-		    "S" (&(_var)) \
-		  : "memory"); \
+	asm_inline qual ( \
+		ALTERNATIVE("call this_cpu_cmpxchg8b_emu", \
+			    "cmpxchg8b " __percpu_arg([var]), X86_FEATURE_CX8) \
+		: ALT_OUTPUT_SP([var] "+m" (__my_cpu_var(_var)), \
+				"+a" (old__.low), "+d" (old__.high)) \
+		: "b" (new__.low), "c" (new__.high), \
+		  "S" (&(_var)) \
+		: "memory"); \
 	\
 	old__.var; \
 })
@@ -378,17 +377,16 @@ do { \
 	old__.var = *_oval; \
 	new__.var = _nval; \
 	\
-	asm qual (ALTERNATIVE("call this_cpu_cmpxchg8b_emu", \
-			      "cmpxchg8b " __percpu_arg([var]), X86_FEATURE_CX8) \
-		  CC_SET(z) \
-		  : ALT_OUTPUT_SP(CC_OUT(z) (success), \
-				  [var] "+m" (__my_cpu_var(_var)), \
-				  "+a" (old__.low), \
-				  "+d" (old__.high)) \
-		  : "b" (new__.low), \
-		    "c" (new__.high), \
-		    "S" (&(_var)) \
-		  : "memory"); \
+	asm_inline qual ( \
+		ALTERNATIVE("call this_cpu_cmpxchg8b_emu", \
+			    "cmpxchg8b " __percpu_arg([var]), X86_FEATURE_CX8) \
+		CC_SET(z) \
+		: ALT_OUTPUT_SP(CC_OUT(z) (success), \
+				[var] "+m" (__my_cpu_var(_var)), \
+				"+a" (old__.low), "+d" (old__.high)) \
+		: "b" (new__.low), "c" (new__.high), \
+		  "S" (&(_var)) \
+		: "memory"); \
 	if (unlikely(!success)) \
 		*_oval = old__.var; \
 	\
@@ -419,15 +417,14 @@ do { \
 	old__.var = _oval; \
 	new__.var = _nval; \
 	\
-	asm qual (ALTERNATIVE("call this_cpu_cmpxchg16b_emu", \
-			      "cmpxchg16b " __percpu_arg([var]), X86_FEATURE_CX16) \
-		  : ALT_OUTPUT_SP([var] "+m" (__my_cpu_var(_var)), \
-				  "+a" (old__.low), \
-				  "+d" (old__.high)) \
-		  : "b" (new__.low), \
-		    "c" (new__.high), \
-		    "S" (&(_var)) \
-		  : "memory"); \
+	asm_inline qual ( \
+		ALTERNATIVE("call this_cpu_cmpxchg16b_emu", \
+			    "cmpxchg16b " __percpu_arg([var]), X86_FEATURE_CX16) \
+		: ALT_OUTPUT_SP([var] "+m" (__my_cpu_var(_var)), \
+				"+a" (old__.low), "+d" (old__.high)) \
+		: "b" (new__.low), "c" (new__.high), \
+		  "S" (&(_var)) \
+		: "memory"); \
 	\
 	old__.var; \
 })
@@ -449,19 +446,19 @@ do { \
 	old__.var = *_oval; \
 	new__.var = _nval; \
 	\
-	asm qual (ALTERNATIVE("call this_cpu_cmpxchg16b_emu", \
-			      "cmpxchg16b " __percpu_arg([var]), X86_FEATURE_CX16) \
-		  CC_SET(z) \
-		  : ALT_OUTPUT_SP(CC_OUT(z) (success), \
-				  [var] "+m" (__my_cpu_var(_var)), \
-				  "+a" (old__.low), \
-				  "+d" (old__.high)) \
-		  : "b" (new__.low), \
-		    "c" (new__.high), \
-		    "S" (&(_var)) \
-		  : "memory"); \
+	asm_inline qual ( \
+		ALTERNATIVE("call this_cpu_cmpxchg16b_emu", \
+			    "cmpxchg16b " __percpu_arg([var]), X86_FEATURE_CX16) \
+		CC_SET(z) \
+		: ALT_OUTPUT_SP(CC_OUT(z) (success), \
+				[var] "+m" (__my_cpu_var(_var)), \
+				"+a" (old__.low), "+d" (old__.high)) \
+		: "b" (new__.low), "c" (new__.high), \
+		  "S" (&(_var)) \
+		: "memory"); \
 	if (unlikely(!success)) \
 		*_oval = old__.var; \
+	\
 	likely(success); \
 })

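The operand constraints in these macros map the 64-bit old and new values onto the register pairs that CMPXCHG8B uses. EDX:EAX holds the expected value, ECX:EBX holds the new value, and ZF reports success. The sketch below is a hypothetical, single-threaded C model of those semantics. It is not atomic and assumes the little-endian layout of x86. The names union u64_halves, model_cmpxchg8b() and model_try_cmpxchg64() are illustrative and are not kernel APIs. The model also shows why the try_ variant copies o.full back to *_oldp on failure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the kernel's union __u64_halves; assumes little-endian (x86). */
union u64_halves {
	uint64_t full;
	struct {
		uint32_t low, high;
	};
};

/*
 * Single-threaded, non-atomic model of CMPXCHG8B: compare *ptr with
 * EDX:EAX; if equal, store ECX:EBX and set ZF, otherwise load *ptr into
 * EDX:EAX and clear ZF. The return value plays the role of ZF.
 */
static bool model_cmpxchg8b(uint64_t *ptr, uint32_t *eax, uint32_t *edx,
			    uint32_t ebx, uint32_t ecx)
{
	uint64_t expected = ((uint64_t)*edx << 32) | *eax;

	if (*ptr == expected) {
		*ptr = ((uint64_t)ecx << 32) | ebx;
		return true;
	}
	*eax = (uint32_t)*ptr;
	*edx = (uint32_t)(*ptr >> 32);
	return false;
}

/*
 * Model of the try_cmpxchg64 pattern in cmpxchg_32.h: old/new are split into
 * halves matching the "+a"/"+d" and "b"/"c" constraints; on failure the value
 * actually found in memory is written back to *oldp.
 */
static bool model_try_cmpxchg64(uint64_t *ptr, uint64_t *oldp, uint64_t new_val)
{
	union u64_halves o, n;
	bool ret;

	assert(ptr != NULL && oldp != NULL);
	o.full = *oldp;
	n.full = new_val;
	ret = model_cmpxchg8b(ptr, &o.low, &o.high, n.low, n.high);
	if (!ret)
		*oldp = o.full;
	return ret;
}
```

A failed call leaves memory untouched and refreshes the caller's expected value, so a retry loop can reuse that value directly.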