From patchwork Tue Mar 14 15:36:57 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mark Rutland X-Patchwork-Id: 13174659 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id DE7ADC6FD1C for ; Tue, 14 Mar 2023 15:38:03 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=bombadil.20210309; h=Sender: Content-Transfer-Encoding:Content-Type:List-Subscribe:List-Help:List-Post: List-Archive:List-Unsubscribe:List-Id:MIME-Version:References:In-Reply-To: Message-Id:Date:Subject:Cc:To:From:Reply-To:Content-ID:Content-Description: Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID: List-Owner; bh=uMFaaKsd5lXOgVosuXX2w+ny7+JXBMKJCsF99S91p8k=; b=nI35fe+HF5yI5f Ih5VD8XapNWAOvXU01c9dE378/Z+iofqYkXv7CV/X9zTqMKiBbyv+S4uulZPfqyppqIeOJccyaOvI YK7SekS/esGddnTk0ngIvIJWhnb4hf7+cp/7wImcI/5aUYe8L1i53KxSXFWGc9gVX46l6FCE6Curz 2tPb9n0c0JvEt5N7DXIQCifVdb+WAuC4JNXaBj7BFiP6xnIV0iig5wT6i5ug9lwdXxB0zj9vRNAys 7QF8a63NbrLQJ8KMmPchKzLM3K5kHSvrzH1nJWAuuf2Vlw6ryLUDC0oavSXo5n1y1xBzweUOO0dtz snkjEZKVANfkUQdF0O1Q==; Received: from localhost ([::1] helo=bombadil.infradead.org) by bombadil.infradead.org with esmtp (Exim 4.96 #2 (Red Hat Linux)) id 1pc6iA-00AdQu-1u; Tue, 14 Mar 2023 15:37:14 +0000 Received: from foss.arm.com ([217.140.110.172]) by bombadil.infradead.org with esmtp (Exim 4.96 #2 (Red Hat Linux)) id 1pc6i6-00AdOD-0M for linux-arm-kernel@lists.infradead.org; Tue, 14 Mar 2023 15:37:11 +0000 Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id C9EBB150C; Tue, 14 Mar 2023 08:37:52 -0700 (PDT) Received: from lakrids.cambridge.arm.com (usa-sjc-imap-foss1.foss.arm.com [10.121.207.14]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPA id 44C323F67D; Tue, 14 Mar 2023 08:37:08 -0700 (PDT) From: Mark Rutland To: linux-arm-kernel@lists.infradead.org Cc: catalin.marinas@arm.com, mark.rutland@arm.com, peterz@infradead.org, robin.murphy@arm.com, will@kernel.org Subject: [PATCH v2 1/4] arm64: atomics: lse: improve cmpxchg implementation Date: Tue, 14 Mar 2023 15:36:57 +0000 Message-Id: <20230314153700.787701-2-mark.rutland@arm.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20230314153700.787701-1-mark.rutland@arm.com> References: <20230314153700.787701-1-mark.rutland@arm.com> MIME-Version: 1.0 X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20230314_083710_254239_35F2BEB7 X-CRM114-Status: GOOD ( 15.19 ) X-BeenThere: linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: "linux-arm-kernel" Errors-To: linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org For historical reasons, the LSE implementation of cmpxchg*() hard-codes the GPRs to use, and shuffles registers around with MOVs. This is no longer necessary, and can be simplified. When the LSE cmpxchg implementation was added in commit: c342f78217e822d2 ("arm64: cmpxchg: patch in lse instructions when supported by the CPU") ... the LL/SC implementation of cmpxchg() would be placed out-of-line, and the in-line assembly for cmpxchg would default to: NOP BL NOP The LL/SC implementation of each cmpxchg() function accepted arguments as per AAPCS64 rules, to it was necessary to place the pointer in x0, the older value in X1, and the new value in x2, and acquire the return value from x0. The LL/SC implementation required a temporary register (e.g. for the STXR status value). As the LL/SC implementation preserved the old value, the LSE implementation does likewise. Since commit: addfc38672c73efd ("arm64: atomics: avoid out-of-line ll/sc atomics") ... the LSE and LL/SC implementations of cmpxchg are inlined as separate asm blocks, with another branch choosing between thw two. Due to this, it is no longer necessary for the LSE implementation to match the register constraints of the LL/SC implementation. This was partially dealt with by removing the hard-coded use of x30 in commit: 3337cb5aea594e40 ("arm64: avoid using hard-coded registers for LSE atomics") ... but we didn't clean up the hard-coding of x0, x1, and x2. This patch simplifies the LSE implementation of cmpxchg, removing the register shuffling and directly clobbering the 'old' argument. This gives the compiler greater freedom for register allocation, and avoids redundant work. The new constraints permit 'old' (Rs) and 'new' (Rt) to be allocated to the same register when the initial values of the two are the same, e.g. resulting in: CAS X0, X0, [X1] This is safe as Rs is only written back after the initial values of Rs and Rt are consumed, and there are no UNPREDICTABLE behaviours to avoid when Rs == Rt. The new constraints also permit 'new' to be allocated to the zero register, avoiding a MOV in a few cases. The same cannot be done for 'old' as it is both an input and output, and any caller of cmpxchg() should care about the output value. Note that for CAS* the use of the zero register never affects the ordering (while for SWP* the use of the zero regsiter for the 'old' value drops any ACQUIRE semantic). Compared to v6.2-rc4, a defconfig vmlinux is ~116KiB smaller, though the resulting Image is the same size due to internal alignment and padding: [mark@lakrids:~/src/linux]% ls -al vmlinux-* -rwxr-xr-x 1 mark mark 137269304 Jan 16 11:59 vmlinux-after -rwxr-xr-x 1 mark mark 137387936 Jan 16 10:54 vmlinux-before [mark@lakrids:~/src/linux]% ls -al Image-* -rw-r--r-- 1 mark mark 38711808 Jan 16 11:59 Image-after -rw-r--r-- 1 mark mark 38711808 Jan 16 10:54 Image-before This patch does not touch cmpxchg_double*() as that requires contiguous register pairs, and separate patches will replace it with cmpxchg128*(). There should be no functional change as a result of this patch. Signed-off-by: Mark Rutland Cc: Catalin Marinas Cc: Peter Zijlstra Cc: Robin Murphy Cc: Will Deacon --- arch/arm64/include/asm/atomic_lse.h | 17 +++++------------ 1 file changed, 5 insertions(+), 12 deletions(-) diff --git a/arch/arm64/include/asm/atomic_lse.h b/arch/arm64/include/asm/atomic_lse.h index a94d6dacc029..319958b95cfd 100644 --- a/arch/arm64/include/asm/atomic_lse.h +++ b/arch/arm64/include/asm/atomic_lse.h @@ -251,22 +251,15 @@ __lse__cmpxchg_case_##name##sz(volatile void *ptr, \ u##sz old, \ u##sz new) \ { \ - register unsigned long x0 asm ("x0") = (unsigned long)ptr; \ - register u##sz x1 asm ("x1") = old; \ - register u##sz x2 asm ("x2") = new; \ - unsigned long tmp; \ - \ asm volatile( \ __LSE_PREAMBLE \ - " mov %" #w "[tmp], %" #w "[old]\n" \ - " cas" #mb #sfx "\t%" #w "[tmp], %" #w "[new], %[v]\n" \ - " mov %" #w "[ret], %" #w "[tmp]" \ - : [ret] "+r" (x0), [v] "+Q" (*(u##sz *)ptr), \ - [tmp] "=&r" (tmp) \ - : [old] "r" (x1), [new] "r" (x2) \ + " cas" #mb #sfx " %" #w "[old], %" #w "[new], %[v]\n" \ + : [v] "+Q" (*(u##sz *)ptr), \ + [old] "+r" (old) \ + : [new] "rZ" (new) \ : cl); \ \ - return x0; \ + return old; \ } __CMPXCHG_CASE(w, b, , 8, )