From patchwork Mon Feb 6 11:58:52 2023
X-Patchwork-Submitter: Mark Rutland
X-Patchwork-Id: 13129727
From: Mark Rutland
To: linux-arm-kernel@lists.infradead.org
Cc: catalin.marinas@arm.com, mark.rutland@arm.com, peterz@infradead.org, will@kernel.org
Subject: [PATCH] arm64: atomics: lse: improve cmpxchg implementation
Date: Mon, 6 Feb 2023 11:58:52 +0000
Message-Id: <20230206115852.265006-1-mark.rutland@arm.com>

For historical reasons, the LSE implementation of cmpxchg*() hard-codes
the GPRs to use, and shuffles registers around with MOVs. This is no
longer necessary, and can be simplified.

When the LSE cmpxchg implementation was added in commit:

  c342f78217e822d2 ("arm64: cmpxchg: patch in lse instructions when
  supported by the CPU")
... the LL/SC implementation of cmpxchg() would be placed out-of-line, and
the in-line assembly for cmpxchg would default to:

	NOP
	BL	<out-of-line ll/sc implementation>
	NOP

The LL/SC implementation of each cmpxchg() function accepted arguments as
per AAPCS64 rules, so it was necessary to place the pointer in x0, the old
value in x1, and the new value in x2, and acquire the return value from
x0. The LL/SC implementation required a temporary register (e.g. for the
STXR status value). As the LL/SC implementation preserved the old value,
the LSE implementation does likewise.

Since commit:

  addfc38672c73efd ("arm64: atomics: avoid out-of-line ll/sc atomics")

... the LSE and LL/SC implementations of cmpxchg are inlined as separate
asm blocks, with another branch choosing between the two. Due to this, it
is no longer necessary for the LSE implementation to match the register
constraints of the LL/SC implementation. This was partially dealt with by
removing the hard-coded use of x30 in commit:

  3337cb5aea594e40 ("arm64: avoid using hard-coded registers for LSE
  atomics")

... but we didn't clean up the hard-coding of x0, x1, and x2.

This patch simplifies the LSE implementation of cmpxchg, removing the
register shuffling and directly clobbering the 'old' argument. This gives
the compiler greater freedom for register allocation, and avoids redundant
work.

The new constraints permit 'old' (Rs) and 'new' (Rt) to be allocated to
the same register when the initial values of the two are the same, e.g.
resulting in:

	CAS	X0, X0, [X1]

This is safe as Rs is only written back after the initial values of Rs and
Rt are consumed, and there are no UNPREDICTABLE behaviours to avoid when
Rs == Rt.

Compared to v6.2-rc4, a defconfig vmlinux is ~116KiB smaller, though the
resulting Image is the same size due to internal alignment and padding:

  [mark@lakrids:~/src/linux]% ls -al vmlinux-*
  -rwxr-xr-x 1 mark mark 137269304 Jan 16 11:59 vmlinux-after
  -rwxr-xr-x 1 mark mark 137387936 Jan 16 10:54 vmlinux-before
  [mark@lakrids:~/src/linux]% ls -al Image-*
  -rw-r--r-- 1 mark mark 38711808 Jan 16 11:59 Image-after
  -rw-r--r-- 1 mark mark 38711808 Jan 16 10:54 Image-before

This patch does not touch cmpxchg_double*() as that requires contiguous
register pairs, and separate patches will replace it with cmpxchg128*().

There should be no functional change as a result of this patch.

Signed-off-by: Mark Rutland
Cc: Catalin Marinas
Cc: Peter Zijlstra
Cc: Will Deacon
---
 arch/arm64/include/asm/atomic_lse.h | 17 +++++------------
 1 file changed, 5 insertions(+), 12 deletions(-)

I spotted this when looking at Peter's cmpxchg128() series; this should be
independent and I believe it shouldn't conflict.

Mark.
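As an illustration of the Rs == Rt case mentioned above, here is a
hypothetical caller (not part of this patch; the function name and use
case are invented for illustration) where 'old' and 'new' start out equal,
so with the relaxed constraints the compiler is free to allocate both
operands to a single register, e.g. emitting "CAS X0, X0, [X1]":

	#include <linux/atomic.h>
	#include <linux/types.h>

	/*
	 * Hypothetical example: cmpxchg() returns the value previously held
	 * in *slot, so this reports whether *slot still contains 'val'. As
	 * 'old' and 'new' are the same value, the compiler may now pass both
	 * in one register.
	 */
	static inline bool slot_unchanged(u64 *slot, u64 val)
	{
		return cmpxchg(slot, val, val) == val;
	}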
diff --git a/arch/arm64/include/asm/atomic_lse.h b/arch/arm64/include/asm/atomic_lse.h
index a94d6dacc029..5c964259db1f 100644
--- a/arch/arm64/include/asm/atomic_lse.h
+++ b/arch/arm64/include/asm/atomic_lse.h
@@ -251,22 +251,15 @@ __lse__cmpxchg_case_##name##sz(volatile void *ptr,			\
 					      u##sz old,		\
 					      u##sz new)		\
 {									\
-	register unsigned long x0 asm ("x0") = (unsigned long)ptr;	\
-	register u##sz x1 asm ("x1") = old;				\
-	register u##sz x2 asm ("x2") = new;				\
-	unsigned long tmp;						\
-									\
 	asm volatile(							\
 	__LSE_PREAMBLE							\
-	"	mov	%" #w "[tmp], %" #w "[old]\n"			\
-	"	cas" #mb #sfx "\t%" #w "[tmp], %" #w "[new], %[v]\n"	\
-	"	mov	%" #w "[ret], %" #w "[tmp]"			\
-	: [ret] "+r" (x0), [v] "+Q" (*(u##sz *)ptr),			\
-	  [tmp] "=&r" (tmp)						\
-	: [old] "r" (x1), [new] "r" (x2)				\
+	"	cas" #mb #sfx "	%" #w "[old], %" #w "[new], %[v]\n"	\
+	: [v] "+Q" (*(u##sz *)ptr),					\
+	  [old] "+r" (old)						\
+	: [new] "r" (new)						\
 	: cl);								\
 									\
-	return x0;							\
+	return old;							\
 }

 __CMPXCHG_CASE(w, b,     ,  8,   )
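For reference, a hand expansion of the simplified helper for the relaxed
8-bit case generated by __CMPXCHG_CASE(w, b, , 8, ) above (illustrative
only, not actual preprocessor output; the real definition lives in
atomic_lse.h and depends on __LSE_PREAMBLE and the kernel's u8 type):

	static __always_inline u8
	__lse__cmpxchg_case_8(volatile void *ptr, u8 old, u8 new)
	{
		asm volatile(
		__LSE_PREAMBLE
		"	casb	%w[old], %w[new], %[v]\n"
		: [v] "+Q" (*(u8 *)ptr),	/* memory operand for the CAS */
		  [old] "+r" (old)		/* Rs: expected value in, observed value out */
		: [new] "r" (new));		/* Rt: value to store, only read */

		return old;
	}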