From patchwork Wed Apr 20 14:44:14 2022
X-Patchwork-Submitter: Guo Ren
X-Patchwork-Id: 12820394
From: guoren@kernel.org
To: guoren@kernel.org, arnd@arndb.de, palmer@dabbelt.com,
 mark.rutland@arm.com, will@kernel.org, peterz@infradead.org,
 boqun.feng@gmail.com, dlustig@nvidia.com, parri.andrea@gmail.com
Cc: linux-arch@vger.kernel.org, linux-kernel@vger.kernel.org,
 linux-riscv@lists.infradead.org, Guo Ren
Subject: [PATCH V3 2/5] riscv: atomic: Optimize acquire and release for AMO operations
Date: Wed, 20 Apr 2022 22:44:14 +0800
Message-Id: <20220420144417.2453958-3-guoren@kernel.org>
In-Reply-To: <20220420144417.2453958-1-guoren@kernel.org>
References: <20220420144417.2453958-1-guoren@kernel.org>

From: Guo Ren

The current acquire and release implementations from
atomic-arch-fallback.h use __atomic_acquire_fence() and
__atomic_release_fence(), which emit an extra "fence r, rw" after the
AMO instruction (acquire) or an extra "fence rw, w" before it
(release). RISC-V AMO instructions can carry the acquire and release
semantics in the instruction itself, which saves a fence instruction.

From RISC-V ISA section 10.4, "Atomic Memory Operations":

 To help implement multiprocessor synchronization, the AMOs
 optionally provide release consistency semantics.
  - .aq:   If the aq bit is set, then no later memory operations
           in this RISC-V hart can be observed to take place
           before the AMO.
  - .rl:   If the rl bit is set, then other RISC-V harts will not
           observe the AMO before memory accesses preceding the
           AMO in this RISC-V hart.
  - .aqrl: Setting both the aq and the rl bit on an AMO makes the
           sequence sequentially consistent, meaning that it cannot
           be reordered with earlier or later memory operations
           from the same hart.

Signed-off-by: Guo Ren
Signed-off-by: Guo Ren
Cc: Palmer Dabbelt
Cc: Mark Rutland
Cc: Andrea Parri
Cc: Dan Lustig
---
 arch/riscv/include/asm/atomic.h  | 64 ++++++++++++++++++++++++++++++++
 arch/riscv/include/asm/cmpxchg.h | 12 ++----
 2 files changed, 68 insertions(+), 8 deletions(-)

diff --git a/arch/riscv/include/asm/atomic.h b/arch/riscv/include/asm/atomic.h
index ac9bdf4fc404..20ce8b83bc18 100644
--- a/arch/riscv/include/asm/atomic.h
+++ b/arch/riscv/include/asm/atomic.h
@@ -99,6 +99,30 @@ c_type arch_atomic##prefix##_fetch_##op##_relaxed(c_type i, \
 	return ret; \
 } \
 static __always_inline \
+c_type arch_atomic##prefix##_fetch_##op##_acquire(c_type i, \
+					atomic##prefix##_t *v) \
+{ \
+	register c_type ret; \
+	__asm__ __volatile__ ( \
+		" amo" #asm_op "." #asm_type ".aq %1, %2, %0" \
+		: "+A" (v->counter), "=r" (ret) \
+		: "r" (I) \
+		: "memory"); \
+	return ret; \
+} \
+static __always_inline \
+c_type arch_atomic##prefix##_fetch_##op##_release(c_type i, \
+					atomic##prefix##_t *v) \
+{ \
+	register c_type ret; \
+	__asm__ __volatile__ ( \
+		" amo" #asm_op "." #asm_type ".rl %1, %2, %0" \
+		: "+A" (v->counter), "=r" (ret) \
+		: "r" (I) \
+		: "memory"); \
+	return ret; \
+} \
+static __always_inline \
 c_type arch_atomic##prefix##_fetch_##op(c_type i, atomic##prefix##_t *v) \
 { \
 	register c_type ret; \
@@ -118,6 +142,18 @@ c_type arch_atomic##prefix##_##op##_return_relaxed(c_type i, \
 	return arch_atomic##prefix##_fetch_##op##_relaxed(i, v) c_op I; \
 } \
 static __always_inline \
+c_type arch_atomic##prefix##_##op##_return_acquire(c_type i, \
+					atomic##prefix##_t *v) \
+{ \
+	return arch_atomic##prefix##_fetch_##op##_acquire(i, v) c_op I; \
+} \
+static __always_inline \
+c_type arch_atomic##prefix##_##op##_return_release(c_type i, \
+					atomic##prefix##_t *v) \
+{ \
+	return arch_atomic##prefix##_fetch_##op##_release(i, v) c_op I; \
+} \
+static __always_inline \
 c_type arch_atomic##prefix##_##op##_return(c_type i, atomic##prefix##_t *v) \
 { \
 	return arch_atomic##prefix##_fetch_##op(i, v) c_op I; \
@@ -140,22 +176,38 @@ ATOMIC_OPS(sub, add, +, -i)
 
 #define arch_atomic_add_return_relaxed arch_atomic_add_return_relaxed
 #define arch_atomic_sub_return_relaxed arch_atomic_sub_return_relaxed
+#define arch_atomic_add_return_acquire arch_atomic_add_return_acquire
+#define arch_atomic_sub_return_acquire arch_atomic_sub_return_acquire
+#define arch_atomic_add_return_release arch_atomic_add_return_release
+#define arch_atomic_sub_return_release arch_atomic_sub_return_release
 #define arch_atomic_add_return arch_atomic_add_return
 #define arch_atomic_sub_return arch_atomic_sub_return
 
 #define arch_atomic_fetch_add_relaxed arch_atomic_fetch_add_relaxed
 #define arch_atomic_fetch_sub_relaxed arch_atomic_fetch_sub_relaxed
+#define arch_atomic_fetch_add_acquire arch_atomic_fetch_add_acquire
+#define arch_atomic_fetch_sub_acquire arch_atomic_fetch_sub_acquire
+#define arch_atomic_fetch_add_release arch_atomic_fetch_add_release
+#define arch_atomic_fetch_sub_release arch_atomic_fetch_sub_release
 #define arch_atomic_fetch_add arch_atomic_fetch_add
 #define arch_atomic_fetch_sub arch_atomic_fetch_sub
 
 #ifndef CONFIG_GENERIC_ATOMIC64
 #define arch_atomic64_add_return_relaxed arch_atomic64_add_return_relaxed
 #define arch_atomic64_sub_return_relaxed arch_atomic64_sub_return_relaxed
+#define arch_atomic64_add_return_acquire arch_atomic64_add_return_acquire
+#define arch_atomic64_sub_return_acquire arch_atomic64_sub_return_acquire
+#define arch_atomic64_add_return_release arch_atomic64_add_return_release
+#define arch_atomic64_sub_return_release arch_atomic64_sub_return_release
 #define arch_atomic64_add_return arch_atomic64_add_return
 #define arch_atomic64_sub_return arch_atomic64_sub_return
 
 #define arch_atomic64_fetch_add_relaxed arch_atomic64_fetch_add_relaxed
 #define arch_atomic64_fetch_sub_relaxed arch_atomic64_fetch_sub_relaxed
+#define arch_atomic64_fetch_add_acquire arch_atomic64_fetch_add_acquire
+#define arch_atomic64_fetch_sub_acquire arch_atomic64_fetch_sub_acquire
+#define arch_atomic64_fetch_add_release arch_atomic64_fetch_add_release
+#define arch_atomic64_fetch_sub_release arch_atomic64_fetch_sub_release
 #define arch_atomic64_fetch_add arch_atomic64_fetch_add
 #define arch_atomic64_fetch_sub arch_atomic64_fetch_sub
 #endif
@@ -178,6 +230,12 @@ ATOMIC_OPS(xor, xor, i)
 #define arch_atomic_fetch_and_relaxed arch_atomic_fetch_and_relaxed
 #define arch_atomic_fetch_or_relaxed arch_atomic_fetch_or_relaxed
 #define arch_atomic_fetch_xor_relaxed arch_atomic_fetch_xor_relaxed
+#define arch_atomic_fetch_and_acquire arch_atomic_fetch_and_acquire
+#define arch_atomic_fetch_or_acquire arch_atomic_fetch_or_acquire
+#define arch_atomic_fetch_xor_acquire arch_atomic_fetch_xor_acquire
+#define arch_atomic_fetch_and_release arch_atomic_fetch_and_release
+#define arch_atomic_fetch_or_release arch_atomic_fetch_or_release
+#define arch_atomic_fetch_xor_release arch_atomic_fetch_xor_release
 #define arch_atomic_fetch_and arch_atomic_fetch_and
 #define arch_atomic_fetch_or arch_atomic_fetch_or
 #define arch_atomic_fetch_xor arch_atomic_fetch_xor
@@ -186,6 +244,12 @@ ATOMIC_OPS(xor, xor, i)
 #define arch_atomic64_fetch_and_relaxed arch_atomic64_fetch_and_relaxed
 #define arch_atomic64_fetch_or_relaxed arch_atomic64_fetch_or_relaxed
 #define arch_atomic64_fetch_xor_relaxed arch_atomic64_fetch_xor_relaxed
+#define arch_atomic64_fetch_and_acquire arch_atomic64_fetch_and_acquire
+#define arch_atomic64_fetch_or_acquire arch_atomic64_fetch_or_acquire
+#define arch_atomic64_fetch_xor_acquire arch_atomic64_fetch_xor_acquire
+#define arch_atomic64_fetch_and_release arch_atomic64_fetch_and_release
+#define arch_atomic64_fetch_or_release arch_atomic64_fetch_or_release
+#define arch_atomic64_fetch_xor_release arch_atomic64_fetch_xor_release
 #define arch_atomic64_fetch_and arch_atomic64_fetch_and
 #define arch_atomic64_fetch_or arch_atomic64_fetch_or
 #define arch_atomic64_fetch_xor arch_atomic64_fetch_xor
diff --git a/arch/riscv/include/asm/cmpxchg.h b/arch/riscv/include/asm/cmpxchg.h
index 12debce235e5..1af8db92250b 100644
--- a/arch/riscv/include/asm/cmpxchg.h
+++ b/arch/riscv/include/asm/cmpxchg.h
@@ -52,16 +52,14 @@
 	switch (size) { \
 	case 4: \
 		__asm__ __volatile__ ( \
-			" amoswap.w %0, %2, %1\n" \
-			RISCV_ACQUIRE_BARRIER \
+			" amoswap.w.aq %0, %2, %1\n" \
 			: "=r" (__ret), "+A" (*__ptr) \
 			: "r" (__new) \
 			: "memory"); \
 		break; \
 	case 8: \
 		__asm__ __volatile__ ( \
-			" amoswap.d %0, %2, %1\n" \
-			RISCV_ACQUIRE_BARRIER \
+			" amoswap.d.aq %0, %2, %1\n" \
 			: "=r" (__ret), "+A" (*__ptr) \
 			: "r" (__new) \
 			: "memory"); \
@@ -87,16 +85,14 @@
 	switch (size) { \
 	case 4: \
 		__asm__ __volatile__ ( \
-			RISCV_RELEASE_BARRIER \
-			" amoswap.w %0, %2, %1\n" \
+			" amoswap.w.rl %0, %2, %1\n" \
 			: "=r" (__ret), "+A" (*__ptr) \
 			: "r" (__new) \
 			: "memory"); \
 		break; \
 	case 8: \
 		__asm__ __volatile__ ( \
-			RISCV_RELEASE_BARRIER \
-			" amoswap.d %0, %2, %1\n" \
+			" amoswap.d.rl %0, %2, %1\n" \
 			: "=r" (__ret), "+A" (*__ptr) \
 			: "r" (__new) \
 			: "memory"); \
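
Illustration only, not part of the patch: a minimal user-space sketch of the
pattern the macros above expand to, assuming a 32-bit counter and GCC or
Clang. The demo_* names and demo_atomic_t are hypothetical stand-ins, not
kernel API. On a RISC-V target with the A extension the ordering is carried
by the .aq/.rl bit of the AMO itself, using the same operand layout as the
patch; on any other target the sketch falls back to the compiler's
__atomic_fetch_add() builtin so it still builds and runs. The self-check is
single-threaded, so it only confirms the returned values, not the ordering.

```c
#include <assert.h>

/* Hypothetical stand-in for the kernel's atomic_t; not the kernel type. */
typedef struct { int counter; } demo_atomic_t;

/* Acquire: later accesses on this hart are not observed before the AMO. */
static inline int demo_fetch_add_acquire(int i, demo_atomic_t *v)
{
#if defined(__riscv) && defined(__riscv_atomic)
	int ret;
	/* The .aq bit takes the place of a trailing "fence r, rw". */
	__asm__ __volatile__ (
		"amoadd.w.aq %1, %2, %0"
		: "+A" (v->counter), "=r" (ret)
		: "r" (i)
		: "memory");
	return ret;
#else
	/* Portable fallback: the compiler picks the target's sequence. */
	return __atomic_fetch_add(&v->counter, i, __ATOMIC_ACQUIRE);
#endif
}

/* Release: earlier accesses on this hart are observed before the AMO. */
static inline int demo_fetch_add_release(int i, demo_atomic_t *v)
{
#if defined(__riscv) && defined(__riscv_atomic)
	int ret;
	/* The .rl bit takes the place of a leading "fence rw, w". */
	__asm__ __volatile__ (
		"amoadd.w.rl %1, %2, %0"
		: "+A" (v->counter), "=r" (ret)
		: "r" (i)
		: "memory");
	return ret;
#else
	return __atomic_fetch_add(&v->counter, i, __ATOMIC_RELEASE);
#endif
}

/* As in the patch, the *_return form is the fetch form plus the operand. */
static inline int demo_add_return_release(int i, demo_atomic_t *v)
{
	return demo_fetch_add_release(i, v) + i;
}

/* Single-threaded sanity check of the returned values; 0 means success. */
int demo_selfcheck(void)
{
	demo_atomic_t v = { 40 };
	int old = demo_fetch_add_acquire(1, &v);  /* value before the add */
	int now = demo_add_return_release(1, &v); /* value after the add */

	assert(old == 40);
	assert(now == 42);
	return v.counter == 42 ? 0 : 1;
}
```

Comparing the disassembly of demo_fetch_add_acquire() with a variant built
from a relaxed AMO followed by an explicit fence should show the saved
instruction on a RISC-V toolchain.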