From patchwork Thu Nov 23 19:04:31 2023
X-Patchwork-Submitter: Leon Romanovsky
X-Patchwork-Id: 13466621
From: Leon Romanovsky
To: Jason Gunthorpe
Cc: Arnd Bergmann, Catalin Marinas, linux-arch@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, linux-rdma@vger.kernel.org,
	llvm@lists.linux.dev, Michael Guralnik, Nathan Chancellor,
	Nick Desaulniers, Will Deacon
Subject: [PATCH rdma-next 1/2] arm64/io: add memcpy_toio_64
Date: Thu, 23 Nov 2023 21:04:31 +0200
X-Mailer: git-send-email 2.42.0
From: Jason Gunthorpe

The kernel supports write-combining IO memory, which is commonly used
to generate 64 byte TLPs in a PCIe environment. On many CPUs this
mechanism is pretty tolerant, and a simple C loop will suffice to
generate a 64 byte TLP. However, modern ARM64 CPUs are quite sensitive,
and a compiler-generated loop is not enough to reliably generate a 64
byte TLP, especially given the ARM64 limitation that writel() does not
codegen anything other than "[xN]" as the address calculation.

These newer CPUs require an orderly, consecutive block of stores to
work reliably. This is best done with four STP integer instructions
(perhaps ST64B in the future), or a single ST4 vector instruction.

Provide a new generic function, memcpy_toio_64(), which should reliably
generate the needed instructions for the architecture, assuming address
alignment. As the usual need for this operation is performance
sensitive, a fast inline implementation is preferred.

Implement an optimized version on ARM64 as a block of four STP
instructions. The generic implementation is just a simple loop; x86-64
(clang 16) compiles it into an unrolled loop of 16 movq pairs.

Cc: Arnd Bergmann
Cc: Catalin Marinas
Cc: Will Deacon
Cc: linux-arch@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org
Signed-off-by: Jason Gunthorpe
Signed-off-by: Leon Romanovsky
---
 arch/arm64/include/asm/io.h | 20 ++++++++++++++++++++
 include/asm-generic/io.h    | 29 +++++++++++++++++++++++++++++
 2 files changed, 49 insertions(+)

diff --git a/arch/arm64/include/asm/io.h b/arch/arm64/include/asm/io.h
index 3b694511b98f..73ab91913790 100644
--- a/arch/arm64/include/asm/io.h
+++ b/arch/arm64/include/asm/io.h
@@ -135,6 +135,26 @@ extern void __memset_io(volatile void __iomem *, int, size_t);
 #define memcpy_fromio(a,c,l)	__memcpy_fromio((a),(c),(l))
 #define memcpy_toio(c,a,l)	__memcpy_toio((c),(a),(l))
 
+static inline void __memcpy_toio_64(volatile void __iomem *to, const void *from)
+{
+	const u64 *from64 = from;
+
+	/*
+	 * Newer ARM cores have sensitive write-combining buffers; it is
+	 * important that the stores be contiguous blocks of store
+	 * instructions. A normal memcpy does not work reliably.
+	 */
+	asm volatile("stp %x0, %x1, [%8, #16 * 0]\n"
+		     "stp %x2, %x3, [%8, #16 * 1]\n"
+		     "stp %x4, %x5, [%8, #16 * 2]\n"
+		     "stp %x6, %x7, [%8, #16 * 3]\n"
+		     :
+		     : "rZ"(from64[0]), "rZ"(from64[1]), "rZ"(from64[2]),
+		       "rZ"(from64[3]), "rZ"(from64[4]), "rZ"(from64[5]),
+		       "rZ"(from64[6]), "rZ"(from64[7]), "r"(to));
+}
+#define memcpy_toio_64(to, from) __memcpy_toio_64(to, from)
+
 /*
  * I/O memory mapping functions.
  */
diff --git a/include/asm-generic/io.h b/include/asm-generic/io.h
index bac63e874c7b..2d6d60ed2128 100644
--- a/include/asm-generic/io.h
+++ b/include/asm-generic/io.h
@@ -1202,6 +1202,35 @@ static inline void memcpy_toio(volatile void __iomem *addr, const void *buffer,
 }
 #endif
 
+#ifndef memcpy_toio_64
+#define memcpy_toio_64 memcpy_toio_64
+/**
+ * memcpy_toio_64 - Copy 64 bytes of data into I/O memory
+ * @addr:	The (I/O memory) destination for the copy
+ * @buffer:	The (RAM) source for the data
+ *
+ * addr and buffer must be aligned to 8 bytes. This operation copies
+ * exactly 64 bytes. It is intended to be used for write-combining IO
+ * memory. The architecture should provide an implementation that has a
+ * high chance of generating a single combined transaction.
+ */
+static inline void memcpy_toio_64(volatile void __iomem *addr,
+				   const void *buffer)
+{
+	unsigned int i = 0;
+
+#if BITS_PER_LONG == 64
+	for (; i != 8; i++)
+		__raw_writeq(((const u64 *)buffer)[i],
+			     ((u64 __iomem *)addr) + i);
+#else
+	for (; i != 16; i++)
+		__raw_writel(((const u32 *)buffer)[i],
+			     ((u32 __iomem *)addr) + i);
+#endif
+}
+#endif
+
 extern int devmem_is_allowed(unsigned long pfn);
 
 #endif	/* __KERNEL__ */
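
For illustration, here is a sketch of how a driver might use the new
helper to post a 64-byte work-queue entry through a write-combining BAR
mapping. The names below (struct example_wqe, example_post_wqe,
wc_doorbell) and the wmb() flush policy are assumptions made up for
this sketch; they are not part of the patch:

/*
 * Hypothetical usage sketch, not part of this patch. Assumes wc_doorbell
 * points at an ioremap_wc() mapping and the WQE is 8-byte aligned.
 */
#include <linux/io.h>
#include <linux/types.h>

struct example_wqe {
	__le64 qwords[8];	/* exactly 64 bytes */
};

static void example_post_wqe(void __iomem *wc_doorbell,
			     const struct example_wqe *wqe)
{
	/*
	 * On arm64 this expands to the four-STP block above; other
	 * architectures fall back to the generic __raw_writeq /
	 * __raw_writel loop.
	 */
	memcpy_toio_64(wc_doorbell, wqe);

	/* Order the write-combined burst before any later MMIO writes. */
	wmb();
}

The contiguous store block is what gives the CPU's write-combining
buffer a chance to emit the 64 bytes as a single TLP rather than as
several smaller writes.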