From patchwork Thu Nov 23 19:04:31 2023
From: Leon Romanovsky
To: Jason Gunthorpe
Cc: Arnd Bergmann, Catalin Marinas, linux-arch@vger.kernel.org,
 linux-arm-kernel@lists.infradead.org, linux-rdma@vger.kernel.org,
 llvm@lists.linux.dev, Michael Guralnik, Nathan Chancellor,
 Nick Desaulniers, Will Deacon
Subject: [PATCH rdma-next 1/2] arm64/io: add memcpy_toio_64
Date: Thu, 23 Nov 2023 21:04:31 +0200

From: Jason Gunthorpe

The kernel supports write-combining I/O memory, which is commonly used to
generate 64-byte TLPs in a PCIe environment. On many CPUs this mechanism
is fairly tolerant, and a simple C loop suffices to generate a 64-byte
TLP.

However, modern ARM64 CPUs are quite sensitive, and a compiler-generated
loop is not enough to reliably generate a 64-byte TLP.
This is made worse by the fact that on ARM64 writel() never uses any
addressing mode other than a plain base register ("[xN]"), so each
store's destination address must first be computed into a register.

These newer CPUs require an orderly, consecutive block of stores to work
reliably. This is best done with four STP integer instructions (perhaps
ST64B in the future) or a single ST4 vector instruction.

Provide a new generic function, memcpy_toio_64(), which should reliably
generate the needed instructions for each architecture, assuming suitably
aligned addresses. Since this operation is usually performance-sensitive,
a fast inline implementation is preferred.

Implement an optimized ARM64 version that is a block of four STP
instructions.

The generic implementation is just a simple loop. x86-64 (clang 16)
compiles this into an unrolled loop of 16 movq pairs.

Cc: Arnd Bergmann
Cc: Catalin Marinas
Cc: Will Deacon
Cc: linux-arch@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org
Signed-off-by: Jason Gunthorpe
Signed-off-by: Leon Romanovsky
---
 arch/arm64/include/asm/io.h | 20 ++++++++++++++++++++
 include/asm-generic/io.h    | 30 ++++++++++++++++++++++++++++++
 2 files changed, 50 insertions(+)

diff --git a/arch/arm64/include/asm/io.h b/arch/arm64/include/asm/io.h
index 3b694511b98f..73ab91913790 100644
--- a/arch/arm64/include/asm/io.h
+++ b/arch/arm64/include/asm/io.h
@@ -135,6 +135,26 @@ extern void __memset_io(volatile void __iomem *, int, size_t);
 #define memcpy_fromio(a,c,l)	__memcpy_fromio((a),(c),(l))
 #define memcpy_toio(c,a,l)	__memcpy_toio((c),(a),(l))
 
+static inline void __memcpy_toio_64(volatile void __iomem *to, const void *from)
+{
+	const u64 *from64 = from;
+
+	/*
+	 * Newer ARM core have sensitive write combining buffers, it is
+	 * important that the stores be contiguous blocks of store instructions.
+	 * Normal memcpy does not work reliably.
+	 */
+	asm volatile("stp %x0, %x1, [%8, #16 * 0]\n"
+		     "stp %x2, %x3, [%8, #16 * 1]\n"
+		     "stp %x4, %x5, [%8, #16 * 2]\n"
+		     "stp %x6, %x7, [%8, #16 * 3]\n"
+		     :
+		     : "rZ"(from64[0]), "rZ"(from64[1]), "rZ"(from64[2]),
+		       "rZ"(from64[3]), "rZ"(from64[4]), "rZ"(from64[5]),
+		       "rZ"(from64[6]), "rZ"(from64[7]), "r"(to));
+}
+#define memcpy_toio_64(to, from) __memcpy_toio_64(to, from)
+
 /*
  * I/O memory mapping functions.
  */
diff --git a/include/asm-generic/io.h b/include/asm-generic/io.h
index bac63e874c7b..2d6d60ed2128 100644
--- a/include/asm-generic/io.h
+++ b/include/asm-generic/io.h
@@ -1202,6 +1202,36 @@ static inline void memcpy_toio(volatile void __iomem *addr, const void *buffer,
 }
 #endif
 
+#ifndef memcpy_toio_64
+#define memcpy_toio_64 memcpy_toio_64
+/**
+ * memcpy_toio_64	Copy 64 bytes of data into I/O memory
+ * @dst:		The (I/O memory) destination for the copy
+ * @src:		The (RAM) source for the data
+ * @count:		The number of bytes to copy
+ *
+ * dst and src must be aligned to 8 bytes. This operation copies exactly 64
+ * bytes. It is intended to be used for write combining IO memory. The
+ * architecture should provide an implementation that has a high chance of
+ * generating a single combined transaction.
+ */
+static inline void memcpy_toio_64(volatile void __iomem *addr,
+				  const void *buffer)
+{
+	unsigned int i = 0;
+
+#if BITS_PER_LONG == 64
+	for (; i != 8; i++)
+		__raw_writeq(((const u64 *)buffer)[i],
+			     ((u64 __iomem *)addr) + i);
+#else
+	for (; i != 16; i++)
+		__raw_writel(((const u32 *)buffer)[i],
+			     ((u32 __iomem *)addr) + i);
+#endif
+}
+#endif
+
 extern int devmem_is_allowed(unsigned long pfn);
 
 #endif /* __KERNEL__ */

From patchwork Thu Nov 23 19:04:32 2023
From: Leon Romanovsky
To: Jason Gunthorpe
Cc: Arnd Bergmann, Catalin Marinas, linux-arch@vger.kernel.org,
 linux-arm-kernel@lists.infradead.org, linux-rdma@vger.kernel.org,
 llvm@lists.linux.dev, Michael Guralnik, Nathan Chancellor,
 Nick Desaulniers, Will Deacon
Subject: [PATCH rdma-next 2/2] IB/mlx5: Use memcpy_toio_64() for write combining stores
Date: Thu, 23 Nov 2023 21:04:32 +0200
Message-ID: <744fdfcd61fa8efa6da8ed432883b5f016c3a86f.1700766072.git.leon@kernel.org>

From: Jason Gunthorpe

mlx5 has a built-in self-test at driver startup that evaluates whether the
platform can use write combining to generate a 64-byte PCIe TLP. This has
proven necessary because many common scenarios end up with broken write
combining (especially inside virtual machines), and there is no other way
to learn this information.

This self-test has been consistently failing on new ARM64 CPU designs
(specifically NVIDIA Grace's implementation of Neoverse V2). The C loop
around writel() generates some pretty terrible ARM64 assembly, but until
now this has worked on many existing ARM64 CPUs.

We see it succeed only about 1 time in 10,000 on the worst-affected
systems. The CPU architects speculate that the load instructions
interspersed with the stores make it very unreliable.

Change this to use memcpy_toio_64(), which provides a block of four STP
instructions on ARM64 and the same writel() loop on everything else.
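The change swaps the open-coded loop for a single memcpy_toio_64() call; the bytes that reach the device should be identical, and only the generated instruction sequence differs. The userspace sketch below illustrates that under loudly stated assumptions: ordinary arrays stand in for the write-combining mapping, volatile 8-byte stores stand in for __raw_writeq(), and sim_memcpy_toio_64() and sim_old_mlx5_loop() are hypothetical names, not kernel APIs. It cannot show whether a CPU actually emits one 64-byte TLP.

```c
/*
 * Userspace sketch only (assumption: not kernel code). Plain uint64_t
 * arrays stand in for the write-combining BAR, and volatile 8-byte stores
 * stand in for __raw_writeq(). This checks which bytes land where, not
 * whether the CPU combines them into a single 64-byte TLP.
 */
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of the 64-bit generic memcpy_toio_64() fallback. */
static void sim_memcpy_toio_64(volatile void *addr, const void *buffer)
{
	volatile uint64_t *dst = addr;

	/* The generic kernel-doc requires 8-byte alignment of both sides. */
	assert(((uintptr_t)addr & 7) == 0 && ((uintptr_t)buffer & 7) == 0);

	for (unsigned int i = 0; i != 8; i++) {
		uint64_t v;

		/* memcpy() avoids a strict-aliasing violation in userspace. */
		memcpy(&v, (const char *)buffer + 8 * i, sizeof(v));
		dst[i] = v;
	}
}

/* Hypothetical model of the removed mlx5 loop: eight 8-byte writes. */
static void sim_old_mlx5_loop(volatile void *map, const uint32_t *mmio_wqe)
{
	for (int i = 0; i < 8; i++) {
		uint64_t v;

		/* Like mlx5_write64(): two 32-bit words as one 8-byte store. */
		memcpy(&v, &mmio_wqe[i * 2], sizeof(v));
		*(volatile uint64_t *)((volatile char *)map + i * 8) = v;
	}
}
```

For example, filling an 8-byte-aligned `uint32_t wqe[16]` and running each function over its own zeroed `uint64_t bar[8]` should leave both buffers byte-for-byte equal to each other and to `wqe`.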

Fixes: 11f552e21755 ("IB/mlx5: Test write combining support")
Signed-off-by: Jason Gunthorpe
Signed-off-by: Leon Romanovsky
---
 drivers/infiniband/hw/mlx5/mem.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/mem.c b/drivers/infiniband/hw/mlx5/mem.c
index 96ffbbaf0a73..26b5590d2164 100644
--- a/drivers/infiniband/hw/mlx5/mem.c
+++ b/drivers/infiniband/hw/mlx5/mem.c
@@ -108,7 +108,6 @@ static int post_send_nop(struct mlx5_ib_dev *dev, struct ib_qp *ibqp, u64 wr_id,
 	__be32 mmio_wqe[16] = {};
 	unsigned long flags;
 	unsigned int idx;
-	int i;
 
 	if (unlikely(dev->mdev->state == MLX5_DEVICE_STATE_INTERNAL_ERROR))
 		return -EIO;
@@ -148,9 +147,7 @@ static int post_send_nop(struct mlx5_ib_dev *dev, struct ib_qp *ibqp, u64 wr_id,
 	 * we hit doorbell
 	 */
 	wmb();
-	for (i = 0; i < 8; i++)
-		mlx5_write64(&mmio_wqe[i * 2],
-			     bf->bfreg->map + bf->offset + i * 8);
+	memcpy_toio_64(bf->bfreg->map + bf->offset, mmio_wqe);
 	io_stop_wc();
 
 	bf->offset ^= bf->buf_size;
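On ARM64 the new call expands to the four-STP block from patch 1/2, whose operand numbering is easy to misread: `%8` is the destination pointer and `%x0`..`%x7` are the eight source words, so store pair k writes `from64[2k]` at byte offset 16k and `from64[2k + 1]` at 16k + 8. The portable model below replays that layout in plain C as a sketch; `stp_offset()` and `sim_four_stp()` are hypothetical helpers, and the model says nothing about how the hardware merges the stores.

```c
/*
 * Portable model (assumption: not the kernel code) of the bytes written by
 * "stp %x{2k}, %x{2k+1}, [%8, #16 * k]" for k = 0..3 in __memcpy_toio_64().
 */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Byte offset written by store pair k (0..3): first or second register. */
static size_t stp_offset(unsigned int k, unsigned int second)
{
	return 16 * (size_t)k + 8 * (size_t)second;
}

/* Hypothetical helper: replay the four store pairs into a plain buffer. */
static void sim_four_stp(volatile uint64_t *to, const uint64_t *from64)
{
	/* The patch assumes an aligned destination. */
	assert(((uintptr_t)to & 7) == 0);

	for (unsigned int k = 0; k < 4; k++) {
		to[stp_offset(k, 0) / 8] = from64[2 * k];
		to[stp_offset(k, 1) / 8] = from64[2 * k + 1];
	}
}
```

The model writes the same 64 bytes a plain copy would, in ascending 16-byte pairs; the point of the real asm is that nothing else is scheduled between those four stores.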