[rdma-next,2/2] IB/mlx5: Use memcpy_toio_64() for write combining stores

Message ID 744fdfcd61fa8efa6da8ed432883b5f016c3a86f.1700766072.git.leon@kernel.org (mailing list archive)
State New, archived
Series Add and use memcpy_toio_64()

Commit Message

Leon Romanovsky Nov. 23, 2023, 7:04 p.m. UTC
From: Jason Gunthorpe <jgg@nvidia.com>

mlx5 has a built-in self-test at driver startup to evaluate whether the
platform supports write combining to generate a 64 byte PCIe TLP. This
has proven necessary because a lot of common scenarios end up with broken
write combining (especially inside virtual machines) and there is no
other way to learn this information.

This self-test has been consistently failing on new ARM64 CPU
designs (specifically with NVIDIA Grace's implementation of Neoverse
V2). The C loop around writel() generates some pretty terrible ARM64
assembly, but until now this has worked on a lot of existing ARM64 CPUs.

We see it succeed about 1 time in 10,000 on the worst affected
systems. The CPU architects speculate that the load instructions
interspersed with the stores make it very unreliable.

Change this to use memcpy_toio_64(), which provides a block of 4 STP
instructions on ARM64, and the same writel loop on everything else.

Fixes: 11f552e21755 ("IB/mlx5: Test write combining support")
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 drivers/infiniband/hw/mlx5/mem.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)
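
As an illustration of the 64-byte copy described in the commit message, here is a minimal user-space sketch. It is not the kernel's memcpy_toio_64() implementation: copy_64_bytes_model() is a hypothetical name, the destination is ordinary memory rather than a write-combining mapping, and the compiler remains free to schedule loads and stores as it likes. It only models the intent of reading the source first and then issuing eight contiguous 64-bit stores.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * User-space model only (hypothetical helper, not a kernel API).
 * "dst" stands in for the BlueFlame register window; here it is plain
 * memory, so no write combining actually takes place.
 */
static void copy_64_bytes_model(volatile uint64_t *dst, const void *src)
{
	uint64_t words[8];
	int i;

	/* 64-bit stores need a naturally aligned destination. */
	assert(((uintptr_t)dst & 7) == 0);

	/* Read the whole 64-byte source up front ... */
	memcpy(words, src, sizeof(words));

	/* ... then issue eight back-to-back 64-bit stores. */
	for (i = 0; i < 8; i++)
		dst[i] = words[i];
}
```

A real implementation has to guarantee the store sequence at the instruction level (for example with inline assembly), which plain C like this cannot do.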

Patch

diff --git a/drivers/infiniband/hw/mlx5/mem.c b/drivers/infiniband/hw/mlx5/mem.c
index 96ffbbaf0a73..26b5590d2164 100644
--- a/drivers/infiniband/hw/mlx5/mem.c
+++ b/drivers/infiniband/hw/mlx5/mem.c
@@ -108,7 +108,6 @@  static int post_send_nop(struct mlx5_ib_dev *dev, struct ib_qp *ibqp, u64 wr_id,
 	__be32 mmio_wqe[16] = {};
 	unsigned long flags;
 	unsigned int idx;
-	int i;
 
 	if (unlikely(dev->mdev->state == MLX5_DEVICE_STATE_INTERNAL_ERROR))
 		return -EIO;
@@ -148,9 +147,7 @@  static int post_send_nop(struct mlx5_ib_dev *dev, struct ib_qp *ibqp, u64 wr_id,
 	 * we hit doorbell
 	 */
 	wmb();
-	for (i = 0; i < 8; i++)
-		mlx5_write64(&mmio_wqe[i * 2],
-			     bf->bfreg->map + bf->offset + i * 8);
+	memcpy_toio_64(bf->bfreg->map + bf->offset, mmio_wqe);
 	io_stop_wc();
 
 	bf->offset ^= bf->buf_size;