From patchwork Fri Dec 20 10:09:32 2024
X-Patchwork-Submitter: Daisuke Matsuda
X-Patchwork-Id: 13916462
From: Daisuke Matsuda <matsuda-daisuke@fujitsu.com>
To: linux-rdma@vger.kernel.org, leon@kernel.org, jgg@ziepe.ca,
	zyjzyj2000@gmail.com
Cc: linux-kernel@vger.kernel.org, rpearsonhpe@gmail.com,
	lizhijian@fujitsu.com, Daisuke Matsuda <matsuda-daisuke@fujitsu.com>
Subject: [PATCH for-next v9 1/5] RDMA/rxe: Move some code to rxe_loc.h in
	preparation for ODP
Date: Fri, 20 Dec 2024 19:09:32 +0900
Message-Id: <20241220100936.2193541-2-matsuda-daisuke@fujitsu.com>
In-Reply-To: <20241220100936.2193541-1-matsuda-daisuke@fujitsu.com>
References: <20241220100936.2193541-1-matsuda-daisuke@fujitsu.com>
rxe_mr_init() and resp_states are going to be used in rxe_odp.c, which is
to be created in the subsequent patch.

Signed-off-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com>
---
 drivers/infiniband/sw/rxe/rxe.h       | 37 ---------------------------
 drivers/infiniband/sw/rxe/rxe_loc.h   |  1 +
 drivers/infiniband/sw/rxe/rxe_mr.c    |  2 +-
 drivers/infiniband/sw/rxe/rxe_verbs.h | 37 +++++++++++++++++++++++++++
 4 files changed, 39 insertions(+), 38 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe.h b/drivers/infiniband/sw/rxe/rxe.h
index d8fb2c7af30a..193f7caffaf2 100644
--- a/drivers/infiniband/sw/rxe/rxe.h
+++ b/drivers/infiniband/sw/rxe/rxe.h
@@ -100,43 +100,6 @@
 #define rxe_info_mw(mw, fmt, ...) ibdev_info_ratelimited((mw)->ibmw.device, \
 		"mw#%d %s: " fmt, (mw)->elem.index, __func__, ##__VA_ARGS__)
 
-/* responder states */
-enum resp_states {
-	RESPST_NONE,
-	RESPST_GET_REQ,
-	RESPST_CHK_PSN,
-	RESPST_CHK_OP_SEQ,
-	RESPST_CHK_OP_VALID,
-	RESPST_CHK_RESOURCE,
-	RESPST_CHK_LENGTH,
-	RESPST_CHK_RKEY,
-	RESPST_EXECUTE,
-	RESPST_READ_REPLY,
-	RESPST_ATOMIC_REPLY,
-	RESPST_ATOMIC_WRITE_REPLY,
-	RESPST_PROCESS_FLUSH,
-	RESPST_COMPLETE,
-	RESPST_ACKNOWLEDGE,
-	RESPST_CLEANUP,
-	RESPST_DUPLICATE_REQUEST,
-	RESPST_ERR_MALFORMED_WQE,
-	RESPST_ERR_UNSUPPORTED_OPCODE,
-	RESPST_ERR_MISALIGNED_ATOMIC,
-	RESPST_ERR_PSN_OUT_OF_SEQ,
-	RESPST_ERR_MISSING_OPCODE_FIRST,
-	RESPST_ERR_MISSING_OPCODE_LAST_C,
-	RESPST_ERR_MISSING_OPCODE_LAST_D1E,
-	RESPST_ERR_TOO_MANY_RDMA_ATM_REQ,
-	RESPST_ERR_RNR,
-	RESPST_ERR_RKEY_VIOLATION,
-	RESPST_ERR_INVALIDATE_RKEY,
-	RESPST_ERR_LENGTH,
-	RESPST_ERR_CQ_OVERFLOW,
-	RESPST_ERROR,
-	RESPST_DONE,
-	RESPST_EXIT,
-};
-
 void rxe_set_mtu(struct rxe_dev *rxe, unsigned int dev_mtu);
 
 int rxe_add(struct rxe_dev *rxe, unsigned int mtu, const char *ibdev_name);
diff --git a/drivers/infiniband/sw/rxe/rxe_loc.h b/drivers/infiniband/sw/rxe/rxe_loc.h
index ded46119151b..1c78b9d36da4 100644
--- a/drivers/infiniband/sw/rxe/rxe_loc.h
+++ b/drivers/infiniband/sw/rxe/rxe_loc.h
@@ -58,6 +58,7 @@ int rxe_mmap(struct ib_ucontext *context, struct vm_area_struct *vma);
 
 /* rxe_mr.c */
 u8 rxe_get_next_key(u32 last_key);
+void rxe_mr_init(int access, struct rxe_mr *mr);
 void rxe_mr_init_dma(int access, struct rxe_mr *mr);
 int rxe_mr_init_user(struct rxe_dev *rxe, u64 start, u64 length,
 		     int access, struct rxe_mr *mr);
diff --git a/drivers/infiniband/sw/rxe/rxe_mr.c b/drivers/infiniband/sw/rxe/rxe_mr.c
index da3dee520876..12cacc4f60f2 100644
--- a/drivers/infiniband/sw/rxe/rxe_mr.c
+++ b/drivers/infiniband/sw/rxe/rxe_mr.c
@@ -45,7 +45,7 @@ int mr_check_range(struct rxe_mr *mr, u64 iova, size_t length)
 	}
 }
 
-static void rxe_mr_init(int access, struct rxe_mr *mr)
+void rxe_mr_init(int access, struct rxe_mr *mr)
 {
 	u32 key = mr->elem.index << 8 | rxe_get_next_key(-1);
diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.h b/drivers/infiniband/sw/rxe/rxe_verbs.h
index 3c1354f82283..e4656c7640f0 100644
--- a/drivers/infiniband/sw/rxe/rxe_verbs.h
+++ b/drivers/infiniband/sw/rxe/rxe_verbs.h
@@ -126,6 +126,43 @@ struct rxe_comp_info {
 	u32			rnr_retry;
 };
 
+/* responder states */
+enum resp_states {
+	RESPST_NONE,
+	RESPST_GET_REQ,
+	RESPST_CHK_PSN,
+	RESPST_CHK_OP_SEQ,
+	RESPST_CHK_OP_VALID,
+	RESPST_CHK_RESOURCE,
+	RESPST_CHK_LENGTH,
+	RESPST_CHK_RKEY,
+	RESPST_EXECUTE,
+	RESPST_READ_REPLY,
+	RESPST_ATOMIC_REPLY,
+	RESPST_ATOMIC_WRITE_REPLY,
+	RESPST_PROCESS_FLUSH,
+	RESPST_COMPLETE,
+	RESPST_ACKNOWLEDGE,
+	RESPST_CLEANUP,
+	RESPST_DUPLICATE_REQUEST,
+	RESPST_ERR_MALFORMED_WQE,
+	RESPST_ERR_UNSUPPORTED_OPCODE,
+	RESPST_ERR_MISALIGNED_ATOMIC,
+	RESPST_ERR_PSN_OUT_OF_SEQ,
+	RESPST_ERR_MISSING_OPCODE_FIRST,
+	RESPST_ERR_MISSING_OPCODE_LAST_C,
+	RESPST_ERR_MISSING_OPCODE_LAST_D1E,
+	RESPST_ERR_TOO_MANY_RDMA_ATM_REQ,
+	RESPST_ERR_RNR,
+	RESPST_ERR_RKEY_VIOLATION,
+	RESPST_ERR_INVALIDATE_RKEY,
+	RESPST_ERR_LENGTH,
+	RESPST_ERR_CQ_OVERFLOW,
+	RESPST_ERROR,
+	RESPST_DONE,
+	RESPST_EXIT,
+};
+
 enum rdatm_res_state {
 	rdatm_res_state_next,
 	rdatm_res_state_new,

From patchwork Fri Dec 20 10:09:33 2024
X-Patchwork-Submitter: Daisuke Matsuda
X-Patchwork-Id: 13916463
From: Daisuke Matsuda <matsuda-daisuke@fujitsu.com>
To: linux-rdma@vger.kernel.org, leon@kernel.org, jgg@ziepe.ca,
	zyjzyj2000@gmail.com
Cc: linux-kernel@vger.kernel.org, rpearsonhpe@gmail.com,
	lizhijian@fujitsu.com, Daisuke Matsuda <matsuda-daisuke@fujitsu.com>
Subject: [PATCH for-next v9 2/5] RDMA/rxe: Add page invalidation support
Date: Fri, 20 Dec 2024 19:09:33 +0900
Message-Id: <20241220100936.2193541-3-matsuda-daisuke@fujitsu.com>
In-Reply-To: <20241220100936.2193541-1-matsuda-daisuke@fujitsu.com>
References: <20241220100936.2193541-1-matsuda-daisuke@fujitsu.com>

On page invalidation, an MMU notifier callback is invoked to unmap DMA
addresses and update the driver page table (umem_odp->dma_list). The
callback is registered when an ODP-enabled MR is created.
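For context, a minimal sketch of how the ops table added below gets
consumed (illustrative only; example_create_odp_umem is a hypothetical
helper, and the actual registration call lands in patch 3/5):

#include <rdma/ib_umem_odp.h>

static int example_create_odp_umem(struct ib_device *ibdev, u64 start,
				   u64 length, int access)
{
	struct ib_umem_odp *umem_odp;

	/* ib_umem_odp_get() registers rxe_mn_ops on [start, start + length);
	 * from then on, any CPU-side unmap of that range invokes the
	 * invalidate callback introduced by this patch.
	 */
	umem_odp = ib_umem_odp_get(ibdev, start, length, access, &rxe_mn_ops);
	if (IS_ERR(umem_odp))
		return PTR_ERR(umem_odp);

	ib_umem_odp_release(umem_odp);
	return 0;
}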
Signed-off-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com>
---
 drivers/infiniband/sw/rxe/Makefile  |  2 ++
 drivers/infiniband/sw/rxe/rxe_loc.h |  3 +++
 drivers/infiniband/sw/rxe/rxe_odp.c | 38 +++++++++++++++++++++++++++++
 3 files changed, 43 insertions(+)
 create mode 100644 drivers/infiniband/sw/rxe/rxe_odp.c

diff --git a/drivers/infiniband/sw/rxe/Makefile b/drivers/infiniband/sw/rxe/Makefile
index 5395a581f4bb..93134f1d1d0c 100644
--- a/drivers/infiniband/sw/rxe/Makefile
+++ b/drivers/infiniband/sw/rxe/Makefile
@@ -23,3 +23,5 @@ rdma_rxe-y := \
 	rxe_task.o \
 	rxe_net.o \
 	rxe_hw_counters.o
+
+rdma_rxe-$(CONFIG_INFINIBAND_ON_DEMAND_PAGING) += rxe_odp.o
diff --git a/drivers/infiniband/sw/rxe/rxe_loc.h b/drivers/infiniband/sw/rxe/rxe_loc.h
index 1c78b9d36da4..0162ac9431c1 100644
--- a/drivers/infiniband/sw/rxe/rxe_loc.h
+++ b/drivers/infiniband/sw/rxe/rxe_loc.h
@@ -182,4 +182,7 @@ static inline unsigned int wr_opcode_mask(int opcode, struct rxe_qp *qp)
 	return rxe_wr_opcode_info[opcode].mask[qp->ibqp.qp_type];
 }
 
+/* rxe_odp.c */
+extern const struct mmu_interval_notifier_ops rxe_mn_ops;
+
 #endif /* RXE_LOC_H */
diff --git a/drivers/infiniband/sw/rxe/rxe_odp.c b/drivers/infiniband/sw/rxe/rxe_odp.c
new file mode 100644
index 000000000000..2be8066db012
--- /dev/null
+++ b/drivers/infiniband/sw/rxe/rxe_odp.c
@@ -0,0 +1,38 @@
+// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
+/*
+ * Copyright (c) 2022-2023 Fujitsu Ltd. All rights reserved.
+ */
+
+#include <linux/hmm.h>
+
+#include <rdma/ib_umem_odp.h>
+
+#include "rxe.h"
+
+static bool rxe_ib_invalidate_range(struct mmu_interval_notifier *mni,
+				    const struct mmu_notifier_range *range,
+				    unsigned long cur_seq)
+{
+	struct ib_umem_odp *umem_odp =
+		container_of(mni, struct ib_umem_odp, notifier);
+	unsigned long start, end;
+
+	if (!mmu_notifier_range_blockable(range))
+		return false;
+
+	mutex_lock(&umem_odp->umem_mutex);
+	mmu_interval_set_seq(mni, cur_seq);
+
+	start = max_t(u64, ib_umem_start(umem_odp), range->start);
+	end = min_t(u64, ib_umem_end(umem_odp), range->end);
+
+	/* update umem_odp->dma_list */
+	ib_umem_odp_unmap_dma_pages(umem_odp, start, end);
+
+	mutex_unlock(&umem_odp->umem_mutex);
+	return true;
+}
+
+const struct mmu_interval_notifier_ops rxe_mn_ops = {
+	.invalidate = rxe_ib_invalidate_range,
+};

From patchwork Fri Dec 20 10:09:34 2024
X-Patchwork-Submitter: Daisuke Matsuda
X-Patchwork-Id: 13916460
From: Daisuke Matsuda <matsuda-daisuke@fujitsu.com>
To: linux-rdma@vger.kernel.org, leon@kernel.org, jgg@ziepe.ca,
	zyjzyj2000@gmail.com
Cc: linux-kernel@vger.kernel.org, rpearsonhpe@gmail.com,
	lizhijian@fujitsu.com, Daisuke Matsuda <matsuda-daisuke@fujitsu.com>
Subject: [PATCH for-next v9 3/5] RDMA/rxe: Allow registering MRs for
	On-Demand Paging
Date: Fri, 20 Dec 2024 19:09:34 +0900
Message-Id: <20241220100936.2193541-4-matsuda-daisuke@fujitsu.com>
In-Reply-To: <20241220100936.2193541-1-matsuda-daisuke@fujitsu.com>
References: <20241220100936.2193541-1-matsuda-daisuke@fujitsu.com>
Allow userspace to register an ODP-enabled MR, in which case the flag
IB_ACCESS_ON_DEMAND is passed to rxe_reg_user_mr(). However, no RDMA
operations are enabled at this point; they will be supported by the
subsequent two patches.

rxe_odp_do_pagefault() is called to initialize an ODP-enabled MR. It syncs
the process address space from the CPU page table to the driver page table
(dma_list/pfn_list in umem_odp) when called with the RXE_PAGEFAULT_SNAPSHOT
flag. Additionally, it can be used to trigger a page fault when pages being
accessed are not present or do not have proper read/write permissions, and
possibly to prefetch pages in the future.
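From the userspace side, registration would look like the sketch below
(illustrative, not part of this patch): adding IBV_ACCESS_ON_DEMAND to the
ibv_reg_mr() access flags is what ultimately reaches rxe_reg_user_mr() as
IB_ACCESS_ON_DEMAND.

#include <infiniband/verbs.h>

/* Register an ODP-backed MR; pd, buf and len are assumed to be set up by
 * the caller. Returns NULL when registration fails, e.g. because the
 * device does not advertise ODP support.
 */
struct ibv_mr *register_odp_mr(struct ibv_pd *pd, void *buf, size_t len)
{
	return ibv_reg_mr(pd, buf, len,
			  IBV_ACCESS_LOCAL_WRITE |
			  IBV_ACCESS_REMOTE_READ |
			  IBV_ACCESS_REMOTE_WRITE |
			  IBV_ACCESS_ON_DEMAND);
}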
Signed-off-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com>
---
 drivers/infiniband/sw/rxe/rxe.c       |  7 +++
 drivers/infiniband/sw/rxe/rxe_loc.h   | 12 ++++
 drivers/infiniband/sw/rxe/rxe_mr.c    |  9 ++-
 drivers/infiniband/sw/rxe/rxe_odp.c   | 86 +++++++++++++++++++++++++++
 drivers/infiniband/sw/rxe/rxe_resp.c  | 15 +++--
 drivers/infiniband/sw/rxe/rxe_verbs.c |  5 +-
 6 files changed, 128 insertions(+), 6 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe.c b/drivers/infiniband/sw/rxe/rxe.c
index 255677bc12b2..3ca73f8d96cc 100644
--- a/drivers/infiniband/sw/rxe/rxe.c
+++ b/drivers/infiniband/sw/rxe/rxe.c
@@ -75,6 +75,13 @@ static void rxe_init_device_param(struct rxe_dev *rxe)
 			rxe->ndev->dev_addr);
 
 	rxe->max_ucontext = RXE_MAX_UCONTEXT;
+
+	if (IS_ENABLED(CONFIG_INFINIBAND_ON_DEMAND_PAGING)) {
+		rxe->attr.kernel_cap_flags |= IBK_ON_DEMAND_PAGING;
+
+		/* IB_ODP_SUPPORT_IMPLICIT is not supported right now. */
+		rxe->attr.odp_caps.general_caps |= IB_ODP_SUPPORT;
+	}
 }
 
 /* initialize port attributes */
diff --git a/drivers/infiniband/sw/rxe/rxe_loc.h b/drivers/infiniband/sw/rxe/rxe_loc.h
index 0162ac9431c1..1043240daa97 100644
--- a/drivers/infiniband/sw/rxe/rxe_loc.h
+++ b/drivers/infiniband/sw/rxe/rxe_loc.h
@@ -185,4 +185,16 @@ static inline unsigned int wr_opcode_mask(int opcode, struct rxe_qp *qp)
 /* rxe_odp.c */
 extern const struct mmu_interval_notifier_ops rxe_mn_ops;
 
+#ifdef CONFIG_INFINIBAND_ON_DEMAND_PAGING
+int rxe_odp_mr_init_user(struct rxe_dev *rxe, u64 start, u64 length,
+			 u64 iova, int access_flags, struct rxe_mr *mr);
+#else /* CONFIG_INFINIBAND_ON_DEMAND_PAGING */
+static inline int
+rxe_odp_mr_init_user(struct rxe_dev *rxe, u64 start, u64 length, u64 iova,
+		     int access_flags, struct rxe_mr *mr)
+{
+	return -EOPNOTSUPP;
+}
+#endif /* CONFIG_INFINIBAND_ON_DEMAND_PAGING */
+
 #endif /* RXE_LOC_H */
diff --git a/drivers/infiniband/sw/rxe/rxe_mr.c b/drivers/infiniband/sw/rxe/rxe_mr.c
index 12cacc4f60f2..190225a34139 100644
--- a/drivers/infiniband/sw/rxe/rxe_mr.c
+++ b/drivers/infiniband/sw/rxe/rxe_mr.c
@@ -323,7 +323,10 @@ int rxe_mr_copy(struct rxe_mr *mr, u64 iova, void *addr,
 		return err;
 	}
 
-	return rxe_mr_copy_xarray(mr, iova, addr, length, dir);
+	if (mr->umem->is_odp)
+		return -EOPNOTSUPP;
+	else
+		return rxe_mr_copy_xarray(mr, iova, addr, length, dir);
 }
 
 /* copy data in or out of a wqe, i.e. sg list
@@ -532,6 +535,10 @@ int rxe_mr_do_atomic_write(struct rxe_mr *mr, u64 iova, u64 value)
 	struct page *page;
 	u64 *va;
 
+	/* ODP is not supported right now. WIP. */
+	if (mr->umem->is_odp)
+		return RESPST_ERR_UNSUPPORTED_OPCODE;
+
 	/* See IBA oA19-28 */
 	if (unlikely(mr->state != RXE_MR_STATE_VALID)) {
 		rxe_dbg_mr(mr, "mr not in valid state\n");
diff --git a/drivers/infiniband/sw/rxe/rxe_odp.c b/drivers/infiniband/sw/rxe/rxe_odp.c
index 2be8066db012..141290078754 100644
--- a/drivers/infiniband/sw/rxe/rxe_odp.c
+++ b/drivers/infiniband/sw/rxe/rxe_odp.c
@@ -36,3 +36,89 @@ static bool rxe_ib_invalidate_range(struct mmu_interval_notifier *mni,
 const struct mmu_interval_notifier_ops rxe_mn_ops = {
 	.invalidate = rxe_ib_invalidate_range,
 };
+
+#define RXE_PAGEFAULT_RDONLY BIT(1)
+#define RXE_PAGEFAULT_SNAPSHOT BIT(2)
+static int rxe_odp_do_pagefault_and_lock(struct rxe_mr *mr, u64 user_va, int bcnt, u32 flags)
+{
+	struct ib_umem_odp *umem_odp = to_ib_umem_odp(mr->umem);
+	bool fault = !(flags & RXE_PAGEFAULT_SNAPSHOT);
+	u64 access_mask;
+	int np;
+
+	access_mask = ODP_READ_ALLOWED_BIT;
+	if (umem_odp->umem.writable && !(flags & RXE_PAGEFAULT_RDONLY))
+		access_mask |= ODP_WRITE_ALLOWED_BIT;
+
+	/*
+	 * ib_umem_odp_map_dma_and_lock() locks umem_mutex on success.
+	 * Callers must release the lock later to let invalidation handler
+	 * do its work again.
+	 */
+	np = ib_umem_odp_map_dma_and_lock(umem_odp, user_va, bcnt,
+					  access_mask, fault);
+	return np;
+}
+
+static int rxe_odp_init_pages(struct rxe_mr *mr)
+{
+	struct ib_umem_odp *umem_odp = to_ib_umem_odp(mr->umem);
+	int ret;
+
+	ret = rxe_odp_do_pagefault_and_lock(mr, mr->umem->address,
+					    mr->umem->length,
+					    RXE_PAGEFAULT_SNAPSHOT);
+
+	if (ret >= 0)
+		mutex_unlock(&umem_odp->umem_mutex);
+
+	return ret >= 0 ? 0 : ret;
+}
+
+int rxe_odp_mr_init_user(struct rxe_dev *rxe, u64 start, u64 length,
+			 u64 iova, int access_flags, struct rxe_mr *mr)
+{
+	struct ib_umem_odp *umem_odp;
+	int err;
+
+	if (!IS_ENABLED(CONFIG_INFINIBAND_ON_DEMAND_PAGING))
+		return -EOPNOTSUPP;
+
+	rxe_mr_init(access_flags, mr);
+
+	if (!start && length == U64_MAX) {
+		if (iova != 0)
+			return -EINVAL;
+		if (!(rxe->attr.odp_caps.general_caps & IB_ODP_SUPPORT_IMPLICIT))
+			return -EINVAL;
+
+		/* Never reach here, for implicit ODP is not implemented. */
+	}
+
+	umem_odp = ib_umem_odp_get(&rxe->ib_dev, start, length, access_flags,
+				   &rxe_mn_ops);
+	if (IS_ERR(umem_odp)) {
+		rxe_dbg_mr(mr, "Unable to create umem_odp err = %d\n",
+			   (int)PTR_ERR(umem_odp));
+		return PTR_ERR(umem_odp);
+	}
+
+	umem_odp->private = mr;
+
+	mr->umem = &umem_odp->umem;
+	mr->access = access_flags;
+	mr->ibmr.length = length;
+	mr->ibmr.iova = iova;
+	mr->page_offset = ib_umem_offset(&umem_odp->umem);
+
+	err = rxe_odp_init_pages(mr);
+	if (err) {
+		ib_umem_odp_release(umem_odp);
+		return err;
+	}
+
+	mr->state = RXE_MR_STATE_VALID;
+	mr->ibmr.type = IB_MR_TYPE_USER;
+
+	return err;
+}
diff --git a/drivers/infiniband/sw/rxe/rxe_resp.c b/drivers/infiniband/sw/rxe/rxe_resp.c
index c11ab280551a..e703a3ab82d4 100644
--- a/drivers/infiniband/sw/rxe/rxe_resp.c
+++ b/drivers/infiniband/sw/rxe/rxe_resp.c
@@ -649,6 +649,10 @@ static enum resp_states process_flush(struct rxe_qp *qp,
 	struct rxe_mr *mr = qp->resp.mr;
 	struct resp_res *res = qp->resp.res;
 
+	/* ODP is not supported right now. WIP. */
+	if (mr->umem->is_odp)
+		return RESPST_ERR_UNSUPPORTED_OPCODE;
+
 	/* oA19-14, oA19-15 */
 	if (res && res->replay)
 		return RESPST_ACKNOWLEDGE;
@@ -702,10 +706,13 @@ static enum resp_states atomic_reply(struct rxe_qp *qp,
 	if (!res->replay) {
 		u64 iova = qp->resp.va + qp->resp.offset;
 
-		err = rxe_mr_do_atomic_op(mr, iova, pkt->opcode,
-					  atmeth_comp(pkt),
-					  atmeth_swap_add(pkt),
-					  &res->atomic.orig_val);
+		if (mr->umem->is_odp)
+			err = RESPST_ERR_UNSUPPORTED_OPCODE;
+		else
+			err = rxe_mr_do_atomic_op(mr, iova, pkt->opcode,
+						  atmeth_comp(pkt),
+						  atmeth_swap_add(pkt),
+						  &res->atomic.orig_val);
 
 		if (err)
 			return err;
diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.c b/drivers/infiniband/sw/rxe/rxe_verbs.c
index 5c18f7e342f2..13064302d766 100644
--- a/drivers/infiniband/sw/rxe/rxe_verbs.c
+++ b/drivers/infiniband/sw/rxe/rxe_verbs.c
@@ -1278,7 +1278,10 @@ static struct ib_mr *rxe_reg_user_mr(struct ib_pd *ibpd, u64 start,
 	mr->ibmr.pd = ibpd;
 	mr->ibmr.device = ibpd->device;
 
-	err = rxe_mr_init_user(rxe, start, length, access, mr);
+	if (access & IB_ACCESS_ON_DEMAND)
+		err = rxe_odp_mr_init_user(rxe, start, length, iova, access, mr);
+	else
+		err = rxe_mr_init_user(rxe, start, length, access, mr);
 	if (err) {
 		rxe_dbg_mr(mr, "reg_user_mr failed, err = %d\n", err);
 		goto err_cleanup;

From patchwork Fri Dec 20 10:09:35 2024
X-Patchwork-Submitter: Daisuke Matsuda
X-Patchwork-Id: 13916461
From: Daisuke Matsuda <matsuda-daisuke@fujitsu.com>
To: linux-rdma@vger.kernel.org, leon@kernel.org, jgg@ziepe.ca,
	zyjzyj2000@gmail.com
Cc: linux-kernel@vger.kernel.org, rpearsonhpe@gmail.com,
	lizhijian@fujitsu.com, Daisuke Matsuda <matsuda-daisuke@fujitsu.com>
Subject: [PATCH for-next v9 4/5] RDMA/rxe: Add support for
	Send/Recv/Write/Read with ODP
Date: Fri, 20 Dec 2024 19:09:35 +0900
Message-Id: <20241220100936.2193541-5-matsuda-daisuke@fujitsu.com>
In-Reply-To: <20241220100936.2193541-1-matsuda-daisuke@fujitsu.com>
References: <20241220100936.2193541-1-matsuda-daisuke@fujitsu.com>
rxe_mr_copy() is used widely to copy data to/from a user MR.
The requester uses it to load the payloads of requesting packets; the
responder uses it to process Send, Write, and Read operations; and the
completer uses it to copy data from the response packets of Read and Atomic
operations to a user MR.

Allow these operations to be used with ODP by adding a subordinate function
rxe_odp_mr_copy(), which comprises the following steps (a capability-probe
sketch follows the list):
 1. Check the driver page table (umem_odp->dma_list) to see if the pages
    being accessed are present with appropriate permission.
 2. If necessary, trigger page fault to map the pages.
 3. Convert their user space addresses to kernel logical addresses using
    PFNs in the driver page table (umem_odp->pfn_list).
 4. Execute data copy to/from the pages.
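Since this patch also advertises the new per-transport capability bits, an
application can probe them through the standard libibverbs query path. A
minimal sketch (illustrative, not part of the patch):

#include <infiniband/verbs.h>

/* Return 1 if the device advertises ODP for RC RDMA Write and Read. */
static int rc_odp_rdma_supported(struct ibv_context *ctx)
{
	struct ibv_device_attr_ex attr;
	uint32_t want = IBV_ODP_SUPPORT_WRITE | IBV_ODP_SUPPORT_READ;

	if (ibv_query_device_ex(ctx, NULL, &attr))
		return 0;

	if (!(attr.odp_caps.general_caps & IBV_ODP_SUPPORT))
		return 0;

	return (attr.odp_caps.per_transport_caps.rc_odp_caps & want) == want;
}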
Signed-off-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com>
---
 drivers/infiniband/sw/rxe/rxe.c     |  10 +++
 drivers/infiniband/sw/rxe/rxe_loc.h |   7 ++
 drivers/infiniband/sw/rxe/rxe_mr.c  |   2 +-
 drivers/infiniband/sw/rxe/rxe_odp.c | 131 ++++++++++++++++++++++++++++
 4 files changed, 149 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/sw/rxe/rxe.c b/drivers/infiniband/sw/rxe/rxe.c
index 3ca73f8d96cc..ea643ebf9667 100644
--- a/drivers/infiniband/sw/rxe/rxe.c
+++ b/drivers/infiniband/sw/rxe/rxe.c
@@ -81,6 +81,16 @@ static void rxe_init_device_param(struct rxe_dev *rxe)
 
 		/* IB_ODP_SUPPORT_IMPLICIT is not supported right now. */
 		rxe->attr.odp_caps.general_caps |= IB_ODP_SUPPORT;
+
+		rxe->attr.odp_caps.per_transport_caps.ud_odp_caps |= IB_ODP_SUPPORT_SEND;
+		rxe->attr.odp_caps.per_transport_caps.ud_odp_caps |= IB_ODP_SUPPORT_RECV;
+		rxe->attr.odp_caps.per_transport_caps.ud_odp_caps |= IB_ODP_SUPPORT_SRQ_RECV;
+
+		rxe->attr.odp_caps.per_transport_caps.rc_odp_caps |= IB_ODP_SUPPORT_SEND;
+		rxe->attr.odp_caps.per_transport_caps.rc_odp_caps |= IB_ODP_SUPPORT_RECV;
+		rxe->attr.odp_caps.per_transport_caps.rc_odp_caps |= IB_ODP_SUPPORT_WRITE;
+		rxe->attr.odp_caps.per_transport_caps.rc_odp_caps |= IB_ODP_SUPPORT_READ;
+		rxe->attr.odp_caps.per_transport_caps.rc_odp_caps |= IB_ODP_SUPPORT_SRQ_RECV;
 	}
 }
diff --git a/drivers/infiniband/sw/rxe/rxe_loc.h b/drivers/infiniband/sw/rxe/rxe_loc.h
index 1043240daa97..7a735108d475 100644
--- a/drivers/infiniband/sw/rxe/rxe_loc.h
+++ b/drivers/infiniband/sw/rxe/rxe_loc.h
@@ -188,6 +188,8 @@ extern const struct mmu_interval_notifier_ops rxe_mn_ops;
 #ifdef CONFIG_INFINIBAND_ON_DEMAND_PAGING
 int rxe_odp_mr_init_user(struct rxe_dev *rxe, u64 start, u64 length,
 			 u64 iova, int access_flags, struct rxe_mr *mr);
+int rxe_odp_mr_copy(struct rxe_mr *mr, u64 iova, void *addr, int length,
+		    enum rxe_mr_copy_dir dir);
 #else /* CONFIG_INFINIBAND_ON_DEMAND_PAGING */
 static inline int
 rxe_odp_mr_init_user(struct rxe_dev *rxe, u64 start, u64 length, u64 iova,
@@ -195,6 +197,11 @@ rxe_odp_mr_init_user(struct rxe_dev *rxe, u64 start, u64 length, u64 iova,
 {
 	return -EOPNOTSUPP;
 }
+static inline int rxe_odp_mr_copy(struct rxe_mr *mr, u64 iova, void *addr,
+				  int length, enum rxe_mr_copy_dir dir)
+{
+	return -EOPNOTSUPP;
+}
 #endif /* CONFIG_INFINIBAND_ON_DEMAND_PAGING */
 
 #endif /* RXE_LOC_H */
diff --git a/drivers/infiniband/sw/rxe/rxe_mr.c b/drivers/infiniband/sw/rxe/rxe_mr.c
index 190225a34139..6cd668a8dfb2 100644
--- a/drivers/infiniband/sw/rxe/rxe_mr.c
+++ b/drivers/infiniband/sw/rxe/rxe_mr.c
@@ -324,7 +324,7 @@ int rxe_mr_copy(struct rxe_mr *mr, u64 iova, void *addr,
 	}
 
 	if (mr->umem->is_odp)
-		return -EOPNOTSUPP;
+		return rxe_odp_mr_copy(mr, iova, addr, length, dir);
 	else
 		return rxe_mr_copy_xarray(mr, iova, addr, length, dir);
 }
diff --git a/drivers/infiniband/sw/rxe/rxe_odp.c b/drivers/infiniband/sw/rxe/rxe_odp.c
index 141290078754..d3c67d18c173 100644
--- a/drivers/infiniband/sw/rxe/rxe_odp.c
+++ b/drivers/infiniband/sw/rxe/rxe_odp.c
@@ -122,3 +122,134 @@ int rxe_odp_mr_init_user(struct rxe_dev *rxe, u64 start, u64 length,
 
 	return err;
 }
+
+static inline bool rxe_check_pagefault(struct ib_umem_odp *umem_odp,
+				       u64 iova, int length, u32 perm)
+{
+	bool need_fault = false;
+	u64 addr;
+	int idx;
+
+	addr = iova & (~(BIT(umem_odp->page_shift) - 1));
+
+	/* Skim through all pages that are to be accessed. */
+	while (addr < iova + length) {
+		idx = (addr - ib_umem_start(umem_odp)) >> umem_odp->page_shift;
+
+		if (!(umem_odp->dma_list[idx] & perm)) {
+			need_fault = true;
+			break;
+		}
+
+		addr += BIT(umem_odp->page_shift);
+	}
+	return need_fault;
+}
+
+static int rxe_odp_map_range_and_lock(struct rxe_mr *mr, u64 iova, int length, u32 flags)
+{
+	struct ib_umem_odp *umem_odp = to_ib_umem_odp(mr->umem);
+	bool need_fault;
+	u64 perm;
+	int err;
+
+	if (unlikely(length < 1))
+		return -EINVAL;
+
+	perm = ODP_READ_ALLOWED_BIT;
+	if (!(flags & RXE_PAGEFAULT_RDONLY))
+		perm |= ODP_WRITE_ALLOWED_BIT;
+
+	mutex_lock(&umem_odp->umem_mutex);
+
+	need_fault = rxe_check_pagefault(umem_odp, iova, length, perm);
+	if (need_fault) {
+		mutex_unlock(&umem_odp->umem_mutex);
+
+		/* umem_mutex is locked on success. */
+		err = rxe_odp_do_pagefault_and_lock(mr, iova, length,
+						    flags);
+		if (err < 0)
+			return err;
+
+		need_fault = rxe_check_pagefault(umem_odp, iova, length, perm);
+		if (need_fault)
+			return -EFAULT;
+	}
+
+	return 0;
+}
+
+static int __rxe_odp_mr_copy(struct rxe_mr *mr, u64 iova, void *addr,
+			     int length, enum rxe_mr_copy_dir dir)
+{
+	struct ib_umem_odp *umem_odp = to_ib_umem_odp(mr->umem);
+	struct page *page;
+	int idx, bytes;
+	size_t offset;
+	u8 *user_va;
+
+	idx = (iova - ib_umem_start(umem_odp)) >> umem_odp->page_shift;
+	offset = iova & (BIT(umem_odp->page_shift) - 1);
+
+	while (length > 0) {
+		u8 *src, *dest;
+
+		page = hmm_pfn_to_page(umem_odp->pfn_list[idx]);
+		user_va = kmap_local_page(page);
+		if (!user_va)
+			return -EFAULT;
+
+		src = (dir == RXE_TO_MR_OBJ) ? addr : user_va;
+		dest = (dir == RXE_TO_MR_OBJ) ? user_va : addr;
+
+		bytes = BIT(umem_odp->page_shift) - offset;
+		if (bytes > length)
+			bytes = length;
+
+		memcpy(dest, src, bytes);
+		kunmap_local(user_va);
+
+		length -= bytes;
+		idx++;
+		offset = 0;
+	}
+
+	return 0;
+}
+
+int rxe_odp_mr_copy(struct rxe_mr *mr, u64 iova, void *addr, int length,
+		    enum rxe_mr_copy_dir dir)
+{
+	struct ib_umem_odp *umem_odp = to_ib_umem_odp(mr->umem);
+	u32 flags = 0;
+	int err;
+
+	if (length == 0)
+		return 0;
+
+	if (unlikely(!mr->umem->is_odp))
+		return -EOPNOTSUPP;
+
+	switch (dir) {
+	case RXE_TO_MR_OBJ:
+		break;
+
+	case RXE_FROM_MR_OBJ:
+		flags = RXE_PAGEFAULT_RDONLY;
+		break;
+
+	default:
+		return -EINVAL;
+	}
+
+	err = rxe_odp_map_range_and_lock(mr, iova, length, flags);
+	if (err)
+		return err;
+
+	err = __rxe_odp_mr_copy(mr, iova, addr, length, dir);
+
+	mutex_unlock(&umem_odp->umem_mutex);
+
+	return err;
+}

From patchwork Fri Dec 20 10:09:36 2024
X-Patchwork-Submitter: Daisuke Matsuda
X-Patchwork-Id: 13916459
From: Daisuke Matsuda <matsuda-daisuke@fujitsu.com>
To: linux-rdma@vger.kernel.org, leon@kernel.org, jgg@ziepe.ca,
	zyjzyj2000@gmail.com
Cc: linux-kernel@vger.kernel.org, rpearsonhpe@gmail.com,
	lizhijian@fujitsu.com, Daisuke Matsuda <matsuda-daisuke@fujitsu.com>
Subject: [PATCH for-next v9 5/5] RDMA/rxe: Add support for the traditional
	Atomic operations with ODP
Date: Fri, 20 Dec 2024 19:09:36 +0900
Message-Id: <20241220100936.2193541-6-matsuda-daisuke@fujitsu.com>
In-Reply-To: <20241220100936.2193541-1-matsuda-daisuke@fujitsu.com>
References: <20241220100936.2193541-1-matsuda-daisuke@fujitsu.com>
Enable 'fetch and add' and 'compare and swap' operations to be used with
ODP. This is comprised of the following steps (a userspace sketch follows
the list):
 1. Check the driver page table (umem_odp->dma_list) to see if the target
    page is both readable and writable.
 2. If not, then trigger page fault to map the page.
 3. Convert its user space address to a kernel logical address using PFNs
    in the driver page table (umem_odp->pfn_list).
 4. Execute the operation.
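For reference, the responder path exercised here corresponds to a
userspace atomic work request like the following minimal sketch
(illustrative, not part of the patch; qp, rkey and the local buffer/lkey
are assumed to come from normal RC connection setup):

#include <stdint.h>
#include <string.h>
#include <infiniband/verbs.h>

/* Post an 8-byte fetch-and-add against remote_addr, which must be
 * 8-byte aligned; the previous value lands in the local buffer (result).
 */
static int post_fetch_and_add(struct ibv_qp *qp, uint64_t *result,
			      uint32_t lkey, uint64_t remote_addr,
			      uint32_t rkey, uint64_t add)
{
	struct ibv_sge sge = {
		.addr = (uintptr_t)result,
		.length = sizeof(*result),
		.lkey = lkey,
	};
	struct ibv_send_wr wr, *bad_wr;

	memset(&wr, 0, sizeof(wr));
	wr.opcode = IBV_WR_ATOMIC_FETCH_AND_ADD;
	wr.send_flags = IBV_SEND_SIGNALED;
	wr.sg_list = &sge;
	wr.num_sge = 1;
	wr.wr.atomic.remote_addr = remote_addr;
	wr.wr.atomic.rkey = rkey;
	wr.wr.atomic.compare_add = add;	/* addend for fetch-and-add */

	return ibv_post_send(qp, &wr, &bad_wr);
}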
Signed-off-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com>
---
 drivers/infiniband/sw/rxe/rxe.c      |  1 +
 drivers/infiniband/sw/rxe/rxe_loc.h  | 11 +++++
 drivers/infiniband/sw/rxe/rxe_mr.c   |  2 +-
 drivers/infiniband/sw/rxe/rxe_odp.c  | 69 ++++++++++++++++++++++++++++
 drivers/infiniband/sw/rxe/rxe_resp.c |  5 +-
 5 files changed, 86 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe.c b/drivers/infiniband/sw/rxe/rxe.c
index ea643ebf9667..08c69c637663 100644
--- a/drivers/infiniband/sw/rxe/rxe.c
+++ b/drivers/infiniband/sw/rxe/rxe.c
@@ -90,6 +90,7 @@ static void rxe_init_device_param(struct rxe_dev *rxe)
 		rxe->attr.odp_caps.per_transport_caps.rc_odp_caps |= IB_ODP_SUPPORT_RECV;
 		rxe->attr.odp_caps.per_transport_caps.rc_odp_caps |= IB_ODP_SUPPORT_WRITE;
 		rxe->attr.odp_caps.per_transport_caps.rc_odp_caps |= IB_ODP_SUPPORT_READ;
+		rxe->attr.odp_caps.per_transport_caps.rc_odp_caps |= IB_ODP_SUPPORT_ATOMIC;
 		rxe->attr.odp_caps.per_transport_caps.rc_odp_caps |= IB_ODP_SUPPORT_SRQ_RECV;
 	}
 }
diff --git a/drivers/infiniband/sw/rxe/rxe_loc.h b/drivers/infiniband/sw/rxe/rxe_loc.h
index 7a735108d475..a45ff6236613 100644
--- a/drivers/infiniband/sw/rxe/rxe_loc.h
+++ b/drivers/infiniband/sw/rxe/rxe_loc.h
@@ -81,6 +81,9 @@ int rxe_invalidate_mr(struct rxe_qp *qp, u32 key);
 int rxe_reg_fast_mr(struct rxe_qp *qp, struct rxe_send_wqe *wqe);
 void rxe_mr_cleanup(struct rxe_pool_elem *elem);
 
+/* defined in rxe_mr.c; used in rxe_mr.c and rxe_odp.c */
+extern spinlock_t atomic_ops_lock;
+
 /* rxe_mw.c */
 int rxe_alloc_mw(struct ib_mw *ibmw, struct ib_udata *udata);
 int rxe_dealloc_mw(struct ib_mw *ibmw);
@@ -190,6 +193,8 @@ int rxe_odp_mr_init_user(struct rxe_dev *rxe, u64 start, u64 length,
 			 u64 iova, int access_flags, struct rxe_mr *mr);
 int rxe_odp_mr_copy(struct rxe_mr *mr, u64 iova, void *addr, int length,
 		    enum rxe_mr_copy_dir dir);
+int rxe_odp_atomic_op(struct rxe_mr *mr, u64 iova, int opcode,
+		      u64 compare, u64 swap_add, u64 *orig_val);
 #else /* CONFIG_INFINIBAND_ON_DEMAND_PAGING */
 static inline int
 rxe_odp_mr_init_user(struct rxe_dev *rxe, u64 start, u64 length, u64 iova,
@@ -202,6 +207,12 @@ static inline int rxe_odp_mr_copy(struct rxe_mr *mr, u64 iova, void *addr,
 {
 	return -EOPNOTSUPP;
 }
+static inline int
+rxe_odp_atomic_op(struct rxe_mr *mr, u64 iova, int opcode,
+		  u64 compare, u64 swap_add, u64 *orig_val)
+{
+	return RESPST_ERR_UNSUPPORTED_OPCODE;
+}
 #endif /* CONFIG_INFINIBAND_ON_DEMAND_PAGING */
 
 #endif /* RXE_LOC_H */
diff --git a/drivers/infiniband/sw/rxe/rxe_mr.c b/drivers/infiniband/sw/rxe/rxe_mr.c
index 6cd668a8dfb2..868d2f0b74e9 100644
--- a/drivers/infiniband/sw/rxe/rxe_mr.c
+++ b/drivers/infiniband/sw/rxe/rxe_mr.c
@@ -469,7 +469,7 @@ int rxe_flush_pmem_iova(struct rxe_mr *mr, u64 iova, unsigned int length)
 }
 
 /* Guarantee atomicity of atomic operations at the machine level. */
-static DEFINE_SPINLOCK(atomic_ops_lock);
+DEFINE_SPINLOCK(atomic_ops_lock);
 
 int rxe_mr_do_atomic_op(struct rxe_mr *mr, u64 iova, int opcode,
 			u64 compare, u64 swap_add, u64 *orig_val)
diff --git a/drivers/infiniband/sw/rxe/rxe_odp.c b/drivers/infiniband/sw/rxe/rxe_odp.c
index d3c67d18c173..a82e5011360c 100644
--- a/drivers/infiniband/sw/rxe/rxe_odp.c
+++ b/drivers/infiniband/sw/rxe/rxe_odp.c
@@ -253,3 +253,72 @@ int rxe_odp_mr_copy(struct rxe_mr *mr, u64 iova, void *addr, int length,
 
 	return err;
 }
+
+static int rxe_odp_do_atomic_op(struct rxe_mr *mr, u64 iova, int opcode,
+				u64 compare, u64 swap_add, u64 *orig_val)
+{
+	struct ib_umem_odp *umem_odp = to_ib_umem_odp(mr->umem);
+	unsigned int page_offset;
+	struct page *page;
+	unsigned int idx;
+	u64 value;
+	u64 *va;
+	int err;
+
+	if (unlikely(mr->state != RXE_MR_STATE_VALID)) {
+		rxe_dbg_mr(mr, "mr not in valid state\n");
+		return RESPST_ERR_RKEY_VIOLATION;
+	}
+
+	err = mr_check_range(mr, iova, sizeof(value));
+	if (err) {
+		rxe_dbg_mr(mr, "iova out of range\n");
+		return RESPST_ERR_RKEY_VIOLATION;
+	}
+
+	idx = (iova - ib_umem_start(umem_odp)) >> umem_odp->page_shift;
+	page_offset = iova & (BIT(umem_odp->page_shift) - 1);
+	page = hmm_pfn_to_page(umem_odp->pfn_list[idx]);
+	if (!page)
+		return RESPST_ERR_RKEY_VIOLATION;
+
+	if (unlikely(page_offset & 0x7)) {
+		rxe_dbg_mr(mr, "iova not aligned\n");
+		return RESPST_ERR_MISALIGNED_ATOMIC;
+	}
+
+	va = kmap_local_page(page);
+
+	spin_lock_bh(&atomic_ops_lock);
+	value = *orig_val = va[page_offset >> 3];
+
+	if (opcode == IB_OPCODE_RC_COMPARE_SWAP) {
+		if (value == compare)
+			va[page_offset >> 3] = swap_add;
+	} else {
+		value += swap_add;
+		va[page_offset >> 3] = value;
+	}
+	spin_unlock_bh(&atomic_ops_lock);
+
+	kunmap_local(va);
+
+	return 0;
+}
+
+int rxe_odp_atomic_op(struct rxe_mr *mr, u64 iova, int opcode,
+		      u64 compare, u64 swap_add, u64 *orig_val)
+{
+	struct ib_umem_odp *umem_odp = to_ib_umem_odp(mr->umem);
+	int err;
+
+	err = rxe_odp_map_range_and_lock(mr, iova, sizeof(char), 0);
+	if (err < 0)
+		return err;
+
+	err = rxe_odp_do_atomic_op(mr, iova, opcode, compare, swap_add,
+				   orig_val);
+	mutex_unlock(&umem_odp->umem_mutex);
+
+	return err;
+}
diff --git a/drivers/infiniband/sw/rxe/rxe_resp.c b/drivers/infiniband/sw/rxe/rxe_resp.c
index e703a3ab82d4..54ba9ee1acc5 100644
--- a/drivers/infiniband/sw/rxe/rxe_resp.c
+++ b/drivers/infiniband/sw/rxe/rxe_resp.c
@@ -707,7 +707,10 @@ static enum resp_states atomic_reply(struct rxe_qp *qp,
 		u64 iova = qp->resp.va + qp->resp.offset;
 
 		if (mr->umem->is_odp)
-			err = RESPST_ERR_UNSUPPORTED_OPCODE;
+			err = rxe_odp_atomic_op(mr, iova, pkt->opcode,
+						atmeth_comp(pkt),
+						atmeth_swap_add(pkt),
+						&res->atomic.orig_val);
 		else
 			err = rxe_mr_do_atomic_op(mr, iova, pkt->opcode,
 						  atmeth_comp(pkt),