From patchwork Mon Oct 2 02:30:29 2017
X-Patchwork-Submitter: Long Li
X-Patchwork-Id: 9980157
From: Long Li
To: Steve French, linux-cifs@vger.kernel.org, samba-technical@lists.samba.org,
    linux-kernel@vger.kernel.org, linux-rdma@vger.kernel.org,
    Christoph Hellwig, Tom Talpey, Matthew Wilcox
Cc: Long Li
Subject: [Patch v4 21/22] CIFS: SMBD: Upper layer performs SMB read via RDMA write through memory registration
Date: Sun, 1 Oct 2017 19:30:29 -0700
Message-Id: <20171002023030.3582-22-longli@exchange.microsoft.com>
In-Reply-To: <20171002023030.3582-1-longli@exchange.microsoft.com>
References: <20171002023030.3582-1-longli@exchange.microsoft.com>
X-Mailing-List: linux-cifs@vger.kernel.org

From: Long Li

If the I/O size is larger than rdma_readwrite_threshold, perform the SMB read
via RDMA write: specify channel SMB2_CHANNEL_RDMA_V1 or
SMB2_CHANNEL_RDMA_V1_INVALIDATE in the SMB packet, depending on the SMB
dialect in use. Append a smbd_buffer_descriptor_v1 to the end of the SMB
packet and fill in the remaining fields to indicate that this SMB read uses
RDMA write.

There is no need to read the incoming payload from the transport: by the time
the SMB read response comes back, the data has already been transferred and
placed into the pages by the RDMA hardware.

When the SMB read finishes, deregister the memory regions if RDMA write was
used for this read. smbd_deregister_mr may need to do local invalidation and
sleep, if server remote invalidation is not used. On I/O failure the MID may
never be created; in that case the memory region is deregistered when the
read data context is released.
Signed-off-by: Long Li
---
 fs/cifs/cifsglob.h |  1 +
 fs/cifs/file.c     | 10 ++++++++++
 fs/cifs/smb2pdu.c  | 43 +++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 54 insertions(+)

diff --git a/fs/cifs/cifsglob.h b/fs/cifs/cifsglob.h
index f851b50..30b99a5 100644
--- a/fs/cifs/cifsglob.h
+++ b/fs/cifs/cifsglob.h
@@ -1152,6 +1152,7 @@ struct cifs_readdata {
 				struct cifs_readdata *rdata,
 				struct iov_iter *iter);
 	struct kvec			iov[2];
+	struct smbd_mr			*mr;
 	unsigned int			pagesz;
 	unsigned int			tailsz;
 	unsigned int			credits;
diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index 0786f19..8396f1e 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -42,6 +42,7 @@
 #include "cifs_debug.h"
 #include "cifs_fs_sb.h"
 #include "fscache.h"
+#include "smbdirect.h"
 
 static inline int
 cifs_convert_flags(unsigned int flags)
@@ -2909,6 +2910,11 @@ cifs_readdata_release(struct kref *refcount)
 	struct cifs_readdata *rdata = container_of(refcount,
 					struct cifs_readdata, refcount);
 
+	if (rdata->mr) {
+		smbd_deregister_mr(rdata->mr);
+		rdata->mr = NULL;
+	}
+
 	if (rdata->cfile)
 		cifsFileInfo_put(rdata->cfile);
 
@@ -3037,6 +3043,8 @@ uncached_fill_pages(struct TCP_Server_Info *server,
 		}
 		if (iter)
 			result = copy_page_from_iter(page, 0, n, iter);
+		else if (rdata->mr)
+			result = n;
 		else
 			result = cifs_read_page_from_socket(server, page, n);
 		if (result < 0)
@@ -3606,6 +3614,8 @@ readpages_fill_pages(struct TCP_Server_Info *server,
 		if (iter)
 			result = copy_page_from_iter(page, 0, n, iter);
+		else if (rdata->mr)
+			result = n;
 		else
 			result = cifs_read_page_from_socket(server, page, n);
 		if (result < 0)
diff --git a/fs/cifs/smb2pdu.c b/fs/cifs/smb2pdu.c
index 7053db9..31dcee0 100644
--- a/fs/cifs/smb2pdu.c
+++ b/fs/cifs/smb2pdu.c
@@ -2380,6 +2380,39 @@ smb2_new_read_req(void **buf, unsigned int *total_len,
 	req->Length = cpu_to_le32(io_parms->length);
 	req->Offset = cpu_to_le64(io_parms->offset);
 
+	/*
+	 * If we want to do an RDMA write, fill in and append
+	 * smbd_buffer_descriptor_v1 to the end of the read request
+	 */
+	if (server->rdma && rdata &&
+	    rdata->bytes >= server->smbd_conn->rdma_readwrite_threshold) {
+
+		struct smbd_buffer_descriptor_v1 *v1;
+		bool need_invalidate =
+			io_parms->tcon->ses->server->dialect == SMB30_PROT_ID;
+
+		rdata->mr = smbd_register_mr(
+				server->smbd_conn, rdata->pages,
+				rdata->nr_pages, rdata->tailsz,
+				true, need_invalidate);
+		if (!rdata->mr)
+			return -ENOBUFS;
+
+		req->Channel = SMB2_CHANNEL_RDMA_V1_INVALIDATE;
+		if (need_invalidate)
+			req->Channel = SMB2_CHANNEL_RDMA_V1;
+		req->ReadChannelInfoOffset =
+			cpu_to_le16(offsetof(struct smb2_read_plain_req, Buffer));
+		req->ReadChannelInfoLength =
+			cpu_to_le16(sizeof(struct smbd_buffer_descriptor_v1));
+		v1 = (struct smbd_buffer_descriptor_v1 *) &req->Buffer[0];
+		v1->offset = cpu_to_le64(rdata->mr->mr->iova);
+		v1->token = cpu_to_le32(rdata->mr->mr->rkey);
+		v1->length = cpu_to_le32(rdata->mr->mr->length);
+
+		*total_len += sizeof(*v1) - 1;
+	}
+
 	if (request_type & CHAINED_REQUEST) {
 		if (!(request_type & END_OF_CHAIN)) {
 			/* next 8-byte aligned request */
@@ -2459,6 +2492,16 @@ smb2_readv_callback(struct mid_q_entry *mid)
 		rdata->result = -EIO;
 	}
 
+	/*
+	 * If this rdata has a memory region registered, it can be freed now.
+	 * The MR must be freed as soon as the I/O finishes to avoid deadlock:
+	 * MRs are limited in number and are reused for future I/Os.
+	 */
+	if (rdata->mr) {
+		smbd_deregister_mr(rdata->mr);
+		rdata->mr = NULL;
+	}
+
 	if (rdata->result)
 		cifs_stats_fail_inc(tcon, SMB2_READ_HE);