From patchwork Tue Mar 7 14:15:04 2023
From: Ming Lei <ming.lei@redhat.com>
To: Jens Axboe, io-uring@vger.kernel.org
Cc: linux-block@vger.kernel.org, Miklos Szeredi, ZiyangZhang,
    Xiaoguang Wang, Bernd Schubert, Ming Lei
Subject: [PATCH V2 01/17] io_uring: add IO_URING_F_FUSED and prepare for
 supporting OP_FUSED_CMD
Date: Tue, 7 Mar 2023 22:15:04 +0800
Message-Id: <20230307141520.793891-2-ming.lei@redhat.com>

Add the IO_URING_F_FUSED flags and prepare for supporting
IORING_OP_FUSED_CMD. A fused command is still a type of
IORING_OP_URING_CMD, so it is reasonable to reuse ->uring_cmd() for
handling it; the difference is that a fused command carries one extra
64-byte SQE as payload, which is handled as a slave request. The
master uring command provides a kernel buffer to the slave request.

Mark all existing drivers as not supporting fused commands, since
support depends on whether the driver is capable of handling the slave
request.
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 drivers/block/ublk_drv.c  | 6 ++++++
 drivers/char/mem.c        | 4 ++++
 drivers/nvme/host/ioctl.c | 9 +++++++++
 include/linux/io_uring.h  | 7 +++++++
 4 files changed, 26 insertions(+)

diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
index d1d1c8d606c8..088d9efffe6b 100644
--- a/drivers/block/ublk_drv.c
+++ b/drivers/block/ublk_drv.c
@@ -1271,6 +1271,9 @@ static int ublk_ch_uring_cmd(struct io_uring_cmd *cmd, unsigned int issue_flags)
 			__func__, cmd->cmd_op, ub_cmd->q_id, tag,
 			ub_cmd->result);
 
+	if (issue_flags & IO_URING_F_FUSED)
+		return -EOPNOTSUPP;
+
 	if (ub_cmd->q_id >= ub->dev_info.nr_hw_queues)
 		goto out;
 
@@ -2169,6 +2172,9 @@ static int ublk_ctrl_uring_cmd(struct io_uring_cmd *cmd,
 	struct ublk_device *ub = NULL;
 	int ret = -EINVAL;
 
+	if (issue_flags & IO_URING_F_FUSED)
+		return -EOPNOTSUPP;
+
 	if (issue_flags & IO_URING_F_NONBLOCK)
 		return -EAGAIN;
 
diff --git a/drivers/char/mem.c b/drivers/char/mem.c
index ffb101d349f0..134ba6665194 100644
--- a/drivers/char/mem.c
+++ b/drivers/char/mem.c
@@ -30,6 +30,7 @@
 #include
 #include
 #include
+#include <linux/io_uring.h>
 
 #ifdef CONFIG_IA64
 # include
@@ -482,6 +483,9 @@ static ssize_t splice_write_null(struct pipe_inode_info *pipe, struct file *out,
 static int uring_cmd_null(struct io_uring_cmd *ioucmd, unsigned int issue_flags)
 {
+	if (issue_flags & IO_URING_F_FUSED)
+		return -EOPNOTSUPP;
+
 	return 0;
 }
 
diff --git a/drivers/nvme/host/ioctl.c b/drivers/nvme/host/ioctl.c
index 723e7d5b778f..44a171bcaa90 100644
--- a/drivers/nvme/host/ioctl.c
+++ b/drivers/nvme/host/ioctl.c
@@ -773,6 +773,9 @@ int nvme_ns_chr_uring_cmd(struct io_uring_cmd *ioucmd, unsigned int issue_flags)
 	struct nvme_ns *ns = container_of(file_inode(ioucmd->file)->i_cdev,
 			struct nvme_ns, cdev);
 
+	if (issue_flags & IO_URING_F_FUSED)
+		return -EOPNOTSUPP;
+
 	return nvme_ns_uring_cmd(ns, ioucmd, issue_flags);
 }
 
@@ -878,6 +881,9 @@ int nvme_ns_head_chr_uring_cmd(struct io_uring_cmd *ioucmd,
 	struct nvme_ns *ns = nvme_find_path(head);
 	int ret = -EINVAL;
 
+	if (issue_flags & IO_URING_F_FUSED)
+		return -EOPNOTSUPP;
+
 	if (ns)
 		ret = nvme_ns_uring_cmd(ns, ioucmd, issue_flags);
 	srcu_read_unlock(&head->srcu, srcu_idx);
@@ -915,6 +921,9 @@ int nvme_dev_uring_cmd(struct io_uring_cmd *ioucmd, unsigned int issue_flags)
 	struct nvme_ctrl *ctrl = ioucmd->file->private_data;
 	int ret;
 
+	if (issue_flags & IO_URING_F_FUSED)
+		return -EOPNOTSUPP;
+
 	/* IOPOLL not supported yet */
 	if (issue_flags & IO_URING_F_IOPOLL)
 		return -EOPNOTSUPP;
 
diff --git a/include/linux/io_uring.h b/include/linux/io_uring.h
index 934e5dd4ccc0..0676c5b4b5fe 100644
--- a/include/linux/io_uring.h
+++ b/include/linux/io_uring.h
@@ -20,6 +20,13 @@ enum io_uring_cmd_flags {
 	IO_URING_F_SQE128		= (1 << 8),
 	IO_URING_F_CQE32		= (1 << 9),
 	IO_URING_F_IOPOLL		= (1 << 10),
+
+	/* for FUSED_CMD only */
+	IO_URING_F_FUSED_WRITE		= (1 << 11), /* slave writes to buffer */
+	IO_URING_F_FUSED_READ		= (1 << 12), /* slave reads from buffer */
+	/* driver incapable of FUSED_CMD should fail cmd when seeing F_FUSED */
+	IO_URING_F_FUSED		= IO_URING_F_FUSED_WRITE |
+					  IO_URING_F_FUSED_READ,
 };
 
 struct io_uring_cmd {
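For illustration, the inverse of the checks added above: a driver that
does support fused commands would act on the direction bits instead of
failing the command. This is only a sketch, not part of the series; all
example_* names are hypothetical.

static int example_uring_cmd(struct io_uring_cmd *cmd,
		unsigned int issue_flags)
{
	if (issue_flags & IO_URING_F_FUSED) {
		/*
		 * IO_URING_F_FUSED_WRITE: the slave OP writes into the
		 * master's buffer (device-to-host data); FUSED_READ is
		 * the opposite direction.
		 */
		bool slave_writes = issue_flags & IO_URING_F_FUSED_WRITE;

		if (!example_dir_allowed(cmd, slave_writes))
			return -EACCES;
		return example_handle_fused(cmd, issue_flags);
	}

	/* plain URING_CMD path */
	return example_handle_cmd(cmd, issue_flags);
}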
From patchwork Tue Mar 7 14:15:05 2023
From: Ming Lei <ming.lei@redhat.com>
To: Jens Axboe, io-uring@vger.kernel.org
Cc: linux-block@vger.kernel.org, Miklos Szeredi, ZiyangZhang,
    Xiaoguang Wang, Bernd Schubert, Ming Lei
Subject: [PATCH V2 02/17] io_uring: increase io_kiocb->flags into 64bit
Date: Tue, 7 Mar 2023 22:15:05 +0800
Message-Id: <20230307141520.793891-3-ming.lei@redhat.com>

All 32 bits of io_kiocb->flags have been used up, so extend the field
to 64 bits.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 include/linux/io_uring_types.h | 3 ++-
 io_uring/io_uring.c            | 2 +-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h
index 00689c12f6ab..66fd0f7f1b24 100644
--- a/include/linux/io_uring_types.h
+++ b/include/linux/io_uring_types.h
@@ -530,7 +530,8 @@ struct io_kiocb {
 	 * and after selection it points to the buffer ID itself.
 	 */
 	u16				buf_index;
-	unsigned int			flags;
+	u32				__pad;
+	u64				flags;
 
 	struct io_cqe			cqe;
 
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index fd1cc35a1c00..90a835215d04 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -4418,7 +4418,7 @@ static int __init io_uring_init(void)
 	BUILD_BUG_ON(SQE_COMMON_FLAGS >= (1 << 8));
 	BUILD_BUG_ON((SQE_VALID_FLAGS | SQE_COMMON_FLAGS) != SQE_VALID_FLAGS);
 
-	BUILD_BUG_ON(__REQ_F_LAST_BIT > 8 * sizeof(int));
+	BUILD_BUG_ON(__REQ_F_LAST_BIT > 8 * sizeof(u64));
 
 	BUILD_BUG_ON(sizeof(atomic_t) != sizeof(u32));
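One practical consequence, shown as a minimal illustration (not part of
the patch): flag bits at position 32 and above must be defined with a
64-bit-safe macro, because BIT(n) expands to (1UL << n) and unsigned
long is only 32 bits wide on 32-bit architectures.

/* illustration only: storage and bit macros both need to be 64-bit */
enum {
	EXAMPLE_F_LAST_OLD_BIT	= 31,	/* last bit that fits in a u32 */
	EXAMPLE_F_NEW_BIT	= 32,	/* first bit needing u64 flags */
};

#define EXAMPLE_F_NEW	BIT_ULL(EXAMPLE_F_NEW_BIT)	/* 1ULL << 32 */

struct example_req {
	u64	flags;			/* was: unsigned int */
};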
From patchwork Tue Mar 7 14:15:06 2023
From: Ming Lei <ming.lei@redhat.com>
To: Jens Axboe, io-uring@vger.kernel.org
Cc: linux-block@vger.kernel.org, Miklos Szeredi, ZiyangZhang,
    Xiaoguang Wang, Bernd Schubert, Ming Lei
Subject: [PATCH V2 03/17] io_uring: add IORING_OP_FUSED_CMD
Date: Tue, 7 Mar 2023 22:15:06 +0800
Message-Id: <20230307141520.793891-4-ming.lei@redhat.com>

Add IORING_OP_FUSED_CMD, a special URING_CMD which requires SQE128.
The first SQE (master) is a 64-byte URING_CMD, and the second 64-byte
SQE (slave) is a normal OP. Any OP that needs to support running as a
slave OP has to set io_issue_defs[op].fused_slave to 1, and its
->issue() needs to retrieve the buffer from the master request's
fused_cmd_kbuf.

The key points of the design/implementation:

1) The master uring command produces and provides an immutable command
buffer (struct io_uring_bvec_buf) to the slave request, and the slave
OP can retrieve any part of this buffer via sqe->addr and sqe->len.

2) The master command is always completed after the slave request is
completed.

- Before the slave request is submitted, buffer ownership is
transferred to the slave request. After the slave request is
completed, buffer ownership is returned to the master request.

- This also guarantees correct SQE order, since the master request
uses the slave request's LINK flag.

3) The master request is always completed by the driver, so that the
driver knows when the slave request is done with the buffer.

The motivation is supporting zero copy for fuse/ublk, in which the
device holds the IO request buffer while the IO handling is often a
normal IO OP (fs, net, ...). With IORING_OP_FUSED_CMD, this kind of
zero copy can be implemented easily and reliably.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 include/linux/io_uring.h       |  42 +++++-
 include/linux/io_uring_types.h |  15 +++
 include/uapi/linux/io_uring.h  |   1 +
 io_uring/Makefile              |   2 +-
 io_uring/fused_cmd.c           | 232 +++++++++++++++++++++++++++++++++
 io_uring/fused_cmd.h           |  11 ++
 io_uring/io_uring.c            |  20 ++-
 io_uring/io_uring.h            |   3 +
 io_uring/opdef.c               |  12 ++
 io_uring/opdef.h               |   2 +
 10 files changed, 334 insertions(+), 6 deletions(-)
 create mode 100644 io_uring/fused_cmd.c
 create mode 100644 io_uring/fused_cmd.h

diff --git a/include/linux/io_uring.h b/include/linux/io_uring.h
index 0676c5b4b5fe..f3a23e00b47c 100644
--- a/include/linux/io_uring.h
+++ b/include/linux/io_uring.h
@@ -4,6 +4,7 @@
 #include
 #include
+#include <linux/bvec.h>
 #include
 
 enum io_uring_cmd_flags {
@@ -29,6 +30,19 @@ enum io_uring_cmd_flags {
 		IO_URING_F_FUSED_READ,
 };
 
+union io_uring_fused_cmd_data {
+	/*
+	 * In case of slave request IOSQE_CQE_SKIP_SUCCESS, return slave
+	 * result via master command; otherwise we simply return success
+	 * if buffer is provided, and slave request will return its result
+	 * via its CQE
+	 */
+	s32 slave_res;
+
+	/* fused cmd private, driver do not touch it */
+	struct io_kiocb *__slave;
+};
+
 struct io_uring_cmd {
 	struct file	*file;
 	const void	*cmd;
@@ -40,10 +54,31 @@ struct io_uring_cmd {
 	};
 	u32		cmd_op;
 	u32		flags;
-	u8		pdu[32]; /* available inline for free use */
+
+	/* for fused command, the available pdu is a bit less */
+	union {
+		struct {
+			union io_uring_fused_cmd_data data;
+			u8	pdu[24]; /* available inline for free use */
+		} fused;
+		u8	pdu[32]; /* available inline for free use */
+	};
+};
+
+struct io_uring_bvec_buf {
+	unsigned long	len;
+	unsigned int	nr_bvecs;
+
+	/* offset in the 1st bvec */
+	unsigned int	offset;
+	struct bio_vec	*bvec;
+	struct bio_vec	__bvec[];
 };
 
 #if defined(CONFIG_IO_URING)
+void io_fused_cmd_provide_kbuf(struct io_uring_cmd *ioucmd, bool locked,
+		const struct io_uring_bvec_buf *imu,
+		void (*complete_tw_cb)(struct io_uring_cmd *));
 int io_uring_cmd_import_fixed(u64 ubuf, unsigned long len, int rw,
 			      struct iov_iter *iter, void *ioucmd);
 void io_uring_cmd_done(struct io_uring_cmd *cmd, ssize_t ret, ssize_t res2);
@@ -73,6 +108,11 @@ static inline void io_uring_free(struct task_struct *tsk)
 		__io_uring_free(tsk);
 }
 #else
+static inline void io_fused_cmd_provide_kbuf(struct io_uring_cmd *ioucmd,
+		bool locked, const struct io_uring_bvec_buf *fused_cmd_kbuf,
+		void (*complete_tw_cb)(struct io_uring_cmd *))
+{
+}
 static inline int io_uring_cmd_import_fixed(u64 ubuf, unsigned long len,
 		int rw, struct iov_iter *iter, void *ioucmd)
 {
diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h
index 66fd0f7f1b24..088df87f6e8c 100644
--- a/include/linux/io_uring_types.h
+++ b/include/linux/io_uring_types.h
@@ -401,6 +401,7 @@ enum {
 	/* keep async read/write and isreg together and in order */
 	REQ_F_SUPPORT_NOWAIT_BIT,
 	REQ_F_ISREG_BIT,
+	REQ_F_FUSED_SLAVE_BIT,
 
 	/* not a real bit, just to check we're not overflowing the space */
 	__REQ_F_LAST_BIT,
@@ -470,6 +471,8 @@ enum {
 	REQ_F_CLEAR_POLLIN	= BIT(REQ_F_CLEAR_POLLIN_BIT),
 	/* hashed into ->cancel_hash_locked, protected by ->uring_lock */
 	REQ_F_HASH_LOCKED	= BIT(REQ_F_HASH_LOCKED_BIT),
+	/* slave request in fused cmd, won't be one uring cmd */
+	REQ_F_FUSED_SLAVE	= BIT(REQ_F_FUSED_SLAVE_BIT),
 };
 
 typedef void (*io_req_tw_func_t)(struct io_kiocb *req, bool *locked);
@@ -552,6 +555,18 @@ struct io_kiocb {
 		 * REQ_F_BUFFER_RING is set.
 		 */
 		struct io_buffer_list	*buf_list;
+
+		/*
+		 * store kernel (sub)buffer of fused master request whose OP
+		 * is IORING_OP_FUSED_CMD
+		 */
+		const struct io_uring_bvec_buf *fused_cmd_kbuf;
+
+		/*
+		 * store fused command master request for the fused slave
+		 * request, which uses the fused master's kernel buffer for
+		 * handling this OP
+		 */
+		struct io_kiocb *fused_master_req;
 	};
 
 	union {
diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index 709de6d4feb2..f07d005ee898 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -223,6 +223,7 @@ enum io_uring_op {
 	IORING_OP_URING_CMD,
 	IORING_OP_SEND_ZC,
 	IORING_OP_SENDMSG_ZC,
+	IORING_OP_FUSED_CMD,
 
 	/* this goes last, obviously */
 	IORING_OP_LAST,
diff --git a/io_uring/Makefile b/io_uring/Makefile
index 8cc8e5387a75..5301077e61c5 100644
--- a/io_uring/Makefile
+++ b/io_uring/Makefile
@@ -7,5 +7,5 @@ obj-$(CONFIG_IO_URING)		+= io_uring.o xattr.o nop.o fs.o splice.o \
 					openclose.o uring_cmd.o epoll.o \
 					statx.o net.o msg_ring.o timeout.o \
 					sqpoll.o fdinfo.o tctx.o poll.o \
-					cancel.o kbuf.o rsrc.o rw.o opdef.o notif.o
+					cancel.o kbuf.o rsrc.o rw.o opdef.o notif.o fused_cmd.o
 obj-$(CONFIG_IO_WQ) += io-wq.o
diff --git a/io_uring/fused_cmd.c b/io_uring/fused_cmd.c
new file mode 100644
index 000000000000..9435e67fc47c
--- /dev/null
+++ b/io_uring/fused_cmd.c
@@ -0,0 +1,232 @@
+// SPDX-License-Identifier: GPL-2.0
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+#include
+
+#include "io_uring.h"
+#include "opdef.h"
+#include "rsrc.h"
+#include "uring_cmd.h"
+#include "fused_cmd.h"
+
+static bool io_fused_slave_valid(const struct io_uring_sqe *sqe, u8 op)
+{
+	unsigned int sqe_flags = READ_ONCE(sqe->flags);
+
+	if (op == IORING_OP_FUSED_CMD || op == IORING_OP_URING_CMD)
+		return false;
+
+	if (sqe_flags & REQ_F_BUFFER_SELECT)
+		return false;
+
+	if (!io_issue_defs[op].fused_slave)
+		return false;
+
+	return true;
+}
+
+static inline void io_fused_cmd_update_link_flags(struct io_kiocb *req,
+		const struct io_kiocb *slave)
+{
+	/*
+	 * We have to keep slave SQE in order, so update master link flags
+	 * with slave request's given master command isn't completed until
+	 * the slave request is done
+	 */
+	if (slave->flags & (REQ_F_LINK | REQ_F_HARDLINK))
+		req->flags |= REQ_F_LINK;
+}
+
+int io_fused_cmd_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
+	__must_hold(&req->ctx->uring_lock)
+{
+	struct io_uring_cmd *ioucmd = io_kiocb_to_cmd(req, struct io_uring_cmd);
+	const struct io_uring_sqe *slave_sqe = sqe + 1;
+	struct io_ring_ctx *ctx = req->ctx;
+	struct io_kiocb *slave;
+	u8 slave_op;
+	int ret;
+
+	if (unlikely(!(ctx->flags & IORING_SETUP_SQE128)))
+		return -EINVAL;
+
+	if (unlikely(sqe->__pad1))
+		return -EINVAL;
+
+	ioucmd->flags = READ_ONCE(sqe->uring_cmd_flags);
+	if (unlikely(ioucmd->flags))
+		return -EINVAL;
+
+	slave_op = READ_ONCE(slave_sqe->opcode);
+	if (unlikely(!io_fused_slave_valid(slave_sqe, slave_op)))
+		return -EINVAL;
+
+	ioucmd->cmd = sqe->cmd;
+	ioucmd->cmd_op = READ_ONCE(sqe->cmd_op);
+	req->fused_cmd_kbuf = NULL;
+
+	/* take one extra reference for the slave request */
+	io_get_task_refs(1);
+
+	ret = -ENOMEM;
+	if (unlikely(!io_alloc_req(ctx, &slave)))
+		goto fail;
+
+	ret = io_init_req(ctx, slave, slave_sqe, true);
+	if (unlikely(ret))
+		goto fail_free_req;
+
+	io_fused_cmd_update_link_flags(req, slave);
+
+	ioucmd->fused.data.__slave = slave;
+
+	return 0;
+
+fail_free_req:
+	io_free_req(slave);
+fail:
+	current->io_uring->cached_refs += 1;
+	return ret;
+}
+
+static inline bool io_fused_slave_write_to_buf(u8 op)
+{
+	switch (op) {
+	case IORING_OP_READ:
+	case IORING_OP_READV:
+	case IORING_OP_READ_FIXED:
+	case IORING_OP_RECVMSG:
+	case IORING_OP_RECV:
+		return 1;
+	default:
+		return 0;
+	}
+}
+
+int io_fused_cmd(struct io_kiocb *req, unsigned int issue_flags)
+{
+	struct io_uring_cmd *ioucmd = io_kiocb_to_cmd(req, struct io_uring_cmd);
+	const struct io_kiocb *slave = ioucmd->fused.data.__slave;
+	int ret = -EINVAL;
+
+	/*
+	 * Pass buffer direction for driver to validate if the read/write
+	 * is legal
+	 */
+	if (io_fused_slave_write_to_buf(slave->opcode))
+		issue_flags |= IO_URING_F_FUSED_WRITE;
+	else
+		issue_flags |= IO_URING_F_FUSED_READ;
+
+	ret = io_uring_cmd(req, issue_flags);
+	if (ret != IOU_ISSUE_SKIP_COMPLETE)
+		io_free_req(ioucmd->fused.data.__slave);
+
+	return ret;
+}
+
+int io_import_kbuf_for_slave(unsigned long buf_off, unsigned int len, int dir,
+		struct iov_iter *iter, struct io_kiocb *slave)
+{
+	struct io_kiocb *req = slave->fused_master_req;
+	const struct io_uring_bvec_buf *kbuf;
+	unsigned long offset;
+
+	if (unlikely(!(slave->flags & REQ_F_FUSED_SLAVE) || !req))
+		return -EINVAL;
+
+	if (unlikely(!req->fused_cmd_kbuf))
+		return -EINVAL;
+
+	/* req->fused_cmd_kbuf is immutable */
+	kbuf = req->fused_cmd_kbuf;
+	offset = kbuf->offset;
+
+	if (!kbuf->bvec)
+		return -EINVAL;
+
+	if (unlikely(buf_off > kbuf->len))
+		return -EFAULT;
+
+	if (unlikely(len > kbuf->len - buf_off))
+		return -EFAULT;
+
+	/* don't use io_import_fixed which doesn't support multipage bvec */
+	offset += buf_off;
+	iov_iter_bvec(iter, dir, kbuf->bvec, kbuf->nr_bvecs, offset + len);
+
+	if (offset)
+		iov_iter_advance(iter, offset);
+
+	return 0;
+}
+
+/*
+ * Called when slave request is completed,
+ *
+ * Return back ownership of the fused_cmd kbuf to master request, and
+ * notify master request.
+ */
+void io_fused_cmd_return_kbuf(struct io_kiocb *slave)
+{
+	struct io_kiocb *req = slave->fused_master_req;
+	struct io_uring_cmd *ioucmd;
+
+	if (unlikely(!req || !(slave->flags & REQ_F_FUSED_SLAVE)))
+		return;
+
+	/* return back the buffer */
+	slave->fused_master_req = NULL;
+	ioucmd = io_kiocb_to_cmd(req, struct io_uring_cmd);
+	ioucmd->fused.data.__slave = NULL;
+
+	/* If slave OP skips CQE, return the result via master command */
+	if (slave->flags & REQ_F_CQE_SKIP)
+		ioucmd->fused.data.slave_res = slave->cqe.res;
+	else
+		ioucmd->fused.data.slave_res = 0;
+	io_uring_cmd_complete_in_task(ioucmd, ioucmd->task_work_cb);
+}
+
+/*
+ * This API needs to be called when master command has prepared
+ * FUSED_CMD buffer, and offset/len in ->fused.data is for retrieving
+ * sub-buffer in the command buffer, which is often figured out by
+ * command payload data.
+ *
+ * Master command is always completed after the slave request
+ * is completed, so driver has to set completion callback for
+ * getting notification.
+ *
+ * Ownership of the fused_cmd kbuf is transferred to slave request.
+ */
+void io_fused_cmd_provide_kbuf(struct io_uring_cmd *ioucmd, bool locked,
+		const struct io_uring_bvec_buf *fused_cmd_kbuf,
+		void (*complete_tw_cb)(struct io_uring_cmd *))
+{
+	struct io_kiocb *req = cmd_to_io_kiocb(ioucmd);
+	struct io_kiocb *slave = ioucmd->fused.data.__slave;
+
+	/*
+	 * Once the fused slave request is completed, the driver will
+	 * be notified by callback of complete_tw_cb
+	 */
+	ioucmd->task_work_cb = complete_tw_cb;
+
+	/* now we get the buffer */
+	req->fused_cmd_kbuf = fused_cmd_kbuf;
+	slave->fused_master_req = req;
+
+	trace_io_uring_submit_sqe(slave, true);
+	if (locked)
+		io_req_task_submit(slave, &locked);
+	else
+		io_req_task_queue(slave);
+}
+EXPORT_SYMBOL_GPL(io_fused_cmd_provide_kbuf);
diff --git a/io_uring/fused_cmd.h b/io_uring/fused_cmd.h
new file mode 100644
index 000000000000..86ad87d1b0ec
--- /dev/null
+++ b/io_uring/fused_cmd.h
@@ -0,0 +1,11 @@
+// SPDX-License-Identifier: GPL-2.0
+#ifndef IOU_FUSED_CMD_H
+#define IOU_FUSED_CMD_H
+
+int io_fused_cmd_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe);
+int io_fused_cmd(struct io_kiocb *req, unsigned int issue_flags);
+void io_fused_cmd_return_kbuf(struct io_kiocb *slave);
+int io_import_kbuf_for_slave(unsigned long buf, unsigned int len, int dir,
+		struct iov_iter *iter, struct io_kiocb *slave);
+
+#endif
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 90a835215d04..46c060713928 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -91,6 +91,7 @@
 #include "cancel.h"
 #include "net.h"
 #include "notif.h"
+#include "fused_cmd.h"
 #include "timeout.h"
 #include "poll.h"
 
@@ -110,7 +111,7 @@
 
 #define IO_REQ_CLEAN_FLAGS (REQ_F_BUFFER_SELECTED | REQ_F_NEED_CLEANUP | \
 				REQ_F_POLLED | REQ_F_INFLIGHT | REQ_F_CREDS | \
-				REQ_F_ASYNC_DATA)
+				REQ_F_ASYNC_DATA | REQ_F_FUSED_SLAVE)
 
 #define IO_REQ_CLEAN_SLOW_FLAGS (REQ_F_REFCOUNT | REQ_F_LINK | REQ_F_HARDLINK |\
 				 IO_REQ_CLEAN_FLAGS)
@@ -964,6 +965,9 @@ static void __io_req_complete_post(struct io_kiocb *req)
 {
 	struct io_ring_ctx *ctx = req->ctx;
 
+	if (req->flags & REQ_F_FUSED_SLAVE)
+		io_fused_cmd_return_kbuf(req);
+
 	io_cq_lock(ctx);
 	if (!(req->flags & REQ_F_CQE_SKIP))
 		io_fill_cqe_req(ctx, req);
@@ -1848,6 +1852,8 @@ static void io_clean_op(struct io_kiocb *req)
 		spin_lock(&req->ctx->completion_lock);
 		io_put_kbuf_comp(req);
 		spin_unlock(&req->ctx->completion_lock);
+	} else if (req->flags & REQ_F_FUSED_SLAVE) {
+		io_fused_cmd_return_kbuf(req);
 	}
 
 	if (req->flags & REQ_F_NEED_CLEANUP) {
@@ -2156,8 +2162,8 @@ static void io_init_req_drain(struct io_kiocb *req)
 	}
 }
 
-static int io_init_req(struct io_ring_ctx *ctx, struct io_kiocb *req,
-		       const struct io_uring_sqe *sqe)
+int io_init_req(struct io_ring_ctx *ctx, struct io_kiocb *req,
+		const struct io_uring_sqe *sqe, bool slave)
 	__must_hold(&ctx->uring_lock)
 {
 	const struct io_issue_def *def;
@@ -2210,6 +2216,12 @@ static int io_init_req(struct io_ring_ctx *ctx, struct io_kiocb *req,
 		}
 	}
 
+	if (slave) {
+		if (!def->fused_slave)
+			return -EINVAL;
+		req->flags |= REQ_F_FUSED_SLAVE;
+	}
+
 	if (!def->ioprio && sqe->ioprio)
 		return -EINVAL;
 	if (!def->iopoll && (ctx->flags & IORING_SETUP_IOPOLL))
@@ -2294,7 +2306,7 @@ static inline int io_submit_sqe(struct io_ring_ctx *ctx, struct io_kiocb *req,
 	struct io_submit_link *link = &ctx->submit_state.link;
 	int ret;
 
-	ret = io_init_req(ctx, req, sqe);
+	ret = io_init_req(ctx, req, sqe, false);
 	if (unlikely(ret))
 		return io_submit_fail_init(sqe, req, ret);
 
diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h
index 2711865f1e19..a50c7e1f6e81 100644
--- a/io_uring/io_uring.h
+++ b/io_uring/io_uring.h
@@ -78,6 +78,9 @@ bool __io_alloc_req_refill(struct io_ring_ctx *ctx);
 bool io_match_task_safe(struct io_kiocb *head, struct task_struct *task,
 			bool cancel_all);
 
+int io_init_req(struct io_ring_ctx *ctx, struct io_kiocb *req,
+		const struct io_uring_sqe *sqe, bool slave);
+
 #define io_lockdep_assert_cq_locked(ctx)				\
 	do {								\
 		if (ctx->flags & IORING_SETUP_IOPOLL) {			\
diff --git a/io_uring/opdef.c b/io_uring/opdef.c
index cca7c5b55208..63b90e8e65f8 100644
--- a/io_uring/opdef.c
+++ b/io_uring/opdef.c
@@ -33,6 +33,7 @@
 #include "poll.h"
 #include "cancel.h"
 #include "rw.h"
+#include "fused_cmd.h"
 
 static int io_no_issue(struct io_kiocb *req, unsigned int issue_flags)
 {
@@ -428,6 +429,12 @@ const struct io_issue_def io_issue_defs[] = {
 		.prep			= io_eopnotsupp_prep,
 #endif
 	},
+	[IORING_OP_FUSED_CMD] = {
+		.needs_file		= 1,
+		.plug			= 1,
+		.prep			= io_fused_cmd_prep,
+		.issue			= io_fused_cmd,
+	},
 };
 
@@ -648,6 +655,11 @@ const struct io_cold_def io_cold_defs[] = {
 		.fail			= io_sendrecv_fail,
 #endif
 	},
+	[IORING_OP_FUSED_CMD] = {
+		.name			= "FUSED_CMD",
+		.async_size		= uring_cmd_pdu_size(1),
+		.prep_async		= io_uring_cmd_prep_async,
+	},
 };
 
 const char *io_uring_get_opcode(u8 opcode)
diff --git a/io_uring/opdef.h b/io_uring/opdef.h
index c22c8696e749..306f6fc48ed4 100644
--- a/io_uring/opdef.h
+++ b/io_uring/opdef.h
@@ -29,6 +29,8 @@ struct io_issue_def {
 	unsigned		iopoll_queue : 1;
 	/* opcode specific path will handle ->async_data allocation if needed */
 	unsigned		manual_alloc : 1;
+	/* can be slave op of fused command */
+	unsigned		fused_slave : 1;
 
 	int (*issue)(struct io_kiocb *, unsigned int);
 	int (*prep)(struct io_kiocb *, const struct io_uring_sqe *);
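To show how a driver is expected to consume this API, here is a hedged
sketch of the master-command side; every example_* name is hypothetical
and only the io_uring_* and io_fused_cmd_* calls come from this patch.
The master command wraps the pages backing a device request in an
immutable io_uring_bvec_buf, lends it to the slave request via
io_fused_cmd_provide_kbuf(), and completes itself from the callback
once the slave is done with the buffer.

static void example_fused_done(struct io_uring_cmd *cmd)
{
	/* buffer ownership is back with the master; complete it now */
	io_uring_cmd_done(cmd, cmd->fused.data.slave_res, 0);
}

static int example_issue_fused(struct io_uring_cmd *cmd, bool locked)
{
	struct example_io *io = example_cmd_to_io(cmd);	/* hypothetical */

	/* describe the request pages; must stay immutable while lent */
	io->kbuf.bvec = io->bvecs;
	io->kbuf.nr_bvecs = io->nr_bvecs;
	io->kbuf.offset = io->first_offset;	/* offset in the 1st bvec */
	io->kbuf.len = io->nr_bytes;

	io_fused_cmd_provide_kbuf(cmd, locked, &io->kbuf, example_fused_done);
	return -EIOCBQUEUED;
}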
From patchwork Tue Mar 7 14:15:07 2023
From: Ming Lei <ming.lei@redhat.com>
To: Jens Axboe, io-uring@vger.kernel.org
Cc: linux-block@vger.kernel.org, Miklos Szeredi, ZiyangZhang,
    Xiaoguang Wang, Bernd Schubert, Ming Lei
Subject: [PATCH V2 04/17] io_uring: support OP_READ/OP_WRITE for fused slave
 request
Date: Tue, 7 Mar 2023 22:15:07 +0800
Message-Id: <20230307141520.793891-5-ming.lei@redhat.com>

Allow fused slave requests to support OP_READ/OP_WRITE, with the
buffer retrieved from the master request. Once the slave request is
completed, the master's buffer is returned.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 io_uring/opdef.c |  2 ++
 io_uring/rw.c    | 20 ++++++++++++++++++++
 2 files changed, 22 insertions(+)

diff --git a/io_uring/opdef.c b/io_uring/opdef.c
index 63b90e8e65f8..f044629e5475 100644
--- a/io_uring/opdef.c
+++ b/io_uring/opdef.c
@@ -235,6 +235,7 @@ const struct io_issue_def io_issue_defs[] = {
 		.ioprio			= 1,
 		.iopoll			= 1,
 		.iopoll_queue		= 1,
+		.fused_slave		= 1,
 		.prep			= io_prep_rw,
 		.issue			= io_read,
 	},
@@ -248,6 +249,7 @@ const struct io_issue_def io_issue_defs[] = {
 		.ioprio			= 1,
 		.iopoll			= 1,
 		.iopoll_queue		= 1,
+		.fused_slave		= 1,
 		.prep			= io_prep_rw,
 		.issue			= io_write,
 	},
diff --git a/io_uring/rw.c b/io_uring/rw.c
index 4c233910e200..36d31a943317 100644
--- a/io_uring/rw.c
+++ b/io_uring/rw.c
@@ -19,6 +19,7 @@
 #include "kbuf.h"
 #include "rsrc.h"
 #include "rw.h"
+#include "fused_cmd.h"
 
 struct io_rw {
 	/* NOTE: kiocb has the file as the first member, so don't do it here */
@@ -371,6 +372,17 @@ static struct iovec *__io_import_iovec(int ddir, struct io_kiocb *req,
 	size_t sqe_len;
 	ssize_t ret;
 
+	/*
+	 * SLAVE OP passes buffer offset from sqe->addr actually, since
+	 * the fused cmd kbuf's mapped start address is zero.
+	 */
+	if (req->flags & REQ_F_FUSED_SLAVE) {
+		ret = io_import_kbuf_for_slave(rw->addr, rw->len, ddir, iter, req);
+		if (ret)
+			return ERR_PTR(ret);
+		return NULL;
+	}
+
 	if (opcode == IORING_OP_READ_FIXED || opcode == IORING_OP_WRITE_FIXED) {
 		ret = io_import_fixed(ddir, iter, req->imu, rw->addr, rw->len);
 		if (ret)
@@ -428,11 +440,19 @@ static inline loff_t *io_kiocb_ppos(struct kiocb *kiocb)
  */
 static ssize_t loop_rw_iter(int ddir, struct io_rw *rw, struct iov_iter *iter)
 {
+	struct io_kiocb *req = cmd_to_io_kiocb(rw);
 	struct kiocb *kiocb = &rw->kiocb;
 	struct file *file = kiocb->ki_filp;
 	ssize_t ret = 0;
 	loff_t *ppos;
 
+	/*
+	 * Fused slave req has no user buffer, so ->read/->write can't
+	 * be supported
+	 */
+	if (req->flags & REQ_F_FUSED_SLAVE)
+		return -EOPNOTSUPP;
+
 	/*
 	 * Don't support polled IO through this interface, and we can't
 	 * support non-blocking either. For the latter, this just causes
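With OP_READ/OP_WRITE usable as slave OPs, userspace can already build
a complete fused submission. A hedged liburing sketch follows; it
assumes a ring created with IORING_SETUP_SQE128, and EXAMPLE_FUSED_OP,
dev_fd and backing_fd are placeholders rather than anything defined by
this series.

/* queue a master uring_cmd plus a slave OP_READ in one 128-byte SQE */
static void queue_fused_read(struct io_uring *ring, int dev_fd,
			     int backing_fd, __u64 file_off, __u32 bytes)
{
	/* with SQE128 each slot is 128 bytes; the slave is the 2nd half */
	struct io_uring_sqe *master = io_uring_get_sqe(ring);
	struct io_uring_sqe *slave = master + 1;

	memset(master, 0, 2 * sizeof(*master));

	master->opcode = IORING_OP_FUSED_CMD;
	master->fd = dev_fd;			/* e.g. a ublk char device */
	master->cmd_op = EXAMPLE_FUSED_OP;	/* hypothetical driver cmd */
	/* any driver-specific payload would go into the master's cmd area */

	slave->opcode = IORING_OP_READ;
	slave->fd = backing_fd;
	slave->off = file_off;
	slave->len = bytes;
	/* addr is an offset into the master's kernel buffer, not a
	 * pointer, since the fused kbuf's mapped start address is zero */
	slave->addr = 0;
}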
From patchwork Tue Mar 7 14:15:08 2023
From: Ming Lei <ming.lei@redhat.com>
To: Jens Axboe, io-uring@vger.kernel.org
Cc: linux-block@vger.kernel.org, Miklos Szeredi, ZiyangZhang,
    Xiaoguang Wang, Bernd Schubert, Ming Lei
Subject: [PATCH V2 05/17] io_uring: support OP_SEND_ZC/OP_RECV for fused
 slave request
Date: Tue, 7 Mar 2023 22:15:08 +0800
Message-Id: <20230307141520.793891-6-ming.lei@redhat.com>
Allow fused slave requests to support OP_SEND_ZC/OP_RECV, with the
buffer retrieved from the master request. Once the slave request is
completed, the master's buffer is returned.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 io_uring/net.c   | 23 +++++++++++++++++++++--
 io_uring/opdef.c |  3 +++
 2 files changed, 24 insertions(+), 2 deletions(-)

diff --git a/io_uring/net.c b/io_uring/net.c
index b7f190ca528e..9ec6a77b4e82 100644
--- a/io_uring/net.c
+++ b/io_uring/net.c
@@ -16,6 +16,7 @@
 #include "net.h"
 #include "notif.h"
 #include "rsrc.h"
+#include "fused_cmd.h"
 
 #if defined(CONFIG_NET)
 struct io_shutdown {
@@ -378,7 +379,11 @@ int io_send(struct io_kiocb *req, unsigned int issue_flags)
 	if (unlikely(!sock))
 		return -ENOTSOCK;
 
-	ret = import_ubuf(ITER_SOURCE, sr->buf, sr->len, &msg.msg_iter);
+	if (!(req->flags & REQ_F_FUSED_SLAVE))
+		ret = import_ubuf(ITER_SOURCE, sr->buf, sr->len, &msg.msg_iter);
+	else
+		ret = io_import_kbuf_for_slave((u64)sr->buf, sr->len,
+				ITER_SOURCE, &msg.msg_iter, req);
 	if (unlikely(ret))
 		return ret;
 
@@ -869,7 +874,11 @@ int io_recv(struct io_kiocb *req, unsigned int issue_flags)
 		sr->buf = buf;
 	}
 
-	ret = import_ubuf(ITER_DEST, sr->buf, len, &msg.msg_iter);
+	if (!(req->flags & REQ_F_FUSED_SLAVE))
+		ret = import_ubuf(ITER_DEST, sr->buf, len, &msg.msg_iter);
+	else
+		ret = io_import_kbuf_for_slave((u64)sr->buf, sr->len, ITER_DEST,
+				&msg.msg_iter, req);
 	if (unlikely(ret))
 		goto out_free;
 
@@ -983,6 +992,9 @@ int io_send_zc_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 	if (zc->flags & IORING_RECVSEND_FIXED_BUF) {
 		unsigned idx = READ_ONCE(sqe->buf_index);
 
+		if (req->flags & REQ_F_FUSED_SLAVE)
+			return -EINVAL;
+
 		if (unlikely(idx >= ctx->nr_user_bufs))
 			return -EFAULT;
 		idx = array_index_nospec(idx, ctx->nr_user_bufs);
@@ -1119,8 +1131,15 @@ int io_send_zc(struct io_kiocb *req, unsigned int issue_flags)
 		if (unlikely(ret))
 			return ret;
 		msg.sg_from_iter = io_sg_from_iter;
+	} else if (req->flags & REQ_F_FUSED_SLAVE) {
+		ret = io_import_kbuf_for_slave((u64)zc->buf, zc->len,
+				ITER_SOURCE, &msg.msg_iter, req);
+		if (unlikely(ret))
+			return ret;
+		msg.sg_from_iter = io_sg_from_iter;
 	} else {
 		io_notif_set_extended(zc->notif);
+
 		ret = import_ubuf(ITER_SOURCE, zc->buf, zc->len, &msg.msg_iter);
 		if (unlikely(ret))
 			return ret;
diff --git a/io_uring/opdef.c b/io_uring/opdef.c
index f044629e5475..0a9d39a9db16 100644
--- a/io_uring/opdef.c
+++ b/io_uring/opdef.c
@@ -271,6 +271,7 @@ const struct io_issue_def io_issue_defs[] = {
 		.audit_skip		= 1,
 		.ioprio			= 1,
 		.manual_alloc		= 1,
+		.fused_slave		= 1,
#if defined(CONFIG_NET)
 		.prep			= io_sendmsg_prep,
 		.issue			= io_send,
@@ -285,6 +286,7 @@ const struct io_issue_def io_issue_defs[] = {
 		.buffer_select		= 1,
 		.audit_skip		= 1,
 		.ioprio			= 1,
+		.fused_slave		= 1,
#if defined(CONFIG_NET)
 		.prep			= io_recvmsg_prep,
 		.issue			= io_recv,
@@ -411,6 +413,7 @@ const struct io_issue_def io_issue_defs[] = {
 		.audit_skip		= 1,
 		.ioprio			= 1,
 		.manual_alloc		= 1,
+		.fused_slave		= 1,
#if defined(CONFIG_NET)
 		.prep			= io_send_zc_prep,
 		.issue			= io_send_zc,
From patchwork Tue Mar 7 14:15:09 2023
From: Ming Lei <ming.lei@redhat.com>
To: Jens Axboe, io-uring@vger.kernel.org
Cc: linux-block@vger.kernel.org, Miklos Szeredi, ZiyangZhang,
    Xiaoguang Wang, Bernd Schubert, Ming Lei
Subject: [PATCH V2 06/17] block: ublk_drv: mark device as LIVE before adding
 disk
Date: Tue, 7 Mar 2023 22:15:09 +0800
Message-Id: <20230307141520.793891-7-ming.lei@redhat.com>

IO can be started before add_disk() returns, for example while the
partition table is being read, so the monitor work has to be in place
for making forward progress. Mark the device as LIVE before adding the
disk, and change it to DEAD if add_disk() fails.
Reviewed-by: Ziyang Zhang
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 drivers/block/ublk_drv.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
index 088d9efffe6b..2778c14a8407 100644
--- a/drivers/block/ublk_drv.c
+++ b/drivers/block/ublk_drv.c
@@ -1605,17 +1605,18 @@ static int ublk_ctrl_start_dev(struct ublk_device *ub, struct io_uring_cmd *cmd)
 	set_bit(GD_SUPPRESS_PART_SCAN, &disk->state);
 
 	get_device(&ub->cdev_dev);
+	ub->dev_info.state = UBLK_S_DEV_LIVE;
 	ret = add_disk(disk);
 	if (ret) {
 		/*
 		 * Has to drop the reference since ->free_disk won't be
 		 * called in case of add_disk failure.
 		 */
+		ub->dev_info.state = UBLK_S_DEV_DEAD;
 		ublk_put_device(ub);
 		goto out_put_disk;
 	}
 	set_bit(UB_STATE_USED, &ub->state);
-	ub->dev_info.state = UBLK_S_DEV_LIVE;
 out_put_disk:
 	if (ret)
 		put_disk(disk);

From patchwork Tue Mar 7 14:15:10 2023
From: Ming Lei <ming.lei@redhat.com>
To: Jens Axboe, io-uring@vger.kernel.org
Cc: linux-block@vger.kernel.org, Miklos Szeredi, ZiyangZhang,
    Xiaoguang Wang, Bernd Schubert, Ming Lei
Subject: [PATCH V2 07/17] block: ublk_drv: add common exit handling
Date: Tue, 7 Mar 2023 22:15:10 +0800
Message-Id: <20230307141520.793891-8-ming.lei@redhat.com>
Simplify the exit handling a bit, and prepare for supporting the fused
command.

Reviewed-by: Ziyang Zhang
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 drivers/block/ublk_drv.c | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
index 2778c14a8407..c03728c13eea 100644
--- a/drivers/block/ublk_drv.c
+++ b/drivers/block/ublk_drv.c
@@ -655,14 +655,15 @@ static void ublk_complete_rq(struct request *req)
 	struct ublk_queue *ubq = req->mq_hctx->driver_data;
 	struct ublk_io *io = &ubq->ios[req->tag];
 	unsigned int unmapped_bytes;
+	int res = BLK_STS_OK;
 
 	/* failed read IO if nothing is read */
 	if (!io->res && req_op(req) == REQ_OP_READ)
 		io->res = -EIO;
 
 	if (io->res < 0) {
-		blk_mq_end_request(req, errno_to_blk_status(io->res));
-		return;
+		res = errno_to_blk_status(io->res);
+		goto exit;
 	}
 
 	/*
@@ -671,10 +672,8 @@ static void ublk_complete_rq(struct request *req)
 	 *
 	 * Both the two needn't unmap.
 	 */
-	if (req_op(req) != REQ_OP_READ && req_op(req) != REQ_OP_WRITE) {
-		blk_mq_end_request(req, BLK_STS_OK);
-		return;
-	}
+	if (req_op(req) != REQ_OP_READ && req_op(req) != REQ_OP_WRITE)
+		goto exit;
 
 	/* for READ request, writing data in iod->addr to rq buffers */
 	unmapped_bytes = ublk_unmap_io(ubq, req, io);
@@ -691,6 +690,10 @@ static void ublk_complete_rq(struct request *req)
 		blk_mq_requeue_request(req, true);
 	else
 		__blk_mq_end_request(req, BLK_STS_OK);
+
+	return;
+exit:
+	blk_mq_end_request(req, res);
 }
From patchwork Tue Mar 7 14:15:11 2023
From: Ming Lei <ming.lei@redhat.com>
To: Jens Axboe, io-uring@vger.kernel.org
Cc: linux-block@vger.kernel.org, Miklos Szeredi, ZiyangZhang,
    Xiaoguang Wang, Bernd Schubert, Ming Lei
Subject: [PATCH V2 08/17] block: ublk_drv: don't consider flush request in
 map/unmap io
Date: Tue, 7 Mar 2023 22:15:11 +0800
Message-Id: <20230307141520.793891-9-ming.lei@redhat.com>

A REQ_OP_FLUSH request never carries data, so don't consider it in
either ublk_map_io() or ublk_unmap_io().

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Ziyang Zhang
---
 drivers/block/ublk_drv.c | 10 +++-------
 1 file changed, 3 insertions(+), 7 deletions(-)

diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
index c03728c13eea..624ee4ca20e5 100644
--- a/drivers/block/ublk_drv.c
+++ b/drivers/block/ublk_drv.c
@@ -529,15 +529,13 @@ static int ublk_map_io(const struct ublk_queue *ubq, const struct request *req,
 		struct ublk_io *io)
 {
 	const unsigned int rq_bytes = blk_rq_bytes(req);
+
 	/*
 	 * no zero copy, we delay copy WRITE request data into ublksrv
 	 * context and the big benefit is that pinning pages in current
 	 * context is pretty fast, see ublk_pin_user_pages
 	 */
-	if (req_op(req) != REQ_OP_WRITE && req_op(req) != REQ_OP_FLUSH)
-		return rq_bytes;
-
-	if (ublk_rq_has_data(req)) {
+	if (ublk_rq_has_data(req) && req_op(req) == REQ_OP_WRITE) {
 		struct ublk_map_data data = {
 			.ubq	=	ubq,
 			.rq	=	req,
@@ -772,9 +770,7 @@ static inline void __ublk_rq_task_work(struct request *req)
 		return;
 	}
 
-	if (ublk_need_get_data(ubq) &&
-			(req_op(req) == REQ_OP_WRITE ||
-			 req_op(req) == REQ_OP_FLUSH)) {
+	if (ublk_need_get_data(ubq) && (req_op(req) == REQ_OP_WRITE)) {
 		/*
 		 * We have not handled UBLK_IO_NEED_GET_DATA command yet,
 		 * so immepdately pass UBLK_IO_RES_NEED_GET_DATA to ublksrv
h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=RQpe5yrYiRdAwej0j43axVzdYRPoNAlMjHxJ4W7+COg=; b=FAejS5e/zQ41LRWpigwq5HsOj1fUUq8c8+RJNWjmbnwTHe4G65MfoOEuXB8EK+WJKfHFGj wZLdfM3G1o56/AUvMHCwLjAT2smdVvG1YXRBwHHPDxdyPOYN0Osvd85nL6DPnAe5VqFCm8 6iwAQVz9A8M+C/0Z3acMCqXaP7C4bzc= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-540-8RPNgXGMPl63F7F25p2f2A-1; Tue, 07 Mar 2023 09:16:10 -0500 X-MC-Unique: 8RPNgXGMPl63F7F25p2f2A-1 Received: from smtp.corp.redhat.com (int-mx09.intmail.prod.int.rdu2.redhat.com [10.11.54.9]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 54F5D877CB4; Tue, 7 Mar 2023 14:16:08 +0000 (UTC) Received: from localhost (ovpn-8-16.pek2.redhat.com [10.72.8.16]) by smtp.corp.redhat.com (Postfix) with ESMTP id 835BB492C14; Tue, 7 Mar 2023 14:16:07 +0000 (UTC) From: Ming Lei To: Jens Axboe , io-uring@vger.kernel.org Cc: linux-block@vger.kernel.org, Miklos Szeredi , ZiyangZhang , Xiaoguang Wang , Bernd Schubert , Ming Lei Subject: [PATCH V2 09/17] block: ublk_drv: add two helpers to clean up map/unmap request Date: Tue, 7 Mar 2023 22:15:12 +0800 Message-Id: <20230307141520.793891-10-ming.lei@redhat.com> In-Reply-To: <20230307141520.793891-1-ming.lei@redhat.com> References: <20230307141520.793891-1-ming.lei@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.9 Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org Add two helpers for checking if map/unmap is needed, since we may have passthrough request which needs map or unmap in future, such as for supporting report zones. Meantime don't mark ublk_copy_user_pages as inline since this function is a bit fat now. Signed-off-by: Ming Lei Reviewed-by: Ziyang Zhang --- drivers/block/ublk_drv.c | 19 ++++++++++++++----- 1 file changed, 14 insertions(+), 5 deletions(-) diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c index 624ee4ca20e5..115590bb60b2 100644 --- a/drivers/block/ublk_drv.c +++ b/drivers/block/ublk_drv.c @@ -488,8 +488,7 @@ static inline unsigned ublk_copy_io_pages(struct ublk_io_iter *data, return done; } -static inline int ublk_copy_user_pages(struct ublk_map_data *data, - bool to_vm) +static int ublk_copy_user_pages(struct ublk_map_data *data, bool to_vm) { const unsigned int gup_flags = to_vm ? 
FOLL_WRITE : 0; const unsigned long start_vm = data->io->addr; @@ -525,6 +524,16 @@ static inline int ublk_copy_user_pages(struct ublk_map_data *data, return done; } +static inline bool ublk_need_map_req(const struct request *req) +{ + return ublk_rq_has_data(req) && req_op(req) == REQ_OP_WRITE; +} + +static inline bool ublk_need_unmap_req(const struct request *req) +{ + return ublk_rq_has_data(req) && req_op(req) == REQ_OP_READ; +} + static int ublk_map_io(const struct ublk_queue *ubq, const struct request *req, struct ublk_io *io) { @@ -535,7 +544,7 @@ static int ublk_map_io(const struct ublk_queue *ubq, const struct request *req, * context and the big benefit is that pinning pages in current * context is pretty fast, see ublk_pin_user_pages */ - if (ublk_rq_has_data(req) && req_op(req) == REQ_OP_WRITE) { + if (ublk_need_map_req(req)) { struct ublk_map_data data = { .ubq = ubq, .rq = req, @@ -556,7 +565,7 @@ static int ublk_unmap_io(const struct ublk_queue *ubq, { const unsigned int rq_bytes = blk_rq_bytes(req); - if (req_op(req) == REQ_OP_READ && ublk_rq_has_data(req)) { + if (ublk_need_unmap_req(req)) { struct ublk_map_data data = { .ubq = ubq, .rq = req, @@ -770,7 +779,7 @@ static inline void __ublk_rq_task_work(struct request *req) return; } - if (ublk_need_get_data(ubq) && (req_op(req) == REQ_OP_WRITE)) { + if (ublk_need_get_data(ubq) && ublk_need_map_req(req)) { /* * We have not handled UBLK_IO_NEED_GET_DATA command yet, * so immepdately pass UBLK_IO_RES_NEED_GET_DATA to ublksrv
From patchwork Tue Mar 7 14:15:13 2023
From: Ming Lei
To: Jens Axboe , io-uring@vger.kernel.org
Cc: linux-block@vger.kernel.org, Miklos Szeredi , ZiyangZhang , Xiaoguang Wang , Bernd Schubert , Ming Lei
Subject: [PATCH V2 10/17] block: ublk_drv: clean up several helpers
Date: Tue, 7 Mar 2023 22:15:13 +0800
Message-Id: <20230307141520.793891-11-ming.lei@redhat.com>
In-Reply-To: <20230307141520.793891-1-ming.lei@redhat.com>
References: <20230307141520.793891-1-ming.lei@redhat.com>

Convert the following pattern in several helpers:

    if (Z)
        return true;
    return false;

into:

    return Z;

Signed-off-by: Ming Lei
Reviewed-by: Ziyang Zhang
---
drivers/block/ublk_drv.c | 18 +++++------------- 1 file changed, 5 insertions(+), 13 deletions(-)

diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c index 115590bb60b2..5b69e239dca3 100644 --- a/drivers/block/ublk_drv.c +++ b/drivers/block/ublk_drv.c @@ -298,9 +298,7 @@ static inline bool ublk_can_use_task_work(const struct ublk_queue *ubq) static inline bool ublk_need_get_data(const struct ublk_queue *ubq) { - if (ubq->flags & UBLK_F_NEED_GET_DATA) - return true; - return false; + return ubq->flags & UBLK_F_NEED_GET_DATA; } static struct ublk_device *ublk_get_device(struct ublk_device *ub) @@ -349,25 +347,19 @@ static inline int ublk_queue_cmd_buf_size(struct ublk_device *ub, int q_id) static inline bool ublk_queue_can_use_recovery_reissue( struct ublk_queue *ubq) { - if ((ubq->flags & UBLK_F_USER_RECOVERY) && - (ubq->flags & UBLK_F_USER_RECOVERY_REISSUE)) - return true; - return false; + return (ubq->flags & UBLK_F_USER_RECOVERY) && + (ubq->flags & UBLK_F_USER_RECOVERY_REISSUE); } static inline bool ublk_queue_can_use_recovery( struct ublk_queue *ubq) { - if (ubq->flags & UBLK_F_USER_RECOVERY) - return true; - return false; + return ubq->flags & UBLK_F_USER_RECOVERY; } static inline bool ublk_can_use_recovery(struct ublk_device *ub) { - if (ub->dev_info.flags & UBLK_F_USER_RECOVERY) - return true; - return false; + return ub->dev_info.flags & UBLK_F_USER_RECOVERY; } static void ublk_free_disk(struct gendisk *disk)
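For reference, the shape of the conversion on a hypothetical flag helper (toy code, not from the patch); the result relies on C implicitly normalizing the non-zero bit pattern through the bool return type:

    #include <stdbool.h>

    /* before: verbose boolean pattern */
    static inline bool has_flag_before(unsigned int flags, unsigned int f)
    {
            if (flags & f)
                    return true;
            return false;
    }

    /* after: return the expression directly */
    static inline bool has_flag_after(unsigned int flags, unsigned int f)
    {
            return flags & f;
    }
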
From patchwork Tue Mar 7 14:15:14 2023
From: Ming Lei
To: Jens Axboe , io-uring@vger.kernel.org
Cc: linux-block@vger.kernel.org, Miklos Szeredi , ZiyangZhang , Xiaoguang Wang , Bernd Schubert , Ming Lei
Subject: [PATCH V2 11/17] block: ublk_drv: cleanup 'struct ublk_map_data'
Date: Tue, 7 Mar 2023 22:15:14 +0800
Message-Id: <20230307141520.793891-12-ming.lei@redhat.com>
In-Reply-To: <20230307141520.793891-1-ming.lei@redhat.com>
References: <20230307141520.793891-1-ming.lei@redhat.com>

'struct ublk_map_data' is passed to ublk_copy_user_pages() for copying data between the userspace buffer and request pages. What actually matters here is the userspace buffer address/length and the 'struct request', so replace the ->io field with the user buffer address, and rename max_bytes to len. Meantime remove the 'ubq' field from ublk_map_data, since it isn't used any more.

The code becomes more readable as a result.

Signed-off-by: Ming Lei
Reviewed-by: Ziyang Zhang
---
drivers/block/ublk_drv.c | 27 ++++++++++++--------------- 1 file changed, 12 insertions(+), 15 deletions(-)

diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c index 5b69e239dca3..4449e1013841 100644 --- a/drivers/block/ublk_drv.c +++ b/drivers/block/ublk_drv.c @@ -420,10 +420,9 @@ static const struct block_device_operations ub_fops = { #define UBLK_MAX_PIN_PAGES 32 struct ublk_map_data { - const struct ublk_queue *ubq; const struct request *rq; - const struct ublk_io *io; - unsigned max_bytes; + unsigned long ubuf; + unsigned int len; }; struct ublk_io_iter { @@ -483,14 +482,14 @@ static inline unsigned ublk_copy_io_pages(struct ublk_io_iter *data, static int ublk_copy_user_pages(struct ublk_map_data *data, bool to_vm) { const unsigned int gup_flags = to_vm ?
FOLL_WRITE : 0; - const unsigned long start_vm = data->io->addr; + const unsigned long start_vm = data->ubuf; unsigned int done = 0; struct ublk_io_iter iter = { .pg_off = start_vm & (PAGE_SIZE - 1), .bio = data->rq->bio, .iter = data->rq->bio->bi_iter, }; - const unsigned int nr_pages = round_up(data->max_bytes + + const unsigned int nr_pages = round_up(data->len + (start_vm & (PAGE_SIZE - 1)), PAGE_SIZE) >> PAGE_SHIFT; while (done < nr_pages) { @@ -503,13 +502,13 @@ static int ublk_copy_user_pages(struct ublk_map_data *data, bool to_vm) iter.pages); if (iter.nr_pages <= 0) return done == 0 ? iter.nr_pages : done; - len = ublk_copy_io_pages(&iter, data->max_bytes, to_vm); + len = ublk_copy_io_pages(&iter, data->len, to_vm); for (i = 0; i < iter.nr_pages; i++) { if (to_vm) set_page_dirty(iter.pages[i]); put_page(iter.pages[i]); } - data->max_bytes -= len; + data->len -= len; done += iter.nr_pages; } @@ -538,15 +537,14 @@ static int ublk_map_io(const struct ublk_queue *ubq, const struct request *req, */ if (ublk_need_map_req(req)) { struct ublk_map_data data = { - .ubq = ubq, .rq = req, - .io = io, - .max_bytes = rq_bytes, + .ubuf = io->addr, + .len = rq_bytes, }; ublk_copy_user_pages(&data, true); - return rq_bytes - data.max_bytes; + return rq_bytes - data.len; } return rq_bytes; } @@ -559,17 +557,16 @@ static int ublk_unmap_io(const struct ublk_queue *ubq, { const unsigned int rq_bytes = blk_rq_bytes(req); if (ublk_need_unmap_req(req)) { struct ublk_map_data data = { - .ubq = ubq, .rq = req, - .io = io, - .max_bytes = io->res, + .ubuf = io->addr, + .len = io->res, }; WARN_ON_ONCE(io->res > rq_bytes); ublk_copy_user_pages(&data, false); - return io->res - data.max_bytes; + return io->res - data.len; } return rq_bytes; }
From patchwork Tue Mar 7 14:15:15 2023
From: Ming Lei
To: Jens Axboe , io-uring@vger.kernel.org
Cc: linux-block@vger.kernel.org, Miklos Szeredi , ZiyangZhang , Xiaoguang Wang , Bernd Schubert , Ming Lei
Subject: [PATCH V2 12/17] block: ublk_drv: cleanup ublk_copy_user_pages
Date: Tue, 7 Mar 2023 22:15:15 +0800
Message-Id: <20230307141520.793891-13-ming.lei@redhat.com>
In-Reply-To: <20230307141520.793891-1-ming.lei@redhat.com>
References: <20230307141520.793891-1-ming.lei@redhat.com>

Clean up ublk_copy_user_pages() by using iov_iter: the code gets simplified a lot and becomes much more readable than before.

Signed-off-by: Ming Lei
Reviewed-by: Ziyang Zhang
---
drivers/block/ublk_drv.c | 112 +++++++++++++++++---------------- 1 file changed, 49 insertions(+), 63 deletions(-)

diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c index 4449e1013841..f07eb8a54357 100644 --- a/drivers/block/ublk_drv.c +++ b/drivers/block/ublk_drv.c @@ -419,49 +419,39 @@ static const struct block_device_operations ub_fops = { #define UBLK_MAX_PIN_PAGES 32 -struct ublk_map_data { - const struct request *rq; - unsigned long ubuf; - unsigned int len; -}; - struct ublk_io_iter { struct page *pages[UBLK_MAX_PIN_PAGES]; - unsigned pg_off; /* offset in the 1st page in pages */ - int nr_pages; /* how many page pointers in pages */ struct bio *bio; struct bvec_iter iter; }; -static inline unsigned ublk_copy_io_pages(struct ublk_io_iter *data, - unsigned max_bytes, bool to_vm) +/* return how many pages are copied */ +static void ublk_copy_io_pages(struct ublk_io_iter *data, + size_t total, size_t pg_off, int dir) { - const unsigned total = min_t(unsigned, max_bytes, - PAGE_SIZE - data->pg_off + - ((data->nr_pages - 1) << PAGE_SHIFT)); unsigned done = 0; unsigned pg_idx = 0; while (done < total) { struct bio_vec bv = bio_iter_iovec(data->bio, data->iter); - const unsigned int bytes = min3(bv.bv_len, total - done, - (unsigned)(PAGE_SIZE - data->pg_off)); + unsigned int bytes = min3(bv.bv_len, (unsigned)total - done, + (unsigned)(PAGE_SIZE - pg_off)); void *bv_buf = bvec_kmap_local(&bv); void *pg_buf = kmap_local_page(data->pages[pg_idx]); - if (to_vm) - memcpy(pg_buf + data->pg_off, bv_buf, bytes); + if (dir == ITER_DEST) + memcpy(pg_buf + pg_off, bv_buf, bytes); else - memcpy(bv_buf, pg_buf + data->pg_off, bytes); + memcpy(bv_buf, pg_buf + pg_off, bytes); kunmap_local(pg_buf); kunmap_local(bv_buf); /* advance page array */ - data->pg_off += bytes; - if (data->pg_off == PAGE_SIZE) { + pg_off += bytes; + if (pg_off == PAGE_SIZE) { pg_idx += 1; - data->pg_off = 0; + pg_off = 0; } done += bytes; @@ -475,41 +465,40 @@ static inline unsigned ublk_copy_io_pages(struct ublk_io_iter *data, data->iter = data->bio->bi_iter; } } - - return done; } -static int ublk_copy_user_pages(struct ublk_map_data *data, bool to_vm) +/* + * Copy data between request pages and io_iter, and 'offset' + * is the start point of linear offset of request.
+ */ +static size_t ublk_copy_user_pages(const struct request *req, + struct iov_iter *uiter, int dir) { - const unsigned int gup_flags = to_vm ? FOLL_WRITE : 0; - const unsigned long start_vm = data->ubuf; - unsigned int done = 0; struct ublk_io_iter iter = { - .pg_off = start_vm & (PAGE_SIZE - 1), - .bio = data->rq->bio, - .iter = data->rq->bio->bi_iter, + .bio = req->bio, + .iter = req->bio->bi_iter, }; - const unsigned int nr_pages = round_up(data->len + - (start_vm & (PAGE_SIZE - 1)), PAGE_SIZE) >> PAGE_SHIFT; - - while (done < nr_pages) { - const unsigned to_pin = min_t(unsigned, UBLK_MAX_PIN_PAGES, - nr_pages - done); - unsigned i, len; - - iter.nr_pages = get_user_pages_fast(start_vm + - (done << PAGE_SHIFT), to_pin, gup_flags, - iter.pages); - if (iter.nr_pages <= 0) - return done == 0 ? iter.nr_pages : done; - len = ublk_copy_io_pages(&iter, data->len, to_vm); - for (i = 0; i < iter.nr_pages; i++) { - if (to_vm) + size_t done = 0; + + while (iov_iter_count(uiter) && iter.bio) { + unsigned nr_pages; + size_t len, off; + int i; + + len = iov_iter_get_pages2(uiter, iter.pages, + iov_iter_count(uiter), + UBLK_MAX_PIN_PAGES, &off); + if (len <= 0) + return done; + + ublk_copy_io_pages(&iter, len, off, dir); + nr_pages = DIV_ROUND_UP(len + off, PAGE_SIZE); + for (i = 0; i < nr_pages; i++) { + if (dir == ITER_DEST) set_page_dirty(iter.pages[i]); put_page(iter.pages[i]); } - data->len -= len; - done += iter.nr_pages; + done += len; } return done; @@ -536,15 +525,14 @@ static int ublk_map_io(const struct ublk_queue *ubq, const struct request *req, * context is pretty fast, see ublk_pin_user_pages */ if (ublk_need_map_req(req)) { - struct ublk_map_data data = { - .rq = req, - .ubuf = io->addr, - .len = rq_bytes, - }; + struct iov_iter iter; + struct iovec iov; + const int dir = ITER_DEST; - ublk_copy_user_pages(&data, true); + import_single_range(dir, (void __user *)io->addr, + rq_bytes, &iov, &iter); - return rq_bytes - data.len; + return ublk_copy_user_pages(req, &iter, dir); } return rq_bytes; } @@ -556,17 +544,15 @@ static int ublk_unmap_io(const struct ublk_queue *ubq, const unsigned int rq_bytes = blk_rq_bytes(req); if (ublk_need_unmap_req(req)) { - struct ublk_map_data data = { - .rq = req, - .ubuf = io->addr, - .len = io->res, - }; + struct iov_iter iter; + struct iovec iov; + const int dir = ITER_SOURCE; WARN_ON_ONCE(io->res > rq_bytes); - ublk_copy_user_pages(&data, false); - - return io->res - data.len; + import_single_range(dir, (void __user *)io->addr, + io->res, &iov, &iter); + return ublk_copy_user_pages(req, &iter, dir); } return rq_bytes; }
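The pattern the patch adopts, shown as a condensed kernel-style sketch (illustrative only; toy_walk_user_range() and the batch size of 32 are assumptions, while import_single_range() and iov_iter_get_pages2() are the real interfaces used above):

    #include <linux/uio.h>
    #include <linux/mm.h>

    static size_t toy_walk_user_range(void __user *ubuf, size_t len, int dir)
    {
            struct page *pages[32];
            struct iovec iov;
            struct iov_iter iter;
            size_t done = 0;

            /* dir is ITER_DEST or ITER_SOURCE */
            if (import_single_range(dir, ubuf, len, &iov, &iter))
                    return 0;

            while (iov_iter_count(&iter)) {
                    size_t off;
                    ssize_t n;
                    int i;

                    /* pins up to 32 user pages and advances the iterator */
                    n = iov_iter_get_pages2(&iter, pages, iov_iter_count(&iter),
                                            ARRAY_SIZE(pages), &off);
                    if (n <= 0)
                            break;
                    /* ... copy n bytes, starting at pages[0] + off ... */
                    for (i = 0; i < DIV_ROUND_UP(n + off, PAGE_SIZE); i++) {
                            if (dir == ITER_DEST)
                                    set_page_dirty(pages[i]);
                            put_page(pages[i]);
                    }
                    done += n;
            }
            return done;
    }
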
From patchwork Tue Mar 7 14:15:16 2023
From: Ming Lei
To: Jens Axboe , io-uring@vger.kernel.org
Cc: linux-block@vger.kernel.org, Miklos Szeredi , ZiyangZhang , Xiaoguang Wang , Bernd Schubert , Ming Lei
Subject: [PATCH V2 13/17] block: ublk_drv: grab request reference when the request is handled by userspace
Date: Tue, 7 Mar 2023 22:15:16 +0800
Message-Id: <20230307141520.793891-14-ming.lei@redhat.com>
In-Reply-To: <20230307141520.793891-1-ming.lei@redhat.com>
References: <20230307141520.793891-1-ming.lei@redhat.com>

Add a reference counter to the request pdu data and hold the reference for the request's lifetime; this is always safe. In theory the ublk request won't be completed until all fused commands are done, but the other side is userspace, and an application can submit fused commands at will.

Prepare for supporting zero copy, which needs to retrieve the request buffer via fused command, so we have to guarantee:

- a fused command can't succeed unless the request is still queued

- once any fused command succeeds, the request can't be freed until all fused commands on this request are done
Signed-off-by: Ming Lei --- drivers/block/ublk_drv.c | 67 ++++++++++++++++++++++++++++++++++++++-- 1 file changed, 64 insertions(+), 3 deletions(-) diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c index f07eb8a54357..ba252932951c 100644 --- a/drivers/block/ublk_drv.c +++ b/drivers/block/ublk_drv.c @@ -43,6 +43,7 @@ #include #include #include +#include #include #define UBLK_MINORS (1U << MINORBITS) @@ -62,6 +63,17 @@ struct ublk_rq_data { struct llist_node node; struct callback_head work; + + /* + * Only for applying fused command to support zero copy: + * + * - if there is any fused command aiming at this request, not complete + * request until all fused commands are done + * + * - fused command has to fail unless this reference is grabbed + * successfully + */ + struct kref ref; }; struct ublk_uring_cmd_pdu { @@ -180,6 +192,9 @@ struct ublk_params_header { __u32 types; }; +static inline void __ublk_complete_rq(struct request *req); +static void ublk_complete_rq(struct kref *ref); + static dev_t ublk_chr_devt; static struct class *ublk_chr_class; @@ -288,6 +303,35 @@ static int ublk_apply_params(struct ublk_device *ub) return 0; } +static inline bool ublk_support_zc(const struct ublk_queue *ubq) +{ + return ubq->flags & UBLK_F_SUPPORT_ZERO_COPY; +} + +static inline bool ublk_get_req_ref(const struct ublk_queue *ubq, + struct request *req) +{ + if (ublk_support_zc(ubq)) { + struct ublk_rq_data *data = blk_mq_rq_to_pdu(req); + + return kref_get_unless_zero(&data->ref); + } + + return true; +} + +static inline void ublk_put_req_ref(const struct ublk_queue *ubq, + struct request *req) +{ + if (ublk_support_zc(ubq)) { + struct ublk_rq_data *data = blk_mq_rq_to_pdu(req); + + kref_put(&data->ref, ublk_complete_rq); + } else { + __ublk_complete_rq(req); + } +} + static inline bool ublk_can_use_task_work(const struct ublk_queue *ubq) { if (IS_BUILTIN(CONFIG_BLK_DEV_UBLK) && @@ -632,13 +676,19 @@ static inline bool ubq_daemon_is_dying(struct ublk_queue *ubq) } /* todo: handle partial completion */ -static void ublk_complete_rq(struct request *req) +static inline void __ublk_complete_rq(struct request *req) { struct ublk_queue *ubq = req->mq_hctx->driver_data; struct ublk_io *io = &ubq->ios[req->tag]; unsigned int unmapped_bytes; int res = BLK_STS_OK; + /* called from ublk_abort_queue() code path */ + if (io->flags & UBLK_IO_FLAG_ABORTED) { + res = BLK_STS_IOERR; + goto exit; + } + /* failed read IO if nothing is read */ if (!io->res && req_op(req) == REQ_OP_READ) io->res = -EIO; @@ -678,6 +728,15 @@ static void ublk_complete_rq(struct request *req) blk_mq_end_request(req, res); } +static void ublk_complete_rq(struct kref *ref) +{ + struct ublk_rq_data *data = container_of(ref, struct ublk_rq_data, + ref); + struct request *req = blk_mq_rq_from_pdu(data); + + __ublk_complete_rq(req); +} + /* * Since __ublk_rq_task_work always fails requests immediately during * exiting, __ublk_fail_req() is only called from abort context during @@ -696,7 +755,7 @@ static void __ublk_fail_req(struct ublk_queue *ubq, struct ublk_io *io, if (ublk_queue_can_use_recovery_reissue(ubq)) blk_mq_requeue_request(req, false); else - blk_mq_end_request(req, BLK_STS_IOERR); + ublk_put_req_ref(ubq, req); } } @@ -732,6 +791,7 @@ static inline void __ublk_abort_rq(struct ublk_queue *ubq, static inline void __ublk_rq_task_work(struct request *req) { struct ublk_queue *ubq = req->mq_hctx->driver_data; + struct ublk_rq_data *data = blk_mq_rq_to_pdu(req); int tag = req->tag; struct ublk_io *io = &ubq->ios[tag]; unsigned 
int mapped_bytes; @@ -803,6 +863,7 @@ static inline void __ublk_rq_task_work(struct request *req) mapped_bytes >> 9; } + kref_init(&data->ref); ubq_complete_io_cmd(io, UBLK_IO_RES_OK); } @@ -1013,7 +1074,7 @@ static void ublk_commit_completion(struct ublk_device *ub, req = blk_mq_tag_to_rq(ub->tag_set.tags[qid], tag); if (req && likely(!blk_should_fake_timeout(req->q))) - ublk_complete_rq(req); + ublk_put_req_ref(ubq, req); } /*
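The lifetime rule above is the standard kref pattern; a condensed sketch with toy names (the real code attaches the kref to the request pdu, as the diff shows):

    #include <linux/kref.h>

    struct toy_rq_data {
            struct kref ref;
    };

    static void toy_release(struct kref *ref)
    {
            /* container_of(ref, struct toy_rq_data, ref): last reference
             * gone, so it is now safe to complete/free the request */
    }

    /* when the request is handed to the ublk server: refcount = 1 */
    static void toy_start(struct toy_rq_data *data)
    {
            kref_init(&data->ref);
    }

    /* a fused command borrowing the buffer; fails once teardown began */
    static bool toy_get(struct toy_rq_data *data)
    {
            return kref_get_unless_zero(&data->ref);
    }

    /* request completion and every fused command drop through here */
    static void toy_put(struct toy_rq_data *data)
    {
            kref_put(&data->ref, toy_release);
    }
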
From patchwork Tue Mar 7 14:15:17 2023
From: Ming Lei
To: Jens Axboe , io-uring@vger.kernel.org
Cc: linux-block@vger.kernel.org, Miklos Szeredi , ZiyangZhang , Xiaoguang Wang , Bernd Schubert , Ming Lei
Subject: [PATCH V2 14/17] block: ublk_drv: support to copy any part of request pages
Date: Tue, 7 Mar 2023 22:15:17 +0800
Message-Id: <20230307141520.793891-15-ming.lei@redhat.com>
In-Reply-To: <20230307141520.793891-1-ming.lei@redhat.com>
References: <20230307141520.793891-1-ming.lei@redhat.com>

Add an 'offset' parameter to ublk_copy_user_pages(), so that it can be used to copy any linearly-mapped sub-buffer of the request.

Signed-off-by: Ming Lei
---
drivers/block/ublk_drv.c | 26 ++++++++++++++++++++++---- 1 file changed, 22 insertions(+), 4 deletions(-)

diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c index ba252932951c..c0cb99cb0c3c 100644 --- a/drivers/block/ublk_drv.c +++ b/drivers/block/ublk_drv.c @@ -511,19 +511,37 @@ static void ublk_copy_io_pages(struct ublk_io_iter *data, } } +static bool ublk_advance_io_iter(struct ublk_io_iter *iter, unsigned int offset) +{ + struct bio *bio = iter->bio; + + for_each_bio(bio) { + if (bio->bi_iter.bi_size > offset) { + iter->bio = bio; + iter->iter = bio->bi_iter; + bio_advance_iter(iter->bio, &iter->iter, offset); + return true; + } + offset -= bio->bi_iter.bi_size; + } + return false; +} + /* * Copy data between request pages and io_iter, and 'offset' * is the start point of linear offset of request. */ static size_t ublk_copy_user_pages(const struct request *req, - struct iov_iter *uiter, int dir) + unsigned offset, struct iov_iter *uiter, int dir) { struct ublk_io_iter iter = { .bio = req->bio, - .iter = req->bio->bi_iter, }; size_t done = 0; + if (!ublk_advance_io_iter(&iter, offset)) + return 0; + while (iov_iter_count(uiter) && iter.bio) { unsigned nr_pages; size_t len, off; @@ -576,7 +594,7 @@ static int ublk_map_io(const struct ublk_queue *ubq, const struct request *req, import_single_range(dir, (void __user *)io->addr, rq_bytes, &iov, &iter); - return ublk_copy_user_pages(req, &iter, dir); + return ublk_copy_user_pages(req, 0, &iter, dir); } return rq_bytes; } @@ -596,7 +614,7 @@ static int ublk_unmap_io(const struct ublk_queue *ubq, import_single_range(dir, (void __user *)io->addr, io->res, &iov, &iter); - return ublk_copy_user_pages(req, &iter, dir); + return ublk_copy_user_pages(req, 0, &iter, dir); } return rq_bytes; }
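A worked example of the new offset handling (toy types, illustrative only): with a bio chain of 4096 + 8192 bytes, offset 6144 skips the first bio and lands 6144 - 4096 = 2048 bytes into the second one, which is exactly what ublk_advance_io_iter() computes:

    #include <stdbool.h>

    struct toy_bio {
            unsigned int size;      /* plays the role of bi_iter.bi_size */
            struct toy_bio *next;
    };

    /* returns true with *bio and *offset pointing into the right bio */
    static bool toy_advance(struct toy_bio **bio, unsigned int *offset)
    {
            for (; *bio; *bio = (*bio)->next) {
                    if ((*bio)->size > *offset)
                            return true;
                    *offset -= (*bio)->size;
            }
            return false;   /* offset is past the end of the chain */
    }
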
From patchwork Tue Mar 7 14:15:18 2023
From: Ming Lei
To: Jens Axboe , io-uring@vger.kernel.org
Cc: linux-block@vger.kernel.org, Miklos Szeredi , ZiyangZhang , Xiaoguang Wang , Bernd Schubert , Ming Lei
Subject: [PATCH V2 15/17] block: ublk_drv: add read()/write() support for ublk char device
Date: Tue, 7 Mar 2023 22:15:18 +0800
Message-Id: <20230307141520.793891-16-ming.lei@redhat.com>
In-Reply-To: <20230307141520.793891-1-ming.lei@redhat.com>
References: <20230307141520.793891-1-ming.lei@redhat.com>

We are going to support zero copy via the fused uring command, after which userspace can no longer read from or write to the io buffer directly. That is inflexible for applications:

1) some targets need to zero the buffer explicitly, such as when reading an unmapped qcow2 cluster

2) some targets need to support passthrough commands, such as zoned report zones, and still need to read/write the io buffer

Support pread()/pwrite() on the ublk char device for reading from/writing to the request io buffer, so the ublk server can handle the above cases easily.

This also helps make zero copy the primary option, with non-zero-copy becoming a legacy code path, since the added read()/write() support covers the non-zero-copy case.
Signed-off-by: Ming Lei --- drivers/block/ublk_drv.c | 131 ++++++++++++++++++++++++++++++++++ include/uapi/linux/ublk_cmd.h | 31 +++++++- 2 files changed, 161 insertions(+), 1 deletion(-) diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c index c0cb99cb0c3c..452deec571f7 100644 --- a/drivers/block/ublk_drv.c +++ b/drivers/block/ublk_drv.c @@ -1318,6 +1318,36 @@ static void ublk_handle_need_get_data(struct ublk_device *ub, int q_id, ublk_queue_cmd(ubq, req); } +static inline struct request *__ublk_check_and_get_req(struct ublk_device *ub, + struct ublk_queue *ubq, int tag, size_t offset) +{ + struct request *req; + + if (!ublk_support_zc(ubq)) + return NULL; + + req = blk_mq_tag_to_rq(ub->tag_set.tags[ubq->q_id], tag); + if (!req) + return NULL; + + if (!ublk_get_req_ref(ubq, req)) + return NULL; + + if (unlikely(!blk_mq_request_started(req) || req->tag != tag)) + goto fail_put; + + if (!ublk_rq_has_data(req)) + goto fail_put; + + if (offset > blk_rq_bytes(req)) + goto fail_put; + + return req; +fail_put: + ublk_put_req_ref(ubq, req); + return NULL; +} + static int ublk_ch_uring_cmd(struct io_uring_cmd *cmd, unsigned int issue_flags) { struct ublksrv_io_cmd *ub_cmd = (struct ublksrv_io_cmd *)cmd->cmd; @@ -1422,11 +1452,112 @@ static int ublk_ch_uring_cmd(struct io_uring_cmd *cmd, unsigned int issue_flags) return -EIOCBQUEUED; } +static inline bool ublk_check_ubuf_dir(const struct request *req, + int ubuf_dir) +{ + /* copy ubuf to request pages */ + if (req_op(req) == REQ_OP_READ && ubuf_dir == ITER_SOURCE) + return true; + + /* copy request pages to ubuf */ + if (req_op(req) == REQ_OP_WRITE && ubuf_dir == ITER_DEST) + return true; + + return false; +} + +static struct request *ublk_check_and_get_req(struct kiocb *iocb, + struct iov_iter *iter, size_t *off, int dir) +{ + struct ublk_device *ub = iocb->ki_filp->private_data; + struct ublk_queue *ubq; + struct request *req; + size_t buf_off; + u16 tag, q_id; + + if (!ub) + return ERR_PTR(-EACCES); + + if (!user_backed_iter(iter)) + return ERR_PTR(-EACCES); + + if (ub->dev_info.state == UBLK_S_DEV_DEAD) + return ERR_PTR(-EACCES); + + tag = ublk_pos_to_tag(iocb->ki_pos); + q_id = ublk_pos_to_hwq(iocb->ki_pos); + buf_off = ublk_pos_to_buf_offset(iocb->ki_pos); + + if (q_id >= ub->dev_info.nr_hw_queues) + return ERR_PTR(-EINVAL); + + ubq = ublk_get_queue(ub, q_id); + if (!ubq) + return ERR_PTR(-EINVAL); + + if (tag >= ubq->q_depth) + return ERR_PTR(-EINVAL); + + req = __ublk_check_and_get_req(ub, ubq, tag, buf_off); + if (!req) + return ERR_PTR(-EINVAL); + + if (!req->mq_hctx || !req->mq_hctx->driver_data) + goto fail; + + if (!ublk_check_ubuf_dir(req, dir)) + goto fail; + + *off = buf_off; + return req; +fail: + ublk_put_req_ref(ubq, req); + return ERR_PTR(-EACCES); +} + +static ssize_t ublk_ch_read_iter(struct kiocb *iocb, struct iov_iter *to) +{ + struct ublk_queue *ubq; + struct request *req; + size_t buf_off; + size_t ret; + + req = ublk_check_and_get_req(iocb, to, &buf_off, ITER_DEST); + if (unlikely(IS_ERR(req))) + return PTR_ERR(req); + + ret = ublk_copy_user_pages(req, buf_off, to, ITER_DEST); + ubq = req->mq_hctx->driver_data; + ublk_put_req_ref(ubq, req); + + return ret; +} + +static ssize_t ublk_ch_write_iter(struct kiocb *iocb, struct iov_iter *from) +{ + struct ublk_queue *ubq; + struct request *req; + size_t buf_off; + size_t ret; + + req = ublk_check_and_get_req(iocb, from, &buf_off, ITER_SOURCE); + if (unlikely(IS_ERR(req))) + return PTR_ERR(req); + + ret = ublk_copy_user_pages(req, buf_off, from, ITER_SOURCE); + ubq 
= req->mq_hctx->driver_data; + ublk_put_req_ref(ubq, req); + + return ret; +} + static const struct file_operations ublk_ch_fops = { .owner = THIS_MODULE, .open = ublk_ch_open, .release = ublk_ch_release, .llseek = no_llseek, + .read_iter = ublk_ch_read_iter, + .write_iter = ublk_ch_write_iter, .uring_cmd = ublk_ch_uring_cmd, .mmap = ublk_ch_mmap, }; diff --git a/include/uapi/linux/ublk_cmd.h b/include/uapi/linux/ublk_cmd.h index f6238ccc7800..d1a6b3dc0327 100644 --- a/include/uapi/linux/ublk_cmd.h +++ b/include/uapi/linux/ublk_cmd.h @@ -54,7 +54,36 @@ #define UBLKSRV_IO_BUF_OFFSET 0x80000000 /* tag bit is 12bit, so at most 4096 IOs for each queue */ -#define UBLK_MAX_QUEUE_DEPTH 4096 +#define UBLK_TAG_BITS 12 +#define UBLK_MAX_QUEUE_DEPTH (1U << UBLK_TAG_BITS) + +/* used for locating each io buffer for pread()/pwrite() on char device */ +#define UBLK_BUFS_SIZE_BITS 42 +#define UBLK_BUFS_SIZE_MASK ((1ULL << UBLK_BUFS_SIZE_BITS) - 1) +#define UBLK_BUF_SIZE_BITS (UBLK_BUFS_SIZE_BITS - UBLK_TAG_BITS) +#define UBLK_BUF_MAX_SIZE (1ULL << UBLK_BUF_SIZE_BITS) + +static inline __u16 ublk_pos_to_hwq(__u64 pos) +{ + return pos >> UBLK_BUFS_SIZE_BITS; +} + +static inline __u32 ublk_pos_to_buf_offset(__u64 pos) +{ + return (pos & UBLK_BUFS_SIZE_MASK) & (UBLK_BUF_MAX_SIZE - 1); +} + +static inline __u16 ublk_pos_to_tag(__u64 pos) +{ + return (pos & UBLK_BUFS_SIZE_MASK) >> UBLK_BUF_SIZE_BITS; +} + +/* offset of single buffer, which has to be < UBLK_BUX_MAX_SIZE */ +static inline __u64 ublk_pos(__u16 q_id, __u16 tag, __u32 offset) +{ + return (((__u64)q_id) << UBLK_BUFS_SIZE_BITS) | + ((((__u64)tag) << UBLK_BUF_SIZE_BITS) + offset); } /* * zero copy requires 4k block size, and can remap ublk driver's io
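From the ublk server side, the new interface looks like the sketch below (hypothetical helper name; ublk_pos() is the uapi helper added above). Note the direction rule enforced by ublk_check_ubuf_dir(): the server may pread() only WRITE requests (to fetch the data it must store) and pwrite() only READ requests (to fill in the data it has read):

    #include <unistd.h>
    #include <linux/ublk_cmd.h>

    /* read 'len' bytes of the io buffer of (q_id, tag), 'offset' bytes in */
    static ssize_t toy_read_io_buf(int cdev_fd, __u16 q_id, __u16 tag,
                                   __u32 offset, void *buf, size_t len)
    {
            /* queue, tag and offset are encoded into the file position */
            return pread(cdev_fd, buf, len, (off_t)ublk_pos(q_id, tag, offset));
    }
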
From patchwork Tue Mar 7 14:15:19 2023
From: Ming Lei
To: Jens Axboe , io-uring@vger.kernel.org
Cc: linux-block@vger.kernel.org, Miklos Szeredi , ZiyangZhang , Xiaoguang Wang , Bernd Schubert , Ming Lei
Subject: [PATCH V2 16/17] block: ublk_drv: don't check buffer in case of zero copy
Date: Tue, 7 Mar 2023 22:15:19 +0800
Message-Id: <20230307141520.793891-17-ming.lei@redhat.com>
In-Reply-To: <20230307141520.793891-1-ming.lei@redhat.com>
References: <20230307141520.793891-1-ming.lei@redhat.com>

In the zero copy case the ublk server no longer needs to pre-allocate an IO buffer and provide it to the driver, so don't set the buffer any more; userspace can use pread()/pwrite() to read from/write to the io request buffer instead, which is easier and simpler from the userspace viewpoint.

Signed-off-by: Ming Lei
---
drivers/block/ublk_drv.c | 23 ++++++++++++++--------- 1 file changed, 14 insertions(+), 9 deletions(-)

diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c index 452deec571f7..2385cc3f8566 100644 --- a/drivers/block/ublk_drv.c +++ b/drivers/block/ublk_drv.c @@ -1409,25 +1409,30 @@ static int ublk_ch_uring_cmd(struct io_uring_cmd *cmd, unsigned int issue_flags) if (io->flags & UBLK_IO_FLAG_OWNED_BY_SRV) goto out; /* FETCH_RQ has to provide IO buffer if NEED GET DATA is not enabled */ - if (!ub_cmd->addr && !ublk_need_get_data(ubq)) - goto out; + if (!ublk_support_zc(ubq)) { + if (!ub_cmd->addr && !ublk_need_get_data(ubq)) + goto out; + io->addr = ub_cmd->addr; + } io->cmd = cmd; io->flags |= UBLK_IO_FLAG_ACTIVE; - io->addr = ub_cmd->addr; - ublk_mark_io_ready(ub, ubq); break; case UBLK_IO_COMMIT_AND_FETCH_REQ: req = blk_mq_tag_to_rq(ub->tag_set.tags[ub_cmd->q_id], tag); + + if (!(io->flags & UBLK_IO_FLAG_OWNED_BY_SRV)) + goto out; /* * COMMIT_AND_FETCH_REQ has to provide IO buffer if NEED GET DATA is * not enabled or it is Read IO.
*/ - if (!ub_cmd->addr && (!ublk_need_get_data(ubq) || req_op(req) == REQ_OP_READ)) - goto out; - if (!(io->flags & UBLK_IO_FLAG_OWNED_BY_SRV)) - goto out; - io->addr = ub_cmd->addr; + if (!ublk_support_zc(ubq)) { + if (!ub_cmd->addr && (!ublk_need_get_data(ubq) || + req_op(req) == REQ_OP_READ)) + goto out; + io->addr = ub_cmd->addr; + } io->flags |= UBLK_IO_FLAG_ACTIVE; io->cmd = cmd; ublk_commit_completion(ub, ub_cmd);
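From userspace this means the addr field of the commit simply stays 0 once zero copy is negotiated; a hedged sketch (toy helper name, struct layout from the uapi header):

    #include <linux/ublk_cmd.h>

    static void toy_fill_commit(struct ublksrv_io_cmd *c, __u16 q_id,
                                __u16 tag, __s32 res)
    {
            c->q_id   = q_id;
            c->tag    = tag;
            c->result = res;  /* bytes handled, or negative errno */
            c->addr   = 0;    /* ignored when UBLK_F_SUPPORT_ZERO_COPY is set */
    }
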
From patchwork Tue Mar 7 14:15:20 2023
From: Ming Lei
To: Jens Axboe , io-uring@vger.kernel.org
Cc: linux-block@vger.kernel.org, Miklos Szeredi , ZiyangZhang , Xiaoguang Wang , Bernd Schubert , Ming Lei
Subject: [PATCH V2 17/17] block: ublk_drv: apply io_uring FUSED_CMD for supporting zero copy
Date: Tue, 7 Mar 2023 22:15:20 +0800
Message-Id: <20230307141520.793891-18-ming.lei@redhat.com>
In-Reply-To: <20230307141520.793891-1-ming.lei@redhat.com>
References: <20230307141520.793891-1-ming.lei@redhat.com>

Apply the io_uring fused command for supporting zero copy:

1) init the fused cmd buffer (io_mapped_buf) in ublk_map_io() and deinit it in ublk_unmap_io(); this buffer is immutable, so it is just fine to retrieve it from concurrent fused commands

2) add the sub-command opcode UBLK_IO_FUSED_SUBMIT_IO for retrieving this fused cmd (zero copy) buffer

3) call io_fused_cmd_provide_kbuf() to provide the buffer to the slave request, and set up the completion callback via this API; once the slave request is completed, the callback is called to free the buffer and complete the uring fused command

Also a request reference is held during the fused command's lifetime, which guarantees that the request buffer won't be freed until all fused commands are done.

Signed-off-by: Ming Lei
---
drivers/block/ublk_drv.c | 190 ++++++++++++++++++++++++++++++++-- include/uapi/linux/ublk_cmd.h | 6 +- 2 files changed, 183 insertions(+), 13 deletions(-)

diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c index 2385cc3f8566..466a8cfd2b2a 100644 --- a/drivers/block/ublk_drv.c +++ b/drivers/block/ublk_drv.c @@ -74,10 +74,15 @@ struct ublk_rq_data { * successfully */ struct kref ref; + bool allocated_bvec; + struct io_uring_bvec_buf buf[0]; }; struct ublk_uring_cmd_pdu { - struct ublk_queue *ubq; + union { + struct ublk_queue *ubq; + struct request *req; + }; }; /* @@ -566,6 +571,69 @@ static size_t ublk_copy_user_pages(const struct request *req, return done; } +/* + * The built command buffer is immutable, so it is fine to feed it to + * concurrent io_uring fused commands + */ +static int ublk_init_zero_copy_buffer(struct request *rq) +{ + struct ublk_rq_data *data = blk_mq_rq_to_pdu(rq); + struct io_uring_bvec_buf *imu = data->buf; + struct req_iterator rq_iter; + unsigned int nr_bvecs = 0; + struct bio_vec *bvec; + unsigned int offset; + struct bio_vec bv; + + if (!ublk_rq_has_data(rq)) + goto exit; + + rq_for_each_bvec(bv, rq, rq_iter) + nr_bvecs++; + + if (!nr_bvecs) + goto exit; + + if (rq->bio != rq->biotail) { + int idx = 0; + + bvec = kvmalloc_array(sizeof(struct bio_vec), nr_bvecs, + GFP_NOIO); + if (!bvec) + return -ENOMEM; + + offset = 0; + rq_for_each_bvec(bv, rq, rq_iter) + bvec[idx++] = bv; + data->allocated_bvec = true; + } else { + struct bio *bio = rq->bio; + + offset = bio->bi_iter.bi_bvec_done; + bvec = __bvec_iter_bvec(bio->bi_io_vec, bio->bi_iter); + } + imu->bvec = bvec; + imu->nr_bvecs = nr_bvecs; + imu->offset = offset; + imu->len = blk_rq_bytes(rq); + + return 0; +exit: + imu->bvec = NULL; + return 0; +} + +static void ublk_deinit_zero_copy_buffer(struct request *rq) +{ + struct ublk_rq_data *data = blk_mq_rq_to_pdu(rq); + struct io_uring_bvec_buf *imu = data->buf; + + if (data->allocated_bvec) { + kvfree(imu->bvec); + data->allocated_bvec = false; + } +} + static inline bool ublk_need_map_req(const struct request *req) { return ublk_rq_has_data(req) && req_op(req) == REQ_OP_WRITE; @@ -576,11 +644,23 @@ static inline bool ublk_need_unmap_req(const struct request *req) return ublk_rq_has_data(req) && req_op(req) == REQ_OP_READ; } -static int ublk_map_io(const struct ublk_queue *ubq, const struct request *req, +static int ublk_map_io(const struct ublk_queue *ubq, struct request *req, struct ublk_io *io) { const unsigned int rq_bytes = blk_rq_bytes(req); + if (ublk_support_zc(ubq)) { + int ret = ublk_init_zero_copy_buffer(req); + + /* + * The only failure is -ENOMEM for allocating fused cmd + * buffer, return zero so that we can requeue this req.
+ */ + if (unlikely(ret)) + return 0; + return rq_bytes; + } + /* * no zero copy, we delay copy WRITE request data into ublksrv * context and the big benefit is that pinning pages in current @@ -600,11 +680,17 @@ static int ublk_map_io(const struct ublk_queue *ubq, const struct request *req, } static int ublk_unmap_io(const struct ublk_queue *ubq, - const struct request *req, + struct request *req, struct ublk_io *io) { const unsigned int rq_bytes = blk_rq_bytes(req); + if (ublk_support_zc(ubq)) { + ublk_deinit_zero_copy_buffer(req); + + return rq_bytes; + } + if (ublk_need_unmap_req(req)) { struct iov_iter iter; struct iovec iov; @@ -688,6 +774,12 @@ static inline struct ublk_uring_cmd_pdu *ublk_get_uring_cmd_pdu( return (struct ublk_uring_cmd_pdu *)&ioucmd->pdu; } +static inline struct ublk_uring_cmd_pdu *ublk_get_uring_fused_cmd_pdu( + struct io_uring_cmd *ioucmd) +{ + return (struct ublk_uring_cmd_pdu *)&ioucmd->fused.pdu; +} + static inline bool ubq_daemon_is_dying(struct ublk_queue *ubq) { return ubq->ubq_daemon->flags & PF_EXITING; @@ -743,6 +835,7 @@ static inline void __ublk_complete_rq(struct request *req) return; exit: + ublk_deinit_zero_copy_buffer(req); blk_mq_end_request(req, res); } @@ -1348,6 +1441,67 @@ static inline struct request *__ublk_check_and_get_req(struct ublk_device *ub, return NULL; } +static void ublk_fused_cmd_done_cb(struct io_uring_cmd *cmd) +{ + struct ublk_uring_cmd_pdu *pdu = ublk_get_uring_fused_cmd_pdu(cmd); + struct request *req = pdu->req; + struct ublk_queue *ubq = req->mq_hctx->driver_data; + + ublk_put_req_ref(ubq, req); + io_uring_cmd_done(cmd, cmd->fused.data.slave_res, 0); +} + +static inline bool ublk_check_fused_buf_dir(const struct request *req, + unsigned int flags) +{ + flags &= IO_URING_F_FUSED; + + if (req_op(req) == REQ_OP_READ && flags == IO_URING_F_FUSED_WRITE) + return true; + + if (req_op(req) == REQ_OP_WRITE && flags == IO_URING_F_FUSED_READ) + return true; + + return false; +} + +static int ublk_handle_fused_cmd(struct io_uring_cmd *cmd, + struct ublk_queue *ubq, int tag, unsigned int issue_flags) +{ + struct ublk_uring_cmd_pdu *pdu = ublk_get_uring_fused_cmd_pdu(cmd); + struct ublk_device *ub = cmd->file->private_data; + struct ublk_rq_data *data; + struct request *req; + + if (!ub) + return -EPERM; + + if (!(issue_flags & IO_URING_F_FUSED)) + goto exit; + + req = __ublk_check_and_get_req(ub, ubq, tag, 0); + if (!req) + goto exit; + + pr_devel("%s: qid %d tag %u request bytes %u, issue flags %x\n", + __func__, tag, ubq->q_id, blk_rq_bytes(req), + issue_flags); + + if (!ublk_check_fused_buf_dir(req, issue_flags)) + goto exit_put_ref; + + pdu->req = req; + data = blk_mq_rq_to_pdu(req); + io_fused_cmd_provide_kbuf(cmd, !(issue_flags & IO_URING_F_UNLOCKED), + data->buf, ublk_fused_cmd_done_cb); + return -EIOCBQUEUED; + +exit_put_ref: + ublk_put_req_ref(ubq, req); +exit: + return -EINVAL; +} + static int ublk_ch_uring_cmd(struct io_uring_cmd *cmd, unsigned int issue_flags) { struct ublksrv_io_cmd *ub_cmd = (struct ublksrv_io_cmd *)cmd->cmd; @@ -1363,7 +1517,8 @@ static int ublk_ch_uring_cmd(struct io_uring_cmd *cmd, unsigned int issue_flags) __func__, cmd->cmd_op, ub_cmd->q_id, tag, ub_cmd->result); - if (issue_flags & IO_URING_F_FUSED) + if ((issue_flags & IO_URING_F_FUSED) && + cmd_op != UBLK_IO_FUSED_SUBMIT_IO) return -EOPNOTSUPP; if (ub_cmd->q_id >= ub->dev_info.nr_hw_queues) @@ -1373,7 +1528,12 @@ static int ublk_ch_uring_cmd(struct io_uring_cmd *cmd, unsigned int issue_flags) if (!ubq || ub_cmd->q_id != ubq->q_id) goto out; - if 
(ubq->ubq_daemon && ubq->ubq_daemon != current) + /* + * The fused command reads the io buffer data structure only, so it + * is fine to be issued from other context. + */ + if ((ubq->ubq_daemon && ubq->ubq_daemon != current) && + (cmd_op != UBLK_IO_FUSED_SUBMIT_IO)) goto out; if (tag >= ubq->q_depth) @@ -1396,6 +1556,9 @@ static int ublk_ch_uring_cmd(struct io_uring_cmd *cmd, unsigned int issue_flags) goto out; switch (cmd_op) { + case UBLK_IO_FUSED_SUBMIT_IO: + return ublk_handle_fused_cmd(cmd, ubq, tag, issue_flags); + case UBLK_IO_FETCH_REQ: /* UBLK_IO_FETCH_REQ is only allowed before queue is setup */ if (ublk_queue_ready(ubq)) { @@ -1725,11 +1888,14 @@ static void ublk_align_max_io_size(struct ublk_device *ub) static int ublk_add_tag_set(struct ublk_device *ub) { + int zc = !!(ub->dev_info.flags & UBLK_F_SUPPORT_ZERO_COPY); + struct ublk_rq_data *data; + ub->tag_set.ops = &ublk_mq_ops; ub->tag_set.nr_hw_queues = ub->dev_info.nr_hw_queues; ub->tag_set.queue_depth = ub->dev_info.queue_depth; ub->tag_set.numa_node = NUMA_NO_NODE; - ub->tag_set.cmd_size = sizeof(struct ublk_rq_data); + ub->tag_set.cmd_size = struct_size(data, buf, zc); ub->tag_set.flags = BLK_MQ_F_SHOULD_MERGE; ub->tag_set.driver_data = ub; return blk_mq_alloc_tag_set(&ub->tag_set); @@ -1945,12 +2111,18 @@ static int ublk_ctrl_add_dev(struct io_uring_cmd *cmd) */ ub->dev_info.flags &= UBLK_F_ALL; + /* + * NEED_GET_DATA doesn't make sense any more in case that + * ZERO_COPY is requested. Another reason is that userspace + * can read/write io request buffer by pread()/pwrite() with + * each io buffer's position. + */ + if (ub->dev_info.flags & UBLK_F_SUPPORT_ZERO_COPY) + ub->dev_info.flags &= ~UBLK_F_NEED_GET_DATA; + if (!IS_BUILTIN(CONFIG_BLK_DEV_UBLK)) ub->dev_info.flags |= UBLK_F_URING_CMD_COMP_IN_TASK; - /* We are not ready to support zero copy */ - ub->dev_info.flags &= ~UBLK_F_SUPPORT_ZERO_COPY; - ub->dev_info.nr_hw_queues = min_t(unsigned int, ub->dev_info.nr_hw_queues, nr_cpu_ids); ublk_align_max_io_size(ub); diff --git a/include/uapi/linux/ublk_cmd.h b/include/uapi/linux/ublk_cmd.h index d1a6b3dc0327..c4f3465399cf 100644 --- a/include/uapi/linux/ublk_cmd.h +++ b/include/uapi/linux/ublk_cmd.h @@ -44,6 +44,7 @@ #define UBLK_IO_FETCH_REQ 0x20 #define UBLK_IO_COMMIT_AND_FETCH_REQ 0x21 #define UBLK_IO_NEED_GET_DATA 0x22 +#define UBLK_IO_FUSED_SUBMIT_IO 0x23 /* only ABORT means that no re-fetch */ #define UBLK_IO_RES_OK 0 @@ -85,10 +86,7 @@ static inline __u64 ublk_pos(__u16 q_id, __u16 tag, __u32 offset) ((((__u64)tag) << UBLK_BUF_SIZE_BITS) + offset); } -/* - * zero copy requires 4k block size, and can remap ublk driver's io - * request into ublksrv's vm space - */ +/* io_uring fused command based zero copy */ #define UBLK_F_SUPPORT_ZERO_COPY (1ULL << 0) /*