From patchwork Wed Mar 1 14:06:00 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Ming Lei
X-Patchwork-Id: 13156042
From: Ming Lei
To: Jens Axboe, io-uring@vger.kernel.org
Cc: linux-block@vger.kernel.org, Miklos Szeredi, ZiyangZhang, Xiaoguang Wang, Bernd Schubert, Ming Lei
Subject: [RFC PATCH 01/12] io_uring: increase io_kiocb->flags into 64bit
Date: Wed, 1 Mar 2023 22:06:00 +0800
Message-Id: <20230301140611.163055-2-ming.lei@redhat.com>
In-Reply-To: <20230301140611.163055-1-ming.lei@redhat.com>
References: <20230301140611.163055-1-ming.lei@redhat.com>
X-Mailing-List: linux-block@vger.kernel.org

The 32bit io_kiocb->flags has been used up, so extend it to 64bit.

Signed-off-by: Ming Lei
---
 include/linux/io_uring_types.h | 2 +-
 io_uring/io_uring.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h
index 0efe4d784358..87342649d2c3 100644
--- a/include/linux/io_uring_types.h
+++ b/include/linux/io_uring_types.h
@@ -530,7 +530,7 @@ struct io_kiocb {
 	 * and after selection it points to the buffer ID itself.
 	 */
 	u16 buf_index;
-	unsigned int flags;
+	u64 flags;
 
 	struct io_cqe cqe;
 
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 1df68da89f99..09cc5eaec4ab 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -4418,7 +4418,7 @@ static int __init io_uring_init(void)
 	BUILD_BUG_ON(SQE_COMMON_FLAGS >= (1 << 8));
 	BUILD_BUG_ON((SQE_VALID_FLAGS | SQE_COMMON_FLAGS) != SQE_VALID_FLAGS);
 
-	BUILD_BUG_ON(__REQ_F_LAST_BIT > 8 * sizeof(int));
+	BUILD_BUG_ON(__REQ_F_LAST_BIT > 8 * sizeof(u64));
 
 	BUILD_BUG_ON(sizeof(atomic_t) != sizeof(u32));
 

From patchwork Wed Mar 1 14:06:01 2023
X-Patchwork-Submitter: Ming Lei
X-Patchwork-Id: 13156038
From: Ming Lei
To: Jens Axboe, io-uring@vger.kernel.org
Cc: linux-block@vger.kernel.org, Miklos Szeredi, ZiyangZhang, Xiaoguang Wang, Bernd Schubert, Ming Lei
Subject: [RFC PATCH 02/12] io_uring: define io_mapped_ubuf->acct_pages as unsigned integer
Date: Wed, 1 Mar 2023 22:06:01 +0800
Message-Id: <20230301140611.163055-3-ming.lei@redhat.com>
In-Reply-To: <20230301140611.163055-1-ming.lei@redhat.com>
References: <20230301140611.163055-1-ming.lei@redhat.com>

Unsigned integer is enough (4G * 4k = 16TB) to hold nr_pages in one
io_mapped_ubuf. This way will save one word for io_mapped_ubuf.

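The 16TB figure above can be checked with a few lines of userspace C. This is only an illustrative sketch: max_accounted_bytes() is a hypothetical helper, not part of the patch, and it assumes 4 KiB pages and a 32-bit unsigned int.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not from the patch): number of bytes covered by
 * every page count representable in a counter of `bits` bits, assuming
 * pages of `page_size` bytes.
 */
static uint64_t max_accounted_bytes(unsigned int bits, uint64_t page_size)
{
	/* 2^bits distinct page counts, each page covering page_size bytes */
	return ((uint64_t)1 << bits) * page_size;
}
```

With bits = 32 and page_size = 4096 the result is 2^44 bytes, i.e. 16 TiB, which matches the "4G * 4k = 16TB" estimate in the commit message.
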
Signed-off-by: Ming Lei
---
 io_uring/rsrc.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/io_uring/rsrc.h b/io_uring/rsrc.h
index 2b8743645efc..774aca20326c 100644
--- a/io_uring/rsrc.h
+++ b/io_uring/rsrc.h
@@ -49,7 +49,7 @@ struct io_mapped_ubuf {
 	u64 ubuf;
 	u64 ubuf_end;
 	unsigned int nr_bvecs;
-	unsigned long acct_pages;
+	unsigned int acct_pages;
 	struct bio_vec bvec[];
 };
 

From patchwork Wed Mar 1 14:06:02 2023
X-Patchwork-Submitter: Ming Lei
X-Patchwork-Id: 13156044
From: Ming Lei
To: Jens Axboe, io-uring@vger.kernel.org
Cc: linux-block@vger.kernel.org, Miklos Szeredi, ZiyangZhang, Xiaoguang Wang, Bernd Schubert, Ming Lei
Subject: [RFC PATCH 03/12] io_uring: extend io_mapped_ubuf to cover external bvec table
Date: Wed, 1 Mar 2023 22:06:02 +0800
Message-Id: <20230301140611.163055-4-ming.lei@redhat.com>
In-Reply-To: <20230301140611.163055-1-ming.lei@redhat.com>
References: <20230301140611.163055-1-ming.lei@redhat.com>

Extend io_mapped_ubuf to cover external bvec table for supporting fused
command kbuf, in which the bvec table could be from one IO request.

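The idea of pairing a table pointer with an optional inline table can be sketched in plain userspace C. Everything below is a simplified stand-in, not kernel code: struct seg replaces struct bio_vec, inline_bvec plays the role of the inline table, and both constructor names are hypothetical.

```c
#include <stddef.h>
#include <stdlib.h>

/* Simplified stand-in for the kernel's struct bio_vec */
struct seg {
	void *page;
	unsigned int len;
	unsigned int off;
};

struct mapped_buf {
	unsigned int nr_bvecs;
	struct seg *bvec;		/* table that consumers walk */
	struct seg inline_bvec[];	/* optional inline table */
};

/* Registered-buffer case: the table lives right behind the header */
static struct mapped_buf *mapped_buf_alloc_inline(unsigned int nr)
{
	struct mapped_buf *imu;

	imu = calloc(1, sizeof(*imu) + (size_t)nr * sizeof(struct seg));
	if (!imu)
		return NULL;
	imu->nr_bvecs = nr;
	imu->bvec = imu->inline_bvec;
	return imu;
}

/* External case: point at a table owned by someone else, e.g. an IO request */
static struct mapped_buf *mapped_buf_wrap_external(struct seg *table,
						   unsigned int nr)
{
	struct mapped_buf *imu = calloc(1, sizeof(*imu));

	if (!imu)
		return NULL;
	imu->nr_bvecs = nr;
	imu->bvec = table;
	return imu;
}
```

Consumers only ever dereference ->bvec, so they do not need to know which of the two layouts they were handed.
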
Signed-off-by: Ming Lei
---
 io_uring/rsrc.c | 5 +++--
 io_uring/rsrc.h | 3 ++-
 2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/io_uring/rsrc.c b/io_uring/rsrc.c
index a59fc02de598..c41edd197b0a 100644
--- a/io_uring/rsrc.c
+++ b/io_uring/rsrc.c
@@ -1221,7 +1221,7 @@ static int io_sqe_buffer_register(struct io_ring_ctx *ctx, struct iovec *iov,
 		goto done;
 	}
 
-	imu = kvmalloc(struct_size(imu, bvec, nr_pages), GFP_KERNEL);
+	imu = kvmalloc(struct_size(imu, __bvec, nr_pages), GFP_KERNEL);
 	if (!imu)
 		goto done;
 
@@ -1237,7 +1237,7 @@ static int io_sqe_buffer_register(struct io_ring_ctx *ctx, struct iovec *iov,
 		size_t vec_len;
 
 		vec_len = min_t(size_t, size, PAGE_SIZE - off);
-		bvec_set_page(&imu->bvec[i], pages[i], vec_len, off);
+		bvec_set_page(&imu->__bvec[i], pages[i], vec_len, off);
 		off = 0;
 		size -= vec_len;
 	}
@@ -1245,6 +1245,7 @@ static int io_sqe_buffer_register(struct io_ring_ctx *ctx, struct iovec *iov,
 	imu->ubuf = (unsigned long) iov->iov_base;
 	imu->ubuf_end = imu->ubuf + iov->iov_len;
 	imu->nr_bvecs = nr_pages;
+	imu->bvec = imu->__bvec;
 	*pimu = imu;
 	ret = 0;
 done:
diff --git a/io_uring/rsrc.h b/io_uring/rsrc.h
index 774aca20326c..24329eca49ef 100644
--- a/io_uring/rsrc.h
+++ b/io_uring/rsrc.h
@@ -50,7 +50,8 @@ struct io_mapped_ubuf {
 	u64 ubuf_end;
 	unsigned int nr_bvecs;
 	unsigned int acct_pages;
-	struct bio_vec bvec[];
+	struct bio_vec *bvec;
+	struct bio_vec __bvec[];
 };
 
 void io_rsrc_put_tw(struct callback_head *cb);

From patchwork Wed Mar 1 14:06:03 2023
X-Patchwork-Submitter: Ming Lei
X-Patchwork-Id: 13156037
From: Ming Lei
To: Jens Axboe, io-uring@vger.kernel.org
Cc: linux-block@vger.kernel.org, Miklos Szeredi, ZiyangZhang, Xiaoguang Wang, Bernd Schubert, Ming Lei
Subject: [RFC PATCH 04/12] io_uring: rename io_mapped_ubuf as io_mapped_buf
Date: Wed, 1 Mar 2023 22:06:03 +0800
Message-Id: <20230301140611.163055-5-ming.lei@redhat.com>
In-Reply-To: <20230301140611.163055-1-ming.lei@redhat.com>
References: <20230301140611.163055-1-ming.lei@redhat.com>

Prepare to reuse io_mapped_ubuf for feeding fused command kbuf (bvec
based buffer) to io_uring OP.

Meanwhile, rename ->ubuf as ->buf and ->ubuf_end as ->buf_end; both are
only used for figuring out buffer offset & length.

Signed-off-by: Ming Lei
---
 include/linux/io_uring_types.h | 6 +++---
 io_uring/fdinfo.c | 6 +++---
 io_uring/io_uring.c | 2 +-
 io_uring/rsrc.c | 26 +++++++++++++-------------
 io_uring/rsrc.h | 10 +++++-----
 5 files changed, 25 insertions(+), 25 deletions(-)

diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h
index 87342649d2c3..7a27b1d3e2ea 100644
--- a/include/linux/io_uring_types.h
+++ b/include/linux/io_uring_types.h
@@ -244,7 +244,7 @@ struct io_ring_ctx {
 		struct io_file_table file_table;
 		unsigned nr_user_files;
 		unsigned nr_user_bufs;
-		struct io_mapped_ubuf **user_bufs;
+		struct io_mapped_buf **user_bufs;
 
 		struct io_submit_state submit_state;
 
@@ -326,7 +326,7 @@ struct io_ring_ctx {
 
 	/* slow path rsrc auxilary data, used by update/register */
 	struct io_rsrc_node *rsrc_backup_node;
-	struct io_mapped_ubuf *dummy_ubuf;
+	struct io_mapped_buf *dummy_ubuf;
 	struct io_rsrc_data *file_data;
 	struct io_rsrc_data *buf_data;
 
@@ -541,7 +541,7 @@ struct io_kiocb {
 
 	union {
 		/* store used ubuf, so we can prevent reloading */
-		struct io_mapped_ubuf *imu;
+		struct io_mapped_buf *imu;
 
 		/* stores selected buf, valid IFF REQ_F_BUFFER_SELECTED is set */
 		struct io_buffer *kbuf;
diff --git a/io_uring/fdinfo.c b/io_uring/fdinfo.c
index 882bd56b01ed..2f663a795411 100644
--- a/io_uring/fdinfo.c
+++ b/io_uring/fdinfo.c
@@ -157,10 +157,10 @@ static __cold void __io_uring_show_fdinfo(struct io_ring_ctx *ctx,
 	}
 	seq_printf(m, "UserBufs:\t%u\n", ctx->nr_user_bufs);
 	for (i = 0; has_lock && i < ctx->nr_user_bufs; i++) {
-		struct io_mapped_ubuf *buf = ctx->user_bufs[i];
-		unsigned int len = buf->ubuf_end - buf->ubuf;
+		struct io_mapped_buf *buf = ctx->user_bufs[i];
+		unsigned int len = buf->buf_end - buf->buf;
 
-		seq_printf(m, "%5u: 0x%llx/%u\n", i, buf->ubuf, len);
+		seq_printf(m, "%5u: 0x%llx/%u\n", i, buf->buf, len);
 	}
 	if (has_lock && !xa_empty(&ctx->personalities)) {
 		unsigned long index;
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 09cc5eaec4ab..3df66fddda5a 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -298,7 +298,7 @@ static __cold struct io_ring_ctx *io_ring_ctx_alloc(struct io_uring_params *p)
 	if (!ctx->dummy_ubuf)
 		goto err;
 	/* set invalid range, so io_import_fixed() fails meeting it */
-	ctx->dummy_ubuf->ubuf = -1UL;
+	ctx->dummy_ubuf->buf = -1UL;
 
 	if (percpu_ref_init(&ctx->refs, io_ring_ctx_ref_free,
 			    0, GFP_KERNEL))
diff --git a/io_uring/rsrc.c b/io_uring/rsrc.c
index c41edd197b0a..26c07b28e8bb 100644
--- a/io_uring/rsrc.c
+++ b/io_uring/rsrc.c
@@ -24,7 +24,7 @@ struct io_rsrc_update {
 };
 
 static int io_sqe_buffer_register(struct io_ring_ctx *ctx, struct iovec *iov,
-				  struct io_mapped_ubuf **pimu,
+				  struct io_mapped_buf **pimu,
 				  struct page **last_hpage);
 
 #define IO_RSRC_REF_BATCH 100
@@ -136,9 +136,9 @@ static int io_buffer_validate(struct iovec *iov)
 	return 0;
 }
 
-static void io_buffer_unmap(struct io_ring_ctx *ctx, struct io_mapped_ubuf **slot)
+static void io_buffer_unmap(struct io_ring_ctx *ctx, struct io_mapped_buf **slot)
 {
-	struct io_mapped_ubuf *imu = *slot;
+	struct io_mapped_buf *imu = *slot;
 	unsigned int i;
 
 	if (imu != ctx->dummy_ubuf) {
@@ -542,7 +542,7 @@ static int __io_sqe_buffers_update(struct io_ring_ctx *ctx,
 		return -EINVAL;
 
 	for (done = 0; done < nr_args; done++) {
-		struct io_mapped_ubuf *imu;
+		struct io_mapped_buf *imu;
 		int offset = up->offset + done;
 		u64 tag = 0;
 
@@ -1092,7 +1092,7 @@ static bool headpage_already_acct(struct io_ring_ctx *ctx, struct page **pages,
 
 	/* check previously registered pages */
 	for (i = 0; i < ctx->nr_user_bufs; i++) {
-		struct io_mapped_ubuf *imu = ctx->user_bufs[i];
+		struct io_mapped_buf *imu = ctx->user_bufs[i];
 
 		for (j = 0; j < imu->nr_bvecs; j++) {
 			if (!PageCompound(imu->bvec[j].bv_page))
@@ -1106,7 +1106,7 @@ static bool headpage_already_acct(struct io_ring_ctx *ctx, struct page **pages,
 }
 
 static int io_buffer_account_pin(struct io_ring_ctx *ctx, struct page **pages,
-				 int nr_pages, struct io_mapped_ubuf *imu,
+				 int nr_pages, struct io_mapped_buf *imu,
 				 struct page **last_hpage)
 {
 	int i, ret;
@@ -1199,10 +1199,10 @@ struct page **io_pin_pages(unsigned long ubuf, unsigned long len, int *npages)
 }
 
 static int io_sqe_buffer_register(struct io_ring_ctx *ctx, struct iovec *iov,
-				  struct io_mapped_ubuf **pimu,
+				  struct io_mapped_buf **pimu,
 				  struct page **last_hpage)
 {
-	struct io_mapped_ubuf *imu = NULL;
+	struct io_mapped_buf *imu = NULL;
 	struct page **pages = NULL;
 	unsigned long off;
 	size_t size;
@@ -1242,8 +1242,8 @@ static int io_sqe_buffer_register(struct io_ring_ctx *ctx, struct iovec *iov,
 		size -= vec_len;
 	}
 	/* store original address for later verification */
-	imu->ubuf = (unsigned long) iov->iov_base;
-	imu->ubuf_end = imu->ubuf + iov->iov_len;
+	imu->buf = (unsigned long) iov->iov_base;
+	imu->buf_end = imu->buf + iov->iov_len;
 	imu->nr_bvecs = nr_pages;
 	imu->bvec = imu->__bvec;
 	*pimu = imu;
@@ -1321,7 +1321,7 @@ int io_sqe_buffers_register(struct io_ring_ctx *ctx, void __user *arg,
 }
 
 int io_import_fixed(int ddir, struct iov_iter *iter,
-			   struct io_mapped_ubuf *imu,
+			   struct io_mapped_buf *imu,
 			   u64 buf_addr, size_t len)
 {
 	u64 buf_end;
@@ -1332,14 +1332,14 @@ int io_import_fixed(int ddir, struct iov_iter *iter,
 	if (unlikely(check_add_overflow(buf_addr, (u64)len, &buf_end)))
 		return -EFAULT;
 	/* not inside the mapped region */
-	if (unlikely(buf_addr < imu->ubuf || buf_end > imu->ubuf_end))
+	if (unlikely(buf_addr < imu->buf || buf_end > imu->buf_end))
 		return -EFAULT;
 
 	/*
 	 * May not be a start of buffer, set size appropriately
 	 * and advance us to the beginning.
 	 */
-	offset = buf_addr - imu->ubuf;
+	offset = buf_addr - imu->buf;
 	iov_iter_bvec(iter, ddir, imu->bvec, imu->nr_bvecs, offset + len);
 
 	if (offset) {
diff --git a/io_uring/rsrc.h b/io_uring/rsrc.h
index 24329eca49ef..5da54702cad1 100644
--- a/io_uring/rsrc.h
+++ b/io_uring/rsrc.h
@@ -19,7 +19,7 @@ struct io_rsrc_put {
 	union {
 		void *rsrc;
 		struct file *file;
-		struct io_mapped_ubuf *buf;
+		struct io_mapped_buf *buf;
 	};
 };
 
@@ -45,9 +45,9 @@ struct io_rsrc_node {
 	bool done;
 };
 
-struct io_mapped_ubuf {
-	u64 ubuf;
-	u64 ubuf_end;
+struct io_mapped_buf {
+	u64 buf;
+	u64 buf_end;
 	unsigned int nr_bvecs;
 	unsigned int acct_pages;
 	struct bio_vec *bvec;
@@ -67,7 +67,7 @@ void io_rsrc_node_switch(struct io_ring_ctx *ctx,
 			 struct io_rsrc_data *data_to_kill);
 
 int io_import_fixed(int ddir, struct iov_iter *iter,
-			   struct io_mapped_ubuf *imu,
+			   struct io_mapped_buf *imu,
 			   u64 buf_addr, size_t len);
 
 void __io_sqe_buffers_unregister(struct io_ring_ctx *ctx);

From patchwork Wed Mar 1 14:06:04 2023
X-Patchwork-Submitter: Ming Lei
X-Patchwork-Id: 13156039
From: Ming Lei
To: Jens Axboe, io-uring@vger.kernel.org
Cc: linux-block@vger.kernel.org, Miklos Szeredi, ZiyangZhang, Xiaoguang Wang, Bernd Schubert, Ming Lei
Subject: [RFC PATCH 05/12] io_uring: export 'struct io_mapped_buf' for fused cmd buffer
Date: Wed, 1 Mar 2023 22:06:04 +0800
Message-Id: <20230301140611.163055-6-ming.lei@redhat.com>
In-Reply-To: <20230301140611.163055-1-ming.lei@redhat.com>
References: <20230301140611.163055-1-ming.lei@redhat.com>

Export 'struct io_mapped_buf' for the coming fused cmd buffer, which is
based on bvec too. This instance is supposed to be immutable for its
whole lifetime.

Signed-off-by: Ming Lei
---
 include/linux/io_uring.h | 19 +++++++++++++++++++
 io_uring/rsrc.h | 9 ---------
 2 files changed, 19 insertions(+), 9 deletions(-)

diff --git a/include/linux/io_uring.h b/include/linux/io_uring.h
index 934e5dd4ccc0..88205ea566d3 100644
--- a/include/linux/io_uring.h
+++ b/include/linux/io_uring.h
@@ -4,6 +4,7 @@
 
 #include
 #include
+#include
 #include
 
 enum io_uring_cmd_flags {
@@ -36,6 +37,24 @@ struct io_uring_cmd {
 	u8 pdu[32]; /* available inline for free use */
 };
 
+/* The mapped buffer is supposed to be immutable */
+struct io_mapped_buf {
+	u64 buf;
+	u64 buf_end;
+	unsigned int nr_bvecs;
+	union {
+		unsigned int acct_pages;
+
+		/*
+		 * offset into the bvecs, use for external user; with
+		 * 'offset', immutable bvecs can be provided for io_uring
+		 */
+		unsigned int offset;
+	};
+	struct bio_vec *bvec;
+	struct bio_vec __bvec[];
+};
+
 #if defined(CONFIG_IO_URING)
 int io_uring_cmd_import_fixed(u64 ubuf, unsigned long len, int rw,
 			      struct iov_iter *iter, void *ioucmd);
diff --git a/io_uring/rsrc.h b/io_uring/rsrc.h
index 5da54702cad1..4bd17877d53a 100644
--- a/io_uring/rsrc.h
+++ b/io_uring/rsrc.h
@@ -45,15 +45,6 @@ struct io_rsrc_node {
 	bool done;
 };
 
-struct io_mapped_buf {
-	u64 buf;
-	u64 buf_end;
-	unsigned int nr_bvecs;
-	unsigned int acct_pages;
-	struct bio_vec *bvec;
-	struct bio_vec __bvec[];
-};
-
 void io_rsrc_put_tw(struct callback_head *cb);
 void io_rsrc_put_work(struct work_struct *work);
 void io_rsrc_refs_refill(struct io_ring_ctx *ctx);

From patchwork Wed Mar 1 14:06:05 2023
X-Patchwork-Submitter: Ming Lei
X-Patchwork-Id: 13156041
From: Ming Lei
To: Jens Axboe, io-uring@vger.kernel.org
Cc: linux-block@vger.kernel.org, Miklos Szeredi, ZiyangZhang, Xiaoguang Wang, Bernd Schubert, Ming Lei
Subject: [RFC PATCH 06/12] io_uring: add IO_URING_F_FUSED and prepare for supporting OP_FUSED_CMD
Date: Wed, 1 Mar 2023 22:06:05 +0800
Message-Id: <20230301140611.163055-7-ming.lei@redhat.com>
In-Reply-To: <20230301140611.163055-1-ming.lei@redhat.com>
References: <20230301140611.163055-1-ming.lei@redhat.com>

Add flag IO_URING_F_FUSED and prepare for supporting
IORING_OP_FUSED_CMD, which is still one type of IORING_OP_URING_CMD, so
it is reasonable to reuse ->uring_cmd() for handling
IORING_OP_FUSED_CMD.

Only IORING_OP_FUSED_CMD carries one 64byte SQE as payload, which will
be handled by one slave request. The master uring command provides a
kernel buffer to the slave request via 'struct io_mapped_buf'.

Mark all existing drivers as not supporting IORING_OP_FUSED_CMD, since
support depends on whether the driver is capable of handling the slave
request.

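The opt-out pattern described above can be sketched outside the kernel as follows. The flag values mirror the enum added by this patch; the F_ prefix and legacy_uring_cmd() are hypothetical names standing in for a driver's ->uring_cmd() handler that has not been taught about fused commands.

```c
#include <errno.h>

/* Values copied from the enum this patch adds; names shortened here */
enum {
	F_FUSED_WRITE	= 1 << 11,	/* slave writes to buffer */
	F_FUSED_READ	= 1 << 12,	/* slave reads from buffer */
	F_FUSED		= F_FUSED_WRITE | F_FUSED_READ,
};

/* Hypothetical handler of a driver without fused command support */
static int legacy_uring_cmd(unsigned int issue_flags)
{
	/* either direction bit means this is a fused command: refuse it */
	if (issue_flags & F_FUSED)
		return -EOPNOTSUPP;

	return 0;	/* plain uring_cmd handling would go here */
}
```

Because F_FUSED is the OR of both direction bits, a single test rejects fused commands regardless of whether the slave would read from or write to the buffer.
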
Signed-off-by: Ming Lei
---
 drivers/block/ublk_drv.c | 6 ++++++
 drivers/char/mem.c | 4 ++++
 drivers/nvme/host/ioctl.c | 9 +++++++++
 include/linux/io_uring.h | 7 +++++++
 4 files changed, 26 insertions(+)

diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
index b9c759cef00e..c89ede1c9b22 100644
--- a/drivers/block/ublk_drv.c
+++ b/drivers/block/ublk_drv.c
@@ -1274,6 +1274,9 @@ static int ublk_ch_uring_cmd(struct io_uring_cmd *cmd, unsigned int issue_flags)
 	if (!(issue_flags & IO_URING_F_SQE128))
 		goto out;
 
+	if (issue_flags & IO_URING_F_FUSED)
+		return -EOPNOTSUPP;
+
 	if (ub_cmd->q_id >= ub->dev_info.nr_hw_queues)
 		goto out;
 
@@ -2172,6 +2175,9 @@ static int ublk_ctrl_uring_cmd(struct io_uring_cmd *cmd,
 	struct ublk_device *ub = NULL;
 	int ret = -EINVAL;
 
+	if (issue_flags & IO_URING_F_FUSED)
+		return -EOPNOTSUPP;
+
 	if (issue_flags & IO_URING_F_NONBLOCK)
 		return -EAGAIN;
 
diff --git a/drivers/char/mem.c b/drivers/char/mem.c
index ffb101d349f0..134ba6665194 100644
--- a/drivers/char/mem.c
+++ b/drivers/char/mem.c
@@ -30,6 +30,7 @@
 #include
 #include
 #include
+#include
 
 #ifdef CONFIG_IA64
 # include
@@ -482,6 +483,9 @@ static ssize_t splice_write_null(struct pipe_inode_info *pipe, struct file *out,
 
 static int uring_cmd_null(struct io_uring_cmd *ioucmd, unsigned int issue_flags)
 {
+	if (issue_flags & IO_URING_F_FUSED)
+		return -EOPNOTSUPP;
+
 	return 0;
 }
 
diff --git a/drivers/nvme/host/ioctl.c b/drivers/nvme/host/ioctl.c
index 723e7d5b778f..44a171bcaa90 100644
--- a/drivers/nvme/host/ioctl.c
+++ b/drivers/nvme/host/ioctl.c
@@ -773,6 +773,9 @@ int nvme_ns_chr_uring_cmd(struct io_uring_cmd *ioucmd, unsigned int issue_flags)
 	struct nvme_ns *ns = container_of(file_inode(ioucmd->file)->i_cdev,
 			struct nvme_ns, cdev);
 
+	if (issue_flags & IO_URING_F_FUSED)
+		return -EOPNOTSUPP;
+
 	return nvme_ns_uring_cmd(ns, ioucmd, issue_flags);
 }
 
@@ -878,6 +881,9 @@ int nvme_ns_head_chr_uring_cmd(struct io_uring_cmd *ioucmd,
 	struct nvme_ns *ns = nvme_find_path(head);
 	int ret = -EINVAL;
 
+	if (issue_flags & IO_URING_F_FUSED)
+		return -EOPNOTSUPP;
+
 	if (ns)
 		ret = nvme_ns_uring_cmd(ns, ioucmd, issue_flags);
 	srcu_read_unlock(&head->srcu, srcu_idx);
@@ -915,6 +921,9 @@ int nvme_dev_uring_cmd(struct io_uring_cmd *ioucmd, unsigned int issue_flags)
 	struct nvme_ctrl *ctrl = ioucmd->file->private_data;
 	int ret;
 
+	if (issue_flags & IO_URING_F_FUSED)
+		return -EOPNOTSUPP;
+
 	/* IOPOLL not supported yet */
 	if (issue_flags & IO_URING_F_IOPOLL)
 		return -EOPNOTSUPP;
diff --git a/include/linux/io_uring.h b/include/linux/io_uring.h
index 88205ea566d3..2ccf91146c13 100644
--- a/include/linux/io_uring.h
+++ b/include/linux/io_uring.h
@@ -21,6 +21,13 @@ enum io_uring_cmd_flags {
 	IO_URING_F_SQE128 = (1 << 8),
 	IO_URING_F_CQE32 = (1 << 9),
 	IO_URING_F_IOPOLL = (1 << 10),
+
+	/* for FUSED_CMD only */
+	IO_URING_F_FUSED_WRITE = (1 << 11), /* slave writes to buffer */
+	IO_URING_F_FUSED_READ = (1 << 12), /* slave reads from buffer */
+	/* driver incapable of FUSED_CMD should fail cmd when seeing F_FUSED */
+	IO_URING_F_FUSED = IO_URING_F_FUSED_WRITE |
+				IO_URING_F_FUSED_READ,
 };
 
 struct io_uring_cmd {

From patchwork Wed Mar 1 14:06:06 2023
X-Patchwork-Submitter: Ming Lei
X-Patchwork-Id: 13156049
From: Ming Lei
To: Jens Axboe, io-uring@vger.kernel.org
Cc: linux-block@vger.kernel.org, Miklos Szeredi, ZiyangZhang, Xiaoguang Wang, Bernd Schubert, Ming Lei
Subject: [RFC PATCH 07/12] io_uring: add IORING_OP_FUSED_CMD
Date: Wed, 1 Mar 2023 22:06:06 +0800
Message-Id: <20230301140611.163055-8-ming.lei@redhat.com>
In-Reply-To: <20230301140611.163055-1-ming.lei@redhat.com>
References: <20230301140611.163055-1-ming.lei@redhat.com>

Add
IORING_OP_FUSED_CMD, a special URING_CMD that requires SQE128. The first 64-byte SQE (master) is a URING_CMD, and the second 64-byte SQE (slave) is another normal 64-byte OP. Any OP that is to be usable as a slave OP must set io_issue_defs[op].fused_slave to 1, and its ->issue() must retrieve the buffer from the master request's fused_cmd_kbuf.

The key points of the design/implementation are:

1) The master uring command produces an immutable command buffer (struct io_mapped_buf) and provides it to the slave request; the slave OP can retrieve any part of this buffer via sqe->addr and sqe->len.

2) The master command always completes after the slave request completes.

- Before the slave request is submitted, buffer ownership is transferred to the slave request. After the slave request completes, buffer ownership is returned to the master request.

- This also guarantees correct SQE ordering, since the master request takes over the slave request's LINK flag.

3) The master request is always completed by the driver, so the driver knows when the slave request is done with the buffer.

The motivation is to support zero copy for fuse/ublk, where the device holds the IO request buffer and IO handling is often a normal IO OP (fs, net, ...). With IORING_OP_FUSED_CMD, this kind of zero copy can be implemented easily and reliably.
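For illustration only, and not part of this patch: point 1) can be modelled in plain userspace C. The sketch below assumes the same order of range checks that io_import_kbuf_for_slave() applies to the slave's sqe->addr and sqe->len; struct kbuf_window and kbuf_slice() are hypothetical names invented for this example, and the real code works on struct io_mapped_buf and a bvec iterator instead.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical userspace stand-in for the [buf, buf_end) window and the
 * starting offset that the patch keeps in struct io_mapped_buf.
 */
struct kbuf_window {
	uint64_t buf;      /* start of the window (0 for a kernel-only buffer) */
	uint64_t buf_end;  /* end of the window (exclusive) */
	uint64_t offset;   /* offset of the window inside the first bvec */
};

/*
 * Validate a slave's (addr, len) against the master's window. On success,
 * store the byte offset at which the slave's sub-buffer starts and return 0;
 * otherwise return -EFAULT.
 */
static int kbuf_slice(const struct kbuf_window *w, uint64_t addr,
		      uint32_t len, uint64_t *start)
{
	/* the start address has to be inside the mapped region */
	if (addr < w->buf || addr > w->buf_end)
		return -EFAULT;

	/* the requested length must not run past the end of the region */
	if (len > w->buf_end - addr)
		return -EFAULT;

	*start = w->offset + (addr - w->buf);
	return 0;
}
```

With a window of { .buf = 0, .buf_end = 4096, .offset = 512 }, a slave asking for addr 1024 and len 2048 is accepted and starts at byte 1536, while addr 1024 with len 4000 is rejected because it would run past buf_end.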
Signed-off-by: Ming Lei --- include/linux/io_uring.h | 39 +++++- include/linux/io_uring_types.h | 18 +++ include/uapi/linux/io_uring.h | 1 + io_uring/Makefile | 2 +- io_uring/fused_cmd.c | 233 +++++++++++++++++++++++++++++++++ io_uring/fused_cmd.h | 11 ++ io_uring/io_uring.c | 20 ++- io_uring/io_uring.h | 3 + io_uring/opdef.c | 12 ++ io_uring/opdef.h | 2 + 10 files changed, 335 insertions(+), 6 deletions(-) create mode 100644 io_uring/fused_cmd.c create mode 100644 io_uring/fused_cmd.h diff --git a/include/linux/io_uring.h b/include/linux/io_uring.h index 2ccf91146c13..64552da503c0 100644 --- a/include/linux/io_uring.h +++ b/include/linux/io_uring.h @@ -30,6 +30,19 @@ enum io_uring_cmd_flags { IO_URING_F_FUSED_READ, }; +union io_uring_fused_cmd_data { + /* + * In case of slave request IOSQE_CQE_SKIP_SUCCESS, return slave + * result via master command; otherwise we simply return success + * if buffer is provided, and slave request will return its result + * via its CQE + */ + s32 slave_res; + + /* fused cmd private, driver do not touch it */ + struct io_kiocb *__slave; +}; + struct io_uring_cmd { struct file *file; const void *cmd; @@ -41,11 +54,27 @@ struct io_uring_cmd { }; u32 cmd_op; u32 flags; - u8 pdu[32]; /* available inline for free use */ + + /* for fused command, the available pdu is a bit less */ + union { + u8 pdu[32]; /* available inline for free use */ + struct { + u8 pdu[24]; /* available inline for free use */ + union io_uring_fused_cmd_data data; + } fused; + }; }; /* The mapper buffer is supposed to be immutable */ struct io_mapped_buf { + /* + * For kernel buffer without virtual address, buf is set as zero, + * which is just fine given both buf/buf_end are just for + * calculating iov iter offset/len and validating buffer. + * + * So slave OP has to fail request in case that the OP doesn't + * support iov iter. 
+ */ u64 buf; u64 buf_end; unsigned int nr_bvecs; @@ -63,6 +92,9 @@ struct io_mapped_buf { }; #if defined(CONFIG_IO_URING) +void io_fused_cmd_provide_kbuf(struct io_uring_cmd *ioucmd, bool locked, + const struct io_mapped_buf *imu, + void (*complete_tw_cb)(struct io_uring_cmd *)); int io_uring_cmd_import_fixed(u64 ubuf, unsigned long len, int rw, struct iov_iter *iter, void *ioucmd); void io_uring_cmd_done(struct io_uring_cmd *cmd, ssize_t ret, ssize_t res2); @@ -92,6 +124,11 @@ static inline void io_uring_free(struct task_struct *tsk) __io_uring_free(tsk); } #else +static inline void io_fused_cmd_provide_kbuf(struct io_uring_cmd *ioucmd, + bool locked, const struct io_mapped_buf *fused_cmd_kbuf, + void (*complete_tw_cb)(struct io_uring_cmd *)) +{ +} static inline int io_uring_cmd_import_fixed(u64 ubuf, unsigned long len, int rw, struct iov_iter *iter, void *ioucmd) { diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h index 7a27b1d3e2ea..7d358fae65f5 100644 --- a/include/linux/io_uring_types.h +++ b/include/linux/io_uring_types.h @@ -401,6 +401,8 @@ enum { /* keep async read/write and isreg together and in order */ REQ_F_SUPPORT_NOWAIT_BIT, REQ_F_ISREG_BIT, + REQ_F_FUSED_MASTER_BIT, + REQ_F_FUSED_SLAVE_BIT, /* not a real bit, just to check we're not overflowing the space */ __REQ_F_LAST_BIT, @@ -470,6 +472,10 @@ enum { REQ_F_CLEAR_POLLIN = BIT(REQ_F_CLEAR_POLLIN_BIT), /* hashed into ->cancel_hash_locked, protected by ->uring_lock */ REQ_F_HASH_LOCKED = BIT(REQ_F_HASH_LOCKED_BIT), + /* master request(uring cmd) in fused cmd */ + REQ_F_FUSED_MASTER = BIT(REQ_F_FUSED_MASTER_BIT), + /* slave request in fused cmd, won't be one uring cmd */ + REQ_F_FUSED_SLAVE = BIT(REQ_F_FUSED_SLAVE_BIT), }; typedef void (*io_req_tw_func_t)(struct io_kiocb *req, bool *locked); @@ -551,6 +557,18 @@ struct io_kiocb { * REQ_F_BUFFER_RING is set.
*/ struct io_buffer_list *buf_list; + + /* + * store kernel (sub)buffer of fused master request which OP + * is IORING_OP_FUSED_CMD + */ + const struct io_mapped_buf *fused_cmd_kbuf; + + /* + * store fused command master request for fuse slave request, + * which uses fuse master's kernel buffer for handling this OP + */ + struct io_kiocb *fused_master_req; }; union { diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h index 709de6d4feb2..f07d005ee898 100644 --- a/include/uapi/linux/io_uring.h +++ b/include/uapi/linux/io_uring.h @@ -223,6 +223,7 @@ enum io_uring_op { IORING_OP_URING_CMD, IORING_OP_SEND_ZC, IORING_OP_SENDMSG_ZC, + IORING_OP_FUSED_CMD, /* this goes last, obviously */ IORING_OP_LAST, diff --git a/io_uring/Makefile b/io_uring/Makefile index 8cc8e5387a75..5301077e61c5 100644 --- a/io_uring/Makefile +++ b/io_uring/Makefile @@ -7,5 +7,5 @@ obj-$(CONFIG_IO_URING) += io_uring.o xattr.o nop.o fs.o splice.o \ openclose.o uring_cmd.o epoll.o \ statx.o net.o msg_ring.o timeout.o \ sqpoll.o fdinfo.o tctx.o poll.o \ - cancel.o kbuf.o rsrc.o rw.o opdef.o notif.o + cancel.o kbuf.o rsrc.o rw.o opdef.o notif.o fused_cmd.o obj-$(CONFIG_IO_WQ) += io-wq.o diff --git a/io_uring/fused_cmd.c b/io_uring/fused_cmd.c new file mode 100644 index 000000000000..9c380b3275f8 --- /dev/null +++ b/io_uring/fused_cmd.c @@ -0,0 +1,233 @@ +// SPDX-License-Identifier: GPL-2.0 +#include +#include +#include +#include +#include +#include +#include + +#include + +#include "io_uring.h" +#include "opdef.h" +#include "rsrc.h" +#include "uring_cmd.h" + +static bool io_fused_slave_valid(const struct io_uring_sqe *sqe, u8 op) +{ + unsigned int sqe_flags = READ_ONCE(sqe->flags); + + if (op == IORING_OP_FUSED_CMD || op == IORING_OP_URING_CMD) + return false; + + if (sqe_flags & REQ_F_BUFFER_SELECT) + return false; + + if (!io_issue_defs[op].fused_slave) + return false; + + return true; +} + +static inline void io_fused_cmd_update_link_flags(struct io_kiocb *req, + const struct 
io_kiocb *slave) +{ + /* + * We have to keep slave SQE in order, so update master link flags + * with slave request's given master command isn't completed until + * the slave request is done + */ + if (slave->flags & (REQ_F_LINK | REQ_F_HARDLINK)) + req->flags |= REQ_F_LINK; +} + +int io_fused_cmd_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe) + __must_hold(&req->ctx->uring_lock) +{ + struct io_uring_cmd *ioucmd = io_kiocb_to_cmd(req, struct io_uring_cmd); + const struct io_uring_sqe *slave_sqe = sqe + 1; + struct io_ring_ctx *ctx = req->ctx; + struct io_kiocb *slave; + u8 slave_op; + int ret; + + if (unlikely(!(ctx->flags & IORING_SETUP_SQE128))) + return -EINVAL; + + if (unlikely(sqe->__pad1)) + return -EINVAL; + + ioucmd->flags = READ_ONCE(sqe->uring_cmd_flags); + if (unlikely(ioucmd->flags)) + return -EINVAL; + + slave_op = READ_ONCE(slave_sqe->opcode); + if (unlikely(!io_fused_slave_valid(slave_sqe, slave_op))) + return -EINVAL; + + ioucmd->cmd = sqe->cmd; + ioucmd->cmd_op = READ_ONCE(sqe->cmd_op); + req->fused_cmd_kbuf = NULL; + + /* take one extra reference for the slave request */ + io_get_task_refs(1); + + ret = -ENOMEM; + if (unlikely(!io_alloc_req(ctx, &slave))) + goto fail; + + ret = io_init_req(ctx, slave, slave_sqe, true); + if (unlikely(ret)) + goto fail_free_req; + + io_fused_cmd_update_link_flags(req, slave); + + ioucmd->fused.data.__slave = slave; + req->flags |= REQ_F_FUSED_MASTER; + + return 0; + +fail_free_req: + io_free_req(slave); +fail: + current->io_uring->cached_refs += 1; + return ret; +} + +static inline bool io_fused_slave_write_to_buf(u8 op) +{ + switch (op) { + case IORING_OP_READ: + case IORING_OP_READV: + case IORING_OP_READ_FIXED: + case IORING_OP_RECVMSG: + case IORING_OP_RECV: + return 1; + default: + return 0; + } +} + +int io_fused_cmd(struct io_kiocb *req, unsigned int issue_flags) +{ + struct io_uring_cmd *ioucmd = io_kiocb_to_cmd(req, struct io_uring_cmd); + const struct io_kiocb *slave = 
ioucmd->fused.data.__slave; + int ret = -EINVAL; + + /* + * Pass buffer direction for driver to validate if the read/write + * is legal + */ + if (io_fused_slave_write_to_buf(slave->opcode)) + issue_flags |= IO_URING_F_FUSED_WRITE; + else + issue_flags |= IO_URING_F_FUSED_READ; + + ret = io_uring_cmd(req, issue_flags); + if (ret != IOU_ISSUE_SKIP_COMPLETE) + io_free_req(ioucmd->fused.data.__slave); + + return ret; +} + +int io_import_kbuf_for_slave(u64 buf, unsigned int len, int rw, + struct iov_iter *iter, struct io_kiocb *slave) +{ + struct io_kiocb *req = slave->fused_master_req; + const struct io_mapped_buf *kbuf; + unsigned int offset; + + if (unlikely(!(slave->flags & REQ_F_FUSED_SLAVE) || !req)) + return -EINVAL; + + if (unlikely(!req->fused_cmd_kbuf)) + return -EINVAL; + + /* req->fused_cmd_kbuf is immutable */ + kbuf = req->fused_cmd_kbuf; + offset = kbuf->offset; + + if (!kbuf->bvec) + return -EINVAL; + + /* not inside the mapped region */ + if (unlikely(buf < kbuf->buf || buf > kbuf->buf_end)) + return -EFAULT; + + if (unlikely(len > kbuf->buf_end - buf)) + return -EFAULT; + + /* don't use io_import_fixed which doesn't support multipage bvec */ + offset += buf - kbuf->buf; + iov_iter_bvec(iter, rw, kbuf->bvec, kbuf->nr_bvecs, offset + len); + + if (offset) + iov_iter_advance(iter, offset); + + return 0; +} + +/* + * Called when slave request is completed, + * + * Return back ownership of the fused_cmd kbuf to master request, and + * notify master request. 
+ */ +void io_fused_cmd_return_kbuf(struct io_kiocb *slave) +{ + struct io_kiocb *req = slave->fused_master_req; + struct io_uring_cmd *ioucmd; + + if (unlikely(!req || !(slave->flags & REQ_F_FUSED_SLAVE))) + return; + + /* return back the buffer */ + slave->fused_master_req = NULL; + ioucmd = io_kiocb_to_cmd(req, struct io_uring_cmd); + ioucmd->fused.data.__slave = NULL; + + /* If slave OP skips CQE, return the result via master command */ + if (slave->flags & REQ_F_CQE_SKIP) + ioucmd->fused.data.slave_res = slave->cqe.res; + else + ioucmd->fused.data.slave_res = 0; + io_uring_cmd_complete_in_task(ioucmd, ioucmd->task_work_cb); +} + +/* + * This API needs to be called when master command has prepared + * FUSED_CMD buffer, and offset/len in ->fused.data is for retrieving + * sub-buffer in the command buffer, which is often figured out by + * command payload data. + * + * Master command is always completed after the slave request + * is completed, so driver has to set completion callback for + * getting notification. + * + * Ownership of the fused_cmd kbuf is transferred to slave request. 
+ */ +void io_fused_cmd_provide_kbuf(struct io_uring_cmd *ioucmd, bool locked, + const struct io_mapped_buf *fused_cmd_kbuf, + void (*complete_tw_cb)(struct io_uring_cmd *)) +{ + struct io_kiocb *req = cmd_to_io_kiocb(ioucmd); + struct io_kiocb *slave = ioucmd->fused.data.__slave; + + /* + * Once the fused slave request is completed, the driver will + * be notified by callback of complete_tw_cb + */ + ioucmd->task_work_cb = complete_tw_cb; + + /* now we get the buffer */ + req->fused_cmd_kbuf = fused_cmd_kbuf; + slave->fused_master_req = req; + + trace_io_uring_submit_sqe(slave, true); + if (locked) + io_req_task_submit(slave, &locked); + else + io_req_task_queue(slave); +} +EXPORT_SYMBOL_GPL(io_fused_cmd_provide_kbuf); diff --git a/io_uring/fused_cmd.h b/io_uring/fused_cmd.h new file mode 100644 index 000000000000..c11d9d8989a1 --- /dev/null +++ b/io_uring/fused_cmd.h @@ -0,0 +1,11 @@ +// SPDX-License-Identifier: GPL-2.0 +#ifndef IOU_FUSED_CMD_H +#define IOU_FUSED_CMD_H + +int io_fused_cmd_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe); +int io_fused_cmd(struct io_kiocb *req, unsigned int issue_flags); +void io_fused_cmd_return_kbuf(struct io_kiocb *slave); +int io_import_kbuf_for_slave(u64 buf, unsigned int len, int rw, + struct iov_iter *iter, struct io_kiocb *slave); + +#endif diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c index 3df66fddda5a..d34ce82a4cc6 100644 --- a/io_uring/io_uring.c +++ b/io_uring/io_uring.c @@ -91,6 +91,7 @@ #include "cancel.h" #include "net.h" #include "notif.h" +#include "fused_cmd.h" #include "timeout.h" #include "poll.h" @@ -110,7 +111,7 @@ #define IO_REQ_CLEAN_FLAGS (REQ_F_BUFFER_SELECTED | REQ_F_NEED_CLEANUP | \ REQ_F_POLLED | REQ_F_INFLIGHT | REQ_F_CREDS | \ - REQ_F_ASYNC_DATA) + REQ_F_ASYNC_DATA | REQ_F_FUSED_SLAVE) #define IO_REQ_CLEAN_SLOW_FLAGS (REQ_F_REFCOUNT | REQ_F_LINK | REQ_F_HARDLINK |\ IO_REQ_CLEAN_FLAGS) @@ -964,6 +965,9 @@ static void __io_req_complete_post(struct io_kiocb *req) { struct io_ring_ctx 
*ctx = req->ctx; + if (req->flags & REQ_F_FUSED_SLAVE) + io_fused_cmd_return_kbuf(req); + io_cq_lock(ctx); if (!(req->flags & REQ_F_CQE_SKIP)) io_fill_cqe_req(ctx, req); @@ -1848,6 +1852,8 @@ static void io_clean_op(struct io_kiocb *req) spin_lock(&req->ctx->completion_lock); io_put_kbuf_comp(req); spin_unlock(&req->ctx->completion_lock); + } else if (req->flags & REQ_F_FUSED_SLAVE) { + io_fused_cmd_return_kbuf(req); } if (req->flags & REQ_F_NEED_CLEANUP) { @@ -2156,8 +2162,8 @@ static void io_init_req_drain(struct io_kiocb *req) } } -static int io_init_req(struct io_ring_ctx *ctx, struct io_kiocb *req, - const struct io_uring_sqe *sqe) +int io_init_req(struct io_ring_ctx *ctx, struct io_kiocb *req, + const struct io_uring_sqe *sqe, bool slave) __must_hold(&ctx->uring_lock) { const struct io_issue_def *def; @@ -2210,6 +2216,12 @@ static int io_init_req(struct io_ring_ctx *ctx, struct io_kiocb *req, } } + if (slave) { + if (!def->fused_slave) + return -EINVAL; + req->flags |= REQ_F_FUSED_SLAVE; + } + if (!def->ioprio && sqe->ioprio) return -EINVAL; if (!def->iopoll && (ctx->flags & IORING_SETUP_IOPOLL)) @@ -2294,7 +2306,7 @@ static inline int io_submit_sqe(struct io_ring_ctx *ctx, struct io_kiocb *req, struct io_submit_link *link = &ctx->submit_state.link; int ret; - ret = io_init_req(ctx, req, sqe); + ret = io_init_req(ctx, req, sqe, false); if (unlikely(ret)) return io_submit_fail_init(sqe, req, ret); diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h index 2711865f1e19..a50c7e1f6e81 100644 --- a/io_uring/io_uring.h +++ b/io_uring/io_uring.h @@ -78,6 +78,9 @@ bool __io_alloc_req_refill(struct io_ring_ctx *ctx); bool io_match_task_safe(struct io_kiocb *head, struct task_struct *task, bool cancel_all); +int io_init_req(struct io_ring_ctx *ctx, struct io_kiocb *req, + const struct io_uring_sqe *sqe, bool slave); + #define io_lockdep_assert_cq_locked(ctx) \ do { \ if (ctx->flags & IORING_SETUP_IOPOLL) { \ diff --git a/io_uring/opdef.c b/io_uring/opdef.c index 
cca7c5b55208..63b90e8e65f8 100644 --- a/io_uring/opdef.c +++ b/io_uring/opdef.c @@ -33,6 +33,7 @@ #include "poll.h" #include "cancel.h" #include "rw.h" +#include "fused_cmd.h" static int io_no_issue(struct io_kiocb *req, unsigned int issue_flags) { @@ -428,6 +429,12 @@ const struct io_issue_def io_issue_defs[] = { .prep = io_eopnotsupp_prep, #endif }, + [IORING_OP_FUSED_CMD] = { + .needs_file = 1, + .plug = 1, + .prep = io_fused_cmd_prep, + .issue = io_fused_cmd, + }, }; @@ -648,6 +655,11 @@ const struct io_cold_def io_cold_defs[] = { .fail = io_sendrecv_fail, #endif }, + [IORING_OP_FUSED_CMD] = { + .name = "FUSED_CMD", + .async_size = uring_cmd_pdu_size(1), + .prep_async = io_uring_cmd_prep_async, + }, }; const char *io_uring_get_opcode(u8 opcode) diff --git a/io_uring/opdef.h b/io_uring/opdef.h index c22c8696e749..306f6fc48ed4 100644 --- a/io_uring/opdef.h +++ b/io_uring/opdef.h @@ -29,6 +29,8 @@ struct io_issue_def { unsigned iopoll_queue : 1; /* opcode specific path will handle ->async_data allocation if needed */ unsigned manual_alloc : 1; + /* can be slave op of fused command */ + unsigned fused_slave : 1; int (*issue)(struct io_kiocb *, unsigned int); int (*prep)(struct io_kiocb *, const struct io_uring_sqe *); From patchwork Wed Mar 1 14:06:07 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ming Lei X-Patchwork-Id: 13156047 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id CE0CEC7EE33 for ; Wed, 1 Mar 2023 14:09:23 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229964AbjCAOJW (ORCPT ); Wed, 1 Mar 2023 09:09:22 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:56748 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id 
S229947AbjCAOJO (ORCPT ); Wed, 1 Mar 2023 09:09:14 -0500 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 1C6D625975 for ; Wed, 1 Mar 2023 06:08:24 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1677679703; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=XBW5uND8g4lHs9X+v3T5jlhH8yD3vk2lYh4eamapBtY=; b=M61qsFFRGET/UL9SFEaUbXzbNxx42mbMnkirTQl/K7F4MgVXipxIf7RZjKPzfbSY9q2bgZ S4pye3qGyh4wb4hyKbYnb6MbJX68eiFBEVdPej16LlXl8mA0Ow0J+l94Egd9vne3hl5a+q XQ1CzFFlTpQYBm+eW4zyk6yqVkuVDes= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-75-6x8nZmzYNPCFpoK8hLlXrw-1; Wed, 01 Mar 2023 09:08:20 -0500 X-MC-Unique: 6x8nZmzYNPCFpoK8hLlXrw-1 Received: from smtp.corp.redhat.com (int-mx10.intmail.prod.int.rdu2.redhat.com [10.11.54.10]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 2351419705BD; Wed, 1 Mar 2023 14:06:55 +0000 (UTC) Received: from localhost (ovpn-8-22.pek2.redhat.com [10.72.8.22]) by smtp.corp.redhat.com (Postfix) with ESMTP id 3367B492B00; Wed, 1 Mar 2023 14:06:53 +0000 (UTC) From: Ming Lei To: Jens Axboe , io-uring@vger.kernel.org Cc: linux-block@vger.kernel.org, Miklos Szeredi , ZiyangZhang , Xiaoguang Wang , Bernd Schubert , Ming Lei Subject: [RFC PATCH 08/12] io_uring: support OP_READ/OP_WRITE for fused slave request Date: Wed, 1 Mar 2023 22:06:07 +0800 Message-Id: <20230301140611.163055-9-ming.lei@redhat.com> In-Reply-To: <20230301140611.163055-1-ming.lei@redhat.com> References: 
<20230301140611.163055-1-ming.lei@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.10 Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org Start to allow fused slave request to support OP_READ/OP_WRITE, and the buffer can be retrieved from master request. Once the slave request is completed, the master buffer will be returned back. Signed-off-by: Ming Lei --- io_uring/opdef.c | 2 ++ io_uring/rw.c | 20 ++++++++++++++++++++ 2 files changed, 22 insertions(+) diff --git a/io_uring/opdef.c b/io_uring/opdef.c index 63b90e8e65f8..f044629e5475 100644 --- a/io_uring/opdef.c +++ b/io_uring/opdef.c @@ -235,6 +235,7 @@ const struct io_issue_def io_issue_defs[] = { .ioprio = 1, .iopoll = 1, .iopoll_queue = 1, + .fused_slave = 1, .prep = io_prep_rw, .issue = io_read, }, @@ -248,6 +249,7 @@ const struct io_issue_def io_issue_defs[] = { .ioprio = 1, .iopoll = 1, .iopoll_queue = 1, + .fused_slave = 1, .prep = io_prep_rw, .issue = io_write, }, diff --git a/io_uring/rw.c b/io_uring/rw.c index 4c233910e200..36d31a943317 100644 --- a/io_uring/rw.c +++ b/io_uring/rw.c @@ -19,6 +19,7 @@ #include "kbuf.h" #include "rsrc.h" #include "rw.h" +#include "fused_cmd.h" struct io_rw { /* NOTE: kiocb has the file as the first member, so don't do it here */ @@ -371,6 +372,17 @@ static struct iovec *__io_import_iovec(int ddir, struct io_kiocb *req, size_t sqe_len; ssize_t ret; + /* + * SLAVE OP passes buffer offset from sqe->addr actually, since + * the fused cmd kbuf's mapped start address is zero. 
+ */ + if (req->flags & REQ_F_FUSED_SLAVE) { + ret = io_import_kbuf_for_slave(rw->addr, rw->len, ddir, iter, req); + if (ret) + return ERR_PTR(ret); + return NULL; + } + if (opcode == IORING_OP_READ_FIXED || opcode == IORING_OP_WRITE_FIXED) { ret = io_import_fixed(ddir, iter, req->imu, rw->addr, rw->len); if (ret) @@ -428,11 +440,19 @@ static inline loff_t *io_kiocb_ppos(struct kiocb *kiocb) */ static ssize_t loop_rw_iter(int ddir, struct io_rw *rw, struct iov_iter *iter) { + struct io_kiocb *req = cmd_to_io_kiocb(rw); struct kiocb *kiocb = &rw->kiocb; struct file *file = kiocb->ki_filp; ssize_t ret = 0; loff_t *ppos; + /* + * Fused slave req hasn't user buffer, so ->read/->write can't + * be supported + */ + if (req->flags & REQ_F_FUSED_SLAVE) + return -EOPNOTSUPP; + /* * Don't support polled IO through this interface, and we can't * support non-blocking either. For the latter, this just causes From patchwork Wed Mar 1 14:06:08 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ming Lei X-Patchwork-Id: 13156046 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 65E32C83003 for ; Wed, 1 Mar 2023 14:09:24 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229947AbjCAOJX (ORCPT ); Wed, 1 Mar 2023 09:09:23 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:56690 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229948AbjCAOJP (ORCPT ); Wed, 1 Mar 2023 09:09:15 -0500 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id E5FF235A0 for ; Wed, 1 Mar 2023 06:08:26 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; 
d=redhat.com; s=mimecast20190719; t=1677679706; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=YhpzQlaHIrcPWh0ozP8XZ7+aBtM5xeXxoPc3fRNzGPY=; b=MagwIYPlDUghTDlYeN2UIgE7m4mWopv0h6kV5k6oi57JCaIehSF71QsAOfaxSNK1wOIPw2 BYQz6sxn9dZ6Dr9NvR+/WX3SNiY5Wi6iyfNXydtXd/8LlHHBW2fAf1SwJ/t4afGJVKoMIo Bcl9zfuKNAtY36NSIBrJR/ynMohVO70= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-471-WBFGjnlaOVmES13dKljw6Q-1; Wed, 01 Mar 2023 09:08:23 -0500 X-MC-Unique: WBFGjnlaOVmES13dKljw6Q-1 Received: from smtp.corp.redhat.com (int-mx09.intmail.prod.int.rdu2.redhat.com [10.11.54.9]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 76CEF95E165; Wed, 1 Mar 2023 14:06:58 +0000 (UTC) Received: from localhost (ovpn-8-22.pek2.redhat.com [10.72.8.22]) by smtp.corp.redhat.com (Postfix) with ESMTP id B65C0492C14; Wed, 1 Mar 2023 14:06:57 +0000 (UTC) From: Ming Lei To: Jens Axboe , io-uring@vger.kernel.org Cc: linux-block@vger.kernel.org, Miklos Szeredi , ZiyangZhang , Xiaoguang Wang , Bernd Schubert , Ming Lei Subject: [RFC PATCH 09/12] io_uring: support OP_SEND_ZC/OP_RECV for fused slave request Date: Wed, 1 Mar 2023 22:06:08 +0800 Message-Id: <20230301140611.163055-10-ming.lei@redhat.com> In-Reply-To: <20230301140611.163055-1-ming.lei@redhat.com> References: <20230301140611.163055-1-ming.lei@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.9 Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org Start to allow fused slave request to support OP_SEND_ZC/OP_RECV, and the buffer can be retrieved from master request. 
Once the slave request is completed, the master buffer will be returned back. Signed-off-by: Ming Lei --- io_uring/net.c | 23 +++++++++++++++++++++-- io_uring/opdef.c | 3 +++ 2 files changed, 24 insertions(+), 2 deletions(-) diff --git a/io_uring/net.c b/io_uring/net.c index cbd4b725f58c..be5ae5ca823d 100644 --- a/io_uring/net.c +++ b/io_uring/net.c @@ -16,6 +16,7 @@ #include "net.h" #include "notif.h" #include "rsrc.h" +#include "fused_cmd.h" #if defined(CONFIG_NET) struct io_shutdown { @@ -378,7 +379,11 @@ int io_send(struct io_kiocb *req, unsigned int issue_flags) if (unlikely(!sock)) return -ENOTSOCK; - ret = import_ubuf(ITER_SOURCE, sr->buf, sr->len, &msg.msg_iter); + if (!(req->flags & REQ_F_FUSED_SLAVE)) + ret = import_ubuf(ITER_SOURCE, sr->buf, sr->len, &msg.msg_iter); + else + ret = io_import_kbuf_for_slave((u64)sr->buf, sr->len, + ITER_SOURCE, &msg.msg_iter, req); if (unlikely(ret)) return ret; @@ -869,7 +874,11 @@ int io_recv(struct io_kiocb *req, unsigned int issue_flags) sr->buf = buf; } - ret = import_ubuf(ITER_DEST, sr->buf, len, &msg.msg_iter); + if (!(req->flags & REQ_F_FUSED_SLAVE)) + ret = import_ubuf(ITER_DEST, sr->buf, len, &msg.msg_iter); + else + ret = io_import_kbuf_for_slave((u64)sr->buf, sr->len, ITER_DEST, + &msg.msg_iter, req); if (unlikely(ret)) goto out_free; @@ -983,6 +992,9 @@ int io_send_zc_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe) if (zc->flags & IORING_RECVSEND_FIXED_BUF) { unsigned idx = READ_ONCE(sqe->buf_index); + if (req->flags & REQ_F_FUSED_SLAVE) + return -EINVAL; + if (unlikely(idx >= ctx->nr_user_bufs)) return -EFAULT; idx = array_index_nospec(idx, ctx->nr_user_bufs); @@ -1119,8 +1131,15 @@ int io_send_zc(struct io_kiocb *req, unsigned int issue_flags) if (unlikely(ret)) return ret; msg.sg_from_iter = io_sg_from_iter; + } else if (req->flags & REQ_F_FUSED_SLAVE) { + ret = io_import_kbuf_for_slave((u64)zc->buf, zc->len, + ITER_SOURCE, &msg.msg_iter, req); + if (unlikely(ret)) + return ret; + 
msg.sg_from_iter = io_sg_from_iter; } else { io_notif_set_extended(zc->notif); + ret = import_ubuf(ITER_SOURCE, zc->buf, zc->len, &msg.msg_iter); if (unlikely(ret)) return ret; diff --git a/io_uring/opdef.c b/io_uring/opdef.c index f044629e5475..0a9d39a9db16 100644 --- a/io_uring/opdef.c +++ b/io_uring/opdef.c @@ -271,6 +271,7 @@ const struct io_issue_def io_issue_defs[] = { .audit_skip = 1, .ioprio = 1, .manual_alloc = 1, + .fused_slave = 1, #if defined(CONFIG_NET) .prep = io_sendmsg_prep, .issue = io_send, @@ -285,6 +286,7 @@ const struct io_issue_def io_issue_defs[] = { .buffer_select = 1, .audit_skip = 1, .ioprio = 1, + .fused_slave = 1, #if defined(CONFIG_NET) .prep = io_recvmsg_prep, .issue = io_recv, @@ -411,6 +413,7 @@ const struct io_issue_def io_issue_defs[] = { .audit_skip = 1, .ioprio = 1, .manual_alloc = 1, + .fused_slave = 1, #if defined(CONFIG_NET) .prep = io_send_zc_prep, .issue = io_send_zc, From patchwork Wed Mar 1 14:06:09 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ming Lei X-Patchwork-Id: 13156040 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 722BBC7EE33 for ; Wed, 1 Mar 2023 14:09:19 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229914AbjCAOJR (ORCPT ); Wed, 1 Mar 2023 09:09:17 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:56674 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229955AbjCAOJM (ORCPT ); Wed, 1 Mar 2023 09:09:12 -0500 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 130AD199E1 for ; Wed, 1 Mar 2023 06:08:28 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; 
c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1677679707; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=V3L3qa8/Pg6AVZb7VqjF5eAJuHVvLbGtfEnBeRmRE68=; b=ThmWxphFTJVc+zbDTjYM+cc4ICM92f3b5mFv9jk7bLFZmjEE+bWEnR2oqNeDqB6lfp28qy xNw2nnfaDq1Ao7XyP48mInWpuyCfwIvuxjTebFWX/KIR7VbLl3WCBvqoOjf8Nkmte2/uAc BUopRhzbgEl0FhaYoq4FWbqbZunnweQ= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-444-Tq1edAIIMfKWcpwrfil9Xg-1; Wed, 01 Mar 2023 09:08:23 -0500 X-MC-Unique: Tq1edAIIMfKWcpwrfil9Xg-1 Received: from smtp.corp.redhat.com (int-mx01.intmail.prod.int.rdu2.redhat.com [10.11.54.1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 7EF451115A9B; Wed, 1 Mar 2023 14:07:03 +0000 (UTC) Received: from localhost (ovpn-8-22.pek2.redhat.com [10.72.8.22]) by smtp.corp.redhat.com (Postfix) with ESMTP id B8EEA40B40DF; Wed, 1 Mar 2023 14:07:02 +0000 (UTC) From: Ming Lei To: Jens Axboe , io-uring@vger.kernel.org Cc: linux-block@vger.kernel.org, Miklos Szeredi , ZiyangZhang , Xiaoguang Wang , Bernd Schubert , Ming Lei Subject: [RFC PATCH 10/12] block: ublk_drv: mark device as LIVE before adding disk Date: Wed, 1 Mar 2023 22:06:09 +0800 Message-Id: <20230301140611.163055-11-ming.lei@redhat.com> In-Reply-To: <20230301140611.163055-1-ming.lei@redhat.com> References: <20230301140611.163055-1-ming.lei@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.1 Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org IO can be started before add_disk() returns, such as reading the partition table, and the monitor work should then be working to make forward progress.
So mark the device as LIVE before adding the disk, and change it to DEAD if add_disk() fails. Signed-off-by: Ming Lei Reviewed-by: Ziyang Zhang --- drivers/block/ublk_drv.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c index c89ede1c9b22..2497b91b48ba 100644 --- a/drivers/block/ublk_drv.c +++ b/drivers/block/ublk_drv.c @@ -1608,17 +1608,18 @@ static int ublk_ctrl_start_dev(struct ublk_device *ub, struct io_uring_cmd *cmd) set_bit(GD_SUPPRESS_PART_SCAN, &disk->state); get_device(&ub->cdev_dev); + ub->dev_info.state = UBLK_S_DEV_LIVE; ret = add_disk(disk); if (ret) { /* * Has to drop the reference since ->free_disk won't be * called in case of add_disk failure. */ + ub->dev_info.state = UBLK_S_DEV_DEAD; ublk_put_device(ub); goto out_put_disk; } set_bit(UB_STATE_USED, &ub->state); - ub->dev_info.state = UBLK_S_DEV_LIVE; out_put_disk: if (ret) put_disk(disk); From patchwork Wed Mar 1 14:06:10 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ming Lei X-Patchwork-Id: 13156043 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8DA8FC64ED6 for ; Wed, 1 Mar 2023 14:09:21 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229905AbjCAOJU (ORCPT ); Wed, 1 Mar 2023 09:09:20 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:56742 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229940AbjCAOJO (ORCPT ); Wed, 1 Mar 2023 09:09:14 -0500 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 0997122DDB for ; Wed, 1 Mar 2023 06:08:13 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; 
c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1677679693; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=Vhpj5XHtWAuet7CTecEivCEMI579/oqA0+hHiaLE9Qw=; b=UMHszW4Ho9eHpJzyboJweET5olD7OlcJ244Eyeb5Ht4uLOzmpyu25XS13WjDcYUguXHhwM SWMz38tNR2NwsYhXWbZIzRWTp7xTCQ8oS9ifxQOuTKEUMBeAmsCO4I8mManey7Ntuwc8sH r32qi2Y9o9v621hd++IeA7UJbATFka8= Received: from mimecast-mx02.redhat.com (mx3-rdu2.redhat.com [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-90-0fP1R0LHOai5xC-gLFinCA-1; Wed, 01 Mar 2023 09:08:11 -0500 X-MC-Unique: 0fP1R0LHOai5xC-gLFinCA-1 Received: from smtp.corp.redhat.com (int-mx04.intmail.prod.int.rdu2.redhat.com [10.11.54.4]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id D5AA63C0F240; Wed, 1 Mar 2023 14:07:06 +0000 (UTC) Received: from localhost (ovpn-8-22.pek2.redhat.com [10.72.8.22]) by smtp.corp.redhat.com (Postfix) with ESMTP id 1B9D32026D68; Wed, 1 Mar 2023 14:07:05 +0000 (UTC) From: Ming Lei To: Jens Axboe , io-uring@vger.kernel.org Cc: linux-block@vger.kernel.org, Miklos Szeredi , ZiyangZhang , Xiaoguang Wang , Bernd Schubert , Ming Lei Subject: [RFC PATCH 11/12] block: ublk_drv: add common exit handling Date: Wed, 1 Mar 2023 22:06:10 +0800 Message-Id: <20230301140611.163055-12-ming.lei@redhat.com> In-Reply-To: <20230301140611.163055-1-ming.lei@redhat.com> References: <20230301140611.163055-1-ming.lei@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.4 Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org Simplify exit handling a bit, and prepare for supporting fused command. 
Signed-off-by: Ming Lei Reviewed-by: Ziyang Zhang --- drivers/block/ublk_drv.c | 15 +++++++++------ 1 file changed, 9 insertions(+), 6 deletions(-) diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c index 2497b91b48ba..b9e38ebabca7 100644 --- a/drivers/block/ublk_drv.c +++ b/drivers/block/ublk_drv.c @@ -655,14 +655,15 @@ static void ublk_complete_rq(struct request *req) struct ublk_queue *ubq = req->mq_hctx->driver_data; struct ublk_io *io = &ubq->ios[req->tag]; unsigned int unmapped_bytes; + int res = BLK_STS_OK; /* failed read IO if nothing is read */ if (!io->res && req_op(req) == REQ_OP_READ) io->res = -EIO; if (io->res < 0) { - blk_mq_end_request(req, errno_to_blk_status(io->res)); - return; + res = errno_to_blk_status(io->res); + goto exit; } /* @@ -671,10 +672,8 @@ static void ublk_complete_rq(struct request *req) * * Both the two needn't unmap. */ - if (req_op(req) != REQ_OP_READ && req_op(req) != REQ_OP_WRITE) { - blk_mq_end_request(req, BLK_STS_OK); - return; - } + if (req_op(req) != REQ_OP_READ && req_op(req) != REQ_OP_WRITE) + goto exit; /* for READ request, writing data in iod->addr to rq buffers */ unmapped_bytes = ublk_unmap_io(ubq, req, io); @@ -691,6 +690,10 @@ static void ublk_complete_rq(struct request *req) blk_mq_requeue_request(req, true); else __blk_mq_end_request(req, BLK_STS_OK); + + return; +exit: + blk_mq_end_request(req, res); } /* From patchwork Wed Mar 1 14:06:11 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ming Lei X-Patchwork-Id: 13156048 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 59D56C7EE2F for ; Wed, 1 Mar 2023 14:09:25 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229695AbjCAOJX (ORCPT ); Wed, 1 Mar 2023 09:09:23 -0500 
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:56688 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229949AbjCAOJP (ORCPT ); Wed, 1 Mar 2023 09:09:15 -0500 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 47AB21F4A3 for ; Wed, 1 Mar 2023 06:08:06 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1677679685; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=WJE4DhlpagUTIuGDdUJ2rg+ixtBuoV0cAAwn6SVonDk=; b=c0UREP5MqLbXm0f9TSQ9X8RV4dwW2+TbVtzXaqcf+0O4dfsDCtPAgFbOWXa8f3kSZ7K+OR rHBalilPxXhToWAfMw+yQLgyJDxsF7XvhX2GdrG/TXskFnCORRiyB6L5dKqkmOQNvcbjwu oOILFaJCvUqKgpm5MxLrucIyXicDWlA= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-647-18hTaVV_OgmPyEYUZY9P1g-1; Wed, 01 Mar 2023 09:08:03 -0500 X-MC-Unique: 18hTaVV_OgmPyEYUZY9P1g-1 Received: from smtp.corp.redhat.com (int-mx06.intmail.prod.int.rdu2.redhat.com [10.11.54.6]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 9BC7E891F41; Wed, 1 Mar 2023 14:07:10 +0000 (UTC) Received: from localhost (ovpn-8-22.pek2.redhat.com [10.72.8.22]) by smtp.corp.redhat.com (Postfix) with ESMTP id A4C122166B26; Wed, 1 Mar 2023 14:07:09 +0000 (UTC) From: Ming Lei To: Jens Axboe , io-uring@vger.kernel.org Cc: linux-block@vger.kernel.org, Miklos Szeredi , ZiyangZhang , Xiaoguang Wang , Bernd Schubert , Ming Lei Subject: [RFC PATCH 12/12] block: ublk_drv: apply io_uring FUSED_CMD for supporting zero copy Date: Wed, 1 Mar 2023 
22:06:11 +0800 Message-Id: <20230301140611.163055-13-ming.lei@redhat.com> In-Reply-To: <20230301140611.163055-1-ming.lei@redhat.com> References: <20230301140611.163055-1-ming.lei@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.6 Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org

Apply the io_uring fused command to support zero copy:

1) init the fused cmd buffer (io_mapped_buf) in ublk_map_io() and deinit it in ublk_unmap_io(); the buffer is immutable, so it is fine for concurrent fused commands to retrieve it

2) add the sub-command opcode UBLK_IO_FUSED_SUBMIT_IO for retrieving this fused cmd (zero copy) buffer

3) call io_fused_cmd_provide_kbuf() to provide the buffer to the slave request; the same API sets up the completion callback, which is called once the slave request completes in order to free the buffer and complete the uring fused command

Todo: don't complete the ublk block request until all in-flight fused commands targeting this request are completed; this requires cleaning up the current ublk driver a bit, so it is deferred to a future post and should not affect review of the overall approach.

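
For readers new to the series, the direction check that ublk_check_fused_buf_dir() performs before the buffer is handed to the slave request can be sketched in plain userspace C. This is only an illustration, not the driver code: the enum, the flag names and their bit values are hypothetical stand-ins for req_op() and the IO_URING_F_FUSED* flags, and it assumes that the WRITE flag means the slave request fills the buffer while the READ flag means it consumes the buffer.

```c
#include <assert.h>   /* only needed for the usage checks mentioned below */
#include <stdbool.h>

/* Hypothetical stand-ins for REQ_OP_READ / REQ_OP_WRITE / other ops. */
enum blk_op { OP_READ, OP_WRITE, OP_FLUSH };

/* Hypothetical flag bits; the real values live in the io_uring headers. */
#define F_FUSED_WRITE (1u << 0) /* assumed: slave request writes into the buffer */
#define F_FUSED_READ  (1u << 1) /* assumed: slave request reads from the buffer */
#define F_FUSED       (F_FUSED_WRITE | F_FUSED_READ)

/*
 * A block READ needs its buffer filled, so only a "write to buffer" slave
 * is acceptable; a block WRITE already carries data, so only a "read from
 * buffer" slave is acceptable. Everything else is rejected.
 */
static bool fused_buf_dir_ok(enum blk_op op, unsigned int issue_flags)
{
	unsigned int dir = issue_flags & F_FUSED; /* ignore unrelated flag bits */

	if (op == OP_READ && dir == F_FUSED_WRITE)
		return true;
	if (op == OP_WRITE && dir == F_FUSED_READ)
		return true;
	return false;
}
```

With these stand-ins, fused_buf_dir_ok(OP_READ, F_FUSED_WRITE) and fused_buf_dir_ok(OP_WRITE, F_FUSED_READ) are accepted, while a mismatched direction, both direction bits set at once, or a request type without data transfer such as OP_FLUSH is rejected, which corresponds to the -EINVAL path in ublk_handle_fused_cmd().
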
Signed-off-by: Ming Lei --- drivers/block/ublk_drv.c | 167 ++++++++++++++++++++++++++++++++-- include/uapi/linux/ublk_cmd.h | 1 + 2 files changed, 160 insertions(+), 8 deletions(-) diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c index b9e38ebabca7..56a362798aa7 100644 --- a/drivers/block/ublk_drv.c +++ b/drivers/block/ublk_drv.c @@ -62,6 +62,8 @@ struct ublk_rq_data { struct llist_node node; struct callback_head work; + bool allocated_bvec; + struct io_mapped_buf buf[0]; }; struct ublk_uring_cmd_pdu { @@ -525,10 +527,87 @@ static inline int ublk_copy_user_pages(struct ublk_map_data *data, return done; } -static int ublk_map_io(const struct ublk_queue *ubq, const struct request *req, +/* + * The built command buffer is immutable, so it is fine to feed it to + * concurrent io_uring fused commands + */ +static int ublk_init_zero_copy_buffer(struct request *rq) +{ + struct ublk_rq_data *data = blk_mq_rq_to_pdu(rq); + struct io_mapped_buf *imu = data->buf; + struct req_iterator rq_iter; + unsigned int nr_bvecs = 0; + struct bio_vec *bvec; + unsigned int offset; + struct bio_vec bv; + + if (!ublk_rq_has_data(rq)) + goto exit; + + rq_for_each_bvec(bv, rq, rq_iter) + nr_bvecs++; + + if (!nr_bvecs) + goto exit; + + if (rq->bio != rq->biotail) { + int idx = 0; + + bvec = kvmalloc_array(sizeof(struct bio_vec), nr_bvecs, + GFP_NOIO); + if (!bvec) + return -ENOMEM; + + offset = 0; + rq_for_each_bvec(bv, rq, rq_iter) + bvec[idx++] = bv; + data->allocated_bvec = true; + } else { + struct bio *bio = rq->bio; + + offset = bio->bi_iter.bi_bvec_done; + bvec = __bvec_iter_bvec(bio->bi_io_vec, bio->bi_iter); + } + imu->bvec = bvec; + imu->nr_bvecs = nr_bvecs; + imu->offset = offset; + imu->buf = 0; + imu->buf_end = blk_rq_bytes(rq); + + return 0; +exit: + imu->bvec = NULL; + return 0; +} + +static void ublk_deinit_zero_copy_buffer(struct request *rq) +{ + struct ublk_rq_data *data = blk_mq_rq_to_pdu(rq); + struct io_mapped_buf *imu = data->buf; + + if 
(data->allocated_bvec) { + kvfree(imu->bvec); + data->allocated_bvec = false; + } +} + +static int ublk_map_io(const struct ublk_queue *ubq, struct request *req, struct ublk_io *io) { const unsigned int rq_bytes = blk_rq_bytes(req); + + if (ubq->flags & UBLK_F_SUPPORT_ZERO_COPY) { + int ret = ublk_init_zero_copy_buffer(req); + + /* + * The only failure is -ENOMEM for allocating fused cmd + * buffer, return zero so that we can requeue this req. + */ + if (unlikely(ret)) + return 0; + return rq_bytes; + } + /* * no zero copy, we delay copy WRITE request data into ublksrv * context and the big benefit is that pinning pages in current @@ -553,11 +632,17 @@ static int ublk_map_io(const struct ublk_queue *ubq, const struct request *req, } static int ublk_unmap_io(const struct ublk_queue *ubq, - const struct request *req, + struct request *req, struct ublk_io *io) { const unsigned int rq_bytes = blk_rq_bytes(req); + if (ubq->flags & UBLK_F_SUPPORT_ZERO_COPY) { + ublk_deinit_zero_copy_buffer(req); + + return rq_bytes; + } + if (req_op(req) == REQ_OP_READ && ublk_rq_has_data(req)) { struct ublk_map_data data = { .ubq = ubq, @@ -693,6 +778,7 @@ static void ublk_complete_rq(struct request *req) return; exit: + ublk_deinit_zero_copy_buffer(req); blk_mq_end_request(req, res); } @@ -1259,6 +1345,66 @@ static void ublk_handle_need_get_data(struct ublk_device *ub, int q_id, ublk_queue_cmd(ubq, req); } +static inline bool ublk_check_fused_buf_dir(const struct request *req, + unsigned int flags) +{ + flags &= IO_URING_F_FUSED; + + if (req_op(req) == REQ_OP_READ && flags == IO_URING_F_FUSED_WRITE) + return true; + + if (req_op(req) == REQ_OP_WRITE && flags == IO_URING_F_FUSED_READ) + return true; + + return false; +} + +static void ublk_fused_cmd_done_cb(struct io_uring_cmd *cmd) +{ + io_uring_cmd_done(cmd, cmd->fused.data.slave_res, 0); +} + +static int ublk_handle_fused_cmd(struct io_uring_cmd *cmd, + struct ublk_queue *ubq, int tag, unsigned int issue_flags) +{ + struct 
ublk_device *ub = cmd->file->private_data; + struct ublk_rq_data *data; + struct request *req; + + if (!ub) + return -EPERM; + + if (!(issue_flags & IO_URING_F_FUSED)) + goto exit; + + if (ub->dev_info.state == UBLK_S_DEV_DEAD) + goto exit; + + if (!(ubq->flags & UBLK_F_SUPPORT_ZERO_COPY)) + goto exit; + + req = blk_mq_tag_to_rq(ub->tag_set.tags[ubq->q_id], tag); + if (!req || !blk_mq_request_started(req)) + goto exit; + + pr_devel("%s: qid %d tag %u request bytes %u, issue flags %x\n", + __func__, ubq->q_id, tag, blk_rq_bytes(req), + issue_flags); + + if (!ublk_check_fused_buf_dir(req, issue_flags)) + goto exit; + + if (!ublk_rq_has_data(req)) + goto exit; + + data = blk_mq_rq_to_pdu(req); + io_fused_cmd_provide_kbuf(cmd, !(issue_flags & IO_URING_F_UNLOCKED), + data->buf, ublk_fused_cmd_done_cb); + return -EIOCBQUEUED; +exit: + return -EINVAL; +} + static int ublk_ch_uring_cmd(struct io_uring_cmd *cmd, unsigned int issue_flags) { struct ublksrv_io_cmd *ub_cmd = (struct ublksrv_io_cmd *)cmd->cmd; @@ -1277,7 +1423,8 @@ static int ublk_ch_uring_cmd(struct io_uring_cmd *cmd, unsigned int issue_flags) if (!(issue_flags & IO_URING_F_SQE128)) goto out; - if (issue_flags & IO_URING_F_FUSED) + if ((issue_flags & IO_URING_F_FUSED) && + cmd_op != UBLK_IO_FUSED_SUBMIT_IO) return -EOPNOTSUPP; if (ub_cmd->q_id >= ub->dev_info.nr_hw_queues) @@ -1287,7 +1434,8 @@ static int ublk_ch_uring_cmd(struct io_uring_cmd *cmd, unsigned int issue_flags) if (!ubq || ub_cmd->q_id != ubq->q_id) goto out; - if (ubq->ubq_daemon && ubq->ubq_daemon != current) + if ((ubq->ubq_daemon && ubq->ubq_daemon != current) && + (cmd_op != UBLK_IO_FUSED_SUBMIT_IO)) goto out; if (tag >= ubq->q_depth) @@ -1310,6 +1458,9 @@ static int ublk_ch_uring_cmd(struct io_uring_cmd *cmd, unsigned int issue_flags) goto out; switch (cmd_op) { + case UBLK_IO_FUSED_SUBMIT_IO: + return ublk_handle_fused_cmd(cmd, ubq, tag, issue_flags); + case UBLK_IO_FETCH_REQ: /* UBLK_IO_FETCH_REQ is only allowed before queue is setup */ if 
(ublk_queue_ready(ubq)) { @@ -1533,11 +1684,14 @@ static void ublk_align_max_io_size(struct ublk_device *ub) static int ublk_add_tag_set(struct ublk_device *ub) { + int zc = !!(ub->dev_info.flags & UBLK_F_SUPPORT_ZERO_COPY); + struct ublk_rq_data *data; + ub->tag_set.ops = &ublk_mq_ops; ub->tag_set.nr_hw_queues = ub->dev_info.nr_hw_queues; ub->tag_set.queue_depth = ub->dev_info.queue_depth; ub->tag_set.numa_node = NUMA_NO_NODE; - ub->tag_set.cmd_size = sizeof(struct ublk_rq_data); + ub->tag_set.cmd_size = struct_size(data, buf, zc); ub->tag_set.flags = BLK_MQ_F_SHOULD_MERGE; ub->tag_set.driver_data = ub; return blk_mq_alloc_tag_set(&ub->tag_set); @@ -1756,9 +1910,6 @@ static int ublk_ctrl_add_dev(struct io_uring_cmd *cmd) if (!IS_BUILTIN(CONFIG_BLK_DEV_UBLK)) ub->dev_info.flags |= UBLK_F_URING_CMD_COMP_IN_TASK; - /* We are not ready to support zero copy */ - ub->dev_info.flags &= ~UBLK_F_SUPPORT_ZERO_COPY; - ub->dev_info.nr_hw_queues = min_t(unsigned int, ub->dev_info.nr_hw_queues, nr_cpu_ids); ublk_align_max_io_size(ub); diff --git a/include/uapi/linux/ublk_cmd.h b/include/uapi/linux/ublk_cmd.h index f6238ccc7800..027e60e49cc8 100644 --- a/include/uapi/linux/ublk_cmd.h +++ b/include/uapi/linux/ublk_cmd.h @@ -44,6 +44,7 @@ #define UBLK_IO_FETCH_REQ 0x20 #define UBLK_IO_COMMIT_AND_FETCH_REQ 0x21 #define UBLK_IO_NEED_GET_DATA 0x22 +#define UBLK_IO_FUSED_SUBMIT_IO 0x23 /* only ABORT means that no re-fetch */ #define UBLK_IO_RES_OK 0