From patchwork Tue May 30 17:23:42 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Keith Busch
X-Patchwork-Id: 13260798
From: Keith Busch
CC: Keith Busch
Subject: [PATCH 1/2] block: add request polling helper
Date: Tue, 30 May 2023 10:23:42 -0700
Message-ID: <20230530172343.3250958-1-kbusch@meta.com>
X-Mailer: git-send-email 2.34.1
X-Mailing-List: io-uring@vger.kernel.org

From: Keith Busch

This will be used by drivers that allocate polling requests. The
interface does not require a bio, so it can skip the overhead associated
with polling those.

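For readers following the control flow of the new helper, here is a rough
userspace model of it. It is only a sketch: struct model_hctx, struct
model_queue, struct model_rq and every model_* function are hypothetical
stand-ins, not kernel types. In the patch itself the corresponding steps are
blk_rq_is_poll(), percpu_ref_tryget() on q->q_usage_counter, the per-hctx
poll, and blk_queue_exit().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a hardware queue context. */
struct model_hctx {
	int pending;	/* completions waiting to be reaped */
};

/* Hypothetical stand-in for a request queue with a usage reference. */
struct model_queue {
	bool dying;	/* when set, new references are refused */
	int usage;	/* models q->q_usage_counter */
};

/* Hypothetical stand-in for a request. */
struct model_rq {
	struct model_queue *q;
	struct model_hctx *hctx;	/* queue the request was allocated on */
	bool polled;			/* models blk_rq_is_poll() */
};

/* Models percpu_ref_tryget(): fails once the queue is going away. */
static bool model_queue_tryget(struct model_queue *q)
{
	if (q->dying)
		return false;
	q->usage++;
	return true;
}

/* Models blk_queue_exit(): drops the reference taken above. */
static void model_queue_exit(struct model_queue *q)
{
	q->usage--;
}

/* Reaps everything pending on one hctx and returns the number reaped. */
static int model_hctx_poll(struct model_hctx *hctx)
{
	int found = hctx->pending;

	hctx->pending = 0;
	return found;
}

/*
 * Same shape as the request-based helper: no bio and no cookie lookup,
 * because the request already knows which hctx it belongs to.
 */
int model_rq_poll(struct model_rq *rq)
{
	struct model_queue *q = rq->q;
	int ret;

	if (!rq->polled)
		return 0;
	if (!model_queue_tryget(q))
		return 0;

	ret = model_hctx_poll(rq->hctx);
	model_queue_exit(q);

	return ret;
}
```

In this model a non-polled request, or a queue that refuses a new reference,
makes the poll a no-op that returns 0, which is the behaviour the two early
returns in the helper are meant to give.
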
Signed-off-by: Keith Busch
Reviewed-by: Kanchan Joshi
Reviewed-by: Sagi Grimberg
---
 block/blk-mq.c         | 29 ++++++++++++++++++++++++++---
 include/linux/blk-mq.h |  2 ++
 2 files changed, 28 insertions(+), 3 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index f6dad0886a2fa..3c12c476e3a5c 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -4740,10 +4740,9 @@ void blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set, int nr_hw_queues)
 }
 EXPORT_SYMBOL_GPL(blk_mq_update_nr_hw_queues);
 
-int blk_mq_poll(struct request_queue *q, blk_qc_t cookie, struct io_comp_batch *iob,
-		unsigned int flags)
+static int blk_hctx_poll(struct request_queue *q, struct blk_mq_hw_ctx *hctx,
+			 struct io_comp_batch *iob, unsigned int flags)
 {
-	struct blk_mq_hw_ctx *hctx = blk_qc_to_hctx(q, cookie);
 	long state = get_current_state();
 	int ret;
 
@@ -4768,6 +4767,30 @@ int blk_mq_poll(struct request_queue *q, blk_qc_t cookie, struct io_comp_batch *
 	return 0;
 }
 
+int blk_mq_poll(struct request_queue *q, blk_qc_t cookie, struct io_comp_batch *iob,
+		unsigned int flags)
+{
+	return blk_hctx_poll(q, blk_qc_to_hctx(q, cookie), iob, flags);
+}
+
+int blk_rq_poll(struct request *rq, struct io_comp_batch *iob,
+		unsigned int poll_flags)
+{
+	struct request_queue *q = rq->q;
+	int ret;
+
+	if (!blk_rq_is_poll(rq))
+		return 0;
+	if (!percpu_ref_tryget(&q->q_usage_counter))
+		return 0;
+
+	ret = blk_hctx_poll(q, rq->mq_hctx, iob, poll_flags);
+	blk_queue_exit(q);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(blk_rq_poll);
+
 unsigned int blk_mq_rq_cpu(struct request *rq)
 {
 	return rq->mq_ctx->cpu;
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 06caacd77ed66..579818fa1f91d 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -722,6 +722,8 @@ int blk_mq_alloc_sq_tag_set(struct blk_mq_tag_set *set,
 void blk_mq_free_tag_set(struct blk_mq_tag_set *set);
 
 void blk_mq_free_request(struct request *rq);
+int blk_rq_poll(struct request *rq, struct io_comp_batch *iob,
+		unsigned int poll_flags);
 
 bool blk_mq_queue_inflight(struct request_queue *q);
 

From patchwork Tue May 30 17:23:43 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Keith Busch
X-Patchwork-Id: 13260824
From: Keith Busch
CC: Keith Busch
Subject: [PATCH 2/2] nvme: improved uring polling
Date: Tue, 30 May 2023 10:23:43 -0700
Message-ID: <20230530172343.3250958-2-kbusch@meta.com>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20230530172343.3250958-1-kbusch@meta.com>
References: <20230530172343.3250958-1-kbusch@meta.com>
X-Mailing-List: io-uring@vger.kernel.org

From: Keith Busch

Drivers can poll requests directly, so use that. We just need to ensure
the driver's request was allocated from a polled hctx, so a special
driver flag is added to struct io_uring_cmd.

The first advantage is that unshared and multipath namespaces can use
the same polling callback, and multipath is guaranteed to get the same
queue as the command was submitted on. Previously, multipath polling
might check a different path and poll the wrong info.

The other advantage is that we don't need a bio payload in order to
poll, allowing commands like 'flush' and 'write zeroes' to be submitted
on the same high priority queue as read and write commands.

Using the request-based polling also skips the unnecessary bio overhead
and xarray hctx lookup when we have a request.

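The flag-plus-cookie handshake described above can be modelled in userspace
roughly as follows. This is only a sketch under stated assumptions:
MODEL_CMD_POLLED, struct model_req, struct model_cmd and the model_*
functions are hypothetical names, and relaxed C11 atomics stand in for
READ_ONCE()/WRITE_ONCE(). The lifetime protection the driver gets from
rcu_read_lock() is not modelled.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define MODEL_CMD_POLLED (1U << 31)	/* mirrors the driver-only flag bit */

/* Hypothetical stand-in for a block request. */
struct model_req {
	int polled;	/* nonzero if allocated from a polled hctx */
	int pending;	/* completions that a poll would reap */
};

/* Hypothetical stand-in for struct io_uring_cmd. */
struct model_cmd {
	unsigned int flags;
	_Atomic(struct model_req *) cookie;	/* published request, or NULL */
};

/* Submission: publish the request only if it really is a polled one. */
void model_submit(struct model_cmd *cmd, struct model_req *req)
{
	if (req->polled) {
		cmd->flags |= MODEL_CMD_POLLED;
		atomic_store_explicit(&cmd->cookie, req, memory_order_relaxed);
	}
}

/* Completion: clear the cookie so a later poll cannot touch the request. */
void model_complete(struct model_cmd *cmd)
{
	atomic_store_explicit(&cmd->cookie, NULL, memory_order_relaxed);
}

/* Poll callback: one path for unshared and multipath namespaces alike. */
int model_iopoll(struct model_cmd *cmd)
{
	struct model_req *req;
	int ret = 0;

	/* Commands that never got a polled request are skipped outright. */
	if (!(cmd->flags & MODEL_CMD_POLLED))
		return 0;

	req = atomic_load_explicit(&cmd->cookie, memory_order_relaxed);
	if (req && req->polled) {
		ret = req->pending;
		req->pending = 0;
	}
	return ret;
}
```

Because the poll callback works from the request stored in the cookie, and
not from a path looked up at poll time, the model polls whatever queue the
command was actually submitted on, which is the multipath property the
commit message describes.
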
Signed-off-by: Keith Busch
Reviewed-by: Kanchan Joshi
Reviewed-by: Sagi Grimberg
---
 drivers/nvme/host/ioctl.c     | 68 +++++++++--------------------------
 drivers/nvme/host/multipath.c |  2 +-
 drivers/nvme/host/nvme.h      |  2 --
 include/uapi/linux/io_uring.h |  2 ++
 4 files changed, 20 insertions(+), 54 deletions(-)

diff --git a/drivers/nvme/host/ioctl.c b/drivers/nvme/host/ioctl.c
index d24ea2e051564..3fa9a50433f18 100644
--- a/drivers/nvme/host/ioctl.c
+++ b/drivers/nvme/host/ioctl.c
@@ -505,7 +505,6 @@ static enum rq_end_io_ret nvme_uring_cmd_end_io(struct request *req,
 {
 	struct io_uring_cmd *ioucmd = req->end_io_data;
 	struct nvme_uring_cmd_pdu *pdu = nvme_uring_cmd_pdu(ioucmd);
-	void *cookie = READ_ONCE(ioucmd->cookie);
 
 	req->bio = pdu->bio;
 	if (nvme_req(req)->flags & NVME_REQ_CANCELLED)
@@ -518,9 +517,10 @@ static enum rq_end_io_ret nvme_uring_cmd_end_io(struct request *req,
 	 * For iopoll, complete it directly.
 	 * Otherwise, move the completion to task work.
 	 */
-	if (cookie != NULL && blk_rq_is_poll(req))
+	if (blk_rq_is_poll(req)) {
+		WRITE_ONCE(ioucmd->cookie, NULL);
 		nvme_uring_task_cb(ioucmd, IO_URING_F_UNLOCKED);
-	else
+	} else
 		io_uring_cmd_complete_in_task(ioucmd, nvme_uring_task_cb);
 
 	return RQ_END_IO_FREE;
@@ -531,7 +531,6 @@ static enum rq_end_io_ret nvme_uring_cmd_end_io_meta(struct request *req,
 {
 	struct io_uring_cmd *ioucmd = req->end_io_data;
 	struct nvme_uring_cmd_pdu *pdu = nvme_uring_cmd_pdu(ioucmd);
-	void *cookie = READ_ONCE(ioucmd->cookie);
 
 	req->bio = pdu->bio;
 	pdu->req = req;
@@ -540,9 +539,10 @@ static enum rq_end_io_ret nvme_uring_cmd_end_io_meta(struct request *req,
 	 * For iopoll, complete it directly.
 	 * Otherwise, move the completion to task work.
 	 */
-	if (cookie != NULL && blk_rq_is_poll(req))
+	if (blk_rq_is_poll(req)) {
+		WRITE_ONCE(ioucmd->cookie, NULL);
 		nvme_uring_task_meta_cb(ioucmd, IO_URING_F_UNLOCKED);
-	else
+	} else
 		io_uring_cmd_complete_in_task(ioucmd, nvme_uring_task_meta_cb);
 
 	return RQ_END_IO_NONE;
@@ -599,7 +599,6 @@ static int nvme_uring_cmd_io(struct nvme_ctrl *ctrl, struct nvme_ns *ns,
 	if (issue_flags & IO_URING_F_IOPOLL)
 		rq_flags |= REQ_POLLED;
 
-retry:
 	req = nvme_alloc_user_request(q, &c, rq_flags, blk_flags);
 	if (IS_ERR(req))
 		return PTR_ERR(req);
@@ -613,17 +612,11 @@ static int nvme_uring_cmd_io(struct nvme_ctrl *ctrl, struct nvme_ns *ns,
 			return ret;
 	}
 
-	if (issue_flags & IO_URING_F_IOPOLL && rq_flags & REQ_POLLED) {
-		if (unlikely(!req->bio)) {
-			/* we can't poll this, so alloc regular req instead */
-			blk_mq_free_request(req);
-			rq_flags &= ~REQ_POLLED;
-			goto retry;
-		} else {
-			WRITE_ONCE(ioucmd->cookie, req->bio);
-			req->bio->bi_opf |= REQ_POLLED;
-		}
+	if (blk_rq_is_poll(req)) {
+		ioucmd->flags |= IORING_URING_CMD_POLLED;
+		WRITE_ONCE(ioucmd->cookie, req);
 	}
+
 	/* to free bio on completion, as req->bio will be null at that time */
 	pdu->bio = req->bio;
 	pdu->meta_len = d.metadata_len;
@@ -782,18 +775,16 @@ int nvme_ns_chr_uring_cmd_iopoll(struct io_uring_cmd *ioucmd,
 				 struct io_comp_batch *iob,
 				 unsigned int poll_flags)
 {
-	struct bio *bio;
+	struct request *req;
 	int ret = 0;
-	struct nvme_ns *ns;
-	struct request_queue *q;
+
+	if (!(ioucmd->flags & IORING_URING_CMD_POLLED))
+		return 0;
 
 	rcu_read_lock();
-	bio = READ_ONCE(ioucmd->cookie);
-	ns = container_of(file_inode(ioucmd->file)->i_cdev,
-			struct nvme_ns, cdev);
-	q = ns->queue;
-	if (test_bit(QUEUE_FLAG_POLL, &q->queue_flags) && bio && bio->bi_bdev)
-		ret = bio_poll(bio, iob, poll_flags);
+	req = READ_ONCE(ioucmd->cookie);
+	if (req && blk_rq_is_poll(req))
+		ret = blk_rq_poll(req, iob, poll_flags);
 	rcu_read_unlock();
 	return ret;
 }
@@ -885,31 +876,6 @@ int nvme_ns_head_chr_uring_cmd(struct io_uring_cmd *ioucmd,
 	srcu_read_unlock(&head->srcu, srcu_idx);
 	return ret;
 }
-
-int nvme_ns_head_chr_uring_cmd_iopoll(struct io_uring_cmd *ioucmd,
-					struct io_comp_batch *iob,
-					unsigned int poll_flags)
-{
-	struct cdev *cdev = file_inode(ioucmd->file)->i_cdev;
-	struct nvme_ns_head *head = container_of(cdev, struct nvme_ns_head, cdev);
-	int srcu_idx = srcu_read_lock(&head->srcu);
-	struct nvme_ns *ns = nvme_find_path(head);
-	struct bio *bio;
-	int ret = 0;
-	struct request_queue *q;
-
-	if (ns) {
-		rcu_read_lock();
-		bio = READ_ONCE(ioucmd->cookie);
-		q = ns->queue;
-		if (test_bit(QUEUE_FLAG_POLL, &q->queue_flags) && bio
-				&& bio->bi_bdev)
-			ret = bio_poll(bio, iob, poll_flags);
-		rcu_read_unlock();
-	}
-	srcu_read_unlock(&head->srcu, srcu_idx);
-	return ret;
-}
 #endif /* CONFIG_NVME_MULTIPATH */
 
 int nvme_dev_uring_cmd(struct io_uring_cmd *ioucmd, unsigned int issue_flags)
diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
index 9171452e2f6d4..f17be1c72f4de 100644
--- a/drivers/nvme/host/multipath.c
+++ b/drivers/nvme/host/multipath.c
@@ -470,7 +470,7 @@ static const struct file_operations nvme_ns_head_chr_fops = {
 	.unlocked_ioctl	= nvme_ns_head_chr_ioctl,
 	.compat_ioctl	= compat_ptr_ioctl,
 	.uring_cmd	= nvme_ns_head_chr_uring_cmd,
-	.uring_cmd_iopoll = nvme_ns_head_chr_uring_cmd_iopoll,
+	.uring_cmd_iopoll = nvme_ns_chr_uring_cmd_iopoll,
 };
 
 static int nvme_add_ns_head_cdev(struct nvme_ns_head *head)
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index bf46f122e9e1e..ca4ea89333660 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -847,8 +847,6 @@ long nvme_dev_ioctl(struct file *file, unsigned int cmd,
 		unsigned long arg);
 int nvme_ns_chr_uring_cmd_iopoll(struct io_uring_cmd *ioucmd,
 		struct io_comp_batch *iob, unsigned int poll_flags);
-int nvme_ns_head_chr_uring_cmd_iopoll(struct io_uring_cmd *ioucmd,
-		struct io_comp_batch *iob, unsigned int poll_flags);
 int nvme_ns_chr_uring_cmd(struct io_uring_cmd *ioucmd,
 		unsigned int issue_flags);
 int nvme_ns_head_chr_uring_cmd(struct io_uring_cmd *ioucmd,
diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index 0716cb17e4360..f8d6ffe78073e 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -232,8 +232,10 @@ enum io_uring_op {
  * sqe->uring_cmd_flags
  * IORING_URING_CMD_FIXED	use registered buffer; pass this flag
  *				along with setting sqe->buf_index.
+ * IORING_URING_CMD_POLLED	driver use only
  */
 #define IORING_URING_CMD_FIXED	(1U << 0)
+#define IORING_URING_CMD_POLLED	(1U << 31)
 
 
 /*