From patchwork Mon Jun 12 19:03:42 2023
X-Patchwork-Submitter: Keith Busch
X-Patchwork-Id: 13277174
From: Keith Busch
Subject: [PATCHv3 1/2] block: add request polling helper
Date: Mon, 12 Jun 2023 12:03:42 -0700
Message-ID: <20230612190343.2087040-2-kbusch@meta.com>
In-Reply-To: <20230612190343.2087040-1-kbusch@meta.com>
References: <20230612190343.2087040-1-kbusch@meta.com>
X-Mailing-List: linux-block@vger.kernel.org

From: Keith Busch

Provide a direct request polling helper for drivers. The interface does
not require a bio, and can skip the overhead associated with polling
one. The biggest gain is skipping the relatively expensive xarray
lookup, which is unnecessary when you already have the request.
With this, the simple rq/qc conversion functions have only one caller
each, so open-code them and remove the helpers.

Signed-off-by: Keith Busch
Reviewed-by: Kanchan Joshi
Reviewed-by: Sagi Grimberg
Reviewed-by: Christoph Hellwig
---
 block/blk-mq.c         | 48 ++++++++++++++++++++++++++++--------------
 include/linux/blk-mq.h |  2 ++
 2 files changed, 34 insertions(+), 16 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 1749f5890606b..b82b5b3324108 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -49,17 +49,8 @@ static void blk_mq_request_bypass_insert(struct request *rq,
 		blk_insert_t flags);
 static void blk_mq_try_issue_list_directly(struct blk_mq_hw_ctx *hctx,
 		struct list_head *list);
-
-static inline struct blk_mq_hw_ctx *blk_qc_to_hctx(struct request_queue *q,
-		blk_qc_t qc)
-{
-	return xa_load(&q->hctx_table, qc);
-}
-
-static inline blk_qc_t blk_rq_to_qc(struct request *rq)
-{
-	return rq->mq_hctx->queue_num;
-}
+static int blk_hctx_poll(struct request_queue *q, struct blk_mq_hw_ctx *hctx,
+		struct io_comp_batch *iob, unsigned int flags);
 
 /*
  * Check if any of the ctx, dispatch list or elevator
@@ -1247,7 +1238,7 @@ void blk_mq_start_request(struct request *rq)
 		q->integrity.profile->prepare_fn(rq);
 #endif
 	if (rq->bio && rq->bio->bi_opf & REQ_POLLED)
-		WRITE_ONCE(rq->bio->bi_cookie, blk_rq_to_qc(rq));
+		WRITE_ONCE(rq->bio->bi_cookie, rq->mq_hctx->queue_num);
 }
 EXPORT_SYMBOL(blk_mq_start_request);
 
@@ -1349,7 +1340,7 @@ EXPORT_SYMBOL_GPL(blk_rq_is_poll);
 static void blk_rq_poll_completion(struct request *rq, struct completion *wait)
 {
 	do {
-		blk_mq_poll(rq->q, blk_rq_to_qc(rq), NULL, 0);
+		blk_hctx_poll(rq->q, rq->mq_hctx, NULL, 0);
 		cond_resched();
 	} while (!completion_done(wait));
 }
@@ -4735,10 +4726,9 @@ void blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set, int nr_hw_queues)
 }
 EXPORT_SYMBOL_GPL(blk_mq_update_nr_hw_queues);
 
-int blk_mq_poll(struct request_queue *q, blk_qc_t cookie, struct io_comp_batch *iob,
-		unsigned int flags)
+static int blk_hctx_poll(struct request_queue *q, struct blk_mq_hw_ctx *hctx,
+		struct io_comp_batch *iob, unsigned int flags)
 {
-	struct blk_mq_hw_ctx *hctx = blk_qc_to_hctx(q, cookie);
 	long state = get_current_state();
 	int ret;
 
@@ -4763,6 +4753,32 @@ int blk_mq_poll(struct request_queue *q, blk_qc_t cookie, struct io_comp_batch *
 	return 0;
 }
 
+int blk_mq_poll(struct request_queue *q, blk_qc_t cookie,
+		struct io_comp_batch *iob, unsigned int flags)
+{
+	struct blk_mq_hw_ctx *hctx = xa_load(&q->hctx_table, cookie);
+
+	return blk_hctx_poll(q, hctx, iob, flags);
+}
+
+int blk_rq_poll(struct request *rq, struct io_comp_batch *iob,
+		unsigned int poll_flags)
+{
+	struct request_queue *q = rq->q;
+	int ret;
+
+	if (!blk_rq_is_poll(rq))
+		return 0;
+	if (!percpu_ref_tryget(&q->q_usage_counter))
+		return 0;
+
+	ret = blk_hctx_poll(q, rq->mq_hctx, iob, poll_flags);
+	blk_queue_exit(q);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(blk_rq_poll);
+
 unsigned int blk_mq_rq_cpu(struct request *rq)
 {
 	return rq->mq_ctx->cpu;
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 59b52ec155b1b..9f4d285b52b93 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -715,6 +715,8 @@ int blk_mq_alloc_sq_tag_set(struct blk_mq_tag_set *set,
 void blk_mq_free_tag_set(struct blk_mq_tag_set *set);
 
 void blk_mq_free_request(struct request *rq);
+int blk_rq_poll(struct request *rq, struct io_comp_batch *iob,
+		unsigned int poll_flags);
 
 bool blk_mq_queue_inflight(struct request_queue *q);
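
For reference, a minimal sketch (not part of the patch) of how a driver
could drive a polled passthrough request to completion with the new
helper, mirroring blk_rq_poll_completion() above; the example_* name is
illustrative:

	#include <linux/blk-mq.h>
	#include <linux/completion.h>
	#include <linux/sched.h>

	/* Spin on the request's own hctx until the completion fires. */
	static void example_poll_for_completion(struct request *rq,
						struct completion *done)
	{
		/* blk_rq_poll() returns 0 unless rq sits on a polled hctx. */
		if (!blk_rq_is_poll(rq))
			return;

		while (!completion_done(done)) {
			blk_rq_poll(rq, NULL, 0);	/* NULL iob: no batching */
			cond_resched();
		}
	}

Because the request carries its hctx directly, no blk_qc_t cookie or
xarray lookup is involved on this path.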
charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Keith Busch X-Patchwork-Id: 13277173 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 17A85C88CB2 for ; Mon, 12 Jun 2023 19:04:06 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S237698AbjFLTEE (ORCPT ); Mon, 12 Jun 2023 15:04:04 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:41154 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S237705AbjFLTEC (ORCPT ); Mon, 12 Jun 2023 15:04:02 -0400 Received: from mx0a-00082601.pphosted.com (mx0b-00082601.pphosted.com [67.231.153.30]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 728AA10CB for ; Mon, 12 Jun 2023 12:03:59 -0700 (PDT) Received: from pps.filterd (m0089730.ppops.net [127.0.0.1]) by m0089730.ppops.net (8.17.1.19/8.17.1.19) with ESMTP id 35CHpc3i019469 for ; Mon, 12 Jun 2023 12:03:58 -0700 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=meta.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : content-type; s=s2048-2021-q4; bh=yk/4kOW+1au1JCuvMoFmqdxb/rxmmIIdIi1gshRX62w=; b=Zv6I49YLTpF8cEnz48V8YybvoJaJMuKI0Pb0QIZjBfoq+CkqlKOXTg4kgxsC5QpjY0QI HGyvXid2A2bAemFuInikX3y7YHBv4rWfdQLHe5d/XsD3z68r1LyTL3qPFj2KRt9zo5+O qATSimYScCJgXrxRa4vjfFmzxkObGLm26i8IEl0x4VJAOV/nRRqAvGCpGCDMPKJcgcPM k8CzEqWAWKIWZ5xKQV0u0qkw3wAe5S0f9FV2HRhrbR9VXVkKMTY0a3CIKyEBnyLhPEpT QH6EGEPtRT0EE6TrHYVjtMK52Zk8fh97ESW3y4UeX9uXG1hH4hzJGLkt+Uk2nQGbttij 6Q== Received: from maileast.thefacebook.com ([163.114.130.16]) by m0089730.ppops.net (PPS) with ESMTPS id 3r5xa5meng-3 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT) for ; Mon, 12 Jun 2023 12:03:58 -0700 Received: from twshared6352.02.ash9.facebook.com (2620:10d:c0a8:1b::2d) by mail.thefacebook.com (2620:10d:c0a8:82::e) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2507.23; Mon, 12 Jun 2023 12:03:56 -0700 Received: by devbig007.nao1.facebook.com (Postfix, from userid 544533) id B236119FA77A7; Mon, 12 Jun 2023 12:03:46 -0700 (PDT) From: Keith Busch To: , , , , CC: , , Keith Busch Subject: [PATCHv3 2/2] nvme: improved uring polling Date: Mon, 12 Jun 2023 12:03:43 -0700 Message-ID: <20230612190343.2087040-3-kbusch@meta.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230612190343.2087040-1-kbusch@meta.com> References: <20230612190343.2087040-1-kbusch@meta.com> MIME-Version: 1.0 X-FB-Internal: Safe X-Proofpoint-GUID: LBXYQLxM7pnQklOmE4QO_tMBOzPQdwcx X-Proofpoint-ORIG-GUID: LBXYQLxM7pnQklOmE4QO_tMBOzPQdwcx X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.254,Aquarius:18.0.957,Hydra:6.0.573,FMLib:17.11.176.26 definitions=2023-06-12_14,2023-06-12_02,2023-05-22_02 Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org From: Keith Busch Drivers can poll requests directly, so use that. We just need to ensure the driver's request was allocated from a polled hctx, so a special driver flag is added to struct io_uring_cmd. The allows unshared and multipath namespaces to use the same polling callback, and multipath is guaranteed to get the same queue as the command was submitted on. Previously multipath polling might check a different path and poll the wrong info. 
The other bonus is that we don't need a bio payload in order to poll,
so commands like 'flush' and 'write zeroes' can be submitted on the
same high-priority queue as reads and writes. Finally, request-based
polling skips the unnecessary bio overhead.

Signed-off-by: Keith Busch
Reviewed-by: Christoph Hellwig
Reviewed-by: Sagi Grimberg
---
 drivers/nvme/host/ioctl.c     | 70 ++++++++++-------------------------
 drivers/nvme/host/multipath.c |  2 +-
 drivers/nvme/host/nvme.h      |  2 -
 include/uapi/linux/io_uring.h |  2 +
 4 files changed, 22 insertions(+), 54 deletions(-)

diff --git a/drivers/nvme/host/ioctl.c b/drivers/nvme/host/ioctl.c
index 52ed1094ccbb2..dd7603efed799 100644
--- a/drivers/nvme/host/ioctl.c
+++ b/drivers/nvme/host/ioctl.c
@@ -505,7 +505,6 @@ static enum rq_end_io_ret nvme_uring_cmd_end_io(struct request *req,
 {
 	struct io_uring_cmd *ioucmd = req->end_io_data;
 	struct nvme_uring_cmd_pdu *pdu = nvme_uring_cmd_pdu(ioucmd);
-	void *cookie = READ_ONCE(ioucmd->cookie);
 
 	req->bio = pdu->bio;
 	if (nvme_req(req)->flags & NVME_REQ_CANCELLED)
@@ -518,10 +517,12 @@ static enum rq_end_io_ret nvme_uring_cmd_end_io(struct request *req,
 	 * For iopoll, complete it directly.
 	 * Otherwise, move the completion to task work.
 	 */
-	if (cookie != NULL && blk_rq_is_poll(req))
+	if (blk_rq_is_poll(req)) {
+		WRITE_ONCE(ioucmd->cookie, NULL);
 		nvme_uring_task_cb(ioucmd, IO_URING_F_UNLOCKED);
-	else
+	} else {
 		io_uring_cmd_do_in_task_lazy(ioucmd, nvme_uring_task_cb);
+	}
 
 	return RQ_END_IO_FREE;
 }
@@ -531,7 +532,6 @@ static enum rq_end_io_ret nvme_uring_cmd_end_io_meta(struct request *req,
 {
 	struct io_uring_cmd *ioucmd = req->end_io_data;
 	struct nvme_uring_cmd_pdu *pdu = nvme_uring_cmd_pdu(ioucmd);
-	void *cookie = READ_ONCE(ioucmd->cookie);
 
 	req->bio = pdu->bio;
 	pdu->req = req;
@@ -540,10 +540,12 @@ static enum rq_end_io_ret nvme_uring_cmd_end_io_meta(struct request *req,
 	 * For iopoll, complete it directly.
 	 * Otherwise, move the completion to task work.
 	 */
-	if (cookie != NULL && blk_rq_is_poll(req))
+	if (blk_rq_is_poll(req)) {
+		WRITE_ONCE(ioucmd->cookie, NULL);
 		nvme_uring_task_meta_cb(ioucmd, IO_URING_F_UNLOCKED);
-	else
+	} else {
 		io_uring_cmd_do_in_task_lazy(ioucmd, nvme_uring_task_meta_cb);
+	}
 
 	return RQ_END_IO_NONE;
 }
@@ -599,7 +601,6 @@ static int nvme_uring_cmd_io(struct nvme_ctrl *ctrl, struct nvme_ns *ns,
 	if (issue_flags & IO_URING_F_IOPOLL)
 		rq_flags |= REQ_POLLED;
 
-retry:
 	req = nvme_alloc_user_request(q, &c, rq_flags, blk_flags);
 	if (IS_ERR(req))
 		return PTR_ERR(req);
@@ -613,17 +614,11 @@ static int nvme_uring_cmd_io(struct nvme_ctrl *ctrl, struct nvme_ns *ns,
 		return ret;
 	}
 
-	if (issue_flags & IO_URING_F_IOPOLL && rq_flags & REQ_POLLED) {
-		if (unlikely(!req->bio)) {
-			/* we can't poll this, so alloc regular req instead */
-			blk_mq_free_request(req);
-			rq_flags &= ~REQ_POLLED;
-			goto retry;
-		} else {
-			WRITE_ONCE(ioucmd->cookie, req->bio);
-			req->bio->bi_opf |= REQ_POLLED;
-		}
+	if (blk_rq_is_poll(req)) {
+		ioucmd->flags |= IORING_URING_CMD_POLLED;
+		WRITE_ONCE(ioucmd->cookie, req);
 	}
+
 	/* to free bio on completion, as req->bio will be null at that time */
 	pdu->bio = req->bio;
 	pdu->meta_len = d.metadata_len;
@@ -782,18 +777,16 @@ int nvme_ns_chr_uring_cmd_iopoll(struct io_uring_cmd *ioucmd,
 				 struct io_comp_batch *iob,
 				 unsigned int poll_flags)
 {
-	struct bio *bio;
+	struct request *req;
 	int ret = 0;
-	struct nvme_ns *ns;
-	struct request_queue *q;
+
+	if (!(ioucmd->flags & IORING_URING_CMD_POLLED))
+		return 0;
 
 	rcu_read_lock();
-	bio = READ_ONCE(ioucmd->cookie);
-	ns = container_of(file_inode(ioucmd->file)->i_cdev,
-			struct nvme_ns, cdev);
-	q = ns->queue;
-	if (test_bit(QUEUE_FLAG_POLL, &q->queue_flags) && bio && bio->bi_bdev)
-		ret = bio_poll(bio, iob, poll_flags);
+	req = READ_ONCE(ioucmd->cookie);
+	if (req && blk_rq_is_poll(req))
+		ret = blk_rq_poll(req, iob, poll_flags);
 	rcu_read_unlock();
 	return ret;
 }
@@ -885,31 +878,6 @@ int nvme_ns_head_chr_uring_cmd(struct io_uring_cmd *ioucmd,
 	srcu_read_unlock(&head->srcu, srcu_idx);
 	return ret;
 }
-
-int nvme_ns_head_chr_uring_cmd_iopoll(struct io_uring_cmd *ioucmd,
-		struct io_comp_batch *iob,
-		unsigned int poll_flags)
-{
-	struct cdev *cdev = file_inode(ioucmd->file)->i_cdev;
-	struct nvme_ns_head *head = container_of(cdev, struct nvme_ns_head, cdev);
-	int srcu_idx = srcu_read_lock(&head->srcu);
-	struct nvme_ns *ns = nvme_find_path(head);
-	struct bio *bio;
-	int ret = 0;
-	struct request_queue *q;
-
-	if (ns) {
-		rcu_read_lock();
-		bio = READ_ONCE(ioucmd->cookie);
-		q = ns->queue;
-		if (test_bit(QUEUE_FLAG_POLL, &q->queue_flags) && bio
-				&& bio->bi_bdev)
-			ret = bio_poll(bio, iob, poll_flags);
-		rcu_read_unlock();
-	}
-	srcu_read_unlock(&head->srcu, srcu_idx);
-	return ret;
-}
 #endif /* CONFIG_NVME_MULTIPATH */
 
 int nvme_dev_uring_cmd(struct io_uring_cmd *ioucmd, unsigned int issue_flags)
diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
index 2bc159a318ff0..61130b4976b04 100644
--- a/drivers/nvme/host/multipath.c
+++ b/drivers/nvme/host/multipath.c
@@ -470,7 +470,7 @@ static const struct file_operations nvme_ns_head_chr_fops = {
 	.unlocked_ioctl	= nvme_ns_head_chr_ioctl,
 	.compat_ioctl	= compat_ptr_ioctl,
 	.uring_cmd	= nvme_ns_head_chr_uring_cmd,
-	.uring_cmd_iopoll = nvme_ns_head_chr_uring_cmd_iopoll,
+	.uring_cmd_iopoll = nvme_ns_chr_uring_cmd_iopoll,
 };
 
 static int nvme_add_ns_head_cdev(struct nvme_ns_head *head)
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index a2d4f59e0535a..84aecf53870ae 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -852,8 +852,6 @@ long nvme_dev_ioctl(struct file *file, unsigned int cmd,
 		unsigned long arg);
 int nvme_ns_chr_uring_cmd_iopoll(struct io_uring_cmd *ioucmd,
 		struct io_comp_batch *iob, unsigned int poll_flags);
-int nvme_ns_head_chr_uring_cmd_iopoll(struct io_uring_cmd *ioucmd,
-		struct io_comp_batch *iob, unsigned int poll_flags);
 int nvme_ns_chr_uring_cmd(struct io_uring_cmd *ioucmd,
 		unsigned int issue_flags);
 int nvme_ns_head_chr_uring_cmd(struct io_uring_cmd *ioucmd,
diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index f222d263bc555..08720c7bd92f8 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -244,8 +244,10 @@ enum io_uring_op {
 * sqe->uring_cmd_flags
 * IORING_URING_CMD_FIXED	use registered buffer; pass this flag
 *				along with setting sqe->buf_index.
+ * IORING_URING_CMD_POLLED	driver use only
 */
 #define IORING_URING_CMD_FIXED	(1U << 0)
+#define IORING_URING_CMD_POLLED	(1U << 31)
 
 /*
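
For reference, a minimal sketch (not part of the patch) of a driver's
->uring_cmd_iopoll implementation built on the new flag and helper,
following the nvme_ns_chr_uring_cmd_iopoll() pattern in the diff above;
the example_* name is illustrative:

	#include <linux/blk-mq.h>
	#include <linux/io_uring.h>
	#include <linux/rcupdate.h>

	static int example_uring_cmd_iopoll(struct io_uring_cmd *ioucmd,
					    struct io_comp_batch *iob,
					    unsigned int poll_flags)
	{
		struct request *req;
		int ret = 0;

		/* Submission only sets this for requests on polled hctxs. */
		if (!(ioucmd->flags & IORING_URING_CMD_POLLED))
			return 0;

		/*
		 * The cookie holds the request until the end_io handler
		 * clears it; RCU protects this read against a racing
		 * completion.
		 */
		rcu_read_lock();
		req = READ_ONCE(ioucmd->cookie);
		if (req && blk_rq_is_poll(req))
			ret = blk_rq_poll(req, iob, poll_flags);
		rcu_read_unlock();

		return ret;
	}

Since the cookie now stores the request rather than a bio, the same
callback works for plain and multipath namespaces, and payload-free
commands remain pollable.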