From patchwork Mon Feb 8 08:52:33 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jingbo Xu X-Patchwork-Id: 12074391 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,UNPARSEABLE_RELAY,URIBL_BLOCKED, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 974A4C433DB for ; Mon, 8 Feb 2021 08:56:03 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 4721164E3F for ; Mon, 8 Feb 2021 08:56:03 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231286AbhBHIze (ORCPT ); Mon, 8 Feb 2021 03:55:34 -0500 Received: from out30-43.freemail.mail.aliyun.com ([115.124.30.43]:52096 "EHLO out30-43.freemail.mail.aliyun.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230245AbhBHIx2 (ORCPT ); Mon, 8 Feb 2021 03:53:28 -0500 X-Alimail-AntiSpam: AC=PASS;BC=-1|-1;BR=01201311R411e4;CH=green;DM=||false|;DS=||;FP=0|-1|-1|-1|0|-1|-1|-1;HT=e01e04423;MF=jefflexu@linux.alibaba.com;NM=1;PH=DS;RN=8;SR=0;TI=SMTPD_---0UO8W.Pu_1612774364; Received: from localhost(mailfrom:jefflexu@linux.alibaba.com fp:SMTPD_---0UO8W.Pu_1612774364) by smtp.aliyun-inc.com(127.0.0.1); Mon, 08 Feb 2021 16:52:45 +0800 From: Jeffle Xu To: snitzer@redhat.com, axboe@kernel.dk Cc: joseph.qi@linux.alibaba.com, caspar@linux.alibaba.com, hch@lst.de, linux-block@vger.kernel.org, dm-devel@redhat.com, io-uring@vger.kernel.org Subject: [PATCH v3 01/11] block: move definition of blk_qc_t to types.h Date: Mon, 8 Feb 2021 16:52:33 +0800 Message-Id: <20210208085243.82367-2-jefflexu@linux.alibaba.com> X-Mailer: git-send-email 2.27.0 In-Reply-To: <20210208085243.82367-1-jefflexu@linux.alibaba.com> References: <20210208085243.82367-1-jefflexu@linux.alibaba.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org So that kiocb.ki_cookie can be defined as blk_qc_t, which will enforce the encapsulation. Signed-off-by: Jeffle Xu Reviewed-by: Christoph Hellwig Reviewed-by: Mike Snitzer --- include/linux/blk_types.h | 2 +- include/linux/fs.h | 2 +- include/linux/types.h | 3 +++ 3 files changed, 5 insertions(+), 2 deletions(-) diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h index 866f74261b3b..2e05244fc16d 100644 --- a/include/linux/blk_types.h +++ b/include/linux/blk_types.h @@ -532,7 +532,7 @@ static inline int op_stat_group(unsigned int op) return op_is_write(op); } -typedef unsigned int blk_qc_t; +/* Macros for blk_qc_t */ #define BLK_QC_T_NONE -1U #define BLK_QC_T_SHIFT 16 #define BLK_QC_T_INTERNAL (1U << 31) diff --git a/include/linux/fs.h b/include/linux/fs.h index fd47deea7c17..04b687150736 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -330,7 +330,7 @@ struct kiocb { u16 ki_hint; u16 ki_ioprio; /* See linux/ioprio.h */ union { - unsigned int ki_cookie; /* for ->iopoll */ + blk_qc_t ki_cookie; /* for ->iopoll */ struct wait_page_queue *ki_waitq; /* for async buffered IO */ }; diff --git a/include/linux/types.h b/include/linux/types.h index a147977602b5..da5ca7e1bea9 100644 --- a/include/linux/types.h +++ b/include/linux/types.h @@ -125,6 +125,9 @@ typedef s64 int64_t; typedef u64 sector_t; typedef u64 blkcnt_t; +/* cookie used for IO polling */ +typedef unsigned int blk_qc_t; + /* * The type of an index into the pagecache. */ From patchwork Mon Feb 8 08:52:34 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jingbo Xu X-Patchwork-Id: 12074399 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,UNPARSEABLE_RELAY,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 4345CC433DB for ; Mon, 8 Feb 2021 08:56:53 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id EA63764E7A for ; Mon, 8 Feb 2021 08:56:52 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230148AbhBHI4R (ORCPT ); Mon, 8 Feb 2021 03:56:17 -0500 Received: from out30-54.freemail.mail.aliyun.com ([115.124.30.54]:46123 "EHLO out30-54.freemail.mail.aliyun.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230101AbhBHIxc (ORCPT ); Mon, 8 Feb 2021 03:53:32 -0500 X-Alimail-AntiSpam: AC=PASS;BC=-1|-1;BR=01201311R951e4;CH=green;DM=||false|;DS=||;FP=0|-1|-1|-1|0|-1|-1|-1;HT=e01e04426;MF=jefflexu@linux.alibaba.com;NM=1;PH=DS;RN=8;SR=0;TI=SMTPD_---0UO8W.Q._1612774365; Received: from localhost(mailfrom:jefflexu@linux.alibaba.com fp:SMTPD_---0UO8W.Q._1612774365) by smtp.aliyun-inc.com(127.0.0.1); Mon, 08 Feb 2021 16:52:46 +0800 From: Jeffle Xu To: snitzer@redhat.com, axboe@kernel.dk Cc: joseph.qi@linux.alibaba.com, caspar@linux.alibaba.com, hch@lst.de, linux-block@vger.kernel.org, dm-devel@redhat.com, io-uring@vger.kernel.org Subject: [PATCH v3 02/11] block: add queue_to_disk() to get gendisk from request_queue Date: Mon, 8 Feb 2021 16:52:34 +0800 Message-Id: <20210208085243.82367-3-jefflexu@linux.alibaba.com> X-Mailer: git-send-email 2.27.0 In-Reply-To: <20210208085243.82367-1-jefflexu@linux.alibaba.com> References: <20210208085243.82367-1-jefflexu@linux.alibaba.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org Sometimes we need to get the corresponding gendisk from request_queue. It is preferred that block drivers store private data in gendisk->private_data rather than request_queue->queuedata, e.g. see: commit c4a59c4e5db3 ("dm: stop using ->queuedata"). So if only request_queue is given, we need to get its corresponding gendisk to get the private data stored in that gendisk. Signed-off-by: Jeffle Xu Reviewed-by: Mike Snitzer --- include/linux/blkdev.h | 2 ++ include/trace/events/kyber.h | 6 +++--- 2 files changed, 5 insertions(+), 3 deletions(-) diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index f94ee3089e01..2a802e011a95 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -687,6 +687,8 @@ static inline bool blk_account_rq(struct request *rq) dma_map_page_attrs(dev, (bv)->bv_page, (bv)->bv_offset, (bv)->bv_len, \ (dir), (attrs)) +#define queue_to_disk(q) (dev_to_disk(kobj_to_dev((q)->kobj.parent))) + static inline bool queue_is_mq(struct request_queue *q) { return q->mq_ops; diff --git a/include/trace/events/kyber.h b/include/trace/events/kyber.h index c0e7d24ca256..f9802562edf6 100644 --- a/include/trace/events/kyber.h +++ b/include/trace/events/kyber.h @@ -30,7 +30,7 @@ TRACE_EVENT(kyber_latency, ), TP_fast_assign( - __entry->dev = disk_devt(dev_to_disk(kobj_to_dev(q->kobj.parent))); + __entry->dev = disk_devt(queue_to_disk(q)); strlcpy(__entry->domain, domain, sizeof(__entry->domain)); strlcpy(__entry->type, type, sizeof(__entry->type)); __entry->percentile = percentile; @@ -59,7 +59,7 @@ TRACE_EVENT(kyber_adjust, ), TP_fast_assign( - __entry->dev = disk_devt(dev_to_disk(kobj_to_dev(q->kobj.parent))); + __entry->dev = disk_devt(queue_to_disk(q)); strlcpy(__entry->domain, domain, sizeof(__entry->domain)); __entry->depth = depth; ), @@ -81,7 +81,7 @@ TRACE_EVENT(kyber_throttled, ), TP_fast_assign( - __entry->dev = disk_devt(dev_to_disk(kobj_to_dev(q->kobj.parent))); + __entry->dev = disk_devt(queue_to_disk(q)); strlcpy(__entry->domain, domain, sizeof(__entry->domain)); ), From patchwork Mon Feb 8 08:52:35 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jingbo Xu X-Patchwork-Id: 12074397 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,UNPARSEABLE_RELAY,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 2B2A9C433DB for ; Mon, 8 Feb 2021 08:56:48 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id CBAA664E43 for ; Mon, 8 Feb 2021 08:56:47 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231294AbhBHIz5 (ORCPT ); Mon, 8 Feb 2021 03:55:57 -0500 Received: from out30-45.freemail.mail.aliyun.com ([115.124.30.45]:36844 "EHLO out30-45.freemail.mail.aliyun.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230137AbhBHIxb (ORCPT ); Mon, 8 Feb 2021 03:53:31 -0500 X-Alimail-AntiSpam: AC=PASS;BC=-1|-1;BR=01201311R501e4;CH=green;DM=||false|;DS=||;FP=0|-1|-1|-1|0|-1|-1|-1;HT=e01e04400;MF=jefflexu@linux.alibaba.com;NM=1;PH=DS;RN=8;SR=0;TI=SMTPD_---0UO9so9N_1612774367; Received: from localhost(mailfrom:jefflexu@linux.alibaba.com fp:SMTPD_---0UO9so9N_1612774367) by smtp.aliyun-inc.com(127.0.0.1); Mon, 08 Feb 2021 16:52:47 +0800 From: Jeffle Xu To: snitzer@redhat.com, axboe@kernel.dk Cc: joseph.qi@linux.alibaba.com, caspar@linux.alibaba.com, hch@lst.de, linux-block@vger.kernel.org, dm-devel@redhat.com, io-uring@vger.kernel.org Subject: [PATCH v3 03/11] block: add poll method to support bio-based IO polling Date: Mon, 8 Feb 2021 16:52:35 +0800 Message-Id: <20210208085243.82367-4-jefflexu@linux.alibaba.com> X-Mailer: git-send-email 2.27.0 In-Reply-To: <20210208085243.82367-1-jefflexu@linux.alibaba.com> References: <20210208085243.82367-1-jefflexu@linux.alibaba.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org ->poll_fn was introduced in commit ea435e1b9392 ("block: add a poll_fn callback to struct request_queue") to support bio-based queues such as nvme multipath, but was later removed in commit 529262d56dbe ("block: remove ->poll_fn"). Given commit c62b37d96b6e ("block: move ->make_request_fn to struct block_device_operations") restore the possibility of bio-based IO polling support by adding an ->poll method to gendisk->fops. Make blk_mq_poll() specific to blk-mq, while blk_bio_poll() is originally a copy from blk_mq_poll(), and is specific to bio-based polling. Signed-off-by: Jeffle Xu --- block/blk-core.c | 58 ++++++++++++++++++++++++++++++++++++++++++ block/blk-mq.c | 22 +--------------- include/linux/blk-mq.h | 1 + include/linux/blkdev.h | 1 + 4 files changed, 61 insertions(+), 21 deletions(-) diff --git a/block/blk-core.c b/block/blk-core.c index 7663a9b94b80..37aa513da5f2 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -1132,6 +1132,64 @@ blk_qc_t submit_bio(struct bio *bio) } EXPORT_SYMBOL(submit_bio); + +static int blk_bio_poll(struct request_queue *q, blk_qc_t cookie, bool spin) +{ + long state; + struct gendisk *disk = queue_to_disk(q); + + state = current->state; + do { + int ret; + + ret = disk->fops->poll(q, cookie); + if (ret > 0) { + __set_current_state(TASK_RUNNING); + return ret; + } + + if (signal_pending_state(state, current)) + __set_current_state(TASK_RUNNING); + + if (current->state == TASK_RUNNING) + return 1; + if (ret < 0 || !spin) + break; + cpu_relax(); + } while (!need_resched()); + + __set_current_state(TASK_RUNNING); + return 0; +} + +/** + * blk_poll - poll for IO completions + * @q: the queue + * @cookie: cookie passed back at IO submission time + * @spin: whether to spin for completions + * + * Description: + * Poll for completions on the passed in queue. Returns number of + * completed entries found. If @spin is true, then blk_poll will continue + * looping until at least one completion is found, unless the task is + * otherwise marked running (or we need to reschedule). + */ +int blk_poll(struct request_queue *q, blk_qc_t cookie, bool spin) +{ + if (!blk_qc_t_valid(cookie) || + !test_bit(QUEUE_FLAG_POLL, &q->queue_flags)) + return 0; + + if (current->plug) + blk_flush_plug_list(current->plug, false); + + if (queue_is_mq(q)) + return blk_mq_poll(q, cookie, spin); + else + return blk_bio_poll(q, cookie, spin); +} +EXPORT_SYMBOL_GPL(blk_poll); + /** * blk_cloned_rq_check_limits - Helper function to check a cloned request * for the new queue limits diff --git a/block/blk-mq.c b/block/blk-mq.c index f285a9123a8b..c18bc23834a0 100644 --- a/block/blk-mq.c +++ b/block/blk-mq.c @@ -3826,30 +3826,11 @@ static bool blk_mq_poll_hybrid(struct request_queue *q, return blk_mq_poll_hybrid_sleep(q, rq); } -/** - * blk_poll - poll for IO completions - * @q: the queue - * @cookie: cookie passed back at IO submission time - * @spin: whether to spin for completions - * - * Description: - * Poll for completions on the passed in queue. Returns number of - * completed entries found. If @spin is true, then blk_poll will continue - * looping until at least one completion is found, unless the task is - * otherwise marked running (or we need to reschedule). - */ -int blk_poll(struct request_queue *q, blk_qc_t cookie, bool spin) +int blk_mq_poll(struct request_queue *q, blk_qc_t cookie, bool spin) { struct blk_mq_hw_ctx *hctx; long state; - if (!blk_qc_t_valid(cookie) || - !test_bit(QUEUE_FLAG_POLL, &q->queue_flags)) - return 0; - - if (current->plug) - blk_flush_plug_list(current->plug, false); - hctx = q->queue_hw_ctx[blk_qc_t_to_queue_num(cookie)]; /* @@ -3891,7 +3872,6 @@ int blk_poll(struct request_queue *q, blk_qc_t cookie, bool spin) __set_current_state(TASK_RUNNING); return 0; } -EXPORT_SYMBOL_GPL(blk_poll); unsigned int blk_mq_rq_cpu(struct request *rq) { diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h index d705b174d346..adc789d5888e 100644 --- a/include/linux/blk-mq.h +++ b/include/linux/blk-mq.h @@ -607,6 +607,7 @@ static inline void blk_rq_bio_prep(struct request *rq, struct bio *bio, } blk_qc_t blk_mq_submit_bio(struct bio *bio); +int blk_mq_poll(struct request_queue *q, blk_qc_t cookie, bool spin); void blk_mq_hctx_set_fq_lock_class(struct blk_mq_hw_ctx *hctx, struct lock_class_key *key); diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index 2a802e011a95..ebe16f55cba4 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -1853,6 +1853,7 @@ static inline void blk_ksm_unregister(struct request_queue *q) { } struct block_device_operations { blk_qc_t (*submit_bio) (struct bio *bio); + int (*poll)(struct request_queue *q, blk_qc_t cookie); int (*open) (struct block_device *, fmode_t); void (*release) (struct gendisk *, fmode_t); int (*rw_page)(struct block_device *, sector_t, struct page *, unsigned int); From patchwork Mon Feb 8 08:52:36 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jingbo Xu X-Patchwork-Id: 12074395 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,UNPARSEABLE_RELAY,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id CD966C433E0 for ; Mon, 8 Feb 2021 08:56:25 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 6F78C64E43 for ; Mon, 8 Feb 2021 08:56:25 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230305AbhBHIzs (ORCPT ); Mon, 8 Feb 2021 03:55:48 -0500 Received: from out30-132.freemail.mail.aliyun.com ([115.124.30.132]:37738 "EHLO out30-132.freemail.mail.aliyun.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230148AbhBHIxb (ORCPT ); Mon, 8 Feb 2021 03:53:31 -0500 X-Alimail-AntiSpam: AC=PASS;BC=-1|-1;BR=01201311R181e4;CH=green;DM=||false|;DS=||;FP=0|-1|-1|-1|0|-1|-1|-1;HT=e01e04420;MF=jefflexu@linux.alibaba.com;NM=1;PH=DS;RN=8;SR=0;TI=SMTPD_---0UOAH3J-_1612774368; Received: from localhost(mailfrom:jefflexu@linux.alibaba.com fp:SMTPD_---0UOAH3J-_1612774368) by smtp.aliyun-inc.com(127.0.0.1); Mon, 08 Feb 2021 16:52:48 +0800 From: Jeffle Xu To: snitzer@redhat.com, axboe@kernel.dk Cc: joseph.qi@linux.alibaba.com, caspar@linux.alibaba.com, hch@lst.de, linux-block@vger.kernel.org, dm-devel@redhat.com, io-uring@vger.kernel.org Subject: [PATCH v3 04/11] block: add poll_capable method to support bio-based IO polling Date: Mon, 8 Feb 2021 16:52:36 +0800 Message-Id: <20210208085243.82367-5-jefflexu@linux.alibaba.com> X-Mailer: git-send-email 2.27.0 In-Reply-To: <20210208085243.82367-1-jefflexu@linux.alibaba.com> References: <20210208085243.82367-1-jefflexu@linux.alibaba.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org This method can be used to check if bio-based device supports IO polling or not. For mq devices, checking for hw queue in polling mode is adequate, while the sanity check shall be implementation specific for bio-based devices. For example, dm device needs to check if all underlying devices are capable of IO polling. Though bio-based device may have done the sanity check during the device initialization phase, cacheing the result of this sanity check (such as by cacheing in the queue_flags) may not work. Because for dm devices, users could change the state of the underlying devices through '/sys/block//io_poll', bypassing the dm device above. In this case, the cached result of the very beginning sanity check could be out-of-date. Thus the sanity check needs to be done every time 'io_poll' is to be modified. Signed-off-by: Jeffle Xu --- block/blk-sysfs.c | 14 +++++++++++--- include/linux/blkdev.h | 1 + 2 files changed, 12 insertions(+), 3 deletions(-) diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c index b513f1683af0..f11fedefc37d 100644 --- a/block/blk-sysfs.c +++ b/block/blk-sysfs.c @@ -420,9 +420,17 @@ static ssize_t queue_poll_store(struct request_queue *q, const char *page, unsigned long poll_on; ssize_t ret; - if (!q->tag_set || q->tag_set->nr_maps <= HCTX_TYPE_POLL || - !q->tag_set->map[HCTX_TYPE_POLL].nr_queues) - return -EINVAL; + if (queue_is_mq(q)) { + if (!q->tag_set || q->tag_set->nr_maps <= HCTX_TYPE_POLL || + !q->tag_set->map[HCTX_TYPE_POLL].nr_queues) + return -EINVAL; + } else { + struct gendisk *disk = queue_to_disk(q); + + if (!disk->fops->poll_capable || + !disk->fops->poll_capable(disk)) + return -EINVAL; + } ret = queue_var_store(&poll_on, page, count); if (ret < 0) diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index ebe16f55cba4..8a84088642ce 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -1854,6 +1854,7 @@ static inline void blk_ksm_unregister(struct request_queue *q) { } struct block_device_operations { blk_qc_t (*submit_bio) (struct bio *bio); int (*poll)(struct request_queue *q, blk_qc_t cookie); + bool (*poll_capable)(struct gendisk *disk); int (*open) (struct block_device *, fmode_t); void (*release) (struct gendisk *, fmode_t); int (*rw_page)(struct block_device *, sector_t, struct page *, unsigned int); From patchwork Mon Feb 8 08:52:37 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jingbo Xu X-Patchwork-Id: 12074439 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,UNPARSEABLE_RELAY,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id F1653C433E9 for ; Mon, 8 Feb 2021 08:59:23 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id A7B5764E7A for ; Mon, 8 Feb 2021 08:59:23 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230306AbhBHI7J (ORCPT ); Mon, 8 Feb 2021 03:59:09 -0500 Received: from out30-42.freemail.mail.aliyun.com ([115.124.30.42]:36756 "EHLO out30-42.freemail.mail.aliyun.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230330AbhBHIxe (ORCPT ); Mon, 8 Feb 2021 03:53:34 -0500 X-Alimail-AntiSpam: AC=PASS;BC=-1|-1;BR=01201311R141e4;CH=green;DM=||false|;DS=||;FP=0|-1|-1|-1|0|-1|-1|-1;HT=e01e04394;MF=jefflexu@linux.alibaba.com;NM=1;PH=DS;RN=8;SR=0;TI=SMTPD_---0UOAH3J5_1612774369; Received: from localhost(mailfrom:jefflexu@linux.alibaba.com fp:SMTPD_---0UOAH3J5_1612774369) by smtp.aliyun-inc.com(127.0.0.1); Mon, 08 Feb 2021 16:52:49 +0800 From: Jeffle Xu To: snitzer@redhat.com, axboe@kernel.dk Cc: joseph.qi@linux.alibaba.com, caspar@linux.alibaba.com, hch@lst.de, linux-block@vger.kernel.org, dm-devel@redhat.com, io-uring@vger.kernel.org Subject: [PATCH v3 05/11] block/mq: extract one helper function polling hw queue Date: Mon, 8 Feb 2021 16:52:37 +0800 Message-Id: <20210208085243.82367-6-jefflexu@linux.alibaba.com> X-Mailer: git-send-email 2.27.0 In-Reply-To: <20210208085243.82367-1-jefflexu@linux.alibaba.com> References: <20210208085243.82367-1-jefflexu@linux.alibaba.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org Extract the logic of polling one hw queue and related statistics handling out as the helper function. Signed-off-by: Jeffle Xu --- block/blk-mq.c | 5 +---- include/linux/blk-mq.h | 13 +++++++++++++ 2 files changed, 14 insertions(+), 4 deletions(-) diff --git a/block/blk-mq.c b/block/blk-mq.c index c18bc23834a0..b4de2b37b826 100644 --- a/block/blk-mq.c +++ b/block/blk-mq.c @@ -3850,11 +3850,8 @@ int blk_mq_poll(struct request_queue *q, blk_qc_t cookie, bool spin) do { int ret; - hctx->poll_invoked++; - - ret = q->mq_ops->poll(hctx); + ret = blk_mq_poll_hctx(q, hctx); if (ret > 0) { - hctx->poll_success++; __set_current_state(TASK_RUNNING); return ret; } diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h index adc789d5888e..cf1910c6d5ae 100644 --- a/include/linux/blk-mq.h +++ b/include/linux/blk-mq.h @@ -606,6 +606,19 @@ static inline void blk_rq_bio_prep(struct request *rq, struct bio *bio, rq->rq_disk = bio->bi_disk; } +static inline int blk_mq_poll_hctx(struct request_queue *q, + struct blk_mq_hw_ctx *hctx) +{ + int ret; + + hctx->poll_invoked++; + ret = q->mq_ops->poll(hctx); + if (ret > 0) + hctx->poll_success++; + + return ret; +} + blk_qc_t blk_mq_submit_bio(struct bio *bio); int blk_mq_poll(struct request_queue *q, blk_qc_t cookie, bool spin); void blk_mq_hctx_set_fq_lock_class(struct blk_mq_hw_ctx *hctx, From patchwork Mon Feb 8 08:52:38 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jingbo Xu X-Patchwork-Id: 12074403 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,UNPARSEABLE_RELAY,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 15BDCC433E0 for ; Mon, 8 Feb 2021 08:56:57 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id C55C364E7E for ; Mon, 8 Feb 2021 08:56:56 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231192AbhBHI4h (ORCPT ); Mon, 8 Feb 2021 03:56:37 -0500 Received: from out30-131.freemail.mail.aliyun.com ([115.124.30.131]:49761 "EHLO out30-131.freemail.mail.aliyun.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230126AbhBHIxg (ORCPT ); Mon, 8 Feb 2021 03:53:36 -0500 X-Alimail-AntiSpam: AC=PASS;BC=-1|-1;BR=01201311R411e4;CH=green;DM=||false|;DS=||;FP=0|-1|-1|-1|0|-1|-1|-1;HT=e01e04426;MF=jefflexu@linux.alibaba.com;NM=1;PH=DS;RN=8;SR=0;TI=SMTPD_---0UOAH3JG_1612774370; Received: from localhost(mailfrom:jefflexu@linux.alibaba.com fp:SMTPD_---0UOAH3JG_1612774370) by smtp.aliyun-inc.com(127.0.0.1); Mon, 08 Feb 2021 16:52:50 +0800 From: Jeffle Xu To: snitzer@redhat.com, axboe@kernel.dk Cc: joseph.qi@linux.alibaba.com, caspar@linux.alibaba.com, hch@lst.de, linux-block@vger.kernel.org, dm-devel@redhat.com, io-uring@vger.kernel.org Subject: [PATCH v3 06/11] block/mq: add iterator for polling hw queues Date: Mon, 8 Feb 2021 16:52:38 +0800 Message-Id: <20210208085243.82367-7-jefflexu@linux.alibaba.com> X-Mailer: git-send-email 2.27.0 In-Reply-To: <20210208085243.82367-1-jefflexu@linux.alibaba.com> References: <20210208085243.82367-1-jefflexu@linux.alibaba.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org Add one helper function for iterating all hardware queues in polling mode. Signed-off-by: Jeffle Xu --- include/linux/blk-mq.h | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h index cf1910c6d5ae..0c511555ec16 100644 --- a/include/linux/blk-mq.h +++ b/include/linux/blk-mq.h @@ -574,6 +574,13 @@ static inline void *blk_mq_rq_to_pdu(struct request *rq) for ((i) = 0; (i) < (q)->nr_hw_queues && \ ({ hctx = (q)->queue_hw_ctx[i]; 1; }); (i)++) +#define queue_for_each_poll_hw_ctx(q, hctx, i) \ + for ((i) = 0; ((q)->tag_set->nr_maps > HCTX_TYPE_POLL) && \ + (i) < (q)->tag_set->map[HCTX_TYPE_POLL].nr_queues && \ + ({ int __idx = (q)->tag_set->map[HCTX_TYPE_POLL].queue_offset + (i); \ + hctx = (q)->queue_hw_ctx[__idx]; 1; }); \ + (i)++) + #define hctx_for_each_ctx(hctx, ctx, i) \ for ((i) = 0; (i) < (hctx)->nr_ctx && \ ({ ctx = (hctx)->ctxs[(i)]; 1; }); (i)++) From patchwork Mon Feb 8 08:52:39 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jingbo Xu X-Patchwork-Id: 12074441 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,UNPARSEABLE_RELAY,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id EFEABC433DB for ; Mon, 8 Feb 2021 08:59:27 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id A23BE64E7E for ; Mon, 8 Feb 2021 08:59:27 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230330AbhBHI7P (ORCPT ); Mon, 8 Feb 2021 03:59:15 -0500 Received: from out30-43.freemail.mail.aliyun.com ([115.124.30.43]:40455 "EHLO out30-43.freemail.mail.aliyun.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230348AbhBHIxe (ORCPT ); Mon, 8 Feb 2021 03:53:34 -0500 X-Alimail-AntiSpam: AC=PASS;BC=-1|-1;BR=01201311R181e4;CH=green;DM=||false|;DS=||;FP=0|-1|-1|-1|0|-1|-1|-1;HT=e01e04357;MF=jefflexu@linux.alibaba.com;NM=1;PH=DS;RN=8;SR=0;TI=SMTPD_---0UO9kQia_1612774371; Received: from localhost(mailfrom:jefflexu@linux.alibaba.com fp:SMTPD_---0UO9kQia_1612774371) by smtp.aliyun-inc.com(127.0.0.1); Mon, 08 Feb 2021 16:52:51 +0800 From: Jeffle Xu To: snitzer@redhat.com, axboe@kernel.dk Cc: joseph.qi@linux.alibaba.com, caspar@linux.alibaba.com, hch@lst.de, linux-block@vger.kernel.org, dm-devel@redhat.com, io-uring@vger.kernel.org Subject: [PATCH v3 07/11] dm: always return BLK_QC_T_NONE for bio-based device Date: Mon, 8 Feb 2021 16:52:39 +0800 Message-Id: <20210208085243.82367-8-jefflexu@linux.alibaba.com> X-Mailer: git-send-email 2.27.0 In-Reply-To: <20210208085243.82367-1-jefflexu@linux.alibaba.com> References: <20210208085243.82367-1-jefflexu@linux.alibaba.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org Currently the returned cookie of bio-based device is not used at all. Cookie of bio-based device will be refactored in the following patch. Signed-off-by: Jeffle Xu Reviewed-by: Mike Snitzer --- drivers/md/dm.c | 26 ++++++++++---------------- 1 file changed, 10 insertions(+), 16 deletions(-) diff --git a/drivers/md/dm.c b/drivers/md/dm.c index 7bac564f3faa..46ca3b739396 100644 --- a/drivers/md/dm.c +++ b/drivers/md/dm.c @@ -1252,14 +1252,13 @@ void dm_accept_partial_bio(struct bio *bio, unsigned n_sectors) } EXPORT_SYMBOL_GPL(dm_accept_partial_bio); -static blk_qc_t __map_bio(struct dm_target_io *tio) +static void __map_bio(struct dm_target_io *tio) { int r; sector_t sector; struct bio *clone = &tio->clone; struct dm_io *io = tio->io; struct dm_target *ti = tio->ti; - blk_qc_t ret = BLK_QC_T_NONE; clone->bi_end_io = clone_endio; @@ -1278,7 +1277,7 @@ static blk_qc_t __map_bio(struct dm_target_io *tio) case DM_MAPIO_REMAPPED: /* the bio has been remapped so dispatch it */ trace_block_bio_remap(clone, bio_dev(io->orig_bio), sector); - ret = submit_bio_noacct(clone); + submit_bio_noacct(clone); break; case DM_MAPIO_KILL: free_tio(tio); @@ -1292,8 +1291,6 @@ static blk_qc_t __map_bio(struct dm_target_io *tio) DMWARN("unimplemented target map return value: %d", r); BUG(); } - - return ret; } static void bio_setup_sector(struct bio *bio, sector_t sector, unsigned len) @@ -1380,7 +1377,7 @@ static void alloc_multiple_bios(struct bio_list *blist, struct clone_info *ci, } } -static blk_qc_t __clone_and_map_simple_bio(struct clone_info *ci, +static void __clone_and_map_simple_bio(struct clone_info *ci, struct dm_target_io *tio, unsigned *len) { struct bio *clone = &tio->clone; @@ -1391,7 +1388,7 @@ static blk_qc_t __clone_and_map_simple_bio(struct clone_info *ci, if (len) bio_setup_sector(clone, ci->sector, *len); - return __map_bio(tio); + __map_bio(tio); } static void __send_duplicate_bios(struct clone_info *ci, struct dm_target *ti, @@ -1405,7 +1402,7 @@ static void __send_duplicate_bios(struct clone_info *ci, struct dm_target *ti, while ((bio = bio_list_pop(&blist))) { tio = container_of(bio, struct dm_target_io, clone); - (void) __clone_and_map_simple_bio(ci, tio, len); + __clone_and_map_simple_bio(ci, tio, len); } } @@ -1450,7 +1447,7 @@ static int __clone_and_map_data_bio(struct clone_info *ci, struct dm_target *ti, free_tio(tio); return r; } - (void) __map_bio(tio); + __map_bio(tio); return 0; } @@ -1565,11 +1562,10 @@ static void init_clone_info(struct clone_info *ci, struct mapped_device *md, /* * Entry point to split a bio into clones and submit them to the targets. */ -static blk_qc_t __split_and_process_bio(struct mapped_device *md, +static void __split_and_process_bio(struct mapped_device *md, struct dm_table *map, struct bio *bio) { struct clone_info ci; - blk_qc_t ret = BLK_QC_T_NONE; int error = 0; init_clone_info(&ci, md, map, bio); @@ -1613,7 +1609,7 @@ static blk_qc_t __split_and_process_bio(struct mapped_device *md, bio_chain(b, bio); trace_block_split(b, bio->bi_iter.bi_sector); - ret = submit_bio_noacct(bio); + submit_bio_noacct(bio); break; } } @@ -1621,13 +1617,11 @@ static blk_qc_t __split_and_process_bio(struct mapped_device *md, /* drop the extra reference count */ dec_pending(ci.io, errno_to_blk_status(error)); - return ret; } static blk_qc_t dm_submit_bio(struct bio *bio) { struct mapped_device *md = bio->bi_disk->private_data; - blk_qc_t ret = BLK_QC_T_NONE; int srcu_idx; struct dm_table *map; @@ -1657,10 +1651,10 @@ static blk_qc_t dm_submit_bio(struct bio *bio) if (is_abnormal_io(bio)) blk_queue_split(&bio); - ret = __split_and_process_bio(md, map, bio); + __split_and_process_bio(md, map, bio); out: dm_put_live_table(md, srcu_idx); - return ret; + return BLK_QC_T_NONE; } /*----------------------------------------------------------------- From patchwork Mon Feb 8 08:52:40 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jingbo Xu X-Patchwork-Id: 12074445 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,UNPARSEABLE_RELAY,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id BF660C433DB for ; Mon, 8 Feb 2021 08:59:41 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 7190164E88 for ; Mon, 8 Feb 2021 08:59:41 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231206AbhBHI71 (ORCPT ); Mon, 8 Feb 2021 03:59:27 -0500 Received: from out30-54.freemail.mail.aliyun.com ([115.124.30.54]:45179 "EHLO out30-54.freemail.mail.aliyun.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230365AbhBHIxv (ORCPT ); Mon, 8 Feb 2021 03:53:51 -0500 X-Alimail-AntiSpam: AC=PASS;BC=-1|-1;BR=01201311R111e4;CH=green;DM=||false|;DS=||;FP=0|-1|-1|-1|0|-1|-1|-1;HT=e01e04423;MF=jefflexu@linux.alibaba.com;NM=1;PH=DS;RN=8;SR=0;TI=SMTPD_---0UO9soA-_1612774372; Received: from localhost(mailfrom:jefflexu@linux.alibaba.com fp:SMTPD_---0UO9soA-_1612774372) by smtp.aliyun-inc.com(127.0.0.1); Mon, 08 Feb 2021 16:52:52 +0800 From: Jeffle Xu To: snitzer@redhat.com, axboe@kernel.dk Cc: joseph.qi@linux.alibaba.com, caspar@linux.alibaba.com, hch@lst.de, linux-block@vger.kernel.org, dm-devel@redhat.com, io-uring@vger.kernel.org Subject: [PATCH v3 08/11] dm: fix iterate_device sanity check Date: Mon, 8 Feb 2021 16:52:40 +0800 Message-Id: <20210208085243.82367-9-jefflexu@linux.alibaba.com> X-Mailer: git-send-email 2.27.0 In-Reply-To: <20210208085243.82367-1-jefflexu@linux.alibaba.com> References: <20210208085243.82367-1-jefflexu@linux.alibaba.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org According to the definition of dm_iterate_devices_fn: * This function must iterate through each section of device used by the * target until it encounters a non-zero return code, which it then returns. * Returns zero if no callout returned non-zero. For some target type (e.g., dm-stripe), one call of iterate_devices() may iterate multiple underlying devices internally, in which case a non-zero return code returned by iterate_devices_callout_fn will stop the iteration in advance. Thus if we want to ensure that _all_ underlying devices support some kind of attribute, the iteration structure like dm_table_supports_nowait() should be used, while the input iterate_devices_callout_fn should handle the 'not support' semantics. On the opposite, the iteration structure like dm_table_any_dev_attr() should be used if _any_ underlying device supporting this attibute is sufficient. In this case, the input iterate_devices_callout_fn should handle the 'support' semantics. Besides, remove the unnecessary NULL pointer check for request_queue, since the request_queue pointer returned from bdev_get_queue() shall never be NULL according to commit ff9ea323816d ("block, bdi: an active gendisk always has a request_queue associated with it"). Fixes: 545ed20e6df6 ("dm: add infrastructure for DAX support") Fixes: c3c4555edd10 ("dm table: clear add_random unless all devices have it set") Fixes: 4693c9668fdc ("dm table: propagate non rotational flag") Cc: stable@vger.kernel.org Signed-off-by: Jeffle Xu --- drivers/md/dm-table.c | 195 +++++++++++++++++++----------------------- drivers/md/dm.c | 2 +- drivers/md/dm.h | 2 +- 3 files changed, 91 insertions(+), 108 deletions(-) diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c index 4acf2342f7ad..aa37f3e82238 100644 --- a/drivers/md/dm-table.c +++ b/drivers/md/dm-table.c @@ -820,24 +820,24 @@ void dm_table_set_type(struct dm_table *t, enum dm_queue_mode type) EXPORT_SYMBOL_GPL(dm_table_set_type); /* validate the dax capability of the target device span */ -int device_supports_dax(struct dm_target *ti, struct dm_dev *dev, - sector_t start, sector_t len, void *data) +int device_not_dax_capable(struct dm_target *ti, struct dm_dev *dev, + sector_t start, sector_t len, void *data) { int blocksize = *(int *) data, id; bool rc; id = dax_read_lock(); - rc = dax_supported(dev->dax_dev, dev->bdev, blocksize, start, len); + rc = !dax_supported(dev->dax_dev, dev->bdev, blocksize, start, len); dax_read_unlock(id); return rc; } /* Check devices support synchronous DAX */ -static int device_dax_synchronous(struct dm_target *ti, struct dm_dev *dev, - sector_t start, sector_t len, void *data) +static int device_not_dax_sync_capable(struct dm_target *ti, struct dm_dev *dev, + sector_t start, sector_t len, void *data) { - return dev->dax_dev && dax_synchronous(dev->dax_dev); + return !dev->dax_dev || !dax_synchronous(dev->dax_dev); } bool dm_table_supports_dax(struct dm_table *t, @@ -854,7 +854,7 @@ bool dm_table_supports_dax(struct dm_table *t, return false; if (!ti->type->iterate_devices || - !ti->type->iterate_devices(ti, iterate_fn, blocksize)) + ti->type->iterate_devices(ti, iterate_fn, blocksize)) return false; } @@ -925,7 +925,7 @@ static int dm_table_determine_type(struct dm_table *t) verify_bio_based: /* We must use this table as bio-based */ t->type = DM_TYPE_BIO_BASED; - if (dm_table_supports_dax(t, device_supports_dax, &page_size) || + if (dm_table_supports_dax(t, device_not_dax_capable, &page_size) || (list_empty(devices) && live_md_type == DM_TYPE_DAX_BIO_BASED)) { t->type = DM_TYPE_DAX_BIO_BASED; } @@ -1331,13 +1331,49 @@ bool dm_table_has_no_data_devices(struct dm_table *table) return true; } -static int device_is_zoned_model(struct dm_target *ti, struct dm_dev *dev, - sector_t start, sector_t len, void *data) +/* + * Cases requiring _any_ underlying device supporting some kind of attribute, + * should use the iteration structure like dm_table_any_dev_attr(), or call + * it directly. @func should handle semantics of positive examples, e.g., + * capable of something. + */ +static bool dm_table_any_dev_attr(struct dm_table *t, + iterate_devices_callout_fn func, void *data) +{ + struct dm_target *ti; + unsigned int i; + + for (i = 0; i < dm_table_get_num_targets(t); i++) { + ti = dm_table_get_target(t, i); + + if (ti->type->iterate_devices && + ti->type->iterate_devices(ti, func, data)) + return true; + } + + return false; +} + +/* + * Cases requiring _all_ underlying devices supporting some kind of attribute, + * should use the iteration structure like dm_table_supports_nowait(), or call + * dm_table_all_devs_attr() directly. @anti_func should handle semantics of + * counter examples, e.g., not capable of something. + */ +static inline bool dm_table_all_devs_attr(struct dm_table *t, + iterate_devices_callout_fn anti_func, + void *data) +{ + return !dm_table_any_dev_attr(t, anti_func, data); +} + +static int device_not_zoned_model(struct dm_target *ti, struct dm_dev *dev, + sector_t start, sector_t len, void *data) { struct request_queue *q = bdev_get_queue(dev->bdev); enum blk_zoned_model *zoned_model = data; - return q && blk_queue_zoned_model(q) == *zoned_model; + return blk_queue_zoned_model(q) != *zoned_model; } static bool dm_table_supports_zoned_model(struct dm_table *t, @@ -1354,37 +1390,20 @@ static bool dm_table_supports_zoned_model(struct dm_table *t, return false; if (!ti->type->iterate_devices || - !ti->type->iterate_devices(ti, device_is_zoned_model, &zoned_model)) + ti->type->iterate_devices(ti, device_not_zoned_model, &zoned_model)) return false; } return true; } -static int device_matches_zone_sectors(struct dm_target *ti, struct dm_dev *dev, - sector_t start, sector_t len, void *data) +static int device_not_matches_zone_sectors(struct dm_target *ti, struct dm_dev *dev, + sector_t start, sector_t len, void *data) { struct request_queue *q = bdev_get_queue(dev->bdev); unsigned int *zone_sectors = data; - return q && blk_queue_zone_sectors(q) == *zone_sectors; -} - -static bool dm_table_matches_zone_sectors(struct dm_table *t, - unsigned int zone_sectors) -{ - struct dm_target *ti; - unsigned i; - - for (i = 0; i < dm_table_get_num_targets(t); i++) { - ti = dm_table_get_target(t, i); - - if (!ti->type->iterate_devices || - !ti->type->iterate_devices(ti, device_matches_zone_sectors, &zone_sectors)) - return false; - } - - return true; + return blk_queue_zone_sectors(q) != *zone_sectors; } static int validate_hardware_zoned_model(struct dm_table *table, @@ -1404,7 +1423,7 @@ static int validate_hardware_zoned_model(struct dm_table *table, if (!zone_sectors || !is_power_of_2(zone_sectors)) return -EINVAL; - if (!dm_table_matches_zone_sectors(table, zone_sectors)) { + if (dm_table_any_dev_attr(table, device_not_matches_zone_sectors, &zone_sectors)) { DMERR("%s: zone sectors is not consistent across all devices", dm_device_name(table->md)); return -EINVAL; @@ -1533,7 +1552,7 @@ static int device_flush_capable(struct dm_target *ti, struct dm_dev *dev, unsigned long flush = (unsigned long) data; struct request_queue *q = bdev_get_queue(dev->bdev); - return q && (q->queue_flags & flush); + return (q->queue_flags & flush); } static bool dm_table_supports_flush(struct dm_table *t, unsigned long flush) @@ -1578,54 +1597,20 @@ static int device_dax_write_cache_enabled(struct dm_target *ti, return false; } -static int dm_table_supports_dax_write_cache(struct dm_table *t) -{ - struct dm_target *ti; - unsigned i; - - for (i = 0; i < dm_table_get_num_targets(t); i++) { - ti = dm_table_get_target(t, i); - - if (ti->type->iterate_devices && - ti->type->iterate_devices(ti, - device_dax_write_cache_enabled, NULL)) - return true; - } - - return false; -} - -static int device_is_nonrot(struct dm_target *ti, struct dm_dev *dev, - sector_t start, sector_t len, void *data) +static int device_is_rot(struct dm_target *ti, struct dm_dev *dev, + sector_t start, sector_t len, void *data) { struct request_queue *q = bdev_get_queue(dev->bdev); - return q && blk_queue_nonrot(q); + return !blk_queue_nonrot(q); } static int device_is_not_random(struct dm_target *ti, struct dm_dev *dev, - sector_t start, sector_t len, void *data) + sector_t start, sector_t len, void *data) { struct request_queue *q = bdev_get_queue(dev->bdev); - return q && !blk_queue_add_random(q); -} - -static bool dm_table_all_devices_attribute(struct dm_table *t, - iterate_devices_callout_fn func) -{ - struct dm_target *ti; - unsigned i; - - for (i = 0; i < dm_table_get_num_targets(t); i++) { - ti = dm_table_get_target(t, i); - - if (!ti->type->iterate_devices || - !ti->type->iterate_devices(ti, func, NULL)) - return false; - } - - return true; + return !blk_queue_add_random(q); } static int device_not_write_same_capable(struct dm_target *ti, struct dm_dev *dev, @@ -1633,7 +1618,7 @@ static int device_not_write_same_capable(struct dm_target *ti, struct dm_dev *de { struct request_queue *q = bdev_get_queue(dev->bdev); - return q && !q->limits.max_write_same_sectors; + return !q->limits.max_write_same_sectors; } static bool dm_table_supports_write_same(struct dm_table *t) @@ -1660,7 +1645,7 @@ static int device_not_write_zeroes_capable(struct dm_target *ti, struct dm_dev * { struct request_queue *q = bdev_get_queue(dev->bdev); - return q && !q->limits.max_write_zeroes_sectors; + return !q->limits.max_write_zeroes_sectors; } static bool dm_table_supports_write_zeroes(struct dm_table *t) @@ -1687,7 +1672,7 @@ static int device_not_nowait_capable(struct dm_target *ti, struct dm_dev *dev, { struct request_queue *q = bdev_get_queue(dev->bdev); - return q && !blk_queue_nowait(q); + return !blk_queue_nowait(q); } static bool dm_table_supports_nowait(struct dm_table *t) @@ -1714,7 +1699,7 @@ static int device_not_discard_capable(struct dm_target *ti, struct dm_dev *dev, { struct request_queue *q = bdev_get_queue(dev->bdev); - return q && !blk_queue_discard(q); + return !blk_queue_discard(q); } static bool dm_table_supports_discards(struct dm_table *t) @@ -1748,7 +1733,7 @@ static int device_not_secure_erase_capable(struct dm_target *ti, { struct request_queue *q = bdev_get_queue(dev->bdev); - return q && !blk_queue_secure_erase(q); + return !blk_queue_secure_erase(q); } static bool dm_table_supports_secure_erase(struct dm_table *t) @@ -1776,30 +1761,23 @@ static int device_requires_stable_pages(struct dm_target *ti, { struct request_queue *q = bdev_get_queue(dev->bdev); - return q && blk_queue_stable_writes(q); + return blk_queue_stable_writes(q); } /* - * If any underlying device requires stable pages, a table must require - * them as well. Only targets that support iterate_devices are considered: - * don't want error, zero, etc to require stable pages. + * type->iterate_devices() should be called when the sanity check needs to + * iterate and check all underlying data devices. iterate_devices() will + * iterate all underlying data devices until it encounters a non-zero return + * code, returned by whether the input iterate_devices_callout_fn, or + * iterate_devices() itself internally. + * + * For some target type (e.g., dm-stripe), one call of iterate_devices() may + * iterate multiple underlying devices internally, in which case a non-zero + * return code returned by iterate_devices_callout_fn will stop the iteration + * in advance. + * Thus users should call dm_table_any_dev_attr() or dm_table_all_devs_attr() + * directly if possible. See comments of these two functions for more details. */ -static bool dm_table_requires_stable_pages(struct dm_table *t) -{ - struct dm_target *ti; - unsigned i; - - for (i = 0; i < dm_table_get_num_targets(t); i++) { - ti = dm_table_get_target(t, i); - - if (ti->type->iterate_devices && - ti->type->iterate_devices(ti, device_requires_stable_pages, NULL)) - return true; - } - - return false; -} - void dm_table_set_restrictions(struct dm_table *t, struct request_queue *q, struct queue_limits *limits) { @@ -1837,22 +1815,22 @@ void dm_table_set_restrictions(struct dm_table *t, struct request_queue *q, } blk_queue_write_cache(q, wc, fua); - if (dm_table_supports_dax(t, device_supports_dax, &page_size)) { + if (dm_table_supports_dax(t, device_not_dax_capable, &page_size)) { blk_queue_flag_set(QUEUE_FLAG_DAX, q); - if (dm_table_supports_dax(t, device_dax_synchronous, NULL)) + if (dm_table_supports_dax(t, device_not_dax_sync_capable, NULL)) set_dax_synchronous(t->md->dax_dev); } else blk_queue_flag_clear(QUEUE_FLAG_DAX, q); - if (dm_table_supports_dax_write_cache(t)) + if (dm_table_any_dev_attr(t, device_dax_write_cache_enabled, NULL)) dax_write_cache(t->md->dax_dev, true); /* Ensure that all underlying devices are non-rotational. */ - if (dm_table_all_devices_attribute(t, device_is_nonrot)) - blk_queue_flag_set(QUEUE_FLAG_NONROT, q); - else + if (dm_table_any_dev_attr(t, device_is_rot, NULL)) blk_queue_flag_clear(QUEUE_FLAG_NONROT, q); + else + blk_queue_flag_set(QUEUE_FLAG_NONROT, q); if (!dm_table_supports_write_same(t)) q->limits.max_write_same_sectors = 0; @@ -1864,8 +1842,11 @@ void dm_table_set_restrictions(struct dm_table *t, struct request_queue *q, /* * Some devices don't use blk_integrity but still want stable pages * because they do their own checksumming. + * If any underlying device requires stable pages, a table must require + * them as well. Only targets that support iterate_devices are considered: + * don't want error, zero, etc to require stable pages. */ - if (dm_table_requires_stable_pages(t)) + if (dm_table_any_dev_attr(t, device_requires_stable_pages, NULL)) blk_queue_flag_set(QUEUE_FLAG_STABLE_WRITES, q); else blk_queue_flag_clear(QUEUE_FLAG_STABLE_WRITES, q); @@ -1876,8 +1857,10 @@ void dm_table_set_restrictions(struct dm_table *t, struct request_queue *q, * Clear QUEUE_FLAG_ADD_RANDOM if any underlying device does not * have it set. */ - if (blk_queue_add_random(q) && dm_table_all_devices_attribute(t, device_is_not_random)) + if (dm_table_any_dev_attr(t, device_is_not_random, NULL)) blk_queue_flag_clear(QUEUE_FLAG_ADD_RANDOM, q); + else + blk_queue_flag_set(QUEUE_FLAG_ADD_RANDOM, q); /* * For a zoned target, the number of zones should be updated for the diff --git a/drivers/md/dm.c b/drivers/md/dm.c index 46ca3b739396..c2945c90745e 100644 --- a/drivers/md/dm.c +++ b/drivers/md/dm.c @@ -1128,7 +1128,7 @@ static bool dm_dax_supported(struct dax_device *dax_dev, struct block_device *bd if (!map) goto out; - ret = dm_table_supports_dax(map, device_supports_dax, &blocksize); + ret = dm_table_supports_dax(map, device_not_dax_capable, &blocksize); out: dm_put_live_table(md, srcu_idx); diff --git a/drivers/md/dm.h b/drivers/md/dm.h index fffe1e289c53..b441ad772c18 100644 --- a/drivers/md/dm.h +++ b/drivers/md/dm.h @@ -73,7 +73,7 @@ void dm_table_free_md_mempools(struct dm_table *t); struct dm_md_mempools *dm_table_get_md_mempools(struct dm_table *t); bool dm_table_supports_dax(struct dm_table *t, iterate_devices_callout_fn fn, int *blocksize); -int device_supports_dax(struct dm_target *ti, struct dm_dev *dev, +int device_not_dax_capable(struct dm_target *ti, struct dm_dev *dev, sector_t start, sector_t len, void *data); void dm_lock_md_type(struct mapped_device *md); From patchwork Mon Feb 8 08:52:41 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jingbo Xu X-Patchwork-Id: 12074401 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,UNPARSEABLE_RELAY,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 69EA9C433E0 for ; Mon, 8 Feb 2021 08:56:53 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 2004B64E88 for ; Mon, 8 Feb 2021 08:56:53 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230137AbhBHI4Z (ORCPT ); Mon, 8 Feb 2021 03:56:25 -0500 Received: from out30-132.freemail.mail.aliyun.com ([115.124.30.132]:35915 "EHLO out30-132.freemail.mail.aliyun.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230362AbhBHIxi (ORCPT ); Mon, 8 Feb 2021 03:53:38 -0500 X-Alimail-AntiSpam: AC=PASS;BC=-1|-1;BR=01201311R151e4;CH=green;DM=||false|;DS=||;FP=0|-1|-1|-1|0|-1|-1|-1;HT=alimailimapcm10staff010182156082;MF=jefflexu@linux.alibaba.com;NM=1;PH=DS;RN=8;SR=0;TI=SMTPD_---0UO9soA7_1612774373; Received: from localhost(mailfrom:jefflexu@linux.alibaba.com fp:SMTPD_---0UO9soA7_1612774373) by smtp.aliyun-inc.com(127.0.0.1); Mon, 08 Feb 2021 16:52:53 +0800 From: Jeffle Xu To: snitzer@redhat.com, axboe@kernel.dk Cc: joseph.qi@linux.alibaba.com, caspar@linux.alibaba.com, hch@lst.de, linux-block@vger.kernel.org, dm-devel@redhat.com, io-uring@vger.kernel.org Subject: [PATCH v3 09/11] dm: support IO polling for bio-based dm device Date: Mon, 8 Feb 2021 16:52:41 +0800 Message-Id: <20210208085243.82367-10-jefflexu@linux.alibaba.com> X-Mailer: git-send-email 2.27.0 In-Reply-To: <20210208085243.82367-1-jefflexu@linux.alibaba.com> References: <20210208085243.82367-1-jefflexu@linux.alibaba.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org DM will iterate and poll all polling hardware queues of all target mq devices when polling IO for dm device. To mitigate the race introduced by iterating all target hw queues, a per-hw-queue flag is maintained to indicate whether this polling hw queue currently being polled on or not. Every polling hw queue is exclusive to one polling instance, i.e., the polling instance will skip this polling hw queue if this hw queue currently is being polled by another polling instance, and start polling on the next hw queue. IO polling is enabled when all underlying target devices are capable of IO polling. The sanity check supports the stacked device model, in which one dm device may be build upon another dm device. In this case, the mapped device will check if the underlying dm target device supports IO polling. Signed-off-by: Jeffle Xu --- drivers/md/dm-table.c | 26 ++++++++++++++ drivers/md/dm.c | 64 +++++++++++++++++++++++++++++++++++ include/linux/device-mapper.h | 1 + 3 files changed, 91 insertions(+) diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c index aa37f3e82238..b090b4c9692d 100644 --- a/drivers/md/dm-table.c +++ b/drivers/md/dm-table.c @@ -1764,6 +1764,19 @@ static int device_requires_stable_pages(struct dm_target *ti, return blk_queue_stable_writes(q); } +static int device_not_poll_capable(struct dm_target *ti, struct dm_dev *dev, + sector_t start, sector_t len, void *data) +{ + struct request_queue *q = bdev_get_queue(dev->bdev); + + return !test_bit(QUEUE_FLAG_POLL, &q->queue_flags); +} + +int dm_table_supports_poll(struct dm_table *t) +{ + return dm_table_all_devs_attr(t, device_not_poll_capable, NULL); +} + /* * type->iterate_devices() should be called when the sanity check needs to * iterate and check all underlying data devices. iterate_devices() will @@ -1875,6 +1888,19 @@ void dm_table_set_restrictions(struct dm_table *t, struct request_queue *q, #endif blk_queue_update_readahead(q); + + /* + * Check for request-based device is remained to + * dm_mq_init_request_queue()->blk_mq_init_allocated_queue(). + * For bio-based device, only set QUEUE_FLAG_POLL when all underlying + * devices supporting polling. + */ + if (__table_type_bio_based(t->type)) { + if (dm_table_supports_poll(t)) + blk_queue_flag_set(QUEUE_FLAG_POLL, q); + else + blk_queue_flag_clear(QUEUE_FLAG_POLL, q); + } } unsigned int dm_table_get_num_targets(struct dm_table *t) diff --git a/drivers/md/dm.c b/drivers/md/dm.c index c2945c90745e..8423f1347bb8 100644 --- a/drivers/md/dm.c +++ b/drivers/md/dm.c @@ -1657,6 +1657,68 @@ static blk_qc_t dm_submit_bio(struct bio *bio) return BLK_QC_T_NONE; } +static int dm_poll_one_md(struct mapped_device *md); + +static int dm_poll_one_dev(struct dm_target *ti, struct dm_dev *dev, + sector_t start, sector_t len, void *data) +{ + int i, *count = data; + struct request_queue *q = bdev_get_queue(dev->bdev); + struct blk_mq_hw_ctx *hctx; + + if (queue_is_mq(q)) { + if (!percpu_ref_tryget(&q->q_usage_counter)) + return 0; + + queue_for_each_poll_hw_ctx(q, hctx, i) + *count += blk_mq_poll_hctx(q, hctx); + + percpu_ref_put(&q->q_usage_counter); + } else + *count += dm_poll_one_md(dev->bdev->bd_disk->private_data); + + return 0; +} + +static int dm_poll_one_md(struct mapped_device *md) +{ + int i, srcu_idx, ret = 0; + struct dm_table *t; + struct dm_target *ti; + + t = dm_get_live_table(md, &srcu_idx); + + for (i = 0; i < dm_table_get_num_targets(t); i++) { + ti = dm_table_get_target(t, i); + ti->type->iterate_devices(ti, dm_poll_one_dev, &ret); + } + + dm_put_live_table(md, srcu_idx); + + return ret; +} + +static int dm_bio_poll(struct request_queue *q, blk_qc_t cookie) +{ + struct gendisk *disk = queue_to_disk(q); + struct mapped_device *md = disk->private_data; + + return dm_poll_one_md(md); +} + +static bool dm_bio_poll_capable(struct gendisk *disk) +{ + int ret, srcu_idx; + struct mapped_device *md = disk->private_data; + struct dm_table *t; + + t = dm_get_live_table(md, &srcu_idx); + ret = dm_table_supports_poll(t); + dm_put_live_table(md, srcu_idx); + + return ret; +} + /*----------------------------------------------------------------- * An IDR is used to keep track of allocated minor numbers. *---------------------------------------------------------------*/ @@ -3049,6 +3111,8 @@ static const struct pr_ops dm_pr_ops = { }; static const struct block_device_operations dm_blk_dops = { + .poll = dm_bio_poll, + .poll_capable = dm_bio_poll_capable, .submit_bio = dm_submit_bio, .open = dm_blk_open, .release = dm_blk_close, diff --git a/include/linux/device-mapper.h b/include/linux/device-mapper.h index 61a66fb8ebb3..6a9de3fd0087 100644 --- a/include/linux/device-mapper.h +++ b/include/linux/device-mapper.h @@ -515,6 +515,7 @@ unsigned int dm_table_get_num_targets(struct dm_table *t); fmode_t dm_table_get_mode(struct dm_table *t); struct mapped_device *dm_table_get_md(struct dm_table *t); const char *dm_table_device_name(struct dm_table *t); +int dm_table_supports_poll(struct dm_table *t); /* * Trigger an event. From patchwork Mon Feb 8 08:52:42 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jingbo Xu X-Patchwork-Id: 12074443 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,UNPARSEABLE_RELAY,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 02636C433E0 for ; Mon, 8 Feb 2021 08:59:36 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id A75D464E82 for ; Mon, 8 Feb 2021 08:59:35 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231186AbhBHI7X (ORCPT ); Mon, 8 Feb 2021 03:59:23 -0500 Received: from out30-131.freemail.mail.aliyun.com ([115.124.30.131]:41953 "EHLO out30-131.freemail.mail.aliyun.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230381AbhBHIxm (ORCPT ); Mon, 8 Feb 2021 03:53:42 -0500 X-Alimail-AntiSpam: AC=PASS;BC=-1|-1;BR=01201311R521e4;CH=green;DM=||false|;DS=||;FP=0|-1|-1|-1|0|-1|-1|-1;HT=e01e01424;MF=jefflexu@linux.alibaba.com;NM=1;PH=DS;RN=8;SR=0;TI=SMTPD_---0UO9kQj0_1612774374; Received: from localhost(mailfrom:jefflexu@linux.alibaba.com fp:SMTPD_---0UO9kQj0_1612774374) by smtp.aliyun-inc.com(127.0.0.1); Mon, 08 Feb 2021 16:52:55 +0800 From: Jeffle Xu To: snitzer@redhat.com, axboe@kernel.dk Cc: joseph.qi@linux.alibaba.com, caspar@linux.alibaba.com, hch@lst.de, linux-block@vger.kernel.org, dm-devel@redhat.com, io-uring@vger.kernel.org Subject: [PATCH v3 10/11] nvme/pci: don't wait for locked polling queue Date: Mon, 8 Feb 2021 16:52:42 +0800 Message-Id: <20210208085243.82367-11-jefflexu@linux.alibaba.com> X-Mailer: git-send-email 2.27.0 In-Reply-To: <20210208085243.82367-1-jefflexu@linux.alibaba.com> References: <20210208085243.82367-1-jefflexu@linux.alibaba.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org There's no sense waiting for the hw queue when it currently has been locked by another polling instance. The polling instance currently occupying the hw queue will help reap the completion events. It shall be safe to surrender the hw queue, as long as we could reapply for polling later. For Synchronous polling, blk_poll() will reapply for polling, since @spin is always True in this case. While For asynchronous polling, i.e. io_uring itself will reapply for polling when the previous polling returns 0. Besides, it shall do no harm to the polling performance of mq devices. Signed-off-by: Jeffle Xu --- drivers/nvme/host/pci.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c index 81e6389b2042..db164fcd04e2 100644 --- a/drivers/nvme/host/pci.c +++ b/drivers/nvme/host/pci.c @@ -1106,7 +1106,9 @@ static int nvme_poll(struct blk_mq_hw_ctx *hctx) if (!nvme_cqe_pending(nvmeq)) return 0; - spin_lock(&nvmeq->cq_poll_lock); + if (!spin_trylock(&nvmeq->cq_poll_lock)) + return 0; + found = nvme_process_cq(nvmeq); spin_unlock(&nvmeq->cq_poll_lock); From patchwork Mon Feb 8 08:52:43 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jingbo Xu X-Patchwork-Id: 12074405 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,UNPARSEABLE_RELAY,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3A491C433E0 for ; Mon, 8 Feb 2021 08:57:00 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id E138B64E7E for ; Mon, 8 Feb 2021 08:56:59 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230288AbhBHI4o (ORCPT ); Mon, 8 Feb 2021 03:56:44 -0500 Received: from out30-130.freemail.mail.aliyun.com ([115.124.30.130]:41110 "EHLO out30-130.freemail.mail.aliyun.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230408AbhBHIxp (ORCPT ); Mon, 8 Feb 2021 03:53:45 -0500 X-Alimail-AntiSpam: AC=PASS;BC=-1|-1;BR=01201311R271e4;CH=green;DM=||false|;DS=||;FP=0|-1|-1|-1|0|-1|-1|-1;HT=e01e04357;MF=jefflexu@linux.alibaba.com;NM=1;PH=DS;RN=8;SR=0;TI=SMTPD_---0UOAeIP5_1612774375; Received: from localhost(mailfrom:jefflexu@linux.alibaba.com fp:SMTPD_---0UOAeIP5_1612774375) by smtp.aliyun-inc.com(127.0.0.1); Mon, 08 Feb 2021 16:52:56 +0800 From: Jeffle Xu To: snitzer@redhat.com, axboe@kernel.dk Cc: joseph.qi@linux.alibaba.com, caspar@linux.alibaba.com, hch@lst.de, linux-block@vger.kernel.org, dm-devel@redhat.com, io-uring@vger.kernel.org Subject: [PATCH v3 11/11] dm: fastpath of bio-based polling Date: Mon, 8 Feb 2021 16:52:43 +0800 Message-Id: <20210208085243.82367-12-jefflexu@linux.alibaba.com> X-Mailer: git-send-email 2.27.0 In-Reply-To: <20210208085243.82367-1-jefflexu@linux.alibaba.com> References: <20210208085243.82367-1-jefflexu@linux.alibaba.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org Offer one fastpath of bio-based polling when bio submitted to dm device is not split. In this case, there will be only one bio submitted to only one polling hw queue of one underlying mq device, and thus we don't need to track all split bios or iterate through all polling hw queues. The pointer to the polling hw queue the bio submitted to is returned here as the returned cookie. In this case, the polling routine will call mq_ops->poll() directly with the hw queue converted from the input cookie. If the original bio submitted to dm device is split to multiple bios and thus submitted to multiple polling hw queues, the bio submission routine will return BLK_QC_T_BIO_MULTI, while the polling routine will fall back to iterating all hw queues (in polling mode) of all underlying mq devices. Signed-off-by: Jeffle Xu --- block/blk-core.c | 33 +++++++++++++++++++++++++++++++-- include/linux/blk_types.h | 8 ++++++++ include/linux/types.h | 2 +- 3 files changed, 40 insertions(+), 3 deletions(-) diff --git a/block/blk-core.c b/block/blk-core.c index 37aa513da5f2..cb24b33a4870 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -956,11 +956,19 @@ static blk_qc_t __submit_bio(struct bio *bio) * bio_list_on_stack[0] contains bios submitted by the current ->submit_bio. * bio_list_on_stack[1] contains bios that were submitted before the current * ->submit_bio_bio, but that haven't been processed yet. + * + * Return: + * - BLK_QC_T_NONE, no need for IO polling. + * - BLK_QC_T_BIO_MULTI, @bio gets split and enqueued into multi hw queues. + * - Otherwise, @bio is not split, returning the pointer to the corresponding + * hw queue that the bio enqueued into as the returned cookie. */ static blk_qc_t __submit_bio_noacct(struct bio *bio) { struct bio_list bio_list_on_stack[2]; blk_qc_t ret = BLK_QC_T_NONE; + struct request_queue *top_q = bio->bi_disk->queue; + bool poll_on = test_bit(QUEUE_FLAG_POLL, &top_q->queue_flags); BUG_ON(bio->bi_next); @@ -968,6 +976,7 @@ static blk_qc_t __submit_bio_noacct(struct bio *bio) current->bio_list = bio_list_on_stack; do { + blk_qc_t cookie; struct request_queue *q = bio->bi_disk->queue; struct bio_list lower, same; @@ -980,7 +989,20 @@ static blk_qc_t __submit_bio_noacct(struct bio *bio) bio_list_on_stack[1] = bio_list_on_stack[0]; bio_list_init(&bio_list_on_stack[0]); - ret = __submit_bio(bio); + cookie = __submit_bio(bio); + + if (poll_on && + blk_qc_t_bio_valid(ret) && blk_qc_t_valid(cookie)) { + unsigned int queue_num = blk_qc_t_to_queue_num(cookie); + struct blk_mq_hw_ctx *hctx = q->queue_hw_ctx[queue_num]; + + cookie = (blk_qc_t)hctx; + + if (!blk_qc_t_valid(ret)) /* set initial value */ + ret = cookie; + else if (ret != cookie) /* bio got split */ + ret = BLK_QC_T_BIO_MULTI; + } /* * Sort new bios into those for a lower level and those for the @@ -1003,6 +1025,7 @@ static blk_qc_t __submit_bio_noacct(struct bio *bio) } while ((bio = bio_list_pop(&bio_list_on_stack[0]))); current->bio_list = NULL; + return ret; } @@ -1142,7 +1165,13 @@ static int blk_bio_poll(struct request_queue *q, blk_qc_t cookie, bool spin) do { int ret; - ret = disk->fops->poll(q, cookie); + if (blk_qc_t_bio_valid(cookie)) { + struct blk_mq_hw_ctx *hctx = (struct blk_mq_hw_ctx *)cookie; + struct request_queue *target_q = hctx->queue; + + ret = blk_mq_poll_hctx(target_q, hctx); + } else + ret = disk->fops->poll(q, cookie); if (ret > 0) { __set_current_state(TASK_RUNNING); return ret; diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h index 2e05244fc16d..4173754532c0 100644 --- a/include/linux/blk_types.h +++ b/include/linux/blk_types.h @@ -557,6 +557,14 @@ static inline bool blk_qc_t_is_internal(blk_qc_t cookie) return (cookie & BLK_QC_T_INTERNAL) != 0; } +/* Macros for blk_qc_t used for bio-based polling */ +#define BLK_QC_T_BIO_MULTI -2U + +static inline bool blk_qc_t_bio_valid(blk_qc_t cookie) +{ + return cookie != BLK_QC_T_BIO_MULTI; +} + struct blk_rq_stat { u64 mean; u64 min; diff --git a/include/linux/types.h b/include/linux/types.h index da5ca7e1bea9..f6301014a459 100644 --- a/include/linux/types.h +++ b/include/linux/types.h @@ -126,7 +126,7 @@ typedef u64 sector_t; typedef u64 blkcnt_t; /* cookie used for IO polling */ -typedef unsigned int blk_qc_t; +typedef uintptr_t blk_qc_t; /* * The type of an index into the pagecache.