From patchwork Fri Mar 5 15:14:52 2021
X-Patchwork-Submitter: John Garry
X-Patchwork-Id: 12118659
From: John Garry
Subject: [RFC PATCH v3 1/3] blk-mq: Clean up references to old requests when
 freeing rqs
Date: Fri, 5 Mar 2021 23:14:52 +0800
Message-ID: <1614957294-188540-2-git-send-email-john.garry@huawei.com>
In-Reply-To: <1614957294-188540-1-git-send-email-john.garry@huawei.com>
References: <1614957294-188540-1-git-send-email-john.garry@huawei.com>
X-Mailing-List: linux-block@vger.kernel.org

It has been reported many times that a use-after-free can be hit
intermittently when iterating busy requests:

- https://lore.kernel.org/linux-block/8376443a-ec1b-0cef-8244-ed584b96fa96@huawei.com/
- https://lore.kernel.org/linux-block/5c3ac5af-ed81-11e4-fee3-f92175f14daf@acm.org/T/#m6c1ac11540522716f645d004e2a5a13c9f218908
- https://lore.kernel.org/linux-block/04e2f9e8-79fa-f1cb-ab23-4a15bf3f64cc@kernel.dk/

The issue is that when we switch scheduler or change queue depth, the
driver tagset may still hold references to the stale requests.

As a solution, clean up any references to those requests in the driver
tagset when they are freed. A cmpxchg is used so that the cleanup is
safe against a race with another queue setting the driver tagset
request.

Signed-off-by: John Garry
---
 block/blk-mq-sched.c |  2 +-
 block/blk-mq-tag.c   |  2 +-
 block/blk-mq.c       | 20 ++++++++++++++++++--
 block/blk-mq.h       |  2 ++
 4 files changed, 22 insertions(+), 4 deletions(-)

diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index ddb65e9e6fd9..bc19bd8f8c7b 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -615,7 +615,7 @@ void blk_mq_sched_free_requests(struct request_queue *q)
 
 	queue_for_each_hw_ctx(q, hctx, i) {
 		if (hctx->sched_tags)
-			blk_mq_free_rqs(q->tag_set, hctx->sched_tags, i);
+			blk_mq_free_rqs_ext(q->tag_set, hctx->sched_tags, i, hctx->tags);
 	}
 }
 
diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index ce813b909339..7ff1b20d58e7 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -580,7 +580,7 @@ int blk_mq_tag_update_depth(struct blk_mq_hw_ctx *hctx,
 			return -ENOMEM;
 		}
 
-		blk_mq_free_rqs(set, *tagsptr, hctx->queue_num);
+		blk_mq_free_rqs_ext(set, *tagsptr, hctx->queue_num, hctx->tags);
 		blk_mq_free_rq_map(*tagsptr, flags);
 		*tagsptr = new;
 	} else {
diff --git a/block/blk-mq.c b/block/blk-mq.c
index d4d7c1caa439..9cb60bf7ac24 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2286,8 +2286,8 @@ blk_qc_t blk_mq_submit_bio(struct bio *bio)
 	return BLK_QC_T_NONE;
 }
 
-void blk_mq_free_rqs(struct blk_mq_tag_set *set, struct blk_mq_tags *tags,
-		     unsigned int hctx_idx)
+void __blk_mq_free_rqs_ext(struct blk_mq_tag_set *set, struct blk_mq_tags *tags,
+			   unsigned int hctx_idx, struct blk_mq_tags *ref_tags)
 {
 	struct page *page;
 
@@ -2296,10 +2296,14 @@ void blk_mq_free_rqs(struct blk_mq_tag_set *set, struct blk_mq_tags *tags,
 		for (i = 0; i < tags->nr_tags; i++) {
 			struct request *rq = tags->static_rqs[i];
+			int j;
 
 			if (!rq)
 				continue;
 			set->ops->exit_request(set, rq, hctx_idx);
+			/* clean up any references which occur in @ref_tags */
+			for (j = 0; ref_tags && j < ref_tags->nr_tags; j++)
+				cmpxchg(&ref_tags->rqs[j], rq, 0);
 			tags->static_rqs[i] = NULL;
 		}
 	}
@@ -2316,6 +2320,18 @@ void blk_mq_free_rqs(struct blk_mq_tag_set *set, struct blk_mq_tags *tags,
 	}
 }
 
+void blk_mq_free_rqs_ext(struct blk_mq_tag_set *set, struct blk_mq_tags *tags,
+			 unsigned int hctx_idx, struct blk_mq_tags *ref_tags)
+{
+	__blk_mq_free_rqs_ext(set, tags, hctx_idx, ref_tags);
+}
+
+void blk_mq_free_rqs(struct blk_mq_tag_set *set, struct blk_mq_tags *tags,
+		     unsigned int hctx_idx)
+{
+	__blk_mq_free_rqs_ext(set, tags, hctx_idx, NULL);
+}
+
 void blk_mq_free_rq_map(struct blk_mq_tags *tags, unsigned int flags)
 {
 	kfree(tags->rqs);
diff --git a/block/blk-mq.h b/block/blk-mq.h
index 3616453ca28c..031e29f74926 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -53,6 +53,8 @@ struct request *blk_mq_dequeue_from_ctx(struct blk_mq_hw_ctx *hctx,
  */
 void blk_mq_free_rqs(struct blk_mq_tag_set *set, struct blk_mq_tags *tags,
 		     unsigned int hctx_idx);
+void blk_mq_free_rqs_ext(struct blk_mq_tag_set *set, struct blk_mq_tags *tags,
+			 unsigned int hctx_idx, struct blk_mq_tags *references);
 void blk_mq_free_rq_map(struct blk_mq_tags *tags, unsigned int flags);
 struct blk_mq_tags *blk_mq_alloc_rq_map(struct blk_mq_tag_set *set,
 					unsigned int hctx_idx,

From patchwork
Fri Mar 5 15:14:53 2021
X-Patchwork-Submitter: John Garry
X-Patchwork-Id: 12118655
From: John Garry
Subject: [RFC PATCH v3 2/3] blk-mq: Freeze and quiesce all queues for tagset
 in elevator_exit()
Date: Fri, 5 Mar 2021 23:14:53 +0800
Message-ID: <1614957294-188540-3-git-send-email-john.garry@huawei.com>
In-Reply-To: <1614957294-188540-1-git-send-email-john.garry@huawei.com>
References: <1614957294-188540-1-git-send-email-john.garry@huawei.com>
X-Mailing-List: linux-block@vger.kernel.org

A use-after-free may occur if blk_mq_queue_tag_busy_iter() is run on one
queue while another queue associated with the same tagset is switching
IO scheduler:

BUG: KASAN: use-after-free in bt_iter+0xa0/0x120
Read of size 8 at addr ffff0410285e7e00 by task fio/2302

CPU: 24 PID: 2302 Comm: fio Not tainted 5.12.0-rc1-11925-g29a317e228d9 #747
Hardware name: Huawei Taishan 2280 /D05, BIOS Hisilicon D05 IT21 Nemo 2.0 RC0 04/18/2018
Call trace:
 dump_backtrace+0x0/0x2d8
 show_stack+0x18/0x68
 dump_stack+0x124/0x1a0
 print_address_description.constprop.13+0x68/0x30c
 kasan_report+0x1e8/0x258
 __asan_load8+0x9c/0xd8
 bt_iter+0xa0/0x120
 blk_mq_queue_tag_busy_iter+0x348/0x5d8
 blk_mq_in_flight+0x80/0xb8
 part_stat_show+0xcc/0x210
 dev_attr_show+0x44/0x90
 sysfs_kf_seq_show+0x120/0x1c0
 kernfs_seq_show+0x9c/0xb8
 seq_read_iter+0x214/0x668
 kernfs_fop_read_iter+0x204/0x2c0
 new_sync_read+0x1ec/0x2d0
 vfs_read+0x18c/0x248
 ksys_read+0xc8/0x178
 __arm64_sys_read+0x44/0x58
 el0_svc_common.constprop.1+0xc8/0x1a8
 do_el0_svc+0x90/0xa0
 el0_svc+0x24/0x38
 el0_sync_handler+0x90/0xb8
 el0_sync+0x154/0x180

Indeed, blk_mq_queue_tag_busy_iter() does take a reference to its queue
usage counter when called, and the queue cannot be frozen to switch IO
scheduler until all such references are dropped. This ensures that
blk_mq_queue_tag_busy_iter() sees no stale references to IO scheduler
requests from its own queue.

However, nothing stops blk_mq_queue_tag_busy_iter() being run for
another queue associated with the same tagset, where it may see a stale
IO scheduler request from the first queue after those requests have been
freed.

To stop this happening, freeze and quiesce all queues associated with
the tagset while the elevator is exited.
Signed-off-by: John Garry
Reviewed-by: Bart Van Assche
Signed-off-by: Bart Van Assche
---
I think that this patch is what Bart suggested:
https://lore.kernel.org/linux-block/c0d127a9-9320-6e1c-4e8d-412aa9ea9ca6@acm.org/

 block/blk.h | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/block/blk.h b/block/blk.h
index 3b53e44b967e..1a948bfd91e4 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -201,10 +201,29 @@ void elv_unregister_queue(struct request_queue *q);
 static inline void elevator_exit(struct request_queue *q,
 		struct elevator_queue *e)
 {
+	struct blk_mq_tag_set *set = q->tag_set;
+	struct request_queue *tmp;
+
 	lockdep_assert_held(&q->sysfs_lock);
 
+	mutex_lock(&set->tag_list_lock);
+	list_for_each_entry(tmp, &set->tag_list, tag_set_list) {
+		if (tmp == q)
+			continue;
+		blk_mq_freeze_queue(tmp);
+		blk_mq_quiesce_queue(tmp);
+	}
+
 	blk_mq_sched_free_requests(q);
 	__elevator_exit(q, e);
+
+	list_for_each_entry(tmp, &set->tag_list, tag_set_list) {
+		if (tmp == q)
+			continue;
+		blk_mq_unquiesce_queue(tmp);
+		blk_mq_unfreeze_queue(tmp);
+	}
+	mutex_unlock(&set->tag_list_lock);
 }
 
 ssize_t part_size_show(struct device *dev, struct device_attribute *attr,

From patchwork Fri Mar 5 15:14:54 2021
X-Patchwork-Submitter: John Garry
X-Patchwork-Id: 12118657
From: John Garry
Subject: [RFC PATCH v3 3/3] blk-mq: Lockout tagset iterator when exiting
 elevator
Date: Fri, 5 Mar 2021 23:14:54 +0800
Message-ID: <1614957294-188540-4-git-send-email-john.garry@huawei.com>
In-Reply-To: <1614957294-188540-1-git-send-email-john.garry@huawei.com>
References: <1614957294-188540-1-git-send-email-john.garry@huawei.com>
X-Mailing-List: linux-block@vger.kernel.org

All queues associated with a tagset are frozen when one queue is exiting
its elevator. This is to ensure that a queue running
blk_mq_queue_tag_busy_iter() cannot hold a stale request reference for
the queue that is exiting the elevator.

However, nothing stops blk_mq_all_tag_iter() being run for the tagset
and, again, getting hold of a stale request reference.
A KASAN use-after-free can be triggered in this scenario:

BUG: KASAN: use-after-free in bt_tags_iter+0xe0/0x128
Read of size 4 at addr ffff001085330fcc by task more/3038

CPU: 1 PID: 3038 Comm: more Not tainted 5.12.0-rc1-11926-g7359e4a1604d-dirty #750
Hardware name: Huawei Taishan 2280 /D05, BIOS Hisilicon D05 IT21 Nemo 2.0 RC0 04/18/2018
Call trace:
 dump_backtrace+0x0/0x2d0
 show_stack+0x18/0x68
 dump_stack+0x100/0x16c
 print_address_description.constprop.13+0x68/0x30c
 kasan_report+0x1d8/0x240
 __asan_load4+0x9c/0xd8
 bt_tags_iter+0xe0/0x128
 __blk_mq_all_tag_iter+0x320/0x3a8
 blk_mq_tagset_busy_iter+0x84/0xb8
 scsi_host_busy+0x88/0xb8
 show_host_busy+0x1c/0x48
 dev_attr_show+0x44/0x90
 sysfs_kf_seq_show+0x128/0x1c8
 kernfs_seq_show+0xa0/0xb8
 seq_read_iter+0x210/0x660
 kernfs_fop_read_iter+0x208/0x2b0
 new_sync_read+0x1ec/0x2d0
 vfs_read+0x188/0x248
 ksys_read+0xc8/0x178
 __arm64_sys_read+0x44/0x58
 el0_svc_common.constprop.1+0xc4/0x190
 do_el0_svc+0x90/0xa0
 el0_svc+0x24/0x38
 el0_sync_handler+0x90/0xb8
 el0_sync+0x154/0x180

To avoid this, reject the tagset iterators while a queue is exiting its
elevator. This should not break any semantics of blk_mq_all_tag_iter():
since all queues are frozen at that point, there should be no active
tags to iterate.
Signed-off-by: John Garry
---
 block/blk-mq-tag.c     | 5 +++++
 block/blk-mq.c         | 1 +
 block/blk.h            | 4 ++++
 include/linux/blk-mq.h | 1 +
 4 files changed, 11 insertions(+)

diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index 7ff1b20d58e7..5950fee490e8 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -358,11 +358,16 @@ void blk_mq_tagset_busy_iter(struct blk_mq_tag_set *tagset,
 {
 	int i;
 
+	if (!atomic_inc_not_zero(&tagset->iter_usage_counter))
+		return;
+
 	for (i = 0; i < tagset->nr_hw_queues; i++) {
 		if (tagset->tags && tagset->tags[i])
 			__blk_mq_all_tag_iter(tagset->tags[i], fn, priv,
					      BT_TAG_ITER_STARTED);
 	}
+
+	atomic_dec(&tagset->iter_usage_counter);
 }
 EXPORT_SYMBOL(blk_mq_tagset_busy_iter);
 
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 9cb60bf7ac24..326e1b0e5b83 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -3493,6 +3493,7 @@ int blk_mq_alloc_tag_set(struct blk_mq_tag_set *set)
 			goto out_free_mq_rq_maps;
 		}
 	}
+	atomic_set(&set->iter_usage_counter, 1);
 
 	mutex_init(&set->tag_list_lock);
 	INIT_LIST_HEAD(&set->tag_list);
diff --git a/block/blk.h b/block/blk.h
index 1a948bfd91e4..461e5b54eb5f 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -214,9 +214,13 @@ static inline void elevator_exit(struct request_queue *q,
 		blk_mq_quiesce_queue(tmp);
 	}
 
+	while (atomic_cmpxchg(&set->iter_usage_counter, 1, 0) != 1);
+
 	blk_mq_sched_free_requests(q);
 	__elevator_exit(q, e);
 
+	atomic_set(&set->iter_usage_counter, 1);
+
 	list_for_each_entry(tmp, &set->tag_list, tag_set_list) {
 		if (tmp == q)
 			continue;
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 2c473c9b8990..30a21335767b 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -263,6 +263,7 @@ struct blk_mq_tag_set {
 
 	struct mutex tag_list_lock;
 	struct list_head tag_list;
+	atomic_t iter_usage_counter;
 };
 
 /**