From patchwork Sat May 9 03:10:54 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Luis Chamberlain X-Patchwork-Id: 11537927 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 7D1DC139F for ; Sat, 9 May 2020 03:11:09 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 3AE91216FD for ; Sat, 9 May 2020 03:11:09 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 3AE91216FD Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=kernel.org Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id CCE1F90002B; Fri, 8 May 2020 23:11:05 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id C59B190001C; Fri, 8 May 2020 23:11:05 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id A33E290002B; Fri, 8 May 2020 23:11:05 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0187.hostedemail.com [216.40.44.187]) by kanga.kvack.org (Postfix) with ESMTP id 851E790001C for ; Fri, 8 May 2020 23:11:05 -0400 (EDT) Received: from smtpin20.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay03.hostedemail.com (Postfix) with ESMTP id 4B3CE824805A for ; Sat, 9 May 2020 03:11:05 +0000 (UTC) X-FDA: 76795704090.20.clock54_7ddc54db3023a X-Spam-Summary: 
10,1,0,63bb569d29f8f7c4,d41d8cd98f00b204,mcgrof@gmail.com,,RULES_HIT:1:41:69:355:379:541:800:960:966:968:973:982:988:989:1260:1311:1314:1345:1359:1437:1515:1605:1730:1747:1777:1792:2194:2196:2198:2199:2200:2201:2393:2553:2559:2562:2637:2689:2693:2892:3138:3139:3140:3141:3142:3865:3866:3867:3868:3870:3871:3872:3874:4031:4321:4385:4605:5007:6261:6742:7875:7903:8604:8660:9108:9389:9393:9592:10007:11026:11232:11473:11658:11914:12043:12048:12291:12296:12297:12438:12517:12519:12555:12679:12683:12895:12986:13148:13149:13161:13229:13230:13868:13894:13972:14394:21063:21080:21212:21324:21433:21444:21451:21627:21740:21972:21987:21990:30012:30054:30064:30075:30090:30091,0,RBL:209.85.210.193:@gmail.com:.lbl8.mailshell.net-66.100.201.100 62.50.0.100,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:fp,MSBL:0,DNSBL:none,Custom_rules:0:1:0,LFtime:27,LUA_SUMMARY:none X-HE-Tag: clock54_7ddc54db3023a X-Filterd-Recvd-Size: 14493 Received: from mail-pf1-f193.google.com (mail-pf1-f193.google.com [209.85.210.193]) by imf08.hostedemail.com (Postfix) with ESMTP for ; Sat, 9 May 2020 03:11:04 +0000 (UTC) Received: by mail-pf1-f193.google.com with SMTP id w65so1976788pfc.12 for ; Fri, 08 May 2020 20:11:04 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=/hs1FsqzUCeFOKYWDLktPBXjXEQEnvc6B5fC1IA2tJs=; b=UmYs1/QL+BqOwoFwOLybhAy947IF9GCejmBhrJn8oShknQvz1UgRaq5BqKyGoc6OwO 7RME/JWPhwa458hN/XMwWxRM7B6Pkz5OpiYCt5+1DO48Ina9BjTj+nKMTxK8ofPJDozX Y8kHlq25eWxmRq3T/vJ4yr7DudMiC0idKJPV+HqA8AL/0eNtU8V/A/YvpiV0puUWlMas M8MjzlbMo0HHF59ZLgzz7IvqclbCZLBlcx7yL66rx3L6+harJsDwU7EjyCIbCl22RITB 8audhwiNqhRDW+Cfyj6PnUlMF5c0MF2+ybmoQO+vr4hX/DN+SXDtk7farZ9cV15kLXvJ S08g== X-Gm-Message-State: AGi0PuZL1b/lCIMtnvWh7cHipkKsVA7EoNn03D5ppeN+1u4Wq7p3Wqd3 kjQDskVR2hZ0yqqDkPJ1rQQ= X-Google-Smtp-Source: 
APiQypJIdZZf4fzJDIak5s4qUDa2dh0roVOHepWDb5ro3z6Ko9AyY0pr0+1Yd9BRw31ZoJl0sOgIzg== X-Received: by 2002:a63:115a:: with SMTP id 26mr4759653pgr.354.1588993863961; Fri, 08 May 2020 20:11:03 -0700 (PDT) Received: from 42.do-not-panic.com (42.do-not-panic.com. [157.230.128.187]) by smtp.gmail.com with ESMTPSA id g16sm3198429pfq.203.2020.05.08.20.11.00 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 08 May 2020 20:11:00 -0700 (PDT) Received: by 42.do-not-panic.com (Postfix, from userid 1000) id C63964063E; Sat, 9 May 2020 03:10:59 +0000 (UTC) From: Luis Chamberlain To: axboe@kernel.dk, viro@zeniv.linux.org.uk, bvanassche@acm.org, gregkh@linuxfoundation.org, rostedt@goodmis.org, mingo@redhat.com, jack@suse.cz, ming.lei@redhat.com, nstange@suse.de, akpm@linux-foundation.org Cc: mhocko@suse.com, yukuai3@huawei.com, linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Luis Chamberlain , Omar Sandoval , Hannes Reinecke , Michal Hocko , Christoph Hellwig Subject: [PATCH v4 1/5] block: revert back to synchronous request_queue removal Date: Sat, 9 May 2020 03:10:54 +0000 Message-Id: <20200509031058.8239-2-mcgrof@kernel.org> X-Mailer: git-send-email 2.23.0.rc1 In-Reply-To: <20200509031058.8239-1-mcgrof@kernel.org> References: <20200509031058.8239-1-mcgrof@kernel.org> MIME-Version: 1.0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Commit dc9edc44de6c ("block: Fix a blk_exit_rl() regression") merged on v4.12 moved the work behind blk_release_queue() into a workqueue after a splat floated around which indicated some work on blk_release_queue() could sleep in blk_exit_rl(). This splat would be possible when a driver called blk_put_queue() or blk_cleanup_queue() (which calls blk_put_queue() as its final call) from an atomic context. 
blk_put_queue() decrements the refcount of the request_queue kobject, and blk_release_queue() is called once it reaches 0. Although blk_exit_rl() has since been removed by commit db6d9952356 ("block: remove request_list code") in v5.0, we reserve the right to sleep within blk_release_queue() context. The last reference to the request_queue must therefore not be dropped from atomic context.

*When* the last reference to the request_queue is dropped varies, so take the opportunity to document when that is expected to happen, and to document the context of the related calls as best as possible, so that we can avoid future issues and in the hope that synchronous request_queue removal sticks.

We revert back to synchronous request_queue removal because asynchronous removal creates a regression in the userspace interaction expected with several drivers. For example, the loopback driver is removed through ioctls from userspace, and once the ioctl returns successfully one expects the device to be gone. Likewise, if one races to add another device, the new one may not be added because the old one is still being removed. Synchronous removal was the expected behavior before; with asynchronous removal these operations fail because the device is still present and busy. Moving to asynchronous request_queue removal could have broken many scripts which relied on the removal having completed if there was no error. Document this expectation as well so that this doesn't regress userspace again.

Using asynchronous request_queue removal has, however, helped us find other bugs. In the future we can test what could break with this arrangement by enabling CONFIG_DEBUG_KOBJECT_RELEASE.
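
To illustrate the difference userspace observes, here is a minimal userspace sketch, not kernel code. The toy_* names are hypothetical and only model a refcount whose final put either releases immediately or defers the release to a later worker:

```c
#include <stdbool.h>

/* Toy stand-in for a refcounted object; this is NOT struct kobject. */
struct toy_obj {
	int refcount;
	bool present;         /* "device is still visible to userspace" */
	bool release_pending; /* only used by the deferred variant */
};

/* Release callback: after this the object is gone. */
void toy_release(struct toy_obj *o)
{
	o->present = false;
}

/* Synchronous variant: the release runs inside the final put. */
void toy_put_sync(struct toy_obj *o)
{
	if (--o->refcount == 0)
		toy_release(o);
}

/* Deferred variant: the final put only marks the release as pending. */
void toy_put_deferred(struct toy_obj *o)
{
	if (--o->refcount == 0)
		o->release_pending = true;
}

/* Stand-in for the worker that eventually runs the deferred release. */
void toy_run_pending(struct toy_obj *o)
{
	if (o->release_pending) {
		o->release_pending = false;
		toy_release(o);
	}
}
```

With toy_put_sync() the object is already gone when the final put returns. With toy_put_deferred() it stays present until toy_run_pending() runs, which is the window in which a script that removes and immediately re-adds a device can fail.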
Cc: Bart Van Assche Cc: Omar Sandoval Cc: Hannes Reinecke Cc: Nicolai Stange Cc: Greg Kroah-Hartman Cc: Michal Hocko Cc: yu kuai Suggested-by: Nicolai Stange Fixes: dc9edc44de6c ("block: Fix a blk_exit_rl() regression") Reviewed-by: Christoph Hellwig Signed-off-by: Luis Chamberlain Reviewed-by: Bart Van Assche --- block/blk-core.c | 23 +++++++++++++ block/blk-sysfs.c | 43 +++++++++++++------------ block/genhd.c | 73 +++++++++++++++++++++++++++++++++++++++++- include/linux/blkdev.h | 2 -- 4 files changed, 117 insertions(+), 24 deletions(-) diff --git a/block/blk-core.c b/block/blk-core.c index dccdae09b7b6..da120fd257fa 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -306,6 +306,16 @@ void blk_clear_pm_only(struct request_queue *q) } EXPORT_SYMBOL_GPL(blk_clear_pm_only); +/** + * blk_put_queue - decrement the request_queue refcount + * @q: the request_queue structure to decrement the refcount for + * + * Decrements the refcount of the request_queue kobject. When this reaches 0 + * we'll have blk_release_queue() called. + * + * Context: Any context, but the last reference must not be dropped from + * atomic context. + */ void blk_put_queue(struct request_queue *q) { kobject_put(&q->kobj); @@ -337,9 +347,14 @@ EXPORT_SYMBOL_GPL(blk_set_queue_dying); * * Mark @q DYING, drain all pending requests, mark @q DEAD, destroy and * put it. All future requests will be failed immediately with -ENODEV. + * + * Context: can sleep */ void blk_cleanup_queue(struct request_queue *q) { + /* cannot be called from atomic context */ + might_sleep(); + WARN_ON_ONCE(blk_queue_registered(q)); /* mark @q DYING, no new request or merges will be allowed afterwards */ @@ -584,6 +599,14 @@ struct request_queue *blk_alloc_queue(make_request_fn make_request, int node_id) } EXPORT_SYMBOL(blk_alloc_queue); +/** + * blk_get_queue - increment the request_queue refcount + * @q: the request_queue structure to increment the refcount for + * + * Increment the refcount of the request_queue kobject. 
+ * + * Context: Any context. + */ bool blk_get_queue(struct request_queue *q) { if (likely(!blk_queue_dying(q))) { diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c index fca9b158f4a0..5d0fc165a036 100644 --- a/block/blk-sysfs.c +++ b/block/blk-sysfs.c @@ -860,22 +860,32 @@ static void blk_exit_queue(struct request_queue *q) bdi_put(q->backing_dev_info); } - /** - * __blk_release_queue - release a request queue - * @work: pointer to the release_work member of the request queue to be released + * blk_release_queue - releases all allocated resources of the request_queue + * @kobj: pointer to a kobject, whose container is a request_queue + * + * This function releases all allocated resources of the request queue. + * + * The struct request_queue refcount is incremented with blk_get_queue() and + * decremented with blk_put_queue(). Once the refcount reaches 0 this function + * is called. + * + * For drivers that have a request_queue on a gendisk and added with + * __device_add_disk() the refcount to request_queue will reach 0 with + * the last put_disk() called by the driver. For drivers which don't use + * __device_add_disk() this happens with blk_cleanup_queue(). * - * Description: - * This function is called when a block device is being unregistered. The - * process of releasing a request queue starts with blk_cleanup_queue, which - * set the appropriate flags and then calls blk_put_queue, that decrements - * the reference counter of the request queue. Once the reference counter - * of the request queue reaches zero, blk_release_queue is called to release - * all allocated resources of the request queue. + * Drivers exist which depend on the release of the request_queue to be + * synchronous, it should not be deferred. 
+ * + * Context: can sleep */ -static void __blk_release_queue(struct work_struct *work) +static void blk_release_queue(struct kobject *kobj) { - struct request_queue *q = container_of(work, typeof(*q), release_work); + struct request_queue *q = + container_of(kobj, struct request_queue, kobj); + + might_sleep(); if (test_bit(QUEUE_FLAG_POLL_STATS, &q->queue_flags)) blk_stat_remove_callback(q, q->poll_cb); @@ -904,15 +914,6 @@ static void __blk_release_queue(struct work_struct *work) call_rcu(&q->rcu_head, blk_free_queue_rcu); } -static void blk_release_queue(struct kobject *kobj) -{ - struct request_queue *q = - container_of(kobj, struct request_queue, kobj); - - INIT_WORK(&q->release_work, __blk_release_queue); - schedule_work(&q->release_work); -} - static const struct sysfs_ops queue_sysfs_ops = { .show = queue_attr_show, .store = queue_attr_store, diff --git a/block/genhd.c b/block/genhd.c index c05d509877fa..901567d70390 100644 --- a/block/genhd.c +++ b/block/genhd.c @@ -897,11 +897,32 @@ static void invalidate_partition(struct gendisk *disk, int partno) bdput(bdev); } +/** + * del_gendisk - remove the gendisk + * @disk: the struct gendisk to remove + * + * Removes the gendisk and all its associated resources. This deletes the + * partitions associated with the gendisk, and unregisters the associated + * request_queue. + * + * This is the counter to the respective __device_add_disk() call. + * + * The final removal of the struct gendisk happens when its refcount reaches 0 + * with put_disk(), which should be called after del_gendisk(), if + * __device_add_disk() was used. + * + * Drivers exist which depend on the release of the gendisk to be synchronous, + * it should not be deferred. 
+ * + * Context: can sleep + */ void del_gendisk(struct gendisk *disk) { struct disk_part_iter piter; struct hd_struct *part; + might_sleep(); + blk_integrity_del(disk); disk_del_events(disk); @@ -992,11 +1013,15 @@ static ssize_t disk_badblocks_store(struct device *dev, * * This function gets the structure containing partitioning * information for the given device @devt. + * + * Context: can sleep */ struct gendisk *get_gendisk(dev_t devt, int *partno) { struct gendisk *disk = NULL; + might_sleep(); + if (MAJOR(devt) != BLOCK_EXT_MAJOR) { struct kobject *kobj; @@ -1528,10 +1553,31 @@ int disk_expand_part_tbl(struct gendisk *disk, int partno) return 0; } +/** + * disk_release - releases all allocated resources of the gendisk + * @dev: the device representing this disk + * + * This function releases all allocated resources of the gendisk. + * + * The struct gendisk refcount is incremented with get_gendisk() or + * get_disk_and_module(), and its refcount is decremented with + * put_disk_and_module() or put_disk(). Once the refcount reaches 0 this + * function is called. + * + * Drivers which used __device_add_disk() have a gendisk with a request_queue + * assigned. Since the request_queue sits on top of the gendisk for these + * drivers we also call blk_put_queue() for them, and we expect the + * request_queue refcount to reach 0 at this point, and so the request_queue + * will also be freed prior to the disk. 
+ * + * Context: can sleep + */ static void disk_release(struct device *dev) { struct gendisk *disk = dev_to_disk(dev); + might_sleep(); + blk_free_devt(dev->devt); disk_release_events(disk); kfree(disk->random); @@ -1737,6 +1783,15 @@ struct gendisk *__alloc_disk_node(int minors, int node_id) } EXPORT_SYMBOL(__alloc_disk_node); +/** + * get_disk_and_module - increments the gendisk and gendisk fops module refcount + * @disk: the struct gendisk to increment the refcount for + * + * This increments the refcount for the struct gendisk, and the gendisk's + * fops module owner. + * + * Context: Any context. + */ struct kobject *get_disk_and_module(struct gendisk *disk) { struct module *owner; @@ -1757,6 +1812,16 @@ struct kobject *get_disk_and_module(struct gendisk *disk) } EXPORT_SYMBOL(get_disk_and_module); +/** + * put_disk - decrements the gendisk refcount + * @disk: the struct gendisk to decrement the refcount for + * + * This decrements the refcount for the struct gendisk. When this reaches 0 + * we'll have disk_release() called. + * + * Context: Any context, but the last reference must not be dropped from + * atomic context. + */ void put_disk(struct gendisk *disk) { if (disk) @@ -1764,9 +1829,15 @@ void put_disk(struct gendisk *disk) } EXPORT_SYMBOL(put_disk); -/* +/** + * put_disk_and_module - decrements the module and gendisk refcount + * @disk: the struct gendisk to decrement the refcount for + * * This is a counterpart of get_disk_and_module() and thus also of * get_gendisk(). + * + * Context: Any context, but the last reference must not be dropped from + * atomic context.
*/ void put_disk_and_module(struct gendisk *disk) { diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index f00bd4042295..3122a93c7277 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -571,8 +571,6 @@ struct request_queue { size_t cmd_size; - struct work_struct release_work; - #define BLK_MAX_WRITE_HINTS 5 u64 write_hints[BLK_MAX_WRITE_HINTS]; }; From patchwork Sat May 9 03:10:55 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Luis Chamberlain X-Patchwork-Id: 11537941 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id B8209139F for ; Sat, 9 May 2020 03:11:14 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 8F8BD218AC for ; Sat, 9 May 2020 03:11:14 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 8F8BD218AC Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=kernel.org Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id DB93090002D; Fri, 8 May 2020 23:11:07 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id D6AF290001C; Fri, 8 May 2020 23:11:07 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id C304890002D; Fri, 8 May 2020 23:11:07 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0192.hostedemail.com [216.40.44.192]) by kanga.kvack.org (Postfix) with ESMTP id A7B5E90001C for ; Fri, 8 May 2020 23:11:07 -0400 (EDT) Received: from smtpin30.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by 
forelay01.hostedemail.com (Postfix) with ESMTP id 6D4FE180AD811 for ; Sat, 9 May 2020 03:11:07 +0000 (UTC) X-FDA: 76795704174.30.seed36_7e2b4b233d357 X-Spam-Summary: 2,0,0,f73cffafd30154fd,d41d8cd98f00b204,mcgrof@gmail.com,,RULES_HIT:41:355:379:541:800:960:973:988:989:1260:1311:1314:1345:1359:1437:1515:1535:1543:1711:1730:1747:1777:1792:2393:2559:2562:3138:3139:3140:3141:3142:3165:3354:3867:3868:3872:4117:4321:4605:5007:6261:6742:9592:10004:11026:11232:11473:11658:11914:12043:12048:12296:12297:12438:12517:12519:12555:12895:12986:13894:13972:14093:14096:14181:14394:14721:21080:21222:21444:21451:21627:21990:30012:30054:30064,0,RBL:209.85.214.193:@gmail.com:.lbl8.mailshell.net-66.100.201.100 62.50.0.100,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:fp,MSBL:0,DNSBL:none,Custom_rules:0:0:0,LFtime:23,LUA_SUMMARY:none X-HE-Tag: seed36_7e2b4b233d357 X-Filterd-Recvd-Size: 6081 Received: from mail-pl1-f193.google.com (mail-pl1-f193.google.com [209.85.214.193]) by imf31.hostedemail.com (Postfix) with ESMTP for ; Sat, 9 May 2020 03:11:06 +0000 (UTC) Received: by mail-pl1-f193.google.com with SMTP id s10so1592809plr.1 for ; Fri, 08 May 2020 20:11:06 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=GsoZPFNxO+4hxSpvgE4DeJVl8mNmT5WoVOtsmVVbdVk=; b=cygK31KAm9ZrPYTfZxpaaTlI7AO1vkc+k7tceAK89/ptUUA3tUNI8RCpukJaC7rTjr soNy7QdoQQaW6cevIeHcmUq6d1fY1tQ619m1rqqVhyLCczK+zw+sqPbLn6K56tm+YWOb Mmn/Kq5jItKWOSHr0zkB2ygbJoTiJ/MjIgdsICf144511pcZjH4peTzkaU2S7Z2xwU0p tG3tdNEbvLcgaA4Q12ynqqR6aZl5wtcr0ODbTLbRgH1wbj7h3tHjXTF0HqF+pllgpzBw UTv9MqnKLok7COVyVAiWzNdDEzM1fuRmq8/g2EFw2clJApefBY5Xzq0VU1U58WYO8dHm b1zQ== X-Gm-Message-State: AGi0PuY4NBWEjYiAHUVNTR+PX9+Su4NmoYI82UBCwpI3KuS2aT1df8wQ z1mdnT0196YCpwnlCV/iFgt/bLGYApc= X-Google-Smtp-Source: 
APiQypIjGBDDLRX5XOPh0RRN+PZ/cn6kVp6wDqUKRMqjvQKgal0TNIvxddybqbqILVcuM2mK1vF1CQ== X-Received: by 2002:a17:90a:77c6:: with SMTP id e6mr9324761pjs.84.1588993865942; Fri, 08 May 2020 20:11:05 -0700 (PDT) Received: from 42.do-not-panic.com (42.do-not-panic.com. [157.230.128.187]) by smtp.gmail.com with ESMTPSA id i10sm3183476pfa.166.2020.05.08.20.11.00 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 08 May 2020 20:11:00 -0700 (PDT) Received: by 42.do-not-panic.com (Postfix, from userid 1000) id D7C7141405; Sat, 9 May 2020 03:10:59 +0000 (UTC) From: Luis Chamberlain To: axboe@kernel.dk, viro@zeniv.linux.org.uk, bvanassche@acm.org, gregkh@linuxfoundation.org, rostedt@goodmis.org, mingo@redhat.com, jack@suse.cz, ming.lei@redhat.com, nstange@suse.de, akpm@linux-foundation.org Cc: mhocko@suse.com, yukuai3@huawei.com, linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Luis Chamberlain , Omar Sandoval , Hannes Reinecke , Michal Hocko , Christoph Hellwig Subject: [PATCH v4 2/5] block: move main block debugfs initialization to its own file Date: Sat, 9 May 2020 03:10:55 +0000 Message-Id: <20200509031058.8239-3-mcgrof@kernel.org> X-Mailer: git-send-email 2.23.0.rc1 In-Reply-To: <20200509031058.8239-1-mcgrof@kernel.org> References: <20200509031058.8239-1-mcgrof@kernel.org> MIME-Version: 1.0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: make_request-based drivers and request-based drivers share some debugfs code. Moving this code into its own file makes it easier to expand and audit. This patch contains no functional changes.
Cc: Bart Van Assche Cc: Omar Sandoval Cc: Hannes Reinecke Cc: Nicolai Stange Cc: Greg Kroah-Hartman Cc: Michal Hocko Cc: yu kuai Reviewed-by: Christoph Hellwig Reviewed-by: Greg Kroah-Hartman Reviewed-by: Bart Van Assche Signed-off-by: Luis Chamberlain --- block/Makefile | 1 + block/blk-core.c | 9 +-------- block/blk-debugfs.c | 15 +++++++++++++++ block/blk.h | 7 +++++++ 4 files changed, 24 insertions(+), 8 deletions(-) create mode 100644 block/blk-debugfs.c diff --git a/block/Makefile b/block/Makefile index 206b96e9387f..1d3ab20505d8 100644 --- a/block/Makefile +++ b/block/Makefile @@ -10,6 +10,7 @@ obj-$(CONFIG_BLOCK) := bio.o elevator.o blk-core.o blk-sysfs.o \ blk-mq-sysfs.o blk-mq-cpumap.o blk-mq-sched.o ioctl.o \ genhd.o ioprio.o badblocks.o partitions/ blk-rq-qos.o +obj-$(CONFIG_DEBUG_FS) += blk-debugfs.o obj-$(CONFIG_BOUNCE) += bounce.o obj-$(CONFIG_BLK_SCSI_REQUEST) += scsi_ioctl.o obj-$(CONFIG_BLK_DEV_BSG) += bsg.o diff --git a/block/blk-core.c b/block/blk-core.c index da120fd257fa..0a34b299275e 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -49,10 +49,6 @@ #include "blk-pm.h" #include "blk-rq-qos.h" -#ifdef CONFIG_DEBUG_FS -struct dentry *blk_debugfs_root; -#endif - EXPORT_TRACEPOINT_SYMBOL_GPL(block_bio_remap); EXPORT_TRACEPOINT_SYMBOL_GPL(block_rq_remap); EXPORT_TRACEPOINT_SYMBOL_GPL(block_bio_complete); @@ -1813,10 +1809,7 @@ int __init blk_dev_init(void) blk_requestq_cachep = kmem_cache_create("request_queue", sizeof(struct request_queue), 0, SLAB_PANIC, NULL); - -#ifdef CONFIG_DEBUG_FS - blk_debugfs_root = debugfs_create_dir("block", NULL); -#endif + blk_debugfs_register(); return 0; } diff --git a/block/blk-debugfs.c b/block/blk-debugfs.c new file mode 100644 index 000000000000..19091e1effc0 --- /dev/null +++ b/block/blk-debugfs.c @@ -0,0 +1,15 @@ +// SPDX-License-Identifier: GPL-2.0 + +/* + * Shared request-based / make_request-based functionality + */ +#include +#include +#include + +struct dentry *blk_debugfs_root; + +void 
blk_debugfs_register(void) +{ + blk_debugfs_root = debugfs_create_dir("block", NULL); +} diff --git a/block/blk.h b/block/blk.h index 73bd3b1c6938..ec16e8a6049e 100644 --- a/block/blk.h +++ b/block/blk.h @@ -456,5 +456,12 @@ struct request_queue *__blk_alloc_queue(int node_id); int __bio_add_pc_page(struct request_queue *q, struct bio *bio, struct page *page, unsigned int len, unsigned int offset, bool *same_page); +#ifdef CONFIG_DEBUG_FS +void blk_debugfs_register(void); +#else +static inline void blk_debugfs_register(void) +{ +} +#endif /* CONFIG_DEBUG_FS */ #endif /* BLK_INTERNAL_H */ From patchwork Sat May 9 03:10:56 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Luis Chamberlain X-Patchwork-Id: 11537937 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 49F8A1668 for ; Sat, 9 May 2020 03:11:12 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id E0453216FD for ; Sat, 9 May 2020 03:11:11 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org E0453216FD Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=kernel.org Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 5F1BA90002C; Fri, 8 May 2020 23:11:07 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 553A290001C; Fri, 8 May 2020 23:11:07 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 3CCDE90002C; Fri, 8 May 2020 23:11:07 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0196.hostedemail.com 
[216.40.44.196]) by kanga.kvack.org (Postfix) with ESMTP id 14A8990001C for ; Fri, 8 May 2020 23:11:07 -0400 (EDT) Received: from smtpin03.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay04.hostedemail.com (Postfix) with ESMTP id B8C6D52A6 for ; Sat, 9 May 2020 03:11:06 +0000 (UTC) X-FDA: 76795704132.03.match32_7e0e2e7f3ea51 X-Spam-Summary: 2,0,0,80efefb2c46f540e,d41d8cd98f00b204,mcgrof@gmail.com,,RULES_HIT:327:355:379:541:960:965:966:968:973:982:988:989:1260:1311:1314:1345:1359:1437:1515:1605:1730:1747:1777:1792:1981:2194:2196:2198:2199:2200:2201:2393:2559:2562:2693:2901:2903:3138:3139:3140:3141:3142:3865:3866:3867:3868:3870:3871:3872:3873:3874:4250:4321:4385:4386:4390:4395:4605:5007:6238:6261:6742:7208:7875:7903:7904:7974:8527:8660:9121:9163:9165:9389:9592:10004:10226:10394:11026:11233:11657:11914:12043:12048:12291:12295:12296:12297:12438:12517:12519:12555:12679:12683:12740:12895:12986:13148:13161:13221:13229:13230:13870:13894:13972:14394:21063:21080:21222:21324:21444:21451:21611:21622:21740:21773:21789:21939:21966:21987:21990:30003:30012:30029:30045:30051:30054:30056:30064:30070:30076:30090,0,RBL:209.85.210.196:@gmail.com:.lbl8.mailshell.net-62.50.0.100 66.100.201.100,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:fp,MSBL:0,DNSBL:none,Custom_rules:0:0:0,LFtime:2 4,LUA_SU X-HE-Tag: match32_7e0e2e7f3ea51 X-Filterd-Recvd-Size: 35081 Received: from mail-pf1-f196.google.com (mail-pf1-f196.google.com [209.85.210.196]) by imf34.hostedemail.com (Postfix) with ESMTP for ; Sat, 9 May 2020 03:11:06 +0000 (UTC) Received: by mail-pf1-f196.google.com with SMTP id d184so1994758pfd.4 for ; Fri, 08 May 2020 20:11:06 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=c+N98j7Uw/XgQMz0eucqhw8vB6Ont3YdqTGvm5jrEkk=; 
b=pQGCCcxSkvxaJotY+o1m5UmbYkeX+ND9QFi0ljwUGMNtk3f8U0DLm9Rs3mDSS9xeQj 8JBY1+Ngobz14q7ziqMhbneW5evF9Gmi+j3Hq1gfZIX5o+WRsWzd5TWbFPdfowvLFBjn 1tws9IgNQeypZdrRu33b/6bG/by8GPLs9jZQEff/9mBeRx6960i+DJOKSV2O6Ds6W8ow UuWTXbcq7V0erSRvlo+LGGj+IMH8c9BPAIgULJV6oWCgzuX16XlHX7ESKdKVhNpNmS+R Zs3ZEwhaHFyyVA5z8cRW36k1ulCz+dQjIbo/w+yzGlqKlzEHR3YnvqgmkFcxAPFgsfn8 j+tg== X-Gm-Message-State: AGi0PuYxUAZWqw5INlvOe6o9MUF7lj+CtB/L70c0Wi5eS1r7CjCiyp7l ouaYioFegW3LnDPP6X51GhA= X-Google-Smtp-Source: APiQypKtd+3OZRFx+NniDMniNMYSjUjwR4ST9j1pimJhPsHP/tSPxVTNzxIjsuWHyzxYKXMGT08Rlw== X-Received: by 2002:a65:460f:: with SMTP id v15mr4718325pgq.24.1588993865079; Fri, 08 May 2020 20:11:05 -0700 (PDT) Received: from 42.do-not-panic.com (42.do-not-panic.com. [157.230.128.187]) by smtp.gmail.com with ESMTPSA id a196sm3195827pfd.184.2020.05.08.20.11.00 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 08 May 2020 20:11:00 -0700 (PDT) Received: by 42.do-not-panic.com (Postfix, from userid 1000) id EBCC041D00; Sat, 9 May 2020 03:10:59 +0000 (UTC) From: Luis Chamberlain To: axboe@kernel.dk, viro@zeniv.linux.org.uk, bvanassche@acm.org, gregkh@linuxfoundation.org, rostedt@goodmis.org, mingo@redhat.com, jack@suse.cz, ming.lei@redhat.com, nstange@suse.de, akpm@linux-foundation.org Cc: mhocko@suse.com, yukuai3@huawei.com, linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Luis Chamberlain , Omar Sandoval , Hannes Reinecke , Michal Hocko , Christof Schmitt , syzbot+603294af2d01acfdd6da@syzkaller.appspotmail.com Subject: [PATCH v4 3/5] blktrace: fix debugfs use after free Date: Sat, 9 May 2020 03:10:56 +0000 Message-Id: <20200509031058.8239-4-mcgrof@kernel.org> X-Mailer: git-send-email 2.23.0.rc1 In-Reply-To: <20200509031058.8239-1-mcgrof@kernel.org> References: <20200509031058.8239-1-mcgrof@kernel.org> MIME-Version: 1.0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk 
X-Loop: owner-majordomo@kvack.org List-ID: On commit 6ac93117ab00 ("blktrace: use existing disk debugfs directory") merged on v4.12 Omar fixed the original blktrace code for request-based drivers (multiqueue). This however left in place a possible crash, if you happen to abuse blktrace while racing to remove / add a device. We used to use asynchronous removal of the request_queue, and with that the issue was easier to reproduce. Now that we have reverted to synchronous removal of the request_queue, the issue is still possible to reproduce, its however just a bit more difficult. We essentially run two instances of break-blktrace which add/remove a loop device, and setup a blktrace and just never tear the blktrace down. We do this twice in parallel. This is easily reproduced with the break-blktrace run_0004.sh script. We can end up with two types of panics each reflecting where we race, one a failed blktrace setup: [ 252.426751] debugfs: Directory 'loop0' with parent 'block' already present! 
[ 252.432265] BUG: kernel NULL pointer dereference, address: 00000000000000a0 [ 252.436592] #PF: supervisor write access in kernel mode [ 252.439822] #PF: error_code(0x0002) - not-present page [ 252.442967] PGD 0 P4D 0 [ 252.444656] Oops: 0002 [#1] SMP NOPTI [ 252.446972] CPU: 10 PID: 1153 Comm: break-blktrace Tainted: G E 5.7.0-rc2-next-20200420+ #164 [ 252.452673] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014 [ 252.456343] RIP: 0010:down_write+0x15/0x40 [ 252.458146] Code: eb ca e8 ae 22 8d ff cc cc cc cc cc cc cc cc cc cc cc cc cc cc 0f 1f 44 00 00 55 48 89 fd e8 52 db ff ff 31 c0 ba 01 00 00 00 48 0f b1 55 00 75 0f 48 8b 04 25 c0 8b 01 00 48 89 45 08 5d [ 252.463638] RSP: 0018:ffffa626415abcc8 EFLAGS: 00010246 [ 252.464950] RAX: 0000000000000000 RBX: ffff958c25f0f5c0 RCX: ffffff8100000000 [ 252.466727] RDX: 0000000000000001 RSI: ffffff8100000000 RDI: 00000000000000a0 [ 252.468482] RBP: 00000000000000a0 R08: 0000000000000000 R09: 0000000000000001 [ 252.470014] R10: 0000000000000000 R11: ffff958d1f9227ff R12: 0000000000000000 [ 252.471473] R13: ffff958c25ea5380 R14: ffffffff8cce15f1 R15: 00000000000000a0 [ 252.473346] FS: 00007f2e69dee540(0000) GS:ffff958c2fc80000(0000) knlGS:0000000000000000 [ 252.475225] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 252.476267] CR2: 00000000000000a0 CR3: 0000000427d10004 CR4: 0000000000360ee0 [ 252.477526] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 252.478776] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 252.479866] Call Trace: [ 252.480322] simple_recursive_removal+0x4e/0x2e0 [ 252.481078] ? debugfs_remove+0x60/0x60 [ 252.481725] ? 
relay_destroy_buf+0x77/0xb0 [ 252.482662] debugfs_remove+0x40/0x60 [ 252.483518] blk_remove_buf_file_callback+0x5/0x10 [ 252.484328] relay_close_buf+0x2e/0x60 [ 252.484930] relay_open+0x1ce/0x2c0 [ 252.485520] do_blk_trace_setup+0x14f/0x2b0 [ 252.486187] __blk_trace_setup+0x54/0xb0 [ 252.486803] blk_trace_ioctl+0x90/0x140 [ 252.487423] ? do_sys_openat2+0x1ab/0x2d0 [ 252.488053] blkdev_ioctl+0x4d/0x260 [ 252.488636] block_ioctl+0x39/0x40 [ 252.489139] ksys_ioctl+0x87/0xc0 [ 252.489675] __x64_sys_ioctl+0x16/0x20 [ 252.490380] do_syscall_64+0x52/0x180 [ 252.491032] entry_SYSCALL_64_after_hwframe+0x44/0xa9 And the other on the device removal: [ 128.528940] debugfs: Directory 'loop0' with parent 'block' already present! [ 128.615325] BUG: kernel NULL pointer dereference, address: 00000000000000a0 [ 128.619537] #PF: supervisor write access in kernel mode [ 128.622700] #PF: error_code(0x0002) - not-present page [ 128.625842] PGD 0 P4D 0 [ 128.627585] Oops: 0002 [#1] SMP NOPTI [ 128.629871] CPU: 12 PID: 544 Comm: break-blktrace Tainted: G E 5.7.0-rc2-next-20200420+ #164 [ 128.635595] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014 [ 128.640471] RIP: 0010:down_write+0x15/0x40 [ 128.643041] Code: eb ca e8 ae 22 8d ff cc cc cc cc cc cc cc cc cc cc cc cc cc cc 0f 1f 44 00 00 55 48 89 fd e8 52 db ff ff 31 c0 ba 01 00 00 00 48 0f b1 55 00 75 0f 65 48 8b 04 25 c0 8b 01 00 48 89 45 08 5d [ 128.650180] RSP: 0018:ffffa9c3c05ebd78 EFLAGS: 00010246 [ 128.651820] RAX: 0000000000000000 RBX: ffff8ae9a6370240 RCX: ffffff8100000000 [ 128.653942] RDX: 0000000000000001 RSI: ffffff8100000000 RDI: 00000000000000a0 [ 128.655720] RBP: 00000000000000a0 R08: 0000000000000002 R09: ffff8ae9afd2d3d0 [ 128.657400] R10: 0000000000000056 R11: 0000000000000000 R12: 0000000000000000 [ 128.659099] R13: 0000000000000000 R14: 0000000000000003 R15: 00000000000000a0 [ 128.660500] FS: 00007febfd995540(0000) GS:ffff8ae9afd00000(0000) knlGS:0000000000000000 [ 128.662204] CS: 0010 
DS: 0000 ES: 0000 CR0: 0000000080050033 [ 128.663426] CR2: 00000000000000a0 CR3: 0000000420042003 CR4: 0000000000360ee0 [ 128.664776] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 128.666022] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 128.667282] Call Trace: [ 128.667801] simple_recursive_removal+0x4e/0x2e0 [ 128.668663] ? debugfs_remove+0x60/0x60 [ 128.669368] debugfs_remove+0x40/0x60 [ 128.669985] blk_trace_free+0xd/0x50 [ 128.670593] __blk_trace_remove+0x27/0x40 [ 128.671274] blk_trace_shutdown+0x30/0x40 [ 128.671935] blk_release_queue+0x95/0xf0 [ 128.672589] kobject_put+0xa5/0x1b0 [ 128.673188] disk_release+0xa2/0xc0 [ 128.673786] device_release+0x28/0x80 [ 128.674376] kobject_put+0xa5/0x1b0 [ 128.674915] loop_remove+0x39/0x50 [loop] [ 128.675511] loop_control_ioctl+0x113/0x130 [loop] [ 128.676199] ksys_ioctl+0x87/0xc0 [ 128.676708] __x64_sys_ioctl+0x16/0x20 [ 128.677274] do_syscall_64+0x52/0x180 [ 128.677823] entry_SYSCALL_64_after_hwframe+0x44/0xa9

The common theme here is:

debugfs: Directory 'loop0' with parent 'block' already present

This crash happens because of how blktrace uses the debugfs directory where it places its files. Upon init we always create the same directory that blktrace would need, but we only do this for make_request (multiqueue) block drivers, never for request-based block drivers. Furthermore, that directory is only created on init for the entire disk. This means that if you use blktrace on a partition, we'll always be creating a new directory, regardless of whether you are running blktrace on a make_request (multiqueue) driver or a request-based block driver.

These directory creations are only associated with a path, so when debugfs_remove() is called it removes everything in its way. A device removal will remove all blktrace files, so if a blktrace is still present, a later cleanup of blktrace files will end up trying to remove dentries pointing to NULL.
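The ownership problem described above can be modeled in plain userspace C. This is a minimal sketch under loud assumptions: `toy_dir`, `toy_blktrace_setup()`, `toy_device_remove()` and the other `toy_*` names are hypothetical, are not kernel APIs, and model removal as a flag flip rather than as freeing memory. It only illustrates that when a directory is keyed by its path alone, a device removal invalidates the handle that a still-active blktrace holds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Toy stand-in for a debugfs directory; NOT the kernel's struct dentry. */
struct toy_dir {
	char name[32];
	bool present;	/* cleared once the directory is removed */
};

#define TOY_MAX_DIRS 8
static struct toy_dir toy_dirs[TOY_MAX_DIRS];

/* Find a live directory by path, the only key the old scheme had. */
struct toy_dir *toy_lookup(const char *name)
{
	for (size_t i = 0; i < TOY_MAX_DIRS; i++)
		if (toy_dirs[i].present && strcmp(toy_dirs[i].name, name) == 0)
			return &toy_dirs[i];
	return NULL;
}

/* Create a directory in the first free slot; NULL if the table is full. */
struct toy_dir *toy_create(const char *name)
{
	for (size_t i = 0; i < TOY_MAX_DIRS; i++) {
		if (!toy_dirs[i].present) {
			snprintf(toy_dirs[i].name, sizeof(toy_dirs[i].name),
				 "%s", name);
			toy_dirs[i].present = true;
			return &toy_dirs[i];
		}
	}
	return NULL;
}

/* Old-style trace setup: reuse whatever lives at the path, else create it. */
struct toy_dir *toy_blktrace_setup(const char *name)
{
	struct toy_dir *dir = toy_lookup(name);

	return dir ? dir : toy_create(name);
}

/* Device removal: tears down the directory at the path, whoever uses it. */
void toy_device_remove(const char *name)
{
	struct toy_dir *dir = toy_lookup(name);

	if (dir)
		dir->present = false;
}

/* True when a handle refers to a directory that is already gone. */
bool toy_handle_is_stale(const struct toy_dir *dir)
{
	return !dir->present;
}
```

After `toy_blktrace_setup("loop0")` followed by `toy_device_remove("loop0")`, `toy_handle_is_stale()` reports true for the handle the trace still holds. That is the state in which the later cleanup seen in the call traces trips.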
We can fix the UAF by using a debugfs directory which, moving forward, will always be accessible if debugfs is enabled, for both make_request drivers (multiqueue) and request-based block drivers, *and* for all partitions upon creation. This ensures that removal of the directories only happens on device removal, and removes the race in which files are removed from underneath an active blktrace.

For partitions we simply symlink to the whole disk's debugfs_dir. The debugfs_dir is shared anyway, which limits us to running one blktrace for the entire disk.

We special-case a solution for scsi-generic, which got blktrace support added by Christof via commit 6da127ad0918 ("blktrace: Add blktrace ioctls to SCSI generic devices"), upstream since v2.6.25. scsi-generic devices use a character device; behind the scenes, however, there is a scsi device with a request_queue. How this is used varies by class of driver (TYPE_DISK, TYPE_TAPE, etc.). Care must be taken because scsi drivers probe asynchronously, while the scsi-generic class_interface sg_add_device() completes before that. This means sd_probe() will use device_add_disk() for TYPE_DISK and have its debugfs_dir created *after* the scsi-generic device is created. For scsi-generic we therefore symlink to the real debugfs_dir only during a blktrace ioctl, and we do this only once.

We also have to special-case yet another solution for drivers which use the bsg queue.
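The way symlink targets are chosen for partitions, scsi-generic and bsg can be sketched in userspace C. This is only an illustration that mirrors the target-selection branches of the patch's queue_debugfs_symlink_type(); `toy_symlink_target()`, `enum toy_dir_type` and the `queue_dir_name` parameter (standing in for the name of the request_queue's existing debugfs_dir) are hypothetical names, and no debugfs call is made.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical mirror of the two directory types used by the patch. */
enum toy_dir_type {
	TOY_DIR_BASE = 1,	/* target lives directly under block/ */
	TOY_DIR_BSG,		/* target lives under block/bsg/ */
};

/*
 * Compute the symlink target into buf.
 *
 * dst:            explicit target name, may be NULL for TOY_DIR_BASE only
 * queue_dir_name: name of the queue's already registered directory, or NULL
 *
 * Returns 0 on success, -1 if no target can be inferred or buf is too small.
 */
int toy_symlink_target(enum toy_dir_type type, const char *dst,
		       const char *queue_dir_name, char *buf, size_t len)
{
	int n;

	switch (type) {
	case TOY_DIR_BASE:
		if (dst)
			n = snprintf(buf, len, "%s", dst);
		else if (queue_dir_name)
			/* Infer the name from the queue's own directory. */
			n = snprintf(buf, len, "%s", queue_dir_name);
		else
			return -1;
		break;
	case TOY_DIR_BSG:
		if (!dst)
			return -1;	/* bsg targets must be named explicitly */
		n = snprintf(buf, len, "bsg/%s", dst);
		break;
	default:
		return -1;
	}

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

With a partition, the explicit disk name is the target; with a scsi-generic device whose peer driver already registered a directory, the name is inferred from the queue; a bsg target is always prefixed with `bsg/`. The links themselves are always created in the base block directory, only the target changes.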
This was tested with:

  o nvme partitions
  o iSCSI with tgt, and blktracing against scsi-generic with:
    o block
    o tape
    o cdrom
    o media changer

This is what the debugfs directory for block looks like after running blktrace on a system where sg0 is a raid controller and sg1 is the media changer:

# ls -l /sys/kernel/debug/block
total 0
drwxr-xr-x  3 root root 0 May  9 02:31 bsg
drwxr-xr-x 19 root root 0 May  9 02:31 nvme0n1
drwxr-xr-x 19 root root 0 May  9 02:31 nvme1n1
lrwxrwxrwx  1 root root 0 May  9 02:31 nvme1n1p1 -> nvme1n1
lrwxrwxrwx  1 root root 0 May  9 02:31 nvme1n1p2 -> nvme1n1
lrwxrwxrwx  1 root root 0 May  9 02:31 nvme1n1p3 -> nvme1n1
lrwxrwxrwx  1 root root 0 May  9 02:31 nvme1n1p5 -> nvme1n1
lrwxrwxrwx  1 root root 0 May  9 02:31 nvme1n1p6 -> nvme1n1
drwxr-xr-x  2 root root 0 May  9 02:33 sch0
lrwxrwxrwx  1 root root 0 May  9 02:33 sg0 -> bsg/2:0:0:0
lrwxrwxrwx  1 root root 0 May  9 02:33 sg1 -> sch0
drwxr-xr-x  5 root root 0 May  9 02:31 vda
lrwxrwxrwx  1 root root 0 May  9 02:31 vda1 -> vda

The code for handling the debugfs_dir did get more complicated for scsi-generic, but this is technical debt. For the other types of devices, this simplifies the code considerably, with the only penalty now being that we're always creating the request queue debugfs directory for the request-based block device drivers. The symlink use also makes it clearer when the request_queue is shared.

This patch is part of the work disputing the severity of CVE-2019-19770. It shows that this issue is not a core debugfs issue, but a misuse of debugfs within blktrace.
Cc: Bart Van Assche Cc: Omar Sandoval Cc: Hannes Reinecke Cc: Nicolai Stange Cc: Greg Kroah-Hartman Cc: Michal Hocko Cc: yu kuai Cc: Christof Schmitt Reported-by: syzbot+603294af2d01acfdd6da@syzkaller.appspotmail.com Fixes: 6ac93117ab00 ("blktrace: use existing disk debugfs directory") Signed-off-by: Luis Chamberlain --- block/blk-debugfs.c | 187 +++++++++++++++++++++++++++++++++++ block/blk-mq-debugfs.c | 5 - block/blk-sysfs.c | 3 + block/blk.h | 16 +++ block/bsg.c | 2 + block/partitions/core.c | 9 ++ drivers/scsi/ch.c | 1 + drivers/scsi/sg.c | 75 ++++++++++++++ drivers/scsi/st.c | 2 + include/linux/blkdev.h | 4 +- include/linux/blktrace_api.h | 1 - include/linux/genhd.h | 69 +++++++++++++ kernel/trace/blktrace.c | 24 +++-- 13 files changed, 385 insertions(+), 13 deletions(-) diff --git a/block/blk-debugfs.c b/block/blk-debugfs.c index 19091e1effc0..d40f12aecf8a 100644 --- a/block/blk-debugfs.c +++ b/block/blk-debugfs.c @@ -8,8 +8,195 @@ #include struct dentry *blk_debugfs_root; +struct dentry *blk_debugfs_bsg = NULL; + +/** + * enum blk_debugfs_dir_type - block device debugfs directory type + * @BLK_DBG_DIR_BASE: the block device debugfs_dir exists on the base + * system /block/ debugfs directory. 
+ * @BLK_DBG_DIR_BSG: the block device debugfs_dir is under the directory + * /block/bsg/ + */ +enum blk_debugfs_dir_type { + BLK_DBG_DIR_BASE = 1, + BLK_DBG_DIR_BSG, +}; void blk_debugfs_register(void) { blk_debugfs_root = debugfs_create_dir("block", NULL); } + +static struct dentry *queue_get_base_dir(enum blk_debugfs_dir_type type) +{ + switch (type) { + case BLK_DBG_DIR_BASE: + return blk_debugfs_root; + case BLK_DBG_DIR_BSG: + return blk_debugfs_bsg; + } + return NULL; +} + +static void queue_debugfs_register_type(struct request_queue *q, + const char *name, + enum blk_debugfs_dir_type type) +{ + struct dentry *base_dir = queue_get_base_dir(type); + + q->debugfs_dir = debugfs_create_dir(name, base_dir); +} + +/** + * blk_queue_debugfs_register - register the debugfs_dir for the block device + * @q: the associated request_queue of the block device + * @name: the name of the block device exposed + * + * This is used to create the debugfs_dir used by the block layer and blktrace. + * Drivers which use any of the *add_disk*() calls or variants have this called + * automatically for them. This directory is removed automatically on + * blk_release_queue() once the request_queue reference count reaches 0. + */ +void blk_queue_debugfs_register(struct request_queue *q, const char *name) +{ + queue_debugfs_register_type(q, name, BLK_DBG_DIR_BASE); +} +EXPORT_SYMBOL_GPL(blk_queue_debugfs_register); + +/** + * blk_queue_debugfs_unregister - remove the debugfs_dir for the block device + * @q: the associated request_queue of the block device + * + * Removes the debugfs_dir for the request_queue on the associated block device. + * This is handled for you on blk_release_queue(), and that should only be + * called once. + * + * Since we don't care where the debugfs_dir was created this is used for all + * types of enum blk_debugfs_dir_type. 
+ */ +void blk_queue_debugfs_unregister(struct request_queue *q) +{ + debugfs_remove_recursive(q->debugfs_dir); +} + +static struct dentry *queue_debugfs_symlink_type(struct request_queue *q, + const char *src, + const char *dst, + enum blk_debugfs_dir_type type) +{ + struct dentry *dentry = ERR_PTR(-EINVAL); + char *dir_dst; + + dir_dst = kzalloc(PATH_MAX, GFP_KERNEL); + if (!dir_dst) + return dentry; + + switch (type) { + case BLK_DBG_DIR_BASE: + if (dst) + snprintf(dir_dst, PATH_MAX, "%s", dst); + else if (!IS_ERR_OR_NULL(q->debugfs_dir)) + snprintf(dir_dst, PATH_MAX, "%s", + q->debugfs_dir->d_name.name); + else + goto out; + break; + case BLK_DBG_DIR_BSG: + if (dst) + snprintf(dir_dst, PATH_MAX, "bsg/%s", dst); + else + goto out; + break; + } + + /* + * The base block debugfs directory is always used for the symlinks, + * their target is what changes. + */ + dentry = debugfs_create_symlink(src, blk_debugfs_root, dir_dst); +out: + kfree(dir_dst); + + return dentry; +} + +/** + * blk_queue_debugfs_symlink - symlink to the real block device debugfs_dir + * @q: the request queue where we know the debugfs_dir exists or will exist + * eventually. Cannot be NULL. + * @src: name of the exposed device we wish to associate to the block device + * @dst: the name of the directory to which we want to symlink, may be NULL + * if you do not know what this may be, but only if your base block device + * is not bsg. If you set this to NULL, we will have no other option but + * to look at the request_queue to infer the name, but you must ensure + * it is already set; be mindful of asynchronous probes. + * + * Some devices don't have a request_queue of their own, however, they have an + * association to one and have historically supported using the same + * debugfs_dir which has been used to represent the whole disk for blktrace + * functionality. Such is the case for partitions and for scsi-generic devices. 
+ * They share the same request_queue and debugfs_dir as with the whole disk for + * blktrace purposes. This helper allows such association to be made explicit + * and enables blktrace functionality for them. scsi-generic devices representing + * scsi devices such as block, cdrom, tape, media changer register their own + * debugfs_dir already and share the same request_queue as with scsi-generic, as + * such the respective scsi-generic debugfs_dir is just a symlink to these + * drivers' debugfs_dir. + * + * To remove use debugfs_remove() on the symlink dentry returned by this + * function. The block layer will not clean this up for you, you must remove + * it yourself in case of device removal. + */ +struct dentry *blk_queue_debugfs_symlink(struct request_queue *q, + const char *src, + const char *dst) +{ + return queue_debugfs_symlink_type(q, src, dst, BLK_DBG_DIR_BASE); +} +EXPORT_SYMBOL_GPL(blk_queue_debugfs_symlink); + +#ifdef CONFIG_BLK_DEV_BSG + +void blk_debugfs_register_bsg(void) +{ + blk_debugfs_bsg = debugfs_create_dir("bsg", blk_debugfs_root); +} + +/** + * blk_queue_debugfs_register_bsg - create the debugfs_dir for bsg block devices + * @q: the associated request_queue of the block device + * @name: the name of the block device exposed + * + * This is used to create the debugfs_dir used by the Block layer SCSI generic + * (bsg) driver. This is to be used only by the scsi-generic driver on behalf + * of scsi devices which work as scsi controllers or transports. + * + * This directory is cleaned up for all drivers automatically on + * blk_release_queue() once the request_queue reference count reaches 0. 
+ */ +void blk_queue_debugfs_register_bsg(struct request_queue *q, const char *name) +{ + queue_debugfs_register_type(q, name, BLK_DBG_DIR_BSG); +} +EXPORT_SYMBOL_GPL(blk_queue_debugfs_register_bsg); + +/** + * blk_queue_debugfs_bsg_symlink - symlink to the bsg debugfs_dir + * @q: the request queue where we know the debugfs_dir exists or will exist + * eventually. Cannot be NULL. + * @src: name of the scsi-generic device we wish to associate to the bsg + * request_queue. + * @dst: the name of the bsg request_queue debugfs_dir to which we want to + * symlink. This cannot be NULL. + * + * This is used by scsi-generic devices representing raid controllers / + * transport drivers. + */ +struct dentry *blk_queue_debugfs_bsg_symlink(struct request_queue *q, + const char *src, + const char *dst) +{ + return queue_debugfs_symlink_type(q, src, dst, BLK_DBG_DIR_BSG); +} +EXPORT_SYMBOL_GPL(blk_queue_debugfs_bsg_symlink); +#endif /* CONFIG_BLK_DEV_BSG */ diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c index 96b7a35c898a..08edc3a54114 100644 --- a/block/blk-mq-debugfs.c +++ b/block/blk-mq-debugfs.c @@ -822,9 +822,6 @@ void blk_mq_debugfs_register(struct request_queue *q) struct blk_mq_hw_ctx *hctx; int i; - q->debugfs_dir = debugfs_create_dir(kobject_name(q->kobj.parent), - blk_debugfs_root); - debugfs_create_files(q->debugfs_dir, q, blk_mq_debugfs_queue_attrs); /* @@ -855,9 +852,7 @@ void blk_mq_debugfs_register(struct request_queue *q) void blk_mq_debugfs_unregister(struct request_queue *q) { - debugfs_remove_recursive(q->debugfs_dir); q->sched_debugfs_dir = NULL; - q->debugfs_dir = NULL; } static void blk_mq_debugfs_register_ctx(struct blk_mq_hw_ctx *hctx, diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c index 5d0fc165a036..1d151f19bd87 100644 --- a/block/blk-sysfs.c +++ b/block/blk-sysfs.c @@ -905,6 +905,7 @@ static void blk_release_queue(struct kobject *kobj) blk_trace_shutdown(q); + blk_queue_debugfs_unregister(q); if (queue_is_mq(q)) 
blk_mq_debugfs_unregister(q); @@ -976,6 +977,8 @@ int blk_register_queue(struct gendisk *disk) goto unlock; } + blk_queue_debugfs_register(q, kobject_name(q->kobj.parent)); + if (queue_is_mq(q)) { __blk_mq_register_dev(dev, q); blk_mq_debugfs_register(q); diff --git a/block/blk.h b/block/blk.h index ec16e8a6049e..f7ace11c8bd1 100644 --- a/block/blk.h +++ b/block/blk.h @@ -458,10 +458,26 @@ int __bio_add_pc_page(struct request_queue *q, struct bio *bio, bool *same_page); #ifdef CONFIG_DEBUG_FS void blk_debugfs_register(void); +void blk_queue_debugfs_unregister(struct request_queue *q); +void blk_part_debugfs_register(struct hd_struct *p, const char *name); +void blk_part_debugfs_unregister(struct hd_struct *p); #else static inline void blk_debugfs_register(void) { } + +static inline void blk_queue_debugfs_unregister(struct request_queue *q) +{ +} + +static inline void blk_part_debugfs_register(struct hd_struct *p, + const char *name) +{ +} + +static inline void blk_part_debugfs_unregister(struct hd_struct *p) +{ +} #endif /* CONFIG_DEBUG_FS */ #endif /* BLK_INTERNAL_H */ diff --git a/block/bsg.c b/block/bsg.c index d7bae94b64d9..bfb1036858c4 100644 --- a/block/bsg.c +++ b/block/bsg.c @@ -503,6 +503,8 @@ static int __init bsg_init(void) if (ret) goto unregister_chrdev; + blk_debugfs_register_bsg(); + printk(KERN_INFO BSG_DESCRIPTION " version " BSG_VERSION " loaded (major %d)\n", bsg_major); return 0; diff --git a/block/partitions/core.c b/block/partitions/core.c index 873999e2e2f2..a96b2418e70d 100644 --- a/block/partitions/core.c +++ b/block/partitions/core.c @@ -10,6 +10,7 @@ #include #include #include +#include #include "check.h" static int (*check_part[])(struct parsed_partitions *) = { @@ -309,6 +310,9 @@ void delete_partition(struct gendisk *disk, struct hd_struct *part) struct disk_part_tbl *ptbl = rcu_dereference_protected(disk->part_tbl, 1); +#ifdef CONFIG_DEBUG_FS + debugfs_remove(part->debugfs_sym); +#endif rcu_assign_pointer(ptbl->part[part->partno], 
NULL); rcu_assign_pointer(ptbl->last_lookup, NULL); kobject_put(part->holder_dir); @@ -450,6 +454,11 @@ static struct hd_struct *add_partition(struct gendisk *disk, int partno, /* everything is up and running, commence */ rcu_assign_pointer(ptbl->part[partno], p); +#ifdef CONFIG_DEBUG_FS + p->debugfs_sym = blk_queue_debugfs_symlink(disk->queue, dev_name(pdev), + disk->disk_name); +#endif + /* suppress uevent if the disk suppresses it */ if (!dev_get_uevent_suppress(ddev)) kobject_uevent(&pdev->kobj, KOBJ_ADD); diff --git a/drivers/scsi/ch.c b/drivers/scsi/ch.c index cb74ab1ae5a4..5dfabc04bfef 100644 --- a/drivers/scsi/ch.c +++ b/drivers/scsi/ch.c @@ -971,6 +971,7 @@ static int ch_probe(struct device *dev) mutex_unlock(&ch->lock); dev_set_drvdata(dev, ch); + blk_queue_debugfs_register(sd->request_queue, dev_name(class_dev)); sdev_printk(KERN_INFO, sd, "Attached scsi changer %s\n", ch->name); return 0; diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c index 20472aaaf630..6fa201086e59 100644 --- a/drivers/scsi/sg.c +++ b/drivers/scsi/sg.c @@ -47,6 +47,7 @@ static int sg_version_num = 30536; /* 2 digits for each component */ #include #include #include /* for sg_check_file_access() */ +#include #include "scsi.h" #include @@ -169,6 +170,10 @@ typedef struct sg_device { /* holds the state of each scsi generic device */ struct gendisk *disk; struct cdev * cdev; /* char_dev [sysfs: /sys/cdev/major/sg] */ struct kref d_ref; +#ifdef CONFIG_DEBUG_FS + bool debugfs_set; + struct dentry *debugfs_sym; +#endif } Sg_device; /* tasklet or soft irq callback */ @@ -914,6 +919,72 @@ static int put_compat_request_table(struct compat_sg_req_info __user *o, } #endif +#ifdef CONFIG_DEBUG_FS +/* + * For scsi-generic devices like TYPE_DISK will re-use the scsi_device + * request_queue on their driver for their disk and later device_add_disk() it, + * we want its respective scsi-generic debugfs_dir to just be a symlink to the + * one created on the real scsi device probe. 
+ * + * We use this on the ioctl path instead of sg_add_device() since some driver + * probes can run asynchronously. Such is the case for scsi devices of + * TYPE_DISK, and the class interface currently has no callbacks once a device + * driver probe has completed its probe. We don't use wait_for_device_probe() + * on sg_add_device() as that would defeat the purpose of using asynchronous + * probe. + */ +static void sg_init_blktrace_setup(Sg_device *sdp) +{ + struct scsi_device *scsidp = sdp->device; + struct device *scsi_dev = &scsidp->sdev_gendev; + struct gendisk *sg_disk = sdp->disk; + struct request_queue *q = scsidp->request_queue; + + /* + * Although debugfs is used for debugging purposes and we + * typically don't care about the return value, we do here + * because we use it for userspace to ensure blktrace works. + * + * Instead of always just checking for the return value though, + * just try setting this once, if the first time failed we don't + * try again. + */ + if (sdp->debugfs_set) + return; + + switch (sdp->device->type) { + case TYPE_RAID: + /* + * We do the registration for bsg here to keep bsg scsi_device + * opaque. If bsg is disabled we just create the debugfs_dir on + * the base block debugfs_dir and scsi-generic symlinks to it. + */ + blk_queue_debugfs_register_bsg(q, dev_name(scsi_dev)); + sdp->debugfs_sym = + blk_queue_debugfs_bsg_symlink(q, + sg_disk->disk_name, + dev_name(scsi_dev)); + break; + default: + /* + * We don't know scsi_device probed device name (this is + * different from the scsi_device name). This is opaque to + * scsi-generic, so we use the request_queue to infer the name + * based on the set debugfs_dir. 
+ */ + sdp->debugfs_sym = blk_queue_debugfs_symlink(q, + sg_disk->disk_name, + NULL); + break; + } + sdp->debugfs_set = true; +} +#else +static void sg_init_blktrace_setup(Sg_device *sdp) +{ +} +#endif + static long sg_ioctl_common(struct file *filp, Sg_device *sdp, Sg_fd *sfp, unsigned int cmd_in, void __user *p) @@ -1117,6 +1188,7 @@ sg_ioctl_common(struct file *filp, Sg_device *sdp, Sg_fd *sfp, return put_user(max_sectors_bytes(sdp->device->request_queue), ip); case BLKTRACESETUP: + sg_init_blktrace_setup(sdp); return blk_trace_setup(sdp->device->request_queue, sdp->disk->disk_name, MKDEV(SCSI_GENERIC_MAJOR, sdp->index), @@ -1644,6 +1716,9 @@ sg_remove_device(struct device *cl_dev, struct class_interface *cl_intf) sysfs_remove_link(&scsidp->sdev_gendev.kobj, "generic"); device_destroy(sg_sysfs_class, MKDEV(SCSI_GENERIC_MAJOR, sdp->index)); +#ifdef CONFIG_DEBUG_FS + debugfs_remove(sdp->debugfs_sym); +#endif cdev_del(sdp->cdev); sdp->cdev = NULL; diff --git a/drivers/scsi/st.c b/drivers/scsi/st.c index 4bf4ab3b70f4..fb3c0546803a 100644 --- a/drivers/scsi/st.c +++ b/drivers/scsi/st.c @@ -4417,6 +4417,8 @@ static int st_probe(struct device *dev) if (error) goto out_remove_devs; scsi_autopm_put_device(SDp); + blk_queue_debugfs_register(tpnt->device->request_queue, + tape_name(tpnt)); sdev_printk(KERN_NOTICE, SDp, "Attached scsi tape %s\n", tape_name(tpnt)); diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index 3122a93c7277..9b12fcc94572 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -561,8 +561,10 @@ struct request_queue { struct list_head tag_set_list; struct bio_set bio_split; -#ifdef CONFIG_BLK_DEBUG_FS +#ifdef CONFIG_DEBUG_FS struct dentry *debugfs_dir; +#endif +#ifdef CONFIG_BLK_DEBUG_FS struct dentry *sched_debugfs_dir; struct dentry *rqos_debugfs_dir; #endif diff --git a/include/linux/blktrace_api.h b/include/linux/blktrace_api.h index 3b6ff5902edc..eb6db276e293 100644 --- a/include/linux/blktrace_api.h +++ 
b/include/linux/blktrace_api.h @@ -22,7 +22,6 @@ struct blk_trace { u64 end_lba; u32 pid; u32 dev; - struct dentry *dir; struct dentry *dropped_file; struct dentry *msg_file; struct list_head running_list; diff --git a/include/linux/genhd.h b/include/linux/genhd.h index f9c226f9546a..71b7896365b3 100644 --- a/include/linux/genhd.h +++ b/include/linux/genhd.h @@ -86,6 +86,9 @@ struct hd_struct { #endif struct percpu_ref ref; struct rcu_work rcu_work; +#ifdef CONFIG_DEBUG_FS + struct dentry *debugfs_sym; +#endif }; /** @@ -391,4 +394,70 @@ static inline dev_t blk_lookup_devt(const char *name, int partno) } #endif /* CONFIG_BLOCK */ +#ifdef CONFIG_DEBUG_FS +void blk_queue_debugfs_register(struct request_queue *q, const char *name); +struct dentry *blk_queue_debugfs_symlink(struct request_queue *q, + const char *src, + const char *dst); +#ifdef CONFIG_BLK_DEV_BSG +void blk_debugfs_register_bsg(void); +void blk_queue_debugfs_register_bsg(struct request_queue *q, const char *name); +struct dentry *blk_queue_debugfs_bsg_symlink(struct request_queue *q, + const char *src, + const char *dst); +#else + +static inline void blk_debugfs_register_bsg(void) +{ +} + +/* If bsg is not enabled we use the base directory */ +static inline void blk_queue_debugfs_register_bsg(struct request_queue *q, + const char *name) +{ + blk_queue_debugfs_register(q, name); +} + +static inline +struct dentry *blk_queue_debugfs_bsg_symlink(struct request_queue *q, + const char *src, + const char *dst) +{ + return blk_queue_debugfs_symlink(q, src, dst); +} + +#endif /* CONFIG_BLK_DEV_BSG */ +#else /* ! 
CONFIG_DEBUG_FS */ +static inline void blk_queue_debugfs_register(struct request_queue *q, + const char *name) +{ +} + +static inline struct dentry *blk_queue_debugfs_symlink(struct request_queue *q, + const char *src, + const char *dst) +{ + return ERR_PTR(-ENODEV); +} + +#ifdef CONFIG_BLK_DEV_BSG +static inline void blk_debugfs_register_bsg(void) +{ +} +#endif /* CONFIG_BLK_DEV_BSG */ + +static inline void blk_queue_debugfs_register_bsg(struct request_queue *q, + const char *name) +{ +} + +static inline +struct dentry *blk_queue_debugfs_bsg_symlink(struct request_queue *q, + const char *src, + const char *dst) +{ + return ERR_PTR(-ENODEV); +} +#endif /* CONFIG_DEBUG_FS */ + #endif /* _LINUX_GENHD_H */ diff --git a/kernel/trace/blktrace.c b/kernel/trace/blktrace.c index ca39dc3230cb..6c10a1427de2 100644 --- a/kernel/trace/blktrace.c +++ b/kernel/trace/blktrace.c @@ -311,7 +311,6 @@ static void blk_trace_free(struct blk_trace *bt) debugfs_remove(bt->msg_file); debugfs_remove(bt->dropped_file); relay_close(bt->rchan); - debugfs_remove(bt->dir); free_percpu(bt->sequence); free_percpu(bt->msg_data); kfree(bt); @@ -509,9 +508,24 @@ static int do_blk_trace_setup(struct request_queue *q, char *name, dev_t dev, ret = -ENOENT; - dir = debugfs_lookup(buts->name, blk_debugfs_root); - if (!dir) - bt->dir = dir = debugfs_create_dir(buts->name, blk_debugfs_root); + dir = q->debugfs_dir; + + /* + * Although the directory here is from debugfs, and we typically do not + * care about NULL dirs as debugfs is typically only used for debugging, + * we rely on the directory to exist to place files which we then use + * for blktrace userspace functionality. Without this directory + * blktrace would not work. Enabling blktrace functionality enables + * debugfs too, as such, we *really* do want to check for this and must + * ensure it was set before chugging on. If NULL were used below, we'd + * also end up creating the debugfs files under the block root + * directory, which we definitely do not want. 
+ */ + if (IS_ERR_OR_NULL(dir)) { + pr_warn("debugfs_dir not present for %s so skipping\n", + buts->name); + goto err; + } bt->dev = dev; atomic_set(&bt->dropped, 0); @@ -551,8 +565,6 @@ static int do_blk_trace_setup(struct request_queue *q, char *name, dev_t dev, ret = 0; err: - if (dir && !bt->dir) - dput(dir); if (ret) blk_trace_free(bt); return ret; From patchwork Sat May 9 03:10:57 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Luis Chamberlain X-Patchwork-Id: 11537925 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 197FB139F for ; Sat, 9 May 2020 03:11:07 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id E4469216FD for ; Sat, 9 May 2020 03:11:06 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org E4469216FD Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=kernel.org Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id DB9B690002A; Fri, 8 May 2020 23:11:04 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id D1C8C90001C; Fri, 8 May 2020 23:11:04 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id AFAE690002A; Fri, 8 May 2020 23:11:04 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0204.hostedemail.com [216.40.44.204]) by kanga.kvack.org (Postfix) with ESMTP id 89FF490001C for ; Fri, 8 May 2020 23:11:04 -0400 (EDT) Received: from smtpin14.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay03.hostedemail.com (Postfix) with 
ESMTP id 4B943824805A for ; Sat, 9 May 2020 03:11:04 +0000 (UTC) X-FDA: 76795704048.14.sleep76_7dba98eca1920 X-Spam-Summary: 2,0,0,65f1b88377d128d5,d41d8cd98f00b204,mcgrof@gmail.com,,RULES_HIT:41:355:379:541:800:960:967:973:988:989:1260:1311:1314:1345:1359:1431:1437:1515:1535:1542:1711:1730:1747:1777:1792:2393:2525:2538:2559:2563:2682:2685:2693:2859:2933:2937:2939:2942:2945:2947:2951:2954:3022:3138:3139:3140:3141:3142:3354:3865:3866:3867:3868:3870:3871:3872:3874:3934:3936:3938:3941:3944:3947:3950:3953:3956:3959:4321:5007:6119:6261:7903:8660:9025:9040:10004:11026:11658:11914:12043:12048:12294:12296:12297:12517:12519:12555:12679:12895:13095:13148:13161:13229:13230:13894:14181:14394:14721:14824:21063:21080:21212:21324:21433:21444:21451:21627:21939:21990:30029:30054:30067:30070,0,RBL:209.85.210.193:@gmail.com:.lbl8.mailshell.net-62.50.0.100 66.100.201.100,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:fp,MSBL:0,DNSBL:neutral,Custom_rules:0:0:0,LFtime:23,LUA_SUMMARY:none X-HE-Tag: sleep76_7dba98eca1920 X-Filterd-Recvd-Size: 5079 Received: from mail-pf1-f193.google.com (mail-pf1-f193.google.com [209.85.210.193]) by imf35.hostedemail.com (Postfix) with ESMTP for ; Sat, 9 May 2020 03:11:03 +0000 (UTC) Received: by mail-pf1-f193.google.com with SMTP id z1so1995697pfn.3 for ; Fri, 08 May 2020 20:11:03 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=cEK33y5M3evHk3zepZBRBq1zuAEKJAta0bGSZcp5I7U=; b=SAmdD+wRwCQ/m0kUt/kNyzyjxusMpQhFRoI0DrtQ8FIPOlpWdEzH9m9k/lG9vj7nvW k3jpBgZFpQna4oHJsszR3ct6rWUqV0chGC2SmAL8JwwddevOUb1XWd2ol+QZsVdYk0ts PYCU6TNGrtUgURxduiS7Pv1n8QxUr29UkDVxpSFx7v9DuGOeBS/xnGf6y2FF6x8m5Die li6Az6M8464FDAVipaJEZwRcVfjUfWLQ9j7NdmO8Kx31i512at6HetZm9VDbI11bvoCt vIZaLY9bJaWneOR+q8N0nGBtu/GyxMFasUWM/LaF5sVUZaivHAkyGEqRcBh37wsQ7gr0 hZbg== X-Gm-Message-State: 
AGi0PuZCfaQLEULQkorvx+9vpvFoK13lujq65yhNs1OKTH7oHFigOsrb fxVwzfjOhVSCwFwEw/2Zk8E= X-Google-Smtp-Source: APiQypLJMN/211bTPotanM4jUsK4k4pu+kicwA68ZAITFqqfkEyvZgoqNiyBViyjSN8+owsWBeCmNA== X-Received: by 2002:a62:6545:: with SMTP id z66mr5765485pfb.87.1588993863018; Fri, 08 May 2020 20:11:03 -0700 (PDT) Received: from 42.do-not-panic.com (42.do-not-panic.com. [157.230.128.187]) by smtp.gmail.com with ESMTPSA id k186sm2518635pga.94.2020.05.08.20.11.00 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 08 May 2020 20:11:00 -0700 (PDT) Received: by 42.do-not-panic.com (Postfix, from userid 1000) id 0657241D67; Sat, 9 May 2020 03:11:00 +0000 (UTC) From: Luis Chamberlain To: axboe@kernel.dk, viro@zeniv.linux.org.uk, bvanassche@acm.org, gregkh@linuxfoundation.org, rostedt@goodmis.org, mingo@redhat.com, jack@suse.cz, ming.lei@redhat.com, nstange@suse.de, akpm@linux-foundation.org Cc: mhocko@suse.com, yukuai3@huawei.com, linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Luis Chamberlain Subject: [PATCH v4 4/5] blktrace: break out of blktrace setup on concurrent calls Date: Sat, 9 May 2020 03:10:57 +0000 Message-Id: <20200509031058.8239-5-mcgrof@kernel.org> X-Mailer: git-send-email 2.23.0.rc1 In-Reply-To: <20200509031058.8239-1-mcgrof@kernel.org> References: <20200509031058.8239-1-mcgrof@kernel.org> MIME-Version: 1.0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: We use one blktrace per request_queue, that means one per the entire disk. So we cannot run one blktrace on say /dev/vda and then /dev/vda1, or just two calls on /dev/vda. We check for concurrent setup only at the very end of the blktrace setup though. If we try to run two concurrent blktraces on the same block device the second one will fail, and the first one seems to go on. 
However, when one tries to kill the first one, one will see things like
the following. The kernel will show these messages:

```
debugfs: File 'dropped' in directory 'nvme1n1' already present!
debugfs: File 'msg' in directory 'nvme1n1' already present!
debugfs: File 'trace0' in directory 'nvme1n1' already present!
```

And userspace just sees this error message for the second call:

```
blktrace /dev/nvme1n1
BLKTRACESETUP(2) /dev/nvme1n1 failed: 5/Input/output error
```

Userspace process #1 will also claim that the files were taken from
underneath its nose. The files are taken away from the first process
because, when the second blktrace fails, it follows up with a
BLKTRACESTOP and a BLKTRACETEARDOWN. This means that even if go-happy
process #1 is waiting for blktrace data, we *have* been asked to tear
down the blktrace.

This can easily be reproduced with the run_0005.sh test from
break-blktrace [0].

Just break out early if we know we're already going to fail. This
prevents us from trying to create the files all over again, which we
know still exist.
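For illustration only, the early exit can be modelled in plain userspace C.
This is a rough sketch, not the kernel code: the two structs are empty
stand-ins for the real ones, and model_trace_setup() and
model_trace_teardown() are made-up names. It only shows the ordering this
patch is after, namely that the "already tracing" check runs before anything
is allocated or created.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-ins; the real kernel structures carry far more state. */
struct blk_trace {
	int placeholder;
};

struct request_queue {
	struct blk_trace *blk_trace;	/* non-NULL while a trace is set up */
};

/* Model of the setup path: bail out before allocating or creating anything. */
int model_trace_setup(struct request_queue *q, const char *name)
{
	struct blk_trace *bt;

	if (q->blk_trace) {
		fprintf(stderr, "Concurrent blktraces are not allowed on %s\n",
			name);
		return -EBUSY;
	}

	bt = calloc(1, sizeof(*bt));
	if (!bt)
		return -ENOMEM;

	/* The real code would create the debugfs files at this point. */
	q->blk_trace = bt;
	return 0;
}

/* Model of teardown: drop the trace so that a later setup can succeed. */
void model_trace_teardown(struct request_queue *q)
{
	free(q->blk_trace);
	q->blk_trace = NULL;
}
```

In this model a second model_trace_setup() on the same queue returns -EBUSY
without touching the state that belongs to the first caller, and a setup
after model_trace_teardown() succeeds again.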
[0] https://github.com/mcgrof/break-blktrace

Signed-off-by: Luis Chamberlain
---
 kernel/trace/blktrace.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/kernel/trace/blktrace.c b/kernel/trace/blktrace.c
index 6c10a1427de2..bd5ec2184d46 100644
--- a/kernel/trace/blktrace.c
+++ b/kernel/trace/blktrace.c
@@ -3,6 +3,9 @@
  * Copyright (C) 2006 Jens Axboe
  *
  */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
 #include
 #include
 #include
@@ -493,6 +496,12 @@ static int do_blk_trace_setup(struct request_queue *q, char *name, dev_t dev,
 	 */
 	strreplace(buts->name, '/', '_');
 
+	if (q->blk_trace) {
+		pr_warn("Concurrent blktraces are not allowed on %s\n",
+			buts->name);
+		return -EBUSY;
+	}
+
 	bt = kzalloc(sizeof(*bt), GFP_KERNEL);
 	if (!bt)
 		return -ENOMEM;

From patchwork Sat May 9 03:10:58 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Luis Chamberlain
X-Patchwork-Id: 11537943
Return-Path:
Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123])
 by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 2F88315E6
 for ; Sat, 9 May 2020 03:11:17 +0000 (UTC)
Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17])
 by mail.kernel.org (Postfix) with ESMTP id F155A218AC
 for ; Sat, 9 May 2020 03:11:16 +0000 (UTC)
DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org F155A218AC
Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=kernel.org
Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org
Received: by kanga.kvack.org (Postfix) id 46A7990002E;
 Fri, 8 May 2020 23:11:09 -0400 (EDT)
Delivered-To: linux-mm-outgoing@kvack.org
Received: by kanga.kvack.org (Postfix, from userid 40) id 3F38690001C;
 Fri, 8 May 2020 23:11:09 -0400 (EDT)
X-Original-To: int-list-linux-mm@kvack.org
X-Delivered-To: int-list-linux-mm@kvack.org
Received: by kanga.kvack.org (Postfix, from userid 63042) id 2952490002E; Fri, 8 May
2020 23:11:09 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0231.hostedemail.com [216.40.44.231]) by kanga.kvack.org (Postfix) with ESMTP id 02C4D90001C for ; Fri, 8 May 2020 23:11:08 -0400 (EDT) Received: from smtpin09.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay04.hostedemail.com (Postfix) with ESMTP id AF62C52A6 for ; Sat, 9 May 2020 03:11:08 +0000 (UTC) X-FDA: 76795704216.09.earth13_7e6205394370e X-Spam-Summary: 2,0,0,524adabc28e4e0c3,d41d8cd98f00b204,mcgrof@gmail.com,,RULES_HIT:41:355:379:541:800:960:973:988:989:1260:1311:1314:1345:1359:1437:1515:1534:1541:1711:1730:1747:1777:1792:2393:2559:2562:3138:3139:3140:3141:3142:3352:3865:3866:3868:3871:5007:6261:9389:10004:11026:11658:11914:12043:12048:12297:12517:12519:12555:12895:13069:13311:13357:13894:13972:14096:14181:14384:14394:14721:21080:21444:21451:21627:30012:30054,0,RBL:209.85.215.194:@gmail.com:.lbl8.mailshell.net-66.100.201.100 62.18.0.100,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:fp,MSBL:0,DNSBL:neutral,Custom_rules:0:0:0,LFtime:22,LUA_SUMMARY:none X-HE-Tag: earth13_7e6205394370e X-Filterd-Recvd-Size: 3671 Received: from mail-pg1-f194.google.com (mail-pg1-f194.google.com [209.85.215.194]) by imf28.hostedemail.com (Postfix) with ESMTP for ; Sat, 9 May 2020 03:11:08 +0000 (UTC) Received: by mail-pg1-f194.google.com with SMTP id l25so1804337pgc.5 for ; Fri, 08 May 2020 20:11:08 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=B9qeuwlCWo2bLV/HUmko4UIQJaKMhP6Ukx+rQSkan8Y=; b=ge8x+XewBMghfVfXuQL6Ya/jebkXsXBUHCTlXEwl9pMaeHsOZorIKik1UPDxLW+Zlp AgP7A07sQ/gt0JICFoGAAqIrg5a6vxIok7amT1Ghs3zxMXbT5H4nCBx0pKJYUwpW5n2W 3GCfBVrSMBsam41zWkU+7VUygVBFiSYF0/zvn8udruguGuYgcesUCGubeI1rT/kJvwJ5 
 0FENyQJou5gRhykpvIyppb8j0Znw6aexxw7x1kkr/ob7tfzw0s+xnSjxqUeCqYCt2e7U
 eYSxftQheVzLTljx1F6U/xEGhClDsRe86XWfnQa5ynn03+EJfKcj8lF7bkpWFNUUeF9b
 ANdw==
X-Gm-Message-State: AGi0PubfRSDsBgiOw61agX4eONFiK83XhREfdAcfH0BbpSOuExvMapjk
 v2AVRpUX/0iBfSgHC1+bcgs=
X-Google-Smtp-Source: APiQypJ4gDoO4Ls5NVzNOvJO5nVpKEkSKJtyC67OMRsCvPXltLPj+Zy6NxHNyPn2f7uJc5uyHsOujA==
X-Received: by 2002:a62:144b:: with SMTP id 72mr6234440pfu.246.1588993867590;
 Fri, 08 May 2020 20:11:07 -0700 (PDT)
Received: from 42.do-not-panic.com (42.do-not-panic.com. [157.230.128.187])
 by smtp.gmail.com with ESMTPSA id v9sm3508964pju.3.2020.05.08.20.11.02
 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256);
 Fri, 08 May 2020 20:11:06 -0700 (PDT)
Received: by 42.do-not-panic.com (Postfix, from userid 1000)
 id 1484E41D95; Sat, 9 May 2020 03:11:00 +0000 (UTC)
From: Luis Chamberlain
To: axboe@kernel.dk, viro@zeniv.linux.org.uk, bvanassche@acm.org,
 gregkh@linuxfoundation.org, rostedt@goodmis.org, mingo@redhat.com,
 jack@suse.cz, ming.lei@redhat.com, nstange@suse.de,
 akpm@linux-foundation.org
Cc: mhocko@suse.com, yukuai3@huawei.com, linux-block@vger.kernel.org,
 linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org, Luis Chamberlain
Subject: [PATCH v4 5/5] loop: be paranoid on exit and prevent new additions / removals
Date: Sat, 9 May 2020 03:10:58 +0000
Message-Id: <20200509031058.8239-6-mcgrof@kernel.org>
X-Mailer: git-send-email 2.23.0.rc1
In-Reply-To: <20200509031058.8239-1-mcgrof@kernel.org>
References: <20200509031058.8239-1-mcgrof@kernel.org>
MIME-Version: 1.0
X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4
Sender: owner-linux-mm@kvack.org
Precedence: bulk
X-Loop: owner-majordomo@kvack.org
List-ID:

Be pedantic on removal as well and hold the mutex. This should prevent
new additions while we exit. I cannot trigger an issue with this,
though; this is just a fix done through code inspection.
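As a hedged userspace illustration of the idea (not the loop driver itself),
the sketch below takes the same mutex in the add path and in the exit path,
so an addition cannot interleave with teardown. All names here
(model_loop_add(), model_loop_exit(), ctl_mutex, the fixed-size table) are
invented for the example, and the `exiting` flag is an assumption of this
sketch that stands in for the control interface going away; the patch itself
only adds the lock and unlock around the existing teardown.

```c
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

#define MODEL_MAX_DEVS 8

/* One lock guards both the device table and the exit flag. */
static pthread_mutex_t ctl_mutex = PTHREAD_MUTEX_INITIALIZER;
static bool dev_present[MODEL_MAX_DEVS];
static bool exiting;	/* assumption of this sketch, not part of the patch */

/* Add path: refuses once teardown has started. */
int model_loop_add(int idx)
{
	int ret = 0;

	if (idx < 0 || idx >= MODEL_MAX_DEVS)
		return -EINVAL;

	pthread_mutex_lock(&ctl_mutex);
	if (exiting)
		ret = -ENODEV;
	else if (dev_present[idx])
		ret = -EEXIST;
	else
		dev_present[idx] = true;
	pthread_mutex_unlock(&ctl_mutex);

	return ret;
}

/* Exit path: hold the lock across the whole teardown. */
void model_loop_exit(void)
{
	int i;

	pthread_mutex_lock(&ctl_mutex);
	exiting = true;
	for (i = 0; i < MODEL_MAX_DEVS; i++)
		dev_present[i] = false;
	pthread_mutex_unlock(&ctl_mutex);
}
```

Because both paths serialize on ctl_mutex, an add either completes before
teardown starts or observes `exiting` and backs off.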
Reviewed-by: Ming Lei
Signed-off-by: Luis Chamberlain
---
 drivers/block/loop.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index 14372df0f354..54fbcbd930de 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -2333,6 +2333,8 @@ static void __exit loop_exit(void)
 
 	range = max_loop ? max_loop << part_shift : 1UL << MINORBITS;
 
+	mutex_lock(&loop_ctl_mutex);
+
 	idr_for_each(&loop_index_idr, &loop_exit_cb, NULL);
 	idr_destroy(&loop_index_idr);
 
@@ -2340,6 +2342,8 @@ static void __exit loop_exit(void)
 	unregister_blkdev(LOOP_MAJOR, "loop");
 
 	misc_deregister(&loop_misc);
+
+	mutex_unlock(&loop_ctl_mutex);
 }
 
 module_init(loop_init);