From patchwork Thu Apr 9 21:45:26 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Luis Chamberlain X-Patchwork-Id: 11482487 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 15AE414B4 for ; Thu, 9 Apr 2020 21:45:37 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id D02A12072F for ; Thu, 9 Apr 2020 21:45:36 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org D02A12072F Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=kernel.org Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 0C1FA8E0018; Thu, 9 Apr 2020 17:45:36 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 074848E0003; Thu, 9 Apr 2020 17:45:36 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id EA3A78E0018; Thu, 9 Apr 2020 17:45:35 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0236.hostedemail.com [216.40.44.236]) by kanga.kvack.org (Postfix) with ESMTP id DC3D98E0003 for ; Thu, 9 Apr 2020 17:45:35 -0400 (EDT) Received: from smtpin28.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay01.hostedemail.com (Postfix) with ESMTP id AC731180AD811 for ; Thu, 9 Apr 2020 21:45:35 +0000 (UTC) X-FDA: 76689648630.28.taste09_6b0175afcfb53 X-Spam-Summary: 2,0,0,2cc5452f7bc6e740,d41d8cd98f00b204,mcgrof@gmail.com,,RULES_HIT:41:355:379:541:800:960:973:988:989:1260:1311:1314:1345:1359:1437:1515:1535:1543:1711:1730:1747:1777:1792:2393:2559:2562:3138:3139:3140:3141:3142:3165:3353:3867:3868:3872:4321:4605:5007:6120:6261:6742:7903:9592:10004:11026:11232:11473:11658:11914:12043:12048:12296:12297:12438:12517:12519:12555:12895:12986:13894:13972:14093:14096:14181:14721:21080:21222:21444:21451:21627:21990:30012:30054:30064,0,RBL:209.85.214.193:@gmail.com:.lbl8.mailshell.net-62.50.0.100 66.100.201.100,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:fp,MSBL:0,DNSBL:none,Custom_rules:0:0:0,LFtime:15,LUA_SUMMARY:none X-HE-Tag: taste09_6b0175afcfb53 X-Filterd-Recvd-Size: 5859 Received: from mail-pl1-f193.google.com (mail-pl1-f193.google.com [209.85.214.193]) by imf03.hostedemail.com (Postfix) with ESMTP for ; Thu, 9 Apr 2020 21:45:35 +0000 (UTC) Received: by mail-pl1-f193.google.com with SMTP id m16so1500108pls.4 for ; Thu, 09 Apr 2020 14:45:35 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=OLph60D1SnFnHEB2NFdtxbtWAnKHUye2dDiwcTEr5pU=; b=p42jaob6UGGz+QLLEHnnjlw91N2CGir/VZ1fXL95wao8HDHYHdT3HYTnIlxTYIyQRC FTfHxuD7C4gPZ8m9iIXRrxEqgcesCOM6nrmQ0RE+jCuS+VhvSLon+d+bAcrPu2IjnD+v lcS8ytzeodvmHVQEIR3rs/zGGvK5TaoRj2nRFpo7xaihKGOMm4NU0ZkuS8G7Vh4HylJt 0ryGQTVDrILuaSuBLN5biS8API6aKbmMdnC1Kc1T8CYcoXfunueRclj/Ew0pNMyD3Nuk S8SX9aOb0EunzC7ZLxqBVwiKbon4JjiqM50wJIKCINPQ2vZ6nMSapbw+VI6taAu0K3WY Wnbw== X-Gm-Message-State: AGi0Pua+4cd+cbi74sXdIxkuCWE1J5us5Tw2Z4saaEYNIMZnOmbyK5AK 7BqwQ1CTqd5F4qT8YTOhBrk= X-Google-Smtp-Source: APiQypKmaslyoWulwk72NDpUWDybfCIK5dcSrW/r3JU1c/rJvucJcwAs0tETDPvCZSZhhul2HvDzwg== X-Received: by 2002:a17:90a:82:: with SMTP id a2mr1754386pja.47.1586468734191; Thu, 09 Apr 2020 14:45:34 -0700 (PDT) Received: from 42.do-not-panic.com (42.do-not-panic.com. [157.230.128.187]) by smtp.gmail.com with ESMTPSA id nk12sm116640pjb.41.2020.04.09.14.45.32 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 09 Apr 2020 14:45:32 -0700 (PDT) Received: by 42.do-not-panic.com (Postfix, from userid 1000) id 29E48404EF; Thu, 9 Apr 2020 21:45:32 +0000 (UTC) From: Luis Chamberlain To: axboe@kernel.dk, viro@zeniv.linux.org.uk, gregkh@linuxfoundation.org, rostedt@goodmis.org, mingo@redhat.com, jack@suse.cz, ming.lei@redhat.com, nstange@suse.de, akpm@linux-foundation.org Cc: mhocko@suse.com, yukuai3@huawei.com, linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Luis Chamberlain , Bart Van Assche , Omar Sandoval , Hannes Reinecke , Michal Hocko Subject: [RFC v2 1/5] block: move main block debugfs initialization to its own file Date: Thu, 9 Apr 2020 21:45:26 +0000 Message-Id: <20200409214530.2413-2-mcgrof@kernel.org> X-Mailer: git-send-email 2.23.0.rc1 In-Reply-To: <20200409214530.2413-1-mcgrof@kernel.org> References: <20200409214530.2413-1-mcgrof@kernel.org> MIME-Version: 1.0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Single and multiqeueue block devices share some debugfs code. By moving this into its own file it makes it easier to expand and audit this shared code. This patch contains no functional changes. Cc: Bart Van Assche Cc: Omar Sandoval Cc: Hannes Reinecke Cc: Nicolai Stange Cc: Greg Kroah-Hartman Cc: Michal Hocko Cc: yu kuai Signed-off-by: Luis Chamberlain --- block/Makefile | 1 + block/blk-core.c | 9 +-------- block/blk-debugfs.c | 15 +++++++++++++++ block/blk.h | 7 +++++++ 4 files changed, 24 insertions(+), 8 deletions(-) create mode 100644 block/blk-debugfs.c diff --git a/block/Makefile b/block/Makefile index 206b96e9387f..1d3ab20505d8 100644 --- a/block/Makefile +++ b/block/Makefile @@ -10,6 +10,7 @@ obj-$(CONFIG_BLOCK) := bio.o elevator.o blk-core.o blk-sysfs.o \ blk-mq-sysfs.o blk-mq-cpumap.o blk-mq-sched.o ioctl.o \ genhd.o ioprio.o badblocks.o partitions/ blk-rq-qos.o +obj-$(CONFIG_DEBUG_FS) += blk-debugfs.o obj-$(CONFIG_BOUNCE) += bounce.o obj-$(CONFIG_BLK_SCSI_REQUEST) += scsi_ioctl.o obj-$(CONFIG_BLK_DEV_BSG) += bsg.o diff --git a/block/blk-core.c b/block/blk-core.c index 7e4a1da0715e..5aaae7a1b338 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -48,10 +48,6 @@ #include "blk-pm.h" #include "blk-rq-qos.h" -#ifdef CONFIG_DEBUG_FS -struct dentry *blk_debugfs_root; -#endif - EXPORT_TRACEPOINT_SYMBOL_GPL(block_bio_remap); EXPORT_TRACEPOINT_SYMBOL_GPL(block_rq_remap); EXPORT_TRACEPOINT_SYMBOL_GPL(block_bio_complete); @@ -1796,10 +1792,7 @@ int __init blk_dev_init(void) blk_requestq_cachep = kmem_cache_create("request_queue", sizeof(struct request_queue), 0, SLAB_PANIC, NULL); - -#ifdef CONFIG_DEBUG_FS - blk_debugfs_root = debugfs_create_dir("block", NULL); -#endif + blk_debugfs_register(); return 0; } diff --git a/block/blk-debugfs.c b/block/blk-debugfs.c new file mode 100644 index 000000000000..634dea4b1507 --- /dev/null +++ b/block/blk-debugfs.c @@ -0,0 +1,15 @@ +// SPDX-License-Identifier: GPL-2.0 + +/* + * Shared debugfs mq / non-mq functionality + */ +#include +#include +#include + +struct dentry *blk_debugfs_root; + +void blk_debugfs_register(void) +{ + blk_debugfs_root = debugfs_create_dir("block", NULL); +} diff --git a/block/blk.h b/block/blk.h index 0a94ec68af32..86a66b614f08 100644 --- a/block/blk.h +++ b/block/blk.h @@ -487,5 +487,12 @@ struct request_queue *__blk_alloc_queue(int node_id); int __bio_add_pc_page(struct request_queue *q, struct bio *bio, struct page *page, unsigned int len, unsigned int offset, bool *same_page); +#ifdef CONFIG_DEBUG_FS +void blk_debugfs_register(void); +#else +static inline void blk_debugfs_register(void) +{ +} +#endif /* CONFIG_DEBUG_FS */ #endif /* BLK_INTERNAL_H */ From patchwork Thu Apr 9 21:45:27 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Luis Chamberlain X-Patchwork-Id: 11482493 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 4B50514B4 for ; Thu, 9 Apr 2020 21:45:42 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id EE0B220CC7 for ; Thu, 9 Apr 2020 21:45:41 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org EE0B220CC7 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=kernel.org Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 35E5C8E001A; Thu, 9 Apr 2020 17:45:39 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 297A18E0003; Thu, 9 Apr 2020 17:45:39 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 0ED888E001A; Thu, 9 Apr 2020 17:45:38 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0158.hostedemail.com [216.40.44.158]) by kanga.kvack.org (Postfix) with ESMTP id E87558E0003 for ; Thu, 9 Apr 2020 17:45:38 -0400 (EDT) Received: from smtpin26.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay04.hostedemail.com (Postfix) with ESMTP id BA06AA773 for ; Thu, 9 Apr 2020 21:45:38 +0000 (UTC) X-FDA: 76689648756.26.drink28_6b54e30cfb739 X-Spam-Summary: 2,0,0,76e1e29e36c94cc4,d41d8cd98f00b204,mcgrof@gmail.com,,RULES_HIT:41:69:327:355:379:541:800:960:965:966:967:968:973:988:989:1260:1311:1314:1345:1437:1515:1605:1730:1747:1777:1792:2194:2196:2198:2199:2200:2201:2393:2525:2559:2566:2570:2682:2685:2693:2703:2859:2901:2933:2937:2939:2942:2945:2947:2951:2954:3022:3865:3866:3867:3868:3870:3871:3872:3873:3874:3934:3936:3938:3941:3944:3947:3950:3953:3956:3959:4250:4321:4385:4390:4395:4605:5007:6119:6120:6238:6261:7208:7903:7904:8660:9025:9040:9121:9163:9165:10004:11658:12048:12219:12740:13148:13161:13221:13229:13230:30056,0,RBL:209.85.216.65:@gmail.com:.lbl8.mailshell.net-66.100.201.100 62.50.0.100,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:fp,MSBL:0,DNSBL:neutral,Custom_rules:0:0:0,LFtime:15,LUA_SUMMARY:none X-HE-Tag: drink28_6b54e30cfb739 X-Filterd-Recvd-Size: 22699 Received: from mail-pj1-f65.google.com (mail-pj1-f65.google.com [209.85.216.65]) by imf43.hostedemail.com (Postfix) with ESMTP for ; Thu, 9 Apr 2020 21:45:37 +0000 (UTC) Received: by mail-pj1-f65.google.com with SMTP id nu11so31970pjb.1 for ; Thu, 09 Apr 2020 14:45:37 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=HGRSYR9QZ02X4P6LgM06lKnRINNg0I457xshcHxe55E=; b=q1GFx4PxkxIjbw5+bo3M/HagP1xZSvsBHmA7fuOqRptQ7Q2RAhNQaajw081o9BqBX4 kMYsM7vCwjNf7Njaz540kC/2tl3CQ4NMJJoDvGlrsbaVPofSfgE+l4SbaQmJXCdcrQT3 VA7glkyzxB2sSCOHs5fTgo+cGdjxE2ckEKrMaetL3JJEXSlJWzF+2dDXUYiMM+MjLhkO TIRCBp9SD4LrVnlYRoX3AxcdvZ6XxsnkdjKdcIueF3PuJR9aR9dLfOC/WglR5xtaiAJ1 28vRnUnZFBRjGf8m3UBUoXq3hXu6GT8q6sviQM3fpSZgiqBsn9pECINVMaOgOlyaV6bA AMXw== X-Gm-Message-State: AGi0PubsshnQ3oqNC4ygNUZ9cmiZmSufk5eTvg6Hvh6xJtjmER2uAIzN mrry8FgkfJcoKbtj6mDSpUg= X-Google-Smtp-Source: APiQypJ7yrKvG1eKAZMRVDwPEgbxEiz1QkrcfNcOCJHgVXUm/tqtX2Qo/uI95A9F7kOm1VZUO9eeTg== X-Received: by 2002:a17:902:444:: with SMTP id 62mr1611237ple.109.1586468736406; Thu, 09 Apr 2020 14:45:36 -0700 (PDT) Received: from 42.do-not-panic.com (42.do-not-panic.com. [157.230.128.187]) by smtp.gmail.com with ESMTPSA id 8sm60937pfv.65.2020.04.09.14.45.32 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 09 Apr 2020 14:45:32 -0700 (PDT) Received: by 42.do-not-panic.com (Postfix, from userid 1000) id 3DF3E419AC; Thu, 9 Apr 2020 21:45:32 +0000 (UTC) From: Luis Chamberlain To: axboe@kernel.dk, viro@zeniv.linux.org.uk, gregkh@linuxfoundation.org, rostedt@goodmis.org, mingo@redhat.com, jack@suse.cz, ming.lei@redhat.com, nstange@suse.de, akpm@linux-foundation.org Cc: mhocko@suse.com, yukuai3@huawei.com, linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Luis Chamberlain , Bart Van Assche , Omar Sandoval , Hannes Reinecke , Michal Hocko , syzbot+603294af2d01acfdd6da@syzkaller.appspotmail.com Subject: [RFC v2 2/5] blktrace: fix debugfs use after free Date: Thu, 9 Apr 2020 21:45:27 +0000 Message-Id: <20200409214530.2413-3-mcgrof@kernel.org> X-Mailer: git-send-email 2.23.0.rc1 In-Reply-To: <20200409214530.2413-1-mcgrof@kernel.org> References: <20200409214530.2413-1-mcgrof@kernel.org> MIME-Version: 1.0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On commit 6ac93117ab00 ("blktrace: use existing disk debugfs directory") merged on v4.12 Omar fixed the original blktrace code for multiqueue use. This however left in place a possible crash, if you happen to abuse blktrace in a way it was not intended. Namely, if you loop adding a device, setup the blktrace with BLKTRACESETUP, forget to BLKTRACETEARDOWN, and then just remove the device you end up with a panic: [ 107.193134] debugfs: Directory 'loop0' with parent 'block' already present! [ 107.254615] BUG: kernel NULL pointer dereference, address: 00000000000000a0 [ 107.258785] #PF: supervisor write access in kernel mode [ 107.262035] #PF: error_code(0x0002) - not-present page [ 107.264106] PGD 0 P4D 0 [ 107.264404] Oops: 0002 [#1] SMP NOPTI [ 107.264803] CPU: 8 PID: 674 Comm: kworker/8:2 Tainted: G E 5.6.0-rc7-next-20200327 #1 [ 107.265712] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014 [ 107.266553] Workqueue: events __blk_release_queue [ 107.267051] RIP: 0010:down_write+0x15/0x40 [ 107.267488] Code: eb ca e8 ee a5 8d ff cc cc cc cc cc cc cc cc cc cc cc cc cc cc 0f 1f 44 00 00 55 48 89 fd e8 52 db ff ff 31 c0 ba 01 00 00 00 48 0f b1 55 00 75 0f 65 48 8b 04 25 c0 8b 01 00 48 89 45 08 5d [ 107.269300] RSP: 0018:ffff9927c06efda8 EFLAGS: 00010246 [ 107.269841] RAX: 0000000000000000 RBX: ffff8be7e73b0600 RCX: ffffff8100000000 [ 107.270559] RDX: 0000000000000001 RSI: ffffff8100000000 RDI: 00000000000000a0 [ 107.271281] RBP: 00000000000000a0 R08: ffff8be7ebc80fa8 R09: ffff8be7ebc80fa8 [ 107.272001] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 [ 107.272722] R13: ffff8be7efc30400 R14: ffff8be7e0571200 R15: 00000000000000a0 [ 107.273475] FS: 0000000000000000(0000) GS:ffff8be7efc00000(0000) knlGS:0000000000000000 [ 107.274346] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 107.274968] CR2: 00000000000000a0 CR3: 000000042abee003 CR4: 0000000000360ee0 [ 107.275710] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 107.276465] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 107.277214] Call Trace: [ 107.277532] simple_recursive_removal+0x4e/0x2e0 [ 107.278049] ? debugfs_remove+0x60/0x60 [ 107.278493] debugfs_remove+0x40/0x60 [ 107.278922] blk_trace_free+0xd/0x50 [ 107.279339] __blk_trace_remove+0x27/0x40 [ 107.279797] blk_trace_shutdown+0x30/0x40 [ 107.280256] __blk_release_queue+0xab/0x110 [ 107.280734] process_one_work+0x1b4/0x380 [ 107.281194] worker_thread+0x50/0x3c0 [ 107.281622] kthread+0xf9/0x130 [ 107.281994] ? process_one_work+0x380/0x380 [ 107.282467] ? kthread_park+0x90/0x90 [ 107.282895] ret_from_fork+0x1f/0x40 [ 107.283316] Modules linked in: loop(E) [ 107.288562] CR2: 00000000000000a0 [ 107.288957] ---[ end trace b885d243d441bbce ]--- This splat happens to be very similar to the one reported via kernel.org korg#205713, only that korg#205713 was for v4.19.83 and the above now includes the simple_recursive_removal() introduced via commit a3d1e7eb5abe ("simple_recursive_removal(): kernel-side rm -rf for ramfs-style filesystems") merged on v5.6. korg#205713 then was used to create CVE-2019-19770 and claims that the bug is in a use-after-free in the debugfs core code. The implications of this being a generic UAF on debugfs would be much more severe, as it would imply parent dentries can sometimes not be positive, which we hold by design is just not possible. Below is the splat explained with a bit more details, explaining what is happening in userspace, kernel, and a print of the CPU on, which the code runs on: load loopback module [ 13.603371] == blk_mq_debugfs_register(12) start [ 13.604040] == blk_mq_debugfs_register(12) q->debugfs_dir created [ 13.604934] == blk_mq_debugfs_register(12) end [ 13.627382] == blk_mq_debugfs_register(12) start [ 13.628041] == blk_mq_debugfs_register(12) q->debugfs_dir created [ 13.629240] == blk_mq_debugfs_register(12) end [ 13.651667] == blk_mq_debugfs_register(12) start [ 13.652836] == blk_mq_debugfs_register(12) q->debugfs_dir created [ 13.655107] == blk_mq_debugfs_register(12) end [ 13.684917] == blk_mq_debugfs_register(12) start [ 13.687876] == blk_mq_debugfs_register(12) q->debugfs_dir created [ 13.691588] == blk_mq_debugfs_register(13) end [ 13.707320] == blk_mq_debugfs_register(13) start [ 13.707863] == blk_mq_debugfs_register(13) q->debugfs_dir created [ 13.708856] == blk_mq_debugfs_register(13) end [ 13.735623] == blk_mq_debugfs_register(13) start [ 13.736656] == blk_mq_debugfs_register(13) q->debugfs_dir created [ 13.738411] == blk_mq_debugfs_register(13) end [ 13.763326] == blk_mq_debugfs_register(13) start [ 13.763972] == blk_mq_debugfs_register(13) q->debugfs_dir created [ 13.765167] == blk_mq_debugfs_register(13) end [ 13.779510] == blk_mq_debugfs_register(13) start [ 13.780522] == blk_mq_debugfs_register(13) q->debugfs_dir created [ 13.782338] == blk_mq_debugfs_register(13) end [ 13.783521] loop: module loaded LOOP_CTL_DEL(loop0) #1 [ 13.803550] = __blk_release_queue(4) start [ 13.807772] == blk_trace_shutdown(4) start [ 13.810749] == blk_trace_shutdown(4) end [ 13.813437] = __blk_release_queue(4) calling blk_mq_debugfs_unregister() [ 13.817593] ==== blk_mq_debugfs_unregister(4) begin [ 13.817621] ==== blk_mq_debugfs_unregister(4) debugfs_remove_recursive(q->debugfs_dir) [ 13.821203] ==== blk_mq_debugfs_unregister(4) end q->debugfs_dir is NULL [ 13.826166] = __blk_release_queue(4) blk_mq_debugfs_unregister() end [ 13.832992] = __blk_release_queue(4) end LOOP_CTL_ADD(loop0) #1 [ 13.843742] == blk_mq_debugfs_register(7) start [ 13.845569] == blk_mq_debugfs_register(7) q->debugfs_dir created [ 13.848628] == blk_mq_debugfs_register(7) end BLKTRACE_SETUP(loop0) #1 [ 13.850924] == blk_trace_ioctl(7, BLKTRACESETUP) start [ 13.852852] === do_blk_trace_setup(7) start [ 13.854580] === do_blk_trace_setup(7) creating directory [ 13.856620] === do_blk_trace_setup(7) using what debugfs_lookup() gave [ 13.860635] === do_blk_trace_setup(7) end with ret: 0 [ 13.862615] == blk_trace_ioctl(7, BLKTRACESETUP) end LOOP_CTL_DEL(loop0) #2 [ 13.883304] = __blk_release_queue(7) start [ 13.885324] == blk_trace_shutdown(7) start [ 13.887197] == blk_trace_shutdown(7) calling __blk_trace_remove() [ 13.889807] == __blk_trace_remove(7) start [ 13.891669] === blk_trace_cleanup(7) start [ 13.911656] ====== blk_trace_free(7) start LOOP_CTL_ADD(loop0) #2 [ 13.912709] == blk_mq_debugfs_register(2) start ---> From LOOP_CTL_DEL(loop0) #2 [ 13.915887] ====== blk_trace_free(7) end ---> From LOOP_CTL_ADD(loop0) #2 [ 13.918359] debugfs: Directory 'loop0' with parent 'block' already present! [ 13.926433] == blk_mq_debugfs_register(2) q->debugfs_dir created [ 13.930373] == blk_mq_debugfs_register(2) end BLKTRACE_SETUP(loop0) #2 [ 13.933961] == blk_trace_ioctl(2, BLKTRACESETUP) start [ 13.936758] === do_blk_trace_setup(2) start [ 13.938944] === do_blk_trace_setup(2) creating directory [ 13.941029] === do_blk_trace_setup(2) using what debugfs_lookup() gave ---> From LOOP_CTL_DEL(loop0) #2 [ 13.971046] === blk_trace_cleanup(7) end [ 13.973175] == __blk_trace_remove(7) end [ 13.975352] == blk_trace_shutdown(7) end [ 13.977415] = __blk_release_queue(7) calling blk_mq_debugfs_unregister() [ 13.980645] ==== blk_mq_debugfs_unregister(7) begin [ 13.980696] ==== blk_mq_debugfs_unregister(7) debugfs_remove_recursive(q->debugfs_dir) [ 13.983118] ==== blk_mq_debugfs_unregister(7) end q->debugfs_dir is NULL [ 13.986945] = __blk_release_queue(7) blk_mq_debugfs_unregister() end [ 13.993155] = __blk_release_queue(7) end ---> From BLKTRACE_SETUP(loop0) #2 [ 13.995928] === do_blk_trace_setup(2) end with ret: 0 [ 13.997623] == blk_trace_ioctl(2, BLKTRACESETUP) end LOOP_CTL_DEL(loop0) #3 [ 14.035119] = __blk_release_queue(2) start [ 14.036925] == blk_trace_shutdown(2) start [ 14.038518] == blk_trace_shutdown(2) calling __blk_trace_remove() [ 14.040829] == __blk_trace_remove(2) start [ 14.042413] === blk_trace_cleanup(2) start LOOP_CTL_ADD(loop0) #3 [ 14.072522] == blk_mq_debugfs_register(6) start ---> From LOOP_CTL_DEL(loop0) #3 [ 14.075151] ====== blk_trace_free(2) start ---> From LOOP_CTL_ADD(loop0) #3 [ 14.075882] == blk_mq_debugfs_register(6) q->debugfs_dir created ---> From LOOP_CTL_DEL(loop0) #3 [ 14.078624] BUG: kernel NULL pointer dereference, address: 00000000000000a0 [ 14.084332] == blk_mq_debugfs_register(6) end [ 14.086971] #PF: supervisor write access in kernel mode [ 14.086974] #PF: error_code(0x0002) - not-present page [ 14.086977] PGD 0 P4D 0 [ 14.086984] Oops: 0002 [#1] SMP NOPTI [ 14.086990] CPU: 2 PID: 287 Comm: kworker/2:2 Tainted: G E 5.6.0-next-20200403+ #54 [ 14.086991] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014 [ 14.087002] Workqueue: events __blk_release_queue [ 14.087011] RIP: 0010:down_write+0x15/0x40 [ 14.090300] == blk_trace_ioctl(6, BLKTRACESETUP) start [ 14.093277] Code: eb ca e8 3e 34 8d ff cc cc cc cc cc cc cc cc cc cc cc cc cc cc 0f 1f 44 00 00 55 48 89 fd e8 52 db ff ff 31 c0 ba 01 00 00 00 48 0f b1 55 00 75 0f 65 48 8b 04 25 c0 8b 01 00 48 89 45 08 5d [ 14.093280] RSP: 0018:ffffc28a00533da8 EFLAGS: 00010246 [ 14.093284] RAX: 0000000000000000 RBX: ffff9f7a24d07980 RCX: ffffff8100000000 [ 14.093286] RDX: 0000000000000001 RSI: ffffff8100000000 RDI: 00000000000000a0 [ 14.093287] RBP: 00000000000000a0 R08: 0000000000000000 R09: 0000000000000019 [ 14.093289] R10: 0000000000000774 R11: 0000000000000000 R12: 0000000000000000 [ 14.093291] R13: ffff9f7a2fab0400 R14: ffff9f7a21dd1140 R15: 00000000000000a0 [ 14.093294] FS: 0000000000000000(0000) GS:ffff9f7a2fa80000(0000) knlGS:0000000000000000 [ 14.093296] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 14.093298] CR2: 00000000000000a0 CR3: 00000004293d2003 CR4: 0000000000360ee0 [ 14.093307] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 14.093308] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 14.093310] Call Trace: [ 14.093324] simple_recursive_removal+0x4e/0x2e0 [ 14.093330] ? debugfs_remove+0x60/0x60 [ 14.093334] debugfs_remove+0x40/0x60 [ 14.093339] blk_trace_free+0x20/0x70 [ 14.093346] __blk_trace_remove+0x54/0x90 [ 14.096704] === do_blk_trace_setup(6) start [ 14.098534] blk_trace_shutdown+0x74/0x80 [ 14.100958] === do_blk_trace_setup(6) creating directory [ 14.104575] __blk_release_queue+0xbe/0x160 [ 14.104580] process_one_work+0x1b4/0x380 [ 14.104585] worker_thread+0x50/0x3c0 [ 14.104589] kthread+0xf9/0x130 [ 14.104593] ? process_one_work+0x380/0x380 [ 14.104596] ? kthread_park+0x90/0x90 [ 14.104599] ret_from_fork+0x1f/0x40 [ 14.104603] Modules linked in: loop(E) xfs(E) libcrc32c(E) crct10dif_pclmul(E) crc32_pclmul(E) ghash_clmulni_intel(E) joydev(E) serio_raw(E) aesni_intel(E) glue_helper(E) virtio_balloon(E) evdev(E) crypto_simd(E) pcspkr(E) cryptd(E) i6300esb(E) button(E) ip_tables(E) x_tables(E) autofs4(E) ext4(E) crc32c_generic(E) crc16(E) mbcache(E) jbd2(E) virtio_net(E) net_failover(E) failover(E) virtio_blk(E) ata_generic(E) uhci_hcd(E) ata_piix(E) ehci_hcd(E) nvme(E) libata(E) crc32c_intel(E) usbcore(E) psmouse(E) nvme_core(E) virtio_pci(E) scsi_mod(E) virtio_ring(E) t10_pi(E) virtio(E) i2c_piix4(E) floppy(E) [ 14.107400] === do_blk_trace_setup(6) using what debugfs_lookup() gave [ 14.108939] CR2: 00000000000000a0 [ 14.110589] === do_blk_trace_setup(6) end with ret: 0 [ 14.111592] ---[ end trace 7a783b33b9614db9 ]--- The root cause to this issue is that debugfs_lookup() can find a previous incarnation's dir of the same name which is about to get removed from a not yet schedule work. We can fix the UAF by simply using a debugfs directory which moving forward will always be accessible if debugfs is enabled, this way, its allocated and avaialble always for both request-based block drivers or make_request drivers (multiqueue) block drivers. This simplifies the code considerably, with the only penalty now being that we're always creating the request queue debugfs directory for the request-based block device drivers. The UAF then is not a core debugfs issue, but instead a misuse of debugfs, and this issue can only be triggered if you are root, and misuse blktrace. This issue can be reproduced with break-blktrace [2] using: break-blktrace -c 10 -d -s This patch fixes this issue. Note that there is also another respective UAF but from the ioctl path [3], this should also fix that issue. This patch then also disputes the severity of CVE-2019-19770 as this issue is only possible by being root and using blktrace. It is not a core debugfs issue. [0] https://bugzilla.kernel.org/show_bug.cgi?id=205713 [1] https://nvd.nist.gov/vuln/detail/CVE-2019-19770 [2] https://github.com/mcgrof/break-blktrace [3] https://lore.kernel.org/lkml/000000000000ec635b059f752700@google.com/ Cc: Bart Van Assche Cc: Omar Sandoval Cc: Hannes Reinecke Cc: Nicolai Stange Cc: Greg Kroah-Hartman Cc: Michal Hocko Cc: yu kuai Reported-by: syzbot+603294af2d01acfdd6da@syzkaller.appspotmail.com Fiexes: 6ac93117ab00 ("blktrace: use existing disk debugfs directory") Signed-off-by: Luis Chamberlain --- block/blk-debugfs.c | 12 ++++++++++++ block/blk-mq-debugfs.c | 5 ----- block/blk-sysfs.c | 3 +++ block/blk.h | 10 ++++++++++ include/linux/blkdev.h | 4 +++- include/linux/blktrace_api.h | 1 - kernel/trace/blktrace.c | 19 ++++++++----------- 7 files changed, 36 insertions(+), 18 deletions(-) diff --git a/block/blk-debugfs.c b/block/blk-debugfs.c index 634dea4b1507..a8b343e758e4 100644 --- a/block/blk-debugfs.c +++ b/block/blk-debugfs.c @@ -13,3 +13,15 @@ void blk_debugfs_register(void) { blk_debugfs_root = debugfs_create_dir("block", NULL); } + +void blk_q_debugfs_register(struct request_queue *q) +{ + q->debugfs_dir = debugfs_create_dir(kobject_name(q->kobj.parent), + blk_debugfs_root); +} + +void blk_q_debugfs_unregister(struct request_queue *q) +{ + debugfs_remove_recursive(q->debugfs_dir); + q->debugfs_dir = NULL; +} diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c index b3f2ba483992..bda9378eab90 100644 --- a/block/blk-mq-debugfs.c +++ b/block/blk-mq-debugfs.c @@ -823,9 +823,6 @@ void blk_mq_debugfs_register(struct request_queue *q) struct blk_mq_hw_ctx *hctx; int i; - q->debugfs_dir = debugfs_create_dir(kobject_name(q->kobj.parent), - blk_debugfs_root); - debugfs_create_files(q->debugfs_dir, q, blk_mq_debugfs_queue_attrs); /* @@ -856,9 +853,7 @@ void blk_mq_debugfs_register(struct request_queue *q) void blk_mq_debugfs_unregister(struct request_queue *q) { - debugfs_remove_recursive(q->debugfs_dir); q->sched_debugfs_dir = NULL; - q->debugfs_dir = NULL; } static void blk_mq_debugfs_register_ctx(struct blk_mq_hw_ctx *hctx, diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c index fca9b158f4a0..20f20b0fa0b9 100644 --- a/block/blk-sysfs.c +++ b/block/blk-sysfs.c @@ -895,6 +895,7 @@ static void __blk_release_queue(struct work_struct *work) blk_trace_shutdown(q); + blk_q_debugfs_unregister(q); if (queue_is_mq(q)) blk_mq_debugfs_unregister(q); @@ -975,6 +976,8 @@ int blk_register_queue(struct gendisk *disk) goto unlock; } + blk_q_debugfs_register(q); + if (queue_is_mq(q)) { __blk_mq_register_dev(dev, q); blk_mq_debugfs_register(q); diff --git a/block/blk.h b/block/blk.h index 86a66b614f08..b86123a2d74f 100644 --- a/block/blk.h +++ b/block/blk.h @@ -489,10 +489,20 @@ int __bio_add_pc_page(struct request_queue *q, struct bio *bio, bool *same_page); #ifdef CONFIG_DEBUG_FS void blk_debugfs_register(void); +void blk_q_debugfs_register(struct request_queue *q); +void blk_q_debugfs_unregister(struct request_queue *q); #else static inline void blk_debugfs_register(void) { } + +static inline void blk_q_debugfs_register(struct request_queue *q) +{ +} + +static inline void blk_q_debugfs_unregister(struct request_queue *q) +{ +} #endif /* CONFIG_DEBUG_FS */ #endif /* BLK_INTERNAL_H */ diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index 32868fbedc9e..8b1cab52cef9 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -569,8 +569,10 @@ struct request_queue { struct list_head tag_set_list; struct bio_set bio_split; -#ifdef CONFIG_BLK_DEBUG_FS +#ifdef CONFIG_DEBUG_FS struct dentry *debugfs_dir; +#endif +#ifdef CONFIG_BLK_DEBUG_FS struct dentry *sched_debugfs_dir; struct dentry *rqos_debugfs_dir; #endif diff --git a/include/linux/blktrace_api.h b/include/linux/blktrace_api.h index 3b6ff5902edc..eb6db276e293 100644 --- a/include/linux/blktrace_api.h +++ b/include/linux/blktrace_api.h @@ -22,7 +22,6 @@ struct blk_trace { u64 end_lba; u32 pid; u32 dev; - struct dentry *dir; struct dentry *dropped_file; struct dentry *msg_file; struct list_head running_list; diff --git a/kernel/trace/blktrace.c b/kernel/trace/blktrace.c index ca39dc3230cb..15086227592f 100644 --- a/kernel/trace/blktrace.c +++ b/kernel/trace/blktrace.c @@ -311,7 +311,6 @@ static void blk_trace_free(struct blk_trace *bt) debugfs_remove(bt->msg_file); debugfs_remove(bt->dropped_file); relay_close(bt->rchan); - debugfs_remove(bt->dir); free_percpu(bt->sequence); free_percpu(bt->msg_data); kfree(bt); @@ -476,7 +475,6 @@ static int do_blk_trace_setup(struct request_queue *q, char *name, dev_t dev, struct blk_user_trace_setup *buts) { struct blk_trace *bt = NULL; - struct dentry *dir = NULL; int ret; if (!buts->buf_size || !buts->buf_nr) @@ -485,6 +483,9 @@ static int do_blk_trace_setup(struct request_queue *q, char *name, dev_t dev, if (!blk_debugfs_root) return -ENOENT; + if (!q->debugfs_dir) + return -ENOENT; + strncpy(buts->name, name, BLKTRACE_BDEV_SIZE); buts->name[BLKTRACE_BDEV_SIZE - 1] = '\0'; @@ -509,21 +510,19 @@ static int do_blk_trace_setup(struct request_queue *q, char *name, dev_t dev, ret = -ENOENT; - dir = debugfs_lookup(buts->name, blk_debugfs_root); - if (!dir) - bt->dir = dir = debugfs_create_dir(buts->name, blk_debugfs_root); - bt->dev = dev; atomic_set(&bt->dropped, 0); INIT_LIST_HEAD(&bt->running_list); ret = -EIO; - bt->dropped_file = debugfs_create_file("dropped", 0444, dir, bt, + bt->dropped_file = debugfs_create_file("dropped", 0444, + q->debugfs_dir, bt, &blk_dropped_fops); - bt->msg_file = debugfs_create_file("msg", 0222, dir, bt, &blk_msg_fops); + bt->msg_file = debugfs_create_file("msg", 0222, q->debugfs_dir, + bt, &blk_msg_fops); - bt->rchan = relay_open("trace", dir, buts->buf_size, + bt->rchan = relay_open("trace", q->debugfs_dir, buts->buf_size, buts->buf_nr, &blk_relay_callbacks, bt); if (!bt->rchan) goto err; @@ -551,8 +550,6 @@ static int do_blk_trace_setup(struct request_queue *q, char *name, dev_t dev, ret = 0; err: - if (dir && !bt->dir) - dput(dir); if (ret) blk_trace_free(bt); return ret; From patchwork Thu Apr 9 21:45:28 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Luis Chamberlain X-Patchwork-Id: 11482491 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 7357414B4 for ; Thu, 9 Apr 2020 21:45:39 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 49B992072F for ; Thu, 9 Apr 2020 21:45:39 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 49B992072F Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=kernel.org Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id D6E9C8E0019; Thu, 9 Apr 2020 17:45:36 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id C81EA8E0003; Thu, 9 Apr 2020 17:45:36 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id AFA308E0019; Thu, 9 Apr 2020 17:45:36 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0043.hostedemail.com [216.40.44.43]) by kanga.kvack.org (Postfix) with ESMTP id A387B8E0003 for ; Thu, 9 Apr 2020 17:45:36 -0400 (EDT) Received: from smtpin14.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay05.hostedemail.com (Postfix) with ESMTP id 73F99181AEF3C for ; Thu, 9 Apr 2020 21:45:36 +0000 (UTC) X-FDA: 76689648672.14.peace91_6b1f82716c838 X-Spam-Summary: 2,0,0,fbd19fb11c77de36,d41d8cd98f00b204,mcgrof@gmail.com,,RULES_HIT:41:355:379:541:800:960:973:988:989:1260:1311:1314:1345:1359:1437:1515:1534:1541:1711:1730:1747:1777:1792:2393:2559:2562:2903:3138:3139:3140:3141:3142:3352:3865:3866:3867:3868:3870:3871:3873:3874:4321:5007:6120:6261:6742:9389:10004:11026:11658:11914:12043:12048:12296:12297:12517:12519:12555:12895:12986:13069:13311:13357:13894:14096:14181:14384:14721:21080:21212:21444:21451:21627:21990:30051:30054:30064,0,RBL:209.85.216.65:@gmail.com:.lbl8.mailshell.net-62.50.0.100 66.100.201.100,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:fp,MSBL:0,DNSBL:none,Custom_rules:0:0:0,LFtime:17,LUA_SUMMARY:none X-HE-Tag: peace91_6b1f82716c838 X-Filterd-Recvd-Size: 4149 Received: from mail-pj1-f65.google.com (mail-pj1-f65.google.com [209.85.216.65]) by imf01.hostedemail.com (Postfix) with ESMTP for ; Thu, 9 Apr 2020 21:45:36 +0000 (UTC) Received: by mail-pj1-f65.google.com with SMTP id kx8so24042pjb.5 for ; Thu, 09 Apr 2020 14:45:35 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=eoDxljTp5vbr3aoHKkBfKXVECAcC7qfX/D5k3HZf/M0=; b=uZc7ICEMPYg/NpNnFjUS2mRzD6J9Ll6ctvKu+PbJzIuA5acR33SNEisDX2OyX8+AGT kGCkEP9ESspLviqt0PtpHq99lVmd/2agPMAmJUEoE8zaZBKNI+6p8mq1IE1siwQhthhy NmbE4DEpjaXaYT2kQBImePfMK4Wt0cVvMS6nHo6jEpxn8yjo9E0U83usAF94+c1hD3Xr VePD0CSMXjogdCfgarBBhvcf10L54RyS6ddSjnBAjzibt/Cv7+fkf2h8SD9YMVJZu3JT gHJg5NXe7bojD+yup+MMO5kfFj03WCDsoWDkPcCy8OjgwRE0Vpw0qzkHTm3IzJUKkyre qhQA== X-Gm-Message-State: AGi0Puab+jo/y+Wsk1XICvizilbJhNBBw2kSq7Fht+MHxalh8nwWAT4s VlNvjQ20RqvNcMRD+sRD6ITYiFBwZew= X-Google-Smtp-Source: APiQypJjM7OsXoNAVoK/beb/tphg54XlEChQn9RFbJNqZgnvlVR9rMP+bWm17T1MuhZd12Ok+Iemmg== X-Received: by 2002:a17:902:ba86:: with SMTP id k6mr1675518pls.47.1586468735123; Thu, 09 Apr 2020 14:45:35 -0700 (PDT) Received: from 42.do-not-panic.com (42.do-not-panic.com. [157.230.128.187]) by smtp.gmail.com with ESMTPSA id u186sm42297pfu.205.2020.04.09.14.45.32 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 09 Apr 2020 14:45:32 -0700 (PDT) Received: by 42.do-not-panic.com (Postfix, from userid 1000) id 4DC0C41D00; Thu, 9 Apr 2020 21:45:32 +0000 (UTC) From: Luis Chamberlain To: axboe@kernel.dk, viro@zeniv.linux.org.uk, gregkh@linuxfoundation.org, rostedt@goodmis.org, mingo@redhat.com, jack@suse.cz, ming.lei@redhat.com, nstange@suse.de, akpm@linux-foundation.org Cc: mhocko@suse.com, yukuai3@huawei.com, linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Luis Chamberlain , Bart Van Assche , Omar Sandoval , Hannes Reinecke , Michal Hocko Subject: [RFC v2 3/5] blktrace: ref count the request_queue during ioctl Date: Thu, 9 Apr 2020 21:45:28 +0000 Message-Id: <20200409214530.2413-4-mcgrof@kernel.org> X-Mailer: git-send-email 2.23.0.rc1 In-Reply-To: <20200409214530.2413-1-mcgrof@kernel.org> References: <20200409214530.2413-1-mcgrof@kernel.org> MIME-Version: 1.0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Ensure that the reqest_queue ref counts the request_queue during its full ioctl cycle. This avoids possible races against removal, given blk_get_queue() also checks to ensure the queue is not dying. This small race is possible if you defer removal of the reqest_queue and userspace fires off an ioctl for the device in the meantime. Cc: Bart Van Assche Cc: Omar Sandoval Cc: Hannes Reinecke Cc: Nicolai Stange Cc: Greg Kroah-Hartman Cc: Michal Hocko Cc: yu kuai Signed-off-by: Luis Chamberlain Reviewed-by: Bart Van Assche --- kernel/trace/blktrace.c | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/kernel/trace/blktrace.c b/kernel/trace/blktrace.c index 15086227592f..17e144d15779 100644 --- a/kernel/trace/blktrace.c +++ b/kernel/trace/blktrace.c @@ -701,6 +701,9 @@ int blk_trace_ioctl(struct block_device *bdev, unsigned cmd, char __user *arg) if (!q) return -ENXIO; + if (!blk_get_queue(q)) + return -ENXIO; + mutex_lock(&q->blk_trace_mutex); switch (cmd) { @@ -729,6 +732,9 @@ int blk_trace_ioctl(struct block_device *bdev, unsigned cmd, char __user *arg) } mutex_unlock(&q->blk_trace_mutex); + + blk_put_queue(q); + return ret; } From patchwork Thu Apr 9 21:45:29 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Luis Chamberlain X-Patchwork-Id: 11482497 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 7AD5D92A for ; Thu, 9 Apr 2020 21:45:44 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 47A8E2072F for ; Thu, 9 Apr 2020 21:45:44 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 47A8E2072F Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=kernel.org Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 88B9F8E001B; Thu, 9 Apr 2020 17:45:39 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 7F18B8E0003; Thu, 9 Apr 2020 17:45:39 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 6B8D98E001B; Thu, 9 Apr 2020 17:45:39 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0178.hostedemail.com [216.40.44.178]) by kanga.kvack.org (Postfix) with ESMTP id 610518E0003 for ; Thu, 9 Apr 2020 17:45:39 -0400 (EDT) Received: from smtpin08.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay03.hostedemail.com (Postfix) with ESMTP id 2BBB9824556B for ; Thu, 9 Apr 2020 21:45:39 +0000 (UTC) X-FDA: 76689648798.08.sun75_6b84e3648dd26 X-Spam-Summary: 2,0,0,ae733ef94166599e,d41d8cd98f00b204,mcgrof@gmail.com,,RULES_HIT:41:355:379:541:800:960:966:968:973:988:989:1260:1311:1314:1345:1359:1437:1515:1535:1542:1711:1730:1747:1777:1792:2194:2196:2198:2199:2200:2201:2393:2559:2562:2693:2736:2898:2903:3138:3139:3140:3141:3142:3353:3865:3866:3867:3868:3870:3871:3872:4321:4385:5007:6120:6261:6742:7903:8660:9389:10004:11026:11658:11914:12043:12048:12114:12296:12297:12438:12517:12519:12555:12895:13148:13230:13255:13894:14181:14721:14819:21080:21433:21444:21451:21627:21795:21987:21990:30012:30051:30054:30064:30070,0,RBL:209.85.210.195:@gmail.com:.lbl8.mailshell.net-62.50.0.100 66.100.201.100,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:fp,MSBL:0,DNSBL:none,Custom_rules:0:0:0,LFtime:14,LUA_SUMMARY:none X-HE-Tag: sun75_6b84e3648dd26 X-Filterd-Recvd-Size: 5145 Received: from mail-pf1-f195.google.com (mail-pf1-f195.google.com [209.85.210.195]) by imf26.hostedemail.com (Postfix) with ESMTP for ; Thu, 9 Apr 2020 21:45:38 +0000 (UTC) Received: by mail-pf1-f195.google.com with SMTP id l1so129004pff.10 for ; Thu, 09 Apr 2020 14:45:38 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=Zps5dpYPJW545FaPYLeIjK5QW6MnaSH8AGIMhRszEiQ=; b=JmxQX0r68xHgOcQz6mAKmA4QMhg2kv/yTjR7g++ZCCjjchdJiPNDfXkCjl4n6Rvoyb iQVIFSafITqqd5MUsoWJFnTcS0S+MdaJ0I5kkJlaAYBz9N73e00y96LDEF6jiY3I9OHb InGXrVYvGFe63ATD0YOidGXEuydPYQbfOGhGYe2FycbYngX6eSt/F305TM4RwhIqTdhN TRtdXuKN8cr/WX8MylQMJYJIDObjJAXZh3/EEmdkCW/KLeL/PnQ+8pA4m+K9EsEPG3by sda8aQBqeMJxtMiBLU4QW84hrUruBHKV12LuylZIly3fG+GVE7mksnibiGR/7Lw1u6J/ q8rQ== X-Gm-Message-State: AGi0Pua6ecHA3/OERJpLICdBq2ShU0aT5yVWjFpCE0c+7R4F49M1dBrI MQaVVuTrEV3CLYLThZEcn+8= X-Google-Smtp-Source: APiQypKG6gQPVfMD+0afCeX7kVugM0CXqtzTx9olvPQK5ZLuzXpo7Y7xI9ctxu781oEvpMwFytQxKA== X-Received: by 2002:a63:31c4:: with SMTP id x187mr1464572pgx.56.1586468737362; Thu, 09 Apr 2020 14:45:37 -0700 (PDT) Received: from 42.do-not-panic.com (42.do-not-panic.com. [157.230.128.187]) by smtp.gmail.com with ESMTPSA id mn18sm124100pjb.13.2020.04.09.14.45.32 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 09 Apr 2020 14:45:32 -0700 (PDT) Received: by 42.do-not-panic.com (Postfix, from userid 1000) id 5DA2941DAA; Thu, 9 Apr 2020 21:45:32 +0000 (UTC) From: Luis Chamberlain To: axboe@kernel.dk, viro@zeniv.linux.org.uk, gregkh@linuxfoundation.org, rostedt@goodmis.org, mingo@redhat.com, jack@suse.cz, ming.lei@redhat.com, nstange@suse.de, akpm@linux-foundation.org Cc: mhocko@suse.com, yukuai3@huawei.com, linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Luis Chamberlain , Bart Van Assche , Omar Sandoval , Hannes Reinecke , Michal Hocko Subject: [RFC v2 4/5] mm/swapfile: refcount block and queue before using blkcg_schedule_throttle() Date: Thu, 9 Apr 2020 21:45:29 +0000 Message-Id: <20200409214530.2413-5-mcgrof@kernel.org> X-Mailer: git-send-email 2.23.0.rc1 In-Reply-To: <20200409214530.2413-1-mcgrof@kernel.org> References: <20200409214530.2413-1-mcgrof@kernel.org> MIME-Version: 1.0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: block device are refcounted so to ensure once its final user goes away it can be cleaned up by the lower layers properly. The block device's request_queue structure is also refcounted, however if the last blk_put_queue() is called under atomic context the block layer has to defer removal. By refcounting the block device during the use of blkcg_schedule_throttle(), we ensure ensure two things: 1) the block device remains available during the call 2) we ensure avoid having to deal with the fact we're using the request_queue structure in atomic context, since the last blk_put_queue() will be called upon disk_release(), *after* our own bdput(). This means this code path is *not* going to remove the request_queue structure, as we are ensuring some later upper layer disk_release() will be the one to release the request_queue structure for us. Cc: Bart Van Assche Cc: Omar Sandoval Cc: Hannes Reinecke Cc: Nicolai Stange Cc: Greg Kroah-Hartman Cc: Michal Hocko Cc: yu kuai Signed-off-by: Luis Chamberlain --- mm/swapfile.c | 11 +++++++++++ 1 file changed, 11 insertions(+) diff --git a/mm/swapfile.c b/mm/swapfile.c index 6659ab563448..5f6f3a61b5c0 100644 --- a/mm/swapfile.c +++ b/mm/swapfile.c @@ -3753,6 +3753,7 @@ static void free_swap_count_continuations(struct swap_info_struct *si) void mem_cgroup_throttle_swaprate(struct mem_cgroup *memcg, int node, gfp_t gfp_mask) { + struct block_device *bdev; struct swap_info_struct *si, *next; if (!(gfp_mask & __GFP_IO) || !memcg) return; @@ -3771,8 +3772,18 @@ void mem_cgroup_throttle_swaprate(struct mem_cgroup *memcg, int node, plist_for_each_entry_safe(si, next, &swap_avail_heads[node], avail_lists[node]) { if (si->bdev) { + bdev = bdgrab(si->bdev); + if (!bdev) + continue; + /* + * By adding our own bdgrab() we ensure the queue + * sticks around until disk_release(), and so we ensure + * our release of the request_queue does not happen in + * atomic context. + */ blkcg_schedule_throttle(bdev_get_queue(si->bdev), true); + bdput(bdev); break; } } From patchwork Thu Apr 9 21:45:30 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Luis Chamberlain X-Patchwork-Id: 11482501 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 193B814B4 for ; Thu, 9 Apr 2020 21:45:49 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id DA11E2072F for ; Thu, 9 Apr 2020 21:45:48 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org DA11E2072F Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=kernel.org Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id B4CE28E001D; Thu, 9 Apr 2020 17:45:41 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id A86DE8E0003; Thu, 9 Apr 2020 17:45:41 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 8D8ED8E001D; Thu, 9 Apr 2020 17:45:41 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0039.hostedemail.com [216.40.44.39]) by kanga.kvack.org (Postfix) with ESMTP id 81AB88E0003 for ; Thu, 9 Apr 2020 17:45:41 -0400 (EDT) Received: from smtpin14.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay02.hostedemail.com (Postfix) with ESMTP id 526F82470 for ; Thu, 9 Apr 2020 21:45:41 +0000 (UTC) X-FDA: 76689648882.14.pull75_6bd76474d4c5f X-Spam-Summary: 2,0,0,c976883f82d22e82,d41d8cd98f00b204,mcgrof@gmail.com,,RULES_HIT:2:41:355:379:541:800:960:966:968:973:982:988:989:1260:1311:1314:1345:1359:1437:1515:1535:1605:1606:1730:1747:1777:1792:2196:2198:2199:2200:2393:2553:2559:2562:2903:3138:3139:3140:3141:3142:3865:3866:3867:3868:3870:3871:3873:3874:4119:4250:4321:4385:4605:5007:6120:6261:6742:7875:7903:8660:9389:9592:10004:11026:11232:11473:11658:11914:12043:12048:12050:12291:12296:12297:12438:12517:12519:12555:12683:12895:13141:13146:13148:13230:13894:13972:21080:21212:21433:21444:21451:21627:21740:21972:21987:21990:30012:30054:30064:30070:30090:30091,0,RBL:209.85.215.196:@gmail.com:.lbl8.mailshell.net-66.100.201.100 62.50.0.100,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:fp,MSBL:0,DNSBL:none,Custom_rules:0:0:0,LFtime:14,LUA_SUMMARY:none X-HE-Tag: pull75_6bd76474d4c5f X-Filterd-Recvd-Size: 8322 Received: from mail-pg1-f196.google.com (mail-pg1-f196.google.com [209.85.215.196]) by imf36.hostedemail.com (Postfix) with ESMTP for ; Thu, 9 Apr 2020 21:45:40 +0000 (UTC) Received: by mail-pg1-f196.google.com with SMTP id p8so101533pgi.5 for ; Thu, 09 Apr 2020 14:45:40 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=RyMpnPWw/Y2hF5e4RISVaGw9GiKX0YRBleN8UhVQook=; b=Q9VBE9H/jA+EvTlAhCQb5W/JPw0TyHyNi6BBbEBkoz0pw69a5qGOBZOBf3AKxifGTZ SFSpKYpw4DU7ySfM4e3OWekOTTWk/9bN0YLoM21CT3cOe+4OudMumqXXVNnDr45geJ5W vT2IRrvRh64RPShFcWtOj4IMoXBHVruq9LHSabQ1066YmCb3s2L7hpWsdxlw6IDrV6yR 2UcO6iL69dA/RHDseRWSXTI/tt3ttKn6OUZi2Wagzeie/nS21vy1q74byDmpQ6U4qFhO yyHbzQKku4xx5QR8Bvd0zR5ENlqvNJNX2WBPLbhNdUTDeLYDF8cP8nNJexyTPoruadRk CBYw== X-Gm-Message-State: AGi0PubP8QMpdnW66dH+51IrL6sawreL6mgdtVh1vWZzVzfOQL4JCLEh mVK0A2nWYts9vnH9mtgh7LI= X-Google-Smtp-Source: APiQypL2c8CK9HLPDrIBFDxSQIJ2+qEM1rtn9Tgz5zxuYIUXNV4bsvZbJNQbtegiTbkBK3QjOXl2Dw== X-Received: by 2002:a63:d512:: with SMTP id c18mr1420081pgg.347.1586468740119; Thu, 09 Apr 2020 14:45:40 -0700 (PDT) Received: from 42.do-not-panic.com (42.do-not-panic.com. [157.230.128.187]) by smtp.gmail.com with ESMTPSA id w134sm65241pfd.41.2020.04.09.14.45.34 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 09 Apr 2020 14:45:37 -0700 (PDT) Received: by 42.do-not-panic.com (Postfix, from userid 1000) id 7115241DAB; Thu, 9 Apr 2020 21:45:32 +0000 (UTC) From: Luis Chamberlain To: axboe@kernel.dk, viro@zeniv.linux.org.uk, gregkh@linuxfoundation.org, rostedt@goodmis.org, mingo@redhat.com, jack@suse.cz, ming.lei@redhat.com, nstange@suse.de, akpm@linux-foundation.org Cc: mhocko@suse.com, yukuai3@huawei.com, linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Luis Chamberlain , Bart Van Assche , Omar Sandoval , Hannes Reinecke , Michal Hocko Subject: [RFC v2 5/5] block: revert back to synchronous request_queue removal Date: Thu, 9 Apr 2020 21:45:30 +0000 Message-Id: <20200409214530.2413-6-mcgrof@kernel.org> X-Mailer: git-send-email 2.23.0.rc1 In-Reply-To: <20200409214530.2413-1-mcgrof@kernel.org> References: <20200409214530.2413-1-mcgrof@kernel.org> MIME-Version: 1.0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Commit dc9edc44de6c ("block: Fix a blk_exit_rl() regression") merged on v4.12 moved the work behind blk_release_queue() into a workqueue after a splat floated around which indicated some work on blk_release_queue() could sleep in blk_exit_rl(). This splat would be possible when a driver called blk_put_queue() or blk_cleanup_queue() (which calls blk_put_queue() as its final call) from an atomic context. blk_put_queue() puts decrements the refcount for the request_queue kobject, and upon reaching 0 blk_release_queue() is called. Although blk_exit_rl() is now removed through commit db6d9952356 ("block: remove request_list code"), we reserve the right to be able to sleep within blk_release_queue() context. There should be little reason to defer removal from atomic context these days, as you can always just increase your block device's reference count even in atomic context and leave the removal for the request_queue to the upper layers later. However if you really need to defer removal of the request_queue, you can set the queue flag QUEUE_FLAG_DEFER_REMOVAL now. Cc: Bart Van Assche Cc: Omar Sandoval Cc: Hannes Reinecke Cc: Nicolai Stange Cc: Greg Kroah-Hartman Cc: Michal Hocko Cc: yu kuai Signed-off-by: Luis Chamberlain --- block/blk-sysfs.c | 40 ++++++++++++++++++++++++++++++++-------- include/linux/blkdev.h | 3 +++ 2 files changed, 35 insertions(+), 8 deletions(-) diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c index 20f20b0fa0b9..2ae8c39c88ef 100644 --- a/block/blk-sysfs.c +++ b/block/blk-sysfs.c @@ -860,10 +860,9 @@ static void blk_exit_queue(struct request_queue *q) bdi_put(q->backing_dev_info); } - /** - * __blk_release_queue - release a request queue - * @work: pointer to the release_work member of the request queue to be released + * blk_release_queue_sync- release a request queue + * @q: pointer to the request queue to be released * * Description: * This function is called when a block device is being unregistered. The @@ -872,11 +871,27 @@ static void blk_exit_queue(struct request_queue *q) * the reference counter of the request queue. Once the reference counter * of the request queue reaches zero, blk_release_queue is called to release * all allocated resources of the request queue. + * + * There are two approaches to releasing the request queue, by default + * we reserve the right to sleep on release and so release is synchronous. + * If you know the path under which blk_cleanup_queue() or your last + * blk_put_queue() is called can be called in atomic context you want to + * ensure to defer the removal by setting the QUEUE_FLAG_DEFER_REMOVAL + * flag as follows upon initialization: + * + * blk_queue_flag_set(QUEUE_FLAG_DEFER_REMOVAL, q) + * + * Note that deferring removal may have implications for userspace. An + * example is if you are using an ioctl to allow removal of a block device, + * and the kernel returns immediately even though the device may only + * disappear after the full removal is completed. + * + * You should also be able to work around this by just increasing the + * refcount for the block device instead during your atomic operation, + * and so QUEUE_FLAG_DEFER_REMOVAL should almost never be required. */ -static void __blk_release_queue(struct work_struct *work) +static void blk_release_queue_sync(struct request_queue *q) { - struct request_queue *q = container_of(work, typeof(*q), release_work); - if (test_bit(QUEUE_FLAG_POLL_STATS, &q->queue_flags)) blk_stat_remove_callback(q, q->poll_cb); blk_stat_free_callback(q->poll_cb); @@ -905,13 +920,22 @@ static void __blk_release_queue(struct work_struct *work) call_rcu(&q->rcu_head, blk_free_queue_rcu); } +void __blk_release_queue(struct work_struct *work) +{ + struct request_queue *q = container_of(work, typeof(*q), release_work); + + blk_release_queue_sync(q); +} + static void blk_release_queue(struct kobject *kobj) { struct request_queue *q = container_of(kobj, struct request_queue, kobj); - INIT_WORK(&q->release_work, __blk_release_queue); - schedule_work(&q->release_work); + if (blk_queue_defer_removal(q)) + schedule_work(&q->release_work); + else + blk_release_queue_sync(q); } static const struct sysfs_ops queue_sysfs_ops = { diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index 8b1cab52cef9..46fee1ef92e3 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -614,6 +614,7 @@ struct request_queue { #define QUEUE_FLAG_PCI_P2PDMA 25 /* device supports PCI p2p requests */ #define QUEUE_FLAG_ZONE_RESETALL 26 /* supports Zone Reset All */ #define QUEUE_FLAG_RQ_ALLOC_TIME 27 /* record rq->alloc_time_ns */ +#define QUEUE_FLAG_DEFER_REMOVAL 28 /* defer queue removal */ #define QUEUE_FLAG_MQ_DEFAULT ((1 << QUEUE_FLAG_IO_STAT) | \ (1 << QUEUE_FLAG_SAME_COMP)) @@ -648,6 +649,8 @@ bool blk_queue_flag_test_and_set(unsigned int flag, struct request_queue *q); #else #define blk_queue_rq_alloc_time(q) false #endif +#define blk_queue_defer_removal(q) \ + test_bit(QUEUE_FLAG_DEFER_REMOVAL, &(q)->queue_flags) #define blk_noretry_request(rq) \ ((rq)->cmd_flags & (REQ_FAILFAST_DEV|REQ_FAILFAST_TRANSPORT| \