Message ID | 20200429074627.5955-5-mcgrof@kernel.org (mailing list archive) |
---|---|
State | New, archived |
Series | block: fix blktrace debugfs use after free |
On Wed, Apr 29, 2020 at 07:46:25AM +0000, Luis Chamberlain wrote:
> --- a/block/blk-debugfs.c
> +++ b/block/blk-debugfs.c
> @@ -13,3 +13,32 @@ void blk_debugfs_register(void)
>  {
>  	blk_debugfs_root = debugfs_create_dir("block", NULL);
>  }
> +
> +static struct dentry *blk_debugfs_dir_register(const char *name)
> +{
> +	return debugfs_create_dir(name, blk_debugfs_root);
> +}

Nit, that function is not needed at all, just spell out the call to
debugfs_create_dir() in the 2 places below you call it. That will
result in less lines of code overall :)

> -	dir = blk_trace_debugfs_dir(buts, bt);
> +	dir = blk_trace_debugfs_dir(bdev, q);
> +	if (WARN_ON(!dir))
> +		goto err;

With panic-on-warn you just rebooted the box, lovely :(

I said previously, that if you _REALLY_ wanted to warn about this, or do
something different based on the result of a debugfs call, then you can,
but you need to comment the heck out of it as to why you are doing so,
otherwise I'm just going to catch it in my tree-wide sweeps and end up
removing it.

Other than those two nits, this looks _much_ better, thanks for doing
this.

greg k-h
I can't say I'm a fan of all these long backtraces in commit logs..

> +static struct dentry *blk_debugfs_dir_register(const char *name)
> +{
> +	return debugfs_create_dir(name, blk_debugfs_root);
> +}

I don't think we really need this helper.

> +void blk_part_debugfs_unregister(struct hd_struct *p)
> +{
> +	debugfs_remove_recursive(p->debugfs_dir);
> +	p->debugfs_dir = NULL;
> +}

Why do we need to clear the pointer here?

> +#ifdef CONFIG_DEBUG_FS
> +	/* Currently only used by kernel/trace/blktrace.c */
> +	struct dentry *debugfs_dir;
> +#endif

Does that comment really add value?

> +static struct dentry *blk_trace_debugfs_dir(struct block_device *bdev,
> +					    struct request_queue *q)
>  {
> +	struct hd_struct *p = NULL;
>
> +	 * Some drivers like scsi-generic use a NULL block device. For
> +	 * other drivers when bdev != bdev->bd_contain we are doing a blktrace
> +	 * on a parition, otherwise we know we are working on the whole
> +	 * disk, and for that the request_queue already has its own debugfs_dir.
> +	 * which we have been using for other things other than blktrace.
> +	 */
> +	if (bdev && bdev != bdev->bd_contains)
> +		p = bdev->bd_part;
>
> +	if (p)
> +		return p->debugfs_dir;
> +
> +	return q->debugfs_dir;

This could be simplified down to:

	if (bdev && bdev != bdev->bd_contains)
		return bdev->bd_part->debugfs_dir;
	return q->debugfs_dir;

Given that bd_part is in __blkdev_get very near bd_contains.

Also given that this patch completely rewrites blk_trace_debugfs_dir is
there any point in the previous patch?

> @@ -491,6 +500,7 @@ static int do_blk_trace_setup(struct request_queue *q, char *name, dev_t dev,
>  	struct dentry *dir = NULL;
>  	int ret;
>
> +
>  	if (!buts->buf_size || !buts->buf_nr)
>  		return -EINVAL;
>

Spurious whitespace change.
On Wed, Apr 29, 2020 at 04:26:37AM -0700, Christoph Hellwig wrote:
> I can't say I'm a fan of all these long backtraces in commit logs..
>
> > +static struct dentry *blk_debugfs_dir_register(const char *name)
> > +{
> > +	return debugfs_create_dir(name, blk_debugfs_root);
> > +}
>
> I don't think we really need this helper.

We don't export blk_debugfs_root, didn't think we'd want to, and
since only a few scew funky drivers would use the struct gendisk
and also support BLKTRACE, I didn't think we'd want to export it
now.

A new block private symbol namespace alright?

> > +void blk_part_debugfs_unregister(struct hd_struct *p)
> > +{
> > +	debugfs_remove_recursive(p->debugfs_dir);
> > +	p->debugfs_dir = NULL;
> > +}
>
> Why do we need to clear the pointer here?

True, not needed for partition.

> > +#ifdef CONFIG_DEBUG_FS
> > +	/* Currently only used by kernel/trace/blktrace.c */
> > +	struct dentry *debugfs_dir;
> > +#endif
>
> Does that comment really add value?

I'll nuke it.

> > +static struct dentry *blk_trace_debugfs_dir(struct block_device *bdev,
> > +					    struct request_queue *q)
> >  {
> > +	struct hd_struct *p = NULL;
> >
> > +	 * Some drivers like scsi-generic use a NULL block device. For
> > +	 * other drivers when bdev != bdev->bd_contain we are doing a blktrace
> > +	 * on a parition, otherwise we know we are working on the whole
> > +	 * disk, and for that the request_queue already has its own debugfs_dir.
> > +	 * which we have been using for other things other than blktrace.
> > +	 */
> > +	if (bdev && bdev != bdev->bd_contains)
> > +		p = bdev->bd_part;
> >
> > +	if (p)
> > +		return p->debugfs_dir;
> > +
> > +	return q->debugfs_dir;
>
> This could be simplified down to:
>
> 	if (bdev && bdev != bdev->bd_contains)
> 		return bdev->bd_part->debugfs_dir;
> 	return q->debugfs_dir;
>
> Given that bd_part is in __blkdev_get very near bd_contains.

Ah neat.

> Also given that this patch completely rewrites blk_trace_debugfs_dir is
> there any point in the previous patch?

Still think it helps with making this patch easier to read, but I don't
care, lemme know if I should just fold it.

> > @@ -491,6 +500,7 @@ static int do_blk_trace_setup(struct request_queue *q, char *name, dev_t dev,
> >  	struct dentry *dir = NULL;
> >  	int ret;
> >
> > +
> >  	if (!buts->buf_size || !buts->buf_nr)
> >  		return -EINVAL;
> >
>
> Spurious whitespace change.

Will nuke.

  Luis
On Wed, Apr 29, 2020 at 11:45:42AM +0000, Luis Chamberlain wrote:
> On Wed, Apr 29, 2020 at 04:26:37AM -0700, Christoph Hellwig wrote:
> > I can't say I'm a fan of all these long backtraces in commit logs..
> >
> > > +static struct dentry *blk_debugfs_dir_register(const char *name)
> > > +{
> > > +	return debugfs_create_dir(name, blk_debugfs_root);
> > > +}
> >
> > I don't think we really need this helper.
>
> We don't export blk_debugfs_root, didn't think we'd want to, and
> since only a few scew funky drivers would use the struct gendisk
> and also support BLKTRACE, I didn't think we'd want to export it
> now.
>
> A new block private symbol namespace alright?

Err, that function is static and has two callers.

> > This could be simplified down to:
> >
> > 	if (bdev && bdev != bdev->bd_contains)
> > 		return bdev->bd_part->debugfs_dir;
> > 	return q->debugfs_dir;
> >
> > Given that bd_part is in __blkdev_get very near bd_contains.
>
> Ah neat.
>
> > Also given that this patch completely rewrites blk_trace_debugfs_dir is
> > there any point in the previous patch?
>
> Still think it helps with making this patch easier to read, but I don't
> care, lemme know if I should just fold it.

In fact I'm not even sure we need the helper. Modulo the comment
this just becomes a:

	if (bdev && bdev != bdev->bd_contains)
		dir = bdev->bd_part->debugfs_dir;
	else
		dir = q->debugfs_dir;

in do_blk_trace_setup.
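For readers following the review, the directory selection discussed above can be tried outside the kernel. The block below is only a userspace sketch: the struct definitions are cut-down stand-ins that keep just the fields named in the thread, and `pick_trace_dir()` and `pick_trace_dir_selftest()` are hypothetical names, not kernel symbols.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins for the kernel types named in the thread. */
struct dentry { const char *name; };
struct hd_struct { struct dentry *debugfs_dir; };
struct request_queue { struct dentry *debugfs_dir; };
struct block_device {
	struct block_device *bd_contains;	/* whole-disk device */
	struct hd_struct *bd_part;		/* partition info */
};

/*
 * Tracing a partition: use the partition's directory. Tracing a whole
 * disk, or having no block device at all (the scsi-generic case): use
 * the request_queue's directory.
 */
static struct dentry *pick_trace_dir(struct block_device *bdev,
				     struct request_queue *q)
{
	if (bdev && bdev != bdev->bd_contains)
		return bdev->bd_part->debugfs_dir;
	return q->debugfs_dir;
}

/* Exercises the three cases: NULL bdev, whole disk, partition. */
static void pick_trace_dir_selftest(void)
{
	struct dentry qdir = { "sda" }, pdir = { "sda1" };
	struct request_queue q = { &qdir };
	struct hd_struct part = { &pdir };
	struct block_device disk = { NULL, NULL };
	struct block_device partdev = { &disk, &part };

	disk.bd_contains = &disk;	/* a whole disk contains itself */

	assert(pick_trace_dir(NULL, &q) == &qdir);
	assert(pick_trace_dir(&disk, &q) == &qdir);
	assert(pick_trace_dir(&partdev, &q) == &pdir);
}
```

The sketch never dereferences `bd_part` for a whole disk, which is why the intermediate `p` variable from the posted patch is not needed.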
On Wed, Apr 29, 2020 at 04:50:51AM -0700, Christoph Hellwig wrote:
> On Wed, Apr 29, 2020 at 11:45:42AM +0000, Luis Chamberlain wrote:
> > On Wed, Apr 29, 2020 at 04:26:37AM -0700, Christoph Hellwig wrote:
> > > I can't say I'm a fan of all these long backtraces in commit logs..
> > >
> > > > +static struct dentry *blk_debugfs_dir_register(const char *name)
> > > > +{
> > > > +	return debugfs_create_dir(name, blk_debugfs_root);
> > > > +}
> > >
> > > I don't think we really need this helper.
> >
> > We don't export blk_debugfs_root, didn't think we'd want to, and
> > since only a few scew funky drivers would use the struct gendisk
> > and also support BLKTRACE, I didn't think we'd want to export it
> > now.
> >
> > A new block private symbol namespace alright?
>
> Err, that function is static and has two callers.

Yes but that is to make it easier to look for who is creating the
debugfs_dir for either the request_queue or partition. I'll export
blk_debugfs_root and we'll open code all this.

> > > This could be simplified down to:
> > >
> > > 	if (bdev && bdev != bdev->bd_contains)
> > > 		return bdev->bd_part->debugfs_dir;
> > > 	return q->debugfs_dir;
> > >
> > > Given that bd_part is in __blkdev_get very near bd_contains.
> >
> > Ah neat.
> >
> > > Also given that this patch completely rewrites blk_trace_debugfs_dir is
> > > there any point in the previous patch?
> >
> > Still think it helps with making this patch easier to read, but I don't
> > care, lemme know if I should just fold it.
>
> In fact I'm not even sure we need the helper. Modulo the comment
> this just becomes a:
>
> 	if (bdev && bdev != bdev->bd_contains)
> 		dir = bdev->bd_part->debugfs_dir;
> 	else
> 		dir = q->debugfs_dir;
>
> in do_blk_trace_setup.

True, alright will remove that patch.

  Luis
On Wed, Apr 29, 2020 at 12:02:30PM +0000, Luis Chamberlain wrote:
> > Err, that function is static and has two callers.
>
> Yes but that is to make it easier to look for who is creating the
> debugfs_dir for either the request_queue or partition. I'll export
> blk_debugfs_root and we'll open code all this.

No, please not. exported variables are usually a bad idea. Just
skip the somewhat pointless trivial static function.
On Wed, Apr 29, 2020 at 05:04:06AM -0700, Christoph Hellwig wrote:
> On Wed, Apr 29, 2020 at 12:02:30PM +0000, Luis Chamberlain wrote:
> > > Err, that function is static and has two callers.
> >
> > Yes but that is to make it easier to look for who is creating the
> > debugfs_dir for either the request_queue or partition. I'll export
> > blk_debugfs_root and we'll open code all this.
>
> No, please not. exported variables are usually a bad idea. Just
> skip the somewhat pointless trivial static function.

Alrighty. It has me thinking we might want to only export those symbols
to a specific namespace. Thoughts, preferences?

BLOCK_GENHD_PRIVATE ?

The scsi-generic driver seems... rather unique, and I'd imagine we'd
want to discourage such concoctions in the future, so proliferations
of these symbols.

  Luis
On Wed, Apr 29, 2020 at 12:21:52PM +0000, Luis Chamberlain wrote:
> On Wed, Apr 29, 2020 at 05:04:06AM -0700, Christoph Hellwig wrote:
> > On Wed, Apr 29, 2020 at 12:02:30PM +0000, Luis Chamberlain wrote:
> > > > Err, that function is static and has two callers.
> > >
> > > Yes but that is to make it easier to look for who is creating the
> > > debugfs_dir for either the request_queue or partition. I'll export
> > > blk_debugfs_root and we'll open code all this.
> >
> > No, please not. exported variables are usually a bad idea. Just
> > skip the somewhat pointless trivial static function.
>
> Alrighty. It has me thinking we might want to only export those symbols
> to a specific namespace. Thoughts, preferences?
>
> BLOCK_GENHD_PRIVATE ?

That's a nice add-on issue after this is fixed. As Christoph and I
pointed out, you have _less_ code in the file if you remove the static
wrapper function. Do that now and then worry about symbol namespaces
please.

thanks,

greg k-h
On Wed, Apr 29, 2020 at 02:57:26PM +0200, Greg KH wrote:
> On Wed, Apr 29, 2020 at 12:21:52PM +0000, Luis Chamberlain wrote:
> > On Wed, Apr 29, 2020 at 05:04:06AM -0700, Christoph Hellwig wrote:
> > > On Wed, Apr 29, 2020 at 12:02:30PM +0000, Luis Chamberlain wrote:
> > > > > Err, that function is static and has two callers.
> > > >
> > > > Yes but that is to make it easier to look for who is creating the
> > > > debugfs_dir for either the request_queue or partition. I'll export
> > > > blk_debugfs_root and we'll open code all this.
> > >
> > > No, please not. exported variables are usually a bad idea. Just
> > > skip the somewhat pointless trivial static function.
> >
> > Alrighty. It has me thinking we might want to only export those symbols
> > to a specific namespace. Thoughts, preferences?
> >
> > BLOCK_GENHD_PRIVATE ?
>
> That's a nice add-on issue after this is fixed. As Christoph and I
> pointed out, you have _less_ code in the file if you remove the static
> wrapper function. Do that now and then worry about symbol namespaces
> please.

So it turns out that in the old implementation, it was implicit that the
request_queue directory was shared with the scsi drive. So, the
q->debugfs_dir *will* be set, and as we have it here', we'd silently be
overwriting the old q->debugfs_dir, as the queue is the same.

To keep things working as it used to, with both, we just need to use a
symlink here. With the old way, we'd *always* create the sg directory
and re-use that, however since we can only have one blktrace per
request_queue, it still had the same restriction, this was just
implicit. Using a symlink will make this much more obvious and upkeep
the old functionality. We'll need to only export one symbol.

I'll roll this in.

  Luis
diff --git a/block/blk-debugfs.c b/block/blk-debugfs.c
index 19091e1effc0..a0f4077d6959 100644
--- a/block/blk-debugfs.c
+++ b/block/blk-debugfs.c
@@ -13,3 +13,32 @@ void blk_debugfs_register(void)
 {
 	blk_debugfs_root = debugfs_create_dir("block", NULL);
 }
+
+static struct dentry *blk_debugfs_dir_register(const char *name)
+{
+	return debugfs_create_dir(name, blk_debugfs_root);
+}
+
+void blk_queue_debugfs_register(struct request_queue *q, const char *name)
+{
+	q->debugfs_dir = blk_debugfs_dir_register(name);
+}
+EXPORT_SYMBOL_GPL(blk_queue_debugfs_register);
+
+void blk_queue_debugfs_unregister(struct request_queue *q)
+{
+	debugfs_remove_recursive(q->debugfs_dir);
+	q->debugfs_dir = NULL;
+}
+EXPORT_SYMBOL_GPL(blk_queue_debugfs_unregister);
+
+void blk_part_debugfs_register(struct hd_struct *p, const char *name)
+{
+	p->debugfs_dir = blk_debugfs_dir_register(name);
+}
+
+void blk_part_debugfs_unregister(struct hd_struct *p)
+{
+	debugfs_remove_recursive(p->debugfs_dir);
+	p->debugfs_dir = NULL;
+}
diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
index 96b7a35c898a..08edc3a54114 100644
--- a/block/blk-mq-debugfs.c
+++ b/block/blk-mq-debugfs.c
@@ -822,9 +822,6 @@ void blk_mq_debugfs_register(struct request_queue *q)
 	struct blk_mq_hw_ctx *hctx;
 	int i;
 
-	q->debugfs_dir = debugfs_create_dir(kobject_name(q->kobj.parent),
-					    blk_debugfs_root);
-
 	debugfs_create_files(q->debugfs_dir, q, blk_mq_debugfs_queue_attrs);
 
 	/*
@@ -855,9 +852,7 @@ void blk_mq_debugfs_register(struct request_queue *q)
 
 void blk_mq_debugfs_unregister(struct request_queue *q)
 {
-	debugfs_remove_recursive(q->debugfs_dir);
 	q->sched_debugfs_dir = NULL;
-	q->debugfs_dir = NULL;
 }
 
 static void blk_mq_debugfs_register_ctx(struct blk_mq_hw_ctx *hctx,
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index eda8c4985511..f758a7e06671 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -905,6 +905,7 @@ static void blk_release_queue(struct kobject *kobj)
 
 	blk_trace_shutdown(q);
 
+	blk_queue_debugfs_unregister(q);
 	if (queue_is_mq(q))
 		blk_mq_debugfs_unregister(q);
 
@@ -976,6 +977,8 @@ int blk_register_queue(struct gendisk *disk)
 		goto unlock;
 	}
 
+	blk_queue_debugfs_register(q, kobject_name(q->kobj.parent));
+
 	if (queue_is_mq(q)) {
 		__blk_mq_register_dev(dev, q);
 		blk_mq_debugfs_register(q);
@@ -986,6 +989,7 @@ int blk_register_queue(struct gendisk *disk)
 	ret = elv_register_queue(q, false);
 	if (ret) {
 		mutex_unlock(&q->sysfs_lock);
+		blk_queue_debugfs_unregister(q);
 		mutex_unlock(&q->sysfs_dir_lock);
 		kobject_del(&q->kobj);
 		blk_trace_remove_sysfs(dev);
diff --git a/block/blk.h b/block/blk.h
index ec16e8a6049e..46d867a7f5bc 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -458,10 +458,21 @@ int __bio_add_pc_page(struct request_queue *q, struct bio *bio,
 		bool *same_page);
 #ifdef CONFIG_DEBUG_FS
 void blk_debugfs_register(void);
+void blk_part_debugfs_register(struct hd_struct *p, const char *name);
+void blk_part_debugfs_unregister(struct hd_struct *p);
 #else
 static inline void blk_debugfs_register(void)
 {
 }
+
+static inline void blk_part_debugfs_register(struct hd_struct *p,
+					     const char *name)
+{
+}
+
+static inline void blk_part_debugfs_unregister(struct hd_struct *p)
+{
+}
 #endif /* CONFIG_DEBUG_FS */
 
 #endif /* BLK_INTERNAL_H */
diff --git a/block/partitions/core.c b/block/partitions/core.c
index c085bf85509b..ae395b3ec9cc 100644
--- a/block/partitions/core.c
+++ b/block/partitions/core.c
@@ -312,6 +312,7 @@ void delete_partition(struct gendisk *disk, struct hd_struct *part)
 	rcu_assign_pointer(ptbl->part[part->partno], NULL);
 	rcu_assign_pointer(ptbl->last_lookup, NULL);
 	kobject_put(part->holder_dir);
+	blk_part_debugfs_unregister(part);
 	device_del(part_to_dev(part));
 
 	/*
@@ -433,6 +434,8 @@ static struct hd_struct *add_partition(struct gendisk *disk, int partno,
 	if (!p->holder_dir)
 		goto out_del;
 
+	blk_part_debugfs_register(p, dev_name(pdev));
+
 	dev_set_uevent_suppress(pdev, 0);
 	if (flags & ADDPART_FLAG_WHOLEDISK) {
 		err = device_create_file(pdev, &dev_attr_whole_disk);
diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
index 20472aaaf630..f21787611918 100644
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -1548,6 +1548,7 @@ sg_add_device(struct device *cl_dev, struct class_interface *cl_intf)
 		goto out;
 	}
 
+	blk_queue_debugfs_register(sdp->device->request_queue, disk->disk_name);
 	error = cdev_add(cdev, MKDEV(SCSI_GENERIC_MAJOR, sdp->index), 1);
 	if (error)
 		goto cdev_add_err;
@@ -1644,6 +1645,7 @@ sg_remove_device(struct device *cl_dev, struct class_interface *cl_intf)
 
 	sysfs_remove_link(&scsidp->sdev_gendev.kobj, "generic");
 	device_destroy(sg_sysfs_class, MKDEV(SCSI_GENERIC_MAJOR, sdp->index));
+	blk_queue_debugfs_unregister(sdp->device->request_queue);
 	cdev_del(sdp->cdev);
 	sdp->cdev = NULL;
 
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 3122a93c7277..e7edd31bdf9a 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -561,8 +561,11 @@ struct request_queue {
 	struct list_head tag_set_list;
 	struct bio_set bio_split;
 
-#ifdef CONFIG_BLK_DEBUG_FS
+#ifdef CONFIG_DEBUG_FS
+	/* Used by block/blk-*debugfs.c and kernel/trace/blktrace.c */
 	struct dentry *debugfs_dir;
+#endif
+#ifdef CONFIG_BLK_DEBUG_FS
 	struct dentry *sched_debugfs_dir;
 	struct dentry *rqos_debugfs_dir;
 #endif
diff --git a/include/linux/blktrace_api.h b/include/linux/blktrace_api.h
index 3b6ff5902edc..eb6db276e293 100644
--- a/include/linux/blktrace_api.h
+++ b/include/linux/blktrace_api.h
@@ -22,7 +22,6 @@ struct blk_trace {
 	u64 end_lba;
 	u32 pid;
 	u32 dev;
-	struct dentry *dir;
 	struct dentry *dropped_file;
 	struct dentry *msg_file;
 	struct list_head running_list;
diff --git a/include/linux/genhd.h b/include/linux/genhd.h
index 058d895544c7..899760cf8c37 100644
--- a/include/linux/genhd.h
+++ b/include/linux/genhd.h
@@ -86,6 +86,10 @@ struct hd_struct {
 #endif
 	struct percpu_ref ref;
 	struct rcu_work rcu_work;
+#ifdef CONFIG_DEBUG_FS
+	/* Currently only used by kernel/trace/blktrace.c */
+	struct dentry *debugfs_dir;
+#endif
 };
 
 /**
@@ -382,4 +386,18 @@ static inline dev_t blk_lookup_devt(const char *name, int partno)
 }
 #endif /* CONFIG_BLOCK */
 
+#ifdef CONFIG_DEBUG_FS
+void blk_queue_debugfs_register(struct request_queue *q, const char *name);
+void blk_queue_debugfs_unregister(struct request_queue *q);
+#else
+static inline void blk_queue_debugfs_register(struct request_queue *q,
+					      const char *name)
+{
+}
+
+static inline void blk_queue_debugfs_unregister(struct request_queue *q)
+{
+}
+#endif /* CONFIG_DEBUG_FS */
+
 #endif /* _LINUX_GENHD_H */
diff --git a/kernel/trace/blktrace.c b/kernel/trace/blktrace.c
index 2c6e6c386ace..5c52976bd762 100644
--- a/kernel/trace/blktrace.c
+++ b/kernel/trace/blktrace.c
@@ -3,6 +3,7 @@
  * Copyright (C) 2006 Jens Axboe <axboe@kernel.dk>
  *
  */
+
 #include <linux/kernel.h>
 #include <linux/blkdev.h>
 #include <linux/blktrace_api.h>
@@ -311,7 +312,6 @@ static void blk_trace_free(struct blk_trace *bt)
 	debugfs_remove(bt->msg_file);
 	debugfs_remove(bt->dropped_file);
 	relay_close(bt->rchan);
-	debugfs_remove(bt->dir);
 	free_percpu(bt->sequence);
 	free_percpu(bt->msg_data);
 	kfree(bt);
@@ -468,16 +468,25 @@ static void blk_trace_setup_lba(struct blk_trace *bt,
 	}
 }
 
-static struct dentry *blk_trace_debugfs_dir(struct blk_user_trace_setup *buts,
-					    struct blk_trace *bt)
+static struct dentry *blk_trace_debugfs_dir(struct block_device *bdev,
+					    struct request_queue *q)
 {
-	struct dentry *dir = NULL;
+	struct hd_struct *p = NULL;
 
-	dir = debugfs_lookup(buts->name, blk_debugfs_root);
-	if (!dir)
-		bt->dir = dir = debugfs_create_dir(buts->name, blk_debugfs_root);
+	/*
+	 * Some drivers like scsi-generic use a NULL block device. For
+	 * other drivers when bdev != bdev->bd_contain we are doing a blktrace
+	 * on a parition, otherwise we know we are working on the whole
+	 * disk, and for that the request_queue already has its own debugfs_dir.
+	 * which we have been using for other things other than blktrace.
+	 */
+	if (bdev && bdev != bdev->bd_contains)
+		p = bdev->bd_part;
 
-	return dir;
+	if (p)
+		return p->debugfs_dir;
+
+	return q->debugfs_dir;
 }
 
 /*
@@ -491,6 +500,7 @@ static int do_blk_trace_setup(struct request_queue *q, char *name, dev_t dev,
 	struct dentry *dir = NULL;
 	int ret;
 
+
 	if (!buts->buf_size || !buts->buf_nr)
 		return -EINVAL;
 
@@ -521,7 +531,9 @@ static int do_blk_trace_setup(struct request_queue *q, char *name, dev_t dev,
 
 	ret = -ENOENT;
 
-	dir = blk_trace_debugfs_dir(buts, bt);
+	dir = blk_trace_debugfs_dir(bdev, q);
+	if (WARN_ON(!dir))
+		goto err;
 
 	bt->dev = dev;
 	atomic_set(&bt->dropped, 0);
@@ -561,8 +573,6 @@ static int do_blk_trace_setup(struct request_queue *q, char *name, dev_t dev,
 
 	ret = 0;
 err:
-	if (dir && !bt->dir)
-		dput(dir);
 	if (ret)
 		blk_trace_free(bt);
 	return ret;
On commit 6ac93117ab00 ("blktrace: use existing disk debugfs directory")
merged on v4.12 Omar fixed the original blktrace code for request-based
drivers (multiqueue). This however left in place a possible crash, if you
happen to abuse blktrace while racing to remove / add a device.

We used to use asynchronous removal of the request_queue, and with that
the issue was easier to reproduce. Now that we have reverted to
synchronous removal of the request_queue, the issue is still possible to
reproduce, it is however just a bit more difficult.

We essentially run two instances of break-blktrace which add/remove
a loop device, and setup a blktrace and just never tear the blktrace
down. We do this twice in parallel. This is easily reproduced with the
break-blktrace run_0004.sh script.

We can end up with two types of panics each reflecting where we
race, one a failed blktrace setup:

[ 252.426751] debugfs: Directory 'loop0' with parent 'block' already present!
[ 252.432265] BUG: kernel NULL pointer dereference, address: 00000000000000a0
[ 252.436592] #PF: supervisor write access in kernel mode
[ 252.439822] #PF: error_code(0x0002) - not-present page
[ 252.442967] PGD 0 P4D 0
[ 252.444656] Oops: 0002 [#1] SMP NOPTI
[ 252.446972] CPU: 10 PID: 1153 Comm: break-blktrace Tainted: G E 5.7.0-rc2-next-20200420+ #164
[ 252.452673] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014
[ 252.456343] RIP: 0010:down_write+0x15/0x40
[ 252.458146] Code: eb ca e8 ae 22 8d ff cc cc cc cc cc cc cc cc cc cc cc cc cc cc 0f 1f 44 00 00 55 48 89 fd e8 52 db ff ff 31 c0 ba 01 00 00 00 <f0> 48 0f b1 55 00 75 0f 48 8b 04 25 c0 8b 01 00 48 89 45 08 5d
[ 252.463638] RSP: 0018:ffffa626415abcc8 EFLAGS: 00010246
[ 252.464950] RAX: 0000000000000000 RBX: ffff958c25f0f5c0 RCX: ffffff8100000000
[ 252.466727] RDX: 0000000000000001 RSI: ffffff8100000000 RDI: 00000000000000a0
[ 252.468482] RBP: 00000000000000a0 R08: 0000000000000000 R09: 0000000000000001
[ 252.470014] R10: 0000000000000000 R11: ffff958d1f9227ff R12: 0000000000000000
[ 252.471473] R13: ffff958c25ea5380 R14: ffffffff8cce15f1 R15: 00000000000000a0
[ 252.473346] FS: 00007f2e69dee540(0000) GS:ffff958c2fc80000(0000) knlGS:0000000000000000
[ 252.475225] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 252.476267] CR2: 00000000000000a0 CR3: 0000000427d10004 CR4: 0000000000360ee0
[ 252.477526] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 252.478776] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 252.479866] Call Trace:
[ 252.480322] simple_recursive_removal+0x4e/0x2e0
[ 252.481078] ? debugfs_remove+0x60/0x60
[ 252.481725] ? relay_destroy_buf+0x77/0xb0
[ 252.482662] debugfs_remove+0x40/0x60
[ 252.483518] blk_remove_buf_file_callback+0x5/0x10
[ 252.484328] relay_close_buf+0x2e/0x60
[ 252.484930] relay_open+0x1ce/0x2c0
[ 252.485520] do_blk_trace_setup+0x14f/0x2b0
[ 252.486187] __blk_trace_setup+0x54/0xb0
[ 252.486803] blk_trace_ioctl+0x90/0x140
[ 252.487423] ? do_sys_openat2+0x1ab/0x2d0
[ 252.488053] blkdev_ioctl+0x4d/0x260
[ 252.488636] block_ioctl+0x39/0x40
[ 252.489139] ksys_ioctl+0x87/0xc0
[ 252.489675] __x64_sys_ioctl+0x16/0x20
[ 252.490380] do_syscall_64+0x52/0x180
[ 252.491032] entry_SYSCALL_64_after_hwframe+0x44/0xa9

And the other on the device removal:

[ 128.528940] debugfs: Directory 'loop0' with parent 'block' already present!
[ 128.615325] BUG: kernel NULL pointer dereference, address: 00000000000000a0
[ 128.619537] #PF: supervisor write access in kernel mode
[ 128.622700] #PF: error_code(0x0002) - not-present page
[ 128.625842] PGD 0 P4D 0
[ 128.627585] Oops: 0002 [#1] SMP NOPTI
[ 128.629871] CPU: 12 PID: 544 Comm: break-blktrace Tainted: G E 5.7.0-rc2-next-20200420+ #164
[ 128.635595] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014
[ 128.640471] RIP: 0010:down_write+0x15/0x40
[ 128.643041] Code: eb ca e8 ae 22 8d ff cc cc cc cc cc cc cc cc cc cc cc cc cc cc 0f 1f 44 00 00 55 48 89 fd e8 52 db ff ff 31 c0 ba 01 00 00 00 <f0> 48 0f b1 55 00 75 0f 65 48 8b 04 25 c0 8b 01 00 48 89 45 08 5d
[ 128.650180] RSP: 0018:ffffa9c3c05ebd78 EFLAGS: 00010246
[ 128.651820] RAX: 0000000000000000 RBX: ffff8ae9a6370240 RCX: ffffff8100000000
[ 128.653942] RDX: 0000000000000001 RSI: ffffff8100000000 RDI: 00000000000000a0
[ 128.655720] RBP: 00000000000000a0 R08: 0000000000000002 R09: ffff8ae9afd2d3d0
[ 128.657400] R10: 0000000000000056 R11: 0000000000000000 R12: 0000000000000000
[ 128.659099] R13: 0000000000000000 R14: 0000000000000003 R15: 00000000000000a0
[ 128.660500] FS: 00007febfd995540(0000) GS:ffff8ae9afd00000(0000) knlGS:0000000000000000
[ 128.662204] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 128.663426] CR2: 00000000000000a0 CR3: 0000000420042003 CR4: 0000000000360ee0
[ 128.664776] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 128.666022] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 128.667282] Call Trace:
[ 128.667801] simple_recursive_removal+0x4e/0x2e0
[ 128.668663] ? debugfs_remove+0x60/0x60
[ 128.669368] debugfs_remove+0x40/0x60
[ 128.669985] blk_trace_free+0xd/0x50
[ 128.670593] __blk_trace_remove+0x27/0x40
[ 128.671274] blk_trace_shutdown+0x30/0x40
[ 128.671935] blk_release_queue+0x95/0xf0
[ 128.672589] kobject_put+0xa5/0x1b0
[ 128.673188] disk_release+0xa2/0xc0
[ 128.673786] device_release+0x28/0x80
[ 128.674376] kobject_put+0xa5/0x1b0
[ 128.674915] loop_remove+0x39/0x50 [loop]
[ 128.675511] loop_control_ioctl+0x113/0x130 [loop]
[ 128.676199] ksys_ioctl+0x87/0xc0
[ 128.676708] __x64_sys_ioctl+0x16/0x20
[ 128.677274] do_syscall_64+0x52/0x180
[ 128.677823] entry_SYSCALL_64_after_hwframe+0x44/0xa9

The common theme here is:

debugfs: Directory 'loop0' with parent 'block' already present

This crash happens because of how blktrace uses the debugfs directory
where it places its files. Upon init we always create the same directory
which would be needed by blktrace but we only do this for make_request
drivers (multiqueue) block drivers, but never for request-based block
drivers. Furthermore, that directory is only created on init for the
entire disk. This means that if you use blktrace on a partition, we'll
always be creating a new directory regardless of whether or not you
are doing blktrace on a make_request driver (multiqueue) or a
request-based block drivers.

These directory creations are only associated with a path, and so
when a debugfs_remove() is called it removes everything in its way.
A device removal will remove all blktrace files, and so if a blktrace
is still present a cleanup of blktrace files later will end up trying
to remove dentries pointing to NULL.

We can fix the UAF by using a debugfs directory which moving forward
will always be accessible if debugfs is enabled for both make_request
drivers (multiqueue) and request-based block drivers, *and* for all
partitions upon creation. This ensures that removal of the directories
only happens on device removal and removes the race of the files
underneath an active blktrace.

This also simplifies the code considerably, with the only penalty now
being that we're always creating the request queue debugfs directory for
the request-based block device drivers, and the partition debugfs
directories upon initialization for both types of drivers.

This patch is part of the work which disputes the severity of
CVE-2019-19770 which shows this issue is not a core debugfs issue, but
a misuse of debugfs within blktrace.

Cc: Bart Van Assche <bvanassche@acm.org>
Cc: Omar Sandoval <osandov@fb.com>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Nicolai Stange <nstange@suse.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: yu kuai <yukuai3@huawei.com>
Reported-by: syzbot+603294af2d01acfdd6da@syzkaller.appspotmail.com
Fixes: 6ac93117ab00 ("blktrace: use existing disk debugfs directory")
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
---
 block/blk-debugfs.c          | 29 +++++++++++++++++++++++++++++
 block/blk-mq-debugfs.c       |  5 -----
 block/blk-sysfs.c            |  4 ++++
 block/blk.h                  | 11 +++++++++++
 block/partitions/core.c      |  3 +++
 drivers/scsi/sg.c            |  2 ++
 include/linux/blkdev.h       |  5 ++++-
 include/linux/blktrace_api.h |  1 -
 include/linux/genhd.h        | 18 ++++++++++++++++++
 kernel/trace/blktrace.c      | 32 +++++++++++++++++++++-----------
 10 files changed, 92 insertions(+), 18 deletions(-)
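The lifetime rule the commit message argues for (the device owns the debugfs directory, a blktrace only borrows it) can be modelled in plain C. This is a rough userspace sketch under stated assumptions, not kernel code: `struct dir`, `struct trace`, `struct queue` and every function below are hypothetical stand-ins, and the file count merely represents the "dropped" and "msg" entries a trace places in the directory.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical model types; none of these are kernel structures. */
struct dir { int nfiles; };		/* stands in for a debugfs directory */
struct trace { struct dir *dir; };	/* stands in for struct blk_trace */
struct queue {
	struct dir *debugfs_dir;	/* owned by the queue for its lifetime */
	struct trace *bt;		/* at most one trace per queue */
};

/* Device registration creates the directory, whether or not tracing is used. */
static void queue_register(struct queue *q)
{
	q->debugfs_dir = calloc(1, sizeof(*q->debugfs_dir));
	q->bt = NULL;
}

/* Trace setup only borrows the existing directory and never creates one. */
static int trace_setup(struct queue *q)
{
	struct trace *bt;

	if (!q->debugfs_dir)
		return -ENOENT;
	if (q->bt)
		return -EBUSY;

	bt = calloc(1, sizeof(*bt));
	if (!bt)
		return -ENOMEM;

	bt->dir = q->debugfs_dir;
	bt->dir->nfiles += 2;		/* models the "dropped" and "msg" files */
	q->bt = bt;
	return 0;
}

/* Removes only the files the trace added; the directory stays in place. */
static void trace_shutdown(struct queue *q)
{
	if (!q->bt)
		return;
	q->bt->dir->nfiles -= 2;
	free(q->bt);
	q->bt = NULL;
}

/* Device release: shut the trace down first, then drop the directory. */
static void queue_release(struct queue *q)
{
	trace_shutdown(q);
	free(q->debugfs_dir);
	q->debugfs_dir = NULL;
}

static void lifetime_selftest(void)
{
	struct queue q;

	queue_register(&q);
	assert(q.debugfs_dir != NULL);
	assert(trace_setup(&q) == 0);
	assert(q.debugfs_dir->nfiles == 2);
	assert(trace_setup(&q) == -EBUSY);	/* one trace per queue */

	queue_release(&q);
	assert(q.bt == NULL && q.debugfs_dir == NULL);
	assert(trace_setup(&q) == -ENOENT);	/* nothing to borrow any more */
}
```

In this model a trace can never outlive the directory it writes into, because the only place the directory is freed is `queue_release()`, after the trace has been shut down. That ordering mirrors the diff, where `blk_queue_debugfs_unregister(q)` is called after `blk_trace_shutdown(q)` in `blk_release_queue()`.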