Message ID | 20180412015852.26014-1-bart.vanassche@wdc.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Wed, Apr 11, 2018 at 07:58:52PM -0600, Bart Van Assche wrote: > Several block drivers call alloc_disk() followed by put_disk() if > something fails before device_add_disk() is called without calling > blk_cleanup_queue(). Make sure that also for this scenario a request > queue is dissociated from the cgroup controller. This patch avoids > that loading the parport_pc, paride and pf drivers triggers the > following kernel crash: > > BUG: KASAN: null-ptr-deref in pi_init+0x42e/0x580 [paride] > Read of size 4 at addr 0000000000000008 by task modprobe/744 > Call Trace: > dump_stack+0x9a/0xeb > kasan_report+0x139/0x350 > pi_init+0x42e/0x580 [paride] > pf_init+0x2bb/0x1000 [pf] > do_one_initcall+0x8e/0x405 > do_init_module+0xd9/0x2f2 > load_module+0x3ab4/0x4700 > SYSC_finit_module+0x176/0x1a0 > do_syscall_64+0xee/0x2b0 > entry_SYSCALL_64_after_hwframe+0x42/0xb7 > > Reported-by: Alexandru Moise <00moses.alexander00@gmail.com> > Fixes: a063057d7c73 ("block: Fix a race between request queue removal and the block cgroup controller") > Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> > Cc: Alexandru Moise <00moses.alexander00@gmail.com> > Cc: Tejun Heo <tj@kernel.org> > --- > block/blk-sysfs.c | 26 ++++++++++++++++++++++++++ > 1 file changed, 26 insertions(+) > > diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c > index f8457d6f0190..2e134da78f82 100644 > --- a/block/blk-sysfs.c > +++ b/block/blk-sysfs.c > @@ -788,11 +788,37 @@ static void blk_free_queue_rcu(struct rcu_head *rcu_head) > static void __blk_release_queue(struct work_struct *work) > { > struct request_queue *q = container_of(work, typeof(*q), release_work); > + struct blkcg_gq *gq; > > if (test_bit(QUEUE_FLAG_POLL_STATS, &q->queue_flags)) > blk_stat_remove_callback(q, q->poll_cb); > blk_stat_free_callback(q->poll_cb); > > + if (!blk_queue_dead(q)) { > + /* > + * Last reference was dropped without having called > + * blk_cleanup_queue(). > + */ > + WARN_ONCE(blk_queue_init_done(q), > + "request queue %p has been registered but blk_cleanup_queue() has not been called for that queue\n", > + q); > + bdi_put(q->backing_dev_info); > + blkcg_exit_queue(q); > + > + if (q->elevator) { > + ioc_clear_queue(q); > + elevator_exit(q, q->elevator); > + } > + } > + > + rcu_read_lock(); > + gq = blkg_lookup(&blkcg_root, q); > + rcu_read_unlock(); > + > + WARN(gq, > + "request queue %p is being released but it has not yet been removed from the blkcg controller\n", > + q); > + > blk_free_queue_stats(q->stats); This solution is good. Thanks for fixing this! Tested-by: Alexandru Moise <00moses.alexander00@gmail.com> ../Alex > > blk_exit_rl(q, &q->root_rl); > -- > 2.16.2 >
On Wed, Apr 11, 2018 at 07:58:52PM -0600, Bart Van Assche wrote: > Several block drivers call alloc_disk() followed by put_disk() if > something fails before device_add_disk() is called without calling > blk_cleanup_queue(). Make sure that also for this scenario a request > queue is dissociated from the cgroup controller. This patch avoids > that loading the parport_pc, paride and pf drivers triggers the > following kernel crash: > > BUG: KASAN: null-ptr-deref in pi_init+0x42e/0x580 [paride] > Read of size 4 at addr 0000000000000008 by task modprobe/744 > Call Trace: > dump_stack+0x9a/0xeb > kasan_report+0x139/0x350 > pi_init+0x42e/0x580 [paride] > pf_init+0x2bb/0x1000 [pf] > do_one_initcall+0x8e/0x405 > do_init_module+0xd9/0x2f2 > load_module+0x3ab4/0x4700 > SYSC_finit_module+0x176/0x1a0 > do_syscall_64+0xee/0x2b0 > entry_SYSCALL_64_after_hwframe+0x42/0xb7 > > Reported-by: Alexandru Moise <00moses.alexander00@gmail.com> > Fixes: a063057d7c73 ("block: Fix a race between request queue removal and the block cgroup controller") > Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> > Cc: Alexandru Moise <00moses.alexander00@gmail.com> > Cc: Tejun Heo <tj@kernel.org> > --- > block/blk-sysfs.c | 26 ++++++++++++++++++++++++++ > 1 file changed, 26 insertions(+) > > diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c > index f8457d6f0190..2e134da78f82 100644 > --- a/block/blk-sysfs.c > +++ b/block/blk-sysfs.c > @@ -788,11 +788,37 @@ static void blk_free_queue_rcu(struct rcu_head *rcu_head) > static void __blk_release_queue(struct work_struct *work) > { > struct request_queue *q = container_of(work, typeof(*q), release_work); > + struct blkcg_gq *gq; > > if (test_bit(QUEUE_FLAG_POLL_STATS, &q->queue_flags)) > blk_stat_remove_callback(q, q->poll_cb); > blk_stat_free_callback(q->poll_cb); > > + if (!blk_queue_dead(q)) { > + /* > + * Last reference was dropped without having called > + * blk_cleanup_queue(). > + */ > + WARN_ONCE(blk_queue_init_done(q), > + "request queue %p has been registered but blk_cleanup_queue() has not been called for that queue\n", > + q); > + bdi_put(q->backing_dev_info); > + blkcg_exit_queue(q); > + > + if (q->elevator) { > + ioc_clear_queue(q); > + elevator_exit(q, q->elevator); > + } > + } > + > + rcu_read_lock(); > + gq = blkg_lookup(&blkcg_root, q); > + rcu_read_unlock(); > + > + WARN(gq, > + "request queue %p is being released but it has not yet been removed from the blkcg controller\n", > + q); > + > blk_free_queue_stats(q->stats); This solution is good. Thanks for fixing this! Tested-by: Alexandru Moise <00moses.alexander00@gmail.com> ../Alex > > blk_exit_rl(q, &q->root_rl); > -- > 2.16.2 >
On Thu, Apr 12, 2018 at 06:20:34AM +0200, Alexandru Moise wrote: > On Wed, Apr 11, 2018 at 07:58:52PM -0600, Bart Van Assche wrote: > > Several block drivers call alloc_disk() followed by put_disk() if > > something fails before device_add_disk() is called without calling > > blk_cleanup_queue(). Make sure that also for this scenario a request > > queue is dissociated from the cgroup controller. This patch avoids > > that loading the parport_pc, paride and pf drivers triggers the > > following kernel crash: > > > > BUG: KASAN: null-ptr-deref in pi_init+0x42e/0x580 [paride] > > Read of size 4 at addr 0000000000000008 by task modprobe/744 > > Call Trace: > > dump_stack+0x9a/0xeb > > kasan_report+0x139/0x350 > > pi_init+0x42e/0x580 [paride] > > pf_init+0x2bb/0x1000 [pf] > > do_one_initcall+0x8e/0x405 > > do_init_module+0xd9/0x2f2 > > load_module+0x3ab4/0x4700 > > SYSC_finit_module+0x176/0x1a0 > > do_syscall_64+0xee/0x2b0 > > entry_SYSCALL_64_after_hwframe+0x42/0xb7 > > > > Reported-by: Alexandru Moise <00moses.alexander00@gmail.com> > > Fixes: a063057d7c73 ("block: Fix a race between request queue removal and the block cgroup controller") > > Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> > > Cc: Alexandru Moise <00moses.alexander00@gmail.com> > > Cc: Tejun Heo <tj@kernel.org> > > --- > > block/blk-sysfs.c | 26 ++++++++++++++++++++++++++ > > 1 file changed, 26 insertions(+) > > > > diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c > > index f8457d6f0190..2e134da78f82 100644 > > --- a/block/blk-sysfs.c > > +++ b/block/blk-sysfs.c > > @@ -788,11 +788,37 @@ static void blk_free_queue_rcu(struct rcu_head *rcu_head) > > static void __blk_release_queue(struct work_struct *work) > > { > > struct request_queue *q = container_of(work, typeof(*q), release_work); > > + struct blkcg_gq *gq; > > > > if (test_bit(QUEUE_FLAG_POLL_STATS, &q->queue_flags)) > > blk_stat_remove_callback(q, q->poll_cb); > > blk_stat_free_callback(q->poll_cb); > > > > + if (!blk_queue_dead(q)) { > > + /* > > + * Last reference was dropped without having called > > + * blk_cleanup_queue(). > > + */ > > + WARN_ONCE(blk_queue_init_done(q), > > + "request queue %p has been registered but blk_cleanup_queue() has not been called for that queue\n", > > + q); > > + bdi_put(q->backing_dev_info); > > + blkcg_exit_queue(q); > > + > > + if (q->elevator) { > > + ioc_clear_queue(q); > > + elevator_exit(q, q->elevator); > > + } > > + } > > + > > + rcu_read_lock(); > > + gq = blkg_lookup(&blkcg_root, q); > > + rcu_read_unlock(); > > + > > + WARN(gq, > > + "request queue %p is being released but it has not yet been removed from the blkcg controller\n", > > + q); > > + > > blk_free_queue_stats(q->stats); > > This solution is good. Thanks for fixing this! > > Tested-by: Alexandru Moise <00moses.alexander00@gmail.com> > > ../Alex Oh, you also need the no-ops for !CONFIG_BLK_CGROUP, also there is no blkcg_root then as well. Thanks, ../Alex > > > > > blk_exit_rl(q, &q->root_rl); > > -- > > 2.16.2 > >
On Wed, Apr 11, 2018 at 07:58:52PM -0600, Bart Van Assche wrote: > Several block drivers call alloc_disk() followed by put_disk() if > something fails before device_add_disk() is called without calling > blk_cleanup_queue(). Make sure that also for this scenario a request > queue is dissociated from the cgroup controller. This patch avoids > that loading the parport_pc, paride and pf drivers triggers the > following kernel crash: Can we move the cleanup to put_disk in general and not just for this case? Having alloc/free routines pair up generally avoids a lot of confusion.
On Thu, 2018-04-12 at 07:34 +0200, Christoph Hellwig wrote: > On Wed, Apr 11, 2018 at 07:58:52PM -0600, Bart Van Assche wrote: > > Several block drivers call alloc_disk() followed by put_disk() if > > something fails before device_add_disk() is called without calling > > blk_cleanup_queue(). Make sure that also for this scenario a request > > queue is dissociated from the cgroup controller. This patch avoids > > that loading the parport_pc, paride and pf drivers triggers the > > following kernel crash: > > Can we move the cleanup to put_disk in general and not just for > this case? Having alloc/free routines pair up generally avoids > a lot of confusion. Hello Christoph, At least the SCSI ULP drivers drop the last reference to a disk after the blk_cleanup_queue() call. As explained in the description of commit a063057d7c73, removing a request queue from blkcg must happen before blk_cleanup_queue() finishes because a block driver may free the request queue spinlock immediately after blk_cleanup_queue() returns. So I don't think that we can move the code that removes a request queue from blkcg into put_disk(). Another challenge is that some block drivers (e.g. skd) clear the disk->queue pointer if device_add_disk() has not been called to avoid that put_disk() causes a request queue reference count imbalance. Bart.
On Thu, Apr 12, 2018 at 11:52:11AM +0000, Bart Van Assche wrote: > On Thu, 2018-04-12 at 07:34 +0200, Christoph Hellwig wrote: > > On Wed, Apr 11, 2018 at 07:58:52PM -0600, Bart Van Assche wrote: > > > Several block drivers call alloc_disk() followed by put_disk() if > > > something fails before device_add_disk() is called without calling > > > blk_cleanup_queue(). Make sure that also for this scenario a request > > > queue is dissociated from the cgroup controller. This patch avoids > > > that loading the parport_pc, paride and pf drivers triggers the > > > following kernel crash: > > > > Can we move the cleanup to put_disk in general and not just for > > this case? Having alloc/free routines pair up generally avoids > > a lot of confusion. > > Hello Christoph, > > At least the SCSI ULP drivers drop the last reference to a disk after > the blk_cleanup_queue() call. As explained in the description of commit > a063057d7c73, removing a request queue from blkcg must happen before > blk_cleanup_queue() finishes because a block driver may free the > request queue spinlock immediately after blk_cleanup_queue() returns. > So I don't think that we can move the code that removes a request > queue from blkcg into put_disk(). Another challenge is that some block > drivers (e.g. skd) clear the disk->queue pointer if device_add_disk() > has not been called to avoid that put_disk() causes a request queue > reference count imbalance. Which sounds like a very good reason not to use a driver controller lock for internals like blkcq. In fact splitting the lock used for synchronizing access to queue fields from the driver controller lock used to synchronize I/O in the legacy path in long overdue.
Hello, On Thu, Apr 12, 2018 at 03:14:40PM +0200, hch@lst.de wrote: > > At least the SCSI ULP drivers drop the last reference to a disk after > > the blk_cleanup_queue() call. As explained in the description of commit > > a063057d7c73, removing a request queue from blkcg must happen before > > blk_cleanup_queue() finishes because a block driver may free the > > request queue spinlock immediately after blk_cleanup_queue() returns. > > So I don't think that we can move the code that removes a request > > queue from blkcg into put_disk(). Another challenge is that some block > > drivers (e.g. skd) clear the disk->queue pointer if device_add_disk() > > has not been called to avoid that put_disk() causes a request queue > > reference count imbalance. > > Which sounds like a very good reason not to use a driver controller > lock for internals like blkcq. > > In fact splitting the lock used for synchronizing access to queue > fields from the driver controller lock used to synchronize I/O > in the legacy path in long overdue. It'd be probably a lot easier to make sure the shared lock doesn't go away till all the request_queues using it go away. The choice is between refcnting something which carries the lock and double locking in hot paths. Can't think of many downsides of the former approach. Thanks.
On Wed, Apr 11, 2018 at 07:58:52PM -0600, Bart Van Assche wrote: > Several block drivers call alloc_disk() followed by put_disk() if > something fails before device_add_disk() is called without calling > blk_cleanup_queue(). Make sure that also for this scenario a request > queue is dissociated from the cgroup controller. This patch avoids > that loading the parport_pc, paride and pf drivers triggers the > following kernel crash: > > BUG: KASAN: null-ptr-deref in pi_init+0x42e/0x580 [paride] > Read of size 4 at addr 0000000000000008 by task modprobe/744 > Call Trace: > dump_stack+0x9a/0xeb > kasan_report+0x139/0x350 > pi_init+0x42e/0x580 [paride] > pf_init+0x2bb/0x1000 [pf] > do_one_initcall+0x8e/0x405 > do_init_module+0xd9/0x2f2 > load_module+0x3ab4/0x4700 > SYSC_finit_module+0x176/0x1a0 > do_syscall_64+0xee/0x2b0 > entry_SYSCALL_64_after_hwframe+0x42/0xb7 > > Reported-by: Alexandru Moise <00moses.alexander00@gmail.com> > Fixes: a063057d7c73 ("block: Fix a race between request queue removal and the block cgroup controller") > Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> > Cc: Alexandru Moise <00moses.alexander00@gmail.com> > Cc: Tejun Heo <tj@kernel.org> So, this might be correct for this reported case but I don't think it's correct in general. There's no synchronization between blkcg q->lock usages and blk_cleanup_queue(). e.g. hotunplugging and shutting down a device while file operations are still in progress can easily blow this up. Thanks.
On Thu, Apr 12, 2018 at 06:48:12AM -0700, tj@kernel.org wrote: > > Which sounds like a very good reason not to use a driver controller > > lock for internals like blkcq. > > > > In fact splitting the lock used for synchronizing access to queue > > fields from the driver controller lock used to synchronize I/O > > in the legacy path in long overdue. > > It'd be probably a lot easier to make sure the shared lock doesn't go > away till all the request_queues using it go away. The choice is > between refcnting something which carries the lock and double locking > in hot paths. Can't think of many downsides of the former approach. We've stopped sharing request_queues between different devices a while ago. The problem is just that for legacy devices the driver still controls the lock life time, and it might be shorter than the queue lifetime. Note that for blk-mq we basically don't use the queue_lock at all, and certainly not in the hot path.
Hello, On Thu, Apr 12, 2018 at 03:56:51PM +0200, hch@lst.de wrote: > On Thu, Apr 12, 2018 at 06:48:12AM -0700, tj@kernel.org wrote: > > > Which sounds like a very good reason not to use a driver controller > > > lock for internals like blkcq. > > > > > > In fact splitting the lock used for synchronizing access to queue > > > fields from the driver controller lock used to synchronize I/O > > > in the legacy path in long overdue. > > > > It'd be probably a lot easier to make sure the shared lock doesn't go > > away till all the request_queues using it go away. The choice is > > between refcnting something which carries the lock and double locking > > in hot paths. Can't think of many downsides of the former approach. > > We've stopped sharing request_queues between different devices a while > ago. The problem is just that for legacy devices the driver still controls > the lock life time, and it might be shorter than the queue lifetime. > Note that for blk-mq we basically don't use the queue_lock at all, > and certainly not in the hot path. Cool, can we just factor out the queue lock from those drivers? It's just really messy to be switching locks like we do in the cleanup path. Thanks.
On Thu, Apr 12, 2018 at 06:58:21AM -0700, tj@kernel.org wrote: > Cool, can we just factor out the queue lock from those drivers? It's > just really messy to be switching locks like we do in the cleanup > path. So, looking at a couple drivers, it looks like all we'd need is a struct which contains a spinlock and a refcnt. Switching the drivers to use that is a bit tedious but straight-forward and the block layer can simply put that struct on queue release. Thanks.
On 04/12/18 07:51, Tejun Heo wrote: > On Wed, Apr 11, 2018 at 07:58:52PM -0600, Bart Van Assche wrote: >> Several block drivers call alloc_disk() followed by put_disk() if >> something fails before device_add_disk() is called without calling >> blk_cleanup_queue(). Make sure that also for this scenario a request >> queue is dissociated from the cgroup controller. This patch avoids >> that loading the parport_pc, paride and pf drivers triggers the >> following kernel crash: >> >> BUG: KASAN: null-ptr-deref in pi_init+0x42e/0x580 [paride] >> Read of size 4 at addr 0000000000000008 by task modprobe/744 >> Call Trace: >> dump_stack+0x9a/0xeb >> kasan_report+0x139/0x350 >> pi_init+0x42e/0x580 [paride] >> pf_init+0x2bb/0x1000 [pf] >> do_one_initcall+0x8e/0x405 >> do_init_module+0xd9/0x2f2 >> load_module+0x3ab4/0x4700 >> SYSC_finit_module+0x176/0x1a0 >> do_syscall_64+0xee/0x2b0 >> entry_SYSCALL_64_after_hwframe+0x42/0xb7 >> >> Reported-by: Alexandru Moise <00moses.alexander00@gmail.com> >> Fixes: a063057d7c73 ("block: Fix a race between request queue removal and the block cgroup controller") >> Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> >> Cc: Alexandru Moise <00moses.alexander00@gmail.com> >> Cc: Tejun Heo <tj@kernel.org> > > So, this might be correct for this reported case but I don't think > it's correct in general. There's no synchronization between blkcg > q->lock usages and blk_cleanup_queue(). e.g. hotunplugging and > shutting down a device while file operations are still in progress can > easily blow this up. Hello Tejun, I have retested hotunplugging by rerunning the srp-test software. It seems like you overlooked that this patch does not remove the blkcg_exit_queue() call from blk_cleanup_queue()? If a device is hotunplugged it is up to the block driver to call blk_cleanup_queue(). And blk_cleanup_queue() will call blkcg_exit_queue(). Bart.
Hello, Bart. On Thu, Apr 12, 2018 at 08:09:17AM -0600, Bart Van Assche wrote: > I have retested hotunplugging by rerunning the srp-test software. It > seems like you overlooked that this patch does not remove the > blkcg_exit_queue() call from blk_cleanup_queue()? If a device is > hotunplugged it is up to the block driver to call > blk_cleanup_queue(). And blk_cleanup_queue() will call > blkcg_exit_queue(). Hmm... what'd prevent blg_lookup_and_create() racing against that? Thanks.
On Thu, 2018-04-12 at 08:37 -0700, Tejun Heo wrote: > On Thu, Apr 12, 2018 at 08:09:17AM -0600, Bart Van Assche wrote: > > I have retested hotunplugging by rerunning the srp-test software. It > > seems like you overlooked that this patch does not remove the > > blkcg_exit_queue() call from blk_cleanup_queue()? If a device is > > hotunplugged it is up to the block driver to call > > blk_cleanup_queue(). And blk_cleanup_queue() will call > > blkcg_exit_queue(). > > Hmm... what'd prevent blg_lookup_and_create() racing against that? Hello Tejun, Did you perhaps mean blkg_lookup_create()? That function has one caller, namely blkcg_bio_issue_check(). The only caller of that function is generic_make_request_checks(). A patch was posted on the linux-block mailing list recently that surrounds that call with blk_queue_enter() / blk_queue_exit(). I think that prevents that blkcg_exit_queue() is called concurrently with blkg_lookup_create(). Bart.
Hello, On Thu, Apr 12, 2018 at 04:03:52PM +0000, Bart Van Assche wrote: > On Thu, 2018-04-12 at 08:37 -0700, Tejun Heo wrote: > > On Thu, Apr 12, 2018 at 08:09:17AM -0600, Bart Van Assche wrote: > > > I have retested hotunplugging by rerunning the srp-test software. It > > > seems like you overlooked that this patch does not remove the > > > blkcg_exit_queue() call from blk_cleanup_queue()? If a device is > > > hotunplugged it is up to the block driver to call > > > blk_cleanup_queue(). And blk_cleanup_queue() will call > > > blkcg_exit_queue(). > > > > Hmm... what'd prevent blg_lookup_and_create() racing against that? > > Hello Tejun, > > Did you perhaps mean blkg_lookup_create()? That function has one caller, Ah yeah, sorry about the sloppiness. > namely blkcg_bio_issue_check(). The only caller of that function is > generic_make_request_checks(). A patch was posted on the linux-block mailing > list recently that surrounds that call with blk_queue_enter() / blk_queue_exit(). > I think that prevents that blkcg_exit_queue() is called concurrently with > blkg_lookup_create(). Yeah, that'd solve the problem for that particular path, but what that's doing is adding another layer of refcnting which is used to implement the revoke (or sever) semantics. This is a fragile approach tho because it isn't clear who's protecting what and there's nothing in blkg's usage which suggests it'd be protected that way and we're basically working around a different underlying problem (lock switching) by expanding the coverage of a different lock. I'd much prefer fixing the lock switching problem properly than working around that shortcoming this way. Thans.
On Thu, 2018-04-12 at 09:12 -0700, tj@kernel.org wrote: > > Did you perhaps mean blkg_lookup_create()? That function has one caller, > > namely blkcg_bio_issue_check(). The only caller of that function is > > generic_make_request_checks(). A patch was posted on the linux-block mailing > > list recently that surrounds that call with blk_queue_enter() / blk_queue_exit(). > > I think that prevents that blkcg_exit_queue() is called concurrently with > > blkg_lookup_create(). > > Yeah, that'd solve the problem for that particular path, but what > that's doing is adding another layer of refcnting which is used to > implement the revoke (or sever) semantics. This is a fragile approach > tho because it isn't clear who's protecting what and there's nothing > in blkg's usage which suggests it'd be protected that way and we're > basically working around a different underlying problem (lock > switching) by expanding the coverage of a different lock. Hello Tejun, Any code that submits a bio or request needs blk_queue_enter() / blk_queue_exit() anyway. Please have a look at the following commit - you will see that that commit reduces the number of blk_queue_enter() / blk_queue_exit() calls in the hot path: https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git/commit/?h=for-linus&id=37f9579f4c31a6d698dbf3016d7bf132f9288d30 Thanks, Bart.
Hello, On Thu, Apr 12, 2018 at 04:29:09PM +0000, Bart Van Assche wrote: > Any code that submits a bio or request needs blk_queue_enter() / > blk_queue_exit() anyway. Please have a look at the following commit - you will > see that that commit reduces the number of blk_queue_enter() / blk_queue_exit() > calls in the hot path: > > https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git/commit/?h=for-linus&id=37f9579f4c31a6d698dbf3016d7bf132f9288d30 So, this can work, but it's still pretty fragile. * Lock switching is fragile and we really should get rid of it. This is very likely to come back and bite us. * Right now, blk_queue_enter/exit() doesn't have any annotations. IOW, there's no way for paths which need enter locked to actually assert that. If we're gonna protect more things with queue enter/exit, it gotta be annotated. Thanks.
On Thu, 2018-04-12 at 11:11 -0700, tj@kernel.org wrote: > * Right now, blk_queue_enter/exit() doesn't have any annotations. > IOW, there's no way for paths which need enter locked to actually > assert that. If we're gonna protect more things with queue > enter/exit, it gotta be annotated. Hello Tejun, One possibility is to check the count for the local CPU of q->q_usage_counter at runtime, e.g. using WARN_ON_ONCE(). However, that might have a runtime overhead. Another possibility is to follow the example of kernfs and to use a lockdep map and lockdep_assert_held(). There might be other alternatives I have not thought of. Bart.
Hello, On Thu, Apr 12, 2018 at 06:56:26PM +0000, Bart Van Assche wrote: > On Thu, 2018-04-12 at 11:11 -0700, tj@kernel.org wrote: > > * Right now, blk_queue_enter/exit() doesn't have any annotations. > > IOW, there's no way for paths which need enter locked to actually > > assert that. If we're gonna protect more things with queue > > enter/exit, it gotta be annotated. > > Hello Tejun, > > One possibility is to check the count for the local CPU of q->q_usage_counter > at runtime, e.g. using WARN_ON_ONCE(). However, that might have a runtime > overhead. Another possibility is to follow the example of kernfs and to use > a lockdep map and lockdep_assert_held(). There might be other alternatives I > have not thought of. Oh, I'd just do straight-forward lockdep annotation on queue_enter/exit functions and provide the necessary assert helpers. Thanks.
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c index f8457d6f0190..2e134da78f82 100644 --- a/block/blk-sysfs.c +++ b/block/blk-sysfs.c @@ -788,11 +788,37 @@ static void blk_free_queue_rcu(struct rcu_head *rcu_head) static void __blk_release_queue(struct work_struct *work) { struct request_queue *q = container_of(work, typeof(*q), release_work); + struct blkcg_gq *gq; if (test_bit(QUEUE_FLAG_POLL_STATS, &q->queue_flags)) blk_stat_remove_callback(q, q->poll_cb); blk_stat_free_callback(q->poll_cb); + if (!blk_queue_dead(q)) { + /* + * Last reference was dropped without having called + * blk_cleanup_queue(). + */ + WARN_ONCE(blk_queue_init_done(q), + "request queue %p has been registered but blk_cleanup_queue() has not been called for that queue\n", + q); + bdi_put(q->backing_dev_info); + blkcg_exit_queue(q); + + if (q->elevator) { + ioc_clear_queue(q); + elevator_exit(q, q->elevator); + } + } + + rcu_read_lock(); + gq = blkg_lookup(&blkcg_root, q); + rcu_read_unlock(); + + WARN(gq, + "request queue %p is being released but it has not yet been removed from the blkcg controller\n", + q); + blk_free_queue_stats(q->stats); blk_exit_rl(q, &q->root_rl);
Several block drivers call alloc_disk() followed by put_disk() if something fails before device_add_disk() is called without calling blk_cleanup_queue(). Make sure that also for this scenario a request queue is dissociated from the cgroup controller. This patch avoids that loading the parport_pc, paride and pf drivers triggers the following kernel crash: BUG: KASAN: null-ptr-deref in pi_init+0x42e/0x580 [paride] Read of size 4 at addr 0000000000000008 by task modprobe/744 Call Trace: dump_stack+0x9a/0xeb kasan_report+0x139/0x350 pi_init+0x42e/0x580 [paride] pf_init+0x2bb/0x1000 [pf] do_one_initcall+0x8e/0x405 do_init_module+0xd9/0x2f2 load_module+0x3ab4/0x4700 SYSC_finit_module+0x176/0x1a0 do_syscall_64+0xee/0x2b0 entry_SYSCALL_64_after_hwframe+0x42/0xb7 Reported-by: Alexandru Moise <00moses.alexander00@gmail.com> Fixes: a063057d7c73 ("block: Fix a race between request queue removal and the block cgroup controller") Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Cc: Alexandru Moise <00moses.alexander00@gmail.com> Cc: Tejun Heo <tj@kernel.org> --- block/blk-sysfs.c | 26 ++++++++++++++++++++++++++ 1 file changed, 26 insertions(+)