Message ID | 20170427155437.23228-2-bart.vanassche@sandisk.com (mailing list archive) |
---|---|
State | New, archived |
Hi Bart,

On Thu, Apr 27, 2017 at 08:54:32AM -0700, Bart Van Assche wrote:
> blk_mq_quiesce_queue() callers, e.g. elevator_switch_mq(), assume
> that no .queue_rq() calls occur while switching to another I/O

I think we should call blk_mq_freeze_queue_wait() instead of
blk_mq_quiesce_queue() in elevator_switch_mq(), otherwise it is easy to
cause a use-after-free.

> scheduler. This patch fixes the following kernel crash if another
> I/O scheduler than "none" is the default scheduler:
>
> general protection fault: 0000 [#1] SMP
> RIP: 0010:__lock_acquire+0xfe/0x1280
> Call Trace:
> lock_acquire+0xd5/0x1c0
> _raw_spin_lock+0x2a/0x40
> dd_dispatch_request+0x29/0x1e0
> blk_mq_sched_dispatch_requests+0x139/0x190
> __blk_mq_run_hw_queue+0x12d/0x1c0
> blk_mq_run_work_fn+0xd/0x10
> process_one_work+0x206/0x6a0
> worker_thread+0x49/0x4a0
> kthread+0x107/0x140
> ret_from_fork+0x2e/0x40
>
> Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
> Cc: Hannes Reinecke <hare@suse.com>
> Cc: Omar Sandoval <osandov@fb.com>
> Cc: Ming Lei <ming.lei@redhat.com>
> Cc: <stable@vger.kernel.org>
> ---
>  block/blk-mq.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index b75ef2392db7..3b3420f76b5a 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -1224,8 +1224,9 @@ EXPORT_SYMBOL(blk_mq_queue_stopped);
>
>  void blk_mq_stop_hw_queue(struct blk_mq_hw_ctx *hctx)
> {
> -	cancel_work(&hctx->run_work);
> -	cancel_delayed_work(&hctx->delay_work);
> +	cancel_work_sync(&hctx->run_work);
> +	cancel_delayed_work_sync(&hctx->delay_work);

Could you explain a bit why we need the sync version?

> +	cancel_delayed_work_sync(&hctx->delayed_run_work);

The more that is introduced, the more bugs may come. :-)

So I suggest unifying both .run_work and .delayed_run_work into one
work, just as Jens did in the following link:

http://marc.info/?t=149183989800010&r=1&w=2

Thanks,
Ming
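For readers who want to see what Ming's unification suggestion amounts to, here is a minimal sketch of the general idea: a single delayed_work serving both the "run now" and the "run after N msecs" paths. This is not Jens's actual patch from the linked thread; the names (`unified_run_work`, `demo_*`) and the module wrapper are illustrative assumptions only.

```c
/*
 * Minimal, self-contained sketch of the unification idea only; NOT the
 * actual blk-mq change from the linked thread. A single delayed_work can
 * serve both the "run now" path (delay 0) and the "run after N msecs" path.
 * All names here (unified_run_work, demo_*) are hypothetical.
 */
#include <linux/module.h>
#include <linux/workqueue.h>
#include <linux/jiffies.h>

static void demo_run_fn(struct work_struct *work)
{
	/* In blk-mq proper this would end up in __blk_mq_run_hw_queue(). */
}

static DECLARE_DELAYED_WORK(unified_run_work, demo_run_fn);

/* One entry point replaces the separate run_work/delay_work schedulers. */
static void demo_run_hw_queue(unsigned long msecs)
{
	mod_delayed_work(system_wq, &unified_run_work,
			 msecs_to_jiffies(msecs));
}

static int __init demo_init(void)
{
	demo_run_hw_queue(0);	/* immediate run: old .run_work path */
	demo_run_hw_queue(10);	/* delayed run: old .delay_work path */
	return 0;
}

static void __exit demo_exit(void)
{
	/* Only one work item left to cancel, which is the point of unifying. */
	cancel_delayed_work_sync(&unified_run_work);
}

module_init(demo_init);
module_exit(demo_exit);
MODULE_LICENSE("GPL");
```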
Hi,

On Fri, Apr 28, 2017 at 10:00:44AM +0800, Ming Lei wrote:
> Hi Bart,
>
> On Thu, Apr 27, 2017 at 08:54:32AM -0700, Bart Van Assche wrote:
> > blk_mq_quiesce_queue() callers, e.g. elevator_switch_mq(), assume
> > that no .queue_rq() calls occur while switching to another I/O
>
> I think we should call blk_mq_freeze_queue_wait() instead of
> blk_mq_quiesce_queue() in elevator_switch_mq(), otherwise it is easy to
> cause a use-after-free.

Sorry, my fault: blk_mq_freeze_queue_wait() is already done in
blk_mq_freeze_queue(), so no new I/O can enter the queue any more, and
all old pending I/O will have finished when blk_mq_freeze_queue()
returns.

> > scheduler. This patch fixes the following kernel crash if another
> > I/O scheduler than "none" is the default scheduler:
> >
> > general protection fault: 0000 [#1] SMP
> > RIP: 0010:__lock_acquire+0xfe/0x1280
> > Call Trace:
> > lock_acquire+0xd5/0x1c0
> > _raw_spin_lock+0x2a/0x40
> > dd_dispatch_request+0x29/0x1e0
> > blk_mq_sched_dispatch_requests+0x139/0x190
> > __blk_mq_run_hw_queue+0x12d/0x1c0
> > blk_mq_run_work_fn+0xd/0x10
> > process_one_work+0x206/0x6a0
> > worker_thread+0x49/0x4a0
> > kthread+0x107/0x140
> > ret_from_fork+0x2e/0x40

So I am just wondering where the I/O in the above trace comes from; it
feels like there might be another bug behind this issue.

Also, blk_mq_quiesce_queue() in elevator_switch_mq() shouldn't be
necessary.

Thanks,
Ming
On Fri, 2017-04-28 at 10:00 +0800, Ming Lei wrote:
> On Thu, Apr 27, 2017 at 08:54:32AM -0700, Bart Van Assche wrote:
> >  void blk_mq_stop_hw_queue(struct blk_mq_hw_ctx *hctx)
> >  {
> > -	cancel_work(&hctx->run_work);
> > -	cancel_delayed_work(&hctx->delay_work);
> > +	cancel_work_sync(&hctx->run_work);
> > +	cancel_delayed_work_sync(&hctx->delay_work);
>
> Could you explain a bit why we need the sync version?

Because the purpose of this patch is to make blk_mq_quiesce_queue() wait for
all .queue_rq() calls.

> So I suggest unifying both .run_work and .delayed_run_work into one
> work, just as Jens did in the following link:
>
> http://marc.info/?t=149183989800010&r=1&w=2

That should be done after this patch is upstream; otherwise this
patch won't apply to the stable trees.

Bart.
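The distinction Bart relies on here is that the non-sync cancel helpers only remove a pending work item, while the _sync variants also wait for an already running instance to finish. The sketch below illustrates that difference with a standalone module; it is not blk-mq code, and the names (`demo_dwork`, `demo_slow_fn`) are hypothetical.

```c
/*
 * Illustrative sketch (not blk-mq code): the difference between
 * cancel_delayed_work() and cancel_delayed_work_sync(). The same
 * distinction holds for cancel_work() vs cancel_work_sync().
 * Names (demo_*) are hypothetical.
 */
#include <linux/module.h>
#include <linux/workqueue.h>
#include <linux/delay.h>

static void demo_slow_fn(struct work_struct *work)
{
	/* Stands in for a long-running hw queue run / .queue_rq() dispatch. */
	msleep(100);
}

static DECLARE_DELAYED_WORK(demo_dwork, demo_slow_fn);

static int __init demo_init(void)
{
	schedule_delayed_work(&demo_dwork, 0);

	/*
	 * cancel_delayed_work() only removes a not-yet-running item; if
	 * demo_slow_fn() has already started, it keeps running after this
	 * call returns.
	 */
	cancel_delayed_work(&demo_dwork);

	/*
	 * cancel_delayed_work_sync() additionally waits for a running
	 * instance to complete, which is what lets a caller such as
	 * blk_mq_quiesce_queue() assume no dispatch is still in flight
	 * via this work item once the call returns.
	 */
	cancel_delayed_work_sync(&demo_dwork);

	return 0;
}

static void __exit demo_exit(void)
{
	cancel_delayed_work_sync(&demo_dwork);
}

module_init(demo_init);
module_exit(demo_exit);
MODULE_LICENSE("GPL");
```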
On Fri, Apr 28, 2017 at 03:35:59PM +0000, Bart Van Assche wrote:
> On Fri, 2017-04-28 at 10:00 +0800, Ming Lei wrote:
> > On Thu, Apr 27, 2017 at 08:54:32AM -0700, Bart Van Assche wrote:
> > >  void blk_mq_stop_hw_queue(struct blk_mq_hw_ctx *hctx)
> > >  {
> > > -	cancel_work(&hctx->run_work);
> > > -	cancel_delayed_work(&hctx->delay_work);
> > > +	cancel_work_sync(&hctx->run_work);
> > > +	cancel_delayed_work_sync(&hctx->delay_work);
> >
> > Could you explain a bit why we need the sync version?
>
> Because the purpose of this patch is to make blk_mq_quiesce_queue() wait for
> all .queue_rq() calls.

If only .queue_rq() from this queue has to be waited for,
blk_mq_freeze_queue() is enough. If you want to wait for .queue_rq()
from all queues, blk_mq_quiesce_queue() can't do that either:

Firstly, there shouldn't be a 'struct request_queue' parameter passed to
this function, because you want to wait for .queue_rq() from all queues.

Secondly, your current implementation can't guarantee that point:

- only hw queues of this queue are stopped in this function, and
  .queue_rq() from other queues can still run once synchronize_rcu()
  returns;

- for the BLOCKING case, the SRCU is per hctx.

Thirdly, I am still not sure why we want to wait for all .queue_rq();
could you explain it a bit? At least for the mq scheduler switch, it
looks like all scheduler data is per request queue, and we shouldn't
need to wait for all .queue_rq().

> > So I suggest unifying both .run_work and .delayed_run_work into one
> > work, just as Jens did in the following link:
> >
> > http://marc.info/?t=149183989800010&r=1&w=2
>
> That should be done after this patch is upstream; otherwise this
> patch won't apply to the stable trees.

OK.

Thanks,
Ming
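Ming's second point, that the SRCU domain is per hctx, can be illustrated with a standalone SRCU sketch: synchronize_srcu() only waits for readers of the domain it is handed, so waiting on one hctx's domain says nothing about dispatch running under another hctx's domain or on another queue. The module below is illustrative only, not blk-mq code; the names (`demo_srcu`, `demo_dispatch`) are hypothetical.

```c
/*
 * Standalone SRCU sketch (not blk-mq code): a read-side section models one
 * .queue_rq() invocation on a BLK_MQ_F_BLOCKING hctx, and synchronize_srcu()
 * models the wait performed during quiesce. Waiting on this domain says
 * nothing about readers of a different SRCU domain (i.e. another hctx).
 * Names (demo_*) are hypothetical.
 */
#include <linux/module.h>
#include <linux/srcu.h>

DEFINE_STATIC_SRCU(demo_srcu);

static void demo_dispatch(void)
{
	int idx;

	idx = srcu_read_lock(&demo_srcu);
	/* ... the actual .queue_rq() call would run here ... */
	srcu_read_unlock(&demo_srcu, idx);
}

static int __init demo_init(void)
{
	demo_dispatch();

	/*
	 * Waits only for read-side sections of demo_srcu that started before
	 * this call - not for readers of any other SRCU domain.
	 */
	synchronize_srcu(&demo_srcu);

	return 0;
}

static void __exit demo_exit(void)
{
}

module_init(demo_init);
module_exit(demo_exit);
MODULE_LICENSE("GPL");
```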
diff --git a/block/blk-mq.c b/block/blk-mq.c
index b75ef2392db7..3b3420f76b5a 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1224,8 +1224,9 @@ EXPORT_SYMBOL(blk_mq_queue_stopped);
 
 void blk_mq_stop_hw_queue(struct blk_mq_hw_ctx *hctx)
 {
-	cancel_work(&hctx->run_work);
-	cancel_delayed_work(&hctx->delay_work);
+	cancel_work_sync(&hctx->run_work);
+	cancel_delayed_work_sync(&hctx->delay_work);
+	cancel_delayed_work_sync(&hctx->delayed_run_work);
 	set_bit(BLK_MQ_S_STOPPED, &hctx->state);
 }
 EXPORT_SYMBOL(blk_mq_stop_hw_queue);
blk_mq_quiesce_queue() callers, e.g. elevator_switch_mq(), assume
that no .queue_rq() calls occur while switching to another I/O
scheduler. This patch fixes the following kernel crash if another
I/O scheduler than "none" is the default scheduler:

general protection fault: 0000 [#1] SMP
RIP: 0010:__lock_acquire+0xfe/0x1280
Call Trace:
 lock_acquire+0xd5/0x1c0
 _raw_spin_lock+0x2a/0x40
 dd_dispatch_request+0x29/0x1e0
 blk_mq_sched_dispatch_requests+0x139/0x190
 __blk_mq_run_hw_queue+0x12d/0x1c0
 blk_mq_run_work_fn+0xd/0x10
 process_one_work+0x206/0x6a0
 worker_thread+0x49/0x4a0
 kthread+0x107/0x140
 ret_from_fork+0x2e/0x40

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Omar Sandoval <osandov@fb.com>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: <stable@vger.kernel.org>
---
 block/blk-mq.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)