Message ID | 20190401212014.192753-3-bvanassche@acm.org (mailing list archive) |
---|---|
State | New, archived |
Series | Fix a race between request queue cleanup and running queues |
On Mon, Apr 01, 2019 at 02:20:12PM -0700, Bart Van Assche wrote:
> Any request queue data structure may change while a queue is frozen.
> Hence make sure that blk_mq_run_hw_queues() does not access any hw
> queue while a request queue is frozen.
>
> After blk_cleanup_queue() has marked a queue as dead it is no longer
> safe to access the hardware queue data structures. This patch avoids
> that blk_mq_run_hw_queues() crashes when called during or after
> blk_cleanup_queue() has freed the hardware queues. This patch is a
> variant of a patch posted by Hannes Reinecke ("[PATCH] block: don't
> call blk_mq_run_hw_queues() for dead or dying queues"). This patch is
> similar in nature to commit c246e80d8673 ("block: Avoid that
> request_fn is invoked on a dead queue"; v3.8). An example of a crash
> that is fixed by this patch:
>
> BUG: unable to handle kernel NULL pointer dereference at (null)
> IP: [<ffffffff8135a10b>] sbitmap_any_bit_set+0xb/0x30
> Call Trace:
>  [<ffffffff81303a88>] blk_mq_run_hw_queues+0x48/0x90
>  [<ffffffff813053cc>] blk_mq_requeue_work+0x10c/0x120
>  [<ffffffff81098cb4>] process_one_work+0x154/0x410
>  [<ffffffff81099896>] worker_thread+0x116/0x4a0
>  [<ffffffff8109edb9>] kthread+0xc9/0xe0
>  [<ffffffff81619b05>] ret_from_fork+0x55/0x80
>
> Cc: Christoph Hellwig <hch@infradead.org>
> Cc: Hannes Reinecke <hare@suse.com>
> Cc: James Smart <james.smart@broadcom.com>
> Cc: Ming Lei <ming.lei@redhat.com>
> Cc: Jianchao Wang <jianchao.w.wang@oracle.com>
> Cc: Dongli Zhang <dongli.zhang@oracle.com>
> Cc: <stable@vger.kernel.org>
> Fixes: a063057d7c73 ("block: Fix a race between request queue removal and the block cgroup controller") # v4.17
> Reported-by: James Smart <james.smart@broadcom.com>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> ---
>  block/blk-mq.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
>
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 3ff3d7b49969..652d0c6d5945 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -1499,12 +1499,20 @@ void blk_mq_run_hw_queues(struct request_queue *q, bool async)
>  	struct blk_mq_hw_ctx *hctx;
>  	int i;
>  
> +	/*
> +	 * Do not run any hardware queues if the queue is frozen or if a
> +	 * concurrent blk_cleanup_queue() call is removing any data
> +	 * structures used by this function.
> +	 */
> +	if (!percpu_ref_tryget(&q->q_usage_counter))
> +		return;
>  	queue_for_each_hw_ctx(q, hctx, i) {
>  		if (blk_mq_hctx_stopped(hctx))
>  			continue;
>  
>  		blk_mq_run_hw_queue(hctx, async);
>  	}
> +	percpu_ref_put(&q->q_usage_counter);
>  }
>  EXPORT_SYMBOL(blk_mq_run_hw_queues);

I don't see why it is necessary to add percpu_ref_tryget()/percpu_ref_put()
in the fast path if we simply release all hctx resources in the hctx
release handler, as done by the following patch:

https://lore.kernel.org/linux-block/20190401044247.29881-2-ming.lei@redhat.com/T/#u

We could even kill the percpu_ref_tryget_live()/percpu_ref_put() in
scsi_end_request().

Thanks,
Ming
On Tue, 2019-04-02 at 08:53 +0800, Ming Lei wrote:
> On Mon, Apr 01, 2019 at 02:20:12PM -0700, Bart Van Assche wrote:
> > diff --git a/block/blk-mq.c b/block/blk-mq.c
> > index 3ff3d7b49969..652d0c6d5945 100644
> > --- a/block/blk-mq.c
> > +++ b/block/blk-mq.c
> > @@ -1499,12 +1499,20 @@ void blk_mq_run_hw_queues(struct request_queue *q, bool async)
> >  	struct blk_mq_hw_ctx *hctx;
> >  	int i;
> >  
> > +	/*
> > +	 * Do not run any hardware queues if the queue is frozen or if a
> > +	 * concurrent blk_cleanup_queue() call is removing any data
> > +	 * structures used by this function.
> > +	 */
> > +	if (!percpu_ref_tryget(&q->q_usage_counter))
> > +		return;
> >  	queue_for_each_hw_ctx(q, hctx, i) {
> >  		if (blk_mq_hctx_stopped(hctx))
> >  			continue;
> >  
> >  		blk_mq_run_hw_queue(hctx, async);
> >  	}
> > +	percpu_ref_put(&q->q_usage_counter);
> >  }
> >  EXPORT_SYMBOL(blk_mq_run_hw_queues);
>
> I don't see why it is necessary to add percpu_ref_tryget()/percpu_ref_put()
> in the fast path if we simply release all hctx resources in the hctx
> release handler, as done by the following patch:
>
> https://lore.kernel.org/linux-block/20190401044247.29881-2-ming.lei@redhat.com/T/#u
>
> We could even kill the percpu_ref_tryget_live()/percpu_ref_put() in
> scsi_end_request().

The above approach has the advantages of being easy to review and to
maintain.

Patch "[PATCH V2 1/3] blk-mq: free hw queue's resource in hctx's
release handler" makes the block layer more complicated because it
introduces a new state for hardware queues: block driver cleanup has
happened (set->ops->exit_hctx(...)) but the hardware queues are still
in use by the block layer core.

Let's see what other reviewers think.

Bart.
On Tue, Apr 02, 2019 at 08:44:10AM -0700, Bart Van Assche wrote:
> On Tue, 2019-04-02 at 08:53 +0800, Ming Lei wrote:
> > On Mon, Apr 01, 2019 at 02:20:12PM -0700, Bart Van Assche wrote:
> > > diff --git a/block/blk-mq.c b/block/blk-mq.c
> > > index 3ff3d7b49969..652d0c6d5945 100644
> > > --- a/block/blk-mq.c
> > > +++ b/block/blk-mq.c
> > > @@ -1499,12 +1499,20 @@ void blk_mq_run_hw_queues(struct request_queue *q, bool async)
> > >  	struct blk_mq_hw_ctx *hctx;
> > >  	int i;
> > >  
> > > +	/*
> > > +	 * Do not run any hardware queues if the queue is frozen or if a
> > > +	 * concurrent blk_cleanup_queue() call is removing any data
> > > +	 * structures used by this function.
> > > +	 */
> > > +	if (!percpu_ref_tryget(&q->q_usage_counter))
> > > +		return;
> > >  	queue_for_each_hw_ctx(q, hctx, i) {
> > >  		if (blk_mq_hctx_stopped(hctx))
> > >  			continue;
> > >  
> > >  		blk_mq_run_hw_queue(hctx, async);
> > >  	}
> > > +	percpu_ref_put(&q->q_usage_counter);
> > >  }
> > >  EXPORT_SYMBOL(blk_mq_run_hw_queues);
> >
> > I don't see why it is necessary to add percpu_ref_tryget()/percpu_ref_put()
> > in the fast path if we simply release all hctx resources in the hctx
> > release handler, as done by the following patch:
> >
> > https://lore.kernel.org/linux-block/20190401044247.29881-2-ming.lei@redhat.com/T/#u
> >
> > We could even kill the percpu_ref_tryget_live()/percpu_ref_put() in
> > scsi_end_request().
>
> The above approach has the advantages of being easy to review and to maintain.
>
> Patch "[PATCH V2 1/3] blk-mq: free hw queue's resource in hctx's
> release handler" makes the block layer more complicated because it
> introduces a new state for hardware queues: block driver cleanup has
> happened (set->ops->exit_hctx(...)) but

We are done with the driver after blk_freeze_queue() and
blk_sync_queue(); .exit_hctx() is then called to say goodbye to the
driver. I don't see how that causes any issue.

> the hardware queues are still in use by the block layer core.

The block layer still has the correct in-memory state to work with, and
no driver activity is involved either.

Thanks,
Ming
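Ming's alternative ties hctx teardown to the hardware queue's release handler, so the resources are freed only when the last reference is dropped rather than directly from blk_cleanup_queue(). A minimal user-space sketch of that reference-counted release pattern, using plain C11 atomics with invented names (`hw_ctx`, `sched_data`) rather than the real kernel types:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for a hardware-queue context whose resources
 * are freed from a release handler, not from the cleanup path. */
struct hw_ctx {
	atomic_int refcount;
	int *sched_data;                 /* resource owned by the hctx */
	void (*release)(struct hw_ctx *);
};

static void hw_ctx_get(struct hw_ctx *h)
{
	atomic_fetch_add(&h->refcount, 1);
}

/* Dropping the last reference runs the release handler; anyone still
 * holding a reference may keep using the hctx safely in the meantime. */
static void hw_ctx_put(struct hw_ctx *h)
{
	if (atomic_fetch_sub(&h->refcount, 1) == 1)
		h->release(h);
}

static void hw_ctx_release(struct hw_ctx *h)
{
	free(h->sched_data);             /* teardown deferred to last put */
	h->sched_data = NULL;
}
```

With this shape, a concurrent queue run that took a reference before cleanup started still sees valid data; cleanup merely drops its own reference and lets the release handler do the freeing.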
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 3ff3d7b49969..652d0c6d5945 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1499,12 +1499,20 @@ void blk_mq_run_hw_queues(struct request_queue *q, bool async)
 	struct blk_mq_hw_ctx *hctx;
 	int i;
 
+	/*
+	 * Do not run any hardware queues if the queue is frozen or if a
+	 * concurrent blk_cleanup_queue() call is removing any data
+	 * structures used by this function.
+	 */
+	if (!percpu_ref_tryget(&q->q_usage_counter))
+		return;
 	queue_for_each_hw_ctx(q, hctx, i) {
 		if (blk_mq_hctx_stopped(hctx))
 			continue;
 
 		blk_mq_run_hw_queue(hctx, async);
 	}
+	percpu_ref_put(&q->q_usage_counter);
 }
 EXPORT_SYMBOL(blk_mq_run_hw_queues);
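In user-space terms, the guard added by this patch is a "tryget" on a usage counter that fails once the counter has dropped to zero, so the queue-running path never touches data structures that cleanup may be freeing. A simplified sketch with plain C11 atomics (the kernel's percpu_ref is per-cpu for fast-path scalability; all names and types below are illustrative, not kernel API, and `run_hw_queues()` returns a count purely so the behavior is observable):

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Illustrative stand-in for the kernel's percpu_ref: tryget succeeds
 * only while the counter is still above zero. */
struct usage_ref { atomic_long count; };

static bool ref_tryget(struct usage_ref *r)
{
	long c = atomic_load(&r->count);
	while (c > 0)
		if (atomic_compare_exchange_weak(&r->count, &c, c + 1))
			return true;   /* got a reference; queue stays alive */
	return false;                  /* counter hit zero: queue going away */
}

static void ref_put(struct usage_ref *r)
{
	atomic_fetch_sub(&r->count, 1);
}

struct hw_queue { bool stopped; };

struct request_queue {
	struct usage_ref q_usage_counter;
	struct hw_queue *hctxs;        /* freed once the counter drops to zero */
	int nr_hw_queues;
};

/* Mirrors the shape of the patched blk_mq_run_hw_queues(): bail out
 * before touching hctxs if cleanup may already be freeing them. */
static int run_hw_queues(struct request_queue *q)
{
	int ran = 0;

	if (!ref_tryget(&q->q_usage_counter))
		return 0;
	for (int i = 0; i < q->nr_hw_queues; i++) {
		if (q->hctxs[i].stopped)
			continue;
		ran++;                 /* ... dispatch from q->hctxs[i] ... */
	}
	ref_put(&q->q_usage_counter);
	return ran;
}
```

The crash in the commit message corresponds to the unguarded version iterating over `hctxs` after cleanup has freed it; with the tryget up front, a late queue run simply returns.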
Any request queue data structure may change while a queue is frozen.
Hence make sure that blk_mq_run_hw_queues() does not access any hw
queue while a request queue is frozen.

After blk_cleanup_queue() has marked a queue as dead it is no longer
safe to access the hardware queue data structures. This patch avoids
that blk_mq_run_hw_queues() crashes when called during or after
blk_cleanup_queue() has freed the hardware queues. This patch is a
variant of a patch posted by Hannes Reinecke ("[PATCH] block: don't
call blk_mq_run_hw_queues() for dead or dying queues"). This patch is
similar in nature to commit c246e80d8673 ("block: Avoid that
request_fn is invoked on a dead queue"; v3.8). An example of a crash
that is fixed by this patch:

BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [<ffffffff8135a10b>] sbitmap_any_bit_set+0xb/0x30
Call Trace:
 [<ffffffff81303a88>] blk_mq_run_hw_queues+0x48/0x90
 [<ffffffff813053cc>] blk_mq_requeue_work+0x10c/0x120
 [<ffffffff81098cb4>] process_one_work+0x154/0x410
 [<ffffffff81099896>] worker_thread+0x116/0x4a0
 [<ffffffff8109edb9>] kthread+0xc9/0xe0
 [<ffffffff81619b05>] ret_from_fork+0x55/0x80

Cc: Christoph Hellwig <hch@infradead.org>
Cc: Hannes Reinecke <hare@suse.com>
Cc: James Smart <james.smart@broadcom.com>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Jianchao Wang <jianchao.w.wang@oracle.com>
Cc: Dongli Zhang <dongli.zhang@oracle.com>
Cc: <stable@vger.kernel.org>
Fixes: a063057d7c73 ("block: Fix a race between request queue removal and the block cgroup controller") # v4.17
Reported-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 block/blk-mq.c | 8 ++++++++
 1 file changed, 8 insertions(+)