[1/1] blk-mq: get ctx in order to handle BLK_MQ_S_INACTIVE in blk_mq_get_tag()

Message ID 20200602061749.32029-1-dongli.zhang@oracle.com (mailing list archive)
State New, archived

Commit Message

Dongli Zhang June 2, 2020, 6:17 a.m. UTC
When a scheduler is set, we hit the page fault below when we offline a CPU.

[ 1061.007725] BUG: kernel NULL pointer dereference, address: 0000000000000040
[ 1061.008710] #PF: supervisor read access in kernel mode
[ 1061.009492] #PF: error_code(0x0000) - not-present page
[ 1061.010241] PGD 0 P4D 0
[ 1061.010614] Oops: 0000 [#1] SMP PTI
[ 1061.011130] CPU: 0 PID: 122 Comm: kworker/0:1H Not tainted 5.7.0-rc7+ #2'
... ...
[ 1061.013760] Workqueue: kblockd blk_mq_run_work_fn
[ 1061.014446] RIP: 0010:blk_mq_put_tag+0xf/0x30
... ...
[ 1061.017726] RSP: 0018:ffffa5c18037fc70 EFLAGS: 00010287
[ 1061.018475] RAX: 0000000000000000 RBX: ffffa5c18037fcf0 RCX: 0000000000000004
[ 1061.019507] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff911535dc1180
... ...
[ 1061.028454] Call Trace:
[ 1061.029307]  blk_mq_get_tag+0x26e/0x280
[ 1061.029866]  ? wait_woken+0x80/0x80
[ 1061.030378]  blk_mq_get_driver_tag+0x99/0x110
[ 1061.031009]  blk_mq_dispatch_rq_list+0x107/0x5e0
[ 1061.031672]  ? elv_rb_del+0x1a/0x30
[ 1061.032178]  blk_mq_do_dispatch_sched+0xe2/0x130
[ 1061.032844]  __blk_mq_sched_dispatch_requests+0xcc/0x150
[ 1061.033638]  blk_mq_sched_dispatch_requests+0x2b/0x50
[ 1061.034239]  __blk_mq_run_hw_queue+0x75/0x110
[ 1061.034867]  process_one_work+0x15c/0x370
[ 1061.035450]  worker_thread+0x44/0x3d0
[ 1061.035980]  kthread+0xf3/0x130
[ 1061.036440]  ? max_active_store+0x80/0x80
[ 1061.037018]  ? kthread_bind+0x10/0x10
[ 1061.037554]  ret_from_fork+0x35/0x40
[ 1061.038073] Modules linked in:
[ 1061.038543] CR2: 0000000000000040
[ 1061.038962] ---[ end trace d20e1df7d028e69f ]---

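This is because blk_mq_get_driver_tag() is used to allocate the driver tag
once a scheduler (e.g., mq-deadline) is set, and it does not initialize
data->ctx. When blk_mq_get_tag() finds BLK_MQ_S_INACTIVE set, it releases the
tag it just obtained via blk_mq_put_tag(), which needs data->ctx. Set
data->ctx in blk_mq_get_driver_tag() so that this path works.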

Fixes: bf0beec0607db3c6 ("blk-mq: drain I/O when all CPUs in a hctx are offline")
Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
---
This is based on for-next because currently the pull request for v5.8 is
not picked by mainline.

 block/blk-mq.c | 1 +
 1 file changed, 1 insertion(+)

Comments

Ming Lei June 2, 2020, 6:46 a.m. UTC | #1
On Mon, Jun 01, 2020 at 11:17:49PM -0700, Dongli Zhang wrote:
> When scheduler is set, we hit below page fault when we offline cpu.
> 
> [ 1061.007725] BUG: kernel NULL pointer dereference, address: 0000000000000040
> [ 1061.008710] #PF: supervisor read access in kernel mode
> [ 1061.009492] #PF: error_code(0x0000) - not-present page
> [ 1061.010241] PGD 0 P4D 0
> [ 1061.010614] Oops: 0000 [#1] SMP PTI
> [ 1061.011130] CPU: 0 PID: 122 Comm: kworker/0:1H Not tainted 5.7.0-rc7+ #2'
> ... ...
> [ 1061.013760] Workqueue: kblockd blk_mq_run_work_fn
> [ 1061.014446] RIP: 0010:blk_mq_put_tag+0xf/0x30
> ... ...
> [ 1061.017726] RSP: 0018:ffffa5c18037fc70 EFLAGS: 00010287
> [ 1061.018475] RAX: 0000000000000000 RBX: ffffa5c18037fcf0 RCX: 0000000000000004
> [ 1061.019507] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff911535dc1180
> ... ...
> [ 1061.028454] Call Trace:
> [ 1061.029307]  blk_mq_get_tag+0x26e/0x280
> [ 1061.029866]  ? wait_woken+0x80/0x80
> [ 1061.030378]  blk_mq_get_driver_tag+0x99/0x110
> [ 1061.031009]  blk_mq_dispatch_rq_list+0x107/0x5e0
> [ 1061.031672]  ? elv_rb_del+0x1a/0x30
> [ 1061.032178]  blk_mq_do_dispatch_sched+0xe2/0x130
> [ 1061.032844]  __blk_mq_sched_dispatch_requests+0xcc/0x150
> [ 1061.033638]  blk_mq_sched_dispatch_requests+0x2b/0x50
> [ 1061.034239]  __blk_mq_run_hw_queue+0x75/0x110
> [ 1061.034867]  process_one_work+0x15c/0x370
> [ 1061.035450]  worker_thread+0x44/0x3d0
> [ 1061.035980]  kthread+0xf3/0x130
> [ 1061.036440]  ? max_active_store+0x80/0x80
> [ 1061.037018]  ? kthread_bind+0x10/0x10
> [ 1061.037554]  ret_from_fork+0x35/0x40
> [ 1061.038073] Modules linked in:
> [ 1061.038543] CR2: 0000000000000040
> [ 1061.038962] ---[ end trace d20e1df7d028e69f ]---
> 
> This is because blk_mq_get_driver_tag() would be used to allocate tag once
> scheduler (e.g., mq-deadline) is set. However, in order to handle
> BLK_MQ_S_INACTIVE in blk_mq_get_tag(), we need to set data->ctx for
> blk_mq_put_tag().
> 
> Fixes: bf0beec0607db3c6 ("blk-mq: drain I/O when all CPUs in a hctx are offline")
> Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
> ---
> This is based on for-next because currently the pull request for v5.8 is
> not picked by mainline.
> 
>  block/blk-mq.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 9a36ac1c1fa1..8bf6c06a86c1 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -1056,6 +1056,7 @@ bool blk_mq_get_driver_tag(struct request *rq)
>  {
>  	struct blk_mq_alloc_data data = {
>  		.q = rq->q,
> +		.ctx = rq->mq_ctx,
>  		.hctx = rq->mq_hctx,
>  		.flags = BLK_MQ_REQ_NOWAIT,
>  		.cmd_flags = rq->cmd_flags,

Reviewed-by: Ming Lei <ming.lei@redhat.com>
Daniel Wagner June 2, 2020, 7:51 a.m. UTC | #2
On Mon, Jun 01, 2020 at 11:17:49PM -0700, Dongli Zhang wrote:
> When scheduler is set, we hit below page fault when we offline cpu.
> 
> [ 1061.007725] BUG: kernel NULL pointer dereference, address: 0000000000000040
> [ 1061.008710] #PF: supervisor read access in kernel mode
> [ 1061.009492] #PF: error_code(0x0000) - not-present page
> [ 1061.010241] PGD 0 P4D 0
> [ 1061.010614] Oops: 0000 [#1] SMP PTI
> [ 1061.011130] CPU: 0 PID: 122 Comm: kworker/0:1H Not tainted 5.7.0-rc7+ #2'
> ... ...
> [ 1061.013760] Workqueue: kblockd blk_mq_run_work_fn
> [ 1061.014446] RIP: 0010:blk_mq_put_tag+0xf/0x30
> ... ...
> [ 1061.017726] RSP: 0018:ffffa5c18037fc70 EFLAGS: 00010287
> [ 1061.018475] RAX: 0000000000000000 RBX: ffffa5c18037fcf0 RCX: 0000000000000004
> [ 1061.019507] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff911535dc1180
> ... ...
> [ 1061.028454] Call Trace:
> [ 1061.029307]  blk_mq_get_tag+0x26e/0x280
> [ 1061.029866]  ? wait_woken+0x80/0x80
> [ 1061.030378]  blk_mq_get_driver_tag+0x99/0x110
> [ 1061.031009]  blk_mq_dispatch_rq_list+0x107/0x5e0
> [ 1061.031672]  ? elv_rb_del+0x1a/0x30
> [ 1061.032178]  blk_mq_do_dispatch_sched+0xe2/0x130
> [ 1061.032844]  __blk_mq_sched_dispatch_requests+0xcc/0x150
> [ 1061.033638]  blk_mq_sched_dispatch_requests+0x2b/0x50
> [ 1061.034239]  __blk_mq_run_hw_queue+0x75/0x110
> [ 1061.034867]  process_one_work+0x15c/0x370
> [ 1061.035450]  worker_thread+0x44/0x3d0
> [ 1061.035980]  kthread+0xf3/0x130
> [ 1061.036440]  ? max_active_store+0x80/0x80
> [ 1061.037018]  ? kthread_bind+0x10/0x10
> [ 1061.037554]  ret_from_fork+0x35/0x40
> [ 1061.038073] Modules linked in:
> [ 1061.038543] CR2: 0000000000000040
> [ 1061.038962] ---[ end trace d20e1df7d028e69f ]---
> 
> This is because blk_mq_get_driver_tag() would be used to allocate tag once
> scheduler (e.g., mq-deadline) is set. However, in order to handle
> BLK_MQ_S_INACTIVE in blk_mq_get_tag(), we need to set data->ctx for
> blk_mq_put_tag().
> 
> Fixes: bf0beec0607db3c6 ("blk-mq: drain I/O when all CPUs in a hctx are offline")
> Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>

Reviewed-by: Daniel Wagner <dwagner@suse.de>
Bob Liu June 2, 2020, 8:01 a.m. UTC | #3
On 6/2/20 2:17 PM, Dongli Zhang wrote:
> When scheduler is set, we hit below page fault when we offline cpu.
> 
> [ 1061.007725] BUG: kernel NULL pointer dereference, address: 0000000000000040
> [ 1061.008710] #PF: supervisor read access in kernel mode
> [ 1061.009492] #PF: error_code(0x0000) - not-present page
> [ 1061.010241] PGD 0 P4D 0
> [ 1061.010614] Oops: 0000 [#1] SMP PTI
> [ 1061.011130] CPU: 0 PID: 122 Comm: kworker/0:1H Not tainted 5.7.0-rc7+ #2'
> ... ...
> [ 1061.013760] Workqueue: kblockd blk_mq_run_work_fn
> [ 1061.014446] RIP: 0010:blk_mq_put_tag+0xf/0x30
> ... ...
> [ 1061.017726] RSP: 0018:ffffa5c18037fc70 EFLAGS: 00010287
> [ 1061.018475] RAX: 0000000000000000 RBX: ffffa5c18037fcf0 RCX: 0000000000000004
> [ 1061.019507] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff911535dc1180
> ... ...
> [ 1061.028454] Call Trace:
> [ 1061.029307]  blk_mq_get_tag+0x26e/0x280
> [ 1061.029866]  ? wait_woken+0x80/0x80
> [ 1061.030378]  blk_mq_get_driver_tag+0x99/0x110
> [ 1061.031009]  blk_mq_dispatch_rq_list+0x107/0x5e0
> [ 1061.031672]  ? elv_rb_del+0x1a/0x30
> [ 1061.032178]  blk_mq_do_dispatch_sched+0xe2/0x130
> [ 1061.032844]  __blk_mq_sched_dispatch_requests+0xcc/0x150
> [ 1061.033638]  blk_mq_sched_dispatch_requests+0x2b/0x50
> [ 1061.034239]  __blk_mq_run_hw_queue+0x75/0x110
> [ 1061.034867]  process_one_work+0x15c/0x370
> [ 1061.035450]  worker_thread+0x44/0x3d0
> [ 1061.035980]  kthread+0xf3/0x130
> [ 1061.036440]  ? max_active_store+0x80/0x80
> [ 1061.037018]  ? kthread_bind+0x10/0x10
> [ 1061.037554]  ret_from_fork+0x35/0x40
> [ 1061.038073] Modules linked in:
> [ 1061.038543] CR2: 0000000000000040
> [ 1061.038962] ---[ end trace d20e1df7d028e69f ]---
> 
> This is because blk_mq_get_driver_tag() would be used to allocate tag once
> scheduler (e.g., mq-deadline) is set. However, in order to handle
> BLK_MQ_S_INACTIVE in blk_mq_get_tag(), we need to set data->ctx for
> blk_mq_put_tag().
> 
> Fixes: bf0beec0607db3c6 ("blk-mq: drain I/O when all CPUs in a hctx are offline")
> Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
> ---
> This is based on for-next because currently the pull request for v5.8 is
> not picked by mainline.
> 
>  block/blk-mq.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 9a36ac1c1fa1..8bf6c06a86c1 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -1056,6 +1056,7 @@ bool blk_mq_get_driver_tag(struct request *rq)
>  {
>  	struct blk_mq_alloc_data data = {
>  		.q = rq->q,
> +		.ctx = rq->mq_ctx,
>  		.hctx = rq->mq_hctx,
>  		.flags = BLK_MQ_REQ_NOWAIT,
>  		.cmd_flags = rq->cmd_flags,
> 

Nice catch!
Reviewed-by: Bob Liu <bob.liu@oracle.com>
John Garry June 3, 2020, 11:59 a.m. UTC | #4
On 02/06/2020 07:17, Dongli Zhang wrote:
> When scheduler is set, we hit below page fault when we offline cpu.
> 
> [ 1061.007725] BUG: kernel NULL pointer dereference, address: 0000000000000040
> [ 1061.008710] #PF: supervisor read access in kernel mode
> [ 1061.009492] #PF: error_code(0x0000) - not-present page
> [ 1061.010241] PGD 0 P4D 0
> [ 1061.010614] Oops: 0000 [#1] SMP PTI
> [ 1061.011130] CPU: 0 PID: 122 Comm: kworker/0:1H Not tainted 5.7.0-rc7+ #2'
> ... ...
> [ 1061.013760] Workqueue: kblockd blk_mq_run_work_fn
> [ 1061.014446] RIP: 0010:blk_mq_put_tag+0xf/0x30
> ... ...
> [ 1061.017726] RSP: 0018:ffffa5c18037fc70 EFLAGS: 00010287
> [ 1061.018475] RAX: 0000000000000000 RBX: ffffa5c18037fcf0 RCX: 0000000000000004
> [ 1061.019507] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff911535dc1180
> ... ...
> [ 1061.028454] Call Trace:
> [ 1061.029307]  blk_mq_get_tag+0x26e/0x280
> [ 1061.029866]  ? wait_woken+0x80/0x80
> [ 1061.030378]  blk_mq_get_driver_tag+0x99/0x110
> [ 1061.031009]  blk_mq_dispatch_rq_list+0x107/0x5e0
> [ 1061.031672]  ? elv_rb_del+0x1a/0x30
> [ 1061.032178]  blk_mq_do_dispatch_sched+0xe2/0x130
> [ 1061.032844]  __blk_mq_sched_dispatch_requests+0xcc/0x150
> [ 1061.033638]  blk_mq_sched_dispatch_requests+0x2b/0x50
> [ 1061.034239]  __blk_mq_run_hw_queue+0x75/0x110
> [ 1061.034867]  process_one_work+0x15c/0x370
> [ 1061.035450]  worker_thread+0x44/0x3d0
> [ 1061.035980]  kthread+0xf3/0x130
> [ 1061.036440]  ? max_active_store+0x80/0x80
> [ 1061.037018]  ? kthread_bind+0x10/0x10
> [ 1061.037554]  ret_from_fork+0x35/0x40
> [ 1061.038073] Modules linked in:
> [ 1061.038543] CR2: 0000000000000040
> [ 1061.038962] ---[ end trace d20e1df7d028e69f ]---
> 
> This is because blk_mq_get_driver_tag() would be used to allocate tag once
> scheduler (e.g., mq-deadline) is set. 

I tried mq-deadline and I did not see this. Anything else special or 
specific about your test?

However, I do see other issues with a scheduler set, namely SCSI 
timeouts when I start running IO and hotplugging CPUs. I should 
have tested scheduler != none previously for the "blk-mq: improvement 
CPU hotplug (simplified version) " series ...

I'll check Ming's patch "[PATCH] blk-mq: don't fail driver tag 
allocation because of inactive hctx" for that.

Thanks,
John

> However, in order to handle
> BLK_MQ_S_INACTIVE in blk_mq_get_tag(), we need to set data->ctx for
> blk_mq_put_tag().
> 
> Fixes: bf0beec0607db3c6 ("blk-mq: drain I/O when all CPUs in a hctx are offline")
> Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
> ---
> This is based on for-next because currently the pull request for v5.8 is
> not picked by mainline.
> 
>   block/blk-mq.c | 1 +
>   1 file changed, 1 insertion(+)
> 
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 9a36ac1c1fa1..8bf6c06a86c1 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -1056,6 +1056,7 @@ bool blk_mq_get_driver_tag(struct request *rq)
>   {
>   	struct blk_mq_alloc_data data = {
>   		.q = rq->q,
> +		.ctx = rq->mq_ctx,
>   		.hctx = rq->mq_hctx,
>   		.flags = BLK_MQ_REQ_NOWAIT,
>   		.cmd_flags = rq->cmd_flags,
>
Dongli Zhang June 3, 2020, 4:23 p.m. UTC | #5
Hi John,

On 6/3/20 4:59 AM, John Garry wrote:
> On 02/06/2020 07:17, Dongli Zhang wrote:
>> When scheduler is set, we hit below page fault when we offline cpu.
>>
>> [ 1061.007725] BUG: kernel NULL pointer dereference, address: 0000000000000040
>> [ 1061.008710] #PF: supervisor read access in kernel mode
>> [ 1061.009492] #PF: error_code(0x0000) - not-present page
>> [ 1061.010241] PGD 0 P4D 0
>> [ 1061.010614] Oops: 0000 [#1] SMP PTI
>> [ 1061.011130] CPU: 0 PID: 122 Comm: kworker/0:1H Not tainted 5.7.0-rc7+ #2'
>> ... ...
>> [ 1061.013760] Workqueue: kblockd blk_mq_run_work_fn
>> [ 1061.014446] RIP: 0010:blk_mq_put_tag+0xf/0x30
>> ... ...
>> [ 1061.017726] RSP: 0018:ffffa5c18037fc70 EFLAGS: 00010287
>> [ 1061.018475] RAX: 0000000000000000 RBX: ffffa5c18037fcf0 RCX: 0000000000000004
>> [ 1061.019507] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff911535dc1180
>> ... ...
>> [ 1061.028454] Call Trace:
>> [ 1061.029307]  blk_mq_get_tag+0x26e/0x280
>> [ 1061.029866]  ? wait_woken+0x80/0x80
>> [ 1061.030378]  blk_mq_get_driver_tag+0x99/0x110
>> [ 1061.031009]  blk_mq_dispatch_rq_list+0x107/0x5e0
>> [ 1061.031672]  ? elv_rb_del+0x1a/0x30
>> [ 1061.032178]  blk_mq_do_dispatch_sched+0xe2/0x130
>> [ 1061.032844]  __blk_mq_sched_dispatch_requests+0xcc/0x150
>> [ 1061.033638]  blk_mq_sched_dispatch_requests+0x2b/0x50
>> [ 1061.034239]  __blk_mq_run_hw_queue+0x75/0x110
>> [ 1061.034867]  process_one_work+0x15c/0x370
>> [ 1061.035450]  worker_thread+0x44/0x3d0
>> [ 1061.035980]  kthread+0xf3/0x130
>> [ 1061.036440]  ? max_active_store+0x80/0x80
>> [ 1061.037018]  ? kthread_bind+0x10/0x10
>> [ 1061.037554]  ret_from_fork+0x35/0x40
>> [ 1061.038073] Modules linked in:
>> [ 1061.038543] CR2: 0000000000000040
>> [ 1061.038962] ---[ end trace d20e1df7d028e69f ]---
>>
>> This is because blk_mq_get_driver_tag() would be used to allocate tag once
>> scheduler (e.g., mq-deadline) is set. 
> 
> I tried mq-deadline and I did not see this. Anything else special or specific
> about your test?
> 

I think you just hit the issue mentioned in another thread.

To reproduce this issue, you need to hit the following condition:

1. blk_mq_hctx_notify_offline() sets BLK_MQ_S_INACTIVE with the barrier ...

... while ...

2. blk_mq_get_tag() gets the tag but BLK_MQ_S_INACTIVE is already set.
Therefore, it puts the tag to release it, and that put needs data->ctx.

Dongli Zhang
Patch

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 9a36ac1c1fa1..8bf6c06a86c1 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1056,6 +1056,7 @@  bool blk_mq_get_driver_tag(struct request *rq)
 {
 	struct blk_mq_alloc_data data = {
 		.q = rq->q,
+		.ctx = rq->mq_ctx,
 		.hctx = rq->mq_hctx,
 		.flags = BLK_MQ_REQ_NOWAIT,
 		.cmd_flags = rq->cmd_flags,
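
For readers who want to see why the one-line change matters, the failure mode described in the thread can be modelled in userspace. This is only a sketch under stated assumptions, not the kernel code: every model_* name below is hypothetical, the structures are heavily simplified stand-ins, and the model reports the NULL context instead of dereferencing it. It relies on one C rule: members omitted from a designated initializer are zero-initialized, so a struct built without .ctx carries a NULL ctx into the release path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, simplified stand-ins for the blk-mq structures. */
struct model_ctx {
	unsigned int cpu;
};

struct model_hctx {
	bool inactive;            /* models BLK_MQ_S_INACTIVE being set */
	unsigned int tags_in_use; /* models the tag bitmap as a counter */
};

struct model_alloc_data {
	struct model_ctx *ctx;
	struct model_hctx *hctx;
};

enum model_result { MODEL_GOT_TAG, MODEL_INACTIVE, MODEL_WOULD_OOPS };

/*
 * Releasing a tag needs the software context. The model returns false
 * where real code would dereference a NULL pointer.
 */
static bool model_put_tag(struct model_hctx *hctx, struct model_ctx *ctx)
{
	if (ctx == NULL)
		return false;
	hctx->tags_in_use--;
	return true;
}

/* Take a tag, then give it back if the hctx turned inactive meanwhile. */
static enum model_result model_get_tag(struct model_alloc_data *data)
{
	data->hctx->tags_in_use++;
	if (data->hctx->inactive) {
		if (!model_put_tag(data->hctx, data->ctx))
			return MODEL_WOULD_OOPS;
		return MODEL_INACTIVE;
	}
	return MODEL_GOT_TAG;
}

/*
 * set_ctx == false models the code before the patch: .ctx is omitted
 * from the initializer and is therefore NULL.
 */
static enum model_result model_get_driver_tag(struct model_hctx *hctx,
					      struct model_ctx *ctx,
					      bool set_ctx)
{
	struct model_alloc_data data = {
		.hctx = hctx,
	};

	if (set_ctx)
		data.ctx = ctx;
	return model_get_tag(&data);
}
```

With the hctx marked inactive, calling model_get_driver_tag() with set_ctx false reaches the release path with a NULL ctx, which corresponds to the oops in the report. With set_ctx true, the tag is handed back cleanly and the caller only sees that the hctx is inactive.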