diff mbox

[4/7] blk-mq: Avoid that request processing stalls when sharing tags

Message ID 20171201000848.2656-5-bart.vanassche@wdc.com (mailing list archive)
State New, archived
Headers show

Commit Message

Bart Van Assche Dec. 1, 2017, 12:08 a.m. UTC
blk_mq_sched_mark_restart_hctx() must be called before
blk_mq_dispatch_rq_list() is called. Make sure that
BLK_MQ_S_SCHED_RESTART is set before any blk_mq_dispatch_rq_list()
call occurs.

Fixes: commit b347689ffbca ("blk-mq-sched: improve dispatching from sw queue")
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Omar Sandoval <osandov@fb.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
---
 block/blk-mq-sched.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Ming Lei Dec. 1, 2017, 2:58 a.m. UTC | #1
On Thu, Nov 30, 2017 at 04:08:45PM -0800, Bart Van Assche wrote:
> blk_mq_sched_mark_restart_hctx() must be called before

Could you please describe the theory on commit log? Like, why is it
a must? and what is the issue to be fixed? 

> blk_mq_dispatch_rq_list() is called. Make sure that
> BLK_MQ_S_SCHED_RESTART is set before any blk_mq_dispatch_rq_list()
> call occurs.
> 
> Fixes: commit b347689ffbca ("blk-mq-sched: improve dispatching from sw queue")

We always mark RESTART state bit just before dispatching from ->dispatch_list,
this way has been there before b347689ffbca, which doesn't change this
RESTART mechanism, so please explain a bit why it is a fix on commit
b347689ffbca.

> Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
> Cc: Ming Lei <ming.lei@redhat.com>
> Cc: Omar Sandoval <osandov@fb.com>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Hannes Reinecke <hare@suse.de>
> Cc: Johannes Thumshirn <jthumshirn@suse.de>
> ---
>  block/blk-mq-sched.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
> index d8e3533d3218..c4e0cb5f6f1f 100644
> --- a/block/blk-mq-sched.c
> +++ b/block/blk-mq-sched.c
> @@ -208,8 +208,8 @@ void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
>  	 * on the dispatch list or we were able to dispatch from the
>  	 * dispatch list.
>  	 */
> +	blk_mq_sched_mark_restart_hctx(hctx);

This looks over-kill.

It means RESTART has to be done from request's completion after each new dispatch.

>  	if (!list_empty(&rq_list)) {
> -		blk_mq_sched_mark_restart_hctx(hctx);
>  		if (blk_mq_dispatch_rq_list(q, &rq_list, false)) {
>  			if (has_sched_dispatch)
>  				blk_mq_do_dispatch_sched(hctx);
> -- 
> 2.15.0
>
Bart Van Assche Dec. 1, 2017, 7:52 p.m. UTC | #2
On Fri, 2017-12-01 at 10:58 +0800, Ming Lei wrote:
> On Thu, Nov 30, 2017 at 04:08:45PM -0800, Bart Van Assche wrote:

> > blk_mq_sched_mark_restart_hctx() must be called before

> 

> Could you please describe the theory on commit log? Like, why is it

> a must? and what is the issue to be fixed? 


The BLK_MQ_S_SCHED_RESTART test at the end of blk_mq_dispatch_rq_list() can
only work if BLK_MQ_S_SCHED_RESTART is set before blk_mq_dispatch_rq_list()
is called. BTW, without this patch every iteration of my test triggers a
queue stall. With this patch a queue stall only occurs sporadically so I
think we really need something like this patch.

> > blk_mq_dispatch_rq_list() is called. Make sure that

> > BLK_MQ_S_SCHED_RESTART is set before any blk_mq_dispatch_rq_list()

> > call occurs.

> > 

> > Fixes: commit b347689ffbca ("blk-mq-sched: improve dispatching from sw queue")

> 

> We always mark RESTART state bit just before dispatching from ->dispatch_list,

> this way has been there before b347689ffbca, which doesn't change this

> RESTART mechanism, so please explain a bit why it is a fix on commit

> b347689ffbca.


I'm not completely sure which patch introduced the lockup fixed by this patch
but I will have another look whether this was really introduced by commit
b347689ffbca.

Bart.
Ming Lei Dec. 2, 2017, 12:36 a.m. UTC | #3
On Fri, Dec 01, 2017 at 07:52:14PM +0000, Bart Van Assche wrote:
> On Fri, 2017-12-01 at 10:58 +0800, Ming Lei wrote:
> > On Thu, Nov 30, 2017 at 04:08:45PM -0800, Bart Van Assche wrote:
> > > blk_mq_sched_mark_restart_hctx() must be called before
> > 
> > Could you please describe the theory on commit log? Like, why is it
> > a must? and what is the issue to be fixed? 
> 
> The BLK_MQ_S_SCHED_RESTART test at the end of blk_mq_dispatch_rq_list() can
> only work if BLK_MQ_S_SCHED_RESTART is set before blk_mq_dispatch_rq_list()
> is called.

The theory about using BLK_MQ_S_SCHED_RESTART in current way is that we
mark it after requests are added to hctx->dispatch, then blk_mq_sched_restart() 
can see this request to be revisited.

So in theory, we don't need to set it before each dispatch.

Once .get_budget()/.put_budget() is introduced, things may be a bit
different because we may need to revisit requests in scheduler/SW queue.
But we depend on SCSI's RESTART(scsi_end_request()) to do that. So we
still don't need this patch.

> BTW, without this patch every iteration of my test triggers a
> queue stall. With this patch a queue stall only occurs sporadically so I
> think we really need something like this patch.

We need to root cause your queue stall first, otherwise any change can
be thought as workaround. Could you investigate the issue a bit and get
the exact reason?

> 
> > > blk_mq_dispatch_rq_list() is called. Make sure that
> > > BLK_MQ_S_SCHED_RESTART is set before any blk_mq_dispatch_rq_list()
> > > call occurs.
> > > 
> > > Fixes: commit b347689ffbca ("blk-mq-sched: improve dispatching from sw queue")
> > 
> > We always mark RESTART state bit just before dispatching from ->dispatch_list,
> > this way has been there before b347689ffbca, which doesn't change this
> > RESTART mechanism, so please explain a bit why it is a fix on commit
> > b347689ffbca.
> 
> I'm not completely sure which patch introduced the lockup fixed by this patch
> but I will have another look whether this was really introduced by commit
> b347689ffbca.

Please make sure 'Fixes' tag correct.
Bart Van Assche Dec. 2, 2017, 12:48 a.m. UTC | #4
On Sat, 2017-12-02 at 08:36 +0800, Ming Lei wrote:
> On Fri, Dec 01, 2017 at 07:52:14PM +0000, Bart Van Assche wrote:

> > On Fri, 2017-12-01 at 10:58 +0800, Ming Lei wrote:

> > > On Thu, Nov 30, 2017 at 04:08:45PM -0800, Bart Van Assche wrote:

> > > > blk_mq_dispatch_rq_list() is called. Make sure that

> > > > BLK_MQ_S_SCHED_RESTART is set before any blk_mq_dispatch_rq_list()

> > > > call occurs.

> > > > 

> > > > Fixes: commit b347689ffbca ("blk-mq-sched: improve dispatching from sw queue")

> > > 

> > > We always mark RESTART state bit just before dispatching from ->dispatch_list,

> > > this way has been there before b347689ffbca, which doesn't change this

> > > RESTART mechanism, so please explain a bit why it is a fix on commit

> > > b347689ffbca.

> > 

> > I'm not completely sure which patch introduced the lockup fixed by this patch

> > but I will have another look whether this was really introduced by commit

> > b347689ffbca.

> 

> Please make sure 'Fixes' tag correct.


Further tests have shown that the lockup I referred to does not occur before commit
b347689ffbca but that it occurs with b347689ffbca. I think that shows clearly that
commit b347689ffbca introduced this lockup.

Bart.
Ming Lei Dec. 2, 2017, 1 a.m. UTC | #5
On Sat, Dec 02, 2017 at 12:48:51AM +0000, Bart Van Assche wrote:
> On Sat, 2017-12-02 at 08:36 +0800, Ming Lei wrote:
> > On Fri, Dec 01, 2017 at 07:52:14PM +0000, Bart Van Assche wrote:
> > > On Fri, 2017-12-01 at 10:58 +0800, Ming Lei wrote:
> > > > On Thu, Nov 30, 2017 at 04:08:45PM -0800, Bart Van Assche wrote:
> > > > > blk_mq_dispatch_rq_list() is called. Make sure that
> > > > > BLK_MQ_S_SCHED_RESTART is set before any blk_mq_dispatch_rq_list()
> > > > > call occurs.
> > > > > 
> > > > > Fixes: commit b347689ffbca ("blk-mq-sched: improve dispatching from sw queue")
> > > > 
> > > > We always mark RESTART state bit just before dispatching from ->dispatch_list,
> > > > this way has been there before b347689ffbca, which doesn't change this
> > > > RESTART mechanism, so please explain a bit why it is a fix on commit
> > > > b347689ffbca.
> > > 
> > > I'm not completely sure which patch introduced the lockup fixed by this patch
> > > but I will have another look whether this was really introduced by commit
> > > b347689ffbca.
> > 
> > Please make sure 'Fixes' tag correct.
> 
> Further tests have shown that the lockup I referred to does not occur before commit
> b347689ffbca but that it occurs with b347689ffbca.

Then you need to root cause it, or

Provide debugfs log and reproduction steps, please.

> I think that shows clearly that commit b347689ffbca introduced this lockup.

That may not be so clearly, maybe it is just triggered easily after this
commit.
Bart Van Assche Dec. 2, 2017, 1:05 a.m. UTC | #6
On Sat, 2017-12-02 at 09:00 +0800, Ming Lei wrote:
> On Sat, Dec 02, 2017 at 12:48:51AM +0000, Bart Van Assche wrote:

> > Further tests have shown that the lockup I referred to does not occur before commit

> > b347689ffbca but that it occurs with b347689ffbca.

> 

> Then you need to root cause it, or


It's not my job to root cause bugs introduced by your patches.

> Provide debugfs log and reproduction steps, please.


How I reproduce this bug is something I have already mentioned many times in
previous e-mails.

Bart.
Ming Lei Dec. 2, 2017, 1:14 a.m. UTC | #7
On Sat, Dec 02, 2017 at 01:05:05AM +0000, Bart Van Assche wrote:
> On Sat, 2017-12-02 at 09:00 +0800, Ming Lei wrote:
> > On Sat, Dec 02, 2017 at 12:48:51AM +0000, Bart Van Assche wrote:
> > > Further tests have shown that the lockup I referred to does not occur before commit
> > > b347689ffbca but that it occurs with b347689ffbca.
> > 
> > Then you need to root cause it, or
> 
> It's not my job to root cause bugs introduced by your patches.
> 
> > Provide debugfs log and reproduction steps, please.
> 
> How I reproduce this bug is something I have already mentioned many times in
> previous e-mails.

Please provide it in this commit log or this thread explicitly, otherwise who
knows what is that.

If that is something only you have the environment for the reproduction,
please provide the debugfs log, otherwise don't expect community can help
you much.
Ming Lei Dec. 2, 2017, 1:20 a.m. UTC | #8
On Sat, Dec 02, 2017 at 01:05:05AM +0000, Bart Van Assche wrote:
> On Sat, 2017-12-02 at 09:00 +0800, Ming Lei wrote:
> > On Sat, Dec 02, 2017 at 12:48:51AM +0000, Bart Van Assche wrote:
> > > Further tests have shown that the lockup I referred to does not occur before commit
> > > b347689ffbca but that it occurs with b347689ffbca.
> > 
> > Then you need to root cause it, or
> 
> It's not my job to root cause bugs introduced by your patches.

Then show us the debugfs and dmesg log, otherwise how can we make
progress since you are the only reproducer?

Also it is you who posts the patch, you have to provide exact root
cause.
diff mbox

Patch

diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index d8e3533d3218..c4e0cb5f6f1f 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -208,8 +208,8 @@  void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
 	 * on the dispatch list or we were able to dispatch from the
 	 * dispatch list.
 	 */
+	blk_mq_sched_mark_restart_hctx(hctx);
 	if (!list_empty(&rq_list)) {
-		blk_mq_sched_mark_restart_hctx(hctx);
 		if (blk_mq_dispatch_rq_list(q, &rq_list, false)) {
 			if (has_sched_dispatch)
 				blk_mq_do_dispatch_sched(hctx);