Message ID | 20171201000848.2656-5-bart.vanassche@wdc.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Thu, Nov 30, 2017 at 04:08:45PM -0800, Bart Van Assche wrote: > blk_mq_sched_mark_restart_hctx() must be called before Could you please describe the theory on commit log? Like, why is it a must? and what is the issue to be fixed? > blk_mq_dispatch_rq_list() is called. Make sure that > BLK_MQ_S_SCHED_RESTART is set before any blk_mq_dispatch_rq_list() > call occurs. > > Fixes: commit b347689ffbca ("blk-mq-sched: improve dispatching from sw queue") We always mark RESTART state bit just before dispatching from ->dispatch_list, this way has been there before b347689ffbca, which doesn't change this RESTART mechanism, so please explain a bit why it is a fix on commit b347689ffbca. > Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> > Cc: Ming Lei <ming.lei@redhat.com> > Cc: Omar Sandoval <osandov@fb.com> > Cc: Christoph Hellwig <hch@lst.de> > Cc: Hannes Reinecke <hare@suse.de> > Cc: Johannes Thumshirn <jthumshirn@suse.de> > --- > block/blk-mq-sched.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c > index d8e3533d3218..c4e0cb5f6f1f 100644 > --- a/block/blk-mq-sched.c > +++ b/block/blk-mq-sched.c > @@ -208,8 +208,8 @@ void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx) > * on the dispatch list or we were able to dispatch from the > * dispatch list. > */ > + blk_mq_sched_mark_restart_hctx(hctx); This looks over-kill. It means RESTART has to be done from request's completion after each new dispatch. > if (!list_empty(&rq_list)) { > - blk_mq_sched_mark_restart_hctx(hctx); > if (blk_mq_dispatch_rq_list(q, &rq_list, false)) { > if (has_sched_dispatch) > blk_mq_do_dispatch_sched(hctx); > -- > 2.15.0 >
On Fri, 2017-12-01 at 10:58 +0800, Ming Lei wrote: > On Thu, Nov 30, 2017 at 04:08:45PM -0800, Bart Van Assche wrote: > > blk_mq_sched_mark_restart_hctx() must be called before > > Could you please describe the theory on commit log? Like, why is it > a must? and what is the issue to be fixed? The BLK_MQ_S_SCHED_RESTART test at the end of blk_mq_dispatch_rq_list() can only work if BLK_MQ_S_SCHED_RESTART is set before blk_mq_dispatch_rq_list() is called. BTW, without this patch every iteration of my test triggers a queue stall. With this patch a queue stall only occurs sporadically so I think we really need something like this patch. > > blk_mq_dispatch_rq_list() is called. Make sure that > > BLK_MQ_S_SCHED_RESTART is set before any blk_mq_dispatch_rq_list() > > call occurs. > > > > Fixes: commit b347689ffbca ("blk-mq-sched: improve dispatching from sw queue") > > We always mark RESTART state bit just before dispatching from ->dispatch_list, > this way has been there before b347689ffbca, which doesn't change this > RESTART mechanism, so please explain a bit why it is a fix on commit > b347689ffbca. I'm not completely sure which patch introduced the lockup fixed by this patch but I will have another look whether this was really introduced by commit b347689ffbca. Bart.
On Fri, Dec 01, 2017 at 07:52:14PM +0000, Bart Van Assche wrote: > On Fri, 2017-12-01 at 10:58 +0800, Ming Lei wrote: > > On Thu, Nov 30, 2017 at 04:08:45PM -0800, Bart Van Assche wrote: > > > blk_mq_sched_mark_restart_hctx() must be called before > > > > Could you please describe the theory on commit log? Like, why is it > > a must? and what is the issue to be fixed? > > The BLK_MQ_S_SCHED_RESTART test at the end of blk_mq_dispatch_rq_list() can > only work if BLK_MQ_S_SCHED_RESTART is set before blk_mq_dispatch_rq_list() > is called. The theory about using BLK_MQ_S_SCHED_RESTART in current way is that we mark it after requests are added to hctx->dispatch, then blk_mq_sched_restart() can see this request to be revisited. So in theory, we don't need to set it before each dispatch. Once .get_budget()/.put_budget() is introduced, things may be a bit different because we may need to revisit requests in scheduler/SW queue. But we depend on SCSI's RESTART(scsi_end_request()) to do that. So we still don't need this patch. > BTW, without this patch every iteration of my test triggers a > queue stall. With this patch a queue stall only occurs sporadically so I > think we really need something like this patch. We need to root cause your queue stall first, otherwise any change can be thought as workaround. Could you investigate the issue a bit and get the exact reason? > > > > blk_mq_dispatch_rq_list() is called. Make sure that > > > BLK_MQ_S_SCHED_RESTART is set before any blk_mq_dispatch_rq_list() > > > call occurs. > > > > > > Fixes: commit b347689ffbca ("blk-mq-sched: improve dispatching from sw queue") > > > > We always mark RESTART state bit just before dispatching from ->dispatch_list, > > this way has been there before b347689ffbca, which doesn't change this > > RESTART mechanism, so please explain a bit why it is a fix on commit > > b347689ffbca. > > I'm not completely sure which patch introduced the lockup fixed by this patch > but I will have another look whether this was really introduced by commit > b347689ffbca. Please make sure 'Fixes' tag correct.
On Sat, 2017-12-02 at 08:36 +0800, Ming Lei wrote: > On Fri, Dec 01, 2017 at 07:52:14PM +0000, Bart Van Assche wrote: > > On Fri, 2017-12-01 at 10:58 +0800, Ming Lei wrote: > > > On Thu, Nov 30, 2017 at 04:08:45PM -0800, Bart Van Assche wrote: > > > > blk_mq_dispatch_rq_list() is called. Make sure that > > > > BLK_MQ_S_SCHED_RESTART is set before any blk_mq_dispatch_rq_list() > > > > call occurs. > > > > > > > > Fixes: commit b347689ffbca ("blk-mq-sched: improve dispatching from sw queue") > > > > > > We always mark RESTART state bit just before dispatching from ->dispatch_list, > > > this way has been there before b347689ffbca, which doesn't change this > > > RESTART mechanism, so please explain a bit why it is a fix on commit > > > b347689ffbca. > > > > I'm not completely sure which patch introduced the lockup fixed by this patch > > but I will have another look whether this was really introduced by commit > > b347689ffbca. > > Please make sure 'Fixes' tag correct. Further tests have shown that the lockup I referred to does not occur before commit b347689ffbca but that it occurs with b347689ffbca. I think that shows clearly that commit b347689ffbca introduced this lockup. Bart.
On Sat, Dec 02, 2017 at 12:48:51AM +0000, Bart Van Assche wrote: > On Sat, 2017-12-02 at 08:36 +0800, Ming Lei wrote: > > On Fri, Dec 01, 2017 at 07:52:14PM +0000, Bart Van Assche wrote: > > > On Fri, 2017-12-01 at 10:58 +0800, Ming Lei wrote: > > > > On Thu, Nov 30, 2017 at 04:08:45PM -0800, Bart Van Assche wrote: > > > > > blk_mq_dispatch_rq_list() is called. Make sure that > > > > > BLK_MQ_S_SCHED_RESTART is set before any blk_mq_dispatch_rq_list() > > > > > call occurs. > > > > > > > > > > Fixes: commit b347689ffbca ("blk-mq-sched: improve dispatching from sw queue") > > > > > > > > We always mark RESTART state bit just before dispatching from ->dispatch_list, > > > > this way has been there before b347689ffbca, which doesn't change this > > > > RESTART mechanism, so please explain a bit why it is a fix on commit > > > > b347689ffbca. > > > > > > I'm not completely sure which patch introduced the lockup fixed by this patch > > > but I will have another look whether this was really introduced by commit > > > b347689ffbca. > > > > Please make sure 'Fixes' tag correct. > > Further tests have shown that the lockup I referred to does not occur before commit > b347689ffbca but that it occurs with b347689ffbca. Then you need to root cause it, or Provide debugfs log and reproduction steps, please. > I think that shows clearly that commit b347689ffbca introduced this lockup. That may not be so clearly, maybe it is just triggered easily after this commit.
On Sat, 2017-12-02 at 09:00 +0800, Ming Lei wrote: > On Sat, Dec 02, 2017 at 12:48:51AM +0000, Bart Van Assche wrote: > > Further tests have shown that the lockup I referred to does not occur before commit > > b347689ffbca but that it occurs with b347689ffbca. > > Then you need to root cause it, or It's not my job to root cause bugs introduced by your patches. > Provide debugfs log and reproduction steps, please. How I reproduce this bug is something I have already mentioned many times in previous e-mails. Bart.
On Sat, Dec 02, 2017 at 01:05:05AM +0000, Bart Van Assche wrote: > On Sat, 2017-12-02 at 09:00 +0800, Ming Lei wrote: > > On Sat, Dec 02, 2017 at 12:48:51AM +0000, Bart Van Assche wrote: > > > Further tests have shown that the lockup I referred to does not occur before commit > > > b347689ffbca but that it occurs with b347689ffbca. > > > > Then you need to root cause it, or > > It's not my job to root cause bugs introduced by your patches. > > > Provide debugfs log and reproduction steps, please. > > How I reproduce this bug is something I have already mentioned many times in > previous e-mails. Please provide it in this commit log or this thread explicitly, otherwise who knows what is that. If that is something only you have the environment for the reproduction, please provide the debugfs log, otherwise don't expect community can help you much.
On Sat, Dec 02, 2017 at 01:05:05AM +0000, Bart Van Assche wrote: > On Sat, 2017-12-02 at 09:00 +0800, Ming Lei wrote: > > On Sat, Dec 02, 2017 at 12:48:51AM +0000, Bart Van Assche wrote: > > > Further tests have shown that the lockup I referred to does not occur before commit > > > b347689ffbca but that it occurs with b347689ffbca. > > > > Then you need to root cause it, or > > It's not my job to root cause bugs introduced by your patches. Then show us the debugfs and dmesg log, otherwise how can we make progress since you are the only reproducer? Also it is you who posts the patch, you have to provide exact root cause.
diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c index d8e3533d3218..c4e0cb5f6f1f 100644 --- a/block/blk-mq-sched.c +++ b/block/blk-mq-sched.c @@ -208,8 +208,8 @@ void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx) * on the dispatch list or we were able to dispatch from the * dispatch list. */ + blk_mq_sched_mark_restart_hctx(hctx); if (!list_empty(&rq_list)) { - blk_mq_sched_mark_restart_hctx(hctx); if (blk_mq_dispatch_rq_list(q, &rq_list, false)) { if (has_sched_dispatch) blk_mq_do_dispatch_sched(hctx);
blk_mq_sched_mark_restart_hctx() must be called before blk_mq_dispatch_rq_list() is called. Make sure that BLK_MQ_S_SCHED_RESTART is set before any blk_mq_dispatch_rq_list() call occurs. Fixes: commit b347689ffbca ("blk-mq-sched: improve dispatching from sw queue") Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Cc: Ming Lei <ming.lei@redhat.com> Cc: Omar Sandoval <osandov@fb.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Hannes Reinecke <hare@suse.de> Cc: Johannes Thumshirn <jthumshirn@suse.de> --- block/blk-mq-sched.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)