Message ID | 1512490099.2660.6.camel@sandisk.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Tue, Dec 05, 2017 at 04:08:20PM +0000, Bart Van Assche wrote: > On Tue, 2017-12-05 at 15:52 +0800, Ming Lei wrote: > > diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c > > index db9556662e27..1816dd8259b3 100644 > > --- a/drivers/scsi/scsi_lib.c > > +++ b/drivers/scsi/scsi_lib.c > > @@ -1967,6 +1967,8 @@ static bool scsi_mq_get_budget(struct blk_mq_hw_ctx *hctx) > > out_put_device: > > put_device(&sdev->sdev_gendev); > > out: > > + if (atomic_read(&sdev->device_busy) == 0 && !scsi_device_blocked(sdev)) > > + blk_mq_delay_run_hw_queue(hctx, SCSI_QUEUE_DELAY); > > return false; > > } > > This cannot work since multiple threads can call scsi_mq_get_budget() That is exactly the way we are handling these cases before 0df21c86bdbf(scsi: implement .get_budget and .put_budget for blk-mq), so if it can't work, that is not fault of commit 0df21c86bdbf. > concurrently and hence it can happen that none of them sees > atomic_read(&sdev->device_busy) == 0. BTW, the above patch is something I If there is concurrent .get_budget(), one of them will see the counter becoming zero finally because each sdev->device_busy is inc/dec atomically. Or scsi_dev_queue_ready() return true. Anyway, we need this patch to avoid possible regression. If you think there are bugs in blk-mq RESTART, just root cause and and fix it. > consider as a workaround. Have you considered to fix the root cause and to > add blk_mq_sched_mark_restart_hctx() where these are missing, e.g. in > blk_mq_sched_dispatch_requests()? The patch below is not a full solution > but resulted in a significant improvement in my tests: > > diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c > index 69e3226e66ca..9d86876ec503 100644 > --- a/block/blk-mq-sched.c > +++ b/block/blk-mq-sched.c > @@ -226,6 +226,7 @@ void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx) > * TODO: get more budgets, and dequeue more requests in > * one time. > */ > + blk_mq_sched_mark_restart_hctx(hctx); > blk_mq_do_dispatch_ctx(hctx); > } else { > blk_mq_flush_busy_ctxs(hctx, &rq_list); This is still a workaround for RESTART, see my comment before: https://marc.info/?l=linux-block&m=151217500929341&w=2
On Wed, 2017-12-06 at 00:28 +0800, Ming Lei wrote: > This is still a workaround for RESTART, see my comment before: > > https://marc.info/?l=linux-block&m=151217500929341&w=2 A quote from that e-mail: "The theory about using BLK_MQ_S_SCHED_RESTART in current way is that we mark it after requests are added to hctx->dispatch". Reading that makes me wonder whether you understand the purpose of the BLK_MQ_S_SCHED_RESTART flag? That flag is not set after requests are added to the dispatch list but after requests have been *removed*. The purpose of that flag is to detect whether another thread has run the queue after requests were removed from the dispatch list and before these were readded. If so, the queue needs to be rerun. Bart.
On Tue, Dec 05, 2017 at 04:41:46PM +0000, Bart Van Assche wrote: > On Wed, 2017-12-06 at 00:28 +0800, Ming Lei wrote: > > This is still a workaround for RESTART, see my comment before: > > > > https://marc.info/?l=linux-block&m=151217500929341&w=2 > > A quote from that e-mail: "The theory about using BLK_MQ_S_SCHED_RESTART in > current way is that we mark it after requests are added to hctx->dispatch". > Reading that makes me wonder whether you understand the purpose of the > BLK_MQ_S_SCHED_RESTART flag? That flag is not set after requests are added > to the dispatch list but after requests have been *removed*. The purpose of > that flag is to detect whether another thread has run the queue after > requests were removed from the dispatch list and before these were readded. > If so, the queue needs to be rerun. If you want to discuss that, please reply on that thread of '[PATCH 4/7] blk-mq: Avoid that request processing sta', I will reply on you there too.
On Wed, Dec 06, 2017 at 12:28:25AM +0800, Ming Lei wrote: > On Tue, Dec 05, 2017 at 04:08:20PM +0000, Bart Van Assche wrote: > > On Tue, 2017-12-05 at 15:52 +0800, Ming Lei wrote: > > > diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c > > > index db9556662e27..1816dd8259b3 100644 > > > --- a/drivers/scsi/scsi_lib.c > > > +++ b/drivers/scsi/scsi_lib.c > > > @@ -1967,6 +1967,8 @@ static bool scsi_mq_get_budget(struct blk_mq_hw_ctx *hctx) > > > out_put_device: > > > put_device(&sdev->sdev_gendev); > > > out: > > > + if (atomic_read(&sdev->device_busy) == 0 && !scsi_device_blocked(sdev)) > > > + blk_mq_delay_run_hw_queue(hctx, SCSI_QUEUE_DELAY); > > > return false; > > > } > > > > This cannot work since multiple threads can call scsi_mq_get_budget() > > That is exactly the way we are handling these cases before 0df21c86bdbf(scsi: > implement .get_budget and .put_budget for blk-mq), so if it can't work, > that is not fault of commit 0df21c86bdbf. > > > concurrently and hence it can happen that none of them sees > > atomic_read(&sdev->device_busy) == 0. BTW, the above patch is something I > > If there is concurrent .get_budget(), one of them will see the counter > becoming zero finally because each sdev->device_busy is inc/dec > atomically. Or scsi_dev_queue_ready() return true. > > Anyway, we need this patch to avoid possible regression. If you think > there are bugs in blk-mq RESTART, just root cause and and fix it. > > > consider as a workaround. Have you considered to fix the root cause and to > > add blk_mq_sched_mark_restart_hctx() where these are missing, e.g. in > > blk_mq_sched_dispatch_requests()? The patch below is not a full solution > > but resulted in a significant improvement in my tests: > > > > diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c > > index 69e3226e66ca..9d86876ec503 100644 > > --- a/block/blk-mq-sched.c > > +++ b/block/blk-mq-sched.c > > @@ -226,6 +226,7 @@ void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx) > > * TODO: get more budgets, and dequeue more requests in > > * one time. > > */ > > + blk_mq_sched_mark_restart_hctx(hctx); > > blk_mq_do_dispatch_ctx(hctx); > > } else { > > blk_mq_flush_busy_ctxs(hctx, &rq_list); > BTW, this kind of change can't cover scsi_set_blocked() which is triggered by timeout, scsi dispatch failure. You will see that easily if you run the SCSI test script I provided in the commit log.
On Wed, 2017-12-06 at 09:52 +0800, Ming Lei wrote: > On Wed, Dec 06, 2017 at 12:28:25AM +0800, Ming Lei wrote: > > On Tue, Dec 05, 2017 at 04:08:20PM +0000, Bart Van Assche wrote: > > > The patch below is not a full solution but resulted in a significant > > > improvement in my tests: > > > > > > diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c > > > index 69e3226e66ca..9d86876ec503 100644 > > > --- a/block/blk-mq-sched.c > > > +++ b/block/blk-mq-sched.c > > > @@ -226,6 +226,7 @@ void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx) > > > * TODO: get more budgets, and dequeue more requests in > > > * one time. > > > */ > > > + blk_mq_sched_mark_restart_hctx(hctx); > > > blk_mq_do_dispatch_ctx(hctx); > > > } else { > > > blk_mq_flush_busy_ctxs(hctx, &rq_list); > > BTW, this kind of change can't cover scsi_set_blocked() which is > triggered by timeout, scsi dispatch failure. You will see that > easily if you run the SCSI test script I provided in the commit log. Hello Ming, I am aware that the above change does not cover all cases. That's why I wrote in my previous e-mail that that patch is not a full solution. The reason I posted that change anyway is because I prefer a solution that is not based on delayed queue runs over a solution that is based on delayed queue runs (blk_mq_delay_run_hw_queue()). My concern is that performance of a solution based on delayed queue runs will be suboptimal. Bart.
On Wed, Dec 06, 2017 at 04:07:17PM +0000, Bart Van Assche wrote: > On Wed, 2017-12-06 at 09:52 +0800, Ming Lei wrote: > > On Wed, Dec 06, 2017 at 12:28:25AM +0800, Ming Lei wrote: > > > On Tue, Dec 05, 2017 at 04:08:20PM +0000, Bart Van Assche wrote: > > > > The patch below is not a full solution but resulted in a significant > > > > improvement in my tests: > > > > > > > > diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c > > > > index 69e3226e66ca..9d86876ec503 100644 > > > > --- a/block/blk-mq-sched.c > > > > +++ b/block/blk-mq-sched.c > > > > @@ -226,6 +226,7 @@ void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx) > > > > * TODO: get more budgets, and dequeue more requests in > > > > * one time. > > > > */ > > > > + blk_mq_sched_mark_restart_hctx(hctx); > > > > blk_mq_do_dispatch_ctx(hctx); > > > > } else { > > > > blk_mq_flush_busy_ctxs(hctx, &rq_list); > > > > BTW, this kind of change can't cover scsi_set_blocked() which is > > triggered by timeout, scsi dispatch failure. You will see that > > easily if you run the SCSI test script I provided in the commit log. > > Hello Ming, > > I am aware that the above change does not cover all cases. That's why I wrote > in my previous e-mail that that patch is not a full solution. The reason I > posted that change anyway is because I prefer a solution that is not based on > delayed queue runs over a solution that is based on delayed queue runs > (blk_mq_delay_run_hw_queue()). My concern is that performance of a solution > based on delayed queue runs will be suboptimal. Hi, The patch I posted won't cause any performance regression because it is only triggered when queue is becoming idle, also that is exact the way for us to deal with these cases before. But if you always call blk_mq_sched_mark_restart_hctx() before a new dispatch, that may affect performance on NVMe which may never trigger BLK_STS_RESOURCE.
On Wed, 2017-12-06 at 00:28 +0800, Ming Lei wrote: > On Tue, Dec 05, 2017 at 04:08:20PM +0000, Bart Van Assche wrote: > > On Tue, 2017-12-05 at 15:52 +0800, Ming Lei wrote: > > > diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c > > > index db9556662e27..1816dd8259b3 100644 > > > --- a/drivers/scsi/scsi_lib.c > > > +++ b/drivers/scsi/scsi_lib.c > > > @@ -1967,6 +1967,8 @@ static bool scsi_mq_get_budget(struct blk_mq_hw_ctx *hctx) > > > out_put_device: > > > put_device(&sdev->sdev_gendev); > > > out: > > > + if (atomic_read(&sdev->device_busy) == 0 && !scsi_device_blocked(sdev)) > > > + blk_mq_delay_run_hw_queue(hctx, SCSI_QUEUE_DELAY); > > > return false; > > > } > > > > This cannot work since multiple threads can call scsi_mq_get_budget() > > That is exactly the way we are handling these cases before 0df21c86bdbf(scsi: > implement .get_budget and .put_budget for blk-mq), so if it can't work, > that is not fault of commit 0df21c86bdbf. > > > concurrently and hence it can happen that none of them sees > > atomic_read(&sdev->device_busy) == 0. BTW, the above patch is something I > > If there is concurrent .get_budget(), one of them will see the counter > becoming zero finally because each sdev->device_busy is inc/dec > atomically. Or scsi_dev_queue_ready() return true. > > Anyway, we need this patch to avoid possible regression. If you think > there are bugs in blk-mq RESTART, just root cause and and fix it. Hello Ming, When I looked at the patch at the start of this thread for the first time I got frustrated because I didn't see how this patch could fix the queue stall I ran into myself. Today I started realizing that what Holger reported is probably another issue than what I ran into myself. Since this patch by itself looks now useful to me: Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com> BTW, do you think the patch at the start of this thread also fixes the issue that resulted in commit 826a70a08b12 ("SCSI: don't get target/host busy_count in scsi_mq_get_budget()")? Do you think we still need that patch? Bart.
On Thu, 2017-12-07 at 09:31 +0800, Ming Lei wrote: > But if you always call blk_mq_sched_mark_restart_hctx() before a new > dispatch, that may affect performance on NVMe which may never trigger > BLK_STS_RESOURCE. Hmm ... only the SCSI core implements .get_budget() and .put_budget() and I proposed to insert a blk_mq_sched_mark_restart_hctx() call under "if (q->mq_ops->get_budget)". In other words, I proposed to insert a blk_mq_sched_mark_restart_hctx() call in a code path that is never triggered by the NVMe driver. So I don't see how the change I proposed could affect the performance of the NVMe driver? Bart.
On Thu, Dec 07, 2017 at 09:11:54PM +0000, Bart Van Assche wrote: > On Thu, 2017-12-07 at 09:31 +0800, Ming Lei wrote: > > But if you always call blk_mq_sched_mark_restart_hctx() before a new > > dispatch, that may affect performance on NVMe which may never trigger > > BLK_STS_RESOURCE. > > Hmm ... only the SCSI core implements .get_budget() and .put_budget() and > I proposed to insert a blk_mq_sched_mark_restart_hctx() call under "if > (q->mq_ops->get_budget)". In other words, I proposed to insert a > blk_mq_sched_mark_restart_hctx() call in a code path that is never triggered > by the NVMe driver. So I don't see how the change I proposed could affect > the performance of the NVMe driver? You only add the check on none scheduler, right? But this race isn't related with scheduler, that means it can't fix the race with other schedulers. I have test case to trigger this issue on both none and mq-deadline, and my patch fixes them all. Thanks, Ming
On Thu, Dec 07, 2017 at 09:06:58PM +0000, Bart Van Assche wrote: > On Wed, 2017-12-06 at 00:28 +0800, Ming Lei wrote: > > On Tue, Dec 05, 2017 at 04:08:20PM +0000, Bart Van Assche wrote: > > > On Tue, 2017-12-05 at 15:52 +0800, Ming Lei wrote: > > > > diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c > > > > index db9556662e27..1816dd8259b3 100644 > > > > --- a/drivers/scsi/scsi_lib.c > > > > +++ b/drivers/scsi/scsi_lib.c > > > > @@ -1967,6 +1967,8 @@ static bool scsi_mq_get_budget(struct blk_mq_hw_ctx *hctx) > > > > out_put_device: > > > > put_device(&sdev->sdev_gendev); > > > > out: > > > > + if (atomic_read(&sdev->device_busy) == 0 && !scsi_device_blocked(sdev)) > > > > + blk_mq_delay_run_hw_queue(hctx, SCSI_QUEUE_DELAY); > > > > return false; > > > > } > > > > > > This cannot work since multiple threads can call scsi_mq_get_budget() > > > > That is exactly the way we are handling these cases before 0df21c86bdbf(scsi: > > implement .get_budget and .put_budget for blk-mq), so if it can't work, > > that is not fault of commit 0df21c86bdbf. > > > > > concurrently and hence it can happen that none of them sees > > > atomic_read(&sdev->device_busy) == 0. BTW, the above patch is something I > > > > If there is concurrent .get_budget(), one of them will see the counter > > becoming zero finally because each sdev->device_busy is inc/dec > > atomically. Or scsi_dev_queue_ready() return true. > > > > Anyway, we need this patch to avoid possible regression. If you think > > there are bugs in blk-mq RESTART, just root cause and and fix it. > > Hello Ming, > > When I looked at the patch at the start of this thread for the first time I > got frustrated because I didn't see how this patch could fix the queue stall > I ran into myself. Today I started realizing that what Holger reported is > probably another issue than what I ran into myself. Since this patch by > itself looks now useful to me: > > Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com> Hi Bart, Thanks for your Review! > > BTW, do you think the patch at the start of this thread also fixes the issue > that resulted in commit 826a70a08b12 ("SCSI: don't get target/host busy_count > in scsi_mq_get_budget()")? Do you think we still need that patch? Yeah, once the patch in this thread is in, IO hang shouldn't happen any more even without 826a70a08b12. But that commit is still the correct thing to do, since we let blk-mq's sbitmap queue respect per-host queue depth, which way is much efficient than the simple atomic operations used in scsi_host_queue_ready(). So I think that commit is still useful. When I was figuring out patch of 826a70a08b12, I just ignore the .get_budget() for requests from hctx->dispatch_list, because we don't need to deal with queue idle when .get_budget() is called from both blk_mq_do_dispatch_sched() and blk_mq_do_dispatch_ctx(). Thanks, Ming
diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c index 69e3226e66ca..9d86876ec503 100644 --- a/block/blk-mq-sched.c +++ b/block/blk-mq-sched.c @@ -226,6 +226,7 @@ void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx) * TODO: get more budgets, and dequeue more requests in * one time. */ + blk_mq_sched_mark_restart_hctx(hctx); blk_mq_do_dispatch_ctx(hctx); } else { blk_mq_flush_busy_ctxs(hctx, &rq_list);