Message ID | 1548838916-25051-1-git-send-email-jianchao.w.wang@oracle.com (mailing list archive) |
---|---|
State | New, archived |
Series | blk-mq: fix a hung issue when fsync |
On 1/30/19 2:01 AM, Jianchao Wang wrote:
> Florian reported a io hung issue when fsync(). It should be
> triggered by following race condition.
>
> data + post flush         a flush
>
> blk_flush_complete_seq
>   case REQ_FSEQ_DATA
>     blk_flush_queue_rq
>     issued to driver      blk_mq_dispatch_rq_list
>                             try to issue a flush req
>                             failed due to NON-NCQ command
>                             .queue_rq return BLK_STS_DEV_RESOURCE
>
> request completion
>   req->end_io    // doesn't check RESTART
>   mq_flush_data_end_io
>     case REQ_FSEQ_POSTFLUSH
>       blk_kick_flush
>         do nothing because previous flush
>         has not been completed
>     blk_mq_run_hw_queue
>                               insert rq to hctx->dispatch
>                               due to RESTART is still set, do nothing
>
> To fix this, replace the blk_mq_run_hw_queue in mq_flush_data_end_io
> with blk_mq_sched_restart to check and clear the RESTART flag.

Applied, thanks.
On Wed, Jan 30, 2019 at 08:54:09AM -0700, Jens Axboe wrote:
> On 1/30/19 2:01 AM, Jianchao Wang wrote:
> > Florian reported a io hung issue when fsync(). It should be
> > triggered by following race condition.
> >
> > data + post flush         a flush
> >
> > blk_flush_complete_seq
> >   case REQ_FSEQ_DATA
> >     blk_flush_queue_rq
> >     issued to driver      blk_mq_dispatch_rq_list
> >                             try to issue a flush req
> >                             failed due to NON-NCQ command
> >                             .queue_rq return BLK_STS_DEV_RESOURCE
> >
> > request completion
> >   req->end_io    // doesn't check RESTART
> >   mq_flush_data_end_io
> >     case REQ_FSEQ_POSTFLUSH
> >       blk_kick_flush
> >         do nothing because previous flush
> >         has not been completed
> >     blk_mq_run_hw_queue
> >                               insert rq to hctx->dispatch
> >                               due to RESTART is still set, do nothing
> >
> > To fix this, replace the blk_mq_run_hw_queue in mq_flush_data_end_io
> > with blk_mq_sched_restart to check and clear the RESTART flag.
>
> Applied, thanks.
>
> --
> Jens Axboe

Can this be applied to stable kernels please?

It's commit 85bd6e61f34dffa8ec2dc75ff3c02ee7b2f1cbce upstream.

Thanks,
On Sun, Feb 17, 2019 at 04:37:29PM +0100, Thibaut Sautereau wrote:
>On Wed, Jan 30, 2019 at 08:54:09AM -0700, Jens Axboe wrote:
>> On 1/30/19 2:01 AM, Jianchao Wang wrote:
>> > Florian reported a io hung issue when fsync(). It should be
>> > triggered by following race condition.
>> >
>> > data + post flush         a flush
>> >
>> > blk_flush_complete_seq
>> >   case REQ_FSEQ_DATA
>> >     blk_flush_queue_rq
>> >     issued to driver      blk_mq_dispatch_rq_list
>> >                             try to issue a flush req
>> >                             failed due to NON-NCQ command
>> >                             .queue_rq return BLK_STS_DEV_RESOURCE
>> >
>> > request completion
>> >   req->end_io    // doesn't check RESTART
>> >   mq_flush_data_end_io
>> >     case REQ_FSEQ_POSTFLUSH
>> >       blk_kick_flush
>> >         do nothing because previous flush
>> >         has not been completed
>> >     blk_mq_run_hw_queue
>> >                               insert rq to hctx->dispatch
>> >                               due to RESTART is still set, do nothing
>> >
>> > To fix this, replace the blk_mq_run_hw_queue in mq_flush_data_end_io
>> > with blk_mq_sched_restart to check and clear the RESTART flag.
>>
>> Applied, thanks.
>>
>> --
>> Jens Axboe
>
>Can this be applied to stable kernels please?
>
>It's commit 85bd6e61f34dffa8ec2dc75ff3c02ee7b2f1cbce upstream.

I've queued it for 4.20, 4.19 and 4.14.

--
Thanks,
Sasha
diff --git a/block/blk-flush.c b/block/blk-flush.c
index a3fc7191..6e0f2d9 100644
--- a/block/blk-flush.c
+++ b/block/blk-flush.c
@@ -335,7 +335,7 @@ static void mq_flush_data_end_io(struct request *rq, blk_status_t error)
 	blk_flush_complete_seq(rq, fq, REQ_FSEQ_DATA, error);
 	spin_unlock_irqrestore(&fq->mq_flush_lock, flags);
 
-	blk_mq_run_hw_queue(hctx, true);
+	blk_mq_sched_restart(hctx);
 }
 
 /**
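
For readers who want to step through the interleaving described in the patch, here is a minimal userspace sketch of it. It is a toy model and not kernel code: `struct toy_hctx` and every `toy_*` function are hypothetical names invented for illustration, the two contexts are replayed sequentially in a single thread, and the rule that the dispatcher reruns the queue only when RESTART is clear is a simplifying assumption drawn from the "due to RESTART is still set, do nothing" step above.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a hardware queue context; NOT the kernel's struct. */
struct toy_hctx {
	bool restart;      /* models the RESTART flag */
	int  dispatch_len; /* requests parked on the dispatch list */
	int  issued;       /* requests handed to the driver */
};

/* Models running the hardware queue: issue whatever is parked on dispatch. */
static void toy_run_hw_queue(struct toy_hctx *h)
{
	h->issued += h->dispatch_len;
	h->dispatch_len = 0;
}

/* Models the restart helper: rerun only if RESTART was set, and clear it. */
static void toy_sched_restart(struct toy_hctx *h)
{
	if (h->restart) {
		h->restart = false;
		toy_run_hw_queue(h);
	}
}

/*
 * Replays the interleaving from the patch description.
 * Returns true if the flush request is left stranded on the dispatch list.
 */
static bool toy_flush_race_hangs(bool use_sched_restart)
{
	struct toy_hctx h = { .restart = false, .dispatch_len = 0, .issued = 0 };

	/* Dispatcher: RESTART is marked and the driver rejects the flush, but
	 * the request still sits on a local list, not yet on dispatch. */
	h.restart = true;
	int local_list = 1;

	/* Completion of the data request runs the end_io callback. */
	if (use_sched_restart)
		toy_sched_restart(&h);  /* patched behaviour: clears RESTART */
	else
		toy_run_hw_queue(&h);   /* old behaviour: RESTART stays set */

	/* Dispatcher: park the request, then rerun only if RESTART is clear. */
	h.dispatch_len += local_list;
	if (!h.restart)
		toy_run_hw_queue(&h);

	assert(h.issued + h.dispatch_len == 1); /* the request is never lost */
	return h.dispatch_len != 0;
}
```

In this model, `toy_flush_race_hangs(false)` leaves the request stranded, while `toy_flush_race_hangs(true)` lets the dispatcher see the cleared flag and issue the request itself.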