Message ID | 20220415034703.2081695-1-ming.lei@redhat.com (mailing list archive) |
---|---|
State | Deferred, archived |
Headers | show |
Series | [V2] block: avoid io timeout in case of sync polled dio | expand |
On Fri, Apr 15, 2022 at 11:47:03AM +0800, Ming Lei wrote: > + /* make sure the bio is issued before polling */ > + if (bio.bi_opf & REQ_POLLED) > + blk_flush_plug(current->plug, false); I still think the core code should handle this. Without that we'd need to export the blk_flush_plug for anything that would want to poll bios from modules, in addition to it generally being a mess. See a proposed patch for that below. I'd also split the flush aspect from the poll aspect into two patches. > + if (bio.bi_opf & REQ_POLLED) > + bio_poll(&bio, NULL, 0); > + else > blk_io_schedule(); Instead of this duplicate logic everywhere I'd just make bio_boll call blk_io_schedule for the !REQ_POLLED case and simplify all the callers. > + if (dio->submit.poll_bio && > + (dio->submit.poll_bio->bi_opf & > + REQ_POLLED)) This indentation looks awfull,normal would be: if (dio->submit.poll_bio && (dio->submit.poll_bio->bi_opf & REQ_POLLED)) --- From 08ff61b0142eb708fc384cf867c72175561d974a Mon Sep 17 00:00:00 2001 From: Christoph Hellwig <hch@lst.de> Date: Fri, 15 Apr 2022 07:15:42 +0200 Subject: blk-mq: don't plug for synchronously polled requests For synchronous polling to work, the bio must be issued to the driver from the submit_bio call, otherwise ->bi_cookie won't be set. Based on a patch from Ming Lei. Reported-by: Changhui Zhong <czhong@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de> --- block/blk-mq.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/block/blk-mq.c b/block/blk-mq.c index ed3ed86f7dd24..bcc7e3d11296c 100644 --- a/block/blk-mq.c +++ b/block/blk-mq.c @@ -2851,7 +2851,13 @@ void blk_mq_submit_bio(struct bio *bio) return; } - if (plug) + /* + * We can't plug for synchronously polled submissions, otherwise + * bio->bi_cookie won't be set directly after submission, which is the + * indicator used by the submitter to check if a bio needs polling. + */ + if (plug && + (rq->bio->bi_opf & (REQ_POLLED | REQ_NOWAIT)) != REQ_POLLED) blk_add_rq_to_plug(plug, rq); else if ((rq->rq_flags & RQF_ELV) || (rq->mq_hctx->dispatch_busy &&
On Fri, Apr 15, 2022 at 07:18:44AM +0200, Christoph Hellwig wrote: > On Fri, Apr 15, 2022 at 11:47:03AM +0800, Ming Lei wrote: > > + /* make sure the bio is issued before polling */ > > + if (bio.bi_opf & REQ_POLLED) > > + blk_flush_plug(current->plug, false); > > I still think the core code should handle this. Without that we'd need > to export the blk_flush_plug for anything that would want to poll bios > from modules, in addition to it generally being a mess. See a proposed So far there isn't such usage yet. dm calls bio_poll() in ->iopoll(), and its caller(io_uring) will finish the plug. > patch for that below. I'd also split the flush aspect from the poll > aspect into two patches. > > > + if (bio.bi_opf & REQ_POLLED) > > + bio_poll(&bio, NULL, 0); > > + else > > blk_io_schedule(); > > Instead of this duplicate logic everywhere I'd just make bio_boll > call blk_io_schedule for the !REQ_POLLED case and simplify all the > callers. bio_poll() may be called with rcu read lock held, so I'd suggest to not mix the two together. > > > + if (dio->submit.poll_bio && > > + (dio->submit.poll_bio->bi_opf & > > + REQ_POLLED)) > > This indentation looks awfull,normal would be: > > if (dio->submit.poll_bio && > (dio->submit.poll_bio->bi_opf & REQ_POLLED)) That follows the indentation style of fs/iomap/direct-io.c for break in 'if'. > > --- > From 08ff61b0142eb708fc384cf867c72175561d974a Mon Sep 17 00:00:00 2001 > From: Christoph Hellwig <hch@lst.de> > Date: Fri, 15 Apr 2022 07:15:42 +0200 > Subject: blk-mq: don't plug for synchronously polled requests > > For synchronous polling to work, the bio must be issued to the driver from > the submit_bio call, otherwise ->bi_cookie won't be set. > > Based on a patch from Ming Lei. > > Reported-by: Changhui Zhong <czhong@redhat.com> > Signed-off-by: Christoph Hellwig <hch@lst.de> > --- > block/blk-mq.c | 8 +++++++- > 1 file changed, 7 insertions(+), 1 deletion(-) > > diff --git a/block/blk-mq.c b/block/blk-mq.c > index ed3ed86f7dd24..bcc7e3d11296c 100644 > --- a/block/blk-mq.c > +++ b/block/blk-mq.c > @@ -2851,7 +2851,13 @@ void blk_mq_submit_bio(struct bio *bio) > return; > } > > - if (plug) > + /* > + * We can't plug for synchronously polled submissions, otherwise > + * bio->bi_cookie won't be set directly after submission, which is the > + * indicator used by the submitter to check if a bio needs polling. > + */ > + if (plug && > + (rq->bio->bi_opf & (REQ_POLLED | REQ_NOWAIT)) != REQ_POLLED) > blk_add_rq_to_plug(plug, rq); > else if ((rq->rq_flags & RQF_ELV) || > (rq->mq_hctx->dispatch_busy && It is nothing to do with REQ_NOWAIT. sync polled dio can be marked as REQ_NOWAIT by userspace too. If '--nowait=1' is added in the fio reproducer, io timeout is triggered too. Thanks, Ming
On Fri, Apr 15, 2022 at 07:00:37PM +0800, Ming Lei wrote: > On Fri, Apr 15, 2022 at 07:18:44AM +0200, Christoph Hellwig wrote: > > On Fri, Apr 15, 2022 at 11:47:03AM +0800, Ming Lei wrote: > > > + /* make sure the bio is issued before polling */ > > > + if (bio.bi_opf & REQ_POLLED) > > > + blk_flush_plug(current->plug, false); > > > > I still think the core code should handle this. Without that we'd need > > to export the blk_flush_plug for anything that would want to poll bios > > from modules, in addition to it generally being a mess. See a proposed > > So far there isn't such usage yet. dm calls bio_poll() in ->iopoll(), > and its caller(io_uring) will finish the plug. Yes. But not doing this automatically also means you keep easily forgetting callsites. For example iomap still does not flush the plug in your patch. > > patch for that below. I'd also split the flush aspect from the poll > > aspect into two patches. > > > > > + if (bio.bi_opf & REQ_POLLED) > > > + bio_poll(&bio, NULL, 0); > > > + else > > > blk_io_schedule(); > > > > Instead of this duplicate logic everywhere I'd just make bio_boll > > call blk_io_schedule for the !REQ_POLLED case and simplify all the > > callers. > > bio_poll() may be called with rcu read lock held, so I'd suggest to > not mix the two together. Ok, makes sense. > > > > > + if (dio->submit.poll_bio && > > > + (dio->submit.poll_bio->bi_opf & > > > + REQ_POLLED)) > > > > This indentation looks awfull,normal would be: > > > > if (dio->submit.poll_bio && > > (dio->submit.poll_bio->bi_opf & REQ_POLLED)) > > That follows the indentation style of fs/iomap/direct-io.c for break in > 'if'. It doesn't. Just look at the conditional you replaced for example :) > > + /* > > + * We can't plug for synchronously polled submissions, otherwise > > + * bio->bi_cookie won't be set directly after submission, which is the > > + * indicator used by the submitter to check if a bio needs polling. > > + */ > > + if (plug && > > + (rq->bio->bi_opf & (REQ_POLLED | REQ_NOWAIT)) != REQ_POLLED) > > blk_add_rq_to_plug(plug, rq); > > else if ((rq->rq_flags & RQF_ELV) || > > (rq->mq_hctx->dispatch_busy && > > It is nothing to do with REQ_NOWAIT. sync polled dio can be marked as > REQ_NOWAIT by userspace too. If '--nowait=1' is added in the fio > reproducer, io timeout is triggered too. True. So I guess we'll need a new flag to distinguish the cases.
On Sat, Apr 16, 2022 at 07:49:13AM +0200, Christoph Hellwig wrote: > On Fri, Apr 15, 2022 at 07:00:37PM +0800, Ming Lei wrote: > > On Fri, Apr 15, 2022 at 07:18:44AM +0200, Christoph Hellwig wrote: > > > On Fri, Apr 15, 2022 at 11:47:03AM +0800, Ming Lei wrote: > > > > + /* make sure the bio is issued before polling */ > > > > + if (bio.bi_opf & REQ_POLLED) > > > > + blk_flush_plug(current->plug, false); > > > > > > I still think the core code should handle this. Without that we'd need > > > to export the blk_flush_plug for anything that would want to poll bios > > > from modules, in addition to it generally being a mess. See a proposed > > > > So far there isn't such usage yet. dm calls bio_poll() in ->iopoll(), > > and its caller(io_uring) will finish the plug. > > Yes. But not doing this automatically also means you keep easily > forgetting callsites. For example iomap still does not flush the plug > in your patch. It is reasonable for flush user(usually submission) to be responsible for finishing/flushing plug. iomap is one good example to show this point, since it does flush the plug before call bio_poll(), see __iomap_dio_rw(). > > > > patch for that below. I'd also split the flush aspect from the poll > > > aspect into two patches. > > > > > > > + if (bio.bi_opf & REQ_POLLED) > > > > + bio_poll(&bio, NULL, 0); > > > > + else > > > > blk_io_schedule(); > > > > > > Instead of this duplicate logic everywhere I'd just make bio_boll > > > call blk_io_schedule for the !REQ_POLLED case and simplify all the > > > callers. > > > > bio_poll() may be called with rcu read lock held, so I'd suggest to > > not mix the two together. > > Ok, makes sense. > > > > > > > > + if (dio->submit.poll_bio && > > > > + (dio->submit.poll_bio->bi_opf & > > > > + REQ_POLLED)) > > > > > > This indentation looks awfull,normal would be: > > > > > > if (dio->submit.poll_bio && > > > (dio->submit.poll_bio->bi_opf & REQ_POLLED)) > > > > That follows the indentation style of fs/iomap/direct-io.c for break in > > 'if'. > > It doesn't. Just look at the conditional you replaced for example :) OK, I will change to your style. > > > > + /* > > > + * We can't plug for synchronously polled submissions, otherwise > > > + * bio->bi_cookie won't be set directly after submission, which is the > > > + * indicator used by the submitter to check if a bio needs polling. > > > + */ > > > + if (plug && > > > + (rq->bio->bi_opf & (REQ_POLLED | REQ_NOWAIT)) != REQ_POLLED) > > > blk_add_rq_to_plug(plug, rq); > > > else if ((rq->rq_flags & RQF_ELV) || > > > (rq->mq_hctx->dispatch_busy && > > > > It is nothing to do with REQ_NOWAIT. sync polled dio can be marked as > > REQ_NOWAIT by userspace too. If '--nowait=1' is added in the fio > > reproducer, io timeout is triggered too. > > True. So I guess we'll need a new flag to distinguish the cases. If there will be more such kind of poll usage in kernel, I think it is fine to add the flag, but so far all the three aren't used very often. Thanks, Ming
On Sat, Apr 16, 2022 at 05:03:35PM +0800, Ming Lei wrote: > > Yes. But not doing this automatically also means you keep easily > > forgetting callsites. For example iomap still does not flush the plug > > in your patch. > > It is reasonable for flush user(usually submission) to be responsible > for finishing/flushing plug. Well, I very much disagree here. blk_flush_plug is not a publіc, exported API, and that is for a reason. A bio submission interface that requires flushing the plug to be useful is rather broken. > iomap is one good example to show this point, since it does flush the plug > before call bio_poll(), see __iomap_dio_rw(). iomap does not do a manual plug flush anywhere. iomap does finish the plug before polling, which makes sense. Now of course __blkdev_direct_IO_simple doesn't even use a plug to start with, so I'm wondering what plug this patch even tries to flush?
On Mon, Apr 18, 2022 at 07:12:34AM +0200, Christoph Hellwig wrote: > On Sat, Apr 16, 2022 at 05:03:35PM +0800, Ming Lei wrote: > > > Yes. But not doing this automatically also means you keep easily > > > forgetting callsites. For example iomap still does not flush the plug > > > in your patch. > > > > It is reasonable for flush user(usually submission) to be responsible > > for finishing/flushing plug. > > Well, I very much disagree here. blk_flush_plug is not a publіc, > exported API, and that is for a reason. A bio submission interface > that requires flushing the plug to be useful is rather broken. But there isn't any such users from module now. Maybe never, since sync polled dio becomes legacy after io_uring is invented. Do we have such potential use case in which explicit flush plug is needed except for polled io in __blkdev_direct_IO_simple() and swap_readpage()? If there is, I am happy to add one flag for bypassing plug in blk core code. > > > iomap is one good example to show this point, since it does flush the plug > > before call bio_poll(), see __iomap_dio_rw(). > > iomap does not do a manual plug flush anywhere. > > iomap does finish the plug before polling, which makes sense. > > Now of course __blkdev_direct_IO_simple doesn't even use a plug > to start with, so I'm wondering what plug this patch even tries > to flush? At least blkdev_write_iter(), and __swap_writepage() might call into ->direct_IO with one plug too. Not mention loop driver can call into ->direct_IO directly, and we should have applied plug for batching submission in loop_process_work(). Thanks, Ming
On Mon, Apr 18, 2022 at 04:19:09PM +0800, Ming Lei wrote: > But there isn't any such users from module now. Maybe never, since sync > polled dio becomes legacy after io_uring is invented. I thought about that a bit, but if we decided synchronous polled I/O is not in major user anymore (which I think) and we think it is enough of a burden to support (which I'm not sure of, but this patch points to) then it might be time to remove it. > Do we have such potential use case in which explicit flush plug is > needed except for polled io in __blkdev_direct_IO_simple() and swap_readpage()? > > If there is, I am happy to add one flag for bypassing plug in blk core > code. I think the point is that we need to offer sensible APIs for I/O methods we want to support. > > > > > iomap is one good example to show this point, since it does flush the plug > > > before call bio_poll(), see __iomap_dio_rw(). > > > > iomap does not do a manual plug flush anywhere. > > > > iomap does finish the plug before polling, which makes sense. > > > > Now of course __blkdev_direct_IO_simple doesn't even use a plug > > to start with, so I'm wondering what plug this patch even tries > > to flush? > > At least blkdev_write_iter(), and __swap_writepage() might call > into ->direct_IO with one plug too. > > Not mention loop driver can call into ->direct_IO directly, and we > should have applied plug for batching submission in loop_process_work(). The loop driver still calls through the read/write iter method, and the swap code ->direct_IO path is not used for block devices (and completley broken right now, see the patch series from Neil). But the loop driver is a good point, even for the iomap case as the blk_finish_plug would only flush the plug that is on-stack in __iomap_dio_rw, it would not help you with any plug holder by caller higher in the stack.
On Tue, Apr 19, 2022 at 07:39:24AM +0200, Christoph Hellwig wrote: > On Mon, Apr 18, 2022 at 04:19:09PM +0800, Ming Lei wrote: > > But there isn't any such users from module now. Maybe never, since sync > > polled dio becomes legacy after io_uring is invented. > > I thought about that a bit, but if we decided synchronous polled I/O > is not in major user anymore (which I think) and we think it is enough > of a burden to support (which I'm not sure of, but this patch points to) > then it might be time to remove it. I agree to remove it because it is legacy poll interface, and very likely no one use it since the problem to be addressed can easily be exposed by sync polled dio sanity test or 'blktests block/007'. I'd suggest to switch sync polled dio into non-polled silently, just with logging sort of 'sync polled io is obsolete, please use io_uring for io polling', or any other idea? > > > Do we have such potential use case in which explicit flush plug is > > needed except for polled io in __blkdev_direct_IO_simple() and swap_readpage()? > > > > If there is, I am happy to add one flag for bypassing plug in blk core > > code. > > I think the point is that we need to offer sensible APIs for I/O > methods we want to support. > > > > > > > > iomap is one good example to show this point, since it does flush the plug > > > > before call bio_poll(), see __iomap_dio_rw(). > > > > > > iomap does not do a manual plug flush anywhere. > > > > > > iomap does finish the plug before polling, which makes sense. > > > > > > Now of course __blkdev_direct_IO_simple doesn't even use a plug > > > to start with, so I'm wondering what plug this patch even tries > > > to flush? > > > > At least blkdev_write_iter(), and __swap_writepage() might call > > into ->direct_IO with one plug too. > > > > Not mention loop driver can call into ->direct_IO directly, and we > > should have applied plug for batching submission in loop_process_work(). > > The loop driver still calls through the read/write iter method, and > the swap code ->direct_IO path is not used for block devices (and > completley broken right now, see the patch series from Neil). > > But the loop driver is a good point, even for the iomap case as the > blk_finish_plug would only flush the plug that is on-stack in > __iomap_dio_rw, it would not help you with any plug holder by caller > higher in the stack. Right, good catch! thanks, Ming
On Tue, Apr 19, 2022 at 03:47:26PM +0800, Ming Lei wrote: > I agree to remove it because it is legacy poll interface, and very likely > no one use it since the problem to be addressed can easily be exposed by > sync polled dio sanity test or 'blktests block/007'. > > I'd suggest to switch sync polled dio into non-polled silently, just with > logging sort of 'sync polled io is obsolete, please use io_uring for io > polling', or any other idea? We've carefully define RWF_HIPRI to just be a hint. So I don't think logging an error would be all that useful.
diff --git a/block/fops.c b/block/fops.c index 9f2ecec406b0..c9bac700e072 100644 --- a/block/fops.c +++ b/block/fops.c @@ -101,11 +101,16 @@ static ssize_t __blkdev_direct_IO_simple(struct kiocb *iocb, bio_set_polled(&bio, iocb); submit_bio(&bio); + /* make sure the bio is issued before polling */ + if (bio.bi_opf & REQ_POLLED) + blk_flush_plug(current->plug, false); for (;;) { set_current_state(TASK_UNINTERRUPTIBLE); if (!READ_ONCE(bio.bi_private)) break; - if (!(iocb->ki_flags & IOCB_HIPRI) || !bio_poll(&bio, NULL, 0)) + if (bio.bi_opf & REQ_POLLED) + bio_poll(&bio, NULL, 0); + else blk_io_schedule(); } __set_current_state(TASK_RUNNING); diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c index b08f5dc31780..e67d2f63a163 100644 --- a/fs/iomap/direct-io.c +++ b/fs/iomap/direct-io.c @@ -654,8 +654,11 @@ __iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter, if (!READ_ONCE(dio->submit.waiter)) break; - if (!dio->submit.poll_bio || - !bio_poll(dio->submit.poll_bio, NULL, 0)) + if (dio->submit.poll_bio && + (dio->submit.poll_bio->bi_opf & + REQ_POLLED)) + bio_poll(dio->submit.poll_bio, NULL, 0); + else blk_io_schedule(); } __set_current_state(TASK_RUNNING); diff --git a/mm/page_io.c b/mm/page_io.c index b417f000b49e..16f2a63e2524 100644 --- a/mm/page_io.c +++ b/mm/page_io.c @@ -421,12 +421,18 @@ int swap_readpage(struct page *page, bool synchronous) count_vm_event(PSWPIN); bio_get(bio); submit_bio(bio); + + /* make sure the bio is issued before polling */ + if (bio->bi_opf & REQ_POLLED) + blk_flush_plug(current->plug, false); while (synchronous) { set_current_state(TASK_UNINTERRUPTIBLE); if (!READ_ONCE(bio->bi_private)) break; - if (!bio_poll(bio, NULL, 0)) + if (bio->bi_opf & REQ_POLLED) + bio_poll(bio, NULL, 0); + else blk_io_schedule(); } __set_current_state(TASK_RUNNING);
For POLLED bio, bio_poll() should be called for reaping io since there isn't irq for completing the request, so we can't call into blk_io_schedule() in case that bio_poll() returns zero, otherwise io timeout is easily triggered. Also before calling bio_poll(), current->plug should be flushed out, otherwise the bio may not be issued to driver, and ->bi_cookie won't be set. CC: linux-mm@kvack.org Cc: linux-xfs@vger.kernel.org Reported-by: Changhui Zhong <czhong@redhat.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> --- V2: - as pointed out by Christoph, iomap & mm code should be covered block/fops.c | 7 ++++++- fs/iomap/direct-io.c | 7 +++++-- mm/page_io.c | 8 +++++++- 3 files changed, 18 insertions(+), 4 deletions(-)