diff mbox series

[V2] block: avoid io timeout in case of sync polled dio

Message ID 20220415034703.2081695-1-ming.lei@redhat.com (mailing list archive)
State Deferred, archived
Headers show
Series [V2] block: avoid io timeout in case of sync polled dio | expand

Commit Message

Ming Lei April 15, 2022, 3:47 a.m. UTC
For POLLED bio, bio_poll() should be called for reaping io since
there isn't irq for completing the request, so we can't call into
blk_io_schedule() in case that bio_poll() returns zero, otherwise
io timeout is easily triggered.

Also before calling bio_poll(), current->plug should be flushed out,
otherwise the bio may not be issued to driver, and ->bi_cookie won't
be set.

CC: linux-mm@kvack.org
Cc: linux-xfs@vger.kernel.org
Reported-by: Changhui Zhong <czhong@redhat.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
V2:
	- as pointed out by Christoph, iomap & mm code should be covered

 block/fops.c         | 7 ++++++-
 fs/iomap/direct-io.c | 7 +++++--
 mm/page_io.c         | 8 +++++++-
 3 files changed, 18 insertions(+), 4 deletions(-)

Comments

Christoph Hellwig April 15, 2022, 5:18 a.m. UTC | #1
On Fri, Apr 15, 2022 at 11:47:03AM +0800, Ming Lei wrote:
> +	/* make sure the bio is issued before polling */
> +	if (bio.bi_opf & REQ_POLLED)
> +		blk_flush_plug(current->plug, false);

I still think the core code should handle this.  Without that we'd need
to export the blk_flush_plug for anything that would want to poll bios
from modules, in addition to it generally being a mess.  See a proposed
patch for that below.  I'd also split the flush aspect from the poll
aspect into two patches.

> +		if (bio.bi_opf & REQ_POLLED)
> +			bio_poll(&bio, NULL, 0);
> +		else
>  			blk_io_schedule();

Instead of this duplicate logic everywhere I'd just make bio_boll
call blk_io_schedule for the !REQ_POLLED case and simplify all the
callers.

> +			if (dio->submit.poll_bio &&
> +					(dio->submit.poll_bio->bi_opf &
> +						REQ_POLLED))

This indentation looks awfull,normal would be:

			if (dio->submit.poll_bio &&
			    (dio->submit.poll_bio->bi_opf & REQ_POLLED))

---
From 08ff61b0142eb708fc384cf867c72175561d974a Mon Sep 17 00:00:00 2001
From: Christoph Hellwig <hch@lst.de>
Date: Fri, 15 Apr 2022 07:15:42 +0200
Subject: blk-mq: don't plug for synchronously polled requests

For synchronous polling to work, the bio must be issued to the driver from
the submit_bio call, otherwise ->bi_cookie won't be set.

Based on a patch from Ming Lei.

Reported-by: Changhui Zhong <czhong@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 block/blk-mq.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index ed3ed86f7dd24..bcc7e3d11296c 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2851,7 +2851,13 @@ void blk_mq_submit_bio(struct bio *bio)
 		return;
 	}
 
-	if (plug)
+	/*
+	 * We can't plug for synchronously polled submissions, otherwise
+	 * bio->bi_cookie won't be set directly after submission, which is the
+	 * indicator used by the submitter to check if a bio needs polling.
+	 */
+	if (plug &&
+	    (rq->bio->bi_opf & (REQ_POLLED | REQ_NOWAIT)) != REQ_POLLED)
 		blk_add_rq_to_plug(plug, rq);
 	else if ((rq->rq_flags & RQF_ELV) ||
 		 (rq->mq_hctx->dispatch_busy &&
Ming Lei April 15, 2022, 11 a.m. UTC | #2
On Fri, Apr 15, 2022 at 07:18:44AM +0200, Christoph Hellwig wrote:
> On Fri, Apr 15, 2022 at 11:47:03AM +0800, Ming Lei wrote:
> > +	/* make sure the bio is issued before polling */
> > +	if (bio.bi_opf & REQ_POLLED)
> > +		blk_flush_plug(current->plug, false);
> 
> I still think the core code should handle this.  Without that we'd need
> to export the blk_flush_plug for anything that would want to poll bios
> from modules, in addition to it generally being a mess.  See a proposed

So far there isn't such usage yet. dm calls bio_poll() in ->iopoll(),
and its caller(io_uring) will finish the plug.

> patch for that below.  I'd also split the flush aspect from the poll
> aspect into two patches.
> 
> > +		if (bio.bi_opf & REQ_POLLED)
> > +			bio_poll(&bio, NULL, 0);
> > +		else
> >  			blk_io_schedule();
> 
> Instead of this duplicate logic everywhere I'd just make bio_boll
> call blk_io_schedule for the !REQ_POLLED case and simplify all the
> callers.

bio_poll() may be called with rcu read lock held, so I'd suggest to
not mix the two together.

> 
> > +			if (dio->submit.poll_bio &&
> > +					(dio->submit.poll_bio->bi_opf &
> > +						REQ_POLLED))
> 
> This indentation looks awfull,normal would be:
> 
> 			if (dio->submit.poll_bio &&
> 			    (dio->submit.poll_bio->bi_opf & REQ_POLLED))

That follows the indentation style of fs/iomap/direct-io.c for break in
'if'.

> 
> ---
> From 08ff61b0142eb708fc384cf867c72175561d974a Mon Sep 17 00:00:00 2001
> From: Christoph Hellwig <hch@lst.de>
> Date: Fri, 15 Apr 2022 07:15:42 +0200
> Subject: blk-mq: don't plug for synchronously polled requests
> 
> For synchronous polling to work, the bio must be issued to the driver from
> the submit_bio call, otherwise ->bi_cookie won't be set.
> 
> Based on a patch from Ming Lei.
> 
> Reported-by: Changhui Zhong <czhong@redhat.com>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  block/blk-mq.c | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index ed3ed86f7dd24..bcc7e3d11296c 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -2851,7 +2851,13 @@ void blk_mq_submit_bio(struct bio *bio)
>  		return;
>  	}
>  
> -	if (plug)
> +	/*
> +	 * We can't plug for synchronously polled submissions, otherwise
> +	 * bio->bi_cookie won't be set directly after submission, which is the
> +	 * indicator used by the submitter to check if a bio needs polling.
> +	 */
> +	if (plug &&
> +	    (rq->bio->bi_opf & (REQ_POLLED | REQ_NOWAIT)) != REQ_POLLED)
>  		blk_add_rq_to_plug(plug, rq);
>  	else if ((rq->rq_flags & RQF_ELV) ||
>  		 (rq->mq_hctx->dispatch_busy &&

It is nothing to do with REQ_NOWAIT. sync polled dio can be marked as
REQ_NOWAIT by userspace too. If '--nowait=1' is added in the fio
reproducer, io timeout is triggered too.



Thanks,
Ming
Christoph Hellwig April 16, 2022, 5:49 a.m. UTC | #3
On Fri, Apr 15, 2022 at 07:00:37PM +0800, Ming Lei wrote:
> On Fri, Apr 15, 2022 at 07:18:44AM +0200, Christoph Hellwig wrote:
> > On Fri, Apr 15, 2022 at 11:47:03AM +0800, Ming Lei wrote:
> > > +	/* make sure the bio is issued before polling */
> > > +	if (bio.bi_opf & REQ_POLLED)
> > > +		blk_flush_plug(current->plug, false);
> > 
> > I still think the core code should handle this.  Without that we'd need
> > to export the blk_flush_plug for anything that would want to poll bios
> > from modules, in addition to it generally being a mess.  See a proposed
> 
> So far there isn't such usage yet. dm calls bio_poll() in ->iopoll(),
> and its caller(io_uring) will finish the plug.

Yes.  But not doing this automatically also means you keep easily
forgetting callsites.  For example iomap still does not flush the plug
in your patch.

> > patch for that below.  I'd also split the flush aspect from the poll
> > aspect into two patches.
> > 
> > > +		if (bio.bi_opf & REQ_POLLED)
> > > +			bio_poll(&bio, NULL, 0);
> > > +		else
> > >  			blk_io_schedule();
> > 
> > Instead of this duplicate logic everywhere I'd just make bio_boll
> > call blk_io_schedule for the !REQ_POLLED case and simplify all the
> > callers.
> 
> bio_poll() may be called with rcu read lock held, so I'd suggest to
> not mix the two together.

Ok, makes sense.

> > 
> > > +			if (dio->submit.poll_bio &&
> > > +					(dio->submit.poll_bio->bi_opf &
> > > +						REQ_POLLED))
> > 
> > This indentation looks awfull,normal would be:
> > 
> > 			if (dio->submit.poll_bio &&
> > 			    (dio->submit.poll_bio->bi_opf & REQ_POLLED))
> 
> That follows the indentation style of fs/iomap/direct-io.c for break in
> 'if'.

It doesn't.  Just look at the conditional you replaced for example :)

> > +	/*
> > +	 * We can't plug for synchronously polled submissions, otherwise
> > +	 * bio->bi_cookie won't be set directly after submission, which is the
> > +	 * indicator used by the submitter to check if a bio needs polling.
> > +	 */
> > +	if (plug &&
> > +	    (rq->bio->bi_opf & (REQ_POLLED | REQ_NOWAIT)) != REQ_POLLED)
> >  		blk_add_rq_to_plug(plug, rq);
> >  	else if ((rq->rq_flags & RQF_ELV) ||
> >  		 (rq->mq_hctx->dispatch_busy &&
> 
> It is nothing to do with REQ_NOWAIT. sync polled dio can be marked as
> REQ_NOWAIT by userspace too. If '--nowait=1' is added in the fio
> reproducer, io timeout is triggered too.

True.  So I guess we'll need a new flag to distinguish the cases.
Ming Lei April 16, 2022, 9:03 a.m. UTC | #4
On Sat, Apr 16, 2022 at 07:49:13AM +0200, Christoph Hellwig wrote:
> On Fri, Apr 15, 2022 at 07:00:37PM +0800, Ming Lei wrote:
> > On Fri, Apr 15, 2022 at 07:18:44AM +0200, Christoph Hellwig wrote:
> > > On Fri, Apr 15, 2022 at 11:47:03AM +0800, Ming Lei wrote:
> > > > +	/* make sure the bio is issued before polling */
> > > > +	if (bio.bi_opf & REQ_POLLED)
> > > > +		blk_flush_plug(current->plug, false);
> > > 
> > > I still think the core code should handle this.  Without that we'd need
> > > to export the blk_flush_plug for anything that would want to poll bios
> > > from modules, in addition to it generally being a mess.  See a proposed
> > 
> > So far there isn't such usage yet. dm calls bio_poll() in ->iopoll(),
> > and its caller(io_uring) will finish the plug.
> 
> Yes.  But not doing this automatically also means you keep easily
> forgetting callsites.  For example iomap still does not flush the plug
> in your patch.

It is reasonable for flush user(usually submission) to be responsible
for finishing/flushing plug.

iomap is one good example to show this point, since it does flush the plug
before call bio_poll(), see __iomap_dio_rw().

> 
> > > patch for that below.  I'd also split the flush aspect from the poll
> > > aspect into two patches.
> > > 
> > > > +		if (bio.bi_opf & REQ_POLLED)
> > > > +			bio_poll(&bio, NULL, 0);
> > > > +		else
> > > >  			blk_io_schedule();
> > > 
> > > Instead of this duplicate logic everywhere I'd just make bio_boll
> > > call blk_io_schedule for the !REQ_POLLED case and simplify all the
> > > callers.
> > 
> > bio_poll() may be called with rcu read lock held, so I'd suggest to
> > not mix the two together.
> 
> Ok, makes sense.
> 
> > > 
> > > > +			if (dio->submit.poll_bio &&
> > > > +					(dio->submit.poll_bio->bi_opf &
> > > > +						REQ_POLLED))
> > > 
> > > This indentation looks awfull,normal would be:
> > > 
> > > 			if (dio->submit.poll_bio &&
> > > 			    (dio->submit.poll_bio->bi_opf & REQ_POLLED))
> > 
> > That follows the indentation style of fs/iomap/direct-io.c for break in
> > 'if'.
> 
> It doesn't.  Just look at the conditional you replaced for example :)

OK, I will change to your style.

> 
> > > +	/*
> > > +	 * We can't plug for synchronously polled submissions, otherwise
> > > +	 * bio->bi_cookie won't be set directly after submission, which is the
> > > +	 * indicator used by the submitter to check if a bio needs polling.
> > > +	 */
> > > +	if (plug &&
> > > +	    (rq->bio->bi_opf & (REQ_POLLED | REQ_NOWAIT)) != REQ_POLLED)
> > >  		blk_add_rq_to_plug(plug, rq);
> > >  	else if ((rq->rq_flags & RQF_ELV) ||
> > >  		 (rq->mq_hctx->dispatch_busy &&
> > 
> > It is nothing to do with REQ_NOWAIT. sync polled dio can be marked as
> > REQ_NOWAIT by userspace too. If '--nowait=1' is added in the fio
> > reproducer, io timeout is triggered too.
> 
> True.  So I guess we'll need a new flag to distinguish the cases.

If there will be more such kind of poll usage in kernel, I think it is fine
to add the flag, but so far all the three aren't used very often.


Thanks,
Ming
Christoph Hellwig April 18, 2022, 5:12 a.m. UTC | #5
On Sat, Apr 16, 2022 at 05:03:35PM +0800, Ming Lei wrote:
> > Yes.  But not doing this automatically also means you keep easily
> > forgetting callsites.  For example iomap still does not flush the plug
> > in your patch.
> 
> It is reasonable for flush user(usually submission) to be responsible
> for finishing/flushing plug.

Well, I very much disagree here.  blk_flush_plug is not a publіc,
exported API, and that is for a reason.  A bio submission interface
that requires flushing the plug to be useful is rather broken.

> iomap is one good example to show this point, since it does flush the plug
> before call bio_poll(), see __iomap_dio_rw().

iomap does not do a manual plug flush anywhere.

iomap does finish the plug before polling, which makes sense.

Now of course __blkdev_direct_IO_simple doesn't even use a plug
to start with, so I'm wondering what plug this patch even tries
to flush?
Ming Lei April 18, 2022, 8:19 a.m. UTC | #6
On Mon, Apr 18, 2022 at 07:12:34AM +0200, Christoph Hellwig wrote:
> On Sat, Apr 16, 2022 at 05:03:35PM +0800, Ming Lei wrote:
> > > Yes.  But not doing this automatically also means you keep easily
> > > forgetting callsites.  For example iomap still does not flush the plug
> > > in your patch.
> > 
> > It is reasonable for flush user(usually submission) to be responsible
> > for finishing/flushing plug.
> 
> Well, I very much disagree here.  blk_flush_plug is not a publіc,
> exported API, and that is for a reason.  A bio submission interface
> that requires flushing the plug to be useful is rather broken.

But there isn't any such users from module now. Maybe never, since sync
polled dio becomes legacy after io_uring is invented.

Do we have such potential use case in which explicit flush plug is
needed except for polled io in __blkdev_direct_IO_simple() and swap_readpage()?

If there is, I am happy to add one flag for bypassing plug in blk core
code.

> 
> > iomap is one good example to show this point, since it does flush the plug
> > before call bio_poll(), see __iomap_dio_rw().
> 
> iomap does not do a manual plug flush anywhere.
> 
> iomap does finish the plug before polling, which makes sense.
> 
> Now of course __blkdev_direct_IO_simple doesn't even use a plug
> to start with, so I'm wondering what plug this patch even tries
> to flush?
 
At least blkdev_write_iter(), and __swap_writepage() might call
into ->direct_IO with one plug too.

Not mention loop driver can call into ->direct_IO directly, and we
should have applied plug for batching submission in loop_process_work().


Thanks,
Ming
Christoph Hellwig April 19, 2022, 5:39 a.m. UTC | #7
On Mon, Apr 18, 2022 at 04:19:09PM +0800, Ming Lei wrote:
> But there isn't any such users from module now. Maybe never, since sync
> polled dio becomes legacy after io_uring is invented.

I thought about that a bit, but if we decided synchronous polled I/O
is not in major user anymore (which I think) and we think it is enough
of a burden to support (which I'm not sure of, but this patch points to)
then it might be time to remove it.

> Do we have such potential use case in which explicit flush plug is
> needed except for polled io in __blkdev_direct_IO_simple() and swap_readpage()?
> 
> If there is, I am happy to add one flag for bypassing plug in blk core
> code.

I think the point is that we need to offer sensible APIs for I/O
methods we want to support.

> > 
> > > iomap is one good example to show this point, since it does flush the plug
> > > before call bio_poll(), see __iomap_dio_rw().
> > 
> > iomap does not do a manual plug flush anywhere.
> > 
> > iomap does finish the plug before polling, which makes sense.
> > 
> > Now of course __blkdev_direct_IO_simple doesn't even use a plug
> > to start with, so I'm wondering what plug this patch even tries
> > to flush?
>  
> At least blkdev_write_iter(), and __swap_writepage() might call
> into ->direct_IO with one plug too.
> 
> Not mention loop driver can call into ->direct_IO directly, and we
> should have applied plug for batching submission in loop_process_work().

The loop driver still calls through the read/write iter method, and
the swap code ->direct_IO path is not used for block devices (and
completley broken right now, see the patch series from Neil).

But the loop driver is a good point, even for the iomap case as the
blk_finish_plug would only flush the plug that is on-stack in
__iomap_dio_rw, it would not help you with any plug holder by caller
higher in the stack.
Ming Lei April 19, 2022, 7:47 a.m. UTC | #8
On Tue, Apr 19, 2022 at 07:39:24AM +0200, Christoph Hellwig wrote:
> On Mon, Apr 18, 2022 at 04:19:09PM +0800, Ming Lei wrote:
> > But there isn't any such users from module now. Maybe never, since sync
> > polled dio becomes legacy after io_uring is invented.
> 
> I thought about that a bit, but if we decided synchronous polled I/O
> is not in major user anymore (which I think) and we think it is enough
> of a burden to support (which I'm not sure of, but this patch points to)
> then it might be time to remove it.

I agree to remove it because it is legacy poll interface, and very likely
no one use it since the problem to be addressed can easily be exposed by
sync polled dio sanity test or 'blktests block/007'.

I'd suggest to switch sync polled dio into non-polled silently, just with
logging sort of 'sync polled io is obsolete, please use io_uring for io
polling', or any other idea?

> 
> > Do we have such potential use case in which explicit flush plug is
> > needed except for polled io in __blkdev_direct_IO_simple() and swap_readpage()?
> > 
> > If there is, I am happy to add one flag for bypassing plug in blk core
> > code.
> 
> I think the point is that we need to offer sensible APIs for I/O
> methods we want to support.
> 
> > > 
> > > > iomap is one good example to show this point, since it does flush the plug
> > > > before call bio_poll(), see __iomap_dio_rw().
> > > 
> > > iomap does not do a manual plug flush anywhere.
> > > 
> > > iomap does finish the plug before polling, which makes sense.
> > > 
> > > Now of course __blkdev_direct_IO_simple doesn't even use a plug
> > > to start with, so I'm wondering what plug this patch even tries
> > > to flush?
> >  
> > At least blkdev_write_iter(), and __swap_writepage() might call
> > into ->direct_IO with one plug too.
> > 
> > Not mention loop driver can call into ->direct_IO directly, and we
> > should have applied plug for batching submission in loop_process_work().
> 
> The loop driver still calls through the read/write iter method, and
> the swap code ->direct_IO path is not used for block devices (and
> completley broken right now, see the patch series from Neil).
> 
> But the loop driver is a good point, even for the iomap case as the
> blk_finish_plug would only flush the plug that is on-stack in
> __iomap_dio_rw, it would not help you with any plug holder by caller
> higher in the stack.

Right, good catch!


thanks,
Ming
Christoph Hellwig April 19, 2022, 8:15 a.m. UTC | #9
On Tue, Apr 19, 2022 at 03:47:26PM +0800, Ming Lei wrote:
> I agree to remove it because it is legacy poll interface, and very likely
> no one use it since the problem to be addressed can easily be exposed by
> sync polled dio sanity test or 'blktests block/007'.
> 
> I'd suggest to switch sync polled dio into non-polled silently, just with
> logging sort of 'sync polled io is obsolete, please use io_uring for io
> polling', or any other idea?

We've carefully define RWF_HIPRI to just be a hint.  So I don't think
logging an error would be all that useful.
diff mbox series

Patch

diff --git a/block/fops.c b/block/fops.c
index 9f2ecec406b0..c9bac700e072 100644
--- a/block/fops.c
+++ b/block/fops.c
@@ -101,11 +101,16 @@  static ssize_t __blkdev_direct_IO_simple(struct kiocb *iocb,
 		bio_set_polled(&bio, iocb);
 
 	submit_bio(&bio);
+	/* make sure the bio is issued before polling */
+	if (bio.bi_opf & REQ_POLLED)
+		blk_flush_plug(current->plug, false);
 	for (;;) {
 		set_current_state(TASK_UNINTERRUPTIBLE);
 		if (!READ_ONCE(bio.bi_private))
 			break;
-		if (!(iocb->ki_flags & IOCB_HIPRI) || !bio_poll(&bio, NULL, 0))
+		if (bio.bi_opf & REQ_POLLED)
+			bio_poll(&bio, NULL, 0);
+		else
 			blk_io_schedule();
 	}
 	__set_current_state(TASK_RUNNING);
diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index b08f5dc31780..e67d2f63a163 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -654,8 +654,11 @@  __iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
 			if (!READ_ONCE(dio->submit.waiter))
 				break;
 
-			if (!dio->submit.poll_bio ||
-			    !bio_poll(dio->submit.poll_bio, NULL, 0))
+			if (dio->submit.poll_bio &&
+					(dio->submit.poll_bio->bi_opf &
+						REQ_POLLED))
+				bio_poll(dio->submit.poll_bio, NULL, 0);
+			else
 				blk_io_schedule();
 		}
 		__set_current_state(TASK_RUNNING);
diff --git a/mm/page_io.c b/mm/page_io.c
index b417f000b49e..16f2a63e2524 100644
--- a/mm/page_io.c
+++ b/mm/page_io.c
@@ -421,12 +421,18 @@  int swap_readpage(struct page *page, bool synchronous)
 	count_vm_event(PSWPIN);
 	bio_get(bio);
 	submit_bio(bio);
+
+	/* make sure the bio is issued before polling */
+	if (bio->bi_opf & REQ_POLLED)
+		blk_flush_plug(current->plug, false);
 	while (synchronous) {
 		set_current_state(TASK_UNINTERRUPTIBLE);
 		if (!READ_ONCE(bio->bi_private))
 			break;
 
-		if (!bio_poll(bio, NULL, 0))
+		if (bio->bi_opf & REQ_POLLED)
+			bio_poll(bio, NULL, 0);
+		else
 			blk_io_schedule();
 	}
 	__set_current_state(TASK_RUNNING);