block: don't ignore REQ_NOWAIT for direct IO

Message ID: 546c66d26ae71abc151aa2074c3dd75ff5efb529.1605892141.git.asml.silence@gmail.com
State: New, archived

Commit Message

Pavel Begunkov Nov. 20, 2020, 5:10 p.m. UTC
io_uring's direct nowait requests end up waiting on io_schedule() in
sbitmap, which seems to be because blkdev_direct_IO() fails to
propagate IOCB_NOWAIT to a bio and hence to blk-mq.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 fs/block_dev.c | 4 ++++
 1 file changed, 4 insertions(+)

Comments

Pavel Begunkov Nov. 20, 2020, 5:13 p.m. UTC | #1
On 20/11/2020 17:10, Pavel Begunkov wrote:
> io_uring's direct nowait requests end up waiting on io_schedule() in
> sbitmap, which seems to be because blkdev_direct_IO() fails to
> propagate IOCB_NOWAIT to a bio and hence to blk-mq.

I'll leave the judgement to those who know this code better,
but with the patch applied io_schedule() is gone from my traces.

> 
> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
> ---
>  fs/block_dev.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/fs/block_dev.c b/fs/block_dev.c
> index 9e84b1928b94..e7e860c78d93 100644
> --- a/fs/block_dev.c
> +++ b/fs/block_dev.c
> @@ -263,6 +263,8 @@ __blkdev_direct_IO_simple(struct kiocb *iocb, struct iov_iter *iter,
>  		bio.bi_opf = dio_bio_write_op(iocb);
>  		task_io_account_write(ret);
>  	}
> +	if (iocb->ki_flags & IOCB_NOWAIT)
> +		bio.bi_opf |= REQ_NOWAIT;
>  	if (iocb->ki_flags & IOCB_HIPRI)
>  		bio_set_polled(&bio, iocb);
>  
> @@ -416,6 +418,8 @@ __blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter, int nr_pages)
>  			bio->bi_opf = dio_bio_write_op(iocb);
>  			task_io_account_write(bio->bi_iter.bi_size);
>  		}
> +		if (iocb->ki_flags & IOCB_NOWAIT)
> +			bio->bi_opf |= REQ_NOWAIT;
>  
>  		dio->size += bio->bi_iter.bi_size;
>  		pos += bio->bi_iter.bi_size;
>

Jens Axboe Nov. 20, 2020, 7:13 p.m. UTC | #2
On 11/20/20 10:10 AM, Pavel Begunkov wrote:
> io_uring's direct nowait requests end up waiting on io_schedule() in
> sbitmap, which seems to be because blkdev_direct_IO() fails to
> propagate IOCB_NOWAIT to a bio and hence to blk-mq.
> 
> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
> ---
>  fs/block_dev.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/fs/block_dev.c b/fs/block_dev.c
> index 9e84b1928b94..e7e860c78d93 100644
> --- a/fs/block_dev.c
> +++ b/fs/block_dev.c
> @@ -263,6 +263,8 @@ __blkdev_direct_IO_simple(struct kiocb *iocb, struct iov_iter *iter,
>  		bio.bi_opf = dio_bio_write_op(iocb);
>  		task_io_account_write(ret);
>  	}
> +	if (iocb->ki_flags & IOCB_NOWAIT)
> +		bio.bi_opf |= REQ_NOWAIT;
>  	if (iocb->ki_flags & IOCB_HIPRI)
>  		bio_set_polled(&bio, iocb);

Was thinking this wasn't needed, but I guess that users could do sync && NOWAIT
and get -EAGAIN if using preadv2/pwritev2.

> @@ -416,6 +418,8 @@ __blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter, int nr_pages)
>  			bio->bi_opf = dio_bio_write_op(iocb);
>  			task_io_account_write(bio->bi_iter.bi_size);
>  		}
> +		if (iocb->ki_flags & IOCB_NOWAIT)
> +			bio->bi_opf |= REQ_NOWAIT;
>  
>  		dio->size += bio->bi_iter.bi_size;
>  		pos += bio->bi_iter.bi_size;

Looks fine to me, we definitely should not be waiting on tags for IOCB_NOWAIT
IO. Will run some shakedown and test for 5.11.
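
The sync && NOWAIT case mentioned in the comment above can be tried from user space with preadv2(). The following is a minimal sketch, not part of the patch: read_nowait() is a hypothetical helper name, and the fallback definition of RWF_NOWAIT is only there for older libc headers. It assumes a Linux system whose libc provides preadv2().

```c
#define _GNU_SOURCE           /* for preadv2() and RWF_NOWAIT in glibc */
#include <errno.h>
#include <sys/types.h>
#include <sys/uio.h>

#ifndef RWF_NOWAIT
#define RWF_NOWAIT 0x00000008 /* value from the Linux UAPI headers */
#endif

/*
 * Hypothetical helper (not from the patch): issue one synchronous read that
 * asks the kernel not to block. Returns the number of bytes read, or a
 * negative errno value. -EAGAIN means the read would have had to wait and
 * the caller may retry later or fall back to a blocking read.
 */
static ssize_t read_nowait(int fd, void *buf, size_t len, off_t offset)
{
	struct iovec iov = { .iov_base = buf, .iov_len = len };
	ssize_t ret = preadv2(fd, &iov, 1, offset, RWF_NOWAIT);

	return ret < 0 ? -errno : ret;
}
```

To exercise the direct IO path discussed here, the caller would open a block device with O_DIRECT and pass a suitably aligned buffer and length. With the patch applied, such a read is expected to be able to fail with -EAGAIN rather than sleep while waiting for a tag. An invalid descriptor still fails in the usual way, with -EBADF.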

Pavel Begunkov April 2, 2021, 2:24 p.m. UTC | #3
On 20/11/2020 19:13, Jens Axboe wrote:
> On 11/20/20 10:10 AM, Pavel Begunkov wrote:
>> io_uring's direct nowait requests end up waiting on io_schedule() in
>> sbitmap, which seems to be because blkdev_direct_IO() fails to
>> propagate IOCB_NOWAIT to a bio and hence to blk-mq.
>>
>> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
>> ---
>>  fs/block_dev.c | 4 ++++
>>  1 file changed, 4 insertions(+)
>>
>> diff --git a/fs/block_dev.c b/fs/block_dev.c
>> index 9e84b1928b94..e7e860c78d93 100644
>> --- a/fs/block_dev.c
>> +++ b/fs/block_dev.c
>> @@ -263,6 +263,8 @@ __blkdev_direct_IO_simple(struct kiocb *iocb, struct iov_iter *iter,
>>  		bio.bi_opf = dio_bio_write_op(iocb);
>>  		task_io_account_write(ret);
>>  	}
>> +	if (iocb->ki_flags & IOCB_NOWAIT)
>> +		bio.bi_opf |= REQ_NOWAIT;
>>  	if (iocb->ki_flags & IOCB_HIPRI)
>>  		bio_set_polled(&bio, iocb);
> 
> Was thinking this wasn't needed, but I guess that users could do sync && NOWAIT
> and get -EAGAIN if using preadv2/pwritev2.
> 
>> @@ -416,6 +418,8 @@ __blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter, int nr_pages)
>>  			bio->bi_opf = dio_bio_write_op(iocb);
>>  			task_io_account_write(bio->bi_iter.bi_size);
>>  		}
>> +		if (iocb->ki_flags & IOCB_NOWAIT)
>> +			bio->bi_opf |= REQ_NOWAIT;
>>  
>>  		dio->size += bio->bi_iter.bi_size;
>>  		pos += bio->bi_iter.bi_size;
> 
> Looks fine to me, we definitely should not be waiting on tags for IOCB_NOWAIT
> IO. Will run some shakedown and test for 5.11.
> 

up

Jens Axboe April 2, 2021, 2:34 p.m. UTC | #4
On 11/20/20 10:10 AM, Pavel Begunkov wrote:
> io_uring's direct nowait requests end up waiting on io_schedule() in
> sbitmap, which seems to be because blkdev_direct_IO() fails to
> propagate IOCB_NOWAIT to a bio and hence to blk-mq.

Thanks, applied. This slipped through the cracks, and I didn't notice
until I went and directly tested some of this...

iomap suffers from the same issue, fwiw.
Patch

diff --git a/fs/block_dev.c b/fs/block_dev.c
index 9e84b1928b94..e7e860c78d93 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -263,6 +263,8 @@ __blkdev_direct_IO_simple(struct kiocb *iocb, struct iov_iter *iter,
 		bio.bi_opf = dio_bio_write_op(iocb);
 		task_io_account_write(ret);
 	}
+	if (iocb->ki_flags & IOCB_NOWAIT)
+		bio.bi_opf |= REQ_NOWAIT;
 	if (iocb->ki_flags & IOCB_HIPRI)
 		bio_set_polled(&bio, iocb);
 
@@ -416,6 +418,8 @@ __blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter, int nr_pages)
 			bio->bi_opf = dio_bio_write_op(iocb);
 			task_io_account_write(bio->bi_iter.bi_size);
 		}
+		if (iocb->ki_flags & IOCB_NOWAIT)
+			bio->bi_opf |= REQ_NOWAIT;
 
 		dio->size += bio->bi_iter.bi_size;
 		pos += bio->bi_iter.bi_size;
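
As a reading aid, both hunks above apply the same rule: if the kiocb carries the nowait flag, the bio's operation flags gain the request-level nowait flag. The stand-alone sketch below models only that rule in user space. The SKETCH_* constants and sketch_dio_bio_flags() are hypothetical stand-ins, not the kernel's definitions, whose values may differ between versions.

```c
#include <stdint.h>

/* Stand-in flag values for illustration only; the real IOCB_* and REQ_*
 * definitions live in the kernel headers. */
#define SKETCH_IOCB_HIPRI  (1u << 0)
#define SKETCH_IOCB_NOWAIT (1u << 3)
#define SKETCH_REQ_NOWAIT  (1u << 21)

/*
 * Hypothetical helper: return the bio operation flags after applying the
 * rule the two hunks add inline. Flags other than the nowait flag are
 * passed through untouched.
 */
static inline uint32_t sketch_dio_bio_flags(uint32_t ki_flags, uint32_t opf)
{
	if (ki_flags & SKETCH_IOCB_NOWAIT)
		opf |= SKETCH_REQ_NOWAIT;
	return opf;
}
```

The patch itself open-codes the check in both __blkdev_direct_IO_simple() and __blkdev_direct_IO() rather than adding a helper. The function form here is only for illustration.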