Message ID | 20210209023008.76263-4-axboe@kernel.dk (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Improve IOCB_NOWAIT O_DIRECT reads | expand |
On Mon, Feb 08, 2021 at 07:30:08PM -0700, Jens Axboe wrote: > + if (iocb->ki_flags & IOCB_NOWAIT) { > + if (filemap_range_needs_writeback(mapping, pos, end)) { > + ret = -EAGAIN; > + goto out_free_dio; > + } > + flags |= IOMAP_NOWAIT; > + } > if (iter_is_iovec(iter)) > dio->flags |= IOMAP_DIO_DIRTY; > } else { > + if (iocb->ki_flags & IOCB_NOWAIT) { > + if (filemap_range_has_page(mapping, pos, end)) { > + ret = -EAGAIN; > + goto out_free_dio; > + } > + flags |= IOMAP_NOWAIT; > + } > + > flags |= IOMAP_WRITE; > dio->flags |= IOMAP_DIO_WRITE; > > @@ -478,14 +493,6 @@ __iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter, > dio->flags |= IOMAP_DIO_WRITE_FUA; > } > > - if (iocb->ki_flags & IOCB_NOWAIT) { > - if (filemap_range_has_page(mapping, pos, end)) { > - ret = -EAGAIN; > - goto out_free_dio; > - } > - flags |= IOMAP_NOWAIT; > - } looking at this I really hate the scheme with the potential racyness and duplicated page looksups. Why can't we pass a nonblock flag to filemap_write_and_wait_range and invalidate_inode_pages2_range that makes them return -EAGAIN when they would block to clean this whole mess up?
On 2/9/21 12:51 AM, Christoph Hellwig wrote: > On Mon, Feb 08, 2021 at 07:30:08PM -0700, Jens Axboe wrote: >> + if (iocb->ki_flags & IOCB_NOWAIT) { >> + if (filemap_range_needs_writeback(mapping, pos, end)) { >> + ret = -EAGAIN; >> + goto out_free_dio; >> + } >> + flags |= IOMAP_NOWAIT; >> + } >> if (iter_is_iovec(iter)) >> dio->flags |= IOMAP_DIO_DIRTY; >> } else { >> + if (iocb->ki_flags & IOCB_NOWAIT) { >> + if (filemap_range_has_page(mapping, pos, end)) { >> + ret = -EAGAIN; >> + goto out_free_dio; >> + } >> + flags |= IOMAP_NOWAIT; >> + } >> + >> flags |= IOMAP_WRITE; >> dio->flags |= IOMAP_DIO_WRITE; >> >> @@ -478,14 +493,6 @@ __iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter, >> dio->flags |= IOMAP_DIO_WRITE_FUA; >> } >> >> - if (iocb->ki_flags & IOCB_NOWAIT) { >> - if (filemap_range_has_page(mapping, pos, end)) { >> - ret = -EAGAIN; >> - goto out_free_dio; >> - } >> - flags |= IOMAP_NOWAIT; >> - } > > looking at this I really hate the scheme with the potential racyness > and duplicated page looksups. Me too > Why can't we pass a nonblock flag to filemap_write_and_wait_range > and invalidate_inode_pages2_range that makes them return -EAGAIN > when they would block to clean this whole mess up? We could, but that's a _lot_ of surgery. I'd rather live with the slight race for now instead of teaching writepages, page laundering, etc about IOCB_NOWAIT. I do think that's a worthy long term goal, but we dio read situation is bad enough that it warrants a quicker fix.
diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c index 933f234d5bec..5b111d1635ab 100644 --- a/fs/iomap/direct-io.c +++ b/fs/iomap/direct-io.c @@ -458,9 +458,24 @@ __iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter, if (pos >= dio->i_size) goto out_free_dio; + if (iocb->ki_flags & IOCB_NOWAIT) { + if (filemap_range_needs_writeback(mapping, pos, end)) { + ret = -EAGAIN; + goto out_free_dio; + } + flags |= IOMAP_NOWAIT; + } if (iter_is_iovec(iter)) dio->flags |= IOMAP_DIO_DIRTY; } else { + if (iocb->ki_flags & IOCB_NOWAIT) { + if (filemap_range_has_page(mapping, pos, end)) { + ret = -EAGAIN; + goto out_free_dio; + } + flags |= IOMAP_NOWAIT; + } + flags |= IOMAP_WRITE; dio->flags |= IOMAP_DIO_WRITE; @@ -478,14 +493,6 @@ __iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter, dio->flags |= IOMAP_DIO_WRITE_FUA; } - if (iocb->ki_flags & IOCB_NOWAIT) { - if (filemap_range_has_page(mapping, pos, end)) { - ret = -EAGAIN; - goto out_free_dio; - } - flags |= IOMAP_NOWAIT; - } - ret = filemap_write_and_wait_range(mapping, pos, end); if (ret) goto out_free_dio;
For reads, use the better variant of checking for the need to call filemap_write_and_wait_range() when doing O_DIRECT. This avoids falling back to the slow path for IOCB_NOWAIT, if there are no pages to wait for (or write out). Signed-off-by: Jens Axboe <axboe@kernel.dk> --- fs/iomap/direct-io.c | 23 +++++++++++++++-------- 1 file changed, 15 insertions(+), 8 deletions(-)