diff mbox series

mm: don't read i_size of inode unless we need it

Message ID 6b67981f-57d4-c80e-bc07-6020aa601381@kernel.dk (mailing list archive)
State New, archived
Headers show
Series mm: don't read i_size of inode unless we need it | expand

Commit Message

Jens Axboe Oct. 26, 2021, 6:15 p.m. UTC
We always go through i_size_read(), and we rarely end up needing it. Push
the read to down where we need to check it, which avoids it for most
cases.

It looks like we can even remove this check entirely, which might be
worth pursuing. But at least this takes it out of the hot path.

Acked-by: Chris Mason <clm@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

---

I came across this and wrote the patch the other day, then Pavel pointed
me at his original posting of a very similar patch back in August.
Discussed it with Chris, and it sure _seems_ like this would be fine.

In an attempt to move the original discussion forward, here's this
posting.

Comments

Chris Mason Oct. 26, 2021, 7:11 p.m. UTC | #1
> On Oct 26, 2021, at 2:15 PM, Jens Axboe <axboe@kernel.dk> wrote:
> 
> We always go through i_size_read(), and we rarely end up needing it. Push
> the read to down where we need to check it, which avoids it for most
> cases.
> 
> It looks like we can even remove this check entirely, which might be
> worth pursuing. But at least this takes it out of the hot path.
> 
> Acked-by: Chris Mason <clm@fb.com>
> Signed-off-by: Jens Axboe <axboe@kernel.dk>
> 
> ---
> 
> I came across this and wrote the patch the other day, then Pavel pointed
> me at his original posting of a very similar patch back in August.
> Discussed it with Chris, and it sure _seems_ like this would be fine.
> 
> In an attempt to move the original discussion forward, here's this
> posting.
> 

I had the same concerns Dave Chinner did, but I think the i_size check inside generic_file_read_iter() is dead code at this point.  Checking ki_pos against i_size was added for Btrfs:

commit 66f998f611897319b555364cefd5d6e88a205866
Author: Josef Bacik <josef@redhat.com>
Date:   Sun May 23 11:00:54 2010 -0400

    fs: allow short direct-io reads to be completed via buffered IO

And we’ve switched to btrfs_file_read_iter(), which does the check the same way PavelJens have done it here.

I don’t think checking i_size before or after O_DIRECT makes the race fundamentally different.  We might return a short read at different  times than we did before, but we won’t be returning stale/incorrect data.

-chris

> diff --git a/mm/filemap.c b/mm/filemap.c
> index 44b4b551e430..850920276846 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -2736,9 +2736,7 @@ generic_file_read_iter(struct kiocb *iocb, struct iov_iter *iter)
> 		struct file *file = iocb->ki_filp;
> 		struct address_space *mapping = file->f_mapping;
> 		struct inode *inode = mapping->host;
> -		loff_t size;
> 
> -		size = i_size_read(inode);
> 		if (iocb->ki_flags & IOCB_NOWAIT) {
> 			if (filemap_range_needs_writeback(mapping, iocb->ki_pos,
> 						iocb->ki_pos + count - 1))
> @@ -2770,8 +2768,9 @@ generic_file_read_iter(struct kiocb *iocb, struct iov_iter *iter)
> 		 * the rest of the read.  Buffered reads will not work for
> 		 * DAX files, so don't bother trying.
> 		 */
> -		if (retval < 0 || !count || iocb->ki_pos >= size ||
> -		    IS_DAX(inode))
> +		if (retval < 0 || !count || IS_DAX(inode))
> +			return retval;
> +		if (iocb->ki_pos >= i_size_read(inode))
> 			return retval;
> 	}
>
Jens Axboe Oct. 27, 2021, 3:50 p.m. UTC | #2
On 10/26/21 1:11 PM, Chris Mason wrote:
> 
>> On Oct 26, 2021, at 2:15 PM, Jens Axboe <axboe@kernel.dk> wrote:
>>
>> We always go through i_size_read(), and we rarely end up needing it. Push
>> the read to down where we need to check it, which avoids it for most
>> cases.
>>
>> It looks like we can even remove this check entirely, which might be
>> worth pursuing. But at least this takes it out of the hot path.
>>
>> Acked-by: Chris Mason <clm@fb.com>
>> Signed-off-by: Jens Axboe <axboe@kernel.dk>
>>
>> ---
>>
>> I came across this and wrote the patch the other day, then Pavel pointed
>> me at his original posting of a very similar patch back in August.
>> Discussed it with Chris, and it sure _seems_ like this would be fine.
>>
>> In an attempt to move the original discussion forward, here's this
>> posting.
>>
> 
> I had the same concerns Dave Chinner did, but I think the i_size check
> inside generic_file_read_iter() is dead code at this point.  Checking
> ki_pos against i_size was added for Btrfs:
> 
> commit 66f998f611897319b555364cefd5d6e88a205866
> Author: Josef Bacik <josef@redhat.com>
> Date:   Sun May 23 11:00:54 2010 -0400
> 
>     fs: allow short direct-io reads to be completed via buffered IO
> 
> And we’ve switched to btrfs_file_read_iter(), which does the check the
> same way PavelJens have done it here.
> 
> I don’t think checking i_size before or after O_DIRECT makes the race
> fundamentally different.  We might return a short read at different
> times than we did before, but we won’t be returning stale/incorrect
> data.

Andrew, can you queue this one up in the mm branch?
Christoph Hellwig Oct. 28, 2021, 2:19 p.m. UTC | #3
Btw, given that you're micro-optimizing in this area:

block devices still use ->direct_IO in the I/O path.  It might make
sense to switch to the model of the file systems that use iomap where
we avoid that indirect call and can optimize the code for the direct
I/O fast path.  With that you woudn't even end up using this i_size_read
at all
Jens Axboe Oct. 28, 2021, 3 p.m. UTC | #4
On 10/28/21 8:19 AM, Christoph Hellwig wrote:
> Btw, given that you're micro-optimizing in this area:
> 
> block devices still use ->direct_IO in the I/O path.  It might make
> sense to switch to the model of the file systems that use iomap where
> we avoid that indirect call and can optimize the code for the direct
> I/O fast path.  With that you woudn't even end up using this i_size_read
> at all

It's not a bad idea, we do kind of jump through some hoops here by
calling generic_file_iter_read(), we should just check for dio upfront
and handle that, falling back to filemap_read() if needed.
diff mbox series

Patch

diff --git a/mm/filemap.c b/mm/filemap.c
index 44b4b551e430..850920276846 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2736,9 +2736,7 @@  generic_file_read_iter(struct kiocb *iocb, struct iov_iter *iter)
 		struct file *file = iocb->ki_filp;
 		struct address_space *mapping = file->f_mapping;
 		struct inode *inode = mapping->host;
-		loff_t size;
 
-		size = i_size_read(inode);
 		if (iocb->ki_flags & IOCB_NOWAIT) {
 			if (filemap_range_needs_writeback(mapping, iocb->ki_pos,
 						iocb->ki_pos + count - 1))
@@ -2770,8 +2768,9 @@  generic_file_read_iter(struct kiocb *iocb, struct iov_iter *iter)
 		 * the rest of the read.  Buffered reads will not work for
 		 * DAX files, so don't bother trying.
 		 */
-		if (retval < 0 || !count || iocb->ki_pos >= size ||
-		    IS_DAX(inode))
+		if (retval < 0 || !count || IS_DAX(inode))
+			return retval;
+		if (iocb->ki_pos >= i_size_read(inode))
 			return retval;
 	}