Message ID | 6b67981f-57d4-c80e-bc07-6020aa601381@kernel.dk (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | mm: don't read i_size of inode unless we need it | expand |
> On Oct 26, 2021, at 2:15 PM, Jens Axboe <axboe@kernel.dk> wrote: > > We always go through i_size_read(), and we rarely end up needing it. Push > the read to down where we need to check it, which avoids it for most > cases. > > It looks like we can even remove this check entirely, which might be > worth pursuing. But at least this takes it out of the hot path. > > Acked-by: Chris Mason <clm@fb.com> > Signed-off-by: Jens Axboe <axboe@kernel.dk> > > --- > > I came across this and wrote the patch the other day, then Pavel pointed > me at his original posting of a very similar patch back in August. > Discussed it with Chris, and it sure _seems_ like this would be fine. > > In an attempt to move the original discussion forward, here's this > posting. > I had the same concerns Dave Chinner did, but I think the i_size check inside generic_file_read_iter() is dead code at this point. Checking ki_pos against i_size was added for Btrfs: commit 66f998f611897319b555364cefd5d6e88a205866 Author: Josef Bacik <josef@redhat.com> Date: Sun May 23 11:00:54 2010 -0400 fs: allow short direct-io reads to be completed via buffered IO And we’ve switched to btrfs_file_read_iter(), which does the check the same way PavelJens have done it here. I don’t think checking i_size before or after O_DIRECT makes the race fundamentally different. We might return a short read at different times than we did before, but we won’t be returning stale/incorrect data. -chris > diff --git a/mm/filemap.c b/mm/filemap.c > index 44b4b551e430..850920276846 100644 > --- a/mm/filemap.c > +++ b/mm/filemap.c > @@ -2736,9 +2736,7 @@ generic_file_read_iter(struct kiocb *iocb, struct iov_iter *iter) > struct file *file = iocb->ki_filp; > struct address_space *mapping = file->f_mapping; > struct inode *inode = mapping->host; > - loff_t size; > > - size = i_size_read(inode); > if (iocb->ki_flags & IOCB_NOWAIT) { > if (filemap_range_needs_writeback(mapping, iocb->ki_pos, > iocb->ki_pos + count - 1)) > @@ -2770,8 +2768,9 @@ generic_file_read_iter(struct kiocb *iocb, struct iov_iter *iter) > * the rest of the read. Buffered reads will not work for > * DAX files, so don't bother trying. > */ > - if (retval < 0 || !count || iocb->ki_pos >= size || > - IS_DAX(inode)) > + if (retval < 0 || !count || IS_DAX(inode)) > + return retval; > + if (iocb->ki_pos >= i_size_read(inode)) > return retval; > } >
On 10/26/21 1:11 PM, Chris Mason wrote: > >> On Oct 26, 2021, at 2:15 PM, Jens Axboe <axboe@kernel.dk> wrote: >> >> We always go through i_size_read(), and we rarely end up needing it. Push >> the read to down where we need to check it, which avoids it for most >> cases. >> >> It looks like we can even remove this check entirely, which might be >> worth pursuing. But at least this takes it out of the hot path. >> >> Acked-by: Chris Mason <clm@fb.com> >> Signed-off-by: Jens Axboe <axboe@kernel.dk> >> >> --- >> >> I came across this and wrote the patch the other day, then Pavel pointed >> me at his original posting of a very similar patch back in August. >> Discussed it with Chris, and it sure _seems_ like this would be fine. >> >> In an attempt to move the original discussion forward, here's this >> posting. >> > > I had the same concerns Dave Chinner did, but I think the i_size check > inside generic_file_read_iter() is dead code at this point. Checking > ki_pos against i_size was added for Btrfs: > > commit 66f998f611897319b555364cefd5d6e88a205866 > Author: Josef Bacik <josef@redhat.com> > Date: Sun May 23 11:00:54 2010 -0400 > > fs: allow short direct-io reads to be completed via buffered IO > > And we’ve switched to btrfs_file_read_iter(), which does the check the > same way PavelJens have done it here. > > I don’t think checking i_size before or after O_DIRECT makes the race > fundamentally different. We might return a short read at different > times than we did before, but we won’t be returning stale/incorrect > data. Andrew, can you queue this one up in the mm branch?
Btw, given that you're micro-optimizing in this area: block devices still use ->direct_IO in the I/O path. It might make sense to switch to the model of the file systems that use iomap where we avoid that indirect call and can optimize the code for the direct I/O fast path. With that you woudn't even end up using this i_size_read at all
On 10/28/21 8:19 AM, Christoph Hellwig wrote: > Btw, given that you're micro-optimizing in this area: > > block devices still use ->direct_IO in the I/O path. It might make > sense to switch to the model of the file systems that use iomap where > we avoid that indirect call and can optimize the code for the direct > I/O fast path. With that you woudn't even end up using this i_size_read > at all It's not a bad idea, we do kind of jump through some hoops here by calling generic_file_iter_read(), we should just check for dio upfront and handle that, falling back to filemap_read() if needed.
diff --git a/mm/filemap.c b/mm/filemap.c index 44b4b551e430..850920276846 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@ -2736,9 +2736,7 @@ generic_file_read_iter(struct kiocb *iocb, struct iov_iter *iter) struct file *file = iocb->ki_filp; struct address_space *mapping = file->f_mapping; struct inode *inode = mapping->host; - loff_t size; - size = i_size_read(inode); if (iocb->ki_flags & IOCB_NOWAIT) { if (filemap_range_needs_writeback(mapping, iocb->ki_pos, iocb->ki_pos + count - 1)) @@ -2770,8 +2768,9 @@ generic_file_read_iter(struct kiocb *iocb, struct iov_iter *iter) * the rest of the read. Buffered reads will not work for * DAX files, so don't bother trying. */ - if (retval < 0 || !count || iocb->ki_pos >= size || - IS_DAX(inode)) + if (retval < 0 || !count || IS_DAX(inode)) + return retval; + if (iocb->ki_pos >= i_size_read(inode)) return retval; }