[PATCHSET,v2,0/3] Improve IOCB_NOWAIT O_DIRECT reads

Message ID 20210209023008.76263-1-axboe@kernel.dk (mailing list archive)

Message

Jens Axboe Feb. 9, 2021, 2:30 a.m. UTC
Hi,

For v1, see:

https://lore.kernel.org/linux-fsdevel/20210208221829.17247-1-axboe@kernel.dk/

tl;dr: don't -EAGAIN IOCB_NOWAIT dio reads just because we have page cache
entries for the given range. This causes unnecessary work on the caller's
side, when the IO could have been issued totally fine: with no writeback
pending, there is nothing to block on.
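
As a rough illustration of the behavior change described above, here is a
minimal user-space sketch, not the kernel code. The enum, the array-based
"page cache", and the function names toy_range_has_page(),
toy_range_needs_writeback(), toy_nowait_read_old() and toy_nowait_read_new()
are all hypothetical stand-ins. The toy treats "dirty" or "under writeback"
as the conditions that need writeback; the exact per-page checks in the real
filemap_range_needs_writeback() are not shown in this thread.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy per-page state; a stand-in for the kernel's page flags. */
enum toy_page_state { TOY_ABSENT, TOY_CLEAN, TOY_DIRTY, TOY_WRITEBACK };

/* Old-style check: any cached page in [start, end] counts. */
bool toy_range_has_page(const enum toy_page_state *cache,
                        size_t start, size_t end)
{
    for (size_t i = start; i <= end; i++)
        if (cache[i] != TOY_ABSENT)
            return true;
    return false;
}

/* New-style check: only pages that would need writeback count. */
bool toy_range_needs_writeback(const enum toy_page_state *cache,
                               size_t start, size_t end)
{
    for (size_t i = start; i <= end; i++)
        if (cache[i] == TOY_DIRTY || cache[i] == TOY_WRITEBACK)
            return true;
    return false;
}

/* Before: a NOWAIT dio read bails out on any page cache in the range. */
int toy_nowait_read_old(const enum toy_page_state *cache,
                        size_t start, size_t end)
{
    return toy_range_has_page(cache, start, end) ? -EAGAIN : 0;
}

/* After: it bails out only if it would have to wait for writeback. */
int toy_nowait_read_new(const enum toy_page_state *cache,
                        size_t start, size_t end)
{
    return toy_range_needs_writeback(cache, start, end) ? -EAGAIN : 0;
}
```

With a clean cached page in the range, the old variant returns -EAGAIN while
the new one lets the read proceed; with a dirty page, both return -EAGAIN.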

 fs/iomap/direct-io.c | 23 ++++++++++++++--------
 include/linux/fs.h   |  2 ++
 mm/filemap.c         | 47 ++++++++++++++++++++++++++++++++++++++++++--
 3 files changed, 62 insertions(+), 10 deletions(-)

Since v1:

- Simplify the filemap_range_needs_writeback() loop (Willy)
- Drop the write side (Chinner)

Comments

Andrew Morton Feb. 9, 2021, 7:55 p.m. UTC | #1
On Mon,  8 Feb 2021 19:30:05 -0700 Jens Axboe <axboe@kernel.dk> wrote:

> Hi,
> 
> For v1, see:
> 
> https://lore.kernel.org/linux-fsdevel/20210208221829.17247-1-axboe@kernel.dk/
> 
> tl;dr: don't -EAGAIN IOCB_NOWAIT dio reads just because we have page cache
> entries for the given range. This causes unnecessary work on the caller's
> side, when the IO could have been issued totally fine: with no writeback
> pending, there is nothing to block on.
> 

Seems a good idea.  Obviously we'll do more work in the case where some
writeback needs doing, but we'll be doing synchronous writeout in that
case anyway so who cares.

Please remind me what prevents pages from becoming dirty during or
immediately after the filemap_range_needs_writeback() check?  Perhaps
filemap_range_needs_writeback() could have a comment explaining what it
is that keeps its return value true after it has returned it!

Jens Axboe Feb. 9, 2021, 8:11 p.m. UTC | #2
On 2/9/21 12:55 PM, Andrew Morton wrote:
> Seems a good idea.  Obviously we'll do more work in the case where some
> writeback needs doing, but we'll be doing synchronous writeout in that
> case anyway so who cares.

Right, I think that'll be a round two on top of this, so we can make the
write side happier too. That's a bit more involved...

> Please remind me what prevents pages from becoming dirty during or
> immediately after the filemap_range_needs_writeback() check?  Perhaps
> filemap_range_needs_writeback() could have a comment explaining what it
> is that keeps its return value true after it has returned it!

It's inherently racy, just like it is now. There's really no difference
there, and I don't think there's a way to close that. Even if you
modified filemap_write_and_wait_range() to be non-block friendly,
there's nothing stopping anyone from adding dirty page cache right after
that call.
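
To make the check-then-act window concrete, here is a small single-threaded
sketch. It is a toy model only: struct toy_page, toy_needs_writeback() and
toy_check_went_stale() are hypothetical names, and the "other task" is
simulated by a plain store between the check and the point where the IO
would be issued.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy page with a single dirty bit. */
struct toy_page {
    bool dirty;
};

/* Returns true if any page in the array would need writeback. */
bool toy_needs_writeback(const struct toy_page *pages, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (pages[i].dirty)
            return true;
    return false;
}

/*
 * Simulates: check the range, then another task dirties page `victim`
 * before the IO is issued. Returns true if the answer obtained at check
 * time no longer matches the state at issue time.
 */
bool toy_check_went_stale(struct toy_page *pages, size_t n, size_t victim)
{
    bool at_check = toy_needs_writeback(pages, n);

    pages[victim].dirty = true; /* simulated concurrent dirtying */

    bool at_issue = toy_needs_writeback(pages, n);

    return at_check != at_issue;
}
```

The check's result is only a snapshot: starting from an all-clean range, the
answer is already out of date by the time the IO would be submitted.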

Sedat Dilek Feb. 10, 2021, 8:07 a.m. UTC | #3
Jens, do you have some numbers before and after your patchset is applied?

And kindly a test "profile" for FIO :-)?

Thanks.

- Sedat -

Jens Axboe Feb. 10, 2021, 2:47 p.m. UTC | #4
On 2/10/21 1:07 AM, Sedat Dilek wrote:
> Jens, do you have some numbers before and after your patchset is applied?

I don't, the load was pretty light for the test case - it was just doing
33-34K of O_DIRECT 4k random reads in a pretty small range of the device.
When you end up having page cache in that range, that means you end up
punting a LOT of requests to the async worker. So it wasn't as much a
performance win for this particular case, but an efficiency win. You get
rid of a worker using 40% CPU, and reduce the latencies.

> And kindly a test "profile" for FIO :-)?

To reproduce this, run dio random reads over a small range, and have
something else do a few buffered reads from the same range.
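
A minimal user-space sketch of that recipe follows. It is not the test that
was actually run: it uses preadv2() with RWF_NOWAIT as a stand-in for the
async submission path, on the assumption that this reaches the same
IOCB_NOWAIT check. The names probe_offset(), probe_run(), struct probe_stats
and the "every 64th read is buffered" ratio are hypothetical choices for
illustration. The target file must live on a filesystem that supports
O_DIRECT and nonblocking reads; otherwise the reads land in other_errors.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

#define PROBE_BLOCK 4096u

struct probe_stats {
    long completed;     /* NOWAIT dio reads that went through */
    long eagain;        /* NOWAIT dio reads bounced with EAGAIN */
    long other_errors;  /* anything else, e.g. EOPNOTSUPP */
};

/* xorshift64: returns a block-aligned offset in [0, range_bytes).
 * *state must be nonzero and range_bytes >= PROBE_BLOCK. */
off_t probe_offset(uint64_t *state, off_t range_bytes)
{
    uint64_t x = *state;

    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    *state = x;
    return (off_t)(x % (uint64_t)(range_bytes / PROBE_BLOCK)) * PROBE_BLOCK;
}

/* Returns 0 on success, -1 if setup (open/alloc/arguments) failed. */
int probe_run(const char *path, off_t range_bytes, long iterations,
              struct probe_stats *st)
{
    uint64_t rng = 0x9e3779b97f4a7c15ull;
    char tmp[PROBE_BLOCK];
    void *buf = NULL;
    int dio_fd, buf_fd;

    if (range_bytes < (off_t)PROBE_BLOCK)
        return -1;
    dio_fd = open(path, O_RDONLY | O_DIRECT);
    if (dio_fd < 0)
        return -1;
    buf_fd = open(path, O_RDONLY);
    if (buf_fd < 0 || posix_memalign(&buf, PROBE_BLOCK, PROBE_BLOCK) != 0) {
        if (buf_fd >= 0)
            close(buf_fd);
        close(dio_fd);
        return -1;
    }

    st->completed = st->eagain = st->other_errors = 0;
    for (long i = 0; i < iterations; i++) {
        struct iovec iov = { .iov_base = buf, .iov_len = PROBE_BLOCK };
        off_t off = probe_offset(&rng, range_bytes);
        ssize_t ret;

        /* Occasionally do a buffered read to leave clean page cache behind. */
        if (i % 64 == 0 && pread(buf_fd, tmp, sizeof(tmp), off) < 0)
            st->other_errors++;

        ret = preadv2(dio_fd, &iov, 1, off, RWF_NOWAIT);
        if (ret >= 0)
            st->completed++;
        else if (errno == EAGAIN)
            st->eagain++;
        else
            st->other_errors++;
    }

    free(buf);
    close(buf_fd);
    close(dio_fd);
    return 0;
}
```

Calling probe_run() on a test file with a small range_bytes and comparing the
eagain count on kernels with and without the series should show whether clean
page cache alone still bounces the reads.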