Message ID | 20210209023008.76263-1-axboe@kernel.dk (mailing list archive) |
---|---|
Series | Improve IOCB_NOWAIT O_DIRECT reads |

On Mon, 8 Feb 2021 19:30:05 -0700 Jens Axboe <axboe@kernel.dk> wrote:

> Hi,
>
> For v1, see:
>
> https://lore.kernel.org/linux-fsdevel/20210208221829.17247-1-axboe@kernel.dk/
>
> tldr; don't -EAGAIN IOCB_NOWAIT dio reads just because we have page cache
> entries for the given range. This causes unnecessary work from the callers
> side, when the IO could have been issued totally fine without blocking on
> writeback when there is none.
>

Seems a good idea. Obviously we'll do more work in the case where some
writeback needs doing, but we'll be doing synchronous writeout in that
case anyway so who cares.

Please remind me what prevents pages from becoming dirty during or
immediately after the filemap_range_needs_writeback() check? Perhaps
filemap_range_needs_writeback() could have a comment explaining what it
is that keeps its return value true after it has returned it!

On 2/9/21 12:55 PM, Andrew Morton wrote:
> On Mon, 8 Feb 2021 19:30:05 -0700 Jens Axboe <axboe@kernel.dk> wrote:
>
>> Hi,
>>
>> For v1, see:
>>
>> https://lore.kernel.org/linux-fsdevel/20210208221829.17247-1-axboe@kernel.dk/
>>
>> tldr; don't -EAGAIN IOCB_NOWAIT dio reads just because we have page cache
>> entries for the given range. This causes unnecessary work from the callers
>> side, when the IO could have been issued totally fine without blocking on
>> writeback when there is none.
>>
>
> Seems a good idea. Obviously we'll do more work in the case where some
> writeback needs doing, but we'll be doing synchronous writeout in that
> case anyway so who cares.

Right, I think that'll be a round two on top of this, so we can make the
write side happier too. That's a bit more involved...

> Please remind me what prevents pages from becoming dirty during or
> immediately after the filemap_range_needs_writeback() check? Perhaps
> filemap_range_needs_writeback() could have a comment explaining what it
> is that keeps its return value true after it has returned it!

It's inherently racy, just like it is now. There's really no difference
there, and I don't think there's a way to close that. Even if you
modified filemap_write_and_wait_range() to be non-block friendly,
there's nothing stopping anyone from adding dirty page cache right after
that call.

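
For readers following along, the difference between the two checks, and the race discussed above, can be sketched with a small userspace toy model. This is not kernel code: `struct toy_page` and every `toy_*` function are hypothetical names made up for illustration, and the "needs writeback" condition is simplified to "dirty or under writeback".

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for per-page state; NOT the kernel's struct page. */
struct toy_page {
	bool cached;    /* page is present in the toy page cache */
	bool dirty;     /* page holds data that has not been written back */
	bool writeback; /* page is currently under writeback */
};

/* Old-style check: any cached page in [first, last] counts. */
static bool toy_range_has_page(const struct toy_page *pages,
			       size_t first, size_t last)
{
	assert(first <= last);
	for (size_t i = first; i <= last; i++)
		if (pages[i].cached)
			return true;
	return false;
}

/* New-style check: only cached pages that are dirty or under writeback count. */
static bool toy_range_needs_writeback(const struct toy_page *pages,
				      size_t first, size_t last)
{
	assert(first <= last);
	for (size_t i = first; i <= last; i++)
		if (pages[i].cached && (pages[i].dirty || pages[i].writeback))
			return true;
	return false;
}

/* Before the series (as described in the cover letter): cached pages => -EAGAIN. */
static int toy_nowait_dio_read_old(const struct toy_page *pages,
				   size_t first, size_t last)
{
	return toy_range_has_page(pages, first, last) ? -EAGAIN : 0;
}

/* After the series: -EAGAIN only if the read would have to wait on writeback. */
static int toy_nowait_dio_read_new(const struct toy_page *pages,
				   size_t first, size_t last)
{
	return toy_range_needs_writeback(pages, first, last) ? -EAGAIN : 0;
}
```

With one clean cached page in the range, the old variant returns -EAGAIN while the new one returns 0. Either return value is only a snapshot: nothing in the model (or, per the reply above, in the kernel) stops a page from being dirtied right after the check has returned.
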
On Tue, Feb 9, 2021 at 10:25 PM Jens Axboe <axboe@kernel.dk> wrote:
> [...]

Jens, do you have some numbers before and after your patchset is applied?

And kindly a test "profile" for FIO :-)?

Thanks.

- Sedat -

On 2/10/21 1:07 AM, Sedat Dilek wrote:
> Jens, do you have some numbers before and after your patchset is applied?

I don't, the load was pretty light for the test case - it was just doing
33-34K of O_DIRECT 4k random reads in a pretty small range of the device.
When you end up having page cache in that range, that means you end up
punting a LOT of requests to the async worker. So it wasn't as much a
performance win for this particular case, but an efficiency win. You get
rid of a worker using 40% CPU, and reduce the latencies.

> And kindly a test "profile" for FIO :-)?

To reproduce this, have a small range dio rand reads and then have
something else that does a few buffered reads from the same range.
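
A rough userspace probe along the lines of that description might look like the sketch below. It is not the test that was actually run: it assumes `preadv2()` with `RWF_NOWAIT` is a fair stand-in for an IOCB_NOWAIT read, and `nowait_read()`, `probe_nowait_dio()` and the 4k block size are choices made here for illustration. A buffered read populates the page cache for a block, and a non-blocking O_DIRECT read of the same block is attempted before and after.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* for O_DIRECT, preadv2() and RWF_NOWAIT */
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

#define BLK 4096

/* Issue one BLK-sized read with RWF_NOWAIT; returns bytes read or -errno. */
static long nowait_read(int fd, void *buf, off_t off)
{
	struct iovec iov = { .iov_base = buf, .iov_len = BLK };
	ssize_t ret;

	assert(((uintptr_t)buf % BLK) == 0); /* O_DIRECT wants aligned buffers */
	ret = preadv2(fd, &iov, 1, off, RWF_NOWAIT);
	return ret < 0 ? -(long)errno : (long)ret;
}

/*
 * Hypothetical driver: try a NOWAIT O_DIRECT read of block 0, populate the
 * page cache for that block with a buffered read, then try again.
 * Returns 0 if the probe ran, -errno if setup failed.
 */
static int probe_nowait_dio(const char *path)
{
	void *buf = NULL;
	int dfd, bfd, err;

	dfd = open(path, O_RDONLY | O_DIRECT);
	if (dfd < 0)
		return -errno;
	bfd = open(path, O_RDONLY);
	if (bfd < 0) {
		err = -errno;
		close(dfd);
		return err;
	}
	err = posix_memalign(&buf, BLK, BLK);
	if (err) {
		close(bfd);
		close(dfd);
		return -err;
	}

	printf("nowait dio read before buffered read: %ld\n",
	       nowait_read(dfd, buf, 0));
	if (pread(bfd, buf, BLK, 0) < 0) /* pulls block 0 into the page cache */
		perror("pread");
	printf("nowait dio read after buffered read:  %ld\n",
	       nowait_read(dfd, buf, 0));

	free(buf);
	close(bfd);
	close(dfd);
	return 0;
}
```

Calling `probe_nowait_dio()` on a file of at least 4k, on a filesystem that supports O_DIRECT, prints the two results. Going by the cover letter, a kernel without this series would be expected to report -EAGAIN (-11) for the second read, and a kernel with it would be expected to report 4096 when nothing in the range needs writeback. A NOWAIT read can return -EAGAIN for unrelated reasons too, and the block may already be cached before the first read, so treat the output as a hint, not a measurement.
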