
[PATCHSET,v4,0/8] Improve async iomap DIO performance

Message ID 20230720181310.71589-1-axboe@kernel.dk (mailing list archive)

Jens Axboe July 20, 2023, 6:13 p.m. UTC
Hi,

iomap always punts async dio write completions to a workqueue, which has
a cost in terms of efficiency (now you need an unrelated worker to
process it) and latency (now you're bouncing a completion through an
async worker, which is a classic slowdown scenario).

Even for writes that should, in theory, be able to complete inline,
if we race with truncate or need to invalidate pages post completion,
we cannot sanely be in IRQ context as the locking types don't allow
for that.

io_uring handles IRQ completions via task_work, and for writes that
don't need to do extra IO at completion time, we can safely complete
them inline from that. This patchset adds IOCB_DIO_DEFER, which an IO
issuer can set to inform the completion side that any extra work that
needs doing for that completion can be punted to a safe task context.

The iomap dio completion will happen in hard/soft irq context, and we
need a saner context to process these completions. IOCB_DIO_DEFER is
added, which can be set in a struct kiocb->ki_flags by the issuer. If
the completion side of the iocb handling understands this flag, it can
choose to set a kiocb->dio_complete() handler and just call ki_complete
from IRQ context. The issuer must then ensure that this callback is
processed from a task. io_uring punts IRQ completions to task_work
already, so it's trivial to wire it up to run more of the completion
before posting a CQE. This is good for up to a 37% improvement in
throughput/latency for low queue depth IO; patch 5 has the details.

If we need to do real work at completion time, iomap will clear the
IOMAP_DIO_DEFER_COMP flag.
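
As a rough illustration of the handoff described above, here is a minimal
userspace sketch in C. It is not the code from this series: every sketch_
name, the struct layouts, and the needs_extra_work field are hypothetical
stand-ins, and both the workqueue fallback and the task_work queueing are
modeled as plain flags rather than real kernel mechanisms.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the issuer-set flag; not the kernel's value. */
#define SKETCH_IOCB_DIO_DEFER (1u << 0)

/* Hypothetical, heavily reduced model of a kiocb. */
struct sketch_kiocb {
	unsigned int ki_flags;
	void (*ki_complete)(struct sketch_kiocb *iocb, long res);
	long (*dio_complete)(void *private); /* set by the completion side */
	void *private;
	long result;
	bool task_work_pending;
	bool posted;
};

/* Hypothetical model of the per-IO direct-IO state. */
struct sketch_dio {
	long bytes_done;
	bool needs_extra_work; /* e.g. extra IO needed at completion time */
	bool finished;
};

/* The part of the completion that must run in task context. */
static long sketch_dio_finish(void *private)
{
	struct sketch_dio *dio = private;

	dio->finished = true;
	return dio->bytes_done;
}

/*
 * Completion side, standing in for IRQ context. Returns true if the
 * completion was handed back to the issuer, false if it would have to
 * take the workqueue path (not modeled here).
 */
static bool sketch_irq_end_io(struct sketch_kiocb *iocb, struct sketch_dio *dio)
{
	if (!(iocb->ki_flags & SKETCH_IOCB_DIO_DEFER) || dio->needs_extra_work)
		return false;

	iocb->dio_complete = sketch_dio_finish;
	iocb->private = dio;
	iocb->ki_complete(iocb, 0);
	return true;
}

/* Issuer side: called from "IRQ context", so only note the pending work. */
static void sketch_issuer_ki_complete(struct sketch_kiocb *iocb, long res)
{
	iocb->result = res;
	iocb->task_work_pending = true;
}

/* Issuer side: runs later in task context and finishes the completion. */
static void sketch_issuer_task_work(struct sketch_kiocb *iocb)
{
	if (iocb->dio_complete)
		iocb->result = iocb->dio_complete(iocb->private);
	iocb->task_work_pending = false;
	iocb->posted = true; /* stands in for posting a CQE */
}
```

In this model, the issuer sets SKETCH_IOCB_DIO_DEFER and installs
sketch_issuer_ki_complete() before submitting; nothing is finished until
sketch_issuer_task_work() runs. When needs_extra_work is set, the
completion side declines and leaves dio_complete unset.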

This work came about when Andres tested low queue depth dio writes
for postgres and compared it to doing sync dio writes, showing that the
async processing slows us down a lot.

Dave, would appreciate your input on whether the logic is now right in
terms of when we can complete inline when DEFER is set!

 fs/iomap/direct-io.c | 154 +++++++++++++++++++++++++++++++++----------
 include/linux/fs.h   |  34 +++++++++-
 io_uring/rw.c        |  27 +++++++-
 3 files changed, 176 insertions(+), 39 deletions(-)

Can also be found in a git branch here:

https://git.kernel.dk/cgit/linux/log/?h=xfs-async-dio.4

Since v3:
- Add two patches for polled IO. One that completes inline if it's set
  at completion time, and one that cleans up the iocb->private handling
  and adds comments as to why they are only relevant on polled IO.
- Rename IOMAP_DIO_WRITE_FUA to IOMAP_DIO_STABLE_WRITE in conjunction
  with treating fua && vwc the same as !vwc.
- Address review comments from Christoph
- Add comments and expand commit messages, where appropriate.
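
One possible reading of the "fua && vwc the same as !vwc" item above, as a
small userspace sketch in C. The struct and function names are hypothetical
and do not come from the patches; the sketch only encodes the general idea
that a completed write is already stable either when the device has no
volatile write cache, or when it has one and the write can be issued with
FUA.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical description of a block device's cache capabilities. */
struct sketch_bdev {
	bool volatile_write_cache; /* "vwc" */
	bool supports_fua;
};

/* True if a write can be made stable without a separate cache flush. */
static bool sketch_can_skip_flush(const struct sketch_bdev *bdev)
{
	return !bdev->volatile_write_cache || bdev->supports_fua;
}

/* True if the write itself must carry FUA to be stable on completion. */
static bool sketch_needs_fua_bit(const struct sketch_bdev *bdev)
{
	return bdev->volatile_write_cache && bdev->supports_fua;
}
```

Under these assumptions, only a device with a volatile write cache and no
FUA support still needs a flush after the write completes.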