
[0/4] io_uring: use ITER_UBUF

Message ID 20221107175610.349807-1-kbusch@meta.com
Series io_uring: use ITER_UBUF

Message

Keith Busch Nov. 7, 2022, 5:56 p.m. UTC
From: Keith Busch <kbusch@kernel.org>

ITER_UBUF is a more efficient representation for single-vector
buffers, providing small optimizations in the fast path. Most of this
series came from Jens; I just ported the patches forward to the current
release and tested them against various filesystems and devices.
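A rough way to see where the fast-path savings come from: ITER_UBUF keeps the single user pointer and length in the iterator itself, while ITER_IOVEC points at an array of struct iovec that has to be walked segment by segment. The sketch below is a user-space toy model built on that assumption, not the kernel's struct iov_iter or copy_to_iter(); toy_iter, toy_iter_ubuf(), toy_iter_iovec() and toy_copy_to_iter() are hypothetical names.

```c
#include <stddef.h>
#include <string.h>
#include <sys/uio.h>   /* struct iovec */

/* Hypothetical toy model, NOT the kernel's struct iov_iter. */
enum toy_iter_type { TOY_ITER_UBUF, TOY_ITER_IOVEC };

struct toy_iter {
	enum toy_iter_type type;
	size_t count;             /* bytes left in the whole iterator */
	size_t offset;            /* UBUF: bytes consumed; IOVEC: offset in current segment */
	void *ubuf;               /* UBUF: the single buffer, stored inline */
	const struct iovec *iov;  /* IOVEC: segment array (one extra indirection) */
	unsigned long nr_segs;    /* IOVEC: segments left */
};

static void toy_iter_ubuf(struct toy_iter *i, void *buf, size_t len)
{
	*i = (struct toy_iter){ .type = TOY_ITER_UBUF, .count = len, .ubuf = buf };
}

static void toy_iter_iovec(struct toy_iter *i, const struct iovec *iov,
			   unsigned long nr_segs, size_t len)
{
	*i = (struct toy_iter){ .type = TOY_ITER_IOVEC, .count = len,
				.iov = iov, .nr_segs = nr_segs };
}

/* Copy up to len bytes from src into the iterator; returns bytes copied. */
static size_t toy_copy_to_iter(const void *src, size_t len, struct toy_iter *i)
{
	const char *s = src;
	size_t done = 0;

	if (len > i->count)
		len = i->count;

	if (i->type == TOY_ITER_UBUF) {
		/* Single buffer: one memcpy, no segment bookkeeping. */
		memcpy((char *)i->ubuf + i->offset, s, len);
		i->offset += len;
		i->count -= len;
		return len;
	}

	/* Generic path: walk the iovec array one segment at a time. */
	while (done < len && i->nr_segs) {
		size_t avail = i->iov->iov_len - i->offset;
		size_t n = len - done < avail ? len - done : avail;

		memcpy((char *)i->iov->iov_base + i->offset, s + done, n);
		done += n;
		i->offset += n;
		if (i->offset == i->iov->iov_len) {  /* segment exhausted */
			i->iov++;
			i->nr_segs--;
			i->offset = 0;
		}
	}
	i->count -= done;
	return done;
}
```

With a one-element iovec both flavours copy the same bytes; the UBUF branch just skips the per-segment bookkeeping, which is the kind of overhead this series avoids for single-vector IO.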

This new iter type has been used extensively via the read/write
syscall interface for some time now, so I don't expect surprises from
supporting it with io_uring. There are, however, a couple of
differences between the two interfaces:

  1. io_uring will always prefer the _iter versions of the read/write
     callbacks if file_operations implement both, whereas the generic
     syscalls will use .read/.write (if implemented) for non-vectored IO.

  2. io_uring will use the ITER_UBUF representation for single-vector
     readv/writev, but the generic syscalls currently use ITER_IOVEC for
     these.
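The two differences above reduce to two selection rules. Here is a hypothetical user-space model of them; toy_fops and the toy_*_pick() and toy_*_kind() helpers are illustrative stand-ins, not kernel interfaces, and only the read side is modelled (write is symmetric).

```c
#include <stddef.h>
#include <sys/types.h>   /* ssize_t */

/* Hypothetical, reduced stand-in for struct file_operations (read side only). */
struct toy_fops {
	ssize_t (*read)(char *buf, size_t len);       /* plays the role of .read */
	ssize_t (*read_iter)(char *buf, size_t len);  /* plays the role of .read_iter */
};

enum toy_path { TOY_NONE, TOY_READ, TOY_READ_ITER };
enum toy_iter_kind { TOY_KIND_IOVEC, TOY_KIND_UBUF };

/* Difference 1, read(2) side: .read wins when the driver provides it. */
static enum toy_path toy_syscall_pick(const struct toy_fops *f)
{
	if (f->read)
		return TOY_READ;
	return f->read_iter ? TOY_READ_ITER : TOY_NONE;
}

/* Difference 1, io_uring side: .read_iter wins when it exists. */
static enum toy_path toy_uring_pick(const struct toy_fops *f)
{
	if (f->read_iter)
		return TOY_READ_ITER;
	return f->read ? TOY_READ : TOY_NONE;
}

/* Difference 2, readv(2) side: always an iovec-backed iterator. */
static enum toy_iter_kind toy_syscall_readv_kind(unsigned long nr_segs)
{
	(void)nr_segs;
	return TOY_KIND_IOVEC;
}

/* Difference 2, io_uring with this series: one segment becomes UBUF. */
static enum toy_iter_kind toy_uring_readv_kind(unsigned long nr_segs)
{
	return nr_segs == 1 ? TOY_KIND_UBUF : TOY_KIND_IOVEC;
}
```

The two rules only disagree for drivers that set both callbacks, or for single-segment readv/writev, which is why the audit that follows focuses on file_operations implementing both.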

That should mean, then, that the only potential problem areas are
file_operations that implement both .read/.read_iter or
.write/.write_iter. Fortunately very few do that, and I found only two
that won't readily work: qib_file_ops and snd_pcm_f_ops. The former was
already broken with io_uring before this series. The latter's vectored
read/write only works with ITER_IOVEC, so that will break, but I don't
think anyone is using io_uring to talk to a sound card driver.
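The snd_pcm_f_ops breakage comes down to a type check on the iterator. A minimal sketch, assuming the driver's vectored handler rejects anything that is not ITER_IOVEC with -EINVAL; toy_iter_view and toy_pcm_readv() are hypothetical, and the real handler does considerably more.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>   /* ssize_t */

/* Hypothetical summary of an iterator: its flavour and byte count. */
enum toy_iter_kind { TOY_KIND_IOVEC, TOY_KIND_UBUF };

struct toy_iter_view {
	enum toy_iter_kind kind;
	size_t count;
};

/* Hypothetical vectored handler that only understands iovec-backed iterators. */
static ssize_t toy_pcm_readv(const struct toy_iter_view *to)
{
	if (to->kind != TOY_KIND_IOVEC)
		return -EINVAL;   /* a single-vector UBUF iterator lands here */
	/* ...a real driver would fill the per-segment buffers here... */
	return (ssize_t)to->count;
}
```

The same one-segment readv that succeeds through the syscall path (ITER_IOVEC) would fail through io_uring once it builds an ITER_UBUF.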

Jens Axboe (3):
  iov: add import_ubuf()
  io_uring: switch network send/recv to ITER_UBUF
  io_uring: use ubuf for single range imports for read/write

Keith Busch (1):
  iov_iter: move iter_ubuf check inside restore WARN

 include/linux/uio.h |  1 +
 io_uring/net.c      | 13 ++++---------
 io_uring/rw.c       |  9 ++++++---
 lib/iov_iter.c      | 15 +++++++++++++--
 4 files changed, 24 insertions(+), 14 deletions(-)

Comments

Christoph Hellwig Nov. 8, 2022, 6:54 a.m. UTC | #1
On Mon, Nov 07, 2022 at 09:56:06AM -0800, Keith Busch wrote:
>   1. io_uring will always prefer the _iter versions of the read/write
>      callbacks if file_operations implement both, whereas the generic
>      syscalls will use .read/.write (if implemented) for non-vectored IO.

There are very few file operations that have both, and for those
the difference matters, e.g. the strange vector semantics of the
sound code.  I would strongly suggest mirroring what the normal
read/write path does here.

>   2. io_uring will use the ITER_UBUF representation for single-vector
>      readv/writev, but the generic syscalls currently use ITER_IOVEC for
>      these.

Same here.  It might be worth using ITER_UBUF for single-vector
readv/writev, but this should be the same for all interfaces.  I'd
suggest dropping this for now and doing a separate series with careful
review from Al.

Keith Busch Nov. 8, 2022, 8:25 p.m. UTC | #2
On Mon, Nov 07, 2022 at 10:54:06PM -0800, Christoph Hellwig wrote:
> On Mon, Nov 07, 2022 at 09:56:06AM -0800, Keith Busch wrote:
> >   1. io_uring will always prefer the _iter versions of the read/write
> >      callbacks if file_operations implement both, whereas the generic
> >      syscalls will use .read/.write (if implemented) for non-vectored IO.
> 
> There are very few file operations that have both, and for those
> the difference matters, e.g. the strange vector semantics of the
> sound code.

Yes, thankfully there are not many. Other than the two file_operations
mentioned, the only other fops I found implementing both are
'null_ops' and 'zero_ops'; those are fine. One other, in
trace_events_user.c, implements both .write and .write_iter (but not
the read pair), which is also fine.

> I would strongly suggest mirroring what the normal
> read/write path does here.

I don't think we can change that now. io_uring has always used the
.{read,write}_iter callbacks when available, ever since it introduced
non-vectored read/write (3a6820f2bb8a0). Altering the io_uring op's ABI
to align with the read/write syscalls seems risky.

But I don't think there are any real use cases affected by this series
anyway.

> >   2. io_uring will use the ITER_UBUF representation for single-vector
> >      readv/writev, but the generic syscalls currently use ITER_IOVEC for
> >      these.
> 
> Same here.  It might be worth using ITER_UBUF for single-vector
> readv/writev, but this should be the same for all interfaces.  I'd
> suggest dropping this for now and doing a separate series with careful
> review from Al.

I feel like that's a worthy longer-term goal, but I'll start looking
into it now.