[for-next,0/3] io_uring: multishot recvmsg

Message ID 20220708184358.1624275-1-dylany@fb.com (mailing list archive)

Message

Dylan Yudaken July 8, 2022, 6:43 p.m. UTC
This series adds multishot support to recvmsg in io_uring.

The idea is that you submit a single multishot recvmsg and then receive
completions as and when data arrives. For recvmsg each completion also has
control data, and this is necessarily included in the same buffer as the
payload.

To do this, a new structure is used: io_uring_recvmsg_out. It records the
lengths actually written for the name, control and payload, as well as the
flags.
The layout of the buffer is <header><name><control><payload>, where the
region lengths are those specified in the original msghdr used to issue the
recvmsg.
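
As a rough illustration of the layout described above, here is a minimal
userspace sketch that splits one completion buffer into its regions. It is not
code from this series: the struct recvmsg_out_hdr field names, the
recvmsg_view type and the split_recvmsg_buf() helper are all hypothetical, and
it assumes the header is four 32-bit fields (matching the 16 bytes quoted) and
that the name and control regions are sized by the msg_namelen and
msg_controllen of the original msghdr.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical mirror of the 16-byte header; field names are assumptions. */
struct recvmsg_out_hdr {
	uint32_t namelen;     /* bytes of name actually written */
	uint32_t controllen;  /* bytes of control data actually written */
	uint32_t payloadlen;  /* bytes of payload reported by the kernel */
	uint32_t flags;       /* msg_flags for this completion */
};
_Static_assert(sizeof(struct recvmsg_out_hdr) == 16, "header should be 16 bytes");

/* Hypothetical view over one completion buffer. */
struct recvmsg_view {
	struct recvmsg_out_hdr hdr;
	const uint8_t *name;
	const uint8_t *control;
	const uint8_t *payload;
	size_t payload_len;   /* clamped to what fits in the buffer */
};

/*
 * Split <header><name><control><payload>. name_cap and control_cap are the
 * msg_namelen and msg_controllen of the msghdr used at submission time.
 * Returns 0 on success, -1 if the buffer cannot hold the fixed regions.
 */
static int split_recvmsg_buf(const uint8_t *buf, size_t buf_len,
			     size_t name_cap, size_t control_cap,
			     struct recvmsg_view *out)
{
	size_t off = sizeof(out->hdr);
	size_t avail;

	if (buf_len < off)
		return -1;
	/* memcpy avoids alignment assumptions about buf */
	memcpy(&out->hdr, buf, sizeof(out->hdr));

	if (name_cap > buf_len - off)
		return -1;
	out->name = buf + off;
	off += name_cap;

	if (control_cap > buf_len - off)
		return -1;
	out->control = buf + off;
	off += control_cap;

	out->payload = buf + off;
	avail = buf_len - off;
	out->payload_len = out->hdr.payloadlen < avail ? out->hdr.payloadlen : avail;
	return 0;
}
```

With this layout the payload offset is fixed per request (header size plus the
two submitted lengths), so a consumer can find the payload without first
reading namelen or controllen; those fields only say how much of each region
is valid.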

I suspect this API will be the most contentious part of this series and would
appreciate any comments on it.

For completeness, I considered using the original struct msghdr as the header,
but it is much bigger (72 bytes including an iovec vs 16 bytes here).
Testing with it also showed a 1% slowdown in terms of QPS.

Using a mini network tester [1] shows a 14% QPS improvement with this API;
however, this is likely to drop to ~8% with the latest allocation cache added
by Jens.

This series is based on another patch series [2].

[1]: https://github.com/DylanZA/netbench/tree/main
[2]: https://lore.kernel.org/io-uring/20220708181838.1495428-1-dylany@fb.com/

Dylan Yudaken (3):
  net: copy from user before calling __copy_msghdr
  net: copy from user before calling __get_compat_msghdr
  io_uring: support multishot in recvmsg

 include/linux/socket.h        |   7 +-
 include/net/compat.h          |   5 +-
 include/uapi/linux/io_uring.h |   7 ++
 io_uring/net.c                | 195 ++++++++++++++++++++++++++++------
 io_uring/net.h                |   5 +
 net/compat.c                  |  39 +++----
 net/socket.c                  |  37 +++----
 7 files changed, 215 insertions(+), 80 deletions(-)


base-commit: 9802dee74e7f30ab52dc5f346373185cd860afab

Comments

Paolo Abeni July 12, 2022, 9:18 a.m. UTC | #1
On Fri, 2022-07-08 at 11:43 -0700, Dylan Yudaken wrote:
> base-commit: 9802dee74e7f30ab52dc5f346373185cd860afab

I read the above as meaning this series targets Jens's tree. It looks like
it should be conflict-free vs net-next.

For the network bits:

Acked-by: Paolo Abeni <pabeni@redhat.com>

Cheers,

Paolo