
[liburing,v2] man/io_uring_enter.2: document IORING_OP_SEND_ZC

Message ID e876fa3c0de9d45db41a796eb8ac547e298a8787.1662459139.git.asml.silence@gmail.com
State New
Series [liburing,v2] man/io_uring_enter.2: document IORING_OP_SEND_ZC

Commit Message

Pavel Begunkov Sept. 6, 2022, 10:17 a.m. UTC
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---

Pasting a text-only version right below for convenience.

Issue the zerocopy equivalent of a send(2) system call. It's similar to
IORING_OP_SEND but tries to avoid making intermediate copies of data.
Zerocopy execution is not guaranteed; the kernel may fall back to copying.

The flags field of the first struct io_uring_cqe will likely contain
IORING_CQE_F_MORE. If it does, there will be a second completion event,
a.k.a. notification, with the user_data field set to the same value,
and the user must not modify the buffer until the notification is
posted. The first cqe follows the usual rules, so its res field will
contain the number of bytes sent or a negative error code. The
notification's res field will be set to zero and its flags field will
contain IORING_CQE_F_NOTIF. The two-step model is needed because the
kernel may hold on to buffers for a long time, e.g. waiting for a TCP
ACK, and having a separate cqe for request completions allows
userspace to push more data without extra delays. Note that
notifications only control the lifetime of the buffers; they don't
indicate whether the data has actually been sent out or received by
the other end.

fd must be set to the socket file descriptor, addr must contain a
pointer to the buffer, len denotes the length of the buffer to send,
and msg_flags holds the flags associated with the system call. When
addr2 is non-zero, it points to the destination address, with addr_len
specifying its size, which turns the request into the equivalent of a
sendto(2) system call.
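
To make the field mapping concrete, the sketch below fills a
hypothetical struct send_zc_fields that holds only the fields named
above. In the real struct io_uring_sqe from <linux/io_uring.h>, several
of these fields live inside unions shared with other opcodes, and
recent liburing releases provide io_uring_prep_send_zc() to fill them.
fill_send_zc() here is illustrative only and touches no kernel
interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/socket.h>

/* Hypothetical stand-in for the io_uring_sqe fields read by
 * IORING_OP_SEND_ZC (names mirror the sqe fields). */
struct send_zc_fields {
    int32_t  fd;        /* socket file descriptor */
    uint64_t addr;      /* pointer to the data buffer */
    uint32_t len;       /* number of bytes to send */
    uint32_t msg_flags; /* send(2) flags, e.g. MSG_DONTWAIT */
    uint64_t addr2;     /* destination address, 0 for plain send(2) */
    uint16_t addr_len;  /* size of the destination address */
};

/* Hypothetical helper: dest == NULL gives a send(2) equivalent,
 * a non-NULL dest gives a sendto(2) equivalent. */
void fill_send_zc(struct send_zc_fields *f, int sockfd,
                  const void *buf, uint32_t len, int msg_flags,
                  const struct sockaddr *dest, socklen_t dest_len)
{
    assert(f != NULL && buf != NULL);

    f->fd = sockfd;
    f->addr = (uint64_t)(uintptr_t)buf;
    f->len = len;
    f->msg_flags = (uint32_t)msg_flags;

    /* A non-zero addr2 switches the request to sendto(2) semantics. */
    f->addr2 = (uint64_t)(uintptr_t)dest;
    f->addr_len = dest != NULL ? (uint16_t)dest_len : 0;
}
```

The buffer referenced by addr is the one that must stay unmodified
until the notification described above arrives.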

Available since 6.0.


 man/io_uring_enter.2 | 47 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 47 insertions(+)

Comments

Jens Axboe Sept. 6, 2022, 12:52 p.m. UTC | #1
On 9/6/22 4:17 AM, Pavel Begunkov wrote:
> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
> ---
> 
> Pasting a text-only version right below for convenience.

Thanks, appreciate the plain text one, easier to review. Pushed
this out with just minor edits here and there.

Patch

diff --git a/man/io_uring_enter.2 b/man/io_uring_enter.2
index 1a9311e..a93e949 100644
--- a/man/io_uring_enter.2
+++ b/man/io_uring_enter.2
@@ -1059,6 +1059,53 @@  value being passed in. This request type can be used to either just wake or
 interrupt anyone waiting for completions on the target ring, or it can be used
 to pass messages via the two fields. Available since 5.18.
 
+.TP
+.B IORING_OP_SEND_ZC
+Issue the zerocopy equivalent of a
+.BR send (2)
+system call. It's similar to IORING_OP_SEND but tries to avoid making
+intermediate copies of data. Zerocopy execution is not guaranteed and it may
+fall back to copying.
+
+The
+.I flags
+field of the first
+.I "struct io_uring_cqe"
+will likely contain IORING_CQE_F_MORE. If it does, there will be a second
+completion event, a.k.a. notification, with the
+.I user_data
+field set to the same value, and the user must not modify the buffer until the
+notification is posted. The first cqe follows the usual rules and so its
+.I res
+field will contain the number of bytes sent or a negative error code. The
+notification's
+.I res
+field will be set to zero and the
+.I flags
+field will contain IORING_CQE_F_NOTIF. The two-step model is needed because
+the kernel may hold on to buffers for a long time, e.g. waiting for a TCP ACK,
+and having a separate cqe for request completions allows userspace to push
+more data without extra delays. Note that notifications only control the
+lifetime of the buffers; they don't indicate whether the data has actually
+been sent out or received by the other end.
+
+.I fd
+must be set to the socket file descriptor,
+.I addr
+must contain a pointer to the buffer,
+.I len
+denotes the length of the buffer to send, and
+.I msg_flags
+holds the flags associated with the system call. When
+.I addr2
+is non-zero, it points to the destination address, with
+.I addr_len
+specifying its size, turning the request into a
+.BR sendto (2)
+system call equivalent.
+
+Available since 6.0.
+
 .PP
 The
 .I flags