@@ -1059,6 +1059,53 @@ value being passed in. This request type can be used to either just wake or
interrupt anyone waiting for completions on the target ring, or it can be used
to pass messages via the two fields. Available since 5.18.
+.TP
+.B IORING_OP_SEND_ZC
+Issue the zerocopy equivalent of a
+.BR send (2)
+system call. It's similar to IORING_OP_SEND but tries to avoid making
+intermediate copies of data. Zerocopy execution is not guaranteed and it may
+fall back to copying.
+
+The
+.I flags
+field of the first
+.I "struct io_uring_cqe"
+will likely contain IORING_CQE_F_MORE, which means that there will be a second
+completion event, a.k.a. notification, with the
+.I user_data
+field set to the same value, and the user must not modify the buffer until the
+notification is posted. The first cqe follows the usual rules and so its
+.I res
+field will contain the number of bytes sent or a negative error code. The
+notification's
+.I res
+field will be set to zero and the
+.I flags
+field will contain IORING_CQE_F_NOTIF. The two-step model is needed because
+the kernel may hold on to buffers for a long time, e.g. waiting for a TCP ACK,
+and having a separate cqe for request completions allows userspace to push
+more data without extra delays. Note that notifications only control the
+buffer's lifetime and say nothing about whether the data has actually been
+sent out or received by the other end.
+
+.I fd
+must be set to the socket file descriptor,
+.I addr
+must contain a pointer to the buffer,
+.I len
+denotes the length of the buffer to send, and
+.I msg_flags
+holds the flags associated with the system call. When
+.I addr2
+is non-zero it points to the address of the target with
+.I addr_len
+specifying its size, turning the request into a
+.BR sendto (2)
+system call equivalent.
+
+Available since 6.0.
+
.PP
The
.I flags
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---

Pasting a text-only version right below for convenience.

Issue the zerocopy equivalent of a send(2) system call. It's similar to
IORING_OP_SEND but tries to avoid making intermediate copies of data.
Zerocopy execution is not guaranteed and it may fall back to copying.

The flags field of the first struct io_uring_cqe will likely contain
IORING_CQE_F_MORE, which means that there will be a second completion
event, a.k.a. notification, with the user_data field set to the same
value, and the user must not modify the buffer until the notification is
posted. The first cqe follows the usual rules and so its res field will
contain the number of bytes sent or a negative error code. The
notification's res field will be set to zero and the flags field will
contain IORING_CQE_F_NOTIF. The two-step model is needed because the
kernel may hold on to buffers for a long time, e.g. waiting for a TCP
ACK, and having a separate cqe for request completions allows userspace
to push more data without extra delays. Note that notifications only
control the buffer's lifetime and say nothing about whether the data has
actually been sent out or received by the other end.

fd must be set to the socket file descriptor, addr must contain a pointer
to the buffer, len denotes the length of the buffer to send, and
msg_flags holds the flags associated with the system call. When addr2 is
non-zero it points to the address of the target with addr_len specifying
its size, turning the request into a sendto(2) system call equivalent.

Available since 6.0.

 man/io_uring_enter.2 | 47 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 47 insertions(+)
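
For readers who want to see the two-step completion model spelled out, here
is a rough sketch of the bookkeeping an application might do per request. It
is not part of the patch and does not talk to a real ring: struct sketch_cqe
and the SKETCH_* constants are hypothetical stand-ins that mirror the three
fields of struct io_uring_cqe and the bit positions of IORING_CQE_F_MORE and
IORING_CQE_F_NOTIF, and struct zc_send_state and both helper functions are
made up for illustration. It assumes, per the text above, that a first cqe
without IORING_CQE_F_MORE is not followed by a notification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for IORING_CQE_F_MORE and IORING_CQE_F_NOTIF. */
#define SKETCH_CQE_F_MORE  (1U << 1)
#define SKETCH_CQE_F_NOTIF (1U << 3)

/* Hypothetical mirror of the user_data, res and flags fields of a cqe. */
struct sketch_cqe {
	uint64_t user_data;
	int32_t  res;
	uint32_t flags;
};

/* Per-request state for one zerocopy send (illustrative only). */
struct zc_send_state {
	bool    result_seen;  /* first cqe arrived, result is valid */
	int32_t result;       /* bytes sent or negative error code */
	bool    buffer_busy;  /* buffer must not be modified yet */
};

static void zc_send_init(struct zc_send_state *st)
{
	st->result_seen = false;
	st->result = 0;
	st->buffer_busy = true;  /* kernel may reference it from submission on */
}

/*
 * Feed one completion belonging to this request into the state.
 * Returns true once the buffer may be modified or reused.
 */
static bool zc_send_handle_cqe(struct zc_send_state *st,
			       const struct sketch_cqe *cqe)
{
	assert(st && cqe);

	if (cqe->flags & SKETCH_CQE_F_NOTIF) {
		/* Notification: only says the buffer is no longer in use. */
		st->buffer_busy = false;
	} else {
		/* First cqe: carries the send result as usual. */
		st->result_seen = true;
		st->result = cqe->res;
		/* Without F_MORE no notification is expected to follow. */
		if (!(cqe->flags & SKETCH_CQE_F_MORE))
			st->buffer_busy = false;
	}
	return !st->buffer_busy;
}
```

A caller would look up the state by user_data, pass each cqe to
zc_send_handle_cqe(), and only touch the buffer after it returns true. The
send result can be acted on as soon as result_seen is set, which is what lets
userspace queue more data without waiting for the notification.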