Message ID | 9c8bead87b2b980fcec441b8faef52188b4a6588.1664292100.git.asml.silence@gmail.com (mailing list archive)
---|---
State | New
Series | [for-6.1] io_uring/net: don't skip notifs for failed requests
On Wed, 28 Sep 2022 00:51:49 +0100, Pavel Begunkov wrote:
> We currently only add a notification CQE when the send succeeded, i.e.
> cqe.res >= 0. However, it'd be more robust to do buffer notifications
> for failed requests as well in case drivers decide to do something funky.
>
> Always return a buffer notification after initial prep, don't hide it.
> This behaviour is better aligned with the documentation and the patch
> also helps userspace to respect it.
>
> [...]

Applied, thanks!

[1/1] io_uring/net: don't skip notifs for failed requests
      commit: 6ae91ac9a6aa7d6005c3c6d0f4d263fbab9f377f

Best regards,
Hi Pavel,

> We currently only add a notification CQE when the send succeeded, i.e.
> cqe.res >= 0. However, it'd be more robust to do buffer notifications
> for failed requests as well in case drivers decide to do something funky.
>
> Always return a buffer notification after initial prep, don't hide it.
> This behaviour is better aligned with the documentation and the patch
> also helps userspace to respect it.

Just as reference, this was the version I was testing with:
https://git.samba.org/?p=metze/linux/wip.git;a=commitdiff;h=7ffb896cdb8ccd55065f7ffae9fb8050e39211c7

>  void io_sendrecv_fail(struct io_kiocb *req)
>  {
>  	struct io_sr_msg *sr = io_kiocb_to_cmd(req, struct io_sr_msg);
> -	int res = req->cqe.res;
>
>  	if (req->flags & REQ_F_PARTIAL_IO)
> -		res = sr->done_io;
> +		req->cqe.res = sr->done_io;
> +
>  	if ((req->flags & REQ_F_NEED_CLEANUP) &&
> -	    (req->opcode == IORING_OP_SEND_ZC || req->opcode == IORING_OP_SENDMSG_ZC)) {
> -		/* preserve notification for partial I/O */
> -		if (res < 0)
> -			sr->notif->flags |= REQ_F_CQE_SKIP;
> -		io_notif_flush(sr->notif);
> -		sr->notif = NULL;

Here we rely on io_send_zc_cleanup(), correct?

Note that I hit a very bad problem during my tests of SENDMSG_ZC:
the BUG() in first_iovec_segment() triggered very easily.
The problem is io_setup_async_msg() in the partial retry case,
which seems to happen more often with _ZC.

	if (!async_msg->free_iov)
		async_msg->msg.msg_iter.iov = async_msg->fast_iov;

is wrong; it needs to be something like this:

+	if (!kmsg->free_iov) {
+		size_t fast_idx = kmsg->msg.msg_iter.iov - kmsg->fast_iov;
+		async_msg->msg.msg_iter.iov = &async_msg->fast_iov[fast_idx];
+	}

as iov_iter_iovec_advance() may change i->iov so that i->iov_offset
is relative only to the first element.

I'm not sure about the 'kmsg->free_iov' case: do we reuse the
caller's memory or should we make a copy?
I initially used this:
https://git.samba.org/?p=metze/linux/wip.git;a=commitdiff;h=e1d3a9f5c7708a37172d258753ed7377eaac9e33
But I didn't test the non-fast_iov case.

BTW: I tested with 5 vectors with lengths like this: 4, 0, 64, 32, 8388608,
and got a short write of about ~2000000 bytes.

I'm not sure if it was already a problem before
commit 257e84a5377fbbc336ff563833a8712619acce56
("io_uring: refactor sendmsg/recvmsg iov managing"),
but I guess it has been a potential problem ever since
7ba89d2af17aa879dda30f5d5d3f152e587fc551, where io_net_retry()
was introduced.

metze
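The segment-relative bookkeeping Stefan describes can be modelled outside the kernel. In the sketch below, struct toy_iter and toy_advance() are hypothetical stand-ins for the relevant parts of struct iov_iter, not kernel API; once an advance crosses a segment boundary, iov no longer points at the base of the backing array, which is why resetting msg_iter.iov to the start of the copied fast_iov rewinds the iterator and can trip the BUG() in first_iovec_segment():

	#include <stddef.h>
	#include <sys/uio.h>

	struct toy_iter {
		const struct iovec *iov;	/* current segment, not the array base */
		size_t iov_offset;		/* offset within *iov only */
		size_t count;			/* bytes remaining overall */
	};

	static void toy_advance(struct toy_iter *i, size_t bytes)
	{
		i->count -= bytes;
		while (bytes) {
			size_t left = i->iov->iov_len - i->iov_offset;

			if (bytes < left) {
				i->iov_offset += bytes;
				return;
			}
			/* segment consumed: step iov forward so that
			 * iov_offset stays relative to the current segment */
			bytes -= left;
			i->iov++;
			i->iov_offset = 0;
		}
	}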
On 9/28/22 16:23, Stefan Metzmacher wrote:
>
> Hi Pavel,
>
>> We currently only add a notification CQE when the send succeeded, i.e.
>> cqe.res >= 0. However, it'd be more robust to do buffer notifications
>> for failed requests as well in case drivers decide to do something funky.
>>
>> Always return a buffer notification after initial prep, don't hide it.
>> This behaviour is better aligned with the documentation and the patch
>> also helps userspace to respect it.
>
> Just as reference, this was the version I was testing with:
> https://git.samba.org/?p=metze/linux/wip.git;a=commitdiff;h=7ffb896cdb8ccd55065f7ffae9fb8050e39211c7
>
>>  void io_sendrecv_fail(struct io_kiocb *req)
>>  {
>>  	struct io_sr_msg *sr = io_kiocb_to_cmd(req, struct io_sr_msg);
>> -	int res = req->cqe.res;
>>
>>  	if (req->flags & REQ_F_PARTIAL_IO)
>> -		res = sr->done_io;
>> +		req->cqe.res = sr->done_io;
>> +
>>  	if ((req->flags & REQ_F_NEED_CLEANUP) &&
>> -	    (req->opcode == IORING_OP_SEND_ZC || req->opcode == IORING_OP_SENDMSG_ZC)) {
>> -		/* preserve notification for partial I/O */
>> -		if (res < 0)
>> -			sr->notif->flags |= REQ_F_CQE_SKIP;
>> -		io_notif_flush(sr->notif);
>> -		sr->notif = NULL;
>
> Here we rely on io_send_zc_cleanup(), correct?
>
> Note that I hit a very bad problem during my tests of SENDMSG_ZC:
> the BUG() in first_iovec_segment() triggered very easily.
> The problem is io_setup_async_msg() in the partial retry case,
> which seems to happen more often with _ZC.
>
> 	if (!async_msg->free_iov)
> 		async_msg->msg.msg_iter.iov = async_msg->fast_iov;
>
> is wrong; it needs to be something like this:
>
> +	if (!kmsg->free_iov) {
> +		size_t fast_idx = kmsg->msg.msg_iter.iov - kmsg->fast_iov;
> +		async_msg->msg.msg_iter.iov = &async_msg->fast_iov[fast_idx];
> +	}

I agree, it doesn't look right. It indeed needs something like
io_uring/rw.c:io_req_map_rw()

> As iov_iter_iovec_advance() may change i->iov so that i->iov_offset
> is relative only to the first element.
>
> I'm not sure about the 'kmsg->free_iov' case: do we reuse the
> caller's memory or should we make a copy?
> I initially used this:
> https://git.samba.org/?p=metze/linux/wip.git;a=commitdiff;h=e1d3a9f5c7708a37172d258753ed7377eaac9e33
> But I didn't test the non-fast_iov case.
>
> BTW: I tested with 5 vectors with lengths like this: 4, 0, 64, 32, 8388608,
> and got a short write of about ~2000000 bytes.
>
> I'm not sure if it was already a problem before
> commit 257e84a5377fbbc336ff563833a8712619acce56
> ("io_uring: refactor sendmsg/recvmsg iov managing"),
> but I guess it has been a potential problem ever since
> 7ba89d2af17aa879dda30f5d5d3f152e587fc551, where io_net_retry()
> was introduced.
>
> metze
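A self-contained sketch of what such an io_req_map_rw()-style fixup could look like for the msghdr case. Field names follow the snippets quoted above (fast_iov, free_iov, msg.msg_iter.iov in struct io_async_msghdr), and io_async_msg_fixup is a made-up helper name; this is an illustration under those assumptions, not the patch that was eventually merged:

	#include <string.h>

	/* Copy kmsg into async storage, re-basing the iterator's iov pointer
	 * at the same index within the new fast_iov instead of resetting it
	 * to the array start. */
	static void io_async_msg_fixup(struct io_async_msghdr *async_msg,
				       const struct io_async_msghdr *kmsg)
	{
		memcpy(async_msg, kmsg, sizeof(*kmsg));

		if (!kmsg->free_iov) {
			/* how far the iterator had advanced into fast_iov */
			size_t fast_idx = kmsg->msg.msg_iter.iov - kmsg->fast_iov;

			async_msg->msg.msg_iter.iov = &async_msg->fast_iov[fast_idx];
		}
		/* free_iov is heap memory owned by the request; per the
		 * discussion below it can be reused as-is */
	}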
On 9/28/22 17:56, Pavel Begunkov wrote:
> On 9/28/22 16:23, Stefan Metzmacher wrote:
>> Just as reference, this was the version I was testing with:
>> https://git.samba.org/?p=metze/linux/wip.git;a=commitdiff;h=7ffb896cdb8ccd55065f7ffae9fb8050e39211c7
>>
>>>  void io_sendrecv_fail(struct io_kiocb *req)
>>>  {
>>>  	struct io_sr_msg *sr = io_kiocb_to_cmd(req, struct io_sr_msg);
>>> -	int res = req->cqe.res;
>>>
>>>  	if (req->flags & REQ_F_PARTIAL_IO)
>>> -		res = sr->done_io;
>>> +		req->cqe.res = sr->done_io;
>>> +
>>>  	if ((req->flags & REQ_F_NEED_CLEANUP) &&
>>> -	    (req->opcode == IORING_OP_SEND_ZC || req->opcode == IORING_OP_SENDMSG_ZC)) {
>>> -		/* preserve notification for partial I/O */
>>> -		if (res < 0)
>>> -			sr->notif->flags |= REQ_F_CQE_SKIP;
>>> -		io_notif_flush(sr->notif);
>>> -		sr->notif = NULL;
>>
>> Here we rely on io_send_zc_cleanup(), correct?

Right

>> Note that I hit a very bad problem during my tests of SENDMSG_ZC:
>> the BUG() in first_iovec_segment() triggered very easily.
>> The problem is io_setup_async_msg() in the partial retry case,
>> which seems to happen more often with _ZC.
>>
>> 	if (!async_msg->free_iov)
>> 		async_msg->msg.msg_iter.iov = async_msg->fast_iov;
>>
>> is wrong; it needs to be something like this:
>>
>> +	if (!kmsg->free_iov) {
>> +		size_t fast_idx = kmsg->msg.msg_iter.iov - kmsg->fast_iov;
>> +		async_msg->msg.msg_iter.iov = &async_msg->fast_iov[fast_idx];
>> +	}
>
> I agree, it doesn't look right. It indeed needs something like
> io_uring/rw.c:io_req_map_rw()

Took a closer look, that chunk above looks good and matches
io_req_map_rw() apart from non-essential differences. Can you send a patch?

>> As iov_iter_iovec_advance() may change i->iov so that i->iov_offset
>> is relative only to the first element.
>>
>> I'm not sure about the 'kmsg->free_iov' case: do we reuse the
>> caller's memory or should we make a copy?

We can reuse it, we own it and it's immutable from the iter perspective.

>> BTW: I tested with 5 vectors with lengths like this: 4, 0, 64, 32, 8388608,
>> and got a short write of about ~2000000 bytes.

Which is interesting to know. What does 2M here mean? Does it consistently
retry when sending more than 2M bytes?
diff --git a/io_uring/net.c b/io_uring/net.c
index 6b69eff6887e..5058a9fc9e9c 100644
--- a/io_uring/net.c
+++ b/io_uring/net.c
@@ -916,7 +916,6 @@ void io_send_zc_cleanup(struct io_kiocb *req)
 		kfree(io->free_iov);
 	}
 	if (zc->notif) {
-		zc->notif->flags |= REQ_F_CQE_SKIP;
 		io_notif_flush(zc->notif);
 		zc->notif = NULL;
 	}
@@ -1047,7 +1046,7 @@ int io_send_zc(struct io_kiocb *req, unsigned int issue_flags)
 	struct msghdr msg;
 	struct iovec iov;
 	struct socket *sock;
-	unsigned msg_flags, cflags;
+	unsigned msg_flags;
 	int ret, min_ret = 0;
 
 	sock = sock_from_file(req->file);
@@ -1115,8 +1114,6 @@ int io_send_zc(struct io_kiocb *req, unsigned int issue_flags)
 			req->flags |= REQ_F_PARTIAL_IO;
 			return io_setup_async_addr(req, &__address, issue_flags);
 		}
-		if (ret < 0 && !zc->done_io)
-			zc->notif->flags |= REQ_F_CQE_SKIP;
 		if (ret == -ERESTARTSYS)
 			ret = -EINTR;
 		req_set_fail(req);
@@ -1129,8 +1126,7 @@ int io_send_zc(struct io_kiocb *req, unsigned int issue_flags)
 		io_notif_flush(zc->notif);
 	req->flags &= ~REQ_F_NEED_CLEANUP;
 
-	cflags = ret >= 0 ? IORING_CQE_F_MORE : 0;
-	io_req_set_res(req, ret, cflags);
+	io_req_set_res(req, ret, IORING_CQE_F_MORE);
 	return IOU_OK;
 }
 
@@ -1139,7 +1135,7 @@ int io_sendmsg_zc(struct io_kiocb *req, unsigned int issue_flags)
 	struct io_sr_msg *sr = io_kiocb_to_cmd(req, struct io_sr_msg);
 	struct io_async_msghdr iomsg, *kmsg;
 	struct socket *sock;
-	unsigned flags, cflags;
+	unsigned flags;
 	int ret, min_ret = 0;
 
 	sock = sock_from_file(req->file);
@@ -1178,8 +1174,6 @@ int io_sendmsg_zc(struct io_kiocb *req, unsigned int issue_flags)
 			req->flags |= REQ_F_PARTIAL_IO;
 			return io_setup_async_msg(req, kmsg, issue_flags);
 		}
-		if (ret < 0 && !sr->done_io)
-			sr->notif->flags |= REQ_F_CQE_SKIP;
 		if (ret == -ERESTARTSYS)
 			ret = -EINTR;
 		req_set_fail(req);
@@ -1196,27 +1190,20 @@ int io_sendmsg_zc(struct io_kiocb *req, unsigned int issue_flags)
 		io_notif_flush(sr->notif);
 	req->flags &= ~REQ_F_NEED_CLEANUP;
 
-	cflags = ret >= 0 ? IORING_CQE_F_MORE : 0;
-	io_req_set_res(req, ret, cflags);
+	io_req_set_res(req, ret, IORING_CQE_F_MORE);
 	return IOU_OK;
 }
 
 void io_sendrecv_fail(struct io_kiocb *req)
 {
 	struct io_sr_msg *sr = io_kiocb_to_cmd(req, struct io_sr_msg);
-	int res = req->cqe.res;
 
 	if (req->flags & REQ_F_PARTIAL_IO)
-		res = sr->done_io;
+		req->cqe.res = sr->done_io;
+
 	if ((req->flags & REQ_F_NEED_CLEANUP) &&
-	    (req->opcode == IORING_OP_SEND_ZC || req->opcode == IORING_OP_SENDMSG_ZC)) {
-		/* preserve notification for partial I/O */
-		if (res < 0)
-			sr->notif->flags |= REQ_F_CQE_SKIP;
-		io_notif_flush(sr->notif);
-		sr->notif = NULL;
-	}
-	io_req_set_res(req, res, req->cqe.flags);
+	    (req->opcode == IORING_OP_SEND_ZC || req->opcode == IORING_OP_SENDMSG_ZC))
+		req->cqe.flags |= IORING_CQE_F_MORE;
 }
 
 int io_accept_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
We currently only add a notification CQE when the send succeeded, i.e.
cqe.res >= 0. However, it'd be more robust to do buffer notifications
for failed requests as well in case drivers decide to do something funky.

Always return a buffer notification after initial prep, don't hide it.
This behaviour is better aligned with the documentation and the patch
also helps userspace to respect it.

Cc: stable@vger.kernel.org # 6.0
Suggested-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
We need it as soon as possible, and it's likely almost time for the 6.1 rcs.

 io_uring/net.c | 29 ++++++++---------------------
 1 file changed, 8 insertions(+), 21 deletions(-)
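On the user-visible contract this establishes: a zero-copy send that gets past initial prep now always posts two CQEs, the request completion with IORING_CQE_F_MORE set, followed by a notification CQE carrying IORING_CQE_F_NOTIF, even when the send itself failed. A minimal liburing sketch of consuming both, assuming a single in-flight request and with error checking trimmed (send_zc_and_reap is a made-up helper name):

	#include <liburing.h>

	static int send_zc_and_reap(struct io_uring *ring, int sockfd,
				    const void *buf, size_t len)
	{
		struct io_uring_sqe *sqe = io_uring_get_sqe(ring);
		struct io_uring_cqe *cqe;
		int res;

		io_uring_prep_send_zc(sqe, sockfd, buf, len, 0, 0);
		io_uring_submit(ring);

		/* first CQE: the send result; IORING_CQE_F_MORE says a
		 * notification is still coming, even on failure */
		io_uring_wait_cqe(ring, &cqe);
		res = cqe->res;
		io_uring_cqe_seen(ring, cqe);

		/* second CQE: the buffer notification (IORING_CQE_F_NOTIF);
		 * buf may only be reused after this, even if res < 0 */
		io_uring_wait_cqe(ring, &cqe);
		io_uring_cqe_seen(ring, cqe);

		return res;
	}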