Message ID | 0b0d4411-c8fd-4272-770b-e030af6919a0@kernel.dk (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | io_uring/net: save msghdr->msg_control for retries | expand |
Hi Jens, > If the application sets ->msg_control and we have to later retry this > command, or if it got queued with IOSQE_ASYNC to begin with, then we > need to retain the original msg_control value. This is due to the net > stack overwriting this field with an in-kernel pointer, to copy it > in. Hitting that path for the second time will now fail the copy from > user, as it's attempting to copy from a non-user address. I'm not 100% sure about the impact of this change. But I think the logic we need is that only the first __sys_sendmsg_sock() that returns > 0 should see msg_control. A retry because of MSG_WAITALL should clear msg_control[len] for a follow up __sys_sendmsg_sock(). And I fear the patch below would not clear it... Otherwise the receiver/socket-layer will get the same msg_control twice, which is unexpected. metze
On 6/19/23 3:57?AM, Stefan Metzmacher wrote: > Hi Jens, > >> If the application sets ->msg_control and we have to later retry this >> command, or if it got queued with IOSQE_ASYNC to begin with, then we >> need to retain the original msg_control value. This is due to the net >> stack overwriting this field with an in-kernel pointer, to copy it >> in. Hitting that path for the second time will now fail the copy from >> user, as it's attempting to copy from a non-user address. > > I'm not 100% sure about the impact of this change. > > But I think the logic we need is that only the > first __sys_sendmsg_sock() that returns > 0 should > see msg_control. A retry because of MSG_WAITALL should > clear msg_control[len] for a follow up __sys_sendmsg_sock(). > And I fear the patch below would not clear it... > > Otherwise the receiver/socket-layer will get the same msg_control twice, > which is unexpected. Yes agree, if we do transfer some (but not all) data and WAITALL is set, it should get cleared. I'll post a patch for that. Note that it was also broken before, just differently broken. The most likely outcome here was a full retry and now getting -EFAULT.
Am 19.06.23 um 15:05 schrieb Jens Axboe: > On 6/19/23 3:57?AM, Stefan Metzmacher wrote: >> Hi Jens, >> >>> If the application sets ->msg_control and we have to later retry this >>> command, or if it got queued with IOSQE_ASYNC to begin with, then we >>> need to retain the original msg_control value. This is due to the net >>> stack overwriting this field with an in-kernel pointer, to copy it >>> in. Hitting that path for the second time will now fail the copy from >>> user, as it's attempting to copy from a non-user address. >> >> I'm not 100% sure about the impact of this change. >> >> But I think the logic we need is that only the >> first __sys_sendmsg_sock() that returns > 0 should >> see msg_control. A retry because of MSG_WAITALL should >> clear msg_control[len] for a follow up __sys_sendmsg_sock(). >> And I fear the patch below would not clear it... >> >> Otherwise the receiver/socket-layer will get the same msg_control twice, >> which is unexpected. > > Yes agree, if we do transfer some (but not all) data and WAITALL is set, > it should get cleared. I'll post a patch for that. Thanks! > Note that it was also broken before, just differently broken. The most > likely outcome here was a full retry and now getting -EFAULT. Yes, I can see that it was broken before... metze
Am 19.06.23 um 15:09 schrieb Stefan Metzmacher: > Am 19.06.23 um 15:05 schrieb Jens Axboe: >> On 6/19/23 3:57?AM, Stefan Metzmacher wrote: >>> Hi Jens, >>> >>>> If the application sets ->msg_control and we have to later retry this >>>> command, or if it got queued with IOSQE_ASYNC to begin with, then we >>>> need to retain the original msg_control value. This is due to the net >>>> stack overwriting this field with an in-kernel pointer, to copy it >>>> in. Hitting that path for the second time will now fail the copy from >>>> user, as it's attempting to copy from a non-user address. >>> >>> I'm not 100% sure about the impact of this change. >>> >>> But I think the logic we need is that only the >>> first __sys_sendmsg_sock() that returns > 0 should >>> see msg_control. A retry because of MSG_WAITALL should >>> clear msg_control[len] for a follow up __sys_sendmsg_sock(). >>> And I fear the patch below would not clear it... >>> >>> Otherwise the receiver/socket-layer will get the same msg_control twice, >>> which is unexpected. >> >> Yes agree, if we do transfer some (but not all) data and WAITALL is set, >> it should get cleared. I'll post a patch for that. > > Thanks! > >> Note that it was also broken before, just differently broken. The most >> likely outcome here was a full retry and now getting -EFAULT. > > Yes, I can see that it was broken before... I haven't checked myself, but I'm wondering about the recvmsg case, I guess we would need to advance the msg_control buffer after each iteration, in order to avoid overwritting the already received messages on retry. This all gets complicated with things like MSG_CTRUNC. I guess it's too late to reject MSG_WAITALL together with msg_control for io_recvmsg() because of compat reasons, but as MSG_WAITALL is also processed in the socket layer, we could keep it simple for now and skip the this retry logic: if (flags & MSG_WAITALL) min_ret = iov_iter_count(&kmsg->msg.msg_iter); This might become something similar to this, but likely more complex, as would need to record kmsg->controllen == 0 condition already in io_recvmsg_prep: if (flags & MSG_WAITALL && kmsg->controllen == 0) min_ret = iov_iter_count(&kmsg->msg.msg_iter); metze
On 6/19/23 7:27?AM, Stefan Metzmacher wrote: > Am 19.06.23 um 15:09 schrieb Stefan Metzmacher: >> Am 19.06.23 um 15:05 schrieb Jens Axboe: >>> On 6/19/23 3:57?AM, Stefan Metzmacher wrote: >>>> Hi Jens, >>>> >>>>> If the application sets ->msg_control and we have to later retry this >>>>> command, or if it got queued with IOSQE_ASYNC to begin with, then we >>>>> need to retain the original msg_control value. This is due to the net >>>>> stack overwriting this field with an in-kernel pointer, to copy it >>>>> in. Hitting that path for the second time will now fail the copy from >>>>> user, as it's attempting to copy from a non-user address. >>>> >>>> I'm not 100% sure about the impact of this change. >>>> >>>> But I think the logic we need is that only the >>>> first __sys_sendmsg_sock() that returns > 0 should >>>> see msg_control. A retry because of MSG_WAITALL should >>>> clear msg_control[len] for a follow up __sys_sendmsg_sock(). >>>> And I fear the patch below would not clear it... >>>> >>>> Otherwise the receiver/socket-layer will get the same msg_control twice, >>>> which is unexpected. >>> >>> Yes agree, if we do transfer some (but not all) data and WAITALL is set, >>> it should get cleared. I'll post a patch for that. >> >> Thanks! >> >>> Note that it was also broken before, just differently broken. The most >>> likely outcome here was a full retry and now getting -EFAULT. >> >> Yes, I can see that it was broken before... > > I haven't checked myself, but I'm wondering about the recvmsg case, > I guess we would need to advance the msg_control buffer after each > iteration, in order to avoid overwritting the already received messages > on retry. > > This all gets complicated with things like MSG_CTRUNC. > > I guess it's too late to reject MSG_WAITALL together with msg_control > for io_recvmsg() because of compat reasons, > but as MSG_WAITALL is also processed in the socket layer, we could keep it > simple for now and skip the this retry logic: > > if (flags & MSG_WAITALL) > min_ret = iov_iter_count(&kmsg->msg.msg_iter); > > This might become something similar to this, > but likely more complex, as would need to record kmsg->controllen == 0 > condition already in io_recvmsg_prep: > > if (flags & MSG_WAITALL && kmsg->controllen == 0) > min_ret = iov_iter_count(&kmsg->msg.msg_iter); Yep agree, I think this is the best way - ensure that once we transfer data with cmsg, it's a one-shot kind of deal. Do you want to cut a patch for that one?
Am 19.06.23 um 16:38 schrieb Jens Axboe: > On 6/19/23 7:27?AM, Stefan Metzmacher wrote: >> Am 19.06.23 um 15:09 schrieb Stefan Metzmacher: >>> Am 19.06.23 um 15:05 schrieb Jens Axboe: >>>> On 6/19/23 3:57?AM, Stefan Metzmacher wrote: >>>>> Hi Jens, >>>>> >>>>>> If the application sets ->msg_control and we have to later retry this >>>>>> command, or if it got queued with IOSQE_ASYNC to begin with, then we >>>>>> need to retain the original msg_control value. This is due to the net >>>>>> stack overwriting this field with an in-kernel pointer, to copy it >>>>>> in. Hitting that path for the second time will now fail the copy from >>>>>> user, as it's attempting to copy from a non-user address. >>>>> >>>>> I'm not 100% sure about the impact of this change. >>>>> >>>>> But I think the logic we need is that only the >>>>> first __sys_sendmsg_sock() that returns > 0 should >>>>> see msg_control. A retry because of MSG_WAITALL should >>>>> clear msg_control[len] for a follow up __sys_sendmsg_sock(). >>>>> And I fear the patch below would not clear it... >>>>> >>>>> Otherwise the receiver/socket-layer will get the same msg_control twice, >>>>> which is unexpected. >>>> >>>> Yes agree, if we do transfer some (but not all) data and WAITALL is set, >>>> it should get cleared. I'll post a patch for that. >>> >>> Thanks! >>> >>>> Note that it was also broken before, just differently broken. The most >>>> likely outcome here was a full retry and now getting -EFAULT. >>> >>> Yes, I can see that it was broken before... >> >> I haven't checked myself, but I'm wondering about the recvmsg case, >> I guess we would need to advance the msg_control buffer after each >> iteration, in order to avoid overwritting the already received messages >> on retry. >> >> This all gets complicated with things like MSG_CTRUNC. >> >> I guess it's too late to reject MSG_WAITALL together with msg_control >> for io_recvmsg() because of compat reasons, >> but as MSG_WAITALL is also processed in the socket layer, we could keep it >> simple for now and skip the this retry logic: >> >> if (flags & MSG_WAITALL) >> min_ret = iov_iter_count(&kmsg->msg.msg_iter); >> >> This might become something similar to this, >> but likely more complex, as would need to record kmsg->controllen == 0 >> condition already in io_recvmsg_prep: >> >> if (flags & MSG_WAITALL && kmsg->controllen == 0) >> min_ret = iov_iter_count(&kmsg->msg.msg_iter); > > Yep agree, I think this is the best way - ensure that once we transfer > data with cmsg, it's a one-shot kind of deal. > > Do you want to cut a patch for that one? No, sorry I'm busy with other stuff and not able to to do any testing... metze
On 6/19/23 8:40 AM, Stefan Metzmacher wrote: > Am 19.06.23 um 16:38 schrieb Jens Axboe: >> On 6/19/23 7:27?AM, Stefan Metzmacher wrote: >>> Am 19.06.23 um 15:09 schrieb Stefan Metzmacher: >>>> Am 19.06.23 um 15:05 schrieb Jens Axboe: >>>>> On 6/19/23 3:57?AM, Stefan Metzmacher wrote: >>>>>> Hi Jens, >>>>>> >>>>>>> If the application sets ->msg_control and we have to later retry this >>>>>>> command, or if it got queued with IOSQE_ASYNC to begin with, then we >>>>>>> need to retain the original msg_control value. This is due to the net >>>>>>> stack overwriting this field with an in-kernel pointer, to copy it >>>>>>> in. Hitting that path for the second time will now fail the copy from >>>>>>> user, as it's attempting to copy from a non-user address. >>>>>> >>>>>> I'm not 100% sure about the impact of this change. >>>>>> >>>>>> But I think the logic we need is that only the >>>>>> first __sys_sendmsg_sock() that returns > 0 should >>>>>> see msg_control. A retry because of MSG_WAITALL should >>>>>> clear msg_control[len] for a follow up __sys_sendmsg_sock(). >>>>>> And I fear the patch below would not clear it... >>>>>> >>>>>> Otherwise the receiver/socket-layer will get the same msg_control twice, >>>>>> which is unexpected. >>>>> >>>>> Yes agree, if we do transfer some (but not all) data and WAITALL is set, >>>>> it should get cleared. I'll post a patch for that. >>>> >>>> Thanks! >>>> >>>>> Note that it was also broken before, just differently broken. The most >>>>> likely outcome here was a full retry and now getting -EFAULT. >>>> >>>> Yes, I can see that it was broken before... >>> >>> I haven't checked myself, but I'm wondering about the recvmsg case, >>> I guess we would need to advance the msg_control buffer after each >>> iteration, in order to avoid overwritting the already received messages >>> on retry. >>> >>> This all gets complicated with things like MSG_CTRUNC. >>> >>> I guess it's too late to reject MSG_WAITALL together with msg_control >>> for io_recvmsg() because of compat reasons, >>> but as MSG_WAITALL is also processed in the socket layer, we could keep it >>> simple for now and skip the this retry logic: >>> >>> if (flags & MSG_WAITALL) >>> min_ret = iov_iter_count(&kmsg->msg.msg_iter); >>> >>> This might become something similar to this, >>> but likely more complex, as would need to record kmsg->controllen == 0 >>> condition already in io_recvmsg_prep: >>> >>> if (flags & MSG_WAITALL && kmsg->controllen == 0) >>> min_ret = iov_iter_count(&kmsg->msg.msg_iter); >> >> Yep agree, I think this is the best way - ensure that once we transfer >> data with cmsg, it's a one-shot kind of deal. >> >> Do you want to cut a patch for that one? > > No, sorry I'm busy with other stuff and not able to to do any testing... OK that's fine, I'll post both.
diff --git a/io_uring/net.c b/io_uring/net.c index 89e839013837..51b0f7fbb4f5 100644 --- a/io_uring/net.c +++ b/io_uring/net.c @@ -65,6 +65,7 @@ struct io_sr_msg { u16 addr_len; u16 buf_group; void __user *addr; + void __user *msg_control; /* used only for send zerocopy */ struct io_kiocb *notif; }; @@ -195,11 +196,15 @@ static int io_sendmsg_copy_hdr(struct io_kiocb *req, struct io_async_msghdr *iomsg) { struct io_sr_msg *sr = io_kiocb_to_cmd(req, struct io_sr_msg); + int ret; iomsg->msg.msg_name = &iomsg->addr; iomsg->free_iov = iomsg->fast_iov; - return sendmsg_copy_msghdr(&iomsg->msg, sr->umsg, sr->msg_flags, + ret = sendmsg_copy_msghdr(&iomsg->msg, sr->umsg, sr->msg_flags, &iomsg->free_iov); + /* save msg_control as sys_sendmsg() overwrites it */ + sr->msg_control = iomsg->msg.msg_control; + return ret; } int io_send_prep_async(struct io_kiocb *req) @@ -297,6 +302,7 @@ int io_sendmsg(struct io_kiocb *req, unsigned int issue_flags) if (req_has_async_data(req)) { kmsg = req->async_data; + kmsg->msg.msg_control = sr->msg_control; } else { ret = io_sendmsg_copy_hdr(req, &iomsg); if (ret)
If the application sets ->msg_control and we have to later retry this command, or if it got queued with IOSQE_ASYNC to begin with, then we need to retain the original msg_control value. This is due to the net stack overwriting this field with an in-kernel pointer, to copy it in. Hitting that path for the second time will now fail the copy from user, as it's attempting to copy from a non-user address. Link: https://github.com/axboe/liburing/issues/880 Signed-off-by: Jens Axboe <axboe@kernel.dk> ---