diff mbox series

io_uring/net: save msghdr->msg_control for retries

Message ID 0b0d4411-c8fd-4272-770b-e030af6919a0@kernel.dk (mailing list archive)
State New
Headers show
Series io_uring/net: save msghdr->msg_control for retries | expand

Commit Message

Jens Axboe June 12, 2023, 7:56 p.m. UTC
If the application sets ->msg_control and we have to later retry this
command, or if it got queued with IOSQE_ASYNC to begin with, then we
need to retain the original msg_control value. This is due to the net
stack overwriting this field with an in-kernel pointer, to copy it
in. Hitting that path for the second time will now fail the copy from
user, as it's attempting to copy from a non-user address.

Link: https://github.com/axboe/liburing/issues/880
Signed-off-by: Jens Axboe <axboe@kernel.dk>

---

Comments

Stefan Metzmacher June 19, 2023, 9:57 a.m. UTC | #1
Hi Jens,

> If the application sets ->msg_control and we have to later retry this
> command, or if it got queued with IOSQE_ASYNC to begin with, then we
> need to retain the original msg_control value. This is due to the net
> stack overwriting this field with an in-kernel pointer, to copy it
> in. Hitting that path for the second time will now fail the copy from
> user, as it's attempting to copy from a non-user address.

I'm not 100% sure about the impact of this change.

But I think the logic we need is that only the
first __sys_sendmsg_sock() that returns > 0 should
see msg_control. A retry because of MSG_WAITALL should
clear msg_control[len] for a follow up __sys_sendmsg_sock().
And I fear the patch below would not clear it...

Otherwise the receiver/socket-layer will get the same msg_control twice,
which is unexpected.

metze
Jens Axboe June 19, 2023, 1:05 p.m. UTC | #2
On 6/19/23 3:57?AM, Stefan Metzmacher wrote:
> Hi Jens,
> 
>> If the application sets ->msg_control and we have to later retry this
>> command, or if it got queued with IOSQE_ASYNC to begin with, then we
>> need to retain the original msg_control value. This is due to the net
>> stack overwriting this field with an in-kernel pointer, to copy it
>> in. Hitting that path for the second time will now fail the copy from
>> user, as it's attempting to copy from a non-user address.
> 
> I'm not 100% sure about the impact of this change.
> 
> But I think the logic we need is that only the
> first __sys_sendmsg_sock() that returns > 0 should
> see msg_control. A retry because of MSG_WAITALL should
> clear msg_control[len] for a follow up __sys_sendmsg_sock().
> And I fear the patch below would not clear it...
> 
> Otherwise the receiver/socket-layer will get the same msg_control twice,
> which is unexpected.

Yes agree, if we do transfer some (but not all) data and WAITALL is set,
it should get cleared. I'll post a patch for that.

Note that it was also broken before, just differently broken. The most
likely outcome here was a full retry and now getting -EFAULT.
Stefan Metzmacher June 19, 2023, 1:09 p.m. UTC | #3
Am 19.06.23 um 15:05 schrieb Jens Axboe:
> On 6/19/23 3:57?AM, Stefan Metzmacher wrote:
>> Hi Jens,
>>
>>> If the application sets ->msg_control and we have to later retry this
>>> command, or if it got queued with IOSQE_ASYNC to begin with, then we
>>> need to retain the original msg_control value. This is due to the net
>>> stack overwriting this field with an in-kernel pointer, to copy it
>>> in. Hitting that path for the second time will now fail the copy from
>>> user, as it's attempting to copy from a non-user address.
>>
>> I'm not 100% sure about the impact of this change.
>>
>> But I think the logic we need is that only the
>> first __sys_sendmsg_sock() that returns > 0 should
>> see msg_control. A retry because of MSG_WAITALL should
>> clear msg_control[len] for a follow up __sys_sendmsg_sock().
>> And I fear the patch below would not clear it...
>>
>> Otherwise the receiver/socket-layer will get the same msg_control twice,
>> which is unexpected.
> 
> Yes agree, if we do transfer some (but not all) data and WAITALL is set,
> it should get cleared. I'll post a patch for that.

Thanks!

> Note that it was also broken before, just differently broken. The most
> likely outcome here was a full retry and now getting -EFAULT.

Yes, I can see that it was broken before...

metze
Stefan Metzmacher June 19, 2023, 1:27 p.m. UTC | #4
Am 19.06.23 um 15:09 schrieb Stefan Metzmacher:
> Am 19.06.23 um 15:05 schrieb Jens Axboe:
>> On 6/19/23 3:57?AM, Stefan Metzmacher wrote:
>>> Hi Jens,
>>>
>>>> If the application sets ->msg_control and we have to later retry this
>>>> command, or if it got queued with IOSQE_ASYNC to begin with, then we
>>>> need to retain the original msg_control value. This is due to the net
>>>> stack overwriting this field with an in-kernel pointer, to copy it
>>>> in. Hitting that path for the second time will now fail the copy from
>>>> user, as it's attempting to copy from a non-user address.
>>>
>>> I'm not 100% sure about the impact of this change.
>>>
>>> But I think the logic we need is that only the
>>> first __sys_sendmsg_sock() that returns > 0 should
>>> see msg_control. A retry because of MSG_WAITALL should
>>> clear msg_control[len] for a follow up __sys_sendmsg_sock().
>>> And I fear the patch below would not clear it...
>>>
>>> Otherwise the receiver/socket-layer will get the same msg_control twice,
>>> which is unexpected.
>>
>> Yes agree, if we do transfer some (but not all) data and WAITALL is set,
>> it should get cleared. I'll post a patch for that.
> 
> Thanks!
> 
>> Note that it was also broken before, just differently broken. The most
>> likely outcome here was a full retry and now getting -EFAULT.
> 
> Yes, I can see that it was broken before...

I haven't checked myself, but I'm wondering about the recvmsg case,
I guess we would need to advance the msg_control buffer after each
iteration, in order to avoid overwritting the already received messages
on retry.

This all gets complicated with things like MSG_CTRUNC.

I guess it's too late to reject MSG_WAITALL together with msg_control
for io_recvmsg() because of compat reasons,
but as MSG_WAITALL is also processed in the socket layer, we could keep it
simple for now and skip the this retry logic:

         if (flags & MSG_WAITALL)
                 min_ret = iov_iter_count(&kmsg->msg.msg_iter);

This might become something similar to this,
but likely more complex, as would need to record kmsg->controllen == 0
condition already in io_recvmsg_prep:

         if (flags & MSG_WAITALL && kmsg->controllen == 0)
                 min_ret = iov_iter_count(&kmsg->msg.msg_iter);


metze
Jens Axboe June 19, 2023, 2:38 p.m. UTC | #5
On 6/19/23 7:27?AM, Stefan Metzmacher wrote:
> Am 19.06.23 um 15:09 schrieb Stefan Metzmacher:
>> Am 19.06.23 um 15:05 schrieb Jens Axboe:
>>> On 6/19/23 3:57?AM, Stefan Metzmacher wrote:
>>>> Hi Jens,
>>>>
>>>>> If the application sets ->msg_control and we have to later retry this
>>>>> command, or if it got queued with IOSQE_ASYNC to begin with, then we
>>>>> need to retain the original msg_control value. This is due to the net
>>>>> stack overwriting this field with an in-kernel pointer, to copy it
>>>>> in. Hitting that path for the second time will now fail the copy from
>>>>> user, as it's attempting to copy from a non-user address.
>>>>
>>>> I'm not 100% sure about the impact of this change.
>>>>
>>>> But I think the logic we need is that only the
>>>> first __sys_sendmsg_sock() that returns > 0 should
>>>> see msg_control. A retry because of MSG_WAITALL should
>>>> clear msg_control[len] for a follow up __sys_sendmsg_sock().
>>>> And I fear the patch below would not clear it...
>>>>
>>>> Otherwise the receiver/socket-layer will get the same msg_control twice,
>>>> which is unexpected.
>>>
>>> Yes agree, if we do transfer some (but not all) data and WAITALL is set,
>>> it should get cleared. I'll post a patch for that.
>>
>> Thanks!
>>
>>> Note that it was also broken before, just differently broken. The most
>>> likely outcome here was a full retry and now getting -EFAULT.
>>
>> Yes, I can see that it was broken before...
> 
> I haven't checked myself, but I'm wondering about the recvmsg case,
> I guess we would need to advance the msg_control buffer after each
> iteration, in order to avoid overwritting the already received messages
> on retry.
> 
> This all gets complicated with things like MSG_CTRUNC.
> 
> I guess it's too late to reject MSG_WAITALL together with msg_control
> for io_recvmsg() because of compat reasons,
> but as MSG_WAITALL is also processed in the socket layer, we could keep it
> simple for now and skip the this retry logic:
> 
>         if (flags & MSG_WAITALL)
>                 min_ret = iov_iter_count(&kmsg->msg.msg_iter);
> 
> This might become something similar to this,
> but likely more complex, as would need to record kmsg->controllen == 0
> condition already in io_recvmsg_prep:
> 
>         if (flags & MSG_WAITALL && kmsg->controllen == 0)
>                 min_ret = iov_iter_count(&kmsg->msg.msg_iter);

Yep agree, I think this is the best way - ensure that once we transfer
data with cmsg, it's a one-shot kind of deal.

Do you want to cut a patch for that one?
Stefan Metzmacher June 19, 2023, 2:40 p.m. UTC | #6
Am 19.06.23 um 16:38 schrieb Jens Axboe:
> On 6/19/23 7:27?AM, Stefan Metzmacher wrote:
>> Am 19.06.23 um 15:09 schrieb Stefan Metzmacher:
>>> Am 19.06.23 um 15:05 schrieb Jens Axboe:
>>>> On 6/19/23 3:57?AM, Stefan Metzmacher wrote:
>>>>> Hi Jens,
>>>>>
>>>>>> If the application sets ->msg_control and we have to later retry this
>>>>>> command, or if it got queued with IOSQE_ASYNC to begin with, then we
>>>>>> need to retain the original msg_control value. This is due to the net
>>>>>> stack overwriting this field with an in-kernel pointer, to copy it
>>>>>> in. Hitting that path for the second time will now fail the copy from
>>>>>> user, as it's attempting to copy from a non-user address.
>>>>>
>>>>> I'm not 100% sure about the impact of this change.
>>>>>
>>>>> But I think the logic we need is that only the
>>>>> first __sys_sendmsg_sock() that returns > 0 should
>>>>> see msg_control. A retry because of MSG_WAITALL should
>>>>> clear msg_control[len] for a follow up __sys_sendmsg_sock().
>>>>> And I fear the patch below would not clear it...
>>>>>
>>>>> Otherwise the receiver/socket-layer will get the same msg_control twice,
>>>>> which is unexpected.
>>>>
>>>> Yes agree, if we do transfer some (but not all) data and WAITALL is set,
>>>> it should get cleared. I'll post a patch for that.
>>>
>>> Thanks!
>>>
>>>> Note that it was also broken before, just differently broken. The most
>>>> likely outcome here was a full retry and now getting -EFAULT.
>>>
>>> Yes, I can see that it was broken before...
>>
>> I haven't checked myself, but I'm wondering about the recvmsg case,
>> I guess we would need to advance the msg_control buffer after each
>> iteration, in order to avoid overwritting the already received messages
>> on retry.
>>
>> This all gets complicated with things like MSG_CTRUNC.
>>
>> I guess it's too late to reject MSG_WAITALL together with msg_control
>> for io_recvmsg() because of compat reasons,
>> but as MSG_WAITALL is also processed in the socket layer, we could keep it
>> simple for now and skip the this retry logic:
>>
>>          if (flags & MSG_WAITALL)
>>                  min_ret = iov_iter_count(&kmsg->msg.msg_iter);
>>
>> This might become something similar to this,
>> but likely more complex, as would need to record kmsg->controllen == 0
>> condition already in io_recvmsg_prep:
>>
>>          if (flags & MSG_WAITALL && kmsg->controllen == 0)
>>                  min_ret = iov_iter_count(&kmsg->msg.msg_iter);
> 
> Yep agree, I think this is the best way - ensure that once we transfer
> data with cmsg, it's a one-shot kind of deal.
> 
> Do you want to cut a patch for that one?

No, sorry I'm busy with other stuff and not able to to do any testing...

metze
Jens Axboe June 19, 2023, 2:40 p.m. UTC | #7
On 6/19/23 8:40 AM, Stefan Metzmacher wrote:
> Am 19.06.23 um 16:38 schrieb Jens Axboe:
>> On 6/19/23 7:27?AM, Stefan Metzmacher wrote:
>>> Am 19.06.23 um 15:09 schrieb Stefan Metzmacher:
>>>> Am 19.06.23 um 15:05 schrieb Jens Axboe:
>>>>> On 6/19/23 3:57?AM, Stefan Metzmacher wrote:
>>>>>> Hi Jens,
>>>>>>
>>>>>>> If the application sets ->msg_control and we have to later retry this
>>>>>>> command, or if it got queued with IOSQE_ASYNC to begin with, then we
>>>>>>> need to retain the original msg_control value. This is due to the net
>>>>>>> stack overwriting this field with an in-kernel pointer, to copy it
>>>>>>> in. Hitting that path for the second time will now fail the copy from
>>>>>>> user, as it's attempting to copy from a non-user address.
>>>>>>
>>>>>> I'm not 100% sure about the impact of this change.
>>>>>>
>>>>>> But I think the logic we need is that only the
>>>>>> first __sys_sendmsg_sock() that returns > 0 should
>>>>>> see msg_control. A retry because of MSG_WAITALL should
>>>>>> clear msg_control[len] for a follow up __sys_sendmsg_sock().
>>>>>> And I fear the patch below would not clear it...
>>>>>>
>>>>>> Otherwise the receiver/socket-layer will get the same msg_control twice,
>>>>>> which is unexpected.
>>>>>
>>>>> Yes agree, if we do transfer some (but not all) data and WAITALL is set,
>>>>> it should get cleared. I'll post a patch for that.
>>>>
>>>> Thanks!
>>>>
>>>>> Note that it was also broken before, just differently broken. The most
>>>>> likely outcome here was a full retry and now getting -EFAULT.
>>>>
>>>> Yes, I can see that it was broken before...
>>>
>>> I haven't checked myself, but I'm wondering about the recvmsg case,
>>> I guess we would need to advance the msg_control buffer after each
>>> iteration, in order to avoid overwritting the already received messages
>>> on retry.
>>>
>>> This all gets complicated with things like MSG_CTRUNC.
>>>
>>> I guess it's too late to reject MSG_WAITALL together with msg_control
>>> for io_recvmsg() because of compat reasons,
>>> but as MSG_WAITALL is also processed in the socket layer, we could keep it
>>> simple for now and skip the this retry logic:
>>>
>>>          if (flags & MSG_WAITALL)
>>>                  min_ret = iov_iter_count(&kmsg->msg.msg_iter);
>>>
>>> This might become something similar to this,
>>> but likely more complex, as would need to record kmsg->controllen == 0
>>> condition already in io_recvmsg_prep:
>>>
>>>          if (flags & MSG_WAITALL && kmsg->controllen == 0)
>>>                  min_ret = iov_iter_count(&kmsg->msg.msg_iter);
>>
>> Yep agree, I think this is the best way - ensure that once we transfer
>> data with cmsg, it's a one-shot kind of deal.
>>
>> Do you want to cut a patch for that one?
> 
> No, sorry I'm busy with other stuff and not able to to do any testing...

OK that's fine, I'll post both.
diff mbox series

Patch

diff --git a/io_uring/net.c b/io_uring/net.c
index 89e839013837..51b0f7fbb4f5 100644
--- a/io_uring/net.c
+++ b/io_uring/net.c
@@ -65,6 +65,7 @@  struct io_sr_msg {
 	u16				addr_len;
 	u16				buf_group;
 	void __user			*addr;
+	void __user			*msg_control;
 	/* used only for send zerocopy */
 	struct io_kiocb 		*notif;
 };
@@ -195,11 +196,15 @@  static int io_sendmsg_copy_hdr(struct io_kiocb *req,
 			       struct io_async_msghdr *iomsg)
 {
 	struct io_sr_msg *sr = io_kiocb_to_cmd(req, struct io_sr_msg);
+	int ret;
 
 	iomsg->msg.msg_name = &iomsg->addr;
 	iomsg->free_iov = iomsg->fast_iov;
-	return sendmsg_copy_msghdr(&iomsg->msg, sr->umsg, sr->msg_flags,
+	ret = sendmsg_copy_msghdr(&iomsg->msg, sr->umsg, sr->msg_flags,
 					&iomsg->free_iov);
+	/* save msg_control as sys_sendmsg() overwrites it */
+	sr->msg_control = iomsg->msg.msg_control;
+	return ret;
 }
 
 int io_send_prep_async(struct io_kiocb *req)
@@ -297,6 +302,7 @@  int io_sendmsg(struct io_kiocb *req, unsigned int issue_flags)
 
 	if (req_has_async_data(req)) {
 		kmsg = req->async_data;
+		kmsg->msg.msg_control = sr->msg_control;
 	} else {
 		ret = io_sendmsg_copy_hdr(req, &iomsg);
 		if (ret)