
[1/1] io_uring/rw: forbid multishot async reads

Message ID 49f065f7b88b7a67190ce65919e7c93c04b3a74b.1739799395.git.asml.silence@gmail.com (mailing list archive)
State New
Series [1/1] io_uring/rw: forbid multishot async reads

Commit Message

Pavel Begunkov Feb. 17, 2025, 1:37 p.m. UTC
At the moment we can't sanely handle queuing an async request from a
multishot context, so disable them. It shouldn't matter as pollable
files / sockets don't normally do async.

Cc: stable@vger.kernel.org
Fixes: fc68fcda04910 ("io_uring/rw: add support for IORING_OP_READ_MULTISHOT")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 io_uring/rw.c | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

Comments

Pavel Begunkov Feb. 17, 2025, 1:49 p.m. UTC | #1
On 2/17/25 13:37, Pavel Begunkov wrote:
> At the moment we can't sanely handle queuing an async request from a
> multishot context, so disable them. It shouldn't matter as pollable
> files / sockets don't normally do async.

I forgot:

Reported-by: chase xd <sl1589472800@gmail.com>

> 
> Cc: stable@vger.kernel.org
> Fixes: fc68fcda04910 ("io_uring/rw: add support for IORING_OP_READ_MULTISHOT")
> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
> ---
>   io_uring/rw.c | 13 +++++++++++--
>   1 file changed, 11 insertions(+), 2 deletions(-)
> 
> diff --git a/io_uring/rw.c b/io_uring/rw.c
> index 96b42c331267..4bda46c5eb20 100644
> --- a/io_uring/rw.c
> +++ b/io_uring/rw.c
> @@ -878,7 +878,15 @@ static int __io_read(struct io_kiocb *req, unsigned int issue_flags)
>   	if (unlikely(ret))
>   		return ret;
>   
> -	ret = io_iter_do_read(rw, &io->iter);
> +	if (unlikely(req->opcode == IORING_OP_READ_MULTISHOT)) {
> +		void *cb_copy = rw->kiocb.ki_complete;
> +
> +		rw->kiocb.ki_complete = NULL;
> +		ret = io_iter_do_read(rw, &io->iter);
> +		rw->kiocb.ki_complete = cb_copy;
> +	} else {
> +		ret = io_iter_do_read(rw, &io->iter);
> +	}
>   
>   	/*
>   	 * Some file systems like to return -EOPNOTSUPP for an IOCB_NOWAIT
> @@ -902,7 +910,8 @@ static int __io_read(struct io_kiocb *req, unsigned int issue_flags)
>   	} else if (ret == -EIOCBQUEUED) {
>   		return IOU_ISSUE_SKIP_COMPLETE;
>   	} else if (ret == req->cqe.res || ret <= 0 || !force_nonblock ||
> -		   (req->flags & REQ_F_NOWAIT) || !need_complete_io(req)) {
> +		   (req->flags & REQ_F_NOWAIT) || !need_complete_io(req) ||
> +		   (issue_flags & IO_URING_F_MULTISHOT)) {
>   		/* read all, failed, already did sync or don't want to retry */
>   		goto done;
>   	}
Jens Axboe Feb. 17, 2025, 1:58 p.m. UTC | #2
On 2/17/25 6:37 AM, Pavel Begunkov wrote:
> At the moment we can't sanely handle queuing an async request from a
> multishot context, so disable them. It shouldn't matter as pollable
> files / sockets don't normally do async.

Having something pollable that can return -EIOCBQUEUED is odd, but
that's just a side comment.


> diff --git a/io_uring/rw.c b/io_uring/rw.c
> index 96b42c331267..4bda46c5eb20 100644
> --- a/io_uring/rw.c
> +++ b/io_uring/rw.c
> @@ -878,7 +878,15 @@ static int __io_read(struct io_kiocb *req, unsigned int issue_flags)
>  	if (unlikely(ret))
>  		return ret;
>  
> -	ret = io_iter_do_read(rw, &io->iter);
> +	if (unlikely(req->opcode == IORING_OP_READ_MULTISHOT)) {
> +		void *cb_copy = rw->kiocb.ki_complete;
> +
> +		rw->kiocb.ki_complete = NULL;
> +		ret = io_iter_do_read(rw, &io->iter);
> +		rw->kiocb.ki_complete = cb_copy;
> +	} else {
> +		ret = io_iter_do_read(rw, &io->iter);
> +	}

This looks a bit odd. Why can't io_read_mshot() just clear
->ki_complete?

The kiocb semantics of ki_complete == NULL -> sync kiocb is also odd,
but probably fine for this case as read mshot strictly deals with
pollable files. Otherwise you'd just be blocking off this issue,
regardless of whether or not IOCB_NOWAIT is set.

In any case, it'd be much nicer to contain this in io_read_mshot()
where it arguably belongs, rather than sprinkle it in __io_read().
Possible?
Pavel Begunkov Feb. 17, 2025, 2:03 p.m. UTC | #3
On 2/17/25 13:58, Jens Axboe wrote:
> On 2/17/25 6:37 AM, Pavel Begunkov wrote:
>> At the moment we can't sanely handle queuing an async request from a
>> multishot context, so disable them. It shouldn't matter as pollable
>> files / sockets don't normally do async.
> 
> Having something pollable that can return -EIOCBQUEUED is odd, but
> that's just a side comment.

[off list]

It is, but in short alg sockets do that. See
struct msghdr::msg_iocb.
Pavel Begunkov Feb. 17, 2025, 2:04 p.m. UTC | #4
On 2/17/25 14:03, Pavel Begunkov wrote:
> On 2/17/25 13:58, Jens Axboe wrote:
>> On 2/17/25 6:37 AM, Pavel Begunkov wrote:
>>> At the moment we can't sanely handle queuing an async request from a
>>> multishot context, so disable them. It shouldn't matter as pollable
>>> files / sockets don't normally do async.
>>
>> Having something pollable that can return -EIOCBQUEUED is odd, but
>> that's just a side comment.
> 
> [off list]
> 
> It is, but in short alg sockets do that. See
> struct msghdr::msg_iocb

Well, not really off list, but doesn't matter
Pavel Begunkov Feb. 17, 2025, 2:08 p.m. UTC | #5
On 2/17/25 13:58, Jens Axboe wrote:
> On 2/17/25 6:37 AM, Pavel Begunkov wrote:
> 
> The kiocb semantics of ki_complete == NULL -> sync kiocb is also odd,

That's what is_sync_kiocb() does. It would be cleaner to use
init_sync_kiocb(), but there's a larger chance of doing something
wrong, as it reinitialises the kiocb entirely.

> but probably fine for this case as read mshot strictly deals with
> pollable files. Otherwise you'd just be blocking off this issue,
> regardless of whether or not IOCB_NOWAIT is set.
> 
> In any case, it'd be much nicer to contain this in io_read_mshot()
> where it arguably belongs, rather than sprinkle it in __io_read().
> Possible?

That's what I tried first, but __io_read() -> io_rw_init_file()
reinitialises it, so I don't see any good way without some
broader refactoring.
Pavel Begunkov Feb. 17, 2025, 2:12 p.m. UTC | #6
On 2/17/25 13:58, Jens Axboe wrote:
> On 2/17/25 6:37 AM, Pavel Begunkov wrote:
>> At the moment we can't sanely handle queuing an async request from a
>> multishot context, so disable them. It shouldn't matter as pollable
>> files / sockets don't normally do async.
> 
> Having something pollable that can return -EIOCBQUEUED is odd, but
> that's just a side comment.
> 
> 
>> diff --git a/io_uring/rw.c b/io_uring/rw.c
>> index 96b42c331267..4bda46c5eb20 100644
>> --- a/io_uring/rw.c
>> +++ b/io_uring/rw.c
>> @@ -878,7 +878,15 @@ static int __io_read(struct io_kiocb *req, unsigned int issue_flags)
>>   	if (unlikely(ret))
>>   		return ret;
>>   
>> -	ret = io_iter_do_read(rw, &io->iter);
>> +	if (unlikely(req->opcode == IORING_OP_READ_MULTISHOT)) {
>> +		void *cb_copy = rw->kiocb.ki_complete;
>> +
>> +		rw->kiocb.ki_complete = NULL;
>> +		ret = io_iter_do_read(rw, &io->iter);
>> +		rw->kiocb.ki_complete = cb_copy;
>> +	} else {
>> +		ret = io_iter_do_read(rw, &io->iter);
>> +	}
> 
> This looks a bit odd. Why can't io_read_mshot() just clear
> ->ki_complete?

Forgot about that one. As for restoring it afterwards: io_uring compares
or calls ->ki_complete in a couple of places, so this way the patch
is more contained. It can definitely be refactored on top.
Jens Axboe Feb. 17, 2025, 3:06 p.m. UTC | #7
On 2/17/25 7:12 AM, Pavel Begunkov wrote:
> On 2/17/25 13:58, Jens Axboe wrote:
>> On 2/17/25 6:37 AM, Pavel Begunkov wrote:
>>> At the moment we can't sanely handle queuing an async request from a
>>> multishot context, so disable them. It shouldn't matter as pollable
>>> files / sockets don't normally do async.
>>
>> Having something pollable that can return -EIOCBQUEUED is odd, but
>> that's just a side comment.
>>
>>
>>> diff --git a/io_uring/rw.c b/io_uring/rw.c
>>> index 96b42c331267..4bda46c5eb20 100644
>>> --- a/io_uring/rw.c
>>> +++ b/io_uring/rw.c
>>> @@ -878,7 +878,15 @@ static int __io_read(struct io_kiocb *req, unsigned int issue_flags)
>>>       if (unlikely(ret))
>>>           return ret;
>>>   -    ret = io_iter_do_read(rw, &io->iter);
>>> +    if (unlikely(req->opcode == IORING_OP_READ_MULTISHOT)) {
>>> +        void *cb_copy = rw->kiocb.ki_complete;
>>> +
>>> +        rw->kiocb.ki_complete = NULL;
>>> +        ret = io_iter_do_read(rw, &io->iter);
>>> +        rw->kiocb.ki_complete = cb_copy;
>>> +    } else {
>>> +        ret = io_iter_do_read(rw, &io->iter);
>>> +    }
>>
>> This looks a bit odd. Why can't io_read_mshot() just clear
>> ->ki_complete?
> 
> Forgot about that one. As for restoring it afterwards: io_uring compares
> or calls ->ki_complete in a couple of places, so this way the patch
> is more contained. It can definitely be refactored on top.

I'd be tempted to do that for the fix too, the patch as-is is a
bit of an eye sore... Hmm.
Pavel Begunkov Feb. 17, 2025, 3:33 p.m. UTC | #8
On 2/17/25 15:06, Jens Axboe wrote:
> On 2/17/25 7:12 AM, Pavel Begunkov wrote:
>> On 2/17/25 13:58, Jens Axboe wrote:
>>> On 2/17/25 6:37 AM, Pavel Begunkov wrote:
>>>> At the moment we can't sanely handle queuing an async request from a
>>>> multishot context, so disable them. It shouldn't matter as pollable
>>>> files / sockets don't normally do async.
>>>
>>> Having something pollable that can return -EIOCBQUEUED is odd, but
>>> that's just a side comment.
>>>
>>>
>>>> diff --git a/io_uring/rw.c b/io_uring/rw.c
>>>> index 96b42c331267..4bda46c5eb20 100644
>>>> --- a/io_uring/rw.c
>>>> +++ b/io_uring/rw.c
>>>> @@ -878,7 +878,15 @@ static int __io_read(struct io_kiocb *req, unsigned int issue_flags)
>>>>        if (unlikely(ret))
>>>>            return ret;
>>>>    -    ret = io_iter_do_read(rw, &io->iter);
>>>> +    if (unlikely(req->opcode == IORING_OP_READ_MULTISHOT)) {
>>>> +        void *cb_copy = rw->kiocb.ki_complete;
>>>> +
>>>> +        rw->kiocb.ki_complete = NULL;
>>>> +        ret = io_iter_do_read(rw, &io->iter);
>>>> +        rw->kiocb.ki_complete = cb_copy;
>>>> +    } else {
>>>> +        ret = io_iter_do_read(rw, &io->iter);
>>>> +    }
>>>
>>> This looks a bit odd. Why can't io_read_mshot() just clear
>>> ->ki_complete?
>>
>> Forgot about that one. As for restoring it afterwards: io_uring compares
>> or calls ->ki_complete in a couple of places, so this way the patch
>> is more contained. It can definitely be refactored on top.
> 
> I'd be tempted to do that for the fix too, the patch as-is is a
> bit of an eye sore... Hmm.

It is an eyesore, sure, but I think a simple/concise eyesore is
better as a fix than having to change a couple more blocks across
rw.c. It probably wouldn't be too many changes, but I can't say
I'm concerned about this version too much as long as it can be
reshuffled later.
Jens Axboe Feb. 17, 2025, 3:37 p.m. UTC | #9
On 2/17/25 7:08 AM, Pavel Begunkov wrote:
> On 2/17/25 13:58, Jens Axboe wrote:
>> On 2/17/25 6:37 AM, Pavel Begunkov wrote:
>>
>> The kiocb semantics of ki_complete == NULL -> sync kiocb is also odd,
> 
> That's what is_sync_kiocb() does. It would be cleaner to use
> init_sync_kiocb(), but there's a larger chance of doing something
> wrong, as it reinitialises the kiocb entirely.

Sorry if that wasn't clear, yeah I do realize this is what
is_sync_kiocb() checks. I do agree that manually clearing is saner.

>> but probably fine for this case as read mshot strictly deals with
>> pollable files. Otherwise you'd just be blocking off this issue,
>> regardless of whether or not IOCB_NOWAIT is set.
>>
>> In any case, it'd be much nicer to contain this in io_read_mshot()
>> where it arguably belongs, rather than sprinkle it in __io_read().
>> Possible?
> 
> That's what I tried first, but __io_read() -> io_rw_init_file()
> reinitialises it, so I don't see any good way without some
> broader refactoring.

Would it be bad? The only reason the kiocb init part is in there is
because of the ->iopoll() check; that could still be there, with the rest
of the init going into normal prep (as it arguably should).

Something like the below, totally untested...

diff --git a/io_uring/rw.c b/io_uring/rw.c
index 16f12f94943f..f8dd9a9fe9ca 100644
--- a/io_uring/rw.c
+++ b/io_uring/rw.c
@@ -264,6 +264,9 @@ static int io_prep_rw_pi(struct io_kiocb *req, struct io_rw *rw, int ddir,
 	return ret;
 }
 
+static void io_complete_rw(struct kiocb *kiocb, long res);
+static void io_complete_rw_iopoll(struct kiocb *kiocb, long res);
+
 static int io_prep_rw(struct io_kiocb *req, const struct io_uring_sqe *sqe,
 		      int ddir, bool do_import)
 {
@@ -288,6 +291,18 @@ static int io_prep_rw(struct io_kiocb *req, const struct io_uring_sqe *sqe,
 	}
 	rw->kiocb.dio_complete = NULL;
 	rw->kiocb.ki_flags = 0;
+	if (req->ctx->flags & IORING_SETUP_IOPOLL) {
+		if (!(rw->kiocb.ki_flags & IOCB_DIRECT))
+			return -EOPNOTSUPP;
+
+		rw->kiocb.private = NULL;
+		rw->kiocb.ki_flags |= IOCB_HIPRI;
+		rw->kiocb.ki_complete = io_complete_rw_iopoll;
+	} else {
+		if (rw->kiocb.ki_flags & IOCB_HIPRI)
+			return -EINVAL;
+		rw->kiocb.ki_complete = io_complete_rw;
+	}
 
 	rw->addr = READ_ONCE(sqe->addr);
 	rw->len = READ_ONCE(sqe->len);
@@ -810,23 +825,15 @@ static int io_rw_init_file(struct io_kiocb *req, fmode_t mode, int rw_type)
 	    ((file->f_flags & O_NONBLOCK && !(req->flags & REQ_F_SUPPORT_NOWAIT))))
 		req->flags |= REQ_F_NOWAIT;
 
-	if (ctx->flags & IORING_SETUP_IOPOLL) {
-		if (!(kiocb->ki_flags & IOCB_DIRECT) || !file->f_op->iopoll)
+	if (kiocb->ki_flags & IOCB_HIPRI) {
+		if (!file->f_op->iopoll)
 			return -EOPNOTSUPP;
-
-		kiocb->private = NULL;
-		kiocb->ki_flags |= IOCB_HIPRI;
-		kiocb->ki_complete = io_complete_rw_iopoll;
 		req->iopoll_completed = 0;
 		if (ctx->flags & IORING_SETUP_HYBRID_IOPOLL) {
 			/* make sure every req only blocks once*/
 			req->flags &= ~REQ_F_IOPOLL_STATE;
 			req->iopoll_start = ktime_get_ns();
 		}
-	} else {
-		if (kiocb->ki_flags & IOCB_HIPRI)
-			return -EINVAL;
-		kiocb->ki_complete = io_complete_rw;
 	}
 
 	if (req->flags & REQ_F_HAS_METADATA) {
Jens Axboe Feb. 17, 2025, 3:57 p.m. UTC | #10
On 2/17/25 8:33 AM, Pavel Begunkov wrote:
> On 2/17/25 15:06, Jens Axboe wrote:
>> On 2/17/25 7:12 AM, Pavel Begunkov wrote:
>>> On 2/17/25 13:58, Jens Axboe wrote:
>>>> On 2/17/25 6:37 AM, Pavel Begunkov wrote:
>>>>> At the moment we can't sanely handle queuing an async request from a
>>>>> multishot context, so disable them. It shouldn't matter as pollable
>>>>> files / sockets don't normally do async.
>>>>
>>>> Having something pollable that can return -EIOCBQUEUED is odd, but
>>>> that's just a side comment.
>>>>
>>>>
>>>>> diff --git a/io_uring/rw.c b/io_uring/rw.c
>>>>> index 96b42c331267..4bda46c5eb20 100644
>>>>> --- a/io_uring/rw.c
>>>>> +++ b/io_uring/rw.c
>>>>> @@ -878,7 +878,15 @@ static int __io_read(struct io_kiocb *req, unsigned int issue_flags)
>>>>>        if (unlikely(ret))
>>>>>            return ret;
>>>>>    -    ret = io_iter_do_read(rw, &io->iter);
>>>>> +    if (unlikely(req->opcode == IORING_OP_READ_MULTISHOT)) {
>>>>> +        void *cb_copy = rw->kiocb.ki_complete;
>>>>> +
>>>>> +        rw->kiocb.ki_complete = NULL;
>>>>> +        ret = io_iter_do_read(rw, &io->iter);
>>>>> +        rw->kiocb.ki_complete = cb_copy;
>>>>> +    } else {
>>>>> +        ret = io_iter_do_read(rw, &io->iter);
>>>>> +    }
>>>>
>>>> This looks a bit odd. Why can't io_read_mshot() just clear
>>>> ->ki_complete?
>>>
>>> Forgot about that one. As for restoring it afterwards: io_uring compares
>>> or calls ->ki_complete in a couple of places, so this way the patch
>>> is more contained. It can definitely be refactored on top.
>>
>> I'd be tempted to do that for the fix too, the patch as-is is a
>> bit of an eye sore... Hmm.
> 
> It is an eyesore, sure, but I think a simple/concise eyesore is
> better as a fix than having to change a couple more blocks across
> rw.c. It probably wouldn't be too many changes, but I can't say
> I'm concerned about this version too much as long as it can be
> reshuffled later.

Sure, as discussed let's do a cleanup series on top. You'll send out
a v2 with some improved commit message wording?
Pavel Begunkov Feb. 17, 2025, 5:51 p.m. UTC | #11
On 2/17/25 15:57, Jens Axboe wrote:
> On 2/17/25 8:33 AM, Pavel Begunkov wrote:
>> On 2/17/25 15:06, Jens Axboe wrote:
>>> On 2/17/25 7:12 AM, Pavel Begunkov wrote:
>>>> On 2/17/25 13:58, Jens Axboe wrote:
>>>>> On 2/17/25 6:37 AM, Pavel Begunkov wrote:
>>>>>> At the moment we can't sanely handle queuing an async request from a
>>>>>> multishot context, so disable them. It shouldn't matter as pollable
>>>>>> files / sockets don't normally do async.
>>>>>
>>>>> Having something pollable that can return -EIOCBQUEUED is odd, but
>>>>> that's just a side comment.
>>>>>
>>>>>
>>>>>> diff --git a/io_uring/rw.c b/io_uring/rw.c
>>>>>> index 96b42c331267..4bda46c5eb20 100644
>>>>>> --- a/io_uring/rw.c
>>>>>> +++ b/io_uring/rw.c
>>>>>> @@ -878,7 +878,15 @@ static int __io_read(struct io_kiocb *req, unsigned int issue_flags)
>>>>>>         if (unlikely(ret))
>>>>>>             return ret;
>>>>>>     -    ret = io_iter_do_read(rw, &io->iter);
>>>>>> +    if (unlikely(req->opcode == IORING_OP_READ_MULTISHOT)) {
>>>>>> +        void *cb_copy = rw->kiocb.ki_complete;
>>>>>> +
>>>>>> +        rw->kiocb.ki_complete = NULL;
>>>>>> +        ret = io_iter_do_read(rw, &io->iter);
>>>>>> +        rw->kiocb.ki_complete = cb_copy;
>>>>>> +    } else {
>>>>>> +        ret = io_iter_do_read(rw, &io->iter);
>>>>>> +    }
>>>>>
>>>>> This looks a bit odd. Why can't io_read_mshot() just clear
>>>>> ->ki_complete?
>>>>
>>>> Forgot about that one. As for restoring it afterwards: io_uring compares
>>>> or calls ->ki_complete in a couple of places, so this way the patch
>>>> is more contained. It can definitely be refactored on top.
>>>
>>> I'd be tempted to do that for the fix too, the patch as-is is a
>>> bit of an eye sore... Hmm.
>>
>> It is an eyesore, sure, but I think a simple/concise eyesore is
>> better as a fix than having to change a couple more blocks across
>> rw.c. It probably wouldn't be too many changes, but I can't say
>> I'm concerned about this version too much as long as it can be
>> reshuffled later.
> 
> Sure, as discussed let's do a cleanup series on top. You'll send out
> a v2 with some improved commit message wording?

Yeah, I'll resend it a bit later
Pavel Begunkov Feb. 19, 2025, 1:35 a.m. UTC | #12
On 2/17/25 15:37, Jens Axboe wrote:
> On 2/17/25 7:08 AM, Pavel Begunkov wrote:
>> On 2/17/25 13:58, Jens Axboe wrote:
>>> On 2/17/25 6:37 AM, Pavel Begunkov wrote:
>>>
>>> The kiocb semantics of ki_complete == NULL -> sync kiocb is also odd,
>>
>> That's what is_sync_kiocb() does. It would be cleaner to use
>> init_sync_kiocb(), but there's a larger chance of doing something
>> wrong, as it reinitialises the kiocb entirely.
> 
> Sorry if that wasn't clear, yeah I do realize this is what
> is_sync_kiocb() checks. I do agree that manually clearing is saner.
> 
>>> but probably fine for this case as read mshot strictly deals with
>>> pollable files. Otherwise you'd just be blocking off this issue,
>>> regardless of whether or not IOCB_NOWAIT is set.
>>>
>>> In any case, it'd be much nicer to contain this in io_read_mshot()
>>> where it arguably belongs, rather than sprinkle it in __io_read().
>>> Possible?
>>
>> That's what I tried first, but __io_read() -> io_rw_init_file()
>> reinitialises it, so I don't see any good way without some
>> broader refactoring.
> 
> Would it be bad? The only reason the kiocb init part is in there is
> because of the ->iopoll() check; that could still be there, with the rest
> of the init going into normal prep (as it arguably should).
> 
> Something like the below, totally untested...

fwiw, turns out we can't move all of it like in the diff as
ki_flags setup and checks depend on having the file set.

> diff --git a/io_uring/rw.c b/io_uring/rw.c
> index 16f12f94943f..f8dd9a9fe9ca 100644
> --- a/io_uring/rw.c
> +++ b/io_uring/rw.c
> @@ -264,6 +264,9 @@ static int io_prep_rw_pi(struct io_kiocb *req, struct io_rw *rw, int ddir,
>   	return ret;
>   }
>   
> +static void io_complete_rw(struct kiocb *kiocb, long res);
> +static void io_complete_rw_iopoll(struct kiocb *kiocb, long res);
> +
>   static int io_prep_rw(struct io_kiocb *req, const struct io_uring_sqe *sqe,
>   		      int ddir, bool do_import)
>   {
> @@ -288,6 +291,18 @@ static int io_prep_rw(struct io_kiocb *req, const struct io_uring_sqe *sqe,
>   	}
>   	rw->kiocb.dio_complete = NULL;
>   	rw->kiocb.ki_flags = 0;
> +	if (req->ctx->flags & IORING_SETUP_IOPOLL) {
> +		if (!(rw->kiocb.ki_flags & IOCB_DIRECT))
> +			return -EOPNOTSUPP;
> +
> +		rw->kiocb.private = NULL;
> +		rw->kiocb.ki_flags |= IOCB_HIPRI;
> +		rw->kiocb.ki_complete = io_complete_rw_iopoll;
> +	} else {
> +		if (rw->kiocb.ki_flags & IOCB_HIPRI)
> +			return -EINVAL;
> +		rw->kiocb.ki_complete = io_complete_rw;
> +	}
>   
>   	rw->addr = READ_ONCE(sqe->addr);
>   	rw->len = READ_ONCE(sqe->len);
> @@ -810,23 +825,15 @@ static int io_rw_init_file(struct io_kiocb *req, fmode_t mode, int rw_type)
>   	    ((file->f_flags & O_NONBLOCK && !(req->flags & REQ_F_SUPPORT_NOWAIT))))
>   		req->flags |= REQ_F_NOWAIT;
>   
> -	if (ctx->flags & IORING_SETUP_IOPOLL) {
> -		if (!(kiocb->ki_flags & IOCB_DIRECT) || !file->f_op->iopoll)
> +	if (kiocb->ki_flags & IOCB_HIPRI) {
> +		if (!file->f_op->iopoll)
>   			return -EOPNOTSUPP;
> -
> -		kiocb->private = NULL;
> -		kiocb->ki_flags |= IOCB_HIPRI;
> -		kiocb->ki_complete = io_complete_rw_iopoll;
>   		req->iopoll_completed = 0;
>   		if (ctx->flags & IORING_SETUP_HYBRID_IOPOLL) {
>   			/* make sure every req only blocks once*/
>   			req->flags &= ~REQ_F_IOPOLL_STATE;
>   			req->iopoll_start = ktime_get_ns();
>   		}
> -	} else {
> -		if (kiocb->ki_flags & IOCB_HIPRI)
> -			return -EINVAL;
> -		kiocb->ki_complete = io_complete_rw;
>   	}
>   
>   	if (req->flags & REQ_F_HAS_METADATA) {
>

Patch

diff --git a/io_uring/rw.c b/io_uring/rw.c
index 96b42c331267..4bda46c5eb20 100644
--- a/io_uring/rw.c
+++ b/io_uring/rw.c
@@ -878,7 +878,15 @@  static int __io_read(struct io_kiocb *req, unsigned int issue_flags)
 	if (unlikely(ret))
 		return ret;
 
-	ret = io_iter_do_read(rw, &io->iter);
+	if (unlikely(req->opcode == IORING_OP_READ_MULTISHOT)) {
+		void *cb_copy = rw->kiocb.ki_complete;
+
+		rw->kiocb.ki_complete = NULL;
+		ret = io_iter_do_read(rw, &io->iter);
+		rw->kiocb.ki_complete = cb_copy;
+	} else {
+		ret = io_iter_do_read(rw, &io->iter);
+	}
 
 	/*
 	 * Some file systems like to return -EOPNOTSUPP for an IOCB_NOWAIT
@@ -902,7 +910,8 @@  static int __io_read(struct io_kiocb *req, unsigned int issue_flags)
 	} else if (ret == -EIOCBQUEUED) {
 		return IOU_ISSUE_SKIP_COMPLETE;
 	} else if (ret == req->cqe.res || ret <= 0 || !force_nonblock ||
-		   (req->flags & REQ_F_NOWAIT) || !need_complete_io(req)) {
+		   (req->flags & REQ_F_NOWAIT) || !need_complete_io(req) ||
+		   (issue_flags & IO_URING_F_MULTISHOT)) {
 		/* read all, failed, already did sync or don't want to retry */
 		goto done;
 	}