Message ID | 20211013165416.985696-9-axboe@kernel.dk (mailing list archive)
---|---
State | New, archived
Series | Batched completions
On Wed, Oct 13, 2021 at 10:54:15AM -0600, Jens Axboe wrote:
> @@ -2404,6 +2406,11 @@ static int io_do_iopoll(struct io_ring_ctx *ctx, bool force_nonspin)
> 		struct kiocb *kiocb = &req->rw.kiocb;
> 		int ret;
>
> +		if (!file)
> +			file = kiocb->ki_filp;
> +		else if (file != kiocb->ki_filp)
> +			break;
> +

Can you explain why we now can only poll for a single file (independent
of the fact that batching is used)?

> +	if (!pos && !iob.req_list)
>  		return 0;
> +	if (iob.req_list)
> +		iob.complete(&iob);

Why not:

	if (iob.req_list)
		iob.complete(&iob);
	else if (!pos)
		return 0;

?
On 10/14/21 2:03 AM, Christoph Hellwig wrote:
> On Wed, Oct 13, 2021 at 10:54:15AM -0600, Jens Axboe wrote:
>> @@ -2404,6 +2406,11 @@ static int io_do_iopoll(struct io_ring_ctx *ctx, bool force_nonspin)
>> 		struct kiocb *kiocb = &req->rw.kiocb;
>> 		int ret;
>>
>> +		if (!file)
>> +			file = kiocb->ki_filp;
>> +		else if (file != kiocb->ki_filp)
>> +			break;
>> +
>
> Can you explain why we now can only poll for a single file (independent
> of the fact that batching is used)?

Different file may be on a different backend, it's just playing it
safe and splitting it up. In practice it should not matter.

>> +	if (!pos && !iob.req_list)
>>  		return 0;
>> +	if (iob.req_list)
>> +		iob.complete(&iob);
>
> Why not:
>
> 	if (iob.req_list)
> 		iob.complete(&iob);
> 	else if (!pos)
> 		return 0;

That is cleaner, will adapt. Thanks!
On Thu, Oct 14, 2021 at 09:45:38AM -0600, Jens Axboe wrote:
> On 10/14/21 2:03 AM, Christoph Hellwig wrote:
>> On Wed, Oct 13, 2021 at 10:54:15AM -0600, Jens Axboe wrote:
>>> @@ -2404,6 +2406,11 @@ static int io_do_iopoll(struct io_ring_ctx *ctx, bool force_nonspin)
>>> 		struct kiocb *kiocb = &req->rw.kiocb;
>>> 		int ret;
>>>
>>> +		if (!file)
>>> +			file = kiocb->ki_filp;
>>> +		else if (file != kiocb->ki_filp)
>>> +			break;
>>> +
>>
>> Can you explain why we now can only poll for a single file (independent
>> of the fact that batching is used)?
>
> Different file may be on a different backend, it's just playing it
> safe and splitting it up. In practice it should not matter.

Well, with file systems even the same file can land on different
devices. Maybe we need a cookie?

Either way this should be commented as right now it looks pretty
arbitrary.
On 10/14/21 10:08 AM, Christoph Hellwig wrote:
> On Thu, Oct 14, 2021 at 09:45:38AM -0600, Jens Axboe wrote:
>> On 10/14/21 2:03 AM, Christoph Hellwig wrote:
>>> On Wed, Oct 13, 2021 at 10:54:15AM -0600, Jens Axboe wrote:
>>>> @@ -2404,6 +2406,11 @@ static int io_do_iopoll(struct io_ring_ctx *ctx, bool force_nonspin)
>>>> 		struct kiocb *kiocb = &req->rw.kiocb;
>>>> 		int ret;
>>>>
>>>> +		if (!file)
>>>> +			file = kiocb->ki_filp;
>>>> +		else if (file != kiocb->ki_filp)
>>>> +			break;
>>>> +
>>>
>>> Can you explain why we now can only poll for a single file (independent
>>> of the fact that batching is used)?
>>
>> Different file may be on a different backend, it's just playing it
>> safe and splitting it up. In practice it should not matter.
>
> Well, with file systems even the same file can land on different
> devices. Maybe we need a cookie?
>
> Either way this should be commented as right now it looks pretty
> arbitrary.

I got rid of this and added a dev_id kind of cookie that gets matched
on batched addition.
On Thu, Oct 14, 2021 at 12:14:19PM -0600, Jens Axboe wrote:
>> Either way this should be commented as right now it looks pretty
>> arbitrary.
>
> I got rid of this and added a dev_id kind of cookie that gets matched
> on batched addition.

Thinking about this a bit more, I don't think we need to differentiate
devices, just complete handlers. That is, everything using the same
->complete_batch handler could go onto the same list, which means
different nvme devices can be polled together. So I think we can just
check the complete/complete_batch handler and be done with it. The big
benefit, besides saving a field, is that this is pretty much
self-explanatory.
On 10/15/21 10:29 PM, Christoph Hellwig wrote:
> On Thu, Oct 14, 2021 at 12:14:19PM -0600, Jens Axboe wrote:
>>> Either way this should be commented as right now it looks pretty
>>> arbitrary.
>>
>> I got rid of this and added a dev_id kind of cookie that gets matched
>> on batched addition.
>
> Thinking about this a bit more, I don't think we need to differentiate
> devices, just complete handlers. That is, everything using the same
> ->complete_batch handler could go onto the same list, which means
> different nvme devices can be polled together. So I think we can just
> check the complete/complete_batch handler and be done with it. The big
> benefit, besides saving a field, is that this is pretty much
> self-explanatory.

Good idea, and cleaner than a dev_id cookie too. I'll be sending out the
revised series soonish, but I have incorporated that change.
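As a rough illustration of the handler-based matching the thread settles on, here is a minimal sketch. The names (poll_batch, poll_req, poll_batch_add) are made up for the example and are not the helpers the series actually adds:

```c
#include <stdbool.h>

/*
 * Illustrative sketch only: a batch is a list of completed requests plus
 * the one handler that will complete all of them.
 */
struct poll_batch;

struct poll_req {
	struct poll_req *next;
};

struct poll_batch {
	struct poll_req *req_list;
	void (*complete)(struct poll_batch *);
};

/*
 * Requests are only batched together if they use the same completion
 * handler. That way several nvme devices can share one batch, while a
 * request belonging to a different driver is rejected and completed
 * individually by the caller -- no per-device cookie needed.
 */
static bool poll_batch_add(struct poll_batch *iob, struct poll_req *req,
			   void (*complete)(struct poll_batch *))
{
	if (!iob->complete)
		iob->complete = complete;	/* first request picks the handler */
	else if (iob->complete != complete)
		return false;			/* different handler: don't mix */

	req->next = iob->req_list;
	iob->req_list = req;
	return true;
}
```

Checking the handler itself is what makes this self-describing: two requests either can be completed by the same code, or they cannot, and no extra field has to be carried around to say so.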
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 082ff64c1bcb..5c031ab8f77f 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -2390,6 +2390,8 @@ static int io_do_iopoll(struct io_ring_ctx *ctx, bool force_nonspin)
 {
 	struct io_wq_work_node *pos, *start, *prev;
 	unsigned int poll_flags = BLK_POLL_NOSLEEP;
+	struct file *file = NULL;
+	DEFINE_IO_BATCH(iob);
 	int nr_events = 0;
 
 	/*
@@ -2404,6 +2406,11 @@ static int io_do_iopoll(struct io_ring_ctx *ctx, bool force_nonspin)
 		struct kiocb *kiocb = &req->rw.kiocb;
 		int ret;
 
+		if (!file)
+			file = kiocb->ki_filp;
+		else if (file != kiocb->ki_filp)
+			break;
+
 		/*
 		 * Move completed and retryable entries to our local lists.
 		 * If we find a request that requires polling, break out
@@ -2412,19 +2419,21 @@ static int io_do_iopoll(struct io_ring_ctx *ctx, bool force_nonspin)
 		if (READ_ONCE(req->iopoll_completed))
 			break;
 
-		ret = kiocb->ki_filp->f_op->iopoll(kiocb, NULL, poll_flags);
+		ret = kiocb->ki_filp->f_op->iopoll(kiocb, &iob, poll_flags);
 		if (unlikely(ret < 0))
 			return ret;
 		else if (ret)
 			poll_flags |= BLK_POLL_ONESHOT;
 
 		/* iopoll may have completed current req */
-		if (READ_ONCE(req->iopoll_completed))
+		if (iob.req_list || READ_ONCE(req->iopoll_completed))
 			break;
 	}
 
-	if (!pos)
+	if (!pos && !iob.req_list)
 		return 0;
+	if (iob.req_list)
+		iob.complete(&iob);
 
 	prev = start;
 	wq_list_for_each_resume(pos, prev) {
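The hunk uses DEFINE_IO_BATCH(), iob.req_list, and iob.complete without showing their definitions, which come from an earlier patch in the series. As an assumption for readers, the on-stack batch presumably amounts to something like the sketch below; only the names are visible in the diff, the member types are guesses:

```c
struct request;		/* block layer request, declared in <linux/blkdev.h> */

/* Guessed shape of the batch that io_do_iopoll() declares on the stack. */
struct io_batch {
	struct request *req_list;		/* completed requests collected by ->iopoll() */
	void (*complete)(struct io_batch *);	/* completes every request on the list */
};

/* Zero-initialized on-stack batch, as used by the hunk above. */
#define DEFINE_IO_BATCH(name)	struct io_batch name = { }
```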
Wire up using an io_batch for f_op->iopoll(). If the lower stack supports
it, we can handle high rates of polled IO more efficiently.

This raises the single core efficiency on my system from ~6.1M IOPS to
~6.6M IOPS running a random read workload at depth 128 on two gen2
Optane drives.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 fs/io_uring.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)
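To make the "if the lower stack supports it" part concrete, here is a hedged sketch of what a batched completion handler could look like on the driver side: per-batch work is paid once, and only the cheap per-request step runs in the loop, which is where the IOPS gain would come from. All names are hypothetical and this is not the actual nvme implementation:

```c
#include <stddef.h>

/* Minimal stand-ins mirroring the assumed batch shape from the sketches above. */
struct poll_req {
	struct poll_req *next;
	int result;
};

struct poll_batch {
	struct poll_req *req_list;
	void (*complete)(struct poll_batch *);
};

/* Per-request completion: report the result to whoever submitted the I/O. */
static void complete_one(struct poll_req *req)
{
	(void)req->result;	/* e.g. post a completion event for this request */
}

/*
 * Hypothetical batched completion handler, invoked once via
 * iob.complete(&iob) after the poll loop. Per-call overhead (locking,
 * doorbell writes, and so on) is paid once per batch rather than once
 * per I/O.
 */
static void example_complete_batch(struct poll_batch *iob)
{
	struct poll_req *req = iob->req_list;

	/* one-time work for the whole batch would go here */

	while (req) {
		struct poll_req *next = req->next;

		complete_one(req);
		req = next;
	}
	iob->req_list = NULL;
}
```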