Message ID | 20220308152105.309618-15-joshi.k@samsung.com (mailing list archive)
---|---
State | New, archived
Series | io_uring passthru over nvme
On Tue, Mar 08, 2022 at 08:51:02PM +0530, Kanchan Joshi wrote:
> +	if (req->opcode == IORING_OP_URING_CMD ||
> +	    req->opcode == IORING_OP_URING_CMD_FIXED) {
> +		/* uring_cmd structure does not contain kiocb struct */
> +		struct kiocb kiocb_uring_cmd;
> +
> +		kiocb_uring_cmd.private = req->uring_cmd.bio;
> +		kiocb_uring_cmd.ki_filp = req->uring_cmd.file;
> +		ret = req->uring_cmd.file->f_op->iopoll(&kiocb_uring_cmd,
> +				&iob, poll_flags);
> +	} else {
> +		ret = kiocb->ki_filp->f_op->iopoll(kiocb, &iob,
> +				poll_flags);
> +	}

This is just completely broken.  You absolutely do need the iocb
in struct uring_cmd for ->iopoll to work.
On Fri, Mar 11, 2022 at 12:20 PM Christoph Hellwig <hch@lst.de> wrote:
>
> On Tue, Mar 08, 2022 at 08:51:02PM +0530, Kanchan Joshi wrote:
> > +	if (req->opcode == IORING_OP_URING_CMD ||
> > +	    req->opcode == IORING_OP_URING_CMD_FIXED) {
> > +		/* uring_cmd structure does not contain kiocb struct */
> > +		struct kiocb kiocb_uring_cmd;
> > +
> > +		kiocb_uring_cmd.private = req->uring_cmd.bio;
> > +		kiocb_uring_cmd.ki_filp = req->uring_cmd.file;
> > +		ret = req->uring_cmd.file->f_op->iopoll(&kiocb_uring_cmd,
> > +				&iob, poll_flags);
> > +	} else {
> > +		ret = kiocb->ki_filp->f_op->iopoll(kiocb, &iob,
> > +				poll_flags);
> > +	}
>
> This is just completely broken. You absolutely do need the iocb
> in struct uring_cmd for ->iopoll to work.

But, after you did bio based polling, we need just the bio to poll.
iocb is a big structure (48 bytes), and if we try to place it in
struct io_uring_cmd, we will just blow up the cacheline in io_uring
(first one in io_kiocb).
So we just store that bio pointer in io_uring_cmd on submission
(please see patch 15).
And here on completion we pick that bio, and put that into this local
iocb, simply because ->iopoll needs it.
Do you see I am still missing anything here?
On Mon, Mar 14, 2022 at 03:46:08PM +0530, Kanchan Joshi wrote:
> But, after you did bio based polling, we need just the bio to poll.
> iocb is a big structure (48 bytes), and if we try to place it in
> struct io_uring_cmd, we will just blow up the cacheline in io_uring
> (first one in io_kiocb).
> So we just store that bio pointer in io_uring_cmd on submission
> (please see patch 15).
> And here on completion we pick that bio, and put that into this local
> iocb, simply because ->iopoll needs it.
> Do you see I am still missing anything here?

Yes.  The VFS layer interface for polling is the kiocb.  Don't break
it.  The bio is just an implementation detail.
On Tue, Mar 15, 2022 at 09:57:45AM +0100, Christoph Hellwig wrote:
>On Mon, Mar 14, 2022 at 03:46:08PM +0530, Kanchan Joshi wrote:
>> But, after you did bio based polling, we need just the bio to poll.
>> iocb is a big structure (48 bytes), and if we try to place it in
>> struct io_uring_cmd, we will just blow up the cacheline in io_uring
>> (first one in io_kiocb).
>> So we just store that bio pointer in io_uring_cmd on submission
>> (please see patch 15).
>> And here on completion we pick that bio, and put that into this local
>> iocb, simply because ->iopoll needs it.
>> Do you see I am still missing anything here?
>
>Yes. The VFS layer interface for polling is the kiocb. Don't break
>it. The bio is just an implementation detail.

So how about adding ->async_cmd_poll in file_operations (since this
corresponds to ->async_cmd)?
It will take a struct io_uring_cmd pointer as parameter.
->iopoll and ->async_cmd_poll will differ only in what they accept
(kiocb vs io_uring_cmd). The provider may use bio_poll, or whatever
else is the implementation detail.

For read/write, the submission interface took a kiocb, and the
completion interface (->iopoll) also operated on the same.
For uring/async-cmd, the submission interface took an io_uring_cmd, but
completion used the kiocb based ->iopoll. The new ->async_cmd_poll
should settle this.
On Wed, Mar 16, 2022 at 10:39:05AM +0530, Kanchan Joshi wrote:
> So how about adding ->async_cmd_poll in file_operations (since this
> corresponds to ->async_cmd)?
> It will take struct io_uring_cmd pointer as parameter.
> Both ->iopoll and ->async_cmd_poll will differ in what they accept (kiocb
> vs io_uring_cmd). The provider may use bio_poll, or whatever else is the
> implementation detail.

That does sound way better than the current version at least.
diff --git a/fs/io_uring.c b/fs/io_uring.c
index f04bb497bd88..8bd9401f9964 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -2664,7 +2664,20 @@ static int io_do_iopoll(struct io_ring_ctx *ctx, bool force_nonspin)
 		if (READ_ONCE(req->iopoll_completed))
 			break;
 
-		ret = kiocb->ki_filp->f_op->iopoll(kiocb, &iob, poll_flags);
+		if (req->opcode == IORING_OP_URING_CMD ||
+		    req->opcode == IORING_OP_URING_CMD_FIXED) {
+			/* uring_cmd structure does not contain kiocb struct */
+			struct kiocb kiocb_uring_cmd;
+
+			kiocb_uring_cmd.private = req->uring_cmd.bio;
+			kiocb_uring_cmd.ki_filp = req->uring_cmd.file;
+			ret = req->uring_cmd.file->f_op->iopoll(&kiocb_uring_cmd,
+					&iob, poll_flags);
+		} else {
+			ret = kiocb->ki_filp->f_op->iopoll(kiocb, &iob,
+					poll_flags);
+		}
+
 		if (unlikely(ret < 0))
 			return ret;
 		else if (ret)
@@ -2777,6 +2790,15 @@ static int io_iopoll_check(struct io_ring_ctx *ctx, long min)
 			    wq_list_empty(&ctx->iopoll_list))
 				break;
 		}
+
+		/*
+		 * In some scenarios, completion callback has been queued up to be
+		 * completed in-task context but polling happens in the same task
+		 * not giving a chance for the completion callback to complete.
+		 */
+		if (current->task_works)
+			io_run_task_work();
+
 		ret = io_do_iopoll(ctx, !min);
 		if (ret < 0)
 			break;
@@ -4136,6 +4158,14 @@ static int io_linkat(struct io_kiocb *req, unsigned int issue_flags)
 	return 0;
 }
 
+static void io_complete_uring_cmd_iopoll(struct io_kiocb *req, long res)
+{
+	WRITE_ONCE(req->result, res);
+	/* order with io_iopoll_complete() checking ->result */
+	smp_wmb();
+	WRITE_ONCE(req->iopoll_completed, 1);
+}
+
 /*
  * Called by consumers of io_uring_cmd, if they originally returned
  * -EIOCBQUEUED upon receiving the command.
@@ -4146,7 +4176,11 @@ void io_uring_cmd_done(struct io_uring_cmd *ioucmd, ssize_t ret)
 	if (ret < 0)
 		req_set_fail(req);
-	io_req_complete(req, ret);
+
+	if (req->uring_cmd.flags & IO_URING_F_UCMD_POLLED)
+		io_complete_uring_cmd_iopoll(req, ret);
+	else
+		io_req_complete(req, ret);
 }
 EXPORT_SYMBOL_GPL(io_uring_cmd_done);
@@ -4158,15 +4192,18 @@ static int io_uring_cmd_prep(struct io_kiocb *req,
 	if (!req->file->f_op->async_cmd ||
 	    !(req->ctx->flags & IORING_SETUP_SQE128))
 		return -EOPNOTSUPP;
-	if (req->ctx->flags & IORING_SETUP_IOPOLL)
-		return -EOPNOTSUPP;
+	if (req->ctx->flags & IORING_SETUP_IOPOLL) {
+		ioucmd->flags = IO_URING_F_UCMD_POLLED;
+		ioucmd->bio = NULL;
+		req->iopoll_completed = 0;
+	} else {
+		ioucmd->flags = 0;
+	}
 	if (req->opcode == IORING_OP_URING_CMD_FIXED) {
 		req->imu = NULL;
 		io_req_set_rsrc_node(req, ctx);
 		req->buf_index = READ_ONCE(sqe->buf_index);
-		ioucmd->flags = IO_URING_F_UCMD_FIXEDBUFS;
-	} else {
-		ioucmd->flags = 0;
+		ioucmd->flags |= IO_URING_F_UCMD_FIXEDBUFS;
 	}
 	ioucmd->cmd = (void *) &sqe->cmd;
diff --git a/include/linux/io_uring.h b/include/linux/io_uring.h
index abad6175739e..65db83d703b7 100644
--- a/include/linux/io_uring.h
+++ b/include/linux/io_uring.h
@@ -9,6 +9,7 @@ enum io_uring_cmd_flags {
 	IO_URING_F_COMPLETE_DEFER	= 1,
 	IO_URING_F_UNLOCKED		= 2,
 	IO_URING_F_UCMD_FIXEDBUFS	= 4,
+	IO_URING_F_UCMD_POLLED		= 8,
 	/* int's last bit, sign checks are usually faster than a bit test */
 	IO_URING_F_NONBLOCK		= INT_MIN,
 };
@@ -16,8 +17,12 @@ enum io_uring_cmd_flags {
 struct io_uring_cmd {
 	struct file	*file;
 	void		*cmd;
-	/* for irq-completion - if driver requires doing stuff in task-context*/
-	void (*driver_cb)(struct io_uring_cmd *cmd);
+	union {
+		void	*bio;	/* used for polled completion */
+
+		/* for irq-completion - if driver requires doing stuff in task-context*/
+		void (*driver_cb)(struct io_uring_cmd *cmd);
+	};
 	u32		flags;
 	u32		cmd_op;
 	u16		cmd_len;