diff mbox series

[GIT,PULL] io_uring passthrough support

Message ID 6f712c75-c849-ae89-d763-b2a18da52844@kernel.dk (mailing list archive)
State New
Headers show
Series [GIT,PULL] io_uring passthrough support | expand

Pull-request

git://git.kernel.dk/linux-block.git tags/for-5.19/io_uring-passthrough-2022-05-22

Commit Message

Jens Axboe May 22, 2022, 9:26 p.m. UTC
Hi Linus,

On top of everything else, this adds support for passthrough for
io_uring. The initial feature for this is NVMe passthrough support,
which allows non-filesystem based IO commands and admin commands. To
support this, io_uring grows support for SQE and CQE members that are
twice as big, allowing to pass in a full NVMe command without having to
copy data around. And to complete with more than just a single 32-bit
value as the output.

This will cause a merge conflict as well, with the provided buffer
change from the core branch, and adding CQE32 support for NOP in this
branch. Resolution:

+++ b/fs/io_uring.c
@@@ -4997,17 -5169,11 +5256,23 @@@ static int io_nop_prep(struct io_kiocb 
   */
  static int io_nop(struct io_kiocb *req, unsigned int issue_flags)
  {
++	unsigned int cflags;
 +	void __user *buf;
 +
 +	if (req->flags & REQ_F_BUFFER_SELECT) {
 +		size_t len = 1;
 +
 +		buf = io_buffer_select(req, &len, issue_flags);
 +		if (!buf)
 +			return -ENOBUFS;
 +	}
 +
- 	__io_req_complete(req, issue_flags, 0, io_put_kbuf(req, issue_flags));
++	cflags = io_put_kbuf(req, issue_flags);
+ 	if (!(req->ctx->flags & IORING_SETUP_CQE32))
 -		__io_req_complete(req, issue_flags, 0, 0);
++		__io_req_complete(req, issue_flags, 0, cflags);
+ 	else
 -		__io_req_complete32(req, issue_flags, 0, 0, req->nop.extra1,
 -					req->nop.extra2);
++		__io_req_complete32(req, issue_flags, 0, cflags,
++					req->nop.extra1, req->nop.extra2);
  	return 0;
  }
  

Please pull!


The following changes since commit 7ccba24d3bc084d891def1a6fea504e4cb327a8c:

  io_uring: don't clear req->kbuf when buffer selection is done (2022-05-09 06:29:06 -0600)

are available in the Git repository at:

  git://git.kernel.dk/linux-block.git tags/for-5.19/io_uring-passthrough-2022-05-22

for you to fetch changes up to 3fe07bcd800d6e5e4e4263ca2564d69095c157bf:

  io_uring: cleanup handling of the two task_work lists (2022-05-21 09:17:05 -0600)

----------------------------------------------------------------
for-5.19/io_uring-passthrough-2022-05-22

----------------------------------------------------------------
Anuj Gupta (1):
      nvme: add vectored-io support for uring-cmd

Christoph Hellwig (1):
      nvme: refactor nvme_submit_user_cmd()

Jens Axboe (6):
      Merge branch 'for-5.19/io_uring' into for-5.19/io_uring-passthrough
      Merge branch 'for-5.19/io_uring-socket' into for-5.19/io_uring-passthrough
      io_uring: add support for 128-byte SQEs
      fs,io_uring: add infrastructure for uring-cmd
      block: wire-up support for passthrough plugging
      io_uring: cleanup handling of the two task_work lists

Kanchan Joshi (3):
      nvme: wire-up uring-cmd support for io-passthru on char-device.
      nvme: helper for uring-passthrough checks
      nvme: enable uring-passthrough for admin commands

Ming Lei (1):
      blk-mq: fix passthrough plugging

Stefan Roesch (12):
      io_uring: support CQE32 in io_uring_cqe
      io_uring: store add. return values for CQE32
      io_uring: change ring size calculation for CQE32
      io_uring: add CQE32 setup processing
      io_uring: add CQE32 completion processing
      io_uring: modify io_get_cqe for CQE32
      io_uring: flush completions for CQE32
      io_uring: overflow processing for CQE32
      io_uring: add tracing for additional CQE32 fields
      io_uring: support CQE32 in /proc info
      io_uring: enable CQE32
      io_uring: support CQE32 for nop operation

 block/blk-mq.c                  | 109 ++++----
 drivers/nvme/host/core.c        |   2 +
 drivers/nvme/host/ioctl.c       | 278 +++++++++++++++++++-
 drivers/nvme/host/multipath.c   |   1 +
 drivers/nvme/host/nvme.h        |   5 +
 fs/io_uring.c                   | 444 ++++++++++++++++++++++++++------
 include/linux/fs.h              |   2 +
 include/linux/io_uring.h        |  33 +++
 include/trace/events/io_uring.h |  18 +-
 include/uapi/linux/io_uring.h   |  24 +-
 include/uapi/linux/nvme_ioctl.h |  28 ++
 11 files changed, 806 insertions(+), 138 deletions(-)

Comments

Linus Torvalds May 23, 2022, 8:15 p.m. UTC | #1
On Sun, May 22, 2022 at 2:26 PM Jens Axboe <axboe@kernel.dk> wrote:
>
> This will cause a merge conflict as well, with the provided buffer
> change from the core branch, and adding CQE32 support for NOP in this
> branch.

Ugh, that really hits home how ugly this CQE32 support was.

Dammit, it shouldn't have been done this way. That io_nop() code is
disgusting, and how it wants that separate "with extra info" case is
just nasty.

I've pulled this, but with some swearing. That whole "extra1" and
"extra2" is ugly as hell, and just the naming shows that it has no
sane semantics, much less documentation.

And the way it's randomly hidden in 'struct io_nop' *and* then a union
with that hash_node is just disgusting beyond words. Why do you need
both fields when you just copy one to the other at cmd start and then
back at cmd end?

I must be missing something, but that it is incredibly ugly is clear.

                 Linus
Jens Axboe May 23, 2022, 8:19 p.m. UTC | #2
On 5/23/22 2:15 PM, Linus Torvalds wrote:
> On Sun, May 22, 2022 at 2:26 PM Jens Axboe <axboe@kernel.dk> wrote:
>>
>> This will cause a merge conflict as well, with the provided buffer
>> change from the core branch, and adding CQE32 support for NOP in this
>> branch.
> 
> Ugh, that really hits home how ugly this CQE32 support was.
> 
> Dammit, it shouldn't have been done this way. That io_nop() code is
> disgusting, and how it wants that separate "with extra info" case is
> just nasty.
> 
> I've pulled this, but with some swearing. That whole "extra1" and
> "extra2" is ugly as hell, and just the naming shows that it has no
> sane semantics, much less documentation.
> 
> And the way it's randomly hidden in 'struct io_nop' *and* then a union
> with that hash_node is just disgusting beyond words. Why do you need
> both fields when you just copy one to the other at cmd start and then
> back at cmd end?
> 
> I must be missing something, but that it is incredibly ugly is clear.

I think you are! The NOP case is just a sample way of exercising the
CQE32 support, with extra1+2 being what is passed back in the bigger
CQE. The NOP command exists purely to test things, and the CQE32 support
there is a bit forced because NOP just always completes with '0' in the
normal res field.

We can obviously dump this as it isn't integral to anything, and
honestly now that the NVMe is wired up, there's no great need to have a
separate test for it. But it doesn't really hurt and there are already
regression tests for it.
pr-tracker-bot@kernel.org May 23, 2022, 8:42 p.m. UTC | #3
The pull request you sent on Sun, 22 May 2022 15:26:23 -0600:

> git://git.kernel.dk/linux-block.git tags/for-5.19/io_uring-passthrough-2022-05-22

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/9836e93c0a7e031ac6a71c56171c229de1eea7cf

Thank you!
Christoph Hellwig May 24, 2022, 5:15 a.m. UTC | #4
On Mon, May 23, 2022 at 02:19:38PM -0600, Jens Axboe wrote:
> We can obviously dump this as it isn't integral to anything, and
> honestly now that the NVMe is wired up, there's no great need to have a
> separate test for it. But it doesn't really hurt and there are already
> regression tests for it.

Yes, please dump it.  It is not actually useful and is just more
code around for no good reason.
diff mbox series

Patch

diff --cc fs/io_uring.c
index 1015dd49e7e5,c5a476e6c068..395d3a921b53
--- a/fs/io_uring.c