
[RFC,00/13] uring-passthru for nvme

Message ID 20211220141734.12206-1-joshi.k@samsung.com (mailing list archive)

Message

Kanchan Joshi Dec. 20, 2021, 2:17 p.m. UTC
Here is a revamped series on uring-passthru, which is on top of Jens'
"nvme-passthru-wip.2" branch.
https://git.kernel.dk/cgit/linux-block/commit/?h=nvme-passthru-wip.2

This scales much better than before with the addition of the following:
- plugging
- passthru polling (sync and async; the sync part comes from a patch that
  Keith did earlier)
- bio-cache (this helps regardless of irq/polling since we submit/complete
  in task context anyway. It currently kicks in only when the fixed-buffer
  option is also passed, but that is primarily to keep the plumbing simple)

Also, the feedback from Christoph (on the previous fixed-buffer series) has
been incorporated, which has streamlined the plumbing.

I look forward to further feedback/comments.

KIOPS(512b) on P5800x looked like this:

QD    uring    pt    uring-poll    pt-poll
8      538     589      831         902
64     967     1131     1351        1378
256    1043    1230     1376        1429

Here 'uring' operates on the block interface (nvme0n1), while 'pt' refers
to uring-passthru operating on the char interface (ng0n1).
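
For readers new to the char interface, a rough userspace sketch of a plain
(interrupt-driven, non-fixed-buffer) passthru read over /dev/ng0n1 is given
below. It is illustration only, not part of the series; it uses the names the
interface later took in mainline (IORING_OP_URING_CMD, NVME_URING_CMD_IO,
struct nvme_uring_cmd from <linux/nvme_ioctl.h>), so the exact UAPI in this
RFC, in particular the fixed-buffer and polling variants, may differ.

/*
 * Illustrative sketch only: names follow the mainline uring_cmd interface,
 * not necessarily this RFC's UAPI.
 */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <liburing.h>
#include <linux/nvme_ioctl.h>

#define LBA_SHIFT 9        /* assumes a 512b-formatted namespace */
#define BUF_SIZE  4096

int main(void)
{
        struct io_uring_params p = {
                /* uring_cmd needs the big-SQE (and big-CQE) ring format */
                .flags = IORING_SETUP_SQE128 | IORING_SETUP_CQE32,
        };
        struct io_uring ring;
        struct io_uring_sqe *sqe;
        struct io_uring_cqe *cqe;
        struct nvme_uring_cmd *cmd;
        void *buf;
        int fd, ret;

        fd = open("/dev/ng0n1", O_RDONLY);
        if (fd < 0)
                return 1;
        if (io_uring_queue_init_params(8, &ring, &p))
                return 1;
        if (posix_memalign(&buf, 4096, BUF_SIZE))
                return 1;

        sqe = io_uring_get_sqe(&ring);
        memset(sqe, 0, 2 * sizeof(*sqe));       /* clear the full 128-byte SQE */
        sqe->fd = fd;
        sqe->opcode = IORING_OP_URING_CMD;
        sqe->cmd_op = NVME_URING_CMD_IO;

        /* NVMe read of BUF_SIZE bytes from LBA 0 of namespace 1,
         * placed in the extended command area of the big SQE */
        cmd = (struct nvme_uring_cmd *)sqe->cmd;
        cmd->opcode = 0x02;                     /* nvme_cmd_read */
        cmd->nsid = 1;
        cmd->addr = (__u64)(uintptr_t)buf;
        cmd->data_len = BUF_SIZE;
        cmd->cdw10 = 0;                         /* starting LBA (lower 32 bits) */
        cmd->cdw12 = (BUF_SIZE >> LBA_SHIFT) - 1;       /* 0-based block count */

        io_uring_submit(&ring);
        ret = io_uring_wait_cqe(&ring, &cqe);
        if (!ret)
                printf("passthru read: cqe res %d\n", cqe->res);
        io_uring_cqe_seen(&ring, cqe);
        io_uring_queue_exit(&ring);
        close(fd);
        free(buf);
        return 0;
}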

Perf/testing is with this custom fio, which turns regular I/O into
passthru when the "uring_cmd=1" option is supplied.
https://github.com/joshkan/fio/tree/nvme-passthru-wip-polling
Example command-line:
fio -iodepth=256 -rw=randread -ioengine=io_uring -bs=512 -numjobs=1 -runtime=60 -group_reporting -iodepth_batch_submit=64 -iodepth_batch_complete_min=1 -iodepth_batch_complete_max=64 -fixedbufs=1 -hipri=1 -sqthread_poll=0 -filename=/dev/ng0n1 -name=io_uring_256 -uring_cmd=1
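
For reference, the same invocation expressed as a job file (with uring_cmd=1
coming from the custom fio branch above) would look roughly like this:

[global]
ioengine=io_uring
filename=/dev/ng0n1
rw=randread
bs=512
iodepth=256
iodepth_batch_submit=64
iodepth_batch_complete_min=1
iodepth_batch_complete_max=64
fixedbufs=1
hipri=1
sqthread_poll=0
uring_cmd=1
runtime=60
numjobs=1
group_reporting=1

[io_uring_256]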

background/context:
https://linuxplumbersconf.org/event/11/contributions/989/attachments/747/1723/lpc-2021-building-a-fast-passthru.pdf

Changes from v5:
https://lore.kernel.org/linux-nvme/20210805125539.66958-1-joshi.k@samsung.com/
1. Fixed-buffer passthru with same ioctl code + other feedback from hch
2. Plugging (from Jens)
3. Sync polling (from Keith)
4. Async polling via io_uring
5. Enable bio-cache for fixed-buffer passthru

Changes from v4:
https://lore.kernel.org/linux-nvme/20210325170540.59619-1-joshi.k@samsung.com/
1. Moved to v5 branch of Jens, adapted to task-work changes in io_uring
2. Removed support for block-passthrough (over nvme0n1) for now
3. Added support for char-passthrough (over ng0n1)
4. Added fixed-buffer passthrough in io_uring and nvme plumbing


Anuj Gupta (3):
  io_uring: mark iopoll not supported for uring-cmd
  io_uring: modify unused field in io_uring_cmd to store flags
  io_uring: add support for uring_cmd with fixed-buffer

Jens Axboe (2):
  io_uring: plug for async bypass
  block: wire-up support for plugging

Kanchan Joshi (6):
  io_uring: add infra for uring_cmd completion in submitter-task
  nvme: wire-up support for async-passthru on char-device.
  io_uring: add flag and helper for fixed-buffer uring-cmd
  nvme: enable passthrough with fixed-buffer
  block: factor out helper for bio allocation from cache
  nvme: enable bio-cache for fixed-buffer passthru

Keith Busch (1):
  nvme: allow user passthrough commands to poll

Pankaj Raghav (1):
  nvme: Add async passthru polling support

 block/bio.c                     |  43 +++--
 block/blk-map.c                 |  46 ++++++
 block/blk-mq.c                  |  93 +++++------
 drivers/nvme/host/core.c        |  21 ++-
 drivers/nvme/host/ioctl.c       | 271 ++++++++++++++++++++++++++++----
 drivers/nvme/host/multipath.c   |   2 +
 drivers/nvme/host/nvme.h        |  13 +-
 drivers/nvme/host/pci.c         |   4 +-
 drivers/nvme/target/passthru.c  |   2 +-
 fs/io_uring.c                   | 113 +++++++++++--
 include/linux/bio.h             |   1 +
 include/linux/blk-mq.h          |   4 +
 include/linux/io_uring.h        |  26 ++-
 include/uapi/linux/io_uring.h   |   6 +-
 include/uapi/linux/nvme_ioctl.h |   4 +
 15 files changed, 542 insertions(+), 107 deletions(-)

Comments

Jens Axboe Dec. 21, 2021, 3:45 a.m. UTC | #1
On 12/20/21 7:17 AM, Kanchan Joshi wrote:
> Here is a revamped series on uring-passthru which is on top of Jens
> "nvme-passthru-wip.2" branch.
> https://git.kernel.dk/cgit/linux-block/commit/?h=nvme-passthru-wip.2
> 
> This scales much better than before with the addition of the following:
> - plugging
> - passthru polling (sync and async; the sync part comes from a patch that
>   Keith did earlier)
> - bio-cache (this helps regardless of irq/polling since we submit/complete
>   in task context anyway. It currently kicks in only when the fixed-buffer
>   option is also passed, but that is primarily to keep the plumbing simple)
> 
> Also, the feedback from Christoph (on the previous fixed-buffer series) has
> been incorporated, which has streamlined the plumbing.
> 
> I look forward to further feedback/comments.
> 
> KIOPS(512b) on P5800x looked like this:
> 
> QD    uring    pt    uring-poll    pt-poll
> 8      538     589      831         902
> 64     967     1131     1351        1378
> 256    1043    1230     1376        1429

These are nice results! Can you share all the job files or fio
invocations for each of these? I guess it's just two variants, with QD
varied between them?

We really (REALLY) should turn the nvme-wip branch into something
coherent, but at least with this we have some idea of an end result and
something that is testable. This looks so much better from the
performance POV than the earlier versions; passthrough _should_ be
faster than non-pt.
Kanchan Joshi Dec. 21, 2021, 2:36 p.m. UTC | #2
On Tue, Dec 21, 2021 at 9:15 AM Jens Axboe <axboe@kernel.dk> wrote:
>
> On 12/20/21 7:17 AM, Kanchan Joshi wrote:
> > Here is a revamped series on uring-passthru which is on top of Jens
> > "nvme-passthru-wip.2" branch.
> > https://git.kernel.dk/cgit/linux-block/commit/?h=nvme-passthru-wip.2
> >
> > This scales much better than before with the addition of the following:
> > - plugging
> > - passthru polling (sync and async; the sync part comes from a patch that
> >   Keith did earlier)
> > - bio-cache (this helps regardless of irq/polling since we submit/complete
> >   in task context anyway. It currently kicks in only when the fixed-buffer
> >   option is also passed, but that is primarily to keep the plumbing simple)
> >
> > Also, the feedback from Christoph (on the previous fixed-buffer series) has
> > been incorporated, which has streamlined the plumbing.
> >
> > I look forward to further feedback/comments.
> >
> > KIOPS(512b) on P5800x looked like this:
> >
> > QD    uring    pt    uring-poll    pt-poll
> > 8      538     589      831         902
> > 64     967     1131     1351        1378
> > 256    1043    1230     1376        1429
>
> These are nice results! Can you share all the job files or fio
> invocations for each of these? I guess it's just two variants, with QD
> varied between them?

Yes, just two variants with three QD/batch combinations.
Here are all the job files for the above data:
https://github.com/joshkan/fio/tree/nvme-passthru-wip-polling/pt-perf-jobs

> We really (REALLY) should turn the nvme-wip branch into something
> coherent, but at least with this we have some idea of an end result and
> something that is testable. This looks so much better from the
> performance POV than the earlier versions; passthrough _should_ be
> faster than non-pt.
>
It'd be great to know how it performs in your setup.
And please let me know how I can help in making things more coherent.