
[RFC,00/13] uring-passthru for nvme

Message ID 20211220141734.12206-1-joshi.k@samsung.com (mailing list archive)

Message

Kanchan Joshi Dec. 20, 2021, 2:17 p.m. UTC
Here is a revamped series on uring-passthru, which is on top of Jens'
"nvme-passthru-wip.2" branch.
https://git.kernel.dk/cgit/linux-block/commit/?h=nvme-passthru-wip.2

This scales much better than before with the addition of the following:
- plugging
- passthru polling (sync and async; the sync part comes from a patch that
  Keith did earlier)
- bio-cache (this helps regardless of irq/polling since we submit/complete
  in task context anyway. It currently kicks in only when the fixed-buffer
  option is also passed, but that is primarily to keep the plumbing simple)

Also, the feedback from Christoph (on the previous fixed-buffer series) has
been incorporated, which has streamlined the plumbing.

I look forward to further feedback/comments.

KIOPS(512b) on P5800x looked like this:

QD    uring    pt    uring-poll    pt-poll
8      538     589      831         902
64     967     1131     1351        1378
256    1043    1230     1376        1429

Here 'uring' operates on the block interface (nvme0n1), while 'pt' refers
to uring-passthru operating on the char interface (ng0n1).
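
For readers new to the char interface, a rough userspace sketch of a plain
(interrupt-driven, non-fixed-buffer) passthru read over /dev/ng0n1 is given
below. It is illustration only, not part of the series; it uses the names the
interface later took in mainline (IORING_OP_URING_CMD, NVME_URING_CMD_IO,
struct nvme_uring_cmd from <linux/nvme_ioctl.h>), so the exact UAPI in this
RFC, in particular the fixed-buffer and polling variants, may differ.

/*
 * Illustrative sketch only: names follow the mainline uring_cmd interface,
 * not necessarily this RFC's UAPI.
 */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <liburing.h>
#include <linux/nvme_ioctl.h>

#define LBA_SHIFT 9        /* assumes a 512b-formatted namespace */
#define BUF_SIZE  4096

int main(void)
{
        struct io_uring_params p = {
                /* uring_cmd needs the big-SQE (and big-CQE) ring format */
                .flags = IORING_SETUP_SQE128 | IORING_SETUP_CQE32,
        };
        struct io_uring ring;
        struct io_uring_sqe *sqe;
        struct io_uring_cqe *cqe;
        struct nvme_uring_cmd *cmd;
        void *buf;
        int fd, ret;

        fd = open("/dev/ng0n1", O_RDONLY);
        if (fd < 0)
                return 1;
        if (io_uring_queue_init_params(8, &ring, &p))
                return 1;
        if (posix_memalign(&buf, 4096, BUF_SIZE))
                return 1;

        sqe = io_uring_get_sqe(&ring);
        memset(sqe, 0, 2 * sizeof(*sqe));       /* clear the full 128-byte SQE */
        sqe->fd = fd;
        sqe->opcode = IORING_OP_URING_CMD;
        sqe->cmd_op = NVME_URING_CMD_IO;

        /* NVMe read of BUF_SIZE bytes from LBA 0 of namespace 1,
         * placed in the extended command area of the big SQE */
        cmd = (struct nvme_uring_cmd *)sqe->cmd;
        cmd->opcode = 0x02;                     /* nvme_cmd_read */
        cmd->nsid = 1;
        cmd->addr = (__u64)(uintptr_t)buf;
        cmd->data_len = BUF_SIZE;
        cmd->cdw10 = 0;                         /* starting LBA (lower 32 bits) */
        cmd->cdw12 = (BUF_SIZE >> LBA_SHIFT) - 1;       /* 0-based block count */

        io_uring_submit(&ring);
        ret = io_uring_wait_cqe(&ring, &cqe);
        if (!ret)
                printf("passthru read: cqe res %d\n", cqe->res);
        io_uring_cqe_seen(&ring, cqe);
        io_uring_queue_exit(&ring);
        close(fd);
        free(buf);
        return 0;
}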

Perf/testing is with this custom fio, which turns regular I/O into
passthru when the "uring_cmd=1" option is supplied.
https://github.com/joshkan/fio/tree/nvme-passthru-wip-polling
Example command-line:
fio -iodepth=256 -rw=randread -ioengine=io_uring -bs=512 -numjobs=1 -runtime=60 -group_reporting -iodepth_batch_submit=64 -iodepth_batch_complete_min=1 -iodepth_batch_complete_max=64 -fixedbufs=1 -hipri=1 -sqthread_poll=0 -filename=/dev/ng0n1 -name=io_uring_256 -uring_cmd=1
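
For reference, the same invocation expressed as a job file (with uring_cmd=1
coming from the custom fio branch above) would look roughly like this:

[global]
ioengine=io_uring
filename=/dev/ng0n1
rw=randread
bs=512
iodepth=256
iodepth_batch_submit=64
iodepth_batch_complete_min=1
iodepth_batch_complete_max=64
fixedbufs=1
hipri=1
sqthread_poll=0
uring_cmd=1
runtime=60
numjobs=1
group_reporting=1

[io_uring_256]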

background/context:
https://linuxplumbersconf.org/event/11/contributions/989/attachments/747/1723/lpc-2021-building-a-fast-passthru.pdf

Changes from v5:
https://lore.kernel.org/linux-nvme/20210805125539.66958-1-joshi.k@samsung.com/
1. Fixed-buffer passthru with same ioctl code + other feedback from hch
2. Plugging (from Jens)
3. Sync polling (from Keith)
4. Async polling via io_uring
5. Enable bio-cache for fixed-buffer passthru

Changes from v4:
https://lore.kernel.org/linux-nvme/20210325170540.59619-1-joshi.k@samsung.com/
1. Moved to v5 branch of Jens, adapted to task-work changes in io_uring
2. Removed support for block-passthrough (over nvme0n1) for now
3. Added support for char-passthrough (over ng0n1)
4. Added fixed-buffer passthrough in io_uring and nvme plumbing


Anuj Gupta (3):
  io_uring: mark iopoll not supported for uring-cmd
  io_uring: modify unused field in io_uring_cmd to store flags
  io_uring: add support for uring_cmd with fixed-buffer

Jens Axboe (2):
  io_uring: plug for async bypass
  block: wire-up support for plugging

Kanchan Joshi (6):
  io_uring: add infra for uring_cmd completion in submitter-task
  nvme: wire-up support for async-passthru on char-device.
  io_uring: add flag and helper for fixed-buffer uring-cmd
  nvme: enable passthrough with fixed-buffer
  block: factor out helper for bio allocation from cache
  nvme: enable bio-cache for fixed-buffer passthru

Keith Busch (1):
  nvme: allow user passthrough commands to poll

Pankaj Raghav (1):
  nvme: Add async passthru polling support

 block/bio.c                     |  43 +++--
 block/blk-map.c                 |  46 ++++++
 block/blk-mq.c                  |  93 +++++------
 drivers/nvme/host/core.c        |  21 ++-
 drivers/nvme/host/ioctl.c       | 271 ++++++++++++++++++++++++++++----
 drivers/nvme/host/multipath.c   |   2 +
 drivers/nvme/host/nvme.h        |  13 +-
 drivers/nvme/host/pci.c         |   4 +-
 drivers/nvme/target/passthru.c  |   2 +-
 fs/io_uring.c                   | 113 +++++++++++--
 include/linux/bio.h             |   1 +
 include/linux/blk-mq.h          |   4 +
 include/linux/io_uring.h        |  26 ++-
 include/uapi/linux/io_uring.h   |   6 +-
 include/uapi/linux/nvme_ioctl.h |   4 +
 15 files changed, 542 insertions(+), 107 deletions(-)

Comments

Jens Axboe Dec. 21, 2021, 3:45 a.m. UTC | #1
On 12/20/21 7:17 AM, Kanchan Joshi wrote:
> Here is a revamped series on uring-passthru which is on top of Jens
> "nvme-passthru-wip.2" branch.
> https://git.kernel.dk/cgit/linux-block/commit/?h=nvme-passthru-wip.2
> 
> This scales much better than before with the addition of the following:
> - plugging
> - passthru polling (sync and async; the sync part comes from a patch that
>   Keith did earlier)
> - bio-cache (this helps regardless of irq/polling since we submit/complete
>   in task context anyway. It currently kicks in only when the fixed-buffer
>   option is also passed, but that is primarily to keep the plumbing simple)
> 
> Also, the feedback from Christoph (on the previous fixed-buffer series) has
> been incorporated, which has streamlined the plumbing.
> 
> I look forward to further feedback/comments.
> 
> KIOPS(512b) on P5800x looked like this:
> 
> QD    uring    pt    uring-poll    pt-poll
> 8      538     589      831         902
> 64     967     1131     1351        1378
> 256    1043    1230     1376        1429

These are nice results! Can you share all the job files or fio
invocations for each of these? I guess it's just two variants, with QD
varied between them?

We really (REALLY) should turn the nvme-wip branch into something
coherent, but at least with this we have some idea of an end result and
something that is testable. This looks so much better from the
performance POV than the earlier versions; passthrough _should_ be
faster than non-pt.
Kanchan Joshi Dec. 21, 2021, 2:36 p.m. UTC | #2
On Tue, Dec 21, 2021 at 9:15 AM Jens Axboe <axboe@kernel.dk> wrote:
>
> On 12/20/21 7:17 AM, Kanchan Joshi wrote:
> > Here is a revamped series on uring-passthru which is on top of Jens
> > "nvme-passthru-wip.2" branch.
> > https://git.kernel.dk/cgit/linux-block/commit/?h=nvme-passthru-wip.2
> >
> > This scales much better than before with the addition of the following:
> > - plugging
> > - passthru polling (sync and async; the sync part comes from a patch that
> >   Keith did earlier)
> > - bio-cache (this helps regardless of irq/polling since we submit/complete
> >   in task context anyway. It currently kicks in only when the fixed-buffer
> >   option is also passed, but that is primarily to keep the plumbing simple)
> >
> > Also, the feedback from Christoph (on the previous fixed-buffer series) has
> > been incorporated, which has streamlined the plumbing.
> >
> > I look forward to further feedback/comments.
> >
> > KIOPS(512b) on P5800x looked like this:
> >
> > QD    uring    pt    uring-poll    pt-poll
> > 8      538     589      831         902
> > 64     967     1131     1351        1378
> > 256    1043    1230     1376        1429
>
> These are nice results! Can you share all the job files or fio
> invocations for each of these? I guess it's just two variants, with QD
> varied between them?

Yes, just two variants with three QD/batch combinations.
Here are all the job files for the above data:
https://github.com/joshkan/fio/tree/nvme-passthru-wip-polling/pt-perf-jobs

> We really (REALLY) should turn the nvme-wip branch into something
> coherent, but at least with this we have some idea of an end result and
> something that is testable. This looks so much better from the
> performance POV than the earlier versions; passthrough _should_ be
> faster than non-pt.
>
It'd be great to know how it performs in your setup.
And please let me know how I can help in making things more coherent.