[for-next,v1,0/2] enable pcpu bio-cache for IRQ uring-passthru I/O

Message ID 20230117120638.72254-1-anuj20.g@samsung.com (mailing list archive)

Message

Anuj Gupta Jan. 17, 2023, 12:06 p.m. UTC
This series extends bio pcpu caching for normal / IRQ-driven
uring-passthru I/Os. Earlier, only polled uring-passthru I/Os could
leverage bio-cache. After the series from Pavel[1], bio-cache can be
leveraged by normal / IRQ-driven I/Os as well. t/io_uring with an Optane
SSD setup shows a +7.21% IOPS improvement for batches of 32 requests.

[1] https://lore.kernel.org/io-uring/cover.1666347703.git.asml.silence@gmail.com/
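
(Background, not part of this series' diff: the per-cpu cache is reached
when a sufficiently small bio is allocated with REQ_ALLOC_CACHE from a
bioset initialised with BIOSET_PERCPU_CACHE. A minimal sketch with
made-up names:)

#include <linux/bio.h>

static struct bio_set demo_bio_set;		/* illustrative name */

static int demo_bioset_setup(void)
{
	/* BIOSET_PERCPU_CACHE enables the per-cpu bio freelist */
	return bioset_init(&demo_bio_set, BIO_POOL_SIZE, 0,
			   BIOSET_NEED_BVECS | BIOSET_PERCPU_CACHE);
}

static struct bio *demo_bio_alloc(struct block_device *bdev, blk_opf_t opf)
{
	/* REQ_ALLOC_CACHE lets the allocation come from the per-cpu cache */
	return bio_alloc_bioset(bdev, 1, opf | REQ_ALLOC_CACHE,
				GFP_KERNEL, &demo_bio_set);
}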

IRQ, 128/32/32, cache off

# taskset -c 0 t/io_uring -b512 -d128 -c32 -s32 -p0 -F1 -B1 -P0 -O0 -u1 -n1 /dev/ng0n1
submitter=0, tid=13207, file=/dev/ng0n1, node=-1
polled=0, fixedbufs=1/0, register_files=1, buffered=1, QD=128
Engine=io_uring, sq_ring=128, cq_ring=128
IOPS=3.05M, BW=1488MiB/s, IOS/call=32/31
IOPS=3.04M, BW=1483MiB/s, IOS/call=32/31
IOPS=3.03M, BW=1477MiB/s, IOS/call=32/32
IOPS=3.03M, BW=1481MiB/s, IOS/call=32/32
^CExiting on signal
Maximum IOPS=3.05M

IRQ, 128/32/32, cache on

# taskset -c 0 t/io_uring -b512 -d128 -c32 -s32 -p0 -F1 -B1 -P0 -O0 -u1 -n1 /dev/ng0n1
submitter=0, tid=6755, file=/dev/ng0n1, node=-1
polled=0, fixedbufs=1/0, register_files=1, buffered=1, QD=128
Engine=io_uring, sq_ring=128, cq_ring=128
IOPS=3.27M, BW=1596MiB/s, IOS/call=32/31
IOPS=3.27M, BW=1595MiB/s, IOS/call=32/32
IOPS=3.26M, BW=1592MiB/s, IOS/call=32/31
IOPS=3.26M, BW=1593MiB/s, IOS/call=32/32
^CExiting on signal
Maximum IOPS=3.27M

Anuj Gupta (2):
  nvme: set REQ_ALLOC_CACHE for uring-passthru request
  block: extend bio-cache for non-polled requests

 block/blk-map.c           | 6 ++----
 drivers/nvme/host/ioctl.c | 4 ++--
 2 files changed, 4 insertions(+), 6 deletions(-)
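
Paraphrased shape of the two changes (the function names, parameters, and
fallback below are approximations for illustration, not the literal diff):

#include <linux/bio.h>
#include <linux/blk-mq.h>

/* Patch 1/2, nvme uring-passthru setup: request the bio cache even when
 * the command is not polled (previously it was tied to polled I/O). */
static blk_opf_t demo_passthru_rq_flags(bool iopoll)
{
	blk_opf_t rq_flags = REQ_ALLOC_CACHE;

	if (iopoll)
		rq_flags |= REQ_POLLED;
	return rq_flags;
}

/* Patch 2/2, blk-map bio allocation: key the cached path off
 * REQ_ALLOC_CACHE instead of REQ_POLLED, so non-polled passthru
 * requests can use the per-cpu cache too. */
static struct bio *demo_map_bio_alloc(struct request *rq,
				      unsigned int nr_vecs, gfp_t gfp)
{
	if (rq->cmd_flags & REQ_ALLOC_CACHE)
		return bio_alloc_bioset(NULL, nr_vecs, rq->cmd_flags,
					gfp, &fs_bio_set);

	return NULL;	/* fall back to the regular, non-cached allocation */
}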

Comments

Jens Axboe Jan. 17, 2023, 5:11 p.m. UTC | #1
On 1/17/23 5:06 AM, Anuj Gupta wrote:
> This series extends bio pcpu caching for normal / IRQ-driven
> uring-passthru I/Os. Earlier, only polled uring-passthru I/Os could
> leverage bio-cache. After the series from Pavel[1], bio-cache can be
> leveraged by normal / IRQ-driven I/Os as well. t/io_uring with an Optane
> SSD setup shows a +7.21% IOPS improvement for batches of 32 requests.
> 
> [1] https://lore.kernel.org/io-uring/cover.1666347703.git.asml.silence@gmail.com/
> 
> IRQ, 128/32/32, cache off

Tests here -

before:

polled=0, fixedbufs=1/0, register_files=1, buffered=1, QD=128
Engine=io_uring, sq_ring=128, cq_ring=128
IOPS=62.88M, BW=30.70GiB/s, IOS/call=32/31
IOPS=62.95M, BW=30.74GiB/s, IOS/call=32/31
IOPS=62.52M, BW=30.53GiB/s, IOS/call=32/32
IOPS=62.61M, BW=30.57GiB/s, IOS/call=31/32
IOPS=62.52M, BW=30.53GiB/s, IOS/call=32/31
IOPS=62.40M, BW=30.47GiB/s, IOS/call=32/32

after:

polled=0, fixedbufs=1/0, register_files=1, buffered=1, QD=128
Engine=io_uring, sq_ring=128, cq_ring=128
IOPS=76.58M, BW=37.39GiB/s, IOS/call=31/31
IOPS=79.42M, BW=38.78GiB/s, IOS/call=32/32
IOPS=78.06M, BW=38.12GiB/s, IOS/call=31/31
IOPS=77.64M, BW=37.91GiB/s, IOS/call=32/31
IOPS=77.17M, BW=37.68GiB/s, IOS/call=32/32
IOPS=76.73M, BW=37.47GiB/s, IOS/call=31/31
IOPS=76.94M, BW=37.57GiB/s, IOS/call=32/31

Note that this includes Pavel's fix as well:

https://lore.kernel.org/linux-block/80d4511011d7d4751b4cf6375c4e38f237d935e3.1673955390.git.asml.silence@gmail.com/

But this mirrors the improvement seen on the non-passthrough side as
well. I'd say that's a pass :-)
Jens Axboe Jan. 17, 2023, 5:23 p.m. UTC | #2
On Tue, 17 Jan 2023 17:36:36 +0530, Anuj Gupta wrote:
> This series extends bio pcpu caching for normal / IRQ-driven
> uring-passthru I/Os. Earlier, only polled uring-passthru I/Os could
> leverage bio-cache. After the series from Pavel[1], bio-cache can be
> leveraged by normal / IRQ-driven I/Os as well. t/io_uring with an Optane
> SSD setup shows a +7.21% IOPS improvement for batches of 32 requests.
> 
> [1] https://lore.kernel.org/io-uring/cover.1666347703.git.asml.silence@gmail.com/
> 
> [...]

Applied, thanks!

[1/2] nvme: set REQ_ALLOC_CACHE for uring-passthru request
      commit: 988136a307157de9e6e9d27ee9f7ea24ee374f32
[2/2] block: extend bio-cache for non-polled requests
      commit: 934f178446b11f621ab52e83211ebf399896db47

Best regards,
Jens Axboe
Kanchan Joshi Jan. 18, 2023, 9:14 a.m. UTC | #3
On Tue, Jan 17, 2023 at 10:11:08AM -0700, Jens Axboe wrote:
>On 1/17/23 5:06 AM, Anuj Gupta wrote:
>> This series extends bio pcpu caching for normal / IRQ-driven
>> uring-passthru I/Os. Earlier, only polled uring-passthru I/Os could
>> leverage bio-cache. After the series from Pavel[1], bio-cache can be
>> leveraged by normal / IRQ-driven I/Os as well. t/io_uring with an Optane
>> SSD setup shows a +7.21% IOPS improvement for batches of 32 requests.
>>
>> [1] https://lore.kernel.org/io-uring/cover.1666347703.git.asml.silence@gmail.com/
>>
>> IRQ, 128/32/32, cache off
>
>Tests here -
>
>before:
>
>polled=0, fixedbufs=1/0, register_files=1, buffered=1, QD=128
>Engine=io_uring, sq_ring=128, cq_ring=128
>IOPS=62.88M, BW=30.70GiB/s, IOS/call=32/31
>IOPS=62.95M, BW=30.74GiB/s, IOS/call=32/31
>IOPS=62.52M, BW=30.53GiB/s, IOS/call=32/32
>IOPS=62.61M, BW=30.57GiB/s, IOS/call=31/32
>IOPS=62.52M, BW=30.53GiB/s, IOS/call=32/31
>IOPS=62.40M, BW=30.47GiB/s, IOS/call=32/32
>
>after:
>
>polled=0, fixedbufs=1/0, register_files=1, buffered=1, QD=128
>Engine=io_uring, sq_ring=128, cq_ring=128
>IOPS=76.58M, BW=37.39GiB/s, IOS/call=31/31
>IOPS=79.42M, BW=38.78GiB/s, IOS/call=32/32
>IOPS=78.06M, BW=38.12GiB/s, IOS/call=31/31
>IOPS=77.64M, BW=37.91GiB/s, IOS/call=32/31
>IOPS=77.17M, BW=37.68GiB/s, IOS/call=32/32
>IOPS=76.73M, BW=37.47GiB/s, IOS/call=31/31
>IOPS=76.94M, BW=37.57GiB/s, IOS/call=32/31
>
>Note that this includes Pavel's fix as well:
>
>https://lore.kernel.org/linux-block/80d4511011d7d4751b4cf6375c4e38f237d935e3.1673955390.git.asml.silence@gmail.com/

So I was wondering whether we need this fix for the passthru path too.
We do not.
For the block path, blk_mq_get_cached_request() hit a mismatch because
the types differed (read vs default).
For passthru, blk_mq_alloc_cached_request() sees no mismatch because a
passthrough opf is not treated as a read (default vs default).
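
To make the "read vs default" point concrete, here is a small standalone
sketch (simplified stand-in constants, not kernel code) that roughly
mirrors how blk_mq_get_hctx_type() derives the hctx type from the op
flags: a REQ_OP_READ bio maps to the read type, while a passthrough op
such as REQ_OP_DRV_IN maps to default and so matches the default type of
the cached request.

#include <stdio.h>

enum hctx_type { HCTX_TYPE_DEFAULT, HCTX_TYPE_READ, HCTX_TYPE_POLL };

/* simplified stand-ins for the real opcode/flag encodings */
enum op { OP_READ, OP_WRITE, OP_DRV_IN, OP_DRV_OUT };
#define FLAG_POLLED (1u << 8)

static enum hctx_type hctx_type_for(unsigned int opf)
{
	if (opf & FLAG_POLLED)
		return HCTX_TYPE_POLL;
	if ((opf & 0xffu) == OP_READ)
		return HCTX_TYPE_READ;
	return HCTX_TYPE_DEFAULT;
}

static const char *type_name(enum hctx_type t)
{
	return t == HCTX_TYPE_POLL ? "poll" :
	       t == HCTX_TYPE_READ ? "read" : "default";
}

int main(void)
{
	/* block path: a read bio maps to "read", unlike the cached default */
	printf("block read bio          -> %s\n", type_name(hctx_type_for(OP_READ)));
	/* passthru path: DRV_IN/DRV_OUT are not treated as reads */
	printf("passthru REQ_OP_DRV_IN  -> %s\n", type_name(hctx_type_for(OP_DRV_IN)));
	printf("passthru REQ_OP_DRV_OUT -> %s\n", type_name(hctx_type_for(OP_DRV_OUT)));
	return 0;
}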