From patchwork Tue Mar 8 16:53:43 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Pankaj Raghav
X-Patchwork-Id: 12774056
From: Pankaj Raghav
To: Luis Chamberlain ,
Adam Manzanares ,
Javier González ,
kanchan Joshi , Jens Axboe ,
Keith Busch , Christoph Hellwig ,
Sagi Grimberg ,
Damien Le Moal , Matias Bjørling , jiangbo.365@bytedance.com
Cc: Pankaj Raghav ,
Kanchan Joshi ,
linux-block@vger.kernel.org, linux-nvme@lists.infradead.org,
Pankaj Raghav
Subject: [PATCH 0/6] power_of_2 emulation support for NVMe ZNS devices
Date: Tue, 8 Mar 2022 17:53:43 +0100
Message-Id: <20220308165349.231320-1-p.raghav@samsung.com>
X-Mailer: git-send-email 2.25.1
Precedence: bulk
List-ID:
X-Mailing-List: linux-block@vger.kernel.org
#Motivation:
Some ZNS drives that are already produced and deployed do not have a
power_of_2 (PO2) zone size. The NVMe ZNS specification does not require
PO2 zone sizes, but the Linux block layer currently requires zoned
devices to have them.
As a result, in-kernel file systems such as F2FS and BTRFS, as well as
userspace applications, are designed on the assumption that zone sizes
are PO2.
This patchset aims to support non-power_of_2 (NPO2) zoned devices by
adding an emulation layer for NVMe ZNS devices, without affecting
existing applications and without regressing the current upstream
implementation.
#Implementation:
A new callback is added to the block device operations (fops). It is
called when the driver needs special handling because a non-power_of_2
zoned device has been discovered. This patchset adds support only to
NVMe ZNS and, in order to measure performance, to the null block driver.
The SCSI ZAC/ZBC implementation is untouched.
Emulation is enabled by a static remapping of the zones that is done
only in the host: whenever a request is sent to the device via the block
layer, its sector is transformed to the actual device sector.
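To make the remapping concrete, here is a minimal userspace sketch of one
way such a static sector transformation could look. It assumes the
emulated zone size is the device zone size rounded up to the next power
of two; the struct and function names are hypothetical and are not taken
from the patches.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Illustrative only: the struct and function names below are hypothetical
 * and are not taken from the patches. Sectors are 512-byte units.
 */
struct zone_emu {
	uint64_t dev_zone_sectors; /* real (NPO2) zone size reported by the device */
	uint64_t emu_zone_sectors; /* PO2 zone size exposed to the block layer */
};

/* Smallest power of two >= n, for 1 <= n <= 2^63. */
static inline uint64_t roundup_po2(uint64_t n)
{
	uint64_t p = 1;

	while (p < n)
		p <<= 1;
	return p;
}

/* Assumption: the emulated zone size is the device zone size rounded up. */
static inline struct zone_emu zone_emu_init(uint64_t dev_zone_sectors)
{
	struct zone_emu e;

	assert(dev_zone_sectors != 0);
	e.dev_zone_sectors = dev_zone_sectors;
	e.emu_zone_sectors = roundup_po2(dev_zone_sectors);
	return e;
}

/* Host-visible (emulated) sector -> sector on the device. */
static inline uint64_t emu_to_dev_sector(const struct zone_emu *e, uint64_t sector)
{
	uint64_t zone_no = sector / e->emu_zone_sectors;
	uint64_t offset = sector & (e->emu_zone_sectors - 1);

	/* Offsets in the gap past the real zone size have no backing sectors. */
	assert(offset < e->dev_zone_sectors);
	return zone_no * e->dev_zone_sectors + offset;
}

/* Device sector -> host-visible (emulated) sector, e.g. for zone reports. */
static inline uint64_t dev_to_emu_sector(const struct zone_emu *e, uint64_t sector)
{
	uint64_t zone_no = sector / e->dev_zone_sectors;
	uint64_t offset = sector % e->dev_zone_sectors;

	return zone_no * e->emu_zone_sectors + offset;
}
```

For a device with 96M zones (196608 sectors), zone_emu_init() in this
sketch picks a 128M (262144-sector) emulated zone size, and emulated
sector 262144, the start of zone 1, maps to device sector 196608.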
#Testing:
Two things need to be tested: that the upstream implementation does not
regress for PO2 zone sizes, and that the emulation itself works.
For an apples-to-apples comparison, the following device specs were
chosen for testing (both on null_blk and QEMU):
PO2 device: zone.size=128M zone.cap=96M
NPO2 device: zone.size=96M zone.cap=96M
##Regression:
These tests are done on a **PO2 device**.
PO2 device used: zone.size=128M zone.cap=96M
###blktests:
Blktests were executed with the following config:
TEST_DEVS=(/dev/nvme0n2)
TIMEOUT=100
RUN_ZONED_TESTS=1
The block and zbd tests were run and no regressions were found.
###Performance:
Performance tests were performed on a null blk device. The following fio
script was used to measure the performance:
fio --name=zbc --filename=/dev/nullb0 --direct=1 --zonemode=zbd --size=23G
--io_size= --ioengine=io_uring --iodepth= --rw= --bs=4k --loops=4
No regressions were found with the patches on a **PO2 device** compared
to the existing upstream implementation.
The following results are an average of 4 runs on AMD Ryzen 5 5600X with
32GB of RAM:
Sequential write:
x-----------------x---------------------------------x---------------------------------x
|     IOdepth     |                1                |                4                |
x-----------------x---------------------------------x---------------------------------x
|                 | KIOPS | BW(MiB/s) |  Lat(usec)  | KIOPS | BW(MiB/s) |  Lat(usec)  |
x-----------------x---------------------------------x---------------------------------x
| Without patches |  155  |    604    |     6.00    |  426  |   1663    |     8.77    |
x-----------------x---------------------------------x---------------------------------x
| With patches    |  157  |    613    |     5.92    |  425  |   1741    |     8.79    |
x-----------------x---------------------------------x---------------------------------x
x-----------------x---------------------------------x---------------------------------x
|     IOdepth     |                8                |               16                |
x-----------------x---------------------------------x---------------------------------x
|                 | KIOPS | BW(MiB/s) |  Lat(usec)  | KIOPS | BW(MiB/s) |  Lat(usec)  |
x-----------------x---------------------------------x---------------------------------x
| Without patches |  607  |   2370    |    12.06    |  622  |   2431    |    23.61    |
x-----------------x---------------------------------x---------------------------------x
| With patches    |  621  |   2425    |    11.80    |  633  |   2472    |    23.24    |
x-----------------x---------------------------------x---------------------------------x
Sequential read:
x-----------------x---------------------------------x---------------------------------x
|     IOdepth     |                1                |                4                |
x-----------------x---------------------------------x---------------------------------x
|                 | KIOPS | BW(MiB/s) |  Lat(usec)  | KIOPS | BW(MiB/s) |  Lat(usec)  |
x-----------------x---------------------------------x---------------------------------x
| Without patches |  165  |    643    |     5.72    |  485  |   1896    |     8.03    |
x-----------------x---------------------------------x---------------------------------x
| With patches    |  167  |    654    |     5.62    |  483  |   1888    |     8.06    |
x-----------------x---------------------------------x---------------------------------x
x-----------------x---------------------------------x---------------------------------x
|     IOdepth     |                8                |               16                |
x-----------------x---------------------------------x---------------------------------x
|                 | KIOPS | BW(MiB/s) |  Lat(usec)  | KIOPS | BW(MiB/s) |  Lat(usec)  |
x-----------------x---------------------------------x---------------------------------x
| Without patches |  696  |   2718    |    11.29    |  692  |   2701    |    22.92    |
x-----------------x---------------------------------x---------------------------------x
| With patches    |  696  |   2718    |    11.29    |  730  |   2835    |    21.70    |
x-----------------x---------------------------------x---------------------------------x
Random read:
x-----------------x---------------------------------x---------------------------------x
|     IOdepth     |                1                |                4                |
x-----------------x---------------------------------x---------------------------------x
|                 | KIOPS | BW(MiB/s) |  Lat(usec)  | KIOPS | BW(MiB/s) |  Lat(usec)  |
x-----------------x---------------------------------x---------------------------------x
| Without patches |  159  |    623    |     5.86    |  451  |   1760    |     8.58    |
x-----------------x---------------------------------x---------------------------------x
| With patches    |  163  |    635    |     5.75    |  462  |   1806    |     8.36    |
x-----------------x---------------------------------x---------------------------------x
x-----------------x---------------------------------x---------------------------------x
|     IOdepth     |                8                |               16                |
x-----------------x---------------------------------x---------------------------------x
|                 | KIOPS | BW(MiB/s) |  Lat(usec)  | KIOPS | BW(MiB/s) |  Lat(usec)  |
x-----------------x---------------------------------x---------------------------------x
| Without patches |  544  |   2124    |    14.44    |  553  |   2162    |    28.64    |
x-----------------x---------------------------------x---------------------------------x
| With patches    |  554  |   2165    |    14.15    |  556  |   2171    |    28.52    |
x-----------------x---------------------------------x---------------------------------x
##Emulated device
NPO2 device: zone.size=96M zone.cap=96M
###blktests:
Blktests were executed with the following config:
TEST_DEVS=(/dev/nvme0n2)
TIMEOUT=100
RUN_ZONED_TESTS=1
The block and zbd tests were run and they pass.
###Performance:
Performance tests were performed on a null blk device. The following fio
script was used to measure the performance:
fio --name=zbc --filename=/dev/nullb0 --direct=1 --zonemode=zbd --size=23G
--io_size= --ioengine=io_uring --iodepth= --rw= --bs=4k --loops=4
On average, the NPO2 devices showed a performance degradation of less
than 1% compared to the PO2 devices.
The following results are an average of 4 runs on AMD Ryzen 5 5600X with
32GB of RAM:
Write:
x-----------------x---------------------------------x---------------------------------x
|     IOdepth     |                1                |                4                |
x-----------------x---------------------------------x---------------------------------x
|                 | KIOPS | BW(MiB/s) |  Lat(usec)  | KIOPS | BW(MiB/s) |  Lat(usec)  |
x-----------------x---------------------------------x---------------------------------x
| With patches    |  155  |    606    |     5.99    |  424  |   1655    |     8.83    |
x-----------------x---------------------------------x---------------------------------x
x-----------------x---------------------------------x---------------------------------x
|     IOdepth     |                8                |               16                |
x-----------------x---------------------------------x---------------------------------x
|                 | KIOPS | BW(MiB/s) |  Lat(usec)  | KIOPS | BW(MiB/s) |  Lat(usec)  |
x-----------------x---------------------------------x---------------------------------x
| With patches    |  609  |   2378    |    12.04    |  620  |   2421    |    23.75    |
x-----------------x---------------------------------x---------------------------------x
Sequential read:
x-----------------x---------------------------------x---------------------------------x
|     IOdepth     |                1                |                4                |
x-----------------x---------------------------------x---------------------------------x
|                 | KIOPS | BW(MiB/s) |  Lat(usec)  | KIOPS | BW(MiB/s) |  Lat(usec)  |
x-----------------x---------------------------------x---------------------------------x
| With patches    |  160  |    623    |     5.91    |  481  |   1878    |     8.11    |
x-----------------x---------------------------------x---------------------------------x
x-----------------x---------------------------------x---------------------------------x
|     IOdepth     |                8                |               16                |
x-----------------x---------------------------------x---------------------------------x
|                 | KIOPS | BW(MiB/s) |  Lat(usec)  | KIOPS | BW(MiB/s) |  Lat(usec)  |
x-----------------x---------------------------------x---------------------------------x
| With patches    |  696  |   2720    |    11.28    |  722  |   2819    |    21.96    |
x-----------------x---------------------------------x---------------------------------x
Random read:
x-----------------x---------------------------------x---------------------------------x
|     IOdepth     |                1                |                4                |
x-----------------x---------------------------------x---------------------------------x
|                 | KIOPS | BW(MiB/s) |  Lat(usec)  | KIOPS | BW(MiB/s) |  Lat(usec)  |
x-----------------x---------------------------------x---------------------------------x
| With patches    |  155  |    607    |     6.03    |  465  |   1817    |     8.31    |
x-----------------x---------------------------------x---------------------------------x
x-----------------x---------------------------------x---------------------------------x
|     IOdepth     |                8                |               16                |
x-----------------x---------------------------------x---------------------------------x
|                 | KIOPS | BW(MiB/s) |  Lat(usec)  | KIOPS | BW(MiB/s) |  Lat(usec)  |
x-----------------x---------------------------------x---------------------------------x
| With patches    |  552  |   2158    |    14.21    |  561  |   2190    |    28.27    |
x-----------------x---------------------------------x---------------------------------x
#TODO:
- To limit the scope and impact, the current implementation only works
for the NVMe PCI transport. Support for NVMe target will follow soon.
Pankaj Raghav (6):
nvme: zns: Allow ZNS drives that have non-power_of_2 zone size
block: Add npo2_zone_setup callback to block device fops
block: add a bool member to request_queue for power_of_2 emulation
nvme: zns: Add support for power_of_2 emulation to NVMe ZNS devices
null_blk: forward the sector value from null_handle_memory_backend
null_blk: Add support for power_of_2 emulation to the null blk device
block/blk-zoned.c | 3 +
drivers/block/null_blk/main.c | 18 +--
drivers/block/null_blk/null_blk.h | 12 ++
drivers/block/null_blk/zoned.c | 203 ++++++++++++++++++++++++++----
drivers/nvme/host/core.c | 28 +++--
drivers/nvme/host/nvme.h | 100 ++++++++++++++-
drivers/nvme/host/pci.c | 4 +
drivers/nvme/host/zns.c | 86 +++++++++++--
include/linux/blk-mq.h | 2 +
include/linux/blkdev.h | 25 ++++
10 files changed, 428 insertions(+), 53 deletions(-)