[RFC,0/1] Introduce a new target: lzbd - LightNVM Zoned Block Device

Message ID 1555588885-22546-1-git-send-email-hans@owltronix.com

Hans Holmberg April 18, 2019, 12:01 p.m. UTC
From: Hans Holmberg <hans.holmberg@cnexlabs.com>

Introduce a new target: lzbd - LightNVM Zoned Block Device

The new target makes it possible to expose an
Open-Channel 2.0 SSD as one or more zoned block devices made up of
BLK_ZONE_TYPE_SEQWRITE_REQ zones.
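
For reference, each zone ends up described to the block layer through
struct blk_zone (include/uapi/linux/blkzoned.h). A minimal sketch of
what the zone reporting boils down to (the helper name is hypothetical,
not the actual lzbd code):

	/* Hypothetical helper: describe one lzbd zone to the block layer. */
	static void lzbd_fill_blk_zone(struct blk_zone *bz, sector_t start,
				       sector_t zone_sectors, sector_t wp)
	{
		bz->start = start;	/* first 512b sector of the zone */
		bz->len = zone_sectors;	/* zone size in 512b sectors */
		bz->wp = wp;		/* current write pointer */
		bz->type = BLK_ZONE_TYPE_SEQWRITE_REQ;

		if (wp == start)
			bz->cond = BLK_ZONE_COND_EMPTY;
		else if (wp >= start + zone_sectors)
			bz->cond = BLK_ZONE_COND_FULL;
		else
			bz->cond = BLK_ZONE_COND_IMP_OPEN;
	}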

I've been playing around with this for the last couple of months and
now I'd love to get some feedback.

It's been very useful to look at null_blk's zone support when
doing the plumbing work, and Simon and Klaus have also been very helpful
in figuring out the design. Thanks guys!

Naming is sometimes the hardest thing. I named this thing lzbd, as
I found it the most descriptive acronym.

NOTE: This is an early prototype that is still lacking some vital
features. It is worth looking at and playing around with for those
interested, but beware of dragons :)

See the lzbd documentation (Documentation/lightnvm/lzbd.txt) for my
ideas on what a full implementation would look like.

What is supported (for now):

	* Reads
	* Sequential writes
	* Unaligned writes (a per-zone ws_opt alignment buffer is used;
	  see the sketch after this list)
	* Zone resets
	* Zone reporting
	* Wear leveling (sort of; wear indices are not updated on reset yet)
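
The alignment buffer mentioned above works roughly like this (a sketch
with hypothetical names, not the actual lzbd code): writes are staged
in a per-zone buffer and only submitted to the open chunk in
ws_opt-sized units:

	/* Hypothetical per-zone staging buffer: accumulate unaligned
	 * writes until a full ws_opt unit can go to the open chunk. */
	struct lzbd_align_buf {
		void		*data;	 /* ws_opt sectors of staging space */
		unsigned int	secs;	 /* sectors currently buffered */
		unsigned int	ws_opt;	 /* optimal write size (sectors) */
	};

	static bool lzbd_buf_add(struct lzbd_align_buf *buf, void *src,
				 unsigned int secs, unsigned int sec_size)
	{
		memcpy(buf->data + buf->secs * sec_size, src,
		       secs * sec_size);
		buf->secs += secs;

		return buf->secs == buf->ws_opt; /* true: time to flush */
	}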

I've mainly tested in QEMU (cunits=0, ws_min=4, ws_opt=8).

The zoned block device tests in blktests (tests/zbd) pass, and I've done
a bunch of general smoke testing (aligned/unaligned writes with
verification using dd and fio, ..), so the general plumbing seems to
hold up, but more testing is needed.

Performance is definitely not what it should be yet. Only one chunk per
zone is written to at a time, effectively rate-limiting writes per zone,
which is an interesting constraint, but probably not what we want.
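
One way to lift that limit (not implemented; names hypothetical) would
be to stripe each zone's writes round-robin over all chunks backing the
zone, so a zone could write at the speed of several parallel units:

	/* Hypothetical striping: pick the chunk that takes the next
	 * ws_opt-sized unit by round-robin over the zone's chunks. */
	static struct lzbd_chunk *lzbd_next_chunk(struct lzbd_zone *zone)
	{
		struct lzbd_chunk *chk = zone->chunks[zone->next_chunk];

		zone->next_chunk = (zone->next_chunk + 1) % zone->nr_chunks;
		return chk;
	}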

What is not supported (yet):

	* Metadata persistence (when the instance is removed, data is lost)
		- The zone-to-chunk mapping needs to be stored (see the
		  sketch after this list)

	* Sync handling (flushing alignment buffers)
		- Zone alignment buffers need to be flushed to disk

	* Write error handling
		- Write errors will require zone -> chunk remapping
		  of the failing chunk.

	* Chunk reset error handling (chunks going offline)
	* Updating wear indices on chunk resets
		- This is low hanging fruit to fix

	* Cunits read buffering
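
On the metadata persistence point above, what must survive a power
cycle is essentially the zone-to-chunk mapping. A sketch of what such
an on-disk record could look like (layout and names hypothetical, this
is not a proposed format), using the OCSSD 2.0 (group, PU, chunk)
addressing:

	/* Hypothetical on-disk record for one zone's chunk mapping. */
	struct lzbd_zone_map_rec {
		__le32	zone_id;	/* zone number in the instance */
		__le32	nr_chunks;	/* chunks backing this zone */
		struct {
			__le16	grp;	/* parallel unit group */
			__le16	pu;	/* parallel unit in the group */
			__le32	chk;	/* chunk within the PU */
		} chunks[];
	};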


Final thoughts, for now:

Since lzbd (and pblk, for that matter) is not entirely unlike a file
system, it would be nice to create a mkfs/fsck/dmzadm-like tool that
would:

	* Format the drive and persist instance configuration in a superblock
	  contained in the instance metadata (see the sketch below).
	* Repair broken (i.e. power-failed) instances.
	  Per-sector metadata is currently not utilized in lzbd, but would
	  be helpful in recovery scenarios.
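
To make the tool idea above a bit more concrete, the superblock could
carry something like the following (purely illustrative; no such
on-disk format exists yet):

	/* Hypothetical lzbd superblock, written by an mkfs-like tool
	 * and validated at target creation time. */
	struct lzbd_sb {
		__le32	magic;		/* identifies an lzbd instance */
		__le32	version;	/* on-disk format revision */
		uuid_t	uuid;		/* ties metadata to the instance */
		__le32	nr_zones;	/* zones exposed by the instance */
		__le32	zone_sectors;	/* zone size in 512b sectors */
		__le64	map_offset;	/* start of the zone-to-chunk map */
		__le32	crc;		/* protects the superblock */
	};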


The patch is based on Matias' for-5.2/core branch in the OpenChannelSSD
github project. It is also available at [1] (branch for-5.2/lzbd).


Thanks,
Hans

[1] CNEX Labs linux github project: https://github.com/CNEX-Labs/linux


Hans Holmberg (1):
  lightnvm: add lzbd - a zoned block device target

 Documentation/lightnvm/lzbd.txt | 122 +++++++++++
 drivers/lightnvm/Kconfig        |  11 +
 drivers/lightnvm/Makefile       |   3 +
 drivers/lightnvm/lzbd-io.c      | 342 +++++++++++++++++++++++++++++++
 drivers/lightnvm/lzbd-target.c  | 392 +++++++++++++++++++++++++++++++++++
 drivers/lightnvm/lzbd-user.c    | 310 ++++++++++++++++++++++++++++
 drivers/lightnvm/lzbd-zone.c    | 444 ++++++++++++++++++++++++++++++++++++++++
 drivers/lightnvm/lzbd.h         | 139 +++++++++++++
 8 files changed, 1763 insertions(+)
 create mode 100644 Documentation/lightnvm/lzbd.txt
 create mode 100644 drivers/lightnvm/lzbd-io.c
 create mode 100644 drivers/lightnvm/lzbd-target.c
 create mode 100644 drivers/lightnvm/lzbd-user.c
 create mode 100644 drivers/lightnvm/lzbd-zone.c
 create mode 100644 drivers/lightnvm/lzbd.h