Message ID | 20191212183816.102402-3-damien.lemoal@wdc.com (mailing list archive) |
---|---|
State | Superseded, archived |
Series | New zonefs file system |
On 12/12/19 7:38 PM, Damien Le Moal wrote:
> Add the new file Documentation/filesystems/zonefs.txt to document zonefs
> principles and user-space tool usage.
>
> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
> ---
[...]
> +Several optional features of zonefs can be enabled at format time.
> +* Conventional zone aggregation: ranges of contiguous conventional zones can be
> +  aggregated into a single larger file instead of the default one file per zone.
> +* File ownership: The owner UID and GID of zone files are by default 0 (root)
> +  but can be changed to any valid UID/GID.
> +* File access permissions: the default 640 access permissions can be changed.
> +

Please mention the 'direct writes only to sequential zones' restriction.

Cheers,

Hannes
On 2019/12/16 17:38, Hannes Reinecke wrote:
> On 12/12/19 7:38 PM, Damien Le Moal wrote:
>> Add the new file Documentation/filesystems/zonefs.txt to document zonefs
>> principles and user-space tool usage.
>>
>> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
>> ---
[...]
>> +Several optional features of zonefs can be enabled at format time.
>> +* Conventional zone aggregation: ranges of contiguous conventional zones can be
>> +  aggregated into a single larger file instead of the default one file per zone.
>> +* File ownership: The owner UID and GID of zone files are by default 0 (root)
>> +  but can be changed to any valid UID/GID.
>> +* File access permissions: the default 640 access permissions can be changed.
>> +
>
> Please mention the 'direct writes only to sequential zones' restriction.

Yes, indeed, this is missing. Will add it.

>
> Cheers,
>
> Hannes
>
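The restriction discussed in this exchange, together with the append-only and zone-size rules already present in the patch, can be modelled in a few lines. The following Python sketch is only an illustration and not zonefs code: the function name check_seq_write and its parameters are hypothetical, EFBIG is taken from the documentation text, and the use of ValueError for the other two cases is an assumption, because the patch does not say which error is returned.

```python
import errno


def check_seq_write(offset, length, file_size, zone_size, direct=True):
    """Model the write rules for a sequential zone file (illustrative only).

    file_size stands for the zone write pointer position, as described in
    the documentation. All sizes are in bytes.
    """
    if not direct:
        # Restriction raised in review: sequential zones take direct writes only.
        # The exception type is an assumption; the patch names no error here.
        raise ValueError("sequential zone files accept direct writes only")
    if offset + length > zone_size:
        # From the documentation: accesses beyond the zone size fail with EFBIG.
        raise OSError(errno.EFBIG, "write exceeds the zone size")
    if offset != file_size:
        # Append-only rule: a write must start at the current end of file.
        # The exception type is again an assumption.
        raise ValueError("write must start at the end of the file")
    # The new file size, i.e. the advanced write pointer position.
    return file_size + length
```

With a 256 MB zone, check_seq_write(0, 4096, 0, 268435456) returns 4096, which matches the file size reported after the dd example in the patch.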
diff --git a/Documentation/filesystems/zonefs.txt b/Documentation/filesystems/zonefs.txt
new file mode 100644
index 000000000000..e5d798f4087d
--- /dev/null
+++ b/Documentation/filesystems/zonefs.txt
@@ -0,0 +1,150 @@
+ZoneFS - Zone filesystem for Zoned block devices
+
+Overview
+========
+
+zonefs is a very simple file system exposing each zone of a zoned block device
+as a file. Unlike a regular file system with zoned block device support (e.g.
+f2fs), zonefs does not hide the sequential write constraint of zoned block
+devices from the user. Files representing sequential write zones of the device
+must be written sequentially starting from the end of the file (append only
+writes).
+
+As such, zonefs is in essence closer to a raw block device access interface
+than to a full featured POSIX file system. The goal of zonefs is to simplify
+the implementation of zoned block devices support in applications by replacing
+raw block device file accesses with a richer file API, avoiding relying on
+direct block device file ioctls which may be more obscure to developers. One
+example of this approach is the implementation of LSM (log-structured merge)
+tree structures (such as used in RocksDB and LevelDB) on zoned block devices by
+allowing SSTables to be stored in a zone file similarly to a regular file system
+rather than as a range of sectors of the entire disk. The introduction of the
+higher level construct "one file is one zone" can help reduce the amount of
+changes needed in the application as well as introducing support for different
+application programming languages.
+
+zonefs on-disk metadata is reduced to a super block which persistently stores a
+magic number and optional feature flags and values. On mount, zonefs uses
+blkdev_report_zones() to obtain the device zone configuration and populates
+the mount point with a static file tree solely based on this information.
+E.g. file sizes come from the device zone type and write pointer offset managed
+by the device itself.
+
+The zone files created on mount have the following characteristics.
+1) Files representing zones of the same type are grouped together
+   under the same sub-directory:
+   * For conventional zones, the sub-directory "cnv" is used.
+   * For sequential write zones, the sub-directory "seq" is used.
+   These two directories are the only directories that exist in zonefs. Users
+   cannot create other directories and cannot rename nor delete the "cnv" and
+   "seq" sub-directories.
+2) The name of zone files is the number of the file within the zone type
+   sub-directory, in order of increasing zone start sector.
+3) The size of conventional zone files is fixed to the device zone size.
+   Conventional zone files cannot be truncated.
+4) The size of sequential zone files represents the file's zone write pointer
+   position relative to the zone start sector. Truncating these files is
+   allowed only down to 0, in which case the zone is reset to rewind the file
+   zone write pointer position to the start of the zone, or up to the zone size,
+   in which case the file's zone is transitioned to the FULL state (finish zone
+   operation).
+5) Read and write operations to files are not allowed beyond the file zone
+   size. Any access exceeding the zone size fails with the -EFBIG error.
+6) Creating, deleting, renaming or modifying any attribute of files and
+   sub-directories is not allowed.
+
+Several optional features of zonefs can be enabled at format time.
+* Conventional zone aggregation: ranges of contiguous conventional zones can be
+  aggregated into a single larger file instead of the default one file per zone.
+* File ownership: The owner UID and GID of zone files are by default 0 (root)
+  but can be changed to any valid UID/GID.
+* File access permissions: the default 640 access permissions can be changed.
+
+User Space Tools
+================
+
+The mkzonefs tool is used to format zoned block devices for use with zonefs.
+This tool is available on GitHub at:
+
+git@github.com:damien-lemoal/zonefs-tools.git.
+
+zonefs-tools also includes a test suite which can be run against any zoned
+block device, including a null_blk block device created in zoned mode.
+
+Example: the following formats a 15 TB host-managed SMR HDD with 256 MB zones
+with the conventional zones aggregation feature enabled.
+
+# mkzonefs -o aggr_cnv /dev/sdX
+# mount -t zonefs /dev/sdX /mnt
+# ls -l /mnt/
+total 0
+dr-xr-xr-x 2 root root     1 Nov 25 13:23 cnv
+dr-xr-xr-x 2 root root 55356 Nov 25 13:23 seq
+
+The size of each zone file sub-directory indicates the number of files existing
+for that zone type. In this example, there is only one conventional zone
+file (all conventional zones are aggregated under a single file).
+
+# ls -l /mnt/cnv
+total 137101312
+-rw-r----- 1 root root 140391743488 Nov 25 13:23 0
+
+This aggregated conventional zone file can be used as a regular file.
+
+# mkfs.ext4 /mnt/cnv/0
+# mount -o loop /mnt/cnv/0 /data
+
+The "seq" sub-directory, which groups the files for sequential write zones,
+contains 55356 zone files in this example.
+
+# ls -lv /mnt/seq
+total 14511243264
+-rw-r----- 1 root root 0 Nov 25 13:23 0
+-rw-r----- 1 root root 0 Nov 25 13:23 1
+-rw-r----- 1 root root 0 Nov 25 13:23 2
+...
+-rw-r----- 1 root root 0 Nov 25 13:23 55354
+-rw-r----- 1 root root 0 Nov 25 13:23 55355
+
+For sequential write zone files, the file size changes as data is appended at
+the end of the file, similarly to any regular file system.
+
+# dd if=/dev/zero of=/mnt/seq/0 bs=4096 count=1 conv=notrunc oflag=direct
+1+0 records in
+1+0 records out
+4096 bytes (4.1 kB, 4.0 KiB) copied, 1.05112 s, 3.9 kB/s
+
+# ls -l /mnt/seq/0
+-rw-r----- 1 root root 4096 Nov 25 13:23 /mnt/seq/0
+
+The written file can be truncated to the zone size, preventing any further write
+operation.
+
+# truncate -s 268435456 /mnt/seq/0
+# ls -l /mnt/seq/0
+-rw-r----- 1 root root 268435456 Nov 25 13:49 /mnt/seq/0
+
+Truncation to 0 size allows freeing the file zone storage space and restarting
+append-writes to the file.
+
+# truncate -s 0 /mnt/seq/0
+# ls -l /mnt/seq/0
+-rw-r----- 1 root root 0 Nov 25 13:49 /mnt/seq/0
+
+Since files are statically mapped to zones on the disk, the number of blocks of
+a file as reported by stat() and fstat() indicates the size of the file zone.
+
+# stat /mnt/seq/0
+  File: /mnt/seq/0
+  Size: 0          Blocks: 524288     IO Block: 4096   regular empty file
+Device: 870h/2160d Inode: 50431       Links: 1
+Access: (0640/-rw-r-----)  Uid: (    0/    root)   Gid: (    0/    root)
+Access: 2019-11-25 13:23:57.048971997 +0900
+Modify: 2019-11-25 13:52:25.553805765 +0900
+Change: 2019-11-25 13:52:25.553805765 +0900
+ Birth: -
+
+The number of blocks of the file ("Blocks") in units of 512B blocks gives the
+maximum file size of 524288 * 512 B = 256 MB, corresponding to the device zone
+size in this example. Of note is that the "IO block" field always indicates the
+minimum IO size for writes and corresponds to the device physical sector size.
diff --git a/MAINTAINERS b/MAINTAINERS
index 0641167ed2ea..1c760735e906 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -18290,6 +18290,7 @@ L:	linux-fsdevel@vger.kernel.org
 T:	git git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs.git
 S:	Maintained
 F:	fs/zonefs/
+F:	Documentation/filesystems/zonefs.txt
 
 ZPOOL COMPRESSED PAGE STORAGE API
 M:	Dan Streetman <ddstreet@ieee.org>
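The sizes shown in the example output of zonefs.txt are consistent with one another, which can be checked with a little arithmetic. The Python sketch below uses only numbers printed in that example; the variable names are illustrative, and the 1 KiB unit assumed for the "total" lines of ls -l is the usual GNU ls default rather than something the patch states.

```python
# Values copied from the example output in zonefs.txt.
ZONE_SIZE = 268435456          # truncate -s value, i.e. one 256 MB zone
STAT_BLOCKS = 524288           # "Blocks" reported by stat, in 512 B units
CNV_FILE_SIZE = 140391743488   # size of the aggregated file cnv/0
SEQ_FILES = 55356              # number of files in seq/

# stat() blocks * 512 B gives the zone size.
assert STAT_BLOCKS * 512 == ZONE_SIZE

# The aggregated conventional file covers a whole number of zones.
cnv_zones, remainder = divmod(CNV_FILE_SIZE, ZONE_SIZE)
assert remainder == 0

# "total" lines of ls -l, assuming 1 KiB units (assumption, see above).
cnv_total_kib = CNV_FILE_SIZE // 1024
seq_total_kib = SEQ_FILES * ZONE_SIZE // 1024

# Space covered by all zones that are exposed as files.
capacity = (cnv_zones + SEQ_FILES) * ZONE_SIZE
print(cnv_zones, cnv_total_kib, seq_total_kib, capacity)
```

This prints 523 137101312 14511243264 14999904845824: the aggregated file spans 523 conventional zones, both totals match the ls output, and the zones exposed as files add up to roughly 15 TB.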
Add the new file Documentation/filesystems/zonefs.txt to document zonefs
principles and user-space tool usage.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
---
 Documentation/filesystems/zonefs.txt | 150 +++++++++++++++++++++++++++
 MAINTAINERS                          |   1 +
 2 files changed, 151 insertions(+)
 create mode 100644 Documentation/filesystems/zonefs.txt
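The truncation rule for sequential zone files described in the new documentation (item 4 of the file characteristics) can be summarised as a small decision function. This is a sketch for illustration only: seq_truncate_action is a hypothetical name, and the exception raised for other sizes is an assumption, since the text does not name the error.

```python
def seq_truncate_action(new_size, zone_size):
    """Return the zone operation implied by truncating a sequential zone file."""
    if new_size == 0:
        # Truncate to 0: zone reset, write pointer rewound to the zone start.
        return "reset"
    if new_size == zone_size:
        # Truncate to the zone size: zone finish, zone goes to the FULL state.
        return "finish"
    # Any other size is not allowed; the exception type is an assumption.
    raise ValueError("only 0 or the zone size are valid truncate sizes")
```

With the 256 MB zones of the example, seq_truncate_action(268435456, 268435456) returns "finish" and seq_truncate_action(0, 268435456) returns "reset", corresponding to the two truncate commands shown in the documentation.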