diff --git a/MAINTAINERS b/MAINTAINERS
@@ -17793,6 +17793,16 @@ L: linux-kernel@vger.kernel.org
S: Maintained
F: arch/x86/kernel/cpu/zhaoxin.c
+ZONEFS FILESYSTEM
+M: Damien Le Moal <damien.lemoal@wdc.com>
+M: Naohiro Aota <naohiro.aota@wdc.com>
+R: Johannes Thumshirn <jth@kernel.org>
+L: linux-fsdevel@vger.kernel.org
+T: git https://github.com/damien-lemoal/zonefs.git
+S: Maintained
+F: Documentation/filesystems/zonefs.txt
+F: fs/zonefs/
+
ZPOOL COMPRESSED PAGE STORAGE API
M: Dan Streetman <ddstreet@ieee.org>
L: linux-mm@kvack.org
diff --git a/fs/Kconfig b/fs/Kconfig
@@ -40,6 +40,7 @@ source "fs/ocfs2/Kconfig"
source "fs/btrfs/Kconfig"
source "fs/nilfs2/Kconfig"
source "fs/f2fs/Kconfig"
+source "fs/zonefs/Kconfig"
config FS_DAX
bool "Direct Access (DAX) support"
diff --git a/fs/Makefile b/fs/Makefile
@@ -130,3 +130,4 @@ obj-$(CONFIG_F2FS_FS) += f2fs/
obj-$(CONFIG_CEPH_FS) += ceph/
obj-$(CONFIG_PSTORE) += pstore/
obj-$(CONFIG_EFIVAR_FS) += efivarfs/
+obj-$(CONFIG_ZONEFS_FS) += zonefs/
diff --git a/fs/zonefs/Kconfig b/fs/zonefs/Kconfig
new file mode 100644
@@ -0,0 +1,9 @@
+config ZONEFS_FS
+ tristate "zonefs filesystem support"
+ depends on BLOCK
+ depends on BLK_DEV_ZONED
+ help
+	  zonefs is a simple file system which exposes zones of a zoned block
+ device as files.
+
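+	  To compile this file system support as a module, choose M here:
+	  the module will be called zonefs.
+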
+ If unsure, say N.
diff --git a/fs/zonefs/Makefile b/fs/zonefs/Makefile
new file mode 100644
@@ -0,0 +1,4 @@
+# SPDX-License-Identifier: GPL-2.0
+obj-$(CONFIG_ZONEFS_FS) += zonefs.o
+
+zonefs-y := super.o
diff --git a/fs/zonefs/super.c b/fs/zonefs/super.c
new file mode 100644
@@ -0,0 +1,1097 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Simple zone file system for zoned block devices.
+ *
+ * Copyright (C) 2019 Western Digital Corporation or its affiliates.
+ */
+#include <linux/module.h>
+#include <linux/fs.h>
+#include <linux/magic.h>
+#include <linux/iomap.h>
+#include <linux/init.h>
+#include <linux/slab.h>
+#include <linux/blkdev.h>
+#include <linux/statfs.h>
+#include <linux/writeback.h>
+#include <linux/quotaops.h>
+#include <linux/seq_file.h>
+#include <linux/parser.h>
+#include <linux/uio.h>
+#include <linux/mman.h>
+#include <linux/sched/mm.h>
+#include <linux/crc32.h>
+
+#include "zonefs.h"
+
+static int zonefs_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
+ unsigned int flags, struct iomap *iomap)
+{
+ struct zonefs_sb_info *sbi = ZONEFS_SB(inode->i_sb);
+ struct zonefs_inode_info *zi = ZONEFS_I(inode);
+ loff_t max_isize = zonefs_file_max_size(inode);
+ loff_t isize;
+
+ /*
+ * For sequential zones, enforce direct IO writes. This is already
+	 * checked when writes are issued, so warn here if we get a
+	 * buffered write to a sequential file inode.
+ */
+ if (WARN_ON_ONCE(zonefs_file_is_seq(inode) && (flags & IOMAP_WRITE) &&
+ !(flags & IOMAP_DIRECT)))
+ return -EIO;
+
+ /* An IO cannot exceed the zone size */
+ if (offset >= max_isize)
+ return -EFBIG;
+
+	/*
+	 * All blocks are always mapped: data up to the inode size is
+	 * reported as mapped and written, and data beyond it, up to the
+	 * maximum file size (the zone size), as unwritten.
+	 */
+ mutex_lock(&zi->i_truncate_mutex);
+ isize = i_size_read(inode);
+ if (offset >= isize) {
+ length = min(length, max_isize - offset);
+ iomap->type = IOMAP_UNWRITTEN;
+ } else {
+ length = min(length, isize - offset);
+ iomap->type = IOMAP_MAPPED;
+ }
+ mutex_unlock(&zi->i_truncate_mutex);
+
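+	/*
+	 * Align the mapping to the file system block size, that is, to the
+	 * device physical sector size: round the start offset down and the
+	 * end offset up to block boundaries.
+	 */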
+ iomap->offset = offset & (~sbi->s_blocksize_mask);
+ iomap->length = ((offset + length + sbi->s_blocksize_mask) &
+ (~sbi->s_blocksize_mask)) - iomap->offset;
+ iomap->bdev = inode->i_sb->s_bdev;
+ iomap->addr = (zonefs_file_start_sector(inode) << SECTOR_SHIFT)
+ + iomap->offset;
+
+ return 0;
+}
+
+static const struct iomap_ops zonefs_iomap_ops = {
+ .iomap_begin = zonefs_iomap_begin,
+};
+
+static int zonefs_readpage(struct file *unused, struct page *page)
+{
+ return iomap_readpage(page, &zonefs_iomap_ops);
+}
+
+static int zonefs_readpages(struct file *unused, struct address_space *mapping,
+ struct list_head *pages, unsigned int nr_pages)
+{
+ return iomap_readpages(mapping, pages, nr_pages, &zonefs_iomap_ops);
+}
+
+static int zonefs_map_blocks(struct iomap_writepage_ctx *wpc,
+ struct inode *inode, loff_t offset)
+{
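+	/* Reuse the current mapping if it already covers this offset */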
+ if (offset >= wpc->iomap.offset &&
+ offset < wpc->iomap.offset + wpc->iomap.length)
+ return 0;
+
+ memset(&wpc->iomap, 0, sizeof(wpc->iomap));
+ return zonefs_iomap_begin(inode, offset, zonefs_file_max_size(inode),
+ 0, &wpc->iomap);
+}
+
+static const struct iomap_writeback_ops zonefs_writeback_ops = {
+ .map_blocks = zonefs_map_blocks,
+};
+
+static int zonefs_writepage(struct page *page, struct writeback_control *wbc)
+{
+ struct iomap_writepage_ctx wpc = { };
+
+ return iomap_writepage(page, wbc, &wpc, &zonefs_writeback_ops);
+}
+
+static int zonefs_writepages(struct address_space *mapping,
+ struct writeback_control *wbc)
+{
+ struct iomap_writepage_ctx wpc = { };
+
+ return iomap_writepages(mapping, wbc, &wpc, &zonefs_writeback_ops);
+}
+
+static const struct address_space_operations zonefs_file_aops = {
+ .readpage = zonefs_readpage,
+ .readpages = zonefs_readpages,
+ .writepage = zonefs_writepage,
+ .writepages = zonefs_writepages,
+ .set_page_dirty = iomap_set_page_dirty,
+ .releasepage = iomap_releasepage,
+ .invalidatepage = iomap_invalidatepage,
+ .migratepage = iomap_migrate_page,
+ .is_partially_uptodate = iomap_is_partially_uptodate,
+ .error_remove_page = generic_error_remove_page,
+ .direct_IO = noop_direct_IO,
+};
+
+static int zonefs_seq_file_truncate(struct inode *inode)
+{
+ struct zonefs_inode_info *zi = ZONEFS_I(inode);
+ int ret;
+
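+	/* Wait for any ongoing direct IO before resetting the zone */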
+ inode_dio_wait(inode);
+
+ /* Serialize against page faults */
+ down_write(&zi->i_mmap_sem);
+
+ /* Serialize against zonefs_iomap_begin() */
+ mutex_lock(&zi->i_truncate_mutex);
+
+ ret = blkdev_reset_zones(inode->i_sb->s_bdev,
+ zonefs_file_start_sector(inode),
+ zonefs_file_max_size(inode) >> SECTOR_SHIFT,
+ GFP_NOFS);
+ if (ret) {
+ zonefs_err(inode->i_sb,
+ "Reset zone at %llu failed %d",
+ zonefs_file_start_sector(inode),
+ ret);
+ } else {
+ truncate_setsize(inode, 0);
+ zi->i_wpoffset = 0;
+ }
+
+ mutex_unlock(&zi->i_truncate_mutex);
+ up_write(&zi->i_mmap_sem);
+
+ return ret;
+}
+
+static int zonefs_inode_setattr(struct dentry *dentry, struct iattr *iattr)
+{
+ struct inode *inode = d_inode(dentry);
+ int ret;
+
+ ret = setattr_prepare(dentry, iattr);
+ if (ret)
+ return ret;
+
+ if ((iattr->ia_valid & ATTR_UID &&
+ !uid_eq(iattr->ia_uid, inode->i_uid)) ||
+ (iattr->ia_valid & ATTR_GID &&
+ !gid_eq(iattr->ia_gid, inode->i_gid))) {
+ ret = dquot_transfer(inode, iattr);
+ if (ret)
+ return ret;
+ }
+
+ if (iattr->ia_valid & ATTR_SIZE) {
+ /* The size of conventional zone files cannot be changed */
+ if (zonefs_file_is_conv(inode))
+ return -EPERM;
+
+ /*
+ * For sequential zone files, we can only allow truncating to
+ * 0 size which is equivalent to a zone reset.
+ */
+ if (iattr->ia_size != 0)
+ return -EPERM;
+
+ ret = zonefs_seq_file_truncate(inode);
+ if (ret)
+ return ret;
+ }
+
+ setattr_copy(inode, iattr);
+
+ return 0;
+}
+
+static const struct inode_operations zonefs_file_inode_operations = {
+ .setattr = zonefs_inode_setattr,
+};
+
+static int zonefs_conv_file_write_and_wait(struct file *file, loff_t start,
+ loff_t end)
+{
+ int ret;
+
+ ret = file_write_and_wait_range(file, start, end);
+ if (ret)
+ return ret;
+
+ return file_check_and_advance_wb_err(file);
+}
+
+static int zonefs_file_fsync(struct file *file, loff_t start, loff_t end,
+ int datasync)
+{
+ struct inode *inode = file_inode(file);
+ int ret = 0;
+
+ /*
+ * Since only direct writes are allowed in sequential files, page cache
+ * flush is needed only for conventional zone files.
+ */
+ if (zonefs_file_is_conv(inode))
+ ret = zonefs_conv_file_write_and_wait(file, start, end);
+
+ if (ret == 0)
+ ret = blkdev_issue_flush(inode->i_sb->s_bdev, GFP_KERNEL, NULL);
+
+ return ret;
+}
+
+static vm_fault_t zonefs_filemap_fault(struct vm_fault *vmf)
+{
+ struct zonefs_inode_info *zi = ZONEFS_I(file_inode(vmf->vma->vm_file));
+ vm_fault_t ret;
+
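+	/* Serialize against truncates */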
+ down_read(&zi->i_mmap_sem);
+ ret = filemap_fault(vmf);
+ up_read(&zi->i_mmap_sem);
+
+ return ret;
+}
+
+static vm_fault_t zonefs_filemap_page_mkwrite(struct vm_fault *vmf)
+{
+ struct inode *inode = file_inode(vmf->vma->vm_file);
+ struct zonefs_inode_info *zi = ZONEFS_I(inode);
+ vm_fault_t ret;
+
+ sb_start_pagefault(inode->i_sb);
+ file_update_time(vmf->vma->vm_file);
+
+ /* Serialize against truncates */
+ down_read(&zi->i_mmap_sem);
+ ret = iomap_page_mkwrite(vmf, &zonefs_iomap_ops);
+ up_read(&zi->i_mmap_sem);
+
+ sb_end_pagefault(inode->i_sb);
+ return ret;
+}
+
+static const struct vm_operations_struct zonefs_file_vm_ops = {
+ .fault = zonefs_filemap_fault,
+ .map_pages = filemap_map_pages,
+ .page_mkwrite = zonefs_filemap_page_mkwrite,
+};
+
+static int zonefs_file_mmap(struct file *file, struct vm_area_struct *vma)
+{
+ /*
+ * Conventional zone files can be mmap-ed READ/WRITE.
+ * For sequential zone files, only readonly mappings are possible.
+ */
+ if (zonefs_file_is_seq(file_inode(file)) &&
+ (vma->vm_flags & VM_SHARED) && (vma->vm_flags & VM_MAYWRITE))
+ return -EINVAL;
+
+ file_accessed(file);
+ vma->vm_ops = &zonefs_file_vm_ops;
+
+ return 0;
+}
+
+static loff_t zonefs_file_llseek(struct file *file, loff_t offset, int whence)
+{
+ loff_t isize = i_size_read(file_inode(file));
+
+ /*
+ * Seeks are limited to below the zone size for conventional zones
+ * and below the zone write pointer for sequential zones. In both
+ * cases, this limit is the inode size.
+ */
+ return generic_file_llseek_size(file, offset, whence, isize, isize);
+}
+
+static ssize_t zonefs_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
+{
+ struct inode *inode = file_inode(iocb->ki_filp);
+ struct zonefs_inode_info *zi = ZONEFS_I(inode);
+ struct zonefs_sb_info *sbi = ZONEFS_SB(inode->i_sb);
+ loff_t max_pos;
+ size_t count;
+ ssize_t ret;
+
+ if (iocb->ki_pos >= zonefs_file_max_size(inode))
+ return 0;
+
+ if (iocb->ki_flags & IOCB_NOWAIT) {
+ if (!inode_trylock_shared(inode))
+ return -EAGAIN;
+ } else {
+ inode_lock_shared(inode);
+ }
+
+ mutex_lock(&zi->i_truncate_mutex);
+
+ /*
+ * Limit read operations to written data.
+ */
+ max_pos = i_size_read(inode);
+ if (iocb->ki_pos >= max_pos) {
+ mutex_unlock(&zi->i_truncate_mutex);
+ ret = 0;
+ goto out;
+ }
+
+ iov_iter_truncate(to, max_pos - iocb->ki_pos);
+
+ mutex_unlock(&zi->i_truncate_mutex);
+
+ count = iov_iter_count(to);
+
+ if (iocb->ki_flags & IOCB_DIRECT) {
+ /*
+		 * Direct IO reads must be aligned to the device physical
+		 * sector size.
+ */
+ if ((iocb->ki_pos | count) & sbi->s_blocksize_mask) {
+ ret = -EINVAL;
+ } else {
+ file_accessed(iocb->ki_filp);
+ ret = iomap_dio_rw(iocb, to, &zonefs_iomap_ops, NULL);
+ }
+ } else {
+ ret = generic_file_read_iter(iocb, to);
+ }
+
+out:
+ inode_unlock_shared(inode);
+
+ return ret;
+}
+
+/*
+ * When a write error occurs in a sequential zone, the zone write pointer
+ * position must be refreshed to correct the file size and zonefs inode
+ * write pointer offset.
+ */
+static int zonefs_seq_file_write_failed(struct inode *inode, int error)
+{
+ struct super_block *sb = inode->i_sb;
+ struct zonefs_inode_info *zi = ZONEFS_I(inode);
+ sector_t sector = zi->i_zsector;
+ unsigned int nofs_flag;
+ struct blk_zone zone;
+ int n = 1, ret;
+ loff_t pos;
+
+ zonefs_warn(sb, "Updating inode zone %llu info\n", sector);
+
+ /*
+	 * blkdev_report_zones() uses GFP_KERNEL by default. Force execution
+	 * as if GFP_NOFS was specified to avoid ending up recursing into
+	 * the FS on memory allocation.
+ */
+ nofs_flag = memalloc_nofs_save();
+ ret = blkdev_report_zones(sb->s_bdev, sector, &zone, &n);
+ memalloc_nofs_restore(nofs_flag);
+
+ if (ret || !n) {
+ if (!n)
+ ret = -EIO;
+ zonefs_err(sb, "Get zone %llu report failed %d\n",
+ sector, ret);
+ return ret;
+ }
+
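+	/*
+	 * The amount of data written in the zone is given by the zone
+	 * write pointer position: use it as the new file size.
+	 */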
+ pos = (zone.wp - zone.start) << SECTOR_SHIFT;
+ zi->i_wpoffset = pos;
+ if (i_size_read(inode) != pos)
+ i_size_write(inode, pos);
+
+ return error;
+}
+
+static int zonefs_file_dio_write_end(struct kiocb *iocb, ssize_t size, int ret,
+ unsigned int flags)
+{
+ struct inode *inode = file_inode(iocb->ki_filp);
+ struct zonefs_inode_info *zi = ZONEFS_I(inode);
+
+	/*
+	 * Conventional zone file size is fixed to the zone size so there
+	 * is no need to do anything.
+	 */
+	if (zonefs_file_is_conv(inode))
+		return ret;
+
+	mutex_lock(&zi->i_truncate_mutex);
+
+	if (ret) {
+		/*
+		 * Errors are passed through the error argument of the
+		 * end_io callback (size is the amount of submitted IO):
+		 * on failure, refresh the zone write pointer position to
+		 * fix up the file size and zonefs write pointer offset.
+		 */
+		ret = zonefs_seq_file_write_failed(inode, ret);
+	} else {
+		/* Update the sequential file size */
+		if (i_size_read(inode) < iocb->ki_pos + size)
+			i_size_write(inode, iocb->ki_pos + size);
+	}
+
+ mutex_unlock(&zi->i_truncate_mutex);
+
+ return ret;
+}
+
+static const struct iomap_dio_ops zonefs_dio_ops = {
+ .end_io = zonefs_file_dio_write_end,
+};
+
+static ssize_t zonefs_file_dio_write(struct kiocb *iocb, struct iov_iter *from)
+{
+ struct inode *inode = file_inode(iocb->ki_filp);
+ struct zonefs_inode_info *zi = ZONEFS_I(inode);
+ struct zonefs_sb_info *sbi = ZONEFS_SB(inode->i_sb);
+ size_t count;
+ ssize_t ret;
+
+ if (iocb->ki_flags & IOCB_NOWAIT) {
+ if (!inode_trylock(inode))
+ return -EAGAIN;
+ } else {
+ inode_lock(inode);
+ }
+
+ ret = generic_write_checks(iocb, from);
+ if (ret <= 0)
+ goto out;
+
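+	/* Do not write beyond the maximum file size (the zone size) */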
+ iov_iter_truncate(from, zonefs_file_max_size(inode) - iocb->ki_pos);
+ count = iov_iter_count(from);
+
+ /*
+ * Direct writes must be aligned to the block size, that is, the device
+ * physical sector size, to avoid errors when writing sequential zones
+ * on 512e devices (512B logical sector, 4KB physical sectors).
+ */
+ if ((iocb->ki_pos | count) & sbi->s_blocksize_mask) {
+ ret = -EINVAL;
+ goto out;
+ }
+
+ /*
+ * Enforce sequential writes (append only) in sequential zones.
+ */
+ mutex_lock(&zi->i_truncate_mutex);
+ if (zonefs_file_is_seq(inode) &&
+ iocb->ki_pos != zi->i_wpoffset) {
+ zonefs_err(inode->i_sb,
+ "Unaligned write at %llu + %zu (wp %llu)\n",
+ iocb->ki_pos, count,
+ zi->i_wpoffset);
+ mutex_unlock(&zi->i_truncate_mutex);
+ ret = -EINVAL;
+ goto out;
+ }
+ mutex_unlock(&zi->i_truncate_mutex);
+
+ ret = iomap_dio_rw(iocb, from, &zonefs_iomap_ops, &zonefs_dio_ops);
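+	/*
+	 * For sequential files, advance the write pointer offset by the
+	 * amount written, or by the full count for AIOs still in flight
+	 * (-EIOCBQUEUED), to keep subsequent writes append-aligned.
+	 */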
+ if (zonefs_file_is_seq(inode) &&
+ (ret > 0 || ret == -EIOCBQUEUED)) {
+ if (ret > 0)
+ count = ret;
+ mutex_lock(&zi->i_truncate_mutex);
+ zi->i_wpoffset += count;
+ mutex_unlock(&zi->i_truncate_mutex);
+ }
+
+out:
+ inode_unlock(inode);
+
+ return ret;
+}
+
+static ssize_t zonefs_file_buffered_write(struct kiocb *iocb,
+ struct iov_iter *from)
+{
+ struct inode *inode = file_inode(iocb->ki_filp);
+ size_t count;
+ ssize_t ret;
+
+ /*
+ * Direct IO writes are mandatory for sequential zones so that the
+ * write IO order is preserved.
+ */
+ if (zonefs_file_is_seq(inode))
+ return -EIO;
+
+ if (iocb->ki_flags & IOCB_NOWAIT) {
+ if (!inode_trylock(inode))
+ return -EAGAIN;
+ } else {
+ inode_lock(inode);
+ }
+
+ ret = generic_write_checks(iocb, from);
+ if (ret <= 0)
+ goto out;
+
+ iov_iter_truncate(from, zonefs_file_max_size(inode) - iocb->ki_pos);
+ count = iov_iter_count(from);
+
+ ret = iomap_file_buffered_write(iocb, from, &zonefs_iomap_ops);
+ if (ret > 0)
+ iocb->ki_pos += ret;
+
+out:
+ inode_unlock(inode);
+
+ if (ret > 0)
+ ret = generic_write_sync(iocb, ret);
+
+ return ret;
+}
+
+static ssize_t zonefs_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
+{
+ struct inode *inode = file_inode(iocb->ki_filp);
+
+ /*
+ * Check that the write operation does not go beyond the zone size.
+ */
+ if (iocb->ki_pos >= zonefs_file_max_size(inode))
+ return -EFBIG;
+
+ if (iocb->ki_flags & IOCB_DIRECT)
+ return zonefs_file_dio_write(iocb, from);
+
+ return zonefs_file_buffered_write(iocb, from);
+}
+
+static const struct file_operations zonefs_file_operations = {
+ .open = generic_file_open,
+ .fsync = zonefs_file_fsync,
+ .mmap = zonefs_file_mmap,
+ .llseek = zonefs_file_llseek,
+ .read_iter = zonefs_file_read_iter,
+ .write_iter = zonefs_file_write_iter,
+ .splice_read = generic_file_splice_read,
+ .splice_write = iter_file_splice_write,
+ .iopoll = iomap_dio_iopoll,
+};
+
+static struct kmem_cache *zonefs_inode_cachep;
+
+static struct inode *zonefs_alloc_inode(struct super_block *sb)
+{
+ struct zonefs_inode_info *zi;
+
+ zi = kmem_cache_alloc(zonefs_inode_cachep, GFP_KERNEL);
+ if (!zi)
+ return NULL;
+
+ mutex_init(&zi->i_truncate_mutex);
+ init_rwsem(&zi->i_mmap_sem);
+ inode_init_once(&zi->i_vnode);
+
+ return &zi->i_vnode;
+}
+
+static void zonefs_free_inode(struct inode *inode)
+{
+ kmem_cache_free(zonefs_inode_cachep, ZONEFS_I(inode));
+}
+
+static struct dentry *zonefs_create_inode(struct dentry *parent,
+ const char *name,
+ struct blk_zone *zone)
+{
+ struct zonefs_sb_info *sbi = ZONEFS_SB(parent->d_sb);
+ struct inode *dir = d_inode(parent);
+ struct dentry *dentry;
+ struct inode *inode;
+
+ dentry = d_alloc_name(parent, name);
+ if (!dentry)
+ return NULL;
+
+ inode = new_inode(parent->d_sb);
+ if (!inode)
+ goto out_dput;
+
+ inode->i_ino = get_next_ino();
+ if (zone) {
+ struct zonefs_inode_info *zi = ZONEFS_I(inode);
+
+ /*
+ * Zone file: for read-only zones, do not allow writes.
+ * For offline zones, disable all accesses and set the file
+ * size to 0.
+ */
+ inode->i_mode = S_IFREG;
+ switch (zone->cond) {
+ case BLK_ZONE_COND_READONLY:
+			inode->i_mode |= sbi->s_perm & ~(0222); /* S_IWUGO */
+			/* fall through */
+		case BLK_ZONE_COND_OFFLINE:
+ break;
+ default:
+ inode->i_mode |= sbi->s_perm;
+ }
+ inode->i_uid = sbi->s_uid;
+ inode->i_gid = sbi->s_gid;
+ zi->i_ztype = zonefs_zone_type(zone);
+ zi->i_zsector = zone->start;
+ zi->i_max_size = zone->len << SECTOR_SHIFT;
+
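+		/*
+		 * The write pointer offset, and thus the file size, is 0
+		 * for offline zones, the zone size for conventional zones
+		 * and the current write pointer position for sequential
+		 * zones.
+		 */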
+ if (zone->cond == BLK_ZONE_COND_OFFLINE)
+ zi->i_wpoffset = 0;
+ else if (zonefs_file_is_conv(inode))
+ zi->i_wpoffset = zi->i_max_size;
+ else
+ zi->i_wpoffset =
+ (zone->wp - zone->start) << SECTOR_SHIFT;
+
+ inode->i_size = zi->i_wpoffset;
+ inode->i_blocks = zone->len;
+ inode->i_fop = &zonefs_file_operations;
+ inode->i_op = &zonefs_file_inode_operations;
+ inode->i_mapping->a_ops = &zonefs_file_aops;
+ } else {
+ /* Zone group directory */
+ inode_init_owner(inode, dir, S_IFDIR | 0555);
+ inode->i_fop = &simple_dir_operations;
+ inode->i_op = &simple_dir_inode_operations;
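+		/*
+		 * The directory link count accounts for "." and the entry
+		 * in the parent directory; "..", in turn, adds a link to
+		 * the parent.
+		 */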
+ set_nlink(inode, 2);
+ inc_nlink(dir);
+ }
+ inode->i_ctime = inode->i_mtime = inode->i_atime = dir->i_ctime;
+
+ d_add(dentry, inode);
+ d_inode(parent)->i_size += sizeof(struct dentry);
+
+ return dentry;
+
+out_dput:
+ dput(dentry);
+ return NULL;
+}
+
+/*
+ * File system stat.
+ */
+static int zonefs_statfs(struct dentry *dentry, struct kstatfs *buf)
+{
+ struct super_block *sb = dentry->d_sb;
+ struct zonefs_sb_info *sbi = ZONEFS_SB(sb);
+ sector_t nr_sectors = sb->s_bdev->bd_part->nr_sects;
+ enum zonefs_ztype t;
+ u64 fsid;
+
+ buf->f_type = ZONEFS_MAGIC;
+ buf->f_bsize = dentry->d_sb->s_blocksize;
+ buf->f_namelen = ZONEFS_NAME_MAX;
+
+ buf->f_blocks = nr_sectors >> (sb->s_blocksize_bits - SECTOR_SHIFT);
+ buf->f_bfree = 0;
+ buf->f_bavail = 0;
+
+ buf->f_files = blkdev_nr_zones(sb->s_bdev);
+ for (t = 0 ; t < ZONEFS_ZTYPE_MAX; t++) {
+ if (sbi->s_nr_zones[t])
+ buf->f_files++;
+ }
+ buf->f_ffree = 0;
+
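+	/* Fold the 128-bit volume UUID into a 64-bit fsid */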
+ fsid = le64_to_cpup((void *)sbi->s_uuid.b) ^
+ le64_to_cpup((void *)sbi->s_uuid.b + sizeof(u64));
+ buf->f_fsid.val[0] = (u32)fsid;
+ buf->f_fsid.val[1] = (u32)(fsid >> 32);
+
+ return 0;
+}
+
+static const struct super_operations zonefs_sops = {
+ .alloc_inode = zonefs_alloc_inode,
+ .free_inode = zonefs_free_inode,
+ .statfs = zonefs_statfs,
+};
+
+static const char *zgroups_name[ZONEFS_ZTYPE_MAX] = {
+ "cnv",
+ "seq"
+};
+
+/*
+ * Create a zone group and populate it with zone files.
+ */
+static int zonefs_create_zgroup(struct super_block *sb, struct blk_zone *zones,
+ enum zonefs_ztype type)
+{
+ struct zonefs_sb_info *sbi = ZONEFS_SB(sb);
+ struct blk_zone *zone, *next, *end;
+ char name[ZONEFS_NAME_MAX];
+ unsigned int nr_files = 0;
+ struct dentry *dir;
+
+ /* If the group is empty, nothing to do */
+ if (!sbi->s_nr_zones[type])
+ return 0;
+
+ dir = zonefs_create_inode(sb->s_root, zgroups_name[type], NULL);
+ if (!dir)
+ return -ENOMEM;
+
+ /*
+ * Note: The first zone contains the super block: skip it.
+ */
+ end = zones + blkdev_nr_zones(sb->s_bdev);
+ for (zone = &zones[1]; zone < end; zone = next) {
+
+ next = zone + 1;
+ if (zonefs_zone_type(zone) != type)
+ continue;
+
+ /*
+ * For conventional zones, contiguous zones can be aggregated
+ * together to form larger files.
+ * Note that this overwrites the length of the first zone of
+ * the set of contiguous zones aggregated together.
+	 * Only zones with the same condition can be aggregated so that
+	 * offline zones are excluded and readonly zones are aggregated
+	 * together into a read-only file.
+ */
+ if (type == ZONEFS_ZTYPE_CNV &&
+ zonefs_has_feature(sbi, ZONEFS_F_AGRCNV)) {
+ for (; next < end; next++) {
+ if (zonefs_zone_type(next) != type ||
+ next->cond != zone->cond)
+ break;
+ zone->len += next->len;
+ }
+ }
+
+ if (zonefs_has_feature(sbi, ZONEFS_F_STARTSECT_NAME))
+			/* Use the zone start sector as the file name */
+ snprintf(name, ZONEFS_NAME_MAX - 1, "%llu",
+ zone->start);
+ else
+			/* Use the file number as the file name */
+ snprintf(name, ZONEFS_NAME_MAX - 1, "%u", nr_files);
+ nr_files++;
+
+ if (!zonefs_create_inode(dir, name, zone))
+ return -ENOMEM;
+ }
+
+ zonefs_info(sb, "Zone group %d (%s), %u zones -> %u file%s\n",
+ type, zgroups_name[type], sbi->s_nr_zones[type],
+ nr_files, nr_files > 1 ? "s" : "");
+
+ return 0;
+}
+
+static struct blk_zone *zonefs_get_zone_info(struct super_block *sb)
+{
+ struct zonefs_sb_info *sbi = ZONEFS_SB(sb);
+ struct block_device *bdev = sb->s_bdev;
+ sector_t nr_sectors = bdev->bd_part->nr_sects;
+ unsigned int i, n, nr_zones = 0;
+ struct blk_zone *zones, *zone;
+ sector_t sector = 0;
+ int ret;
+
+ zones = kvcalloc(blkdev_nr_zones(bdev),
+ sizeof(struct blk_zone), GFP_KERNEL);
+ if (!zones)
+ return ERR_PTR(-ENOMEM);
+
+ /* Get zones information */
+ zone = zones;
+ while (nr_zones < blkdev_nr_zones(bdev) &&
+ sector < nr_sectors) {
+
+ n = blkdev_nr_zones(bdev) - nr_zones;
+ ret = blkdev_report_zones(bdev, sector, zone, &n);
+ if (ret) {
+ zonefs_err(sb, "Zone report failed %d\n", ret);
+ goto err;
+ }
+ if (!n) {
+ zonefs_err(sb, "No zones reported\n");
+ ret = -EIO;
+ goto err;
+ }
+
+ for (i = 0; i < n; i++) {
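+			/*
+			 * The first zone contains the super block and is
+			 * not exposed as a file: do not count it in its
+			 * zone group.
+			 */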
+ switch (zone->type) {
+ case BLK_ZONE_TYPE_CONVENTIONAL:
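+				/*
+				 * Conventional zones do not have a write
+				 * pointer: consider them fully written.
+				 */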
+ zone->wp = zone->start + zone->len;
+ if (zone > zones)
+ sbi->s_nr_zones[ZONEFS_ZTYPE_CNV]++;
+ break;
+ case BLK_ZONE_TYPE_SEQWRITE_REQ:
+ case BLK_ZONE_TYPE_SEQWRITE_PREF:
+ if (zone > zones)
+ sbi->s_nr_zones[ZONEFS_ZTYPE_SEQ]++;
+ break;
+ default:
+ zonefs_err(sb, "Unsupported zone type 0x%x\n",
+ zone->type);
+ ret = -EIO;
+ goto err;
+ }
+ sector += zone->len;
+ zone++;
+ }
+
+ nr_zones += n;
+ }
+
+ if (sector < nr_sectors ||
+ nr_zones != blkdev_nr_zones(bdev)) {
+ zonefs_err(sb, "Invalid zone report\n");
+ ret = -EIO;
+ goto err;
+ }
+
+ return zones;
+
+err:
+ kvfree(zones);
+ return ERR_PTR(ret);
+}
+
+/*
+ * Read super block information from the device.
+ */
+static int zonefs_read_super(struct super_block *sb)
+{
+ struct zonefs_sb_info *sbi = ZONEFS_SB(sb);
+ struct zonefs_super *super;
+ struct bio bio;
+ struct bio_vec bio_vec;
+ struct page *page;
+ u32 crc, stored_crc;
+ int ret;
+
+ page = alloc_page(GFP_KERNEL);
+ if (!page)
+ return -ENOMEM;
+
+ bio_init(&bio, &bio_vec, 1);
+ bio.bi_iter.bi_sector = 0;
+ bio_set_dev(&bio, sb->s_bdev);
+ bio_set_op_attrs(&bio, REQ_OP_READ, 0);
+ bio_add_page(&bio, page, PAGE_SIZE, 0);
+
+ ret = submit_bio_wait(&bio);
+ if (ret)
+ goto out;
+
+ super = page_address(page);
+
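+	/*
+	 * Check the magic number first, then verify the checksum: the CRC
+	 * is computed over the whole super block with the s_crc field set
+	 * to zero, seeded with ZONEFS_MAGIC.
+	 */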
+	ret = -EINVAL;
+	if (le32_to_cpu(super->s_magic) != ZONEFS_MAGIC)
+		goto out;
+
+	stored_crc = le32_to_cpu(super->s_crc);
+	super->s_crc = 0;
+	crc = crc32_le(ZONEFS_MAGIC, (unsigned char *)super,
+		       sizeof(struct zonefs_super));
+	if (crc != stored_crc) {
+		zonefs_err(sb, "Invalid checksum (Expected 0x%08x, got 0x%08x)\n",
+			   crc, stored_crc);
+		ret = -EIO;
+		goto out;
+	}
+
+ sbi->s_features = le64_to_cpu(super->s_features);
+ if (sbi->s_features & ~ZONEFS_F_DEFINED_FEATURES) {
+ zonefs_err(sb, "Unknown features set 0x%llx\n",
+ sbi->s_features);
+ goto out;
+ }
+
+ if (zonefs_has_feature(sbi, ZONEFS_F_UID)) {
+ sbi->s_uid = make_kuid(current_user_ns(),
+ le32_to_cpu(super->s_uid));
+ if (!uid_valid(sbi->s_uid)) {
+ zonefs_err(sb, "Invalid UID feature\n");
+ goto out;
+ }
+ }
+ if (zonefs_has_feature(sbi, ZONEFS_F_GID)) {
+ sbi->s_gid = make_kgid(current_user_ns(),
+ le32_to_cpu(super->s_gid));
+ if (!gid_valid(sbi->s_gid)) {
+ zonefs_err(sb, "Invalid GID feature\n");
+ goto out;
+ }
+ }
+
+ if (zonefs_has_feature(sbi, ZONEFS_F_PERM))
+ sbi->s_perm = le32_to_cpu(super->s_perm);
+
+ if (memchr_inv(super->s_reserved, 0, sizeof(super->s_reserved))) {
+ zonefs_err(sb, "Reserved area is being used\n");
+ goto out;
+ }
+
+ uuid_copy(&sbi->s_uuid, &super->s_uuid);
+ ret = 0;
+
+out:
+ __free_page(page);
+
+ return ret;
+}
+
+/*
+ * Check that the device is zoned. If it is, get the list of zones and create
+ * sub-directories and files according to the device zone configuration.
+ */
+static int zonefs_fill_super(struct super_block *sb, void *data, int silent)
+{
+ struct zonefs_sb_info *sbi;
+ struct blk_zone *zones;
+ struct inode *inode;
+ enum zonefs_ztype t;
+ int ret;
+
+ /* Check device type */
+ if (!bdev_is_zoned(sb->s_bdev)) {
+ zonefs_err(sb, "Not a zoned block device\n");
+ return -EINVAL;
+ }
+
+ /* Initialize super block information */
+ sbi = kzalloc(sizeof(*sbi), GFP_KERNEL);
+ if (!sbi)
+ return -ENOMEM;
+
+ sb->s_fs_info = sbi;
+ sb->s_magic = ZONEFS_MAGIC;
+ sb->s_maxbytes = MAX_LFS_FILESIZE;
+ sb->s_op = &zonefs_sops;
+ sb->s_time_gran = 1;
+
+ /*
+ * The block size is always equal to the device physical sector size to
+ * ensure that writes on 512e disks (512B logical block and 4KB
+ * physical block) are always aligned.
+ */
+ sb_set_blocksize(sb, bdev_physical_block_size(sb->s_bdev));
+ sbi->s_blocksize_mask = sb->s_blocksize - 1;
+
+ sbi->s_uid = GLOBAL_ROOT_UID;
+ sbi->s_gid = GLOBAL_ROOT_GID;
+ sbi->s_perm = 0640; /* S_IRUSR | S_IWUSR | S_IRGRP */
+
+ ret = zonefs_read_super(sb);
+ if (ret)
+ return ret;
+
+ zones = zonefs_get_zone_info(sb);
+ if (IS_ERR(zones))
+ return PTR_ERR(zones);
+
+	pr_info("zonefs: Mounting %s, %u zones\n",
+		sb->s_id, blkdev_nr_zones(sb->s_bdev));
+
+ /* Create root directory inode */
+ ret = -ENOMEM;
+ inode = new_inode(sb);
+ if (!inode)
+ goto out;
+
+ inode->i_ino = get_next_ino();
+ inode->i_mode = S_IFDIR | 0755;
+ inode->i_ctime = inode->i_mtime = inode->i_atime = current_time(inode);
+ inode->i_op = &simple_dir_inode_operations;
+ inode->i_fop = &simple_dir_operations;
+ inode->i_size = sizeof(struct dentry) * 2;
+ set_nlink(inode, 2);
+
+ sb->s_root = d_make_root(inode);
+ if (!sb->s_root)
+ goto out;
+
+ /* Create and populate zone groups */
+ for (t = ZONEFS_ZTYPE_CNV; t < ZONEFS_ZTYPE_MAX; t++) {
+ ret = zonefs_create_zgroup(sb, zones, t);
+ if (ret)
+ break;
+ }
+
+out:
+ kvfree(zones);
+
+ return ret;
+}
+
+static struct dentry *zonefs_mount(struct file_system_type *fs_type,
+ int flags, const char *dev_name, void *data)
+{
+ return mount_bdev(fs_type, flags, dev_name, data, zonefs_fill_super);
+}
+
+static void zonefs_kill_super(struct super_block *sb)
+{
+	struct zonefs_sb_info *sbi = ZONEFS_SB(sb);
+
+	if (sb->s_root)
+		d_genocide(sb->s_root);
+	kill_block_super(sb);
+	kfree(sbi);
+}
+
+/*
+ * File system definition and registration.
+ */
+static struct file_system_type zonefs_type = {
+ .owner = THIS_MODULE,
+ .name = "zonefs",
+ .mount = zonefs_mount,
+ .kill_sb = zonefs_kill_super,
+ .fs_flags = FS_REQUIRES_DEV,
+};
+
+static int __init zonefs_init_inodecache(void)
+{
+ zonefs_inode_cachep = kmem_cache_create("zonefs_inode_cache",
+ sizeof(struct zonefs_inode_info), 0,
+ (SLAB_RECLAIM_ACCOUNT | SLAB_MEM_SPREAD | SLAB_ACCOUNT),
+ NULL);
+ if (zonefs_inode_cachep == NULL)
+ return -ENOMEM;
+ return 0;
+}
+
+static void zonefs_destroy_inodecache(void)
+{
+ /*
+ * Make sure all delayed rcu free inodes are flushed before we
+ * destroy the inode cache.
+ */
+ rcu_barrier();
+ kmem_cache_destroy(zonefs_inode_cachep);
+}
+
+static int __init zonefs_init(void)
+{
+ int ret;
+
+ BUILD_BUG_ON(sizeof(struct zonefs_super) != ZONEFS_SUPER_SIZE);
+
+ ret = zonefs_init_inodecache();
+ if (ret)
+ return ret;
+
+ ret = register_filesystem(&zonefs_type);
+ if (ret) {
+ zonefs_destroy_inodecache();
+ return ret;
+ }
+
+ return 0;
+}
+
+static void __exit zonefs_exit(void)
+{
+	unregister_filesystem(&zonefs_type);
+	zonefs_destroy_inodecache();
+}
+
+MODULE_AUTHOR("Damien Le Moal");
+MODULE_DESCRIPTION("Zone file system for zoned block devices");
+MODULE_LICENSE("GPL");
+module_init(zonefs_init);
+module_exit(zonefs_exit);
diff --git a/fs/zonefs/zonefs.h b/fs/zonefs/zonefs.h
new file mode 100644
@@ -0,0 +1,185 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Simple zone file system for zoned block devices.
+ *
+ * Copyright (C) 2019 Western Digital Corporation or its affiliates.
+ */
+#ifndef __ZONEFS_H__
+#define __ZONEFS_H__
+
+#include <linux/fs.h>
+#include <linux/magic.h>
+#include <linux/uuid.h>
+#include <linux/mutex.h>
+#include <linux/rwsem.h>
+
+/*
+ * Maximum length of file names: this only needs to be large enough to fit
+ * the zone group directory names and a decimal value of the start sector of
+ * the zones for file names. 16 characters is plenty.
+ */
+#define ZONEFS_NAME_MAX 16
+
+/*
+ * Zone types: ZONEFS_ZTYPE_SEQ is used for all sequential zone types
+ * defined in linux/blkzoned.h, that is, BLK_ZONE_TYPE_SEQWRITE_REQ and
+ * BLK_ZONE_TYPE_SEQWRITE_PREF.
+ */
+enum zonefs_ztype {
+ ZONEFS_ZTYPE_CNV,
+ ZONEFS_ZTYPE_SEQ,
+ ZONEFS_ZTYPE_MAX,
+};
+
+static inline enum zonefs_ztype zonefs_zone_type(struct blk_zone *zone)
+{
+ if (zone->type == BLK_ZONE_TYPE_CONVENTIONAL)
+ return ZONEFS_ZTYPE_CNV;
+ return ZONEFS_ZTYPE_SEQ;
+}
+
+/*
+ * Inode private data.
+ */
+struct zonefs_inode_info {
+ struct inode i_vnode;
+ enum zonefs_ztype i_ztype;
+ sector_t i_zsector;
+ loff_t i_wpoffset;
+ loff_t i_max_size;
+ struct mutex i_truncate_mutex;
+ struct rw_semaphore i_mmap_sem;
+};
+
+static inline struct zonefs_inode_info *ZONEFS_I(struct inode *inode)
+{
+ return container_of(inode, struct zonefs_inode_info, i_vnode);
+}
+
+static inline bool zonefs_file_is_conv(struct inode *inode)
+{
+ return ZONEFS_I(inode)->i_ztype == ZONEFS_ZTYPE_CNV;
+}
+
+static inline bool zonefs_file_is_seq(struct inode *inode)
+{
+ return ZONEFS_I(inode)->i_ztype == ZONEFS_ZTYPE_SEQ;
+}
+
+/*
+ * Start sector on disk of a file zone.
+ */
+static inline loff_t zonefs_file_start_sector(struct inode *inode)
+{
+ return ZONEFS_I(inode)->i_zsector;
+}
+
+/*
+ * Maximum possible size of a file (i.e. the zone size).
+ */
+static inline loff_t zonefs_file_max_size(struct inode *inode)
+{
+ return ZONEFS_I(inode)->i_max_size;
+}
+
+/*
+ * On-disk super block (block 0).
+ */
+#define ZONEFS_SUPER_SIZE 4096
+struct zonefs_super {
+
+ /* Magic number */
+ __le32 s_magic;
+
+ /* Checksum */
+ __le32 s_crc;
+
+ /* Features */
+ __le64 s_features;
+
+ /* 128-bit uuid */
+ uuid_t s_uuid;
+
+ /* UID/GID to use for files */
+ __le32 s_uid;
+ __le32 s_gid;
+
+ /* File permissions */
+ __le32 s_perm;
+
+ /* Padding to ZONEFS_SUPER_SIZE bytes */
+ __u8 s_reserved[4052];
+
+} __packed;
+
+/*
+ * Feature flags: used on disk in the s_features field of struct zonefs_super
+ * and in memory in the s_features field of struct zonefs_sb_info.
+ */
+enum zonefs_features {
+ /*
+ * Use a zone start sector value as file name.
+ */
+ __ZONEFS_F_STARTSECT_NAME,
+ /*
+ * Aggregate contiguous conventional zones into a single file.
+ */
+ __ZONEFS_F_AGRCNV,
+ /*
+ * Use super block specified UID for files instead of default.
+ */
+ __ZONEFS_F_UID,
+ /*
+ * Use super block specified GID for files instead of default.
+ */
+ __ZONEFS_F_GID,
+ /*
+ * Use super block specified file permissions instead of default 640.
+ */
+ __ZONEFS_F_PERM,
+};
+
+#define ZONEFS_F_STARTSECT_NAME (1ULL << __ZONEFS_F_STARTSECT_NAME)
+#define ZONEFS_F_AGRCNV (1ULL << __ZONEFS_F_AGRCNV)
+#define ZONEFS_F_UID (1ULL << __ZONEFS_F_UID)
+#define ZONEFS_F_GID (1ULL << __ZONEFS_F_GID)
+#define ZONEFS_F_PERM (1ULL << __ZONEFS_F_PERM)
+
+#define ZONEFS_F_DEFINED_FEATURES \
+ (ZONEFS_F_STARTSECT_NAME | ZONEFS_F_AGRCNV | \
+ ZONEFS_F_UID | ZONEFS_F_GID | ZONEFS_F_PERM)
+
+/*
+ * In-memory Super block information.
+ */
+struct zonefs_sb_info {
+
+ unsigned long long s_features;
+ kuid_t s_uid; /* File owner UID */
+ kgid_t s_gid; /* File owner GID */
+ umode_t s_perm; /* File permissions */
+ uuid_t s_uuid;
+
+ loff_t s_blocksize_mask;
+ unsigned int s_nr_zones[ZONEFS_ZTYPE_MAX];
+};
+
+static inline struct zonefs_sb_info *ZONEFS_SB(struct super_block *sb)
+{
+ return sb->s_fs_info;
+}
+
+static inline bool zonefs_has_feature(struct zonefs_sb_info *sbi,
+ enum zonefs_features f)
+{
+ return sbi->s_features & f;
+}
+
+#define zonefs_info(sb, format, args...) \
+ pr_info("zonefs (%s): " format, sb->s_id, ## args)
+#define zonefs_err(sb, format, args...) \
+ pr_err("zonefs (%s) ERROR: " format, sb->s_id, ## args)
+#define zonefs_warn(sb, format, args...) \
+ pr_warn("zonefs (%s) WARN: " format, sb->s_id, ## args)
+
+#endif
diff --git a/include/uapi/linux/magic.h b/include/uapi/linux/magic.h
@@ -86,6 +86,7 @@
#define NSFS_MAGIC 0x6e736673
#define BPF_FS_MAGIC 0xcafe4a11
#define AAFS_MAGIC 0x5a3c69f0
+#define ZONEFS_MAGIC 0x5a4f4653
/* Since UDF 2.01 is ISO 13346 based... */
#define UDF_SUPER_MAGIC 0x15013346