[V3] fs: New zonefs file system

zonefs is a very simple file system exposing each zone of a zoned
block device as a file. zonefs is in fact closer to a raw block device
access interface than to a full-featured POSIX file system.

The goal of zonefs is to simplify the implementation of raw zoned block
device access in applications by replacing direct block device file
ioctls and read/write calls with the well-known POSIX file API. For
instance, zonefs greatly simplifies the implementation of LSM
(log-structured merge) tree structures (such as those used in RocksDB
and LevelDB) on zoned block devices: SSTables can be stored in zone
files much as they are stored in files of a regular file system, which
reduces the amount of change needed in the application.

Zonefs on-disk metadata is reduced to a super block storing a magic
number, a uuid and optional feature flags and values. On mount, zonefs
uses blkdev_report_zones() to obtain the device zone configuration and
populates the mount point with a static file tree based solely on this
information. For example, the size of a sequential zone file comes from
the zone write pointer offset, which is managed by the device itself.
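
The relation between the write pointer and the file size can be sketched
as follows. This is only an illustration, not code from the patch: the
helper name seq_file_size() is hypothetical, and the sketch assumes that
zone positions are expressed in 512-byte sectors.

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_SHIFT 9 /* assumption: positions are in 512-byte sectors */

/*
 * Hypothetical helper (not taken from the patch): size in bytes of a
 * sequential zone file, given the zone start sector and the sector
 * position of the zone write pointer.
 */
static uint64_t seq_file_size(uint64_t zone_start, uint64_t write_pointer)
{
	/* The write pointer never sits before the start of its zone. */
	assert(write_pointer >= zone_start);

	return (write_pointer - zone_start) << SECTOR_SHIFT;
}
```

For instance, a write pointer located 65536 sectors past the zone start
gives a file size of 33554432 bytes, and a write pointer at the zone
start gives an empty file.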

The zone files created on mount have the following characteristics.
1) Files representing zones of the same type are grouped together
   under a common directory:
  * For conventional zones, the directory "cnv" is used.
  * For sequential write zones, the directory "seq" is used.
  These two directories are the only directories that exist in zonefs.
  Users cannot create other directories and cannot rename nor delete
  the "cnv" and "seq" directories.
2) The name of zone files is by default the number of the file within
   the zone type directory, in order of increasing zone start sector.
3) The size of conventional zone files is fixed to the device zone size.
   Conventional zone files cannot be truncated.
4) The size of sequential zone files represents the file zone write
   pointer position relative to the zone start sector. Truncating these
   files is allowed only down to 0, in which case the zone is reset to
   rewind the file zone write pointer position to the start of the zone.
5) Read and write operations on files are not allowed beyond the
   file zone size. Any access exceeding the zone size fails with
   the -EFBIG error.
6) Creating, deleting, renaming or modifying any attribute of files
   and directories is not allowed. The only exception is the file
   size of sequential zone files, which can be modified by write
   operations or truncation to 0.
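
From an application point of view, the rules above could be used along
the lines of the following user space sketch. The helper names
(zone_open, zone_written, zone_append, zone_reset) are hypothetical and
only standard POSIX calls are used. This description does not say
whether writes to sequential zone files need direct I/O or a particular
alignment; the sketch uses plain pwrite() and leaves that question open.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Open an existing zone file for writing (zone files cannot be created). */
static int zone_open(const char *path)
{
	int fd = open(path, O_WRONLY);

	return fd < 0 ? -errno : fd;
}

/* Bytes written so far: the file size mirrors the zone write pointer. */
static int64_t zone_written(int fd)
{
	struct stat st;

	if (fstat(fd, &st) < 0)
		return -errno;
	return (int64_t)st.st_size;
}

/*
 * Append len bytes at the current end of the file, i.e. at the write
 * pointer position. Returns 0 on success or a negative errno value,
 * e.g. -EFBIG if the write would exceed the zone size.
 */
static int zone_append(int fd, const void *buf, size_t len)
{
	const char *p = buf;
	int64_t pos = zone_written(fd);

	assert(buf != NULL);
	if (pos < 0)
		return (int)pos;

	while (len > 0) {
		ssize_t ret = pwrite(fd, p, len, (off_t)pos);

		if (ret < 0) {
			if (errno == EINTR)
				continue;
			return -errno;
		}
		if (ret == 0)
			return -EIO;
		p += ret;
		pos += ret;
		len -= (size_t)ret;
	}
	return 0;
}

/* Truncation to 0 is the only size change allowed besides writing. */
static int zone_reset(int fd)
{
	return ftruncate(fd, 0) < 0 ? -errno : 0;
}
```

A log-structured application would call zone_append() until the zone is
full, and zone_reset() once the data in the zone is no longer needed.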

Several optional features of zonefs can be enabled at format time.
* Conventional zone aggregation: contiguous conventional zones can be
  aggregated into a single larger file instead of multiple per-zone
  files.
* File naming: instead of the default file number, the file name can be
  the base-10 value of the file zone start sector.
* File ownership: the owner UID and GID of zone files are by default 0
  (root) but can be changed to any valid UID/GID.
* File access permissions: the default 640 access permissions can be
  changed.
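
The effect of the aggregation and file naming features on the file tree
can be modeled with the short sketch below. It is a simplified model,
not the code of this patch: struct zone, cnv_file_count() and
zone_file_name() are hypothetical, and the zone array is assumed to be
sorted by start sector so that neighbouring entries are contiguous on
the device.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical, simplified zone description. */
enum zone_type { ZONE_CNV, ZONE_SEQ };

struct zone {
	enum zone_type type;
	uint64_t start; /* zone start sector */
};

/*
 * Number of files under "cnv": one per conventional zone by default,
 * one per run of contiguous conventional zones with aggregation.
 */
static size_t cnv_file_count(const struct zone *zones, size_t nr,
			     bool aggregate)
{
	size_t files = 0;

	for (size_t i = 0; i < nr; i++) {
		if (zones[i].type != ZONE_CNV)
			continue;
		/* A zone following another conventional zone extends its file. */
		if (aggregate && i > 0 && zones[i - 1].type == ZONE_CNV)
			continue;
		files++;
	}
	return files;
}

/*
 * File name: the file number within its directory by default, or the
 * base-10 zone start sector when that naming feature is enabled.
 * Returns the value of snprintf().
 */
static int zone_file_name(char *buf, size_t len, uint64_t file_nr,
			  uint64_t start, bool name_by_sector)
{
	assert(buf != NULL && len > 0);

	return snprintf(buf, len, "%llu",
			(unsigned long long)(name_by_sector ? start : file_nr));
}
```

With two contiguous conventional zones followed by sequential zones,
this model yields two "cnv" files by default and a single one with
aggregation enabled.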

The mkzonefs tool is used to format zonefs. This tool is available
on GitHub at: git@github.com:damien-lemoal/zonefs-tools.git.
zonefs-tools includes a simple test suite which can be run against any
zoned block device, including a null_blk block device created in zoned
mode.

Example: the following formats a host-managed SMR HDD with the
conventional zone aggregation feature enabled.

mkzonefs -o aggr_cnv /dev/sdX
mount -t zonefs /dev/sdX /mnt
ls -l /mnt/
total 0
dr-xr-xr-x 2 root root 0 Apr 11 13:00 cnv
dr-xr-xr-x 2 root root 0 Apr 11 13:00 seq

ls -l /mnt/cnv
total 137363456
-rw-rw---- 1 root root 140660178944 Apr 11 13:00 0

ls -Fal -v /mnt/seq
total 14511243264
dr-xr-xr-x 2 root root 15942528 Jul 10 11:53 ./
drwxr-xr-x 4 root root     1152 Jul 10 11:53 ../
-rw-r----- 1 root root        0 Jul 10 11:53 0
-rw-r----- 1 root root 33554432 Jul 10 13:43 1
-rw-r----- 1 root root        0 Jul 10 11:53 2
-rw-r----- 1 root root        0 Jul 10 11:53 3
...
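
Since the size of each sequential zone file reflects its write pointer,
the amount of data written to the sequential zones can be obtained with
plain directory and stat calls, as in the sketch below. The function
name dir_bytes_written() is hypothetical. The code is generic POSIX and
simply sums the sizes of the regular files of a directory such as
/mnt/seq.

```c
#include <assert.h>
#include <dirent.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/stat.h>

/*
 * Sum the sizes of all regular files in a directory.
 * Returns the total in bytes, or -1 if the directory cannot be opened.
 */
static int64_t dir_bytes_written(const char *dir)
{
	struct dirent *ent;
	int64_t total = 0;
	DIR *d;

	assert(dir != NULL);

	d = opendir(dir);
	if (!d)
		return -1;

	while ((ent = readdir(d)) != NULL) {
		char path[4096];
		struct stat st;
		int n = snprintf(path, sizeof(path), "%s/%s", dir, ent->d_name);

		/* Skip entries whose path does not fit in the buffer. */
		if (n < 0 || (size_t)n >= sizeof(path))
			continue;
		/* "." and ".." are directories and are not counted. */
		if (stat(path, &st) == 0 && S_ISREG(st.st_mode))
			total += st.st_size;
	}
	closedir(d);
	return total;
}
```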

The aggregated conventional zone file can be used as a regular file.
Operations such as the following work.

mkfs.ext4 /mnt/cnv/0
mount -o loop /mnt/cnv/0 /data

Contains contributions from Johannes Thumshirn <jthumshirn@suse.de>
and Christoph Hellwig <hch@lst.de>.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
---
Changes from v2:
* Addressed comments from Darrick: Typo, added checksum to super block,
  enhanced checks of the super block fields validity (used reserved bytes
  and unknown feature bits)
* Rebased on XFS tree iomap-for-next branch

Changes from v1:
* Rebased on latest iomap branch iomap-5.4-merge of XFS tree at
  git://git.kernel.org/pub/scm/fs/xfs/xfs-linux.git
* Addressed all comments from Dave Chinner and others

 MAINTAINERS                |   10 +
 fs/Kconfig                 |    2 +
 fs/Makefile                |    1 +
 fs/zonefs/Kconfig          |    9 +
 fs/zonefs/Makefile         |    4 +
 fs/zonefs/super.c          | 1083 ++++++++++++++++++++++++++++++++++++
 fs/zonefs/zonefs.h         |  177 ++++++
 include/uapi/linux/magic.h |    1 +
 8 files changed, 1287 insertions(+)
 create mode 100644 fs/zonefs/Kconfig
 create mode 100644 fs/zonefs/Makefile
 create mode 100644 fs/zonefs/super.c
 create mode 100644 fs/zonefs/zonefs.h

Message ID	20190821070308.28665-1-damien.lemoal@wdc.com (mailing list archive)
State	Superseded, archived
From: Damien Le Moal <damien.lemoal@wdc.com>
To: linux-fsdevel@vger.kernel.org, linux-xfs@vger.kernel.org, Christoph Hellwig <hch@lst.de>, Johannes Thumshirn <jthumshirn@suse.de>, Dave Chinner <david@fromorbit.com>, "Darrick J . Wong" <darrick.wong@oracle.com>
Cc: Hannes Reinecke <hare@suse.de>, Matias Bjorling <matias.bjorling@wdc.com>
Subject: [PATCH V3] fs: New zonefs file system
Date: Wed, 21 Aug 2019 16:03:08 +0900