From patchwork Thu Jun 14 00:11:47 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Luis Chamberlain
X-Patchwork-Id: 10463007
From: "Luis R. Rodriguez"
To: damien.lemoal@wdc.com
Cc: hare@suse.de, axboe@kernel.dk, jaegeuk@kernel.org, yuchao0@huawei.com,
 ghe@suse.com, mwilck@suse.com, tchvatal@suse.com, zren@suse.com,
 agk@redhat.com, linux-fsdevel@vger.kernel.org, linux-block@vger.kernel.org,
 "Luis R. Rodriguez"
Subject: [PATCH] dm-zoned-tools: add zoned disk udev rules for scheduler /
 dmsetup
Date: Wed, 13 Jun 2018 17:11:47 -0700
Message-Id: <20180614001147.1545-1-mcgrof@kernel.org>
X-Mailer: git-send-email 2.17.1
X-Mailing-List: linux-fsdevel@vger.kernel.org

Setting up zoned disks in a generic way is not trivial. There is also
quite a bit of tribal knowledge about these devices which is not easy to
find.

The currently supplied demo script works, but it is not generic enough to
be practical for Linux distributions or even for developers, who often
move from one kernel to another.

This tries to put a bit of this tribal knowledge into an initial udev
rules file for development, with the hope that Linux distributions can
later deploy it. Three rules are added. One rule is optional for now; it
should be extended later to be more distribution-friendly, and then I
think this may be ready for consideration for integration in
distributions.

  1) scheduler setup
  2) blacklist f2fs devices
  3) run dmsetup for the rest of the devices

Note that this udev rules file will not work well if you want to use a
disk with f2fs on part of the disk and another filesystem on another part
of the disk. That setup requires manual handling, so such setups can use
the same blacklist in rule 2).

It's not widely known, for instance, that as of v4.16 it is mandated to
use either the deadline or the mq-deadline scheduler for *all* SMR drives.
It has also been determined that the Linux kernel is not the place to set
this up, so a udev rule *is required* as per the latest discussions. This
is the first rule we add.

Furthermore, if you are *not* using f2fs you always have to run dmsetup.
dmsetup mappings do not persist, so you currently *always* have to run
some sort of custom script, which is not ideal for Linux distributions. We
can invert this logic into a udev rule to enable users to blacklist disks
they know they want to use f2fs for. This is the second, optional rule.
This blacklisting can be generalized further in the future with an
exception list file, for instance using IMPORT{db} or the like.

The third and final rule then runs dmsetup for the rest of the disks,
using the disk serial number for the new device mapper name.

Note that it is currently easy for users to make a mistake and run mkfs on
the original disk rather than on the /dev/mapper/ device for non-f2fs
arrangements. If that is done, experience shows things can easily fall
apart with alignment *eventually*. We have no generic way today to error
out on this condition and proactively prevent it.

Signed-off-by: Luis R. Rodriguez
Nacked-by: Mike Snitzer
---
 README                    | 10 +++++-
 udev/99-zoned-disks.rules | 78 +++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 87 insertions(+), 1 deletion(-)
 create mode 100644 udev/99-zoned-disks.rules

diff --git a/README b/README
index 65e96c34fd04..f49541eaabc8 100644
--- a/README
+++ b/README
@@ -168,7 +168,15 @@ Options:
 			reclaiming random zones if the percentage of
 			free random data zones falls below .
 
-V. Example scripts
+V. Udev zone disk deployment
+============================
+
+A udev rules file is provided which sets the IO scheduler, lets you blacklist
+drives which should not get dmsetup, and runs dmsetup for the remaining zoned
+drives. If you use these udev rules the script below is not needed. Be sure to
+run mkfs only on the resulting /dev/mapper/zoned-$serial device.
+
+VI. Example scripts
 ==================
 
 [[
diff --git a/udev/99-zoned-disks.rules b/udev/99-zoned-disks.rules
new file mode 100644
index 000000000000..e19b738dcc0e
--- /dev/null
+++ b/udev/99-zoned-disks.rules
@@ -0,0 +1,78 @@
+# To use zoned disks you first need to:
+#
+# 1) Enable zone disk support in your kernel
+# 2) Use the deadline or mq-deadline scheduler for it - mandated as of v4.16
+# 3) Blacklist devices dedicated for f2fs as of v4.10
+# 4) Run dmsetup on the other disks
+# 5) Create the filesystem -- NOTE: use mkfs /dev/mapper/zoned-$serial if
+#    you use dmsetup on the disk.
+# 6) Consider using nofail mount option in case you run an unsupported kernel
+#
+# You can use this udev rules file for 2) 3) and 4). Further details below.
+#
+# 1) Enable zone disk support in your kernel
+#
+#    o CONFIG_BLK_DEV_ZONED
+#    o CONFIG_DM_ZONED
+#
+# This will let the kernel actually see these devices, i.e. via fdisk /dev/sda
+# for instance. Run:
+#
+#	dmzadm --format /dev/sda
+
+# 2) Set deadline or mq-deadline for all disks which are zoned
+#
+# Zoned disks can only work with the deadline or mq-deadline scheduler. This is
+# mandated for all SMR drives since v4.16. It has been determined this must be
+# done through a udev rule, and the kernel should not set this up for disks.
+# This magic will have to live for *all* zoned disks.
+# XXX: what about distributions that want mq-deadline? Probably easy for now
+#      to assume deadline and later have a mapping file to enable
+#      mq-deadline for specific serial devices?
+ACTION=="add|change", KERNEL=="sd*[!0-9]", ATTRS{queue/zoned}=="host-managed", \
+	ATTR{queue/scheduler}="deadline"
+
+# 3) Blacklist f2fs devices as of v4.10
+# We don't have to run dmsetup on disks where you want to use f2fs, so you
+# can use this rule to skip dmsetup for them. First get the short serial number.
+#
+#	udevadm info --name=/dev/sda | grep -i serial_short
+# XXX: To generalize this for distributions consider using IMPORT{db} or so
+# and then use that to check if the serial number matches one in the database.
+#ACTION=="add", SUBSYSTEM=="block", ENV{ID_SERIAL_SHORT}=="XXA1ZFFF", GOTO="zone_disk_group_end"
+
+# 4) We need to run dmsetup if you want to use other filesystems
+#
+# dmsetup is not persistent, so it needs to be run upon every boot. We use
+# the device serial number for the /dev/mapper/ name.
+ACTION=="add", KERNEL=="sd*[!0-9]", ATTRS{queue/zoned}=="host-managed", \
+	RUN+="/sbin/dmsetup create zoned-$env{ID_SERIAL_SHORT} --table '0 %s{size} zoned $devnode'"
+
+# 5) Create a filesystem for the device
+#
+# Be 100% sure you use /dev/mapper/zoned-$YOUR_DEVICE_SERIAL for the mkfs
+# command as otherwise things can break.
+#
+# XXX: preventing the above proactively in the kernel would be ideal however
+# this may be hard.
+#
+# Once you create the filesystem it will get a UUID.
+#
+# Find out what the UUID is. For instance, if your zoned disk is your second
+# device-mapper device, i.e. dm-1, run:
+#
+#	ls -l /dev/disk/by-uuid/ | grep dm-1
+#
+# To figure out which dm-$number it is, use dmsetup info; the minor number
+# is the $number.
+#
+# 6) Add an entry in /etc/fstab with nofail, for example:
+#
+# UUID=99999999-aaaa-bbbb-c1234aaaabbb33456 /media/monster xfs nofail 0 0
+#
+# nofail will ensure the system boots fine even if you boot into a kernel which
+# lacks support for the device and so it is not found. Since the UUID will
+# always match the device we don't care if the device moves around the bus
+# on the system. We just need to get the UUID once.
+
+LABEL="zone_disk_group_end"
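
A quick way to check what the scheduler rule did is to read back the same
sysfs attributes the rules match on. The following is only a sketch and is not
part of the patch: the helper names active_scheduler and scheduler_ok are
hypothetical, and it assumes the usual /sys/block/<dev>/queue/zoned and
/sys/block/<dev>/queue/scheduler files, where the kernel marks the active
scheduler with square brackets.

```shell
#!/bin/sh
# Hypothetical helper: print the active scheduler given the contents of
# queue/scheduler, e.g. "noop [deadline] cfq" prints "deadline".
active_scheduler() {
    printf '%s\n' "$1" | sed -n 's/.*\[\([^]]*\)].*/\1/p'
}

# Hypothetical helper: succeed only for the two schedulers the rules file
# describes as valid for zoned disks.
scheduler_ok() {
    case "$1" in
        deadline|mq-deadline) return 0 ;;
        *) return 1 ;;
    esac
}

# Report each host-managed sd disk and whether its scheduler is acceptable.
for q in /sys/block/sd*/queue; do
    # Skip when the glob did not match or the zoned attribute is missing.
    [ -r "$q/zoned" ] || continue
    [ "$(cat "$q/zoned")" = "host-managed" ] || continue
    dev=${q%/queue}
    dev=${dev##*/}
    sched=$(active_scheduler "$(cat "$q/scheduler")")
    if scheduler_ok "$sched"; then
        echo "$dev: $sched (ok)"
    else
        echo "$dev: $sched (expected deadline or mq-deadline)"
    fi
done
```

On a machine without host-managed disks the loop prints nothing. Whether the
device-mapper targets were created can be checked separately with dmsetup
info, as the rules file comments describe.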