From patchwork Mon Apr 26 06:27:17 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Naohiro Aota X-Patchwork-Id: 12223871 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.6 required=3.0 tests=BAYES_00,DKIM_INVALID, DKIM_SIGNED,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 76DADC433B4 for ; Mon, 26 Apr 2021 06:28:12 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 4EB0B611CC for ; Mon, 26 Apr 2021 06:28:12 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231806AbhDZG2w (ORCPT ); Mon, 26 Apr 2021 02:28:52 -0400 Received: from esa3.hgst.iphmx.com ([216.71.153.141]:41929 "EHLO esa3.hgst.iphmx.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231547AbhDZG2v (ORCPT ); Mon, 26 Apr 2021 02:28:51 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=wdc.com; i=@wdc.com; q=dns/txt; s=dkim.wdc.com; t=1619418490; x=1650954490; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=VYj2D3vCfh0AB0IqIZFogs4QFkCRHDdkEehdh8qZFRY=; b=OSHlGW5yk+Fqvb6QOZOWVk1+vZoFj5neCPI4gPrMjtzLxNgva1+mkdf8 S707buVErYo9xQT3rnRtlgwlOq343f0ppWKrz+PefMopWgWdTmpdO9IqZ 8rF8fn7HR/DjbRbrqkHsb0zsqdP70qGZxRrSGOcYNZrYCqodmv48Xm7JT XARKe0Oeg/u2DS1yb7VRgIIqYGqCw+5j9MqF20O2doaVwJOZzJzJYw4E6 t6Md4lAs3iyRHmuVu+kJxxUF+OhQX4u3zxdSib1UFUKE88pD7Tn65+HAo GKcfEUg+7lJXMqYFRKDugD9qYe8UOoMXsRQ0nleiRCWDn0EQX9Kz02gc8 Q==; IronPort-SDR: UCY5am4aABkWGfbO/5/a2vHa/rCakDG+LzWl4LS62z1zvHVFjMV8YU8sYf9uiHDsu61kElX3Qt 
eV3/23e0kNB5HkU9iYc1BPceLwOob6nP7CxUYPMSVyTtx7baV0Nl5Evd+6/eVoSIMh7NBnq/3L YTqyw1SS3Nj4VaHffwRkE5Iz1qYtKZWz4Sxsshg5G6hvr6NRbbUGIz464mwjCeY0uKb7r1jwtW 22U4MEQrvHR/D4rS6xjuhhs0jOOuzEJ1qA/ayHW53a6sZZcO2INU2BCiMfNGZuingjzZAqVN6t 1co= X-IronPort-AV: E=Sophos;i="5.82,251,1613404800"; d="scan'208";a="170788101" Received: from h199-255-45-14.hgst.com (HELO uls-op-cesaep01.wdc.com) ([199.255.45.14]) by ob1.hgst.iphmx.com with ESMTP; 26 Apr 2021 14:28:10 +0800 IronPort-SDR: unOKvWF4hO7O7/sL6AZkaIfHwwpD48tas0PB9r2wnqKmdoUdJzw5m9IIj6kgLLUknk7/TiNxxr OEOP8GYBzI1MuwSVKiN2vblDrhqaAKI8tmWHF43VCVc9meEAVFHykdoM3DRl72SdE5vxqPJsTA t+TMlrHySa80yp2yLLmEcjLkH+vIcEEbxdk3Gon3dZ6mU8Ut+1fGqao7I3JUFEBXQGLjjqcKql +0nDhgzuuDOeMvO2XN14nYYvCap9zhJEr3gLsxsHEKchAwe6bxRgv56cMm/17gTZ7JZVxjpePR OBWKC+mYdNNd8o4ACSH2CNzy Received: from uls-op-cesaip02.wdc.com ([10.248.3.37]) by uls-op-cesaep01.wdc.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 25 Apr 2021 23:08:34 -0700 IronPort-SDR: UuU41OTQYUQ0zsC/kZF0hIqa19D7j/bt2cnwZut3CNRGEppJuptA26iQFcP7JIdaz4vOOpmpbi F7EKNUJpjZaKXLRxLR32KQNi7cKPcZOHasZvy9YSJY3m/IVjAw1bx/yhyquinFdhuFZ3fAGkzO 8c7tzGgRDMtvEglZfauiPgH3ZwyQQQxE3lw/7dgRGtsq7JfYfdSVolhpNKFDDycmlpJRk1xWnH yCs3eZdlHl/QBpUEiRvjLE4z8LP/OyRZIJtOM0rHNi039FAT7wWqCZsC8Ms0DeEdh44j5Cwf71 VmA= WDCIronportException: Internal Received: from bgy2573.ad.shared (HELO naota-xeon.wdc.com) ([10.225.48.58]) by uls-op-cesaip02.wdc.com with ESMTP; 25 Apr 2021 23:28:09 -0700 From: Naohiro Aota To: David Sterba Cc: linux-btrfs@vger.kernel.org, Josef Bacik , Naohiro Aota , Damien Le Moal Subject: [PATCH 01/26] btrfs-progs: utils: Introduce queue_param helper function Date: Mon, 26 Apr 2021 15:27:17 +0900 Message-Id: X-Mailer: git-send-email 2.31.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org Introduce the queue_param helper function to get a device request queue parameter. This helper will be used later to query information of a zoned device. 
Furthermore, rewrite is_ssd() using the helper function. Signed-off-by: Damien Le Moal [Naohiro] fixed error return value Signed-off-by: Naohiro Aota Reviewed-by: Johannes Thumshirn --- common/device-utils.c | 46 +++++++++++++++++++++++++++++++++++++++++++ common/device-utils.h | 1 + mkfs/main.c | 40 ++----------------------------------- 3 files changed, 49 insertions(+), 38 deletions(-) diff --git a/common/device-utils.c b/common/device-utils.c index c860b94661c4..f5d5277e8fce 100644 --- a/common/device-utils.c +++ b/common/device-utils.c @@ -252,3 +252,49 @@ u64 get_partition_size(const char *dev) return result; } +/* + * Get a device request queue parameter. + */ +int queue_param(const char *file, const char *param, char *buf, size_t len) +{ + blkid_probe probe; + char wholedisk[PATH_MAX]; + char sysfs_path[PATH_MAX]; + dev_t devno; + int fd; + int ret; + + probe = blkid_new_probe_from_filename(file); + if (!probe) + return 0; + + /* Device number of this disk (possibly a partition) */ + devno = blkid_probe_get_devno(probe); + if (!devno) { + blkid_free_probe(probe); + return 0; + } + + /* Get whole disk name (not full path) for this devno */ + ret = blkid_devno_to_wholedisk(devno, + wholedisk, sizeof(wholedisk), NULL); + if (ret) { + blkid_free_probe(probe); + return 0; + } + + snprintf(sysfs_path, PATH_MAX, "/sys/block/%s/queue/%s", + wholedisk, param); + + blkid_free_probe(probe); + + fd = open(sysfs_path, O_RDONLY); + if (fd < 0) + return 0; + + len = read(fd, buf, len); + close(fd); + + return len; +} + diff --git a/common/device-utils.h b/common/device-utils.h index 70d19cae3e50..d1799323d002 100644 --- a/common/device-utils.h +++ b/common/device-utils.h @@ -29,5 +29,6 @@ u64 disk_size(const char *path); u64 btrfs_device_size(int fd, struct stat *st); int btrfs_prepare_device(int fd, const char *file, u64 *block_count_ret, u64 max_block_count, unsigned opflags); +int queue_param(const char *file, const char *param, char *buf, size_t len); #endif diff --git 
a/mkfs/main.c b/mkfs/main.c index c910369cbf94..a903896289fa 100644 --- a/mkfs/main.c +++ b/mkfs/main.c @@ -434,49 +434,13 @@ static int zero_output_file(int out_fd, u64 size) static int is_ssd(const char *file) { - blkid_probe probe; - char wholedisk[PATH_MAX]; - char sysfs_path[PATH_MAX]; - dev_t devno; - int fd; char rotational; int ret; - probe = blkid_new_probe_from_filename(file); - if (!probe) + ret = queue_param(file, "rotational", &rotational, 1); + if (ret < 1) return 0; - /* Device number of this disk (possibly a partition) */ - devno = blkid_probe_get_devno(probe); - if (!devno) { - blkid_free_probe(probe); - return 0; - } - - /* Get whole disk name (not full path) for this devno */ - ret = blkid_devno_to_wholedisk(devno, - wholedisk, sizeof(wholedisk), NULL); - if (ret) { - blkid_free_probe(probe); - return 0; - } - - snprintf(sysfs_path, PATH_MAX, "/sys/block/%s/queue/rotational", - wholedisk); - - blkid_free_probe(probe); - - fd = open(sysfs_path, O_RDONLY); - if (fd < 0) { - return 0; - } - - if (read(fd, &rotational, 1) < 1) { - close(fd); - return 0; - } - close(fd); - return rotational == '0'; } From patchwork Mon Apr 26 06:27:18 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Naohiro Aota X-Patchwork-Id: 12223873 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.6 required=3.0 tests=BAYES_00,DKIM_INVALID, DKIM_SIGNED,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9EE64C43460 for ; Mon, 26 Apr 2021 06:28:13 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 7118160FE5 
for ; Mon, 26 Apr 2021 06:28:13 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231878AbhDZG2x (ORCPT ); Mon, 26 Apr 2021 02:28:53 -0400 Received: from esa3.hgst.iphmx.com ([216.71.153.141]:41929 "EHLO esa3.hgst.iphmx.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231547AbhDZG2w (ORCPT ); Mon, 26 Apr 2021 02:28:52 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=wdc.com; i=@wdc.com; q=dns/txt; s=dkim.wdc.com; t=1619418491; x=1650954491; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=qR+05otdO4j1VtWAkrYMlCkmh0VIFngmakMgf6s/9uA=; b=SIi/arNtwlrlmwfi2ftjReJJOJwHMIclmYjRDR+LFyYzRj90ewSRnwpj tINLpkYzTnJ+QPWCPONPjsAYOCjvYkuPvqUSwckPp5m2ZDXbVvhgN2C73 xjdFDJqIjrZqVUEydvIPeMCaDssGNa+kApEsKuDOmhAIAzbqmlV3IZfb2 rpJRGSAtWuOaYcIpqWY7mitFpVtgwaNnbzYebghxhBaE94o0D+8Lcjk0+ KfGq1twfH8iH/MIHykjfeAdMBRumird7a6ZyjG0B9FUAb6y3a3dyCnnRP DWYM8mX9r4rTW2fFUzqsgR1JBWb7KvibXua/vQoaJLv0kFlZexmX+Bezx w==; IronPort-SDR: Cyqq+6t6aYikqkPtApnR+oGfgnBYlE0RHodbXF8y7VzUuB7UcuRD9rUtpqO8VDn7sPv8eN3dK/ zBpFIZXKqOQF4LpJSx3Kb12u0nW3Oq273s8s2CLk+/vX+WkXW7nXyqI/EPTDO554ZpRzgpiwyM twst4J0zQWS/EFceIKG1JvOYCBCFXtJiaU6lbh7iQyx2XuI7LRW/DS30S66OpM0SR3VTfw7wNw CfguO9v49tjdZ/qV2o+66YqVSKhv46I7gPr7G+PKi87Pd88LpYeBeA4LV7X+9rjGN6Y/SoCaBU Rdo= X-IronPort-AV: E=Sophos;i="5.82,251,1613404800"; d="scan'208";a="170788105" Received: from h199-255-45-14.hgst.com (HELO uls-op-cesaep01.wdc.com) ([199.255.45.14]) by ob1.hgst.iphmx.com with ESMTP; 26 Apr 2021 14:28:10 +0800 IronPort-SDR: osZP+ye4j9Kc9iFjkv2fllQec72dbFWRVLfDRlxrccHFa8W08qcFmSaOjapXP+9QLWYe9F9iXl JYXC+IdShNMSO9xKlf3YizUx27l1Du3VP3speYJU43KHfP9dHwAOxREmoj8N6DLejNaHTKMMsD ydE7go5Z5ga9BcHfGs51+7hx8w61zmhpDH2VpNyAZJCSHVj/bA2B9fBR/upW8sV52NTXNZEC4e a9mq2Q9TsReGaiUblbNyASHqhKFpS3wdkmVQ0QLXHr1UIzzCo6Lyd9m6DvTB02AVO+03e9di43 RVyW+H9mOxIJrwc02xVFF1Ec Received: from uls-op-cesaip02.wdc.com ([10.248.3.37]) by uls-op-cesaep01.wdc.com with 
ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 25 Apr 2021 23:08:35 -0700 IronPort-SDR: JtwrsZgxbifRM4P9g10HcxlaTYnyGzxnz/6JbJXjkVfLs1ROW8DvCM3fA3n1REOpLhuA30A1dB 03xcYrlSH2RNjbfCvXp7Xmnjs+m+Z8hUp9yW+qVu1US8xqV1a4fwP5zKPvFpiilTNovcp2DQDJ ZSDO1fuex6Y35gsNtWJ/qUz8X7Ud2YfwIOSpoH8NueFvbkl3j224cuSdFht5CG7KytI6lB/kMX YMLlaTci5nzxmDzOBujDsV9Q30aL8goVSFkin3DaNuzyfdQqhpiKkO8zpkRa7bc9VVbdljtF/o wkI= WDCIronportException: Internal Received: from bgy2573.ad.shared (HELO naota-xeon.wdc.com) ([10.225.48.58]) by uls-op-cesaip02.wdc.com with ESMTP; 25 Apr 2021 23:28:11 -0700 From: Naohiro Aota To: David Sterba Cc: linux-btrfs@vger.kernel.org, Josef Bacik , Naohiro Aota Subject: [PATCH 02/26] btrfs-progs: provide fs_info from btrfs_device Date: Mon, 26 Apr 2021 15:27:18 +0900 Message-Id: <6bc8f7eb4dee9cc47219fd0930c892b6683c3b14.1619416549.git.naohiro.aota@wdc.com> X-Mailer: git-send-email 2.31.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org As in the kernel code, provide fs_info access from struct btrfs_device. This will help unify the code between the kernel and userland. Since fs_info can be NULL at the time of btrfs_add_to_fsid(), let's use btrfs_open_devices() to set fs_info on the devices. 
Signed-off-by: Naohiro Aota --- cmds/rescue-chunk-recover.c | 2 +- common/device-scan.c | 1 + kernel-shared/disk-io.c | 2 +- kernel-shared/volumes.c | 8 ++++++-- kernel-shared/volumes.h | 5 +++-- 5 files changed, 12 insertions(+), 6 deletions(-) diff --git a/cmds/rescue-chunk-recover.c b/cmds/rescue-chunk-recover.c index 5f21672b9d3e..216a6226b0f7 100644 --- a/cmds/rescue-chunk-recover.c +++ b/cmds/rescue-chunk-recover.c @@ -1446,7 +1446,7 @@ open_ctree_with_broken_chunk(struct recover_control *rc) fs_info->is_chunk_recover = 1; fs_info->fs_devices = rc->fs_devices; - ret = btrfs_open_devices(fs_info->fs_devices, O_RDWR); + ret = btrfs_open_devices(fs_info, fs_info->fs_devices, O_RDWR); if (ret) goto out; diff --git a/common/device-scan.c b/common/device-scan.c index cd4c12821078..01d2e0656583 100644 --- a/common/device-scan.c +++ b/common/device-scan.c @@ -141,6 +141,7 @@ int btrfs_add_to_fsid(struct btrfs_trans_handle *trans, dev_item = &disk_super->dev_item; uuid_generate(device->uuid); + device->fs_info = fs_info; device->devid = 0; device->type = 0; device->io_width = io_width; diff --git a/kernel-shared/disk-io.c b/kernel-shared/disk-io.c index 5555a406321b..a78be1e7a692 100644 --- a/kernel-shared/disk-io.c +++ b/kernel-shared/disk-io.c @@ -1271,7 +1271,7 @@ static struct btrfs_fs_info *__open_ctree_fd(int fp, const char *path, if (flags & OPEN_CTREE_EXCLUSIVE) oflags |= O_EXCL; - ret = btrfs_open_devices(fs_devices, oflags); + ret = btrfs_open_devices(fs_info, fs_devices, oflags); if (ret) goto out; diff --git a/kernel-shared/volumes.c b/kernel-shared/volumes.c index f7dd879398d4..cbcf7bfa371d 100644 --- a/kernel-shared/volumes.c +++ b/kernel-shared/volumes.c @@ -389,13 +389,17 @@ void btrfs_close_all_devices(void) } } -int btrfs_open_devices(struct btrfs_fs_devices *fs_devices, int flags) +int btrfs_open_devices(struct btrfs_fs_info *fs_info, + struct btrfs_fs_devices *fs_devices, int flags) { int fd; struct btrfs_device *device; int ret; 
list_for_each_entry(device, &fs_devices->devices, dev_list) { + if (!device->fs_info) + device->fs_info = fs_info; + if (!device->name) { printk("no name for device %llu, skip it now\n", device->devid); continue; @@ -2106,7 +2110,7 @@ static int open_seed_devices(struct btrfs_fs_info *fs_info, u8 *fsid) memcpy(fs_devices->fsid, fsid, BTRFS_FSID_SIZE); } - ret = btrfs_open_devices(fs_devices, O_RDONLY); + ret = btrfs_open_devices(fs_info, fs_devices, O_RDONLY); if (ret) goto out; diff --git a/kernel-shared/volumes.h b/kernel-shared/volumes.h index e1d7918dd30b..faaa285dbf11 100644 --- a/kernel-shared/volumes.h +++ b/kernel-shared/volumes.h @@ -28,6 +28,7 @@ struct btrfs_device { struct list_head dev_list; struct btrfs_root *dev_root; struct btrfs_fs_devices *fs_devices; + struct btrfs_fs_info *fs_info; u64 total_ios; @@ -282,8 +283,8 @@ int btrfs_alloc_chunk(struct btrfs_trans_handle *trans, u64 *num_bytes, u64 type); int btrfs_alloc_data_chunk(struct btrfs_trans_handle *trans, struct btrfs_fs_info *fs_info, u64 *start, u64 num_bytes); -int btrfs_open_devices(struct btrfs_fs_devices *fs_devices, - int flags); +int btrfs_open_devices(struct btrfs_fs_info *fs_info, + struct btrfs_fs_devices *fs_devices, int flags); int btrfs_close_devices(struct btrfs_fs_devices *fs_devices); void btrfs_close_all_devices(void); int btrfs_insert_dev_extent(struct btrfs_trans_handle *trans, From patchwork Mon Apr 26 06:27:19 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Naohiro Aota X-Patchwork-Id: 12223875 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.6 required=3.0 tests=BAYES_00,DKIM_INVALID, DKIM_SIGNED,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from 
mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 44C2CC43461 for ; Mon, 26 Apr 2021 06:28:14 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 1FB48611CC for ; Mon, 26 Apr 2021 06:28:14 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231903AbhDZG2y (ORCPT ); Mon, 26 Apr 2021 02:28:54 -0400 Received: from esa3.hgst.iphmx.com ([216.71.153.141]:41929 "EHLO esa3.hgst.iphmx.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231901AbhDZG2x (ORCPT ); Mon, 26 Apr 2021 02:28:53 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=wdc.com; i=@wdc.com; q=dns/txt; s=dkim.wdc.com; t=1619418492; x=1650954492; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=6P2gg3pGzirHJwFMY6j+77k45SzmvWkOmRc14aa3gh0=; b=OTa0RFQOhsxU+sJiaIXNoW7eOc1l+9cWvZSRVruiDIJmgy2PEAg5wTZD WT1oYWpG3wmsuH8cYR/rUIoOUq3t3Me2GZa06WG9zJbxDeoEU11Ts+Tkz rmw0NqJkVQZ1R/jEDfB5l/w0GHABuUSL8xL533aYU2n/h6FuRDQw/4XOP 9Xcc58+5zaEcQ6sXhkXLALqGXmqOj40/iTQ4SFkUPHLe5mL0NXnUjRmzr UVUmJMOL70WRAwYyHp/Y9hzKWnrZhib2Os74VPbwslsXBzdRyzozG1yC7 7fCDMAkSbXKBtjHFiW94ifkCn0N8gQQEeONRK3j983amIIU8RBInFdSkq A==; IronPort-SDR: T+/QzQGPVgjG3OVyT1Um8GmPrLCnxRrY88g8TTWbJN+drcp/at/HDIvuRVBbknsilaPZSLlYB/ icp7pmWK+Wstj+5MslGtR7V42AEAXu1gTvjxiFaVGtfrB+JoJdkbqJ8HDkU0nUhFC1CxbFdfLm jkP+fDABFdwBeozmzUN+vQBFaAr+TyTDRufT6Uz8eRwzJu4KaiBcY/yBvrW9f5EwmGBT90maWA Qlq/lGlW2LjyUU1KAyBuIMzgQIc4t1L/4Vvt9HFLGoCZHq+BWMFFaz9KonZ4RTwYSNJXYtc6vd WII= X-IronPort-AV: E=Sophos;i="5.82,251,1613404800"; d="scan'208";a="170788108" Received: from h199-255-45-14.hgst.com (HELO uls-op-cesaep01.wdc.com) ([199.255.45.14]) by ob1.hgst.iphmx.com with ESMTP; 26 Apr 2021 14:28:12 +0800 IronPort-SDR: qJfGv/f/LYDHkPG6Myr6JJ64+tNueoKiQZkpqGuTLmP1veXseTOq3UFZ7RPrJcl1oVQKlyiR8Q +K5HRCz6bzvwwI5DuXsVNzsYtBPSTWv9MRA/SR9ARSm/VORXIGUfNu+08BRdXaWA+LslGT7/HA 
hz6IK3ReOjJ442eoBI8jZ2izlxVwUwQbDx0hCurWDvhBzqaa4lPAXUwO9bNlFcJm4vJoM3vzlh zrYvUP/aoIWE5Myar/brC8whkN5+7709ghpq4y46wbxtcp+BaVT0StS3Q4DB8aLwWr+fxcRKnU aF9Bxyqp6M2FEFF+x18VqIOs Received: from uls-op-cesaip02.wdc.com ([10.248.3.37]) by uls-op-cesaep01.wdc.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 25 Apr 2021 23:08:36 -0700 IronPort-SDR: ZUcfqRKwpcHIGC7hEBHHsKa6HBbx2jogi16WWUWePptiJywDAX/XKV6Qi3GiHOox7+J6EY68yd 1uDVEIN9UseqJk6qsmWE3OTAoFhwsUxeWL6LwF5/6hoczeu/oVwhKPwBsI9YAToDNcgeanIzVg 9uVwQ+dqBFSjPG4N2elth1C7QhYvUI2hyyEf+6N2ZCv57XJwFrg/HKANphhS30EA6Cmnp5VF7/ guB0ZmX3v/U9j6yj2Bm64GuHTaTislzi6qXmue09jAeer1tw2uV32aG62/u1cUTPhcpCFULtyg lcQ= WDCIronportException: Internal Received: from bgy2573.ad.shared (HELO naota-xeon.wdc.com) ([10.225.48.58]) by uls-op-cesaip02.wdc.com with ESMTP; 25 Apr 2021 23:28:12 -0700 From: Naohiro Aota To: David Sterba Cc: linux-btrfs@vger.kernel.org, Josef Bacik , Naohiro Aota , Damien Le Moal Subject: [PATCH 03/26] btrfs-progs: build: zoned: Check zoned block device support Date: Mon, 26 Apr 2021 15:27:19 +0900 Message-Id: X-Mailer: git-send-email 2.31.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org If the kernel supports zoned block devices, the file /usr/include/linux/blkzoned.h will be present. Check for this header and define BTRFS_ZONED if it is present. If it is present, the ZONED feature is enabled by default; if not, it is disabled. 
Signed-off-by: Damien Le Moal Signed-off-by: Naohiro Aota --- configure.ac | 13 +++++++++++++ 1 file changed, 13 insertions(+) diff --git a/configure.ac b/configure.ac index 6ea29e0a5a06..5ad95d662b47 100644 --- a/configure.ac +++ b/configure.ac @@ -250,6 +250,18 @@ AX_CHECK_DEFINE([ext2fs/ext2_fs.h], [EXT4_EPOCH_MASK], [Define to 1 if e2fsprogs defines EXT4_EPOCH_MASK])], [AC_MSG_WARN([no definition of EXT4_EPOCH_MASK found, probably old e2fsprogs, no 64bit time precision of converted images])]) +AC_CHECK_HEADER(linux/blkzoned.h, [blkzoned_found=yes], [blkzoned_found=no]) +AC_ARG_ENABLE([zoned], + AS_HELP_STRING([--disable-zoned], [disable zoned block device support]), + [], [enable_zoned=$blkzoned_found] +) + +AS_IF([test "x$enable_zoned" = xyes], [ + AC_CHECK_HEADER(linux/blkzoned.h, [], + [AC_MSG_ERROR([Couldn't find linux/blkzoned.h])]) + AC_DEFINE([BTRFS_ZONED], [1], [enable zoned block device support]) +]) + dnl Define _LIBS= and _CFLAGS= by pkg-config dnl dnl The default PKG_CHECK_MODULES() action-if-not-found is end the @@ -367,6 +379,7 @@ AC_MSG_RESULT([ Python bindings: ${enable_python} Python interpreter: ${PYTHON} crypto provider: ${cryptoprovider} ${cryptoproviderversion} + zoned device: ${enable_zoned} Type 'make' to compile. 
]) From patchwork Mon Apr 26 06:27:20 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Naohiro Aota X-Patchwork-Id: 12223877 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.6 required=3.0 tests=BAYES_00,DKIM_INVALID, DKIM_SIGNED,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 613C8C433B4 for ; Mon, 26 Apr 2021 06:28:15 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 3C632611F0 for ; Mon, 26 Apr 2021 06:28:15 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231911AbhDZG2z (ORCPT ); Mon, 26 Apr 2021 02:28:55 -0400 Received: from esa3.hgst.iphmx.com ([216.71.153.141]:41929 "EHLO esa3.hgst.iphmx.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231901AbhDZG2y (ORCPT ); Mon, 26 Apr 2021 02:28:54 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=wdc.com; i=@wdc.com; q=dns/txt; s=dkim.wdc.com; t=1619418493; x=1650954493; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=EgvfzCaTLJz3+RGoOkKZ6Cb8r//5rm/8tKwkYtPqSfA=; b=M/b721q8HL3S73/Z3g5anW8cJRRhpSmNiE/ENkxtvRv8zF/zE09u522b q7/MxEtpXuWIHwIkBJ+hPjwKTaLe7VIWK5F5amfStZk2r4cOq8AuXJd9i wfjZ6/3EjvKQcB3esJnv8vBOhpXVwg03L0x+SpaepN7TYTp60gC+w/I9O /hD07kPSdB8NTHtmoawS2woJyOOSswOPgtI2IkSk6WNjXJmF2XCfHMIbM nPy8yNTEV5+G/Lywmve+/67Clc2WUkM56y8oRru3CrDUV1UA+GAoOxCwB 8ds/hmv9SrCZiSOCO9z4LJyoitNcKlnyoWZLw0+781W1DpRGviNPPR5TJ A==; IronPort-SDR: 4Cdqm7f15ude4XCEpNOW2GdmyDbbarvZNEZrOGOPbUSsxHH96/oNm+fIUTRy3wy9D9xl5Vi8U1 
QatUaks1PJVemfS7RVXw904dgjKSiNRnxFkUyjLoByeEtgKtBG4M7+gi3u53G6te4k8PkMXqFE hpfFB1mKuGGVvR1N7atBB2BnWxJ7pVmEPPaRkNcJYb/LlEss6Z2725WMS9Ad9JScMPPXNV8Fw4 xMbN+hD0D3l9Sma1jLKLp2L5cL0qF2I34wRt3YKsE0+1zPS1G7qoegEAlL37GFlB1wxDUb+RAD gwE= X-IronPort-AV: E=Sophos;i="5.82,251,1613404800"; d="scan'208";a="170788109" Received: from h199-255-45-14.hgst.com (HELO uls-op-cesaep01.wdc.com) ([199.255.45.14]) by ob1.hgst.iphmx.com with ESMTP; 26 Apr 2021 14:28:13 +0800 IronPort-SDR: WYZO6jMh2dYeCapfkorl5j9MA2rV8zImJFualnP2Qkd9+fXgCEnfTKQuEQzao7J5s3XffCzW9/ /8OKGZSNryK1s/iVIE9D1lhCg2k0DVs/htETaztjebDDUuoqb+OxEVXNxUVgRQQDyQTZ+SUBjw CmeShvBZ9Jx+2N/30IMDo5MBB3uQ/d6St8Nbq+1giMUnomMrvVQYNUukLB/oVY60ok+i/Jt+ng S42w45lP32VItBD+IetAZhMxLjs/tm8KrDq67Iv37/BGEK1Lhx4ySYUdlt6hWvfXwVlkIykTIT 1WW3HBHgYpwdYXPOrpcQWgsn Received: from uls-op-cesaip02.wdc.com ([10.248.3.37]) by uls-op-cesaep01.wdc.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 25 Apr 2021 23:08:37 -0700 IronPort-SDR: dcmMJdYBHLLmpmJfOnRTMGRUNPtg15T0VOJKLx2XgoHwMaqN9zzFQ6IBS0F6dhUErqR2vu6oCU FquMaCtF/4BFwVPevR0uFdEVhhHWuf4UMZXev5OmkGNDuSANyh+EhVQ7PTCRRqkwGj+cuaTE0r BQEDz7woRELAkVl6ieb65bceSRK7wdF2DR0CI6ri/HLO34mrB1yibCFa64IPfI1kX8hcII+Xj8 YNDeDtelxt5b01Ludg0C0Anq3Tig7LshcQyLg7ehUwe3uiC6uoR6JrAESeUaY7AbqK87ok6ZUw S1c= WDCIronportException: Internal Received: from bgy2573.ad.shared (HELO naota-xeon.wdc.com) ([10.225.48.58]) by uls-op-cesaip02.wdc.com with ESMTP; 25 Apr 2021 23:28:13 -0700 From: Naohiro Aota To: David Sterba Cc: linux-btrfs@vger.kernel.org, Josef Bacik , Naohiro Aota Subject: [PATCH 04/26] btrfs-progs: zoned: add new ZONED feature flag Date: Mon, 26 Apr 2021 15:27:20 +0900 Message-Id: X-Mailer: git-send-email 2.31.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org With the zoned feature enabled, a zoned block device-aware btrfs allocates block groups aligned to the device zones and always writes in sequential zones at the zone write pointer position. 
It also supports an "emulated" zoned mode on a non-zoned device. In emulated mode, btrfs emulates conventional zones by slicing the device into fixed-size zones. Conversion from an ext4 volume is not supported with the zoned feature because we can't be sure all the converted block groups are aligned to zone boundaries. Signed-off-by: Naohiro Aota --- common/fsfeatures.c | 8 ++++++++ common/fsfeatures.h | 3 ++- kernel-shared/ctree.h | 4 +++- kernel-shared/print-tree.c | 1 + 4 files changed, 14 insertions(+), 2 deletions(-) diff --git a/common/fsfeatures.c b/common/fsfeatures.c index 569208a9e5b1..c0793339b531 100644 --- a/common/fsfeatures.c +++ b/common/fsfeatures.c @@ -100,6 +100,14 @@ static const struct btrfs_feature mkfs_features[] = { NULL, 0, NULL, 0, "RAID1 with 3 or 4 copies" }, +#ifdef BTRFS_ZONED + { "zoned", BTRFS_FEATURE_INCOMPAT_ZONED, + "zoned", + NULL, 0, + NULL, 0, + NULL, 0, + "support Zoned devices" }, +#endif /* Keep this one last */ { "list-all", BTRFS_FEATURE_LIST_ALL, NULL } }; diff --git a/common/fsfeatures.h b/common/fsfeatures.h index 74ec2a21caf6..1a7d7f62897f 100644 --- a/common/fsfeatures.h +++ b/common/fsfeatures.h @@ -25,7 +25,8 @@ | BTRFS_FEATURE_INCOMPAT_SKINNY_METADATA) /* - * Avoid multi-device features (RAID56) and mixed block groups + * Avoid multi-device features (RAID56), mixed block groups, and zoned + * btrfs */ #define BTRFS_CONVERT_ALLOWED_FEATURES \ (BTRFS_FEATURE_INCOMPAT_MIXED_BACKREF \ diff --git a/kernel-shared/ctree.h b/kernel-shared/ctree.h index 7683b8bbf0b4..77a5ad488104 100644 --- a/kernel-shared/ctree.h +++ b/kernel-shared/ctree.h @@ -495,6 +495,7 @@ struct btrfs_super_block { #define BTRFS_FEATURE_INCOMPAT_NO_HOLES (1ULL << 9) #define BTRFS_FEATURE_INCOMPAT_METADATA_UUID (1ULL << 10) #define BTRFS_FEATURE_INCOMPAT_RAID1C34 (1ULL << 11) +#define BTRFS_FEATURE_INCOMPAT_ZONED (1ULL << 12) #define BTRFS_FEATURE_COMPAT_SUPP 0ULL @@ -519,7 +520,8 @@ struct btrfs_super_block { BTRFS_FEATURE_INCOMPAT_SKINNY_METADATA | \ 
BTRFS_FEATURE_INCOMPAT_NO_HOLES | \ BTRFS_FEATURE_INCOMPAT_RAID1C34 | \ - BTRFS_FEATURE_INCOMPAT_METADATA_UUID) + BTRFS_FEATURE_INCOMPAT_METADATA_UUID | \ + BTRFS_FEATURE_INCOMPAT_ZONED) /* * A leaf is full of items. offset and size tell us where to find diff --git a/kernel-shared/print-tree.c b/kernel-shared/print-tree.c index 92df05c15d68..76853aee8634 100644 --- a/kernel-shared/print-tree.c +++ b/kernel-shared/print-tree.c @@ -1614,6 +1614,7 @@ static struct readable_flag_entry incompat_flags_array[] = { DEF_INCOMPAT_FLAG_ENTRY(NO_HOLES), DEF_INCOMPAT_FLAG_ENTRY(METADATA_UUID), DEF_INCOMPAT_FLAG_ENTRY(RAID1C34), + DEF_INCOMPAT_FLAG_ENTRY(ZONED), }; static const int incompat_flags_num = sizeof(incompat_flags_array) / sizeof(struct readable_flag_entry); From patchwork Mon Apr 26 06:27:21 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Naohiro Aota X-Patchwork-Id: 12223879 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.6 required=3.0 tests=BAYES_00,DKIM_INVALID, DKIM_SIGNED,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 5B729C433B4 for ; Mon, 26 Apr 2021 06:28:18 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 1C26F60FE5 for ; Mon, 26 Apr 2021 06:28:18 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231933AbhDZG25 (ORCPT ); Mon, 26 Apr 2021 02:28:57 -0400 Received: from esa3.hgst.iphmx.com ([216.71.153.141]:41929 "EHLO esa3.hgst.iphmx.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231901AbhDZG24 (ORCPT ); Mon, 26 Apr 2021 
02:28:56 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=wdc.com; i=@wdc.com; q=dns/txt; s=dkim.wdc.com; t=1619418494; x=1650954494; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=3GvVN3giBmmzVscF+Pu5EJ7P3int0VFMPJX5o/133Uk=; b=NVVKk0cc+HB3CLAqXq70Z/a16bXP+sgufzdOrEgGGHqNV8mpmXQplWnP UjHKQY9CraUtUccum+YNx3Ze7AyFjP1hJ+ENMQb6VHrvYPvqmQBKcosWF 8yOfT0mOeQV5J6mHiaqnNiznBVS7twzJ4PrMTLDPxj+5cTivsHDDBz8bu +a2XYt2arhBjqTn2Tlj+7pxydCMPvv16eSEEFEcRMfVOuprBmNo19ge8c OlHv3+uTsIIfMfS4rr1DdKRwZoYITo0dzgEZWJQ6LtCDR1EZbndffrH8l DUjJA8p8zuuYqveMO/exdgiVgNHPA/p78dSjhHr/B2yUPmp3tRaGGcSou A==; IronPort-SDR: mipjMVZH9Qn8oLxZTmIPIhD7nxUZnDWm8X3KvN3N12NO51UahqRKPAzN8optQixRX+ftnbaCUt pGvXZJdlA6v2PfJxX7UjJ7mfadsv1sMflLOnFX/8ph+WwkEKGbN0ztgZRWnPoqnt59GVMubj57 t+/yZQJpLGbnbYbYD7VSuBVOurXeI6fki3n+YrkxRy4qUEn6hv8Afaq1eHtMOSBKehL8yih8g5 KVov05VlKcb/8JucS6Ol2axskHmJVYDGbGAq8jNiqmppVycEhSp9hbqo7DTV+cs7M+lRuGX6IU gn0= X-IronPort-AV: E=Sophos;i="5.82,251,1613404800"; d="scan'208";a="170788113" Received: from h199-255-45-14.hgst.com (HELO uls-op-cesaep01.wdc.com) ([199.255.45.14]) by ob1.hgst.iphmx.com with ESMTP; 26 Apr 2021 14:28:14 +0800 IronPort-SDR: L96OuES7zDvLl18uAH+3II8A3YcWF5TAjjoHn08Q5x5J/0GgJYJz/MvWTW6c165J1PKQ9LRlz9 F/cSbFyoOv5ICq083ZXDGG6C9i7ZQkl6fXjmJG60P2mpwR8YFo78PcY3fcfw9es2OtdCKFALuB iHKpkGyHI2YR5VfVKjJTnQ0agtLyixcE/Au1CLuANm7hdks1b+WxTHS/DhOseuQWOHZBD4yAdE JzXPDHhyTUXcwdp9DT465Ue1vNCKfoko4/Xy2VtulFiYepGcApMwFoNzNrOdVkV1lnKMWpVc3+ kDHPf9Ivl2VWXlcI6XXC97jS Received: from uls-op-cesaip02.wdc.com ([10.248.3.37]) by uls-op-cesaep01.wdc.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 25 Apr 2021 23:08:38 -0700 IronPort-SDR: 7QluCnrICdi0DUJVcVbJlKMPEUEIEKuV+FXqtojjDUsOdb8MQvUFw9gfkuaF25z2C4xlKlLGwR pUYIlTzKdMIPho54Sc5ujDfbUcFsdv65Ozof7zyhaaK603g7XJzJc6qb4z90lQMspK/9H/TiA6 NaXyjAewd189hT0hLmjDnzQxxR7s11PIrpenDoo5sSS7iv9GdhG7fAFK7n0W2599LRjJqpD96+ LRlpwrd742L18siJ5BbxoqBLIdbWEUJITuDq9VVh0LKl2tzdmjLRKTXU1m18boYXeV1mPiPqer 
oPk= WDCIronportException: Internal Received: from bgy2573.ad.shared (HELO naota-xeon.wdc.com) ([10.225.48.58]) by uls-op-cesaip02.wdc.com with ESMTP; 25 Apr 2021 23:28:14 -0700 From: Naohiro Aota To: David Sterba Cc: linux-btrfs@vger.kernel.org, Josef Bacik , Naohiro Aota Subject: [PATCH 05/26] btrfs-progs: zoned: get zone information of zoned block devices Date: Mon, 26 Apr 2021 15:27:21 +0900 Message-Id: <0c6581b81ebb7bc573c6c6c533c134b34b294dca.1619416549.git.naohiro.aota@wdc.com> X-Mailer: git-send-email 2.31.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org Get the zone information (number of zones and zone size) from all the devices if the volume contains a zoned block device. To avoid issuing costly zone report commands at run time to check the device zone types during block allocation, also record the status of every zone (zone type, write pointer position, etc.). Signed-off-by: Naohiro Aota --- Makefile | 2 +- common/device-scan.c | 2 + kerncompat.h | 4 + kernel-shared/disk-io.c | 12 ++ kernel-shared/volumes.c | 2 + kernel-shared/volumes.h | 2 + kernel-shared/zoned.c | 242 ++++++++++++++++++++++++++++++++++++++++ kernel-shared/zoned.h | 42 +++++++ 8 files changed, 307 insertions(+), 1 deletion(-) create mode 100644 kernel-shared/zoned.c create mode 100644 kernel-shared/zoned.h diff --git a/Makefile b/Makefile index e288a336c81e..3dc0543982b2 100644 --- a/Makefile +++ b/Makefile @@ -169,7 +169,7 @@ libbtrfs_objects = common/send-stream.o common/send-utils.o kernel-lib/rbtree.o kernel-shared/free-space-cache.o kernel-shared/root-tree.o \ kernel-shared/volumes.o kernel-shared/transaction.o \ kernel-shared/free-space-tree.o repair.o kernel-shared/inode-item.o \ - kernel-shared/file-item.o \ + kernel-shared/file-item.o kernel-shared/zoned.o \ kernel-lib/raid56.o kernel-lib/tables.o \ common/device-scan.o common/path-utils.o \ common/utils.o libbtrfsutil/subvolume.o libbtrfsutil/stubs.o \ diff --git 
a/common/device-scan.c b/common/device-scan.c index 01d2e0656583..74d7853afccb 100644 --- a/common/device-scan.c +++ b/common/device-scan.c @@ -35,6 +35,7 @@ #include "kernel-shared/ctree.h" #include "kernel-shared/volumes.h" #include "kernel-shared/disk-io.h" +#include "kernel-shared/zoned.h" #include "ioctl.h" static int btrfs_scan_done = 0; @@ -198,6 +199,7 @@ int btrfs_add_to_fsid(struct btrfs_trans_handle *trans, return 0; out: + free(device->zone_info); free(device); free(buf); return ret; diff --git a/kerncompat.h b/kerncompat.h index 7060326fe4f4..a39b79cba767 100644 --- a/kerncompat.h +++ b/kerncompat.h @@ -76,6 +76,10 @@ #define ULONG_MAX (~0UL) #endif +#ifndef SECTOR_SHIFT +#define SECTOR_SHIFT 9 +#endif + #define __token_glue(a,b,c) ___token_glue(a,b,c) #define ___token_glue(a,b,c) a ## b ## c #ifdef DEBUG_BUILD_CHECKS diff --git a/kernel-shared/disk-io.c b/kernel-shared/disk-io.c index a78be1e7a692..0519cb2358b5 100644 --- a/kernel-shared/disk-io.c +++ b/kernel-shared/disk-io.c @@ -29,6 +29,7 @@ #include "kernel-shared/disk-io.h" #include "kernel-shared/volumes.h" #include "kernel-shared/transaction.h" +#include "zoned.h" #include "crypto/crc32c.h" #include "common/utils.h" #include "kernel-shared/print-tree.h" @@ -1314,6 +1315,17 @@ static struct btrfs_fs_info *__open_ctree_fd(int fp, const char *path, if (!fs_info->chunk_root) return fs_info; + /* + * Get zone type information of zoned block devices. This will also + * handle emulation of a zoned filesystem if a regular device has the + * zoned incompat feature flag set. 
+ */ + ret = btrfs_get_dev_zone_info_all_devices(fs_info); + if (ret) { + error("zoned: failed to read device zone info: %d", ret); + goto out_chunk; + } + eb = fs_info->chunk_root->node; read_extent_buffer(eb, fs_info->chunk_tree_uuid, btrfs_header_chunk_tree_uuid(eb), diff --git a/kernel-shared/volumes.c b/kernel-shared/volumes.c index cbcf7bfa371d..63530a99b41c 100644 --- a/kernel-shared/volumes.c +++ b/kernel-shared/volumes.c @@ -27,6 +27,7 @@ #include "kernel-shared/transaction.h" #include "kernel-shared/print-tree.h" #include "kernel-shared/volumes.h" +#include "zoned.h" #include "common/utils.h" #include "kernel-lib/raid56.h" @@ -357,6 +358,7 @@ again: /* free the memory */ free(device->name); free(device->label); + free(device->zone_info); free(device); } diff --git a/kernel-shared/volumes.h b/kernel-shared/volumes.h index faaa285dbf11..a64288d566d8 100644 --- a/kernel-shared/volumes.h +++ b/kernel-shared/volumes.h @@ -45,6 +45,8 @@ struct btrfs_device { u64 generation; + struct btrfs_zoned_device_info *zone_info; + /* the internal btrfs device id */ u64 devid; diff --git a/kernel-shared/zoned.c b/kernel-shared/zoned.c new file mode 100644 index 000000000000..370d93915c6e --- /dev/null +++ b/kernel-shared/zoned.c @@ -0,0 +1,242 @@ +// SPDX-License-Identifier: GPL-2.0 + +#include +#include + +#include "kernel-lib/list.h" +#include "kernel-shared/volumes.h" +#include "kernel-shared/zoned.h" +#include "common/utils.h" +#include "common/device-utils.h" +#include "common/messages.h" +#include "mkfs/common.h" + +/* Maximum number of zones to report per ioctl(BLKREPORTZONE) call */ +#define BTRFS_REPORT_NR_ZONES 4096 + +static int btrfs_get_dev_zone_info(struct btrfs_device *device); + +enum btrfs_zoned_model zoned_model(const char *file) +{ + const char *host_aware = "host-aware"; + const char *host_managed = "host-managed"; + struct stat st; + char model[32]; + int ret; + + ret = stat(file, &st); + if (ret < 0) { + error("zoned: unable to stat %s", file); + 
return -ENOENT; + } + + /* Consider a regular file as a non-zoned device */ + if (!S_ISBLK(st.st_mode)) + return ZONED_NONE; + + ret = queue_param(file, "zoned", model, sizeof(model)); + if (ret <= 0) + return ZONED_NONE; + + if (strncmp(model, host_aware, strlen(host_aware)) == 0) + return ZONED_HOST_AWARE; + if (strncmp(model, host_managed, strlen(host_managed)) == 0) + return ZONED_HOST_MANAGED; + + return ZONED_NONE; +} + +u64 zone_size(const char *file) +{ + char chunk[32]; + int ret; + + ret = queue_param(file, "chunk_sectors", chunk, sizeof(chunk)); + if (ret <= 0) + return 0; + + return strtoull((const char *)chunk, NULL, 10) << SECTOR_SHIFT; +} + +#ifdef BTRFS_ZONED +static int report_zones(int fd, const char *file, + struct btrfs_zoned_device_info *zinfo) +{ + u64 device_size; + u64 zone_bytes = zone_size(file); + size_t rep_size; + u64 sector = 0; + struct blk_zone_report *rep; + struct blk_zone *zone; + unsigned int i, n = 0; + int ret; + + /* + * Zones are guaranteed (by the kernel) to be a power of 2 number of + * sectors. Check this here and make sure that zones are not too + * small. + */ + if (!zone_bytes || !is_power_of_2(zone_bytes)) { + error("zoned: illegal zone size %llu (not a power of 2)", + zone_bytes); + exit(1); + } + /* + * The zone size must be large enough to hold the initial system + * block group at mkfs time. + */ + if (zone_bytes < BTRFS_MKFS_SYSTEM_GROUP_SIZE) { + error("zoned: illegal zone size %llu (smaller than %d)", + zone_bytes, BTRFS_MKFS_SYSTEM_GROUP_SIZE); + exit(1); + } + + /* + * No need to use btrfs_device_size() here, since it is ensured + * that the file is a block device. 
+ */ + if (ioctl(fd, BLKGETSIZE64, &device_size) < 0) { + error("zoned: ioctl(BLKGETSIZE64) failed on %s (%m)", file); + exit(1); + } + + /* Allocate the zone information array */ + zinfo->zone_size = zone_bytes; + zinfo->nr_zones = device_size / zone_bytes; + if (device_size & (zone_bytes - 1)) + zinfo->nr_zones++; + zinfo->zones = calloc(zinfo->nr_zones, sizeof(struct blk_zone)); + if (!zinfo->zones) { + error("zoned: no memory for zone information"); + exit(1); + } + + /* Allocate a zone report */ + rep_size = sizeof(struct blk_zone_report) + + sizeof(struct blk_zone) * BTRFS_REPORT_NR_ZONES; + rep = malloc(rep_size); + if (!rep) { + error("zoned: no memory for zones report"); + exit(1); + } + + /* Get zone information */ + zone = (struct blk_zone *)(rep + 1); + while (n < zinfo->nr_zones) { + memset(rep, 0, rep_size); + rep->sector = sector; + rep->nr_zones = BTRFS_REPORT_NR_ZONES; + + ret = ioctl(fd, BLKREPORTZONE, rep); + if (ret != 0) { + error("zoned: ioctl BLKREPORTZONE failed (%m)"); + exit(1); + } + + if (!rep->nr_zones) + break; + + for (i = 0; i < rep->nr_zones; i++) { + if (n >= zinfo->nr_zones) + break; + memcpy(&zinfo->zones[n], &zone[i], + sizeof(struct blk_zone)); + n++; + } + + sector = zone[rep->nr_zones - 1].start + + zone[rep->nr_zones - 1].len; + } + + free(rep); + + return 0; +} + +#endif + +int btrfs_get_dev_zone_info_all_devices(struct btrfs_fs_info *fs_info) +{ + struct btrfs_fs_devices *fs_devices = fs_info->fs_devices; + struct btrfs_device *device; + int ret = 0; + + /* fs_info->zone_size might not be set yet. Use the incompat flag here. 
*/ + if (!btrfs_fs_incompat(fs_info, ZONED)) + return 0; + + list_for_each_entry(device, &fs_devices->devices, dev_list) { + /* We can skip reading of zone info for missing devices */ + if (device->fd == -1) + continue; + + ret = btrfs_get_dev_zone_info(device); + if (ret) + break; + } + + return ret; +} + +static int btrfs_get_dev_zone_info(struct btrfs_device *device) +{ + struct btrfs_fs_info *fs_info = device->fs_info; + + /* + * Cannot use btrfs_is_zoned here, since fs_info::zone_size might not + * yet be set. + */ + if (!btrfs_fs_incompat(fs_info, ZONED)) + return 0; + + if (device->zone_info) + return 0; + + return btrfs_get_zone_info(device->fd, device->name, + &device->zone_info); +} + +int btrfs_get_zone_info(int fd, const char *file, + struct btrfs_zoned_device_info **zinfo_ret) +{ +#ifdef BTRFS_ZONED + struct btrfs_zoned_device_info *zinfo; + int ret; +#endif + enum btrfs_zoned_model model; + + *zinfo_ret = NULL; + + /* Check zone model */ + model = zoned_model(file); + if (model == ZONED_NONE) + return 0; + +#ifdef BTRFS_ZONED + zinfo = calloc(1, sizeof(*zinfo)); + if (!zinfo) { + error("zoned: no memory for zone information"); + exit(1); + } + + zinfo->model = model; + + /* Get zone information */ + ret = report_zones(fd, file, zinfo); + if (ret != 0) { + kfree(zinfo); + return ret; + } + *zinfo_ret = zinfo; +#else + error("zoned: %s: Unsupported host-%s zoned block device", file, + model == ZONED_HOST_MANAGED ? 
"managed" : "aware"); + if (model == ZONED_HOST_MANAGED) + return -EOPNOTSUPP; + + error("zoned: %s: handling host-aware block device as a regular disk", + file); +#endif + + return 0; +} diff --git a/kernel-shared/zoned.h b/kernel-shared/zoned.h new file mode 100644 index 000000000000..461a2d624c67 --- /dev/null +++ b/kernel-shared/zoned.h @@ -0,0 +1,42 @@ +/* SPDX-License-Identifier: GPL-2.0 */ + +#ifndef __BTRFS_ZONED_H__ +#define __BTRFS_ZONED_H__ + +#include +#include "kerncompat.h" + +#ifdef BTRFS_ZONED +#include +#else +struct blk_zone { + int dummy; +}; +#endif /* BTRFS_ZONED */ + +/* + * Zoned block device models. + */ +enum btrfs_zoned_model { + ZONED_NONE = 0, + ZONED_HOST_AWARE, + ZONED_HOST_MANAGED, +}; + +/* + * Zone information for a zoned block device. + */ +struct btrfs_zoned_device_info { + enum btrfs_zoned_model model; + u64 zone_size; + u32 nr_zones; + struct blk_zone *zones; +}; + +enum btrfs_zoned_model zoned_model(const char *file); +u64 zone_size(const char *file); +int btrfs_get_zone_info(int fd, const char *file, + struct btrfs_zoned_device_info **zinfo); +int btrfs_get_dev_zone_info_all_devices(struct btrfs_fs_info *fs_info); + +#endif /* __BTRFS_ZONED_H__ */ From patchwork Mon Apr 26 06:27:22 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Naohiro Aota X-Patchwork-Id: 12223881 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.6 required=3.0 tests=BAYES_00,DKIM_INVALID, DKIM_SIGNED,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id CED7CC43460 for ; Mon, 26 Apr 2021 06:28:18 +0000 (UTC) Received: from vger.kernel.org 
(vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id AC67B61153 for ; Mon, 26 Apr 2021 06:28:18 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231938AbhDZG26 (ORCPT ); Mon, 26 Apr 2021 02:28:58 -0400 Received: from esa3.hgst.iphmx.com ([216.71.153.141]:41929 "EHLO esa3.hgst.iphmx.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231926AbhDZG25 (ORCPT ); Mon, 26 Apr 2021 02:28:57 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=wdc.com; i=@wdc.com; q=dns/txt; s=dkim.wdc.com; t=1619418495; x=1650954495; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=tLxYRaXGwiYIYF/mc8ILCrrBkTsc5FRTTCmIjbckHFM=; b=PygXUXwuHtRvonVRpPAOW807m0RzQ3BPu3G73cWRZpBcvd+KAmCnqlD1 DgXiD508Fsl/ytTPYG/F0S/GijCZiB1pu6gPB7dmgiXVqqFNflsKtphPS Q1RHv7PfcAsB+uQOA/5WXOsrjrWR6Ynw1yg8Tus4KErpFmGx5pHvU7+YC 2FnhOQem4wWXaMe2w7lcRdpy2wvK6PQjd5jZEgLNw/aR/VvYKR4ZAQK41 83wbpjMIKOHXK9ztfBxWsN7VwTAJOP+SpkIHUkRrTeTSJLMkLcDIZWMEL qDRLU9Kt+PS1ol8Gnl1+FN0mZhlOzcWTKQT8huWasSykCrTY9BZ16dHSG w==; IronPort-SDR: iz9/JN4pyo+sx//hMGM3tzJWS1XixjmsxJ92lUlX2+B8u8l0XZ9pi2ljZMCArIgzkSbMPX+lwy T3AGRK6bm2nSB4HlHtBmYaOkuNAwS5W7PF7cnkiaUL6a3hyZQbFgAeX/dsWQkCvfSzMBUvqLB2 tJHmFf3wzQNLyvOaLVz0b953nWkWwONeej/M0FBEhLH9+G8h4M6plg/Grqht7QBVeTxcdrHXLT 36G2lWpExfD0p18mL7oBMEN1cYoBYr2be9iaa92CbWxIl7laaMNAxzcBqlradOocqXDzzhJJWS 2j0= X-IronPort-AV: E=Sophos;i="5.82,251,1613404800"; d="scan'208";a="170788114" Received: from h199-255-45-14.hgst.com (HELO uls-op-cesaep01.wdc.com) ([199.255.45.14]) by ob1.hgst.iphmx.com with ESMTP; 26 Apr 2021 14:28:15 +0800 IronPort-SDR: Jp68yTmtc2pqlgVELWunkiVyM4YkJgAFUj/0ILNTcDpSeeFuXio8O1atWdrzPb2YH4D1IIHafC 3BYqyb6ogjpen0CQl1TGwFTG+RLCVv6njbCuigbaRapRIk42377aVec0F7mFit9ajJHnSiOi6K aTJz7c1b42Kynn8j4GU0x1sb+9udOWveTUZqJd3tqAnsBO9EWFOAhSKZKPDGglZt9Yd9DlT2VQ mZ2dzSCjlFqtY/MET1Vlu9dxc9Rpykwu/1Edf8sYfRdbGf+kLOFTudmvHlX+k3ak/m3oNBTwlJ xY4hbCsmeaJiFczrY6Y7QR2B Received: from 
uls-op-cesaip02.wdc.com ([10.248.3.37]) by uls-op-cesaep01.wdc.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 25 Apr 2021 23:08:39 -0700 IronPort-SDR: OicOyqO345dlBQKyhZ3ZOP68RC4O1HuUgNUvilTriyVOft7LhJkmiVBwHSnqPzAHnC/AmUcqTQ 13NP89VG8FX1tKpXg+07oaAMaq0uFjgnyo34gIDHt3++WpZwK0nP0TZX6okCBvwIGy770T/V6k vzzrvsi00xCscnPDS2DNYxBrpKEtIEtEofCSE5vBFbvPxB3s+ISu9hcLAJHXIe8y4R7VOmids1 u/x6w7U5Aq1i14EOfjOhIoerkrb9FMvQxPjfOpwpk53uAa34FVB5OJ9NpRXl2X7xbshL+7lbwW Vrw= WDCIronportException: Internal Received: from bgy2573.ad.shared (HELO naota-xeon.wdc.com) ([10.225.48.58]) by uls-op-cesaip02.wdc.com with ESMTP; 25 Apr 2021 23:28:15 -0700 From: Naohiro Aota To: David Sterba Cc: linux-btrfs@vger.kernel.org, Josef Bacik , Naohiro Aota Subject: [PATCH 06/26] btrfs-progs: zoned: check and enable ZONED mode Date: Mon, 26 Apr 2021 15:27:22 +0900 Message-Id: X-Mailer: git-send-email 2.31.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org Introduce function btrfs_check_zoned_mode() to check if ZONED flag is enabled on the file system and if the file system consists of zoned devices with equal zone size. Signed-off-by: Naohiro Aota Reviewed-by: Johannes Thumshirn --- kernel-shared/ctree.h | 14 +++++++ kernel-shared/disk-io.c | 6 +++ kernel-shared/zoned.c | 85 +++++++++++++++++++++++++++++++++++++++++ kernel-shared/zoned.h | 1 + 4 files changed, 106 insertions(+) diff --git a/kernel-shared/ctree.h b/kernel-shared/ctree.h index 77a5ad488104..aab631a44785 100644 --- a/kernel-shared/ctree.h +++ b/kernel-shared/ctree.h @@ -1213,8 +1213,22 @@ struct btrfs_fs_info { u32 nodesize; u32 sectorsize; u32 stripesize; + + /* + * Zone size > 0 when in ZONED mode, otherwise it's used for a check + * if the mode is enabled + */ + union { + u64 zone_size; + u64 zoned; + }; }; +static inline bool btrfs_is_zoned(const struct btrfs_fs_info *fs_info) +{ + return fs_info->zoned != 0; +} + /* * in ram representation of the tree. 
extent_root is used for all allocations * and for the extent tree extent_root root. diff --git a/kernel-shared/disk-io.c b/kernel-shared/disk-io.c index 0519cb2358b5..4aba237f5a5c 100644 --- a/kernel-shared/disk-io.c +++ b/kernel-shared/disk-io.c @@ -1326,6 +1326,12 @@ static struct btrfs_fs_info *__open_ctree_fd(int fp, const char *path, goto out_chunk; } + ret = btrfs_check_zoned_mode(fs_info); + if (ret) { + error("zoned: failed to initialize zoned mode: %d", ret); + goto out_chunk; + } + eb = fs_info->chunk_root->node; read_extent_buffer(eb, fs_info->chunk_tree_uuid, btrfs_header_chunk_tree_uuid(eb), diff --git a/kernel-shared/zoned.c b/kernel-shared/zoned.c index 370d93915c6e..7cb5262ba481 100644 --- a/kernel-shared/zoned.c +++ b/kernel-shared/zoned.c @@ -240,3 +240,88 @@ int btrfs_get_zone_info(int fd, const char *file, return 0; } + +int btrfs_check_zoned_mode(struct btrfs_fs_info *fs_info) +{ + struct btrfs_fs_devices *fs_devices = fs_info->fs_devices; + struct btrfs_device *device; + u64 zoned_devices = 0; + u64 nr_devices = 0; + u64 zone_size = 0; + const bool incompat_zoned = btrfs_fs_incompat(fs_info, ZONED); + int ret = 0; + + /* Count zoned devices */ + list_for_each_entry(device, &fs_devices->devices, dev_list) { + enum btrfs_zoned_model model; + + if (device->fd == -1) + continue; + + model = zoned_model(device->name); + /* + * A Host-Managed zoned device must be used as a zoned device. + * A Host-Aware zoned device or a non-zoned device can be + * treated as a zoned device if the ZONED flag is enabled in the + * superblock. 
+ */ + if (model == ZONED_HOST_MANAGED || + (model == ZONED_HOST_AWARE && incompat_zoned) || + (model == ZONED_NONE && incompat_zoned)) { + struct btrfs_zoned_device_info *zone_info = + device->zone_info; + + zoned_devices++; + if (!zone_size) { + zone_size = zone_info->zone_size; + } else if (zone_info->zone_size != zone_size) { + error( + "zoned: unequal block device zone sizes: have %llu found %llu", + device->zone_info->zone_size, + zone_size); + ret = -EINVAL; + goto out; + } + } + nr_devices++; + } + + if (!zoned_devices && !incompat_zoned) + goto out; + + if (!zoned_devices && incompat_zoned) { + /* No zoned block device found on ZONED filesystem */ + error("zoned: no zoned devices found on a zoned filesystem"); + ret = -EINVAL; + goto out; + } + + if (zoned_devices && !incompat_zoned) { + error("zoned: mode not enabled but zoned device found"); + ret = -EINVAL; + goto out; + } + + if (zoned_devices != nr_devices) { + error("zoned: cannot mix zoned and regular devices"); + ret = -EINVAL; + goto out; + } + + /* + * stripe_size is always aligned to BTRFS_STRIPE_LEN in + * __btrfs_alloc_chunk(). Since we want stripe_len == zone_size, + * check the alignment here. 
+ */ + if (!IS_ALIGNED(zone_size, BTRFS_STRIPE_LEN)) { + error("zoned: zone size %llu not aligned to stripe %u", + zone_size, BTRFS_STRIPE_LEN); + ret = -EINVAL; + goto out; + } + + fs_info->zone_size = zone_size; + +out: + return ret; +} diff --git a/kernel-shared/zoned.h b/kernel-shared/zoned.h index 461a2d624c67..a6134babdf41 100644 --- a/kernel-shared/zoned.h +++ b/kernel-shared/zoned.h @@ -38,5 +38,6 @@ u64 zone_size(const char *file); int btrfs_get_zone_info(int fd, const char *file, struct btrfs_zoned_device_info **zinfo); int btrfs_get_dev_zone_info_all_devices(struct btrfs_fs_info *fs_info); +int btrfs_check_zoned_mode(struct btrfs_fs_info *fs_info); #endif /* __BTRFS_ZONED_H__ */ From patchwork Mon Apr 26 06:27:23 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Naohiro Aota X-Patchwork-Id: 12223885 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.6 required=3.0 tests=BAYES_00,DKIM_INVALID, DKIM_SIGNED,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0A820C433ED for ; Mon, 26 Apr 2021 06:28:19 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id E47F160FE5 for ; Mon, 26 Apr 2021 06:28:18 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231944AbhDZG27 (ORCPT ); Mon, 26 Apr 2021 02:28:59 -0400 Received: from esa3.hgst.iphmx.com ([216.71.153.141]:41929 "EHLO esa3.hgst.iphmx.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231937AbhDZG26 (ORCPT ); Mon, 26 Apr 2021 02:28:58 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; 
d=wdc.com; i=@wdc.com; q=dns/txt; s=dkim.wdc.com; t=1619418496; x=1650954496; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=tUSU1R6uYdSefDdZrLuIvSJhhlRTRNNwQAAC8iniTis=; b=a+NpKD0YlJKhvoWub1x5/CMF3t434s8yEmOqed/BMEQ3ofwlP1ihRHyT 0U7MqOFurMWbDSbmu5hwFlYDOhZkE8C0Iynpoi4eAycyDqXWfZn0jm3A0 vVFnMfY+r+ENZNr9EAC2V3dGegqsFfYMlKFQ5tNauMz0uFmxwUpqSa2Ie 5iJNfQ5onF8ltYsyQMShiPfNneFTAZDaeCrTn7X21YwjvO4Wc40uKO9QW jzAVyZcYE/tCdH32+StHem//AJnYtd4o39a0cJrGEIAHQQmmJpF9lY8xp TYpUr1xIUvaoOjTalw9x5/5Jyi5mNM9jofv1qD+NnWDMuXLGiZT1xmM8h w==; IronPort-SDR: KzJ3k/ZeoG1NXE2RfBjVLDVymh4jY+dNMslXE/OkmQYS1TwcuvzUsUn4177yy2+1HZE2Mw+Ile g6kSsdCAU7t1jdcOR/dG3vmU+yYm3mpX2iUNMn18ylYUH/E/JExGUK3avwvazMZt6FDiw5ncmK 1kt/PRNK0n4NhKjEKp3bHPm2kxRXSjLIrvWBYb7Baq7RfEM1OieGV5XIoyOahjGi5MgvEm5FfR LfZGCP3SvmFFJRccWmpQIvuVsJeeEOk4aZgyTrVOCYvkgVvLCt3SzNAOGwW117/RC5I/Pwes5b b9M= X-IronPort-AV: E=Sophos;i="5.82,251,1613404800"; d="scan'208";a="170788115" Received: from h199-255-45-14.hgst.com (HELO uls-op-cesaep01.wdc.com) ([199.255.45.14]) by ob1.hgst.iphmx.com with ESMTP; 26 Apr 2021 14:28:16 +0800 IronPort-SDR: w5QMKhHu8H4n+Q2tFI9sPFoY3Zgze+AtwKFWQvmcUgkQEotjzSCqILiqRA4dHrwWM+7Er8Hy2U WOoF6lCAyGmngyA+JZzseNnRA0xO+YrZB5feqRUUVq0PaXXA9fLPH7M7G5KKRi4DoqOY40EnQe Ug0kDy51AKjqILBkCySYZ2alG3HV/T5n9/v+Hpb+leKODg7p0d/I7uppxVPd/JeHdfZ9hbPOni QNs3NLssUa5+MDMrLaHCi416YTXmzPNN9r19Bhg2LCSDMD1w6RKEmdeSChw5m8cH2M4jADzQdf qdk8CQez0rLvX6k0F9v0rLcZ Received: from uls-op-cesaip02.wdc.com ([10.248.3.37]) by uls-op-cesaep01.wdc.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 25 Apr 2021 23:08:41 -0700 IronPort-SDR: VGxuJn/PUjbHuissdAbt6FNywGtxfHPHOVezfbjtV6ecy+llcSQMcuXf80/G7V84A6giqs2raj R5P0yPOy37LpgixOLW6C9RtvmpU5RybLPV9fQmeCugEydAiEuu++mlwxq20GyKwdfvJ+r5jDPj O507BiWNRq4Hd23UczwkrDf8KH8UDVUVBycL5NYdQi7aV5qiZVMNjRoKVHFhJEMyrX2v3Fn/+/ yZSxTYIo9XRyjZvGgF2guxL/ZnVUcHNzXPkvhte85qPlNGEVtE6T7xUH5fBXiBachJWR5MDXVE mLA= WDCIronportException: Internal Received: from 
bgy2573.ad.shared (HELO naota-xeon.wdc.com) ([10.225.48.58]) by uls-op-cesaip02.wdc.com with ESMTP; 25 Apr 2021 23:28:16 -0700 From: Naohiro Aota To: David Sterba Cc: linux-btrfs@vger.kernel.org, Josef Bacik , Naohiro Aota Subject: [PATCH 07/26] btrfs-progs: zoned: introduce max_zone_append_size Date: Mon, 26 Apr 2021 15:27:23 +0900 Message-Id: <1023703371dbf8239f11f7fe8061b26dde480511.1619416549.git.naohiro.aota@wdc.com> X-Mailer: git-send-email 2.31.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org The zone append write command limits the maximum I/O size it accepts. This is because a zone append write command cannot be split: we ask the device to place the data into a specific target zone, and the device responds with the location where the data was actually written. Introduce max_zone_append_size in zone_info and fs_info to track this value, so that every I/O issued to a zoned block device with the zone append command stays within the device's limit. The zone append command is mandatory for zoned btrfs, so reject a zoned device with max_zone_append_size == 0. 
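As a rough illustration of the rule described above (a sketch only, not the code added by this patch): the filesystem-wide limit can be thought of as the smallest non-zero per-device limit, and a zoned device reporting 0 is rejected. The names struct dev_limit and fs_max_zone_append_size() are hypothetical and exist only for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-device record, used only in this sketch. */
struct dev_limit {
	int zoned;                      /* non-zero for a zoned device */
	uint64_t max_zone_append_size;  /* 0 means ZONE_APPEND is unsupported */
};

/*
 * Return 0 and store the smallest non-zero limit in *out, or return -1
 * if any zoned device does not support zone append at all.
 */
static int fs_max_zone_append_size(const struct dev_limit *devs, size_t n,
				   uint64_t *out)
{
	uint64_t limit = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		uint64_t cur = devs[i].max_zone_append_size;

		/* Zone append is mandatory on zoned devices. */
		if (devs[i].zoned && cur == 0)
			return -1;
		/* Track the minimum over all devices that report a limit. */
		if (cur && (!limit || cur < limit))
			limit = cur;
	}
	*out = limit;
	return 0;
}
```

Taking the minimum means a single write sized to the filesystem-wide value is acceptable to every device, so it never has to be split.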
Signed-off-by: Naohiro Aota Reviewed-by: Johannes Thumshirn --- kernel-shared/ctree.h | 3 +++ kernel-shared/zoned.c | 29 +++++++++++++++++++++++++++++ kernel-shared/zoned.h | 1 + 3 files changed, 33 insertions(+) diff --git a/kernel-shared/ctree.h b/kernel-shared/ctree.h index aab631a44785..5023db474784 100644 --- a/kernel-shared/ctree.h +++ b/kernel-shared/ctree.h @@ -1222,6 +1222,9 @@ struct btrfs_fs_info { u64 zone_size; u64 zoned; }; + + /* Max size to emit ZONE_APPEND write command */ + u64 max_zone_append_size; }; static inline bool btrfs_is_zoned(const struct btrfs_fs_info *fs_info) diff --git a/kernel-shared/zoned.c b/kernel-shared/zoned.c index 7cb5262ba481..ee879a57b716 100644 --- a/kernel-shared/zoned.c +++ b/kernel-shared/zoned.c @@ -58,6 +58,18 @@ u64 zone_size(const char *file) return strtoull((const char *)chunk, NULL, 10) << SECTOR_SHIFT; } +u64 max_zone_append_size(const char *file) +{ + char chunk[32]; + int ret; + + ret = queue_param(file, "zone_append_max_bytes", chunk, sizeof(chunk)); + if (ret <= 0) + return 0; + + return strtoull((const char *)chunk, NULL, 10); +} + #ifdef BTRFS_ZONED static int report_zones(int fd, const char *file, struct btrfs_zoned_device_info *zinfo) @@ -102,9 +114,19 @@ static int report_zones(int fd, const char *file, /* Allocate the zone information array */ zinfo->zone_size = zone_bytes; + zinfo->max_zone_append_size = max_zone_append_size(file); zinfo->nr_zones = device_size / zone_bytes; if (device_size & (zone_bytes - 1)) zinfo->nr_zones++; + + if (zoned_model(file) != ZONED_NONE && + zinfo->max_zone_append_size == 0) { + error( + "zoned: zoned device %s does not support ZONE_APPEND command", + file); + exit(1); + } + zinfo->zones = calloc(zinfo->nr_zones, sizeof(struct blk_zone)); if (!zinfo->zones) { error("zoned: no memory for zone information"); @@ -248,6 +270,7 @@ int btrfs_check_zoned_mode(struct btrfs_fs_info *fs_info) u64 zoned_devices = 0; u64 nr_devices = 0; u64 zone_size = 0; + u64 max_zone_append_size 
= 0; const bool incompat_zoned = btrfs_fs_incompat(fs_info, ZONED); int ret = 0; @@ -282,6 +305,11 @@ int btrfs_check_zoned_mode(struct btrfs_fs_info *fs_info) ret = -EINVAL; goto out; } + if (!max_zone_append_size || + (zone_info->max_zone_append_size && + zone_info->max_zone_append_size < max_zone_append_size)) + max_zone_append_size = + zone_info->max_zone_append_size; } nr_devices++; } @@ -321,6 +349,7 @@ int btrfs_check_zoned_mode(struct btrfs_fs_info *fs_info) } fs_info->zone_size = zone_size; + fs_info->max_zone_append_size = max_zone_append_size; out: return ret; diff --git a/kernel-shared/zoned.h b/kernel-shared/zoned.h index a6134babdf41..fcf2ccf34f26 100644 --- a/kernel-shared/zoned.h +++ b/kernel-shared/zoned.h @@ -29,6 +29,7 @@ enum btrfs_zoned_model { struct btrfs_zoned_device_info { enum btrfs_zoned_model model; u64 zone_size; + u64 max_zone_append_size; u32 nr_zones; struct blk_zone *zones; }; From patchwork Mon Apr 26 06:27:24 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Naohiro Aota X-Patchwork-Id: 12223883 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.6 required=3.0 tests=BAYES_00,DKIM_INVALID, DKIM_SIGNED,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id C3FF1C43461 for ; Mon, 26 Apr 2021 06:28:19 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id AA1DF611CC for ; Mon, 26 Apr 2021 06:28:19 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231954AbhDZG27 (ORCPT ); Mon, 26 Apr 2021 02:28:59 -0400 Received: from 
esa3.hgst.iphmx.com ([216.71.153.141]:41929 "EHLO esa3.hgst.iphmx.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231937AbhDZG27 (ORCPT ); Mon, 26 Apr 2021 02:28:59 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=wdc.com; i=@wdc.com; q=dns/txt; s=dkim.wdc.com; t=1619418497; x=1650954497; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=LET1yipfnjktN53BvpBod2WBsEjIhj9flXDrJ70sQSA=; b=noF39aGWXnXj5q8azfvioTFRFjntF8cKl5o+N0P7R0cXsIjVprWAW0kr Igrsxd3HvDFhWU41vsfrAB0USZ0sNjmabFzNHXV3xipqstT/02JqZk5zT QNSuucCzJuGoPQTQXxf17dMvP8YMM8qqKs6qeSr7pZWE+H/aCxdRw48Gu VolyeX1gRlKzlgwLI7yDLKk9Jr01bVb4bidpLkED5KMS48YuLrnz4ovmg kqEXUaI4fiHso/pNifQg+8Z7J4ELxuFJ+0UVWVoxODJ+iCR906d2RV/oX m4MD2FEoS5PL6ejn8dQ5wcU8tjBYATeR+VxTGPjdBlAzetWKeZe2B3Xi3 g==; IronPort-SDR: rs+S6y7h/+l/qmAau1Sok2yQL1MhRXcNy1zzybb+DmpkIVXmZyd82s6Jhj1FaJEeY6d4pzLAtD ugtsF5SO8LxF6d88Jv4Cz50Nfb3OMjDbyl+j+SK84mwFeA7DYVMOvt3NjMD6doFDt+7U8Ak5rt wibJy2qMs6/PFMuBTpWR3rPJuMn9SSV7t5s96HT3LbeLB+X4QVRjd7JUOtX1//v+HTGlg7NjFA Vw1B9dEaHaGBc0DvAuBm3bb7bas+hAKTKaCJLdCcG9X9IPpiA5HUxuHZrDtRhgOOQT0us8EFf0 Qmo= X-IronPort-AV: E=Sophos;i="5.82,251,1613404800"; d="scan'208";a="170788119" Received: from h199-255-45-14.hgst.com (HELO uls-op-cesaep01.wdc.com) ([199.255.45.14]) by ob1.hgst.iphmx.com with ESMTP; 26 Apr 2021 14:28:17 +0800 IronPort-SDR: bi5mC04ibWAQS7aymLKz9kJb3UaSnPyFJbOgIvpKeVDbXBrAdc4b7mc51YZsaMySsVE/jI+qVl b5LI5gjZr7F6/prHDo7SJHE7NO4kW6NPikxKcdEca6DVfqARDh3WzE8cq+Q4FMeuZR+NK2bboq F2hoaKX5m3T00PZZ0fCx5ixEcU0sdQ5txiWJCQY7Ld7ngLCThIIqIUb9xAEpvtnLO5M1FnnudG TpIjgFbF7emfQPa9h0XylCKlKW1G118wOT2vO+kLdnC32a/fnr7241vn9ZQbeRffchza9udoBS EkdWkzTzxGql1u1f+zzFKfrV Received: from uls-op-cesaip02.wdc.com ([10.248.3.37]) by uls-op-cesaep01.wdc.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 25 Apr 2021 23:08:42 -0700 IronPort-SDR: DoUX66YQZY/VWS22S0RDydYmzNLjMtv0S5aaN9ZkGfWQUfDjqcmmTNJX+M0i0JJDWGdKxOxIA3 
974//xdCJUnriTa/+kQ8QToufzOQGKTbveGOvMYdlaN4v7WekFHlX4kfHhcNt/U2YwwTBoAAkt xsmWskrw9xU++yhr6qE3kSMHIUTgHch016TQw20bbsR9pSmSn2aJMwKUiUKBc+DhOW8tVeGCCc hOn0d8hFJUoU9aXuJAx0nV2XRkdAq4VTF1Fomx9R1MInjtgwyFETf6Lick8uyBwlgkUmg9unHn Xrw= WDCIronportException: Internal Received: from bgy2573.ad.shared (HELO naota-xeon.wdc.com) ([10.225.48.58]) by uls-op-cesaip02.wdc.com with ESMTP; 25 Apr 2021 23:28:17 -0700 From: Naohiro Aota To: David Sterba Cc: linux-btrfs@vger.kernel.org, Josef Bacik , Naohiro Aota Subject: [PATCH 08/26] btrfs-progs: zoned: disallow mixed-bg in ZONED mode Date: Mon, 26 Apr 2021 15:27:24 +0900 Message-Id: <5fcf22f807b9de547010cee4b211348f9fe014bf.1619416549.git.naohiro.aota@wdc.com> X-Mailer: git-send-email 2.31.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org Placing both data and metadata in a block group is impossible in ZONED mode. For data, we can allocate a space for it and write it immediately after the allocation. For metadata, however, we cannot do that, because the logical addresses are recorded in other metadata buffers to build up the trees. As a result, a data buffer can be placed after a metadata buffer, which is not written yet. Writing out the data buffer will break the sequential write rule. Check and disallow MIXED_BG with ZONED mode. 
Signed-off-by: Naohiro Aota Reviewed-by: Johannes Thumshirn --- kernel-shared/zoned.c | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/kernel-shared/zoned.c b/kernel-shared/zoned.c index ee879a57b716..7b05fe6cc70f 100644 --- a/kernel-shared/zoned.c +++ b/kernel-shared/zoned.c @@ -348,6 +348,12 @@ int btrfs_check_zoned_mode(struct btrfs_fs_info *fs_info) goto out; } + if (btrfs_fs_incompat(fs_info, MIXED_GROUPS)) { + error("zoned: mixed block groups not supported"); + ret = -EINVAL; + goto out; + } + fs_info->zone_size = zone_size; fs_info->max_zone_append_size = max_zone_append_size; From patchwork Mon Apr 26 06:27:25 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Naohiro Aota X-Patchwork-Id: 12223887 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.6 required=3.0 tests=BAYES_00,DKIM_INVALID, DKIM_SIGNED,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 17DFEC433B4 for ; Mon, 26 Apr 2021 06:28:21 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id EC45F611CC for ; Mon, 26 Apr 2021 06:28:20 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231960AbhDZG3B (ORCPT ); Mon, 26 Apr 2021 02:29:01 -0400 Received: from esa3.hgst.iphmx.com ([216.71.153.141]:41929 "EHLO esa3.hgst.iphmx.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231937AbhDZG3A (ORCPT ); Mon, 26 Apr 2021 02:29:00 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=wdc.com; i=@wdc.com; q=dns/txt; s=dkim.wdc.com; t=1619418498; x=1650954498; 
h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=sczYZgfONl8x2jLo9BD68Jlw4NkITHMuKlvjtlVTGug=; b=p68JrY97bYS53MwGnD0JWrNOED/EL3N2ZKQ9m36HqlrvnPad6dCLuY/e N3P6Y2CmNSw3ea+9vFhzDZHZlxfVXYpmzOnm4rWT5lkRxv6ngFxccOxxX j3wF81OVm0N7ZA6g+oanG9XZxS1yEzbcrzn1XTQhEAF0nODY/vn6rz9l5 J/LgRxQwN3ZBduILJ/KK9h0b3w1bzpYC1GdN2jAevcTi+Bi5He5QHKuFr VDda50BNPkhCG5wews1GhVgkvqcDczRFksRBND5zWVsywhLzT3efaGIZW CiWCfEEC7BR7Z1XF7X2W2Ja5WwT77IRP5hzfX9S0VEFxF4ormR39BLspe Q==; IronPort-SDR: XqI4I8y77pMiAsCUns5hlL4BefoEFW+4vnifiyfw9QNaM80uriz5iBGVnAFK2cxt/7rSBXl48N FKhSoKvz+F/Owme9UHxnXCD4krIzeD1Bwl6bpatDPCzYd+Z1USXuY7WlqHD8GJu1WgqwYPAfhc TNXbKq3p+yZJ1coHcvopGCE+FHP6ARSc60Wzdv18Mzr5BH/be5DrbeKXW5HecTsRNdoRVrQWKU Fn0hclDs3fhzJVYhi/xJth9FnCmQ/yAqBDFgysYN3CgHo1g1n6O72NK47zftFEpP4kDe1QZQk7 yqo= X-IronPort-AV: E=Sophos;i="5.82,251,1613404800"; d="scan'208";a="170788121" Received: from h199-255-45-14.hgst.com (HELO uls-op-cesaep01.wdc.com) ([199.255.45.14]) by ob1.hgst.iphmx.com with ESMTP; 26 Apr 2021 14:28:18 +0800 IronPort-SDR: YMjep2Bt73icod7grhp+h6k2MjeFYiD9W3j01yoD+I7PBtxC4LglFUcjmu+tPJvmlLGEJd2atX gTOKLMZnn/xmz+cxcKBjJd7PRYH5GIlgAN82v9/8ije3SeCJJ7x0g8HrXh26oyLoIKoPS2ifbr BxVGzYOczQLVyZ1H+EtDWsag/c1ajcKoo/wZYulLBzShkD/ofx0UL29CwEqIAtEjbObcOFRnN8 kmv4fHRuHi7rPLdxd3a1t7F5H9yldRwwc+yv40WJTOOikuXvdtXCaDqG2W6xmtuBDAUjY2WuLv v4A6CvWIbnkIvMvea2AOqbE8 Received: from uls-op-cesaip02.wdc.com ([10.248.3.37]) by uls-op-cesaep01.wdc.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 25 Apr 2021 23:08:43 -0700 IronPort-SDR: MWBaLVLRjy4c/W//N9boEMwCB+wh/DJQ8VLKbeLFgXl4y90cn1fC43oeGWZAbkTL1Rg29aGP8B xLR7hd0WxejwECleghFkWjW6co1nJV+8QxB19FVbFd7zX4ZxMRUFt1TaQ7SsW85+mNoIzIrX16 k2XAJs0gs+J00s60PAuKvn1LGQeEP+WYyg3y+Tl3qpDN5tWcR8hNLZW5ShO+uuUBmuFwHwh/H5 7gWJJeGyupK568stXo8YX0YcNqbpDxI++s/+fVx7JoQT8tz9Er5Pl+pIsu6Uq3kzka6ENpfdg/ l2w= WDCIronportException: Internal Received: from bgy2573.ad.shared (HELO naota-xeon.wdc.com) ([10.225.48.58]) by uls-op-cesaip02.wdc.com with 
ESMTP; 25 Apr 2021 23:28:18 -0700 From: Naohiro Aota To: David Sterba Cc: linux-btrfs@vger.kernel.org, Josef Bacik , Naohiro Aota Subject: [PATCH 09/26] btrfs-progs: zoned: allow zoned filesystems on non-zoned block devices Date: Mon, 26 Apr 2021 15:27:25 +0900 Message-Id: <3ecdd9e85e977d87443a503a41fc349944687d6c.1619416549.git.naohiro.aota@wdc.com> X-Mailer: git-send-email 2.31.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org Run a zoned filesystem on non-zoned devices. This is done by "slicing up" the block device into statically sized chunks and faking a conventional zone on each of them. The emulated zone size is determined from the size of the device extent. This is mainly aimed at testing zoned filesystems, i.e. the zoned chunk allocator, on regular block devices. Currently, we always use EMULATED_ZONE_SIZE (= 256MiB) for the emulated zone size. In the future, this will be configurable via a mkfs option. Signed-off-by: Naohiro Aota Reviewed-by: Johannes Thumshirn --- kerncompat.h | 1 + kernel-shared/zoned.c | 67 +++++++++++++++++++++++++++++++++++++++---- 2 files changed, 62 insertions(+), 6 deletions(-) diff --git a/kerncompat.h b/kerncompat.h index a39b79cba767..b2983ed60c4a 100644 --- a/kerncompat.h +++ b/kerncompat.h @@ -166,6 +166,7 @@ typedef long long s64; typedef int s32; #endif +typedef u64 sector_t; struct vma_shared { int prio_tree_node; }; struct vm_area_struct { diff --git a/kernel-shared/zoned.c b/kernel-shared/zoned.c index 7b05fe6cc70f..ebaa2a81b2c8 100644 --- a/kernel-shared/zoned.c +++ b/kernel-shared/zoned.c @@ -14,6 +14,8 @@ /* Maximum number of zones to report per ioctl(BLKREPORTZONE) call */ #define BTRFS_REPORT_NR_ZONES 4096 +#define EMULATED_ZONE_SIZE SZ_256M + static int btrfs_get_dev_zone_info(struct btrfs_device *device); enum btrfs_zoned_model zoned_model(const char *file) @@ -51,6 +53,10 @@ u64 zone_size(const char *file) char chunk[32]; int ret; + /* zoned emulation on regular 
device */ + if (zoned_model(file) == ZONED_NONE) + return EMULATED_ZONE_SIZE; + ret = queue_param(file, "chunk_sectors", chunk, sizeof(chunk)); if (ret <= 0) return 0; @@ -71,6 +77,46 @@ u64 max_zone_append_size(const char *file) } #ifdef BTRFS_ZONED +/* + * Emulate blkdev_report_zones() for a non-zoned device. It slices up the block + * device into static sized chunks and fake a conventional zone on each of + * them. + */ +static int emulate_report_zones(const char *file, int fd, u64 pos, + struct blk_zone *zones, unsigned int nr_zones) +{ + const sector_t zone_sectors = EMULATED_ZONE_SIZE >> SECTOR_SHIFT; + struct stat st; + sector_t bdev_size; + unsigned int i; + int ret; + + ret = fstat(fd, &st); + if (ret < 0) { + error("unable to stat %s: %m", file); + return -EIO; + } + + bdev_size = btrfs_device_size(fd, &st) >> SECTOR_SHIFT; + + pos >>= SECTOR_SHIFT; + for (i = 0; i < nr_zones; i++) { + zones[i].start = i * zone_sectors + pos; + zones[i].len = zone_sectors; + zones[i].capacity = zone_sectors; + zones[i].wp = zones[i].start + zone_sectors; + zones[i].type = BLK_ZONE_TYPE_CONVENTIONAL; + zones[i].cond = BLK_ZONE_COND_NOT_WP; + + if (zones[i].wp >= bdev_size) { + i++; + break; + } + } + + return i; +} + static int report_zones(int fd, const char *file, struct btrfs_zoned_device_info *zinfo) { @@ -149,12 +195,23 @@ static int report_zones(int fd, const char *file, rep->sector = sector; rep->nr_zones = BTRFS_REPORT_NR_ZONES; - ret = ioctl(fd, BLKREPORTZONE, rep); - if (ret != 0) { - error("zoned: ioctl BLKREPORTZONE failed (%m)"); - exit(1); + if (zinfo->model != ZONED_NONE) { + ret = ioctl(fd, BLKREPORTZONE, rep); + if (ret != 0) { + error("zoned: ioctl BLKREPORTZONE failed (%m)"); + exit(1); + } + } else { + ret = emulate_report_zones(file, fd, + sector << SECTOR_SHIFT, + zone, BTRFS_REPORT_NR_ZONES); + if (ret < 0) { + error("zoned: failed to emulate BLKREPORTZONE"); + exit(1); + } } + if (!rep->nr_zones) break; @@ -231,8 +288,6 @@ int 
btrfs_get_zone_info(int fd, const char *file, /* Check zone model */ model = zoned_model(file); - if (model == ZONED_NONE) - return 0; #ifdef BTRFS_ZONED zinfo = calloc(1, sizeof(*zinfo));
From patchwork Mon Apr 26 06:27:26 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Naohiro Aota X-Patchwork-Id: 12223889 From: Naohiro Aota To: David Sterba Cc: linux-btrfs@vger.kernel.org, Josef Bacik , Naohiro Aota Subject: [PATCH 10/26] btrfs-progs: zoned: implement log-structured superblock for ZONED mode Date: Mon, 26 Apr 2021 15:27:26 +0900 Message-Id: <3bb3fbb1f36ad682c949eec3476dcee00a15a132.1619416549.git.naohiro.aota@wdc.com> X-Mailer: git-send-email 2.31.1
In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org

The superblock (and its copies) is the only data structure in btrfs that has a fixed location on a device. Since we cannot overwrite data in a sequential write required zone, we cannot place the superblock in such a zone. One easy solution is to place the superblock and its copies only in conventional zones. However, this method has two downsides. The first is a reduced number of superblock copies: the second copy of the superblock is located at 256GB, which is in a sequential write required zone on typical devices on the market today, so the number of superblocks (primary plus copies) is limited to two. The second downside is that we cannot support devices that have no conventional zones at all.

To solve these two problems, we employ superblock log writing. It uses two adjacent zones as a circular buffer to write updated superblocks. Once the first zone is filled up, writing continues in the second one. Then, when both zones are filled up and before starting to write to the first zone again, the first zone is reset.

We can determine the position of the latest superblock by reading the write pointer information from a device. One corner case is when both zones are full. For this situation, we read out the last superblock of each zone and compare them to determine which zone is older.

The following zones are reserved as the circular buffer on ZONED btrfs.

- primary superblock: offset 0B (and the following zone)
- first copy: offset 512G (and the following zone)
- second copy: offset 4T (4096G, and the following zone)

If these reserved zones are conventional, the superblock is written at a fixed location, the start of the zone, without logging.

Currently, superblock reading/writing is done by pread/pwrite. This commit replaces the call sites with sbread/sbwrite to wrap the functions.
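As a rough illustration of the layout described above (not part of this patch), the following sketch computes the first zone of each log zone pair from the offsets 0B, 512G and 4T, and models the two-zone circular buffer. The names sb_log_first_zone(), struct sb_log and sb_log_append() are hypothetical; the sketch assumes a power-of-two zone size and counts zone capacity in superblock slots for simplicity.

```c
#include <assert.h>
#include <stdint.h>

/* Reserved log locations as listed in the commit message */
#define SB_LOG_FIRST_OFFSET	(512ULL << 30)	/* 512G */
#define SB_LOG_SECOND_OFFSET	(4096ULL << 30)	/* 4T */

/* floor(log2(n)) for n > 0 */
static int log2_u64(uint64_t n)
{
	int l = 0;

	while (n >>= 1)
		l++;
	return l;
}

/*
 * First zone number of the log zone pair for a superblock mirror
 * (0, 1 or 2). Assumes zone_size is a power of two, in bytes.
 */
static uint64_t sb_log_first_zone(uint64_t zone_size, int mirror)
{
	int shift = log2_u64(zone_size);

	switch (mirror) {
	case 0: return 0;
	case 1: return SB_LOG_FIRST_OFFSET >> shift;
	case 2: return SB_LOG_SECOND_OFFSET >> shift;
	default: return UINT64_MAX;	/* invalid mirror */
	}
}

/* Simplified model of one log zone pair (hypothetical, illustration only) */
struct sb_log {
	uint32_t wp[2];		/* superblocks written so far in each zone */
	uint32_t capacity;	/* superblocks that fit in one zone */
	int active;		/* zone that receives the next write */
};

/*
 * Append one superblock and return its slot index within the zone pair.
 * When the active zone is full, switch to the other zone and reset it
 * before writing, as the commit message describes.
 */
static uint32_t sb_log_append(struct sb_log *log)
{
	if (log->wp[log->active] == log->capacity) {
		log->active ^= 1;
		log->wp[log->active] = 0;	/* models a zone reset */
	}
	return log->active * log->capacity + log->wp[log->active]++;
}
```

With 256MB zones, the three mirrors would start at zones 0, 2048 and 16384 under these assumptions. The real code additionally has to recover the active zone from the write pointers reported by the device, which this model keeps in memory.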
For zoned btrfs, btrfs_sb_io(), which is called from sbread/sbwrite, maps the IO position back to a mirror number, maps the mirror number to the superblock logging position, and performs the IO.

Signed-off-by: Naohiro Aota Reviewed-by: Johannes Thumshirn
--- cmds/inspect-dump-super.c | 3 +- common/device-scan.c | 4 +- kerncompat.h | 16 +++ kernel-shared/disk-io.c | 13 +- kernel-shared/zoned.c | 280 ++++++++++++++++++++++++++++++++++++++ kernel-shared/zoned.h | 29 ++++ 6 files changed, 335 insertions(+), 10 deletions(-) diff --git a/cmds/inspect-dump-super.c b/cmds/inspect-dump-super.c index f8d1506a6afd..04e81d8c3b60 100644 --- a/cmds/inspect-dump-super.c +++ b/cmds/inspect-dump-super.c @@ -25,6 +25,7 @@ #include "kernel-shared/ctree.h" #include "kernel-shared/disk-io.h" #include "kernel-shared/print-tree.h" +#include "kernel-shared/zoned.h" #include "common/utils.h" #include "cmds/commands.h" #include "common/help.h" @@ -38,7 +39,7 @@ static int load_and_dump_sb(char *filename, int fd, u64 sb_bytenr, int full, sb = (struct btrfs_super_block *)super_block_data; - ret = pread64(fd, super_block_data, BTRFS_SUPER_INFO_SIZE, sb_bytenr); + ret = sbread(fd, super_block_data, sb_bytenr); if (ret != BTRFS_SUPER_INFO_SIZE) { /* check if the disk if too short for further superblock */ if (ret == 0 && errno == 0) diff --git a/common/device-scan.c b/common/device-scan.c index 74d7853afccb..659f48c4dedb 100644 --- a/common/device-scan.c +++ b/common/device-scan.c @@ -190,7 +190,7 @@ int btrfs_add_to_fsid(struct btrfs_trans_handle *trans, btrfs_set_stack_device_bytes_used(dev_item, device->bytes_used); memcpy(&dev_item->uuid, device->uuid, BTRFS_UUID_SIZE); - ret = pwrite(fd, buf, sectorsize, BTRFS_SUPER_INFO_OFFSET); + ret = sbwrite(fd, buf, BTRFS_SUPER_INFO_OFFSET); BUG_ON(ret != sectorsize); free(buf); @@ -267,7 +267,7 @@ int btrfs_device_already_in_root(struct btrfs_root *root, int fd, ret = -ENOMEM; goto out; } - ret = pread(fd, buf, BTRFS_SUPER_INFO_SIZE, super_offset); + ret
= sbread(fd, buf, super_offset); if (ret != BTRFS_SUPER_INFO_SIZE) goto brelse; diff --git a/kerncompat.h b/kerncompat.h index b2983ed60c4a..d37edfe7fdac 100644 --- a/kerncompat.h +++ b/kerncompat.h @@ -364,6 +364,19 @@ static inline int is_power_of_2(unsigned long n) return (n != 0 && ((n & (n - 1)) == 0)); } +static inline int ilog2(u64 num) +{ + int l = 0; + + num >>= 1; + while (num) { + l++; + num >>= 1; + } + + return l; +} + typedef u16 __bitwise __le16; typedef u16 __bitwise __be16; typedef u32 __bitwise __le32; @@ -371,6 +384,9 @@ typedef u32 __bitwise __be32; typedef u64 __bitwise __le64; typedef u64 __bitwise __be64; +#define U64_MAX UINT64_MAX +#define U32_MAX UINT32_MAX + /* Macros to generate set/get funcs for the struct fields * assume there is a lefoo_to_cpu for every type, so lets make a simple * one for u8: diff --git a/kernel-shared/disk-io.c b/kernel-shared/disk-io.c index 4aba237f5a5c..d79d6a00cdf8 100644 --- a/kernel-shared/disk-io.c +++ b/kernel-shared/disk-io.c @@ -1615,7 +1615,7 @@ int btrfs_read_dev_super(int fd, struct btrfs_super_block *sb, u64 sb_bytenr, u64 bytenr; if (sb_bytenr != BTRFS_SUPER_INFO_OFFSET) { - ret = pread64(fd, buf, BTRFS_SUPER_INFO_SIZE, sb_bytenr); + ret = sbread(fd, buf, sb_bytenr); /* real error */ if (ret < 0) return -errno; @@ -1643,7 +1643,8 @@ int btrfs_read_dev_super(int fd, struct btrfs_super_block *sb, u64 sb_bytenr, for (i = 0; i < max_super; i++) { bytenr = btrfs_sb_offset(i); - ret = pread64(fd, buf, BTRFS_SUPER_INFO_SIZE, bytenr); + ret = sbread(fd, buf, bytenr); + if (ret < BTRFS_SUPER_INFO_SIZE) break; @@ -1715,9 +1716,8 @@ static int write_dev_supers(struct btrfs_fs_info *fs_info, * super_copy is BTRFS_SUPER_INFO_SIZE bytes and is * zero filled, we can use it directly */ - ret = pwrite64(device->fd, fs_info->super_copy, - BTRFS_SUPER_INFO_SIZE, - fs_info->super_bytenr); + ret = sbwrite(device->fd, fs_info->super_copy, + fs_info->super_bytenr); if (ret != BTRFS_SUPER_INFO_SIZE) { errno = EIO; error( @@ 
-1750,8 +1750,7 @@ static int write_dev_supers(struct btrfs_fs_info *fs_info, * super_copy is BTRFS_SUPER_INFO_SIZE bytes and is * zero filled, we can use it directly */ - ret = pwrite64(device->fd, fs_info->super_copy, - BTRFS_SUPER_INFO_SIZE, bytenr); + ret = sbwrite(device->fd, fs_info->super_copy, bytenr); if (ret != BTRFS_SUPER_INFO_SIZE) { errno = EIO; error( diff --git a/kernel-shared/zoned.c b/kernel-shared/zoned.c index ebaa2a81b2c8..1b235dc0a1c9 100644 --- a/kernel-shared/zoned.c +++ b/kernel-shared/zoned.c @@ -2,6 +2,7 @@ #include #include +#include #include "kernel-lib/list.h" #include "kernel-shared/volumes.h" @@ -14,6 +15,20 @@ /* Maximum number of zones to report per ioctl(BLKREPORTZONE) call */ #define BTRFS_REPORT_NR_ZONES 4096 +/* + * Location of the first zone of superblock logging zone pairs. + * + * - primary superblock: 0B (zone 0) + * - first copy: 512G (zone starting at that offset) + * - second copy: 4T (zone starting at that offset) + */ +#define BTRFS_SB_LOG_PRIMARY_OFFSET (0ULL) +#define BTRFS_SB_LOG_FIRST_OFFSET (512ULL * SZ_1G) +#define BTRFS_SB_LOG_SECOND_OFFSET (4096ULL * SZ_1G) + +#define BTRFS_SB_LOG_FIRST_SHIFT ilog2(BTRFS_SB_LOG_FIRST_OFFSET) +#define BTRFS_SB_LOG_SECOND_SHIFT ilog2(BTRFS_SB_LOG_SECOND_OFFSET) + #define EMULATED_ZONE_SIZE SZ_256M static int btrfs_get_dev_zone_info(struct btrfs_device *device); @@ -117,6 +132,116 @@ static int emulate_report_zones(const char *file, int fd, u64 pos, return i; } +static int sb_write_pointer(int fd, struct blk_zone *zones, u64 *wp_ret) +{ + bool empty[BTRFS_NR_SB_LOG_ZONES]; + bool full[BTRFS_NR_SB_LOG_ZONES]; + sector_t sector; + + ASSERT(zones[0].type != BLK_ZONE_TYPE_CONVENTIONAL && + zones[1].type != BLK_ZONE_TYPE_CONVENTIONAL); + + empty[0] = zones[0].cond == BLK_ZONE_COND_EMPTY; + empty[1] = zones[1].cond == BLK_ZONE_COND_EMPTY; + full[0] = zones[0].cond == BLK_ZONE_COND_FULL; + full[1] = zones[1].cond == BLK_ZONE_COND_FULL; + + /* + * Possible states of log buffer zones + * + 
* Empty[0] In use[0] Full[0] + * Empty[1] * x 0 + * In use[1] 0 x 0 + * Full[1] 1 1 C + * + * Log position: + * *: Special case, no superblock is written + * 0: Use write pointer of zones[0] + * 1: Use write pointer of zones[1] + * C: Compare super blocks from zones[0] and zones[1], use the latest + * one determined by generation + * x: Invalid state + */ + + if (empty[0] && empty[1]) { + /* Special case to distinguish no superblock to read */ + *wp_ret = zones[0].start << SECTOR_SHIFT; + return -ENOENT; + } else if (full[0] && full[1]) { + /* Compare two super blocks */ + u8 buf[BTRFS_NR_SB_LOG_ZONES][BTRFS_SUPER_INFO_SIZE]; + struct btrfs_super_block *super[BTRFS_NR_SB_LOG_ZONES]; + int i; + int ret; + + for (i = 0; i < BTRFS_NR_SB_LOG_ZONES; i++) { + u64 bytenr; + + bytenr = ((zones[i].start + zones[i].len) + << SECTOR_SHIFT) - BTRFS_SUPER_INFO_SIZE; + + ret = pread64(fd, buf[i], BTRFS_SUPER_INFO_SIZE, + bytenr); + if (ret != BTRFS_SUPER_INFO_SIZE) + return -EIO; + super[i] = (struct btrfs_super_block *)&buf[i]; + } + + if (super[0]->generation > super[1]->generation) + sector = zones[1].start; + else + sector = zones[0].start; + } else if (!full[0] && (empty[1] || full[1])) { + sector = zones[0].wp; + } else if (full[0]) { + sector = zones[1].wp; + } else { + return -EUCLEAN; + } + *wp_ret = sector << SECTOR_SHIFT; + return 0; +} + +/* + * Get the first zone number of the superblock mirror + */ +static inline u32 sb_zone_number(int shift, int mirror) +{ + u64 zone = 0; + + ASSERT(0 <= mirror && mirror < BTRFS_SUPER_MIRROR_MAX); + switch (mirror) { + case 0: zone = 0; break; + case 1: zone = 1ULL << (BTRFS_SB_LOG_FIRST_SHIFT - shift); break; + case 2: zone = 1ULL << (BTRFS_SB_LOG_SECOND_SHIFT - shift); break; + } + + ASSERT(zone <= U32_MAX); + + return (u32)zone; +} + +int btrfs_reset_dev_zone(int fd, struct blk_zone *zone) +{ + struct blk_zone_range range; + + /* Nothing to do if it is already empty */ + if (zone->type == BLK_ZONE_TYPE_CONVENTIONAL || + 
zone->cond == BLK_ZONE_COND_EMPTY) + return 0; + + range.sector = zone->start; + range.nr_sectors = zone->len; + + if (ioctl(fd, BLKRESETZONE, &range) < 0) + return -errno; + + zone->cond = BLK_ZONE_COND_EMPTY; + zone->wp = zone->start; + + return 0; +} + static int report_zones(int fd, const char *file, struct btrfs_zoned_device_info *zinfo) { @@ -232,6 +357,161 @@ static int report_zones(int fd, const char *file, return 0; } +static int sb_log_location(int fd, struct blk_zone *zones, int rw, + u64 *bytenr_ret) +{ + u64 wp; + int ret; + + /* Use the head of the zones if either zone is conventional */ + if (zones[0].type == BLK_ZONE_TYPE_CONVENTIONAL) { + *bytenr_ret = zones[0].start << SECTOR_SHIFT; + return 0; + } else if (zones[1].type == BLK_ZONE_TYPE_CONVENTIONAL) { + *bytenr_ret = zones[1].start << SECTOR_SHIFT; + return 0; + } + + ret = sb_write_pointer(fd, zones, &wp); + if (ret != -ENOENT && ret < 0) + return ret; + + if (rw == WRITE) { + struct blk_zone *reset = NULL; + + if (wp == zones[0].start << SECTOR_SHIFT) + reset = &zones[0]; + else if (wp == zones[1].start << SECTOR_SHIFT) + reset = &zones[1]; + + if (reset && reset->cond != BLK_ZONE_COND_EMPTY) { + ASSERT(reset->cond == BLK_ZONE_COND_FULL); + + ret = btrfs_reset_dev_zone(fd, reset); + if (ret) + return ret; + } + } else if (ret != -ENOENT) { + /* For READ, we want the previous one */ + if (wp == zones[0].start << SECTOR_SHIFT) + wp = (zones[1].start + zones[1].len) << SECTOR_SHIFT; + wp -= BTRFS_SUPER_INFO_SIZE; + } + + *bytenr_ret = wp; + return 0; + +} + +size_t btrfs_sb_io(int fd, void *buf, off_t offset, int rw) +{ + size_t count = BTRFS_SUPER_INFO_SIZE; + struct stat stat_buf; + struct blk_zone_report *rep; + struct blk_zone *zones; + const u64 sb_size_sector = BTRFS_SUPER_INFO_SIZE >> SECTOR_SHIFT; + u64 mapped = U64_MAX; + u32 zone_num; + unsigned int zone_size_sector; + size_t rep_size; + int mirror = -1; + int i; + int ret; + size_t ret_sz; + + ASSERT(rw == READ || rw == WRITE); + + if 
(fstat(fd, &stat_buf) == -1) { + error("fstat failed (%s)", strerror(errno)); + exit(1); + } + + /* Do not call ioctl(BLKGETZONESZ) on a regular file. */ + if ((stat_buf.st_mode & S_IFMT) == S_IFBLK) { + ret = ioctl(fd, BLKGETZONESZ, &zone_size_sector); + if (ret) { + error("zoned: ioctl BLKGETZONESZ failed (%m)"); + exit(1); + } + } else { + zone_size_sector = 0; + } + + /* We can call pread/pwrite if 'fd' is non-zoned device/file. */ + if (zone_size_sector == 0) { + if (rw == READ) + return pread64(fd, buf, count, offset); + return pwrite64(fd, buf, count, offset); + } + + ASSERT(IS_ALIGNED(zone_size_sector, sb_size_sector)); + + for (i = 0; i < BTRFS_SUPER_MIRROR_MAX; i++) { + if (offset == btrfs_sb_offset(i)) { + mirror = i; + break; + } + } + ASSERT(mirror != -1); + + zone_num = sb_zone_number(ilog2(zone_size_sector) + SECTOR_SHIFT, + mirror); + + rep_size = sizeof(struct blk_zone_report) + sizeof(struct blk_zone) * 2; + rep = calloc(1, rep_size); + if (!rep) { + error("zoned: no memory for zones report"); + exit(1); + } + + rep->sector = zone_num * (sector_t)zone_size_sector; + rep->nr_zones = 2; + + ret = ioctl(fd, BLKREPORTZONE, rep); + if (ret) { + error("zoned: ioctl BLKREPORTZONE failed (%m)"); + exit(1); + } + if (rep->nr_zones != 2) { + if (errno == ENOENT || errno == 0) + return (rw == WRITE ? count : 0); + error("zoned: failed to read zone info of %u and %u: %m", + zone_num, zone_num + 1); + free(rep); + return 0; + } + + zones = (struct blk_zone *)(rep + 1); + + ret = sb_log_location(fd, zones, rw, &mapped); + /* + * Special case: no superblock found in the zones. This case happens + * when initializing a file-system. 
+ */ + if (rw == READ && ret == -ENOENT) { + memset(buf, 0, count); + return count; + } + if (ret) + return ret; + + if (rw == READ) + ret_sz = pread64(fd, buf, count, mapped); + else + ret_sz = pwrite64(fd, buf, count, mapped); + + if (ret_sz != count) + return ret_sz; + + /* Call fsync() to force the write order */ + if (rw == WRITE && fsync(fd)) { + error("failed to synchronize superblock: %s", strerror(errno)); + exit(1); + } + + return ret_sz; +} + #endif int btrfs_get_dev_zone_info_all_devices(struct btrfs_fs_info *fs_info) diff --git a/kernel-shared/zoned.h b/kernel-shared/zoned.h index fcf2ccf34f26..82e3096eab8a 100644 --- a/kernel-shared/zoned.h +++ b/kernel-shared/zoned.h @@ -3,9 +3,14 @@ #ifndef __BTRFS_ZONED_H__ #define __BTRFS_ZONED_H__ +#include "kernel-shared/disk-io.h" + #include #include "kerncompat.h" +/* Number of superblock log zones */ +#define BTRFS_NR_SB_LOG_ZONES 2 + #ifdef BTRFS_ZONED #include #else @@ -41,4 +46,28 @@ int btrfs_get_zone_info(int fd, const char *file, int btrfs_get_dev_zone_info_all_devices(struct btrfs_fs_info *fs_info); int btrfs_check_zoned_mode(struct btrfs_fs_info *fs_info); +#ifdef BTRFS_ZONED +size_t btrfs_sb_io(int fd, void *buf, off_t offset, int rw); +static inline size_t sbread(int fd, void *buf, off_t offset) +{ + return btrfs_sb_io(fd, buf, offset, READ); +} +static inline size_t sbwrite(int fd, void *buf, off_t offset) +{ + return btrfs_sb_io(fd, buf, offset, WRITE); +} +int btrfs_reset_dev_zone(int fd, struct blk_zone *zone); +#else +#define sbread(fd, buf, offset) \ + pread64(fd, buf, BTRFS_SUPER_INFO_SIZE, offset) +#define sbwrite(fd, buf, offset) \ + pwrite64(fd, buf, BTRFS_SUPER_INFO_SIZE, offset) + +static inline int btrfs_reset_dev_zone(int fd, struct blk_zone *zone) +{ + return 0; +} + +#endif /* BTRFS_ZONED */ + #endif /* __BTRFS_ZONED_H__ */ From patchwork Mon Apr 26 06:27:27 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Naohiro 
Aota X-Patchwork-Id: 12223891 From: Naohiro Aota To: David Sterba Cc: linux-btrfs@vger.kernel.org, Josef Bacik , Naohiro Aota Subject: [PATCH 11/26] btrfs-progs: zoned: implement zoned chunk allocator Date: Mon, 26 Apr 2021 15:27:27 +0900 Message-Id: X-Mailer: git-send-email 2.31.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org

Implement a zoned chunk and device extent allocator. One device zone becomes a device extent so that a zone reset affects only this device extent and does not change the state of blocks in the neighbor device extents.

To implement the allocator, we need to extend the following functions for a zoned filesystem.
- init_alloc_chunk_ctl
- dev_extent_search_start
- dev_extent_hole_check
- decide_stripe_size

Here, dev_extent_hole_check() is newly introduced to check the validity of a hole that has been found.

init_alloc_chunk_ctl_policy_zoned() is mostly the same as the regular one. It always sets stripe_size to the zone size and aligns the parameters to the zone size.

dev_extent_search_start() only aligns the start offset to zone boundaries. We don't care about the first 1MB as in a regular filesystem because we reserve the first two zones for superblock logging anyway.

dev_extent_hole_check_zoned() checks whether the zones in the given hole are either conventional or empty sequential zones. It also skips zones reserved for superblock logging. After the hole has been adjusted, the new hole may contain pending extents, so in this case we loop again to check that.

Finally, decide_stripe_size_zoned() should shrink the number of devices instead of the stripe size because we need to honor stripe_size == zone_size.

Signed-off-by: Naohiro Aota
--- kerncompat.h | 2 + kernel-shared/volumes.c | 143 ++++++++++++++++++++++++++++++++++++++-- kernel-shared/volumes.h | 1 + kernel-shared/zoned.c | 139 ++++++++++++++++++++++++++++++++++++++ kernel-shared/zoned.h | 51 ++++++++++++++ 5 files changed, 332 insertions(+), 4 deletions(-) diff --git a/kerncompat.h b/kerncompat.h index d37edfe7fdac..c58e8a27430f 100644 --- a/kerncompat.h +++ b/kerncompat.h @@ -28,6 +28,7 @@ #include #include #include +#include #include #include @@ -358,6 +359,7 @@ do { \ /* Alignment check */ #define IS_ALIGNED(x, a) (((x) & ((typeof(x))(a) - 1)) == 0) +#define ALIGN(x, a) __ALIGN_KERNEL((x), (a)) static inline int is_power_of_2(unsigned long n) { diff --git a/kernel-shared/volumes.c b/kernel-shared/volumes.c index 63530a99b41c..ecfc63265f35 100644 --- a/kernel-shared/volumes.c +++ b/kernel-shared/volumes.c @@ -162,6 +162,8 @@ struct alloc_chunk_ctl { u64 max_chunk_size; int total_devs; u64 dev_offset; + int nparity; + int ncopies; }; struct stripe { @@
-457,6 +459,8 @@ int btrfs_scan_one_device(int fd, const char *path, static u64 dev_extent_search_start(struct btrfs_device *device, u64 start) { + u64 zone_size; + switch (device->fs_devices->chunk_alloc_policy) { case BTRFS_CHUNK_ALLOC_REGULAR: /* @@ -465,11 +469,72 @@ static u64 dev_extent_search_start(struct btrfs_device *device, u64 start) * make sure to start at an offset of at least 1MB. */ return max(start, BTRFS_BLOCK_RESERVED_1M_FOR_SUPER); + case BTRFS_CHUNK_ALLOC_ZONED: + zone_size = device->zone_info->zone_size; + return ALIGN(max_t(u64, start, zone_size), zone_size); default: BUG(); } } +static bool dev_extent_hole_check_zoned(struct btrfs_device *device, + u64 *hole_start, u64 *hole_size, + u64 num_bytes) +{ + u64 zone_size = device->zone_info->zone_size; + u64 pos; + bool changed = false; + + ASSERT(IS_ALIGNED(*hole_start, zone_size)); + + while (*hole_size > 0) { + pos = btrfs_find_allocatable_zones(device, *hole_start, + *hole_start + *hole_size, + num_bytes); + if (pos != *hole_start) { + *hole_size = *hole_start + *hole_size - pos; + *hole_start = pos; + changed = true; + if (*hole_size < num_bytes) + break; + } + + *hole_start += zone_size; + *hole_size -= zone_size; + changed = true; + } + + return changed; +} + +/** + * dev_extent_hole_check - check if specified hole is suitable for allocation + * @device: the device which we have the hole + * @hole_start: starting position of the hole + * @hole_size: the size of the hole + * @num_bytes: the size of the free space that we need + * + * This function may modify @hole_start and @hole_size to reflect the suitable + * position for allocation. Returns true if hole position is updated, false + * otherwise. 
+ */ +static bool dev_extent_hole_check(struct btrfs_device *device, u64 *hole_start, + u64 *hole_size, u64 num_bytes) +{ + switch (device->fs_devices->chunk_alloc_policy) { + case BTRFS_CHUNK_ALLOC_REGULAR: + /* No check */ + break; + case BTRFS_CHUNK_ALLOC_ZONED: + return dev_extent_hole_check_zoned(device, hole_start, + hole_size, num_bytes); + default: + BUG(); + } + + return false; +} + /* * find_free_dev_extent_start - find free space in the specified device * @device: the device which we search the free space in @@ -507,6 +572,10 @@ static int find_free_dev_extent_start(struct btrfs_device *device, int ret; int slot; struct extent_buffer *l; + u64 zone_size = 0; + + if (device->zone_info) + zone_size = device->zone_info->zone_size; search_start = dev_extent_search_start(device, search_start); @@ -517,6 +586,7 @@ static int find_free_dev_extent_start(struct btrfs_device *device, max_hole_start = search_start; max_hole_size = 0; +again: if (search_start >= search_end) { ret = -ENOSPC; goto out; @@ -562,11 +632,9 @@ static int find_free_dev_extent_start(struct btrfs_device *device, if (key.offset > search_start) { hole_size = key.offset - search_start; + dev_extent_hole_check(device, &search_start, &hole_size, + num_bytes); - /* - * Have to check before we set max_hole_start, otherwise - * we could end up sending back this offset anyway. - */ if (hole_size > max_hole_size) { max_hole_start = search_start; max_hole_size = hole_size; @@ -603,6 +671,12 @@ next: * search_end may be smaller than search_start. 
*/ if (search_end > search_start) { + if (dev_extent_hole_check(device, &search_start, &hole_size, + num_bytes)) { + btrfs_release_path(path); + goto again; + } + hole_size = search_end - search_start; if (hole_size > max_hole_size) { @@ -618,6 +692,7 @@ next: ret = 0; out: + ASSERT(zone_size == 0 || IS_ALIGNED(max_hole_start, zone_size)); btrfs_free_path(path); *start = max_hole_start; if (len) @@ -646,6 +721,11 @@ int btrfs_insert_dev_extent(struct btrfs_trans_handle *trans, struct extent_buffer *leaf; struct btrfs_key key; + /* Check alignment to zone for a zoned block device */ + ASSERT(!device->zone_info || + device->zone_info->model != ZONED_HOST_MANAGED || + IS_ALIGNED(start, device->zone_info->zone_size)); + path = btrfs_alloc_path(); if (!path) return -ENOMEM; @@ -1052,6 +1132,38 @@ static void init_alloc_chunk_ctl_policy_regular(struct btrfs_fs_info *info, ctl->max_chunk_size = min(percent_max, ctl->max_chunk_size); } +static void init_alloc_chunk_ctl_policy_zoned(struct btrfs_fs_info *info, + struct alloc_chunk_ctl *ctl) +{ + u64 type = ctl->type; + u64 zone_size = info->zone_size; + int min_num_stripes = ctl->min_stripes * ctl->num_stripes; + int min_data_stripes = (min_num_stripes - ctl->nparity) / ctl->ncopies; + u64 min_chunk_size = min_data_stripes * zone_size; + + ctl->stripe_size = zone_size; + ctl->min_stripe_size = zone_size; + if (type & BTRFS_BLOCK_GROUP_PROFILE_MASK) { + if (type & BTRFS_BLOCK_GROUP_SYSTEM) { + ctl->max_chunk_size = SZ_16M; + ctl->max_stripes = BTRFS_MAX_DEVS_SYS_CHUNK; + } else if (type & BTRFS_BLOCK_GROUP_DATA) { + ctl->max_chunk_size = 10ULL * SZ_1G; + ctl->max_stripes = BTRFS_MAX_DEVS(info); + } else if (type & BTRFS_BLOCK_GROUP_METADATA) { + /* for larger filesystems, use larger metadata chunks */ + if (info->fs_devices->total_rw_bytes > 50ULL * SZ_1G) + ctl->max_chunk_size = SZ_1G; + else + ctl->max_chunk_size = SZ_256M; + ctl->max_stripes = BTRFS_MAX_DEVS(info); + } + } + + ctl->max_chunk_size = 
round_down(ctl->max_chunk_size, zone_size); + ctl->max_chunk_size = max(ctl->max_chunk_size, min_chunk_size); +} + static void init_alloc_chunk_ctl(struct btrfs_fs_info *info, struct alloc_chunk_ctl *ctl) { @@ -1066,11 +1178,16 @@ static void init_alloc_chunk_ctl(struct btrfs_fs_info *info, ctl->max_chunk_size = 4 * ctl->stripe_size; ctl->total_devs = btrfs_super_num_devices(info->super_copy); ctl->dev_offset = 0; + ctl->nparity = btrfs_raid_array[type].nparity; + ctl->ncopies = btrfs_raid_array[type].ncopies; switch (info->fs_devices->chunk_alloc_policy) { case BTRFS_CHUNK_ALLOC_REGULAR: init_alloc_chunk_ctl_policy_regular(info, ctl); break; + case BTRFS_CHUNK_ALLOC_ZONED: + init_alloc_chunk_ctl_policy_zoned(info, ctl); + break; default: BUG(); } @@ -1113,12 +1230,27 @@ static int decide_stripe_size_regular(struct alloc_chunk_ctl *ctl) return 0; } +static int decide_stripe_size_zoned(struct alloc_chunk_ctl *ctl) +{ + if (chunk_bytes_by_type(ctl) > ctl->max_chunk_size) { + /* stripe_size is fixed in ZONED. Reduce num_stripes instead. 
*/ + ctl->num_stripes = ctl->max_chunk_size * ctl->ncopies / + ctl->stripe_size; + if (ctl->num_stripes < ctl->min_stripes) + return -ENOSPC; + } + + return 0; +} + static int decide_stripe_size(struct btrfs_fs_info *info, struct alloc_chunk_ctl *ctl) { switch (info->fs_devices->chunk_alloc_policy) { case BTRFS_CHUNK_ALLOC_REGULAR: return decide_stripe_size_regular(ctl); + case BTRFS_CHUNK_ALLOC_ZONED: + return decide_stripe_size_zoned(ctl); default: BUG(); } @@ -1140,6 +1272,7 @@ static int create_chunk(struct btrfs_trans_handle *trans, int index; struct btrfs_key key; u64 offset; + u64 zone_size = info->zone_size; if (!ctl->start) { ret = find_next_chunk(info, &offset); @@ -1192,6 +1325,8 @@ static int create_chunk(struct btrfs_trans_handle *trans, BUG_ON(ret); } + ASSERT(!zone_size || IS_ALIGNED(dev_offset, zone_size)); + device->bytes_used += ctl->stripe_size; ret = btrfs_update_device(trans, device); if (ret < 0) diff --git a/kernel-shared/volumes.h b/kernel-shared/volumes.h index a64288d566d8..5a85a6c0bc6f 100644 --- a/kernel-shared/volumes.h +++ b/kernel-shared/volumes.h @@ -74,6 +74,7 @@ struct btrfs_device { enum btrfs_chunk_allocation_policy { BTRFS_CHUNK_ALLOC_REGULAR, + BTRFS_CHUNK_ALLOC_ZONED, }; struct btrfs_fs_devices { diff --git a/kernel-shared/zoned.c b/kernel-shared/zoned.c index 1b235dc0a1c9..e828d633619a 100644 --- a/kernel-shared/zoned.c +++ b/kernel-shared/zoned.c @@ -512,6 +512,144 @@ size_t btrfs_sb_io(int fd, void *buf, off_t offset, int rw) return ret_sz; } +/* + * btrfs_check_allocatable_zones - check if specified region is + * suitable for allocation + * @device: the device to allocate a region + * @pos: the position of the region + * @num_bytes: the size of the region + * + * In non-ZONED device, anywhere is suitable for allocation.
In ZONED + * device, check if + * 1) the region is not on non-empty sequential zones, + * 2) all zones in the region have the same zone type, + * 3) it does not contain super block location + */ +bool btrfs_check_allocatable_zones(struct btrfs_device *device, u64 pos, + u64 num_bytes) +{ + struct btrfs_zoned_device_info *zinfo = device->zone_info; + u64 nzones, begin, end; + u64 sb_pos; + bool is_sequential; + int shift; + int i; + + if (!zinfo || zinfo->model == ZONED_NONE) + return true; + + nzones = num_bytes / zinfo->zone_size; + begin = pos / zinfo->zone_size; + end = begin + nzones; + + ASSERT(IS_ALIGNED(pos, zinfo->zone_size)); + ASSERT(IS_ALIGNED(num_bytes, zinfo->zone_size)); + + if (end > zinfo->nr_zones) + return false; + + shift = ilog2(zinfo->zone_size); + for (i = 0; i < BTRFS_SUPER_MIRROR_MAX; i++) { + sb_pos = sb_zone_number(shift, i); + if (!(end < sb_pos || sb_pos + 1 < begin)) + return false; + } + + is_sequential = btrfs_dev_is_sequential(device, pos); + + while (num_bytes) { + if (is_sequential && !btrfs_dev_is_empty_zone(device, pos)) + return false; + if (is_sequential != btrfs_dev_is_sequential(device, pos)) + return false; + + pos += zinfo->zone_size; + num_bytes -= zinfo->zone_size; + } + + return true; +} + +/** + * btrfs_find_allocatable_zones - find allocatable zones within a given region + * + * @device: the device to allocate a region on + * @hole_start: the position of the hole to allocate the region + * @num_bytes: size of wanted region + * @hole_end: the end of the hole + * @return: position of allocatable zones + * + * Allocatable region should not contain any superblock locations. 
+ */ +u64 btrfs_find_allocatable_zones(struct btrfs_device *device, u64 hole_start, + u64 hole_end, u64 num_bytes) +{ + struct btrfs_zoned_device_info *zinfo = device->zone_info; + int shift = ilog2(zinfo->zone_size); + u64 nzones = num_bytes >> shift; + u64 pos = hole_start; + u64 begin, end; + bool is_sequential; + bool have_sb; + int i; + + ASSERT(IS_ALIGNED(hole_start, zinfo->zone_size)); + ASSERT(IS_ALIGNED(num_bytes, zinfo->zone_size)); + + while (pos < hole_end) { + begin = pos >> shift; + end = begin + nzones; + + if (end > zinfo->nr_zones) + return hole_end; + + /* + * The zones must be all sequential (and empty), or + * conventional zones + */ + is_sequential = btrfs_dev_is_sequential(device, pos); + for (i = 0; i < end - begin; i++) { + u64 zone_offset = pos + ((u64)i << shift); + + if ((is_sequential && + !btrfs_dev_is_empty_zone(device, zone_offset)) || + (is_sequential != + btrfs_dev_is_sequential(device, zone_offset))) { + pos += zinfo->zone_size; + continue; + } + } + + have_sb = false; + for (i = 0; i < BTRFS_SUPER_MIRROR_MAX; i++) { + u32 sb_zone; + u64 sb_pos; + + sb_zone = sb_zone_number(shift, i); + if (!(end <= sb_zone || + sb_zone + BTRFS_NR_SB_LOG_ZONES <= begin)) { + have_sb = true; + pos = ((u64)sb_zone + BTRFS_NR_SB_LOG_ZONES) << shift; + break; + } + + /* We also need to exclude regular superblock positions */ + sb_pos = btrfs_sb_offset(i); + if (!(pos + num_bytes <= sb_pos || + sb_pos + BTRFS_SUPER_INFO_SIZE <= pos)) { + have_sb = true; + pos = ALIGN(sb_pos + BTRFS_SUPER_INFO_SIZE, + zinfo->zone_size); + break; + } + } + if (!have_sb) + break; + } + + return pos; +} + #endif int btrfs_get_dev_zone_info_all_devices(struct btrfs_fs_info *fs_info) @@ -691,6 +829,7 @@ int btrfs_check_zoned_mode(struct btrfs_fs_info *fs_info) fs_info->zone_size = zone_size; fs_info->max_zone_append_size = max_zone_append_size; + fs_info->fs_devices->chunk_alloc_policy = BTRFS_CHUNK_ALLOC_ZONED; out: return ret; diff --git a/kernel-shared/zoned.h 
b/kernel-shared/zoned.h index 82e3096eab8a..29c203f45ada 100644 --- a/kernel-shared/zoned.h +++ b/kernel-shared/zoned.h @@ -7,6 +7,7 @@ #include #include "kerncompat.h" +#include "kernel-shared/volumes.h" /* Number of superblock log zones */ #define BTRFS_NR_SB_LOG_ZONES 2 @@ -56,7 +57,34 @@ static inline size_t sbwrite(int fd, void *buf, off_t offset) { return btrfs_sb_io(fd, buf, offset, WRITE); } + +static inline bool zone_is_sequential(struct btrfs_zoned_device_info *zinfo, + u64 bytenr) +{ + unsigned int zno; + + if (!zinfo || zinfo->model == ZONED_NONE) + return false; + + zno = bytenr / zinfo->zone_size; + return zinfo->zones[zno].type == BLK_ZONE_TYPE_SEQWRITE_REQ; +} + +static inline bool btrfs_dev_is_empty_zone(struct btrfs_device *device, u64 pos) +{ + struct btrfs_zoned_device_info *zinfo = device->zone_info; + unsigned int zno; + + if (!zone_is_sequential(zinfo, pos)) + return true; + + zno = pos / zinfo->zone_size; + return zinfo->zones[zno].cond == BLK_ZONE_COND_EMPTY; +} + int btrfs_reset_dev_zone(int fd, struct blk_zone *zone); +u64 btrfs_find_allocatable_zones(struct btrfs_device *device, u64 hole_start, + u64 hole_end, u64 num_bytes); #else #define sbread(fd, buf, offset) \ pread64(fd, buf, BTRFS_SUPER_INFO_SIZE, offset) @@ -68,6 +96,29 @@ static inline int btrfs_reset_dev_zone(int fd, struct blk_zone *zone) return 0; } +static inline bool zone_is_sequential(struct btrfs_zoned_device_info *zinfo, + u64 bytenr) +{ + return false; +} + +static inline u64 btrfs_find_allocatable_zones(struct btrfs_device *device, + u64 hole_start, u64 hole_end, + u64 num_bytes) +{ + return hole_start; +} + +static inline bool btrfs_dev_is_empty_zone(struct btrfs_device *device, u64 pos) +{ + return true; +} + #endif /* BTRFS_ZONED */ +static inline bool btrfs_dev_is_sequential(struct btrfs_device *device, u64 pos) +{ + return zone_is_sequential(device->zone_info, pos); +} + #endif /* __BTRFS_ZONED_H__ */ From patchwork Mon Apr 26 06:27:28 2021 Content-Type: 
text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Naohiro Aota X-Patchwork-Id: 12223895 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.6 required=3.0 tests=BAYES_00,DKIM_INVALID, DKIM_SIGNED,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 835BEC43461 for ; Mon, 26 Apr 2021 06:28:27 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 6A5BB61153 for ; Mon, 26 Apr 2021 06:28:27 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231978AbhDZG3H (ORCPT ); Mon, 26 Apr 2021 02:29:07 -0400 Received: from esa3.hgst.iphmx.com ([216.71.153.141]:41929 "EHLO esa3.hgst.iphmx.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231965AbhDZG3D (ORCPT ); Mon, 26 Apr 2021 02:29:03 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=wdc.com; i=@wdc.com; q=dns/txt; s=dkim.wdc.com; t=1619418502; x=1650954502; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=815W11zYf3RRQCcQUxj1SXGqRFUWYkBCGH6U2OMgS6I=; b=JLblOSP5T24zXj2o4HnsY0CBo+8I1scswPg+S2tQG4TQDohD5+nF0nfE Hqyr0fI7ae+DKC0JkkebHvaUxBwS/ejfQebIeTKq93bzeL+eimixlnzxQ VM0v160nmJqr0dJVuDJ8DCPrkwdXGdMXCXpNyxxRXQtBoCJgSvofydqca ZyovHl5n/29PlL+uieEcO0Ded2pzTXF/5ePiUOy5B4F3GH++Z4kBpGl6Z pxf/PXIoCojca6waaCybTW+Nlk0PZsV812G8yM9DbCASe7hh+c0/DBSkr Wh9qO0QBWoOFrjzPhUDhqRyhXybJJENO6BArwogoxIRcxm2s0XOWLKuQ7 g==; IronPort-SDR: pLtDTHDoCaKrRuQPZhdymL+L1jhLcw+cIXKsXugRJj6Fupv/g0/0nyDNldjvL10g39Hz4aZijs BBzasivN75yXc74vVOuWYBfNPI5SLnOYdvEgXKPmJBPOcjkSsZiqiANASjeO/h+z/N2Nk15Wp5 
xOKCOMdn6I8aSTIvXOt0an2F4tKRVCIc09O+HWL6heNL3K8yGn4wAoBSTXCN3l4RMj0YcvE/GV 7wp+ks4TMFSPb6KVT5V9vl3DQZTxWY3mFCdc15PqJ1hkCHLKOOb8TcK9ovOL25RDj4W/5FRKKW s/Y= X-IronPort-AV: E=Sophos;i="5.82,251,1613404800"; d="scan'208";a="170788126" Received: from h199-255-45-14.hgst.com (HELO uls-op-cesaep01.wdc.com) ([199.255.45.14]) by ob1.hgst.iphmx.com with ESMTP; 26 Apr 2021 14:28:22 +0800 IronPort-SDR: +L5dgeRo6P3QhtT4Ew9bqjOSBe4+gcMKCF6cgG82bRfmIqDHOsijIUt39HYArjQRE46YsJMdl/ hq6QxLNXB4iR3WfCT5IOB6ZTU2RR4Ccf6uJ2iayT3V0AVFDmLVF1mB4eJqO/fp4HMiMePm/+Fl YpIGggeJs7vHcj35TjVGsItDkuNSYMBKhE84wc2+u2x14b4w24BH56OX2P9mfGTJL9NPr9ynce M9ihLWrxN1G5s/PGJSpdt36GXZY1o6OQtHSM3bigUWHCnwOJb9IbKFhoFfXR5s/yMlgVxKOABb UfaMpTns/m3wIfG1OFhYELzE Received: from uls-op-cesaip02.wdc.com ([10.248.3.37]) by uls-op-cesaep01.wdc.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 25 Apr 2021 23:08:46 -0700 IronPort-SDR: SJ/y91c6/zHyhSaaUNBMx1WzG0OSeD/1zBlfM/KCEHnh752Nd0g7ZI3U3rOiEiKDDKCezfP8e+ D/ZK2oDEC9q04no0LsoNJE5drOBdK8Iz2ajTC61H3Uw9/zx5D5547LVq9ggKgZlqE8vjULxyFR UIikzGQLr/p1/12+PI8RYQUb1nxjZb60CreAsDjljvZKZffs4CEAvZFuR/CbgIhGDzS/q2R6YR acg/FsppLJmKM5T9O+KYZmpdmh/P0Pcj9H5Dk+1GAz6EQ1KIJXDnvUCbKvfoSiq1VS0rBtL1Pt EVI= WDCIronportException: Internal Received: from bgy2573.ad.shared (HELO naota-xeon.wdc.com) ([10.225.48.58]) by uls-op-cesaip02.wdc.com with ESMTP; 25 Apr 2021 23:28:22 -0700 From: Naohiro Aota To: David Sterba Cc: linux-btrfs@vger.kernel.org, Josef Bacik , Naohiro Aota Subject: [PATCH 12/26] btrfs-progs: zoned: load zone's allocation offset Date: Mon, 26 Apr 2021 15:27:28 +0900 Message-Id: <545865d98227b802f24a39393f9297c6d7799c6a.1619416549.git.naohiro.aota@wdc.com> X-Mailer: git-send-email 2.31.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org A zoned filesystem must allocate blocks at the zones' write pointer. The device's write pointer position can be mapped to a logical address within a block group. 
To facilitate this, add an "alloc_offset" field to the block group to track the logical address of the write pointer. This logical address is populated in btrfs_load_block_group_zone_info() from the write pointers of the corresponding zones. For now, zoned filesystems support only the single profile. Supporting non-single profiles with zone append writing is not trivial. For example, in the DUP profile, we send a zone append write IO to two zones on a device. The device replies with the written LBAs for the IOs. If the offsets of the returned addresses from the beginning of their zones differ, the two copies end up at different logical addresses. Supporting such diverging physical addresses needs a fine-grained logical-to-physical mapping. Since that would require an additional metadata type, disable non-single profiles for now. This commit supports the case where all the zones in a block group are sequential. The next patch will handle the case of a block group with a conventional zone. Signed-off-by: Naohiro Aota --- kernel-shared/ctree.h | 6 ++ kernel-shared/extent-tree.c | 8 +++ kernel-shared/zoned.c | 133 ++++++++++++++++++++++++++++++++++++ kernel-shared/zoned.h | 8 +++ 4 files changed, 155 insertions(+) diff --git a/kernel-shared/ctree.h b/kernel-shared/ctree.h index 5023db474784..a68c8bd38bd2 100644 --- a/kernel-shared/ctree.h +++ b/kernel-shared/ctree.h @@ -1134,6 +1134,12 @@ struct btrfs_block_group { /* For dirty block groups */ struct list_head dirty_list; + + /* + * Allocation offset for the block group to implement sequential + * allocation. This is used only with ZONED mode enabled.
+ */ + u64 alloc_offset; }; struct btrfs_device; diff --git a/kernel-shared/extent-tree.c b/kernel-shared/extent-tree.c index 5b1fbe10283a..ec5ea9a8e090 100644 --- a/kernel-shared/extent-tree.c +++ b/kernel-shared/extent-tree.c @@ -31,6 +31,7 @@ #include "kernel-shared/volumes.h" #include "kernel-shared/free-space-cache.h" #include "kernel-shared/free-space-tree.h" +#include "kernel-shared/zoned.h" #include "common/utils.h" #define PENDING_EXTENT_INSERT 0 @@ -2704,6 +2705,10 @@ static int read_one_block_group(struct btrfs_fs_info *fs_info, } cache->space_info = space_info; + ret = btrfs_load_block_group_zone_info(fs_info, cache); + if (ret) + return ret; + btrfs_add_block_group_cache(fs_info, cache); return 0; } @@ -2761,6 +2766,9 @@ btrfs_add_block_group(struct btrfs_fs_info *fs_info, u64 bytes_used, u64 type, cache->start = chunk_offset; cache->length = size; + ret = btrfs_load_block_group_zone_info(fs_info, cache); + BUG_ON(ret); + cache->used = bytes_used; cache->flags = type; INIT_LIST_HEAD(&cache->dirty_list); diff --git a/kernel-shared/zoned.c b/kernel-shared/zoned.c index e828d633619a..8b51115e667f 100644 --- a/kernel-shared/zoned.c +++ b/kernel-shared/zoned.c @@ -14,6 +14,10 @@ /* Maximum number of zones to report per ioctl(BLKREPORTZONE) call */ #define BTRFS_REPORT_NR_ZONES 4096 +/* Invalid allocation pointer value for missing devices */ +#define WP_MISSING_DEV ((u64)-1) +/* Pseudo write pointer value for conventional zone */ +#define WP_CONVENTIONAL ((u64)-2) /* * Location of the first zone of superblock logging zone pairs. 
@@ -650,6 +654,135 @@ u64 btrfs_find_allocatable_zones(struct btrfs_device *device, u64 hole_start, return pos; } +int btrfs_load_block_group_zone_info(struct btrfs_fs_info *fs_info, + struct btrfs_block_group *cache) +{ + struct btrfs_device *device; + struct btrfs_mapping_tree *map_tree = &fs_info->mapping_tree; + struct cache_extent *ce; + struct map_lookup *map; + u64 logical = cache->start; + u64 length = cache->length; + u64 physical = 0; + int ret = 0; + int i; + u64 *alloc_offsets = NULL; + u32 num_sequential = 0, num_conventional = 0; + + if (!btrfs_is_zoned(fs_info)) + return 0; + + /* Sanity check */ + if (logical == BTRFS_BLOCK_RESERVED_1M_FOR_SUPER) { + if (length + SZ_1M != fs_info->zone_size) { + error("zoned: unaligned initial system block group"); + return -EIO; + } + } else if (!IS_ALIGNED(length, fs_info->zone_size)) { + error("zoned: unaligned block group at %llu + %llu", logical, + length); + return -EIO; + } + + /* Get the chunk mapping */ + ce = search_cache_extent(&map_tree->cache_tree, logical); + if (!ce) { + error("zoned: failed to find block group at %llu", logical); + return -ENOENT; + } + map = container_of(ce, struct map_lookup, ce); + + alloc_offsets = calloc(map->num_stripes, sizeof(*alloc_offsets)); + if (!alloc_offsets) { + error("zoned: failed to allocate alloc_offsets"); + return -ENOMEM; + } + + for (i = 0; i < map->num_stripes; i++) { + bool is_sequential; + struct blk_zone zone; + + device = map->stripes[i].dev; + physical = map->stripes[i].physical; + + if (device->fd == -1) { + alloc_offsets[i] = WP_MISSING_DEV; + continue; + } + + is_sequential = btrfs_dev_is_sequential(device, physical); + if (is_sequential) + num_sequential++; + else + num_conventional++; + + if (!is_sequential) { + alloc_offsets[i] = WP_CONVENTIONAL; + continue; + } + + /* + * The group is mapped to a sequential zone. Get the zone write + * pointer to determine the allocation offset within the zone. 
+ */ + WARN_ON(!IS_ALIGNED(physical, fs_info->zone_size)); + zone = device->zone_info->zones[physical / fs_info->zone_size]; + + switch (zone.cond) { + case BLK_ZONE_COND_OFFLINE: + case BLK_ZONE_COND_READONLY: + error( + "zoned: offline/readonly zone %llu on device %s (devid %llu)", + physical / fs_info->zone_size, device->name, + device->devid); + alloc_offsets[i] = WP_MISSING_DEV; + break; + case BLK_ZONE_COND_EMPTY: + alloc_offsets[i] = 0; + break; + case BLK_ZONE_COND_FULL: + alloc_offsets[i] = fs_info->zone_size; + break; + default: + /* Partially used zone */ + alloc_offsets[i] = + ((zone.wp - zone.start) << SECTOR_SHIFT); + break; + } + } + + if (num_conventional > 0) { + /* + * Since conventional zones do not have a write pointer, we + * cannot determine alloc_offset from the pointer + */ + ret = -EINVAL; + goto out; + } + + switch (map->type & BTRFS_BLOCK_GROUP_PROFILE_MASK) { + case 0: /* single */ + cache->alloc_offset = alloc_offsets[0]; + break; + case BTRFS_BLOCK_GROUP_DUP: + case BTRFS_BLOCK_GROUP_RAID1: + case BTRFS_BLOCK_GROUP_RAID0: + case BTRFS_BLOCK_GROUP_RAID10: + case BTRFS_BLOCK_GROUP_RAID5: + case BTRFS_BLOCK_GROUP_RAID6: + /* non-single profiles are not supported yet */ + default: + error("zoned: profile %s not yet supported", + btrfs_group_profile_str(map->type)); + ret = -EINVAL; + goto out; + } + +out: + free(alloc_offsets); + return ret; +} + #endif int btrfs_get_dev_zone_info_all_devices(struct btrfs_fs_info *fs_info) diff --git a/kernel-shared/zoned.h b/kernel-shared/zoned.h index 29c203f45ada..45d77c8daa69 100644 --- a/kernel-shared/zoned.h +++ b/kernel-shared/zoned.h @@ -85,6 +85,8 @@ static inline bool btrfs_dev_is_empty_zone(struct btrfs_device *device, u64 pos) int btrfs_reset_dev_zone(int fd, struct blk_zone *zone); u64 btrfs_find_allocatable_zones(struct btrfs_device *device, u64 hole_start, u64 hole_end, u64 num_bytes); +int btrfs_load_block_group_zone_info(struct btrfs_fs_info *fs_info, + struct btrfs_block_group *cache); 
#else #define sbread(fd, buf, offset) \ pread64(fd, buf, BTRFS_SUPER_INFO_SIZE, offset) @@ -114,6 +116,12 @@ static inline bool btrfs_dev_is_empty_zone(struct btrfs_device *device, u64 pos) return true; } +static inline int btrfs_load_block_group_zone_info( + struct btrfs_fs_info *fs_info, struct btrfs_block_group *cache) +{ + return 0; +} + #endif /* BTRFS_ZONED */ static inline bool btrfs_dev_is_sequential(struct btrfs_device *device, u64 pos) From patchwork Mon Apr 26 06:27:29 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Naohiro Aota X-Patchwork-Id: 12223893 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.6 required=3.0 tests=BAYES_00,DKIM_INVALID, DKIM_SIGNED,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id D96FFC43460 for ; Mon, 26 Apr 2021 06:28:27 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id BDA25611ED for ; Mon, 26 Apr 2021 06:28:27 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231979AbhDZG3H (ORCPT ); Mon, 26 Apr 2021 02:29:07 -0400 Received: from esa3.hgst.iphmx.com ([216.71.153.141]:41949 "EHLO esa3.hgst.iphmx.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231969AbhDZG3G (ORCPT ); Mon, 26 Apr 2021 02:29:06 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=wdc.com; i=@wdc.com; q=dns/txt; s=dkim.wdc.com; t=1619418504; x=1650954504; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=jXQWsLQNpFmsGr39U3jojTtB99oLuiG1HtmvFzyWGX4=; 
b=X84RCRexwxdlXzbxW/PHx1oHEqG8Otz2uMLUCzv4hsiUhqFke00NyAdD 6ujNA2Cmc3htdpprudIUyqLxDZihs4GowQJ/KhEIFp/3YoF559htf8Fxh NeqHTq4pTLLLWn8tffOOYyUb7cBbHDB6tWf21K+dbjcDnv1IsLwtAREli nA3LhDlGQz2BgQudp2At12XOm9QYY4pCyRhRhozyKtqcBu13m4cPIgOKF Ye8ArSR6kZIOVLLXgZyjDmNOoLdSU0TtgFa674jHC2O538IQp/08u4OC2 TryDRHD4MPO2hBuozBMAzFEU06JUA4PTHUmHtyZHsj4DQ6YDFhhgfJZpr w==; IronPort-SDR: /kRcr2zHtmVn3NXfu6txLtOezg/DYY2Fg1wbtKBhSE/JZi7/kDj9XQUhwrWPOt72xMe09v4IwT HVCDQeJBxENbAcUEsOhR3SiwRGeEKkyWOBvu1YrEMPnc0w8u0Zz9hRaDOmrpazJX7O/dp9JHS4 C0VKaCLnEapt5rJAZ5kFUSoEgC6VpbWcO6HLN1MKbq3y2Mbu1d6UlIpPifGL3TwX8sVMJG7RVj ewqAti7F1xKr1v2EbOhp6aQrctGb3+5VfTHT5ONOADSz0XBdcaUUYEsEuW6L9PrqrNI75XVVyI 4ZY= X-IronPort-AV: E=Sophos;i="5.82,251,1613404800"; d="scan'208";a="170788127" Received: from h199-255-45-14.hgst.com (HELO uls-op-cesaep01.wdc.com) ([199.255.45.14]) by ob1.hgst.iphmx.com with ESMTP; 26 Apr 2021 14:28:23 +0800 IronPort-SDR: iXIFZPBlPdXX+wXwOoCD7Q/RRSGXbYb7BaNO9UlyBJU6iqGh87JkWTUYmEM0F2nQFoLR4Vf/D3 z7dc8fjNwN2iSyrclmsjSSv1Zwdn0jNwdxbnfWXJ+BV23cUey54TzX75icjZSvBH8gzB4W1CJx DeUiOhRIjjwkcbHMmV2rNLYIRjISpQ9Qa4+C9r2IfY+C2OiGNLVEuKQhT8DAGm0/yr++g/q6fr YvUgUfiDDh+Zw/q2V+99Szm72BTezAhqmVrO/Hy0tdA8L+8kX/UsdCHcyqxSYlRfA5n1tRbyM4 C+78XdcVAgq+V6bLfsBrGSnM Received: from uls-op-cesaip02.wdc.com ([10.248.3.37]) by uls-op-cesaep01.wdc.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 25 Apr 2021 23:08:47 -0700 IronPort-SDR: 2N7NHyTWLbXyc1a5UuRB4aGT38i6jyW9Usy2rYAUzXDlcEy9rECY/tTpaD17fJSJhPUyAsRFrl Mja3uZxK69CIizpoalx46Gugiwr/NCS9pP1nIrz1JxzFM4Mjv15uTrc4p08GLmrsXvvEPQoywM hZSnl9nSlZsHaw/o2oewue/LGjflTrRW/KrLFbL+HziQEbLSY1ZMeT6ZAI2gij/ima4L0aw2+u NM03ZAc25xYSwCxwDdjZr5GXuX2u7gM4Mu0RI1bYFonkPc1NHV4rVYSGD7CtJ/QgCA/fGp5fVQ vm0= WDCIronportException: Internal Received: from bgy2573.ad.shared (HELO naota-xeon.wdc.com) ([10.225.48.58]) by uls-op-cesaip02.wdc.com with ESMTP; 25 Apr 2021 23:28:23 -0700 From: Naohiro Aota To: David Sterba Cc: linux-btrfs@vger.kernel.org, Josef Bacik , Naohiro Aota Subject: [PATCH 
13/26] btrfs-progs: zoned: implement sequential extent allocation Date: Mon, 26 Apr 2021 15:27:29 +0900 Message-Id: <7a8a75019747c596075def43b885f7721d6c2bbc.1619416549.git.naohiro.aota@wdc.com> X-Mailer: git-send-email 2.31.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org Implement a sequential extent allocator for zoned filesystems. This allocator only needs to check if there is enough space in the block group after the allocation pointer to satisfy the extent allocation request. Since the allocator is really simple, we implement it directly in find_search_start(). Signed-off-by: Naohiro Aota --- kernel-shared/extent-tree.c | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/kernel-shared/extent-tree.c b/kernel-shared/extent-tree.c index ec5ea9a8e090..7453bf9f49b6 100644 --- a/kernel-shared/extent-tree.c +++ b/kernel-shared/extent-tree.c @@ -284,6 +284,14 @@ again: if (cache->ro || !block_group_bits(cache, data)) goto new_group; + if (btrfs_is_zoned(root->fs_info)) { + if (cache->length - cache->alloc_offset < num) + goto new_group; + *start_ret = cache->start + cache->alloc_offset; + cache->alloc_offset += num; + return 0; + } + while(1) { ret = find_first_extent_bit(&root->fs_info->free_space_cache, last, &start, &end, EXTENT_DIRTY); From patchwork Mon Apr 26 06:27:30 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Naohiro Aota X-Patchwork-Id: 12223897 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.6 required=3.0 tests=BAYES_00,DKIM_INVALID, DKIM_SIGNED,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org 
(Postfix) with ESMTP id E5E05C433ED for ; Mon, 26 Apr 2021 06:28:31 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id CA985611F0 for ; Mon, 26 Apr 2021 06:28:31 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231983AbhDZG3I (ORCPT ); Mon, 26 Apr 2021 02:29:08 -0400 Received: from esa3.hgst.iphmx.com ([216.71.153.141]:41929 "EHLO esa3.hgst.iphmx.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231972AbhDZG3G (ORCPT ); Mon, 26 Apr 2021 02:29:06 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=wdc.com; i=@wdc.com; q=dns/txt; s=dkim.wdc.com; t=1619418504; x=1650954504; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=iQ/n5nLYIxw5jRuhvo4gw0nJIIoC8SyC1rasLoLInpI=; b=DXtEChKRWNZccHM2ZsCbevNoC3lzyAdS4pWZjZIUn17CHneqTR6yKUW8 lNSg5GkKA9xhsMk9rTeIlt5iM/EwsnRolRVqeSlvXfIlOVh8MMiLQR81O N/poZwNwXnWNRDaksY93RrEVPvLouqagx5FN9KI4zt12XbauPWAsTkq/D tvEj3CaHcGCzkGEc+y+qltKVLjAW1MjaqSYkCuZv4/zZe7YY8oooWiEJe yLxPhaiTmObER4RHIf43dsF/6biLBK0B6oNOnEcyYvAkeF+Rw0OLAPv0h vrtLEgEvzgcjsVr+07QKEO886EVV3K8/KgyH3+t4kcdatqH/bhufgEtxe Q==; IronPort-SDR: p1QBKbCzRDAFAU8DOI3SQdcVanQPhIG111GRm504qolRBxtuHFlV2IMfD2FkSOr3+WDt+HdXS1 hJGpDaZJKH8Tu9JV8VXAyfa1euHpNfT52r6sgo+o1E52V5NKcbrAHIOk1K7IO7RidMmR3MaRkr KdVWy8gYTHJrMatB7y4bSQToG5iGkXxrAA3d5vFUTtjKqifKEuYctUvdIXW8n6MPpvLbl10KFX 6T1Y3fbSquEgiRjhJmV5pQpoc5AddpKScPesWOOWZ+gUuzXKq9TLeFl+JkZMpuC5VULpCr0rjw BC8= X-IronPort-AV: E=Sophos;i="5.82,251,1613404800"; d="scan'208";a="170788129" Received: from h199-255-45-14.hgst.com (HELO uls-op-cesaep01.wdc.com) ([199.255.45.14]) by ob1.hgst.iphmx.com with ESMTP; 26 Apr 2021 14:28:24 +0800 IronPort-SDR: 9pvGEftP3yKr+E6yC9AnO2yQGv/VdUYnF5GMTetQ6WDXHuJ8Wy8B6zfaLwCoxyCyqsSaHtDpEt StFaq++AGyv7hkqYYkWz2KfMdgji0FpvMrhw7b/5fTI/Pv2ekIj+YitbatVWgqiT3WP+M7rcGE KOhJIya4ZhIePkhTd+L+hd7TzGma+BGHFL9UOcwn984R/OwQrTaxPlCE7gsbeIbTjH4+LEnaAn 
teG7YXHYCpDuwSzVmQ2FW/9TIRv1EE/qyo49KXcdKrXL4wJzgwnSiqyEa4VlpnSfB+nAt5tAUW Cb/OnPRCo3nI1k9q5mZ5f04d Received: from uls-op-cesaip02.wdc.com ([10.248.3.37]) by uls-op-cesaep01.wdc.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 25 Apr 2021 23:08:48 -0700 IronPort-SDR: tCiunZEryTfolYsIQxkM2hn3ItfmCMR/zHgIPpBwDmTXcxWrvCLI1rr0i/71sfka4qd7k6UzNN llv1IVGWr22lzDOJ8cRmoWQ6w/jfcn5PgQUwTcdMiZYPfL7lXV9alUKkGt+1lbl5U1UXMEPcOm 0fAWF6Vkv628zteto713EXdDWCZWR70r7G+2jKWpwXXole/M9TjpCiQyemqA68TA/Yx0b4RU+L 1dUI0540UKXQUSgGJU2ls+/T/1w43hDidyj+f0hy4lrY3iAsmfxUGtMVZ1Uerx8kd6tVeLiZ8d fac= WDCIronportException: Internal Received: from bgy2573.ad.shared (HELO naota-xeon.wdc.com) ([10.225.48.58]) by uls-op-cesaip02.wdc.com with ESMTP; 25 Apr 2021 23:28:24 -0700 From: Naohiro Aota To: David Sterba Cc: linux-btrfs@vger.kernel.org, Josef Bacik , Naohiro Aota Subject: [PATCH 14/26] btrfs-progs: zoned: calculate allocation offset for conventional zones Date: Mon, 26 Apr 2021 15:27:30 +0900 Message-Id: <5ea922c1b1410e0b41c0309106116927d77bdf5b.1619416549.git.naohiro.aota@wdc.com> X-Mailer: git-send-email 2.31.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org Conventional zones do not have a write pointer, so we cannot use one to determine the allocation offset for sequential allocation if a block group contains a conventional zone. Instead, we can use the end of the highest addressed extent in the block group as the allocation offset. For a new block group, we cannot calculate the allocation offset by consulting the extent tree, because that can cause a deadlock by taking the extent buffer lock after the chunk mutex, which is already taken in btrfs_make_block_group(). Since it is a new block group anyway, we can simply set the allocation offset to 0.
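As a rough illustration of the rule described above (not the code from this patch, which walks the extent tree with btrfs_previous_extent_item()), the following minimal sketch takes the end of the highest addressed extent as the allocation offset. The struct extent_rec type and the alloc_offset_from_extents() helper are hypothetical names used only for this example, and the extents are assumed to be available as a plain array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified extent record: logical start and length in bytes. */
struct extent_rec {
	uint64_t start;
	uint64_t length;
};

/*
 * Store in *offset_ret the allocation offset, relative to bg_start, of a
 * block group made of conventional zones: the end of the highest addressed
 * extent. Returns 0 on success, or -1 if an extent lies outside the block
 * group. A block group without extents yields offset 0.
 */
static int alloc_offset_from_extents(const struct extent_rec *extents, size_t n,
				     uint64_t bg_start, uint64_t bg_length,
				     uint64_t *offset_ret)
{
	uint64_t highest_end = bg_start;
	size_t i;

	assert(offset_ret != NULL);

	for (i = 0; i < n; i++) {
		uint64_t end = extents[i].start + extents[i].length;

		/* Every extent must be fully contained in the block group. */
		if (extents[i].start < bg_start || end > bg_start + bg_length)
			return -1;
		if (end > highest_end)
			highest_end = end;
	}

	*offset_ret = highest_end - bg_start;
	return 0;
}
```

With no extents the helper returns offset 0, which is the same value the commit message chooses for a new block group.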
Signed-off-by: Naohiro Aota --- kernel-shared/zoned.c | 85 ++++++++++++++++++++++++++++++++++++++++--- 1 file changed, 79 insertions(+), 6 deletions(-) diff --git a/kernel-shared/zoned.c b/kernel-shared/zoned.c index 8b51115e667f..715a7881328c 100644 --- a/kernel-shared/zoned.c +++ b/kernel-shared/zoned.c @@ -654,6 +654,67 @@ u64 btrfs_find_allocatable_zones(struct btrfs_device *device, u64 hole_start, return pos; } +/* + * Calculate an allocation pointer from the extent allocation information + * for a block group consist of conventional zones. It is pointed to the + * end of the highest addressed extent in the block group as an allocation + * offset. + */ +static int calculate_alloc_pointer(struct btrfs_fs_info *fs_info, + struct btrfs_block_group *cache, + u64 *offset_ret) +{ + struct btrfs_root *root = fs_info->extent_root; + struct btrfs_path *path; + struct btrfs_key key; + struct btrfs_key found_key; + int ret; + u64 length; + + path = btrfs_alloc_path(); + if (!path) + return -ENOMEM; + + key.objectid = cache->start + cache->length; + key.type = 0; + key.offset = 0; + + ret = btrfs_search_slot(NULL, root, &key, path, 0, 0); + /* We should not find the exact match */ + if (!ret) + ret = -EUCLEAN; + if (ret < 0) + goto out; + + ret = btrfs_previous_extent_item(root, path, cache->start); + if (ret) { + if (ret == 1) { + ret = 0; + *offset_ret = 0; + } + goto out; + } + + btrfs_item_key_to_cpu(path->nodes[0], &found_key, path->slots[0]); + + if (found_key.type == BTRFS_EXTENT_ITEM_KEY) + length = found_key.offset; + else + length = fs_info->nodesize; + + if (!(found_key.objectid >= cache->start && + found_key.objectid + length <= cache->start + cache->length)) { + ret = -EUCLEAN; + goto out; + } + *offset_ret = found_key.objectid + length - cache->start; + ret = 0; + +out: + btrfs_free_path(path); + return ret; +} + int btrfs_load_block_group_zone_info(struct btrfs_fs_info *fs_info, struct btrfs_block_group *cache) { @@ -667,6 +728,7 @@ int 
btrfs_load_block_group_zone_info(struct btrfs_fs_info *fs_info, int ret = 0; int i; u64 *alloc_offsets = NULL; + u64 last_alloc = 0; u32 num_sequential = 0, num_conventional = 0; if (!btrfs_is_zoned(fs_info)) @@ -752,12 +814,16 @@ int btrfs_load_block_group_zone_info(struct btrfs_fs_info *fs_info, } if (num_conventional > 0) { - /* - * Since conventional zones do not have a write pointer, we - * cannot determine alloc_offset from the pointer - */ - ret = -EINVAL; - goto out; + ret = calculate_alloc_pointer(fs_info, cache, &last_alloc); + if (ret || map->num_stripes == num_conventional) { + if (!ret) + cache->alloc_offset = last_alloc; + else + error( + "zoned: failed to determine allocation offset of bg %llu", + cache->start); + goto out; + } } switch (map->type & BTRFS_BLOCK_GROUP_PROFILE_MASK) { @@ -779,6 +845,13 @@ int btrfs_load_block_group_zone_info(struct btrfs_fs_info *fs_info, } out: + /* An extent is allocated after the write pointer */ + if (!ret && num_conventional && last_alloc > cache->alloc_offset) { + error("zoned: got wrong write pointer in BG %llu: %llu > %llu", + logical, last_alloc, cache->alloc_offset); + ret = -EIO; + } + free(alloc_offsets); return ret; } From patchwork Mon Apr 26 06:27:31 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Naohiro Aota X-Patchwork-Id: 12223899 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.6 required=3.0 tests=BAYES_00,DKIM_INVALID, DKIM_SIGNED,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0B49EC433B4 for ; Mon, 26 Apr 2021 06:28:32 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org 
[23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id E173F611CE for ; Mon, 26 Apr 2021 06:28:31 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231987AbhDZG3J (ORCPT ); Mon, 26 Apr 2021 02:29:09 -0400 Received: from esa3.hgst.iphmx.com ([216.71.153.141]:41951 "EHLO esa3.hgst.iphmx.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231937AbhDZG3H (ORCPT ); Mon, 26 Apr 2021 02:29:07 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=wdc.com; i=@wdc.com; q=dns/txt; s=dkim.wdc.com; t=1619418505; x=1650954505; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=tx0Mt8hBm9+wyuz6+bwxLY4Uo3mRSJjrmg+WARGheWQ=; b=h83b+o0QL13WpB+guyIjilDoVV96N5zI4ydrGg1kvIpm4VZzqFgOhFF0 4WFljXjmzMVyj5Mu1fq4OOOklCnHlsTsQH8xK7cDuvX9U+pTXUnARjFd+ d8HxmRiMv61IRCwZzJH2Mh49pFcYXwQOyfUd3VzXbAvjRX0ir2tZxTHCX ab/zxD9GBB4vXPLo4XG+eq/cJAHcb5J2dciRWUVtzqRJkTnFJIE/Jx8D/ EgUIVX3XSrPYlvp7H+60SLlqaqtqfva/6oZk5fetTjMFdJE3fHz3j2SXX d6NsLCLCLdUu5ORV1NCYuLS1ge3fAvoFzicr0Dr7o7a37bcE8qa/Cf19e A==; IronPort-SDR: V79lxG1xbccYPCRKrmECWsS6zZWC8PEjkCXXVSFZEzZYm8DNSkWHkYwG2FTqJ9fAPp8uN9FFDY OkGfqV7EES0IVldkqPck3cac+ogzYXrvNyQLc/kotc8RR8TfZXMbJvWeCm+FaaITef36rVGqSN 2A8Jzy1puopGC01U8AyxZjl68kFUIvK8rCEHYp4KqcE2fFSE50+9Hl2lj/LONatZfGDreK+fXk jOBjzaM8+j+zEbDZND8PwSdcQPN0EZdeLXCUt/aQQ8pEzQB0M44qTXxMOp9Dpvs8JHE2HkGst6 Pks= X-IronPort-AV: E=Sophos;i="5.82,251,1613404800"; d="scan'208";a="170788130" Received: from h199-255-45-14.hgst.com (HELO uls-op-cesaep01.wdc.com) ([199.255.45.14]) by ob1.hgst.iphmx.com with ESMTP; 26 Apr 2021 14:28:25 +0800 IronPort-SDR: BXdCgwRwzYbAM5epfa8b93GYmoJuw7eEfe13G4ZLtB9k2bKsHC2yfFPC12f4ck/WWYyxFnIDlD 2Jazqe1cJ7Ag5PROHeRllxQaqT+wC0Db/YCFfhK1MQkj1tsckVq6/LebwOiPi9Y90pNPpq6Fbf xkay8k6IIllaliLl3u9wXGnHPKy0vb28kdhtWFonG3M8labP7vgkiltVwVDqfUaKPNV18lFuNM 6ZZu2LMcesAWW1sksrLEAfyVAI9IdctWLb2Od8qCHN0z84L+NB08wwsaDw01WSaNdE8ggN4lmw +6MN428xABcUbC4eepIsoLX7 Received: from 
uls-op-cesaip02.wdc.com ([10.248.3.37]) by uls-op-cesaep01.wdc.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 25 Apr 2021 23:08:49 -0700 IronPort-SDR: aHDR/4YzyrtTW8n7jEspTs9N3euHMjfXwiIiYnCBUXgemB0kPVLGBvwZU5Qkq3BWQ+RsrjkdCa kSTa84Vk2uvVRuKKBbuQ0ib4KdfyjCCNIEVa2J1wBLuD3WjwAhvoNmu4ouQetbQPksNHYZvFGU 7VvtpAmQ9LDa4QY33PRpHqBLx8eptmD+3dPQ2d3BxN/dWrgP073ZGTdgBgoP3y+0DVl4bDs3iu 9mlEVDGqGQ4C4xbkLyP3ufjpV3hg1jIMLxu6URWAmmxeIbpD7ibJaihXA1MQqFnyPVKFhO+126 uNM= WDCIronportException: Internal Received: from bgy2573.ad.shared (HELO naota-xeon.wdc.com) ([10.225.48.58]) by uls-op-cesaip02.wdc.com with ESMTP; 25 Apr 2021 23:28:25 -0700 From: Naohiro Aota To: David Sterba Cc: linux-btrfs@vger.kernel.org, Josef Bacik , Naohiro Aota Subject: [PATCH 15/26] btrfs-progs: zoned: redirty clean extent buffers in zoned btrfs Date: Mon, 26 Apr 2021 15:27:31 +0900 Message-Id: X-Mailer: git-send-email 2.31.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org Tree-manipulating operations such as merging nodes often release once-allocated tree nodes. Btrfs cleans such nodes so that pages in the node are not uselessly written out. On ZONED drives, however, this optimization blocks subsequent I/Os, because cancelling the write-out of the freed blocks breaks the sequential write sequence expected by the device. This patch checks whether the next dirty extent buffer is contiguous with the previously written one. If not, it redirties the extent buffers between the previous one and the next one, so that all dirty buffers are written sequentially.
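
The gap-filling rule described above can be sketched as a small standalone model. This is only an illustration, not the code in this patch: struct bg_model, redirty_needed(), first_dirty_run() and commit_model() are hypothetical names, the dirty extent tree is replaced by a plain bool array, and the write offset is assumed to be node-aligned.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NODESIZE 16384u /* assumed node size for this sketch */

/* Hypothetical stand-in for a block group; only what the sketch needs. */
struct bg_model {
	uint64_t start;        /* logical start of the block group */
	uint64_t write_offset; /* bytes written so far, relative to start */
};

/*
 * If the dirty range [start, end] begins past the current write position,
 * report the node that has to be redirtied first and return true so that
 * the caller rescans. Otherwise account the range as written.
 */
static bool redirty_needed(struct bg_model *bg, uint64_t start, uint64_t end,
			   uint64_t *redirty_at)
{
	uint64_t pos = bg->start + bg->write_offset;

	if (pos < start) {
		*redirty_at = pos;
		return true;
	}
	bg->write_offset += end + 1 - start;
	return false;
}

/* Find the first run of dirty nodes; false when nothing is dirty. */
static bool first_dirty_run(const bool *dirty, size_t n,
			    size_t *first, size_t *last)
{
	size_t i = 0;

	while (i < n && !dirty[i])
		i++;
	if (i == n)
		return false;
	*first = i;
	while (i + 1 < n && dirty[i + 1])
		i++;
	*last = i;
	return true;
}

/*
 * Model of the commit loop: "write" every dirty node in address order,
 * redirtying clean nodes that sit between the write position and the
 * next dirty node. Returns the number of nodes written.
 */
static size_t commit_model(struct bg_model *bg, bool *dirty, size_t n)
{
	size_t first, last, written = 0;

	while (first_dirty_run(dirty, n, &first, &last)) {
		uint64_t start = bg->start + first * NODESIZE;
		uint64_t end = bg->start + (last + 1) * NODESIZE - 1;
		uint64_t redirty_at;

		if (redirty_needed(bg, start, end, &redirty_at)) {
			/* Fill one node of the gap, then rescan. */
			dirty[(redirty_at - bg->start) / NODESIZE] = true;
			continue;
		}
		for (size_t i = first; i <= last; i++) {
			dirty[i] = false; /* written out */
			written++;
		}
	}
	return written;
}
```

With nodes 0 and 3 dirty and nothing written yet, the model writes nodes 0 through 3 in order: nodes 1 and 2 are redirtied one at a time, which matches the one-buffer-per-rescan behaviour of the "goto again" loop in this patch.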
Signed-off-by: Naohiro Aota --- kernel-shared/ctree.h | 1 + kernel-shared/transaction.c | 6 ++++++ kernel-shared/zoned.c | 30 ++++++++++++++++++++++++++++++ kernel-shared/zoned.h | 8 ++++++++ 4 files changed, 45 insertions(+) diff --git a/kernel-shared/ctree.h b/kernel-shared/ctree.h index a68c8bd38bd2..3cca60323e3d 100644 --- a/kernel-shared/ctree.h +++ b/kernel-shared/ctree.h @@ -1140,6 +1140,7 @@ struct btrfs_block_group { * allocation. This is used only with ZONED mode enabled. */ u64 alloc_offset; + u64 write_offset; }; struct btrfs_device; diff --git a/kernel-shared/transaction.c b/kernel-shared/transaction.c index a2e53fb8dfca..5b991651c28e 100644 --- a/kernel-shared/transaction.c +++ b/kernel-shared/transaction.c @@ -18,6 +18,7 @@ #include "kernel-shared/disk-io.h" #include "kernel-shared/transaction.h" #include "kernel-shared/delayed-ref.h" +#include "kernel-shared/zoned.h" #include "common/messages.h" struct btrfs_trans_handle* btrfs_start_transaction(struct btrfs_root *root, @@ -138,10 +139,15 @@ int __commit_transaction(struct btrfs_trans_handle *trans, int ret; while(1) { +again: ret = find_first_extent_bit(tree, 0, &start, &end, EXTENT_DIRTY); if (ret) break; + + if (btrfs_redirty_extent_buffer_for_zoned(fs_info, start, end)) + goto again; + while(start <= end) { eb = find_first_extent_buffer(tree, start); BUG_ON(!eb || eb->start != start); diff --git a/kernel-shared/zoned.c b/kernel-shared/zoned.c index 715a7881328c..793c524ed66f 100644 --- a/kernel-shared/zoned.c +++ b/kernel-shared/zoned.c @@ -852,10 +852,40 @@ out: ret = -EIO; } + if (!ret) + cache->write_offset = cache->alloc_offset; + free(alloc_offsets); return ret; } +bool btrfs_redirty_extent_buffer_for_zoned(struct btrfs_fs_info *fs_info, + u64 start, u64 end) +{ + u64 next; + struct btrfs_block_group *cache; + struct extent_buffer *eb; + + if (!btrfs_is_zoned(fs_info)) + return false; + + cache = btrfs_lookup_first_block_group(fs_info, start); + BUG_ON(!cache); + + if (cache->start + 
cache->write_offset < start) { + next = cache->start + cache->write_offset; + BUG_ON(next + fs_info->nodesize > start); + eb = btrfs_find_create_tree_block(fs_info, next); + btrfs_mark_buffer_dirty(eb); + free_extent_buffer(eb); + return true; + } + + cache->write_offset += (end + 1 - start); + + return false; +} + #endif int btrfs_get_dev_zone_info_all_devices(struct btrfs_fs_info *fs_info) diff --git a/kernel-shared/zoned.h b/kernel-shared/zoned.h index 45d77c8daa69..1ba5a9939a3c 100644 --- a/kernel-shared/zoned.h +++ b/kernel-shared/zoned.h @@ -87,6 +87,8 @@ u64 btrfs_find_allocatable_zones(struct btrfs_device *device, u64 hole_start, u64 hole_end, u64 num_bytes); int btrfs_load_block_group_zone_info(struct btrfs_fs_info *fs_info, struct btrfs_block_group *cache); +bool btrfs_redirty_extent_buffer_for_zoned(struct btrfs_fs_info *fs_info, + u64 start, u64 end); #else #define sbread(fd, buf, offset) \ pread64(fd, buf, BTRFS_SUPER_INFO_SIZE, offset) @@ -122,6 +124,12 @@ static inline int btrfs_load_block_group_zone_info( return 0; } +static inline bool btrfs_redirty_extent_buffer_for_zoned( + struct btrfs_fs_info *fs_info, u64 start, u64 end) +{ + return false; +} + #endif /* BTRFS_ZONED */ static inline bool btrfs_dev_is_sequential(struct btrfs_device *device, u64 pos) From patchwork Mon Apr 26 06:27:32 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Naohiro Aota X-Patchwork-Id: 12223901 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.6 required=3.0 tests=BAYES_00,DKIM_INVALID, DKIM_SIGNED,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 284C7C43460 for 
; Mon, 26 Apr 2021 06:28:33 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 07ADA611CE for ; Mon, 26 Apr 2021 06:28:33 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232006AbhDZG3M (ORCPT ); Mon, 26 Apr 2021 02:29:12 -0400 Received: from esa3.hgst.iphmx.com ([216.71.153.141]:41951 "EHLO esa3.hgst.iphmx.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231982AbhDZG3I (ORCPT ); Mon, 26 Apr 2021 02:29:08 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=wdc.com; i=@wdc.com; q=dns/txt; s=dkim.wdc.com; t=1619418506; x=1650954506; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=MuNYcdAPn6VD4bz8GMSau8kyyerND+W88ZSYtOfv7mA=; b=DRCR3ZSpBvAi/M58YYg3JLMZqugsQyVF/QVhiXGebUDChqMEiUa0mNbs wyG1wU41V1KDeMs1nTy7pn+C4NHSdYYLzd7hZFfwWQTUztNtotUYYvYrz mHDRAB/3P1AectacmS2ghsgNl0FGGePbSc+KpENl0w5DHu8uYVZKU9GRX +NSrDGixRbPK0my+rTu+PTKQ+ME7ndgWzahtD6QIDnCZnQNv+nnb/n/JK 7XTdkjgn5FLmsgF6LkHtjr45X3D3y6Trjs10d+HSSmUpWHlbJhI8ZLpxU IWhXZAxEpQJPkIKRKZJ7RuaBecMtOomihp033plQ4Bw6UJtLxHWcc8OKY A==; IronPort-SDR: PoU/yM7crHFH0n7pAo18/QAWf/uGYLtsgSQV9FgwXIexkPCNccYgkKN4AUkrt9bs7Al+3K+y/8 c7VCE7noYIBjEOaQaRvstTIMM+JvJHoARUhsE6djVypzYQr02BGsm4P6/C1fprelEb2F+TCwhF lEOaVMIfvqYaGbivgQjd+aJIfKkWoRdYhXBDsLbjM1IItuNLjvGvO6OLrxuRm33D+w4J//abpM vKZGQTxzgWcll74fmYZV0fYx0uMITMfSHi6DnSvvbq7nh64UYXQreXBnmcUGpgw81vhnCSepAC fJo= X-IronPort-AV: E=Sophos;i="5.82,251,1613404800"; d="scan'208";a="170788131" Received: from h199-255-45-14.hgst.com (HELO uls-op-cesaep01.wdc.com) ([199.255.45.14]) by ob1.hgst.iphmx.com with ESMTP; 26 Apr 2021 14:28:26 +0800 IronPort-SDR: jpVDtJk9d9uEj4DYlxI9JFzRFF8TiTYGhrOnzytPpQPEiYrKKPC/j1raOUiaDwuHLZ5woqh3qX DUjLk1k9s8Hh26AtbgydZuxZm4augLAy5buQsS0oLinOnDA3kuMMt5GOIhx4u/zTjkxS8UjChn 0FhUQNQ4Ot6SEDCupEfeHTL5BOSOZcoyVIICzvQcTZBjyE0n8OX7zhXcRZ9AlzniV1cvtgCDFH 
2irFZXRUXN4ACKfM7rfr6eHODfVZnVjrQASVmJfjm6Hs2En4/ryB1kAPYV2wQSAdeLjMGakC6q Qox04rUVx/3Pf8Tf3+CXmknr Received: from uls-op-cesaip02.wdc.com ([10.248.3.37]) by uls-op-cesaep01.wdc.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 25 Apr 2021 23:08:50 -0700 IronPort-SDR: gvqVbTpjYDqTq3un4Juue8/gigouwYRKTS9jgt/gId/tA5nMOWf9dtMzGDSs3byvPQGIjz8rQG pCE66YdC6zvb+EkYoDsn7L2zAMehP7UZqPZWKSbHT64kEgcX/VN7KZVY1rhHzulzvYF8tyACCs vGbJ3cmJu8JP6pTdqY0G71Ln19RF/CttDlkL9O1BINBa26KH22GSZc1zy0nhe8OOPqPGs7lTWW mmZLIF/Wh8ytIwQz+URUVkdEbfUq1JoWbm34P3/5Fn2UE0i9dyD92aMtoHyiJ913KEtqIAKveu i8Y= WDCIronportException: Internal Received: from bgy2573.ad.shared (HELO naota-xeon.wdc.com) ([10.225.48.58]) by uls-op-cesaip02.wdc.com with ESMTP; 25 Apr 2021 23:28:26 -0700 From: Naohiro Aota To: David Sterba Cc: linux-btrfs@vger.kernel.org, Josef Bacik , Naohiro Aota Subject: [PATCH 16/26] btrfs-progs: zoned: reset zone of freed block group Date: Mon, 26 Apr 2021 15:27:32 +0900 Message-Id: <501d22b99fbdf8439fd9726ead439d29b5de8363.1619416549.git.naohiro.aota@wdc.com> X-Mailer: git-send-email 2.31.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org When freeing a chunk, we should reset the underlying device zones for the chunk. This commit introduces btrfs_reset_chunk_zones() and uses it to reset those zones.
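
As a rough illustration of the per-stripe lookup, the following standalone model maps a stripe's device offset to a zone index and rewinds that zone only if it is sequential. It is a sketch under stated assumptions: struct zone_model, struct dev_model and reset_stripe_zone() are hypothetical names, and the zone reset itself is modelled by clearing a write pointer instead of issuing an ioctl.

```c
#include <stddef.h>
#include <stdint.h>

enum zone_kind { ZONE_CONVENTIONAL, ZONE_SEQUENTIAL };

/* Hypothetical in-memory zone: its kind and a write pointer in bytes. */
struct zone_model {
	enum zone_kind kind;
	uint64_t wp;
};

/* Hypothetical device: id, uniform zone size, and its zones. */
struct dev_model {
	uint64_t devid;
	uint64_t zone_size;
	size_t nr_zones;
	struct zone_model *zones;
};

/*
 * Find the device with the given devid and rewind the zone that contains
 * `offset`, but only if that zone is sequential. Returns the number of
 * zones reset, or -1 if the offset lies outside the device's zones.
 */
static int reset_stripe_zone(struct dev_model *devs, size_t ndevs,
			     uint64_t devid, uint64_t offset)
{
	int nreset = 0;

	for (size_t i = 0; i < ndevs; i++) {
		struct dev_model *dev = &devs[i];
		size_t idx;

		if (dev->devid != devid)
			continue;

		idx = (size_t)(offset / dev->zone_size);
		if (idx >= dev->nr_zones)
			return -1;

		/* Conventional zones have no write pointer to rewind. */
		if (dev->zones[idx].kind != ZONE_SEQUENTIAL)
			continue;

		dev->zones[idx].wp = 0;
		nreset++;
	}
	return nreset;
}
```

Like the helper in this patch, the model looks only at the zone containing the stripe offset. The out-of-range check is an addition of the sketch.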
Signed-off-by: Naohiro Aota --- kernel-shared/extent-tree.c | 10 ++++++++++ kernel-shared/zoned.c | 28 ++++++++++++++++++++++++++++ kernel-shared/zoned.h | 8 ++++++++ 3 files changed, 46 insertions(+) diff --git a/kernel-shared/extent-tree.c b/kernel-shared/extent-tree.c index 7453bf9f49b6..e3ffe146606f 100644 --- a/kernel-shared/extent-tree.c +++ b/kernel-shared/extent-tree.c @@ -21,6 +21,7 @@ #include #include #include "kerncompat.h" +#include "kernel-lib/list.h" #include "kernel-lib/radix-tree.h" #include "kernel-lib/rbtree.h" #include "kernel-shared/ctree.h" @@ -3013,6 +3014,15 @@ static int free_chunk_dev_extent_items(struct btrfs_trans_handle *trans, struct btrfs_chunk); num_stripes = btrfs_chunk_num_stripes(path->nodes[0], chunk); for (i = 0; i < num_stripes; i++) { + u64 devid = btrfs_stripe_devid_nr(path->nodes[0], chunk, i); + u64 offset = btrfs_stripe_offset_nr(path->nodes[0], chunk, i); + u64 length = btrfs_stripe_length(fs_info, path->nodes[0], + chunk); + + ret = btrfs_reset_chunk_zones(fs_info, devid, offset, length); + if (ret < 0) + goto out; + ret = free_dev_extent_item(trans, fs_info, btrfs_stripe_devid_nr(path->nodes[0], chunk, i), btrfs_stripe_offset_nr(path->nodes[0], chunk, i)); diff --git a/kernel-shared/zoned.c b/kernel-shared/zoned.c index 793c524ed66f..22e0245abaf6 100644 --- a/kernel-shared/zoned.c +++ b/kernel-shared/zoned.c @@ -886,6 +886,34 @@ bool btrfs_redirty_extent_buffer_for_zoned(struct btrfs_fs_info *fs_info, return false; } +int btrfs_reset_chunk_zones(struct btrfs_fs_info *fs_info, u64 devid, + u64 offset, u64 length) +{ + struct btrfs_device *device; + + list_for_each_entry(device, &fs_info->fs_devices->devices, + dev_list) { + struct btrfs_zoned_device_info *zinfo; + struct blk_zone *reset; + + if (device->devid != devid) + continue; + + zinfo = device->zone_info; + if (!zone_is_sequential(zinfo, offset)) + continue; + + reset = &zinfo->zones[offset / zinfo->zone_size]; + if (btrfs_reset_dev_zone(device->fd, reset)) { + 
error("zoned: failed to reset zone %llu: %m", + offset / zinfo->zone_size); + return -EIO; + } + } + + return 0; +} + #endif int btrfs_get_dev_zone_info_all_devices(struct btrfs_fs_info *fs_info) diff --git a/kernel-shared/zoned.h b/kernel-shared/zoned.h index 1ba5a9939a3c..70044acc4d94 100644 --- a/kernel-shared/zoned.h +++ b/kernel-shared/zoned.h @@ -89,6 +89,8 @@ int btrfs_load_block_group_zone_info(struct btrfs_fs_info *fs_info, struct btrfs_block_group *cache); bool btrfs_redirty_extent_buffer_for_zoned(struct btrfs_fs_info *fs_info, u64 start, u64 end); +int btrfs_reset_chunk_zones(struct btrfs_fs_info *fs_info, u64 devid, + u64 offset, u64 length); #else #define sbread(fd, buf, offset) \ pread64(fd, buf, BTRFS_SUPER_INFO_SIZE, offset) @@ -130,6 +132,12 @@ static inline bool btrfs_redirty_extent_buffer_for_zoned( return false; } +static inline int btrfs_reset_chunk_zones(struct btrfs_fs_info *fs_info, + u64 devid, u64 offset, u64 length) +{ + return 0; +} + #endif /* BTRFS_ZONED */ static inline bool btrfs_dev_is_sequential(struct btrfs_device *device, u64 pos) From patchwork Mon Apr 26 06:27:33 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Naohiro Aota X-Patchwork-Id: 12223905 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.8 required=3.0 tests=BAYES_00,DKIM_INVALID, DKIM_SIGNED,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,UNWANTED_LANGUAGE_BODY, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 5423AC43461 for ; Mon, 26 Apr 2021 06:28:35 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 353DD60FE5 for ; Mon, 26 Apr 2021 
06:28:35 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232027AbhDZG3P (ORCPT ); Mon, 26 Apr 2021 02:29:15 -0400 Received: from esa3.hgst.iphmx.com ([216.71.153.141]:41951 "EHLO esa3.hgst.iphmx.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231985AbhDZG3J (ORCPT ); Mon, 26 Apr 2021 02:29:09 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=wdc.com; i=@wdc.com; q=dns/txt; s=dkim.wdc.com; t=1619418507; x=1650954507; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=V06DWIQPAuC54kVXygCBbgmdtbthaINtC6mttwoRURo=; b=Azc1DWxQpIFTyZzUbkQElGQxeiwN4svH5yKicNfldUcr7GP41CvsFiQM ghjwAAgiwtZIwaAz/d8W+LrlMy9LihJLSGk9pvNX1KBVagK5kZAHPNNSE crVpVRS9wNpt/8RajpQz/uCpvORygc0yxUEcRjhYjVsipD9/12O8YZJrp tRmNx7G4e1AKPb+lg0lYyxreevx3d0r/XLInquhZPJyLUdPB+UmqbQhlP kDm+X0RdfKji8JQ0CrH9E202tNoEhq8MGhmjwpudbmyMFNQD2/8SFEbCz hwqokL/eU4ztz5H3VCJHa+tn3vP7RYE9CHkROkdEowIqjyq/NxrNG+ya9 Q==; IronPort-SDR: dzfXKTM+VPEPTt3NegSYTZnF/LqoLkTNKFazmv/ocvtAbIl4+4mR5o35reOmImApKD10SfEU3z j8U0DjfejrYNicoQsFa6W/x4BMcidpB/DGyb60tn0t5BgSexDwC8PdhkjKN7C7rQyRE2TBlwV7 EsWt62MGdL2xPEJPcOjQsWt6BcMz/uBTW+jL6pABpaMWVNkl5AEAPUrUzX2vdzIuqNdtFZj6Op iG77PGWxkRYb+HmG2A/4IqqPWOyv84vgpMXzsFDxz3uSnmRg+aJ2nxvh7BEVNcEc9c8zEzPJG7 LVc= X-IronPort-AV: E=Sophos;i="5.82,251,1613404800"; d="scan'208";a="170788133" Received: from h199-255-45-14.hgst.com (HELO uls-op-cesaep01.wdc.com) ([199.255.45.14]) by ob1.hgst.iphmx.com with ESMTP; 26 Apr 2021 14:28:27 +0800 IronPort-SDR: 2QKZ9Z6RWn1/MRmtxQKtoFqzpBb0DOCnObOAzbVAkq8TVio6B+QVXQESk4Xczi4i+TcrTpPS6k 8FM8RoDuvqfzEY+4QFla03PvzL9B2CX4fEDZoeBBvGvVDByE2hTD2j4psHGyCJV62ZELDGKxio DJ/Tl/wnZbMefTp+2jnyaW1GHP8V1z1kga4d/1wGXNmkslAIAf0EM9Y0rGz8qcLqb8O5CJoxHX c3MUjFWh+COFI0jCLrCpGqTK3RFxBUI/ITAbz31NCrfzKP76aDrgTTnWhrIBPnxZjpII0MhDbW BtNEfcz6sORswpNfvFj0bG0R Received: from uls-op-cesaip02.wdc.com ([10.248.3.37]) by uls-op-cesaep01.wdc.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 
25 Apr 2021 23:08:51 -0700 IronPort-SDR: 1SjEG1dBd9BnnjnTgZ8RBbFIwgtaEZ9463LtcKuJhSmtFjrbIEiZG2fKHQ8lBTPatrBzTu2MWs /YMVa+WfP3N3xaXrPaZQf5NRewUx77tmuNTUtFakJ1P2Rika6mdvEQv4rd73dsYUDMBB0b+o/+ wI47uiFOcM3dZo72KDuLaEXtYWrmSb1Nixuh3bgkeUHqyYqqFH3hO00cdu7XrF9edKzuVIiK9Z cxVVScBLVp7zRsl/YuSD5+w3bAQOt695p5ubwpp/9eW6tMelcbSnVzZJND8DMQt5QtZDcQPQSL 5YE= WDCIronportException: Internal Received: from bgy2573.ad.shared (HELO naota-xeon.wdc.com) ([10.225.48.58]) by uls-op-cesaip02.wdc.com with ESMTP; 25 Apr 2021 23:28:27 -0700 From: Naohiro Aota To: David Sterba Cc: linux-btrfs@vger.kernel.org, Josef Bacik , Naohiro Aota Subject: [PATCH 17/26] btrfs-progs: zoned: support resetting zoned device Date: Mon, 26 Apr 2021 15:27:33 +0900 Message-Id: X-Mailer: git-send-email 2.31.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org All zones of a zoned block device should be reset before writing. Support this by introducing PREP_DEVICE_ZONED. btrfs_reset_all_zones() walks all the zones on a device and resets a zone if it is a sequential write required zone, or discards the zone range otherwise.
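
The per-zone decision can be sketched as a pure function, which makes the three cases easy to see. This is a simplified model, not the patch itself: struct zone_model, enum zone_action, plan_zone() and plan_all_zones() are hypothetical names, and the kernel's zone type and condition fields are reduced to two booleans. Skipping zones that are already empty follows the diff below.

```c
#include <stdbool.h>
#include <stddef.h>

/* What the walk would do with one zone. */
enum zone_action {
	ZONE_ACT_DISCARD = 0, /* conventional zone: discard its range */
	ZONE_ACT_RESET = 1,   /* non-empty sequential zone: reset it */
	ZONE_ACT_SKIP = 2     /* empty sequential zone: nothing to do */
};

/* Hypothetical reduction of a zone descriptor to the two facts used here. */
struct zone_model {
	bool conventional;
	bool empty;
};

/* Decide the action for a single zone. */
static enum zone_action plan_zone(const struct zone_model *z)
{
	if (z->conventional)
		return ZONE_ACT_DISCARD;
	if (!z->empty)
		return ZONE_ACT_RESET;
	return ZONE_ACT_SKIP;
}

/* Walk all zones and count how often each action would be taken. */
static void plan_all_zones(const struct zone_model *zones, size_t n,
			   size_t counts[3])
{
	counts[0] = counts[1] = counts[2] = 0;
	for (size_t i = 0; i < n; i++)
		counts[plan_zone(&zones[i])]++;
}
```

Separating the decision from the ioctl calls is a choice made for the sketch only, so that the logic can be checked without a zoned device.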
Signed-off-by: Naohiro Aota --- common/device-utils.c | 35 +++++++++++++++++++++++++++++++---- common/device-utils.h | 2 ++ kernel-shared/zoned.c | 33 +++++++++++++++++++++++++++++++++ kernel-shared/zoned.h | 7 +++++++ 4 files changed, 73 insertions(+), 4 deletions(-) diff --git a/common/device-utils.c b/common/device-utils.c index f5d5277e8fce..2687f1884619 100644 --- a/common/device-utils.c +++ b/common/device-utils.c @@ -25,6 +25,7 @@ #include #include "kernel-lib/sizes.h" #include "kernel-shared/disk-io.h" +#include "kernel-shared/zoned.h" #include "common/device-utils.h" #include "common/internal.h" #include "common/messages.h" @@ -49,7 +50,7 @@ static int discard_range(int fd, u64 start, u64 len) /* * Discard blocks in the given range in 1G chunks, the process is interruptible */ -static int discard_blocks(int fd, u64 start, u64 len) +int discard_blocks(int fd, u64 start, u64 len) { while (len > 0) { /* 1G granularity */ @@ -155,6 +156,7 @@ out: int btrfs_prepare_device(int fd, const char *file, u64 *block_count_ret, u64 max_block_count, unsigned opflags) { + struct btrfs_zoned_device_info *zinfo = NULL; u64 block_count; struct stat st; int i, ret; @@ -173,7 +175,27 @@ int btrfs_prepare_device(int fd, const char *file, u64 *block_count_ret, if (max_block_count) block_count = min(block_count, max_block_count); - if (opflags & PREP_DEVICE_DISCARD) { + if (opflags & PREP_DEVICE_ZONED) { + ret = btrfs_get_zone_info(fd, file, &zinfo); + if (ret < 0 || !zinfo) { + error("zoned: unable to load zone information of %s", + file); + return 1; + } + if (opflags & PREP_DEVICE_VERBOSE) + printf("Resetting device zones %s (%u zones) ...\n", + file, zinfo->nr_zones); + /* + * We cannot ignore zone reset errors for a zoned block + * device as this could result in the inability to write to + * non-empty sequential zones of the device. 
+ */ + if (btrfs_reset_all_zones(fd, zinfo)) { + error("zoned: failed to reset device '%s' zones: %m", + file); + goto err; + } + } else if (opflags & PREP_DEVICE_DISCARD) { /* * We intentionally ignore errors from the discard ioctl. It * is not necessary for the mkfs functionality but just an @@ -198,17 +220,22 @@ int btrfs_prepare_device(int fd, const char *file, u64 *block_count_ret, if (ret < 0) { errno = -ret; error("failed to zero device '%s': %m", file); - return 1; + goto err; } ret = btrfs_wipe_existing_sb(fd); if (ret < 0) { error("cannot wipe superblocks on %s", file); - return 1; + goto err; } + free(zinfo); *block_count_ret = block_count; return 0; + +err: + free(zinfo); + return 1; } u64 btrfs_device_size(int fd, struct stat *st) diff --git a/common/device-utils.h b/common/device-utils.h index d1799323d002..e7e638a57eb2 100644 --- a/common/device-utils.h +++ b/common/device-utils.h @@ -23,7 +23,9 @@ #define PREP_DEVICE_ZERO_END (1U << 0) #define PREP_DEVICE_DISCARD (1U << 1) #define PREP_DEVICE_VERBOSE (1U << 2) +#define PREP_DEVICE_ZONED (1U << 3) +int discard_blocks(int fd, u64 start, u64 len); u64 get_partition_size(const char *dev); u64 disk_size(const char *path); u64 btrfs_device_size(int fd, struct stat *st); diff --git a/kernel-shared/zoned.c b/kernel-shared/zoned.c index 22e0245abaf6..ba1399cce04d 100644 --- a/kernel-shared/zoned.c +++ b/kernel-shared/zoned.c @@ -361,6 +361,39 @@ static int report_zones(int fd, const char *file, return 0; } +/* + * Discard blocks in the zones of a zoned block device. Process this with + * zone size granularity so that blocks in conventional zones are discarded + * using discard_range and blocks in sequential zones are reset though a + * zone reset. 
+ */ +int btrfs_reset_all_zones(int fd, struct btrfs_zoned_device_info *zinfo) +{ + unsigned int i; + int ret = 0; + + ASSERT(zinfo); + + /* Zone size granularity */ + for (i = 0; i < zinfo->nr_zones; i++) { + if (zinfo->zones[i].type == BLK_ZONE_TYPE_CONVENTIONAL) { + ret = discard_blocks(fd, + zinfo->zones[i].start << SECTOR_SHIFT, + zinfo->zone_size); + if (ret == EOPNOTSUPP) + ret = 0; + } else if (zinfo->zones[i].cond != BLK_ZONE_COND_EMPTY) { + ret = btrfs_reset_dev_zone(fd, &zinfo->zones[i]); + } else { + ret = 0; + } + + if (ret) + return ret; + } + return fsync(fd); +} + static int sb_log_location(int fd, struct blk_zone *zones, int rw, u64 *bytenr_ret) { diff --git a/kernel-shared/zoned.h b/kernel-shared/zoned.h index 70044acc4d94..88831d2d787c 100644 --- a/kernel-shared/zoned.h +++ b/kernel-shared/zoned.h @@ -91,6 +91,7 @@ bool btrfs_redirty_extent_buffer_for_zoned(struct btrfs_fs_info *fs_info, u64 start, u64 end); int btrfs_reset_chunk_zones(struct btrfs_fs_info *fs_info, u64 devid, u64 offset, u64 length); +int btrfs_reset_all_zones(int fd, struct btrfs_zoned_device_info *zinfo); #else #define sbread(fd, buf, offset) \ pread64(fd, buf, BTRFS_SUPER_INFO_SIZE, offset) @@ -138,6 +139,12 @@ static inline int btrfs_reset_chunk_zones(struct btrfs_fs_info *fs_info, return 0; } +static inline int btrfs_reset_all_zones(int fd, + struct btrfs_zoned_device_info *zinfo) +{ + return -EOPNOTSUPP; +} + #endif /* BTRFS_ZONED */ static inline bool btrfs_dev_is_sequential(struct btrfs_device *device, u64 pos) From patchwork Mon Apr 26 06:27:34 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Naohiro Aota X-Patchwork-Id: 12223903 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.6 required=3.0 tests=BAYES_00,DKIM_INVALID, 
DKIM_SIGNED,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 5A44BC433ED for ; Mon, 26 Apr 2021 06:28:34 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 3710B61153 for ; Mon, 26 Apr 2021 06:28:34 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232019AbhDZG3N (ORCPT ); Mon, 26 Apr 2021 02:29:13 -0400 Received: from esa3.hgst.iphmx.com ([216.71.153.141]:41951 "EHLO esa3.hgst.iphmx.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231994AbhDZG3K (ORCPT ); Mon, 26 Apr 2021 02:29:10 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=wdc.com; i=@wdc.com; q=dns/txt; s=dkim.wdc.com; t=1619418508; x=1650954508; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=P5lV0K+qJTEhuwTW3AJRQ0wX1wtPNylRcGOWbVn2pbk=; b=Axg2XnMvXvbROyuskQcqMln7JDWfL9MqOIM5ag9l7Jiz3IpAdW2lSuVk e3gxvNKAqLR/NN2GYcVZnHqg/Y5Bd0akxKiU8UpdCRTTH0I995VEqKw2W MXNsDNMX3Y9SvQSkLbrKT+aOn96jB1cbj7fBanG2bJ+4Clxz7bcKRk7Hi PluA9Dxz9djPzdc9JeME1mbgB8UmT3Ry/U5/Pkrr7kCPoXzP0g1yFNywl YcEeHLnz9PF1NVozsqDDxUhBF3Fg6bkO7rEf5JoHrrjGlI0dT42FK3ndj ACQjdRCMOrfPYm4B4MutzZtLzWiqd/LEd02PcscMPy9EfpQUgNO09sObG Q==; IronPort-SDR: P5rCZPA+m7ULetW5sThUXKN26ORXlQLBUwBjL0wMbxKoesO4B45d1Scj0mvooI+V+//PySDBmV kmUCJfazsTadXfTScHvAxIw4RBBj6WMO4aCozVqczEINtaBS7gGvOCOt/YaiOs14A+55wIBDr2 vpCepKkTSPePg1mU8ZwrC1c6U9Mk8IIL5dnH88MlU612vYLubonCbhsqKkXA2MZIvGl4TSerf4 eHMDDQ80HSy+j/Mi+vCF2CUNeaxe5e/UpACsg30Y9vKL5koY1abwDiJ+inGeEpq3wYLf/9UPBJ Ggo= X-IronPort-AV: E=Sophos;i="5.82,251,1613404800"; d="scan'208";a="170788134" Received: from h199-255-45-14.hgst.com (HELO uls-op-cesaep01.wdc.com) ([199.255.45.14]) by ob1.hgst.iphmx.com with ESMTP; 
26 Apr 2021 14:28:28 +0800 IronPort-SDR: NkNQhS9v1m1b8T+8TP7XVMCTuHWcPCD1yWlvUBVTwB42vN7FBYXGtq1u1oNqKD9/3KSJJoCGb+ ZGNZKW/n03lCQ2g2kh/kBXgVd006TUXurHJW7mFoYI/q/pJvfTCW2iUdksF2u9MiEsbELbqE1o y3NY5Ct48Qq+OYf8lObeCdIxbh6p3lUIOdOWifOx4plpx1fZM12eelUzuntlxot5eWJK/FeUkS MLilOvnqFL11LubndeTDKc8+u3r5ejOCjMM8omdAmY82tsn3SKLgMQ8vj8lTGH8erRCVwC5oRq jEB+ZlGhggcX6t/JsMCD4E75 Received: from uls-op-cesaip02.wdc.com ([10.248.3.37]) by uls-op-cesaep01.wdc.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 25 Apr 2021 23:08:52 -0700 IronPort-SDR: 5wCC3dvqYytowrptfYuU5V6VQYOFDdsuJ+xXF/2jpdzUStTeYCYtCUsPa7WahTsa2IBSE6NOzG vCpIvw4cpOhJsshi7gVtu62fv288ITmLKioWmEQS6WzLki7eLg1SoCDNHq+TghZc1oiJ4rTH4h gcL//0E0RrFhMqihE5A7yJs2TQ4FE9LnwHpSYfAYrkVAuHy9sHxRBdS4BHe7fLfVJEvFusAKvH LiM2ECKI8oSHeMkmH05Q0oHlIYz6ro+oIelWlnEYzYEX+sWIcYPUM9XNsd2mYolPnT1WZ4kLO4 88E= WDCIronportException: Internal Received: from bgy2573.ad.shared (HELO naota-xeon.wdc.com) ([10.225.48.58]) by uls-op-cesaip02.wdc.com with ESMTP; 25 Apr 2021 23:28:28 -0700 From: Naohiro Aota To: David Sterba Cc: linux-btrfs@vger.kernel.org, Josef Bacik , Naohiro Aota Subject: [PATCH 18/26] btrfs-progs: zoned: support zero out on zoned block device Date: Mon, 26 Apr 2021 15:27:34 +0900 Message-Id: <9740fbfd8cb582ac0f961bd96a4a8dadb10c8a44.1619416549.git.naohiro.aota@wdc.com> X-Mailer: git-send-email 2.31.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org If we zero out a region in a sequential write required zone, we cannot write to the region until we reset the zone. Thus, we must not zero out sequential write required zones. zero_dev_clamped() is modified to take the zone information, and it calls zero_zone_blocks() if the device is host-managed, so that no zeroes are written to sequential write required zones.
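
The zone-by-zone splitting can be illustrated with a standalone model that only counts the bytes that would be zeroed. It is a sketch, not the patch code: zeroable_bytes() is a hypothetical name, the zone table is reduced to a bool array marking sequential zones, and the in-zone offset is computed with a modulo instead of the power-of-two mask used in the diff below.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Walk [start, start + len) one zone at a time and add up the bytes that
 * fall into conventional zones. Bytes in sequential zones are skipped, as
 * writing zeroes there would move the write pointer. The walk stops at
 * the end of the zone table.
 */
static uint64_t zeroable_bytes(const bool *sequential, size_t nr_zones,
			       uint64_t zone_size, uint64_t start,
			       uint64_t len)
{
	uint64_t ofst = start;
	uint64_t total = 0;

	while (len > 0) {
		/* Never let one step cross a zone boundary. */
		uint64_t room = zone_size - (ofst % zone_size);
		uint64_t count = len < room ? len : room;
		size_t idx = (size_t)(ofst / zone_size);

		if (idx >= nr_zones)
			break;
		if (!sequential[idx])
			total += count;

		len -= count;
		ofst += count;
	}
	return total;
}
```

For three zones of 1024 bytes where only the middle one is sequential, a request starting at offset 512 with length 2048 yields 1024 zeroable bytes: 512 at the tail of the first zone and 512 at the head of the third.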
Signed-off-by: Naohiro Aota --- common/device-utils.c | 14 +++++++++----- common/device-utils.h | 1 + kernel-shared/zoned.c | 28 ++++++++++++++++++++++++++++ kernel-shared/zoned.h | 9 +++++++++ 4 files changed, 47 insertions(+), 5 deletions(-) diff --git a/common/device-utils.c b/common/device-utils.c index 2687f1884619..c1006c501555 100644 --- a/common/device-utils.c +++ b/common/device-utils.c @@ -67,7 +67,7 @@ int discard_blocks(int fd, u64 start, u64 len) return 0; } -static int zero_blocks(int fd, off_t start, size_t len) +int zero_blocks(int fd, off_t start, size_t len) { char *buf = malloc(len); int ret = 0; @@ -86,7 +86,8 @@ static int zero_blocks(int fd, off_t start, size_t len) #define ZERO_DEV_BYTES SZ_2M /* don't write outside the device by clamping the region to the device size */ -static int zero_dev_clamped(int fd, off_t start, ssize_t len, u64 dev_size) +static int zero_dev_clamped(int fd, struct btrfs_zoned_device_info *zinfo, + off_t start, ssize_t len, u64 dev_size) { off_t end = max(start, start + len); @@ -99,6 +100,9 @@ static int zero_dev_clamped(int fd, off_t start, ssize_t len, u64 dev_size) start = min_t(u64, start, dev_size); end = min_t(u64, end, dev_size); + if (zinfo && zinfo->model == ZONED_HOST_MANAGED) + return zero_zone_blocks(fd, zinfo, start, end - start); + return zero_blocks(fd, start, end - start); } @@ -209,12 +213,12 @@ int btrfs_prepare_device(int fd, const char *file, u64 *block_count_ret, } } - ret = zero_dev_clamped(fd, 0, ZERO_DEV_BYTES, block_count); + ret = zero_dev_clamped(fd, zinfo, 0, ZERO_DEV_BYTES, block_count); for (i = 0 ; !ret && i < BTRFS_SUPER_MIRROR_MAX; i++) - ret = zero_dev_clamped(fd, btrfs_sb_offset(i), + ret = zero_dev_clamped(fd, zinfo, btrfs_sb_offset(i), BTRFS_SUPER_INFO_SIZE, block_count); if (!ret && (opflags & PREP_DEVICE_ZERO_END)) - ret = zero_dev_clamped(fd, block_count - ZERO_DEV_BYTES, + ret = zero_dev_clamped(fd, zinfo, block_count - ZERO_DEV_BYTES, ZERO_DEV_BYTES, block_count); if (ret < 
0) { diff --git a/common/device-utils.h b/common/device-utils.h index e7e638a57eb2..6eee3270e0c7 100644 --- a/common/device-utils.h +++ b/common/device-utils.h @@ -26,6 +26,7 @@ #define PREP_DEVICE_ZONED (1U << 3) int discard_blocks(int fd, u64 start, u64 len); +int zero_blocks(int fd, off_t start, size_t len); u64 get_partition_size(const char *dev); u64 disk_size(const char *path); u64 btrfs_device_size(int fd, struct stat *st); diff --git a/kernel-shared/zoned.c b/kernel-shared/zoned.c index ba1399cce04d..3c476eebf004 100644 --- a/kernel-shared/zoned.c +++ b/kernel-shared/zoned.c @@ -394,6 +394,34 @@ int btrfs_reset_all_zones(int fd, struct btrfs_zoned_device_info *zinfo) return fsync(fd); } +int zero_zone_blocks(int fd, struct btrfs_zoned_device_info *zinfo, off_t start, + size_t len) +{ + size_t zone_len = zinfo->zone_size; + off_t ofst = start; + size_t count; + int ret; + + /* Make sure that zero_blocks does not write sequential zones */ + while (len > 0) { + /* Limit zero_blocks to a single zone */ + count = min_t(size_t, len, zone_len); + if (count > zone_len - (ofst & (zone_len - 1))) + count = zone_len - (ofst & (zone_len - 1)); + + if (!zone_is_sequential(zinfo, ofst)) { + ret = zero_blocks(fd, ofst, count); + if (ret != 0) + return ret; + } + + len -= count; + ofst += count; + } + + return 0; +} + static int sb_log_location(int fd, struct blk_zone *zones, int rw, u64 *bytenr_ret) { diff --git a/kernel-shared/zoned.h b/kernel-shared/zoned.h index 88831d2d787c..9e1ce3ae103f 100644 --- a/kernel-shared/zoned.h +++ b/kernel-shared/zoned.h @@ -92,6 +92,8 @@ bool btrfs_redirty_extent_buffer_for_zoned(struct btrfs_fs_info *fs_info, int btrfs_reset_chunk_zones(struct btrfs_fs_info *fs_info, u64 devid, u64 offset, u64 length); int btrfs_reset_all_zones(int fd, struct btrfs_zoned_device_info *zinfo); +int zero_zone_blocks(int fd, struct btrfs_zoned_device_info *zinfo, off_t start, + size_t len); #else #define sbread(fd, buf, offset) \ pread64(fd, buf, 
BTRFS_SUPER_INFO_SIZE, offset) @@ -145,6 +147,13 @@ static inline int btrfs_reset_all_zones(int fd, return -EOPNOTSUPP; } +static inline int zero_zone_blocks(int fd, + struct btrfs_zoned_device_info *zinfo, + off_t start, size_t len) +{ + return -EOPNOTSUPP; +} + #endif /* BTRFS_ZONED */ static inline bool btrfs_dev_is_sequential(struct btrfs_device *device, u64 pos) From patchwork Mon Apr 26 06:27:35 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Naohiro Aota X-Patchwork-Id: 12223907 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.6 required=3.0 tests=BAYES_00,DKIM_INVALID, DKIM_SIGNED,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8F566C43462 for ; Mon, 26 Apr 2021 06:28:36 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 7426A61263 for ; Mon, 26 Apr 2021 06:28:36 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231985AbhDZG3Q (ORCPT ); Mon, 26 Apr 2021 02:29:16 -0400 Received: from esa3.hgst.iphmx.com ([216.71.153.141]:41951 "EHLO esa3.hgst.iphmx.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231998AbhDZG3L (ORCPT ); Mon, 26 Apr 2021 02:29:11 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=wdc.com; i=@wdc.com; q=dns/txt; s=dkim.wdc.com; t=1619418509; x=1650954509; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=0YA7DcbyvtO3SDNosMcpRwVPiz2Do5MC3wVv1b8aPKs=; b=S260SYDTAVkBPt8eegODHv8YwYaLO+Tx3rvxvzdsmQ24FHo3r81B2u8M 
From: Naohiro Aota To: David Sterba Cc: linux-btrfs@vger.kernel.org, Josef Bacik, Naohiro Aota Subject: [PATCH 19/26] btrfs-progs: zoned: support wiping SB on sequential write
zone Date: Mon, 26 Apr 2021 15:27:35 +0900 Message-Id: X-Mailer: git-send-email 2.31.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org We cannot overwrite superblock magic in a sequential required zone. Instead, we can reset the zone to wipe it. Signed-off-by: Naohiro Aota --- common/device-utils.c | 32 ++++++++++++++++++++++---------- 1 file changed, 22 insertions(+), 10 deletions(-) diff --git a/common/device-utils.c b/common/device-utils.c index c1006c501555..4230654653aa 100644 --- a/common/device-utils.c +++ b/common/device-utils.c @@ -106,7 +106,7 @@ static int zero_dev_clamped(int fd, struct btrfs_zoned_device_info *zinfo, return zero_blocks(fd, start, end - start); } -static int btrfs_wipe_existing_sb(int fd) +static int btrfs_wipe_existing_sb(int fd, struct btrfs_zoned_device_info *zinfo) { const char *off = NULL; size_t len = 0; @@ -141,14 +141,26 @@ static int btrfs_wipe_existing_sb(int fd) if (len > sizeof(buf)) len = sizeof(buf); - memset(buf, 0, len); - ret = pwrite(fd, buf, len, offset); - if (ret < 0) { - error("cannot wipe existing superblock: %m"); - ret = -1; - } else if (ret != len) { - error("cannot wipe existing superblock: wrote %d of %zd", ret, len); - ret = -1; + if (!zone_is_sequential(zinfo, offset)) { + memset(buf, 0, len); + ret = pwrite(fd, buf, len, offset); + if (ret < 0) { + error("cannot wipe existing superblock: %m"); + ret = -1; + } else if (ret != len) { + error("cannot wipe existing superblock: wrote %d of %zd", + ret, len); + ret = -1; + } + } else { + struct blk_zone *zone = &zinfo->zones[offset / zinfo->zone_size]; + + ret = btrfs_reset_dev_zone(fd, zone); + if (ret < 0) { + error( + "zoned: failed to wipe zones containing superblock: %m"); + ret = -1; + } } fsync(fd); @@ -227,7 +239,7 @@ int btrfs_prepare_device(int fd, const char *file, u64 *block_count_ret, goto err; } - ret = btrfs_wipe_existing_sb(fd); + ret = btrfs_wipe_existing_sb(fd, zinfo); if (ret < 
0) { error("cannot wipe superblocks on %s", file); goto err; From patchwork Mon Apr 26 06:27:36 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Naohiro Aota X-Patchwork-Id: 12223909
From: Naohiro Aota To: David Sterba Cc: linux-btrfs@vger.kernel.org, Josef Bacik, Naohiro Aota Subject: [PATCH 20/26] btrfs-progs: mkfs: zoned: detect and enable zoned feature flag Date: Mon, 26 Apr 2021 15:27:36 +0900 Message-Id: X-Mailer: git-send-email 2.31.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org This commit makes mkfs.btrfs aware of the "zoned" feature flag and prepares the disks for mkfs.btrfs.
It automatically detects host-managed zoned devices and enables the feature. It also adds "zone_size" to struct btrfs_mkfs_config to track the zone size. Signed-off-by: Naohiro Aota --- mkfs/common.h | 1 + mkfs/main.c | 28 ++++++++++++++++++++++++++-- 2 files changed, 27 insertions(+), 2 deletions(-) diff --git a/mkfs/common.h b/mkfs/common.h index cc88db7183fb..4d86f5ef4ccc 100644 --- a/mkfs/common.h +++ b/mkfs/common.h @@ -65,6 +65,7 @@ struct btrfs_mkfs_config { u64 num_bytes; /* checksum algorithm to use */ enum btrfs_csum_type csum_type; + u64 zone_size; /* Output fields, set during creation */ diff --git a/mkfs/main.c b/mkfs/main.c index a903896289fa..42e6e6b58b04 100644 --- a/mkfs/main.c +++ b/mkfs/main.c @@ -37,6 +37,7 @@ #include "kernel-shared/free-space-tree.h" #include "kernel-shared/volumes.h" #include "kernel-shared/transaction.h" +#include "kernel-shared/zoned.h" #include "common/utils.h" #include "common/path-utils.h" #include "common/device-utils.h" @@ -900,6 +901,7 @@ int BOX_MAIN(mkfs)(int argc, char **argv) int metadata_profile_opt = 0; int discard = 1; int ssd = 0; + int zoned = 0; int force_overwrite = 0; char *source_dir = NULL; bool source_dir_set = false; @@ -1069,6 +1071,8 @@ int BOX_MAIN(mkfs)(int argc, char **argv) if (dev_cnt == 0) print_usage(1); + zoned = features & BTRFS_FEATURE_INCOMPAT_ZONED; + if (source_dir_set && dev_cnt > 1) { error("the option -r is limited to a single device"); goto error; @@ -1109,6 +1113,19 @@ int BOX_MAIN(mkfs)(int argc, char **argv) file = argv[optind++]; ssd = is_ssd(file); + if (zoned) { + if (!zone_size(file)) { + error("zoned: %s: zone size undefined", file); + exit(1); + } + } else if (zoned_model(file) == ZONED_HOST_MANAGED) { + if (verbose) + printf( + "Zoned: %s: host-managed device detected, setting zoned feature\n", + file); + zoned = 1; + features |= BTRFS_FEATURE_INCOMPAT_ZONED; + } /* * Set default profiles according to number of added devices. 
@@ -1278,7 +1295,8 @@ int BOX_MAIN(mkfs)(int argc, char **argv) ret = btrfs_prepare_device(fd, file, &dev_block_count, block_count, (zero_end ? PREP_DEVICE_ZERO_END : 0) | (discard ? PREP_DEVICE_DISCARD : 0) | - (verbose ? PREP_DEVICE_VERBOSE : 0)); + (verbose ? PREP_DEVICE_VERBOSE : 0) | + (zoned ? PREP_DEVICE_ZONED : 0)); if (ret) goto error; if (block_count && block_count > dev_block_count) { @@ -1309,6 +1327,7 @@ int BOX_MAIN(mkfs)(int argc, char **argv) mkfs_cfg.stripesize = stripesize; mkfs_cfg.features = features; mkfs_cfg.csum_type = csum_type; + mkfs_cfg.zone_size = zone_size(file); ret = make_btrfs(fd, &mkfs_cfg); if (ret) { @@ -1391,7 +1410,8 @@ int BOX_MAIN(mkfs)(int argc, char **argv) block_count, (verbose ? PREP_DEVICE_VERBOSE : 0) | (zero_end ? PREP_DEVICE_ZERO_END : 0) | - (discard ? PREP_DEVICE_DISCARD : 0)); + (discard ? PREP_DEVICE_DISCARD : 0) | + (zoned ? PREP_DEVICE_ZONED : 0)); if (ret) { goto error; } @@ -1502,6 +1522,10 @@ raid_groups: btrfs_group_profile_str(metadata_profile), pretty_size(allocation.system)); printf("SSD detected: %s\n", ssd ? "yes" : "no"); + printf("Zoned device: %s\n", zoned ? 
"yes" : "no"); + if (zoned) + printf("Zone size: %s\n", + pretty_size(fs_info->zone_size)); btrfs_parse_fs_features_to_string(features_buf, features); printf("Incompat features: %s\n", features_buf); btrfs_parse_runtime_features_to_string(features_buf, From patchwork Mon Apr 26 06:27:37 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Naohiro Aota X-Patchwork-Id: 12223911 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.6 required=3.0 tests=BAYES_00,DKIM_INVALID, DKIM_SIGNED,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 1ABB5C433B4 for ; Mon, 26 Apr 2021 06:28:38 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id DEDAE60FE5 for ; Mon, 26 Apr 2021 06:28:37 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232039AbhDZG3R (ORCPT ); Mon, 26 Apr 2021 02:29:17 -0400 Received: from esa3.hgst.iphmx.com ([216.71.153.141]:41951 "EHLO esa3.hgst.iphmx.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232017AbhDZG3N (ORCPT ); Mon, 26 Apr 2021 02:29:13 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=wdc.com; i=@wdc.com; q=dns/txt; s=dkim.wdc.com; t=1619418511; x=1650954511; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=FkTWq40rmGdCVEdnixL5+cbs5NMtmsI4916/9B1ciTc=; b=ADPrGYzM4fVZW60AnzDovmE/spdF++wP/iGpk5xyYea8ERPO0mGZVfdm fW/8S74fAcnYjkTrjRpSOJ/zMH4FAVPmcWBdkA0PzP0bbPa/OKNRMhw08 mxbTju/fONUvWvJZk9YQ/CHXGks9t63+PJ+OtJGgtWt1Wpsg4Nmr1c2Hn 
From: Naohiro Aota To: David Sterba Cc: linux-btrfs@vger.kernel.org, Josef Bacik, Naohiro Aota Subject: [PATCH 21/26] btrfs-progs: mkfs: zoned: check incompatible features with zoned btrfs Date: Mon, 26 Apr 2021 15:27:37 +0900 Message-Id: X-Mailer: git-send-email 2.31.1 In-Reply-To: 
References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org This commit disables some features which are incompatible with zoned btrfs. RAID/DUP profiles are disabled because the kernel cannot handle two zone append writes to different zones. MIXED_BG is disabled because the allocated metadata regions would become write holes for data writes. Space cache (v1) is disabled because it requires in-place updates. It also disables the "--rootdir" option for now. Copying from a directory needs some tweaks for zoned btrfs (e.g. zone-size-aware space calculation), which are not implemented yet. Signed-off-by: Naohiro Aota --- mkfs/common.c | 5 ++++- mkfs/main.c | 23 +++++++++++++++++++++++ 2 files changed, 27 insertions(+), 1 deletion(-) diff --git a/mkfs/common.c b/mkfs/common.c index 368f3b06f75e..6b0c434fbd6a 100644 --- a/mkfs/common.c +++ b/mkfs/common.c @@ -204,7 +204,10 @@ int make_btrfs(int fd, struct btrfs_mkfs_config *cfg) btrfs_set_super_stripesize(&super, cfg->stripesize); btrfs_set_super_csum_type(&super, cfg->csum_type); btrfs_set_super_chunk_root_generation(&super, 1); - btrfs_set_super_cache_generation(&super, -1); + if (cfg->features & BTRFS_FEATURE_INCOMPAT_ZONED) + btrfs_set_super_cache_generation(&super, 0); + else + btrfs_set_super_cache_generation(&super, -1); btrfs_set_super_incompat_flags(&super, cfg->features); if (cfg->label) __strncpy_null(super.label, cfg->label, BTRFS_LABEL_SIZE - 1); diff --git a/mkfs/main.c b/mkfs/main.c index 42e6e6b58b04..9407cdfa8fe7 100644 --- a/mkfs/main.c +++ b/mkfs/main.c @@ -1191,6 +1191,23 @@ int BOX_MAIN(mkfs)(int argc, char **argv) features |= BTRFS_FEATURE_INCOMPAT_RAID1C34; } + if (zoned) { + if (source_dir_set) { + error("the option -r and zoned feature are incompatible"); + exit(1); + } + + if (features & BTRFS_FEATURE_INCOMPAT_MIXED_GROUPS) { + error("cannot enable mixed-bg with zoned feature"); + exit(1); + } + + if (features & BTRFS_FEATURE_INCOMPAT_RAID56) { + error("cannot enable RAID5/6 with 
zoned feature"); + exit(1); + } + } + if (btrfs_check_nodesize(nodesize, sectorsize, features)) goto error; @@ -1280,6 +1297,12 @@ int BOX_MAIN(mkfs)(int argc, char **argv) if (ret) goto error; + if (zoned && ((metadata_profile | data_profile) & + BTRFS_BLOCK_GROUP_PROFILE_MASK)) { + error("cannot use RAID/DUP profile on zoned mode"); + goto error; + } + dev_cnt--; /* From patchwork Mon Apr 26 06:27:38 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Naohiro Aota X-Patchwork-Id: 12223913
From: Naohiro Aota To: David Sterba Cc: linux-btrfs@vger.kernel.org, Josef Bacik, Naohiro Aota Subject: [PATCH 22/26] btrfs-progs: mkfs: zoned: tweak initial system block 
group placement Date: Mon, 26 Apr 2021 15:27:38 +0900 Message-Id: X-Mailer: git-send-email 2.31.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org On zoned btrfs, chunks must be aligned to zone size to ensure sequential writing to a block group maps to sequential writing to a device zone. Thus, we need to tweak the position and the size of the initial system block group. Signed-off-by: Naohiro Aota --- mkfs/common.c | 26 ++++++++++++++++---------- mkfs/main.c | 21 ++++++++++++++++----- 2 files changed, 32 insertions(+), 15 deletions(-) diff --git a/mkfs/common.c b/mkfs/common.c index 6b0c434fbd6a..3d10ad086754 100644 --- a/mkfs/common.c +++ b/mkfs/common.c @@ -22,6 +22,7 @@ #include "kernel-shared/ctree.h" #include "kernel-shared/disk-io.h" #include "kernel-shared/volumes.h" +#include "kernel-shared/zoned.h" #include "common/utils.h" #include "common/path-utils.h" #include "common/device-utils.h" @@ -155,6 +156,13 @@ int make_btrfs(int fd, struct btrfs_mkfs_config *cfg) int skinny_metadata = !!(cfg->features & BTRFS_FEATURE_INCOMPAT_SKINNY_METADATA); u64 num_bytes; + u64 system_group_offset = BTRFS_BLOCK_RESERVED_1M_FOR_SUPER; + u64 system_group_size = BTRFS_MKFS_SYSTEM_GROUP_SIZE; + + if ((cfg->features & BTRFS_FEATURE_INCOMPAT_ZONED)) { + system_group_offset = cfg->zone_size * BTRFS_NR_SB_LOG_ZONES; + system_group_size = cfg->zone_size; + } buf = malloc(sizeof(*buf) + max(cfg->sectorsize, cfg->nodesize)); if (!buf) @@ -186,7 +194,7 @@ int make_btrfs(int fd, struct btrfs_mkfs_config *cfg) cfg->blocks[MKFS_SUPER_BLOCK] = BTRFS_SUPER_INFO_OFFSET; for (i = 1; i < MKFS_BLOCK_COUNT; i++) { - cfg->blocks[i] = BTRFS_BLOCK_RESERVED_1M_FOR_SUPER + + cfg->blocks[i] = system_group_offset + cfg->nodesize * (i - 1); } @@ -323,8 +331,7 @@ int make_btrfs(int fd, struct btrfs_mkfs_config *cfg) btrfs_set_device_id(buf, dev_item, 1); btrfs_set_device_generation(buf, dev_item, 0); btrfs_set_device_total_bytes(buf, 
dev_item, num_bytes); - btrfs_set_device_bytes_used(buf, dev_item, - BTRFS_MKFS_SYSTEM_GROUP_SIZE); + btrfs_set_device_bytes_used(buf, dev_item, system_group_size); btrfs_set_device_io_align(buf, dev_item, cfg->sectorsize); btrfs_set_device_io_width(buf, dev_item, cfg->sectorsize); btrfs_set_device_sector_size(buf, dev_item, cfg->sectorsize); @@ -345,14 +352,14 @@ int make_btrfs(int fd, struct btrfs_mkfs_config *cfg) /* then we have chunk 0 */ btrfs_set_disk_key_objectid(&disk_key, BTRFS_FIRST_CHUNK_TREE_OBJECTID); - btrfs_set_disk_key_offset(&disk_key, BTRFS_BLOCK_RESERVED_1M_FOR_SUPER); + btrfs_set_disk_key_offset(&disk_key, system_group_offset); btrfs_set_disk_key_type(&disk_key, BTRFS_CHUNK_ITEM_KEY); btrfs_set_item_key(buf, &disk_key, nritems); btrfs_set_item_offset(buf, btrfs_item_nr(nritems), itemoff); btrfs_set_item_size(buf, btrfs_item_nr(nritems), item_size); chunk = btrfs_item_ptr(buf, nritems, struct btrfs_chunk); - btrfs_set_chunk_length(buf, chunk, BTRFS_MKFS_SYSTEM_GROUP_SIZE); + btrfs_set_chunk_length(buf, chunk, system_group_size); btrfs_set_chunk_owner(buf, chunk, BTRFS_EXTENT_TREE_OBJECTID); btrfs_set_chunk_stripe_len(buf, chunk, BTRFS_STRIPE_LEN); btrfs_set_chunk_type(buf, chunk, BTRFS_BLOCK_GROUP_SYSTEM); @@ -362,7 +369,7 @@ int make_btrfs(int fd, struct btrfs_mkfs_config *cfg) btrfs_set_chunk_num_stripes(buf, chunk, 1); btrfs_set_stripe_devid_nr(buf, chunk, 0, 1); btrfs_set_stripe_offset_nr(buf, chunk, 0, - BTRFS_BLOCK_RESERVED_1M_FOR_SUPER); + system_group_offset); nritems++; write_extent_buffer(buf, super.dev_item.uuid, @@ -401,7 +408,7 @@ int make_btrfs(int fd, struct btrfs_mkfs_config *cfg) sizeof(struct btrfs_dev_extent); btrfs_set_disk_key_objectid(&disk_key, 1); - btrfs_set_disk_key_offset(&disk_key, BTRFS_BLOCK_RESERVED_1M_FOR_SUPER); + btrfs_set_disk_key_offset(&disk_key, system_group_offset); btrfs_set_disk_key_type(&disk_key, BTRFS_DEV_EXTENT_KEY); btrfs_set_item_key(buf, &disk_key, nritems); btrfs_set_item_offset(buf, 
btrfs_item_nr(nritems), itemoff); @@ -413,14 +420,13 @@ int make_btrfs(int fd, struct btrfs_mkfs_config *cfg) btrfs_set_dev_extent_chunk_objectid(buf, dev_extent, BTRFS_FIRST_CHUNK_TREE_OBJECTID); btrfs_set_dev_extent_chunk_offset(buf, dev_extent, - BTRFS_BLOCK_RESERVED_1M_FOR_SUPER); + system_group_offset); write_extent_buffer(buf, chunk_tree_uuid, (unsigned long)btrfs_dev_extent_chunk_tree_uuid(dev_extent), BTRFS_UUID_SIZE); - btrfs_set_dev_extent_length(buf, dev_extent, - BTRFS_MKFS_SYSTEM_GROUP_SIZE); + btrfs_set_dev_extent_length(buf, dev_extent, system_group_size); nritems++; btrfs_set_header_bytenr(buf, cfg->blocks[MKFS_DEV_TREE]); diff --git a/mkfs/main.c b/mkfs/main.c index 9407cdfa8fe7..915e42b7f9cd 100644 --- a/mkfs/main.c +++ b/mkfs/main.c @@ -71,8 +71,17 @@ static int create_metadata_block_groups(struct btrfs_root *root, int mixed, u64 bytes_used; u64 chunk_start = 0; u64 chunk_size = 0; + u64 system_group_offset = BTRFS_BLOCK_RESERVED_1M_FOR_SUPER; + u64 system_group_size = BTRFS_MKFS_SYSTEM_GROUP_SIZE; int ret; + if (btrfs_is_zoned(fs_info)) { + /* Two zones are reserved for superblock */ + system_group_offset = fs_info->zone_size * + BTRFS_NR_SB_LOG_ZONES; + system_group_size = fs_info->zone_size; + } + if (mixed) flags |= BTRFS_BLOCK_GROUP_DATA; @@ -92,9 +101,8 @@ static int create_metadata_block_groups(struct btrfs_root *root, int mixed, */ ret = btrfs_make_block_group(trans, fs_info, bytes_used, BTRFS_BLOCK_GROUP_SYSTEM, - BTRFS_BLOCK_RESERVED_1M_FOR_SUPER, - BTRFS_MKFS_SYSTEM_GROUP_SIZE); - allocation->system += BTRFS_MKFS_SYSTEM_GROUP_SIZE; + system_group_offset, system_group_size); + allocation->system += system_group_size; if (ret) return ret; @@ -917,6 +925,7 @@ int BOX_MAIN(mkfs)(int argc, char **argv) struct mkfs_allocation allocation = { 0 }; struct btrfs_mkfs_config mkfs_cfg; enum btrfs_csum_type csum_type = BTRFS_CSUM_TYPE_CRC32; + u64 system_group_size; crc32c_optimization_init(); @@ -1330,9 +1339,11 @@ int BOX_MAIN(mkfs)(int argc, 
char **argv) } /* To create the first block group and chunk 0 in make_btrfs */ - if (dev_block_count < BTRFS_MKFS_SYSTEM_GROUP_SIZE) { + system_group_size = zoned ? + zone_size(file) : BTRFS_MKFS_SYSTEM_GROUP_SIZE; + if (dev_block_count < system_group_size) { error("device is too small to make filesystem, must be at least %llu", - (unsigned long long)BTRFS_MKFS_SYSTEM_GROUP_SIZE); + (unsigned long long)system_group_size); goto error; } From patchwork Mon Apr 26 06:27:39 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Naohiro Aota X-Patchwork-Id: 12223915
From: Naohiro Aota To: David Sterba Cc: linux-btrfs@vger.kernel.org, Josef Bacik, Naohiro Aota Subject: [PATCH 
23/26] btrfs-progs: mkfs: zoned: use sbwrite to update superblock Date: Mon, 26 Apr 2021 15:27:39 +0900 Message-Id: X-Mailer: git-send-email 2.31.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org Use sbwrite instead of pwrite to support superblock logging on zoned btrfs. In addition, call fsync() to persist the superblock to ensure the write order. It also helps us to detect an unaligned write (write to a position other than the write pointer) error. Signed-off-by: Naohiro Aota --- mkfs/common.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/mkfs/common.c b/mkfs/common.c index 3d10ad086754..cee6a54ae7a5 100644 --- a/mkfs/common.c +++ b/mkfs/common.c @@ -473,13 +473,16 @@ int make_btrfs(int fd, struct btrfs_mkfs_config *cfg) buf->len = BTRFS_SUPER_INFO_SIZE; csum_tree_block_size(buf, btrfs_csum_type_size(cfg->csum_type), 0, cfg->csum_type); - ret = pwrite(fd, buf->data, BTRFS_SUPER_INFO_SIZE, - cfg->blocks[MKFS_SUPER_BLOCK]); + ret = sbwrite(fd, buf->data, cfg->blocks[MKFS_SUPER_BLOCK]); if (ret != BTRFS_SUPER_INFO_SIZE) { ret = (ret < 0 ? 
-errno : -EIO); goto out; } + ret = fsync(fd); + if (ret) + goto out; + ret = 0; out: From patchwork Mon Apr 26 06:27:40 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Naohiro Aota X-Patchwork-Id: 12223917 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.6 required=3.0 tests=BAYES_00,DKIM_INVALID, DKIM_SIGNED,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 7BCEFC43470 for ; Mon, 26 Apr 2021 06:28:39 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 605A161249 for ; Mon, 26 Apr 2021 06:28:39 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232050AbhDZG3T (ORCPT ); Mon, 26 Apr 2021 02:29:19 -0400 Received: from esa3.hgst.iphmx.com ([216.71.153.141]:41951 "EHLO esa3.hgst.iphmx.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232036AbhDZG3Q (ORCPT ); Mon, 26 Apr 2021 02:29:16 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=wdc.com; i=@wdc.com; q=dns/txt; s=dkim.wdc.com; t=1619418515; x=1650954515; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=KxdTftG6NiT2MFIWuKJzxmh1xXy8Tl6pDMJVV/Y3K2c=; b=bkJJSgqHZmwkEkHUpqrwO2anWobeIJaNBtUadc+1+uXbFarRI0mwMLOy EyU9Krzb9LW6t3Gzgy6ZmJ+ZPwoUYJC8WF/czmuwA25PpupNLc8UFFloV FnXWwJWZdL/+pSjQVS3w/FTf3TzSnHscuX6e4duR8ug2VH1qJMxlFS0DE HSCa7BqDRByBVQD1Gf9t4LkAw/cjtBbP0WgGlE5SsLQaCOsvhmJQoXBy6 66BkGzwN0uDnQRJWnBt3VlYWhxNZ+fNiFuRejDooUesWgmWEnydYcI7rp dYNroEiWGXoLCrJjlbvWMNbE1uiAlHszoYQtuVI6psRHSvk0gz5sYbwvw Q==; IronPort-SDR: 
From: Naohiro Aota
To: David Sterba
Cc: linux-btrfs@vger.kernel.org, Josef Bacik, Naohiro Aota
Subject: [PATCH 24/26] btrfs-progs: zoned: wipe temporary superblocks in superblock log zone
Date: Mon, 26 Apr 2021 15:27:40 +0900
Message-Id: <3425b775916cd02bb7bdf57e018ea0ead833db4e.1619416549.git.naohiro.aota@wdc.com>

mkfs.btrfs uses a temporary superblock during the initialization process.
The temporary superblock uses BTRFS_MAGIC_TEMPORARY as its magic, which
differs from the magic of a regular superblock. As a result, libblkid,
which only supports the usual magic, cannot recognize the volume as btrfs.

So, let's wipe the temporary magic before writing out the usual
superblock.

Technically, we could add the temporary magic to libblkid's table, but
that would result in recognizing a half-baked filesystem as btrfs, which
is not ideal.

Signed-off-by: Naohiro Aota
---
 kernel-shared/disk-io.c |  6 ++++++
 kernel-shared/zoned.c   | 20 ++++++++++++++++++++
 kernel-shared/zoned.h   |  6 ++++++
 3 files changed, 32 insertions(+)

diff --git a/kernel-shared/disk-io.c b/kernel-shared/disk-io.c
index d79d6a00cdf8..355010277ca9 100644
--- a/kernel-shared/disk-io.c
+++ b/kernel-shared/disk-io.c
@@ -1951,6 +1951,12 @@ int close_ctree_fs_info(struct btrfs_fs_info *fs_info)
 	}
 
 	if (fs_info->finalize_on_close) {
+		ret = btrfs_wipe_temporary_sb(fs_info->fs_devices);
+		if (ret) {
+			error("zoned: failed to wipe temporary super blocks: %m");
+			goto skip_commit;
+		}
+
 		btrfs_set_super_magic(fs_info->super_copy, BTRFS_MAGIC);
 		root->fs_info->finalize_on_close = 0;
 		ret = write_all_supers(fs_info);
diff --git a/kernel-shared/zoned.c b/kernel-shared/zoned.c
index 3c476eebf004..8801ed43157e 100644
--- a/kernel-shared/zoned.c
+++ b/kernel-shared/zoned.c
@@ -975,6 +975,26 @@ int btrfs_reset_chunk_zones(struct btrfs_fs_info *fs_info, u64 devid,
 	return 0;
 }
 
+int btrfs_wipe_temporary_sb(struct btrfs_fs_devices *fs_devices)
+{
+	struct list_head *head = &fs_devices->devices;
+	struct btrfs_device *dev;
+	int ret = 0;
+
+	list_for_each_entry(dev, head, dev_list) {
+		struct btrfs_zoned_device_info *zinfo = dev->zone_info;
+
+		if (!zinfo)
+			continue;
+
+		ret = btrfs_reset_dev_zone(dev->fd, &zinfo->zones[0]);
+		if (ret)
+			break;
+	}
+
+	return ret;
+}
+
 #endif
 
 int btrfs_get_dev_zone_info_all_devices(struct btrfs_fs_info *fs_info)
diff --git a/kernel-shared/zoned.h b/kernel-shared/zoned.h
index 9e1ce3ae103f..a2e84464a221 100644
--- a/kernel-shared/zoned.h
+++ b/kernel-shared/zoned.h
@@ -94,6 +94,7 @@ int btrfs_reset_chunk_zones(struct btrfs_fs_info *fs_info, u64 devid,
 int btrfs_reset_all_zones(int fd, struct btrfs_zoned_device_info *zinfo);
 int zero_zone_blocks(int fd, struct btrfs_zoned_device_info *zinfo, off_t start,
 		     size_t len);
+int btrfs_wipe_temporary_sb(struct btrfs_fs_devices *fs_devices);
 #else
 #define sbread(fd, buf, offset) \
 	pread64(fd, buf, BTRFS_SUPER_INFO_SIZE, offset)
@@ -154,6 +155,11 @@ static inline int zero_zone_blocks(int fd,
 	return -EOPNOTSUPP;
 }
 
+static inline int btrfs_wipe_temporary_sb(struct btrfs_fs_devices *fs_devices)
+{
+	return 0;
+}
+
 #endif /* BTRFS_ZONED */
 
 static inline bool btrfs_dev_is_sequential(struct btrfs_device *device, u64 pos)

From patchwork Mon Apr 26 06:27:41 2021
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 12223919
From: Naohiro Aota
To: David Sterba
Cc: linux-btrfs@vger.kernel.org, Josef Bacik, Naohiro Aota
Subject: [PATCH 25/26] btrfs-progs: zoned: device-add: support ZONED device
Date: Mon, 26 Apr 2021 15:27:41 +0900

This patch checks if the target file system is flagged as ZONED. If it
is, the device to be added is flagged PREP_DEVICE_ZONED. Also add checks
to prevent mixing non-zoned devices and zoned devices.

Signed-off-by: Naohiro Aota
---
 cmds/device.c | 21 ++++++++++++++++++++-
 1 file changed, 20 insertions(+), 1 deletion(-)

diff --git a/cmds/device.c b/cmds/device.c
index adc21053fbc8..4cc104b788bb 100644
--- a/cmds/device.c
+++ b/cmds/device.c
@@ -29,6 +29,7 @@
 #include "ioctl.h"
 #include "common/utils.h"
 #include "kernel-shared/volumes.h"
+#include "kernel-shared/zoned.h"
 #include "cmds/filesystem-usage.h"
 
 #include "cmds/commands.h"
@@ -65,6 +66,8 @@ static int cmd_device_add(const struct cmd_struct *cmd,
 	int force = 0;
 	int last_dev;
 	bool enqueue = false;
+	int zoned;
+	struct btrfs_ioctl_feature_flags feature_flags;
 
 	optind = 0;
 	while (1) {
@@ -113,12 +116,27 @@ static int cmd_device_add(const struct cmd_struct *cmd,
 		return 1;
 	}
 
+	ret = ioctl(fdmnt, BTRFS_IOC_GET_FEATURES, &feature_flags);
+	if (ret) {
+		error("error getting feature flags '%s': %m", mntpnt);
+		return 1;
+	}
+	zoned = feature_flags.incompat_flags & BTRFS_FEATURE_INCOMPAT_ZONED;
+
 	for (i = optind; i < last_dev; i++){
 		struct btrfs_ioctl_vol_args ioctl_args;
 		int devfd, res;
 		u64 dev_block_count = 0;
 		char *path;
 
+		if (!zoned && zoned_model(argv[i]) == ZONED_HOST_MANAGED) {
+			error(
+"zoned: cannot add host managed zoned device to non-ZONED file system '%s'",
+				argv[i]);
+			ret++;
+			continue;
+		}
+
 		res = test_dev_for_mkfs(argv[i], force);
 		if (res) {
 			ret++;
@@ -134,7 +152,8 @@ static int cmd_device_add(const struct cmd_struct *cmd,
 
 		res = btrfs_prepare_device(devfd, argv[i], &dev_block_count, 0,
 				PREP_DEVICE_ZERO_END | PREP_DEVICE_VERBOSE |
-				(discard ? PREP_DEVICE_DISCARD : 0));
+				(discard ? PREP_DEVICE_DISCARD : 0) |
+				(zoned ? PREP_DEVICE_ZONED : 0));
 		close(devfd);
 		if (res) {
 			ret++;

From patchwork Mon Apr 26 06:27:42 2021
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 12223921
From: Naohiro Aota
To: David Sterba
Cc: linux-btrfs@vger.kernel.org, Josef Bacik, Naohiro Aota
Subject: [PATCH 26/26] btrfs-progs: zoned: introduce zoned support for device replace
Date: Mon, 26 Apr 2021 15:27:42 +0900
Message-Id: <63e64f41e87756562d34322f19c78e5e5b4ec068.1619416549.git.naohiro.aota@wdc.com>

This patch checks if the target file system is flagged as ZONED. If it
is, the device to be added is flagged PREP_DEVICE_ZONED. Also add checks
to prevent mixing non-zoned devices and zoned devices.

Signed-off-by: Naohiro Aota
---
 cmds/replace.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/cmds/replace.c b/cmds/replace.c
index 53af8ca61898..1de4a6d3ca9f 100644
--- a/cmds/replace.c
+++ b/cmds/replace.c
@@ -122,12 +122,14 @@ static const char *const cmd_replace_start_usage[] = {
 static int cmd_replace_start(const struct cmd_struct *cmd,
 			     int argc, char **argv)
 {
+	struct btrfs_ioctl_feature_flags feature_flags;
 	struct btrfs_ioctl_dev_replace_args start_args = {0};
 	struct btrfs_ioctl_dev_replace_args status_args = {0};
 	int ret;
 	int i;
 	int fdmnt = -1;
 	int fddstdev = -1;
+	int zoned;
 	char *path;
 	char *srcdev;
 	char *dstdev = NULL;
@@ -182,6 +184,14 @@ static int cmd_replace_start(const struct cmd_struct *cmd,
 	if (fdmnt < 0)
 		goto leave_with_error;
 
+	ret = ioctl(fdmnt, BTRFS_IOC_GET_FEATURES, &feature_flags);
+	if (ret) {
+		error("zoned: ioctl(GET_FEATURES) on '%s' returns error: %m",
+		      path);
+		goto leave_with_error;
+	}
+	zoned = feature_flags.incompat_flags & BTRFS_FEATURE_INCOMPAT_ZONED;
+
 	/* check for possible errors before backgrounding */
 	status_args.cmd = BTRFS_IOCTL_DEV_REPLACE_CMD_STATUS;
 	status_args.result = BTRFS_IOCTL_DEV_REPLACE_RESULT_NO_RESULT;
@@ -286,7 +296,8 @@ static int cmd_replace_start(const struct cmd_struct *cmd,
 	strncpy((char *)start_args.start.tgtdev_name, dstdev,
 		BTRFS_DEVICE_PATH_NAME_MAX);
 	ret = btrfs_prepare_device(fddstdev, dstdev, &dstdev_block_count, 0,
-			PREP_DEVICE_ZERO_END | PREP_DEVICE_VERBOSE);
+			PREP_DEVICE_ZERO_END | PREP_DEVICE_VERBOSE |
+			(zoned ? PREP_DEVICE_ZONED : 0));
 	if (ret)
 		goto leave_with_error;
 
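
The flag handling shared by the device-add and device-replace patches can be sketched on its own as follows. This is a minimal illustration under stated assumptions, not the btrfs-progs implementation: every SKETCH_* constant, the sketch_zone_model enum and the sketch_* helpers are hypothetical names introduced here, and the numeric values are placeholders chosen only so the example compiles without the kernel or btrfs-progs headers. It shows the three steps the patches perform: derive a "zoned" boolean from the incompat feature mask, reject a host-managed device on a non-ZONED file system, and OR the ZONED flag into the prepare flags.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Placeholder values (assumptions). The real BTRFS_FEATURE_INCOMPAT_ZONED
 * and PREP_DEVICE_* definitions come from the kernel UAPI header and from
 * btrfs-progs; they are not reproduced here.
 */
#define SKETCH_INCOMPAT_ZONED	(1ULL << 12)
#define SKETCH_PREP_ZERO_END	(1U << 0)
#define SKETCH_PREP_DISCARD	(1U << 1)
#define SKETCH_PREP_VERBOSE	(1U << 2)
#define SKETCH_PREP_ZONED	(1U << 3)

/* Hypothetical stand-in for the zone model reported for a device. */
enum sketch_zone_model {
	SKETCH_ZONED_NONE,
	SKETCH_ZONED_HOST_AWARE,
	SKETCH_ZONED_HOST_MANAGED,
};

/* Collapse the 64-bit incompat mask to a bool so no high bit is truncated. */
static bool sketch_fs_is_zoned(uint64_t incompat_flags)
{
	return (incompat_flags & SKETCH_INCOMPAT_ZONED) != 0;
}

/* Device-add style check: no host-managed device on a non-ZONED fs. */
static bool sketch_can_add(bool fs_zoned, enum sketch_zone_model model)
{
	return fs_zoned || model != SKETCH_ZONED_HOST_MANAGED;
}

/* Base prepare flags, plus DISCARD and ZONED only when requested. */
static unsigned int sketch_prep_flags(bool fs_zoned, bool discard)
{
	unsigned int flags = SKETCH_PREP_ZERO_END | SKETCH_PREP_VERBOSE;

	if (discard)
		flags |= SKETCH_PREP_DISCARD;
	if (fs_zoned)
		flags |= SKETCH_PREP_ZONED;
	return flags;
}

/* Usage example: exercise the helpers with a zoned and a non-zoned mask. */
static void sketch_self_check(void)
{
	bool zoned = sketch_fs_is_zoned(SKETCH_INCOMPAT_ZONED | 1ULL);

	assert(zoned);
	assert(!sketch_fs_is_zoned(0));
	assert(sketch_prep_flags(zoned, false) & SKETCH_PREP_ZONED);
	assert(!(sketch_prep_flags(false, true) & SKETCH_PREP_ZONED));
	assert(!sketch_can_add(false, SKETCH_ZONED_HOST_MANAGED));
	assert(sketch_can_add(true, SKETCH_ZONED_HOST_MANAGED));
}
```

Reducing the mask to a bool before using it is a design choice of this sketch: assigning the masked 64-bit value straight to an int would lose any feature bit above bit 31. That should not matter for a flag that fits in an int, so the sketch is only a more defensive spelling of the same test.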