From patchwork Thu Aug 8 09:30:12 2019
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 11083725
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, David Sterba
Cc: Chris Mason, Josef Bacik, Nikolay Borisov, Damien Le Moal,
    Matias Bjorling, Johannes Thumshirn, Hannes Reinecke,
    linux-fsdevel@vger.kernel.org, Naohiro Aota
Subject: [PATCH v3 01/27] btrfs: introduce HMZONED feature flag
Date: Thu, 8 Aug 2019 18:30:12 +0900
Message-Id: <20190808093038.4163421-2-naohiro.aota@wdc.com>
In-Reply-To: <20190808093038.4163421-1-naohiro.aota@wdc.com>

This patch introduces the HMZONED incompat flag. The flag indicates that
the volume management will satisfy the constraints imposed by host-managed
zoned block devices.

Signed-off-by: Damien Le Moal
Signed-off-by: Naohiro Aota
Reviewed-by: Anand Jain
---
 fs/btrfs/sysfs.c           | 2 ++
 include/uapi/linux/btrfs.h | 1 +
 2 files changed, 3 insertions(+)

diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c
index e6493b068294..ad708a9edd0b 100644
--- a/fs/btrfs/sysfs.c
+++ b/fs/btrfs/sysfs.c
@@ -193,6 +193,7 @@ BTRFS_FEAT_ATTR_INCOMPAT(raid56, RAID56);
 BTRFS_FEAT_ATTR_INCOMPAT(skinny_metadata, SKINNY_METADATA);
 BTRFS_FEAT_ATTR_INCOMPAT(no_holes, NO_HOLES);
 BTRFS_FEAT_ATTR_INCOMPAT(metadata_uuid, METADATA_UUID);
+BTRFS_FEAT_ATTR_INCOMPAT(hmzoned, HMZONED);
 BTRFS_FEAT_ATTR_COMPAT_RO(free_space_tree, FREE_SPACE_TREE);
 
 static struct attribute *btrfs_supported_feature_attrs[] = {
@@ -207,6 +208,7 @@ static struct attribute *btrfs_supported_feature_attrs[] = {
 	BTRFS_FEAT_ATTR_PTR(skinny_metadata),
 	BTRFS_FEAT_ATTR_PTR(no_holes),
 	BTRFS_FEAT_ATTR_PTR(metadata_uuid),
+	BTRFS_FEAT_ATTR_PTR(hmzoned),
 	BTRFS_FEAT_ATTR_PTR(free_space_tree),
 	NULL
 };
diff --git a/include/uapi/linux/btrfs.h b/include/uapi/linux/btrfs.h
index c195896d478f..2d5e8f801135 100644
--- a/include/uapi/linux/btrfs.h
+++ b/include/uapi/linux/btrfs.h
@@ -270,6 +270,7 @@ struct btrfs_ioctl_fs_info_args {
 #define BTRFS_FEATURE_INCOMPAT_SKINNY_METADATA	(1ULL << 8)
 #define BTRFS_FEATURE_INCOMPAT_NO_HOLES		(1ULL << 9)
 #define BTRFS_FEATURE_INCOMPAT_METADATA_UUID	(1ULL << 10)
+#define BTRFS_FEATURE_INCOMPAT_HMZONED		(1ULL << 11)
 
 struct btrfs_ioctl_feature_flags {
 	__u64 compat_flags;

From patchwork Thu Aug 8 09:30:13 2019
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 11083731
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, David Sterba
Cc: Chris Mason, Josef Bacik, Nikolay Borisov, Damien Le Moal,
    Matias Bjorling, Johannes Thumshirn, Hannes Reinecke,
    linux-fsdevel@vger.kernel.org, Naohiro Aota
Subject: [PATCH v3 02/27] btrfs: Get zone information of zoned block devices
Date: Thu, 8 Aug 2019 18:30:13 +0900
Message-Id: <20190808093038.4163421-3-naohiro.aota@wdc.com>
In-Reply-To: <20190808093038.4163421-1-naohiro.aota@wdc.com>

If a zoned block device is found, get its zone information (number of
zones and zone size) using the new helper function
btrfs_get_dev_zonetypes(). To avoid costly run-time zone report commands
for testing a zone's type during block allocation, attach the seq_zones
bitmap to the device structure to indicate whether a zone is sequential
write required or accepts random writes. The empty_zones bitmap is also
attached to indicate whether a zone is empty.

This patch also introduces the helper function btrfs_dev_is_sequential()
to test whether the zone storing a block is a sequential write required
zone, and btrfs_dev_is_empty_zone() to test whether that zone is an empty
zone.
Signed-off-by: Damien Le Moal
Signed-off-by: Naohiro Aota
---
 fs/btrfs/Makefile  |   2 +-
 fs/btrfs/hmzoned.c | 162 +++++++++++++++++++++++++++++++++++++++++++++
 fs/btrfs/hmzoned.h |  79 ++++++++++++++++++++++
 fs/btrfs/volumes.c |  18 ++++-
 fs/btrfs/volumes.h |   4 ++
 5 files changed, 262 insertions(+), 3 deletions(-)
 create mode 100644 fs/btrfs/hmzoned.c
 create mode 100644 fs/btrfs/hmzoned.h

diff --git a/fs/btrfs/Makefile b/fs/btrfs/Makefile
index 76a843198bcb..8d93abb31074 100644
--- a/fs/btrfs/Makefile
+++ b/fs/btrfs/Makefile
@@ -11,7 +11,7 @@ btrfs-y += super.o ctree.o extent-tree.o print-tree.o root-tree.o dir-item.o \
 	   compression.o delayed-ref.o relocation.o delayed-inode.o scrub.o \
 	   reada.o backref.o ulist.o qgroup.o send.o dev-replace.o raid56.o \
 	   uuid-tree.o props.o free-space-tree.o tree-checker.o space-info.o \
-	   block-rsv.o delalloc-space.o
+	   block-rsv.o delalloc-space.o hmzoned.o
 
 btrfs-$(CONFIG_BTRFS_FS_POSIX_ACL) += acl.o
 btrfs-$(CONFIG_BTRFS_FS_CHECK_INTEGRITY) += check-integrity.o
diff --git a/fs/btrfs/hmzoned.c b/fs/btrfs/hmzoned.c
new file mode 100644
index 000000000000..bfd04792dd62
--- /dev/null
+++ b/fs/btrfs/hmzoned.c
@@ -0,0 +1,162 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2019 Western Digital Corporation or its affiliates.
+ * Authors:
+ *	Naohiro Aota
+ *	Damien Le Moal
+ */
+
+#include
+#include
+#include "ctree.h"
+#include "volumes.h"
+#include "hmzoned.h"
+#include "rcu-string.h"
+
+/* Maximum number of zones to report per blkdev_report_zones() call */
+#define BTRFS_REPORT_NR_ZONES 4096
+
+static int btrfs_get_dev_zones(struct btrfs_device *device, u64 pos,
+			       struct blk_zone **zones_ret,
+			       unsigned int *nr_zones, gfp_t gfp_mask)
+{
+	struct blk_zone *zones = *zones_ret;
+	int ret;
+
+	if (!zones) {
+		zones = kcalloc(*nr_zones, sizeof(struct blk_zone), GFP_KERNEL);
+		if (!zones)
+			return -ENOMEM;
+	}
+
+	ret = blkdev_report_zones(device->bdev, pos >> SECTOR_SHIFT,
+				  zones, nr_zones, gfp_mask);
+	if (ret != 0) {
+		btrfs_err_in_rcu(device->fs_info,
+				 "get zone at %llu on %s failed %d", pos,
+				 rcu_str_deref(device->name), ret);
+		return ret;
+	}
+	if (!*nr_zones)
+		return -EIO;
+
+	*zones_ret = zones;
+
+	return 0;
+}
+
+int btrfs_get_dev_zone_info(struct btrfs_device *device)
+{
+	struct btrfs_zoned_device_info *zone_info = NULL;
+	struct block_device *bdev = device->bdev;
+	sector_t nr_sectors = bdev->bd_part->nr_sects;
+	sector_t sector = 0;
+	struct blk_zone *zones = NULL;
+	unsigned int i, nreported = 0, nr_zones;
+	unsigned int zone_sectors;
+	int ret;
+
+	if (!bdev_is_zoned(bdev))
+		return 0;
+
+	zone_info = kzalloc(sizeof(*zone_info), GFP_KERNEL);
+	if (!zone_info)
+		return -ENOMEM;
+
+	zone_sectors = bdev_zone_sectors(bdev);
+	ASSERT(is_power_of_2(zone_sectors));
+	zone_info->zone_size = (u64)zone_sectors << SECTOR_SHIFT;
+	zone_info->zone_size_shift = ilog2(zone_info->zone_size);
+	zone_info->nr_zones = nr_sectors >> ilog2(bdev_zone_sectors(bdev));
+	if (nr_sectors & (bdev_zone_sectors(bdev) - 1))
+		zone_info->nr_zones++;
+
+	zone_info->seq_zones = kcalloc(BITS_TO_LONGS(zone_info->nr_zones),
+				       sizeof(*zone_info->seq_zones),
+				       GFP_KERNEL);
+	if (!zone_info->seq_zones) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	zone_info->empty_zones = kcalloc(BITS_TO_LONGS(zone_info->nr_zones),
+					 sizeof(*zone_info->empty_zones),
+					 GFP_KERNEL);
+	if (!zone_info->empty_zones) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	/* Get zones type */
+	while (sector < nr_sectors) {
+		nr_zones = BTRFS_REPORT_NR_ZONES;
+		ret = btrfs_get_dev_zones(device, sector << SECTOR_SHIFT,
+					  &zones, &nr_zones, GFP_KERNEL);
+		if (ret)
+			goto out;
+
+		for (i = 0; i < nr_zones; i++) {
+			if (zones[i].type == BLK_ZONE_TYPE_SEQWRITE_REQ)
+				set_bit(nreported, zone_info->seq_zones);
+			if (zones[i].cond == BLK_ZONE_COND_EMPTY)
+				set_bit(nreported, zone_info->empty_zones);
+			nreported++;
+		}
+		sector = zones[nr_zones - 1].start + zones[nr_zones - 1].len;
+	}
+
+	if (nreported != zone_info->nr_zones) {
+		btrfs_err_in_rcu(device->fs_info,
+				 "inconsistent number of zones on %s (%u / %u)",
+				 rcu_str_deref(device->name), nreported,
+				 zone_info->nr_zones);
+		ret = -EIO;
+		goto out;
+	}
+
+	device->zone_info = zone_info;
+
+	btrfs_info_in_rcu(
+		device->fs_info,
+		"host-%s zoned block device %s, %u zones of %llu sectors",
+		bdev_zoned_model(bdev) == BLK_ZONED_HM ? "managed" : "aware",
+		rcu_str_deref(device->name), zone_info->nr_zones,
+		zone_info->zone_size >> SECTOR_SHIFT);
+
+out:
+	kfree(zones);
+
+	if (ret) {
+		kfree(zone_info->seq_zones);
+		kfree(zone_info->empty_zones);
+		kfree(zone_info);
+	}
+
+	return ret;
+}
+
+void btrfs_destroy_dev_zone_info(struct btrfs_device *device)
+{
+	struct btrfs_zoned_device_info *zone_info = device->zone_info;
+
+	if (!zone_info)
+		return;
+
+	kfree(zone_info->seq_zones);
+	kfree(zone_info->empty_zones);
+	kfree(zone_info);
+	device->zone_info = NULL;
+}
+
+int btrfs_get_dev_zone(struct btrfs_device *device, u64 pos,
+		       struct blk_zone *zone, gfp_t gfp_mask)
+{
+	unsigned int nr_zones = 1;
+	int ret;
+
+	ret = btrfs_get_dev_zones(device, pos, &zone, &nr_zones, gfp_mask);
+	if (ret != 0 || !nr_zones)
+		return ret ? ret : -EIO;
+
+	return 0;
+}
diff --git a/fs/btrfs/hmzoned.h b/fs/btrfs/hmzoned.h
new file mode 100644
index 000000000000..ffc70842135e
--- /dev/null
+++ b/fs/btrfs/hmzoned.h
@@ -0,0 +1,79 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (C) 2019 Western Digital Corporation or its affiliates.
+ * Authors:
+ *	Naohiro Aota
+ *	Damien Le Moal
+ */
+
+#ifndef BTRFS_HMZONED_H
+#define BTRFS_HMZONED_H
+
+struct btrfs_zoned_device_info {
+	/*
+	 * Number of zones, zone size and types of zones if bdev is a
+	 * zoned block device.
+	 */
+	u64 zone_size;
+	u8  zone_size_shift;
+	u32 nr_zones;
+	unsigned long *seq_zones;
+	unsigned long *empty_zones;
+};
+
+int btrfs_get_dev_zone(struct btrfs_device *device, u64 pos,
+		       struct blk_zone *zone, gfp_t gfp_mask);
+int btrfs_get_dev_zone_info(struct btrfs_device *device);
+void btrfs_destroy_dev_zone_info(struct btrfs_device *device);
+
+static inline bool btrfs_dev_is_sequential(struct btrfs_device *device, u64 pos)
+{
+	struct btrfs_zoned_device_info *zone_info = device->zone_info;
+
+	if (!zone_info)
+		return false;
+
+	return test_bit(pos >> zone_info->zone_size_shift,
+			zone_info->seq_zones);
+}
+
+static inline bool btrfs_dev_is_empty_zone(struct btrfs_device *device, u64 pos)
+{
+	struct btrfs_zoned_device_info *zone_info = device->zone_info;
+
+	if (!zone_info)
+		return true;
+
+	return test_bit(pos >> zone_info->zone_size_shift,
+			zone_info->empty_zones);
+}
+
+static inline void btrfs_dev_set_empty_zone_bit(struct btrfs_device *device,
+						u64 pos, bool set)
+{
+	struct btrfs_zoned_device_info *zone_info = device->zone_info;
+	unsigned int zno;
+
+	if (!zone_info)
+		return;
+
+	zno = pos >> zone_info->zone_size_shift;
+	if (set)
+		set_bit(zno, zone_info->empty_zones);
+	else
+		clear_bit(zno, zone_info->empty_zones);
+}
+
+static inline void btrfs_dev_set_zone_empty(struct btrfs_device *device,
+					    u64 pos)
+{
+	btrfs_dev_set_empty_zone_bit(device, pos, true);
+}
+
+static inline void btrfs_dev_clear_zone_empty(struct btrfs_device *device,
+					      u64 pos)
+{
+	btrfs_dev_set_empty_zone_bit(device, pos, false);
+}
+
+#endif
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index d74b74ca07af..8e5a894e7bde 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -29,6 +29,7 @@
 #include "sysfs.h"
 #include "tree-checker.h"
 #include "space-info.h"
+#include "hmzoned.h"
 
 const struct btrfs_raid_attr btrfs_raid_array[BTRFS_NR_RAID_TYPES] = {
 	[BTRFS_RAID_RAID10] = {
@@ -342,6 +343,7 @@ void btrfs_free_device(struct btrfs_device *device)
 	rcu_string_free(device->name);
 	extent_io_tree_release(&device->alloc_state);
 	bio_put(device->flush_bio);
+	btrfs_destroy_dev_zone_info(device);
 	kfree(device);
 }
 
@@ -847,6 +849,11 @@ static int btrfs_open_one_device(struct btrfs_fs_devices *fs_devices,
 	clear_bit(BTRFS_DEV_STATE_IN_FS_METADATA, &device->dev_state);
 	device->mode = flags;
 
+	/* Get zone type information of zoned block devices */
+	ret = btrfs_get_dev_zone_info(device);
+	if (ret != 0)
+		goto error_brelse;
+
 	fs_devices->open_devices++;
 	if (test_bit(BTRFS_DEV_STATE_WRITEABLE, &device->dev_state) &&
 	    device->devid != BTRFS_DEV_REPLACE_DEVID) {
@@ -2598,6 +2605,14 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
 	}
 	rcu_assign_pointer(device->name, name);
 
+	device->fs_info = fs_info;
+	device->bdev = bdev;
+
+	/* Get zone type information of zoned block devices */
+	ret = btrfs_get_dev_zone_info(device);
+	if (ret)
+		goto error_free_device;
+
 	trans = btrfs_start_transaction(root, 0);
 	if (IS_ERR(trans)) {
 		ret = PTR_ERR(trans);
@@ -2614,8 +2629,6 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
 				    fs_info->sectorsize);
 	device->disk_total_bytes = device->total_bytes;
 	device->commit_total_bytes = device->total_bytes;
-	device->fs_info = fs_info;
-	device->bdev = bdev;
 	set_bit(BTRFS_DEV_STATE_IN_FS_METADATA, &device->dev_state);
 	clear_bit(BTRFS_DEV_STATE_REPLACE_TGT, &device->dev_state);
 	device->mode = FMODE_EXCL;
@@ -2756,6 +2769,7 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
 		sb->s_flags |= SB_RDONLY;
 	if (trans)
 		btrfs_end_transaction(trans);
+	btrfs_destroy_dev_zone_info(device);
 error_free_device:
 	btrfs_free_device(device);
 error:
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 7f6aa1816409..5da1f354db93 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -57,6 +57,8 @@ struct btrfs_io_geometry {
 #define BTRFS_DEV_STATE_REPLACE_TGT	(3)
 #define BTRFS_DEV_STATE_FLUSH_SENT	(4)
 
+struct btrfs_zoned_device_info;
+
 struct btrfs_device {
 	struct list_head dev_list; /* device_list_mutex */
 	struct list_head dev_alloc_list; /* chunk mutex */
@@ -77,6 +79,8 @@ struct btrfs_device {
 
 	struct block_device *bdev;
 
+	struct btrfs_zoned_device_info *zone_info;
+
 	/* the mode sent to blkdev_get */
 	fmode_t mode;

From patchwork Thu Aug 8 09:30:14 2019
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 11083735
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, David Sterba
Cc: Chris Mason, Josef Bacik, Nikolay Borisov, Damien Le Moal,
    Matias Bjorling, Johannes Thumshirn, Hannes Reinecke,
    linux-fsdevel@vger.kernel.org, Naohiro Aota
Subject: [PATCH v3 03/27] btrfs: Check and enable HMZONED mode
Date: Thu, 8 Aug 2019 18:30:14 +0900
Message-Id: <20190808093038.4163421-4-naohiro.aota@wdc.com>
In-Reply-To: <20190808093038.4163421-1-naohiro.aota@wdc.com>

HMZONED mode cannot be used together with the RAID5/6 profile for now.
Introduce the function btrfs_check_hmzoned_mode() to check this. This
function will also check that the HMZONED flag is enabled on the file
system and that the file system consists of zoned devices with equal
zone sizes.

Additionally, as updates to the space cache are done in place, the space
cache cannot be located over sequential zones, and there is no guarantee
that the device will have enough conventional zones to store this cache.
Resolve this problem by completely disabling the space cache. This does
not introduce any problems with sequential block groups: all the free
space is located after the allocation pointer, and there is no free
space before the pointer, so there is no need for such a cache.
For the same reason, NODATACOW is disabled as well. INODE_MAP_CACHE is
also disabled to avoid preallocation in the INODE_MAP_CACHE inode.

Signed-off-by: Damien Le Moal
Signed-off-by: Naohiro Aota
---
 fs/btrfs/ctree.h       |  3 ++
 fs/btrfs/dev-replace.c |  8 +++++
 fs/btrfs/disk-io.c     |  8 +++++
 fs/btrfs/hmzoned.c     | 67 ++++++++++++++++++++++++++++++++++++++++++
 fs/btrfs/hmzoned.h     | 18 ++++++++++++
 fs/btrfs/super.c       |  1 +
 fs/btrfs/volumes.c     |  5 ++++
 7 files changed, 110 insertions(+)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 299e11e6c554..a00ce8c4d678 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -713,6 +713,9 @@ struct btrfs_fs_info {
 	struct btrfs_root *uuid_root;
 	struct btrfs_root *free_space_root;
 
+	/* Zone size when in HMZONED mode */
+	u64 zone_size;
+
 	/* the log root tree is a directory of all the other log roots */
 	struct btrfs_root *log_root_tree;
diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
index 6b2e9aa83ffa..2cc3ac4d101d 100644
--- a/fs/btrfs/dev-replace.c
+++ b/fs/btrfs/dev-replace.c
@@ -20,6 +20,7 @@
 #include "rcu-string.h"
 #include "dev-replace.h"
 #include "sysfs.h"
+#include "hmzoned.h"
 
 static int btrfs_dev_replace_finishing(struct btrfs_fs_info *fs_info,
 				       int scrub_ret);
@@ -201,6 +202,13 @@ static int btrfs_init_dev_replace_tgtdev(struct btrfs_fs_info *fs_info,
 		return PTR_ERR(bdev);
 	}
 
+	if (!btrfs_check_device_zone_type(fs_info, bdev)) {
+		btrfs_err(fs_info,
+			  "zone type of target device mismatch with the filesystem!");
+		ret = -EINVAL;
+		goto error;
+	}
+
 	sync_blockdev(bdev);
 
 	devices = &fs_info->fs_devices->devices;
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 5f7ee70b3d1a..8854ff2e5fa5 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -40,6 +40,7 @@
 #include "compression.h"
 #include "tree-checker.h"
 #include "ref-verify.h"
+#include "hmzoned.h"
 
 #define BTRFS_SUPER_FLAG_SUPP	(BTRFS_HEADER_FLAG_WRITTEN |\
 				 BTRFS_HEADER_FLAG_RELOC |\
@@ -3123,6 +3124,13 @@ int open_ctree(struct super_block *sb,
 
 	btrfs_free_extra_devids(fs_devices, 1);
 
+	ret = btrfs_check_hmzoned_mode(fs_info);
+	if (ret) {
+		btrfs_err(fs_info, "failed to init hmzoned mode: %d",
+			  ret);
+		goto fail_block_groups;
+	}
+
 	ret = btrfs_sysfs_add_fsid(fs_devices, NULL);
 	if (ret) {
 		btrfs_err(fs_info, "failed to init sysfs fsid interface: %d",
diff --git a/fs/btrfs/hmzoned.c b/fs/btrfs/hmzoned.c
index bfd04792dd62..512674d8f488 100644
--- a/fs/btrfs/hmzoned.c
+++ b/fs/btrfs/hmzoned.c
@@ -160,3 +160,70 @@ int btrfs_get_dev_zone(struct btrfs_device *device, u64 pos,
 
 	return 0;
 }
+
+int btrfs_check_hmzoned_mode(struct btrfs_fs_info *fs_info)
+{
+	struct btrfs_fs_devices *fs_devices = fs_info->fs_devices;
+	struct btrfs_device *device;
+	u64 hmzoned_devices = 0;
+	u64 nr_devices = 0;
+	u64 zone_size = 0;
+	int incompat_hmzoned = btrfs_fs_incompat(fs_info, HMZONED);
+	int ret = 0;
+
+	/* Count zoned devices */
+	list_for_each_entry(device, &fs_devices->devices, dev_list) {
+		if (!device->bdev)
+			continue;
+		if (bdev_zoned_model(device->bdev) == BLK_ZONED_HM ||
+		    (bdev_zoned_model(device->bdev) == BLK_ZONED_HA &&
+		     incompat_hmzoned)) {
+			hmzoned_devices++;
+			if (!zone_size) {
+				zone_size = device->zone_info->zone_size;
+			} else if (device->zone_info->zone_size != zone_size) {
+				btrfs_err(fs_info,
+					  "Zoned block devices must have equal zone sizes");
+				ret = -EINVAL;
+				goto out;
+			}
+		}
+		nr_devices++;
+	}
+
+	if (!hmzoned_devices && incompat_hmzoned) {
+		/* No zoned block device found on HMZONED FS */
+		btrfs_err(fs_info, "HMZONED enabled file system should have zoned devices");
+		ret = -EINVAL;
+		goto out;
+	}
+
+	if (!hmzoned_devices && !incompat_hmzoned)
+		goto out;
+
+	fs_info->zone_size = zone_size;
+
+	if (hmzoned_devices != nr_devices) {
+		btrfs_err(fs_info,
+			  "zoned devices cannot be mixed with regular devices");
+		ret = -EINVAL;
+		goto out;
+	}
+
+	/*
+	 * stripe_size is always aligned to BTRFS_STRIPE_LEN in
+	 * __btrfs_alloc_chunk(). Since we want stripe_len == zone_size,
+	 * check the alignment here.
+	 */
+	if (!IS_ALIGNED(zone_size, BTRFS_STRIPE_LEN)) {
+		btrfs_err(fs_info,
+			  "zone size is not aligned to BTRFS_STRIPE_LEN");
+		ret = -EINVAL;
+		goto out;
+	}
+
+	btrfs_info(fs_info, "HMZONED mode enabled, zone size %llu B",
+		   fs_info->zone_size);
+out:
+	return ret;
+}
diff --git a/fs/btrfs/hmzoned.h b/fs/btrfs/hmzoned.h
index ffc70842135e..29cfdcabff2f 100644
--- a/fs/btrfs/hmzoned.h
+++ b/fs/btrfs/hmzoned.h
@@ -9,6 +9,8 @@
 #ifndef BTRFS_HMZONED_H
 #define BTRFS_HMZONED_H
 
+#include
+
 struct btrfs_zoned_device_info {
 	/*
 	 * Number of zones, zone size and types of zones if bdev is a
@@ -25,6 +27,7 @@ int btrfs_get_dev_zone(struct btrfs_device *device, u64 pos,
 		       struct blk_zone *zone, gfp_t gfp_mask);
 int btrfs_get_dev_zone_info(struct btrfs_device *device);
 void btrfs_destroy_dev_zone_info(struct btrfs_device *device);
+int btrfs_check_hmzoned_mode(struct btrfs_fs_info *fs_info);
 
 static inline bool btrfs_dev_is_sequential(struct btrfs_device *device, u64 pos)
 {
@@ -76,4 +79,19 @@ static inline void btrfs_dev_clear_zone_empty(struct btrfs_device *device,
 	btrfs_dev_set_empty_zone_bit(device, pos, false);
 }
 
+static inline bool btrfs_check_device_zone_type(struct btrfs_fs_info *fs_info,
+						struct block_device *bdev)
+{
+	u64 zone_size;
+
+	if (btrfs_fs_incompat(fs_info, HMZONED)) {
+		zone_size = (u64)bdev_zone_sectors(bdev) << SECTOR_SHIFT;
+		/* Do not allow non-zoned device */
+		return bdev_is_zoned(bdev) && fs_info->zone_size == zone_size;
+	}
+
+	/* Do not allow Host Manged zoned device */
+	return bdev_zoned_model(bdev) != BLK_ZONED_HM;
+}
+
 #endif
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 78de9d5d80c6..d7879a5a2536 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -43,6 +43,7 @@
 #include "free-space-cache.h"
 #include "backref.h"
 #include "space-info.h"
+#include "hmzoned.h"
 #include "tests/btrfs-tests.h"
 #include "qgroup.h"
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 8e5a894e7bde..755b2ec1e0de 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2572,6 +2572,11 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
 	if (IS_ERR(bdev))
 		return PTR_ERR(bdev);
 
+	if (!btrfs_check_device_zone_type(fs_info, bdev)) {
+		ret = -EINVAL;
+		goto error;
+	}
+
 	if (fs_devices->seeding) {
 		seeding_dev = 1;
 		down_write(&sb->s_umount);

From patchwork Thu Aug 8 09:30:15 2019
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 11083743
h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=L+q9uysGobuJDzywTc2ewr95QDB6LcwNJCL8Gp3Ba0w=; b=JbJhBWfr8hd4pSyX0UYUrwGJIsiI48rYSVGLzI7vBN4xOmUGDkNFV+/4 LLMrl5r6eegs7fJRoNXFMF3/31EoXTYUC1jLgao8fDsT6kXkXoMpvyJ+X vSIS7AlezcNLm0hO1NMXG81nq//WFN3E1xbk0v/xRXGr7tArtlELldasQ 1EQy1pAkLnLkWrfkHwiplneFoX8Eml7APbtZHsl0dOLYIPTLJksI6/gJw 5iQC4r155MWtztjAPCp/4AQuUJnqnI30O5u3+ALRLjJQVrh3CHXT18dK5 K3nTfvzX3D4TtLBsHuyG/ceU7eMs/jQ702dhnOvqbwH7VIaESjsmtwM7g g==; IronPort-SDR: Ct/G8J/b1UilGsRUKvQx5HGONY58052lAZX8j5TE6LgCaVxT5+X137tLKykl7eixDKeQgqaSeo 8PwCA2eS3TocM768EOcCYY0PThY9PgiZ9aDvwPDrewoKN3hOcfSC+g70Ed2oCdri9CLvkZff2T llDd7gRPs7LJNlphkpiYwMzMakNVweIO61KWqFAikCkFDFAe5TYCJCCGuJ1hWSxVwGKEd3Axo/ gC/Xg9qfkBkASwoIK8nQmIlxmJjwUUnkDxwkRfowM2JMh7h3LXtDJv4aQX+MdaOTuGP4UHvhRv xI8= X-IronPort-AV: E=Sophos;i="5.64,360,1559491200"; d="scan'208";a="115363305" Received: from uls-op-cesaip01.wdc.com (HELO uls-op-cesaep01.wdc.com) ([199.255.45.14]) by ob1.hgst.iphmx.com with ESMTP; 08 Aug 2019 17:31:20 +0800 IronPort-SDR: 3gt2ev8PzMoxWeXvtIQHDMHR9Ix+QSauqcSNfE1vdIEDyblWmPp5VZrVjX6aSc6OkOq+uWepdW U2AQU8QuwMDgTQPSkUSXcRK9QC628JwII1GuEYswjb8KNkcEV9fyZFTyIgWqHfGhrYpr9BKKLS zgtIBwmGJ2iB+vbJYCa5dpdtS6z3bvY4wScTJXL7vjJVkafHZi8vM2AbgWLfTRrbUmOiYFLGfQ nsMHpIXF0lOjKMWeQ4+ZrWAgJGtLnXzLN6A5aGLbLm+snVswuZWAMdv6b251P6RWJ736jOuFVD EwvmMKP936HNatizaKRMqwEz Received: from uls-op-cesaip02.wdc.com ([10.248.3.37]) by uls-op-cesaep01.wdc.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 08 Aug 2019 02:29:04 -0700 IronPort-SDR: xX2J41Lsh3rGBMRJCkY9hyBDSAQ7sIAKt3MjxI+cdMlZiBcMxfE0A3N3eqYwaSD9IKEsrx5lI4 Ju8sddG6vgS5MiYJbzsiq7HlxJhkQpB5RzgUj4s9cPbzOvSgCc6UHap1x4TmvvKNpnstd96AlB cd3Ig5ORFEO5+lpgmy1fr7d3otaO2kaZjwttz58XkC123qnsmHvtVn8+GewekL9cP3l4Xld2fr LJxlK3jw9ifrxVvXsz398NLIRvYjP1v2+CrxBq/v0t9monCSj5WH+W2qw26KHaLbbLraK74f+j gps= Received: from naota.dhcp.fujisawa.hgst.com (HELO naota.fujisawa.hgst.com) ([10.149.53.115]) by uls-op-cesaip02.wdc.com with ESMTP; 08 Aug 
2019 02:31:19 -0700 From: Naohiro Aota To: linux-btrfs@vger.kernel.org, David Sterba Cc: Chris Mason , Josef Bacik , Nikolay Borisov , Damien Le Moal , Matias Bjorling , Johannes Thumshirn , Hannes Reinecke , linux-fsdevel@vger.kernel.org, Naohiro Aota Subject: [PATCH v3 04/27] btrfs: disallow RAID5/6 in HMZONED mode Date: Thu, 8 Aug 2019 18:30:15 +0900 Message-Id: <20190808093038.4163421-5-naohiro.aota@wdc.com> X-Mailer: git-send-email 2.22.0 In-Reply-To: <20190808093038.4163421-1-naohiro.aota@wdc.com> References: <20190808093038.4163421-1-naohiro.aota@wdc.com> MIME-Version: 1.0 Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Supporting the RAID5/6 profile in HMZONED mode is not trivial. For example, non-full stripe writes will cause overwriting parity blocks. When we do a non-full stripe write, it writes to the parity block with the data at that moment. Then, another write to the stripes will try to overwrite the parity block with new parity value. However, sequential zones do not allow such parity overwriting. Furthermore, using RAID5/6 on SMR drives, which usually have a huge capacity, incur large overhead of rebuild. Such overhead can lead to higher volume failure rate (e.g. additional drive failure during rebuild) because of the increased rebuild time. Thus, let's disable RAID5/6 profile in HMZONED mode for now. 
Signed-off-by: Naohiro Aota
---
 fs/btrfs/hmzoned.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/fs/btrfs/hmzoned.c b/fs/btrfs/hmzoned.c
index 512674d8f488..641c83f6ea73 100644
--- a/fs/btrfs/hmzoned.c
+++ b/fs/btrfs/hmzoned.c
@@ -222,6 +222,13 @@ int btrfs_check_hmzoned_mode(struct btrfs_fs_info *fs_info)
 		goto out;
 	}
 
+	/* RAID56 is not allowed */
+	if (btrfs_fs_incompat(fs_info, RAID56)) {
+		btrfs_err(fs_info, "HMZONED mode does not support RAID56");
+		ret = -EINVAL;
+		goto out;
+	}
+
 	btrfs_info(fs_info, "HMZONED mode enabled, zone size %llu B",
 		   fs_info->zone_size);
 out:

From patchwork Thu Aug 8 09:30:16 2019
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 11083739
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, David Sterba
Cc: Chris Mason, Josef Bacik, Nikolay Borisov, Damien Le Moal, Matias Bjorling, Johannes Thumshirn, Hannes Reinecke, linux-fsdevel@vger.kernel.org, Naohiro Aota
Subject: [PATCH v3 05/27] btrfs: disallow space_cache in HMZONED mode
Date: Thu, 8 Aug 2019 18:30:16 +0900
Message-Id: <20190808093038.4163421-6-naohiro.aota@wdc.com>
In-Reply-To: <20190808093038.4163421-1-naohiro.aota@wdc.com>
References: <20190808093038.4163421-1-naohiro.aota@wdc.com>

As updates to the space cache are done in place, the space cache cannot be located in sequential zones, and there is no guarantee that the device will have enough conventional zones to store it. Resolve this by completely disabling the space cache. This does not introduce any problems for sequential block groups: all the free space is located after the allocation pointer and there is no free space before it, so no such cache is needed.
Signed-off-by: Naohiro Aota
---
 fs/btrfs/hmzoned.c | 18 ++++++++++++++++++
 fs/btrfs/hmzoned.h |  1 +
 fs/btrfs/super.c   | 10 ++++++++--
 3 files changed, 27 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/hmzoned.c b/fs/btrfs/hmzoned.c
index 641c83f6ea73..99a03ab3b5de 100644
--- a/fs/btrfs/hmzoned.c
+++ b/fs/btrfs/hmzoned.c
@@ -234,3 +234,21 @@ int btrfs_check_hmzoned_mode(struct btrfs_fs_info *fs_info)
 out:
 	return ret;
 }
+
+int btrfs_check_mountopts_hmzoned(struct btrfs_fs_info *info)
+{
+	if (!btrfs_fs_incompat(info, HMZONED))
+		return 0;
+
+	/*
+	 * SPACE CACHE writing is not CoWed. Disable that to avoid
+	 * write errors in sequential zones.
+	 */
+	if (btrfs_test_opt(info, SPACE_CACHE)) {
+		btrfs_err(info,
+			  "cannot enable disk space caching with HMZONED mode");
+		return -EINVAL;
+	}
+
+	return 0;
+}
diff --git a/fs/btrfs/hmzoned.h b/fs/btrfs/hmzoned.h
index 29cfdcabff2f..83579b2dc0a4 100644
--- a/fs/btrfs/hmzoned.h
+++ b/fs/btrfs/hmzoned.h
@@ -28,6 +28,7 @@ int btrfs_get_dev_zone(struct btrfs_device *device, u64 pos,
 int btrfs_get_dev_zone_info(struct btrfs_device *device);
 void btrfs_destroy_dev_zone_info(struct btrfs_device *device);
 int btrfs_check_hmzoned_mode(struct btrfs_fs_info *fs_info);
+int btrfs_check_mountopts_hmzoned(struct btrfs_fs_info *info);
 
 static inline bool btrfs_dev_is_sequential(struct btrfs_device *device, u64 pos)
 {
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index d7879a5a2536..496d8b74f9a2 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -440,8 +440,12 @@ int btrfs_parse_options(struct btrfs_fs_info *info, char *options,
 	cache_gen = btrfs_super_cache_generation(info->super_copy);
 	if (btrfs_fs_compat_ro(info, FREE_SPACE_TREE))
 		btrfs_set_opt(info->mount_opt, FREE_SPACE_TREE);
-	else if (cache_gen)
-		btrfs_set_opt(info->mount_opt, SPACE_CACHE);
+	else if (cache_gen) {
+		if (btrfs_fs_incompat(info, HMZONED))
+			WARN_ON(1);
+		else
+			btrfs_set_opt(info->mount_opt, SPACE_CACHE);
+	}
 
 	/*
 	 * Even the options are empty, we still need to do extra check
@@ -877,6 +881,8 @@ int btrfs_parse_options(struct btrfs_fs_info *info, char *options,
 		ret = -EINVAL;
 	}
 
+	if (!ret)
+		ret = btrfs_check_mountopts_hmzoned(info);
 	if (!ret && btrfs_test_opt(info, SPACE_CACHE))
 		btrfs_info(info, "disk space caching is enabled");
 	if (!ret && btrfs_test_opt(info, FREE_SPACE_TREE))

From patchwork Thu Aug 8 09:30:17 2019
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 11083747
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, David Sterba
Cc: Chris Mason, Josef Bacik, Nikolay Borisov, Damien Le Moal, Matias Bjorling, Johannes Thumshirn, Hannes Reinecke, linux-fsdevel@vger.kernel.org, Naohiro Aota
Subject: [PATCH v3 06/27] btrfs: disallow NODATACOW in HMZONED mode
Date: Thu, 8 Aug 2019 18:30:17 +0900
Message-Id: <20190808093038.4163421-7-naohiro.aota@wdc.com>
In-Reply-To: <20190808093038.4163421-1-naohiro.aota@wdc.com>
References: <20190808093038.4163421-1-naohiro.aota@wdc.com>

NODATACOW implies overwriting file data in place on the device, which is impossible in sequential required zones. Disable NODATACOW both globally (the mount option) and per file (the NODATACOW attribute), by masking out FS_NOCOW_FL.

Signed-off-by: Naohiro Aota
---
 fs/btrfs/hmzoned.c | 6 ++++++
 fs/btrfs/ioctl.c   | 3 +++
 2 files changed, 9 insertions(+)

diff --git a/fs/btrfs/hmzoned.c b/fs/btrfs/hmzoned.c
index 99a03ab3b5de..0770b1f58bd9 100644
--- a/fs/btrfs/hmzoned.c
+++ b/fs/btrfs/hmzoned.c
@@ -250,5 +250,11 @@ int btrfs_check_mountopts_hmzoned(struct btrfs_fs_info *info)
 		return -EINVAL;
 	}
 
+	if (btrfs_test_opt(info, NODATACOW)) {
+		btrfs_err(info,
+			  "cannot enable nodatacow with HMZONED mode");
+		return -EINVAL;
+	}
+
 	return 0;
 }
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index d0743ec1231d..06783c489023 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -93,6 +93,9 @@ static int btrfs_clone(struct inode *src, struct inode *inode,
 static unsigned int btrfs_mask_fsflags_for_type(struct inode *inode,
 		unsigned int flags)
 {
+	if (btrfs_fs_incompat(btrfs_sb(inode->i_sb), HMZONED))
+		flags &= ~FS_NOCOW_FL;
+
 	if (S_ISDIR(inode->i_mode))
 		return flags;
 	else if (S_ISREG(inode->i_mode))

From patchwork Thu Aug 8 09:30:18 2019
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 11083755
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, David Sterba
Cc: Chris Mason, Josef Bacik, Nikolay Borisov, Damien Le Moal, Matias Bjorling, Johannes Thumshirn, Hannes Reinecke, linux-fsdevel@vger.kernel.org, Naohiro Aota
Subject: [PATCH v3 07/27] btrfs: disable tree-log in HMZONED mode
Date: Thu, 8 Aug 2019 18:30:18 +0900
Message-Id: <20190808093038.4163421-8-naohiro.aota@wdc.com>
In-Reply-To: <20190808093038.4163421-1-naohiro.aota@wdc.com>
References: <20190808093038.4163421-1-naohiro.aota@wdc.com>

Extent buffers for the tree-log tree are allocated scattered among the extent buffers of other metadata, and btrfs_sync_log() writes out only the tree-log buffers. This breaks the sequential writing rule, which is mandatory in sequential required zones. Tree-logging brings little benefit in HMZONED mode until tree-log buffers can be allocated sequentially, so disable the tree-log entirely in HMZONED mode.

Signed-off-by: Naohiro Aota
---
 fs/btrfs/hmzoned.c | 6 ++++++
 fs/btrfs/super.c   | 4 ++++
 2 files changed, 10 insertions(+)

diff --git a/fs/btrfs/hmzoned.c b/fs/btrfs/hmzoned.c
index 0770b1f58bd9..e07e76af1e82 100644
--- a/fs/btrfs/hmzoned.c
+++ b/fs/btrfs/hmzoned.c
@@ -256,5 +256,11 @@ int btrfs_check_mountopts_hmzoned(struct btrfs_fs_info *info)
 		return -EINVAL;
 	}
 
+	if (!btrfs_test_opt(info, NOTREELOG)) {
+		btrfs_err(info,
+			  "cannot enable tree log with HMZONED mode");
+		return -EINVAL;
+	}
+
 	return 0;
 }
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 496d8b74f9a2..396238e099bc 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -447,6 +447,10 @@ int btrfs_parse_options(struct btrfs_fs_info *info, char *options,
 			btrfs_set_opt(info->mount_opt, SPACE_CACHE);
 	}
 
+	if (btrfs_fs_incompat(info, HMZONED))
+		btrfs_set_and_info(info, NOTREELOG,
+				   "disabling tree log with HMZONED mode");
+
 	/*
 	 * Even the options are empty, we still need to do extra check
 	 * against new flags

From patchwork Thu Aug 8 09:30:19 2019
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 11083753
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, David Sterba
Cc: Chris Mason, Josef Bacik, Nikolay Borisov, Damien Le Moal, Matias Bjorling, Johannes Thumshirn, Hannes Reinecke, linux-fsdevel@vger.kernel.org, Naohiro Aota
Subject: [PATCH v3 08/27] btrfs: disable fallocate in HMZONED mode
Date: Thu, 8 Aug 2019 18:30:19 +0900
Message-Id: <20190808093038.4163421-9-naohiro.aota@wdc.com>
In-Reply-To: <20190808093038.4163421-1-naohiro.aota@wdc.com>
References: <20190808093038.4163421-1-naohiro.aota@wdc.com>

fallocate() is implemented by reserving actual extents instead of just reservations. This can expose the sequential write constraint of host-managed zoned block devices to applications, which would break the POSIX semantics of the fallocated file. To avoid this, report fallocate() as not supported in HMZONED mode for now. In the future, we may be able to implement an "in-memory" fallocate() in HMZONED mode, for example by utilizing space_info->bytes_may_use.

Signed-off-by: Naohiro Aota
---
 fs/btrfs/file.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 58a18ed11546..7474010a997d 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -3023,6 +3023,10 @@ static long btrfs_fallocate(struct file *file, int mode,
 	alloc_end = round_up(offset + len, blocksize);
 	cur_offset = alloc_start;
 
+	/* Do not allow fallocate in HMZONED mode */
+	if (btrfs_fs_incompat(btrfs_sb(inode->i_sb), HMZONED))
+		return -EOPNOTSUPP;
+
 	/* Make sure we aren't being give some crap mode */
 	if (mode & ~(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE |
 		     FALLOC_FL_ZERO_RANGE))

From patchwork Thu Aug 8 09:30:20 2019
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 11083759
9TCcsjQfr0d5MaobujcppwQlL59RLmtGi49fIOZ0ChoH/8k8rRDUXAtKKSom1P4k9nT1wqfGdQ Q/F4d545gZq3O5OHc1vtHXzIzg6mCZZ6wh4N2SgfE0EwLPRqfJhxwssZQNp31w9QhorVv6j7t7 MSvydWT4kw6vv8i5Z4llhi6NK9nX0wQfEJjuKqc7za+Qt2thNy1HmA0yANkibIGR5l0unPprO6 hHUQgDoN8vl1TDpCqLuXcIp+ Received: from uls-op-cesaip02.wdc.com ([10.248.3.37]) by uls-op-cesaep01.wdc.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 08 Aug 2019 02:29:14 -0700 IronPort-SDR: J+Cm9qp/Q1t9qIa5I9kTlzCa7+NFmDv972/HXHfQLVUrdf2ZC/xFrG/tynt8Avdmh01piwEgo0 QK3N8etxr5CzXDDyItKzCSMIvxCLlc3pwlagLrLTmyA11vUVLO6tGz2R0/RgP/rCPGxSs6XVo9 eN22u1lMBPcsYPJjypZfd63ZVhytYIa5D6VzejVllYLFC+/LhTRL6kfEb5y8kg/059iZ4Q6P8q fDpmaEysTC7PX8w0syD702dDLGOecgNNVVAyYClvZd8V99TVjEiDTY1Np+vwCYNNxAuJxOv2xl jmg= Received: from naota.dhcp.fujisawa.hgst.com (HELO naota.fujisawa.hgst.com) ([10.149.53.115]) by uls-op-cesaip02.wdc.com with ESMTP; 08 Aug 2019 02:31:29 -0700 From: Naohiro Aota To: linux-btrfs@vger.kernel.org, David Sterba Cc: Chris Mason , Josef Bacik , Nikolay Borisov , Damien Le Moal , Matias Bjorling , Johannes Thumshirn , Hannes Reinecke , linux-fsdevel@vger.kernel.org, Naohiro Aota Subject: [PATCH v3 09/27] btrfs: align device extent allocation to zone boundary Date: Thu, 8 Aug 2019 18:30:20 +0900 Message-Id: <20190808093038.4163421-10-naohiro.aota@wdc.com> X-Mailer: git-send-email 2.22.0 In-Reply-To: <20190808093038.4163421-1-naohiro.aota@wdc.com> References: <20190808093038.4163421-1-naohiro.aota@wdc.com> MIME-Version: 1.0 Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP In HMZONED mode, align the device extents to zone boundaries so that a zone reset affects only the device extent and does not change the state of blocks in the neighbor device extents. Also, check that a region allocation is always over empty same-type zones and it is not over any locations of super block copies. 
This patch also adds a verification in verify_one_dev_extent() to check that the device extent is aligned to the zone boundary. Signed-off-by: Naohiro Aota --- fs/btrfs/extent-tree.c | 6 ++++ fs/btrfs/hmzoned.c | 56 ++++++++++++++++++++++++++++++++ fs/btrfs/hmzoned.h | 10 ++++++ fs/btrfs/volumes.c | 72 ++++++++++++++++++++++++++++++++++++++++++ 4 files changed, 144 insertions(+) diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index d3b58e388535..3a36646dfaa8 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -7637,6 +7637,12 @@ int btrfs_can_relocate(struct btrfs_fs_info *fs_info, u64 bytenr) min_free = div64_u64(min_free, dev_min); } + /* We cannot allocate a size smaller than zone_size anyway */ + if (index == BTRFS_RAID_DUP) + min_free = max_t(u64, min_free, 2 * fs_info->zone_size); + else + min_free = max_t(u64, min_free, fs_info->zone_size); + mutex_lock(&fs_info->chunk_mutex); list_for_each_entry(device, &fs_devices->alloc_list, dev_alloc_list) { u64 dev_offset; diff --git a/fs/btrfs/hmzoned.c b/fs/btrfs/hmzoned.c index e07e76af1e82..7d334b236cd3 100644 --- a/fs/btrfs/hmzoned.c +++ b/fs/btrfs/hmzoned.c @@ -12,6 +12,7 @@ #include "volumes.h" #include "hmzoned.h" #include "rcu-string.h" +#include "disk-io.h" /* Maximum number of zones to report per blkdev_report_zones() call */ #define BTRFS_REPORT_NR_ZONES 4096 @@ -264,3 +265,58 @@ int btrfs_check_mountopts_hmzoned(struct btrfs_fs_info *info) return 0; } + +/* + * btrfs_check_allocatable_zones - check if the specified region is + * suitable for allocation + * @device: the device to allocate a region on + * @pos: the position of the region + * @num_bytes: the size of the region + * + * On a non-ZONED device, anywhere is suitable for allocation. On a ZONED + * device, check that + * 1) the region is not on non-empty zones, + * 2) all zones in the region have the same zone type, + * 3) it does not contain a super block location, if the zones are + * sequential.
+ */ +bool btrfs_check_allocatable_zones(struct btrfs_device *device, u64 pos, + u64 num_bytes) +{ + struct btrfs_zoned_device_info *zinfo = device->zone_info; + u64 nzones, begin, end; + u64 sb_pos; + u8 shift; + int i; + + if (!zinfo) + return true; + + shift = zinfo->zone_size_shift; + nzones = num_bytes >> shift; + begin = pos >> shift; + end = begin + nzones; + + ASSERT(IS_ALIGNED(pos, zinfo->zone_size)); + ASSERT(IS_ALIGNED(num_bytes, zinfo->zone_size)); + + if (end > zinfo->nr_zones) + return false; + + /* check if zones in the region are all empty */ + if (find_next_zero_bit(zinfo->empty_zones, end, begin) != end) + return false; + + if (btrfs_dev_is_sequential(device, pos)) { + for (i = 0; i < BTRFS_SUPER_MIRROR_MAX; i++) { + sb_pos = btrfs_sb_offset(i); + if (!(sb_pos + BTRFS_SUPER_INFO_SIZE <= pos || + pos + num_bytes <= sb_pos)) + return false; + } + + return find_next_zero_bit(zinfo->seq_zones, end, begin) == end; + } + + return find_next_bit(zinfo->seq_zones, end, begin) == end; +} diff --git a/fs/btrfs/hmzoned.h b/fs/btrfs/hmzoned.h index 83579b2dc0a4..396ece5f9410 100644 --- a/fs/btrfs/hmzoned.h +++ b/fs/btrfs/hmzoned.h @@ -29,6 +29,8 @@ int btrfs_get_dev_zone_info(struct btrfs_device *device); void btrfs_destroy_dev_zone_info(struct btrfs_device *device); int btrfs_check_hmzoned_mode(struct btrfs_fs_info *fs_info); int btrfs_check_mountopts_hmzoned(struct btrfs_fs_info *info); +bool btrfs_check_allocatable_zones(struct btrfs_device *device, u64 pos, + u64 num_bytes); static inline bool btrfs_dev_is_sequential(struct btrfs_device *device, u64 pos) { @@ -95,4 +97,12 @@ static inline bool btrfs_check_device_zone_type(struct btrfs_fs_info *fs_info, return bdev_zoned_model(bdev) != BLK_ZONED_HM; } +static inline u64 btrfs_zone_align(struct btrfs_device *device, u64 pos) +{ + if (!device->zone_info) + return pos; + + return ALIGN(pos, device->zone_info->zone_size); +} + #endif diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index
755b2ec1e0de..265a1496e459 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -1572,6 +1572,7 @@ int find_free_dev_extent_start(struct btrfs_device *device, u64 num_bytes, u64 max_hole_size; u64 extent_end; u64 search_end = device->total_bytes; + u64 zone_size = 0; int ret; int slot; struct extent_buffer *l; @@ -1582,6 +1583,14 @@ int find_free_dev_extent_start(struct btrfs_device *device, u64 num_bytes, * at an offset of at least 1MB. */ search_start = max_t(u64, search_start, SZ_1M); + /* + * For a zoned block device, skip the first zone of the device + * entirely. + */ + if (device->zone_info) + zone_size = device->zone_info->zone_size; + search_start = max_t(u64, search_start, zone_size); + search_start = btrfs_zone_align(device, search_start); path = btrfs_alloc_path(); if (!path) @@ -1646,12 +1655,21 @@ int find_free_dev_extent_start(struct btrfs_device *device, u64 num_bytes, */ if (contains_pending_extent(device, &search_start, hole_size)) { + search_start = btrfs_zone_align(device, + search_start); if (key.offset >= search_start) hole_size = key.offset - search_start; else hole_size = 0; } + if (!btrfs_check_allocatable_zones(device, search_start, + num_bytes)) { + search_start += zone_size; + btrfs_release_path(path); + goto again; + } + if (hole_size > max_hole_size) { max_hole_start = search_start; max_hole_size = hole_size; @@ -1691,6 +1709,14 @@ int find_free_dev_extent_start(struct btrfs_device *device, u64 num_bytes, hole_size = search_end - search_start; if (contains_pending_extent(device, &search_start, hole_size)) { + search_start = btrfs_zone_align(device, search_start); + btrfs_release_path(path); + goto again; + } + + if (!btrfs_check_allocatable_zones(device, search_start, + num_bytes)) { + search_start += zone_size; btrfs_release_path(path); goto again; } @@ -1708,6 +1734,7 @@ int find_free_dev_extent_start(struct btrfs_device *device, u64 num_bytes, ret = 0; out: + ASSERT(zone_size == 0 || IS_ALIGNED(max_hole_start, zone_size)); 
btrfs_free_path(path); *start = max_hole_start; if (len) @@ -4964,6 +4991,7 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans, int i; int j; int index; + int hmzoned = btrfs_fs_incompat(info, HMZONED); BUG_ON(!alloc_profile_is_valid(type, 0)); @@ -5004,10 +5032,20 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans, BUG(); } + if (hmzoned) { + max_stripe_size = info->zone_size; + max_chunk_size = round_down(max_chunk_size, info->zone_size); + } + /* We don't want a chunk larger than 10% of writable space */ max_chunk_size = min(div_factor(fs_devices->total_rw_bytes, 1), max_chunk_size); + if (hmzoned) + max_chunk_size = max(round_down(max_chunk_size, + info->zone_size), + info->zone_size); + devices_info = kcalloc(fs_devices->rw_devices, sizeof(*devices_info), GFP_NOFS); if (!devices_info) @@ -5042,6 +5080,9 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans, if (total_avail == 0) continue; + if (hmzoned && total_avail < max_stripe_size * dev_stripes) + continue; + ret = find_free_dev_extent(device, max_stripe_size * dev_stripes, &dev_offset, &max_avail); @@ -5060,6 +5101,9 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans, continue; } + if (hmzoned && max_avail < max_stripe_size * dev_stripes) + continue; + if (ndevs == fs_devices->rw_devices) { WARN(1, "%s: found more than %llu devices\n", __func__, fs_devices->rw_devices); @@ -5093,6 +5137,7 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans, ndevs = min(ndevs, devs_max); +again: /* * The primary goal is to maximize the number of stripes, so use as * many devices as possible, even if the stripes are not maximum sized. @@ -5116,6 +5161,17 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans, * we try to reduce stripe_size. */ if (stripe_size * data_stripes > max_chunk_size) { + if (hmzoned) { + /* + * stripe_size is fixed in HMZONED. Reduce ndevs + * instead. 
+ */ + ASSERT(nparity == 0); + ndevs = div_u64(max_chunk_size * ncopies, + stripe_size * dev_stripes); + goto again; + } + /* * Reduce stripe_size, round it up to a 16MB boundary again and * then use it, unless it ends up being even bigger than the @@ -5129,6 +5185,8 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans, /* align to BTRFS_STRIPE_LEN */ stripe_size = round_down(stripe_size, BTRFS_STRIPE_LEN); + ASSERT(!hmzoned || stripe_size == info->zone_size); + map = kmalloc(map_lookup_size(num_stripes), GFP_NOFS); if (!map) { ret = -ENOMEM; @@ -7755,6 +7813,20 @@ static int verify_one_dev_extent(struct btrfs_fs_info *fs_info, ret = -EUCLEAN; goto out; } + + if (dev->zone_info) { + u64 zone_size = dev->zone_info->zone_size; + + if (!IS_ALIGNED(physical_offset, zone_size) || + !IS_ALIGNED(physical_len, zone_size)) { + btrfs_err(fs_info, +"dev extent devid %llu physical offset %llu len %llu is not aligned to device zone", + devid, physical_offset, physical_len); + ret = -EUCLEAN; + goto out; + } + } + out: free_extent_map(em); return ret; From patchwork Thu Aug 8 09:30:21 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Naohiro Aota X-Patchwork-Id: 11083761
From: Naohiro Aota To: linux-btrfs@vger.kernel.org, David Sterba Cc: Chris Mason , Josef Bacik , Nikolay Borisov , Damien Le Moal , Matias Bjorling , Johannes Thumshirn , Hannes Reinecke , linux-fsdevel@vger.kernel.org, Naohiro Aota Subject: [PATCH v3 10/27] btrfs: do sequential extent allocation in HMZONED mode Date: Thu, 8 Aug 2019 18:30:21 +0900 Message-Id: <20190808093038.4163421-11-naohiro.aota@wdc.com> X-Mailer: git-send-email 2.22.0 In-Reply-To: <20190808093038.4163421-1-naohiro.aota@wdc.com> References: <20190808093038.4163421-1-naohiro.aota@wdc.com> MIME-Version: 1.0 Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org On HMZONED drives, writes must always be sequential and directed at a block group's zone write pointer position. Thus, block allocation in a block group must also be done sequentially, using an allocation pointer equal to the block group's zone write pointer plus the number of blocks allocated but not yet written. The sequential allocation function find_free_extent_seq() bypasses the checks in find_free_extent() and increases the reserved byte counter by itself.
Once a region has been allocated by the sequential allocator, the allocation cannot be reverted, since reverting it might race with other allocations and leave an allocation hole, which breaks the sequential write rule. Furthermore, this commit introduces two new variables in struct btrfs_block_group_cache. "wp_broken" indicates that the write pointer is broken (e.g. not synced on a RAID1 block group) and marks that block group read-only. "zone_unusable" keeps track of the size of regions that were once allocated and then freed in a block group. Such regions are never usable until the underlying zones are reset. This commit also introduces "bytes_zone_unusable" to track such unusable bytes in a space_info. Pinned bytes are always reclaimed to "bytes_zone_unusable"; they are not usable until the zones are reset first. Signed-off-by: Naohiro Aota --- fs/btrfs/ctree.h | 25 ++ fs/btrfs/extent-tree.c | 179 +++++++++++++++++++++++++--- fs/btrfs/free-space-cache.c | 35 ++++++ fs/btrfs/free-space-cache.h | 5 + fs/btrfs/hmzoned.c | 231 ++++++++++++++++++++++++++++++++++++ fs/btrfs/hmzoned.h | 1 + fs/btrfs/space-info.c | 13 +- fs/btrfs/space-info.h | 4 +- fs/btrfs/sysfs.c | 2 + 9 files changed, 471 insertions(+), 24 deletions(-) diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index a00ce8c4d678..3d31a1960c4d 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -482,6 +482,20 @@ struct btrfs_full_stripe_locks_tree { struct mutex lock; }; +/* Block group allocation types */ +enum btrfs_alloc_type { + + /* Regular first fit allocation */ + BTRFS_ALLOC_FIT = 0, + + /* + * Sequential allocation: this is for HMZONED mode and + * will result in ignoring free space before a block + * group allocation offset.
+ */ + BTRFS_ALLOC_SEQ = 1, +}; + struct btrfs_block_group_cache { struct btrfs_key key; struct btrfs_block_group_item item; @@ -521,6 +535,7 @@ struct btrfs_block_group_cache { unsigned int iref:1; unsigned int has_caching_ctl:1; unsigned int removed:1; + unsigned int wp_broken:1; int disk_cache_state; @@ -594,6 +609,16 @@ struct btrfs_block_group_cache { /* Record locked full stripes for RAID5/6 block group */ struct btrfs_full_stripe_locks_tree full_stripe_locks_root; + + enum btrfs_alloc_type alloc_type; + u64 zone_unusable; + /* + * Allocation offset for the block group to implement + * sequential allocation. This is used only with HMZONED mode + * enabled and if the block group resides on a sequential + * zone. + */ + u64 alloc_offset; }; /* delayed seq elem */ diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index 3a36646dfaa8..d2aacffe14d6 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -31,6 +31,8 @@ #include "space-info.h" #include "block-rsv.h" #include "delalloc-space.h" +#include "rcu-string.h" +#include "hmzoned.h" #undef SCRAMBLE_DELAYED_REFS @@ -543,6 +545,8 @@ static int cache_block_group(struct btrfs_block_group_cache *cache, struct btrfs_caching_control *caching_ctl; int ret = 0; + ASSERT(cache->alloc_type == BTRFS_ALLOC_FIT); + caching_ctl = kzalloc(sizeof(*caching_ctl), GFP_NOFS); if (!caching_ctl) return -ENOMEM; @@ -4429,6 +4433,20 @@ void btrfs_wait_block_group_reservations(struct btrfs_block_group_cache *bg) wait_var_event(&bg->reservations, !atomic_read(&bg->reservations)); } +static void __btrfs_add_reserved_bytes(struct btrfs_block_group_cache *cache, + u64 ram_bytes, u64 num_bytes, + int delalloc) +{ + struct btrfs_space_info *space_info = cache->space_info; + + cache->reserved += num_bytes; + space_info->bytes_reserved += num_bytes; + btrfs_space_info_update_bytes_may_use(cache->fs_info, space_info, + -ram_bytes); + if (delalloc) + cache->delalloc_bytes += num_bytes; +} + /** * btrfs_add_reserved_bytes - 
update the block_group and space info counters * @cache: The cache we are manipulating @@ -4447,18 +4465,16 @@ static int btrfs_add_reserved_bytes(struct btrfs_block_group_cache *cache, struct btrfs_space_info *space_info = cache->space_info; int ret = 0; + /* should be handled by find_free_extent_seq */ + ASSERT(cache->alloc_type != BTRFS_ALLOC_SEQ); + spin_lock(&space_info->lock); spin_lock(&cache->lock); - if (cache->ro) { + if (cache->ro) ret = -EAGAIN; - } else { - cache->reserved += num_bytes; - space_info->bytes_reserved += num_bytes; - btrfs_space_info_update_bytes_may_use(cache->fs_info, - space_info, -ram_bytes); - if (delalloc) - cache->delalloc_bytes += num_bytes; - } + else + __btrfs_add_reserved_bytes(cache, ram_bytes, num_bytes, + delalloc); spin_unlock(&cache->lock); spin_unlock(&space_info->lock); return ret; @@ -4576,9 +4592,13 @@ static int unpin_extent_range(struct btrfs_fs_info *fs_info, cache = btrfs_lookup_block_group(fs_info, start); BUG_ON(!cache); /* Logic error */ - cluster = fetch_cluster_info(fs_info, - cache->space_info, - &empty_cluster); + if (cache->alloc_type == BTRFS_ALLOC_FIT) + cluster = fetch_cluster_info(fs_info, + cache->space_info, + &empty_cluster); + else + cluster = NULL; + empty_cluster <<= 1; } @@ -4618,7 +4638,11 @@ static int unpin_extent_range(struct btrfs_fs_info *fs_info, space_info->max_extent_size = 0; percpu_counter_add_batch(&space_info->total_bytes_pinned, -len, BTRFS_TOTAL_BYTES_PINNED_BATCH); - if (cache->ro) { + if (cache->alloc_type == BTRFS_ALLOC_SEQ) { + /* need reset before reusing in ALLOC_SEQ BG */ + space_info->bytes_zone_unusable += len; + readonly = true; + } else if (cache->ro) { space_info->bytes_readonly += len; readonly = true; } @@ -5464,6 +5488,60 @@ static int find_free_extent_unclustered(struct btrfs_block_group_cache *bg, return 0; } +/* + * Simple allocator for a sequential-only block group. It only allows + * sequential allocation. No need to play with trees.
This function + * also reserve the bytes as in btrfs_add_reserved_bytes. + */ + +static int find_free_extent_seq(struct btrfs_block_group_cache *cache, + struct find_free_extent_ctl *ffe_ctl) +{ + struct btrfs_space_info *space_info = cache->space_info; + struct btrfs_free_space_ctl *ctl = cache->free_space_ctl; + u64 start = cache->key.objectid; + u64 num_bytes = ffe_ctl->num_bytes; + u64 avail; + int ret = 0; + + /* Sanity check */ + if (cache->alloc_type != BTRFS_ALLOC_SEQ) + return 1; + + spin_lock(&space_info->lock); + spin_lock(&cache->lock); + + if (cache->ro) { + ret = -EAGAIN; + goto out; + } + + spin_lock(&ctl->tree_lock); + avail = cache->key.offset - cache->alloc_offset; + if (avail < num_bytes) { + ffe_ctl->max_extent_size = avail; + spin_unlock(&ctl->tree_lock); + ret = 1; + goto out; + } + + ffe_ctl->found_offset = start + cache->alloc_offset; + cache->alloc_offset += num_bytes; + ctl->free_space -= num_bytes; + spin_unlock(&ctl->tree_lock); + + ASSERT(IS_ALIGNED(ffe_ctl->found_offset, + cache->fs_info->stripesize)); + ffe_ctl->search_start = ffe_ctl->found_offset; + __btrfs_add_reserved_bytes(cache, ffe_ctl->ram_bytes, num_bytes, + ffe_ctl->delalloc); + +out: + spin_unlock(&cache->lock); + spin_unlock(&space_info->lock); + return ret; +} + /* * Return >0 means caller needs to re-search for free extent * Return 0 means we have the needed free extent. @@ -5764,6 +5842,17 @@ static noinline int find_free_extent(struct btrfs_fs_info *fs_info, if (unlikely(block_group->cached == BTRFS_CACHE_ERROR)) goto loop; + if (block_group->alloc_type == BTRFS_ALLOC_SEQ) { + ret = find_free_extent_seq(block_group, &ffe_ctl); + if (ret) + goto loop; + /* + * find_free_space_seq should ensure that + * everything is OK and reserve the extent. 
+ */ + goto nocheck; + } + /* * Ok we want to try and use the cluster allocator, so * lets look there @@ -5819,6 +5908,7 @@ static noinline int find_free_extent(struct btrfs_fs_info *fs_info, num_bytes); goto loop; } +nocheck: btrfs_inc_block_group_reservations(block_group); /* we are all good, lets return */ @@ -7370,7 +7460,8 @@ static int inc_block_group_ro(struct btrfs_block_group_cache *cache, int force) } num_bytes = cache->key.offset - cache->reserved - cache->pinned - - cache->bytes_super - btrfs_block_group_used(&cache->item); + cache->bytes_super - cache->zone_unusable - + btrfs_block_group_used(&cache->item); sinfo_used = btrfs_space_info_used(sinfo, true); if (sinfo_used + num_bytes + min_allocable_bytes <= @@ -7519,6 +7610,7 @@ void btrfs_dec_block_group_ro(struct btrfs_block_group_cache *cache) if (!--cache->ro) { num_bytes = cache->key.offset - cache->reserved - cache->pinned - cache->bytes_super - + cache->zone_unusable - btrfs_block_group_used(&cache->item); sinfo->bytes_readonly -= num_bytes; list_del_init(&cache->ro_list); @@ -7989,6 +8081,7 @@ btrfs_create_block_group_cache(struct btrfs_fs_info *fs_info, atomic_set(&cache->trimming, 0); mutex_init(&cache->free_space_lock); btrfs_init_full_stripe_locks_tree(&cache->full_stripe_locks_root); + cache->alloc_type = BTRFS_ALLOC_FIT; return cache; } @@ -8061,6 +8154,7 @@ int btrfs_read_block_groups(struct btrfs_fs_info *info) int need_clear = 0; u64 cache_gen; u64 feature; + u64 unusable = 0; int mixed; feature = btrfs_super_incompat_flags(info->super_copy); @@ -8130,6 +8224,14 @@ int btrfs_read_block_groups(struct btrfs_fs_info *info) key.objectid = found_key.objectid + found_key.offset; btrfs_release_path(path); + ret = btrfs_load_block_group_zone_info(cache); + if (ret) { + btrfs_err(info, "failed to load zone info of bg %llu", + cache->key.objectid); + btrfs_put_block_group(cache); + goto error; + } + /* * We need to exclude the super stripes now so that the space * info has super bytes accounted 
for, otherwise we'll think @@ -8166,6 +8268,31 @@ int btrfs_read_block_groups(struct btrfs_fs_info *info) free_excluded_extents(cache); } + if (cache->alloc_type == BTRFS_ALLOC_SEQ) { + u64 free; + + WARN_ON(cache->bytes_super != 0); + if (!cache->wp_broken) { + unusable = cache->alloc_offset - + btrfs_block_group_used(&cache->item); + free = cache->key.offset - cache->alloc_offset; + } else { + unusable = cache->key.offset - + btrfs_block_group_used(&cache->item); + free = 0; + } + /* we only need ->free_space in ALLOC_SEQ BGs */ + cache->last_byte_to_unpin = (u64)-1; + cache->cached = BTRFS_CACHE_FINISHED; + cache->free_space_ctl->free_space = free; + cache->zone_unusable = unusable; + /* + * Should not have any excluded extents. Just + * in case, though. + */ + free_excluded_extents(cache); + } + ret = btrfs_add_block_group_cache(info, cache); if (ret) { btrfs_remove_free_space_cache(cache); @@ -8176,7 +8303,8 @@ int btrfs_read_block_groups(struct btrfs_fs_info *info) trace_btrfs_add_block_group(info, cache, 0); btrfs_update_space_info(info, cache->flags, found_key.offset, btrfs_block_group_used(&cache->item), - cache->bytes_super, &space_info); + cache->bytes_super, unusable, + &space_info); cache->space_info = space_info; @@ -8189,6 +8317,9 @@ int btrfs_read_block_groups(struct btrfs_fs_info *info) ASSERT(list_empty(&cache->bg_list)); btrfs_mark_bg_unused(cache); } + + if (cache->wp_broken) + inc_block_group_ro(cache, 1); } list_for_each_entry_rcu(space_info, &info->space_info, list) { @@ -8282,6 +8413,13 @@ int btrfs_make_block_group(struct btrfs_trans_handle *trans, u64 bytes_used, cache->last_byte_to_unpin = (u64)-1; cache->cached = BTRFS_CACHE_FINISHED; cache->needs_free_space = 1; + + ret = btrfs_load_block_group_zone_info(cache); + if (ret) { + btrfs_put_block_group(cache); + return ret; + } + ret = exclude_super_stripes(cache); if (ret) { /* @@ -8326,7 +8464,7 @@ int btrfs_make_block_group(struct btrfs_trans_handle *trans, u64 bytes_used, */ 
trace_btrfs_add_block_group(fs_info, cache, 1); btrfs_update_space_info(fs_info, cache->flags, size, bytes_used, - cache->bytes_super, &cache->space_info); + cache->bytes_super, 0, &cache->space_info); btrfs_update_global_block_rsv(fs_info); link_block_group(cache); @@ -8576,12 +8714,17 @@ int btrfs_remove_block_group(struct btrfs_trans_handle *trans, WARN_ON(block_group->space_info->total_bytes < block_group->key.offset); WARN_ON(block_group->space_info->bytes_readonly - < block_group->key.offset); + < block_group->key.offset - block_group->zone_unusable); + WARN_ON(block_group->space_info->bytes_zone_unusable + < block_group->zone_unusable); WARN_ON(block_group->space_info->disk_total < block_group->key.offset * factor); } block_group->space_info->total_bytes -= block_group->key.offset; - block_group->space_info->bytes_readonly -= block_group->key.offset; + block_group->space_info->bytes_readonly -= + (block_group->key.offset - block_group->zone_unusable); + block_group->space_info->bytes_zone_unusable -= + block_group->zone_unusable; block_group->space_info->disk_total -= block_group->key.offset * factor; spin_unlock(&block_group->space_info->lock); diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c index 062be9dde4c6..2aeb3620645c 100644 --- a/fs/btrfs/free-space-cache.c +++ b/fs/btrfs/free-space-cache.c @@ -2326,8 +2326,11 @@ int __btrfs_add_free_space(struct btrfs_fs_info *fs_info, u64 offset, u64 bytes) { struct btrfs_free_space *info; + struct btrfs_block_group_cache *block_group = ctl->private; int ret = 0; + ASSERT(!block_group || block_group->alloc_type != BTRFS_ALLOC_SEQ); + info = kmem_cache_zalloc(btrfs_free_space_cachep, GFP_NOFS); if (!info) return -ENOMEM; @@ -2376,6 +2379,30 @@ int __btrfs_add_free_space(struct btrfs_fs_info *fs_info, return ret; } +int __btrfs_add_free_space_seq(struct btrfs_block_group_cache *block_group, + u64 bytenr, u64 size) +{ + struct btrfs_free_space_ctl *ctl = block_group->free_space_ctl; + u64 offset 
= bytenr - block_group->key.objectid; + u64 to_free, to_unusable; + + spin_lock(&ctl->tree_lock); + if (block_group->wp_broken) + to_free = 0; + else if (offset >= block_group->alloc_offset) + to_free = size; + else if (offset + size <= block_group->alloc_offset) + to_free = 0; + else + to_free = offset + size - block_group->alloc_offset; + to_unusable = size - to_free; + + ctl->free_space += to_free; + block_group->zone_unusable += to_unusable; + spin_unlock(&ctl->tree_lock); + return 0; +} + int btrfs_remove_free_space(struct btrfs_block_group_cache *block_group, u64 offset, u64 bytes) { @@ -2384,6 +2411,8 @@ int btrfs_remove_free_space(struct btrfs_block_group_cache *block_group, int ret; bool re_search = false; + ASSERT(block_group->alloc_type != BTRFS_ALLOC_SEQ); + spin_lock(&ctl->tree_lock); again: @@ -2619,6 +2648,8 @@ u64 btrfs_find_space_for_alloc(struct btrfs_block_group_cache *block_group, u64 align_gap = 0; u64 align_gap_len = 0; + ASSERT(block_group->alloc_type != BTRFS_ALLOC_SEQ); + spin_lock(&ctl->tree_lock); entry = find_free_space(ctl, &offset, &bytes_search, block_group->full_stripe_len, max_extent_size); @@ -2738,6 +2769,8 @@ u64 btrfs_alloc_from_cluster(struct btrfs_block_group_cache *block_group, struct rb_node *node; u64 ret = 0; + ASSERT(block_group->alloc_type != BTRFS_ALLOC_SEQ); + spin_lock(&cluster->lock); if (bytes > cluster->max_size) goto out; @@ -3384,6 +3417,8 @@ int btrfs_trim_block_group(struct btrfs_block_group_cache *block_group, { int ret; + ASSERT(block_group->alloc_type != BTRFS_ALLOC_SEQ); + *trimmed = 0; spin_lock(&block_group->lock); diff --git a/fs/btrfs/free-space-cache.h b/fs/btrfs/free-space-cache.h index 8760acb55ffd..d30667784f73 100644 --- a/fs/btrfs/free-space-cache.h +++ b/fs/btrfs/free-space-cache.h @@ -73,10 +73,15 @@ void btrfs_init_free_space_ctl(struct btrfs_block_group_cache *block_group); int __btrfs_add_free_space(struct btrfs_fs_info *fs_info, struct btrfs_free_space_ctl *ctl, u64 bytenr, u64 size); +int 
__btrfs_add_free_space_seq(struct btrfs_block_group_cache *block_group, + u64 bytenr, u64 size); static inline int btrfs_add_free_space(struct btrfs_block_group_cache *block_group, u64 bytenr, u64 size) { + if (block_group->alloc_type == BTRFS_ALLOC_SEQ) + return __btrfs_add_free_space_seq(block_group, bytenr, size); + return __btrfs_add_free_space(block_group->fs_info, block_group->free_space_ctl, bytenr, size); diff --git a/fs/btrfs/hmzoned.c b/fs/btrfs/hmzoned.c index 7d334b236cd3..89631f5f01f2 100644 --- a/fs/btrfs/hmzoned.c +++ b/fs/btrfs/hmzoned.c @@ -17,6 +17,9 @@ /* Maximum number of zones to report per blkdev_report_zones() call */ #define BTRFS_REPORT_NR_ZONES 4096 +/* Invalid allocation pointer value for missing devices */ +#define WP_MISSING_DEV ((u64)-1) + static int btrfs_get_dev_zones(struct btrfs_device *device, u64 pos, struct blk_zone **zones_ret, unsigned int *nr_zones, gfp_t gfp_mask) @@ -320,3 +323,231 @@ bool btrfs_check_allocatable_zones(struct btrfs_device *device, u64 pos, return find_next_bit(zinfo->seq_zones, end, begin) == end; } + +int btrfs_load_block_group_zone_info(struct btrfs_block_group_cache *cache) +{ + struct btrfs_fs_info *fs_info = cache->fs_info; + struct extent_map_tree *em_tree = &fs_info->mapping_tree; + struct extent_map *em; + struct map_lookup *map; + struct btrfs_device *device; + u64 logical = cache->key.objectid; + u64 length = cache->key.offset; + u64 physical = 0; + int ret, alloc_type; + int i, j; + u64 *alloc_offsets = NULL; + + if (!btrfs_fs_incompat(fs_info, HMZONED)) + return 0; + + /* Sanity check */ + if (!IS_ALIGNED(length, fs_info->zone_size)) { + btrfs_err(fs_info, "unaligned block group at %llu + %llu", + logical, length); + return -EIO; + } + + /* Get the chunk mapping */ + read_lock(&em_tree->lock); + em = lookup_extent_mapping(em_tree, logical, length); + read_unlock(&em_tree->lock); + + if (!em) + return -EINVAL; + + map = em->map_lookup; + + /* + * Get the zone type: if the group is mapped to a 
non-sequential zone, + * there is no need for the allocation offset (fit allocation is OK). + */ + alloc_type = -1; + alloc_offsets = kcalloc(map->num_stripes, sizeof(*alloc_offsets), + GFP_NOFS); + if (!alloc_offsets) { + free_extent_map(em); + return -ENOMEM; + } + + for (i = 0; i < map->num_stripes; i++) { + bool is_sequential; + struct blk_zone zone; + + device = map->stripes[i].dev; + physical = map->stripes[i].physical; + + if (device->bdev == NULL) { + alloc_offsets[i] = WP_MISSING_DEV; + continue; + } + + is_sequential = btrfs_dev_is_sequential(device, physical); + if (alloc_type == -1) + alloc_type = is_sequential ? + BTRFS_ALLOC_SEQ : BTRFS_ALLOC_FIT; + + if ((is_sequential && alloc_type != BTRFS_ALLOC_SEQ) || + (!is_sequential && alloc_type == BTRFS_ALLOC_SEQ)) { + btrfs_err(fs_info, "found block group of mixed zone types"); + ret = -EIO; + goto out; + } + + if (!is_sequential) + continue; + + /* + * This zone will be used for allocation, so mark this + * zone non-empty. + */ + btrfs_dev_clear_zone_empty(device, physical); + + /* + * The group is mapped to a sequential zone. Get the zone write + * pointer to determine the allocation offset within the zone. 
+ */ + WARN_ON(!IS_ALIGNED(physical, fs_info->zone_size)); + ret = btrfs_get_dev_zone(device, physical, &zone, GFP_NOFS); + if (ret == -EIO || ret == -EOPNOTSUPP) { + ret = 0; + alloc_offsets[i] = WP_MISSING_DEV; + continue; + } else if (ret) { + goto out; + } + + + switch (zone.cond) { + case BLK_ZONE_COND_OFFLINE: + case BLK_ZONE_COND_READONLY: + btrfs_err( + fs_info, "Offline/readonly zone %llu", + physical >> device->zone_info->zone_size_shift); + alloc_offsets[i] = WP_MISSING_DEV; + break; + case BLK_ZONE_COND_EMPTY: + alloc_offsets[i] = 0; + break; + case BLK_ZONE_COND_FULL: + alloc_offsets[i] = fs_info->zone_size; + break; + default: + /* Partially used zone */ + alloc_offsets[i] = + ((zone.wp - zone.start) << SECTOR_SHIFT); + break; + } + } + + if (alloc_type == BTRFS_ALLOC_FIT) + goto out; + + switch (map->type & BTRFS_BLOCK_GROUP_PROFILE_MASK) { + case 0: /* single */ + case BTRFS_BLOCK_GROUP_DUP: + case BTRFS_BLOCK_GROUP_RAID1: + cache->alloc_offset = WP_MISSING_DEV; + for (i = 0; i < map->num_stripes; i++) { + if (alloc_offsets[i] == WP_MISSING_DEV) + continue; + if (cache->alloc_offset == WP_MISSING_DEV) + cache->alloc_offset = alloc_offsets[i]; + if (alloc_offsets[i] == cache->alloc_offset) + continue; + + btrfs_err(fs_info, + "write pointer mismatch: block group %llu", + logical); + cache->wp_broken = 1; + } + break; + case BTRFS_BLOCK_GROUP_RAID0: + cache->alloc_offset = 0; + for (i = 0; i < map->num_stripes; i++) { + if (alloc_offsets[i] == WP_MISSING_DEV) { + btrfs_err(fs_info, + "cannot recover write pointer: block group %llu", + logical); + cache->wp_broken = 1; + continue; + } + + if (alloc_offsets[0] < alloc_offsets[i]) { + btrfs_err(fs_info, + "write pointer mismatch: block group %llu", + logical); + cache->wp_broken = 1; + continue; + } + + cache->alloc_offset += alloc_offsets[i]; + } + break; + case BTRFS_BLOCK_GROUP_RAID10: + /* + * Pass1: check write pointer of RAID1 level: each pointer + * should be equal. 
+ */ + for (i = 0; i < map->num_stripes / map->sub_stripes; i++) { + int base = i * map->sub_stripes; + u64 offset = WP_MISSING_DEV; + + for (j = 0; j < map->sub_stripes; j++) { + if (alloc_offsets[base + j] == WP_MISSING_DEV) + continue; + if (offset == WP_MISSING_DEV) + offset = alloc_offsets[base+j]; + if (alloc_offsets[base + j] == offset) + continue; + + btrfs_err(fs_info, + "write pointer mismatch: block group %llu", + logical); + cache->wp_broken = 1; + } + for (j = 0; j < map->sub_stripes; j++) + alloc_offsets[base + j] = offset; + } + + /* Pass2: check write pointer of RAID1 level */ + cache->alloc_offset = 0; + for (i = 0; i < map->num_stripes / map->sub_stripes; i++) { + int base = i * map->sub_stripes; + + if (alloc_offsets[base] == WP_MISSING_DEV) { + btrfs_err(fs_info, + "cannot recover write pointer: block group %llu", + logical); + cache->wp_broken = 1; + continue; + } + + if (alloc_offsets[0] < alloc_offsets[base]) { + btrfs_err(fs_info, + "write pointer mismatch: block group %llu", + logical); + cache->wp_broken = 1; + continue; + } + + cache->alloc_offset += alloc_offsets[base]; + } + break; + case BTRFS_BLOCK_GROUP_RAID5: + case BTRFS_BLOCK_GROUP_RAID6: + /* RAID5/6 is not supported yet */ + default: + btrfs_err(fs_info, "Unsupported profile on HMZONED %llu", + map->type & BTRFS_BLOCK_GROUP_PROFILE_MASK); + ret = -EINVAL; + goto out; + } + +out: + cache->alloc_type = alloc_type; + kfree(alloc_offsets); + free_extent_map(em); + + return ret; +} diff --git a/fs/btrfs/hmzoned.h b/fs/btrfs/hmzoned.h index 396ece5f9410..399d9e9543aa 100644 --- a/fs/btrfs/hmzoned.h +++ b/fs/btrfs/hmzoned.h @@ -31,6 +31,7 @@ int btrfs_check_hmzoned_mode(struct btrfs_fs_info *fs_info); int btrfs_check_mountopts_hmzoned(struct btrfs_fs_info *info); bool btrfs_check_allocatable_zones(struct btrfs_device *device, u64 pos, u64 num_bytes); +int btrfs_load_block_group_zone_info(struct btrfs_block_group_cache *cache); static inline bool btrfs_dev_is_sequential(struct 
btrfs_device *device, u64 pos) { diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c index ab7b9ec4c240..4c6457bd1b9c 100644 --- a/fs/btrfs/space-info.c +++ b/fs/btrfs/space-info.c @@ -15,6 +15,7 @@ u64 btrfs_space_info_used(struct btrfs_space_info *s_info, ASSERT(s_info); return s_info->bytes_used + s_info->bytes_reserved + s_info->bytes_pinned + s_info->bytes_readonly + + s_info->bytes_zone_unusable + (may_use_included ? s_info->bytes_may_use : 0); } @@ -133,7 +134,7 @@ int btrfs_init_space_info(struct btrfs_fs_info *fs_info) void btrfs_update_space_info(struct btrfs_fs_info *info, u64 flags, u64 total_bytes, u64 bytes_used, - u64 bytes_readonly, + u64 bytes_readonly, u64 bytes_zone_unusable, struct btrfs_space_info **space_info) { struct btrfs_space_info *found; @@ -149,6 +150,7 @@ void btrfs_update_space_info(struct btrfs_fs_info *info, u64 flags, found->bytes_used += bytes_used; found->disk_used += bytes_used * factor; found->bytes_readonly += bytes_readonly; + found->bytes_zone_unusable += bytes_zone_unusable; if (total_bytes > 0) found->full = 0; btrfs_space_info_add_new_bytes(info, found, @@ -372,10 +374,10 @@ void btrfs_dump_space_info(struct btrfs_fs_info *fs_info, info->total_bytes - btrfs_space_info_used(info, true), info->full ? 
"" : "not "); btrfs_info(fs_info, - "space_info total=%llu, used=%llu, pinned=%llu, reserved=%llu, may_use=%llu, readonly=%llu", + "space_info total=%llu, used=%llu, pinned=%llu, reserved=%llu, may_use=%llu, readonly=%llu zone_unusable=%llu", info->total_bytes, info->bytes_used, info->bytes_pinned, info->bytes_reserved, info->bytes_may_use, - info->bytes_readonly); + info->bytes_readonly, info->bytes_zone_unusable); spin_unlock(&info->lock); DUMP_BLOCK_RSV(fs_info, global_block_rsv); @@ -392,10 +394,11 @@ void btrfs_dump_space_info(struct btrfs_fs_info *fs_info, list_for_each_entry(cache, &info->block_groups[index], list) { spin_lock(&cache->lock); btrfs_info(fs_info, - "block group %llu has %llu bytes, %llu used %llu pinned %llu reserved %s", + "block group %llu has %llu bytes, %llu used %llu pinned %llu reserved zone_unusable %llu %s", cache->key.objectid, cache->key.offset, btrfs_block_group_used(&cache->item), cache->pinned, - cache->reserved, cache->ro ? "[readonly]" : ""); + cache->reserved, cache->zone_unusable, + cache->ro ? 
"[readonly]" : ""); btrfs_dump_free_space(cache, bytes); spin_unlock(&cache->lock); } diff --git a/fs/btrfs/space-info.h b/fs/btrfs/space-info.h index c2b54b8e1a14..b3837b2c41e4 100644 --- a/fs/btrfs/space-info.h +++ b/fs/btrfs/space-info.h @@ -17,6 +17,8 @@ struct btrfs_space_info { u64 bytes_may_use; /* number of bytes that may be used for delalloc/allocations */ u64 bytes_readonly; /* total bytes that are read only */ + u64 bytes_zone_unusable; /* total bytes that are unusable until + resetting the device zone */ u64 max_extent_size; /* This will hold the maximum extent size of the space info if we had an ENOSPC in the @@ -115,7 +117,7 @@ void btrfs_space_info_add_old_bytes(struct btrfs_fs_info *fs_info, int btrfs_init_space_info(struct btrfs_fs_info *fs_info); void btrfs_update_space_info(struct btrfs_fs_info *info, u64 flags, u64 total_bytes, u64 bytes_used, - u64 bytes_readonly, + u64 bytes_readonly, u64 bytes_zone_unusable, struct btrfs_space_info **space_info); struct btrfs_space_info *btrfs_find_space_info(struct btrfs_fs_info *info, u64 flags); diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c index ad708a9edd0b..37733ec8e437 100644 --- a/fs/btrfs/sysfs.c +++ b/fs/btrfs/sysfs.c @@ -349,6 +349,7 @@ SPACE_INFO_ATTR(bytes_pinned); SPACE_INFO_ATTR(bytes_reserved); SPACE_INFO_ATTR(bytes_may_use); SPACE_INFO_ATTR(bytes_readonly); +SPACE_INFO_ATTR(bytes_zone_unusable); SPACE_INFO_ATTR(disk_used); SPACE_INFO_ATTR(disk_total); BTRFS_ATTR(space_info, total_bytes_pinned, @@ -362,6 +363,7 @@ static struct attribute *space_info_attrs[] = { BTRFS_ATTR_PTR(space_info, bytes_reserved), BTRFS_ATTR_PTR(space_info, bytes_may_use), BTRFS_ATTR_PTR(space_info, bytes_readonly), + BTRFS_ATTR_PTR(space_info, bytes_zone_unusable), BTRFS_ATTR_PTR(space_info, disk_used), BTRFS_ATTR_PTR(space_info, disk_total), BTRFS_ATTR_PTR(space_info, total_bytes_pinned), From patchwork Thu Aug 8 09:30:22 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 
7bit X-Patchwork-Submitter: Naohiro Aota X-Patchwork-Id: 11083765
From: Naohiro Aota To: linux-btrfs@vger.kernel.org, David Sterba Cc: Chris Mason , Josef Bacik , Nikolay Borisov , Damien Le Moal , Matias Bjorling , Johannes Thumshirn , Hannes Reinecke , linux-fsdevel@vger.kernel.org, Naohiro Aota Subject: [PATCH v3 11/27] btrfs: make unmirroed BGs readonly only if we have at least one writable BG Date: Thu, 8 Aug 2019 18:30:22 +0900 Message-Id: <20190808093038.4163421-12-naohiro.aota@wdc.com> X-Mailer: git-send-email 2.22.0 In-Reply-To:
<20190808093038.4163421-1-naohiro.aota@wdc.com> References: <20190808093038.4163421-1-naohiro.aota@wdc.com> MIME-Version: 1.0 If a btrfs volume has mirrored block groups, un-mirrored block groups are unconditionally made read only. When mirrored block groups exist but none of them is writable, this drops every writable block group. So, check that we have at least one writable mirrored block group before setting un-mirrored block groups read only. This change is necessary to handle cases such as xfstests btrfs/124. When we mount a degraded RAID1 filesystem and write to it, and then re-mount it with the full set of devices, the write pointers of the corresponding zones of the written block group differ. We mark such a block group as "wp_broken" and make it read only. In this situation, the only RAID1 block groups we have are read only because of "wp_broken", and the un-mirrored block groups are also marked read only because RAID1 block groups exist. As a result, all block groups are now read only, so we cannot even start a rebalance to fix the situation.
Signed-off-by: Naohiro Aota --- fs/btrfs/extent-tree.c | 25 +++++++++++++++++++++++++ 1 file changed, 25 insertions(+) diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index d2aacffe14d6..d0d887448bb5 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -8142,6 +8142,27 @@ static int check_chunk_block_group_mappings(struct btrfs_fs_info *fs_info) return ret; } +/* + * have_mirrored_block_group - check if we have at least one writable + * mirrored Block Group + */ +static bool have_mirrored_block_group(struct btrfs_space_info *space_info) +{ + struct btrfs_block_group_cache *cache; + int i; + + for (i = 0; i < BTRFS_NR_RAID_TYPES; i++) { + if (i == BTRFS_RAID_RAID0 || i == BTRFS_RAID_SINGLE) + continue; + list_for_each_entry(cache, &space_info->block_groups[i], + list) { + if (!cache->ro) + return true; + } + } + return false; +} + int btrfs_read_block_groups(struct btrfs_fs_info *info) { struct btrfs_path *path; @@ -8329,6 +8350,10 @@ int btrfs_read_block_groups(struct btrfs_fs_info *info) BTRFS_BLOCK_GROUP_RAID56_MASK | BTRFS_BLOCK_GROUP_DUP))) continue; + + if (!have_mirrored_block_group(space_info)) + continue; + /* * avoid allocating from un-mirrored block group if there are * mirrored block groups. 
From patchwork Thu Aug 8 09:30:23 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Naohiro Aota X-Patchwork-Id: 11083771
From: Naohiro Aota To: linux-btrfs@vger.kernel.org, David Sterba Cc: Chris Mason , Josef Bacik , Nikolay Borisov , Damien Le Moal , Matias Bjorling , Johannes Thumshirn , Hannes Reinecke , linux-fsdevel@vger.kernel.org, Naohiro Aota Subject: [PATCH v3 12/27] btrfs: ensure metadata space available on/after degraded mount in HMZONED Date: Thu, 8 Aug 2019 18:30:23
+0900 Message-Id: <20190808093038.4163421-13-naohiro.aota@wdc.com> X-Mailer: git-send-email 2.22.0 In-Reply-To: <20190808093038.4163421-1-naohiro.aota@wdc.com> References: <20190808093038.4163421-1-naohiro.aota@wdc.com> MIME-Version: 1.0 On or after a degraded mount, we might have no writable metadata block group because of broken write pointers. If we then, for example, balance the filesystem before writing any data, alloc_tree_block_no_bg_flush() (called from insert_balance_item()) fails to allocate a tree block because of a global reservation failure. This situation can be reproduced with xfstests btrfs/124. While the failure can be worked around by writing some data first, so that a new metadata block group gets allocated as a side effect, relying on that is bad practice. This commit avoids such failures by ensuring that a read-write mounted volume has non-zero metadata space: if the metadata space is empty, it forces allocation of a new metadata block group.
Signed-off-by: Naohiro Aota --- fs/btrfs/disk-io.c | 9 +++++++++ fs/btrfs/hmzoned.c | 45 +++++++++++++++++++++++++++++++++++++++++++++ fs/btrfs/hmzoned.h | 1 + 3 files changed, 55 insertions(+) diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 8854ff2e5fa5..65b3198c6e83 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -3287,6 +3287,15 @@ int open_ctree(struct super_block *sb, } } + ret = btrfs_hmzoned_check_metadata_space(fs_info); + if (ret) { + btrfs_warn(fs_info, "failed to allocate metadata space: %d", + ret); + btrfs_warn(fs_info, "try remount with readonly"); + close_ctree(fs_info); + return ret; + } + down_read(&fs_info->cleanup_work_sem); if ((ret = btrfs_orphan_cleanup(fs_info->fs_root)) || (ret = btrfs_orphan_cleanup(fs_info->tree_root))) { diff --git a/fs/btrfs/hmzoned.c b/fs/btrfs/hmzoned.c index 89631f5f01f2..38cc1bbfe118 100644 --- a/fs/btrfs/hmzoned.c +++ b/fs/btrfs/hmzoned.c @@ -13,6 +13,8 @@ #include "hmzoned.h" #include "rcu-string.h" #include "disk-io.h" +#include "space-info.h" +#include "transaction.h" /* Maximum number of zones to report per blkdev_report_zones() call */ #define BTRFS_REPORT_NR_ZONES 4096 @@ -551,3 +553,46 @@ int btrfs_load_block_group_zone_info(struct btrfs_block_group_cache *cache) return ret; } + +/* + * On/After degraded mount, we might have no writable metadata block + * group due to broken write pointers. If you e.g. balance the FS + * before writing any data, alloc_tree_block_no_bg_flush() (called + * from insert_balance_item())fails to allocate a tree block for + * it. To avoid such situations, ensure we have some metadata BG here. 
+ */ +int btrfs_hmzoned_check_metadata_space(struct btrfs_fs_info *fs_info) +{ + struct btrfs_root *root = fs_info->extent_root; + struct btrfs_trans_handle *trans; + struct btrfs_space_info *info; + u64 left; + int ret; + + if (!btrfs_fs_incompat(fs_info, HMZONED)) + return 0; + + info = btrfs_find_space_info(fs_info, BTRFS_BLOCK_GROUP_METADATA); + spin_lock(&info->lock); + left = info->total_bytes - btrfs_space_info_used(info, true); + spin_unlock(&info->lock); + + if (left) + return 0; + + trans = btrfs_start_transaction(root, 0); + if (IS_ERR(trans)) + return PTR_ERR(trans); + + mutex_lock(&fs_info->chunk_mutex); + ret = btrfs_alloc_chunk(trans, btrfs_metadata_alloc_profile(fs_info)); + if (ret) { + mutex_unlock(&fs_info->chunk_mutex); + btrfs_abort_transaction(trans, ret); + btrfs_end_transaction(trans); + return ret; + } + mutex_unlock(&fs_info->chunk_mutex); + + return btrfs_commit_transaction(trans); +} diff --git a/fs/btrfs/hmzoned.h b/fs/btrfs/hmzoned.h index 399d9e9543aa..e95139d4c072 100644 --- a/fs/btrfs/hmzoned.h +++ b/fs/btrfs/hmzoned.h @@ -32,6 +32,7 @@ int btrfs_check_mountopts_hmzoned(struct btrfs_fs_info *info); bool btrfs_check_allocatable_zones(struct btrfs_device *device, u64 pos, u64 num_bytes); int btrfs_load_block_group_zone_info(struct btrfs_block_group_cache *cache); +int btrfs_hmzoned_check_metadata_space(struct btrfs_fs_info *fs_info); static inline bool btrfs_dev_is_sequential(struct btrfs_device *device, u64 pos) { From patchwork Thu Aug 8 09:30:24 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Naohiro Aota X-Patchwork-Id: 11083775 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id D79301398 for ; Thu, 8 Aug 2019 09:31:40 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) 
Received: from
uls-op-cesaip01.wdc.com From: Naohiro Aota To: linux-btrfs@vger.kernel.org, David Sterba Cc: Chris Mason , Josef Bacik , Nikolay Borisov , Damien Le Moal , Matias Bjorling , Johannes Thumshirn , Hannes Reinecke , linux-fsdevel@vger.kernel.org, Naohiro Aota Subject: [PATCH v3 13/27] btrfs: reset zones of unused block groups Date: Thu, 8 Aug 2019 18:30:24 +0900 Message-Id: <20190808093038.4163421-14-naohiro.aota@wdc.com> X-Mailer: git-send-email 2.22.0 In-Reply-To: <20190808093038.4163421-1-naohiro.aota@wdc.com> References: <20190808093038.4163421-1-naohiro.aota@wdc.com> MIME-Version: 1.0 For an HMZONED volume, a block group maps to a zone of the device.
For deleted unused block groups, the zone of the block group can be reset to rewind the zone write pointer at the start of the zone. Signed-off-by: Naohiro Aota --- fs/btrfs/extent-tree.c | 27 +++++++++++++++++++-------- fs/btrfs/hmzoned.c | 18 ++++++++++++++++++ fs/btrfs/hmzoned.h | 18 ++++++++++++++++++ 3 files changed, 55 insertions(+), 8 deletions(-) diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index d0d887448bb5..8665aba61bb9 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -1936,6 +1936,9 @@ int btrfs_discard_extent(struct btrfs_fs_info *fs_info, u64 bytenr, for (i = 0; i < bbio->num_stripes; i++, stripe++) { + struct btrfs_device *dev = stripe->dev; + u64 physical = stripe->physical; + u64 length = stripe->length; u64 bytes; struct request_queue *req_q; @@ -1943,19 +1946,23 @@ int btrfs_discard_extent(struct btrfs_fs_info *fs_info, u64 bytenr, ASSERT(btrfs_test_opt(fs_info, DEGRADED)); continue; } + req_q = bdev_get_queue(stripe->dev->bdev); - if (!blk_queue_discard(req_q)) + + /* zone reset in HMZONED mode */ + if (btrfs_can_zone_reset(dev, physical, length)) + ret = btrfs_reset_device_zone(dev, physical, + length, &bytes); + else if (blk_queue_discard(req_q)) + ret = btrfs_issue_discard(dev->bdev, physical, + length, &bytes); + else continue; - ret = btrfs_issue_discard(stripe->dev->bdev, - stripe->physical, - stripe->length, - &bytes); if (!ret) discarded_bytes += bytes; else if (ret != -EOPNOTSUPP) break; /* Logic errors or -ENOMEM, or -EIO but I don't know how that could happen JDM */ - /* * Just in case we get back EOPNOTSUPP for some reason, * just ignore the return value so we don't screw up @@ -8985,8 +8992,12 @@ void btrfs_delete_unused_bgs(struct btrfs_fs_info *fs_info) spin_unlock(&block_group->lock); spin_unlock(&space_info->lock); - /* DISCARD can flip during remount */ - trimming = btrfs_test_opt(fs_info, DISCARD); + /* + * DISCARD can flip during remount. 
In HMZONED mode, + * we need to reset sequential required zones. + */ + trimming = btrfs_test_opt(fs_info, DISCARD) || + btrfs_fs_incompat(fs_info, HMZONED); /* Implicit trim during transaction commit. */ if (trimming) diff --git a/fs/btrfs/hmzoned.c b/fs/btrfs/hmzoned.c index 38cc1bbfe118..5968ef621fa7 100644 --- a/fs/btrfs/hmzoned.c +++ b/fs/btrfs/hmzoned.c @@ -596,3 +596,21 @@ int btrfs_hmzoned_check_metadata_space(struct btrfs_fs_info *fs_info) return btrfs_commit_transaction(trans); } + +int btrfs_reset_device_zone(struct btrfs_device *device, u64 physical, + u64 length, u64 *bytes) +{ + int ret; + + ret = blkdev_reset_zones(device->bdev, + physical >> SECTOR_SHIFT, + length >> SECTOR_SHIFT, + GFP_NOFS); + if (!ret) { + *bytes = length; + set_bit(physical >> device->zone_info->zone_size_shift, + device->zone_info->empty_zones); + } + + return ret; +} diff --git a/fs/btrfs/hmzoned.h b/fs/btrfs/hmzoned.h index e95139d4c072..40b4151fc935 100644 --- a/fs/btrfs/hmzoned.h +++ b/fs/btrfs/hmzoned.h @@ -32,6 +32,8 @@ int btrfs_check_mountopts_hmzoned(struct btrfs_fs_info *info); bool btrfs_check_allocatable_zones(struct btrfs_device *device, u64 pos, u64 num_bytes); int btrfs_load_block_group_zone_info(struct btrfs_block_group_cache *cache); +int btrfs_reset_device_zone(struct btrfs_device *device, u64 physical, + u64 length, u64 *bytes); int btrfs_hmzoned_check_metadata_space(struct btrfs_fs_info *fs_info); static inline bool btrfs_dev_is_sequential(struct btrfs_device *device, u64 pos) @@ -107,4 +109,20 @@ static inline u64 btrfs_zone_align(struct btrfs_device *device, u64 pos) return ALIGN(pos, device->zone_info->zone_size); } +static inline bool btrfs_can_zone_reset(struct btrfs_device *device, + u64 physical, u64 length) +{ + u64 zone_size; + + if (!btrfs_dev_is_sequential(device, physical)) + return false; + + zone_size = device->zone_info->zone_size; + if (!IS_ALIGNED(physical, zone_size) || + !IS_ALIGNED(length, zone_size)) + return false; + + return true; +} + 
#endif From patchwork Thu Aug 8 09:30:25 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Naohiro Aota X-Patchwork-Id: 11083779 From: Naohiro Aota To: linux-btrfs@vger.kernel.org, David Sterba Cc: Chris Mason , Josef Bacik , Nikolay Borisov , Damien Le Moal , Matias Bjorling , Johannes Thumshirn , Hannes Reinecke , linux-fsdevel@vger.kernel.org, Naohiro Aota Subject: [PATCH v3 14/27] btrfs: limit super block locations in HMZONED mode Date: Thu, 8 Aug 2019 18:30:25 +0900 Message-Id:
<20190808093038.4163421-15-naohiro.aota@wdc.com> X-Mailer: git-send-email 2.22.0 In-Reply-To: <20190808093038.4163421-1-naohiro.aota@wdc.com> References: <20190808093038.4163421-1-naohiro.aota@wdc.com> MIME-Version: 1.0 Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP When in HMZONED mode, make sure that device super blocks are located in randomly writable zones of zoned block devices. That is, do not write super blocks in sequential write required zones of host-managed zoned block devices as update would not be possible. Signed-off-by: Damien Le Moal Signed-off-by: Naohiro Aota --- fs/btrfs/disk-io.c | 4 ++++ fs/btrfs/extent-tree.c | 8 ++++++++ fs/btrfs/hmzoned.h | 12 ++++++++++++ fs/btrfs/scrub.c | 3 +++ 4 files changed, 27 insertions(+) diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 65b3198c6e83..a0a3709de2e6 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -3547,6 +3547,8 @@ static int write_dev_supers(struct btrfs_device *device, if (bytenr + BTRFS_SUPER_INFO_SIZE >= device->commit_total_bytes) break; + if (!btrfs_check_super_location(device, bytenr)) + continue; btrfs_set_super_bytenr(sb, bytenr); @@ -3613,6 +3615,8 @@ static int wait_dev_supers(struct btrfs_device *device, int max_mirrors) if (bytenr + BTRFS_SUPER_INFO_SIZE >= device->commit_total_bytes) break; + if (!btrfs_check_super_location(device, bytenr)) + continue; bh = __find_get_block(device->bdev, bytenr / BTRFS_BDEV_BLOCKSIZE, diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index 8665aba61bb9..de9d3028833e 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -238,6 +238,14 @@ static int exclude_super_stripes(struct btrfs_block_group_cache *cache) if (logical[nr] + stripe_len <= cache->key.objectid) continue; + /* shouldn't have super stripes in sequential zones */ + if (cache->alloc_type == BTRFS_ALLOC_SEQ) { + btrfs_err(fs_info, + "sequentil 
allocation bg %llu should not have super blocks", + cache->key.objectid); + return -EUCLEAN; + } + start = logical[nr]; if (start < cache->key.objectid) { start = cache->key.objectid; diff --git a/fs/btrfs/hmzoned.h b/fs/btrfs/hmzoned.h index 40b4151fc935..9de26d6b8c4e 100644 --- a/fs/btrfs/hmzoned.h +++ b/fs/btrfs/hmzoned.h @@ -10,6 +10,7 @@ #define BTRFS_HMZONED_H #include +#include "volumes.h" struct btrfs_zoned_device_info { /* @@ -125,4 +126,15 @@ static inline bool btrfs_can_zone_reset(struct btrfs_device *device, return true; } +static inline bool btrfs_check_super_location(struct btrfs_device *device, + u64 pos) +{ + /* + * On a non-zoned device, any address is OK. On a zoned + * device, non-SEQUENTIAL WRITE REQUIRED zones are capable. + */ + return device->zone_info == NULL || + !btrfs_dev_is_sequential(device, pos); +} + #endif diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c index 0c99cf9fb595..e15d846c700a 100644 --- a/fs/btrfs/scrub.c +++ b/fs/btrfs/scrub.c @@ -18,6 +18,7 @@ #include "check-integrity.h" #include "rcu-string.h" #include "raid56.h" +#include "hmzoned.h" /* * This is only the first step towards a full-features scrub. 
 * It reads all
@@ -3732,6 +3733,8 @@ static noinline_for_stack int scrub_supers(struct scrub_ctx *sctx,
 		if (bytenr + BTRFS_SUPER_INFO_SIZE > scrub_dev->commit_total_bytes)
 			break;
+		if (!btrfs_check_super_location(scrub_dev, bytenr))
+			continue;
 
 		ret = scrub_pages(sctx, bytenr, BTRFS_SUPER_INFO_SIZE, bytenr,
 				  scrub_dev, BTRFS_EXTENT_FLAG_SUPER, gen, i,

From patchwork Thu Aug 8 09:30:26 2019
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 11083785
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, David Sterba
Cc: Chris Mason, Josef Bacik, Nikolay Borisov, Damien Le Moal, Matias Bjorling, Johannes Thumshirn, Hannes Reinecke, linux-fsdevel@vger.kernel.org, Naohiro Aota
Subject: [PATCH v3 15/27] btrfs: redirty released extent buffers in sequential BGs
Date: Thu, 8 Aug 2019 18:30:26 +0900
Message-Id: <20190808093038.4163421-16-naohiro.aota@wdc.com>
In-Reply-To: <20190808093038.4163421-1-naohiro.aota@wdc.com>
References: <20190808093038.4163421-1-naohiro.aota@wdc.com>

Tree manipulating operations like merging nodes often release
once-allocated tree nodes. Btrfs cleans such nodes so that pages in the
nodes are not uselessly written out. On HMZONED drives, however, this
optimization blocks the following IOs: canceling the write-out of the
freed blocks breaks the sequential write sequence expected by the device.

This patch introduces a list of clean and unwritten extent buffers that
have been released in a transaction. Btrfs redirties the buffers so that
btree_write_cache_pages() can send proper bios to the disk. It also clears
the entire content of the extent buffers so as not to confuse raw block
scanners such as btrfsck. Since the cleared content would make
csum_dirty_buffer() complain about a bytenr mismatch, skip the check and
the checksum for such buffers using the newly introduced buffer flag
EXTENT_BUFFER_NO_CHECK.
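[Editor's sketch] The life cycle described above, a released-but-unwritten buffer is zeroed, redirtied with a "no checksum check" flag, pinned on a per-transaction list, and unpinned when the list is freed, can be modeled in userspace. The toy ebuf structure and its fields are stand-ins for the real extent_buffer, EXTENT_BUFFER_NO_CHECK, and the releasing_ebs list, not kernel code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy extent buffer: a payload plus the state the patch cares about. */
struct ebuf {
	unsigned char data[32];
	bool dirty;
	bool no_check;     /* models EXTENT_BUFFER_NO_CHECK */
	int refs;
	struct ebuf *next; /* models the release_list linkage */
};

static struct ebuf *releasing; /* models trans->releasing_ebs */

/*
 * Mirrors btrfs_redirty_list_add(): zero the content, mark the buffer
 * dirty but exempt from checksum verification, and take a reference so
 * it stays alive until the transaction is done with it.
 */
static void redirty_list_add(struct ebuf *eb)
{
	memset(eb->data, 0, sizeof(eb->data));
	eb->dirty = true;
	eb->no_check = true;
	eb->refs++;
	eb->next = releasing;
	releasing = eb;
}

/* Mirrors btrfs_free_redirty_list(): drop the references taken above. */
static void free_redirty_list(void)
{
	while (releasing) {
		struct ebuf *eb = releasing;

		releasing = eb->next;
		eb->next = NULL;
		eb->refs--;
	}
}
```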
Signed-off-by: Naohiro Aota --- fs/btrfs/disk-io.c | 5 +++++ fs/btrfs/extent-tree.c | 11 ++++++++++- fs/btrfs/extent_io.c | 2 ++ fs/btrfs/extent_io.h | 2 ++ fs/btrfs/hmzoned.c | 34 ++++++++++++++++++++++++++++++++++ fs/btrfs/hmzoned.h | 3 +++ fs/btrfs/transaction.c | 10 ++++++++++ fs/btrfs/transaction.h | 3 +++ 8 files changed, 69 insertions(+), 1 deletion(-) diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index a0a3709de2e6..e0a80997b6ee 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -513,6 +513,9 @@ static int csum_dirty_buffer(struct btrfs_fs_info *fs_info, struct page *page) if (page != eb->pages[0]) return 0; + if (test_bit(EXTENT_BUFFER_NO_CHECK, &eb->bflags)) + return 0; + found_start = btrfs_header_bytenr(eb); /* * Please do not consolidate these warnings into a single if. @@ -4577,6 +4580,8 @@ void btrfs_cleanup_one_transaction(struct btrfs_transaction *cur_trans, btrfs_destroy_pinned_extent(fs_info, fs_info->pinned_extents); + btrfs_free_redirty_list(cur_trans); + cur_trans->state =TRANS_STATE_COMPLETED; wake_up(&cur_trans->commit_wait); } diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index de9d3028833e..bc95a73a762d 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -5084,8 +5084,10 @@ void btrfs_free_tree_block(struct btrfs_trans_handle *trans, if (root->root_key.objectid != BTRFS_TREE_LOG_OBJECTID) { ret = check_ref_cleanup(trans, buf->start); - if (!ret) + if (!ret) { + btrfs_redirty_list_add(trans->transaction, buf); goto out; + } } pin = 0; @@ -5097,6 +5099,13 @@ void btrfs_free_tree_block(struct btrfs_trans_handle *trans, goto out; } + if (btrfs_fs_incompat(fs_info, HMZONED)) { + btrfs_redirty_list_add(trans->transaction, buf); + pin_down_extent(cache, buf->start, buf->len, 1); + btrfs_put_block_group(cache); + goto out; + } + WARN_ON(test_bit(EXTENT_BUFFER_DIRTY, &buf->bflags)); btrfs_add_free_space(cache, buf->start, buf->len); diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index 
aea990473392..4e67b16c9f80 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -23,6 +23,7 @@ #include "rcu-string.h" #include "backref.h" #include "disk-io.h" +#include "hmzoned.h" static struct kmem_cache *extent_state_cache; static struct kmem_cache *extent_buffer_cache; @@ -4863,6 +4864,7 @@ __alloc_extent_buffer(struct btrfs_fs_info *fs_info, u64 start, init_waitqueue_head(&eb->read_lock_wq); btrfs_leak_debug_add(&eb->leak_list, &buffers); + INIT_LIST_HEAD(&eb->release_list); spin_lock_init(&eb->refs_lock); atomic_set(&eb->refs, 1); diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h index 401423b16976..c63b58438f90 100644 --- a/fs/btrfs/extent_io.h +++ b/fs/btrfs/extent_io.h @@ -58,6 +58,7 @@ enum { EXTENT_BUFFER_IN_TREE, /* write IO error */ EXTENT_BUFFER_WRITE_ERR, + EXTENT_BUFFER_NO_CHECK, }; /* these are flags for __process_pages_contig */ @@ -186,6 +187,7 @@ struct extent_buffer { */ wait_queue_head_t read_lock_wq; struct page *pages[INLINE_EXTENT_BUFFER_PAGES]; + struct list_head release_list; #ifdef CONFIG_BTRFS_DEBUG int spinning_writers; atomic_t spinning_readers; diff --git a/fs/btrfs/hmzoned.c b/fs/btrfs/hmzoned.c index 5968ef621fa7..4c296d282e67 100644 --- a/fs/btrfs/hmzoned.c +++ b/fs/btrfs/hmzoned.c @@ -614,3 +614,37 @@ int btrfs_reset_device_zone(struct btrfs_device *device, u64 physical, return ret; } + +void btrfs_redirty_list_add(struct btrfs_transaction *trans, + struct extent_buffer *eb) +{ + struct btrfs_fs_info *fs_info = eb->fs_info; + + if (!btrfs_fs_incompat(fs_info, HMZONED) || + btrfs_header_flag(eb, BTRFS_HEADER_FLAG_WRITTEN) || + !list_empty(&eb->release_list)) + return; + + set_extent_buffer_dirty(eb); + memzero_extent_buffer(eb, 0, eb->len); + set_bit(EXTENT_BUFFER_NO_CHECK, &eb->bflags); + + spin_lock(&trans->releasing_ebs_lock); + list_add_tail(&eb->release_list, &trans->releasing_ebs); + spin_unlock(&trans->releasing_ebs_lock); + atomic_inc(&eb->refs); +} + +void btrfs_free_redirty_list(struct 
btrfs_transaction *trans)
+{
+	spin_lock(&trans->releasing_ebs_lock);
+	while (!list_empty(&trans->releasing_ebs)) {
+		struct extent_buffer *eb;
+
+		eb = list_first_entry(&trans->releasing_ebs,
+				      struct extent_buffer, release_list);
+		list_del_init(&eb->release_list);
+		free_extent_buffer(eb);
+	}
+	spin_unlock(&trans->releasing_ebs_lock);
+}
diff --git a/fs/btrfs/hmzoned.h b/fs/btrfs/hmzoned.h
index 9de26d6b8c4e..3a73c3c5e1da 100644
--- a/fs/btrfs/hmzoned.h
+++ b/fs/btrfs/hmzoned.h
@@ -36,6 +36,9 @@ int btrfs_load_block_group_zone_info(struct btrfs_block_group_cache *cache);
 int btrfs_reset_device_zone(struct btrfs_device *device, u64 physical,
 			    u64 length, u64 *bytes);
 int btrfs_hmzoned_check_metadata_space(struct btrfs_fs_info *fs_info);
+void btrfs_redirty_list_add(struct btrfs_transaction *trans,
+			    struct extent_buffer *eb);
+void btrfs_free_redirty_list(struct btrfs_transaction *trans);
 
 static inline bool btrfs_dev_is_sequential(struct btrfs_device *device, u64 pos)
 {
diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index e3adb714c04b..45bd7c25bebf 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -19,6 +19,7 @@
 #include "volumes.h"
 #include "dev-replace.h"
 #include "qgroup.h"
+#include "hmzoned.h"
 
 #define BTRFS_ROOT_TRANS_TAG 0
 
@@ -257,6 +258,8 @@ static noinline int join_transaction(struct btrfs_fs_info *fs_info,
 	spin_lock_init(&cur_trans->dirty_bgs_lock);
 	INIT_LIST_HEAD(&cur_trans->deleted_bgs);
 	spin_lock_init(&cur_trans->dropped_roots_lock);
+	INIT_LIST_HEAD(&cur_trans->releasing_ebs);
+	spin_lock_init(&cur_trans->releasing_ebs_lock);
 	list_add_tail(&cur_trans->list, &fs_info->trans_list);
 	extent_io_tree_init(fs_info, &cur_trans->dirty_pages,
 			    IO_TREE_TRANS_DIRTY_PAGES, fs_info->btree_inode);
@@ -2269,6 +2272,13 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans)
 		goto scrub_continue;
 	}
 
+	/*
+	 * At this point, we should have written all the tree blocks
+	 * allocated in this transaction.
So it's now safe to free the
+	 * redirtied extent buffers.
+	 */
+	btrfs_free_redirty_list(cur_trans);
+
 	ret = write_all_supers(fs_info, 0);
 	/*
 	 * the super is written, we can safely allow the tree-loggers
diff --git a/fs/btrfs/transaction.h b/fs/btrfs/transaction.h
index 2c5a6f6e5bb0..09329d2901b7 100644
--- a/fs/btrfs/transaction.h
+++ b/fs/btrfs/transaction.h
@@ -85,6 +85,9 @@ struct btrfs_transaction {
 	spinlock_t dropped_roots_lock;
 	struct btrfs_delayed_ref_root delayed_refs;
 	struct btrfs_fs_info *fs_info;
+
+	spinlock_t releasing_ebs_lock;
+	struct list_head releasing_ebs;
 };
 
 #define __TRANS_FREEZABLE	(1U << 0)

From patchwork Thu Aug 8 09:30:27 2019
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 11083789
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, David Sterba
Cc: Chris Mason, Josef Bacik, Nikolay Borisov, Damien Le Moal, Matias Bjorling, Johannes Thumshirn, Hannes Reinecke, linux-fsdevel@vger.kernel.org, Naohiro Aota
Subject: [PATCH v3 16/27] btrfs: serialize data allocation and submit IOs
Date: Thu, 8 Aug 2019 18:30:27 +0900
Message-Id: <20190808093038.4163421-17-naohiro.aota@wdc.com>
In-Reply-To: <20190808093038.4163421-1-naohiro.aota@wdc.com>
References: <20190808093038.4163421-1-naohiro.aota@wdc.com>

To preserve the sequential write pattern on the drives, we must serialize
allocation and bio submission. This commit adds a per-block-group mutex
"zone_io_lock", which find_free_extent_seq() takes. The lock is kept even
after returning from find_free_extent() and is released only once the IOs
corresponding to the allocation have been submitted.

Implementing such behavior under __extent_writepage_io() is almost
impossible because, once pages are unlocked, we cannot tell when the IO
submission for an allocated region has finished. Instead, this commit adds
run_delalloc_hmzoned() to write out non-compressed data IOs at once using
extent_write_locked_range(). After the write, we can call
btrfs_hmzoned_data_io_unlock_logical() to unlock the block group for new
allocations.
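[Editor's sketch] The protocol described above, take the per-block-group lock at allocation time and release it only after the bios for that allocation have been submitted, can be sketched with a plain pthread mutex. The seq_bg structure, the byte counters, and the 4 KiB/8 KiB sizes are illustrative assumptions, not the kernel's data structures:

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/*
 * Toy sequential block group: allocations advance a write pointer, and
 * zone_io_lock stays held from allocation until the corresponding IO
 * has been submitted, so writes hit the zone strictly in order.
 */
struct seq_bg {
	pthread_mutex_t zone_io_lock;
	uint64_t alloc_offset; /* next sequential write position */
	uint64_t submitted;    /* bytes whose bios have been issued */
};

/* Allocation takes the lock and keeps it, as find_free_extent_seq() does. */
static uint64_t alloc_seq(struct seq_bg *bg, uint64_t len)
{
	uint64_t start;

	pthread_mutex_lock(&bg->zone_io_lock);
	start = bg->alloc_offset;
	bg->alloc_offset += len;
	return start;
}

/*
 * Submitting the IO for the allocation releases the lock, so the next
 * allocation cannot start until this write has been issued.
 */
static void submit_seq(struct seq_bg *bg, uint64_t start, uint64_t len)
{
	assert(start == bg->submitted); /* writes stay sequential */
	bg->submitted += len;
	pthread_mutex_unlock(&bg->zone_io_lock);
}
```

Holding a mutex across the allocate/submit window is exactly why the patch writes out the delalloc range synchronously with extent_write_locked_range(): the unlock point must be reachable from the same context that allocated.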
Signed-off-by: Naohiro Aota --- fs/btrfs/ctree.h | 1 + fs/btrfs/extent-tree.c | 5 +++++ fs/btrfs/hmzoned.h | 34 +++++++++++++++++++++++++++++++ fs/btrfs/inode.c | 45 ++++++++++++++++++++++++++++++++++++++++-- 4 files changed, 83 insertions(+), 2 deletions(-) diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index 3d31a1960c4d..1e924c0d1210 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -619,6 +619,7 @@ struct btrfs_block_group_cache { * zone. */ u64 alloc_offset; + struct mutex zone_io_lock; }; /* delayed seq elem */ diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index bc95a73a762d..5b1a9e607555 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -5532,6 +5532,7 @@ static int find_free_extent_seq(struct btrfs_block_group_cache *cache, if (cache->alloc_type != BTRFS_ALLOC_SEQ) return 1; + btrfs_hmzoned_data_io_lock(cache); spin_lock(&space_info->lock); spin_lock(&cache->lock); @@ -5563,6 +5564,9 @@ static int find_free_extent_seq(struct btrfs_block_group_cache *cache, out: spin_unlock(&cache->lock); spin_unlock(&space_info->lock); + /* if succeeds, unlock after submit_bio */ + if (ret) + btrfs_hmzoned_data_io_unlock(cache); return ret; } @@ -8104,6 +8108,7 @@ btrfs_create_block_group_cache(struct btrfs_fs_info *fs_info, btrfs_init_free_space_ctl(cache); atomic_set(&cache->trimming, 0); mutex_init(&cache->free_space_lock); + mutex_init(&cache->zone_io_lock); btrfs_init_full_stripe_locks_tree(&cache->full_stripe_locks_root); cache->alloc_type = BTRFS_ALLOC_FIT; diff --git a/fs/btrfs/hmzoned.h b/fs/btrfs/hmzoned.h index 3a73c3c5e1da..a8e7286708d4 100644 --- a/fs/btrfs/hmzoned.h +++ b/fs/btrfs/hmzoned.h @@ -39,6 +39,7 @@ int btrfs_hmzoned_check_metadata_space(struct btrfs_fs_info *fs_info); void btrfs_redirty_list_add(struct btrfs_transaction *trans, struct extent_buffer *eb); void btrfs_free_redirty_list(struct btrfs_transaction *trans); +void btrfs_hmzoned_data_io_unlock_at(struct inode *inode, u64 start, u64 len); static inline 
bool btrfs_dev_is_sequential(struct btrfs_device *device, u64 pos) { @@ -140,4 +141,37 @@ static inline bool btrfs_check_super_location(struct btrfs_device *device, !btrfs_dev_is_sequential(device, pos); } + +static inline void btrfs_hmzoned_data_io_lock( + struct btrfs_block_group_cache *cache) +{ + /* No need to lock metadata BGs or non-sequential BGs */ + if (!(cache->flags & BTRFS_BLOCK_GROUP_DATA) || + cache->alloc_type != BTRFS_ALLOC_SEQ) + return; + mutex_lock(&cache->zone_io_lock); +} + +static inline void btrfs_hmzoned_data_io_unlock( + struct btrfs_block_group_cache *cache) +{ + if (!(cache->flags & BTRFS_BLOCK_GROUP_DATA) || + cache->alloc_type != BTRFS_ALLOC_SEQ) + return; + mutex_unlock(&cache->zone_io_lock); +} + +static inline void btrfs_hmzoned_data_io_unlock_logical( + struct btrfs_fs_info *fs_info, u64 logical) +{ + struct btrfs_block_group_cache *cache; + + if (!btrfs_fs_incompat(fs_info, HMZONED)) + return; + + cache = btrfs_lookup_block_group(fs_info, logical); + btrfs_hmzoned_data_io_unlock(cache); + btrfs_put_block_group(cache); +} + #endif diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index ee582a36653d..d504200c9767 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -48,6 +48,7 @@ #include "qgroup.h" #include "dedupe.h" #include "delalloc-space.h" +#include "hmzoned.h" struct btrfs_iget_args { struct btrfs_key *location; @@ -1279,6 +1280,39 @@ static int cow_file_range_async(struct inode *inode, struct page *locked_page, return 0; } +static noinline int run_delalloc_hmzoned(struct inode *inode, + struct page *locked_page, u64 start, + u64 end, int *page_started, + unsigned long *nr_written) +{ + struct extent_map *em; + u64 logical; + int ret; + + ret = cow_file_range(inode, locked_page, start, end, + end, page_started, nr_written, 0, NULL); + if (ret) + return ret; + + if (*page_started) + return 0; + + em = btrfs_get_extent(BTRFS_I(inode), NULL, 0, start, end - start + 1, + 0); + ASSERT(em != NULL && em->block_start < 
EXTENT_MAP_LAST_BYTE);
+	logical = em->block_start;
+	free_extent_map(em);
+
+	__set_page_dirty_nobuffers(locked_page);
+	account_page_redirty(locked_page);
+	extent_write_locked_range(inode, start, end, WB_SYNC_ALL);
+	*page_started = 1;
+
+	btrfs_hmzoned_data_io_unlock_logical(btrfs_sb(inode->i_sb), logical);
+
+	return 0;
+}
+
 static noinline int csum_exist_in_range(struct btrfs_fs_info *fs_info,
 					u64 bytenr, u64 num_bytes)
 {
@@ -1645,17 +1679,24 @@ int btrfs_run_delalloc_range(struct inode *inode, struct page *locked_page,
 	int ret;
 	int force_cow = need_force_cow(inode, start, end);
 	unsigned int write_flags = wbc_to_write_flags(wbc);
+	int do_compress = inode_can_compress(inode) &&
+		inode_need_compress(inode, start, end);
+	int hmzoned = btrfs_fs_incompat(btrfs_sb(inode->i_sb), HMZONED);
 
 	if (BTRFS_I(inode)->flags & BTRFS_INODE_NODATACOW && !force_cow) {
+		ASSERT(!hmzoned);
 		ret = run_delalloc_nocow(inode, locked_page, start, end,
 					 page_started, 1, nr_written);
 	} else if (BTRFS_I(inode)->flags & BTRFS_INODE_PREALLOC && !force_cow) {
+		ASSERT(!hmzoned);
 		ret = run_delalloc_nocow(inode, locked_page, start, end,
 					 page_started, 0, nr_written);
-	} else if (!inode_can_compress(inode) ||
-		   !inode_need_compress(inode, start, end)) {
+	} else if (!do_compress && !hmzoned) {
 		ret = cow_file_range(inode, locked_page, start, end, end,
 				     page_started, nr_written, 1, NULL);
+	} else if (!do_compress && hmzoned) {
+		ret = run_delalloc_hmzoned(inode, locked_page, start, end,
+					   page_started, nr_written);
 	} else {
 		set_bit(BTRFS_INODE_HAS_ASYNC_EXTENT,
 			&BTRFS_I(inode)->runtime_flags);

From patchwork Thu Aug 8 09:30:28 2019
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 11083791
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, David Sterba
Cc: Chris Mason, Josef Bacik, Nikolay Borisov, Damien Le Moal, Matias Bjorling, Johannes Thumshirn, Hannes Reinecke, linux-fsdevel@vger.kernel.org, Naohiro Aota
Subject: [PATCH v3 17/27] btrfs: implement atomic compressed IO submission
Date: Thu, 8 Aug 2019 18:30:28 +0900
Message-Id: <20190808093038.4163421-18-naohiro.aota@wdc.com>
In-Reply-To: <20190808093038.4163421-1-naohiro.aota@wdc.com>
References: <20190808093038.4163421-1-naohiro.aota@wdc.com>
X-Virus-Scanned: ClamAV using
ClamSMTP As same as with non-compressed IO submission, we must unlock a block group for the next allocation. Signed-off-by: Naohiro Aota --- fs/btrfs/inode.c | 19 +++++++++++++++++-- 1 file changed, 17 insertions(+), 2 deletions(-) diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index d504200c9767..283ac11849b1 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -776,13 +776,26 @@ static noinline void submit_compressed_extents(struct async_chunk *async_chunk) * and IO for us. Otherwise, we need to submit * all those pages down to the drive. */ - if (!page_started && !ret) + if (!page_started && !ret) { + struct extent_map *em; + u64 logical; + + em = btrfs_get_extent(BTRFS_I(inode), NULL, 0, + async_extent->start, + async_extent->ram_size, + 0); + logical = em->block_start; + free_extent_map(em); + extent_write_locked_range(inode, async_extent->start, async_extent->start + async_extent->ram_size - 1, WB_SYNC_ALL); - else if (ret) + + btrfs_hmzoned_data_io_unlock_logical(fs_info, + logical); + } else if (ret) unlock_page(async_chunk->locked_page); kfree(async_extent); cond_resched(); @@ -883,6 +896,7 @@ static noinline void submit_compressed_extents(struct async_chunk *async_chunk) free_async_extent_pages(async_extent); } alloc_hint = ins.objectid + ins.offset; + btrfs_hmzoned_data_io_unlock_logical(fs_info, ins.objectid); kfree(async_extent); cond_resched(); } @@ -890,6 +904,7 @@ static noinline void submit_compressed_extents(struct async_chunk *async_chunk) out_free_reserve: btrfs_dec_block_group_reservations(fs_info, ins.objectid); btrfs_free_reserved_extent(fs_info, ins.objectid, ins.offset, 1); + btrfs_hmzoned_data_io_unlock_logical(fs_info, ins.objectid); out_free: extent_clear_unlock_delalloc(inode, async_extent->start, async_extent->start + From patchwork Thu Aug 8 09:30:29 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Naohiro Aota X-Patchwork-Id: 11083795 Return-Path: Received: 
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, David Sterba
Cc: Chris Mason, Josef Bacik, Nikolay Borisov, Damien Le Moal,
 Matias Bjorling, Johannes Thumshirn, Hannes Reinecke,
 linux-fsdevel@vger.kernel.org, Naohiro Aota
Subject: [PATCH v3 18/27] btrfs: support direct write IO in HMZONED
Date: Thu, 8 Aug 2019 18:30:29 +0900
Message-Id: <20190808093038.4163421-19-naohiro.aota@wdc.com>
In-Reply-To: <20190808093038.4163421-1-naohiro.aota@wdc.com>
References: <20190808093038.4163421-1-naohiro.aota@wdc.com>

As with other IO submission, we must unlock a block group for the next
allocation.

Signed-off-by: Naohiro Aota
---
 fs/btrfs/inode.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 283ac11849b1..d7be97c6a069 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -8519,6 +8519,7 @@ static void btrfs_submit_direct(struct bio *dio_bio, struct inode *inode,
 	struct btrfs_io_bio *io_bio;
 	bool write = (bio_op(dio_bio) == REQ_OP_WRITE);
 	int ret = 0;
+	u64 disk_bytenr;
 
 	bio = btrfs_bio_clone(dio_bio);
 
@@ -8562,7 +8563,11 @@ static void btrfs_submit_direct(struct bio *dio_bio, struct inode *inode,
 			dio_data->unsubmitted_oe_range_end;
 	}
 
+	disk_bytenr = dip->disk_bytenr;
 	ret = btrfs_submit_direct_hook(dip);
+	if (write)
+		btrfs_hmzoned_data_io_unlock_logical(
+			btrfs_sb(inode->i_sb), disk_bytenr);
 	if (!ret)
 		return;

From patchwork Thu Aug 8 09:30:30 2019
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 11083799
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, David Sterba
Cc: Chris Mason, Josef Bacik, Nikolay Borisov, Damien Le Moal,
 Matias Bjorling, Johannes Thumshirn, Hannes Reinecke,
 linux-fsdevel@vger.kernel.org, Naohiro Aota
Subject: [PATCH v3 19/27] btrfs: serialize meta IOs on HMZONED mode
Date: Thu, 8 Aug 2019 18:30:30 +0900
Message-Id: <20190808093038.4163421-20-naohiro.aota@wdc.com>
In-Reply-To: <20190808093038.4163421-1-naohiro.aota@wdc.com>
References: <20190808093038.4163421-1-naohiro.aota@wdc.com>

As in the data IO path, we must serialize write IOs for metadata. We cannot
add a mutex around allocation and submission because metadata blocks are
allocated in an earlier stage to build up B-trees. Thus, this commit adds
hmzoned_meta_io_lock and holds it during metadata IO submission in
btree_write_cache_pages() to serialize IOs. Furthermore, it adds a
per-block-group metadata IO submission pointer, "meta_write_pointer", to
ensure sequential writing.

Signed-off-by: Naohiro Aota
---
 fs/btrfs/ctree.h     |  3 +++
 fs/btrfs/disk-io.c   |  1 +
 fs/btrfs/extent_io.c | 17 ++++++++++++++++-
 fs/btrfs/hmzoned.c   | 45 ++++++++++++++++++++++++++++++++++++++++++++
 fs/btrfs/hmzoned.h   | 17 +++++++++++++++++
 5 files changed, 82 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 1e924c0d1210..a6a03fc5e4c5 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -620,6 +620,7 @@ struct btrfs_block_group_cache {
 	 */
 	u64 alloc_offset;
 	struct mutex zone_io_lock;
+	u64 meta_write_pointer;
 };
 
 /* delayed seq elem */
@@ -1108,6 +1109,8 @@ struct btrfs_fs_info {
 	spinlock_t ref_verify_lock;
 	struct rb_root block_tree;
 #endif
+
+	struct mutex hmzoned_meta_io_lock;
 };
 
 static inline struct btrfs_fs_info *btrfs_sb(struct super_block *sb)
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index e0a80997b6ee..63dd4670aba6 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2703,6 +2703,7 @@ int open_ctree(struct super_block *sb,
 	mutex_init(&fs_info->delete_unused_bgs_mutex);
 	mutex_init(&fs_info->reloc_mutex);
 	mutex_init(&fs_info->delalloc_root_mutex);
+	mutex_init(&fs_info->hmzoned_meta_io_lock);
 	seqlock_init(&fs_info->profiles_lock);
 
 	INIT_LIST_HEAD(&fs_info->dirty_cowonly_roots);
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 4e67b16c9f80..ff963b2214aa 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -3892,7 +3892,9 @@ int btree_write_cache_pages(struct address_space *mapping,
 				   struct writeback_control *wbc)
 {
 	struct extent_io_tree *tree = &BTRFS_I(mapping->host)->io_tree;
+	struct btrfs_fs_info *fs_info = tree->fs_info;
 	struct extent_buffer *eb, *prev_eb = NULL;
+	struct btrfs_block_group_cache *cache = NULL;
 	struct extent_page_data epd = {
 		.bio = NULL,
 		.tree = tree,
@@ -3922,6 +3924,7 @@ int btree_write_cache_pages(struct address_space *mapping,
 		tag = PAGECACHE_TAG_TOWRITE;
 	else
 		tag = PAGECACHE_TAG_DIRTY;
+	btrfs_hmzoned_meta_io_lock(fs_info);
 retry:
 	if (wbc->sync_mode == WB_SYNC_ALL)
 		tag_pages_for_writeback(mapping, index, end);
@@ -3965,6 +3968,14 @@ int btree_write_cache_pages(struct address_space *mapping,
 			if (!ret)
 				continue;
 
+			if (!btrfs_check_meta_write_pointer(fs_info, eb,
+							    &cache)) {
+				ret = 0;
+				done = 1;
+				free_extent_buffer(eb);
+				break;
+			}
+
 			prev_eb = eb;
 			ret = lock_extent_buffer_for_io(eb, &epd);
 			if (!ret) {
@@ -3999,12 +4010,16 @@ int btree_write_cache_pages(struct address_space *mapping,
 		index = 0;
 		goto retry;
 	}
+	if (cache)
+		btrfs_put_block_group(cache);
 	ASSERT(ret <= 0);
 	if (ret < 0) {
 		end_write_bio(&epd, ret);
-		return ret;
+		goto out;
 	}
 	ret = flush_write_bio(&epd);
+out:
+	btrfs_hmzoned_meta_io_unlock(fs_info);
 	return ret;
 }
diff --git a/fs/btrfs/hmzoned.c b/fs/btrfs/hmzoned.c
index 4c296d282e67..4b13c6c47849 100644
--- a/fs/btrfs/hmzoned.c
+++ b/fs/btrfs/hmzoned.c
@@ -548,6 +548,9 @@ int btrfs_load_block_group_zone_info(struct btrfs_block_group_cache *cache)
 out:
 	cache->alloc_type = alloc_type;
+	if (!ret)
+		cache->meta_write_pointer =
+			cache->alloc_offset + cache->key.objectid;
 	kfree(alloc_offsets);
 	free_extent_map(em);
@@ -648,3 +651,45 @@ void btrfs_free_redirty_list(struct btrfs_transaction *trans)
 	}
 	spin_unlock(&trans->releasing_ebs_lock);
 }
+
+bool btrfs_check_meta_write_pointer(struct btrfs_fs_info *fs_info,
+				    struct extent_buffer *eb,
+				    struct btrfs_block_group_cache **cache_ret)
+{
+	struct btrfs_block_group_cache *cache;
+
+	if (!btrfs_fs_incompat(fs_info, HMZONED))
+		return true;
+
+	cache = *cache_ret;
+
+	if (cache &&
+	    (eb->start < cache->key.objectid ||
+	     cache->key.objectid + cache->key.offset <= eb->start)) {
+		btrfs_put_block_group(cache);
+		cache = NULL;
+		*cache_ret = NULL;
+	}
+
+	if (!cache)
+		cache = btrfs_lookup_block_group(fs_info, eb->start);
+
+	if (cache) {
+		*cache_ret = cache;
+
+		if (cache->alloc_type != BTRFS_ALLOC_SEQ)
+			return true;
+
+		if (cache->meta_write_pointer != eb->start) {
+			btrfs_put_block_group(cache);
+			cache = NULL;
+			*cache_ret = NULL;
+			return false;
+		}
+
+		cache->meta_write_pointer = eb->start + eb->len;
+	}
+
+	return true;
+}
diff --git a/fs/btrfs/hmzoned.h b/fs/btrfs/hmzoned.h
index a8e7286708d4..c68c4b8056a4 100644
--- a/fs/btrfs/hmzoned.h
+++ b/fs/btrfs/hmzoned.h
@@ -40,6 +40,9 @@ void btrfs_redirty_list_add(struct btrfs_transaction *trans,
 			    struct extent_buffer *eb);
 void btrfs_free_redirty_list(struct btrfs_transaction *trans);
 void btrfs_hmzoned_data_io_unlock_at(struct inode *inode, u64 start, u64 len);
+bool btrfs_check_meta_write_pointer(struct btrfs_fs_info *fs_info,
+				    struct extent_buffer *eb,
+				    struct btrfs_block_group_cache **cache_ret);
 
 static inline bool btrfs_dev_is_sequential(struct btrfs_device *device, u64 pos)
 {
@@ -174,4 +177,18 @@ static inline void btrfs_hmzoned_data_io_unlock_logical(
 	btrfs_put_block_group(cache);
 }
 
+static inline void btrfs_hmzoned_meta_io_lock(struct btrfs_fs_info *fs_info)
+{
+	if (!btrfs_fs_incompat(fs_info, HMZONED))
+		return;
+	mutex_lock(&fs_info->hmzoned_meta_io_lock);
+}
+
+static inline void btrfs_hmzoned_meta_io_unlock(struct btrfs_fs_info *fs_info)
+{
+	if (!btrfs_fs_incompat(fs_info, HMZONED))
+		return;
+	mutex_unlock(&fs_info->hmzoned_meta_io_lock);
+}
+
 #endif

From patchwork Thu Aug 8 09:30:31 2019
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 11083803
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, David Sterba
Cc: Chris Mason, Josef Bacik, Nikolay Borisov, Damien Le Moal,
 Matias Bjorling, Johannes Thumshirn, Hannes Reinecke,
 linux-fsdevel@vger.kernel.org, Naohiro Aota
Subject: [PATCH v3 20/27] btrfs: wait existing extents before truncating
Date: Thu, 8 Aug 2019 18:30:31 +0900
Message-Id: <20190808093038.4163421-21-naohiro.aota@wdc.com>
In-Reply-To: <20190808093038.4163421-1-naohiro.aota@wdc.com>
References: <20190808093038.4163421-1-naohiro.aota@wdc.com>

When truncating a file, file buffers which have already been allocated but
not yet written may be truncated. Truncating these buffers could break the
sequential write pattern in a block group if, for example, the truncated
blocks are followed by blocks allocated to another file.

To avoid this problem, always wait for writeback of all unwritten buffers
before proceeding with the truncate.

Signed-off-by: Naohiro Aota
---
 fs/btrfs/inode.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index d7be97c6a069..95f4ce8ac8d0 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -5236,6 +5236,16 @@ static int btrfs_setsize(struct inode *inode, struct iattr *attr)
 		btrfs_end_write_no_snapshotting(root);
 		btrfs_end_transaction(trans);
 	} else {
+		struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
+
+		if (btrfs_fs_incompat(fs_info, HMZONED)) {
+			ret = btrfs_wait_ordered_range(
+				inode,
+				ALIGN(newsize, fs_info->sectorsize),
+				(u64)-1);
+			if (ret)
+				return ret;
+		}
 
 		/*
 		 * We're truncating a file that used to have good data down to

From patchwork Thu Aug 8 09:30:32 2019
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 11083807
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, David Sterba
Cc: Chris Mason, Josef Bacik, Nikolay Borisov, Damien Le Moal,
 Matias Bjorling, Johannes Thumshirn, Hannes Reinecke,
 linux-fsdevel@vger.kernel.org, Naohiro Aota
Subject: [PATCH v3 21/27] btrfs: avoid async checksum/submit on HMZONED mode
Date: Thu, 8 Aug 2019 18:30:32 +0900
Message-Id: <20190808093038.4163421-22-naohiro.aota@wdc.com>
In-Reply-To: <20190808093038.4163421-1-naohiro.aota@wdc.com>
References: <20190808093038.4163421-1-naohiro.aota@wdc.com>

In HMZONED mode, btrfs uses the per-block-group zone_io_lock to serialize
data write IOs and the per-FS hmzoned_meta_io_lock to serialize metadata
write IOs. Even with this serialization, write bios sent from
{btree,btrfs}_write_cache_pages can be reordered by the async checksum
workers, as these workers are per-CPU, not per-zone.

To preserve write bio ordering, disable async checksumming on HMZONED.
This does not lower performance on HDDs, as a single CPU core is fast
enough to checksum a single zone write stream at the maximum possible
bandwidth of the device. If multiple zones are written simultaneously, HDD
seek overhead lowers the achievable maximum bandwidth, so per-zone checksum
serialization again does not affect performance.

In addition, this commit disables async_submit in
btrfs_submit_compressed_write() for the same reason. This part will become
unnecessary once btrfs gets the "btrfs: fix cgroup writeback support"
series.

Signed-off-by: Naohiro Aota
---
 fs/btrfs/compression.c | 5 +++--
 fs/btrfs/disk-io.c     | 2 ++
 fs/btrfs/inode.c       | 9 ++++++---
 3 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index 60c47b417a4b..058dea5e432f 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -322,6 +322,7 @@ blk_status_t btrfs_submit_compressed_write(struct inode *inode, u64 start,
 	struct block_device *bdev;
 	blk_status_t ret;
 	int skip_sum = BTRFS_I(inode)->flags & BTRFS_INODE_NODATASUM;
+	int async_submit = !btrfs_fs_incompat(fs_info, HMZONED);
 
 	WARN_ON(!PAGE_ALIGNED(start));
 	cb = kmalloc(compressed_bio_size(fs_info, compressed_len), GFP_NOFS);
@@ -377,7 +378,7 @@ blk_status_t btrfs_submit_compressed_write(struct inode *inode, u64 start,
 			BUG_ON(ret); /* -ENOMEM */
 		}
 
-		ret = btrfs_map_bio(fs_info, bio, 0, 1);
+		ret = btrfs_map_bio(fs_info, bio, 0, async_submit);
 		if (ret) {
 			bio->bi_status = ret;
 			bio_endio(bio);
@@ -408,7 +409,7 @@ blk_status_t btrfs_submit_compressed_write(struct inode *inode, u64 start,
 		BUG_ON(ret); /* -ENOMEM */
 	}
 
-	ret = btrfs_map_bio(fs_info, bio, 0, 1);
+	ret = btrfs_map_bio(fs_info, bio, 0, async_submit);
 	if (ret) {
 		bio->bi_status = ret;
 		bio_endio(bio);
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 63dd4670aba6..a8d7e81ccad1 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -873,6 +873,8 @@ static blk_status_t btree_submit_bio_start(void *private_data, struct bio *bio,
 static int check_async_write(struct btrfs_fs_info *fs_info,
 			     struct btrfs_inode *bi)
 {
+	if (btrfs_fs_incompat(fs_info, HMZONED))
+		return 0;
 	if (atomic_read(&bi->sync_writers))
 		return 0;
 	if (test_bit(BTRFS_FS_CSUM_IMPL_FAST, &fs_info->flags))
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 95f4ce8ac8d0..bb0ae3107e60 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -2075,7 +2075,8 @@ static blk_status_t btrfs_submit_bio_hook(struct inode *inode, struct bio *bio,
 	enum btrfs_wq_endio_type metadata = BTRFS_WQ_ENDIO_DATA;
 	blk_status_t ret = 0;
 	int skip_sum;
-	int async = !atomic_read(&BTRFS_I(inode)->sync_writers);
+	int async = !atomic_read(&BTRFS_I(inode)->sync_writers) &&
+		!btrfs_fs_incompat(fs_info, HMZONED);
 
 	skip_sum = BTRFS_I(inode)->flags & BTRFS_INODE_NODATASUM;
 
@@ -8383,7 +8384,8 @@ static inline blk_status_t btrfs_submit_dio_bio(struct bio *bio,
 
 	/* Check btrfs_submit_bio_hook() for rules about async submit. */
 	if (async_submit)
-		async_submit = !atomic_read(&BTRFS_I(inode)->sync_writers);
+		async_submit = !atomic_read(&BTRFS_I(inode)->sync_writers) &&
+			!btrfs_fs_incompat(fs_info, HMZONED);
 
 	if (!write) {
 		ret = btrfs_bio_wq_end_io(fs_info, bio, BTRFS_WQ_ENDIO_DATA);
@@ -8448,7 +8450,8 @@ static int btrfs_submit_direct_hook(struct btrfs_dio_private *dip)
 	}
 
 	/* async crcs make it difficult to collect full stripe writes. */
-	if (btrfs_data_alloc_profile(fs_info) & BTRFS_BLOCK_GROUP_RAID56_MASK)
+	if (btrfs_data_alloc_profile(fs_info) & BTRFS_BLOCK_GROUP_RAID56_MASK ||
+	    btrfs_fs_incompat(fs_info, HMZONED))
 		async_submit = 0;
 	else
 		async_submit = 1;

From patchwork Thu Aug 8 09:30:33 2019
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 11083813
b=ruiEwqpeDqmLLCknIokI4CSEmSNxzLA948QxpmwZ2G4PFjIg3HOO/R3V 29/ecjYSFSNHTi2yfI5gauebR7UHrQWgN4yYX9cDJxqla+La2T1vxpFwd RYCH4bp7OqobAnlNEkn8YoMq3t6i6zTHsF/f/C9jnuPtErZ466FDkt2HI 8XU4e0FPGx9YtGpuA6y1DVupGnslx8vCahCxv5I9+7Xe8McobS+dtbgOh uo9I95aCYycCgd/IWBzoQ//TBj0acV8K3dUYACwwVQM797yCx4NM6I74g u5JKI+yvSCftyn03iY5hje1EcBUebraQ++3wPpkkiQseTYVbtDkt9oR0p Q==; IronPort-SDR: gYaznZAMpjZjGsHz+7yjwY8MiEmtdijLp/ny81ogvZheQ7BzuJibahsyet9kXhJFCSKJ7SBYF/ +DF97vxwhFT0hmUvcECJZOu9aTJ8xqzV1kK44/9VH+Q3rIxk25WPJDXtLIg08Vbxcj+g579lOy x1tZYOwSw3Hw9yiFsIMml2XrD7LUyM0x7e7083tj6zYh+W/HKRCiDamMc1pYx7hS2amFZ6wZYZ lKVK8XHeUPoyIFqB3s2oQhGtRpiZa4qcJQ4k7JIloK8nvQP2DLE4NgVJv6RMMxTXulpUixIRV2 Hsw= X-IronPort-AV: E=Sophos;i="5.64,360,1559491200"; d="scan'208";a="115363410" Received: from uls-op-cesaip01.wdc.com (HELO uls-op-cesaep01.wdc.com) ([199.255.45.14]) by ob1.hgst.iphmx.com with ESMTP; 08 Aug 2019 17:31:55 +0800 IronPort-SDR: BNQt6a7gCQ3RxwPI19pbBI9DvdE/efrkc0OFCFwHkiGHmMLRgCcRbmX/fDylsh4zsjYqff+plE Co5HfhvqRaPKLcmjlXkVeKhAUgqlLiamJwe+RRXCOgI4DZByisyIv/IvCdU6GFvz/AEJNF69Cy xlrYs992+4oT608XMkbVMVb5vs4mb2KoZno5VxniHRQMRTBfXbtKoxBaREc98t8dCW7arAmGQR BDh8gPQoQUUZKqZlbpFSPRTBVaJWsvH8Rt2K2xYsf3EogR1lSVKDwxBPJYa2OQr2BOhapWIAYq kIfti47TBLBfJ4vYVln0Z3+u Received: from uls-op-cesaip02.wdc.com ([10.248.3.37]) by uls-op-cesaep01.wdc.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 08 Aug 2019 02:29:39 -0700 IronPort-SDR: h+6JuoFrNRJHbV0UF6iM3ySjRNzKNVz3H5lPYTCMDXcJFOLdGiKHUTkjYHp+LOpoQ83+LGsUbZ 6QoXLC0lrBuRQtqIQrnrXY9JyPpalk4yq8xSzUTIOdZi7bKpCb75VoxI4eI1IGVPAorjWTzRd+ TCbSLwJ0C23viCEtkvAGjh7QT8mNT7hWYdbYcIlwMDuB5+mMz+elSAqgc/KzLlZ8mCMRq4uSg9 AqcvFo6utg8lyqWfxvwoUW8DtDtXkJBl39Cu87TtufhdRHc7Z0v5YJz42M+SyJBudnyiXjtgxz jB0= Received: from naota.dhcp.fujisawa.hgst.com (HELO naota.fujisawa.hgst.com) ([10.149.53.115]) by uls-op-cesaip02.wdc.com with ESMTP; 08 Aug 2019 02:31:54 -0700 From: Naohiro Aota To: linux-btrfs@vger.kernel.org, David Sterba Cc: Chris Mason , Josef Bacik , Nikolay Borisov , Damien Le Moal , 
	Matias Bjorling, Johannes Thumshirn, Hannes Reinecke,
	linux-fsdevel@vger.kernel.org, Naohiro Aota
Subject: [PATCH v3 22/27] btrfs: disallow mixed-bg in HMZONED mode
Date: Thu, 8 Aug 2019 18:30:33 +0900
Message-Id: <20190808093038.4163421-23-naohiro.aota@wdc.com>
In-Reply-To: <20190808093038.4163421-1-naohiro.aota@wdc.com>
References: <20190808093038.4163421-1-naohiro.aota@wdc.com>

Placing both data and metadata in a block group is impossible in HMZONED
mode. For data, we can allocate space and write it out immediately after
the allocation. For metadata, however, we cannot do so, because the
logical addresses are recorded in other metadata buffers to build up the
trees. As a result, a data buffer can be placed after a metadata buffer
that is not yet written, and writing out the data buffer would then break
the sequential write rule.

This commit checks for and disallows mixed block groups (MIXED_GROUPS)
together with HMZONED mode.
Signed-off-by: Naohiro Aota

---
 fs/btrfs/hmzoned.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/fs/btrfs/hmzoned.c b/fs/btrfs/hmzoned.c
index 4b13c6c47849..123d9c804c21 100644
--- a/fs/btrfs/hmzoned.c
+++ b/fs/btrfs/hmzoned.c
@@ -235,6 +235,13 @@ int btrfs_check_hmzoned_mode(struct btrfs_fs_info *fs_info)
 		goto out;
 	}
 
+	if (btrfs_fs_incompat(fs_info, MIXED_GROUPS)) {
+		btrfs_err(fs_info,
+			  "HMZONED mode is not allowed for mixed block groups");
+		ret = -EINVAL;
+		goto out;
+	}
+
 	btrfs_info(fs_info, "HMZONED mode enabled, zone size %llu B",
 		   fs_info->zone_size);
 out:

From patchwork Thu Aug 8 09:30:34 2019
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, David Sterba
Cc: Chris Mason, Josef Bacik, Nikolay Borisov, Damien Le Moal,
	Matias Bjorling, Johannes Thumshirn, Hannes Reinecke,
	linux-fsdevel@vger.kernel.org, Naohiro Aota
Subject: [PATCH v3 23/27] btrfs: disallow inode_cache in HMZONED mode
Date: Thu, 8 Aug 2019 18:30:34 +0900
Message-Id: <20190808093038.4163421-24-naohiro.aota@wdc.com>
In-Reply-To: <20190808093038.4163421-1-naohiro.aota@wdc.com>
References: <20190808093038.4163421-1-naohiro.aota@wdc.com>

inode_cache uses pre-allocation to write its cache data. However,
pre-allocation is completely disabled in HMZONED mode.

We could technically enable inode_cache in the same way as relocation.
However, inode_cache is rarely used and the man page discourages using
it, so let's just disable it for now.
Signed-off-by: Naohiro Aota

---
 fs/btrfs/hmzoned.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/fs/btrfs/hmzoned.c b/fs/btrfs/hmzoned.c
index 123d9c804c21..8529106321ac 100644
--- a/fs/btrfs/hmzoned.c
+++ b/fs/btrfs/hmzoned.c
@@ -275,6 +275,12 @@ int btrfs_check_mountopts_hmzoned(struct btrfs_fs_info *info)
 		return -EINVAL;
 	}
 
+	if (btrfs_test_pending(info, SET_INODE_MAP_CACHE)) {
+		btrfs_err(info,
+			  "cannot enable inode map caching with HMZONED mode");
+		return -EINVAL;
+	}
+
 	return 0;
 }

From patchwork Thu Aug 8 09:30:35 2019
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, David Sterba
Cc: Chris Mason, Josef Bacik, Nikolay Borisov, Damien Le Moal,
	Matias Bjorling, Johannes Thumshirn, Hannes Reinecke,
	linux-fsdevel@vger.kernel.org, Naohiro Aota
Subject: [PATCH v3 24/27] btrfs: support dev-replace in HMZONED mode
Date: Thu, 8 Aug 2019 18:30:35 +0900
Message-Id: <20190808093038.4163421-25-naohiro.aota@wdc.com>
In-Reply-To: <20190808093038.4163421-1-naohiro.aota@wdc.com>
References: <20190808093038.4163421-1-naohiro.aota@wdc.com>

Currently, dev-replace copies all the device extents on the source device
to the target device, and it also clones new incoming write I/Os from
users to the source device into the target device.

Cloning incoming I/Os can break the sequential write rule on the target
device. When a write is mapped in the middle of a block group, that I/O
is directed to the middle of a zone of the target device, which breaks
the sequential write rule.

However, the cloning function cannot simply be disabled, since incoming
I/Os targeting already-copied device extents must be cloned so that the
I/O is executed on the target device.

We cannot use dev_replace->cursor_{left,right} to determine whether a bio
is going to a not-yet-copied region. Since there is a time gap between
finishing btrfs_scrub_dev() and rewriting the mapping tree in
btrfs_dev_replace_finishing(), we can have a newly allocated device
extent which is never cloned (by handle_ops_on_dev_replace) nor copied
(by the dev-replace process). So the point is to copy only the already
existing device extents.
This patch introduces mark_block_group_to_copy() to mark existing block
groups as targets of copying. Then, handle_ops_on_dev_replace() and
dev-replace can check the flag to do their jobs.

This patch also handles the empty region between used extents. Since
dev-replace is smart enough to copy only the used extents on the source
device, we have to fill the gaps to honor the sequential write rule on
the target device.

Signed-off-by: Naohiro Aota

---
 fs/btrfs/ctree.h       |   1 +
 fs/btrfs/dev-replace.c | 147 +++++++++++++++++++++++++++++++++++++++++
 fs/btrfs/dev-replace.h |   3 +
 fs/btrfs/extent-tree.c |  20 +++++-
 fs/btrfs/hmzoned.c     |  77 +++++++++++++++++++++
 fs/btrfs/hmzoned.h     |   4 ++
 fs/btrfs/scrub.c       |  83 ++++++++++++++++++++++-
 fs/btrfs/volumes.c     |  40 ++++++++++-
 8 files changed, 370 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index a6a03fc5e4c5..1282840a2db8 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -536,6 +536,7 @@ struct btrfs_block_group_cache {
 	unsigned int has_caching_ctl:1;
 	unsigned int removed:1;
 	unsigned int wp_broken:1;
+	unsigned int to_copy:1;
 
 	int disk_cache_state;

diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
index 2cc3ac4d101d..7ef1654aed9d 100644
--- a/fs/btrfs/dev-replace.c
+++ b/fs/btrfs/dev-replace.c
@@ -264,6 +264,10 @@ static int btrfs_init_dev_replace_tgtdev(struct btrfs_fs_info *fs_info,
 	set_blocksize(device->bdev, BTRFS_BDEV_BLOCKSIZE);
 	device->fs_devices = fs_info->fs_devices;
 
+	ret = btrfs_get_dev_zone_info(device);
+	if (ret)
+		goto error;
+
 	mutex_lock(&fs_info->fs_devices->device_list_mutex);
 	list_add(&device->dev_list, &fs_info->fs_devices->devices);
 	fs_info->fs_devices->num_devices++;
@@ -398,6 +402,143 @@ static char* btrfs_dev_name(struct btrfs_device *device)
 	return rcu_str_deref(device->name);
 }
 
+static int mark_block_group_to_copy(struct btrfs_fs_info *fs_info,
+				    struct btrfs_device *src_dev)
+{
+	struct btrfs_path *path;
+	struct btrfs_key key;
+	struct btrfs_key found_key;
+	struct btrfs_root *root
= fs_info->dev_root; + struct btrfs_dev_extent *dev_extent = NULL; + struct btrfs_block_group_cache *cache; + struct extent_buffer *l; + int slot; + int ret; + u64 chunk_offset, length; + + /* Do not use "to_copy" on non-HMZONED for now */ + if (!btrfs_fs_incompat(fs_info, HMZONED)) + return 0; + + path = btrfs_alloc_path(); + if (!path) + return -ENOMEM; + + path->reada = READA_FORWARD; + path->search_commit_root = 1; + path->skip_locking = 1; + + key.objectid = src_dev->devid; + key.offset = 0ull; + key.type = BTRFS_DEV_EXTENT_KEY; + + while (1) { + ret = btrfs_search_slot(NULL, root, &key, path, 0, 0); + if (ret < 0) + break; + if (ret > 0) { + if (path->slots[0] >= + btrfs_header_nritems(path->nodes[0])) { + ret = btrfs_next_leaf(root, path); + if (ret < 0) + break; + if (ret > 0) { + ret = 0; + break; + } + } else { + ret = 0; + } + } + + l = path->nodes[0]; + slot = path->slots[0]; + + btrfs_item_key_to_cpu(l, &found_key, slot); + + if (found_key.objectid != src_dev->devid) + break; + + if (found_key.type != BTRFS_DEV_EXTENT_KEY) + break; + + if (found_key.offset < key.offset) + break; + + dev_extent = btrfs_item_ptr(l, slot, struct btrfs_dev_extent); + length = btrfs_dev_extent_length(l, dev_extent); + + chunk_offset = btrfs_dev_extent_chunk_offset(l, dev_extent); + + cache = btrfs_lookup_block_group(fs_info, chunk_offset); + if (!cache) + goto skip; + + spin_lock(&cache->lock); + cache->to_copy = 1; + spin_unlock(&cache->lock); + + btrfs_put_block_group(cache); + +skip: + key.offset = found_key.offset + length; + btrfs_release_path(path); + } + + btrfs_free_path(path); + + return ret; +} + +void btrfs_finish_block_group_to_copy(struct btrfs_device *srcdev, + struct btrfs_block_group_cache *cache, + u64 physical) +{ + struct btrfs_fs_info *fs_info = cache->fs_info; + struct extent_map *em; + struct map_lookup *map; + u64 chunk_offset = cache->key.objectid; + int num_extents, cur_extent; + int i; + + em = btrfs_get_chunk_map(fs_info, chunk_offset, 1); + 
BUG_ON(IS_ERR(em)); + map = em->map_lookup; + + num_extents = cur_extent = 0; + for (i = 0; i < map->num_stripes; i++) { + /* we have more device extent to copy */ + if (srcdev != map->stripes[i].dev) + continue; + + num_extents++; + if (physical == map->stripes[i].physical) + cur_extent = i; + } + + free_extent_map(em); + + if (num_extents > 1) { + if (cur_extent == 0) { + /* + * first stripe on this device. Keep this BG + * readonly until we finish all the stripes. + */ + btrfs_inc_block_group_ro(cache); + } else if (cur_extent == num_extents - 1) { + /* last stripe on this device */ + btrfs_dec_block_group_ro(cache); + spin_lock(&cache->lock); + cache->to_copy = 0; + spin_unlock(&cache->lock); + } + } else { + spin_lock(&cache->lock); + cache->to_copy = 0; + spin_unlock(&cache->lock); + } +} + static int btrfs_dev_replace_start(struct btrfs_fs_info *fs_info, const char *tgtdev_name, u64 srcdevid, const char *srcdev_name, int read_src) @@ -439,6 +580,12 @@ static int btrfs_dev_replace_start(struct btrfs_fs_info *fs_info, if (ret) return ret; + mutex_lock(&fs_info->chunk_mutex); + ret = mark_block_group_to_copy(fs_info, src_device); + mutex_unlock(&fs_info->chunk_mutex); + if (ret) + return ret; + down_write(&dev_replace->rwsem); switch (dev_replace->replace_state) { case BTRFS_IOCTL_DEV_REPLACE_STATE_NEVER_STARTED: diff --git a/fs/btrfs/dev-replace.h b/fs/btrfs/dev-replace.h index 78c5d8f1adda..5ba60345dbf8 100644 --- a/fs/btrfs/dev-replace.h +++ b/fs/btrfs/dev-replace.h @@ -18,5 +18,8 @@ int btrfs_dev_replace_cancel(struct btrfs_fs_info *fs_info); void btrfs_dev_replace_suspend_for_unmount(struct btrfs_fs_info *fs_info); int btrfs_resume_dev_replace_async(struct btrfs_fs_info *fs_info); int btrfs_dev_replace_is_ongoing(struct btrfs_dev_replace *dev_replace); +void btrfs_finish_block_group_to_copy(struct btrfs_device *srcdev, + struct btrfs_block_group_cache *cache, + u64 physical); #endif diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index 
5b1a9e607555..e68872571f18 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -33,6 +33,7 @@ #include "delalloc-space.h" #include "rcu-string.h" #include "hmzoned.h" +#include "dev-replace.h" #undef SCRAMBLE_DELAYED_REFS @@ -1949,6 +1950,8 @@ int btrfs_discard_extent(struct btrfs_fs_info *fs_info, u64 bytenr, u64 length = stripe->length; u64 bytes; struct request_queue *req_q; + struct btrfs_dev_replace *dev_replace = + &fs_info->dev_replace; if (!stripe->dev->bdev) { ASSERT(btrfs_test_opt(fs_info, DEGRADED)); @@ -1958,15 +1961,28 @@ int btrfs_discard_extent(struct btrfs_fs_info *fs_info, u64 bytenr, req_q = bdev_get_queue(stripe->dev->bdev); /* zone reset in HMZONED mode */ - if (btrfs_can_zone_reset(dev, physical, length)) + if (btrfs_can_zone_reset(dev, physical, length)) { ret = btrfs_reset_device_zone(dev, physical, length, &bytes); - else if (blk_queue_discard(req_q)) + if (ret) + goto next; + if (!btrfs_dev_replace_is_ongoing( + dev_replace) || + dev != dev_replace->srcdev) + goto next; + + discarded_bytes += bytes; + /* send to replace target as well */ + ret = btrfs_reset_device_zone( + dev_replace->tgtdev, + physical, length, &bytes); + } else if (blk_queue_discard(req_q)) ret = btrfs_issue_discard(dev->bdev, physical, length, &bytes); else continue; +next: if (!ret) discarded_bytes += bytes; else if (ret != -EOPNOTSUPP) diff --git a/fs/btrfs/hmzoned.c b/fs/btrfs/hmzoned.c index 8529106321ac..76230ad80a68 100644 --- a/fs/btrfs/hmzoned.c +++ b/fs/btrfs/hmzoned.c @@ -706,3 +706,80 @@ bool btrfs_check_meta_write_pointer(struct btrfs_fs_info *fs_info, return true; } + +int btrfs_hmzoned_issue_zeroout(struct btrfs_device *device, u64 physical, + u64 length) +{ + if (!btrfs_dev_is_sequential(device, physical)) + return -EOPNOTSUPP; + + return blkdev_issue_zeroout(device->bdev, + physical >> SECTOR_SHIFT, + length >> SECTOR_SHIFT, + GFP_NOFS, 0); +} + +static int read_zone_info(struct btrfs_fs_info *fs_info, u64 logical, + struct blk_zone *zone) 
+{ + struct btrfs_bio *bbio = NULL; + u64 mapped_length = PAGE_SIZE; + int nmirrors; + int i, ret; + + ret = btrfs_map_sblock(fs_info, BTRFS_MAP_GET_READ_MIRRORS, logical, + &mapped_length, &bbio); + if (ret || !bbio || mapped_length < PAGE_SIZE) { + btrfs_put_bbio(bbio); + return -EIO; + } + + if (bbio->map_type & BTRFS_BLOCK_GROUP_RAID56_MASK) + return -EINVAL; + + nmirrors = (int)bbio->num_stripes; + for (i = 0; i < nmirrors; i++) { + u64 physical = bbio->stripes[i].physical; + struct btrfs_device *dev = bbio->stripes[i].dev; + + /* missing device */ + if (!dev->bdev) + continue; + + ret = btrfs_get_dev_zone(dev, physical, zone, GFP_NOFS); + /* failing device */ + if (ret == -EIO || ret == -EOPNOTSUPP) + continue; + break; + } + + return ret; +} + +int btrfs_sync_hmzone_write_pointer(struct btrfs_device *tgt_dev, u64 logical, + u64 physical_start, u64 physical_pos) +{ + struct btrfs_fs_info *fs_info = tgt_dev->fs_info; + struct blk_zone zone; + u64 length; + u64 wp; + int ret; + + if (!btrfs_dev_is_sequential(tgt_dev, physical_pos)) + return 0; + + ret = read_zone_info(fs_info, logical, &zone); + if (ret) + return ret; + + wp = physical_start + ((zone.wp - zone.start) << SECTOR_SHIFT); + + if (physical_pos == wp) + return 0; + + if (physical_pos > wp) + return -EUCLEAN; + + length = wp - physical_pos; + return btrfs_hmzoned_issue_zeroout(tgt_dev, physical_pos, length); +} diff --git a/fs/btrfs/hmzoned.h b/fs/btrfs/hmzoned.h index c68c4b8056a4..b0bb96404a24 100644 --- a/fs/btrfs/hmzoned.h +++ b/fs/btrfs/hmzoned.h @@ -43,6 +43,10 @@ void btrfs_hmzoned_data_io_unlock_at(struct inode *inode, u64 start, u64 len); bool btrfs_check_meta_write_pointer(struct btrfs_fs_info *fs_info, struct extent_buffer *eb, struct btrfs_block_group_cache **cache_ret); +int btrfs_hmzoned_issue_zeroout(struct btrfs_device *device, u64 physical, + u64 length); +int btrfs_sync_hmzone_write_pointer(struct btrfs_device *tgt_dev, u64 logical, + u64 physical_start, u64 physical_pos); static 
inline bool btrfs_dev_is_sequential(struct btrfs_device *device, u64 pos) { diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c index e15d846c700a..9f3484597338 100644 --- a/fs/btrfs/scrub.c +++ b/fs/btrfs/scrub.c @@ -167,6 +167,7 @@ struct scrub_ctx { int pages_per_rd_bio; int is_dev_replace; + u64 write_pointer; struct scrub_bio *wr_curr_bio; struct mutex wr_lock; @@ -1648,6 +1649,23 @@ static int scrub_add_page_to_wr_bio(struct scrub_ctx *sctx, sbio = sctx->wr_curr_bio; if (sbio->page_count == 0) { struct bio *bio; + u64 physical = spage->physical_for_dev_replace; + + if (btrfs_fs_incompat(sctx->fs_info, HMZONED) && + sctx->write_pointer < physical) { + u64 length = physical - sctx->write_pointer; + + ret = btrfs_hmzoned_issue_zeroout(sctx->wr_tgtdev, + sctx->write_pointer, + length); + if (ret == -EOPNOTSUPP) + ret = 0; + if (ret) { + mutex_unlock(&sctx->wr_lock); + return ret; + } + sctx->write_pointer = physical; + } sbio->physical = spage->physical_for_dev_replace; sbio->logical = spage->logical; @@ -1710,6 +1728,10 @@ static void scrub_wr_submit(struct scrub_ctx *sctx) * doubled the write performance on spinning disks when measured * with Linux 3.5 */ btrfsic_submit_bio(sbio->bio); + + if (btrfs_fs_incompat(sctx->fs_info, HMZONED)) + sctx->write_pointer = sbio->physical + + sbio->page_count * PAGE_SIZE; } static void scrub_wr_bio_end_io(struct bio *bio) @@ -3043,6 +3065,21 @@ static noinline_for_stack int scrub_raid56_parity(struct scrub_ctx *sctx, return ret < 0 ? 
ret : 0; } +void sync_replace_for_hmzoned(struct scrub_ctx *sctx) +{ + if (!btrfs_fs_incompat(sctx->fs_info, HMZONED)) + return; + + sctx->flush_all_writes = true; + scrub_submit(sctx); + mutex_lock(&sctx->wr_lock); + scrub_wr_submit(sctx); + mutex_unlock(&sctx->wr_lock); + + wait_event(sctx->list_wait, + atomic_read(&sctx->bios_in_flight) == 0); +} + static noinline_for_stack int scrub_stripe(struct scrub_ctx *sctx, struct map_lookup *map, struct btrfs_device *scrub_dev, @@ -3174,6 +3211,14 @@ static noinline_for_stack int scrub_stripe(struct scrub_ctx *sctx, */ blk_start_plug(&plug); + if (sctx->is_dev_replace && + btrfs_dev_is_sequential(sctx->wr_tgtdev, physical)) { + mutex_lock(&sctx->wr_lock); + sctx->write_pointer = physical; + mutex_unlock(&sctx->wr_lock); + sctx->flush_all_writes = true; + } + /* * now find all extents for each stripe and scrub them */ @@ -3346,6 +3391,9 @@ static noinline_for_stack int scrub_stripe(struct scrub_ctx *sctx, if (ret) goto out; + if (sctx->is_dev_replace) + sync_replace_for_hmzoned(sctx); + if (extent_logical + extent_len < key.objectid + bytes) { if (map->type & BTRFS_BLOCK_GROUP_RAID56_MASK) { @@ -3413,6 +3461,26 @@ static noinline_for_stack int scrub_stripe(struct scrub_ctx *sctx, blk_finish_plug(&plug); btrfs_free_path(path); btrfs_free_path(ppath); + + if (btrfs_fs_incompat(fs_info, HMZONED) && sctx->is_dev_replace && + ret >= 0) { + wait_event(sctx->list_wait, + atomic_read(&sctx->bios_in_flight) == 0); + + mutex_lock(&sctx->wr_lock); + if (sctx->write_pointer < physical_end) { + ret = btrfs_sync_hmzone_write_pointer( + sctx->wr_tgtdev, base + offset, + map->stripes[num].physical, + sctx->write_pointer); + if (ret) + btrfs_err(fs_info, "failed to recover write pointer"); + } + mutex_unlock(&sctx->wr_lock); + btrfs_dev_clear_zone_empty(sctx->wr_tgtdev, + map->stripes[num].physical); + } + return ret < 0 ? 
ret : 0; } @@ -3554,6 +3622,14 @@ int scrub_enumerate_chunks(struct scrub_ctx *sctx, if (!cache) goto skip; + spin_lock(&cache->lock); + if (sctx->is_dev_replace && !cache->to_copy) { + spin_unlock(&cache->lock); + ro_set = 0; + goto done; + } + spin_unlock(&cache->lock); + /* * we need call btrfs_inc_block_group_ro() with scrubs_paused, * to avoid deadlock caused by: @@ -3588,7 +3664,7 @@ int scrub_enumerate_chunks(struct scrub_ctx *sctx, ret = btrfs_wait_ordered_roots(fs_info, U64_MAX, cache->key.objectid, cache->key.offset); - if (ret > 0) { + if (ret >= 0) { struct btrfs_trans_handle *trans; trans = btrfs_join_transaction(root); @@ -3664,6 +3740,11 @@ int scrub_enumerate_chunks(struct scrub_ctx *sctx, scrub_pause_off(fs_info); + if (sctx->is_dev_replace) + btrfs_finish_block_group_to_copy( + dev_replace->srcdev, cache, found_key.offset); + +done: down_write(&fs_info->dev_replace.rwsem); dev_replace->cursor_left = dev_replace->cursor_right; dev_replace->item_needs_writeback = 1; diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index 265a1496e459..07e7528fb23e 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -1592,6 +1592,9 @@ int find_free_dev_extent_start(struct btrfs_device *device, u64 num_bytes, search_start = max_t(u64, search_start, zone_size); search_start = btrfs_zone_align(device, search_start); + WARN_ON(device->zone_info && + !IS_ALIGNED(num_bytes, device->zone_info->zone_size)); + path = btrfs_alloc_path(); if (!path) return -ENOMEM; @@ -5894,9 +5897,29 @@ static int get_extra_mirror_from_replace(struct btrfs_fs_info *fs_info, return ret; } +static bool is_block_group_to_copy(struct btrfs_fs_info *fs_info, u64 logical) +{ + struct btrfs_block_group_cache *cache; + bool ret; + + /* non-HMZONED mode does not use "to_copy" flag */ + if (!btrfs_fs_incompat(fs_info, HMZONED)) + return false; + + cache = btrfs_lookup_block_group(fs_info, logical); + + spin_lock(&cache->lock); + ret = cache->to_copy; + spin_unlock(&cache->lock); + + 
btrfs_put_block_group(cache);
+	return ret;
+}
+
 static void handle_ops_on_dev_replace(enum btrfs_map_op op,
 				      struct btrfs_bio **bbio_ret,
 				      struct btrfs_dev_replace *dev_replace,
+				      u64 logical,
 				      int *num_stripes_ret, int *max_errors_ret)
 {
 	struct btrfs_bio *bbio = *bbio_ret;
@@ -5909,6 +5932,15 @@ static void handle_ops_on_dev_replace(enum btrfs_map_op op,
 	if (op == BTRFS_MAP_WRITE) {
 		int index_where_to_add;
 
+		/*
+		 * a block group which have "to_copy" set will
+		 * eventually copied by dev-replace process. We can
+		 * avoid cloning IO here.
+		 */
+		if (is_block_group_to_copy(dev_replace->srcdev->fs_info,
+					   logical))
+			return;
+
 		/*
 		 * duplicate the write operations while the dev replace
 		 * procedure is running. Since the copying of the old disk to
@@ -5936,6 +5968,10 @@ static void handle_ops_on_dev_replace(enum btrfs_map_op op,
 				index_where_to_add++;
 				max_errors++;
 				tgtdev_indexes++;
+
+				/* mark this zone as non-empty */
+				btrfs_dev_clear_zone_empty(new->dev,
+							   new->physical);
 			}
 		}
 		num_stripes = index_where_to_add;
@@ -6321,8 +6357,8 @@ static int __btrfs_map_block(struct btrfs_fs_info *fs_info,
 
 	if (dev_replace_is_ongoing && dev_replace->tgtdev != NULL &&
 	    need_full_stripe(op)) {
-		handle_ops_on_dev_replace(op, &bbio, dev_replace, &num_stripes,
-					  &max_errors);
+		handle_ops_on_dev_replace(op, &bbio, dev_replace, logical,
+					  &num_stripes, &max_errors);
 	}
 
 	*bbio_ret = bbio;

From patchwork Thu Aug 8 09:30:36 2019
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, David Sterba
Cc: Chris Mason, Josef Bacik, Nikolay Borisov, Damien Le Moal, Matias Bjorling, Johannes Thumshirn, Hannes Reinecke, linux-fsdevel@vger.kernel.org, Naohiro Aota
Subject: [PATCH v3 25/27] btrfs: enable relocation in HMZONED mode
Date: Thu, 8 Aug 2019 18:30:36 +0900
Message-Id: <20190808093038.4163421-26-naohiro.aota@wdc.com>
In-Reply-To: <20190808093038.4163421-1-naohiro.aota@wdc.com>
References: <20190808093038.4163421-1-naohiro.aota@wdc.com>

To serialize allocation and submit_bio, we introduced a mutex around them. As a result, preallocation must be completely disabled to avoid a deadlock.
Since the current relocation process relies on preallocation to move file data extents, it must be handled differently. In HMZONED mode, we just truncate the inode to the size that we wanted to preallocate. Then we flush dirty pages on the file before finishing the relocation process. run_delalloc_hmzoned() will handle all the allocation and submit the IOs to the underlying layers.

Signed-off-by: Naohiro Aota
---
 fs/btrfs/relocation.c | 39 +++++++++++++++++++++++++++++++++++++--
 1 file changed, 37 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index 7f219851fa23..d852e3389ee2 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -3152,6 +3152,34 @@ int prealloc_file_extent_cluster(struct inode *inode,
 	if (ret)
 		goto out;
 
+	/*
+	 * In HMZONED, we cannot preallocate the file region. Instead, we
+	 * dirty and fiemap_write the region.
+	 */
+	if (btrfs_fs_incompat(btrfs_sb(inode->i_sb), HMZONED)) {
+		struct btrfs_root *root = BTRFS_I(inode)->root;
+		struct btrfs_trans_handle *trans;
+
+		end = cluster->end - offset + 1;
+		trans = btrfs_start_transaction(root, 1);
+		if (IS_ERR(trans))
+			return PTR_ERR(trans);
+
+		inode->i_ctime = current_time(inode);
+		i_size_write(inode, end);
+		btrfs_ordered_update_i_size(inode, end, NULL);
+		ret = btrfs_update_inode(trans, root, inode);
+		if (ret) {
+			btrfs_abort_transaction(trans, ret);
+			btrfs_end_transaction(trans);
+			return ret;
+		}
+		ret = btrfs_end_transaction(trans);
+
+		goto out;
+	}
+
 	cur_offset = prealloc_start;
 	while (nr < cluster->nr) {
 		start = cluster->boundary[nr] - offset;
@@ -3340,6 +3368,10 @@ static int relocate_file_extent_cluster(struct inode *inode,
 		btrfs_throttle(fs_info);
 	}
 	WARN_ON(nr != cluster->nr);
+	if (btrfs_fs_incompat(fs_info, HMZONED) && !ret) {
+		ret = btrfs_wait_ordered_range(inode, 0, (u64)-1);
+		WARN_ON(ret);
+	}
 out:
 	kfree(ra);
 	return ret;
@@ -4180,8 +4212,12 @@ static int __insert_orphan_inode(struct btrfs_trans_handle *trans,
 	struct btrfs_path *path;
 	struct btrfs_inode_item *item;
 	struct extent_buffer *leaf;
+	u64 flags = BTRFS_INODE_NOCOMPRESS | BTRFS_INODE_PREALLOC;
 	int ret;
 
+	if (btrfs_fs_incompat(trans->fs_info, HMZONED))
+		flags &= ~BTRFS_INODE_PREALLOC;
+
 	path = btrfs_alloc_path();
 	if (!path)
 		return -ENOMEM;
@@ -4196,8 +4232,7 @@ static int __insert_orphan_inode(struct btrfs_trans_handle *trans,
 	btrfs_set_inode_generation(leaf, item, 1);
 	btrfs_set_inode_size(leaf, item, 0);
 	btrfs_set_inode_mode(leaf, item, S_IFREG | 0600);
-	btrfs_set_inode_flags(leaf, item, BTRFS_INODE_NOCOMPRESS |
-			      BTRFS_INODE_PREALLOC);
+	btrfs_set_inode_flags(leaf, item, flags);
 	btrfs_mark_buffer_dirty(leaf);
 out:
 	btrfs_free_path(path);

From patchwork Thu Aug  8 09:30:37 2019
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 11083827
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, David Sterba
Cc: Chris Mason, Josef Bacik, Nikolay Borisov, Damien Le Moal, Matias Bjorling, Johannes Thumshirn, Hannes Reinecke, linux-fsdevel@vger.kernel.org, Naohiro Aota
Subject: [PATCH v3 26/27] btrfs: relocate block group to repair IO failure in HMZONED
Date: Thu, 8 Aug 2019 18:30:37 +0900
Message-Id: <20190808093038.4163421-27-naohiro.aota@wdc.com>
In-Reply-To: <20190808093038.4163421-1-naohiro.aota@wdc.com>
References: <20190808093038.4163421-1-naohiro.aota@wdc.com>

When btrfs finds a checksum error and the file system has a mirror of the damaged data, btrfs reads the correct data from the mirror and writes it to the damaged blocks. This repair, however, violates the sequential write rule.

We can consider three methods to repair an IO failure in HMZONED mode:

(1) Reset and rewrite the damaged zone
(2) Allocate a new device extent and replace the damaged device extent with the new one
(3) Relocate the corresponding block group

Method (1) is the most similar to the behavior on regular devices. However, it also wipes non-damaged data in the same device extent, and so it unnecessarily degrades non-damaged data. Method (2) is much like device replacing, but done within the same device. It is safe because it keeps the device extent until the replacing finishes. However, extending device replace is non-trivial: it assumes "src_dev->physical == dst_dev->physical".
Also, the extent mapping replacing function would need to be extended to support replacing a device extent position within one device.

Method (3) invokes relocation of the damaged block group, so it is straightforward to implement. It relocates all the mirrored device extents, so it is potentially a more costly operation than method (1) or (2). But it relocates only the extents actually in use, which reduces the total IO size.

Let's apply method (3) for now. In the future, we can extend device-replace and apply method (2).

To protect a block group from being relocated multiple times by multiple IO errors, this commit introduces a "relocating_repair" bit to show that it is now being relocated to repair IO failures. It also uses a new kthread, "btrfs-relocating-repair", so as not to block the IO path with the relocation process.

This commit also supports repairing in the scrub process.

Signed-off-by: Naohiro Aota
---
 fs/btrfs/ctree.h     |  1 +
 fs/btrfs/extent_io.c |  3 ++
 fs/btrfs/scrub.c     |  3 ++
 fs/btrfs/volumes.c   | 72 ++++++++++++++++++++++++++++++++++++++++++++
 fs/btrfs/volumes.h   |  1 +
 5 files changed, 80 insertions(+)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 1282840a2db8..144cf9c13320 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -537,6 +537,7 @@ struct btrfs_block_group_cache {
 	unsigned int removed:1;
 	unsigned int wp_broken:1;
 	unsigned int to_copy:1;
+	unsigned int relocating_repair:1;
 
 	int disk_cache_state;

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index ff963b2214aa..0d3b61606b15 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2187,6 +2187,9 @@ int repair_io_failure(struct btrfs_fs_info *fs_info, u64 ino, u64 start,
 	ASSERT(!(fs_info->sb->s_flags & SB_RDONLY));
 	BUG_ON(!mirror_num);
 
+	if (btrfs_fs_incompat(fs_info, HMZONED))
+		return btrfs_repair_one_hmzone(fs_info, logical);
+
 	bio = btrfs_io_bio_alloc(1);
 	bio->bi_iter.bi_size = 0;
 	map_length = length;

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index 9f3484597338..6dd5fa4ad657 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -861,6 +861,9 @@ static int scrub_handle_errored_block(struct scrub_block *sblock_to_check)
 	have_csum = sblock_to_check->pagev[0]->have_csum;
 	dev = sblock_to_check->pagev[0]->dev;
 
+	if (btrfs_fs_incompat(fs_info, HMZONED) && !sctx->is_dev_replace)
+		return btrfs_repair_one_hmzone(fs_info, logical);
+
 	/*
 	 * We must use GFP_NOFS because the scrub task might be waiting for a
 	 * worker task executing this function and in turn a transaction commit

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 07e7528fb23e..20109f20f102 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -8006,3 +8006,75 @@ bool btrfs_pinned_by_swapfile(struct btrfs_fs_info *fs_info, void *ptr)
 	spin_unlock(&fs_info->swapfile_pins_lock);
 	return node != NULL;
 }
+
+static int relocating_repair_kthread(void *data)
+{
+	struct btrfs_block_group_cache *cache =
+		(struct btrfs_block_group_cache *)data;
+	struct btrfs_fs_info *fs_info = cache->fs_info;
+	u64 target;
+	int ret = 0;
+
+	target = cache->key.objectid;
+	btrfs_put_block_group(cache);
+
+	if (test_and_set_bit(BTRFS_FS_EXCL_OP, &fs_info->flags)) {
+		btrfs_info(fs_info,
+			   "skip relocating block group %llu to repair: EBUSY",
+			   target);
+		return -EBUSY;
+	}
+
+	mutex_lock(&fs_info->delete_unused_bgs_mutex);
+
+	/* ensure block group still exists */
+	cache = btrfs_lookup_block_group(fs_info, target);
+	if (!cache)
+		goto out;
+
+	if (!cache->relocating_repair)
+		goto out;
+
+	ret = btrfs_may_alloc_data_chunk(fs_info, target);
+	if (ret < 0)
+		goto out;
+
+	btrfs_info(fs_info, "relocating block group %llu to repair IO failure",
+		   target);
+	ret = btrfs_relocate_chunk(fs_info, target);
+
+out:
+	if (cache)
+		btrfs_put_block_group(cache);
+	mutex_unlock(&fs_info->delete_unused_bgs_mutex);
+	clear_bit(BTRFS_FS_EXCL_OP, &fs_info->flags);
+
+	return ret;
+}
+
+int btrfs_repair_one_hmzone(struct btrfs_fs_info *fs_info, u64 logical)
+{
+	struct btrfs_block_group_cache *cache;
+
+	/* do not attempt to repair in degraded state */
+	if (btrfs_test_opt(fs_info, DEGRADED))
+		return 0;
+
+	cache = btrfs_lookup_block_group(fs_info, logical);
+	if (!cache)
+		return 0;
+
+	spin_lock(&cache->lock);
+	if (cache->relocating_repair) {
+		spin_unlock(&cache->lock);
+		btrfs_put_block_group(cache);
+		return 0;
+	}
+	cache->relocating_repair = 1;
+	spin_unlock(&cache->lock);
+
+	kthread_run(relocating_repair_kthread, cache,
+		    "btrfs-relocating-repair");
+
+	return 0;
+}

diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 5da1f354db93..ccb139d1f9c4 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -593,5 +593,6 @@ bool btrfs_check_rw_degradable(struct btrfs_fs_info *fs_info,
 int btrfs_bg_type_to_factor(u64 flags);
 const char *btrfs_bg_type_to_raid_name(u64 flags);
 int btrfs_verify_dev_extents(struct btrfs_fs_info *fs_info);
+int btrfs_repair_one_hmzone(struct btrfs_fs_info *fs_info, u64 logical);
 
 #endif

From patchwork Thu Aug  8 09:30:38 2019
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 11083831
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, David Sterba
Cc: Chris Mason, Josef Bacik, Nikolay Borisov, Damien Le Moal, Matias Bjorling, Johannes Thumshirn, Hannes Reinecke, linux-fsdevel@vger.kernel.org, Naohiro Aota
Subject: [PATCH v3 27/27] btrfs: enable to mount HMZONED incompat flag
Date: Thu, 8 Aug 2019 18:30:38 +0900
Message-Id: <20190808093038.4163421-28-naohiro.aota@wdc.com>
In-Reply-To: <20190808093038.4163421-1-naohiro.aota@wdc.com>
References: <20190808093038.4163421-1-naohiro.aota@wdc.com>

This final patch adds the HMZONED incompat flag to BTRFS_FEATURE_INCOMPAT_SUPP and enables btrfs to mount an HMZONED-flagged file system.
Signed-off-by: Naohiro Aota
---
 fs/btrfs/ctree.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 144cf9c13320..b9dc9d4e152d 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -294,7 +294,8 @@ struct btrfs_super_block {
 	 BTRFS_FEATURE_INCOMPAT_EXTENDED_IREF |		\
 	 BTRFS_FEATURE_INCOMPAT_SKINNY_METADATA |	\
 	 BTRFS_FEATURE_INCOMPAT_NO_HOLES |		\
-	 BTRFS_FEATURE_INCOMPAT_METADATA_UUID)
+	 BTRFS_FEATURE_INCOMPAT_METADATA_UUID |		\
+	 BTRFS_FEATURE_INCOMPAT_HMZONED)
 
 #define BTRFS_FEATURE_INCOMPAT_SAFE_SET			\
 	(BTRFS_FEATURE_INCOMPAT_EXTENDED_IREF)