From patchwork Fri Jun 7 13:10:07 2019
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 10981785
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, David Sterba
Cc: Chris Mason, Josef Bacik, Qu Wenruo, Nikolay Borisov,
    linux-kernel@vger.kernel.org, Hannes Reinecke,
    linux-fsdevel@vger.kernel.org, Damien Le Moal, Matias Bjørling,
    Johannes Thumshirn, Bart Van Assche, Naohiro Aota
Subject: [PATCH 01/19] btrfs: introduce HMZONED feature flag
Date: Fri, 7 Jun 2019 22:10:07 +0900
Message-Id: <20190607131025.31996-2-naohiro.aota@wdc.com>
In-Reply-To: <20190607131025.31996-1-naohiro.aota@wdc.com>
References: <20190607131025.31996-1-naohiro.aota@wdc.com>

This patch introduces the HMZONED incompat flag. The flag indicates that
the volume management will satisfy the constraints imposed by host-managed
zoned block devices.
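As a rough illustration (not literal code from this patch), later patches
in this series gate their zoned-specific behavior on this flag through the
existing incompat-flag helper btrfs_fs_incompat(); a minimal sketch:

	/*
	 * Sketch only: how HMZONED-specific code paths are gated on
	 * the new incompat flag elsewhere in this series.
	 */
	if (btrfs_fs_incompat(fs_info, HMZONED)) {
		/* follow zoned device constraints, e.g. sequential allocation */
	}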
Signed-off-by: Damien Le Moal
Signed-off-by: Naohiro Aota
---
 fs/btrfs/sysfs.c           | 2 ++
 include/uapi/linux/btrfs.h | 1 +
 2 files changed, 3 insertions(+)

diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c
index 2f078b77fe14..ccb3d732e7d2 100644
--- a/fs/btrfs/sysfs.c
+++ b/fs/btrfs/sysfs.c
@@ -192,6 +192,7 @@ BTRFS_FEAT_ATTR_INCOMPAT(raid56, RAID56);
 BTRFS_FEAT_ATTR_INCOMPAT(skinny_metadata, SKINNY_METADATA);
 BTRFS_FEAT_ATTR_INCOMPAT(no_holes, NO_HOLES);
 BTRFS_FEAT_ATTR_INCOMPAT(metadata_uuid, METADATA_UUID);
+BTRFS_FEAT_ATTR_INCOMPAT(hmzoned, HMZONED);
 BTRFS_FEAT_ATTR_COMPAT_RO(free_space_tree, FREE_SPACE_TREE);
 
 static struct attribute *btrfs_supported_feature_attrs[] = {
@@ -206,6 +207,7 @@ static struct attribute *btrfs_supported_feature_attrs[] = {
 	BTRFS_FEAT_ATTR_PTR(skinny_metadata),
 	BTRFS_FEAT_ATTR_PTR(no_holes),
 	BTRFS_FEAT_ATTR_PTR(metadata_uuid),
+	BTRFS_FEAT_ATTR_PTR(hmzoned),
 	BTRFS_FEAT_ATTR_PTR(free_space_tree),
 	NULL
 };
diff --git a/include/uapi/linux/btrfs.h b/include/uapi/linux/btrfs.h
index c195896d478f..2d5e8f801135 100644
--- a/include/uapi/linux/btrfs.h
+++ b/include/uapi/linux/btrfs.h
@@ -270,6 +270,7 @@ struct btrfs_ioctl_fs_info_args {
 #define BTRFS_FEATURE_INCOMPAT_SKINNY_METADATA	(1ULL << 8)
 #define BTRFS_FEATURE_INCOMPAT_NO_HOLES		(1ULL << 9)
 #define BTRFS_FEATURE_INCOMPAT_METADATA_UUID	(1ULL << 10)
+#define BTRFS_FEATURE_INCOMPAT_HMZONED		(1ULL << 11)
 
 struct btrfs_ioctl_feature_flags {
 	__u64 compat_flags;

From patchwork Fri Jun 7 13:10:08 2019
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 10981779
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, David Sterba
Cc: Chris Mason, Josef Bacik, Qu Wenruo, Nikolay Borisov,
    linux-kernel@vger.kernel.org, Hannes Reinecke,
    linux-fsdevel@vger.kernel.org, Damien Le Moal, Matias Bjørling,
    Johannes Thumshirn, Bart Van Assche, Naohiro Aota
Subject: [PATCH 02/19] btrfs: Get zone information of zoned block devices
Date: Fri, 7 Jun 2019 22:10:08 +0900
Message-Id: <20190607131025.31996-3-naohiro.aota@wdc.com>
In-Reply-To: <20190607131025.31996-1-naohiro.aota@wdc.com>
References: <20190607131025.31996-1-naohiro.aota@wdc.com>

If a zoned block device is found, get its zone information (number of
zones and zone size) using the new helper function
btrfs_get_dev_zonetypes(). To avoid costly run-time zone report commands
when testing the type of the device zones during block allocation, attach
the seq_zones bitmap to the device structure to indicate whether a zone is
sequential or accepts random writes. This patch also introduces the helper
function btrfs_dev_is_sequential() to test if the zone storing a block is
a sequential write required zone.
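Both bitmaps added below use one bit per zone, so translating a device
byte offset into a bitmap position is a shift by zone_size_shift followed
by a bit test; a minimal sketch of that lookup (mirroring the
btrfs_dev_is_sequential() helper added in this patch):

	/*
	 * Sketch: byte offset -> zone number -> per-zone bitmap test,
	 * as done by btrfs_dev_is_sequential()/btrfs_dev_is_empty_zone().
	 */
	unsigned int zno = pos >> device->zone_size_shift; /* zone index */
	int sequential = test_bit(zno, device->seq_zones);
	int empty = test_bit(zno, device->empty_zones);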
Signed-off-by: Damien Le Moal
Signed-off-by: Naohiro Aota
---
 fs/btrfs/volumes.c | 143 +++++++++++++++++++++++++++++++++++++++++++++
 fs/btrfs/volumes.h |  33 +++++++++++
 2 files changed, 176 insertions(+)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 1c2a6e4b39da..b673178718e3 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -786,6 +786,135 @@ static int btrfs_free_stale_devices(const char *path,
 	return ret;
 }
 
+static int __btrfs_get_dev_zones(struct btrfs_device *device, u64 pos,
+				 struct blk_zone **zones,
+				 unsigned int *nr_zones, gfp_t gfp_mask)
+{
+	struct blk_zone *z = *zones;
+	int ret;
+
+	if (!z) {
+		z = kcalloc(*nr_zones, sizeof(struct blk_zone), GFP_KERNEL);
+		if (!z)
+			return -ENOMEM;
+	}
+
+	ret = blkdev_report_zones(device->bdev, pos >> SECTOR_SHIFT,
+				  z, nr_zones, gfp_mask);
+	if (ret != 0) {
+		btrfs_err(device->fs_info, "Get zone at %llu failed %d\n",
+			  pos, ret);
+		return ret;
+	}
+
+	*zones = z;
+
+	return 0;
+}
+
+static void btrfs_destroy_dev_zonetypes(struct btrfs_device *device)
+{
+	kfree(device->seq_zones);
+	kfree(device->empty_zones);
+	device->seq_zones = NULL;
+	device->empty_zones = NULL;
+	device->nr_zones = 0;
+	device->zone_size = 0;
+	device->zone_size_shift = 0;
+}
+
+int btrfs_get_dev_zone(struct btrfs_device *device, u64 pos,
+		       struct blk_zone *zone, gfp_t gfp_mask)
+{
+	unsigned int nr_zones = 1;
+	int ret;
+
+	ret = __btrfs_get_dev_zones(device, pos, &zone, &nr_zones, gfp_mask);
+	if (ret != 0 || !nr_zones)
+		return ret ? ret : -EIO;
+
+	return 0;
+}
+
+int btrfs_get_dev_zonetypes(struct btrfs_device *device)
+{
+	struct block_device *bdev = device->bdev;
+	sector_t nr_sectors = bdev->bd_part->nr_sects;
+	sector_t sector = 0;
+	struct blk_zone *zones = NULL;
+	unsigned int i, n = 0, nr_zones;
+	int ret;
+
+	device->zone_size = 0;
+	device->zone_size_shift = 0;
+	device->nr_zones = 0;
+	device->seq_zones = NULL;
+	device->empty_zones = NULL;
+
+	if (!bdev_is_zoned(bdev))
+		return 0;
+
+	device->zone_size = (u64)bdev_zone_sectors(bdev) << SECTOR_SHIFT;
+	device->zone_size_shift = ilog2(device->zone_size);
+	device->nr_zones = nr_sectors >> ilog2(bdev_zone_sectors(bdev));
+	if (nr_sectors & (bdev_zone_sectors(bdev) - 1))
+		device->nr_zones++;
+
+	device->seq_zones = kcalloc(BITS_TO_LONGS(device->nr_zones),
+				    sizeof(*device->seq_zones), GFP_KERNEL);
+	if (!device->seq_zones)
+		return -ENOMEM;
+
+	device->empty_zones = kcalloc(BITS_TO_LONGS(device->nr_zones),
+				      sizeof(*device->empty_zones), GFP_KERNEL);
+	if (!device->empty_zones)
+		return -ENOMEM;
+
+#define BTRFS_REPORT_NR_ZONES 4096
+
+	/* Get zones type */
+	while (sector < nr_sectors) {
+		nr_zones = BTRFS_REPORT_NR_ZONES;
+		ret = __btrfs_get_dev_zones(device, sector << SECTOR_SHIFT,
+					    &zones, &nr_zones, GFP_KERNEL);
+		if (ret != 0 || !nr_zones) {
+			if (!ret)
+				ret = -EIO;
+			goto out;
+		}
+
+		for (i = 0; i < nr_zones; i++) {
+			if (zones[i].type == BLK_ZONE_TYPE_SEQWRITE_REQ)
+				set_bit(n, device->seq_zones);
+			if (zones[i].cond == BLK_ZONE_COND_EMPTY)
+				set_bit(n, device->empty_zones);
+			sector = zones[i].start + zones[i].len;
+			n++;
+		}
+	}
+
+	if (n != device->nr_zones) {
+		btrfs_err(device->fs_info,
+			  "Inconsistent number of zones (%u / %u)\n", n,
+			  device->nr_zones);
+		ret = -EIO;
+		goto out;
+	}
+
+	btrfs_info(device->fs_info,
+		   "host-%s zoned block device, %u zones of %llu sectors\n",
+		   bdev_zoned_model(bdev) == BLK_ZONED_HM ? "managed" : "aware",
+		   device->nr_zones, device->zone_size >> SECTOR_SHIFT);
+
+out:
+	kfree(zones);
+
+	if (ret)
+		btrfs_destroy_dev_zonetypes(device);
+
+	return ret;
+}
+
 static int btrfs_open_one_device(struct btrfs_fs_devices *fs_devices,
 			struct btrfs_device *device, fmode_t flags,
 			void *holder)
@@ -842,6 +971,11 @@ static int btrfs_open_one_device(struct btrfs_fs_devices *fs_devices,
 	clear_bit(BTRFS_DEV_STATE_IN_FS_METADATA, &device->dev_state);
 	device->mode = flags;
 
+	/* Get zone type information of zoned block devices */
+	ret = btrfs_get_dev_zonetypes(device);
+	if (ret != 0)
+		goto error_brelse;
+
 	fs_devices->open_devices++;
 	if (test_bit(BTRFS_DEV_STATE_WRITEABLE, &device->dev_state) &&
 	    device->devid != BTRFS_DEV_REPLACE_DEVID) {
@@ -1243,6 +1377,7 @@ static void btrfs_close_bdev(struct btrfs_device *device)
 	}
 
 	blkdev_put(device->bdev, device->mode);
+	btrfs_destroy_dev_zonetypes(device);
 }
 
 static void btrfs_close_one_device(struct btrfs_device *device)
@@ -2664,6 +2799,13 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
 	mutex_unlock(&fs_info->chunk_mutex);
 	mutex_unlock(&fs_devices->device_list_mutex);
 
+	/* Get zone type information of zoned block devices */
+	ret = btrfs_get_dev_zonetypes(device);
+	if (ret) {
+		btrfs_abort_transaction(trans, ret);
+		goto error_sysfs;
+	}
+
 	if (seeding_dev) {
 		mutex_lock(&fs_info->chunk_mutex);
 		ret = init_first_rw_device(trans);
@@ -2729,6 +2871,7 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
 	return ret;
 
 error_sysfs:
+	btrfs_destroy_dev_zonetypes(device);
 	btrfs_sysfs_rm_device_link(fs_devices, device);
 	mutex_lock(&fs_info->fs_devices->device_list_mutex);
 	mutex_lock(&fs_info->chunk_mutex);
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index b8a0e8d0672d..1599641e216c 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -62,6 +62,16 @@ struct btrfs_device {
 
 	struct block_device *bdev;
 
+	/*
+	 * Number of zones, zone size and types of zones if bdev is a
+	 * zoned block device.
+	 */
+	u64 zone_size;
+	u8 zone_size_shift;
+	u32 nr_zones;
+	unsigned long *seq_zones;
+	unsigned long *empty_zones;
+
 	/* the mode sent to blkdev_get */
 	fmode_t mode;
 
@@ -476,6 +486,28 @@ int btrfs_finish_chunk_alloc(struct btrfs_trans_handle *trans,
 int btrfs_remove_chunk(struct btrfs_trans_handle *trans, u64 chunk_offset);
 struct extent_map *btrfs_get_chunk_map(struct btrfs_fs_info *fs_info,
 				       u64 logical, u64 length);
+int btrfs_get_dev_zone(struct btrfs_device *device, u64 pos,
+		       struct blk_zone *zone, gfp_t gfp_mask);
+
+static inline int btrfs_dev_is_sequential(struct btrfs_device *device, u64 pos)
+{
+	unsigned int zno = pos >> device->zone_size_shift;
+
+	if (!device->seq_zones)
+		return 1;
+
+	return test_bit(zno, device->seq_zones);
+}
+
+static inline int btrfs_dev_is_empty_zone(struct btrfs_device *device, u64 pos)
+{
+	unsigned int zno = pos >> device->zone_size_shift;
+
+	if (!device->empty_zones)
+		return 0;
+
+	return test_bit(zno, device->empty_zones);
+}
 
 static inline void btrfs_dev_stat_inc(struct btrfs_device *dev,
 				      int index)
@@ -568,5 +600,6 @@ bool btrfs_check_rw_degradable(struct btrfs_fs_info *fs_info,
 int btrfs_bg_type_to_factor(u64 flags);
 int btrfs_verify_dev_extents(struct btrfs_fs_info *fs_info);
+int btrfs_get_dev_zonetypes(struct btrfs_device *device);
 
 #endif

From patchwork Fri Jun 7 13:10:09 2019
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 10981783
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, David Sterba
Cc: Chris Mason, Josef Bacik, Qu Wenruo, Nikolay Borisov,
    linux-kernel@vger.kernel.org, Hannes Reinecke,
    linux-fsdevel@vger.kernel.org, Damien Le Moal, Matias Bjørling,
    Johannes Thumshirn, Bart Van Assche, Naohiro Aota
Subject: [PATCH 03/19] btrfs: Check and enable HMZONED mode
Date: Fri, 7 Jun 2019 22:10:09 +0900
Message-Id: <20190607131025.31996-4-naohiro.aota@wdc.com>
In-Reply-To: <20190607131025.31996-1-naohiro.aota@wdc.com>
References: <20190607131025.31996-1-naohiro.aota@wdc.com>

HMZONED mode cannot be used together with the RAID5/6 profile for now.
Introduce the function btrfs_check_hmzoned_mode() to check this. This
function will also check if the HMZONED flag is enabled on the file system
and if the file system consists of zoned devices with equal zone size.

Additionally, as updates to the space cache are done in place, the space
cache cannot be located over sequential zones, and there is no guarantee
that the device will have enough conventional zones to store this cache.
Resolve this problem by completely disabling the space cache. This does
not introduce any problems with sequential block groups: all the free
space is located after the allocation pointer and there is no free space
before the pointer, so there is no need for such a cache.
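A condensed sketch of the decision implemented by
btrfs_check_hmzoned_mode() below (summary, not literal code from the
patch):

	/*
	 * btrfs_check_hmzoned_mode(), in short:
	 * - HMZONED flag set but no zoned device         -> -EINVAL
	 * - zoned devices with differing zone sizes      -> -EINVAL
	 * - zoned devices mixed with regular devices     -> -EINVAL
	 * - RAID56 incompat flag set                     -> -EINVAL
	 * - otherwise: record fs_info->zone_size, clear the SPACE_CACHE
	 *   mount option, and force NOTREELOG on.
	 */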
Signed-off-by: Damien Le Moal
Signed-off-by: Naohiro Aota
---
 fs/btrfs/ctree.h       |  3 ++
 fs/btrfs/dev-replace.c |  7 +++
 fs/btrfs/disk-io.c     |  7 +++
 fs/btrfs/super.c       | 12 ++---
 fs/btrfs/volumes.c     | 99 ++++++++++++++++++++++++++++++++++++++++++
 fs/btrfs/volumes.h     |  1 +
 6 files changed, 124 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index b81c331b28fa..6c00101407e4 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -806,6 +806,9 @@ struct btrfs_fs_info {
 	struct btrfs_root *uuid_root;
 	struct btrfs_root *free_space_root;
 
+	/* Zone size when in HMZONED mode */
+	u64 zone_size;
+
 	/* the log root tree is a directory of all the other log roots */
 	struct btrfs_root *log_root_tree;
 
diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
index ee0989c7e3a9..fbe5ea2a04ed 100644
--- a/fs/btrfs/dev-replace.c
+++ b/fs/btrfs/dev-replace.c
@@ -201,6 +201,13 @@ static int btrfs_init_dev_replace_tgtdev(struct btrfs_fs_info *fs_info,
 		return PTR_ERR(bdev);
 	}
 
+	if ((bdev_zoned_model(bdev) == BLK_ZONED_HM &&
+	     !btrfs_fs_incompat(fs_info, HMZONED)) ||
+	    (!bdev_is_zoned(bdev) && btrfs_fs_incompat(fs_info, HMZONED))) {
+		ret = -EINVAL;
+		goto error;
+	}
+
 	filemap_write_and_wait(bdev->bd_inode->i_mapping);
 
 	devices = &fs_info->fs_devices->devices;
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 663efce22d98..7c1404c76768 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3086,6 +3086,13 @@ int open_ctree(struct super_block *sb,
 
 	btrfs_free_extra_devids(fs_devices, 1);
 
+	ret = btrfs_check_hmzoned_mode(fs_info);
+	if (ret) {
+		btrfs_err(fs_info, "failed to init hmzoned mode: %d",
+			  ret);
+		goto fail_block_groups;
+	}
+
 	ret = btrfs_sysfs_add_fsid(fs_devices, NULL);
 	if (ret) {
 		btrfs_err(fs_info, "failed to init sysfs fsid interface: %d",
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 2c66d9ea6a3b..740a701f16c5 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -435,11 +435,13 @@ int btrfs_parse_options(struct btrfs_fs_info *info, char *options,
 	bool saved_compress_force;
 	int no_compress = 0;
 
-	cache_gen = btrfs_super_cache_generation(info->super_copy);
-	if (btrfs_fs_compat_ro(info, FREE_SPACE_TREE))
-		btrfs_set_opt(info->mount_opt, FREE_SPACE_TREE);
-	else if (cache_gen)
-		btrfs_set_opt(info->mount_opt, SPACE_CACHE);
+	if (!btrfs_fs_incompat(info, HMZONED)) {
+		cache_gen = btrfs_super_cache_generation(info->super_copy);
+		if (btrfs_fs_compat_ro(info, FREE_SPACE_TREE))
+			btrfs_set_opt(info->mount_opt, FREE_SPACE_TREE);
+		else if (cache_gen)
+			btrfs_set_opt(info->mount_opt, SPACE_CACHE);
+	}
 
 	/*
 	 * Even the options are empty, we still need to do extra check
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index b673178718e3..b6f367d19dc9 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1524,6 +1524,83 @@ int btrfs_open_devices(struct btrfs_fs_devices *fs_devices,
 	return ret;
 }
 
+int btrfs_check_hmzoned_mode(struct btrfs_fs_info *fs_info)
+{
+	struct btrfs_fs_devices *fs_devices = fs_info->fs_devices;
+	struct btrfs_device *device;
+	u64 hmzoned_devices = 0;
+	u64 nr_devices = 0;
+	u64 zone_size = 0;
+	int incompat_hmzoned = btrfs_fs_incompat(fs_info, HMZONED);
+	int ret = 0;
+
+	/* Count zoned devices */
+	list_for_each_entry(device, &fs_devices->devices, dev_list) {
+		if (!device->bdev)
+			continue;
+		if (bdev_zoned_model(device->bdev) == BLK_ZONED_HM ||
+		    (bdev_zoned_model(device->bdev) == BLK_ZONED_HA &&
+		     incompat_hmzoned)) {
+			hmzoned_devices++;
+			if (!zone_size) {
+				zone_size = device->zone_size;
+			} else if (device->zone_size != zone_size) {
+				btrfs_err(fs_info,
+					  "Zoned block devices must have equal zone sizes");
+				ret = -EINVAL;
+				goto out;
+			}
+		}
+		nr_devices++;
+	}
+
+	if (!hmzoned_devices && incompat_hmzoned) {
+		/* No zoned block device, disable HMZONED */
+		btrfs_err(fs_info, "HMZONED enabled file system should have zoned devices");
+		ret = -EINVAL;
+		goto out;
+	}
+
+	if (!hmzoned_devices && !incompat_hmzoned)
+		goto out;
+
+	fs_info->zone_size = zone_size;
+
+	if (hmzoned_devices != nr_devices) {
+		btrfs_err(fs_info,
+			  "zoned devices mixed with regular devices");
+		ret = -EINVAL;
+		goto out;
+	}
+
+	/* RAID56 is not allowed */
+	if (btrfs_fs_incompat(fs_info, RAID56)) {
+		btrfs_err(fs_info, "HMZONED mode does not support RAID56");
+		ret = -EINVAL;
+		goto out;
+	}
+
+	/*
+	 * SPACE CACHE writing is not cowed. Disable that to avoid
+	 * write errors in sequential zones.
+	 */
+	if (btrfs_test_opt(fs_info, SPACE_CACHE)) {
+		btrfs_info(fs_info,
+			   "disabling disk space caching with HMZONED mode");
+		btrfs_clear_opt(fs_info->mount_opt, SPACE_CACHE);
+	}
+
+	btrfs_set_and_info(fs_info, NOTREELOG,
+			   "disabling tree log with HMZONED mode");
+
+	btrfs_info(fs_info, "HMZONED mode enabled, zone size %llu B",
+		   fs_info->zone_size);
+
+out:
+
+	return ret;
+}
+
 static void btrfs_release_disk_super(struct page *page)
 {
 	kunmap(page);
@@ -2695,6 +2772,13 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
 	if (IS_ERR(bdev))
 		return PTR_ERR(bdev);
 
+	if ((bdev_zoned_model(bdev) == BLK_ZONED_HM &&
+	     !btrfs_fs_incompat(fs_info, HMZONED)) ||
+	    (!bdev_is_zoned(bdev) && btrfs_fs_incompat(fs_info, HMZONED))) {
+		ret = -EINVAL;
+		goto error;
+	}
+
 	if (fs_devices->seeding) {
 		seeding_dev = 1;
 		down_write(&sb->s_umount);
@@ -2816,6 +2900,21 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
 		}
 	}
 
+	/* Get zone type information of zoned block devices */
+	if (bdev_is_zoned(bdev)) {
+		ret = btrfs_get_dev_zonetypes(device);
+		if (ret) {
+			btrfs_abort_transaction(trans, ret);
+			goto error_sysfs;
+		}
+	}
+
+	ret = btrfs_check_hmzoned_mode(fs_info);
+	if (ret) {
+		btrfs_abort_transaction(trans, ret);
+		goto error_sysfs;
+	}
+
 	ret = btrfs_add_dev_item(trans, device);
 	if (ret) {
 		btrfs_abort_transaction(trans, ret);
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 1599641e216c..f66755e43669 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -432,6 +432,7 @@ struct btrfs_device *btrfs_scan_one_device(const char *path,
 					   fmode_t flags, void *holder);
 int btrfs_forget_devices(const char *path);
 int btrfs_close_devices(struct btrfs_fs_devices *fs_devices);
+int btrfs_check_hmzoned_mode(struct btrfs_fs_info *fs_info);
 void btrfs_free_extra_devids(struct btrfs_fs_devices *fs_devices, int step);
 void btrfs_assign_next_active_device(struct btrfs_device *device,
 				     struct btrfs_device *this_dev);

From patchwork Fri Jun 7 13:10:10 2019
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 10981767
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, David Sterba
Cc: Chris Mason, Josef Bacik, Qu Wenruo, Nikolay Borisov,
    linux-kernel@vger.kernel.org, Hannes Reinecke,
    linux-fsdevel@vger.kernel.org, Damien Le Moal, Matias Bjørling,
    Johannes Thumshirn, Bart Van Assche, Naohiro Aota
Subject: [PATCH 04/19] btrfs: disable fallocate in HMZONED mode
Date: Fri, 7 Jun 2019 22:10:10 +0900
Message-Id: <20190607131025.31996-5-naohiro.aota@wdc.com>
In-Reply-To: <20190607131025.31996-1-naohiro.aota@wdc.com>
References: <20190607131025.31996-1-naohiro.aota@wdc.com>

fallocate() is implemented by reserving actual extents instead of just
making reservations. This can result in exposing the sequential write
constraint of host-managed zoned block devices to the applications, which
would break the POSIX semantics for the fallocated file. To avoid this,
report fallocate() as not supported when in HMZONED mode for now.

In the future, we may be able to implement "in-memory" fallocate() in
HMZONED mode by utilizing space_info->bytes_may_use or similar.
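The user-visible effect is that fallocate(2) fails with EOPNOTSUPP on such
a file system; a small hypothetical userspace demonstration (the mount
point and allocation size here are arbitrary):

	/* Hypothetical demo: fallocate() on a file in an HMZONED btrfs
	 * is expected to fail with EOPNOTSUPP after this patch.
	 */
	#define _GNU_SOURCE
	#include <errno.h>
	#include <fcntl.h>
	#include <stdio.h>
	#include <unistd.h>

	int main(void)
	{
		int fd = open("/mnt/hmzoned/file", O_RDWR | O_CREAT, 0644);

		if (fd < 0)
			return 1;
		if (fallocate(fd, 0, 0, 1 << 20) < 0 && errno == EOPNOTSUPP)
			printf("fallocate is not supported in HMZONED mode\n");
		close(fd);
		return 0;
	}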
Signed-off-by: Naohiro Aota
---
 fs/btrfs/file.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 89f5be2bfb43..e664b5363697 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -3027,6 +3027,10 @@ static long btrfs_fallocate(struct file *file, int mode,
 	alloc_end = round_up(offset + len, blocksize);
 	cur_offset = alloc_start;
 
+	/* Do not allow fallocate in HMZONED mode */
+	if (btrfs_fs_incompat(btrfs_sb(inode->i_sb), HMZONED))
+		return -EOPNOTSUPP;
+
 	/* Make sure we aren't being give some crap mode */
 	if (mode & ~(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE |
 		     FALLOC_FL_ZERO_RANGE))

From patchwork Fri Jun 7 13:10:11 2019
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 10981775
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, David Sterba
Cc: Chris Mason, Josef Bacik, Qu Wenruo, Nikolay Borisov,
    linux-kernel@vger.kernel.org, Hannes Reinecke,
    linux-fsdevel@vger.kernel.org, Damien Le Moal, Matias Bjørling,
    Johannes Thumshirn, Bart Van Assche, Naohiro Aota
Subject: [PATCH 05/19] btrfs: disable direct IO in HMZONED mode
Date: Fri, 7 Jun 2019 22:10:11 +0900
Message-Id: <20190607131025.31996-6-naohiro.aota@wdc.com>
In-Reply-To: <20190607131025.31996-1-naohiro.aota@wdc.com>
References: <20190607131025.31996-1-naohiro.aota@wdc.com>

Direct write I/Os can be directed at existing extents that have already
been written. Such write requests are prohibited on host-managed zoned
block devices. Therefore, disable direct IO support for a volume with
HMZONED mode enabled.
Signed-off-by: Naohiro Aota
---
 fs/btrfs/inode.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 6bebc0ca751d..89542c19d09e 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -8520,6 +8520,9 @@ static ssize_t check_direct_IO(struct btrfs_fs_info *fs_info,
 	unsigned int blocksize_mask = fs_info->sectorsize - 1;
 	ssize_t retval = -EINVAL;
 
+	if (btrfs_fs_incompat(fs_info, HMZONED))
+		goto out;
+
 	if (offset & blocksize_mask)
 		goto out;
 

From patchwork Fri Jun 7 13:10:12 2019
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 10981771
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, David Sterba
Cc: Chris Mason, Josef Bacik, Qu Wenruo, Nikolay Borisov,
    linux-kernel@vger.kernel.org, Hannes Reinecke,
    linux-fsdevel@vger.kernel.org, Damien Le Moal, Matias Bjørling,
    Johannes Thumshirn, Bart Van Assche, Naohiro Aota
Subject: [PATCH 06/19] btrfs: align dev extent allocation to zone boundary
Date: Fri, 7 Jun 2019 22:10:12 +0900
Message-Id: <20190607131025.31996-7-naohiro.aota@wdc.com>
In-Reply-To: <20190607131025.31996-1-naohiro.aota@wdc.com>
References: <20190607131025.31996-1-naohiro.aota@wdc.com>

In HMZONED mode, align the device extents to zone boundaries so that a
zone reset affects only the device extent being reset and does not change
the state of blocks in neighboring device extents. Also, check that a
region allocation always covers only empty zones of the same type.
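For instance, the dev_zone_align() helper added below rounds an allocation
candidate up to the next zone boundary; a sketch of its behavior with an
illustrative zone size:

	/*
	 * Sketch of dev_zone_align(), assuming an illustrative
	 * zone_size of 256 MiB (0x10000000):
	 *
	 *   dev_zone_align(dev, 0x10000000) == 0x10000000  (already aligned)
	 *   dev_zone_align(dev, 0x10000001) == 0x20000000  (rounded up)
	 *
	 * On a non-zoned device (zone_size == 0), the position is
	 * returned unchanged.
	 */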
Signed-off-by: Naohiro Aota
---
 fs/btrfs/extent-tree.c |   6 +++
 fs/btrfs/volumes.c     | 100 +++++++++++++++++++++++++++++++++++++++--
 2 files changed, 103 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 1aee51a9f3bf..363db58f56b8 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -9884,6 +9884,12 @@ int btrfs_can_relocate(struct btrfs_fs_info *fs_info, u64 bytenr)
 		min_free = div64_u64(min_free, dev_min);
 	}
 
+	/* We cannot allocate size less than zone_size anyway */
+	if (index == BTRFS_RAID_DUP)
+		min_free = max_t(u64, min_free, 2 * fs_info->zone_size);
+	else
+		min_free = max_t(u64, min_free, fs_info->zone_size);
+
 	mutex_lock(&fs_info->chunk_mutex);
 	list_for_each_entry(device, &fs_devices->alloc_list, dev_alloc_list) {
 		u64 dev_offset;
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index b6f367d19dc9..c1ed3b6e3cfd 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1737,6 +1737,46 @@ static bool contains_pending_extent(struct btrfs_device *device, u64 *start,
 	return false;
 }
 
+static u64 dev_zone_align(struct btrfs_device *device, u64 pos)
+{
+	if (device->zone_size)
+		return ALIGN(pos, device->zone_size);
+	return pos;
+}
+
+/*
+ * is_allocatable_region - check if specified region is suitable for allocation
+ * @device: the device to allocate a region
+ * @pos: the position of the region
+ * @num_bytes: the size of the region
+ *
+ * In non-ZONED device, anywhere is suitable for allocation. In ZONED
+ * device, check if the region is not on non-empty zones. Also, check if
+ * all zones in the region have the same zone type.
+ */
+static bool is_allocatable_region(struct btrfs_device *device, u64 pos,
+				  u64 num_bytes)
+{
+	int is_sequential;
+
+	if (device->zone_size == 0)
+		return true;
+
+	WARN_ON(!IS_ALIGNED(pos, device->zone_size));
+	WARN_ON(!IS_ALIGNED(num_bytes, device->zone_size));
+
+	is_sequential = btrfs_dev_is_sequential(device, pos);
+
+	while (num_bytes > 0) {
+		if (!btrfs_dev_is_empty_zone(device, pos) ||
+		    (is_sequential != btrfs_dev_is_sequential(device, pos)))
+			return false;
+		pos += device->zone_size;
+		num_bytes -= device->zone_size;
+	}
+
+	return true;
+}
+
 /*
  * find_free_dev_extent_start - find free space in the specified device
@@ -1779,9 +1819,14 @@ int find_free_dev_extent_start(struct btrfs_device *device, u64 num_bytes,
 	/*
 	 * We don't want to overwrite the superblock on the drive nor any area
 	 * used by the boot loader (grub for example), so we make sure to start
-	 * at an offset of at least 1MB.
+	 * at an offset of at least 1MB on a regular disk. For a zoned block
+	 * device, skip the first zone of the device entirely.
 	 */
-	search_start = max_t(u64, search_start, SZ_1M);
+	if (device->zone_size)
+		search_start = max_t(u64, dev_zone_align(device, search_start),
+				     device->zone_size);
+	else
+		search_start = max_t(u64, search_start, SZ_1M);
 
 	path = btrfs_alloc_path();
 	if (!path)
@@ -1846,12 +1891,22 @@ int find_free_dev_extent_start(struct btrfs_device *device, u64 num_bytes,
 			 */
 			if (contains_pending_extent(device, &search_start,
 						    hole_size)) {
+				search_start = dev_zone_align(device,
+							      search_start);
 				if (key.offset >= search_start)
 					hole_size = key.offset - search_start;
 				else
 					hole_size = 0;
 			}
 
+			if (!is_allocatable_region(device, search_start,
+						   num_bytes)) {
+				search_start = dev_zone_align(device,
+							      search_start+1);
+				btrfs_release_path(path);
+				goto again;
+			}
+
 			if (hole_size > max_hole_size) {
 				max_hole_start = search_start;
 				max_hole_size = hole_size;
@@ -1876,7 +1931,7 @@ int find_free_dev_extent_start(struct btrfs_device *device, u64 num_bytes,
 		extent_end = key.offset + btrfs_dev_extent_length(l,
 								  dev_extent);
 		if (extent_end > search_start)
-			search_start = extent_end;
+			search_start = dev_zone_align(device, extent_end);
 next:
 		path->slots[0]++;
 		cond_resched();
@@ -1891,6 +1946,14 @@ int find_free_dev_extent_start(struct btrfs_device *device, u64 num_bytes,
 		hole_size = search_end - search_start;
 
 		if (contains_pending_extent(device, &search_start, hole_size)) {
+			search_start = dev_zone_align(device,
+						      search_start);
+			btrfs_release_path(path);
+			goto again;
+		}
+
+		if (!is_allocatable_region(device, search_start, num_bytes)) {
+			search_start = dev_zone_align(device, search_start+1);
 			btrfs_release_path(path);
 			goto again;
 		}
@@ -5177,6 +5240,7 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
 	int i;
 	int j;
 	int index;
+	int hmzoned = btrfs_fs_incompat(info, HMZONED);
 
 	BUG_ON(!alloc_profile_is_valid(type, 0));
 
@@ -5221,10 +5285,20 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
 		BUG();
 	}
 
+	if (hmzoned) {
+		max_stripe_size = info->zone_size;
+		max_chunk_size = round_down(max_chunk_size, info->zone_size);
+	}
+
 	/* We don't want a chunk larger than 10% of writable space */
 	max_chunk_size = min(div_factor(fs_devices->total_rw_bytes, 1),
 			     max_chunk_size);
 
+	if (hmzoned)
+		max_chunk_size = max(round_down(max_chunk_size,
+						info->zone_size),
+				     info->zone_size);
+
 	devices_info = kcalloc(fs_devices->rw_devices, sizeof(*devices_info),
 			       GFP_NOFS);
 	if (!devices_info)
@@ -5259,6 +5333,9 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
 		if (total_avail == 0)
 			continue;
 
+		if (hmzoned && total_avail < max_stripe_size * dev_stripes)
+			continue;
+
 		ret = find_free_dev_extent(device,
 					   max_stripe_size * dev_stripes,
 					   &dev_offset, &max_avail);
@@ -5277,6 +5354,9 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
 			continue;
 		}
 
+		if (hmzoned && max_avail < max_stripe_size * dev_stripes)
+			continue;
+
 		if (ndevs == fs_devices->rw_devices) {
 			WARN(1, "%s: found more than %llu devices\n",
 			     __func__, fs_devices->rw_devices);
@@ -5310,6 +5390,7 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
 
 	ndevs = min(ndevs, devs_max);
 
+again:
 	/*
 	 * The primary goal is to maximize the number of stripes, so use as
 	 * many devices as possible, even if the stripes are not maximum sized.
@@ -5333,6 +5414,17 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
 	 * we try to reduce stripe_size.
 	 */
 	if (stripe_size * data_stripes > max_chunk_size) {
+		if (hmzoned) {
+			/*
+			 * stripe_size is fixed in HMZONED. Reduce ndevs
+			 * instead.
+			 */
+			WARN_ON(nparity != 0);
+			ndevs = div_u64(max_chunk_size * ncopies,
+					stripe_size * dev_stripes);
+			goto again;
+		}
+
 		/*
 		 * Reduce stripe_size, round it up to a 16MB boundary again and
 		 * then use it, unless it ends up being even bigger than the
@@ -5346,6 +5438,8 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
 	/* align to BTRFS_STRIPE_LEN */
 	stripe_size = round_down(stripe_size, BTRFS_STRIPE_LEN);
 
+	WARN_ON(hmzoned && stripe_size != info->zone_size);
+
 	map = kmalloc(map_lookup_size(num_stripes), GFP_NOFS);
 	if (!map) {
 		ret = -ENOMEM;

From patchwork Fri Jun 7 13:10:13 2019
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 10981763
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, David Sterba
Cc: Chris Mason, Josef Bacik, Qu Wenruo, Nikolay Borisov,
    linux-kernel@vger.kernel.org, Hannes Reinecke,
    linux-fsdevel@vger.kernel.org, Damien Le Moal, Matias Bjørling,
    Johannes Thumshirn, Bart Van Assche, Naohiro Aota
Subject: [PATCH 07/19] btrfs: do sequential extent allocation in HMZONED mode
Date: Fri, 7 Jun 2019 22:10:13 +0900
Message-Id: <20190607131025.31996-8-naohiro.aota@wdc.com>
In-Reply-To: <20190607131025.31996-1-naohiro.aota@wdc.com>
References: <20190607131025.31996-1-naohiro.aota@wdc.com>

On HMZONED drives, writes must always be sequential and directed at a
block group zone write pointer position. Thus, block allocation in a block
group must also be done sequentially, using an allocation pointer equal to
the block group zone write pointer plus the number of blocks allocated but
not yet written.

The sequential allocation function find_free_extent_seq() bypasses the
checks in find_free_extent() and increases the reserved byte counter by
itself. It is impossible to revert an already allocated region in the
sequential allocation scheme, since doing so might race with other
allocations and leave an allocation hole, which would break the sequential
write rule.

Furthermore, this commit introduces two new variables to struct
btrfs_block_group_cache. "wp_broken" indicates that the write pointer is
broken (e.g. not synced on a RAID1 block group) and marks that block group
read only. "unusable" keeps track of the size of regions that were once
allocated and then freed. Such a region is never usable until the
underlying zones are reset.
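The core of the sequential allocator reduces to pointer arithmetic on the
allocation offset; a condensed sketch of what find_free_extent_seq() below
does:

	/*
	 * Sketch of the sequential allocation arithmetic: extents are
	 * carved strictly at the allocation offset, which only moves
	 * forward; freed space behind it is never reused until the
	 * zone is reset.
	 *
	 *   avail = block_group_size - alloc_offset;
	 *   if (num_bytes > avail)
	 *           fail;                    // no hole reuse, no rewind
	 *   extent_start = block_group_start + alloc_offset;
	 *   alloc_offset += num_bytes;
	 */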
Signed-off-by: Naohiro Aota
---
 fs/btrfs/ctree.h            |  24 +++
 fs/btrfs/extent-tree.c      | 378 ++++++++++++++++++++++++++++++++++--
 fs/btrfs/free-space-cache.c |  33 ++++
 fs/btrfs/free-space-cache.h |   5 +
 4 files changed, 426 insertions(+), 14 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 6c00101407e4..f4bcd2a6ec12 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -582,6 +582,20 @@ struct btrfs_full_stripe_locks_tree {
 	struct mutex lock;
 };
 
+/* Block group allocation types */
+enum btrfs_alloc_type {
+
+	/* Regular first fit allocation */
+	BTRFS_ALLOC_FIT		= 0,
+
+	/*
+	 * Sequential allocation: this is for HMZONED mode and
+	 * will result in ignoring free space before a block
+	 * group allocation offset.
+	 */
+	BTRFS_ALLOC_SEQ		= 1,
+};
+
 struct btrfs_block_group_cache {
 	struct btrfs_key key;
 	struct btrfs_block_group_item item;
@@ -592,6 +606,7 @@ struct btrfs_block_group_cache {
 	u64 reserved;
 	u64 delalloc_bytes;
 	u64 bytes_super;
+	u64 unusable;
 	u64 flags;
 	u64 cache_generation;
 
@@ -621,6 +636,7 @@ struct btrfs_block_group_cache {
 	unsigned int iref:1;
 	unsigned int has_caching_ctl:1;
 	unsigned int removed:1;
+	unsigned int wp_broken:1;
 
 	int disk_cache_state;
 
@@ -694,6 +710,14 @@ struct btrfs_block_group_cache {
 
 	/* Record locked full stripes for RAID5/6 block group */
 	struct btrfs_full_stripe_locks_tree full_stripe_locks_root;
+
+	/*
+	 * Allocation offset for the block group to implement sequential
+	 * allocation. This is used only with HMZONED mode enabled and if
+	 * the block group resides on a sequential zone.
+	 */
+	enum btrfs_alloc_type alloc_type;
+	u64 alloc_offset;
 };
 
 /* delayed seq elem */
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 363db58f56b8..ebd0d6eae038 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -28,6 +28,7 @@
 #include "sysfs.h"
 #include "qgroup.h"
 #include "ref-verify.h"
+#include "rcu-string.h"
 
 #undef SCRAMBLE_DELAYED_REFS
 
@@ -590,6 +591,8 @@ static int cache_block_group(struct btrfs_block_group_cache *cache,
 	struct btrfs_caching_control *caching_ctl;
 	int ret = 0;
 
+	WARN_ON(cache->alloc_type == BTRFS_ALLOC_SEQ);
+
 	caching_ctl = kzalloc(sizeof(*caching_ctl), GFP_NOFS);
 	if (!caching_ctl)
 		return -ENOMEM;
@@ -6555,6 +6558,19 @@ void btrfs_wait_block_group_reservations(struct btrfs_block_group_cache *bg)
 	wait_var_event(&bg->reservations, !atomic_read(&bg->reservations));
 }
 
+static void __btrfs_add_reserved_bytes(struct btrfs_block_group_cache *cache,
+				       u64 ram_bytes, u64 num_bytes,
+				       int delalloc)
+{
+	struct btrfs_space_info *space_info = cache->space_info;
+
+	cache->reserved += num_bytes;
+	space_info->bytes_reserved += num_bytes;
+	update_bytes_may_use(space_info, -ram_bytes);
+	if (delalloc)
+		cache->delalloc_bytes += num_bytes;
+}
+
 /**
  * btrfs_add_reserved_bytes - update the block_group and space info counters
  * @cache:	The cache we are manipulating
@@ -6573,17 +6589,16 @@ static int btrfs_add_reserved_bytes(struct btrfs_block_group_cache *cache,
 	struct btrfs_space_info *space_info = cache->space_info;
 	int ret = 0;
 
+	/* should be handled by find_free_extent_seq */
+	WARN_ON(cache->alloc_type == BTRFS_ALLOC_SEQ);
+
 	spin_lock(&space_info->lock);
 	spin_lock(&cache->lock);
-	if (cache->ro) {
+	if (cache->ro)
 		ret = -EAGAIN;
-	} else {
-		cache->reserved += num_bytes;
-		space_info->bytes_reserved += num_bytes;
-		update_bytes_may_use(space_info, -ram_bytes);
-		if (delalloc)
-			cache->delalloc_bytes += num_bytes;
-	}
+	else
+		__btrfs_add_reserved_bytes(cache, ram_bytes, num_bytes,
+					   delalloc);
 	spin_unlock(&cache->lock);
 	spin_unlock(&space_info->lock);
 	return ret;
@@ -6701,9 +6716,13 @@ static int unpin_extent_range(struct btrfs_fs_info *fs_info,
 			cache = btrfs_lookup_block_group(fs_info, start);
 			BUG_ON(!cache); /* Logic error */
 
-			cluster = fetch_cluster_info(fs_info,
-						     cache->space_info,
-						     &empty_cluster);
+			if (cache->alloc_type == BTRFS_ALLOC_FIT)
+				cluster = fetch_cluster_info(fs_info,
+							     cache->space_info,
+							     &empty_cluster);
+			else
+				cluster = NULL;
+
 			empty_cluster <<= 1;
 		}
 
@@ -6743,7 +6762,8 @@ static int unpin_extent_range(struct btrfs_fs_info *fs_info,
 		space_info->max_extent_size = 0;
 		percpu_counter_add_batch(&space_info->total_bytes_pinned,
 					 -len, BTRFS_TOTAL_BYTES_PINNED_BATCH);
-		if (cache->ro) {
+		if (cache->ro || cache->alloc_type == BTRFS_ALLOC_SEQ) {
+			/* need reset before reusing in ALLOC_SEQ BG */
 			space_info->bytes_readonly += len;
 			readonly = true;
 		}
@@ -7588,6 +7608,60 @@ static int find_free_extent_unclustered(struct btrfs_block_group_cache *bg,
 	return 0;
 }
 
+/*
+ * Simple allocator for sequential only block group. It only allows
+ * sequential allocation. No need to play with trees. This function
+ * also reserves the bytes as in btrfs_add_reserved_bytes.
+ */
+static int find_free_extent_seq(struct btrfs_block_group_cache *cache,
+				struct find_free_extent_ctl *ffe_ctl)
+{
+	struct btrfs_space_info *space_info = cache->space_info;
+	struct btrfs_free_space_ctl *ctl = cache->free_space_ctl;
+	u64 start = cache->key.objectid;
+	u64 num_bytes = ffe_ctl->num_bytes;
+	u64 avail;
+	int ret = 0;
+
+	/* Sanity check */
+	if (cache->alloc_type != BTRFS_ALLOC_SEQ)
+		return 1;
+
+	spin_lock(&space_info->lock);
+	spin_lock(&cache->lock);
+
+	if (cache->ro) {
+		ret = -EAGAIN;
+		goto out;
+	}
+
+	spin_lock(&ctl->tree_lock);
+	avail = cache->key.offset - cache->alloc_offset;
+	if (avail < num_bytes) {
+		ffe_ctl->max_extent_size = avail;
+		spin_unlock(&ctl->tree_lock);
+		ret = 1;
+		goto out;
+	}
+
+	ffe_ctl->found_offset = start + cache->alloc_offset;
+	cache->alloc_offset += num_bytes;
+	ctl->free_space -= num_bytes;
+	spin_unlock(&ctl->tree_lock);
+
+	BUG_ON(!IS_ALIGNED(ffe_ctl->found_offset,
+			   cache->fs_info->stripesize));
+	ffe_ctl->search_start = ffe_ctl->found_offset;
+	__btrfs_add_reserved_bytes(cache, ffe_ctl->ram_bytes, num_bytes,
+				   ffe_ctl->delalloc);
+
+out:
+	spin_unlock(&cache->lock);
+	spin_unlock(&space_info->lock);
+	return ret;
+}
+
 /*
  * Return >0 means caller needs to re-search for free extent
  * Return 0 means we have the needed free extent.
@@ -7889,6 +7963,16 @@ static noinline int find_free_extent(struct btrfs_fs_info *fs_info,
 		if (unlikely(block_group->cached == BTRFS_CACHE_ERROR))
 			goto loop;
 
+		if (block_group->alloc_type == BTRFS_ALLOC_SEQ) {
+			ret = find_free_extent_seq(block_group, &ffe_ctl);
+			if (ret)
+				goto loop;
+			/* btrfs_find_space_for_alloc_seq should ensure
+			 * that everything is OK and reserve the extent.
+			 */
+			goto nocheck;
+		}
+
 		/*
 		 * Ok we want to try and use the cluster allocator, so
 		 * lets look there
@@ -7944,6 +8028,7 @@ static noinline int find_free_extent(struct btrfs_fs_info *fs_info,
 					     num_bytes);
 			goto loop;
 		}
+nocheck:
 		btrfs_inc_block_group_reservations(block_group);
 
 		/* we are all good, lets return */
@@ -9616,7 +9701,8 @@ static int inc_block_group_ro(struct btrfs_block_group_cache *cache, int force)
 	}
 
 	num_bytes = cache->key.offset - cache->reserved - cache->pinned -
-		    cache->bytes_super - btrfs_block_group_used(&cache->item);
+		    cache->bytes_super - cache->unusable -
+		    btrfs_block_group_used(&cache->item);
 	sinfo_used = btrfs_space_info_used(sinfo, true);
 
 	if (sinfo_used + num_bytes + min_allocable_bytes <=
@@ -9766,6 +9852,7 @@ void btrfs_dec_block_group_ro(struct btrfs_block_group_cache *cache)
 	if (!--cache->ro) {
 		num_bytes = cache->key.offset - cache->reserved -
 			    cache->pinned - cache->bytes_super -
+			    cache->unusable -
 			    btrfs_block_group_used(&cache->item);
 		sinfo->bytes_readonly -= num_bytes;
 		list_del_init(&cache->ro_list);
@@ -10200,11 +10287,240 @@ static void link_block_group(struct btrfs_block_group_cache *cache)
 	}
 }
 
+static int
+btrfs_get_block_group_alloc_offset(struct btrfs_block_group_cache *cache)
+{
+	struct btrfs_fs_info *fs_info = cache->fs_info;
+	struct extent_map_tree *em_tree = &fs_info->mapping_tree.map_tree;
+	struct extent_map *em;
+	struct map_lookup *map;
+	struct btrfs_device *device;
+	u64 logical = cache->key.objectid;
+	u64 length = cache->key.offset;
+	u64 physical = 0;
+	int ret, alloc_type;
+	int i, j;
+	u64 *alloc_offsets = NULL;
+
+#define WP_MISSING_DEV ((u64)-1)
+
+	/* Sanity check */
+	if (!IS_ALIGNED(length, fs_info->zone_size)) {
+		btrfs_err(fs_info, "unaligned block group at %llu + %llu",
+			  logical, length);
+		return -EIO;
+	}
+
+	/* Get the chunk mapping */
+	em_tree = &fs_info->mapping_tree.map_tree;
+	read_lock(&em_tree->lock);
+	em = lookup_extent_mapping(em_tree, logical, length);
+	read_unlock(&em_tree->lock);
+
+	if (!em)
+		return -EINVAL;
+
+	map = em->map_lookup;
+
+	/*
+	 * Get the zone type: if the group is mapped to a non-sequential zone,
+	 * there is no need for the allocation offset (fit allocation is OK).
+	 */
+	alloc_type = -1;
+	alloc_offsets = kcalloc(map->num_stripes, sizeof(*alloc_offsets),
+				GFP_NOFS);
+	if (!alloc_offsets) {
+		free_extent_map(em);
+		return -ENOMEM;
+	}
+
+	for (i = 0; i < map->num_stripes; i++) {
+		int is_sequential;
+		struct blk_zone zone;
+
+		device = map->stripes[i].dev;
+		physical = map->stripes[i].physical;
+
+		if (device->bdev == NULL) {
+			alloc_offsets[i] = WP_MISSING_DEV;
+			continue;
+		}
+
+		is_sequential = btrfs_dev_is_sequential(device, physical);
+		if (alloc_type == -1)
+			alloc_type = is_sequential ?
+					BTRFS_ALLOC_SEQ : BTRFS_ALLOC_FIT;
+
+		if ((is_sequential && alloc_type != BTRFS_ALLOC_SEQ) ||
+		    (!is_sequential && alloc_type == BTRFS_ALLOC_SEQ)) {
+			btrfs_err(fs_info, "found block group of mixed zone types");
+			ret = -EIO;
+			goto out;
+		}
+
+		if (!is_sequential)
+			continue;
+
+		/* this zone will be used for allocation, so mark this
+		 * zone non-empty
+		 */
+		clear_bit(physical >> device->zone_size_shift,
+			  device->empty_zones);
+
+		/*
+		 * The group is mapped to a sequential zone. Get the zone write
+		 * pointer to determine the allocation offset within the zone.
+ */ + WARN_ON(!IS_ALIGNED(physical, fs_info->zone_size)); + ret = btrfs_get_dev_zone(device, physical, &zone, GFP_NOFS); + if (ret == -EIO || ret == -EOPNOTSUPP) { + ret = 0; + alloc_offsets[i] = WP_MISSING_DEV; + continue; + } else if (ret) { + goto out; + } + + + switch (zone.cond) { + case BLK_ZONE_COND_OFFLINE: + case BLK_ZONE_COND_READONLY: + btrfs_err(fs_info, "Offline/readonly zone %llu", + physical >> device->zone_size_shift); + alloc_offsets[i] = WP_MISSING_DEV; + break; + case BLK_ZONE_COND_EMPTY: + alloc_offsets[i] = 0; + break; + case BLK_ZONE_COND_FULL: + alloc_offsets[i] = fs_info->zone_size; + break; + default: + /* Partially used zone */ + alloc_offsets[i] = + ((zone.wp - zone.start) << SECTOR_SHIFT); + break; + } + } + + if (alloc_type == BTRFS_ALLOC_FIT) + goto out; + + switch (map->type & BTRFS_BLOCK_GROUP_PROFILE_MASK) { + case 0: /* single */ + case BTRFS_BLOCK_GROUP_DUP: + case BTRFS_BLOCK_GROUP_RAID1: + cache->alloc_offset = WP_MISSING_DEV; + for (i = 0; i < map->num_stripes; i++) { + if (alloc_offsets[i] == WP_MISSING_DEV) + continue; + if (cache->alloc_offset == WP_MISSING_DEV) + cache->alloc_offset = alloc_offsets[i]; + if (alloc_offsets[i] == cache->alloc_offset) + continue; + + btrfs_err(fs_info, + "write pointer mismatch: block group %llu", + logical); + cache->wp_broken = 1; + } + break; + case BTRFS_BLOCK_GROUP_RAID0: + cache->alloc_offset = 0; + for (i = 0; i < map->num_stripes; i++) { + if (alloc_offsets[i] == WP_MISSING_DEV) { + btrfs_err(fs_info, + "cannot recover write pointer: block group %llu", + logical); + cache->wp_broken = 1; + continue; + } + + if (alloc_offsets[0] < alloc_offsets[i]) { + btrfs_err(fs_info, + "write pointer mismatch: block group %llu", + logical); + cache->wp_broken = 1; + continue; + } + + cache->alloc_offset += alloc_offsets[i]; + } + break; + case BTRFS_BLOCK_GROUP_RAID10: + /* + * Pass1: check write pointer of RAID1 level: each pointer + * should be equal. 
+ */ + for (i = 0; i < map->num_stripes / map->sub_stripes; i++) { + int base = i*map->sub_stripes; + u64 offset = WP_MISSING_DEV; + + for (j = 0; j < map->sub_stripes; j++) { + if (alloc_offsets[base+j] == WP_MISSING_DEV) + continue; + if (offset == WP_MISSING_DEV) + offset = alloc_offsets[base+j]; + if (alloc_offsets[base+j] == offset) + continue; + + btrfs_err(fs_info, + "write pointer mismatch: block group %llu", + logical); + cache->wp_broken = 1; + } + for (j = 0; j < map->sub_stripes; j++) + alloc_offsets[base+j] = offset; + } + + /* Pass2: check write pointer of RAID1 level */ + cache->alloc_offset = 0; + for (i = 0; i < map->num_stripes / map->sub_stripes; i++) { + int base = i*map->sub_stripes; + + if (alloc_offsets[base] == WP_MISSING_DEV) { + btrfs_err(fs_info, + "cannot recover write pointer: block group %llu", + logical); + cache->wp_broken = 1; + continue; + } + + if (alloc_offsets[0] < alloc_offsets[base]) { + btrfs_err(fs_info, + "write pointer mismatch: block group %llu", + logical); + cache->wp_broken = 1; + continue; + } + + cache->alloc_offset += alloc_offsets[base]; + } + break; + case BTRFS_BLOCK_GROUP_RAID5: + case BTRFS_BLOCK_GROUP_RAID6: + /* RAID5/6 is not supported yet */ + default: + btrfs_err(fs_info, "Unsupported profile on HMZONED %llu", + map->type & BTRFS_BLOCK_GROUP_PROFILE_MASK); + ret = -EINVAL; + goto out; + } + +out: + cache->alloc_type = alloc_type; + kfree(alloc_offsets); + free_extent_map(em); + + return ret; +} + static struct btrfs_block_group_cache * btrfs_create_block_group_cache(struct btrfs_fs_info *fs_info, u64 start, u64 size) { struct btrfs_block_group_cache *cache; + int ret; cache = kzalloc(sizeof(*cache), GFP_NOFS); if (!cache) @@ -10238,6 +10554,16 @@ btrfs_create_block_group_cache(struct btrfs_fs_info *fs_info, atomic_set(&cache->trimming, 0); mutex_init(&cache->free_space_lock); btrfs_init_full_stripe_locks_tree(&cache->full_stripe_locks_root); + cache->alloc_type = BTRFS_ALLOC_FIT; + cache->alloc_offset = 0; + + if (btrfs_fs_incompat(fs_info, HMZONED)) { + ret = btrfs_get_block_group_alloc_offset(cache); + if (ret) { + kfree(cache); + return NULL; + } + } return cache; } @@ -10310,6 +10636,7 @@ int btrfs_read_block_groups(struct btrfs_fs_info *info) int need_clear = 0; u64 cache_gen; u64 feature; + u64 unusable; int mixed; feature = btrfs_super_incompat_flags(info->super_copy); @@ -10415,6 +10742,26 @@ int btrfs_read_block_groups(struct btrfs_fs_info *info) free_excluded_extents(cache); } + switch (cache->alloc_type) { + case BTRFS_ALLOC_FIT: + unusable = cache->bytes_super; + break; + case BTRFS_ALLOC_SEQ: + WARN_ON(cache->bytes_super != 0); + unusable = cache->alloc_offset - + btrfs_block_group_used(&cache->item); + /* we only need ->free_space in ALLOC_SEQ BGs */ + cache->last_byte_to_unpin = (u64)-1; + cache->cached = BTRFS_CACHE_FINISHED; + cache->free_space_ctl->free_space = + cache->key.offset - cache->alloc_offset; + cache->unusable = unusable; + free_excluded_extents(cache); + break; + default: + BUG(); + } + ret = btrfs_add_block_group_cache(info, cache); if (ret) { btrfs_remove_free_space_cache(cache); @@ -10425,7 +10772,7 @@ int btrfs_read_block_groups(struct btrfs_fs_info *info) trace_btrfs_add_block_group(info, cache, 0); update_space_info(info, cache->flags, found_key.offset, btrfs_block_group_used(&cache->item), - cache->bytes_super, &space_info); + unusable, &space_info); cache->space_info = space_info; @@ -10438,6 +10785,9 @@ int btrfs_read_block_groups(struct btrfs_fs_info *info) 
ASSERT(list_empty(&cache->bg_list)); btrfs_mark_bg_unused(cache); } + + if (cache->wp_broken) + inc_block_group_ro(cache, 1); } list_for_each_entry_rcu(space_info, &info->space_info, list) { diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c index f74dc259307b..cc69dc71f4c1 100644 --- a/fs/btrfs/free-space-cache.c +++ b/fs/btrfs/free-space-cache.c @@ -2326,8 +2326,11 @@ int __btrfs_add_free_space(struct btrfs_fs_info *fs_info, u64 offset, u64 bytes) { struct btrfs_free_space *info; + struct btrfs_block_group_cache *block_group = ctl->private; int ret = 0; + WARN_ON(block_group && block_group->alloc_type == BTRFS_ALLOC_SEQ); + info = kmem_cache_zalloc(btrfs_free_space_cachep, GFP_NOFS); if (!info) return -ENOMEM; @@ -2376,6 +2379,28 @@ int __btrfs_add_free_space(struct btrfs_fs_info *fs_info, return ret; } +int __btrfs_add_free_space_seq(struct btrfs_block_group_cache *block_group, + u64 bytenr, u64 size) +{ + struct btrfs_free_space_ctl *ctl = block_group->free_space_ctl; + u64 offset = bytenr - block_group->key.objectid; + u64 to_free, to_unusable; + + spin_lock(&ctl->tree_lock); + if (offset >= block_group->alloc_offset) + to_free = size; + else if (offset + size <= block_group->alloc_offset) + to_free = 0; + else + to_free = offset + size - block_group->alloc_offset; + to_unusable = size - to_free; + ctl->free_space += to_free; + block_group->unusable += to_unusable; + spin_unlock(&ctl->tree_lock); + return 0; + +} + int btrfs_remove_free_space(struct btrfs_block_group_cache *block_group, u64 offset, u64 bytes) { @@ -2384,6 +2409,8 @@ int btrfs_remove_free_space(struct btrfs_block_group_cache *block_group, int ret; bool re_search = false; + WARN_ON(block_group->alloc_type == BTRFS_ALLOC_SEQ); + spin_lock(&ctl->tree_lock); again: @@ -2619,6 +2646,8 @@ u64 btrfs_find_space_for_alloc(struct btrfs_block_group_cache *block_group, u64 align_gap = 0; u64 align_gap_len = 0; + WARN_ON(block_group->alloc_type == BTRFS_ALLOC_SEQ); + spin_lock(&ctl->tree_lock); entry = find_free_space(ctl, &offset, &bytes_search, block_group->full_stripe_len, max_extent_size); @@ -2738,6 +2767,8 @@ u64 btrfs_alloc_from_cluster(struct btrfs_block_group_cache *block_group, struct rb_node *node; u64 ret = 0; + WARN_ON(block_group->alloc_type == BTRFS_ALLOC_SEQ); + spin_lock(&cluster->lock); if (bytes > cluster->max_size) goto out; @@ -3384,6 +3415,8 @@ int btrfs_trim_block_group(struct btrfs_block_group_cache *block_group, { int ret; + WARN_ON(block_group->alloc_type == BTRFS_ALLOC_SEQ); + *trimmed = 0; spin_lock(&block_group->lock); diff --git a/fs/btrfs/free-space-cache.h b/fs/btrfs/free-space-cache.h index 8760acb55ffd..d30667784f73 100644 --- a/fs/btrfs/free-space-cache.h +++ b/fs/btrfs/free-space-cache.h @@ -73,10 +73,15 @@ void btrfs_init_free_space_ctl(struct btrfs_block_group_cache *block_group); int __btrfs_add_free_space(struct btrfs_fs_info *fs_info, struct btrfs_free_space_ctl *ctl, u64 bytenr, u64 size); +int __btrfs_add_free_space_seq(struct btrfs_block_group_cache *block_group, + u64 bytenr, u64 size); static inline int btrfs_add_free_space(struct btrfs_block_group_cache *block_group, u64 bytenr, u64 size) { + if (block_group->alloc_type == BTRFS_ALLOC_SEQ) + return __btrfs_add_free_space_seq(block_group, bytenr, size); + return __btrfs_add_free_space(block_group->fs_info, block_group->free_space_ctl, bytenr, size); From patchwork Fri Jun 7 13:10:14 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Naohiro Aota 
X-Patchwork-Id: 10981759
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, David Sterba
Cc: Chris Mason, Josef Bacik, Qu Wenruo, Nikolay Borisov, linux-kernel@vger.kernel.org, Hannes Reinecke, linux-fsdevel@vger.kernel.org, Damien Le Moal, Matias Bjørling, Johannes Thumshirn, Bart Van Assche, Naohiro Aota
Subject: [PATCH 08/19] btrfs: make unmirrored BGs readonly only if we have at least one writable BG
Date: Fri, 7 Jun 2019 22:10:14 +0900
Message-Id: <20190607131025.31996-9-naohiro.aota@wdc.com>
In-Reply-To: <20190607131025.31996-1-naohiro.aota@wdc.com>
References: <20190607131025.31996-1-naohiro.aota@wdc.com>

If the btrfs volume has mirrored block groups, btrfs unconditionally marks un-mirrored block groups read-only. When there are mirrored block groups but none of them is writable, this drops every writable block group. So, check that at least one writable mirrored block group exists before setting un-mirrored block groups read-only.

Signed-off-by: Naohiro Aota
---
 fs/btrfs/extent-tree.c | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index ebd0d6eae038..3d41d840fe5c 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -10791,6 +10791,9 @@ int btrfs_read_block_groups(struct btrfs_fs_info *info)
 	}
 
 	list_for_each_entry_rcu(space_info, &info->space_info, list) {
+		bool has_rw = false;
+		int i;
+
 		if (!(get_alloc_profile(info, space_info->flags) &
 		      (BTRFS_BLOCK_GROUP_RAID10 |
 		       BTRFS_BLOCK_GROUP_RAID1 |
@@ -10798,6 +10801,25 @@ int btrfs_read_block_groups(struct btrfs_fs_info *info)
 		       BTRFS_BLOCK_GROUP_RAID6 |
 		       BTRFS_BLOCK_GROUP_DUP)))
 			continue;
+
+		/* check if we have at least one writable mirrored block group */
+		for (i = 0; i < BTRFS_NR_RAID_TYPES; i++) {
+			if (i == BTRFS_RAID_RAID0 || i == BTRFS_RAID_SINGLE)
+				continue;
+			list_for_each_entry(cache, &space_info->block_groups[i],
+					    list) {
+				if (!cache->ro) {
+					has_rw = true;
+					break;
+				}
+			}
+			if (has_rw)
+				break;
+		}
+
+		if (!has_rw)
+			continue;
+
 		/*
 		 * avoid allocating from un-mirrored block group if there are
 		 * mirrored block groups.
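The check added by this patch amounts to the following scan, shown here as a stand-alone sketch (raid_type, group and has_writable_mirrored are illustrative names, not the kernel API):

#include <stdbool.h>
#include <stddef.h>

enum raid_type { RAID_SINGLE, RAID_RAID0, RAID_RAID1, RAID_DUP, NR_RAID_TYPES };

struct group {
	bool ro;	/* block group is read-only */
};

/* Scan every mirrored profile for one block group that is still writable. */
static bool has_writable_mirrored(const struct group *const groups[],
				  const size_t counts[])
{
	for (int t = 0; t < NR_RAID_TYPES; t++) {
		if (t == RAID_RAID0 || t == RAID_SINGLE)
			continue;	/* un-mirrored profiles do not count */
		for (size_t i = 0; i < counts[t]; i++)
			if (!groups[t][i].ro)
				return true;
	}
	return false;
}

Only when this scan finds a writable mirrored group is it safe to flip the un-mirrored groups to read-only.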
From patchwork Fri Jun 7 13:10:15 2019
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 10981755
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, David Sterba
Cc: Chris Mason, Josef Bacik, Qu Wenruo, Nikolay Borisov, linux-kernel@vger.kernel.org, Hannes Reinecke, linux-fsdevel@vger.kernel.org, Damien Le Moal, Matias Bjørling, Johannes Thumshirn, Bart Van Assche, Naohiro Aota
Subject: [PATCH 09/19] btrfs: limit super block locations in HMZONED mode
Date: Fri, 7 Jun 2019 22:10:15 +0900
Message-Id: <20190607131025.31996-10-naohiro.aota@wdc.com>
In-Reply-To: <20190607131025.31996-1-naohiro.aota@wdc.com>
References: <20190607131025.31996-1-naohiro.aota@wdc.com>

When in HMZONED mode, make sure that device super blocks are located in randomly writable zones of zoned block devices. That is, do not write super blocks in sequential write required zones of host-managed zoned block devices, as updating them in place would not be possible.

Signed-off-by: Damien Le Moal
Signed-off-by: Naohiro Aota
---
 fs/btrfs/disk-io.c     | 11 +++++++++++
 fs/btrfs/disk-io.h     |  1 +
 fs/btrfs/extent-tree.c |  4 ++++
 fs/btrfs/scrub.c       |  2 ++
 4 files changed, 18 insertions(+)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 7c1404c76768..ddbb02906042 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3466,6 +3466,13 @@ struct buffer_head *btrfs_read_dev_super(struct block_device *bdev)
 	return latest;
 }
 
+int btrfs_check_super_location(struct btrfs_device *device, u64 pos)
+{
+	/* any address is good on a regular (zone_size == 0) device */
+	/* only non-SEQUENTIAL WRITE REQUIRED zones are usable on a zoned device */
+	return device->zone_size == 0 || !btrfs_dev_is_sequential(device, pos);
+}
+
 /*
  * Write superblock @sb to the @device. Do not wait for completion, all the
  * buffer heads we write are pinned.
@@ -3495,6 +3502,8 @@ static int write_dev_supers(struct btrfs_device *device,
 		if (bytenr + BTRFS_SUPER_INFO_SIZE >= device->commit_total_bytes)
 			break;
+		if (!btrfs_check_super_location(device, bytenr))
+			continue;
 
 		btrfs_set_super_bytenr(sb, bytenr);
 
@@ -3561,6 +3570,8 @@ static int wait_dev_supers(struct btrfs_device *device, int max_mirrors)
 		if (bytenr + BTRFS_SUPER_INFO_SIZE >= device->commit_total_bytes)
 			break;
+		if (!btrfs_check_super_location(device, bytenr))
+			continue;
 
 		bh = __find_get_block(device->bdev, bytenr / BTRFS_BDEV_BLOCKSIZE,
diff --git a/fs/btrfs/disk-io.h b/fs/btrfs/disk-io.h
index a0161aa1ea0b..70e97cd6fa76 100644
--- a/fs/btrfs/disk-io.h
+++ b/fs/btrfs/disk-io.h
@@ -141,6 +141,7 @@ struct extent_map *btree_get_extent(struct btrfs_inode *inode,
 			struct page *page, size_t pg_offset, u64 start, u64 len,
 			int create);
 int btrfs_get_num_tolerated_disk_barrier_failures(u64 flags);
+int btrfs_check_super_location(struct btrfs_device *device, u64 pos);
 int __init btrfs_end_io_wq_init(void);
 void __cold btrfs_end_io_wq_exit(void);
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 3d41d840fe5c..ae2c895d08c4 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -267,6 +267,10 @@ static int exclude_super_stripes(struct btrfs_block_group_cache *cache)
 			return ret;
 	}
 
+	/* we won't have super stripes in sequential zones */
+	if (cache->alloc_type == BTRFS_ALLOC_SEQ)
+		return 0;
+
 	for (i = 0; i < BTRFS_SUPER_MIRROR_MAX; i++) {
 		bytenr = btrfs_sb_offset(i);
 		ret = btrfs_rmap_block(fs_info, cache->key.objectid,
diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index f7b29f9db5e2..36ad4fad7eaf 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -3720,6 +3720,8 @@ static noinline_for_stack int scrub_supers(struct scrub_ctx *sctx,
 		if (bytenr + BTRFS_SUPER_INFO_SIZE > scrub_dev->commit_total_bytes)
 			break;
+		if (!btrfs_check_super_location(scrub_dev, bytenr))
+			continue;
 
 		ret = scrub_pages(sctx, bytenr, BTRFS_SUPER_INFO_SIZE, bytenr,
 				  scrub_dev, BTRFS_EXTENT_FLAG_SUPER, gen, i,
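The rule enforced by btrfs_check_super_location() can be modeled with a small predicate; this sketch assumes a hypothetical seq_zone bitmap describing which zones are sequential-write-required:

#include <stdbool.h>

typedef unsigned long long u64;

/* Hypothetical zone map: seq_zone[i] is true if zone i requires sequential writes. */
static bool dev_is_sequential(const bool *seq_zone, u64 zone_size, u64 pos)
{
	return seq_zone[pos / zone_size];
}

static bool super_location_ok(const bool *seq_zone, u64 zone_size, u64 pos)
{
	/* any address is fine on a regular device (zone_size == 0) */
	if (zone_size == 0)
		return true;
	/* on a zoned device, only conventional zones can hold a super block */
	return !dev_is_sequential(seq_zone, zone_size, pos);
}

write_dev_supers(), wait_dev_supers() and the scrub path all skip super block mirrors for which this predicate is false.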
From patchwork Fri Jun 7 13:10:16 2019
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 10981749
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, David Sterba
Cc: Chris Mason, Josef Bacik, Qu Wenruo, Nikolay Borisov, linux-kernel@vger.kernel.org, Hannes Reinecke, linux-fsdevel@vger.kernel.org, Damien Le Moal, Matias Bjørling, Johannes Thumshirn, Bart Van Assche, Naohiro Aota
Subject: [PATCH 10/19] btrfs: rename btrfs_map_bio()
Date: Fri, 7 Jun 2019 22:10:16 +0900
Message-Id: <20190607131025.31996-11-naohiro.aota@wdc.com>
In-Reply-To: <20190607131025.31996-1-naohiro.aota@wdc.com>
References: <20190607131025.31996-1-naohiro.aota@wdc.com>

This patch renames btrfs_map_bio() to __btrfs_map_bio() to prepare for using __btrfs_map_bio() as a helper function.

NOTE: this may be squashed into the next patch.

Signed-off-by: Naohiro Aota
---
 fs/btrfs/volumes.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index c1ed3b6e3cfd..52d0d458c0fd 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -6808,8 +6808,9 @@ static void bbio_error(struct btrfs_bio *bbio, struct bio *bio, u64 logical)
 	}
 }
 
-blk_status_t btrfs_map_bio(struct btrfs_fs_info *fs_info, struct bio *bio,
-			   int mirror_num, int async_submit)
+static blk_status_t __btrfs_map_bio(struct btrfs_fs_info *fs_info,
+				    struct bio *bio, int mirror_num,
+				    int async_submit)
 {
 	struct btrfs_device *dev;
 	struct bio *first_bio = bio;
@@ -6884,6 +6885,12 @@ blk_status_t btrfs_map_bio(struct btrfs_fs_info *fs_info, struct bio *bio,
 	return BLK_STS_OK;
 }
 
+blk_status_t btrfs_map_bio(struct btrfs_fs_info *fs_info, struct bio *bio,
+			   int mirror_num, int async_submit)
+{
+	return __btrfs_map_bio(fs_info, bio, mirror_num, async_submit);
+}
+
 /*
  * Find a device specified by @devid or @uuid in the list of @fs_devices, or
  * return NULL.
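The rename follows the usual thin-wrapper refactoring pattern: the public entry point keeps its signature while the static helper can later grow extra parameters without touching every caller. A generic sketch (map_bio/__map_bio are placeholder names, not the btrfs functions):

/* helper that a later patch can extend with extra parameters */
static int __map_bio(int mirror_num, int async_submit, int need_seqwrite)
{
	(void)need_seqwrite;	/* unused until the follow-up patch */
	return mirror_num + async_submit;	/* placeholder for the real work */
}

/* public entry point keeps its signature, so no caller changes are needed */
int map_bio(int mirror_num, int async_submit)
{
	return __map_bio(mirror_num, async_submit, 0);
}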
From patchwork Fri Jun 7 13:10:17 2019
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 10981747
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, David Sterba
Cc: Chris Mason, Josef Bacik, Qu Wenruo, Nikolay Borisov, linux-kernel@vger.kernel.org, Hannes Reinecke, linux-fsdevel@vger.kernel.org, Damien Le Moal, Matias Bjørling, Johannes Thumshirn, Bart Van Assche, Naohiro Aota
Subject: [PATCH 11/19] btrfs: introduce submit buffer
Date: Fri, 7 Jun 2019 22:10:17 +0900
Message-Id: <20190607131025.31996-12-naohiro.aota@wdc.com>
In-Reply-To: <20190607131025.31996-1-naohiro.aota@wdc.com>
References: <20190607131025.31996-1-naohiro.aota@wdc.com>

Sequential allocation is not enough to maintain sequential delivery of write I/Os to the device. Various btrfs features (async compression, async checksumming, ...) affect the ordering of the I/Os. This patch introduces a submit buffer that collects the WRITE bios belonging to a block group and releases them in increasing block address order, so that __btrfs_map_bio() issues sequential write sequences.

Signed-off-by: Naohiro Aota
---
 fs/btrfs/ctree.h             |   3 +
 fs/btrfs/extent-tree.c       |   5 ++
 fs/btrfs/volumes.c           | 165 +++++++++++++++++++++++++++++++++--
 fs/btrfs/volumes.h           |   3 +
 include/trace/events/btrfs.h |  41 +++++++++
 5 files changed, 212 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index f4bcd2a6ec12..ade6d8243962 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -718,6 +718,9 @@ struct btrfs_block_group_cache {
 	 */
 	enum btrfs_alloc_type alloc_type;
 	u64 alloc_offset;
+	struct mutex submit_lock;
+	u64 submit_offset;
+	struct bio_list submit_buffer;
 };
 
 /* delayed seq elem */
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index ae2c895d08c4..ebdc7a6dbe01 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -124,6 +124,7 @@ void btrfs_put_block_group(struct btrfs_block_group_cache *cache)
 	if (atomic_dec_and_test(&cache->count)) {
 		WARN_ON(cache->pinned > 0);
 		WARN_ON(cache->reserved > 0);
+		WARN_ON(!bio_list_empty(&cache->submit_buffer));
 
 		/*
 		 * If not empty, someone is still holding mutex of
@@ -10511,6 +10512,8 @@ btrfs_get_block_group_alloc_offset(struct btrfs_block_group_cache *cache)
 		goto out;
 	}
 
+	cache->submit_offset = logical + cache->alloc_offset;
+
 out:
 	cache->alloc_type = alloc_type;
 	kfree(alloc_offsets);
@@ -10547,6 +10550,7 @@ btrfs_create_block_group_cache(struct btrfs_fs_info *fs_info,
 
 	atomic_set(&cache->count, 1);
 	spin_lock_init(&cache->lock);
+	mutex_init(&cache->submit_lock);
 	init_rwsem(&cache->data_rwsem);
 	INIT_LIST_HEAD(&cache->list);
 	INIT_LIST_HEAD(&cache->cluster_list);
@@ -10554,6 +10558,7 @@ btrfs_create_block_group_cache(struct btrfs_fs_info *fs_info,
 	INIT_LIST_HEAD(&cache->ro_list);
 	INIT_LIST_HEAD(&cache->dirty_list);
 	INIT_LIST_HEAD(&cache->io_list);
+	bio_list_init(&cache->submit_buffer);
 	btrfs_init_free_space_ctl(cache);
 	atomic_set(&cache->trimming, 0);
 	mutex_init(&cache->free_space_lock);
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 52d0d458c0fd..26a64a53032f 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -29,6 +29,11 @@
 #include "sysfs.h"
 #include "tree-checker.h"
 
+struct map_bio_data {
+	void *orig_bi_private;
+	int mirror_num;
+};
+
 const struct btrfs_raid_attr btrfs_raid_array[BTRFS_NR_RAID_TYPES] = {
 	[BTRFS_RAID_RAID10] = {
 		.sub_stripes	= 2,
@@ -523,6 +528,7 @@ static void requeue_list(struct btrfs_pending_bios *pending_bios,
 	pending_bios->tail = tail;
 }
 
+
 /*
  * we try to collect pending bios for a device so we don't get a large
  * number of procs sending bios down to the same device.
This greatly @@ -606,6 +612,8 @@ static noinline void run_scheduled_bios(struct btrfs_device *device) spin_unlock(&device->io_lock); while (pending) { + struct btrfs_bio *bbio; + struct completion *sent = NULL; rmb(); /* we want to work on both lists, but do more bios on the @@ -643,7 +651,12 @@ static noinline void run_scheduled_bios(struct btrfs_device *device) sync_pending = 0; } + bbio = cur->bi_private; + if (bbio) + sent = bbio->sent; btrfsic_submit_bio(cur); + if (sent) + complete(sent); num_run++; batch_run++; @@ -5916,6 +5929,7 @@ static struct btrfs_bio *alloc_btrfs_bio(int total_stripes, int real_stripes) atomic_set(&bbio->error, 0); refcount_set(&bbio->refs, 1); + INIT_LIST_HEAD(&bbio->list); return bbio; } @@ -6730,7 +6744,7 @@ static void btrfs_end_bio(struct bio *bio) * the work struct is scheduled. */ static noinline void btrfs_schedule_bio(struct btrfs_device *device, - struct bio *bio) + struct bio *bio, int need_seqwrite) { struct btrfs_fs_info *fs_info = device->fs_info; int should_queue = 1; @@ -6738,7 +6752,12 @@ static noinline void btrfs_schedule_bio(struct btrfs_device *device, /* don't bother with additional async steps for reads, right now */ if (bio_op(bio) == REQ_OP_READ) { + struct btrfs_bio *bbio = bio->bi_private; + struct completion *sent = bbio->sent; + btrfsic_submit_bio(bio); + if (sent) + complete(sent); return; } @@ -6746,7 +6765,7 @@ static noinline void btrfs_schedule_bio(struct btrfs_device *device, bio->bi_next = NULL; spin_lock(&device->io_lock); - if (op_is_sync(bio->bi_opf)) + if (op_is_sync(bio->bi_opf) && need_seqwrite == 0) pending_bios = &device->pending_sync_bios; else pending_bios = &device->pending_bios; @@ -6785,8 +6804,21 @@ static void submit_stripe_bio(struct btrfs_bio *bbio, struct bio *bio, btrfs_bio_counter_inc_noblocked(fs_info); + /* queue all bios into scheduler if sequential write is required */ + if (bbio->need_seqwrite) { + if (!async) { + DECLARE_COMPLETION_ONSTACK(sent); + + bbio->sent = &sent; + btrfs_schedule_bio(dev, bio, bbio->need_seqwrite); + wait_for_completion_io(&sent); + } else { + btrfs_schedule_bio(dev, bio, bbio->need_seqwrite); + } + return; + } if (async) - btrfs_schedule_bio(dev, bio); + btrfs_schedule_bio(dev, bio, bbio->need_seqwrite); else btrfsic_submit_bio(bio); } @@ -6808,9 +6840,10 @@ static void bbio_error(struct btrfs_bio *bbio, struct bio *bio, u64 logical) } } + static blk_status_t __btrfs_map_bio(struct btrfs_fs_info *fs_info, struct bio *bio, int mirror_num, - int async_submit) + int async_submit, int need_seqwrite) { struct btrfs_device *dev; struct bio *first_bio = bio; @@ -6838,6 +6871,7 @@ static blk_status_t __btrfs_map_bio(struct btrfs_fs_info *fs_info, bbio->private = first_bio->bi_private; bbio->end_io = first_bio->bi_end_io; bbio->fs_info = fs_info; + bbio->need_seqwrite = need_seqwrite; atomic_set(&bbio->stripes_pending, bbio->num_stripes); if ((bbio->map_type & BTRFS_BLOCK_GROUP_RAID56_MASK) && @@ -6885,10 +6919,131 @@ static blk_status_t __btrfs_map_bio(struct btrfs_fs_info *fs_info, return BLK_STS_OK; } +static blk_status_t __btrfs_map_bio_zoned(struct btrfs_fs_info *fs_info, + struct bio *cur_bio, int mirror_num, + int async_submit) +{ + u64 logical = (u64)cur_bio->bi_iter.bi_sector << SECTOR_SHIFT; + u64 length = cur_bio->bi_iter.bi_size; + struct bio *bio; + struct bio *next; + struct bio_list submit_list; + struct btrfs_block_group_cache *cache = NULL; + struct map_bio_data *map_private; + int sent; + blk_status_t ret; + + WARN_ON(bio_op(cur_bio) != REQ_OP_WRITE); + + cache = 
btrfs_lookup_block_group(fs_info, logical); + if (!cache || cache->alloc_type != BTRFS_ALLOC_SEQ) { + if (cache) + btrfs_put_block_group(cache); + return __btrfs_map_bio(fs_info, cur_bio, mirror_num, + async_submit, 0); + } + + mutex_lock(&cache->submit_lock); + if (cache->submit_offset == logical) + goto send_bios; + + if (cache->submit_offset > logical) { + trace_btrfs_bio_before_write_pointer(cache, cur_bio); + mutex_unlock(&cache->submit_lock); + btrfs_put_block_group(cache); + WARN_ON_ONCE(1); + return BLK_STS_IOERR; + } + + /* buffer the unaligned bio */ + map_private = kmalloc(sizeof(*map_private), GFP_NOFS); + if (!map_private) { + mutex_unlock(&cache->submit_lock); + return errno_to_blk_status(-ENOMEM); + } + + map_private->orig_bi_private = cur_bio->bi_private; + map_private->mirror_num = mirror_num; + cur_bio->bi_private = map_private; + + bio_list_add(&cache->submit_buffer, cur_bio); + mutex_unlock(&cache->submit_lock); + btrfs_put_block_group(cache); + + /* mimic a good result ... */ + return BLK_STS_OK; + +send_bios: + mutex_unlock(&cache->submit_lock); + /* send this bio */ + ret = __btrfs_map_bio(fs_info, cur_bio, mirror_num, 1, 1); + if (ret != BLK_STS_OK) { + /* TODO kill buffered bios */ + return ret; + } + +loop: + /* and send previously buffered following bios */ + mutex_lock(&cache->submit_lock); + cache->submit_offset += length; + length = 0; + bio_list_init(&submit_list); + + /* collect sequential bios into submit_list */ + do { + sent = 0; + bio = bio_list_get(&cache->submit_buffer); + while (bio) { + u64 logical = + (u64)bio->bi_iter.bi_sector << SECTOR_SHIFT; + struct bio_list *target; + + next = bio->bi_next; + bio->bi_next = NULL; + + if (logical == cache->submit_offset + length) { + sent = 1; + length += bio->bi_iter.bi_size; + target = &submit_list; + } else { + target = &cache->submit_buffer; + } + bio_list_add(target, bio); + + bio = next; + } + } while (sent); + mutex_unlock(&cache->submit_lock); + + /* send the collected bios */ + while ((bio = bio_list_pop(&submit_list)) != NULL) { + map_private = (struct map_bio_data *)bio->bi_private; + mirror_num = map_private->mirror_num; + bio->bi_private = map_private->orig_bi_private; + kfree(map_private); + + ret = __btrfs_map_bio(fs_info, bio, mirror_num, 1, 1); + if (ret) { + bio->bi_status = ret; + bio_endio(bio); + } + } + + if (length) + goto loop; + btrfs_put_block_group(cache); + + return BLK_STS_OK; +} + blk_status_t btrfs_map_bio(struct btrfs_fs_info *fs_info, struct bio *bio, int mirror_num, int async_submit) { - return __btrfs_map_bio(fs_info, bio, mirror_num, async_submit); + if (btrfs_fs_incompat(fs_info, HMZONED) && bio_op(bio) == REQ_OP_WRITE) + return __btrfs_map_bio_zoned(fs_info, bio, mirror_num, + async_submit); + + return __btrfs_map_bio(fs_info, bio, mirror_num, async_submit, 0); } /* diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h index f66755e43669..e97d13cb1627 100644 --- a/fs/btrfs/volumes.h +++ b/fs/btrfs/volumes.h @@ -329,6 +329,9 @@ struct btrfs_bio { int mirror_num; int num_tgtdevs; int *tgtdev_map; + int need_seqwrite; + struct list_head list; + struct completion *sent; /* * logical block numbers for the start of each stripe * The last one or two are p/q. 
These are sorted,
diff --git a/include/trace/events/btrfs.h b/include/trace/events/btrfs.h
index fe4d268028ee..2b4cd791bf24 100644
--- a/include/trace/events/btrfs.h
+++ b/include/trace/events/btrfs.h
@@ -2091,6 +2091,47 @@ DEFINE_BTRFS_LOCK_EVENT(btrfs_try_tree_read_lock);
 DEFINE_BTRFS_LOCK_EVENT(btrfs_try_tree_write_lock);
 DEFINE_BTRFS_LOCK_EVENT(btrfs_tree_read_lock_atomic);
 
+DECLARE_EVENT_CLASS(btrfs_hmzoned_bio_buffer_events,
+	TP_PROTO(const struct btrfs_block_group_cache *cache,
+		 const struct bio *bio),
+
+	TP_ARGS(cache, bio),
+
+	TP_STRUCT__entry_btrfs(
+		__field(	u64,	block_group	)
+		__field(	u64,	flags		)
+		__field(	u64,	submit_pos	)
+		__field(	u64,	logical		)
+		__field(	u64,	length		)
+	),
+
+	TP_fast_assign_btrfs(cache->fs_info,
+		__entry->block_group	= cache->key.objectid;
+		__entry->flags		= cache->flags;
+		__entry->submit_pos	= cache->submit_offset;
+		__entry->logical	= (u64)bio->bi_iter.bi_sector << SECTOR_SHIFT;
+		__entry->length		= bio->bi_iter.bi_size;
+	),
+
+	TP_printk_btrfs(
+		"block_group=%llu(%s) submit_pos=%llu logical=%llu length=%llu",
+		__entry->block_group,
+		__print_flags((unsigned long)__entry->flags, "|",
+			      BTRFS_GROUP_FLAGS),
+		__entry->submit_pos, __entry->logical,
+		__entry->length)
+);
+
+#define DEFINE_BTRFS_HMZONED_BIO_BUF_EVENT(name)			\
+DEFINE_EVENT(btrfs_hmzoned_bio_buffer_events, name,			\
+	TP_PROTO(const struct btrfs_block_group_cache *cache,		\
+		 const struct bio *bio),				\
+									\
+	TP_ARGS(cache, bio)						\
+)
+
+DEFINE_BTRFS_HMZONED_BIO_BUF_EVENT(btrfs_bio_before_write_pointer);
+
 #endif /* _TRACE_BTRFS_H */
 
/* This part must be outside protection */
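The ordering logic of the submit buffer introduced above can be modeled in isolation. In this user-space sketch, write_req, submit_buffer and submit_or_hold are invented names, and issue() stands in for __btrfs_map_bio(); out-of-order requests are parked and drained once the gap in front of them is filled:

#include <stdlib.h>

typedef unsigned long long u64;

struct write_req {
	u64 logical;		/* target address of the write */
	u64 len;
	struct write_req *next;
};

struct submit_buffer {
	u64 submit_offset;	/* next address the zone will accept */
	struct write_req *held;	/* out-of-order requests parked here */
};

/* Stands in for __btrfs_map_bio(): the device write happens here. */
static void issue(struct write_req *r)
{
	free(r);
}

/* A request at the submit pointer goes out immediately and may unblock
 * previously parked requests; anything else waits in the buffer. */
static void submit_or_hold(struct submit_buffer *b, struct write_req *r)
{
	if (r->logical != b->submit_offset) {
		r->next = b->held;	/* park the out-of-order request */
		b->held = r;
		return;
	}
	b->submit_offset += r->len;
	issue(r);

	/* drain parked requests that have become sequential */
	for (struct write_req **p = &b->held; *p; ) {
		struct write_req *q = *p;

		if (q->logical == b->submit_offset) {
			*p = q->next;
			b->submit_offset += q->len;
			issue(q);
			p = &b->held;	/* progress made: rescan from head */
		} else {
			p = &q->next;
		}
	}
}

A request landing below the submit pointer would indicate a broken write pointer; the real code warns and fails such a bio with BLK_STS_IOERR.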
From patchwork Fri Jun 7 13:10:18 2019
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 10981745
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, David Sterba
Cc: Chris Mason, Josef Bacik, Qu Wenruo, Nikolay Borisov, linux-kernel@vger.kernel.org, Hannes Reinecke, linux-fsdevel@vger.kernel.org, Damien Le Moal, Matias Bjørling, Johannes Thumshirn, Bart Van Assche, Naohiro Aota
Subject: [PATCH 12/19] btrfs: expire submit buffer on timeout
Date: Fri, 7 Jun 2019 22:10:18 +0900
Message-Id: <20190607131025.31996-13-naohiro.aota@wdc.com>
In-Reply-To: <20190607131025.31996-1-naohiro.aota@wdc.com>
References: <20190607131025.31996-1-naohiro.aota@wdc.com>

It is possible for bios to stall in the submit buffer due to a bug or a device problem. In such a situation, btrfs stops working while waiting for the buffered bios to complete. To avoid such a hang, add a worker that cancels the stalled bios after a timeout.
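The expiration protocol can be summarized by a tiny state machine; this sketch is illustrative only (arm_expire, cancel_expire and expire_due are invented names standing in for schedule_expire_work(), cancel_expire_work() and the delayed work callback in the diff below):

#include <stdbool.h>

typedef unsigned long long u64;

#define EXPIRE_TIMEOUT_SEC 90	/* the patch arms its delayed work with 90 * HZ */

struct expire_state {
	bool armed;
	u64 deadline;	/* seconds; stands in for the delayed_work timer */
};

/* schedule_expire_work(): (re)arm the timer whenever a bio is buffered */
static void arm_expire(struct expire_state *s, u64 now)
{
	s->deadline = now + EXPIRE_TIMEOUT_SEC;
	s->armed = true;
}

/* cancel_expire_work(): the buffer drained in order, nothing to expire */
static void cancel_expire(struct expire_state *s)
{
	s->armed = false;
}

/* timer tick: once the deadline passes, the buffered bios are failed
 * with an I/O error instead of letting the filesystem hang forever */
static bool expire_due(const struct expire_state *s, u64 now)
{
	return s->armed && now >= s->deadline;
}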
Signed-off-by: Naohiro Aota --- fs/btrfs/ctree.h | 13 ++++ fs/btrfs/disk-io.c | 2 + fs/btrfs/extent-tree.c | 16 +++- fs/btrfs/super.c | 18 +++++ fs/btrfs/volumes.c | 146 ++++++++++++++++++++++++++++++++++- include/trace/events/btrfs.h | 2 + 6 files changed, 193 insertions(+), 4 deletions(-) diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index ade6d8243962..dad8ea5c3b99 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -596,6 +596,8 @@ enum btrfs_alloc_type { BTRFS_ALLOC_SEQ = 1, }; +struct expire_work; + struct btrfs_block_group_cache { struct btrfs_key key; struct btrfs_block_group_item item; @@ -721,6 +723,14 @@ struct btrfs_block_group_cache { struct mutex submit_lock; u64 submit_offset; struct bio_list submit_buffer; + struct expire_work *expire_work; + int expired:1; +}; + +struct expire_work { + struct list_head list; + struct delayed_work work; + struct btrfs_block_group_cache *block_group; }; /* delayed seq elem */ @@ -1194,6 +1204,9 @@ struct btrfs_fs_info { spinlock_t ref_verify_lock; struct rb_root block_tree; #endif + + struct list_head expire_work_list; + struct mutex expire_work_lock; }; static inline struct btrfs_fs_info *btrfs_sb(struct super_block *sb) diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index ddbb02906042..56a416902ce7 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -2717,6 +2717,8 @@ int open_ctree(struct super_block *sb, INIT_RADIX_TREE(&fs_info->reada_tree, GFP_NOFS & ~__GFP_DIRECT_RECLAIM); spin_lock_init(&fs_info->reada_lock); btrfs_init_ref_verify(fs_info); + INIT_LIST_HEAD(&fs_info->expire_work_list); + mutex_init(&fs_info->expire_work_lock); fs_info->thread_pool_size = min_t(unsigned long, num_online_cpus() + 2, 8); diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index ebdc7a6dbe01..cb29a96c226b 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -125,6 +125,7 @@ void btrfs_put_block_group(struct btrfs_block_group_cache *cache) WARN_ON(cache->pinned > 0); WARN_ON(cache->reserved > 0); WARN_ON(!bio_list_empty(&cache->submit_buffer)); + WARN_ON(cache->expire_work); /* * If not empty, someone is still holding mutex of @@ -10180,6 +10181,13 @@ int btrfs_free_block_groups(struct btrfs_fs_info *info) block_group->cached == BTRFS_CACHE_ERROR) free_excluded_extents(block_group); + if (block_group->alloc_type == BTRFS_ALLOC_SEQ) { + mutex_lock(&block_group->submit_lock); + WARN_ON(!bio_list_empty(&block_group->submit_buffer)); + WARN_ON(block_group->expire_work != NULL); + mutex_unlock(&block_group->submit_lock); + } + btrfs_remove_free_space_cache(block_group); ASSERT(block_group->cached != BTRFS_CACHE_STARTED); ASSERT(list_empty(&block_group->dirty_list)); @@ -10513,6 +10521,7 @@ btrfs_get_block_group_alloc_offset(struct btrfs_block_group_cache *cache) } cache->submit_offset = logical + cache->alloc_offset; + cache->expired = 0; out: cache->alloc_type = alloc_type; @@ -10565,6 +10574,7 @@ btrfs_create_block_group_cache(struct btrfs_fs_info *fs_info, btrfs_init_full_stripe_locks_tree(&cache->full_stripe_locks_root); cache->alloc_type = BTRFS_ALLOC_FIT; cache->alloc_offset = 0; + cache->expire_work = NULL; if (btrfs_fs_incompat(fs_info, HMZONED)) { ret = btrfs_get_block_group_alloc_offset(cache); @@ -11329,11 +11339,13 @@ void btrfs_delete_unused_bgs(struct btrfs_fs_info *fs_info) /* Don't want to race with allocators so take the groups_sem */ down_write(&space_info->groups_sem); + mutex_lock(&block_group->submit_lock); spin_lock(&block_group->lock); if (block_group->reserved || block_group->pinned || 
btrfs_block_group_used(&block_group->item) || block_group->ro || - list_is_singular(&block_group->list)) { + list_is_singular(&block_group->list) || + !bio_list_empty(&block_group->submit_buffer)) { /* * We want to bail if we made new allocations or have * outstanding allocations in this block group. We do @@ -11342,10 +11354,12 @@ void btrfs_delete_unused_bgs(struct btrfs_fs_info *fs_info) */ trace_btrfs_skip_unused_block_group(block_group); spin_unlock(&block_group->lock); + mutex_unlock(&block_group->submit_lock); up_write(&space_info->groups_sem); goto next; } spin_unlock(&block_group->lock); + mutex_unlock(&block_group->submit_lock); /* We don't want to force the issue, only flip if it's ok. */ ret = inc_block_group_ro(block_group, 0); diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c index 740a701f16c5..343c26537999 100644 --- a/fs/btrfs/super.c +++ b/fs/btrfs/super.c @@ -154,6 +154,24 @@ void __btrfs_handle_fs_error(struct btrfs_fs_info *fs_info, const char *function * completes. The next time when the filesystem is mounted writable * again, the device replace operation continues. */ + + /* expire pending bios in submit buffer */ + if (btrfs_fs_incompat(fs_info, HMZONED)) { + struct expire_work *work; + struct btrfs_block_group_cache *block_group; + + mutex_lock(&fs_info->expire_work_lock); + list_for_each_entry(work, &fs_info->expire_work_list, list) { + block_group = work->block_group; + mutex_lock(&block_group->submit_lock); + if (block_group->expire_work) + mod_delayed_work( + system_unbound_wq, + &block_group->expire_work->work, 0); + mutex_unlock(&block_group->submit_lock); + }; + mutex_unlock(&fs_info->expire_work_lock); + } } #ifdef CONFIG_PRINTK diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index 26a64a53032f..a04379e440fb 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -6840,6 +6840,124 @@ static void bbio_error(struct btrfs_bio *bbio, struct bio *bio, u64 logical) } } +static void expire_bios_fn(struct work_struct *work) +{ + struct expire_work *ework; + struct btrfs_block_group_cache *cache; + struct bio *bio, *next; + + ework = container_of(work, struct expire_work, work.work); + cache = ework->block_group; + + mutex_lock(&cache->fs_info->expire_work_lock); + mutex_lock(&cache->submit_lock); + list_del(&cache->expire_work->list); + + if (btrfs_fs_closing(cache->fs_info)) { + WARN_ON(!bio_list_empty(&cache->submit_buffer)); + goto end; + } + + if (bio_list_empty(&cache->submit_buffer)) + goto end; + + bio = bio_list_get(&cache->submit_buffer); + cache->expired = 1; + mutex_unlock(&cache->submit_lock); + + btrfs_handle_fs_error(cache->fs_info, -EIO, + "bio submit buffer expired"); + btrfs_err(cache->fs_info, "block group %llu submit pos %llu", + cache->key.objectid, cache->submit_offset); + + while (bio) { + struct map_bio_data *map_private = + (struct map_bio_data *)bio->bi_private; + + next = bio->bi_next; + bio->bi_next = NULL; + bio->bi_private = map_private->orig_bi_private; + kfree(map_private); + + trace_btrfs_expire_bio(cache, bio); + bio->bi_status = BLK_STS_IOERR; + bio_endio(bio); + + bio = next; + } + +end: + kfree(cache->expire_work); + cache->expire_work = NULL; + mutex_unlock(&cache->submit_lock); + mutex_unlock(&cache->fs_info->expire_work_lock); + btrfs_put_block_group(cache); +} + +static int schedule_expire_work(struct btrfs_block_group_cache *cache) +{ + const unsigned long delay = 90 * HZ; + struct btrfs_fs_info *fs_info = cache->fs_info; + struct expire_work *work; + int ret = 0; + + mutex_lock(&fs_info->expire_work_lock); + 
+    mutex_lock(&cache->submit_lock);
+    if (cache->expire_work) {
+        mod_delayed_work(system_unbound_wq, &cache->expire_work->work,
+                         delay);
+        goto end;
+    }
+
+    work = kmalloc(sizeof(*work), GFP_NOFS);
+    if (!work) {
+        ret = -ENOMEM;
+        goto end;
+    }
+
+    work->block_group = cache;
+    INIT_LIST_HEAD(&work->list);
+    INIT_DELAYED_WORK(&work->work, expire_bios_fn);
+    cache->expire_work = work;
+
+    list_add(&work->list, &fs_info->expire_work_list);
+    btrfs_get_block_group(cache);
+    mod_delayed_work(system_unbound_wq, &cache->expire_work->work, delay);
+
+end:
+    mutex_unlock(&cache->submit_lock);
+    mutex_unlock(&cache->fs_info->expire_work_lock);
+    return ret;
+}
+
+static bool cancel_expire_work(struct btrfs_block_group_cache *cache)
+{
+    struct expire_work *work;
+    bool ret = true;
+
+    mutex_lock(&cache->fs_info->expire_work_lock);
+    mutex_lock(&cache->submit_lock);
+    work = cache->expire_work;
+    if (!work)
+        goto end;
+    cache->expire_work = NULL;
+
+    ret = cancel_delayed_work(&work->work);
+    /*
+     * if cancel failed, expire_work is freed by the
+     * expire worker thread
+     */
+    if (!ret)
+        goto end;
+
+    list_del(&work->list);
+    kfree(work);
+    btrfs_put_block_group(cache);
+
+end:
+    mutex_unlock(&cache->submit_lock);
+    mutex_unlock(&cache->fs_info->expire_work_lock);
+    return ret;
+}
 
 static blk_status_t __btrfs_map_bio(struct btrfs_fs_info *fs_info,
                                     struct bio *bio, int mirror_num,
@@ -6931,7 +7049,9 @@ static blk_status_t __btrfs_map_bio_zoned(struct btrfs_fs_info *fs_info,
     struct btrfs_block_group_cache *cache = NULL;
     struct map_bio_data *map_private;
     int sent;
+    bool should_queue;
     blk_status_t ret;
+    int ret2;
 
     WARN_ON(bio_op(cur_bio) != REQ_OP_WRITE);
 
@@ -6944,8 +7064,20 @@ static blk_status_t __btrfs_map_bio_zoned(struct btrfs_fs_info *fs_info,
     }
 
     mutex_lock(&cache->submit_lock);
-    if (cache->submit_offset == logical)
+
+    if (cache->expired) {
+        trace_btrfs_bio_in_expired_block_group(cache, cur_bio);
+        mutex_unlock(&cache->submit_lock);
+        btrfs_put_block_group(cache);
+        WARN_ON_ONCE(1);
+        return BLK_STS_IOERR;
+    }
+
+    if (cache->submit_offset == logical) {
+        mutex_unlock(&cache->submit_lock);
+        cancel_expire_work(cache);
         goto send_bios;
+    }
 
     if (cache->submit_offset > logical) {
         trace_btrfs_bio_before_write_pointer(cache, cur_bio);
@@ -6968,13 +7100,18 @@ static blk_status_t __btrfs_map_bio_zoned(struct btrfs_fs_info *fs_info,
 
     bio_list_add(&cache->submit_buffer, cur_bio);
     mutex_unlock(&cache->submit_lock);
+
+    ret2 = schedule_expire_work(cache);
+    if (ret2) {
+        btrfs_put_block_group(cache);
+        return errno_to_blk_status(ret2);
+    }
     btrfs_put_block_group(cache);
 
     /* mimic a good result ...
      */
     return BLK_STS_OK;
 
send_bios:
-    mutex_unlock(&cache->submit_lock);
     /* send this bio */
     ret = __btrfs_map_bio(fs_info, cur_bio, mirror_num, 1, 1);
     if (ret != BLK_STS_OK) {
@@ -7013,6 +7150,7 @@ static blk_status_t __btrfs_map_bio_zoned(struct btrfs_fs_info *fs_info,
             bio = next;
         }
     } while (sent);
+    should_queue = !bio_list_empty(&cache->submit_buffer);
     mutex_unlock(&cache->submit_lock);
 
     /* send the collected bios */
@@ -7031,8 +7169,10 @@ static blk_status_t __btrfs_map_bio_zoned(struct btrfs_fs_info *fs_info,
     if (length)
         goto loop;
 
-    btrfs_put_block_group(cache);
+    if (should_queue)
+        WARN_ON(schedule_expire_work(cache));
 
+    btrfs_put_block_group(cache);
     return BLK_STS_OK;
 }
diff --git a/include/trace/events/btrfs.h b/include/trace/events/btrfs.h
index 2b4cd791bf24..0ffb0b330b6c 100644
--- a/include/trace/events/btrfs.h
+++ b/include/trace/events/btrfs.h
@@ -2131,6 +2131,8 @@ DEFINE_EVENT(btrfs_hmzoned_bio_buffer_events, name,  \
 )
 
 DEFINE_BTRFS_HMZONED_BIO_BUF_EVENT(btrfs_bio_before_write_pointer);
+DEFINE_BTRFS_HMZONED_BIO_BUF_EVENT(btrfs_expire_bio);
+DEFINE_BTRFS_HMZONED_BIO_BUF_EVENT(btrfs_bio_in_expired_block_group);
 
 #endif /* _TRACE_BTRFS_H */

From patchwork Fri Jun 7 13:10:19 2019
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 10981739
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, David Sterba
Cc: Chris Mason, Josef Bacik, Qu Wenruo, Nikolay Borisov,
 linux-kernel@vger.kernel.org, Hannes Reinecke, linux-fsdevel@vger.kernel.org,
 Damien Le Moal, Matias Bjørling, Johannes Thumshirn, Bart Van Assche,
 Naohiro Aota
Subject: [PATCH 13/19] btrfs: avoid sync IO prioritization on checksum in HMZONED mode
Date: Fri, 7 Jun 2019 22:10:19 +0900
Message-Id: <20190607131025.31996-14-naohiro.aota@wdc.com>
In-Reply-To: <20190607131025.31996-1-naohiro.aota@wdc.com>
References: <20190607131025.31996-1-naohiro.aota@wdc.com>

Btrfs prioritizes sync I/Os so that the async checksum workers handle them
first. As a result, checksumming a sync I/O at a higher logical extent
address can finish before checksumming a non-sync I/O at a lower address.
Since the number of checksum workers is bounded, a sync I/O can then wait
forever on the not-yet-started checksum of an I/O at a lower address. This
situation can be reproduced by e.g. fstests btrfs/073. To avoid such
disordering, disable sync IO prioritization for now. Note that in HMZONED
mode a sync I/O must wait for the I/Os at lower addresses to finish anyway,
so the prioritization brings no benefit there.
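The deadlock is easy to reproduce in miniature. The following toy model is
purely illustrative (nothing in it is a btrfs or kernel API) and assumes a
single checksum worker with a strict high-priority queue; it only restates
the ordering argument made above:

#include <stdio.h>

/*
 * Toy model (not btrfs code) of the starvation described above. One
 * worker always drains the high-priority queue first. On a sequential
 * zone, the write at address 100 cannot complete until the write at
 * address 10 is done, but the bio at address 10 never reaches the
 * worker because the worker is parked on the sync bio at address 100.
 */
struct toy_bio { unsigned long addr; const char *kind; };

int main(void)
{
	struct toy_bio high = { 100, "sync"  };	/* prioritized */
	struct toy_bio low  = {  10, "async" };	/* must reach disk first */

	printf("worker picks %s bio @%lu (high priority)\n",
	       high.kind, high.addr);
	printf("  ... blocked: zone write pointer is at %lu\n", low.addr);
	printf("%s bio @%lu waits for a free worker -> never runs\n",
	       low.kind, low.addr);
	return 0;
}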
Signed-off-by: Naohiro Aota
---
 fs/btrfs/disk-io.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 56a416902ce7..6651986da470 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -838,7 +838,7 @@ blk_status_t btrfs_wq_submit_bio(struct btrfs_fs_info *fs_info, struct bio *bio,
     async->status = 0;
 
-    if (op_is_sync(bio->bi_opf))
+    if (op_is_sync(bio->bi_opf) && !btrfs_fs_incompat(fs_info, HMZONED))
         btrfs_set_work_high_priority(&async->work);
 
     btrfs_queue_work(fs_info->workers, &async->work);

From patchwork Fri Jun 7 13:10:20 2019
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 10981743
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, David Sterba
Cc: Chris Mason, Josef Bacik, Qu Wenruo, Nikolay Borisov,
 linux-kernel@vger.kernel.org, Hannes Reinecke, linux-fsdevel@vger.kernel.org,
 Damien Le Moal, Matias Bjørling, Johannes Thumshirn, Bart Van Assche,
 Naohiro Aota
Subject: [PATCH 14/19] btrfs: redirty released extent buffers in sequential BGs
Date: Fri, 7 Jun 2019 22:10:20 +0900
Message-Id: <20190607131025.31996-15-naohiro.aota@wdc.com>
In-Reply-To: <20190607131025.31996-1-naohiro.aota@wdc.com>
References: <20190607131025.31996-1-naohiro.aota@wdc.com>

Tree-manipulating operations like merging nodes often release
once-allocated tree nodes. Btrfs cleans such nodes so that their pages are
not uselessly written out. On HMZONED drives, however, this optimization
blocks subsequent I/Os: cancelling the write-out of the freed blocks breaks
the sequential write sequence expected by the device.

This patch introduces a per-transaction list of clean extent buffers that
have been released. Btrfs consults the list before writing out and waiting
for the I/Os, and redirties a buffer if 1) it is in a sequential BG, 2) it
is in the un-submitted range, and 3) it is not under I/O. Such buffers are
then marked for I/O in btrfs_write_and_wait_transaction() so that proper
bios are sent to the device.
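The sequential-write constraint behind this patch can be shown with simple
write-pointer arithmetic. A minimal sketch, assuming a 4KiB nodesize and
made-up addresses (this is not btrfs code):

#include <assert.h>

/*
 * Why skipping the write of a freed block breaks a sequential zone
 * (illustrative only). Blocks B0..B2 were allocated back-to-back; if
 * B1 is freed and its write-out cancelled, the next submitted write
 * (B2) starts past the device write pointer, which a host-managed
 * drive rejects as an unaligned write.
 */
int main(void)
{
	unsigned long long zone_wp = 0x1000;	/* device write pointer */
	unsigned long long b1 = 0x2000, b2 = 0x3000;
	unsigned long long nodesize = 0x1000;

	zone_wp += nodesize;	/* B0 written: wp -> 0x2000 */
	/* B1 freed, its write cancelled: wp stays at 0x2000 */
	assert(b2 != zone_wp);	/* B2 @0x3000 != wp @0x2000 -> I/O error */
	/* redirtying B1 (this patch) keeps the LBA stream contiguous */
	(void)b1;
	return 0;
}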
Signed-off-by: Naohiro Aota
---
 fs/btrfs/disk-io.c     | 27 ++++++++++++++++++++++---
 fs/btrfs/extent_io.c   |  1 +
 fs/btrfs/extent_io.h   |  2 ++
 fs/btrfs/transaction.c | 35 +++++++++++++++++++++++++++++++++++
 fs/btrfs/transaction.h |  3 +++
 5 files changed, 65 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 6651986da470..c6147fce648f 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -535,7 +535,9 @@ static int csum_dirty_buffer(struct btrfs_fs_info *fs_info, struct page *page)
     if (csum_tree_block(eb, result))
         return -EINVAL;
 
-    if (btrfs_header_level(eb))
+    if (test_bit(EXTENT_BUFFER_NO_CHECK, &eb->bflags))
+        ret = 0;
+    else if (btrfs_header_level(eb))
         ret = btrfs_check_node(eb);
     else
         ret = btrfs_check_leaf_full(eb);
@@ -1115,10 +1117,20 @@ struct extent_buffer *read_tree_block(struct btrfs_fs_info *fs_info, u64 bytenr,
 void btrfs_clean_tree_block(struct extent_buffer *buf)
 {
     struct btrfs_fs_info *fs_info = buf->fs_info;
-    if (btrfs_header_generation(buf) ==
-        fs_info->running_transaction->transid) {
+    struct btrfs_transaction *cur_trans = fs_info->running_transaction;
+
+    if (btrfs_header_generation(buf) == cur_trans->transid) {
         btrfs_assert_tree_locked(buf);
 
+        if (btrfs_fs_incompat(fs_info, HMZONED) &&
+            list_empty(&buf->release_list)) {
+            atomic_inc(&buf->refs);
+            spin_lock(&cur_trans->releasing_ebs_lock);
+            list_add_tail(&buf->release_list,
+                          &cur_trans->releasing_ebs);
+            spin_unlock(&cur_trans->releasing_ebs_lock);
+        }
+
         if (test_and_clear_bit(EXTENT_BUFFER_DIRTY, &buf->bflags)) {
             percpu_counter_add_batch(&fs_info->dirty_metadata_bytes,
                                      -buf->len,
@@ -4533,6 +4545,15 @@ void btrfs_cleanup_one_transaction(struct btrfs_transaction *cur_trans,
     btrfs_destroy_pinned_extent(fs_info,
                                 fs_info->pinned_extents);
 
+    while (!list_empty(&cur_trans->releasing_ebs)) {
+        struct extent_buffer *eb;
+
+        eb = list_first_entry(&cur_trans->releasing_ebs,
+                              struct extent_buffer, release_list);
+        list_del_init(&eb->release_list);
+        free_extent_buffer(eb);
+    }
+
     cur_trans->state = TRANS_STATE_COMPLETED;
     wake_up(&cur_trans->commit_wait);
 }
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 13fca7bfc1f2..c73c69e2bef4 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -4816,6 +4816,7 @@ __alloc_extent_buffer(struct btrfs_fs_info *fs_info, u64 start,
     init_waitqueue_head(&eb->read_lock_wq);
 
     btrfs_leak_debug_add(&eb->leak_list, &buffers);
+    INIT_LIST_HEAD(&eb->release_list);
 
     spin_lock_init(&eb->refs_lock);
     atomic_set(&eb->refs, 1);
diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index aa18a16a6ed7..2987a01f84f9 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -58,6 +58,7 @@ enum {
     EXTENT_BUFFER_IN_TREE,
     /* write IO error */
     EXTENT_BUFFER_WRITE_ERR,
+    EXTENT_BUFFER_NO_CHECK,
 };
 
 /* these are flags for __process_pages_contig */
@@ -186,6 +187,7 @@ struct extent_buffer {
      */
     wait_queue_head_t read_lock_wq;
     struct page *pages[INLINE_EXTENT_BUFFER_PAGES];
+    struct list_head release_list;
 #ifdef CONFIG_BTRFS_DEBUG
     atomic_t spinning_writers;
     atomic_t spinning_readers;
diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index 3f6811cdf803..ded40ad75419 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -236,6 +236,8 @@ static noinline int join_transaction(struct btrfs_fs_info *fs_info,
     spin_lock_init(&cur_trans->dirty_bgs_lock);
     INIT_LIST_HEAD(&cur_trans->deleted_bgs);
     spin_lock_init(&cur_trans->dropped_roots_lock);
+    INIT_LIST_HEAD(&cur_trans->releasing_ebs);
+    spin_lock_init(&cur_trans->releasing_ebs_lock);
     list_add_tail(&cur_trans->list, &fs_info->trans_list);
     extent_io_tree_init(fs_info, &cur_trans->dirty_pages,
                         IO_TREE_TRANS_DIRTY_PAGES, fs_info->btree_inode);
@@ -2219,7 +2221,31 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans)
 
     wake_up(&fs_info->transaction_wait);
 
+    if (btrfs_fs_incompat(fs_info, HMZONED)) {
+        struct extent_buffer *eb;
+
+        list_for_each_entry(eb, &cur_trans->releasing_ebs,
+                            release_list) {
+            struct btrfs_block_group_cache *cache;
+
+            cache = btrfs_lookup_block_group(fs_info, eb->start);
+            if (!cache)
+                continue;
+            mutex_lock(&cache->submit_lock);
+            if (cache->alloc_type == BTRFS_ALLOC_SEQ &&
+                cache->submit_offset <= eb->start &&
+                !extent_buffer_under_io(eb)) {
+                set_extent_buffer_dirty(eb);
+                cache->space_info->bytes_readonly += eb->len;
+                set_bit(EXTENT_BUFFER_NO_CHECK, &eb->bflags);
+            }
+            mutex_unlock(&cache->submit_lock);
+            btrfs_put_block_group(cache);
+        }
+    }
+
     ret = btrfs_write_and_wait_transaction(trans);
+
     if (ret) {
         btrfs_handle_fs_error(fs_info, ret,
                               "Error while writing out transaction");
@@ -2227,6 +2253,15 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans)
         goto scrub_continue;
     }
 
+    while (!list_empty(&cur_trans->releasing_ebs)) {
+        struct extent_buffer *eb;
+
+        eb = list_first_entry(&cur_trans->releasing_ebs,
+                              struct extent_buffer, release_list);
+        list_del_init(&eb->release_list);
+        free_extent_buffer(eb);
+    }
+
     ret = write_all_supers(fs_info, 0);
     /*
      * the super is written, we can safely allow the tree-loggers
diff --git a/fs/btrfs/transaction.h b/fs/btrfs/transaction.h
index 78c446c222b7..7984a7f01dd8 100644
--- a/fs/btrfs/transaction.h
+++ b/fs/btrfs/transaction.h
@@ -85,6 +85,9 @@ struct btrfs_transaction {
     spinlock_t dropped_roots_lock;
     struct btrfs_delayed_ref_root delayed_refs;
     struct btrfs_fs_info *fs_info;
+
+    spinlock_t releasing_ebs_lock;
+    struct list_head releasing_ebs;
 };
 
 #define __TRANS_FREEZABLE (1U << 0)

From patchwork Fri Jun 7 13:10:21 2019
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 10981733
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, David Sterba
Cc: Chris Mason, Josef Bacik, Qu Wenruo, Nikolay Borisov,
 linux-kernel@vger.kernel.org, Hannes Reinecke, linux-fsdevel@vger.kernel.org,
 Damien Le Moal, Matias Bjørling, Johannes Thumshirn, Bart Van Assche,
 Naohiro Aota
Subject: [PATCH 15/19] btrfs: reset zones of unused block groups
Date: Fri, 7 Jun 2019 22:10:21 +0900
Message-Id: <20190607131025.31996-16-naohiro.aota@wdc.com>
In-Reply-To: <20190607131025.31996-1-naohiro.aota@wdc.com>
References: <20190607131025.31996-1-naohiro.aota@wdc.com>

In an HMZONED volume, a block group maps to a zone of the device. When an
unused block group is deleted, its zone can be reset, rewinding the zone
write pointer back to the start of the zone and making the space reusable.
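At the block layer such a reset is a single request, and the same operation
is reachable from userspace through the stock zoned-block-device ioctl. A
minimal sketch of the userspace analogue of the blkdev_reset_zones() call
this patch issues; the device path and zone geometry are assumptions for
illustration:

#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/blkzoned.h>

int main(void)
{
	int fd = open("/dev/sdz", O_RDWR);	/* hypothetical device */
	if (fd < 0) {
		perror("open");
		return 1;
	}

	struct blk_zone_range range = {
		.sector     = 524288,	/* zone start, in 512B sectors */
		.nr_sectors = 524288,	/* one 256MiB zone (assumed)   */
	};

	/* rewinds the write pointer to the start of the zone */
	if (ioctl(fd, BLKRESETZONE, &range) < 0)
		perror("BLKRESETZONE");

	close(fd);
	return 0;
}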
Signed-off-by: Naohiro Aota
---
 fs/btrfs/extent-tree.c | 23 ++++++++++++++++++++++-
 1 file changed, 22 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index cb29a96c226b..ff4d55d6ef04 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -2018,6 +2018,26 @@ int btrfs_discard_extent(struct btrfs_fs_info *fs_info, u64 bytenr,
             ASSERT(btrfs_test_opt(fs_info, DEGRADED));
             continue;
         }
+
+        if (btrfs_dev_is_sequential(stripe->dev,
+                                    stripe->physical) &&
+            stripe->length == stripe->dev->zone_size) {
+            ret = blkdev_reset_zones(stripe->dev->bdev,
+                                     stripe->physical >>
+                                         SECTOR_SHIFT,
+                                     stripe->length >>
+                                         SECTOR_SHIFT,
+                                     GFP_NOFS);
+            if (!ret)
+                discarded_bytes += stripe->length;
+            else
+                break;
+            set_bit(stripe->physical >>
+                        stripe->dev->zone_size_shift,
+                    stripe->dev->empty_zones);
+            continue;
+        }
+
         req_q = bdev_get_queue(stripe->dev->bdev);
         if (!blk_queue_discard(req_q))
             continue;
@@ -11430,7 +11450,8 @@ void btrfs_delete_unused_bgs(struct btrfs_fs_info *fs_info)
         spin_unlock(&space_info->lock);
 
         /* DISCARD can flip during remount */
-        trimming = btrfs_test_opt(fs_info, DISCARD);
+        trimming = btrfs_test_opt(fs_info, DISCARD) ||
+                   btrfs_fs_incompat(fs_info, HMZONED);
 
         /* Implicit trim during transaction commit. */
         if (trimming)

From patchwork Fri Jun 7 13:10:22 2019
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 10981731
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, David Sterba
Cc: Chris Mason, Josef Bacik, Qu Wenruo, Nikolay Borisov,
 linux-kernel@vger.kernel.org, Hannes Reinecke, linux-fsdevel@vger.kernel.org,
 Damien Le Moal, Matias Bjørling, Johannes Thumshirn, Bart Van Assche,
 Naohiro Aota
Subject: [PATCH 16/19] btrfs: wait existing extents before truncating
Date: Fri, 7 Jun 2019 22:10:22 +0900
Message-Id: <20190607131025.31996-17-naohiro.aota@wdc.com>
In-Reply-To: <20190607131025.31996-1-naohiro.aota@wdc.com>
References: <20190607131025.31996-1-naohiro.aota@wdc.com>

When truncating a file, file buffers that have already been allocated but
not yet written may be truncated. Truncating these buffers can break the
sequential write pattern in a block group, for example when the truncated
blocks are followed by blocks allocated to another file. To avoid this
problem, always wait for the write-out of all unwritten buffers before
proceeding with the truncate.
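The hunk below rounds the start of the wait range down to a sector boundary
with a mask. A quick self-contained check of that arithmetic (a 4KiB sector
size is assumed here):

#include <assert.h>

int main(void)
{
	unsigned long long sectorsize = 4096;
	unsigned long long sectormask = sectorsize - 1;	/* 0xfff */

	/* (size & ~(sectorsize - 1)) clears the sub-sector bits */
	assert((8192ULL  & ~sectormask) == 8192);	/* already aligned */
	assert((8193ULL  & ~sectormask) == 8192);	/* rounded down    */
	assert((12287ULL & ~sectormask) == 8192);
	return 0;
}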
Signed-off-by: Naohiro Aota
---
 fs/btrfs/inode.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 89542c19d09e..4e8c7921462f 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -5137,6 +5137,17 @@ static int btrfs_setsize(struct inode *inode, struct iattr *attr)
         btrfs_end_write_no_snapshotting(root);
         btrfs_end_transaction(trans);
     } else {
+        struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
+
+        if (btrfs_fs_incompat(fs_info, HMZONED)) {
+            u64 sectormask = fs_info->sectorsize - 1;
+
+            ret = btrfs_wait_ordered_range(inode,
+                                           newsize & (~sectormask),
+                                           (u64)-1);
+            if (ret)
+                return ret;
+        }
 
         /*
          * We're truncating a file that used to have good data down to

From patchwork Fri Jun 7 13:10:23 2019
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 10981729
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, David Sterba
Cc: Chris Mason, Josef Bacik, Qu Wenruo, Nikolay Borisov,
 linux-kernel@vger.kernel.org, Hannes Reinecke, linux-fsdevel@vger.kernel.org,
 Damien Le Moal, Matias Bjørling, Johannes Thumshirn, Bart Van Assche,
 Naohiro Aota
Subject: [PATCH 17/19] btrfs: shrink delayed allocation size in HMZONED mode
Date: Fri, 7 Jun 2019 22:10:23 +0900
Message-Id: <20190607131025.31996-18-naohiro.aota@wdc.com>
In-Reply-To: <20190607131025.31996-1-naohiro.aota@wdc.com>
References: <20190607131025.31996-1-naohiro.aota@wdc.com>

In a write-heavy workload, the following scenario can occur:

1. Pages #0 to #2 (and their corresponding extent region) are marked
   dirty and become candidates for delayed allocation:

   pages    0 1 2 3 4
   dirty    o o o - -
   towrite  - - - - -
   delalloc o o o - -

2. extent_write_cache_pages() marks the dirty pages as TOWRITE:

   pages    0 1 2 3 4
   dirty    o o o - -
   towrite  o o o - -
   delalloc o o o - -

3. Meanwhile, another write dirties page #3 and page #4:

   pages    0 1 2 3 4
   dirty    o o o o o
   towrite  o o o - -
   delalloc o o o o o

4. find_lock_delalloc_range() decides to allocate a region covering
   page #0 to page #4.

5. But extent_write_cache_pages() only initiates writes to the
   TOWRITE-tagged pages (#0 to #2).

The process above leaves page #3 and page #4 behind. Usually the periodic
dirty flush kicks off the write I/Os for pages #3 and #4. However, if we
try to mount a subvolume at this point, the mount process takes the
s_umount write lock, which blocks the periodic flush from coming in.

To deal with the problem, shrink the delayed allocation region to cover
only the pages that are expected to be written.
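The fix amounts to scanning the delalloc page range and cutting it at the
first page that writeback will not submit. A simplified, self-contained
model of that trim (plain arrays stand in for the page-cache xarray and its
TOWRITE tags; this is not btrfs code):

#include <stdbool.h>
#include <stdio.h>

/* Trim the delalloc range end to the last leading page that is
 * actually tagged TOWRITE, so no allocated-but-unqueued pages are
 * left behind. Mirrors the xa_get_mark() scan in the hunk below. */
static unsigned long shrink_delalloc(unsigned long start, unsigned long end,
				     const bool *towrite)
{
	unsigned long i;

	for (i = start; i <= end; i++)
		if (!towrite[i])	/* first page writeback won't submit */
			break;

	/* keep at least the first page, as the real code does */
	return (i > start) ? i - 1 : start;
}

int main(void)
{
	/* pages 0..4: only 0..2 are tagged TOWRITE (step 2 vs. step 3) */
	bool towrite[5] = { true, true, true, false, false };

	printf("delalloc end shrinks from 4 to %lu\n",
	       shrink_delalloc(0, 4, towrite));	/* prints 2 */
	return 0;
}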
Signed-off-by: Naohiro Aota
---
 fs/btrfs/extent_io.c | 27 +++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index c73c69e2bef4..ea582ff85c73 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -3310,6 +3310,33 @@ static noinline_for_stack int writepage_delalloc(struct inode *inode,
             delalloc_start = delalloc_end + 1;
             continue;
         }
+
+        if (btrfs_fs_incompat(btrfs_sb(inode->i_sb), HMZONED) &&
+            (wbc->sync_mode == WB_SYNC_ALL || wbc->tagged_writepages) &&
+            ((delalloc_start >> PAGE_SHIFT) <
+             (delalloc_end >> PAGE_SHIFT))) {
+            unsigned long i;
+            unsigned long end_index = delalloc_end >> PAGE_SHIFT;
+
+            for (i = delalloc_start >> PAGE_SHIFT;
+                 i <= end_index; i++)
+                if (!xa_get_mark(&inode->i_mapping->i_pages, i,
+                                 PAGECACHE_TAG_TOWRITE))
+                    break;
+
+            if (i <= end_index) {
+                u64 unlock_start = (u64)i << PAGE_SHIFT;
+
+                if (i == delalloc_start >> PAGE_SHIFT)
+                    unlock_start += PAGE_SIZE;
+
+                unlock_extent(tree, unlock_start, delalloc_end);
+                __unlock_for_delalloc(inode, page, unlock_start,
+                                      delalloc_end);
+                delalloc_end = unlock_start - 1;
+            }
+        }
+
         ret = btrfs_run_delalloc_range(inode, page, delalloc_start,
                 delalloc_end, &page_started, nr_written, wbc);
         /* File system has been set read-only */

From patchwork Fri Jun 7 13:10:24 2019
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 10981727
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, David Sterba
Cc: Chris Mason, Josef Bacik, Qu Wenruo, Nikolay Borisov,
 linux-kernel@vger.kernel.org, Hannes Reinecke, linux-fsdevel@vger.kernel.org,
 Damien Le Moal, Matias Bjørling, Johannes Thumshirn, Bart Van Assche,
 Naohiro Aota
Subject: [PATCH 18/19] btrfs: support dev-replace in HMZONED mode
Date: Fri, 7 Jun 2019 22:10:24 +0900
Message-Id: <20190607131025.31996-19-naohiro.aota@wdc.com>
In-Reply-To: <20190607131025.31996-1-naohiro.aota@wdc.com>
References: <20190607131025.31996-1-naohiro.aota@wdc.com>

Currently, dev-replace copies all the device extents on the source device
to the target device, and it also clones new incoming write I/Os from
users to the source device into the target device. Cloning incoming I/Os
can break the sequential write rule on the target device: when a write is
mapped into the middle of a block group, that I/O is directed into the
middle of a zone of the target device. However, the cloning cannot simply
be disabled, since incoming I/Os targeting already-copied device extents
must be cloned so that the I/O is executed on the target device as well.

We cannot use dev_replace->cursor_{left,right} to determine whether a bio
targets a not-yet-copied region. Because there is a time gap between
finishing btrfs_scrub_dev() and rewriting the mapping tree in
btrfs_dev_replace_finishing(), we can have a newly allocated device extent
which is never cloned (by handle_ops_on_dev_replace) nor copied (by the
dev-replace process). The point, then, is to copy only the device extents
that already exist. This patch introduces mark_block_group_to_copy() to
mark existing block groups as targets of copying; handle_ops_on_dev_replace()
and the dev-replace process check the flag to do their jobs.

This patch also handles the empty regions between used extents. Since
dev-replace is smart enough to copy only the used extents on the source
device, we have to fill the gaps to honor the sequential write rule on the
target device.
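The gap filling described above reduces to write-pointer bookkeeping on the
target device: before copying an extent at physical offset P, any hole
between the zone's current write pointer and P must be written with zeros.
A hedged sketch of that rule (fill_zeros() is a stand-in for
blkdev_issue_zeroout(), and the extent layout is made up):

#include <stdio.h>

static void fill_zeros(unsigned long long from, unsigned long long len)
{
	printf("zeroout [%llu, %llu)\n", from, from + len);
}

int main(void)
{
	unsigned long long write_pointer = 0;	/* target zone, relative */
	/* used extents copied from the source, in address order */
	struct { unsigned long long off, len; } extents[] = {
		{ 0, 4096 }, { 16384, 8192 }, { 65536, 4096 },
	};

	for (int i = 0; i < 3; i++) {
		/* a hole before this extent would strand the write
		 * pointer, so it must be zero-filled first */
		if (extents[i].off > write_pointer)
			fill_zeros(write_pointer,
				   extents[i].off - write_pointer);
		/* ... copy the extent at extents[i].off here ... */
		write_pointer = extents[i].off + extents[i].len;
	}
	return 0;
}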
Signed-off-by: Naohiro Aota
---
 fs/btrfs/ctree.h       |   1 +
 fs/btrfs/dev-replace.c |  96 +++++++++++++++++++++++
 fs/btrfs/extent-tree.c |  32 +++++++-
 fs/btrfs/scrub.c       | 169 +++++++++++++++++++++++++++++++++++++++++
 fs/btrfs/volumes.c     |  27 ++++++-
 5 files changed, 319 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index dad8ea5c3b99..a0be2b96117a 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -639,6 +639,7 @@ struct btrfs_block_group_cache {
     unsigned int has_caching_ctl:1;
     unsigned int removed:1;
     unsigned int wp_broken:1;
+    unsigned int to_copy:1;
 
     int disk_cache_state;
 
diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
index fbe5ea2a04ed..5011b5ce0e75 100644
--- a/fs/btrfs/dev-replace.c
+++ b/fs/btrfs/dev-replace.c
@@ -263,6 +263,13 @@ static int btrfs_init_dev_replace_tgtdev(struct btrfs_fs_info *fs_info,
     device->dev_stats_valid = 1;
     set_blocksize(device->bdev, BTRFS_BDEV_BLOCKSIZE);
     device->fs_devices = fs_info->fs_devices;
+    if (bdev_is_zoned(bdev)) {
+        ret = btrfs_get_dev_zonetypes(device);
+        if (ret) {
+            mutex_unlock(&fs_info->fs_devices->device_list_mutex);
+            goto error;
+        }
+    }
     list_add(&device->dev_list, &fs_info->fs_devices->devices);
     fs_info->fs_devices->num_devices++;
     fs_info->fs_devices->open_devices++;
@@ -396,6 +403,88 @@ static char* btrfs_dev_name(struct btrfs_device *device)
         return rcu_str_deref(device->name);
 }
 
+static int mark_block_group_to_copy(struct btrfs_fs_info *fs_info,
+                                    struct btrfs_device *src_dev)
+{
+    struct btrfs_path *path;
+    struct btrfs_key key;
+    struct btrfs_key found_key;
+    struct btrfs_root *root = fs_info->dev_root;
+    struct btrfs_dev_extent *dev_extent = NULL;
+    struct btrfs_block_group_cache *cache;
+    struct extent_buffer *l;
+    int slot;
+    int ret;
+    u64 chunk_offset, length;
+
+    path = btrfs_alloc_path();
+    if (!path)
+        return -ENOMEM;
+
+    path->reada = READA_FORWARD;
+    path->search_commit_root = 1;
+    path->skip_locking = 1;
+
+    key.objectid = src_dev->devid;
+    key.offset = 0ull;
+    key.type = BTRFS_DEV_EXTENT_KEY;
+
+    while (1) {
+        ret = btrfs_search_slot(NULL, root, &key, path, 0, 0);
+        if (ret < 0)
+            break;
+        if (ret > 0) {
+            if (path->slots[0] >=
+                btrfs_header_nritems(path->nodes[0])) {
+                ret = btrfs_next_leaf(root, path);
+                if (ret < 0)
+                    break;
+                if (ret > 0) {
+                    ret = 0;
+                    break;
+                }
+            } else {
+                ret = 0;
+            }
+        }
+
+        l = path->nodes[0];
+        slot = path->slots[0];
+
+        btrfs_item_key_to_cpu(l, &found_key, slot);
+
+        if (found_key.objectid != src_dev->devid)
+            break;
+
+        if (found_key.type != BTRFS_DEV_EXTENT_KEY)
+            break;
+
+        if (found_key.offset < key.offset)
+            break;
+
+        dev_extent = btrfs_item_ptr(l, slot, struct btrfs_dev_extent);
+        length = btrfs_dev_extent_length(l, dev_extent);
+
+        chunk_offset = btrfs_dev_extent_chunk_offset(l, dev_extent);
+
+        cache = btrfs_lookup_block_group(fs_info, chunk_offset);
+        if (!cache)
+            goto skip;
+
+        cache->to_copy = 1;
+
+        btrfs_put_block_group(cache);
+
+skip:
+        key.offset = found_key.offset + length;
+        btrfs_release_path(path);
+    }
+
+    btrfs_free_path(path);
+
+    return ret;
+}
+
 static int btrfs_dev_replace_start(struct btrfs_fs_info *fs_info,
         const char *tgtdev_name, u64 srcdevid, const char *srcdev_name,
         int read_src)
@@ -439,6 +528,13 @@ static int btrfs_dev_replace_start(struct btrfs_fs_info *fs_info,
     }
 
     need_unlock = true;
+
+    mutex_lock(&fs_info->chunk_mutex);
+    ret = mark_block_group_to_copy(fs_info, src_device);
+    mutex_unlock(&fs_info->chunk_mutex);
+    if (ret)
+        return ret;
+
     down_write(&dev_replace->rwsem);
     switch (dev_replace->replace_state) {
     case BTRFS_IOCTL_DEV_REPLACE_STATE_NEVER_STARTED:
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index ff4d55d6ef04..268365dd9a5d 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -29,6 +29,7 @@
 #include "qgroup.h"
 #include "ref-verify.h"
 #include "rcu-string.h"
+#include "dev-replace.h"
 
 #undef SCRAMBLE_DELAYED_REFS
 
@@ -2022,7 +2023,31 @@ int btrfs_discard_extent(struct btrfs_fs_info *fs_info, u64 bytenr,
         if (btrfs_dev_is_sequential(stripe->dev,
                                     stripe->physical) &&
             stripe->length == stripe->dev->zone_size) {
-            ret = blkdev_reset_zones(stripe->dev->bdev,
+            struct btrfs_device *dev = stripe->dev;
+
+            ret = blkdev_reset_zones(dev->bdev,
+                                     stripe->physical >>
+                                         SECTOR_SHIFT,
+                                     stripe->length >>
+                                         SECTOR_SHIFT,
+                                     GFP_NOFS);
+            if (!ret)
+                discarded_bytes += stripe->length;
+            else
+                break;
+            set_bit(stripe->physical >>
+                        dev->zone_size_shift,
+                    dev->empty_zones);
+
+            if (!btrfs_dev_replace_is_ongoing(
+                        &fs_info->dev_replace) ||
+                stripe->dev != fs_info->dev_replace.srcdev)
+                continue;
+
+            /* send to target as well */
+            dev = fs_info->dev_replace.tgtdev;
+
+            ret = blkdev_reset_zones(dev->bdev,
                                      stripe->physical >>
                                          SECTOR_SHIFT,
                                      stripe->length >>
@@ -2033,8 +2058,9 @@ int btrfs_discard_extent(struct btrfs_fs_info *fs_info, u64 bytenr,
             else
                 break;
             set_bit(stripe->physical >>
-                        stripe->dev->zone_size_shift,
-                    stripe->dev->empty_zones);
+                        dev->zone_size_shift,
+                    dev->empty_zones);
+
             continue;
         }
 
diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index 36ad4fad7eaf..7bfc19c50224 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -165,6 +165,7 @@ struct scrub_ctx {
     int pages_per_rd_bio;
 
     int is_dev_replace;
+    u64 write_pointer;
 
     struct scrub_bio *wr_curr_bio;
     struct mutex wr_lock;
@@ -1646,6 +1647,19 @@ static int scrub_add_page_to_wr_bio(struct scrub_ctx *sctx,
     sbio = sctx->wr_curr_bio;
     if (sbio->page_count == 0) {
         struct bio *bio;
+        u64 physical = spage->physical_for_dev_replace;
+
+        if (btrfs_fs_incompat(sctx->fs_info, HMZONED) &&
+            sctx->write_pointer < physical) {
+            u64 length = physical - sctx->write_pointer;
+
+            ret = blkdev_issue_zeroout(
+                sctx->wr_tgtdev->bdev,
+                sctx->write_pointer >> SECTOR_SHIFT,
+                length >> SECTOR_SHIFT,
+                GFP_NOFS, 0);
+            sctx->write_pointer = physical;
+        }
 
         sbio->physical = spage->physical_for_dev_replace;
         sbio->logical = spage->logical;
@@ -1708,6 +1722,10 @@ static void scrub_wr_submit(struct scrub_ctx *sctx)
      * doubled the write performance on spinning disks when measured
      * with Linux 3.5 */
     btrfsic_submit_bio(sbio->bio);
+
+    if (btrfs_fs_incompat(sctx->fs_info, HMZONED))
+        sctx->write_pointer = sbio->physical +
+            sbio->page_count * PAGE_SIZE;
 }
 
 static void scrub_wr_bio_end_io(struct bio *bio)
@@ -3030,6 +3048,43 @@ static noinline_for_stack int scrub_raid56_parity(struct scrub_ctx *sctx,
     return ret < 0 ?
        ret : 0;
 }
 
+static int read_zone_info(struct btrfs_fs_info *fs_info, u64 logical,
+                          struct blk_zone *zone)
+{
+    struct btrfs_bio *bbio = NULL;
+    u64 mapped_length = PAGE_SIZE;
+    int nmirrors;
+    int i, ret;
+
+    ret = btrfs_map_sblock(fs_info, BTRFS_MAP_GET_READ_MIRRORS, logical,
+                           &mapped_length, &bbio);
+    if (ret || !bbio || mapped_length < PAGE_SIZE) {
+        btrfs_put_bbio(bbio);
+        return -EIO;
+    }
+
+    if (bbio->map_type & BTRFS_BLOCK_GROUP_RAID56_MASK)
+        return -EINVAL;
+
+    nmirrors = min(scrub_nr_raid_mirrors(bbio), BTRFS_MAX_MIRRORS);
+    for (i = 0; i < nmirrors; i++) {
+        u64 physical = bbio->stripes[i].physical;
+        struct btrfs_device *dev = bbio->stripes[i].dev;
+
+        /* missing device */
+        if (!dev->bdev)
+            continue;
+
+        ret = btrfs_get_dev_zone(dev, physical, zone, GFP_NOFS);
+        /* failing device */
+        if (ret == -EIO || ret == -EOPNOTSUPP)
+            continue;
+        break;
+    }
+
+    return ret;
+}
+
 static noinline_for_stack int scrub_stripe(struct scrub_ctx *sctx,
                                            struct map_lookup *map,
                                            struct btrfs_device *scrub_dev,
@@ -3161,6 +3216,15 @@ static noinline_for_stack int scrub_stripe(struct scrub_ctx *sctx,
      */
     blk_start_plug(&plug);
 
+    if (btrfs_fs_incompat(fs_info, HMZONED) && sctx->is_dev_replace &&
+        btrfs_dev_is_sequential(sctx->wr_tgtdev, physical)) {
+        mutex_lock(&sctx->wr_lock);
+        sctx->write_pointer = physical;
+        mutex_unlock(&sctx->wr_lock);
+    }
+
+    sctx->flush_all_writes = true;
+
     /*
      * now find all extents for each stripe and scrub them
      */
@@ -3333,6 +3397,15 @@ static noinline_for_stack int scrub_stripe(struct scrub_ctx *sctx,
             if (ret)
                 goto out;
 
+            sctx->flush_all_writes = true;
+            scrub_submit(sctx);
+            mutex_lock(&sctx->wr_lock);
+            scrub_wr_submit(sctx);
+            mutex_unlock(&sctx->wr_lock);
+
+            wait_event(sctx->list_wait,
+                       atomic_read(&sctx->bios_in_flight) == 0);
+
             if (extent_logical + extent_len <
                 key.objectid + bytes) {
                 if (map->type & BTRFS_BLOCK_GROUP_RAID56_MASK) {
@@ -3400,6 +3473,45 @@ static noinline_for_stack int scrub_stripe(struct scrub_ctx *sctx,
     blk_finish_plug(&plug);
     btrfs_free_path(path);
     btrfs_free_path(ppath);
+
+    if (btrfs_fs_incompat(fs_info, HMZONED) && sctx->is_dev_replace &&
+        ret >= 0) {
+        wait_event(sctx->list_wait,
+                   atomic_read(&sctx->bios_in_flight) == 0);
+
+        mutex_lock(&sctx->wr_lock);
+        if (sctx->write_pointer < physical_end &&
+            btrfs_dev_is_sequential(sctx->wr_tgtdev,
+                                    sctx->write_pointer)) {
+            struct blk_zone zone;
+            u64 wp;
+
+            ret = read_zone_info(fs_info, base + offset, &zone);
+            if (ret) {
+                btrfs_err(fs_info,
+                          "cannot recover write pointer");
+                goto out_zone_sync;
+            }
+
+            wp = map->stripes[num].physical +
+                 ((zone.wp - zone.start) << SECTOR_SHIFT);
+            if (sctx->write_pointer < wp) {
+                u64 length = wp - sctx->write_pointer;
+
+                ret = blkdev_issue_zeroout(
+                    sctx->wr_tgtdev->bdev,
+                    sctx->write_pointer >> SECTOR_SHIFT,
+                    length >> SECTOR_SHIFT,
+                    GFP_NOFS, 0);
+            }
+        }
+out_zone_sync:
+        mutex_unlock(&sctx->wr_lock);
+        clear_bit(map->stripes[num].physical >>
+                      sctx->wr_tgtdev->zone_size_shift,
+                  sctx->wr_tgtdev->empty_zones);
+    }
+
     return ret < 0 ?
        ret : 0;
 }
 
@@ -3468,11 +3580,14 @@ int scrub_enumerate_chunks(struct scrub_ctx *sctx,
     int ret = 0;
     int ro_set;
     int slot;
+    int i, num_extents, cur_extent;
     struct extent_buffer *l;
     struct btrfs_key key;
     struct btrfs_key found_key;
     struct btrfs_block_group_cache *cache;
     struct btrfs_dev_replace *dev_replace = &fs_info->dev_replace;
+    struct extent_map *em;
+    struct map_lookup *map;
 
     path = btrfs_alloc_path();
     if (!path)
@@ -3487,6 +3602,23 @@ int scrub_enumerate_chunks(struct scrub_ctx *sctx,
     key.type = BTRFS_DEV_EXTENT_KEY;
 
     while (1) {
+        if (btrfs_fs_incompat(fs_info, HMZONED) &&
+            sctx->is_dev_replace) {
+            struct btrfs_trans_handle *trans;
+
+            scrub_pause_on(fs_info);
+            trans = btrfs_join_transaction(root);
+            if (IS_ERR(trans))
+                ret = PTR_ERR(trans);
+            else
+                ret = btrfs_commit_transaction(trans);
+            if (ret) {
+                scrub_pause_off(fs_info);
+                break;
+            }
+            scrub_pause_off(fs_info);
+        }
+
         ret = btrfs_search_slot(NULL, root, &key, path, 0, 0);
         if (ret < 0)
             break;
@@ -3541,6 +3673,11 @@ int scrub_enumerate_chunks(struct scrub_ctx *sctx,
         if (!cache)
             goto skip;
 
+        if (sctx->is_dev_replace && !cache->to_copy) {
+            ro_set = 0;
+            goto done;
+        }
+
         /*
          * we need call btrfs_inc_block_group_ro() with scrubs_paused,
          * to avoid deadlock caused by:
@@ -3651,6 +3788,38 @@ int scrub_enumerate_chunks(struct scrub_ctx *sctx,
 
         scrub_pause_off(fs_info);
 
+        if (sctx->is_dev_replace) {
+            em = btrfs_get_chunk_map(fs_info, chunk_offset, 1);
+            BUG_ON(IS_ERR(em));
+            map = em->map_lookup;
+
+            num_extents = cur_extent = 0;
+            for (i = 0; i < map->num_stripes; i++) {
+                /* we have more device extent to copy */
+                if (dev_replace->srcdev != map->stripes[i].dev)
+                    continue;
+
+                num_extents++;
+                if (found_key.offset ==
+                    map->stripes[i].physical)
+                    cur_extent = i;
+            }
+
+            free_extent_map(em);
+
+            if (num_extents > 1) {
+                if (cur_extent == 0) {
+                    btrfs_inc_block_group_ro(cache);
+                } else if (cur_extent == num_extents - 1) {
+                    btrfs_dec_block_group_ro(cache);
+                    cache->to_copy = 0;
+                }
+            } else {
+                cache->to_copy = 0;
+            }
+        }
+
+done:
         down_write(&fs_info->dev_replace.rwsem);
         dev_replace->cursor_left = dev_replace->cursor_right;
         dev_replace->item_needs_writeback = 1;
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index a04379e440fb..e0a37466bb2d 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1841,6 +1841,8 @@ int find_free_dev_extent_start(struct btrfs_device *device, u64 num_bytes,
     else
         search_start = max_t(u64, search_start, SZ_1M);
 
+    WARN_ON(device->zone_size && !IS_ALIGNED(num_bytes, device->zone_size));
+
     path = btrfs_alloc_path();
     if (!path)
         return -ENOMEM;
@@ -6180,6 +6182,7 @@ static int get_extra_mirror_from_replace(struct btrfs_fs_info *fs_info,
 static void handle_ops_on_dev_replace(enum btrfs_map_op op,
                                       struct btrfs_bio **bbio_ret,
                                       struct btrfs_dev_replace *dev_replace,
+                                      u64 logical,
                                       int *num_stripes_ret,
                                       int *max_errors_ret)
 {
     struct btrfs_bio *bbio = *bbio_ret;
@@ -6190,7 +6193,18 @@ static void handle_ops_on_dev_replace(enum btrfs_map_op op,
     int i;
 
     if (op == BTRFS_MAP_WRITE) {
+        struct btrfs_block_group_cache *cache;
+        struct btrfs_fs_info *fs_info = dev_replace->srcdev->fs_info;
         int index_where_to_add;
+        int hmzoned = btrfs_fs_incompat(fs_info, HMZONED);
+
+        cache = btrfs_lookup_block_group(fs_info, logical);
+        BUG_ON(!cache);
+        if (hmzoned && cache->to_copy) {
+            btrfs_put_block_group(cache);
+            return;
+        }
+        btrfs_put_block_group(cache);
 
         /*
          * duplicate the write operations while the dev replace
@@ -6215,10 +6229,17 @@ static void handle_ops_on_dev_replace(enum btrfs_map_op op,
             new->physical = old->physical;
             new->length = old->length;
             new->dev = dev_replace->tgtdev;
-            bbio->tgtdev_map[i] = index_where_to_add;
+            bbio->tgtdev_map[i] =
+                index_where_to_add;
             index_where_to_add++;
             max_errors++;
             tgtdev_indexes++;
+
+            /* mark this zone as non-empty */
+            if (hmzoned)
+                clear_bit(new->physical >>
+                              new->dev->zone_size_shift,
+                          new->dev->empty_zones);
         }
     }
     num_stripes = index_where_to_add;
@@ -6551,8 +6572,8 @@ static int __btrfs_map_block(struct btrfs_fs_info *fs_info,
 
     if (dev_replace_is_ongoing && dev_replace->tgtdev != NULL &&
         need_full_stripe(op)) {
-        handle_ops_on_dev_replace(op, &bbio, dev_replace, &num_stripes,
-                                  &max_errors);
+        handle_ops_on_dev_replace(op, &bbio, dev_replace, logical,
+                                  &num_stripes, &max_errors);
     }
 
     *bbio_ret = bbio;

From patchwork Fri Jun 7 13:10:25 2019
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 10981721
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, David Sterba
Cc: Chris Mason, Josef Bacik, Qu Wenruo, Nikolay Borisov,
 linux-kernel@vger.kernel.org, Hannes Reinecke, linux-fsdevel@vger.kernel.org,
 Damien Le Moal, Matias Bjørling, Johannes Thumshirn, Bart Van Assche,
 Naohiro Aota
Subject: [PATCH 19/19] btrfs: enable to mount HMZONED incompat flag
Date: Fri, 7 Jun 2019 22:10:25 +0900
Message-Id: <20190607131025.31996-20-naohiro.aota@wdc.com>
In-Reply-To: <20190607131025.31996-1-naohiro.aota@wdc.com>
References: <20190607131025.31996-1-naohiro.aota@wdc.com>

This final patch adds the HMZONED incompat flag to
BTRFS_FEATURE_INCOMPAT_SUPP and enables btrfs to mount a file system that
has the HMZONED flag set.

Signed-off-by: Naohiro Aota
---
 fs/btrfs/ctree.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index a0be2b96117a..b30af9bbf22f 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -285,7 +285,8 @@ struct btrfs_super_block {
      BTRFS_FEATURE_INCOMPAT_EXTENDED_IREF |    \
      BTRFS_FEATURE_INCOMPAT_SKINNY_METADATA |  \
      BTRFS_FEATURE_INCOMPAT_NO_HOLES |         \
-     BTRFS_FEATURE_INCOMPAT_METADATA_UUID)
+     BTRFS_FEATURE_INCOMPAT_METADATA_UUID |    \
+     BTRFS_FEATURE_INCOMPAT_HMZONED)
 
 #define BTRFS_FEATURE_INCOMPAT_SAFE_SET        \
     (BTRFS_FEATURE_INCOMPAT_EXTENDED_IREF)
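With the flag in BTRFS_FEATURE_INCOMPAT_SUPP, a kernel carrying this series
can mount such a filesystem, and userspace can confirm the bit with the
stock feature-flags ioctl. A minimal sketch; note that the HMZONED bit
value (1ULL << 11) is the one defined by this series, not yet part of the
upstream uapi header, and the mount point is a placeholder:

#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/btrfs.h>

#ifndef BTRFS_FEATURE_INCOMPAT_HMZONED
#define BTRFS_FEATURE_INCOMPAT_HMZONED (1ULL << 11)	/* from this series */
#endif

int main(void)
{
	struct btrfs_ioctl_feature_flags flags;
	int fd = open("/mnt/btrfs", O_RDONLY);	/* hypothetical mount */

	if (fd < 0 || ioctl(fd, BTRFS_IOC_GET_FEATURES, &flags) < 0) {
		perror("BTRFS_IOC_GET_FEATURES");
		return 1;
	}
	printf("HMZONED: %s\n",
	       (flags.incompat_flags & BTRFS_FEATURE_INCOMPAT_HMZONED) ?
	       "enabled" : "not set");
	close(fd);
	return 0;
}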