From patchwork Fri Jul 8 23:18:38 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Naohiro Aota X-Patchwork-Id: 12911950 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 55D1CCCA482 for ; Fri, 8 Jul 2022 23:19:11 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238889AbiGHXTK (ORCPT ); Fri, 8 Jul 2022 19:19:10 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53454 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S236955AbiGHXTI (ORCPT ); Fri, 8 Jul 2022 19:19:08 -0400 Received: from esa4.hgst.iphmx.com (esa4.hgst.iphmx.com [216.71.154.42]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 02B0F41985; Fri, 8 Jul 2022 16:19:07 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=wdc.com; i=@wdc.com; q=dns/txt; s=dkim.wdc.com; t=1657322347; x=1688858347; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=46bgUsnJhvod1OQcMqcmVs+X0PGcZlNGXTNYwMX0W94=; b=hNRvWzwvVUPk2t02rbwCV1HeOkejm9uyCpVCwGVMl5gH9W3T0ZiCXHaK l9No0lduZKV8buX297E88FgP60G0TVFi6EndgInPzbPBJGeMnFj11NbYS Kj0gs294rPzzwIU3WJlcalnDJm4UMKymcwnFx471Baq+Pf8XjLsGqkhOo 9Y5FwmR1OTjVdB35lfDnI8DkUeQWFJO0eJxmkMGt1DcrmBjAl7Qidjlp/ VVcQq9+lwikmXWTjlK99tkH623nJgiiHPtuc6ISSZz6NqFjzeM9lnt4Ju FggP2ZJHs3XRBiZlM0qx+LGWMwUfe4n87BOyV//lZBXFjkzzG98a1nIk1 A==; X-IronPort-AV: E=Sophos;i="5.92,256,1650902400"; d="scan'208";a="203871811" Received: from h199-255-45-15.hgst.com (HELO uls-op-cesaep02.wdc.com) ([199.255.45.15]) by ob1.hgst.iphmx.com with ESMTP; 09 Jul 2022 07:19:06 +0800 IronPort-SDR: yYEmo4hE2XNCCGMBYB4uZezzR957IS+GCGTPMc+eRvKqixVZUeXXdxss2wuAORc2guI7I+QPFF 
f+7ydgGgSVEYjaEVx69BZ7CIO88NUDwFFYrWcokf8ypssXIPXzH7Bj/jgy468dT2PzR+uoBR9I 5xkNNhHQ8QWXr21uAKcLuiKFJEHnkw/C4ZmH8r1OdguXMrwlXvSTBTXxNzNYAyLaBCRjnivLhH MV5wfucTp3Ek4VLR3f8j1ekrFF0a+TFfY1iz8XOo1IPq7VgjF4wnyPtGDryeFwccwtcm21zSxL RAxU710HL9/0UAy9YmbvR5rM Received: from uls-op-cesaip01.wdc.com ([10.248.3.36]) by uls-op-cesaep02.wdc.com with ESMTP/TLS/ECDHE-RSA-AES128-GCM-SHA256; 08 Jul 2022 15:36:11 -0700 IronPort-SDR: JcLMt+KdjvQG4Y9LpYB39ypAVHoA8RfHjdysTxMnETop4io0fG1IIHLBlBTN6oZahaloZ9+vDq lTjvs79JQeExkGiwltBGxv4M/DFGZdGuUemfHlIkr92GArMkljdDvn24Og6Yo3MIO5E5/trhBO /CZYLXeg9fNRvnr2cjVwCj226HHOGlJi0hHoVTzYyAqG5aTqL8CIE9mOEc/oAOZBuEy63o4coY VX6KIT3bx7fESy/ngYCJGQzh8LR7S13hmgSeykonxaFAEJygbr0j9a867s23a0OepeJ7euYUXb FXo= WDCIronportException: Internal Received: from phd010370.ad.shared (HELO naota-xeon.wdc.com) ([10.225.55.250]) by uls-op-cesaip01.wdc.com with ESMTP; 08 Jul 2022 16:19:05 -0700 From: Naohiro Aota To: linux-btrfs@vger.kernel.org Cc: linux-block@vger.kernel.org, Naohiro Aota , Johannes Thumshirn Subject: [PATCH 01/13] block: add bdev_max_segments() helper Date: Sat, 9 Jul 2022 08:18:38 +0900 Message-Id: X-Mailer: git-send-email 2.35.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org Add bdev_max_segments() like other queue parameters. 
Signed-off-by: Naohiro Aota
Reviewed-by: Johannes Thumshirn
Reviewed-by: Jens Axboe
Reviewed-by: Christoph Hellwig
---
 include/linux/blkdev.h | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 2f7b43444c5f..62e3ff52ab03 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1206,6 +1206,11 @@ bdev_max_zone_append_sectors(struct block_device *bdev)
 	return queue_max_zone_append_sectors(bdev_get_queue(bdev));
 }
 
+static inline unsigned int bdev_max_segments(struct block_device *bdev)
+{
+	return queue_max_segments(bdev_get_queue(bdev));
+}
+
 static inline unsigned queue_logical_block_size(const struct request_queue *q)
 {
 	int retval = 512;

From patchwork Fri Jul 8 23:18:39 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 12911951
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
 aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
 by smtp.lore.kernel.org (Postfix) with ESMTP id 2581DCCA483
 for ; Fri, 8 Jul 2022 23:19:12 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
 id S238897AbiGHXTL (ORCPT ); Fri, 8 Jul 2022 19:19:11 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53480 "EHLO
 lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
 with ESMTP id S238853AbiGHXTK (ORCPT ); Fri, 8 Jul 2022 19:19:10 -0400
Received: from esa4.hgst.iphmx.com (esa4.hgst.iphmx.com [216.71.154.42])
 by lindbergh.monkeyblade.net (Postfix) with ESMTPS id A45C14198F;
 Fri, 8 Jul 2022 16:19:09 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple;
 d=wdc.com; i=@wdc.com; q=dns/txt; s=dkim.wdc.com;
 t=1657322349; x=1688858349;
 h=from:to:cc:subject:date:message-id:in-reply-to:
 references:mime-version:content-transfer-encoding;
 bh=9+7G1x4we068kq8mY96PXl2+JAjJjjn2izwbzHLi14c=;
b=IeqRXvad8iVSz2sVYT6bOtwQrWRekobi2QtGSrMPuQiAzbwnMWwV5I2/ BwJus1YXiUaj1ZSABQmiXw0gv6WRelK507HK1EboVSwHDf9a8tTnWeU/D vmtMNVnPzrpaADdrA/b918MByYlQHQqdrT49PCZA3SQLA9doD3FB+eW0E TMgQvOfS2OxOXIsStXPxvghcBTHqK+uGF5hKCJDs03y9SgwemSNJCUdvS YQdcrs8F/DBf6tcuTiaDu3iQtAP/3czkRVXEA6U89Ylt4yiCamRgEF7VT XauEhWYpw2PWNwpSnGHdglVhZwdRRfblFiQWUG3gkgPkVPS5LBw2RwcDB Q==; X-IronPort-AV: E=Sophos;i="5.92,256,1650902400"; d="scan'208";a="203871812" Received: from h199-255-45-15.hgst.com (HELO uls-op-cesaep02.wdc.com) ([199.255.45.15]) by ob1.hgst.iphmx.com with ESMTP; 09 Jul 2022 07:19:07 +0800 IronPort-SDR: If0k3NmXCPOzuJ6HvZluX7vqpzFvAHbBEe4iw+ghZZ5mS6aQ2yE1qg6atINM8K6ETpV7Nj5huX ZMC4u1wwyc6sogOZQYjHFVihlcFWdgb/w/Lbr6vjk65TUw59/1FHdpXiI+pjdAabcV0ftPySmE KdnmxEQN9LjbDp777/ZPk72eXE/e4XCeV8l42GEN7arvM9OAIW6huAgAXYZNZ61SK4AuCgBHcC F+a3hMhZhLrgiGHFUTdbsqVWmZhdv9KM7LvCcseVrk6p4z21BNR1KR8S0Ymz4/39PyQt68puYF vQXDcnEOcMZasN14DfD0KU83 Received: from uls-op-cesaip01.wdc.com ([10.248.3.36]) by uls-op-cesaep02.wdc.com with ESMTP/TLS/ECDHE-RSA-AES128-GCM-SHA256; 08 Jul 2022 15:36:13 -0700 IronPort-SDR: US9+dHhFeROQyZEuvJbpQ9CmrcVrEGWOanTOD5sv6ccG+B9xtV2SuBW0+uUNuzFr8DBsd9z8k4 EJ2IAXicPVhGsCihdDWlMq5cC8Ba0Ke8IeJprKx9q5CDHtlruIocqZbcgyP+OVtYIsQcjtUtWy XQbV26v8gZhzj9JwbYxMxFsuRQfQhCEWHL8JZNthEShcQcOo73TOoPQGRIVtLWuIQkkKVOWwy0 hURmeHKsY9j9Dy4Fs17QuRzqM6BO6jszBLxPQ9Qr7mKi78Fu3PtjeGcn6KpSdRiskPHEdbRh1U T00= WDCIronportException: Internal Received: from phd010370.ad.shared (HELO naota-xeon.wdc.com) ([10.225.55.250]) by uls-op-cesaip01.wdc.com with ESMTP; 08 Jul 2022 16:19:06 -0700 From: Naohiro Aota To: linux-btrfs@vger.kernel.org Cc: linux-block@vger.kernel.org, Naohiro Aota Subject: [PATCH 02/13] btrfs: zoned: revive max_zone_append_bytes Date: Sat, 9 Jul 2022 08:18:39 +0900 Message-Id: X-Mailer: git-send-email 2.35.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org This patch is basically a revert of commit 5a80d1c6a270 ("btrfs: zoned: 
remove max_zone_append_size logic"), but without unnecessary ASSERT and check. The max_zone_append_size will be used as a hint to estimate the number of extents to cover delalloc/writeback region in the later commits. The size of a ZONE APPEND bio is also limited by queue_max_segments(), so this commit considers it to calculate max_zone_append_size. Technically, a bio can be larger than queue_max_segments() * PAGE_SIZE if the pages are contiguous. But, it is safe to consider "queue_max_segments() * PAGE_SIZE" as an upper limit of an extent size to calculate the number of extents needed to write data. Signed-off-by: Naohiro Aota Reviewed-by: Johannes Thumshirn --- fs/btrfs/ctree.h | 2 ++ fs/btrfs/zoned.c | 17 +++++++++++++++++ fs/btrfs/zoned.h | 1 + 3 files changed, 20 insertions(+) diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index 4e2569f84aab..e4879912c475 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -1071,6 +1071,8 @@ struct btrfs_fs_info { */ u64 zone_size; + /* Max size to emit ZONE_APPEND write command */ + u64 max_zone_append_size; struct mutex zoned_meta_io_lock; spinlock_t treelog_bg_lock; u64 treelog_bg; diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c index 79a2d48a5251..bdc533fa80ae 100644 --- a/fs/btrfs/zoned.c +++ b/fs/btrfs/zoned.c @@ -415,6 +415,16 @@ int btrfs_get_dev_zone_info(struct btrfs_device *device, bool populate_cache) nr_sectors = bdev_nr_sectors(bdev); zone_info->zone_size_shift = ilog2(zone_info->zone_size); zone_info->nr_zones = nr_sectors >> ilog2(zone_sectors); + /* + * We limit max_zone_append_size also by max_segments * + * PAGE_SIZE. Technically, we can have multiple pages per segment. But, + * since btrfs adds the pages one by one to a bio, and btrfs cannot + * increase the metadata reservation even if it increases the number of + * extents, it is safe to stick with the limit. 
+ */ + zone_info->max_zone_append_size = + min_t(u64, (u64)bdev_max_zone_append_sectors(bdev) << SECTOR_SHIFT, + (u64)bdev_max_segments(bdev) << PAGE_SHIFT); if (!IS_ALIGNED(nr_sectors, zone_sectors)) zone_info->nr_zones++; @@ -640,6 +650,7 @@ int btrfs_check_zoned_mode(struct btrfs_fs_info *fs_info) u64 zoned_devices = 0; u64 nr_devices = 0; u64 zone_size = 0; + u64 max_zone_append_size = 0; const bool incompat_zoned = btrfs_fs_incompat(fs_info, ZONED); int ret = 0; @@ -674,6 +685,11 @@ int btrfs_check_zoned_mode(struct btrfs_fs_info *fs_info) ret = -EINVAL; goto out; } + if (!max_zone_append_size || + (zone_info->max_zone_append_size && + zone_info->max_zone_append_size < max_zone_append_size)) + max_zone_append_size = + zone_info->max_zone_append_size; } nr_devices++; } @@ -723,6 +739,7 @@ int btrfs_check_zoned_mode(struct btrfs_fs_info *fs_info) } fs_info->zone_size = zone_size; + fs_info->max_zone_append_size = max_zone_append_size; fs_info->fs_devices->chunk_alloc_policy = BTRFS_CHUNK_ALLOC_ZONED; /* diff --git a/fs/btrfs/zoned.h b/fs/btrfs/zoned.h index 6b2eec99162b..9caeab07fd38 100644 --- a/fs/btrfs/zoned.h +++ b/fs/btrfs/zoned.h @@ -19,6 +19,7 @@ struct btrfs_zoned_device_info { */ u64 zone_size; u8 zone_size_shift; + u64 max_zone_append_size; u32 nr_zones; unsigned int max_active_zones; atomic_t active_zones_left; From patchwork Fri Jul 8 23:18:40 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Naohiro Aota X-Patchwork-Id: 12911952 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8BF1DCCA485 for ; Fri, 8 Jul 2022 23:19:12 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238931AbiGHXTL (ORCPT ); Fri, 8 Jul 2022 19:19:11 -0400 Received: from lindbergh.monkeyblade.net 
([23.128.96.19]:53482 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S236955AbiGHXTK (ORCPT ); Fri, 8 Jul 2022 19:19:10 -0400 Received: from esa4.hgst.iphmx.com (esa4.hgst.iphmx.com [216.71.154.42]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 563DE41980; Fri, 8 Jul 2022 16:19:09 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=wdc.com; i=@wdc.com; q=dns/txt; s=dkim.wdc.com; t=1657322349; x=1688858349; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=N2vcj5aOOm19BcfSRErh+H43G+2KMatrE8cadITZY0k=; b=TU7JkwAX8R6uIDH5Wa0YAOHkIHGl5+2ct6Qca1zFQDqE0W7pDZYv1VsO adWxg2oXhSk35oH67h+xj12XMgcWUYYVtjVxKkeFwb4+N8JrXFvPlrzYt WXrBZORz+KCXb6IX0w+bcXTIpME3zHPJeshK8Xm7ejZL3WGhBdlea3mMp 75M8O6Ht7EnNoieSvhQYqTWgeowdWhrDw2Ctev8+Nq0jMwoMmdGd4YMKf o173EDIg7U/xvoQjqGi+TLlTIlwM7VWSuR02+8UMT8HaxuyJOC9Beydhc 0zrxJJvYsiJYo47kDnwn9HrbTEJeX5e/msXjfROSGPu4cOMPqT+86I9ar w==; X-IronPort-AV: E=Sophos;i="5.92,256,1650902400"; d="scan'208";a="203871816" Received: from h199-255-45-15.hgst.com (HELO uls-op-cesaep02.wdc.com) ([199.255.45.15]) by ob1.hgst.iphmx.com with ESMTP; 09 Jul 2022 07:19:08 +0800 IronPort-SDR: MAH7aNBvOqY+RxkxP34d3hZYOoOYZs22aoFETP4Ubb24bR/B6ivxR46ko+vgmF+6vbggbQNdOR erNWuktvMBYqm+jOia4WlJaHw324uU/s+Q1/jSC70umo5TiLl9Mbh2r8v4B7ycet978p/LScK/ Z6hA9nVsViQs9Bo+lCiXY1f4ChMm00ljVplvBVn/tHjrPKXFfqTMb66z/Ps3XBf1neun++F3my SWqdfRWyPMTokuLF7rxTV2QAFlJb/WKYU13IKj8z2un9t4vEPN4//zFDctbHVvYSkQLfiJHQDU iDlMv/USLA+933MORdQezOAG Received: from uls-op-cesaip01.wdc.com ([10.248.3.36]) by uls-op-cesaep02.wdc.com with ESMTP/TLS/ECDHE-RSA-AES128-GCM-SHA256; 08 Jul 2022 15:36:13 -0700 IronPort-SDR: 3MTy+jy+0hQPMukFqrB2EeOevYGWdxxH8Rb8Kr5sUUTX1y/FfdWEN7MhPWu8sbrE8ZPQf0vXr8 tg3WU8AcXCcbuSi5I/b1bgPFvFuQAWW3K2CfAeqsYx1x1IP/Qq0cbs/RaMN5ac/ki2ea5H+CYO QYG8Hebxe4Eaa+Y1wfSa27Ya8KMgfvynGNh2R8v1+9UWDvmPtUpwrygaJOfqtq/QAJJ7WHlEiZ 
 ppcNPMLy32yGVzF1rhhmgIpCki/+STnyyzrT35MFPcfHT0sUWjaeC57lGqow73CfipZEhxcoJ3
 egg=
WDCIronportException: Internal
Received: from phd010370.ad.shared (HELO naota-xeon.wdc.com) ([10.225.55.250])
 by uls-op-cesaip01.wdc.com with ESMTP; 08 Jul 2022 16:19:08 -0700
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org
Cc: linux-block@vger.kernel.org, Naohiro Aota
Subject: [PATCH 03/13] btrfs: replace BTRFS_MAX_EXTENT_SIZE with fs_info->max_extent_size
Date: Sat, 9 Jul 2022 08:18:40 +0900
Message-Id:
X-Mailer: git-send-email 2.35.1
In-Reply-To:
References:
MIME-Version: 1.0
Precedence: bulk
List-ID:
X-Mailing-List: linux-btrfs@vger.kernel.org

On zoned btrfs, data writeout is limited by max_zone_append_size, and a
large ordered extent is split according to the size of a bio. On the other
hand, the number of extents to be written is calculated using
BTRFS_MAX_EXTENT_SIZE, and that estimate is used to reserve the metadata
bytes needed to update and/or create the metadata items.

The metadata reservation is done at, e.g., btrfs_buffered_write() and is
then released as the estimate changes. Thus, if the number of extents
increases massively, the reserved metadata can run out.

The number of extents easily increases on zoned btrfs if
BTRFS_MAX_EXTENT_SIZE > max_zone_append_size, and this causes the
following warning in a small-RAM environment with metadata over-commit
disabled (in the following patch).
[75721.498492] ------------[ cut here ]------------
[75721.505624] BTRFS: block rsv 1 returned -28
[75721.512230] WARNING: CPU: 24 PID: 2327559 at fs/btrfs/block-rsv.c:537 btrfs_use_block_rsv+0x560/0x760 [btrfs]
[75721.524407] Modules linked in: btrfs null_blk blake2b_generic xor raid6_pq loop dm_flakey dm_mod algif_hash af_alg veth xt_nat xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_filter iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 bpfilter br_netfilter bridge stp llc overlay sunrpc ext4 mbcache jbd2 rapl ipmi_ssif bfq k10temp i2c_piix4 ipmi_si ipmi_devintf ipmi_msghandler zram ip_tables ccp ast bnxt_en drm_vram_helper drm_ttm_helper pkcs8_key_parser asn1_decoder public_key oid_registry fuse ipv6 [last unloaded: btrfs]
[75721.581854] CPU: 24 PID: 2327559 Comm: kworker/u64:10 Kdump: loaded Tainted: G W 5.18.0-rc2-BTRFS-ZNS+ #109
[75721.597200] Hardware name: Supermicro Super Server/H12SSL-NT, BIOS 2.0 02/22/2021
[75721.607310] Workqueue: btrfs-endio-write btrfs_work_helper [btrfs]
[75721.616209] RIP: 0010:btrfs_use_block_rsv+0x560/0x760 [btrfs]
[75721.624255] Code: 83 c0 01 38 d0 7c 0c 84 d2 74 08 4c 89 ff e8 57 59 64 e0 41 0f b7 74 24 62 ba e4 ff ff ff 48 c7 c7 a0 dc 33 a1 e8 c4 58 50 e2 <0f> 0b e9 9c fe ff ff 4d 8d a5 a0 02 00 00 4c 89 e7 e8 aa fb 5f e2
[75721.646649] RSP: 0018:ffffc9000fbdf3e0 EFLAGS: 00010286
[75721.654126] RAX: 0000000000000000 RBX: 0000000000004000 RCX: 0000000000000000
[75721.663524] RDX: 0000000000000004 RSI: 0000000000000008 RDI: fffff52001f7be6e
[75721.672921] RBP: ffffc9000fbdf420 R08: 0000000000000001 R09: ffff889f8d1fc6c7
[75721.682493] R10: ffffed13f1a3f8d8 R11: 0000000000000001 R12: ffff88980a3c0e28
[75721.692284] R13: ffff889b66590000 R14: ffff88980a3c0e40 R15: ffff88980a3c0e8a
[75721.701878] FS: 0000000000000000(0000) GS:ffff889f8d000000(0000) knlGS:0000000000000000
[75721.712601] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[75721.720726] CR2: 000055d12e05c018 CR3: 0000800193594000 CR4: 0000000000350ee0
[75721.730499] Call Trace:
[75721.735166]
[75721.739886] btrfs_alloc_tree_block+0x1e1/0x1100 [btrfs]
[75721.747545] ? btrfs_alloc_logged_file_extent+0x550/0x550 [btrfs]
[75721.756145] ? btrfs_get_32+0xea/0x2d0 [btrfs]
[75721.762852] ? btrfs_get_32+0xea/0x2d0 [btrfs]
[75721.769520] ? push_leaf_left+0x420/0x620 [btrfs]
[75721.776431] ? memcpy+0x4e/0x60
[75721.781931] split_leaf+0x433/0x12d0 [btrfs]
[75721.788392] ? btrfs_get_token_32+0x580/0x580 [btrfs]
[75721.795636] ? push_for_double_split.isra.0+0x420/0x420 [btrfs]
[75721.803759] ? leaf_space_used+0x15d/0x1a0 [btrfs]
[75721.811156] btrfs_search_slot+0x1bc3/0x2790 [btrfs]
[75721.818300] ? lock_downgrade+0x7c0/0x7c0
[75721.824411] ? free_extent_buffer.part.0+0x107/0x200 [btrfs]
[75721.832456] ? split_leaf+0x12d0/0x12d0 [btrfs]
[75721.839149] ? free_extent_buffer.part.0+0x14f/0x200 [btrfs]
[75721.846945] ? free_extent_buffer+0x13/0x20 [btrfs]
[75721.853960] ? btrfs_release_path+0x4b/0x190 [btrfs]
[75721.861429] btrfs_csum_file_blocks+0x85c/0x1500 [btrfs]
[75721.869313] ? rcu_read_lock_sched_held+0x16/0x80
[75721.876085] ? lock_release+0x552/0xf80
[75721.881957] ? btrfs_del_csums+0x8c0/0x8c0 [btrfs]
[75721.888886] ? __kasan_check_write+0x14/0x20
[75721.895152] ? do_raw_read_unlock+0x44/0x80
[75721.901323] ? _raw_write_lock_irq+0x60/0x80
[75721.907983] ? btrfs_global_root+0xb9/0xe0 [btrfs]
[75721.915166] ? btrfs_csum_root+0x12b/0x180 [btrfs]
[75721.921918] ? btrfs_get_global_root+0x820/0x820 [btrfs]
[75721.929166] ? _raw_write_unlock+0x23/0x40
[75721.935116] ? unpin_extent_cache+0x1e3/0x390 [btrfs]
[75721.942041] btrfs_finish_ordered_io.isra.0+0xa0c/0x1dc0 [btrfs]
[75721.949906] ? try_to_wake_up+0x30/0x14a0
[75721.955700] ? btrfs_unlink_subvol+0xda0/0xda0 [btrfs]
[75721.962661] ? rcu_read_lock_sched_held+0x16/0x80
[75721.969111] ? lock_acquire+0x41b/0x4c0
[75721.974982] finish_ordered_fn+0x15/0x20 [btrfs]
[75721.981639] btrfs_work_helper+0x1af/0xa80 [btrfs]
[75721.988184] ? _raw_spin_unlock_irq+0x28/0x50
[75721.994643] process_one_work+0x815/0x1460
[75722.000444] ? pwq_dec_nr_in_flight+0x250/0x250
[75722.006643] ? do_raw_spin_trylock+0xbb/0x190
[75722.013086] worker_thread+0x59a/0xeb0
[75722.018511] kthread+0x2ac/0x360
[75722.023428] ? process_one_work+0x1460/0x1460
[75722.029431] ? kthread_complete_and_exit+0x30/0x30
[75722.036044] ret_from_fork+0x22/0x30
[75722.041255]
[75722.045047] irq event stamp: 0
[75722.049703] hardirqs last enabled at (0): [<0000000000000000>] 0x0
[75722.057610] hardirqs last disabled at (0): [] copy_process+0x1c1a/0x66b0
[75722.067533] softirqs last enabled at (0): [] copy_process+0x1c59/0x66b0
[75722.077423] softirqs last disabled at (0): [<0000000000000000>] 0x0
[75722.085335] ---[ end trace 0000000000000000 ]---

To fix the estimation, we need to introduce fs_info->max_extent_size to
replace BTRFS_MAX_EXTENT_SIZE, which allows setting a different size for
regular btrfs vs. zoned btrfs.

Set fs_info->max_extent_size to BTRFS_MAX_EXTENT_SIZE by default. On zoned
btrfs, it is set to fs_info->max_zone_append_size.

CC: stable@vger.kernel.org # 5.12+
Fixes: d8e3fb106f39 ("btrfs: zoned: use ZONE_APPEND write for zoned mode")
Signed-off-by: Naohiro Aota
Reviewed-by: Johannes Thumshirn
---
 fs/btrfs/ctree.h     | 3 +++
 fs/btrfs/disk-io.c   | 2 ++
 fs/btrfs/extent_io.c | 8 +++++++-
 fs/btrfs/inode.c     | 6 ++++--
 fs/btrfs/zoned.c     | 2 ++
 5 files changed, 18 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index e4879912c475..fca253bdb4b8 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1056,6 +1056,9 @@ struct btrfs_fs_info {
 	u32 csums_per_leaf;
 	u32 stripesize;
 
+	/* Maximum size of an extent. BTRFS_MAX_EXTENT_SIZE on regular btrfs. */
+	u64 max_extent_size;
+
 	/* Block groups and devices containing active swapfiles.
*/ spinlock_t swapfile_pins_lock; struct rb_root swapfile_pins; diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 76835394a61b..914557d59472 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -3142,6 +3142,8 @@ void btrfs_init_fs_info(struct btrfs_fs_info *fs_info) fs_info->sectorsize_bits = ilog2(4096); fs_info->stripesize = 4096; + fs_info->max_extent_size = BTRFS_MAX_EXTENT_SIZE; + spin_lock_init(&fs_info->swapfile_pins_lock); fs_info->swapfile_pins = RB_ROOT; diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index 3194eca41635..cedc94a7d5b2 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -2021,10 +2021,16 @@ noinline_for_stack bool find_lock_delalloc_range(struct inode *inode, struct page *locked_page, u64 *start, u64 *end) { + struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb); struct extent_io_tree *tree = &BTRFS_I(inode)->io_tree; const u64 orig_start = *start; const u64 orig_end = *end; - u64 max_bytes = BTRFS_MAX_EXTENT_SIZE; +#ifdef CONFIG_BTRFS_FS_RUN_SANITY_TESTS + /* The sanity tests may not set a valid fs_info. */ + u64 max_bytes = fs_info ? 
fs_info->max_extent_size : BTRFS_MAX_EXTENT_SIZE; +#else + u64 max_bytes = fs_info->max_extent_size; +#endif u64 delalloc_start; u64 delalloc_end; bool found; diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index b9485e19b696..155282dacc6e 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -2201,6 +2201,7 @@ int btrfs_run_delalloc_range(struct btrfs_inode *inode, struct page *locked_page void btrfs_split_delalloc_extent(struct inode *inode, struct extent_state *orig, u64 split) { + struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb); u64 size; /* not delalloc, ignore it */ @@ -2208,7 +2209,7 @@ void btrfs_split_delalloc_extent(struct inode *inode, return; size = orig->end - orig->start + 1; - if (size > BTRFS_MAX_EXTENT_SIZE) { + if (size > fs_info->max_extent_size) { u32 num_extents; u64 new_size; @@ -2237,6 +2238,7 @@ void btrfs_split_delalloc_extent(struct inode *inode, void btrfs_merge_delalloc_extent(struct inode *inode, struct extent_state *new, struct extent_state *other) { + struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb); u64 new_size, old_size; u32 num_extents; @@ -2250,7 +2252,7 @@ void btrfs_merge_delalloc_extent(struct inode *inode, struct extent_state *new, new_size = other->end - new->start + 1; /* we're not bigger than the max, unreserve the space and go */ - if (new_size <= BTRFS_MAX_EXTENT_SIZE) { + if (new_size <= fs_info->max_extent_size) { spin_lock(&BTRFS_I(inode)->lock); btrfs_mod_outstanding_extents(BTRFS_I(inode), -1); spin_unlock(&BTRFS_I(inode)->lock); diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c index bdc533fa80ae..3b45b35aa945 100644 --- a/fs/btrfs/zoned.c +++ b/fs/btrfs/zoned.c @@ -741,6 +741,8 @@ int btrfs_check_zoned_mode(struct btrfs_fs_info *fs_info) fs_info->zone_size = zone_size; fs_info->max_zone_append_size = max_zone_append_size; fs_info->fs_devices->chunk_alloc_policy = BTRFS_CHUNK_ALLOC_ZONED; + if (fs_info->max_zone_append_size < fs_info->max_extent_size) + fs_info->max_extent_size = 
fs_info->max_zone_append_size; /* * Check mount options here, because we might change fs_info->zoned From patchwork Fri Jul 8 23:18:41 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Naohiro Aota X-Patchwork-Id: 12911954 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 2BB58C433EF for ; Fri, 8 Jul 2022 23:19:13 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S237547AbiGHXTM (ORCPT ); Fri, 8 Jul 2022 19:19:12 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53488 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238899AbiGHXTL (ORCPT ); Fri, 8 Jul 2022 19:19:11 -0400 Received: from esa4.hgst.iphmx.com (esa4.hgst.iphmx.com [216.71.154.42]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 6534C41985; Fri, 8 Jul 2022 16:19:10 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=wdc.com; i=@wdc.com; q=dns/txt; s=dkim.wdc.com; t=1657322350; x=1688858350; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=wA2Hdc5+oSsAh0DQ5SCxlPFwt1GkrPXMZVV8EyoQmIQ=; b=iOqdW21GlqWasRQInzwYXp0zFaRYuyK8WB2D1P8JKlBcesG8uTmQF+Q8 UAgzfoivnrALSZld2zD34z9jqueotKyKD+elmyLG4Gl6vBES30ctCSrpM 54ibTLiI2qrQMyfYShSWYBtA/yPjjL3ZGjGIztzpUKrlXfRcYE/wc+rdO DSkKBhG/a2YXf4G9flvYrHNK4cyDZzaVEEq0snWB/TeEBnq6f0qQRsBTD aanbXaGZjfzSifybaz/RVYypHVVjAtU6DxEzAR51aIYavAC1Ii2a9lA95 ApX3iOqeLPuG5N5//qLuE1D4HUi8Iy0LXKpewgOmhh1OCxOTWHmCOiguv A==; X-IronPort-AV: E=Sophos;i="5.92,256,1650902400"; d="scan'208";a="203871822" Received: from h199-255-45-15.hgst.com (HELO uls-op-cesaep02.wdc.com) ([199.255.45.15]) by ob1.hgst.iphmx.com with ESMTP; 09 Jul 2022 07:19:09 +0800 IronPort-SDR: 
 wsc6VLn4pbe1BdQcpzLs8sWPP+Mbas720kj8EVBBiv3XqPMhuMY3973NxfrXcw+wg8xnyRj+bQ
 HvoC/UVZY0N/Vc9+XUYFuNCT9haJGTGToYcJ+/U0uKg8HsYrrmaIoDxiQ0uL9Zj40uuW9c8FMT
 ThkzLUMCDMX4bqtODWzA+7qOinDaq7e1bXL0yhEolM5M81MBNCVGC9m6bKQfjqMRH2jD6+Su++
 0shhvOrDnHMdhnCjWmxL+VOgZzIxSZxaQU2QKfHmpTyzzuG5wq5hlzIqX9PXUaXDXHkBu+4qaB
 Pu4AYjTTfuK/qnqqIMNHFcMJ
Received: from uls-op-cesaip01.wdc.com ([10.248.3.36])
 by uls-op-cesaep02.wdc.com with ESMTP/TLS/ECDHE-RSA-AES128-GCM-SHA256;
 08 Jul 2022 15:36:14 -0700
IronPort-SDR: 4eo3PUIgzL9oieReARKaXP5wK8pG83HmTS4Rlptsq0E6ABYILdcG8Kp8a7/BPZldnDKL2xJ5XY
 aecFfA549tQJQQdMx8nPEzaorfZFyVuQkAcMMZz2XVkP9q1xUtqmaUTzlKnJ+VWvPjQrCRRNaF
 ZtqqJ924HhG573mSqV8CNp7YZDE5sdXUuOdhdqEQCpVlagPPxkREPqi6yKaeXzVbIWoPRigN8a
 kTg8KFP5U9ZolEUikj38JMeb13DenB5zElMq11zGBQAu+qL11m+KW923qM0VcB1aZIvxj1vgSk
 lug=
WDCIronportException: Internal
Received: from phd010370.ad.shared (HELO naota-xeon.wdc.com) ([10.225.55.250])
 by uls-op-cesaip01.wdc.com with ESMTP; 08 Jul 2022 16:19:08 -0700
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org
Cc: linux-block@vger.kernel.org, Naohiro Aota
Subject: [PATCH 04/13] btrfs: convert count_max_extents() to use fs_info->max_extent_size
Date: Sat, 9 Jul 2022 08:18:41 +0900
Message-Id: <8492f9cc951b3324d6b9989c194fd428c799793e.1657321126.git.naohiro.aota@wdc.com>
X-Mailer: git-send-email 2.35.1
In-Reply-To:
References:
MIME-Version: 1.0
Precedence: bulk
List-ID:
X-Mailing-List: linux-btrfs@vger.kernel.org

If count_max_extents() uses BTRFS_MAX_EXTENT_SIZE to calculate the number
of extents needed, btrfs releases too much of the metadata reservation on
its way to writing out the data.

Now that BTRFS_MAX_EXTENT_SIZE is replaced with fs_info->max_extent_size,
convert count_max_extents() to use it instead, and fix the calculation of
the metadata reservation.
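As a rough illustration of the calculation (a userspace sketch, not the kernel code): count_max_extents() is a ceiling division of the byte count by the maximum extent size. The struct and function names ending in _sketch and the 1 MiB zoned limit used below are assumptions made up for this example; BTRFS_MAX_EXTENT_SIZE is 128 MiB in the kernel tree.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 128 MiB, matching BTRFS_MAX_EXTENT_SIZE (SZ_128M) in the kernel tree. */
#define BTRFS_MAX_EXTENT_SIZE	(128ULL * 1024 * 1024)

/* Hypothetical stand-in for the single btrfs_fs_info field used here. */
struct fs_info_sketch {
	uint64_t max_extent_size;
};

/*
 * Number of extents of at most max_extent_size bytes needed to cover
 * @size bytes. A NULL fs_info falls back to BTRFS_MAX_EXTENT_SIZE,
 * mirroring the sanity-test branch in the patch.
 */
static uint32_t count_max_extents_sketch(const struct fs_info_sketch *fs_info,
					 uint64_t size)
{
	uint64_t max = fs_info ? fs_info->max_extent_size : BTRFS_MAX_EXTENT_SIZE;

	assert(max != 0);	/* guard the division in this sketch */
	return (uint32_t)((size + max - 1) / max);
}
```

With these numbers, a 256 MiB delalloc range counts as 2 extents under BTRFS_MAX_EXTENT_SIZE but as 256 extents under the hypothetical 1 MiB limit, which is the gap the metadata reservation has to account for.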
CC: stable@vger.kernel.org # 5.12+ Fixes: d8e3fb106f39 ("btrfs: zoned: use ZONE_APPEND write for zoned mode") Signed-off-by: Naohiro Aota --- fs/btrfs/ctree.h | 21 +++++++++++++-------- fs/btrfs/delalloc-space.c | 6 +++--- fs/btrfs/inode.c | 16 ++++++++-------- 3 files changed, 24 insertions(+), 19 deletions(-) diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index fca253bdb4b8..c215e15baea2 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -107,14 +107,6 @@ struct btrfs_ioctl_encoded_io_args; #define BTRFS_STAT_CURR 0 #define BTRFS_STAT_PREV 1 -/* - * Count how many BTRFS_MAX_EXTENT_SIZE cover the @size - */ -static inline u32 count_max_extents(u64 size) -{ - return div_u64(size + BTRFS_MAX_EXTENT_SIZE - 1, BTRFS_MAX_EXTENT_SIZE); -} - static inline unsigned long btrfs_chunk_item_size(int num_stripes) { BUG_ON(num_stripes == 0); @@ -4057,6 +4049,19 @@ static inline bool btrfs_is_zoned(const struct btrfs_fs_info *fs_info) return fs_info->zone_size > 0; } +/* + * Count how many fs_info->max_extent_size cover the @size + */ +static inline u32 count_max_extents(struct btrfs_fs_info *fs_info, u64 size) +{ +#ifdef CONFIG_BTRFS_FS_RUN_SANITY_TESTS + if (!fs_info) + return div_u64(size + BTRFS_MAX_EXTENT_SIZE - 1, BTRFS_MAX_EXTENT_SIZE); +#endif + + return div_u64(size + fs_info->max_extent_size - 1, fs_info->max_extent_size); +} + static inline bool btrfs_is_data_reloc_root(const struct btrfs_root *root) { return root->root_key.objectid == BTRFS_DATA_RELOC_TREE_OBJECTID; diff --git a/fs/btrfs/delalloc-space.c b/fs/btrfs/delalloc-space.c index 36ab0859a263..1e8f17ff829e 100644 --- a/fs/btrfs/delalloc-space.c +++ b/fs/btrfs/delalloc-space.c @@ -273,7 +273,7 @@ static void calc_inode_reservations(struct btrfs_fs_info *fs_info, u64 num_bytes, u64 disk_num_bytes, u64 *meta_reserve, u64 *qgroup_reserve) { - u64 nr_extents = count_max_extents(num_bytes); + u64 nr_extents = count_max_extents(fs_info, num_bytes); u64 csum_leaves = btrfs_csum_bytes_to_leaves(fs_info, 
disk_num_bytes); u64 inode_update = btrfs_calc_metadata_size(fs_info, 1); @@ -350,7 +350,7 @@ int btrfs_delalloc_reserve_metadata(struct btrfs_inode *inode, u64 num_bytes, * needs to free the reservation we just made. */ spin_lock(&inode->lock); - nr_extents = count_max_extents(num_bytes); + nr_extents = count_max_extents(fs_info, num_bytes); btrfs_mod_outstanding_extents(inode, nr_extents); inode->csum_bytes += disk_num_bytes; btrfs_calculate_inode_block_rsv_size(fs_info, inode); @@ -413,7 +413,7 @@ void btrfs_delalloc_release_extents(struct btrfs_inode *inode, u64 num_bytes) unsigned num_extents; spin_lock(&inode->lock); - num_extents = count_max_extents(num_bytes); + num_extents = count_max_extents(fs_info, num_bytes); btrfs_mod_outstanding_extents(inode, -num_extents); btrfs_calculate_inode_block_rsv_size(fs_info, inode); spin_unlock(&inode->lock); diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 155282dacc6e..8ce937b0b014 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -2218,10 +2218,10 @@ void btrfs_split_delalloc_extent(struct inode *inode, * applies here, just in reverse. */ new_size = orig->end - split + 1; - num_extents = count_max_extents(new_size); + num_extents = count_max_extents(fs_info, new_size); new_size = split - orig->start; - num_extents += count_max_extents(new_size); - if (count_max_extents(size) >= num_extents) + num_extents += count_max_extents(fs_info, new_size); + if (count_max_extents(fs_info, size) >= num_extents) return; } @@ -2278,10 +2278,10 @@ void btrfs_merge_delalloc_extent(struct inode *inode, struct extent_state *new, * this case. 
*/ old_size = other->end - other->start + 1; - num_extents = count_max_extents(old_size); + num_extents = count_max_extents(fs_info, old_size); old_size = new->end - new->start + 1; - num_extents += count_max_extents(old_size); - if (count_max_extents(new_size) >= num_extents) + num_extents += count_max_extents(fs_info, old_size); + if (count_max_extents(fs_info, new_size) >= num_extents) return; spin_lock(&BTRFS_I(inode)->lock); @@ -2360,7 +2360,7 @@ void btrfs_set_delalloc_extent(struct inode *inode, struct extent_state *state, if (!(state->state & EXTENT_DELALLOC) && (bits & EXTENT_DELALLOC)) { struct btrfs_root *root = BTRFS_I(inode)->root; u64 len = state->end + 1 - state->start; - u32 num_extents = count_max_extents(len); + u32 num_extents = count_max_extents(fs_info, len); bool do_list = !btrfs_is_free_space_inode(BTRFS_I(inode)); spin_lock(&BTRFS_I(inode)->lock); @@ -2402,7 +2402,7 @@ void btrfs_clear_delalloc_extent(struct inode *vfs_inode, struct btrfs_inode *inode = BTRFS_I(vfs_inode); struct btrfs_fs_info *fs_info = btrfs_sb(vfs_inode->i_sb); u64 len = state->end + 1 - state->start; - u32 num_extents = count_max_extents(len); + u32 num_extents = count_max_extents(fs_info, len); if ((state->state & EXTENT_DEFRAG) && (bits & EXTENT_DEFRAG)) { spin_lock(&inode->lock); From patchwork Fri Jul 8 23:18:42 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Naohiro Aota X-Patchwork-Id: 12911953 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 4D7C7CCA482 for ; Fri, 8 Jul 2022 23:19:14 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S237222AbiGHXTN (ORCPT ); Fri, 8 Jul 2022 19:19:13 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53508 "EHLO 
lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S236955AbiGHXTM (ORCPT ); Fri, 8 Jul 2022 19:19:12 -0400 Received: from esa4.hgst.iphmx.com (esa4.hgst.iphmx.com [216.71.154.42]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id BCA9E41980; Fri, 8 Jul 2022 16:19:11 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=wdc.com; i=@wdc.com; q=dns/txt; s=dkim.wdc.com; t=1657322351; x=1688858351; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=s7rECdbtPS/xFD5a/QPIGcpKrU9rpkUQJ38wY6Yn8+0=; b=CaDqnLp7hpKdTPhL5rVhDMPml9spO3iAwYTDitkatYCGcsVHtwkVNpep I3HgEgOkQcpYHo3E+8dLvg7U4ByPNFku5muIUYIQbwtOdB63cQXMUNp+H UTwPRKt4PHqjwBUyhrk4pZ5uD7LiwZ8TyxLFv1SMxcwLHx43DZhkNNt/q B0qkYN6ylJn8g1u7s6QiwbQBnsQUxkYQ5mFv2o/QssRWHZWZpom2vDkcY ly4afnsBQohGkCalPptoI8aMG2GfVqfiItWNHOyvqV/AcD13bNWXv9s3h FbVZ0EbdvhfL9Khj/6ym/GX8ricc9I/IAHz9rXhMSb7TIJtyHXxou6OTs g==; X-IronPort-AV: E=Sophos;i="5.92,256,1650902400"; d="scan'208";a="203871825" Received: from h199-255-45-15.hgst.com (HELO uls-op-cesaep02.wdc.com) ([199.255.45.15]) by ob1.hgst.iphmx.com with ESMTP; 09 Jul 2022 07:19:10 +0800 IronPort-SDR: igcWPKftA1EAk75FzxLtGq3kVYrzw0s2DtAOU8lqiKfH0sED0l8BTJsBpoGm0OuVBqeC1XxN3v ju6h1SU+BpO5khsA1WK4HQo9EDRFWCshLAqDDzQc2oyz/oS2zf7CXFYV3A8PddrbdtoXdYhLzZ kgwg/Bl9c6zYhHlRartvdTQj3UfdMkXbYksu1mS/oQQXVeFTwF5wRJOtO/ZYv9B2MmpqnqhpbQ rL7jK8ySrdcssmaFL0i+WhE0EOtF8G/Sm6R/A+v//cgRVEJHfFJA3K9OkiDvvltr/bcU9aQXW+ RTCSpWxRdDDijTRFsWfw6ATG Received: from uls-op-cesaip01.wdc.com ([10.248.3.36]) by uls-op-cesaep02.wdc.com with ESMTP/TLS/ECDHE-RSA-AES128-GCM-SHA256; 08 Jul 2022 15:36:15 -0700 IronPort-SDR: l2LoFiW6bi8YiTjRqo/8wDg16TTVxQcKSy34EOyak8p/URdjvc4j2b7fr+/3VZC0WxNKRTsjgE lva5Z6l0QbKDDBcWncOftWdRVv9tiBRvsWm0NEkFeIKvA+Q/af0m5WFcAK3WlpnxvA0x4QIVYu dIMxkNNjro715TtzK3zuE//3GdtiBzisk9O7nJPbBG9Op7VF1/NH3MjoHi4gKw7iL0U7mugVeT mZGiQGB6pVBX+VUMUkQMrIJfF4qK7x+pCPrxv4jEFFo0YJOoGE0eIhmcPzr5yWxzdVKONeSWuf wzs= 
WDCIronportException: Internal Received: from phd010370.ad.shared (HELO naota-xeon.wdc.com) ([10.225.55.250]) by uls-op-cesaip01.wdc.com with ESMTP; 08 Jul 2022 16:19:09 -0700 From: Naohiro Aota To: linux-btrfs@vger.kernel.org Cc: linux-block@vger.kernel.org, Naohiro Aota , Johannes Thumshirn Subject: [PATCH 05/13] btrfs: use fs_info->max_extent_size in get_extent_max_capacity() Date: Sat, 9 Jul 2022 08:18:42 +0900 Message-Id: X-Mailer: git-send-email 2.35.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org Use fs_info->max_extent_size in get_extent_max_capacity() as well, for completeness. This function is only used for defrag, so the change is not strictly necessary to fix the metadata reservation size, but it still suppresses unnecessary defrag operations. Signed-off-by: Naohiro Aota Reviewed-by: Johannes Thumshirn --- fs/btrfs/ioctl.c | 11 +++++++---- 1 file changed, 7 insertions(+), 4 deletions(-) diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c index 7e1b4b0fbd6c..37480d4e6443 100644 --- a/fs/btrfs/ioctl.c +++ b/fs/btrfs/ioctl.c @@ -1230,16 +1230,18 @@ static struct extent_map *defrag_lookup_extent(struct inode *inode, u64 start, return em; } -static u32 get_extent_max_capacity(const struct extent_map *em) +static u32 get_extent_max_capacity(struct btrfs_fs_info *fs_info, + const struct extent_map *em) { if (test_bit(EXTENT_FLAG_COMPRESSED, &em->flags)) return BTRFS_MAX_COMPRESSED; - return BTRFS_MAX_EXTENT_SIZE; + return fs_info->max_extent_size; } static bool defrag_check_next_extent(struct inode *inode, struct extent_map *em, u32 extent_thresh, u64 newer_than, bool locked) { + struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb); struct extent_map *next; bool ret = false; @@ -1263,7 +1265,7 @@ static bool defrag_check_next_extent(struct inode *inode, struct extent_map *em, * If the next extent is at its max capacity, defragging current extent * makes no sense, as the total number of extents won't change. 
*/ - if (next->len >= get_extent_max_capacity(em)) + if (next->len >= get_extent_max_capacity(fs_info, em)) goto out; /* Skip older extent */ if (next->generation < newer_than) @@ -1400,6 +1402,7 @@ static int defrag_collect_targets(struct btrfs_inode *inode, bool locked, struct list_head *target_list, u64 *last_scanned_ret) { + struct btrfs_fs_info *fs_info = inode->root->fs_info; bool last_is_target = false; u64 cur = start; int ret = 0; @@ -1484,7 +1487,7 @@ static int defrag_collect_targets(struct btrfs_inode *inode, * Skip extents already at its max capacity, this is mostly for * compressed extents, which max cap is only 128K. */ - if (em->len >= get_extent_max_capacity(em)) + if (em->len >= get_extent_max_capacity(fs_info, em)) goto next; /* From patchwork Fri Jul 8 23:18:43 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Naohiro Aota X-Patchwork-Id: 12911955 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 01F80CCA47B for ; Fri, 8 Jul 2022 23:19:16 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S239057AbiGHXTP (ORCPT ); Fri, 8 Jul 2022 19:19:15 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53516 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S236955AbiGHXTO (ORCPT ); Fri, 8 Jul 2022 19:19:14 -0400 Received: from esa4.hgst.iphmx.com (esa4.hgst.iphmx.com [216.71.154.42]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 527AD41980; Fri, 8 Jul 2022 16:19:13 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=wdc.com; i=@wdc.com; q=dns/txt; s=dkim.wdc.com; t=1657322353; x=1688858353; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; 
bh=AKCSa9IFVquArH+CHsNurvH2tukAQ0tT7Aga37iEAAM=; b=QIXCqMKhUKDebS+V7xkoLFJ8vRtPbsp8nwQYsPB4Bp0YGT1gjUjzFoJo kfe0xDUuuUcRL1kh8o5ZMkVfuZHRPAsEQIUBV18tkwfSj9Hfa8InTJid5 H9JWPd1ZM3BmjCja1soXEu+Q2w3vjuAXflPmfI04MlVX5p5n/bBo98MXA dOyi5TE1DtPlu+QyYlImSNsryM6WaBy5YvqfkgjYI0dUf4d2zRpJbDQn2 TwIigeBpu1mcNgLOkCYL+BP3ibKrlwrW7+og3X5VqxC++0lHvr+RzAvRh 5NutoIvNAFHWvAH4LXOMvT/f65Wdv72K+vzf+ROKYbBHv2c4kNdPA1RMz g==; X-IronPort-AV: E=Sophos;i="5.92,256,1650902400"; d="scan'208";a="203871827" Received: from h199-255-45-15.hgst.com (HELO uls-op-cesaep02.wdc.com) ([199.255.45.15]) by ob1.hgst.iphmx.com with ESMTP; 09 Jul 2022 07:19:11 +0800 IronPort-SDR: JWtpO3xtMFQdJ1ojDyQF5pi6P0i83UoF1RrQgFtfHVAWJd6EGulk4AIlzKSbuFI0aAGeZ0mqkz yQ25yuZ3Ktsd/C14XLZaK45ppMlMC2XusSSP2cucP56TQ6xuezrX/gVVHDTG4FzFSc0h7yBCmf SgZrfJJ+6/4KTuATITx+yPz5CmH4s0NQwPwpIi0fMkFcX/ly7/Xt9RDxl82DY3x7AgbAvPCU73 4ea9YBn5VYcFR92oZrZmgW2ONP2tovEZGfDpc/xnGRaPpbYFw13cNqefW1Lx0fwm5FiUf4EVcA T/UeumRYPnEB9GJ2kS7JfEFB Received: from uls-op-cesaip01.wdc.com ([10.248.3.36]) by uls-op-cesaep02.wdc.com with ESMTP/TLS/ECDHE-RSA-AES128-GCM-SHA256; 08 Jul 2022 15:36:16 -0700 IronPort-SDR: EoxQdNvQNkc9R58eGoL1wXQdF8ZcYha5IuGHjEqoRUHO1eT0YWODQ5rTIUOOgUKo7qFFwHEA9Y 42WlU1gNMlHt1LdTZyDcYFp9qsGVRpgz+RsDoRYUaI/WzgkCq/1ktfvRhf5xqhq/MnEtehivel PINGuBLFP3+felVNNSErnVQXOJrRjFq8RzvPLeNdv6BqIWAWe/R6TnM1rZV6CatOOIxImHjYO+ nNgGKTyYAg8P9kO1M+a5ULlO3m4IGPrTZZ85r7ur75JK9IVhgOe5GQfkCYNrNamYIDiORKliV6 LaE= WDCIronportException: Internal Received: from phd010370.ad.shared (HELO naota-xeon.wdc.com) ([10.225.55.250]) by uls-op-cesaip01.wdc.com with ESMTP; 08 Jul 2022 16:19:10 -0700 From: Naohiro Aota To: linux-btrfs@vger.kernel.org Cc: linux-block@vger.kernel.org, Naohiro Aota , Johannes Thumshirn Subject: [PATCH 06/13] btrfs: let can_allocate_chunk return int Date: Sat, 9 Jul 2022 08:18:43 +0900 Message-Id: <109038d56dfcb9c2f0dc9d37f71acf135bd9ca1b.1657321126.git.naohiro.aota@wdc.com> X-Mailer: git-send-email 2.35.1 In-Reply-To: References: MIME-Version: 1.0 
Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org In preparation for a later patch, convert the return type of can_allocate_chunk() from bool to int. There are no functional changes. Signed-off-by: Naohiro Aota Reviewed-by: Johannes Thumshirn --- fs/btrfs/extent-tree.c | 15 ++++++++------- 1 file changed, 8 insertions(+), 7 deletions(-) diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index f97a0f28f464..c8f26ab7fe24 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -3965,12 +3965,12 @@ static void found_extent(struct find_free_extent_ctl *ffe_ctl, } } -static bool can_allocate_chunk(struct btrfs_fs_info *fs_info, - struct find_free_extent_ctl *ffe_ctl) +static int can_allocate_chunk(struct btrfs_fs_info *fs_info, + struct find_free_extent_ctl *ffe_ctl) { switch (ffe_ctl->policy) { case BTRFS_EXTENT_ALLOC_CLUSTERED: - return true; + return 0; case BTRFS_EXTENT_ALLOC_ZONED: /* * If we have enough free space left in an already @@ -3980,8 +3980,8 @@ static bool can_allocate_chunk(struct btrfs_fs_info *fs_info, */ if (ffe_ctl->max_extent_size >= ffe_ctl->min_alloc_size && !btrfs_can_activate_zone(fs_info->fs_devices, ffe_ctl->flags)) - return false; - return true; + return -ENOSPC; + return 0; default: BUG(); } @@ -4063,8 +4063,9 @@ static int find_free_extent_update_loop(struct btrfs_fs_info *fs_info, int exist = 0; /*Check if allocation policy allows to create a new chunk */ - if (!can_allocate_chunk(fs_info, ffe_ctl)) - return -ENOSPC; + ret = can_allocate_chunk(fs_info, ffe_ctl); + if (ret) + return ret; trans = current->journal_info; if (trans) From patchwork Fri Jul 8 23:18:44 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Naohiro Aota X-Patchwork-Id: 12911956 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with 
ESMTP id 7286CCCA482 for ; Fri, 8 Jul 2022 23:19:18 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S239120AbiGHXTQ (ORCPT ); Fri, 8 Jul 2022 19:19:16 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53536 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232885AbiGHXTO (ORCPT ); Fri, 8 Jul 2022 19:19:14 -0400 Received: from esa4.hgst.iphmx.com (esa4.hgst.iphmx.com [216.71.154.42]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 024EC41985; Fri, 8 Jul 2022 16:19:13 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=wdc.com; i=@wdc.com; q=dns/txt; s=dkim.wdc.com; t=1657322353; x=1688858353; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=aD1/CtJv1bF7AYbarhWHVXAHA6ubkFe6OSBHGSFbDGM=; b=IP6hZdSRNOE5IZMGbUlB8emu1KLhGEn/FUZvrEcM7JfFGyrekWwBXL/N sPU8CfRnqyriQxDfoW3WWGUxHQicV1eE5/uw8Pp5nZUvQ954RTfuLFxwu +nMo3OVEW5G04VkdDCDZomCmi2zvp36lQd9w2ZpLv9G0FHuucW0FKmQS2 k0PvOQeZlaInoys6r8CmF1a7U7pKwuwvMCtY7JfcnfZpTdUnvyxWU7x2o ncQ7nWMhOgdnUyDJS7LdgRTcKDjKYDXySnfbtbNFG0aexMACrQxZwHAzX fDHNYDsmPysgacccrTGqK9fhytyGVP1J1pupD3VHepmx73SL5QHeSb8gQ g==; X-IronPort-AV: E=Sophos;i="5.92,256,1650902400"; d="scan'208";a="203871828" Received: from h199-255-45-15.hgst.com (HELO uls-op-cesaep02.wdc.com) ([199.255.45.15]) by ob1.hgst.iphmx.com with ESMTP; 09 Jul 2022 07:19:11 +0800 IronPort-SDR: In8/ZcGRtTTVUzD/3L5E+BH588ALRlur2VgpBsUgraDG5tveofyTZ1hAWpzF8lb/3V2zltCdh9 64wCdSnUuvD2uWURBKlMAEJF2QhC0PaU8r/CGrAnHFBCeFK3JgupRegZ34f1j8aqBrKT4OSWOr zm6Ow5ae6FNOH9IcsmWpJvhFDtIU1OGlWucu2UTVtS7OEHwLlV0DWAGjuh/XUtGxweJ1IZS/2I 3+goOgXkU8QUE8CR8aXT+UisGBrjI7mNaLPPYdh3kxebUdwfe0uGphpjLy6EIwbXUnJvTogshc XeO5o6ge+kFslLpwG7MISY60 Received: from uls-op-cesaip01.wdc.com ([10.248.3.36]) by uls-op-cesaep02.wdc.com with ESMTP/TLS/ECDHE-RSA-AES128-GCM-SHA256; 08 Jul 2022 15:36:17 -0700 IronPort-SDR: 
xRp6S7WUUY7hjI+kbGIPRH4P3GT7qgZTYLNsM+bg65OqHS7B8cbOAv5X765eNk9w6v2mR4dlpe 1mAJ6czy/wfEPDAsWMOjK0GSE8dT0+laEAgO1ziikM0PDcBZe7BYuaUGO/m38JUypeuGpwBB5t VtOeT4eYAuRbSjIcLD3O2/IbllLXuB2IXVujDa/lYoD9iJtp5xnjH/TH9r73us+oUrzOgQB24w GZNhLuOep02dCEQhEJzVa4TSj+tGJuQXxA+9BZcod7JneAu+VuIhbcABmfXSAKcieeM6bu9VRN 1MM= WDCIronportException: Internal Received: from phd010370.ad.shared (HELO naota-xeon.wdc.com) ([10.225.55.250]) by uls-op-cesaip01.wdc.com with ESMTP; 08 Jul 2022 16:19:11 -0700 From: Naohiro Aota To: linux-btrfs@vger.kernel.org Cc: linux-block@vger.kernel.org, Naohiro Aota Subject: [PATCH 07/13] btrfs: zoned: finish least available block group on data BG allocation Date: Sat, 9 Jul 2022 08:18:44 +0900 Message-Id: X-Mailer: git-send-email 2.35.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org When we run out of active zones and no block group has sufficient space left, we need to finish one block group to make room to activate a new block group. However, we cannot do this for metadata block groups, because waiting for a running transaction commit can cause a deadlock. So, do this only for data block groups. Furthermore, the block group to be finished must meet two requirements. First, the block group must not have reserved bytes left. Having reserved bytes means we have an allocated region for which bios have not yet been sent. If that region was allocated by the thread calling btrfs_zone_finish(), finishing results in a deadlock. Second, the block group to be finished must not be a SYSTEM block group. Finishing a SYSTEM block group easily breaks further chunk allocation by nullifying the SYSTEM free space. In certain cases, we cannot find any zone finish candidate, or btrfs_zone_finish() may fail. In that case, we fall back to splitting the allocation bytes and filling the last space left in the block groups. 
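The candidate selection described above can be illustrated with a small userspace sketch. This is not the kernel code: struct bg_model and pick_finish_candidate() are hypothetical names used only for illustration, locking and reference counting are omitted, and the fields merely mirror the ones the rule depends on (zone capacity, allocation offset, reserved bytes, SYSTEM flag).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for an active block group. */
struct bg_model {
	uint64_t zone_capacity; /* writable bytes in the zone */
	uint64_t alloc_offset;  /* bytes already allocated */
	uint64_t reserved;      /* allocated, but bios not yet sent */
	int is_system;          /* non-zero for a SYSTEM block group */
};

/*
 * Return the index of the eligible block group with the least available
 * space, or -1 if no block group qualifies.
 */
static int pick_finish_candidate(const struct bg_model *bgs, size_t n)
{
	uint64_t min_avail = UINT64_MAX;
	int best = -1;

	assert(n == 0 || bgs != NULL);

	for (size_t i = 0; i < n; i++) {
		uint64_t avail;

		/* Skip groups with reserved bytes and SYSTEM groups. */
		if (bgs[i].reserved || bgs[i].is_system)
			continue;

		avail = bgs[i].zone_capacity - bgs[i].alloc_offset;
		if (avail < min_avail) {
			min_avail = avail;
			best = (int)i;
		}
	}
	return best;
}
```

A return value of -1 corresponds to the "no zone finish candidate" case, in which the allocator falls back to splitting the allocation.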
CC: stable@vger.kernel.org # 5.16+ Fixes: afba2bc036b0 ("btrfs: zoned: implement active zone tracking") Signed-off-by: Naohiro Aota Reviewed-by: Johannes Thumshirn --- fs/btrfs/extent-tree.c | 49 +++++++++++++++++++++++++++++++++--------- fs/btrfs/zoned.c | 40 ++++++++++++++++++++++++++++++++++ fs/btrfs/zoned.h | 7 ++++++ 3 files changed, 86 insertions(+), 10 deletions(-) diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index c8f26ab7fe24..5589e04eda0e 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -3965,6 +3965,44 @@ static void found_extent(struct find_free_extent_ctl *ffe_ctl, } } +static int can_allocate_chunk_zoned(struct btrfs_fs_info *fs_info, + struct find_free_extent_ctl *ffe_ctl) +{ + /* If we can activate new zone, just allocate a chunk and use it */ + if (btrfs_can_activate_zone(fs_info->fs_devices, ffe_ctl->flags)) + return 0; + + /* + * We already reached the max active zones. Try to finish one block + * group to make a room for a new block group. This is only possible for + * a data BG because btrfs_zone_finish() may need to wait for a running + * transaction which can cause a deadlock for metadata allocation. + */ + if (ffe_ctl->flags & BTRFS_BLOCK_GROUP_DATA) { + int ret = btrfs_zone_finish_one_bg(fs_info); + + if (ret == 1) + return 0; + else if (ret < 0) + return ret; + } + + /* + * If we have enough free space left in an already active block group + * and we can't activate any other zone now, do not allow allocating a + * new chunk and let find_free_extent() retry with a smaller size. + */ + if (ffe_ctl->max_extent_size >= ffe_ctl->min_alloc_size) + return -ENOSPC; + + /* + * We cannot activate a new block group and no enough space left in any + * block groups. So, allocating a new block group may not help. But, + * there is nothing to do anyway, so let's go with it. 
+ */ + return 0; +} + static int can_allocate_chunk(struct btrfs_fs_info *fs_info, struct find_free_extent_ctl *ffe_ctl) { @@ -3972,16 +4010,7 @@ static int can_allocate_chunk(struct btrfs_fs_info *fs_info, case BTRFS_EXTENT_ALLOC_CLUSTERED: return 0; case BTRFS_EXTENT_ALLOC_ZONED: - /* - * If we have enough free space left in an already - * active block group and we can't activate any other - * zone now, do not allow allocating a new chunk and - * let find_free_extent() retry with a smaller size. - */ - if (ffe_ctl->max_extent_size >= ffe_ctl->min_alloc_size && - !btrfs_can_activate_zone(fs_info->fs_devices, ffe_ctl->flags)) - return -ENOSPC; - return 0; + return can_allocate_chunk_zoned(fs_info, ffe_ctl); default: BUG(); } diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c index 3b45b35aa945..40ac90272b53 100644 --- a/fs/btrfs/zoned.c +++ b/fs/btrfs/zoned.c @@ -2179,3 +2179,43 @@ void btrfs_zoned_release_data_reloc_bg(struct btrfs_fs_info *fs_info, u64 logica spin_unlock(&block_group->lock); btrfs_put_block_group(block_group); } + +int btrfs_zone_finish_one_bg(struct btrfs_fs_info *fs_info) +{ + struct btrfs_block_group *block_group; + struct btrfs_block_group *min_bg = NULL; + u64 min_avail = U64_MAX; + int ret; + + spin_lock(&fs_info->zone_active_bgs_lock); + list_for_each_entry(block_group, &fs_info->zone_active_bgs, + active_bg_list) { + u64 avail; + + spin_lock(&block_group->lock); + if (block_group->reserved || + (block_group->flags & BTRFS_BLOCK_GROUP_SYSTEM)) { + spin_unlock(&block_group->lock); + continue; + } + + avail = block_group->zone_capacity - block_group->alloc_offset; + if (min_avail > avail) { + if (min_bg) + btrfs_put_block_group(min_bg); + min_bg = block_group; + min_avail = avail; + btrfs_get_block_group(min_bg); + } + spin_unlock(&block_group->lock); + } + spin_unlock(&fs_info->zone_active_bgs_lock); + + if (!min_bg) + return 0; + + ret = btrfs_zone_finish(min_bg); + btrfs_put_block_group(min_bg); + + return ret < 0 ? 
ret : 1; +} diff --git a/fs/btrfs/zoned.h b/fs/btrfs/zoned.h index 9caeab07fd38..329d28e2fd8d 100644 --- a/fs/btrfs/zoned.h +++ b/fs/btrfs/zoned.h @@ -80,6 +80,7 @@ void btrfs_free_zone_cache(struct btrfs_fs_info *fs_info); bool btrfs_zoned_should_reclaim(struct btrfs_fs_info *fs_info); void btrfs_zoned_release_data_reloc_bg(struct btrfs_fs_info *fs_info, u64 logical, u64 length); +int btrfs_zone_finish_one_bg(struct btrfs_fs_info *fs_info); #else /* CONFIG_BLK_DEV_ZONED */ static inline int btrfs_get_dev_zone(struct btrfs_device *device, u64 pos, struct blk_zone *zone) @@ -249,6 +250,12 @@ static inline bool btrfs_zoned_should_reclaim(struct btrfs_fs_info *fs_info) static inline void btrfs_zoned_release_data_reloc_bg(struct btrfs_fs_info *fs_info, u64 logical, u64 length) { } + +static inline int btrfs_zone_finish_one_bg(struct btrfs_fs_info *fs_info) +{ + return 1; +} + #endif static inline bool btrfs_dev_is_sequential(struct btrfs_device *device, u64 pos) From patchwork Fri Jul 8 23:18:45 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Naohiro Aota X-Patchwork-Id: 12911958 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8C74BC43334 for ; Fri, 8 Jul 2022 23:19:19 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S239171AbiGHXTR (ORCPT ); Fri, 8 Jul 2022 19:19:17 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53544 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S239055AbiGHXTP (ORCPT ); Fri, 8 Jul 2022 19:19:15 -0400 Received: from esa4.hgst.iphmx.com (esa4.hgst.iphmx.com [216.71.154.42]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 643A54198E; Fri, 8 Jul 2022 16:19:14 -0700 (PDT) DKIM-Signature: v=1; 
a=rsa-sha256; c=simple/simple; d=wdc.com; i=@wdc.com; q=dns/txt; s=dkim.wdc.com; t=1657322354; x=1688858354; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=B23aDjtBOKZTdBvlwEiWyAAY8IvX0vve2qr11qxvpI8=; b=hBIJKG6rY7jSprrlOce1XSQulgCchG01WRRTdaxJ0ZlBOXWeu8fladZQ WEaGguzQOBbRM+oKxNn5YIIP3OsR9E1LIRaEBTYfr6kIPQVGC7+v3nzqX 97NIxBf6Qwh1WnUT3PNXFLsaN1qblNbcvBuSvc2ivwY1TKfgR9KxXqQjn qJbnxY1+PzzMQsEHADmZdaHQMEPzEEeQ0AlT9H9PQ50hxo+rWyseNMyFg lxtL3l10lc9Gu93DT40l8R+l7P9+vIxzmIVlRoGeZQgADoL5cb5pyu9ju Gf4Jonm5PoT9h7bMp5WKecIfmqnA5tSYIRn9tCl/yDzTj95KJkY2r4Xf0 w==; X-IronPort-AV: E=Sophos;i="5.92,256,1650902400"; d="scan'208";a="203871830" Received: from h199-255-45-15.hgst.com (HELO uls-op-cesaep02.wdc.com) ([199.255.45.15]) by ob1.hgst.iphmx.com with ESMTP; 09 Jul 2022 07:19:12 +0800 IronPort-SDR: iSk4bW6y9deQraCh1F0BmIs1CPS2vYABZfWw4K038bDhbSvc2NvJJwqa3FkEhfF6BN3AjlfT0/ +PnXFUQfHfbOEE4EUGROlIXvV8u7B4A3UiznxpDTL5k6B/uVs+GQMyKe+qjwER7dTLBI/nYdMw QcEMhT2U8dsJQEvL1bpZK4ijwvl7FG9lxOgUU+sLq3CbMVkeNK2Mjki5B3DVbvDwc5Qet8nMkg xDwMV40azM7QK76X8hqEC/9OdcfGdIyrEAkPhroFqgJ8qH82XWs26Gumwv9DC0HP8oBDoclhni RfKy+p+CJdeCtPc60tyKQFff Received: from uls-op-cesaip01.wdc.com ([10.248.3.36]) by uls-op-cesaep02.wdc.com with ESMTP/TLS/ECDHE-RSA-AES128-GCM-SHA256; 08 Jul 2022 15:36:18 -0700 IronPort-SDR: kjjXvAiRmEfGKdgA+lwFj3+lRl0BpUzmBEljs1XNAHuQfv+4sDzvso/u81QhgRhDYCYNQvVVIE KE/sGjrmgTAbgurxOyc9BAOAA0qaDSJtlo+EgnEzjElBjnmQ4jjYdWre7R0Qzdx7mEIzBBrB3Y jXe9dv2jcT8wDrmFxq8HKe36Iu5JBoCevWmGEQgKGNNTM1NgVtzguFMARz9LA/lH68clccRV4z 2rapODWIGOZ97OG8v1SGbgGNwle093IHZpxHsrTk4mdCmgTdcAIkR62GluSm2lZHH9N2cywI3q 6aY= WDCIronportException: Internal Received: from phd010370.ad.shared (HELO naota-xeon.wdc.com) ([10.225.55.250]) by uls-op-cesaip01.wdc.com with ESMTP; 08 Jul 2022 16:19:12 -0700 From: Naohiro Aota To: linux-btrfs@vger.kernel.org Cc: linux-block@vger.kernel.org, Naohiro Aota Subject: [PATCH 08/13] btrfs: zoned: introduce space_info->active_total_bytes 
Date: Sat, 9 Jul 2022 08:18:45 +0900 Message-Id: X-Mailer: git-send-email 2.35.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org Like total_bytes, active_total_bytes accounts for bytes in the space_info, but it counts only active block groups. With the introduction of active_total_bytes, we can check whether the reserved bytes can be written to the block groups without activating a new block group. The check is necessary for metadata allocation on zoned btrfs. From the metadata allocation context we cannot finish a block group, because doing so may require waiting for the current transaction. Instead, we need to ensure that the ongoing allocation (reserved bytes) fits in the active block groups. Signed-off-by: Naohiro Aota --- fs/btrfs/block-group.c | 12 +++++++++--- fs/btrfs/space-info.c | 41 ++++++++++++++++++++++++++++++++--------- fs/btrfs/space-info.h | 4 +++- fs/btrfs/zoned.c | 6 ++++++ 4 files changed, 50 insertions(+), 13 deletions(-) diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c index e930749770ac..51e7c1f1d93f 100644 --- a/fs/btrfs/block-group.c +++ b/fs/btrfs/block-group.c @@ -1051,8 +1051,13 @@ int btrfs_remove_block_group(struct btrfs_trans_handle *trans, < block_group->zone_unusable); WARN_ON(block_group->space_info->disk_total < block_group->length * factor); + WARN_ON(block_group->zone_is_active && + block_group->space_info->active_total_bytes + < block_group->length); } block_group->space_info->total_bytes -= block_group->length; + if (block_group->zone_is_active) + block_group->space_info->active_total_bytes -= block_group->length; block_group->space_info->bytes_readonly -= (block_group->length - block_group->zone_unusable); block_group->space_info->bytes_zone_unusable -= @@ -2107,7 +2112,8 @@ static int read_one_block_group(struct btrfs_fs_info *info, trace_btrfs_add_block_group(info, cache, 0); btrfs_update_space_info(info, cache->flags, cache->length, cache->used, cache->bytes_super, - 
cache->zone_unusable, &space_info); + cache->zone_unusable, cache->zone_is_active, + &space_info); cache->space_info = space_info; @@ -2177,7 +2183,7 @@ static int fill_dummy_bgs(struct btrfs_fs_info *fs_info) } btrfs_update_space_info(fs_info, bg->flags, em->len, em->len, - 0, 0, &space_info); + 0, 0, false, &space_info); bg->space_info = space_info; link_block_group(bg); @@ -2558,7 +2564,7 @@ struct btrfs_block_group *btrfs_make_block_group(struct btrfs_trans_handle *tran trace_btrfs_add_block_group(fs_info, cache, 1); btrfs_update_space_info(fs_info, cache->flags, size, bytes_used, cache->bytes_super, cache->zone_unusable, - &cache->space_info); + cache->zone_is_active, &cache->space_info); btrfs_update_global_block_rsv(fs_info); link_block_group(cache); diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c index 62d25112310d..b970909c0820 100644 --- a/fs/btrfs/space-info.c +++ b/fs/btrfs/space-info.c @@ -295,7 +295,7 @@ int btrfs_init_space_info(struct btrfs_fs_info *fs_info) void btrfs_update_space_info(struct btrfs_fs_info *info, u64 flags, u64 total_bytes, u64 bytes_used, u64 bytes_readonly, u64 bytes_zone_unusable, - struct btrfs_space_info **space_info) + bool active, struct btrfs_space_info **space_info) { struct btrfs_space_info *found; int factor; @@ -306,6 +306,8 @@ void btrfs_update_space_info(struct btrfs_fs_info *info, u64 flags, ASSERT(found); spin_lock(&found->lock); found->total_bytes += total_bytes; + if (active) + found->active_total_bytes += total_bytes; found->disk_total += total_bytes * factor; found->bytes_used += bytes_used; found->disk_used += bytes_used * factor; @@ -369,6 +371,22 @@ static u64 calc_available_free_space(struct btrfs_fs_info *fs_info, return avail; } +static inline u64 writable_total_bytes(struct btrfs_fs_info *fs_info, + struct btrfs_space_info *space_info) +{ + /* + * On regular btrfs, all total_bytes are always writable. On zoned + * btrfs, there may be a limitation imposed by max_active_zones. 
For + * metadata allocation, we cannot finish an existing active block group + * to avoid a deadlock. Thus, we need to consider only the active groups + * to be writable for metadata space. + */ + if (!btrfs_is_zoned(fs_info) || (space_info->flags & BTRFS_BLOCK_GROUP_DATA)) + return space_info->total_bytes; + + return space_info->active_total_bytes; +} + int btrfs_can_overcommit(struct btrfs_fs_info *fs_info, struct btrfs_space_info *space_info, u64 bytes, enum btrfs_reserve_flush_enum flush) @@ -383,7 +401,7 @@ int btrfs_can_overcommit(struct btrfs_fs_info *fs_info, used = btrfs_space_info_used(space_info, true); avail = calc_available_free_space(fs_info, space_info, flush); - if (used + bytes < space_info->total_bytes + avail) + if (used + bytes < writable_total_bytes(fs_info, space_info) + avail) return 1; return 0; } @@ -419,7 +437,7 @@ void btrfs_try_granting_tickets(struct btrfs_fs_info *fs_info, ticket = list_first_entry(head, struct reserve_ticket, list); /* Check and see if our ticket can be satisfied now. */ - if ((used + ticket->bytes <= space_info->total_bytes) || + if ((used + ticket->bytes <= writable_total_bytes(fs_info, space_info)) || btrfs_can_overcommit(fs_info, space_info, ticket->bytes, flush)) { btrfs_space_info_update_bytes_may_use(fs_info, @@ -750,6 +768,7 @@ btrfs_calc_reclaim_metadata_size(struct btrfs_fs_info *fs_info, { u64 used; u64 avail; + u64 total; u64 to_reclaim = space_info->reclaim_size; lockdep_assert_held(&space_info->lock); @@ -764,8 +783,9 @@ btrfs_calc_reclaim_metadata_size(struct btrfs_fs_info *fs_info, * space. If that's the case add in our overage so we make sure to put * appropriate pressure on the flushing state machine. 
*/ - if (space_info->total_bytes + avail < used) - to_reclaim += used - (space_info->total_bytes + avail); + total = writable_total_bytes(fs_info, space_info); + if (total + avail < used) + to_reclaim += used - (total + avail); return to_reclaim; } @@ -775,9 +795,12 @@ static bool need_preemptive_reclaim(struct btrfs_fs_info *fs_info, { u64 global_rsv_size = fs_info->global_block_rsv.reserved; u64 ordered, delalloc; - u64 thresh = div_factor_fine(space_info->total_bytes, 90); + u64 total = writable_total_bytes(fs_info, space_info); + u64 thresh; u64 used; + thresh = div_factor_fine(total, 90); + lockdep_assert_held(&space_info->lock); /* If we're just plain full then async reclaim just slows us down. */ @@ -839,8 +862,8 @@ static bool need_preemptive_reclaim(struct btrfs_fs_info *fs_info, BTRFS_RESERVE_FLUSH_ALL); used = space_info->bytes_used + space_info->bytes_reserved + space_info->bytes_readonly + global_rsv_size; - if (used < space_info->total_bytes) - thresh += space_info->total_bytes - used; + if (used < total) + thresh += total - used; thresh >>= space_info->clamp; used = space_info->bytes_pinned; @@ -1557,7 +1580,7 @@ static int __reserve_bytes(struct btrfs_fs_info *fs_info, * can_overcommit() to ensure we can overcommit to continue. */ if (!pending_tickets && - ((used + orig_bytes <= space_info->total_bytes) || + ((used + orig_bytes <= writable_total_bytes(fs_info, space_info)) || btrfs_can_overcommit(fs_info, space_info, orig_bytes, flush))) { btrfs_space_info_update_bytes_may_use(fs_info, space_info, orig_bytes); diff --git a/fs/btrfs/space-info.h b/fs/btrfs/space-info.h index e7de24a529cf..3cc356a55c53 100644 --- a/fs/btrfs/space-info.h +++ b/fs/btrfs/space-info.h @@ -19,6 +19,8 @@ struct btrfs_space_info { u64 bytes_may_use; /* number of bytes that may be used for delalloc/allocations */ u64 bytes_readonly; /* total bytes that are read only */ + u64 active_total_bytes; /* total bytes in the space, but only accounts + active block groups. 
*/ u64 bytes_zone_unusable; /* total bytes that are unusable until resetting the device zone */ @@ -124,7 +126,7 @@ int btrfs_init_space_info(struct btrfs_fs_info *fs_info); void btrfs_update_space_info(struct btrfs_fs_info *info, u64 flags, u64 total_bytes, u64 bytes_used, u64 bytes_readonly, u64 bytes_zone_unusable, - struct btrfs_space_info **space_info); + bool active, struct btrfs_space_info **space_info); void btrfs_update_space_info_chunk_size(struct btrfs_space_info *space_info, u64 chunk_size); struct btrfs_space_info *btrfs_find_space_info(struct btrfs_fs_info *info, diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c index 40ac90272b53..44a4b9e7dae9 100644 --- a/fs/btrfs/zoned.c +++ b/fs/btrfs/zoned.c @@ -1848,6 +1848,7 @@ struct btrfs_device *btrfs_zoned_get_device(struct btrfs_fs_info *fs_info, bool btrfs_zone_activate(struct btrfs_block_group *block_group) { struct btrfs_fs_info *fs_info = block_group->fs_info; + struct btrfs_space_info *space_info = block_group->space_info; struct map_lookup *map; struct btrfs_device *device; u64 physical; @@ -1859,6 +1860,7 @@ bool btrfs_zone_activate(struct btrfs_block_group *block_group) map = block_group->physical_map; + spin_lock(&space_info->lock); spin_lock(&block_group->lock); if (block_group->zone_is_active) { ret = true; @@ -1887,7 +1889,10 @@ bool btrfs_zone_activate(struct btrfs_block_group *block_group) /* Successfully activated all the zones */ block_group->zone_is_active = 1; + space_info->active_total_bytes += block_group->length; spin_unlock(&block_group->lock); + btrfs_try_granting_tickets(fs_info, space_info); + spin_unlock(&space_info->lock); /* For the active block group list */ btrfs_get_block_group(block_group); @@ -1900,6 +1905,7 @@ bool btrfs_zone_activate(struct btrfs_block_group *block_group) out_unlock: spin_unlock(&block_group->lock); + spin_unlock(&space_info->lock); return ret; } From patchwork Fri Jul 8 23:18:46 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 12911957
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org
Cc: linux-block@vger.kernel.org, Naohiro Aota, Johannes Thumshirn
Subject: [PATCH 09/13] btrfs: zoned: disable metadata overcommit for zoned
Date: Sat, 9 Jul 2022 08:18:46 +0900
Message-Id: <42999b4386c75896ed14fde52d2d411a45824c0a.1657321126.git.naohiro.aota@wdc.com>

The metadata overcommit makes the space reservation flexible but it is
also harmful to active zone tracking. Since we cannot finish a block
group from the metadata allocation context, we might not activate a new
block group and might not be able to actually write out the overcommit
reservations.

So, disable metadata overcommit for zoned btrfs. We will ensure the
reservations are under active_total_bytes in the following patches.
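
For illustration only, the effect of the change can be sketched as a small
userspace model. This is not kernel code: struct model_space_info and
model_can_overcommit() are hypothetical names, and free_estimate merely
stands in for whatever calc_available_free_space() would report. The sketch
assumes the reservation check has the same shape as the one in
btrfs_can_overcommit().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for struct btrfs_space_info. */
struct model_space_info {
	uint64_t writable_bytes;	/* bytes we can actually write to */
	uint64_t used_bytes;		/* bytes already used or reserved */
	bool is_metadata;		/* models BTRFS_BLOCK_GROUP_METADATA */
};

/*
 * Return true when a reservation of 'bytes' may be granted.
 * 'free_estimate' is an assumed input modelling the unallocated space
 * that overcommit would normally count on.
 */
bool model_can_overcommit(const struct model_space_info *si, bool zoned,
			  uint64_t free_estimate, uint64_t bytes)
{
	uint64_t avail;

	assert(si != NULL);

	/* Zoned metadata: never count on space that is not there yet. */
	if (zoned && si->is_metadata)
		avail = 0;
	else
		avail = free_estimate;

	return si->used_bytes + bytes < si->writable_bytes + avail;
}
```

With writable_bytes = 100 and used_bytes = 90, a 20-byte metadata
reservation is granted on a regular filesystem when free_estimate is 50,
but refused in the zoned case because avail is forced to 0.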
CC: stable@vger.kernel.org # 5.16+
Fixes: afba2bc036b0 ("btrfs: zoned: implement active zone tracking")
Signed-off-by: Naohiro Aota
Reviewed-by: Johannes Thumshirn
---
 fs/btrfs/space-info.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
index b970909c0820..7183a8dc9b34 100644
--- a/fs/btrfs/space-info.c
+++ b/fs/btrfs/space-info.c
@@ -399,7 +399,10 @@ int btrfs_can_overcommit(struct btrfs_fs_info *fs_info,
 		return 0;

 	used = btrfs_space_info_used(space_info, true);
-	avail = calc_available_free_space(fs_info, space_info, flush);
+	if (btrfs_is_zoned(fs_info) && (space_info->flags & BTRFS_BLOCK_GROUP_METADATA))
+		avail = 0;
+	else
+		avail = calc_available_free_space(fs_info, space_info, flush);

 	if (used + bytes < writable_total_bytes(fs_info, space_info) + avail)
 		return 1;

From patchwork Fri Jul 8 23:18:47 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 12911960
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org
Cc: linux-block@vger.kernel.org, Naohiro Aota
Subject: [PATCH 10/13] btrfs: zoned: activate metadata BG on flush_space
Date: Sat, 9 Jul 2022 08:18:47 +0900
Message-Id: <9356a688352bf220fba3dda1deff0486055d42ee.1657321126.git.naohiro.aota@wdc.com>

For metadata space on zoned btrfs, reaching ALLOC_CHUNK{,_FORCE} means we
don't have enough space left in the active_total_bytes. Before allocating
a new chunk, we can try to activate an existing block group in this case.

Also, allocating a chunk is not enough to grant a ticket for metadata
space on zoned btrfs. We need to activate the block group to increase the
active_total_bytes.

btrfs_zoned_activate_one_bg() implements the activation feature. It will
activate a block group by (maybe) finishing a block group. It will give up
activating a block group if it cannot finish any block group.

CC: stable@vger.kernel.org # 5.16+
Fixes: afba2bc036b0 ("btrfs: zoned: implement active zone tracking")
Signed-off-by: Naohiro Aota
---
 fs/btrfs/space-info.c | 30 ++++++++++++++++++++++++
 fs/btrfs/zoned.c      | 53 +++++++++++++++++++++++++++++++++++++++++++
 fs/btrfs/zoned.h      | 10 ++++++++
 3 files changed, 93 insertions(+)

diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
index 7183a8dc9b34..b99e3c32c07d 100644
--- a/fs/btrfs/space-info.c
+++ b/fs/btrfs/space-info.c
@@ -9,6 +9,7 @@
 #include "ordered-data.h"
 #include "transaction.h"
 #include "block-group.h"
+#include "zoned.h"

 /*
  * HOW DOES SPACE RESERVATION WORK
@@ -724,6 +725,18 @@ static void flush_space(struct btrfs_fs_info *fs_info,
 		break;
 	case ALLOC_CHUNK:
 	case ALLOC_CHUNK_FORCE:
+		/*
+		 * For metadata space on zoned btrfs, reaching here means we
+		 * don't have enough space left in active_total_bytes. Try to
+		 * activate a block group first, because we may have an
+		 * inactive block group already allocated.
+		 */
+		ret = btrfs_zoned_activate_one_bg(fs_info, space_info, false);
+		if (ret < 0)
+			break;
+		else if (ret == 1)
+			break;
+
 		trans = btrfs_join_transaction(root);
 		if (IS_ERR(trans)) {
 			ret = PTR_ERR(trans);
@@ -734,6 +747,23 @@ static void flush_space(struct btrfs_fs_info *fs_info,
 				(state == ALLOC_CHUNK) ? CHUNK_ALLOC_NO_FORCE :
 					CHUNK_ALLOC_FORCE);
 		btrfs_end_transaction(trans);
+
+		/*
+		 * For metadata space on zoned btrfs, allocating a new chunk is
+		 * not enough. We still need to activate the block group.
+		 * Activate the newly allocated block group by (maybe)
+		 * finishing a block group.
+		 */
+		if (ret == 1) {
+			ret = btrfs_zoned_activate_one_bg(fs_info, space_info, true);
+			/*
+			 * Revert to the original ret regardless of whether we
+			 * could finish one block group or not.
+			 */
+			if (ret >= 0)
+				ret = 1;
+		}
+
 		if (ret > 0 || ret == -ENOSPC)
 			ret = 0;
 		break;
diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c
index 44a4b9e7dae9..67098f3fcd14 100644
--- a/fs/btrfs/zoned.c
+++ b/fs/btrfs/zoned.c
@@ -2225,3 +2225,56 @@ int btrfs_zone_finish_one_bg(struct btrfs_fs_info *fs_info)

 	return ret < 0 ? ret : 1;
 }
+
+int btrfs_zoned_activate_one_bg(struct btrfs_fs_info *fs_info,
+				struct btrfs_space_info *space_info,
+				bool do_finish)
+{
+	struct btrfs_block_group *bg;
+	bool need_finish;
+	int index;
+
+	if (!btrfs_is_zoned(fs_info) || (space_info->flags & BTRFS_BLOCK_GROUP_DATA))
+		return 0;
+
+	/* No more block group to activate */
+	if (space_info->active_total_bytes == space_info->total_bytes)
+		return 0;
+
+	for (;;) {
+		int ret;
+
+		need_finish = false;
+		down_read(&space_info->groups_sem);
+		for (index = 0; index < BTRFS_NR_RAID_TYPES; index++) {
+			list_for_each_entry(bg, &space_info->block_groups[index], list) {
+				if (!spin_trylock(&bg->lock))
+					continue;
+				if (btrfs_zoned_bg_is_full(bg) || bg->zone_is_active) {
+					spin_unlock(&bg->lock);
+					continue;
+				}
+				spin_unlock(&bg->lock);
+
+				if (btrfs_zone_activate(bg)) {
+					up_read(&space_info->groups_sem);
+					return 1;
+				}
+
+				need_finish = true;
+			}
+		}
+		up_read(&space_info->groups_sem);
+
+		if (!do_finish || !need_finish)
+			break;
+
+		ret = btrfs_zone_finish_one_bg(fs_info);
+		if (ret == 0)
+			break;
+		if (ret < 0)
+			return ret;
+	}
+
+	return 0;
+}
diff --git a/fs/btrfs/zoned.h b/fs/btrfs/zoned.h
index 329d28e2fd8d..f7b0b9035fd6 100644
--- a/fs/btrfs/zoned.h
+++ b/fs/btrfs/zoned.h
@@ -81,6 +81,8 @@ bool btrfs_zoned_should_reclaim(struct btrfs_fs_info *fs_info);
 void btrfs_zoned_release_data_reloc_bg(struct btrfs_fs_info *fs_info, u64 logical,
 				       u64 length);
 int btrfs_zone_finish_one_bg(struct btrfs_fs_info *fs_info);
+int btrfs_zoned_activate_one_bg(struct btrfs_fs_info *fs_info,
+				struct btrfs_space_info *space_info, bool do_finish);
 #else /* CONFIG_BLK_DEV_ZONED */
 static inline int btrfs_get_dev_zone(struct btrfs_device *device, u64 pos,
 				     struct blk_zone *zone)
@@ -256,6 +258,14 @@ static inline int btrfs_zone_finish_one_bg(struct btrfs_fs_info *fs_info)
 	return 1;
 }

+static inline int btrfs_zoned_activate_one_bg(struct btrfs_fs_info *fs_info,
+					      struct btrfs_space_info *space_info,
+					      bool do_finish)
+{
+	/* Consider all the BGs are active */
+	return 0;
+}
+
 #endif

 static inline bool btrfs_dev_is_sequential(struct btrfs_device *device, u64 pos)

From patchwork Fri Jul 8 23:18:48 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 12911959
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org
Cc: linux-block@vger.kernel.org, Naohiro Aota
Subject: [PATCH 11/13] btrfs: zoned: activate necessary block group
Date: Sat, 9 Jul 2022 08:18:48 +0900

There are two places where allocating a chunk is not enough. These two
places are trying to ensure the space by allocating a chunk. To meet the
condition for active_total_bytes, we also need to activate a block group
there.
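
As a rough userspace sketch (not kernel code) of why allocation alone does
not help: the model below keeps total_bytes and active_total_bytes apart,
as the series does. The names struct model_space, model_alloc_chunk(),
model_activate() and model_ticket_fits() are hypothetical, and the ticket
check is an assumed simplification of the real reservation logic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the two counters discussed in this series. */
struct model_space {
	uint64_t total_bytes;		/* all allocated block groups */
	uint64_t active_total_bytes;	/* only activated block groups */
	uint64_t used_bytes;		/* bytes already used or reserved */
};

/* Chunk allocation grows the space, but nothing becomes active yet. */
void model_alloc_chunk(struct model_space *s, uint64_t len)
{
	assert(s != NULL);
	s->total_bytes += len;
}

/*
 * Activate 'len' bytes worth of an allocated but inactive block group.
 * Returns false when no inactive space of that size is left.
 */
bool model_activate(struct model_space *s, uint64_t len)
{
	assert(s != NULL);
	if (s->total_bytes - s->active_total_bytes < len)
		return false;
	s->active_total_bytes += len;
	return true;
}

/* A metadata ticket is only granted within the active bytes. */
bool model_ticket_fits(const struct model_space *s, uint64_t bytes)
{
	assert(s != NULL);
	return s->used_bytes + bytes <= s->active_total_bytes;
}
```

In this model a ticket that does not fit before model_alloc_chunk() still
does not fit afterwards; it fits only once model_activate() has raised
active_total_bytes.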
CC: stable@vger.kernel.org # 5.16+
Fixes: afba2bc036b0 ("btrfs: zoned: implement active zone tracking")
Signed-off-by: Naohiro Aota
---
 fs/btrfs/block-group.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index 51e7c1f1d93f..14084da12844 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -2664,6 +2664,14 @@ int btrfs_inc_block_group_ro(struct btrfs_block_group *cache,
 	ret = btrfs_chunk_alloc(trans, alloc_flags, CHUNK_ALLOC_FORCE);
 	if (ret < 0)
 		goto out;
+	/*
+	 * We have allocated a new chunk. We also need to activate that chunk to
+	 * grant metadata tickets for zoned btrfs.
+	 */
+	ret = btrfs_zoned_activate_one_bg(fs_info, cache->space_info, true);
+	if (ret < 0)
+		goto out;
+
 	ret = inc_block_group_ro(cache, 0);
 	if (ret == -ETXTBSY)
 		goto unlock_out;
@@ -3889,6 +3897,14 @@ static void reserve_chunk_space(struct btrfs_trans_handle *trans,
 		if (IS_ERR(bg)) {
 			ret = PTR_ERR(bg);
 		} else {
+			/*
+			 * We have a new chunk. We also need to activate it for
+			 * zoned btrfs.
+			 */
+			ret = btrfs_zoned_activate_one_bg(fs_info, info, true);
+			if (ret < 0)
+				return;
+
 			/*
 			 * If we fail to add the chunk item here, we end up
 			 * trying again at phase 2 of chunk allocation, at

From patchwork Fri Jul 8 23:18:49 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 12911961
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org
Cc: linux-block@vger.kernel.org, Naohiro Aota
Subject: [PATCH 12/13] btrfs: zoned: write out partially allocated region
Date: Sat, 9 Jul 2022 08:18:49 +0900

cow_file_range() works in an all-or-nothing way: if it fails to allocate
an extent for a part of the given region, it gives up the whole region,
including the successfully allocated parts. On top of cow_file_range(),
run_delalloc_zoned() writes data for the region only when it successfully
allocates the whole region.

This all-or-nothing allocation and write-out are problematic when the
available space in all the block groups gets tight under the active zone
restriction. btrfs_reserve_extent() tries hard to utilize the remaining
space in the active block groups, but finally gives up and fails with
-ENOSPC. However, if we send IOs for the successfully allocated region, we
can finish a zone and continue the rest of the allocation on a newly
allocated block group.

This patch implements the partial write-out for run_delalloc_zoned(). With
this patch applied, cow_file_range() returns -EAGAIN to tell the caller to
do something to make further allocation progress, and reports the
successfully allocated region via done_offset. Furthermore, the zoned
extent allocator returns -EAGAIN to tell cow_file_range() to go back to
the caller.

Actually, we still need to wait for an IO to complete to continue the
allocation. The next patch implements that part.

CC: stable@vger.kernel.org # 5.16+
Fixes: afba2bc036b0 ("btrfs: zoned: implement active zone tracking")
Signed-off-by: Naohiro Aota
---
 fs/btrfs/extent-tree.c | 10 +++++++
 fs/btrfs/inode.c       | 63 ++++++++++++++++++++++++++++++++----------
 2 files changed, 59 insertions(+), 14 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 5589e04eda0e..1b29b16f6736 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3995,6 +3995,16 @@ static int can_allocate_chunk_zoned(struct btrfs_fs_info *fs_info,
 	if (ffe_ctl->max_extent_size >= ffe_ctl->min_alloc_size)
 		return -ENOSPC;

+	/*
+	 * Even min_alloc_size is not left in any block groups. Since we cannot
+	 * activate a new block group, allocating it may not help. Let's tell
+	 * the caller to try again and hope it makes progress by writing some
+	 * parts of the region. That is only possible for data block groups,
+	 * where a part of the region can be written.
+	 */
+	if (ffe_ctl->flags & BTRFS_BLOCK_GROUP_DATA)
+		return -EAGAIN;
+
 	/*
 	 * We cannot activate a new block group and no enough space left in any
 	 * block groups. So, allocating a new block group may not help. But,
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 8ce937b0b014..681e2cb4dd9c 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -117,7 +117,8 @@ static int btrfs_truncate(struct inode *inode, bool skip_writeback);
 static noinline int cow_file_range(struct btrfs_inode *inode,
 				   struct page *locked_page,
 				   u64 start, u64 end, int *page_started,
-				   unsigned long *nr_written, int unlock);
+				   unsigned long *nr_written, int unlock,
+				   u64 *done_offset);
 static struct extent_map *create_io_em(struct btrfs_inode *inode, u64 start,
 				       u64 len, u64 orig_start, u64 block_start,
 				       u64 block_len, u64 orig_block_len,
@@ -921,7 +922,7 @@ static int submit_uncompressed_range(struct btrfs_inode *inode,
 	 * can directly submit them without interruption.
 	 */
 	ret = cow_file_range(inode, locked_page, start, end, &page_started,
-			     &nr_written, 0);
+			     &nr_written, 0, NULL);
 	/* Inline extent inserted, page gets unlocked and everything is done */
 	if (page_started) {
 		ret = 0;
@@ -1170,7 +1171,8 @@ static u64 get_extent_allocation_hint(struct btrfs_inode *inode, u64 start,
 static noinline int cow_file_range(struct btrfs_inode *inode,
 				   struct page *locked_page,
 				   u64 start, u64 end, int *page_started,
-				   unsigned long *nr_written, int unlock)
+				   unsigned long *nr_written, int unlock,
+				   u64 *done_offset)
 {
 	struct btrfs_root *root = inode->root;
 	struct btrfs_fs_info *fs_info = root->fs_info;
@@ -1363,6 +1365,21 @@ static noinline int cow_file_range(struct btrfs_inode *inode,
 	btrfs_dec_block_group_reservations(fs_info, ins.objectid);
 	btrfs_free_reserved_extent(fs_info, ins.objectid, ins.offset, 1);
 out_unlock:
+	/*
+	 * If done_offset is non-NULL and ret == -EAGAIN, we expect the
+	 * caller to write out the successfully allocated region and retry.
+	 */
+	if (done_offset && ret == -EAGAIN) {
+		if (orig_start < start)
+			*done_offset = start - 1;
+		else
+			*done_offset = start;
+		return ret;
+	} else if (ret == -EAGAIN) {
+		/* Convert to -ENOSPC since the caller cannot retry. */
+		ret = -ENOSPC;
+	}
+
 	/*
 	 * Now, we have three regions to clean up:
 	 *
@@ -1608,19 +1625,37 @@ static noinline int run_delalloc_zoned(struct btrfs_inode *inode,
 				       u64 end, int *page_started,
 				       unsigned long *nr_written)
 {
+	u64 done_offset = end;
 	int ret;
+	bool locked_page_done = false;

-	ret = cow_file_range(inode, locked_page, start, end, page_started,
-			     nr_written, 0);
-	if (ret)
-		return ret;
+	while (start <= end) {
+		ret = cow_file_range(inode, locked_page, start, end, page_started,
+				     nr_written, 0, &done_offset);
+		if (ret && ret != -EAGAIN)
+			return ret;

-	if (*page_started)
-		return 0;
+		if (*page_started) {
+			ASSERT(ret == 0);
+			return 0;
+		}
+
+		if (ret == 0)
+			done_offset = end;
+
+		if (done_offset == start)
+			return -ENOSPC;
+
+		if (!locked_page_done) {
+			__set_page_dirty_nobuffers(locked_page);
+			account_page_redirty(locked_page);
+		}
+		locked_page_done = true;
+		extent_write_locked_range(&inode->vfs_inode, start, done_offset);
+
+		start = done_offset + 1;
+	}

-	__set_page_dirty_nobuffers(locked_page);
-	account_page_redirty(locked_page);
-	extent_write_locked_range(&inode->vfs_inode, start, end);
 	*page_started = 1;

 	return 0;
@@ -1712,7 +1747,7 @@ static int fallback_to_cow(struct btrfs_inode *inode, struct page *locked_page,
 	}

 	return cow_file_range(inode, locked_page, start, end, page_started,
-			      nr_written, 1);
+			      nr_written, 1, NULL);
 }

 struct can_nocow_file_extent_args {
@@ -2185,7 +2220,7 @@ int btrfs_run_delalloc_range(struct btrfs_inode *inode, struct page *locked_page
 					     page_started, nr_written);
 		else
 			ret = cow_file_range(inode, locked_page, start, end,
-					     page_started, nr_written, 1);
+					     page_started, nr_written, 1, NULL);
 	} else {
 		set_bit(BTRFS_INODE_HAS_ASYNC_EXTENT, &inode->runtime_flags);
 		ret = cow_file_range_async(inode, wbc, locked_page, start, end,

From patchwork Fri Jul 8 23:18:50 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 12911962
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org
Cc: linux-block@vger.kernel.org, Naohiro Aota
Subject: [PATCH 13/13] btrfs: zoned: wait until zone is finished when allocation didn't progress
Date: Sat, 9 Jul 2022 08:18:50 +0900

When the allocated position doesn't progress, we cannot submit IOs to
finish a block group, but there should be ongoing IOs that will finish a
block group. So, in that case, we wait for a zone to be finished and retry
the allocation after that.

Introduce a new flag BTRFS_FS_NEED_ZONE_FINISH for fs_info->flags to
indicate that a zone must be finished before the allocation can proceed.
The flag is set when the allocator detects that it cannot activate a new
block group, and it is cleared once a zone is finished.
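
The life cycle of the flag can be sketched as a single-threaded userspace
model. This is only an illustration under simplifying assumptions: struct
model_fs and the model_*() helpers are hypothetical names, the zone
counters are invented for the example, and the real code sleeps on a wait
queue instead of polling a predicate.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the state touched by this patch. */
struct model_fs {
	bool need_zone_finish;	/* models BTRFS_FS_NEED_ZONE_FINISH */
	int active_zones;	/* zones currently active */
	int max_active_zones;	/* device limit on active zones */
};

/* Allocator side: set the flag when no new zone can be activated. */
bool model_can_activate_zone(struct model_fs *fs)
{
	bool ok;

	assert(fs != NULL);
	ok = fs->active_zones < fs->max_active_zones;
	if (!ok)
		fs->need_zone_finish = true;
	return ok;
}

/* Zone finish side: release one zone and clear the flag. */
void model_zone_finish(struct model_fs *fs)
{
	assert(fs != NULL && fs->active_zones > 0);
	fs->active_zones--;
	fs->need_zone_finish = false;	/* the kernel also wakes waiters here */
}

/* Writer side: the condition a waiter would sleep on before retrying. */
bool model_may_retry(const struct model_fs *fs)
{
	assert(fs != NULL);
	return !fs->need_zone_finish;
}
```

Starting with every zone active, model_can_activate_zone() fails and sets
the flag, model_may_retry() stays false until model_zone_finish() runs, and
the next activation attempt then succeeds.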
CC: stable@vger.kernel.org # 5.16+
Fixes: afba2bc036b0 ("btrfs: zoned: implement active zone tracking")
Signed-off-by: Naohiro Aota
---
 fs/btrfs/ctree.h   | 4 ++++
 fs/btrfs/disk-io.c | 1 +
 fs/btrfs/inode.c   | 9 +++++++--
 fs/btrfs/zoned.c   | 6 ++++++
 4 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index c215e15baea2..ddecd92fa848 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -638,6 +638,9 @@ enum {
 	/* Indicate we have half completed snapshot deletions pending. */
 	BTRFS_FS_UNFINISHED_DROPS,

+	/* Indicate we have to finish a zone to do next allocation. */
+	BTRFS_FS_NEED_ZONE_FINISH,
+
 #if BITS_PER_LONG == 32
 	/* Indicate if we have error/warn message printed on 32bit systems */
 	BTRFS_FS_32BIT_ERROR,
@@ -1084,6 +1087,7 @@ struct btrfs_fs_info {

 	spinlock_t zone_active_bgs_lock;
 	struct list_head zone_active_bgs;
+	wait_queue_head_t zone_finish_wait;

 	/* Updates are not protected by any lock */
 	struct btrfs_commit_stats commit_stats;
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 914557d59472..1fe5f79770a0 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3135,6 +3135,7 @@ void btrfs_init_fs_info(struct btrfs_fs_info *fs_info)
 	init_waitqueue_head(&fs_info->transaction_blocked_wait);
 	init_waitqueue_head(&fs_info->async_submit_wait);
 	init_waitqueue_head(&fs_info->delayed_iputs_wait);
+	init_waitqueue_head(&fs_info->zone_finish_wait);

 	/* Usable values until the real ones are cached from the superblock */
 	fs_info->nodesize = 4096;
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 681e2cb4dd9c..815121350d91 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -1643,8 +1643,13 @@ static noinline int run_delalloc_zoned(struct btrfs_inode *inode,
 		if (ret == 0)
 			done_offset = end;

-		if (done_offset == start)
-			return -ENOSPC;
+		if (done_offset == start) {
+			struct btrfs_fs_info *info = inode->root->fs_info;
+
+			wait_var_event(&info->zone_finish_wait,
+				       !test_bit(BTRFS_FS_NEED_ZONE_FINISH, &info->flags));
+			continue;
+		}

 		if (!locked_page_done) {
 			__set_page_dirty_nobuffers(locked_page);
diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c
index 67098f3fcd14..471d870875ed 100644
--- a/fs/btrfs/zoned.c
+++ b/fs/btrfs/zoned.c
@@ -2006,6 +2006,9 @@ static int do_zone_finish(struct btrfs_block_group *block_group, bool fully_writ
 	/* For active_bg_list */
 	btrfs_put_block_group(block_group);

+	clear_bit(BTRFS_FS_NEED_ZONE_FINISH, &fs_info->flags);
+	wake_up_all(&fs_info->zone_finish_wait);
+
 	return 0;
 }

@@ -2042,6 +2045,9 @@ bool btrfs_can_activate_zone(struct btrfs_fs_devices *fs_devices, u64 flags)
 	}
 	mutex_unlock(&fs_info->chunk_mutex);

+	if (!ret)
+		set_bit(BTRFS_FS_NEED_ZONE_FINISH, &fs_info->flags);
+
 	return ret;
 }