From patchwork Mon Jul 4 04:58:05 2022
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 12904627
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org
Cc: Naohiro Aota
Subject: [PATCH 01/13] block: add bdev_max_segments() helper
Date: Mon, 4 Jul 2022 13:58:05 +0900

Add bdev_max_segments() like other queue parameters.
Signed-off-by: Naohiro Aota
Reviewed-by: Johannes Thumshirn
---
 include/linux/blkdev.h | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 2f7b43444c5f..62e3ff52ab03 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1206,6 +1206,11 @@ bdev_max_zone_append_sectors(struct block_device *bdev)
 	return queue_max_zone_append_sectors(bdev_get_queue(bdev));
 }
 
+static inline unsigned int bdev_max_segments(struct block_device *bdev)
+{
+	return queue_max_segments(bdev_get_queue(bdev));
+}
+
 static inline unsigned queue_logical_block_size(const struct request_queue *q)
 {
 	int retval = 512;

From patchwork Mon Jul 4 04:58:06 2022
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 12904629
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org
Cc: Naohiro Aota
Subject: [PATCH 02/13] btrfs: zoned: revive max_zone_append_bytes
Date: Mon, 4 Jul 2022 13:58:06 +0900
Message-Id: <687ec8ab8c61a9972d6936cdf189dc5756299051.1656909695.git.naohiro.aota@wdc.com>

This patch is basically a revert of commit 5a80d1c6a270 ("btrfs: zoned:
remove max_zone_append_size logic"), but without the unnecessary ASSERT
and check. The max_zone_append_size will be used as a hint to estimate
the number of extents needed to cover a delalloc/writeback region in
later commits.

The size of a ZONE APPEND bio is also limited by queue_max_segments(),
so this commit takes it into account when calculating
max_zone_append_size. Technically, a bio can be larger than
queue_max_segments() * PAGE_SIZE if the pages are contiguous. But it is
safe to consider "queue_max_segments() * PAGE_SIZE" as an upper limit of
an extent size for calculating the number of extents needed to write
data.

Signed-off-by: Naohiro Aota
Reviewed-by: Johannes Thumshirn
---
 fs/btrfs/ctree.h |  2 ++
 fs/btrfs/zoned.c | 10 ++++++++++
 fs/btrfs/zoned.h |  1 +
 3 files changed, 13 insertions(+)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 4e2569f84aab..e4879912c475 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1071,6 +1071,8 @@ struct btrfs_fs_info {
 	 */
 	u64 zone_size;
 
+	/* Max size to emit ZONE_APPEND write command */
+	u64 max_zone_append_size;
 	struct mutex zoned_meta_io_lock;
 	spinlock_t treelog_bg_lock;
 	u64 treelog_bg;
diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c
index 7a0f8fa44800..271b8b8fd4d0 100644
--- a/fs/btrfs/zoned.c
+++ b/fs/btrfs/zoned.c
@@ -415,6 +415,9 @@ int btrfs_get_dev_zone_info(struct btrfs_device *device, bool populate_cache)
 	nr_sectors = bdev_nr_sectors(bdev);
 	zone_info->zone_size_shift = ilog2(zone_info->zone_size);
 	zone_info->nr_zones = nr_sectors >> ilog2(zone_sectors);
+	zone_info->max_zone_append_size =
+		min_t(u64, (u64)bdev_max_zone_append_sectors(bdev) << SECTOR_SHIFT,
+		      (u64)bdev_max_segments(bdev) << PAGE_SHIFT);
 	if (!IS_ALIGNED(nr_sectors, zone_sectors))
 		zone_info->nr_zones++;
 
@@ -640,6 +643,7 @@ int btrfs_check_zoned_mode(struct btrfs_fs_info *fs_info)
 	u64 zoned_devices = 0;
 	u64 nr_devices = 0;
 	u64 zone_size = 0;
+	u64 max_zone_append_size = 0;
 	const bool incompat_zoned = btrfs_fs_incompat(fs_info, ZONED);
 	int ret = 0;
 
@@ -674,6 +678,11 @@ int btrfs_check_zoned_mode(struct btrfs_fs_info *fs_info)
 				ret = -EINVAL;
 				goto out;
 			}
+			if (!max_zone_append_size ||
+			    (zone_info->max_zone_append_size &&
+			     zone_info->max_zone_append_size < max_zone_append_size))
+				max_zone_append_size =
+					zone_info->max_zone_append_size;
 		}
 		nr_devices++;
 	}
@@ -723,6 +732,7 @@ int btrfs_check_zoned_mode(struct btrfs_fs_info *fs_info)
 	}
 
 	fs_info->zone_size = zone_size;
+	fs_info->max_zone_append_size = max_zone_append_size;
 	fs_info->fs_devices->chunk_alloc_policy = BTRFS_CHUNK_ALLOC_ZONED;
 
 	/*
diff --git a/fs/btrfs/zoned.h b/fs/btrfs/zoned.h
index 6b2eec99162b..9caeab07fd38 100644
--- a/fs/btrfs/zoned.h
+++ b/fs/btrfs/zoned.h
@@ -19,6 +19,7 @@ struct btrfs_zoned_device_info {
 	 */
 	u64 zone_size;
 	u8 zone_size_shift;
+	u64 max_zone_append_size;
 	u32 nr_zones;
 	unsigned int max_active_zones;
 	atomic_t active_zones_left;

From patchwork Mon Jul 4 04:58:07 2022
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 12904631
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org
Cc: Naohiro Aota
Subject: [PATCH 03/13] btrfs: replace BTRFS_MAX_EXTENT_SIZE with fs_info->max_extent_size
Date: Mon, 4 Jul 2022 13:58:07 +0900
Message-Id: <128758fed168a54587c5a0902216aff0e6a72131.1656909695.git.naohiro.aota@wdc.com>
On zoned btrfs, data write out is limited by max_zone_append_size, and a
large ordered extent is split according to the size of a bio. OTOH, the
number of extents to be written is calculated using
BTRFS_MAX_EXTENT_SIZE, and that estimated number is used to reserve the
metadata bytes to update and/or create the metadata items.

The metadata reservation is done at e.g., btrfs_buffered_write() and
then released according to the estimation changes. Thus, if the number
of extents increases massively, the reserved metadata can run out.

The increase of the number of extents easily occurs on zoned btrfs if
BTRFS_MAX_EXTENT_SIZE > max_zone_append_size. And it causes the
following warning on a small-RAM environment with metadata over-commit
disabled (done in the following patch).

[75721.498492] ------------[ cut here ]------------
[75721.505624] BTRFS: block rsv 1 returned -28
[75721.512230] WARNING: CPU: 24 PID: 2327559 at fs/btrfs/block-rsv.c:537 btrfs_use_block_rsv+0x560/0x760 [btrfs]
[75721.524407] Modules linked in: btrfs null_blk blake2b_generic xor raid6_pq loop dm_flakey dm_mod algif_hash af_alg veth xt_nat xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_filter iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 bpfilter br_netfilter bridge stp llc overlay sunrpc ext4 mbcache jbd2 rapl ipmi_ssif bfq k10temp i2c_piix4 ipmi_si ipmi_devintf ipmi_msghandler zram ip_tables ccp ast bnxt_en drm_vram_helper drm_ttm_helper pkcs8_key_parser asn1_decoder public_key oid_registry fuse ipv6 [last unloaded: btrfs]
[75721.581854] CPU: 24 PID: 2327559 Comm: kworker/u64:10 Kdump: loaded Tainted: G W 5.18.0-rc2-BTRFS-ZNS+ #109
[75721.597200] Hardware name: Supermicro Super Server/H12SSL-NT, BIOS 2.0 02/22/2021
[75721.607310] Workqueue: btrfs-endio-write btrfs_work_helper [btrfs]
[75721.616209] RIP: 0010:btrfs_use_block_rsv+0x560/0x760 [btrfs]
[75721.624255] Code: 83 c0 01 38 d0 7c 0c 84 d2 74 08 4c 89 ff e8 57 59 64 e0 41 0f b7 74 24 62 ba e4 ff ff ff 48 c7 c7 a0 dc 33 a1 e8 c4 58 50 e2 <0f> 0b e9 9c fe ff ff 4d 8d a5 a0 02 00 00 4c 89 e7 e8 aa fb 5f e2
[75721.646649] RSP: 0018:ffffc9000fbdf3e0 EFLAGS: 00010286
[75721.654126] RAX: 0000000000000000 RBX: 0000000000004000 RCX: 0000000000000000
[75721.663524] RDX: 0000000000000004 RSI: 0000000000000008 RDI: fffff52001f7be6e
[75721.672921] RBP: ffffc9000fbdf420 R08: 0000000000000001 R09: ffff889f8d1fc6c7
[75721.682493] R10: ffffed13f1a3f8d8 R11: 0000000000000001 R12: ffff88980a3c0e28
[75721.692284] R13: ffff889b66590000 R14: ffff88980a3c0e40 R15: ffff88980a3c0e8a
[75721.701878] FS: 0000000000000000(0000) GS:ffff889f8d000000(0000) knlGS:0000000000000000
[75721.712601] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[75721.720726] CR2: 000055d12e05c018 CR3: 0000800193594000 CR4: 0000000000350ee0
[75721.730499] Call Trace:
[75721.735166] <TASK>
[75721.739886] btrfs_alloc_tree_block+0x1e1/0x1100 [btrfs]
[75721.747545] ? btrfs_alloc_logged_file_extent+0x550/0x550 [btrfs]
[75721.756145] ? btrfs_get_32+0xea/0x2d0 [btrfs]
[75721.762852] ? btrfs_get_32+0xea/0x2d0 [btrfs]
[75721.769520] ? push_leaf_left+0x420/0x620 [btrfs]
[75721.776431] ? memcpy+0x4e/0x60
[75721.781931] split_leaf+0x433/0x12d0 [btrfs]
[75721.788392] ? btrfs_get_token_32+0x580/0x580 [btrfs]
[75721.795636] ? push_for_double_split.isra.0+0x420/0x420 [btrfs]
[75721.803759] ? leaf_space_used+0x15d/0x1a0 [btrfs]
[75721.811156] btrfs_search_slot+0x1bc3/0x2790 [btrfs]
[75721.818300] ? lock_downgrade+0x7c0/0x7c0
[75721.824411] ? free_extent_buffer.part.0+0x107/0x200 [btrfs]
[75721.832456] ? split_leaf+0x12d0/0x12d0 [btrfs]
[75721.839149] ? free_extent_buffer.part.0+0x14f/0x200 [btrfs]
[75721.846945] ? free_extent_buffer+0x13/0x20 [btrfs]
[75721.853960] ? btrfs_release_path+0x4b/0x190 [btrfs]
[75721.861429] btrfs_csum_file_blocks+0x85c/0x1500 [btrfs]
[75721.869313] ? rcu_read_lock_sched_held+0x16/0x80
[75721.876085] ? lock_release+0x552/0xf80
[75721.881957] ? btrfs_del_csums+0x8c0/0x8c0 [btrfs]
[75721.888886] ? __kasan_check_write+0x14/0x20
[75721.895152] ? do_raw_read_unlock+0x44/0x80
[75721.901323] ? _raw_write_lock_irq+0x60/0x80
[75721.907983] ? btrfs_global_root+0xb9/0xe0 [btrfs]
[75721.915166] ? btrfs_csum_root+0x12b/0x180 [btrfs]
[75721.921918] ? btrfs_get_global_root+0x820/0x820 [btrfs]
[75721.929166] ? _raw_write_unlock+0x23/0x40
[75721.935116] ? unpin_extent_cache+0x1e3/0x390 [btrfs]
[75721.942041] btrfs_finish_ordered_io.isra.0+0xa0c/0x1dc0 [btrfs]
[75721.949906] ? try_to_wake_up+0x30/0x14a0
[75721.955700] ? btrfs_unlink_subvol+0xda0/0xda0 [btrfs]
[75721.962661] ? rcu_read_lock_sched_held+0x16/0x80
[75721.969111] ? lock_acquire+0x41b/0x4c0
[75721.974982] finish_ordered_fn+0x15/0x20 [btrfs]
[75721.981639] btrfs_work_helper+0x1af/0xa80 [btrfs]
[75721.988184] ? _raw_spin_unlock_irq+0x28/0x50
[75721.994643] process_one_work+0x815/0x1460
[75722.000444] ? pwq_dec_nr_in_flight+0x250/0x250
[75722.006643] ? do_raw_spin_trylock+0xbb/0x190
[75722.013086] worker_thread+0x59a/0xeb0
[75722.018511] kthread+0x2ac/0x360
[75722.023428] ? process_one_work+0x1460/0x1460
[75722.029431] ? kthread_complete_and_exit+0x30/0x30
[75722.036044] ret_from_fork+0x22/0x30
[75722.041255] </TASK>
[75722.045047] irq event stamp: 0
[75722.049703] hardirqs last enabled at (0): [<0000000000000000>] 0x0
[75722.057610] hardirqs last disabled at (0): [] copy_process+0x1c1a/0x66b0
[75722.067533] softirqs last enabled at (0): [] copy_process+0x1c59/0x66b0
[75722.077423] softirqs last disabled at (0): [<0000000000000000>] 0x0
[75722.085335] ---[ end trace 0000000000000000 ]---

To fix the estimation, we need to introduce fs_info->max_extent_size to
replace BTRFS_MAX_EXTENT_SIZE, which allows setting different sizes for
regular btrfs vs zoned btrfs.

Set fs_info->max_extent_size to BTRFS_MAX_EXTENT_SIZE by default. On
zoned btrfs, it is set to fs_info->max_zone_append_size.

CC: stable@vger.kernel.org # 5.12+
Fixes: d8e3fb106f39 ("btrfs: zoned: use ZONE_APPEND write for zoned mode")
Signed-off-by: Naohiro Aota
---
 fs/btrfs/ctree.h     | 3 +++
 fs/btrfs/disk-io.c   | 2 ++
 fs/btrfs/extent_io.c | 3 ++-
 fs/btrfs/inode.c     | 6 ++++--
 fs/btrfs/zoned.c     | 2 ++
 5 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index e4879912c475..fca253bdb4b8 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1056,6 +1056,9 @@ struct btrfs_fs_info {
 	u32 csums_per_leaf;
 	u32 stripesize;
 
+	/* Maximum size of an extent. BTRFS_MAX_EXTENT_SIZE on regular btrfs. */
+	u64 max_extent_size;
+
 	/* Block groups and devices containing active swapfiles. */
 	spinlock_t swapfile_pins_lock;
 	struct rb_root swapfile_pins;
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 70b388de4d66..ef9d28147b9e 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3138,6 +3138,8 @@ void btrfs_init_fs_info(struct btrfs_fs_info *fs_info)
 	fs_info->sectorsize_bits = ilog2(4096);
 	fs_info->stripesize = 4096;
 
+	fs_info->max_extent_size = BTRFS_MAX_EXTENT_SIZE;
+
 	spin_lock_init(&fs_info->swapfile_pins_lock);
 	fs_info->swapfile_pins = RB_ROOT;
 
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 3194eca41635..80d9c218534f 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2021,10 +2021,11 @@ noinline_for_stack bool find_lock_delalloc_range(struct inode *inode,
 				    struct page *locked_page, u64 *start,
 				    u64 *end)
 {
+	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
 	struct extent_io_tree *tree = &BTRFS_I(inode)->io_tree;
 	const u64 orig_start = *start;
 	const u64 orig_end = *end;
-	u64 max_bytes = BTRFS_MAX_EXTENT_SIZE;
+	u64 max_bytes = fs_info->max_extent_size;
 	u64 delalloc_start;
 	u64 delalloc_end;
 	bool found;
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 9890782fe932..74ac7ef69a3f 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -2201,6 +2201,7 @@ int btrfs_run_delalloc_range(struct btrfs_inode *inode, struct page *locked_page
 void btrfs_split_delalloc_extent(struct inode *inode,
 				 struct extent_state *orig, u64 split)
 {
+	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
 	u64 size;
 
 	/* not delalloc, ignore it */
@@ -2208,7 +2209,7 @@ void btrfs_split_delalloc_extent(struct inode *inode,
 		return;
 
 	size = orig->end - orig->start + 1;
-	if (size > BTRFS_MAX_EXTENT_SIZE) {
+	if (size > fs_info->max_extent_size) {
 		u32 num_extents;
 		u64 new_size;
 
@@ -2237,6 +2238,7 @@ void btrfs_split_delalloc_extent(struct inode *inode,
 void btrfs_merge_delalloc_extent(struct inode *inode, struct extent_state *new,
 				 struct extent_state *other)
 {
+	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
 	u64 new_size, old_size;
 	u32 num_extents;
 
@@ -2250,7 +2252,7 @@ void btrfs_merge_delalloc_extent(struct inode *inode, struct extent_state *new,
 	new_size = other->end - new->start + 1;
 
 	/* we're not bigger than the max, unreserve the space and go */
-	if (new_size <= BTRFS_MAX_EXTENT_SIZE) {
+	if (new_size <= fs_info->max_extent_size) {
 		spin_lock(&BTRFS_I(inode)->lock);
 		btrfs_mod_outstanding_extents(BTRFS_I(inode), -1);
 		spin_unlock(&BTRFS_I(inode)->lock);
diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c
index 271b8b8fd4d0..eb5a612ea912 100644
--- a/fs/btrfs/zoned.c
+++ b/fs/btrfs/zoned.c
@@ -734,6 +734,8 @@ int btrfs_check_zoned_mode(struct btrfs_fs_info *fs_info)
 	fs_info->zone_size = zone_size;
 	fs_info->max_zone_append_size = max_zone_append_size;
 	fs_info->fs_devices->chunk_alloc_policy = BTRFS_CHUNK_ALLOC_ZONED;
+	if (fs_info->max_zone_append_size < fs_info->max_extent_size)
+		fs_info->max_extent_size = fs_info->max_zone_append_size;
 
 	/*
 	 * Check mount options here, because we might change fs_info->zoned

From patchwork Mon Jul 4 04:58:08 2022
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 12904630
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org
Cc: Naohiro Aota
Subject: [PATCH 04/13] btrfs: convert count_max_extents() to use fs_info->max_extent_size
Date: Mon, 4 Jul 2022 13:58:08 +0900
Message-Id: <943a1cb89b84002b49b758fce808a36a50c195a1.1656909695.git.naohiro.aota@wdc.com>

If count_max_extents() uses BTRFS_MAX_EXTENT_SIZE to calculate the
number of extents needed, btrfs releases the metadata reservation too
much on its way to write out the data.

Now that BTRFS_MAX_EXTENT_SIZE is replaced with
fs_info->max_extent_size, convert count_max_extents() to use it instead,
and fix the calculation of the metadata reservation.
CC: stable@vger.kernel.org # 5.12+
Fixes: d8e3fb106f39 ("btrfs: zoned: use ZONE_APPEND write for zoned mode")
Signed-off-by: Naohiro Aota
---
 fs/btrfs/ctree.h          | 16 ++++++++--------
 fs/btrfs/delalloc-space.c |  6 +++---
 fs/btrfs/inode.c          | 16 ++++++++--------
 3 files changed, 19 insertions(+), 19 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index fca253bdb4b8..4aac7df5a17d 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -107,14 +107,6 @@ struct btrfs_ioctl_encoded_io_args;
 #define BTRFS_STAT_CURR 0
 #define BTRFS_STAT_PREV 1
 
-/*
- * Count how many BTRFS_MAX_EXTENT_SIZE cover the @size
- */
-static inline u32 count_max_extents(u64 size)
-{
-	return div_u64(size + BTRFS_MAX_EXTENT_SIZE - 1, BTRFS_MAX_EXTENT_SIZE);
-}
-
 static inline unsigned long btrfs_chunk_item_size(int num_stripes)
 {
 	BUG_ON(num_stripes == 0);
@@ -4057,6 +4049,14 @@ static inline bool btrfs_is_zoned(const struct btrfs_fs_info *fs_info)
 	return fs_info->zone_size > 0;
 }
 
+/*
+ * Count how many fs_info->max_extent_size cover the @size
+ */
+static inline u32 count_max_extents(struct btrfs_fs_info *fs_info, u64 size)
+{
+	return div_u64(size + fs_info->max_extent_size - 1, fs_info->max_extent_size);
+}
+
 static inline bool btrfs_is_data_reloc_root(const struct btrfs_root *root)
 {
 	return root->root_key.objectid == BTRFS_DATA_RELOC_TREE_OBJECTID;
diff --git a/fs/btrfs/delalloc-space.c b/fs/btrfs/delalloc-space.c
index 36ab0859a263..1e8f17ff829e 100644
--- a/fs/btrfs/delalloc-space.c
+++ b/fs/btrfs/delalloc-space.c
@@ -273,7 +273,7 @@ static void calc_inode_reservations(struct btrfs_fs_info *fs_info,
 				    u64 num_bytes, u64 disk_num_bytes,
 				    u64 *meta_reserve, u64 *qgroup_reserve)
 {
-	u64 nr_extents = count_max_extents(num_bytes);
+	u64 nr_extents = count_max_extents(fs_info, num_bytes);
 	u64 csum_leaves = btrfs_csum_bytes_to_leaves(fs_info, disk_num_bytes);
 	u64 inode_update = btrfs_calc_metadata_size(fs_info, 1);
 
@@ -350,7 +350,7 @@ int btrfs_delalloc_reserve_metadata(struct btrfs_inode *inode, u64 num_bytes,
 	 * needs to free the reservation we just made.
 	 */
 	spin_lock(&inode->lock);
-	nr_extents = count_max_extents(num_bytes);
+	nr_extents = count_max_extents(fs_info, num_bytes);
 	btrfs_mod_outstanding_extents(inode, nr_extents);
 	inode->csum_bytes += disk_num_bytes;
 	btrfs_calculate_inode_block_rsv_size(fs_info, inode);
@@ -413,7 +413,7 @@ void btrfs_delalloc_release_extents(struct btrfs_inode *inode, u64 num_bytes)
 	unsigned num_extents;
 
 	spin_lock(&inode->lock);
-	num_extents = count_max_extents(num_bytes);
+	num_extents = count_max_extents(fs_info, num_bytes);
 	btrfs_mod_outstanding_extents(inode, -num_extents);
 	btrfs_calculate_inode_block_rsv_size(fs_info, inode);
 	spin_unlock(&inode->lock);
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 74ac7ef69a3f..357322da51b5 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -2218,10 +2218,10 @@ void btrfs_split_delalloc_extent(struct inode *inode,
 		 * applies here, just in reverse.
 		 */
 		new_size = orig->end - split + 1;
-		num_extents = count_max_extents(new_size);
+		num_extents = count_max_extents(fs_info, new_size);
 		new_size = split - orig->start;
-		num_extents += count_max_extents(new_size);
-		if (count_max_extents(size) >= num_extents)
+		num_extents += count_max_extents(fs_info, new_size);
+		if (count_max_extents(fs_info, size) >= num_extents)
 			return;
 	}
 
@@ -2278,10 +2278,10 @@ void btrfs_merge_delalloc_extent(struct inode *inode, struct extent_state *new,
 	 * this case.
 	 */
 	old_size = other->end - other->start + 1;
-	num_extents = count_max_extents(old_size);
+	num_extents = count_max_extents(fs_info, old_size);
 	old_size = new->end - new->start + 1;
-	num_extents += count_max_extents(old_size);
-	if (count_max_extents(new_size) >= num_extents)
+	num_extents += count_max_extents(fs_info, old_size);
+	if (count_max_extents(fs_info, new_size) >= num_extents)
 		return;
 
 	spin_lock(&BTRFS_I(inode)->lock);
@@ -2360,7 +2360,7 @@ void btrfs_set_delalloc_extent(struct inode *inode, struct extent_state *state,
 	if (!(state->state & EXTENT_DELALLOC) && (bits & EXTENT_DELALLOC)) {
 		struct btrfs_root *root = BTRFS_I(inode)->root;
 		u64 len = state->end + 1 - state->start;
-		u32 num_extents = count_max_extents(len);
+		u32 num_extents = count_max_extents(fs_info, len);
 		bool do_list = !btrfs_is_free_space_inode(BTRFS_I(inode));
 
 		spin_lock(&BTRFS_I(inode)->lock);
@@ -2402,7 +2402,7 @@ void btrfs_clear_delalloc_extent(struct inode *vfs_inode,
 	struct btrfs_inode *inode = BTRFS_I(vfs_inode);
 	struct btrfs_fs_info *fs_info = btrfs_sb(vfs_inode->i_sb);
 	u64 len = state->end + 1 - state->start;
-	u32 num_extents = count_max_extents(len);
+	u32 num_extents = count_max_extents(fs_info, len);
 
 	if ((state->state & EXTENT_DEFRAG) && (bits & EXTENT_DEFRAG)) {
 		spin_lock(&inode->lock);

From patchwork Mon Jul 4 04:58:09 2022
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 12904632
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org
Cc: Naohiro Aota
Subject: [PATCH 05/13] btrfs: use fs_info->max_extent_size in get_extent_max_capacity()
Date: Mon, 4 Jul 2022 13:58:09 +0900
Message-Id: <81fdc87b1b820d1d1bb54d3d2c24b085590ed506.1656909695.git.naohiro.aota@wdc.com>

Use fs_info->max_extent_size also in get_extent_max_capacity() for
completeness. This is only used for defrag and not strictly necessary
for fixing the metadata reservation size, but it still suppresses
unnecessary defrag operations.

Signed-off-by: Naohiro Aota
Reviewed-by: Johannes Thumshirn
---
 fs/btrfs/ioctl.c | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 7e1b4b0fbd6c..37480d4e6443 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -1230,16 +1230,18 @@ static struct extent_map *defrag_lookup_extent(struct inode *inode, u64 start,
 	return em;
 }
 
-static u32 get_extent_max_capacity(const struct extent_map *em)
+static u32 get_extent_max_capacity(struct btrfs_fs_info *fs_info,
+				   const struct extent_map *em)
 {
 	if (test_bit(EXTENT_FLAG_COMPRESSED, &em->flags))
 		return BTRFS_MAX_COMPRESSED;
-	return BTRFS_MAX_EXTENT_SIZE;
+	return fs_info->max_extent_size;
 }
 
 static bool defrag_check_next_extent(struct inode *inode, struct extent_map *em,
 				     u32 extent_thresh, u64 newer_than, bool locked)
 {
+	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
 	struct extent_map *next;
 	bool ret = false;
 
@@ -1263,7 +1265,7 @@ static bool defrag_check_next_extent(struct inode *inode, struct extent_map *em,
 	 * If the next extent is at its max capacity, defragging current extent
 	 * makes no sense, as the total number of extents
won't change. */ - if (next->len >= get_extent_max_capacity(em)) + if (next->len >= get_extent_max_capacity(fs_info, em)) goto out; /* Skip older extent */ if (next->generation < newer_than) @@ -1400,6 +1402,7 @@ static int defrag_collect_targets(struct btrfs_inode *inode, bool locked, struct list_head *target_list, u64 *last_scanned_ret) { + struct btrfs_fs_info *fs_info = inode->root->fs_info; bool last_is_target = false; u64 cur = start; int ret = 0; @@ -1484,7 +1487,7 @@ static int defrag_collect_targets(struct btrfs_inode *inode, * Skip extents already at its max capacity, this is mostly for * compressed extents, which max cap is only 128K. */ - if (em->len >= get_extent_max_capacity(em)) + if (em->len >= get_extent_max_capacity(fs_info, em)) goto next; /* From patchwork Mon Jul 4 04:58:10 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Naohiro Aota X-Patchwork-Id: 12904634 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 02D54CCA479 for ; Mon, 4 Jul 2022 04:58:57 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231911AbiGDE64 (ORCPT ); Mon, 4 Jul 2022 00:58:56 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:33010 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231949AbiGDE6v (ORCPT ); Mon, 4 Jul 2022 00:58:51 -0400 Received: from esa5.hgst.iphmx.com (esa5.hgst.iphmx.com [216.71.153.144]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 25304267E for ; Sun, 3 Jul 2022 21:58:51 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=wdc.com; i=@wdc.com; q=dns/txt; s=dkim.wdc.com; t=1656910730; x=1688446730; h=from:to:cc:subject:date:message-id:in-reply-to: 
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org
Cc: Naohiro Aota
Subject: [PATCH 06/13] btrfs: let can_allocate_chunk return int
Date: Mon, 4 Jul 2022 13:58:10 +0900
Message-Id: <11f96ae212fb278793c81eedbb7edc01864c8a33.1656909695.git.naohiro.aota@wdc.com>

In preparation for a later patch, convert the return type of
can_allocate_chunk() from bool to int. There are no functional changes.

Signed-off-by: Naohiro Aota
Reviewed-by: Johannes Thumshirn
---
 fs/btrfs/extent-tree.c | 15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index f97a0f28f464..c8f26ab7fe24 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3965,12 +3965,12 @@ static void found_extent(struct find_free_extent_ctl *ffe_ctl,
 	}
 }

-static bool can_allocate_chunk(struct btrfs_fs_info *fs_info,
-			       struct find_free_extent_ctl *ffe_ctl)
+static int can_allocate_chunk(struct btrfs_fs_info *fs_info,
+			      struct find_free_extent_ctl *ffe_ctl)
 {
 	switch (ffe_ctl->policy) {
 	case BTRFS_EXTENT_ALLOC_CLUSTERED:
-		return true;
+		return 0;
 	case BTRFS_EXTENT_ALLOC_ZONED:
 		/*
 		 * If we have enough free space left in an already
@@ -3980,8 +3980,8 @@ static bool can_allocate_chunk(struct btrfs_fs_info *fs_info,
 		 */
 		if (ffe_ctl->max_extent_size >= ffe_ctl->min_alloc_size &&
 		    !btrfs_can_activate_zone(fs_info->fs_devices, ffe_ctl->flags))
-			return false;
-		return true;
+			return -ENOSPC;
+		return 0;
 	default:
 		BUG();
 	}
@@ -4063,8 +4063,9 @@ static int find_free_extent_update_loop(struct btrfs_fs_info *fs_info,
 		int exist = 0;

 		/* Check if allocation policy allows to create a new chunk */
-		if (!can_allocate_chunk(fs_info, ffe_ctl))
-			return -ENOSPC;
+		ret = can_allocate_chunk(fs_info, ffe_ctl);
+		if (ret)
+			return ret;

 		trans = current->journal_info;
 		if (trans)

From patchwork Mon Jul  4 04:58:11 2022
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 12904633
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org
Cc: Naohiro Aota
Subject: [PATCH 07/13] btrfs: zoned: finish least available block group on data BG allocation
Date: Mon, 4 Jul 2022 13:58:11 +0900

When we run out of active zones and no sufficient space is left in any
block group, we need to finish one block group to make room to activate a
new block group.

However, we cannot do this for metadata block groups, because doing so can
cause a deadlock by waiting for a running transaction commit. So, do that
only for a data block group.

Furthermore, the block group to be finished has two requirements. First,
the block group must not have reserved bytes left. Having reserved bytes
means we have an allocated region but did not yet send bios for it. If that
region is allocated by the thread calling btrfs_zone_finish(), it results
in a deadlock.

Second, the block group to be finished must not be a SYSTEM block group.
Finishing a SYSTEM block group easily breaks further chunk allocation by
nullifying the SYSTEM free space.

In certain cases, we cannot find any zone finish candidate, or
btrfs_zone_finish() may fail. In that case, we fall back to splitting the
allocation bytes and filling the last spaces left in the block groups.
CC: stable@vger.kernel.org # 5.16+
Fixes: afba2bc036b0 ("btrfs: zoned: implement active zone tracking")
Signed-off-by: Naohiro Aota
---
 fs/btrfs/extent-tree.c | 43 ++++++++++++++++++++++++++++++++----------
 fs/btrfs/zoned.c       | 40 +++++++++++++++++++++++++++++++++++++++
 fs/btrfs/zoned.h       |  7 +++++++
 3 files changed, 80 insertions(+), 10 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index c8f26ab7fe24..62e75c1d1155 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3965,6 +3965,38 @@ static void found_extent(struct find_free_extent_ctl *ffe_ctl,
 	}
 }

+static int can_allocate_chunk_zoned(struct btrfs_fs_info *fs_info,
+				    struct find_free_extent_ctl *ffe_ctl)
+{
+	/* If we can activate new zone, just allocate a chunk and use it */
+	if (btrfs_can_activate_zone(fs_info->fs_devices, ffe_ctl->flags))
+		return 0;
+
+	/*
+	 * We already reached the max active zones. Try to finish one block
+	 * group to make a room for a new block group. This is only possible
+	 * for a data BG because btrfs_zone_finish() may need to wait for a
+	 * running transaction which can cause a deadlock for metadata
+	 * allocation.
+	 */
+	if ((ffe_ctl->flags & BTRFS_BLOCK_GROUP_DATA) && btrfs_finish_one_bg(fs_info))
+		return 0;
+
+	/*
+	 * If we have enough free space left in an already active block group
+	 * and we can't activate any other zone now, do not allow allocating a
+	 * new chunk and let find_free_extent() retry with a smaller size.
+	 */
+	if (ffe_ctl->max_extent_size >= ffe_ctl->min_alloc_size)
+		return -ENOSPC;
+
+	/*
+	 * We cannot activate a new block group and no enough space left in any
+	 * block groups. So, allocating a new block group may not help. But,
+	 * there is nothing to do anyway, so let's go with it.
+	 */
+	return 0;
+}
+
 static int can_allocate_chunk(struct btrfs_fs_info *fs_info,
 			      struct find_free_extent_ctl *ffe_ctl)
 {
@@ -3972,16 +4004,7 @@ static int can_allocate_chunk(struct btrfs_fs_info *fs_info,
 	case BTRFS_EXTENT_ALLOC_CLUSTERED:
 		return 0;
 	case BTRFS_EXTENT_ALLOC_ZONED:
-		/*
-		 * If we have enough free space left in an already
-		 * active block group and we can't activate any other
-		 * zone now, do not allow allocating a new chunk and
-		 * let find_free_extent() retry with a smaller size.
-		 */
-		if (ffe_ctl->max_extent_size >= ffe_ctl->min_alloc_size &&
-		    !btrfs_can_activate_zone(fs_info->fs_devices, ffe_ctl->flags))
-			return -ENOSPC;
-		return 0;
+		return can_allocate_chunk_zoned(fs_info, ffe_ctl);
 	default:
 		BUG();
 	}

diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c
index eb5a612ea912..4a69e8492177 100644
--- a/fs/btrfs/zoned.c
+++ b/fs/btrfs/zoned.c
@@ -2178,3 +2178,43 @@ void btrfs_zoned_release_data_reloc_bg(struct btrfs_fs_info *fs_info, u64 logica
 	spin_unlock(&block_group->lock);
 	btrfs_put_block_group(block_group);
 }
+
+bool btrfs_finish_one_bg(struct btrfs_fs_info *fs_info)
+{
+	struct btrfs_block_group *block_group;
+	struct btrfs_block_group *min_bg = NULL;
+	u64 min_avail = U64_MAX;
+	int ret;
+
+	spin_lock(&fs_info->zone_active_bgs_lock);
+	list_for_each_entry(block_group, &fs_info->zone_active_bgs,
+			    active_bg_list) {
+		u64 avail;
+
+		spin_lock(&block_group->lock);
+		if (block_group->reserved ||
+		    (block_group->flags & BTRFS_BLOCK_GROUP_SYSTEM)) {
+			spin_unlock(&block_group->lock);
+			continue;
+		}
+
+		avail = block_group->zone_capacity - block_group->alloc_offset;
+		if (min_avail > avail) {
+			if (min_bg)
+				btrfs_put_block_group(min_bg);
+			min_bg = block_group;
+			min_avail = avail;
+			btrfs_get_block_group(min_bg);
+		}
+		spin_unlock(&block_group->lock);
+	}
+	spin_unlock(&fs_info->zone_active_bgs_lock);
+
+	if (!min_bg)
+		return false;
+
+	ret = btrfs_zone_finish(min_bg);
+	btrfs_put_block_group(min_bg);
+
+	return ret == 0;
+}

diff --git a/fs/btrfs/zoned.h b/fs/btrfs/zoned.h
index 9caeab07fd38..09a19772ee68 100644
--- a/fs/btrfs/zoned.h
+++ b/fs/btrfs/zoned.h
@@ -80,6 +80,7 @@ void btrfs_free_zone_cache(struct btrfs_fs_info *fs_info);
 bool btrfs_zoned_should_reclaim(struct btrfs_fs_info *fs_info);
 void btrfs_zoned_release_data_reloc_bg(struct btrfs_fs_info *fs_info,
 				       u64 logical, u64 length);
+bool btrfs_finish_one_bg(struct btrfs_fs_info *fs_info);
 #else /* CONFIG_BLK_DEV_ZONED */
 static inline int btrfs_get_dev_zone(struct btrfs_device *device, u64 pos,
 				     struct blk_zone *zone)
@@ -249,6 +250,12 @@ static inline bool btrfs_zoned_should_reclaim(struct btrfs_fs_info *fs_info)
 static inline void btrfs_zoned_release_data_reloc_bg(struct btrfs_fs_info *fs_info,
 						     u64 logical, u64 length) { }
+
+static inline bool btrfs_finish_one_bg(struct btrfs_fs_info *fs_info)
+{
+	return true;
+}
+
 #endif

 static inline bool btrfs_dev_is_sequential(struct btrfs_device *device, u64 pos)

From patchwork Mon Jul  4 04:58:12 2022
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 12904635
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org
Cc: Naohiro Aota
Subject: [PATCH 08/13] btrfs: zoned: introduce space_info->active_total_bytes
Date: Mon, 4 Jul 2022 13:58:12 +0900
The active_total_bytes, like the total_bytes, accounts for the total bytes
of active block groups in the space_info.

With the introduction of active_total_bytes, we can check if the reserved
bytes can be written to the block groups without activating a new block
group. The check is necessary for metadata allocation on zoned btrfs. We
cannot finish a block group, which may require waiting for the current
transaction, from the metadata allocation context. Instead, we need to
ensure the on-going allocation (reserved bytes) fits in the active block
groups.

Signed-off-by: Naohiro Aota
---
 fs/btrfs/block-group.c | 12 +++++++++---
 fs/btrfs/space-info.c  | 41 ++++++++++++++++++++++++++++++++---------
 fs/btrfs/space-info.h  |  4 +++-
 fs/btrfs/zoned.c       | 16 ++++++++++++++++
 4 files changed, 60 insertions(+), 13 deletions(-)

diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index e930749770ac..51e7c1f1d93f 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -1051,8 +1051,13 @@ int btrfs_remove_block_group(struct btrfs_trans_handle *trans,
 			< block_group->zone_unusable);
 		WARN_ON(block_group->space_info->disk_total
 			< block_group->length * factor);
+		WARN_ON(block_group->zone_is_active &&
+			block_group->space_info->active_total_bytes
+			< block_group->length);
 	}
 	block_group->space_info->total_bytes -= block_group->length;
+	if (block_group->zone_is_active)
+		block_group->space_info->active_total_bytes -= block_group->length;
 	block_group->space_info->bytes_readonly -=
 		(block_group->length - block_group->zone_unusable);
 	block_group->space_info->bytes_zone_unusable -=
@@ -2107,7 +2112,8 @@ static int read_one_block_group(struct btrfs_fs_info *info,
 	trace_btrfs_add_block_group(info, cache, 0);
 	btrfs_update_space_info(info, cache->flags, cache->length,
 				cache->used, cache->bytes_super,
-				cache->zone_unusable, &space_info);
+				cache->zone_unusable, cache->zone_is_active,
+				&space_info);

 	cache->space_info = space_info;
@@ -2177,7 +2183,7 @@ static int fill_dummy_bgs(struct btrfs_fs_info *fs_info)
 	}
 	btrfs_update_space_info(fs_info, bg->flags, em->len, em->len,
-				0, 0, &space_info);
+				0, 0, false, &space_info);
 	bg->space_info = space_info;
 	link_block_group(bg);
@@ -2558,7 +2564,7 @@ struct btrfs_block_group *btrfs_make_block_group(struct btrfs_trans_handle *tran
 	trace_btrfs_add_block_group(fs_info, cache, 1);
 	btrfs_update_space_info(fs_info, cache->flags, size, bytes_used,
 				cache->bytes_super, cache->zone_unusable,
-				&cache->space_info);
+				cache->zone_is_active, &cache->space_info);
 	btrfs_update_global_block_rsv(fs_info);

 	link_block_group(cache);

diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
index 62d25112310d..c7a60341b2d2 100644
--- a/fs/btrfs/space-info.c
+++ b/fs/btrfs/space-info.c
@@ -295,7 +295,7 @@ int btrfs_init_space_info(struct btrfs_fs_info *fs_info)
 void btrfs_update_space_info(struct btrfs_fs_info *info, u64 flags,
 			     u64 total_bytes, u64 bytes_used,
 			     u64 bytes_readonly, u64 bytes_zone_unusable,
-			     struct btrfs_space_info **space_info)
+			     bool active, struct btrfs_space_info **space_info)
 {
 	struct btrfs_space_info *found;
 	int factor;
@@ -306,6 +306,8 @@ void btrfs_update_space_info(struct btrfs_fs_info *info, u64 flags,
 	ASSERT(found);
 	spin_lock(&found->lock);
 	found->total_bytes += total_bytes;
+	if (active)
+		found->active_total_bytes += total_bytes;
 	found->disk_total += total_bytes * factor;
 	found->bytes_used += bytes_used;
 	found->disk_used += bytes_used * factor;
@@ -369,6 +371,22 @@ static u64 calc_available_free_space(struct btrfs_fs_info *fs_info,
 	return avail;
 }

+static inline u64 writable_total_bytes(struct btrfs_fs_info *fs_info,
+				       struct btrfs_space_info *space_info)
+{
+	/*
+	 * On regular btrfs, all total_bytes are always writable. On zoned
+	 * btrfs, there may be a limitation imposed by max_active_zones. For
+	 * metadata allocation, we cannot finish an existing active block group
+	 * to avoid a deadlock. Thus, we need to consider only the active groups
+	 * to be writable for metadata space.
+	 */
+	if (!btrfs_is_zoned(fs_info) || (space_info->flags & BTRFS_BLOCK_GROUP_DATA))
+		return space_info->total_bytes;
+
+	return space_info->active_total_bytes;
+}
+
 int btrfs_can_overcommit(struct btrfs_fs_info *fs_info,
 			 struct btrfs_space_info *space_info, u64 bytes,
 			 enum btrfs_reserve_flush_enum flush)
@@ -383,7 +401,7 @@ int btrfs_can_overcommit(struct btrfs_fs_info *fs_info,
 	used = btrfs_space_info_used(space_info, true);
 	avail = calc_available_free_space(fs_info, space_info, flush);

-	if (used + bytes < space_info->total_bytes + avail)
+	if (used + bytes < writable_total_bytes(fs_info, space_info) + avail)
 		return 1;
 	return 0;
 }
@@ -419,7 +437,7 @@ void btrfs_try_granting_tickets(struct btrfs_fs_info *fs_info,
 		ticket = list_first_entry(head, struct reserve_ticket, list);

 		/* Check and see if our ticket can be satisfied now. */
-		if ((used + ticket->bytes <= space_info->total_bytes) ||
+		if ((used + ticket->bytes <= writable_total_bytes(fs_info, space_info)) ||
 		    btrfs_can_overcommit(fs_info, space_info, ticket->bytes,
 					 flush)) {
 			btrfs_space_info_update_bytes_may_use(fs_info,
@@ -750,6 +768,7 @@ btrfs_calc_reclaim_metadata_size(struct btrfs_fs_info *fs_info,
 {
 	u64 used;
 	u64 avail;
+	u64 total;
 	u64 to_reclaim = space_info->reclaim_size;

 	lockdep_assert_held(&space_info->lock);
@@ -764,8 +783,9 @@ btrfs_calc_reclaim_metadata_size(struct btrfs_fs_info *fs_info,
 	 * space. If that's the case add in our overage so we make sure to put
 	 * appropriate pressure on the flushing state machine.
 	 */
-	if (space_info->total_bytes + avail < used)
-		to_reclaim += used - (space_info->total_bytes + avail);
+	total = writable_total_bytes(fs_info, space_info);
+	if (total + avail < used)
+		to_reclaim += used - (total + avail);

 	return to_reclaim;
 }
@@ -775,9 +795,12 @@ static bool need_preemptive_reclaim(struct btrfs_fs_info *fs_info,
 {
 	u64 global_rsv_size = fs_info->global_block_rsv.reserved;
 	u64 ordered, delalloc;
-	u64 thresh = div_factor_fine(space_info->total_bytes, 90);
+	u64 total = writable_total_bytes(fs_info, space_info);
+	u64 thresh;
 	u64 used;

+	thresh = div_factor_fine(total, 90);
+
 	lockdep_assert_held(&space_info->lock);

 	/* If we're just plain full then async reclaim just slows us down. */
@@ -839,8 +862,8 @@ static bool need_preemptive_reclaim(struct btrfs_fs_info *fs_info,
 				     BTRFS_RESERVE_FLUSH_ALL);
 	used = space_info->bytes_used + space_info->bytes_reserved +
 	       space_info->bytes_readonly + global_rsv_size;
-	if (used < space_info->total_bytes)
-		thresh += space_info->total_bytes - used;
+	if (used < total)
+		thresh += total - used;
 	thresh >>= space_info->clamp;

 	used = space_info->bytes_pinned;
@@ -1557,7 +1580,7 @@ static int __reserve_bytes(struct btrfs_fs_info *fs_info,
 	 * can_overcommit() to ensure we can overcommit to continue.
 	 */
 	if (!pending_tickets &&
-	    ((used + orig_bytes <= space_info->total_bytes) ||
+	    ((used + orig_bytes <= writable_total_bytes(fs_info, space_info)) ||
 	     btrfs_can_overcommit(fs_info, space_info, orig_bytes, flush))) {
 		btrfs_space_info_update_bytes_may_use(fs_info, space_info,
 						      orig_bytes);

diff --git a/fs/btrfs/space-info.h b/fs/btrfs/space-info.h
index e7de24a529cf..3cc356a55c53 100644
--- a/fs/btrfs/space-info.h
+++ b/fs/btrfs/space-info.h
@@ -19,6 +19,8 @@ struct btrfs_space_info {
 	u64 bytes_may_use;	/* number of bytes that may be used for
 				   delalloc/allocations */
 	u64 bytes_readonly;	/* total bytes that are read only */
+	u64 active_total_bytes;	/* total bytes in the space, but only accounts
+				   active block groups. */
 	u64 bytes_zone_unusable;	/* total bytes that are unusable until
 					   resetting the device zone */
@@ -124,7 +126,7 @@ int btrfs_init_space_info(struct btrfs_fs_info *fs_info);
 void btrfs_update_space_info(struct btrfs_fs_info *info, u64 flags,
 			     u64 total_bytes, u64 bytes_used,
 			     u64 bytes_readonly, u64 bytes_zone_unusable,
-			     struct btrfs_space_info **space_info);
+			     bool active, struct btrfs_space_info **space_info);
 void btrfs_update_space_info_chunk_size(struct btrfs_space_info *space_info,
 					u64 chunk_size);
 struct btrfs_space_info *btrfs_find_space_info(struct btrfs_fs_info *info,

diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c
index 4a69e8492177..9cabf088b800 100644
--- a/fs/btrfs/zoned.c
+++ b/fs/btrfs/zoned.c
@@ -1838,6 +1838,7 @@ struct btrfs_device *btrfs_zoned_get_device(struct btrfs_fs_info *fs_info,
 bool btrfs_zone_activate(struct btrfs_block_group *block_group)
 {
 	struct btrfs_fs_info *fs_info = block_group->fs_info;
+	struct btrfs_space_info *space_info = block_group->space_info;
 	struct map_lookup *map;
 	struct btrfs_device *device;
 	u64 physical;
@@ -1849,6 +1850,7 @@ bool btrfs_zone_activate(struct btrfs_block_group *block_group)
 	map = block_group->physical_map;

+	spin_lock(&space_info->lock);
 	spin_lock(&block_group->lock);
 	if (block_group->zone_is_active) {
 		ret = true;
@@ -1877,7 +1879,10 @@ bool btrfs_zone_activate(struct btrfs_block_group *block_group)
 	/* Successfully activated all the zones */
 	block_group->zone_is_active = 1;
+	space_info->active_total_bytes += block_group->length;
 	spin_unlock(&block_group->lock);
+	btrfs_try_granting_tickets(fs_info, space_info);
+	spin_unlock(&space_info->lock);

 	/* For the active block group list */
 	btrfs_get_block_group(block_group);
@@ -1890,20 +1895,24 @@ bool btrfs_zone_activate(struct btrfs_block_group *block_group)
 out_unlock:
 	spin_unlock(&block_group->lock);
+	spin_unlock(&space_info->lock);
 	return ret;
 }

 static int do_zone_finish(struct btrfs_block_group *block_group, bool fully_written)
 {
 	struct btrfs_fs_info *fs_info = block_group->fs_info;
+	struct btrfs_space_info *space_info = block_group->space_info;
 	struct map_lookup *map;
 	bool need_zone_finish;
 	int ret = 0;
 	int i;

+	spin_lock(&space_info->lock);
 	spin_lock(&block_group->lock);
 	if (!block_group->zone_is_active) {
 		spin_unlock(&block_group->lock);
+		spin_unlock(&space_info->lock);
 		return 0;
 	}
@@ -1912,6 +1921,7 @@ static int do_zone_finish(struct btrfs_block_group *block_group, bool fully_writ
 	    (BTRFS_BLOCK_GROUP_METADATA | BTRFS_BLOCK_GROUP_SYSTEM)) &&
 	    block_group->start + block_group->alloc_offset >
 	    block_group->meta_write_pointer) {
 		spin_unlock(&block_group->lock);
+		spin_unlock(&space_info->lock);
 		return -EAGAIN;
 	}
@@ -1924,6 +1934,7 @@ static int do_zone_finish(struct btrfs_block_group *block_group, bool fully_writ
 	 */
 	if (!fully_written) {
 		spin_unlock(&block_group->lock);
+		spin_unlock(&space_info->lock);

 		ret = btrfs_inc_block_group_ro(block_group, false);
 		if (ret)
@@ -1935,6 +1946,7 @@ static int do_zone_finish(struct btrfs_block_group *block_group, bool fully_writ
 		btrfs_wait_ordered_roots(fs_info, U64_MAX, block_group->start,
 					 block_group->length);

+		spin_lock(&space_info->lock);
 		spin_lock(&block_group->lock);

 		/*
@@ -1943,12 +1955,14 @@ static int do_zone_finish(struct btrfs_block_group *block_group, bool fully_writ
 		 */
 		if (!block_group->zone_is_active) {
 			spin_unlock(&block_group->lock);
+			spin_unlock(&space_info->lock);
 			btrfs_dec_block_group_ro(block_group);
 			return 0;
 		}

 		if (block_group->reserved) {
 			spin_unlock(&block_group->lock);
+			spin_unlock(&space_info->lock);
 			btrfs_dec_block_group_ro(block_group);
 			return -EAGAIN;
 		}
@@ -1965,7 +1979,9 @@ static int do_zone_finish(struct btrfs_block_group *block_group, bool fully_writ
 	block_group->free_space_ctl->free_space = 0;
 	btrfs_clear_treelog_bg(block_group);
 	btrfs_clear_data_reloc_bg(block_group);
+	space_info->active_total_bytes -= block_group->length;
 	spin_unlock(&block_group->lock);
+	spin_unlock(&space_info->lock);

 	map = block_group->physical_map;
 	for (i = 0; i < map->num_stripes; i++) {

From patchwork Mon Jul  4 04:58:13 2022
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 12904637
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org
Subject: [PATCH 09/13] btrfs: zoned: disable metadata overcommit for zoned
Date: Mon, 4 Jul 2022 13:58:13 +0900
Message-Id: <3d7e559990ec7abe5cc5433b1916f62b5c44e818.1656909695.git.naohiro.aota@wdc.com>

Metadata overcommit makes space reservation flexible, but it is harmful to
active zone tracking. Since we cannot finish a block group from the
metadata allocation context, we may fail to activate a new block group and
thus be unable to actually write out the overcommitted reservations.

So, disable metadata overcommit for zoned btrfs. The following patches
will ensure the reservations stay under active_total_bytes.
CC: stable@vger.kernel.org # 5.16+
Fixes: afba2bc036b0 ("btrfs: zoned: implement active zone tracking")
Signed-off-by: Naohiro Aota
Reviewed-by: Johannes Thumshirn
---
 fs/btrfs/space-info.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
index c7a60341b2d2..4ce9dfbabd97 100644
--- a/fs/btrfs/space-info.c
+++ b/fs/btrfs/space-info.c
@@ -399,7 +399,10 @@ int btrfs_can_overcommit(struct btrfs_fs_info *fs_info,
 		return 0;
 
 	used = btrfs_space_info_used(space_info, true);
-	avail = calc_available_free_space(fs_info, space_info, flush);
+	if (btrfs_is_zoned(fs_info) && (space_info->flags & BTRFS_BLOCK_GROUP_METADATA))
+		avail = 0;
+	else
+		avail = calc_available_free_space(fs_info, space_info, flush);
 
 	if (used + bytes < writable_total_bytes(fs_info, space_info) + avail)
 		return 1;
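As an aside for reviewers, the decision above reduces to a small pure function. The sketch below is a userspace model only, not kernel code: the struct and its field names are hypothetical, and writable_total_bytes() is collapsed into a single `total` field.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Userspace model of the modified btrfs_can_overcommit() decision.
 * On zoned filesystems, metadata cannot count on unallocated device
 * space, because a new block group may not be activatable.
 */
struct space_info_model {
	uint64_t used;       /* bytes already reserved/allocated */
	uint64_t total;      /* writable total bytes */
	uint64_t calc_avail; /* estimate from unallocated device space */
	bool metadata;
	bool zoned;
};

static bool can_overcommit(const struct space_info_model *s, uint64_t bytes)
{
	uint64_t avail;

	/* Zoned metadata must not overcommit: no "avail" beyond total. */
	if (s->zoned && s->metadata)
		avail = 0;
	else
		avail = s->calc_avail;

	return s->used + bytes < s->total + avail;
}
```

With avail forced to 0, a reservation only succeeds while it fits inside the writable total, which later patches tie to active_total_bytes.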
From patchwork Mon Jul 4 04:58:14 2022
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org
Subject: [PATCH 10/13] btrfs: zoned: activate metadata BG on flush_space
Date: Mon, 4 Jul 2022 13:58:14 +0900
For metadata space on zoned btrfs, reaching ALLOC_CHUNK{,_FORCE} means we
do not have enough space left in active_total_bytes. Before allocating a
new chunk, we can try to activate an existing block group in this case.

Also, allocating a chunk is not enough to grant a ticket for metadata
space on zoned btrfs: we need to activate the block group to increase
active_total_bytes.

btrfs_zoned_activate_one_bg() implements the activation. It activates a
block group, possibly by finishing another block group first, and gives
up if it cannot finish any block group.

CC: stable@vger.kernel.org # 5.16+
Fixes: afba2bc036b0 ("btrfs: zoned: implement active zone tracking")
Signed-off-by: Naohiro Aota
Reviewed-by: Johannes Thumshirn
---
 fs/btrfs/space-info.c | 20 +++++++++++++++++++
 fs/btrfs/zoned.c      | 45 +++++++++++++++++++++++++++++++++++++++++++
 fs/btrfs/zoned.h      | 10 ++++++++++
 3 files changed, 75 insertions(+)

diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
index 4ce9dfbabd97..f35f36d89660 100644
--- a/fs/btrfs/space-info.c
+++ b/fs/btrfs/space-info.c
@@ -9,6 +9,7 @@
 #include "ordered-data.h"
 #include "transaction.h"
 #include "block-group.h"
+#include "zoned.h"
 
 /*
  * HOW DOES SPACE RESERVATION WORK
@@ -724,6 +725,15 @@ static void flush_space(struct btrfs_fs_info *fs_info,
 		break;
 	case ALLOC_CHUNK:
 	case ALLOC_CHUNK_FORCE:
+		/*
+		 * For metadata space on zoned btrfs, reaching here means we
+		 * don't have enough space left in active_total_bytes. Try to
+		 * activate a block group first, because we may have inactive
+		 * block group already allocated.
+		 */
+		if (btrfs_zoned_activate_one_bg(fs_info, space_info, false))
+			break;
+
 		trans = btrfs_join_transaction(root);
 		if (IS_ERR(trans)) {
 			ret = PTR_ERR(trans);
@@ -734,6 +744,16 @@ static void flush_space(struct btrfs_fs_info *fs_info,
 				     (state == ALLOC_CHUNK) ?
 				     CHUNK_ALLOC_NO_FORCE : CHUNK_ALLOC_FORCE);
 		btrfs_end_transaction(trans);
+
+		/*
+		 * For metadata space on zoned btrfs, allocating a new chunk is
+		 * not enough. We still need to activate the block group.
+		 * Activate the newly allocated block group by (maybe)
+		 * finishing a block group.
+		 */
+		if (ret == 1)
+			btrfs_zoned_activate_one_bg(fs_info, space_info, true);
+
 		if (ret > 0 || ret == -ENOSPC)
 			ret = 0;
 		break;
diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c
index 9cabf088b800..6441a311e658 100644
--- a/fs/btrfs/zoned.c
+++ b/fs/btrfs/zoned.c
@@ -2234,3 +2234,48 @@ bool btrfs_finish_one_bg(struct btrfs_fs_info *fs_info)
 
 	return ret == 0;
 }
+
+bool btrfs_zoned_activate_one_bg(struct btrfs_fs_info *fs_info,
+				 struct btrfs_space_info *space_info,
+				 bool do_finish)
+{
+	struct btrfs_block_group *bg;
+	bool need_finish;
+	int index;
+
+	if (!btrfs_is_zoned(fs_info) || (space_info->flags & BTRFS_BLOCK_GROUP_DATA))
+		return false;
+
+	/* No more block group to activate */
+	if (space_info->active_total_bytes == space_info->total_bytes)
+		return false;
+
+	for (;;) {
+		need_finish = false;
+		down_read(&space_info->groups_sem);
+		for (index = 0; index < BTRFS_NR_RAID_TYPES; index++) {
+			list_for_each_entry(bg, &space_info->block_groups[index], list) {
+				if (!spin_trylock(&bg->lock))
+					continue;
+				if (bg->zone_is_active) {
+					spin_unlock(&bg->lock);
+					continue;
+				}
+				spin_unlock(&bg->lock);
+
+				if (btrfs_zone_activate(bg)) {
+					up_read(&space_info->groups_sem);
+					return true;
+				}
+
+				need_finish = true;
+			}
+		}
+		up_read(&space_info->groups_sem);
+
+		if (!do_finish || !need_finish || !btrfs_finish_one_bg(fs_info))
+			break;
+	}
+
+	return false;
+}
diff --git a/fs/btrfs/zoned.h b/fs/btrfs/zoned.h
index 09a19772ee68..1beca00c69fc 100644
--- a/fs/btrfs/zoned.h
+++ b/fs/btrfs/zoned.h
@@ -81,6 +81,8 @@ bool btrfs_zoned_should_reclaim(struct btrfs_fs_info *fs_info);
 void btrfs_zoned_release_data_reloc_bg(struct btrfs_fs_info *fs_info,
 				       u64 logical, u64 length);
 bool
btrfs_finish_one_bg(struct btrfs_fs_info *fs_info);
+bool btrfs_zoned_activate_one_bg(struct btrfs_fs_info *fs_info,
+				 struct btrfs_space_info *space_info, bool do_finish);
 #else /* CONFIG_BLK_DEV_ZONED */
 static inline int btrfs_get_dev_zone(struct btrfs_device *device, u64 pos,
 				     struct blk_zone *zone)
@@ -256,6 +258,14 @@ static inline bool btrfs_finish_one_bg(struct btrfs_fs_info *fs_info)
 	return true;
 }
 
+static inline bool btrfs_zoned_activate_one_bg(struct btrfs_fs_info *fs_info,
+					       struct btrfs_space_info *space_info,
+					       bool do_finish)
+{
+	/* Consider all the block groups active */
+	return false;
+}
+
 #endif
 
 static inline bool btrfs_dev_is_sequential(struct btrfs_device *device, u64 pos)
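For illustration only, the retry loop in btrfs_zoned_activate_one_bg() can be modeled in userspace. Everything below (the structs, activate_bg(), finish_one_bg()) is a hypothetical stand-in for the kernel helpers, under the simplifying assumption that finishing any active block group releases exactly one active zone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct bg_model {
	bool active;
};

struct fs_model {
	struct bg_model bgs[8];
	size_t nr_bgs;
	int free_active_zones; /* zones we may still activate */
};

/* Activation consumes one free active zone, if any is left. */
static bool activate_bg(struct fs_model *fs, struct bg_model *bg)
{
	if (fs->free_active_zones <= 0)
		return false;
	fs->free_active_zones--;
	bg->active = true;
	return true;
}

/* Finishing any active block group releases one active zone. */
static bool finish_one_bg(struct fs_model *fs)
{
	for (size_t i = 0; i < fs->nr_bgs; i++) {
		if (fs->bgs[i].active) {
			fs->bgs[i].active = false;
			fs->free_active_zones++;
			return true;
		}
	}
	return false;
}

/* Model of the btrfs_zoned_activate_one_bg() scan-then-finish loop. */
static bool activate_one_bg(struct fs_model *fs, bool do_finish)
{
	for (;;) {
		bool need_finish = false;

		for (size_t i = 0; i < fs->nr_bgs; i++) {
			struct bg_model *bg = &fs->bgs[i];

			if (bg->active)
				continue;
			if (activate_bg(fs, bg))
				return true;
			need_finish = true;
		}
		/* Give up unless we may, and can, finish a block group. */
		if (!do_finish || !need_finish || !finish_one_bg(fs))
			return false;
	}
}
```

The loop shape mirrors the patch: scan for an inactive block group to activate, and only when every attempt fails does `do_finish` allow finishing one block group to free a zone before rescanning.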
From patchwork Mon Jul 4 04:58:15 2022
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org
Subject: [PATCH 11/13] btrfs: zoned: activate necessary block group
Date: Mon, 4 Jul 2022 13:58:15 +0900

There are two places where allocating a chunk is not enough.
These places try to ensure space by allocating a chunk. To meet the
condition for active_total_bytes, we also need to activate a block group
there.

CC: stable@vger.kernel.org # 5.16+
Fixes: afba2bc036b0 ("btrfs: zoned: implement active zone tracking")
Signed-off-by: Naohiro Aota
---
 fs/btrfs/block-group.c | 11 +++++++++
 1 file changed, 11 insertions(+)

diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index 51e7c1f1d93f..1c22cfe91a65 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -2664,6 +2664,11 @@ int btrfs_inc_block_group_ro(struct btrfs_block_group *cache,
 	ret = btrfs_chunk_alloc(trans, alloc_flags, CHUNK_ALLOC_FORCE);
 	if (ret < 0)
 		goto out;
+	/*
+	 * We have allocated a new chunk. We also need to activate that chunk
+	 * to grant metadata tickets for zoned btrfs.
+	 */
+	btrfs_zoned_activate_one_bg(fs_info, cache->space_info, true);
 
 	ret = inc_block_group_ro(cache, 0);
 	if (ret == -ETXTBSY)
 		goto unlock_out;
@@ -3889,6 +3894,12 @@ static void reserve_chunk_space(struct btrfs_trans_handle *trans,
 	if (IS_ERR(bg)) {
 		ret = PTR_ERR(bg);
 	} else {
+		/*
+		 * We have a new chunk. We also need to activate it for
+		 * zoned btrfs.
+		 */
+		btrfs_zoned_activate_one_bg(fs_info, info, true);
+
 		/*
 		 * If we fail to add the chunk item here, we end up
 		 * trying again at phase 2 of chunk allocation, at
From patchwork Mon Jul 4 04:58:16 2022
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org
Subject: [PATCH 12/13] btrfs: zoned: write out partially allocated region
Date: Mon, 4 Jul 2022 13:58:16 +0900

cow_file_range() works in an all-or-nothing way: if it fails to allocate
an extent for a part of the given region, it gives up the entire region,
including the successfully allocated parts. Built on cow_file_range(),
run_delalloc_zoned() likewise writes data for the region only when the
whole region is successfully allocated.

This all-or-nothing allocation and write-out become problematic when the
available space in all the block groups gets tight under the active zone
restriction: btrfs_reserve_extent() tries hard to use the space left in
the active block groups, but finally gives up and fails with -ENOSPC.
However, if we send IOs for the successfully allocated region, we can
finish a zone and continue the rest of the allocation in a newly
allocated block group.

This patch implements partial write-out for run_delalloc_zoned(). With
it applied, cow_file_range() returns -EAGAIN to tell the caller to retry
after making some progress, and reports the successfully allocated
region via done_offset. Likewise, the zoned extent allocator returns
-EAGAIN to tell cow_file_range() to go back to its caller. We still need
to wait for an IO to complete before continuing the allocation; the next
patch implements that part.

CC: stable@vger.kernel.org # 5.16+
Fixes: afba2bc036b0 ("btrfs: zoned: implement active zone tracking")
Signed-off-by: Naohiro Aota
---
 fs/btrfs/extent-tree.c | 10 +++++++
 fs/btrfs/inode.c       | 63 ++++++++++++++++++++++++++++++++----------
 2 files changed, 59 insertions(+), 14 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 62e75c1d1155..5637d1cea1c5 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3989,6 +3989,16 @@ static int can_allocate_chunk_zoned(struct btrfs_fs_info *fs_info,
 	if (ffe_ctl->max_extent_size >= ffe_ctl->min_alloc_size)
 		return -ENOSPC;
 
+	/*
+	 * Even min_alloc_size is not left in any block groups. Since we
+	 * cannot activate a new block group, allocating one may not help.
+	 * Let's tell the caller to try again and hope it progresses by
+	 * writing out some parts of the region. That is only possible for
+	 * data block groups, where a part of the region can be written.
+	 */
+	if (ffe_ctl->flags & BTRFS_BLOCK_GROUP_DATA)
+		return -EAGAIN;
+
 	/*
 	 * We cannot activate a new block group and no enough space left in any
 	 * block groups. So, allocating a new block group may not help. But,
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 357322da51b5..163f3d995f00 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -117,7 +117,8 @@ static int btrfs_truncate(struct inode *inode, bool skip_writeback);
 static noinline int cow_file_range(struct btrfs_inode *inode,
 				   struct page *locked_page,
 				   u64 start, u64 end, int *page_started,
-				   unsigned long *nr_written, int unlock);
+				   unsigned long *nr_written, int unlock,
+				   u64 *done_offset);
 static struct extent_map *create_io_em(struct btrfs_inode *inode, u64 start,
 				       u64 len, u64 orig_start, u64 block_start,
 				       u64 block_len, u64 orig_block_len,
@@ -921,7 +922,7 @@ static int submit_uncompressed_range(struct btrfs_inode *inode,
 	 * can directly submit them without interruption.
 	 */
 	ret = cow_file_range(inode, locked_page, start, end, &page_started,
-			     &nr_written, 0);
+			     &nr_written, 0, NULL);
 	/* Inline extent inserted, page gets unlocked and everything is done */
 	if (page_started) {
 		ret = 0;
@@ -1170,7 +1171,8 @@ static u64 get_extent_allocation_hint(struct btrfs_inode *inode, u64 start,
 static noinline int cow_file_range(struct btrfs_inode *inode,
 				   struct page *locked_page,
 				   u64 start, u64 end, int *page_started,
-				   unsigned long *nr_written, int unlock)
+				   unsigned long *nr_written, int unlock,
+				   u64 *done_offset)
 {
 	struct btrfs_root *root = inode->root;
 	struct btrfs_fs_info *fs_info = root->fs_info;
@@ -1363,6 +1365,21 @@ static noinline int cow_file_range(struct btrfs_inode *inode,
 	btrfs_dec_block_group_reservations(fs_info, ins.objectid);
 	btrfs_free_reserved_extent(fs_info, ins.objectid, ins.offset, 1);
 out_unlock:
+	/*
+	 * If done_offset is non-NULL and ret == -EAGAIN, we expect the
+	 * caller to write out the successfully allocated region and retry.
+	 */
+	if (done_offset && ret == -EAGAIN) {
+		if (orig_start < start)
+			*done_offset = start - 1;
+		else
+			*done_offset = start;
+		return ret;
+	} else if (ret == -EAGAIN) {
+		/* Convert to -ENOSPC since the caller cannot retry.
+		 */
+		ret = -ENOSPC;
+	}
+
 	/*
 	 * Now, we have three regions to clean up:
 	 *
@@ -1608,19 +1625,37 @@ static noinline int run_delalloc_zoned(struct btrfs_inode *inode,
 				       u64 end, int *page_started,
 				       unsigned long *nr_written)
 {
+	u64 done_offset = end;
 	int ret;
+	bool locked_page_done = false;
 
-	ret = cow_file_range(inode, locked_page, start, end, page_started,
-			     nr_written, 0);
-	if (ret)
-		return ret;
+	while (start <= end) {
+		ret = cow_file_range(inode, locked_page, start, end, page_started,
+				     nr_written, 0, &done_offset);
+		if (ret && ret != -EAGAIN)
+			return ret;
 
-	if (*page_started)
-		return 0;
+		if (*page_started) {
+			ASSERT(ret == 0);
+			return 0;
+		}
+
+		if (ret == 0)
+			done_offset = end;
+
+		if (done_offset == start)
+			return -ENOSPC;
+
+		if (!locked_page_done) {
+			__set_page_dirty_nobuffers(locked_page);
+			account_page_redirty(locked_page);
+		}
+		locked_page_done = true;
+		extent_write_locked_range(&inode->vfs_inode, start, done_offset);
+
+		start = done_offset + 1;
+	}
 
-	__set_page_dirty_nobuffers(locked_page);
-	account_page_redirty(locked_page);
-	extent_write_locked_range(&inode->vfs_inode, start, end);
 	*page_started = 1;
 
 	return 0;
@@ -1712,7 +1747,7 @@ static int fallback_to_cow(struct btrfs_inode *inode, struct page *locked_page,
 	}
 
 	return cow_file_range(inode, locked_page, start, end, page_started,
-			      nr_written, 1);
+			      nr_written, 1, NULL);
 }
 
 struct can_nocow_file_extent_args {
@@ -2185,7 +2220,7 @@ int btrfs_run_delalloc_range(struct btrfs_inode *inode, struct page *locked_page
 					 page_started, nr_written);
 		else
 			ret = cow_file_range(inode, locked_page, start, end,
-					     page_started, nr_written, 1);
+					     page_started, nr_written, 1, NULL);
 	} else {
 		set_bit(BTRFS_INODE_HAS_ASYNC_EXTENT, &inode->runtime_flags);
 		ret = cow_file_range_async(inode, wbc, locked_page, start, end,
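To make the new control flow easier to follow, here is a userspace model of the partial write-out loop. alloc_range() is a hypothetical stand-in for cow_file_range() with a fixed per-call allocation limit, and the real code's page dirtying and write-out are reduced to a comment.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_EAGAIN 11

/*
 * Stand-in for cow_file_range(): allocate at most "chunk" bytes; when the
 * range does not fit, report -EAGAIN and the end of the allocated part in
 * *done_offset, like the patch's done_offset handling.
 */
static int alloc_range(uint64_t start, uint64_t end, uint64_t chunk,
		       uint64_t *done_offset)
{
	if (end - start + 1 <= chunk)
		return 0; /* whole range fits */
	*done_offset = start + chunk - 1;
	return -MODEL_EAGAIN;
}

/* Returns the number of write-out rounds needed to cover [start, end]. */
static int delalloc_rounds(uint64_t start, uint64_t end, uint64_t chunk)
{
	int rounds = 0;

	while (start <= end) {
		uint64_t done_offset = end;
		int ret = alloc_range(start, end, chunk, &done_offset);

		if (ret == 0)
			done_offset = end;
		/* write out [start, done_offset] here, then continue */
		rounds++;
		start = done_offset + 1;
	}
	return rounds;
}
```

Each -EAGAIN shrinks the remaining range instead of failing it wholesale, which is exactly what lets a zone finish between rounds in the real code.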
From patchwork Mon Jul 4 04:58:17 2022
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org
Subject: [PATCH 13/13] btrfs: zoned: wait until zone is finished when allocation didn't progress
Date: Mon, 4 Jul 2022 13:58:17 +0900
Message-Id: <756120a8d4216aff3426a670385182f40421bdc0.1656909695.git.naohiro.aota@wdc.com>

When the allocated position doesn't progress, we cannot submit IOs to
finish a block group, but there should be ongoing IOs that will finish
one. So, in that case, wait for a zone to be finished and retry the
allocation afterward.

Introduce a new fs_info->flags flag, BTRFS_FS_NEED_ZONE_FINISH, to
indicate that a zone finish is required before the next allocation. The
flag is set when the allocator detects that it cannot activate a new
block group, and it is cleared once a zone is finished.
CC: stable@vger.kernel.org # 5.16+
Fixes: afba2bc036b0 ("btrfs: zoned: implement active zone tracking")
Signed-off-by: Naohiro Aota
---
 fs/btrfs/ctree.h   | 4 ++++
 fs/btrfs/disk-io.c | 1 +
 fs/btrfs/inode.c   | 9 +++++++--
 fs/btrfs/zoned.c   | 6 ++++++
 4 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 4aac7df5a17d..bace2f2eb9d5 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -638,6 +638,9 @@ enum {
 	/* Indicate we have half completed snapshot deletions pending. */
 	BTRFS_FS_UNFINISHED_DROPS,
 
+	/* Indicate we have to finish a zone to do next allocation. */
+	BTRFS_FS_NEED_ZONE_FINISH,
+
 #if BITS_PER_LONG == 32
 	/* Indicate if we have error/warn message printed on 32bit systems */
 	BTRFS_FS_32BIT_ERROR,
@@ -1084,6 +1087,7 @@ struct btrfs_fs_info {
 	spinlock_t zone_active_bgs_lock;
 	struct list_head zone_active_bgs;
+	wait_queue_head_t zone_finish_wait;
 
 	/* Updates are not protected by any lock */
 	struct btrfs_commit_stats commit_stats;
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index ef9d28147b9e..b76b7ef6d85d 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3131,6 +3131,7 @@ void btrfs_init_fs_info(struct btrfs_fs_info *fs_info)
 	init_waitqueue_head(&fs_info->transaction_blocked_wait);
 	init_waitqueue_head(&fs_info->async_submit_wait);
 	init_waitqueue_head(&fs_info->delayed_iputs_wait);
+	init_waitqueue_head(&fs_info->zone_finish_wait);
 
 	/* Usable values until the real ones are cached from the superblock */
 	fs_info->nodesize = 4096;
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 163f3d995f00..d5f27cd1eef2 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -1643,8 +1643,13 @@ static noinline int run_delalloc_zoned(struct btrfs_inode *inode,
 		if (ret == 0)
 			done_offset = end;
 
-		if (done_offset == start)
-			return -ENOSPC;
+		if (done_offset == start) {
+			struct btrfs_fs_info *info = inode->root->fs_info;
+
+			wait_var_event(&info->zone_finish_wait,
+				       !test_bit(BTRFS_FS_NEED_ZONE_FINISH,
+						 &info->flags));
+			continue;
+		}
 
 		if (!locked_page_done) {
 			__set_page_dirty_nobuffers(locked_page);
diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c
index 6441a311e658..3503dd29eab0 100644
--- a/fs/btrfs/zoned.c
+++ b/fs/btrfs/zoned.c
@@ -2015,6 +2015,9 @@ static int do_zone_finish(struct btrfs_block_group *block_group, bool fully_writ
 	/* For active_bg_list */
 	btrfs_put_block_group(block_group);
 
+	clear_bit(BTRFS_FS_NEED_ZONE_FINISH, &fs_info->flags);
+	wake_up_all(&fs_info->zone_finish_wait);
+
 	return 0;
 }
 
@@ -2051,6 +2054,9 @@ bool btrfs_can_activate_zone(struct btrfs_fs_devices *fs_devices, u64 flags)
 	}
 	mutex_unlock(&fs_info->chunk_mutex);
 
+	if (!ret)
+		set_bit(BTRFS_FS_NEED_ZONE_FINISH, &fs_info->flags);
+
 	return ret;
 }
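As a closing illustration, the flag protocol in this last patch reduces to three transitions. The sketch below is a single-threaded, hypothetical model of those transitions only; the real code synchronizes concurrent writers with wait_var_event()/wake_up_all() on fs_info->zone_finish_wait, which this model does not attempt to reproduce.

```c
#include <assert.h>
#include <stdbool.h>

struct fs_flags {
	bool need_zone_finish; /* models BTRFS_FS_NEED_ZONE_FINISH */
};

/* btrfs_can_activate_zone(): set the flag when no zone can be activated. */
static bool can_activate_zone(struct fs_flags *fs, bool have_free_zone)
{
	if (!have_free_zone)
		fs->need_zone_finish = true; /* writers must wait */
	return have_free_zone;
}

/* do_zone_finish(): clear the flag (the kernel also wakes all waiters). */
static void zone_finished(struct fs_flags *fs)
{
	fs->need_zone_finish = false;
}

/* A writer that made no progress may proceed once the flag clears. */
static bool writer_may_retry(const struct fs_flags *fs)
{
	return !fs->need_zone_finish;
}
```

The point of the ordering is that a stalled writer never spins: it blocks on the flag, and only the completion path that actually finished a zone lets it retry.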