From patchwork Mon Aug 8 04:38:17 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 12938468
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, stable@vger.kernel.org
Cc: Naohiro Aota, David Sterba
Subject: [PATCH STABLE 5.15 1/2] btrfs: zoned: prevent allocation from previous data relocation BG
Date: Mon, 8 Aug 2022 13:38:17 +0900
Message-Id: <20220808043818.1183760-2-naohiro.aota@wdc.com>
X-Mailer: git-send-email 2.35.1
In-Reply-To: <20220808043818.1183760-1-naohiro.aota@wdc.com>
References: <20220808043818.1183760-1-naohiro.aota@wdc.com>
X-Mailing-List: linux-btrfs@vger.kernel.org

commit 343d8a30851c48a4ef0f5ef61d5e9fbd847a6883 upstream

After commit 5f0addf7b890 ("btrfs: zoned: use dedicated lock for data
relocation"), we observe IO errors on e.g., btrfs/232 like below.

  [09.0][T4038707] WARNING: CPU: 3 PID: 4038707 at fs/btrfs/extent-tree.c:2381 btrfs_cross_ref_exist+0xfc/0x120 [btrfs]
  [09.9][T4038707] Call Trace:
  [09.5][T4038707]
  [09.3][T4038707] run_delalloc_nocow+0x7f1/0x11a0 [btrfs]
  [09.6][T4038707] ? test_range_bit+0x174/0x320 [btrfs]
  [09.2][T4038707] ? fallback_to_cow+0x980/0x980 [btrfs]
  [09.3][T4038707] ? find_lock_delalloc_range+0x33e/0x3e0 [btrfs]
  [09.5][T4038707] btrfs_run_delalloc_range+0x445/0x1320 [btrfs]
  [09.2][T4038707] ? test_range_bit+0x320/0x320 [btrfs]
  [09.4][T4038707] ? lock_downgrade+0x6a0/0x6a0
  [09.2][T4038707] ? orc_find.part.0+0x1ed/0x300
  [09.5][T4038707] ? __module_address.part.0+0x25/0x300
  [09.0][T4038707] writepage_delalloc+0x159/0x310 [btrfs]
  [09.4][ C3] sd 10:0:1:0: [sde] tag#2620 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=0s
  [09.5][ C3] sd 10:0:1:0: [sde] tag#2620 Sense Key : Illegal Request [current]
  [09.9][ C3] sd 10:0:1:0: [sde] tag#2620 Add. Sense: Unaligned write command
  [09.5][ C3] sd 10:0:1:0: [sde] tag#2620 CDB: Write(16) 8a 00 00 00 00 00 02 f3 63 87 00 00 00 2c 00 00
  [09.4][ C3] critical target error, dev sde, sector 396041272 op 0x1:(WRITE) flags 0x800 phys_seg 3 prio class 0
  [09.9][ C3] BTRFS error (device dm-1): bdev /dev/mapper/dml_102_2 errs: wr 1, rd 0, flush 0, corrupt 0, gen 0

The IO errors occur when we allocate a regular extent in the previous data
relocation block group.

On zoned btrfs, we use a dedicated block group to relocate a data extent.
Thus, we allocate relocating data extents (pre-alloc) only from the
dedicated block group and vice versa. Once the free space in the dedicated
block group gets tight, a relocating extent may not fit into the block
group. In that case, we need to switch the dedicated block group to the
next one. Then, the previous one is now freed up for allocating a regular
extent. The BG no longer has enough space for the relocating extent, but
there is still room to allocate a smaller extent. Now the problem happens.
By allocating a regular extent while nocow IOs for the relocation are still
ongoing, we will issue WRITE IOs (for relocation) and ZONE APPEND IOs (for
the regular writes) at the same time. These mixed IOs confuse the write
pointer and cause the unaligned write errors.

This commit introduces a new bit 'zoned_data_reloc_ongoing' to the
btrfs_block_group.
We set this bit before releasing the dedicated block group, and no extents
are allocated from a block group having this bit set. This bit is similar
to setting block_group->ro, but differs from it by allowing nocow writes to
start.

Once all the nocow IO for relocation is done (hooked from
btrfs_finish_ordered_io), we reset the bit to release the block group for
further allocation.

Fixes: c2707a255623 ("btrfs: zoned: add a dedicated data relocation block group")
CC: stable@vger.kernel.org # 5.16+
Signed-off-by: Naohiro Aota
Reviewed-by: David Sterba
Signed-off-by: David Sterba
---
 fs/btrfs/block-group.h |  1 +
 fs/btrfs/extent-tree.c | 20 ++++++++++++++++++--
 fs/btrfs/inode.c       |  2 ++
 fs/btrfs/zoned.c       | 27 +++++++++++++++++++++++++++
 fs/btrfs/zoned.h       |  5 +++++
 5 files changed, 53 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/block-group.h b/fs/btrfs/block-group.h
index 37e55ebde735..d73db0dfacb2 100644
--- a/fs/btrfs/block-group.h
+++ b/fs/btrfs/block-group.h
@@ -98,6 +98,7 @@ struct btrfs_block_group {
 	unsigned int to_copy:1;
 	unsigned int relocating_repair:1;
 	unsigned int chunk_item_inserted:1;
+	unsigned int zoned_data_reloc_ongoing:1;
 
 	int disk_cache_state;
 
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index e3514f9a4e8d..248ea15c9734 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3804,7 +3804,7 @@ static int do_allocation_zoned(struct btrfs_block_group *block_group,
 	       block_group->start == fs_info->data_reloc_bg ||
 	       fs_info->data_reloc_bg == 0);
 
-	if (block_group->ro) {
+	if (block_group->ro || block_group->zoned_data_reloc_ongoing) {
 		ret = 1;
 		goto out;
 	}
@@ -3865,8 +3865,24 @@ static int do_allocation_zoned(struct btrfs_block_group *block_group,
 out:
 	if (ret && ffe_ctl->for_treelog)
 		fs_info->treelog_bg = 0;
-	if (ret && ffe_ctl->for_data_reloc)
+	if (ret && ffe_ctl->for_data_reloc &&
+	    fs_info->data_reloc_bg == block_group->start) {
+		/*
+		 * Do not allow further allocations from this block group.
+		 * Compared to increasing the ->ro, setting the
+		 * ->zoned_data_reloc_ongoing flag still allows nocow
+		 *  writers to come in. See btrfs_inc_nocow_writers().
+		 *
+		 * We need to disable an allocation to avoid an allocation of
+		 * regular (non-relocation data) extent. With mix of relocation
+		 * extents and regular extents, we can dispatch WRITE commands
+		 * (for relocation extents) and ZONE APPEND commands (for
+		 * regular extents) at the same time to the same zone, which
+		 * easily break the write pointer.
+		 */
+		block_group->zoned_data_reloc_ongoing = 1;
 		fs_info->data_reloc_bg = 0;
+	}
 	spin_unlock(&fs_info->relocation_bg_lock);
 	spin_unlock(&fs_info->treelog_bg_lock);
 	spin_unlock(&block_group->lock);
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index ea7262050790..1b4fee8a2f28 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -3069,6 +3069,8 @@ static int btrfs_finish_ordered_io(struct btrfs_ordered_extent *ordered_extent)
 						ordered_extent->file_offset,
 						ordered_extent->file_offset +
 						logical_len);
+		btrfs_zoned_release_data_reloc_bg(fs_info, ordered_extent->disk_bytenr,
+						  ordered_extent->disk_num_bytes);
 	} else {
 		BUG_ON(root == fs_info->tree_root);
 		ret = insert_ordered_extent_file_extent(trans, ordered_extent);
diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c
index 574769f921a2..fc791f7c7142 100644
--- a/fs/btrfs/zoned.c
+++ b/fs/btrfs/zoned.c
@@ -1623,3 +1623,30 @@ void btrfs_free_zone_cache(struct btrfs_fs_info *fs_info)
 	}
 	mutex_unlock(&fs_devices->device_list_mutex);
 }
+
+void btrfs_zoned_release_data_reloc_bg(struct btrfs_fs_info *fs_info, u64 logical,
+				       u64 length)
+{
+	struct btrfs_block_group *block_group;
+
+	if (!btrfs_is_zoned(fs_info))
+		return;
+
+	block_group = btrfs_lookup_block_group(fs_info, logical);
+	/* It should be called on a previous data relocation block group. */
+	ASSERT(block_group && (block_group->flags & BTRFS_BLOCK_GROUP_DATA));
+
+	spin_lock(&block_group->lock);
+	if (!block_group->zoned_data_reloc_ongoing)
+		goto out;
+
+	/* All relocation extents are written. */
+	if (block_group->start + block_group->alloc_offset == logical + length) {
+		/* Now, release this block group for further allocations. */
+		block_group->zoned_data_reloc_ongoing = 0;
+	}
+
+out:
+	spin_unlock(&block_group->lock);
+	btrfs_put_block_group(block_group);
+}
diff --git a/fs/btrfs/zoned.h b/fs/btrfs/zoned.h
index 3a826f7c2040..574490ea2cc8 100644
--- a/fs/btrfs/zoned.h
+++ b/fs/btrfs/zoned.h
@@ -70,6 +70,8 @@ struct btrfs_device *btrfs_zoned_get_device(struct btrfs_fs_info *fs_info,
 					    u64 logical, u64 length);
 void btrfs_clear_data_reloc_bg(struct btrfs_block_group *bg);
 void btrfs_free_zone_cache(struct btrfs_fs_info *fs_info);
+void btrfs_zoned_release_data_reloc_bg(struct btrfs_fs_info *fs_info, u64 logical,
+				       u64 length);
 #else /* CONFIG_BLK_DEV_ZONED */
 static inline int btrfs_get_dev_zone(struct btrfs_device *device, u64 pos,
 				     struct blk_zone *zone)
@@ -207,6 +209,9 @@ static inline struct btrfs_device *btrfs_zoned_get_device(
 static inline void btrfs_clear_data_reloc_bg(struct btrfs_block_group *bg) { }
 
 static inline void btrfs_free_zone_cache(struct btrfs_fs_info *fs_info) { }
+
+static inline void btrfs_zoned_release_data_reloc_bg(struct btrfs_fs_info *fs_info,
+						     u64 logical, u64 length) { }
 #endif
 
 static inline bool btrfs_dev_is_sequential(struct btrfs_device *device, u64 pos)

From patchwork Mon Aug 8 04:38:18 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 12938469
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, stable@vger.kernel.org
Cc: Naohiro Aota, Johannes Thumshirn, David Sterba
Subject: [PATCH STABLE 5.15 2/2] btrfs: zoned: fix critical section of relocation inode writeback
Date: Mon, 8 Aug 2022 13:38:18 +0900
Message-Id: <20220808043818.1183760-3-naohiro.aota@wdc.com>
X-Mailer: git-send-email 2.35.1
In-Reply-To: <20220808043818.1183760-1-naohiro.aota@wdc.com>
References: <20220808043818.1183760-1-naohiro.aota@wdc.com>
X-Mailing-List: linux-btrfs@vger.kernel.org

commit 19ab78ca86981e0e1e73036fb73a508731a7c078 upstream

We use btrfs_zoned_data_reloc_{lock,unlock} to allow only one process to
write out to the relocation inode. That critical section must include all
the IO submission for the inode. However, flush_write_bio() in
extent_writepages() is out of the critical section, causing an IO
submission outside of the lock. This leads to out-of-order IO submission
and fails the relocation process.

Fix it by extending the critical section.

Fixes: 35156d852762 ("btrfs: zoned: only allow one process to add pages to a relocation inode")
CC: stable@vger.kernel.org # 5.16+
Reviewed-by: Johannes Thumshirn
Signed-off-by: Naohiro Aota
Signed-off-by: David Sterba
---
 fs/btrfs/extent_io.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index b791e280af0c..a90546b3107c 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -5152,13 +5152,14 @@ int extent_writepages(struct address_space *mapping,
 	 */
 	btrfs_zoned_data_reloc_lock(BTRFS_I(inode));
 	ret = extent_write_cache_pages(mapping, wbc, &epd);
-	btrfs_zoned_data_reloc_unlock(BTRFS_I(inode));
 	ASSERT(ret <= 0);
 	if (ret < 0) {
+		btrfs_zoned_data_reloc_unlock(BTRFS_I(inode));
 		end_write_bio(&epd, ret);
 		return ret;
 	}
 	ret = flush_write_bio(&epd);
+	btrfs_zoned_data_reloc_unlock(BTRFS_I(inode));
 	return ret;
 }
 
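
Note on patch 2/2: the ordering change in extent_writepages() can be
illustrated with a small userspace model. This is only a sketch and not
kernel code. struct reloc_model, model_lock(), model_unlock(),
model_submit(), writepages_before() and writepages_after() are hypothetical
names made up for this illustration; the two model_submit() calls merely
stand in for the IO issued by extent_write_cache_pages() and
flush_write_bio(). The model counts how many submissions happen while the
lock is not held.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the relocation inode lock; not kernel code. */
struct reloc_model {
	bool locked;		/* true while the modelled lock is held */
	int unlocked_submits;	/* submissions issued without the lock */
};

static void model_lock(struct reloc_model *m)
{
	assert(!m->locked);
	m->locked = true;
}

static void model_unlock(struct reloc_model *m)
{
	assert(m->locked);
	m->locked = false;
}

/* Every modelled IO submission goes through here so violations are counted. */
static void model_submit(struct reloc_model *m)
{
	if (!m->locked)
		m->unlocked_submits++;
}

/* Ordering before the patch: unlock first, then flush the remaining bio. */
int writepages_before(void)
{
	struct reloc_model m = { false, 0 };

	model_lock(&m);
	model_submit(&m);	/* stands in for extent_write_cache_pages() */
	model_unlock(&m);
	model_submit(&m);	/* stands in for flush_write_bio() */
	return m.unlocked_submits;
}

/* Ordering after the patch: flush under the lock, then unlock. */
int writepages_after(void)
{
	struct reloc_model m = { false, 0 };

	model_lock(&m);
	model_submit(&m);	/* stands in for extent_write_cache_pages() */
	model_submit(&m);	/* stands in for flush_write_bio() */
	model_unlock(&m);
	return m.unlocked_submits;
}
```

Calling writepages_before() reports one submission outside the lock, while
writepages_after() reports none. The model does not cover the error path,
where the patch drops the lock before end_write_bio().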