diff mbox series

btrfs: zoned: drop optimization of zone finish

Message ID c5e53de667c5ec4ff49aa1719da1be7e1f80734e.1656467635.git.naohiro.aota@wdc.com (mailing list archive)
State New, archived
Headers show
Series btrfs: zoned: drop optimization of zone finish | expand

Commit Message

Naohiro Aota June 29, 2022, 2 a.m. UTC
We have an optimization in do_zone_finish() to send REQ_OP_ZONE_FINISH only
when necessary, i.e., we don't send REQ_OP_ZONE_FINISH when we assume we
wrote fully into the zone.

The assumption is determined by "alloc_offset == capacity". This condition
won't work if the last ordered extent is canceled due to some errors. In
that case, we consider the zone is deactivated without sending the finish
command while it's still active.

This inconstancy results in activating another block group while we cannot
really activate the underlying zone, which causes the active zone exceeds
errors like below.

    BTRFS error (device nvme3n2): allocation failed flags 1, wanted 520192 tree-log 0, relocation: 0
    nvme3n2: I/O Cmd(0x7d) @ LBA 160432128, 127 blocks, I/O Error (sct 0x1 / sc 0xbd) MORE DNR
    active zones exceeded error, dev nvme3n2, sector 0 op 0xd:(ZONE_APPEND) flags 0x4800 phys_seg 1 prio class 0
    nvme3n2: I/O Cmd(0x7d) @ LBA 160432128, 127 blocks, I/O Error (sct 0x1 / sc 0xbd) MORE DNR
    active zones exceeded error, dev nvme3n2, sector 0 op 0xd:(ZONE_APPEND) flags 0x4800 phys_seg 1 prio class 0

Fix the issue by removing the optimization for now.

CC: stable@vger.kernel.org # 5.18.x: d70cbdda75da btrfs: zoned: consolidate zone finish functions
CC: stable@vger.kernel.org # 5.18.x
Fixes: 8376d9e1ed8f ("btrfs: zoned: finish superblock zone once no space left for new SB")
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
 fs/btrfs/zoned.c | 21 ++++++---------------
 1 file changed, 6 insertions(+), 15 deletions(-)

Comments

Johannes Thumshirn June 29, 2022, 7:27 a.m. UTC | #1
Looks good,
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
David Sterba July 8, 2022, 4:52 p.m. UTC | #2
On Wed, Jun 29, 2022 at 11:00:38AM +0900, Naohiro Aota wrote:
> We have an optimization in do_zone_finish() to send REQ_OP_ZONE_FINISH only
> when necessary, i.e., we don't send REQ_OP_ZONE_FINISH when we assume we
> wrote fully into the zone.
> 
> The assumption is determined by "alloc_offset == capacity". This condition
> won't work if the last ordered extent is canceled due to some errors. In
> that case, we consider the zone is deactivated without sending the finish
> command while it's still active.
> 
> This inconstancy results in activating another block group while we cannot
> really activate the underlying zone, which causes the active zone exceeds
> errors like below.
> 
>     BTRFS error (device nvme3n2): allocation failed flags 1, wanted 520192 tree-log 0, relocation: 0
>     nvme3n2: I/O Cmd(0x7d) @ LBA 160432128, 127 blocks, I/O Error (sct 0x1 / sc 0xbd) MORE DNR
>     active zones exceeded error, dev nvme3n2, sector 0 op 0xd:(ZONE_APPEND) flags 0x4800 phys_seg 1 prio class 0
>     nvme3n2: I/O Cmd(0x7d) @ LBA 160432128, 127 blocks, I/O Error (sct 0x1 / sc 0xbd) MORE DNR
>     active zones exceeded error, dev nvme3n2, sector 0 op 0xd:(ZONE_APPEND) flags 0x4800 phys_seg 1 prio class 0
> 
> Fix the issue by removing the optimization for now.
> 
> CC: stable@vger.kernel.org # 5.18.x: d70cbdda75da btrfs: zoned: consolidate zone finish functions

The commit d70cbdda75da is not in 5.18 stable and first appeared in
5.19-rc1, so this fix is technically a regression fix.

I think there are several other prep and cleanup patches in zoned code
that would make backporting hard.
diff mbox series

Patch

diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c
index 7a0f8fa44800..119f75b11363 100644
--- a/fs/btrfs/zoned.c
+++ b/fs/btrfs/zoned.c
@@ -1885,7 +1885,6 @@  static int do_zone_finish(struct btrfs_block_group *block_group, bool fully_writ
 {
 	struct btrfs_fs_info *fs_info = block_group->fs_info;
 	struct map_lookup *map;
-	bool need_zone_finish;
 	int ret = 0;
 	int i;
 
@@ -1942,12 +1941,6 @@  static int do_zone_finish(struct btrfs_block_group *block_group, bool fully_writ
 		}
 	}
 
-	/*
-	 * The block group is not fully allocated, so not fully written yet. We
-	 * need to send ZONE_FINISH command to free up an active zone.
-	 */
-	need_zone_finish = !btrfs_zoned_bg_is_full(block_group);
-
 	block_group->zone_is_active = 0;
 	block_group->alloc_offset = block_group->zone_capacity;
 	block_group->free_space_ctl->free_space = 0;
@@ -1963,15 +1956,13 @@  static int do_zone_finish(struct btrfs_block_group *block_group, bool fully_writ
 		if (device->zone_info->max_active_zones == 0)
 			continue;
 
-		if (need_zone_finish) {
-			ret = blkdev_zone_mgmt(device->bdev, REQ_OP_ZONE_FINISH,
-					       physical >> SECTOR_SHIFT,
-					       device->zone_info->zone_size >> SECTOR_SHIFT,
-					       GFP_NOFS);
+		ret = blkdev_zone_mgmt(device->bdev, REQ_OP_ZONE_FINISH,
+				       physical >> SECTOR_SHIFT,
+				       device->zone_info->zone_size >> SECTOR_SHIFT,
+				       GFP_NOFS);
 
-			if (ret)
-				return ret;
-		}
+		if (ret)
+			return ret;
 
 		btrfs_dev_clear_active_zone(device, physical);
 	}