[STABLE,5.18,3/3] btrfs: zoned: drop optimization of zone finish

Message ID 20220808013210.646680-4-naohiro.aota@wdc.com (mailing list archive)
State New, archived
Series btrfs: backport zoned mode fixes

Commit Message

Naohiro Aota Aug. 8, 2022, 1:32 a.m. UTC
commit b3a3b0255797e1d395253366ba24a4cc6c8bdf9c upstream

We have an optimization in do_zone_finish() to send REQ_OP_ZONE_FINISH only
when necessary, i.e. we don't send REQ_OP_ZONE_FINISH when we assume we
wrote fully into the zone.

The assumption is determined by "alloc_offset == capacity". This check is
unreliable if the last ordered extent is canceled due to an error, because
the zone is then not written up to alloc_offset. In that case, we consider
the zone deactivated without sending the finish command, while it is still
active on the device.

This inconsistency results in activating another block group while we
cannot really activate the underlying zone, which causes "active zones
exceeded" errors like the ones below.

    BTRFS error (device nvme3n2): allocation failed flags 1, wanted 520192 tree-log 0, relocation: 0
    nvme3n2: I/O Cmd(0x7d) @ LBA 160432128, 127 blocks, I/O Error (sct 0x1 / sc 0xbd) MORE DNR
    active zones exceeded error, dev nvme3n2, sector 0 op 0xd:(ZONE_APPEND) flags 0x4800 phys_seg 1 prio class 0
    nvme3n2: I/O Cmd(0x7d) @ LBA 160432128, 127 blocks, I/O Error (sct 0x1 / sc 0xbd) MORE DNR
    active zones exceeded error, dev nvme3n2, sector 0 op 0xd:(ZONE_APPEND) flags 0x4800 phys_seg 1 prio class 0

Fix the issue by removing the optimization for now.

Fixes: 8376d9e1ed8f ("btrfs: zoned: finish superblock zone once no space left for new SB")
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
---
 fs/btrfs/zoned.c | 22 +++++++++++++++++-----
 1 file changed, 17 insertions(+), 5 deletions(-)
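
For readers who want to see the accounting mismatch in isolation, the
following is a minimal userspace sketch of the failure mode described in the
commit message. It is not kernel code: the toy_* names, the one-zone active
limit, and the 8-block capacity are all hypothetical, and the device model is
deliberately simplified (a zone is active once written and stops being active
only when it becomes full or is explicitly finished).

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_ACTIVE_ZONES 1u  /* hypothetical device limit */
#define ZONE_CAPACITY    8u  /* hypothetical zone capacity, in blocks */
#define NR_ZONES         2

/* Device-side state: what the drive itself tracks. */
struct toy_dev {
	unsigned int wp[NR_ZONES];  /* write pointer per zone */
	bool active[NR_ZONES];      /* zone currently counts as active */
};

static unsigned int toy_active_count(const struct toy_dev *dev)
{
	unsigned int n = 0;

	for (int i = 0; i < NR_ZONES; i++)
		n += dev->active[i];
	return n;
}

/* Append len blocks; returns 0 on success, -1 if the active limit is hit. */
static int toy_append(struct toy_dev *dev, int zone, unsigned int len)
{
	assert(dev->wp[zone] + len <= ZONE_CAPACITY);

	if (!dev->active[zone]) {
		if (toy_active_count(dev) >= MAX_ACTIVE_ZONES)
			return -1;  /* models "active zones exceeded" */
		dev->active[zone] = true;
	}
	dev->wp[zone] += len;
	if (dev->wp[zone] == ZONE_CAPACITY)
		dev->active[zone] = false;  /* a full zone is no longer active */
	return 0;
}

/* Explicit finish: move the write pointer to the end, release the zone. */
static void toy_finish(struct toy_dev *dev, int zone)
{
	dev->wp[zone] = ZONE_CAPACITY;
	dev->active[zone] = false;
}

/*
 * Filesystem-side finish. With skip_if_full set, this mimics the dropped
 * optimization: no finish command when alloc_offset == capacity.
 */
static void toy_fs_zone_finish(struct toy_dev *dev, int zone,
			       unsigned int alloc_offset, bool skip_if_full)
{
	if (skip_if_full && alloc_offset == ZONE_CAPACITY)
		return;  /* assumes the zone was written to the end */
	toy_finish(dev, zone);
}

/* Returns the result of the first append to zone 1 after finishing zone 0. */
int toy_scenario(bool skip_if_full)
{
	struct toy_dev dev = { 0 };
	unsigned int alloc_offset = ZONE_CAPACITY;  /* all space was allocated */

	/* The last 2-block extent was canceled: only 6 blocks reach the device. */
	if (toy_append(&dev, 0, ZONE_CAPACITY - 2))
		return -2;
	toy_fs_zone_finish(&dev, 0, alloc_offset, skip_if_full);
	return toy_append(&dev, 1, 1);
}
```

In this model, toy_scenario(true) returns -1 because zone 0 is still active
on the device when zone 1 is written, while toy_scenario(false) returns 0
because the finish is always sent, which is the behavior this patch restores.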

Comments

Naohiro Aota Aug. 8, 2022, 2 a.m. UTC | #1
Sorry. I forgot to amend a line adding "int i". I'll send v2.

On Mon, Aug 08, 2022 at 10:32:10AM +0900, Naohiro Aota wrote:
> commit b3a3b0255797e1d395253366ba24a4cc6c8bdf9c upstream
> 
> We have an optimization in do_zone_finish() to send REQ_OP_ZONE_FINISH only
> when necessary, i.e. we don't send REQ_OP_ZONE_FINISH when we assume we
> wrote fully into the zone.
> 
> The assumption is determined by "alloc_offset == capacity". This check is
> unreliable if the last ordered extent is canceled due to an error, because
> the zone is then not written up to alloc_offset. In that case, we consider
> the zone deactivated without sending the finish command, while it is still
> active on the device.
> 
> This inconsistency results in activating another block group while we
> cannot really activate the underlying zone, which causes "active zones
> exceeded" errors like the ones below.
> 
>     BTRFS error (device nvme3n2): allocation failed flags 1, wanted 520192 tree-log 0, relocation: 0
>     nvme3n2: I/O Cmd(0x7d) @ LBA 160432128, 127 blocks, I/O Error (sct 0x1 / sc 0xbd) MORE DNR
>     active zones exceeded error, dev nvme3n2, sector 0 op 0xd:(ZONE_APPEND) flags 0x4800 phys_seg 1 prio class 0
>     nvme3n2: I/O Cmd(0x7d) @ LBA 160432128, 127 blocks, I/O Error (sct 0x1 / sc 0xbd) MORE DNR
>     active zones exceeded error, dev nvme3n2, sector 0 op 0xd:(ZONE_APPEND) flags 0x4800 phys_seg 1 prio class 0
> 
> Fix the issue by removing the optimization for now.
> 
> Fixes: 8376d9e1ed8f ("btrfs: zoned: finish superblock zone once no space left for new SB")
> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
> Signed-off-by: David Sterba <dsterba@suse.com>
> ---
>  fs/btrfs/zoned.c | 22 +++++++++++++++++-----
>  1 file changed, 17 insertions(+), 5 deletions(-)
> 
> diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c
> index 2c0851d94eff..b6b64da3422c 100644
> --- a/fs/btrfs/zoned.c
> +++ b/fs/btrfs/zoned.c
> @@ -2039,13 +2039,25 @@ void btrfs_zone_finish_endio(struct btrfs_fs_info *fs_info, u64 logical, u64 len
>  	spin_unlock(&block_group->lock);
>  
>  	map = block_group->physical_map;
> -	device = map->stripes[0].dev;
> -	physical = map->stripes[0].physical;
> +	for (i = 0; i < map->num_stripes; i++) {
> +		int ret;
>  
> -	if (!device->zone_info->max_active_zones)
> -		goto out;
> +		device = map->stripes[i].dev;
> +		physical = map->stripes[i].physical;
>  
> -	btrfs_dev_clear_active_zone(device, physical);
> +		if (device->zone_info->max_active_zones == 0)
> +			continue;
> +
> +		ret = blkdev_zone_mgmt(device->bdev, REQ_OP_ZONE_FINISH,
> +				       physical >> SECTOR_SHIFT,
> +				       device->zone_info->zone_size >> SECTOR_SHIFT,
> +				       GFP_NOFS);
> +
> +		if (ret)
> +			return;
> +
> +		btrfs_dev_clear_active_zone(device, physical);
> +	}
>  
>  	spin_lock(&fs_info->zone_active_bgs_lock);
>  	ASSERT(!list_empty(&block_group->active_bg_list));
> -- 
> 2.35.1
>

Patch

diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c
index 2c0851d94eff..b6b64da3422c 100644
--- a/fs/btrfs/zoned.c
+++ b/fs/btrfs/zoned.c
@@ -2039,13 +2039,25 @@  void btrfs_zone_finish_endio(struct btrfs_fs_info *fs_info, u64 logical, u64 len
 	spin_unlock(&block_group->lock);
 
 	map = block_group->physical_map;
-	device = map->stripes[0].dev;
-	physical = map->stripes[0].physical;
+	for (i = 0; i < map->num_stripes; i++) {
+		int ret;
 
-	if (!device->zone_info->max_active_zones)
-		goto out;
+		device = map->stripes[i].dev;
+		physical = map->stripes[i].physical;
 
-	btrfs_dev_clear_active_zone(device, physical);
+		if (device->zone_info->max_active_zones == 0)
+			continue;
+
+		ret = blkdev_zone_mgmt(device->bdev, REQ_OP_ZONE_FINISH,
+				       physical >> SECTOR_SHIFT,
+				       device->zone_info->zone_size >> SECTOR_SHIFT,
+				       GFP_NOFS);
+
+		if (ret)
+			return;
+
+		btrfs_dev_clear_active_zone(device, physical);
+	}
 
 	spin_lock(&fs_info->zone_active_bgs_lock);
 	ASSERT(!list_empty(&block_group->active_bg_list));