Message ID | 20240420075811.1276893-3-dlemoal@kernel.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Fixes for zone write plugging | expand |
On Sat, Apr 20, 2024 at 04:58:11PM +0900, Damien Le Moal wrote: > A zone write plug BIO work function blk_zone_wplug_bio_work() calls > submit_bio_noacct_nocheck() to execute the next unplugged BIO. This > function may block. So executing zone plugs BIO works using the block > layer global kblockd workqueue can potentially lead to preformance or > latency issues as the number of concurrent work for a workqueue is > limited to WQ_DFL_ACTIVE (256). > 1) For a system with a large number of zoned disks, issuing write > requests to otherwise unused zones may be delayed wiating for a work > thread to become available. > 2) Requeue operations which use kblockd but are independent of zone > write plugging may alsoi end up being delayed. > > To avoid these potential performance issues, create a workqueue per > zoned device to execute zone plugs BIO work. The workqueue max active > parameter is set to the maximum number of zone write plugs allocated > with the zone write plug mempool. This limit is equal to the maximum > number of open zones of the disk and defaults to 128 for disks that do > not have a limit on the number of open zones. Looks good: Reviewed-by: Christoph Hellwig <hch@lst.de> Should the zone write plug submission do non-blocking submissions as well to avoid stalling in the workqueue thread all the time?
On 2024/04/22 16:25, Christoph Hellwig wrote: > On Sat, Apr 20, 2024 at 04:58:11PM +0900, Damien Le Moal wrote: >> A zone write plug BIO work function blk_zone_wplug_bio_work() calls >> submit_bio_noacct_nocheck() to execute the next unplugged BIO. This >> function may block. So executing zone plugs BIO works using the block >> layer global kblockd workqueue can potentially lead to preformance or >> latency issues as the number of concurrent work for a workqueue is >> limited to WQ_DFL_ACTIVE (256). >> 1) For a system with a large number of zoned disks, issuing write >> requests to otherwise unused zones may be delayed wiating for a work >> thread to become available. >> 2) Requeue operations which use kblockd but are independent of zone >> write plugging may alsoi end up being delayed. >> >> To avoid these potential performance issues, create a workqueue per >> zoned device to execute zone plugs BIO work. The workqueue max active >> parameter is set to the maximum number of zone write plugs allocated >> with the zone write plug mempool. This limit is equal to the maximum >> number of open zones of the disk and defaults to 128 for disks that do >> not have a limit on the number of open zones. > > Looks good: > > Reviewed-by: Christoph Hellwig <hch@lst.de> > > Should the zone write plug submission do non-blocking submissions as well > to avoid stalling in the workqueue thread all the time? I do not think that the stalling actually happens that often. The 2 main cases I see are: 1) Out of tag so we block on tag allocation when preparing the request in submit_bio_noacct_nocheck(), or 2) The device has BLK_MQ_F_BLOCKING set for its tag set (e.g. nullblk with memory backing). For (1), we could use RQF_NOWAIT to prevent blocking, but then we would need to retry later on with a timer to make forward progress for the plug. And I do not think we can actually avoid (2). So in the end, I do not see a clean way to completely avoid blocking in all cases.
diff --git a/block/blk-zoned.c b/block/blk-zoned.c index 685f0b9159fd..4f5bf1c0a99f 100644 --- a/block/blk-zoned.c +++ b/block/blk-zoned.c @@ -1131,7 +1131,7 @@ static void disk_zone_wplug_unplug_bio(struct gendisk *disk, /* Schedule submission of the next plugged BIO if we have one. */ if (!bio_list_empty(&zwplug->bio_list)) { spin_unlock_irqrestore(&zwplug->lock, flags); - kblockd_schedule_work(&zwplug->bio_work); + queue_work(disk->zone_wplugs_wq, &zwplug->bio_work); return; } @@ -1334,7 +1334,7 @@ static void disk_zone_wplug_handle_error(struct gendisk *disk, /* Restart BIO submission if we still have any BIO left. */ if (!bio_list_empty(&zwplug->bio_list)) { WARN_ON_ONCE(!(zwplug->flags & BLK_ZONE_WPLUG_PLUGGED)); - kblockd_schedule_work(&zwplug->bio_work); + queue_work(disk->zone_wplugs_wq, &zwplug->bio_work); goto unlock; } @@ -1411,14 +1411,25 @@ static int disk_alloc_zone_resources(struct gendisk *disk, disk->zone_wplugs_pool = mempool_create_kmalloc_pool(pool_size, sizeof(struct blk_zone_wplug)); - if (!disk->zone_wplugs_pool) { - kfree(disk->zone_wplugs_hash); - disk->zone_wplugs_hash = NULL; - disk->zone_wplugs_hash_bits = 0; - return -ENOMEM; - } + if (!disk->zone_wplugs_pool) + goto free_hash; + + disk->zone_wplugs_wq = + alloc_workqueue("%s_zwplugs", WQ_MEM_RECLAIM | WQ_HIGHPRI, + pool_size, disk->disk_name); + if (!disk->zone_wplugs_wq) + goto destroy_pool; return 0; + +destroy_pool: + mempool_destroy(disk->zone_wplugs_pool); + disk->zone_wplugs_pool = NULL; +free_hash: + kfree(disk->zone_wplugs_hash); + disk->zone_wplugs_hash = NULL; + disk->zone_wplugs_hash_bits = 0; + return -ENOMEM; } static void disk_destroy_zone_wplugs_hash_table(struct gendisk *disk) @@ -1449,6 +1460,11 @@ void disk_free_zone_resources(struct gendisk *disk) { cancel_work_sync(&disk->zone_wplugs_work); + if (disk->zone_wplugs_wq) { + destroy_workqueue(disk->zone_wplugs_wq); + disk->zone_wplugs_wq = NULL; + } + disk_destroy_zone_wplugs_hash_table(disk); /* diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index c62536c78a46..a2847f8e5f82 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -191,6 +191,7 @@ struct gendisk { struct hlist_head *zone_wplugs_hash; struct list_head zone_wplugs_err_list; struct work_struct zone_wplugs_work; + struct workqueue_struct *zone_wplugs_wq; #endif /* CONFIG_BLK_DEV_ZONED */ #if IS_ENABLED(CONFIG_CDROM)
A zone write plug BIO work function blk_zone_wplug_bio_work() calls submit_bio_noacct_nocheck() to execute the next unplugged BIO. This function may block. So executing zone plugs BIO works using the block layer global kblockd workqueue can potentially lead to preformance or latency issues as the number of concurrent work for a workqueue is limited to WQ_DFL_ACTIVE (256). 1) For a system with a large number of zoned disks, issuing write requests to otherwise unused zones may be delayed wiating for a work thread to become available. 2) Requeue operations which use kblockd but are independent of zone write plugging may alsoi end up being delayed. To avoid these potential performance issues, create a workqueue per zoned device to execute zone plugs BIO work. The workqueue max active parameter is set to the maximum number of zone write plugs allocated with the zone write plug mempool. This limit is equal to the maximum number of open zones of the disk and defaults to 128 for disks that do not have a limit on the number of open zones. Fixes: dd291d77cc90 ("block: Introduce zone write plugging") Signed-off-by: Damien Le Moal <dlemoal@kernel.org> --- block/blk-zoned.c | 32 ++++++++++++++++++++++++-------- include/linux/blkdev.h | 1 + 2 files changed, 25 insertions(+), 8 deletions(-)