Message ID | Zk9i7V2GRoHxBPRu@kernel.org (mailing list archive) |
---|---|
State | Rejected, archived |
Delegated to: | Mike Snitzer |
Series | [for-6.10-rc1] block: fix blk_validate_limits() to properly handle stacked devices |
On Thu, May 23, 2024 at 11:38:21AM -0400, Mike Snitzer wrote:
> Sure, we could elevate it to blk_validate_limits (and callers) but
> adding a 'stacking' parameter is more intrusive on an API level.
>
> Best to just update blk_set_stacking_limits() to set a new 'stacking'
> flag in struct queue_limits, and update blk_stack_limits() to stack
> that flag up.
>
> I've verified this commit to work and have staged it in linux-next via
> linux-dm.git's 'for-next', see:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=for-next&id=cedc03d697ff255dd5b600146521434e2e921815
>
> Jens (and obviously: Christoph, Ming and others), I'm happy to send
> this to Linus tomorrow morning if you could please provide your
> Reviewed-by or Acked-by.  I'd prefer to keep the intermediate DM fix
> just to "show the work and testing".

A stacking flag in the limits is fundamentally wrong, please don't
do this.
On Thu, May 23, 2024 at 05:44:35PM +0200, Christoph Hellwig wrote:
> On Thu, May 23, 2024 at 11:38:21AM -0400, Mike Snitzer wrote:
> > Sure, we could elevate it to blk_validate_limits (and callers) but
> > adding a 'stacking' parameter is more intrusive on an API level.
> >
> > Best to just update blk_set_stacking_limits() to set a new 'stacking'
> > flag in struct queue_limits, and update blk_stack_limits() to stack
> > that flag up.
> >
> > I've verified this commit to work and have staged it in linux-next via
> > linux-dm.git's 'for-next', see:
> >
> > https://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=for-next&id=cedc03d697ff255dd5b600146521434e2e921815
> >
> > Jens (and obviously: Christoph, Ming and others), I'm happy to send
> > this to Linus tomorrow morning if you could please provide your
> > Reviewed-by or Acked-by.  I'd prefer to keep the intermediate DM fix
> > just to "show the work and testing".
>
> A stacking flag in the limits is fundamentally wrong, please don't
> do this.

Um, how so?  It serves as a hint to how the limits were constructed.

Reality is, we have stacking block devices that regularly are _not_
accounted for when people make changes to block core queue_limits
code.  That is a serious problem.

Happy to see the need for the 'stacking' flag to go away in time but I
fail to see why it is "fundamentally wrong".

Mike
On Thu, May 23, 2024 at 11:48:50AM -0400, Mike Snitzer wrote:
>
> Reality is, we have stacking block devices that regularly are _not_
> accounted for when people make changes to block core queue_limits
> code.  That is a serious problem.

Well, maybe we need to make sure blktests covers this instead of
relying on separate tests that are either impossible to run on a stock
distro (device_mapper tests) or always somehow hanging (lvm2
testsuite)..

> Happy to see the need for the 'stacking' flag to go away in time but I
> fail to see why it is "fundamentally wrong".

Because stacking limits should not in any way be built differently.
Both the stacking flag and your restoring of the value just hack
around the problem instead of trying to address it.  Let's use a
little time to sort this out properly instead of piling up hacks.
On Thu, May 23, 2024 at 05:52:17PM +0200, Christoph Hellwig wrote:
> On Thu, May 23, 2024 at 11:48:50AM -0400, Mike Snitzer wrote:
> >
> > Reality is, we have stacking block devices that regularly are _not_
> > accounted for when people make changes to block core queue_limits
> > code.  That is a serious problem.
>
> Well, maybe we need to make sure blktests covers this instead of
> relying on separate tests that are either impossible to run on a stock
> distro (device_mapper tests) or always somehow hanging (lvm2
> testsuite)..

Happy to see mptest get folded into blktests (it's just bash code) --
but it doesn't reproduce for you, so it isn't a reliable safeguard.

The other testsuites aren't applicable to this particular issue, but I
don't disagree that the unicorn automated test frameworks used for DM
aren't robust enough.

> > Happy to see the need for the 'stacking' flag to go away in time but I
> > fail to see why it is "fundamentally wrong".
>
> Because stacking limits should not in any way be built differently.
> Both the stacking flag and your restoring of the value just hack
> around the problem instead of trying to address it.  Let's use a
> little time to sort this out properly instead of piling up hacks.

I have limited patience for queue_limits bugs given how long it took us
to stabilize limits stacking (and how regressions keep happening).

All of the changes to render max_sectors and max_discard_sectors
unsettable directly (and to only allow them to be set in terms of an
ever-growing array of overrides) were quite the big hammer.  But yeah,
it is what it is.

I do appreciate your concern about my wanting limits stacking to be a
more pronounced first-class citizen in block core's queue_limits code.
I was trying to get the code back to a reliable state relative to
blk_stack_limits() et al -- for any underlying driver that might also
be blind to (ab)using max_user_sectors to steer blk_validate_limits().
I'm concerned such hacks are also fragile.

But in general I know we share the same goal, so I'll slow down and
yield to you to run with piecewise fixes where needed.

Will reply to your latest patch now.

Mike
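For reference, the "restoring of the value" that Christoph objects to above is the interim DM workaround that the patch at the end of this thread removes from dm_table_set_restrictions(). Condensed into a standalone helper, it looks roughly like this; the helper name dm_apply_stacked_limits() is invented here for illustration, and the body is reconstructed from the dm-table.c hunk removed below:

```c
#include <linux/blkdev.h>

/*
 * Sketch of the interim DM workaround: hide the stacked max_hw_sectors from
 * blk_validate_limits() so it cannot clamp the already-stacked max_sectors,
 * then put the real max_hw_sectors back after the limits are committed.
 */
static int dm_apply_stacked_limits(struct request_queue *q,
				   struct queue_limits *limits)
{
	unsigned int max_hw_sectors;
	int r;

	/* Don't allow queue_limits_set() to throw away stacked max_sectors. */
	max_hw_sectors = limits->max_hw_sectors;
	limits->max_hw_sectors = limits->max_sectors;

	r = queue_limits_set(q, limits);
	if (r)
		return r;

	/* Restore the stacked max_hw_sectors. */
	mutex_lock(&q->limits_lock);
	limits->max_hw_sectors = max_hw_sectors;
	mutex_unlock(&q->limits_lock);

	return 0;
}
```

This save/override/restore dance is exactly what the proposed 'stacking' flag is meant to make unnecessary.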
On Thu, May 23, 2024 at 12:38:24PM -0400, Mike Snitzer wrote:
> Happy to see mptest get folded into blktests (it's just bash code) --
> but it doesn't reproduce for you, so it isn't a reliable safeguard.

It is a lot better than not running it.  And I'll look into why it
doesn't reproduce.  Right now the only thing I can think of is
different kernel configs, maybe related to schedulers.  Can you send
me your .config?

Is adding mptest something you want to do, or would you prefer others
to take care of it?
On Thu, May 23, 2024 at 07:05:29PM +0200, Christoph Hellwig wrote:
> On Thu, May 23, 2024 at 12:38:24PM -0400, Mike Snitzer wrote:
> > Happy to see mptest get folded into blktests (it's just bash code) --
> > but it doesn't reproduce for you, so it isn't a reliable safeguard.
>
> It is a lot better than not running it.  And I'll look into why it
> doesn't reproduce.  Right now the only thing I can think of is
> different kernel configs, maybe related to schedulers.  Can you send
> me your .config?

Will do off-list.

> Is adding mptest something you want to do, or would you prefer others
> to take care of it?

I won't have time in the near or mid-term.  So if someone else would
like to convert mptest over to blktests, that'd be wonderful.

Fanning out to test the various supported permutations would be cool
(meaning, test with both MULTIPATH_BACKEND_MODULE=scsidebug and
MULTIPATH_BACKEND_MODULE=tcmloop).  But tcmloop is brittle due to sysfs
changes vs targetcli's sysfs expectations -- that's a targetcli issue
that might have been fixed, assuming that code is still supported?  I
haven't revisited it in a few months.
diff --git a/block/blk-settings.c b/block/blk-settings.c
index cdbaef159c4b..24c799072f6c 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -35,6 +35,7 @@ EXPORT_SYMBOL_GPL(blk_queue_rq_timeout);
 void blk_set_stacking_limits(struct queue_limits *lim)
 {
 	memset(lim, 0, sizeof(*lim));
+	lim->stacking = true;
 	lim->logical_block_size = SECTOR_SIZE;
 	lim->physical_block_size = SECTOR_SIZE;
 	lim->io_min = SECTOR_SIZE;
@@ -103,7 +104,7 @@ static int blk_validate_zoned_limits(struct queue_limits *lim)
  */
 static int blk_validate_limits(struct queue_limits *lim)
 {
-	unsigned int max_hw_sectors;
+	unsigned int max_hw_sectors, stacked_max_hw_sectors = 0;
 
 	/*
 	 * Unless otherwise specified, default to 512 byte logical blocks and a
@@ -121,6 +122,23 @@ static int blk_validate_limits(struct queue_limits *lim)
 	if (lim->io_min < lim->physical_block_size)
 		lim->io_min = lim->physical_block_size;
 
+	/*
+	 * For stacked block devices, don't throw-away stacked max_sectors.
+	 */
+	if (lim->stacking && lim->max_hw_sectors) {
+		/*
+		 * lim->max_sectors and lim->max_hw_sectors were already
+		 * validated, relative underlying device(s) in this stacked
+		 * block device.
+		 */
+		stacked_max_hw_sectors = lim->max_hw_sectors;
+		/*
+		 * Impose stacked max_sectors as upper-bound for code below.
+		 */
+		lim->max_hw_sectors = lim->max_sectors;
+	}
+
 	/*
 	 * max_hw_sectors has a somewhat weird default for historical reason,
 	 * but driver really should set their own instead of relying on this
@@ -155,6 +173,11 @@ static int blk_validate_limits(struct queue_limits *lim)
 	lim->max_sectors = round_down(lim->max_sectors,
 			lim->logical_block_size >> SECTOR_SHIFT);
 
+	if (stacked_max_hw_sectors) {
+		/* Restore previously validated stacked max_hw_sectors */
+		lim->max_hw_sectors = max_hw_sectors;
+	}
+
 	/*
 	 * Random default for the maximum number of segments. Driver should not
 	 * rely on this and set their own.
@@ -881,11 +904,14 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b,
 						b->max_secure_erase_sectors);
 	t->zone_write_granularity = max(t->zone_write_granularity,
 					b->zone_write_granularity);
-	t->zoned = max(t->zoned, b->zoned);
+	t->zoned |= b->zoned;
 	if (!t->zoned) {
 		t->zone_write_granularity = 0;
 		t->max_zone_append_sectors = 0;
 	}
+
+	t->stacking |= b->stacking;
+
 	return ret;
 }
 EXPORT_SYMBOL(blk_stack_limits);
diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
index 6463b4afeaa4..88114719fe18 100644
--- a/drivers/md/dm-table.c
+++ b/drivers/md/dm-table.c
@@ -1961,7 +1961,6 @@ int dm_table_set_restrictions(struct dm_table *t, struct request_queue *q,
 			      struct queue_limits *limits)
 {
 	bool wc = false, fua = false;
-	unsigned int max_hw_sectors;
 	int r;
 
 	if (dm_table_supports_nowait(t))
@@ -1982,16 +1981,9 @@ int dm_table_set_restrictions(struct dm_table *t, struct request_queue *q,
 	if (!dm_table_supports_secure_erase(t))
 		limits->max_secure_erase_sectors = 0;
 
-	/* Don't allow queue_limits_set() to throw-away stacked max_sectors */
-	max_hw_sectors = limits->max_hw_sectors;
-	limits->max_hw_sectors = limits->max_sectors;
 	r = queue_limits_set(q, limits);
 	if (r)
 		return r;
-	/* Restore stacked max_hw_sectors */
-	mutex_lock(&q->limits_lock);
-	limits->max_hw_sectors = max_hw_sectors;
-	mutex_unlock(&q->limits_lock);
 
 	if (dm_table_supports_flush(t, (1UL << QUEUE_FLAG_WC))) {
 		wc = true;
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index c3e8f7cf96be..ad1b00e5cc3e 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -307,7 +307,8 @@ struct queue_limits {
 	unsigned char		misaligned;
 	unsigned char		discard_misaligned;
 	unsigned char		raid_partial_stripes_expensive;
-	bool			zoned;
+	bool			zoned:1;
+	bool			stacking:1;
 	unsigned int		max_open_zones;
 	unsigned int		max_active_zones;
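To see where the new 'stacking' flag ends up mattering, here is a minimal sketch (not taken from any real driver) of how a stacking driver builds and commits its limits: seed wide-open limits with blk_set_stacking_limits(), fold in each underlying device with blk_stack_limits(), then commit with queue_limits_set(), which runs blk_validate_limits(). The function name example_stack_queue_limits() and its error handling are assumptions for illustration only:

```c
#include <linux/blkdev.h>

/*
 * Sketch: seed wide-open limits, narrow them against every underlying
 * device, then commit.  With the patch above, blk_set_stacking_limits()
 * also sets lim.stacking, blk_stack_limits() keeps it set as limits are
 * stacked, and blk_validate_limits() (run by queue_limits_set()) uses it
 * to preserve the already-stacked max_sectors.
 */
static int example_stack_queue_limits(struct request_queue *q,
				      struct block_device **bdevs, int nr)
{
	struct queue_limits lim;
	int i;

	blk_set_stacking_limits(&lim);

	for (i = 0; i < nr; i++) {
		struct request_queue *bq = bdev_get_queue(bdevs[i]);

		/* Narrow the stacked limits against this component device. */
		if (blk_stack_limits(&lim, &bq->limits,
				     get_start_sect(bdevs[i])))
			pr_warn("stacking example: device %d is misaligned\n", i);
	}

	/* Validate and publish the stacked limits on the top-level queue. */
	return queue_limits_set(q, &lim);
}
```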