Message ID | 20200514065819.1113949-1-damien.lemoal@wdc.com
---|---
State | New, archived
Series | block: Improve io_opt limit stacking
On 2020/05/14 15:58, Damien Le Moal wrote:
> When devices with different physical sector sizes are stacked, the
> largest value is used as the stacked device physical sector size. For
> the optimal IO size, the lowest common multiple (lcm) of the underlying
> devices is used for the stacked device. In this scenario, if only one of
> the underlying devices reports an optimal IO size, that value is used as
> is for the stacked device, but that value may not be a multiple of the
> stacked device physical sector size. In this case, blk_stack_limits()
> returns an error, resulting in warnings being printed on device mapper
> startup (observed with a dm-zoned dual drive setup combining a 512B
> sector SSD with a 4K sector HDD).
>
> To fix this, rather than returning an error, the optimal IO size limit
> for the stacked device can be adjusted to the lowest common multiple
> (lcm) of the stacked physical sector size and optimal IO size, resulting
> in a value that is a multiple of the physical sector size while still
> being an optimal size for the underlying devices.
>
> This patch is complementary to the patch "nvme: Fix io_opt limit
> setting", which prevents the nvme driver from reporting an optimal IO
> size equal to a namespace sector size for a device that does not report
> an optimal IO size.
>
> Suggested-by: Keith Busch <kbusch@kernel.org>
> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
> ---
>  block/blk-settings.c | 7 ++-----
>  1 file changed, 2 insertions(+), 5 deletions(-)
>
> diff --git a/block/blk-settings.c b/block/blk-settings.c
> index 9a2c23cd9700..9a2b017ff681 100644
> --- a/block/blk-settings.c
> +++ b/block/blk-settings.c
> @@ -561,11 +561,8 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b,
>  	}
>
>  	/* Optimal I/O a multiple of the physical block size? */
> -	if (t->io_opt & (t->physical_block_size - 1)) {
> -		t->io_opt = 0;
> -		t->misaligned = 1;
> -		ret = -1;
> -	}
> +	if (t->io_opt & (t->physical_block_size - 1))
> +		t->io_opt = lcm(t->io_opt, t->physical_block_size);
>
>  	t->raid_partial_stripes_expensive =
>  		max(t->raid_partial_stripes_expensive,

Jens,

Any comments on this patch? Note: the patch "nvme: Fix io_opt limit
setting" is already queued for 5.8.
Damien,

>> +	if (t->io_opt & (t->physical_block_size - 1))
>> +		t->io_opt = lcm(t->io_opt, t->physical_block_size);

> Any comments on this patch? Note: the patch "nvme: Fix
> io_opt limit setting" is already queued for 5.8.

Setting io_opt to the physical block size is not correct.
>>> +	if (t->io_opt & (t->physical_block_size - 1))
>>> +		t->io_opt = lcm(t->io_opt, t->physical_block_size);
>
>> Any comments on this patch? Note: the patch "nvme: Fix
>> io_opt limit setting" is already queued for 5.8.
>
> Setting io_opt to the physical block size is not correct.

Oh, missed the lcm(). But I'm still concerned about twiddling io_opt to
a value different than the one reported by an underlying device.

Setting io_opt to something that's less than a full stripe width in a
RAID, for instance, doesn't produce the expected result. So I think I'd
prefer not to set io_opt at all if it isn't consistent across all the
stacked devices.

Let me chew on it for a bit...
On Fri, May 22, 2020 at 09:36:18AM -0400, Martin K. Petersen wrote:
>
> >>> +	if (t->io_opt & (t->physical_block_size - 1))
> >>> +		t->io_opt = lcm(t->io_opt, t->physical_block_size);
> >
> >> Any comments on this patch? Note: the patch "nvme: Fix
> >> io_opt limit setting" is already queued for 5.8.
> >
> > Setting io_opt to the physical block size is not correct.
>
> Oh, missed the lcm(). But I'm still concerned about twiddling io_opt to
> a value different than the one reported by an underlying device.
>
> Setting io_opt to something that's less than a full stripe width in a
> RAID, for instance, doesn't produce the expected result. So I think I'd
> prefer not to set io_opt at all if it isn't consistent across all the
> stacked devices.

We already modify the stacked io_opt value if the two request_queues
report different io_opt values. If, however, only one reports an io_opt
value, and it happens to not align with the other's physical block size,
the code currently rejects stacking those devices.

Damien's patch should just set a larger io_opt value that aligns with
both, so if one io_opt is a RAID stripe size, the stacked result will be
a multiple of stripes. I think that makes sense, but please do let us
know if you think such mismatched devices just shouldn't stack.
diff --git a/block/blk-settings.c b/block/blk-settings.c
index 9a2c23cd9700..9a2b017ff681 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -561,11 +561,8 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b,
 	}
 
 	/* Optimal I/O a multiple of the physical block size? */
-	if (t->io_opt & (t->physical_block_size - 1)) {
-		t->io_opt = 0;
-		t->misaligned = 1;
-		ret = -1;
-	}
+	if (t->io_opt & (t->physical_block_size - 1))
+		t->io_opt = lcm(t->io_opt, t->physical_block_size);
 
 	t->raid_partial_stripes_expensive =
 		max(t->raid_partial_stripes_expensive,
When devices with different physical sector sizes are stacked, the
largest value is used as the stacked device physical sector size. For
the optimal IO size, the lowest common multiple (lcm) of the underlying
devices is used for the stacked device. In this scenario, if only one of
the underlying devices reports an optimal IO size, that value is used as
is for the stacked device, but that value may not be a multiple of the
stacked device physical sector size. In this case, blk_stack_limits()
returns an error, resulting in warnings being printed on device mapper
startup (observed with a dm-zoned dual drive setup combining a 512B
sector SSD with a 4K sector HDD).

To fix this, rather than returning an error, the optimal IO size limit
for the stacked device can be adjusted to the lowest common multiple
(lcm) of the stacked physical sector size and optimal IO size, resulting
in a value that is a multiple of the physical sector size while still
being an optimal size for the underlying devices.

This patch is complementary to the patch "nvme: Fix io_opt limit
setting", which prevents the nvme driver from reporting an optimal IO
size equal to a namespace sector size for a device that does not report
an optimal IO size.

Suggested-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
---
 block/blk-settings.c | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)