Message ID | 20220308165349.231320-2-p.raghav@samsung.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | power_of_2 emulation support for NVMe ZNS devices | expand |
On Tue, Mar 08, 2022 at 05:53:44PM +0100, Pankaj Raghav wrote: > Remove the condition which disallows non-power_of_2 zone size ZNS drive > to be updated and use generic method to calculate number of zones > instead of relying on log and shift based calculation on zone size. > > Signed-off-by: Pankaj Raghav <p.raghav@samsung.com> > --- > drivers/nvme/host/zns.c | 9 +-------- > 1 file changed, 1 insertion(+), 8 deletions(-) > > diff --git a/drivers/nvme/host/zns.c b/drivers/nvme/host/zns.c > index 9f81beb4df4e..ad02c61c0b52 100644 > --- a/drivers/nvme/host/zns.c > +++ b/drivers/nvme/host/zns.c > @@ -101,13 +101,6 @@ int nvme_update_zone_info(struct nvme_ns *ns, unsigned lbaf) > } > > ns->zsze = nvme_lba_to_sect(ns, le64_to_cpu(id->lbafe[lbaf].zsze)); > - if (!is_power_of_2(ns->zsze)) { > - dev_warn(ns->ctrl->device, > - "invalid zone size:%llu for namespace:%u\n", > - ns->zsze, ns->head->ns_id); > - status = -ENODEV; > - goto free_data; > - } > > blk_queue_set_zoned(ns->disk, BLK_ZONED_HM); > blk_queue_flag_set(QUEUE_FLAG_ZONE_RESETALL, q); > @@ -129,7 +122,7 @@ static void *nvme_zns_alloc_report_buffer(struct nvme_ns *ns, > sizeof(struct nvme_zone_descriptor); > > nr_zones = min_t(unsigned int, nr_zones, > - get_capacity(ns->disk) >> ilog2(ns->zsze)); > + get_capacity(ns->disk) / ns->zsze); > > bufsize = sizeof(struct nvme_zone_report) + > nr_zones * sizeof(struct nvme_zone_descriptor); > -- The zns report zones realigns the starting sector using an expected pow2 value, so I think you need to update that as well with something like the following: @@ -197,7 +189,7 @@ int nvme_ns_report_zones(struct nvme_ns *ns, sector_t sector, c.zmr.zrasf = NVME_ZRASF_ZONE_REPORT_ALL; c.zmr.pr = NVME_REPORT_ZONE_PARTIAL; - sector &= ~(ns->zsze - 1); + sector = sector - sector % ns->zsze; while (zone_idx < nr_zones && sector < get_capacity(ns->disk)) { memset(report, 0, buflen);
On 2022-03-08 18:14, Keith Busch wrote: > The zns report zones realigns the starting sector using an expected pow2 > value, so I think you need to update that as well with something like > the following: > > @@ -197,7 +189,7 @@ int nvme_ns_report_zones(struct nvme_ns *ns, sector_t sector, > c.zmr.zrasf = NVME_ZRASF_ZONE_REPORT_ALL; > c.zmr.pr = NVME_REPORT_ZONE_PARTIAL; > > - sector &= ~(ns->zsze - 1); > + sector = sector - sector % ns->zsze; > while (zone_idx < nr_zones && sector < get_capacity(ns->disk)) { > memset(report, 0, buflen); > I actually have these changes in the Patch 4/6: - sector &= ~(ns->zsze - 1); + sector = rounddown(sector, zone_size); But you are right, I should move those changes to this patch as this patch removes the po2 assumptions in NVMe ZNS driver. I will fix it up in the next revision. Thanks.
On 3/9/22 01:53, Pankaj Raghav wrote: > Remove the condition which disallows non-power_of_2 zone size ZNS drive > to be updated and use generic method to calculate number of zones > instead of relying on log and shift based calculation on zone size. > > Signed-off-by: Pankaj Raghav <p.raghav@samsung.com> > --- > drivers/nvme/host/zns.c | 9 +-------- > 1 file changed, 1 insertion(+), 8 deletions(-) > > diff --git a/drivers/nvme/host/zns.c b/drivers/nvme/host/zns.c > index 9f81beb4df4e..ad02c61c0b52 100644 > --- a/drivers/nvme/host/zns.c > +++ b/drivers/nvme/host/zns.c > @@ -101,13 +101,6 @@ int nvme_update_zone_info(struct nvme_ns *ns, unsigned lbaf) > } > > ns->zsze = nvme_lba_to_sect(ns, le64_to_cpu(id->lbafe[lbaf].zsze)); > - if (!is_power_of_2(ns->zsze)) { > - dev_warn(ns->ctrl->device, > - "invalid zone size:%llu for namespace:%u\n", > - ns->zsze, ns->head->ns_id); > - status = -ENODEV; > - goto free_data; > - } > > blk_queue_set_zoned(ns->disk, BLK_ZONED_HM); > blk_queue_flag_set(QUEUE_FLAG_ZONE_RESETALL, q); > @@ -129,7 +122,7 @@ static void *nvme_zns_alloc_report_buffer(struct nvme_ns *ns, > sizeof(struct nvme_zone_descriptor); > > nr_zones = min_t(unsigned int, nr_zones, > - get_capacity(ns->disk) >> ilog2(ns->zsze)); > + get_capacity(ns->disk) / ns->zsze); This will not compile on 32-bits arch. This needs to use div64_u64(). > > bufsize = sizeof(struct nvme_zone_report) + > nr_zones * sizeof(struct nvme_zone_descriptor);
On 3/9/22 01:53, Pankaj Raghav wrote: > Remove the condition which disallows non-power_of_2 zone size ZNS drive > to be updated and use generic method to calculate number of zones > instead of relying on log and shift based calculation on zone size. > > Signed-off-by: Pankaj Raghav <p.raghav@samsung.com> > --- > drivers/nvme/host/zns.c | 9 +-------- > 1 file changed, 1 insertion(+), 8 deletions(-) > > diff --git a/drivers/nvme/host/zns.c b/drivers/nvme/host/zns.c > index 9f81beb4df4e..ad02c61c0b52 100644 > --- a/drivers/nvme/host/zns.c > +++ b/drivers/nvme/host/zns.c > @@ -101,13 +101,6 @@ int nvme_update_zone_info(struct nvme_ns *ns, unsigned lbaf) > } > > ns->zsze = nvme_lba_to_sect(ns, le64_to_cpu(id->lbafe[lbaf].zsze)); > - if (!is_power_of_2(ns->zsze)) { > - dev_warn(ns->ctrl->device, > - "invalid zone size:%llu for namespace:%u\n", > - ns->zsze, ns->head->ns_id); > - status = -ENODEV; > - goto free_data; > - } Doing this will allow a non power of 2 zone sized device to be seen by the block layer. This will break functions such as blkdev_nr_zones() but this patch is not changing this functions, and other using bit shift calculations. > > blk_queue_set_zoned(ns->disk, BLK_ZONED_HM); > blk_queue_flag_set(QUEUE_FLAG_ZONE_RESETALL, q); > @@ -129,7 +122,7 @@ static void *nvme_zns_alloc_report_buffer(struct nvme_ns *ns, > sizeof(struct nvme_zone_descriptor); > > nr_zones = min_t(unsigned int, nr_zones, > - get_capacity(ns->disk) >> ilog2(ns->zsze)); > + get_capacity(ns->disk) / ns->zsze); > > bufsize = sizeof(struct nvme_zone_report) + > nr_zones * sizeof(struct nvme_zone_descriptor);
On 2022-03-09 04:40, Damien Le Moal wrote: > On 3/9/22 01:53, Pankaj Raghav wrote: >> nr_zones = min_t(unsigned int, nr_zones, >> - get_capacity(ns->disk) >> ilog2(ns->zsze)); >> + get_capacity(ns->disk) / ns->zsze); > > This will not compile on 32-bits arch. This needs to use div64_u64(). > Oops. I will fix that up in the next revision and also in other places that does not use a div64_u64. Thanks. > >
On 2022-03-09 04:44, Damien Le Moal wrote: > On 3/9/22 01:53, Pankaj Raghav wrote: >> >> ns->zsze = nvme_lba_to_sect(ns, le64_to_cpu(id->lbafe[lbaf].zsze)); >> - if (!is_power_of_2(ns->zsze)) { >> - dev_warn(ns->ctrl->device, >> - "invalid zone size:%llu for namespace:%u\n", >> - ns->zsze, ns->head->ns_id); >> - status = -ENODEV; >> - goto free_data; >> - } > > Doing this will allow a non power of 2 zone sized device to be seen by > the block layer. This will break functions such as blkdev_nr_zones() but > this patch is not changing this functions, and other using bit shift > calculations. > The goal of this patchset was to emulate a po2 zone size for a npo2 device to the block layer. If you see the `npo2_zone_setup` callback in the NVMe driver (patch 4/6), we do the following: ``` + ns->zsze_po2 = 1 << get_count_order_long(ns->zsze); + capacity = nr_zones * ns->zsze_po2; + set_capacity_and_notify(ns->disk, capacity); ``` So we adapt the capacity of the disk based on the po2 zone size. The chunk sectors are also set to this new po2 zone size. Therefore, all the block layer functions will continue to work as the block layer sees the zone size of the device to be ns->zsze_po2 and not the actual device zone size which is ns->zsze. Changing the functions such blkdev_nr_zones that uses po2 calculation will/should be dealt separately if decide to relax the po2 constraint in the block layer.
diff --git a/drivers/nvme/host/zns.c b/drivers/nvme/host/zns.c index 9f81beb4df4e..ad02c61c0b52 100644 --- a/drivers/nvme/host/zns.c +++ b/drivers/nvme/host/zns.c @@ -101,13 +101,6 @@ int nvme_update_zone_info(struct nvme_ns *ns, unsigned lbaf) } ns->zsze = nvme_lba_to_sect(ns, le64_to_cpu(id->lbafe[lbaf].zsze)); - if (!is_power_of_2(ns->zsze)) { - dev_warn(ns->ctrl->device, - "invalid zone size:%llu for namespace:%u\n", - ns->zsze, ns->head->ns_id); - status = -ENODEV; - goto free_data; - } blk_queue_set_zoned(ns->disk, BLK_ZONED_HM); blk_queue_flag_set(QUEUE_FLAG_ZONE_RESETALL, q); @@ -129,7 +122,7 @@ static void *nvme_zns_alloc_report_buffer(struct nvme_ns *ns, sizeof(struct nvme_zone_descriptor); nr_zones = min_t(unsigned int, nr_zones, - get_capacity(ns->disk) >> ilog2(ns->zsze)); + get_capacity(ns->disk) / ns->zsze); bufsize = sizeof(struct nvme_zone_report) + nr_zones * sizeof(struct nvme_zone_descriptor);
Remove the condition which disallows non-power_of_2 zone size ZNS drive to be updated and use generic method to calculate number of zones instead of relying on log and shift based calculation on zone size. Signed-off-by: Pankaj Raghav <p.raghav@samsung.com> --- drivers/nvme/host/zns.c | 9 +-------- 1 file changed, 1 insertion(+), 8 deletions(-)