Message ID | 20220427160255.300418-1-p.raghav@samsung.com (mailing list archive) |
---|---|
Series | support non power of 2 zoned devices |
On 27/04/2022 09:03, Pankaj Raghav wrote:
> - Background and Motivation:
>
> The zone storage implementation in Linux, introduced in v4.10, first
> targeted SMR drives which have a power of 2 (po2) zone size alignment
> requirement. The po2 zone size was further imposed implicitly by the
> block layer's blk_queue_chunk_sectors(), used to prevent IO merging
> across chunks beyond the specified size, since v3.16 through commit
> 762380ad9322 ("block: add notion of a chunk size for request merging").
> But this same general block layer po2 requirement for blk_queue_chunk_sectors()
> was removed in v5.10 through commit 07d098e6bbad ("block: allow 'chunk_sectors'
> to be non-power-of-2"). NAND, which is the media used in newer zoned storage
> devices, does not naturally align to po2, and so the po2 requirement
> does not make sense for those types of zone storage devices.
>
> Removing the po2 requirement from zone storage should therefore be possible
> now provided that no userspace regression and no performance regressions are
> introduced. Stop-gap patches have been already merged into f2fs-tools to
> proactively not allow npo2 zone sizes until proper support is added [0].
> Additional kernel stop-gap patches are provided in this series for dm-zoned.
> Support for npo2 zonefs and btrfs support is addressed in this series.
>
> There was an effort previously [1] to add support to non po2 devices via
> device level emulation but that was rejected with a final conclusion
> to add support for non po2 zoned device in the complete stack[2].

Hey Pankaj,

One thing I'm concerned about with these patches is that, once we have npo2 zones (or, to be precise,
zones that are not fs_info->sectorsize aligned), we have to check on every allocation whether we still
have at least fs_info->sectorsize bytes left in a zone. If not, we need to
explicitly finish the zone; otherwise we'll run out of max active zones.

This is a problem for zoned btrfs at the moment already, but it'll be even worse
with npo2, because we're never implicitly finishing zones.

See also
https://lore.kernel.org/linux-btrfs/42758829d8696a471a27f7aaeab5468f60b1565d.1651157034.git.naohiro.aota@wdc.com

Thanks,
Johannes
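
The per-allocation check described in this message can be sketched roughly as follows. This is only an illustration under stated assumptions, not btrfs code: `struct sketch_zone`, `zone_bytes_left()` and `zone_needs_explicit_finish()` are hypothetical names, and the sketch assumes that a zone whose remaining space is exactly zero has already become full on its own and needs no explicit finish.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified view of one zone (not a kernel structure). */
struct sketch_zone {
	uint64_t capacity;      /* usable bytes in the zone */
	uint64_t write_offset;  /* bytes already written, relative to zone start */
};

/* Bytes that can still be written before the zone is full. */
static uint64_t zone_bytes_left(const struct sketch_zone *zone)
{
	return zone->capacity - zone->write_offset;
}

/*
 * True when the zone has a tail that is too small for even one minimum
 * allocation (min_alloc stands in for fs_info->sectorsize). Such a zone
 * can never be filled by normal writes, so it would stay active until
 * it is finished explicitly.
 */
static bool zone_needs_explicit_finish(const struct sketch_zone *zone,
				       uint64_t min_alloc)
{
	uint64_t left = zone_bytes_left(zone);

	return left > 0 && left < min_alloc;
}
```

With a 4K minimum allocation, a zone that has a 2K tail left would be reported as needing an explicit finish, while an untouched or completely filled zone would not.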
Hi Johannes,

On 2022-05-03 00:07, Johannes Thumshirn wrote:
>> There was an effort previously [1] to add support to non po2 devices via
>> device level emulation but that was rejected with a final conclusion
>> to add support for non po2 zoned device in the complete stack[2].
>
> Hey Pankaj,
>
> One thing I'm concerned about with these patches is that, once we have npo2 zones (or, to be precise,
> zones that are not fs_info->sectorsize aligned), we have to check on every allocation whether we still
> have at least fs_info->sectorsize bytes left in a zone. If not, we need to
> explicitly finish the zone; otherwise we'll run out of max active zones.
>
This commit: `btrfs: zoned: relax the alignment constraint for zoned
devices` makes sure the zone size is BTRFS_STRIPE_LEN aligned (64K). So
even an npo2 zoned device should be aligned to `fs_info->sectorsize`,
which is typically 4k.

This was one of the comments that came from David Sterba:
https://lore.kernel.org/all/20220315142740.GU12643@twin.jikos.cz/
where he suggested having some sane alignment for the zone sizes.

> This is a problem for zoned btrfs at the moment already, but it'll be even worse
> with npo2, because we're never implicitly finishing zones.
>
> See also
> https://lore.kernel.org/linux-btrfs/42758829d8696a471a27f7aaeab5468f60b1565d.1651157034.git.naohiro.aota@wdc.com
>
I did take a look at this a few days back, and the patch should also work fine
for npo2 zoned devices, as we allow only zone sizes that are BTRFS_STRIPE_LEN
aligned. Even the max nodesize for METADATA BGs is only 64k, so it should
be aligned correctly to implicitly finish the zone.

Let me know your thoughts and if I am missing something.

Regards,
Pankaj
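
The alignment argument in this reply can be illustrated with a small sketch. It is not the code from the referenced commit: the `SKETCH_*` constants and both helper functions are hypothetical stand-ins, with 64K and 4K taken from the values quoted in the message. The point it shows is that any zone size that is a multiple of 64K is also a multiple of 4K, whether or not it is a power of 2.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values quoted in the thread; the names are illustrative only. */
#define SKETCH_STRIPE_LEN (64ULL * 1024) /* stands in for BTRFS_STRIPE_LEN */
#define SKETCH_SECTORSIZE (4ULL * 1024)  /* typical fs_info->sectorsize */

/* Generic alignment test that does not require a power-of-2 alignment. */
static bool is_aligned_u64(uint64_t value, uint64_t alignment)
{
	assert(alignment != 0);
	return value % alignment == 0;
}

/* Accept a zone size only if it is non-zero and stripe-length aligned. */
static bool zone_size_acceptable(uint64_t zone_size)
{
	return zone_size != 0 && is_aligned_u64(zone_size, SKETCH_STRIPE_LEN);
}
```

For example, a 96 MiB zone is not a power of 2 but passes `zone_size_acceptable()`, and it is therefore also aligned to the 4K sector size.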
On Tue, May 03, 2022 at 11:12:04AM +0200, Pankaj Raghav wrote:
> Hi Johannes,
> On 2022-05-03 00:07, Johannes Thumshirn wrote:
> >> There was an effort previously [1] to add support to non po2 devices via
> >> device level emulation but that was rejected with a final conclusion
> >> to add support for non po2 zoned device in the complete stack[2].
> >
> > Hey Pankaj,
> >
> > One thing I'm concerned about with these patches is that, once we have npo2 zones (or, to be precise,
> > zones that are not fs_info->sectorsize aligned), we have to check on every allocation whether we still
> > have at least fs_info->sectorsize bytes left in a zone. If not, we need to
> > explicitly finish the zone; otherwise we'll run out of max active zones.
> >
> This commit: `btrfs: zoned: relax the alignment constraint for zoned
> devices` makes sure the zone size is BTRFS_STRIPE_LEN aligned (64K). So
> even an npo2 zoned device should be aligned to `fs_info->sectorsize`,
> which is typically 4k.
>
> This was one of the comments that came from David Sterba:
> https://lore.kernel.org/all/20220315142740.GU12643@twin.jikos.cz/
> where he suggested having some sane alignment for the zone sizes.

My idea of a 'sane' value would be 1M. We have 4K for sectors
because of the 1:1 mapping to pages, but RAM sizes are on a different
scale than storage devices. 4K is the absolute minimum, but if the page
size is taken as a basic constraint, ARM has 64K and there are some 256K
arches.
On 2022-05-04 23:14, David Sterba wrote:
>> This commit: `btrfs: zoned: relax the alignment constraint for zoned
>> devices` makes sure the zone size is BTRFS_STRIPE_LEN aligned (64K). So
>> even an npo2 zoned device should be aligned to `fs_info->sectorsize`,
>> which is typically 4k.
>>
>> This was one of the comments that came from David Sterba:
>> https://lore.kernel.org/all/20220315142740.GU12643@twin.jikos.cz/
>> where he suggested having some sane alignment for the zone sizes.
>
> My idea of a 'sane' value would be 1M. We have 4K for sectors
> because of the 1:1 mapping to pages, but RAM sizes are on a different
> scale than storage devices. 4K is the absolute minimum, but if the page
> size is taken as a basic constraint, ARM has 64K and there are some 256K
> arches.

That is a good point. I think it is safe to have 1MB as the minimum
alignment so that it covers the page sizes of all architectures. Thanks.
I will queue this up.
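
As a rough illustration of why a 1M minimum alignment covers the page sizes mentioned above, consider the sketch below. The names `SKETCH_MIN_ZONE_ALIGN`, `zone_size_is_1m_aligned()` and `zone_size_fits_page_size()` are hypothetical and are not taken from the patch series. Since 1M is a multiple of 4K, 64K and 256K, any 1M-aligned zone size is a whole number of pages for each of those page sizes, whereas a zone size that is only 64K aligned may not be.

```c
#include <stdbool.h>
#include <stdint.h>

/* Proposed minimum zone size alignment from the thread (1M); name is illustrative. */
#define SKETCH_MIN_ZONE_ALIGN (1024ULL * 1024)

/* True if the zone size is a multiple of 1M. */
static bool zone_size_is_1m_aligned(uint64_t zone_size)
{
	return zone_size % SKETCH_MIN_ZONE_ALIGN == 0;
}

/* True if the zone holds a whole number of pages of the given size. */
static bool zone_size_fits_page_size(uint64_t zone_size, uint64_t page_size)
{
	return page_size != 0 && zone_size % page_size == 0;
}
```

A 96 MiB zone is 1M aligned and fits 4K, 64K and 256K pages. A zone of 96 MiB plus 64K is still 64K aligned, but it is neither 1M aligned nor a whole number of 256K pages.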