Message ID | 20240701051800.1245240-4-hch@lst.de (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [1/3] block: remove a duplicate io_min check in blk_validate_limits | expand |
On 6/30/24 22:17, Christoph Hellwig wrote: > NOWS is one of the annoying "0's based values" in NVMe, where 0 means one > and we thus can't detect if it isn't set. Thus a NOWS value of 0 means > that the Namespace Optimal Write Size is a single LBA, which is clearly > bogus. Ignore the value in that case and don't propagate an io_opt > value to the block layer. > > Signed-off-by: Christoph Hellwig<hch@lst.de> indeed (Figure 97: Identify -> Namespace Optimal Write Size (NOWS)). wonder why can't spec use 0xFFFF to indicate not supported for all zero based values to avoid this confusion ? Looks good. Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> -ck
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
On Mon, Jul 01, 2024 at 07:17:52AM +0200, Christoph Hellwig wrote: > NOWS is one of the annoying "0's based values" in NVMe, where 0 means one > and we thus can't detect if it isn't set. We can detect if it is set based on the namespace features flags, though. > Thus a NOWS value of 0 means > that the Namespace Optimal Write Size is a single LBA, which is clearly > bogus. Ignore the value in that case and don't propagate an io_opt > value to the block layer. Hm, why is that clearly bogus? Optane SSDs were optimized for single-sector writes.
On Mon, Jul 01, 2024 at 07:55:37AM -0600, Keith Busch wrote: > On Mon, Jul 01, 2024 at 07:17:52AM +0200, Christoph Hellwig wrote: > > NOWS is one of the annoying "0's based values" in NVMe, where 0 means one > > and we thus can't detect if it isn't set. > > We can detect if it is set based on the namespace features flags, > though. Except that the flag covers 5 different flags, for 2 different operations. > > Thus a NOWS value of 0 means > > that the Namespace Optimal Write Size is a single LBA, which is clearly > > bogus. Ignore the value in that case and don't propagate an io_opt > > value to the block layer. > > Hm, why is that clearly bogus? Optane SSDs were optimized for > single-sector writes. Because even if they optimize for it, the pure overhead of both the PCIe physical layer, nvme command overhead and software overhead will never make it more optimal to split a larger I/O down to this size, which the optimal write size is about.
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c index ab0429644fe378..6784fc6e95625e 100644 --- a/drivers/nvme/host/core.c +++ b/drivers/nvme/host/core.c @@ -2025,7 +2025,8 @@ static bool nvme_update_disk_info(struct nvme_ns *ns, struct nvme_id_ns *id, /* NPWG = Namespace Preferred Write Granularity */ phys_bs = bs * (1 + le16_to_cpu(id->npwg)); /* NOWS = Namespace Optimal Write Size */ - io_opt = bs * (1 + le16_to_cpu(id->nows)); + if (id->nows) + io_opt = bs * (1 + le16_to_cpu(id->nows)); } /*
NOWS is one of the annoying "0's based values" in NVMe, where 0 means one and we thus can't detect if it isn't set. Thus a NOWS value of 0 means that the Namespace Optimal Write Size is a single LBA, which is clearly bogus. Ignore the value in that case and don't propagate an io_opt value to the block layer. Signed-off-by: Christoph Hellwig <hch@lst.de> --- drivers/nvme/host/core.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)