
[PATCHv13 00/11] block write streams with nvme fdp

Message ID 20241210194722.1905732-1-kbusch@meta.com (mailing list archive)

Message

Keith Busch Dec. 10, 2024, 7:47 p.m. UTC
From: Keith Busch <kbusch@kernel.org>

Previous discussion threads:

  v12: https://lore.kernel.org/linux-nvme/20241206221801.790690-1-kbusch@meta.com/T/#u
  v11: https://lore.kernel.org/linux-nvme/20241206015308.3342386-1-kbusch@meta.com/T/#u

Changes from v12:

 - Removed statx. We need additional time to consider the best way to
   expose these attributes. Until then, applications can fall back to
   the sysfs block attributes to query write stream settings (see the
   sketch after this list).

 - More verbose logging on error.

 - Added reviews.

 - Explicitly set the return value to 0 for unsupported or
   unrecognized FDP configs; while it's not technically needed (it's
   already 0 in these paths), it makes clear that the non-error return
   wasn't an accidental oversight.

 - Fixed the compiler warnings from an uninitialized variable; switched
   the do-while to a clearer for loop.

 - Fixed long-line wrapping.

 - Fixed a memory leak when cleaning up the namespace head.

 - Don't rescan the FDP configuration if it was already set up
   successfully; a namespace's FDP configuration is static.

 - Better name for the function querying the size granularity of the
   FDP reclaim unit.
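
A minimal sketch of the sysfs fallback mentioned above, in plain C. The
attribute names and their location under /sys/block/<disk>/queue/ are
assumptions inferred from the queue limits this series adds
(max_write_streams, write_stream_granularity); the
Documentation/ABI/stable/sysfs-block update in the series is the
authoritative reference.

#include <stdio.h>

static long read_queue_attr(const char *disk, const char *attr)
{
	char path[256];
	FILE *f;
	long val = -1;

	/* e.g. /sys/block/nvme0n1/queue/max_write_streams (assumed path) */
	snprintf(path, sizeof(path), "/sys/block/%s/queue/%s", disk, attr);
	f = fopen(path, "r");
	if (!f)
		return -1;
	if (fscanf(f, "%ld", &val) != 1)
		val = -1;
	fclose(f);
	return val;
}

int main(int argc, char **argv)
{
	const char *disk = argc > 1 ? argv[1] : "nvme0n1";

	printf("max_write_streams:        %ld\n",
	       read_queue_attr(disk, "max_write_streams"));
	printf("write_stream_granularity: %ld\n",
	       read_queue_attr(disk, "write_stream_granularity"));
	return 0;
}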

Christoph Hellwig (7):
  fs: add a write stream field to the kiocb
  block: add a bi_write_stream field
  block: introduce a write_stream_granularity queue limit
  block: expose write streams for block device nodes
  nvme: add a nvme_get_log_lsi helper
  nvme: pass a void pointer to nvme_get/set_features for the result
  nvme: add FDP definitions

Keith Busch (4):
  block: introduce max_write_streams queue limit
  io_uring: enable per-io write streams
  nvme: register fdp parameters with the block layer
  nvme: use fdp streams if write stream is provided

 Documentation/ABI/stable/sysfs-block |  15 +++
 block/bio.c                          |   2 +
 block/blk-crypto-fallback.c          |   1 +
 block/blk-merge.c                    |   4 +
 block/blk-sysfs.c                    |   6 +
 block/bounce.c                       |   1 +
 block/fops.c                         |  23 ++++
 drivers/nvme/host/core.c             | 186 ++++++++++++++++++++++++++-
 drivers/nvme/host/nvme.h             |   7 +-
 include/linux/blk_types.h            |   1 +
 include/linux/blkdev.h               |  16 +++
 include/linux/fs.h                   |   1 +
 include/linux/nvme.h                 |  77 +++++++++++
 include/uapi/linux/io_uring.h        |   4 +
 io_uring/io_uring.c                  |   2 +
 io_uring/rw.c                        |   1 +
 16 files changed, 341 insertions(+), 6 deletions(-)
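
A rough sketch of how userspace could drive the per-io interface added
by "io_uring: enable per-io write streams" and "block: expose write
streams for block device nodes", using liburing against a block device
node. The SQE field name write_stream and the convention that 0 means
"no stream" are assumptions taken from the patch titles; this only
builds against uapi/liburing headers that carry this series.

#define _GNU_SOURCE
#include <fcntl.h>
#include <unistd.h>
#include <liburing.h>

/*
 * Write 'len' bytes at 'off' to a block device node, tagged with a write
 * stream.  'buf' and 'len' must satisfy O_DIRECT alignment requirements.
 */
static int write_with_stream(const char *dev, const void *buf, unsigned len,
			     unsigned long long off, unsigned stream)
{
	struct io_uring ring;
	struct io_uring_sqe *sqe;
	struct io_uring_cqe *cqe;
	int fd, ret;

	fd = open(dev, O_WRONLY | O_DIRECT);
	if (fd < 0)
		return -1;

	ret = io_uring_queue_init(8, &ring, 0);
	if (ret < 0) {
		close(fd);
		return ret;
	}

	sqe = io_uring_get_sqe(&ring);
	io_uring_prep_write(sqe, fd, buf, len, off);
	/* assumed field name from this series; 0 would mean "no stream" */
	sqe->write_stream = stream;

	io_uring_submit(&ring);
	ret = io_uring_wait_cqe(&ring, &cqe);
	if (!ret) {
		ret = cqe->res;
		io_uring_cqe_seen(&ring, cqe);
	}

	io_uring_queue_exit(&ring);
	close(fd);
	return ret;
}

Something like write_with_stream("/dev/nvme0n1", buf, 4096, 0, 1) would
then direct the write to stream 1, assuming the device reports at least
one write stream.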

Comments

Christoph Hellwig Dec. 11, 2024, 7:13 a.m. UTC | #1
On Tue, Dec 10, 2024 at 11:47:11AM -0800, Keith Busch wrote:
> Changes from v12:

The changes look good to me.  I'll try to send out an API for querying
the parameters next so that we don't merge the granularity as dead
code, but I'll need a bit more time to also write proper tests and
such.  For that it probably makes sense to expose these streams via
null_blk so that we can actually have a block test.  If someone feels
like speeding things up, I'd also be happy for someone else to take on
that work.
John Garry Dec. 11, 2024, 8:46 a.m. UTC | #2
>   
>   	unsigned int		max_open_zones;
>   	unsigned int		max_active_zones;
> @@ -1249,6 +1250,12 @@ static inline unsigned short bdev_max_write_streams(struct block_device *bdev)
>   	return bdev_limits(bdev)->max_write_streams;
>   }
>   
> +static inline unsigned int
> +bdev_write_stream_granularity(struct block_device *bdev)

is this referenced anywhere?

I see that Nitesh also mentioned something similar previously.

> +{
> +	return bdev_limits(bdev)->write_stream_granularity;
> +}
> +
>   static inline unsigned queue_logical_block_size(const struct request_queue *q)
>   {
>   	return q->limits.logical_block_size;
Kanchan Joshi Dec. 12, 2024, 5:51 a.m. UTC | #3
On 12/11/2024 12:43 PM, Christoph Hellwig wrote:
> The changes look good to me.  I'll try to send out an API for querying
> the parameters next so that we don't merge the granularity as dead
> code

What do you have in mind for this API?  A new fcntl or ioctl?
With the ability limited to querying only, or....
Christoph Hellwig Dec. 12, 2024, 6:05 a.m. UTC | #4
On Thu, Dec 12, 2024 at 11:21:07AM +0530, Kanchan Joshi wrote:
> On 12/11/2024 12:43 PM, Christoph Hellwig wrote:
> > The changes look good to me.  I'll try to send out an API for querying
> > the parameters next so that we don't merge the granularity as dead
> > code
> 
> What do you have in mind for this API?  A new fcntl or ioctl?
> With the ability limited to querying only, or....

Yes, a new fcntl or ioctl to query the number of streams and the
(nominal) stream granularity as a start.  There are a few other things
that might be useful:

 - check whether the size is just nominal or not, aka whether the device
   can shorten it.  FDP currently allows for that, but given that the
   notifications for it are out of band, it makes a complete mess for
   software actually trying to use it for more than simple hot/cold
   separation like cachelib, so we should find a way to query that in
   the spec.
 - reporting of the remaining capacity per stream, although I'm not
   sure whether that should be in the same ioctl/fcntl or better done
   using a separate interface that takes the stream number as input and
   returns the capacity as output, which would seem a lot simpler.
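
Purely as an illustration of the interface shape being discussed here
(nothing below exists in any kernel or in this series; every name,
field, and ioctl number is made up), a query call for the stream count
and nominal granularity plus a separate per-stream capacity call could
look roughly like this:

#include <linux/ioctl.h>
#include <linux/types.h>

/* all names and numbers below are hypothetical, for illustration only */
struct blk_write_stream_info {
	__u16	max_write_streams;		/* 0 == not supported */
	__u16	flags;				/* e.g. "granularity is only nominal" */
	__u32	write_stream_granularity;	/* nominal stream granularity, bytes */
};

struct blk_write_stream_cap {
	__u16	stream;				/* in: stream number */
	__u16	rsvd;
	__u32	rsvd2;
	__u64	remaining_bytes;		/* out: remaining capacity */
};

#define BLKWRITESTREAMINFO	_IOR(0x12, 200, struct blk_write_stream_info)
#define BLKWRITESTREAMCAP	_IOWR(0x12, 201, struct blk_write_stream_cap)

Whether something like this ends up as an ioctl on the block device, an
fcntl on a regular file, or both is exactly the open question in the
thread above.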