diff mbox series

blk-settings: round down io_opt to at least 4K

Message ID 81b399f6-55f5-4aa2-0f31-8b4f8a44e6a4@redhat.com (mailing list archive)
State New
Headers show
Series blk-settings: round down io_opt to at least 4K | expand

Commit Message

Mikulas Patocka Jan. 20, 2025, 3:16 p.m. UTC
Some SATA SSDs and most NVMe SSDs report physical block size 512 bytes,
but they use 4K remapping table internally and they do slow
read-modify-write cycle for requests that are not aligned on 4K boundary.
Therefore, io_opt should be aligned on 4K.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Fixes: a23634644afc ("block: take io_opt and io_min into account for max_sectors")
Fixes: 9c0ba14828d6 ("blk-settings: round down io_opt to physical_block_size")
Cc: stable@vger.kernel.org	# v6.11+

---
 block/blk-settings.c |    6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

Comments

Christoph Hellwig Jan. 22, 2025, 6:12 a.m. UTC | #1
On Mon, Jan 20, 2025 at 04:16:26PM +0100, Mikulas Patocka wrote:
> Some SATA SSDs and most NVMe SSDs report physical block size 512 bytes,
> but they use 4K remapping table internally and they do slow
> read-modify-write cycle for requests that are not aligned on 4K boundary.
> Therefore, io_opt should be aligned on 4K.

Not really.  I mean it's always smart to not do tiny unaligned I/O
unless you have to.  So we're not just going to cap an exported value
to a magic number because of something.


> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
> Fixes: a23634644afc ("block: take io_opt and io_min into account for max_sectors")
> Fixes: 9c0ba14828d6 ("blk-settings: round down io_opt to physical_block_size")

Please explain how this actually is a fix.
Milan Broz Jan. 23, 2025, 12:24 p.m. UTC | #2
On 1/20/25 4:16 PM, Mikulas Patocka wrote:
> Some SATA SSDs and most NVMe SSDs report physical block size 512 bytes,
> but they use 4K remapping table internally and they do slow
> read-modify-write cycle for requests that are not aligned on 4K boundary.
> Therefore, io_opt should be aligned on 4K.

This also changes how various (usually broken) USB enclosures reports opt-io
(as you already found, it breaks our cryptsetup align test simulating these
resulting in much larger LUKS data alignment).

Not sure if the patch is acceptable, but if so, please add comment that
it can have side effects - it apparently affects more systems than NVMe and SSDs,
definitely some USB storage devices.

Milan


> 
> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
> Fixes: a23634644afc ("block: take io_opt and io_min into account for max_sectors")
> Fixes: 9c0ba14828d6 ("blk-settings: round down io_opt to physical_block_size")
> Cc: stable@vger.kernel.org	# v6.11+
> 
> ---
>   block/blk-settings.c |    6 +++++-
>   1 file changed, 5 insertions(+), 1 deletion(-)
> 
> Index: linux-2.6/block/blk-settings.c
> ===================================================================
> --- linux-2.6.orig/block/blk-settings.c	2025-01-03 21:10:56.000000000 +0100
> +++ linux-2.6/block/blk-settings.c	2025-01-20 15:59:13.000000000 +0100
> @@ -269,8 +269,12 @@ int blk_validate_limits(struct queue_lim
>   	 * The optimal I/O size may not be aligned to physical block size
>   	 * (because it may be limited by dma engines which have no clue about
>   	 * block size of the disks attached to them), so we round it down here.
> +	 *
> +	 * Note that some SSDs erroneously report physical_block_size 512
> +	 * despite the fact that they have remapping table granularity 4K and
> +	 * they perform read-modify-write for unaligned requests.
>   	 */
> -	lim->io_opt = round_down(lim->io_opt, lim->physical_block_size);
> +	lim->io_opt = round_down(lim->io_opt, max(4096, lim->physical_block_size));
>   
>   	/*
>   	 * max_hw_sectors has a somewhat weird default for historical reason,
>
diff mbox series

Patch

Index: linux-2.6/block/blk-settings.c
===================================================================
--- linux-2.6.orig/block/blk-settings.c	2025-01-03 21:10:56.000000000 +0100
+++ linux-2.6/block/blk-settings.c	2025-01-20 15:59:13.000000000 +0100
@@ -269,8 +269,12 @@  int blk_validate_limits(struct queue_lim
 	 * The optimal I/O size may not be aligned to physical block size
 	 * (because it may be limited by dma engines which have no clue about
 	 * block size of the disks attached to them), so we round it down here.
+	 *
+	 * Note that some SSDs erroneously report physical_block_size 512
+	 * despite the fact that they have remapping table granularity 4K and
+	 * they perform read-modify-write for unaligned requests.
 	 */
-	lim->io_opt = round_down(lim->io_opt, lim->physical_block_size);
+	lim->io_opt = round_down(lim->io_opt, max(4096, lim->physical_block_size));
 
 	/*
 	 * max_hw_sectors has a somewhat weird default for historical reason,