blk-settings: round down io_opt to at least 4K

Message ID	81b399f6-55f5-4aa2-0f31-8b4f8a44e6a4@redhat.com (mailing list archive)
State	New
Headers	show Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id CECDA18FDC8 for <linux-block@vger.kernel.org>; Mon, 20 Jan 2025 15:16:37 +0000 (UTC) Date: Mon, 20 Jan 2025 16:16:26 +0100 (CET) From: Mikulas Patocka <mpatocka@redhat.com> To: Jens Axboe <axboe@kernel.dk> cc: Alasdair Kergon <agk@redhat.com>, Mike Snitzer <snitzer@kernel.org>, Zdenek Kabelac <zkabelac@redhat.com>, linux-block@vger.kernel.org, dm-devel@lists.linux.dev Subject: [PATCH] blk-settings: round down io_opt to at least 4K Message-ID: <81b399f6-55f5-4aa2-0f31-8b4f8a44e6a4@redhat.com> Precedence: bulk MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII
Series	blk-settings: round down io_opt to at least 4K \| expand blk-settings: round down io_opt to at least 4K

Message ID

81b399f6-55f5-4aa2-0f31-8b4f8a44e6a4@redhat.com (mailing list archive)

State

New

Headers

Date: Mon, 20 Jan 2025 16:16:26 +0100 (CET)
From: Mikulas Patocka <mpatocka@redhat.com>
To: Jens Axboe <axboe@kernel.dk>
cc: Alasdair Kergon <agk@redhat.com>, Mike Snitzer <snitzer@kernel.org>,
    Zdenek Kabelac <zkabelac@redhat.com>, linux-block@vger.kernel.org,
    dm-devel@lists.linux.dev
Subject: [PATCH] blk-settings: round down io_opt to at least 4K
Message-ID: <81b399f6-55f5-4aa2-0f31-8b4f8a44e6a4@redhat.com>
Precedence: bulk
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII

Series

blk-settings: round down io_opt to at least 4K | expand

Commit Message

Mikulas Patocka Jan. 20, 2025, 3:16 p.m. UTC

Some SATA SSDs and most NVMe SSDs report physical block size 512 bytes,
but they use 4K remapping table internally and they do slow
read-modify-write cycle for requests that are not aligned on 4K boundary.
Therefore, io_opt should be aligned on 4K.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Fixes: a23634644afc ("block: take io_opt and io_min into account for max_sectors")
Fixes: 9c0ba14828d6 ("blk-settings: round down io_opt to physical_block_size")
Cc: stable@vger.kernel.org	# v6.11+

---
 block/blk-settings.c |    6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

Comments

Christoph Hellwig Jan. 22, 2025, 6:12 a.m. UTC | #1

On Mon, Jan 20, 2025 at 04:16:26PM +0100, Mikulas Patocka wrote:
> Some SATA SSDs and most NVMe SSDs report physical block size 512 bytes,
> but they use 4K remapping table internally and they do slow
> read-modify-write cycle for requests that are not aligned on 4K boundary.
> Therefore, io_opt should be aligned on 4K.

Not really.  I mean it's always smart to not do tiny unaligned I/O
unless you have to.  So we're not just going to cap an exported value
to a magic number because of something.


> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
> Fixes: a23634644afc ("block: take io_opt and io_min into account for max_sectors")
> Fixes: 9c0ba14828d6 ("blk-settings: round down io_opt to physical_block_size")

Please explain how this actually is a fix.

Milan Broz Jan. 23, 2025, 12:24 p.m. UTC | #2

On 1/20/25 4:16 PM, Mikulas Patocka wrote:
> Some SATA SSDs and most NVMe SSDs report physical block size 512 bytes,
> but they use 4K remapping table internally and they do slow
> read-modify-write cycle for requests that are not aligned on 4K boundary.
> Therefore, io_opt should be aligned on 4K.

This also changes how various (usually broken) USB enclosures reports opt-io
(as you already found, it breaks our cryptsetup align test simulating these
resulting in much larger LUKS data alignment).

Not sure if the patch is acceptable, but if so, please add comment that
it can have side effects - it apparently affects more systems than NVMe and SSDs,
definitely some USB storage devices.

Milan


> 
> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
> Fixes: a23634644afc ("block: take io_opt and io_min into account for max_sectors")
> Fixes: 9c0ba14828d6 ("blk-settings: round down io_opt to physical_block_size")
> Cc: stable@vger.kernel.org	# v6.11+
> 
> ---
>   block/blk-settings.c |    6 +++++-
>   1 file changed, 5 insertions(+), 1 deletion(-)
> 
> Index: linux-2.6/block/blk-settings.c
> ===================================================================
> --- linux-2.6.orig/block/blk-settings.c	2025-01-03 21:10:56.000000000 +0100
> +++ linux-2.6/block/blk-settings.c	2025-01-20 15:59:13.000000000 +0100
> @@ -269,8 +269,12 @@ int blk_validate_limits(struct queue_lim
>   	 * The optimal I/O size may not be aligned to physical block size
>   	 * (because it may be limited by dma engines which have no clue about
>   	 * block size of the disks attached to them), so we round it down here.
> +	 *
> +	 * Note that some SSDs erroneously report physical_block_size 512
> +	 * despite the fact that they have remapping table granularity 4K and
> +	 * they perform read-modify-write for unaligned requests.
>   	 */
> -	lim->io_opt = round_down(lim->io_opt, lim->physical_block_size);
> +	lim->io_opt = round_down(lim->io_opt, max(4096, lim->physical_block_size));
>   
>   	/*
>   	 * max_hw_sectors has a somewhat weird default for historical reason,
>

Index: linux-2.6/block/blk-settings.c
===================================================================
--- linux-2.6.orig/block/blk-settings.c	2025-01-03 21:10:56.000000000 +0100
+++ linux-2.6/block/blk-settings.c	2025-01-20 15:59:13.000000000 +0100
@@ -269,8 +269,12 @@  int blk_validate_limits(struct queue_lim
 	 * The optimal I/O size may not be aligned to physical block size
 	 * (because it may be limited by dma engines which have no clue about
 	 * block size of the disks attached to them), so we round it down here.
+	 *
+	 * Note that some SSDs erroneously report physical_block_size 512
+	 * despite the fact that they have remapping table granularity 4K and
+	 * they perform read-modify-write for unaligned requests.
 	 */
-	lim->io_opt = round_down(lim->io_opt, lim->physical_block_size);
+	lim->io_opt = round_down(lim->io_opt, max(4096, lim->physical_block_size));
 
 	/*
 	 * max_hw_sectors has a somewhat weird default for historical reason,

blk-settings: round down io_opt to at least 4K

Commit Message

Comments

Patch