
[v17,02/14] dm-linear: Report to the block layer that the write order is preserved

Message ID 20250115224649.3973718-3-bvanassche@acm.org
State New
Series Improve write performance for zoned UFS devices

Commit Message

Bart Van Assche Jan. 15, 2025, 10:46 p.m. UTC
Enable write pipelining if dm-linear is stacked on top of a driver that
supports write pipelining.

Cc: Alasdair Kergon <agk@redhat.com>
Cc: Mike Snitzer <snitzer@kernel.org>
Cc: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 drivers/md/dm-linear.c | 6 ++++++
 1 file changed, 6 insertions(+)

Comments

Mikulas Patocka Jan. 17, 2025, 6:04 p.m. UTC | #1
On Wed, 15 Jan 2025, Bart Van Assche wrote:

> Enable write pipelining if dm-linear is stacked on top of a driver that
> supports write pipelining.

Hi

What if you have multiple linear targets in a table? Then, the write order 
would not be preserved.

How is write pipelining supposed to work with suspend/resume? dm doesn't 
preserve the order of writes in case of suspend.

Mikulas

> Cc: Alasdair Kergon <agk@redhat.com>
> Cc: Mike Snitzer <snitzer@kernel.org>
> Cc: Mikulas Patocka <mpatocka@redhat.com>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> ---
>  drivers/md/dm-linear.c | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/drivers/md/dm-linear.c b/drivers/md/dm-linear.c
> index 49fb0f684193..967fbf856abc 100644
> --- a/drivers/md/dm-linear.c
> +++ b/drivers/md/dm-linear.c
> @@ -148,6 +148,11 @@ static int linear_report_zones(struct dm_target *ti,
>  #define linear_report_zones NULL
>  #endif
>  
> +static void linear_io_hints(struct dm_target *ti, struct queue_limits *limits)
> +{
> +	limits->driver_preserves_write_order = true;
> +}
> +
>  static int linear_iterate_devices(struct dm_target *ti,
>  				  iterate_devices_callout_fn fn, void *data)
>  {
> @@ -209,6 +214,7 @@ static struct target_type linear_target = {
>  	.map    = linear_map,
>  	.status = linear_status,
>  	.prepare_ioctl = linear_prepare_ioctl,
> +	.io_hints = linear_io_hints,
>  	.iterate_devices = linear_iterate_devices,
>  	.direct_access = linear_dax_direct_access,
>  	.dax_zero_page_range = linear_dax_zero_page_range,
>
Bart Van Assche Jan. 21, 2025, 9:38 p.m. UTC | #2
On 1/17/25 10:04 AM, Mikulas Patocka wrote:
> What if you have multiple linear targets in a table? Then, the write order
> would not be preserved.

If dm-linear is used on top of zoned storage, shouldn't each target
cover a whole number of zones? Doesn't this mean that dm-linear will
preserve the write order per zone?
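
To make that concrete, here is a minimal, hypothetical sketch (not code from
this series) of the alignment condition I have in mind. The helper name and
its parameters are made up; it only assumes that the zone size is a power of
two, as the block layer requires for zoned devices:

#include <linux/device-mapper.h>

/*
 * Illustration only: a dm-linear target cannot split a zone across targets,
 * and hence preserves the per-zone write order of the underlying device,
 * as long as both its start offset on that device and its length are
 * multiples of the zone size.
 */
static bool linear_target_is_zone_aligned(struct dm_target *ti, sector_t start,
					  sector_t zone_sectors)
{
	return !(start & (zone_sectors - 1)) && !(ti->len & (zone_sectors - 1));
}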

> How is write pipelining supposed to work with suspend/resume? dm doesn't
> preserve the order of writes in case of suspend.

That's an interesting question. I expect that the following will happen
upon resume if zoned writes have been reordered by dm-linear:
* The block device reports one or more unaligned write errors.
* For the zones for which an unaligned write error has been reported,
   the flag BLK_ZONE_WPLUG_ERROR is set (see also patch 07/14 in this
   series).
* Further zoned writes are postponed for the BLK_ZONE_WPLUG_ERROR zones
   until all pending zoned writes have completed.
* Once all pending zoned writes have completed for a
   BLK_ZONE_WPLUG_ERROR zone, the postponed writes are resubmitted. This
   happens in LBA order (a rough sketch of this step follows the list).
* The resubmitted writes will succeed unless the submitter (e.g. a
   filesystem) left a gap between the zoned writes. If the submitter
   does not follow the zoned block device specification, the zoned
   writes will be retried until the number of retries has been exhausted.
   Block devices are expected to set the number of retries to a small
   positive number.
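
For illustration, below is a minimal sketch of that LBA-ordered resubmission
step. It is not the code from patch 07/14; it assumes the writes postponed
for a BLK_ZONE_WPLUG_ERROR zone are kept on a struct bio_list that the
caller owns, and that no new writes are plugged into the zone while the
list is drained:

#include <linux/bio.h>
#include <linux/blkdev.h>

static void resubmit_postponed_writes_in_lba_order(struct bio_list *postponed)
{
	struct bio_list sorted = BIO_EMPTY_LIST;
	struct bio *bio;

	/* Insertion sort by start sector; the per-zone lists are short. */
	while ((bio = bio_list_pop(postponed)) != NULL) {
		struct bio *cur = bio_list_peek(&sorted), *prev = NULL;

		while (cur && cur->bi_iter.bi_sector <= bio->bi_iter.bi_sector) {
			prev = cur;
			cur = cur->bi_next;
		}
		if (!prev) {
			bio_list_add_head(&sorted, bio);
		} else {
			bio->bi_next = prev->bi_next;
			prev->bi_next = bio;
			if (!bio->bi_next)
				sorted.tail = bio;
		}
	}

	/* Resubmit in ascending LBA order. */
	while ((bio = bio_list_pop(&sorted)) != NULL)
		submit_bio_noacct(bio);
}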

Thanks,

Bart.
Mikulas Patocka Jan. 27, 2025, 5:55 p.m. UTC | #3
On Tue, 21 Jan 2025, Bart Van Assche wrote:

> > How is write pipelining supposed to work with suspend/resume? dm doesn't
> > preserve the order of writes in case of suspend.
> 
> That's an interesting question. I expect that the following will happen
> upon resume if zoned writes have been reordered by dm-linear:
> * The block device reports one or more unaligned write errors.
> * For the zones for which an unaligned write error has been reported,
>   the flag BLK_ZONE_WPLUG_ERROR is set (see also patch 07/14 in this
>   series).
> * Further zoned writes are postponed for the BLK_ZONE_WPLUG_ERROR zones
>   until all pending zoned writes have completed.
> * Once all pending zoned writes have completed for a
>   BLK_ZONE_WPLUG_ERROR zone, the postponed writes are resubmitted. This
>   happens in LBA order.
> * The resubmitted writes will succeed unless the submitter (e.g. a
>   filesystem) left a gap between the zoned writes. If the submitter
>   does not follow the zoned block device specification, the zoned
>   writes will be retried until the number of retries has been exhausted.
>   Block devices are expected to set the number of retries to a small
>   positive number.

While the device is suspended, dm_submit_bio calls queue_io to add incoming
bios to a deferred list. On resume, that list is processed in order and the
deferred bios are submitted, but this resubmission can race with new bios
that arrive and are sent directly to the underlying block device, so a new
bio may be submitted to the device before an older deferred one.
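
To illustrate the hazard (all names below are made up; this is a stripped-down
paraphrase, not the actual drivers/md/dm.c code), a sketch of the suspend and
resume paths:

#include <linux/bio.h>
#include <linux/blkdev.h>
#include <linux/spinlock.h>

struct suspendable_dev {
	spinlock_t	lock;
	bool		suspended;	/* true between suspend and resume */
	struct bio_list	deferred;	/* bios queued while suspended */
};

static void submit_or_defer(struct suspendable_dev *dev, struct bio *bio)
{
	bool defer;

	spin_lock_irq(&dev->lock);
	defer = dev->suspended;
	if (defer)
		bio_list_add(&dev->deferred, bio);
	spin_unlock_irq(&dev->lock);

	if (!defer)
		submit_bio_noacct(bio);	/* straight to the lower device */
}

static void resume_and_flush(struct suspendable_dev *dev)
{
	struct bio_list local = BIO_EMPTY_LIST;
	struct bio *bio;

	spin_lock_irq(&dev->lock);
	dev->suspended = false;	/* from here on, new bios bypass the list */
	bio_list_merge(&local, &dev->deferred);
	bio_list_init(&dev->deferred);
	spin_unlock_irq(&dev->lock);

	/*
	 * While the old, deferred bios are drained here, submit_or_defer()
	 * can already send newly arriving bios straight to the lower device,
	 * so a new write may overtake an older deferred write to the same
	 * zone.
	 */
	while ((bio = bio_list_pop(&local)) != NULL)
		submit_bio_noacct(bio);
}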

Mikulas

Patch

diff --git a/drivers/md/dm-linear.c b/drivers/md/dm-linear.c
index 49fb0f684193..967fbf856abc 100644
--- a/drivers/md/dm-linear.c
+++ b/drivers/md/dm-linear.c
@@ -148,6 +148,11 @@ static int linear_report_zones(struct dm_target *ti,
 #define linear_report_zones NULL
 #endif
 
+static void linear_io_hints(struct dm_target *ti, struct queue_limits *limits)
+{
+	limits->driver_preserves_write_order = true;
+}
+
 static int linear_iterate_devices(struct dm_target *ti,
 				  iterate_devices_callout_fn fn, void *data)
 {
@@ -209,6 +214,7 @@ static struct target_type linear_target = {
 	.map    = linear_map,
 	.status = linear_status,
 	.prepare_ioctl = linear_prepare_ioctl,
+	.io_hints = linear_io_hints,
 	.iterate_devices = linear_iterate_devices,
 	.direct_access = linear_dax_direct_access,
 	.dax_zero_page_range = linear_dax_zero_page_range,