[11/11] dm crypt: Fix zoned block device support

Message ID: 20210519025529.707897-12-damien.lemoal@wdc.com
State: New, archived
Series: dm: Improve zoned block device support

Commit Message

Damien Le Moal May 19, 2021, 2:55 a.m. UTC
Zone append BIOs (REQ_OP_ZONE_APPEND) always specify the start sector
of the zone to be written instead of the actual sector location to
write. The write location is determined by the device and returned to
the host upon completion of the operation. This interface, while simple
and efficient for writing into sequential zones of a zoned block
device, is incompatible with the use of sector values to calculate a
cypher block IV. All data written in a zone ends up using the same IV
values corresponding to the first sectors of the zone, but read
operations can specify any sector within the zone, resulting in an IV
mismatch between encryption and decryption.

To solve this problem, report to DM core that zone append operations are
not supported. This results in the zone append operations being emulated
using regular write operations.

Reported-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
---
 drivers/md/dm-crypt.c | 24 +++++++++++++++++++-----
 1 file changed, 19 insertions(+), 5 deletions(-)
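
The IV mismatch described in the commit message can be illustrated with a small userspace sketch. It assumes a simplified plain64-style IV (the 64-bit sector number, little-endian, zero padded) and ignores any IV offset; the zone start sector and the helper names are hypothetical and are not taken from dm-crypt.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define IV_SIZE 16

/*
 * Simplified plain64-style IV (assumption): the 64-bit sector number
 * stored little-endian, zero padded to the IV size.
 */
static void sector_iv(uint64_t sector, uint8_t iv[IV_SIZE])
{
	memset(iv, 0, IV_SIZE);
	for (int i = 0; i < 8; i++)
		iv[i] = (uint8_t)(sector >> (8 * i));
}

/* Return 1 if the IV used at write time equals the IV used at read time. */
static int iv_matches(uint64_t write_bio_sector, uint64_t read_sector)
{
	uint8_t wiv[IV_SIZE], riv[IV_SIZE];

	sector_iv(write_bio_sector, wiv);
	sector_iv(read_sector, riv);
	return memcmp(wiv, riv, IV_SIZE) == 0;
}

/* Hypothetical zone starting at sector 524288, write pointer 8 sectors in. */
static void zone_append_iv_demo(void)
{
	const uint64_t zone_start = 524288;
	const uint64_t data_sector = zone_start + 8; /* where the device put the data */

	/* Zone append: the BIO sector is the zone start, so the IVs differ. */
	assert(!iv_matches(zone_start, data_sector));
	/* Regular write (emulated append): the BIO sector is the data location. */
	assert(iv_matches(data_sector, data_sector));
}
```

Only an append that lands exactly at the zone start would decrypt correctly, which is why emulating zone append with regular writes (where the BIO sector is the real data location) avoids the problem.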

Comments

Milan Broz May 19, 2021, 3:45 p.m. UTC | #1
On 19/05/2021 04:55, Damien Le Moal wrote:
> Zone append BIOs (REQ_OP_ZONE_APPEND) always specify the start sector
> of the zone to be written instead of the actual sector location to
> write. The write location is determined by the device and returned to
> the host upon completion of the operation. This interface, while simple
> and efficient for writing into sequential zones of a zoned block
> device, is incompatible with the use of sector values to calculate a
> cypher block IV. All data written in a zone ends up using the same IV
> values corresponding to the first sectors of the zone, but read
> operations can specify any sector within the zone, resulting in an IV
> mismatch between encryption and decryption.
> 
> To solve this problem, report to DM core that zone append operations are
> not supported. This results in the zone append operations being emulated
> using regular write operations.

Yes, I think this is definitely a better approach, and it does not need
to fiddle with dm-crypt crypto, thanks.

Just one comment below:

> 
> Reported-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
> ---
>  drivers/md/dm-crypt.c | 24 +++++++++++++++++++-----
>  1 file changed, 19 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c
> index f410ceee51d7..44339823371c 100644
> --- a/drivers/md/dm-crypt.c
> +++ b/drivers/md/dm-crypt.c
> @@ -3280,14 +3280,28 @@ static int crypt_ctr(struct dm_target *ti, unsigned int argc, char **argv)
>  	}
>  	cc->start = tmpll;
>  
> -	/*
> -	 * For zoned block devices, we need to preserve the issuer write
> -	 * ordering. To do so, disable write workqueues and force inline
> -	 * encryption completion.
> -	 */
>  	if (bdev_is_zoned(cc->dev->bdev)) {
> +		/*
> +		 * For zoned block devices, we need to preserve the issuer write
> +		 * ordering. To do so, disable write workqueues and force inline
> +		 * encryption completion.
> +		 */
>  		set_bit(DM_CRYPT_NO_WRITE_WORKQUEUE, &cc->flags);
>  		set_bit(DM_CRYPT_WRITE_INLINE, &cc->flags);
> +
> +		/*
> +		 * All zone append writes to a zone of a zoned block device will
> +		 * have the same BIO sector, the start of the zone. When the
> +		 * cypher IV mode uses sector values, all data targeting a
> +		 * zone will be encrypted using the first sector numbers of the
> +		 * zone. This will not result in write errors but will
> +		 * cause most reads to fail as reads will use the sector values
> +		 * for the actual data locations, resulting in IV mismatch.
> +		 * To avoid this problem, ask DM core to emulate zone append
> +		 * operations with regular writes.
> +		 */
> +		DMWARN("Zone append operations will be emulated");

Do we really want to fill log with these?

(I know it is not a good example in this context, but during online reencryption,
dm-crypt table segments are continuously reloaded, and because the message is in the table constructor,
it will flood the syslog with repeated messages.)

Maybe move it to debug or remove it completely?
It would be nice to have some zoned info extension to lsblk so we can investigate
the storage stack over a zoned device (if there is some sysfs knob to detect it, it should be trivial)...

Thanks,
Milan

> +		ti->emulate_zone_append = true;
>  	}
>  
>  	if (crypt_integrity_aead(cc) || cc->integrity_iv_size) {
>
Damien Le Moal May 20, 2021, midnight UTC | #2
On 2021/05/20 0:46, Milan Broz wrote:
> On 19/05/2021 04:55, Damien Le Moal wrote:
>> Zone append BIOs (REQ_OP_ZONE_APPEND) always specify the start sector
>> of the zone to be written instead of the actual sector location to
>> write. The write location is determined by the device and returned to
>> the host upon completion of the operation. This interface, while simple
>> and efficient for writing into sequential zones of a zoned block
>> device, is incompatible with the use of sector values to calculate a
>> cypher block IV. All data written in a zone ends up using the same IV
>> values corresponding to the first sectors of the zone, but read
>> operations can specify any sector within the zone, resulting in an IV
>> mismatch between encryption and decryption.
>>
>> To solve this problem, report to DM core that zone append operations are
>> not supported. This results in the zone append operations being emulated
>> using regular write operations.
> 
> Yes, I think this is definitely a better approach, and it does not need
> to fiddle with dm-crypt crypto, thanks.
> 
> Just one comment below:
> 
>>
>> Reported-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
>> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
>> ---
>>  drivers/md/dm-crypt.c | 24 +++++++++++++++++++-----
>>  1 file changed, 19 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c
>> index f410ceee51d7..44339823371c 100644
>> --- a/drivers/md/dm-crypt.c
>> +++ b/drivers/md/dm-crypt.c
>> @@ -3280,14 +3280,28 @@ static int crypt_ctr(struct dm_target *ti, unsigned int argc, char **argv)
>>  	}
>>  	cc->start = tmpll;
>>  
>> -	/*
>> -	 * For zoned block devices, we need to preserve the issuer write
>> -	 * ordering. To do so, disable write workqueues and force inline
>> -	 * encryption completion.
>> -	 */
>>  	if (bdev_is_zoned(cc->dev->bdev)) {
>> +		/*
>> +		 * For zoned block devices, we need to preserve the issuer write
>> +		 * ordering. To do so, disable write workqueues and force inline
>> +		 * encryption completion.
>> +		 */
>>  		set_bit(DM_CRYPT_NO_WRITE_WORKQUEUE, &cc->flags);
>>  		set_bit(DM_CRYPT_WRITE_INLINE, &cc->flags);
>> +
>> +		/*
>> +		 * All zone append writes to a zone of a zoned block device will
>> +		 * have the same BIO sector, the start of the zone. When the
>> +		 * cypher IV mode uses sector values, all data targeting a
>> +		 * zone will be encrypted using the first sector numbers of the
>> +		 * zone. This will not result in write errors but will
>> +		 * cause most reads to fail as reads will use the sector values
>> +		 * for the actual data locations, resulting in IV mismatch.
>> +		 * To avoid this problem, ask DM core to emulate zone append
>> +		 * operations with regular writes.
>> +		 */
>> +		DMWARN("Zone append operations will be emulated");
> 
> Do we really want to fill log with these?

I added this to signal to the user, indirectly, that performance may be impacted,
as the zone write locking mechanism used for the emulation limits write
operations to at most one in flight per zone at any time. Overall, the drive
queue depth (QD) can still be high.
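
The one-write-per-zone limit can be sketched with a small userspace model. This is only an illustration under stated assumptions, not DM core's implementation: the structure, the function names, and the error codes returned are all hypothetical. It tracks a per-zone write pointer, turns an append into a regular write at that pointer, and refuses a second write while one is in flight.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one sequential zone. */
struct zone_model {
	uint64_t start;   /* first sector of the zone */
	uint64_t len;     /* zone size in sectors */
	uint64_t wp_ofst; /* write pointer offset from start, in sectors */
	bool locked;      /* true while an emulated append is in flight */
};

/*
 * Begin an emulated append: take the zone write lock and return, in
 * *sector, the sector at which a regular write must be issued.
 * Returns 0 on success, -EBUSY if a write is already in flight,
 * -ENOSPC if the data does not fit in the zone.
 */
static int emulate_append_begin(struct zone_model *z, uint64_t nr_sectors,
				uint64_t *sector)
{
	if (z->locked)
		return -EBUSY;
	if (z->wp_ofst + nr_sectors > z->len)
		return -ENOSPC;
	z->locked = true;
	*sector = z->start + z->wp_ofst;
	return 0;
}

/* Complete the write: advance the write pointer on success, then unlock. */
static void emulate_append_end(struct zone_model *z, uint64_t nr_sectors,
			       bool success)
{
	assert(z->locked);
	if (success)
		z->wp_ofst += nr_sectors;
	z->locked = false;
}
```

Because the next write location is only known once the previous write has completed, writes to the same zone are serialized, while writes to different zones can still proceed in parallel.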

> (I know it is not a good example in this context, but during online reencryption,
> dm-crypt table segments are continuously reloaded, and because the message is in the table constructor,
> it will flood the syslog with repeated messages.)
> 
> Maybe move it to debug or remove it completely?

OK. I will change this to debug.

> It would be nice to have some zoned info extension to lsblk so we can investigate
> the storage stack over a zoned device (if there is some sysfs knob to detect it, it should be trivial)...

Yes, it is simple to add a sysfs attribute like
/sys/block/xxx/queue/zone_append_emulated.

That can be done later though. I will see if that can really help applications
or file systems. Right now, I do not see the need for this attribute. After all,
all SCSI SMR drives already have zone append emulation (in the sd driver).
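
As a sketch of how a tool could already detect a zoned device in the stack, the existing `queue/zoned` sysfs attribute reports `none`, `host-aware` or `host-managed`. The `zone_append_emulated` attribute mentioned above is only a proposal in this thread, so the example does not read it. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Read the first line of an attribute file into buf, without the
 * trailing newline. Returns 0 on success, -1 on failure.
 */
static int read_sysfs_attr(const char *path, char *buf, size_t len)
{
	FILE *f;

	assert(len > 0);
	f = fopen(path, "r");
	if (!f)
		return -1;
	if (!fgets(buf, (int)len, f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	buf[strcspn(buf, "\n")] = '\0';
	return 0;
}

/*
 * Given the path of a queue/zoned attribute file, return 1 if the
 * device is zoned, 0 if it reports "none", -1 if it cannot be read.
 */
static int queue_is_zoned(const char *zoned_attr_path)
{
	char model[32];

	if (read_sysfs_attr(zoned_attr_path, model, sizeof(model)) < 0)
		return -1;
	return strcmp(model, "none") != 0;
}
```

A caller would pass a path such as /sys/block/xxx/queue/zoned, with xxx replaced by the device name.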

Thanks for the review. Will send V2 later today.

> 
> Thanks,
> Milan
> 
>> +		ti->emulate_zone_append = true;
>>  	}
>>  
>>  	if (crypt_integrity_aead(cc) || cc->integrity_iv_size) {
>>
> 
>

Patch

diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c
index f410ceee51d7..44339823371c 100644
--- a/drivers/md/dm-crypt.c
+++ b/drivers/md/dm-crypt.c
@@ -3280,14 +3280,28 @@  static int crypt_ctr(struct dm_target *ti, unsigned int argc, char **argv)
 	}
 	cc->start = tmpll;
 
-	/*
-	 * For zoned block devices, we need to preserve the issuer write
-	 * ordering. To do so, disable write workqueues and force inline
-	 * encryption completion.
-	 */
 	if (bdev_is_zoned(cc->dev->bdev)) {
+		/*
+		 * For zoned block devices, we need to preserve the issuer write
+		 * ordering. To do so, disable write workqueues and force inline
+		 * encryption completion.
+		 */
 		set_bit(DM_CRYPT_NO_WRITE_WORKQUEUE, &cc->flags);
 		set_bit(DM_CRYPT_WRITE_INLINE, &cc->flags);
+
+		/*
+		 * All zone append writes to a zone of a zoned block device will
+		 * have the same BIO sector, the start of the zone. When the
+		 * cypher IV mode uses sector values, all data targeting a
+		 * zone will be encrypted using the first sector numbers of the
+		 * zone. This will not result in write errors but will
+		 * cause most reads to fail as reads will use the sector values
+		 * for the actual data locations, resulting in IV mismatch.
+		 * To avoid this problem, ask DM core to emulate zone append
+		 * operations with regular writes.
+		 */
+		DMWARN("Zone append operations will be emulated");
+		ti->emulate_zone_append = true;
 	}
 
 	if (crypt_integrity_aead(cc) || cc->integrity_iv_size) {