Message ID | 20210519025529.707897-12-damien.lemoal@wdc.com |
---|---|
State | Superseded, archived |
Delegated to: | Mike Snitzer |
Series | dm: Improve zoned block device support |
On 19/05/2021 04:55, Damien Le Moal wrote:
> Zone append BIOs (REQ_OP_ZONE_APPEND) always specify the start sector
> of the zone to be written instead of the actual sector location to
> write. The write location is determined by the device and returned to
> the host upon completion of the operation. This interface, while simple
> and efficient for writing into sequential zones of a zoned block
> device, is incompatible with the use of sector values to calculate a
> cypher block IV. All data written in a zone end up using the same IV
> values corresponding to the first sectors of the zone, but read
> operations will specify any sector within the zone, resulting in an IV
> mismatch between encryption and decryption.
>
> To solve this problem, report to DM core that zone append operations are
> not supported. This results in the zone append operations being emulated
> using regular write operations.

Yes, I think this is definitely a better approach, and it does not need
to fiddle with dm-crypt crypto. Thanks.

Just one comment below:

>
> Reported-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
> ---
>  drivers/md/dm-crypt.c | 24 +++++++++++++++++++-----
>  1 file changed, 19 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c
> index f410ceee51d7..44339823371c 100644
> --- a/drivers/md/dm-crypt.c
> +++ b/drivers/md/dm-crypt.c
> @@ -3280,14 +3280,28 @@ static int crypt_ctr(struct dm_target *ti, unsigned int argc, char **argv)
>  	}
>  	cc->start = tmpll;
>
> -	/*
> -	 * For zoned block devices, we need to preserve the issuer write
> -	 * ordering. To do so, disable write workqueues and force inline
> -	 * encryption completion.
> -	 */
>  	if (bdev_is_zoned(cc->dev->bdev)) {
> +		/*
> +		 * For zoned block devices, we need to preserve the issuer write
> +		 * ordering. To do so, disable write workqueues and force inline
> +		 * encryption completion.
> +		 */
>  		set_bit(DM_CRYPT_NO_WRITE_WORKQUEUE, &cc->flags);
>  		set_bit(DM_CRYPT_WRITE_INLINE, &cc->flags);
> +
> +		/*
> +		 * All zone append writes to a zone of a zoned block device will
> +		 * have the same BIO sector, the start of the zone. When the
> +		 * cypher IV mode uses sector values, all data targeting a
> +		 * zone will be encrypted using the first sector numbers of the
> +		 * zone. This will not result in write errors but will
> +		 * cause most reads to fail as reads will use the sector values
> +		 * for the actual data locations, resulting in IV mismatch.
> +		 * To avoid this problem, ask DM core to emulate zone append
> +		 * operations with regular writes.
> +		 */
> +		DMWARN("Zone append operations will be emulated");

Do we really want to fill the log with these?
(I know it is not a good example in this context, but during online
reencryption, dm-crypt table segments are continuously reloaded, and
because the message is in the table constructor, it will flood the
syslog with repeated messages.)

Maybe move it to debug or remove it completely?

What would be nice is some zoned info extension to lsblk so we can
investigate a storage stack over a zoned device (if there is some sysfs
knob to detect it, it should be trivial)...

Thanks,
Milan

> +		ti->emulate_zone_append = true;
>  	}
>
>  	if (crypt_integrity_aead(cc) || cc->integrity_iv_size) {
>

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel
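Milan's remark that a sysfs knob would make a zoned view of the storage stack trivial can be illustrated with what the block layer already exposes: the zoned model of each disk in `/sys/block/<disk>/queue/zoned`, with the values `none`, `host-aware` and `host-managed`. Device-mapper nodes such as `dm-0` should report the model of the stacked device as well. The C sketch below assumes Linux sysfs; `read_zoned_model()` and the other helpers are hypothetical names. It only tells whether a device is zoned, not whether zone append is emulated, since no attribute for that exists in this thread.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Zoned models as reported by /sys/block/<disk>/queue/zoned. */
enum zoned_model {
	ZONED_NONE,
	ZONED_HOST_AWARE,
	ZONED_HOST_MANAGED,
	ZONED_UNKNOWN, /* attribute missing or value not recognized */
};

/* True if the first n bytes of s are exactly the string word. */
static int token_is(const char *s, size_t n, const char *word)
{
	return strlen(word) == n && strncmp(s, word, n) == 0;
}

/* Parse the attribute content, ignoring a trailing newline. */
static enum zoned_model parse_zoned(const char *s)
{
	size_t n = strcspn(s, "\n");

	if (token_is(s, n, "none"))
		return ZONED_NONE;
	if (token_is(s, n, "host-aware"))
		return ZONED_HOST_AWARE;
	if (token_is(s, n, "host-managed"))
		return ZONED_HOST_MANAGED;
	return ZONED_UNKNOWN;
}

/* Read the zoned model of a disk name such as "sda" or "dm-0"
 * (hypothetical helper, assumes Linux sysfs is mounted at /sys). */
static enum zoned_model read_zoned_model(const char *disk)
{
	char path[256], buf[32];
	FILE *f;

	assert(disk != NULL);
	snprintf(path, sizeof(path), "/sys/block/%s/queue/zoned", disk);
	f = fopen(path, "r");
	if (!f)
		return ZONED_UNKNOWN;
	if (!fgets(buf, sizeof(buf), f)) {
		fclose(f);
		return ZONED_UNKNOWN;
	}
	fclose(f);
	return parse_zoned(buf);
}
```

A tool walking the stack (for example the slaves of a dm-crypt node) could call this per node to show which layers sit on a zoned device.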
On 2021/05/20 0:46, Milan Broz wrote:
> On 19/05/2021 04:55, Damien Le Moal wrote:
>> Zone append BIOs (REQ_OP_ZONE_APPEND) always specify the start sector
>> of the zone to be written instead of the actual sector location to
>> write. The write location is determined by the device and returned to
>> the host upon completion of the operation. This interface, while simple
>> and efficient for writing into sequential zones of a zoned block
>> device, is incompatible with the use of sector values to calculate a
>> cypher block IV. All data written in a zone end up using the same IV
>> values corresponding to the first sectors of the zone, but read
>> operations will specify any sector within the zone, resulting in an IV
>> mismatch between encryption and decryption.
>>
>> To solve this problem, report to DM core that zone append operations are
>> not supported. This results in the zone append operations being emulated
>> using regular write operations.
>
> Yes, I think this is definitely a better approach, and it does not need
> to fiddle with dm-crypt crypto. Thanks.
>
> Just one comment below:
>
>>
>> Reported-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
>> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
>> ---
>>  drivers/md/dm-crypt.c | 24 +++++++++++++++++++-----
>>  1 file changed, 19 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c
>> index f410ceee51d7..44339823371c 100644
>> --- a/drivers/md/dm-crypt.c
>> +++ b/drivers/md/dm-crypt.c
>> @@ -3280,14 +3280,28 @@ static int crypt_ctr(struct dm_target *ti, unsigned int argc, char **argv)
>>  	}
>>  	cc->start = tmpll;
>>
>> -	/*
>> -	 * For zoned block devices, we need to preserve the issuer write
>> -	 * ordering. To do so, disable write workqueues and force inline
>> -	 * encryption completion.
>> -	 */
>>  	if (bdev_is_zoned(cc->dev->bdev)) {
>> +		/*
>> +		 * For zoned block devices, we need to preserve the issuer write
>> +		 * ordering. To do so, disable write workqueues and force inline
>> +		 * encryption completion.
>> +		 */
>>  		set_bit(DM_CRYPT_NO_WRITE_WORKQUEUE, &cc->flags);
>>  		set_bit(DM_CRYPT_WRITE_INLINE, &cc->flags);
>> +
>> +		/*
>> +		 * All zone append writes to a zone of a zoned block device will
>> +		 * have the same BIO sector, the start of the zone. When the
>> +		 * cypher IV mode uses sector values, all data targeting a
>> +		 * zone will be encrypted using the first sector numbers of the
>> +		 * zone. This will not result in write errors but will
>> +		 * cause most reads to fail as reads will use the sector values
>> +		 * for the actual data locations, resulting in IV mismatch.
>> +		 * To avoid this problem, ask DM core to emulate zone append
>> +		 * operations with regular writes.
>> +		 */
>> +		DMWARN("Zone append operations will be emulated");
>
> Do we really want to fill the log with these?

I added this to signal to the user, indirectly, that performance may be
impacted: the zone write locking mechanism used for the emulation limits
writes to at most one in flight per zone. Overall, the drive QD can still
be high, but each zone will have at most one write in flight at any time.

> (I know it is not a good example in this context, but during online
> reencryption, dm-crypt table segments are continuously reloaded, and
> because the message is in the table constructor, it will flood the
> syslog with repeated messages.)
>
> Maybe move it to debug or remove it completely?

OK. I will change this to debug.

> What would be nice is some zoned info extension to lsblk so we can
> investigate a storage stack over a zoned device (if there is some sysfs
> knob to detect it, it should be trivial)...

Yes, it is simple to add a sysfs attribute like
/sys/block/xxx/queue/zone_append_emulated. That can be done later though.
I will see if that can really help applications or FSes. Right now, I do
not see the need for this attribute. After all, all SCSI SMR drives
already have zone append emulation (in the sd driver).

Thanks for the review. Will send V2 later today.

>
> Thanks,
> Milan
>
>> +		ti->emulate_zone_append = true;
>>  	}
>>
>>  	if (crypt_integrity_aead(cc) || cc->integrity_iv_size) {
>>
>
>
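The performance cost Damien describes (device queue depth can stay high, but at most one write in flight per zone) can be sketched with a per-zone lock flag. This is a single-threaded toy model under stated assumptions: `struct zone_wlocks`, `zone_write_trylock()` and `issue_pending()` are hypothetical names, not the DM core or block layer API, and a real implementation would need atomic bit operations and would release the lock from the write completion path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_ZONES 8 /* hypothetical device with 8 zones */

/* One "write in flight" flag per zone (hypothetical, for illustration;
 * a real implementation would use atomic bit operations). */
struct zone_wlocks {
	bool locked[NR_ZONES];
};

/* Try to take the write lock of @zone before issuing a write to it.
 * Returns false if a write to that zone is already in flight. */
static bool zone_write_trylock(struct zone_wlocks *zw, unsigned int zone)
{
	assert(zone < NR_ZONES);
	if (zw->locked[zone])
		return false;
	zw->locked[zone] = true;
	return true;
}

/* Release the zone write lock once the write completes. */
static void zone_write_unlock(struct zone_wlocks *zw, unsigned int zone)
{
	assert(zone < NR_ZONES);
	zw->locked[zone] = false;
}

/* Issue as many pending writes (given by target zone) as the per-zone
 * locks allow and return how many are now in flight on the device.
 * Writes that could not be issued must wait for a completion. */
static size_t issue_pending(struct zone_wlocks *zw,
			    const unsigned int *zones, size_t nr)
{
	size_t in_flight = 0;

	for (size_t i = 0; i < nr; i++)
		if (zone_write_trylock(zw, zones[i]))
			in_flight++;
	return in_flight;
}
```

With pending writes to zones 0, 0, 0, 1, 2, 2 and 5, only four can be in flight at once (one per distinct zone); the other three wait until the write to their zone completes. Spreading writes over many zones keeps the device busy, while a workload hammering a single zone is serialized.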
diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c
index f410ceee51d7..44339823371c 100644
--- a/drivers/md/dm-crypt.c
+++ b/drivers/md/dm-crypt.c
@@ -3280,14 +3280,28 @@ static int crypt_ctr(struct dm_target *ti, unsigned int argc, char **argv)
 	}
 	cc->start = tmpll;
 
-	/*
-	 * For zoned block devices, we need to preserve the issuer write
-	 * ordering. To do so, disable write workqueues and force inline
-	 * encryption completion.
-	 */
 	if (bdev_is_zoned(cc->dev->bdev)) {
+		/*
+		 * For zoned block devices, we need to preserve the issuer write
+		 * ordering. To do so, disable write workqueues and force inline
+		 * encryption completion.
+		 */
 		set_bit(DM_CRYPT_NO_WRITE_WORKQUEUE, &cc->flags);
 		set_bit(DM_CRYPT_WRITE_INLINE, &cc->flags);
+
+		/*
+		 * All zone append writes to a zone of a zoned block device will
+		 * have the same BIO sector, the start of the zone. When the
+		 * cypher IV mode uses sector values, all data targeting a
+		 * zone will be encrypted using the first sector numbers of the
+		 * zone. This will not result in write errors but will
+		 * cause most reads to fail as reads will use the sector values
+		 * for the actual data locations, resulting in IV mismatch.
+		 * To avoid this problem, ask DM core to emulate zone append
+		 * operations with regular writes.
+		 */
+		DMWARN("Zone append operations will be emulated");
+		ti->emulate_zone_append = true;
 	}
 
 	if (crypt_integrity_aead(cc) || cc->integrity_iv_size) {
Zone append BIOs (REQ_OP_ZONE_APPEND) always specify the start sector
of the zone to be written instead of the actual sector location to
write. The write location is determined by the device and returned to
the host upon completion of the operation. This interface, while simple
and efficient for writing into sequential zones of a zoned block
device, is incompatible with the use of sector values to calculate a
cypher block IV. All data written in a zone end up using the same IV
values corresponding to the first sectors of the zone, but read
operations will specify any sector within the zone, resulting in an IV
mismatch between encryption and decryption.

To solve this problem, report to DM core that zone append operations are
not supported. This results in the zone append operations being emulated
using regular write operations.

Reported-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
---
 drivers/md/dm-crypt.c | 24 +++++++++++++++++++-----
 1 file changed, 19 insertions(+), 5 deletions(-)
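The IV mismatch described in the commit message can be reproduced with a toy model. The C sketch below assumes a sector-based IV (as with dm-crypt's plain64 IV mode) and, to keep the effect visible, replaces the real cipher with an XOR against the sector number; it is not real encryption. `struct toy_zone`, `native_zone_append()`, `emulated_zone_append()` and `toy_read()` are hypothetical names, not dm-crypt or DM core code, and each append writes exactly one sector.

```c
#include <assert.h>
#include <stdint.h>

#define ZONE_START   524288ULL /* hypothetical zone start sector */
#define ZONE_SECTORS 8ULL      /* tiny zone, one sector per write */

/* Toy stand-in for a sector-based IV mode: the IV is the sector number
 * carried by the BIO, and "encryption" is an XOR with it, so data only
 * decrypts correctly when the same sector is used both ways. */
static uint64_t toy_crypt(uint64_t data, uint64_t iv_sector)
{
	return data ^ iv_sector;
}

struct toy_zone {
	uint64_t wp;                  /* write pointer (absolute sector) */
	uint64_t media[ZONE_SECTORS]; /* ciphertext stored per sector */
};

/* Native zone append: the BIO carries the zone start sector, so the data
 * is encrypted with the zone start IV; the device then writes it at the
 * write pointer and returns that location on completion. */
static uint64_t native_zone_append(struct toy_zone *z, uint64_t plain)
{
	uint64_t ct = toy_crypt(plain, ZONE_START);
	uint64_t written = z->wp++;

	assert(written < ZONE_START + ZONE_SECTORS);
	z->media[written - ZONE_START] = ct;
	return written;
}

/* Emulated zone append: the append becomes a regular write at the write
 * pointer, so the BIO sector (and hence the IV) is the data's real
 * location. */
static uint64_t emulated_zone_append(struct toy_zone *z, uint64_t plain)
{
	uint64_t written = z->wp++;

	assert(written < ZONE_START + ZONE_SECTORS);
	z->media[written - ZONE_START] = toy_crypt(plain, written);
	return written;
}

/* Reads always target the real sector, so they use that sector's IV. */
static uint64_t toy_read(const struct toy_zone *z, uint64_t sector)
{
	assert(sector >= ZONE_START && sector < ZONE_START + ZONE_SECTORS);
	return toy_crypt(z->media[sector - ZONE_START], sector);
}
```

With native appends, only the first sector of the zone reads back correctly, because its real location happens to equal the zone start; every later sector decrypts to the wrong data, while the writes themselves report no error. With emulation the write location is known before encryption, so every sector round-trips. This is also why emulation needs the per-zone write serialization discussed in the thread: the write pointer must not move between choosing the location and issuing the write.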