mbox series

[0/4] virtio: handle zoned backing devices

Message ID 20190709203806.17550-1-dmitry.fomichev@wdc.com (mailing list archive)
Headers show
Series virtio: handle zoned backing devices | expand

Message

Dmitry Fomichev July 9, 2019, 8:38 p.m. UTC
Currently, attaching zoned block devices (i.e. storage devices
compliant to ZAC/ZBC standards) using several virtio methods doesn't
work - the zoned devices appear as regular block devices at the guest.
This may cause unexpected i/o errors and, potentially, some data
corruption.

To be more precise, attaching a zoned device via virtio-pci-blk,
virtio-scsi-pci/scsi-disk or virtio-scsi-pci/scsi-hd demonstrates the
above behavior. A simple fix is needed to make
virtio-scsi-pci/scsi-block work and this is covered by a different
patch. The virtio-scsi-pci/scsi-generic method appears to handle zoned
devices without problems.

This patchset adds code to check if the backing device that is being
opened is zoned. If this is the case, the new code prohibits
virtualization of the device for all use cases lacking proper zoned
support.

Considering that there is still a couple of different working ways
to virtualize a ZBD, this patchset provides a reasonable short-term
solution for this problem. What about long term?

It appears to be beneficial to add proper ZBD support to virtio-blk.
In order to support this use case properly, some virtio-blk protocol
changes will be necessary. They are needed to allow the host code to
propagate some ZBD properties that are required for virtio guest
driver to configure the guest block device as ZBD, such as zoned
device model, zone size and the total number of zones. Further, some
support needs to be added for REPORT ZONES command as well as for zone
operations, such as OPEN ZONE, CLOSE ZONE, FINISH ZONE and RESET ZONE.

These additions to the protocol are relatively straightforward, but
they need to be approved by the virtio TC and the whole process may
take some time.

ZBD support for virtio-scsi-pci/scsi-disk and virtio-scsi-pci/scsi-hd
does not seem as necessary. Users will be expected to attach zoned
block devices via virtio-scsi-pci/scsi-block instead.

This patchset contains some Linux-specific code. This code is
necessary to obtain Zoned Block Device model value from Linux sysfs.
Dmitry Fomichev (3):
  block: Add zoned device model property
  raw: Recognize zoned backing devices
  virtio-blk: Don't realize zoned block devices

Shin'ichiro Kawasaki (1):
  hw/scsi: Don't realize zoned block devices for virtio-scsi legacy
    drivers

 block.c                        | 14 +++++++
 block/block-backend.c          | 20 ++++++++++
 block/file-posix.c             | 69 ++++++++++++++++++++++++++++------
 block/raw-format.c             |  8 ++++
 hw/block/virtio-blk.c          |  5 +++
 hw/scsi/scsi-disk.c            |  5 +++
 include/block/block.h          |  9 +++++
 include/block/block_int.h      |  4 ++
 include/sysemu/block-backend.h |  2 +
 9 files changed, 125 insertions(+), 11 deletions(-)

Comments

Paolo Bonzini July 10, 2019, 10:09 a.m. UTC | #1
On 09/07/19 22:38, Dmitry Fomichev wrote:
> Currently, attaching zoned block devices (i.e. storage devices
> compliant to ZAC/ZBC standards) using several virtio methods doesn't
> work - the zoned devices appear as regular block devices at the guest.
> This may cause unexpected i/o errors and, potentially, some data
> corruption.
> 
> To be more precise, attaching a zoned device via virtio-pci-blk,
> virtio-scsi-pci/scsi-disk or virtio-scsi-pci/scsi-hd demonstrates the
> above behavior. A simple fix is needed to make
> virtio-scsi-pci/scsi-block work and this is covered by a different
> patch. The virtio-scsi-pci/scsi-generic method appears to handle zoned
> devices without problems.

The problem with this approach is that other devices (e.g. ide-hd or sd
card) also break with zoned devices and the only way to fix it would be
to add code denying zoned block devices to all of them.

The question then becomes how to define a whitelist.  One possiblity is
to add a QOM interface (for example TYPE_ZONED_BLOCK_SUPPORT) to
scsi-block and scsi-generic.  In do_parse_drive you can query the
BlockBackend with bdrv_get_zoned_info, and return an error if the
backend is a zoned block device and the device does not implement
TYPE_ZONED_BLOCK_SUPPORT.  (Commit 6b1566c is an example of adding a new
QOM interface; in your case, it would be simpler as the interface would
not have any method).  Kevin, what do you think?

Also, why deny passing Host Aware devices?

Paolo
Kevin Wolf July 10, 2019, 11:02 a.m. UTC | #2
Am 10.07.2019 um 12:09 hat Paolo Bonzini geschrieben:
> On 09/07/19 22:38, Dmitry Fomichev wrote:
> > Currently, attaching zoned block devices (i.e. storage devices
> > compliant to ZAC/ZBC standards) using several virtio methods doesn't
> > work - the zoned devices appear as regular block devices at the guest.
> > This may cause unexpected i/o errors and, potentially, some data
> > corruption.
> > 
> > To be more precise, attaching a zoned device via virtio-pci-blk,
> > virtio-scsi-pci/scsi-disk or virtio-scsi-pci/scsi-hd demonstrates the
> > above behavior. A simple fix is needed to make
> > virtio-scsi-pci/scsi-block work and this is covered by a different
> > patch. The virtio-scsi-pci/scsi-generic method appears to handle zoned
> > devices without problems.
> 
> The problem with this approach is that other devices (e.g. ide-hd or sd
> card) also break with zoned devices and the only way to fix it would be
> to add code denying zoned block devices to all of them.
> 
> The question then becomes how to define a whitelist.  One possiblity is
> to add a QOM interface (for example TYPE_ZONED_BLOCK_SUPPORT) to
> scsi-block and scsi-generic.  In do_parse_drive you can query the
> BlockBackend with bdrv_get_zoned_info, and return an error if the
> backend is a zoned block device and the device does not implement
> TYPE_ZONED_BLOCK_SUPPORT.  (Commit 6b1566c is an example of adding a new
> QOM interface; in your case, it would be simpler as the interface would
> not have any method).  Kevin, what do you think?

What about non-device users such as block jobs or (NBD) exports? Won't
they have to special-case such devices, too? In fact, what about image
format drivers or even filters?

I feel that this needs to be managed at the BDS level somehow. Not sure
which mechanism to use, though. Permissions would be suitable for a
blacklist approach, but I agree with you that we need a whitelist
instead.

Hm... Actually, file-posix implements .bdrv_check_perm and could just
refuse attaching a parent there if it doesn't request a specific
permission like BLK_PERM_SUPPORT_ZONED. That should give us the
whitelist semantics through existing infrastructure.

Kevin
Paolo Bonzini July 10, 2019, 11:33 a.m. UTC | #3
On 10/07/19 13:02, Kevin Wolf wrote:
> Hm... Actually, file-posix implements .bdrv_check_perm and could just
> refuse attaching a parent there if it doesn't request a specific
> permission like BLK_PERM_SUPPORT_ZONED. That should give us the
> whitelist semantics through existing infrastructure.

I'd like Dmitry to have something more precise to base his work on.  The
permissions system is really complicated and I never really wrapped my
head around it, so I need your help.

IIUC, blkconf_apply_backend_options would grow a new argument (like
"resizable") and that argument would add BLK_PERM_SUPPORT_ZONED to the
perm that blkconf_apply_backend_options passes to blk_set_perm.  On the
other side raw_check_perm would say something like

    if (is_zoned(s) && !(perm & BLK_PERM_SUPPORT_ZONED)) {
        error_setg(....);
        return -ENOTSUP;
    }

Is this correct?

In addition, BLK_PERM_SUPPORT_ZONED would have to be a shared
permission, since it's possible to assign the same block device to
multiple scsi-block devices.  So BLK_PERM_SUPPORT_ZONED would be added
unconditionally to shared_perm.

Paolo

ps: I have always thought that shared_perm is expressed the wrong way
and should have been "denied_perm".  How hard would it be to change that
now?
Kevin Wolf July 10, 2019, 9:09 p.m. UTC | #4
Am 10.07.2019 um 13:33 hat Paolo Bonzini geschrieben:
> On 10/07/19 13:02, Kevin Wolf wrote:
> > Hm... Actually, file-posix implements .bdrv_check_perm and could just
> > refuse attaching a parent there if it doesn't request a specific
> > permission like BLK_PERM_SUPPORT_ZONED. That should give us the
> > whitelist semantics through existing infrastructure.
> 
> I'd like Dmitry to have something more precise to base his work on.  The
> permissions system is really complicated and I never really wrapped my
> head around it, so I need your help.
> 
> IIUC, blkconf_apply_backend_options would grow a new argument (like
> "resizable") and that argument would add BLK_PERM_SUPPORT_ZONED to the
> perm that blkconf_apply_backend_options passes to blk_set_perm.  On the
> other side raw_check_perm would say something like
> 
>     if (is_zoned(s) && !(perm & BLK_PERM_SUPPORT_ZONED)) {
>         error_setg(....);
>         return -ENOTSUP;
>     }
> 
> Is this correct?

Yes, I think this is how you'd best implement it.

> In addition, BLK_PERM_SUPPORT_ZONED would have to be a shared
> permission, since it's possible to assign the same block device to
> multiple scsi-block devices.  So BLK_PERM_SUPPORT_ZONED would be added
> unconditionally to shared_perm.

Right, this part shows that we're kind of abusing the permission system
here. I think unconditionally adding BLK_PERM_SUPPORT_ZONED to the set
of shared permissions could probably happen centrally in
bdrv_child_perm().

> ps: I have always thought that shared_perm is expressed the wrong way
> and should have been "denied_perm".  How hard would it be to change that
> now?

I'm not sure it would be better. It is shared_perm because that means
that the default is restrictive (error mode: operation refused, clearly
reported, easy to fix) rather than permissive (error mode: image
corruption, hard to figure out where we were too permissive). Basically,
whitelist instead of blacklist, once again.

But if we did indeed decide to change it, the only way to find out how
hard it is would be doing it. I suspect not too hard in principle, but
making sure that we converted all callers and don't introduce wrong
callers later through (silent) merge conflicts is the more concerning
part.

Kevin
Dmitry Fomichev July 11, 2019, 12:52 a.m. UTC | #5
On Wed, 2019-07-10 at 23:09 +0200, Kevin Wolf wrote:
> Am 10.07.2019 um 13:33 hat Paolo Bonzini geschrieben:
> > On 10/07/19 13:02, Kevin Wolf wrote:
> > > Hm... Actually, file-posix implements .bdrv_check_perm and could just
> > > refuse attaching a parent there if it doesn't request a specific
> > > permission like BLK_PERM_SUPPORT_ZONED. That should give us the
> > > whitelist semantics through existing infrastructure.
> > 
> > I'd like Dmitry to have something more precise to base his work on.  The
> > permissions system is really complicated and I never really wrapped my
> > head around it, so I need your help.
> > 
> > IIUC, blkconf_apply_backend_options would grow a new argument (like
> > "resizable") and that argument would add BLK_PERM_SUPPORT_ZONED to the
> > perm that blkconf_apply_backend_options passes to blk_set_perm.  On the
> > other side raw_check_perm would say something like
> > 
> >     if (is_zoned(s) && !(perm & BLK_PERM_SUPPORT_ZONED)) {
> >         error_setg(....);
> >         return -ENOTSUP;
> >     }
> > 
> > Is this correct?
> 
> Yes, I think this is how you'd best implement it.
> 
> > In addition, BLK_PERM_SUPPORT_ZONED would have to be a shared
> > permission, since it's possible to assign the same block device to
> > multiple scsi-block devices.  So BLK_PERM_SUPPORT_ZONED would be added
> > unconditionally to shared_perm.
> 
> Right, this part shows that we're kind of abusing the permission system
> here. I think unconditionally adding BLK_PERM_SUPPORT_ZONED to the set
> of shared permissions could probably happen centrally in
> bdrv_child_perm().
> 
> > ps: I have always thought that shared_perm is expressed the wrong way
> > and should have been "denied_perm".  How hard would it be to change that
> > now?
> 
> I'm not sure it would be better. It is shared_perm because that means
> that the default is restrictive (error mode: operation refused, clearly
> reported, easy to fix) rather than permissive (error mode: image
> corruption, hard to figure out where we were too permissive). Basically,
> whitelist instead of blacklist, once again.
> 
> But if we did indeed decide to change it, the only way to find out how
> hard it is would be doing it. I suspect not too hard in principle, but
> making sure that we converted all callers and don't introduce wrong
> callers later through (silent) merge conflicts is the more concerning
> part.
> 
> Kevin

Thanks for the feedback, the permissions based approach indeed looks
cleaner. I am looking into modifying the patchset to do the check in
bdrv_check_perm and will send a V2.

Paolo,
WRT to Host Aware drives, these MAY work, but we don't have any of these
available for testing and are not able to verify which drivers do work
with them and which do not. This is the reason for not letting them pass
thru. If you prefer, I can enable passing HA drives in V2.

Dmitry
Paolo Bonzini July 11, 2019, 8:04 a.m. UTC | #6
On 11/07/19 02:52, Dmitry Fomichev wrote:
> Paolo,
> WRT to Host Aware drives, these MAY work, but we don't have any of these
> available for testing and are not able to verify which drivers do work
> with them and which do not. This is the reason for not letting them pass
> thru. If you prefer, I can enable passing HA drives in V2.

In theory host aware should work the same as drive managed if the host
is oblivious of zones, so I prefer enabling them indeed.

Thanks,

Paolo