Message ID | 20210408101252.552-1-xieyongji@bytedance.com (mailing list archive) |
---|---|
Headers | show |
Series | Introduce vhost-vdpa block device | expand |
On Thu, Apr 08, 2021 at 06:12:49PM +0800, Xie Yongji wrote: > Since we already have some ways to emulate vDPA block device > in kernel[1] or userspace[2]. This series tries to introduce a > new vhost-vdpa block device for that. To use it, we can add > something like: > > qemu-system-x86_64 \ > -device vhost-vdpa-blk-pci,vdpa-dev=/dev/vhost-vdpa-0 This device is similar to vhost-user-blk. QEMU does not see it as a block device so storage migration, I/O throttling, image formats, etc are not supported. Stefano Garzarella and I discussed how vdpa-blk devices could integrate more deeply with QEMU's block layer. The QEMU block layer could be enabled only when necessary and otherwise bypassed for maximum performance. This alternative approach is similar to how vhost-net is implemented in QEMU. A BlockDriver would handle the vdpa-blk device and the regular virtio-blk-pci device would still be present. The virtqueues could be delegated to the vdpa-blk device in order to bypass the QEMU block layer. I wanted to mention this since it's likely that this kind of vdpa-blk device implementation will be posted in the future and you might be interested. It makes live migration with non-shared storage possible, for example. An issue with vhost-user-blk is that the ownership of qdev properties and VIRTIO Configuration Space fields was not clearly defined. Some things are handled by QEMU's vhost-user-blk code, some things are handled by the vhost-user device backend, and some things are negotiated between both entities. This patch series follows the existing vhost-user-blk approach, which I think will show serious issues as the device is more widely used and whenever virtio-blk or the implementation is extended with new features. It is very hard to provide backwards compatibility with the current approach where the ownership of qdev properties and VIRTIO Configuration Space fields is ad-hoc and largely undefined. Since vDPA has VIRTIO Configuration Space APIs, I suggest that the vhost-vDPA device controls the entire configuration space. QEMU should simply forward accesses between the guest and vhost-vdpa. Regarding qdev properties, it's a little trickier because QEMU needs to do the emulated VIRTIO device setup (allocating virtqueues, setting their sizes, etc). Can QEMU query this information from the vDPA device? If not, which qdev properties are read-only and must match the configuration of the vDPA device and which are read-write and can control the vDPA device? Stefan
On Mon, Apr 26, 2021 at 11:34 PM Stefan Hajnoczi <stefanha@redhat.com> wrote: > > On Thu, Apr 08, 2021 at 06:12:49PM +0800, Xie Yongji wrote: > > Since we already have some ways to emulate vDPA block device > > in kernel[1] or userspace[2]. This series tries to introduce a > > new vhost-vdpa block device for that. To use it, we can add > > something like: > > > > qemu-system-x86_64 \ > > -device vhost-vdpa-blk-pci,vdpa-dev=/dev/vhost-vdpa-0 > > This device is similar to vhost-user-blk. QEMU does not see it as a > block device so storage migration, I/O throttling, image formats, etc > are not supported. Stefano Garzarella and I discussed how vdpa-blk > devices could integrate more deeply with QEMU's block layer. The QEMU > block layer could be enabled only when necessary and otherwise bypassed > for maximum performance. > Do you mean we can make use of the shadow virtqueue for the slow path (I/O will go through the block layer) and add a fast path (like what we do now) to bypass the block layer? > This alternative approach is similar to how vhost-net is implemented in > QEMU. A BlockDriver would handle the vdpa-blk device and the regular > virtio-blk-pci device would still be present. The virtqueues could be > delegated to the vdpa-blk device in order to bypass the QEMU block > layer. > > I wanted to mention this since it's likely that this kind of vdpa-blk > device implementation will be posted in the future and you might be > interested. It makes live migration with non-shared storage possible, > for example. > That would be nice, I'm looking forward to it! So do you think whether it's necessary to continue this approach? Looks like we don't need a vhost-vdpa-blk device any more in the new approach. > An issue with vhost-user-blk is that the ownership of qdev properties > and VIRTIO Configuration Space fields was not clearly defined. Some > things are handled by QEMU's vhost-user-blk code, some things are > handled by the vhost-user device backend, and some things are negotiated > between both entities. This patch series follows the existing > vhost-user-blk approach, which I think will show serious issues as the > device is more widely used and whenever virtio-blk or the implementation > is extended with new features. It is very hard to provide backwards > compatibility with the current approach where the ownership of qdev > properties and VIRTIO Configuration Space fields is ad-hoc and largely > undefined. > > Since vDPA has VIRTIO Configuration Space APIs, I suggest that the > vhost-vDPA device controls the entire configuration space. QEMU should > simply forward accesses between the guest and vhost-vdpa. > Does this already achieved by vhost_ops->vhost_get_config? And I want to know how to handle the endianness issue if qemu just simply does forwarding and doesn't care about the content of config space. > Regarding qdev properties, it's a little trickier because QEMU needs to > do the emulated VIRTIO device setup (allocating virtqueues, setting > their sizes, etc). Can QEMU query this information from the vDPA device? > If not, which qdev properties are read-only and must match the > configuration of the vDPA device and which are read-write and can > control the vDPA device? > Yes, that's an issue. We have to make sure the number of virtqueues and their size set by qemu is not greater than hardware limitation. Now I think we can query the max queue size, but looks like we don't have an interface to query the max number of virtqueues. Thanks, Yongji
On Tue, Apr 27, 2021 at 06:24:55PM +0800, Yongji Xie wrote: > On Mon, Apr 26, 2021 at 11:34 PM Stefan Hajnoczi <stefanha@redhat.com> wrote: > > > > On Thu, Apr 08, 2021 at 06:12:49PM +0800, Xie Yongji wrote: > > > Since we already have some ways to emulate vDPA block device > > > in kernel[1] or userspace[2]. This series tries to introduce a > > > new vhost-vdpa block device for that. To use it, we can add > > > something like: > > > > > > qemu-system-x86_64 \ > > > -device vhost-vdpa-blk-pci,vdpa-dev=/dev/vhost-vdpa-0 > > > > This device is similar to vhost-user-blk. QEMU does not see it as a > > block device so storage migration, I/O throttling, image formats, etc > > are not supported. Stefano Garzarella and I discussed how vdpa-blk > > devices could integrate more deeply with QEMU's block layer. The QEMU > > block layer could be enabled only when necessary and otherwise bypassed > > for maximum performance. > > > > Do you mean we can make use of the shadow virtqueue for the slow path > (I/O will go through the block layer) and add a fast path (like what > we do now) to bypass the block layer? Yes. > > This alternative approach is similar to how vhost-net is implemented in > > QEMU. A BlockDriver would handle the vdpa-blk device and the regular > > virtio-blk-pci device would still be present. The virtqueues could be > > delegated to the vdpa-blk device in order to bypass the QEMU block > > layer. > > > > I wanted to mention this since it's likely that this kind of vdpa-blk > > device implementation will be posted in the future and you might be > > interested. It makes live migration with non-shared storage possible, > > for example. > > > > That would be nice, I'm looking forward to it! > > So do you think whether it's necessary to continue this approach? > Looks like we don't need a vhost-vdpa-blk device any more in the new > approach. There is no code for the vdpa-blk BlockDriver yet and the implementation will be more complex than this patch series (more risk the feature could be delayed). If you need vdpa-blk as soon as possible then this patch series may be a pragmatic solution. As long as it doesn't limit the vdpa-blk interface in a way that prevents the BlockDriver implementation then I think QEMU could support both. In the long term the vdpa-blk BlockDriver could replace -device vdpa-blk-pci since the performance should be identical and it supports more use cases. It's up to you. You're welcome to wait for the BlockDriver, work together with Stefano on the BlockDriver, or continue with your patch series. > > An issue with vhost-user-blk is that the ownership of qdev properties > > and VIRTIO Configuration Space fields was not clearly defined. Some > > things are handled by QEMU's vhost-user-blk code, some things are > > handled by the vhost-user device backend, and some things are negotiated > > between both entities. This patch series follows the existing > > vhost-user-blk approach, which I think will show serious issues as the > > device is more widely used and whenever virtio-blk or the implementation > > is extended with new features. It is very hard to provide backwards > > compatibility with the current approach where the ownership of qdev > > properties and VIRTIO Configuration Space fields is ad-hoc and largely > > undefined. > > > > Since vDPA has VIRTIO Configuration Space APIs, I suggest that the > > vhost-vDPA device controls the entire configuration space. QEMU should > > simply forward accesses between the guest and vhost-vdpa. > > > > Does this already achieved by vhost_ops->vhost_get_config? And I want > to know how to handle the endianness issue if qemu just simply does > forwarding and doesn't care about the content of config space. QEMU just copies bytes betwen the driver and the device via vdpa_config_ops->get/set_config(). I don't think it needs to worry about endianness in VIRTIO Configuration Space access code or did I miss something? My understanding is that vDPA currently supports same-endian Legacy and little-endian Modern devices. Cross-endian Legacy devices are currently not supported because there is no API to set endianness. If such an API is added in the future, then QEMU can use it to tell the vDPA device about guest endianness. QEMU still won't need to modify Configuration Space data itself. > > Regarding qdev properties, it's a little trickier because QEMU needs to > > do the emulated VIRTIO device setup (allocating virtqueues, setting > > their sizes, etc). Can QEMU query this information from the vDPA device? > > If not, which qdev properties are read-only and must match the > > configuration of the vDPA device and which are read-write and can > > control the vDPA device? > > > > Yes, that's an issue. We have to make sure the number of virtqueues > and their size set by qemu is not greater than hardware limitation. > Now I think we can query the max queue size, but looks like we don't > have an interface to query the max number of virtqueues. Okay, this is something that Jason and Stefano can discuss more. Ideally the QEMU command-line does not need to duplicate information about the vDPA device. The vdpa management tool (https://github.com/shemminger/iproute2/tree/main/vdpa) and VDUSE virtio-blk device will probably be where the main device configuration happens. As a starting point, can you describe how your VDUSE virtio-blk device is configured? Does it have a command-line/config file/RPC API to set the number of virtqueues, block size, etc? Stefan
On Tue, Apr 27, 2021 at 10:28 PM Stefan Hajnoczi <stefanha@redhat.com> wrote: > > On Tue, Apr 27, 2021 at 06:24:55PM +0800, Yongji Xie wrote: > > On Mon, Apr 26, 2021 at 11:34 PM Stefan Hajnoczi <stefanha@redhat.com> wrote: > > > > > > On Thu, Apr 08, 2021 at 06:12:49PM +0800, Xie Yongji wrote: > > > > Since we already have some ways to emulate vDPA block device > > > > in kernel[1] or userspace[2]. This series tries to introduce a > > > > new vhost-vdpa block device for that. To use it, we can add > > > > something like: > > > > > > > > qemu-system-x86_64 \ > > > > -device vhost-vdpa-blk-pci,vdpa-dev=/dev/vhost-vdpa-0 > > > > > > This device is similar to vhost-user-blk. QEMU does not see it as a > > > block device so storage migration, I/O throttling, image formats, etc > > > are not supported. Stefano Garzarella and I discussed how vdpa-blk > > > devices could integrate more deeply with QEMU's block layer. The QEMU > > > block layer could be enabled only when necessary and otherwise bypassed > > > for maximum performance. > > > > > > > Do you mean we can make use of the shadow virtqueue for the slow path > > (I/O will go through the block layer) and add a fast path (like what > > we do now) to bypass the block layer? > > Yes. > > > > This alternative approach is similar to how vhost-net is implemented in > > > QEMU. A BlockDriver would handle the vdpa-blk device and the regular > > > virtio-blk-pci device would still be present. The virtqueues could be > > > delegated to the vdpa-blk device in order to bypass the QEMU block > > > layer. > > > > > > I wanted to mention this since it's likely that this kind of vdpa-blk > > > device implementation will be posted in the future and you might be > > > interested. It makes live migration with non-shared storage possible, > > > for example. > > > > > > > That would be nice, I'm looking forward to it! > > > > So do you think whether it's necessary to continue this approach? > > Looks like we don't need a vhost-vdpa-blk device any more in the new > > approach. > > There is no code for the vdpa-blk BlockDriver yet and the implementation > will be more complex than this patch series (more risk the feature could > be delayed). > > If you need vdpa-blk as soon as possible then this patch series may be a > pragmatic solution. As long as it doesn't limit the vdpa-blk interface > in a way that prevents the BlockDriver implementation then I think QEMU > could support both. > > In the long term the vdpa-blk BlockDriver could replace -device > vdpa-blk-pci since the performance should be identical and it supports > more use cases. > > It's up to you. You're welcome to wait for the BlockDriver, work > together with Stefano on the BlockDriver, or continue with your patch > series. > I like the new idea! Let me wait for it. > > > An issue with vhost-user-blk is that the ownership of qdev properties > > > and VIRTIO Configuration Space fields was not clearly defined. Some > > > things are handled by QEMU's vhost-user-blk code, some things are > > > handled by the vhost-user device backend, and some things are negotiated > > > between both entities. This patch series follows the existing > > > vhost-user-blk approach, which I think will show serious issues as the > > > device is more widely used and whenever virtio-blk or the implementation > > > is extended with new features. It is very hard to provide backwards > > > compatibility with the current approach where the ownership of qdev > > > properties and VIRTIO Configuration Space fields is ad-hoc and largely > > > undefined. > > > > > > Since vDPA has VIRTIO Configuration Space APIs, I suggest that the > > > vhost-vDPA device controls the entire configuration space. QEMU should > > > simply forward accesses between the guest and vhost-vdpa. > > > > > > > Does this already achieved by vhost_ops->vhost_get_config? And I want > > to know how to handle the endianness issue if qemu just simply does > > forwarding and doesn't care about the content of config space. > > QEMU just copies bytes betwen the driver and the device via > vdpa_config_ops->get/set_config(). I don't think it needs to worry about > endianness in VIRTIO Configuration Space access code or did I miss > something? > > My understanding is that vDPA currently supports same-endian Legacy and > little-endian Modern devices. Cross-endian Legacy devices are currently > not supported because there is no API to set endianness. > OK, I get you. > If such an API is added in the future, then QEMU can use it to tell the > vDPA device about guest endianness. QEMU still won't need to modify > Configuration Space data itself. > > > > Regarding qdev properties, it's a little trickier because QEMU needs to > > > do the emulated VIRTIO device setup (allocating virtqueues, setting > > > their sizes, etc). Can QEMU query this information from the vDPA device? > > > If not, which qdev properties are read-only and must match the > > > configuration of the vDPA device and which are read-write and can > > > control the vDPA device? > > > > > > > Yes, that's an issue. We have to make sure the number of virtqueues > > and their size set by qemu is not greater than hardware limitation. > > Now I think we can query the max queue size, but looks like we don't > > have an interface to query the max number of virtqueues. > > Okay, this is something that Jason and Stefano can discuss more. Ideally > the QEMU command-line does not need to duplicate information about the > vDPA device. The vdpa management tool > (https://github.com/shemminger/iproute2/tree/main/vdpa) and VDUSE > virtio-blk device will probably be where the main device configuration > happens. > > As a starting point, can you describe how your VDUSE virtio-blk device > is configured? Does it have a command-line/config file/RPC API to set > the number of virtqueues, block size, etc? > Yes, we have a command-line to set those properties, something like: qemu-storage-daemon \ --chardev socket,id=charmonitor,path=/tmp/qmp.sock,server,nowait \ --monitor chardev=charmonitor \ --blockdev driver=host_device,cache.direct=on,aio=native,filename=/dev/nullb0,node-name=disk0 \ --export type=vduse-blk,id=test,node-name=disk0,writable=on,name=vduse-null,num-queues=16,queue-size=128 Thanks, Yongji
On Wed, Apr 28, 2021 at 07:27:03PM +0800, Yongji Xie wrote: >On Tue, Apr 27, 2021 at 10:28 PM Stefan Hajnoczi <stefanha@redhat.com> wrote: >> >> On Tue, Apr 27, 2021 at 06:24:55PM +0800, Yongji Xie wrote: >> > On Mon, Apr 26, 2021 at 11:34 PM Stefan Hajnoczi <stefanha@redhat.com> wrote: >> > > >> > > On Thu, Apr 08, 2021 at 06:12:49PM +0800, Xie Yongji wrote: >> > > > Since we already have some ways to emulate vDPA block device >> > > > in kernel[1] or userspace[2]. This series tries to introduce a >> > > > new vhost-vdpa block device for that. To use it, we can add >> > > > something like: >> > > > >> > > > qemu-system-x86_64 \ >> > > > -device vhost-vdpa-blk-pci,vdpa-dev=/dev/vhost-vdpa-0 >> > > >> > > This device is similar to vhost-user-blk. QEMU does not see it as a >> > > block device so storage migration, I/O throttling, image formats, etc >> > > are not supported. Stefano Garzarella and I discussed how vdpa-blk >> > > devices could integrate more deeply with QEMU's block layer. The QEMU >> > > block layer could be enabled only when necessary and otherwise bypassed >> > > for maximum performance. >> > > >> > >> > Do you mean we can make use of the shadow virtqueue for the slow path >> > (I/O will go through the block layer) and add a fast path (like what >> > we do now) to bypass the block layer? >> >> Yes. >> >> > > This alternative approach is similar to how vhost-net is implemented in >> > > QEMU. A BlockDriver would handle the vdpa-blk device and the regular >> > > virtio-blk-pci device would still be present. The virtqueues could be >> > > delegated to the vdpa-blk device in order to bypass the QEMU block >> > > layer. >> > > >> > > I wanted to mention this since it's likely that this kind of vdpa-blk >> > > device implementation will be posted in the future and you might be >> > > interested. It makes live migration with non-shared storage possible, >> > > for example. >> > > >> > >> > That would be nice, I'm looking forward to it! >> > >> > So do you think whether it's necessary to continue this approach? >> > Looks like we don't need a vhost-vdpa-blk device any more in the new >> > approach. >> >> There is no code for the vdpa-blk BlockDriver yet and the implementation >> will be more complex than this patch series (more risk the feature could >> be delayed). >> >> If you need vdpa-blk as soon as possible then this patch series may be a >> pragmatic solution. As long as it doesn't limit the vdpa-blk interface >> in a way that prevents the BlockDriver implementation then I think QEMU >> could support both. >> >> In the long term the vdpa-blk BlockDriver could replace -device >> vdpa-blk-pci since the performance should be identical and it supports >> more use cases. >> >> It's up to you. You're welcome to wait for the BlockDriver, work >> together with Stefano on the BlockDriver, or continue with your patch >> series. >> > >I like the new idea! Let me wait for it. > Thanks for this great discussion! I'll keep you updated and I'll CC you on the RFC when I have something ready. Cheers, Stefano