Message ID: 20201120185105.279030-1-eperezma@redhat.com
Series: vDPA software assisted live migration
The main intention of this POC/RFC is to serve as a base to implement SW LM
in vDPA devices. For vhost-vdpa devices, the high priority is to define an
interface for vDPA to stop the device without losing its state, i.e. the
avail_idx the destination device should start fetching descriptors from.
Following this POC, an implementation on vdpa_sim will be performed.

Apart from that, there is a TODO list for this series; the items will be
resolved as the code is validated. They don't affect the device, only
qemu-internal code, and in case of a change of direction they are easy to
modify or delete. Comments about these are welcome:

- Currently it hijacks the log mechanism to know when migration is
  starting/done. Maybe it would be cleaner to forward the migration status
  from virtio_vmstate_change, since there is no need for the memory
  listener. However, this could make the "memory backend" abstraction
  (also TODO) more complicated. This would drop patches 2 and 3 entirely.
- There is a reverse search in a list in "vhost_dev_from_virtio" for each
  notification. Not really efficient, and it leads to a race condition at
  device destruction.
- Implement new capabilities (no IOMMU, packed vq, event_idx, ...).
- Lots of assertions need to be converted to proper error handling.

Thanks!

On Fri, Nov 20, 2020 at 8:02 PM Eugenio Pérez <eperezma@redhat.com> wrote:
>
> This series enables vDPA software assisted live migration for vhost-net
> devices. This is a new method of vhost device migration: instead of
> relying on the vDPA device's dirty logging capability, SW assisted LM
> intercepts the dataplane, forwarding the descriptors between VM and
> device.
>
> In this migration mode, qemu offers a new vring to the device to read
> and write into, and disables the vhost notifiers, processing guest and
> vhost notifications in qemu. On used buffer relay, qemu marks the dirty
> memory as with plain virtio-net devices. This way, devices do not need
> to have dirty page logging capability.
>
> This series is a POC doing SW LM for vhost-net devices, which already
> have dirty page logging capabilities. None of the changes have an actual
> effect with current devices until the last two patches (26 and 27) are
> applied, but they can be rebased on top of any other. These check that
> the device meets all the requirements, and disable vhost-net device
> logging so migration goes through SW LM. This last patch is not meant to
> be applied in the final revision; it is in the series just for testing
> purposes.
>
> To use SW assisted LM, these vhost-net devices need to be instantiated:
> * With IOMMU (iommu_platform=on,ats=on)
> * Without event_idx (event_idx=off)
>
> Just the notification forwarding (with no descriptor relay) can be
> achieved with patches 7 and 9, and starting migration. Partial applies
> between 13 and 24 will not work while migrating on the source, and patch
> 25 is needed for the destination to resume network activity.
>
> It is based on the ideas of DPDK SW assisted LM, in the DPDK series at
> https://patchwork.dpdk.org/cover/48370/ .
>
> Comments are welcome.
>
> Thanks!
>
> Eugenio Pérez (27):
>   vhost: Add vhost_dev_can_log
>   vhost: Add device callback in vhost_migration_log
>   vhost: Move log resize/put to vhost_dev_set_log
>   vhost: add vhost_kernel_set_vring_enable
>   vhost: Add hdev->dev.sw_lm_vq_handler
>   virtio: Add virtio_queue_get_used_notify_split
>   vhost: Route guest->host notification through qemu
>   vhost: Add a flag for software assisted Live Migration
>   vhost: Route host->guest notification through qemu
>   vhost: Allocate shadow vring
>   virtio: const-ify all virtio_tswap* functions
>   virtio: Add virtio_queue_full
>   vhost: Send buffers to device
>   virtio: Remove virtio_queue_get_used_notify_split
>   vhost: Do not invalidate signalled used
>   virtio: Expose virtqueue_alloc_element
>   vhost: add vhost_vring_set_notification_rcu
>   vhost: add vhost_vring_poll_rcu
>   vhost: add vhost_vring_get_buf_rcu
>   vhost: Return used buffers
>   vhost: Add vhost_virtqueue_memory_unmap
>   vhost: Add vhost_virtqueue_memory_map
>   vhost: unmap qemu's shadow virtqueues on sw live migration
>   vhost: iommu changes
>   vhost: Do not commit vhost used idx on vhost_virtqueue_stop
>   vhost: Add vhost_hdev_can_sw_lm
>   vhost: forbid vhost devices logging
>
>  hw/virtio/vhost-sw-lm-ring.h      |  39 +++
>  include/hw/virtio/vhost.h         |   5 +
>  include/hw/virtio/virtio-access.h |   8 +-
>  include/hw/virtio/virtio.h        |   4 +
>  hw/net/virtio-net.c               |  39 ++-
>  hw/virtio/vhost-backend.c         |  29 ++
>  hw/virtio/vhost-sw-lm-ring.c      | 268 +++++++++++++++++++
>  hw/virtio/vhost.c                 | 431 +++++++++++++++++++++++++-----
>  hw/virtio/virtio.c                |  18 +-
>  hw/virtio/meson.build             |   2 +-
>  10 files changed, 758 insertions(+), 85 deletions(-)
>  create mode 100644 hw/virtio/vhost-sw-lm-ring.h
>  create mode 100644 hw/virtio/vhost-sw-lm-ring.c
>
> --
> 2.18.4
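A minimal sketch of the notification forwarding that patches 7 and 9 provide,
just to make the data flow concrete: qemu owns both eventfds and relays kicks
and calls in software. The ShadowVirtQueue type and every name below are
illustrative assumptions, not the series' actual code.

#include "qemu/osdep.h"
#include "qemu/event_notifier.h"

/* Illustrative state only; the series' real structure differs. */
typedef struct ShadowVirtQueue {
    EventNotifier guest_kick; /* guest -> qemu (the ioeventfd the guest kicks) */
    EventNotifier hdev_kick;  /* qemu -> vhost device kick fd */
    EventNotifier hdev_call;  /* vhost device -> qemu call fd */
    EventNotifier guest_call; /* qemu -> guest interrupt */
} ShadowVirtQueue;

/* Guest kicked the virtqueue: relay the notification to the device. */
static void shadow_vq_handle_guest_kick(EventNotifier *n)
{
    ShadowVirtQueue *svq = container_of(n, ShadowVirtQueue, guest_kick);

    if (!event_notifier_test_and_clear(n)) {
        return;
    }
    event_notifier_set(&svq->hdev_kick);
}

/* Device used some buffers: relay the interrupt to the guest. */
static void shadow_vq_handle_device_call(EventNotifier *n)
{
    ShadowVirtQueue *svq = container_of(n, ShadowVirtQueue, hdev_call);

    if (!event_notifier_test_and_clear(n)) {
        return;
    }
    event_notifier_set(&svq->guest_call);
}

The descriptor relay added by the later patches sits between these two
handlers: buffers are translated on the kick path and written back (and
marked dirty) on the call path.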
Patchew URL: https://patchew.org/QEMU/20201120185105.279030-1-eperezma@redhat.com/ Hi, This series seems to have some coding style problems. See output below for more information: Type: series Message-id: 20201120185105.279030-1-eperezma@redhat.com Subject: [RFC PATCH 00/27] vDPA software assisted live migration === TEST SCRIPT BEGIN === #!/bin/bash git rev-parse base > /dev/null || exit 0 git config --local diff.renamelimit 0 git config --local diff.renames True git config --local diff.algorithm histogram ./scripts/checkpatch.pl --mailback base.. === TEST SCRIPT END === Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384 From https://github.com/patchew-project/qemu - [tag update] patchew/20201117173635.29101-1-alex.bennee@linaro.org -> patchew/20201117173635.29101-1-alex.bennee@linaro.org * [new tag] patchew/20201120185105.279030-1-eperezma@redhat.com -> patchew/20201120185105.279030-1-eperezma@redhat.com Switched to a new branch 'test' af2fe22 vhost: forbid vhost devices logging 405925c vhost: Add vhost_hdev_can_sw_lm 7f2955b vhost: Do not commit vhost used idx on vhost_virtqueue_stop b68d3d5 vhost: iommu changes 74f282a vhost: unmap qemu's shadow virtqueues on sw live migration c999e86 vhost: Add vhost_virtqueue_memory_map 6e5f219 vhost: Add vhost_virtqueue_memory_unmap d5054c8 vhost: Return used buffers 806db46 vhost: add vhost_vring_get_buf_rcu b6b8168 vhost: add vhost_vring_poll_rcu ca44882 vhost: add vhost_vring_set_notification_rcu 3183f62 virtio: Expose virtqueue_alloc_element 4ead0ac vhost: Do not invalidate signalled used ceb76a4 virtio: Remove virtio_queue_get_used_notify_split 6aacdfe vhost: Send buffers to device 9fcc98d virtio: Add virtio_queue_full 41da0f8 virtio: const-ify all virtio_tswap* functions a3c92f1 vhost: Allocate shadow vring b40b3f7 vhost: Route host->guest notification through qemu 92ea117 vhost: Add a flag for software assisted Live Migration ace8c10 vhost: Route guest->host notification through qemu 6be11a6 virtio: Add virtio_queue_get_used_notify_split 47d4485 vhost: Add hdev->dev.sw_lm_vq_handler 7dbc8e5 vhost: add vhost_kernel_set_vring_enable 4e51c5c vhost: Move log resize/put to vhost_dev_set_log b964abc vhost: Add device callback in vhost_migration_log 264504e vhost: Add vhost_dev_can_log === OUTPUT BEGIN === 1/27 Checking commit 264504ee0018 (vhost: Add vhost_dev_can_log) 2/27 Checking commit b964abc315bd (vhost: Add device callback in vhost_migration_log) 3/27 Checking commit 4e51c5cec56d (vhost: Move log resize/put to vhost_dev_set_log) 4/27 Checking commit 7dbc8e5c64c3 (vhost: add vhost_kernel_set_vring_enable) 5/27 Checking commit 47d4485458d1 (vhost: Add hdev->dev.sw_lm_vq_handler) 6/27 Checking commit 6be11a63e3b4 (virtio: Add virtio_queue_get_used_notify_split) 7/27 Checking commit ace8c1034f83 (vhost: Route guest->host notification through qemu) WARNING: added, moved or deleted file(s), does MAINTAINERS need updating? #24: new file mode 100644 total: 0 errors, 1 warnings, 245 lines checked Patch 7/27 has style problems, please review. If any of these errors are false positives report them to the maintainer, see CHECKPATCH in MAINTAINERS. 
8/27 Checking commit 92ea11700e32 (vhost: Add a flag for software assisted Live Migration) WARNING: Block comments use a leading /* on a separate line #46: FILE: hw/virtio/vhost.c:1581: + /* We've been called after migration is completed, so no need to WARNING: Block comments use * on subsequent lines #47: FILE: hw/virtio/vhost.c:1582: + /* We've been called after migration is completed, so no need to + disable it again total: 0 errors, 2 warnings, 45 lines checked Patch 8/27 has style problems, please review. If any of these errors are false positives report them to the maintainer, see CHECKPATCH in MAINTAINERS. 9/27 Checking commit b40b3f79355c (vhost: Route host->guest notification through qemu) 10/27 Checking commit a3c92f15b554 (vhost: Allocate shadow vring) 11/27 Checking commit 41da0f8e02b2 (virtio: const-ify all virtio_tswap* functions) 12/27 Checking commit 9fcc98da9cd9 (virtio: Add virtio_queue_full) 13/27 Checking commit 6aacdfe0e0ba (vhost: Send buffers to device) ERROR: memory barrier without comment #50: FILE: hw/virtio/vhost-sw-lm-ring.c:45: + smp_rmb(); WARNING: Block comments use a leading /* on a separate line #98: FILE: hw/virtio/vhost-sw-lm-ring.c:93: +/* virtqueue_add: WARNING: Block comments use a leading /* on a separate line #125: FILE: hw/virtio/vhost-sw-lm-ring.c:120: + /* Put entry in available array (but don't update avail->idx until they WARNING: Block comments use a trailing */ on a separate line #126: FILE: hw/virtio/vhost-sw-lm-ring.c:121: + * do sync). */ ERROR: g_free(NULL) is safe this check is probably not required #147: FILE: hw/virtio/vhost-sw-lm-ring.c:142: + if (vq->ring_id_maps[host_head]) { + g_free(vq->ring_id_maps[host_head]); ERROR: braces {} are necessary for all arms of this statement #185: FILE: hw/virtio/vhost-sw-lm-ring.c:181: + for (i = 0; i < num - 1; i++) [...] ERROR: code indent should never use tabs #207: FILE: hw/virtio/vhost-sw-lm-ring.h:23: +^I struct vhost_vring_addr *addr);$ ERROR: space required before the open parenthesis '(' #247: FILE: hw/virtio/vhost.c:986: + } while(!virtio_queue_empty(vq)); total: 5 errors, 3 warnings, 275 lines checked Patch 13/27 has style problems, please review. If any of these errors are false positives report them to the maintainer, see CHECKPATCH in MAINTAINERS. 14/27 Checking commit ceb76a4401b8 (virtio: Remove virtio_queue_get_used_notify_split) 15/27 Checking commit 4ead0ac8457f (vhost: Do not invalidate signalled used) 16/27 Checking commit 3183f62db3dc (virtio: Expose virtqueue_alloc_element) 17/27 Checking commit ca44882af152 (vhost: add vhost_vring_set_notification_rcu) ERROR: memory barrier without comment #45: FILE: hw/virtio/vhost-sw-lm-ring.c:83: + smp_mb(); total: 1 errors, 0 warnings, 45 lines checked Patch 17/27 has style problems, please review. If any of these errors are false positives report them to the maintainer, see CHECKPATCH in MAINTAINERS. 18/27 Checking commit b6b8168b9fe7 (vhost: add vhost_vring_poll_rcu) ERROR: memory barrier without comment #37: FILE: hw/virtio/vhost-sw-lm-ring.c:98: + smp_rmb(); total: 1 errors, 0 warnings, 38 lines checked Patch 18/27 has style problems, please review. If any of these errors are false positives report them to the maintainer, see CHECKPATCH in MAINTAINERS. 
19/27 Checking commit 806db46e9194 (vhost: add vhost_vring_get_buf_rcu) 20/27 Checking commit d5054c8556bf (vhost: Return used buffers) 21/27 Checking commit 6e5f2192254a (vhost: Add vhost_virtqueue_memory_unmap) 22/27 Checking commit c999e86cf7f0 (vhost: Add vhost_virtqueue_memory_map) 23/27 Checking commit 74f282a80019 (vhost: unmap qemu's shadow virtqueues on sw live migration) 24/27 Checking commit b68d3d5fb839 (vhost: iommu changes) 25/27 Checking commit 7f2955b8e788 (vhost: Do not commit vhost used idx on vhost_virtqueue_stop) WARNING: Block comments use a leading /* on a separate line #40: FILE: hw/virtio/vhost.c:1442: + /* Connection to the backend is unusable, so let's sync internal total: 0 errors, 1 warnings, 27 lines checked Patch 25/27 has style problems, please review. If any of these errors are false positives report them to the maintainer, see CHECKPATCH in MAINTAINERS. 26/27 Checking commit 405925c6b54a (vhost: Add vhost_hdev_can_sw_lm) 27/27 Checking commit af2fe2219d39 (vhost: forbid vhost devices logging) === OUTPUT END === Test command exited with code: 1 The full log is available at http://patchew.org/logs/20201120185105.279030-1-eperezma@redhat.com/testing.checkpatch/?type=message. --- Email generated automatically by Patchew [https://patchew.org/]. Please send your feedback to patchew-devel@redhat.com
On 2020/11/21 上午2:50, Eugenio Pérez wrote: > This series enable vDPA software assisted live migration for vhost-net > devices. This is a new method of vhost devices migration: Instead of > relay on vDPA device's dirty logging capability, SW assisted LM > intercepts dataplane, forwarding the descriptors between VM and device. > > In this migration mode, qemu offers a new vring to the device to > read and write into, and disable vhost notifiers, processing guest and > vhost notifications in qemu. On used buffer relay, qemu will mark the > dirty memory as with plain virtio-net devices. This way, devices does > not need to have dirty page logging capability. > > This series is a POC doing SW LM for vhost-net devices, which already > have dirty page logging capabilities. None of the changes have actual > effect with current devices until last two patches (26 and 27) are > applied, but they can be rebased on top of any other. These checks the > device to meet all requirements, and disable vhost-net devices logging > so migration goes through SW LM. This last patch is not meant to be > applied in the final revision, it is in the series just for testing > purposes. > > For use SW assisted LM these vhost-net devices need to be instantiated: > * With IOMMU (iommu_platform=on,ats=on) > * Without event_idx (event_idx=off) So a question is at what level do we want to implement qemu assisted live migration. To me it could be done at two levels: 1) generic vhost level which makes it work for both vhost-net/vhost-user and vhost-vDPA 2) a specific type of vhost To me, having a generic one looks better but it would be much more complicated. So what I read from this series is it was a vhost kernel specific software assisted live migration which is a good start. Actually it may even have real use case, e.g it can save dirty bitmaps for guest with large memory. But we need to address the above limitations first. So I would like to know what's the reason for mandating iommu platform and ats? And I think we need to fix case of event idx support. > > Just the notification forwarding (with no descriptor relay) can be > achieved with patches 7 and 9, and starting migration. Partial applies > between 13 and 24 will not work while migrating on source, and patch > 25 is needed for the destination to resume network activity. > > It is based on the ideas of DPDK SW assisted LM, in the series of Actually we're better than that since there's no need the trick like hardcoded IOVA for mediated(shadow) virtqueue. > DPDK's https://patchwork.dpdk.org/cover/48370/ . I notice that you do GPA->VA translations and try to establish a VA->VA (use VA as IOVA) mapping via device IOTLB. This shortcut should work for vhost-kernel/user but not vhost-vDPA. The reason is that there's no guarantee that the whole 64bit address range could be used as IOVA. One example is that for hardware IOMMU like intel, it usually has 47 or 52 bits of address width. So we probably need an IOVA allocator that can make sure the IOVA is not overlapped and fit for [1]. We can probably build the IOVA for guest VA via memory listeners. Then we have 1) IOVA for GPA 2) IOVA for shadow VQ And advertise IOVA to VA mapping to vhost. [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=1b48dc03e575a872404f33b04cd237953c5d7498 > > Comments are welcome. > > Thanks! 
> > Eugenio Pérez (27): > vhost: Add vhost_dev_can_log > vhost: Add device callback in vhost_migration_log > vhost: Move log resize/put to vhost_dev_set_log > vhost: add vhost_kernel_set_vring_enable > vhost: Add hdev->dev.sw_lm_vq_handler > virtio: Add virtio_queue_get_used_notify_split > vhost: Route guest->host notification through qemu > vhost: Add a flag for software assisted Live Migration > vhost: Route host->guest notification through qemu > vhost: Allocate shadow vring > virtio: const-ify all virtio_tswap* functions > virtio: Add virtio_queue_full > vhost: Send buffers to device > virtio: Remove virtio_queue_get_used_notify_split > vhost: Do not invalidate signalled used > virtio: Expose virtqueue_alloc_element > vhost: add vhost_vring_set_notification_rcu > vhost: add vhost_vring_poll_rcu > vhost: add vhost_vring_get_buf_rcu > vhost: Return used buffers > vhost: Add vhost_virtqueue_memory_unmap > vhost: Add vhost_virtqueue_memory_map > vhost: unmap qemu's shadow virtqueues on sw live migration > vhost: iommu changes > vhost: Do not commit vhost used idx on vhost_virtqueue_stop > vhost: Add vhost_hdev_can_sw_lm > vhost: forbid vhost devices logging > > hw/virtio/vhost-sw-lm-ring.h | 39 +++ > include/hw/virtio/vhost.h | 5 + > include/hw/virtio/virtio-access.h | 8 +- > include/hw/virtio/virtio.h | 4 + > hw/net/virtio-net.c | 39 ++- > hw/virtio/vhost-backend.c | 29 ++ > hw/virtio/vhost-sw-lm-ring.c | 268 +++++++++++++++++++ > hw/virtio/vhost.c | 431 +++++++++++++++++++++++++----- > hw/virtio/virtio.c | 18 +- > hw/virtio/meson.build | 2 +- > 10 files changed, 758 insertions(+), 85 deletions(-) > create mode 100644 hw/virtio/vhost-sw-lm-ring.h > create mode 100644 hw/virtio/vhost-sw-lm-ring.c So this looks like a pretty huge patchset which I'm trying to think of ways to split. An idea is to do this is two steps 1) implement a shadow virtqueue mode for vhost first (w/o live migration). Then we can test descriptors relay, IOVA allocating, etc. 2) add live migration support on top And it looks to me it's better to split the shadow virtqueue (virtio driver part) into an independent file. And use generic name (w/o "shadow") in order to be reused by other use cases as well. Thoughts?
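Jason's IOVA allocator point can be illustrated with a toy bump allocator
that only hands out non-overlapping ranges below the device's usable address
width. This is only a sketch of the requirement, not a proposed data
structure; a real allocator would also need freeing and hole reuse.

#include "qemu/osdep.h"

/* Toy allocator: names and the bump-only policy are illustrative. */
typedef struct IOVARange {
    uint64_t next;   /* first unused IOVA */
    uint64_t limit;  /* one past the last usable IOVA, e.g. 1ULL << 47 */
} IOVARange;

static bool iova_alloc(IOVARange *r, uint64_t size, uint64_t *iova)
{
    uint64_t base = QEMU_ALIGN_UP(r->next, 4096);

    /* Reject zero-sized requests, overflow and out-of-range results. */
    if (size == 0 || base + size < base || base + size > r->limit) {
        return false;
    }
    *iova = base;
    r->next = base + size;
    return true;
}

Guest memory sections found by the memory listener and the shadow virtqueue
rings would both draw from the same allocator, so the two mappings Jason
lists (IOVA for GPA and IOVA for the shadow VQ) can never overlap.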
On Wed, Nov 25, 2020 at 8:09 AM Jason Wang <jasowang@redhat.com> wrote: > > > On 2020/11/21 上午2:50, Eugenio Pérez wrote: > > This series enable vDPA software assisted live migration for vhost-net > > devices. This is a new method of vhost devices migration: Instead of > > relay on vDPA device's dirty logging capability, SW assisted LM > > intercepts dataplane, forwarding the descriptors between VM and device. > > > > In this migration mode, qemu offers a new vring to the device to > > read and write into, and disable vhost notifiers, processing guest and > > vhost notifications in qemu. On used buffer relay, qemu will mark the > > dirty memory as with plain virtio-net devices. This way, devices does > > not need to have dirty page logging capability. > > > > This series is a POC doing SW LM for vhost-net devices, which already > > have dirty page logging capabilities. None of the changes have actual > > effect with current devices until last two patches (26 and 27) are > > applied, but they can be rebased on top of any other. These checks the > > device to meet all requirements, and disable vhost-net devices logging > > so migration goes through SW LM. This last patch is not meant to be > > applied in the final revision, it is in the series just for testing > > purposes. > > > > For use SW assisted LM these vhost-net devices need to be instantiated: > > * With IOMMU (iommu_platform=on,ats=on) > > * Without event_idx (event_idx=off) > > > So a question is at what level do we want to implement qemu assisted > live migration. To me it could be done at two levels: > > 1) generic vhost level which makes it work for both vhost-net/vhost-user > and vhost-vDPA > 2) a specific type of vhost > > To me, having a generic one looks better but it would be much more > complicated. So what I read from this series is it was a vhost kernel > specific software assisted live migration which is a good start. > Actually it may even have real use case, e.g it can save dirty bitmaps > for guest with large memory. But we need to address the above > limitations first. > > So I would like to know what's the reason for mandating iommu platform > and ats? And I think we need to fix case of event idx support. > There is no specific reason for mandating iommu & ats, it was just started that way. I will extend the patch to support those cases too. > > > > > Just the notification forwarding (with no descriptor relay) can be > > achieved with patches 7 and 9, and starting migration. Partial applies > > between 13 and 24 will not work while migrating on source, and patch > > 25 is needed for the destination to resume network activity. > > > > It is based on the ideas of DPDK SW assisted LM, in the series of > > > Actually we're better than that since there's no need the trick like > hardcoded IOVA for mediated(shadow) virtqueue. > > > > DPDK's https://patchwork.dpdk.org/cover/48370/ . > > > I notice that you do GPA->VA translations and try to establish a VA->VA > (use VA as IOVA) mapping via device IOTLB. This shortcut should work for > vhost-kernel/user but not vhost-vDPA. The reason is that there's no > guarantee that the whole 64bit address range could be used as IOVA. One > example is that for hardware IOMMU like intel, it usually has 47 or 52 > bits of address width. > > So we probably need an IOVA allocator that can make sure the IOVA is not > overlapped and fit for [1]. We can probably build the IOVA for guest VA > via memory listeners. 
Then we have > > 1) IOVA for GPA > 2) IOVA for shadow VQ > > And advertise IOVA to VA mapping to vhost. > > [1] > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=1b48dc03e575a872404f33b04cd237953c5d7498 > Got it, will control it too. Maybe for vhost-net we could directly send iotlb miss for [0,~0ULL]. > > > > > Comments are welcome. > > > > Thanks! > > > > Eugenio Pérez (27): > > vhost: Add vhost_dev_can_log > > vhost: Add device callback in vhost_migration_log > > vhost: Move log resize/put to vhost_dev_set_log > > vhost: add vhost_kernel_set_vring_enable > > vhost: Add hdev->dev.sw_lm_vq_handler > > virtio: Add virtio_queue_get_used_notify_split > > vhost: Route guest->host notification through qemu > > vhost: Add a flag for software assisted Live Migration > > vhost: Route host->guest notification through qemu > > vhost: Allocate shadow vring > > virtio: const-ify all virtio_tswap* functions > > virtio: Add virtio_queue_full > > vhost: Send buffers to device > > virtio: Remove virtio_queue_get_used_notify_split > > vhost: Do not invalidate signalled used > > virtio: Expose virtqueue_alloc_element > > vhost: add vhost_vring_set_notification_rcu > > vhost: add vhost_vring_poll_rcu > > vhost: add vhost_vring_get_buf_rcu > > vhost: Return used buffers > > vhost: Add vhost_virtqueue_memory_unmap > > vhost: Add vhost_virtqueue_memory_map > > vhost: unmap qemu's shadow virtqueues on sw live migration > > vhost: iommu changes > > vhost: Do not commit vhost used idx on vhost_virtqueue_stop > > vhost: Add vhost_hdev_can_sw_lm > > vhost: forbid vhost devices logging > > > > hw/virtio/vhost-sw-lm-ring.h | 39 +++ > > include/hw/virtio/vhost.h | 5 + > > include/hw/virtio/virtio-access.h | 8 +- > > include/hw/virtio/virtio.h | 4 + > > hw/net/virtio-net.c | 39 ++- > > hw/virtio/vhost-backend.c | 29 ++ > > hw/virtio/vhost-sw-lm-ring.c | 268 +++++++++++++++++++ > > hw/virtio/vhost.c | 431 +++++++++++++++++++++++++----- > > hw/virtio/virtio.c | 18 +- > > hw/virtio/meson.build | 2 +- > > 10 files changed, 758 insertions(+), 85 deletions(-) > > create mode 100644 hw/virtio/vhost-sw-lm-ring.h > > create mode 100644 hw/virtio/vhost-sw-lm-ring.c > > > So this looks like a pretty huge patchset which I'm trying to think of > ways to split. An idea is to do this is two steps > > 1) implement a shadow virtqueue mode for vhost first (w/o live > migration). Then we can test descriptors relay, IOVA allocating, etc. How would that mode be activated if it is not tied to live migration? New backend/command line switch? Maybe it is better to also start with no iommu & ats support and add it on top. > 2) add live migration support on top > > And it looks to me it's better to split the shadow virtqueue (virtio > driver part) into an independent file. And use generic name (w/o > "shadow") in order to be reused by other use cases as well. > I think the same. Thanks! > Thoughts? >
On Wed, Nov 25, 2020 at 1:03 PM Eugenio Perez Martin <eperezma@redhat.com> wrote: > > On Wed, Nov 25, 2020 at 8:09 AM Jason Wang <jasowang@redhat.com> wrote: > > > > > > On 2020/11/21 上午2:50, Eugenio Pérez wrote: > > > This series enable vDPA software assisted live migration for vhost-net > > > devices. This is a new method of vhost devices migration: Instead of > > > relay on vDPA device's dirty logging capability, SW assisted LM > > > intercepts dataplane, forwarding the descriptors between VM and device. > > > > > > In this migration mode, qemu offers a new vring to the device to > > > read and write into, and disable vhost notifiers, processing guest and > > > vhost notifications in qemu. On used buffer relay, qemu will mark the > > > dirty memory as with plain virtio-net devices. This way, devices does > > > not need to have dirty page logging capability. > > > > > > This series is a POC doing SW LM for vhost-net devices, which already > > > have dirty page logging capabilities. None of the changes have actual > > > effect with current devices until last two patches (26 and 27) are > > > applied, but they can be rebased on top of any other. These checks the > > > device to meet all requirements, and disable vhost-net devices logging > > > so migration goes through SW LM. This last patch is not meant to be > > > applied in the final revision, it is in the series just for testing > > > purposes. > > > > > > For use SW assisted LM these vhost-net devices need to be instantiated: > > > * With IOMMU (iommu_platform=on,ats=on) > > > * Without event_idx (event_idx=off) > > > > > > So a question is at what level do we want to implement qemu assisted > > live migration. To me it could be done at two levels: > > > > 1) generic vhost level which makes it work for both vhost-net/vhost-user > > and vhost-vDPA > > 2) a specific type of vhost > > > > To me, having a generic one looks better but it would be much more > > complicated. So what I read from this series is it was a vhost kernel > > specific software assisted live migration which is a good start. > > Actually it may even have real use case, e.g it can save dirty bitmaps > > for guest with large memory. But we need to address the above > > limitations first. > > > > So I would like to know what's the reason for mandating iommu platform > > and ats? And I think we need to fix case of event idx support. > > > > There is no specific reason for mandating iommu & ats, it was just > started that way. > > I will extend the patch to support those cases too. > > > > > > > > > Just the notification forwarding (with no descriptor relay) can be > > > achieved with patches 7 and 9, and starting migration. Partial applies > > > between 13 and 24 will not work while migrating on source, and patch > > > 25 is needed for the destination to resume network activity. > > > > > > It is based on the ideas of DPDK SW assisted LM, in the series of > > > > > > Actually we're better than that since there's no need the trick like > > hardcoded IOVA for mediated(shadow) virtqueue. > > > > > > > DPDK's https://patchwork.dpdk.org/cover/48370/ . > > > > > > I notice that you do GPA->VA translations and try to establish a VA->VA > > (use VA as IOVA) mapping via device IOTLB. This shortcut should work for > > vhost-kernel/user but not vhost-vDPA. The reason is that there's no > > guarantee that the whole 64bit address range could be used as IOVA. One > > example is that for hardware IOMMU like intel, it usually has 47 or 52 > > bits of address width. 
> > > > So we probably need an IOVA allocator that can make sure the IOVA is not > > overlapped and fit for [1]. We can probably build the IOVA for guest VA > > via memory listeners. Then we have > > > > 1) IOVA for GPA > > 2) IOVA for shadow VQ > > > > And advertise IOVA to VA mapping to vhost. > > > > [1] > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=1b48dc03e575a872404f33b04cd237953c5d7498 > > > > Got it, will control it too. > > Maybe for vhost-net we could directly send iotlb miss for [0,~0ULL]. > Sorry, this was intended to be a question :). Given a vhost-* device IOVA usable range, is ok to expose all of qemu overlapping VA to it? With the iotlb miss, for example. Would it be acceptable from a security point of view? The device would have access to all qemu VA, but on the other hand devices like vhost-net already have it. > > > > > > > > Comments are welcome. > > > > > > Thanks! > > > > > > Eugenio Pérez (27): > > > vhost: Add vhost_dev_can_log > > > vhost: Add device callback in vhost_migration_log > > > vhost: Move log resize/put to vhost_dev_set_log > > > vhost: add vhost_kernel_set_vring_enable > > > vhost: Add hdev->dev.sw_lm_vq_handler > > > virtio: Add virtio_queue_get_used_notify_split > > > vhost: Route guest->host notification through qemu > > > vhost: Add a flag for software assisted Live Migration > > > vhost: Route host->guest notification through qemu > > > vhost: Allocate shadow vring > > > virtio: const-ify all virtio_tswap* functions > > > virtio: Add virtio_queue_full > > > vhost: Send buffers to device > > > virtio: Remove virtio_queue_get_used_notify_split > > > vhost: Do not invalidate signalled used > > > virtio: Expose virtqueue_alloc_element > > > vhost: add vhost_vring_set_notification_rcu > > > vhost: add vhost_vring_poll_rcu > > > vhost: add vhost_vring_get_buf_rcu > > > vhost: Return used buffers > > > vhost: Add vhost_virtqueue_memory_unmap > > > vhost: Add vhost_virtqueue_memory_map > > > vhost: unmap qemu's shadow virtqueues on sw live migration > > > vhost: iommu changes > > > vhost: Do not commit vhost used idx on vhost_virtqueue_stop > > > vhost: Add vhost_hdev_can_sw_lm > > > vhost: forbid vhost devices logging > > > > > > hw/virtio/vhost-sw-lm-ring.h | 39 +++ > > > include/hw/virtio/vhost.h | 5 + > > > include/hw/virtio/virtio-access.h | 8 +- > > > include/hw/virtio/virtio.h | 4 + > > > hw/net/virtio-net.c | 39 ++- > > > hw/virtio/vhost-backend.c | 29 ++ > > > hw/virtio/vhost-sw-lm-ring.c | 268 +++++++++++++++++++ > > > hw/virtio/vhost.c | 431 +++++++++++++++++++++++++----- > > > hw/virtio/virtio.c | 18 +- > > > hw/virtio/meson.build | 2 +- > > > 10 files changed, 758 insertions(+), 85 deletions(-) > > > create mode 100644 hw/virtio/vhost-sw-lm-ring.h > > > create mode 100644 hw/virtio/vhost-sw-lm-ring.c > > > > > > So this looks like a pretty huge patchset which I'm trying to think of > > ways to split. An idea is to do this is two steps > > > > 1) implement a shadow virtqueue mode for vhost first (w/o live > > migration). Then we can test descriptors relay, IOVA allocating, etc. > > How would that mode be activated if it is not tied to live migration? > New backend/command line switch? > > Maybe it is better to also start with no iommu & ats support and add it on top. > > > 2) add live migration support on top > > > > And it looks to me it's better to split the shadow virtqueue (virtio > > driver part) into an independent file. 
And use generic name (w/o > > "shadow") in order to be reused by other use cases as well. > > > > I think the same. > > Thanks! > > > Thoughts? > >
On 2020/11/25 下午8:03, Eugenio Perez Martin wrote: > On Wed, Nov 25, 2020 at 8:09 AM Jason Wang <jasowang@redhat.com> wrote: >> >> On 2020/11/21 上午2:50, Eugenio Pérez wrote: >>> This series enable vDPA software assisted live migration for vhost-net >>> devices. This is a new method of vhost devices migration: Instead of >>> relay on vDPA device's dirty logging capability, SW assisted LM >>> intercepts dataplane, forwarding the descriptors between VM and device. >>> >>> In this migration mode, qemu offers a new vring to the device to >>> read and write into, and disable vhost notifiers, processing guest and >>> vhost notifications in qemu. On used buffer relay, qemu will mark the >>> dirty memory as with plain virtio-net devices. This way, devices does >>> not need to have dirty page logging capability. >>> >>> This series is a POC doing SW LM for vhost-net devices, which already >>> have dirty page logging capabilities. None of the changes have actual >>> effect with current devices until last two patches (26 and 27) are >>> applied, but they can be rebased on top of any other. These checks the >>> device to meet all requirements, and disable vhost-net devices logging >>> so migration goes through SW LM. This last patch is not meant to be >>> applied in the final revision, it is in the series just for testing >>> purposes. >>> >>> For use SW assisted LM these vhost-net devices need to be instantiated: >>> * With IOMMU (iommu_platform=on,ats=on) >>> * Without event_idx (event_idx=off) >> >> So a question is at what level do we want to implement qemu assisted >> live migration. To me it could be done at two levels: >> >> 1) generic vhost level which makes it work for both vhost-net/vhost-user >> and vhost-vDPA >> 2) a specific type of vhost >> >> To me, having a generic one looks better but it would be much more >> complicated. So what I read from this series is it was a vhost kernel >> specific software assisted live migration which is a good start. >> Actually it may even have real use case, e.g it can save dirty bitmaps >> for guest with large memory. But we need to address the above >> limitations first. >> >> So I would like to know what's the reason for mandating iommu platform >> and ats? And I think we need to fix case of event idx support. >> > There is no specific reason for mandating iommu & ats, it was just > started that way. > > I will extend the patch to support those cases too. > >>> Just the notification forwarding (with no descriptor relay) can be >>> achieved with patches 7 and 9, and starting migration. Partial applies >>> between 13 and 24 will not work while migrating on source, and patch >>> 25 is needed for the destination to resume network activity. >>> >>> It is based on the ideas of DPDK SW assisted LM, in the series of >> >> Actually we're better than that since there's no need the trick like >> hardcoded IOVA for mediated(shadow) virtqueue. >> >> >>> DPDK's https://patchwork.dpdk.org/cover/48370/ . >> >> I notice that you do GPA->VA translations and try to establish a VA->VA >> (use VA as IOVA) mapping via device IOTLB. This shortcut should work for >> vhost-kernel/user but not vhost-vDPA. The reason is that there's no >> guarantee that the whole 64bit address range could be used as IOVA. One >> example is that for hardware IOMMU like intel, it usually has 47 or 52 >> bits of address width. >> >> So we probably need an IOVA allocator that can make sure the IOVA is not >> overlapped and fit for [1]. 
We can probably build the IOVA for guest VA >> via memory listeners. Then we have >> >> 1) IOVA for GPA >> 2) IOVA for shadow VQ >> >> And advertise IOVA to VA mapping to vhost. >> >> [1] >> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=1b48dc03e575a872404f33b04cd237953c5d7498 >> > Got it, will control it too. > > Maybe for vhost-net we could directly send iotlb miss for [0,~0ULL]. It works but it means vhost-net needs some special care. To me a generic IOVA allocator looks better. > >>> Comments are welcome. >>> >>> Thanks! >>> >>> Eugenio Pérez (27): >>> vhost: Add vhost_dev_can_log >>> vhost: Add device callback in vhost_migration_log >>> vhost: Move log resize/put to vhost_dev_set_log >>> vhost: add vhost_kernel_set_vring_enable >>> vhost: Add hdev->dev.sw_lm_vq_handler >>> virtio: Add virtio_queue_get_used_notify_split >>> vhost: Route guest->host notification through qemu >>> vhost: Add a flag for software assisted Live Migration >>> vhost: Route host->guest notification through qemu >>> vhost: Allocate shadow vring >>> virtio: const-ify all virtio_tswap* functions >>> virtio: Add virtio_queue_full >>> vhost: Send buffers to device >>> virtio: Remove virtio_queue_get_used_notify_split >>> vhost: Do not invalidate signalled used >>> virtio: Expose virtqueue_alloc_element >>> vhost: add vhost_vring_set_notification_rcu >>> vhost: add vhost_vring_poll_rcu >>> vhost: add vhost_vring_get_buf_rcu >>> vhost: Return used buffers >>> vhost: Add vhost_virtqueue_memory_unmap >>> vhost: Add vhost_virtqueue_memory_map >>> vhost: unmap qemu's shadow virtqueues on sw live migration >>> vhost: iommu changes >>> vhost: Do not commit vhost used idx on vhost_virtqueue_stop >>> vhost: Add vhost_hdev_can_sw_lm >>> vhost: forbid vhost devices logging >>> >>> hw/virtio/vhost-sw-lm-ring.h | 39 +++ >>> include/hw/virtio/vhost.h | 5 + >>> include/hw/virtio/virtio-access.h | 8 +- >>> include/hw/virtio/virtio.h | 4 + >>> hw/net/virtio-net.c | 39 ++- >>> hw/virtio/vhost-backend.c | 29 ++ >>> hw/virtio/vhost-sw-lm-ring.c | 268 +++++++++++++++++++ >>> hw/virtio/vhost.c | 431 +++++++++++++++++++++++++----- >>> hw/virtio/virtio.c | 18 +- >>> hw/virtio/meson.build | 2 +- >>> 10 files changed, 758 insertions(+), 85 deletions(-) >>> create mode 100644 hw/virtio/vhost-sw-lm-ring.h >>> create mode 100644 hw/virtio/vhost-sw-lm-ring.c >> >> So this looks like a pretty huge patchset which I'm trying to think of >> ways to split. An idea is to do this is two steps >> >> 1) implement a shadow virtqueue mode for vhost first (w/o live >> migration). Then we can test descriptors relay, IOVA allocating, etc. > How would that mode be activated if it is not tied to live migration? > New backend/command line switch? Either a new cli option or even a qmp command can work. > > Maybe it is better to also start with no iommu & ats support and add it on top. Yes. > >> 2) add live migration support on top >> >> And it looks to me it's better to split the shadow virtqueue (virtio >> driver part) into an independent file. And use generic name (w/o >> "shadow") in order to be reused by other use cases as well. >> > I think the same. > > Thanks! > >> Thoughts? >> >
On Fri, Nov 20, 2020 at 07:50:38PM +0100, Eugenio Pérez wrote:
>This series enable vDPA software assisted live migration for vhost-net
>devices. This is a new method of vhost devices migration: Instead of
>relay on vDPA device's dirty logging capability, SW assisted LM
>intercepts dataplane, forwarding the descriptors between VM and device.
>
[...]
>
>Comments are welcome.

Hi Eugenio,
I took a look, and I think the idea of the shadow queue is the right way.
It's very similar to what we had in mind with Stefan for io_uring
passthrough and vdpa-blk.

IIUC, when the migration starts, the notifications from the guest to vhost
are disabled, so QEMU starts to intercept them through the custom_handler
installed in virtio-net (we need to understand how to generalize this). At
this point QEMU starts to use the shadow queues and exposes them to vhost.

The opposite is done for vhost-to-guest notifications, where
vhost_handle_call is installed on the masked_notifier to intercept the
notification.

I hope to give better feedback when I get a complete overview ;-)

Anyway, as Jason suggested, we should split this series, so maybe we can
merge some preparation patches (e.g. 1, 11, 21, 22) regardless of the
other patches.

Thanks,
Stefano
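To make the relay Stefano summarizes concrete, a sketch of the kick path
after interception: qemu pops the available buffers from the guest's
VirtQueue and places them in the vring exposed to vhost. virtqueue_pop()
and VirtQueueElement are existing virtio APIs; ShadowVirtQueue and
shadow_vring_add() are hypothetical placeholders, and overflow handling is
omitted.

#include "qemu/osdep.h"
#include "qemu/event_notifier.h"
#include "hw/virtio/virtio.h"

/* Hypothetical shadow vring interface, for illustration only. */
typedef struct ShadowVirtQueue ShadowVirtQueue;
void shadow_vring_add(ShadowVirtQueue *svq, VirtQueueElement *elem);
EventNotifier *shadow_vring_kick_notifier(ShadowVirtQueue *svq);

/* Forward every available guest buffer to the vring exposed to vhost. */
static void shadow_vq_relay_avail(VirtQueue *vq, ShadowVirtQueue *svq)
{
    VirtQueueElement *elem;

    /* Assume the shadow ring always has room; a real implementation must
     * handle a full ring and resume when the device uses some buffers. */
    while ((elem = virtqueue_pop(vq, sizeof(*elem)))) {
        shadow_vring_add(svq, elem);
    }

    /* One kick per batch through the device's kick fd. */
    event_notifier_set(shadow_vring_kick_notifier(svq));
}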
On Fri, Nov 20, 2020 at 07:50:38PM +0100, Eugenio Pérez wrote:
> This series enable vDPA software assisted live migration for vhost-net
> devices. This is a new method of vhost devices migration: Instead of
> relay on vDPA device's dirty logging capability, SW assisted LM
> intercepts dataplane, forwarding the descriptors between VM and device.

Pros:
+ vhost/vDPA devices don't need to implement dirty memory logging
+ Obsoletes ioctl(VHOST_SET_LOG_BASE) and friends

Cons:
- Not generic, relies on vhost-net-specific ioctls
- Doesn't support VIRTIO Shared Memory Regions
  https://github.com/oasis-tcs/virtio-spec/blob/master/shared-mem.tex
- Performance (see below)

I think performance will be significantly lower when the shadow vq is
enabled. Imagine a vDPA device with hardware vq doorbell registers mapped
into the guest so the guest driver can directly kick the device. When the
shadow vq is enabled a vmexit is needed to write to the shadow vq
ioeventfd, then the host kernel scheduler switches to a QEMU thread to
read the ioeventfd, the descriptors are translated, QEMU writes to the
vhost hdev kick fd, the host kernel scheduler switches to the vhost worker
thread, vhost/vDPA notifies the virtqueue, and finally the vDPA driver
writes to the hardware vq doorbell register. That is a lot of overhead
compared to writing to an exitless MMIO register!

If the shadow vq was implemented in drivers/vhost/ and QEMU used the
existing ioctl(VHOST_SET_LOG_BASE) approach, then the overhead would be
reduced to just one set of ioeventfd/irqfd. In other words, the QEMU dirty
memory logging happens asynchronously and isn't in the dataplane.

In addition, hardware that supports dirty memory logging as well as
software vDPA devices could completely eliminate the shadow vq for even
better performance.

But performance is a question of "is it good enough?". Maybe this approach
is okay and users don't expect good performance while dirty memory logging
is enabled. I just wanted to share the idea of moving the shadow vq into
the kernel in case you like that approach better.

Stefan
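For reference, the in-kernel dirty logging interface Stefan refers to is a
userspace-allocated bitmap handed to vhost. A rough sketch of pointing the
device at it, with a made-up helper name, and error handling and the exact
log sizing rules omitted:

#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/vhost.h>

/*
 * Hand a pre-allocated dirty bitmap to the vhost device.  The kernel then
 * sets one bit per dirtied page while it writes guest memory, so qemu's
 * migration code can read the log outside the dataplane.
 */
static int vhost_set_dirty_log(int vhost_fd, void *log)
{
    uint64_t base = (uint64_t)(uintptr_t)log;

    return ioctl(vhost_fd, VHOST_SET_LOG_BASE, &base);
}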
----- Original Message -----
> On Fri, Nov 20, 2020 at 07:50:38PM +0100, Eugenio Pérez wrote:
> > This series enable vDPA software assisted live migration for vhost-net
> > devices. This is a new method of vhost devices migration: Instead of
> > relay on vDPA device's dirty logging capability, SW assisted LM
> > intercepts dataplane, forwarding the descriptors between VM and device.
>
> Pros:
> + vhost/vDPA devices don't need to implement dirty memory logging
> + Obsoletes ioctl(VHOST_SET_LOG_BASE) and friends
>
> Cons:
> - Not generic, relies on vhost-net-specific ioctls
> - Doesn't support VIRTIO Shared Memory Regions
>   https://github.com/oasis-tcs/virtio-spec/blob/master/shared-mem.tex

I may be missing something, but my understanding is that it's the
responsibility of the device to migrate this part?

> - Performance (see below)
>
> I think performance will be significantly lower when the shadow vq is
> enabled. Imagine a vDPA device with hardware vq doorbell registers
> mapped into the guest so the guest driver can directly kick the device.
> When the shadow vq is enabled a vmexit is needed to write to the shadow
> vq ioeventfd, then the host kernel scheduler switches to a QEMU thread
> to read the ioeventfd, the descriptors are translated, QEMU writes to
> the vhost hdev kick fd, the host kernel scheduler switches to the vhost
> worker thread, vhost/vDPA notifies the virtqueue, and finally the
> vDPA driver writes to the hardware vq doorbell register. That is a lot
> of overhead compared to writing to an exitless MMIO register!

I think it's a balance. E.g. we can poll the virtqueue to get an exitless
doorbell.

> If the shadow vq was implemented in drivers/vhost/ and QEMU used the
> existing ioctl(VHOST_SET_LOG_BASE) approach, then the overhead would be
> reduced to just one set of ioeventfd/irqfd. In other words, the QEMU
> dirty memory logging happens asynchronously and isn't in the dataplane.
>
> In addition, hardware that supports dirty memory logging as well as
> software vDPA devices could completely eliminate the shadow vq for even
> better performance.

Yes, that's our plan. But the interface might require more thought.

E.g. is the bitmap a good approach? To me, reporting dirty pages via a
virtqueue is better since it has a smaller footprint and is
self-throttled. And we need an address space other than the one used by
the guest, for either the bitmap or the virtqueue.

> But performance is a question of "is it good enough?". Maybe this
> approach is okay and users don't expect good performance while dirty
> memory logging is enabled.

Yes, and actually such a slowdown may help the migration converge.

Note that the whole idea is to have a generic solution for all types of
devices. It's good to consider the performance, but for the first stage it
should be sufficient to make it work and then optimize on top.

> I just wanted to share the idea of moving the
> shadow vq into the kernel in case you like that approach better.

My understanding is to keep the kernel as simple as possible and leave the
policies to userspace as much as possible. E.g. it requires us to disable
doorbell mapping and irq offloading, all of which were under the control
of userspace.

Thanks

>
> Stefan
>
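A rough illustration of the footprint argument for reporting dirty pages
through a virtqueue rather than a bitmap: a flat bitmap for a 1 TiB guest
at 4 KiB pages is already 32 MiB that has to live in a separate address
space, while per-range records scale with how much memory is actually
dirtied and are consumed at qemu's own pace (the self-throttling mentioned
above). The record layout below is purely hypothetical, not an existing
VIRTIO or vhost structure.

#include <stdint.h>

/*
 * Hypothetical dirty-range record placed in a dedicated "dirty report"
 * virtqueue by the device (or by the relay code in qemu).  Shown only to
 * contrast with a static bitmap.
 */
struct dirty_range_report {
    uint64_t gpa;   /* guest-physical address of the dirtied range */
    uint64_t len;   /* length of the range in bytes */
};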
On Wed, Dec 09, 2020 at 04:26:50AM -0500, Jason Wang wrote: > ----- Original Message ----- > > On Fri, Nov 20, 2020 at 07:50:38PM +0100, Eugenio Pérez wrote: > > > This series enable vDPA software assisted live migration for vhost-net > > > devices. This is a new method of vhost devices migration: Instead of > > > relay on vDPA device's dirty logging capability, SW assisted LM > > > intercepts dataplane, forwarding the descriptors between VM and device. > > > > Pros: > > + vhost/vDPA devices don't need to implement dirty memory logging > > + Obsoletes ioctl(VHOST_SET_LOG_BASE) and friends > > > > Cons: > > - Not generic, relies on vhost-net-specific ioctls > > - Doesn't support VIRTIO Shared Memory Regions > > https://github.com/oasis-tcs/virtio-spec/blob/master/shared-mem.tex > > I may miss something but my understanding is that it's the > responsiblity of device to migrate this part? Good point. You're right. > > - Performance (see below) > > > > I think performance will be significantly lower when the shadow vq is > > enabled. Imagine a vDPA device with hardware vq doorbell registers > > mapped into the guest so the guest driver can directly kick the device. > > When the shadow vq is enabled a vmexit is needed to write to the shadow > > vq ioeventfd, then the host kernel scheduler switches to a QEMU thread > > to read the ioeventfd, the descriptors are translated, QEMU writes to > > the vhost hdev kick fd, the host kernel scheduler switches to the vhost > > worker thread, vhost/vDPA notifies the virtqueue, and finally the > > vDPA driver writes to the hardware vq doorbell register. That is a lot > > of overhead compared to writing to an exitless MMIO register! > > I think it's a balance. E.g we can poll the virtqueue to have an > exitless doorbell. > > > > > If the shadow vq was implemented in drivers/vhost/ and QEMU used the > > existing ioctl(VHOST_SET_LOG_BASE) approach, then the overhead would be > > reduced to just one set of ioeventfd/irqfd. In other words, the QEMU > > dirty memory logging happens asynchronously and isn't in the dataplane. > > > > In addition, hardware that supports dirty memory logging as well as > > software vDPA devices could completely eliminate the shadow vq for even > > better performance. > > Yes. That's our plan. But the interface might require more thought. > > E.g is the bitmap a good approach? To me reporting dirty pages via > virqueue is better since it get less footprint and is self throttled. > > And we need an address space other than the one used by guest for > either bitmap for virtqueue. > > > > > But performance is a question of "is it good enough?". Maybe this > > approach is okay and users don't expect good performance while dirty > > memory logging is enabled. > > Yes, and actually such slow down may help for the converge of the > migration. > > Note that the whole idea is try to have a generic solution for all > types of devices. It's good to consider the performance but for the > first stage, it should be sufficient to make it work and consider to > optimize on top. Moving the shadow vq to the kernel later would be quite a big change requiring rewriting much of the code. That's why I mentioned this now before a lot of effort is invested in a QEMU implementation. > > I just wanted to share the idea of moving the > > shadow vq into the kernel in case you like that approach better. > > My understanding is to keep kernel as simple as possible and leave the > polices to userspace as much as possible. 
E.g it requires us to > disable doorbell mapping and irq offloading, all of which were under > the control of userspace. If the performance is acceptable with the QEMU approach then I think that's the best place to implement it. It looks high-overhead though so maybe one of the first things to do is to run benchmarks to collect data on how it performs? Stefan
On 2020/12/9 下午11:57, Stefan Hajnoczi wrote: > On Wed, Dec 09, 2020 at 04:26:50AM -0500, Jason Wang wrote: >> ----- Original Message ----- >>> On Fri, Nov 20, 2020 at 07:50:38PM +0100, Eugenio Pérez wrote: >>>> This series enable vDPA software assisted live migration for vhost-net >>>> devices. This is a new method of vhost devices migration: Instead of >>>> relay on vDPA device's dirty logging capability, SW assisted LM >>>> intercepts dataplane, forwarding the descriptors between VM and device. >>> Pros: >>> + vhost/vDPA devices don't need to implement dirty memory logging >>> + Obsoletes ioctl(VHOST_SET_LOG_BASE) and friends >>> >>> Cons: >>> - Not generic, relies on vhost-net-specific ioctls >>> - Doesn't support VIRTIO Shared Memory Regions >>> https://github.com/oasis-tcs/virtio-spec/blob/master/shared-mem.tex >> I may miss something but my understanding is that it's the >> responsiblity of device to migrate this part? > Good point. You're right. > >>> - Performance (see below) >>> >>> I think performance will be significantly lower when the shadow vq is >>> enabled. Imagine a vDPA device with hardware vq doorbell registers >>> mapped into the guest so the guest driver can directly kick the device. >>> When the shadow vq is enabled a vmexit is needed to write to the shadow >>> vq ioeventfd, then the host kernel scheduler switches to a QEMU thread >>> to read the ioeventfd, the descriptors are translated, QEMU writes to >>> the vhost hdev kick fd, the host kernel scheduler switches to the vhost >>> worker thread, vhost/vDPA notifies the virtqueue, and finally the >>> vDPA driver writes to the hardware vq doorbell register. That is a lot >>> of overhead compared to writing to an exitless MMIO register! >> I think it's a balance. E.g we can poll the virtqueue to have an >> exitless doorbell. >> >>> If the shadow vq was implemented in drivers/vhost/ and QEMU used the >>> existing ioctl(VHOST_SET_LOG_BASE) approach, then the overhead would be >>> reduced to just one set of ioeventfd/irqfd. In other words, the QEMU >>> dirty memory logging happens asynchronously and isn't in the dataplane. >>> >>> In addition, hardware that supports dirty memory logging as well as >>> software vDPA devices could completely eliminate the shadow vq for even >>> better performance. >> Yes. That's our plan. But the interface might require more thought. >> >> E.g is the bitmap a good approach? To me reporting dirty pages via >> virqueue is better since it get less footprint and is self throttled. >> >> And we need an address space other than the one used by guest for >> either bitmap for virtqueue. >> >>> But performance is a question of "is it good enough?". Maybe this >>> approach is okay and users don't expect good performance while dirty >>> memory logging is enabled. >> Yes, and actually such slow down may help for the converge of the >> migration. >> >> Note that the whole idea is try to have a generic solution for all >> types of devices. It's good to consider the performance but for the >> first stage, it should be sufficient to make it work and consider to >> optimize on top. > Moving the shadow vq to the kernel later would be quite a big change > requiring rewriting much of the code. That's why I mentioned this now > before a lot of effort is invested in a QEMU implementation. Right. > >>> I just wanted to share the idea of moving the >>> shadow vq into the kernel in case you like that approach better. 
>> My understanding is to keep kernel as simple as possible and leave the >> polices to userspace as much as possible. E.g it requires us to >> disable doorbell mapping and irq offloading, all of which were under >> the control of userspace. > If the performance is acceptable with the QEMU approach then I think > that's the best place to implement it. It looks high-overhead though so > maybe one of the first things to do is to run benchmarks to collect data > on how it performs? Yes, I agree. Thanks > > Stefan