Message ID | 20231215172830.2540987-1-eperezma@redhat.com (mailing list archive)
Series     | Map memory at destination .load_setup in vDPA-net migration
QE tested this series with regression tests; there are no new regression issues.

Tested-by: Lei Yang <leiyang@redhat.com>

On Sat, Dec 16, 2023 at 1:28 AM Eugenio Pérez <eperezma@redhat.com> wrote:
>
> Current memory operations like pinning may take a lot of time at the
> destination. Currently they are done after the source of the migration is
> stopped, and before the workload is resumed at the destination. This is a
> period where neither traffic can flow nor the VM workload can continue
> (downtime).
>
> We can do better, as we know the memory layout of the guest RAM at the
> destination from the moment the migration starts. Moving that operation
> earlier allows QEMU to communicate the maps to the kernel while the workload
> is still running on the source, so Linux can start mapping them.
>
> Also, the migration of the guest memory may finish before the destination
> QEMU maps all the memory. In this case, the rest of the memory will be mapped
> at the same time as before applying this series, when the device is starting.
> So we're only improving with this series.
>
> If the destination has the switchover_ack capability enabled, the destination
> holds the migration until all the memory is mapped.
>
> This needs to be applied on top of [1]. That series performs some code
> reorganization that allows mapping the guest memory without knowing the queue
> layout the guest configures on the device.
>
> This series reduced the downtime in the stop-and-copy phase of the live
> migration from 20s~30s to 5s, with a 128G mem guest and two mlx5_vdpa
> devices, per [2].
>
> Future directions on top of this series may include:
> * Iterative migration of virtio-net devices, as it may reduce downtime per
>   [3]. vhost-vdpa net can apply the configuration through CVQ at the
>   destination while the source is still migrating.
> * Move more things ahead of migration time, like DRIVER_OK.
> * Check that the devices at the destination are valid, and cancel the
>   migration in case they are not.
>
> v1 from RFC v2:
> * Hold on migration if memory has not been mapped in full with switchover_ack.
> * Revert the map if the device is not started.
>
> RFC v2:
> * Delegate the map to another thread so it does not block QMP.
> * Fix not allocating iova_tree if x-svq=on at the destination.
> * Rebased on latest master.
> * More cleanups of the current code, which might be split from this series too.
>
> [1] https://lists.nongnu.org/archive/html/qemu-devel/2023-12/msg01986.html
> [2] https://lists.nongnu.org/archive/html/qemu-devel/2023-12/msg00909.html
> [3] https://lore.kernel.org/qemu-devel/6c8ebb97-d546-3f1c-4cdd-54e23a566f61@nvidia.com/T/
>
> Eugenio Pérez (12):
>   vdpa: do not set virtio status bits if unneeded
>   vdpa: make batch_begin_once early return
>   vdpa: merge _begin_batch into _batch_begin_once
>   vdpa: extract out _dma_end_batch from _listener_commit
>   vdpa: factor out stop path of vhost_vdpa_dev_start
>   vdpa: check for iova tree initialized at net_client_start
>   vdpa: set backend capabilities at vhost_vdpa_init
>   vdpa: add vhost_vdpa_load_setup
>   vdpa: approve switchover after memory map in the migration destination
>   vdpa: add vhost_vdpa_net_load_setup NetClient callback
>   vdpa: add vhost_vdpa_net_switchover_ack_needed
>   virtio_net: register incremental migration handlers
>
>  include/hw/virtio/vhost-vdpa.h |  32 ++++
>  include/net/net.h              |   8 +
>  hw/net/virtio-net.c            |  48 ++++++
>  hw/virtio/vhost-vdpa.c         | 274 +++++++++++++++++++++++++++------
>  net/vhost-vdpa.c               |  43 +++++-
>  5 files changed, 357 insertions(+), 48 deletions(-)
>
> --
> 2.39.3
>
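[Editor's note] The flow the cover letter describes — start the expensive map/pin work as soon as the destination knows the guest RAM layout, and only let the switchover complete once it has finished — can be modelled outside QEMU. The sketch below is only a toy under that assumption: start_early_map, wait_for_maps and pin_region are illustrative stand-ins for the series' load_setup and switchover-ack paths, not actual QEMU functions.

/*
 * Toy model (not QEMU code): begin the expensive map/pin work in a worker
 * thread as soon as the destination learns the RAM layout ("load_setup"
 * time), and gate the switchover on its completion ("switchover_ack" time).
 */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

#define N_REGIONS 4

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t all_mapped = PTHREAD_COND_INITIALIZER;
static int mapped_regions;

/* Stand-in for the expensive DMA map + page pinning of one RAM region. */
static void pin_region(int i)
{
    usleep(200 * 1000);                  /* pretend this takes a while */
    printf("region %d mapped\n", i);
}

static void *map_worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < N_REGIONS; i++) {
        pin_region(i);
        pthread_mutex_lock(&lock);
        mapped_regions++;
        pthread_cond_broadcast(&all_mapped);
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

/* "load_setup": kicked off as soon as migration starts at the destination. */
static pthread_t start_early_map(void)
{
    pthread_t tid;
    pthread_create(&tid, NULL, map_worker, NULL);
    return tid;
}

/* "switchover_ack": only let the guest resume once every map has completed. */
static void wait_for_maps(void)
{
    pthread_mutex_lock(&lock);
    while (mapped_regions < N_REGIONS) {
        pthread_cond_wait(&all_mapped, &lock);
    }
    pthread_mutex_unlock(&lock);
}

int main(void)
{
    pthread_t tid = start_early_map();   /* overlaps with the source running */
    /* ... source keeps running, RAM is streamed over in the meantime ... */
    wait_for_maps();                     /* approve the switchover */
    printf("all memory mapped, workload can resume\n");
    pthread_join(tid, NULL);
    return 0;
}

Without the early start, the whole pin_region loop would run inside the downtime window; with it, most of that cost overlaps with the source still running, which is where the 20s~30s to 5s reduction comes from.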
On Fri, Dec 15, 2023 at 06:28:18PM +0100, Eugenio Pérez wrote:
> Current memory operations like pinning may take a lot of time at the
> destination. Currently they are done after the source of the migration is
> stopped, and before the workload is resumed at the destination. This is a
> period where neither traffic can flow nor the VM workload can continue
> (downtime).
>
> We can do better, as we know the memory layout of the guest RAM at the
> destination from the moment the migration starts. Moving that operation
> earlier allows QEMU to communicate the maps to the kernel while the workload
> is still running on the source, so Linux can start mapping them.
>
> Also, the migration of the guest memory may finish before the destination
> QEMU maps all the memory. In this case, the rest of the memory will be mapped
> at the same time as before applying this series, when the device is starting.
> So we're only improving with this series.
>
> If the destination has the switchover_ack capability enabled, the destination
> holds the migration until all the memory is mapped.
>
> This needs to be applied on top of [1]. That series performs some code
> reorganization that allows mapping the guest memory without knowing the queue
> layout the guest configures on the device.
>
> This series reduced the downtime in the stop-and-copy phase of the live
> migration from 20s~30s to 5s, with a 128G mem guest and two mlx5_vdpa
> devices, per [2].

I think this is reasonable and could be applied - batching is good.
Could you rebase on master and repost please?

> Future directions on top of this series may include:
> * Iterative migration of virtio-net devices, as it may reduce downtime per
>   [3]. vhost-vdpa net can apply the configuration through CVQ at the
>   destination while the source is still migrating.
> * Move more things ahead of migration time, like DRIVER_OK.
> * Check that the devices at the destination are valid, and cancel the
>   migration in case they are not.
>
> v1 from RFC v2:
> * Hold on migration if memory has not been mapped in full with switchover_ack.
> * Revert the map if the device is not started.
>
> RFC v2:
> * Delegate the map to another thread so it does not block QMP.
> * Fix not allocating iova_tree if x-svq=on at the destination.
> * Rebased on latest master.
> * More cleanups of the current code, which might be split from this series too.
>
> [1] https://lists.nongnu.org/archive/html/qemu-devel/2023-12/msg01986.html
> [2] https://lists.nongnu.org/archive/html/qemu-devel/2023-12/msg00909.html
> [3] https://lore.kernel.org/qemu-devel/6c8ebb97-d546-3f1c-4cdd-54e23a566f61@nvidia.com/T/
>
> Eugenio Pérez (12):
>   vdpa: do not set virtio status bits if unneeded
>   vdpa: make batch_begin_once early return
>   vdpa: merge _begin_batch into _batch_begin_once
>   vdpa: extract out _dma_end_batch from _listener_commit
>   vdpa: factor out stop path of vhost_vdpa_dev_start
>   vdpa: check for iova tree initialized at net_client_start
>   vdpa: set backend capabilities at vhost_vdpa_init
>   vdpa: add vhost_vdpa_load_setup
>   vdpa: approve switchover after memory map in the migration destination
>   vdpa: add vhost_vdpa_net_load_setup NetClient callback
>   vdpa: add vhost_vdpa_net_switchover_ack_needed
>   virtio_net: register incremental migration handlers
>
>  include/hw/virtio/vhost-vdpa.h |  32 ++++
>  include/net/net.h              |   8 +
>  hw/net/virtio-net.c            |  48 ++++++
>  hw/virtio/vhost-vdpa.c         | 274 +++++++++++++++++++++++++++------
>  net/vhost-vdpa.c               |  43 +++++-
>  5 files changed, 357 insertions(+), 48 deletions(-)
>
> --
> 2.39.3
>
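[Editor's note] On the batching Michael mentions: vhost-vdpa DMA maps are sent to the kernel as vhost_msg_v2 messages on the device fd, and a run of updates can be bracketed with VHOST_IOTLB_BATCH_BEGIN/END so the kernel commits them in one go. The helper below is a rough, self-contained sketch of that message format only, not QEMU's listener code; it assumes an already-open /dev/vhost-vdpa-N descriptor with the VHOST_BACKEND_F_IOTLB_BATCH backend feature acked, and the helper names and map entries are illustrative.

/*
 * Sketch of vhost-vdpa IOTLB batching: DMA maps are vhost_msg_v2 messages
 * written to the already-open /dev/vhost-vdpa-N fd, and consecutive updates
 * can be wrapped in VHOST_IOTLB_BATCH_BEGIN/END so the kernel applies them
 * at once.  Assumes VHOST_BACKEND_F_IOTLB_BATCH was negotiated beforehand;
 * helper names and error handling are illustrative, not QEMU's.
 */
#include <linux/vhost_types.h>
#include <unistd.h>

static int vdpa_iotlb_msg(int fd, struct vhost_iotlb_msg iotlb)
{
    struct vhost_msg_v2 msg = {
        .type = VHOST_IOTLB_MSG_V2,
        .iotlb = iotlb,
    };

    return write(fd, &msg, sizeof(msg)) == sizeof(msg) ? 0 : -1;
}

/* Map a run of guest RAM sections inside a single batch. */
static int vdpa_map_batched(int fd, const struct vhost_iotlb_msg *maps, int n)
{
    int r = 0;

    vdpa_iotlb_msg(fd, (struct vhost_iotlb_msg) {
        .type = VHOST_IOTLB_BATCH_BEGIN,
    });

    for (int i = 0; i < n && !r; i++) {
        struct vhost_iotlb_msg m = maps[i];   /* iova, size, uaddr filled in */
        m.type = VHOST_IOTLB_UPDATE;
        m.perm = VHOST_ACCESS_RW;
        r = vdpa_iotlb_msg(fd, m);
    }

    vdpa_iotlb_msg(fd, (struct vhost_iotlb_msg) {
        .type = VHOST_IOTLB_BATCH_END,        /* kernel commits the batch */
    });

    return r;
}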
On Mon, Dec 25, 2023 at 5:31 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Fri, Dec 15, 2023 at 06:28:18PM +0100, Eugenio Pérez wrote:
> > Current memory operations like pinning may take a lot of time at the
> > destination. Currently they are done after the source of the migration is
> > stopped, and before the workload is resumed at the destination. This is a
> > period where neither traffic can flow nor the VM workload can continue
> > (downtime).
> >
> > We can do better, as we know the memory layout of the guest RAM at the
> > destination from the moment the migration starts. Moving that operation
> > earlier allows QEMU to communicate the maps to the kernel while the
> > workload is still running on the source, so Linux can start mapping them.
> >
> > Also, the migration of the guest memory may finish before the destination
> > QEMU maps all the memory. In this case, the rest of the memory will be
> > mapped at the same time as before applying this series, when the device is
> > starting. So we're only improving with this series.
> >
> > If the destination has the switchover_ack capability enabled, the
> > destination holds the migration until all the memory is mapped.
> >
> > This needs to be applied on top of [1]. That series performs some code
> > reorganization that allows mapping the guest memory without knowing the
> > queue layout the guest configures on the device.
> >
> > This series reduced the downtime in the stop-and-copy phase of the live
> > migration from 20s~30s to 5s, with a 128G mem guest and two mlx5_vdpa
> > devices, per [2].
>
> I think this is reasonable and could be applied - batching is good.
> Could you rebase on master and repost please?
>

New comments appeared in the meantime [1], but I'll rebase with the needed
changes after they converge.

Thanks!

[1] https://patchwork.kernel.org/comment/25653487/

> > Future directions on top of this series may include:
> > * Iterative migration of virtio-net devices, as it may reduce downtime per
> >   [3]. vhost-vdpa net can apply the configuration through CVQ at the
> >   destination while the source is still migrating.
> > * Move more things ahead of migration time, like DRIVER_OK.
> > * Check that the devices at the destination are valid, and cancel the
> >   migration in case they are not.
> >
> > v1 from RFC v2:
> > * Hold on migration if memory has not been mapped in full with
> >   switchover_ack.
> > * Revert the map if the device is not started.
> >
> > RFC v2:
> > * Delegate the map to another thread so it does not block QMP.
> > * Fix not allocating iova_tree if x-svq=on at the destination.
> > * Rebased on latest master.
> > * More cleanups of the current code, which might be split from this
> >   series too.
> >
> > [1] https://lists.nongnu.org/archive/html/qemu-devel/2023-12/msg01986.html
> > [2] https://lists.nongnu.org/archive/html/qemu-devel/2023-12/msg00909.html
> > [3] https://lore.kernel.org/qemu-devel/6c8ebb97-d546-3f1c-4cdd-54e23a566f61@nvidia.com/T/
> >
> > Eugenio Pérez (12):
> >   vdpa: do not set virtio status bits if unneeded
> >   vdpa: make batch_begin_once early return
> >   vdpa: merge _begin_batch into _batch_begin_once
> >   vdpa: extract out _dma_end_batch from _listener_commit
> >   vdpa: factor out stop path of vhost_vdpa_dev_start
> >   vdpa: check for iova tree initialized at net_client_start
> >   vdpa: set backend capabilities at vhost_vdpa_init
> >   vdpa: add vhost_vdpa_load_setup
> >   vdpa: approve switchover after memory map in the migration destination
> >   vdpa: add vhost_vdpa_net_load_setup NetClient callback
> >   vdpa: add vhost_vdpa_net_switchover_ack_needed
> >   virtio_net: register incremental migration handlers
> >
> >  include/hw/virtio/vhost-vdpa.h |  32 ++++
> >  include/net/net.h              |   8 +
> >  hw/net/virtio-net.c            |  48 ++++++
> >  hw/virtio/vhost-vdpa.c         | 274 +++++++++++++++++++++++++++------
> >  net/vhost-vdpa.c               |  43 +++++-
> >  5 files changed, 357 insertions(+), 48 deletions(-)
> >
> > --
> > 2.39.3
> >
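[Editor's note] As background for the iova_tree items in the changelog: with shadow virtqueues (x-svq=on) QEMU has to choose an IOVA for every mapping it sends to the device, which is why the tree must exist before the early maps can be issued. The snippet below is only a minimal first-fit illustration of that job under those assumptions, not QEMU's VhostIOVATree.

/*
 * Minimal first-fit IOVA allocator over a sorted list of live mappings.
 * Illustrative only: no locking, no free path, no overflow hardening.
 */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

typedef struct Mapping {
    uint64_t iova;
    uint64_t size;
    struct Mapping *next;   /* kept sorted by iova */
} Mapping;

/* Find a hole of at least 'size' bytes below 'limit' and insert a mapping. */
static uint64_t iova_alloc(Mapping **head, uint64_t limit, uint64_t size)
{
    uint64_t candidate = 0;
    Mapping **prev = head;

    for (Mapping *m = *head; m; prev = &m->next, m = m->next) {
        if (candidate + size <= m->iova) {
            break;                       /* the hole before this mapping fits */
        }
        candidate = m->iova + m->size;   /* otherwise try right after it */
    }
    if (candidate + size > limit) {
        return UINT64_MAX;               /* out of IOVA space */
    }

    Mapping *new = malloc(sizeof(*new));
    new->iova = candidate;
    new->size = size;
    new->next = *prev;
    *prev = new;
    return candidate;
}

int main(void)
{
    Mapping *tree = NULL;
    uint64_t limit = 1ULL << 36;         /* pretend device IOVA range */

    /* e.g. two 1G RAM sections plus a shadow vring. */
    printf("ram0  -> iova 0x%llx\n",
           (unsigned long long)iova_alloc(&tree, limit, 1ULL << 30));
    printf("ram1  -> iova 0x%llx\n",
           (unsigned long long)iova_alloc(&tree, limit, 1ULL << 30));
    printf("vring -> iova 0x%llx\n",
           (unsigned long long)iova_alloc(&tree, limit, 0x4000));
    return 0;
}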