Message ID: 20230918044932.1433744-1-yajunw@nvidia.com (mailing list archive)
Series: virtio-net: Introduce LM early load
Ping.

On 9/18/2023 12:49 PM, Yajun Wu wrote:
> This series of patches aims to minimize the downtime during live migration
> of a virtio-net device with a vhost-user backend. In the case of a hardware
> virtual Data Path Acceleration (vDPA) implementation, the hardware
> configuration, which includes tasks like VQ creation and RSS setting, can
> take more than 200 ms. This significantly increases the downtime of the VM,
> particularly in terms of networking.
>
> To reduce the VM downtime, the proposed approach captures the basic device
> state/configuration during the VM's running stage and performs the initial
> device configuration (presetup). During the normal configuration process,
> when the VM is in a stopped state, the second configuration is compared to
> the first one, and only the differences are applied, reducing downtime.
> Ideally, only the vring available index needs to be changed while the VM
> is stopped.
>
> This feature is disabled by default, because backends such as DPDK also
> need to add support for the new vhost message. The new device property
> "x-early-migration" enables this feature.
>
> 1. Register a new vmstate for virtio-net with an early_setup flag to send
>    the device state during migration setup.
> 2. After the device state is loaded on the destination VM, the device
>    status needs to be sent to the vhost backend in a new way. Introduce a
>    new vhost-user message, VHOST_USER_PRESETUP, to notify the backend of
>    presetup.
> 3. Let virtio-net, vhost-net, and vhost-dev support presetup. Main flow:
>    a. vhost-dev sends presetup start.
>    b. virtio-net sets the MTU.
>    c. vhost-dev sends the vring configuration and sets dummy call/kick fds.
>    d. vhost-net sends vring enable.
>    e. vhost-dev sends presetup end.
>
>
> TODOs:
> ======
> - No vhost-vdpa/kernel support. A new kernel interface needs to be
>   discussed/designed if the same requirement exists for vhost-vdpa.
>
> - No vIOMMU support so far. If there is a need for vIOMMU support, it is
>   planned to be addressed in a follow-up patchset.
>
>
> Test:
> =====
> - Live migration of a VM with 2 virtio-net devices; ping recovers.
>   Tested together with DPDK patch [1].
> - The time consumed by the DPDK function dev_conf is reduced from 191.4 ms
>   to 6.6 ms.
>
>
> References:
> ===========
>
> [1] https://github.com/Mellanox/dpdk-vhost-vfe/pull/37
>
> Any comments or feedback are highly appreciated.
>
> Thanks,
> Yajun
>
>
> Yajun Wu (5):
>   vhost-user: Add presetup protocol feature and op
>   vhost: Add support for presetup
>   vhost-net: Add support for presetup
>   virtio: Add VMState for early load
>   virtio-net: Introduce LM early load
>
>  docs/interop/vhost-user.rst       |  10 ++
>  hw/net/trace-events               |   1 +
>  hw/net/vhost_net.c                |  40 +++++++
>  hw/net/virtio-net.c               | 100 ++++++++++++++++++
>  hw/virtio/vhost-user.c            |  30 ++++++
>  hw/virtio/vhost.c                 | 166 +++++++++++++++++++++++++-----
>  hw/virtio/virtio.c                | 152 ++++++++++++++++-----------
>  include/hw/virtio/vhost-backend.h |   3 +
>  include/hw/virtio/vhost.h         |  12 +++
>  include/hw/virtio/virtio-net.h    |   1 +
>  include/hw/virtio/virtio.h        |  10 +-
>  include/net/vhost_net.h           |   3 +
>  12 files changed, 443 insertions(+), 85 deletions(-)
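For illustration only, here is a minimal sketch of how the presetup flow
described above (steps a-e) could look as a sequence of backend calls. All
helper names here are hypothetical stand-ins, not the series' actual API;
the real changes live in hw/virtio/vhost.c and hw/virtio/vhost-user.c:

    /* Hypothetical sketch of the presetup flow (steps a-e above); helper
     * names are illustrative stand-ins, not the series' actual API. */
    #include <stdint.h>

    enum presetup_phase { PRESETUP_BEGIN, PRESETUP_END };

    struct vhost_dev;  /* opaque device handle */

    int vhost_set_presetup_state(struct vhost_dev *d, enum presetup_phase p);
    int vhost_set_mtu(struct vhost_dev *d, uint16_t mtu);
    int vhost_set_vring_config(struct vhost_dev *d); /* + dummy call/kick fds */
    int vhost_set_vring_enable(struct vhost_dev *d, int enable);

    int vhost_net_presetup(struct vhost_dev *d, uint16_t mtu)
    {
        int r;

        if ((r = vhost_set_presetup_state(d, PRESETUP_BEGIN)) < 0) { /* a */
            return r;
        }
        if ((r = vhost_set_mtu(d, mtu)) < 0) {                       /* b */
            return r;
        }
        if ((r = vhost_set_vring_config(d)) < 0) {                   /* c */
            return r;
        }
        if ((r = vhost_set_vring_enable(d, 1)) < 0) {                /* d */
            return r;
        }
        return vhost_set_presetup_state(d, PRESETUP_END);            /* e */
    }

The begin/end messages bracket the whole configuration so the backend knows
when it can apply everything to the hardware in one shot.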
On Mon, Sep 18, 2023 at 6:51 AM Yajun Wu <yajunw@nvidia.com> wrote:
>
> This series of patches aims to minimize the downtime during live migration
> of a virtio-net device with a vhost-user backend. In the case of a hardware
> virtual Data Path Acceleration (vDPA) implementation, the hardware
> configuration, which includes tasks like VQ creation and RSS setting, can
> take more than 200 ms. This significantly increases the downtime of the VM,
> particularly in terms of networking.
>

Hi!

Sorry, I totally missed this email. Please CC me on next versions.

Just for completeness, there is an ongoing plan to reduce the downtime in
vhost-vdpa. You can find more details at [1].

Sending the state periodically is on the roadmap, but benchmarking showed
that memory pinning and unpinning affects downtime more. I'll send an RFC
for this soon. The plan was to continue with iterative state restoring, so
I'm happy to know more people are looking into it!

In the case of vhost-vdpa, the state is already restored by not enabling
the dataplane until migration completes. All the loading is performed using
CVQ, as you can see in net/vhost-vdpa.c:vhost_vdpa_net_load. After that,
the whole dataplane is started again.

My idea is to start vhost-vdpa (by calling vhost_vdpa_dev_start) at the
destination at the moment the migration starts, as it will not have the
dataplane enabled. After that, the source should send the virtio-net
vmstate every time it changes. vhost-vdpa net is able to send and receive
through CVQ, so it should be able to modify the net device configuration as
many times as needed. I guess that could be done by calling something along
the lines of your vhost_user_set_presetup_state.

This can be improved in vhost-vdpa by being able to send only the new state.

When the migration is completed, the vhost-vdpa net dataplane should start
as it does now.

If you are interested in avoiding changes to the vhost-user protocol, maybe
QEMU could just disable the dataplane too with VHOST_USER_SET_VRING_ENABLE?
If not, I think both approaches have a lot in common, so I'm sure we can
develop one backend on top of the other.

Thanks!

[1] https://lists.gnu.org/archive/html/qemu-devel/2023-04/msg00659.html

> [...]
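As a rough sketch of the alternative described above, the destination could
start the device with the dataplane disabled, push state through CVQ as
often as needed, and enable the data vrings only once migration completes.
All helper names below are hypothetical stand-ins; the real loading logic
is net/vhost-vdpa.c:vhost_vdpa_net_load:

    /* Sketch only: helper names are hypothetical stand-ins. */
    #include <stdbool.h>

    struct net_dev;

    int dev_start(struct net_dev *d);       /* starts device, CVQ only */
    int cvq_load_state(struct net_dev *d);  /* push MAC/MQ/RSS/... via CVQ */
    int set_vring_enable(struct net_dev *d, int vq, bool on);

    int migrate_dest_restore(struct net_dev *d, int num_data_vqs)
    {
        int r = dev_start(d);       /* dataplane vrings stay disabled */
        if (r < 0) {
            return r;
        }
        /* May run repeatedly while the source keeps sending new state. */
        r = cvq_load_state(d);
        if (r < 0) {
            return r;
        }
        /* Migration complete: enable the dataplane as done today. */
        for (int vq = 0; vq < num_data_vqs; vq++) {
            set_vring_enable(d, vq, true);
        }
        return 0;
    }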
On 10/18/2023 12:47 AM, Eugenio Perez Martin wrote:
> On Mon, Sep 18, 2023 at 6:51 AM Yajun Wu <yajunw@nvidia.com> wrote:
>> This series of patches aims to minimize the downtime during live
>> migration of a virtio-net device with a vhost-user backend. [...]
>
> [...]
>
> My idea is to start vhost-vdpa (by calling vhost_vdpa_dev_start) at the
> destination at the moment the migration starts, as it will not have the
> dataplane enabled. After that, the source should send the virtio-net
> vmstate every time it changes. vhost-vdpa net is able to send and receive
> through CVQ, so it should be able to modify the net device configuration
> as many times as needed. I guess that could be done by calling something
> along the lines of your vhost_user_set_presetup_state.

This is a very good approach. How do you know when the virtio-net vmstate
changes? vhost-user and vhost-vdpa should share the same code for the early
virtio-net vmstate sync.

> This can be improved in vhost-vdpa by being able to send only the new
> state.
>
> When the migration is completed, the vhost-vdpa net dataplane should
> start as it does now.
>
> If you are interested in avoiding changes to the vhost-user protocol,
> maybe QEMU could just disable the dataplane too with
> VHOST_USER_SET_VRING_ENABLE? If not, I think both approaches have a lot
> in common, so I'm sure we can develop one backend on top of the other.
>
> Thanks!
>
> [1] https://lists.gnu.org/archive/html/qemu-devel/2023-04/msg00659.html

I'm afraid it is just like DRIVER_OK being the hint for vhost-user vDPA to
apply all the configuration to HW: vhost-user also needs the same kind of
hint at the end of each round of vmstate sync to apply the configuration to
HW. That's why I need to define a new protocol message.

Because MQ can also change, VQ enable is a valid parameter to the HW. The
HW creates only enabled queues, and the number of enabled queues affects
the RSS setting; see the toy example after this message.

>> [...]
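To illustrate the point about MQ and RSS above (a toy example, not the
device's actual algorithm): an RSS indirection table can only reference
enabled queues, so hardware that creates only enabled queues must rebuild
the table whenever the number of enabled queue pairs changes:

    /* Toy illustration: spreading flows over only the enabled queue pairs.
     * A change in the MQ setting forces this table to be rebuilt in HW. */
    #include <stdint.h>

    #define RSS_TABLE_LEN 128

    void build_rss_table(uint16_t *table, uint16_t enabled_queue_pairs)
    {
        if (enabled_queue_pairs == 0) {
            return;  /* nothing enabled yet; keep the table untouched */
        }
        for (int i = 0; i < RSS_TABLE_LEN; i++) {
            table[i] = (uint16_t)(i % enabled_queue_pairs);
        }
    }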
On Wed, Oct 18, 2023 at 8:41 AM Yajun Wu <yajunw@nvidia.com> wrote:
>
> On 10/18/2023 12:47 AM, Eugenio Perez Martin wrote:
> > [...]
> >
> > My idea is to start vhost-vdpa (by calling vhost_vdpa_dev_start) at the
> > destination at the moment the migration starts, as it will not have the
> > dataplane enabled. After that, the source should send the virtio-net
> > vmstate every time it changes. vhost-vdpa net is able to send and
> > receive through CVQ, so it should be able to modify the net device
> > configuration as many times as needed. I guess that could be done by
> > calling something along the lines of your
> > vhost_user_set_presetup_state.
>
> This is a very good approach. How do you know when the virtio-net vmstate
> changes? vhost-user and vhost-vdpa should share the same code for the
> early virtio-net vmstate sync.

CVQ in vhost-vdpa must already be shadowed to be able to migrate. Every
time the guest places a buffer in CVQ,
net/vhost-vdpa.c:vhost_vdpa_net_handle_ctrl_avail is called, which calls
virtio_net_handle_ctrl_iov. So virtio_net_handle_ctrl_iov should be able to
check whether we're migrating and signal that the state must be re-sent.

> > [...]
>
> I'm afraid it is just like DRIVER_OK being the hint for vhost-user vDPA
> to apply all the configuration to HW: vhost-user also needs the same kind
> of hint at the end of each round of vmstate sync to apply the
> configuration to HW. That's why I need to define a new protocol message.
>
> Because MQ can also change, VQ enable is a valid parameter to the HW. The
> HW creates only enabled queues, and the number of enabled queues affects
> the RSS setting.

I'm not sure I follow 100%. The first part is true for properties like the
vq address, etc.: for those to change, a full device reset at the
destination is needed. But for the number of queues, the destination QEMU
is able to send multiple CVQ commands before starting the dataplane, as
long as the device supports late enabling of the dataplane.

> >> [...]
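A sketch of the re-send trigger described above: only
virtio_net_handle_ctrl_iov is a real QEMU function here;
migration_is_active() and mark_vmstate_dirty() are hypothetical stand-ins
for whatever mechanism a final series would use:

    /* Sketch: with a shadowed CVQ, every guest control command passes
     * through QEMU, which can flag the vmstate for re-sending. */
    #include <stdbool.h>
    #include <stddef.h>

    struct net_dev;

    void apply_ctrl_command(struct net_dev *d, const void *cmd, size_t len);
    bool migration_is_active(void);             /* hypothetical */
    void mark_vmstate_dirty(struct net_dev *d); /* hypothetical */

    void handle_ctrl_command(struct net_dev *d, const void *cmd, size_t len)
    {
        /* In QEMU this is the virtio_net_handle_ctrl_iov() path. */
        apply_ctrl_command(d, cmd, len);
        if (migration_is_active()) {
            /* The destination must receive the updated device state. */
            mark_vmstate_dirty(d);
        }
    }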
Hi Yajun,

Sorry for the late reply.

Apart from the few nitpicks commented, I think it is valid to start from
this series and then add the capability to re-send the configuration, in
case the source changes it, in another series on top. That would allow us
to keep both series small.

I'm not sure whether it can all be done before the next release, so that we
don't have to change the virtio-net migration format twice...

Please let me know what you think about the comments.

Thanks!

On Thu, Oct 19, 2023 at 5:00 PM Eugenio Perez Martin <eperezma@redhat.com> wrote:
>
> On Wed, Oct 18, 2023 at 8:41 AM Yajun Wu <yajunw@nvidia.com> wrote:
> >
> > On 10/18/2023 12:47 AM, Eugenio Perez Martin wrote:
> > ...