Message ID | 1603449643-12851-1-git-send-email-kwankhede@nvidia.com (mailing list archive) |
---|---|
Series | Add migration support for VFIO devices |
Hi Kirti,

On 10/23/20 12:40 PM, Kirti Wankhede wrote:
> Hi,
>
> This Patch set adds migration support for VFIO devices in QEMU.
...
> Since there is no device which has hardware support for system memory
> dirty bitmap tracking, right now there is no other API from vendor driver
> to VFIO IOMMU module to report dirty pages. In the future, when such
> hardware support is implemented, an API will be required in the kernel so
> that the vendor driver can report dirty pages to the VFIO module during
> migration phases.
>
> Below is the flow of state change for live migration where states in brackets
> represent VM state, migration state and VFIO device state as:
>     (VM state, MIGRATION_STATUS, VFIO_DEVICE_STATE)
>
> Live migration save path:
>         QEMU normal running state
>         (RUNNING, _NONE, _RUNNING)
>                         |
>     migrate_init spawns migration_thread.
>     (RUNNING, _SETUP, _RUNNING|_SAVING)
>     Migration thread then calls each device's .save_setup()
>                         |
>     (RUNNING, _ACTIVE, _RUNNING|_SAVING)
>     If device is active, get pending bytes by .save_live_pending()
>     if pending bytes >= threshold_size, call save_live_iterate()
>     Data of VFIO device for pre-copy phase is copied.
>     Iterate till total pending bytes converge and are less than threshold
>                         |
>     On migration completion, vCPUs stop and .save_live_complete_precopy
>     is called for each active device. VFIO device is then transitioned to
>     _SAVING state.
>     (FINISH_MIGRATE, _DEVICE, _SAVING)
>     For VFIO device, iterate in .save_live_complete_precopy until
>     pending data is 0.
>     (FINISH_MIGRATE, _DEVICE, _STOPPED)
>                         |
>     (FINISH_MIGRATE, _COMPLETED, _STOPPED)
>     Migration thread schedules cleanup bottom half and exits
>
> Live migration resume path:
>     Incoming migration calls .load_setup for each device
>     (RESTORE_VM, _ACTIVE, _STOPPED)
>                         |
>     For each device, .load_state is called for that device section data
>     (RESTORE_VM, _ACTIVE, _RESUMING)
>                         |
>     At the end, .load_cleanup is called for each device and vCPUs are started.
>                         |
>         (RUNNING, _NONE, _RUNNING)
>
> Note that:
> - Migration post copy is not supported.

Can you commit this ^^^ somewhere in docs/devel/ please?
(as a patch on top of this series)
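
For readers following the thread, the save and resume flows quoted above can be sketched as a small simulation. This is only an illustration under stated assumptions, not QEMU code: the function names `save_path` and `resume_path`, the `chunk` parameter, and the fixed-size copy model are hypothetical and introduced here. In particular, the sketch assumes pending data only shrinks, whereas a real device may report more pending data as the guest keeps running during pre-copy.

```python
# Hypothetical model of the state flow from the cover letter.
# Each trace entry is (VM state, MIGRATION_STATUS, VFIO_DEVICE_STATE).


def save_path(pending, threshold_size, chunk):
    """Walk the save path; return (trace, precopy_bytes, stopcopy_bytes).

    pending        -- bytes the device reports as pending (assumed input)
    threshold_size -- pre-copy continues while pending >= this value
    chunk          -- bytes copied per iteration (assumption of this sketch)
    """
    if chunk <= 0:
        raise ValueError("chunk must be positive or the loops never end")

    trace = [("RUNNING", "_NONE", "_RUNNING")]

    # migrate_init spawns the migration thread, which calls .save_setup().
    trace.append(("RUNNING", "_SETUP", "_RUNNING|_SAVING"))

    # Pre-copy: compare pending bytes with the threshold and iterate.
    trace.append(("RUNNING", "_ACTIVE", "_RUNNING|_SAVING"))
    precopy_bytes = 0
    while pending >= threshold_size:
        copied = min(chunk, pending)  # stands in for .save_live_iterate()
        pending -= copied
        precopy_bytes += copied

    # vCPUs stop; the device drops _RUNNING and keeps _SAVING.
    trace.append(("FINISH_MIGRATE", "_DEVICE", "_SAVING"))
    stopcopy_bytes = 0
    while pending > 0:  # stands in for .save_live_complete_precopy
        copied = min(chunk, pending)
        pending -= copied
        stopcopy_bytes += copied

    trace.append(("FINISH_MIGRATE", "_DEVICE", "_STOPPED"))
    trace.append(("FINISH_MIGRATE", "_COMPLETED", "_STOPPED"))
    return trace, precopy_bytes, stopcopy_bytes


def resume_path():
    """Return the resume-path states in the order given in the cover letter."""
    return [
        ("RESTORE_VM", "_ACTIVE", "_STOPPED"),   # .load_setup
        ("RESTORE_VM", "_ACTIVE", "_RESUMING"),  # .load_state per section
        ("RUNNING", "_NONE", "_RUNNING"),        # .load_cleanup, vCPUs start
    ]


if __name__ == "__main__":
    trace, pre, stop = save_path(pending=10, threshold_size=4, chunk=3)
    for step in trace + resume_path():
        print(step)
    print("pre-copy bytes:", pre, "stop-and-copy bytes:", stop)
```

The split between the two loops mirrors the text: data above the threshold is copied while the VM runs, and whatever remains is drained only after the vCPUs have stopped.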
On 10/24/2020 10:26 PM, Philippe Mathieu-Daudé wrote:
> Hi Kirti,
>
> On 10/23/20 12:40 PM, Kirti Wankhede wrote:
>> Hi,
>>
>> This Patch set adds migration support for VFIO devices in QEMU.
> ...
>
>> Below is the flow of state change for live migration where states in
>> brackets represent VM state, migration state and VFIO device state as:
>>     (VM state, MIGRATION_STATUS, VFIO_DEVICE_STATE)
>> [...]
>
> Can you commit this ^^^ somewhere in docs/devel/ please?
> (as a patch on top of this series)
>

Philippe, Alex,

I'm going to respin this series with the r-bs and the fix suggested by Yan.
Should this doc be part of this series, or can we add it later, after
10/27, in case its review needs a few more iterations?

Thanks,
Kirti
On 10/24/20 7:48 PM, Kirti Wankhede wrote:
> On 10/24/2020 10:26 PM, Philippe Mathieu-Daudé wrote:
>> On 10/23/20 12:40 PM, Kirti Wankhede wrote:
>>> This Patch set adds migration support for VFIO devices in QEMU.
>>> [...]
>>
>> Can you commit this ^^^ somewhere in docs/devel/ please?
>> (as a patch on top of this series)
>>
>
> Philippe, Alex,
> I'm going to respin this series with the r-bs and the fix suggested by Yan.
> Should this doc be part of this series, or can we add it later, after
> 10/27, in case its review needs a few more iterations?

I suppose it is up to the maintainer; no objection on my part.
This information seems valuable and I wouldn't like it to be lost.

If by 10/27 you are referring to the "soft freeze", then there is no
problem with adding documentation patches after that date :)

Regards,

Phil.