[V3,vfio,0/7] Enhance the vfio-virtio driver to support live migration

Message ID 20241112083729.145005-1-yishaih@nvidia.com
Series: Enhance the vfio-virtio driver to support live migration

Yishai Hadas Nov. 12, 2024, 8:37 a.m. UTC
This series enhances the vfio-virtio driver to support live migration
for virtio-net Virtual Functions (VFs) that are migration-capable.
 
This series follows the Virtio 1.4 specification to implement the
necessary device parts commands, enabling a device to participate in the
live migration process.

The key VFIO features implemented include: VFIO_MIGRATION_STOP_COPY,
VFIO_MIGRATION_P2P, VFIO_MIGRATION_PRE_COPY.
 
The implementation integrates with the VFIO subsystem via vfio_pci_core
and incorporates Virtio-specific logic to handle the migration process.
 
Migration functionality follows the definitions in uapi/vfio.h and uses
the Virtio VF-to-PF admin queue command channel for executing the device
parts related commands.
 
Patch Overview:
The first four patches focus on the Virtio layer and address the
following:
- Define the layout of the device parts commands required as part of the
  migration process.
- Provide APIs to enable upper layers (e.g., VFIO, net) to execute the
  related device parts commands.
 
The last three patches focus on the VFIO layer:
- Extend the vfio-virtio driver to support live migration for Virtio-net
  VFs.
- Move legacy I/O operations to a separate file, which is compiled only
  when VIRTIO_PCI_ADMIN_LEGACY is configured, ensuring that live
  migration depends solely on VIRTIO_PCI.
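The split described above can be pictured as conditional compilation around an ops table that probe() picks once. The sketch below is a hypothetical userspace model, not code from the series: `struct sketch_ops`, `sketch_probe_ops()` and `sketch_legacy_read()` are illustrative names, and the `CONFIG_VIRTIO_VFIO_PCI_ADMIN_LEGACY` macro merely stands in for the Kconfig symbol of the same name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical ops table; only the legacy variant provides legacy_read(). */
struct sketch_ops {
	const char *name;
	int (*legacy_read)(unsigned long off, void *buf, size_t len);
};

/* Always built: live migration only needs the core virtio-pci support. */
static const struct sketch_ops sketch_lm_ops = {
	.name = "live-migration",
	.legacy_read = NULL,
};

#ifdef CONFIG_VIRTIO_VFIO_PCI_ADMIN_LEGACY
/* Built only when legacy access is configured (the legacy_io.c role). */
static int sketch_legacy_read(unsigned long off, void *buf, size_t len)
{
	assert(buf != NULL);
	(void)off;
	memset(buf, 0, len); /* placeholder for the real legacy access path */
	return (int)len;
}

static const struct sketch_ops sketch_lm_legacy_ops = {
	.name = "live-migration+legacy",
	.legacy_read = sketch_legacy_read,
};
#endif

/* Chosen once at probe time, so the ops never change after allocation. */
static const struct sketch_ops *sketch_probe_ops(bool legacy_supported)
{
#ifdef CONFIG_VIRTIO_VFIO_PCI_ADMIN_LEGACY
	if (legacy_supported)
		return &sketch_lm_legacy_ops;
#endif
	(void)legacy_supported;
	return &sketch_lm_ops;
}
```

Built without the macro, the object contains no legacy code at all and `sketch_probe_ops()` always returns the live-migration table; building with `-DCONFIG_VIRTIO_VFIO_PCI_ADMIN_LEGACY` adds the extended table.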
 
Additional Notes:
- The kernel protocol between the source and target devices includes a
  header containing metadata such as record size, tag, and flags.
  The record size allows the target to read a complete image from the
  source before passing device part data. This follows the Virtio
  specification, which mandates that partial device parts are not
  supplied. The tag and flags serve as placeholders for future extensions
  to the kernel protocol between the source and target, ensuring backward
  and forward compatibility.
 
- Both the source and target comply with the Virtio specification by
  using a device part object with a unique ID during the migration
  process. As this resource is limited to a maximum of 255, its lifecycle
  is confined to periods when live migration is active.

- According to the Virtio specification, a device has only two states:
  RUNNING and STOPPED. Consequently, certain VFIO transitions (e.g.,
  RUNNING_P2P->STOP, STOP->RUNNING_P2P) are treated as no-ops. When
  transitioning to RUNNING_P2P, the device is set to STOPPED and remains
  STOPPED until the RUNNING_P2P->RUNNING transition, at which point it
  resumes the RUNNING state. While STOPPED, the virtio device only stops
  initiating outgoing requests (e.g., DMA, MSI-X) and must still accept
  incoming operations.

- Furthermore, the Virtio specification does not support reading partial
  or incremental device contexts. This means that during the PRE_COPY
  state, the vfio-virtio driver reads the full device state. This step is
  beneficial because it allows the device to send some "initial data"
  before moving to the STOP_COPY state, thus reducing downtime by
  preparing early and warming up. Because the device state can change,
  and the benefit is highest when the PRE_COPY data closely matches the
  final data, the driver reads it in a rate-limited mode, reporting no
  data available for some time interval after the previous read. With
  PRE_COPY enabled, we observed a downtime reduction of approximately
  70-75% in various scenarios compared to when PRE_COPY was disabled,
  while keeping the total migration time nearly the same.

- Support for dirty page tracking during migration will be provided via
  the IOMMUFD framework.
 
- This series has been successfully tested on Virtio-net VF devices.
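The record-size note in the list above can be modeled as a framing check on the target side: data is handed on only once a full header and its complete payload have arrived. This is a hedged userspace sketch; `struct sketch_mig_header`, its field order and widths, and the use of host byte order are assumptions for illustration, not the layout used by the series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record header (host byte order, for simplicity). */
struct sketch_mig_header {
	uint64_t record_size; /* bytes of device-part data that follow */
	uint32_t flags;       /* placeholder for future extensions */
	uint32_t tag;         /* placeholder for future extensions */
};

/*
 * Returns true only when buf holds a full header plus record_size bytes,
 * so the whole device-part image can be passed on at once. On success,
 * *payload and *payload_len describe the record body.
 */
static bool sketch_record_complete(const uint8_t *buf, size_t len,
				   const uint8_t **payload,
				   uint64_t *payload_len)
{
	struct sketch_mig_header hdr;

	assert(buf != NULL && payload != NULL && payload_len != NULL);
	if (len < sizeof(hdr))
		return false;                /* header not complete yet */
	memcpy(&hdr, buf, sizeof(hdr));      /* avoid unaligned access */
	if (len - sizeof(hdr) < hdr.record_size)
		return false;                /* never pass partial device parts */
	*payload = buf + sizeof(hdr);
	*payload_len = hdr.record_size;
	return true;
}
```

Keeping `flags` and `tag` in the header even while unused is what leaves room for the backward- and forward-compatible extensions the note mentions.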

Changes from V2:
https://lore.kernel.org/kvm/20241111084157.88044-6-yishaih@nvidia.com/T/

Vfio:
Patch #7:
- Remove the 'select IOMMUFD_DRIVER' clause.
  As noted by Alex and Joao, IOMMUFD_DRIVER only builds in the
  iova_bitmap support, which is independent of IOMMUFD and is not needed
  by this driver.

Changes from V1:
https://lore.kernel.org/kvm/20241104102131.184193-3-yishaih@nvidia.com/T/

Virtio:
Patches #1-#4:
- Add Acked-by: Michael S. Tsirkin <mst@redhat.com>

Vfio:
Patch #5:
- Set the driver 'ops' within probe(), including the call to
  virtiovf_set_migratable() right after allocating the virtiodev.
  Consequently, virtiovf_pci_init_device() is now only used for legacy IO
  access, where its specific initialization is performed. This approach
  was recommended by Alex to avoid modifying the 'ops' pointer after the
  core device has been allocated.
- Fix multi-line comment style.

Patch #6:
- Improve the rate limiter flow to account for the 'initial/dirty bytes'
  that may not have been fully read yet, as noted by Alex.
- Add a comment for the rate limiter heuristic. As Alex mentioned, this
  would be useful for future maintenance.

Patch #7:
- Add a new boolean Kconfig entry named
  CONFIG_VIRTIO_VFIO_PCI_ADMIN_LEGACY as a submenu item under the
  primary driver functionality for live migration. This new entry allows
  users to selectively enable legacy access functionality, providing a
  dedicated description for each feature.
  This addition aligns with Alex's recommendation in this area.

Changes from V0:
https://lore.kernel.org/kvm/20241101102518.1bf2c6e6.alex.williamson@redhat.com/T/

Vfio:
Patch #5:
- Enhance the commit log to provide a clearer explanation of P2P
  behavior over Virtio devices, as discussed on the mailing list.
Patch #6:
- Implement the rate limiter mechanism as part of the PRE_COPY state,
  following Alex's suggestion.
- Update the commit log to include actual data demonstrating the impact of
  PRE_COPY, as requested by Alex.
Patch #7:
- Update the default driver operations (i.e., vfio_device_ops) to use
  the live migration set, and expand it to include the legacy I/O
  operations if they are compiled and supported.

Yishai

Yishai Hadas (7):
  virtio_pci: Introduce device parts access commands
  virtio: Extend the admin command to include the result size
  virtio: Manage device and driver capabilities via the admin commands
  virtio-pci: Introduce APIs to execute device parts admin commands
  vfio/virtio: Add support for the basic live migration functionality
  vfio/virtio: Add PRE_COPY support for live migration
  vfio/virtio: Enable live migration once VIRTIO_PCI was configured

 drivers/vfio/pci/virtio/Kconfig     |   42 +-
 drivers/vfio/pci/virtio/Makefile    |    3 +-
 drivers/vfio/pci/virtio/common.h    |  127 +++
 drivers/vfio/pci/virtio/legacy_io.c |  418 +++++++++
 drivers/vfio/pci/virtio/main.c      |  476 ++--------
 drivers/vfio/pci/virtio/migrate.c   | 1335 +++++++++++++++++++++++++++
 drivers/virtio/virtio_pci_common.h  |   19 +-
 drivers/virtio/virtio_pci_modern.c  |  457 ++++++++-
 include/linux/virtio.h              |    1 +
 include/linux/virtio_pci_admin.h    |   11 +
 include/uapi/linux/virtio_pci.h     |  131 +++
 11 files changed, 2599 insertions(+), 421 deletions(-)
 create mode 100644 drivers/vfio/pci/virtio/common.h
 create mode 100644 drivers/vfio/pci/virtio/legacy_io.c
 create mode 100644 drivers/vfio/pci/virtio/migrate.c