@@ -322,6 +322,32 @@ VhostUserShared
:UUID: 16 bytes UUID, whose first three components (a 32-bit value, then
two 16-bit values) are stored in big endian.
+Device state transfer parameters
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
++--------------------+-----------------+
+| transfer direction | migration phase |
++--------------------+-----------------+
+
+:transfer direction: a 32-bit enum, describing the direction in which
+ the state is transferred:
+
+ - 0: Save: Transfer the state from the back-end to the front-end,
+ which happens on the source side of migration
+ - 1: Load: Transfer the state from the front-end to the back-end,
+ which happens on the destination side of migration
+
+:migration phase: a 32-bit enum, describing the state in which the VM
+ guest and devices are:
+
+ - 0: Stopped (in the period after the transfer of memory-mapped
+ regions and before switch-over to the destination): The VM guest is
+ stopped, and the vhost-user device is suspended (see
+ :ref:`Suspended device state <suspended_device_state>`).
+
+ In the future, additional phases might be added, e.g. to allow
+ iterative migration while the device is running.
+
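As a rough illustration of the payload layout described above, the two 32-bit fields could be modelled in C as follows. All type, enumerator, and function names here are hypothetical and chosen for this sketch only; only the numeric values and the field order come from the text.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names; the numeric values are the ones listed in the text. */
enum device_state_direction {
    DEVICE_STATE_DIRECTION_SAVE = 0, /* back-end -> front-end, on the source */
    DEVICE_STATE_DIRECTION_LOAD = 1, /* front-end -> back-end, on the destination */
};

enum device_state_phase {
    DEVICE_STATE_PHASE_STOPPED = 0,  /* guest stopped, device suspended */
};

/* Two 32-bit fields, in the order shown in the table. */
struct device_state_transfer_params {
    uint32_t direction;
    uint32_t phase;
};

static_assert(sizeof(struct device_state_transfer_params) == 8,
              "payload is two 32-bit values");

/* Returns 1 if both fields hold values defined in this document, else 0. */
static int device_state_params_valid(const struct device_state_transfer_params *p)
{
    return (p->direction == DEVICE_STATE_DIRECTION_SAVE ||
            p->direction == DEVICE_STATE_DIRECTION_LOAD) &&
           p->phase == DEVICE_STATE_PHASE_STOPPED;
}
```

A receiver might run such a check before acting on the request, so that a phase value it does not know is rejected rather than silently treated as "stopped".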
C structure
-----------
@@ -381,6 +407,7 @@ in the ancillary data:
* ``VHOST_USER_SET_VRING_ERR``
* ``VHOST_USER_SET_BACKEND_REQ_FD`` (previous name ``VHOST_USER_SET_SLAVE_REQ_FD``)
* ``VHOST_USER_SET_INFLIGHT_FD`` (if ``VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD``)
+* ``VHOST_USER_SET_DEVICE_STATE_FD``
If *front-end* is unable to send the full message or receives a wrong
reply it will close the connection. An optional reconnection mechanism
@@ -555,6 +582,80 @@ it performs WAKE ioctl's on the userfaultfd to wake the stalled
back-end. The front-end indicates support for this via the
``VHOST_USER_PROTOCOL_F_PAGEFAULT`` feature.
+.. _migrating_backend_state:
+
+Migrating back-end state
+^^^^^^^^^^^^^^^^^^^^^^^^
+
+Migrating device state involves transferring the state from one
+back-end, called the source, to another back-end, called the
+destination. After migration, the destination transparently resumes
+operation without requiring the driver to re-initialize the device at
+the VIRTIO level. If the migration fails, then the source can
+transparently resume operation until another migration attempt is made.
+
+Generally, the front-end is connected to a virtual machine guest (which
+contains the driver), which has its own state to transfer between source
+and destination, and therefore will have an implementation-specific
+mechanism to do so. The ``VHOST_USER_PROTOCOL_F_DEVICE_STATE`` feature
+provides functionality to have the front-end include the back-end's
+state in this transfer operation so the back-end does not need to
+implement its own mechanism, and so the virtual machine may have its
+complete state, including vhost-user devices' states, contained within a
+single stream of data.
+
+To do this, the back-end state is transferred from back-end to front-end
+on the source side, and vice versa on the destination side. This
+transfer happens over a channel that is negotiated using the
+``VHOST_USER_SET_DEVICE_STATE_FD`` message. This message has two
+parameters:
+
+* Direction of transfer: On the source, the data is saved, transferring
+ it from the back-end to the front-end. On the destination, the data
+ is loaded, transferring it from the front-end to the back-end.
+
+* Migration phase: Currently, the only supported phase is the period
+ after the transfer of memory-mapped regions and before switch-over to
+ the destination, when both the source and destination devices are
+ suspended (see :ref:`Suspended device state <suspended_device_state>`).
+ In the future, additional phases might be supported to allow iterative
+ migration while the device is running.
+
+The nature of the channel is implementation-defined, but it must
+generally behave like a pipe: The writing end will write all the data it
+has into it, signalling the end of data by closing its end. The reading
+end must read all of this data (until encountering the end of file) and
+process it.
+
+* When saving, the writing end is the source back-end, and the reading
+ end is the source front-end. After reading the state data from the
+ channel, the source front-end must transfer it to the destination
+ front-end through an implementation-defined mechanism.
+
+* When loading, the writing end is the destination front-end, and the
+ reading end is the destination back-end. After reading the state data
+ from the channel, the destination back-end must deserialize its
+ internal state from that data and set itself up to allow the driver to
+ seamlessly resume operation on the VIRTIO level.
+
+Seamlessly resuming operation means that the migration must be
+transparent to the guest driver, which operates on the VIRTIO level.
+This driver will not perform any re-initialization steps, but continue
+to use the device as if no migration had occurred. The vhost-user
+front-end, however, will re-initialize the vhost state on the
+destination, following the usual protocol for establishing a connection
+to a vhost-user back-end: This includes, for example, setting up memory
+mappings and kick and call FDs as necessary, negotiating protocol
+features, or setting the initial vring base indices (to the same value
+as on the source side, so that operation can resume).
+
+On both the source and the destination side, after the respective
+front-end has seen all data transferred (when the transfer FD has been
+closed), it sends the ``VHOST_USER_CHECK_DEVICE_STATE`` message to
+verify that the data transfer was successful in the back-end, too. The
+back-end responds once it knows whether the transfer and processing were
+successful or not.
+
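The pipe-like behaviour required of the reading end (read until end of file, then process) could look roughly like the following sketch. It assumes a POSIX system and a blocking file descriptor; the function name ``read_whole_state`` and the choice to buffer the whole state in memory are assumptions made for illustration, not something the protocol prescribes.

```c
#include <errno.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical helper: read from fd until end of file.
 * On success, returns 0 and stores a malloc()ed buffer in *out and its
 * length in *out_len; the caller frees the buffer. Returns -1 on error.
 */
static int read_whole_state(int fd, unsigned char **out, size_t *out_len)
{
    unsigned char *buf = NULL;
    size_t len = 0, cap = 0;

    for (;;) {
        if (len == cap) {
            /* Grow the buffer geometrically when it is full. */
            size_t new_cap = cap ? cap * 2 : 4096;
            unsigned char *tmp = realloc(buf, new_cap);
            if (!tmp) {
                free(buf);
                return -1;
            }
            buf = tmp;
            cap = new_cap;
        }

        ssize_t n = read(fd, buf + len, cap - len);
        if (n < 0) {
            if (errno == EINTR) {
                continue; /* interrupted by a signal, retry */
            }
            free(buf);
            return -1;
        }
        if (n == 0) {
            break; /* end of file: the writing end closed the channel */
        }
        len += (size_t)n;
    }

    *out = buf;
    *out_len = len;
    return 0;
}
```

A real reading end may well prefer to process the data incrementally instead of buffering it; the point of the sketch is only that the end of the transfer is detected by ``read()`` returning 0, after which the front-end would go on to send ``VHOST_USER_CHECK_DEVICE_STATE``.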
Memory access
-------------
@@ -949,6 +1050,7 @@ Protocol features
#define VHOST_USER_PROTOCOL_F_STATUS 16
#define VHOST_USER_PROTOCOL_F_XEN_MMAP 17
#define VHOST_USER_PROTOCOL_F_SHARED_OBJECT 18
+ #define VHOST_USER_PROTOCOL_F_DEVICE_STATE 19
Front-end message types
-----------------------
@@ -1553,6 +1655,76 @@ Front-end message types
the requested UUID. Back-end will reply passing the fd when the operation
is successful, or no fd otherwise.
+``VHOST_USER_SET_DEVICE_STATE_FD``
+ :id: 42
+ :equivalent ioctl: N/A
+ :request payload: device state transfer parameters
+ :reply payload: ``u64``
+
+ Front-end and back-end negotiate a channel over which to transfer the
+ back-end's internal state during migration. Either side (front-end or
+ back-end) may create the channel. The nature of this channel is not
+ restricted or defined in this document, but whichever side creates it
+ must create a file descriptor that is provided to the other side,
+ allowing access to the channel. This FD must behave as
+ follows:
+
+ * For the writing end, it must allow writing the whole back-end state
+ sequentially. Closing the file descriptor signals the end of
+ transfer.
+
+ * For the reading end, it must allow reading the whole back-end state
+ sequentially. The end of file signals the end of the transfer.
+
+ For example, the channel may be a pipe, in which case the two ends of
+ the pipe fulfill these requirements respectively.
+
+ Initially, the front-end creates a channel along with such an FD. It
+ passes the FD to the back-end as ancillary data of a
+ ``VHOST_USER_SET_DEVICE_STATE_FD`` message. The back-end may create a
+ different transfer channel, passing the respective FD back to the
+ front-end as ancillary data of the reply. If so, the front-end must
+ then discard its channel and use the one provided by the back-end.
+
+ Whether the back-end should use its own channel depends on
+ efficiency: If the channel is a pipe, both ends will most
+ likely need to copy data into and out of it. Any channel that allows
+ for more efficient processing on at least one end, e.g. through
+ zero-copy, is considered more efficient and thus preferred. If the
+ back-end can provide such a channel, it should use it.
+
+ The request payload contains parameters for the subsequent data
+ transfer, as described in the :ref:`Migrating back-end state
+ <migrating_backend_state>` section.
+
+ The value returned indicates both success or failure and whether a
+ file descriptor for a back-end-provided channel is returned: Bits 0–7
+ are 0 on success, and non-zero on error. Bit 8 is the invalid FD
+ flag; this flag is set when there is no file descriptor returned.
+ When this flag is not set, the front-end must use the returned file
+ descriptor as its end of the transfer channel. The back-end must not
+ both indicate an error and return a file descriptor.
+
+ Using this function requires prior negotiation of the
+ ``VHOST_USER_PROTOCOL_F_DEVICE_STATE`` feature.
+
+``VHOST_USER_CHECK_DEVICE_STATE``
+ :id: 43
+ :equivalent ioctl: N/A
+ :request payload: N/A
+ :reply payload: ``u64``
+
+ After transferring the back-end's internal state during migration (see
+ the :ref:`Migrating back-end state <migrating_backend_state>`
+ section), check whether the back-end was able to process the state
+ fully and successfully.
+
+ The value returned indicates success or error; 0 is success, any
+ non-zero value is an error.
+
+ Using this function requires prior negotiation of the
+ ``VHOST_USER_PROTOCOL_F_DEVICE_STATE`` feature.
+
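The reply values of the two messages above could be interpreted along the lines of the following sketch. The macro, enum, and function names are hypothetical; the bit positions are taken from the text. Ignoring bits above bit 8 is an assumption of this sketch, since the text does not assign them a meaning.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names; bit layout as described for the u64 reply. */
#define DEVICE_STATE_REPLY_ERR_MASK  UINT64_C(0xff)      /* bits 0-7: non-zero on error */
#define DEVICE_STATE_REPLY_NOFD_FLAG (UINT64_C(1) << 8)  /* bit 8: no FD returned */

enum set_fd_outcome {
    SET_FD_ERROR,        /* back-end reported an error */
    SET_FD_KEEP_OWN,     /* success, no FD returned: keep the front-end's channel */
    SET_FD_USE_BACKEND,  /* success, FD returned: switch to the back-end's channel */
    SET_FD_INVALID,      /* error and FD together, which the back-end must not send */
};

/* Classify a VHOST_USER_SET_DEVICE_STATE_FD reply value. */
static enum set_fd_outcome classify_set_fd_reply(uint64_t reply)
{
    bool error = (reply & DEVICE_STATE_REPLY_ERR_MASK) != 0;
    bool has_fd = (reply & DEVICE_STATE_REPLY_NOFD_FLAG) == 0;

    if (error && has_fd) {
        return SET_FD_INVALID;
    }
    if (error) {
        return SET_FD_ERROR;
    }
    return has_fd ? SET_FD_USE_BACKEND : SET_FD_KEEP_OWN;
}

/* A VHOST_USER_CHECK_DEVICE_STATE reply is a success only if it is 0. */
static bool check_device_state_ok(uint64_t reply)
{
    return reply == 0;
}
```

For instance, a reply of ``0x100`` (no error bits, invalid FD flag set) would mean the front-end keeps using the channel it created, whereas a reply of ``0`` would mean it must switch to the file descriptor received as ancillary data.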
Back-end message types
----------------------