Message ID | 20220130160826.32449-11-yishaih@nvidia.com (mailing list archive) |
---|---|
State | New, archived |
Series | Add mlx5 live migration driver and v2 migration protocol |
On Sun, Jan 30 2022, Yishai Hadas <yishaih@nvidia.com> wrote:

> From: Jason Gunthorpe <jgg@nvidia.com>
>
> v1 was never implemented and is replaced by v2.
>
> The old uAPI definitions are removed from the header file. As per Linus's
> past remarks we do not have a hard requirement to retain compilation
> compatibility in uapi headers and qemu is already following Linus's
> preferred model of copying the kernel headers.

If we are all in agreement that we will replace v1 with v2 (and I think
we are), we probably should remove the x-enable-migration stuff in QEMU
sooner rather than later, to avoid leaving a trap for the next
unsuspecting person trying to update the headers.

>
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
> ---
>  include/uapi/linux/vfio.h | 228 --------------------------------------
>  1 file changed, 228 deletions(-)
>
> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> index 9efc35535b29..70c77da5812d 100644
> --- a/include/uapi/linux/vfio.h
> +++ b/include/uapi/linux/vfio.h
> @@ -323,7 +323,6 @@ struct vfio_region_info_cap_type {
>  #define VFIO_REGION_TYPE_PCI_VENDOR_MASK	(0xffff)
>  #define VFIO_REGION_TYPE_GFX	(1)
>  #define VFIO_REGION_TYPE_CCW	(2)
> -#define VFIO_REGION_TYPE_MIGRATION	(3)

Do we want to keep region type 3 reserved? Probably not really needed,
but would put us on the safe side.

On Tue, Feb 01, 2022 at 12:23:05PM +0100, Cornelia Huck wrote:
> On Sun, Jan 30 2022, Yishai Hadas <yishaih@nvidia.com> wrote:
>
> > From: Jason Gunthorpe <jgg@nvidia.com>
> >
> > v1 was never implemented and is replaced by v2.
> >
> > The old uAPI definitions are removed from the header file. As per Linus's
> > past remarks we do not have a hard requirement to retain compilation
> > compatibility in uapi headers and qemu is already following Linus's
> > preferred model of copying the kernel headers.
>
> If we are all in agreement that we will replace v1 with v2 (and I think
> we are), we probably should remove the x-enable-migration stuff in QEMU
> sooner rather than later, to avoid leaving a trap for the next
> unsuspecting person trying to update the headers.

Once we have agreement on the kernel patch we plan to send a QEMU
patch making it support the v2 interface and the migration
non-experimental. We are also working on fixing the error paths, at
least within the limitations of the current qemu design.

The v1 support should remain in old releases as it is being used in
the field "experimentally".

> > diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> > index 9efc35535b29..70c77da5812d 100644
> > +++ b/include/uapi/linux/vfio.h
> > @@ -323,7 +323,6 @@ struct vfio_region_info_cap_type {
> >  #define VFIO_REGION_TYPE_PCI_VENDOR_MASK	(0xffff)
> >  #define VFIO_REGION_TYPE_GFX	(1)
> >  #define VFIO_REGION_TYPE_CCW	(2)
> > -#define VFIO_REGION_TYPE_MIGRATION	(3)
>
> Do we want to keep region type 3 reserved? Probably not really needed,
> but would put us on the safe side.

Yes, thanks, this was too zealous to drop it

Jason

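As an editorial illustration of the reservation agreed on above, here is a minimal C sketch of keeping region type 3 defined so that the number is never handed out again. The `EXAMPLE_`/`example_` names are hypothetical and do not come from the thread or from the kernel header; only the GFX and CCW values are copied from the quoted hunk, and only those values are considered.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values copied from the quoted hunk of include/uapi/linux/vfio.h. */
#define VFIO_REGION_TYPE_GFX	(1)
#define VFIO_REGION_TYPE_CCW	(2)

/*
 * Hypothetical name (an assumption, not from the thread): keeps the old
 * v1 number defined so that no future region type is assigned value 3.
 */
#define EXAMPLE_REGION_TYPE_V1_MIGRATION_RESERVED	(3)

/*
 * True if the value could still be given to a new region type.
 * Only the values visible in the quoted hunk are considered here.
 */
static bool example_region_type_is_free(uint32_t type)
{
	switch (type) {
	case VFIO_REGION_TYPE_GFX:
	case VFIO_REGION_TYPE_CCW:
	case EXAMPLE_REGION_TYPE_V1_MIGRATION_RESERVED:
		return false;
	default:
		return true;
	}
}

/* Compile-time guard: the reserved slot must keep the value v1 used. */
_Static_assert(EXAMPLE_REGION_TYPE_V1_MIGRATION_RESERVED == 3,
	       "v1 migration region type must stay reserved as 3");
```

With such a definition in place, `example_region_type_is_free(3)` is false while `example_region_type_is_free(4)` is true, so a later addition would naturally start at 4 and old userspace probing for type 3 could not be confused by a new meaning.
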
On Tue, Feb 01 2022, Jason Gunthorpe <jgg@nvidia.com> wrote:

> On Tue, Feb 01, 2022 at 12:23:05PM +0100, Cornelia Huck wrote:
>> On Sun, Jan 30 2022, Yishai Hadas <yishaih@nvidia.com> wrote:
>>
>> > From: Jason Gunthorpe <jgg@nvidia.com>
>> >
>> > v1 was never implemented and is replaced by v2.
>> >
>> > The old uAPI definitions are removed from the header file. As per Linus's
>> > past remarks we do not have a hard requirement to retain compilation
>> > compatibility in uapi headers and qemu is already following Linus's
>> > preferred model of copying the kernel headers.
>>
>> If we are all in agreement that we will replace v1 with v2 (and I think
>> we are), we probably should remove the x-enable-migration stuff in QEMU
>> sooner rather than later, to avoid leaving a trap for the next
>> unsuspecting person trying to update the headers.
>
> Once we have agreement on the kernel patch we plan to send a QEMU
> patch making it support the v2 interface and the migration
> non-experimental. We are also working on fixing the error paths, at
> least within the limitations of the current qemu design.

I'd argue that just ripping out the old interface first would be easier,
as it does not require us to synchronize with a headers sync (and does
not require to synchronize a headers sync with ripping it out...)

> The v1 support should remain in old releases as it is being used in
> the field "experimentally".

Of course; it would be hard to rip it out retroactively :)

But it should really be gone in QEMU 7.0.

Considering adding the v2 uapi, we might get unlucky: The Linux 5.18
merge window will likely be in mid-late March (and we cannot run a
headers sync before the patches hit Linus' tree), while QEMU 7.0 will
likely enter freeze in mid-late March as well. So there's a non-zero
chance that the new uapi will need to be deferred to 7.1.

On Tue, Feb 01, 2022 at 01:39:23PM +0100, Cornelia Huck wrote:
> On Tue, Feb 01 2022, Jason Gunthorpe <jgg@nvidia.com> wrote:
>
> > On Tue, Feb 01, 2022 at 12:23:05PM +0100, Cornelia Huck wrote:
> >> On Sun, Jan 30 2022, Yishai Hadas <yishaih@nvidia.com> wrote:
> >>
> >> > From: Jason Gunthorpe <jgg@nvidia.com>
> >> >
> >> > v1 was never implemented and is replaced by v2.
> >> >
> >> > The old uAPI definitions are removed from the header file. As per Linus's
> >> > past remarks we do not have a hard requirement to retain compilation
> >> > compatibility in uapi headers and qemu is already following Linus's
> >> > preferred model of copying the kernel headers.
> >>
> >> If we are all in agreement that we will replace v1 with v2 (and I think
> >> we are), we probably should remove the x-enable-migration stuff in QEMU
> >> sooner rather than later, to avoid leaving a trap for the next
> >> unsuspecting person trying to update the headers.
> >
> > Once we have agreement on the kernel patch we plan to send a QEMU
> > patch making it support the v2 interface and the migration
> > non-experimental. We are also working on fixing the error paths, at
> > least within the limitations of the current qemu design.
>
> I'd argue that just ripping out the old interface first would be easier,
> as it does not require us to synchronize with a headers sync (and does
> not require to synchronize a headers sync with ripping it out...)

We haven't worked out the best way to organize the qemu patch series,
currently it is just one patch that updates everything together, but
that is perhaps a bit too big...

I have thought that a 3 patch series deleting the existing v1 code and
then readding it is a potential option, but we don't change
everything, just almost everything..

> > The v1 support should remain in old releases as it is being used in
> > the field "experimentally".
>
> Of course; it would be hard to rip it out retroactively :)
>
> But it should really be gone in QEMU 7.0.

Seems like you are arguing from both sides, we can't put the v2 in to
7.0 because Linus has not accepted it but we have to rip the v1 out
even though Linus hasn't accepted that?

We can certainly defer the kernels removal patch for a release if it
makes qemu's life easier?

> Considering adding the v2 uapi, we might get unlucky: The Linux 5.18
> merge window will likely be in mid-late March (and we cannot run a
> headers sync before the patches hit Linus' tree), while QEMU 7.0 will
> likely enter freeze in mid-late March as well. So there's a non-zero
> chance that the new uapi will need to be deferred to 7.1.

Usually in rdma land we start advancing the user side once the kernel
patches hit the kernel maintainer tree, not Linus's. I run a
non-rebasing tree so that gives a permanent git hash. It works well
enough and avoids these kinds of artificial delays.

Anyhow, it doesn't matter much for the kernel series, but the sooner
we can agree on this the better, I suppose.

Jason

On Tue, Feb 01 2022, Jason Gunthorpe <jgg@nvidia.com> wrote:

> On Tue, Feb 01, 2022 at 01:39:23PM +0100, Cornelia Huck wrote:
>> On Tue, Feb 01 2022, Jason Gunthorpe <jgg@nvidia.com> wrote:
>>
>> > On Tue, Feb 01, 2022 at 12:23:05PM +0100, Cornelia Huck wrote:
>> >> On Sun, Jan 30 2022, Yishai Hadas <yishaih@nvidia.com> wrote:
>> >>
>> >> > From: Jason Gunthorpe <jgg@nvidia.com>
>> >> >
>> >> > v1 was never implemented and is replaced by v2.
>> >> >
>> >> > The old uAPI definitions are removed from the header file. As per Linus's
>> >> > past remarks we do not have a hard requirement to retain compilation
>> >> > compatibility in uapi headers and qemu is already following Linus's
>> >> > preferred model of copying the kernel headers.
>> >>
>> >> If we are all in agreement that we will replace v1 with v2 (and I think
>> >> we are), we probably should remove the x-enable-migration stuff in QEMU
>> >> sooner rather than later, to avoid leaving a trap for the next
>> >> unsuspecting person trying to update the headers.
>> >
>> > Once we have agreement on the kernel patch we plan to send a QEMU
>> > patch making it support the v2 interface and the migration
>> > non-experimental. We are also working on fixing the error paths, at
>> > least within the limitations of the current qemu design.
>>
>> I'd argue that just ripping out the old interface first would be easier,
>> as it does not require us to synchronize with a headers sync (and does
>> not require to synchronize a headers sync with ripping it out...)
>
> We haven't worked out the best way to organize the qemu patch series,
> currently it is just one patch that updates everything together, but
> that is perhaps a bit too big...
>
> I have thought that a 3 patch series deleting the existing v1 code and
> then readding it is a potential option, but we don't change
> everything, just almost everything..

Even in that case, removing the old code and adding the new one is
probably much easier to review.

(Also, you obviously need to have the header update in between those
two stages.)

>
>> > The v1 support should remain in old releases as it is being used in
>> > the field "experimentally".
>>
>> Of course; it would be hard to rip it out retroactively :)
>>
>> But it should really be gone in QEMU 7.0.
>
> Seems like you are arguing from both sides, we can't put the v2 in to
> 7.0 because Linus has not accepted it but we have to rip the v1 out
> even though Linus hasn't accepted that?
>
> We can certainly defer the kernels removal patch for a release if it
> makes qemu's life easier?

No, I'm only talking about the QEMU implementation (i.e. the code that
uses the v1 definitions and exposes x-enable-migration). Any change in
the headers needs to be done via a sync with upstream Linux.

>
>> Considering adding the v2 uapi, we might get unlucky: The Linux 5.18
>> merge window will likely be in mid-late March (and we cannot run a
>> headers sync before the patches hit Linus' tree), while QEMU 7.0 will
>> likely enter freeze in mid-late March as well. So there's a non-zero
>> chance that the new uapi will need to be deferred to 7.1.
>
> Usually in rdma land we start advancing the user side once the kernel
> patches hit the kernel maintainer tree, not Linus's. I run a
> non-rebasing tree so that gives a permanent git hash. It works well
> enough and avoids these kinds of artificial delays.

QEMU policy is "it must be in Linus' tree [*]", because we run a full
header sync. We have been bitten by premature updates in the past.
Updates of only parts of the headers are only acceptable during
development of a patch series, and must be marked as "will be replaced
with a proper header sync".

[*] Preferably a (full or -rc) release, but the very minimum is a git
hash from his tree.

On Tue, Feb 01, 2022 at 02:26:29PM +0100, Cornelia Huck wrote:

> > We can certainly defer the kernels removal patch for a release if it
> > makes qemu's life easier?
>
> No, I'm only talking about the QEMU implementation (i.e. the code that
> uses the v1 definitions and exposes x-enable-migration). Any change in
> the headers needs to be done via a sync with upstream Linux.

If we leave the v1 and v2 defs in the kernel header then qemu can sync
and do the trivial rename and keep going as-is.

Then we can come with the patches to qemu update to v2, however that
looks.

We'll clean the kernel header in the next cycle.

OK?

Jason

On Tue, Feb 01 2022, Jason Gunthorpe <jgg@nvidia.com> wrote:

> On Tue, Feb 01, 2022 at 02:26:29PM +0100, Cornelia Huck wrote:
>
>> > We can certainly defer the kernels removal patch for a release if it
>> > makes qemu's life easier?
>>
>> No, I'm only talking about the QEMU implementation (i.e. the code that
>> uses the v1 definitions and exposes x-enable-migration). Any change in
>> the headers needs to be done via a sync with upstream Linux.
>
> If we leave the v1 and v2 defs in the kernel header then qemu can sync
> and do the trivial rename and keep going as-is.
>
> Then we can come with the patches to qemu update to v2, however that
> looks.
>
> We'll clean the kernel header in the next cycle.

I'm not sure we're talking about the same things here...

My proposal is:

- remove the current QEMU implementation of vfio migration for 7.0 (it's
  experimental, and if there's anybody experimenting with that, they can
  stay on 6.2)
- continue with getting this proposal for the kernel into good shape, so
  that it can hopefully make the next merge window
  (- also continue to get the documentation into good shape)
- have an RFC for QEMU that contains a provisional update of the
  relevant vfio headers so that we can discuss the QEMU side (and maybe
  shoot down any potential problems in the uapi before they are merged
  in the kernel)

I don't think a "dual version header" would really help here.

If we don't want to rip out the old QEMU implementation yet, I can
certainly also live with that. We just need to be mindful once the
changes hit Linus' tree, but it is quite likely that QEMU would be in
freeze by then. As long as updating the headers leads to an obvious
failure, it's manageable (although the removal would still be my
preferred approach.)

Alex, what do you think?

On Tue, Feb 01, 2022 at 03:19:18PM +0100, Cornelia Huck wrote:

> - remove the current QEMU implementation of vfio migration for 7.0 (it's
>   experimental, and if there's anybody experimenting with that, they can
>   stay on 6.2)

I think we went from "we clarified how the ABI works and made
something ABI compatible with qemu" to "let's delete the whole thing
from a released qemu" rather quickly..

To be clear this is still all logically compatible with the v1
interface, and we might, or might not, want to use the ABI compatible
version we already built out of tree to support the existing installed
base of qemu.

Dropping the whole thing seems to only make things worse for this
ecosystem, IMHO.

>   (- also continue to get the documentation into good shape)

Which items do you see here?

> - have an RFC for QEMU that contains a provisional update of the
>   relevant vfio headers so that we can discuss the QEMU side (and maybe
>   shoot down any potential problems in the uapi before they are merged
>   in the kernel)

This qemu patch is linked in the cover letter.

Jason

On Tue, 01 Feb 2022 13:39:23 +0100
Cornelia Huck <cohuck@redhat.com> wrote:

> On Tue, Feb 01 2022, Jason Gunthorpe <jgg@nvidia.com> wrote:
>
> > On Tue, Feb 01, 2022 at 12:23:05PM +0100, Cornelia Huck wrote:
> >> On Sun, Jan 30 2022, Yishai Hadas <yishaih@nvidia.com> wrote:
> >>
> >> > From: Jason Gunthorpe <jgg@nvidia.com>
> >> >
> >> > v1 was never implemented and is replaced by v2.
> >> >
> >> > The old uAPI definitions are removed from the header file. As per Linus's
> >> > past remarks we do not have a hard requirement to retain compilation
> >> > compatibility in uapi headers and qemu is already following Linus's
> >> > preferred model of copying the kernel headers.
> >>
> >> If we are all in agreement that we will replace v1 with v2 (and I think
> >> we are), we probably should remove the x-enable-migration stuff in QEMU
> >> sooner rather than later, to avoid leaving a trap for the next
> >> unsuspecting person trying to update the headers.
> >
> > Once we have agreement on the kernel patch we plan to send a QEMU
> > patch making it support the v2 interface and the migration
> > non-experimental. We are also working on fixing the error paths, at
> > least within the limitations of the current qemu design.
>
> I'd argue that just ripping out the old interface first would be easier,
> as it does not require us to synchronize with a headers sync (and does
> not require to synchronize a headers sync with ripping it out...)
>
> > The v1 support should remain in old releases as it is being used in
> > the field "experimentally".
>
> Of course; it would be hard to rip it out retroactively :)
>
> But it should really be gone in QEMU 7.0.
>
> Considering adding the v2 uapi, we might get unlucky: The Linux 5.18
> merge window will likely be in mid-late March (and we cannot run a
> headers sync before the patches hit Linus' tree), while QEMU 7.0 will
> likely enter freeze in mid-late March as well. So there's a non-zero
> chance that the new uapi will need to be deferred to 7.1.

Agreed that v1 migration TYPE/SUBTYPE should live in infamy as
reserved, but I'm not sure why we need to make the rest of it a big
complicated problem. On one hand, leaving stubs for the necessary
structure and macros until QEMU gets updated doesn't seem so terrible.
Nor actually does letting the next QEMU header update cause build
breakages, which would probably frustrate the person submitting that
update, but it's not like QEMU hasn't done selective header updates in
the past. The former is probably the more friendly approach if we
don't outrage someone in the kernel community in the meantime. Thanks,

Alex

On Tue, Feb 01, 2022 at 04:01:06PM -0700, Alex Williamson wrote:

> Agreed that v1 migration TYPE/SUBTYPE should live in infamy as
> reserved, but I'm not sure why we need to make the rest of it a big
> complicated problem. On one hand, leaving stubs for the necessary
> structure and macros until QEMU gets updated doesn't seem so terrible.
> Nor actually does letting the next QEMU header update cause build
> breakages, which would probably frustrate the person submitting that
> update, but it's not like QEMU hasn't done selective header updates in
> the past. The former is probably the more friendly approach if we
> don't outrage someone in the kernel community in the meantime.

So let's drop the removal patch and keep the V1 rename, it is easy for
qemu to follow along with this.

Sometime later we can purge all the dead things from the header, e.g.
the POWERNV stuff we left behind last year as well.

Thanks,
Jason

On Tue, Feb 01 2022, Jason Gunthorpe <jgg@nvidia.com> wrote:

> On Tue, Feb 01, 2022 at 03:19:18PM +0100, Cornelia Huck wrote:
>>   (- also continue to get the documentation into good shape)
>
> Which items do you see here?

Well, it still needs to be updated, no?

>
>> - have an RFC for QEMU that contains a provisional update of the
>>   relevant vfio headers so that we can discuss the QEMU side (and maybe
>>   shoot down any potential problems in the uapi before they are merged
>>   in the kernel)
>
> This qemu patch is linked in the cover letter.

The QEMU changes need to be discussed on qemu-devel, a link to a git
tree with work in progress only goes so far.

(From my quick look there, this needs to have any headers changes split
out into a separate patch. The changes in migration.c are hard to
review; is there any chance to split the error path cleanups from the
interface changes?)

On Tue, Feb 01 2022, Alex Williamson <alex.williamson@redhat.com> wrote:

> On Tue, 01 Feb 2022 13:39:23 +0100
> Cornelia Huck <cohuck@redhat.com> wrote:
>
>> On Tue, Feb 01 2022, Jason Gunthorpe <jgg@nvidia.com> wrote:
>>
>> > On Tue, Feb 01, 2022 at 12:23:05PM +0100, Cornelia Huck wrote:
>> >> On Sun, Jan 30 2022, Yishai Hadas <yishaih@nvidia.com> wrote:
>> >>
>> >> > From: Jason Gunthorpe <jgg@nvidia.com>
>> >> >
>> >> > v1 was never implemented and is replaced by v2.
>> >> >
>> >> > The old uAPI definitions are removed from the header file. As per Linus's
>> >> > past remarks we do not have a hard requirement to retain compilation
>> >> > compatibility in uapi headers and qemu is already following Linus's
>> >> > preferred model of copying the kernel headers.
>> >>
>> >> If we are all in agreement that we will replace v1 with v2 (and I think
>> >> we are), we probably should remove the x-enable-migration stuff in QEMU
>> >> sooner rather than later, to avoid leaving a trap for the next
>> >> unsuspecting person trying to update the headers.
>> >
>> > Once we have agreement on the kernel patch we plan to send a QEMU
>> > patch making it support the v2 interface and the migration
>> > non-experimental. We are also working on fixing the error paths, at
>> > least within the limitations of the current qemu design.
>>
>> I'd argue that just ripping out the old interface first would be easier,
>> as it does not require us to synchronize with a headers sync (and does
>> not require to synchronize a headers sync with ripping it out...)
>>
>> > The v1 support should remain in old releases as it is being used in
>> > the field "experimentally".
>>
>> Of course; it would be hard to rip it out retroactively :)
>>
>> But it should really be gone in QEMU 7.0.
>>
>> Considering adding the v2 uapi, we might get unlucky: The Linux 5.18
>> merge window will likely be in mid-late March (and we cannot run a
>> headers sync before the patches hit Linus' tree), while QEMU 7.0 will
>> likely enter freeze in mid-late March as well. So there's a non-zero
>> chance that the new uapi will need to be deferred to 7.1.
>
>
> Agreed that v1 migration TYPE/SUBTYPE should live in infamy as
> reserved, but I'm not sure why we need to make the rest of it a big
> complicated problem. On one hand, leaving stubs for the necessary
> structure and macros until QEMU gets updated doesn't seem so terrible.
> Nor actually does letting the next QEMU header update cause build
> breakages, which would probably frustrate the person submitting that
> update, but it's not like QEMU hasn't done selective header updates in
> the past. The former is probably the more friendly approach if we
> don't outrage someone in the kernel community in the meantime.

Leaving stubs in (while making it clear that v1 is not something you
should use) seems like a good compromise. While we have done selective
headers updates in QEMU in the past, I always found them painful, so I'd
like to avoid that.

On Wed, Feb 02, 2022 at 12:34:31PM +0100, Cornelia Huck wrote:
> On Tue, Feb 01 2022, Jason Gunthorpe <jgg@nvidia.com> wrote:
>
> > On Tue, Feb 01, 2022 at 03:19:18PM +0100, Cornelia Huck wrote:
> >> - have an RFC for QEMU that contains a provisional update of the
> >>   relevant vfio headers so that we can discuss the QEMU side (and maybe
> >>   shoot down any potential problems in the uapi before they are merged
> >>   in the kernel)
> >
> > This qemu patch is linked in the cover letter.
>
> The QEMU changes need to be discussed on qemu-devel, a link to a git
> tree with work in progress only goes so far.

Of course, but we are not going to bother the qemu community until the
kernel side is settled.

> (From my quick look there, this needs to have any headers changes split
> out into a separate patch. The changes in migration.c are hard to
> review; is there any chance to split the error path cleanups from the
> interface changes?)

We can do whatever, once we figure out what it actually needs to look
like. Rip and replace might be the best option.

Jason

diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 9efc35535b29..70c77da5812d 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -323,7 +323,6 @@ struct vfio_region_info_cap_type {
 #define VFIO_REGION_TYPE_PCI_VENDOR_MASK	(0xffff)
 #define VFIO_REGION_TYPE_GFX	(1)
 #define VFIO_REGION_TYPE_CCW	(2)
-#define VFIO_REGION_TYPE_MIGRATION	(3)
 
 /* sub-types for VFIO_REGION_TYPE_PCI_* */
 
@@ -404,233 +403,6 @@ struct vfio_region_gfx_edid {
 #define VFIO_REGION_SUBTYPE_CCW_SCHIB	(2)
 #define VFIO_REGION_SUBTYPE_CCW_CRW	(3)
 
-/* sub-types for VFIO_REGION_TYPE_MIGRATION */
-#define VFIO_REGION_SUBTYPE_MIGRATION	(1)
-
-/*
- * The structure vfio_device_migration_info is placed at the 0th offset of
- * the VFIO_REGION_SUBTYPE_MIGRATION region to get and set VFIO device related
- * migration information. Field accesses from this structure are only supported
- * at their native width and alignment. Otherwise, the result is undefined and
- * vendor drivers should return an error.
- *
- * device_state: (read/write)
- *      - The user application writes to this field to inform the vendor driver
- *        about the device state to be transitioned to.
- *      - The vendor driver should take the necessary actions to change the
- *        device state. After successful transition to a given state, the
- *        vendor driver should return success on write(device_state, state)
- *        system call. If the device state transition fails, the vendor driver
- *        should return an appropriate -errno for the fault condition.
- *      - On the user application side, if the device state transition fails,
- *        that is, if write(device_state, state) returns an error, read
- *        device_state again to determine the current state of the device from
- *        the vendor driver.
- *      - The vendor driver should return previous state of the device unless
- *        the vendor driver has encountered an internal error, in which case
- *        the vendor driver may report the device_state VFIO_DEVICE_STATE_ERROR.
- *      - The user application must use the device reset ioctl to recover the
- *        device from VFIO_DEVICE_STATE_ERROR state. If the device is
- *        indicated to be in a valid device state by reading device_state, the
- *        user application may attempt to transition the device to any valid
- *        state reachable from the current state or terminate itself.
- *
- *      device_state consists of 3 bits:
- *      - If bit 0 is set, it indicates the _RUNNING state. If bit 0 is clear,
- *        it indicates the _STOP state. When the device state is changed to
- *        _STOP, driver should stop the device before write() returns.
- *      - If bit 1 is set, it indicates the _SAVING state, which means that the
- *        driver should start gathering device state information that will be
- *        provided to the VFIO user application to save the device's state.
- *      - If bit 2 is set, it indicates the _RESUMING state, which means that
- *        the driver should prepare to resume the device. Data provided through
- *        the migration region should be used to resume the device.
- *      Bits 3 - 31 are reserved for future use. To preserve them, the user
- *      application should perform a read-modify-write operation on this
- *      field when modifying the specified bits.
- *
- *  +------- _RESUMING
- *  |+------ _SAVING
- *  ||+----- _RUNNING
- *  |||
- *  000b => Device Stopped, not saving or resuming
- *  001b => Device running, which is the default state
- *  010b => Stop the device & save the device state, stop-and-copy state
- *  011b => Device running and save the device state, pre-copy state
- *  100b => Device stopped and the device state is resuming
- *  101b => Invalid state
- *  110b => Error state
- *  111b => Invalid state
- *
- * State transitions:
- *
- *              _RESUMING  _RUNNING     Pre-copy    Stop-and-copy   _STOP
- *                (100b)     (001b)       (011b)        (010b)      (000b)
- * 0. Running or default state
- *                             |
- *
- * 1. Normal Shutdown (optional)
- *                             |------------------------------------->|
- *
- * 2. Save the state or suspend
- *                             |------------------------->|---------->|
- *
- * 3. Save the state during live migration
- *                             |----------->|------------>|---------->|
- *
- * 4. Resuming
- *                  |<---------|
- *
- * 5. Resumed
- *                  |--------->|
- *
- * 0. Default state of VFIO device is _RUNNING when the user application starts.
- * 1. During normal shutdown of the user application, the user application may
- *    optionally change the VFIO device state from _RUNNING to _STOP. This
- *    transition is optional. The vendor driver must support this transition but
- *    must not require it.
- * 2. When the user application saves state or suspends the application, the
- *    device state transitions from _RUNNING to stop-and-copy and then to _STOP.
- *    On state transition from _RUNNING to stop-and-copy, driver must stop the
- *    device, save the device state and send it to the application through the
- *    migration region. The sequence to be followed for such transition is given
- *    below.
- * 3. In live migration of user application, the state transitions from _RUNNING
- *    to pre-copy, to stop-and-copy, and to _STOP.
- *    On state transition from _RUNNING to pre-copy, the driver should start
- *    gathering the device state while the application is still running and send
- *    the device state data to application through the migration region.
- *    On state transition from pre-copy to stop-and-copy, the driver must stop
- *    the device, save the device state and send it to the user application
- *    through the migration region.
- *    Vendor drivers must support the pre-copy state even for implementations
- *    where no data is provided to the user before the stop-and-copy state. The
- *    user must not be required to consume all migration data before the device
- *    transitions to a new state, including the stop-and-copy state.
- *    The sequence to be followed for above two transitions is given below.
- * 4. To start the resuming phase, the device state should be transitioned from
- *    the _RUNNING to the _RESUMING state.
- *    In the _RESUMING state, the driver should use the device state data
- *    received through the migration region to resume the device.
- * 5. After providing saved device data to the driver, the application should
- *    change the state from _RESUMING to _RUNNING.
- *
- * reserved:
- *      Reads on this field return zero and writes are ignored.
- *
- * pending_bytes: (read only)
- *      The number of pending bytes still to be migrated from the vendor driver.
- *
- * data_offset: (read only)
- *      The user application should read data_offset field from the migration
- *      region. The user application should read the device data from this
- *      offset within the migration region during the _SAVING state or write
- *      the device data during the _RESUMING state. See below for details of
- *      sequence to be followed.
- *
- * data_size: (read/write)
- *      The user application should read data_size to get the size in bytes of
- *      the data copied in the migration region during the _SAVING state and
- *      write the size in bytes of the data copied in the migration region
- *      during the _RESUMING state.
- *
- * The format of the migration region is as follows:
- *  ------------------------------------------------------------------
- * |vfio_device_migration_info|    data section                      |
- * |                          |     ///////////////////////////////  |
- * ------------------------------------------------------------------
- *   ^                              ^
- *  offset 0-trapped part        data_offset
- *
- * The structure vfio_device_migration_info is always followed by the data
- * section in the region, so data_offset will always be nonzero. The offset
- * from where the data is copied is decided by the kernel driver. The data
- * section can be trapped, mmapped, or partitioned, depending on how the kernel
- * driver defines the data section. The data section partition can be defined
- * as mapped by the sparse mmap capability. If mmapped, data_offset must be
- * page aligned, whereas initial section which contains the
- * vfio_device_migration_info structure, might not end at the offset, which is
- * page aligned. The user is not required to access through mmap regardless
- * of the capabilities of the region mmap.
- * The vendor driver should determine whether and how to partition the data
- * section. The vendor driver should return data_offset accordingly.
- *
- * The sequence to be followed while in pre-copy state and stop-and-copy state
- * is as follows:
- * a. Read pending_bytes, indicating the start of a new iteration to get device
- *    data. Repeated read on pending_bytes at this stage should have no side
- *    effects.
- *    If pending_bytes == 0, the user application should not iterate to get data
- *    for that device.
- *    If pending_bytes > 0, perform the following steps.
- * b. Read data_offset, indicating that the vendor driver should make data
- *    available through the data section. The vendor driver should return this
- *    read operation only after data is available from (region + data_offset)
- *    to (region + data_offset + data_size).
- * c. Read data_size, which is the amount of data in bytes available through
- *    the migration region.
- *    Read on data_offset and data_size should return the offset and size of
- *    the current buffer if the user application reads data_offset and
- *    data_size more than once here.
- * d. Read data_size bytes of data from (region + data_offset) from the
- *    migration region.
- * e. Process the data.
- * f. Read pending_bytes, which indicates that the data from the previous
- *    iteration has been read. If pending_bytes > 0, go to step b.
- *
- * The user application can transition from the _SAVING|_RUNNING
- * (pre-copy state) to the _SAVING (stop-and-copy) state regardless of the
- * number of pending bytes. The user application should iterate in _SAVING
- * (stop-and-copy) until pending_bytes is 0.
- *
- * The sequence to be followed while _RESUMING device state is as follows:
- * While data for this device is available, repeat the following steps:
- * a. Read data_offset from where the user application should write data.
- * b. Write migration data starting at the migration region + data_offset for
- *    the length determined by data_size from the migration source.
- * c. Write data_size, which indicates to the vendor driver that data is
- *    written in the migration region. Vendor driver must return this write
- *    operations on consuming data. Vendor driver should apply the
- *    user-provided migration region data to the device resume state.
- *
- * If an error occurs during the above sequences, the vendor driver can return
- * an error code for next read() or write() operation, which will terminate the
- * loop. The user application should then take the next necessary action, for
- * example, failing migration or terminating the user application.
- *
- * For the user application, data is opaque. The user application should write
- * data in the same order as the data is received and the data should be of
- * same transaction size at the source.
- */
-
-struct vfio_device_migration_info {
-	__u32 device_state;         /* VFIO device state */
-#define VFIO_DEVICE_STATE_V1_STOP      (0)
-#define VFIO_DEVICE_STATE_V1_RUNNING   (1 << 0)
-#define VFIO_DEVICE_STATE_V1_SAVING    (1 << 1)
-#define VFIO_DEVICE_STATE_V1_RESUMING  (1 << 2)
-#define VFIO_DEVICE_STATE_MASK      (VFIO_DEVICE_STATE_RUNNING | \
-				     VFIO_DEVICE_STATE_SAVING |  \
-				     VFIO_DEVICE_STATE_RESUMING)
-
-#define VFIO_DEVICE_STATE_VALID(state) \
-	(state & VFIO_DEVICE_STATE_RESUMING ? \
-	(state & VFIO_DEVICE_STATE_MASK) == VFIO_DEVICE_STATE_RESUMING : 1)
-
-#define VFIO_DEVICE_STATE_IS_ERROR(state) \
-	((state & VFIO_DEVICE_STATE_MASK) == (VFIO_DEVICE_STATE_SAVING | \
-					      VFIO_DEVICE_STATE_RESUMING))
-
-#define VFIO_DEVICE_STATE_SET_ERROR(state) \
-	((state & ~VFIO_DEVICE_STATE_MASK) | VFIO_DEVICE_SATE_SAVING | \
-					     VFIO_DEVICE_STATE_RESUMING)
-
-	__u32 reserved;
-	__u64 pending_bytes;
-	__u64 data_offset;
-	__u64 data_size;
-};
-
 /*
  * The MSIX mappable capability informs that MSIX data of a BAR can be mmapped
  * which allows direct access to non-MSIX registers which happened to be within
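For readers following the removed v1 comment, the 3-bit `device_state` table and the validity and error checks can be sketched in standalone C. This is an editorial illustration, not code from the patch: the `EX_V1_*` and `ex_v1_*` names are hypothetical, the bit positions are taken from the removed `VFIO_DEVICE_STATE_V1_*` defines, and the predicates mirror the logic of the removed `VFIO_DEVICE_STATE_VALID` and `VFIO_DEVICE_STATE_IS_ERROR` macros as written in the hunk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical local names; bit positions as in the removed v1 defines. */
#define EX_V1_RUNNING	(1u << 0)
#define EX_V1_SAVING	(1u << 1)
#define EX_V1_RESUMING	(1u << 2)
#define EX_V1_MASK	(EX_V1_RUNNING | EX_V1_SAVING | EX_V1_RESUMING)

/* Error state is _SAVING | _RESUMING (110b); reserved bits are ignored. */
static bool ex_v1_state_is_error(uint32_t state)
{
	return (state & EX_V1_MASK) == (EX_V1_SAVING | EX_V1_RESUMING);
}

/*
 * Same rule as the removed VALID macro: once _RESUMING is set, no other
 * state bit may be set. Note that this also rejects the error state 110b.
 */
static bool ex_v1_state_is_valid(uint32_t state)
{
	if (state & EX_V1_RESUMING)
		return (state & EX_V1_MASK) == EX_V1_RESUMING;
	return true;
}

/* Map the low three bits to the names used in the removed state table. */
static const char *ex_v1_state_name(uint32_t state)
{
	switch (state & EX_V1_MASK) {
	case 0:
		return "stopped";
	case EX_V1_RUNNING:
		return "running";
	case EX_V1_SAVING:
		return "stop-and-copy";
	case EX_V1_RUNNING | EX_V1_SAVING:
		return "pre-copy";
	case EX_V1_RESUMING:
		return "resuming";
	case EX_V1_SAVING | EX_V1_RESUMING:
		return "error";
	default:
		return "invalid";	/* 101b and 111b */
	}
}
```

Because bits 3 to 31 are reserved, the helpers mask them off before comparing, so a value such as `0x9` is still treated as running. A caller that writes `device_state` would, per the removed comment, do a read-modify-write to preserve those reserved bits.
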