Message ID | 20221201152931.47913-1-yishaih@nvidia.com |
---|---|
Series | Add migration PRE_COPY support for mlx5 driver |
On Thu, Dec 01, 2022 at 05:29:17PM +0200, Yishai Hadas wrote:
>
> Jason Gunthorpe (1):
>   vfio: Extend the device migration protocol with PRE_COPY
>
> Shay Drory (3):
>   net/mlx5: Introduce ifc bits for pre_copy
>   vfio/mlx5: Fallback to STOP_COPY upon specific PRE_COPY error
>   vfio/mlx5: Enable MIGRATION_PRE_COPY flag
>
> Yishai Hadas (10):
>   vfio/mlx5: Enforce a single SAVE command at a time
>   vfio/mlx5: Refactor PD usage
>   vfio/mlx5: Refactor MKEY usage
>   vfio/mlx5: Refactor migration file state
>   vfio/mlx5: Refactor to use queue based data chunks
>   vfio/mlx5: Introduce device transitions of PRE_COPY
>   vfio/mlx5: Introduce SW headers for migration states
>   vfio/mlx5: Introduce vfio precopy ioctl implementation
>   vfio/mlx5: Consider temporary end of stream as part of PRE_COPY
>   vfio/mlx5: Introduce multiple loads

This looks OK to me now, the logic is clear

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>

Thanks,
Jason
> From: Yishai Hadas <yishaih@nvidia.com>
> Sent: Thursday, December 1, 2022 11:29 PM
>
> In mlx5 driver we could gain with this series about 20-30 percent
> improvement in the downtime compared to the previous code when
> PRE_COPY wasn't supported.
>

Curious to see more data here.

what is the workload/configuration?

What is the size of the full state and downtime w/o PRECOPY?

with PRECOPY what is the size of initial/middle/final states?
On 02/12/2022 10:57, Tian, Kevin wrote:
>> From: Yishai Hadas <yishaih@nvidia.com>
>> Sent: Thursday, December 1, 2022 11:29 PM
>>
>> In mlx5 driver we could gain with this series about 20-30 percent
>> improvement in the downtime compared to the previous code when
>> PRE_COPY wasn't supported.
>>
> Curious to see more data here.
>
> what is the workload/configuration?

We tested with multiple workloads that varied in the number of allocated
resources, the number of VFs on the VM, whether the device was busy or
idle depending on traffic running in the background, etc.

>
> What is the size of the full state and downtime w/o PRECOPY?

It depends on the amount of resources that were already allocated at the
time of the migration, and on the other workload parameters mentioned
above. The downtime gain was mainly achieved by sending the
initial/middle states, which carry the metadata, during PRE_COPY
regardless of their size.

>
> with PRECOPY what is the size of initial/middle/final states?

Generally speaking, the initial state may include metadata on the current
state, the middle states may hold a 'diff' relative to the
initial/previous ones and in most cases may be smaller, and the final
state holds the data itself and may be larger.

Yishai
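The downtime argument in the reply above can be illustrated with a toy model. This is a sketch under stated assumptions, not the mlx5 or VFIO implementation: the device state is split into hypothetical `metadata_bytes`, sent as the initial PRE_COPY chunk with per-round diffs as the middle chunks, and `data_bytes`, which only goes out in the final STOP_COPY chunk. The number of bytes transferred during STOP_COPY serves as a rough proxy for downtime. All type names, fields, and sizes are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy model, NOT the mlx5/VFIO implementation: a device's
 * migration state is split into metadata that can be streamed while the
 * VM runs and data that is only complete once the device is stopped.
 * Sizes are arbitrary units. */
struct toy_device {
    size_t metadata_bytes;  /* describes the currently allocated resources */
    size_t data_bytes;      /* only final after the device stops */
    size_t dirty_per_round; /* metadata the guest changes per PRE_COPY round */
};

struct toy_stream {
    size_t precopy_bytes;   /* sent while the VM is still running */
    size_t stop_copy_bytes; /* sent while the VM is stopped (downtime proxy) */
};

/* Without PRE_COPY, the whole state is sent during STOP_COPY. */
static struct toy_stream migrate_without_precopy(const struct toy_device *d)
{
    struct toy_stream s = {
        .precopy_bytes = 0,
        .stop_copy_bytes = d->metadata_bytes + d->data_bytes,
    };
    return s;
}

/* With PRE_COPY: an initial chunk carrying the full metadata, then `rounds`
 * middle chunks that each carry only the diff since the previous chunk,
 * and a final STOP_COPY chunk with the last diff plus the data. */
static struct toy_stream migrate_with_precopy(const struct toy_device *d,
                                              unsigned int rounds)
{
    /* A diff cannot be larger than the metadata it describes. */
    assert(d->dirty_per_round <= d->metadata_bytes);

    struct toy_stream s = { .precopy_bytes = d->metadata_bytes }; /* initial */
    for (unsigned int i = 0; i < rounds; i++)
        s.precopy_bytes += d->dirty_per_round;                  /* middle */
    s.stop_copy_bytes = d->dirty_per_round + d->data_bytes;     /* final */
    return s;
}
```

For example, with `metadata_bytes = 800`, `data_bytes = 1200`, `dirty_per_round = 50` and three middle rounds (arbitrary units), `migrate_without_precopy` puts 2000 bytes into STOP_COPY. Under the same inputs, `migrate_with_precopy` puts only 1250 there, having already moved 950 while the VM was running. The model ignores bandwidth, device save time, and how real diffs are computed. It only shows the direction of the effect and says nothing about the 20-30 percent figure quoted in the thread.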