[vfio,0/9] Add chunk mode support for mlx5 driver

Message ID 20230911093856.81910-1-yishaih@nvidia.com (mailing list archive)

Message

Yishai Hadas Sept. 11, 2023, 9:38 a.m. UTC
This series adds 'chunk mode' support to the migration flow of the mlx5
driver.

Before this series, we were limited to a 4GB state size, because the
device specification defines the size in the query/save/load commands as
a 4-byte value.
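
As a small illustration of that limit (a sketch only; the helper name is
hypothetical and is not taken from the driver), a 4-byte unsigned size
field can express at most 2^32 - 1 bytes, so any larger state cannot be
described by it:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper, for illustration only: returns true when a state
 * size can be expressed in a 4-byte (32-bit unsigned) size field.
 */
static bool state_size_fits_u32(uint64_t state_size)
{
	return state_size <= UINT32_MAX;
}
```

A state of exactly 4GB (2^32 bytes) already fails this check, which is
why a different mechanism is needed for larger states.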

Once the device supports 'chunk mode', the driver can support a state
size larger than 4GB.

In that case, the device can split a single image into multiple chunks,
as long as the software provides a buffer of at least the minimum size
reported by the device.

The driver should query for the minimum required buffer size using the
QUERY_VHCA_MIGRATION_STATE command with the 'chunk' bit set in its
input. In that case, the output includes both the minimum buffer size
and the remaining total size, to be reported/used where applicable.

In chunk mode, multiple images may be read from the device upon
STOP_COPY. The driver reads the full state ahead from the firmware in
small/optimized chunks, while letting QEMU/user space read the available
data in parallel.
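
To give a feel for the number of reads involved, here is a rough sketch.
It assumes every chunk fills the buffer except possibly the last one,
which the device is not required to do; the function name is
hypothetical and not part of the driver:

```c
#include <stdint.h>

/*
 * Hypothetical helper, for illustration only: number of chunk-sized
 * reads needed to fetch total_size bytes, assuming each read fills the
 * chunk buffer except possibly the last one.
 */
static uint64_t chunks_needed(uint64_t total_size, uint64_t chunk_size)
{
	if (chunk_size == 0)
		return 0;

	/* Round up without risking overflow in total_size + chunk_size. */
	return total_size / chunk_size + (total_size % chunk_size != 0);
}
```

Under that assumption, a 6GB state read through an 8MB buffer takes 768
reads, and user space can start consuming the first chunk while the
later ones are still being fetched.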

The chunk buffer size is chosen based on the minimum size that the
firmware requires, the total full size, and a maximum value in the
driver code, which was set to 8MB to achieve an optimized downtime in
the general case.
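
One plausible way to combine these three inputs is sketched below. This
is an assumption for illustration, not necessarily the exact logic in
the patches; the macro and function names are hypothetical:

```c
#include <stdint.h>

/* The 8MB cap mentioned above; the macro name is hypothetical. */
#define SKETCH_MAX_CHUNK_SIZE (8ULL * 1024 * 1024)

/*
 * Hypothetical helper, for illustration only: cap the buffer at 8MB,
 * do not allocate more than the total state size, and never go below
 * the minimum size that the firmware reported.
 */
static uint64_t pick_chunk_size(uint64_t fw_min_size, uint64_t total_size)
{
	uint64_t size = total_size < SKETCH_MAX_CHUNK_SIZE ?
			total_size : SKETCH_MAX_CHUNK_SIZE;

	if (size < fw_min_size)
		size = fw_min_size;

	return size;
}
```

In this sketch the firmware minimum takes precedence over the cap: with
a 1MB minimum and a 6GB state the result is 8MB, while a 16MB minimum
yields 16MB.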

With this series in place, we could successfully migrate a device state
larger than 4GB, while even improving the downtime in some scenarios.

Note:
As the first patch should go to net/mlx5, we may need to send it to VFIO
as a pull request to avoid conflicts before acceptance.

Yishai

Yishai Hadas (9):
  net/mlx5: Introduce ifc bits for migration in a chunk mode
  vfio/mlx5: Wake up the reader post of disabling the SAVING migration
    file
  vfio/mlx5: Refactor the SAVE callback to activate a work only upon an
    error
  vfio/mlx5: Enable querying state size which is > 4GB
  vfio/mlx5: Rename some stuff to match chunk mode
  vfio/mlx5: Pre-allocate chunks for the STOP_COPY phase
  vfio/mlx5: Add support for SAVING in chunk mode
  vfio/mlx5: Add support for READING in chunk mode
  vfio/mlx5: Activate the chunk mode functionality

 drivers/vfio/pci/mlx5/cmd.c   | 103 +++++++++----
 drivers/vfio/pci/mlx5/cmd.h   |  28 +++-
 drivers/vfio/pci/mlx5/main.c  | 283 +++++++++++++++++++++++++---------
 include/linux/mlx5/mlx5_ifc.h |  15 +-
 4 files changed, 322 insertions(+), 107 deletions(-)

Comments

Jason Gunthorpe Sept. 20, 2023, 6:31 p.m. UTC | #1
On Mon, Sep 11, 2023 at 12:38:47PM +0300, Yishai Hadas wrote:
> [...]

I didn't check in great depth but this looks OK to me

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>

I think this is a good design to start motivating more QEMU
improvements, e.g. using io_uring, as we could go further in the driver
to optimize with that kind of support.

Jason
Yishai Hadas Sept. 27, 2023, 10:59 a.m. UTC | #2
On 20/09/2023 21:31, Jason Gunthorpe wrote:
> On Mon, Sep 11, 2023 at 12:38:47PM +0300, Yishai Hadas wrote:
>> [...]
> I didn't check in great depth but this looks OK to me
>
> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>

Thanks Jason

>
> I think this is a good design to start motivating more QEMU
> improvements, e.g. using io_uring, as we could go further in the driver
> to optimize with that kind of support.
>
> Jason

Alex,

Can we move forward with the series and send a PR for the first patch,
which also needs to go to net/mlx5?

Thanks,
Yishai
Alex Williamson Sept. 27, 2023, 10:10 p.m. UTC | #3
On Wed, 27 Sep 2023 13:59:06 +0300
Yishai Hadas <yishaih@nvidia.com> wrote:

> [...]
> Alex,
> 
> Can we move forward with the series and send a PR for the first patch,
> which also needs to go to net/mlx5?

Yeah, I don't spot any issues with it either.  Thanks,

Alex
Leon Romanovsky Sept. 28, 2023, 11:08 a.m. UTC | #4
On Wed, Sep 27, 2023 at 04:10:23PM -0600, Alex Williamson wrote:
> [...]
> 
> Yeah, I don't spot any issues with it either.  Thanks,

Hi Alex,

I uploaded the first patch to the shared branch; can you please pull it?
https://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux.git/log/?h=mlx5-vfio

Thanks

> 
> Alex
> 
>
Alex Williamson Sept. 28, 2023, 6:29 p.m. UTC | #5
On Thu, 28 Sep 2023 14:08:08 +0300
Leon Romanovsky <leon@kernel.org> wrote:

> [...]
> 
> Hi Alex,
> 
> I uploaded the first patch to the shared branch; can you please pull it?
> https://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux.git/log/?h=mlx5-vfio

Yep, got it.  Thanks.

Yishai, were you planning to resend the remainder or do you just want
me to pull 2-9 from this series?  Thanks,

Alex
Leon Romanovsky Sept. 28, 2023, 6:42 p.m. UTC | #6
On Thu, Sep 28, 2023 at 12:29:52PM -0600, Alex Williamson wrote:
> [...]
> 
> Yep, got it.  Thanks.
> 
> Yishai, were you planning to resend the remainder or do you just want
> me to pull 2-9 from this series?  Thanks,

Just pull, like I did with b4 :)

~/src/b4/b4.sh shazam -l -s https://lore.kernel.org/kvm/20230911093856.81910-1-yishaih@nvidia.com/ -P 2-9 -t

Thanks

> 
> Alex
>
Alex Williamson Sept. 28, 2023, 6:47 p.m. UTC | #7
On Thu, 28 Sep 2023 21:42:22 +0300
Leon Romanovsky <leon@kernel.org> wrote:

> [...]
> 
> Just pull, like I did with b4 :)
> 
> ~/src/b4/b4.sh shazam -l -s https://lore.kernel.org/kvm/20230911093856.81910-1-yishaih@nvidia.com/ -P 2-9 -t

Yep, the mechanics were really not the question, I'm just double
checking to avoid any conflicts with a re-post.  Thanks,

Alex
Leon Romanovsky Sept. 28, 2023, 6:51 p.m. UTC | #8
On Thu, Sep 28, 2023 at 12:47:03PM -0600, Alex Williamson wrote:
> [...]
> 
> Yep, the mechanics were really not the question, I'm just double
> checking to avoid any conflicts with a re-post.  Thanks,

It is pretty safe to say that he won't re-post. 
He had no plans to resend the series.

Thanks

> 
> Alex
>
Alex Williamson Sept. 28, 2023, 9:12 p.m. UTC | #9
On Thu, 28 Sep 2023 21:51:02 +0300
Leon Romanovsky <leon@kernel.org> wrote:

> On Thu, Sep 28, 2023 at 12:47:03PM -0600, Alex Williamson wrote:
> > On Thu, 28 Sep 2023 21:42:22 +0300
> > Leon Romanovsky <leon@kernel.org> wrote:
> >   
> > > On Thu, Sep 28, 2023 at 12:29:52PM -0600, Alex Williamson wrote:  
> > > > On Thu, 28 Sep 2023 14:08:08 +0300
> > > > Leon Romanovsky <leon@kernel.org> wrote:
> > > >     
> > > > > On Wed, Sep 27, 2023 at 04:10:23PM -0600, Alex Williamson wrote:    
> > > > > > On Wed, 27 Sep 2023 13:59:06 +0300
> > > > > > Yishai Hadas <yishaih@nvidia.com> wrote:
> > > > > >       
> > > > > > > On 20/09/2023 21:31, Jason Gunthorpe wrote:      
> > > > > > > > On Mon, Sep 11, 2023 at 12:38:47PM +0300, Yishai Hadas wrote:        
> > > > > > > >> This series adds 'chunk mode' support for mlx5 driver upon the migration
> > > > > > > >> flow.
> > > > > > > >>
> > > > > > > >> Before this series, we were limited to 4GB state size, as of the 4 bytes
> > > > > > > >> max value based on the device specification for the query/save/load
> > > > > > > >> commands.
> > > > > > > >>
> > > > > > > >> Once the device supports 'chunk mode' the driver can support state size
> > > > > > > >> which is larger than 4GB.
> > > > > > > >>
> > > > > > > >> In that case, the device has the capability to split a single image to
> > > > > > > >> multiple chunks as long as the software provides a buffer in the minimum
> > > > > > > >> size reported by the device.
> > > > > > > >>
> > > > > > > >> The driver should query for the minimum buffer size required using
> > > > > > > >> QUERY_VHCA_MIGRATION_STATE command with the 'chunk' bit set in its
> > > > > > > >> input, in that case, the output will include both the minimum buffer
> > > > > > > >> size and also the remaining total size to be reported/used where it will
> > > > > > > >> be applicable.
> > > > > > > >>
> > > > > > > >> Upon chunk mode, there may be multiple images that will be read from the
> > > > > > > >> device upon STOP_COPY. The driver will read ahead from the firmware the
> > > > > > > >> full state in small/optimized chunks while letting QEMU/user space read
> > > > > > > >> in parallel the available data.
> > > > > > > >>
> > > > > > > >> The chunk buffer size is picked up based on the minimum size that
> > > > > > > >> firmware requires, the total full size and some max value in the driver
> > > > > > > >> code which was set to 8MB to achieve some optimized downtime in the
> > > > > > > >> general case.
> > > > > > > >>
> > > > > > > >> With that series in place, we could migrate successfully a device state
> > > > > > > >> with a larger size than 4GB, while even improving the downtime in some
> > > > > > > >> scenarios.
> > > > > > > >>
> > > > > > > >> Note:
> > > > > > > >> As the first patch should go to net/mlx5 we may need to send it as a
> > > > > > > >> pull request format to VFIO to avoid conflicts before acceptance.
> > > > > > > >>
> > > > > > > >> Yishai
> > > > > > > >>
> > > > > > > >> Yishai Hadas (9):
> > > > > > > >>    net/mlx5: Introduce ifc bits for migration in a chunk mode
> > > > > > > >>    vfio/mlx5: Wake up the reader post of disabling the SAVING migration
> > > > > > > >>      file
> > > > > > > >>    vfio/mlx5: Refactor the SAVE callback to activate a work only upon an
> > > > > > > >>      error
> > > > > > > >>    vfio/mlx5: Enable querying state size which is > 4GB
> > > > > > > >>    vfio/mlx5: Rename some stuff to match chunk mode
> > > > > > > >>    vfio/mlx5: Pre-allocate chunks for the STOP_COPY phase
> > > > > > > >>    vfio/mlx5: Add support for SAVING in chunk mode
> > > > > > > >>    vfio/mlx5: Add support for READING in chunk mode
> > > > > > > >>    vfio/mlx5: Activate the chunk mode functionality        
> > > > > > > > I didn't check in great depth but this looks OK to me
> > > > > > > >
> > > > > > > > Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>        
> > > > > > > 
> > > > > > > Thanks Jason
> > > > > > >       
> > > > > > > >
> > > > > > > > > I think this is a good design to start motivating more QEMU
> > > > > > > > > improvements, e.g. using io_uring as we could go further in the driver
> > > > > > > > to optimize with that kind of support.
> > > > > > > >
> > > > > > > > Jason        
> > > > > > > 
> > > > > > > Alex,
> > > > > > > 
> > > > > > > Can we move forward with the series and send a PR for the first patch 
> > > > > > > that needs to go also to net/mlx5 ?      
> > > > > > 
> > > > > > Yeah, I don't spot any issues with it either.  Thanks,      
> > > > > 
> > > > > Hi Alex,
> > > > > 
> > > > > I uploaded the first patch to shared branch, can you please pull it?
> > > > > https://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux.git/log/?h=mlx5-vfio    
> > > > 
> > > > Yep, got it.  Thanks.
> > > > 
> > > > Yishai, were you planning to resend the remainder or do you just want
> > > > me to pull 2-9 from this series?  Thanks,    
> > > 
> > > Just pull, like I did with b4 :)
> > > 
> > > ~/src/b4/b4.sh shazam -l -s https://lore.kernel.org/kvm/20230911093856.81910-1-yishaih@nvidia.com/ -P 2-9 -t  
> > 
> > Yep, the mechanics were really not the question, I'm just double
> > checking to avoid any conflicts with a re-post.  Thanks,  
> 
> It is pretty safe to say that he won't re-post. 
> He had no plans to resend the series.

Ok, applied the remainder of the series to the vfio next branch for
v6.7.  Thanks,

Alex
Leon Romanovsky Oct. 2, 2023, 8:47 a.m. UTC | #10
On Mon, 11 Sep 2023 12:38:47 +0300, Yishai Hadas wrote:
> This series adds 'chunk mode' support for mlx5 driver upon the migration
> flow.
> 
> Before this series, we were limited to 4GB state size, as of the 4 bytes
> max value based on the device specification for the query/save/load
> commands.
> 
> [...]

Applied, thanks!

[1/9] net/mlx5: Introduce ifc bits for migration in a chunk mode
      https://git.kernel.org/rdma/rdma/c/5aa4c9608d2d5f

Best regards,