[V2,mlx5-next,0/4] Improve mlx5 live migration driver

Message ID 20220510090206.90374-1-yishaih@nvidia.com (mailing list archive)

Message

Yishai Hadas May 10, 2022, 9:02 a.m. UTC
This series improves the mlx5 live migration driver in a few aspects, as
described below.

Refactor to enable running migration commands in parallel over the PF
command interface.

To achieve that, we exposed from mlx5_core an API that lets the VF be
notified before the PF command interface goes down/up (e.g. PF reload
upon health recovery).

With this functionality in place, mlx5 vfio no longer needs to take the
global PF lock when using the command interface; it can rely on the
above mechanism to stay in sync with the PF.

This can enable parallel VF migration over the PF command interface
from the kernel driver's point of view.
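
As a rough illustration of the idea above (not the driver code itself),
here is a minimal userspace sketch. It assumes a per-VF notifier callback
that the PF invokes synchronously before its command interface goes down
or comes back up, and a per-VF mutex in place of a global PF lock. All
names (pf_dev, vf_dev, pf_notify, vf_run_cmd, the PF_EVENT_* codes) are
hypothetical and are not the mlx5_core or vfio API.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical event codes for "PF command interface goes up/down". */
enum pf_event { PF_EVENT_UP, PF_EVENT_DOWN };

/* Minimal stand-in for a notifier block: one callback per VF. */
struct vf_notifier {
    void (*call)(struct vf_notifier *nb, enum pf_event ev);
};

#define MAX_VFS 4

struct pf_dev {
    pthread_mutex_t chain_lock;          /* protects the notifier table only */
    struct vf_notifier *chain[MAX_VFS];
};

struct vf_dev {
    struct vf_notifier nb;
    pthread_mutex_t state_mutex;         /* per-VF, never shared across VFs */
    bool pf_detached;
    int cmds_issued;
};

/* Recover the enclosing vf_dev from its embedded notifier. */
#define vf_from_nb(ptr) \
    ((struct vf_dev *)((char *)(ptr) - offsetof(struct vf_dev, nb)))

static void vf_pf_event(struct vf_notifier *nb, enum pf_event ev)
{
    struct vf_dev *vf = vf_from_nb(nb);

    pthread_mutex_lock(&vf->state_mutex);
    vf->pf_detached = (ev == PF_EVENT_DOWN);
    pthread_mutex_unlock(&vf->state_mutex);
}

void pf_init(struct pf_dev *pf)
{
    pthread_mutex_init(&pf->chain_lock, NULL);
    for (int i = 0; i < MAX_VFS; i++)
        pf->chain[i] = NULL;
}

void vf_init(struct vf_dev *vf)
{
    vf->nb.call = vf_pf_event;
    pthread_mutex_init(&vf->state_mutex, NULL);
    vf->pf_detached = false;
    vf->cmds_issued = 0;
}

int pf_register(struct pf_dev *pf, int vf_id, struct vf_notifier *nb)
{
    if (vf_id < 0 || vf_id >= MAX_VFS)
        return -EINVAL;
    pthread_mutex_lock(&pf->chain_lock);
    pf->chain[vf_id] = nb;
    pthread_mutex_unlock(&pf->chain_lock);
    return 0;
}

/* Blocking: returns only after the VF callback has finished. */
void pf_notify(struct pf_dev *pf, int vf_id, enum pf_event ev)
{
    assert(vf_id >= 0 && vf_id < MAX_VFS);
    pthread_mutex_lock(&pf->chain_lock);
    if (pf->chain[vf_id])
        pf->chain[vf_id]->call(pf->chain[vf_id], ev);
    pthread_mutex_unlock(&pf->chain_lock);
}

/* A migration command takes only this VF's lock, so VFs do not serialize. */
int vf_run_cmd(struct vf_dev *vf)
{
    int ret = 0;

    pthread_mutex_lock(&vf->state_mutex);
    if (vf->pf_detached)
        ret = -ENOTCONN;
    else
        vf->cmds_issued++;
    pthread_mutex_unlock(&vf->state_mutex);
    return ret;
}
```

In this sketch, a command on one VF contends only on that VF's
state_mutex, and a PF "down" event for one VF leaves commands on the
other VFs unaffected.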

In addition, the SAVE state command now uses the PF async command mode.
This allows returning to user space earlier, once the command has been
issued successfully, and improves latency by letting things run in
parallel.
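
To make the async pattern concrete, here is a small single-threaded
sketch, again under stated assumptions: submission only queues the
command and returns, and a separate completion step (standing in for the
device's completion path) later invokes a callback. The queue, the
callback type and the save_state_* helpers are hypothetical names for
illustration, not the mlx5 command interface.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define QUEUE_DEPTH 8

typedef void (*cmd_done_fn)(void *ctx, int status);

struct async_cmd {
    cmd_done_fn done;
    void *ctx;
};

struct cmd_queue {
    struct async_cmd slots[QUEUE_DEPTH];
    size_t head;   /* next command to complete */
    size_t tail;   /* next free slot */
};

/* Queue a command and return immediately; no waiting for completion. */
int cmd_submit_async(struct cmd_queue *q, cmd_done_fn done, void *ctx)
{
    assert(done != NULL);
    if (q->tail - q->head == QUEUE_DEPTH)
        return -EBUSY;
    q->slots[q->tail % QUEUE_DEPTH] = (struct async_cmd){ done, ctx };
    q->tail++;
    return 0;
}

/* Stand-in for the completion path: run callbacks for queued commands. */
size_t cmd_process_completions(struct cmd_queue *q)
{
    size_t n = 0;

    while (q->head != q->tail) {
        /* Copy the entry so the slot may be reused by the callback. */
        struct async_cmd cmd = q->slots[q->head % QUEUE_DEPTH];

        q->head++;
        cmd.done(cmd.ctx, 0);
        n++;
    }
    return n;
}

/* Hypothetical per-migration state updated from the callback. */
struct save_state {
    int completed;
    int status;
};

static void save_done(void *ctx, int status)
{
    struct save_state *s = ctx;

    s->status = status;
    s->completed = 1;
}

/* Returns as soon as the command is queued, as in the async mode above. */
int save_state_start(struct cmd_queue *q, struct save_state *s)
{
    s->completed = 0;
    return cmd_submit_async(q, save_done, s);
}
```

Here two SAVE requests can be issued back to back before either one
completes; the caller gets control back right after submission and learns
the result from the callback.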

Alex, as this series touches mlx5_core we may need to send this in a
pull request format to VFIO to avoid conflicts before acceptance.

V2:
- Improve packing in some structures, as suggested by Alex.
- Move workqueue management into the set/remove migratable functions, as
  suggested by Alex.

V1: https://lore.kernel.org/netdev/20220508131053.241347-1-yishaih@nvidia.com/
- Put the net/mlx5 patch as the first patch based on Jason's note.
- Refactor and combine the previous first patch with the third patch for
  cleaner, more readable code; this follows Alex's notes on V0.

V0: https://lore.kernel.org/netdev/20220504213309.GM49344@nvidia.com/T/

Yishai

Yishai Hadas (4):
  net/mlx5: Expose mlx5_sriov_blocking_notifier_register /  unregister
    APIs
  vfio/mlx5: Manage the VF attach/detach callback from the PF
  vfio/mlx5: Refactor to enable VFs migration in parallel
  vfio/mlx5: Run the SAVE state command in an async mode

 .../net/ethernet/mellanox/mlx5/core/sriov.c   |  65 ++++-
 drivers/vfio/pci/mlx5/cmd.c                   | 236 +++++++++++++-----
 drivers/vfio/pci/mlx5/cmd.h                   |  52 +++-
 drivers/vfio/pci/mlx5/main.c                  | 122 ++++-----
 include/linux/mlx5/driver.h                   |  12 +
 5 files changed, 351 insertions(+), 136 deletions(-)

Comments

Leon Romanovsky May 10, 2022, 1:16 p.m. UTC | #1
On Tue, May 10, 2022 at 12:02:02PM +0300, Yishai Hadas wrote:
> This series improves mlx5 live migration driver in few aspects as of
> below.
> 
> Refactor to enable running migration commands in parallel over the PF
> command interface.
> 
> To achieve that we exposed from mlx5_core an API to let the VF be
> notified before that the PF command interface goes down/up. (e.g. PF
> reload upon health recovery).
> 
> Once having the above functionality in place mlx5 vfio doesn't need any
> more to obtain the global PF lock upon using the command interface but
> can rely on the above mechanism to be in sync with the PF.
> 
> This can enable parallel VFs migration over the PF command interface
> from kernel driver point of view.
> 
> In addition,
> Moved to use the PF async command mode for the SAVE state command.
> This enables returning earlier to user space upon issuing successfully
> the command and improve latency by let things run in parallel.
> 
> Alex, as this series touches mlx5_core we may need to send this in a
> pull request format to VFIO to avoid conflicts before acceptance.

The PR was sent.
https://lore.kernel.org/netdev/20220510131236.1039430-1-leon@kernel.org/T/#u

Thanks

Alex Williamson May 10, 2022, 3 p.m. UTC | #2
On Tue, 10 May 2022 16:16:16 +0300
Leon Romanovsky <leon@kernel.org> wrote:

> On Tue, May 10, 2022 at 12:02:02PM +0300, Yishai Hadas wrote:
> > This series improves mlx5 live migration driver in few aspects as of
> > below.
> > 
> > Refactor to enable running migration commands in parallel over the PF
> > command interface.
> > 
> > To achieve that we exposed from mlx5_core an API to let the VF be
> > notified before that the PF command interface goes down/up. (e.g. PF
> > reload upon health recovery).
> > 
> > Once having the above functionality in place mlx5 vfio doesn't need any
> > more to obtain the global PF lock upon using the command interface but
> > can rely on the above mechanism to be in sync with the PF.
> > 
> > This can enable parallel VFs migration over the PF command interface
> > from kernel driver point of view.
> > 
> > In addition,
> > Moved to use the PF async command mode for the SAVE state command.
> > This enables returning earlier to user space upon issuing successfully
> > the command and improve latency by let things run in parallel.
> > 
> > Alex, as this series touches mlx5_core we may need to send this in a
> > pull request format to VFIO to avoid conflicts before acceptance.  
> 
> The PR was sent.
> https://lore.kernel.org/netdev/20220510131236.1039430-1-leon@kernel.org/T/#u

For patches 2-4, please add:

Reviewed-by: Alex Williamson <alex.williamson@redhat.com>

Thanks,
Alex

Leon Romanovsky May 11, 2022, 6:40 a.m. UTC | #3
On Tue, May 10, 2022 at 09:00:53AM -0600, Alex Williamson wrote:
> On Tue, 10 May 2022 16:16:16 +0300
> Leon Romanovsky <leon@kernel.org> wrote:
> 
> > On Tue, May 10, 2022 at 12:02:02PM +0300, Yishai Hadas wrote:
> > > This series improves mlx5 live migration driver in few aspects as of
> > > below.
> > > 
> > > Refactor to enable running migration commands in parallel over the PF
> > > command interface.
> > > 
> > > To achieve that we exposed from mlx5_core an API to let the VF be
> > > notified before that the PF command interface goes down/up. (e.g. PF
> > > reload upon health recovery).
> > > 
> > > Once having the above functionality in place mlx5 vfio doesn't need any
> > > more to obtain the global PF lock upon using the command interface but
> > > can rely on the above mechanism to be in sync with the PF.
> > > 
> > > This can enable parallel VFs migration over the PF command interface
> > > from kernel driver point of view.
> > > 
> > > In addition,
> > > Moved to use the PF async command mode for the SAVE state command.
> > > This enables returning earlier to user space upon issuing successfully
> > > the command and improve latency by let things run in parallel.
> > > 
> > > Alex, as this series touches mlx5_core we may need to send this in a
> > > pull request format to VFIO to avoid conflicts before acceptance.  
> > 
> > The PR was sent.
> > https://lore.kernel.org/netdev/20220510131236.1039430-1-leon@kernel.org/T/#u
> 
> For patches 2-4, please add:
> 
> Reviewed-by: Alex Williamson <alex.williamson@redhat.com>

Done. I force-pushed the same branch and tag, so the previous PR is still
valid and can be pulled.
https://lore.kernel.org/kvm/20220510131236.1039430-1-leon@kernel.org/T/#u

Thanks

> 
> Thanks,
> Alex
>

Alex Williamson May 12, 2022, 6:21 p.m. UTC | #4
On Wed, 11 May 2022 09:40:37 +0300
Leon Romanovsky <leon@kernel.org> wrote:

> On Tue, May 10, 2022 at 09:00:53AM -0600, Alex Williamson wrote:
> > On Tue, 10 May 2022 16:16:16 +0300
> > Leon Romanovsky <leon@kernel.org> wrote:
> >   
> > > On Tue, May 10, 2022 at 12:02:02PM +0300, Yishai Hadas wrote:  
> > > > This series improves mlx5 live migration driver in few aspects as of
> > > > below.
> > > > 
> > > > Refactor to enable running migration commands in parallel over the PF
> > > > command interface.
> > > > 
> > > > To achieve that we exposed from mlx5_core an API to let the VF be
> > > > notified before that the PF command interface goes down/up. (e.g. PF
> > > > reload upon health recovery).
> > > > 
> > > > Once having the above functionality in place mlx5 vfio doesn't need any
> > > > more to obtain the global PF lock upon using the command interface but
> > > > can rely on the above mechanism to be in sync with the PF.
> > > > 
> > > > This can enable parallel VFs migration over the PF command interface
> > > > from kernel driver point of view.
> > > > 
> > > > In addition,
> > > > Moved to use the PF async command mode for the SAVE state command.
> > > > This enables returning earlier to user space upon issuing successfully
> > > > the command and improve latency by let things run in parallel.
> > > > 
> > > > Alex, as this series touches mlx5_core we may need to send this in a
> > > > pull request format to VFIO to avoid conflicts before acceptance.    
> > > 
> > > The PR was sent.
> > > https://lore.kernel.org/netdev/20220510131236.1039430-1-leon@kernel.org/T/#u  
> > 
> > For patches 2-4, please add:
> > 
> > Reviewed-by: Alex Williamson <alex.williamson@redhat.com>  
> 
> Done, I force pushed same branch and tag, so previous PR is still valid
> to be pulled.
> https://lore.kernel.org/kvm/20220510131236.1039430-1-leon@kernel.org/T/#u

Merged to vfio next branch for v5.19.  Thanks,

Alex