[V4,vfio,00/10] Add device DMA logging support for mlx5 driver

Message ID 20220815151109.180403-1-yishaih@nvidia.com (mailing list archive)

Message

Yishai Hadas Aug. 15, 2022, 3:10 p.m. UTC
This series adds device DMA logging uAPIs and their implementation as
part of mlx5 driver.

DMA logging allows a device to internally record what DMAs the device is
initiating and report them back to userspace. It is part of the VFIO
migration infrastructure that allows implementing dirty page tracking
during the pre-copy phase of live migration. Only DMA WRITEs are logged,
and this API is not connected to VFIO_DEVICE_FEATURE_MIG_DEVICE_STATE.

The uAPIs are based on the FEATURE ioctl, as introduced earlier by the
RFC below [1], and follow the notes that were discussed on the mailing
list.

It includes:
- A PROBE option to detect if the device supports DMA logging.
- A SET option to start device DMA logging on given IOVA ranges.
- A GET option to read back and clear the device DMA log.
- A SET option to stop device DMA logging that was previously started.

Extra details exist as part of relevant patches in the series.

In addition, the series adds infrastructure for managing an IOVA
bitmap, contributed by Joao Martins.

It abstracts how an IOVA range is represented in a bitmap granulated by
a given page_size, and handles all the lifting of translating user
bitmap pointers into their corresponding kernel addresses. This new
functionality hides the complexity of user/kernel bitmap pointer
handling and exposes an API to set bits in the bitmap.

This functionality will also be used as part of the IOMMUFD series for
system IOMMU dirty tracking.

Finally, the series provides the mlx5 implementation of the DMA logging
APIs, based on its device specification.

The matching QEMU changes can be previewed here [2].
They come on top of the v2 migration protocol patches that were already
sent to the mailing list.

Note:
- As this series touches mlx5_core, we may need to send the net/mlx5
  patches to VFIO as a pull request to avoid conflicts before
  acceptance.

[1] https://lore.kernel.org/all/20220501123301.127279-1-yishaih@nvidia.com/T/
[2] https://github.com/avihai1122/qemu/commits/device_dirty_tracking

Changes from V3: https://lore.kernel.org/all/202207011231.1oPQhSzo-lkp@intel.com/t/
Rebase on top of v6.0 rc1.
Patch #3:
- Drop the documentation note regarding the 'atomic OR' usage of the
  bitmap as part of VFIO_DEVICE_FEATURE_DMA_LOGGING_REPORT.
  This deletion was missed in V3, where the kernel code had already been
  changed to match.
  To clarify: while testing V3 we saw a big performance penalty (2x in
  some cases) when the iova bitmap patch used atomic bit operations. As
  QEMU doesn't really share bitmaps between multiple trackers, we saw no
  reason to use atomics and take the performance hit.
  If in the future there is a real use case that justifies it, we can
  add a new VFIO_DEVICE_FEATURE_DMA_LOGGING_REPORT_ATOMIC option with
  the matching kernel code.
Patch #4:
- The patch renaming vfio.c to vfio_main.c was already accepted and is
  no longer part of this series.

Changes from V2: https://lore.kernel.org/netdev/20220726151232.GF4438@nvidia.com/t/
Patch #1:
- Add some reserved fields that were missed.
Patch #3:
- Improve the UAPI documentation in a few places, as asked by Alex and
  Kevin, based on the discussion on the mailing list.
Patch #5:
- Improvements from Joao to make his IOVA bitmap patch cleaner/simpler,
  as asked by Alex. These include:
   * Make iova_to_index and index_to_iova fully symmetrical.
   * Use 'sizeof(*iter->data) * BITS_PER_BYTE' in both index_to_iova
     and iova_to_index.
   * Remove iova_bitmap_init() and just stay with iova_bitmap_iter_init().
   * s/left/remaining/
   * Do not use the @remaining variable for both index and iova/length.
   * Remove stale comment on max dirty bitmap bits.
   * Remove DIV_ROUNDUP from iova_to_index() helper and replace with a
     division.
   * Use iova rather than length where appropriate, while noting with
     commentary the usage of length as next relative IOVA.
   * Rework pinning to be internal and remove that from the iova iter
     API caller.
   * get() and put() now teardown iova_bitmap::dirty npages.
   * Move unnecessary includes into the C file.
   * Add theory of operation and theory of usage in the header file.
   * Add more comments on private helpers with less obvious logic.
   * Add documentation on all public APIs.
   * Change the commit message to reflect the new usage of the APIs.
Patch #6:
- Drop the hard-coded 1024 for LOG_MAX_RANGES and derive the limit from
  PAGE_SIZE instead, as suggested by Jason.
- Return -E2BIG as Alex suggested.
- Adapt the logging report loop to the new IOVA bitmap infrastructure.

Changes from V1: https://lore.kernel.org/netdev/202207052209.x00Iykkp-lkp@intel.com/T/

- Patch #6: Fix an issue reported by the kernel test robot: select
  INTERVAL_TREE for VFIO.

Changes from V0: https://lore.kernel.org/netdev/202207011231.1oPQhSzo-lkp@intel.com/T/

- Drop the first 2 patches that Alex already merged.
- Fix an issue reported by the kernel test robot, based on Jason's
  suggestion.
- Some improvements from Joao to make his IOVA bitmap patch
  cleaner/simpler. These include:
    * Rename iova_bitmap_array_length to iova_bitmap_iova_to_index.
    * Rename iova_bitmap_index_to_length to iova_bitmap_index_to_iova.
    * Change iova_bitmap_iova_to_index to take an iova_bitmap_iter
      as an argument to pair with iova_bitmap_index_to_length.
     * Make iova_bitmap_iter_done() use >= instead of
       subtraction+comparison. This fixes iova_bitmap_iter_done()'s
       return value, as it previously returned true when not done.
     * Remove iova_bitmap_iter_length().
     * Simplify iova_bitmap_length()'s overcomplicated trailing-end
       check.
     * Convert all sizeof(u64) into sizeof(*iter->data).
     * Use 'u64 __user *' for ::data instead of 'void *' in both the
       struct definition and the initialization of iova_bitmap.

Yishai

Joao Martins (1):
  vfio: Add an IOVA bitmap support

Yishai Hadas (9):
  net/mlx5: Introduce ifc bits for page tracker
  net/mlx5: Query ADV_VIRTUALIZATION capabilities
  vfio: Introduce DMA logging uAPIs
  vfio: Introduce the DMA logging feature support
  vfio/mlx5: Init QP based resources for dirty tracking
  vfio/mlx5: Create and destroy page tracker object
  vfio/mlx5: Report dirty pages from tracker
  vfio/mlx5: Manage error scenarios on tracker
  vfio/mlx5: Set the driver DMA logging callbacks

 drivers/net/ethernet/mellanox/mlx5/core/fw.c  |   6 +
 .../net/ethernet/mellanox/mlx5/core/main.c    |   1 +
 drivers/vfio/Kconfig                          |   1 +
 drivers/vfio/Makefile                         |   6 +-
 drivers/vfio/iova_bitmap.c                    | 224 ++++
 drivers/vfio/pci/mlx5/cmd.c                   | 995 +++++++++++++++++-
 drivers/vfio/pci/mlx5/cmd.h                   |  63 +-
 drivers/vfio/pci/mlx5/main.c                  |   9 +-
 drivers/vfio/pci/vfio_pci_core.c              |   5 +
 drivers/vfio/vfio_main.c                      | 159 +++
 include/linux/iova_bitmap.h                   | 189 ++++
 include/linux/mlx5/device.h                   |   9 +
 include/linux/mlx5/mlx5_ifc.h                 |  83 +-
 include/linux/vfio.h                          |  21 +-
 include/uapi/linux/vfio.h                     |  86 ++
 15 files changed, 1837 insertions(+), 20 deletions(-)
 create mode 100644 drivers/vfio/iova_bitmap.c
 create mode 100644 include/linux/iova_bitmap.h

Comments

Yishai Hadas Aug. 25, 2022, 11:13 a.m. UTC | #1
On 15/08/2022 18:10, Yishai Hadas wrote:
> [full cover letter quoted, snipped]
Alex,

Can we please proceed with sending a PR so the series can be accepted?
(i.e., starting with the first two net/mlx5 patches.)

The comments given in the previous kernel cycle were addressed, and
there have been no open comments here for a few weeks already.

Yishai
Alex Williamson Aug. 25, 2022, 7:37 p.m. UTC | #2
On Thu, 25 Aug 2022 14:13:01 +0300
Yishai Hadas <yishaih@nvidia.com> wrote:
> Alex,
> 
> Can we please proceed with sending a PR so the series can be accepted?
> (i.e., starting with the first two net/mlx5 patches.)
> 
> The comments given in the previous kernel cycle were addressed, and
> there have been no open comments here for a few weeks already.

Hmm, it's only been posted since last week.  I still find the iova
bitmap code to be quite a mess; I just sent comments.  Thanks,

Alex