[v2,00/10] Implement multi-GPU DMA mappings for KFD

Message ID 20210422013058.6305-1-Felix.Kuehling@amd.com (mailing list archive)

Message

Felix Kuehling April 22, 2021, 1:30 a.m. UTC
This patch series fixes DMA-mappings of system memory (GTT and userptr)
for KFD running on multi-GPU systems with IOMMU enabled. One SG-BO per
GPU is needed to maintain the DMA mappings of each BO.
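
As a rough illustration of why one mapping per GPU is needed, here is a
user-space toy model. It is not the kernel code: every name in it
(toy_bo, toy_attachment, toy_iommu_map, toy_attach_and_map) is
hypothetical, and the fake IOMMU simply assumes each device translates
through its own I/O address window, which is what device isolation
typically implies.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_GPUS 4
#define BO_PAGES 2

/* Hypothetical per-GPU attachment: holds that GPU's own DMA addresses. */
struct toy_attachment {
	int gpu_id;
	int mapped;
	uint64_t dma_addr[BO_PAGES];
};

/* Hypothetical system-memory BO with one attachment slot per GPU. */
struct toy_bo {
	uint64_t cpu_page[BO_PAGES];
	struct toy_attachment att[MAX_GPUS];
};

/*
 * Pretend IOMMU (assumption for illustration only): the same CPU page
 * gets a different DMA address for every device.
 */
static uint64_t toy_iommu_map(int gpu_id, uint64_t cpu_page)
{
	return ((uint64_t)(gpu_id + 1) << 40) | cpu_page;
}

/* Create the per-GPU mapping of all pages of the BO. */
static void toy_attach_and_map(struct toy_bo *bo, int gpu_id)
{
	struct toy_attachment *att = &bo->att[gpu_id];

	att->gpu_id = gpu_id;
	for (int i = 0; i < BO_PAGES; i++)
		att->dma_addr[i] = toy_iommu_map(gpu_id, bo->cpu_page[i]);
	att->mapped = 1;
}
```

Attaching the same toy_bo to GPU 0 and GPU 1 leaves two independent
address tables for the same pages, so a single shared table could not
be valid for both devices.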

Changes in v2:
- Made the original BO the parent of the SG BO to fix BO destruction order
- Removed the individualization hack, which is not needed with a parent BO
- Removed the resv locking hack in amdgpu_ttm_unpopulate, not needed without
  the individualization hack
- Added a patch to enable the Intel IOMMU driver in rock-dbg_defconfig
- Added a patch to move dmabuf attach/detach into backend_(un)bind

I'm still seeing some IOMMU access faults in the eviction test. They seem
to be related to userptr handling. They happen even without this patch
series on a single-GPU system, where this patch series is not needed. I
believe this is an old problem in KFD or amdgpu that is being exposed by
device isolation from the IOMMU. I'm debugging it, but it should not hold
up this patch series.

"drm/ttm: Don't count pages in SG BOs against pages_limit" was already
applied to drm-misc (I think). I'm still including it here because my
patches depend on it. Without it, the SG BOs created for DMA mappings
cause many tests to fail because TTM incorrectly thinks it's out of memory.

Felix Kuehling (10):
  rock-dbg_defconfig: Enable Intel IOMMU
  drm/amdgpu: Rename kfd_bo_va_list to kfd_mem_attachment
  drm/amdgpu: Keep a bo-reference per-attachment
  drm/amdgpu: Simplify AQL queue mapping
  drm/amdgpu: Add multi-GPU DMA mapping helpers
  drm/amdgpu: DMA map/unmap when updating GPU mappings
  drm/amdgpu: Move kfd_mem_attach outside reservation
  drm/amdgpu: Add DMA mapping of GTT BOs
  drm/ttm: Don't count pages in SG BOs against pages_limit
  drm/amdgpu: Move dmabuf attach/detach to backend_(un)bind

 arch/x86/configs/rock-dbg_defconfig           |  11 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h    |  18 +-
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  | 530 ++++++++++++------
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c       |  51 +-
 drivers/gpu/drm/ttm/ttm_tt.c                  |  27 +-
 5 files changed, 437 insertions(+), 200 deletions(-)

Comments

Zeng, Oak April 27, 2021, 3:16 p.m. UTC | #1
This series is Acked-by: Oak Zeng <Oak.Zeng@amd.com> 

Regards,
Oak 

 
