
[drm-next,00/14,RFC] DRM GPUVA Manager & Nouveau VM_BIND UAPI

Message ID 20230118061256.2689-1-dakr@redhat.com (mailing list archive)
Series: DRM GPUVA Manager & Nouveau VM_BIND UAPI

Message

Danilo Krummrich Jan. 18, 2023, 6:12 a.m. UTC
This patch series provides a new UAPI for the Nouveau driver in order to
support Vulkan features, such as sparse bindings and sparse residency.

Furthermore, it introduces the DRM GPUVA manager, a new DRM core feature to
keep track of GPU virtual address (VA) mappings in a more generic way.

The DRM GPUVA manager is intended to help drivers implement userspace-manageable
GPU VA spaces as required by the Vulkan API. To achieve this goal, it serves
the following purposes in this context:

    1) Provide a dedicated range allocator to track GPU VA allocations and
       mappings, making use of the drm_mm range allocator (see the sketch
       after this list).

    2) Generically connect GPU VA mappings to their backing buffers, in
       particular DRM GEM objects.

    3) Provide a common implementation to perform more complex mapping
       operations on the GPU VA space. In particular splitting and merging
       of GPU VA mappings, e.g. for intersecting mapping requests or partial
       unmap requests.
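
To illustrate 1), the following is a minimal sketch of how a driver could let
drm_mm record ranges that userspace has picked; all surrounding structure and
function names are made up for illustration and are not the actual interfaces
introduced by this series.

    #include <drm/drm_mm.h>

    /* Hypothetical per-VM tracking structure; illustration only. */
    struct example_va_space {
            struct drm_mm mm;       /* tracks userspace-allocated ranges */
    };

    static void example_va_space_init(struct example_va_space *vas,
                                      u64 start, u64 size)
    {
            drm_mm_init(&vas->mm, start, size);
    }

    /*
     * Userspace picks both address and range; the kernel merely records the
     * reservation. drm_mm_reserve_node() returns -ENOSPC if the requested
     * range collides with an already tracked one. The caller passes a
     * zero-initialized node.
     */
    static int example_va_space_track(struct example_va_space *vas,
                                      struct drm_mm_node *node,
                                      u64 addr, u64 range)
    {
            node->start = addr;
            node->size = range;
            return drm_mm_reserve_node(&vas->mm, node);
    }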

The new Nouveau VM_BIND UAPI builds on top of the DRM GPUVA manager and
provides the following new interfaces:

    1) Initialize a GPU VA space via the new DRM_IOCTL_NOUVEAU_VM_INIT ioctl
       for UMDs to specify the portion of VA space managed by the kernel and
       userspace, respectively.

    2) Allocate and free a VA space region as well as bind and unbind memory
       to the GPU's VA space via the new DRM_IOCTL_NOUVEAU_VM_BIND ioctl.

    3) Execute push buffers with the new DRM_IOCTL_NOUVEAU_EXEC ioctl.

Both DRM_IOCTL_NOUVEAU_VM_BIND and DRM_IOCTL_NOUVEAU_EXEC make use of the DRM
scheduler to queue jobs and support asynchronous processing with DRM syncobjs
as the synchronization mechanism.

By default, DRM_IOCTL_NOUVEAU_VM_BIND processes jobs synchronously;
DRM_IOCTL_NOUVEAU_EXEC supports asynchronous processing only.
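
To illustrate the intended usage from a UMD's point of view, here is a minimal
sketch of an asynchronous bind followed by a wait, using the libdrm syncobj
helpers; submit_vm_bind_async() is a hypothetical wrapper around
DRM_IOCTL_NOUVEAU_VM_BIND, whose actual argument layout is defined by the uapi
header added in this series.

    #include <stdint.h>
    #include <xf86drm.h>

    /* Hypothetical wrapper around DRM_IOCTL_NOUVEAU_VM_BIND; the real
     * argument layout lives in the uapi header added by this series. */
    int submit_vm_bind_async(int fd, uint32_t sig_syncobj);

    static int bind_and_wait(int fd)
    {
            uint32_t syncobj;
            int ret;

            ret = drmSyncobjCreate(fd, 0, &syncobj);
            if (ret)
                    return ret;

            /* Queue an asynchronous bind job that signals the syncobj. */
            ret = submit_vm_bind_async(fd, syncobj);
            if (ret)
                    goto out;

            /* Block until the bind job has completed. */
            ret = drmSyncobjWait(fd, &syncobj, 1, INT64_MAX,
                                 DRM_SYNCOBJ_WAIT_FLAGS_WAIT_ALL, NULL);
    out:
            drmSyncobjDestroy(fd, syncobj);
            return ret;
    }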

The new VM_BIND UAPI for Nouveau also makes use of drm_exec (execution context
for GEM buffers) by Christian König. Since the patch implementing drm_exec has
not yet been merged into drm-next, it is included in this series, together with
a small fix for it that was found while testing this series.

This patch series is also available at [1].

There is a Mesa NVK merge request by Dave Airlie [2] implementing the
corresponding userspace parts for this series.

Together with Dave's Mesa work, the new UAPI passes the Vulkan CTS sparse
binding and sparse residency test cases.

There are also some test cases in the igt-gpu-tools project [3] for the new UAPI
and hence for the DRM GPUVA manager. However, most of them test the DRM GPUVA
manager's logic through Nouveau's new UAPI and should be considered merely as
implementation helpers.

That said, I absolutely intend to convert those test cases into proper KUnit
test cases for the DRM GPUVA manager, if and once we agree on its usefulness
and design.

[1] https://gitlab.freedesktop.org/nouvelles/kernel/-/tree/new-uapi-drm-next /
    https://gitlab.freedesktop.org/nouvelles/kernel/-/merge_requests/1
[2] https://gitlab.freedesktop.org/nouveau/mesa/-/merge_requests/150/
[3] https://gitlab.freedesktop.org/dakr/igt-gpu-tools/-/tree/wip_nouveau_vm_bind

I also want to give credit to Dave Airlie, who contributed a lot of ideas to
this patch series.

Christian König (1):
  drm: execution context for GEM buffers

Danilo Krummrich (13):
  drm/exec: fix memory leak in drm_exec_prepare_obj()
  drm: manager to keep track of GPUs VA mappings
  drm: debugfs: provide infrastructure to dump a DRM GPU VA space
  drm/nouveau: new VM_BIND uapi interfaces
  drm/nouveau: get vmm via nouveau_cli_vmm()
  drm/nouveau: bo: initialize GEM GPU VA interface
  drm/nouveau: move usercopy helpers to nouveau_drv.h
  drm/nouveau: fence: fail to emit when fence context is killed
  drm/nouveau: chan: provide nouveau_channel_kill()
  drm/nouveau: nvkm/vmm: implement raw ops to manage uvmm
  drm/nouveau: implement uvmm for user mode bindings
  drm/nouveau: implement new VM_BIND UAPI
  drm/nouveau: debugfs: implement DRM GPU VA debugfs

 Documentation/gpu/driver-uapi.rst             |   11 +
 Documentation/gpu/drm-mm.rst                  |   43 +
 drivers/gpu/drm/Kconfig                       |    6 +
 drivers/gpu/drm/Makefile                      |    3 +
 drivers/gpu/drm/amd/amdgpu/Kconfig            |    1 +
 drivers/gpu/drm/drm_debugfs.c                 |   56 +
 drivers/gpu/drm/drm_exec.c                    |  294 ++++
 drivers/gpu/drm/drm_gem.c                     |    3 +
 drivers/gpu/drm/drm_gpuva_mgr.c               | 1323 +++++++++++++++++
 drivers/gpu/drm/nouveau/Kbuild                |    3 +
 drivers/gpu/drm/nouveau/Kconfig               |    2 +
 drivers/gpu/drm/nouveau/include/nvif/if000c.h |   23 +-
 drivers/gpu/drm/nouveau/include/nvif/vmm.h    |   17 +-
 .../gpu/drm/nouveau/include/nvkm/subdev/mmu.h |   10 +
 drivers/gpu/drm/nouveau/nouveau_abi16.c       |   23 +
 drivers/gpu/drm/nouveau/nouveau_abi16.h       |    1 +
 drivers/gpu/drm/nouveau/nouveau_bo.c          |  152 +-
 drivers/gpu/drm/nouveau/nouveau_bo.h          |    2 +-
 drivers/gpu/drm/nouveau/nouveau_chan.c        |   16 +-
 drivers/gpu/drm/nouveau/nouveau_chan.h        |    1 +
 drivers/gpu/drm/nouveau/nouveau_debugfs.c     |   24 +
 drivers/gpu/drm/nouveau/nouveau_drm.c         |   25 +-
 drivers/gpu/drm/nouveau/nouveau_drv.h         |   92 +-
 drivers/gpu/drm/nouveau/nouveau_exec.c        |  310 ++++
 drivers/gpu/drm/nouveau/nouveau_exec.h        |   55 +
 drivers/gpu/drm/nouveau/nouveau_fence.c       |    7 +
 drivers/gpu/drm/nouveau/nouveau_fence.h       |    2 +-
 drivers/gpu/drm/nouveau/nouveau_gem.c         |   83 +-
 drivers/gpu/drm/nouveau/nouveau_mem.h         |    5 +
 drivers/gpu/drm/nouveau/nouveau_prime.c       |    2 +-
 drivers/gpu/drm/nouveau/nouveau_sched.c       |  780 ++++++++++
 drivers/gpu/drm/nouveau/nouveau_sched.h       |   98 ++
 drivers/gpu/drm/nouveau/nouveau_svm.c         |    2 +-
 drivers/gpu/drm/nouveau/nouveau_uvmm.c        |  575 +++++++
 drivers/gpu/drm/nouveau/nouveau_uvmm.h        |   68 +
 drivers/gpu/drm/nouveau/nouveau_vmm.c         |    4 +-
 drivers/gpu/drm/nouveau/nvif/vmm.c            |   73 +-
 .../gpu/drm/nouveau/nvkm/subdev/mmu/uvmm.c    |  168 ++-
 .../gpu/drm/nouveau/nvkm/subdev/mmu/uvmm.h    |    1 +
 drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmm.c |   32 +-
 drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmm.h |    3 +
 include/drm/drm_debugfs.h                     |   25 +
 include/drm/drm_drv.h                         |    6 +
 include/drm/drm_exec.h                        |  144 ++
 include/drm/drm_gem.h                         |   75 +
 include/drm/drm_gpuva_mgr.h                   |  527 +++++++
 include/uapi/drm/nouveau_drm.h                |  216 +++
 47 files changed, 5266 insertions(+), 126 deletions(-)
 create mode 100644 drivers/gpu/drm/drm_exec.c
 create mode 100644 drivers/gpu/drm/drm_gpuva_mgr.c
 create mode 100644 drivers/gpu/drm/nouveau/nouveau_exec.c
 create mode 100644 drivers/gpu/drm/nouveau/nouveau_exec.h
 create mode 100644 drivers/gpu/drm/nouveau/nouveau_sched.c
 create mode 100644 drivers/gpu/drm/nouveau/nouveau_sched.h
 create mode 100644 drivers/gpu/drm/nouveau/nouveau_uvmm.c
 create mode 100644 drivers/gpu/drm/nouveau/nouveau_uvmm.h
 create mode 100644 include/drm/drm_exec.h
 create mode 100644 include/drm/drm_gpuva_mgr.h


base-commit: 0b45ac1170ea6416bc1d36798414c04870cd356d

Comments

Christian König Jan. 18, 2023, 8:53 a.m. UTC | #1
Am 18.01.23 um 07:12 schrieb Danilo Krummrich:
> This patch series provides a new UAPI for the Nouveau driver in order to
> support Vulkan features, such as sparse bindings and sparse residency.
>
> Furthermore, with the DRM GPUVA manager it provides a new DRM core feature to
> keep track of GPU virtual address (VA) mappings in a more generic way.
>
> The DRM GPUVA manager is indented to help drivers implement userspace-manageable
> GPU VA spaces in reference to the Vulkan API. In order to achieve this goal it
> serves the following purposes in this context.
>
>      1) Provide a dedicated range allocator to track GPU VA allocations and
>         mappings, making use of the drm_mm range allocator.

This means that the ranges are allocated by the kernel? If yes, that's a
really, really bad idea.

Regards,
Christian.

>
>      2) Generically connect GPU VA mappings to their backing buffers, in
>         particular DRM GEM objects.
>
>      3) Provide a common implementation to perform more complex mapping
>         operations on the GPU VA space. In particular splitting and merging
>         of GPU VA mappings, e.g. for intersecting mapping requests or partial
>         unmap requests.
>
> The new VM_BIND Nouveau UAPI build on top of the DRM GPUVA manager, itself
> providing the following new interfaces.
>
>      1) Initialize a GPU VA space via the new DRM_IOCTL_NOUVEAU_VM_INIT ioctl
>         for UMDs to specify the portion of VA space managed by the kernel and
>         userspace, respectively.
>
>      2) Allocate and free a VA space region as well as bind and unbind memory
>         to the GPUs VA space via the new DRM_IOCTL_NOUVEAU_VM_BIND ioctl.
>
>      3) Execute push buffers with the new DRM_IOCTL_NOUVEAU_EXEC ioctl.
>
> Both, DRM_IOCTL_NOUVEAU_VM_BIND and DRM_IOCTL_NOUVEAU_EXEC, make use of the DRM
> scheduler to queue jobs and support asynchronous processing with DRM syncobjs
> as synchronization mechanism.
>
> By default DRM_IOCTL_NOUVEAU_VM_BIND does synchronous processing,
> DRM_IOCTL_NOUVEAU_EXEC supports asynchronous processing only.
>
> The new VM_BIND UAPI for Nouveau makes also use of drm_exec (execution context
> for GEM buffers) by Christian König. Since the patch implementing drm_exec was
> not yet merged into drm-next it is part of this series, as well as a small fix
> for this patch, which was found while testing this series.
>
> This patch series is also available at [1].
>
> There is a Mesa NVK merge request by Dave Airlie [2] implementing the
> corresponding userspace parts for this series.
>
> The Vulkan CTS test suite passes the sparse binding and sparse residency test
> cases for the new UAPI together with Dave's Mesa work.
>
> There are also some test cases in the igt-gpu-tools project [3] for the new UAPI
> and hence the DRM GPU VA manager. However, most of them are testing the DRM GPU
> VA manager's logic through Nouveau's new UAPI and should be considered just as
> helper for implementation.
>
> However, I absolutely intend to change those test cases to proper kunit test
> cases for the DRM GPUVA manager, once and if we agree on it's usefulness and
> design.
>
> [1] https://gitlab.freedesktop.org/nouvelles/kernel/-/tree/new-uapi-drm-next /
>      https://gitlab.freedesktop.org/nouvelles/kernel/-/merge_requests/1
> [2] https://gitlab.freedesktop.org/nouveau/mesa/-/merge_requests/150/
> [3] https://gitlab.freedesktop.org/dakr/igt-gpu-tools/-/tree/wip_nouveau_vm_bind
>
> I also want to give credit to Dave Airlie, who contributed a lot of ideas to
> this patch series.
>
> Christian König (1):
>    drm: execution context for GEM buffers
>
> Danilo Krummrich (13):
>    drm/exec: fix memory leak in drm_exec_prepare_obj()
>    drm: manager to keep track of GPUs VA mappings
>    drm: debugfs: provide infrastructure to dump a DRM GPU VA space
>    drm/nouveau: new VM_BIND uapi interfaces
>    drm/nouveau: get vmm via nouveau_cli_vmm()
>    drm/nouveau: bo: initialize GEM GPU VA interface
>    drm/nouveau: move usercopy helpers to nouveau_drv.h
>    drm/nouveau: fence: fail to emit when fence context is killed
>    drm/nouveau: chan: provide nouveau_channel_kill()
>    drm/nouveau: nvkm/vmm: implement raw ops to manage uvmm
>    drm/nouveau: implement uvmm for user mode bindings
>    drm/nouveau: implement new VM_BIND UAPI
>    drm/nouveau: debugfs: implement DRM GPU VA debugfs
>
>   Documentation/gpu/driver-uapi.rst             |   11 +
>   Documentation/gpu/drm-mm.rst                  |   43 +
>   drivers/gpu/drm/Kconfig                       |    6 +
>   drivers/gpu/drm/Makefile                      |    3 +
>   drivers/gpu/drm/amd/amdgpu/Kconfig            |    1 +
>   drivers/gpu/drm/drm_debugfs.c                 |   56 +
>   drivers/gpu/drm/drm_exec.c                    |  294 ++++
>   drivers/gpu/drm/drm_gem.c                     |    3 +
>   drivers/gpu/drm/drm_gpuva_mgr.c               | 1323 +++++++++++++++++
>   drivers/gpu/drm/nouveau/Kbuild                |    3 +
>   drivers/gpu/drm/nouveau/Kconfig               |    2 +
>   drivers/gpu/drm/nouveau/include/nvif/if000c.h |   23 +-
>   drivers/gpu/drm/nouveau/include/nvif/vmm.h    |   17 +-
>   .../gpu/drm/nouveau/include/nvkm/subdev/mmu.h |   10 +
>   drivers/gpu/drm/nouveau/nouveau_abi16.c       |   23 +
>   drivers/gpu/drm/nouveau/nouveau_abi16.h       |    1 +
>   drivers/gpu/drm/nouveau/nouveau_bo.c          |  152 +-
>   drivers/gpu/drm/nouveau/nouveau_bo.h          |    2 +-
>   drivers/gpu/drm/nouveau/nouveau_chan.c        |   16 +-
>   drivers/gpu/drm/nouveau/nouveau_chan.h        |    1 +
>   drivers/gpu/drm/nouveau/nouveau_debugfs.c     |   24 +
>   drivers/gpu/drm/nouveau/nouveau_drm.c         |   25 +-
>   drivers/gpu/drm/nouveau/nouveau_drv.h         |   92 +-
>   drivers/gpu/drm/nouveau/nouveau_exec.c        |  310 ++++
>   drivers/gpu/drm/nouveau/nouveau_exec.h        |   55 +
>   drivers/gpu/drm/nouveau/nouveau_fence.c       |    7 +
>   drivers/gpu/drm/nouveau/nouveau_fence.h       |    2 +-
>   drivers/gpu/drm/nouveau/nouveau_gem.c         |   83 +-
>   drivers/gpu/drm/nouveau/nouveau_mem.h         |    5 +
>   drivers/gpu/drm/nouveau/nouveau_prime.c       |    2 +-
>   drivers/gpu/drm/nouveau/nouveau_sched.c       |  780 ++++++++++
>   drivers/gpu/drm/nouveau/nouveau_sched.h       |   98 ++
>   drivers/gpu/drm/nouveau/nouveau_svm.c         |    2 +-
>   drivers/gpu/drm/nouveau/nouveau_uvmm.c        |  575 +++++++
>   drivers/gpu/drm/nouveau/nouveau_uvmm.h        |   68 +
>   drivers/gpu/drm/nouveau/nouveau_vmm.c         |    4 +-
>   drivers/gpu/drm/nouveau/nvif/vmm.c            |   73 +-
>   .../gpu/drm/nouveau/nvkm/subdev/mmu/uvmm.c    |  168 ++-
>   .../gpu/drm/nouveau/nvkm/subdev/mmu/uvmm.h    |    1 +
>   drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmm.c |   32 +-
>   drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmm.h |    3 +
>   include/drm/drm_debugfs.h                     |   25 +
>   include/drm/drm_drv.h                         |    6 +
>   include/drm/drm_exec.h                        |  144 ++
>   include/drm/drm_gem.h                         |   75 +
>   include/drm/drm_gpuva_mgr.h                   |  527 +++++++
>   include/uapi/drm/nouveau_drm.h                |  216 +++
>   47 files changed, 5266 insertions(+), 126 deletions(-)
>   create mode 100644 drivers/gpu/drm/drm_exec.c
>   create mode 100644 drivers/gpu/drm/drm_gpuva_mgr.c
>   create mode 100644 drivers/gpu/drm/nouveau/nouveau_exec.c
>   create mode 100644 drivers/gpu/drm/nouveau/nouveau_exec.h
>   create mode 100644 drivers/gpu/drm/nouveau/nouveau_sched.c
>   create mode 100644 drivers/gpu/drm/nouveau/nouveau_sched.h
>   create mode 100644 drivers/gpu/drm/nouveau/nouveau_uvmm.c
>   create mode 100644 drivers/gpu/drm/nouveau/nouveau_uvmm.h
>   create mode 100644 include/drm/drm_exec.h
>   create mode 100644 include/drm/drm_gpuva_mgr.h
>
>
> base-commit: 0b45ac1170ea6416bc1d36798414c04870cd356d
Danilo Krummrich Jan. 18, 2023, 3:34 p.m. UTC | #2
Hi Christian,

On 1/18/23 09:53, Christian König wrote:
> Am 18.01.23 um 07:12 schrieb Danilo Krummrich:
>> This patch series provides a new UAPI for the Nouveau driver in order to
>> support Vulkan features, such as sparse bindings and sparse residency.
>>
>> Furthermore, with the DRM GPUVA manager it provides a new DRM core 
>> feature to
>> keep track of GPU virtual address (VA) mappings in a more generic way.
>>
>> The DRM GPUVA manager is indented to help drivers implement 
>> userspace-manageable
>> GPU VA spaces in reference to the Vulkan API. In order to achieve this 
>> goal it
>> serves the following purposes in this context.
>>
>>      1) Provide a dedicated range allocator to track GPU VA 
>> allocations and
>>         mappings, making use of the drm_mm range allocator.
> 
> This means that the ranges are allocated by the kernel? If yes that's a 
> really really bad idea.

No, it's just for keeping track of the ranges userspace has allocated.

- Danilo

> 
> Regards,
> Christian.
> 
>>
>>      2) Generically connect GPU VA mappings to their backing buffers, in
>>         particular DRM GEM objects.
>>
>>      3) Provide a common implementation to perform more complex mapping
>>         operations on the GPU VA space. In particular splitting and 
>> merging
>>         of GPU VA mappings, e.g. for intersecting mapping requests or 
>> partial
>>         unmap requests.
>>
>> The new VM_BIND Nouveau UAPI build on top of the DRM GPUVA manager, 
>> itself
>> providing the following new interfaces.
>>
>>      1) Initialize a GPU VA space via the new 
>> DRM_IOCTL_NOUVEAU_VM_INIT ioctl
>>         for UMDs to specify the portion of VA space managed by the 
>> kernel and
>>         userspace, respectively.
>>
>>      2) Allocate and free a VA space region as well as bind and unbind 
>> memory
>>         to the GPUs VA space via the new DRM_IOCTL_NOUVEAU_VM_BIND ioctl.
>>
>>      3) Execute push buffers with the new DRM_IOCTL_NOUVEAU_EXEC ioctl.
>>
>> Both, DRM_IOCTL_NOUVEAU_VM_BIND and DRM_IOCTL_NOUVEAU_EXEC, make use 
>> of the DRM
>> scheduler to queue jobs and support asynchronous processing with DRM 
>> syncobjs
>> as synchronization mechanism.
>>
>> By default DRM_IOCTL_NOUVEAU_VM_BIND does synchronous processing,
>> DRM_IOCTL_NOUVEAU_EXEC supports asynchronous processing only.
>>
>> The new VM_BIND UAPI for Nouveau makes also use of drm_exec (execution 
>> context
>> for GEM buffers) by Christian König. Since the patch implementing 
>> drm_exec was
>> not yet merged into drm-next it is part of this series, as well as a 
>> small fix
>> for this patch, which was found while testing this series.
>>
>> This patch series is also available at [1].
>>
>> There is a Mesa NVK merge request by Dave Airlie [2] implementing the
>> corresponding userspace parts for this series.
>>
>> The Vulkan CTS test suite passes the sparse binding and sparse 
>> residency test
>> cases for the new UAPI together with Dave's Mesa work.
>>
>> There are also some test cases in the igt-gpu-tools project [3] for 
>> the new UAPI
>> and hence the DRM GPU VA manager. However, most of them are testing 
>> the DRM GPU
>> VA manager's logic through Nouveau's new UAPI and should be considered 
>> just as
>> helper for implementation.
>>
>> However, I absolutely intend to change those test cases to proper 
>> kunit test
>> cases for the DRM GPUVA manager, once and if we agree on it's 
>> usefulness and
>> design.
>>
>> [1] 
>> https://gitlab.freedesktop.org/nouvelles/kernel/-/tree/new-uapi-drm-next /
>>      https://gitlab.freedesktop.org/nouvelles/kernel/-/merge_requests/1
>> [2] https://gitlab.freedesktop.org/nouveau/mesa/-/merge_requests/150/
>> [3] 
>> https://gitlab.freedesktop.org/dakr/igt-gpu-tools/-/tree/wip_nouveau_vm_bind
>>
>> I also want to give credit to Dave Airlie, who contributed a lot of 
>> ideas to
>> this patch series.
>>
>> Christian König (1):
>>    drm: execution context for GEM buffers
>>
>> Danilo Krummrich (13):
>>    drm/exec: fix memory leak in drm_exec_prepare_obj()
>>    drm: manager to keep track of GPUs VA mappings
>>    drm: debugfs: provide infrastructure to dump a DRM GPU VA space
>>    drm/nouveau: new VM_BIND uapi interfaces
>>    drm/nouveau: get vmm via nouveau_cli_vmm()
>>    drm/nouveau: bo: initialize GEM GPU VA interface
>>    drm/nouveau: move usercopy helpers to nouveau_drv.h
>>    drm/nouveau: fence: fail to emit when fence context is killed
>>    drm/nouveau: chan: provide nouveau_channel_kill()
>>    drm/nouveau: nvkm/vmm: implement raw ops to manage uvmm
>>    drm/nouveau: implement uvmm for user mode bindings
>>    drm/nouveau: implement new VM_BIND UAPI
>>    drm/nouveau: debugfs: implement DRM GPU VA debugfs
>>
>>   Documentation/gpu/driver-uapi.rst             |   11 +
>>   Documentation/gpu/drm-mm.rst                  |   43 +
>>   drivers/gpu/drm/Kconfig                       |    6 +
>>   drivers/gpu/drm/Makefile                      |    3 +
>>   drivers/gpu/drm/amd/amdgpu/Kconfig            |    1 +
>>   drivers/gpu/drm/drm_debugfs.c                 |   56 +
>>   drivers/gpu/drm/drm_exec.c                    |  294 ++++
>>   drivers/gpu/drm/drm_gem.c                     |    3 +
>>   drivers/gpu/drm/drm_gpuva_mgr.c               | 1323 +++++++++++++++++
>>   drivers/gpu/drm/nouveau/Kbuild                |    3 +
>>   drivers/gpu/drm/nouveau/Kconfig               |    2 +
>>   drivers/gpu/drm/nouveau/include/nvif/if000c.h |   23 +-
>>   drivers/gpu/drm/nouveau/include/nvif/vmm.h    |   17 +-
>>   .../gpu/drm/nouveau/include/nvkm/subdev/mmu.h |   10 +
>>   drivers/gpu/drm/nouveau/nouveau_abi16.c       |   23 +
>>   drivers/gpu/drm/nouveau/nouveau_abi16.h       |    1 +
>>   drivers/gpu/drm/nouveau/nouveau_bo.c          |  152 +-
>>   drivers/gpu/drm/nouveau/nouveau_bo.h          |    2 +-
>>   drivers/gpu/drm/nouveau/nouveau_chan.c        |   16 +-
>>   drivers/gpu/drm/nouveau/nouveau_chan.h        |    1 +
>>   drivers/gpu/drm/nouveau/nouveau_debugfs.c     |   24 +
>>   drivers/gpu/drm/nouveau/nouveau_drm.c         |   25 +-
>>   drivers/gpu/drm/nouveau/nouveau_drv.h         |   92 +-
>>   drivers/gpu/drm/nouveau/nouveau_exec.c        |  310 ++++
>>   drivers/gpu/drm/nouveau/nouveau_exec.h        |   55 +
>>   drivers/gpu/drm/nouveau/nouveau_fence.c       |    7 +
>>   drivers/gpu/drm/nouveau/nouveau_fence.h       |    2 +-
>>   drivers/gpu/drm/nouveau/nouveau_gem.c         |   83 +-
>>   drivers/gpu/drm/nouveau/nouveau_mem.h         |    5 +
>>   drivers/gpu/drm/nouveau/nouveau_prime.c       |    2 +-
>>   drivers/gpu/drm/nouveau/nouveau_sched.c       |  780 ++++++++++
>>   drivers/gpu/drm/nouveau/nouveau_sched.h       |   98 ++
>>   drivers/gpu/drm/nouveau/nouveau_svm.c         |    2 +-
>>   drivers/gpu/drm/nouveau/nouveau_uvmm.c        |  575 +++++++
>>   drivers/gpu/drm/nouveau/nouveau_uvmm.h        |   68 +
>>   drivers/gpu/drm/nouveau/nouveau_vmm.c         |    4 +-
>>   drivers/gpu/drm/nouveau/nvif/vmm.c            |   73 +-
>>   .../gpu/drm/nouveau/nvkm/subdev/mmu/uvmm.c    |  168 ++-
>>   .../gpu/drm/nouveau/nvkm/subdev/mmu/uvmm.h    |    1 +
>>   drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmm.c |   32 +-
>>   drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmm.h |    3 +
>>   include/drm/drm_debugfs.h                     |   25 +
>>   include/drm/drm_drv.h                         |    6 +
>>   include/drm/drm_exec.h                        |  144 ++
>>   include/drm/drm_gem.h                         |   75 +
>>   include/drm/drm_gpuva_mgr.h                   |  527 +++++++
>>   include/uapi/drm/nouveau_drm.h                |  216 +++
>>   47 files changed, 5266 insertions(+), 126 deletions(-)
>>   create mode 100644 drivers/gpu/drm/drm_exec.c
>>   create mode 100644 drivers/gpu/drm/drm_gpuva_mgr.c
>>   create mode 100644 drivers/gpu/drm/nouveau/nouveau_exec.c
>>   create mode 100644 drivers/gpu/drm/nouveau/nouveau_exec.h
>>   create mode 100644 drivers/gpu/drm/nouveau/nouveau_sched.c
>>   create mode 100644 drivers/gpu/drm/nouveau/nouveau_sched.h
>>   create mode 100644 drivers/gpu/drm/nouveau/nouveau_uvmm.c
>>   create mode 100644 drivers/gpu/drm/nouveau/nouveau_uvmm.h
>>   create mode 100644 include/drm/drm_exec.h
>>   create mode 100644 include/drm/drm_gpuva_mgr.h
>>
>>
>> base-commit: 0b45ac1170ea6416bc1d36798414c04870cd356d
>
Christian König Jan. 18, 2023, 3:37 p.m. UTC | #3
Am 18.01.23 um 16:34 schrieb Danilo Krummrich:
> Hi Christian,
>
> On 1/18/23 09:53, Christian König wrote:
>> Am 18.01.23 um 07:12 schrieb Danilo Krummrich:
>>> This patch series provides a new UAPI for the Nouveau driver in 
>>> order to
>>> support Vulkan features, such as sparse bindings and sparse residency.
>>>
>>> Furthermore, with the DRM GPUVA manager it provides a new DRM core 
>>> feature to
>>> keep track of GPU virtual address (VA) mappings in a more generic way.
>>>
>>> The DRM GPUVA manager is indented to help drivers implement 
>>> userspace-manageable
>>> GPU VA spaces in reference to the Vulkan API. In order to achieve 
>>> this goal it
>>> serves the following purposes in this context.
>>>
>>>      1) Provide a dedicated range allocator to track GPU VA 
>>> allocations and
>>>         mappings, making use of the drm_mm range allocator.
>>
>> This means that the ranges are allocated by the kernel? If yes that's 
>> a really really bad idea.
>
> No, it's just for keeping track of the ranges userspace has allocated.

Ok, that makes more sense.

So basically you have an IOCTL which asks the kernel for a free range? Or
what exactly is the drm_mm used for here?

Regards,
Christian.

>
> - Danilo
>
>>
>> Regards,
>> Christian.
>>
>>>
>>>      2) Generically connect GPU VA mappings to their backing 
>>> buffers, in
>>>         particular DRM GEM objects.
>>>
>>>      3) Provide a common implementation to perform more complex mapping
>>>         operations on the GPU VA space. In particular splitting and 
>>> merging
>>>         of GPU VA mappings, e.g. for intersecting mapping requests 
>>> or partial
>>>         unmap requests.
>>>
>>> The new VM_BIND Nouveau UAPI build on top of the DRM GPUVA manager, 
>>> itself
>>> providing the following new interfaces.
>>>
>>>      1) Initialize a GPU VA space via the new 
>>> DRM_IOCTL_NOUVEAU_VM_INIT ioctl
>>>         for UMDs to specify the portion of VA space managed by the 
>>> kernel and
>>>         userspace, respectively.
>>>
>>>      2) Allocate and free a VA space region as well as bind and 
>>> unbind memory
>>>         to the GPUs VA space via the new DRM_IOCTL_NOUVEAU_VM_BIND 
>>> ioctl.
>>>
>>>      3) Execute push buffers with the new DRM_IOCTL_NOUVEAU_EXEC ioctl.
>>>
>>> Both, DRM_IOCTL_NOUVEAU_VM_BIND and DRM_IOCTL_NOUVEAU_EXEC, make use 
>>> of the DRM
>>> scheduler to queue jobs and support asynchronous processing with DRM 
>>> syncobjs
>>> as synchronization mechanism.
>>>
>>> By default DRM_IOCTL_NOUVEAU_VM_BIND does synchronous processing,
>>> DRM_IOCTL_NOUVEAU_EXEC supports asynchronous processing only.
>>>
>>> The new VM_BIND UAPI for Nouveau makes also use of drm_exec 
>>> (execution context
>>> for GEM buffers) by Christian König. Since the patch implementing 
>>> drm_exec was
>>> not yet merged into drm-next it is part of this series, as well as a 
>>> small fix
>>> for this patch, which was found while testing this series.
>>>
>>> This patch series is also available at [1].
>>>
>>> There is a Mesa NVK merge request by Dave Airlie [2] implementing the
>>> corresponding userspace parts for this series.
>>>
>>> The Vulkan CTS test suite passes the sparse binding and sparse 
>>> residency test
>>> cases for the new UAPI together with Dave's Mesa work.
>>>
>>> There are also some test cases in the igt-gpu-tools project [3] for 
>>> the new UAPI
>>> and hence the DRM GPU VA manager. However, most of them are testing 
>>> the DRM GPU
>>> VA manager's logic through Nouveau's new UAPI and should be 
>>> considered just as
>>> helper for implementation.
>>>
>>> However, I absolutely intend to change those test cases to proper 
>>> kunit test
>>> cases for the DRM GPUVA manager, once and if we agree on it's 
>>> usefulness and
>>> design.
>>>
>>> [1] 
>>> https://gitlab.freedesktop.org/nouvelles/kernel/-/tree/new-uapi-drm-next 
>>> /
>>> https://gitlab.freedesktop.org/nouvelles/kernel/-/merge_requests/1
>>> [2] https://gitlab.freedesktop.org/nouveau/mesa/-/merge_requests/150/
>>> [3] 
>>> https://gitlab.freedesktop.org/dakr/igt-gpu-tools/-/tree/wip_nouveau_vm_bind
>>>
>>> I also want to give credit to Dave Airlie, who contributed a lot of 
>>> ideas to
>>> this patch series.
>>>
>>> Christian König (1):
>>>    drm: execution context for GEM buffers
>>>
>>> Danilo Krummrich (13):
>>>    drm/exec: fix memory leak in drm_exec_prepare_obj()
>>>    drm: manager to keep track of GPUs VA mappings
>>>    drm: debugfs: provide infrastructure to dump a DRM GPU VA space
>>>    drm/nouveau: new VM_BIND uapi interfaces
>>>    drm/nouveau: get vmm via nouveau_cli_vmm()
>>>    drm/nouveau: bo: initialize GEM GPU VA interface
>>>    drm/nouveau: move usercopy helpers to nouveau_drv.h
>>>    drm/nouveau: fence: fail to emit when fence context is killed
>>>    drm/nouveau: chan: provide nouveau_channel_kill()
>>>    drm/nouveau: nvkm/vmm: implement raw ops to manage uvmm
>>>    drm/nouveau: implement uvmm for user mode bindings
>>>    drm/nouveau: implement new VM_BIND UAPI
>>>    drm/nouveau: debugfs: implement DRM GPU VA debugfs
>>>
>>>   Documentation/gpu/driver-uapi.rst             |   11 +
>>>   Documentation/gpu/drm-mm.rst                  |   43 +
>>>   drivers/gpu/drm/Kconfig                       |    6 +
>>>   drivers/gpu/drm/Makefile                      |    3 +
>>>   drivers/gpu/drm/amd/amdgpu/Kconfig            |    1 +
>>>   drivers/gpu/drm/drm_debugfs.c                 |   56 +
>>>   drivers/gpu/drm/drm_exec.c                    |  294 ++++
>>>   drivers/gpu/drm/drm_gem.c                     |    3 +
>>>   drivers/gpu/drm/drm_gpuva_mgr.c               | 1323 
>>> +++++++++++++++++
>>>   drivers/gpu/drm/nouveau/Kbuild                |    3 +
>>>   drivers/gpu/drm/nouveau/Kconfig               |    2 +
>>>   drivers/gpu/drm/nouveau/include/nvif/if000c.h |   23 +-
>>>   drivers/gpu/drm/nouveau/include/nvif/vmm.h    |   17 +-
>>>   .../gpu/drm/nouveau/include/nvkm/subdev/mmu.h |   10 +
>>>   drivers/gpu/drm/nouveau/nouveau_abi16.c       |   23 +
>>>   drivers/gpu/drm/nouveau/nouveau_abi16.h       |    1 +
>>>   drivers/gpu/drm/nouveau/nouveau_bo.c          |  152 +-
>>>   drivers/gpu/drm/nouveau/nouveau_bo.h          |    2 +-
>>>   drivers/gpu/drm/nouveau/nouveau_chan.c        |   16 +-
>>>   drivers/gpu/drm/nouveau/nouveau_chan.h        |    1 +
>>>   drivers/gpu/drm/nouveau/nouveau_debugfs.c     |   24 +
>>>   drivers/gpu/drm/nouveau/nouveau_drm.c         |   25 +-
>>>   drivers/gpu/drm/nouveau/nouveau_drv.h         |   92 +-
>>>   drivers/gpu/drm/nouveau/nouveau_exec.c        |  310 ++++
>>>   drivers/gpu/drm/nouveau/nouveau_exec.h        |   55 +
>>>   drivers/gpu/drm/nouveau/nouveau_fence.c       |    7 +
>>>   drivers/gpu/drm/nouveau/nouveau_fence.h       |    2 +-
>>>   drivers/gpu/drm/nouveau/nouveau_gem.c         |   83 +-
>>>   drivers/gpu/drm/nouveau/nouveau_mem.h         |    5 +
>>>   drivers/gpu/drm/nouveau/nouveau_prime.c       |    2 +-
>>>   drivers/gpu/drm/nouveau/nouveau_sched.c       |  780 ++++++++++
>>>   drivers/gpu/drm/nouveau/nouveau_sched.h       |   98 ++
>>>   drivers/gpu/drm/nouveau/nouveau_svm.c         |    2 +-
>>>   drivers/gpu/drm/nouveau/nouveau_uvmm.c        |  575 +++++++
>>>   drivers/gpu/drm/nouveau/nouveau_uvmm.h        |   68 +
>>>   drivers/gpu/drm/nouveau/nouveau_vmm.c         |    4 +-
>>>   drivers/gpu/drm/nouveau/nvif/vmm.c            |   73 +-
>>>   .../gpu/drm/nouveau/nvkm/subdev/mmu/uvmm.c    |  168 ++-
>>>   .../gpu/drm/nouveau/nvkm/subdev/mmu/uvmm.h    |    1 +
>>>   drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmm.c |   32 +-
>>>   drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmm.h |    3 +
>>>   include/drm/drm_debugfs.h                     |   25 +
>>>   include/drm/drm_drv.h                         |    6 +
>>>   include/drm/drm_exec.h                        |  144 ++
>>>   include/drm/drm_gem.h                         |   75 +
>>>   include/drm/drm_gpuva_mgr.h                   |  527 +++++++
>>>   include/uapi/drm/nouveau_drm.h                |  216 +++
>>>   47 files changed, 5266 insertions(+), 126 deletions(-)
>>>   create mode 100644 drivers/gpu/drm/drm_exec.c
>>>   create mode 100644 drivers/gpu/drm/drm_gpuva_mgr.c
>>>   create mode 100644 drivers/gpu/drm/nouveau/nouveau_exec.c
>>>   create mode 100644 drivers/gpu/drm/nouveau/nouveau_exec.h
>>>   create mode 100644 drivers/gpu/drm/nouveau/nouveau_sched.c
>>>   create mode 100644 drivers/gpu/drm/nouveau/nouveau_sched.h
>>>   create mode 100644 drivers/gpu/drm/nouveau/nouveau_uvmm.c
>>>   create mode 100644 drivers/gpu/drm/nouveau/nouveau_uvmm.h
>>>   create mode 100644 include/drm/drm_exec.h
>>>   create mode 100644 include/drm/drm_gpuva_mgr.h
>>>
>>>
>>> base-commit: 0b45ac1170ea6416bc1d36798414c04870cd356d
>>
>
Danilo Krummrich Jan. 18, 2023, 4:19 p.m. UTC | #4
On 1/18/23 16:37, Christian König wrote:
> Am 18.01.23 um 16:34 schrieb Danilo Krummrich:
>> Hi Christian,
>>
>> On 1/18/23 09:53, Christian König wrote:
>>> Am 18.01.23 um 07:12 schrieb Danilo Krummrich:
>>>> This patch series provides a new UAPI for the Nouveau driver in 
>>>> order to
>>>> support Vulkan features, such as sparse bindings and sparse residency.
>>>>
>>>> Furthermore, with the DRM GPUVA manager it provides a new DRM core 
>>>> feature to
>>>> keep track of GPU virtual address (VA) mappings in a more generic way.
>>>>
>>>> The DRM GPUVA manager is indented to help drivers implement 
>>>> userspace-manageable
>>>> GPU VA spaces in reference to the Vulkan API. In order to achieve 
>>>> this goal it
>>>> serves the following purposes in this context.
>>>>
>>>>      1) Provide a dedicated range allocator to track GPU VA 
>>>> allocations and
>>>>         mappings, making use of the drm_mm range allocator.
>>>
>>> This means that the ranges are allocated by the kernel? If yes that's 
>>> a really really bad idea.
>>
>> No, it's just for keeping track of the ranges userspace has allocated.
> 
> Ok, that makes more sense.
> 
> So basically you have an IOCTL which asks kernel for a free range? Or 
> what exactly is the drm_mm used for here?

Not even that; userspace provides both the base address and the range, and
the kernel really just keeps track of things. Though, writing a UAPI on
top of the GPUVA manager asking for a free range instead would be
possible by just adding the corresponding wrapper functions to get a
free hole.

Currently, and that's what I think I read out of your question, the main 
benefit of using drm_mm over simply stuffing the entries into a list or 
something boils down to easier collision detection and iterating 
sub-ranges of the whole VA space.
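
For illustration, a minimal sketch of what iterating such a sub-range with
drm_mm might look like (the surrounding driver context is hypothetical):

    #include <drm/drm_mm.h>
    #include <linux/printk.h>

    /*
     * Sketch only: walk all tracked mappings intersecting [start, end) of a
     * drm_mm instance, e.g. the range tracker embedded in a per-VM structure.
     */
    static void example_dump_range(struct drm_mm *mm, u64 start, u64 end)
    {
            struct drm_mm_node *node;

            drm_mm_for_each_node_in_range(node, mm, start, end)
                    pr_info("mapping: 0x%llx + 0x%llx\n",
                            (unsigned long long)node->start,
                            (unsigned long long)node->size);
    }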

> 
> Regards,
> Christian.
> 
>>
>> - Danilo
>>
>>>
>>> Regards,
>>> Christian.
>>>
>>>>
>>>>      2) Generically connect GPU VA mappings to their backing 
>>>> buffers, in
>>>>         particular DRM GEM objects.
>>>>
>>>>      3) Provide a common implementation to perform more complex mapping
>>>>         operations on the GPU VA space. In particular splitting and 
>>>> merging
>>>>         of GPU VA mappings, e.g. for intersecting mapping requests 
>>>> or partial
>>>>         unmap requests.
>>>>
>>>> The new VM_BIND Nouveau UAPI build on top of the DRM GPUVA manager, 
>>>> itself
>>>> providing the following new interfaces.
>>>>
>>>>      1) Initialize a GPU VA space via the new 
>>>> DRM_IOCTL_NOUVEAU_VM_INIT ioctl
>>>>         for UMDs to specify the portion of VA space managed by the 
>>>> kernel and
>>>>         userspace, respectively.
>>>>
>>>>      2) Allocate and free a VA space region as well as bind and 
>>>> unbind memory
>>>>         to the GPUs VA space via the new DRM_IOCTL_NOUVEAU_VM_BIND 
>>>> ioctl.
>>>>
>>>>      3) Execute push buffers with the new DRM_IOCTL_NOUVEAU_EXEC ioctl.
>>>>
>>>> Both, DRM_IOCTL_NOUVEAU_VM_BIND and DRM_IOCTL_NOUVEAU_EXEC, make use 
>>>> of the DRM
>>>> scheduler to queue jobs and support asynchronous processing with DRM 
>>>> syncobjs
>>>> as synchronization mechanism.
>>>>
>>>> By default DRM_IOCTL_NOUVEAU_VM_BIND does synchronous processing,
>>>> DRM_IOCTL_NOUVEAU_EXEC supports asynchronous processing only.
>>>>
>>>> The new VM_BIND UAPI for Nouveau makes also use of drm_exec 
>>>> (execution context
>>>> for GEM buffers) by Christian König. Since the patch implementing 
>>>> drm_exec was
>>>> not yet merged into drm-next it is part of this series, as well as a 
>>>> small fix
>>>> for this patch, which was found while testing this series.
>>>>
>>>> This patch series is also available at [1].
>>>>
>>>> There is a Mesa NVK merge request by Dave Airlie [2] implementing the
>>>> corresponding userspace parts for this series.
>>>>
>>>> The Vulkan CTS test suite passes the sparse binding and sparse 
>>>> residency test
>>>> cases for the new UAPI together with Dave's Mesa work.
>>>>
>>>> There are also some test cases in the igt-gpu-tools project [3] for 
>>>> the new UAPI
>>>> and hence the DRM GPU VA manager. However, most of them are testing 
>>>> the DRM GPU
>>>> VA manager's logic through Nouveau's new UAPI and should be 
>>>> considered just as
>>>> helper for implementation.
>>>>
>>>> However, I absolutely intend to change those test cases to proper 
>>>> kunit test
>>>> cases for the DRM GPUVA manager, once and if we agree on it's 
>>>> usefulness and
>>>> design.
>>>>
>>>> [1] 
>>>> https://gitlab.freedesktop.org/nouvelles/kernel/-/tree/new-uapi-drm-next /
>>>> https://gitlab.freedesktop.org/nouvelles/kernel/-/merge_requests/1
>>>> [2] https://gitlab.freedesktop.org/nouveau/mesa/-/merge_requests/150/
>>>> [3] 
>>>> https://gitlab.freedesktop.org/dakr/igt-gpu-tools/-/tree/wip_nouveau_vm_bind
>>>>
>>>> I also want to give credit to Dave Airlie, who contributed a lot of 
>>>> ideas to
>>>> this patch series.
>>>>
>>>> Christian König (1):
>>>>    drm: execution context for GEM buffers
>>>>
>>>> Danilo Krummrich (13):
>>>>    drm/exec: fix memory leak in drm_exec_prepare_obj()
>>>>    drm: manager to keep track of GPUs VA mappings
>>>>    drm: debugfs: provide infrastructure to dump a DRM GPU VA space
>>>>    drm/nouveau: new VM_BIND uapi interfaces
>>>>    drm/nouveau: get vmm via nouveau_cli_vmm()
>>>>    drm/nouveau: bo: initialize GEM GPU VA interface
>>>>    drm/nouveau: move usercopy helpers to nouveau_drv.h
>>>>    drm/nouveau: fence: fail to emit when fence context is killed
>>>>    drm/nouveau: chan: provide nouveau_channel_kill()
>>>>    drm/nouveau: nvkm/vmm: implement raw ops to manage uvmm
>>>>    drm/nouveau: implement uvmm for user mode bindings
>>>>    drm/nouveau: implement new VM_BIND UAPI
>>>>    drm/nouveau: debugfs: implement DRM GPU VA debugfs
>>>>
>>>>   Documentation/gpu/driver-uapi.rst             |   11 +
>>>>   Documentation/gpu/drm-mm.rst                  |   43 +
>>>>   drivers/gpu/drm/Kconfig                       |    6 +
>>>>   drivers/gpu/drm/Makefile                      |    3 +
>>>>   drivers/gpu/drm/amd/amdgpu/Kconfig            |    1 +
>>>>   drivers/gpu/drm/drm_debugfs.c                 |   56 +
>>>>   drivers/gpu/drm/drm_exec.c                    |  294 ++++
>>>>   drivers/gpu/drm/drm_gem.c                     |    3 +
>>>>   drivers/gpu/drm/drm_gpuva_mgr.c               | 1323 
>>>> +++++++++++++++++
>>>>   drivers/gpu/drm/nouveau/Kbuild                |    3 +
>>>>   drivers/gpu/drm/nouveau/Kconfig               |    2 +
>>>>   drivers/gpu/drm/nouveau/include/nvif/if000c.h |   23 +-
>>>>   drivers/gpu/drm/nouveau/include/nvif/vmm.h    |   17 +-
>>>>   .../gpu/drm/nouveau/include/nvkm/subdev/mmu.h |   10 +
>>>>   drivers/gpu/drm/nouveau/nouveau_abi16.c       |   23 +
>>>>   drivers/gpu/drm/nouveau/nouveau_abi16.h       |    1 +
>>>>   drivers/gpu/drm/nouveau/nouveau_bo.c          |  152 +-
>>>>   drivers/gpu/drm/nouveau/nouveau_bo.h          |    2 +-
>>>>   drivers/gpu/drm/nouveau/nouveau_chan.c        |   16 +-
>>>>   drivers/gpu/drm/nouveau/nouveau_chan.h        |    1 +
>>>>   drivers/gpu/drm/nouveau/nouveau_debugfs.c     |   24 +
>>>>   drivers/gpu/drm/nouveau/nouveau_drm.c         |   25 +-
>>>>   drivers/gpu/drm/nouveau/nouveau_drv.h         |   92 +-
>>>>   drivers/gpu/drm/nouveau/nouveau_exec.c        |  310 ++++
>>>>   drivers/gpu/drm/nouveau/nouveau_exec.h        |   55 +
>>>>   drivers/gpu/drm/nouveau/nouveau_fence.c       |    7 +
>>>>   drivers/gpu/drm/nouveau/nouveau_fence.h       |    2 +-
>>>>   drivers/gpu/drm/nouveau/nouveau_gem.c         |   83 +-
>>>>   drivers/gpu/drm/nouveau/nouveau_mem.h         |    5 +
>>>>   drivers/gpu/drm/nouveau/nouveau_prime.c       |    2 +-
>>>>   drivers/gpu/drm/nouveau/nouveau_sched.c       |  780 ++++++++++
>>>>   drivers/gpu/drm/nouveau/nouveau_sched.h       |   98 ++
>>>>   drivers/gpu/drm/nouveau/nouveau_svm.c         |    2 +-
>>>>   drivers/gpu/drm/nouveau/nouveau_uvmm.c        |  575 +++++++
>>>>   drivers/gpu/drm/nouveau/nouveau_uvmm.h        |   68 +
>>>>   drivers/gpu/drm/nouveau/nouveau_vmm.c         |    4 +-
>>>>   drivers/gpu/drm/nouveau/nvif/vmm.c            |   73 +-
>>>>   .../gpu/drm/nouveau/nvkm/subdev/mmu/uvmm.c    |  168 ++-
>>>>   .../gpu/drm/nouveau/nvkm/subdev/mmu/uvmm.h    |    1 +
>>>>   drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmm.c |   32 +-
>>>>   drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmm.h |    3 +
>>>>   include/drm/drm_debugfs.h                     |   25 +
>>>>   include/drm/drm_drv.h                         |    6 +
>>>>   include/drm/drm_exec.h                        |  144 ++
>>>>   include/drm/drm_gem.h                         |   75 +
>>>>   include/drm/drm_gpuva_mgr.h                   |  527 +++++++
>>>>   include/uapi/drm/nouveau_drm.h                |  216 +++
>>>>   47 files changed, 5266 insertions(+), 126 deletions(-)
>>>>   create mode 100644 drivers/gpu/drm/drm_exec.c
>>>>   create mode 100644 drivers/gpu/drm/drm_gpuva_mgr.c
>>>>   create mode 100644 drivers/gpu/drm/nouveau/nouveau_exec.c
>>>>   create mode 100644 drivers/gpu/drm/nouveau/nouveau_exec.h
>>>>   create mode 100644 drivers/gpu/drm/nouveau/nouveau_sched.c
>>>>   create mode 100644 drivers/gpu/drm/nouveau/nouveau_sched.h
>>>>   create mode 100644 drivers/gpu/drm/nouveau/nouveau_uvmm.c
>>>>   create mode 100644 drivers/gpu/drm/nouveau/nouveau_uvmm.h
>>>>   create mode 100644 include/drm/drm_exec.h
>>>>   create mode 100644 include/drm/drm_gpuva_mgr.h
>>>>
>>>>
>>>> base-commit: 0b45ac1170ea6416bc1d36798414c04870cd356d
>>>
>>
>
Alex Deucher Jan. 18, 2023, 4:30 p.m. UTC | #5
On Wed, Jan 18, 2023 at 11:19 AM Danilo Krummrich <dakr@redhat.com> wrote:
>
> On 1/18/23 16:37, Christian König wrote:
> > Am 18.01.23 um 16:34 schrieb Danilo Krummrich:
> >> Hi Christian,
> >>
> >> On 1/18/23 09:53, Christian König wrote:
> >>> Am 18.01.23 um 07:12 schrieb Danilo Krummrich:
> >>>> This patch series provides a new UAPI for the Nouveau driver in
> >>>> order to
> >>>> support Vulkan features, such as sparse bindings and sparse residency.
> >>>>
> >>>> Furthermore, with the DRM GPUVA manager it provides a new DRM core
> >>>> feature to
> >>>> keep track of GPU virtual address (VA) mappings in a more generic way.
> >>>>
> >>>> The DRM GPUVA manager is indented to help drivers implement
> >>>> userspace-manageable
> >>>> GPU VA spaces in reference to the Vulkan API. In order to achieve
> >>>> this goal it
> >>>> serves the following purposes in this context.
> >>>>
> >>>>      1) Provide a dedicated range allocator to track GPU VA
> >>>> allocations and
> >>>>         mappings, making use of the drm_mm range allocator.
> >>>
> >>> This means that the ranges are allocated by the kernel? If yes that's
> >>> a really really bad idea.
> >>
> >> No, it's just for keeping track of the ranges userspace has allocated.
> >
> > Ok, that makes more sense.
> >
> > So basically you have an IOCTL which asks kernel for a free range? Or
> > what exactly is the drm_mm used for here?
>
> Not even that, userspace provides both the base address and the range,
> the kernel really just keeps track of things. Though, writing a UAPI on
> top of the GPUVA manager asking for a free range instead would be
> possible by just adding the corresponding wrapper functions to get a
> free hole.
>
> Currently, and that's what I think I read out of your question, the main
> benefit of using drm_mm over simply stuffing the entries into a list or
> something boils down to easier collision detection and iterating
> sub-ranges of the whole VA space.

Why not just do this in userspace?  We have a range manager in
libdrm_amdgpu that you could lift out into libdrm or some other
helper.

Alex


>
> >
> > Regards,
> > Christian.
> >
> >>
> >> - Danilo
> >>
> >>>
> >>> Regards,
> >>> Christian.
> >>>
> >>>>
> >>>>      2) Generically connect GPU VA mappings to their backing
> >>>> buffers, in
> >>>>         particular DRM GEM objects.
> >>>>
> >>>>      3) Provide a common implementation to perform more complex mapping
> >>>>         operations on the GPU VA space. In particular splitting and
> >>>> merging
> >>>>         of GPU VA mappings, e.g. for intersecting mapping requests
> >>>> or partial
> >>>>         unmap requests.
> >>>>
> >>>> The new VM_BIND Nouveau UAPI build on top of the DRM GPUVA manager,
> >>>> itself
> >>>> providing the following new interfaces.
> >>>>
> >>>>      1) Initialize a GPU VA space via the new
> >>>> DRM_IOCTL_NOUVEAU_VM_INIT ioctl
> >>>>         for UMDs to specify the portion of VA space managed by the
> >>>> kernel and
> >>>>         userspace, respectively.
> >>>>
> >>>>      2) Allocate and free a VA space region as well as bind and
> >>>> unbind memory
> >>>>         to the GPUs VA space via the new DRM_IOCTL_NOUVEAU_VM_BIND
> >>>> ioctl.
> >>>>
> >>>>      3) Execute push buffers with the new DRM_IOCTL_NOUVEAU_EXEC ioctl.
> >>>>
> >>>> Both, DRM_IOCTL_NOUVEAU_VM_BIND and DRM_IOCTL_NOUVEAU_EXEC, make use
> >>>> of the DRM
> >>>> scheduler to queue jobs and support asynchronous processing with DRM
> >>>> syncobjs
> >>>> as synchronization mechanism.
> >>>>
> >>>> By default DRM_IOCTL_NOUVEAU_VM_BIND does synchronous processing,
> >>>> DRM_IOCTL_NOUVEAU_EXEC supports asynchronous processing only.
> >>>>
> >>>> The new VM_BIND UAPI for Nouveau makes also use of drm_exec
> >>>> (execution context
> >>>> for GEM buffers) by Christian König. Since the patch implementing
> >>>> drm_exec was
> >>>> not yet merged into drm-next it is part of this series, as well as a
> >>>> small fix
> >>>> for this patch, which was found while testing this series.
> >>>>
> >>>> This patch series is also available at [1].
> >>>>
> >>>> There is a Mesa NVK merge request by Dave Airlie [2] implementing the
> >>>> corresponding userspace parts for this series.
> >>>>
> >>>> The Vulkan CTS test suite passes the sparse binding and sparse
> >>>> residency test
> >>>> cases for the new UAPI together with Dave's Mesa work.
> >>>>
> >>>> There are also some test cases in the igt-gpu-tools project [3] for
> >>>> the new UAPI
> >>>> and hence the DRM GPU VA manager. However, most of them are testing
> >>>> the DRM GPU
> >>>> VA manager's logic through Nouveau's new UAPI and should be
> >>>> considered just as
> >>>> helper for implementation.
> >>>>
> >>>> However, I absolutely intend to change those test cases to proper
> >>>> kunit test
> >>>> cases for the DRM GPUVA manager, once and if we agree on it's
> >>>> usefulness and
> >>>> design.
> >>>>
> >>>> [1]
> >>>> https://gitlab.freedesktop.org/nouvelles/kernel/-/tree/new-uapi-drm-next /
> >>>> https://gitlab.freedesktop.org/nouvelles/kernel/-/merge_requests/1
> >>>> [2] https://gitlab.freedesktop.org/nouveau/mesa/-/merge_requests/150/
> >>>> [3]
> >>>> https://gitlab.freedesktop.org/dakr/igt-gpu-tools/-/tree/wip_nouveau_vm_bind
> >>>>
> >>>> I also want to give credit to Dave Airlie, who contributed a lot of
> >>>> ideas to
> >>>> this patch series.
> >>>>
> >>>> Christian König (1):
> >>>>    drm: execution context for GEM buffers
> >>>>
> >>>> Danilo Krummrich (13):
> >>>>    drm/exec: fix memory leak in drm_exec_prepare_obj()
> >>>>    drm: manager to keep track of GPUs VA mappings
> >>>>    drm: debugfs: provide infrastructure to dump a DRM GPU VA space
> >>>>    drm/nouveau: new VM_BIND uapi interfaces
> >>>>    drm/nouveau: get vmm via nouveau_cli_vmm()
> >>>>    drm/nouveau: bo: initialize GEM GPU VA interface
> >>>>    drm/nouveau: move usercopy helpers to nouveau_drv.h
> >>>>    drm/nouveau: fence: fail to emit when fence context is killed
> >>>>    drm/nouveau: chan: provide nouveau_channel_kill()
> >>>>    drm/nouveau: nvkm/vmm: implement raw ops to manage uvmm
> >>>>    drm/nouveau: implement uvmm for user mode bindings
> >>>>    drm/nouveau: implement new VM_BIND UAPI
> >>>>    drm/nouveau: debugfs: implement DRM GPU VA debugfs
> >>>>
> >>>>   Documentation/gpu/driver-uapi.rst             |   11 +
> >>>>   Documentation/gpu/drm-mm.rst                  |   43 +
> >>>>   drivers/gpu/drm/Kconfig                       |    6 +
> >>>>   drivers/gpu/drm/Makefile                      |    3 +
> >>>>   drivers/gpu/drm/amd/amdgpu/Kconfig            |    1 +
> >>>>   drivers/gpu/drm/drm_debugfs.c                 |   56 +
> >>>>   drivers/gpu/drm/drm_exec.c                    |  294 ++++
> >>>>   drivers/gpu/drm/drm_gem.c                     |    3 +
> >>>>   drivers/gpu/drm/drm_gpuva_mgr.c               | 1323
> >>>> +++++++++++++++++
> >>>>   drivers/gpu/drm/nouveau/Kbuild                |    3 +
> >>>>   drivers/gpu/drm/nouveau/Kconfig               |    2 +
> >>>>   drivers/gpu/drm/nouveau/include/nvif/if000c.h |   23 +-
> >>>>   drivers/gpu/drm/nouveau/include/nvif/vmm.h    |   17 +-
> >>>>   .../gpu/drm/nouveau/include/nvkm/subdev/mmu.h |   10 +
> >>>>   drivers/gpu/drm/nouveau/nouveau_abi16.c       |   23 +
> >>>>   drivers/gpu/drm/nouveau/nouveau_abi16.h       |    1 +
> >>>>   drivers/gpu/drm/nouveau/nouveau_bo.c          |  152 +-
> >>>>   drivers/gpu/drm/nouveau/nouveau_bo.h          |    2 +-
> >>>>   drivers/gpu/drm/nouveau/nouveau_chan.c        |   16 +-
> >>>>   drivers/gpu/drm/nouveau/nouveau_chan.h        |    1 +
> >>>>   drivers/gpu/drm/nouveau/nouveau_debugfs.c     |   24 +
> >>>>   drivers/gpu/drm/nouveau/nouveau_drm.c         |   25 +-
> >>>>   drivers/gpu/drm/nouveau/nouveau_drv.h         |   92 +-
> >>>>   drivers/gpu/drm/nouveau/nouveau_exec.c        |  310 ++++
> >>>>   drivers/gpu/drm/nouveau/nouveau_exec.h        |   55 +
> >>>>   drivers/gpu/drm/nouveau/nouveau_fence.c       |    7 +
> >>>>   drivers/gpu/drm/nouveau/nouveau_fence.h       |    2 +-
> >>>>   drivers/gpu/drm/nouveau/nouveau_gem.c         |   83 +-
> >>>>   drivers/gpu/drm/nouveau/nouveau_mem.h         |    5 +
> >>>>   drivers/gpu/drm/nouveau/nouveau_prime.c       |    2 +-
> >>>>   drivers/gpu/drm/nouveau/nouveau_sched.c       |  780 ++++++++++
> >>>>   drivers/gpu/drm/nouveau/nouveau_sched.h       |   98 ++
> >>>>   drivers/gpu/drm/nouveau/nouveau_svm.c         |    2 +-
> >>>>   drivers/gpu/drm/nouveau/nouveau_uvmm.c        |  575 +++++++
> >>>>   drivers/gpu/drm/nouveau/nouveau_uvmm.h        |   68 +
> >>>>   drivers/gpu/drm/nouveau/nouveau_vmm.c         |    4 +-
> >>>>   drivers/gpu/drm/nouveau/nvif/vmm.c            |   73 +-
> >>>>   .../gpu/drm/nouveau/nvkm/subdev/mmu/uvmm.c    |  168 ++-
> >>>>   .../gpu/drm/nouveau/nvkm/subdev/mmu/uvmm.h    |    1 +
> >>>>   drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmm.c |   32 +-
> >>>>   drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmm.h |    3 +
> >>>>   include/drm/drm_debugfs.h                     |   25 +
> >>>>   include/drm/drm_drv.h                         |    6 +
> >>>>   include/drm/drm_exec.h                        |  144 ++
> >>>>   include/drm/drm_gem.h                         |   75 +
> >>>>   include/drm/drm_gpuva_mgr.h                   |  527 +++++++
> >>>>   include/uapi/drm/nouveau_drm.h                |  216 +++
> >>>>   47 files changed, 5266 insertions(+), 126 deletions(-)
> >>>>   create mode 100644 drivers/gpu/drm/drm_exec.c
> >>>>   create mode 100644 drivers/gpu/drm/drm_gpuva_mgr.c
> >>>>   create mode 100644 drivers/gpu/drm/nouveau/nouveau_exec.c
> >>>>   create mode 100644 drivers/gpu/drm/nouveau/nouveau_exec.h
> >>>>   create mode 100644 drivers/gpu/drm/nouveau/nouveau_sched.c
> >>>>   create mode 100644 drivers/gpu/drm/nouveau/nouveau_sched.h
> >>>>   create mode 100644 drivers/gpu/drm/nouveau/nouveau_uvmm.c
> >>>>   create mode 100644 drivers/gpu/drm/nouveau/nouveau_uvmm.h
> >>>>   create mode 100644 include/drm/drm_exec.h
> >>>>   create mode 100644 include/drm/drm_gpuva_mgr.h
> >>>>
> >>>>
> >>>> base-commit: 0b45ac1170ea6416bc1d36798414c04870cd356d
> >>>
> >>
> >
>
Danilo Krummrich Jan. 18, 2023, 4:50 p.m. UTC | #6
On 1/18/23 17:30, Alex Deucher wrote:
> On Wed, Jan 18, 2023 at 11:19 AM Danilo Krummrich <dakr@redhat.com> wrote:
>>
>> On 1/18/23 16:37, Christian König wrote:
>>> Am 18.01.23 um 16:34 schrieb Danilo Krummrich:
>>>> Hi Christian,
>>>>
>>>> On 1/18/23 09:53, Christian König wrote:
>>>>> Am 18.01.23 um 07:12 schrieb Danilo Krummrich:
>>>>>> This patch series provides a new UAPI for the Nouveau driver in
>>>>>> order to
>>>>>> support Vulkan features, such as sparse bindings and sparse residency.
>>>>>>
>>>>>> Furthermore, with the DRM GPUVA manager it provides a new DRM core
>>>>>> feature to
>>>>>> keep track of GPU virtual address (VA) mappings in a more generic way.
>>>>>>
>>>>>> The DRM GPUVA manager is indented to help drivers implement
>>>>>> userspace-manageable
>>>>>> GPU VA spaces in reference to the Vulkan API. In order to achieve
>>>>>> this goal it
>>>>>> serves the following purposes in this context.
>>>>>>
>>>>>>       1) Provide a dedicated range allocator to track GPU VA
>>>>>> allocations and
>>>>>>          mappings, making use of the drm_mm range allocator.
>>>>>
>>>>> This means that the ranges are allocated by the kernel? If yes that's
>>>>> a really really bad idea.
>>>>
>>>> No, it's just for keeping track of the ranges userspace has allocated.
>>>
>>> Ok, that makes more sense.
>>>
>>> So basically you have an IOCTL which asks kernel for a free range? Or
>>> what exactly is the drm_mm used for here?
>>
>> Not even that, userspace provides both the base address and the range,
>> the kernel really just keeps track of things. Though, writing a UAPI on
>> top of the GPUVA manager asking for a free range instead would be
>> possible by just adding the corresponding wrapper functions to get a
>> free hole.
>>
>> Currently, and that's what I think I read out of your question, the main
>> benefit of using drm_mm over simply stuffing the entries into a list or
>> something boils down to easier collision detection and iterating
>> sub-ranges of the whole VA space.
> 
> Why not just do this in userspace?  We have a range manager in
> libdrm_amdgpu that you could lift out into libdrm or some other
> helper.

The kernel still needs to keep track of the mappings within the various
VA spaces, e.g. it needs to silently unmap mappings that are backed by
BOs that get evicted and remap them once they're validated (or swapped
back in).
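
As a rough sketch of that idea (all structure and helper names below are
hypothetical and not the actual drm_gpuva_mgr or Nouveau interfaces):

    #include <linux/list.h>
    #include <linux/types.h>

    /*
     * Hypothetical sketch only: each BO keeps a list of the VA mappings
     * backed by it, so the kernel can unmap them on eviction and remap them
     * once the BO is validated again.
     */
    struct example_bo;

    struct example_mapping {
            struct list_head bo_link;       /* entry in the BO's mapping list */
            u64 addr;
            u64 range;
    };

    struct example_bo {
            struct list_head mappings;
            void *vm;                       /* hypothetical per-client VM */
    };

    /* Hypothetical page table helpers. */
    void example_hw_unmap(void *vm, u64 addr, u64 range);
    void example_hw_map(void *vm, u64 addr, u64 range, struct example_bo *bo);

    static void example_bo_evicted(struct example_bo *bo)
    {
            struct example_mapping *m;

            list_for_each_entry(m, &bo->mappings, bo_link)
                    example_hw_unmap(bo->vm, m->addr, m->range);
    }

    static void example_bo_validated(struct example_bo *bo)
    {
            struct example_mapping *m;

            list_for_each_entry(m, &bo->mappings, bo_link)
                    example_hw_map(bo->vm, m->addr, m->range, bo);
    }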

> 
> Alex
> 
> 
>>
>>>
>>> Regards,
>>> Christian.
>>>
>>>>
>>>> - Danilo
>>>>
>>>>>
>>>>> Regards,
>>>>> Christian.
>>>>>
>>>>>> [SNIP]
>
Alex Deucher Jan. 18, 2023, 4:54 p.m. UTC | #7
On Wed, Jan 18, 2023 at 11:50 AM Danilo Krummrich <dakr@redhat.com> wrote:
>
>
>
> On 1/18/23 17:30, Alex Deucher wrote:
> > On Wed, Jan 18, 2023 at 11:19 AM Danilo Krummrich <dakr@redhat.com> wrote:
> >>
> >> On 1/18/23 16:37, Christian König wrote:
> >>> Am 18.01.23 um 16:34 schrieb Danilo Krummrich:
> >>>> Hi Christian,
> >>>>
> >>>> On 1/18/23 09:53, Christian König wrote:
> >>>>> Am 18.01.23 um 07:12 schrieb Danilo Krummrich:
> >>>>>> This patch series provides a new UAPI for the Nouveau driver in
> >>>>>> order to
> >>>>>> support Vulkan features, such as sparse bindings and sparse residency.
> >>>>>>
> >>>>>> Furthermore, with the DRM GPUVA manager it provides a new DRM core
> >>>>>> feature to
> >>>>>> keep track of GPU virtual address (VA) mappings in a more generic way.
> >>>>>>
> >>>>>> The DRM GPUVA manager is indented to help drivers implement
> >>>>>> userspace-manageable
> >>>>>> GPU VA spaces in reference to the Vulkan API. In order to achieve
> >>>>>> this goal it
> >>>>>> serves the following purposes in this context.
> >>>>>>
> >>>>>>       1) Provide a dedicated range allocator to track GPU VA
> >>>>>> allocations and
> >>>>>>          mappings, making use of the drm_mm range allocator.
> >>>>>
> >>>>> This means that the ranges are allocated by the kernel? If yes that's
> >>>>> a really really bad idea.
> >>>>
> >>>> No, it's just for keeping track of the ranges userspace has allocated.
> >>>
> >>> Ok, that makes more sense.
> >>>
> >>> So basically you have an IOCTL which asks kernel for a free range? Or
> >>> what exactly is the drm_mm used for here?
> >>
> >> Not even that, userspace provides both the base address and the range,
> >> the kernel really just keeps track of things. Though, writing a UAPI on
> >> top of the GPUVA manager asking for a free range instead would be
> >> possible by just adding the corresponding wrapper functions to get a
> >> free hole.
> >>
> >> Currently, and that's what I think I read out of your question, the main
> >> benefit of using drm_mm over simply stuffing the entries into a list or
> >> something boils down to easier collision detection and iterating
> >> sub-ranges of the whole VA space.
> >
> > Why not just do this in userspace?  We have a range manager in
> > libdrm_amdgpu that you could lift out into libdrm or some other
> > helper.
>
> The kernel still needs to keep track of the mappings within the various
> VA spaces, e.g. it silently needs to unmap mappings that are backed by
> BOs that get evicted and remap them once they're validated (or swapped
> back in).

Ok, you are just using this for maintaining the GPU VM space in the kernel.

Alex

>
> >
> > Alex
> >
> >
> >>
> >>>
> >>> Regards,
> >>> Christian.
> >>>
> >>>>
> >>>> - Danilo
> >>>>
> >>>>>
> >>>>> Regards,
> >>>>> Christian.
> >>>>>
> >>>>>> [SNIP]
>
Dave Airlie Jan. 18, 2023, 7:17 p.m. UTC | #8
On Thu, 19 Jan 2023 at 02:54, Alex Deucher <alexdeucher@gmail.com> wrote:
>
> On Wed, Jan 18, 2023 at 11:50 AM Danilo Krummrich <dakr@redhat.com> wrote:
> >
> >
> >
> > On 1/18/23 17:30, Alex Deucher wrote:
> > > On Wed, Jan 18, 2023 at 11:19 AM Danilo Krummrich <dakr@redhat.com> wrote:
> > >>
> > >> On 1/18/23 16:37, Christian König wrote:
> > >>> Am 18.01.23 um 16:34 schrieb Danilo Krummrich:
> > >>>> Hi Christian,
> > >>>>
> > >>>> On 1/18/23 09:53, Christian König wrote:
> > >>>>> Am 18.01.23 um 07:12 schrieb Danilo Krummrich:
> > >>>>>> This patch series provides a new UAPI for the Nouveau driver in
> > >>>>>> order to
> > >>>>>> support Vulkan features, such as sparse bindings and sparse residency.
> > >>>>>>
> > >>>>>> Furthermore, with the DRM GPUVA manager it provides a new DRM core
> > >>>>>> feature to
> > >>>>>> keep track of GPU virtual address (VA) mappings in a more generic way.
> > >>>>>>
> > >>>>>> The DRM GPUVA manager is indented to help drivers implement
> > >>>>>> userspace-manageable
> > >>>>>> GPU VA spaces in reference to the Vulkan API. In order to achieve
> > >>>>>> this goal it
> > >>>>>> serves the following purposes in this context.
> > >>>>>>
> > >>>>>>       1) Provide a dedicated range allocator to track GPU VA
> > >>>>>> allocations and
> > >>>>>>          mappings, making use of the drm_mm range allocator.
> > >>>>>
> > >>>>> This means that the ranges are allocated by the kernel? If yes that's
> > >>>>> a really really bad idea.
> > >>>>
> > >>>> No, it's just for keeping track of the ranges userspace has allocated.
> > >>>
> > >>> Ok, that makes more sense.
> > >>>
> > >>> So basically you have an IOCTL which asks kernel for a free range? Or
> > >>> what exactly is the drm_mm used for here?
> > >>
> > >> Not even that, userspace provides both the base address and the range,
> > >> the kernel really just keeps track of things. Though, writing a UAPI on
> > >> top of the GPUVA manager asking for a free range instead would be
> > >> possible by just adding the corresponding wrapper functions to get a
> > >> free hole.
> > >>
> > >> Currently, and that's what I think I read out of your question, the main
> > >> benefit of using drm_mm over simply stuffing the entries into a list or
> > >> something boils down to easier collision detection and iterating
> > >> sub-ranges of the whole VA space.
> > >
> > > Why not just do this in userspace?  We have a range manager in
> > > libdrm_amdgpu that you could lift out into libdrm or some other
> > > helper.
> >
> > The kernel still needs to keep track of the mappings within the various
> > VA spaces, e.g. it silently needs to unmap mappings that are backed by
> > BOs that get evicted and remap them once they're validated (or swapped
> > back in).
>
> Ok, you are just using this for maintaining the GPU VM space in the kernel.
>

Yes, the idea behind having common code wrapping drm_mm for this is to
allow us to make the rules consistent across drivers.

Userspace (generally Vulkan, some compute) has interfaces that pretty
much dictate a lot of how VMA tracking works, especially around
lifetimes, sparse mappings and splitting/merging of the underlying page
tables. I'd really like this to be more consistent across drivers,
because we have already seen some divergence between freedreno and
amdgpu, and we also have i915/xe to deal with. I'd like to at least
have one place where we can say this is how it should work, since this
is something that *should* be mostly consistent across drivers, as it
is more about how the uapi is exposed.

Dave.
Christian König Jan. 18, 2023, 7:48 p.m. UTC | #9
Am 18.01.23 um 20:17 schrieb Dave Airlie:
> On Thu, 19 Jan 2023 at 02:54, Alex Deucher <alexdeucher@gmail.com> wrote:
>> On Wed, Jan 18, 2023 at 11:50 AM Danilo Krummrich <dakr@redhat.com> wrote:
>>>
>>>
>>> On 1/18/23 17:30, Alex Deucher wrote:
>>>> On Wed, Jan 18, 2023 at 11:19 AM Danilo Krummrich <dakr@redhat.com> wrote:
>>>>> On 1/18/23 16:37, Christian König wrote:
>>>>>> Am 18.01.23 um 16:34 schrieb Danilo Krummrich:
>>>>>>> Hi Christian,
>>>>>>>
>>>>>>> On 1/18/23 09:53, Christian König wrote:
>>>>>>>> Am 18.01.23 um 07:12 schrieb Danilo Krummrich:
>>>>>>>>> This patch series provides a new UAPI for the Nouveau driver in
>>>>>>>>> order to
>>>>>>>>> support Vulkan features, such as sparse bindings and sparse residency.
>>>>>>>>>
>>>>>>>>> Furthermore, with the DRM GPUVA manager it provides a new DRM core
>>>>>>>>> feature to
>>>>>>>>> keep track of GPU virtual address (VA) mappings in a more generic way.
>>>>>>>>>
>>>>>>>>> The DRM GPUVA manager is indented to help drivers implement
>>>>>>>>> userspace-manageable
>>>>>>>>> GPU VA spaces in reference to the Vulkan API. In order to achieve
>>>>>>>>> this goal it
>>>>>>>>> serves the following purposes in this context.
>>>>>>>>>
>>>>>>>>>        1) Provide a dedicated range allocator to track GPU VA
>>>>>>>>> allocations and
>>>>>>>>>           mappings, making use of the drm_mm range allocator.
>>>>>>>> This means that the ranges are allocated by the kernel? If yes that's
>>>>>>>> a really really bad idea.
>>>>>>> No, it's just for keeping track of the ranges userspace has allocated.
>>>>>> Ok, that makes more sense.
>>>>>>
>>>>>> So basically you have an IOCTL which asks kernel for a free range? Or
>>>>>> what exactly is the drm_mm used for here?
>>>>> Not even that, userspace provides both the base address and the range,
>>>>> the kernel really just keeps track of things. Though, writing a UAPI on
>>>>> top of the GPUVA manager asking for a free range instead would be
>>>>> possible by just adding the corresponding wrapper functions to get a
>>>>> free hole.
>>>>>
>>>>> Currently, and that's what I think I read out of your question, the main
>>>>> benefit of using drm_mm over simply stuffing the entries into a list or
>>>>> something boils down to easier collision detection and iterating
>>>>> sub-ranges of the whole VA space.
>>>> Why not just do this in userspace?  We have a range manager in
>>>> libdrm_amdgpu that you could lift out into libdrm or some other
>>>> helper.
>>> The kernel still needs to keep track of the mappings within the various
>>> VA spaces, e.g. it silently needs to unmap mappings that are backed by
>>> BOs that get evicted and remap them once they're validated (or swapped
>>> back in).
>> Ok, you are just using this for maintaining the GPU VM space in the kernel.
>>
> Yes the idea behind having common code wrapping drm_mm for this is to
> allow us to make the rules consistent across drivers.
>
> Userspace (generally Vulkan, some compute) has interfaces that pretty
> much dictate a lot of how VMA tracking works, esp around lifetimes,
> sparse mappings and splitting/merging underlying page tables, I'd
> really like this to be more consistent across drivers, because already
> I think we've seen with freedreno some divergence from amdgpu and we
> also have i915/xe to deal with. I'd like to at least have one place
> that we can say this is how it should work, since this is something
> that *should* be consistent across drivers mostly, as it is more about
> how the uapi is exposed.

That's a really good idea, but the implementation with drm_mm won't work 
like that.

We have Vulkan applications which use the sparse feature to create
literally millions of mappings. That's why I have fine-tuned the
mapping structure in amdgpu down to ~80 bytes IIRC and try to save
every possible CPU cycle in handling them.

A drm_mm_node is more in the range of ~200 bytes and certainly not 
suitable for this kind of job.
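
For a feel of where those numbers come from, here is a sketch of a lean
per-mapping structure (the field names are approximate and made up for
the example rather than copied from amdgpu):

  struct lean_mapping {
          struct rb_node rb;       /* interval tree node,    24 bytes */
          u64 start;               /* first page of the VA,   8 bytes */
          u64 last;                /* last page of the VA,    8 bytes */
          u64 __subtree_last;      /* interval tree helper,   8 bytes */
          u64 offset;              /* offset into the BO,     8 bytes */
          u64 flags;               /* mapping flags,          8 bytes */
          struct list_head list;   /* per-BO list entry,     16 bytes */
          void *owner;             /* back-pointer,           8 bytes */
  };                               /* ~88 bytes on 64-bit            */

A drm_mm_node on the other hand carries three rb_nodes, two list_heads,
hole bookkeeping and a back-pointer to the drm_mm, which is roughly
where the ~200 bytes come from.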

I strongly suggest using a good chunk of the amdgpu VM code as a
blueprint for the common infrastructure instead.

Regards,
Christian.

>
> Dave.
Danilo Krummrich Jan. 19, 2023, 4:04 a.m. UTC | #10
On 1/18/23 20:48, Christian König wrote:
> Am 18.01.23 um 20:17 schrieb Dave Airlie:
>> On Thu, 19 Jan 2023 at 02:54, Alex Deucher <alexdeucher@gmail.com> wrote:
>>> On Wed, Jan 18, 2023 at 11:50 AM Danilo Krummrich <dakr@redhat.com> 
>>> wrote:
>>>>
>>>>
>>>> On 1/18/23 17:30, Alex Deucher wrote:
>>>>> On Wed, Jan 18, 2023 at 11:19 AM Danilo Krummrich <dakr@redhat.com> 
>>>>> wrote:
>>>>>> On 1/18/23 16:37, Christian König wrote:
>>>>>>> Am 18.01.23 um 16:34 schrieb Danilo Krummrich:
>>>>>>>> Hi Christian,
>>>>>>>>
>>>>>>>> On 1/18/23 09:53, Christian König wrote:
>>>>>>>>> Am 18.01.23 um 07:12 schrieb Danilo Krummrich:
>>>>>>>>>> This patch series provides a new UAPI for the Nouveau driver in
>>>>>>>>>> order to
>>>>>>>>>> support Vulkan features, such as sparse bindings and sparse 
>>>>>>>>>> residency.
>>>>>>>>>>
>>>>>>>>>> Furthermore, with the DRM GPUVA manager it provides a new DRM 
>>>>>>>>>> core
>>>>>>>>>> feature to
>>>>>>>>>> keep track of GPU virtual address (VA) mappings in a more 
>>>>>>>>>> generic way.
>>>>>>>>>>
>>>>>>>>>> The DRM GPUVA manager is indented to help drivers implement
>>>>>>>>>> userspace-manageable
>>>>>>>>>> GPU VA spaces in reference to the Vulkan API. In order to achieve
>>>>>>>>>> this goal it
>>>>>>>>>> serves the following purposes in this context.
>>>>>>>>>>
>>>>>>>>>>        1) Provide a dedicated range allocator to track GPU VA
>>>>>>>>>> allocations and
>>>>>>>>>>           mappings, making use of the drm_mm range allocator.
>>>>>>>>> This means that the ranges are allocated by the kernel? If yes 
>>>>>>>>> that's
>>>>>>>>> a really really bad idea.
>>>>>>>> No, it's just for keeping track of the ranges userspace has 
>>>>>>>> allocated.
>>>>>>> Ok, that makes more sense.
>>>>>>>
>>>>>>> So basically you have an IOCTL which asks kernel for a free 
>>>>>>> range? Or
>>>>>>> what exactly is the drm_mm used for here?
>>>>>> Not even that, userspace provides both the base address and the 
>>>>>> range,
>>>>>> the kernel really just keeps track of things. Though, writing a 
>>>>>> UAPI on
>>>>>> top of the GPUVA manager asking for a free range instead would be
>>>>>> possible by just adding the corresponding wrapper functions to get a
>>>>>> free hole.
>>>>>>
>>>>>> Currently, and that's what I think I read out of your question, 
>>>>>> the main
>>>>>> benefit of using drm_mm over simply stuffing the entries into a 
>>>>>> list or
>>>>>> something boils down to easier collision detection and iterating
>>>>>> sub-ranges of the whole VA space.
>>>>> Why not just do this in userspace?  We have a range manager in
>>>>> libdrm_amdgpu that you could lift out into libdrm or some other
>>>>> helper.
>>>> The kernel still needs to keep track of the mappings within the various
>>>> VA spaces, e.g. it silently needs to unmap mappings that are backed by
>>>> BOs that get evicted and remap them once they're validated (or swapped
>>>> back in).
>>> Ok, you are just using this for maintaining the GPU VM space in the 
>>> kernel.
>>>
>> Yes the idea behind having common code wrapping drm_mm for this is to
>> allow us to make the rules consistent across drivers.
>>
>> Userspace (generally Vulkan, some compute) has interfaces that pretty
>> much dictate a lot of how VMA tracking works, esp around lifetimes,
>> sparse mappings and splitting/merging underlying page tables, I'd
>> really like this to be more consistent across drivers, because already
>> I think we've seen with freedreno some divergence from amdgpu and we
>> also have i915/xe to deal with. I'd like to at least have one place
>> that we can say this is how it should work, since this is something
>> that *should* be consistent across drivers mostly, as it is more about
>> how the uapi is exposed.
> 
> That's a really good idea, but the implementation with drm_mm won't work 
> like that.
> 
> We have Vulkan applications which use the sparse feature to create 
> literally millions of mappings. That's why I have fine tuned the mapping 
> structure in amdgpu down to ~80 bytes IIRC and save every CPU cycle 
> possible in the handling of that.

That's valuable information. Can you recommend such an application for
testing / benchmarking?

Your optimization effort sounds great. Might it be worth generalizing
your approach on its own and stacking the drm_gpuva_manager on top of
it?

> 
> A drm_mm_node is more in the range of ~200 bytes and certainly not 
> suitable for this kind of job.
> 
> I strongly suggest to rather use a good bunch of the amdgpu VM code as 
> blueprint for the common infrastructure.

I will definitely have a look.

> 
> Regards,
> Christian.
> 
>>
>> Dave.
>
Matthew Brost Jan. 19, 2023, 5:23 a.m. UTC | #11
On Thu, Jan 19, 2023 at 05:04:32AM +0100, Danilo Krummrich wrote:
> On 1/18/23 20:48, Christian König wrote:
> > Am 18.01.23 um 20:17 schrieb Dave Airlie:
> > > On Thu, 19 Jan 2023 at 02:54, Alex Deucher <alexdeucher@gmail.com> wrote:
> > > > [SNIP]
> > > > Ok, you are just using this for maintaining the GPU VM space in
> > > > the kernel.
> > > > 
> > > Yes the idea behind having common code wrapping drm_mm for this is to
> > > allow us to make the rules consistent across drivers.
> > > 
> > > Userspace (generally Vulkan, some compute) has interfaces that pretty
> > > much dictate a lot of how VMA tracking works, esp around lifetimes,
> > > sparse mappings and splitting/merging underlying page tables, I'd
> > > really like this to be more consistent across drivers, because already
> > > I think we've seen with freedreno some divergence from amdgpu and we
> > > also have i915/xe to deal with. I'd like to at least have one place
> > > that we can say this is how it should work, since this is something
> > > that *should* be consistent across drivers mostly, as it is more about
> > > how the uapi is exposed.
> > 
> > That's a really good idea, but the implementation with drm_mm won't work
> > like that.
> > 
> > We have Vulkan applications which use the sparse feature to create
> > literally millions of mappings. That's why I have fine tuned the mapping

Is this not an application issue? Millions of mappings seems a bit
absurd to me.

> > structure in amdgpu down to ~80 bytes IIRC and save every CPU cycle
> > possible in the handling of that.

We might need to do a bit of work here in Xe, as our xe_vma structure
is quite big; we currently use it as a dumping ground for various
features.

> 
> That's a valuable information. Can you recommend such an application for
> testing / benchmarking?
>

Also interested.
 
> Your optimization effort sounds great. May it be worth thinking about
> generalizing your approach by itself and stacking the drm_gpuva_manager on
> top of it?
>

FWIW, Xe is on board with the drm_gpuva_manager effort; we basically
open-code all of this right now. I'd like to port over to
drm_gpuva_manager ASAP so we can contribute and help find a viable
solution for all of us.

Matt
 
> > 
> > A drm_mm_node is more in the range of ~200 bytes and certainly not
> > suitable for this kind of job.
> > 
> > I strongly suggest to rather use a good bunch of the amdgpu VM code as
> > blueprint for the common infrastructure.
> 
> I will definitely have look.
> 
> > 
> > Regards,
> > Christian.
> > 
> > > 
> > > Dave.
> > 
>
Christian König Jan. 19, 2023, 11:33 a.m. UTC | #12
Am 19.01.23 um 06:23 schrieb Matthew Brost:
> [SNIP]
>>>> Userspace (generally Vulkan, some compute) has interfaces that pretty
>>>> much dictate a lot of how VMA tracking works, esp around lifetimes,
>>>> sparse mappings and splitting/merging underlying page tables, I'd
>>>> really like this to be more consistent across drivers, because already
>>>> I think we've seen with freedreno some divergence from amdgpu and we
>>>> also have i915/xe to deal with. I'd like to at least have one place
>>>> that we can say this is how it should work, since this is something
>>>> that *should* be consistent across drivers mostly, as it is more about
>>>> how the uapi is exposed.
>>> That's a really good idea, but the implementation with drm_mm won't work
>>> like that.
>>>
>>> We have Vulkan applications which use the sparse feature to create
>>> literally millions of mappings. That's why I have fine tuned the mapping
> Is this not an application issue? Millions of mappings seems a bit
> absurd to me.

That's unfortunately how some games are designed these days.

>>> structure in amdgpu down to ~80 bytes IIRC and save every CPU cycle
>>> possible in the handling of that.
> We might need to bit of work here in Xe as our xe_vma structure is quite
> big as we currently use it as dumping ground for various features.

We have done that as well and it turned out to be a bad idea. At one 
point we added some power management information into the mapping 
structure, but quickly reverted that.

>> That's a valuable information. Can you recommend such an application for
>> testing / benchmarking?
>>
> Also interested.

One of the most demanding ones is Forza Horizon 5. The general approach
of that game seems to be to allocate 64GiB of address space (which
equals 16 million 4KiB pages) and then mmap() whatever data it needs
into that self-managed space, assuming that every 4KiB page is
individually mappable to a different location.

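To put rough numbers on that (back-of-the-envelope only): 64 GiB of
address space at 4 KiB granularity is 16,777,216 potential mappings. At
~80 bytes per mapping that is already ~1.3 GB of tracking structures in
the worst case where every page really is mapped individually; at ~200
bytes per node it would be ~3.4 GB.
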
>> Your optimization effort sounds great. May it be worth thinking about
>> generalizing your approach by itself and stacking the drm_gpuva_manager on
>> top of it?
>>
> FWIW the Xe is on board with the drm_gpuva_manager effort, we basically
> open code all of this right now. I'd like to port over to
> drm_gpuva_manager ASAP so we can contribute and help find a viable
> solution for all of us.

Sounds good. I haven't looked into the drm_gpuva_manager code yet, but
here are a few design notes I've learned from amdgpu:

Separate address space management (drm_mm) from page table management.
In other words, when an application asks for 64GiB of free address
space you don't look into the page table structures, but rather into a
separate drm_mm instance. In amdgpu we even moved the latter into
userspace, but the general takeaway is that you have only a handful of
address space requests while you have tons of mapping/unmapping
requests.

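A minimal sketch of that separation (the names here are invented, not
actual amdgpu structures) would be two completely independent members
in the VM object:

  struct example_vm {
          /* Free/used address space ranges; only touched when userspace
           * allocates or frees a chunk of VA space. */
          struct drm_mm va_space;

          /* Interval tree of the currently tracked mappings; touched on
           * every map/unmap request. */
          struct rb_root_cached mappings;
  };
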
Separate the tracking structure into two: one for each BO+VM
combination (we call that amdgpu_bo_va) and one for each mapping
(called amdgpu_bo_va_mapping). We unfortunately use that for our
hw-dependent state machine as well, so it isn't easily generalizable.

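Sketching that split with invented names (loosely modeled on the
amdgpu_bo_va / amdgpu_bo_va_mapping idea, not copied from it):

  /* One per BO+VM combination. */
  struct example_bo_vm {
          struct example_vm *vm;
          struct example_bo *bo;
          struct list_head bo_entry;    /* entry in the BO's list of VMs */
          struct list_head mappings;    /* list of example_va_mapping    */
  };

  /* One per VA range actually mapped from that BO into that VM. */
  struct example_va_mapping {
          struct example_bo_vm *bo_vm;
          struct list_head bo_vm_entry; /* entry in bo_vm->mappings      */
          u64 start;
          u64 last;
          u64 bo_offset;
          u64 flags;
  };
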
I've gone back and forth on merging VMAs and then not merging them
again. Not merging them can save quite a bit of overhead, but results
in many more mappings for some use cases.

Regards,
Christian.

>
> Matt
>   
>>> A drm_mm_node is more in the range of ~200 bytes and certainly not
>>> suitable for this kind of job.
>>>
>>> I strongly suggest to rather use a good bunch of the amdgpu VM code as
>>> blueprint for the common infrastructure.
>> I will definitely have look.
>>
>>> Regards,
>>> Christian.
>>>
>>>> Dave.
Oded Gabbay Feb. 6, 2023, 2:48 p.m. UTC | #13
On Thu, Jan 19, 2023 at 7:24 AM Matthew Brost <matthew.brost@intel.com> wrote:
>
> On Thu, Jan 19, 2023 at 05:04:32AM +0100, Danilo Krummrich wrote:
> > On 1/18/23 20:48, Christian König wrote:
> > > Am 18.01.23 um 20:17 schrieb Dave Airlie:
> > > > On Thu, 19 Jan 2023 at 02:54, Alex Deucher <alexdeucher@gmail.com> wrote:
> > > > > On Wed, Jan 18, 2023 at 11:50 AM Danilo Krummrich
> > > > > <dakr@redhat.com> wrote:
> > > > > >
> > > > > >
> > > > > > On 1/18/23 17:30, Alex Deucher wrote:
> > > > > > > On Wed, Jan 18, 2023 at 11:19 AM Danilo Krummrich
> > > > > > > <dakr@redhat.com> wrote:
> > > > > > > > On 1/18/23 16:37, Christian König wrote:
> > > > > > > > > Am 18.01.23 um 16:34 schrieb Danilo Krummrich:
> > > > > > > > > > Hi Christian,
> > > > > > > > > >
> > > > > > > > > > On 1/18/23 09:53, Christian König wrote:
> > > > > > > > > > > Am 18.01.23 um 07:12 schrieb Danilo Krummrich:
> > > > > > > > > > > > This patch series provides a new UAPI for the Nouveau driver in
> > > > > > > > > > > > order to
> > > > > > > > > > > > support Vulkan features, such as
> > > > > > > > > > > > sparse bindings and sparse
> > > > > > > > > > > > residency.
> > > > > > > > > > > >
> > > > > > > > > > > > Furthermore, with the DRM GPUVA
> > > > > > > > > > > > manager it provides a new DRM core
> > > > > > > > > > > > feature to
> > > > > > > > > > > > keep track of GPU virtual address
> > > > > > > > > > > > (VA) mappings in a more generic way.
> > > > > > > > > > > >
> > > > > > > > > > > > The DRM GPUVA manager is indented to help drivers implement
> > > > > > > > > > > > userspace-manageable
> > > > > > > > > > > > GPU VA spaces in reference to the Vulkan API. In order to achieve
> > > > > > > > > > > > this goal it
> > > > > > > > > > > > serves the following purposes in this context.
> > > > > > > > > > > >
> > > > > > > > > > > >        1) Provide a dedicated range allocator to track GPU VA
> > > > > > > > > > > > allocations and
> > > > > > > > > > > >           mappings, making use of the drm_mm range allocator.
> > > > > > > > > > > This means that the ranges are allocated
> > > > > > > > > > > by the kernel? If yes that's
> > > > > > > > > > > a really really bad idea.
> > > > > > > > > > No, it's just for keeping track of the
> > > > > > > > > > ranges userspace has allocated.
> > > > > > > > > Ok, that makes more sense.
> > > > > > > > >
> > > > > > > > > So basically you have an IOCTL which asks kernel
> > > > > > > > > for a free range? Or
> > > > > > > > > what exactly is the drm_mm used for here?
> > > > > > > > Not even that, userspace provides both the base
> > > > > > > > address and the range,
> > > > > > > > the kernel really just keeps track of things.
> > > > > > > > Though, writing a UAPI on
> > > > > > > > top of the GPUVA manager asking for a free range instead would be
> > > > > > > > possible by just adding the corresponding wrapper functions to get a
> > > > > > > > free hole.
> > > > > > > >
> > > > > > > > Currently, and that's what I think I read out of
> > > > > > > > your question, the main
> > > > > > > > benefit of using drm_mm over simply stuffing the
> > > > > > > > entries into a list or
> > > > > > > > something boils down to easier collision detection and iterating
> > > > > > > > sub-ranges of the whole VA space.
> > > > > > > Why not just do this in userspace?  We have a range manager in
> > > > > > > libdrm_amdgpu that you could lift out into libdrm or some other
> > > > > > > helper.
> > > > > > The kernel still needs to keep track of the mappings within the various
> > > > > > VA spaces, e.g. it silently needs to unmap mappings that are backed by
> > > > > > BOs that get evicted and remap them once they're validated (or swapped
> > > > > > back in).
> > > > > Ok, you are just using this for maintaining the GPU VM space in
> > > > > the kernel.
> > > > >
> > > > Yes the idea behind having common code wrapping drm_mm for this is to
> > > > allow us to make the rules consistent across drivers.
> > > >
> > > > Userspace (generally Vulkan, some compute) has interfaces that pretty
> > > > much dictate a lot of how VMA tracking works, esp around lifetimes,
> > > > sparse mappings and splitting/merging underlying page tables, I'd
> > > > really like this to be more consistent across drivers, because already
> > > > I think we've seen with freedreno some divergence from amdgpu and we
> > > > also have i915/xe to deal with. I'd like to at least have one place
> > > > that we can say this is how it should work, since this is something
> > > > that *should* be consistent across drivers mostly, as it is more about
> > > > how the uapi is exposed.
> > >
> > > That's a really good idea, but the implementation with drm_mm won't work
> > > like that.
> > >
> > > We have Vulkan applications which use the sparse feature to create
> > > literally millions of mappings. That's why I have fine tuned the mapping
>
> Is this not an application issue? Millions of mappings seems a bit
> absurd to me.
If I look at the most extreme case for AI, assuming 256GB of HBM
memory and a 2MB mapping granularity, we get to 128K mappings. But
that's really the extreme case imo. I assume most mappings will be much
larger. In fact, in the most realistic scenario of large-scale
training, a single user will probably map the entire HBM memory using
1GB pages.

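(Just to spell out the arithmetic: 256 GB / 2 MB = 131,072, i.e. the
128K mappings above, while with 1 GB pages the same 256 GB comes down
to only 256 mappings.)
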
I also have a question: could this GPUVA code manage VA range mappings
for userptr mappings, assuming we work without svm/uva/usm
(pointer-is-a-pointer)? Because then we are talking about possibly 4KB
mappings of 1 - 1.5 TB of host server RAM. (Implied in my question is
the assumption that this can also be used for non-VK use-cases. Please
tell me if I'm totally wrong here.)

Thanks,
Oded
Danilo Krummrich March 16, 2023, 4:39 p.m. UTC | #14
Hi Oded,

sorry for the late response, somehow this mail slipped through.

On 2/6/23 15:48, Oded Gabbay wrote:
> On Thu, Jan 19, 2023 at 7:24 AM Matthew Brost <matthew.brost@intel.com> wrote:
>> Is this not an application issue? Millions of mappings seems a bit
>> absurd to me.
> If I look at the most extreme case for AI, assuming 256GB of HBM
> memory and page mapping of 2MB, we get to 128K of mappings. But that's
> really the extreme case imo. I assume most mappings will be much
> larger. In fact, in the most realistic scenario of large-scale
> training, a single user will probably map the entire HBM memory using
> 1GB pages.
> 
> I have also a question, could this GPUVA code manage VA ranges
> mappings for userptr mappings, assuming we work without svm/uva/usm
> (pointer-is-a-pointer) ? Because then we are talking about possible
> 4KB mappings of 1 - 1.5 TB host server RAM (Implied in my question is
> the assumption this can be used also for non-VK use-cases. Please tell
> me if I'm totally wrong here).

In V2 I switched from drm_mm to a maple tree, which should improve the
handling of lots of entries. I also dropped the requirement for GPUVA
entries to be backed by a valid GEM object.

I think it can be used for non-VK use-cases. It basically just keeps 
track of mappings (not allocating them in the sense of finding a hole 
and providing a base address for a given size). There are basic 
functions to insert and remove entries. For those basic functions it is 
ensured that colliding entries can't be inserted and only a specific 
given entry can be removed, rather than e.g. an arbitrary range.
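
In pseudo-code terms (illustrative only, these are not the real
function names), the basic interface behaves roughly like this:

  /* Insertion fails if the entry's [addr, addr + range) collides with
   * an entry that is already tracked. */
  err = example_va_insert(mgr, va);
  if (err)
          return err;            /* e.g. -EEXIST on collision */

  /* Removal takes the specific entry that was inserted, never an
   * arbitrary range. */
  example_va_remove(mgr, va);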

There are also more advanced functions where users of the GPUVA
manager can request to "force map" a new mapping and to unmap a given
range. The GPUVA manager will figure out the (sub-)operations to make
this happen (e.g. remove mappings that are in the way, split up
mappings, etc.) and either provide these operations (or steps) through
callbacks or through a list of operations for the caller to process.

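For the list-of-operations flavour, a driver would consume the steps
roughly like this (purely illustrative; the names below are made up and
do not reflect the actual drm_gpuva_manager interface):

  list_for_each_entry(op, &ops, entry) {
          switch (op->type) {
          case EXAMPLE_OP_MAP:
                  /* create the newly requested mapping */
                  example_hw_map(vm, op->addr, op->range,
                                 op->gem, op->gem_offset);
                  break;
          case EXAMPLE_OP_REMAP:
                  /* an existing mapping gets split; keep its head
                   * and/or tail piece and drop the rest */
                  example_hw_remap(vm, op);
                  break;
          case EXAMPLE_OP_UNMAP:
                  /* tear down a mapping that is in the way */
                  example_hw_unmap(vm, op->addr, op->range);
                  break;
          }
  }
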
Are there any other use-cases or features you could think of that would 
be beneficial for accelerators?

- Danilo

> 
> Thanks,
> Oded
>