[RFC,v3,0/9] Waitboost drm syncobj waits

Message ID 20230216105921.624960-1-tvrtko.ursulin@linux.intel.com (mailing list archive)

Message

Tvrtko Ursulin Feb. 16, 2023, 10:59 a.m. UTC
From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

In i915 we have this concept of "wait boosting" where we give a priority boost
for instance to fences which are actively waited upon from userspace. This has
its pros and cons and can certainly be discussed at length. However, the fact is
that some workloads really like it.

The problem is that with the arrival of drm syncobj, and the new userspace
waiting entry point it added, the waitboost mechanism was bypassed. AFAIU this
mostly happens with all Vulkan based userspaces. Hence I cooked up this mini
series to see if a discussion about restoring the waitboost can be had.

The series adds a concept of "wait count" to dma fence, which is intended to
represent explicit userspace waits. It is therefore incremented for every
explicit dma_fence_enable_sw_signaling() and dma_fence_add_wait_callback() call
(the latter is like dma_fence_add_callback() but for explicit/userspace wait
paths). Individual drivers can then inspect the count via
dma_fence_wait_count() and decide whether to waitboost such fences.
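
To make the bookkeeping concrete, here is a minimal userspace sketch of the
wait count idea. It is not the code from the series: struct model_fence and
all model_* functions are hypothetical stand-ins, and only the behaviour
described above is modelled (explicit waits bump the count, ordinary callbacks
do not, and the count is dropped again if the fence has already signalled).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace model; NOT the kernel's struct dma_fence. */
struct model_fence {
	atomic_uint wait_count;	/* number of explicit (userspace) waiters */
	atomic_bool signaled;	/* set once the fence has completed */
};

static void model_fence_init(struct model_fence *f)
{
	atomic_init(&f->wait_count, 0);
	atomic_init(&f->signaled, false);
}

/* Internal/implicit callback path: the wait count is left untouched. */
static bool model_fence_add_callback(struct model_fence *f)
{
	return !atomic_load(&f->signaled);
}

/*
 * Explicit wait path: count the waiter, but undo the increment if the
 * fence has already signalled, since no wait will actually take place.
 * Returns true if the caller is now registered as a waiter.
 */
static bool model_fence_add_wait_callback(struct model_fence *f)
{
	atomic_fetch_add(&f->wait_count, 1);
	if (atomic_load(&f->signaled)) {
		atomic_fetch_sub(&f->wait_count, 1);
		return false;
	}
	return true;
}

/* Called when an explicit waiter goes away (wait finished or aborted). */
static void model_fence_remove_wait_callback(struct model_fence *f)
{
	unsigned int old = atomic_fetch_sub(&f->wait_count, 1);

	assert(old > 0);	/* an unbalanced removal would underflow */
	(void)old;
}

/* Modelled on the dma_fence_wait_count() accessor named above. */
static unsigned int model_fence_wait_count(struct model_fence *f)
{
	return atomic_load(&f->wait_count);
}

/* Driver-side policy (assumed): boost only pending, explicitly waited fences. */
static bool model_driver_should_boost(struct model_fence *f)
{
	return !atomic_load(&f->signaled) && model_fence_wait_count(f) > 0;
}
```

In this model a driver would call model_driver_should_boost() at the point
where it decides on a frequency or priority bump. The actual policy remains up
to each driver.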

The series has been lightly tested for performance impact by Google using some
clvk workloads and shows a good improvement (frame time improved from 16ms to
13ms).

It is also important to mention that the benefits of waitboosting are not
limited to workloads sensitive to frame presentation time; they also apply to
serialized computations which constantly move between the CPU and GPU.

*)
https://gitlab.freedesktop.org/drm/intel/-/issues/8014

v2:
 * Small fixups based on CI feedback:
    * Handle decrement correctly for already signalled case while adding callback.
    * Remove i915 assert which was making sure struct i915_request does not grow.
 * Split out the i915 patch into three separate functional changes.

v3:
 * Handle drivers which open-code callback additions.

Tvrtko Ursulin (9):
  dma-fence: Move i915 helpers into common
  dma-fence: Add callback initialization helper
  drm/i915: Use fence callback initialization helper
  drm/vmwgfx: Use fence callback initialization helper
  dma-fence: Track explicit waiters
  drm/syncobj: Mark syncobj waits as external waiters
  drm/i915: Waitboost external waits
  drm/i915: Mark waits as explicit
  drm/i915: Wait boost requests waited upon by others

 drivers/dma-buf/dma-fence.c                 | 137 ++++++++++++++------
 drivers/gpu/drm/drm_syncobj.c               |   6 +-
 drivers/gpu/drm/i915/gt/intel_breadcrumbs.c |  22 ----
 drivers/gpu/drm/i915/gt/intel_engine_pm.c   |   1 -
 drivers/gpu/drm/i915/i915_active.c          |   2 +-
 drivers/gpu/drm/i915/i915_active.h          |   2 +-
 drivers/gpu/drm/i915/i915_request.c         |  13 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_fence.c       |   2 +-
 include/linux/dma-fence.h                   |  26 ++++
 9 files changed, 141 insertions(+), 70 deletions(-)

Comments

Daniel Vetter Feb. 16, 2023, 11:19 a.m. UTC | #1
On Thu, Feb 16, 2023 at 10:59:12AM +0000, Tvrtko Ursulin wrote:
> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> 
> In i915 we have this concept of "wait boosting" where we give a priority boost
> for instance to fences which are actively waited upon from userspace. This has
> its pros and cons and can certainly be discussed at length. However, the fact is
> that some workloads really like it.
> 
> The problem is that with the arrival of drm syncobj, and the new userspace
> waiting entry point it added, the waitboost mechanism was bypassed. AFAIU this
> mostly happens with all Vulkan based userspaces. Hence I cooked up this mini
> series to see if a discussion about restoring the waitboost can be had.
> 
> The series adds a concept of "wait count" to dma fence, which is intended to
> represent explicit userspace waits. It is therefore incremented for every
> explicit dma_fence_enable_sw_signaling() and dma_fence_add_wait_callback() call
> (the latter is like dma_fence_add_callback() but for explicit/userspace wait
> paths). Individual drivers can then inspect the count via
> dma_fence_wait_count() and decide whether to waitboost such fences.
> 
> The series has been lightly tested for performance impact by Google using some
> clvk workloads and shows a good improvement (frame time improved from 16ms to
> 13ms).
> 
> It is also important to mention that the benefits of waitboosting are not
> limited to workloads sensitive to frame presentation time; they also apply to
> serialized computations which constantly move between the CPU and GPU.

I think this should be integrated with https://lore.kernel.org/all/20210903184806.1680887-1-robdclark@gmail.com/
so that we have one overall approach here that works for all drivers.
Obviously it should include support for all interested parties.
-Daniel
