[v3,0/4] Improve anti-pre-emption w/a for compute workloads

Message ID: 20220303223737.708659-1-John.C.Harrison@Intel.com

John Harrison March 3, 2022, 10:37 p.m. UTC
From: John Harrison <John.C.Harrison@Intel.com>

Compute workloads are inherently not pre-emptible on current hardware,
so the pre-emption timeout was disabled as a workaround to prevent
unwanted resets. Hang detection was left to the heartbeat and its
(longer) timeout. This is undesirable with GuC submission because a
heartbeat timeout triggers a full GT reset rather than a per-engine
reset, and so is much more destructive. Instead, just bump the
pre-emption timeout to a big value. Also, update the heartbeat to allow
for such a long pre-emption delay in the final heartbeat period.
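
As a rough illustration of the heartbeat change (a minimal sketch, not
the code from this series): the final heartbeat period has to be at
least as long as the pre-emption timeout, otherwise the heartbeat would
declare a hang before the pre-emption attempt has had a chance to
complete. The helper name and the millisecond parameters below are
hypothetical, and the actual patch may scale the timeout differently.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not taken from the series.
 *
 * Returns the length of the final heartbeat period: the normal
 * heartbeat interval, stretched if necessary so that it is never
 * shorter than the pre-emption timeout.
 */
static uint32_t final_heartbeat_period_ms(uint32_t heartbeat_interval_ms,
					  uint32_t preempt_timeout_ms)
{
	if (preempt_timeout_ms > heartbeat_interval_ms)
		return preempt_timeout_ms;

	return heartbeat_interval_ms;
}
```

With illustrative values, a 2500 ms interval and a 7500 ms pre-emption
timeout give a 7500 ms final period, while a 640 ms pre-emption timeout
leaves the interval at 2500 ms.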

v2: Add clamping helpers.
v3: Remove long timeout algorithm and replace with hard-coded value
(review feedback from Tvrtko). Also, fix execlist selftest failure and
fix a bug in the compute enabling patch related to pre-emption timeouts.

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>


John Harrison (4):
  drm/i915/guc: Limit scheduling properties to avoid overflow
  drm/i915: Fix compute pre-emption w/a to apply to compute engines
  drm/i915: Make the heartbeat play nice with long pre-emption timeouts
  drm/i915: Improve long running OCL w/a for GuC submission

 drivers/gpu/drm/i915/Kconfig.profile          | 26 +++++-
 drivers/gpu/drm/i915/gt/intel_engine.h        |  6 ++
 drivers/gpu/drm/i915/gt/intel_engine_cs.c     | 92 +++++++++++++++++--
 .../gpu/drm/i915/gt/intel_engine_heartbeat.c  | 18 ++++
 drivers/gpu/drm/i915/gt/sysfs_engines.c       | 25 +++--
 drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h   |  9 ++
 6 files changed, 153 insertions(+), 23 deletions(-)