diff mbox

[RFC,1/8] drm/i915: Downgrade tasklet GEM_BUG_ON for request not completed

Message ID 20180316183105.16027-2-jeff.mcgee@intel.com (mailing list archive)
State New, archived
Headers show

Commit Message

jeff.mcgee@intel.com March 16, 2018, 6:30 p.m. UTC
From: Jeff McGee <jeff.mcgee@intel.com>

It is possible for the hardware to be reset just as a context is
completing. The reset post-processing may see the seqno update and
assume that the context escaped corruption, but the context may
have been disrupted in the save out process. The corruption may
screw up the HEAD and TAIL pointers such that the next submission
of the context switches out without running the intended request.
The GEM_BUG_ON will be hit in this situation, but it is not really
a driver error. So make it a GEM_WARN_ON so that we notice while
letting hangcheck detect and clean up.

This patch is required to support the force preemption feature.

Test: Run IGT gem_exec_fpreempt repeatedly with
      CONFIG_DRM_I915_DEBUG_GEM.
Change-Id: I87da4b16bad805fe48153a9ed9169900681ebba7
Signed-off-by: Jeff McGee <jeff.mcgee@intel.com>
---
 drivers/gpu/drm/i915/intel_lrc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Chris Wilson March 16, 2018, 8:30 p.m. UTC | #1
Quoting jeff.mcgee@intel.com (2018-03-16 18:30:58)
> From: Jeff McGee <jeff.mcgee@intel.com>
> 
> It is possible for the hardware to be reset just as a context is
> completing. The reset post-processing may see the seqno update and
> assume that the context escaped corruption, but the context may
> have been disrupted in the save out process. The corruption may
> screw up the HEAD and TAIL pointers such that the next submission
> of the context switches out without running the intended request.
> The GEM_BUG_ON will be hit in this situation, but it is not really
> a driver error. So make it a GEM_WARN_ON so that we notice while
> letting hangcheck detect and clean up.
> 
> This patch is required to support the force preemption feature.
> 
> Test: Run IGT gem_exec_fpreempt repeatedly with
>       CONFIG_DRM_I915_DEBUG_GEM.
> Change-Id: I87da4b16bad805fe48153a9ed9169900681ebba7
> Signed-off-by: Jeff McGee <jeff.mcgee@intel.com>
> ---
>  drivers/gpu/drm/i915/intel_lrc.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index b44861459d24..7d93fcd56d34 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -920,7 +920,7 @@ static void intel_lrc_irq_handler(unsigned long data)
>                         GEM_BUG_ON(count == 0);
>                         if (--count == 0) {
>                                 GEM_BUG_ON(status & GEN8_CTX_STATUS_PREEMPTED);
> -                               GEM_BUG_ON(!i915_gem_request_completed(rq));
> +                               GEM_WARN_ON(!i915_gem_request_completed(rq));

Nope, either way you broke pretty badly.
-Chris
diff mbox

Patch

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index b44861459d24..7d93fcd56d34 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -920,7 +920,7 @@  static void intel_lrc_irq_handler(unsigned long data)
 			GEM_BUG_ON(count == 0);
 			if (--count == 0) {
 				GEM_BUG_ON(status & GEN8_CTX_STATUS_PREEMPTED);
-				GEM_BUG_ON(!i915_gem_request_completed(rq));
+				GEM_WARN_ON(!i915_gem_request_completed(rq));
 				execlists_context_status_change(rq, INTEL_CONTEXT_SCHEDULE_OUT);
 
 				trace_i915_gem_request_out(rq);