Message ID | 1407250286-1801-4-git-send-email-rodrigo.vivi@intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Tue, Aug 05, 2014 at 07:51:14AM -0700, Rodrigo Vivi wrote: > From: Chris Wilson <chris@chris-wilson.co.uk> > > If we try to execute on a known ring, but it has failed to be > initialised correctly, report that the GPU is hung rather than the > command invalid. This leaves us reporting EINVAL only if the user > requests execution on a ring that is not supported by the device. > > This should prevent UXA from getting stuck in a null render loop after a > failed resume. > > v2 (Rodrigo): Fix conflict and add VCS2 ring and > s/intel_ring_buffer/intel_engine_cs. > > Reported-by: Jiri Kosina <jikos@jikos.cz> > References: https://bugs.freedesktop.org/show_bug.cgi?id=76554 > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> > Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> This isn't required any more, see commit 074c6adaf4e7d1423d373bd5d1afc20b683cb4d0 Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Wed Apr 9 09:19:43 2014 +0100 drm/i915: Mark device as wedged if we fail to resume for the alternate merged patch. -Daniel > --- > drivers/gpu/drm/i915/i915_gem_execbuffer.c | 15 ++++++++++++++- > 1 file changed, 14 insertions(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c > index 60998fc..288ff61 100644 > --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c > +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c > @@ -1233,6 +1233,19 @@ eb_get_batch(struct eb_vmas *eb) > return vma->obj; > } > > +static bool > +intel_ring_valid(struct intel_engine_cs *ring) > +{ > + switch (ring->id) { > + case RCS: return true; > + case VCS: return HAS_BSD(ring->dev); > + case BCS: return HAS_BLT(ring->dev); > + case VECS: return HAS_VEBOX(ring->dev); > + case VCS2: return HAS_BSD2(ring->dev); > + default: return false; > + } > +} > + > static int > i915_gem_do_execbuffer(struct drm_device *dev, void *data, > struct drm_file *file, > @@ -1289,7 +1302,7 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data, > if (!intel_ring_initialized(ring)) { > DRM_DEBUG("execbuf with invalid ring: %d\n", > (int)(args->flags & I915_EXEC_RING_MASK)); > - return -EINVAL; > + return intel_ring_valid(ring) ? -EIO : -EINVAL; > } > > if (args->buffer_count < 1) { > -- > 1.9.3 > > _______________________________________________ > Intel-gfx mailing list > Intel-gfx@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/intel-gfx
On Wed, Aug 06, 2014 at 09:56:45AM +0200, Daniel Vetter wrote: > On Tue, Aug 05, 2014 at 07:51:14AM -0700, Rodrigo Vivi wrote: > > From: Chris Wilson <chris@chris-wilson.co.uk> > > > > If we try to execute on a known ring, but it has failed to be > > initialised correctly, report that the GPU is hung rather than the > > command invalid. This leaves us reporting EINVAL only if the user > > requests execution on a ring that is not supported by the device. > > > > This should prevent UXA from getting stuck in a null render loop after a > > failed resume. > > > > v2 (Rodrigo): Fix conflict and add VCS2 ring and > > s/intel_ring_buffer/intel_engine_cs. > > > > Reported-by: Jiri Kosina <jikos@jikos.cz> > > References: https://bugs.freedesktop.org/show_bug.cgi?id=76554 > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> > > Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> > > This isn't required any more, see > > commit 074c6adaf4e7d1423d373bd5d1afc20b683cb4d0 > Author: Chris Wilson <chris@chris-wilson.co.uk> > Date: Wed Apr 9 09:19:43 2014 +0100 > > drm/i915: Mark device as wedged if we fail to resume > > for the alternate merged patch. Hmm, there is still a path that ends here, but the example above is already fixed as you say. -Chris
On Wed, Aug 06, 2014 at 09:12:32AM +0100, Chris Wilson wrote: > On Wed, Aug 06, 2014 at 09:56:45AM +0200, Daniel Vetter wrote: > > On Tue, Aug 05, 2014 at 07:51:14AM -0700, Rodrigo Vivi wrote: > > > From: Chris Wilson <chris@chris-wilson.co.uk> > > > > > > If we try to execute on a known ring, but it has failed to be > > > initialised correctly, report that the GPU is hung rather than the > > > command invalid. This leaves us reporting EINVAL only if the user > > > requests execution on a ring that is not supported by the device. > > > > > > This should prevent UXA from getting stuck in a null render loop after a > > > failed resume. > > > > > > v2 (Rodrigo): Fix conflict and add VCS2 ring and > > > s/intel_ring_buffer/intel_engine_cs. > > > > > > Reported-by: Jiri Kosina <jikos@jikos.cz> > > > References: https://bugs.freedesktop.org/show_bug.cgi?id=76554 > > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> > > > Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> > > > > This isn't required any more, see > > > > commit 074c6adaf4e7d1423d373bd5d1afc20b683cb4d0 > > Author: Chris Wilson <chris@chris-wilson.co.uk> > > Date: Wed Apr 9 09:19:43 2014 +0100 > > > > drm/i915: Mark device as wedged if we fail to resume > > > > for the alternate merged patch. > > Hmm, there is still a path that ends here, but the example above is > already fixed as you say. We have the EIO check both in the resume and driver load paths. Which other path are we missing? -Daniel
On Wed, Aug 06, 2014 at 10:39:16AM +0200, Daniel Vetter wrote: > On Wed, Aug 06, 2014 at 09:12:32AM +0100, Chris Wilson wrote: > > On Wed, Aug 06, 2014 at 09:56:45AM +0200, Daniel Vetter wrote: > > > On Tue, Aug 05, 2014 at 07:51:14AM -0700, Rodrigo Vivi wrote: > > > > From: Chris Wilson <chris@chris-wilson.co.uk> > > > > > > > > If we try to execute on a known ring, but it has failed to be > > > > initialised correctly, report that the GPU is hung rather than the > > > > command invalid. This leaves us reporting EINVAL only if the user > > > > requests execution on a ring that is not supported by the device. > > > > > > > > This should prevent UXA from getting stuck in a null render loop after a > > > > failed resume. > > > > > > > > v2 (Rodrigo): Fix conflict and add VCS2 ring and > > > > s/intel_ring_buffer/intel_engine_cs. > > > > > > > > Reported-by: Jiri Kosina <jikos@jikos.cz> > > > > References: https://bugs.freedesktop.org/show_bug.cgi?id=76554 > > > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> > > > > Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> > > > > > > This isn't required any more, see > > > > > > commit 074c6adaf4e7d1423d373bd5d1afc20b683cb4d0 > > > Author: Chris Wilson <chris@chris-wilson.co.uk> > > > Date: Wed Apr 9 09:19:43 2014 +0100 > > > > > > drm/i915: Mark device as wedged if we fail to resume > > > > > > for the alternate merged patch. > > > > Hmm, there is still a path that ends here, but the example above is > > already fixed as you say. > > We have the EIO check both in the resume and driver load paths. Which > other path are we missing? The GPU may be set to wedged, but this check in execbuffer occurs before we check for a wedged GPU. -Chris
On Fri, Aug 08, 2014 at 10:17:10AM +0100, Chris Wilson wrote: > On Wed, Aug 06, 2014 at 10:39:16AM +0200, Daniel Vetter wrote: > > On Wed, Aug 06, 2014 at 09:12:32AM +0100, Chris Wilson wrote: > > > On Wed, Aug 06, 2014 at 09:56:45AM +0200, Daniel Vetter wrote: > > > > On Tue, Aug 05, 2014 at 07:51:14AM -0700, Rodrigo Vivi wrote: > > > > > From: Chris Wilson <chris@chris-wilson.co.uk> > > > > > > > > > > If we try to execute on a known ring, but it has failed to be > > > > > initialised correctly, report that the GPU is hung rather than the > > > > > command invalid. This leaves us reporting EINVAL only if the user > > > > > requests execution on a ring that is not supported by the device. > > > > > > > > > > This should prevent UXA from getting stuck in a null render loop after a > > > > > failed resume. > > > > > > > > > > v2 (Rodrigo): Fix conflict and add VCS2 ring and > > > > > s/intel_ring_buffer/intel_engine_cs. > > > > > > > > > > Reported-by: Jiri Kosina <jikos@jikos.cz> > > > > > References: https://bugs.freedesktop.org/show_bug.cgi?id=76554 > > > > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> > > > > > Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> > > > > > > > > This isn't required any more, see > > > > > > > > commit 074c6adaf4e7d1423d373bd5d1afc20b683cb4d0 > > > > Author: Chris Wilson <chris@chris-wilson.co.uk> > > > > Date: Wed Apr 9 09:19:43 2014 +0100 > > > > > > > > drm/i915: Mark device as wedged if we fail to resume > > > > > > > > for the alternate merged patch. > > > > > > Hmm, there is still a path that ends here, but the example above is > > > already fixed as you say. > > > > We have the EIO check both in the resume and driver load paths. Which > > other path are we missing? > > The GPU may be set to wedged, but this check in execbuffer occurs before > we check for a wedged GPU. But we no longer free the ring structures over suspedn/resume, so at least the commit message is outdated. I wonder whether the easier fix wouldn't be to continue ring init if we get an -EIO. -Daniel
On Fri, Aug 08, 2014 at 11:46:07AM +0200, Daniel Vetter wrote: > On Fri, Aug 08, 2014 at 10:17:10AM +0100, Chris Wilson wrote: > > On Wed, Aug 06, 2014 at 10:39:16AM +0200, Daniel Vetter wrote: > > > On Wed, Aug 06, 2014 at 09:12:32AM +0100, Chris Wilson wrote: > > > > On Wed, Aug 06, 2014 at 09:56:45AM +0200, Daniel Vetter wrote: > > > > > On Tue, Aug 05, 2014 at 07:51:14AM -0700, Rodrigo Vivi wrote: > > > > > > From: Chris Wilson <chris@chris-wilson.co.uk> > > > > > > > > > > > > If we try to execute on a known ring, but it has failed to be > > > > > > initialised correctly, report that the GPU is hung rather than the > > > > > > command invalid. This leaves us reporting EINVAL only if the user > > > > > > requests execution on a ring that is not supported by the device. > > > > > > > > > > > > This should prevent UXA from getting stuck in a null render loop after a > > > > > > failed resume. > > > > > > > > > > > > v2 (Rodrigo): Fix conflict and add VCS2 ring and > > > > > > s/intel_ring_buffer/intel_engine_cs. > > > > > > > > > > > > Reported-by: Jiri Kosina <jikos@jikos.cz> > > > > > > References: https://bugs.freedesktop.org/show_bug.cgi?id=76554 > > > > > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> > > > > > > Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> > > > > > > > > > > This isn't required any more, see > > > > > > > > > > commit 074c6adaf4e7d1423d373bd5d1afc20b683cb4d0 > > > > > Author: Chris Wilson <chris@chris-wilson.co.uk> > > > > > Date: Wed Apr 9 09:19:43 2014 +0100 > > > > > > > > > > drm/i915: Mark device as wedged if we fail to resume > > > > > > > > > > for the alternate merged patch. > > > > > > > > Hmm, there is still a path that ends here, but the example above is > > > > already fixed as you say. > > > > > > We have the EIO check both in the resume and driver load paths. Which > > > other path are we missing? > > > > The GPU may be set to wedged, but this check in execbuffer occurs before > > we check for a wedged GPU. > > But we no longer free the ring structures over suspedn/resume, so at least > the commit message is outdated. Went off on an incorrect tangent. > I wonder whether the easier fix wouldn't be to continue ring init if we > get an -EIO. Right, that's the problem I remember. But I am not sure if it is really worth it. It's only to hide log spew in a corner case of a corner case in UXA. -Chris
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c index 60998fc..288ff61 100644 --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c @@ -1233,6 +1233,19 @@ eb_get_batch(struct eb_vmas *eb) return vma->obj; } +static bool +intel_ring_valid(struct intel_engine_cs *ring) +{ + switch (ring->id) { + case RCS: return true; + case VCS: return HAS_BSD(ring->dev); + case BCS: return HAS_BLT(ring->dev); + case VECS: return HAS_VEBOX(ring->dev); + case VCS2: return HAS_BSD2(ring->dev); + default: return false; + } +} + static int i915_gem_do_execbuffer(struct drm_device *dev, void *data, struct drm_file *file, @@ -1289,7 +1302,7 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data, if (!intel_ring_initialized(ring)) { DRM_DEBUG("execbuf with invalid ring: %d\n", (int)(args->flags & I915_EXEC_RING_MASK)); - return -EINVAL; + return intel_ring_valid(ring) ? -EIO : -EINVAL; } if (args->buffer_count < 1) {