Message ID: 1370990967-22892-2-git-send-email-marcheu@chromium.org (mailing list archive)
State: New, archived
On Tue, Jun 11, 2013 at 03:49:27PM -0700, Stéphane Marchesin wrote:
> During suspend all fences are reset, including their pin_count which
> is reset to 0. However a framebuffer can be bound across
> suspend/resume, which means that when the buffer is unbound after
> resume, the pin count for the buffer will be negative. Since the
> fence pin count is now negative when available and zero when in use,
> the buffer's fence will get recycled when the fence is in use which
> is the opposite of what we want. The visible effect is that since the
> fence is recycled the tiling mode goes away while the buffer is being
> displayed and we get lines/screens of garbage.
>
> To fix this, we repin the fences for all bound fbs on resume, which
> ensures the pin count is right.

Yikes. So why do we not just keep the fences alive during suspend (not
touching their pin_count), and then just iterate over the list of fences
rewriting the register as required upon resume? That would seem less
error prone than trying to reconstruct the lost pin_count.
-Chris
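The accounting failure described in the commit message can be reduced to
a few lines. Below is a minimal user-space model of the pin_count
lifecycle; the struct and field are illustrative stand-ins, not the
actual i915 symbols:

    #include <assert.h>
    #include <stdio.h>

    /* Illustrative stand-in for a fence register's software state. */
    struct fence_reg {
            int pin_count;  /* > 0: in use; <= 0: looks "available" */
    };

    int main(void)
    {
            struct fence_reg fence = { .pin_count = 0 };

            fence.pin_count++;      /* fb pinned for scan-out before suspend */
            fence.pin_count = 0;    /* suspend path resets all fences to 0 */
            fence.pin_count--;      /* fb unbound after resume: unpin */

            /* pin_count is now -1, so even while the next scan-out holds
             * a pin (count 0 instead of 1) the fence looks available and
             * can be recycled out from under the display. */
            printf("pin_count after suspend/resume cycle: %d\n",
                   fence.pin_count);
            assert(fence.pin_count == -1);
            return 0;
    }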
On Tue, Jun 11, 2013 at 3:57 PM, Chris Wilson <chris@chris-wilson.co.uk> wrote:
> On Tue, Jun 11, 2013 at 03:49:27PM -0700, Stéphane Marchesin wrote:
>> [...]
>> To fix this, we repin the fences for all bound fbs on resume, which
>> ensures the pin count is right.
>
> Yikes. So why do we not just keep the fences alive during suspend (not
> touching their pin_count), and then just iterate over the list of fences
> rewriting the register as required upon resume? That would seem less
> error prone than trying to reconstruct the lost pin_count.

I suspect they'd need to be saved/restored at the hw level as well,
which AFAICS isn't happening today...

Stéphane
On Tue, Jun 11, 2013 at 04:01:21PM -0700, Stéphane Marchesin wrote:
> On Tue, Jun 11, 2013 at 3:57 PM, Chris Wilson <chris@chris-wilson.co.uk> wrote:
> > [...]
> > Yikes. So why do we not just keep the fences alive during suspend (not
> > touching their pin_count), and then just iterate over the list of fences
> > rewriting the register as required upon resume? That would seem less
> > error prone than trying to reconstruct the lost pin_count.
>
> I suspect they'd need to be saved/restored at the hw level as well,
> which AFAICS isn't happening today...

Ugh, I introduced this bug 30 months ago - saved by the VT switch on
resume. But we can restore the fences from dev_priv->fence_regs...
Actually we have a very similar problem after a GPU reset where we
should restore fences for pinned objects (i.e. the scanout). The patch
to fix both looks fairly straightforward.
-Chris
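The restore loop Chris is suggesting would look roughly like the sketch
below. It assumes a helper along the lines of the driver's internal
i915_gem_write_fence(dev, reg_nr, obj) that programs a single fence
register (clearing it when obj is NULL); treat the names as assumptions,
not the final patch:

    /* Sketch: rewrite every fence register from the software state kept
     * in dev_priv->fence_regs, leaving pin_count untouched, so nothing
     * has to be reconstructed on resume (or after a GPU reset). */
    static void restore_fences_sketch(struct drm_device *dev)
    {
            struct drm_i915_private *dev_priv = dev->dev_private;
            int i;

            for (i = 0; i < dev_priv->num_fence_regs; i++) {
                    struct drm_i915_fence_reg *reg = &dev_priv->fence_regs[i];

                    /* Reprogram the hw register; the pin_count
                     * bookkeeping is deliberately not modified. */
                    i915_gem_write_fence(dev, i, reg->obj);
            }
    }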
On Wed, 12 Jun 2013 00:48:25 +0100 Chris Wilson <chris@chris-wilson.co.uk> wrote:
> On Tue, Jun 11, 2013 at 04:01:21PM -0700, Stéphane Marchesin wrote:
> > [...]
> > I suspect they'd need to be saved/restored at the hw level as well,
> > which AFAICS isn't happening today...
>
> Ugh, I introduced this bug 30 months ago - saved by the VT switch on
> resume. But we can restore the fences from dev_priv->fence_regs...
> Actually we have a very similar problem after a GPU reset where we
> should restore fences for pinned objects (i.e. the scanout). The patch
> to fix both looks fairly straightforward.

To be clear, this only affects gen3 right? For gen4+ we don't need the
fences for scanout since we have a bit in the plane control...

Or are we failing to fault on a previously mapped scanout too? If so,
we'd need to cover more than just scanout here.
On Wed, Jun 12, 2013 at 03:06:51PM -0700, Jesse Barnes wrote:
> [...]
> To be clear, this only affects gen3 right? For gen4+ we don't need the
> fences for scanout since we have a bit in the plane control...

True, you will only get scanout corruption from lack of a fence on
gen2/3. FBC will also be more broken than before.

> Or are we failing to fault on a previously mapped scanout too? If so,
> we'd need to cover more than just scanout here.

They all get faulted in again, and all will grab a fence again. Only
scanouts pinned at the time of resume will believe that they hold an
additional reference to the fence, and so unpin it once too often. If we
do that enough times, we will starve ourselves of fences. And a gen3
scanout runs the risk of losing its fence at any time. So the impact is
far less severe than it appears at first glance. Unless I've missed
something.
-Chris
On Wed, Jun 12, 2013 at 3:06 PM, Jesse Barnes <jbarnes@virtuousgeek.org> wrote:
> [...]
> To be clear, this only affects gen3 right? For gen4+ we don't need the
> fences for scanout since we have a bit in the plane control...

Yup I've only ever seen the issue on gen3.

Anyway, what should we do about this? Should I make another patch where
I save/restore the fence regs instead?

Stéphane
On Fri, Jun 14, 2013 at 9:12 PM, Stéphane Marchesin <marcheu@chromium.org> wrote:
> [...]
> Anyway, what should we do about this? Should I make another patch
> where I save/restore the fence regs instead?

drm-intel-fixes already has the improved patch from Chris, and
drm-intel-next-queued has a patch that adds a WARN so we'll catch this
much quicker next time around.
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
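The WARN Daniel mentions amounts to a one-line sanity check in the
fence-reset path, so a still-pinned fence fires a backtrace instead of
silently going negative. A sketch only, with the surrounding reset loop
paraphrased rather than quoted from the actual patch:

    /* Sketch of the debug check: by the time fences are reset, nothing
     * should still hold a pin on them. */
    static void reset_fences_sketch(struct drm_device *dev)
    {
            struct drm_i915_private *dev_priv = dev->dev_private;
            int i;

            for (i = 0; i < dev_priv->num_fence_regs; i++) {
                    struct drm_i915_fence_reg *reg = &dev_priv->fence_regs[i];

                    WARN_ON(reg->pin_count); /* catches the bug in this thread */
                    /* ... release/clear the fence as before ... */
            }
    }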
diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index a2e4953..b8e82ab 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -643,6 +643,8 @@ static int __i915_drm_thaw(struct drm_device *dev)
 	/* We need working interrupts for modeset enabling ... */
 	drm_irq_install(dev);
 
+	/* Repin all live fences before resuming */
+	intel_repin_fences(dev);
 	intel_modeset_init_hw(dev);
 
 	drm_modeset_lock_all(dev);
diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index 56746dc..4bf3240 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -2051,6 +2051,38 @@ void intel_unpin_fb_obj(struct drm_i915_gem_object *obj)
 	i915_gem_object_unpin(obj);
 }
 
+/*
+ * Repin the fences for all currently bound fbs. During suspend i915_drm_freeze
+ * calls i915_gem_reset, which in turn calls i915_gem_reset_fences, which
+ * resets the fence pin_counts to 0. So on resume we can call this function to
+ * restore the fences on bound framebuffers.
+ */
+void intel_repin_fences(struct drm_device *dev)
+{
+	struct drm_crtc *crtc;
+	struct drm_i915_gem_object *obj;
+	int ret;
+
+	mutex_lock(&dev->mode_config.mutex);
+	list_for_each_entry(crtc, &dev->mode_config.crtc_list, head) {
+		if (!crtc->fb)
+			continue;
+		obj = to_intel_framebuffer(crtc->fb)->obj;
+		if (!obj)
+			continue;
+
+		/* Install a fence for tiled scan-out. */
+		if (obj->tiling_mode != I915_TILING_NONE) {
+			ret = i915_gem_object_get_fence(obj);
+			if (ret)
+				DRM_ERROR("Couldn't get a fence\n");
+			else
+				i915_gem_object_pin_fence(obj);
+		}
+	}
+	mutex_unlock(&dev->mode_config.mutex);
+}
+
 /* Computes the linear offset to the base tile and adjusts x, y. bytes per pixel
  * is assumed to be a power-of-two. */
 unsigned long intel_gen4_compute_page_offset(int *x, int *y,
diff --git a/drivers/gpu/drm/i915/intel_drv.h b/drivers/gpu/drm/i915/intel_drv.h
index 624a9e6..b5395ed 100644
--- a/drivers/gpu/drm/i915/intel_drv.h
+++ b/drivers/gpu/drm/i915/intel_drv.h
@@ -630,6 +630,7 @@
 extern int intel_pin_and_fence_fb_obj(struct drm_device *dev,
 				      struct drm_i915_gem_object *obj,
 				      struct intel_ring_buffer *pipelined);
 extern void intel_unpin_fb_obj(struct drm_i915_gem_object *obj);
+extern void intel_repin_fences(struct drm_device *dev);
 extern int intel_framebuffer_init(struct drm_device *dev,
 				  struct intel_framebuffer *ifb,
During suspend all fences are reset, including their pin_count which
is reset to 0. However a framebuffer can be bound across
suspend/resume, which means that when the buffer is unbound after
resume, the pin count for the buffer will be negative. Since the
fence pin count is now negative when available and zero when in use,
the buffer's fence will get recycled when the fence is in use which
is the opposite of what we want. The visible effect is that since the
fence is recycled the tiling mode goes away while the buffer is being
displayed and we get lines/screens of garbage.

To fix this, we repin the fences for all bound fbs on resume, which
ensures the pin count is right.

Signed-off-by: Stéphane Marchesin <marcheu@chromium.org>
---
 drivers/gpu/drm/i915/i915_drv.c      |  2 ++
 drivers/gpu/drm/i915/intel_display.c | 32 ++++++++++++++++++++++++++++++++
 drivers/gpu/drm/i915/intel_drv.h     |  1 +
 3 files changed, 35 insertions(+)