[2/2] drm/i915: repin bound framebuffers on resume

Message ID	1370990967-22892-2-git-send-email-marcheu@chromium.org (mailing list archive)
State	New, archived
Headers	show Return-Path: <intel-gfx-bounces+patchwork-intel-gfx=patchwork.kernel.org@lists.freedesktop.org> From: =?UTF-8?q?St=C3=A9phane=20Marchesin?= <marcheu@chromium.org> To: intel-gfx@lists.freedesktop.org Date: Tue, 11 Jun 2013 15:49:27 -0700 Message-Id: <1370990967-22892-2-git-send-email-marcheu@chromium.org> In-Reply-To: <1370990967-22892-1-git-send-email-marcheu@chromium.org> References: <1370990967-22892-1-git-send-email-marcheu@chromium.org> MIME-Version: 1.0 Cc: daniel.vetter@ffwll.ch Subject: [Intel-gfx] [PATCH 2/2] drm/i915: repin bound framebuffers on resume Precedence: list Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: base64 Sender: intel-gfx-bounces+patchwork-intel-gfx=patchwork.kernel.org@lists.freedesktop.org Errors-To: intel-gfx-bounces+patchwork-intel-gfx=patchwork.kernel.org@lists.freedesktop.org

Stéphane Marchesin June 11, 2013, 10:49 p.m. UTC

During suspend all fences are reset, including their pin_count which
is reset to 0. However a framebuffer can be bound across
suspend/resume, which means that when the buffer is unbound after
resume, the pin count for the buffer will be negative. Since the
fence pin count is now negative when available and zero when in use,
the buffer's fence will get recycled when the fence is in use which
is the opposite of what we want. The visible effect is that since the
fence is recycled the tiling mode goes away while the buffer is being
displayed and we get lines/screens of garbage.

To fix this, we repin the fences for all bound fbs on resume, which
ensures the pin count is right.

Signed-off-by: Stéphane Marchesin <marcheu@chromium.org>
---
 drivers/gpu/drm/i915/i915_drv.c      |  2 ++
 drivers/gpu/drm/i915/intel_display.c | 32 ++++++++++++++++++++++++++++++++
 drivers/gpu/drm/i915/intel_drv.h     |  1 +
 3 files changed, 35 insertions(+)

Chris Wilson June 11, 2013, 10:57 p.m. UTC | #1

On Tue, Jun 11, 2013 at 03:49:27PM -0700, Stéphane Marchesin wrote:
> During suspend all fences are reset, including their pin_count which
> is reset to 0. However a framebuffer can be bound across
> suspend/resume, which means that when the buffer is unbound after
> resume, the pin count for the buffer will be negative. Since the
> fence pin count is now negative when available and zero when in use,
> the buffer's fence will get recycled when the fence is in use which
> is the opposite of what we want. The visible effect is that since the
> fence is recycled the tiling mode goes away while the buffer is being
> displayed and we get lines/screens of garbage.
> 
> To fix this, we repin the fences for all bound fbs on resume, which
> ensures the pin count is right.

Yikes. So why do we not just keep the fences alive during suspend (not
touching their pin_count), and then just iterate over the list of fences
rewriting the register as required upon resume? That would seem less
error prone than trying to reconstruct the lost pin_count.
-Chris

Stéphane Marchesin June 11, 2013, 11:01 p.m. UTC | #2

On Tue, Jun 11, 2013 at 3:57 PM, Chris Wilson <chris@chris-wilson.co.uk> wrote:
> On Tue, Jun 11, 2013 at 03:49:27PM -0700, Stéphane Marchesin wrote:
>> During suspend all fences are reset, including their pin_count which
>> is reset to 0. However a framebuffer can be bound across
>> suspend/resume, which means that when the buffer is unbound after
>> resume, the pin count for the buffer will be negative. Since the
>> fence pin count is now negative when available and zero when in use,
>> the buffer's fence will get recycled when the fence is in use which
>> is the opposite of what we want. The visible effect is that since the
>> fence is recycled the tiling mode goes away while the buffer is being
>> displayed and we get lines/screens of garbage.
>>
>> To fix this, we repin the fences for all bound fbs on resume, which
>> ensures the pin count is right.
>
> Yikes. So why do we not just keep the fences alive during suspend (not
> touching their pin_count), and then just iterate over the list of fences
> rewriting the register as required upon resume? That would seem less
> error prone than trying to reconstruct the lost pin_count.

I suspect they'd need to be saved/restored at the hw level as well,
which AFAICS isn't happening today...

Stéphane

Chris Wilson June 11, 2013, 11:48 p.m. UTC | #3

On Tue, Jun 11, 2013 at 04:01:21PM -0700, Stéphane Marchesin wrote:
> 
> On Tue, Jun 11, 2013 at 3:57 PM, Chris Wilson <chris@chris-wilson.co.uk> wrote:
> > On Tue, Jun 11, 2013 at 03:49:27PM -0700, Stéphane Marchesin wrote:
> >> During suspend all fences are reset, including their pin_count which
> >> is reset to 0. However a framebuffer can be bound across
> >> suspend/resume, which means that when the buffer is unbound after
> >> resume, the pin count for the buffer will be negative. Since the
> >> fence pin count is now negative when available and zero when in use,
> >> the buffer's fence will get recycled when the fence is in use which
> >> is the opposite of what we want. The visible effect is that since the
> >> fence is recycled the tiling mode goes away while the buffer is being
> >> displayed and we get lines/screens of garbage.
> >>
> >> To fix this, we repin the fences for all bound fbs on resume, which
> >> ensures the pin count is right.
> >
> > Yikes. So why do we not just keep the fences alive during suspend (not
> > touching their pin_count), and then just iterate over the list of fences
> > rewriting the register as required upon resume? That would seem less
> > error prone than trying to reconstruct the lost pin_count.
> 
> I suspect they'd need to be saved/restored at the hw level as well,
> which AFAICS isn't happening today...

Ugh, I introduced this bug 30 months ago - saved by the VT switch on
resume. But we can restore the fences from dev_priv->fence_regs...
Actually we have a very similar problem after a GPU reset where we
should restore fences for pinned objects (i.e. the scanout). The patch
to fix both looks fairly straightforward.
-Chris

Jesse Barnes June 12, 2013, 10:06 p.m. UTC | #4

On Wed, 12 Jun 2013 00:48:25 +0100
Chris Wilson <chris@chris-wilson.co.uk> wrote:

> On Tue, Jun 11, 2013 at 04:01:21PM -0700, Stéphane Marchesin wrote:
> > 
> > On Tue, Jun 11, 2013 at 3:57 PM, Chris Wilson <chris@chris-wilson.co.uk> wrote:
> > > On Tue, Jun 11, 2013 at 03:49:27PM -0700, Stéphane Marchesin wrote:
> > >> During suspend all fences are reset, including their pin_count which
> > >> is reset to 0. However a framebuffer can be bound across
> > >> suspend/resume, which means that when the buffer is unbound after
> > >> resume, the pin count for the buffer will be negative. Since the
> > >> fence pin count is now negative when available and zero when in use,
> > >> the buffer's fence will get recycled when the fence is in use which
> > >> is the opposite of what we want. The visible effect is that since the
> > >> fence is recycled the tiling mode goes away while the buffer is being
> > >> displayed and we get lines/screens of garbage.
> > >>
> > >> To fix this, we repin the fences for all bound fbs on resume, which
> > >> ensures the pin count is right.
> > >
> > > Yikes. So why do we not just keep the fences alive during suspend (not
> > > touching their pin_count), and then just iterate over the list of fences
> > > rewriting the register as required upon resume? That would seem less
> > > error prone than trying to reconstruct the lost pin_count.
> > 
> > I suspect they'd need to be saved/restored at the hw level as well,
> > which AFAICS isn't happening today...
> 
> Ugh, I introduced this bug 30 months ago - saved by the VT switch on
> resume. But we can restore the fences from dev_priv->fence_regs...
> Actually we have a very similar problem after a GPU reset where we
> should restore fences for pinned objects (i.e. the scanout). The patch
> to fix both looks fairly straightforward.

To be clear, this only affects gen3 right?  For gen4+ we don't need the
fences for scanout since we have a bit in the plane control...

Or are we failing to fault on a previously mapped scanout too?  If so,
we'd need to cover more than just scanout here.

Chris Wilson June 12, 2013, 10:41 p.m. UTC | #5

On Wed, Jun 12, 2013 at 03:06:51PM -0700, Jesse Barnes wrote:
> On Wed, 12 Jun 2013 00:48:25 +0100
> Chris Wilson <chris@chris-wilson.co.uk> wrote:
> 
> > On Tue, Jun 11, 2013 at 04:01:21PM -0700, Stéphane Marchesin wrote:
> > > 
> > > On Tue, Jun 11, 2013 at 3:57 PM, Chris Wilson <chris@chris-wilson.co.uk> wrote:
> > > > On Tue, Jun 11, 2013 at 03:49:27PM -0700, Stéphane Marchesin wrote:
> > > >> During suspend all fences are reset, including their pin_count which
> > > >> is reset to 0. However a framebuffer can be bound across
> > > >> suspend/resume, which means that when the buffer is unbound after
> > > >> resume, the pin count for the buffer will be negative. Since the
> > > >> fence pin count is now negative when available and zero when in use,
> > > >> the buffer's fence will get recycled when the fence is in use which
> > > >> is the opposite of what we want. The visible effect is that since the
> > > >> fence is recycled the tiling mode goes away while the buffer is being
> > > >> displayed and we get lines/screens of garbage.
> > > >>
> > > >> To fix this, we repin the fences for all bound fbs on resume, which
> > > >> ensures the pin count is right.
> > > >
> > > > Yikes. So why do we not just keep the fences alive during suspend (not
> > > > touching their pin_count), and then just iterate over the list of fences
> > > > rewriting the register as required upon resume? That would seem less
> > > > error prone than trying to reconstruct the lost pin_count.
> > > 
> > > I suspect they'd need to be saved/restored at the hw level as well,
> > > which AFAICS isn't happening today...
> > 
> > Ugh, I introduced this bug 30 months ago - saved by the VT switch on
> > resume. But we can restore the fences from dev_priv->fence_regs...
> > Actually we have a very similar problem after a GPU reset where we
> > should restore fences for pinned objects (i.e. the scanout). The patch
> > to fix both looks fairly straightforward.
> 
> To be clear, this only affects gen3 right?  For gen4+ we don't need the
> fences for scanout since we have a bit in the plane control...

True, you will only get scanout corruption from lack of a fence on gen2/3.
FBC will also be more broken than before.
 
> Or are we failing to fault on a previously mapped scanout too?  If so,
> we'd need to cover more than just scanout here.

They all get faulted in again, and all will grab a fence again. Only
scanouts pinned at the time of resume will believe that they hold an
additional reference to the fence, and so unpin it once too often. If we
do that enough times, we will starve ourselves of fences. And a gen3
scanout runs the risk of losing its fence at any time.

So the impact is far less severe than it appears at first glance. Unless
I've missed something.
-Chris

Stéphane Marchesin June 14, 2013, 7:12 p.m. UTC | #6

On Wed, Jun 12, 2013 at 3:06 PM, Jesse Barnes <jbarnes@virtuousgeek.org> wrote:
> On Wed, 12 Jun 2013 00:48:25 +0100
> Chris Wilson <chris@chris-wilson.co.uk> wrote:
>
>> On Tue, Jun 11, 2013 at 04:01:21PM -0700, Stéphane Marchesin wrote:
>> >
>> > On Tue, Jun 11, 2013 at 3:57 PM, Chris Wilson <chris@chris-wilson.co.uk> wrote:
>> > > On Tue, Jun 11, 2013 at 03:49:27PM -0700, Stéphane Marchesin wrote:
>> > >> During suspend all fences are reset, including their pin_count which
>> > >> is reset to 0. However a framebuffer can be bound across
>> > >> suspend/resume, which means that when the buffer is unbound after
>> > >> resume, the pin count for the buffer will be negative. Since the
>> > >> fence pin count is now negative when available and zero when in use,
>> > >> the buffer's fence will get recycled when the fence is in use which
>> > >> is the opposite of what we want. The visible effect is that since the
>> > >> fence is recycled the tiling mode goes away while the buffer is being
>> > >> displayed and we get lines/screens of garbage.
>> > >>
>> > >> To fix this, we repin the fences for all bound fbs on resume, which
>> > >> ensures the pin count is right.
>> > >
>> > > Yikes. So why do we not just keep the fences alive during suspend (not
>> > > touching their pin_count), and then just iterate over the list of fences
>> > > rewriting the register as required upon resume? That would seem less
>> > > error prone than trying to reconstruct the lost pin_count.
>> >
>> > I suspect they'd need to be saved/restored at the hw level as well,
>> > which AFAICS isn't happening today...
>>
>> Ugh, I introduced this bug 30 months ago - saved by the VT switch on
>> resume. But we can restore the fences from dev_priv->fence_regs...
>> Actually we have a very similar problem after a GPU reset where we
>> should restore fences for pinned objects (i.e. the scanout). The patch
>> to fix both looks fairly straightforward.
>
> To be clear, this only affects gen3 right?  For gen4+ we don't need the
> fences for scanout since we have a bit in the plane control...

Yup I've only ever seen the issue on gen3.

Anyway, what should we do about this? Should I make another patch
where I save/restore the fence regs instead?

Stéphane

>
> Or are we failing to fault on a previously mapped scanout too?  If so,
> we'd need to cover more than just scanout here.
>
> --
> Jesse Barnes, Intel Open Source Technology Center

Daniel Vetter June 14, 2013, 7:21 p.m. UTC | #7

On Fri, Jun 14, 2013 at 9:12 PM, Stéphane Marchesin
<marcheu@chromium.org> wrote:
> On Wed, Jun 12, 2013 at 3:06 PM, Jesse Barnes <jbarnes@virtuousgeek.org> wrote:
>> On Wed, 12 Jun 2013 00:48:25 +0100
>> Chris Wilson <chris@chris-wilson.co.uk> wrote:
>>
>>> On Tue, Jun 11, 2013 at 04:01:21PM -0700, Stéphane Marchesin wrote:
>>> >
>>> > On Tue, Jun 11, 2013 at 3:57 PM, Chris Wilson <chris@chris-wilson.co.uk> wrote:
>>> > > On Tue, Jun 11, 2013 at 03:49:27PM -0700, Stéphane Marchesin wrote:
>>> > >> During suspend all fences are reset, including their pin_count which
>>> > >> is reset to 0. However a framebuffer can be bound across
>>> > >> suspend/resume, which means that when the buffer is unbound after
>>> > >> resume, the pin count for the buffer will be negative. Since the
>>> > >> fence pin count is now negative when available and zero when in use,
>>> > >> the buffer's fence will get recycled when the fence is in use which
>>> > >> is the opposite of what we want. The visible effect is that since the
>>> > >> fence is recycled the tiling mode goes away while the buffer is being
>>> > >> displayed and we get lines/screens of garbage.
>>> > >>
>>> > >> To fix this, we repin the fences for all bound fbs on resume, which
>>> > >> ensures the pin count is right.
>>> > >
>>> > > Yikes. So why do we not just keep the fences alive during suspend (not
>>> > > touching their pin_count), and then just iterate over the list of fences
>>> > > rewriting the register as required upon resume? That would seem less
>>> > > error prone than trying to reconstruct the lost pin_count.
>>> >
>>> > I suspect they'd need to be saved/restored at the hw level as well,
>>> > which AFAICS isn't happening today...
>>>
>>> Ugh, I introduced this bug 30 months ago - saved by the VT switch on
>>> resume. But we can restore the fences from dev_priv->fence_regs...
>>> Actually we have a very similar problem after a GPU reset where we
>>> should restore fences for pinned objects (i.e. the scanout). The patch
>>> to fix both looks fairly straightforward.
>>
>> To be clear, this only affects gen3 right?  For gen4+ we don't need the
>> fences for scanout since we have a bit in the plane control...
>
> Yup I've only ever seen the issue on gen3.
>
> Anyway, what should we do about this? Should I make another patch
> where I save/restore the fence regs instead?

drm-intel-fixes has already the improved patch from Chris,
drm-intel-next-queued has a patch to add a WARN so we'll catch this
much quicker next time around.
-Daniel

--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

[2/2] drm/i915: repin bound framebuffers on resume

Commit Message

Comments

Patch