drm/i915: FBC flush nuke for BDW

Message ID	1407149498-3289-1-git-send-email-rodrigo.vivi@intel.com (mailing list archive)
State	New, archived
Headers
Return-Path: <intel-gfx-bounces@lists.freedesktop.org>
From: Rodrigo Vivi <rodrigo.vivi@intel.com>
To: intel-gfx@lists.freedesktop.org
Date: Mon, 4 Aug 2014 03:51:38 -0700
Message-Id: <1407149498-3289-1-git-send-email-rodrigo.vivi@intel.com>
In-Reply-To: <20140804081147.GL8727@phenom.ffwll.local>
References: <20140804081147.GL8727@phenom.ffwll.local>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Subject: [Intel-gfx] [PATCH] drm/i915: FBC flush nuke for BDW
Precedence: list
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Errors-To: intel-gfx-bounces@lists.freedesktop.org
Sender: "Intel-gfx" <intel-gfx-bounces@lists.freedesktop.org>

Rodrigo Vivi Aug. 4, 2014, 10:51 a.m. UTC

According to the spec, FBC on BDW and HSW is identical, without any gaps.
So let's copy the nuke and let FBC really start compressing stuff.

Without this patch, false color shows that nothing is being
compressed. With the nuke in place, the false color debug output
becomes visible.

Unfortunately, on some rings on BDW, such as BCS, we have to avoid bits 22:18
in LRIs due to a high risk of hangs. So when the Blt ring is used for
frontbuffer rendering, the cache would never be cleaned and FBC would stop
compressing the buffer.
One alternative is to do the cache clean from software frontbuffer tracking.

v2: Fix rebase conflict.
v3: Do not clean cache on BCS ring. Instead use sw frontbuffer tracking.

Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h         |  1 +
 drivers/gpu/drm/i915/intel_display.c    |  3 +++
 drivers/gpu/drm/i915/intel_pm.c         | 10 ++++++++++
 drivers/gpu/drm/i915/intel_ringbuffer.c | 10 +++++++++-
 4 files changed, 23 insertions(+), 1 deletion(-)
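
The split described in the commit message can be modelled outside the driver. The following is only a minimal sketch under stated assumptions: the enum values, the struct, and the `model_*` functions are hypothetical stand-ins, not the i915 code. It only mirrors the two conditions that appear in the patch (nuke after a flush-only request on the render ring, cache clean from frontbuffer tracking on gen8) and records which message would have been sent instead of touching hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the FBC render-state messages; the real
 * register values are defined by the driver and are not reproduced here. */
enum fbc_rend_msg { FBC_MSG_NONE, FBC_MSG_NUKE, FBC_MSG_CACHE_CLEAN };

/* Hypothetical ring identifiers for this model. */
enum ring_id { RING_RENDER, RING_BLT };

struct fbc_model {
	enum fbc_rend_msg last_ring_msg; /* last message emitted via a ring */
	enum fbc_rend_msg last_mmio_msg; /* last message written by the CPU */
};

/* Models the ring flush path: only the render ring emits a nuke, and only
 * for a flush without invalidate. The Blt ring never emits one here. */
void model_ring_flush(struct fbc_model *m, enum ring_id ring,
		      uint32_t invalidate_domains, uint32_t flush_domains)
{
	assert(m);

	if (ring != RING_RENDER)
		return;

	if (!invalidate_domains && flush_domains)
		m->last_ring_msg = FBC_MSG_NUKE;
}

/* Models the software frontbuffer flush: on gen8 the cache clean is
 * written by the CPU, whichever ring did the rendering. */
void model_frontbuffer_flush(struct fbc_model *m, bool is_gen8)
{
	assert(m);

	if (is_gen8)
		m->last_mmio_msg = FBC_MSG_CACHE_CLEAN;
}
```

In this model a Blt-ring flush leaves `last_ring_msg` untouched, which is the gap the software cache clean is meant to cover.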

Rodrigo Vivi Aug. 7, 2014, 8:04 p.m. UTC | #1

I tested a full sw nuke/cache clean here on HSW and I didn't like the
result.
It seems to compress less than the hw one, to recompress everything a
lot, and to stay compressed for less time.

So, imho, v3 is the way to go.


On Mon, Aug 4, 2014 at 3:51 AM, Rodrigo Vivi <rodrigo.vivi@intel.com> wrote:

> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 2a372f2..25d7365 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -2713,6 +2713,7 @@ extern void intel_modeset_setup_hw_state(struct drm_device *dev,
>  extern void i915_redisable_vga(struct drm_device *dev);
>  extern void i915_redisable_vga_power_on(struct drm_device *dev);
>  extern bool intel_fbc_enabled(struct drm_device *dev);
> +extern void gen8_fbc_sw_flush(struct drm_device *dev, u32 value);
>  extern void intel_disable_fbc(struct drm_device *dev);
>  extern bool ironlake_set_drps(struct drm_device *dev, u8 val);
>  extern void intel_init_pch_refclk(struct drm_device *dev);
> diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
> index 883af0b..c8421cd 100644
> --- a/drivers/gpu/drm/i915/intel_display.c
> +++ b/drivers/gpu/drm/i915/intel_display.c
> @@ -9044,6 +9044,9 @@ void intel_frontbuffer_flush(struct drm_device *dev,
>         intel_mark_fb_busy(dev, frontbuffer_bits, NULL);
>
>         intel_edp_psr_flush(dev, frontbuffer_bits);
> +
> +       if (IS_GEN8(dev))
> +               gen8_fbc_sw_flush(dev, FBC_REND_CACHE_CLEAN);
>  }
>
>  /**
> diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
> index 684dc5f..de07d3e 100644
> --- a/drivers/gpu/drm/i915/intel_pm.c
> +++ b/drivers/gpu/drm/i915/intel_pm.c
> @@ -345,6 +345,16 @@ bool intel_fbc_enabled(struct drm_device *dev)
>         return dev_priv->display.fbc_enabled(dev);
>  }
>
> +void gen8_fbc_sw_flush(struct drm_device *dev, u32 value)
> +{
> +       struct drm_i915_private *dev_priv = dev->dev_private;
> +
> +       if (!IS_GEN8(dev))
> +               return;
> +
> +       I915_WRITE(MSG_FBC_REND_STATE, value);
> +}
> +
>  static void intel_fbc_work_fn(struct work_struct *__work)
>  {
>         struct intel_fbc_work *work =
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index 2908896..2fe871c 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -406,6 +406,7 @@ gen8_render_ring_flush(struct intel_engine_cs *ring,
>  {
>         u32 flags = 0;
>         u32 scratch_addr = ring->scratch.gtt_offset + 2 * CACHELINE_BYTES;
> +       int ret;
>
>         flags |= PIPE_CONTROL_CS_STALL;
>
> @@ -424,7 +425,14 @@ gen8_render_ring_flush(struct intel_engine_cs *ring,
>                 flags |= PIPE_CONTROL_GLOBAL_GTT_IVB;
>         }
>
> -       return gen8_emit_pipe_control(ring, flags, scratch_addr);
> +       ret = gen8_emit_pipe_control(ring, flags, scratch_addr);
> +       if (ret)
> +               return ret;
> +
> +       if (!invalidate_domains && flush_domains)
> +               return gen7_ring_fbc_flush(ring, FBC_REND_NUKE);
> +
> +       return 0;
>  }
>
>  static void ring_write_tail(struct intel_engine_cs *ring,
> --
> 1.9.3
>
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx
>

Daniel Vetter Aug. 8, 2014, 7:06 a.m. UTC | #2

On Thu, Aug 07, 2014 at 01:04:19PM -0700, Rodrigo Vivi wrote:
> I tested a full sw nuke/cache clean here on HSW and I didn't like the
> result.
> It seems to compress less than the hw one, to recompress everything a
> lot, and to stay compressed for less time.

That is really unexpected. For a modern desktop (i.e. anything that
pageflips) there should be zero difference. And for actual frontbuffer
rendering there should only be a difference when doing tiny cpu rendering
to the frontbuffer.

So something didn't work out as expected. Can you please push the code
somewhere, or just submit patches to intel-gfx?

Thanks, Daniel

Daniel Vetter Aug. 19, 2014, 6:58 p.m. UTC | #3

Re-adding intel-gfx. Please don't drop mailing list CCs without telling me.

Thanks, Daniel

On Tue, Aug 19, 2014 at 8:57 PM, Daniel Vetter <daniel@ffwll.ch> wrote:
> Yeah, that does far too much flushing - you need to track the relevant
> dirty bits like psr does, and then only flush when there has been a
> preceding invalidate with the primary plane frontbuffer bit for the
> pipe that's using fbc. On top of that there's room for more
> improvements (filtering out pageflips and optimizing that more; atm we
> just disable fbc over a pageflip, which is a bit meh), and we should
> also be able to ditch all the existing fbc nuking we do from the cmd
> streamer.
> -Daniel
>
> On Tue, Aug 19, 2014 at 12:09 AM, Rodrigo Vivi <rodrigo.vivi@gmail.com> wrote:
>> http://cgit.freedesktop.org/~vivijim/drm-intel/commit/?h=fbc-sw-nuke-hsw&id=71875d3331aa3baef4f6f6bd297cc70dd94df1b6

Rodrigo Vivi Aug. 21, 2014, 4:44 p.m. UTC | #4

The list was dropped accidentally. I didn't mean to do that. Sorry.

And yes, you are right, we need a way to reduce nukes and cleans, similar
to what we have for psr. I'll try it.
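
The dirty-bit tracking suggested in the quoted mail could look roughly like the sketch below. It is only an illustration under stated assumptions: the bit layout (`PLANES_PER_PIPE`, `PRIMARY_BIT`), the `fbc_tracker` struct, and both function names are hypothetical, and a counter stands in for the actual nuke. The idea it shows is the one described in the thread: remember an invalidate only if it touches the primary plane of the pipe that uses FBC, and act on a flush only if such an invalidate came before it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit layout: a fixed number of frontbuffer bits per pipe,
 * with the primary plane in the lowest bit of each group. */
#define PLANES_PER_PIPE 4u
#define PRIMARY_BIT(pipe) (1u << ((pipe) * PLANES_PER_PIPE))

struct fbc_tracker {
	unsigned int fbc_pipe; /* pipe currently using FBC */
	uint32_t busy_bits;    /* invalidated and not yet flushed */
	unsigned int nukes;    /* counts nukes; stands in for the hw write */
};

/* Remember an invalidate only when it hits the primary plane of the pipe
 * that is using FBC; everything else is irrelevant to FBC. */
void fbc_track_invalidate(struct fbc_tracker *t, uint32_t frontbuffer_bits)
{
	assert(t);

	t->busy_bits |= frontbuffer_bits & PRIMARY_BIT(t->fbc_pipe);
}

/* Nuke only when the flush matches a preceding, relevant invalidate.
 * Returns true if a nuke was issued. */
bool fbc_track_flush(struct fbc_tracker *t, uint32_t frontbuffer_bits)
{
	uint32_t hit;

	assert(t);

	hit = t->busy_bits & frontbuffer_bits;
	if (!hit)
		return false;

	t->busy_bits &= ~hit;
	t->nukes++;
	return true;
}
```

With this shape, a flush with no preceding invalidate, or one for another pipe or plane, costs nothing, which is the reduction in nukes and cleans discussed above.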



Rodrigo Vivi Aug. 26, 2014, 12:39 a.m. UTC | #5

I tried your suggestion to nuke fbc less often: I nuked only along
with psr_exit and cleaned the cache along with psr_do_enable, and it got
much better. I could see false color working well enough.
However, PC7 residency got worse: it fluctuated much more with the
sw version.

So I prefer to keep using the HW/ring version that already works
on HSW and to merge this v3 to get FBC working on BDW.



>> >>> > > @@ -406,6 +406,7 @@ gen8_render_ring_flush(struct intel_engine_cs
>> >>> > > *ring,
>> >>> > >  {
>> >>> > >         u32 flags = 0;
>> >>> > >         u32 scratch_addr = ring->scratch.gtt_offset + 2 *
>> >>> > > CACHELINE_BYTES;
>> >>> > > +       int ret;
>> >>> > >
>> >>> > >         flags |= PIPE_CONTROL_CS_STALL;
>> >>> > >
>> >>> > > @@ -424,7 +425,14 @@ gen8_render_ring_flush(struct intel_engine_cs
>> >>> > > *ring,
>> >>> > >                 flags |= PIPE_CONTROL_GLOBAL_GTT_IVB;
>> >>> > >         }
>> >>> > >
>> >>> > > -       return gen8_emit_pipe_control(ring, flags, scratch_addr);
>> >>> > > +       ret = gen8_emit_pipe_control(ring, flags, scratch_addr);
>> >>> > > +       if (ret)
>> >>> > > +               return ret;
>> >>> > > +
>> >>> > > +       if (!invalidate_domains && flush_domains)
>> >>> > > +               return gen7_ring_fbc_flush(ring, FBC_REND_NUKE);
>> >>> > > +
>> >>> > > +       return 0;
>> >>> > >  }
>> >>> > >
>> >>> > >  static void ring_write_tail(struct intel_engine_cs *ring,
>> >>> > > --
>> >>> > > 1.9.3
>> >>> > >

Daniel Vetter Aug. 26, 2014, 7:54 a.m. UTC | #6

On Tue, Aug 26, 2014 at 2:39 AM, Rodrigo Vivi <rodrigo.vivi@gmail.com> wrote:
> So I prefer to continue using the HW/ring version we have already working
> for HSW and merge this v3 to get FBC working at BDW.

Well for that we first need to fix up the psr testcase. I really want that.

And then I also want fbc enabled by default, which means you need to
rebase/review Ville's patch series to make that work.

Also I really think the fb frontbuffer tracking can be made to work
for fbc - if you do it right you should actually end up with fewer
frontbuffer flushes than what we currently do by submitting them
through rings.
-Daniel

Rodrigo Vivi Aug. 26, 2014, 6:38 p.m. UTC | #7

On Tue, Aug 26, 2014 at 12:54 AM, Daniel Vetter <daniel@ffwll.ch> wrote:

> On Tue, Aug 26, 2014 at 2:39 AM, Rodrigo Vivi <rodrigo.vivi@gmail.com>
> wrote:
> > So I prefer to continue using the HW/ring version we have already working
> > for HSW and merge this v3 to get FBC working at BDW.
>
> Well for that we first need to fix up the psr testcase. I really want that.
>

fbc and psr are independent features, and task prioritization should come
from managers and program managers, right?!

>
> And then I also want fbc enabled by default, which means you need to
> rebase/review Ville's patch series to make that work.
>

We have FBC working, with issues and protected by a parameter, on all other
platforms. On BDW there is no FBC at all. This patch at least brings FBC on
BDW to the same level as on the platforms where it already works.

I don't see a dependency between this fix and the big FBC rework.
This patch isn't enabling FBC by default; it just lets people who want and
need FBC use it.

>
> Also I really think the fb frontbuffer tracking can be made to work
> for fbc - if you do it right you should actually end up with fewer
> frontbuffer flushes than what we currently do by submitting them
> through rings.
> -Daniel

Daniel Vetter Aug. 26, 2014, 8:43 p.m. UTC | #8

On Tue, Aug 26, 2014 at 8:38 PM, Rodrigo Vivi <rodrigo.vivi@gmail.com> wrote:
> On Tue, Aug 26, 2014 at 12:54 AM, Daniel Vetter <daniel@ffwll.ch> wrote:
>> On Tue, Aug 26, 2014 at 2:39 AM, Rodrigo Vivi <rodrigo.vivi@gmail.com>
>> wrote:
>> > So I prefer to continue using the HW/ring version we have already
>> > working
>> > for HSW and merge this v3 to get FBC working at BDW.
>>
>> Well for that we first need to fix up the psr testcase. I really want
>> that.
>
> fbc and psr are independent features, and task prioritization should come
> from managers and program managers, right?!
>
>> And then I also want fbc enabled by default, which means you need to
>> rebase/review Ville's patch series to make that work.
>
> We have FBC working, with issues and protected by a parameter, on all other
> platforms. On BDW there is no FBC at all. This patch at least brings FBC on
> BDW to the same level as on the platforms where it already works.
>
> I don't see a dependency between this fix and the big FBC rework.
> This patch isn't enabling FBC by default; it just lets people who want and
> need FBC use it.

It's going to be the same answer for both parts - I don't really like
it when we add features but don't complete them (so testcases and enabled
by default). And in our long meeting today a lot of people asked me
why I'm reluctant to merge patches so that we can clean them up
in-tree (which I agree is often the much more efficient approach).
Reactions like yours here are pretty much the reason for that -
getting something in to make an internal customer happy, or to check off a
box in our tracking, or to meet some other requirement, and then moving on
right away.

And I know that you yourself are in a very bad spot, stuck between
me and requests from management and project tracking. And this is also
not to single you out at all, it's just what I see all over.
-Daniel

Paulo Zanoni Sept. 5, 2014, 6:28 p.m. UTC | #9

2014-08-04 7:51 GMT-03:00 Rodrigo Vivi <rodrigo.vivi@intel.com>:
> According to spec FBC on BDW and HSW are identical without any gaps.
> So let's copy the nuke and let FBC really start compressing stuff.
>
> Without this patch we can verify with false color that nothing is being
> compressed. With the nuke in place and false color it is possible
> to see false color debugs.
>
> Unfortunately, on some rings like BCS on BDW we have to avoid bits 22:18 on
> LRIs due to a high risk of hangs. So, when using the Blt ring for the
> frontbuffer, the render cache would never be cleaned and FBC would stop
> compressing the buffer. One alternative is to clean the cache from software
> frontbuffer tracking.
>
> v2: Fix rebase conflict.
> v3: Do not clean cache on BCS ring. Instead use sw frontbuffer tracking.

This patch causes lots of WARNs when running igt/pm_rpm/cursor and a
few other similar test cases. Now the interesting thing is that we're
trying to do FBC flushing, but:
- FBC is disabled by default
- The screen is disabled
- We're runtime suspended
- We're touching cursors when we decide to flush FBC.

This is the first WARN that happens:

[  123.581179] [drm:intel_runtime_suspend] Device suspended
[  123.680666] ------------[ cut here ]------------
[  123.680697] WARNING: CPU: 1 PID: 1776 at
drivers/gpu/drm/i915/intel_uncore.c:47
assert_device_not_suspended.isra.8+0x43/0x50 [i915]()
[  123.680699] Device suspended
[  123.680700] Modules linked in: intel_rapl x86_pkg_temp_thermal
serio_raw intel_powerclamp efivars btusb iwlmvm iwlwifi mei_me mei
int3403_thermal snd_hda_codec_hdmi snd_hda_intel snd_hda_controller
i2c_designware_platform snd_hda_codec i2c_designware_core snd_hwdep
snd_pcm_oss snd_mixer_oss acpi_pad snd_pcm snd_timer fuse nls_utf8
nls_cp437 vfat fat sd_mod ahci libahci i915 sdhci_pci e1000e
drm_kms_helper drm sdhci_acpi sdhci
[  123.680733] CPU: 1 PID: 1776 Comm: pm_rpm Not tainted
3.17.0-rc2.1409051503pz+ #1100
[  123.680736] Hardware name: Intel Corporation Broadwell Client
platform/Wilson Beach SDS, BIOS BDW-E2R1.86C.0072.R03.1405072127
05/07/2014
[  123.680738]  0000000000000009 ffff88024174fab8 ffffffff816f6d23
ffff88024174fb00
[  123.680742]  ffff88024174faf0 ffffffff8107b368 ffff880037700000
0000000000050380
[  123.680746]  0000000000050380 ffff880037700068 0000000000000001
ffff88024174fb50
[  123.680751] Call Trace:
[  123.680758]  [<ffffffff816f6d23>] dump_stack+0x4d/0x66
[  123.680763]  [<ffffffff8107b368>] warn_slowpath_common+0x78/0xa0
[  123.680766]  [<ffffffff8107b3d7>] warn_slowpath_fmt+0x47/0x50
[  123.680772]  [<ffffffff810c0efd>] ? trace_hardirqs_on_caller+0x15d/0x200
[  123.680791]  [<ffffffffa0118c63>]
assert_device_not_suspended.isra.8+0x43/0x50 [i915]
[  123.680809]  [<ffffffffa011cbd5>] gen8_write32+0x35/0x180 [i915]
[  123.680821]  [<ffffffffa00dbb49>] gen8_fbc_sw_flush+0x29/0x30 [i915]
[  123.680840]  [<ffffffffa013150d>] intel_frontbuffer_flush+0x7d/0x90 [i915]
[  123.680859]  [<ffffffffa0135f6b>]
intel_cursor_plane_update+0x13b/0x150 [i915]
[  123.680876]  [<ffffffffa00276b0>] setplane_internal+0x260/0x2a0 [drm]
[  123.680889]  [<ffffffffa002780e>] drm_mode_cursor_common+0x11e/0x310 [drm]
[  123.680904]  [<ffffffffa002ae1c>] drm_mode_cursor_ioctl+0x3c/0x40 [drm]
[  123.680914]  [<ffffffffa001cc9f>] drm_ioctl+0x1df/0x6a0 [drm]
[  123.680920]  [<ffffffff816fe769>] ? mutex_unlock+0x9/0x10
[  123.680924]  [<ffffffff811f3856>] ? seq_read+0xb6/0x3e0
[  123.680929]  [<ffffffff811e2540>] do_vfs_ioctl+0x2e0/0x4e0
[  123.680934]  [<ffffffff81701077>] ? sysret_check+0x1b/0x56
[  123.680939]  [<ffffffff810c0efd>] ? trace_hardirqs_on_caller+0x15d/0x200
[  123.680943]  [<ffffffff811e27c1>] SyS_ioctl+0x81/0xa0
[  123.680947]  [<ffffffff81701052>] system_call_fastpath+0x16/0x1b
[  123.680949] ---[ end trace 9f389682639d2eb9 ]---
[  123.680953] ------------[ cut here ]------------


So, what we can/could do:
1 - Change gen8_fbc_sw_flush() to do nothing when FBC is disabled.
2 - At some point of the stack above, avoid calling everything else if
the screen and/or plane and/or crtc is disabled. But when? At
intel_cursor_plane_update? At intel_frontbuffer_flush?
3 - Don't call gen8_fbc_sw_flush if the frontbuffer_bits don't include
the primary plane bits (since, AFAIR, FBC ignores the sprite and
cursors, right?)

Is anybody against any of the 3 items above?
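
For illustration, the three guards listed above can be sketched in a small user-space mock. This is only a sketch under stated assumptions, not the driver code: `struct sketch_dev`, `sketch_fbc_sw_flush()` and the `SKETCH_*_BITS` masks are hypothetical names invented here, and the register write is simulated with a counter rather than a real MMIO access.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical plane masks; the real frontbuffer bit layout is not modelled. */
#define SKETCH_PRIMARY_BITS 0x1u
#define SKETCH_CURSOR_BITS  0x2u

/* Hypothetical stand-in for the device state the guards would look at. */
struct sketch_dev {
	bool fbc_enabled;      /* item 1: is FBC enabled at all? */
	bool crtc_active;      /* item 2: is the screen/crtc on? */
	unsigned mmio_writes;  /* number of simulated register writes */
	uint32_t last_value;   /* last value "written" to the register */
};

/* Returns true only when the simulated register write was performed. */
bool sketch_fbc_sw_flush(struct sketch_dev *dev,
			 uint32_t frontbuffer_bits, uint32_t value)
{
	assert(dev != NULL);

	if (!dev->fbc_enabled)
		return false;  /* item 1: nothing to do with FBC off */
	if (!dev->crtc_active)
		return false;  /* item 2: do not touch hardware with the screen off */
	if (!(frontbuffer_bits & SKETCH_PRIMARY_BITS))
		return false;  /* item 3: cursor/sprite-only updates are ignored */

	dev->last_value = value;
	dev->mmio_writes++;
	return true;
}
```

In this mock a cursor-only update (the case from the backtrace) returns early and never reaches the simulated write. Where the second check should really live in the call chain is exactly the open question of item 2.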

Also notice that the "cursor" subtest is not the only one that causes
these WARNs, so I expect similar backtraces with other similar
solutions for equivalent problems with the other planes.

Thanks,
Paulo

>
> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_drv.h         |  1 +
>  drivers/gpu/drm/i915/intel_display.c    |  3 +++
>  drivers/gpu/drm/i915/intel_pm.c         | 10 ++++++++++
>  drivers/gpu/drm/i915/intel_ringbuffer.c | 10 +++++++++-
>  4 files changed, 23 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 2a372f2..25d7365 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -2713,6 +2713,7 @@ extern void intel_modeset_setup_hw_state(struct drm_device *dev,
>  extern void i915_redisable_vga(struct drm_device *dev);
>  extern void i915_redisable_vga_power_on(struct drm_device *dev);
>  extern bool intel_fbc_enabled(struct drm_device *dev);
> +extern void gen8_fbc_sw_flush(struct drm_device *dev, u32 value);
>  extern void intel_disable_fbc(struct drm_device *dev);
>  extern bool ironlake_set_drps(struct drm_device *dev, u8 val);
>  extern void intel_init_pch_refclk(struct drm_device *dev);
> diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
> index 883af0b..c8421cd 100644
> --- a/drivers/gpu/drm/i915/intel_display.c
> +++ b/drivers/gpu/drm/i915/intel_display.c
> @@ -9044,6 +9044,9 @@ void intel_frontbuffer_flush(struct drm_device *dev,
>         intel_mark_fb_busy(dev, frontbuffer_bits, NULL);
>
>         intel_edp_psr_flush(dev, frontbuffer_bits);
> +
> +       if (IS_GEN8(dev))
> +               gen8_fbc_sw_flush(dev, FBC_REND_CACHE_CLEAN);
>  }
>
>  /**
> diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
> index 684dc5f..de07d3e 100644
> --- a/drivers/gpu/drm/i915/intel_pm.c
> +++ b/drivers/gpu/drm/i915/intel_pm.c
> @@ -345,6 +345,16 @@ bool intel_fbc_enabled(struct drm_device *dev)
>         return dev_priv->display.fbc_enabled(dev);
>  }
>
> +void gen8_fbc_sw_flush(struct drm_device *dev, u32 value)
> +{
> +       struct drm_i915_private *dev_priv = dev->dev_private;
> +
> +       if (!IS_GEN8(dev))
> +               return;
> +
> +       I915_WRITE(MSG_FBC_REND_STATE, value);
> +}
> +
>  static void intel_fbc_work_fn(struct work_struct *__work)
>  {
>         struct intel_fbc_work *work =
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index 2908896..2fe871c 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -406,6 +406,7 @@ gen8_render_ring_flush(struct intel_engine_cs *ring,
>  {
>         u32 flags = 0;
>         u32 scratch_addr = ring->scratch.gtt_offset + 2 * CACHELINE_BYTES;
> +       int ret;
>
>         flags |= PIPE_CONTROL_CS_STALL;
>
> @@ -424,7 +425,14 @@ gen8_render_ring_flush(struct intel_engine_cs *ring,
>                 flags |= PIPE_CONTROL_GLOBAL_GTT_IVB;
>         }
>
> -       return gen8_emit_pipe_control(ring, flags, scratch_addr);
> +       ret = gen8_emit_pipe_control(ring, flags, scratch_addr);
> +       if (ret)
> +               return ret;
> +
> +       if (!invalidate_domains && flush_domains)
> +               return gen7_ring_fbc_flush(ring, FBC_REND_NUKE);
> +
> +       return 0;
>  }
>
>  static void ring_write_tail(struct intel_engine_cs *ring,
> --
> 1.9.3
>

Rodrigo Vivi Sept. 5, 2014, 7:35 p.m. UTC | #10

On Fri, Sep 5, 2014 at 11:28 AM, Paulo Zanoni <przanoni@gmail.com> wrote:

> 2014-08-04 7:51 GMT-03:00 Rodrigo Vivi <rodrigo.vivi@intel.com>:
> > According to spec FBC on BDW and HSW are identical without any gaps.
> > So let's copy the nuke and let FBC really start compressing stuff.
> >
> > Without this patch we can verify with false color that nothing is being
> > compressed. With the nuke in place and false color it is possible
> > to see false color debugs.
> >
> > Unfortunately, on some rings like BCS on BDW we have to avoid bits 22:18
> > on LRIs due to a high risk of hangs. So, when using the Blt ring for the
> > frontbuffer, the render cache would never be cleaned and FBC would stop
> > compressing the buffer. One alternative is to clean the cache from
> > software frontbuffer tracking.
> >
> > v2: Fix rebase conflict.
> > v3: Do not clean cache on BCS ring. Instead use sw frontbuffer tracking.
>
> This patch causes lots of WARNs when running igt/pm_rpm/cursor and a
> few other similar test cases. Now the interesting thing is that we're
> trying to do FBC flushing, but:
>

As I already told you on IRC, thank you for finding this. It is good to
get reasons and arguments.


> - FBC is disabled by default


Yeah, we need to fix it, although it doesn't matter much.


> - The screen is disabled
> - We're runtime suspended
>

These are a separate error, not caused by this patch.
Why is frontbuffer_flush being called with the screen disabled or during
runtime suspend?



> - We're touching cursors when we decide to flush FBC.
>

This is not actually a flush, but a cache clean that allows compression to
restart.
I believe we could have some variable to indicate when we touched the ring
flush, and then on the next frontbuffer flush we do the MMIO cache clean.
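
As a rough illustration of the variable suggested in the paragraph above, here is a minimal user-space sketch. It is an assumption-laden mock, not driver code: `struct sketch_fbc`, `sketch_ring_flush()` and `sketch_frontbuffer_flush()` are hypothetical names, and the MMIO cache clean is simulated with a counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical ring identifiers; only "render" vs. "not render" matters here. */
enum sketch_ring { SKETCH_RING_RENDER, SKETCH_RING_BLT };

struct sketch_fbc {
	bool need_cache_clean;  /* set when a non-render ring flushed */
	unsigned cache_cleans;  /* number of simulated MMIO cache cleans */
};

/* Called on a ring flush: remember that the render cache was bypassed. */
void sketch_ring_flush(struct sketch_fbc *fbc, enum sketch_ring ring)
{
	assert(fbc != NULL);

	if (ring != SKETCH_RING_RENDER)
		fbc->need_cache_clean = true;
}

/* Called on a frontbuffer flush: clean the cache only if one is pending. */
bool sketch_frontbuffer_flush(struct sketch_fbc *fbc)
{
	assert(fbc != NULL);

	if (!fbc->need_cache_clean)
		return false;

	fbc->need_cache_clean = false;
	fbc->cache_cleans++;  /* stands in for the MMIO cache clean write */
	return true;
}
```

With this shape, a frontbuffer flush that follows only render-ring activity performs no register write at all, and several Blt flushes in a row collapse into a single cache clean. Locking and the suspended-device case are deliberately left out of the sketch.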


>
> This is the first WARN that happens:
>
> [  123.581179] [drm:intel_runtime_suspend] Device suspended
> [  123.680666] ------------[ cut here ]------------
> [  123.680697] WARNING: CPU: 1 PID: 1776 at
> drivers/gpu/drm/i915/intel_uncore.c:47
> assert_device_not_suspended.isra.8+0x43/0x50 [i915]()
> [  123.680699] Device suspended
> [  123.680700] Modules linked in: intel_rapl x86_pkg_temp_thermal
> serio_raw intel_powerclamp efivars btusb iwlmvm iwlwifi mei_me mei
> int3403_thermal snd_hda_codec_hdmi snd_hda_intel snd_hda_controller
> i2c_designware_platform snd_hda_codec i2c_designware_core snd_hwdep
> snd_pcm_oss snd_mixer_oss acpi_pad snd_pcm snd_timer fuse nls_utf8
> nls_cp437 vfat fat sd_mod ahci libahci i915 sdhci_pci e1000e
> drm_kms_helper drm sdhci_acpi sdhci
> [  123.680733] CPU: 1 PID: 1776 Comm: pm_rpm Not tainted
> 3.17.0-rc2.1409051503pz+ #1100
> [  123.680736] Hardware name: Intel Corporation Broadwell Client
> platform/Wilson Beach SDS, BIOS BDW-E2R1.86C.0072.R03.1405072127
> 05/07/2014
> [  123.680738]  0000000000000009 ffff88024174fab8 ffffffff816f6d23
> ffff88024174fb00
> [  123.680742]  ffff88024174faf0 ffffffff8107b368 ffff880037700000
> 0000000000050380
> [  123.680746]  0000000000050380 ffff880037700068 0000000000000001
> ffff88024174fb50
> [  123.680751] Call Trace:
> [  123.680758]  [<ffffffff816f6d23>] dump_stack+0x4d/0x66
> [  123.680763]  [<ffffffff8107b368>] warn_slowpath_common+0x78/0xa0
> [  123.680766]  [<ffffffff8107b3d7>] warn_slowpath_fmt+0x47/0x50
> [  123.680772]  [<ffffffff810c0efd>] ? trace_hardirqs_on_caller+0x15d/0x200
> [  123.680791]  [<ffffffffa0118c63>]
> assert_device_not_suspended.isra.8+0x43/0x50 [i915]
> [  123.680809]  [<ffffffffa011cbd5>] gen8_write32+0x35/0x180 [i915]
> [  123.680821]  [<ffffffffa00dbb49>] gen8_fbc_sw_flush+0x29/0x30 [i915]
> [  123.680840]  [<ffffffffa013150d>] intel_frontbuffer_flush+0x7d/0x90
> [i915]
> [  123.680859]  [<ffffffffa0135f6b>]
> intel_cursor_plane_update+0x13b/0x150 [i915]
> [  123.680876]  [<ffffffffa00276b0>] setplane_internal+0x260/0x2a0 [drm]
> [  123.680889]  [<ffffffffa002780e>] drm_mode_cursor_common+0x11e/0x310
> [drm]
> [  123.680904]  [<ffffffffa002ae1c>] drm_mode_cursor_ioctl+0x3c/0x40 [drm]
> [  123.680914]  [<ffffffffa001cc9f>] drm_ioctl+0x1df/0x6a0 [drm]
> [  123.680920]  [<ffffffff816fe769>] ? mutex_unlock+0x9/0x10
> [  123.680924]  [<ffffffff811f3856>] ? seq_read+0xb6/0x3e0
> [  123.680929]  [<ffffffff811e2540>] do_vfs_ioctl+0x2e0/0x4e0
> [  123.680934]  [<ffffffff81701077>] ? sysret_check+0x1b/0x56
> [  123.680939]  [<ffffffff810c0efd>] ? trace_hardirqs_on_caller+0x15d/0x200
> [  123.680943]  [<ffffffff811e27c1>] SyS_ioctl+0x81/0xa0
> [  123.680947]  [<ffffffff81701052>] system_call_fastpath+0x16/0x1b
> [  123.680949] ---[ end trace 9f389682639d2eb9 ]---
> [  123.680953] ------------[ cut here ]------------
>
>
> So, what we can/could do:
> 1 - Change gen8_fbc_sw_flush() to do nothing when FBC is disabled.
>

Agreed!


> 2 - At some point of the stack above, avoid calling everything else if
> the screen and/or plane and/or crtc is disabled. But when? At
> intel_cursor_plane_update? At intel_frontbuffer_flush?
>

Good question!


> 3 - Don't call gen8_fbc_sw_flush if the frontbuffer_bits don't include
> the primary plane bits (since, AFAIR, FBC ignores the sprite and
> cursors, right?)
>

Not sure, but we could try checking the primary bits and doing it only after
a ring flush on the blt ring. Actually, on any ring other than render.


>
> Is anybody against any of the 3 items above?
>

Should I change this patch itself or provide another one on top?


>
> Also notice that the "cursor" subtest is not the only one that causes
> these WARNs, so I expect similar backtraces with other similar
> solutions for equivalent problems with the other planes.
>

So checking for the primary bit is the way!


>
> Thanks,
> Paulo
>

Thank you!
Rodrigo.


>
> >
> > Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
> > ---
> >  drivers/gpu/drm/i915/i915_drv.h         |  1 +
> >  drivers/gpu/drm/i915/intel_display.c    |  3 +++
> >  drivers/gpu/drm/i915/intel_pm.c         | 10 ++++++++++
> >  drivers/gpu/drm/i915/intel_ringbuffer.c | 10 +++++++++-
> >  4 files changed, 23 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_drv.h
> b/drivers/gpu/drm/i915/i915_drv.h
> > index 2a372f2..25d7365 100644
> > --- a/drivers/gpu/drm/i915/i915_drv.h
> > +++ b/drivers/gpu/drm/i915/i915_drv.h
> > @@ -2713,6 +2713,7 @@ extern void intel_modeset_setup_hw_state(struct
> drm_device *dev,
> >  extern void i915_redisable_vga(struct drm_device *dev);
> >  extern void i915_redisable_vga_power_on(struct drm_device *dev);
> >  extern bool intel_fbc_enabled(struct drm_device *dev);
> > +extern void gen8_fbc_sw_flush(struct drm_device *dev, u32 value);
> >  extern void intel_disable_fbc(struct drm_device *dev);
> >  extern bool ironlake_set_drps(struct drm_device *dev, u8 val);
> >  extern void intel_init_pch_refclk(struct drm_device *dev);
> > diff --git a/drivers/gpu/drm/i915/intel_display.c
> b/drivers/gpu/drm/i915/intel_display.c
> > index 883af0b..c8421cd 100644
> > --- a/drivers/gpu/drm/i915/intel_display.c
> > +++ b/drivers/gpu/drm/i915/intel_display.c
> > @@ -9044,6 +9044,9 @@ void intel_frontbuffer_flush(struct drm_device
> *dev,
> >         intel_mark_fb_busy(dev, frontbuffer_bits, NULL);
> >
> >         intel_edp_psr_flush(dev, frontbuffer_bits);
> > +
> > +       if (IS_GEN8(dev))
> > +               gen8_fbc_sw_flush(dev, FBC_REND_CACHE_CLEAN);
> >  }
> >
> >  /**
> > diff --git a/drivers/gpu/drm/i915/intel_pm.c
> b/drivers/gpu/drm/i915/intel_pm.c
> > index 684dc5f..de07d3e 100644
> > --- a/drivers/gpu/drm/i915/intel_pm.c
> > +++ b/drivers/gpu/drm/i915/intel_pm.c
> > @@ -345,6 +345,16 @@ bool intel_fbc_enabled(struct drm_device *dev)
> >         return dev_priv->display.fbc_enabled(dev);
> >  }
> >
> > +void gen8_fbc_sw_flush(struct drm_device *dev, u32 value)
> > +{
> > +       struct drm_i915_private *dev_priv = dev->dev_private;
> > +
> > +       if (!IS_GEN8(dev))
> > +               return;
> > +
> > +       I915_WRITE(MSG_FBC_REND_STATE, value);
> > +}
> > +
> >  static void intel_fbc_work_fn(struct work_struct *__work)
> >  {
> >         struct intel_fbc_work *work =
> > diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c
> b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > index 2908896..2fe871c 100644
> > --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> > +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > @@ -406,6 +406,7 @@ gen8_render_ring_flush(struct intel_engine_cs *ring,
> >  {
> >         u32 flags = 0;
> >         u32 scratch_addr = ring->scratch.gtt_offset + 2 *
> CACHELINE_BYTES;
> > +       int ret;
> >
> >         flags |= PIPE_CONTROL_CS_STALL;
> >
> > @@ -424,7 +425,14 @@ gen8_render_ring_flush(struct intel_engine_cs *ring,
> >                 flags |= PIPE_CONTROL_GLOBAL_GTT_IVB;
> >         }
> >
> > -       return gen8_emit_pipe_control(ring, flags, scratch_addr);
> > +       ret = gen8_emit_pipe_control(ring, flags, scratch_addr);
> > +       if (ret)
> > +               return ret;
> > +
> > +       if (!invalidate_domains && flush_domains)
> > +               return gen7_ring_fbc_flush(ring, FBC_REND_NUKE);
> > +
> > +       return 0;
> >  }
> >
> >  static void ring_write_tail(struct intel_engine_cs *ring,
> > --
> > 1.9.3
> >

Rodrigo Vivi Sept. 5, 2014, 9:12 p.m. UTC | #11

As Paulo said, part 2 of his proposal is still missing, and we will get
the WARN from reading the FBC register in fbc_enabled during runtime suspend...

But I couldn't find the proper place to check for intel_crtc->active. This
was another issue reported by the power team: with the screen off there was
too much activity in our driver. So we need to look more carefully for the
proper place to avoid all unnecessary activity with the display off.

Do you have any suggestions, Daniel?



On Fri, Sep 5, 2014 at 12:35 PM, Rodrigo Vivi <rodrigo.vivi@gmail.com>
wrote:

>
>
>
> On Fri, Sep 5, 2014 at 11:28 AM, Paulo Zanoni <przanoni@gmail.com> wrote:
>
>> 2014-08-04 7:51 GMT-03:00 Rodrigo Vivi <rodrigo.vivi@intel.com>:
>> > According to spec FBC on BDW and HSW are identical without any gaps.
>> > So let's copy the nuke and let FBC really start compressing stuff.
>> >
>> > Without this patch we can verify with false color that nothing is being
>> > compressed. With the nuke in place and false color it is possible
>> > to see false color debugs.
>> >
>> > Unfortunately, on some rings like BCS on BDW we have to avoid bits 22:18
>> > on LRIs due to a high risk of hangs. So, when using the Blt ring for the
>> > frontbuffer, the render cache would never be cleaned and FBC would stop
>> > compressing the buffer. One alternative is to clean the cache from
>> > software frontbuffer tracking.
>> >
>> > v2: Fix rebase conflict.
>> > v3: Do not clean cache on BCS ring. Instead use sw frontbuffer tracking.
>>
>> This patch causes lots of WARNs when running igt/pm_rpm/cursor and a
>> few other similar test cases. Now the interesting thing is that we're
>> trying to do FBC flushing, but:
>>
>
> As I already told you on IRC, thank you for finding this. It is good
> to get reasons and arguments.
>
>
>> - FBC is disabled by default
>
>
> Yeah, we need to fix it, although it doesn't matter much.
>
>
>> - The screen is disabled
>> - We're runtime suspended
>>
>
> These are a separate error, not caused by this patch.
> Why is frontbuffer_flush being called with the screen disabled or during
> runtime suspend?
>
>
>
>> - We're touching cursors when we decide to flush FBC.
>>
>
> This is not actually a flush, but a cache clean that allows compression to
> restart.
> I believe we could have some variable to indicate when we touched the ring
> flush, and then on the next frontbuffer flush we do the MMIO cache clean.
>
>
>>
>> This is the first WARN that happens:
>>
>> [  123.581179] [drm:intel_runtime_suspend] Device suspended
>> [  123.680666] ------------[ cut here ]------------
>> [  123.680697] WARNING: CPU: 1 PID: 1776 at
>> drivers/gpu/drm/i915/intel_uncore.c:47
>> assert_device_not_suspended.isra.8+0x43/0x50 [i915]()
>> [  123.680699] Device suspended
>> [  123.680700] Modules linked in: intel_rapl x86_pkg_temp_thermal
>> serio_raw intel_powerclamp efivars btusb iwlmvm iwlwifi mei_me mei
>> int3403_thermal snd_hda_codec_hdmi snd_hda_intel snd_hda_controller
>> i2c_designware_platform snd_hda_codec i2c_designware_core snd_hwdep
>> snd_pcm_oss snd_mixer_oss acpi_pad snd_pcm snd_timer fuse nls_utf8
>> nls_cp437 vfat fat sd_mod ahci libahci i915 sdhci_pci e1000e
>> drm_kms_helper drm sdhci_acpi sdhci
>> [  123.680733] CPU: 1 PID: 1776 Comm: pm_rpm Not tainted
>> 3.17.0-rc2.1409051503pz+ #1100
>> [  123.680736] Hardware name: Intel Corporation Broadwell Client
>> platform/Wilson Beach SDS, BIOS BDW-E2R1.86C.0072.R03.1405072127
>> 05/07/2014
>> [  123.680738]  0000000000000009 ffff88024174fab8 ffffffff816f6d23
>> ffff88024174fb00
>> [  123.680742]  ffff88024174faf0 ffffffff8107b368 ffff880037700000
>> 0000000000050380
>> [  123.680746]  0000000000050380 ffff880037700068 0000000000000001
>> ffff88024174fb50
>> [  123.680751] Call Trace:
>> [  123.680758]  [<ffffffff816f6d23>] dump_stack+0x4d/0x66
>> [  123.680763]  [<ffffffff8107b368>] warn_slowpath_common+0x78/0xa0
>> [  123.680766]  [<ffffffff8107b3d7>] warn_slowpath_fmt+0x47/0x50
>> [  123.680772]  [<ffffffff810c0efd>] ?
>> trace_hardirqs_on_caller+0x15d/0x200
>> [  123.680791]  [<ffffffffa0118c63>]
>> assert_device_not_suspended.isra.8+0x43/0x50 [i915]
>> [  123.680809]  [<ffffffffa011cbd5>] gen8_write32+0x35/0x180 [i915]
>> [  123.680821]  [<ffffffffa00dbb49>] gen8_fbc_sw_flush+0x29/0x30 [i915]
>> [  123.680840]  [<ffffffffa013150d>] intel_frontbuffer_flush+0x7d/0x90
>> [i915]
>> [  123.680859]  [<ffffffffa0135f6b>]
>> intel_cursor_plane_update+0x13b/0x150 [i915]
>> [  123.680876]  [<ffffffffa00276b0>] setplane_internal+0x260/0x2a0 [drm]
>> [  123.680889]  [<ffffffffa002780e>] drm_mode_cursor_common+0x11e/0x310
>> [drm]
>> [  123.680904]  [<ffffffffa002ae1c>] drm_mode_cursor_ioctl+0x3c/0x40 [drm]
>> [  123.680914]  [<ffffffffa001cc9f>] drm_ioctl+0x1df/0x6a0 [drm]
>> [  123.680920]  [<ffffffff816fe769>] ? mutex_unlock+0x9/0x10
>> [  123.680924]  [<ffffffff811f3856>] ? seq_read+0xb6/0x3e0
>> [  123.680929]  [<ffffffff811e2540>] do_vfs_ioctl+0x2e0/0x4e0
>> [  123.680934]  [<ffffffff81701077>] ? sysret_check+0x1b/0x56
>> [  123.680939]  [<ffffffff810c0efd>] ?
>> trace_hardirqs_on_caller+0x15d/0x200
>> [  123.680943]  [<ffffffff811e27c1>] SyS_ioctl+0x81/0xa0
>> [  123.680947]  [<ffffffff81701052>] system_call_fastpath+0x16/0x1b
>> [  123.680949] ---[ end trace 9f389682639d2eb9 ]---
>> [  123.680953] ------------[ cut here ]------------
>>
>>
>> So, what we can/could do:
>> 1 - Change gen8_fbc_sw_flush() to do nothing when FBC is disabled.
>>
>
> Agreed!
>
>
>> 2 - At some point of the stack above, avoid calling everything else if
>> the screen and/or plane and/or crtc is disabled. But when? At
>> intel_cursor_plane_update? At intel_frontbuffer_flush?
>>
>
> Good question!
>
>
>> 3 - Don't call gen8_fbc_sw_flush if the frontbuffer_bits don't include
>> the primary plane bits (since, AFAIR, FBC ignores the sprite and
>> cursors, right?)
>>
>
> Not sure, but we could try checking the primary bits and doing it only after
> a ring flush on the blt ring. Actually, on any ring other than render.
>
>
>>
>> Is anybody against any of the 3 items above?
>>
>
> Should I change this patch itself or provide another one on top?
>
>
>>
>> Also notice that the "cursor" subtest is not the only one that causes
>> these WARNs, so I expect similar backtraces with other similar
>> solutions for equivalent problems with the other planes.
>>
>
> So checking for the primary bit is the way!
>
>
>>
>> Thanks,
>> Paulo
>>
>
> Thank you!
> Rodrigo.
>
>
>>
>> >
>> > Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
>> > ---
>> >  drivers/gpu/drm/i915/i915_drv.h         |  1 +
>> >  drivers/gpu/drm/i915/intel_display.c    |  3 +++
>> >  drivers/gpu/drm/i915/intel_pm.c         | 10 ++++++++++
>> >  drivers/gpu/drm/i915/intel_ringbuffer.c | 10 +++++++++-
>> >  4 files changed, 23 insertions(+), 1 deletion(-)
>> >
>> > diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
>> > index 2a372f2..25d7365 100644
>> > --- a/drivers/gpu/drm/i915/i915_drv.h
>> > +++ b/drivers/gpu/drm/i915/i915_drv.h
>> > @@ -2713,6 +2713,7 @@ extern void intel_modeset_setup_hw_state(struct drm_device *dev,
>> >  extern void i915_redisable_vga(struct drm_device *dev);
>> >  extern void i915_redisable_vga_power_on(struct drm_device *dev);
>> >  extern bool intel_fbc_enabled(struct drm_device *dev);
>> > +extern void gen8_fbc_sw_flush(struct drm_device *dev, u32 value);
>> >  extern void intel_disable_fbc(struct drm_device *dev);
>> >  extern bool ironlake_set_drps(struct drm_device *dev, u8 val);
>> >  extern void intel_init_pch_refclk(struct drm_device *dev);
>> > diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
>> > index 883af0b..c8421cd 100644
>> > --- a/drivers/gpu/drm/i915/intel_display.c
>> > +++ b/drivers/gpu/drm/i915/intel_display.c
>> > @@ -9044,6 +9044,9 @@ void intel_frontbuffer_flush(struct drm_device *dev,
>> >         intel_mark_fb_busy(dev, frontbuffer_bits, NULL);
>> >
>> >         intel_edp_psr_flush(dev, frontbuffer_bits);
>> > +
>> > +       if (IS_GEN8(dev))
>> > +               gen8_fbc_sw_flush(dev, FBC_REND_CACHE_CLEAN);
>> >  }
>> >
>> >  /**
>> > diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
>> > index 684dc5f..de07d3e 100644
>> > --- a/drivers/gpu/drm/i915/intel_pm.c
>> > +++ b/drivers/gpu/drm/i915/intel_pm.c
>> > @@ -345,6 +345,16 @@ bool intel_fbc_enabled(struct drm_device *dev)
>> >         return dev_priv->display.fbc_enabled(dev);
>> >  }
>> >
>> > +void gen8_fbc_sw_flush(struct drm_device *dev, u32 value)
>> > +{
>> > +       struct drm_i915_private *dev_priv = dev->dev_private;
>> > +
>> > +       if (!IS_GEN8(dev))
>> > +               return;
>> > +
>> > +       I915_WRITE(MSG_FBC_REND_STATE, value);
>> > +}
>> > +
>> >  static void intel_fbc_work_fn(struct work_struct *__work)
>> >  {
>> >         struct intel_fbc_work *work =
>> > diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
>> > index 2908896..2fe871c 100644
>> > --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
>> > +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
>> > @@ -406,6 +406,7 @@ gen8_render_ring_flush(struct intel_engine_cs *ring,
>> >  {
>> >         u32 flags = 0;
>> >         u32 scratch_addr = ring->scratch.gtt_offset + 2 * CACHELINE_BYTES;
>> > +       int ret;
>> >
>> >         flags |= PIPE_CONTROL_CS_STALL;
>> >
>> > @@ -424,7 +425,14 @@ gen8_render_ring_flush(struct intel_engine_cs *ring,
>> >                 flags |= PIPE_CONTROL_GLOBAL_GTT_IVB;
>> >         }
>> >
>> > -       return gen8_emit_pipe_control(ring, flags, scratch_addr);
>> > +       ret = gen8_emit_pipe_control(ring, flags, scratch_addr);
>> > +       if (ret)
>> > +               return ret;
>> > +
>> > +       if (!invalidate_domains && flush_domains)
>> > +               return gen7_ring_fbc_flush(ring, FBC_REND_NUKE);
>> > +
>> > +       return 0;
>> >  }
>> >
>> >  static void ring_write_tail(struct intel_engine_cs *ring,
>> > --
>> > 1.9.3
>> >
>> > _______________________________________________
>> > Intel-gfx mailing list
>> > Intel-gfx@lists.freedesktop.org
>> > http://lists.freedesktop.org/mailman/listinfo/intel-gfx
>>
>>
>>
>> --
>> Paulo Zanoni
>>
>
>
>
> --
> Rodrigo Vivi
> Blog: http://blog.vivi.eng.br
>
>

Daniel Vetter Sept. 8, 2014, 7:26 a.m. UTC | #12

On Fri, Sep 05, 2014 at 12:35:18PM -0700, Rodrigo Vivi wrote:
> On Fri, Sep 5, 2014 at 11:28 AM, Paulo Zanoni <przanoni@gmail.com> wrote:
> > - The screen is disabled
> > - We're runtime suspended
> >
> 
> These are a separate error, not caused by this patch.
> Why is frontbuffer_flush being called with the screen disabled or on
> runtime suspend?

Because the frontbuffer tracking only tells you when a frontbuffer gets
updated. It's the job of the consumers to properly filter these events so
that they only act upon those which are relevant. psr does that, fbc (as
is) doesn't.

And if you have that filtering, adding more filtering as a safeguard
really isn't all that useful.

> > - We're touching cursors when we decide to flush FBC.
> >
> 
> This is not actually a flush, but a cache clean that allows compression
> to restart.
> I believe we could have some variable to indicate that we did a ring
> flush, and then do the MMIO cache clean on the next frontbuffer flush.

If you filter frontbuffers properly you can ignore all the cursor/sprite
events completely.
-Daniel
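
The filtering described above can be sketched in plain C. This is only a
userspace illustration under stated assumptions, not driver code: the
SKETCH_* macros, struct sketch_fbc and sketch_fbc_flush() are hypothetical
names, the bit layout is made up for the example, and a counter stands in
for the MSG_FBC_REND_STATE register write from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative frontbuffer bit layout; NOT the driver's real macros. */
#define SKETCH_BITS_PER_PIPE 4
#define SKETCH_FB_PRIMARY(pipe) (1u << (SKETCH_BITS_PER_PIPE * (pipe)))
#define SKETCH_FB_CURSOR(pipe)  (1u << (1 + SKETCH_BITS_PER_PIPE * (pipe)))
#define SKETCH_FB_SPRITE(pipe)  (1u << (2 + SKETCH_BITS_PER_PIPE * (pipe)))

/* Hypothetical software view of FBC state. */
struct sketch_fbc {
	bool enabled;              /* tracked in software, so no register read */
	int pipe;                  /* the single pipe FBC compresses */
	unsigned int cache_cleans; /* stand-in for the register write */
};

/* Returns true only when the event is relevant and a clean was issued. */
static bool sketch_fbc_flush(struct sketch_fbc *fbc, uint32_t frontbuffer_bits)
{
	/* FBC off (screen disabled, runtime suspended): nothing to do. */
	if (!fbc->enabled)
		return false;

	/* Cursor and sprite updates do not concern FBC. */
	if (!(frontbuffer_bits & SKETCH_FB_PRIMARY(fbc->pipe)))
		return false;

	/* The real code would write FBC_REND_CACHE_CLEAN here. */
	fbc->cache_cleans++;
	return true;
}
```

With this shape, a cursor-only event is dropped before anything touches
hardware, which is the case that triggered the WARNs in the backtrace.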

Daniel Vetter Sept. 8, 2014, 7:29 a.m. UTC | #13

On Fri, Sep 05, 2014 at 02:12:49PM -0700, Rodrigo Vivi wrote:
> As Paulo said, part 2 of his proposal is still missing, and we will get
> the WARN about reading the FBC register in fbc_enabled while runtime
> suspended...
> 
> But I couldn't find the proper place to check for intel_crtc->active. This
> was another issue reported by the power team: with the screen off there was
> too much activity in our driver. So we need to look more carefully for the
> proper place to avoid all unnecessary activity with the display off.
> 
> Do you have any suggestion Daniel?

Don't check for intel_crtc->active since doing that will lead to hilarious
locking inversions. You have to add a bit of tracking (with its own
locking) for which frontbuffer events are interesting for fbc, like psr
does. Since we only do fbc on one plane ever that's only ever going to be
one bit, but setting/clearing it correctly in crtc_enable/disable is the
important part here.
-Daniel
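
A minimal userspace sketch of that tracking, assuming a pthread mutex as a
stand-in for a dedicated kernel lock. All names here (struct
sketch_fbc_tracking, sketch_crtc_enable(), sketch_crtc_disable(),
sketch_fbc_event_relevant()) are hypothetical and only illustrate the
idea; they are not taken from the driver.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical FBC-private tracking state with its own lock. */
struct sketch_fbc_tracking {
	pthread_mutex_t lock; /* stand-in for a dedicated kernel mutex */
	uint32_t fbc_bits;    /* frontbuffer bits FBC cares about (one bit at most) */
};

/* Called from the crtc enable path: start caring about this plane's bit. */
static void sketch_crtc_enable(struct sketch_fbc_tracking *t, uint32_t primary_bit)
{
	pthread_mutex_lock(&t->lock);
	t->fbc_bits = primary_bit;
	pthread_mutex_unlock(&t->lock);
}

/* Called from the crtc disable path: no event is interesting any more. */
static void sketch_crtc_disable(struct sketch_fbc_tracking *t)
{
	pthread_mutex_lock(&t->lock);
	t->fbc_bits = 0;
	pthread_mutex_unlock(&t->lock);
}

/* Called from the frontbuffer flush path; never looks at crtc state. */
static bool sketch_fbc_event_relevant(struct sketch_fbc_tracking *t,
				      uint32_t frontbuffer_bits)
{
	bool relevant;

	pthread_mutex_lock(&t->lock);
	relevant = (frontbuffer_bits & t->fbc_bits) != 0;
	pthread_mutex_unlock(&t->lock);
	return relevant;
}
```

Because the flush path only takes the tracking lock, it does not need the
modeset locks that protect crtc state, which is how this approach sidesteps
the locking inversion mentioned above.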
