Message ID | 1465607883-8533-1-git-send-email-paulo.r.zanoni@intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Jun 10 Paulo Zanoni wrote: > Since my test machines don't produce FIFO underrun errors, I tested this by > creating a debugfs file that just calls intel_fbc_handle_fifo_underrun(). I'd > appreciate some Tested-by tags, if possible. Since May 8 I have been using 4.6.0-rc6 patched with drm-intel-nightly_2016y-05m-06d-14h-29m-58s (from what I understand, something close to what is now in mainline), and fbc disabled on the kernel commandline. I have not rebooted since May 8. For that kernel, i915.enable_fbc=0 is both required and sufficient to prevent freezes at random points in time (as reported). The drm-intel-nightly_2016y-05m-06d-14h-29m-58s patch on the other hand has the effect that there are never any FIFO underruns logged anymore. If there are no FIFO underruns reported in dmesg, your underrun detection routine would never hit, would it? If you nevertheless think it's worth trying, should I test it on top of 4.7-rc3 or on current drm-intel-nightly?
Em Seg, 2016-06-13 às 13:47 +0200, Stefan Richter escreveu: > On Jun 10 Paulo Zanoni wrote: > > > > Since my test machines don't produce FIFO underrun errors, I tested > > this by > > creating a debugfs file that just calls > > intel_fbc_handle_fifo_underrun(). I'd > > appreciate some Tested-by tags, if possible. > Since May 8 I have been using 4.6.0-rc6 patched with > drm-intel-nightly_2016y-05m-06d-14h-29m-58s (from what I understand, > something close to what is now in mainline), and fbc disabled on the > kernel commandline. I have not rebooted since May 8. Ok, so you're saying that if you boot this Kernel with i915.fbc=1, you won't get any FIFO underrun messages, but the system is still going to freeze somehow? If that's the case, then this patch won't help. > > For that kernel, i915.enable_fbc=0 is both required and sufficient to > prevent freezes at random points in time (as reported). The > drm-intel-nightly_2016y-05m-06d-14h-29m-58s patch on the other hand > has the > effect that there are never any FIFO underruns logged anymore. > > If there are no FIFO underruns reported in dmesg, your underrun > detection routine would never hit, would it? You're right, there's no need to test on this case. > > If you nevertheless think it's worth trying, should I test it on top > of > 4.7-rc3 or on current drm-intel-nightly?
On Jun 13 Zanoni, Paulo R wrote: > Em Seg, 2016-06-13 às 13:47 +0200, Stefan Richter escreveu: > > On Jun 10 Paulo Zanoni wrote: > > > > > > Since my test machines don't produce FIFO underrun errors, I tested > > > this by > > > creating a debugfs file that just calls > > > intel_fbc_handle_fifo_underrun(). I'd > > > appreciate some Tested-by tags, if possible. > > Since May 8 I have been using 4.6.0-rc6 patched with > > drm-intel-nightly_2016y-05m-06d-14h-29m-58s (from what I understand, > > something close to what is now in mainline), and fbc disabled on the > > kernel commandline. I have not rebooted since May 8. > > Ok, so you're saying that if you boot this Kernel with i915.fbc=1, you > won't get any FIFO underrun messages, but the system is still going to > freeze somehow? Yes. > If that's the case, then this patch won't help. Nevertheless I will try to test current mainline + your patch this week or next week, just to be sure and to have another point of reference.
On Fri, 2016-06-10 at 22:18 -0300, Paulo Zanoni wrote: > Ever since I started working on FBC I was already aware that FBC can > really amplify the FIFO underrun symptoms. On systems where FIFO > underruns were harmless error messages, enabling FBC would cause the > underruns to give black screens. > Do we know why we get black screens in this scenario? > We recently tried to enable FBC on Haswell and got reports of a system > that would hang after some hours of uptime, and the first bad commit > was the one that enabled FBC. We also observed that this system had > FIFO underrun error messages on its dmesg. Although we don't have any > evidence that fixing the underruns would solve the bug and make FBC > work properly on this machine, IMHO it's better if we minimize the > amount of possible problems by just giving up FBC whenever we detect > an underrun. > > v2: new version, different implementation and commit message. > > Cc: Stefan Richter <stefanr@s5r6.in-berlin.de> > Cc: Lyude <cpaul@redhat.com> > Cc: Steven Honeyman <stevenhoneyman@gmail.com> > Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com> > --- > drivers/gpu/drm/i915/i915_drv.h | 3 ++ > drivers/gpu/drm/i915/intel_drv.h | 1 + > drivers/gpu/drm/i915/intel_fbc.c | 53 ++++++++++++++++++++++++++++++ > drivers/gpu/drm/i915/intel_fifo_underrun.c | 2 ++ > 4 files changed, 59 insertions(+) > > > Since my test machines don't produce FIFO underrun errors, I tested this by > creating a debugfs file that just calls intel_fbc_handle_fifo_underrun(). I'd > appreciate some Tested-by tags, if possible. > > diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h > index 20a676d..18b4257 100644 > --- a/drivers/gpu/drm/i915/i915_drv.h > +++ b/drivers/gpu/drm/i915/i915_drv.h > @@ -908,6 +908,9 @@ struct intel_fbc { > bool enabled; > bool active; > > + bool underrun_detected; > + struct work_struct underrun_work; > + > struct intel_fbc_state_cache { > struct { > unsigned int mode_flags; > diff --git a/drivers/gpu/drm/i915/intel_drv.h b/drivers/gpu/drm/i915/intel_drv.h > index ebe7b34..7bf97b1 100644 > --- a/drivers/gpu/drm/i915/intel_drv.h > +++ b/drivers/gpu/drm/i915/intel_drv.h > @@ -1436,6 +1436,7 @@ void intel_fbc_invalidate(struct drm_i915_private *dev_priv, > void intel_fbc_flush(struct drm_i915_private *dev_priv, > unsigned int frontbuffer_bits, enum fb_op_origin origin); > void intel_fbc_cleanup_cfb(struct drm_i915_private *dev_priv); > +void intel_fbc_handle_fifo_underrun(struct drm_i915_private *dev_priv); > > /* intel_hdmi.c */ > void intel_hdmi_init(struct drm_device *dev, i915_reg_t hdmi_reg, enum port port); > diff --git a/drivers/gpu/drm/i915/intel_fbc.c b/drivers/gpu/drm/i915/intel_fbc.c > index d268f76..2363bff 100644 > --- a/drivers/gpu/drm/i915/intel_fbc.c > +++ b/drivers/gpu/drm/i915/intel_fbc.c > @@ -755,6 +755,13 @@ static bool intel_fbc_can_activate(struct intel_crtc *crtc) > struct intel_fbc *fbc = &dev_priv->fbc; > struct intel_fbc_state_cache *cache = &fbc->state_cache; > > + /* We don't need to use a state cache here since this information is > + * global for every CRTC. */ > + if (fbc->underrun_detected) { > + fbc->no_fbc_reason = "underrun detected"; > + return false; > + } > + > if (!cache->plane.visible) { > fbc->no_fbc_reason = "primary plane not visible"; > return false; > @@ -1195,6 +1202,51 @@ void intel_fbc_global_disable(struct drm_i915_private *dev_priv) > cancel_work_sync(&fbc->work.work); > } > > +static void intel_fbc_underrun_work_fn(struct work_struct *work) > +{ > + struct drm_i915_private *dev_priv = > + container_of(work, struct drm_i915_private, fbc.underrun_work); > + struct intel_fbc *fbc = &dev_priv->fbc; > + > + mutex_lock(&fbc->lock); > + > + /* Maybe we were scheduled twice. */ > + if (fbc->underrun_detected) > + goto out; > + > + DRM_DEBUG_KMS("Disabling FBC due to FIFO underrun.\n"); > + fbc->underrun_detected = true; > + > + intel_fbc_deactivate(dev_priv); > +out: > + mutex_unlock(&fbc->lock); > +} > + > +/** > + * intel_fbc_handle_fifo_underrun - disable FBC when we get a FIFO underrun > + * @dev_priv: i915 device instance > + * > + * Without FBC, most underruns are harmless and don't really cause too many > + * problems, except for an annoying message on dmesg. With FBC, underruns can > + * become black screens or even worse, especially when paired with bad > + * watermarks. So in order for us to be on the safe side, completely disable FBC > + * in case we ever detect a FIFO underrun on any pipe. An underrun on any pipe > + * already suggests that watermarks may be bad, so try to be as safe as > + * possible. > + */ > +void intel_fbc_handle_fifo_underrun(struct drm_i915_private *dev_priv) > +{ > + struct intel_fbc *fbc = &dev_priv->fbc; > + > + if (!fbc_supported(dev_priv)) > + return; > + Should we bail out if fbc is not enabled? Also, can we just disable fbc if we see an underrun instead of using a new flag to prevent activation? > + if (fbc->underrun_detected) > + return; > + > + schedule_work(&fbc->underrun_work); > +} > + > /** > * intel_fbc_init_pipe_state - initialize FBC's CRTC visibility tracking > * @dev_priv: i915 device instance > @@ -1250,6 +1302,7 @@ void intel_fbc_init(struct drm_i915_private *dev_priv) > enum pipe pipe; > > INIT_WORK(&fbc->work.work, intel_fbc_work_fn); > + INIT_WORK(&fbc->underrun_work, intel_fbc_underrun_work_fn); > mutex_init(&fbc->lock); > fbc->enabled = false; > fbc->active = false; > diff --git a/drivers/gpu/drm/i915/intel_fifo_underrun.c b/drivers/gpu/drm/i915/intel_fifo_underrun.c > index 9be839a..9d43cbd 100644 > --- a/drivers/gpu/drm/i915/intel_fifo_underrun.c > +++ b/drivers/gpu/drm/i915/intel_fifo_underrun.c > @@ -370,6 +370,8 @@ void intel_cpu_fifo_underrun_irq_handler(struct drm_i915_private *dev_priv, > if (intel_set_cpu_fifo_underrun_reporting(dev_priv, pipe, false)) > DRM_ERROR("CPU pipe %c FIFO underrun\n", > pipe_name(pipe)); > + > + intel_fbc_handle_fifo_underrun(dev_priv); > } >
On Fri, Aug 12, 2016 at 11:24:59PM +0000, Pandiyan, Dhinakaran wrote: > > +/** > > + * intel_fbc_handle_fifo_underrun - disable FBC when we get a FIFO underrun > > + * @dev_priv: i915 device instance > > + * > > + * Without FBC, most underruns are harmless and don't really cause too many > > + * problems, except for an annoying message on dmesg. With FBC, underruns can > > + * become black screens or even worse, especially when paired with bad > > + * watermarks. So in order for us to be on the safe side, completely disable FBC > > + * in case we ever detect a FIFO underrun on any pipe. An underrun on any pipe > > + * already suggests that watermarks may be bad, so try to be as safe as > > + * possible. > > + */ > > +void intel_fbc_handle_fifo_underrun(struct drm_i915_private *dev_priv) > > +{ > > + struct intel_fbc *fbc = &dev_priv->fbc; > > + > > + if (!fbc_supported(dev_priv)) > > + return; > > + > > Should we bail out if fbc is not enabled? > Also, can we just disable fbc if we see an underrun instead of using a > new flag to prevent activation? The information that is crucially absent in the function name, and its kerneldoc, is that this function is run from hardirq context. There isn't much you are allowed to do here but schedule work. -Chris
Em Sáb, 2016-08-13 às 10:05 +0100, Chris Wilson escreveu: > On Fri, Aug 12, 2016 at 11:24:59PM +0000, Pandiyan, Dhinakaran wrote: > Do we know why we get black screens in this scenario? I don't know exactly, but I can speculate that it's probably because the display engine fails to decompress the framebuffer and gets completely lost on when the data ends, so it gives up and shows black. > > > > > > > > +/** > > > + * intel_fbc_handle_fifo_underrun - disable FBC when we get a > > > FIFO underrun > > > + * @dev_priv: i915 device instance > > > + * > > > + * Without FBC, most underruns are harmless and don't really > > > cause too many > > > + * problems, except for an annoying message on dmesg. With FBC, > > > underruns can > > > + * become black screens or even worse, especially when paired > > > with bad > > > + * watermarks. So in order for us to be on the safe side, > > > completely disable FBC > > > + * in case we ever detect a FIFO underrun on any pipe. An > > > underrun on any pipe > > > + * already suggests that watermarks may be bad, so try to be as > > > safe as > > > + * possible. > > > + */ > > > +void intel_fbc_handle_fifo_underrun(struct drm_i915_private > > > *dev_priv) > > > +{ > > > + struct intel_fbc *fbc = &dev_priv->fbc; > > > + > > > + if (!fbc_supported(dev_priv)) > > > + return; > > > + > > > > Should we bail out if fbc is not enabled? No. Even if FBC is disabled or deactivated we need to make sure the code doesn't decide to enable/activate FBC later, and simply disabling it now won't prevent it from being reactivated on the next modesets. > > Also, can we just disable fbc if we see an underrun instead of > > using a > > new flag to prevent activation? No, for the same reason as above: we don't want the code to decide to enable/activate FBC later. > > The information that is crucially absent in the function name, and > its > kerneldoc, is that this function is run from hardirq context. There > isn't > much you are allowed to do here but schedule work. You're right. I'll rename the function and improve the doc. > -Chris >
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 20a676d..18b4257 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -908,6 +908,9 @@ struct intel_fbc { bool enabled; bool active; + bool underrun_detected; + struct work_struct underrun_work; + struct intel_fbc_state_cache { struct { unsigned int mode_flags; diff --git a/drivers/gpu/drm/i915/intel_drv.h b/drivers/gpu/drm/i915/intel_drv.h index ebe7b34..7bf97b1 100644 --- a/drivers/gpu/drm/i915/intel_drv.h +++ b/drivers/gpu/drm/i915/intel_drv.h @@ -1436,6 +1436,7 @@ void intel_fbc_invalidate(struct drm_i915_private *dev_priv, void intel_fbc_flush(struct drm_i915_private *dev_priv, unsigned int frontbuffer_bits, enum fb_op_origin origin); void intel_fbc_cleanup_cfb(struct drm_i915_private *dev_priv); +void intel_fbc_handle_fifo_underrun(struct drm_i915_private *dev_priv); /* intel_hdmi.c */ void intel_hdmi_init(struct drm_device *dev, i915_reg_t hdmi_reg, enum port port); diff --git a/drivers/gpu/drm/i915/intel_fbc.c b/drivers/gpu/drm/i915/intel_fbc.c index d268f76..2363bff 100644 --- a/drivers/gpu/drm/i915/intel_fbc.c +++ b/drivers/gpu/drm/i915/intel_fbc.c @@ -755,6 +755,13 @@ static bool intel_fbc_can_activate(struct intel_crtc *crtc) struct intel_fbc *fbc = &dev_priv->fbc; struct intel_fbc_state_cache *cache = &fbc->state_cache; + /* We don't need to use a state cache here since this information is + * global for every CRTC. */ + if (fbc->underrun_detected) { + fbc->no_fbc_reason = "underrun detected"; + return false; + } + if (!cache->plane.visible) { fbc->no_fbc_reason = "primary plane not visible"; return false; @@ -1195,6 +1202,51 @@ void intel_fbc_global_disable(struct drm_i915_private *dev_priv) cancel_work_sync(&fbc->work.work); } +static void intel_fbc_underrun_work_fn(struct work_struct *work) +{ + struct drm_i915_private *dev_priv = + container_of(work, struct drm_i915_private, fbc.underrun_work); + struct intel_fbc *fbc = &dev_priv->fbc; + + mutex_lock(&fbc->lock); + + /* Maybe we were scheduled twice. */ + if (fbc->underrun_detected) + goto out; + + DRM_DEBUG_KMS("Disabling FBC due to FIFO underrun.\n"); + fbc->underrun_detected = true; + + intel_fbc_deactivate(dev_priv); +out: + mutex_unlock(&fbc->lock); +} + +/** + * intel_fbc_handle_fifo_underrun - disable FBC when we get a FIFO underrun + * @dev_priv: i915 device instance + * + * Without FBC, most underruns are harmless and don't really cause too many + * problems, except for an annoying message on dmesg. With FBC, underruns can + * become black screens or even worse, especially when paired with bad + * watermarks. So in order for us to be on the safe side, completely disable FBC + * in case we ever detect a FIFO underrun on any pipe. An underrun on any pipe + * already suggests that watermarks may be bad, so try to be as safe as + * possible. + */ +void intel_fbc_handle_fifo_underrun(struct drm_i915_private *dev_priv) +{ + struct intel_fbc *fbc = &dev_priv->fbc; + + if (!fbc_supported(dev_priv)) + return; + + if (fbc->underrun_detected) + return; + + schedule_work(&fbc->underrun_work); +} + /** * intel_fbc_init_pipe_state - initialize FBC's CRTC visibility tracking * @dev_priv: i915 device instance @@ -1250,6 +1302,7 @@ void intel_fbc_init(struct drm_i915_private *dev_priv) enum pipe pipe; INIT_WORK(&fbc->work.work, intel_fbc_work_fn); + INIT_WORK(&fbc->underrun_work, intel_fbc_underrun_work_fn); mutex_init(&fbc->lock); fbc->enabled = false; fbc->active = false; diff --git a/drivers/gpu/drm/i915/intel_fifo_underrun.c b/drivers/gpu/drm/i915/intel_fifo_underrun.c index 9be839a..9d43cbd 100644 --- a/drivers/gpu/drm/i915/intel_fifo_underrun.c +++ b/drivers/gpu/drm/i915/intel_fifo_underrun.c @@ -370,6 +370,8 @@ void intel_cpu_fifo_underrun_irq_handler(struct drm_i915_private *dev_priv, if (intel_set_cpu_fifo_underrun_reporting(dev_priv, pipe, false)) DRM_ERROR("CPU pipe %c FIFO underrun\n", pipe_name(pipe)); + + intel_fbc_handle_fifo_underrun(dev_priv); } /**
Ever since I started working on FBC I was already aware that FBC can really amplify the FIFO underrun symptoms. On systems where FIFO underruns were harmless error messages, enabling FBC would cause the underruns to give black screens. We recently tried to enable FBC on Haswell and got reports of a system that would hang after some hours of uptime, and the first bad commit was the one that enabled FBC. We also observed that this system had FIFO underrun error messages on its dmesg. Although we don't have any evidence that fixing the underruns would solve the bug and make FBC work properly on this machine, IMHO it's better if we minimize the amount of possible problems by just giving up FBC whenever we detect an underrun. v2: new version, different implementation and commit message. Cc: Stefan Richter <stefanr@s5r6.in-berlin.de> Cc: Lyude <cpaul@redhat.com> Cc: Steven Honeyman <stevenhoneyman@gmail.com> Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com> --- drivers/gpu/drm/i915/i915_drv.h | 3 ++ drivers/gpu/drm/i915/intel_drv.h | 1 + drivers/gpu/drm/i915/intel_fbc.c | 53 ++++++++++++++++++++++++++++++ drivers/gpu/drm/i915/intel_fifo_underrun.c | 2 ++ 4 files changed, 59 insertions(+) Since my test machines don't produce FIFO underrun errors, I tested this by creating a debugfs file that just calls intel_fbc_handle_fifo_underrun(). I'd appreciate some Tested-by tags, if possible.