[v4,2/4] drm/i915/psr: Account for sink CRC raciness on some panels

Message ID 1500413648-4765-2-git-send-email-jim.bride@linux.intel.com (mailing list archive)
State New, archived

Commit Message

jim.bride@linux.intel.com July 18, 2017, 9:34 p.m. UTC
According to the eDP spec, when the count field in TEST_SINK_MISC
increments then the six bytes of sink CRC information in the DPCD
should be valid.  Unfortunately, this doesn't seem to be the case
on some panels, and as a result we get some incorrect and inconsistent
values from the sink CRC DPCD locations at times.  This problem exhibits
itself more on faster processors (relative failure rates HSW < SKL < KBL.)
To account for this, we retry the sink CRC read, delaying between attempts,
until we get the same value on two consecutive reads before returning it.
We still see some occasional failures, but with these changes reading the
sink CRC is much more reliable, particularly on SKL and KBL.

v2: * Reduce number of retries when reading the sink CRC (Jani)
    * Refactor to minimize changes to the code (Jani)
    * Rebase
v3: * Rebase
v4: * Switch from do-while to for loop when reading CRC values (Jani)
    * Rebase
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Jim Bride <jim.bride@linux.intel.com>
---
 drivers/gpu/drm/i915/intel_dp.c | 33 ++++++++++++++++++++++++++++++---
 1 file changed, 30 insertions(+), 3 deletions(-)

Comments

Rodrigo Vivi Aug. 3, 2017, 6:07 p.m. UTC | #1
On Tue, Jul 18, 2017 at 2:34 PM, Jim Bride <jim.bride@linux.intel.com> wrote:
> According to the eDP spec, when the count field in TEST_SINK_MISC
> increments then the six bytes of sink CRC information in the DPCD
> should be valid.  Unfortunately, this doesn't seem to be the case
> on some panels, and as a result we get some incorrect and inconsistent
> values from the sink CRC DPCD locations at times.  This problem exhibits
> itself more on faster processors (relative failure rates HSW < SKL < KBL.)
> To account for this, we retry the sink CRC read, delaying between attempts,
> until we get the same value on two consecutive reads before returning it.
> We still see some occasional failures, but with these changes reading the
> sink CRC is much more reliable, particularly on SKL and KBL.

Is DK now ok with this description?
I believe he requested more info here.

>
> v2: * Reduce number of retries when reading the sink CRC (Jani)
>     * Refactor to minimize changes to the code (Jani)
>     * Rebase
> v3: * Rebase
> v4: * Switch from do-while to for loop when reading CRC values (Jani)
>     * Rebase
> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
> Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
> Cc: Jani Nikula <jani.nikula@intel.com>
> Signed-off-by: Jim Bride <jim.bride@linux.intel.com>
> ---
>  drivers/gpu/drm/i915/intel_dp.c | 33 ++++++++++++++++++++++++++++++---
>  1 file changed, 30 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/intel_dp.c b/drivers/gpu/drm/i915/intel_dp.c
> index 2d42d09..c90ca1c 100644
> --- a/drivers/gpu/drm/i915/intel_dp.c
> +++ b/drivers/gpu/drm/i915/intel_dp.c
> @@ -3906,6 +3906,11 @@ int intel_dp_sink_crc(struct intel_dp *intel_dp, u8 *crc)
>         u8 buf;
>         int count, ret;
>         int attempts = 6;
> +       u8 old_crc[6];
> +
> +       if (crc == NULL) {
> +               return -ENOMEM;
> +       }

wouldn't we drop this check per DK and Jani request?
I believe we don't need it, but even if there are cases that we need
we could remove the braces..
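
(For illustration only, a sketch of the braceless form being suggested here,
assuming the check were kept at all; this is not code from the thread:

        if (crc == NULL)
                return -ENOMEM;
)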

>
>         ret = intel_dp_sink_crc_start(intel_dp);
>         if (ret)
> @@ -3929,11 +3934,33 @@ int intel_dp_sink_crc(struct intel_dp *intel_dp, u8 *crc)
>                 goto stop;
>         }
>
> -       if (drm_dp_dpcd_read(&intel_dp->aux, DP_TEST_CRC_R_CR, crc, 6) < 0) {
> -               ret = -EIO;
> -               goto stop;
> +       /*
> +        * Sometimes it takes a while for the "real" CRC values to land in
> +        * the DPCD, so try several times until we get two reads in a row
> +        * that are the same.  If we're an eDP panel, delay between reads
> +        * for a while since the values take a bit longer to propagate.
> +        */
> +       for (attempts = 0; attempts < 6; attempts++) {
> +               intel_wait_for_vblank(dev_priv, intel_crtc->pipe);

DK, we need vblank wait because the crc calculation also may take one vblank.
usually 2 actually... one to make sure you have the full screen
updated and one for the calculation.
In the past when we didn't used the count we were waiting 2 vblanks...
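
For reference, a rough sketch of the older approach being described (an
approximation from the discussion, not the exact pre-count code): wait two
vblanks up front, then read the CRC once.

        /* Sketch: one vblank for a full frame to reach the sink, one more
         * for the sink to finish its CRC calculation, then a single read. */
        intel_wait_for_vblank(dev_priv, intel_crtc->pipe);
        intel_wait_for_vblank(dev_priv, intel_crtc->pipe);
        if (drm_dp_dpcd_read(&intel_dp->aux, DP_TEST_CRC_R_CR, crc, 6) < 0) {
                ret = -EIO;
                goto stop;
        }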

> +
> +               if (drm_dp_dpcd_read(&intel_dp->aux, DP_TEST_CRC_R_CR,
> +                                    crc, 6) < 0) {
> +                       ret = -EIO;
> +                       break;
> +               }
> +
> +               if (attempts && memcmp(old_crc, crc, 6) == 0)
> +                       break;
> +               memcpy(old_crc, crc, 6);

little bikeshed: too many hardcoded "6" around... a sizeof would be better...
but whatever...
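
For illustration, a sketch of how this hunk might look with the size taken
from the array rather than a literal 6 (untested, same logic as the patch):

        if (drm_dp_dpcd_read(&intel_dp->aux, DP_TEST_CRC_R_CR,
                             crc, sizeof(old_crc)) < 0) {
                ret = -EIO;
                break;
        }

        if (attempts && memcmp(old_crc, crc, sizeof(old_crc)) == 0)
                break;
        memcpy(old_crc, crc, sizeof(old_crc));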

> +
> +               if (is_edp(intel_dp))
> +                       usleep_range(20000, 25000);
>         }
>
> +       if (attempts == 6) {
> +               DRM_DEBUG_KMS("Failed to get CRC after 6 attempts.\n");
> +               ret = -ETIMEDOUT;
> +       }
>  stop:
>         intel_dp_sink_crc_stop(intel_dp);
>         return ret;
> --
> 2.7.4
>
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Dhinakaran Pandiyan Aug. 4, 2017, 6:38 p.m. UTC | #2
On Thu, 2017-08-03 at 11:07 -0700, Rodrigo Vivi wrote:
> On Tue, Jul 18, 2017 at 2:34 PM, Jim Bride <jim.bride@linux.intel.com> wrote:
> > According to the eDP spec, when the count field in TEST_SINK_MISC
> > increments then the six bytes of sink CRC information in the DPCD
> > should be valid.  Unfortunately, this doesn't seem to be the case
> > on some panels, and as a result we get some incorrect and inconsistent
> > values from the sink CRC DPCD locations at times.  This problem exhibits
> > itself more on faster processors (relative failure rates HSW < SKL < KBL.)
> > To account for this, we retry the sink CRC read, delaying between attempts,
> > until we get the same value on two consecutive reads before returning it.
> > We still see some occasional failures, but with these changes reading the
> > sink CRC is much more reliable, particularly on SKL and KBL.

I'm curious if we get the correct crc if we waited a full second.

> 
> Is DK now ok with this description?
> I believe he requested more info here.
> 
> >
> > v2: * Reduce number of retries when reading the sink CRC (Jani)
> >     * Refactor to minimize changes to the code (Jani)
> >     * Rebase
> > v3: * Rebase
> > v4: * Switch from do-while to for loop when reading CRC values (Jani)
> >     * Rebase
> > Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
> > Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
> > Cc: Jani Nikula <jani.nikula@intel.com>
> > Signed-off-by: Jim Bride <jim.bride@linux.intel.com>
> > ---
> >  drivers/gpu/drm/i915/intel_dp.c | 33 ++++++++++++++++++++++++++++++---
> >  1 file changed, 30 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/intel_dp.c b/drivers/gpu/drm/i915/intel_dp.c
> > index 2d42d09..c90ca1c 100644
> > --- a/drivers/gpu/drm/i915/intel_dp.c
> > +++ b/drivers/gpu/drm/i915/intel_dp.c
> > @@ -3906,6 +3906,11 @@ int intel_dp_sink_crc(struct intel_dp *intel_dp, u8 *crc)
> >         u8 buf;
> >         int count, ret;
> >         int attempts = 6;
> > +       u8 old_crc[6];
> > +
> > +       if (crc == NULL) {
> > +               return -ENOMEM;
> > +       }
> 
> wouldn't we drop this check per DK and Jani request?
> I believe we don't need it, but even if there are cases that we need
> we could remove the braces..
> 

Yeah, crc is allocated on the stack. If that is null, we'll have bigger
problems to deal with. And I think it's reasonable to assume the caller
is sending a valid array to fill data.

> >
> >         ret = intel_dp_sink_crc_start(intel_dp);
> >         if (ret)
> > @@ -3929,11 +3934,33 @@ int intel_dp_sink_crc(struct intel_dp *intel_dp, u8 *crc)
> >                 goto stop;
> >         }
> >
> > -       if (drm_dp_dpcd_read(&intel_dp->aux, DP_TEST_CRC_R_CR, crc, 6) < 0) {
> > -               ret = -EIO;
> > -               goto stop;
> > +       /*
> > +        * Sometimes it takes a while for the "real" CRC values to land in
> > +        * the DPCD, so try several times until we get two reads in a row
> > +        * that are the same.  If we're an eDP panel, delay between reads
> > +        * for a while since the values take a bit longer to propagate.
> > +        */
> > +       for (attempts = 0; attempts < 6; attempts++) {
> > +               intel_wait_for_vblank(dev_priv, intel_crtc->pipe);
> 
> DK, we need vblank wait because the crc calculation also may take one vblank.
> usually 2 actually... one to make sure you have the full screen
> updated and one for the calculation.
> In the past when we didn't used the count we were waiting 2 vblanks...
> 

Thanks for the clarification. My reasoning was, after the first two
vblank_waits for the sink to calculate crc, the ones in the retry path
were unnecessary. We just need some delay before reading the dpcd again
without having to enable vblank interrupts. Anyway, the number of
retries is low enough that it shouldn't matter.

On the other hand, since the only consumers of dp sink crc are tests,
why can't the kernel just dump what it reads to debugfs and let the test
deal with erroneous results?
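
As a rough illustration of that idea (an assumption about how it could look,
not existing i915 code; the helper name and the intel_dp stashed in m->private
are made up for the sketch), the read side could simply report whatever the
sink returns:

        /* Sketch: dump the raw six CRC bytes from the DPCD and let the test
         * decide whether the value is stable and sane. */
        static int sink_crc_raw_show(struct seq_file *m, void *data)
        {
                struct intel_dp *intel_dp = m->private;
                u8 crc[6];

                if (drm_dp_dpcd_read(&intel_dp->aux, DP_TEST_CRC_R_CR, crc,
                                     sizeof(crc)) < 0)
                        return -EIO;

                seq_printf(m, "%02x%02x%02x%02x%02x%02x\n",
                           crc[0], crc[1], crc[2], crc[3], crc[4], crc[5]);
                return 0;
        }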

> > +
> > +               if (drm_dp_dpcd_read(&intel_dp->aux, DP_TEST_CRC_R_CR,
> > +                                    crc, 6) < 0) {
> > +                       ret = -EIO;
> > +                       break;
> > +               }
> > +
> > +               if (attempts && memcmp(old_crc, crc, 6) == 0)
> > +                       break;
> > +               memcpy(old_crc, crc, 6);
> 
> little bikeshed: too many hardcoded "6" around... a sizeof would be better...
> but whatever...
> 
> > +
> > +               if (is_edp(intel_dp))
> > +                       usleep_range(20000, 25000);
> >         }
> >
> > +       if (attempts == 6) {
> > +               DRM_DEBUG_KMS("Failed to get CRC after 6 attempts.\n");
> > +               ret = -ETIMEDOUT;
> > +       }
> >  stop:
> >         intel_dp_sink_crc_stop(intel_dp);
> >         return ret;
> > --
> > 2.7.4
> >
> > _______________________________________________
> > Intel-gfx mailing list
> > Intel-gfx@lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/intel-gfx
> 
> 
>
jim.bride@linux.intel.com Aug. 7, 2017, 3:58 p.m. UTC | #3
On Fri, Aug 04, 2017 at 06:38:02PM +0000, Pandiyan, Dhinakaran wrote:
> 
> 
> 
> On Thu, 2017-08-03 at 11:07 -0700, Rodrigo Vivi wrote:
> > On Tue, Jul 18, 2017 at 2:34 PM, Jim Bride <jim.bride@linux.intel.com> wrote:
> > > According to the eDP spec, when the count field in TEST_SINK_MISC
> > > increments then the six bytes of sink CRC information in the DPCD
> > > should be valid.  Unfortunately, this doesn't seem to be the case
> > > on some panels, and as a result we get some incorrect and inconsistent
> > > values from the sink CRC DPCD locations at times.  This problem exhibits
> > > itself more on faster processors (relative failure rates HSW < SKL < KBL.)
> > > To account for this, we retry the sink CRC read, delaying between attempts,
> > > until we get the same value on two consecutive reads before returning it.
> > > We still see some occasional failures, but with these changes reading the
> > > sink CRC is much more reliable, particularly on SKL and KBL.
> 
> I'm curious if we get the correct crc if we waited a full second.

On SKL, wait times of less than a second generally work fine.  On KBL,
the sink CRC is *way* less reliable, and I've seen runs where I set the
retry count ridiculously high (> 30) and still did not receive valid
values.


Jim


> > 
> > Is DK now ok with this description?
> > I believe he requested more info here.
> > 
> > >
> > > v2: * Reduce number of retries when reading the sink CRC (Jani)
> > >     * Refactor to minimize changes to the code (Jani)
> > >     * Rebase
> > > v3: * Rebase
> > > v4: * Switch from do-while to for loop when reading CRC values (Jani)
> > >     * Rebase
> > > Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
> > > Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
> > > Cc: Jani Nikula <jani.nikula@intel.com>
> > > Signed-off-by: Jim Bride <jim.bride@linux.intel.com>
> > > ---
> > >  drivers/gpu/drm/i915/intel_dp.c | 33 ++++++++++++++++++++++++++++++---
> > >  1 file changed, 30 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/i915/intel_dp.c b/drivers/gpu/drm/i915/intel_dp.c
> > > index 2d42d09..c90ca1c 100644
> > > --- a/drivers/gpu/drm/i915/intel_dp.c
> > > +++ b/drivers/gpu/drm/i915/intel_dp.c
> > > @@ -3906,6 +3906,11 @@ int intel_dp_sink_crc(struct intel_dp *intel_dp, u8 *crc)
> > >         u8 buf;
> > >         int count, ret;
> > >         int attempts = 6;
> > > +       u8 old_crc[6];
> > > +
> > > +       if (crc == NULL) {
> > > +               return -ENOMEM;
> > > +       }
> > 
> > wouldn't we drop this check per DK and Jani request?
> > I believe we don't need it, but even if there are cases that we need
> > we could remove the braces..
> > 
> 
> Yeah, crc is allocated on the stack. If that is null, we'll have bigger
> problems to deal with. And I think it's reasonable to assume the caller
> is sending a valid array to fill data.
> 
> > >
> > >         ret = intel_dp_sink_crc_start(intel_dp);
> > >         if (ret)
> > > @@ -3929,11 +3934,33 @@ int intel_dp_sink_crc(struct intel_dp *intel_dp, u8 *crc)
> > >                 goto stop;
> > >         }
> > >
> > > -       if (drm_dp_dpcd_read(&intel_dp->aux, DP_TEST_CRC_R_CR, crc, 6) < 0) {
> > > -               ret = -EIO;
> > > -               goto stop;
> > > +       /*
> > > +        * Sometimes it takes a while for the "real" CRC values to land in
> > > +        * the DPCD, so try several times until we get two reads in a row
> > > +        * that are the same.  If we're an eDP panel, delay between reads
> > > +        * for a while since the values take a bit longer to propagate.
> > > +        */
> > > +       for (attempts = 0; attempts < 6; attempts++) {
> > > +               intel_wait_for_vblank(dev_priv, intel_crtc->pipe);
> > 
> > DK, we need vblank wait because the crc calculation also may take one vblank.
> > usually 2 actually... one to make sure you have the full screen
> > updated and one for the calculation.
> > In the past when we didn't used the count we were waiting 2 vblanks...
> > 
> 
> 
> Thanks for the clarification. My reasoning was, after the first two
> vblank_waits for the sink to calculate crc, the ones in the retry path
> were unnecessary. We just need some delay before reading the dpcd again
> without having to enable vblank interrupts. Anyway, the number of
> retries is low enough that it shouldn't matter.
> 
> On the other hand, since the only consumers of dp sink crc are tests,
> why can't the kernel just dump what it reads to debugfs and let the test
> deal with erroneous results?
> 
> > > +
> > > +               if (drm_dp_dpcd_read(&intel_dp->aux, DP_TEST_CRC_R_CR,
> > > +                                    crc, 6) < 0) {
> > > +                       ret = -EIO;
> > > +                       break;
> > > +               }
> > > +
> > > +               if (attempts && memcmp(old_crc, crc, 6) == 0)
> > > +                       break;
> > > +               memcpy(old_crc, crc, 6);
> > 
> > little bikeshed: too many hardcoded "6" around... a sizeof would be better...
> > but whatever...
> > 
> > > +
> > > +               if (is_edp(intel_dp))
> > > +                       usleep_range(20000, 25000);
> > >         }
> > >
> > > +       if (attempts == 6) {
> > > +               DRM_DEBUG_KMS("Failed to get CRC after 6 attempts.\n");
> > > +               ret = -ETIMEDOUT;
> > > +       }
> > >  stop:
> > >         intel_dp_sink_crc_stop(intel_dp);
> > >         return ret;
> > > --
> > > 2.7.4
> > >
> > > _______________________________________________
> > > Intel-gfx mailing list
> > > Intel-gfx@lists.freedesktop.org
> > > https://lists.freedesktop.org/mailman/listinfo/intel-gfx
> > 
> > 
> >

Patch

diff --git a/drivers/gpu/drm/i915/intel_dp.c b/drivers/gpu/drm/i915/intel_dp.c
index 2d42d09..c90ca1c 100644
--- a/drivers/gpu/drm/i915/intel_dp.c
+++ b/drivers/gpu/drm/i915/intel_dp.c
@@ -3906,6 +3906,11 @@  int intel_dp_sink_crc(struct intel_dp *intel_dp, u8 *crc)
 	u8 buf;
 	int count, ret;
 	int attempts = 6;
+	u8 old_crc[6];
+
+	if (crc == NULL) {
+		return -ENOMEM;
+	}
 
 	ret = intel_dp_sink_crc_start(intel_dp);
 	if (ret)
@@ -3929,11 +3934,33 @@  int intel_dp_sink_crc(struct intel_dp *intel_dp, u8 *crc)
 		goto stop;
 	}
 
-	if (drm_dp_dpcd_read(&intel_dp->aux, DP_TEST_CRC_R_CR, crc, 6) < 0) {
-		ret = -EIO;
-		goto stop;
+	/*
+	 * Sometimes it takes a while for the "real" CRC values to land in
+	 * the DPCD, so try several times until we get two reads in a row
+	 * that are the same.  If we're an eDP panel, delay between reads
+	 * for a while since the values take a bit longer to propagate.
+	 */
+	for (attempts = 0; attempts < 6; attempts++) {
+		intel_wait_for_vblank(dev_priv, intel_crtc->pipe);
+
+		if (drm_dp_dpcd_read(&intel_dp->aux, DP_TEST_CRC_R_CR,
+				     crc, 6) < 0) {
+			ret = -EIO;
+			break;
+		}
+
+		if (attempts && memcmp(old_crc, crc, 6) == 0)
+			break;
+		memcpy(old_crc, crc, 6);
+
+		if (is_edp(intel_dp))
+			usleep_range(20000, 25000);
 	}
 
+	if (attempts == 6) {
+		DRM_DEBUG_KMS("Failed to get CRC after 6 attempts.\n");
+		ret = -ETIMEDOUT;
+	}
 stop:
 	intel_dp_sink_crc_stop(intel_dp);
 	return ret;