diff mbox series

[v2] drm/i915/display: Reset message bus after each read/write operation

Message ID 20231013065532.634872-1-mika.kahola@intel.com (mailing list archive)
State New, archived
Headers show
Series [v2] drm/i915/display: Reset message bus after each read/write operation | expand

Commit Message

Kahola, Mika Oct. 13, 2023, 6:55 a.m. UTC
Every know and then we receive the following error when running
for example IGT test kms_flip.

[drm] *ERROR* PHY G Read 0d80 failed after 3 retries.
[drm] *ERROR* PHY G Write 0d81 failed after 3 retries.

Since the error is sporadic in nature, the patch proposes
to reset the message bus after every successful or unsuccessful
read or write operation. However, the testing revealed that this
alone is not sufficient method and therefore an additional
delay is introduced anything from 200us to 300us to let HW to
settle down. This delay is experimental value and has no
specification to back it up.

v2: Add FIXME's to indicate the experimental nature of
    this workaround (Rodrigo)

Signed-off-by: Mika Kahola <mika.kahola@intel.com>
---
 drivers/gpu/drm/i915/display/intel_cx0_phy.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

Comments

Rodrigo Vivi Oct. 13, 2023, 8:21 p.m. UTC | #1
On Fri, Oct 13, 2023 at 09:55:32AM +0300, Mika Kahola wrote:
> Every know and then we receive the following error when running
> for example IGT test kms_flip.
> 
> [drm] *ERROR* PHY G Read 0d80 failed after 3 retries.
> [drm] *ERROR* PHY G Write 0d81 failed after 3 retries.
> 
> Since the error is sporadic in nature, the patch proposes
> to reset the message bus after every successful or unsuccessful
> read or write operation. However, the testing revealed that this
> alone is not sufficient method and therefore an additional
> delay is introduced anything from 200us to 300us to let HW to
> settle down. This delay is experimental value and has no
> specification to back it up.
> 
> v2: Add FIXME's to indicate the experimental nature of
>     this workaround (Rodrigo)
> 
> Signed-off-by: Mika Kahola <mika.kahola@intel.com>
> ---
>  drivers/gpu/drm/i915/display/intel_cx0_phy.c | 16 ++++++++++++++++
>  1 file changed, 16 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/display/intel_cx0_phy.c b/drivers/gpu/drm/i915/display/intel_cx0_phy.c
> index 6e6a1818071e..7c48ec5e54bd 100644
> --- a/drivers/gpu/drm/i915/display/intel_cx0_phy.c
> +++ b/drivers/gpu/drm/i915/display/intel_cx0_phy.c
> @@ -221,6 +221,14 @@ static u8 __intel_cx0_read(struct drm_i915_private *i915, enum port port,
>  	for (i = 0; i < 3; i++) {
>  		status = __intel_cx0_read_once(i915, port, lane, addr);
>  
> +		/*
> +		 * FIXME: Workaround to let HW to settle
> +		 * down and let the message bus to end up
> +		 * in a known state
> +		 */
> +		intel_cx0_bus_reset(i915, port, lane);
> +		usleep_range(200, 300);
> +
>  		if (status >= 0)
>  			return status;
>  	}
> @@ -300,6 +308,14 @@ static void __intel_cx0_write(struct drm_i915_private *i915, enum port port,
>  	for (i = 0; i < 3; i++) {
>  		status = __intel_cx0_write_once(i915, port, lane, addr, data, committed);
>  
> +		/*
> +		 * FIXME: Workaround to let HW to settle
> +		 * down and let the message bus to end up
> +		 * in a known state
> +		 */
> +		intel_cx0_bus_reset(i915, port, lane);
> +		usleep_range(200, 300);

what cases trigger these paths?
and how many calls in the modset case?
what about suspend/resume cylces?

if we do a single rmw we are introducing at least 400us of delay here.
have we measured the overall final impact of these extra sleeps on the resume and modeset latencies?

> +
>  		if (status == 0)
>  			return;
>  	}
> -- 
> 2.34.1
>
Kahola, Mika Oct. 16, 2023, 11:36 a.m. UTC | #2
> -----Original Message-----
> From: Vivi, Rodrigo <rodrigo.vivi@intel.com>
> Sent: Friday, October 13, 2023 11:22 PM
> To: Kahola, Mika <mika.kahola@intel.com>
> Cc: intel-gfx@lists.freedesktop.org; Sousa, Gustavo <gustavo.sousa@intel.com>
> Subject: Re: [PATCH v2] drm/i915/display: Reset message bus after each read/write operation
> 
> On Fri, Oct 13, 2023 at 09:55:32AM +0300, Mika Kahola wrote:
> > Every know and then we receive the following error when running for
> > example IGT test kms_flip.
> >
> > [drm] *ERROR* PHY G Read 0d80 failed after 3 retries.
> > [drm] *ERROR* PHY G Write 0d81 failed after 3 retries.
> >
> > Since the error is sporadic in nature, the patch proposes to reset the
> > message bus after every successful or unsuccessful read or write
> > operation. However, the testing revealed that this alone is not
> > sufficient method and therefore an additional delay is introduced
> > anything from 200us to 300us to let HW to settle down. This delay is
> > experimental value and has no specification to back it up.
> >
> > v2: Add FIXME's to indicate the experimental nature of
> >     this workaround (Rodrigo)
> >
> > Signed-off-by: Mika Kahola <mika.kahola@intel.com>
> > ---
> >  drivers/gpu/drm/i915/display/intel_cx0_phy.c | 16 ++++++++++++++++
> >  1 file changed, 16 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/i915/display/intel_cx0_phy.c
> > b/drivers/gpu/drm/i915/display/intel_cx0_phy.c
> > index 6e6a1818071e..7c48ec5e54bd 100644
> > --- a/drivers/gpu/drm/i915/display/intel_cx0_phy.c
> > +++ b/drivers/gpu/drm/i915/display/intel_cx0_phy.c
> > @@ -221,6 +221,14 @@ static u8 __intel_cx0_read(struct drm_i915_private *i915, enum port port,
> >  	for (i = 0; i < 3; i++) {
> >  		status = __intel_cx0_read_once(i915, port, lane, addr);
> >
> > +		/*
> > +		 * FIXME: Workaround to let HW to settle
> > +		 * down and let the message bus to end up
> > +		 * in a known state
> > +		 */
> > +		intel_cx0_bus_reset(i915, port, lane);
> > +		usleep_range(200, 300);
> > +
> >  		if (status >= 0)
> >  			return status;
> >  	}
> > @@ -300,6 +308,14 @@ static void __intel_cx0_write(struct drm_i915_private *i915, enum port port,
> >  	for (i = 0; i < 3; i++) {
> >  		status = __intel_cx0_write_once(i915, port, lane, addr, data,
> > committed);
> >
> > +		/*
> > +		 * FIXME: Workaround to let HW to settle
> > +		 * down and let the message bus to end up
> > +		 * in a known state
> > +		 */
> > +		intel_cx0_bus_reset(i915, port, lane);
> > +		usleep_range(200, 300);
> 
> what cases trigger these paths?
I have noticed this when executing IGT test kms_flip with 4k monitors and eDP connected. Specially with 2x- cases.

> and how many calls in the modset case?
I haven't put any counters for this but quite a few anyways. This surely introduces additional delay.

> what about suspend/resume cylces?
> 
> if we do a single rmw we are introducing at least 400us of delay here.
> have we measured the overall final impact of these extra sleeps on the resume and modeset latencies?
We haven't measured overall impact. I did some further testing and 200-300us delay is an overkill solution. I tested with 1-10us delay and with my test vehicle, I didn't see any issue to use that. 

In fact, I moved the bus reset routine to be part of *_read_once() and *_write_once() functions and to me despite it looks more cleaner solution I can get rid of the delay. It must be noted that my test vehicle has changed to different MTL. Anyway, I will float the patch with revised function placement and delay drop. We can continue discussion from there.

Thanks!

-Mika-  

> 
> > +
> >  		if (status == 0)
> >  			return;
> >  	}
> > --
> > 2.34.1
> >
diff mbox series

Patch

diff --git a/drivers/gpu/drm/i915/display/intel_cx0_phy.c b/drivers/gpu/drm/i915/display/intel_cx0_phy.c
index 6e6a1818071e..7c48ec5e54bd 100644
--- a/drivers/gpu/drm/i915/display/intel_cx0_phy.c
+++ b/drivers/gpu/drm/i915/display/intel_cx0_phy.c
@@ -221,6 +221,14 @@  static u8 __intel_cx0_read(struct drm_i915_private *i915, enum port port,
 	for (i = 0; i < 3; i++) {
 		status = __intel_cx0_read_once(i915, port, lane, addr);
 
+		/*
+		 * FIXME: Workaround to let HW to settle
+		 * down and let the message bus to end up
+		 * in a known state
+		 */
+		intel_cx0_bus_reset(i915, port, lane);
+		usleep_range(200, 300);
+
 		if (status >= 0)
 			return status;
 	}
@@ -300,6 +308,14 @@  static void __intel_cx0_write(struct drm_i915_private *i915, enum port port,
 	for (i = 0; i < 3; i++) {
 		status = __intel_cx0_write_once(i915, port, lane, addr, data, committed);
 
+		/*
+		 * FIXME: Workaround to let HW to settle
+		 * down and let the message bus to end up
+		 * in a known state
+		 */
+		intel_cx0_bus_reset(i915, port, lane);
+		usleep_range(200, 300);
+
 		if (status == 0)
 			return;
 	}