diff mbox series

drm/i915/mtl: Clear possible sticky bits on PICA message bus

Message ID 20231101103101.156505-1-mika.kahola@intel.com (mailing list archive)
State New, archived
Headers show
Series drm/i915/mtl: Clear possible sticky bits on PICA message bus | expand

Commit Message

Kahola, Mika Nov. 1, 2023, 10:31 a.m. UTC
It is possible that sticky bits or error bits are left on
message bus status register. Reading and then writing the
value back to messagebus status register clears all possible
sticky bits and errors.

Signed-off-by: Mika Kahola <mika.kahola@intel.com>
---
 drivers/gpu/drm/i915/display/intel_cx0_phy.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

Comments

Jani Nikula Nov. 1, 2023, 10:51 a.m. UTC | #1
On Wed, 01 Nov 2023, Mika Kahola <mika.kahola@intel.com> wrote:
> It is possible that sticky bits or error bits are left on
> message bus status register. Reading and then writing the
> value back to messagebus status register clears all possible
> sticky bits and errors.

Note that I don't know if this is the right thing to do, or the right
place to do this, but I'll just comment on the *implementation*,
i.e. please wait for proper review before addressing my comments.

>
> Signed-off-by: Mika Kahola <mika.kahola@intel.com>
> ---
>  drivers/gpu/drm/i915/display/intel_cx0_phy.c | 14 ++++++++++++++
>  1 file changed, 14 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/display/intel_cx0_phy.c b/drivers/gpu/drm/i915/display/intel_cx0_phy.c
> index b2ad4c6172f6..f439f0c7b400 100644
> --- a/drivers/gpu/drm/i915/display/intel_cx0_phy.c
> +++ b/drivers/gpu/drm/i915/display/intel_cx0_phy.c
> @@ -195,6 +195,13 @@ static int __intel_cx0_read_once(struct drm_i915_private *i915, enum port port,
>  		return -ETIMEDOUT;
>  	}
>  
> +	/*
> +	 * write XELPDP_PORT_P2M_MSGBUS_STATUS register after read to clear
> +	 * any error sticky bits set from previous transactions
> +	 */
> +	val = intel_de_read(i915, XELPDP_PORT_P2M_MSGBUS_STATUS(port, lane));
> +	intel_de_write(i915, XELPDP_PORT_P2M_MSGBUS_STATUS(port, lane), val);

I think it's slightly confusing to use val here, as it's then passed on
to intel_cx0_wait_for_ack() and you're left wondering if that's required
or just reuse of the val variable.

This should do the same thing in one line, without reusing val:

	intel_de_rmw(i915, XELPDP_PORT_P2M_MSGBUS_STATUS(port, lane), 0, 0);

> +
>  	intel_de_write(i915, XELPDP_PORT_M2P_MSGBUS_CTL(port, lane),
>  		       XELPDP_PORT_M2P_TRANSACTION_PENDING |
>  		       XELPDP_PORT_M2P_COMMAND_READ |
> @@ -262,6 +269,13 @@ static int __intel_cx0_write_once(struct drm_i915_private *i915, enum port port,
>  		return -ETIMEDOUT;
>  	}
>  
> +	/*
> +	 * write XELPDP_PORT_P2M_MSGBUS_STATUS register after read to clear
> +	 * any error sticky bits set from previous transactions
> +	 */
> +	val = intel_de_read(i915, XELPDP_PORT_P2M_MSGBUS_STATUS(port, lane));
> +	intel_de_write(i915, XELPDP_PORT_P2M_MSGBUS_STATUS(port, lane), val);
> +
>  	intel_de_write(i915, XELPDP_PORT_M2P_MSGBUS_CTL(port, lane),
>  		       XELPDP_PORT_M2P_TRANSACTION_PENDING |
>  		       (committed ? XELPDP_PORT_M2P_COMMAND_WRITE_COMMITTED :
Ville Syrjala Nov. 1, 2023, 11:08 a.m. UTC | #2
On Wed, Nov 01, 2023 at 12:51:12PM +0200, Jani Nikula wrote:
> On Wed, 01 Nov 2023, Mika Kahola <mika.kahola@intel.com> wrote:
> > It is possible that sticky bits or error bits are left on
> > message bus status register. Reading and then writing the
> > value back to messagebus status register clears all possible
> > sticky bits and errors.
> 
> Note that I don't know if this is the right thing to do, or the right
> place to do this, but I'll just comment on the *implementation*,
> i.e. please wait for proper review before addressing my comments.
> 
> >
> > Signed-off-by: Mika Kahola <mika.kahola@intel.com>
> > ---
> >  drivers/gpu/drm/i915/display/intel_cx0_phy.c | 14 ++++++++++++++
> >  1 file changed, 14 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/i915/display/intel_cx0_phy.c b/drivers/gpu/drm/i915/display/intel_cx0_phy.c
> > index b2ad4c6172f6..f439f0c7b400 100644
> > --- a/drivers/gpu/drm/i915/display/intel_cx0_phy.c
> > +++ b/drivers/gpu/drm/i915/display/intel_cx0_phy.c
> > @@ -195,6 +195,13 @@ static int __intel_cx0_read_once(struct drm_i915_private *i915, enum port port,
> >  		return -ETIMEDOUT;
> >  	}
> >  
> > +	/*
> > +	 * write XELPDP_PORT_P2M_MSGBUS_STATUS register after read to clear
> > +	 * any error sticky bits set from previous transactions
> > +	 */
> > +	val = intel_de_read(i915, XELPDP_PORT_P2M_MSGBUS_STATUS(port, lane));
> > +	intel_de_write(i915, XELPDP_PORT_P2M_MSGBUS_STATUS(port, lane), val);
> 
> I think it's slightly confusing to use val here, as it's then passed on
> to intel_cx0_wait_for_ack() and you're left wondering if that's required
> or just reuse of the val variable.
> 
> This should do the same thing in one line, without reusing val:
> 
> 	intel_de_rmw(i915, XELPDP_PORT_P2M_MSGBUS_STATUS(port, lane), 0, 0);

Why is this not just a intel_clear_response_ready_flag()?

Side note: that function name is somewhat misleading...

> 
> > +
> >  	intel_de_write(i915, XELPDP_PORT_M2P_MSGBUS_CTL(port, lane),
> >  		       XELPDP_PORT_M2P_TRANSACTION_PENDING |
> >  		       XELPDP_PORT_M2P_COMMAND_READ |
> > @@ -262,6 +269,13 @@ static int __intel_cx0_write_once(struct drm_i915_private *i915, enum port port,
> >  		return -ETIMEDOUT;
> >  	}
> >  
> > +	/*
> > +	 * write XELPDP_PORT_P2M_MSGBUS_STATUS register after read to clear
> > +	 * any error sticky bits set from previous transactions
> > +	 */
> > +	val = intel_de_read(i915, XELPDP_PORT_P2M_MSGBUS_STATUS(port, lane));
> > +	intel_de_write(i915, XELPDP_PORT_P2M_MSGBUS_STATUS(port, lane), val);
> > +
> >  	intel_de_write(i915, XELPDP_PORT_M2P_MSGBUS_CTL(port, lane),
> >  		       XELPDP_PORT_M2P_TRANSACTION_PENDING |
> >  		       (committed ? XELPDP_PORT_M2P_COMMAND_WRITE_COMMITTED :
> 
> -- 
> Jani Nikula, Intel
Kahola, Mika Nov. 1, 2023, 1:21 p.m. UTC | #3
> -----Original Message-----
> From: Jani Nikula <jani.nikula@linux.intel.com>
> Sent: Wednesday, November 1, 2023 12:51 PM
> To: Kahola, Mika <mika.kahola@intel.com>; intel-gfx@lists.freedesktop.org
> Subject: Re: [Intel-gfx] [PATCH] drm/i915/mtl: Clear possible sticky bits on PICA message bus
> 
> On Wed, 01 Nov 2023, Mika Kahola <mika.kahola@intel.com> wrote:
> > It is possible that sticky bits or error bits are left on message bus
> > status register. Reading and then writing the value back to messagebus
> > status register clears all possible sticky bits and errors.
> 
> Note that I don't know if this is the right thing to do, or the right place to do this, but I'll just comment on the *implementation*,
> i.e. please wait for proper review before addressing my comments.
> 
> >
> > Signed-off-by: Mika Kahola <mika.kahola@intel.com>
> > ---
> >  drivers/gpu/drm/i915/display/intel_cx0_phy.c | 14 ++++++++++++++
> >  1 file changed, 14 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/i915/display/intel_cx0_phy.c
> > b/drivers/gpu/drm/i915/display/intel_cx0_phy.c
> > index b2ad4c6172f6..f439f0c7b400 100644
> > --- a/drivers/gpu/drm/i915/display/intel_cx0_phy.c
> > +++ b/drivers/gpu/drm/i915/display/intel_cx0_phy.c
> > @@ -195,6 +195,13 @@ static int __intel_cx0_read_once(struct drm_i915_private *i915, enum port port,
> >  		return -ETIMEDOUT;
> >  	}
> >
> > +	/*
> > +	 * write XELPDP_PORT_P2M_MSGBUS_STATUS register after read to clear
> > +	 * any error sticky bits set from previous transactions
> > +	 */
> > +	val = intel_de_read(i915, XELPDP_PORT_P2M_MSGBUS_STATUS(port, lane));
> > +	intel_de_write(i915, XELPDP_PORT_P2M_MSGBUS_STATUS(port, lane),
> > +val);
> 
> I think it's slightly confusing to use val here, as it's then passed on to intel_cx0_wait_for_ack() and you're left wondering if that's
> required or just reuse of the val variable.

This is a bit controversial patch. I have seen "reference" implementation which does similar thing. From BSpec 65101 PORT_P2M_MSGBUS_STATUS bit31 "Response Ready" the bit is set by HW on read completion or when write ack is received by SW after read. This means that we either have response pending or not. Now, the whole idea to float patch was to spark discussion if this is the way we should do this. 

There is an issue of sporadically fail reading or writing to the PICA message bus. This extra step might be something that we need (locally I couldn't trigger the the failure with this). One can interpret the spec in two ways. One is that the hw clears these bits when executing read command and hence we could write the value back as such. The other one is that we need to read the register and clear the bits like we already do with intel_clear_response_ready_flag() function.

One sidenote is that I couldn't find this extra step from the spec. Maybe need for something like this was discovered later?

-Mika-

> 
> This should do the same thing in one line, without reusing val:
> 
> 	intel_de_rmw(i915, XELPDP_PORT_P2M_MSGBUS_STATUS(port, lane), 0, 0);
> 
> > +
> >  	intel_de_write(i915, XELPDP_PORT_M2P_MSGBUS_CTL(port, lane),
> >  		       XELPDP_PORT_M2P_TRANSACTION_PENDING |
> >  		       XELPDP_PORT_M2P_COMMAND_READ | @@ -262,6 +269,13 @@ static
> > int __intel_cx0_write_once(struct drm_i915_private *i915, enum port port,
> >  		return -ETIMEDOUT;
> >  	}
> >
> > +	/*
> > +	 * write XELPDP_PORT_P2M_MSGBUS_STATUS register after read to clear
> > +	 * any error sticky bits set from previous transactions
> > +	 */
> > +	val = intel_de_read(i915, XELPDP_PORT_P2M_MSGBUS_STATUS(port, lane));
> > +	intel_de_write(i915, XELPDP_PORT_P2M_MSGBUS_STATUS(port, lane),
> > +val);
> > +
> >  	intel_de_write(i915, XELPDP_PORT_M2P_MSGBUS_CTL(port, lane),
> >  		       XELPDP_PORT_M2P_TRANSACTION_PENDING |
> >  		       (committed ? XELPDP_PORT_M2P_COMMAND_WRITE_COMMITTED :
> 
> --
> Jani Nikula, Intel
Gustavo Sousa Nov. 2, 2023, 2:23 p.m. UTC | #4
Quoting Mika Kahola (2023-11-01 07:31:01-03:00)
>It is possible that sticky bits or error bits are left on
>message bus status register. Reading and then writing the
>value back to messagebus status register clears all possible
>sticky bits and errors.
>
>Signed-off-by: Mika Kahola <mika.kahola@intel.com>
>---
> drivers/gpu/drm/i915/display/intel_cx0_phy.c | 14 ++++++++++++++
> 1 file changed, 14 insertions(+)
>
>diff --git a/drivers/gpu/drm/i915/display/intel_cx0_phy.c b/drivers/gpu/drm/i915/display/intel_cx0_phy.c
>index b2ad4c6172f6..f439f0c7b400 100644
>--- a/drivers/gpu/drm/i915/display/intel_cx0_phy.c
>+++ b/drivers/gpu/drm/i915/display/intel_cx0_phy.c
>@@ -195,6 +195,13 @@ static int __intel_cx0_read_once(struct drm_i915_private *i915, enum port port,
>                 return -ETIMEDOUT;
>         }
> 
>+        /*
>+         * write XELPDP_PORT_P2M_MSGBUS_STATUS register after read to clear
>+         * any error sticky bits set from previous transactions
>+         */
>+        val = intel_de_read(i915, XELPDP_PORT_P2M_MSGBUS_STATUS(port, lane));
>+        intel_de_write(i915, XELPDP_PORT_P2M_MSGBUS_STATUS(port, lane), val);
>+
>         intel_de_write(i915, XELPDP_PORT_M2P_MSGBUS_CTL(port, lane),
>                        XELPDP_PORT_M2P_TRANSACTION_PENDING |
>                        XELPDP_PORT_M2P_COMMAND_READ |
>@@ -262,6 +269,13 @@ static int __intel_cx0_write_once(struct drm_i915_private *i915, enum port port,
>                 return -ETIMEDOUT;
>         }
> 
>+        /*
>+         * write XELPDP_PORT_P2M_MSGBUS_STATUS register after read to clear
>+         * any error sticky bits set from previous transactions
>+         */
>+        val = intel_de_read(i915, XELPDP_PORT_P2M_MSGBUS_STATUS(port, lane));
>+        intel_de_write(i915, XELPDP_PORT_P2M_MSGBUS_STATUS(port, lane), val);
>+

Looking at the current state of the code, looks like to me that we already
clear the bits from both the "success" and "failure" paths. For the "success"
paths, that is done by a direct call to intel_clear_response_ready_flag(); for
the "failure" case, the call to intel_clear_response_ready_flag() is done as
part of intel_cx0_bus_reset().

Thus, considering that we start using the msgbus from a clean state, maybe these
extra steps are not necessary? Have you tried adding a call to
intel_cx0_bus_reset() as part of intel_cx0_phy_transaction_begin()?

Also, I think it would be good if we understood better were those uncleared bits
are coming from...

--
Gustavo Sousa

>         intel_de_write(i915, XELPDP_PORT_M2P_MSGBUS_CTL(port, lane),
>                        XELPDP_PORT_M2P_TRANSACTION_PENDING |
>                        (committed ? XELPDP_PORT_M2P_COMMAND_WRITE_COMMITTED :
>-- 
>2.34.1
>
Kahola, Mika Nov. 2, 2023, 2:54 p.m. UTC | #5
> -----Original Message-----
> From: Sousa, Gustavo <gustavo.sousa@intel.com>
> Sent: Thursday, November 2, 2023 4:23 PM
> To: Kahola, Mika <mika.kahola@intel.com>; intel-gfx@lists.freedesktop.org
> Cc: Kahola, Mika <mika.kahola@intel.com>; Nikula, Jani <jani.nikula@intel.com>; Syrjala, Ville <ville.syrjala@intel.com>
> Subject: Re: [PATCH] drm/i915/mtl: Clear possible sticky bits on PICA message bus
> 
> Quoting Mika Kahola (2023-11-01 07:31:01-03:00)
> >It is possible that sticky bits or error bits are left on message bus
> >status register. Reading and then writing the value back to messagebus
> >status register clears all possible sticky bits and errors.
> >
> >Signed-off-by: Mika Kahola <mika.kahola@intel.com>
> >---
> > drivers/gpu/drm/i915/display/intel_cx0_phy.c | 14 ++++++++++++++
> > 1 file changed, 14 insertions(+)
> >
> >diff --git a/drivers/gpu/drm/i915/display/intel_cx0_phy.c
> >b/drivers/gpu/drm/i915/display/intel_cx0_phy.c
> >index b2ad4c6172f6..f439f0c7b400 100644
> >--- a/drivers/gpu/drm/i915/display/intel_cx0_phy.c
> >+++ b/drivers/gpu/drm/i915/display/intel_cx0_phy.c
> >@@ -195,6 +195,13 @@ static int __intel_cx0_read_once(struct drm_i915_private *i915, enum port port,
> >                 return -ETIMEDOUT;
> >         }
> >
> >+        /*
> >+         * write XELPDP_PORT_P2M_MSGBUS_STATUS register after read to clear
> >+         * any error sticky bits set from previous transactions
> >+         */
> >+        val = intel_de_read(i915, XELPDP_PORT_P2M_MSGBUS_STATUS(port, lane));
> >+        intel_de_write(i915, XELPDP_PORT_P2M_MSGBUS_STATUS(port,
> >+ lane), val);
> >+
> >         intel_de_write(i915, XELPDP_PORT_M2P_MSGBUS_CTL(port, lane),
> >                        XELPDP_PORT_M2P_TRANSACTION_PENDING |
> >                        XELPDP_PORT_M2P_COMMAND_READ | @@ -262,6
> >+269,13 @@ static int __intel_cx0_write_once(struct drm_i915_private *i915, enum port port,
> >                 return -ETIMEDOUT;
> >         }
> >
> >+        /*
> >+         * write XELPDP_PORT_P2M_MSGBUS_STATUS register after read to clear
> >+         * any error sticky bits set from previous transactions
> >+         */
> >+        val = intel_de_read(i915, XELPDP_PORT_P2M_MSGBUS_STATUS(port, lane));
> >+        intel_de_write(i915, XELPDP_PORT_P2M_MSGBUS_STATUS(port,
> >+ lane), val);
> >+
> 
> Looking at the current state of the code, looks like to me that we already clear the bits from both the "success" and "failure" paths.
> For the "success"
> paths, that is done by a direct call to intel_clear_response_ready_flag(); for the "failure" case, the call to
> intel_clear_response_ready_flag() is done as part of intel_cx0_bus_reset().
> 
> Thus, considering that we start using the msgbus from a clean state, maybe these extra steps are not necessary? Have you tried
> adding a call to
> intel_cx0_bus_reset() as part of intel_cx0_phy_transaction_begin()?
That I haven't try to reset bus at the stage. I can give it a go and check what happens. To me it seems, that we are sometimes stuck at waiting the ack and eventually we time out and bail out. I have no clue why this happens from time to time. We already reset the bus after every read/write operation. In addition, a small delay seem to help and clear the sporadic read/write failures to the bus. However, this is more like a workaround and I'm not really happy about this sort of hack.

I will give a go for your suggestion and come back once I have the results.

Thanks!
-Mika-

> 
> Also, I think it would be good if we understood better were those uncleared bits are coming from...
> 
> --
> Gustavo Sousa
> 
> >         intel_de_write(i915, XELPDP_PORT_M2P_MSGBUS_CTL(port, lane),
> >                        XELPDP_PORT_M2P_TRANSACTION_PENDING |
> >                        (committed ? XELPDP_PORT_M2P_COMMAND_WRITE_COMMITTED :
> >--
> >2.34.1
> >
Kahola, Mika Nov. 3, 2023, 2:47 p.m. UTC | #6
> -----Original Message-----
> From: Intel-gfx <intel-gfx-bounces@lists.freedesktop.org> On Behalf Of Kahola, Mika
> Sent: Thursday, November 2, 2023 4:54 PM
> To: Sousa, Gustavo <gustavo.sousa@intel.com>; intel-gfx@lists.freedesktop.org
> Cc: Nikula, Jani <jani.nikula@intel.com>; Syrjala, Ville <ville.syrjala@intel.com>
> Subject: Re: [Intel-gfx] [PATCH] drm/i915/mtl: Clear possible sticky bits on PICA message bus
> 
> > -----Original Message-----
> > From: Sousa, Gustavo <gustavo.sousa@intel.com>
> > Sent: Thursday, November 2, 2023 4:23 PM
> > To: Kahola, Mika <mika.kahola@intel.com>;
> > intel-gfx@lists.freedesktop.org
> > Cc: Kahola, Mika <mika.kahola@intel.com>; Nikula, Jani
> > <jani.nikula@intel.com>; Syrjala, Ville <ville.syrjala@intel.com>
> > Subject: Re: [PATCH] drm/i915/mtl: Clear possible sticky bits on PICA
> > message bus
> >
> > Quoting Mika Kahola (2023-11-01 07:31:01-03:00)
> > >It is possible that sticky bits or error bits are left on message bus
> > >status register. Reading and then writing the value back to
> > >messagebus status register clears all possible sticky bits and errors.
> > >
> > >Signed-off-by: Mika Kahola <mika.kahola@intel.com>
> > >---
> > > drivers/gpu/drm/i915/display/intel_cx0_phy.c | 14 ++++++++++++++
> > > 1 file changed, 14 insertions(+)
> > >
> > >diff --git a/drivers/gpu/drm/i915/display/intel_cx0_phy.c
> > >b/drivers/gpu/drm/i915/display/intel_cx0_phy.c
> > >index b2ad4c6172f6..f439f0c7b400 100644
> > >--- a/drivers/gpu/drm/i915/display/intel_cx0_phy.c
> > >+++ b/drivers/gpu/drm/i915/display/intel_cx0_phy.c
> > >@@ -195,6 +195,13 @@ static int __intel_cx0_read_once(struct drm_i915_private *i915, enum port port,
> > >                 return -ETIMEDOUT;
> > >         }
> > >
> > >+        /*
> > >+         * write XELPDP_PORT_P2M_MSGBUS_STATUS register after read to clear
> > >+         * any error sticky bits set from previous transactions
> > >+         */
> > >+        val = intel_de_read(i915, XELPDP_PORT_P2M_MSGBUS_STATUS(port, lane));
> > >+        intel_de_write(i915, XELPDP_PORT_P2M_MSGBUS_STATUS(port,
> > >+ lane), val);
> > >+
> > >         intel_de_write(i915, XELPDP_PORT_M2P_MSGBUS_CTL(port, lane),
> > >                        XELPDP_PORT_M2P_TRANSACTION_PENDING |
> > >                        XELPDP_PORT_M2P_COMMAND_READ | @@ -262,6
> > >+269,13 @@ static int __intel_cx0_write_once(struct drm_i915_private
> > >+*i915, enum port port,
> > >                 return -ETIMEDOUT;
> > >         }
> > >
> > >+        /*
> > >+         * write XELPDP_PORT_P2M_MSGBUS_STATUS register after read to clear
> > >+         * any error sticky bits set from previous transactions
> > >+         */
> > >+        val = intel_de_read(i915, XELPDP_PORT_P2M_MSGBUS_STATUS(port, lane));
> > >+        intel_de_write(i915, XELPDP_PORT_P2M_MSGBUS_STATUS(port,
> > >+ lane), val);
> > >+
> >
> > Looking at the current state of the code, looks like to me that we already clear the bits from both the "success" and "failure"
> paths.
> > For the "success"
> > paths, that is done by a direct call to
> > intel_clear_response_ready_flag(); for the "failure" case, the call to
> > intel_clear_response_ready_flag() is done as part of intel_cx0_bus_reset().
> >
> > Thus, considering that we start using the msgbus from a clean state,
> > maybe these extra steps are not necessary? Have you tried adding a
> > call to
> > intel_cx0_bus_reset() as part of intel_cx0_phy_transaction_begin()?
> That I haven't try to reset bus at the stage. I can give it a go and check what happens. To me it seems, that we are sometimes
> stuck at waiting the ack and eventually we time out and bail out. I have no clue why this happens from time to time. We already
> reset the bus after every read/write operation. In addition, a small delay seem to help and clear the sporadic read/write failures
> to the bus. However, this is more like a workaround and I'm not really happy about this sort of hack.
> 
> I will give a go for your suggestion and come back once I have the results.

I ran a test with intel_cx0_bus_reset() when placed in intel_cx0_phy_transaction_begin(). This didn't help either as with kms_flip IGT test I was able to trigger this read failure

[drm] *ERROR* PHY G Read 0d80 failed after 3 retries.

This was with the configuration where the test vehicle had 4K and eDP monitors connected.


> 
> Thanks!
> -Mika-
> 
> >
> > Also, I think it would be good if we understood better were those uncleared bits are coming from...
> >
> > --
> > Gustavo Sousa
> >
> > >         intel_de_write(i915, XELPDP_PORT_M2P_MSGBUS_CTL(port, lane),
> > >                        XELPDP_PORT_M2P_TRANSACTION_PENDING |
> > >                        (committed ? XELPDP_PORT_M2P_COMMAND_WRITE_COMMITTED :
> > >--
> > >2.34.1
> > >
Kahola, Mika Nov. 3, 2023, 3 p.m. UTC | #7
> -----Original Message-----
> From: Kahola, Mika <mika.kahola@intel.com>
> Sent: Friday, November 3, 2023 4:47 PM
> To: Kahola, Mika <mika.kahola@intel.com>; Sousa, Gustavo <gustavo.sousa@intel.com>; intel-gfx@lists.freedesktop.org
> Cc: Nikula, Jani <jani.nikula@intel.com>; Syrjala, Ville <ville.syrjala@intel.com>
> Subject: RE: [PATCH] drm/i915/mtl: Clear possible sticky bits on PICA message bus
> 
> > -----Original Message-----
> > From: Intel-gfx <intel-gfx-bounces@lists.freedesktop.org> On Behalf Of
> > Kahola, Mika
> > Sent: Thursday, November 2, 2023 4:54 PM
> > To: Sousa, Gustavo <gustavo.sousa@intel.com>;
> > intel-gfx@lists.freedesktop.org
> > Cc: Nikula, Jani <jani.nikula@intel.com>; Syrjala, Ville
> > <ville.syrjala@intel.com>
> > Subject: Re: [Intel-gfx] [PATCH] drm/i915/mtl: Clear possible sticky
> > bits on PICA message bus
> >
> > > -----Original Message-----
> > > From: Sousa, Gustavo <gustavo.sousa@intel.com>
> > > Sent: Thursday, November 2, 2023 4:23 PM
> > > To: Kahola, Mika <mika.kahola@intel.com>;
> > > intel-gfx@lists.freedesktop.org
> > > Cc: Kahola, Mika <mika.kahola@intel.com>; Nikula, Jani
> > > <jani.nikula@intel.com>; Syrjala, Ville <ville.syrjala@intel.com>
> > > Subject: Re: [PATCH] drm/i915/mtl: Clear possible sticky bits on
> > > PICA message bus
> > >
> > > Quoting Mika Kahola (2023-11-01 07:31:01-03:00)
> > > >It is possible that sticky bits or error bits are left on message
> > > >bus status register. Reading and then writing the value back to
> > > >messagebus status register clears all possible sticky bits and errors.
> > > >
> > > >Signed-off-by: Mika Kahola <mika.kahola@intel.com>
> > > >---
> > > > drivers/gpu/drm/i915/display/intel_cx0_phy.c | 14 ++++++++++++++
> > > > 1 file changed, 14 insertions(+)
> > > >
> > > >diff --git a/drivers/gpu/drm/i915/display/intel_cx0_phy.c
> > > >b/drivers/gpu/drm/i915/display/intel_cx0_phy.c
> > > >index b2ad4c6172f6..f439f0c7b400 100644
> > > >--- a/drivers/gpu/drm/i915/display/intel_cx0_phy.c
> > > >+++ b/drivers/gpu/drm/i915/display/intel_cx0_phy.c
> > > >@@ -195,6 +195,13 @@ static int __intel_cx0_read_once(struct drm_i915_private *i915, enum port port,
> > > >                 return -ETIMEDOUT;
> > > >         }
> > > >
> > > >+        /*
> > > >+         * write XELPDP_PORT_P2M_MSGBUS_STATUS register after read to clear
> > > >+         * any error sticky bits set from previous transactions
> > > >+         */
> > > >+        val = intel_de_read(i915, XELPDP_PORT_P2M_MSGBUS_STATUS(port, lane));
> > > >+        intel_de_write(i915, XELPDP_PORT_P2M_MSGBUS_STATUS(port,
> > > >+ lane), val);
> > > >+
> > > >         intel_de_write(i915, XELPDP_PORT_M2P_MSGBUS_CTL(port, lane),
> > > >                        XELPDP_PORT_M2P_TRANSACTION_PENDING |
> > > >                        XELPDP_PORT_M2P_COMMAND_READ | @@ -262,6
> > > >+269,13 @@ static int __intel_cx0_write_once(struct
> > > >+drm_i915_private *i915, enum port port,
> > > >                 return -ETIMEDOUT;
> > > >         }
> > > >
> > > >+        /*
> > > >+         * write XELPDP_PORT_P2M_MSGBUS_STATUS register after read to clear
> > > >+         * any error sticky bits set from previous transactions
> > > >+         */
> > > >+        val = intel_de_read(i915, XELPDP_PORT_P2M_MSGBUS_STATUS(port, lane));
> > > >+        intel_de_write(i915, XELPDP_PORT_P2M_MSGBUS_STATUS(port,
> > > >+ lane), val);
> > > >+
> > >
> > > Looking at the current state of the code, looks like to me that we already clear the bits from both the "success" and "failure"
> > paths.
> > > For the "success"
> > > paths, that is done by a direct call to
> > > intel_clear_response_ready_flag(); for the "failure" case, the call
> > > to
> > > intel_clear_response_ready_flag() is done as part of intel_cx0_bus_reset().
> > >
> > > Thus, considering that we start using the msgbus from a clean state,
> > > maybe these extra steps are not necessary? Have you tried adding a
> > > call to
> > > intel_cx0_bus_reset() as part of intel_cx0_phy_transaction_begin()?
> > That I haven't try to reset bus at the stage. I can give it a go and
> > check what happens. To me it seems, that we are sometimes stuck at
> > waiting the ack and eventually we time out and bail out. I have no
> > clue why this happens from time to time. We already reset the bus after every read/write operation. In addition, a small delay
> seem to help and clear the sporadic read/write failures to the bus. However, this is more like a workaround and I'm not really
> happy about this sort of hack.
> >
> > I will give a go for your suggestion and come back once I have the results.
> 
> I ran a test with intel_cx0_bus_reset() when placed in intel_cx0_phy_transaction_begin(). This didn't help either as with kms_flip
> IGT test I was able to trigger this read failure
> 
> [drm] *ERROR* PHY G Read 0d80 failed after 3 retries.
> 
> This was with the configuration where the test vehicle had 4K and eDP monitors connected.

I think we can ignore this patch. I was able to trigger this error with this patch applied as well. This doesn't fix the issue either.

Sorry for the noise.

-Mika-

> 
> 
> >
> > Thanks!
> > -Mika-
> >
> > >
> > > Also, I think it would be good if we understood better were those uncleared bits are coming from...
> > >
> > > --
> > > Gustavo Sousa
> > >
> > > >         intel_de_write(i915, XELPDP_PORT_M2P_MSGBUS_CTL(port, lane),
> > > >                        XELPDP_PORT_M2P_TRANSACTION_PENDING |
> > > >                        (committed ? XELPDP_PORT_M2P_COMMAND_WRITE_COMMITTED :
> > > >--
> > > >2.34.1
> > > >
diff mbox series

Patch

diff --git a/drivers/gpu/drm/i915/display/intel_cx0_phy.c b/drivers/gpu/drm/i915/display/intel_cx0_phy.c
index b2ad4c6172f6..f439f0c7b400 100644
--- a/drivers/gpu/drm/i915/display/intel_cx0_phy.c
+++ b/drivers/gpu/drm/i915/display/intel_cx0_phy.c
@@ -195,6 +195,13 @@  static int __intel_cx0_read_once(struct drm_i915_private *i915, enum port port,
 		return -ETIMEDOUT;
 	}
 
+	/*
+	 * write XELPDP_PORT_P2M_MSGBUS_STATUS register after read to clear
+	 * any error sticky bits set from previous transactions
+	 */
+	val = intel_de_read(i915, XELPDP_PORT_P2M_MSGBUS_STATUS(port, lane));
+	intel_de_write(i915, XELPDP_PORT_P2M_MSGBUS_STATUS(port, lane), val);
+
 	intel_de_write(i915, XELPDP_PORT_M2P_MSGBUS_CTL(port, lane),
 		       XELPDP_PORT_M2P_TRANSACTION_PENDING |
 		       XELPDP_PORT_M2P_COMMAND_READ |
@@ -262,6 +269,13 @@  static int __intel_cx0_write_once(struct drm_i915_private *i915, enum port port,
 		return -ETIMEDOUT;
 	}
 
+	/*
+	 * write XELPDP_PORT_P2M_MSGBUS_STATUS register after read to clear
+	 * any error sticky bits set from previous transactions
+	 */
+	val = intel_de_read(i915, XELPDP_PORT_P2M_MSGBUS_STATUS(port, lane));
+	intel_de_write(i915, XELPDP_PORT_P2M_MSGBUS_STATUS(port, lane), val);
+
 	intel_de_write(i915, XELPDP_PORT_M2P_MSGBUS_CTL(port, lane),
 		       XELPDP_PORT_M2P_TRANSACTION_PENDING |
 		       (committed ? XELPDP_PORT_M2P_COMMAND_WRITE_COMMITTED :