diff mbox series

[4/9] drm/panfrost: Handle HW_ISSUE_TTRX_3076

Message ID 20220211202728.6146-5-alyssa.rosenzweig@collabora.com (mailing list archive)
State New, archived
Headers show
Series drm/panfrost: Initial Valhall support | expand

Commit Message

Alyssa Rosenzweig Feb. 11, 2022, 8:27 p.m. UTC
From: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>

Some Valhall GPUs require resets when encountering bus faults due to
occlusion query writes. Add the issue bit for this and handle it.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
---
 drivers/gpu/drm/panfrost/panfrost_device.c | 9 +++++++--
 drivers/gpu/drm/panfrost/panfrost_issues.h | 4 ++++
 2 files changed, 11 insertions(+), 2 deletions(-)

Comments

Steven Price Feb. 14, 2022, 4:23 p.m. UTC | #1
On 11/02/2022 20:27, alyssa.rosenzweig@collabora.com wrote:
> From: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
> 
> Some Valhall GPUs require resets when encountering bus faults due to
> occlusion query writes. Add the issue bit for this and handle it.
> 
> Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>

Reviewed-by: Steven Price <steven.price@arm.com>
(although one nit below)

Just in case any one is wondering - these bus faults occur when
switching the GPU's MMU to unmapped - it's not a normal "bus fault" from
the external bus. This is triggered by an attempt to read unmapped
memory which is completed by the driver by switching the entire MMU to
unmapped.

> ---
>  drivers/gpu/drm/panfrost/panfrost_device.c | 9 +++++++--
>  drivers/gpu/drm/panfrost/panfrost_issues.h | 4 ++++
>  2 files changed, 11 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/panfrost/panfrost_device.c b/drivers/gpu/drm/panfrost/panfrost_device.c
> index 7f51a4682ccb..ee612303f076 100644
> --- a/drivers/gpu/drm/panfrost/panfrost_device.c
> +++ b/drivers/gpu/drm/panfrost/panfrost_device.c
> @@ -11,6 +11,7 @@
>  #include "panfrost_device.h"
>  #include "panfrost_devfreq.h"
>  #include "panfrost_features.h"
> +#include "panfrost_issues.h"
>  #include "panfrost_gpu.h"
>  #include "panfrost_job.h"
>  #include "panfrost_mmu.h"
> @@ -380,9 +381,13 @@ const char *panfrost_exception_name(u32 exception_code)
>  bool panfrost_exception_needs_reset(const struct panfrost_device *pfdev,
>  				    u32 exception_code)
>  {
> -	/* Right now, none of the GPU we support need a reset, but this
> -	 * might change.
> +	/* If an occlusion query write causes a bus fault on affected GPUs,
> +	 * future fragment jobs may hang. Reset to workaround.
>  	 */
> +	if (exception_code == DRM_PANFROST_EXCEPTION_JOB_BUS_FAULT)
> +		return panfrost_has_hw_issue(pfdev, HW_ISSUE_TTRX_3076);
> +
> +	/* No other GPUs we support need a reset */
>  	return false;
>  }
>  
> diff --git a/drivers/gpu/drm/panfrost/panfrost_issues.h b/drivers/gpu/drm/panfrost/panfrost_issues.h
> index a66692663833..058f6a4c8435 100644
> --- a/drivers/gpu/drm/panfrost/panfrost_issues.h
> +++ b/drivers/gpu/drm/panfrost/panfrost_issues.h
> @@ -128,6 +128,10 @@ enum panfrost_hw_issue {
>  	/* Must set SC_VAR_ALGORITHM */
>  	HW_ISSUE_TTRX_2968_TTRX_3162,
>  
> +	/* Bus fault from occlusion query write may cause future fragment jobs
> +	 * to hang */

NIT: Kernel comment style has the "/*" and "*/" on lines by themselves
for multi-line comments. checkpatch will complain!

> +	HW_ISSUE_TTRX_3076,
> +
>  	HW_ISSUE_END
>  };
>
Alyssa Rosenzweig Feb. 14, 2022, 5:06 p.m. UTC | #2
On Mon, Feb 14, 2022 at 04:23:18PM +0000, Steven Price wrote:
> On 11/02/2022 20:27, alyssa.rosenzweig@collabora.com wrote:
> > From: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
> > 
> > Some Valhall GPUs require resets when encountering bus faults due to
> > occlusion query writes. Add the issue bit for this and handle it.
> > 
> > Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
> 
> Reviewed-by: Steven Price <steven.price@arm.com>
> (although one nit below)
> 
> Just in case any one is wondering - these bus faults occur when
> switching the GPU's MMU to unmapped - it's not a normal "bus fault" from
> the external bus. This is triggered by an attempt to read unmapped
> memory which is completed by the driver by switching the entire MMU to
> unmapped.

Ouch, that's subtle.

> > diff --git a/drivers/gpu/drm/panfrost/panfrost_issues.h b/drivers/gpu/drm/panfrost/panfrost_issues.h
> > index a66692663833..058f6a4c8435 100644
> > --- a/drivers/gpu/drm/panfrost/panfrost_issues.h
> > +++ b/drivers/gpu/drm/panfrost/panfrost_issues.h
> > @@ -128,6 +128,10 @@ enum panfrost_hw_issue {
> >  	/* Must set SC_VAR_ALGORITHM */
> >  	HW_ISSUE_TTRX_2968_TTRX_3162,
> >  
> > +	/* Bus fault from occlusion query write may cause future fragment jobs
> > +	 * to hang */
> 
> NIT: Kernel comment style has the "/*" and "*/" on lines by themselves
> for multi-line comments. checkpatch will complain!

Yes, I am aware (and checkpatch did complain). The existing multi-line
comments in that file do not have the extra lines. Consistency within
the file seemed like the lesser evil. If you think it's better to
appease checkpatch, I can reformat for v2.

(I can also throw in a patch fixing the rest of that file's multiline
comments but that seems a bit extra.)
Steven Price Feb. 16, 2022, 4:06 p.m. UTC | #3
On 14/02/2022 17:06, Alyssa Rosenzweig wrote:
> On Mon, Feb 14, 2022 at 04:23:18PM +0000, Steven Price wrote:
>> On 11/02/2022 20:27, alyssa.rosenzweig@collabora.com wrote:
>>> From: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
>>>
>>> Some Valhall GPUs require resets when encountering bus faults due to
>>> occlusion query writes. Add the issue bit for this and handle it.
>>>
>>> Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
>>
>> Reviewed-by: Steven Price <steven.price@arm.com>
>> (although one nit below)
>>
>> Just in case any one is wondering - these bus faults occur when
>> switching the GPU's MMU to unmapped - it's not a normal "bus fault" from
>> the external bus. This is triggered by an attempt to read unmapped
>> memory which is completed by the driver by switching the entire MMU to
>> unmapped.
> 
> Ouch, that's subtle.
> 
>>> diff --git a/drivers/gpu/drm/panfrost/panfrost_issues.h b/drivers/gpu/drm/panfrost/panfrost_issues.h
>>> index a66692663833..058f6a4c8435 100644
>>> --- a/drivers/gpu/drm/panfrost/panfrost_issues.h
>>> +++ b/drivers/gpu/drm/panfrost/panfrost_issues.h
>>> @@ -128,6 +128,10 @@ enum panfrost_hw_issue {
>>>  	/* Must set SC_VAR_ALGORITHM */
>>>  	HW_ISSUE_TTRX_2968_TTRX_3162,
>>>  
>>> +	/* Bus fault from occlusion query write may cause future fragment jobs
>>> +	 * to hang */
>>
>> NIT: Kernel comment style has the "/*" and "*/" on lines by themselves
>> for multi-line comments. checkpatch will complain!
> 
> Yes, I am aware (and checkpatch did complain). The existing multi-line
> comments in that file do not have the extra lines. Consistency within
> the file seemed like the lesser evil. If you think it's better to
> appease checkpatch, I can reformat for v2.
> 
> (I can also throw in a patch fixing the rest of that file's multiline
> comments but that seems a bit extra.)

Nah! I've never been very keen on that style rule myself, but I usually
try to keep checkpatch happy when working on the kernel. Lets just
follow the rest of the file for now.

Thanks,

Steve
diff mbox series

Patch

diff --git a/drivers/gpu/drm/panfrost/panfrost_device.c b/drivers/gpu/drm/panfrost/panfrost_device.c
index 7f51a4682ccb..ee612303f076 100644
--- a/drivers/gpu/drm/panfrost/panfrost_device.c
+++ b/drivers/gpu/drm/panfrost/panfrost_device.c
@@ -11,6 +11,7 @@ 
 #include "panfrost_device.h"
 #include "panfrost_devfreq.h"
 #include "panfrost_features.h"
+#include "panfrost_issues.h"
 #include "panfrost_gpu.h"
 #include "panfrost_job.h"
 #include "panfrost_mmu.h"
@@ -380,9 +381,13 @@  const char *panfrost_exception_name(u32 exception_code)
 bool panfrost_exception_needs_reset(const struct panfrost_device *pfdev,
 				    u32 exception_code)
 {
-	/* Right now, none of the GPU we support need a reset, but this
-	 * might change.
+	/* If an occlusion query write causes a bus fault on affected GPUs,
+	 * future fragment jobs may hang. Reset to workaround.
 	 */
+	if (exception_code == DRM_PANFROST_EXCEPTION_JOB_BUS_FAULT)
+		return panfrost_has_hw_issue(pfdev, HW_ISSUE_TTRX_3076);
+
+	/* No other GPUs we support need a reset */
 	return false;
 }
 
diff --git a/drivers/gpu/drm/panfrost/panfrost_issues.h b/drivers/gpu/drm/panfrost/panfrost_issues.h
index a66692663833..058f6a4c8435 100644
--- a/drivers/gpu/drm/panfrost/panfrost_issues.h
+++ b/drivers/gpu/drm/panfrost/panfrost_issues.h
@@ -128,6 +128,10 @@  enum panfrost_hw_issue {
 	/* Must set SC_VAR_ALGORITHM */
 	HW_ISSUE_TTRX_2968_TTRX_3162,
 
+	/* Bus fault from occlusion query write may cause future fragment jobs
+	 * to hang */
+	HW_ISSUE_TTRX_3076,
+
 	HW_ISSUE_END
 };