Message ID | 20220211202728.6146-5-alyssa.rosenzweig@collabora.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | drm/panfrost: Initial Valhall support | expand |
On 11/02/2022 20:27, alyssa.rosenzweig@collabora.com wrote: > From: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> > > Some Valhall GPUs require resets when encountering bus faults due to > occlusion query writes. Add the issue bit for this and handle it. > > Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Reviewed-by: Steven Price <steven.price@arm.com> (although one nit below) Just in case any one is wondering - these bus faults occur when switching the GPU's MMU to unmapped - it's not a normal "bus fault" from the external bus. This is triggered by an attempt to read unmapped memory which is completed by the driver by switching the entire MMU to unmapped. > --- > drivers/gpu/drm/panfrost/panfrost_device.c | 9 +++++++-- > drivers/gpu/drm/panfrost/panfrost_issues.h | 4 ++++ > 2 files changed, 11 insertions(+), 2 deletions(-) > > diff --git a/drivers/gpu/drm/panfrost/panfrost_device.c b/drivers/gpu/drm/panfrost/panfrost_device.c > index 7f51a4682ccb..ee612303f076 100644 > --- a/drivers/gpu/drm/panfrost/panfrost_device.c > +++ b/drivers/gpu/drm/panfrost/panfrost_device.c > @@ -11,6 +11,7 @@ > #include "panfrost_device.h" > #include "panfrost_devfreq.h" > #include "panfrost_features.h" > +#include "panfrost_issues.h" > #include "panfrost_gpu.h" > #include "panfrost_job.h" > #include "panfrost_mmu.h" > @@ -380,9 +381,13 @@ const char *panfrost_exception_name(u32 exception_code) > bool panfrost_exception_needs_reset(const struct panfrost_device *pfdev, > u32 exception_code) > { > - /* Right now, none of the GPU we support need a reset, but this > - * might change. > + /* If an occlusion query write causes a bus fault on affected GPUs, > + * future fragment jobs may hang. Reset to workaround. > */ > + if (exception_code == DRM_PANFROST_EXCEPTION_JOB_BUS_FAULT) > + return panfrost_has_hw_issue(pfdev, HW_ISSUE_TTRX_3076); > + > + /* No other GPUs we support need a reset */ > return false; > } > > diff --git a/drivers/gpu/drm/panfrost/panfrost_issues.h b/drivers/gpu/drm/panfrost/panfrost_issues.h > index a66692663833..058f6a4c8435 100644 > --- a/drivers/gpu/drm/panfrost/panfrost_issues.h > +++ b/drivers/gpu/drm/panfrost/panfrost_issues.h > @@ -128,6 +128,10 @@ enum panfrost_hw_issue { > /* Must set SC_VAR_ALGORITHM */ > HW_ISSUE_TTRX_2968_TTRX_3162, > > + /* Bus fault from occlusion query write may cause future fragment jobs > + * to hang */ NIT: Kernel comment style has the "/*" and "*/" on lines by themselves for multi-line comments. checkpatch will complain! > + HW_ISSUE_TTRX_3076, > + > HW_ISSUE_END > }; >
On Mon, Feb 14, 2022 at 04:23:18PM +0000, Steven Price wrote: > On 11/02/2022 20:27, alyssa.rosenzweig@collabora.com wrote: > > From: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> > > > > Some Valhall GPUs require resets when encountering bus faults due to > > occlusion query writes. Add the issue bit for this and handle it. > > > > Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> > > Reviewed-by: Steven Price <steven.price@arm.com> > (although one nit below) > > Just in case any one is wondering - these bus faults occur when > switching the GPU's MMU to unmapped - it's not a normal "bus fault" from > the external bus. This is triggered by an attempt to read unmapped > memory which is completed by the driver by switching the entire MMU to > unmapped. Ouch, that's subtle. > > diff --git a/drivers/gpu/drm/panfrost/panfrost_issues.h b/drivers/gpu/drm/panfrost/panfrost_issues.h > > index a66692663833..058f6a4c8435 100644 > > --- a/drivers/gpu/drm/panfrost/panfrost_issues.h > > +++ b/drivers/gpu/drm/panfrost/panfrost_issues.h > > @@ -128,6 +128,10 @@ enum panfrost_hw_issue { > > /* Must set SC_VAR_ALGORITHM */ > > HW_ISSUE_TTRX_2968_TTRX_3162, > > > > + /* Bus fault from occlusion query write may cause future fragment jobs > > + * to hang */ > > NIT: Kernel comment style has the "/*" and "*/" on lines by themselves > for multi-line comments. checkpatch will complain! Yes, I am aware (and checkpatch did complain). The existing multi-line comments in that file do not have the extra lines. Consistency within the file seemed like the lesser evil. If you think it's better to appease checkpatch, I can reformat for v2. (I can also throw in a patch fixing the rest of that file's multiline comments but that seems a bit extra.)
On 14/02/2022 17:06, Alyssa Rosenzweig wrote: > On Mon, Feb 14, 2022 at 04:23:18PM +0000, Steven Price wrote: >> On 11/02/2022 20:27, alyssa.rosenzweig@collabora.com wrote: >>> From: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> >>> >>> Some Valhall GPUs require resets when encountering bus faults due to >>> occlusion query writes. Add the issue bit for this and handle it. >>> >>> Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> >> >> Reviewed-by: Steven Price <steven.price@arm.com> >> (although one nit below) >> >> Just in case any one is wondering - these bus faults occur when >> switching the GPU's MMU to unmapped - it's not a normal "bus fault" from >> the external bus. This is triggered by an attempt to read unmapped >> memory which is completed by the driver by switching the entire MMU to >> unmapped. > > Ouch, that's subtle. > >>> diff --git a/drivers/gpu/drm/panfrost/panfrost_issues.h b/drivers/gpu/drm/panfrost/panfrost_issues.h >>> index a66692663833..058f6a4c8435 100644 >>> --- a/drivers/gpu/drm/panfrost/panfrost_issues.h >>> +++ b/drivers/gpu/drm/panfrost/panfrost_issues.h >>> @@ -128,6 +128,10 @@ enum panfrost_hw_issue { >>> /* Must set SC_VAR_ALGORITHM */ >>> HW_ISSUE_TTRX_2968_TTRX_3162, >>> >>> + /* Bus fault from occlusion query write may cause future fragment jobs >>> + * to hang */ >> >> NIT: Kernel comment style has the "/*" and "*/" on lines by themselves >> for multi-line comments. checkpatch will complain! > > Yes, I am aware (and checkpatch did complain). The existing multi-line > comments in that file do not have the extra lines. Consistency within > the file seemed like the lesser evil. If you think it's better to > appease checkpatch, I can reformat for v2. > > (I can also throw in a patch fixing the rest of that file's multiline > comments but that seems a bit extra.) Nah! I've never been very keen on that style rule myself, but I usually try to keep checkpatch happy when working on the kernel. Lets just follow the rest of the file for now. Thanks, Steve
diff --git a/drivers/gpu/drm/panfrost/panfrost_device.c b/drivers/gpu/drm/panfrost/panfrost_device.c index 7f51a4682ccb..ee612303f076 100644 --- a/drivers/gpu/drm/panfrost/panfrost_device.c +++ b/drivers/gpu/drm/panfrost/panfrost_device.c @@ -11,6 +11,7 @@ #include "panfrost_device.h" #include "panfrost_devfreq.h" #include "panfrost_features.h" +#include "panfrost_issues.h" #include "panfrost_gpu.h" #include "panfrost_job.h" #include "panfrost_mmu.h" @@ -380,9 +381,13 @@ const char *panfrost_exception_name(u32 exception_code) bool panfrost_exception_needs_reset(const struct panfrost_device *pfdev, u32 exception_code) { - /* Right now, none of the GPU we support need a reset, but this - * might change. + /* If an occlusion query write causes a bus fault on affected GPUs, + * future fragment jobs may hang. Reset to workaround. */ + if (exception_code == DRM_PANFROST_EXCEPTION_JOB_BUS_FAULT) + return panfrost_has_hw_issue(pfdev, HW_ISSUE_TTRX_3076); + + /* No other GPUs we support need a reset */ return false; } diff --git a/drivers/gpu/drm/panfrost/panfrost_issues.h b/drivers/gpu/drm/panfrost/panfrost_issues.h index a66692663833..058f6a4c8435 100644 --- a/drivers/gpu/drm/panfrost/panfrost_issues.h +++ b/drivers/gpu/drm/panfrost/panfrost_issues.h @@ -128,6 +128,10 @@ enum panfrost_hw_issue { /* Must set SC_VAR_ALGORITHM */ HW_ISSUE_TTRX_2968_TTRX_3162, + /* Bus fault from occlusion query write may cause future fragment jobs + * to hang */ + HW_ISSUE_TTRX_3076, + HW_ISSUE_END };