
[v1] drm/i915/selftest: Log throttle reasons on failure

Message ID 20241205081413.1529252-1-raag.jadav@intel.com
State New
Series [v1] drm/i915/selftest: Log throttle reasons on failure

Commit Message

Raag Jadav Dec. 5, 2024, 8:14 a.m. UTC
Log throttle reasons on selftest failure which will be useful for
debugging.

Signed-off-by: Raag Jadav <raag.jadav@intel.com>
---
 drivers/gpu/drm/i915/gt/selftest_rps.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

Comments

Andi Shyti Dec. 5, 2024, 1:20 p.m. UTC | #1
Hi Raag,

On Thu, Dec 05, 2024 at 01:44:13PM +0530, Raag Jadav wrote:
> Log throttle reasons on selftest failure which will be useful for
> debugging.
> 
> Signed-off-by: Raag Jadav <raag.jadav@intel.com>

Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>

Thanks,
Andi
Rodrigo Vivi Dec. 6, 2024, 3:45 p.m. UTC | #2
On Thu, Dec 05, 2024 at 01:44:13PM +0530, Raag Jadav wrote:
> Log throttle reasons on selftest failure which will be useful for
> debugging.
> 
> Signed-off-by: Raag Jadav <raag.jadav@intel.com>
> ---
>  drivers/gpu/drm/i915/gt/selftest_rps.c | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/selftest_rps.c b/drivers/gpu/drm/i915/gt/selftest_rps.c
> index dcef8d498919..1e0e59bc69b6 100644
> --- a/drivers/gpu/drm/i915/gt/selftest_rps.c
> +++ b/drivers/gpu/drm/i915/gt/selftest_rps.c
> @@ -478,8 +478,11 @@ int live_rps_control(void *arg)
>  			min, max, ktime_to_ns(min_dt), ktime_to_ns(max_dt));
>  
>  		if (limit == rps->min_freq) {

I was going to merge this, but then I noticed that this only prints
when throttling brings the frequency down to our min_freq... When PCODE
throttles the freq, the guaranteed freq can be at any point, not necessarily
the minimum, so this print is not very effective at the end of the day.

> -			pr_err("%s: GPU throttled to minimum!\n",
> -			       engine->name);
> +			u32 throttle = intel_uncore_read(gt->uncore,
> +							 intel_gt_perf_limit_reasons_reg(gt));
> +
> +			pr_err("%s: GPU throttled to minimum frequency with reasons 0x%08x\n",
> +			       engine->name, throttle & GT0_PERF_LIMIT_REASONS_MASK);
>  			show_pstate_limits(rps);
>  			err = -ENODEV;
>  			break;
> -- 
> 2.34.1
>
Raag Jadav Dec. 7, 2024, 6:14 a.m. UTC | #3
Cc: Chris

On Fri, Dec 06, 2024 at 10:45:18AM -0500, Rodrigo Vivi wrote:
> On Thu, Dec 05, 2024 at 01:44:13PM +0530, Raag Jadav wrote:
> >  		if (limit == rps->min_freq) {
> 
> I was going to merge this, but then I noticed that this only prints
> when throttling brings the frequency down to our min_freq... When PCODE
> throttles the freq, the guaranteed freq can be at any point, not necessarily
> the minimum, so this print is not very effective at the end of the day.

Makes me wonder why we have such a criterion at all.

Raag

Rodrigo Vivi Dec. 9, 2024, 4:28 p.m. UTC | #4
On Sat, Dec 07, 2024 at 08:14:42AM +0200, Raag Jadav wrote:
> Cc: Chris
> 
> On Fri, Dec 06, 2024 at 10:45:18AM -0500, Rodrigo Vivi wrote:
> > On Thu, Dec 05, 2024 at 01:44:13PM +0530, Raag Jadav wrote:
> > >  		if (limit == rps->min_freq) {
> > 
> > I was going to merge this, but then I noticed that this only prints
> > when throttling brings the frequency down to our min_freq... When PCODE
> > throttles the freq, the guaranteed freq can be at any point, not necessarily
> > the minimum, so this print is not very effective at the end of the day.
> 
> Makes me wonder why we have such a criterion at all.

Very good question...
Perhaps we need to revamp this selftest entirely, or kill it?

> 
> Raag
> 
Raag Jadav Dec. 10, 2024, 8:53 a.m. UTC | #5
On Mon, Dec 09, 2024 at 11:28:39AM -0500, Rodrigo Vivi wrote:
> On Sat, Dec 07, 2024 at 08:14:42AM +0200, Raag Jadav wrote:
> > Cc: Chris
> > 
> > On Fri, Dec 06, 2024 at 10:45:18AM -0500, Rodrigo Vivi wrote:
> > > On Thu, Dec 05, 2024 at 01:44:13PM +0530, Raag Jadav wrote:
> > > >  		if (limit == rps->min_freq) {
> > > 
> > > I was going to merge this, but then I noticed that this only prints
> > > when throttling brings the frequency down to our min_freq... When PCODE
> > > throttles the freq, the guaranteed freq can be at any point, not necessarily
> > > the minimum, so this print is not very effective at the end of the day.
> > 
> > Makes me wonder why we have such a criterion at all.
> 
> Very good question...
> Perhaps we need to revamp this selftest entirely, or kill it?

It depends. Do we consider throttling a failure?
If yes, we'll keep hitting this every now and then.
If no, then just dropping this condition might be enough.

Raag

Rodrigo Vivi Dec. 10, 2024, 10:38 p.m. UTC | #6
On Tue, Dec 10, 2024 at 10:53:10AM +0200, Raag Jadav wrote:
> On Mon, Dec 09, 2024 at 11:28:39AM -0500, Rodrigo Vivi wrote:
> > On Sat, Dec 07, 2024 at 08:14:42AM +0200, Raag Jadav wrote:
> > > Cc: Chris
> > > 
> > > On Fri, Dec 06, 2024 at 10:45:18AM -0500, Rodrigo Vivi wrote:
> > > > On Thu, Dec 05, 2024 at 01:44:13PM +0530, Raag Jadav wrote:
> > > > >  		if (limit == rps->min_freq) {
> > > > 
> > > > I was going to merge this, but then I noticed that this only prints
> > > > when throttling brings the frequency down to our min_freq... When PCODE
> > > > throttles the freq, the guaranteed freq can be at any point, not necessarily
> > > > the minimum, so this print is not very effective at the end of the day.
> > > 
> > > Makes me wonder why we have such a criterion at all.
> > 
> > Very good question...
> > Perhaps we need to revamp this selftest entirely, or kill it?
> 
> It depends. Do we consider throttling a failure?
> If yes, we'll keep hitting this every now and then.
> If no, then just dropping this condition might be enough.

Hmm, that will make CI angry... We can remove the condition and then
tone the message down from error to debug level.

But perhaps the test was written with the assumption in mind that
a throttle to the minimum is a catastrophic error, which I disagree with.

Throttling is normal operation; it depends on many external factors
that are out of our control and that change from platform to platform.

> 
> Raag
> 

Patch

diff --git a/drivers/gpu/drm/i915/gt/selftest_rps.c b/drivers/gpu/drm/i915/gt/selftest_rps.c
index dcef8d498919..1e0e59bc69b6 100644
--- a/drivers/gpu/drm/i915/gt/selftest_rps.c
+++ b/drivers/gpu/drm/i915/gt/selftest_rps.c
@@ -478,8 +478,11 @@ int live_rps_control(void *arg)
 			min, max, ktime_to_ns(min_dt), ktime_to_ns(max_dt));
 
 		if (limit == rps->min_freq) {
-			pr_err("%s: GPU throttled to minimum!\n",
-			       engine->name);
+			u32 throttle = intel_uncore_read(gt->uncore,
+							 intel_gt_perf_limit_reasons_reg(gt));
+
+			pr_err("%s: GPU throttled to minimum frequency with reasons 0x%08x\n",
+			       engine->name, throttle & GT0_PERF_LIMIT_REASONS_MASK);
 			show_pstate_limits(rps);
 			err = -ENODEV;
 			break;