Message ID | 20241205081413.1529252-1-raag.jadav@intel.com (mailing list archive) |
---|---|
State | New |
Series | [v1] drm/i915/selftest: Log throttle reasons on failure |
Hi Raag,

On Thu, Dec 05, 2024 at 01:44:13PM +0530, Raag Jadav wrote:
> Log throttle reasons on selftest failure which will be useful for
> debugging.
>
> Signed-off-by: Raag Jadav <raag.jadav@intel.com>

Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>

Thanks,
Andi
On Thu, Dec 05, 2024 at 01:44:13PM +0530, Raag Jadav wrote:
> Log throttle reasons on selftest failure which will be useful for
> debugging.
>
> Signed-off-by: Raag Jadav <raag.jadav@intel.com>
> ---
>  drivers/gpu/drm/i915/gt/selftest_rps.c | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gt/selftest_rps.c b/drivers/gpu/drm/i915/gt/selftest_rps.c
> index dcef8d498919..1e0e59bc69b6 100644
> --- a/drivers/gpu/drm/i915/gt/selftest_rps.c
> +++ b/drivers/gpu/drm/i915/gt/selftest_rps.c
> @@ -478,8 +478,11 @@ int live_rps_control(void *arg)
>  			 min, max, ktime_to_ns(min_dt), ktime_to_ns(max_dt));
>
>  		if (limit == rps->min_freq) {

I was going to merge this, but then I noticed that this prints only
when the throttle moves us to our min_freq... When PCODE throttles
the freq, the guaranteed freq can be at any point, not necessarily
at the minimum, so this print is not very effective at the end of the day.

> -			pr_err("%s: GPU throttled to minimum!\n",
> -			       engine->name);
> +			u32 throttle = intel_uncore_read(gt->uncore,
> +							 intel_gt_perf_limit_reasons_reg(gt));
> +
> +			pr_err("%s: GPU throttled to minimum frequency with reasons 0x%08x\n",
> +			       engine->name, throttle & GT0_PERF_LIMIT_REASONS_MASK);
>  			show_pstate_limits(rps);
>  			err = -ENODEV;
>  			break;
> --
> 2.34.1
>
Cc: Chris

On Fri, Dec 06, 2024 at 10:45:18AM -0500, Rodrigo Vivi wrote:
> On Thu, Dec 05, 2024 at 01:44:13PM +0530, Raag Jadav wrote:
> > Log throttle reasons on selftest failure which will be useful for
> > debugging.
> >
> > Signed-off-by: Raag Jadav <raag.jadav@intel.com>
> > ---
> >  drivers/gpu/drm/i915/gt/selftest_rps.c | 7 +++++--
> >  1 file changed, 5 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/gt/selftest_rps.c b/drivers/gpu/drm/i915/gt/selftest_rps.c
> > index dcef8d498919..1e0e59bc69b6 100644
> > --- a/drivers/gpu/drm/i915/gt/selftest_rps.c
> > +++ b/drivers/gpu/drm/i915/gt/selftest_rps.c
> > @@ -478,8 +478,11 @@ int live_rps_control(void *arg)
> >  			 min, max, ktime_to_ns(min_dt), ktime_to_ns(max_dt));
> >
> >  		if (limit == rps->min_freq) {
>
> I was going to merge this, but then I noticed that this prints only
> when the throttle moves us to our min_freq... When PCODE throttles
> the freq, the guaranteed freq can be at any point, not necessarily
> at the minimum, so this print is not very effective at the end of the day.

Makes me wonder why such a criterion at all?

Raag

> > -			pr_err("%s: GPU throttled to minimum!\n",
> > -			       engine->name);
> > +			u32 throttle = intel_uncore_read(gt->uncore,
> > +							 intel_gt_perf_limit_reasons_reg(gt));
> > +
> > +			pr_err("%s: GPU throttled to minimum frequency with reasons 0x%08x\n",
> > +			       engine->name, throttle & GT0_PERF_LIMIT_REASONS_MASK);
> >  			show_pstate_limits(rps);
> >  			err = -ENODEV;
> >  			break;
> > --
> > 2.34.1
> >
On Sat, Dec 07, 2024 at 08:14:42AM +0200, Raag Jadav wrote:
> Cc: Chris
>
> On Fri, Dec 06, 2024 at 10:45:18AM -0500, Rodrigo Vivi wrote:
> > On Thu, Dec 05, 2024 at 01:44:13PM +0530, Raag Jadav wrote:
> > > Log throttle reasons on selftest failure which will be useful for
> > > debugging.
> > >
> > > Signed-off-by: Raag Jadav <raag.jadav@intel.com>
> > > ---
> > >  drivers/gpu/drm/i915/gt/selftest_rps.c | 7 +++++--
> > >  1 file changed, 5 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/i915/gt/selftest_rps.c b/drivers/gpu/drm/i915/gt/selftest_rps.c
> > > index dcef8d498919..1e0e59bc69b6 100644
> > > --- a/drivers/gpu/drm/i915/gt/selftest_rps.c
> > > +++ b/drivers/gpu/drm/i915/gt/selftest_rps.c
> > > @@ -478,8 +478,11 @@ int live_rps_control(void *arg)
> > >  			 min, max, ktime_to_ns(min_dt), ktime_to_ns(max_dt));
> > >
> > >  		if (limit == rps->min_freq) {
> >
> > I was going to merge this, but then I noticed that this prints only
> > when the throttle moves us to our min_freq... When PCODE throttles
> > the freq, the guaranteed freq can be at any point, not necessarily
> > at the minimum, so this print is not very effective at the end of the day.
>
> Makes me wonder why such a criterion at all?

Very good question...
Perhaps we need to entirely revamp this selftest or kill it?

>
> Raag
>
> > > -			pr_err("%s: GPU throttled to minimum!\n",
> > > -			       engine->name);
> > > +			u32 throttle = intel_uncore_read(gt->uncore,
> > > +							 intel_gt_perf_limit_reasons_reg(gt));
> > > +
> > > +			pr_err("%s: GPU throttled to minimum frequency with reasons 0x%08x\n",
> > > +			       engine->name, throttle & GT0_PERF_LIMIT_REASONS_MASK);
> > >  			show_pstate_limits(rps);
> > >  			err = -ENODEV;
> > >  			break;
> > > --
> > > 2.34.1
> > >
On Mon, Dec 09, 2024 at 11:28:39AM -0500, Rodrigo Vivi wrote:
> On Sat, Dec 07, 2024 at 08:14:42AM +0200, Raag Jadav wrote:
> > Cc: Chris
> >
> > On Fri, Dec 06, 2024 at 10:45:18AM -0500, Rodrigo Vivi wrote:
> > > On Thu, Dec 05, 2024 at 01:44:13PM +0530, Raag Jadav wrote:
> > > > Log throttle reasons on selftest failure which will be useful for
> > > > debugging.
> > > >
> > > > Signed-off-by: Raag Jadav <raag.jadav@intel.com>
> > > > ---
> > > >  drivers/gpu/drm/i915/gt/selftest_rps.c | 7 +++++--
> > > >  1 file changed, 5 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/drivers/gpu/drm/i915/gt/selftest_rps.c b/drivers/gpu/drm/i915/gt/selftest_rps.c
> > > > index dcef8d498919..1e0e59bc69b6 100644
> > > > --- a/drivers/gpu/drm/i915/gt/selftest_rps.c
> > > > +++ b/drivers/gpu/drm/i915/gt/selftest_rps.c
> > > > @@ -478,8 +478,11 @@ int live_rps_control(void *arg)
> > > >  			 min, max, ktime_to_ns(min_dt), ktime_to_ns(max_dt));
> > > >
> > > >  		if (limit == rps->min_freq) {
> > >
> > > I was going to merge this, but then I noticed that this prints only
> > > when the throttle moves us to our min_freq... When PCODE throttles
> > > the freq, the guaranteed freq can be at any point, not necessarily
> > > at the minimum, so this print is not very effective at the end of the day.
> >
> > Makes me wonder why such a criterion at all?
>
> Very good question...
> Perhaps we need to entirely revamp this selftest or kill it?

Depends. Do we qualify throttling as a failure?
If yes, we'll keep hitting this every now and then.
If no, then just dropping this condition might be enough.

Raag

> > > > -			pr_err("%s: GPU throttled to minimum!\n",
> > > > -			       engine->name);
> > > > +			u32 throttle = intel_uncore_read(gt->uncore,
> > > > +							 intel_gt_perf_limit_reasons_reg(gt));
> > > > +
> > > > +			pr_err("%s: GPU throttled to minimum frequency with reasons 0x%08x\n",
> > > > +			       engine->name, throttle & GT0_PERF_LIMIT_REASONS_MASK);
> > > >  			show_pstate_limits(rps);
> > > >  			err = -ENODEV;
> > > >  			break;
> > > > --
> > > > 2.34.1
> > > >
On Tue, Dec 10, 2024 at 10:53:10AM +0200, Raag Jadav wrote:
> On Mon, Dec 09, 2024 at 11:28:39AM -0500, Rodrigo Vivi wrote:
> > On Sat, Dec 07, 2024 at 08:14:42AM +0200, Raag Jadav wrote:
> > > Cc: Chris
> > >
> > > On Fri, Dec 06, 2024 at 10:45:18AM -0500, Rodrigo Vivi wrote:
> > > > On Thu, Dec 05, 2024 at 01:44:13PM +0530, Raag Jadav wrote:
> > > > > Log throttle reasons on selftest failure which will be useful for
> > > > > debugging.
> > > > >
> > > > > Signed-off-by: Raag Jadav <raag.jadav@intel.com>
> > > > > ---
> > > > >  drivers/gpu/drm/i915/gt/selftest_rps.c | 7 +++++--
> > > > >  1 file changed, 5 insertions(+), 2 deletions(-)
> > > > >
> > > > > diff --git a/drivers/gpu/drm/i915/gt/selftest_rps.c b/drivers/gpu/drm/i915/gt/selftest_rps.c
> > > > > index dcef8d498919..1e0e59bc69b6 100644
> > > > > --- a/drivers/gpu/drm/i915/gt/selftest_rps.c
> > > > > +++ b/drivers/gpu/drm/i915/gt/selftest_rps.c
> > > > > @@ -478,8 +478,11 @@ int live_rps_control(void *arg)
> > > > >  			 min, max, ktime_to_ns(min_dt), ktime_to_ns(max_dt));
> > > > >
> > > > >  		if (limit == rps->min_freq) {
> > > >
> > > > I was going to merge this, but then I noticed that this prints only
> > > > when the throttle moves us to our min_freq... When PCODE throttles
> > > > the freq, the guaranteed freq can be at any point, not necessarily
> > > > at the minimum, so this print is not very effective at the end of the day.
> > >
> > > Makes me wonder why such a criterion at all?
> >
> > Very good question...
> > Perhaps we need to entirely revamp this selftest or kill it?
>
> Depends. Do we qualify throttling as a failure?
> If yes, we'll keep hitting this every now and then.
> If no, then just dropping this condition might be enough.

Hmm, that will make CI angry... We can remove the condition and then
tune the message down to debug rather than error.

But perhaps the test was written on the assumption that a throttle
to the minimum is a catastrophic error, which I disagree with.
Throttling is normal operation. It depends on many external factors,
many of which are out of our control and change from platform to
platform.

>
> Raag
>
> > > > > -			pr_err("%s: GPU throttled to minimum!\n",
> > > > > -			       engine->name);
> > > > > +			u32 throttle = intel_uncore_read(gt->uncore,
> > > > > +							 intel_gt_perf_limit_reasons_reg(gt));
> > > > > +
> > > > > +			pr_err("%s: GPU throttled to minimum frequency with reasons 0x%08x\n",
> > > > > +			       engine->name, throttle & GT0_PERF_LIMIT_REASONS_MASK);
> > > > >  			show_pstate_limits(rps);
> > > > >  			err = -ENODEV;
> > > > >  			break;
> > > > > --
> > > > > 2.34.1
> > > > >
diff --git a/drivers/gpu/drm/i915/gt/selftest_rps.c b/drivers/gpu/drm/i915/gt/selftest_rps.c
index dcef8d498919..1e0e59bc69b6 100644
--- a/drivers/gpu/drm/i915/gt/selftest_rps.c
+++ b/drivers/gpu/drm/i915/gt/selftest_rps.c
@@ -478,8 +478,11 @@ int live_rps_control(void *arg)
 			 min, max, ktime_to_ns(min_dt), ktime_to_ns(max_dt));
 
 		if (limit == rps->min_freq) {
-			pr_err("%s: GPU throttled to minimum!\n",
-			       engine->name);
+			u32 throttle = intel_uncore_read(gt->uncore,
+							 intel_gt_perf_limit_reasons_reg(gt));
+
+			pr_err("%s: GPU throttled to minimum frequency with reasons 0x%08x\n",
+			       engine->name, throttle & GT0_PERF_LIMIT_REASONS_MASK);
 			show_pstate_limits(rps);
 			err = -ENODEV;
 			break;
Log throttle reasons on selftest failure which will be useful for
debugging.

Signed-off-by: Raag Jadav <raag.jadav@intel.com>
---
 drivers/gpu/drm/i915/gt/selftest_rps.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)
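
For illustration only, here is a rough userspace sketch of the direction discussed in the review: report the throttle reasons whenever the reachable frequency falls short of the maximum, and treat that as information rather than a test failure. It is not kernel code and not a proposed patch. `struct rps_model`, `report_throttle()` and `PERF_LIMIT_REASONS_MASK` are hypothetical stand-ins, the mask value is assumed, and `limit` is assumed to be the highest frequency the test managed to reach.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for GT0_PERF_LIMIT_REASONS_MASK; the value is assumed. */
#define PERF_LIMIT_REASONS_MASK 0xde3u

/* Hypothetical model of the two rps fields this check cares about. */
struct rps_model {
	unsigned int min_freq;
	unsigned int max_freq;
};

/*
 * Format a throttle report into buf when limit is below max_freq.
 * Returns 1 if a report was written and 0 otherwise. It never signals
 * an error: in this sketch throttling is normal operation, so a caller
 * would log the text at debug level and let the test carry on.
 */
static int report_throttle(const struct rps_model *rps, unsigned int limit,
			   uint32_t reasons_reg, char *buf, size_t len)
{
	if (limit >= rps->max_freq) {
		if (len)
			buf[0] = '\0';
		return 0;
	}

	/* Keep only the bits covered by the (assumed) reasons mask. */
	snprintf(buf, len,
		 "GPU throttled to %u (min %u, max %u) with reasons 0x%08x",
		 limit, rps->min_freq, rps->max_freq,
		 (unsigned int)(reasons_reg & PERF_LIMIT_REASONS_MASK));
	return 1;
}
```

In the selftest itself the equivalent change would presumably mean reading the register as the patch already does, printing with a debug-level helper instead of pr_err(), and not setting err. Whether that is the right policy is exactly what the thread leaves open.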