Message ID | 20221105003235.1717908-3-umesh.nerlige.ramappa@intel.com |
---|---|
State | New, archived |
Series | Fix live busy stats selftest failure |
On 05/11/2022 00:32, Umesh Nerlige Ramappa wrote:
> Engine busyness samples around a 10ms period is failing with busyness
> ranging approx. from 87% to 115%. The expected range is +/- 5% of the
> sample period.
>
> When determining busyness of active engine, the GuC based engine
> busyness implementation relies on a 64 bit timestamp register read. The
> latency incurred by this register read causes the failure.
>
> On DG1, when the test fails, the observed latencies range from 900us -
> 1.5ms.

Is it at all faster with the locked 2x32 or still the same unexplained
display related latencies can happen?

> One solution tried was to reduce the latency between reg read and
> CPU timestamp capture, but such optimization does not add value to user
> since the CPU timestamp obtained here is only used for (1) selftest and
> (2) i915 rps implementation specific to execlist scheduler. Also, this
> solution only reduces the frequency of failure and does not eliminate
> it.
>
> In order to make the selftest more robust and account for such
> latencies, increase the sample period to 100 ms.
>
> Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
> ---
>  drivers/gpu/drm/i915/gt/selftest_engine_pm.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/i915/gt/selftest_engine_pm.c b/drivers/gpu/drm/i915/gt/selftest_engine_pm.c
> index 0dcb3ed44a73..87c94314cf67 100644
> --- a/drivers/gpu/drm/i915/gt/selftest_engine_pm.c
> +++ b/drivers/gpu/drm/i915/gt/selftest_engine_pm.c
> @@ -317,7 +317,7 @@ static int live_engine_busy_stats(void *arg)
>  		ENGINE_TRACE(engine, "measuring busy time\n");
>  		preempt_disable();
>  		de = intel_engine_get_busy_time(engine, &t[0]);
> -		mdelay(10);
> +		mdelay(100);
>  		de = ktime_sub(intel_engine_get_busy_time(engine, &t[1]), de);
>  		preempt_enable();
>  		dt = ktime_sub(t[1], t[0]);

Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko
On Mon, Nov 07, 2022 at 10:16:20AM +0000, Tvrtko Ursulin wrote:
>
>On 05/11/2022 00:32, Umesh Nerlige Ramappa wrote:
>>Engine busyness samples around a 10ms period is failing with busyness
>>ranging approx. from 87% to 115%. The expected range is +/- 5% of the
>>sample period.
>>
>>When determining busyness of active engine, the GuC based engine
>>busyness implementation relies on a 64 bit timestamp register read. The
>>latency incurred by this register read causes the failure.
>>
>>On DG1, when the test fails, the observed latencies range from 900us -
>>1.5ms.
>
>Is it at all faster with the locked 2x32 or still the same unexplained
>display related latencies can happen?

Originally this failed 1 in 10 runs; the locked 2x32 patch in this series
reduces the failure rate to 1 in 50.

What really helps is taking the CPU timestamp within the forcewake block.
Then the correlation between GPU and CPU times is very good, and that
reduces the selftest failure frequency to 1 in 200. More like this:

uncore_lock
fw_get
read 64-bit GPU time
read CPU timestamp
fw_put
uncore_unlock

I recall we had arrived at this sequence in the past when implementing
query_cs_cycles - https://patchwork.freedesktop.org/patch/432041/?series=89766&rev=1

I still included the locked 2x32 patch here because 1 in 50 is still
better than 1 in 10.

For now, a 100 ms sample period is the only promising solution I see:
no failures in 1000 runs.

Thanks,
Umesh

>
>>One solution tried was to reduce the latency between reg read and
>>CPU timestamp capture, but such optimization does not add value to user
>>since the CPU timestamp obtained here is only used for (1) selftest and
>>(2) i915 rps implementation specific to execlist scheduler. Also, this
>>solution only reduces the frequency of failure and does not eliminate
>>it.
>>
>>In order to make the selftest more robust and account for such
>>latencies, increase the sample period to 100 ms.
>>
>>Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
>>---
>> drivers/gpu/drm/i915/gt/selftest_engine_pm.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>>diff --git a/drivers/gpu/drm/i915/gt/selftest_engine_pm.c b/drivers/gpu/drm/i915/gt/selftest_engine_pm.c
>>index 0dcb3ed44a73..87c94314cf67 100644
>>--- a/drivers/gpu/drm/i915/gt/selftest_engine_pm.c
>>+++ b/drivers/gpu/drm/i915/gt/selftest_engine_pm.c
>>@@ -317,7 +317,7 @@ static int live_engine_busy_stats(void *arg)
>> 		ENGINE_TRACE(engine, "measuring busy time\n");
>> 		preempt_disable();
>> 		de = intel_engine_get_busy_time(engine, &t[0]);
>>-		mdelay(10);
>>+		mdelay(100);
>> 		de = ktime_sub(intel_engine_get_busy_time(engine, &t[1]), de);
>> 		preempt_enable();
>> 		dt = ktime_sub(t[1], t[0]);
>
>Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>
>Regards,
>
>Tvrtko
On Fri, 04 Nov 2022 17:32:35 -0700, Umesh Nerlige Ramappa wrote:
>
> Engine busyness samples around a 10ms period is failing with busyness
> ranging approx. from 87% to 115%. The expected range is +/- 5% of the
> sample period.
>
> When determining busyness of active engine, the GuC based engine
> busyness implementation relies on a 64 bit timestamp register read. The
> latency incurred by this register read causes the failure.
>
> On DG1, when the test fails, the observed latencies range from 900us -
> 1.5ms.
>
> One solution tried was to reduce the latency between reg read and
> CPU timestamp capture, but such optimization does not add value to user
> since the CPU timestamp obtained here is only used for (1) selftest and
> (2) i915 rps implementation specific to execlist scheduler. Also, this
> solution only reduces the frequency of failure and does not eliminate
> it.
>
> In order to make the selftest more robust and account for such
> latencies, increase the sample period to 100 ms.

Hi Umesh,

I think it would be good to add to the commit message:

* The GitLab bug number, if any
* A paste of the actual dmesg error
* An update reflecting the fact that we've now added the optimized 64 bit
  read

With that this is:

Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>

If you want me to review the new commit message I can do that too.

Thanks.

-- 
Ashutosh

>
> Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
> ---
>  drivers/gpu/drm/i915/gt/selftest_engine_pm.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/i915/gt/selftest_engine_pm.c b/drivers/gpu/drm/i915/gt/selftest_engine_pm.c
> index 0dcb3ed44a73..87c94314cf67 100644
> --- a/drivers/gpu/drm/i915/gt/selftest_engine_pm.c
> +++ b/drivers/gpu/drm/i915/gt/selftest_engine_pm.c
> @@ -317,7 +317,7 @@ static int live_engine_busy_stats(void *arg)
>  		ENGINE_TRACE(engine, "measuring busy time\n");
>  		preempt_disable();
>  		de = intel_engine_get_busy_time(engine, &t[0]);
> -		mdelay(10);
> +		mdelay(100);
>  		de = ktime_sub(intel_engine_get_busy_time(engine, &t[1]), de);
>  		preempt_enable();
>  		dt = ktime_sub(t[1], t[0]);
> --
> 2.36.1
>
```diff
diff --git a/drivers/gpu/drm/i915/gt/selftest_engine_pm.c b/drivers/gpu/drm/i915/gt/selftest_engine_pm.c
index 0dcb3ed44a73..87c94314cf67 100644
--- a/drivers/gpu/drm/i915/gt/selftest_engine_pm.c
+++ b/drivers/gpu/drm/i915/gt/selftest_engine_pm.c
@@ -317,7 +317,7 @@ static int live_engine_busy_stats(void *arg)
 		ENGINE_TRACE(engine, "measuring busy time\n");
 		preempt_disable();
 		de = intel_engine_get_busy_time(engine, &t[0]);
-		mdelay(10);
+		mdelay(100);
 		de = ktime_sub(intel_engine_get_busy_time(engine, &t[1]), de);
 		preempt_enable();
 		dt = ktime_sub(t[1], t[0]);
```
Engine busyness samples taken over a 10ms period are failing, with busyness
ranging from approx. 87% to 115%. The expected range is +/- 5% of the
sample period.

When determining the busyness of an active engine, the GuC-based engine
busyness implementation relies on a 64-bit timestamp register read. The
latency incurred by this register read causes the failure.

On DG1, when the test fails, the observed latencies range from 900us to
1.5ms.

One solution tried was to reduce the latency between the register read and
the CPU timestamp capture, but such an optimization does not add value for
the user, since the CPU timestamp obtained here is only used for (1) the
selftest and (2) the i915 rps implementation specific to the execlist
scheduler. Also, this solution only reduces the frequency of failure and
does not eliminate it.

To make the selftest more robust and account for such latencies, increase
the sample period to 100 ms.

Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
---
 drivers/gpu/drm/i915/gt/selftest_engine_pm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)