Message ID | 20211112025222.61031-1-umesh.nerlige.ramappa@intel.com (mailing list archive) |
---|---|
State | New, archived |
Series | drm/i915/pmu: Increase the live_engine_busy_stats sample period |
On Thu, Nov 11, 2021 at 06:52:22PM -0800, Umesh Nerlige Ramappa wrote:
> Irrespective of the backend for request submissions, busyness for an
> engine with an active context is calculated using:
>
> busyness = total + (current_time - context_switch_in_time)
>
> In execlists mode of operation, the context switch events are handled
> by the CPU. Context switch in/out time and current_time are captured
> in CPU time domain using ktime_get().
>
> In GuC mode of submission, context switch events are handled by GuC and
> the times in the above formula are captured in GT clock domain. This
> information is shared with the CPU through shared memory. This results
> in 2 caveats:
>
> 1) The time taken between start of a batch and the time that CPU is able
> to see the context_switch_in_time in shared memory is dependent on GuC
> and memory bandwidth constraints.
>
> 2) Determining current_time requires an MMIO read that can take anywhere
> between a few us to a couple ms. A reference CPU time is captured soon
> after reading the MMIO so that the caller can compare the cpu delta
> between 2 busyness samples. The issue here is that the CPU delta and the
> busyness delta can be skewed because of the time taken to read the
> register.
>
> These 2 factors affect the accuracy of the selftest -
> live_engine_busy_stats. For (1) the selftest waits until busyness stats
> are visible to the CPU. The effects of (2) are more prominent for the
> current busyness sample period of 100 us. Increase the busyness sample
> period from 100 us to 10 ms to overcome (2).
>
> Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>

Explanation of the increased wait period makes sense to me.

With that:
Reviewed-by: Matthew Brost <matthew.brost@intel.com>

> ---
> drivers/gpu/drm/i915/gt/selftest_engine_pm.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/i915/gt/selftest_engine_pm.c b/drivers/gpu/drm/i915/gt/selftest_engine_pm.c
> index 0bfd738dbf3a..96cc565afa78 100644
> --- a/drivers/gpu/drm/i915/gt/selftest_engine_pm.c
> +++ b/drivers/gpu/drm/i915/gt/selftest_engine_pm.c
> @@ -316,7 +316,7 @@ static int live_engine_busy_stats(void *arg)
>  		ENGINE_TRACE(engine, "measuring busy time\n");
>  		preempt_disable();
>  		de = intel_engine_get_busy_time(engine, &t[0]);
> -		udelay(100);
> +		udelay(10000);
>  		de = ktime_sub(intel_engine_get_busy_time(engine, &t[1]), de);
>  		preempt_enable();
>  		dt = ktime_sub(t[1], t[0]);
> --
> 2.20.1
>
diff --git a/drivers/gpu/drm/i915/gt/selftest_engine_pm.c b/drivers/gpu/drm/i915/gt/selftest_engine_pm.c
index 0bfd738dbf3a..96cc565afa78 100644
--- a/drivers/gpu/drm/i915/gt/selftest_engine_pm.c
+++ b/drivers/gpu/drm/i915/gt/selftest_engine_pm.c
@@ -316,7 +316,7 @@ static int live_engine_busy_stats(void *arg)
 		ENGINE_TRACE(engine, "measuring busy time\n");
 		preempt_disable();
 		de = intel_engine_get_busy_time(engine, &t[0]);
-		udelay(100);
+		udelay(10000);
 		de = ktime_sub(intel_engine_get_busy_time(engine, &t[1]), de);
 		preempt_enable();
 		dt = ktime_sub(t[1], t[0]);
Irrespective of the backend for request submissions, busyness for an
engine with an active context is calculated using:

busyness = total + (current_time - context_switch_in_time)

In execlists mode of operation, the context switch events are handled
by the CPU. Context switch in/out time and current_time are captured
in CPU time domain using ktime_get().

In GuC mode of submission, context switch events are handled by GuC and
the times in the above formula are captured in GT clock domain. This
information is shared with the CPU through shared memory. This results
in 2 caveats:

1) The time taken between start of a batch and the time that CPU is able
to see the context_switch_in_time in shared memory is dependent on GuC
and memory bandwidth constraints.

2) Determining current_time requires an MMIO read that can take anywhere
between a few us to a couple ms. A reference CPU time is captured soon
after reading the MMIO so that the caller can compare the cpu delta
between 2 busyness samples. The issue here is that the CPU delta and the
busyness delta can be skewed because of the time taken to read the
register.

These 2 factors affect the accuracy of the selftest -
live_engine_busy_stats. For (1) the selftest waits until busyness stats
are visible to the CPU. The effects of (2) are more prominent for the
current busyness sample period of 100 us. Increase the busyness sample
period from 100 us to 10 ms to overcome (2).

Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
---
 drivers/gpu/drm/i915/gt/selftest_engine_pm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
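
For readers following the arithmetic behind caveat (2), here is a rough
userspace sketch, not driver code. It assumes a simplified model in which
the gap between the register read and the reference CPU timestamp can
differ by up to jitter_ns between the two samples, so the worst-case
mismatch between the busyness delta and the CPU delta scales as
jitter / sample period. The helper names busyness_ns() and
worst_case_skew_permille(), and the 50 us jitter used below, are
hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * busyness = total + (current_time - context_switch_in_time)
 * All values are in nanoseconds and must come from the same clock domain.
 */
static inline uint64_t busyness_ns(uint64_t total_ns, uint64_t now_ns,
				   uint64_t switch_in_ns)
{
	assert(now_ns >= switch_in_ns);
	return total_ns + (now_ns - switch_in_ns);
}

/*
 * Assumed model: worst-case mismatch between the busyness delta and the
 * CPU delta, in parts per thousand of the sample period, when the two
 * samples can be offset from their CPU timestamps by up to jitter_ns.
 */
static inline uint64_t worst_case_skew_permille(uint64_t period_ns,
						uint64_t jitter_ns)
{
	assert(period_ns > 0);
	return (1000 * jitter_ns) / period_ns;
}
```

Under this model, a hypothetical 50 us of jitter gives
worst_case_skew_permille(100000, 50000) == 500 (50%) for the old 100 us
period, and worst_case_skew_permille(10000000, 50000) == 5 (0.5%) for the
new 10 ms period, which is the effect the longer delay is meant to achieve.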