[i-g-t,v2] tests/perf_pmu: Test busyness reporting in face of GPU hangs

Message ID 20180228171519.19657-1-tvrtko.ursulin@linux.intel.com (mailing list archive)
State New, archived

Commit Message

Tvrtko Ursulin Feb. 28, 2018, 5:15 p.m. UTC
From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Verify that the reported busyness is in line with what we would expect
from a batch which causes a hang and gets kicked out from the engine.

v2: Change to explicit igt_force_gpu_reset instead of guessing when a spin
    batch will hang. (Chris Wilson)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 tests/perf_pmu.c | 24 ++++++++++++++++++++++--
 1 file changed, 22 insertions(+), 2 deletions(-)

Comments

Chris Wilson March 1, 2018, 8:08 a.m. UTC | #1
Quoting Tvrtko Ursulin (2018-02-28 17:15:19)
> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> 
> Verify that the reported busyness is in line with what we would expect
> from a batch which causes a hang and gets kicked out from the engine.
> 
> v2: Change to explicit igt_force_gpu_reset instead of guessing when a spin
>     batch will hang. (Chris Wilson)
> 
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

It's nice and quick, yes. However, sometimes the opposite is true and you
have to wait for the batch you want to hang to start before pulling the trigger.

I'd put a usleep(100) in there /* Wait for batch to execute */ and we
should put the wait-for-execution ability in igt_spin_t. Unfortunately
that requires MI_STORE_DWORD_IMM (or more creativity), limiting its
availability.

With the sleep issue addressed (commented upon if nothing else),
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
-Chris
Tvrtko Ursulin March 1, 2018, 9:21 a.m. UTC | #2
On 01/03/2018 08:08, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2018-02-28 17:15:19)
>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>
>> Verify that the reported busyness is in line with what we would expect
>> from a batch which causes a hang and gets kicked out from the engine.
>>
>> v2: Change to explicit igt_force_gpu_reset instead of guessing when a spin
>>      batch will hang. (Chris Wilson)
>>
>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> 
> It's nice and quick, yes. However, sometimes the opposite is true and you
> have to wait for the batch you want to hang to start before pulling the trigger.
> 
> I'd put a usleep(100) in there /* Wait for batch to execute */ and we

Hm, but the reset is triggered after the first sleep (which checks for 100% 
busy). So even the non-hanging test flavour could be affected if the delay 
before execution is that long. By this logic, this usleep(100) (or more 
for small core CI) should then go into many tests. The only difference in the 
hang flavour is that, if the batch completely failed to run until after the 
reset, the idle assert would fail. So maybe I should just add an 
assert that the batch is idle before sampling the PMU after the reset?

Regards,

Tvrtko

> should put the wait-for-execution ability in igt_spin_t. Unfortunately
> that requires MI_STORE_DWORD_IMM (or more creativity), limiting its
> availability.
> 
> With the sleep issue addressed (commented upon if nothing else),
> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
> -Chris
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
>
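
[Editor's note] Tvrtko's idle-assert idea might look roughly like the fragment below. This is an illustrative sketch only, not runnable standalone: it assumes the IGT helpers from igt.h, the `gem_bo_busy()` wrapper, and the `spin->handle` field of `igt_spin_t`, and it paraphrases rather than reproduces the patch:

```c
/* Sketch (assumes IGT helpers): before sampling the PMU after the
 * forced reset, check that the spinner is no longer busy. If the
 * reset fired before the batch ever reached the engine, this assert
 * is where the test would trip, documenting that failure mode. */
if (flags & FLAG_HANG) {
	igt_force_gpu_reset(gem_fd);

	/* Batch must have been kicked off the engine by the reset. */
	igt_assert(!gem_bo_busy(gem_fd, spin->handle));
}
```

This folds the "reset before batch execution" case Chris mentions into an explicit, commented assert instead of an extra sleep.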
Chris Wilson March 1, 2018, 9:27 a.m. UTC | #3
Quoting Tvrtko Ursulin (2018-03-01 09:21:52)
> 
> On 01/03/2018 08:08, Chris Wilson wrote:
> > Quoting Tvrtko Ursulin (2018-02-28 17:15:19)
> >> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> >>
> >> Verify that the reported busyness is in line with what we would expect
> >> from a batch which causes a hang and gets kicked out from the engine.
> >>
> >> v2: Change to explicit igt_force_gpu_reset instead of guessing when a spin
> >>      batch will hang. (Chris Wilson)
> >>
> >> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> > 
> > It's nice and quick, yes. However, sometimes the opposite is true and you
> > have to wait for the batch you want to hang to start before pulling the trigger.
> > 
> > I'd put a usleep(100) in there /* Wait for batch to execute */ and we
> 
> Hm, but the reset is triggered after the first sleep (which checks for 100% 
> busy). So even the non-hanging test flavour could be affected if the delay 
> before execution is that long. By this logic, this usleep(100) (or more 
> for small core CI) should then go into many tests. The only difference in the 
> hang flavour is that, if the batch completely failed to run until after the 
> reset, the idle assert would fail. So maybe I should just add an 
> assert that the batch is idle before sampling the PMU after the reset?

Sneaky. Yes, that will work, just add a comment for the case where it
may fail (reset before batch execution).
-Chris

Patch

diff --git a/tests/perf_pmu.c b/tests/perf_pmu.c
index 3bbb18d2f216..f5c70776e2cf 100644
--- a/tests/perf_pmu.c
+++ b/tests/perf_pmu.c
@@ -168,6 +168,7 @@  static unsigned int e2ring(int gem_fd, const struct intel_execution_engine2 *e)
 #define TEST_TRAILING_IDLE (4)
 #define TEST_RUNTIME_PM (8)
 #define FLAG_LONG (16)
+#define FLAG_HANG (32)
 
 static void end_spin(int fd, igt_spin_t *spin, unsigned int flags)
 {
@@ -204,11 +205,27 @@  single(int gem_fd, const struct intel_execution_engine2 *e, unsigned int flags)
 		end_spin(gem_fd, spin, flags);
 	val = pmu_read_single(fd) - val;
 
-	end_spin(gem_fd, spin, FLAG_SYNC);
+	if (flags & FLAG_HANG)
+		igt_force_gpu_reset(gem_fd);
+	else
+		end_spin(gem_fd, spin, FLAG_SYNC);
+
+	assert_within_epsilon(val, flags & TEST_BUSY ? slept : 0.f, tolerance);
+
+	/* Check for idle after hang. */
+	if (flags & FLAG_HANG) {
+		/* Sleep for a bit for reset unwind to settle. */
+		sleep(1);
+		val = pmu_read_single(fd);
+		slept = measured_usleep(batch_duration_ns / 1000);
+		val = pmu_read_single(fd) - val;
+
+		assert_within_epsilon(val, 0, tolerance);
+	}
+
 	igt_spin_batch_free(gem_fd, spin);
 	close(fd);
 
-	assert_within_epsilon(val, flags & TEST_BUSY ? slept : 0.f, tolerance);
 	gem_quiescent_gpu(gem_fd);
 }
 
@@ -1690,6 +1707,9 @@  igt_main
 					      pct[i], e->name)
 					accuracy(fd, e, pct[i]);
 			}
+
+			igt_subtest_f("busy-hang-%s", e->name)
+				single(fd, e, TEST_BUSY | FLAG_HANG);
 		}
 
 		/**