Message ID | 20231201122109.729006-1-tvrtko.ursulin@linux.intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [1/2] drm/i915/selftests: Fix engine reset count storage for multi-tile | expand |
Hi Tvrtko, > Engine->id namespace is per-tile so struct igt_live_test->reset_engine[] > needs to be two-dimensional so engine reset counts from all tiles can be > stored with no aliasing. With aliasing, if we had a real multi-tile > platform, the reset counts would be incorrect for same engine instance on > different tiles. > > Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> > Fixes: 0c29efa23f5c ("drm/i915/selftests: Consider multi-gt instead of to_gt()") > Reported-by: Alan Previn Teres Alexis <alan.previn.teres.alexis@intel.com> > Cc: Tejas Upadhyay <tejas.upadhyay@intel.com> > Cc: Andi Shyti <andi.shyti@linux.intel.com> > Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> sorry for being late here... the patch makes sense to me and the CI failures don't look related. Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> Thanks, Andi
On 07/12/2023 11:26, Andi Shyti wrote: > Hi Tvrtko, > >> Engine->id namespace is per-tile so struct igt_live_test->reset_engine[] >> needs to be two-dimensional so engine reset counts from all tiles can be >> stored with no aliasing. With aliasing, if we had a real multi-tile >> platform, the reset counts would be incorrect for same engine instance on >> different tiles. >> >> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> >> Fixes: 0c29efa23f5c ("drm/i915/selftests: Consider multi-gt instead of to_gt()") >> Reported-by: Alan Previn Teres Alexis <alan.previn.teres.alexis@intel.com> >> Cc: Tejas Upadhyay <tejas.upadhyay@intel.com> >> Cc: Andi Shyti <andi.shyti@linux.intel.com> >> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> > > sorry for being late here... the patch makes sense to me and the > CI failures don't look related. > > Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> Thanks pushed! There is more work to be done with the fact i915_reset_engine_count has it's own aliasing when used like this, but I opted to leave that for some other time. Regards, Tvrtko
On Thu, Dec 07, 2023 at 11:43:28AM +0000, Tvrtko Ursulin wrote: > > On 07/12/2023 11:26, Andi Shyti wrote: > > Hi Tvrtko, > > > > > Engine->id namespace is per-tile so struct igt_live_test->reset_engine[] > > > needs to be two-dimensional so engine reset counts from all tiles can be > > > stored with no aliasing. With aliasing, if we had a real multi-tile > > > platform, the reset counts would be incorrect for same engine instance on > > > different tiles. > > > > > > Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> > > > Fixes: 0c29efa23f5c ("drm/i915/selftests: Consider multi-gt instead of to_gt()") > > > Reported-by: Alan Previn Teres Alexis <alan.previn.teres.alexis@intel.com> > > > Cc: Tejas Upadhyay <tejas.upadhyay@intel.com> > > > Cc: Andi Shyti <andi.shyti@linux.intel.com> > > > Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> > > > > sorry for being late here... the patch makes sense to me and the > > CI failures don't look related. > > > > Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> > > Thanks pushed! > > There is more work to be done with the fact i915_reset_engine_count has it's > own aliasing when used like this, but I opted to leave that for some other > time. feel free to share if you have some preparatory work done already and I can try to help out. Otherwise I can take a look at it, as well. Andi
On 07/12/2023 11:46, Andi Shyti wrote: > On Thu, Dec 07, 2023 at 11:43:28AM +0000, Tvrtko Ursulin wrote: >> >> On 07/12/2023 11:26, Andi Shyti wrote: >>> Hi Tvrtko, >>> >>>> Engine->id namespace is per-tile so struct igt_live_test->reset_engine[] >>>> needs to be two-dimensional so engine reset counts from all tiles can be >>>> stored with no aliasing. With aliasing, if we had a real multi-tile >>>> platform, the reset counts would be incorrect for same engine instance on >>>> different tiles. >>>> >>>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> >>>> Fixes: 0c29efa23f5c ("drm/i915/selftests: Consider multi-gt instead of to_gt()") >>>> Reported-by: Alan Previn Teres Alexis <alan.previn.teres.alexis@intel.com> >>>> Cc: Tejas Upadhyay <tejas.upadhyay@intel.com> >>>> Cc: Andi Shyti <andi.shyti@linux.intel.com> >>>> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> >>> >>> sorry for being late here... the patch makes sense to me and the >>> CI failures don't look related. >>> >>> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> >> >> Thanks pushed! >> >> There is more work to be done with the fact i915_reset_engine_count has it's >> own aliasing when used like this, but I opted to leave that for some other >> time. > > feel free to share if you have some preparatory work done already > and I can try to help out. Otherwise I can take a look at it, as > well. I don't have any patches I was just noticed when doing this that even though i915_reset_engine_count takes the engine as parameter, the i915->gpu_error is a single gt construct and as such I think using i915_reset_engine_count from per gt selftests is a mismatch. I thought options were to add engine reset counts in the engine itself and use that from selftests. Leaving i915_reset_engine_count to be used from error capture paths. And it probably needs to be renamed accordingly so it is not misleading. But then there may be issues around virtual engines though which this helper conveniently and quietly side stepped. At that point I stopped thinking about it, given how real multi-tile for i915 is not happening, I didn't see it worth the effort. Still the sour taste of a mess remains so if you can think of an elegant and relatively cheap solution I think it would be good to tidy. Regards, Tvrtko
diff --git a/drivers/gpu/drm/i915/selftests/igt_live_test.c b/drivers/gpu/drm/i915/selftests/igt_live_test.c index 4ddc6d902752..7d41874a49c5 100644 --- a/drivers/gpu/drm/i915/selftests/igt_live_test.c +++ b/drivers/gpu/drm/i915/selftests/igt_live_test.c @@ -37,8 +37,9 @@ int igt_live_test_begin(struct igt_live_test *t, } for_each_engine(engine, gt, id) - t->reset_engine[id] = - i915_reset_engine_count(&i915->gpu_error, engine); + t->reset_engine[i][id] = + i915_reset_engine_count(&i915->gpu_error, + engine); } t->reset_global = i915_reset_count(&i915->gpu_error); @@ -66,14 +67,14 @@ int igt_live_test_end(struct igt_live_test *t) for_each_gt(gt, i915, i) { for_each_engine(engine, gt, id) { - if (t->reset_engine[id] == + if (t->reset_engine[i][id] == i915_reset_engine_count(&i915->gpu_error, engine)) continue; gt_err(gt, "%s(%s): engine '%s' was reset %d times!\n", t->func, t->name, engine->name, i915_reset_engine_count(&i915->gpu_error, engine) - - t->reset_engine[id]); + t->reset_engine[i][id]); return -EIO; } } diff --git a/drivers/gpu/drm/i915/selftests/igt_live_test.h b/drivers/gpu/drm/i915/selftests/igt_live_test.h index 36ed42736c52..83e3ad430922 100644 --- a/drivers/gpu/drm/i915/selftests/igt_live_test.h +++ b/drivers/gpu/drm/i915/selftests/igt_live_test.h @@ -7,6 +7,7 @@ #ifndef IGT_LIVE_TEST_H #define IGT_LIVE_TEST_H +#include "gt/intel_gt_defines.h" /* for I915_MAX_GT */ #include "gt/intel_engine.h" /* for I915_NUM_ENGINES */ struct drm_i915_private; @@ -17,7 +18,7 @@ struct igt_live_test { const char *name; unsigned int reset_global; - unsigned int reset_engine[I915_NUM_ENGINES]; + unsigned int reset_engine[I915_MAX_GT][I915_NUM_ENGINES]; }; /*