Message ID | 20211011175704.28509-1-matthew.brost@intel.com (mailing list archive)
---|---
State | New, archived
Series | drm/i915/selftests: Increase timeout in requests perf selftest
On 10/11/2021 10:57, Matthew Brost wrote:
> perf_parallel_engines is micro benchmark to test i915 request
> scheduling. The test creates a thread per physical engine and submits
> NOP requests and waits the requests to complete in a loop. In execlists
> mode this works perfectly fine as powerful CPU has enough cores to feed
> each engine and process the CSBs. With GuC submission the uC gets
> overwhelmed as all threads feed into a single CTB channel and the GuC
> gets bombarded with CSBs as contexts are immediately switched in and out
> on the engines due to the zero runtime of the requests. When the GuC is
> overwhelmed scheduling of contexts is unfair due to the nature of the
> GuC scheduling algorithm. This behavior is understood and deemed
> acceptable as this micro benchmark isn't close to real world use case.
> Increasing the timeout of wait period for requests to complete. This
> makes the test understand that is ok for contexts to get starved in this
> scenario.
>
> A future patch / cleanup may just delete these micro benchmark tests as
> they basically mean nothing. We care about real workloads not made up
> ones.
>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>

Reviewed-by: John Harrison <John.C.Harrison@Intel.com>

> ---
>   drivers/gpu/drm/i915/selftests/i915_request.c | 4 ++--
>   1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/selftests/i915_request.c b/drivers/gpu/drm/i915/selftests/i915_request.c
> index d67710d10615..6496671a113c 100644
> --- a/drivers/gpu/drm/i915/selftests/i915_request.c
> +++ b/drivers/gpu/drm/i915/selftests/i915_request.c
> @@ -2805,7 +2805,7 @@ static int p_sync0(void *arg)
>   	i915_request_add(rq);
>
>   	err = 0;
> -	if (i915_request_wait(rq, 0, HZ / 5) < 0)
> +	if (i915_request_wait(rq, 0, HZ) < 0)
>   		err = -ETIME;
>   	i915_request_put(rq);
>   	if (err)
> @@ -2876,7 +2876,7 @@ static int p_sync1(void *arg)
>   	i915_request_add(rq);
>
>   	err = 0;
> -	if (prev && i915_request_wait(prev, 0, HZ / 5) < 0)
> +	if (prev && i915_request_wait(prev, 0, HZ) < 0)
>   		err = -ETIME;
>   	i915_request_put(prev);
>   	prev = rq;
On Wed, 2021-10-20 at 13:34 -0700, John Harrison wrote:
> On 10/11/2021 10:57, Matthew Brost wrote:
> > perf_parallel_engines is micro benchmark to test i915 request
> > scheduling. The test creates a thread per physical engine and submits
> > NOP requests and waits the requests to complete in a loop. In execlists
> > mode this works perfectly fine as powerful CPU has enough cores to feed
> > each engine and process the CSBs. With GuC submission the uC gets
> > overwhelmed as all threads feed into a single CTB channel and the GuC
> > gets bombarded with CSBs as contexts are immediately switched in and out
> > on the engines due to the zero runtime of the requests. When the GuC is
> > overwhelmed scheduling of contexts is unfair due to the nature of the
> > GuC scheduling algorithm. This behavior is understood and deemed
> > acceptable as this micro benchmark isn't close to real world use case.
> > Increasing the timeout of wait period for requests to complete. This
> > makes the test understand that is ok for contexts to get starved in this
> > scenario.
> >
> > A future patch / cleanup may just delete these micro benchmark tests as
> > they basically mean nothing. We care about real workloads not made up
> > ones.
> >
> > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> Reviewed-by: John Harrison <John.C.Harrison@Intel.com>

Also Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>

I think one important thing to keep in mind here is that this selftest
actually *did* find a flaw, albeit one that upon analysis turned out not
to be significant. But given that user-space clients like, for example,
gem_exec_suspend seem to be able to trigger similar delays as well, at
least to some extent, with a huge number of small requests submitted from
user-space, we should probably verify at some point that this isn't
exploitable by a malicious client starving other clients on the same
system.

/Thomas

> > ---
> >   drivers/gpu/drm/i915/selftests/i915_request.c | 4 ++--
> >   1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/selftests/i915_request.c b/drivers/gpu/drm/i915/selftests/i915_request.c
> > index d67710d10615..6496671a113c 100644
> > --- a/drivers/gpu/drm/i915/selftests/i915_request.c
> > +++ b/drivers/gpu/drm/i915/selftests/i915_request.c
> > @@ -2805,7 +2805,7 @@ static int p_sync0(void *arg)
> >   	i915_request_add(rq);
> >
> >   	err = 0;
> > -	if (i915_request_wait(rq, 0, HZ / 5) < 0)
> > +	if (i915_request_wait(rq, 0, HZ) < 0)
> >   		err = -ETIME;
> >   	i915_request_put(rq);
> >   	if (err)
> > @@ -2876,7 +2876,7 @@ static int p_sync1(void *arg)
> >   	i915_request_add(rq);
> >
> >   	err = 0;
> > -	if (prev && i915_request_wait(prev, 0, HZ / 5) < 0)
> > +	if (prev && i915_request_wait(prev, 0, HZ) < 0)
> >   		err = -ETIME;
> >   	i915_request_put(prev);
> >   	prev = rq;
>
diff --git a/drivers/gpu/drm/i915/selftests/i915_request.c b/drivers/gpu/drm/i915/selftests/i915_request.c
index d67710d10615..6496671a113c 100644
--- a/drivers/gpu/drm/i915/selftests/i915_request.c
+++ b/drivers/gpu/drm/i915/selftests/i915_request.c
@@ -2805,7 +2805,7 @@ static int p_sync0(void *arg)
 	i915_request_add(rq);
 
 	err = 0;
-	if (i915_request_wait(rq, 0, HZ / 5) < 0)
+	if (i915_request_wait(rq, 0, HZ) < 0)
 		err = -ETIME;
 	i915_request_put(rq);
 	if (err)
@@ -2876,7 +2876,7 @@ static int p_sync1(void *arg)
 	i915_request_add(rq);
 
 	err = 0;
-	if (prev && i915_request_wait(prev, 0, HZ / 5) < 0)
+	if (prev && i915_request_wait(prev, 0, HZ) < 0)
 		err = -ETIME;
 	i915_request_put(prev);
 	prev = rq;
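For context on the values in these hunks: i915_request_wait() takes its timeout in jiffies, so the change raises the per-request wait budget from roughly 200 ms to roughly one second. A minimal illustrative snippet (not part of the patch) showing the equivalence:

	/*
	 * Illustrative only: HZ is the kernel tick rate (jiffies per second),
	 * so these timeouts are time-based regardless of the configured HZ.
	 */
	const long old_timeout = HZ / 5;	/* ~200 ms */
	const long new_timeout = HZ;		/* ~1000 ms */
	/* equivalently: msecs_to_jiffies(200) vs msecs_to_jiffies(1000) */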
perf_parallel_engines is micro benchmark to test i915 request scheduling. The test creates a thread per physical engine and submits NOP requests and waits the requests to complete in a loop. In execlists mode this works perfectly fine as powerful CPU has enough cores to feed each engine and process the CSBs. With GuC submission the uC gets overwhelmed as all threads feed into a single CTB channel and the GuC gets bombarded with CSBs as contexts are immediately switched in and out on the engines due to the zero runtime of the requests. When the GuC is overwhelmed scheduling of contexts is unfair due to the nature of the GuC scheduling algorithm. This behavior is understood and deemed acceptable as this micro benchmark isn't close to real world use case. Increasing the timeout of wait period for requests to complete. This makes the test understand that is ok for contexts to get starved in this scenario. A future patch / cleanup may just delete these micro benchmark tests as they basically mean nothing. We care about real workloads not made up ones. Signed-off-by: Matthew Brost <matthew.brost@intel.com> --- drivers/gpu/drm/i915/selftests/i915_request.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
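As a reading aid, the submit-and-wait pattern described above looks roughly like the following. This is a hedged sketch, not the actual p_sync0() body: the request-creation helper, counters and error handling are simplified stand-ins for what the selftest really does.

	/*
	 * Sketch of one per-engine thread: repeatedly submit an empty (NOP)
	 * request and wait for it synchronously, so throughput is bounded by
	 * submission and CSB processing rather than GPU execution time.
	 */
	static int nop_sync_loop_sketch(struct intel_engine_cs *engine)
	{
		IGT_TIMEOUT(end_time);
		unsigned long count = 0;

		do {
			struct i915_request *rq;
			int err = 0;

			rq = intel_engine_create_kernel_request(engine);
			if (IS_ERR(rq))
				return PTR_ERR(rq);

			i915_request_get(rq);
			i915_request_add(rq);

			/* HZ (~1 s) rather than HZ / 5 (~200 ms) after this patch */
			if (i915_request_wait(rq, 0, HZ) < 0)
				err = -ETIME;
			i915_request_put(rq);
			if (err)
				return err;

			count++;
		} while (!__igt_timeout(end_time, NULL));

		pr_info("%s: completed %lu NOP requests\n", engine->name, count);
		return 0;
	}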