
drm/i915/selftests: Increase timeout in requests perf selftest

Message ID 20211011175704.28509-1-matthew.brost@intel.com (mailing list archive)
State New, archived
Series drm/i915/selftests: Increase timeout in requests perf selftest

Commit Message

Matthew Brost Oct. 11, 2021, 5:57 p.m. UTC
perf_parallel_engines is a micro-benchmark that exercises i915 request
scheduling. The test creates a thread per physical engine, then submits
NOP requests and waits for them to complete in a loop. In execlists mode
this works perfectly well, as a powerful CPU has enough cores to feed
each engine and process the CSBs. With GuC submission the uC gets
overwhelmed because all threads feed into a single CTB channel, and the
GuC is bombarded with CSBs as contexts are immediately switched in and
out on the engines due to the zero runtime of the requests. When the GuC
is overwhelmed, scheduling of contexts is unfair due to the nature of
the GuC scheduling algorithm. This behavior is understood and deemed
acceptable, as this micro-benchmark isn't close to a real world use
case. Increase the timeout of the wait period for requests to complete;
this makes the test accept that it is ok for contexts to get starved in
this scenario.

A future patch / cleanup may simply delete these micro-benchmark tests,
as they basically mean nothing. We care about real workloads, not
made-up ones.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/selftests/i915_request.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
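
[Editor's note] For reference, the submit-and-wait pattern described in
the commit message (whose wait period the patch relaxes from HZ / 5 to
HZ) looks roughly like the sketch below. This is a simplified
illustration, not the exact selftest code: the real loop is p_sync0() in
drivers/gpu/drm/i915/selftests/i915_request.c, the helper name
sync_nop_loop() is made up for this example, and it assumes an already
created and pinned intel_context for the engine driven by this thread.

/*
 * Simplified sketch of the per-engine NOP submit-and-wait loop.
 * 'ce' is assumed to be a pinned intel_context for the engine
 * exercised by this thread.
 */
static int sync_nop_loop(struct intel_context *ce, unsigned long nloops)
{
	while (nloops--) {
		struct i915_request *rq;
		int err = 0;

		/* An empty request: pure submission/scheduling overhead. */
		rq = i915_request_create(ce);
		if (IS_ERR(rq))
			return PTR_ERR(rq);

		i915_request_get(rq);
		i915_request_add(rq);

		/*
		 * i915_request_wait() returns a negative value if the
		 * request is not complete within the timeout. A full
		 * second (HZ) instead of HZ / 5 tolerates a thread being
		 * briefly starved while the GuC is saturated by the other
		 * per-engine threads.
		 */
		if (i915_request_wait(rq, 0, HZ) < 0)
			err = -ETIME;
		i915_request_put(rq);
		if (err)
			return err;
	}

	return 0;
}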

Comments

John Harrison Oct. 20, 2021, 8:34 p.m. UTC | #1
On 10/11/2021 10:57, Matthew Brost wrote:
> perf_parallel_engines is a micro-benchmark that exercises i915 request
> scheduling. The test creates a thread per physical engine, then submits
> NOP requests and waits for them to complete in a loop. In execlists mode
> this works perfectly well, as a powerful CPU has enough cores to feed
> each engine and process the CSBs. With GuC submission the uC gets
> overwhelmed because all threads feed into a single CTB channel, and the
> GuC is bombarded with CSBs as contexts are immediately switched in and
> out on the engines due to the zero runtime of the requests. When the GuC
> is overwhelmed, scheduling of contexts is unfair due to the nature of
> the GuC scheduling algorithm. This behavior is understood and deemed
> acceptable, as this micro-benchmark isn't close to a real world use
> case. Increase the timeout of the wait period for requests to complete;
> this makes the test accept that it is ok for contexts to get starved in
> this scenario.
>
> A future patch / cleanup may simply delete these micro-benchmark tests,
> as they basically mean nothing. We care about real workloads, not
> made-up ones.
>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>

> ---
>   drivers/gpu/drm/i915/selftests/i915_request.c | 4 ++--
>   1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/selftests/i915_request.c b/drivers/gpu/drm/i915/selftests/i915_request.c
> index d67710d10615..6496671a113c 100644
> --- a/drivers/gpu/drm/i915/selftests/i915_request.c
> +++ b/drivers/gpu/drm/i915/selftests/i915_request.c
> @@ -2805,7 +2805,7 @@ static int p_sync0(void *arg)
>   		i915_request_add(rq);
>   
>   		err = 0;
> -		if (i915_request_wait(rq, 0, HZ / 5) < 0)
> +		if (i915_request_wait(rq, 0, HZ) < 0)
>   			err = -ETIME;
>   		i915_request_put(rq);
>   		if (err)
> @@ -2876,7 +2876,7 @@ static int p_sync1(void *arg)
>   		i915_request_add(rq);
>   
>   		err = 0;
> -		if (prev && i915_request_wait(prev, 0, HZ / 5) < 0)
> +		if (prev && i915_request_wait(prev, 0, HZ) < 0)
>   			err = -ETIME;
>   		i915_request_put(prev);
>   		prev = rq;
Thomas Hellstrom Oct. 21, 2021, 5:36 a.m. UTC | #2
On Wed, 2021-10-20 at 13:34 -0700, John Harrison wrote:
> On 10/11/2021 10:57, Matthew Brost wrote:
> > perf_parallel_engines is a micro-benchmark that exercises i915
> > request scheduling. The test creates a thread per physical engine,
> > then submits NOP requests and waits for them to complete in a loop.
> > In execlists mode this works perfectly well, as a powerful CPU has
> > enough cores to feed each engine and process the CSBs. With GuC
> > submission the uC gets overwhelmed because all threads feed into a
> > single CTB channel, and the GuC is bombarded with CSBs as contexts
> > are immediately switched in and out on the engines due to the zero
> > runtime of the requests. When the GuC is overwhelmed, scheduling of
> > contexts is unfair due to the nature of the GuC scheduling
> > algorithm. This behavior is understood and deemed acceptable, as
> > this micro-benchmark isn't close to a real world use case. Increase
> > the timeout of the wait period for requests to complete; this makes
> > the test accept that it is ok for contexts to get starved in this
> > scenario.
> > 
> > A future patch / cleanup may simply delete these micro-benchmark
> > tests, as they basically mean nothing. We care about real workloads,
> > not made-up ones.
> > 
> > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> Reviewed-by: John Harrison <John.C.Harrison@Intel.com>

Also
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>

I think one important thing to keep in mind here is that this selftest
actually *did* find a flaw, albeit one that upon analysis turned out
not to be significant.

But given that user-space clients like, for example, gem_exec_suspend
seem to be able to trigger similar delays as well, at least to some
extent, with a huge number of small requests submitted from user-space,
we should probably verify at some point that this isn't exploitable by
a malicious client starving other clients on the same system.

/Thomas
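
[Editor's note] The kind of user-space flood Thomas refers to can be
approximated with a plain execbuf loop along the lines of the sketch
below. This is not code from the thread or from IGT; it is a rough,
hedged illustration: 'fd' is assumed to be an already opened i915 DRM
device node, the function name flood_nop_batches() is invented for this
example, and on platforms without the pwrite ioctl the batch would need
to be written through a mapping instead.

/* Rough sketch: hammer the driver with tiny, empty batches from
 * user-space. Illustrative only; minimal error handling. */
#include <stdint.h>
#include <sys/ioctl.h>
#include <drm/i915_drm.h>

#define MI_BATCH_BUFFER_END (0xA << 23)

static int flood_nop_batches(int fd, unsigned long count)
{
	const uint32_t batch[2] = { MI_BATCH_BUFFER_END, 0 };
	struct drm_i915_gem_create create = { .size = 4096 };
	struct drm_i915_gem_pwrite pwrite = {};
	struct drm_i915_gem_exec_object2 obj = {};
	struct drm_i915_gem_execbuffer2 execbuf = {};

	/* One page-sized object containing only MI_BATCH_BUFFER_END. */
	if (ioctl(fd, DRM_IOCTL_I915_GEM_CREATE, &create))
		return -1;

	pwrite.handle = create.handle;
	pwrite.size = sizeof(batch);
	pwrite.data_ptr = (uintptr_t)batch;
	if (ioctl(fd, DRM_IOCTL_I915_GEM_PWRITE, &pwrite))
		return -1;

	obj.handle = create.handle;
	execbuf.buffers_ptr = (uintptr_t)&obj;
	execbuf.buffer_count = 1;

	/* Each iteration is a near-zero-runtime request, much like the
	 * NOP requests in the selftest. */
	while (count--) {
		if (ioctl(fd, DRM_IOCTL_I915_GEM_EXECBUFFER2, &execbuf))
			return -1;
	}

	return 0;
}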


> 
> > ---
> >   drivers/gpu/drm/i915/selftests/i915_request.c | 4 ++--
> >   1 file changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/selftests/i915_request.c
> > b/drivers/gpu/drm/i915/selftests/i915_request.c
> > index d67710d10615..6496671a113c 100644
> > --- a/drivers/gpu/drm/i915/selftests/i915_request.c
> > +++ b/drivers/gpu/drm/i915/selftests/i915_request.c
> > @@ -2805,7 +2805,7 @@ static int p_sync0(void *arg)
> >                 i915_request_add(rq);
> >   
> >                 err = 0;
> > -               if (i915_request_wait(rq, 0, HZ / 5) < 0)
> > +               if (i915_request_wait(rq, 0, HZ) < 0)
> >                         err = -ETIME;
> >                 i915_request_put(rq);
> >                 if (err)
> > @@ -2876,7 +2876,7 @@ static int p_sync1(void *arg)
> >                 i915_request_add(rq);
> >   
> >                 err = 0;
> > -               if (prev && i915_request_wait(prev, 0, HZ / 5) < 0)
> > +               if (prev && i915_request_wait(prev, 0, HZ) < 0)
> >                         err = -ETIME;
> >                 i915_request_put(prev);
> >                 prev = rq;
>

Patch

diff --git a/drivers/gpu/drm/i915/selftests/i915_request.c b/drivers/gpu/drm/i915/selftests/i915_request.c
index d67710d10615..6496671a113c 100644
--- a/drivers/gpu/drm/i915/selftests/i915_request.c
+++ b/drivers/gpu/drm/i915/selftests/i915_request.c
@@ -2805,7 +2805,7 @@ static int p_sync0(void *arg)
 		i915_request_add(rq);
 
 		err = 0;
-		if (i915_request_wait(rq, 0, HZ / 5) < 0)
+		if (i915_request_wait(rq, 0, HZ) < 0)
 			err = -ETIME;
 		i915_request_put(rq);
 		if (err)
@@ -2876,7 +2876,7 @@ static int p_sync1(void *arg)
 		i915_request_add(rq);
 
 		err = 0;
-		if (prev && i915_request_wait(prev, 0, HZ / 5) < 0)
+		if (prev && i915_request_wait(prev, 0, HZ) < 0)
 			err = -ETIME;
 		i915_request_put(prev);
 		prev = rq;