Message ID | 20170828085052.31410-1-katarzyna.dec@intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Mon, 2017-08-28 at 10:50 +0200, Katarzyna Dec wrote: > CI is observing sporadical failures in pm_rps subtests. > There are a couple of reasons. One of them is the fact that > on gen6, gen7 and gen7.5, max frequency (as in the HW limit) > is not set to RP0, but the value obtaind from PCODE (which > may be different from RP0). Thus the test is operating under > wrong assumptions (SOFTMAX == RP0 == BOOST which is simply > not the case). Let's compare current frequency with BOOST > frequency rather than SOFTMAX to get the test behaviour under control. > In boost_freq function I set MAX freq to midium freqency, which ensures > that we for sure reach BOOST frequency. This could help with failures > with boost frequency failing to drop down. > GPU reset needs to be modified so we are not dependent on kernel's low > priority retire worker. Reset method was replaced by igt_force_gpu_reset() > and in reset testcase we make sure that we can recover from hang. > > v2: Commit message, simplified waiting for boost to finish, drop > noisy whitespace cleanup. > > v3: Removed reading from i915_rps_boost_info debugfs because it not > the same on every kernel. Removed function waiting for boost. > Instead of that I made sure we will reach in boost by setting MAX freq to > fmid. > > v4: Moved proposal with making test drm master to other patch > > v5: Used igt_force_gpu_reset() to reset GPU. Modified "reset" testcase. > > Cc: Chris Wilson <chris@chris-wilson.co.uk> > Cc: Jeff Mcgee <jeff.mcgee@intel.com> > Cc: Petri Latvala <petri.latvala@intel.com> > Cc: Jani Saarinen <jani.saarinen@intel.com> > Cc: Radoslaw Szwichtenberg <radoslaw.szwichtenberg@intel.com> > Signed-off-by: Katarzyna Dec <katarzyna.dec@intel.com> > --- > tests/pm_rps.c | 63 +++++++++++++++++++++++++++++++++++-------------------- > --- > 1 file changed, 38 insertions(+), 25 deletions(-) > > diff --git a/tests/pm_rps.c b/tests/pm_rps.c > index f0455e78..a6c6f1eb 100644 > --- a/tests/pm_rps.c > +++ b/tests/pm_rps.c > @@ -50,6 +50,7 @@ enum { > RP0, > RP1, > RPn, > + BOOST, > NUMFREQ > }; > > @@ -60,7 +61,7 @@ struct junk { > const char *mode; > FILE *filp; > } stuff[] = { > - { "cur", "r", NULL }, { "min", "rb+", NULL }, { "max", "rb+", NULL }, > { "RP0", "r", NULL }, { "RP1", "r", NULL }, { "RPn", "r", NULL }, { NULL, > NULL, NULL } > + { "cur", "r", NULL }, { "min", "rb+", NULL }, { "max", "rb+", NULL }, > { "RP0", "r", NULL }, { "RP1", "r", NULL }, { "RPn", "r", NULL }, { "boost", > "rb+", NULL }, { NULL, NULL, NULL } > }; > > static int readval(FILE *filp) > @@ -167,7 +168,7 @@ static void dump(const int *freqs) > } > > enum load { > - LOW, > + LOW = 0, > HIGH > }; > > @@ -185,9 +186,10 @@ static struct load_helper { > > static void load_helper_signal_handler(int sig) > { > - if (sig == SIGUSR2) > - lh.load = lh.load == LOW ? HIGH : LOW; > - else > + if (sig == SIGUSR2) { > + lh.load = !lh.load; > + igt_debug("Switching background load to %s\n", lh.load ? > "high" : "low"); > + } else > lh.exit = true; > } > > @@ -238,6 +240,7 @@ static void load_helper_run(enum load load) > return; > } > > + lh.exit = false; > lh.load = load; > > igt_fork_helper(&lh.igt_proc) { > @@ -263,6 +266,8 @@ static void load_helper_run(enum load load) > if (intel_gen(lh.devid) >= 6) > execbuf.flags = I915_EXEC_BLT; > > + igt_debug("Applying %s load...\n", lh.load ? "high" : "low"); > + > while (!lh.exit) { > memset(&object, 0, sizeof(object)); > object.handle = fences[val%3]; > @@ -296,6 +301,8 @@ static void load_helper_run(enum load load) > gem_close(drm_fd, fences[0]); > gem_close(drm_fd, fences[1]); > gem_close(drm_fd, fences[2]); > + > + igt_drop_caches_set(drm_fd, DROP_RETIRE); > } > } > > @@ -553,38 +560,43 @@ static void stabilize_check(int *out) > igt_debug("Waited %d msec to stabilize cur\n", wait); > } > > -static void reset_gpu(void) > -{ > - int fd = drm_open_driver(DRIVER_INTEL); > - igt_post_hang_ring(fd, igt_hang_ring(fd, I915_EXEC_DEFAULT)); > - close(fd); > -} > - > static void boost_freq(int fd, int *boost_freqs) > { > int64_t timeout = 1; > - int ring = -1; > igt_spin_t *load; > + unsigned int engine; > + int fmid = (origfreqs[RPn] + origfreqs[RP0]) / 2; > > - load = igt_spin_batch_new(fd, ring, 0); > + fmid = get_hw_rounded_freq(fmid); > + //set max freq to less then boost freq Looks good to me. Just one very minor comment - we do use two different comment styles, should we instead use one? Do you think we should add any simple test descriptions above the tests or these tests are easy to understand? > + writeval(stuff[MAX].filp, fmid); > > + /* put boost on the same engine as low load */ > + engine = I915_EXEC_RENDER; Otherwise Reviewed-by: Radoslaw Szwichtenberg <radoslaw.szwichtenberg@intel.com
On Tue, Aug 29, 2017 at 07:43:20AM +0000, Szwichtenberg, Radoslaw wrote: > On Mon, 2017-08-28 at 10:50 +0200, Katarzyna Dec wrote: > > CI is observing sporadical failures in pm_rps subtests. > > There are a couple of reasons. One of them is the fact that > > on gen6, gen7 and gen7.5, max frequency (as in the HW limit) > > is not set to RP0, but the value obtaind from PCODE (which > > may be different from RP0). Thus the test is operating under > > wrong assumptions (SOFTMAX == RP0 == BOOST which is simply > > not the case). Let's compare current frequency with BOOST > > frequency rather than SOFTMAX to get the test behaviour under control. > > In boost_freq function I set MAX freq to midium freqency, which ensures > > that we for sure reach BOOST frequency. This could help with failures > > with boost frequency failing to drop down. > > GPU reset needs to be modified so we are not dependent on kernel's low > > priority retire worker. Reset method was replaced by igt_force_gpu_reset() > > and in reset testcase we make sure that we can recover from hang. > > > > v2: Commit message, simplified waiting for boost to finish, drop > > noisy whitespace cleanup. > > > > v3: Removed reading from i915_rps_boost_info debugfs because it not > > the same on every kernel. Removed function waiting for boost. > > Instead of that I made sure we will reach in boost by setting MAX freq to > > fmid. > > > > v4: Moved proposal with making test drm master to other patch > > > > v5: Used igt_force_gpu_reset() to reset GPU. Modified "reset" testcase. > > > > Cc: Chris Wilson <chris@chris-wilson.co.uk> > > Cc: Jeff Mcgee <jeff.mcgee@intel.com> > > Cc: Petri Latvala <petri.latvala@intel.com> > > Cc: Jani Saarinen <jani.saarinen@intel.com> > > Cc: Radoslaw Szwichtenberg <radoslaw.szwichtenberg@intel.com> > > Signed-off-by: Katarzyna Dec <katarzyna.dec@intel.com> > > --- > > tests/pm_rps.c | 63 +++++++++++++++++++++++++++++++++++-------------------- > > --- > > 1 file changed, 38 insertions(+), 25 deletions(-) > > > > diff --git a/tests/pm_rps.c b/tests/pm_rps.c > > index f0455e78..a6c6f1eb 100644 > > --- a/tests/pm_rps.c > > +++ b/tests/pm_rps.c > > @@ -50,6 +50,7 @@ enum { > > RP0, > > RP1, > > RPn, > > + BOOST, > > NUMFREQ > > }; > > > > @@ -60,7 +61,7 @@ struct junk { > > const char *mode; > > FILE *filp; > > } stuff[] = { > > - { "cur", "r", NULL }, { "min", "rb+", NULL }, { "max", "rb+", NULL }, > > { "RP0", "r", NULL }, { "RP1", "r", NULL }, { "RPn", "r", NULL }, { NULL, > > NULL, NULL } > > + { "cur", "r", NULL }, { "min", "rb+", NULL }, { "max", "rb+", NULL }, > > { "RP0", "r", NULL }, { "RP1", "r", NULL }, { "RPn", "r", NULL }, { "boost", > > "rb+", NULL }, { NULL, NULL, NULL } > > }; > > > > static int readval(FILE *filp) > > @@ -167,7 +168,7 @@ static void dump(const int *freqs) > > } > > > > enum load { > > - LOW, > > + LOW = 0, > > HIGH > > }; > > > > @@ -185,9 +186,10 @@ static struct load_helper { > > > > static void load_helper_signal_handler(int sig) > > { > > - if (sig == SIGUSR2) > > - lh.load = lh.load == LOW ? HIGH : LOW; > > - else > > + if (sig == SIGUSR2) { > > + lh.load = !lh.load; > > + igt_debug("Switching background load to %s\n", lh.load ? > > "high" : "low"); > > + } else > > lh.exit = true; > > } > > > > @@ -238,6 +240,7 @@ static void load_helper_run(enum load load) > > return; > > } > > > > + lh.exit = false; > > lh.load = load; > > > > igt_fork_helper(&lh.igt_proc) { > > @@ -263,6 +266,8 @@ static void load_helper_run(enum load load) > > if (intel_gen(lh.devid) >= 6) > > execbuf.flags = I915_EXEC_BLT; > > > > + igt_debug("Applying %s load...\n", lh.load ? "high" : "low"); > > + > > while (!lh.exit) { > > memset(&object, 0, sizeof(object)); > > object.handle = fences[val%3]; > > @@ -296,6 +301,8 @@ static void load_helper_run(enum load load) > > gem_close(drm_fd, fences[0]); > > gem_close(drm_fd, fences[1]); > > gem_close(drm_fd, fences[2]); > > + > > + igt_drop_caches_set(drm_fd, DROP_RETIRE); > > } > > } > > > > @@ -553,38 +560,43 @@ static void stabilize_check(int *out) > > igt_debug("Waited %d msec to stabilize cur\n", wait); > > } > > > > -static void reset_gpu(void) > > -{ > > - int fd = drm_open_driver(DRIVER_INTEL); > > - igt_post_hang_ring(fd, igt_hang_ring(fd, I915_EXEC_DEFAULT)); > > - close(fd); > > -} > > - > > static void boost_freq(int fd, int *boost_freqs) > > { > > int64_t timeout = 1; > > - int ring = -1; > > igt_spin_t *load; > > + unsigned int engine; > > + int fmid = (origfreqs[RPn] + origfreqs[RP0]) / 2; > > > > - load = igt_spin_batch_new(fd, ring, 0); > > + fmid = get_hw_rounded_freq(fmid); > > + //set max freq to less then boost freq > Looks good to me. > Just one very minor comment - we do use two different comment styles, should we > instead use one? Do you think we should add any simple test descriptions above > the tests or these tests are easy to understand? We try to follow the kernel's CodingStyle, see https://01.org/linuxgraphics/gfx-docs/drm/process/coding-style.html This means /* */ C style comments everywhere. Please fix. > > + writeval(stuff[MAX].filp, fmid); > > > > + /* put boost on the same engine as low load */ > > + engine = I915_EXEC_RENDER; > > Otherwise > Reviewed-by: Radoslaw Szwichtenberg <radoslaw.szwichtenberg@intel.com When you resubmit, plus include Radek's r-b tag. Thanks, Daniel
diff --git a/tests/pm_rps.c b/tests/pm_rps.c index f0455e78..a6c6f1eb 100644 --- a/tests/pm_rps.c +++ b/tests/pm_rps.c @@ -50,6 +50,7 @@ enum { RP0, RP1, RPn, + BOOST, NUMFREQ }; @@ -60,7 +61,7 @@ struct junk { const char *mode; FILE *filp; } stuff[] = { - { "cur", "r", NULL }, { "min", "rb+", NULL }, { "max", "rb+", NULL }, { "RP0", "r", NULL }, { "RP1", "r", NULL }, { "RPn", "r", NULL }, { NULL, NULL, NULL } + { "cur", "r", NULL }, { "min", "rb+", NULL }, { "max", "rb+", NULL }, { "RP0", "r", NULL }, { "RP1", "r", NULL }, { "RPn", "r", NULL }, { "boost", "rb+", NULL }, { NULL, NULL, NULL } }; static int readval(FILE *filp) @@ -167,7 +168,7 @@ static void dump(const int *freqs) } enum load { - LOW, + LOW = 0, HIGH }; @@ -185,9 +186,10 @@ static struct load_helper { static void load_helper_signal_handler(int sig) { - if (sig == SIGUSR2) - lh.load = lh.load == LOW ? HIGH : LOW; - else + if (sig == SIGUSR2) { + lh.load = !lh.load; + igt_debug("Switching background load to %s\n", lh.load ? "high" : "low"); + } else lh.exit = true; } @@ -238,6 +240,7 @@ static void load_helper_run(enum load load) return; } + lh.exit = false; lh.load = load; igt_fork_helper(&lh.igt_proc) { @@ -263,6 +266,8 @@ static void load_helper_run(enum load load) if (intel_gen(lh.devid) >= 6) execbuf.flags = I915_EXEC_BLT; + igt_debug("Applying %s load...\n", lh.load ? "high" : "low"); + while (!lh.exit) { memset(&object, 0, sizeof(object)); object.handle = fences[val%3]; @@ -296,6 +301,8 @@ static void load_helper_run(enum load load) gem_close(drm_fd, fences[0]); gem_close(drm_fd, fences[1]); gem_close(drm_fd, fences[2]); + + igt_drop_caches_set(drm_fd, DROP_RETIRE); } } @@ -553,38 +560,43 @@ static void stabilize_check(int *out) igt_debug("Waited %d msec to stabilize cur\n", wait); } -static void reset_gpu(void) -{ - int fd = drm_open_driver(DRIVER_INTEL); - igt_post_hang_ring(fd, igt_hang_ring(fd, I915_EXEC_DEFAULT)); - close(fd); -} - static void boost_freq(int fd, int *boost_freqs) { int64_t timeout = 1; - int ring = -1; igt_spin_t *load; + unsigned int engine; + int fmid = (origfreqs[RPn] + origfreqs[RP0]) / 2; - load = igt_spin_batch_new(fd, ring, 0); + fmid = get_hw_rounded_freq(fmid); + //set max freq to less then boost freq + writeval(stuff[MAX].filp, fmid); + /* put boost on the same engine as low load */ + engine = I915_EXEC_RENDER; + if (intel_gen(lh.devid) >= 6) + engine = I915_EXEC_BLT; + load = igt_spin_batch_new(fd, engine, 0); /* Waiting will grant us a boost to maximum */ gem_wait(fd, load->handle, &timeout); read_freqs(boost_freqs); dump(boost_freqs); + /* Avoid downlocking till boost request is pending */ + igt_spin_batch_end(load); + gem_sync(fd, load->handle); igt_spin_batch_free(fd, load); + + //set max freq to original softmax + writeval(stuff[MAX].filp, origfreqs[MAX]); } -static void waitboost(bool reset) +static void waitboost(int fd, bool reset) { int pre_freqs[NUMFREQ]; int boost_freqs[NUMFREQ]; int post_freqs[NUMFREQ]; - int fd = drm_open_driver(DRIVER_INTEL); - load_helper_run(LOW); igt_debug("Apply low load...\n"); @@ -593,7 +605,7 @@ static void waitboost(bool reset) if (reset) { igt_debug("Reset gpu...\n"); - reset_gpu(); + igt_force_gpu_reset(fd); sleep(1); } @@ -611,10 +623,9 @@ static void waitboost(bool reset) idle_check(); igt_assert_lt(pre_freqs[CUR], pre_freqs[MAX]); - igt_assert_eq(boost_freqs[CUR], boost_freqs[MAX]); + igt_assert_eq(boost_freqs[CUR], boost_freqs[BOOST]); igt_assert_lt(post_freqs[CUR], post_freqs[MAX]); - close(fd); } static void pm_rps_exit_handler(int sig) @@ -657,7 +668,7 @@ igt_main val = readval(junk->filp); igt_assert(val >= 0); junk++; - } while(junk->name != NULL); + } while (junk->name != NULL); read_freqs(origfreqs); @@ -679,9 +690,11 @@ igt_main } igt_subtest("waitboost") - waitboost(false); - - igt_subtest("reset") - waitboost(true); + waitboost(drm_fd, false); + igt_subtest("reset") { + igt_hang_t hang = igt_allow_hang(drm_fd, 0, 0); + waitboost(drm_fd, true); + igt_disallow_hang(drm_fd, hang); + } }
CI is observing sporadical failures in pm_rps subtests. There are a couple of reasons. One of them is the fact that on gen6, gen7 and gen7.5, max frequency (as in the HW limit) is not set to RP0, but the value obtaind from PCODE (which may be different from RP0). Thus the test is operating under wrong assumptions (SOFTMAX == RP0 == BOOST which is simply not the case). Let's compare current frequency with BOOST frequency rather than SOFTMAX to get the test behaviour under control. In boost_freq function I set MAX freq to midium freqency, which ensures that we for sure reach BOOST frequency. This could help with failures with boost frequency failing to drop down. GPU reset needs to be modified so we are not dependent on kernel's low priority retire worker. Reset method was replaced by igt_force_gpu_reset() and in reset testcase we make sure that we can recover from hang. v2: Commit message, simplified waiting for boost to finish, drop noisy whitespace cleanup. v3: Removed reading from i915_rps_boost_info debugfs because it not the same on every kernel. Removed function waiting for boost. Instead of that I made sure we will reach in boost by setting MAX freq to fmid. v4: Moved proposal with making test drm master to other patch v5: Used igt_force_gpu_reset() to reset GPU. Modified "reset" testcase. Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Jeff Mcgee <jeff.mcgee@intel.com> Cc: Petri Latvala <petri.latvala@intel.com> Cc: Jani Saarinen <jani.saarinen@intel.com> Cc: Radoslaw Szwichtenberg <radoslaw.szwichtenberg@intel.com> Signed-off-by: Katarzyna Dec <katarzyna.dec@intel.com> --- tests/pm_rps.c | 63 +++++++++++++++++++++++++++++++++++----------------------- 1 file changed, 38 insertions(+), 25 deletions(-)