Message ID | 20191113154639.27144-1-lionel.g.landwerlin@intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | drm/i915/perf: don't forget noa wait after oa config | expand |
Quoting Lionel Landwerlin (2019-11-13 15:46:39) > I'm observing incoherence metric values, changing from run to run. > > It appears the patches introducing noa wait & reconfiguration from > command stream switched places in the series multiple times during the > review. This lead to the dependency of one onto the order to go > missing... I don't think I dropped it; if I did my apologies. I do feel the egg-on-face for writing a selftest to verify that noa_wait does what you said it did, but completely missing that it went unused :) > Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> > Fixes: 15d0ace1f876 ("drm/i915/perf: execute OA configuration from command stream") > --- > drivers/gpu/drm/i915/i915_perf.c | 9 +++++++-- > 1 file changed, 7 insertions(+), 2 deletions(-) > > diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c > index 507236bd41ae..31e47ee23357 100644 > --- a/drivers/gpu/drm/i915/i915_perf.c > +++ b/drivers/gpu/drm/i915/i915_perf.c > @@ -1870,7 +1870,7 @@ alloc_oa_config_buffer(struct i915_perf_stream *stream, > config_length += num_lri_dwords(oa_config->mux_regs_len); > config_length += num_lri_dwords(oa_config->b_counter_regs_len); > config_length += num_lri_dwords(oa_config->flex_regs_len); > - config_length++; /* MI_BATCH_BUFFER_END */ > + config_length += 3; /* MI_BATCH_BUFFER_START */ > config_length = ALIGN(sizeof(u32) * config_length, I915_GTT_PAGE_SIZE); > > obj = i915_gem_object_create_shmem(stream->perf->i915, config_length); > @@ -1895,7 +1895,12 @@ alloc_oa_config_buffer(struct i915_perf_stream *stream, > oa_config->flex_regs, > oa_config->flex_regs_len); > > - *cs++ = MI_BATCH_BUFFER_END; > + /* Jump into the active wait. */ > + *cs++ = (INTEL_GEN(stream->perf->i915) < 8 ? > + MI_BATCH_BUFFER_START : > + MI_BATCH_BUFFER_START_GEN8); > + *cs++ = i915_ggtt_offset(stream->noa_wait); > + *cs++ = 0; Yikes, stream->noa_wait is unused. Hmm, the noa_wait doesn't have any arbitration points internally, so we probably do need to make it non-preemptable as well? With a rq->flags |= I915_REQUEST_NOPREEMPT in emit_oa_config, Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> We need to wrap emit_oa_config() in a similar selftest and verify that a read of the oa regs are correct and that the TIMESTAMP indicates the appropriate delay before the read. If you feel bored. -Chris
On 13/11/2019 18:35, Chris Wilson wrote: > Quoting Lionel Landwerlin (2019-11-13 15:46:39) >> I'm observing incoherence metric values, changing from run to run. >> >> It appears the patches introducing noa wait & reconfiguration from >> command stream switched places in the series multiple times during the >> review. This lead to the dependency of one onto the order to go >> missing... > I don't think I dropped it; if I did my apologies. I do feel the > egg-on-face for writing a selftest to verify that noa_wait does what you > said it did, but completely missing that it went unused :) > >> Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> >> Fixes: 15d0ace1f876 ("drm/i915/perf: execute OA configuration from command stream") >> --- >> drivers/gpu/drm/i915/i915_perf.c | 9 +++++++-- >> 1 file changed, 7 insertions(+), 2 deletions(-) >> >> diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c >> index 507236bd41ae..31e47ee23357 100644 >> --- a/drivers/gpu/drm/i915/i915_perf.c >> +++ b/drivers/gpu/drm/i915/i915_perf.c >> @@ -1870,7 +1870,7 @@ alloc_oa_config_buffer(struct i915_perf_stream *stream, >> config_length += num_lri_dwords(oa_config->mux_regs_len); >> config_length += num_lri_dwords(oa_config->b_counter_regs_len); >> config_length += num_lri_dwords(oa_config->flex_regs_len); >> - config_length++; /* MI_BATCH_BUFFER_END */ >> + config_length += 3; /* MI_BATCH_BUFFER_START */ >> config_length = ALIGN(sizeof(u32) * config_length, I915_GTT_PAGE_SIZE); >> >> obj = i915_gem_object_create_shmem(stream->perf->i915, config_length); >> @@ -1895,7 +1895,12 @@ alloc_oa_config_buffer(struct i915_perf_stream *stream, >> oa_config->flex_regs, >> oa_config->flex_regs_len); >> >> - *cs++ = MI_BATCH_BUFFER_END; >> + /* Jump into the active wait. */ >> + *cs++ = (INTEL_GEN(stream->perf->i915) < 8 ? >> + MI_BATCH_BUFFER_START : >> + MI_BATCH_BUFFER_START_GEN8); >> + *cs++ = i915_ggtt_offset(stream->noa_wait); >> + *cs++ = 0; > Yikes, stream->noa_wait is unused. > > Hmm, the noa_wait doesn't have any arbitration points internally, so we > probably do need to make it non-preemptable as well? > > With a rq->flags |= I915_REQUEST_NOPREEMPT in emit_oa_config, > Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> > > We need to wrap emit_oa_config() in a similar selftest and verify > that a read of the oa regs are correct and that the TIMESTAMP indicates > the appropriate delay before the read. If you feel bored. > -Chris As long as we wait long enough, it should be okay. Why making it nopreempt? -Lionel
Quoting Lionel Landwerlin (2019-11-13 18:07:59) > On 13/11/2019 18:35, Chris Wilson wrote: > > Quoting Lionel Landwerlin (2019-11-13 15:46:39) > >> I'm observing incoherence metric values, changing from run to run. > >> > >> It appears the patches introducing noa wait & reconfiguration from > >> command stream switched places in the series multiple times during the > >> review. This lead to the dependency of one onto the order to go > >> missing... > > I don't think I dropped it; if I did my apologies. I do feel the > > egg-on-face for writing a selftest to verify that noa_wait does what you > > said it did, but completely missing that it went unused :) > > > >> Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> > >> Fixes: 15d0ace1f876 ("drm/i915/perf: execute OA configuration from command stream") > >> --- > >> drivers/gpu/drm/i915/i915_perf.c | 9 +++++++-- > >> 1 file changed, 7 insertions(+), 2 deletions(-) > >> > >> diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c > >> index 507236bd41ae..31e47ee23357 100644 > >> --- a/drivers/gpu/drm/i915/i915_perf.c > >> +++ b/drivers/gpu/drm/i915/i915_perf.c > >> @@ -1870,7 +1870,7 @@ alloc_oa_config_buffer(struct i915_perf_stream *stream, > >> config_length += num_lri_dwords(oa_config->mux_regs_len); > >> config_length += num_lri_dwords(oa_config->b_counter_regs_len); > >> config_length += num_lri_dwords(oa_config->flex_regs_len); > >> - config_length++; /* MI_BATCH_BUFFER_END */ > >> + config_length += 3; /* MI_BATCH_BUFFER_START */ > >> config_length = ALIGN(sizeof(u32) * config_length, I915_GTT_PAGE_SIZE); > >> > >> obj = i915_gem_object_create_shmem(stream->perf->i915, config_length); > >> @@ -1895,7 +1895,12 @@ alloc_oa_config_buffer(struct i915_perf_stream *stream, > >> oa_config->flex_regs, > >> oa_config->flex_regs_len); > >> > >> - *cs++ = MI_BATCH_BUFFER_END; > >> + /* Jump into the active wait. */ > >> + *cs++ = (INTEL_GEN(stream->perf->i915) < 8 ? > >> + MI_BATCH_BUFFER_START : > >> + MI_BATCH_BUFFER_START_GEN8); > >> + *cs++ = i915_ggtt_offset(stream->noa_wait); > >> + *cs++ = 0; > > Yikes, stream->noa_wait is unused. > > > > Hmm, the noa_wait doesn't have any arbitration points internally, so we > > probably do need to make it non-preemptable as well? > > > > With a rq->flags |= I915_REQUEST_NOPREEMPT in emit_oa_config, > > Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> > > > > We need to wrap emit_oa_config() in a similar selftest and verify > > that a read of the oa regs are correct and that the TIMESTAMP indicates > > the appropriate delay before the read. If you feel bored. > > -Chris > > > As long as we wait long enough, it should be okay. > > Why making it nopreempt? Aiui, the batch buffer has no arbitration points so the delay may incur the wrath of the forced preemption. That is another request (of higher priority) wishing to run, but not being able to. -Chris
Quoting Chris Wilson (2019-11-13 18:10:22) > Quoting Lionel Landwerlin (2019-11-13 18:07:59) > > On 13/11/2019 18:35, Chris Wilson wrote: > > > Quoting Lionel Landwerlin (2019-11-13 15:46:39) > > >> I'm observing incoherence metric values, changing from run to run. > > >> > > >> It appears the patches introducing noa wait & reconfiguration from > > >> command stream switched places in the series multiple times during the > > >> review. This lead to the dependency of one onto the order to go > > >> missing... > > > I don't think I dropped it; if I did my apologies. I do feel the > > > egg-on-face for writing a selftest to verify that noa_wait does what you > > > said it did, but completely missing that it went unused :) > > > > > >> Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> > > >> Fixes: 15d0ace1f876 ("drm/i915/perf: execute OA configuration from command stream") > > >> --- > > >> drivers/gpu/drm/i915/i915_perf.c | 9 +++++++-- > > >> 1 file changed, 7 insertions(+), 2 deletions(-) > > >> > > >> diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c > > >> index 507236bd41ae..31e47ee23357 100644 > > >> --- a/drivers/gpu/drm/i915/i915_perf.c > > >> +++ b/drivers/gpu/drm/i915/i915_perf.c > > >> @@ -1870,7 +1870,7 @@ alloc_oa_config_buffer(struct i915_perf_stream *stream, > > >> config_length += num_lri_dwords(oa_config->mux_regs_len); > > >> config_length += num_lri_dwords(oa_config->b_counter_regs_len); > > >> config_length += num_lri_dwords(oa_config->flex_regs_len); > > >> - config_length++; /* MI_BATCH_BUFFER_END */ > > >> + config_length += 3; /* MI_BATCH_BUFFER_START */ > > >> config_length = ALIGN(sizeof(u32) * config_length, I915_GTT_PAGE_SIZE); > > >> > > >> obj = i915_gem_object_create_shmem(stream->perf->i915, config_length); > > >> @@ -1895,7 +1895,12 @@ alloc_oa_config_buffer(struct i915_perf_stream *stream, > > >> oa_config->flex_regs, > > >> oa_config->flex_regs_len); > > >> > > >> - *cs++ = MI_BATCH_BUFFER_END; > > >> + /* Jump into the active wait. */ > > >> + *cs++ = (INTEL_GEN(stream->perf->i915) < 8 ? > > >> + MI_BATCH_BUFFER_START : > > >> + MI_BATCH_BUFFER_START_GEN8); > > >> + *cs++ = i915_ggtt_offset(stream->noa_wait); > > >> + *cs++ = 0; > > > Yikes, stream->noa_wait is unused. > > > > > > Hmm, the noa_wait doesn't have any arbitration points internally, so we > > > probably do need to make it non-preemptable as well? > > > > > > With a rq->flags |= I915_REQUEST_NOPREEMPT in emit_oa_config, > > > Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> > > > > > > We need to wrap emit_oa_config() in a similar selftest and verify > > > that a read of the oa regs are correct and that the TIMESTAMP indicates > > > the appropriate delay before the read. If you feel bored. > > > -Chris > > > > > > As long as we wait long enough, it should be okay. > > > > Why making it nopreempt? > > Aiui, the batch buffer has no arbitration points so the delay may incur > the wrath of the forced preemption. That is another request (of higher > priority) wishing to run, but not being able to. The alternative would be adding a MI_ARB_CHECK at the start of the loop if you happy with being preempted out. -Chris
On 13/11/2019 20:11, Chris Wilson wrote: > Quoting Chris Wilson (2019-11-13 18:10:22) >> Quoting Lionel Landwerlin (2019-11-13 18:07:59) >>> On 13/11/2019 18:35, Chris Wilson wrote: >>>> Quoting Lionel Landwerlin (2019-11-13 15:46:39) >>>>> I'm observing incoherence metric values, changing from run to run. >>>>> >>>>> It appears the patches introducing noa wait & reconfiguration from >>>>> command stream switched places in the series multiple times during the >>>>> review. This lead to the dependency of one onto the order to go >>>>> missing... >>>> I don't think I dropped it; if I did my apologies. I do feel the >>>> egg-on-face for writing a selftest to verify that noa_wait does what you >>>> said it did, but completely missing that it went unused :) It was probably my mistake :) >>>> >>>>> Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> >>>>> Fixes: 15d0ace1f876 ("drm/i915/perf: execute OA configuration from command stream") >>>>> --- >>>>> drivers/gpu/drm/i915/i915_perf.c | 9 +++++++-- >>>>> 1 file changed, 7 insertions(+), 2 deletions(-) >>>>> >>>>> diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c >>>>> index 507236bd41ae..31e47ee23357 100644 >>>>> --- a/drivers/gpu/drm/i915/i915_perf.c >>>>> +++ b/drivers/gpu/drm/i915/i915_perf.c >>>>> @@ -1870,7 +1870,7 @@ alloc_oa_config_buffer(struct i915_perf_stream *stream, >>>>> config_length += num_lri_dwords(oa_config->mux_regs_len); >>>>> config_length += num_lri_dwords(oa_config->b_counter_regs_len); >>>>> config_length += num_lri_dwords(oa_config->flex_regs_len); >>>>> - config_length++; /* MI_BATCH_BUFFER_END */ >>>>> + config_length += 3; /* MI_BATCH_BUFFER_START */ >>>>> config_length = ALIGN(sizeof(u32) * config_length, I915_GTT_PAGE_SIZE); >>>>> >>>>> obj = i915_gem_object_create_shmem(stream->perf->i915, config_length); >>>>> @@ -1895,7 +1895,12 @@ alloc_oa_config_buffer(struct i915_perf_stream *stream, >>>>> oa_config->flex_regs, >>>>> oa_config->flex_regs_len); >>>>> >>>>> - *cs++ = MI_BATCH_BUFFER_END; >>>>> + /* Jump into the active wait. */ >>>>> + *cs++ = (INTEL_GEN(stream->perf->i915) < 8 ? >>>>> + MI_BATCH_BUFFER_START : >>>>> + MI_BATCH_BUFFER_START_GEN8); >>>>> + *cs++ = i915_ggtt_offset(stream->noa_wait); >>>>> + *cs++ = 0; >>>> Yikes, stream->noa_wait is unused. >>>> >>>> Hmm, the noa_wait doesn't have any arbitration points internally, so we >>>> probably do need to make it non-preemptable as well? >>>> >>>> With a rq->flags |= I915_REQUEST_NOPREEMPT in emit_oa_config, >>>> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> >>>> >>>> We need to wrap emit_oa_config() in a similar selftest and verify >>>> that a read of the oa regs are correct and that the TIMESTAMP indicates >>>> the appropriate delay before the read. If you feel bored. >>>> -Chris >>> >>> As long as we wait long enough, it should be okay. >>> >>> Why making it nopreempt? >> Aiui, the batch buffer has no arbitration points so the delay may incur >> the wrath of the forced preemption. That is another request (of higher >> priority) wishing to run, but not being able to. > The alternative would be adding a MI_ARB_CHECK at the start of the loop > if you happy with being preempted out. > -Chris I guess I'll do that :) -Lionel
Quoting Lionel Landwerlin (2019-11-13 18:33:59) > On 13/11/2019 20:11, Chris Wilson wrote: > > Quoting Chris Wilson (2019-11-13 18:10:22) > >> Quoting Lionel Landwerlin (2019-11-13 18:07:59) > >>> On 13/11/2019 18:35, Chris Wilson wrote: > >>>> Quoting Lionel Landwerlin (2019-11-13 15:46:39) > >>>>> I'm observing incoherence metric values, changing from run to run. > >>>>> > >>>>> It appears the patches introducing noa wait & reconfiguration from > >>>>> command stream switched places in the series multiple times during the > >>>>> review. This lead to the dependency of one onto the order to go > >>>>> missing... > >>>> I don't think I dropped it; if I did my apologies. I do feel the > >>>> egg-on-face for writing a selftest to verify that noa_wait does what you > >>>> said it did, but completely missing that it went unused :) > > > It was probably my mistake :) > > > >>>> > >>>>> Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> > >>>>> Fixes: 15d0ace1f876 ("drm/i915/perf: execute OA configuration from command stream") > >>>>> --- > >>>>> drivers/gpu/drm/i915/i915_perf.c | 9 +++++++-- > >>>>> 1 file changed, 7 insertions(+), 2 deletions(-) > >>>>> > >>>>> diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c > >>>>> index 507236bd41ae..31e47ee23357 100644 > >>>>> --- a/drivers/gpu/drm/i915/i915_perf.c > >>>>> +++ b/drivers/gpu/drm/i915/i915_perf.c > >>>>> @@ -1870,7 +1870,7 @@ alloc_oa_config_buffer(struct i915_perf_stream *stream, > >>>>> config_length += num_lri_dwords(oa_config->mux_regs_len); > >>>>> config_length += num_lri_dwords(oa_config->b_counter_regs_len); > >>>>> config_length += num_lri_dwords(oa_config->flex_regs_len); > >>>>> - config_length++; /* MI_BATCH_BUFFER_END */ > >>>>> + config_length += 3; /* MI_BATCH_BUFFER_START */ > >>>>> config_length = ALIGN(sizeof(u32) * config_length, I915_GTT_PAGE_SIZE); > >>>>> > >>>>> obj = i915_gem_object_create_shmem(stream->perf->i915, config_length); > >>>>> @@ -1895,7 +1895,12 @@ alloc_oa_config_buffer(struct i915_perf_stream *stream, > >>>>> oa_config->flex_regs, > >>>>> oa_config->flex_regs_len); > >>>>> > >>>>> - *cs++ = MI_BATCH_BUFFER_END; > >>>>> + /* Jump into the active wait. */ > >>>>> + *cs++ = (INTEL_GEN(stream->perf->i915) < 8 ? > >>>>> + MI_BATCH_BUFFER_START : > >>>>> + MI_BATCH_BUFFER_START_GEN8); > >>>>> + *cs++ = i915_ggtt_offset(stream->noa_wait); > >>>>> + *cs++ = 0; > >>>> Yikes, stream->noa_wait is unused. > >>>> > >>>> Hmm, the noa_wait doesn't have any arbitration points internally, so we > >>>> probably do need to make it non-preemptable as well? > >>>> > >>>> With a rq->flags |= I915_REQUEST_NOPREEMPT in emit_oa_config, > >>>> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> > >>>> > >>>> We need to wrap emit_oa_config() in a similar selftest and verify > >>>> that a read of the oa regs are correct and that the TIMESTAMP indicates > >>>> the appropriate delay before the read. If you feel bored. > >>>> -Chris > >>> > >>> As long as we wait long enough, it should be okay. > >>> > >>> Why making it nopreempt? > >> Aiui, the batch buffer has no arbitration points so the delay may incur > >> the wrath of the forced preemption. That is another request (of higher > >> priority) wishing to run, but not being able to. > > The alternative would be adding a MI_ARB_CHECK at the start of the loop > > if you happy with being preempted out. > > -Chris > > I guess I'll do that :) Ok, then have an Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> on the understanding you'll look at adding an MI_ARB_CHECK next :) -Chris
diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c index 507236bd41ae..31e47ee23357 100644 --- a/drivers/gpu/drm/i915/i915_perf.c +++ b/drivers/gpu/drm/i915/i915_perf.c @@ -1870,7 +1870,7 @@ alloc_oa_config_buffer(struct i915_perf_stream *stream, config_length += num_lri_dwords(oa_config->mux_regs_len); config_length += num_lri_dwords(oa_config->b_counter_regs_len); config_length += num_lri_dwords(oa_config->flex_regs_len); - config_length++; /* MI_BATCH_BUFFER_END */ + config_length += 3; /* MI_BATCH_BUFFER_START */ config_length = ALIGN(sizeof(u32) * config_length, I915_GTT_PAGE_SIZE); obj = i915_gem_object_create_shmem(stream->perf->i915, config_length); @@ -1895,7 +1895,12 @@ alloc_oa_config_buffer(struct i915_perf_stream *stream, oa_config->flex_regs, oa_config->flex_regs_len); - *cs++ = MI_BATCH_BUFFER_END; + /* Jump into the active wait. */ + *cs++ = (INTEL_GEN(stream->perf->i915) < 8 ? + MI_BATCH_BUFFER_START : + MI_BATCH_BUFFER_START_GEN8); + *cs++ = i915_ggtt_offset(stream->noa_wait); + *cs++ = 0; i915_gem_object_flush_map(obj); i915_gem_object_unpin_map(obj);
I'm observing incoherence metric values, changing from run to run. It appears the patches introducing noa wait & reconfiguration from command stream switched places in the series multiple times during the review. This lead to the dependency of one onto the order to go missing... Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: 15d0ace1f876 ("drm/i915/perf: execute OA configuration from command stream") --- drivers/gpu/drm/i915/i915_perf.c | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-)