Message ID | 20200709195837.4285-1-umesh.nerlige.ramappa@intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | drm/i915/perf: Use GTT when saving/restoring engine GPR | expand |
Quoting Umesh Nerlige Ramappa (2020-07-09 20:58:37) > MI_STORE_REGISTER_MEM and MI_LOAD_REGISTER_MEM need to know which > translation to use when saving restoring the engine general purpose > registers to and from the GT scratch. Since GT scratch is mapped to > ggtt, we need to set an additional bit in the command to use GTT. > > Fixes: daed3e44396d17 ("drm/i915/perf: implement active wait for noa configurations") > Suggested-by: Prathap Kumar Valsan <prathap.kumar.valsan@intel.com> > Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> > --- > drivers/gpu/drm/i915/i915_perf.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c > index de69d430b1ed..c6f6370283cf 100644 > --- a/drivers/gpu/drm/i915/i915_perf.c > +++ b/drivers/gpu/drm/i915/i915_perf.c > @@ -1592,6 +1592,7 @@ static u32 *save_restore_register(struct i915_perf_stream *stream, u32 *cs, > u32 d; > > cmd = save ? MI_STORE_REGISTER_MEM : MI_LOAD_REGISTER_MEM; > + cmd |= MI_SRM_LRM_GLOBAL_GTT; Indeed. For bonus points, please write a selftest to verify that the user's GPR are restored. Something as simple as live_noa_delay, but instead of measuring the time; submit a request to write telltales into the GPR, submit a request to run noa_wait; then readback the telltales from a final request. Now since the noa_wait is being run from the GGTT, the address space selector should be reading from the GGTT. So I am actually curious as to whether we have a bug or not. -Chris
On 09/07/2020 22:58, Umesh Nerlige Ramappa wrote: > MI_STORE_REGISTER_MEM and MI_LOAD_REGISTER_MEM need to know which > translation to use when saving restoring the engine general purpose > registers to and from the GT scratch. Since GT scratch is mapped to > ggtt, we need to set an additional bit in the command to use GTT. > > Fixes: daed3e44396d17 ("drm/i915/perf: implement active wait for noa configurations") > Suggested-by: Prathap Kumar Valsan <prathap.kumar.valsan@intel.com> > Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Thanks a lot! > --- > drivers/gpu/drm/i915/i915_perf.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c > index de69d430b1ed..c6f6370283cf 100644 > --- a/drivers/gpu/drm/i915/i915_perf.c > +++ b/drivers/gpu/drm/i915/i915_perf.c > @@ -1592,6 +1592,7 @@ static u32 *save_restore_register(struct i915_perf_stream *stream, u32 *cs, > u32 d; > > cmd = save ? MI_STORE_REGISTER_MEM : MI_LOAD_REGISTER_MEM; > + cmd |= MI_SRM_LRM_GLOBAL_GTT; > if (INTEL_GEN(stream->perf->i915) >= 8) > cmd++; >
On 09/07/2020 23:37, Chris Wilson wrote: > Quoting Umesh Nerlige Ramappa (2020-07-09 20:58:37) >> MI_STORE_REGISTER_MEM and MI_LOAD_REGISTER_MEM need to know which >> translation to use when saving restoring the engine general purpose >> registers to and from the GT scratch. Since GT scratch is mapped to >> ggtt, we need to set an additional bit in the command to use GTT. >> >> Fixes: daed3e44396d17 ("drm/i915/perf: implement active wait for noa configurations") >> Suggested-by: Prathap Kumar Valsan <prathap.kumar.valsan@intel.com> >> Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> >> --- >> drivers/gpu/drm/i915/i915_perf.c | 1 + >> 1 file changed, 1 insertion(+) >> >> diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c >> index de69d430b1ed..c6f6370283cf 100644 >> --- a/drivers/gpu/drm/i915/i915_perf.c >> +++ b/drivers/gpu/drm/i915/i915_perf.c >> @@ -1592,6 +1592,7 @@ static u32 *save_restore_register(struct i915_perf_stream *stream, u32 *cs, >> u32 d; >> >> cmd = save ? MI_STORE_REGISTER_MEM : MI_LOAD_REGISTER_MEM; >> + cmd |= MI_SRM_LRM_GLOBAL_GTT; > Indeed. > > For bonus points, please write a selftest to verify that the user's GPR > are restored. Something as simple as live_noa_delay, but instead of > measuring the time; submit a request to write telltales into the GPR, > submit a request to run noa_wait; then readback the telltales from a > final request. > > Now since the noa_wait is being run from the GGTT, the address space > selector should be reading from the GGTT. So I am actually curious as to > whether we have a bug or not. > -Chris I'm not super competent on the PPGTT setup, but I assumed this worked because we wrote into the ppgtt scratch page. Potentially trashing an app VM space if it executes a reconfiguration. -Lionel
Quoting Lionel Landwerlin (2020-07-09 21:42:39) > On 09/07/2020 23:37, Chris Wilson wrote: > > Quoting Umesh Nerlige Ramappa (2020-07-09 20:58:37) > >> MI_STORE_REGISTER_MEM and MI_LOAD_REGISTER_MEM need to know which > >> translation to use when saving restoring the engine general purpose > >> registers to and from the GT scratch. Since GT scratch is mapped to > >> ggtt, we need to set an additional bit in the command to use GTT. > >> > >> Fixes: daed3e44396d17 ("drm/i915/perf: implement active wait for noa configurations") > >> Suggested-by: Prathap Kumar Valsan <prathap.kumar.valsan@intel.com> > >> Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> > >> --- > >> drivers/gpu/drm/i915/i915_perf.c | 1 + > >> 1 file changed, 1 insertion(+) > >> > >> diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c > >> index de69d430b1ed..c6f6370283cf 100644 > >> --- a/drivers/gpu/drm/i915/i915_perf.c > >> +++ b/drivers/gpu/drm/i915/i915_perf.c > >> @@ -1592,6 +1592,7 @@ static u32 *save_restore_register(struct i915_perf_stream *stream, u32 *cs, > >> u32 d; > >> > >> cmd = save ? MI_STORE_REGISTER_MEM : MI_LOAD_REGISTER_MEM; > >> + cmd |= MI_SRM_LRM_GLOBAL_GTT; > > Indeed. > > > > For bonus points, please write a selftest to verify that the user's GPR > > are restored. Something as simple as live_noa_delay, but instead of > > measuring the time; submit a request to write telltales into the GPR, > > submit a request to run noa_wait; then readback the telltales from a > > final request. > > > > Now since the noa_wait is being run from the GGTT, the address space > > selector should be reading from the GGTT. So I am actually curious as to > > whether we have a bug or not. > > -Chris > > I'm not super competent on the PPGTT setup, but I assumed this worked > because we wrote into the ppgtt scratch page. It's not a STORE_INDEX, it's writing to an absolute address based on the address space selector which is a combination of the batch address space and the command bits. Certainly the batch itself is read from the GGTT, but I can't off hand remember the rules for the various MI (whether they could even access ppGTT when launched from GGTT). -Chris
On 09/07/2020 23:46, Chris Wilson wrote: > Quoting Lionel Landwerlin (2020-07-09 21:42:39) >> On 09/07/2020 23:37, Chris Wilson wrote: >>> Quoting Umesh Nerlige Ramappa (2020-07-09 20:58:37) >>>> MI_STORE_REGISTER_MEM and MI_LOAD_REGISTER_MEM need to know which >>>> translation to use when saving restoring the engine general purpose >>>> registers to and from the GT scratch. Since GT scratch is mapped to >>>> ggtt, we need to set an additional bit in the command to use GTT. >>>> >>>> Fixes: daed3e44396d17 ("drm/i915/perf: implement active wait for noa configurations") >>>> Suggested-by: Prathap Kumar Valsan <prathap.kumar.valsan@intel.com> >>>> Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> >>>> --- >>>> drivers/gpu/drm/i915/i915_perf.c | 1 + >>>> 1 file changed, 1 insertion(+) >>>> >>>> diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c >>>> index de69d430b1ed..c6f6370283cf 100644 >>>> --- a/drivers/gpu/drm/i915/i915_perf.c >>>> +++ b/drivers/gpu/drm/i915/i915_perf.c >>>> @@ -1592,6 +1592,7 @@ static u32 *save_restore_register(struct i915_perf_stream *stream, u32 *cs, >>>> u32 d; >>>> >>>> cmd = save ? MI_STORE_REGISTER_MEM : MI_LOAD_REGISTER_MEM; >>>> + cmd |= MI_SRM_LRM_GLOBAL_GTT; >>> Indeed. >>> >>> For bonus points, please write a selftest to verify that the user's GPR >>> are restored. Something as simple as live_noa_delay, but instead of >>> measuring the time; submit a request to write telltales into the GPR, >>> submit a request to run noa_wait; then readback the telltales from a >>> final request. >>> >>> Now since the noa_wait is being run from the GGTT, the address space >>> selector should be reading from the GGTT. So I am actually curious as to >>> whether we have a bug or not. >>> -Chris >> I'm not super competent on the PPGTT setup, but I assumed this worked >> because we wrote into the ppgtt scratch page. > It's not a STORE_INDEX, it's writing to an absolute address based on the > address space selector which is a combination of the batch address space > and the command bits. Certainly the batch itself is read from the GGTT, > but I can't off hand remember the rules for the various MI (whether they > could even access ppGTT when launched from GGTT). > -Chris My understanding was that from a GGTT batch you could read/write into PPGTT. But not the other way around (obvisously). I thought the kernel mapped a page throughout the entire PPGTT space to prevent pagefaults. Is that not the case? -Lionel
Quoting Lionel Landwerlin (2020-07-09 21:50:30) > On 09/07/2020 23:46, Chris Wilson wrote: > > Quoting Lionel Landwerlin (2020-07-09 21:42:39) > >> On 09/07/2020 23:37, Chris Wilson wrote: > >>> Quoting Umesh Nerlige Ramappa (2020-07-09 20:58:37) > >>>> MI_STORE_REGISTER_MEM and MI_LOAD_REGISTER_MEM need to know which > >>>> translation to use when saving restoring the engine general purpose > >>>> registers to and from the GT scratch. Since GT scratch is mapped to > >>>> ggtt, we need to set an additional bit in the command to use GTT. > >>>> > >>>> Fixes: daed3e44396d17 ("drm/i915/perf: implement active wait for noa configurations") > >>>> Suggested-by: Prathap Kumar Valsan <prathap.kumar.valsan@intel.com> > >>>> Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> > >>>> --- > >>>> drivers/gpu/drm/i915/i915_perf.c | 1 + > >>>> 1 file changed, 1 insertion(+) > >>>> > >>>> diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c > >>>> index de69d430b1ed..c6f6370283cf 100644 > >>>> --- a/drivers/gpu/drm/i915/i915_perf.c > >>>> +++ b/drivers/gpu/drm/i915/i915_perf.c > >>>> @@ -1592,6 +1592,7 @@ static u32 *save_restore_register(struct i915_perf_stream *stream, u32 *cs, > >>>> u32 d; > >>>> > >>>> cmd = save ? MI_STORE_REGISTER_MEM : MI_LOAD_REGISTER_MEM; > >>>> + cmd |= MI_SRM_LRM_GLOBAL_GTT; > >>> Indeed. > >>> > >>> For bonus points, please write a selftest to verify that the user's GPR > >>> are restored. Something as simple as live_noa_delay, but instead of > >>> measuring the time; submit a request to write telltales into the GPR, > >>> submit a request to run noa_wait; then readback the telltales from a > >>> final request. > >>> > >>> Now since the noa_wait is being run from the GGTT, the address space > >>> selector should be reading from the GGTT. So I am actually curious as to > >>> whether we have a bug or not. > >>> -Chris > >> I'm not super competent on the PPGTT setup, but I assumed this worked > >> because we wrote into the ppgtt scratch page. > > It's not a STORE_INDEX, it's writing to an absolute address based on the > > address space selector which is a combination of the batch address space > > and the command bits. Certainly the batch itself is read from the GGTT, > > but I can't off hand remember the rules for the various MI (whether they > > could even access ppGTT when launched from GGTT). > > -Chris > > My understanding was that from a GGTT batch you could read/write into PPGTT. > > But not the other way around (obvisously). > > I thought the kernel mapped a page throughout the entire PPGTT space to > prevent pagefaults. Is that not the case? Yes, a readonly page, where supported. Ah, now I understand you meant *that* scratch page as opposed to a page we allocated for the purpose of saving per-context state like the gt->scratch page. -Chris
Quoting Lionel Landwerlin (2020-07-09 21:40:50) > On 09/07/2020 22:58, Umesh Nerlige Ramappa wrote: > > MI_STORE_REGISTER_MEM and MI_LOAD_REGISTER_MEM need to know which > > translation to use when saving restoring the engine general purpose > > registers to and from the GT scratch. Since GT scratch is mapped to > > ggtt, we need to set an additional bit in the command to use GTT. > > > > Fixes: daed3e44396d17 ("drm/i915/perf: implement active wait for noa configurations") > > Suggested-by: Prathap Kumar Valsan <prathap.kumar.valsan@intel.com> > > Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> > > Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> > > > Thanks a lot! Pushed, thanks for the fix. -Chris
diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c index de69d430b1ed..c6f6370283cf 100644 --- a/drivers/gpu/drm/i915/i915_perf.c +++ b/drivers/gpu/drm/i915/i915_perf.c @@ -1592,6 +1592,7 @@ static u32 *save_restore_register(struct i915_perf_stream *stream, u32 *cs, u32 d; cmd = save ? MI_STORE_REGISTER_MEM : MI_LOAD_REGISTER_MEM; + cmd |= MI_SRM_LRM_GLOBAL_GTT; if (INTEL_GEN(stream->perf->i915) >= 8) cmd++;
MI_STORE_REGISTER_MEM and MI_LOAD_REGISTER_MEM need to know which translation to use when saving restoring the engine general purpose registers to and from the GT scratch. Since GT scratch is mapped to ggtt, we need to set an additional bit in the command to use GTT. Fixes: daed3e44396d17 ("drm/i915/perf: implement active wait for noa configurations") Suggested-by: Prathap Kumar Valsan <prathap.kumar.valsan@intel.com> Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> --- drivers/gpu/drm/i915/i915_perf.c | 1 + 1 file changed, 1 insertion(+)