drm/i915: Allow null render state batchbuffers bigger than one page

Message ID 1493370666-14461-1-git-send-email-oscar.mateo@intel.com
State New, archived

Commit Message

oscar.mateo@intel.com April 28, 2017, 9:11 a.m. UTC
The new batchbuffer for CNL surpasses the 4096 byte mark.

Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
---
 drivers/gpu/drm/i915/i915_gem_render_state.c | 40 +++++++++++++++-------------
 1 file changed, 21 insertions(+), 19 deletions(-)
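The diff body is not shown in this view. As a hedged illustration of what lifting a one-page limit can involve, here is a minimal C sketch, not the patch itself. It copies a dword batch into separately allocated 4 KiB pages, choosing the destination page from the dword index, and patches relocated dwords with a 64-bit GPU address. Every name here (`struct rodata`, `struct pages`, `copy_batch`, `RELOC_END`) is hypothetical. The table layout is also an assumption: ascending byte offsets with a terminator, and a zero placeholder dword after each relocated one.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_DWORDS (4096u / 4u) /* dwords per 4 KiB page */
#define MAX_PAGES   4u
#define RELOC_END   UINT32_MAX   /* stands in for a -1 terminator (assumed) */

/* Hypothetical description of a pre-generated batch. */
struct rodata {
	const uint32_t *batch;     /* command dwords */
	unsigned int batch_items;  /* number of dwords */
	const uint32_t *reloc;     /* ascending byte offsets, RELOC_END-terminated */
};

/* Destination backed by separate pages, not one contiguous mapping. */
struct pages {
	uint32_t *page[MAX_PAGES];
	unsigned int count;
};

/* Map a dword index to its slot in the right page. */
static uint32_t *dword_slot(struct pages *p, unsigned int i)
{
	assert(i / PAGE_DWORDS < p->count);
	return &p->page[i / PAGE_DWORDS][i % PAGE_DWORDS];
}

/* Copy the batch, patching each relocation with a 64-bit GPU address:
 * low dword in place, high dword in the following zero placeholder.
 * Returns 0 on success, -1 if it does not fit or a reloc is unresolved. */
static int copy_batch(struct pages *dst, const struct rodata *rd,
		      uint64_t gpu_base)
{
	unsigned int i, r = 0;

	/* A one-page version would reject batch_items > PAGE_DWORDS here. */
	if (rd->batch_items > dst->count * PAGE_DWORDS)
		return -1;

	for (i = 0; i < rd->batch_items; i++) {
		uint32_t s = rd->batch[i];

		if (rd->reloc[r] != RELOC_END && i * 4u == rd->reloc[r]) {
			uint64_t addr = s + gpu_base;

			if (i + 1 >= rd->batch_items || rd->batch[i + 1] != 0)
				return -1;
			*dword_slot(dst, i++) = (uint32_t)addr; /* low half */
			s = (uint32_t)(addr >> 32);             /* high half */
			r++;
		}
		*dword_slot(dst, i) = s;
	}

	return rd->reloc[r] == RELOC_END ? 0 : -1;
}
```

With a two-page destination, a batch of `PAGE_DWORDS + 8` dwords copies cleanly and a relocation that lands in the second page is patched. With a single page, the same batch is rejected up front, which is the behaviour a one-page limit produces.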

Comments

Chris Wilson April 28, 2017, 4:53 p.m. UTC | #1
On Fri, Apr 28, 2017 at 09:11:06AM +0000, Oscar Mateo wrote:
> The new batchbuffer for CNL surpasses the 4096 byte mark.
> 
> Cc: Mika Kuoppala <mika.kuoppala@intel.com>
> Cc: Ben Widawsky <ben@bwidawsk.net>
> Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>

Evil, 4k+ of nothing-ness that userspace then has to configure for itself
for correctness anyway.

Patch looks ok, but still question the sanity.
-Chris
Mika Kuoppala May 2, 2017, 9:17 a.m. UTC | #2
Chris Wilson <chris@chris-wilson.co.uk> writes:

> On Fri, Apr 28, 2017 at 09:11:06AM +0000, Oscar Mateo wrote:
>> The new batchbuffer for CNL surpasses the 4096 byte mark.
>> 
>> Cc: Mika Kuoppala <mika.kuoppala@intel.com>
>> Cc: Ben Widawsky <ben@bwidawsk.net>
>> Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
>
> Evil, 4k+ of nothing-ness that userspace then has to configure for itself
> for correctness anyway.
>
> Patch looks ok, but still question the sanity.

Is there a requirement for CNL to init the renderstate?

I would like to drop the render state init from CNL if
we can't find evidence that it needs it. Bspec indicates
that it doesn't.

-Mika

> -Chris
>
> -- 
> Chris Wilson, Intel Open Source Technology Centre
oscar.mateo@intel.com May 2, 2017, 9:31 a.m. UTC | #3
On 05/02/2017 09:17 AM, Mika Kuoppala wrote:
> [snip]
> Is there a requirement for CNL to init the renderstate?
>
> I would like to drop the render state init from CNL if
> we can't find evidence that it needs it. Bspec indicates
> that it doesnt.
>
> -Mika

Hi Mika,

I can double-check with the hardware architects, but word around here is 
that render state init has never stopped being a requirement. Where did 
you see in the BSpec that it is not required for CNL?

Thanks
Mika Kuoppala May 3, 2017, 8:52 a.m. UTC | #4
Oscar Mateo <oscar.mateo@intel.com> writes:

> [snip]
>
> Hi Mika,
>
> I can double-check with the hardware architects, but word around here is 
> that render state init has never stopped being a requirement. Where did 
> you see in the BSpec that it is not required for CNL?
>

It would be great if you could refresh the answer and perhaps
even get some answers to the 'why' parts.

In the "Context Descriptor Format" section, it says:
"Render CS Only: Render state need not be initialized; the Render
Context Restore Inhibit bit in the Context/Save image in memory should
be set to prevent restoring garbage render context."

-Mika
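The "Restore Inhibit" wording refers to a flag in the saved context image. In i915, context-control style registers are typically written as masked registers: bits 31:16 of a write select which of bits 15:0 take effect. Below is a minimal sketch of that encoding. It assumes the inhibit flag sits at bit 0, which is a placeholder (the BSpec defines the real position), and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed bit position, for illustration only. */
#define CTX_RESTORE_INHIBIT_BIT ((uint16_t)(1u << 0))

/* Masked write: bits 31:16 select which of bits 15:0 are updated. */
static inline uint32_t masked_field(uint16_t mask, uint16_t value)
{
	assert((value & ~mask) == 0); /* value bits must be covered by mask */
	return ((uint32_t)mask << 16) | value;
}

static inline uint32_t masked_bit_enable(uint16_t bit)
{
	return masked_field(bit, bit);
}

static inline uint32_t masked_bit_disable(uint16_t bit)
{
	return masked_field(bit, 0);
}

/* Model how hardware applies a masked write to a 16-bit register value:
 * bits outside the mask keep their old value. */
static inline uint32_t apply_masked_write(uint32_t reg, uint32_t write)
{
	uint32_t mask = write >> 16;

	return (reg & ~mask & 0xffffu) | (write & mask);
}
```

Masking lets a single dword set the inhibit flag without disturbing neighbouring control bits, which is why the image can carry "set this bit" rather than a full register value.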
oscar.mateo@intel.com May 3, 2017, 9:12 a.m. UTC | #5
On 05/03/2017 08:52 AM, Mika Kuoppala wrote:
> [snip]
>
> It would be great if you could refresh the answer and perhaps
> even get some answers to the 'why' parts.
>
> In the "Context Descriptor Format" section, it says:
> "Render CS Only: Render state need not be initialized; the Render
> Context Restore Inhibit bit in the Context/Save image in memory should
> be set to prevent restoring garbage render context."
>
> -Mika

:_(

The same section also says:

“See the Logical Ring Context Format section for details.”

And then “Logical Ring Context Format” section goes on to say:

“It is tedious for software to populate the engine context as per the 
requirements, it is recommended to implicitly use engine to populate 
this portion of the context. […] Software must program all the state 
required to initialize the engine in the ring buffer which would 
initialize the hardware state.”

I’ll try to clarify it...
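The "program all the state required … in the ring buffer" approach is essentially what a golden (null) render state batch is: a pre-generated stream of state commands that the kernel submits on a context's behalf. The sketch below shows one minimal way to close such a stream. It assumes the MI command encoding with the opcode in bits 28:23 (MI_NOOP is opcode 0x00, MI_BATCH_BUFFER_END is 0x0A) and 64-byte cachelines. `struct batch`, `emit` and `finish_batch` are hypothetical, and any state dwords a caller emits are placeholders, not real commands.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MI_INSTR(opcode, flags) (((uint32_t)(opcode) << 23) | (flags))
#define MI_NOOP                 MI_INSTR(0x00, 0)
#define MI_BATCH_BUFFER_END     MI_INSTR(0x0a, 0)
#define CACHELINE_DWORDS        (64 / sizeof(uint32_t))

/* Hypothetical bounds-checked batch writer. */
struct batch {
	uint32_t *cs;  /* command storage */
	size_t len;    /* dwords written so far */
	size_t cap;    /* capacity in dwords */
};

static int emit(struct batch *b, uint32_t dw)
{
	assert(b->cs != NULL);
	if (b->len >= b->cap)
		return -1; /* would overflow the buffer */
	b->cs[b->len++] = dw;
	return 0;
}

/* End the stream with MI_BATCH_BUFFER_END, then pad with MI_NOOP so the
 * batch occupies whole cachelines. */
static int finish_batch(struct batch *b)
{
	if (emit(b, MI_BATCH_BUFFER_END))
		return -1;
	while (b->len % CACHELINE_DWORDS)
		if (emit(b, MI_NOOP))
			return -1;
	return 0;
}
```

Failing on overflow rather than silently truncating matters here. It is the software-side equivalent of the size check that a batch growing past 4096 bytes runs into.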
Chris Wilson May 3, 2017, 4:31 p.m. UTC | #6
On Wed, May 03, 2017 at 09:12:18AM +0000, Oscar Mateo wrote:
> [snip]
>
> :_(
>
> The same section also says:
>
> “See the Logical Ring Context Format section for details.”
>
> And then “Logical Ring Context Format” section goes on to say:
>
> “It is tedious for software to populate the engine context as per the
> requirements, it is recommended to implicitly use engine to populate this
> portion of the context. […] Software must program all the state required
> to initialize the engine in the ring buffer which would initialize the
> hardware state.”

Yet what the kernel programs is completely garbage for the user, so the
user still has to program the initial GPU state to their own
specifications. Just say no to policy in the kernel. We need a stronger
reason than this, and if that was the only reason the original render
state was merged, I am very angry.
-Chris
Rodrigo Vivi July 13, 2017, 10:28 p.m. UTC | #7
On Wed, May 3, 2017 at 9:31 AM, Chris Wilson <chris@chris-wilson.co.uk> wrote:
> On Wed, May 03, 2017 at 09:12:18AM +0000, Oscar Mateo wrote:
>> [snip]
>>  Is there a requirement for CNL to init the renderstate?
>>
>>  I would like to drop the render state init from CNL if
>>  we can't find evidence that it needs it. Bspec indicates
>>  that it doesnt.

I'd like to drop it as well, and I was hearing people around saying we
didn't need it anymore. However, without it I had bad failures during
power-on...

>> [snip]
>
> Yet what the kernel programs is completely garbage for the user, so the
> user still has to program the initial GPU state to their own
> specifications. Just say no to policy in the kernel. We need a stronger
> reason than this, and if that was the only reason the original render
> state was merged, I am very angry.

So... based on what I saw, we need this.
I agree the justification is not good, because I could never actually
understand or make any sense out of this golden context...
But we need a way out of this impasse to be able to move forward...

> -Chris
>
> --
> Chris Wilson, Intel Open Source Technology Centre
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
oscar.mateo@intel.com July 14, 2017, 2:52 p.m. UTC | #8
On 07/13/2017 03:28 PM, Rodrigo Vivi wrote:
> On Wed, May 3, 2017 at 9:31 AM, Chris Wilson <chris@chris-wilson.co.uk> wrote:
>> [snip]
>>>   Is there a requirement for CNL to init the renderstate?
>>>
>>>   I would like to drop the render state init from CNL if
>>>   we can't find evidence that it needs it. Bspec indicates
>>>   that it doesnt.
> I'd like to drop as well, and I was hearing people around telling we
> didn't need anymore,
> however without this during power on I had bad failures...
>

The best I could get from architecture (+Raf) is that setting valid and 
coherent values for the whole render state is required as soon as the 
context is created, no matter who does it. If you see failures when the 
KMD does not do it, that means the UMD must be missing something, right?

Chris Wilson July 14, 2017, 3:08 p.m. UTC | #9
Quoting Oscar Mateo (2017-07-14 15:52:59)
> On 07/13/2017 03:28 PM, Rodrigo Vivi wrote:
> > [snip]
>
> The best I could get from architecture (+Raf) is that setting valid and 
> coherent values for the whole render state is required as soon as the 
> context is created, no matter who does it. If you see failures when the 
> KMD does not do it, that means the UMD must be missing something, right?

That is my initial response as well. The kernel does load one context,
just so that the hardware always has space to write to on power saving.
The only batch executed for it is the golden render state. It is easy
enough to initialise only that kernel context, to isolate whether the
problem is self-inflicted or whether userspace overlooked something in
its state management. (I have the view that even if userspace doesn't
think it needs to use a particular bit of state today, tomorrow it will,
so it will need it anyway!)
-Chris
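The isolation test proposed here can be framed as a three-way policy: emit the golden state for no context, only for the kernel context, or for every context. If failures disappear with the kernel-only setting, the uninitialised kernel context was the cause (self-inflicted). If they persist until every context is initialised, userspace state setup is the likelier culprit. This is a hypothetical C sketch of that switch; none of these names exist in i915.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical experiment knob. */
enum golden_state_policy {
	GOLDEN_NONE,        /* never emit the null render state */
	GOLDEN_KERNEL_ONLY, /* only for the kernel's own context */
	GOLDEN_ALL,         /* for every newly created context */
};

/* Hypothetical per-context bookkeeping. */
struct ctx {
	bool is_kernel_context;
	bool render_state_initialised;
};

/* Decide whether to submit the golden batch before first use. */
static bool should_emit_golden_state(const struct ctx *c,
				     enum golden_state_policy p)
{
	assert(c);
	if (c->render_state_initialised)
		return false; /* only ever done once per context */

	switch (p) {
	case GOLDEN_NONE:
		return false;
	case GOLDEN_KERNEL_ONLY:
		return c->is_kernel_context;
	case GOLDEN_ALL:
		return true;
	}
	return false;
}
```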
oscar.mateo@intel.com July 18, 2017, 3:15 p.m. UTC | #10
On 07/14/2017 08:08 AM, Chris Wilson wrote:
> Quoting Oscar Mateo (2017-07-14 15:52:59)
>> [snip]
> That is my initial response as well. The kernel does load one context,
> just so that the hardware always has space to write to on power saving.
> The only batch executed for it is the golden render state. Easy enough
> to only initialise that kernel context to isolate whether it is
> self-inflicted or that userspace overlooked something in its state
> management. (I have the view that even if userspace doesn't think it
> needs to use a particular bit of state today, tomorrow it will so will
> need it anyway!)
> -Chris

Rodrigo, you have access to a CNL: can you run this test? The idea is
to find out whether the root cause of the failures you were seeing is in
the kernel default context or in the UMD-created contexts.

Thanks,
Oscar
Rodrigo Vivi Aug. 24, 2017, 12:01 a.m. UTC | #11
On Tue, Jul 18, 2017 at 8:15 AM, Oscar Mateo <oscar.mateo@intel.com> wrote:
> [snip]
>
> Rodrigo, you have access to a CNL: can you make this test? The idea is to
> find out if the root cause for the failures you were seeing is the kernel
> default context or in the UMD-created contexts.

I'm sorry for the delay on this one.

On the parts I have now, I couldn't reproduce the issues I saw during
power-on, where the null context helped.

But anyway, apparently we need this, right?!

What about the 4k+ sanity concern that Chris raised? Is there anything we
should address first?

oscar.mateo@intel.com Aug. 24, 2017, 10:39 p.m. UTC | #12
On 08/23/2017 05:01 PM, Rodrigo Vivi wrote:
> [snip]
>
> I'm sorry for the delay on this one.
>
> On the parts I have now I couldn't reproduce the issues I saw during power-on
> where null context helped.
>
> But anyways apparently we need this right?!
>
> What about the 4k+ sanity that Chris raised? Anything we should address first?

I don't think Chris had any problem with the batchbuffer being bigger 
than 4k per se. His concern was: "why do we need to send this 
batchbuffer from the KMD at all if the UMD has to send something very 
similar anyway?".
Even if this were true (I haven't found anybody to confirm or deny it),
there is still the question of the kernel context, which would never get
initialized to valid values by the UMD. The test was to send the golden
state only for the kernel context (and nothing else) and see whether your
issues went away.

Since your issues went away on their own without any golden state
whatsoever... does that mean Mesa fixed something it was missing
during power-on (PO)?
Rodrigo Vivi Aug. 24, 2017, 11 p.m. UTC | #13
On Thu, Aug 24, 2017 at 3:39 PM, Oscar Mateo <oscar.mateo@intel.com> wrote:
> [snip]
>
> I don't think Chris had any problem with the batchbuffer being bigger than
> 4k per se. His concern was: "why do we need to send this batchbuffer from
> the KMD at all if the UMD has to send something very similar anyway?".
> Even if this was true (I haven't found anybody to confirm or deny it) there
> is still the question of the kernel context (which would never get
> initialized to valid values by the UMD).

So, Chris: rv-b? Acked-by?

> The test was to only send the
> golden state for the kernel context (and nothing else) and see if your
> issues went away.
>
> Since your issues went away on their own without any golden state
> whatsoever... does that mean Mesa fixed something they were missing during
> the PO?

Not sure what it was anymore.

Rodrigo Vivi Oct. 5, 2017, 4:34 a.m. UTC | #14
On Thu, Aug 24, 2017 at 11:00:27PM +0000, Rodrigo Vivi wrote:
> On Thu, Aug 24, 2017 at 3:39 PM, Oscar Mateo <oscar.mateo@intel.com> wrote:
> > I don't think Chris had any problem with the batchbuffer being bigger than
> > 4k per se. His concern was: "why do we need to send this batchbuffer from
> > the KMD at all if the UMD has to send something very similar anyway?".
> > Even if this was true (I haven't found anybody to confirm or deny it) there
> > is still the question of the kernel context (which would never get
> > initialized to valid values by the UMD).
> 
> so, chris, rv-b? acked-by?

chris, mika, oscar...
what should we do with this?
just discard, ignore and move on without the null context for gen10+?

> 
> > The test was to only send the
> > golden state for the kernel context (and nothing else) and see if your
> > issues went away.
> >
> > Since your issues went away on their own without any golden state
> > whatsoever... does that mean Mesa fixed something they were missing during
> > the PO?
> 
> not sure what it was anymore
Chris Wilson Oct. 10, 2017, 10:25 a.m. UTC | #15
Quoting Rodrigo Vivi (2017-10-05 05:34:02)
> chris, mika, oscar...
> what should we do with this?
> just discard, ignore and move on without the null context for gen10+?

If there's no requirement for us to have it, then let's break the cargo
cult. Certainly userspace does not expect 3DSTATE to have any default
value, unlike the defaults specified for mmio state (which is currently
causing a huge upset). It's only if the bspec has wording that makes
certain valid 3DSTATE (or GPGPU or MEDIA) mandatory for powercontext etc
do we have to worry.
-Chris
Chris Wilson Oct. 10, 2017, 10:29 a.m. UTC | #16
Quoting Chris Wilson (2017-10-10 11:25:38)
> If there's no requirement for us to have it, then let's break the cargo
> cult. Certainly userspace does not expect 3DSTATE to have any default
> value, unlike the defaults specified for mmio state (which is currently
> causing a huge upset). It's only if the bspec has wording that makes
> certain valid 3DSTATE (or GPGPU or MEDIA) mandatory for powercontext etc
> do we have to worry.

The other angle is that the proto context is entirely defined by us. New
userspace contexts should not see any state that is outside of the
context construction (either directly specified inside the image or
implicitly from priv registers). In essence for lrc, we already define
the golden render state but call it a context image instead.
-Chris
Rodrigo Vivi Oct. 12, 2017, 10:31 p.m. UTC | #17
On Tue, Oct 10, 2017 at 10:29:41AM +0000, Chris Wilson wrote:
> Quoting Chris Wilson (2017-10-10 11:25:38)
> > If there's no requirement for us to have it, then let's break the cargo
> > cult. Certainly userspace does not expect 3DSTATE to have any default
> > value, unlike the defaults specified for mmio state (which is currently
> > causing a huge upset). It's only if the bspec has wording that makes
> > certain valid 3DSTATE (or GPGPU or MEDIA) mandatory for powercontext etc
> > do we have to worry.
> 
> The other angle is that the proto context is entirely defined by us. New
> userspace contexts should not see any state that is outside of the
> context construction (either directly specified inside the image or
> implicitly from priv registers). In essence for lrc, we already define
> the golden render state but call it a context image instead.
> -Chris

So, are you saying there is absolutely no risk of one userspace component
leaving garbage in any of these registers, and another component assuming
the state is null or valid, doing some RMW and ending up with the wrong setup?

I believe there were cases like this between Mesa and Libva in the past.

And if issues like this start to reappear, debugging will be harder,
because the garbage left behind would be random.

I understand the cargo cult argument, but with so many different userspaces
using the GPU, the cost of the kernel guaranteeing the null state is really
low compared with the stability it brings without relying on userspace.

My doubt is this: if we stop clearing this state after so many years, do we
expect every userspace to go ahead and modify its code so that it makes no
assumptions about those states on CNL+?

Thanks,
Rodrigo.

Patch

diff --git a/drivers/gpu/drm/i915/i915_gem_render_state.c b/drivers/gpu/drm/i915/i915_gem_render_state.c
index 12d7036..07f9bd6 100644
--- a/drivers/gpu/drm/i915/i915_gem_render_state.c
+++ b/drivers/gpu/drm/i915/i915_gem_render_state.c
@@ -62,12 +62,12 @@  struct intel_render_state {
  * this is sufficient as the null state generator makes the final batch
  * with two passes to build command and state separately. At this point
  * the size of both are known and it compacts them by relocating the state
- * right after the commands taking care of alignment so we should sufficient
- * space below them for adding new commands.
+ * right after the commands taking care of alignment so we should have
+ * sufficient space below them for adding new commands.
  */
-#define OUT_BATCH(batch, i, val)				\
+#define OUT_BATCH(batch, size, i, val)				\
 	do {							\
-		if ((i) >= PAGE_SIZE / sizeof(u32))		\
+		if ((i) >= size / sizeof(u32))			\
 			goto err;				\
 		(batch)[(i)++] = (val);				\
 	} while(0)
@@ -86,7 +86,11 @@  static int render_state_setup(struct intel_render_state *so,
 	if (ret)
 		return ret;
 
-	d = kmap_atomic(i915_gem_object_get_dirty_page(obj, 0));
+	d = i915_gem_object_pin_map(obj, I915_MAP_WB);
+	if (IS_ERR(d)) {
+		ret = PTR_ERR(d);
+		goto out;
+	}
 
 	while (i < rodata->batch_items) {
 		u32 s = rodata->batch[i];
@@ -118,7 +122,7 @@  static int render_state_setup(struct intel_render_state *so,
 	so->batch_size = rodata->batch_items * sizeof(u32);
 
 	while (i % CACHELINE_DWORDS)
-		OUT_BATCH(d, i, MI_NOOP);
+		OUT_BATCH(d, obj->base.size, i, MI_NOOP);
 
 	so->aux_offset = i * sizeof(u32);
 
@@ -141,15 +145,15 @@  static int render_state_setup(struct intel_render_state *so,
 		 */
 		u32 eu_pool_config = 0x00777000;
 
-		OUT_BATCH(d, i, GEN9_MEDIA_POOL_STATE);
-		OUT_BATCH(d, i, GEN9_MEDIA_POOL_ENABLE);
-		OUT_BATCH(d, i, eu_pool_config);
-		OUT_BATCH(d, i, 0);
-		OUT_BATCH(d, i, 0);
-		OUT_BATCH(d, i, 0);
+		OUT_BATCH(d, obj->base.size, i, GEN9_MEDIA_POOL_STATE);
+		OUT_BATCH(d, obj->base.size, i, GEN9_MEDIA_POOL_ENABLE);
+		OUT_BATCH(d, obj->base.size, i, eu_pool_config);
+		OUT_BATCH(d, obj->base.size, i, 0);
+		OUT_BATCH(d, obj->base.size, i, 0);
+		OUT_BATCH(d, obj->base.size, i, 0);
 	}
 
-	OUT_BATCH(d, i, MI_BATCH_BUFFER_END);
+	OUT_BATCH(d, obj->base.size, i, MI_BATCH_BUFFER_END);
 	so->aux_size = i * sizeof(u32) - so->aux_offset;
 	so->aux_offset += so->batch_offset;
 	/*
@@ -160,7 +164,7 @@  static int render_state_setup(struct intel_render_state *so,
 
 	if (needs_clflush)
 		drm_clflush_virt_range(d, i * sizeof(u32));
-	kunmap_atomic(d);
+	i915_gem_object_unpin_map(obj);
 
 	ret = i915_gem_object_set_to_gtt_domain(obj, false);
 out:
@@ -168,7 +172,7 @@  static int render_state_setup(struct intel_render_state *so,
 	return ret;
 
 err:
-	kunmap_atomic(d);
+	i915_gem_object_unpin_map(obj);
 	ret = -EINVAL;
 	goto out;
 }
@@ -189,14 +193,12 @@  int i915_gem_render_state_init(struct intel_engine_cs *engine)
 	if (!rodata)
 		return 0;
 
-	if (rodata->batch_items * 4 > PAGE_SIZE)
-		return -EINVAL;
-
 	so = kmalloc(sizeof(*so), GFP_KERNEL);
 	if (!so)
 		return -ENOMEM;
 
-	obj = i915_gem_object_create_internal(engine->i915, PAGE_SIZE);
+	obj = i915_gem_object_create_internal(engine->i915,
+			PAGE_ALIGN(rodata->batch_items * 4));
 	if (IS_ERR(obj)) {
 		ret = PTR_ERR(obj);
 		goto err_free;