drm/i915: Emit even number of dwords when emitting LRIs

Message ID 1414000792-16111-1-git-send-email-arun.siluvery@linux.intel.com (mailing list archive)
State New, archived

Commit Message

arun.siluvery@linux.intel.com Oct. 22, 2014, 5:59 p.m. UTC
The number of DWords should be even when doing ring emits as
command sequences require QWord alignment.

v2: use LRI variant that can write multiple regs in one go (Damien).
We can simply insert one NOP at the end instead of one per register write.

Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
---
 drivers/gpu/drm/i915/intel_ringbuffer.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

Comments

Lespiau, Damien Oct. 22, 2014, 9:59 p.m. UTC | #1
On Wed, Oct 22, 2014 at 06:59:52PM +0100, Arun Siluvery wrote:
> The number of DWords should be even when doing ring emits as
> command sequences require QWord alignment.
> 
> v2: use LRI variant that can write multiple regs in one go (Damien).
> We can simply insert one NOP at the end instead of one per register write.
> 
> Cc: Mika Kuoppala <mika.kuoppala@intel.com>
> Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>

Looks good to me (maybe without the extra '()' outlined below).

Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Daniel Vetter Oct. 23, 2014, 12:21 p.m. UTC | #2
On Wed, Oct 22, 2014 at 06:59:52PM +0100, Arun Siluvery wrote:
> The number of DWords should be even when doing ring emits as
> command sequences require QWord alignment.
> 
> v2: use LRI variant that can write multiple regs in one go (Damien).
> We can simply insert one NOP at the end instead of one per register write.
> 
> Cc: Mika Kuoppala <mika.kuoppala@intel.com>
> Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
> ---
>  drivers/gpu/drm/i915/intel_ringbuffer.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index 497b836..a8f72e8 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -680,15 +680,16 @@ static int intel_ring_workarounds_emit(struct intel_engine_cs *ring)
>  	if (ret)
>  		return ret;
>  
> -	ret = intel_ring_begin(ring, w->count * 3);
> +	ret = intel_ring_begin(ring, (w->count * 2 + 2));
>  	if (ret)
>  		return ret;
>  
> +	intel_ring_emit(ring, MI_LOAD_REGISTER_IMM(w->count));

Afaik there's a limit to the size of an MI_LRI. Where's the check for
that (probably with a WARN_ON for now to avoid unnecessary complexity)?
-Daniel

>  	for (i = 0; i < w->count; i++) {
> -		intel_ring_emit(ring, MI_LOAD_REGISTER_IMM(1));
>  		intel_ring_emit(ring, w->reg[i].addr);
>  		intel_ring_emit(ring, w->reg[i].value);
>  	}
> +	intel_ring_emit(ring, MI_NOOP);
>  
>  	intel_ring_advance(ring);
>  
> -- 
> 2.1.2
> 
Lespiau, Damien Oct. 23, 2014, 12:42 p.m. UTC | #3
On Thu, Oct 23, 2014 at 02:21:02PM +0200, Daniel Vetter wrote:
> On Wed, Oct 22, 2014 at 06:59:52PM +0100, Arun Siluvery wrote:
> > The number of DWords should be even when doing ring emits as
> > command sequences require QWord alignment.
> > 
> > v2: use LRI variant that can write multiple regs in one go (Damien).
> > We can simply insert one NOP at the end instead of one per register write.
> > 
> > Cc: Mika Kuoppala <mika.kuoppala@intel.com>
> > Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
> > ---
> >  drivers/gpu/drm/i915/intel_ringbuffer.c | 5 +++--
> >  1 file changed, 3 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > index 497b836..a8f72e8 100644
> > --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> > +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > @@ -680,15 +680,16 @@ static int intel_ring_workarounds_emit(struct intel_engine_cs *ring)
> >  	if (ret)
> >  		return ret;
> >  
> > -	ret = intel_ring_begin(ring, w->count * 3);
> > +	ret = intel_ring_begin(ring, (w->count * 2 + 2));
> >  	if (ret)
> >  		return ret;
> >  
> > +	intel_ring_emit(ring, MI_LOAD_REGISTER_IMM(w->count));
> 
> Afaik there's a limit to the size of an MI_LRI. Where's the check for
> that (probably with a WARN_ON for now to avoid unnecessary complexity)?

I guess there's always the size of the length field, I don't see any
other indication. Note that I can't find the documentation of the
multi-registers version of LRI either. So, well, we probably should
double check it does work.
Chris Wilson Oct. 23, 2014, 12:50 p.m. UTC | #4
On Thu, Oct 23, 2014 at 01:42:38PM +0100, Damien Lespiau wrote:
> On Thu, Oct 23, 2014 at 02:21:02PM +0200, Daniel Vetter wrote:
> > On Wed, Oct 22, 2014 at 06:59:52PM +0100, Arun Siluvery wrote:
> > > The number of DWords should be even when doing ring emits as
> > > command sequences require QWord alignment.
> > > 
> > > v2: use LRI variant that can write multiple regs in one go (Damien).
> > > We can simply insert one NOP at the end instead of one per register write.
> > > 
> > > Cc: Mika Kuoppala <mika.kuoppala@intel.com>
> > > Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
> > > ---
> > >  drivers/gpu/drm/i915/intel_ringbuffer.c | 5 +++--
> > >  1 file changed, 3 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > > index 497b836..a8f72e8 100644
> > > --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> > > +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > > @@ -680,15 +680,16 @@ static int intel_ring_workarounds_emit(struct intel_engine_cs *ring)
> > >  	if (ret)
> > >  		return ret;
> > >  
> > > -	ret = intel_ring_begin(ring, w->count * 3);
> > > +	ret = intel_ring_begin(ring, (w->count * 2 + 2));
> > >  	if (ret)
> > >  		return ret;
> > >  
> > > +	intel_ring_emit(ring, MI_LOAD_REGISTER_IMM(w->count));
> > 
> > Afaik there's a limit to the size of an MI_LRI. Where's the check for
> > that (probably with a WARN_ON for now to avoid unnecessary complexity)?
> 
> I guess there's always the size of the length field, I don't see any
> other indication. Note that I can't find the documentation of the
> multi-registers version of LRI either. So, well, we probably should
> double check it does work.

It does work. The max is around 60 iirc (the max length of the
command).
-Chris
Ville Syrjälä Oct. 23, 2014, 1:41 p.m. UTC | #5
On Thu, Oct 23, 2014 at 01:50:23PM +0100, Chris Wilson wrote:
> On Thu, Oct 23, 2014 at 01:42:38PM +0100, Damien Lespiau wrote:
> > On Thu, Oct 23, 2014 at 02:21:02PM +0200, Daniel Vetter wrote:
> > > On Wed, Oct 22, 2014 at 06:59:52PM +0100, Arun Siluvery wrote:
> > > > The number of DWords should be even when doing ring emits as
> > > > command sequences require QWord alignment.
> > > > 
> > > > v2: use LRI variant that can write multiple regs in one go (Damien).
> > > > We can simply insert one NOP at the end instead of one per register write.
> > > > 
> > > > Cc: Mika Kuoppala <mika.kuoppala@intel.com>
> > > > Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
> > > > ---
> > > >  drivers/gpu/drm/i915/intel_ringbuffer.c | 5 +++--
> > > >  1 file changed, 3 insertions(+), 2 deletions(-)
> > > > 
> > > > diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > > > index 497b836..a8f72e8 100644
> > > > --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> > > > +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > > > @@ -680,15 +680,16 @@ static int intel_ring_workarounds_emit(struct intel_engine_cs *ring)
> > > >  	if (ret)
> > > >  		return ret;
> > > >  
> > > > -	ret = intel_ring_begin(ring, w->count * 3);
> > > > +	ret = intel_ring_begin(ring, (w->count * 2 + 2));
> > > >  	if (ret)
> > > >  		return ret;
> > > >  
> > > > +	intel_ring_emit(ring, MI_LOAD_REGISTER_IMM(w->count));
> > > 
> > > Afaik there's a limit to the size of an MI_LRI. Where's the check for
> > > that (probably with a WARN_ON for now to avoid unnecessary complexity)?
> > 
> > I guess there's always the size of the length field, I don't see any
> > other indication. Note that I can't find the documentation of the
> > multi-registers version of LRI either. So, well, we probably should
> > double check it does work.
> 
> It does work. The max is around 60 iirc (the max length of the
> command).

The maximum length seems to be 0xff on gen6+ and 0x3f before that,
which would mean at most 128 or 32 registers.

Also the context image is full of these multi register LRIs. Based on a
quick glance the longest LRI in there is 0x5f on IVB, 0xcf on HSW, and
0xdf on BDW, which translate to 48, 104, and 108 registers per LRI. So
we know at least those must work or context restore would not work.
Before gen7 the context doesn't seem to resemble a batch, so I can't
tell anything about those platforms based on the context image.
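
The length-to-register arithmetic in this message can be sketched in C. This is a minimal sketch, assuming that an LRI writing n registers consists of one header dword followed by n (address, value) pairs, and that the length field holds the total dword count minus two, i.e. 2n - 1. The helper names are hypothetical and do not exist in the driver.

```c
/*
 * Assumption: LRI for n registers = 1 header dword + 2n payload dwords,
 * and the header's length field stores (total dwords - 2) = 2n - 1.
 * Both helper names are made up for this illustration.
 */
static unsigned int lri_length_field(unsigned int nregs)
{
	return 2 * nregs - 1;
}

/* Inverse mapping: recover the register count from a length field value. */
static unsigned int lri_regs_from_length(unsigned int length_field)
{
	return (length_field + 1) / 2;
}
```

Under that assumption, 0xff and 0x3f map to 128 and 32 registers, and 0x5f and 0xcf map to 48 and 104, matching the figures above. The same formula gives 112 for 0xdf rather than 108, so either that hex value or that count may be slightly off in the message.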
arun.siluvery@linux.intel.com Oct. 23, 2014, 1:55 p.m. UTC | #6
On 23/10/2014 14:41, Ville Syrjälä wrote:
> On Thu, Oct 23, 2014 at 01:50:23PM +0100, Chris Wilson wrote:
>> On Thu, Oct 23, 2014 at 01:42:38PM +0100, Damien Lespiau wrote:
>>> On Thu, Oct 23, 2014 at 02:21:02PM +0200, Daniel Vetter wrote:
>>>> On Wed, Oct 22, 2014 at 06:59:52PM +0100, Arun Siluvery wrote:
>>>>> The number of DWords should be even when doing ring emits as
>>>>> command sequences require QWord alignment.
>>>>>
>>>>> v2: use LRI variant that can write multiple regs in one go (Damien).
>>>>> We can simply insert one NOP at the end instead of one per register write.
>>>>>
>>>>> Cc: Mika Kuoppala <mika.kuoppala@intel.com>
>>>>> Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
>>>>> ---
>>>>>   drivers/gpu/drm/i915/intel_ringbuffer.c | 5 +++--
>>>>>   1 file changed, 3 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
>>>>> index 497b836..a8f72e8 100644
>>>>> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
>>>>> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
>>>>> @@ -680,15 +680,16 @@ static int intel_ring_workarounds_emit(struct intel_engine_cs *ring)
>>>>>   	if (ret)
>>>>>   		return ret;
>>>>>
>>>>> -	ret = intel_ring_begin(ring, w->count * 3);
>>>>> +	ret = intel_ring_begin(ring, (w->count * 2 + 2));
>>>>>   	if (ret)
>>>>>   		return ret;
>>>>>
>>>>> +	intel_ring_emit(ring, MI_LOAD_REGISTER_IMM(w->count));
>>>>
>>>> Afaik there's a limit to the size of an MI_LRI. Where's the check for
>>>> that (probably with a WARN_ON for now to avoid unnecessary complexity)?
>>>
>>> I guess there's always the size of the length field, I don't see any
>>> other indication. Note that I can't find the documentation of the
>>> multi-registers version of LRI either. So, well, we probably should
>>> double check it does work.
>>
>> It does work. The max is around 60 iirc (the max length of the
>> command).
>
> The maximum length seems to be 0xff on gen6+ and 0x3f before that,
> which would mean at most 128 or 32 registers.
>
> Also the context image is full of these multi register LRIs. Based on a
> quick glance the longest LRI in there is 0x5f on IVB, 0xcf on HSW, and
> 0xdf on BDW, which translate to 48, 104, and 108 registers per LRI. So
> we know at least those must work or context restore would not work.
> Before gen7 the context doesn't seem to resemble a batch, so I can't
> tell anything about those platforms based on the context image.
>

w->count is already checked against the maximum number of workarounds,
which is currently 16, so we are well within the limit. I think an
additional check would be redundant here, and we are unlikely to ever
have more than 128 workarounds.

regards
Arun
Mika Kuoppala Oct. 23, 2014, 2:42 p.m. UTC | #7
Chris Wilson <chris@chris-wilson.co.uk> writes:

> On Thu, Oct 23, 2014 at 01:42:38PM +0100, Damien Lespiau wrote:
>> On Thu, Oct 23, 2014 at 02:21:02PM +0200, Daniel Vetter wrote:
>> > On Wed, Oct 22, 2014 at 06:59:52PM +0100, Arun Siluvery wrote:
>> > > The number of DWords should be even when doing ring emits as
>> > > command sequences require QWord alignment.
>> > > 
>> > > v2: use LRI variant that can write multiple regs in one go (Damien).
>> > > We can simply insert one NOP at the end instead of one per register write.
>> > > 
>> > > Cc: Mika Kuoppala <mika.kuoppala@intel.com>
>> > > Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
>> > > ---
>> > >  drivers/gpu/drm/i915/intel_ringbuffer.c | 5 +++--
>> > >  1 file changed, 3 insertions(+), 2 deletions(-)
>> > > 
>> > > diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
>> > > index 497b836..a8f72e8 100644
>> > > --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
>> > > +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
>> > > @@ -680,15 +680,16 @@ static int intel_ring_workarounds_emit(struct intel_engine_cs *ring)
>> > >  	if (ret)
>> > >  		return ret;
>> > >  
>> > > -	ret = intel_ring_begin(ring, w->count * 3);
>> > > +	ret = intel_ring_begin(ring, (w->count * 2 + 2));
>> > >  	if (ret)
>> > >  		return ret;
>> > >  
>> > > +	intel_ring_emit(ring, MI_LOAD_REGISTER_IMM(w->count));
>> > 
>> > Afaik there's a limit to the size of an MI_LRI. Where's the check for
>> > that (probably with a WARN_ON for now to avoid unnecessary complexity)?
>> 
>> I guess there's always the size of the length field, I don't see any
>> other indication. Note that I can't find the documentation of the
>> multi-registers version of LRI either. So, well, we probably should
>> double check it does work.
>
> It does work. The max is around 60 iirc (the max length of the
> command).
> -Chris
>

I did some tests on BDW:

The maximum is 128 writes, which results in the 8-bit length
field of the command being 0xff, thus following the spec.
The 128th write went through.

Perhaps the max command length is smaller on older gens?

Perhaps a WARN_ON(x > 128) in MI_LOAD_REGISTER_IMM would be in order,
but the command parser would then need a minor tweak as well.

#define I915_MAX_WA_REGS 16

keeps us safe for now, at least.

-Mika

> -- 
> Chris Wilson, Intel Open Source Technology Centre
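
The bound discussed in this message could be checked before the header is built. The following is a minimal sketch, not the driver's code: it assumes the header is encoded as opcode 0x22 shifted left by 23 with the length field 2n - 1 in the low bits, and it uses 128 as the limit because that is the value measured on BDW in this thread. LRI_MAX_REGS and lri_header() are hypothetical names, and the error return stands in for the suggested WARN_ON.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical limit: 128 writes is what the BDW test in the thread reached. */
#define LRI_MAX_REGS 128u

/*
 * Build an LRI header for nregs register writes.
 * Assumed encoding: (0x22 << 23) | (2 * nregs - 1).
 * Returns 0 on success, -EINVAL if nregs does not fit the 8-bit length field.
 */
static int lri_header(unsigned int nregs, uint32_t *header)
{
	if (nregs == 0 || nregs > LRI_MAX_REGS)
		return -EINVAL;

	*header = (0x22u << 23) | (2u * nregs - 1u);
	return 0;
}
```

With I915_MAX_WA_REGS at 16 the check can never fail for the workaround list, which is why the thread treats it as optional for now. Older generations may need a smaller limit, as the message suggests.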
Daniel Vetter Oct. 23, 2014, 3:49 p.m. UTC | #8
On Thu, Oct 23, 2014 at 05:42:47PM +0300, Mika Kuoppala wrote:
> Chris Wilson <chris@chris-wilson.co.uk> writes:
> 
> > On Thu, Oct 23, 2014 at 01:42:38PM +0100, Damien Lespiau wrote:
> >> On Thu, Oct 23, 2014 at 02:21:02PM +0200, Daniel Vetter wrote:
> >> > On Wed, Oct 22, 2014 at 06:59:52PM +0100, Arun Siluvery wrote:
> >> > > The number of DWords should be even when doing ring emits as
> >> > > command sequences require QWord alignment.
> >> > > 
> >> > > v2: use LRI variant that can write multiple regs in one go (Damien).
> >> > > We can simply insert one NOP at the end instead of one per register write.
> >> > > 
> >> > > Cc: Mika Kuoppala <mika.kuoppala@intel.com>
> >> > > Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
> >> > > ---
> >> > >  drivers/gpu/drm/i915/intel_ringbuffer.c | 5 +++--
> >> > >  1 file changed, 3 insertions(+), 2 deletions(-)
> >> > > 
> >> > > diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> >> > > index 497b836..a8f72e8 100644
> >> > > --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> >> > > +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> >> > > @@ -680,15 +680,16 @@ static int intel_ring_workarounds_emit(struct intel_engine_cs *ring)
> >> > >  	if (ret)
> >> > >  		return ret;
> >> > >  
> >> > > -	ret = intel_ring_begin(ring, w->count * 3);
> >> > > +	ret = intel_ring_begin(ring, (w->count * 2 + 2));
> >> > >  	if (ret)
> >> > >  		return ret;
> >> > >  
> >> > > +	intel_ring_emit(ring, MI_LOAD_REGISTER_IMM(w->count));
> >> > 
> >> > Afaik there's a limit to the size of an MI_LRI. Where's the check for
> >> > that (probably with a WARN_ON for now to avoid unnecessary complexity)?
> >> 
> >> I guess there's always the size of the length field, I don't see any
> >> other indication. Note that I can't find the documentation of the
> >> multi-registers version of LRI either. So, well, we probably should
> >> double check it does work.
> >
> > It does work. The max is around 60 iirc (the max length of the
> > command).
> > -Chris
> >
> 
> I did some tests on BDW:
> 
> The maximum is 128 writes, which results in the 8-bit length
> field of the command being 0xff, thus following the spec.
> The 128th write went through.
> 
> Perhaps the max command length is smaller on older gens?
> 
> Perhaps a WARN_ON(x > 128) in MI_LOAD_REGISTER_IMM would be in order,
> but the command parser would then need a minor tweak as well.
> 
> #define I915_MAX_WA_REGS 16
> 
> keeps us safe for now, at least.

Ok, that's good enough I think. I've summarized the discussion a bit in
the commit message and merged the patch.
-Daniel

Patch

diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 497b836..a8f72e8 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -680,15 +680,16 @@  static int intel_ring_workarounds_emit(struct intel_engine_cs *ring)
 	if (ret)
 		return ret;
 
-	ret = intel_ring_begin(ring, w->count * 3);
+	ret = intel_ring_begin(ring, (w->count * 2 + 2));
 	if (ret)
 		return ret;
 
+	intel_ring_emit(ring, MI_LOAD_REGISTER_IMM(w->count));
 	for (i = 0; i < w->count; i++) {
-		intel_ring_emit(ring, MI_LOAD_REGISTER_IMM(1));
 		intel_ring_emit(ring, w->reg[i].addr);
 		intel_ring_emit(ring, w->reg[i].value);
 	}
+	intel_ring_emit(ring, MI_NOOP);
 
 	intel_ring_advance(ring);
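
The dword accounting behind this hunk can be checked outside the driver. The sketch below is a stand-alone illustration, not the i915 code: it writes into a plain array instead of a ring, and SKETCH_MI_LRI, SKETCH_MI_NOOP, struct wa_reg and the functions are hypothetical names. It assumes the LRI header encoding (0x22 << 23) | (2n - 1) and a NOOP encoded as 0.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed encodings, named differently from the driver's macros on purpose. */
#define SKETCH_MI_NOOP   0u
#define SKETCH_MI_LRI(n) ((0x22u << 23) | (2u * (uint32_t)(n) - 1u))

struct wa_reg {
	uint32_t addr;
	uint32_t value;
};

/* Old scheme: one LRI(1) per register, 3 dwords each; odd when count is odd. */
static size_t dwords_old(size_t count)
{
	return count * 3;
}

/* New scheme: 1 header + 2 dwords per register + 1 NOOP; always even. */
static size_t dwords_new(size_t count)
{
	return count * 2 + 2;
}

/* Emit one multi-register LRI plus a trailing NOOP; returns dwords written. */
static size_t emit_workarounds(uint32_t *out, size_t cap,
			       const struct wa_reg *regs, size_t count)
{
	size_t n = 0, i;

	if (count == 0 || cap < dwords_new(count))
		return 0;

	out[n++] = SKETCH_MI_LRI(count);
	for (i = 0; i < count; i++) {
		out[n++] = regs[i].addr;
		out[n++] = regs[i].value;
	}
	out[n++] = SKETCH_MI_NOOP;	/* pads 2 * count + 1 up to an even total */

	return n;
}
```

For three registers the old scheme needs 9 dwords while the new one needs 8, so the emitted sequence stays QWord aligned regardless of how many workarounds are in the list.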