[v3,6/6] drm/i915/gen8: Add WaRsRestoreWithPerCtxtBb workaround

Message ID 1433512679-7707-1-git-send-email-arun.siluvery@linux.intel.com (mailing list archive)
State New, archived

Commit Message

arun.siluvery@linux.intel.com June 5, 2015, 1:57 p.m. UTC
Add the WaRsRestoreWithPerCtxtBb workaround to the per-context w/a
batch buffer.

v2: This patch modifies the definitions of MI_LOAD_REGISTER_MEM and
MI_LOAD_REGISTER_REG; add GEN8-specific defines for these instructions
so as not to break any future users of the existing definitions (Michel)

Signed-off-by: Rafael Barbalho <rafael.barbalho@intel.com>
Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_reg.h  | 26 ++++++++++++++++++
 drivers/gpu/drm/i915/intel_lrc.c | 59 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 85 insertions(+)

Comments

Dave Gordon June 9, 2015, 6:43 p.m. UTC | #1
On 05/06/15 14:57, Arun Siluvery wrote:
> In Per context w/a batch buffer,
> WaRsRestoreWithPerCtxtBb
> 
> v2: This patches modifies definitions of MI_LOAD_REGISTER_MEM and
> MI_LOAD_REGISTER_REG; Add GEN8 specific defines for these instructions
> so as to not break any future users of existing definitions (Michel)
> 
> Signed-off-by: Rafael Barbalho <rafael.barbalho@intel.com>
> Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
> ---
>  drivers/gpu/drm/i915/i915_reg.h  | 26 ++++++++++++++++++
>  drivers/gpu/drm/i915/intel_lrc.c | 59 ++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 85 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
> index 33b0ff1..6928162 100644
> --- a/drivers/gpu/drm/i915/i915_reg.h
> +++ b/drivers/gpu/drm/i915/i915_reg.h
[snip]
>  #define MI_LOAD_REGISTER_MEM    MI_INSTR(0x29, 0)
>  #define MI_LOAD_REGISTER_REG    MI_INSTR(0x2A, 0)
> +#define MI_LOAD_REGISTER_MEM_GEN8 MI_INSTR(0x29, 2)
> +#define   MI_LRM_USE_GLOBAL_GTT (1<<22)
> +#define   MI_LRM_ASYNC_MODE_ENABLE (1<<21)
> +#define MI_LOAD_REGISTER_REG_GEN8 MI_INSTR(0x2A, 1)

Isn't the original definition of MI_LOAD_REGISTER_REG wrong anyway? It's
a two-operand instruction, each of which is a one-word MMIO register
address, hence always 3 words total. The length bias is 2, so the
so-called 'flags' field must be 1. The original definition (where the
second argument of the MI_INSTR macro is 0) shouldn't work.

So just correct the original definition of MI_LOAD_REGISTER_REG; this
isn't something that's actually changed on GEN8.

While we're mentioning it, I think the above MI_LOAD_REGISTER_MEM is
wrong too. The length should be 1 pre-GEN8, and 2 in GEN8+.
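
The length arithmetic above can be sketched as follows. This is only an
illustration: MI_INSTR() mirrors the macro in i915_reg.h, while mi_len(),
mi_lrr_header() and mi_lrm_header() are hypothetical helper names that are
not part of the driver.

```c
#include <assert.h>
#include <stdint.h>

/* Same shape as the i915 macro: opcode in bits 28:23, low bits carry flags/length. */
#define MI_INSTR(opcode, flags) (((uint32_t)(opcode) << 23) | (flags))

/* MI commands store (total dwords - 2) in the length field. */
#define MI_LENGTH_BIAS 2

/* Hypothetical helper: length field for a command that is total_dwords long. */
static inline uint32_t mi_len(uint32_t total_dwords)
{
	assert(total_dwords >= MI_LENGTH_BIAS);
	return total_dwords - MI_LENGTH_BIAS;
}

/* LRR: header + source register + destination register = 3 dwords. */
static inline uint32_t mi_lrr_header(void)
{
	return MI_INSTR(0x2A, mi_len(3));
}

/* LRM: header + register + address; the address is 1 dword pre-GEN8, 2 on GEN8+. */
static inline uint32_t mi_lrm_header(int gen)
{
	return MI_INSTR(0x29, mi_len(gen >= 8 ? 4 : 3));
}
```

With these, mi_lrr_header() carries a length field of 1 on every generation,
and mi_lrm_header() carries 1 before GEN8 and 2 from GEN8 onwards.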

>  #define MI_RS_STORE_DATA_IMM    MI_INSTR(0x2B, 0)
>  #define MI_LOAD_URB_MEM         MI_INSTR(0x2C, 0)
>  #define MI_STORE_URB_MEM        MI_INSTR(0x2D, 0)

And these are wrong too! In fact all of these instructions have been
added under a comment which says "Commands used only by the command
parser". Looks like they were added as placeholders without the proper
length fields, and then people have started using them as though they
were complete definitions :(

Time to update them all, perhaps ...

[snip]

> +	/*
> +	 * BSpec says MI_LOAD_REGISTER_MEM, MI_LOAD_REGISTER_REG and
> +	 * MI_BATCH_BUFFER_END instructions in this sequence need to be
> +	 * in the same cacheline.
> +	 */
> +	while (((unsigned long) (cmd + index) % CACHELINE_BYTES) != 0)
> +		cmd[index++] = MI_NOOP;
> +
> +	cmd[index++] = MI_LOAD_REGISTER_MEM_GEN8 |
> +		MI_LRM_USE_GLOBAL_GTT |
> +		MI_LRM_ASYNC_MODE_ENABLE;
> +	cmd[index++] = INSTPM;
> +	cmd[index++] = scratch_addr;
> +	cmd[index++] = 0;
> +
> +	/*
> +	 * BSpec says there should not be any commands programmed
> +	 * between MI_LOAD_REGISTER_REG and MI_BATCH_BUFFER_END so
> +	 * do not add any new commands
> +	 */
> +	cmd[index++] = MI_LOAD_REGISTER_REG_GEN8;
> +	cmd[index++] = GEN8_RS_PREEMPT_STATUS;
> +	cmd[index++] = GEN8_RS_PREEMPT_STATUS;
> +
>  	/* padding */
>          while (index < end)
>  		cmd[index++] = MI_NOOP;
> 

Where's the MI_BATCH_BUFFER_END referred to in the comment?

.Dave.
arun.siluvery@linux.intel.com June 12, 2015, 11:58 a.m. UTC | #2
On 09/06/2015 19:43, Dave Gordon wrote:
> On 05/06/15 14:57, Arun Siluvery wrote:
>> In Per context w/a batch buffer,
>> WaRsRestoreWithPerCtxtBb
>>
>> v2: This patches modifies definitions of MI_LOAD_REGISTER_MEM and
>> MI_LOAD_REGISTER_REG; Add GEN8 specific defines for these instructions
>> so as to not break any future users of existing definitions (Michel)
>>
>> Signed-off-by: Rafael Barbalho <rafael.barbalho@intel.com>
>> Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
>> ---
>>   drivers/gpu/drm/i915/i915_reg.h  | 26 ++++++++++++++++++
>>   drivers/gpu/drm/i915/intel_lrc.c | 59 ++++++++++++++++++++++++++++++++++++++++
>>   2 files changed, 85 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
>> index 33b0ff1..6928162 100644
>> --- a/drivers/gpu/drm/i915/i915_reg.h
>> +++ b/drivers/gpu/drm/i915/i915_reg.h
> [snip]
>>   #define MI_LOAD_REGISTER_MEM    MI_INSTR(0x29, 0)
>>   #define MI_LOAD_REGISTER_REG    MI_INSTR(0x2A, 0)
>> +#define MI_LOAD_REGISTER_MEM_GEN8 MI_INSTR(0x29, 2)
>> +#define   MI_LRM_USE_GLOBAL_GTT (1<<22)
>> +#define   MI_LRM_ASYNC_MODE_ENABLE (1<<21)
>> +#define MI_LOAD_REGISTER_REG_GEN8 MI_INSTR(0x2A, 1)
>
> Isn't the original definition of MI_LOAD_REGISTER_REG wrong anyway? It's
> a two-operand instruction, each of which is a one-word MMIO register
> address, hence always 3 words total. The length bias is 2, so the
> so-called 'flags' field must be 1. The original definition (where the
> second argument of the MI_INSTR macro is 0) shouldn't work.
>
> So just correct the original definition of MI_LOAD_REGISTER_REG; this
> isn't something that's actually changed on GEN8.
>
I did notice that the original definitions look odd, but I thought I might
be wrong, so I created new ones rather than disturb the originals.
OK, I will just correct the original one and reuse it.

> While we're mentioning it, I think the above MI_LOAD_REGISTER_MEM is
> wrong too. The length should be 1 pre-GEN8, and 2 in GEN8+.
>
ok.
>>   #define MI_RS_STORE_DATA_IMM    MI_INSTR(0x2B, 0)
>>   #define MI_LOAD_URB_MEM         MI_INSTR(0x2C, 0)
>>   #define MI_STORE_URB_MEM        MI_INSTR(0x2D, 0)
>
> And these are wrong too! In fact all of these instructions have been
> added under a comment which says "Commands used only by the command
> parser". Looks like they were added as placeholders without the proper
> length fields, and then people have started using them as though they
> were complete definitions :(
>
> Time update them all, perhaps ...
These are not related to this patch, so they can be taken up in a
separate patch.
>
> [snip]
>
>> +	/*
>> +	 * BSpec says MI_LOAD_REGISTER_MEM, MI_LOAD_REGISTER_REG and
>> +	 * MI_BATCH_BUFFER_END instructions in this sequence need to be
>> +	 * in the same cacheline.
>> +	 */
>> +	while (((unsigned long) (cmd + index) % CACHELINE_BYTES) != 0)
>> +		cmd[index++] = MI_NOOP;
>> +
>> +	cmd[index++] = MI_LOAD_REGISTER_MEM_GEN8 |
>> +		MI_LRM_USE_GLOBAL_GTT |
>> +		MI_LRM_ASYNC_MODE_ENABLE;
>> +	cmd[index++] = INSTPM;
>> +	cmd[index++] = scratch_addr;
>> +	cmd[index++] = 0;
>> +
>> +	/*
>> +	 * BSpec says there should not be any commands programmed
>> +	 * between MI_LOAD_REGISTER_REG and MI_BATCH_BUFFER_END so
>> +	 * do not add any new commands
>> +	 */
>> +	cmd[index++] = MI_LOAD_REGISTER_REG_GEN8;
>> +	cmd[index++] = GEN8_RS_PREEMPT_STATUS;
>> +	cmd[index++] = GEN8_RS_PREEMPT_STATUS;
>> +
>>   	/* padding */
>>           while (index < end)
>>   		cmd[index++] = MI_NOOP;
>>
>
> Where's the MI_BATCH_BUFFER_END referred to in the comment?

MI_BATCH_BUFFER_END is just below the while loop; it is in patch [v3 1/6].
Since the diff context is only a few lines, it didn't show up in the diff.

regards
Arun

>
> .Dave.
>
>
Dave Gordon June 12, 2015, 5:03 p.m. UTC | #3
On 12/06/15 12:58, Siluvery, Arun wrote:
> On 09/06/2015 19:43, Dave Gordon wrote:
>> On 05/06/15 14:57, Arun Siluvery wrote:
>>> In Per context w/a batch buffer,
>>> WaRsRestoreWithPerCtxtBb
>>>
>>> v2: This patches modifies definitions of MI_LOAD_REGISTER_MEM and
>>> MI_LOAD_REGISTER_REG; Add GEN8 specific defines for these instructions
>>> so as to not break any future users of existing definitions (Michel)
>>>
>>> Signed-off-by: Rafael Barbalho <rafael.barbalho@intel.com>
>>> Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
>>> ---
>>>   drivers/gpu/drm/i915/i915_reg.h  | 26 ++++++++++++++++++
>>>   drivers/gpu/drm/i915/intel_lrc.c | 59
>>> ++++++++++++++++++++++++++++++++++++++++
>>>   2 files changed, 85 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/i915/i915_reg.h
>>> b/drivers/gpu/drm/i915/i915_reg.h
>>> index 33b0ff1..6928162 100644
>>> --- a/drivers/gpu/drm/i915/i915_reg.h
>>> +++ b/drivers/gpu/drm/i915/i915_reg.h
>> [snip]
>>>   #define MI_LOAD_REGISTER_MEM    MI_INSTR(0x29, 0)
>>>   #define MI_LOAD_REGISTER_REG    MI_INSTR(0x2A, 0)
>>> +#define MI_LOAD_REGISTER_MEM_GEN8 MI_INSTR(0x29, 2)
>>> +#define   MI_LRM_USE_GLOBAL_GTT (1<<22)
>>> +#define   MI_LRM_ASYNC_MODE_ENABLE (1<<21)
>>> +#define MI_LOAD_REGISTER_REG_GEN8 MI_INSTR(0x2A, 1)
>>
>> Isn't the original definition of MI_LOAD_REGISTER_REG wrong anyway? It's
>> a two-operand instruction, each of which is a one-word MMIO register
>> address, hence always 3 words total. The length bias is 2, so the
>> so-called 'flags' field must be 1. The original definition (where the
>> second argument of the MI_INSTR macro is 0) shouldn't work.
>>
>> So just correct the original definition of MI_LOAD_REGISTER_REG; this
>> isn't something that's actually changed on GEN8.
>>
> I did notice that the original instructions are odd but thought I might
> be wrong hence I created new ones to not disturb the original ones.
> ok I will just correct original one and reuse it.
> 
>> While we're mentioning it, I think the above MI_LOAD_REGISTER_MEM is
>> wrong too. The length should be 1 pre-GEN8, and 2 in GEN8+.
>>
> ok.
>>>   #define MI_RS_STORE_DATA_IMM    MI_INSTR(0x2B, 0)
>>>   #define MI_LOAD_URB_MEM         MI_INSTR(0x2C, 0)
>>>   #define MI_STORE_URB_MEM        MI_INSTR(0x2D, 0)
>>
>> And these are wrong too! In fact all of these instructions have been
>> added under a comment which says "Commands used only by the command
>> parser". Looks like they were added as placeholders without the proper
>> length fields, and then people have started using them as though they
>> were complete definitions :(
>>
>> Time update them all, perhaps ...
> these are not related to this patch, so it can be taken up as a
> different patch.

As a minimum, you should move these updated #defines out of the section
under the comment "Commands used only by the command parser" and put
them in the appropriate place in the regular list of MI_ commands,
preferably in numerical order. Then the ones that are genuinely only
used by the command parser could be left for another patch ...

>> [snip]
>>
>>> +    /*
>>> +     * BSpec says MI_LOAD_REGISTER_MEM, MI_LOAD_REGISTER_REG and
>>> +     * MI_BATCH_BUFFER_END instructions in this sequence need to be
>>> +     * in the same cacheline.
>>> +     */
>>> +    while (((unsigned long) (cmd + index) % CACHELINE_BYTES) != 0)
>>> +        cmd[index++] = MI_NOOP;
>>> +
>>> +    cmd[index++] = MI_LOAD_REGISTER_MEM_GEN8 |
>>> +        MI_LRM_USE_GLOBAL_GTT |
>>> +        MI_LRM_ASYNC_MODE_ENABLE;
>>> +    cmd[index++] = INSTPM;
>>> +    cmd[index++] = scratch_addr;
>>> +    cmd[index++] = 0;
>>> +
>>> +    /*
>>> +     * BSpec says there should not be any commands programmed
>>> +     * between MI_LOAD_REGISTER_REG and MI_BATCH_BUFFER_END so
>>> +     * do not add any new commands
>>> +     */
>>> +    cmd[index++] = MI_LOAD_REGISTER_REG_GEN8;
>>> +    cmd[index++] = GEN8_RS_PREEMPT_STATUS;
>>> +    cmd[index++] = GEN8_RS_PREEMPT_STATUS;
>>> +
>>>       /* padding */
>>>           while (index < end)
>>>           cmd[index++] = MI_NOOP;
>>>
>>
>> Where's the MI_BATCH_BUFFER_END referred to in the comment?
> 
> MI_BATCH_BUFFER_END is just below while loop, it is in patch [v3 1/6].
> Since the diff context is only few lines it didn't showup in the diff.

The second comment above says "no commands between LOAD_REG_REG and
BB_END", so the point of my comment was that the BB_END is *NOT*
immediately after the LOAD_REG_REG -- there are a bunch of no-ops there!

And therefore also, these instructions do *not* all end up in the same
cacheline, thus contradicting the first comment above.

Padding *after* a BB_END would be redundant.

.Dave.
arun.siluvery@linux.intel.com June 15, 2015, 2:10 p.m. UTC | #4
On 12/06/2015 18:03, Dave Gordon wrote:
> On 12/06/15 12:58, Siluvery, Arun wrote:
>> On 09/06/2015 19:43, Dave Gordon wrote:
>>> On 05/06/15 14:57, Arun Siluvery wrote:
>>>> In Per context w/a batch buffer,
>>>> WaRsRestoreWithPerCtxtBb
>>>>
>>>> v2: This patches modifies definitions of MI_LOAD_REGISTER_MEM and
>>>> MI_LOAD_REGISTER_REG; Add GEN8 specific defines for these instructions
>>>> so as to not break any future users of existing definitions (Michel)
>>>>
>>>> Signed-off-by: Rafael Barbalho <rafael.barbalho@intel.com>
>>>> Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
>>>> ---
>>>>    drivers/gpu/drm/i915/i915_reg.h  | 26 ++++++++++++++++++
>>>>    drivers/gpu/drm/i915/intel_lrc.c | 59
>>>> ++++++++++++++++++++++++++++++++++++++++
>>>>    2 files changed, 85 insertions(+)
>>>>
>>>> diff --git a/drivers/gpu/drm/i915/i915_reg.h
>>>> b/drivers/gpu/drm/i915/i915_reg.h
>>>> index 33b0ff1..6928162 100644
>>>> --- a/drivers/gpu/drm/i915/i915_reg.h
>>>> +++ b/drivers/gpu/drm/i915/i915_reg.h
>>> [snip]
>>>>    #define MI_LOAD_REGISTER_MEM    MI_INSTR(0x29, 0)
>>>>    #define MI_LOAD_REGISTER_REG    MI_INSTR(0x2A, 0)
>>>> +#define MI_LOAD_REGISTER_MEM_GEN8 MI_INSTR(0x29, 2)
>>>> +#define   MI_LRM_USE_GLOBAL_GTT (1<<22)
>>>> +#define   MI_LRM_ASYNC_MODE_ENABLE (1<<21)
>>>> +#define MI_LOAD_REGISTER_REG_GEN8 MI_INSTR(0x2A, 1)
>>>
>>> Isn't the original definition of MI_LOAD_REGISTER_REG wrong anyway? It's
>>> a two-operand instruction, each of which is a one-word MMIO register
>>> address, hence always 3 words total. The length bias is 2, so the
>>> so-called 'flags' field must be 1. The original definition (where the
>>> second argument of the MI_INSTR macro is 0) shouldn't work.
>>>
>>> So just correct the original definition of MI_LOAD_REGISTER_REG; this
>>> isn't something that's actually changed on GEN8.
>>>
>> I did notice that the original instructions are odd but thought I might
>> be wrong hence I created new ones to not disturb the original ones.
>> ok I will just correct original one and reuse it.
>>
>>> While we're mentioning it, I think the above MI_LOAD_REGISTER_MEM is
>>> wrong too. The length should be 1 pre-GEN8, and 2 in GEN8+.
>>>
>> ok.
>>>>    #define MI_RS_STORE_DATA_IMM    MI_INSTR(0x2B, 0)
>>>>    #define MI_LOAD_URB_MEM         MI_INSTR(0x2C, 0)
>>>>    #define MI_STORE_URB_MEM        MI_INSTR(0x2D, 0)
>>>
>>> And these are wrong too! In fact all of these instructions have been
>>> added under a comment which says "Commands used only by the command
>>> parser". Looks like they were added as placeholders without the proper
>>> length fields, and then people have started using them as though they
>>> were complete definitions :(
>>>
>>> Time update them all, perhaps ...
>> these are not related to this patch, so it can be taken up as a
>> different patch.
>
> As a minimum, you should move these updated #defines out of the section
> under the comment "Commands used only by the command parser" and put
> them in the appropriate place in the regular list of MI_ commnds,
> preferably in numerical order. Then the ones that are genuinely only
> used by the command parser could be left for another patch ...
>
>>> [snip]
>>>
>>>> +    /*
>>>> +     * BSpec says MI_LOAD_REGISTER_MEM, MI_LOAD_REGISTER_REG and
>>>> +     * MI_BATCH_BUFFER_END instructions in this sequence need to be
>>>> +     * in the same cacheline.
>>>> +     */
>>>> +    while (((unsigned long) (cmd + index) % CACHELINE_BYTES) != 0)
>>>> +        cmd[index++] = MI_NOOP;
>>>> +
>>>> +    cmd[index++] = MI_LOAD_REGISTER_MEM_GEN8 |
>>>> +        MI_LRM_USE_GLOBAL_GTT |
>>>> +        MI_LRM_ASYNC_MODE_ENABLE;
>>>> +    cmd[index++] = INSTPM;
>>>> +    cmd[index++] = scratch_addr;
>>>> +    cmd[index++] = 0;
>>>> +
>>>> +    /*
>>>> +     * BSpec says there should not be any commands programmed
>>>> +     * between MI_LOAD_REGISTER_REG and MI_BATCH_BUFFER_END so
>>>> +     * do not add any new commands
>>>> +     */
>>>> +    cmd[index++] = MI_LOAD_REGISTER_REG_GEN8;
>>>> +    cmd[index++] = GEN8_RS_PREEMPT_STATUS;
>>>> +    cmd[index++] = GEN8_RS_PREEMPT_STATUS;
>>>> +
>>>>        /* padding */
>>>>            while (index < end)
>>>>            cmd[index++] = MI_NOOP;
>>>>
>>>
>>> Where's the MI_BATCH_BUFFER_END referred to in the comment?
>>
>> MI_BATCH_BUFFER_END is just below while loop, it is in patch [v3 1/6].
>> Since the diff context is only few lines it didn't showup in the diff.
>
> The second comment above says "no commands between LOAD_REG_REG and
> BB_END", so the point of my comment was that the BB_END is *NOT*
> immediately after the LOAD_REG_REG -- there are a bunch of no-ops there!
True, but they are no-ops, so they shouldn't really affect anything. I
guess the spec means no valid commands.

>
> And therefore also, these instructions do *not* all end up in the same
> cacheline, thus contradicting the first comment above.
I don't understand why. As per the requirement, the commands from the
first MI_LOAD_REGISTER_MEM_GEN8 (after the while loop) through BB_END will
be part of the same cacheline (in this case the second line).

>
> Padding *after* a BB_END would be redundant.

Yes, I just wanted to keep MI_BATCH_BUFFER_END at the end instead of
abruptly terminating the batch, which is why I am padding with no-ops. I
can change this if that is preferred.
>
> .Dave.
>
>
Daniel Vetter June 15, 2015, 3:27 p.m. UTC | #5
On Fri, Jun 12, 2015 at 06:03:55PM +0100, Dave Gordon wrote:
> On 12/06/15 12:58, Siluvery, Arun wrote:
> > On 09/06/2015 19:43, Dave Gordon wrote:
> >> On 05/06/15 14:57, Arun Siluvery wrote:
> >>> In Per context w/a batch buffer,
> >>> WaRsRestoreWithPerCtxtBb
> >>>
> >>> v2: This patches modifies definitions of MI_LOAD_REGISTER_MEM and
> >>> MI_LOAD_REGISTER_REG; Add GEN8 specific defines for these instructions
> >>> so as to not break any future users of existing definitions (Michel)
> >>>
> >>> Signed-off-by: Rafael Barbalho <rafael.barbalho@intel.com>
> >>> Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
> >>> ---
> >>>   drivers/gpu/drm/i915/i915_reg.h  | 26 ++++++++++++++++++
> >>>   drivers/gpu/drm/i915/intel_lrc.c | 59
> >>> ++++++++++++++++++++++++++++++++++++++++
> >>>   2 files changed, 85 insertions(+)
> >>>
> >>> diff --git a/drivers/gpu/drm/i915/i915_reg.h
> >>> b/drivers/gpu/drm/i915/i915_reg.h
> >>> index 33b0ff1..6928162 100644
> >>> --- a/drivers/gpu/drm/i915/i915_reg.h
> >>> +++ b/drivers/gpu/drm/i915/i915_reg.h
> >> [snip]
> >>>   #define MI_LOAD_REGISTER_MEM    MI_INSTR(0x29, 0)
> >>>   #define MI_LOAD_REGISTER_REG    MI_INSTR(0x2A, 0)
> >>> +#define MI_LOAD_REGISTER_MEM_GEN8 MI_INSTR(0x29, 2)
> >>> +#define   MI_LRM_USE_GLOBAL_GTT (1<<22)
> >>> +#define   MI_LRM_ASYNC_MODE_ENABLE (1<<21)
> >>> +#define MI_LOAD_REGISTER_REG_GEN8 MI_INSTR(0x2A, 1)
> >>
> >> Isn't the original definition of MI_LOAD_REGISTER_REG wrong anyway? It's
> >> a two-operand instruction, each of which is a one-word MMIO register
> >> address, hence always 3 words total. The length bias is 2, so the
> >> so-called 'flags' field must be 1. The original definition (where the
> >> second argument of the MI_INSTR macro is 0) shouldn't work.
> >>
> >> So just correct the original definition of MI_LOAD_REGISTER_REG; this
> >> isn't something that's actually changed on GEN8.
> >>
> > I did notice that the original instructions are odd but thought I might
> > be wrong hence I created new ones to not disturb the original ones.
> > ok I will just correct original one and reuse it.
> > 
> >> While we're mentioning it, I think the above MI_LOAD_REGISTER_MEM is
> >> wrong too. The length should be 1 pre-GEN8, and 2 in GEN8+.
> >>
> > ok.
> >>>   #define MI_RS_STORE_DATA_IMM    MI_INSTR(0x2B, 0)
> >>>   #define MI_LOAD_URB_MEM         MI_INSTR(0x2C, 0)
> >>>   #define MI_STORE_URB_MEM        MI_INSTR(0x2D, 0)
> >>
> >> And these are wrong too! In fact all of these instructions have been
> >> added under a comment which says "Commands used only by the command
> >> parser". Looks like they were added as placeholders without the proper
> >> length fields, and then people have started using them as though they
> >> were complete definitions :(
> >>
> >> Time update them all, perhaps ...
> > these are not related to this patch, so it can be taken up as a
> > different patch.
> 
> As a minimum, you should move these updated #defines out of the section
> under the comment "Commands used only by the command parser" and put
> them in the appropriate place in the regular list of MI_ commnds,
> preferably in numerical order. Then the ones that are genuinely only
> used by the command parser could be left for another patch ...

Please just correct the #defines while at it, this really is way too
tempting a trap to keep it hot. Can be done in a separate patch ofc, but
imo not fixing an obvious issue when we spot it because it's not perfectly
directly related to the feature work at hand is bad practice leading to
piles of technical debt.

And that's the kind of stuff that robs me of my sleep at night ;-)

Thanks, Daniel
Dave Gordon June 15, 2015, 5:29 p.m. UTC | #6
On 15/06/15 15:10, Siluvery, Arun wrote:
> On 12/06/2015 18:03, Dave Gordon wrote:
>> On 12/06/15 12:58, Siluvery, Arun wrote:
>>> On 09/06/2015 19:43, Dave Gordon wrote:
>>>> On 05/06/15 14:57, Arun Siluvery wrote:
>>>>> In Per context w/a batch buffer,
>>>>> WaRsRestoreWithPerCtxtBb
>>>>>
>>>>> v2: This patches modifies definitions of MI_LOAD_REGISTER_MEM and
>>>>> MI_LOAD_REGISTER_REG; Add GEN8 specific defines for these instructions
>>>>> so as to not break any future users of existing definitions (Michel)
>>>>>
>>>>> Signed-off-by: Rafael Barbalho <rafael.barbalho@intel.com>
>>>>> Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
>>>>> ---
>>>>>    drivers/gpu/drm/i915/i915_reg.h  | 26 ++++++++++++++++++
>>>>>    drivers/gpu/drm/i915/intel_lrc.c | 59
>>>>> ++++++++++++++++++++++++++++++++++++++++
>>>>>    2 files changed, 85 insertions(+)

[snip]

>>>>> +    /*
>>>>> +     * BSpec says MI_LOAD_REGISTER_MEM, MI_LOAD_REGISTER_REG and
>>>>> +     * MI_BATCH_BUFFER_END instructions in this sequence need to be
>>>>> +     * in the same cacheline.
>>>>> +     */
>>>>> +    while (((unsigned long) (cmd + index) % CACHELINE_BYTES) != 0)
>>>>> +        cmd[index++] = MI_NOOP;
>>>>> +
>>>>> +    cmd[index++] = MI_LOAD_REGISTER_MEM_GEN8 |
>>>>> +        MI_LRM_USE_GLOBAL_GTT |
>>>>> +        MI_LRM_ASYNC_MODE_ENABLE;
>>>>> +    cmd[index++] = INSTPM;
>>>>> +    cmd[index++] = scratch_addr;
>>>>> +    cmd[index++] = 0;
>>>>> +
>>>>> +    /*
>>>>> +     * BSpec says there should not be any commands programmed
>>>>> +     * between MI_LOAD_REGISTER_REG and MI_BATCH_BUFFER_END so
>>>>> +     * do not add any new commands
>>>>> +     */
>>>>> +    cmd[index++] = MI_LOAD_REGISTER_REG_GEN8;
>>>>> +    cmd[index++] = GEN8_RS_PREEMPT_STATUS;
>>>>> +    cmd[index++] = GEN8_RS_PREEMPT_STATUS;
>>>>> +
>>>>>        /* padding */
>>>>>            while (index < end)
>>>>>            cmd[index++] = MI_NOOP;
>>>>>
>>>>
>>>> Where's the MI_BATCH_BUFFER_END referred to in the comment?
>>>
>>> MI_BATCH_BUFFER_END is just below while loop, it is in patch [v3 1/6].
>>> Since the diff context is only few lines it didn't showup in the diff.
>>
>> The second comment above says "no commands between LOAD_REG_REG and
>> BB_END", so the point of my comment was that the BB_END is *NOT*
>> immediately after the LOAD_REG_REG -- there are a bunch of no-ops there!
>
> true, but they are no-ops so they shouldn't really affect anything. I
> guess the spec means no valid commands.

I guess the spec means "NO COMMANDS". NOOP is a perfectly valid command,
and I've even seen cases where a workaround specifically requires "a
NOOP with the set-no-op-id-register bit set" to fix some particular bug.
The only special thing about NOOP is that it doesn't get captured in IPEHR.

I think the w/a requires this:

0%CLSIZE: ... LRM (reg, addr, 0) LRR (reg, reg) BB_END ... (63%CLSIZE)

no gaps, no insertions, all together, all on one cacheline. Those
instructions take up 8 DWords (32 bytes) so the sequence doesn't
necessarily have to start on a cacheline boundary, as long as it's
entirely within the same line. But it's simpler to start on a new line.
You seem to have:

0%CLSIZE: LRM (reg, mem, 0) LRR (reg, reg) NOP NOP NOP BB_END

so the condition in the comment is not fulfilled. If this works, maybe
the comment is wrong.
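
A minimal sketch of the first layout above (LRM, LRR and BB_END back to
back, all in one cacheline). It assumes the buffer itself is
cacheline-aligned and large enough; emit_restore_sequence(),
same_cacheline(), EXAMPLE_REG_A and EXAMPLE_REG_B are hypothetical names
and placeholder offsets, and the GGTT/async flags from the patch are left
out for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHELINE_BYTES  64
#define CACHELINE_DWORDS (CACHELINE_BYTES / sizeof(uint32_t))

#define MI_INSTR(opcode, flags) (((uint32_t)(opcode) << 23) | (flags))
#define MI_NOOP              MI_INSTR(0x00, 0)
#define MI_BATCH_BUFFER_END  MI_INSTR(0x0A, 0)
#define MI_LOAD_REGISTER_MEM MI_INSTR(0x29, 2) /* GEN8+: 4 dwords */
#define MI_LOAD_REGISTER_REG MI_INSTR(0x2A, 1) /* 3 dwords */

/* Placeholder register offsets, NOT real hardware values. */
#define EXAMPLE_REG_A 0x1000u
#define EXAMPLE_REG_B 0x2000u

/*
 * Pad with NOOPs up to the next cacheline boundary, then emit
 * LRM + LRR + BB_END (8 dwords) with nothing in between.
 * index is a dword offset into a cacheline-aligned buffer.
 * Returns the index just past BB_END.
 */
static size_t emit_restore_sequence(uint32_t *cmd, size_t index,
				    uint32_t scratch_addr)
{
	while (index % CACHELINE_DWORDS != 0)
		cmd[index++] = MI_NOOP;

	cmd[index++] = MI_LOAD_REGISTER_MEM;
	cmd[index++] = EXAMPLE_REG_A;
	cmd[index++] = scratch_addr;	/* address, low dword */
	cmd[index++] = 0;		/* address, high dword */

	cmd[index++] = MI_LOAD_REGISTER_REG;
	cmd[index++] = EXAMPLE_REG_B;
	cmd[index++] = EXAMPLE_REG_B;

	cmd[index++] = MI_BATCH_BUFFER_END;
	return index;
}

/* True if the dwords in [first, last) all lie in a single cacheline. */
static int same_cacheline(size_t first, size_t last)
{
	assert(last > first);
	return first / CACHELINE_DWORDS == (last - 1) / CACHELINE_DWORDS;
}
```

Any padding goes in front of the sequence rather than between LRR and
BB_END, so both conditions quoted from the BSpec comments hold by
construction.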

>> And therefore also, these instructions do *not* all end up in the same
>> cacheline, thus contradicting the first comment above.
>
> I don't understand why. As per the requirement the commands from the
> first MI_LOAD_REGISTER_MEM_GEN8 (after while) through BB_END will be
> part of same cacheline (in this case second line).

OK, they're all in the same line; I didn't look back at the full context
enough and thought 'end' would point to the end of the buffer, rather
than the end of the cacheline .. because it /does/ point to the end of
the buffer, it just happens to be the end of the very same cacheline as
well.

So I really don't like the way the sizes of the two workaround batches
have been defined in terms of cache lines. Also I'm not keen on one bit
of code allocating the object and defining the sizes of the sub-areas
within it, and then separate functions filling in each of the sequences
within these areas, "knowing" that the areas are /just the right size/.
It would be simpler to maintain if the "size in cachelines" values in
lrc_setup_ctx_wa_obj() didn't have to be hand-edited to stay in sync
with the number of instructions written by gen8_init_perctx_bb() and
gen8_init_indirectctx_bb().

How about having each of these return the number of bytes they've
appended to the (u32 *)buffer that they've been given, and let the
caller manage mapping/unmapping, alignment, padding, etc, and fill in
the size fields accordingly *after* the content has been defined?
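
Roughly what I have in mind, as a sketch only: wa_emit_fn, example_wa_emit()
and wa_bb_fill() are made-up names, the emitter body is a stand-in, and I
count in dwords rather than bytes here; mapping and unmapping of the object
are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHELINE_BYTES  64
#define CACHELINE_DWORDS (CACHELINE_BYTES / sizeof(uint32_t))

#define MI_NOOP             0x00000000u
#define MI_BATCH_BUFFER_END (0x0Au << 23)

/* Hypothetical emitter type: writes commands at batch, returns dwords written. */
typedef size_t (*wa_emit_fn)(uint32_t *batch);

/* Stand-in for a real workaround body. */
static size_t example_wa_emit(uint32_t *batch)
{
	size_t n = 0;

	batch[n++] = MI_NOOP;	/* placeholder for real w/a commands */
	batch[n++] = MI_BATCH_BUFFER_END;
	return n;
}

/*
 * Caller-side bookkeeping: run the emitter at a cacheline-aligned offset,
 * pad with NOOPs so the next sub-area also starts on a cacheline boundary,
 * and report the size in cachelines only after the content is known.
 * Returns the next free dword offset.
 */
static size_t wa_bb_fill(uint32_t *buf, size_t offset, wa_emit_fn emit,
			 size_t *size_cl)
{
	size_t end;

	assert(offset % CACHELINE_DWORDS == 0);
	end = offset + emit(buf + offset);

	while (end % CACHELINE_DWORDS != 0)
		buf[end++] = MI_NOOP;

	*size_cl = (end - offset) / CACHELINE_DWORDS;
	return end;
}
```

That way the size fields are derived from what was actually written, and
adding another workaround to an emitter needs no edits anywhere else.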

.Dave.

>> Padding *after* a BB_END would be redundant.
> 
> yes, I just wanted to keep MI_BATCH_BUFFER_END at the end instead of
> abruptly terminating the batch which is why I am padding with no-ops, I
> can change this if that is preferred.
>>
>> .Dave.
arun.siluvery@linux.intel.com June 15, 2015, 6:09 p.m. UTC | #7
On 15/06/2015 18:29, Dave Gordon wrote:
> On 15/06/15 15:10, Siluvery, Arun wrote:
>> On 12/06/2015 18:03, Dave Gordon wrote:
>>> On 12/06/15 12:58, Siluvery, Arun wrote:
>>>> On 09/06/2015 19:43, Dave Gordon wrote:
>>>>> On 05/06/15 14:57, Arun Siluvery wrote:
>>>>>> In Per context w/a batch buffer,
>>>>>> WaRsRestoreWithPerCtxtBb
>>>>>>
>>>>>> v2: This patches modifies definitions of MI_LOAD_REGISTER_MEM and
>>>>>> MI_LOAD_REGISTER_REG; Add GEN8 specific defines for these instructions
>>>>>> so as to not break any future users of existing definitions (Michel)
>>>>>>
>>>>>> Signed-off-by: Rafael Barbalho <rafael.barbalho@intel.com>
>>>>>> Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
>>>>>> ---
>>>>>>     drivers/gpu/drm/i915/i915_reg.h  | 26 ++++++++++++++++++
>>>>>>     drivers/gpu/drm/i915/intel_lrc.c | 59
>>>>>> ++++++++++++++++++++++++++++++++++++++++
>>>>>>     2 files changed, 85 insertions(+)
>
> [snip]
>
>>>>>> +    /*
>>>>>> +     * BSpec says MI_LOAD_REGISTER_MEM, MI_LOAD_REGISTER_REG and
>>>>>> +     * MI_BATCH_BUFFER_END instructions in this sequence need to be
>>>>>> +     * in the same cacheline.
>>>>>> +     */
>>>>>> +    while (((unsigned long) (cmd + index) % CACHELINE_BYTES) != 0)
>>>>>> +        cmd[index++] = MI_NOOP;
>>>>>> +
>>>>>> +    cmd[index++] = MI_LOAD_REGISTER_MEM_GEN8 |
>>>>>> +        MI_LRM_USE_GLOBAL_GTT |
>>>>>> +        MI_LRM_ASYNC_MODE_ENABLE;
>>>>>> +    cmd[index++] = INSTPM;
>>>>>> +    cmd[index++] = scratch_addr;
>>>>>> +    cmd[index++] = 0;
>>>>>> +
>>>>>> +    /*
>>>>>> +     * BSpec says there should not be any commands programmed
>>>>>> +     * between MI_LOAD_REGISTER_REG and MI_BATCH_BUFFER_END so
>>>>>> +     * do not add any new commands
>>>>>> +     */
>>>>>> +    cmd[index++] = MI_LOAD_REGISTER_REG_GEN8;
>>>>>> +    cmd[index++] = GEN8_RS_PREEMPT_STATUS;
>>>>>> +    cmd[index++] = GEN8_RS_PREEMPT_STATUS;
>>>>>> +
>>>>>>         /* padding */
>>>>>>             while (index < end)
>>>>>>             cmd[index++] = MI_NOOP;
>>>>>>
>>>>>
>>>>> Where's the MI_BATCH_BUFFER_END referred to in the comment?
>>>>
>>>> MI_BATCH_BUFFER_END is just below while loop, it is in patch [v3 1/6].
>>>> Since the diff context is only few lines it didn't showup in the diff.
>>>
>>> The second comment above says "no commands between LOAD_REG_REG and
>>> BB_END", so the point of my comment was that the BB_END is *NOT*
>>> immediately after the LOAD_REG_REG -- there are a bunch of no-ops there!
>>
>> true, but they are no-ops so they shouldn't really affect anything. I
>> guess the spec means no valid commands.
>
> I guess the spec means "NO COMMANDS". NOOP is a perfectly valid command,
> and I've even seen cases where a workaround specifically requires "a
> NOOP with the set-no-op-id-register bit set" to fix some particular bug.
> The only special thing about NOOP is that it doesn't get captured in IPEHR.
>
> I think the w/a requires this:
>
> 0%CLSIZE: ... LRM (reg, addr, 0) LRR (reg, reg) BB_END ... (63%CLSIZE)
>
> no gaps, no insertions, all together, all on one cacheline. Those
> instructions take up 8 DWords (32 bytes) so the sequence doesn't
> necessarily have to start on a cacheline boundary, as long as it's
> entirely within the same line. But it's simpler to start on a new line.
> You seem to have:
>
> 0%CLSIZE: LRM (reg, mem, 0) LRR (reg, reg) NOP NOP NOP BB_END
>
> so the condition in the comment is not fulfilled. If this works, maybe
> the comment is wrong.
>
>>> And therefore also, these instructions do *not* all end up in the same
>>> cacheline, thus contradicting the first comment above.
>>
>> I don't understand why. As per the requirement the commands from the
>> first MI_LOAD_REGISTER_MEM_GEN8 (after while) through BB_END will be
>> part of same cacheline (in this case second line).
>
> OK, they're all in the same line; I didn't look back at the full context
> enough and thought 'end' would point to the end of the buffer, rather
> than the end of the cacheline .. because it /does/ point to the end of
> the buffer, it just happens to be the end of the very same cacheline as
> well.
>
> So I really don't like the way the sizes of the two workaround batches
> have been defined in terms of cache lines. Also I'm not keen on one bit
> of code allocating the object and defining the sizes of the sub-areas
> within it, and then separate functions filling in each of the sequences
> within these areas, "knowing" that the areas are /just the right size/.
> It would be simpler to maintain if the "size in cachelines" values in
> lrc_setup_ctx_wa_obj() didn't have to be hand-edited to stay in sync
> with the number of instructions written by gen8_init_perctx_bb() and
> gen8_init_indirectctx_bb().
>
> How about having each of these return the number of bytes they've
> appended to the (u32 *)buffer that they've been given, and let the
> caller manage mapping/unmapping, alignment, padding, etc, and fill in
> the size fields accordingly *after* the content has been defined?

Agreed, this is an issue: having to edit the sizes by hand whenever more
WAs are added is not good. It can be changed as you suggested.
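
For illustration only, a minimal standalone sketch of that suggestion,
not the actual i915 code. The names wa_emit_fn, struct wa_area,
wa_fill_area() and the two emit_*() helpers are hypothetical, and the
command dwords are placeholders. Each emitter returns the number of
dwords it wrote; the caller pads to the next cacheline and records the
offset and size afterwards.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHELINE_BYTES  64
#define CACHELINE_DWORDS (CACHELINE_BYTES / sizeof(uint32_t))
#define NOOP_DWORD       0u            /* stand-in for MI_NOOP */
#define BB_END_DWORD     (0x0Au << 23) /* stand-in for MI_BATCH_BUFFER_END */

/* Hypothetical emitter: writes commands at buf, returns dwords written. */
typedef size_t (*wa_emit_fn)(uint32_t *buf);

/* Hypothetical bookkeeping for one workaround batch, in dwords. */
struct wa_area {
	size_t offset;	/* start of the batch within the page */
	size_t size;	/* length, rounded up to a whole cacheline */
};

/* Placeholder contents; a real emitter would write the WA commands. */
static size_t emit_indirectctx(uint32_t *buf)
{
	size_t n = 0;

	buf[n++] = 0x11111111u;
	buf[n++] = 0x22222222u;
	buf[n++] = 0x33333333u;
	return n;
}

static size_t emit_perctx(uint32_t *buf)
{
	size_t n = 0;

	buf[n++] = 0x44444444u;
	buf[n++] = BB_END_DWORD;
	return n;
}

/*
 * Caller-side helper: run one emitter at 'offset', pad with no-ops up to
 * the next cacheline boundary, and fill in the area *after* the content
 * is known. Returns the offset of the next free dword.
 *
 * Assumption: the emitter's output fits in the page; a real version
 * would pass the remaining space down so the emitter can bail out.
 */
static size_t wa_fill_area(uint32_t *page, size_t page_dwords, size_t offset,
			   wa_emit_fn emit, struct wa_area *area)
{
	size_t end;

	assert(offset < page_dwords);
	end = offset + emit(page + offset);

	while (end % CACHELINE_DWORDS != 0)
		page[end++] = NOOP_DWORD;

	assert(end <= page_dwords);
	area->offset = offset;
	area->size = end - offset;
	return end;
}
```

With this shape, adding a workaround to either emitter changes only that
emitter; the sizes follow from the return values. In this sketch the
caller still pads after the BB_END of the per-ctx batch, which is
harmless but redundant, so a real caller could skip it for that batch.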

regards
Arun

>
> .Dave.
>
>>> Padding *after* a BB_END would be redundant.
>>
>> yes, I just wanted to keep MI_BATCH_BUFFER_END at the end instead of
>> abruptly terminating the batch which is why I am padding with no-ops, I
>> can change this if that is preferred.
>>>
>>> .Dave.
>
>
>

Patch

diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index 33b0ff1..6928162 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -347,6 +347,26 @@ 
 #define   MI_INVALIDATE_BSD		(1<<7)
 #define   MI_FLUSH_DW_USE_GTT		(1<<2)
 #define   MI_FLUSH_DW_USE_PPGTT		(0<<2)
+#define MI_ATOMIC(len)	MI_INSTR(0x2F, (len-2))
+#define   MI_ATOMIC_MEMORY_TYPE_GGTT	(1<<22)
+#define   MI_ATOMIC_INLINE_DATA		(1<<18)
+#define   MI_ATOMIC_CS_STALL		(1<<17)
+#define   MI_ATOMIC_RETURN_DATA_CTL	(1<<16)
+#define MI_ATOMIC_OP_MASK(op)  ((op) << 8)
+#define MI_ATOMIC_AND	MI_ATOMIC_OP_MASK(0x01)
+#define MI_ATOMIC_OR	MI_ATOMIC_OP_MASK(0x02)
+#define MI_ATOMIC_XOR	MI_ATOMIC_OP_MASK(0x03)
+#define MI_ATOMIC_MOVE	MI_ATOMIC_OP_MASK(0x04)
+#define MI_ATOMIC_INC	MI_ATOMIC_OP_MASK(0x05)
+#define MI_ATOMIC_DEC	MI_ATOMIC_OP_MASK(0x06)
+#define MI_ATOMIC_ADD	MI_ATOMIC_OP_MASK(0x07)
+#define MI_ATOMIC_SUB	MI_ATOMIC_OP_MASK(0x08)
+#define MI_ATOMIC_RSUB	MI_ATOMIC_OP_MASK(0x09)
+#define MI_ATOMIC_IMAX	MI_ATOMIC_OP_MASK(0x0A)
+#define MI_ATOMIC_IMIN	MI_ATOMIC_OP_MASK(0x0B)
+#define MI_ATOMIC_UMAX	MI_ATOMIC_OP_MASK(0x0C)
+#define MI_ATOMIC_UMIN	MI_ATOMIC_OP_MASK(0x0D)
+
 #define MI_BATCH_BUFFER		MI_INSTR(0x30, 1)
 #define   MI_BATCH_NON_SECURE		(1)
 /* for snb/ivb/vlv this also means "batch in ppgtt" when ppgtt is enabled. */
@@ -453,6 +473,10 @@ 
 #define   MI_REPORT_PERF_COUNT_GGTT (1<<0)
 #define MI_LOAD_REGISTER_MEM    MI_INSTR(0x29, 0)
 #define MI_LOAD_REGISTER_REG    MI_INSTR(0x2A, 0)
+#define MI_LOAD_REGISTER_MEM_GEN8 MI_INSTR(0x29, 2)
+#define   MI_LRM_USE_GLOBAL_GTT (1<<22)
+#define   MI_LRM_ASYNC_MODE_ENABLE (1<<21)
+#define MI_LOAD_REGISTER_REG_GEN8 MI_INSTR(0x2A, 1)
 #define MI_RS_STORE_DATA_IMM    MI_INSTR(0x2B, 0)
 #define MI_LOAD_URB_MEM         MI_INSTR(0x2C, 0)
 #define MI_STORE_URB_MEM        MI_INSTR(0x2D, 0)
@@ -1799,6 +1823,8 @@  enum skl_disp_power_wells {
 #define   GEN8_RC_SEMA_IDLE_MSG_DISABLE	(1 << 12)
 #define   GEN8_FF_DOP_CLOCK_GATE_DISABLE	(1<<10)
 
+#define GEN8_RS_PREEMPT_STATUS		0x215C
+
 /* Fuse readout registers for GT */
 #define CHV_FUSE_GT			(VLV_DISPLAY_BASE + 0x2168)
 #define   CHV_FGT_DISABLE_SS0		(1 << 10)
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 5f6279b..98335c6 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1154,6 +1154,13 @@  static int gen8_init_perctx_bb(struct intel_engine_cs *ring)
 	int end;
 	struct page *page;
 	uint32_t *cmd;
+	u32 scratch_addr;
+	unsigned long flags = 0;
+
+	if (ring->scratch.obj == NULL) {
+		DRM_ERROR("scratch page not allocated for %s\n", ring->name);
+		return -EINVAL;
+	}
 
 	page = i915_gem_object_get_page(ring->wa_ctx.obj, 0);
 	cmd = kmap_atomic(page);
@@ -1168,9 +1175,61 @@  static int gen8_init_perctx_bb(struct intel_engine_cs *ring)
 		return -EINVAL;
 	}
 
+	/* Actual scratch location is at 128 bytes offset */
+	scratch_addr = ring->scratch.gtt_offset + 2*CACHELINE_BYTES;
+	scratch_addr |= PIPE_CONTROL_GLOBAL_GTT;
+
 	/* WaDisableCtxRestoreArbitration:bdw,chv */
 	cmd[index++] = MI_ARB_ON_OFF | MI_ARB_ENABLE;
 
+	/*
+	 * As per Bspec, to workaround a known HW issue, SW must perform the
+	 * below programming sequence prior to programming MI_BATCH_BUFFER_END.
+	 *
+	 * This is only applicable for Gen8.
+	 */
+
+	/* WaRsRestoreWithPerCtxtBb:bdw,chv */
+	cmd[index++] = MI_LOAD_REGISTER_IMM(1);
+	cmd[index++] = INSTPM;
+	cmd[index++] = _MASKED_BIT_DISABLE(INSTPM_FORCE_ORDERING);
+
+	flags = MI_ATOMIC_MEMORY_TYPE_GGTT |
+		MI_ATOMIC_INLINE_DATA |
+		MI_ATOMIC_CS_STALL |
+		MI_ATOMIC_RETURN_DATA_CTL |
+		MI_ATOMIC_MOVE;
+
+	cmd[index++] = MI_ATOMIC(5) | flags;
+	cmd[index++] = scratch_addr;
+	cmd[index++] = 0;
+	cmd[index++] = _MASKED_BIT_ENABLE(INSTPM_FORCE_ORDERING);
+	cmd[index++] = _MASKED_BIT_ENABLE(INSTPM_FORCE_ORDERING);
+
+	/*
+	 * BSpec says MI_LOAD_REGISTER_MEM, MI_LOAD_REGISTER_REG and
+	 * MI_BATCH_BUFFER_END instructions in this sequence need to be
+	 * in the same cacheline.
+	 */
+	while (((unsigned long) (cmd + index) % CACHELINE_BYTES) != 0)
+		cmd[index++] = MI_NOOP;
+
+	cmd[index++] = MI_LOAD_REGISTER_MEM_GEN8 |
+		MI_LRM_USE_GLOBAL_GTT |
+		MI_LRM_ASYNC_MODE_ENABLE;
+	cmd[index++] = INSTPM;
+	cmd[index++] = scratch_addr;
+	cmd[index++] = 0;
+
+	/*
+	 * BSpec says there should not be any commands programmed
+	 * between MI_LOAD_REGISTER_REG and MI_BATCH_BUFFER_END so
+	 * do not add any new commands
+	 */
+	cmd[index++] = MI_LOAD_REGISTER_REG_GEN8;
+	cmd[index++] = GEN8_RS_PREEMPT_STATUS;
+	cmd[index++] = GEN8_RS_PREEMPT_STATUS;
+
 	/* padding */
         while (index < end)
 		cmd[index++] = MI_NOOP;
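
The cacheline constraint discussed in the review can be checked with a
small standalone sketch. This is illustrative only and not part of the
patch; fits_in_one_cacheline() and pad_dwords_for_sequence() are
hypothetical helpers. It assumes the dword counts visible in the patch
(4 for MI_LOAD_REGISTER_MEM_GEN8, 3 for MI_LOAD_REGISTER_REG_GEN8, 1 for
MI_BATCH_BUFFER_END), a cacheline-aligned buffer, and dword-aligned
offsets.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CACHELINE_BYTES 64

/* Dword counts as emitted by the patch above. */
enum { LRM_DWORDS = 4, LRR_DWORDS = 3, BB_END_DWORDS = 1 };

#define SEQ_BYTES \
	((LRM_DWORDS + LRR_DWORDS + BB_END_DWORDS) * sizeof(uint32_t))

/* True if [start, start + len) lies entirely within one cacheline. */
static bool fits_in_one_cacheline(size_t start, size_t len)
{
	if (len == 0 || len > CACHELINE_BYTES)
		return false;
	return start / CACHELINE_BYTES ==
	       (start + len - 1) / CACHELINE_BYTES;
}

/*
 * Number of no-op dwords to emit before the sequence so that it ends up
 * in a single cacheline: none if it already fits, otherwise enough to
 * reach the next cacheline boundary.
 */
static size_t pad_dwords_for_sequence(size_t start, size_t len)
{
	assert(len <= CACHELINE_BYTES);
	assert(start % sizeof(uint32_t) == 0);

	if (fits_in_one_cacheline(start, len))
		return 0;
	return (CACHELINE_BYTES - start % CACHELINE_BYTES) /
	       sizeof(uint32_t);
}
```

The 32-byte sequence fits whether it starts at byte 0 or byte 32 of a
line, which is the point made in the review: starting on a cacheline
boundary is sufficient but not strictly necessary. The patch's
unconditional pad-to-boundary loop is the simpler of the two options.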