Message ID | 1433512679-7707-1-git-send-email-arun.siluvery@linux.intel.com (mailing list archive) |
---|---|
State | New, archived |
On 05/06/15 14:57, Arun Siluvery wrote:
> In Per context w/a batch buffer,
> WaRsRestoreWithPerCtxtBb
>
> v2: This patch modifies definitions of MI_LOAD_REGISTER_MEM and
> MI_LOAD_REGISTER_REG; Add GEN8 specific defines for these instructions
> so as to not break any future users of existing definitions (Michel)
>
> Signed-off-by: Rafael Barbalho <rafael.barbalho@intel.com>
> Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
> ---
>  drivers/gpu/drm/i915/i915_reg.h  | 26 ++++++++++++++++++
>  drivers/gpu/drm/i915/intel_lrc.c | 59 ++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 85 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
> index 33b0ff1..6928162 100644
> --- a/drivers/gpu/drm/i915/i915_reg.h
> +++ b/drivers/gpu/drm/i915/i915_reg.h

[snip]

>  #define MI_LOAD_REGISTER_MEM MI_INSTR(0x29, 0)
>  #define MI_LOAD_REGISTER_REG MI_INSTR(0x2A, 0)
> +#define MI_LOAD_REGISTER_MEM_GEN8 MI_INSTR(0x29, 2)
> +#define MI_LRM_USE_GLOBAL_GTT (1<<22)
> +#define MI_LRM_ASYNC_MODE_ENABLE (1<<21)
> +#define MI_LOAD_REGISTER_REG_GEN8 MI_INSTR(0x2A, 1)

Isn't the original definition of MI_LOAD_REGISTER_REG wrong anyway? It's
a two-operand instruction, each of which is a one-word MMIO register
address, hence always 3 words total. The length bias is 2, so the
so-called 'flags' field must be 1. The original definition (where the
second argument of the MI_INSTR macro is 0) shouldn't work.

So just correct the original definition of MI_LOAD_REGISTER_REG; this
isn't something that's actually changed on GEN8.

While we're mentioning it, I think the above MI_LOAD_REGISTER_MEM is
wrong too. The length should be 1 pre-GEN8, and 2 in GEN8+.

>  #define MI_RS_STORE_DATA_IMM MI_INSTR(0x2B, 0)
>  #define MI_LOAD_URB_MEM MI_INSTR(0x2C, 0)
>  #define MI_STORE_URB_MEM MI_INSTR(0x2D, 0)

And these are wrong too! In fact all of these instructions have been
added under a comment which says "Commands used only by the command
parser". Looks like they were added as placeholders without the proper
length fields, and then people have started using them as though they
were complete definitions :(

Time to update them all, perhaps ...

[snip]

> +	/*
> +	 * BSpec says MI_LOAD_REGISTER_MEM, MI_LOAD_REGISTER_REG and
> +	 * MI_BATCH_BUFFER_END instructions in this sequence need to be
> +	 * in the same cacheline.
> +	 */
> +	while (((unsigned long) (cmd + index) % CACHELINE_BYTES) != 0)
> +		cmd[index++] = MI_NOOP;
> +
> +	cmd[index++] = MI_LOAD_REGISTER_MEM_GEN8 |
> +		MI_LRM_USE_GLOBAL_GTT |
> +		MI_LRM_ASYNC_MODE_ENABLE;
> +	cmd[index++] = INSTPM;
> +	cmd[index++] = scratch_addr;
> +	cmd[index++] = 0;
> +
> +	/*
> +	 * BSpec says there should not be any commands programmed
> +	 * between MI_LOAD_REGISTER_REG and MI_BATCH_BUFFER_END so
> +	 * do not add any new commands
> +	 */
> +	cmd[index++] = MI_LOAD_REGISTER_REG_GEN8;
> +	cmd[index++] = GEN8_RS_PREEMPT_STATUS;
> +	cmd[index++] = GEN8_RS_PREEMPT_STATUS;
> +
>  	/* padding */
>  	while (index < end)
>  		cmd[index++] = MI_NOOP;
>

Where's the MI_BATCH_BUFFER_END referred to in the comment?

.Dave.
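The length arithmetic in the review above can be made concrete with a small sketch. It assumes `MI_INSTR()` has the shape used in the driver header (opcode shifted left by 23, OR'ed with the low "flags" bits) and that the length field is the total dword count minus a bias of 2, as stated in the review. The helper names `mi_length_field()`, `mi_lrr_header()` and `mi_lrm_header()` are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed shape of the driver macro: opcode in the upper bits, flags/length below. */
#define MI_INSTR(opcode, flags) (((uint32_t)(opcode) << 23) | (flags))

/* Length bias quoted in the review: length field = total dwords - 2. */
#define MI_LENGTH_BIAS 2u

/* Hypothetical helper: length field for a command of total_dwords dwords. */
static inline uint32_t mi_length_field(uint32_t total_dwords)
{
	assert(total_dwords >= MI_LENGTH_BIAS);
	return total_dwords - MI_LENGTH_BIAS;
}

/* MI_LOAD_REGISTER_REG: header + two register addresses = 3 dwords on all gens. */
static inline uint32_t mi_lrr_header(void)
{
	return MI_INSTR(0x2A, mi_length_field(3));
}

/*
 * MI_LOAD_REGISTER_MEM: header + register + address.
 * The address takes 1 dword before GEN8 and 2 dwords on GEN8+.
 */
static inline uint32_t mi_lrm_header(int gen)
{
	uint32_t total_dwords = (gen >= 8) ? 4 : 3;

	return MI_INSTR(0x29, mi_length_field(total_dwords));
}
```

Under these assumptions the headers come out as `MI_INSTR(0x2A, 1)` for LRR, `MI_INSTR(0x29, 1)` for pre-GEN8 LRM and `MI_INSTR(0x29, 2)` for GEN8+ LRM, which matches the values argued for in the review.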
On 09/06/2015 19:43, Dave Gordon wrote:
> On 05/06/15 14:57, Arun Siluvery wrote:
>> In Per context w/a batch buffer,
>> WaRsRestoreWithPerCtxtBb
>>
>> v2: This patch modifies definitions of MI_LOAD_REGISTER_MEM and
>> MI_LOAD_REGISTER_REG; Add GEN8 specific defines for these instructions
>> so as to not break any future users of existing definitions (Michel)
>>
>> Signed-off-by: Rafael Barbalho <rafael.barbalho@intel.com>
>> Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
>> ---
>>  drivers/gpu/drm/i915/i915_reg.h  | 26 ++++++++++++++++++
>>  drivers/gpu/drm/i915/intel_lrc.c | 59 ++++++++++++++++++++++++++++++++++++++++
>>  2 files changed, 85 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
>> index 33b0ff1..6928162 100644
>> --- a/drivers/gpu/drm/i915/i915_reg.h
>> +++ b/drivers/gpu/drm/i915/i915_reg.h
>
> [snip]
>
>>  #define MI_LOAD_REGISTER_MEM MI_INSTR(0x29, 0)
>>  #define MI_LOAD_REGISTER_REG MI_INSTR(0x2A, 0)
>> +#define MI_LOAD_REGISTER_MEM_GEN8 MI_INSTR(0x29, 2)
>> +#define MI_LRM_USE_GLOBAL_GTT (1<<22)
>> +#define MI_LRM_ASYNC_MODE_ENABLE (1<<21)
>> +#define MI_LOAD_REGISTER_REG_GEN8 MI_INSTR(0x2A, 1)
>
> Isn't the original definition of MI_LOAD_REGISTER_REG wrong anyway? It's
> a two-operand instruction, each of which is a one-word MMIO register
> address, hence always 3 words total. The length bias is 2, so the
> so-called 'flags' field must be 1. The original definition (where the
> second argument of the MI_INSTR macro is 0) shouldn't work.
>
> So just correct the original definition of MI_LOAD_REGISTER_REG; this
> isn't something that's actually changed on GEN8.
>

I did notice that the original instructions are odd but thought I might
be wrong, hence I created new ones to not disturb the original ones.
OK, I will just correct the original one and reuse it.

> While we're mentioning it, I think the above MI_LOAD_REGISTER_MEM is
> wrong too. The length should be 1 pre-GEN8, and 2 in GEN8+.
>

ok.

>>  #define MI_RS_STORE_DATA_IMM MI_INSTR(0x2B, 0)
>>  #define MI_LOAD_URB_MEM MI_INSTR(0x2C, 0)
>>  #define MI_STORE_URB_MEM MI_INSTR(0x2D, 0)
>
> And these are wrong too! In fact all of these instructions have been
> added under a comment which says "Commands used only by the command
> parser". Looks like they were added as placeholders without the proper
> length fields, and then people have started using them as though they
> were complete definitions :(
>
> Time to update them all, perhaps ...

These are not related to this patch, so they can be taken up in a
different patch.

>
> [snip]
>
>> +	/*
>> +	 * BSpec says MI_LOAD_REGISTER_MEM, MI_LOAD_REGISTER_REG and
>> +	 * MI_BATCH_BUFFER_END instructions in this sequence need to be
>> +	 * in the same cacheline.
>> +	 */
>> +	while (((unsigned long) (cmd + index) % CACHELINE_BYTES) != 0)
>> +		cmd[index++] = MI_NOOP;
>> +
>> +	cmd[index++] = MI_LOAD_REGISTER_MEM_GEN8 |
>> +		MI_LRM_USE_GLOBAL_GTT |
>> +		MI_LRM_ASYNC_MODE_ENABLE;
>> +	cmd[index++] = INSTPM;
>> +	cmd[index++] = scratch_addr;
>> +	cmd[index++] = 0;
>> +
>> +	/*
>> +	 * BSpec says there should not be any commands programmed
>> +	 * between MI_LOAD_REGISTER_REG and MI_BATCH_BUFFER_END so
>> +	 * do not add any new commands
>> +	 */
>> +	cmd[index++] = MI_LOAD_REGISTER_REG_GEN8;
>> +	cmd[index++] = GEN8_RS_PREEMPT_STATUS;
>> +	cmd[index++] = GEN8_RS_PREEMPT_STATUS;
>> +
>>  	/* padding */
>>  	while (index < end)
>>  		cmd[index++] = MI_NOOP;
>>
>
> Where's the MI_BATCH_BUFFER_END referred to in the comment?

MI_BATCH_BUFFER_END is just below the while loop; it is in patch [v3 1/6].
Since the diff context is only a few lines, it didn't show up in the diff.

regards
Arun

>
> .Dave.
>
>
On 12/06/15 12:58, Siluvery, Arun wrote:
> On 09/06/2015 19:43, Dave Gordon wrote:
>> On 05/06/15 14:57, Arun Siluvery wrote:
>>> In Per context w/a batch buffer,
>>> WaRsRestoreWithPerCtxtBb
>>>
>>> v2: This patch modifies definitions of MI_LOAD_REGISTER_MEM and
>>> MI_LOAD_REGISTER_REG; Add GEN8 specific defines for these instructions
>>> so as to not break any future users of existing definitions (Michel)
>>>
>>> Signed-off-by: Rafael Barbalho <rafael.barbalho@intel.com>
>>> Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
>>> ---
>>>  drivers/gpu/drm/i915/i915_reg.h  | 26 ++++++++++++++++++
>>>  drivers/gpu/drm/i915/intel_lrc.c | 59
>>> ++++++++++++++++++++++++++++++++++++++++
>>>  2 files changed, 85 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/i915/i915_reg.h
>>> b/drivers/gpu/drm/i915/i915_reg.h
>>> index 33b0ff1..6928162 100644
>>> --- a/drivers/gpu/drm/i915/i915_reg.h
>>> +++ b/drivers/gpu/drm/i915/i915_reg.h
>>
>> [snip]
>>
>>>  #define MI_LOAD_REGISTER_MEM MI_INSTR(0x29, 0)
>>>  #define MI_LOAD_REGISTER_REG MI_INSTR(0x2A, 0)
>>> +#define MI_LOAD_REGISTER_MEM_GEN8 MI_INSTR(0x29, 2)
>>> +#define MI_LRM_USE_GLOBAL_GTT (1<<22)
>>> +#define MI_LRM_ASYNC_MODE_ENABLE (1<<21)
>>> +#define MI_LOAD_REGISTER_REG_GEN8 MI_INSTR(0x2A, 1)
>>
>> Isn't the original definition of MI_LOAD_REGISTER_REG wrong anyway? It's
>> a two-operand instruction, each of which is a one-word MMIO register
>> address, hence always 3 words total. The length bias is 2, so the
>> so-called 'flags' field must be 1. The original definition (where the
>> second argument of the MI_INSTR macro is 0) shouldn't work.
>>
>> So just correct the original definition of MI_LOAD_REGISTER_REG; this
>> isn't something that's actually changed on GEN8.
>>
> I did notice that the original instructions are odd but thought I might
> be wrong, hence I created new ones to not disturb the original ones.
> OK, I will just correct the original one and reuse it.
>
>> While we're mentioning it, I think the above MI_LOAD_REGISTER_MEM is
>> wrong too. The length should be 1 pre-GEN8, and 2 in GEN8+.
>>
> ok.
>>>  #define MI_RS_STORE_DATA_IMM MI_INSTR(0x2B, 0)
>>>  #define MI_LOAD_URB_MEM MI_INSTR(0x2C, 0)
>>>  #define MI_STORE_URB_MEM MI_INSTR(0x2D, 0)
>>
>> And these are wrong too! In fact all of these instructions have been
>> added under a comment which says "Commands used only by the command
>> parser". Looks like they were added as placeholders without the proper
>> length fields, and then people have started using them as though they
>> were complete definitions :(
>>
>> Time to update them all, perhaps ...
> These are not related to this patch, so they can be taken up in a
> different patch.

As a minimum, you should move these updated #defines out of the section
under the comment "Commands used only by the command parser" and put
them in the appropriate place in the regular list of MI_ commands,
preferably in numerical order. Then the ones that are genuinely only
used by the command parser could be left for another patch ...

>> [snip]
>>
>>> +	/*
>>> +	 * BSpec says MI_LOAD_REGISTER_MEM, MI_LOAD_REGISTER_REG and
>>> +	 * MI_BATCH_BUFFER_END instructions in this sequence need to be
>>> +	 * in the same cacheline.
>>> +	 */
>>> +	while (((unsigned long) (cmd + index) % CACHELINE_BYTES) != 0)
>>> +		cmd[index++] = MI_NOOP;
>>> +
>>> +	cmd[index++] = MI_LOAD_REGISTER_MEM_GEN8 |
>>> +		MI_LRM_USE_GLOBAL_GTT |
>>> +		MI_LRM_ASYNC_MODE_ENABLE;
>>> +	cmd[index++] = INSTPM;
>>> +	cmd[index++] = scratch_addr;
>>> +	cmd[index++] = 0;
>>> +
>>> +	/*
>>> +	 * BSpec says there should not be any commands programmed
>>> +	 * between MI_LOAD_REGISTER_REG and MI_BATCH_BUFFER_END so
>>> +	 * do not add any new commands
>>> +	 */
>>> +	cmd[index++] = MI_LOAD_REGISTER_REG_GEN8;
>>> +	cmd[index++] = GEN8_RS_PREEMPT_STATUS;
>>> +	cmd[index++] = GEN8_RS_PREEMPT_STATUS;
>>> +
>>>  	/* padding */
>>>  	while (index < end)
>>>  		cmd[index++] = MI_NOOP;
>>>
>>
>> Where's the MI_BATCH_BUFFER_END referred to in the comment?
>
> MI_BATCH_BUFFER_END is just below the while loop; it is in patch [v3 1/6].
> Since the diff context is only a few lines, it didn't show up in the diff.

The second comment above says "no commands between LOAD_REG_REG and
BB_END", so the point of my comment was that the BB_END is *NOT*
immediately after the LOAD_REG_REG -- there are a bunch of no-ops there!

And therefore also, these instructions do *not* all end up in the same
cacheline, thus contradicting the first comment above.

Padding *after* a BB_END would be redundant.

.Dave.
On 12/06/2015 18:03, Dave Gordon wrote:
> On 12/06/15 12:58, Siluvery, Arun wrote:
>> On 09/06/2015 19:43, Dave Gordon wrote:
>>> On 05/06/15 14:57, Arun Siluvery wrote:
>>>> In Per context w/a batch buffer,
>>>> WaRsRestoreWithPerCtxtBb
>>>>
>>>> v2: This patch modifies definitions of MI_LOAD_REGISTER_MEM and
>>>> MI_LOAD_REGISTER_REG; Add GEN8 specific defines for these instructions
>>>> so as to not break any future users of existing definitions (Michel)
>>>>
>>>> Signed-off-by: Rafael Barbalho <rafael.barbalho@intel.com>
>>>> Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
>>>> ---
>>>>  drivers/gpu/drm/i915/i915_reg.h  | 26 ++++++++++++++++++
>>>>  drivers/gpu/drm/i915/intel_lrc.c | 59
>>>> ++++++++++++++++++++++++++++++++++++++++
>>>>  2 files changed, 85 insertions(+)
>>>>
>>>> diff --git a/drivers/gpu/drm/i915/i915_reg.h
>>>> b/drivers/gpu/drm/i915/i915_reg.h
>>>> index 33b0ff1..6928162 100644
>>>> --- a/drivers/gpu/drm/i915/i915_reg.h
>>>> +++ b/drivers/gpu/drm/i915/i915_reg.h
>>>
>>> [snip]
>>>
>>>>  #define MI_LOAD_REGISTER_MEM MI_INSTR(0x29, 0)
>>>>  #define MI_LOAD_REGISTER_REG MI_INSTR(0x2A, 0)
>>>> +#define MI_LOAD_REGISTER_MEM_GEN8 MI_INSTR(0x29, 2)
>>>> +#define MI_LRM_USE_GLOBAL_GTT (1<<22)
>>>> +#define MI_LRM_ASYNC_MODE_ENABLE (1<<21)
>>>> +#define MI_LOAD_REGISTER_REG_GEN8 MI_INSTR(0x2A, 1)
>>>
>>> Isn't the original definition of MI_LOAD_REGISTER_REG wrong anyway? It's
>>> a two-operand instruction, each of which is a one-word MMIO register
>>> address, hence always 3 words total. The length bias is 2, so the
>>> so-called 'flags' field must be 1. The original definition (where the
>>> second argument of the MI_INSTR macro is 0) shouldn't work.
>>>
>>> So just correct the original definition of MI_LOAD_REGISTER_REG; this
>>> isn't something that's actually changed on GEN8.
>>>
>> I did notice that the original instructions are odd but thought I might
>> be wrong, hence I created new ones to not disturb the original ones.
>> OK, I will just correct the original one and reuse it.
>>
>>> While we're mentioning it, I think the above MI_LOAD_REGISTER_MEM is
>>> wrong too. The length should be 1 pre-GEN8, and 2 in GEN8+.
>>>
>> ok.
>>>>  #define MI_RS_STORE_DATA_IMM MI_INSTR(0x2B, 0)
>>>>  #define MI_LOAD_URB_MEM MI_INSTR(0x2C, 0)
>>>>  #define MI_STORE_URB_MEM MI_INSTR(0x2D, 0)
>>>
>>> And these are wrong too! In fact all of these instructions have been
>>> added under a comment which says "Commands used only by the command
>>> parser". Looks like they were added as placeholders without the proper
>>> length fields, and then people have started using them as though they
>>> were complete definitions :(
>>>
>>> Time to update them all, perhaps ...
>> These are not related to this patch, so they can be taken up in a
>> different patch.
>
> As a minimum, you should move these updated #defines out of the section
> under the comment "Commands used only by the command parser" and put
> them in the appropriate place in the regular list of MI_ commands,
> preferably in numerical order. Then the ones that are genuinely only
> used by the command parser could be left for another patch ...
>
>>> [snip]
>>>
>>>> +	/*
>>>> +	 * BSpec says MI_LOAD_REGISTER_MEM, MI_LOAD_REGISTER_REG and
>>>> +	 * MI_BATCH_BUFFER_END instructions in this sequence need to be
>>>> +	 * in the same cacheline.
>>>> +	 */
>>>> +	while (((unsigned long) (cmd + index) % CACHELINE_BYTES) != 0)
>>>> +		cmd[index++] = MI_NOOP;
>>>> +
>>>> +	cmd[index++] = MI_LOAD_REGISTER_MEM_GEN8 |
>>>> +		MI_LRM_USE_GLOBAL_GTT |
>>>> +		MI_LRM_ASYNC_MODE_ENABLE;
>>>> +	cmd[index++] = INSTPM;
>>>> +	cmd[index++] = scratch_addr;
>>>> +	cmd[index++] = 0;
>>>> +
>>>> +	/*
>>>> +	 * BSpec says there should not be any commands programmed
>>>> +	 * between MI_LOAD_REGISTER_REG and MI_BATCH_BUFFER_END so
>>>> +	 * do not add any new commands
>>>> +	 */
>>>> +	cmd[index++] = MI_LOAD_REGISTER_REG_GEN8;
>>>> +	cmd[index++] = GEN8_RS_PREEMPT_STATUS;
>>>> +	cmd[index++] = GEN8_RS_PREEMPT_STATUS;
>>>> +
>>>>  	/* padding */
>>>>  	while (index < end)
>>>>  		cmd[index++] = MI_NOOP;
>>>>
>>>
>>> Where's the MI_BATCH_BUFFER_END referred to in the comment?
>>
>> MI_BATCH_BUFFER_END is just below the while loop; it is in patch [v3 1/6].
>> Since the diff context is only a few lines, it didn't show up in the diff.
>
> The second comment above says "no commands between LOAD_REG_REG and
> BB_END", so the point of my comment was that the BB_END is *NOT*
> immediately after the LOAD_REG_REG -- there are a bunch of no-ops there!

True, but they are no-ops, so they shouldn't really affect anything. I
guess the spec means no valid commands.

>
> And therefore also, these instructions do *not* all end up in the same
> cacheline, thus contradicting the first comment above.

I don't understand why. As per the requirement, the commands from the
first MI_LOAD_REGISTER_MEM_GEN8 (after the while loop) through BB_END will
be part of the same cacheline (in this case, the second line).

>
> Padding *after* a BB_END would be redundant.

Yes, I just wanted to keep MI_BATCH_BUFFER_END at the end instead of
abruptly terminating the batch, which is why I am padding with no-ops. I
can change this if that is preferred.

>
> .Dave.
>
>
On Fri, Jun 12, 2015 at 06:03:55PM +0100, Dave Gordon wrote:
> On 12/06/15 12:58, Siluvery, Arun wrote:
> > On 09/06/2015 19:43, Dave Gordon wrote:
> >> On 05/06/15 14:57, Arun Siluvery wrote:
> >>> In Per context w/a batch buffer,
> >>> WaRsRestoreWithPerCtxtBb
> >>>
> >>> v2: This patch modifies definitions of MI_LOAD_REGISTER_MEM and
> >>> MI_LOAD_REGISTER_REG; Add GEN8 specific defines for these instructions
> >>> so as to not break any future users of existing definitions (Michel)
> >>>
> >>> Signed-off-by: Rafael Barbalho <rafael.barbalho@intel.com>
> >>> Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
> >>> ---
> >>>  drivers/gpu/drm/i915/i915_reg.h  | 26 ++++++++++++++++++
> >>>  drivers/gpu/drm/i915/intel_lrc.c | 59
> >>> ++++++++++++++++++++++++++++++++++++++++
> >>>  2 files changed, 85 insertions(+)
> >>>
> >>> diff --git a/drivers/gpu/drm/i915/i915_reg.h
> >>> b/drivers/gpu/drm/i915/i915_reg.h
> >>> index 33b0ff1..6928162 100644
> >>> --- a/drivers/gpu/drm/i915/i915_reg.h
> >>> +++ b/drivers/gpu/drm/i915/i915_reg.h
> >>
> >> [snip]
> >>
> >>>  #define MI_LOAD_REGISTER_MEM MI_INSTR(0x29, 0)
> >>>  #define MI_LOAD_REGISTER_REG MI_INSTR(0x2A, 0)
> >>> +#define MI_LOAD_REGISTER_MEM_GEN8 MI_INSTR(0x29, 2)
> >>> +#define MI_LRM_USE_GLOBAL_GTT (1<<22)
> >>> +#define MI_LRM_ASYNC_MODE_ENABLE (1<<21)
> >>> +#define MI_LOAD_REGISTER_REG_GEN8 MI_INSTR(0x2A, 1)
> >>
> >> Isn't the original definition of MI_LOAD_REGISTER_REG wrong anyway? It's
> >> a two-operand instruction, each of which is a one-word MMIO register
> >> address, hence always 3 words total. The length bias is 2, so the
> >> so-called 'flags' field must be 1. The original definition (where the
> >> second argument of the MI_INSTR macro is 0) shouldn't work.
> >>
> >> So just correct the original definition of MI_LOAD_REGISTER_REG; this
> >> isn't something that's actually changed on GEN8.
> >>
> > I did notice that the original instructions are odd but thought I might
> > be wrong, hence I created new ones to not disturb the original ones.
> > OK, I will just correct the original one and reuse it.
> >
> >> While we're mentioning it, I think the above MI_LOAD_REGISTER_MEM is
> >> wrong too. The length should be 1 pre-GEN8, and 2 in GEN8+.
> >>
> > ok.
> >>>  #define MI_RS_STORE_DATA_IMM MI_INSTR(0x2B, 0)
> >>>  #define MI_LOAD_URB_MEM MI_INSTR(0x2C, 0)
> >>>  #define MI_STORE_URB_MEM MI_INSTR(0x2D, 0)
> >>
> >> And these are wrong too! In fact all of these instructions have been
> >> added under a comment which says "Commands used only by the command
> >> parser". Looks like they were added as placeholders without the proper
> >> length fields, and then people have started using them as though they
> >> were complete definitions :(
> >>
> >> Time to update them all, perhaps ...
> > These are not related to this patch, so they can be taken up in a
> > different patch.
>
> As a minimum, you should move these updated #defines out of the section
> under the comment "Commands used only by the command parser" and put
> them in the appropriate place in the regular list of MI_ commands,
> preferably in numerical order. Then the ones that are genuinely only
> used by the command parser could be left for another patch ...

Please just correct the #defines while at it; this really is way too
tempting a trap to keep it hot. It can be done in a separate patch of
course, but IMO not fixing an obvious issue when we spot it, just because
it's not directly related to the feature work at hand, is bad practice
that leads to piles of technical debt. And that's the kind of stuff that
robs me of my sleep at night ;-)

Thanks, Daniel
On 15/06/15 15:10, Siluvery, Arun wrote:
> On 12/06/2015 18:03, Dave Gordon wrote:
>> On 12/06/15 12:58, Siluvery, Arun wrote:
>>> On 09/06/2015 19:43, Dave Gordon wrote:
>>>> On 05/06/15 14:57, Arun Siluvery wrote:
>>>>> In Per context w/a batch buffer,
>>>>> WaRsRestoreWithPerCtxtBb
>>>>>
>>>>> v2: This patch modifies definitions of MI_LOAD_REGISTER_MEM and
>>>>> MI_LOAD_REGISTER_REG; Add GEN8 specific defines for these instructions
>>>>> so as to not break any future users of existing definitions (Michel)
>>>>>
>>>>> Signed-off-by: Rafael Barbalho <rafael.barbalho@intel.com>
>>>>> Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
>>>>> ---
>>>>>  drivers/gpu/drm/i915/i915_reg.h  | 26 ++++++++++++++++++
>>>>>  drivers/gpu/drm/i915/intel_lrc.c | 59
>>>>> ++++++++++++++++++++++++++++++++++++++++
>>>>>  2 files changed, 85 insertions(+)

[snip]

>>>>> +	/*
>>>>> +	 * BSpec says MI_LOAD_REGISTER_MEM, MI_LOAD_REGISTER_REG and
>>>>> +	 * MI_BATCH_BUFFER_END instructions in this sequence need to be
>>>>> +	 * in the same cacheline.
>>>>> +	 */
>>>>> +	while (((unsigned long) (cmd + index) % CACHELINE_BYTES) != 0)
>>>>> +		cmd[index++] = MI_NOOP;
>>>>> +
>>>>> +	cmd[index++] = MI_LOAD_REGISTER_MEM_GEN8 |
>>>>> +		MI_LRM_USE_GLOBAL_GTT |
>>>>> +		MI_LRM_ASYNC_MODE_ENABLE;
>>>>> +	cmd[index++] = INSTPM;
>>>>> +	cmd[index++] = scratch_addr;
>>>>> +	cmd[index++] = 0;
>>>>> +
>>>>> +	/*
>>>>> +	 * BSpec says there should not be any commands programmed
>>>>> +	 * between MI_LOAD_REGISTER_REG and MI_BATCH_BUFFER_END so
>>>>> +	 * do not add any new commands
>>>>> +	 */
>>>>> +	cmd[index++] = MI_LOAD_REGISTER_REG_GEN8;
>>>>> +	cmd[index++] = GEN8_RS_PREEMPT_STATUS;
>>>>> +	cmd[index++] = GEN8_RS_PREEMPT_STATUS;
>>>>> +
>>>>>  	/* padding */
>>>>>  	while (index < end)
>>>>>  		cmd[index++] = MI_NOOP;
>>>>>
>>>>
>>>> Where's the MI_BATCH_BUFFER_END referred to in the comment?
>>>
>>> MI_BATCH_BUFFER_END is just below the while loop; it is in patch [v3 1/6].
>>> Since the diff context is only a few lines, it didn't show up in the diff.
>>
>> The second comment above says "no commands between LOAD_REG_REG and
>> BB_END", so the point of my comment was that the BB_END is *NOT*
>> immediately after the LOAD_REG_REG -- there are a bunch of no-ops there!
>
> True, but they are no-ops, so they shouldn't really affect anything. I
> guess the spec means no valid commands.

I guess the spec means "NO COMMANDS". NOOP is a perfectly valid command,
and I've even seen cases where a workaround specifically requires "a
NOOP with the set-no-op-id-register bit set" to fix some particular bug.
The only special thing about NOOP is that it doesn't get captured in IPEHR.

I think the w/a requires this:

0%CLSIZE:
	...
	LRM (reg, addr, 0)
	LRR (reg, reg)
	BB_END
	...
(63%CLSIZE)

no gaps, no insertions, all together, all on one cacheline. Those
instructions take up 8 DWords (32 bytes) so the sequence doesn't
necessarily have to start on a cacheline boundary, as long as it's
entirely within the same line. But it's simpler to start on a new line.
You seem to have:

0%CLSIZE:
	LRM (reg, mem, 0)
	LRR (reg, reg)
	NOP
	NOP
	NOP
	BB_END

so the condition in the comment is not fulfilled. If this works, maybe
the comment is wrong.

>> And therefore also, these instructions do *not* all end up in the same
>> cacheline, thus contradicting the first comment above.
>
> I don't understand why. As per the requirement, the commands from the
> first MI_LOAD_REGISTER_MEM_GEN8 (after the while loop) through BB_END will
> be part of the same cacheline (in this case, the second line).

OK, they're all in the same line; I didn't look back at the full context
enough and thought 'end' would point to the end of the buffer, rather
than the end of the cacheline .. because it /does/ point to the end of
the buffer, it just happens to be the end of the very same cacheline as
well.

So I really don't like the way the sizes of the two workaround batches
have been defined in terms of cache lines. Also I'm not keen on one bit
of code allocating the object and defining the sizes of the sub-areas
within it, and then separate functions filling in each of the sequences
within these areas, "knowing" that the areas are /just the right size/.
It would be simpler to maintain if the "size in cachelines" values in
lrc_setup_ctx_wa_obj() didn't have to be hand-edited to stay in sync
with the number of instructions written by gen8_init_perctx_bb() and
gen8_init_indirectctx_bb().

How about having each of these return the number of bytes they've
appended to the (u32 *)buffer that they've been given, and let the
caller manage mapping/unmapping, alignment, padding, etc, and fill in
the size fields accordingly *after* the content has been defined?

.Dave.

>> Padding *after* a BB_END would be redundant.
>
> Yes, I just wanted to keep MI_BATCH_BUFFER_END at the end instead of
> abruptly terminating the batch, which is why I am padding with no-ops. I
> can change this if that is preferred.
>>
>> .Dave.
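The "entirely within the same line" condition from the message above can be checked with a few lines of arithmetic. This is only a sketch: it assumes a 64-byte cacheline and 4-byte dwords, and the helper names `fits_in_one_cacheline()` and `noops_before_sequence()` are hypothetical, not driver functions. It places padding only *before* the sequence, since the thread does not settle whether no-ops between LRR and BB_END are acceptable to the hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CACHELINE_BYTES 64u /* assumed cacheline size */
#define DWORD_BYTES      4u

/* True if len_dwords dwords starting at byte_offset stay inside one cacheline. */
static bool fits_in_one_cacheline(size_t byte_offset, size_t len_dwords)
{
	assert(len_dwords > 0);

	size_t first_line = byte_offset / CACHELINE_BYTES;
	size_t last_byte = byte_offset + len_dwords * DWORD_BYTES - 1;

	return first_line == last_byte / CACHELINE_BYTES;
}

/*
 * Number of NOOP dwords to emit before the sequence so that it does not
 * straddle a cacheline boundary. Returns 0 when no padding is needed.
 */
static size_t noops_before_sequence(size_t byte_offset, size_t len_dwords)
{
	assert(byte_offset % DWORD_BYTES == 0);
	assert(len_dwords * DWORD_BYTES <= CACHELINE_BYTES);

	if (fits_in_one_cacheline(byte_offset, len_dwords))
		return 0;

	/* Otherwise pad up to the start of the next cacheline. */
	return (CACHELINE_BYTES - byte_offset % CACHELINE_BYTES) / DWORD_BYTES;
}
```

For the 8-dword LRM + LRR + BB_END sequence, a start at byte offset 32 needs no padding, while a start at offset 36 would need 7 no-ops to reach the next line.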
On 15/06/2015 18:29, Dave Gordon wrote:
> On 15/06/15 15:10, Siluvery, Arun wrote:
>> On 12/06/2015 18:03, Dave Gordon wrote:
>>> On 12/06/15 12:58, Siluvery, Arun wrote:
>>>> On 09/06/2015 19:43, Dave Gordon wrote:
>>>>> On 05/06/15 14:57, Arun Siluvery wrote:
>>>>>> In Per context w/a batch buffer,
>>>>>> WaRsRestoreWithPerCtxtBb
>>>>>>
>>>>>> v2: This patch modifies definitions of MI_LOAD_REGISTER_MEM and
>>>>>> MI_LOAD_REGISTER_REG; Add GEN8 specific defines for these instructions
>>>>>> so as to not break any future users of existing definitions (Michel)
>>>>>>
>>>>>> Signed-off-by: Rafael Barbalho <rafael.barbalho@intel.com>
>>>>>> Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
>>>>>> ---
>>>>>>  drivers/gpu/drm/i915/i915_reg.h  | 26 ++++++++++++++++++
>>>>>>  drivers/gpu/drm/i915/intel_lrc.c | 59
>>>>>> ++++++++++++++++++++++++++++++++++++++++
>>>>>>  2 files changed, 85 insertions(+)
>
> [snip]
>
>>>>>> +	/*
>>>>>> +	 * BSpec says MI_LOAD_REGISTER_MEM, MI_LOAD_REGISTER_REG and
>>>>>> +	 * MI_BATCH_BUFFER_END instructions in this sequence need to be
>>>>>> +	 * in the same cacheline.
>>>>>> +	 */
>>>>>> +	while (((unsigned long) (cmd + index) % CACHELINE_BYTES) != 0)
>>>>>> +		cmd[index++] = MI_NOOP;
>>>>>> +
>>>>>> +	cmd[index++] = MI_LOAD_REGISTER_MEM_GEN8 |
>>>>>> +		MI_LRM_USE_GLOBAL_GTT |
>>>>>> +		MI_LRM_ASYNC_MODE_ENABLE;
>>>>>> +	cmd[index++] = INSTPM;
>>>>>> +	cmd[index++] = scratch_addr;
>>>>>> +	cmd[index++] = 0;
>>>>>> +
>>>>>> +	/*
>>>>>> +	 * BSpec says there should not be any commands programmed
>>>>>> +	 * between MI_LOAD_REGISTER_REG and MI_BATCH_BUFFER_END so
>>>>>> +	 * do not add any new commands
>>>>>> +	 */
>>>>>> +	cmd[index++] = MI_LOAD_REGISTER_REG_GEN8;
>>>>>> +	cmd[index++] = GEN8_RS_PREEMPT_STATUS;
>>>>>> +	cmd[index++] = GEN8_RS_PREEMPT_STATUS;
>>>>>> +
>>>>>>  	/* padding */
>>>>>>  	while (index < end)
>>>>>>  		cmd[index++] = MI_NOOP;
>>>>>>
>>>>>
>>>>> Where's the MI_BATCH_BUFFER_END referred to in the comment?
>>>>
>>>> MI_BATCH_BUFFER_END is just below the while loop; it is in patch [v3 1/6].
>>>> Since the diff context is only a few lines, it didn't show up in the diff.
>>>
>>> The second comment above says "no commands between LOAD_REG_REG and
>>> BB_END", so the point of my comment was that the BB_END is *NOT*
>>> immediately after the LOAD_REG_REG -- there are a bunch of no-ops there!
>>
>> True, but they are no-ops, so they shouldn't really affect anything. I
>> guess the spec means no valid commands.
>
> I guess the spec means "NO COMMANDS". NOOP is a perfectly valid command,
> and I've even seen cases where a workaround specifically requires "a
> NOOP with the set-no-op-id-register bit set" to fix some particular bug.
> The only special thing about NOOP is that it doesn't get captured in IPEHR.
>
> I think the w/a requires this:
>
> 0%CLSIZE:
> 	...
> 	LRM (reg, addr, 0)
> 	LRR (reg, reg)
> 	BB_END
> 	...
> (63%CLSIZE)
>
> no gaps, no insertions, all together, all on one cacheline. Those
> instructions take up 8 DWords (32 bytes) so the sequence doesn't
> necessarily have to start on a cacheline boundary, as long as it's
> entirely within the same line. But it's simpler to start on a new line.
> You seem to have:
>
> 0%CLSIZE:
> 	LRM (reg, mem, 0)
> 	LRR (reg, reg)
> 	NOP
> 	NOP
> 	NOP
> 	BB_END
>
> so the condition in the comment is not fulfilled. If this works, maybe
> the comment is wrong.
>
>>> And therefore also, these instructions do *not* all end up in the same
>>> cacheline, thus contradicting the first comment above.
>>
>> I don't understand why. As per the requirement, the commands from the
>> first MI_LOAD_REGISTER_MEM_GEN8 (after the while loop) through BB_END will
>> be part of the same cacheline (in this case, the second line).
>
> OK, they're all in the same line; I didn't look back at the full context
> enough and thought 'end' would point to the end of the buffer, rather
> than the end of the cacheline .. because it /does/ point to the end of
> the buffer, it just happens to be the end of the very same cacheline as
> well.
>
> So I really don't like the way the sizes of the two workaround batches
> have been defined in terms of cache lines. Also I'm not keen on one bit
> of code allocating the object and defining the sizes of the sub-areas
> within it, and then separate functions filling in each of the sequences
> within these areas, "knowing" that the areas are /just the right size/.
> It would be simpler to maintain if the "size in cachelines" values in
> lrc_setup_ctx_wa_obj() didn't have to be hand-edited to stay in sync
> with the number of instructions written by gen8_init_perctx_bb() and
> gen8_init_indirectctx_bb().
>
> How about having each of these return the number of bytes they've
> appended to the (u32 *)buffer that they've been given, and let the
> caller manage mapping/unmapping, alignment, padding, etc, and fill in
> the size fields accordingly *after* the content has been defined?

This is an issue; having to edit the size whenever more workarounds are
added is not good. It can be changed as you suggested.

regards
Arun

>
> .Dave.
>
>>> Padding *after* a BB_END would be redundant.
>>
>> Yes, I just wanted to keep MI_BATCH_BUFFER_END at the end instead of
>> abruptly terminating the batch, which is why I am padding with no-ops. I
>> can change this if that is preferred.
>>>
>>> .Dave.
>
>
>
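The restructuring agreed on above could look roughly like the following sketch. It is not the driver's code: `example_emit_perctx()` and `example_build_wa_bb()` are hypothetical names, the command dwords are placeholders, and the emitter returns a dword count rather than a byte count purely to keep the arithmetic short. The point is only that the emitter reports how much it wrote and the caller derives the size in cachelines afterwards, so no hand-maintained size constant is needed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHELINE_DWORDS 16u          /* 64-byte line / 4-byte dword (assumed) */
#define EX_NOOP          0x00000000u  /* illustrative no-op dword */
#define EX_BB_END        0x05000000u  /* illustrative batch-end dword */

/*
 * Hypothetical emitter: appends its commands plus the batch end to buf and
 * returns the number of dwords written, or 0 if they do not fit.
 */
static size_t example_emit_perctx(uint32_t *buf, size_t capacity)
{
	/* Placeholder payload standing in for the real workaround commands. */
	static const uint32_t cmds[] = { 0x11111111u, 0x22222222u, 0x33333333u };
	size_t n = sizeof(cmds) / sizeof(cmds[0]);

	if (n + 1 > capacity)
		return 0;

	for (size_t i = 0; i < n; i++)
		buf[i] = cmds[i];
	buf[n] = EX_BB_END;

	return n + 1;
}

/*
 * Caller side: run the emitter, then round the used area up to whole
 * cachelines and return that size in cachelines (0 on failure).
 */
static size_t example_build_wa_bb(uint32_t *buf, size_t capacity,
				  size_t (*emit)(uint32_t *, size_t))
{
	size_t used = emit(buf, capacity);
	size_t padded;

	if (used == 0)
		return 0;

	padded = (used + CACHELINE_DWORDS - 1) / CACHELINE_DWORDS * CACHELINE_DWORDS;
	if (padded > capacity)
		return 0;

	/* Fill the unused tail of the last cacheline. */
	while (used < padded)
		buf[used++] = EX_NOOP;

	return padded / CACHELINE_DWORDS;
}
```

With this split, adding another workaround only changes the emitter; the size the caller programs follows from the returned count.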
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index 33b0ff1..6928162 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -347,6 +347,26 @@
 #define MI_INVALIDATE_BSD (1<<7)
 #define MI_FLUSH_DW_USE_GTT (1<<2)
 #define MI_FLUSH_DW_USE_PPGTT (0<<2)
+#define MI_ATOMIC(len) MI_INSTR(0x2F, (len-2))
+#define MI_ATOMIC_MEMORY_TYPE_GGTT (1<<22)
+#define MI_ATOMIC_INLINE_DATA (1<<18)
+#define MI_ATOMIC_CS_STALL (1<<17)
+#define MI_ATOMIC_RETURN_DATA_CTL (1<<16)
+#define MI_ATOMIC_OP_MASK(op) ((op) << 8)
+#define MI_ATOMIC_AND MI_ATOMIC_OP_MASK(0x01)
+#define MI_ATOMIC_OR MI_ATOMIC_OP_MASK(0x02)
+#define MI_ATOMIC_XOR MI_ATOMIC_OP_MASK(0x03)
+#define MI_ATOMIC_MOVE MI_ATOMIC_OP_MASK(0x04)
+#define MI_ATOMIC_INC MI_ATOMIC_OP_MASK(0x05)
+#define MI_ATOMIC_DEC MI_ATOMIC_OP_MASK(0x06)
+#define MI_ATOMIC_ADD MI_ATOMIC_OP_MASK(0x07)
+#define MI_ATOMIC_SUB MI_ATOMIC_OP_MASK(0x08)
+#define MI_ATOMIC_RSUB MI_ATOMIC_OP_MASK(0x09)
+#define MI_ATOMIC_IMAX MI_ATOMIC_OP_MASK(0x0A)
+#define MI_ATOMIC_IMIN MI_ATOMIC_OP_MASK(0x0B)
+#define MI_ATOMIC_UMAX MI_ATOMIC_OP_MASK(0x0C)
+#define MI_ATOMIC_UMIN MI_ATOMIC_OP_MASK(0x0D)
+
 #define MI_BATCH_BUFFER MI_INSTR(0x30, 1)
 #define MI_BATCH_NON_SECURE (1)
 /* for snb/ivb/vlv this also means "batch in ppgtt" when ppgtt is enabled. */
@@ -453,6 +473,10 @@
 #define MI_REPORT_PERF_COUNT_GGTT (1<<0)
 #define MI_LOAD_REGISTER_MEM MI_INSTR(0x29, 0)
 #define MI_LOAD_REGISTER_REG MI_INSTR(0x2A, 0)
+#define MI_LOAD_REGISTER_MEM_GEN8 MI_INSTR(0x29, 2)
+#define MI_LRM_USE_GLOBAL_GTT (1<<22)
+#define MI_LRM_ASYNC_MODE_ENABLE (1<<21)
+#define MI_LOAD_REGISTER_REG_GEN8 MI_INSTR(0x2A, 1)
 #define MI_RS_STORE_DATA_IMM MI_INSTR(0x2B, 0)
 #define MI_LOAD_URB_MEM MI_INSTR(0x2C, 0)
 #define MI_STORE_URB_MEM MI_INSTR(0x2D, 0)
@@ -1799,6 +1823,8 @@ enum skl_disp_power_wells {
 #define GEN8_RC_SEMA_IDLE_MSG_DISABLE (1 << 12)
 #define GEN8_FF_DOP_CLOCK_GATE_DISABLE (1<<10)
 
+#define GEN8_RS_PREEMPT_STATUS 0x215C
+
 /* Fuse readout registers for GT */
 #define CHV_FUSE_GT (VLV_DISPLAY_BASE + 0x2168)
 #define CHV_FGT_DISABLE_SS0 (1 << 10)
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 5f6279b..98335c6 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1154,6 +1154,13 @@ static int gen8_init_perctx_bb(struct intel_engine_cs *ring)
 	int end;
 	struct page *page;
 	uint32_t *cmd;
+	u32 scratch_addr;
+	unsigned long flags = 0;
+
+	if (ring->scratch.obj == NULL) {
+		DRM_ERROR("scratch page not allocated for %s\n", ring->name);
+		return -EINVAL;
+	}
 
 	page = i915_gem_object_get_page(ring->wa_ctx.obj, 0);
 	cmd = kmap_atomic(page);
@@ -1168,9 +1175,61 @@ static int gen8_init_perctx_bb(struct intel_engine_cs *ring)
 		return -EINVAL;
 	}
 
+	/* Actual scratch location is at 128 bytes offset */
+	scratch_addr = ring->scratch.gtt_offset + 2*CACHELINE_BYTES;
+	scratch_addr |= PIPE_CONTROL_GLOBAL_GTT;
+
 	/* WaDisableCtxRestoreArbitration:bdw,chv */
 	cmd[index++] = MI_ARB_ON_OFF | MI_ARB_ENABLE;
 
+	/*
+	 * As per Bspec, to workaround a known HW issue, SW must perform the
+	 * below programming sequence prior to programming MI_BATCH_BUFFER_END.
+	 *
+	 * This is only applicable for Gen8.
+	 */
+
+	/* WaRsRestoreWithPerCtxtBb:bdw,chv */
+	cmd[index++] = MI_LOAD_REGISTER_IMM(1);
+	cmd[index++] = INSTPM;
+	cmd[index++] = _MASKED_BIT_DISABLE(INSTPM_FORCE_ORDERING);
+
+	flags = MI_ATOMIC_MEMORY_TYPE_GGTT |
+		MI_ATOMIC_INLINE_DATA |
+		MI_ATOMIC_CS_STALL |
+		MI_ATOMIC_RETURN_DATA_CTL |
+		MI_ATOMIC_MOVE;
+
+	cmd[index++] = MI_ATOMIC(5) | flags;
+	cmd[index++] = scratch_addr;
+	cmd[index++] = 0;
+	cmd[index++] = _MASKED_BIT_ENABLE(INSTPM_FORCE_ORDERING);
+	cmd[index++] = _MASKED_BIT_ENABLE(INSTPM_FORCE_ORDERING);
+
+	/*
+	 * BSpec says MI_LOAD_REGISTER_MEM, MI_LOAD_REGISTER_REG and
+	 * MI_BATCH_BUFFER_END instructions in this sequence need to be
+	 * in the same cacheline.
+	 */
+	while (((unsigned long) (cmd + index) % CACHELINE_BYTES) != 0)
+		cmd[index++] = MI_NOOP;
+
+	cmd[index++] = MI_LOAD_REGISTER_MEM_GEN8 |
+		MI_LRM_USE_GLOBAL_GTT |
+		MI_LRM_ASYNC_MODE_ENABLE;
+	cmd[index++] = INSTPM;
+	cmd[index++] = scratch_addr;
+	cmd[index++] = 0;
+
+	/*
+	 * BSpec says there should not be any commands programmed
+	 * between MI_LOAD_REGISTER_REG and MI_BATCH_BUFFER_END so
+	 * do not add any new commands
+	 */
+	cmd[index++] = MI_LOAD_REGISTER_REG_GEN8;
+	cmd[index++] = GEN8_RS_PREEMPT_STATUS;
+	cmd[index++] = GEN8_RS_PREEMPT_STATUS;
+
 	/* padding */
 	while (index < end)
 		cmd[index++] = MI_NOOP;