Message ID | 1516815007-17902-1-git-send-email-daniele.ceraolospurio@intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Quoting Daniele Ceraolo Spurio (2018-01-24 17:30:07) > From: Thomas Daniel <thomas.daniel@intel.com> > > Enhanced Execlists is an upgraded version of execlists which supports > up to 8 ports. The lrcs to be submitted are written to a submit queue > (the ExecLists Submission Queue - ELSQ), which is then loaded on the > HW. When writing to the ELSP register, the lrcs are written cyclically > in the queue from position 0 to position 7. Alternatively, it is > possible to write directly in the individual positions of the queue > using the ELSQC registers. To be able to re-use all the existing code > we're using the latter method and we're currently limiting ourself to > only using 2 elements. > > The preemption flow is sligthly different with enhanced execlists, so > this patch turns preemption off temporarily for platforms with ELSQ > while we wait for the new mechanism to land. > > v2: Rebase. > v3: Switch from !IS_GEN11 to GEN < 11 (Daniele Ceraolo Spurio). > v4: Use the elsq registers instead of elsp. (Daniele Ceraolo Spurio) > v5: Reword commit, rename regs to be closer to specs, turn off > preemption (Daniele), reuse engine->execlists.elsp (Chris) > v6: use has_logical_ring_elsq to differentiate the new paths > > Cc: Chris Wilson <chris@chris-wilson.co.uk> > Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> > Signed-off-by: Thomas Daniel <thomas.daniel@intel.com> > Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> > Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> > --- > drivers/gpu/drm/i915/i915_drv.h | 7 ++++++- > drivers/gpu/drm/i915/i915_pci.c | 3 ++- > drivers/gpu/drm/i915/intel_device_info.h | 1 + > drivers/gpu/drm/i915/intel_lrc.c | 35 +++++++++++++++++++++++++++----- > drivers/gpu/drm/i915/intel_lrc.h | 3 +++ > drivers/gpu/drm/i915/intel_ringbuffer.h | 6 ++++-- > 6 files changed, 46 insertions(+), 9 deletions(-) > > diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h > index 8333692..346209a 100644 > --- a/drivers/gpu/drm/i915/i915_drv.h > +++ b/drivers/gpu/drm/i915/i915_drv.h > @@ -2741,8 +2741,13 @@ static inline unsigned int i915_sg_segment_size(void) > > #define HAS_LOGICAL_RING_CONTEXTS(dev_priv) \ > ((dev_priv)->info.has_logical_ring_contexts) > +#define HAS_LOGICAL_RING_ELSQ(dev_priv) \ > + ((dev_priv)->info.has_logical_ring_elsq) > + > +/* XXX: Preemption disabled for ELSQ until support for new flow lands */ > #define HAS_LOGICAL_RING_PREEMPTION(dev_priv) \ > - ((dev_priv)->info.has_logical_ring_preemption) > + ((dev_priv)->info.has_logical_ring_preemption && \ > + !HAS_LOGICAL_RING_ELSQ(dev_priv)) It's in the intel_device_info for a reason. I knew I should not have let Michal turn this into a macro. I still do not see any reason why you don't just make the current preemption work (it will) and then you can refine it if you prove it worthwhile. -Chris
On 24/01/18 09:46, Chris Wilson wrote: > Quoting Daniele Ceraolo Spurio (2018-01-24 17:30:07) >> From: Thomas Daniel <thomas.daniel@intel.com> >> >> Enhanced Execlists is an upgraded version of execlists which supports >> up to 8 ports. The lrcs to be submitted are written to a submit queue >> (the ExecLists Submission Queue - ELSQ), which is then loaded on the >> HW. When writing to the ELSP register, the lrcs are written cyclically >> in the queue from position 0 to position 7. Alternatively, it is >> possible to write directly in the individual positions of the queue >> using the ELSQC registers. To be able to re-use all the existing code >> we're using the latter method and we're currently limiting ourself to >> only using 2 elements. >> >> The preemption flow is sligthly different with enhanced execlists, so >> this patch turns preemption off temporarily for platforms with ELSQ >> while we wait for the new mechanism to land. >> >> v2: Rebase. >> v3: Switch from !IS_GEN11 to GEN < 11 (Daniele Ceraolo Spurio). >> v4: Use the elsq registers instead of elsp. (Daniele Ceraolo Spurio) >> v5: Reword commit, rename regs to be closer to specs, turn off >> preemption (Daniele), reuse engine->execlists.elsp (Chris) >> v6: use has_logical_ring_elsq to differentiate the new paths >> >> Cc: Chris Wilson <chris@chris-wilson.co.uk> >> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> >> Signed-off-by: Thomas Daniel <thomas.daniel@intel.com> >> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> >> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> >> --- >> drivers/gpu/drm/i915/i915_drv.h | 7 ++++++- >> drivers/gpu/drm/i915/i915_pci.c | 3 ++- >> drivers/gpu/drm/i915/intel_device_info.h | 1 + >> drivers/gpu/drm/i915/intel_lrc.c | 35 +++++++++++++++++++++++++++----- >> drivers/gpu/drm/i915/intel_lrc.h | 3 +++ >> drivers/gpu/drm/i915/intel_ringbuffer.h | 6 ++++-- >> 6 files changed, 46 insertions(+), 9 deletions(-) >> >> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h >> index 8333692..346209a 100644 >> --- a/drivers/gpu/drm/i915/i915_drv.h >> +++ b/drivers/gpu/drm/i915/i915_drv.h >> @@ -2741,8 +2741,13 @@ static inline unsigned int i915_sg_segment_size(void) >> >> #define HAS_LOGICAL_RING_CONTEXTS(dev_priv) \ >> ((dev_priv)->info.has_logical_ring_contexts) >> +#define HAS_LOGICAL_RING_ELSQ(dev_priv) \ >> + ((dev_priv)->info.has_logical_ring_elsq) >> + >> +/* XXX: Preemption disabled for ELSQ until support for new flow lands */ >> #define HAS_LOGICAL_RING_PREEMPTION(dev_priv) \ >> - ((dev_priv)->info.has_logical_ring_preemption) >> + ((dev_priv)->info.has_logical_ring_preemption && \ >> + !HAS_LOGICAL_RING_ELSQ(dev_priv)) > > It's in the intel_device_info for a reason. I knew I should not have let > Michal turn this into a macro. > You mean setting has_logical_ring_preemption to zero directly? I thought the policy was to avoid setting things in device_info to values that don't reflect real HW capabilities and to do the hacks elsewhere. > I still do not see any reason why you don't just make the current > preemption work (it will) and then you can refine it if you prove it > worthwhile. > -Chris > Just didn't see the worth of it ;). It's not a lot of code but it's in an hot path and we're most likely going to get rid of it soon as the new stuff is simpler. I'll put the change together and send it out so we can evaluate that and see what works better with code at hand. Thanks, Daniele
Quoting Daniele Ceraolo Spurio (2018-01-26 00:10:09) > > > On 24/01/18 09:46, Chris Wilson wrote: > > Quoting Daniele Ceraolo Spurio (2018-01-24 17:30:07) > >> From: Thomas Daniel <thomas.daniel@intel.com> > >> > >> Enhanced Execlists is an upgraded version of execlists which supports > >> up to 8 ports. The lrcs to be submitted are written to a submit queue > >> (the ExecLists Submission Queue - ELSQ), which is then loaded on the > >> HW. When writing to the ELSP register, the lrcs are written cyclically > >> in the queue from position 0 to position 7. Alternatively, it is > >> possible to write directly in the individual positions of the queue > >> using the ELSQC registers. To be able to re-use all the existing code > >> we're using the latter method and we're currently limiting ourself to > >> only using 2 elements. > >> > >> The preemption flow is sligthly different with enhanced execlists, so > >> this patch turns preemption off temporarily for platforms with ELSQ > >> while we wait for the new mechanism to land. > >> > >> v2: Rebase. > >> v3: Switch from !IS_GEN11 to GEN < 11 (Daniele Ceraolo Spurio). > >> v4: Use the elsq registers instead of elsp. (Daniele Ceraolo Spurio) > >> v5: Reword commit, rename regs to be closer to specs, turn off > >> preemption (Daniele), reuse engine->execlists.elsp (Chris) > >> v6: use has_logical_ring_elsq to differentiate the new paths > >> > >> Cc: Chris Wilson <chris@chris-wilson.co.uk> > >> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> > >> Signed-off-by: Thomas Daniel <thomas.daniel@intel.com> > >> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> > >> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> > >> --- > >> drivers/gpu/drm/i915/i915_drv.h | 7 ++++++- > >> drivers/gpu/drm/i915/i915_pci.c | 3 ++- > >> drivers/gpu/drm/i915/intel_device_info.h | 1 + > >> drivers/gpu/drm/i915/intel_lrc.c | 35 +++++++++++++++++++++++++++----- > >> drivers/gpu/drm/i915/intel_lrc.h | 3 +++ > >> drivers/gpu/drm/i915/intel_ringbuffer.h | 6 ++++-- > >> 6 files changed, 46 insertions(+), 9 deletions(-) > >> > >> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h > >> index 8333692..346209a 100644 > >> --- a/drivers/gpu/drm/i915/i915_drv.h > >> +++ b/drivers/gpu/drm/i915/i915_drv.h > >> @@ -2741,8 +2741,13 @@ static inline unsigned int i915_sg_segment_size(void) > >> > >> #define HAS_LOGICAL_RING_CONTEXTS(dev_priv) \ > >> ((dev_priv)->info.has_logical_ring_contexts) > >> +#define HAS_LOGICAL_RING_ELSQ(dev_priv) \ > >> + ((dev_priv)->info.has_logical_ring_elsq) > >> + > >> +/* XXX: Preemption disabled for ELSQ until support for new flow lands */ > >> #define HAS_LOGICAL_RING_PREEMPTION(dev_priv) \ > >> - ((dev_priv)->info.has_logical_ring_preemption) > >> + ((dev_priv)->info.has_logical_ring_preemption && \ > >> + !HAS_LOGICAL_RING_ELSQ(dev_priv)) > > > > It's in the intel_device_info for a reason. I knew I should not have let > > Michal turn this into a macro. > > > > You mean setting has_logical_ring_preemption to zero directly? I thought > the policy was to avoid setting things in device_info to values that > don't reflect real HW capabilities and to do the hacks elsewhere. No, data driven code. intel_device_info was introduced to remove having heavy predicates so that we could see what will be enabled and what not in one place. > > I still do not see any reason why you don't just make the current > > preemption work (it will) and then you can refine it if you prove it > > worthwhile. > > > > Just didn't see the worth of it ;). It's not a lot of code but it's in > an hot path and we're most likely going to get rid of it soon as the new > stuff is simpler. I'll put the change together and send it out so we can > evaluate that and see what works better with code at hand. Is the new stuff going to be any simpler? You still need a preemption point, so a special submission followed by detecting that in the CS handler to do the unwind. And whilst I am here, els is awful. Either stick with elsp and note that they changed the name (+layout) on icl, or replace it with a generic name. Spelling it out completely as execlists->execlists_submission is still better than els, but submit[_reg] (or submit_hw) would be clearer. -Chris
On 26/01/18 00:47, Chris Wilson wrote: > Quoting Daniele Ceraolo Spurio (2018-01-26 00:10:09) >> >> >> On 24/01/18 09:46, Chris Wilson wrote: >>> Quoting Daniele Ceraolo Spurio (2018-01-24 17:30:07) >>>> From: Thomas Daniel <thomas.daniel@intel.com> >>>> >>>> Enhanced Execlists is an upgraded version of execlists which supports >>>> up to 8 ports. The lrcs to be submitted are written to a submit queue >>>> (the ExecLists Submission Queue - ELSQ), which is then loaded on the >>>> HW. When writing to the ELSP register, the lrcs are written cyclically >>>> in the queue from position 0 to position 7. Alternatively, it is >>>> possible to write directly in the individual positions of the queue >>>> using the ELSQC registers. To be able to re-use all the existing code >>>> we're using the latter method and we're currently limiting ourself to >>>> only using 2 elements. >>>> >>>> The preemption flow is sligthly different with enhanced execlists, so >>>> this patch turns preemption off temporarily for platforms with ELSQ >>>> while we wait for the new mechanism to land. >>>> >>>> v2: Rebase. >>>> v3: Switch from !IS_GEN11 to GEN < 11 (Daniele Ceraolo Spurio). >>>> v4: Use the elsq registers instead of elsp. (Daniele Ceraolo Spurio) >>>> v5: Reword commit, rename regs to be closer to specs, turn off >>>> preemption (Daniele), reuse engine->execlists.elsp (Chris) >>>> v6: use has_logical_ring_elsq to differentiate the new paths >>>> >>>> Cc: Chris Wilson <chris@chris-wilson.co.uk> >>>> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> >>>> Signed-off-by: Thomas Daniel <thomas.daniel@intel.com> >>>> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> >>>> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> >>>> --- >>>> drivers/gpu/drm/i915/i915_drv.h | 7 ++++++- >>>> drivers/gpu/drm/i915/i915_pci.c | 3 ++- >>>> drivers/gpu/drm/i915/intel_device_info.h | 1 + >>>> drivers/gpu/drm/i915/intel_lrc.c | 35 +++++++++++++++++++++++++++----- >>>> drivers/gpu/drm/i915/intel_lrc.h | 3 +++ >>>> drivers/gpu/drm/i915/intel_ringbuffer.h | 6 ++++-- >>>> 6 files changed, 46 insertions(+), 9 deletions(-) >>>> >>>> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h >>>> index 8333692..346209a 100644 >>>> --- a/drivers/gpu/drm/i915/i915_drv.h >>>> +++ b/drivers/gpu/drm/i915/i915_drv.h >>>> @@ -2741,8 +2741,13 @@ static inline unsigned int i915_sg_segment_size(void) >>>> >>>> #define HAS_LOGICAL_RING_CONTEXTS(dev_priv) \ >>>> ((dev_priv)->info.has_logical_ring_contexts) >>>> +#define HAS_LOGICAL_RING_ELSQ(dev_priv) \ >>>> + ((dev_priv)->info.has_logical_ring_elsq) >>>> + >>>> +/* XXX: Preemption disabled for ELSQ until support for new flow lands */ >>>> #define HAS_LOGICAL_RING_PREEMPTION(dev_priv) \ >>>> - ((dev_priv)->info.has_logical_ring_preemption) >>>> + ((dev_priv)->info.has_logical_ring_preemption && \ >>>> + !HAS_LOGICAL_RING_ELSQ(dev_priv)) >>> >>> It's in the intel_device_info for a reason. I knew I should not have let >>> Michal turn this into a macro. >>> >> >> You mean setting has_logical_ring_preemption to zero directly? I thought >> the policy was to avoid setting things in device_info to values that >> don't reflect real HW capabilities and to do the hacks elsewhere. > > No, data driven code. intel_device_info was introduced to remove having > heavy predicates so that we could see what will be enabled and what not > in one place. > >>> I still do not see any reason why you don't just make the current >>> preemption work (it will) and then you can refine it if you prove it >>> worthwhile. >>> >> >> Just didn't see the worth of it ;). It's not a lot of code but it's in >> an hot path and we're most likely going to get rid of it soon as the new >> stuff is simpler. I'll put the change together and send it out so we can >> evaluate that and see what works better with code at hand. > > Is the new stuff going to be any simpler? You still need a preemption > point, so a special submission followed by detecting that in the CS > handler to do the unwind. > > And whilst I am here, els is awful. Either stick with elsp and note that > they changed the name (+layout) on icl, or replace it with a generic > name. Spelling it out completely as execlists->execlists_submission is > still better than els, but submit[_reg] (or submit_hw) would be clearer. > -Chris > The elsp still exists on gen11, with a slightly different behavior as noted in the commit message, that's why I wanted to change the name. execlists->submit_reg sounds good. Daniele
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 8333692..346209a 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -2741,8 +2741,13 @@ static inline unsigned int i915_sg_segment_size(void) #define HAS_LOGICAL_RING_CONTEXTS(dev_priv) \ ((dev_priv)->info.has_logical_ring_contexts) +#define HAS_LOGICAL_RING_ELSQ(dev_priv) \ + ((dev_priv)->info.has_logical_ring_elsq) + +/* XXX: Preemption disabled for ELSQ until support for new flow lands */ #define HAS_LOGICAL_RING_PREEMPTION(dev_priv) \ - ((dev_priv)->info.has_logical_ring_preemption) + ((dev_priv)->info.has_logical_ring_preemption && \ + !HAS_LOGICAL_RING_ELSQ(dev_priv)) #define HAS_EXECLISTS(dev_priv) HAS_LOGICAL_RING_CONTEXTS(dev_priv) diff --git a/drivers/gpu/drm/i915/i915_pci.c b/drivers/gpu/drm/i915/i915_pci.c index f28c165..6c86cba 100644 --- a/drivers/gpu/drm/i915/i915_pci.c +++ b/drivers/gpu/drm/i915/i915_pci.c @@ -583,7 +583,8 @@ GEN10_FEATURES, \ .gen = 11, \ .ddb_size = 2048, \ - .has_csr = 0 + .has_csr = 0, \ + .has_logical_ring_elsq = 1 static const struct intel_device_info intel_icelake_11_info __initconst = { GEN11_FEATURES, diff --git a/drivers/gpu/drm/i915/intel_device_info.h b/drivers/gpu/drm/i915/intel_device_info.h index 9542018..dbf0f2d 100644 --- a/drivers/gpu/drm/i915/intel_device_info.h +++ b/drivers/gpu/drm/i915/intel_device_info.h @@ -96,6 +96,7 @@ enum intel_platform { func(has_l3_dpf); \ func(has_llc); \ func(has_logical_ring_contexts); \ + func(has_logical_ring_elsq); \ func(has_logical_ring_preemption); \ func(has_overlay); \ func(has_pooled_eu); \ diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c index 22d471a..6c7cf93 100644 --- a/drivers/gpu/drm/i915/intel_lrc.c +++ b/drivers/gpu/drm/i915/intel_lrc.c @@ -428,11 +428,24 @@ static inline void elsp_write(u64 desc, u32 __iomem *elsp) writel(lower_32_bits(desc), elsp); } +static inline void elsqc_write(u64 desc, u32 __iomem *elsqc, u32 port) +{ + writel(lower_32_bits(desc), elsqc + port * 2); + writel(upper_32_bits(desc), elsqc + port * 2 + 1); +} + static void execlists_submit_ports(struct intel_engine_cs *engine) { + struct drm_i915_private *dev_priv = engine->i915; struct execlist_port *port = engine->execlists.port; unsigned int n; + /* + * ELSQ note: the submit queue is not cleared after being submitted + * to the HW so we need to make sure we always clean it up. This is + * currently ensured by the fact that we always write the same number + * of elsq entries, keep this in mind before changing the loop below. + */ for (n = execlists_num_ports(&engine->execlists); n--; ) { struct drm_i915_gem_request *rq; unsigned int count; @@ -456,8 +469,16 @@ static void execlists_submit_ports(struct intel_engine_cs *engine) desc = 0; } - elsp_write(desc, engine->execlists.elsp); + if (HAS_LOGICAL_RING_ELSQ(dev_priv)) + elsqc_write(desc, engine->execlists.els, n); + else + elsp_write(desc, engine->execlists.els); } + + /* we need to manually load the submit queue */ + if (HAS_LOGICAL_RING_ELSQ(dev_priv)) + I915_WRITE_FW(RING_EXECLIST_CONTROL(engine), EL_CTRL_LOAD); + execlists_clear_active(&engine->execlists, EXECLISTS_ACTIVE_HWACK); } @@ -506,9 +527,9 @@ static void inject_preempt_context(struct intel_engine_cs *engine) GEM_TRACE("%s\n", engine->name); for (n = execlists_num_ports(&engine->execlists); --n; ) - elsp_write(0, engine->execlists.elsp); + elsp_write(0, engine->execlists.els); - elsp_write(ce->lrc_desc, engine->execlists.elsp); + elsp_write(ce->lrc_desc, engine->execlists.els); execlists_clear_active(&engine->execlists, EXECLISTS_ACTIVE_HWACK); } @@ -2022,8 +2043,12 @@ static int logical_ring_init(struct intel_engine_cs *engine) if (ret) goto error; - engine->execlists.elsp = - engine->i915->regs + i915_mmio_reg_offset(RING_ELSP(engine)); + if (HAS_LOGICAL_RING_ELSQ(engine->i915)) + engine->execlists.els = engine->i915->regs + + i915_mmio_reg_offset(RING_EXECLIST_SQ_CONTENTS(engine)); + else + engine->execlists.els = engine->i915->regs + + i915_mmio_reg_offset(RING_ELSP(engine)); return 0; diff --git a/drivers/gpu/drm/i915/intel_lrc.h b/drivers/gpu/drm/i915/intel_lrc.h index 6d4f9b9..3ab4266 100644 --- a/drivers/gpu/drm/i915/intel_lrc.h +++ b/drivers/gpu/drm/i915/intel_lrc.h @@ -38,6 +38,9 @@ #define CTX_CTRL_ENGINE_CTX_RESTORE_INHIBIT (1 << 0) #define CTX_CTRL_RS_CTX_ENABLE (1 << 1) #define RING_CONTEXT_STATUS_BUF_BASE(engine) _MMIO((engine)->mmio_base + 0x370) +#define RING_EXECLIST_SQ_CONTENTS(engine) _MMIO((engine)->mmio_base + 0x510) +#define RING_EXECLIST_CONTROL(engine) _MMIO((engine)->mmio_base + 0x550) +#define EL_CTRL_LOAD (1 << 0) #define RING_CONTEXT_STATUS_BUF_LO(engine, i) _MMIO((engine)->mmio_base + 0x370 + (i) * 8) #define RING_CONTEXT_STATUS_BUF_HI(engine, i) _MMIO((engine)->mmio_base + 0x370 + (i) * 8 + 4) #define RING_CONTEXT_STATUS_PTR(engine) _MMIO((engine)->mmio_base + 0x3a0) diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h index c5ff203..d36bb73 100644 --- a/drivers/gpu/drm/i915/intel_ringbuffer.h +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h @@ -200,9 +200,11 @@ struct intel_engine_execlists { bool no_priolist; /** - * @elsp: the ExecList Submission Port register + * @els: gen-specific execlist submission register + * set to the ExecList Submission Port (elsp) register pre-Gen11 and to + * the ExecList Submission Queue Contents register array for Gen11+ */ - u32 __iomem *elsp; + u32 __iomem *els; /** * @port: execlist port states