diff mbox

[17/49] drm/i915/bdw: A bit more advanced context init/fini

Message ID 1395943218-7708-18-git-send-email-oscar.mateo@intel.com (mailing list archive)
State New, archived
Headers show

Commit Message

oscar.mateo@intel.com March 27, 2014, 5:59 p.m. UTC
From: Ben Widawsky <benjamin.widawsky@intel.com>

There are a few big differences between context init and fini with the
previous implementation of hardware contexts. One of them is
demonstrated in this patch: we must do a context initialization for
every ring.

The patch will still fail at context setup, and therefore won't break
existing code or platform support.

Regarding the context size, reading the register to calculate the sizes
can work, I think, however the docs are very clear about the actual
context sizes on GEN8, so just hardcode that and use it.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>

v2: Rebased on top of the Full PPGTT series. It is important to notice
that at this point we have one global default context per engine, all
of them using the aliasing PPGTT (as opposed to the single global
default context we have with legacy HW contexts).

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h         |  1 +
 drivers/gpu/drm/i915/i915_gem_context.c |  5 +++++
 drivers/gpu/drm/i915/i915_lrc.c         | 40 ++++++++++++++++++++++++++++++++-
 3 files changed, 45 insertions(+), 1 deletion(-)

Comments

Lespiau, Damien April 1, 2014, 12:38 a.m. UTC | #1
On Thu, Mar 27, 2014 at 05:59:46PM +0000, oscar.mateo@intel.com wrote:
> --- a/drivers/gpu/drm/i915/i915_lrc.c
> +++ b/drivers/gpu/drm/i915/i915_lrc.c
> @@ -41,7 +41,45 @@
>  #include <drm/i915_drm.h>
>  #include "i915_drv.h"
>  
> +#define GEN8_LR_CONTEXT_SIZE (21 * PAGE_SIZE)

I'm a bit puzzled by that number:
  - I found a sentence saying: "the Context Image for the rendering
    engine consists of 20 4K pages", which seems that it includes the
    HWS page (on the same page it says context layout = HWS Page +
    register state context).
  - When looking at the register state context for the render engine:
    18096 dwords -> 18 pages, so in total it'd be 19 pages (need to add
    the HWS Page)
  - Clearly I must be missing something :)
  - That's only for the render engine, other engines have a much smaller
    context, smaller enough that it's worth looking at their exact size.
  - It'd be nice to work out the real size from the *CXT_*SIZE
    registers.

All of this can be refinement patches on top I guess, might have a look
at it.
oscar.mateo@intel.com April 1, 2014, 1:47 p.m. UTC | #2
> > --- a/drivers/gpu/drm/i915/i915_lrc.c
> > +++ b/drivers/gpu/drm/i915/i915_lrc.c
> > @@ -41,7 +41,45 @@
> >  #include <drm/i915_drm.h>
> >  #include "i915_drv.h"
> >
> > +#define GEN8_LR_CONTEXT_SIZE (21 * PAGE_SIZE)
> 
> I'm a bit puzzled by that number:
>   - I found a sentence saying: "the Context Image for the rendering
>     engine consists of 20 4K pages", which seems that it includes the
>     HWS page (on the same page it says context layout = HWS Page +
>     register state context).
>   - When looking at the register state context for the render engine:
>     18096 dwords -> 18 pages, so in total it'd be 19 pages (need to add
>     the HWS Page)
>   - Clearly I must be missing something :)
>   - That's only for the render engine, other engines have a much smaller
>     context, smaller enough that it's worth looking at their exact size.
>   - It'd be nice to work out the real size from the *CXT_*SIZE
>     registers.

Hmmmm... I´ll try to get the real context sizes from the registers and compare. At least for RCS, VCS and BCS since there doesn´t seem to be a register for VECS?

-- Oscar
Lespiau, Damien April 1, 2014, 1:51 p.m. UTC | #3
On Tue, Apr 01, 2014 at 02:47:19PM +0100, Mateo Lozano, Oscar wrote:
> > > --- a/drivers/gpu/drm/i915/i915_lrc.c
> > > +++ b/drivers/gpu/drm/i915/i915_lrc.c
> > > @@ -41,7 +41,45 @@
> > >  #include <drm/i915_drm.h>
> > >  #include "i915_drv.h"
> > >
> > > +#define GEN8_LR_CONTEXT_SIZE (21 * PAGE_SIZE)
> > 
> > I'm a bit puzzled by that number:
> >   - I found a sentence saying: "the Context Image for the rendering
> >     engine consists of 20 4K pages", which seems that it includes the
> >     HWS page (on the same page it says context layout = HWS Page +
> >     register state context).
> >   - When looking at the register state context for the render engine:
> >     18096 dwords -> 18 pages, so in total it'd be 19 pages (need to add
> >     the HWS Page)
> >   - Clearly I must be missing something :)
> >   - That's only for the render engine, other engines have a much smaller
> >     context, smaller enough that it's worth looking at their exact size.
> >   - It'd be nice to work out the real size from the *CXT_*SIZE
> >     registers.
> 
> Hmmmm... I´ll try to get the real context sizes from the registers and
> compare. At least for RCS, VCS and BCS since there doesn´t seem to be
> a register for VECS?

Couldn't find it either. I guess we'll need to ask the help of a friend.
Or the 50/50 joker maybe.
Ben Widawsky April 1, 2014, 7:18 p.m. UTC | #4
On Tue, Apr 01, 2014 at 02:51:27PM +0100, Damien Lespiau wrote:
> On Tue, Apr 01, 2014 at 02:47:19PM +0100, Mateo Lozano, Oscar wrote:
> > > > --- a/drivers/gpu/drm/i915/i915_lrc.c
> > > > +++ b/drivers/gpu/drm/i915/i915_lrc.c
> > > > @@ -41,7 +41,45 @@
> > > >  #include <drm/i915_drm.h>
> > > >  #include "i915_drv.h"
> > > >
> > > > +#define GEN8_LR_CONTEXT_SIZE (21 * PAGE_SIZE)
> > > 
> > > I'm a bit puzzled by that number:
> > >   - I found a sentence saying: "the Context Image for the rendering
> > >     engine consists of 20 4K pages", which seems that it includes the
> > >     HWS page (on the same page it says context layout = HWS Page +
> > >     register state context).
> > >   - When looking at the register state context for the render engine:
> > >     18096 dwords -> 18 pages, so in total it'd be 19 pages (need to add
> > >     the HWS Page)
> > >   - Clearly I must be missing something :)
> > >   - That's only for the render engine, other engines have a much smaller
> > >     context, smaller enough that it's worth looking at their exact size.
> > >   - It'd be nice to work out the real size from the *CXT_*SIZE
> > >     registers.
> > 
> > Hmmmm... I´ll try to get the real context sizes from the registers and
> > compare. At least for RCS, VCS and BCS since there doesn´t seem to be
> > a register for VECS?
> 
> Couldn't find it either. I guess we'll need to ask the help of a friend.
> Or the 50/50 joker maybe.
> 
> -- 
> Damien

CXT_SIZE is total garbage on anything past Ivybridge. That's why we
don't use it for HSW either... I know, right? We should request the spec
get updated. I have no excuse for not requesting that sooner.
Lespiau, Damien April 1, 2014, 9:05 p.m. UTC | #5
On Tue, Apr 01, 2014 at 12:18:24PM -0700, Ben Widawsky wrote:
> On Tue, Apr 01, 2014 at 02:51:27PM +0100, Damien Lespiau wrote:
> > On Tue, Apr 01, 2014 at 02:47:19PM +0100, Mateo Lozano, Oscar wrote:
> > > > > --- a/drivers/gpu/drm/i915/i915_lrc.c
> > > > > +++ b/drivers/gpu/drm/i915/i915_lrc.c
> > > > > @@ -41,7 +41,45 @@
> > > > >  #include <drm/i915_drm.h>
> > > > >  #include "i915_drv.h"
> > > > >
> > > > > +#define GEN8_LR_CONTEXT_SIZE (21 * PAGE_SIZE)
> > > > 
> > > > I'm a bit puzzled by that number:
> > > >   - I found a sentence saying: "the Context Image for the rendering
> > > >     engine consists of 20 4K pages", which seems that it includes the
> > > >     HWS page (on the same page it says context layout = HWS Page +
> > > >     register state context).
> > > >   - When looking at the register state context for the render engine:
> > > >     18096 dwords -> 18 pages, so in total it'd be 19 pages (need to add
> > > >     the HWS Page)
> > > >   - Clearly I must be missing something :)
> > > >   - That's only for the render engine, other engines have a much smaller
> > > >     context, smaller enough that it's worth looking at their exact size.
> > > >   - It'd be nice to work out the real size from the *CXT_*SIZE
> > > >     registers.
> > > 
> > > Hmmmm... I´ll try to get the real context sizes from the registers and
> > > compare. At least for RCS, VCS and BCS since there doesn´t seem to be
> > > a register for VECS?
> > 
> > Couldn't find it either. I guess we'll need to ask the help of a friend.
> > Or the 50/50 joker maybe.
> > 
> > -- 
> > Damien
> 
> CXT_SIZE is total garbage on anything past Ivybridge. That's why we
> don't use it for HSW either... I know, right? We should request the spec
> get updated. I have no excuse for not requesting that sooner.

(talking about BDW only)

For the render ring:

HWSP: 4KB
Ring context: CTX_SIZE[26:24] 5 cache lines -> offsets (in DW) 0x0 to 0x4f (= 5 * 64 / 4)
Render context: CTX_SIZE[23:16] -> 0x65 caches lines -> offets (in DW) 0x50 to 0x69f (= 0x50 + 0x65 * 64 / 4 - 1)
VF/VFE context CTX_SIZE[7:0] -> 0x82 cache lines -> offsets (in DW) 0x6A0 to 0xebf (= 0x6a0 + 0x82*64/4 - 1)
Atomic storage is the max that you can allocate, 32KB ie 8192 DWords

So we're almost there. What's missing here is the RS context size, couldn't find
it in the spec :/ Maybe because that is a "well known" value.

Note that I don't actually know what we read back from hw.

Considering that the BCS context size seems to be 2 pages, I think it's worth
digging a bit more to save ~66KB per BCS context (for instance). Even if we
have to hardcode the different context sizes.
Ben Widawsky April 2, 2014, 4:07 a.m. UTC | #6
On Tue, Apr 01, 2014 at 10:05:12PM +0100, Damien Lespiau wrote:
> On Tue, Apr 01, 2014 at 12:18:24PM -0700, Ben Widawsky wrote:
> > On Tue, Apr 01, 2014 at 02:51:27PM +0100, Damien Lespiau wrote:
> > > On Tue, Apr 01, 2014 at 02:47:19PM +0100, Mateo Lozano, Oscar wrote:
> > > > > > --- a/drivers/gpu/drm/i915/i915_lrc.c
> > > > > > +++ b/drivers/gpu/drm/i915/i915_lrc.c
> > > > > > @@ -41,7 +41,45 @@
> > > > > >  #include <drm/i915_drm.h>
> > > > > >  #include "i915_drv.h"
> > > > > >
> > > > > > +#define GEN8_LR_CONTEXT_SIZE (21 * PAGE_SIZE)
> > > > > 
> > > > > I'm a bit puzzled by that number:
> > > > >   - I found a sentence saying: "the Context Image for the rendering
> > > > >     engine consists of 20 4K pages", which seems that it includes the
> > > > >     HWS page (on the same page it says context layout = HWS Page +
> > > > >     register state context).
> > > > >   - When looking at the register state context for the render engine:
> > > > >     18096 dwords -> 18 pages, so in total it'd be 19 pages (need to add
> > > > >     the HWS Page)
> > > > >   - Clearly I must be missing something :)
> > > > >   - That's only for the render engine, other engines have a much smaller
> > > > >     context, smaller enough that it's worth looking at their exact size.
> > > > >   - It'd be nice to work out the real size from the *CXT_*SIZE
> > > > >     registers.
> > > > 
> > > > Hmmmm... I´ll try to get the real context sizes from the registers and
> > > > compare. At least for RCS, VCS and BCS since there doesn´t seem to be
> > > > a register for VECS?
> > > 
> > > Couldn't find it either. I guess we'll need to ask the help of a friend.
> > > Or the 50/50 joker maybe.
> > > 
> > > -- 
> > > Damien
> > 
> > CXT_SIZE is total garbage on anything past Ivybridge. That's why we
> > don't use it for HSW either... I know, right? We should request the spec
> > get updated. I have no excuse for not requesting that sooner.
> 
> (talking about BDW only)
> 
> For the render ring:
> 
> HWSP: 4KB
> Ring context: CTX_SIZE[26:24] 5 cache lines -> offsets (in DW) 0x0 to 0x4f (= 5 * 64 / 4)
> Render context: CTX_SIZE[23:16] -> 0x65 caches lines -> offets (in DW) 0x50 to 0x69f (= 0x50 + 0x65 * 64 / 4 - 1)
> VF/VFE context CTX_SIZE[7:0] -> 0x82 cache lines -> offsets (in DW) 0x6A0 to 0xebf (= 0x6a0 + 0x82*64/4 - 1)
> Atomic storage is the max that you can allocate, 32KB ie 8192 DWords
> 
> So we're almost there. What's missing here is the RS context size, couldn't find
> it in the spec :/ Maybe because that is a "well known" value.
> 
> Note that I don't actually know what we read back from hw.
> 
> Considering that the BCS context size seems to be 2 pages, I think it's worth
> digging a bit more to save ~66KB per BCS context (for instance). Even if we
> have to hardcode the different context sizes.
> 
> -- 
> Damien

I guess I should have checked first. Looks like there are actually quite
a few changes since I wrote the code originally.

Carry on.
diff mbox

Patch

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 264ea67..ff6a33c 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2316,6 +2316,7 @@  int i915_gem_context_destroy_ioctl(struct drm_device *dev, void *data,
 
 /* i915_lrc.c */
 int gen8_gem_context_init(struct drm_device *dev);
+void gen8_gem_context_fini(struct drm_device *dev);
 
 /* i915_gem_evict.c */
 int __must_check i915_gem_evict_something(struct drm_device *dev,
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index e92b9c5..4a6f1b0 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -440,6 +440,11 @@  void i915_gem_context_fini(struct drm_device *dev)
 	 * other code, leading to spurious errors. */
 	intel_gpu_reset(dev);
 
+	if (dev_priv->lrc_enabled) {
+		gen8_gem_context_fini(dev);
+		return;
+	}
+
 	/* When default context is created and switched to, base object refcount
 	 * will be 2 (+1 from object creation and +1 from do_switch()).
 	 * i915_gem_context_fini() will be called after gpu_idle() has switched
diff --git a/drivers/gpu/drm/i915/i915_lrc.c b/drivers/gpu/drm/i915/i915_lrc.c
index 3a93e99..10e6dbc 100644
--- a/drivers/gpu/drm/i915/i915_lrc.c
+++ b/drivers/gpu/drm/i915/i915_lrc.c
@@ -41,7 +41,45 @@ 
 #include <drm/i915_drm.h>
 #include "i915_drv.h"
 
+#define GEN8_LR_CONTEXT_SIZE (21 * PAGE_SIZE)
+
+void gen8_gem_context_fini(struct drm_device *dev)
+{
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct intel_engine *ring;
+	int unused;
+
+	for_each_ring(ring, dev_priv, unused) {
+		if (ring->default_context) {
+			i915_gem_object_ggtt_unpin(ring->default_context->obj);
+			i915_gem_context_unreference(ring->default_context);
+			ring->default_context = NULL;
+		}
+	}
+
+	dev_priv->mm.aliasing_ppgtt = NULL;
+}
+
 int gen8_gem_context_init(struct drm_device *dev)
 {
-	return -ENOSYS;
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct intel_engine *ring;
+	int ret = -ENOSYS, ring_id;
+
+	dev_priv->hw_context_size = round_up(GEN8_LR_CONTEXT_SIZE, 4096);
+
+	for_each_ring(ring, dev_priv, ring_id) {
+		ring->default_context = i915_gem_create_context(dev,
+						NULL, (ring_id == RCS));
+		if (IS_ERR_OR_NULL(ring->default_context)) {
+			ret = PTR_ERR(ring->default_context);
+			DRM_DEBUG_DRIVER("Create ctx failed: %d\n", ret);
+			ring->default_context = NULL;
+			goto err_out;
+		}
+	}
+
+err_out:
+	gen8_gem_context_fini(dev);
+	return ret;
 }