[v5,3/4] drm/i915/bdw: Pin the context backing objects to GGTT on-demand

Message ID 1415874490-386-1-git-send-email-thomas.daniel@intel.com (mailing list archive)
State New, archived

Commit Message

Thomas Daniel Nov. 13, 2014, 10:28 a.m. UTC
From: Oscar Mateo <oscar.mateo@intel.com>

Up until now, we have pinned every logical ring context backing object
during creation, and left it pinned until destruction. This made my life
easier, but it's a harmful thing to do, because we cause fragmentation
of the GGTT (and, eventually, we would run out of space).

This patch makes the pinning on-demand: the backing objects of the two
contexts that are written to the ELSP are pinned right before submission
and unpinned once the hardware is done with them. The only context that
is still pinned regardless is the global default one, so that the HWS can
still be accessed in the same way (ring->status_page).
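
A minimal userspace sketch of the on-demand scheme described above, assuming
a counter that pins on the first outstanding request and unpins when the last
one retires. The names backing_obj, ctx_state, obj_pin, obj_unpin, ctx_pin and
ctx_unpin are hypothetical stand-ins, not i915 symbols, and the real pin
operation can fail where this one cannot:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a context backing object; not the i915 type. */
struct backing_obj {
	bool pinned;     /* true while the object occupies GGTT space */
	int pin_calls;   /* how many times the expensive pin really ran */
};

/* Hypothetical per-engine context state carrying the extra counter. */
struct ctx_state {
	struct backing_obj obj;
	int unpin_count; /* outstanding requests that still need the pin */
};

/* Stand-in for the real pin operation; always succeeds in this sketch. */
static int obj_pin(struct backing_obj *obj)
{
	obj->pinned = true;
	obj->pin_calls++;
	return 0;
}

static void obj_unpin(struct backing_obj *obj)
{
	obj->pinned = false;
}

/* Pin on the 0 -> 1 transition only; later requests just bump the count. */
static int ctx_pin(struct ctx_state *ctx)
{
	int ret = 0;

	if (ctx->unpin_count++ == 0) {
		ret = obj_pin(&ctx->obj);
		if (ret)
			ctx->unpin_count = 0; /* roll back if the pin failed */
	}
	return ret;
}

/* Unpin on the 1 -> 0 transition, i.e. when the last request retires. */
static void ctx_unpin(struct ctx_state *ctx)
{
	if (--ctx->unpin_count == 0)
		obj_unpin(&ctx->obj);
}
```

With two requests outstanding on the same context, the backing object is
pinned once and stays pinned until the second ctx_unpin() call.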

v2: In the early version of this patch, we were pinning the context as
we put it into the ELSP: on the one hand, this is very efficient because
at most two contexts are pinned at any given time, but on the other
hand, we cannot really pin in interrupt context :(

v3: Use a mutex rather than atomic_t to protect pin count to avoid races.
Do not unpin default context in free_request.

v4: Break out pin and unpin into functions.  Fix style problems reported
by checkpatch.

v5: Remove unpin_lock as all pinning and unpinning is done with the struct
mutex already locked.  Add WARN_ONs to make sure this is the case in future.

Issue: VIZ-4277
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Signed-off-by: Thomas Daniel <thomas.daniel@intel.com>
---
 drivers/gpu/drm/i915/i915_debugfs.c |   12 +++++-
 drivers/gpu/drm/i915/i915_drv.h     |    1 +
 drivers/gpu/drm/i915/i915_gem.c     |   39 +++++++++++++-------
 drivers/gpu/drm/i915/intel_lrc.c    |   69 +++++++++++++++++++++++++++++------
 drivers/gpu/drm/i915/intel_lrc.h    |    4 ++
 5 files changed, 98 insertions(+), 27 deletions(-)

Comments

Daniel Vetter Nov. 17, 2014, 2:23 p.m. UTC | #1
On Tue, Nov 18, 2014 at 12:10:51PM +0530, Deepak S wrote:
> On Thursday 13 November 2014 03:58 PM, Thomas Daniel wrote:
> >diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> >index 906b985..f7fa0f7 100644
> >--- a/drivers/gpu/drm/i915/intel_lrc.c
> >+++ b/drivers/gpu/drm/i915/intel_lrc.c
> >@@ -139,8 +139,6 @@
> >  #define GEN8_LR_CONTEXT_RENDER_SIZE (20 * PAGE_SIZE)
> >  #define GEN8_LR_CONTEXT_OTHER_SIZE (2 * PAGE_SIZE)
> >-#define GEN8_LR_CONTEXT_ALIGN 4096
> >-
> >  #define RING_EXECLIST_QFULL		(1 << 0x2)
> >  #define RING_EXECLIST1_VALID		(1 << 0x3)
> >  #define RING_EXECLIST0_VALID		(1 << 0x4)
> >@@ -801,9 +799,40 @@ void intel_logical_ring_advance_and_submit(struct intel_ringbuffer *ringbuf)
> >  	execlists_context_queue(ring, ctx, ringbuf->tail);
> >  }
> >+static int intel_lr_context_pin(struct intel_engine_cs *ring,
> >+		struct intel_context *ctx)
> >+{
> >+	struct drm_i915_gem_object *ctx_obj = ctx->engine[ring->id].state;
> >+	int ret = 0;
> >+
> >+	WARN_ON(!mutex_is_locked(&ring->dev->struct_mutex));
> 
> With pin specific mutex from previous patch set removed.

Pardon my ignorance, but I'm completely lost on this review comment.
Deepak, can you please elaborate on which lock, in which exact version
of the previous patch, you mean? I didn't find any locking at all in the
preceding patch here ...

Thanks, Daniel
Akash Goel Nov. 17, 2014, 2:38 p.m. UTC | #2
Reviewed the patch & it looks fine.
Reviewed-by: "Akash Goel <akash.goels@gmail.com>"

On Thu, Nov 13, 2014 at 3:58 PM, Thomas Daniel <thomas.daniel@intel.com>
wrote:

> From: Oscar Mateo <oscar.mateo@intel.com>
>
> Up until now, we have pinned every logical ring context backing object
> during creation, and left it pinned until destruction. This made my life
> easier, but it's a harmful thing to do, because we cause fragmentation
> of the GGTT (and, eventually, we would run out of space).
>
> This patch makes the pinning on-demand: the backing objects of the two
> contexts that are written to the ELSP are pinned right before submission
> and unpinned once the hardware is done with them. The only context that
> is still pinned regardless is the global default one, so that the HWS can
> still be accessed in the same way (ring->status_page).
>
> v2: In the early version of this patch, we were pinning the context as
> we put it into the ELSP: on the one hand, this is very efficient because
> only a maximum two contexts are pinned at any given time, but on the other
> hand, we cannot really pin in interrupt time :(
>
> v3: Use a mutex rather than atomic_t to protect pin count to avoid races.
> Do not unpin default context in free_request.
>
> v4: Break out pin and unpin into functions.  Fix style problems reported
> by checkpatch
>
> v5: Remove unpin_lock as all pinning and unpinning is done with the struct
> mutex already locked.  Add WARN_ONs to make sure this is the case in
> future.
>
> Issue: VIZ-4277
> Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
> Signed-off-by: Thomas Daniel <thomas.daniel@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_debugfs.c |   12 +++++-
>  drivers/gpu/drm/i915/i915_drv.h     |    1 +
>  drivers/gpu/drm/i915/i915_gem.c     |   39 +++++++++++++-------
>  drivers/gpu/drm/i915/intel_lrc.c    |   69 +++++++++++++++++++++++++++++------
>  drivers/gpu/drm/i915/intel_lrc.h    |    4 ++
>  5 files changed, 98 insertions(+), 27 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> index e60d5c2..6eaf813 100644
> --- a/drivers/gpu/drm/i915/i915_debugfs.c
> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> @@ -1799,10 +1799,16 @@ static int i915_dump_lrc(struct seq_file *m, void *unused)
>                                 continue;
>
>                         if (ctx_obj) {
> -                               struct page *page = i915_gem_object_get_page(ctx_obj, 1);
> -                               uint32_t *reg_state = kmap_atomic(page);
> +                               struct page *page;
> +                               uint32_t *reg_state;
>                                 int j;
>
> +                               i915_gem_obj_ggtt_pin(ctx_obj,
> +                                               GEN8_LR_CONTEXT_ALIGN, 0);
> +
> +                               page = i915_gem_object_get_page(ctx_obj, 1);
> +                               reg_state = kmap_atomic(page);
> +
>                                 seq_printf(m, "CONTEXT: %s %u\n", ring->name,
>                                                 intel_execlists_ctx_id(ctx_obj));
>
> @@ -1814,6 +1820,8 @@ static int i915_dump_lrc(struct seq_file *m, void *unused)
>                                 }
>                                 kunmap_atomic(reg_state);
>
> +                               i915_gem_object_ggtt_unpin(ctx_obj);
> +
>                                 seq_putc(m, '\n');
>                         }
>                 }
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 059330c..3c7299d 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -655,6 +655,7 @@ struct intel_context {
>         struct {
>                 struct drm_i915_gem_object *state;
>                 struct intel_ringbuffer *ringbuf;
> +               int unpin_count;
>         } engine[I915_NUM_RINGS];
>
>         struct list_head link;
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 408afe7..2ee6996 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -2494,12 +2494,18 @@ static void i915_set_reset_status(struct drm_i915_private *dev_priv,
>
>  static void i915_gem_free_request(struct drm_i915_gem_request *request)
>  {
> +       struct intel_context *ctx = request->ctx;
> +
>         list_del(&request->list);
>         i915_gem_request_remove_from_client(request);
>
> -       if (request->ctx)
> -               i915_gem_context_unreference(request->ctx);
> +       if (i915.enable_execlists && ctx) {
> +               struct intel_engine_cs *ring = request->ring;
>
> +               if (ctx != ring->default_context)
> +                       intel_lr_context_unpin(ring, ctx);
> +               i915_gem_context_unreference(ctx);
> +       }
>         kfree(request);
>  }
>
> @@ -2554,6 +2560,23 @@ static void i915_gem_reset_ring_cleanup(struct drm_i915_private *dev_priv,
>         }
>
>         /*
> +        * Clear the execlists queue up before freeing the requests, as those
> +        * are the ones that keep the context and ringbuffer backing objects
> +        * pinned in place.
> +        */
> +       while (!list_empty(&ring->execlist_queue)) {
> +               struct intel_ctx_submit_request *submit_req;
> +
> +               submit_req = list_first_entry(&ring->execlist_queue,
> +                               struct intel_ctx_submit_request,
> +                               execlist_link);
> +               list_del(&submit_req->execlist_link);
> +               intel_runtime_pm_put(dev_priv);
> +               i915_gem_context_unreference(submit_req->ctx);
> +               kfree(submit_req);
> +       }
> +
> +       /*
>          * We must free the requests after all the corresponding objects have
>          * been moved off active lists. Which is the same order as the normal
>          * retire_requests function does. This is important if object hold
> @@ -2570,18 +2593,6 @@ static void i915_gem_reset_ring_cleanup(struct drm_i915_private *dev_priv,
>                 i915_gem_free_request(request);
>         }
>
> -       while (!list_empty(&ring->execlist_queue)) {
> -               struct intel_ctx_submit_request *submit_req;
> -
> -               submit_req = list_first_entry(&ring->execlist_queue,
> -                               struct intel_ctx_submit_request,
> -                               execlist_link);
> -               list_del(&submit_req->execlist_link);
> -               intel_runtime_pm_put(dev_priv);
> -               i915_gem_context_unreference(submit_req->ctx);
> -               kfree(submit_req);
> -       }
> -
>         /* These may not have been flush before the reset, do so now */
>         kfree(ring->preallocated_lazy_request);
>         ring->preallocated_lazy_request = NULL;
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 906b985..f7fa0f7 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -139,8 +139,6 @@
>  #define GEN8_LR_CONTEXT_RENDER_SIZE (20 * PAGE_SIZE)
>  #define GEN8_LR_CONTEXT_OTHER_SIZE (2 * PAGE_SIZE)
>
> -#define GEN8_LR_CONTEXT_ALIGN 4096
> -
>  #define RING_EXECLIST_QFULL            (1 << 0x2)
>  #define RING_EXECLIST1_VALID           (1 << 0x3)
>  #define RING_EXECLIST0_VALID           (1 << 0x4)
> @@ -801,9 +799,40 @@ void intel_logical_ring_advance_and_submit(struct intel_ringbuffer *ringbuf)
>         execlists_context_queue(ring, ctx, ringbuf->tail);
>  }
>
> +static int intel_lr_context_pin(struct intel_engine_cs *ring,
> +               struct intel_context *ctx)
> +{
> +       struct drm_i915_gem_object *ctx_obj = ctx->engine[ring->id].state;
> +       int ret = 0;
> +
> +       WARN_ON(!mutex_is_locked(&ring->dev->struct_mutex));
> +       if (ctx->engine[ring->id].unpin_count++ == 0) {
> +               ret = i915_gem_obj_ggtt_pin(ctx_obj,
> +                               GEN8_LR_CONTEXT_ALIGN, 0);
> +               if (ret)
> +                       ctx->engine[ring->id].unpin_count = 0;
> +       }
> +
> +       return ret;
> +}
> +
> +void intel_lr_context_unpin(struct intel_engine_cs *ring,
> +               struct intel_context *ctx)
> +{
> +       struct drm_i915_gem_object *ctx_obj = ctx->engine[ring->id].state;
> +
> +       if (ctx_obj) {
> +               WARN_ON(!mutex_is_locked(&ring->dev->struct_mutex));
> +               if (--ctx->engine[ring->id].unpin_count == 0)
> +                       i915_gem_object_ggtt_unpin(ctx_obj);
> +       }
> +}
> +
>  static int logical_ring_alloc_seqno(struct intel_engine_cs *ring,
>                                     struct intel_context *ctx)
>  {
> +       int ret;
> +
>         if (ring->outstanding_lazy_seqno)
>                 return 0;
>
> @@ -814,6 +843,14 @@ static int logical_ring_alloc_seqno(struct intel_engine_cs *ring,
>                 if (request == NULL)
>                         return -ENOMEM;
>
> +               if (ctx != ring->default_context) {
> +                       ret = intel_lr_context_pin(ring, ctx);
> +                       if (ret) {
> +                               kfree(request);
> +                               return ret;
> +                       }
> +               }
> +
>                 /* Hold a reference to the context this request belongs to
>                  * (we will need it when the time comes to emit/retire the
>                  * request).
> @@ -1626,12 +1663,16 @@ void intel_lr_context_free(struct intel_context *ctx)
>
>         for (i = 0; i < I915_NUM_RINGS; i++) {
>                 struct drm_i915_gem_object *ctx_obj = ctx->engine[i].state;
> -               struct intel_ringbuffer *ringbuf = ctx->engine[i].ringbuf;
>
>                 if (ctx_obj) {
> +                       struct intel_ringbuffer *ringbuf =
> +                                       ctx->engine[i].ringbuf;
> +                       struct intel_engine_cs *ring = ringbuf->ring;
> +
>                         intel_destroy_ringbuffer_obj(ringbuf);
>                         kfree(ringbuf);
> -                       i915_gem_object_ggtt_unpin(ctx_obj);
> +                       if (ctx == ring->default_context)
> +                               i915_gem_object_ggtt_unpin(ctx_obj);
>                         drm_gem_object_unreference(&ctx_obj->base);
>                 }
>         }
> @@ -1695,6 +1736,7 @@ static int lrc_setup_hardware_status_page(struct intel_engine_cs *ring,
>  int intel_lr_context_deferred_create(struct intel_context *ctx,
>                                      struct intel_engine_cs *ring)
>  {
> +       const bool is_global_default_ctx = (ctx == ring->default_context);
>         struct drm_device *dev = ring->dev;
>         struct drm_i915_gem_object *ctx_obj;
>         uint32_t context_size;
> @@ -1714,18 +1756,22 @@ int intel_lr_context_deferred_create(struct intel_context *ctx,
>                 return ret;
>         }
>
> -       ret = i915_gem_obj_ggtt_pin(ctx_obj, GEN8_LR_CONTEXT_ALIGN, 0);
> -       if (ret) {
> -               DRM_DEBUG_DRIVER("Pin LRC backing obj failed: %d\n", ret);
> -               drm_gem_object_unreference(&ctx_obj->base);
> -               return ret;
> +       if (is_global_default_ctx) {
> +               ret = i915_gem_obj_ggtt_pin(ctx_obj, GEN8_LR_CONTEXT_ALIGN, 0);
> +               if (ret) {
> +                       DRM_DEBUG_DRIVER("Pin LRC backing obj failed: %d\n",
> +                                       ret);
> +                       drm_gem_object_unreference(&ctx_obj->base);
> +                       return ret;
> +               }
>         }
>
>         ringbuf = kzalloc(sizeof(*ringbuf), GFP_KERNEL);
>         if (!ringbuf) {
>                 DRM_DEBUG_DRIVER("Failed to allocate ringbuffer %s\n",
>                                 ring->name);
> -               i915_gem_object_ggtt_unpin(ctx_obj);
> +               if (is_global_default_ctx)
> +                       i915_gem_object_ggtt_unpin(ctx_obj);
>                 drm_gem_object_unreference(&ctx_obj->base);
>                 ret = -ENOMEM;
>                 return ret;
> @@ -1787,7 +1833,8 @@ int intel_lr_context_deferred_create(struct intel_context *ctx,
>
>  error:
>         kfree(ringbuf);
> -       i915_gem_object_ggtt_unpin(ctx_obj);
> +       if (is_global_default_ctx)
> +               i915_gem_object_ggtt_unpin(ctx_obj);
>         drm_gem_object_unreference(&ctx_obj->base);
>         return ret;
>  }
> diff --git a/drivers/gpu/drm/i915/intel_lrc.h b/drivers/gpu/drm/i915/intel_lrc.h
> index 84bbf19..14b216b 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.h
> +++ b/drivers/gpu/drm/i915/intel_lrc.h
> @@ -24,6 +24,8 @@
>  #ifndef _INTEL_LRC_H_
>  #define _INTEL_LRC_H_
>
> +#define GEN8_LR_CONTEXT_ALIGN 4096
> +
>  /* Execlists regs */
>  #define RING_ELSP(ring)                        ((ring)->mmio_base+0x230)
>  #define RING_EXECLIST_STATUS(ring)     ((ring)->mmio_base+0x234)
> @@ -67,6 +69,8 @@ int intel_lr_context_render_state_init(struct intel_engine_cs *ring,
>  void intel_lr_context_free(struct intel_context *ctx);
>  int intel_lr_context_deferred_create(struct intel_context *ctx,
>                                      struct intel_engine_cs *ring);
> +void intel_lr_context_unpin(struct intel_engine_cs *ring,
> +               struct intel_context *ctx);
>
>  /* Execlists */
>  int intel_sanitize_enable_execlists(struct drm_device *dev, int enable_execlists);
> --
> 1.7.9.5
>
Thomas Daniel Nov. 17, 2014, 2:55 p.m. UTC | #3
Here is the actual review...

_____________________________________________
From: Daniel, Thomas 

Sent: Wednesday, November 12, 2014 8:52 PM
To: Goel, Akash
Subject: RE: Execlists patches code review


Hi Akash,

I will put the WARN messages back in and remove the need_unpin.
The elsp_submitted count does not behave exactly as you would expect because of some race condition.
Have a look at the patch “Avoid non-lite-restore preemptions” by Oscar Mateo for a description.

Thanks,
Thomas.
_____________________________________________
From: Goel, Akash 

Sent: Tuesday, November 11, 2014 4:37 PM
To: Daniel, Thomas
Subject: RE: Execlists patches code review


Hi Thomas,

A few comments on http://patchwork.freedesktop.org/patch/35830/

	int elsp_submitted;
+	bool need_unpin;

This new field has not been used anywhere.


		if (intel_execlists_ctx_id(ctx_obj) == request_id) {
-			WARN(head_req->elsp_submitted == 0,
-			     "Never submitted head request\n");

Sorry, I couldn’t follow this change. Even if a request has been merged, the elsp_submitted count should still not be 0 here when this function runs on arrival of the context switch interrupt. When a new request is merged with a previously submitted request, the original value of elsp_submitted is retained.
 
+			/* If the request has been merged, it is possible to get
+			 * here with an unsubmitted request. */
 			if (--head_req->elsp_submitted <= 0) {




		if (status & GEN8_CTX_STATUS_PREEMPTED) {
 			if (status & GEN8_CTX_STATUS_LITE_RESTORE) {
-				if (execlists_check_remove_request(ring, status_id))
-					WARN(1, "Lite Restored request removed from queue\n");
+				execlists_check_remove_request(ring, status_id);

Same doubt here. I thought that in this case, an interrupt due to preemption (lite restore), which occurs when the same context is submitted as the one already being executed by the hardware, the count would not drop to 0. The count will drop to 0 when the context switch interrupt is generated subsequently.

Best regards
Akash
_____________________________________________
From: Goel, Akash 

Sent: Tuesday, November 11, 2014 8:58 PM
To: Daniel, Thomas
Subject: RE: Execlists patches code review


Hi Thomas, 

I was OOP today, I will provide this review comment tomorrow on the GFX mailing list.

Best regards
Akash
_____________________________________________
From: Daniel, Thomas 

Sent: Monday, November 10, 2014 10:41 PM
To: Goel, Akash
Subject: RE: Execlists patches code review


Hi Akash,

Please post this comment to the mailing list.
Assuming nobody else comments I will remove the unpin_lock and replace the mutex_lock(&unpin_lock) with WARN_ON(!mutex_is_locked(&dev->struct_mutex)).

Cheers,
Thomas.

_____________________________________________
From: Goel, Akash 

Sent: Monday, November 10, 2014 11:19 AM
To: Daniel, Thomas
Subject: RE: Execlists patches code review


In the context of the 3rd patch, http://patchwork.freedesktop.org/patch/35829/:
intel_lr_context_pin is called from logical_ring_alloc_seqno, and intel_lr_context_unpin is called from i915_gem_free_request and i915_gem_reset_ring_cleanup.

All three of these paths are already protected by dev->struct_mutex (the global lock), so they will always execute sequentially with respect to each other.

Do we need a new lock?
+		struct mutex unpin_lock;

Best regards
Akash
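
The outcome of this exchange (drop the dedicated unpin_lock and instead assert
that the caller already holds the global lock) can be sketched in userspace C
as follows. This is only an illustration under stated assumptions: big_lock,
WARN_ON_SKETCH, big_lock_acquire, big_lock_release and counter_inc are
hypothetical names, and the lock is a single-threaded flag rather than a real
mutex. Note that the kernel's mutex_is_locked() only reports that somebody
holds the mutex, not that the current caller does, so a check of this kind
catches missing locking but does not prove ownership.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical single-threaded stand-in for dev->struct_mutex. */
struct big_lock {
	bool held;
};

static int warnings; /* number of times the check below fired */

/* Userspace imitation of WARN_ON(): report and count, then carry on. */
#define WARN_ON_SKETCH(cond)                                  \
	do {                                                  \
		if (cond) {                                   \
			warnings++;                           \
			fprintf(stderr, "WARN: %s\n", #cond); \
		}                                             \
	} while (0)

static void big_lock_acquire(struct big_lock *l)
{
	l->held = true;
}

static void big_lock_release(struct big_lock *l)
{
	l->held = false;
}

/*
 * Takes no lock of its own. It relies on the caller holding the big
 * lock and only complains if that assumption is violated.
 */
static void counter_inc(struct big_lock *l, int *count)
{
	WARN_ON_SKETCH(!l->held);
	(*count)++;
}
```

Calling counter_inc() between big_lock_acquire() and big_lock_release() is
silent; calling it without the lock still updates the counter but records a
warning, which mirrors how WARN_ON flags a problem without stopping execution.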
Daniel Vetter Nov. 17, 2014, 6:09 p.m. UTC | #4
On Thu, Nov 13, 2014 at 10:28:10AM +0000, Thomas Daniel wrote:
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 059330c..3c7299d 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -655,6 +655,7 @@ struct intel_context {
>  	struct {
>  		struct drm_i915_gem_object *state;
>  		struct intel_ringbuffer *ringbuf;
> +		int unpin_count;

Pinning is already refcounted. Why this additional refcount?

And yes I've only realized this now that you've supplied the review
comments from Akash. I really rely upon the review discussions to spot
such low-level implementation details.
-Daniel
deepak.s@intel.com Nov. 18, 2014, 6:40 a.m. UTC | #5
On Thursday 13 November 2014 03:58 PM, Thomas Daniel wrote:
> From: Oscar Mateo <oscar.mateo@intel.com>
>
> Up until now, we have pinned every logical ring context backing object
> during creation, and left it pinned until destruction. This made my life
> easier, but it's a harmful thing to do, because we cause fragmentation
> of the GGTT (and, eventually, we would run out of space).
>
> This patch makes the pinning on-demand: the backing objects of the two
> contexts that are written to the ELSP are pinned right before submission
> and unpinned once the hardware is done with them. The only context that
> is still pinned regardless is the global default one, so that the HWS can
> still be accessed in the same way (ring->status_page).
>
> v2: In the early version of this patch, we were pinning the context as
> we put it into the ELSP: on the one hand, this is very efficient because
> only a maximum two contexts are pinned at any given time, but on the other
> hand, we cannot really pin in interrupt time :(
>
> v3: Use a mutex rather than atomic_t to protect pin count to avoid races.
> Do not unpin default context in free_request.
>
> v4: Break out pin and unpin into functions.  Fix style problems reported
> by checkpatch
>
> v5: Remove unpin_lock as all pinning and unpinning is done with the struct
> mutex already locked.  Add WARN_ONs to make sure this is the case in future.
>
> Issue: VIZ-4277
> Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
> Signed-off-by: Thomas Daniel <thomas.daniel@intel.com>
> ---
>   drivers/gpu/drm/i915/i915_debugfs.c |   12 +++++-
>   drivers/gpu/drm/i915/i915_drv.h     |    1 +
>   drivers/gpu/drm/i915/i915_gem.c     |   39 +++++++++++++-------
>   drivers/gpu/drm/i915/intel_lrc.c    |   69 +++++++++++++++++++++++++++++------
>   drivers/gpu/drm/i915/intel_lrc.h    |    4 ++
>   5 files changed, 98 insertions(+), 27 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> index e60d5c2..6eaf813 100644
> --- a/drivers/gpu/drm/i915/i915_debugfs.c
> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> @@ -1799,10 +1799,16 @@ static int i915_dump_lrc(struct seq_file *m, void *unused)
>   				continue;
>   
>   			if (ctx_obj) {
> -				struct page *page = i915_gem_object_get_page(ctx_obj, 1);
> -				uint32_t *reg_state = kmap_atomic(page);
> +				struct page *page;
> +				uint32_t *reg_state;
>   				int j;
>   
> +				i915_gem_obj_ggtt_pin(ctx_obj,
> +						GEN8_LR_CONTEXT_ALIGN, 0);
> +
> +				page = i915_gem_object_get_page(ctx_obj, 1);
> +				reg_state = kmap_atomic(page);
> +
>   				seq_printf(m, "CONTEXT: %s %u\n", ring->name,
>   						intel_execlists_ctx_id(ctx_obj));
>   
> @@ -1814,6 +1820,8 @@ static int i915_dump_lrc(struct seq_file *m, void *unused)
>   				}
>   				kunmap_atomic(reg_state);
>   
> +				i915_gem_object_ggtt_unpin(ctx_obj);
> +
>   				seq_putc(m, '\n');
>   			}
>   		}
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 059330c..3c7299d 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -655,6 +655,7 @@ struct intel_context {
>   	struct {
>   		struct drm_i915_gem_object *state;
>   		struct intel_ringbuffer *ringbuf;
> +		int unpin_count;
>   	} engine[I915_NUM_RINGS];
>   
>   	struct list_head link;
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 408afe7..2ee6996 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -2494,12 +2494,18 @@ static void i915_set_reset_status(struct drm_i915_private *dev_priv,
>   
>   static void i915_gem_free_request(struct drm_i915_gem_request *request)
>   {
> +	struct intel_context *ctx = request->ctx;
> +
>   	list_del(&request->list);
>   	i915_gem_request_remove_from_client(request);
>   
> -	if (request->ctx)
> -		i915_gem_context_unreference(request->ctx);
> +	if (i915.enable_execlists && ctx) {
> +		struct intel_engine_cs *ring = request->ring;
>   
> +		if (ctx != ring->default_context)
> +			intel_lr_context_unpin(ring, ctx);
> +		i915_gem_context_unreference(ctx);
> +	}
>   	kfree(request);
>   }
>   
> @@ -2554,6 +2560,23 @@ static void i915_gem_reset_ring_cleanup(struct drm_i915_private *dev_priv,
>   	}
>   
>   	/*
> +	 * Clear the execlists queue up before freeing the requests, as those
> +	 * are the ones that keep the context and ringbuffer backing objects
> +	 * pinned in place.
> +	 */
> +	while (!list_empty(&ring->execlist_queue)) {
> +		struct intel_ctx_submit_request *submit_req;
> +
> +		submit_req = list_first_entry(&ring->execlist_queue,
> +				struct intel_ctx_submit_request,
> +				execlist_link);
> +		list_del(&submit_req->execlist_link);
> +		intel_runtime_pm_put(dev_priv);
> +		i915_gem_context_unreference(submit_req->ctx);
> +		kfree(submit_req);
> +	}
> +
> +	/*
>   	 * We must free the requests after all the corresponding objects have
>   	 * been moved off active lists. Which is the same order as the normal
>   	 * retire_requests function does. This is important if object hold
> @@ -2570,18 +2593,6 @@ static void i915_gem_reset_ring_cleanup(struct drm_i915_private *dev_priv,
>   		i915_gem_free_request(request);
>   	}
>   
> -	while (!list_empty(&ring->execlist_queue)) {
> -		struct intel_ctx_submit_request *submit_req;
> -
> -		submit_req = list_first_entry(&ring->execlist_queue,
> -				struct intel_ctx_submit_request,
> -				execlist_link);
> -		list_del(&submit_req->execlist_link);
> -		intel_runtime_pm_put(dev_priv);
> -		i915_gem_context_unreference(submit_req->ctx);
> -		kfree(submit_req);
> -	}
> -
>   	/* These may not have been flush before the reset, do so now */
>   	kfree(ring->preallocated_lazy_request);
>   	ring->preallocated_lazy_request = NULL;
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 906b985..f7fa0f7 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -139,8 +139,6 @@
>   #define GEN8_LR_CONTEXT_RENDER_SIZE (20 * PAGE_SIZE)
>   #define GEN8_LR_CONTEXT_OTHER_SIZE (2 * PAGE_SIZE)
>   
> -#define GEN8_LR_CONTEXT_ALIGN 4096
> -
>   #define RING_EXECLIST_QFULL		(1 << 0x2)
>   #define RING_EXECLIST1_VALID		(1 << 0x3)
>   #define RING_EXECLIST0_VALID		(1 << 0x4)
> @@ -801,9 +799,40 @@ void intel_logical_ring_advance_and_submit(struct intel_ringbuffer *ringbuf)
>   	execlists_context_queue(ring, ctx, ringbuf->tail);
>   }
>   
> +static int intel_lr_context_pin(struct intel_engine_cs *ring,
> +		struct intel_context *ctx)
> +{
> +	struct drm_i915_gem_object *ctx_obj = ctx->engine[ring->id].state;
> +	int ret = 0;
> +
> +	WARN_ON(!mutex_is_locked(&ring->dev->struct_mutex));

With the pin-specific mutex from the previous patch set removed:
Reviewed-by: Deepak S <deepak.s@linux.intel.com>

> +	if (ctx->engine[ring->id].unpin_count++ == 0) {
> +		ret = i915_gem_obj_ggtt_pin(ctx_obj,
> +				GEN8_LR_CONTEXT_ALIGN, 0);
> +		if (ret)
> +			ctx->engine[ring->id].unpin_count = 0;
> +	}
> +
> +	return ret;
> +}
> +
> +void intel_lr_context_unpin(struct intel_engine_cs *ring,
> +		struct intel_context *ctx)
> +{
> +	struct drm_i915_gem_object *ctx_obj = ctx->engine[ring->id].state;
> +
> +	if (ctx_obj) {
> +		WARN_ON(!mutex_is_locked(&ring->dev->struct_mutex));
> +		if (--ctx->engine[ring->id].unpin_count == 0)
> +			i915_gem_object_ggtt_unpin(ctx_obj);
> +	}
> +}
> +
>   static int logical_ring_alloc_seqno(struct intel_engine_cs *ring,
>   				    struct intel_context *ctx)
>   {
> +	int ret;
> +
>   	if (ring->outstanding_lazy_seqno)
>   		return 0;
>   
> @@ -814,6 +843,14 @@ static int logical_ring_alloc_seqno(struct intel_engine_cs *ring,
>   		if (request == NULL)
>   			return -ENOMEM;
>   
> +		if (ctx != ring->default_context) {
> +			ret = intel_lr_context_pin(ring, ctx);
> +			if (ret) {
> +				kfree(request);
> +				return ret;
> +			}
> +		}
> +
>   		/* Hold a reference to the context this request belongs to
>   		 * (we will need it when the time comes to emit/retire the
>   		 * request).
> @@ -1626,12 +1663,16 @@ void intel_lr_context_free(struct intel_context *ctx)
>   
>   	for (i = 0; i < I915_NUM_RINGS; i++) {
>   		struct drm_i915_gem_object *ctx_obj = ctx->engine[i].state;
> -		struct intel_ringbuffer *ringbuf = ctx->engine[i].ringbuf;
>   
>   		if (ctx_obj) {
> +			struct intel_ringbuffer *ringbuf =
> +					ctx->engine[i].ringbuf;
> +			struct intel_engine_cs *ring = ringbuf->ring;
> +
>   			intel_destroy_ringbuffer_obj(ringbuf);
>   			kfree(ringbuf);
> -			i915_gem_object_ggtt_unpin(ctx_obj);
> +			if (ctx == ring->default_context)
> +				i915_gem_object_ggtt_unpin(ctx_obj);
>   			drm_gem_object_unreference(&ctx_obj->base);
>   		}
>   	}
> @@ -1695,6 +1736,7 @@ static int lrc_setup_hardware_status_page(struct intel_engine_cs *ring,
>   int intel_lr_context_deferred_create(struct intel_context *ctx,
>   				     struct intel_engine_cs *ring)
>   {
> +	const bool is_global_default_ctx = (ctx == ring->default_context);
>   	struct drm_device *dev = ring->dev;
>   	struct drm_i915_gem_object *ctx_obj;
>   	uint32_t context_size;
> @@ -1714,18 +1756,22 @@ int intel_lr_context_deferred_create(struct intel_context *ctx,
>   		return ret;
>   	}
>   
> -	ret = i915_gem_obj_ggtt_pin(ctx_obj, GEN8_LR_CONTEXT_ALIGN, 0);
> -	if (ret) {
> -		DRM_DEBUG_DRIVER("Pin LRC backing obj failed: %d\n", ret);
> -		drm_gem_object_unreference(&ctx_obj->base);
> -		return ret;
> +	if (is_global_default_ctx) {
> +		ret = i915_gem_obj_ggtt_pin(ctx_obj, GEN8_LR_CONTEXT_ALIGN, 0);
> +		if (ret) {
> +			DRM_DEBUG_DRIVER("Pin LRC backing obj failed: %d\n",
> +					ret);
> +			drm_gem_object_unreference(&ctx_obj->base);
> +			return ret;
> +		}
>   	}
>   
>   	ringbuf = kzalloc(sizeof(*ringbuf), GFP_KERNEL);
>   	if (!ringbuf) {
>   		DRM_DEBUG_DRIVER("Failed to allocate ringbuffer %s\n",
>   				ring->name);
> -		i915_gem_object_ggtt_unpin(ctx_obj);
> +		if (is_global_default_ctx)
> +			i915_gem_object_ggtt_unpin(ctx_obj);
>   		drm_gem_object_unreference(&ctx_obj->base);
>   		ret = -ENOMEM;
>   		return ret;
> @@ -1787,7 +1833,8 @@ int intel_lr_context_deferred_create(struct intel_context *ctx,
>   
>   error:
>   	kfree(ringbuf);
> -	i915_gem_object_ggtt_unpin(ctx_obj);
> +	if (is_global_default_ctx)
> +		i915_gem_object_ggtt_unpin(ctx_obj);
>   	drm_gem_object_unreference(&ctx_obj->base);
>   	return ret;
>   }
> diff --git a/drivers/gpu/drm/i915/intel_lrc.h b/drivers/gpu/drm/i915/intel_lrc.h
> index 84bbf19..14b216b 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.h
> +++ b/drivers/gpu/drm/i915/intel_lrc.h
> @@ -24,6 +24,8 @@
>   #ifndef _INTEL_LRC_H_
>   #define _INTEL_LRC_H_
>   
> +#define GEN8_LR_CONTEXT_ALIGN 4096
> +
>   /* Execlists regs */
>   #define RING_ELSP(ring)			((ring)->mmio_base+0x230)
>   #define RING_EXECLIST_STATUS(ring)	((ring)->mmio_base+0x234)
> @@ -67,6 +69,8 @@ int intel_lr_context_render_state_init(struct intel_engine_cs *ring,
>   void intel_lr_context_free(struct intel_context *ctx);
>   int intel_lr_context_deferred_create(struct intel_context *ctx,
>   				     struct intel_engine_cs *ring);
> +void intel_lr_context_unpin(struct intel_engine_cs *ring,
> +		struct intel_context *ctx);
>   
>   /* Execlists */
>   int intel_sanitize_enable_execlists(struct drm_device *dev, int enable_execlists);
Thomas Daniel Nov. 18, 2014, 9:27 a.m. UTC | #6
> -----Original Message-----
> From: Daniel Vetter [mailto:daniel.vetter@ffwll.ch] On Behalf Of Daniel
> Vetter
> Sent: Monday, November 17, 2014 6:09 PM
> To: Daniel, Thomas
> Cc: intel-gfx@lists.freedesktop.org
> Subject: Re: [Intel-gfx] [PATCH v5 3/4] drm/i915/bdw: Pin the context
> backing objects to GGTT on-demand
> 
> On Thu, Nov 13, 2014 at 10:28:10AM +0000, Thomas Daniel wrote:
> > diff --git a/drivers/gpu/drm/i915/i915_drv.h
> > b/drivers/gpu/drm/i915/i915_drv.h index 059330c..3c7299d 100644
> > --- a/drivers/gpu/drm/i915/i915_drv.h
> > +++ b/drivers/gpu/drm/i915/i915_drv.h
> > @@ -655,6 +655,7 @@ struct intel_context {
> >  	struct {
> >  		struct drm_i915_gem_object *state;
> >  		struct intel_ringbuffer *ringbuf;
> > +		int unpin_count;
> 
> Pinning is already refcounted. Why this additional refcount?

The vma.pin_count is only allocated 4 bits of storage.  If this restriction can be lifted then I can use that.

> And yes I've only realized this now that you've supplied the review
> comments from Akash. I really rely upon the review discussions to spot such
> low-level implementation details.

I know, and I explicitly asked the guys to post comments to the mailing list.

Cheers,
Thomas.

> -Daniel
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> +41 (0) 79 365 57 48 - http://blog.ffwll.ch
Thomas Daniel Nov. 18, 2014, 10:48 a.m. UTC | #7
> -----Original Message-----
> From: Intel-gfx [mailto:intel-gfx-bounces@lists.freedesktop.org] On Behalf
> Of Daniel, Thomas
> Sent: Tuesday, November 18, 2014 9:28 AM
> To: Daniel Vetter
> Cc: intel-gfx@lists.freedesktop.org
> Subject: Re: [Intel-gfx] [PATCH v5 3/4] drm/i915/bdw: Pin the context
> backing objects to GGTT on-demand
> 
> > -----Original Message-----
> > From: Daniel Vetter [mailto:daniel.vetter@ffwll.ch] On Behalf Of
> > Daniel Vetter
> > Sent: Monday, November 17, 2014 6:09 PM
> > To: Daniel, Thomas
> > Cc: intel-gfx@lists.freedesktop.org
> > Subject: Re: [Intel-gfx] [PATCH v5 3/4] drm/i915/bdw: Pin the context
> > backing objects to GGTT on-demand
> >
> > On Thu, Nov 13, 2014 at 10:28:10AM +0000, Thomas Daniel wrote:
> > > diff --git a/drivers/gpu/drm/i915/i915_drv.h
> > > b/drivers/gpu/drm/i915/i915_drv.h index 059330c..3c7299d 100644
> > > --- a/drivers/gpu/drm/i915/i915_drv.h
> > > +++ b/drivers/gpu/drm/i915/i915_drv.h
> > > @@ -655,6 +655,7 @@ struct intel_context {
> > >  	struct {
> > >  		struct drm_i915_gem_object *state;
> > >  		struct intel_ringbuffer *ringbuf;
> > > +		int unpin_count;
> >
> > Pinning is already refcounted. Why this additional refcount?
> 
> The vma.pin_count is only allocated 4 bits of storage.  If this restriction can be
> lifted then I can use that.

Actually I just tried to implement this, it causes a problem for patch 4 of this set as the unpin_count is also used for the ringbuffer object which has an ioremap as well as a ggtt pin.

Thomas.

> > And yes I've only realized this now that you've supplied the review
> > comments from Akash. I really rely upon the review discussions to spot
> > such low-level implementation details.
> 
> I know, and I explicitly asked the guys to post comments to the mailing list.
> 
> Cheers,
> Thomas.
> 
> > -Daniel
> > --
> > Daniel Vetter
> > Software Engineer, Intel Corporation
> > +41 (0) 79 365 57 48 - http://blog.ffwll.ch
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx
deepak.s@intel.com Nov. 18, 2014, 2:27 p.m. UTC | #8
On Monday 17 November 2014 07:53 PM, Daniel Vetter wrote:
> On Tue, Nov 18, 2014 at 12:10:51PM +0530, Deepak S wrote:
>> On Thursday 13 November 2014 03:58 PM, Thomas Daniel wrote:
>>> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
>>> index 906b985..f7fa0f7 100644
>>> --- a/drivers/gpu/drm/i915/intel_lrc.c
>>> +++ b/drivers/gpu/drm/i915/intel_lrc.c
>>> @@ -139,8 +139,6 @@
>>>   #define GEN8_LR_CONTEXT_RENDER_SIZE (20 * PAGE_SIZE)
>>>   #define GEN8_LR_CONTEXT_OTHER_SIZE (2 * PAGE_SIZE)
>>> -#define GEN8_LR_CONTEXT_ALIGN 4096
>>> -
>>>   #define RING_EXECLIST_QFULL		(1 << 0x2)
>>>   #define RING_EXECLIST1_VALID		(1 << 0x3)
>>>   #define RING_EXECLIST0_VALID		(1 << 0x4)
>>> @@ -801,9 +799,40 @@ void intel_logical_ring_advance_and_submit(struct intel_ringbuffer *ringbuf)
>>>   	execlists_context_queue(ring, ctx, ringbuf->tail);
>>>   }
>>> +static int intel_lr_context_pin(struct intel_engine_cs *ring,
>>> +		struct intel_context *ctx)
>>> +{
>>> +	struct drm_i915_gem_object *ctx_obj = ctx->engine[ring->id].state;
>>> +	int ret = 0;
>>> +
>>> +	WARN_ON(!mutex_is_locked(&ring->dev->struct_mutex));
>> With pin specific mutex from previous patch set removed.
> Pardon my ignorance but I'm completely lost on this review comment here.
> Deepak, can you please elaborate what kind of lock on which exact version
> of the previous patch you mean? I didn't find any locking at all in the
> preceding patch here ...
>
> Thanks, Daniel

Hi Daniel,

+static int intel_lr_context_pin(struct intel_engine_cs *ring,
+		struct intel_context *ctx)
+{
+	struct drm_i915_gem_object *ctx_obj = ctx->engine[ring->id].state;
+	int ret = 0;
+
+	mutex_lock(&ctx->engine[ring->id].unpin_lock);
+	if (ctx->engine[ring->id].unpin_count++ == 0) {
+		ret = i915_gem_obj_ggtt_pin(ctx_obj,
+				GEN8_LR_CONTEXT_ALIGN, 0);
+		if (ret)
+			ctx->engine[ring->id].unpin_count = 0;
+	}
+	mutex_unlock(&ctx->engine[ring->id].unpin_lock);
+
+	return ret;
+}

In the previous patch set we had a "mutex_lock(&ctx->engine[ring->id].unpin_lock);"

Since "intel_lr_context_pin" is already called under struct_mutex, we don't need unpin_lock. This was the change in the latest patch set :)

Thanks
Deepak
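
For readers following the locking change, here is a minimal userspace sketch of the v5 shape: a plain reference count that relies on the caller already holding the big lock, with an assertion standing in for the WARN_ON. All names (sketch_ctx, sketch_struct_mutex_held, pin_should_fail) are hypothetical and only model the idea; this is not the driver code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-engine context state. */
struct sketch_ctx {
	int unpin_count;       /* number of users currently holding the context */
	bool ggtt_pinned;      /* models "backing object is pinned in the GGTT" */
	bool pin_should_fail;  /* test knob: make the simulated GGTT pin fail */
};

/* Models mutex_is_locked(&dev->struct_mutex); set by the caller. */
bool sketch_struct_mutex_held;

/* Pin only on the 0 -> 1 transition; no private lock is taken. */
int sketch_context_pin(struct sketch_ctx *ctx)
{
	assert(sketch_struct_mutex_held); /* plays the role of the WARN_ON */

	if (ctx->unpin_count++ == 0) {
		if (ctx->pin_should_fail) {
			ctx->unpin_count = 0; /* roll back, as the quoted code does */
			return -1;
		}
		ctx->ggtt_pinned = true;
	}
	return 0;
}

/* Unpin only on the 1 -> 0 transition. */
void sketch_context_unpin(struct sketch_ctx *ctx)
{
	assert(sketch_struct_mutex_held);
	assert(ctx->unpin_count > 0);

	if (--ctx->unpin_count == 0)
		ctx->ggtt_pinned = false;
}
```

The count itself needs no extra protection in this model because every caller is assumed to be serialized by the same outer lock, which is the argument made in the review.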
Daniel Vetter Nov. 18, 2014, 2:33 p.m. UTC | #9
On Tue, Nov 18, 2014 at 10:48:09AM +0000, Daniel, Thomas wrote:
> > -----Original Message-----
> > From: Intel-gfx [mailto:intel-gfx-bounces@lists.freedesktop.org] On Behalf
> > Of Daniel, Thomas
> > Sent: Tuesday, November 18, 2014 9:28 AM
> > To: Daniel Vetter
> > Cc: intel-gfx@lists.freedesktop.org
> > Subject: Re: [Intel-gfx] [PATCH v5 3/4] drm/i915/bdw: Pin the context
> > backing objects to GGTT on-demand
> > 
> > > -----Original Message-----
> > > From: Daniel Vetter [mailto:daniel.vetter@ffwll.ch] On Behalf Of
> > > Daniel Vetter
> > > Sent: Monday, November 17, 2014 6:09 PM
> > > To: Daniel, Thomas
> > > Cc: intel-gfx@lists.freedesktop.org
> > > Subject: Re: [Intel-gfx] [PATCH v5 3/4] drm/i915/bdw: Pin the context
> > > backing objects to GGTT on-demand
> > >
> > > On Thu, Nov 13, 2014 at 10:28:10AM +0000, Thomas Daniel wrote:
> > > > diff --git a/drivers/gpu/drm/i915/i915_drv.h
> > > > b/drivers/gpu/drm/i915/i915_drv.h index 059330c..3c7299d 100644
> > > > --- a/drivers/gpu/drm/i915/i915_drv.h
> > > > +++ b/drivers/gpu/drm/i915/i915_drv.h
> > > > @@ -655,6 +655,7 @@ struct intel_context {
> > > >  	struct {
> > > >  		struct drm_i915_gem_object *state;
> > > >  		struct intel_ringbuffer *ringbuf;
> > > > +		int unpin_count;
> > >
> > > Pinning is already refcounted. Why this additional refcount?
> > 
> > The vma.pin_count is only allocated 4 bits of storage.  If this restriction can be
> > lifted then I can use that.

Those 4 bits are good enough for legacy contexts, so I wonder a bit what's
so massively different for execlist contexts.
 
> Actually I just tried to implement this, it causes a problem for patch 4
> of this set as the unpin_count is also used for the ringbuffer object
> which has an ioremap as well as a ggtt pin.

Yeah, ioremap needs to be redone every time we pin/unpin. But on sane
archs it's almost no overhead really. And if this does start to matter
(shudder for 32bit kernels on gen8) then we can fix it ...
-Daniel
Thomas Daniel Nov. 18, 2014, 2:51 p.m. UTC | #10
> -----Original Message-----
> From: Daniel Vetter [mailto:daniel.vetter@ffwll.ch] On Behalf Of Daniel
> Vetter
> Sent: Tuesday, November 18, 2014 2:33 PM
> To: Daniel, Thomas
> Cc: Daniel Vetter; intel-gfx@lists.freedesktop.org
> Subject: Re: [Intel-gfx] [PATCH v5 3/4] drm/i915/bdw: Pin the context
> backing objects to GGTT on-demand
> 
> On Tue, Nov 18, 2014 at 10:48:09AM +0000, Daniel, Thomas wrote:
> > > -----Original Message-----
> > > From: Intel-gfx [mailto:intel-gfx-bounces@lists.freedesktop.org] On
> > > Behalf Of Daniel, Thomas
> > > Sent: Tuesday, November 18, 2014 9:28 AM
> > > To: Daniel Vetter
> > > Cc: intel-gfx@lists.freedesktop.org
> > > Subject: Re: [Intel-gfx] [PATCH v5 3/4] drm/i915/bdw: Pin the
> > > context backing objects to GGTT on-demand
> > >
> > > > -----Original Message-----
> > > > From: Daniel Vetter [mailto:daniel.vetter@ffwll.ch] On Behalf Of
> > > > Daniel Vetter
> > > > Sent: Monday, November 17, 2014 6:09 PM
> > > > To: Daniel, Thomas
> > > > Cc: intel-gfx@lists.freedesktop.org
> > > > Subject: Re: [Intel-gfx] [PATCH v5 3/4] drm/i915/bdw: Pin the
> > > > context backing objects to GGTT on-demand
> > > >
> > > > On Thu, Nov 13, 2014 at 10:28:10AM +0000, Thomas Daniel wrote:
> > > > > diff --git a/drivers/gpu/drm/i915/i915_drv.h
> > > > > b/drivers/gpu/drm/i915/i915_drv.h index 059330c..3c7299d 100644
> > > > > --- a/drivers/gpu/drm/i915/i915_drv.h
> > > > > +++ b/drivers/gpu/drm/i915/i915_drv.h
> > > > > @@ -655,6 +655,7 @@ struct intel_context {
> > > > >  	struct {
> > > > >  		struct drm_i915_gem_object *state;
> > > > >  		struct intel_ringbuffer *ringbuf;
> > > > > +		int unpin_count;
> > > >
> > > > Pinning is already refcounted. Why this additional refcount?
> > >
> > > The vma.pin_count is only allocated 4 bits of storage.  If this
> > > restriction can be lifted then I can use that.
> 
> Those 4 bits are good enough for legacy contexts, so I wonder a bit what's so
> massively different for execlist contexts.

With execlists, in order to dynamically unpin the LRC backing object and ring buffer object when not required we take a reference for each execlist request that uses them (remember that the execlist request lifecycle is currently different from the execbuffer request).  This can be a lot, especially in some of the less sane i-g-t tests.

> > Actually I just tried to implement this, it causes a problem for patch
> > 4 of this set as the unpin_count is also used for the ringbuffer
> > object which has an ioremap as well as a ggtt pin.
> 
> Yeah, ioremap needs to be redone every time we pin/unpin. But on sane
> archs it's almost no overhead really. And if this does start to matter (shudder
> for 32bit kernels on gen8) then we can fix it ...

Hm, so the CPU vaddr of the ring buffer will move around as more requests reference it which I suppose is not a problem.  We will use a lot of address space (again, especially with the i-g-t stress tests which can submit tens of thousands of requests in a very short space of time).  What would the fix be?  An extra reference count for the ioremap?  Looks familiar :)

I still think it's best to keep the context unpin_count for execlists mode.

Thomas.
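
To illustrate the size argument, the sketch below contrasts a 4-bit pin counter, which tops out at 15, with a separate plain int that counts per-request references and pins the underlying object only once. The struct and function names are hypothetical and chosen for the example; only the 4-bit width comes from the discussion above.

```c
#include <assert.h>

#define SKETCH_PIN_COUNT_BITS 4
#define SKETCH_PIN_COUNT_MAX  ((1u << SKETCH_PIN_COUNT_BITS) - 1) /* 15 */

/* Hypothetical model of a vma whose pin count lives in a 4-bit field. */
struct sketch_vma {
	unsigned int pin_count : SKETCH_PIN_COUNT_BITS;
};

/* Returns 0 on success, -1 if one more pin would overflow the field. */
int sketch_vma_pin(struct sketch_vma *vma)
{
	if (vma->pin_count == SKETCH_PIN_COUNT_MAX)
		return -1;
	vma->pin_count++;
	return 0;
}

/* Per-request references are counted separately in a full-width int. */
struct sketch_ctx_state {
	struct sketch_vma vma;
	int unpin_count;
};

/* Take one reference for a request; only the first one pins the vma. */
int sketch_request_get(struct sketch_ctx_state *state)
{
	if (state->unpin_count == 0) {
		int ret = sketch_vma_pin(&state->vma);

		if (ret)
			return ret;
	}
	state->unpin_count++;
	return 0;
}
```

With one reference per queued request, the 4-bit field would be exhausted after 15 outstanding requests, whereas the separate counter keeps the vma pin count at one regardless of queue depth.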

> -Daniel
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> +41 (0) 79 365 57 48 - http://blog.ffwll.ch
Daniel Vetter Nov. 18, 2014, 3:11 p.m. UTC | #11
On Tue, Nov 18, 2014 at 02:51:52PM +0000, Daniel, Thomas wrote:
> > -----Original Message-----
> > From: Daniel Vetter [mailto:daniel.vetter@ffwll.ch] On Behalf Of Daniel
> > Vetter
> > Sent: Tuesday, November 18, 2014 2:33 PM
> > To: Daniel, Thomas
> > Cc: Daniel Vetter; intel-gfx@lists.freedesktop.org
> > Subject: Re: [Intel-gfx] [PATCH v5 3/4] drm/i915/bdw: Pin the context
> > backing objects to GGTT on-demand
> > 
> > On Tue, Nov 18, 2014 at 10:48:09AM +0000, Daniel, Thomas wrote:
> > > > -----Original Message-----
> > > > From: Intel-gfx [mailto:intel-gfx-bounces@lists.freedesktop.org] On
> > > > Behalf Of Daniel, Thomas
> > > > Sent: Tuesday, November 18, 2014 9:28 AM
> > > > To: Daniel Vetter
> > > > Cc: intel-gfx@lists.freedesktop.org
> > > > Subject: Re: [Intel-gfx] [PATCH v5 3/4] drm/i915/bdw: Pin the
> > > > context backing objects to GGTT on-demand
> > > >
> > > > > -----Original Message-----
> > > > > From: Daniel Vetter [mailto:daniel.vetter@ffwll.ch] On Behalf Of
> > > > > Daniel Vetter
> > > > > Sent: Monday, November 17, 2014 6:09 PM
> > > > > To: Daniel, Thomas
> > > > > Cc: intel-gfx@lists.freedesktop.org
> > > > > Subject: Re: [Intel-gfx] [PATCH v5 3/4] drm/i915/bdw: Pin the
> > > > > context backing objects to GGTT on-demand
> > > > >
> > > > > On Thu, Nov 13, 2014 at 10:28:10AM +0000, Thomas Daniel wrote:
> > > > > > diff --git a/drivers/gpu/drm/i915/i915_drv.h
> > > > > > b/drivers/gpu/drm/i915/i915_drv.h index 059330c..3c7299d 100644
> > > > > > --- a/drivers/gpu/drm/i915/i915_drv.h
> > > > > > +++ b/drivers/gpu/drm/i915/i915_drv.h
> > > > > > @@ -655,6 +655,7 @@ struct intel_context {
> > > > > >  	struct {
> > > > > >  		struct drm_i915_gem_object *state;
> > > > > >  		struct intel_ringbuffer *ringbuf;
> > > > > > +		int unpin_count;
> > > > >
> > > > > Pinning is already refcounted. Why this additional refcount?
> > > >
> > > > The vma.pin_count is only allocated 4 bits of storage.  If this
> > > > restriction can be lifted then I can use that.
> > 
> > Those 4 bits are good enough for legacy contexts, so I wonder a bit what's so
> > massively different for execlist contexts.
> With execlists, in order to dynamically unpin the LRC backing object and
> ring buffer object when not required we take a reference for each
> execlist request that uses them (remember that the execlist request
> lifecycle is currently different from the execbuffer request).  This can
> be a lot, especially in some of the less sane i-g-t tests.

Why?

Presuming the buffer object is properly pushed onto the active list you
only need to pin while doing the command submission up to the point where
you've committed the buffer object to the active list.

I know documentation sucks for this stuff since I have this discussion
with roughly everyone ever touching anything related to active buffers :(
If you want some recent examples the cmd parser's shadow batch should
serve well (including the entire evolution from reinvented wheel to just
using the active list, although the latest patches are only 90% there and
still have 1-2 misplaced pieces).

> > > Actually I just tried to implement this, it causes a problem for patch
> > > 4 of this set as the unpin_count is also used for the ringbuffer
> > > object which has an ioremap as well as a ggtt pin.
> > 
> > Yeah, ioremap needs to be redone every time we pin/unpin. But on sane
> > archs it's almost no overhead really. And if this does start to matter (shudder
> > for 32bit kernels on gen8) then we can fix it ...
> Hm, so the CPU vaddr of the ring buffer will move around as more
> requests reference it which I suppose is not a problem.  We will use a
> lot of address space (again, especially with the i-g-t stress tests
> which can submit tens of thousands of requests in a very short space of
> time).  What would the fix be?  An extra reference count for the
> ioremap?  Looks familiar :)

ioremap always gives you the same linear address on 64bit kernels. On
32bit it makes a new one, but if you ioremap for each request it'll fall
over anyway. The solution would be to ioremap just the required pages
using the atomic kmap stuff wrapped up into the io_mapping stuff.

> I still think it's best to keep the context unpin_count for execlists mode.

Well, that just means the todo list to fix up execlists grows longer.
-Daniel
Thomas Daniel Nov. 18, 2014, 3:32 p.m. UTC | #12
> -----Original Message-----
> From: Daniel Vetter [mailto:daniel.vetter@ffwll.ch] On Behalf Of Daniel
> Vetter
> Sent: Tuesday, November 18, 2014 3:11 PM
> To: Daniel, Thomas
> Cc: Daniel Vetter; intel-gfx@lists.freedesktop.org
> Subject: Re: [Intel-gfx] [PATCH v5 3/4] drm/i915/bdw: Pin the context
> backing objects to GGTT on-demand
> 
> On Tue, Nov 18, 2014 at 02:51:52PM +0000, Daniel, Thomas wrote:
> > > -----Original Message-----
> > > From: Daniel Vetter [mailto:daniel.vetter@ffwll.ch] On Behalf Of
> > > Daniel Vetter
> > > Sent: Tuesday, November 18, 2014 2:33 PM
> > > To: Daniel, Thomas
> > > Cc: Daniel Vetter; intel-gfx@lists.freedesktop.org
> > > Subject: Re: [Intel-gfx] [PATCH v5 3/4] drm/i915/bdw: Pin the
> > > context backing objects to GGTT on-demand
> > >
> > > On Tue, Nov 18, 2014 at 10:48:09AM +0000, Daniel, Thomas wrote:
> > > > > -----Original Message-----
> > > > > From: Intel-gfx [mailto:intel-gfx-bounces@lists.freedesktop.org]
> > > > > On Behalf Of Daniel, Thomas
> > > > > Sent: Tuesday, November 18, 2014 9:28 AM
> > > > > To: Daniel Vetter
> > > > > Cc: intel-gfx@lists.freedesktop.org
> > > > > Subject: Re: [Intel-gfx] [PATCH v5 3/4] drm/i915/bdw: Pin the
> > > > > context backing objects to GGTT on-demand
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: Daniel Vetter [mailto:daniel.vetter@ffwll.ch] On Behalf
> > > > > > Of Daniel Vetter
> > > > > > Sent: Monday, November 17, 2014 6:09 PM
> > > > > > To: Daniel, Thomas
> > > > > > Cc: intel-gfx@lists.freedesktop.org
> > > > > > Subject: Re: [Intel-gfx] [PATCH v5 3/4] drm/i915/bdw: Pin the
> > > > > > context backing objects to GGTT on-demand
> > > > > >
> > > > > > On Thu, Nov 13, 2014 at 10:28:10AM +0000, Thomas Daniel wrote:
> > > > > > > diff --git a/drivers/gpu/drm/i915/i915_drv.h
> > > > > > > b/drivers/gpu/drm/i915/i915_drv.h index 059330c..3c7299d
> > > > > > > 100644
> > > > > > > --- a/drivers/gpu/drm/i915/i915_drv.h
> > > > > > > +++ b/drivers/gpu/drm/i915/i915_drv.h
> > > > > > > @@ -655,6 +655,7 @@ struct intel_context {
> > > > > > >  	struct {
> > > > > > >  		struct drm_i915_gem_object *state;
> > > > > > >  		struct intel_ringbuffer *ringbuf;
> > > > > > > +		int unpin_count;
> > > > > >
> > > > > > Pinning is already refcounted. Why this additional refcount?
> > > > >
> > > > > The vma.pin_count is only allocated 4 bits of storage.  If this
> > > > > restriction can be lifted then I can use that.
> > >
> > > Those 4 bits are good enough for legacy contexts, so I wonder a bit
> > > what's so massively different for execlist contexts.
> > With execlists, in order to dynamically unpin the LRC backing object
> > and ring buffer object when not required we take a reference for each
> > execlist request that uses them (remember that the execlist request
> > lifecycle is currently different from the execbuffer request).  This
> > can be a lot, especially in some of the less sane i-g-t tests.
> 
> Why?
> 
> Presuming the buffer object is properly pushed onto the active list you only
> need to pin while doing the command submission up to the point where
> you've committed the buffer object to the active list.

This is not currently the case.  Using the active list for context object management is one of the refactoring tasks, as we agreed.

> I know documentation sucks for this stuff since I have this discussion with
> roughly everyone ever touching anything related to active buffers :( If you
> want some recent examples the cmd parser's shadow batch should serve
> well (including the entire evolution from reinvented wheel to just using the
> active list, although the latest patches are only 90% there and still have 1-2
> misplaced pieces).
> 
> > > > Actually I just tried to implement this, it causes a problem for
> > > > patch
> > > > 4 of this set as the unpin_count is also used for the ringbuffer
> > > > object which has an ioremap as well as a ggtt pin.
> > >
> > > Yeah, ioremap needs to be redone every time we pin/unpin. But on
> > > sane archs it's almost no overhead really. And if this does start to
> > > matter (shudder for 32bit kernels on gen8) then we can fix it ...
> > Hm, so the CPU vaddr of the ring buffer will move around as more
> > requests reference it which I suppose is not a problem.  We will use a
> > lot of address space (again, especially with the i-g-t stress tests
> > which can submit tens of thousands of requests in a very short space
> > of time).  What would the fix be?  An extra reference count for the
> > ioremap?  Looks familiar :)
> 
> ioremap always gives you the same linear address on 64bit kernels. On 32bit
> it makes a new one, but if you ioremap for each request it'll fall over anyway.

Ah, I didn't know that ioremap behaved like that.

> The solution would be to ioremap just the required pages using the atomic
> kmap stuff wrapped up into the io_mapping stuff.
> 
> > I still think it's best to keep the context unpin_count for execlists mode.
> 
> Well just means the todo-list to fix up execlist grows longer.

That's OK from my point of view, this may go away anyway with some of the refactoring.  The (strong) direction I'm getting from the management is that they want these merged ASAP.

Cheers,
Thomas.

> -Daniel
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> +41 (0) 79 365 57 48 - http://blog.ffwll.ch
Daniel Vetter Nov. 19, 2014, 9:53 a.m. UTC | #13
On Tue, Nov 18, 2014 at 03:32:46PM +0000, Daniel, Thomas wrote:
> > -----Original Message-----
> > From: Daniel Vetter [mailto:daniel.vetter@ffwll.ch] On Behalf Of Daniel
> > Vetter
> > Sent: Tuesday, November 18, 2014 3:11 PM
> > To: Daniel, Thomas
> > Cc: Daniel Vetter; intel-gfx@lists.freedesktop.org
> > Subject: Re: [Intel-gfx] [PATCH v5 3/4] drm/i915/bdw: Pin the context
> > backing objects to GGTT on-demand
> > 
> > On Tue, Nov 18, 2014 at 02:51:52PM +0000, Daniel, Thomas wrote:
> > > > -----Original Message-----
> > > > From: Daniel Vetter [mailto:daniel.vetter@ffwll.ch] On Behalf Of
> > > > Daniel Vetter
> > > > Sent: Tuesday, November 18, 2014 2:33 PM
> > > > To: Daniel, Thomas
> > > > Cc: Daniel Vetter; intel-gfx@lists.freedesktop.org
> > > > Subject: Re: [Intel-gfx] [PATCH v5 3/4] drm/i915/bdw: Pin the
> > > > context backing objects to GGTT on-demand
> > > >
> > > > On Tue, Nov 18, 2014 at 10:48:09AM +0000, Daniel, Thomas wrote:
> > > > > > -----Original Message-----
> > > > > > From: Intel-gfx [mailto:intel-gfx-bounces@lists.freedesktop.org]
> > > > > > On Behalf Of Daniel, Thomas
> > > > > > Sent: Tuesday, November 18, 2014 9:28 AM
> > > > > > To: Daniel Vetter
> > > > > > Cc: intel-gfx@lists.freedesktop.org
> > > > > > Subject: Re: [Intel-gfx] [PATCH v5 3/4] drm/i915/bdw: Pin the
> > > > > > context backing objects to GGTT on-demand
> > > > > >
> > > > > > > -----Original Message-----
> > > > > > > From: Daniel Vetter [mailto:daniel.vetter@ffwll.ch] On Behalf
> > > > > > > Of Daniel Vetter
> > > > > > > Sent: Monday, November 17, 2014 6:09 PM
> > > > > > > To: Daniel, Thomas
> > > > > > > Cc: intel-gfx@lists.freedesktop.org
> > > > > > > Subject: Re: [Intel-gfx] [PATCH v5 3/4] drm/i915/bdw: Pin the
> > > > > > > context backing objects to GGTT on-demand
> > > > > > >
> > > > > > > On Thu, Nov 13, 2014 at 10:28:10AM +0000, Thomas Daniel wrote:
> > > > > > > > diff --git a/drivers/gpu/drm/i915/i915_drv.h
> > > > > > > > b/drivers/gpu/drm/i915/i915_drv.h index 059330c..3c7299d
> > > > > > > > 100644
> > > > > > > > --- a/drivers/gpu/drm/i915/i915_drv.h
> > > > > > > > +++ b/drivers/gpu/drm/i915/i915_drv.h
> > > > > > > > @@ -655,6 +655,7 @@ struct intel_context {
> > > > > > > >  	struct {
> > > > > > > >  		struct drm_i915_gem_object *state;
> > > > > > > >  		struct intel_ringbuffer *ringbuf;
> > > > > > > > +		int unpin_count;
> > > > > > >
> > > > > > > Pinning is already refcounted. Why this additional refcount?
> > > > > >
> > > > > > The vma.pin_count is only allocated 4 bits of storage.  If this
> > > > > > restriction can be lifted then I can use that.
> > > >
> > > > Those 4 bits are good enough for legacy contexts, so I wonder a bit
> > > > what's so massively different for execlist contexts.
> > > With execlists, in order to dynamically unpin the LRC backing object
> > > and ring buffer object when not required we take a reference for each
> > > execlist request that uses them (remember that the execlist request
> > > lifecycle is currently different from the execbuffer request).  This
> > > can be a lot, especially in some of the less sane i-g-t tests.
> > 
> > Why?
> > 
> > Presuming the buffer object is properly pushed onto the active list you only
> > need to pin while doing the command submission up to the point where
> > you've committed the buffer object to the active list.
> This is not currently the case.  Using the active list for context
> object management is one of the refactoring tasks, as we agreed.

Actually I even lied: you need to pin the current context, and you can only
throw out the old one you've just switched out. Because the request for the
next batch/ctx combo will complete after the switch has happened, this all
works out.
-Daniel
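
As a rough illustration of the scheme described here, the sketch below keeps exactly one explicit pin on the engine's current context and, on a switch, hands the old context over to request tracking: it stays busy until the request that performed the switch has retired. This is a simplified userspace model under stated assumptions (monotonic sequence numbers, no wraparound); every name in it is hypothetical and it is not how the driver is written.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a context backing object. */
struct sketch_lrc {
	int pin_count;            /* explicit pins (held only while current) */
	unsigned int busy_until;  /* seqno of the last request that may touch it */
};

/* Hypothetical model of an engine. */
struct sketch_engine {
	struct sketch_lrc *current;    /* context the engine is running */
	unsigned int submitted_seqno;  /* seqno of the latest submission */
	unsigned int retired_seqno;    /* latest seqno known to be complete */
};

/* Submit a request for `to`; returns the request's sequence number. */
unsigned int sketch_submit(struct sketch_engine *engine, struct sketch_lrc *to)
{
	unsigned int seqno = ++engine->submitted_seqno;
	struct sketch_lrc *from = engine->current;

	if (from != to) {
		to->pin_count++;                 /* the new current context stays pinned */
		if (from != NULL) {
			from->busy_until = seqno;    /* switch completes with this request */
			from->pin_count--;           /* tracking now protects the old one */
		}
		engine->current = to;
	}
	return seqno;
}

/* Record that every request up to and including `seqno` has completed. */
void sketch_retire(struct sketch_engine *engine, unsigned int seqno)
{
	engine->retired_seqno = seqno;
}

/* A context may be evicted once it is neither pinned nor still busy. */
bool sketch_evictable(const struct sketch_engine *engine,
		      const struct sketch_lrc *ctx)
{
	return ctx->pin_count == 0 && ctx->busy_until <= engine->retired_seqno;
}
```

In this model the pin count never exceeds one per context, which is why a narrow counter would be sufficient if the active-list style of tracking were used.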
Thomas Daniel Nov. 19, 2014, 5:59 p.m. UTC | #14
For the avoidance of confusion, I want to make it clear that the latest revisions to the patches in this set posted to the list (v5) address all the review comments from the VPG guys.

[v5 1/4] http://patchwork.freedesktop.org/patch/36716/
[2/4] already accepted
[v5 3/4] http://patchwork.freedesktop.org/patch/36717/
[v5 4/4] http://patchwork.freedesktop.org/patch/36718/

Thomas.

> -----Original Message-----
> From: Intel-gfx [mailto:intel-gfx-bounces@lists.freedesktop.org] On Behalf
> Of Daniel, Thomas
> Sent: Monday, November 17, 2014 2:56 PM
> To: intel-gfx@lists.freedesktop.org
> Subject: Re: [Intel-gfx] [PATCH v5 3/4] drm/i915/bdw: Pin the context
> backing objects to GGTT on-demand
> 
> Here is the actual review...
> 
> _____________________________________________
> From: Daniel, Thomas
> Sent: Wednesday, November 12, 2014 8:52 PM
> To: Goel, Akash
> Subject: RE: Execlists patches code review
> 
> Hi Akash,
> 
> I will put the WARN messages back in and remove the need_unpin.
> The elsp_submitted count does not behave exactly as you would expect
> because of some race condition.
> Have a look at the patch “Avoid non-lite-restore preemptions” by Oscar
> Mateo for a description.
> 
> Thanks,
> Thomas.
> _____________________________________________
> From: Goel, Akash
> Sent: Tuesday, November 11, 2014 4:37 PM
> To: Daniel, Thomas
> Subject: RE: Execlists patches code review
> 
> Hi Thomas,
> 
> Few comments on http://patchwork.freedesktop.org/patch/35830/
> 
> 	int elsp_submitted;
> +	bool need_unpin;
> 
> This new field has not been used anywhere.
> 
> 		if (intel_execlists_ctx_id(ctx_obj) == request_id) {
> -			WARN(head_req->elsp_submitted == 0,
> -			     "Never submitted head request\n");
> 
> Sorry couldn’t get this change. Even if a request has been merged, still the
> elsp_submitted count should not be 0 here, when this function is executed
> on arrival of Context switch interrupt. When a new request is merged with a
> previously submitted request, the original value of elsp_submitted is still
> retained.
> 
> +			/* If the request has been merged, it is possible to get
> +			 * here with an unsubmitted request. */
>  			if (--head_req->elsp_submitted <= 0) {
> 
> 		if (status & GEN8_CTX_STATUS_PREEMPTED) {
>  			if (status & GEN8_CTX_STATUS_LITE_RESTORE) {
> -				if (execlists_check_remove_request(ring, status_id))
> -					WARN(1, "Lite Restored request removed from queue\n");
> +				execlists_check_remove_request(ring, status_id);
> 
> Same doubt here, thought that in this case of interrupt due to Preemption
> (Lite restore), which will occur when the same Context is submitted as the
> one already being executed by the Hw, the count will not drop to 0. Count
> will drop to 0 when the context switch interrupt will be generated
> subsequently.
> 
> Best regards
> Akash
> _____________________________________________
> From: Goel, Akash
> Sent: Tuesday, November 11, 2014 8:58 PM
> To: Daniel, Thomas
> Subject: RE: Execlists patches code review
> 
> Hi Thomas,
> 
> I was OOP today, I will provide this review comment tomorrow on the GFX
> mailing list.
> 
> Best regards
> Akash
> _____________________________________________
> From: Daniel, Thomas
> Sent: Monday, November 10, 2014 10:41 PM
> To: Goel, Akash
> Subject: RE: Execlists patches code review
> 
> Hi Akash,
> 
> Please post this comment to the mailing list.
> Assuming nobody else comments I will remove the unpin_lock and replace
> the mutex_lock(&unpin_lock) with WARN_ON(!mutex_is_locked(&dev->struct_mutex)).
> 
> Cheers,
> Thomas.
> 
> _____________________________________________
> From: Goel, Akash
> Sent: Monday, November 10, 2014 11:19 AM
> To: Daniel, Thomas
> Subject: RE: Execlists patches code review
> 
> In context of the 3rd patch  http://patchwork.freedesktop.org/patch/35829/
> intel_lr_context_pin is being called from logical_ring_alloc_seqno function
> and intel_lr_context_unpin  gets called from i915_gem_free_request &
> i915_gem_reset_ring_cleanup functions
> 
> All these 3 paths are already protected by dev->struct_mutex (Global lock),
> so they will always execute sequentially with respect to each other.
> 
> Do we need to have a new lock ?
> +		struct mutex unpin_lock;
> 
> Best regards
> Akash
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Daniel Vetter Nov. 24, 2014, 2:24 p.m. UTC | #15
On Thu, Nov 13, 2014 at 10:28:10AM +0000, Thomas Daniel wrote:
> From: Oscar Mateo <oscar.mateo@intel.com>
> 
> Up until now, we have pinned every logical ring context backing object
> during creation, and left it pinned until destruction. This made my life
> easier, but it's a harmful thing to do, because we cause fragmentation
> of the GGTT (and, eventually, we would run out of space).
> 
> This patch makes the pinning on-demand: the backing objects of the two
> contexts that are written to the ELSP are pinned right before submission
> and unpinned once the hardware is done with them. The only context that
> is still pinned regardless is the global default one, so that the HWS can
> still be accessed in the same way (ring->status_page).
> 
> v2: In the early version of this patch, we were pinning the context as
> we put it into the ELSP: on the one hand, this is very efficient because
> only a maximum two contexts are pinned at any given time, but on the other
> hand, we cannot really pin in interrupt time :(
> 
> v3: Use a mutex rather than atomic_t to protect pin count to avoid races.
> Do not unpin default context in free_request.
> 
> v4: Break out pin and unpin into functions.  Fix style problems reported
> by checkpatch
> 
> v5: Remove unpin_lock as all pinning and unpinning is done with the struct
> mutex already locked.  Add WARN_ONs to make sure this is the case in future.
> 
> Issue: VIZ-4277
> Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
> Signed-off-by: Thomas Daniel <thomas.daniel@intel.com>

This patch here scored a regression (leak in the module unload path),
please address it asap. Deadline for regressions should be 1 week, then
I'll just drop the patch or apply the revert. That includes review and
everything.

https://bugs.freedesktop.org/show_bug.cgi?id=86507

Thanks,
Thomas Daniel Nov. 24, 2014, 5:14 p.m. UTC | #16
> -----Original Message-----
> From: Daniel Vetter [mailto:daniel.vetter@ffwll.ch] On Behalf Of Daniel
> Vetter
> Sent: Monday, November 24, 2014 2:25 PM
> To: Daniel, Thomas
> Cc: intel-gfx@lists.freedesktop.org
> Subject: Re: [Intel-gfx] [PATCH v5 3/4] drm/i915/bdw: Pin the context
> backing objects to GGTT on-demand
> 
> On Thu, Nov 13, 2014 at 10:28:10AM +0000, Thomas Daniel wrote:
> > From: Oscar Mateo <oscar.mateo@intel.com>
> >
> > Up until now, we have pinned every logical ring context backing object
> > during creation, and left it pinned until destruction. This made my
> > life easier, but it's a harmful thing to do, because we cause
> > fragmentation of the GGTT (and, eventually, we would run out of space).
> >
> > This patch makes the pinning on-demand: the backing objects of the two
> > contexts that are written to the ELSP are pinned right before
> > submission and unpinned once the hardware is done with them. The only
> > context that is still pinned regardless is the global default one, so
> > that the HWS can still be accessed in the same way (ring->status_page).
> >
> > v2: In the early version of this patch, we were pinning the context as
> > we put it into the ELSP: on the one hand, this is very efficient
> > because only a maximum two contexts are pinned at any given time, but
> > on the other hand, we cannot really pin in interrupt time :(
> >
> > v3: Use a mutex rather than atomic_t to protect pin count to avoid races.
> > Do not unpin default context in free_request.
> >
> > v4: Break out pin and unpin into functions.  Fix style problems
> > reported by checkpatch
> >
> > v5: Remove unpin_lock as all pinning and unpinning is done with the
> > struct mutex already locked.  Add WARN_ONs to make sure this is the case
> in future.
> >
> > Issue: VIZ-4277
> > Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
> > Signed-off-by: Thomas Daniel <thomas.daniel@intel.com>
> 
> This patch here scored a regression (leak in the module unload path), please
> address it asap. Deadline for regressions should be 1 week, then I'll just drop
> the patch or apply the revert. That includes review and everything.
> 
> https://bugs.freedesktop.org/show_bug.cgi?id=86507 

Leak identified.  The fix is simple.
Do you want a v6 or a follow-up patch?

Cheers,
Thomas.
Daniel Vetter Nov. 24, 2014, 8:15 p.m. UTC | #17
On Mon, Nov 24, 2014 at 6:14 PM, Daniel, Thomas <thomas.daniel@intel.com> wrote:
>> This patch here scored a regression (leak in the module unload path), please
>> address it asap. Deadline for regressions should be 1 week, then I'll just drop
>> the patch or apply the revert. That includes review and everything.
>>
>> https://bugs.freedesktop.org/show_bug.cgi?id=86507
>
> Leak identified.  The fix is simple.
> Do you want a v6 or a follow-up patch?

Tree is already tagged so no rebasing, hence full-blown patch with all
the bells and whistles please. In general I always prefer the follow-up
patch when I've merged the original one already - squashing in is easy
if still possible, but untangling if a freeze point happened (like
here) is more of a pain.

Thanks, Daniel
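
The thread does not say where the leak is. One place a reader might check in
the patch below is the i915_gem_free_request() hunk: the old code dropped the
context reference whenever request->ctx was set, while the new code only
reaches i915_gem_context_unreference() when i915.enable_execlists is also set.
Whether that is the leak reported in the bug is an assumption, not something
the thread confirms. The following is a minimal userspace sketch of that
difference; every toy_* name and both free_request_* functions are
hypothetical stand-ins, not i915 code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for a context and a request; not i915 types. */
struct toy_context {
	int refcount;	/* references held on the context */
	int unpins;	/* how many times the unpin step ran */
};

struct toy_request {
	struct toy_context *ctx;
};

static void toy_context_unreference(struct toy_context *ctx)
{
	assert(ctx->refcount > 0);
	ctx->refcount--;
}

/* Same shape as the hunk in this patch: the reference is dropped only
 * when execlists are enabled. */
static void free_request_v5_shape(struct toy_request *req,
				  bool enable_execlists)
{
	struct toy_context *ctx = req->ctx;

	if (enable_execlists && ctx) {
		ctx->unpins++;	/* stands in for the context unpin */
		toy_context_unreference(ctx);
	}
}

/* Assumed alternative: unpin only in execlists mode, but drop the
 * reference whenever the request has a context. */
static void free_request_unconditional(struct toy_request *req,
				       bool enable_execlists)
{
	struct toy_context *ctx = req->ctx;

	if (ctx) {
		if (enable_execlists)
			ctx->unpins++;
		toy_context_unreference(ctx);
	}
}
```

With execlists disabled, the first variant leaves the refcount untouched and
the second drops it, which is the behavior the pre-patch code had.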
Patch

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index e60d5c2..6eaf813 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -1799,10 +1799,16 @@  static int i915_dump_lrc(struct seq_file *m, void *unused)
 				continue;
 
 			if (ctx_obj) {
-				struct page *page = i915_gem_object_get_page(ctx_obj, 1);
-				uint32_t *reg_state = kmap_atomic(page);
+				struct page *page;
+				uint32_t *reg_state;
 				int j;
 
+				i915_gem_obj_ggtt_pin(ctx_obj,
+						GEN8_LR_CONTEXT_ALIGN, 0);
+
+				page = i915_gem_object_get_page(ctx_obj, 1);
+				reg_state = kmap_atomic(page);
+
 				seq_printf(m, "CONTEXT: %s %u\n", ring->name,
 						intel_execlists_ctx_id(ctx_obj));
 
@@ -1814,6 +1820,8 @@  static int i915_dump_lrc(struct seq_file *m, void *unused)
 				}
 				kunmap_atomic(reg_state);
 
+				i915_gem_object_ggtt_unpin(ctx_obj);
+
 				seq_putc(m, '\n');
 			}
 		}
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 059330c..3c7299d 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -655,6 +655,7 @@  struct intel_context {
 	struct {
 		struct drm_i915_gem_object *state;
 		struct intel_ringbuffer *ringbuf;
+		int unpin_count;
 	} engine[I915_NUM_RINGS];
 
 	struct list_head link;
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 408afe7..2ee6996 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2494,12 +2494,18 @@  static void i915_set_reset_status(struct drm_i915_private *dev_priv,
 
 static void i915_gem_free_request(struct drm_i915_gem_request *request)
 {
+	struct intel_context *ctx = request->ctx;
+
 	list_del(&request->list);
 	i915_gem_request_remove_from_client(request);
 
-	if (request->ctx)
-		i915_gem_context_unreference(request->ctx);
+	if (i915.enable_execlists && ctx) {
+		struct intel_engine_cs *ring = request->ring;
 
+		if (ctx != ring->default_context)
+			intel_lr_context_unpin(ring, ctx);
+		i915_gem_context_unreference(ctx);
+	}
 	kfree(request);
 }
 
@@ -2554,6 +2560,23 @@  static void i915_gem_reset_ring_cleanup(struct drm_i915_private *dev_priv,
 	}
 
 	/*
+	 * Clear the execlists queue up before freeing the requests, as those
+	 * are the ones that keep the context and ringbuffer backing objects
+	 * pinned in place.
+	 */
+	while (!list_empty(&ring->execlist_queue)) {
+		struct intel_ctx_submit_request *submit_req;
+
+		submit_req = list_first_entry(&ring->execlist_queue,
+				struct intel_ctx_submit_request,
+				execlist_link);
+		list_del(&submit_req->execlist_link);
+		intel_runtime_pm_put(dev_priv);
+		i915_gem_context_unreference(submit_req->ctx);
+		kfree(submit_req);
+	}
+
+	/*
 	 * We must free the requests after all the corresponding objects have
 	 * been moved off active lists. Which is the same order as the normal
 	 * retire_requests function does. This is important if object hold
@@ -2570,18 +2593,6 @@  static void i915_gem_reset_ring_cleanup(struct drm_i915_private *dev_priv,
 		i915_gem_free_request(request);
 	}
 
-	while (!list_empty(&ring->execlist_queue)) {
-		struct intel_ctx_submit_request *submit_req;
-
-		submit_req = list_first_entry(&ring->execlist_queue,
-				struct intel_ctx_submit_request,
-				execlist_link);
-		list_del(&submit_req->execlist_link);
-		intel_runtime_pm_put(dev_priv);
-		i915_gem_context_unreference(submit_req->ctx);
-		kfree(submit_req);
-	}
-
 	/* These may not have been flush before the reset, do so now */
 	kfree(ring->preallocated_lazy_request);
 	ring->preallocated_lazy_request = NULL;
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 906b985..f7fa0f7 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -139,8 +139,6 @@ 
 #define GEN8_LR_CONTEXT_RENDER_SIZE (20 * PAGE_SIZE)
 #define GEN8_LR_CONTEXT_OTHER_SIZE (2 * PAGE_SIZE)
 
-#define GEN8_LR_CONTEXT_ALIGN 4096
-
 #define RING_EXECLIST_QFULL		(1 << 0x2)
 #define RING_EXECLIST1_VALID		(1 << 0x3)
 #define RING_EXECLIST0_VALID		(1 << 0x4)
@@ -801,9 +799,40 @@  void intel_logical_ring_advance_and_submit(struct intel_ringbuffer *ringbuf)
 	execlists_context_queue(ring, ctx, ringbuf->tail);
 }
 
+static int intel_lr_context_pin(struct intel_engine_cs *ring,
+		struct intel_context *ctx)
+{
+	struct drm_i915_gem_object *ctx_obj = ctx->engine[ring->id].state;
+	int ret = 0;
+
+	WARN_ON(!mutex_is_locked(&ring->dev->struct_mutex));
+	if (ctx->engine[ring->id].unpin_count++ == 0) {
+		ret = i915_gem_obj_ggtt_pin(ctx_obj,
+				GEN8_LR_CONTEXT_ALIGN, 0);
+		if (ret)
+			ctx->engine[ring->id].unpin_count = 0;
+	}
+
+	return ret;
+}
+
+void intel_lr_context_unpin(struct intel_engine_cs *ring,
+		struct intel_context *ctx)
+{
+	struct drm_i915_gem_object *ctx_obj = ctx->engine[ring->id].state;
+
+	if (ctx_obj) {
+		WARN_ON(!mutex_is_locked(&ring->dev->struct_mutex));
+		if (--ctx->engine[ring->id].unpin_count == 0)
+			i915_gem_object_ggtt_unpin(ctx_obj);
+	}
+}
+
 static int logical_ring_alloc_seqno(struct intel_engine_cs *ring,
 				    struct intel_context *ctx)
 {
+	int ret;
+
 	if (ring->outstanding_lazy_seqno)
 		return 0;
 
@@ -814,6 +843,14 @@  static int logical_ring_alloc_seqno(struct intel_engine_cs *ring,
 		if (request == NULL)
 			return -ENOMEM;
 
+		if (ctx != ring->default_context) {
+			ret = intel_lr_context_pin(ring, ctx);
+			if (ret) {
+				kfree(request);
+				return ret;
+			}
+		}
+
 		/* Hold a reference to the context this request belongs to
 		 * (we will need it when the time comes to emit/retire the
 		 * request).
@@ -1626,12 +1663,16 @@  void intel_lr_context_free(struct intel_context *ctx)
 
 	for (i = 0; i < I915_NUM_RINGS; i++) {
 		struct drm_i915_gem_object *ctx_obj = ctx->engine[i].state;
-		struct intel_ringbuffer *ringbuf = ctx->engine[i].ringbuf;
 
 		if (ctx_obj) {
+			struct intel_ringbuffer *ringbuf =
+					ctx->engine[i].ringbuf;
+			struct intel_engine_cs *ring = ringbuf->ring;
+
 			intel_destroy_ringbuffer_obj(ringbuf);
 			kfree(ringbuf);
-			i915_gem_object_ggtt_unpin(ctx_obj);
+			if (ctx == ring->default_context)
+				i915_gem_object_ggtt_unpin(ctx_obj);
 			drm_gem_object_unreference(&ctx_obj->base);
 		}
 	}
@@ -1695,6 +1736,7 @@  static int lrc_setup_hardware_status_page(struct intel_engine_cs *ring,
 int intel_lr_context_deferred_create(struct intel_context *ctx,
 				     struct intel_engine_cs *ring)
 {
+	const bool is_global_default_ctx = (ctx == ring->default_context);
 	struct drm_device *dev = ring->dev;
 	struct drm_i915_gem_object *ctx_obj;
 	uint32_t context_size;
@@ -1714,18 +1756,22 @@  int intel_lr_context_deferred_create(struct intel_context *ctx,
 		return ret;
 	}
 
-	ret = i915_gem_obj_ggtt_pin(ctx_obj, GEN8_LR_CONTEXT_ALIGN, 0);
-	if (ret) {
-		DRM_DEBUG_DRIVER("Pin LRC backing obj failed: %d\n", ret);
-		drm_gem_object_unreference(&ctx_obj->base);
-		return ret;
+	if (is_global_default_ctx) {
+		ret = i915_gem_obj_ggtt_pin(ctx_obj, GEN8_LR_CONTEXT_ALIGN, 0);
+		if (ret) {
+			DRM_DEBUG_DRIVER("Pin LRC backing obj failed: %d\n",
+					ret);
+			drm_gem_object_unreference(&ctx_obj->base);
+			return ret;
+		}
 	}
 
 	ringbuf = kzalloc(sizeof(*ringbuf), GFP_KERNEL);
 	if (!ringbuf) {
 		DRM_DEBUG_DRIVER("Failed to allocate ringbuffer %s\n",
 				ring->name);
-		i915_gem_object_ggtt_unpin(ctx_obj);
+		if (is_global_default_ctx)
+			i915_gem_object_ggtt_unpin(ctx_obj);
 		drm_gem_object_unreference(&ctx_obj->base);
 		ret = -ENOMEM;
 		return ret;
@@ -1787,7 +1833,8 @@  int intel_lr_context_deferred_create(struct intel_context *ctx,
 
 error:
 	kfree(ringbuf);
-	i915_gem_object_ggtt_unpin(ctx_obj);
+	if (is_global_default_ctx)
+		i915_gem_object_ggtt_unpin(ctx_obj);
 	drm_gem_object_unreference(&ctx_obj->base);
 	return ret;
 }
diff --git a/drivers/gpu/drm/i915/intel_lrc.h b/drivers/gpu/drm/i915/intel_lrc.h
index 84bbf19..14b216b 100644
--- a/drivers/gpu/drm/i915/intel_lrc.h
+++ b/drivers/gpu/drm/i915/intel_lrc.h
@@ -24,6 +24,8 @@ 
 #ifndef _INTEL_LRC_H_
 #define _INTEL_LRC_H_
 
+#define GEN8_LR_CONTEXT_ALIGN 4096
+
 /* Execlists regs */
 #define RING_ELSP(ring)			((ring)->mmio_base+0x230)
 #define RING_EXECLIST_STATUS(ring)	((ring)->mmio_base+0x234)
@@ -67,6 +69,8 @@  int intel_lr_context_render_state_init(struct intel_engine_cs *ring,
 void intel_lr_context_free(struct intel_context *ctx);
 int intel_lr_context_deferred_create(struct intel_context *ctx,
 				     struct intel_engine_cs *ring);
+void intel_lr_context_unpin(struct intel_engine_cs *ring,
+		struct intel_context *ctx);
 
 /* Execlists */
 int intel_sanitize_enable_execlists(struct drm_device *dev, int enable_execlists);
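
The intel_lr_context_pin()/intel_lr_context_unpin() helpers added above follow
a common counting pattern: the first user performs the real pin, later users
only bump the count, and the last user performs the real unpin. Below is a
minimal userspace sketch of that pattern, assuming a single caller at a time
(the patch relies on struct_mutex being held for this). All toy_* names and
the fail_pin parameter are hypothetical; toy_ggtt_pin() and toy_ggtt_unpin()
merely stand in for i915_gem_obj_ggtt_pin() and i915_gem_object_ggtt_unpin().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a context backing object; not an i915 type. */
struct toy_ctx_obj {
	bool pinned;	/* stands in for "GGTT binding is held" */
	int pin_calls;	/* how many times the real pin was performed */
};

struct toy_ctx {
	struct toy_ctx_obj state;
	int unpin_count;	/* same role as engine[].unpin_count */
};

/* Stand-in for the real pin; fail_pin lets a caller simulate an error. */
static int toy_ggtt_pin(struct toy_ctx_obj *obj, bool fail_pin)
{
	if (fail_pin)
		return -1;
	obj->pinned = true;
	obj->pin_calls++;
	return 0;
}

static void toy_ggtt_unpin(struct toy_ctx_obj *obj)
{
	obj->pinned = false;
}

/* The first user pins; later users only increment the count. */
static int toy_context_pin(struct toy_ctx *ctx, bool fail_pin)
{
	int ret = 0;

	if (ctx->unpin_count++ == 0) {
		ret = toy_ggtt_pin(&ctx->state, fail_pin);
		if (ret)
			ctx->unpin_count = 0;	/* roll back on failure */
	}

	return ret;
}

/* The last user unpins. */
static void toy_context_unpin(struct toy_ctx *ctx)
{
	assert(ctx->unpin_count > 0);
	if (--ctx->unpin_count == 0)
		toy_ggtt_unpin(&ctx->state);
}
```

Two pins followed by two unpins perform one real pin and one real unpin, and a
failed first pin leaves the count at zero so a later attempt starts clean.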