From patchwork Wed Jul 15 08:51:41 2015
X-Patchwork-Submitter: sourab.gupta@intel.com
X-Patchwork-Id: 6794661
From: sourab.gupta@intel.com
To: intel-gfx@lists.freedesktop.org
Cc: Insoo Woo, Peter Zijlstra, Jabin Wu, Sourab Gupta
Date: Wed, 15 Jul 2015 14:21:41 +0530
Message-Id: <1436950306-14147-4-git-send-email-sourab.gupta@intel.com>
In-Reply-To: <1436950306-14147-1-git-send-email-sourab.gupta@intel.com>
References: <1436950306-14147-1-git-send-email-sourab.gupta@intel.com>
Subject: [Intel-gfx] [RFC 3/8] drm/i915: Handle event stop and destroy for GPU commands submitted

From: Sourab Gupta

This patch handles the event stop and destroy callbacks, taking into
account that there may be commands scheduled on the GPU which still
use the destination buffer.

The event stop just sets the event state and stops forwarding data to
userspace; from the userspace perspective, sampling is stopped for all
practical purposes. A subsequent event start (without an event destroy)
starts forwarding samples again.

The event destroy releases the local copy of the destination buffer.
Since an active reference on the buffer is taken while inserting
commands, we can rest assured that the buffer is freed only after the
GPU is done with it. We still need to schedule a worker from event
destroy, because further cleanup, such as releasing the request
references, has to be done there.

The ideal solution would be a callback when the last request is
finished on the GPU, so that this cleanup could be done there
(WIP: Chris' retire-notification mechanism). Until that is available,
a worker thread will do.

A subsequent event init would either have to wait for the previously
submitted RPC commands to complete, or return -EBUSY. Currently, for
the sake of simplicity, we return -EBUSY if such a case is detected.
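For illustration only (not part of this patch), below is a minimal
userspace sketch of how these paths are exercised through the standard
perf interface: PERF_EVENT_IOC_DISABLE maps to the event stop described
above, a following PERF_EVENT_IOC_ENABLE resumes sample forwarding, and
closing the fd triggers the event destroy. The PMU type, config and
sample period values are placeholders (the real PMU type would be
discovered at runtime), and reading samples from the perf ring buffer
is omitted.

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <linux/perf_event.h>

int main(void)
{
        struct perf_event_attr attr;
        int fd;

        memset(&attr, 0, sizeof(attr));
        attr.size = sizeof(attr);
        attr.type = 123;                /* placeholder: dynamic i915 gen PMU type */
        attr.config = 0;                /* placeholder: profiling configuration */
        attr.sample_period = 10000000;  /* placeholder: sampling period */

        /* system-wide event on cpu 0; needs CAP_SYS_ADMIN */
        fd = syscall(__NR_perf_event_open, &attr, -1, 0, -1, 0);
        if (fd < 0) {
                perror("perf_event_open");
                return 1;
        }

        ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);    /* event start: samples are forwarded */
        sleep(1);
        ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);   /* event stop: forwarding stops */
        ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);    /* restart without destroy: forwarding resumes */
        sleep(1);

        close(fd);      /* event destroy: buffer released once the GPU is done with it */
        return 0;
}

Opening a second event while a previous buffer is still being torn down
would hit the -EBUSY path described above.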
Signed-off-by: Sourab Gupta
---
 drivers/gpu/drm/i915/i915_drv.h     |   2 +
 drivers/gpu/drm/i915/i915_oa_perf.c | 104 +++++++++++++++++++++++++++++++-----
 2 files changed, 92 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 6984150..41a01bd 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1679,6 +1679,7 @@ struct i915_gen_pmu_node {
         struct list_head head;
         struct drm_i915_gem_request *req;
         u32 offset;
+        bool discard;
         u32 ctx_id;
 };
 
@@ -2008,6 +2009,7 @@ struct drm_i915_private {
                 } buffer;
                 struct list_head node_list;
                 struct work_struct work_timer;
+                struct work_struct work_event_destroy;
         } gen_pmu;
 
         void (*insert_profile_cmd[I915_PROFILE_MAX])
diff --git a/drivers/gpu/drm/i915/i915_oa_perf.c b/drivers/gpu/drm/i915/i915_oa_perf.c
index 350b560..107570e 100644
--- a/drivers/gpu/drm/i915/i915_oa_perf.c
+++ b/drivers/gpu/drm/i915/i915_oa_perf.c
@@ -441,6 +441,36 @@ int i915_gen_pmu_wait_gpu(struct drm_i915_private *dev_priv)
         return 0;
 }
 
+void i915_gen_pmu_release_request_ref(struct drm_i915_private *dev_priv)
+{
+        struct i915_gen_pmu_node *entry, *next;
+        struct drm_i915_gem_request *req;
+        unsigned long lock_flags;
+        int ret;
+
+        list_for_each_entry_safe
+                (entry, next, &dev_priv->gen_pmu.node_list, head) {
+                req = entry->req;
+                if (req) {
+                        ret = i915_mutex_lock_interruptible(dev_priv->dev);
+                        if (ret)
+                                break;
+                        i915_gem_request_assign(&entry->req, NULL);
+                        mutex_unlock(&dev_priv->dev->struct_mutex);
+                }
+
+                /*
+                 * This fn won't be running concurrently with forward snapshots
+                 * work fn. These are the only two places where list entries
+                 * will be deleted. So no need of protecting full loop?
+                 */
+                spin_lock_irqsave(&dev_priv->gen_pmu.lock, lock_flags);
+                list_del(&entry->head);
+                spin_unlock_irqrestore(&dev_priv->gen_pmu.lock, lock_flags);
+                kfree(entry);
+        }
+}
+
 static void forward_one_gen_pmu_sample(struct drm_i915_private *dev_priv,
                                 struct i915_gen_pmu_node *node)
 {
@@ -479,11 +509,19 @@ void forward_gen_pmu_snapshots_work(struct work_struct *__work)
         unsigned long lock_flags;
         int ret;
 
+        spin_lock_irqsave(&dev_priv->gen_pmu.lock, lock_flags);
+        if (dev_priv->gen_pmu.event_active == false) {
+                spin_unlock_irqrestore(&dev_priv->gen_pmu.lock, lock_flags);
+                return;
+        }
+        spin_unlock_irqrestore(&dev_priv->gen_pmu.lock, lock_flags);
+
         list_for_each_entry_safe
                 (entry, next, &dev_priv->gen_pmu.node_list, head) {
                 req = entry->req;
                 if (req && i915_gem_request_completed(req, true)) {
-                        forward_one_gen_pmu_sample(dev_priv, entry);
+                        if (!entry->discard)
+                                forward_one_gen_pmu_sample(dev_priv, entry);
                         ret = i915_mutex_lock_interruptible(dev_priv->dev);
                         if (ret)
                                 break;
@@ -667,18 +705,11 @@ out:
 
 static void gen_buffer_destroy(struct drm_i915_private *i915)
 {
-        unsigned long lock_flags;
-
         mutex_lock(&i915->dev->struct_mutex);
         vunmap(i915->gen_pmu.buffer.addr);
         i915_gem_object_ggtt_unpin(i915->gen_pmu.buffer.obj);
         drm_gem_object_unreference(&i915->gen_pmu.buffer.obj->base);
         mutex_unlock(&i915->dev->struct_mutex);
-
-        spin_lock_irqsave(&i915->gen_pmu.lock, lock_flags);
-        i915->gen_pmu.buffer.obj = NULL;
-        i915->gen_pmu.buffer.addr = NULL;
-        spin_unlock_irqrestore(&i915->gen_pmu.lock, lock_flags);
 }
 
 static void i915_gen_event_destroy(struct perf_event *event)
@@ -688,10 +719,44 @@ static void i915_gen_event_destroy(struct perf_event *event)
 
         WARN_ON(event->parent);
 
-        gen_buffer_destroy(i915);
+        cancel_work_sync(&i915->gen_pmu.work_timer);
 
         BUG_ON(i915->gen_pmu.exclusive_event != event);
         i915->gen_pmu.exclusive_event = NULL;
+
+        /* We can deference our local copy of dest buffer here, since
+         * an active reference of buffer would be taken while
+         * inserting commands. So the buffer would be freed up only
+         * after GPU is done with it.
+         */
+        gen_buffer_destroy(i915);
+
+        schedule_work(&i915->gen_pmu.work_event_destroy);
+}
+
+void i915_gen_pmu_event_destroy_work(struct work_struct *__work)
+{
+        struct drm_i915_private *dev_priv =
+                container_of(__work, typeof(*dev_priv),
+                        gen_pmu.work_event_destroy);
+        unsigned long lock_flags;
+        int ret;
+
+        ret = i915_gen_pmu_wait_gpu(dev_priv);
+        if (ret)
+                goto out;
+
+        i915_gen_pmu_release_request_ref(dev_priv);
+
+out:
+        /*
+         * Done at end, as this excludes a new event till we've done processing
+         * the old one
+         */
+        spin_lock_irqsave(&dev_priv->gen_pmu.lock, lock_flags);
+        dev_priv->gen_pmu.buffer.obj = NULL;
+        dev_priv->gen_pmu.buffer.addr = NULL;
+        spin_unlock_irqrestore(&dev_priv->gen_pmu.lock, lock_flags);
 }
 
 static int alloc_obj(struct drm_i915_private *dev_priv,
@@ -1412,6 +1477,7 @@ static int i915_gen_event_init(struct perf_event *event)
 {
         struct drm_i915_private *dev_priv =
                 container_of(event->pmu, typeof(*dev_priv), gen_pmu.pmu);
+        unsigned long lock_flags;
         int ret = 0;
 
         if (event->attr.type != event->pmu->type)
@@ -1420,8 +1486,12 @@ static int i915_gen_event_init(struct perf_event *event)
         /* To avoid the complexity of having to accurately filter
          * data and marshal to the appropriate client
          * we currently only allow exclusive access */
-        if (dev_priv->gen_pmu.buffer.obj)
+        spin_lock_irqsave(&dev_priv->gen_pmu.lock, lock_flags);
+        if (dev_priv->gen_pmu.buffer.obj) {
+                spin_unlock_irqrestore(&dev_priv->gen_pmu.lock, lock_flags);
                 return -EBUSY;
+        }
+        spin_unlock_irqrestore(&dev_priv->gen_pmu.lock, lock_flags);
 
         /*
          * We need to check for CAP_SYS_ADMIN capability as we profile all
@@ -1464,16 +1534,18 @@ static void i915_gen_event_stop(struct perf_event *event, int flags)
 {
         struct drm_i915_private *dev_priv =
                 container_of(event->pmu, typeof(*dev_priv), gen_pmu.pmu);
+        struct i915_gen_pmu_node *entry;
         unsigned long lock_flags;
 
+        hrtimer_cancel(&dev_priv->gen_pmu.timer);
+        gen_pmu_flush_snapshots(dev_priv);
+
         spin_lock_irqsave(&dev_priv->gen_pmu.lock, lock_flags);
         dev_priv->gen_pmu.event_active = false;
-
+        list_for_each_entry(entry, &dev_priv->gen_pmu.node_list, head)
+                entry->discard = true;
         spin_unlock_irqrestore(&dev_priv->gen_pmu.lock, lock_flags);
 
-        hrtimer_cancel(&dev_priv->gen_pmu.timer);
-        gen_pmu_flush_snapshots(dev_priv);
-
         event->hw.state = PERF_HES_STOPPED;
 }
 
@@ -1660,6 +1732,9 @@ void i915_gen_pmu_register(struct drm_device *dev)
         i915->gen_pmu.timer.function = hrtimer_sample_gen;
 
         INIT_WORK(&i915->gen_pmu.work_timer, forward_gen_pmu_snapshots_work);
+        INIT_WORK(&i915->gen_pmu.work_event_destroy,
+                        i915_gen_pmu_event_destroy_work);
+
         spin_lock_init(&i915->gen_pmu.lock);
 
         i915->gen_pmu.pmu.capabilities = PERF_PMU_CAP_IS_DEVICE;
@@ -1690,6 +1765,7 @@ void i915_gen_pmu_unregister(struct drm_device *dev)
                 return;
 
         cancel_work_sync(&i915->gen_pmu.work_timer);
+        cancel_work_sync(&i915->gen_pmu.work_event_destroy);
 
         perf_pmu_unregister(&i915->gen_pmu.pmu);
         i915->gen_pmu.pmu.event_init = NULL;