From: Sourab Gupta <sourab.gupta@intel.com>
To: intel-gfx@lists.freedesktop.org
Cc: Insoo Woo, Peter Zijlstra, Jabin Wu, Sourab Gupta
Date: Wed, 15 Jul 2015 14:16:57 +0530
Message-Id: <1436950023-13940-3-git-send-email-sourab.gupta@intel.com>
In-Reply-To: <1436950023-13940-1-git-send-email-sourab.gupta@intel.com>
References: <1436950023-13940-1-git-send-email-sourab.gupta@intel.com>
Subject: [Intel-gfx] [RFC 2/8] drm/i915: Introduce mode for capture of multi ctx OA reports synchronized with RCS

From: Sourab Gupta <sourab.gupta@intel.com>

This patch introduces a mode of capturing OA counter reports that belong to
multiple contexts, in which each report can be mapped back to its originating
context. OA reports captured in this way are synchronized with the render
command stream (RCS).

There are use cases that need more than the periodic OA capture mode currently
supported by perf_event; for these we need to insert RCS-synchronized commands
to capture the OA counter snapshots. This mode primarily serves two use cases:

- Ability to capture system-wide metrics, along with the ability to map the
  reports back to individual contexts.
- Ability to inject tags for work into the reports. This provides visibility
  into the multiple stages of work within a single context.
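To make the mapping concrete, a report forwarded in this mode could carry
per-report metadata along the following lines (a purely hypothetical sketch
for illustration; the actual footer contents and layout are only defined in
the subsequent patches of this series, and the names below are not part of
this patch):

	/* Hypothetical example only: metadata a report footer in this mode
	 * could carry so that userspace can attribute each OA report.
	 */
	struct oa_report_footer_example {
		__u32 ctx_id;	/* context the report belongs to */
		__u32 tag;	/* userspace-injected tag for a stage of work */
	};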
The OA reports generated in this way will be forwarded to userspace after
appending a footer carrying this metadata, which enables the use cases
mentioned above.

This patch introduces an additional field in the oa attr structure to request
this capture mode. The data captured in this mode needs to be stored in a
separate buffer, distinct from the buffer used for the periodic OA capture
mode. Unlike that buffer, its address does not have to be programmed into the
OA unit registers such as OASTATUS1, OASTATUS2 and OABUFFER.

The subsequent patches introduce the mechanism for forwarding these reports to
userspace, the handling of the command synchronization, and the insertion of
the corresponding commands into the ringbuffer.

Signed-off-by: Sourab Gupta <sourab.gupta@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h     |   9 ++
 drivers/gpu/drm/i915/i915_oa_perf.c | 173 +++++++++++++++++++++++++++---------
 include/uapi/drm/i915_drm.h         |   3 +-
 3 files changed, 143 insertions(+), 42 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index baa0234..740148d 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1930,6 +1930,7 @@ struct drm_i915_private {
 		bool event_active;
 		bool periodic;
+		bool multiple_ctx_mode;
 		u32 period_exponent;
 
 		u32 metrics_set;
@@ -1944,6 +1945,14 @@ struct drm_i915_private {
 			int format_size;
 			spinlock_t flush_lock;
 		} oa_buffer;
+
+		/* Fields for multiple context capture mode */
+		struct {
+			struct drm_i915_gem_object *obj;
+			u8 *addr;
+			int format;
+			int format_size;
+		} oa_rcs_buffer;
 	} oa_pmu;
 #endif
diff --git a/drivers/gpu/drm/i915/i915_oa_perf.c b/drivers/gpu/drm/i915/i915_oa_perf.c
index e7e0b2b..b79582b 100644
--- a/drivers/gpu/drm/i915/i915_oa_perf.c
+++ b/drivers/gpu/drm/i915/i915_oa_perf.c
@@ -166,19 +166,39 @@ static void flush_oa_snapshots(struct drm_i915_private *dev_priv,
 }
 
 static void
-oa_buffer_destroy(struct drm_i915_private *i915)
+oa_rcs_buffer_destroy(struct drm_i915_private *i915)
 {
+	unsigned long lock_flags;
+
 	mutex_lock(&i915->dev->struct_mutex);
+	vunmap(i915->oa_pmu.oa_rcs_buffer.addr);
+	i915_gem_object_ggtt_unpin(i915->oa_pmu.oa_rcs_buffer.obj);
+	drm_gem_object_unreference(&i915->oa_pmu.oa_rcs_buffer.obj->base);
+	mutex_unlock(&i915->dev->struct_mutex);
+	spin_lock_irqsave(&i915->oa_pmu.lock, lock_flags);
+	i915->oa_pmu.oa_rcs_buffer.obj = NULL;
+	i915->oa_pmu.oa_rcs_buffer.addr = NULL;
+	spin_unlock_irqrestore(&i915->oa_pmu.lock, lock_flags);
+}
+
+static void
+oa_buffer_destroy(struct drm_i915_private *i915)
+{
+	unsigned long lock_flags;
+
+	mutex_lock(&i915->dev->struct_mutex);
 	vunmap(i915->oa_pmu.oa_buffer.addr);
 	i915_gem_object_ggtt_unpin(i915->oa_pmu.oa_buffer.obj);
 	drm_gem_object_unreference(&i915->oa_pmu.oa_buffer.obj->base);
+	mutex_unlock(&i915->dev->struct_mutex);
 
+	spin_lock_irqsave(&i915->oa_pmu.lock, lock_flags);
 	i915->oa_pmu.oa_buffer.obj = NULL;
 	i915->oa_pmu.oa_buffer.gtt_offset = 0;
 	i915->oa_pmu.oa_buffer.addr = NULL;
+	spin_unlock_irqrestore(&i915->oa_pmu.lock, lock_flags);
 
-	mutex_unlock(&i915->dev->struct_mutex);
 }
 
 static void i915_oa_event_destroy(struct perf_event *event)
@@ -207,6 +227,9 @@ static void i915_oa_event_destroy(struct perf_event *event)
 	I915_WRITE(GDT_CHICKEN_BITS, (I915_READ(GDT_CHICKEN_BITS) &
 				      ~GT_NOA_ENABLE));
 
+	if (dev_priv->oa_pmu.multiple_ctx_mode)
+		oa_rcs_buffer_destroy(dev_priv);
+
 	oa_buffer_destroy(dev_priv);
 
 	BUG_ON(dev_priv->oa_pmu.exclusive_event != event);
@@ -216,6 +239,59 @@
 	intel_runtime_pm_put(dev_priv);
 }
 
+static int alloc_obj(struct drm_i915_private *dev_priv,
+		     struct drm_i915_gem_object **obj)
+{
+	struct drm_i915_gem_object *bo;
+	int ret;
+
+	/* NB: We over allocate the OA buffer due to the way raw sample data
+	 * gets copied from the gpu mapped circular buffer into the perf
+	 * circular buffer so that only one copy is required.
+	 *
+	 * For each perf sample (raw->size + 4) needs to be 8 byte aligned,
+	 * where the 4 corresponds to the 32bit raw->size member that's
+	 * added to the sample header that userspace sees.
+	 *
+	 * Due to the + 4 for the size member: when we copy a report to the
+	 * userspace facing perf buffer we always copy an additional 4 bytes
+	 * from the subsequent report to make up for the miss alignment, but
+	 * when a report is at the end of the gpu mapped buffer we need to
+	 * read 4 bytes past the end of the buffer.
+	 */
+	intel_runtime_pm_get(dev_priv);
+
+	ret = i915_mutex_lock_interruptible(dev_priv->dev);
+	if (ret)
+		goto out;
+
+	bo = i915_gem_alloc_object(dev_priv->dev, OA_BUFFER_SIZE + PAGE_SIZE);
+	if (bo == NULL) {
+		DRM_ERROR("Failed to allocate OA buffer\n");
+		ret = -ENOMEM;
+		goto unlock;
+	}
+	ret = i915_gem_object_set_cache_level(bo, I915_CACHE_LLC);
+	if (ret)
+		goto err_unref;
+
+	/* PreHSW required 512K alignment, HSW requires 16M */
+	ret = i915_gem_obj_ggtt_pin(bo, SZ_16M, 0);
+	if (ret)
+		goto err_unref;
+
+	*obj = bo;
+	goto unlock;
+
+err_unref:
+	drm_gem_object_unreference(&bo->base);
+unlock:
+	mutex_unlock(&dev_priv->dev->struct_mutex);
+out:
+	intel_runtime_pm_put(dev_priv);
+	return ret;
+}
+
 static void *vmap_oa_buffer(struct drm_i915_gem_object *obj)
 {
 	int i;
@@ -257,42 +333,13 @@ static int init_oa_buffer(struct perf_event *event)
 	BUG_ON(!IS_HASWELL(dev_priv->dev));
 	BUG_ON(dev_priv->oa_pmu.oa_buffer.obj);
 
-	ret = i915_mutex_lock_interruptible(dev_priv->dev);
-	if (ret)
-		return ret;
-
 	spin_lock_init(&dev_priv->oa_pmu.oa_buffer.flush_lock);
 
-	/* NB: We over allocate the OA buffer due to the way raw sample data
-	 * gets copied from the gpu mapped circular buffer into the perf
-	 * circular buffer so that only one copy is required.
-	 *
-	 * For each perf sample (raw->size + 4) needs to be 8 byte aligned,
-	 * where the 4 corresponds to the 32bit raw->size member that's
-	 * added to the sample header that userspace sees.
-	 *
-	 * Due to the + 4 for the size member: when we copy a report to the
-	 * userspace facing perf buffer we always copy an additional 4 bytes
-	 * from the subsequent report to make up for the miss alignment, but
-	 * when a report is at the end of the gpu mapped buffer we need to
-	 * read 4 bytes past the end of the buffer.
-	 */
-	bo = i915_gem_alloc_object(dev_priv->dev, OA_BUFFER_SIZE + PAGE_SIZE);
-	if (bo == NULL) {
-		DRM_ERROR("Failed to allocate OA buffer\n");
-		ret = -ENOMEM;
-		goto unlock;
-	}
-	dev_priv->oa_pmu.oa_buffer.obj = bo;
-
-	ret = i915_gem_object_set_cache_level(bo, I915_CACHE_LLC);
+	ret = alloc_obj(dev_priv, &bo);
 	if (ret)
-		goto err_unref;
+		return ret;
 
-	/* PreHSW required 512K alignment, HSW requires 16M */
-	ret = i915_gem_obj_ggtt_pin(bo, SZ_16M, 0);
-	if (ret)
-		goto err_unref;
+	dev_priv->oa_pmu.oa_buffer.obj = bo;
 
 	dev_priv->oa_pmu.oa_buffer.gtt_offset = i915_gem_obj_ggtt_offset(bo);
 	dev_priv->oa_pmu.oa_buffer.addr = vmap_oa_buffer(bo);
@@ -309,14 +356,30 @@ static int init_oa_buffer(struct perf_event *event)
 			  dev_priv->oa_pmu.oa_buffer.gtt_offset,
 			  dev_priv->oa_pmu.oa_buffer.addr);
 
-	goto unlock;
+	return 0;
+}
 
-err_unref:
-	drm_gem_object_unreference(&bo->base);
+static int init_oa_rcs_buffer(struct perf_event *event)
+{
+	struct drm_i915_private *dev_priv =
+		container_of(event->pmu, typeof(*dev_priv), oa_pmu.pmu);
+	struct drm_i915_gem_object *bo;
+	int ret;
 
-unlock:
-	mutex_unlock(&dev_priv->dev->struct_mutex);
-	return ret;
+	BUG_ON(dev_priv->oa_pmu.oa_rcs_buffer.obj);
+
+	ret = alloc_obj(dev_priv, &bo);
+	if (ret)
+		return ret;
+
+	dev_priv->oa_pmu.oa_rcs_buffer.obj = bo;
+
+	dev_priv->oa_pmu.oa_rcs_buffer.addr = vmap_oa_buffer(bo);
+
+	DRM_DEBUG_DRIVER("OA RCS Buffer initialized, vaddr = %p",
+			 dev_priv->oa_pmu.oa_rcs_buffer.addr);
+
+	return 0;
 }
 
 static enum hrtimer_restart hrtimer_sample(struct hrtimer *hrtimer)
@@ -427,6 +490,7 @@ static int i915_oa_event_init(struct perf_event *event)
 		container_of(event->pmu, typeof(*dev_priv), oa_pmu.pmu);
 	drm_i915_oa_attr_t oa_attr;
 	u64 report_format;
+	unsigned long lock_flags;
 	int ret = 0;
 
 	if (event->attr.type != event->pmu->type)
@@ -439,11 +503,28 @@ static int i915_oa_event_init(struct perf_event *event)
 	/* To avoid the complexity of having to accurately filter
 	 * counter snapshots and marshal to the appropriate client
 	 * we currently only allow exclusive access */
-	if (dev_priv->oa_pmu.oa_buffer.obj)
+	spin_lock_irqsave(&dev_priv->oa_pmu.lock, lock_flags);
+	if (dev_priv->oa_pmu.oa_buffer.obj) {
+		spin_unlock_irqrestore(&dev_priv->oa_pmu.lock, lock_flags);
 		return -EBUSY;
+	}
+	spin_unlock_irqrestore(&dev_priv->oa_pmu.lock, lock_flags);
+
+	/*
+	 * In case of multiple context mode, we need to check for
+	 * CAP_SYS_ADMIN capability as we need to profile all the running
+	 * contexts
+	 */
+	if (oa_attr.multiple_context_mode) {
+		if (!capable(CAP_SYS_ADMIN))
+			return -EACCES;
+		dev_priv->oa_pmu.multiple_ctx_mode = true;
+	}
 
 	report_format = oa_attr.format;
 	dev_priv->oa_pmu.oa_buffer.format = report_format;
+	if (oa_attr.multiple_context_mode)
+		dev_priv->oa_pmu.oa_rcs_buffer.format = report_format;
 	dev_priv->oa_pmu.metrics_set = oa_attr.metrics_set;
 
 	if (IS_HASWELL(dev_priv->dev)) {
@@ -457,6 +538,9 @@ static int i915_oa_event_init(struct perf_event *event)
 			return -EINVAL;
 
 		dev_priv->oa_pmu.oa_buffer.format_size = snapshot_size;
+		if (oa_attr.multiple_context_mode)
+			dev_priv->oa_pmu.oa_rcs_buffer.format_size =
+				snapshot_size;
 
 		if (oa_attr.metrics_set > I915_OA_METRICS_SET_MAX)
 			return -EINVAL;
@@ -465,6 +549,7 @@ static int i915_oa_event_init(struct perf_event *event)
 		return -ENODEV;
 	}
 
+
 	/* Since we are limited to an exponential scale for
 	 * programming the OA sampling period we don't allow userspace
 	 * to pass a precise attr.sample_period.
 	 */
@@ -528,6 +613,12 @@ static int i915_oa_event_init(struct perf_event *event)
 	if (ret)
 		return ret;
 
+	if (oa_attr.multiple_context_mode) {
+		ret = init_oa_rcs_buffer(event);
+		if (ret)
+			return ret;
+	}
+
 	BUG_ON(dev_priv->oa_pmu.exclusive_event);
 	dev_priv->oa_pmu.exclusive_event = event;
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 992e1e9..dcf7c87 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -92,7 +92,8 @@ typedef struct _drm_i915_oa_attr {
 	__u32 ctx_id;
 
 	__u64 single_context : 1,
-	      __reserved_1 : 63;
+	      multiple_context_mode:1,
+	      __reserved_1:62;
 } drm_i915_oa_attr_t;
 
 /* Header for PERF_RECORD_DEVICE type events */
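
For completeness, a rough userspace-side sketch of how the new attribute bit
added above could be used (purely illustrative and not part of the patch; the
report format and metrics set values below are placeholders, and how the
filled-in drm_i915_oa_attr_t is then handed to the perf_event interface
follows the conventions of the rest of this RFC series rather than anything
defined here):

	#include <string.h>
	#include <drm/i915_drm.h>	/* uapi header patched above */

	/* Hypothetical sketch: request the multi-context, RCS-synchronized OA
	 * capture mode. Opening the event in this mode requires CAP_SYS_ADMIN,
	 * since reports from all running contexts are captured.
	 */
	static void example_fill_oa_attr(drm_i915_oa_attr_t *oa_attr)
	{
		memset(oa_attr, 0, sizeof(*oa_attr));
		oa_attr->format = 0;			/* placeholder: a valid OA report format */
		oa_attr->metrics_set = 1;		/* placeholder: a valid metrics set */
		oa_attr->multiple_context_mode = 1;	/* new bit added by this patch */
	}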