From patchwork Tue Sep 29 14:39:07 2015
X-Patchwork-Submitter: Robert Bragg
X-Patchwork-Id: 7286641
From: Robert Bragg
To: intel-gfx@lists.freedesktop.org
Cc: Mark Rutland, Matt Fleming, David Airlie, dri-devel@lists.freedesktop.org, linux-kernel@vger.kernel.org, Peter Zijlstra, Sourab Gupta, linux-api@vger.kernel.org, Zheng Yan, Daniel Vetter, Ingo Molnar, Alexander Shishkin
Date: Tue, 29 Sep 2015 15:39:07 +0100
Message-Id: <1443537549-6905-5-git-send-email-robert@sixbynine.org>
In-Reply-To: <1443537549-6905-1-git-send-email-robert@sixbynine.org>
References: <1443537549-6905-1-git-send-email-robert@sixbynine.org>
Subject: [Intel-gfx] [RFC 4/6] drm/i915: Add i915 perf event for Haswell OA unit

Gen graphics hardware can be set up to periodically write snapshots of performance counters into a circular buffer via its Observation Architecture and this patch exposes that capability to userspace via the i915 perf interface.
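As a rough illustration of the circular-buffer bookkeeping described above, here is a minimal standalone sketch; it is not code from the patch. oa_taken() mirrors the arithmetic of the patch's OA_TAKEN() macro, while oa_complete_reports() and oa_period_ns() are hypothetical helper names introduced only for this example. The period formula, 80ns * 2^(exponent + 1), is taken from the comment in i915_oa_event_init().

```c
#include <stdint.h>

/* Same size as the patch's OA_BUFFER_SIZE (SZ_16M); must be a power of two. */
#define OA_BUFFER_SIZE (16u * 1024u * 1024u)

/*
 * Bytes written by the hardware but not yet consumed. Because the size is
 * a power of two, masking the unsigned difference stays correct even when
 * the tail has wrapped around and is numerically below the head.
 */
static uint32_t oa_taken(uint32_t tail, uint32_t head)
{
	return (tail - head) & (OA_BUFFER_SIZE - 1);
}

/*
 * Hypothetical helper: number of whole reports available. The patch notes
 * that the tail advances in 64 byte steps rather than in report-size
 * steps, so a partially written report must not be counted.
 */
static uint32_t oa_complete_reports(uint32_t tail, uint32_t head,
				    uint32_t report_size)
{
	return oa_taken(tail, head) / report_size;
}

/*
 * Hypothetical helper: sampling period in nanoseconds for a given timer
 * exponent, i.e. 80ns * 2^(exponent + 1). Returns 0 for exponents whose
 * period would not fit in 64 bits (an illustrative limit of this sketch,
 * not a hardware limit).
 */
static uint64_t oa_period_ns(uint32_t exponent)
{
	if (exponent > 56)
		return 0;
	return (uint64_t)80 << (exponent + 1);
}
```

With these definitions, oa_period_ns(6) evaluates to 10240 ns, which matches the 10.240 microsecond figure quoted in the patch, and oa_taken(0x40, 0xFFFFC0) evaluates to 128 for a tail that has wrapped past the end of the 16M buffer.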
Cc: Chris Wilson Signed-off-by: Robert Bragg Signed-off-by: Zhenyu Wang --- drivers/gpu/drm/i915/i915_drv.h | 57 +++ drivers/gpu/drm/i915/i915_gem_context.c | 23 +- drivers/gpu/drm/i915/i915_perf.c | 697 +++++++++++++++++++++++++++++++- drivers/gpu/drm/i915/i915_reg.h | 338 ++++++++++++++++ include/uapi/drm/i915_drm.h | 63 +++ 5 files changed, 1171 insertions(+), 7 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 0cb36d9..d6db816 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -1694,6 +1694,11 @@ struct i915_execbuffer_params { struct drm_i915_gem_request *request; }; +struct i915_oa_format { + u32 format; + int size; +}; + struct i915_oa_reg { u32 addr; u32 value; @@ -1760,6 +1765,20 @@ struct i915_perf_event { void (*destroy)(struct i915_perf_event *event); }; +struct i915_oa_ops { + void (*init_oa_buffer)(struct drm_i915_private *dev_priv); + void (*enable_metric_set)(struct drm_i915_private *dev_priv); + void (*disable_metric_set)(struct drm_i915_private *dev_priv); + void (*oa_enable)(struct drm_i915_private *dev_priv); + void (*oa_disable)(struct drm_i915_private *dev_priv); + void (*update_oacontrol)(struct drm_i915_private *dev_priv); + void (*update_specific_hw_ctx_id)(struct drm_i915_private *dev_priv, + u32 ctx_id); + void (*read)(struct i915_perf_event *event, + struct i915_perf_read_state *read_state); + bool (*oa_buffer_is_empty)(struct drm_i915_private *dev_priv); +}; + struct drm_i915_private { struct drm_device *dev; struct kmem_cache *objects; @@ -1996,7 +2015,43 @@ struct drm_i915_private { struct { bool initialized; + struct mutex lock; + + struct ctl_table_header *sysctl_header; + + struct { + struct i915_perf_event *exclusive_event; + + u32 specific_ctx_id; + + struct hrtimer poll_check_timer; + wait_queue_head_t poll_wq; + + bool periodic; + u32 period_exponent; + + u32 metrics_set; + + const struct i915_oa_reg *mux_regs; + int mux_regs_len; + const struct 
i915_oa_reg *b_counter_regs; + int b_counter_regs_len; + + struct { + struct drm_i915_gem_object *obj; + u32 gtt_offset; + u8 *addr; + u32 head; + u32 tail; + int format; + int format_size; + } oa_buffer; + + struct i915_oa_ops ops; + const struct i915_oa_format *oa_formats; + } oa; + struct list_head events; } perf; @@ -3204,6 +3259,8 @@ int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data, int i915_perf_open_ioctl(struct drm_device *dev, void *data, struct drm_file *file); +void i915_oa_context_pin_notify(struct drm_i915_private *dev_priv, + struct intel_context *context); /* i915_gem_evict.c */ int __must_check i915_gem_evict_something(struct drm_device *dev, diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c index 8e893b3..3c4419c 100644 --- a/drivers/gpu/drm/i915/i915_gem_context.c +++ b/drivers/gpu/drm/i915/i915_gem_context.c @@ -133,6 +133,23 @@ static int get_context_size(struct drm_device *dev) return ret; } +static int i915_gem_context_pin_state(struct drm_device *dev, + struct intel_context *ctx) +{ + int ret; + + BUG_ON(!mutex_is_locked(&dev->struct_mutex)); + + ret = i915_gem_obj_ggtt_pin(ctx->legacy_hw_ctx.rcs_state, + get_context_alignment(dev), 0); + if (ret) + return ret; + + i915_oa_context_pin_notify(dev->dev_private, ctx); + + return 0; +} + void i915_gem_context_free(struct kref *ctx_ref) { struct intel_context *ctx = container_of(ctx_ref, typeof(*ctx), ref); @@ -258,8 +275,7 @@ i915_gem_create_context(struct drm_device *dev, * be available. To avoid this we always pin the default * context. */ - ret = i915_gem_obj_ggtt_pin(ctx->legacy_hw_ctx.rcs_state, - get_context_alignment(dev), 0); + ret = i915_gem_context_pin_state(dev, ctx); if (ret) { DRM_DEBUG_DRIVER("Couldn't pin %d\n", ret); goto err_destroy; @@ -634,8 +650,7 @@ static int do_switch(struct drm_i915_gem_request *req) /* Trying to pin first makes error handling easier. 
*/ if (ring == &dev_priv->ring[RCS]) { - ret = i915_gem_obj_ggtt_pin(to->legacy_hw_ctx.rcs_state, - get_context_alignment(ring->dev), 0); + ret = i915_gem_context_pin_state(ring->dev, to); if (ret) return ret; } diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c index 477e3e6..bc1c4d1 100644 --- a/drivers/gpu/drm/i915/i915_perf.c +++ b/drivers/gpu/drm/i915/i915_perf.c @@ -26,6 +26,31 @@ #include #include "i915_drv.h" +#include "intel_ringbuffer.h" +#include "intel_lrc.h" +#include "i915_oa_hsw.h" + +/* Must be a power of two */ +#define OA_BUFFER_SIZE SZ_16M +#define OA_TAKEN(tail, head) ((tail - head) & (OA_BUFFER_SIZE - 1)) + +/* frequency for forwarding samples from OA to perf buffer */ +#define POLL_FREQUENCY 200 +#define POLL_PERIOD max_t(u64, 10000, NSEC_PER_SEC / POLL_FREQUENCY) + +#define OA_EXPONENT_MAX 0x3f + +static struct i915_oa_format hsw_oa_formats[I915_OA_FORMAT_MAX] = { + [I915_OA_FORMAT_A13] = { 0, 64 }, + [I915_OA_FORMAT_A29] = { 1, 128 }, + [I915_OA_FORMAT_A13_B8_C8] = { 2, 128 }, + /* A29_B8_C8 Disallowed as 192 bytes doesn't factor into buffer size */ + [I915_OA_FORMAT_B4_C8] = { 4, 64 }, + [I915_OA_FORMAT_A45_B8_C8] = { 5, 256 }, + [I915_OA_FORMAT_B4_C8_A16] = { 6, 128 }, + [I915_OA_FORMAT_C4_B8] = { 7, 64 }, +}; + /** * i915_perf_copy_attr() - copy specific event attributes from userspace @@ -107,6 +132,634 @@ err_size: goto out; } + +static bool gen7_oa_buffer_is_empty(struct drm_i915_private *dev_priv) +{ + u32 oastatus2 = I915_READ(GEN7_OASTATUS2); + u32 oastatus1 = I915_READ(GEN7_OASTATUS1); + u32 head = oastatus2 & GEN7_OASTATUS2_HEAD_MASK; + u32 tail = oastatus1 & GEN7_OASTATUS1_TAIL_MASK; + + return OA_TAKEN(tail, head) == 0; +} + +static bool append_oa_status(struct i915_perf_event *event, + struct i915_perf_read_state *read_state, + enum drm_i915_perf_record_type type) +{ + struct drm_i915_perf_event_header header = { type, 0, sizeof(header) }; + + if ((read_state->count - read_state->read) < 
header.size) + return false; + + copy_to_user(read_state->buf, &header, sizeof(header)); + + read_state->buf += sizeof(header); + read_state->read += header.size; + + return true; +} + +static bool append_oa_sample(struct i915_perf_event *event, + struct i915_perf_read_state *read_state, + const u8 *report) +{ + struct drm_i915_private *dev_priv = event->dev_priv; + int report_size = dev_priv->perf.oa.oa_buffer.format_size; + struct drm_i915_perf_event_header header; + u32 sample_flags = event->sample_flags; + u32 dummy_ctx_id = 0; + u32 dummy_timestamp = 0; + + header.type = DRM_I915_PERF_RECORD_SAMPLE; + header.misc = 0; + header.size = sizeof(header); + + + /* XXX: could pre-compute this when opening the event... */ + + if (sample_flags & I915_PERF_SAMPLE_CTXID) + header.size += 4; + + if (sample_flags & I915_PERF_SAMPLE_TIMESTAMP) + header.size += 4; + + if (sample_flags & I915_PERF_SAMPLE_OA_REPORT) + header.size += report_size; + + + if ((read_state->count - read_state->read) < header.size) + return false; + + + copy_to_user(read_state->buf, &header, sizeof(header)); + read_state->buf += sizeof(header); + + if (sample_flags & I915_PERF_SAMPLE_CTXID) { +#warning "fixme: extract context ID from OA reports" + copy_to_user(read_state->buf, &dummy_ctx_id, 4); + read_state->buf += 4; + } + + if (sample_flags & I915_PERF_SAMPLE_TIMESTAMP) { +#warning "fixme: extract timestamp from OA reports" + copy_to_user(read_state->buf, &dummy_timestamp, 4); + read_state->buf += 4; + } + + if (sample_flags & I915_PERF_SAMPLE_OA_REPORT) { + copy_to_user(read_state->buf, report, report_size); + read_state->buf += report_size; + } + + + read_state->read += header.size; + + return true; +} + +static u32 gen7_append_oa_reports(struct i915_perf_event *event, + struct i915_perf_read_state *read_state, + u32 head, + u32 tail) +{ + struct drm_i915_private *dev_priv = event->dev_priv; + int report_size = dev_priv->perf.oa.oa_buffer.format_size; + u8 *oa_buf_base = 
dev_priv->perf.oa.oa_buffer.addr; + u32 mask = (OA_BUFFER_SIZE - 1); + u8 *report; + u32 taken; + + head -= dev_priv->perf.oa.oa_buffer.gtt_offset; + tail -= dev_priv->perf.oa.oa_buffer.gtt_offset; + + /* Note: the gpu doesn't wrap the tail according to the OA buffer size + * so when we need to make sure our head/tail values are in-bounds we + * use the above mask. + */ + + while ((taken = OA_TAKEN(tail, head))) { + /* The tail increases in 64 byte increments, not in + * format_size steps. */ + if (taken < report_size) + break; + + report = oa_buf_base + (head & mask); + + if (dev_priv->perf.oa.exclusive_event->enabled) { + if (!append_oa_sample(event, read_state, report)) + break; + } + + /* If append_oa_sample() returns false we shouldn't progress + * head so we update it afterwards... */ + head += report_size; + } + + return dev_priv->perf.oa.oa_buffer.gtt_offset + head; +} + +static void gen7_oa_read(struct i915_perf_event *event, + struct i915_perf_read_state *read_state) +{ + struct drm_i915_private *dev_priv = event->dev_priv; + u32 oastatus2; + u32 oastatus1; + u32 head; + u32 tail; + + WARN_ON(!dev_priv->perf.oa.oa_buffer.addr); + + oastatus2 = I915_READ(GEN7_OASTATUS2); + oastatus1 = I915_READ(GEN7_OASTATUS1); + + head = oastatus2 & GEN7_OASTATUS2_HEAD_MASK; + tail = oastatus1 & GEN7_OASTATUS1_TAIL_MASK; + + if (unlikely(oastatus1 & (GEN7_OASTATUS1_OABUFFER_OVERFLOW | + GEN7_OASTATUS1_REPORT_LOST))) { + + if (oastatus1 & GEN7_OASTATUS1_OABUFFER_OVERFLOW) { + if (append_oa_status(event, read_state, + DRM_I915_PERF_RECORD_OA_BUFFER_OVERFLOW)) + oastatus1 &= ~GEN7_OASTATUS1_OABUFFER_OVERFLOW; + } + + if (oastatus1 & GEN7_OASTATUS1_REPORT_LOST) { + if (append_oa_status(event, read_state, + DRM_I915_PERF_RECORD_OA_REPORT_LOST)) + oastatus1 &= ~GEN7_OASTATUS1_REPORT_LOST; + } + + I915_WRITE(GEN7_OASTATUS1, oastatus1); + } + + head = gen7_append_oa_reports(event, read_state, head, tail); + + I915_WRITE(GEN7_OASTATUS2, (head & GEN7_OASTATUS2_HEAD_MASK) | + 
OA_MEM_SELECT_GGTT); +} + +static bool i915_oa_can_read(struct i915_perf_event *event) +{ + struct drm_i915_private *dev_priv = event->dev_priv; + + return !dev_priv->perf.oa.ops.oa_buffer_is_empty(dev_priv); +} + +static int i915_oa_wait_unlocked(struct i915_perf_event *event) +{ + struct drm_i915_private *dev_priv = event->dev_priv; + + /* Note: the oa_buffer_is_empty() condition is ok to run unlocked as it + * just performs mmio reads of the OA buffer head + tail pointers and + * it's assumed we're handling some operation that implies the event + * can't be destroyed until completion (such as a read()) that ensures + * the device + OA buffer can't disappear + */ + return wait_event_interruptible(dev_priv->perf.oa.poll_wq, + !dev_priv->perf.oa.ops.oa_buffer_is_empty(dev_priv)); +} + +static void i915_oa_poll_wait(struct i915_perf_event *event, + struct file *file, + poll_table *wait) +{ + struct drm_i915_private *dev_priv = event->dev_priv; + + poll_wait(file, &dev_priv->perf.oa.poll_wq, wait); +} + +static void i915_oa_read(struct i915_perf_event *event, + struct i915_perf_read_state *read_state) +{ + struct drm_i915_private *dev_priv = event->dev_priv; + + dev_priv->perf.oa.ops.read(event, read_state); +} + +static void +free_oa_buffer(struct drm_i915_private *i915) +{ + mutex_lock(&i915->dev->struct_mutex); + + vunmap(i915->perf.oa.oa_buffer.addr); + i915_gem_object_ggtt_unpin(i915->perf.oa.oa_buffer.obj); + drm_gem_object_unreference(&i915->perf.oa.oa_buffer.obj->base); + + i915->perf.oa.oa_buffer.obj = NULL; + i915->perf.oa.oa_buffer.gtt_offset = 0; + i915->perf.oa.oa_buffer.addr = NULL; + + mutex_unlock(&i915->dev->struct_mutex); +} + +static void i915_oa_event_destroy(struct i915_perf_event *event) +{ + struct drm_i915_private *dev_priv = event->dev_priv; + + BUG_ON(event != dev_priv->perf.oa.exclusive_event); + + dev_priv->perf.oa.ops.disable_metric_set(dev_priv); + + free_oa_buffer(dev_priv); + + intel_uncore_forcewake_put(dev_priv, FORCEWAKE_ALL); + 
intel_runtime_pm_put(dev_priv); + + dev_priv->perf.oa.exclusive_event = NULL; +} + +static void *vmap_oa_buffer(struct drm_i915_gem_object *obj) +{ + int i; + void *addr = NULL; + struct sg_page_iter sg_iter; + struct page **pages; + + pages = drm_malloc_ab(obj->base.size >> PAGE_SHIFT, sizeof(*pages)); + if (pages == NULL) { + DRM_DEBUG_DRIVER("Failed to get space for pages\n"); + goto finish; + } + + i = 0; + for_each_sg_page(obj->pages->sgl, &sg_iter, obj->pages->nents, 0) { + pages[i] = sg_page_iter_page(&sg_iter); + i++; + } + + addr = vmap(pages, i, 0, PAGE_KERNEL); + if (addr == NULL) { + DRM_DEBUG_DRIVER("Failed to vmap pages\n"); + goto finish; + } + +finish: + if (pages) + drm_free_large(pages); + return addr; +} + +static void gen7_init_oa_buffer(struct drm_i915_private *dev_priv) +{ + /* Pre-DevBDW: OABUFFER must be set with counters off, + * before OASTATUS1, but after OASTATUS2 */ + I915_WRITE(GEN7_OASTATUS2, dev_priv->perf.oa.oa_buffer.gtt_offset | + OA_MEM_SELECT_GGTT); /* head */ + I915_WRITE(GEN7_OABUFFER, dev_priv->perf.oa.oa_buffer.gtt_offset); + I915_WRITE(GEN7_OASTATUS1, dev_priv->perf.oa.oa_buffer.gtt_offset | + OABUFFER_SIZE_16M); /* tail */ +} + +static int alloc_oa_buffer(struct drm_i915_private *dev_priv) +{ + struct drm_i915_gem_object *bo; + int ret; + + BUG_ON(dev_priv->perf.oa.oa_buffer.obj); + + ret = i915_mutex_lock_interruptible(dev_priv->dev); + if (ret) + return ret; + + bo = i915_gem_alloc_object(dev_priv->dev, OA_BUFFER_SIZE); + if (bo == NULL) { + DRM_ERROR("Failed to allocate OA buffer\n"); + ret = -ENOMEM; + goto unlock; + } + dev_priv->perf.oa.oa_buffer.obj = bo; + + ret = i915_gem_object_set_cache_level(bo, I915_CACHE_LLC); + if (ret) + goto err_unref; + + /* PreHSW required 512K alignment, HSW requires 16M */ + ret = i915_gem_obj_ggtt_pin(bo, SZ_16M, 0); + if (ret) + goto err_unref; + + dev_priv->perf.oa.oa_buffer.gtt_offset = i915_gem_obj_ggtt_offset(bo); + dev_priv->perf.oa.oa_buffer.addr = vmap_oa_buffer(bo); + + 
dev_priv->perf.oa.ops.init_oa_buffer(dev_priv); + + DRM_DEBUG_DRIVER("OA Buffer initialized, gtt offset = 0x%x, vaddr = %p", + dev_priv->perf.oa.oa_buffer.gtt_offset, + dev_priv->perf.oa.oa_buffer.addr); + + goto unlock; + +err_unref: + drm_gem_object_unreference(&bo->base); + +unlock: + mutex_unlock(&dev_priv->dev->struct_mutex); + return ret; +} + +static void config_oa_regs(struct drm_i915_private *dev_priv, + const struct i915_oa_reg *regs, + int n_regs) +{ + int i; + + for (i = 0; i < n_regs; i++) { + const struct i915_oa_reg *reg = regs + i; + + I915_WRITE(reg->addr, reg->value); + } +} + +static void hsw_enable_metric_set(struct drm_i915_private *dev_priv) +{ + dev_priv->perf.oa.mux_regs = NULL; + dev_priv->perf.oa.mux_regs_len = 0; + dev_priv->perf.oa.b_counter_regs = NULL; + dev_priv->perf.oa.b_counter_regs_len = 0; + + I915_WRITE(GDT_CHICKEN_BITS, GT_NOA_ENABLE); + + /* PRM: + * + * OA unit is using “crclk” for its functionality. When trunk + * level clock gating takes place, OA clock would be gated, + * unable to count the events from non-render clock domain. + * Render clock gating must be disabled when OA is enabled to + * count the events from non-render domain. Unit level clock + * gating for RCS should also be disabled. 
+ */ + I915_WRITE(GEN7_MISCCPCTL, (I915_READ(GEN7_MISCCPCTL) & + ~GEN7_DOP_CLOCK_GATE_ENABLE)); + I915_WRITE(GEN6_UCGCTL1, (I915_READ(GEN6_UCGCTL1) | + GEN6_CSUNIT_CLOCK_GATE_DISABLE)); + + switch (dev_priv->perf.oa.metrics_set) { + case I915_OA_METRICS_SET_3D: + config_oa_regs(dev_priv, i915_oa_3d_mux_config_hsw, + i915_oa_3d_mux_config_hsw_len); + config_oa_regs(dev_priv, i915_oa_3d_b_counter_config_hsw, + i915_oa_3d_b_counter_config_hsw_len); + break; + default: + BUG(); + } +} + +static void hsw_disable_metric_set(struct drm_i915_private *dev_priv) +{ + I915_WRITE(GEN6_UCGCTL1, (I915_READ(GEN6_UCGCTL1) & + ~GEN6_CSUNIT_CLOCK_GATE_DISABLE)); + I915_WRITE(GEN7_MISCCPCTL, (I915_READ(GEN7_MISCCPCTL) | + GEN7_DOP_CLOCK_GATE_ENABLE)); + + I915_WRITE(GDT_CHICKEN_BITS, (I915_READ(GDT_CHICKEN_BITS) & + ~GT_NOA_ENABLE)); +} + +static void gen7_update_oacontrol(struct drm_i915_private *dev_priv) +{ + if (dev_priv->perf.oa.exclusive_event->enabled) { + unsigned long ctx_id = 0; + bool pinning_ok = false; + + if (dev_priv->perf.oa.exclusive_event->ctx && + dev_priv->perf.oa.specific_ctx_id) { + ctx_id = dev_priv->perf.oa.specific_ctx_id; + pinning_ok = true; + } + + if (dev_priv->perf.oa.exclusive_event->ctx == NULL || + pinning_ok) { + bool periodic = dev_priv->perf.oa.periodic; + u32 period_exponent = dev_priv->perf.oa.period_exponent; + u32 report_format = dev_priv->perf.oa.oa_buffer.format; + + I915_WRITE(GEN7_OACONTROL, + (ctx_id & GEN7_OACONTROL_CTX_MASK) | + (period_exponent << + GEN7_OACONTROL_TIMER_PERIOD_SHIFT) | + (periodic ? + GEN7_OACONTROL_TIMER_ENABLE : 0) | + (report_format << + GEN7_OACONTROL_FORMAT_SHIFT) | + (ctx_id ? + GEN7_OACONTROL_PER_CTX_ENABLE : 0) | + GEN7_OACONTROL_ENABLE); + return; + } + } + + I915_WRITE(GEN7_OACONTROL, 0); +} + +static void gen7_oa_enable(struct drm_i915_private *dev_priv) +{ + u32 oastatus1, tail; + + gen7_update_oacontrol(dev_priv); + + /* Reset the head ptr so we don't forward reports from before now. 
*/ + oastatus1 = I915_READ(GEN7_OASTATUS1); + tail = oastatus1 & GEN7_OASTATUS1_TAIL_MASK; + I915_WRITE(GEN7_OASTATUS2, (tail & GEN7_OASTATUS2_HEAD_MASK) | + OA_MEM_SELECT_GGTT); +} + +static void i915_oa_event_enable(struct i915_perf_event *event) +{ + struct drm_i915_private *dev_priv = event->dev_priv; + + dev_priv->perf.oa.ops.oa_enable(dev_priv); + + if (dev_priv->perf.oa.periodic) + hrtimer_start(&dev_priv->perf.oa.poll_check_timer, + ns_to_ktime(POLL_PERIOD), + HRTIMER_MODE_REL_PINNED); +} + +static void gen7_oa_disable(struct drm_i915_private *dev_priv) +{ + I915_WRITE(GEN7_OACONTROL, 0); +} + +static void i915_oa_event_disable(struct i915_perf_event *event) +{ + struct drm_i915_private *dev_priv = event->dev_priv; + + dev_priv->perf.oa.ops.oa_disable(dev_priv); + + if (dev_priv->perf.oa.periodic) + hrtimer_cancel(&dev_priv->perf.oa.poll_check_timer); +} + +static int i915_oa_event_init(struct i915_perf_event *event, + struct drm_i915_perf_open_param *param) +{ + struct drm_i915_private *dev_priv = event->dev_priv; + struct drm_i915_perf_oa_attr oa_attr; + u32 known_flags = 0; + int format_size; + int ret; + + BUG_ON(param->type != I915_PERF_OA_EVENT); + + if (!dev_priv->perf.oa.ops.init_oa_buffer) { + DRM_ERROR("OA unit not supported\n"); + return -ENODEV; + } + + /* To avoid the complexity of having to accurately filter + * counter reports and marshal to the appropriate client + * we currently only allow exclusive access */ + if (dev_priv->perf.oa.exclusive_event) { + DRM_ERROR("OA unit already in use\n"); + return -EBUSY; + } + + ret = i915_perf_copy_attr(to_user_ptr(param->attr), + &oa_attr, + I915_OA_ATTR_SIZE_VER0, + sizeof(oa_attr)); + if (ret) + return ret; + + known_flags = I915_OA_FLAG_PERIODIC; + if (oa_attr.flags & ~known_flags) { + DRM_ERROR("Unknown drm_i915_perf_oa_attr flag\n"); + return -EINVAL; + } + + if (oa_attr.oa_format >= I915_OA_FORMAT_MAX) { + DRM_ERROR("Invalid OA report format\n"); + return -EINVAL; + } + + format_size = 
dev_priv->perf.oa.oa_formats[oa_attr.oa_format].size; + if (!format_size) { + DRM_ERROR("Invalid OA report format\n"); + return -EINVAL; + } + + dev_priv->perf.oa.oa_buffer.format_size = format_size; + + dev_priv->perf.oa.oa_buffer.format = + dev_priv->perf.oa.oa_formats[oa_attr.oa_format].format; + + if (IS_HASWELL(dev_priv->dev)) { + if (oa_attr.metrics_set <= 0 || + oa_attr.metrics_set > I915_OA_METRICS_SET_MAX) { + DRM_ERROR("Metric set not available\n"); + return -EINVAL; + } + } else { + BUG(); /* checked above */ + return -ENODEV; + } + + dev_priv->perf.oa.metrics_set = oa_attr.metrics_set; + + dev_priv->perf.oa.periodic = !!(oa_attr.flags & I915_OA_FLAG_PERIODIC); + + /* NB: The exponent represents a period as follows: + * + * 80ns * 2^(period_exponent + 1) + */ + if (dev_priv->perf.oa.periodic) { + u64 period_exponent = oa_attr.oa_timer_exponent; + + if (period_exponent > OA_EXPONENT_MAX) + return -EINVAL; + + /* Theoretically we can program the OA unit to sample every + * 160ns but don't allow that by default unless root... + * + * Referring to perf's kernel.perf_event_max_sample_rate for + * a precedent (100000 by default); with an OA exponent of + * 6 we get a period of 10.240 microseconds, i.e. just under + * 100000Hz + */ + if (period_exponent < 6 && !capable(CAP_SYS_ADMIN)) { + DRM_ERROR("Sampling frequency too high without root privileges\n"); + return -EACCES; + } + + dev_priv->perf.oa.period_exponent = period_exponent; + } else if (oa_attr.oa_timer_exponent) { + DRM_ERROR("Sampling exponent specified without requesting periodic sampling\n"); + return -EINVAL; + } + + ret = alloc_oa_buffer(dev_priv); + if (ret) + return ret; + + dev_priv->perf.oa.exclusive_event = event; + + /* PRM - observability performance counters: + * + * OACONTROL, performance counter enable, note: + * + * "When this bit is set, in order to have coherent counts, + * RC6 power state and trunk clock gating must be disabled. 
+ * This can be achieved by programming MMIO registers as + * 0xA094=0 and 0xA090[31]=1" + * + * In our case we are expecting that taking pm + FORCEWAKE + * references will effectively disable RC6. + */ + intel_runtime_pm_get(dev_priv); + intel_uncore_forcewake_get(dev_priv, FORCEWAKE_ALL); + + dev_priv->perf.oa.ops.enable_metric_set(dev_priv); + + event->destroy = i915_oa_event_destroy; + event->enable = i915_oa_event_enable; + event->disable = i915_oa_event_disable; + event->can_read = i915_oa_can_read; + event->wait_unlocked = i915_oa_wait_unlocked; + event->poll_wait = i915_oa_poll_wait; + event->read = i915_oa_read; + + return 0; +} + +static void gen7_update_specific_hw_ctx_id(struct drm_i915_private *dev_priv, + u32 ctx_id) +{ + dev_priv->perf.oa.specific_ctx_id = ctx_id; + gen7_update_oacontrol(dev_priv); +} + +static void i915_oa_context_pin_notify_locked(struct drm_i915_private *dev_priv, + struct intel_context *context) +{ + if (i915.enable_execlists || + dev_priv->perf.oa.ops.update_specific_hw_ctx_id == NULL) + return; + + if (dev_priv->perf.oa.exclusive_event && + dev_priv->perf.oa.exclusive_event->ctx == context) { + struct drm_i915_gem_object *obj = + context->legacy_hw_ctx.rcs_state; + u32 ctx_id = i915_gem_obj_ggtt_offset(obj); + + dev_priv->perf.oa.ops.update_specific_hw_ctx_id(dev_priv, ctx_id); + } +} + +void i915_oa_context_pin_notify(struct drm_i915_private *dev_priv, + struct intel_context *context) +{ + if (!dev_priv->perf.initialized) + return; + + mutex_lock(&dev_priv->perf.lock); + i915_oa_context_pin_notify_locked(dev_priv, context); + mutex_unlock(&dev_priv->perf.lock); +} + static ssize_t i915_perf_read_locked(struct i915_perf_event *event, struct file *file, char __user *buf, @@ -152,6 +805,20 @@ static ssize_t i915_perf_read(struct file *file, return ret; } +static enum hrtimer_restart poll_check_timer_cb(struct hrtimer *hrtimer) +{ + struct drm_i915_private *dev_priv = + container_of(hrtimer, typeof(*dev_priv), + 
perf.oa.poll_check_timer); + + if (!dev_priv->perf.oa.ops.oa_buffer_is_empty(dev_priv)) + wake_up(&dev_priv->perf.oa.poll_wq); + + hrtimer_forward_now(hrtimer, ns_to_ktime(POLL_PERIOD)); + + return HRTIMER_RESTART; +} + static unsigned int i915_perf_poll_locked(struct i915_perf_event *event, struct file *file, poll_table *wait) @@ -366,7 +1033,11 @@ int i915_perf_open_ioctl_locked(struct drm_device *dev, void *data, event->ctx = specific_ctx; switch (param->type) { - /* TODO: Init according to specific type */ + case I915_PERF_OA_EVENT: + ret = i915_oa_event_init(event, param); + if (ret) + goto err_alloc; + break; default: DRM_ERROR("Unknown perf event type\n"); ret = -EINVAL; @@ -429,7 +1100,27 @@ void i915_perf_init(struct drm_device *dev) { struct drm_i915_private *dev_priv = to_i915(dev); - /* Currently no global event state to initialize */ + if (!IS_HASWELL(dev)) + return; + + hrtimer_init(&dev_priv->perf.oa.poll_check_timer, + CLOCK_MONOTONIC, HRTIMER_MODE_REL); + dev_priv->perf.oa.poll_check_timer.function = poll_check_timer_cb; + init_waitqueue_head(&dev_priv->perf.oa.poll_wq); + + INIT_LIST_HEAD(&dev_priv->perf.events); + mutex_init(&dev_priv->perf.lock); + + dev_priv->perf.oa.ops.init_oa_buffer = gen7_init_oa_buffer; + dev_priv->perf.oa.ops.enable_metric_set = hsw_enable_metric_set; + dev_priv->perf.oa.ops.disable_metric_set = hsw_disable_metric_set; + dev_priv->perf.oa.ops.oa_enable = gen7_oa_enable; + dev_priv->perf.oa.ops.oa_disable = gen7_oa_disable; + dev_priv->perf.oa.ops.update_specific_hw_ctx_id = gen7_update_specific_hw_ctx_id; + dev_priv->perf.oa.ops.read = gen7_oa_read; + dev_priv->perf.oa.ops.oa_buffer_is_empty = gen7_oa_buffer_is_empty; + + dev_priv->perf.oa.oa_formats = hsw_oa_formats; dev_priv->perf.initialized = true; } @@ -441,7 +1132,7 @@ void i915_perf_fini(struct drm_device *dev) if (!dev_priv->perf.initialized) return; - /* Currently nothing to clean up */ + dev_priv->perf.oa.ops.init_oa_buffer = NULL; dev_priv->perf.initialized = 
false;
 }
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index 2e488e8..0736358 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -537,6 +537,343 @@
 #define GEN7_3DPRIM_BASE_VERTEX 0x2440
 #define GEN7_OACONTROL 0x2360
+#define GEN7_OACONTROL_CTX_MASK 0xFFFFF000
+#define GEN7_OACONTROL_TIMER_PERIOD_MASK 0x3F
+#define GEN7_OACONTROL_TIMER_PERIOD_SHIFT 6
+#define GEN7_OACONTROL_TIMER_ENABLE (1<<5)
+#define GEN7_OACONTROL_FORMAT_A13 (0<<2)
+#define GEN7_OACONTROL_FORMAT_A29 (1<<2)
+#define GEN7_OACONTROL_FORMAT_A13_B8_C8 (2<<2)
+#define GEN7_OACONTROL_FORMAT_A29_B8_C8 (3<<2)
+#define GEN7_OACONTROL_FORMAT_B4_C8 (4<<2)
+#define GEN7_OACONTROL_FORMAT_A45_B8_C8 (5<<2)
+#define GEN7_OACONTROL_FORMAT_B4_C8_A16 (6<<2)
+#define GEN7_OACONTROL_FORMAT_C4_B8 (7<<2)
+#define GEN7_OACONTROL_FORMAT_SHIFT 2
+#define GEN7_OACONTROL_PER_CTX_ENABLE (1<<1)
+#define GEN7_OACONTROL_ENABLE (1<<0)
+
+#define GEN8_OACTXID 0x2364
+
+#define GEN8_OACONTROL 0x2B00
+#define GEN8_OA_REPORT_FORMAT_A12 (0<<2)
+#define GEN8_OA_REPORT_FORMAT_A12_B8_C8 (2<<2)
+#define GEN8_OA_REPORT_FORMAT_A36_B8_C8 (5<<2)
+#define GEN8_OA_REPORT_FORMAT_C4_B8 (7<<2)
+#define GEN8_OA_REPORT_FORMAT_SHIFT 2
+#define GEN8_OA_SPECIFIC_CONTEXT_ENABLE (1<<1)
+#define GEN8_OA_COUNTER_ENABLE (1<<0)
+
+#define GEN8_OACTXCONTROL 0x2360
+#define GEN8_OA_TIMER_PERIOD_MASK 0x3F
+#define GEN8_OA_TIMER_PERIOD_SHIFT 2
+#define GEN8_OA_TIMER_ENABLE (1<<1)
+#define GEN8_OA_COUNTER_RESUME (1<<0)
+
+#define GEN7_OABUFFER 0x23B0 /* R/W */
+#define GEN7_OABUFFER_OVERRUN_DISABLE (1<<3)
+#define GEN7_OABUFFER_EDGE_TRIGGER (1<<2)
+#define GEN7_OABUFFER_STOP_RESUME_ENABLE (1<<1)
+#define GEN7_OABUFFER_RESUME (1<<0)
+
+#define GEN8_OABUFFER 0x2b14
+
+#define GEN7_OASTATUS1 0x2364
+#define GEN7_OASTATUS1_TAIL_MASK 0xffffffc0
+#define GEN7_OASTATUS1_COUNTER_OVERFLOW (1<<2)
+#define GEN7_OASTATUS1_OABUFFER_OVERFLOW (1<<1)
+#define GEN7_OASTATUS1_REPORT_LOST (1<<0)
+
+#define GEN7_OASTATUS2 0x2368
+#define GEN7_OASTATUS2_HEAD_MASK 0xffffffc0
+
+#define GEN8_OASTATUS 0x2b08
+#define GEN8_OASTATUS_OVERRUN_STATUS (1<<3)
+#define GEN8_OASTATUS_COUNTER_OVERFLOW (1<<2)
+#define GEN8_OASTATUS_OABUFFER_OVERFLOW (1<<1)
+#define GEN8_OASTATUS_REPORT_LOST (1<<0)
+
+#define GEN8_OAHEADPTR 0x2B0C
+#define GEN8_OATAILPTR 0x2B10
+
+#define OABUFFER_SIZE_128K (0<<3)
+#define OABUFFER_SIZE_256K (1<<3)
+#define OABUFFER_SIZE_512K (2<<3)
+#define OABUFFER_SIZE_1M (3<<3)
+#define OABUFFER_SIZE_2M (4<<3)
+#define OABUFFER_SIZE_4M (5<<3)
+#define OABUFFER_SIZE_8M (6<<3)
+#define OABUFFER_SIZE_16M (7<<3)
+
+#define OA_MEM_SELECT_GGTT (1<<0)
+
+#define EU_PERF_CNTL0 0xe458
+
+#define GDT_CHICKEN_BITS 0x9840
+#define GT_NOA_ENABLE 0x00000080
+
+/*
+ * OA Boolean state
+ */
+
+#define OAREPORTTRIG1 0x2740
+#define OAREPORTTRIG1_THRESHOLD_MASK 0xffff
+#define OAREPORTTRIG1_EDGE_LEVEL_TRIGER_SELECT_MASK 0xffff0000 /* 0=level */
+
+#define OAREPORTTRIG2 0x2744
+#define OAREPORTTRIG2_INVERT_A_0 (1<<0)
+#define OAREPORTTRIG2_INVERT_A_1 (1<<1)
+#define OAREPORTTRIG2_INVERT_A_2 (1<<2)
+#define OAREPORTTRIG2_INVERT_A_3 (1<<3)
+#define OAREPORTTRIG2_INVERT_A_4 (1<<4)
+#define OAREPORTTRIG2_INVERT_A_5 (1<<5)
+#define OAREPORTTRIG2_INVERT_A_6 (1<<6)
+#define OAREPORTTRIG2_INVERT_A_7 (1<<7)
+#define OAREPORTTRIG2_INVERT_A_8 (1<<8)
+#define OAREPORTTRIG2_INVERT_A_9 (1<<9)
+#define OAREPORTTRIG2_INVERT_A_10 (1<<10)
+#define OAREPORTTRIG2_INVERT_A_11 (1<<11)
+#define OAREPORTTRIG2_INVERT_A_12 (1<<12)
+#define OAREPORTTRIG2_INVERT_A_13 (1<<13)
+#define OAREPORTTRIG2_INVERT_A_14 (1<<14)
+#define OAREPORTTRIG2_INVERT_A_15 (1<<15)
+#define OAREPORTTRIG2_INVERT_B_0 (1<<16)
+#define OAREPORTTRIG2_INVERT_B_1 (1<<17)
+#define OAREPORTTRIG2_INVERT_B_2 (1<<18)
+#define OAREPORTTRIG2_INVERT_B_3 (1<<19)
+#define OAREPORTTRIG2_INVERT_C_0 (1<<20)
+#define OAREPORTTRIG2_INVERT_C_1 (1<<21)
+#define OAREPORTTRIG2_INVERT_D_0 (1<<22)
+#define OAREPORTTRIG2_THRESHOLD_ENABLE (1<<23)
+#define OAREPORTTRIG2_REPORT_TRIGGER_ENABLE (1<<31)
+
+#define OAREPORTTRIG3 0x2748
+#define OAREPORTTRIG3_NOA_SELECT_MASK 0xf
+#define OAREPORTTRIG3_NOA_SELECT_8_SHIFT 0
+#define OAREPORTTRIG3_NOA_SELECT_9_SHIFT 4
+#define OAREPORTTRIG3_NOA_SELECT_10_SHIFT 8
+#define OAREPORTTRIG3_NOA_SELECT_11_SHIFT 12
+#define OAREPORTTRIG3_NOA_SELECT_12_SHIFT 16
+#define OAREPORTTRIG3_NOA_SELECT_13_SHIFT 20
+#define OAREPORTTRIG3_NOA_SELECT_14_SHIFT 24
+#define OAREPORTTRIG3_NOA_SELECT_15_SHIFT 28
+
+#define OAREPORTTRIG4 0x274c
+#define OAREPORTTRIG4_NOA_SELECT_MASK 0xf
+#define OAREPORTTRIG4_NOA_SELECT_0_SHIFT 0
+#define OAREPORTTRIG4_NOA_SELECT_1_SHIFT 4
+#define OAREPORTTRIG4_NOA_SELECT_2_SHIFT 8
+#define OAREPORTTRIG4_NOA_SELECT_3_SHIFT 12
+#define OAREPORTTRIG4_NOA_SELECT_4_SHIFT 16
+#define OAREPORTTRIG4_NOA_SELECT_5_SHIFT 20
+#define OAREPORTTRIG4_NOA_SELECT_6_SHIFT 24
+#define OAREPORTTRIG4_NOA_SELECT_7_SHIFT 28
+
+#define OAREPORTTRIG5 0x2750
+#define OAREPORTTRIG5_THRESHOLD_MASK 0xffff
+#define OAREPORTTRIG5_EDGE_LEVEL_TRIGER_SELECT_MASK 0xffff0000 /* 0=level */
+
+#define OAREPORTTRIG6 0x2754
+#define OAREPORTTRIG6_INVERT_A_0 (1<<0)
+#define OAREPORTTRIG6_INVERT_A_1 (1<<1)
+#define OAREPORTTRIG6_INVERT_A_2 (1<<2)
+#define OAREPORTTRIG6_INVERT_A_3 (1<<3)
+#define OAREPORTTRIG6_INVERT_A_4 (1<<4)
+#define OAREPORTTRIG6_INVERT_A_5 (1<<5)
+#define OAREPORTTRIG6_INVERT_A_6 (1<<6)
+#define OAREPORTTRIG6_INVERT_A_7 (1<<7)
+#define OAREPORTTRIG6_INVERT_A_8 (1<<8)
+#define OAREPORTTRIG6_INVERT_A_9 (1<<9)
+#define OAREPORTTRIG6_INVERT_A_10 (1<<10)
+#define OAREPORTTRIG6_INVERT_A_11 (1<<11)
+#define OAREPORTTRIG6_INVERT_A_12 (1<<12)
+#define OAREPORTTRIG6_INVERT_A_13 (1<<13)
+#define OAREPORTTRIG6_INVERT_A_14 (1<<14)
+#define OAREPORTTRIG6_INVERT_A_15 (1<<15)
+#define OAREPORTTRIG6_INVERT_B_0 (1<<16)
+#define OAREPORTTRIG6_INVERT_B_1 (1<<17)
+#define OAREPORTTRIG6_INVERT_B_2 (1<<18)
+#define OAREPORTTRIG6_INVERT_B_3 (1<<19)
+#define OAREPORTTRIG6_INVERT_C_0 (1<<20)
+#define OAREPORTTRIG6_INVERT_C_1 (1<<21)
+#define OAREPORTTRIG6_INVERT_D_0 (1<<22)
+#define OAREPORTTRIG6_THRESHOLD_ENABLE (1<<23)
+#define OAREPORTTRIG6_REPORT_TRIGGER_ENABLE (1<<31)
+
+#define OAREPORTTRIG7 0x2758
+#define OAREPORTTRIG7_NOA_SELECT_MASK 0xf
+#define OAREPORTTRIG7_NOA_SELECT_8_SHIFT 0
+#define OAREPORTTRIG7_NOA_SELECT_9_SHIFT 4
+#define OAREPORTTRIG7_NOA_SELECT_10_SHIFT 8
+#define OAREPORTTRIG7_NOA_SELECT_11_SHIFT 12
+#define OAREPORTTRIG7_NOA_SELECT_12_SHIFT 16
+#define OAREPORTTRIG7_NOA_SELECT_13_SHIFT 20
+#define OAREPORTTRIG7_NOA_SELECT_14_SHIFT 24
+#define OAREPORTTRIG7_NOA_SELECT_15_SHIFT 28
+
+#define OAREPORTTRIG8 0x275c
+#define OAREPORTTRIG8_NOA_SELECT_MASK 0xf
+#define OAREPORTTRIG8_NOA_SELECT_0_SHIFT 0
+#define OAREPORTTRIG8_NOA_SELECT_1_SHIFT 4
+#define OAREPORTTRIG8_NOA_SELECT_2_SHIFT 8
+#define OAREPORTTRIG8_NOA_SELECT_3_SHIFT 12
+#define OAREPORTTRIG8_NOA_SELECT_4_SHIFT 16
+#define OAREPORTTRIG8_NOA_SELECT_5_SHIFT 20
+#define OAREPORTTRIG8_NOA_SELECT_6_SHIFT 24
+#define OAREPORTTRIG8_NOA_SELECT_7_SHIFT 28
+
+#define OASTARTTRIG1 0x2710
+#define OASTARTTRIG1_THRESHOLD_COUNT_MASK_MBZ 0xffff0000
+#define OASTARTTRIG1_THRESHOLD_MASK 0xffff
+
+#define OASTARTTRIG2 0x2714
+#define OASTARTTRIG2_INVERT_A_0 (1<<0)
+#define OASTARTTRIG2_INVERT_A_1 (1<<1)
+#define OASTARTTRIG2_INVERT_A_2 (1<<2)
+#define OASTARTTRIG2_INVERT_A_3 (1<<3)
+#define OASTARTTRIG2_INVERT_A_4 (1<<4)
+#define OASTARTTRIG2_INVERT_A_5 (1<<5)
+#define OASTARTTRIG2_INVERT_A_6 (1<<6)
+#define OASTARTTRIG2_INVERT_A_7 (1<<7)
+#define OASTARTTRIG2_INVERT_A_8 (1<<8)
+#define OASTARTTRIG2_INVERT_A_9 (1<<9)
+#define OASTARTTRIG2_INVERT_A_10 (1<<10)
+#define OASTARTTRIG2_INVERT_A_11 (1<<11)
+#define OASTARTTRIG2_INVERT_A_12 (1<<12)
+#define OASTARTTRIG2_INVERT_A_13 (1<<13)
+#define OASTARTTRIG2_INVERT_A_14 (1<<14)
+#define OASTARTTRIG2_INVERT_A_15 (1<<15)
+#define OASTARTTRIG2_INVERT_B_0 (1<<16)
+#define OASTARTTRIG2_INVERT_B_1 (1<<17)
+#define OASTARTTRIG2_INVERT_B_2 (1<<18)
+#define OASTARTTRIG2_INVERT_B_3 (1<<19)
+#define OASTARTTRIG2_INVERT_C_0 (1<<20)
+#define OASTARTTRIG2_INVERT_C_1 (1<<21)
+#define OASTARTTRIG2_INVERT_D_0 (1<<22)
+#define OASTARTTRIG2_THRESHOLD_ENABLE (1<<23)
+#define OASTARTTRIG2_START_TRIG_FLAG_MBZ (1<<24)
+#define OASTARTTRIG2_EVENT_SELECT_0 (1<<28)
+#define OASTARTTRIG2_EVENT_SELECT_1 (1<<29)
+#define OASTARTTRIG2_EVENT_SELECT_2 (1<<30)
+#define OASTARTTRIG2_EVENT_SELECT_3 (1<<31)
+
+#define OASTARTTRIG3 0x2718
+#define OASTARTTRIG3_NOA_SELECT_MASK 0xf
+#define OASTARTTRIG3_NOA_SELECT_8_SHIFT 0
+#define OASTARTTRIG3_NOA_SELECT_9_SHIFT 4
+#define OASTARTTRIG3_NOA_SELECT_10_SHIFT 8
+#define OASTARTTRIG3_NOA_SELECT_11_SHIFT 12
+#define OASTARTTRIG3_NOA_SELECT_12_SHIFT 16
+#define OASTARTTRIG3_NOA_SELECT_13_SHIFT 20
+#define OASTARTTRIG3_NOA_SELECT_14_SHIFT 24
+#define OASTARTTRIG3_NOA_SELECT_15_SHIFT 28
+
+#define OASTARTTRIG4 0x271c
+#define OASTARTTRIG4_NOA_SELECT_MASK 0xf
+#define OASTARTTRIG4_NOA_SELECT_0_SHIFT 0
+#define OASTARTTRIG4_NOA_SELECT_1_SHIFT 4
+#define OASTARTTRIG4_NOA_SELECT_2_SHIFT 8
+#define OASTARTTRIG4_NOA_SELECT_3_SHIFT 12
+#define OASTARTTRIG4_NOA_SELECT_4_SHIFT 16
+#define OASTARTTRIG4_NOA_SELECT_5_SHIFT 20
+#define OASTARTTRIG4_NOA_SELECT_6_SHIFT 24
+#define OASTARTTRIG4_NOA_SELECT_7_SHIFT 28
+
+#define OASTARTTRIG5 0x2720
+#define OASTARTTRIG5_THRESHOLD_COUNT_MASK_MBZ 0xffff0000
+#define OASTARTTRIG5_THRESHOLD_MASK 0xffff
+
+#define OASTARTTRIG6 0x2724
+#define OASTARTTRIG6_INVERT_A_0 (1<<0)
+#define OASTARTTRIG6_INVERT_A_1 (1<<1)
+#define OASTARTTRIG6_INVERT_A_2 (1<<2)
+#define OASTARTTRIG6_INVERT_A_3 (1<<3)
+#define OASTARTTRIG6_INVERT_A_4 (1<<4)
+#define OASTARTTRIG6_INVERT_A_5 (1<<5)
+#define OASTARTTRIG6_INVERT_A_6 (1<<6)
+#define OASTARTTRIG6_INVERT_A_7 (1<<7)
+#define OASTARTTRIG6_INVERT_A_8 (1<<8)
+#define OASTARTTRIG6_INVERT_A_9 (1<<9)
+#define OASTARTTRIG6_INVERT_A_10 (1<<10)
+#define OASTARTTRIG6_INVERT_A_11 (1<<11)
+#define OASTARTTRIG6_INVERT_A_12 (1<<12)
+#define OASTARTTRIG6_INVERT_A_13 (1<<13)
+#define OASTARTTRIG6_INVERT_A_14 (1<<14)
+#define OASTARTTRIG6_INVERT_A_15 (1<<15)
+#define OASTARTTRIG6_INVERT_B_0 (1<<16)
+#define OASTARTTRIG6_INVERT_B_1 (1<<17)
+#define OASTARTTRIG6_INVERT_B_2 (1<<18)
+#define OASTARTTRIG6_INVERT_B_3 (1<<19)
+#define OASTARTTRIG6_INVERT_C_0 (1<<20)
+#define OASTARTTRIG6_INVERT_C_1 (1<<21)
+#define OASTARTTRIG6_INVERT_D_0 (1<<22)
+#define OASTARTTRIG6_THRESHOLD_ENABLE (1<<23)
+#define OASTARTTRIG6_START_TRIG_FLAG_MBZ (1<<24)
+#define OASTARTTRIG6_EVENT_SELECT_4 (1<<28)
+#define OASTARTTRIG6_EVENT_SELECT_5 (1<<29)
+#define OASTARTTRIG6_EVENT_SELECT_6 (1<<30)
+#define OASTARTTRIG6_EVENT_SELECT_7 (1<<31)
+
+#define OASTARTTRIG7 0x2728
+#define OASTARTTRIG7_NOA_SELECT_MASK 0xf
+#define OASTARTTRIG7_NOA_SELECT_8_SHIFT 0
+#define OASTARTTRIG7_NOA_SELECT_9_SHIFT 4
+#define OASTARTTRIG7_NOA_SELECT_10_SHIFT 8
+#define OASTARTTRIG7_NOA_SELECT_11_SHIFT 12
+#define OASTARTTRIG7_NOA_SELECT_12_SHIFT 16
+#define OASTARTTRIG7_NOA_SELECT_13_SHIFT 20
+#define OASTARTTRIG7_NOA_SELECT_14_SHIFT 24
+#define OASTARTTRIG7_NOA_SELECT_15_SHIFT 28
+
+#define OASTARTTRIG8 0x272c
+#define OASTARTTRIG8_NOA_SELECT_MASK 0xf
+#define OASTARTTRIG8_NOA_SELECT_0_SHIFT 0
+#define OASTARTTRIG8_NOA_SELECT_1_SHIFT 4
+#define OASTARTTRIG8_NOA_SELECT_2_SHIFT 8
+#define OASTARTTRIG8_NOA_SELECT_3_SHIFT 12
+#define OASTARTTRIG8_NOA_SELECT_4_SHIFT 16
+#define OASTARTTRIG8_NOA_SELECT_5_SHIFT 20
+#define OASTARTTRIG8_NOA_SELECT_6_SHIFT 24
+#define OASTARTTRIG8_NOA_SELECT_7_SHIFT 28
+
+/* CECX_0 */
+#define OACEC_COMPARE_LESS_OR_EQUAL 6
+#define OACEC_COMPARE_NOT_EQUAL 5
+#define OACEC_COMPARE_LESS_THAN 4
+#define OACEC_COMPARE_GREATER_OR_EQUAL 3
+#define OACEC_COMPARE_EQUAL 2
+#define OACEC_COMPARE_GREATER_THAN 1
+#define OACEC_COMPARE_ANY_EQUAL 0
+
+#define OACEC_COMPARE_VALUE_MASK 0xffff
+#define OACEC_COMPARE_VALUE_SHIFT 3
+
+#define OACEC_SELECT_NOA (0<<19)
+#define OACEC_SELECT_PREV (1<<19)
+#define OACEC_SELECT_BOOLEAN (2<<19)
+
+/* CECX_1 */
+#define OACEC_MASK_MASK 0xffff
+#define OACEC_CONSIDERATIONS_MASK 0xffff
+#define OACEC_CONSIDERATIONS_SHIFT 16
+
+#define OACEC0_0 0x2770
+#define OACEC0_1 0x2774
+#define OACEC1_0 0x2778
+#define OACEC1_1 0x277c
+#define OACEC2_0 0x2780
+#define OACEC2_1 0x2784
+#define OACEC3_0 0x2788
+#define OACEC3_1 0x278c
+#define OACEC4_0 0x2790
+#define OACEC4_1 0x2794
+#define OACEC5_0 0x2798
+#define OACEC5_1 0x279c
+#define OACEC6_0 0x27a0
+#define OACEC6_1 0x27a4
+#define OACEC7_0 0x27a8
+#define OACEC7_1 0x27ac
+
 #define _GEN7_PIPEA_DE_LOAD_SL 0x70068
 #define _GEN7_PIPEB_DE_LOAD_SL 0x71068
@@ -6668,6 +7005,7 @@ enum skl_disp_power_wells {
 # define GEN6_RCCUNIT_CLOCK_GATE_DISABLE (1 << 11)
 #define GEN6_UCGCTL3 0x9408
+# define GEN6_OACSUNIT_CLOCK_GATE_DISABLE (1 << 20)
 #define GEN7_UCGCTL4 0x940c
 #define GEN7_L3BANK2X_CLOCK_GATE_DISABLE (1<<25)
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index a84f71f..af5dfd4 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -58,6 +58,29 @@
 #define I915_ERROR_UEVENT "ERROR"
 #define I915_RESET_UEVENT "RESET"
 
+/*
+ * perf events configuration exposed by i915 through
+ * /sys/bus/event_sources/drivers/i915_oa
+ */
+
+enum drm_i915_oa_format {
+	I915_OA_FORMAT_A13 = 0,
+	I915_OA_FORMAT_A29 = 1,
+	I915_OA_FORMAT_A13_B8_C8 = 2,
+	I915_OA_FORMAT_B4_C8 = 4,
+	I915_OA_FORMAT_A45_B8_C8 = 5,
+	I915_OA_FORMAT_B4_C8_A16 = 6,
+	I915_OA_FORMAT_C4_B8 = 7,
+
+	I915_OA_FORMAT_MAX /* non-ABI */
+};
+
+enum drm_i915_oa_set {
+	I915_OA_METRICS_SET_3D = 1,
+
+	I915_OA_METRICS_SET_MAX /* non-ABI */
+};
+
 /* Each region is a minimum of 16k, and there are at most 255 of them.
  */
 #define I915_NR_TEX_REGIONS 255 /* table size 2k - maximum due to use
@@ -1132,9 +1155,35 @@ struct drm_i915_gem_context_param {
 };
 
 enum drm_i915_perf_event_type {
+	I915_PERF_OA_EVENT = 1,
+
 	I915_PERF_EVENT_TYPE_MAX /* non-ABI */
 };
+
+#define I915_OA_FLAG_PERIODIC (1<<0)
+
+struct drm_i915_perf_oa_attr {
+	__u32 size;
+
+	__u32 flags;
+
+	__u32 metrics_set;
+	__u32 oa_format;
+	__u32 oa_timer_exponent;
+};
+
+/* Note: same versioning scheme as struct perf_event_attr
+ *
+ * Userspace specified size defines ABI version and kernel
+ * zero extends to size of latest version. If userspace
+ * gives a larger structure than the kernel expects then
+ * kernel asserts that all unknown fields are zero.
+ */
+#define I915_OA_ATTR_SIZE_VER0 20 /* sizeof first published struct */
+
+
+
 #define I915_PERF_FLAG_FD_CLOEXEC (1<<0)
 #define I915_PERF_FLAG_FD_NONBLOCK (1<<1)
 #define I915_PERF_FLAG_SINGLE_CONTEXT (1<<2)
@@ -1188,6 +1237,20 @@ enum drm_i915_perf_record_type {
 	 */
 	DRM_I915_PERF_RECORD_SAMPLE = 1,
 
+	/*
+	 * Indicates that one or more OA reports was not written
+	 * by the hardware.
+	 */
+	DRM_I915_PERF_RECORD_OA_REPORT_LOST = 2,
+
+	/*
+	 * Indicates that the internal circular buffer that Gen
+	 * graphics writes OA reports into has filled, which may
+	 * either mean that old reports could be overwritten or
+	 * subsequent reports lost until the buffer is cleared.
+	 */
+	DRM_I915_PERF_RECORD_OA_BUFFER_OVERFLOW = 3,
+
 	DRM_I915_PERF_RECORD_MAX /* non-ABI */
 };
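
Note on the attr versioning comment in i915_drm.h above: the size rules it describes (zero-extend a smaller struct, require unknown trailing bytes to be zero for a larger one) can be sketched in plain C as below. This is only an illustration of the scheme and not code from this series. The struct oa_attr mirror and the copy_oa_attr() helper are hypothetical names, the errno values are an assumption borrowed from how perf_event_attr is commonly handled, and a real kernel path would read the blob with copy_from_user() rather than memcpy().

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define I915_OA_ATTR_SIZE_VER0 20 /* value taken from the patch */

/* Hypothetical mirror of struct drm_i915_perf_oa_attr using stdint types. */
struct oa_attr {
	uint32_t size;
	uint32_t flags;
	uint32_t metrics_set;
	uint32_t oa_format;
	uint32_t oa_timer_exponent;
};

/*
 * Hypothetical helper: interpret a caller-supplied blob of user_size bytes
 * as an oa_attr of whatever ABI version the caller was built against.
 * Returns 0 on success or a negative errno (choice of errno is an assumption).
 */
static int copy_oa_attr(struct oa_attr *dst, const void *src, size_t user_size)
{
	const unsigned char *bytes = src;
	size_t known = sizeof(*dst);
	size_t i;

	/* Smaller than the first published struct: not a valid version. */
	if (user_size < I915_OA_ATTR_SIZE_VER0)
		return -EINVAL;

	/* Larger than we know about: every unknown byte must be zero. */
	for (i = known; i < user_size; i++)
		if (bytes[i])
			return -E2BIG;

	/* Zero-extend older, smaller structs up to the latest layout. */
	memset(dst, 0, known);
	memcpy(dst, src, user_size < known ? user_size : known);
	dst->size = (uint32_t)known;
	return 0;
}
```

With only VER0 defined the zero-extension branch never copies less than the full struct, but it is what lets later fields be appended without breaking binaries built against the 20 byte layout.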