From patchwork Thu Jun 27 08:00:44 2019
X-Patchwork-Submitter: Lionel Landwerlin
X-Patchwork-Id: 11019041
From: Lionel Landwerlin
To: intel-gfx@lists.freedesktop.org
Date: Thu, 27 Jun 2019 11:00:44 +0300
Message-Id: <20190627080045.8814-10-lionel.g.landwerlin@intel.com>
In-Reply-To: <20190627080045.8814-1-lionel.g.landwerlin@intel.com>
References: <20190627080045.8814-1-lionel.g.landwerlin@intel.com>
Subject: [Intel-gfx] [PATCH v5 09/10] drm/i915/perf: execute OA configuration from command stream

We haven't run into issues with writing the global OA/NOA register
configuration from the CPU so far, but HW engineers recommend doing
this from the command streamer. Since we already have a command buffer
prepared for the execbuffer side of things, we can reuse that approach
here too.
Signed-off-by: Lionel Landwerlin
---
 drivers/gpu/drm/i915/i915_perf.c | 203 +++++++++++++++----------------
 1 file changed, 100 insertions(+), 103 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index bf4f5fee6764..7e636463e1f5 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -389,6 +389,19 @@ void i915_oa_config_release(struct kref *ref)
 	kfree(oa_config);
 }
 
+static void i915_oa_config_dispose_buffers(struct drm_i915_private *i915)
+{
+	struct i915_oa_config *oa_config, *next;
+
+	mutex_lock(&i915->perf.metrics_lock);
+	list_for_each_entry_safe(oa_config, next, &i915->perf.metrics_buffers, vma_link) {
+		list_del(&oa_config->vma_link);
+		i915_gem_object_put(oa_config->obj);
+		oa_config->obj = NULL;
+	}
+	mutex_unlock(&i915->perf.metrics_lock);
+}
+
 static u32 *write_cs_mi_lri(u32 *cs, const struct i915_oa_reg *reg_data, u32 n_regs)
 {
 	u32 i;
@@ -1813,67 +1826,86 @@ static int alloc_noa_wait(struct drm_i915_private *i915)
 	return ret;
 }
 
-static void config_oa_regs(struct drm_i915_private *dev_priv,
-			   const struct i915_oa_reg *regs,
-			   u32 n_regs)
+static int config_oa_regs(struct drm_i915_private *i915,
+			  struct i915_oa_config *oa_config)
 {
-	u32 i;
+	struct i915_request *rq;
+	struct i915_vma *vma;
+	long timeout;
+	u32 *cs;
+	int err;
 
-	for (i = 0; i < n_regs; i++) {
-		const struct i915_oa_reg *reg = regs + i;
+	rq = i915_request_create(i915->engine[RCS0]->kernel_context);
+	if (IS_ERR(rq))
+		return PTR_ERR(rq);
 
-		I915_WRITE(reg->addr, reg->value);
+	err = i915_active_request_set(&i915->engine[RCS0]->last_oa_config,
+				      rq);
+	if (err) {
+		i915_request_add(rq);
+		return err;
+	}
+
+	vma = i915_vma_instance(oa_config->obj, &i915->ggtt.vm, NULL);
+	if (unlikely(IS_ERR(vma))) {
+		i915_request_add(rq);
+		return PTR_ERR(vma);
+	}
+
+	err = i915_vma_pin(vma, 0, 0, PIN_GLOBAL);
+	if (err) {
+		i915_request_add(rq);
+		return err;
 	}
+
+	err = i915_vma_move_to_active(vma, rq, 0);
+	if (err) {
+		i915_vma_unpin(vma);
+		i915_request_add(rq);
+		return err;
+	}
+
+	cs = intel_ring_begin(rq, INTEL_GEN(i915) >= 8 ? 4 : 2);
+	if (IS_ERR(cs)) {
+		i915_vma_unpin(vma);
+		i915_request_add(rq);
+		return PTR_ERR(cs);
+	}
+
+	if (INTEL_GEN(i915) >= 8) {
+		*cs++ = MI_BATCH_BUFFER_START_GEN8;
+		*cs++ = lower_32_bits(vma->node.start);
+		*cs++ = upper_32_bits(vma->node.start);
+		*cs++ = MI_NOOP;
+	} else {
+		*cs++ = MI_BATCH_BUFFER_START;
+		*cs++ = vma->node.start;
+	}
+
+	intel_ring_advance(rq, cs);
+
+	i915_vma_unpin(vma);
+
+	i915_request_add(rq);
+
+	i915_request_get(rq);
+	timeout = i915_request_wait(rq, I915_WAIT_INTERRUPTIBLE | I915_WAIT_LOCKED,
+				    MAX_SCHEDULE_TIMEOUT);
+	i915_request_put(rq);
+
+	return timeout < 0 ? timeout : 0;
 }
 
 static int hsw_enable_metric_set(struct i915_perf_stream *stream)
 {
 	struct drm_i915_private *dev_priv = stream->dev_priv;
-	const struct i915_oa_config *oa_config = stream->oa_config;
 
-	/* PRM:
-	 *
-	 * OA unit is using “crclk” for its functionality. When trunk
-	 * level clock gating takes place, OA clock would be gated,
-	 * unable to count the events from non-render clock domain.
-	 * Render clock gating must be disabled when OA is enabled to
-	 * count the events from non-render domain. Unit level clock
-	 * gating for RCS should also be disabled.
-	 */
 	I915_WRITE(GEN7_MISCCPCTL, (I915_READ(GEN7_MISCCPCTL) &
 				    ~GEN7_DOP_CLOCK_GATE_ENABLE));
 	I915_WRITE(GEN6_UCGCTL1, (I915_READ(GEN6_UCGCTL1) |
 				  GEN6_CSUNIT_CLOCK_GATE_DISABLE));
 
-	config_oa_regs(dev_priv, oa_config->mux_regs, oa_config->mux_regs_len);
-
-	/* It apparently takes a fairly long time for a new MUX
-	 * configuration to be be applied after these register writes.
-	 * This delay duration was derived empirically based on the
-	 * render_basic config but hopefully it covers the maximum
-	 * configuration latency.
-	 *
-	 * As a fallback, the checks in _append_oa_reports() to skip
-	 * invalid OA reports do also seem to work to discard reports
-	 * generated before this config has completed - albeit not
-	 * silently.
-	 *
-	 * Unfortunately this is essentially a magic number, since we
-	 * don't currently know of a reliable mechanism for predicting
-	 * how long the MUX config will take to apply and besides
-	 * seeing invalid reports we don't know of a reliable way to
-	 * explicitly check that the MUX config has landed.
-	 *
-	 * It's even possible we've miss characterized the underlying
-	 * problem - it just seems like the simplest explanation why
-	 * a delay at this location would mitigate any invalid reports.
-	 */
-	usleep_range(15000, 20000);
-
-	config_oa_regs(dev_priv, oa_config->b_counter_regs,
-		       oa_config->b_counter_regs_len);
-
-	return 0;
+	return config_oa_regs(dev_priv, stream->oa_config);
 }
 
 static void hsw_disable_metric_set(struct drm_i915_private *dev_priv)
@@ -1978,7 +2010,6 @@ static int gen8_configure_all_contexts(struct drm_i915_private *dev_priv,
 {
 	unsigned int map_type = i915_coherent_map_type(dev_priv);
 	struct i915_gem_context *ctx;
-	struct i915_request *rq;
 	int ret;
 
 	lockdep_assert_held(&dev_priv->drm.struct_mutex);
@@ -2037,14 +2068,9 @@ static int gen8_configure_all_contexts(struct drm_i915_private *dev_priv,
 	}
 
 	/*
-	 * Apply the configuration by doing one context restore of the edited
-	 * context image.
+	 * The above configuration will be applied when
+	 * config_oa_regs() is called.
 	 */
-	rq = i915_request_create(dev_priv->engine[RCS0]->kernel_context);
-	if (IS_ERR(rq))
-		return PTR_ERR(rq);
-
-	i915_request_add(rq);
 
 	return 0;
 }
@@ -2093,35 +2119,7 @@ static int gen8_enable_metric_set(struct i915_perf_stream *stream)
 	if (ret)
 		return ret;
 
-	config_oa_regs(dev_priv, oa_config->mux_regs, oa_config->mux_regs_len);
-
-	/* It apparently takes a fairly long time for a new MUX
-	 * configuration to be be applied after these register writes.
-	 * This delay duration was derived empirically based on the
-	 * render_basic config but hopefully it covers the maximum
-	 * configuration latency.
-	 *
-	 * As a fallback, the checks in _append_oa_reports() to skip
-	 * invalid OA reports do also seem to work to discard reports
-	 * generated before this config has completed - albeit not
-	 * silently.
-	 *
-	 * Unfortunately this is essentially a magic number, since we
-	 * don't currently know of a reliable mechanism for predicting
-	 * how long the MUX config will take to apply and besides
-	 * seeing invalid reports we don't know of a reliable way to
-	 * explicitly check that the MUX config has landed.
-	 *
-	 * It's even possible we've miss characterized the underlying
-	 * problem - it just seems like the simplest explanation why
-	 * a delay at this location would mitigate any invalid reports.
-	 */
-	usleep_range(15000, 20000);
-
-	config_oa_regs(dev_priv, oa_config->b_counter_regs,
-		       oa_config->b_counter_regs_len);
-
-	return 0;
+	return config_oa_regs(dev_priv, stream->oa_config);
 }
 
 static void gen8_disable_metric_set(struct drm_i915_private *dev_priv)
@@ -2292,6 +2290,7 @@ static int i915_oa_stream_init(struct i915_perf_stream *stream,
 			       struct perf_open_properties *props)
 {
 	struct drm_i915_private *dev_priv = stream->dev_priv;
+	struct drm_i915_gem_object *obj;
 	int format_size;
 	int ret;
 
@@ -2376,13 +2375,6 @@ static int i915_oa_stream_init(struct i915_perf_stream *stream,
 		}
 	}
 
-	ret = i915_perf_get_oa_config(dev_priv, props->metrics_set,
-				      &stream->oa_config, NULL);
-	if (ret) {
-		DRM_DEBUG("Invalid OA config id=%i\n", props->metrics_set);
-		goto err_config;
-	}
-
 	ret = alloc_noa_wait(dev_priv);
 	if (ret) {
 		DRM_DEBUG("Unable to allocate NOA wait batch buffer\n");
@@ -2412,6 +2404,19 @@ static int i915_oa_stream_init(struct i915_perf_stream *stream,
 	if (ret)
 		goto err_lock;
 
+	ret = i915_perf_get_oa_config(dev_priv, props->metrics_set,
+				      &stream->oa_config, &obj);
+	if (ret) {
+		DRM_DEBUG("Invalid OA config id=%i\n", props->metrics_set);
+		goto err_config;
+	}
+
+	/*
+	 * We just need the buffer to be created, but not our own reference on
+	 * it as the oa_config already has one.
+	 */
+	i915_gem_object_put(obj);
+
 	stream->ops = &i915_oa_stream_ops;
 
 	dev_priv->perf.oa.exclusive_stream = stream;
@@ -2430,14 +2435,16 @@ static int i915_oa_stream_init(struct i915_perf_stream *stream,
 err_enable:
 	dev_priv->perf.oa.exclusive_stream = NULL;
 	dev_priv->perf.oa.ops.disable_metric_set(dev_priv);
+
+err_config:
+	i915_oa_config_put(stream->oa_config);
+	i915_oa_config_dispose_buffers(dev_priv);
 	mutex_unlock(&dev_priv->drm.struct_mutex);
 
 err_lock:
 	free_oa_buffer(dev_priv);
 
 err_oa_buf_alloc:
-	i915_oa_config_put(stream->oa_config);
-
 	intel_uncore_forcewake_put(&dev_priv->uncore, FORCEWAKE_ALL);
 	intel_runtime_pm_put(&dev_priv->runtime_pm, stream->wakeref);
 
@@ -2446,9 +2453,6 @@ static int i915_oa_stream_init(struct i915_perf_stream *stream,
 	mutex_unlock(&dev_priv->drm.struct_mutex);
 
 err_noa_wait_alloc:
-	i915_oa_config_put(stream->oa_config);
-
-err_config:
 	if (stream->ctx)
 		oa_put_render_ctx_id(stream);
 
@@ -2810,20 +2814,13 @@ static int i915_perf_release(struct inode *inode, struct file *file)
 {
 	struct i915_perf_stream *stream = file->private_data;
 	struct drm_i915_private *dev_priv = stream->dev_priv;
-	struct i915_oa_config *oa_config, *next;
 
 	mutex_lock(&dev_priv->perf.lock);
 	i915_perf_destroy_locked(stream);
 
 	/* Dispose of all oa config batch buffers. */
-	mutex_lock(&dev_priv->perf.metrics_lock);
-	list_for_each_entry_safe(oa_config, next, &dev_priv->perf.metrics_buffers, vma_link) {
-		list_del(&oa_config->vma_link);
-		i915_gem_object_put(oa_config->obj);
-		oa_config->obj = NULL;
-	}
-	mutex_unlock(&dev_priv->perf.metrics_lock);
+	i915_oa_config_dispose_buffers(dev_priv);
 
 	mutex_unlock(&dev_priv->perf.lock);