From patchwork Wed Nov 15 12:13:51 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: sagar.a.kamble@intel.com
X-Patchwork-Id: 10059267
From: Sagar Arun Kamble
To: intel-gfx@lists.freedesktop.org
Date: Wed, 15 Nov 2017 17:43:51 +0530
Message-Id: <1510748034-14034-2-git-send-email-sagar.a.kamble@intel.com>
X-Mailer: git-send-email 1.9.1
In-Reply-To: <1510748034-14034-1-git-send-email-sagar.a.kamble@intel.com>
References: <1510748034-14034-1-git-send-email-sagar.a.kamble@intel.com>
Cc: Sourab Gupta, Matthew Auld
Subject: [Intel-gfx] [RFC 1/4] drm/i915/perf: Add support to correlate GPU
 timestamp with system time

We can compute the system time corresponding to a GPU timestamp by
taking a reference point and then adding the delta time computed using
the kernel's timecounter and cyclecounter support. The cyclecounter has
to be configured with the GPU timestamp frequency. In further patches we
can use timecounter_cyc2time() to compute the system time corresponding
to the GPU timestamp cycles derived from an OA report.

An important thing to note is that timecounter_cyc2time() treats a
timestamp as lying in the past if its delta is more than half the
maximum ns time covered by the counter. (That is ~35min for a 36-bit
counter. If a sampling duration this long is needed, we will have to
update tc->nsec by explicitly reading the timecounter at intervals
shorter than 35min during sampling.)

On enabling the perf stream we start the timecounter/cyclecounter, and
while collecting OA samples we translate the GPU timestamp to a system
timestamp.

The earlier approach based on cross-timestamps is not needed. It was
being used to approximate the frequency based on invalid assumptions
(possibly drift was being seen in the time due to a precision issue).
The precision of time from GPU clocks is already in ns, and the
timecounter takes care of it, as verified over variable durations.

A cross-timestamp might be valuable for getting a precise reference
point w.r.t. device and system time, but it needs support for reading
the ART counter from i915, which I find is not available. Hence, we
compensate for the offset and set up the reference point ourselves.
With this approach the device and system times differ by only 5-10us.

Signed-off-by: Sagar Arun Kamble
Cc: Lionel Landwerlin
Cc: Chris Wilson
Cc: Sourab Gupta
Cc: Matthew Auld
---
 drivers/gpu/drm/i915/i915_drv.h  |  9 +++++++
 drivers/gpu/drm/i915/i915_perf.c | 53 ++++++++++++++++++++++++++++++++++++++++
 drivers/gpu/drm/i915/i915_reg.h  |  6 +++++
 3 files changed, 68 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 2158a75..e08bc85 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -43,6 +43,7 @@
 #include
 #include
 #include
+#include
 #include
 #include

@@ -2149,6 +2150,14 @@ struct i915_perf_stream {
 	 * @oa_config: The OA configuration used by the stream.
 	 */
 	struct i915_oa_config *oa_config;
+
+	/**
+	 * System time correlation variables.
+	 */
+	struct cyclecounter cc;
+	spinlock_t systime_lock;
+	struct timespec64 start_systime;
+	struct timecounter tc;
 };

 /**
diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index 00be015..72ddc34 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -192,6 +192,7 @@
  */

 #include
+#include
 #include
 #include

@@ -2391,6 +2392,56 @@ static unsigned int i915_perf_poll(struct file *file, poll_table *wait)
 }

 /**
+ * i915_cyclecounter_read - read raw cycle/timestamp counter
+ * @cc: cyclecounter structure
+ */
+static u64 i915_cyclecounter_read(const struct cyclecounter *cc)
+{
+	struct i915_perf_stream *stream = container_of(cc, typeof(*stream), cc);
+	struct drm_i915_private *dev_priv = stream->dev_priv;
+	u64 ts_count;
+
+	intel_runtime_pm_get(dev_priv);
+	ts_count = I915_READ64_2x32(GEN4_TIMESTAMP,
+				    GEN7_TIMESTAMP_UDW);
+	intel_runtime_pm_put(dev_priv);
+
+	return ts_count;
+}
+
+static void i915_perf_init_cyclecounter(struct i915_perf_stream *stream)
+{
+	struct drm_i915_private *dev_priv = stream->dev_priv;
+	int cs_ts_freq = dev_priv->perf.oa.timestamp_frequency;
+	struct cyclecounter *cc = &stream->cc;
+	u32 maxsec;
+
+	cc->read = i915_cyclecounter_read;
+	cc->mask = CYCLECOUNTER_MASK(CS_TIMESTAMP_WIDTH(dev_priv));
+	maxsec = cc->mask / cs_ts_freq;
+
+	clocks_calc_mult_shift(&cc->mult, &cc->shift, cs_ts_freq,
+			       NSEC_PER_SEC, maxsec);
+}
+
+static void i915_perf_init_timecounter(struct i915_perf_stream *stream)
+{
+#define SYSTIME_START_OFFSET	350000 /* Counter read takes about 350us */
+	unsigned long flags;
+	u64 ns;
+
+	i915_perf_init_cyclecounter(stream);
+	spin_lock_init(&stream->systime_lock);
+
+	getnstimeofday64(&stream->start_systime);
+	ns = timespec64_to_ns(&stream->start_systime) + SYSTIME_START_OFFSET;
+
+	spin_lock_irqsave(&stream->systime_lock, flags);
+	timecounter_init(&stream->tc, &stream->cc, ns);
+	spin_unlock_irqrestore(&stream->systime_lock, flags);
+}
+
+/**
  * i915_perf_enable_locked - handle `I915_PERF_IOCTL_ENABLE` ioctl
  * @stream: A disabled i915 perf stream
  *
@@ -2408,6 +2459,8 @@ static void i915_perf_enable_locked(struct i915_perf_stream *stream)
 	/* Allow stream->ops->enable() to refer to this */
 	stream->enabled = true;

+	i915_perf_init_timecounter(stream);
+
 	if (stream->ops->enable)
 		stream->ops->enable(stream);
 }
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index cfdf4f8..e7e6966 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -8882,6 +8882,12 @@ enum skl_power_gate {

 /* Gen4+ Timestamp and Pipe Frame time stamp registers */
 #define GEN4_TIMESTAMP		_MMIO(0x2358)
+#define GEN7_TIMESTAMP_UDW	_MMIO(0x235C)
+#define PRE_GEN7_TIMESTAMP_WIDTH	32
+#define GEN7_TIMESTAMP_WIDTH		36
+#define CS_TIMESTAMP_WIDTH(dev_priv) \
+	(INTEL_GEN(dev_priv) < 7 ? PRE_GEN7_TIMESTAMP_WIDTH : \
+				   GEN7_TIMESTAMP_WIDTH)
 #define ILK_TIMESTAMP_HI	_MMIO(0x70070)
 #define IVB_TIMESTAMP_CTR	_MMIO(0x44070)

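
The half-range rule mentioned in the commit message can be illustrated
outside the kernel. What follows is a minimal userspace sketch, not the
kernel implementation: struct gpu_timecounter, cycles_to_ns() and
gpu_cyc2time() are hypothetical names introduced only for this
illustration, and the cycle-to-ns conversion uses a plain division
instead of the mult/shift pair that clocks_calc_mult_shift() computes.
It assumes a counter that wraps at a power-of-two width and a fixed
timestamp frequency.

```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Hypothetical userspace stand-in for the kernel's timecounter state. */
struct gpu_timecounter {
	uint64_t mask;       /* counter wrap mask, e.g. 2^36 - 1 */
	uint64_t freq_hz;    /* timestamp frequency (an assumed value) */
	uint64_t cycle_last; /* counter value at the reference point */
	uint64_t nsec;       /* system time in ns at the reference point */
};

/*
 * Convert a cycle delta to nanoseconds. Whole seconds and the remainder
 * are converted separately so the multiplication cannot overflow 64 bits
 * for frequencies below roughly 18 GHz.
 */
static uint64_t cycles_to_ns(uint64_t cycles, uint64_t freq_hz)
{
	return (cycles / freq_hz) * NSEC_PER_SEC +
	       (cycles % freq_hz) * NSEC_PER_SEC / freq_hz;
}

/*
 * Translate a raw timestamp to system time. A delta larger than half
 * the counter range is interpreted as a timestamp taken before the
 * reference point, which is why the reference has to be refreshed
 * before half a wrap period elapses.
 */
static uint64_t gpu_cyc2time(const struct gpu_timecounter *tc, uint64_t ts)
{
	uint64_t delta = (ts - tc->cycle_last) & tc->mask;

	if (delta > tc->mask / 2) {
		/* Timestamp lies behind the reference point. */
		delta = (tc->cycle_last - ts) & tc->mask;
		return tc->nsec - cycles_to_ns(delta, tc->freq_hz);
	}

	/* Timestamp lies at or ahead of the reference point. */
	return tc->nsec + cycles_to_ns(delta, tc->freq_hz);
}
```

With a 36-bit mask and an assumed 12 MHz timestamp frequency, a
timestamp 12 cycles behind the reference point maps to 1000 ns before
the reference time rather than to a point almost one full wrap ahead of
it.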