From patchwork Wed Dec 2 08:07:44 2015
X-Patchwork-Submitter: akash.goel@intel.com
X-Patchwork-Id: 7743401
From: akash.goel@intel.com
To: intel-gfx@lists.freedesktop.org
Cc: ville.syrjala@linux.intel.com, Akash Goel
Date: Wed, 2 Dec 2015 13:37:44 +0530
Message-Id: <1449043664-24371-1-git-send-email-akash.goel@intel.com>
In-Reply-To: <565DB60E.3000706@intel.com>
References: <565DB60E.3000706@intel.com>
Subject: [Intel-gfx] [PATCH v4] drm/i915: Avoid superfluous invalidation of CPU cache lines

From: Akash Goel

When the object is moved out of the CPU read domain, its cachelines are
not invalidated immediately; the invalidation is deferred until the next
time the object is brought back into the CPU read domain. However, the
invalidation is then done unconditionally, even when the cachelines were
already flushed earlier, as the object moved out of the CPU write domain.
This extra clflush is avoidable, and skipping it yields a small
optimization. The case is not hypothetical, though it is unlikely to
occur often. The aim is to detect changes to the backing storage while
the data is potentially in the CPU cache, and to clflush only in that
case.
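To make the flag transitions easier to follow, here is a minimal,
hedged userspace sketch of the idea; the struct and helper names
(obj_model, model_clflush, model_set_to_cpu_domain) are illustrative
stand-ins rather than the driver's API, and the authoritative change is
the diff below:

/*
 * Simplified userspace model of the deferred-invalidation logic.
 * Only the two flag transitions mirror the actual patch.
 */
#include <stdbool.h>
#include <stdio.h>

struct obj_model {
	bool cache_dirty;	/* clflushes have been omitted for CPU writes */
	bool cache_flushed;	/* cachelines known to hold no data for the object */
	bool in_cpu_read_domain;
};

/* Models the clflush step: after a flush the CPU cache holds no object data. */
static void model_clflush(struct obj_model *obj)
{
	printf("clflush\n");
	obj->cache_dirty = false;
	obj->cache_flushed = true;
}

/* Models moving the object back into the CPU read domain. */
static void model_set_to_cpu_domain(struct obj_model *obj)
{
	if (!obj->in_cpu_read_domain) {
		/*
		 * Old behaviour: always clflush here.
		 * New behaviour: skip the invalidation when the earlier move
		 * out of the CPU write domain already flushed the cachelines,
		 * since page-aligned objects cannot be speculatively pulled
		 * back into the cache across page boundaries.
		 */
		if (!obj->cache_flushed)
			model_clflush(obj);
		obj->cache_flushed = false;
		obj->in_cpu_read_domain = true;
	}
}

int main(void)
{
	/* Object was flushed when it left the CPU write domain. */
	struct obj_model obj = { .cache_flushed = true };

	model_set_to_cpu_domain(&obj);	/* no clflush needed */
	obj.in_cpu_read_domain = false;	/* later moved to a GPU domain */
	model_set_to_cpu_domain(&obj);	/* now the clflush happens */
	return 0;
}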
v2: Made the comment more verbose (Ville/Chris)
    Added doc for the 'cache_clean' field (Daniel)

v3: Updated the comment to assuage an apprehension regarding the
    speculative-prefetching behavior of the HW (Ville/Chris)

v4: Renamed 'cache_clean' to 'cache_flushed' as it's more appropriate (Ville)
    Made a minor update in the comments for more clarity (Chris)

Testcase: igt/gem_concurrent_blit
Testcase: igt/benchmarks/gem_set_domain
Signed-off-by: Chris Wilson
Signed-off-by: Akash Goel
---
 drivers/gpu/drm/i915/i915_drv.h |  9 +++++++++
 drivers/gpu/drm/i915/i915_gem.c | 19 ++++++++++++++++++-
 2 files changed, 27 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 11ae5a5..e6e4bb0 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2100,6 +2100,15 @@ struct drm_i915_gem_object {
 	unsigned int cache_level:3;
 	unsigned int cache_dirty:1;
 
+	/*
+	 * Tracks if the CPU cache has been completely flushed, on which
+	 * there should be no data in CPU cachelines for the object.
+	 * cache_flushed would also imply !cache_dirty (no data in
+	 * cachelines, so not dirty also).
+	 * cache_dirty just tracks whether we have been omitting clflushes.
+	 */
+	unsigned int cache_flushed:1;
+
 	unsigned int frontbuffer_bits:INTEL_FRONTBUFFER_BITS;
 
 	unsigned int pin_display;
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 33adc8f..cdc50d8 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3552,6 +3552,7 @@ i915_gem_clflush_object(struct drm_i915_gem_object *obj,
 	trace_i915_gem_object_clflush(obj);
 	drm_clflush_sg(obj->pages);
 	obj->cache_dirty = false;
+	obj->cache_flushed = true;
 
 	return true;
 }
@@ -3982,7 +3983,23 @@ i915_gem_object_set_to_cpu_domain(struct drm_i915_gem_object *obj, bool write)
 
 	/* Flush the CPU cache if it's still invalid. */
 	if ((obj->base.read_domains & I915_GEM_DOMAIN_CPU) == 0) {
-		i915_gem_clflush_object(obj, false);
+		/* When the object was moved out of the CPU domain following a
+		 * CPU write, we will have flushed it out of the CPU cache (and
+		 * marked the object as cache_flushed).
+		 * After the clflush we know that this object cannot be in the
+		 * CPU cache, nor can it be speculatively loaded into the CPU
+		 * cache as our objects are page-aligned and speculation cannot
+		 * cross page boundaries. So whilst the cache_flushed flag is
+		 * set, we know that any future access to the object's pages
+		 * will miss the CPU cache and have to be serviced from main
+		 * memory (where they will pick up any writes through the GTT or
+		 * by the GPU) i.e. we do not need another clflush here and now
+		 * to invalidate the CPU cache as we prepare to read from the
+		 * object.
+		 */
+		if (!obj->cache_flushed)
+			i915_gem_clflush_object(obj, false);
+		obj->cache_flushed = false;
 
 		obj->base.read_domains |= I915_GEM_DOMAIN_CPU;
 	}