
[v4] drm/i915 : Avoid superfluous invalidation of CPU cache lines

Message ID 1449043664-24371-1-git-send-email-akash.goel@intel.com (mailing list archive)
State New, archived

Commit Message

akash.goel@intel.com Dec. 2, 2015, 8:07 a.m. UTC
From: Akash Goel <akash.goel@intel.com>

When the object is moved out of CPU read domain, the cachelines
are not invalidated immediately. The invalidation is deferred till
next time the object is brought back into CPU read domain.
But the invalidation is done unconditionally, i.e. even for the case
where the cachelines were flushed previously, when the object moved out
of CPU write domain. Skipping this redundant invalidation yields a
small optimization. This is not a hypothetical case, though it is
unlikely to occur often.
The aim is to detect changes to the backing storage whilst the
data is potentially in the CPU cache, and only clflush in those cases.
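The tracking described above can be sketched as a small standalone
userspace model (not kernel code; the struct and function names below
are simplifications invented for illustration, only the cache_dirty and
cache_flushed fields mirror the patch):

```c
/* Userspace model of the cache_dirty/cache_flushed tracking, showing
 * why the invalidating clflush can be skipped on the read path. */
#include <assert.h>
#include <stdbool.h>

struct obj_model {
	bool in_cpu_read_domain;
	bool cache_dirty;
	bool cache_flushed;
	int clflush_count;	/* number of (expensive) clflush passes */
};

/* Models i915_gem_clflush_object(): after this, no data for the
 * object remains in the CPU cache. */
static void model_clflush(struct obj_model *o)
{
	o->clflush_count++;
	o->cache_dirty = false;
	o->cache_flushed = true;
}

/* Moving out of the CPU write domain writes back the dirty lines. */
static void model_flush_cpu_write_domain(struct obj_model *o)
{
	model_clflush(o);
	o->in_cpu_read_domain = false;
}

/* Models the patched read path of i915_gem_object_set_to_cpu_domain():
 * the invalidating clflush runs only when the CPU cache may still
 * hold lines for the object. */
static void model_set_to_cpu_domain(struct obj_model *o)
{
	if (!o->in_cpu_read_domain) {
		if (!o->cache_flushed)
			model_clflush(o);
		o->cache_flushed = false;
		o->in_cpu_read_domain = true;
	}
}
```

In this model, a write-domain flush followed by a move back into the
CPU read domain issues one clflush instead of two, which is exactly the
saving the patch is after.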

v2: Made the comment more verbose (Ville/Chris)
    Added doc for 'cache_clean' field (Daniel)

v3: Updated the comment to assuage an apprehension regarding the
    speculative-prefetching behavior of HW (Ville/Chris)

v4: Renamed 'cache_clean' to 'cache_flushed' as it's more appropriate (Ville)
    Made minor update in the comments for more clarity (Chris)

Testcase: igt/gem_concurrent_blit
Testcase: igt/benchmarks/gem_set_domain
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Akash Goel <akash.goel@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h |  9 +++++++++
 drivers/gpu/drm/i915/i915_gem.c | 19 ++++++++++++++++++-
 2 files changed, 27 insertions(+), 1 deletion(-)

Comments

Chris Wilson Dec. 6, 2015, 5:03 p.m. UTC | #1
On Wed, Dec 02, 2015 at 01:37:44PM +0530, akash.goel@intel.com wrote:
> From: Akash Goel <akash.goel@intel.com>
> 
> When the object is moved out of CPU read domain, the cachelines
> are not invalidated immediately. The invalidation is deferred till
> next time the object is brought back into CPU read domain.
> But the invalidation is done unconditionally, i.e. even for the case
> where the cachelines were flushed previously, when the object moved out
> of CPU write domain. Skipping this redundant invalidation yields a
> small optimization. This is not a hypothetical case, though it is
> unlikely to occur often.
> The aim is to detect changes to the backing storage whilst the
> data is potentially in the CPU cache, and only clflush in those cases.
> 
> v2: Made the comment more verbose (Ville/Chris)
>     Added doc for 'cache_clean' field (Daniel)
> 
> v3: Updated the comment to assuage an apprehension regarding the
>     speculative-prefetching behavior of HW (Ville/Chris)
> 
> v4: Renamed 'cache_clean' to 'cache_flushed' as it's more appropriate (Ville)
>     Made minor update in the comments for more clarity (Chris)

Ah, spotted one nuisance: i915_gem_obj_prepare_shmem_read() needs to
clear the cache_flushed flag, as after that call we will pollute the
CPU cache (whilst pretending the object is not in the CPU cache
domain).
-Chris
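The hazard Chris describes can be shown with a tiny standalone model
(hypothetical names, not part of the patch): if a shmem-read path
repopulates the CPU cache without clearing cache_flushed, the later
move into the CPU read domain wrongly skips the invalidating clflush,
leaving cachelines present that should have been dropped.

```c
/* Model of the stale-flag hazard: cache_flushed left set after the
 * CPU cache has been repopulated behind the domain tracking's back. */
#include <assert.h>
#include <stdbool.h>

struct obj {
	bool cpu_cache_populated;	/* object data present in CPU cache */
	bool cache_flushed;
};

static void clflush(struct obj *o)
{
	o->cpu_cache_populated = false;
	o->cache_flushed = true;
}

/* Models a prepare-shmem-read path that reads the pages through the
 * CPU cache but (buggily) leaves cache_flushed set. */
static void prepare_shmem_read_buggy(struct obj *o)
{
	o->cpu_cache_populated = true;
	/* missing: o->cache_flushed = false; */
}

/* Returns true if CPU cachelines survive the move into the CPU read
 * domain, i.e. the invalidation was skipped when it was needed. */
static bool set_to_cpu_read(struct obj *o)
{
	if (!o->cache_flushed)
		clflush(o);
	o->cache_flushed = false;
	return o->cpu_cache_populated;
}
```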

Patch

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 11ae5a5..e6e4bb0 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2100,6 +2100,15 @@  struct drm_i915_gem_object {
 	unsigned int cache_level:3;
 	unsigned int cache_dirty:1;
 
+	/*
+	 * Tracks if the CPU cache has been completely flushed, in which
+	 * case there should be no data in the CPU cachelines for the
+	 * object. cache_flushed also implies !cache_dirty (no data in
+	 * the cachelines, so nothing dirty either).
+	 * cache_dirty just tracks whether we have been omitting clflushes.
+	 */
+	unsigned int cache_flushed:1;
+
 	unsigned int frontbuffer_bits:INTEL_FRONTBUFFER_BITS;
 
 	unsigned int pin_display;
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 33adc8f..cdc50d8 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3552,6 +3552,7 @@  i915_gem_clflush_object(struct drm_i915_gem_object *obj,
 	trace_i915_gem_object_clflush(obj);
 	drm_clflush_sg(obj->pages);
 	obj->cache_dirty = false;
+	obj->cache_flushed = true;
 
 	return true;
 }
@@ -3982,7 +3983,23 @@  i915_gem_object_set_to_cpu_domain(struct drm_i915_gem_object *obj, bool write)
 
 	/* Flush the CPU cache if it's still invalid. */
 	if ((obj->base.read_domains & I915_GEM_DOMAIN_CPU) == 0) {
-		i915_gem_clflush_object(obj, false);
+		/* When the object was moved out of the CPU domain following a
+		 * CPU write, we will have flushed it out of the CPU cache (and
+		 * marked the object as cache_flushed).
+		 * After the clflush we know that this object cannot be in the
+		 * CPU cache, nor can it be speculatively loaded into the CPU
+		 * cache as our objects are page-aligned and speculation cannot
+		 * cross page boundaries. So whilst the cache_flushed flag is
+		 * set, we know that any future access to the object's pages
+		 * will miss the CPU cache and have to be serviced from main
+		 * memory (where they will pick up any writes through the GTT or
+		 * by the GPU) i.e. we do not need another clflush here and now
+		 * to invalidate the CPU cache as we prepare to read from the
+		 * object.
+		 */
+		if (!obj->cache_flushed)
+			i915_gem_clflush_object(obj, false);
+		obj->cache_flushed = false;
 
 		obj->base.read_domains |= I915_GEM_DOMAIN_CPU;
 	}