Message ID | 1309563307-5480-1-git-send-email-jbarnes@virtuousgeek.org (mailing list archive) |
---|---|
State | New, archived |
On Fri, 1 Jul 2011 16:35:07 -0700, Jesse Barnes <jbarnes@virtuousgeek.org> wrote:
> By default, the GPU will only share a very small portion of the CPU
> cache. With this change, both the GPU and CPU will have full access to
> the cache, which should help (sometimes a lot) in most cases.

What's the trade-off? Is the GPU data in the cache treated differently from CPU data when it comes to cache eviction, so that this asymmetrically hurts CPU-bound applications? At the least it will force more CPU data out of the cache, which will be enough to make some people scream and howl.

Do we want to expose this as a parameter whilst we test various configurations? Is this just yet another step on the path to a coordinated cpu-gpu governor?
-Chris

On Fri, 1 Jul 2011 16:35:07 -0700, Jesse Barnes <jbarnes@virtuousgeek.org> wrote:
> By default, the GPU will only share a very small portion of the CPU
> cache. With this change, both the GPU and CPU will have full access to
> the cache, which should help (sometimes a lot) in most cases.

Joy, this looks to be at best a mixed blessing. For CPU-bound games like padman, it degrades performance by about 5% on my desktop SNB. But for nexuiz, there appears to be little change.

The ddx shows a further regression of the order of 10%. The immediate suspect is that it hurts the use of pixman for trapezoid mask generation. That use is less than ideal behaviour and will be fixed in the near future, but it is indicative of the sort of negative impact this change will have on CPU-memory-bound applications. Conversely, the equivalent spans-based code is about the only example I found that is sped up by the patch, by about 3%.

Having just checked up on 0x900c, I'm even more confused. From my old specs, the register is SNPCR, the snoop control register, which makes more sense than MBC, and 0<<21 is the setting for maximum uncore resources (the default setting, and the default on my SNB), with 1<<21 being the medium setting. Now, the only reference I have is the register dump, with no explanation of what resource is actually being controlled...
-Chris

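To compare a register dump against the two readings above (the patch's comments versus the older SNPCR description), it can help to pull out the raw 2-bit field first. The following is only a sketch that works on a plain 32-bit value; `GEN6_MBC_LLC_CFG_SHIFT` and `llc_cfg_field()` are hypothetical names that do not appear in the patch, and the mask is written with an unsigned constant. It does not say which meaning of each field value is correct, since that is exactly what is in question.

```c
#include <assert.h>
#include <stdint.h>

/* Field position as used by the patch: bits 22:21 of register 0x900c. */
#define GEN6_MBC_LLC_CFG_SHIFT 21 /* hypothetical helper name, not in the patch */
#define GEN6_MBC_LLC_CFG_MASK  (3u << GEN6_MBC_LLC_CFG_SHIFT)

/*
 * Return the raw field value (0..3) from a dumped register value.
 * No meaning is attached to the result here.
 */
static inline uint32_t llc_cfg_field(uint32_t reg)
{
	uint32_t field = (reg & GEN6_MBC_LLC_CFG_MASK) >> GEN6_MBC_LLC_CFG_SHIFT;

	assert(field <= 3); /* a 2-bit field cannot exceed 3 */
	return field;
}
```

A dump whose field reads 0 would match "maximum uncore resources" under the SNPCR description, while the patch's `GEN6_MBC_LLC_CFG_FULL` corresponds to a field value of 1.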
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index 4a446b1..eac59f1 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -78,6 +78,11 @@
 #define GRDOM_RENDER (1<<2)
 #define GRDOM_MEDIA (3<<2)
 
+#define GEN6_MBCUNIT_CFG 0x900c /* for LLC config */
+#define GEN6_MBC_LLC_CFG_MASK (3<<21)
+#define GEN6_MBC_LLC_CFG_FULL (1<<21) /* full sharing of 16/16ths of the cache */
+#define GEN6_MBC_LLC_CFG_MIN (3<<21) /* only 1/16th of the cache is shared */
+
 #define GEN6_GDRST 0x941c
 #define GEN6_GRDOM_FULL (1 << 0)
 #define GEN6_GRDOM_RENDER (1 << 1)
diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index 823b8d9..0ed4ed2 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -7279,6 +7279,7 @@ void gen6_update_ring_freq(struct drm_i915_private *dev_priv)
 	int min_freq = 15;
 	int gpu_freq, ia_freq, max_ia_freq;
 	int scaling_factor = 180;
+	u32 mbccfg;
 
 	max_ia_freq = cpufreq_quick_get_max(0);
 	/*
@@ -7293,6 +7294,12 @@ void gen6_update_ring_freq(struct drm_i915_private *dev_priv)
 
 	mutex_lock(&dev_priv->dev->struct_mutex);
 
+	/* Update the cache sharing policy here as well */
+	mbccfg = I915_READ(GEN6_MBCUNIT_CFG);
+	mbccfg &= ~GEN6_MBC_LLC_CFG_MASK;
+	mbccfg |= GEN6_MBC_LLC_CFG_FULL;
+	I915_WRITE(GEN6_MBCUNIT_CFG, mbccfg);
+
 	/*
 	 * For each potential GPU frequency, load a ring frequency we'd like
 	 * to use for memory access. We do this by specifying the IA frequency
By default, the GPU will only share a very small portion of the CPU
cache. With this change, both the GPU and CPU will have full access to
the cache, which should help (sometimes a lot) in most cases.

Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
---
 drivers/gpu/drm/i915/i915_reg.h      | 5 +++++
 drivers/gpu/drm/i915/intel_display.c | 7 +++++++
 2 files changed, 12 insertions(+), 0 deletions(-)
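
For anyone wanting to check the bit arithmetic outside the driver, the read-modify-write in the patch can be sketched on a plain 32-bit value. This is an illustration under assumptions, not driver code: `llc_cfg_set_full()` is a hypothetical helper, there is no MMIO access (`I915_READ`/`I915_WRITE` are left out), and the constants use unsigned literals rather than the patch's `(3<<21)` and `(1<<21)`.

```c
#include <assert.h>
#include <stdint.h>

/* Same bit positions as the patch, written as unsigned constants. */
#define GEN6_MBC_LLC_CFG_MASK (3u << 21)
#define GEN6_MBC_LLC_CFG_FULL (1u << 21)

/*
 * Hypothetical helper: apply the patch's update to a value that would
 * have been read from GEN6_MBCUNIT_CFG. Clears the 2-bit field, then
 * sets it to the FULL encoding; all other bits are left untouched.
 */
static inline uint32_t llc_cfg_set_full(uint32_t mbccfg)
{
	uint32_t updated = (mbccfg & ~GEN6_MBC_LLC_CFG_MASK) | GEN6_MBC_LLC_CFG_FULL;

	/* Bits outside the field must be unchanged. */
	assert((updated & ~GEN6_MBC_LLC_CFG_MASK) == (mbccfg & ~GEN6_MBC_LLC_CFG_MASK));
	return updated;
}
```

Starting from the patch's `GEN6_MBC_LLC_CFG_MIN` encoding (`3<<21`), the helper returns `1<<21`; starting from 0 (the value reported as the default on one SNB in the review), it also returns `1<<21`, so the write changes the field in that case too.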