Message ID | 1308939194-3568-2-git-send-email-jbarnes@virtuousgeek.org (mailing list archive) |
---|---|
State | New, archived |
On Fri, 24 Jun 2011 11:13:14 -0700, Jesse Barnes <jbarnes@virtuousgeek.org> wrote:
> The ring frequency scaling table tells the PCU to treat certain GPU
> frequencies as if they were a given CPU frequency for purposes of
> scaling the ring frequency. Normally the PCU will scale the ring
> frequency based on the CPU P-state, but with the table present, it will
> also take the GPU frequency into account.
>
> The main downside of keeping the ring frequency high while the CPU is
> at a low frequency (or asleep altogether) is increased power
> consumption. But then if you're keeping your GPU busy, you probably
> want the extra performance.

This one is working nicely here. Maybe I didn't test enough with the
smash-to-3000 patch when I told you those results, but my 3 runs on a
fresh boot were 64, 70, and 72 fps (in order, so there seems to be some
sort of warming up to the system perhaps? Even though the delay between
runs was sufficient for the GPU to clock back down). 73 is the fps I
see with the cpu busy loop.

Tested-by: Eric Anholt <eric@anholt.net>

Comments below, then
Reviewed-by: Eric Anholt <eric@anholt.net>

> ---
>  drivers/gpu/drm/i915/i915_debugfs.c  |   43 +++++++++++++++++++++++++
>  drivers/gpu/drm/i915/i915_reg.h      |    4 ++-
>  drivers/gpu/drm/i915/i915_suspend.c  |    4 ++-
>  drivers/gpu/drm/i915/intel_display.c |   57 +++++++++++++++++++++++++++++++++-
>  drivers/gpu/drm/i915/intel_drv.h     |    1 +
>  5 files changed, 106 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> index 4d46441..79394cd 100644
> --- a/drivers/gpu/drm/i915/i915_debugfs.c
> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> @@ -1123,6 +1123,48 @@ static int i915_emon_status(struct seq_file *m, void *unused)
>  	return 0;
>  }
>
> +static int i915_ring_freq_table(struct seq_file *m, void *unused)
> +{
> +	struct drm_info_node *node = (struct drm_info_node *) m->private;
> +	struct drm_device *dev = node->minor->dev;
> +	drm_i915_private_t *dev_priv = dev->dev_private;
> +	int ret;
> +	int gpu_freq, ia_freq;
> +
> +	if (!IS_GEN6(dev)) {
> +		seq_printf(m, "unsupported on this chipset\n");
> +		return 0;
> +	}
> +
> +	ret = mutex_lock_interruptible(&dev->struct_mutex);
> +	if (ret)
> +		return ret;
> +
> +	gen6_gt_force_wake_get(dev_priv);
> +
> +	seq_printf(m, "GPU freq\tEffective CPU freq\n");
> +
> +	for (gpu_freq = dev_priv->min_delay; gpu_freq <= dev_priv->max_delay;
> +	     gpu_freq++) {
> +		I915_WRITE(GEN6_PCODE_DATA, gpu_freq);
> +		I915_WRITE(GEN6_PCODE_MAILBOX, GEN6_PCODE_READY |
> +			   GEN6_PCODE_READ_MIN_FREQ_TABLE);
> +		if (wait_for((I915_READ(GEN6_PCODE_MAILBOX) &
> +			      GEN6_PCODE_READY) == 0, 10)) {
> +			DRM_ERROR("pcode write of freq table timed out\n");
> +			continue;
> +		}

s/write/read/

Might stick a note of units on the header.

> diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
> index 86a3ec1..3b22e12 100644
> --- a/drivers/gpu/drm/i915/intel_display.c
> +++ b/drivers/gpu/drm/i915/intel_display.c
> +void gen6_update_ring_freq(struct drm_i915_private *dev_priv)
> +{
> +	int min_freq = 15;
> +	int gpu_freq, ia_freq, max_ia_freq;
> +	int scaling_factor = 180;
> +
> +	max_ia_freq = cpufreq_quick_get_max(0);
> +	/* default to 3GHz if none found, PCU will ensure we don't go over */
> +	if (!max_ia_freq)
> +		max_ia_freq = 3000000;
> +
> +	/* Convert from kHz to MHz */
> +	max_ia_freq /= 1000;
> +
> +	mutex_lock(&dev_priv->dev->struct_mutex);
> +	gen6_gt_force_wake_get(dev_priv);
> +
> +	/*
> +	 * For each potential GPU frequency, load a ring frequency we'd like
> +	 * to use for memory access. We do this by specifying the IA frequency
> +	 * the PCU should use as a reference to determine the ring frequency.
> +	 */
> +	for (gpu_freq = dev_priv->max_delay; gpu_freq >= dev_priv->min_delay;
> +	     gpu_freq--) {
> +		int diff = dev_priv->max_delay - gpu_freq;
> +
> +		/*
> +		 * For GPU frequencies less than 750MHz, just use the lowest
> +		 * ring freq.
> +		 */
> +		if (gpu_freq < min_freq)
> +			ia_freq = 800;
> +		else
> +			ia_freq = max_ia_freq - ((diff * scaling_factor) / 2);
> +		ia_freq = DIV_ROUND_CLOSEST(ia_freq, 100);

If the GPU has a wide enough clock range (diff large) and the CPU is low
enough clocked (max_ia_freq low now), could we end up with the ia_freq <
800, and would that be a bad thing? In other words, should
scaling_factor be non-constant?

> +		I915_WRITE(GEN6_PCODE_DATA,
> +			   (ia_freq << GEN6_PCODE_FREQ_IA_RATIO_SHIFT) |
> +			   gpu_freq);
> +		I915_WRITE(GEN6_PCODE_MAILBOX, GEN6_PCODE_READY |
> +			   GEN6_PCODE_WRITE_MIN_FREQ_TABLE);
> +		if (wait_for((I915_READ(GEN6_PCODE_MAILBOX) &
> +			      GEN6_PCODE_READY) == 0, 10)) {
> +			DRM_ERROR("pcode write of freq table timed out\n");
> +			continue;
> +		}
> +	}
> +
> +	gen6_gt_force_wake_put(dev_priv);
> +	mutex_unlock(&dev_priv->dev->struct_mutex);
> +}

On Fri, 24 Jun 2011 11:13:14 -0700, Jesse Barnes <jbarnes@virtuousgeek.org> wrote:
> The ring frequency scaling table tells the PCU to treat certain GPU
> frequencies as if they were a given CPU frequency for purposes of
> scaling the ring frequency. Normally the PCU will scale the ring
> frequency based on the CPU P-state, but with the table present, it will
> also take the GPU frequency into account.

So it wasn't picking up max-cpu-freq even though I have CONFIG_CPU_FREQ
and the various ACPI and Intel cpu drivers.

Taking a hint from x86/kvm, I used tsc_khz instead of the default 3000.

nexuiz @10x7 nopatch -> 3000 -> 3300 [tsc_khz]
uncached: 43.2 44.4 44.8
llc: 51.3 52.3

At max_freq=3000, a cpu busy loop was still able to nudge it up to the
same speed as using the max_freq=3300 value.
-Chris

On Mon, 27 Jun 2011 10:54:48 -0700 Eric Anholt <eric@anholt.net> wrote:
> > +	for (gpu_freq = dev_priv->max_delay; gpu_freq >= dev_priv->min_delay;
> > +	     gpu_freq--) {
> > +		int diff = dev_priv->max_delay - gpu_freq;
> > +
> > +		/*
> > +		 * For GPU frequencies less than 750MHz, just use the lowest
> > +		 * ring freq.
> > +		 */
> > +		if (gpu_freq < min_freq)
> > +			ia_freq = 800;
> > +		else
> > +			ia_freq = max_ia_freq - ((diff * scaling_factor) / 2);
> > +		ia_freq = DIV_ROUND_CLOSEST(ia_freq, 100);
>
> If the GPU has a wide enough clock range (diff large) and the CPU is low
> enough clocked (max_ia_freq low now), could we end up with the ia_freq <
> 800, and would that be a bad thing? In other words, should
> scaling_factor be non-constant?

scaling_factor probably should be non-constant, but I don't know what
function it should follow. ia_freq < 800 shouldn't break anything, but
would probably result in sub-optimal GPU performance. OTOH it would
save power...
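
The question of whether ia_freq can drop below 800 can be checked with a small userspace sketch. It mirrors the arithmetic quoted from the patch (min_freq = 15, scaling_factor = 180, GPU ratios in 50 MHz units, IA ratios in 100 MHz units). The names `ring_ia_ratio`, `print_ring_table` and `DIV_ROUND_CLOSEST_POS` are made up for illustration, and the rounding macro is a simplified stand-in that assumes non-negative inputs; none of this is code from the thread.

```c
#include <stdio.h>

/* Simplified stand-in for the kernel's DIV_ROUND_CLOSEST.
 * Assumption: only used with non-negative values here. */
#define DIV_ROUND_CLOSEST_POS(x, d) (((x) + ((d) / 2)) / (d))

/*
 * Mirror of the per-entry arithmetic in gen6_update_ring_freq().
 *   max_ia_mhz: maximum CPU frequency in MHz
 *   max_delay:  highest GPU ratio (units of 50 MHz)
 *   gpu_freq:   GPU ratio of this table entry (units of 50 MHz)
 * Returns the IA reference ratio in units of 100 MHz.
 */
int ring_ia_ratio(int max_ia_mhz, int max_delay, int gpu_freq)
{
	const int min_freq = 15;	/* 15 * 50 MHz = 750 MHz */
	const int scaling_factor = 180;
	int diff = max_delay - gpu_freq;
	int ia_freq;

	if (gpu_freq < min_freq)
		ia_freq = 800;
	else
		ia_freq = max_ia_mhz - ((diff * scaling_factor) / 2);

	return DIV_ROUND_CLOSEST_POS(ia_freq, 100);
}

/* Print the whole table for one hypothetical CPU/GPU pairing. */
void print_ring_table(int max_ia_mhz, int min_delay, int max_delay)
{
	int gpu_freq;

	for (gpu_freq = max_delay; gpu_freq >= min_delay; gpu_freq--)
		printf("%d MHz GPU -> %d MHz IA reference\n",
		       gpu_freq * 50,
		       ring_ia_ratio(max_ia_mhz, max_delay, gpu_freq) * 100);
}
```

With a hypothetical 3000 MHz CPU and a top GPU ratio of 22, the entry at ratio 15 works out to 2400 MHz. With a hypothetical 1600 MHz CPU and a top GPU ratio of 26, the same entry comes out at 600 MHz, which is the below-800 case raised in the review.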

On Mon, 27 Jun 2011 20:40:08 +0100 Chris Wilson <chris@chris-wilson.co.uk> wrote:
> On Fri, 24 Jun 2011 11:13:14 -0700, Jesse Barnes <jbarnes@virtuousgeek.org> wrote:
> > The ring frequency scaling table tells the PCU to treat certain GPU
> > frequencies as if they were a given CPU frequency for purposes of
> > scaling the ring frequency. Normally the PCU will scale the ring
> > frequency based on the CPU P-state, but with the table present, it will
> > also take the GPU frequency into account.
>
> So it wasn't picking up max-cpu-freq even though I have CONFIG_CPU_FREQ
> and the various ACPI and Intel cpu drivers.
>
> Taking a hint from x86/kvm, I used tsc_khz instead of the default 3000.
>
> nexuiz @10x7 nopatch -> 3000 -> 3300 [tsc_khz]
> uncached: 43.2 44.4 44.8
> llc: 51.3 52.3
>
> At max_freq=3000, a cpu busy loop was still able to nudge it up to the
> same speed as using the max_freq=3300 value.

Ok tsc_khz is a good fallback, thanks.
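
The fallback order discussed in this exchange can be sketched in plain C. This is only an illustration: `pick_max_ia_mhz` and its parameters are hypothetical names, the two inputs stand in for the results of `cpufreq_quick_get_max(0)` and `tsc_khz`, and keeping the 3 GHz default as a last resort is an assumption, not something the thread settles.

```c
/*
 * Hypothetical helper: choose the reference maximum CPU frequency.
 *   cpufreq_max_khz: value a cpufreq query returned (0 if unavailable)
 *   tsc_khz_value:   TSC frequency in kHz (0 if unknown)
 * Returns the chosen frequency in MHz.
 */
unsigned int pick_max_ia_mhz(unsigned int cpufreq_max_khz,
			     unsigned int tsc_khz_value)
{
	unsigned int khz = cpufreq_max_khz;

	if (!khz)
		khz = tsc_khz_value;	/* fallback suggested in the thread */
	if (!khz)
		khz = 3000000;		/* assumed last resort: the patch's 3GHz default */

	return khz / 1000;		/* convert from kHz to MHz */
}
```

For example, `pick_max_ia_mhz(0, 3300000)` yields 3300, matching the case where cpufreq reports nothing and the TSC runs at 3.3 GHz.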

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 4d46441..79394cd 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -1123,6 +1123,48 @@ static int i915_emon_status(struct seq_file *m, void *unused)
 	return 0;
 }
 
+static int i915_ring_freq_table(struct seq_file *m, void *unused)
+{
+	struct drm_info_node *node = (struct drm_info_node *) m->private;
+	struct drm_device *dev = node->minor->dev;
+	drm_i915_private_t *dev_priv = dev->dev_private;
+	int ret;
+	int gpu_freq, ia_freq;
+
+	if (!IS_GEN6(dev)) {
+		seq_printf(m, "unsupported on this chipset\n");
+		return 0;
+	}
+
+	ret = mutex_lock_interruptible(&dev->struct_mutex);
+	if (ret)
+		return ret;
+
+	gen6_gt_force_wake_get(dev_priv);
+
+	seq_printf(m, "GPU freq\tEffective CPU freq\n");
+
+	for (gpu_freq = dev_priv->min_delay; gpu_freq <= dev_priv->max_delay;
+	     gpu_freq++) {
+		I915_WRITE(GEN6_PCODE_DATA, gpu_freq);
+		I915_WRITE(GEN6_PCODE_MAILBOX, GEN6_PCODE_READY |
+			   GEN6_PCODE_READ_MIN_FREQ_TABLE);
+		if (wait_for((I915_READ(GEN6_PCODE_MAILBOX) &
+			      GEN6_PCODE_READY) == 0, 10)) {
+			DRM_ERROR("pcode write of freq table timed out\n");
+			continue;
+		}
+		ia_freq = I915_READ(GEN6_PCODE_DATA);
+		seq_printf(m, "%d\t\t%d\n", gpu_freq * 50, ia_freq * 100);
+	}
+
+	gen6_gt_force_wake_put(dev_priv);
+
+	mutex_unlock(&dev->struct_mutex);
+
+	return 0;
+}
+
 static int i915_gfxec(struct seq_file *m, void *unused)
 {
 	struct drm_info_node *node = (struct drm_info_node *) m->private;
@@ -1426,6 +1468,7 @@ static struct drm_info_list i915_debugfs_list[] = {
 	{"i915_inttoext_table", i915_inttoext_table, 0},
 	{"i915_drpc_info", i915_drpc_info, 0},
 	{"i915_emon_status", i915_emon_status, 0},
+	{"i915_ring_freq_table", i915_ring_freq_table, 0},
 	{"i915_gfxec", i915_gfxec, 0},
 	{"i915_fbc_status", i915_fbc_status, 0},
 	{"i915_sr_status", i915_sr_status, 0},
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index 2f967af..757e024 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -3433,7 +3433,9 @@
 #define GEN6_PCODE_MAILBOX 0x138124
 #define GEN6_PCODE_READY (1<<31)
 #define GEN6_READ_OC_PARAMS 0xc
-#define GEN6_PCODE_WRITE_MIN_FREQ_TABLE 0x9
+#define GEN6_PCODE_WRITE_MIN_FREQ_TABLE 0x8
+#define GEN6_PCODE_READ_MIN_FREQ_TABLE 0x9
 #define GEN6_PCODE_DATA 0x138128
+#define GEN6_PCODE_FREQ_IA_RATIO_SHIFT 8
 
 #endif /* _I915_REG_H_ */
diff --git a/drivers/gpu/drm/i915/i915_suspend.c b/drivers/gpu/drm/i915/i915_suspend.c
index 60a94d2..03f0fac 100644
--- a/drivers/gpu/drm/i915/i915_suspend.c
+++ b/drivers/gpu/drm/i915/i915_suspend.c
@@ -870,8 +870,10 @@ int i915_restore_state(struct drm_device *dev)
 		intel_init_emon(dev);
 	}
 
-	if (IS_GEN6(dev))
+	if (IS_GEN6(dev)) {
 		gen6_enable_rps(dev_priv);
+		gen6_update_ring_freq(dev_priv);
+	}
 
 	/* Cache mode state */
 	I915_WRITE (CACHE_MODE_0, dev_priv->saveCACHE_MODE_0 | 0xffff0000);
diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index 86a3ec1..3b22e12 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -24,6 +24,7 @@
  * Eric Anholt <eric@anholt.net>
  */
 
+#include <linux/cpufreq.h>
 #include <linux/module.h>
 #include <linux/input.h>
 #include <linux/i2c.h>
@@ -7159,6 +7160,58 @@ void gen6_enable_rps(struct drm_i915_private *dev_priv)
 	mutex_unlock(&dev_priv->dev->struct_mutex);
 }
 
+void gen6_update_ring_freq(struct drm_i915_private *dev_priv)
+{
+	int min_freq = 15;
+	int gpu_freq, ia_freq, max_ia_freq;
+	int scaling_factor = 180;
+
+	max_ia_freq = cpufreq_quick_get_max(0);
+	/* default to 3GHz if none found, PCU will ensure we don't go over */
+	if (!max_ia_freq)
+		max_ia_freq = 3000000;
+
+	/* Convert from kHz to MHz */
+	max_ia_freq /= 1000;
+
+	mutex_lock(&dev_priv->dev->struct_mutex);
+	gen6_gt_force_wake_get(dev_priv);
+
+	/*
+	 * For each potential GPU frequency, load a ring frequency we'd like
+	 * to use for memory access. We do this by specifying the IA frequency
+	 * the PCU should use as a reference to determine the ring frequency.
+	 */
+	for (gpu_freq = dev_priv->max_delay; gpu_freq >= dev_priv->min_delay;
+	     gpu_freq--) {
+		int diff = dev_priv->max_delay - gpu_freq;
+
+		/*
+		 * For GPU frequencies less than 750MHz, just use the lowest
+		 * ring freq.
+		 */
+		if (gpu_freq < min_freq)
+			ia_freq = 800;
+		else
+			ia_freq = max_ia_freq - ((diff * scaling_factor) / 2);
+		ia_freq = DIV_ROUND_CLOSEST(ia_freq, 100);
+
+		I915_WRITE(GEN6_PCODE_DATA,
+			   (ia_freq << GEN6_PCODE_FREQ_IA_RATIO_SHIFT) |
+			   gpu_freq);
+		I915_WRITE(GEN6_PCODE_MAILBOX, GEN6_PCODE_READY |
+			   GEN6_PCODE_WRITE_MIN_FREQ_TABLE);
+		if (wait_for((I915_READ(GEN6_PCODE_MAILBOX) &
+			      GEN6_PCODE_READY) == 0, 10)) {
+			DRM_ERROR("pcode write of freq table timed out\n");
+			continue;
+		}
+	}
+
+	gen6_gt_force_wake_put(dev_priv);
+	mutex_unlock(&dev_priv->dev->struct_mutex);
+}
+
 static void ironlake_init_clock_gating(struct drm_device *dev)
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
@@ -7777,8 +7830,10 @@ void intel_modeset_init(struct drm_device *dev)
 		intel_init_emon(dev);
 	}
 
-	if (IS_GEN6(dev))
+	if (IS_GEN6(dev)) {
 		gen6_enable_rps(dev_priv);
+		gen6_update_ring_freq(dev_priv);
+	}
 
 	INIT_WORK(&dev_priv->idle_work, intel_idle_update);
 	setup_timer(&dev_priv->idle_timer, intel_gpu_idle_timer,
diff --git a/drivers/gpu/drm/i915/intel_drv.h b/drivers/gpu/drm/i915/intel_drv.h
index 9ffa61e..8ac3bd8 100644
--- a/drivers/gpu/drm/i915/intel_drv.h
+++ b/drivers/gpu/drm/i915/intel_drv.h
@@ -317,6 +317,7 @@ extern void intel_enable_clock_gating(struct drm_device *dev);
 extern void ironlake_enable_drps(struct drm_device *dev);
 extern void ironlake_disable_drps(struct drm_device *dev);
 extern void gen6_enable_rps(struct drm_i915_private *dev_priv);
+extern void gen6_update_ring_freq(struct drm_i915_private *dev_priv);
 extern void gen6_disable_rps(struct drm_device *dev);
 extern void intel_init_emon(struct drm_device *dev);

The ring frequency scaling table tells the PCU to treat certain GPU
frequencies as if they were a given CPU frequency for purposes of
scaling the ring frequency. Normally the PCU will scale the ring
frequency based on the CPU P-state, but with the table present, it will
also take the GPU frequency into account.

The main downside of keeping the ring frequency high while the CPU is
at a low frequency (or asleep altogether) is increased power
consumption. But then if you're keeping your GPU busy, you probably
want the extra performance.

Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
---
 drivers/gpu/drm/i915/i915_debugfs.c  |   43 +++++++++++++++++++++++++
 drivers/gpu/drm/i915/i915_reg.h      |    4 ++-
 drivers/gpu/drm/i915/i915_suspend.c  |    4 ++-
 drivers/gpu/drm/i915/intel_display.c |   57 +++++++++++++++++++++++++++++++++-
 drivers/gpu/drm/i915/intel_drv.h     |    1 +
 5 files changed, 106 insertions(+), 3 deletions(-)
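
To make the table entries described in the commit message more concrete, here is a minimal userspace sketch of how the patch packs one entry before writing it to GEN6_PCODE_DATA, plus the unit conversions used by the debugfs output (GPU ratio times 50, IA ratio times 100). The function names are hypothetical, and the sketch assumes the GPU ratio fits in the bits below the shift, which the patch itself does not check.

```c
#include <stdint.h>

#define GEN6_PCODE_FREQ_IA_RATIO_SHIFT 8	/* value from the i915_reg.h hunk */

/*
 * Build the data word for one table entry, as the patch does:
 * IA ratio (100 MHz units) above the shift, GPU ratio (50 MHz units)
 * in the low bits. Assumption: gpu_ratio < 256.
 */
uint32_t pack_ring_freq_entry(uint32_t ia_ratio, uint32_t gpu_ratio)
{
	return (ia_ratio << GEN6_PCODE_FREQ_IA_RATIO_SHIFT) | gpu_ratio;
}

/* Unit conversions matching the debugfs output in the patch. */
unsigned int gpu_ratio_to_mhz(unsigned int gpu_ratio)
{
	return gpu_ratio * 50;
}

unsigned int ia_ratio_to_mhz(unsigned int ia_ratio)
{
	return ia_ratio * 100;
}
```

For instance, an IA ratio of 30 (3000 MHz) paired with a GPU ratio of 22 (1100 MHz) packs to 0x1e16.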