
[1/2,2.6.37.stable] intel_idle: disable NHM/WSM HW C-state auto-demotion

Message ID alpine.LFD.2.02.1103222222490.10489@x980 (mailing list archive)
State Not Applicable, archived

Commit Message

Len Brown March 23, 2011, 2:24 a.m. UTC

Comments

Pavel Machek April 8, 2011, 1:16 p.m. UTC | #1
Hi!

> Hardware C-state auto-demotion is a mechanism where the HW overrides
> the OS C-state request, instead demoting to a shallower state,
> which is less expensive, but saves less power.
> 
> Modern Linux should generally get exactly the states it requests.
> In particular, when a CPU is taken off-line, it must not be demoted, else
> it can prevent the entire package from reaching deep C-states.
> 
> https://bugzilla.kernel.org/show_bug.cgi?id=25252
> 
>  
> +#define MSR_NHM_SNB_PKG_CST_CFG_CTL	0x000000e2
> +#define NHM_C3_AUTO_DEMOTE		(1UL << 25)
> +#define NHM_C1_AUTO_DEMOTE		(1UL << 26)
> +
....
> @@ -85,6 +86,12 @@ static int intel_idle(struct cpuidle_dev
>  static struct cpuidle_state *cpuidle_state_table;
>  
>  /*
> + * Hardware C-state auto-demotion may not always be optimal.
> + * Indicate which enable bits to clear here.
> + */
> +static unsigned long long auto_demotion_disable_flags;

Why long long here, but long above?

Pavel Machek April 8, 2011, 1:16 p.m. UTC | #2
> On Wed, 23 Mar 2011, Greg KH wrote:
> 
> > On Tue, Mar 22, 2011 at 10:24:43PM -0400, Len Brown wrote:
> > > From: Len Brown <len.brown@intel.com>
> > > 
> > > upstream 14796fca2bd22acc73dd0887248d003b0f441d08
> > > 
> > > Hardware C-state auto-demotion is a mechanism where the HW overrides
> > > the OS C-state request, instead demoting to a shallower state,
> > > which is less expensive, but saves less power.
> > > 
> > > Modern Linux should generally get exactly the states it requests.
> > > In particular, when a CPU is taken off-line, it must not be demoted, else
> > > it can prevent the entire package from reaching deep C-states.
> > > 
> > > https://bugzilla.kernel.org/show_bug.cgi?id=25252
> > > 
> > > Signed-off-by: Len Brown <len.brown@intel.com>
> > 
> > This patch doesn't apply to the .38-stable tree, and it also needs to go
> > there, right?  Can you please send a version that I can apply?
> 
> it is already present in 2.6.38, as is the next one.
> That is why the subject specified 2.6.37.stable.

Why is it stable material? Seems like a small power optimization to me...

Len Brown April 8, 2011, 4:14 p.m. UTC | #3
> Why is it stable material? Seems like a small power optimization to me...

Before this patch, an offline CPU could prevent
package-wide C-states.  There is a significant
power difference between being able to enter
a package-wide C-state vs. not.

While processor offline is not used by everybody,
some products use it routinely.  Indeed, MRST
simply cannot ship without this patch.

thanks,
Len Brown, Intel Open Source Technology Center

Len Brown April 8, 2011, 10:31 p.m. UTC | #4
> > Hardware C-state auto-demotion is a mechanism where the HW overrides
> > the OS C-state request, instead demoting to a shallower state,
> > which is less expensive, but saves less power.
> > 
> > Modern Linux should generally get exactly the states it requests.
> > In particular, when a CPU is taken off-line, it must not be demoted, else
> > it can prevent the entire package from reaching deep C-states.
> > 
> > https://bugzilla.kernel.org/show_bug.cgi?id=25252
> > 
> >  
> > +#define MSR_NHM_SNB_PKG_CST_CFG_CTL	0x000000e2
> > +#define NHM_C3_AUTO_DEMOTE		(1UL << 25)
> > +#define NHM_C1_AUTO_DEMOTE		(1UL << 26)
> > +
> ....
> > @@ -85,6 +86,12 @@ static int intel_idle(struct cpuidle_dev
> >  static struct cpuidle_state *cpuidle_state_table;
> >  
> >  /*
> > + * Hardware C-state auto-demotion may not always be optimal.
> > + * Indicate which enable bits to clear here.
> > + */
> > +static unsigned long long auto_demotion_disable_flags;
> 
> Why long long here, but long above?

long long here because MSR accesses are 64-bits, even on
32-bit builds.

UL above because it matches the style of the neighboring code.
The "UL" is purely cosmetic, of course.  Same code is generated
if it were "ULL" or left off entirely; as the LHS is 64-bit.

cheers,
Len Brown, Intel Open Source Technology Center

Pavel Machek April 15, 2011, 4:25 p.m. UTC | #5
Hi!

> > > @@ -85,6 +86,12 @@ static int intel_idle(struct cpuidle_dev
> > >  static struct cpuidle_state *cpuidle_state_table;
> > >  
> > >  /*
> > > + * Hardware C-state auto-demotion may not always be optimal.
> > > + * Indicate which enable bits to clear here.
> > > + */
> > > +static unsigned long long auto_demotion_disable_flags;
> > 
> > Why long long here, but long above?
> 
> long long here because MSR accesses are 64-bits, even on
> 32-bit builds.
> 
> UL above because it matches the style of the neighboring code.
> The "UL" is purely cosmetic, of course.  Same code is generated
> if it were "ULL" or left off entirely; as the LHS is 64-bit.

Normally, if a register is always 64-bit, it should be u64.

And consistency should not be a reason to add misleading suffixes...
								Pavel
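
To make the suffix question above concrete, here is a minimal user-space sketch (not kernel code) of where the width of the mask matters. It assumes the flags occupy bits 25 and 26 as in the patch; the DEMO_* names and both helper functions are hypothetical and exist only for illustration. When the flags are first stored in a 64-bit variable, as the patch does with auto_demotion_disable_flags, the complement is taken at 64 bits and the upper half of the register value survives. If the complement were taken at 32 bits instead, the zero-extended mask would also clear bits 32..63.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the bit positions used by the patch. */
#define DEMO_C3_AUTO_DEMOTE_BIT 25
#define DEMO_C1_AUTO_DEMOTE_BIT 26

/* Both flags, deliberately built as a 32-bit quantity. */
static uint32_t demo_flags32(void)
{
	return (UINT32_C(1) << DEMO_C3_AUTO_DEMOTE_BIT) |
	       (UINT32_C(1) << DEMO_C1_AUTO_DEMOTE_BIT);
}

/*
 * Pattern used by the patch: widen the flags to 64 bits first,
 * then complement. Only bits 25 and 26 are cleared.
 */
static uint64_t clear_via_u64(uint64_t msr_bits)
{
	uint64_t flags = demo_flags32();

	assert((flags >> 32) == 0);	/* flags live in the low half */
	return msr_bits & ~flags;
}

/*
 * Pitfall: complement at 32 bits, then widen. The mask is
 * zero-extended, so bits 32..63 of msr_bits are cleared as well.
 */
static uint64_t clear_via_u32(uint64_t msr_bits)
{
	uint32_t inverted = (uint32_t)~demo_flags32();

	return msr_bits & inverted;
}
```

With an all-ones input, clear_via_u64() keeps the upper 32 bits while clear_via_u32() zeroes them. The patch itself follows the first pattern, which is why the suffix on the constants does not change its result.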

Patch

Index: linux-2.6.37.y/arch/x86/include/asm/msr-index.h
===================================================================
--- linux-2.6.37.y.orig/arch/x86/include/asm/msr-index.h
+++ linux-2.6.37.y/arch/x86/include/asm/msr-index.h
@@ -36,6 +36,10 @@ 
 #define MSR_IA32_PERFCTR1		0x000000c2
 #define MSR_FSB_FREQ			0x000000cd
 
+#define MSR_NHM_SNB_PKG_CST_CFG_CTL	0x000000e2
+#define NHM_C3_AUTO_DEMOTE		(1UL << 25)
+#define NHM_C1_AUTO_DEMOTE		(1UL << 26)
+
 #define MSR_MTRRcap			0x000000fe
 #define MSR_IA32_BBL_CR_CTL		0x00000119
 
Index: linux-2.6.37.y/drivers/idle/intel_idle.c
===================================================================
--- linux-2.6.37.y.orig/drivers/idle/intel_idle.c
+++ linux-2.6.37.y/drivers/idle/intel_idle.c
@@ -62,6 +62,7 @@ 
 #include <linux/notifier.h>
 #include <linux/cpu.h>
 #include <asm/mwait.h>
+#include <asm/msr.h>
 
 #define INTEL_IDLE_VERSION "0.4"
 #define PREFIX "intel_idle: "
@@ -85,6 +86,12 @@  static int intel_idle(struct cpuidle_dev
 static struct cpuidle_state *cpuidle_state_table;
 
 /*
+ * Hardware C-state auto-demotion may not always be optimal.
+ * Indicate which enable bits to clear here.
+ */
+static unsigned long long auto_demotion_disable_flags;
+
+/*
  * States are indexed by the cstate number,
  * which is also the index into the MWAIT hint array.
  * Thus C0 is a dummy.
@@ -276,6 +283,15 @@  static struct notifier_block setup_broad
 	.notifier_call = setup_broadcast_cpuhp_notify,
 };
 
+static void auto_demotion_disable(void *dummy)
+{
+	unsigned long long msr_bits;
+
+	rdmsrl(MSR_NHM_SNB_PKG_CST_CFG_CTL, msr_bits);
+	msr_bits &= ~auto_demotion_disable_flags;
+	wrmsrl(MSR_NHM_SNB_PKG_CST_CFG_CTL, msr_bits);
+}
+
 /*
  * intel_idle_probe()
  */
@@ -319,6 +335,8 @@  static int intel_idle_probe(void)
 	case 0x25:	/* Westmere */
 	case 0x2C:	/* Westmere */
 		cpuidle_state_table = nehalem_cstates;
+		auto_demotion_disable_flags =
+			(NHM_C1_AUTO_DEMOTE | NHM_C3_AUTO_DEMOTE);
 		break;
 
 	case 0x1C:	/* 28 - Atom Processor */
@@ -431,6 +449,8 @@  static int intel_idle_cpuidle_devices_in
 			return -EIO;
 		}
 	}
+	if (auto_demotion_disable_flags)
+		smp_call_function(auto_demotion_disable, NULL, 1);
 
 	return 0;
 }
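
The last hunk relies on smp_call_function(), which is documented as running the callback on all other online CPUs, not on the calling CPU. The following user-space model sketches what that means for the read-modify-write in auto_demotion_disable(). It makes no claim about how the real hardware scopes this MSR: the one-register-per-CPU array, NR_SIM_CPUS, and every sim_* name are assumptions made purely for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define NR_SIM_CPUS 4	/* hypothetical CPU count for the model */
#define SIM_C3_AUTO_DEMOTE (UINT64_C(1) << 25)
#define SIM_C1_AUTO_DEMOTE (UINT64_C(1) << 26)
#define SIM_FLAGS (SIM_C1_AUTO_DEMOTE | SIM_C3_AUTO_DEMOTE)

/* One simulated 64-bit register per CPU (a modelling assumption). */
static uint64_t sim_msr[NR_SIM_CPUS];

/* Start with every bit set, so both demotion enables are on. */
static void sim_reset(void)
{
	for (int cpu = 0; cpu < NR_SIM_CPUS; cpu++)
		sim_msr[cpu] = UINT64_MAX;
}

/* Mirrors the patch: read, clear the requested bits, write back. */
static void sim_auto_demotion_disable(int cpu, uint64_t disable_flags)
{
	assert(cpu >= 0 && cpu < NR_SIM_CPUS);

	uint64_t msr_bits = sim_msr[cpu];	/* stands in for rdmsrl() */
	msr_bits &= ~disable_flags;
	sim_msr[cpu] = msr_bits;		/* stands in for wrmsrl() */
}

/* Models "all other CPUs": the caller itself is skipped. */
static void sim_call_on_others(int self, uint64_t disable_flags)
{
	for (int cpu = 0; cpu < NR_SIM_CPUS; cpu++)
		if (cpu != self)
			sim_auto_demotion_disable(cpu, disable_flags);
}

/* Models "every CPU, including the caller". */
static void sim_call_on_each(uint64_t disable_flags)
{
	for (int cpu = 0; cpu < NR_SIM_CPUS; cpu++)
		sim_auto_demotion_disable(cpu, disable_flags);
}

/* Counts CPUs that still have any of the given bits set. */
static int sim_count_still_enabled(uint64_t flags)
{
	int count = 0;

	for (int cpu = 0; cpu < NR_SIM_CPUS; cpu++)
		if (sim_msr[cpu] & flags)
			count++;
	return count;
}
```

In this model, calling sim_call_on_others(0, SIM_FLAGS) after sim_reset() leaves exactly one CPU, the caller, with the bits still set, while sim_call_on_each(SIM_FLAGS) clears them everywhere. Whether the calling CPU needs separate handling in the real driver depends on details not shown in this patch.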