diff mbox series

[3/3] intel_idle: add C0.2 state for Sapphire Rapids Xeon

Message ID 20230306123418.720679-4-dedekind1@gmail.com (mailing list archive)
State Changes Requested, archived
Headers show
Series Sapphire Rapids C0.x idle states support | expand

Commit Message

Artem Bityutskiy March 6, 2023, 12:34 p.m. UTC
From: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>

Add Sapphire Rapids Xeon C0.2 state support. This state has a lower exit
latency comparing to C1, and saves energy comparing to POLL (in range of
5-20%).

This patch also improves performance (e.g., as measured by 'hackbench'),
because idle CPU power savings in C0.2 increase busy CPU power budget and
therefore, improve turbo boost of the busy CPU.

Suggested-by: Len Brown <len.brown@intel.com>
Suggested-by: Arjan Van De Ven <arjan.van.de.ven@intel.com>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
---
 drivers/idle/intel_idle.c | 59 ++++++++++++++++++++++++++++++++++++++-
 1 file changed, 58 insertions(+), 1 deletion(-)

Comments

Peter Zijlstra March 6, 2023, 3:32 p.m. UTC | #1
On Mon, Mar 06, 2023 at 02:34:18PM +0200, Artem Bityutskiy wrote:
> @@ -225,6 +229,27 @@ static __cpuidle int intel_idle_s2idle(struct cpuidle_device *dev,
>  	return 0;
>  }
>  
> +/**
> + * intel_idle_umwait_irq - Request C0.x using the 'umwait' instruction.
> + * @dev: cpuidle device of the target CPU.
> + * @drv: cpuidle driver (assumed to point to intel_idle_driver).
> + * @index: Target idle state index.
> + *
> + * Request C0.1 or C0.2 using 'umwait' instruction with interrupts enabled.
> + */
> +static __cpuidle int intel_idle_umwait_irq(struct cpuidle_device *dev,
> +					   struct cpuidle_driver *drv,
> +					   int index)
> +{
> +	u32 state = flg2MWAIT(drv->states[index].flags);
> +
> +	raw_local_irq_enable();
> +	umwait_idle(rdtsc() + umwait_limit, state);
> +	local_irq_disable();
> +
> +	return index;
> +}

Bad copy paste job there...

vmlinux.o: warning: objtool: intel_idle_umwait_irq+0x8c: call to trace_hardirqs_off() leaves .noinstr.text section
Rafael J. Wysocki March 7, 2023, 12:39 p.m. UTC | #2
On Mon, Mar 6, 2023 at 4:32 PM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Mon, Mar 06, 2023 at 02:34:18PM +0200, Artem Bityutskiy wrote:
> > @@ -225,6 +229,27 @@ static __cpuidle int intel_idle_s2idle(struct cpuidle_device *dev,
> >       return 0;
> >  }
> >
> > +/**
> > + * intel_idle_umwait_irq - Request C0.x using the 'umwait' instruction.
> > + * @dev: cpuidle device of the target CPU.
> > + * @drv: cpuidle driver (assumed to point to intel_idle_driver).
> > + * @index: Target idle state index.
> > + *
> > + * Request C0.1 or C0.2 using 'umwait' instruction with interrupts enabled.
> > + */
> > +static __cpuidle int intel_idle_umwait_irq(struct cpuidle_device *dev,
> > +                                        struct cpuidle_driver *drv,
> > +                                        int index)
> > +{
> > +     u32 state = flg2MWAIT(drv->states[index].flags);
> > +
> > +     raw_local_irq_enable();
> > +     umwait_idle(rdtsc() + umwait_limit, state);
> > +     local_irq_disable();
> > +
> > +     return index;
> > +}
>
> Bad copy paste job there...
>
> vmlinux.o: warning: objtool: intel_idle_umwait_irq+0x8c: call to trace_hardirqs_off() leaves .noinstr.text section

Well, it would be kind of nice to say that this is related to commit
6d9c7f51b1d9 ("cpuidle, intel_idle: Fix CPUIDLE_FLAG_IRQ_ENABLE
*again*") that is present in 6.3-rc1.
Peter Zijlstra March 8, 2023, 12:32 p.m. UTC | #3
On Tue, Mar 07, 2023 at 01:39:09PM +0100, Rafael J. Wysocki wrote:
> On Mon, Mar 6, 2023 at 4:32 PM Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > On Mon, Mar 06, 2023 at 02:34:18PM +0200, Artem Bityutskiy wrote:
> > > @@ -225,6 +229,27 @@ static __cpuidle int intel_idle_s2idle(struct cpuidle_device *dev,
> > >       return 0;
> > >  }
> > >
> > > +/**
> > > + * intel_idle_umwait_irq - Request C0.x using the 'umwait' instruction.
> > > + * @dev: cpuidle device of the target CPU.
> > > + * @drv: cpuidle driver (assumed to point to intel_idle_driver).
> > > + * @index: Target idle state index.
> > > + *
> > > + * Request C0.1 or C0.2 using 'umwait' instruction with interrupts enabled.
> > > + */
> > > +static __cpuidle int intel_idle_umwait_irq(struct cpuidle_device *dev,
> > > +                                        struct cpuidle_driver *drv,
> > > +                                        int index)
> > > +{
> > > +     u32 state = flg2MWAIT(drv->states[index].flags);
> > > +
> > > +     raw_local_irq_enable();
> > > +     umwait_idle(rdtsc() + umwait_limit, state);
> > > +     local_irq_disable();
> > > +
> > > +     return index;
> > > +}
> >
> > Bad copy paste job there...
> >
> > vmlinux.o: warning: objtool: intel_idle_umwait_irq+0x8c: call to trace_hardirqs_off() leaves .noinstr.text section
> 
> Well, it would be kind of nice to say that this is related to commit
> 6d9c7f51b1d9 ("cpuidle, intel_idle: Fix CPUIDLE_FLAG_IRQ_ENABLE
> *again*") that is present in 6.3-rc1.

Right, but he said the patches were based on -next, which would've had
that commit for a fair while too.
Artem Bityutskiy March 9, 2023, 8:01 a.m. UTC | #4
On Wed, 2023-03-08 at 13:32 +0100, Peter Zijlstra wrote:
> > Well, it would be kind of nice to say that this is related to commit
> > 6d9c7f51b1d9 ("cpuidle, intel_idle: Fix CPUIDLE_FLAG_IRQ_ENABLE
> > *again*") that is present in 6.3-rc1.
> 
> Right, but he said the patches were based on -next, which would've had
> that commit for a fair while too.

I can see what is the problem from the above mentioned commit. But I struggle to
reproduce it. I tried 'make W=1', and 'tools/objtool/objtool -n vmlinux.o'. I
have 'CONFIG_HAVE_NOINSTR_VALIDATION' too.

I also tries this:

1. Build kernel with this patch-set
2. tools/objtool/objtool -n vmlinux.o > ~/tmp/before.txt  2>&1
3. Make clean; Build kernel with a fix (local_irq_disable() ->
raw_local_irq_disable())
4. tools/objtool/objtool -n vmlinux.o > ~/tmp/after.txt  2>&1
5. diff -u ~/tmp/before.txt ~/tmp/after.txt

And no difference.

Could you please help by giving a hint how to verify?
Peter Zijlstra March 14, 2023, 12:24 p.m. UTC | #5
On Thu, Mar 09, 2023 at 10:01:57AM +0200, Artem Bityutskiy wrote:

> Could you please help by giving a hint how to verify?

I think you need both:

DEBUG_ENTRY and TRACE_IRQFLAGS

I can reproduce with:

make O=defconfig-build defconfig
cd defconfig-build
../scripts/config --enable INTEL_IDLE
../scripts/config --enable DEBUG_ENTRY
../scripts/config --enable IRQSOFF_TRACER
cd ..
make O=defconfig-build -j32 vmlinux


But typically it is useful to have a .config with lots of tracing and
debug (including lockdep and debug_entry) enabled for this stuff.
Artem Bityutskiy March 17, 2023, 8:42 a.m. UTC | #6
On Tue, 2023-03-14 at 13:24 +0100, Peter Zijlstra wrote:
> ../scripts/config --enable INTEL_IDLE
> ../scripts/config --enable DEBUG_ENTRY
> ../scripts/config --enable IRQSOFF_TRACER

Thank you! I've checked that v2 of these patches addresses this issue. I've
added these to my defconfig too.

Artem.
diff mbox series

Patch

diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c
index 938c17f25d94..f7705a64d0e6 100644
--- a/drivers/idle/intel_idle.c
+++ b/drivers/idle/intel_idle.c
@@ -51,11 +51,13 @@ 
 #include <linux/notifier.h>
 #include <linux/cpu.h>
 #include <linux/moduleparam.h>
+#include <linux/units.h>
 #include <asm/cpu_device_id.h>
 #include <asm/intel-family.h>
 #include <asm/nospec-branch.h>
 #include <asm/mwait.h>
 #include <asm/msr.h>
+#include <asm/tsc.h>
 #include <asm/fpu/api.h>
 
 #define INTEL_IDLE_VERSION "0.5.1"
@@ -73,6 +75,8 @@  static struct cpuidle_device __percpu *intel_idle_cpuidle_devices;
 
 static unsigned long auto_demotion_disable_flags;
 
+static u64 umwait_limit;
+
 static enum {
 	C1E_PROMOTION_PRESERVE,
 	C1E_PROMOTION_ENABLE,
@@ -225,6 +229,27 @@  static __cpuidle int intel_idle_s2idle(struct cpuidle_device *dev,
 	return 0;
 }
 
+/**
+ * intel_idle_umwait_irq - Request C0.x using the 'umwait' instruction.
+ * @dev: cpuidle device of the target CPU.
+ * @drv: cpuidle driver (assumed to point to intel_idle_driver).
+ * @index: Target idle state index.
+ *
+ * Request C0.1 or C0.2 using 'umwait' instruction with interrupts enabled.
+ */
+static __cpuidle int intel_idle_umwait_irq(struct cpuidle_device *dev,
+					   struct cpuidle_driver *drv,
+					   int index)
+{
+	u32 state = flg2MWAIT(drv->states[index].flags);
+
+	raw_local_irq_enable();
+	umwait_idle(rdtsc() + umwait_limit, state);
+	local_irq_disable();
+
+	return index;
+}
+
 /*
  * States are indexed by the cstate number,
  * which is also the index into the MWAIT hint array.
@@ -968,6 +993,13 @@  static struct cpuidle_state adl_n_cstates[] __initdata = {
 };
 
 static struct cpuidle_state spr_cstates[] __initdata = {
+	{
+		.name = "C0.2",
+		.desc = "UMWAIT C0.2",
+		.flags = MWAIT2flg(TPAUSE_C02_STATE) | CPUIDLE_FLAG_IRQ_ENABLE,
+		.exit_latency_ns = 100,
+		.target_residency_ns = 100,
+		.enter = &intel_idle_umwait_irq, },
 	{
 		.name = "C1",
 		.desc = "MWAIT 0x00",
@@ -1894,7 +1926,8 @@  static void __init intel_idle_init_cstates_icpu(struct cpuidle_driver *drv)
 		/* Structure copy. */
 		drv->states[drv->state_count] = cpuidle_state_table[cstate];
 
-		if ((cpuidle_state_table[cstate].flags & CPUIDLE_FLAG_IRQ_ENABLE) || force_irq_on) {
+		if (cpuidle_state_table[cstate].enter == intel_idle &&
+		    ((cpuidle_state_table[cstate].flags & CPUIDLE_FLAG_IRQ_ENABLE) || force_irq_on)) {
 			printk("intel_idle: forced intel_idle_irq for state %d\n", cstate);
 			drv->states[drv->state_count].enter = intel_idle_irq;
 		}
@@ -1926,6 +1959,29 @@  static void __init intel_idle_init_cstates_icpu(struct cpuidle_driver *drv)
 	}
 }
 
+/**
+ * umwait_limit_init - initialize time limit value for 'umwait'.
+ * @drv: cpuidle driver structure to initialize.
+ *
+ * C0.1 and C0.2 (later C0.x) idle states are requested via the 'umwait'
+ * instruction. The 'umwait' instruction requires the "deadline" - the TSC
+ * counter value to break out of C0.x (unless it broke out because of an
+ * interrupt or some other event).
+ *
+ * The deadline is specified as an absolute TSC value, and it is calculated as
+ * current TSC value + 'umwait_limit'. This function initializes the
+ * 'umwait_limit' variable to count of cycles per tick. The motivation is:
+ *   * the tick is not disabled for shallow states like C0.x so, so idle will
+ *     not last longer than a tick anyway
+ *   * limit idle time to give cpuidle a chance to re-evaluate its C-state
+ *     selection decision and possibly select a deeper C-state.
+ */
+static void __init umwait_limit_init(void)
+{
+	umwait_limit = (u64)TICK_NSEC * tsc_khz;
+	do_div(umwait_limit, MICRO);
+}
+
 /**
  * intel_idle_cpuidle_driver_init - Create the list of available idle states.
  * @drv: cpuidle driver structure to initialize.
@@ -1933,6 +1989,7 @@  static void __init intel_idle_init_cstates_icpu(struct cpuidle_driver *drv)
 static void __init intel_idle_cpuidle_driver_init(struct cpuidle_driver *drv)
 {
 	cpuidle_poll_state_init(drv);
+	umwait_limit_init();
 
 	if (disabled_states_mask & BIT(0))
 		drv->states[0].flags |= CPUIDLE_FLAG_OFF;