Message ID | 1670416895-50172-1-git-send-email-lirongqing@baidu.com (mailing list archive) |
---|---|
State | Changes Requested, archived |
Headers | show |
Series | [1/2,v2] cpuidle-haltpoll: Replace default_idle with arch_cpu_idle | expand |
On Wed, Dec 7, 2022 at 1:42 PM <lirongqing@baidu.com> wrote: > > From: Li RongQing <lirongqing@baidu.com> > > When KVM guest has MWAIT and mwait_idle is used as default idle function, > but once cpuidle-haltpoll is loaded, default_idle in default_enter_idle is > used, which is using HLT, and cause a performance regression. As the commit > aebef63cf7ff ("x86: Remove vendor checks from prefer_mwait_c1_over_halt") > explains that mwait is preferred > > so replace default_idle with arch_cpu_idle which can using MWAIT > optimization. > > latency of sockperf ping-pong localhost test is reduced by more 20% > unixbench has 5% performance improvements for single core > > Suggested-by: Thomas Gleixner <tglx@linutronix.de> > Suggested-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > Signed-off-by: Li RongQing <lirongqing@baidu.com> > --- > arch/x86/kernel/process.c | 1 + > drivers/cpuidle/cpuidle-haltpoll.c | 2 +- > 2 files changed, 2 insertions(+), 1 deletion(-) > > diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c > index c21b734..303afad 100644 > --- a/arch/x86/kernel/process.c > +++ b/arch/x86/kernel/process.c > @@ -721,6 +721,7 @@ void arch_cpu_idle(void) > { > x86_idle(); > } > +EXPORT_SYMBOL(arch_cpu_idle); Why do you need this export at all? > > /* > * We use this if we don't have any better idle routine.. > diff --git a/drivers/cpuidle/cpuidle-haltpoll.c b/drivers/cpuidle/cpuidle-haltpoll.c > index 3a39a7f..e66df22 100644 > --- a/drivers/cpuidle/cpuidle-haltpoll.c > +++ b/drivers/cpuidle/cpuidle-haltpoll.c > @@ -32,7 +32,7 @@ static int default_enter_idle(struct cpuidle_device *dev, > local_irq_enable(); > return index; > } > - default_idle(); > + arch_cpu_idle(); > return index; > } > > -- > 2.9.4 >
> -----Original Message----- > From: Rafael J. Wysocki <rafael@kernel.org> > Sent: Wednesday, December 7, 2022 10:38 PM > To: Li,Rongqing <lirongqing@baidu.com> > Cc: tglx@linutronix.de; mingo@redhat.com; bp@alien8.de; > dave.hansen@linux.intel.com; x86@kernel.org; rafael@kernel.org; > daniel.lezcano@linaro.org; peterz@infradead.org; akpm@linux-foundation.org; > tony.luck@intel.com; jpoimboe@kernel.org; linux-kernel@vger.kernel.org; > linux-pm@vger.kernel.org > Subject: Re: [PATCH 1/2][v2] cpuidle-haltpoll: Replace default_idle with > arch_cpu_idle > > On Wed, Dec 7, 2022 at 1:42 PM <lirongqing@baidu.com> wrote: > > > > From: Li RongQing <lirongqing@baidu.com> > > > > When KVM guest has MWAIT and mwait_idle is used as default idle > > function, but once cpuidle-haltpoll is loaded, default_idle in > > default_enter_idle is used, which is using HLT, and cause a > > performance regression. As the commit aebef63cf7ff ("x86: Remove > > vendor checks from prefer_mwait_c1_over_halt") explains that mwait is > > preferred > > > > so replace default_idle with arch_cpu_idle which can using MWAIT > > optimization. > > > > latency of sockperf ping-pong localhost test is reduced by more 20% > > unixbench has 5% performance improvements for single core > > > > Suggested-by: Thomas Gleixner <tglx@linutronix.de> > > Suggested-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > > Signed-off-by: Li RongQing <lirongqing@baidu.com> > > --- > > arch/x86/kernel/process.c | 1 + > > drivers/cpuidle/cpuidle-haltpoll.c | 2 +- > > 2 files changed, 2 insertions(+), 1 deletion(-) > > > > diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c > > index c21b734..303afad 100644 > > --- a/arch/x86/kernel/process.c > > +++ b/arch/x86/kernel/process.c > > @@ -721,6 +721,7 @@ void arch_cpu_idle(void) { > > x86_idle(); > > } > > +EXPORT_SYMBOL(arch_cpu_idle); > > Why do you need this export at all? > When cpuidle-haltpoll is built as module, if arch_cpu_idle is not export, cpuidle-haltpoll will complain "arch_cpu_idle" undefined Like ERROR: modpost: "arch_cpu_idle" [drivers/cpuidle/cpuidle-haltpoll.ko] undefined! make[1]: *** [scripts/Makefile.modpost:126: Module.symvers] Error 1 make: *** [Makefile:1944: modpost] Error 2 Error: Make failed! I will add the reason to v3 Thanks -Li
On Thu, Dec 8, 2022 at 2:48 AM Li,Rongqing <lirongqing@baidu.com> wrote: > > > > > -----Original Message----- > > From: Rafael J. Wysocki <rafael@kernel.org> > > Sent: Wednesday, December 7, 2022 10:38 PM > > To: Li,Rongqing <lirongqing@baidu.com> > > Cc: tglx@linutronix.de; mingo@redhat.com; bp@alien8.de; > > dave.hansen@linux.intel.com; x86@kernel.org; rafael@kernel.org; > > daniel.lezcano@linaro.org; peterz@infradead.org; akpm@linux-foundation.org; > > tony.luck@intel.com; jpoimboe@kernel.org; linux-kernel@vger.kernel.org; > > linux-pm@vger.kernel.org > > Subject: Re: [PATCH 1/2][v2] cpuidle-haltpoll: Replace default_idle with > > arch_cpu_idle > > > > On Wed, Dec 7, 2022 at 1:42 PM <lirongqing@baidu.com> wrote: > > > > > > From: Li RongQing <lirongqing@baidu.com> > > > > > > When KVM guest has MWAIT and mwait_idle is used as default idle > > > function, but once cpuidle-haltpoll is loaded, default_idle in > > > default_enter_idle is used, which is using HLT, and cause a > > > performance regression. As the commit aebef63cf7ff ("x86: Remove > > > vendor checks from prefer_mwait_c1_over_halt") explains that mwait is > > > preferred > > > > > > so replace default_idle with arch_cpu_idle which can using MWAIT > > > optimization. > > > > > > latency of sockperf ping-pong localhost test is reduced by more 20% > > > unixbench has 5% performance improvements for single core > > > > > > Suggested-by: Thomas Gleixner <tglx@linutronix.de> > > > Suggested-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > > > Signed-off-by: Li RongQing <lirongqing@baidu.com> > > > --- > > > arch/x86/kernel/process.c | 1 + > > > drivers/cpuidle/cpuidle-haltpoll.c | 2 +- > > > 2 files changed, 2 insertions(+), 1 deletion(-) > > > > > > diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c > > > index c21b734..303afad 100644 > > > --- a/arch/x86/kernel/process.c > > > +++ b/arch/x86/kernel/process.c > > > @@ -721,6 +721,7 @@ void arch_cpu_idle(void) { > > > x86_idle(); > > > } > > > +EXPORT_SYMBOL(arch_cpu_idle); > > > > Why do you need this export at all? > > > > When cpuidle-haltpoll is built as module, But it isn't now. > if arch_cpu_idle is not export, cpuidle-haltpoll will complain "arch_cpu_idle" undefined So no, this won't happen.
> > > > When cpuidle-haltpoll is built as module, > > But it isn't now. Centos is compiling it as module, it will fail; Other user wants to compile it as module, they will fail, Syzbot random configuration building will fail Unless prohibit to build it as module as below: config HALTPOLL_CPUIDLE - tristate "Halt poll cpuidle driver" + bool "Halt poll cpuidle driver" depends on X86 && KVM_GUEST default y help thanks -Li
On Thu, Dec 8, 2022 at 1:43 PM Li,Rongqing <lirongqing@baidu.com> wrote: > > > > > > > When cpuidle-haltpoll is built as module, > > > > But it isn't now. > > Centos is compiling it as module, it will fail; > Other user wants to compile it as module, they will fail, > Syzbot random configuration building will fail > > Unless prohibit to build it as module as below: > > config HALTPOLL_CPUIDLE > - tristate "Halt poll cpuidle driver" > + bool "Halt poll cpuidle driver" > depends on X86 && KVM_GUEST > default y > help Ah, OK. Sorry about the confusion. Yes, the driver (not the governor, though), can be modular, so yes you need the export, but please change it to EXPORT_SYMBOL_GPL().
> > latency of sockperf ping-pong localhost test is reduced by more 20% > > unixbench has 5% performance improvements for single core > > Sorry, The upper data are wrong since wrong governor is used when guest has mwait, and haltpoll driver is loaded and haltpoll governor is used. on Intel cpu, unixbench score are nearly same, but sockperf has 20% low latency on AMD milan cpu, the sockperf latency are similar , but unixbench single core score has 10% loss, because AMD cpu maybe has weak smt capability Replace default idle with arch cpu idle has little effect Thanks -Li I
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c index c21b734..303afad 100644 --- a/arch/x86/kernel/process.c +++ b/arch/x86/kernel/process.c @@ -721,6 +721,7 @@ void arch_cpu_idle(void) { x86_idle(); } +EXPORT_SYMBOL(arch_cpu_idle); /* * We use this if we don't have any better idle routine.. diff --git a/drivers/cpuidle/cpuidle-haltpoll.c b/drivers/cpuidle/cpuidle-haltpoll.c index 3a39a7f..e66df22 100644 --- a/drivers/cpuidle/cpuidle-haltpoll.c +++ b/drivers/cpuidle/cpuidle-haltpoll.c @@ -32,7 +32,7 @@ static int default_enter_idle(struct cpuidle_device *dev, local_irq_enable(); return index; } - default_idle(); + arch_cpu_idle(); return index; }