| Message ID | 1432902728-31476-1-git-send-email-shilpa.bhat@linux.vnet.ibm.com (mailing list archive) |
|---|---|
| State | Changes Requested, archived |
Hi Shilpa,

The subject does not convey the purpose of this patch clearly, IMO. I would
suggest changing it to something like "Auto-promotion of snooze to deeper
idle state" or similar.

On 05/29/2015 06:02 PM, Shilpasri G Bhat wrote:
> Idle cpus which stay in snooze for a long period can degrade the
> performance of their sibling cpus. If a cpu stays in snooze for longer
> than the target residency of the next available idle state, exit from
> snooze. This gives the cpuidle governor a chance to re-evaluate the
> cpu's last idle state and promote it to a deeper idle state.
>
> Signed-off-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>
> ---
>  drivers/cpuidle/cpuidle-powernv.c | 12 ++++++++++++
>  drivers/cpuidle/cpuidle-pseries.c | 11 +++++++++++
>  2 files changed, 23 insertions(+)
>
> diff --git a/drivers/cpuidle/cpuidle-pseries.c b/drivers/cpuidle/cpuidle-pseries.c
> index bb9e2b6..07135e0 100644
> --- a/drivers/cpuidle/cpuidle-pseries.c
> +++ b/drivers/cpuidle/cpuidle-pseries.c
> @@ -27,6 +27,8 @@ struct cpuidle_driver pseries_idle_driver = {
>
>  static int max_idle_state;
>  static struct cpuidle_state *cpuidle_state_table;
> +static u64 snooze_timeout;
> +static bool snooze_timeout_en;
>
>  static inline void idle_loop_prolog(unsigned long *in_purr)
>  {
> @@ -58,14 +60,18 @@ static int snooze_loop(struct cpuidle_device *dev,
> 		       int index)
>  {
>  	unsigned long in_purr;
> +	u64 snooze_exit_time;
>
>  	idle_loop_prolog(&in_purr);
>  	local_irq_enable();
>  	set_thread_flag(TIF_POLLING_NRFLAG);
> +	snooze_exit_time = get_tb() + snooze_timeout;
>
>  	while (!need_resched()) {
>  		HMT_low();
>  		HMT_very_low();
> +		if (snooze_timeout_en && get_tb() > snooze_exit_time)
> +			break;
>  	}
>
>  	HMT_medium();
> @@ -244,6 +250,11 @@ static int pseries_idle_probe(void)
>  	} else
>  		return -ENODEV;
>
> +	if (max_idle_state > 1) {
> +		snooze_timeout_en = true;
> +		snooze_timeout = cpuidle_state_table[1].target_residency *
> +					tb_ticks_per_usec;
> +	}

Any idea why we don't have snooze defined on the shared lpar configuration?

Regards
Preeti U Murthy

>  	return 0;
>  }
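For readers following along outside the kernel tree, the bounded-polling idea the patch implements can be sketched in plain user-space C. This is a minimal sketch, not driver code: `read_timebase()` and `work_is_pending()` are simplified stand-ins for the kernel's get_tb() and need_resched(), and the 100 us budget is an assumed value.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

/* Stand-in for get_tb(): here the "timebase" is CLOCK_MONOTONIC in ns. */
static uint64_t read_timebase(void)
{
	struct timespec ts;
	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ull + ts.tv_nsec;
}

/* Stand-in for need_resched(): no work ever arrives in this demo. */
static bool work_is_pending(void)
{
	return false;
}

int main(void)
{
	/* Assumed budget: the next idle state's target residency, 100 us. */
	const uint64_t snooze_timeout = 100 * 1000; /* ns */
	uint64_t snooze_exit_time = read_timebase() + snooze_timeout;
	uint64_t spins = 0;

	/* Poll until work arrives or the budget expires; breaking out lets
	 * a (real) cpuidle governor re-evaluate and pick a deeper state
	 * instead of spinning in snooze indefinitely. */
	while (!work_is_pending()) {
		spins++;
		if (read_timebase() > snooze_exit_time)
			break;
	}
	printf("polled %llu times before giving up\n",
	       (unsigned long long)spins);
	return 0;
}
```

The same shape appears in both drivers touched by the patch: compute the exit time once on entry, then make the timeout check a cheap extra condition inside the existing polling loop.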
* Preeti U Murthy <preeti@linux.vnet.ibm.com> [2015-05-29 19:17:17]:

[snip]

> > +	if (max_idle_state > 1) {
> > +		snooze_timeout_en = true;
> > +		snooze_timeout = cpuidle_state_table[1].target_residency *
> > +					tb_ticks_per_usec;
> > +	}
>
> Any idea why we don't have snooze defined on the shared lpar configuration?

In the shared lpar case, spinning in guest context may take cycles away
from other lpars waiting to run on the same physical cpu.

So the policy in the shared lpar case is to let the PowerVM hypervisor
know immediately that the guest cpu is idle, which allows the hypervisor
to use those cycles for other tasks/lpars.

--Vaidy
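This policy split is visible in the driver's probe path, which picks the state table based on whether the partition shares physical cpus. The fragment below is a rough paraphrase of drivers/cpuidle/cpuidle-pseries.c of this era, not a verbatim quote; consult the tree for the exact code.

```c
/* Paraphrased sketch of the selection in pseries_idle_probe():
 * a shared-processor LPAR gets only CEDE-based states (no snooze),
 * so an idle vcpu yields its cycles to the hypervisor right away,
 * while a dedicated-processor LPAR keeps snooze as state 0. */
if (lppaca_shared_proc(get_lppaca())) {
	cpuidle_state_table = shared_states;	/* CEDE only */
	max_idle_state = ARRAY_SIZE(shared_states);
} else {
	cpuidle_state_table = dedicated_states;	/* snooze + CEDE */
	max_idle_state = ARRAY_SIZE(dedicated_states);
}
```

Since the shared table has no snooze entry, the `max_idle_state > 1` guard in the patch also leaves the timeout disabled in any configuration that exposes only a single state.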
On 05/30/2015 11:31 AM, Vaidyanathan Srinivasan wrote:
> * Preeti U Murthy <preeti@linux.vnet.ibm.com> [2015-05-29 19:17:17]:
>
> [snip]
>
>>> +	if (max_idle_state > 1) {
>>> +		snooze_timeout_en = true;
>>> +		snooze_timeout = cpuidle_state_table[1].target_residency *
>>> +					tb_ticks_per_usec;
>>> +	}
>>
>> Any idea why we don't have snooze defined on the shared lpar configuration?
>
> In the shared lpar case, spinning in guest context may take cycles away
> from other lpars waiting to run on the same physical cpu.
>
> So the policy in the shared lpar case is to let the PowerVM hypervisor
> know immediately that the guest cpu is idle, which allows the hypervisor
> to use those cycles for other tasks/lpars.

Oh, OK! Thanks for the clarification!

Regards
Preeti U Murthy

> --Vaidy
On Sat, 2015-05-30 at 11:31 +0530, Vaidyanathan Srinivasan wrote:
> In the shared lpar case, spinning in guest context may take cycles away
> from other lpars waiting to run on the same physical cpu.
>
> So the policy in the shared lpar case is to let the PowerVM hypervisor
> know immediately that the guest cpu is idle, which allows the hypervisor
> to use those cycles for other tasks/lpars.

But won't that have negative side effects under KVM? Suresh mentioned
that with his new directed-interrupts code we had many cases where the
interrupts ended up arriving shortly after we exited to the host for
NAP'ing...

Snooze might fix it...

Ben.
* Benjamin Herrenschmidt <benh@au1.ibm.com> [2015-05-30 20:38:22]:

> On Sat, 2015-05-30 at 11:31 +0530, Vaidyanathan Srinivasan wrote:
> > In the shared lpar case, spinning in guest context may take cycles away
> > from other lpars waiting to run on the same physical cpu.
> >
> > So the policy in the shared lpar case is to let the PowerVM hypervisor
> > know immediately that the guest cpu is idle, which allows the hypervisor
> > to use those cycles for other tasks/lpars.
>
> But won't that have negative side effects under KVM?

Yes, you have a good point. If one of the threads in the core goes to
cede, it can still come back quickly since the KVM guest context has not
been switched out yet. But in a single-threaded guest, this can force an
unnecessary exit/context-switch overhead.

Now that we have fixed the snooze loop to be bounded and to exit
predictably, a KVM guest should actually use the snooze state to improve
latency. I will test this scenario and enable the snooze state for KVM
guests.

> Suresh mentioned that with his new directed-interrupts code we had many
> cases where the interrupts ended up arriving shortly after we exited to
> the host for NAP'ing...
>
> Snooze might fix it...

Right. This scenario is worth experimenting with before introducing the
snooze loop for guests.

--Vaidy
diff --git a/drivers/cpuidle/cpuidle-powernv.c b/drivers/cpuidle/cpuidle-powernv.c
index 5937207..1e3ef5e 100644
--- a/drivers/cpuidle/cpuidle-powernv.c
+++ b/drivers/cpuidle/cpuidle-powernv.c
@@ -29,18 +29,25 @@ struct cpuidle_driver powernv_idle_driver = {

 static int max_idle_state;
 static struct cpuidle_state *cpuidle_state_table;
+static u64 snooze_timeout;
+static bool snooze_timeout_en;

 static int snooze_loop(struct cpuidle_device *dev,
 			struct cpuidle_driver *drv,
 			int index)
 {
+	u64 snooze_exit_time;
+
 	local_irq_enable();
 	set_thread_flag(TIF_POLLING_NRFLAG);
+	snooze_exit_time = get_tb() + snooze_timeout;

 	ppc64_runlatch_off();
 	while (!need_resched()) {
 		HMT_low();
 		HMT_very_low();
+		if (snooze_timeout_en && get_tb() > snooze_exit_time)
+			break;
 	}

 	HMT_medium();
@@ -252,6 +259,11 @@ static int powernv_idle_probe(void)
 		cpuidle_state_table = powernv_states;
 		/* Device tree can indicate more idle states */
 		max_idle_state = powernv_add_idle_states();
+		if (max_idle_state > 1) {
+			snooze_timeout_en = true;
+			snooze_timeout = powernv_states[1].target_residency *
+						tb_ticks_per_usec;
+		}
 	} else
 		return -ENODEV;

diff --git a/drivers/cpuidle/cpuidle-pseries.c b/drivers/cpuidle/cpuidle-pseries.c
index bb9e2b6..07135e0 100644
--- a/drivers/cpuidle/cpuidle-pseries.c
+++ b/drivers/cpuidle/cpuidle-pseries.c
@@ -27,6 +27,8 @@ struct cpuidle_driver pseries_idle_driver = {

 static int max_idle_state;
 static struct cpuidle_state *cpuidle_state_table;
+static u64 snooze_timeout;
+static bool snooze_timeout_en;

 static inline void idle_loop_prolog(unsigned long *in_purr)
 {
@@ -58,14 +60,18 @@ static int snooze_loop(struct cpuidle_device *dev,
 		       int index)
 {
 	unsigned long in_purr;
+	u64 snooze_exit_time;

 	idle_loop_prolog(&in_purr);
 	local_irq_enable();
 	set_thread_flag(TIF_POLLING_NRFLAG);
+	snooze_exit_time = get_tb() + snooze_timeout;

 	while (!need_resched()) {
 		HMT_low();
 		HMT_very_low();
+		if (snooze_timeout_en && get_tb() > snooze_exit_time)
+			break;
 	}

 	HMT_medium();
@@ -244,6 +250,11 @@ static int pseries_idle_probe(void)
 	} else
 		return -ENODEV;

+	if (max_idle_state > 1) {
+		snooze_timeout_en = true;
+		snooze_timeout = cpuidle_state_table[1].target_residency *
+					tb_ticks_per_usec;
+	}
 	return 0;
 }
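As a concrete sanity check of the probe-time conversion, here is the same arithmetic in a stand-alone program. The numbers are illustrative assumptions, not values taken from the patch: the POWER timebase runs at 512 MHz (so tb_ticks_per_usec is 512), and the next state is assumed to have a 100 us target residency.

```c
#include <stdint.h>
#include <stdio.h>

int main(void)
{
	/* Assumed values: 512 MHz timebase, 100 us target residency. */
	const uint64_t tb_ticks_per_usec = 512;
	const uint64_t target_residency_us = 100;

	/* Same computation as powernv_idle_probe()/pseries_idle_probe():
	 * convert the residency from microseconds to timebase ticks. */
	uint64_t snooze_timeout = target_residency_us * tb_ticks_per_usec;

	printf("snooze_timeout = %llu timebase ticks (~%llu us)\n",
	       (unsigned long long)snooze_timeout,
	       (unsigned long long)(snooze_timeout / tb_ticks_per_usec));
	return 0;
}
```

With these numbers the loop gives up on snooze after roughly 51200 timebase ticks, i.e. one target residency of the next state, which is exactly the point at which the governor could have justified a deeper state.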
Idle cpus which stay in snooze for a long period can degrade the
performance of their sibling cpus. If a cpu stays in snooze for longer
than the target residency of the next available idle state, exit from
snooze. This gives the cpuidle governor a chance to re-evaluate the
cpu's last idle state and promote it to a deeper idle state.

Signed-off-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>
---
 drivers/cpuidle/cpuidle-powernv.c | 12 ++++++++++++
 drivers/cpuidle/cpuidle-pseries.c | 11 +++++++++++
 2 files changed, 23 insertions(+)