Message ID | 1466756638-2362-1-git-send-email-shreyas@linux.vnet.ibm.com (mailing list archive) |
---|---|
State | Not Applicable, archived |
Headers | show |
From: Shreyas B. Prabhu > Sent: 24 June 2016 09:24 > > Snooze is a poll idle state in powernv and pseries platforms. Snooze > has a timeout so that if a cpu stays in snooze for more than target > residency of the next available idle state, then it would exit thereby > giving chance to the cpuidle governor to re-evaluate and > promote the cpu to a deeper idle state. Therefore whenever snooze exits > due to this timeout, its last_residency will be target_residency of next > deeper state. > > commit e93e59ce5b85 ("cpuidle: Replace ktime_get() with local_clock()") > changed the math around last_residency calculation. Specifically, while > converting last_residency value from nanoseconds to microseconds it does > right shift by 10. Due to this, in snooze timeout exit scenarios > last_residency calculated is roughly 2.3% less than target_residency of > next available state. This pattern is picked up get_typical_interval() > in the menu governor and therefore expected_interval in menu_select() is > frequently less than the target_residency of any state but snooze. > > Due to this we are entering snooze at a higher rate, thereby affecting > the single thread performance. > > Fix this by replacing right shift by 10 with /1000 while calculating > last_residency. > > Reported-by: Anton Blanchard <anton@samba.org> > Bisected-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com> > Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com> > --- > Changes in v2 > ============= > - Fixing it in the cpuidle core code instead of driver code. > > drivers/cpuidle/cpuidle.c | 6 +++--- > 1 file changed, 3 insertions(+), 3 deletions(-) > > diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c > index a4d0059..30d67a8 100644 > --- a/drivers/cpuidle/cpuidle.c > +++ b/drivers/cpuidle/cpuidle.c > @@ -218,10 +218,10 @@ int cpuidle_enter_state(struct cpuidle_device *dev, struct cpuidle_driver *drv, > local_irq_enable(); > > /* > - * local_clock() returns the time in nanosecond, let's shift > - * by 10 (divide by 1024) to have microsecond based time. > + * local_clock() returns the time in nanosecond, let's > + * divide by 1000 to have microsecond based time. > */ > - diff = (time_end - time_start) >> 10; > + diff = (time_end - time_start) / 1000; > if (diff > INT_MAX) > diff = INT_MAX; The intent of the >> 10 was probably to avoid an expensive 64bit divide. So maybe something like: diff = time_end - time_start; if (diff >= INT_MAX/2) diff_32 = INT_MAX/2/1000; else diff_32 = diff; diff_32 += diff_32 >> 6; diff_32 >>= 10; } Adding an extra 1/32 makes the division by be something slightly below 1000. David
Hi,
[auto build test ERROR on pm/linux-next]
[also build test ERROR on v4.7-rc4 next-20160624]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
url: https://github.com/0day-ci/linux/commits/Shreyas-B-Prabhu/cpuidle-Fix-last_residency-division/20160624-162633
base: https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git linux-next
config: i386-defconfig (attached as .config)
compiler: gcc-6 (Debian 6.1.1-1) 6.1.1 20160430
reproduce:
# save the attached .config to linux build tree
make ARCH=i386
All errors (new ones prefixed by >>):
drivers/built-in.o: In function `cpuidle_enter_state':
>> (.text+0x31e0d2): undefined reference to `__udivdi3'
---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation
On 06/24/2016 11:00 AM, David Laight wrote: > From: Shreyas B. Prabhu >> Sent: 24 June 2016 09:24 >> >> Snooze is a poll idle state in powernv and pseries platforms. Snooze >> has a timeout so that if a cpu stays in snooze for more than target >> residency of the next available idle state, then it would exit thereby >> giving chance to the cpuidle governor to re-evaluate and >> promote the cpu to a deeper idle state. Therefore whenever snooze exits >> due to this timeout, its last_residency will be target_residency of next >> deeper state. >> >> commit e93e59ce5b85 ("cpuidle: Replace ktime_get() with local_clock()") >> changed the math around last_residency calculation. Specifically, while >> converting last_residency value from nanoseconds to microseconds it does >> right shift by 10. Due to this, in snooze timeout exit scenarios >> last_residency calculated is roughly 2.3% less than target_residency of >> next available state. This pattern is picked up get_typical_interval() >> in the menu governor and therefore expected_interval in menu_select() is >> frequently less than the target_residency of any state but snooze. >> >> Due to this we are entering snooze at a higher rate, thereby affecting >> the single thread performance. >> >> Fix this by replacing right shift by 10 with /1000 while calculating >> last_residency. >> >> Reported-by: Anton Blanchard <anton@samba.org> >> Bisected-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com> >> Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com> >> --- >> Changes in v2 >> ============= >> - Fixing it in the cpuidle core code instead of driver code. >> >> drivers/cpuidle/cpuidle.c | 6 +++--- >> 1 file changed, 3 insertions(+), 3 deletions(-) >> >> diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c >> index a4d0059..30d67a8 100644 >> --- a/drivers/cpuidle/cpuidle.c >> +++ b/drivers/cpuidle/cpuidle.c >> @@ -218,10 +218,10 @@ int cpuidle_enter_state(struct cpuidle_device *dev, struct cpuidle_driver *drv, >> local_irq_enable(); >> >> /* >> - * local_clock() returns the time in nanosecond, let's shift >> - * by 10 (divide by 1024) to have microsecond based time. >> + * local_clock() returns the time in nanosecond, let's >> + * divide by 1000 to have microsecond based time. >> */ >> - diff = (time_end - time_start) >> 10; >> + diff = (time_end - time_start) / 1000; do_div ? >> if (diff > INT_MAX) >> diff = INT_MAX; > > The intent of the >> 10 was probably to avoid an expensive 64bit divide. > So maybe something like: > diff = time_end - time_start; > if (diff >= INT_MAX/2) > diff_32 = INT_MAX/2/1000; > else > diff_32 = diff; > diff_32 += diff_32 >> 6; > diff_32 >>= 10; > } > > Adding an extra 1/32 makes the division by be something slightly below 1000.
On Friday, June 24, 2016 9:00:48 AM CEST David Laight wrote: > The intent of the >> 10 was probably to avoid an expensive 64bit divide. > So maybe something like: > diff = time_end - time_start; > if (diff >= INT_MAX/2) > diff_32 = INT_MAX/2/1000; > else > diff_32 = diff; > diff_32 += diff_32 >> 6; > diff_32 >>= 10; > } > > Adding an extra 1/32 makes the division by be something slightly below 1000. Why not change the definition of the time to nanoseconds and update the users accordingly? I see that cpuidle_enter_state() writes to the last_residency value, and it is read in three places: ladder_select_state(), menu_update() and show_state_time() (for sysfs). If those functions are called less often than cpuidle_enter_state(), we could just move the division there. Since the divisor is constant, do_div() can convert it into a multiply and shift, or we could use your the code you suggest above, or use a 32-bit division most of the time: if (diff <= UINT_MAX) diff_32 = (u32)diff / NSECS_PER_USEC; else diff_32 = div_u64(diff, NSECS_PER_USEC; which gcc itself will turn into a multiplication or series of shifts on CPUs on which that is faster. Arnd -- To unsubscribe from this list: send the line "unsubscribe linux-pm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi,
[auto build test ERROR on pm/linux-next]
[also build test ERROR on v4.7-rc4 next-20160624]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
url: https://github.com/0day-ci/linux/commits/Shreyas-B-Prabhu/cpuidle-Fix-last_residency-division/20160624-162633
base: https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git linux-next
config: sh-sh7785lcr_32bit_defconfig (attached as .config)
compiler: sh4-linux-gnu-gcc (Debian 5.3.1-8) 5.3.1 20160205
reproduce:
wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=sh
All errors (new ones prefixed by >>):
drivers/built-in.o: In function `cpuidle_enter_state':
>> drivers/cpuidle/cpuidle.c:242: undefined reference to `__udivdi3'
vim +242 drivers/cpuidle/cpuidle.c
56cfbf74a Colin Cross 2012-05-07 236 dev->states_usage[entered_state].usage++;
56cfbf74a Colin Cross 2012-05-07 237 } else {
56cfbf74a Colin Cross 2012-05-07 238 dev->last_residency = 0;
56cfbf74a Colin Cross 2012-05-07 239 }
56cfbf74a Colin Cross 2012-05-07 240
56cfbf74a Colin Cross 2012-05-07 241 return entered_state;
56cfbf74a Colin Cross 2012-05-07 @242 }
56cfbf74a Colin Cross 2012-05-07 243
56cfbf74a Colin Cross 2012-05-07 244 /**
907e30f1b Daniel Lezcano 2014-03-03 245 * cpuidle_select - ask the cpuidle framework to choose an idle state
:::::: The code at line 242 was first introduced by commit
:::::: 56cfbf74a17c40f3a741398103c9f5d5a6806715 cpuidle: refactor out cpuidle_enter_state
:::::: TO: Colin Cross <ccross@android.com>
:::::: CC: Len Brown <len.brown@intel.com>
---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation
Hi, [auto build test ERROR on pm/linux-next] [also build test ERROR on v4.7-rc4 next-20160623] [if your patch is applied to the wrong git tree, please drop us a note to help improve the system] url: https://github.com/0day-ci/linux/commits/Shreyas-B-Prabhu/cpuidle-Fix-last_residency-division/20160624-162633 base: https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git linux-next config: arm-multi_v5_defconfig (attached as .config) compiler: arm-linux-gnueabi-gcc (Debian 5.3.1-8) 5.3.1 20160205 reproduce: wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross chmod +x ~/bin/make.cross # save the attached .config to linux build tree make.cross ARCH=arm All errors (new ones prefixed by >>): drivers/built-in.o: In function `cpuidle_enter_state': >> drivers/cpuidle/cpuidle.c:224: undefined reference to `__aeabi_uldivmod' vim +224 drivers/cpuidle/cpuidle.c 218 local_irq_enable(); 219 220 /* 221 * local_clock() returns the time in nanosecond, let's 222 * divide by 1000 to have microsecond based time. 223 */ > 224 diff = (time_end - time_start) / 1000; 225 if (diff > INT_MAX) 226 diff = INT_MAX; 227 --- 0-DAY kernel test infrastructure Open Source Technology Center https://lists.01.org/pipermail/kbuild-all Intel Corporation
On 06/24/2016 03:41 PM, Arnd Bergmann wrote: > On Friday, June 24, 2016 9:00:48 AM CEST David Laight wrote: >> The intent of the >> 10 was probably to avoid an expensive 64bit divide. >> So maybe something like: >> diff = time_end - time_start; >> if (diff >= INT_MAX/2) >> diff_32 = INT_MAX/2/1000; >> else >> diff_32 = diff; >> diff_32 += diff_32 >> 6; >> diff_32 >>= 10; >> } >> >> Adding an extra 1/32 makes the division by be something slightly below 1000. > > Why not change the definition of the time to nanoseconds and update > the users accordingly? > > I see that cpuidle_enter_state() writes to the last_residency value, > and it is read in three places: ladder_select_state(), menu_update() > and show_state_time() (for sysfs). > Updating show_state_time() to convert ns to us gets bit messy since show_state_##_name which is used for disable, usage and time simply returns state_usage->_name. And state_usage->time updation happens in cpuidle_enter_state() itself. > If those functions are called less often than cpuidle_enter_state(), > we could just move the division there. Since the divisor is constant, > do_div() can convert it into a multiply and shift, or we could use > your the code you suggest above, or use a 32-bit division most of > the time: > > if (diff <= UINT_MAX) > diff_32 = (u32)diff / NSECS_PER_USEC; > else > diff_32 = div_u64(diff, NSECS_PER_USEC; > > which gcc itself will turn into a multiplication or series of > shifts on CPUs on which that is faster. > I'm not sure which division method of the three suggested here to use. Does anyone have a strong preference? --Shreyas -- To unsubscribe from this list: send the line "unsubscribe linux-pm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Friday, June 24, 2016 9:31:35 PM CEST Shreyas B Prabhu wrote: > > If those functions are called less often than cpuidle_enter_state(), > > we could just move the division there. Since the divisor is constant, > > do_div() can convert it into a multiply and shift, or we could use > > your the code you suggest above, or use a 32-bit division most of > > the time: > > > > if (diff <= UINT_MAX) > > diff_32 = (u32)diff / NSECS_PER_USEC; > > else > > diff_32 = div_u64(diff, NSECS_PER_USEC; > > > > which gcc itself will turn into a multiplication or series of > > shifts on CPUs on which that is faster. > > > I'm not sure which division method of the three suggested here to use. > Does anyone have a strong preference? > It depends on how accurate we want it and how long we expect the times to be. The optimization for the 4.2 second cutoff for doing a 32-bit division only makes sense if the majority of the sleep times are below that. Arnd -- To unsubscribe from this list: send the line "unsubscribe linux-pm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
From: Arnd Bergmann > Sent: 24 June 2016 20:43 > On Friday, June 24, 2016 9:31:35 PM CEST Shreyas B Prabhu wrote: > > > If those functions are called less often than cpuidle_enter_state(), > > > we could just move the division there. Since the divisor is constant, > > > do_div() can convert it into a multiply and shift, or we could use > > > your the code you suggest above, or use a 32-bit division most of > > > the time: > > > > > > if (diff <= UINT_MAX) > > > diff_32 = (u32)diff / NSECS_PER_USEC; > > > else > > > diff_32 = div_u64(diff, NSECS_PER_USEC; > > > > > > which gcc itself will turn into a multiplication or series of > > > shifts on CPUs on which that is faster. > > > > > I'm not sure which division method of the three suggested here to use. > > Does anyone have a strong preference? > > > > It depends on how accurate we want it and how long we expect > the times to be. The optimization for the 4.2 second cutoff > for doing a 32-bit division only makes sense if the majority > of the sleep times are below that. It also depends if the code actually cares about the length of 'long' sleeps. I'd guess that for cpu idle 4.2 seconds is 'a long time', so the div_u64() result could be treated as 4.2 seconds without causing grief. Actually the cost of a 64bit divide after a 4 second sleep will be noise. OTOH a 64bit divide after a sleep that lasted a few ns will be significant. David -- To unsubscribe from this list: send the line "unsubscribe linux-pm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 06/27/2016 02:29 PM, David Laight wrote: > From: Arnd Bergmann >> Sent: 24 June 2016 20:43 >> On Friday, June 24, 2016 9:31:35 PM CEST Shreyas B Prabhu wrote: >>>> If those functions are called less often than cpuidle_enter_state(), >>>> we could just move the division there. Since the divisor is constant, >>>> do_div() can convert it into a multiply and shift, or we could use >>>> your the code you suggest above, or use a 32-bit division most of >>>> the time: >>>> >>>> if (diff <= UINT_MAX) >>>> diff_32 = (u32)diff / NSECS_PER_USEC; >>>> else >>>> diff_32 = div_u64(diff, NSECS_PER_USEC; >>>> >>>> which gcc itself will turn into a multiplication or series of >>>> shifts on CPUs on which that is faster. >>>> >>> I'm not sure which division method of the three suggested here to use. >>> Does anyone have a strong preference? >>> >> >> It depends on how accurate we want it and how long we expect >> the times to be. The optimization for the 4.2 second cutoff >> for doing a 32-bit division only makes sense if the majority >> of the sleep times are below that. > > It also depends if the code actually cares about the length of 'long' sleeps. > I'd guess that for cpu idle 4.2 seconds is 'a long time', so the div_u64() > result could be treated as 4.2 seconds without causing grief. > > Actually the cost of a 64bit divide after a 4 second sleep will be noise. > OTOH a 64bit divide after a sleep that lasted a few ns will be significant. > Agreed. I'll use the code you suggested, with a small change- Using diff_32 += diff_32 >> 5 instead of diff_32 += diff_32 >> 6 since I want to err on the side of last_residency being more than actual. And for long sleep cases, I'll use div_u64(). Thanks, Shreyas -- To unsubscribe from this list: send the line "unsubscribe linux-pm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
============= - Fixing it in the cpuidle core code instead of driver code. drivers/cpuidle/cpuidle.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c index a4d0059..30d67a8 100644 --- a/drivers/cpuidle/cpuidle.c +++ b/drivers/cpuidle/cpuidle.c @@ -218,10 +218,10 @@ int cpuidle_enter_state(struct cpuidle_device *dev, struct cpuidle_driver *drv, local_irq_enable(); /* - * local_clock() returns the time in nanosecond, let's shift - * by 10 (divide by 1024) to have microsecond based time. + * local_clock() returns the time in nanosecond, let's + * divide by 1000 to have microsecond based time. */ - diff = (time_end - time_start) >> 10; + diff = (time_end - time_start) / 1000; if (diff > INT_MAX) diff = INT_MAX;
Snooze is a poll idle state in powernv and pseries platforms. Snooze has a timeout so that if a cpu stays in snooze for more than target residency of the next available idle state, then it would exit thereby giving chance to the cpuidle governor to re-evaluate and promote the cpu to a deeper idle state. Therefore whenever snooze exits due to this timeout, its last_residency will be target_residency of next deeper state. commit e93e59ce5b85 ("cpuidle: Replace ktime_get() with local_clock()") changed the math around last_residency calculation. Specifically, while converting last_residency value from nanoseconds to microseconds it does right shift by 10. Due to this, in snooze timeout exit scenarios last_residency calculated is roughly 2.3% less than target_residency of next available state. This pattern is picked up get_typical_interval() in the menu governor and therefore expected_interval in menu_select() is frequently less than the target_residency of any state but snooze. Due to this we are entering snooze at a higher rate, thereby affecting the single thread performance. Fix this by replacing right shift by 10 with /1000 while calculating last_residency. Reported-by: Anton Blanchard <anton@samba.org> Bisected-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com> Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com> --- Changes in v2