Message ID | 20210315114827.46036-3-psampat@linux.ibm.com (mailing list archive) |
---|---|
State | Superseded, archived |
Headers | show |
Series | CPU-Idle latency selftest framework | expand |
Hi Pratik, It just so happens that I have been trying Artem's version this last week, so I tried yours. On Mon, Mar 15, 2021 at 4:49 AM Pratik Rajesh Sampat <psampat@linux.ibm.com> wrote: > ... > To run this test specifically: > $ make -C tools/testing/selftests TARGETS="cpuidle" run_tests While I suppose it should have been obvious, I interpreted the "$" sign to mean I could run as a regular user, which I can not. > There are a few optinal arguments too that the script can take > [-h <help>] > [-m <location of the module>] > [-o <location of the output>] > [-v <verbose> (run on all cpus)] > Default Output location in: tools/testing/cpuidle/cpuidle.log Isn't it: tools/testing/selftests/cpuidle/cpuidle.log ? At least, that is where my file was. Other notes: No idle state for CPU 0 ever gets disabled. I assume this is because CPU 0 can never be offline, so that bit of code (Disable all stop states) doesn't find its state. By the way, processor = Intel i5-9600K The system is left with all idle states disabled, well not for CPU 0 as per the above comment. The suggestion is to restore them, otherwise my processor hogs 42 watts instead of 2. My results are highly variable per test. My system is very idle: Example (from turbostat at 6 seconds sample rate): Busy% Bzy_MHz IRQ PkgTmp PkgWatt RAMWatt 0.03 4600 153 28 2.03 1.89 0.01 4600 103 29 2.03 1.89 0.05 4600 115 29 2.08 1.89 0.01 4600 95 28 2.09 1.89 0.03 4600 114 28 2.11 1.89 0.01 4600 107 29 2.07 1.89 0.02 4600 102 29 2.11 1.89 ... ... Doug
Hi Doug, Thanks for trying these patches out. On 18/03/21 2:30 am, Doug Smythies wrote: > Hi Pratik, > > It just so happens that I have been trying Artem's version this last > week, so I tried yours. > > On Mon, Mar 15, 2021 at 4:49 AM Pratik Rajesh Sampat > <psampat@linux.ibm.com> wrote: > ... >> To run this test specifically: >> $ make -C tools/testing/selftests TARGETS="cpuidle" run_tests > While I suppose it should have been obvious, I interpreted > the "$" sign to mean I could run as a regular user, which I can not. Ah yes, this does need root privileges, I should have prefixed the command with sudo in the instructions for better understanding. >> There are a few optinal arguments too that the script can take >> [-h <help>] >> [-m <location of the module>] >> [-o <location of the output>] >> [-v <verbose> (run on all cpus)] >> Default Output location in: tools/testing/cpuidle/cpuidle.log > Isn't it: > > tools/testing/selftests/cpuidle/cpuidle.log My bad, It was a typing error. I missed the "selftest" directory while typing it out. > ? At least, that is where my file was. > > Other notes: > > No idle state for CPU 0 ever gets disabled. > I assume this is because CPU 0 can never be offline, > so that bit of code (Disable all stop states) doesn't find its state. > By the way, processor = Intel i5-9600K I had tried these patches on an IBM POWER 9 processor and disabling CPU0's idle state works there. However, it does make sense for some processors to treat CPU 0 differently. Maybe I could write in a case if idle state disabling fails for a CPU then we just skip it? > The system is left with all idle states disabled, well not for CPU 0 > as per the above comment. The suggestion is to restore them, > otherwise my processor hogs 42 watts instead of 2. > > My results are highly variable per test. Question: Do you notice high variability with IPI test, Timer test or both? I can think of two reasons for high run to run variance: 1. If you observe variance in timer tests, then I believe there could a mechanism of "C-state pre-wake" on some Intel machines at play here, which can pre-wake a CPU from an idle state when timers are armed. I'm not sure if the Intel platform that you're running on does that or not. Artem had described this behavior to me a while ago and I think his wult page describes this behavior in more detail: https://intel.github.io/wult/#c-state-pre-wake 2. I have noticed variability in results when there are kernel book-keeping or jitter tasks scheduled from time to time on an otherwise idle core. In the full per-CPU logs at tools/testing/selftests/cpuidle/cpuidle.log can you spot any obvious outliers per-CPU state? Also you may want to run the test in verbose mode which runs for all the threads of each CPU with the following: "sudo ./cpuidle.sh -v". While latency mostly matters for per-core basis but it still maybe a good idea to see if that changes anything with the observations. -- Thanks and regards, Pratik > My system is very idle: > Example (from turbostat at 6 seconds sample rate): > Busy% Bzy_MHz IRQ PkgTmp PkgWatt RAMWatt > 0.03 4600 153 28 2.03 1.89 > 0.01 4600 103 29 2.03 1.89 > 0.05 4600 115 29 2.08 1.89 > 0.01 4600 95 28 2.09 1.89 > 0.03 4600 114 28 2.11 1.89 > 0.01 4600 107 29 2.07 1.89 > 0.02 4600 102 29 2.11 1.89 > > ... > > ... Doug
On Wed, Mar 17, 2021 at 11:44 PM Pratik Sampat <psampat@linux.ibm.com> wrote: > > Hi Doug, > Thanks for trying these patches out. > > On 18/03/21 2:30 am, Doug Smythies wrote: > > Hi Pratik, > > > > It just so happens that I have been trying Artem's version this last > > week, so I tried yours. > > > > On Mon, Mar 15, 2021 at 4:49 AM Pratik Rajesh Sampat > > <psampat@linux.ibm.com> wrote: > > ... ... > > Other notes: > > > > No idle state for CPU 0 ever gets disabled. > > I assume this is because CPU 0 can never be offline, > > so that bit of code (Disable all stop states) doesn't find its state. > > By the way, processor = Intel i5-9600K > > I had tried these patches on an IBM POWER 9 processor and disabling CPU0's idle > state works there. However, it does make sense for some processors to treat CPU > 0 differently. > Maybe I could write in a case if idle state disabling fails for a CPU then we > just skip it? I didn't try it, I just did a hack so I could continue for this reply. > > The system is left with all idle states disabled, well not for CPU 0 > > as per the above comment. The suggestion is to restore them, > > otherwise my processor hogs 42 watts instead of 2. > > > > My results are highly variable per test. > > Question: Do you notice high variability with IPI test, Timer test or both? The IPI test has less variability than the Timer test. > > I can think of two reasons for high run to run variance: > > 1. If you observe variance in timer tests, then I believe there could a > mechanism of "C-state pre-wake" on some Intel machines at play here, which can > pre-wake a CPU from an idle state when timers are armed. I'm not sure if the > Intel platform that you're running on does that or not. > > Artem had described this behavior to me a while ago and I think his wult page > describes this behavior in more detail: > https://intel.github.io/wult/#c-state-pre-wake Yes, I have reviewed all the references. And yes, I think my processors have the pre-wake stuff. I do not have the proper hardware to do the Artem pre-wake workaround method, but might buy it in future. > 2. I have noticed variability in results when there are kernel book-keeping or > jitter tasks scheduled from time to time on an otherwise idle core. > In the full per-CPU logs at tools/testing/selftests/cpuidle/cpuidle.log can you > spot any obvious outliers per-CPU state? Yes. I'll just paste in an example cpuidle.log file having used the -v option below, along with my hack job diff. doug@s19:~/temp-k-git/linux/tools/testing/selftests/cpuidle$ cat cpuidle.log.v3-1 --IPI Latency Test--- --Baseline IPI Latency measurement: CPU Busy-- SRC_CPU DEST_CPU IPI_Latency(ns) 0 0 140 0 1 632 0 2 675 0 3 671 0 4 675 0 5 767 0 6 653 0 7 826 0 8 819 0 9 615 0 10 758 0 11 758 Baseline Avg IPI latency(ns): 665 ---Enabling state: 0--- SRC_CPU DEST_CPU IPI_Latency(ns) 0 0 76 0 1 484 0 2 494 0 3 539 0 4 498 0 5 491 0 6 474 0 7 434 0 8 544 0 9 476 0 10 447 0 11 467 Expected IPI latency(ns): 0 Observed Avg IPI latency(ns) - State 0: 452 ---Enabling state: 1--- SRC_CPU DEST_CPU IPI_Latency(ns) 0 0 72 0 1 1081 0 2 821 0 3 1486 0 4 1022 0 5 960 0 6 1634 0 7 933 0 8 1032 0 9 1046 0 10 1430 0 11 1338 Expected IPI latency(ns): 1000 Observed Avg IPI latency(ns) - State 1: 1071 ---Enabling state: 2--- SRC_CPU DEST_CPU IPI_Latency(ns) 0 0 264 0 1 30836 0 2 30562 0 3 30748 0 4 35286 0 5 30978 0 6 1952 0 7 36066 0 8 30670 0 9 30605 0 10 30635 0 11 35423 Expected IPI latency(ns): 120000 Observed Avg IPI latency(ns) - State 2: 27002 ---Enabling state: 3--- SRC_CPU DEST_CPU IPI_Latency(ns) 0 0 71 0 1 30853 0 2 32095 0 3 32661 0 4 30230 0 5 34348 0 6 2012 0 7 30816 0 8 30908 0 9 31130 0 10 34150 0 11 32050 Expected IPI latency(ns): 1034000 Observed Avg IPI latency(ns) - State 3: 26777 --Timeout Latency Test-- --Baseline Timeout Latency measurement: CPU Busy-- Wakeup_src Baseline_delay(ns) 0 453 1 568 2 387 3 337 4 433 5 579 6 330 7 400 8 561 9 544 10 569 11 523 Baseline Avg timeout diff(ns): 473 ---Enabling state: 0--- Wakeup_src Baseline_delay(ns) Delay(ns) 0 399 1 388 2 352 3 385 4 334 5 415 6 320 7 356 8 401 9 379 10 339 11 384 Expected timeout(ns): 200 Observed Avg timeout diff(ns) - State 0: 371 ---Enabling state: 1--- Wakeup_src Baseline_delay(ns) Delay(ns) 0 666 1 575 2 608 3 590 4 608 5 552 6 582 7 593 8 597 9 587 10 588 11 610 Expected timeout(ns): 1200 Observed Avg timeout diff(ns) - State 1: 596 ---Enabling state: 2--- Wakeup_src Baseline_delay(ns) Delay(ns) 0 36386 1 1069 2 866 3 884 4 850 5 55642 6 408082 7 1184 8 406075 9 406830 10 414105 11 406594 Expected timeout(ns): 360200 Observed Avg timeout diff(ns) - State 2: 178213 ---Enabling state: 3--- Wakeup_src Baseline_delay(ns) Delay(ns) 0 406049 1 913 2 410134 3 921 4 406237 5 950 6 407181 7 920 8 407678 9 894 10 406320 11 304161 Expected timeout(ns): 3102200 Observed Avg timeout diff(ns) - State 3: 229363 My hack job, (CPUs always online): diff --git a/tools/testing/selftests/cpuidle/cpuidle.sh b/tools/testing/selftests/cpuidle/cpuidle.sh index de5141d5b76b..70bdacda5e91 100755 --- a/tools/testing/selftests/cpuidle/cpuidle.sh +++ b/tools/testing/selftests/cpuidle/cpuidle.sh @@ -86,10 +86,6 @@ disable_idle() { for ((cpu=0; cpu<NUM_CPUS; cpu++)) do - local cpu_status=$(cpu_is_online $cpu) - if [ $cpu_status == 0 ]; then - continue - fi for ((state=0; state<NUM_STATES; state++)) do echo 1 > /sys/devices/system/cpu/cpu$cpu/cpuidle/state$state/disable @@ -104,10 +100,6 @@ op_state() { for ((cpu=0; cpu<NUM_CPUS; cpu++)) do - local cpu_status=$(cpu_is_online $cpu) - if [ $cpu_status == 0 ]; then - continue - fi echo $1 > /sys/devices/system/cpu/cpu$cpu/cpuidle/state$2/disable done } @@ -124,17 +116,6 @@ cpuidle_disable_state() op_state 1 $state } -cpu_is_online() -{ - cpu=$1 - if [ ! -f "/sys/devices/system/cpu/cpu$cpu/online" ]; then - echo 0 - return - fi - status=$(cat /sys/devices/system/cpu/cpu$cpu/online) - echo $status -} - # Extract latency in microseconds and convert to nanoseconds extract_latency() { @@ -179,10 +160,6 @@ run_ipi_tests() printf "%s %10s %12s\n" "SRC_CPU" "DEST_CPU" "IPI_Latency(ns)" >> $LOG for ((cpu=0; cpu<NUM_CPUS; cpu+=SMT)) do - local cpu_status=$(cpu_is_online $cpu) - if [ $cpu_status == 0 ]; then - continue - fi ipi_test_once "baseline" $cpu printf "%-3s %10s %12s\n" $src_cpu $cpu $ipi_latency >> $LOG avg_arr+=($ipi_latency) @@ -198,10 +175,6 @@ run_ipi_tests() printf "%s %10s %12s\n" "SRC_CPU" "DEST_CPU" "IPI_Latency(ns)" >> $LOG for ((cpu=0; cpu<NUM_CPUS; cpu+=SMT)) do - local cpu_status=$(cpu_is_online $cpu) - if [ $cpu_status == 0 ]; then - continue - fi # Running IPI test and logging results sleep 1 ipi_test_once "test" $cpu @@ -262,10 +235,6 @@ run_timeout_tests() printf "%s %10s %10s\n" "Wakeup_src" "Baseline_delay(ns)">> $LOG for ((cpu=0; cpu<NUM_CPUS; cpu+=SMT)) do - local cpu_status=$(cpu_is_online $cpu) - if [ $cpu_status == 0 ]; then - continue - fi timeout_test_once "baseline" $cpu 1000000 printf "%-3s %13s\n" $src_cpu $timeout_diff >> $LOG avg_arr+=($timeout_diff) @@ -281,10 +250,6 @@ run_timeout_tests() printf "%s %10s %10s\n" "Wakeup_src" "Baseline_delay(ns)" "Delay(ns)" >> $LOG for ((cpu=0; cpu<NUM_CPUS; cpu+=SMT)) do - local cpu_status=$(cpu_is_online $cpu) - if [ $cpu_status == 0 ]; then - continue - fi timeout_test_once "test" $cpu 1000000 printf "%-3s %13s %18s\n" $src_cpu $baseline_timeout_diff $timeout_diff >> $LOG avg_arr+=($timeout_diff) @@ -314,3 +279,7 @@ run_timeout_tests printf "Removing $MODULE module\n" printf "Full Output logged at: $LOG\n" rmmod $MODULE + +printf "enabling idle states\n" + +echo 0 | tee /sys/devices/system/cpu/cpu*/cpuidle/state*/disable (END)
Hi Doug, On 20/03/21 8:34 pm, Doug Smythies wrote: > On Wed, Mar 17, 2021 at 11:44 PM Pratik Sampat <psampat@linux.ibm.com> wrote: >> Hi Doug, >> Thanks for trying these patches out. >> >> On 18/03/21 2:30 am, Doug Smythies wrote: >>> Hi Pratik, >>> >>> It just so happens that I have been trying Artem's version this last >>> week, so I tried yours. >>> >>> On Mon, Mar 15, 2021 at 4:49 AM Pratik Rajesh Sampat >>> <psampat@linux.ibm.com> wrote: >>> ... > ... >>> Other notes: >>> >>> No idle state for CPU 0 ever gets disabled. >>> I assume this is because CPU 0 can never be offline, >>> so that bit of code (Disable all stop states) doesn't find its state. >>> By the way, processor = Intel i5-9600K >> I had tried these patches on an IBM POWER 9 processor and disabling CPU0's idle >> state works there. However, it does make sense for some processors to treat CPU >> 0 differently. >> Maybe I could write in a case if idle state disabling fails for a CPU then we >> just skip it? > I didn't try it, I just did a hack so I could continue for this reply. Sure. In subsequent version I could write something to cleanly handle online fail checks, maybe even specifically for CPU0. >>> The system is left with all idle states disabled, well not for CPU 0 >>> as per the above comment. The suggestion is to restore them, >>> otherwise my processor hogs 42 watts instead of 2. >>> >>> My results are highly variable per test. >> Question: Do you notice high variability with IPI test, Timer test or both? > The IPI test has less variability than the Timer test. > >> I can think of two reasons for high run to run variance: >> >> 1. If you observe variance in timer tests, then I believe there could a >> mechanism of "C-state pre-wake" on some Intel machines at play here, which can >> pre-wake a CPU from an idle state when timers are armed. I'm not sure if the >> Intel platform that you're running on does that or not. >> >> Artem had described this behavior to me a while ago and I think his wult page >> describes this behavior in more detail: >> https://intel.github.io/wult/#c-state-pre-wake > Yes, I have reviewed all the references. > And yes, I think my processors have the pre-wake stuff. > > I do not have the proper hardware to do the Artem pre-wake workaround > method, but might buy it in future. That explains the variability that we are seeing in the Timer tests on the Intel processor you've tried on. Also based on the data pasted below, it means that the IPI tests are more reliable than Timers. Maybe it would be better to not run the Timer test on Intel platforms that support this pre-wakeup feature? However, I don't know how (or if) Intel exposes this information to the userspace and if other platforms like AMD also have this feature in some form. Another way of solving this problem could be to have the timer test as an optional parameter in the selftest for people to use while also printing a disclaimer for x86 users about the potential hardware design? -- Thanks Pratik >> 2. I have noticed variability in results when there are kernel book-keeping or >> jitter tasks scheduled from time to time on an otherwise idle core. >> In the full per-CPU logs at tools/testing/selftests/cpuidle/cpuidle.log can you >> spot any obvious outliers per-CPU state? > Yes. > I'll just paste in an example cpuidle.log file having used the -v option > below, along with my hack job diff. > > doug@s19:~/temp-k-git/linux/tools/testing/selftests/cpuidle$ cat > cpuidle.log.v3-1 > --IPI Latency Test--- > --Baseline IPI Latency measurement: CPU Busy-- > SRC_CPU DEST_CPU IPI_Latency(ns) > 0 0 140 > 0 1 632 > 0 2 675 > 0 3 671 > 0 4 675 > 0 5 767 > 0 6 653 > 0 7 826 > 0 8 819 > 0 9 615 > 0 10 758 > 0 11 758 > Baseline Avg IPI latency(ns): 665 > ---Enabling state: 0--- > SRC_CPU DEST_CPU IPI_Latency(ns) > 0 0 76 > 0 1 484 > 0 2 494 > 0 3 539 > 0 4 498 > 0 5 491 > 0 6 474 > 0 7 434 > 0 8 544 > 0 9 476 > 0 10 447 > 0 11 467 > Expected IPI latency(ns): 0 > Observed Avg IPI latency(ns) - State 0: 452 > ---Enabling state: 1--- > SRC_CPU DEST_CPU IPI_Latency(ns) > 0 0 72 > 0 1 1081 > 0 2 821 > 0 3 1486 > 0 4 1022 > 0 5 960 > 0 6 1634 > 0 7 933 > 0 8 1032 > 0 9 1046 > 0 10 1430 > 0 11 1338 > Expected IPI latency(ns): 1000 > Observed Avg IPI latency(ns) - State 1: 1071 > ---Enabling state: 2--- > SRC_CPU DEST_CPU IPI_Latency(ns) > 0 0 264 > 0 1 30836 > 0 2 30562 > 0 3 30748 > 0 4 35286 > 0 5 30978 > 0 6 1952 > 0 7 36066 > 0 8 30670 > 0 9 30605 > 0 10 30635 > 0 11 35423 > Expected IPI latency(ns): 120000 > Observed Avg IPI latency(ns) - State 2: 27002 > ---Enabling state: 3--- > SRC_CPU DEST_CPU IPI_Latency(ns) > 0 0 71 > 0 1 30853 > 0 2 32095 > 0 3 32661 > 0 4 30230 > 0 5 34348 > 0 6 2012 > 0 7 30816 > 0 8 30908 > 0 9 31130 > 0 10 34150 > 0 11 32050 > Expected IPI latency(ns): 1034000 > Observed Avg IPI latency(ns) - State 3: 26777 > > --Timeout Latency Test-- > --Baseline Timeout Latency measurement: CPU Busy-- > Wakeup_src Baseline_delay(ns) > 0 453 > 1 568 > 2 387 > 3 337 > 4 433 > 5 579 > 6 330 > 7 400 > 8 561 > 9 544 > 10 569 > 11 523 > Baseline Avg timeout diff(ns): 473 > ---Enabling state: 0--- > Wakeup_src Baseline_delay(ns) Delay(ns) > 0 399 > 1 388 > 2 352 > 3 385 > 4 334 > 5 415 > 6 320 > 7 356 > 8 401 > 9 379 > 10 339 > 11 384 > Expected timeout(ns): 200 > Observed Avg timeout diff(ns) - State 0: 371 > ---Enabling state: 1--- > Wakeup_src Baseline_delay(ns) Delay(ns) > 0 666 > 1 575 > 2 608 > 3 590 > 4 608 > 5 552 > 6 582 > 7 593 > 8 597 > 9 587 > 10 588 > 11 610 > Expected timeout(ns): 1200 > Observed Avg timeout diff(ns) - State 1: 596 > ---Enabling state: 2--- > Wakeup_src Baseline_delay(ns) Delay(ns) > 0 36386 > 1 1069 > 2 866 > 3 884 > 4 850 > 5 55642 > 6 408082 > 7 1184 > 8 406075 > 9 406830 > 10 414105 > 11 406594 > Expected timeout(ns): 360200 > Observed Avg timeout diff(ns) - State 2: 178213 > ---Enabling state: 3--- > Wakeup_src Baseline_delay(ns) Delay(ns) > 0 406049 > 1 913 > 2 410134 > 3 921 > 4 406237 > 5 950 > 6 407181 > 7 920 > 8 407678 > 9 894 > 10 406320 > 11 304161 > Expected timeout(ns): 3102200 > Observed Avg timeout diff(ns) - State 3: 229363 > > > My hack job, (CPUs always online): > > diff --git a/tools/testing/selftests/cpuidle/cpuidle.sh > b/tools/testing/selftests/cpuidle/cpuidle.sh > index de5141d5b76b..70bdacda5e91 100755 > --- a/tools/testing/selftests/cpuidle/cpuidle.sh > +++ b/tools/testing/selftests/cpuidle/cpuidle.sh > @@ -86,10 +86,6 @@ disable_idle() > { > for ((cpu=0; cpu<NUM_CPUS; cpu++)) > do > - local cpu_status=$(cpu_is_online $cpu) > - if [ $cpu_status == 0 ]; then > - continue > - fi > for ((state=0; state<NUM_STATES; state++)) > do > echo 1 > > /sys/devices/system/cpu/cpu$cpu/cpuidle/state$state/disable > @@ -104,10 +100,6 @@ op_state() > { > for ((cpu=0; cpu<NUM_CPUS; cpu++)) > do > - local cpu_status=$(cpu_is_online $cpu) > - if [ $cpu_status == 0 ]; then > - continue > - fi > echo $1 > > /sys/devices/system/cpu/cpu$cpu/cpuidle/state$2/disable > done > } > @@ -124,17 +116,6 @@ cpuidle_disable_state() > op_state 1 $state > } > > -cpu_is_online() > -{ > - cpu=$1 > - if [ ! -f "/sys/devices/system/cpu/cpu$cpu/online" ]; then > - echo 0 > - return > - fi > - status=$(cat /sys/devices/system/cpu/cpu$cpu/online) > - echo $status > -} > - > # Extract latency in microseconds and convert to nanoseconds > extract_latency() > { > @@ -179,10 +160,6 @@ run_ipi_tests() > printf "%s %10s %12s\n" "SRC_CPU" "DEST_CPU" "IPI_Latency(ns)" >> $LOG > for ((cpu=0; cpu<NUM_CPUS; cpu+=SMT)) > do > - local cpu_status=$(cpu_is_online $cpu) > - if [ $cpu_status == 0 ]; then > - continue > - fi > ipi_test_once "baseline" $cpu > printf "%-3s %10s %12s\n" $src_cpu $cpu $ipi_latency >> $LOG > avg_arr+=($ipi_latency) > @@ -198,10 +175,6 @@ run_ipi_tests() > printf "%s %10s %12s\n" "SRC_CPU" "DEST_CPU" > "IPI_Latency(ns)" >> $LOG > for ((cpu=0; cpu<NUM_CPUS; cpu+=SMT)) > do > - local cpu_status=$(cpu_is_online $cpu) > - if [ $cpu_status == 0 ]; then > - continue > - fi > # Running IPI test and logging results > sleep 1 > ipi_test_once "test" $cpu > @@ -262,10 +235,6 @@ run_timeout_tests() > printf "%s %10s %10s\n" "Wakeup_src" "Baseline_delay(ns)">> $LOG > for ((cpu=0; cpu<NUM_CPUS; cpu+=SMT)) > do > - local cpu_status=$(cpu_is_online $cpu) > - if [ $cpu_status == 0 ]; then > - continue > - fi > timeout_test_once "baseline" $cpu 1000000 > printf "%-3s %13s\n" $src_cpu $timeout_diff >> $LOG > avg_arr+=($timeout_diff) > @@ -281,10 +250,6 @@ run_timeout_tests() > printf "%s %10s %10s\n" "Wakeup_src" > "Baseline_delay(ns)" "Delay(ns)" >> $LOG > for ((cpu=0; cpu<NUM_CPUS; cpu+=SMT)) > do > - local cpu_status=$(cpu_is_online $cpu) > - if [ $cpu_status == 0 ]; then > - continue > - fi > timeout_test_once "test" $cpu 1000000 > printf "%-3s %13s %18s\n" $src_cpu > $baseline_timeout_diff $timeout_diff >> $LOG > avg_arr+=($timeout_diff) > @@ -314,3 +279,7 @@ run_timeout_tests > printf "Removing $MODULE module\n" > printf "Full Output logged at: $LOG\n" > rmmod $MODULE > + > +printf "enabling idle states\n" > + > +echo 0 | tee /sys/devices/system/cpu/cpu*/cpuidle/state*/disable > (END)
diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile index 8a917cb4426a..8ac7a4fb86f9 100644 --- a/tools/testing/selftests/Makefile +++ b/tools/testing/selftests/Makefile @@ -7,6 +7,7 @@ TARGETS += cgroup TARGETS += clone3 TARGETS += core TARGETS += cpufreq +TARGETS += cpuidle TARGETS += cpu-hotplug TARGETS += drivers/dma-buf TARGETS += efivarfs diff --git a/tools/testing/selftests/cpuidle/Makefile b/tools/testing/selftests/cpuidle/Makefile new file mode 100644 index 000000000000..cbe13feced34 --- /dev/null +++ b/tools/testing/selftests/cpuidle/Makefile @@ -0,0 +1,6 @@ +# SPDX-License-Identifier: GPL-2.0 +all: + +TEST_PROGS := cpuidle.sh + +include ../lib.mk \ No newline at end of file diff --git a/tools/testing/selftests/cpuidle/cpuidle.sh b/tools/testing/selftests/cpuidle/cpuidle.sh new file mode 100755 index 000000000000..de5141d5b76b --- /dev/null +++ b/tools/testing/selftests/cpuidle/cpuidle.sh @@ -0,0 +1,316 @@ +#!/bin/bash +# SPDX-License-Identifier: GPL-2.0 + +LOG=cpuidle.log +MODULE=/lib/modules/$(uname -r)/kernel/drivers/cpuidle/test-cpuidle_latency.ko + +# Kselftest framework requirement - SKIP code is 4. +ksft_skip=4 + +SMT=$(lscpu | grep "Thread(s) per core" | awk '{print $4}') + +helpme() +{ + printf "Usage: $0 [-h] [-todg args] + [-h <help>] + [-m <location of the module>] + [-o <location of the output>] + [-v <verbose>] + \n" + exit 2 +} + +parse_arguments() +{ + while getopts ht:m:o:vt: arg + do + case $arg in + h) # --help + helpme + ;; + m) # --mod-file + MODULE=$OPTARG + ;; + o) # output log files + LOG=$OPTARG + ;; + v) # Verbose mode + SMT=1 + ;; + \?) + helpme + ;; + esac + done +} + +ins_mod() +{ + debugfs_file=/sys/kernel/debug/latency_test/ipi_latency_ns + # Check if the module is already loaded + if [ -f "$debugfs_file" ]; then + printf "Module already loaded\n\n" + return 0 + fi + # Try to load the module + if [ ! -f "$MODULE" ]; then + printf "$MODULE module does not exist. Exitting\n" + exit $ksft_skip + fi + printf "Inserting $MODULE module\n\n" + insmod $MODULE + if [ $? != 0 ]; then + printf "Insmod $MODULE failed\n" + exit $ksft_skip + fi +} + +compute_average() +{ + arr=("$@") + sum=0 + size=${#arr[@]} + if [ $size == 0 ]; then + avg=0 + return 1 + fi + for i in "${arr[@]}" + do + sum=$((sum + i)) + done + avg=$((sum/size)) +} + +# Disable all stop states +disable_idle() +{ + for ((cpu=0; cpu<NUM_CPUS; cpu++)) + do + local cpu_status=$(cpu_is_online $cpu) + if [ $cpu_status == 0 ]; then + continue + fi + for ((state=0; state<NUM_STATES; state++)) + do + echo 1 > /sys/devices/system/cpu/cpu$cpu/cpuidle/state$state/disable + done + done +} + +# Perform operation on each CPU for the given state +# $1 - Operation: enable (0) / disable (1) +# $2 - State to enable +op_state() +{ + for ((cpu=0; cpu<NUM_CPUS; cpu++)) + do + local cpu_status=$(cpu_is_online $cpu) + if [ $cpu_status == 0 ]; then + continue + fi + echo $1 > /sys/devices/system/cpu/cpu$cpu/cpuidle/state$2/disable + done +} + +cpuidle_enable_state() +{ + state=$1 + op_state 0 $state +} + +cpuidle_disable_state() +{ + state=$1 + op_state 1 $state +} + +cpu_is_online() +{ + cpu=$1 + if [ ! -f "/sys/devices/system/cpu/cpu$cpu/online" ]; then + echo 0 + return + fi + status=$(cat /sys/devices/system/cpu/cpu$cpu/online) + echo $status +} + +# Extract latency in microseconds and convert to nanoseconds +extract_latency() +{ + for ((state=0; state<NUM_STATES; state++)) + do + latency=$(($(cat /sys/devices/system/cpu/cpu0/cpuidle/state$state/latency) * 1000)) + latency_arr+=($latency) + done +} + +# Run the IPI test +# $1 run for baseline - busy cpu or regular environment +# $2 destination cpu +ipi_test_once() +{ + dest_cpu=$2 + if [ "$1" = "baseline" ]; then + # Keep the CPU busy + taskset -c $dest_cpu cat /dev/random > /dev/null & + task_pid=$! + # Wait for the workload to achieve 100% CPU usage + sleep 1 + fi + taskset 0x1 echo $dest_cpu > /sys/kernel/debug/latency_test/ipi_cpu_dest + ipi_latency=$(cat /sys/kernel/debug/latency_test/ipi_latency_ns) + src_cpu=$(cat /sys/kernel/debug/latency_test/ipi_cpu_src) + if [ "$1" = "baseline" ]; then + kill $task_pid + wait $task_pid 2>/dev/null + fi +} + +# Incrementally Enable idle states one by one and compute the latency +run_ipi_tests() +{ + extract_latency + disable_idle + declare -a avg_arr + echo -e "--IPI Latency Test---" | tee -a $LOG + + echo -e "--Baseline IPI Latency measurement: CPU Busy--" >> $LOG + printf "%s %10s %12s\n" "SRC_CPU" "DEST_CPU" "IPI_Latency(ns)" >> $LOG + for ((cpu=0; cpu<NUM_CPUS; cpu+=SMT)) + do + local cpu_status=$(cpu_is_online $cpu) + if [ $cpu_status == 0 ]; then + continue + fi + ipi_test_once "baseline" $cpu + printf "%-3s %10s %12s\n" $src_cpu $cpu $ipi_latency >> $LOG + avg_arr+=($ipi_latency) + done + compute_average "${avg_arr[@]}" + echo -e "Baseline Avg IPI latency(ns): $avg" | tee -a $LOG + + for ((state=0; state<NUM_STATES; state++)) + do + unset avg_arr + echo -e "---Enabling state: $state---" >> $LOG + cpuidle_enable_state $state + printf "%s %10s %12s\n" "SRC_CPU" "DEST_CPU" "IPI_Latency(ns)" >> $LOG + for ((cpu=0; cpu<NUM_CPUS; cpu+=SMT)) + do + local cpu_status=$(cpu_is_online $cpu) + if [ $cpu_status == 0 ]; then + continue + fi + # Running IPI test and logging results + sleep 1 + ipi_test_once "test" $cpu + printf "%-3s %10s %12s\n" $src_cpu $cpu $ipi_latency >> $LOG + avg_arr+=($ipi_latency) + done + compute_average "${avg_arr[@]}" + echo -e "Expected IPI latency(ns): ${latency_arr[$state]}" >> $LOG + echo -e "Observed Avg IPI latency(ns) - State $state: $avg" | tee -a $LOG + cpuidle_disable_state $state + done +} + +# Extract the residency in microseconds and convert to nanoseconds. +# Add 100 ns so that the timer stays for a little longer than the residency +extract_residency() +{ + for ((state=0; state<NUM_STATES; state++)) + do + residency=$(($(cat /sys/devices/system/cpu/cpu0/cpuidle/state$state/residency) * 1000 + 200)) + residency_arr+=($residency) + done +} + +# Run the Timeout test +# $1 run for baseline - busy cpu or regular environment +# $2 destination cpu +# $3 timeout +timeout_test_once() +{ + dest_cpu=$2 + if [ "$1" = "baseline" ]; then + # Keep the CPU busy + taskset -c $dest_cpu cat /dev/random > /dev/null & + task_pid=$! + # Wait for the workload to achieve 100% CPU usage + sleep 1 + fi + taskset -c $dest_cpu echo $3 > /sys/kernel/debug/latency_test/timeout_expected_ns + # Wait for the result to populate + sleep 0.1 + timeout_diff=$(cat /sys/kernel/debug/latency_test/timeout_diff_ns) + src_cpu=$(cat /sys/kernel/debug/latency_test/timeout_cpu_src) + if [ "$1" = "baseline" ]; then + kill $task_pid + wait $task_pid 2>/dev/null + fi +} + +run_timeout_tests() +{ + extract_residency + disable_idle + declare -a avg_arr + echo -e "\n--Timeout Latency Test--" | tee -a $LOG + + echo -e "--Baseline Timeout Latency measurement: CPU Busy--" >> $LOG + printf "%s %10s %10s\n" "Wakeup_src" "Baseline_delay(ns)">> $LOG + for ((cpu=0; cpu<NUM_CPUS; cpu+=SMT)) + do + local cpu_status=$(cpu_is_online $cpu) + if [ $cpu_status == 0 ]; then + continue + fi + timeout_test_once "baseline" $cpu 1000000 + printf "%-3s %13s\n" $src_cpu $timeout_diff >> $LOG + avg_arr+=($timeout_diff) + done + compute_average "${avg_arr[@]}" + echo -e "Baseline Avg timeout diff(ns): $avg" | tee -a $LOG + + for ((state=0; state<NUM_STATES; state++)) + do + unset avg_arr + echo -e "---Enabling state: $state---" >> $LOG + cpuidle_enable_state $state + printf "%s %10s %10s\n" "Wakeup_src" "Baseline_delay(ns)" "Delay(ns)" >> $LOG + for ((cpu=0; cpu<NUM_CPUS; cpu+=SMT)) + do + local cpu_status=$(cpu_is_online $cpu) + if [ $cpu_status == 0 ]; then + continue + fi + timeout_test_once "test" $cpu 1000000 + printf "%-3s %13s %18s\n" $src_cpu $baseline_timeout_diff $timeout_diff >> $LOG + avg_arr+=($timeout_diff) + done + compute_average "${avg_arr[@]}" + echo -e "Expected timeout(ns): ${residency_arr[$state]}" >> $LOG + echo -e "Observed Avg timeout diff(ns) - State $state: $avg" | tee -a $LOG + cpuidle_disable_state $state + done +} + +declare -a residency_arr +declare -a latency_arr + +parse_arguments $@ + +rm -f $LOG +touch $LOG +NUM_CPUS=$(nproc --all) +NUM_STATES=$(ls -1 /sys/devices/system/cpu/cpu0/cpuidle/ | wc -l) + +ins_mod $MODULE + +run_ipi_tests +run_timeout_tests + +printf "Removing $MODULE module\n" +printf "Full Output logged at: $LOG\n" +rmmod $MODULE diff --git a/tools/testing/selftests/cpuidle/settings b/tools/testing/selftests/cpuidle/settings new file mode 100644 index 000000000000..a26c38a70d77 --- /dev/null +++ b/tools/testing/selftests/cpuidle/settings @@ -0,0 +1,2 @@ +timeout=0 +
The cpuidle latency selftest provides support to systematically extract, analyse and present IPI and timer based wakeup latencies for each CPU and each idle state available on the system. The selftest leverages test-cpuidle_latency module's debugfs interface to interact and extract latency information from the kernel. The selftest inserts the module if already not inserted, disables all the idle states and enables them one by one testing the following: 1. Keeping source CPU constant, iterate through all the CPUS measuring IPI latency for baseline (CPU is busy with cat /dev/random > /dev/null workload) and then when the CPU is allowed to be at rest 2. Iterating through all the CPUs, sending expected timer durations to be equivalent to the residency of the deepest idle state enabled and extracting the difference in time between the time of wakeup and the expected timer duration To run this test specifically: $ make -C tools/testing/selftests TARGETS="cpuidle" run_tests There are a few optinal arguments too that the script can take [-h <help>] [-m <location of the module>] [-o <location of the output>] [-v <verbose> (run on all cpus)] Default Output location in: tools/testing/cpuidle/cpuidle.log Signed-off-by: Pratik Rajesh Sampat <psampat@linux.ibm.com> --- tools/testing/selftests/Makefile | 1 + tools/testing/selftests/cpuidle/Makefile | 6 + tools/testing/selftests/cpuidle/cpuidle.sh | 316 +++++++++++++++++++++ tools/testing/selftests/cpuidle/settings | 2 + 4 files changed, 325 insertions(+) create mode 100644 tools/testing/selftests/cpuidle/Makefile create mode 100755 tools/testing/selftests/cpuidle/cpuidle.sh create mode 100644 tools/testing/selftests/cpuidle/settings