Message ID: cf200aaf594caae68350219fa1f781d64136fa2c.1693287931.git.raghavendra.kt@amd.com (mailing list archive)
State: New
Series: sched/numa: Enhance disjoint VMA scanning
Hi Raghu,

We hope this third performance report for the same patch set won't annoy you, and that it still has some value to you. We won't send more autonuma-benchmark performance improvement reports for this patch set, of course, unless you still want us to. BTW, we will still send out performance/function regression reports if there are any.

As in previous reports, we know that you want to see the performance impact of the whole patch set, so let me give a full summary here.

First, how we applied your patch set:

68cfe9439a1ba (linux-review/Raghavendra-K-T/sched-numa-Move-up-the-access-pid-reset-logic/20230829-141007) sched/numa: Allow scanning of shared VMAs
af46f3c9ca2d1 sched/numa: Allow recently accessed VMAs to be scanned        <-- we reported [1]
167773d1ddb5f sched/numa: Increase tasks' access history                    <-- for this report
fc769221b2306 sched/numa: Remove unconditional scan logic using mm numa_scan_seq
1ef5cbb92bdb3 sched/numa: Add disjoint vma unconditional scan logic         <-- we reported [2]
2a806eab1c2e1 sched/numa: Move up the access pid reset logic
2f88c8e802c8b (tip/sched/core) sched/eevdf/doc: Modify the documented knob to base_slice_ns as well

[1] https://lore.kernel.org/all/202309102311.84b42068-oliver.sang@intel.com/
[2] https://lore.kernel.org/all/202309121417.53f44ad6-oliver.sang@intel.com/

Below we only give the comparison between 2f88c8e802c8b and 68cfe9439a1ba, in a summary way. If you want detailed data for more commits, or more comparison data, please let me know. Thanks!

on test machine: 224 threads 2 sockets Intel(R) Xeon(R) Platinum 8480CTDX (Sapphire Rapids) with 256G memory

=========================================================================================
compiler/cpufreq_governor/iterations/kconfig/rootfs/tbox_group/test/testcase:
  gcc-12/performance/4x/x86_64-rhel-8.3/debian-11.1-x86_64-20220510.cgz/lkp-spr-r02/numa01_THREAD_ALLOC/autonuma-benchmark

2f88c8e802c8b128 68cfe9439a1baa642e05883fa64
---------------- ---------------------------
         %stddev     %change         %stddev
             \          |                \
271.01 -26.4% 199.49 ± 3% autonuma-benchmark.numa01.seconds
76.28 -46.9% 40.49 ± 5% autonuma-benchmark.numa01_THREAD_ALLOC.seconds
8.11 -0.1% 8.10 autonuma-benchmark.numa02.seconds
1425 -30.1% 996.02 ± 2% autonuma-benchmark.time.elapsed_time
1425 -30.1% 996.02 ± 2% autonuma-benchmark.time.elapsed_time.max

on test machine: 128 threads 2 sockets Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz (Ice Lake) with 128G memory

=========================================================================================
compiler/cpufreq_governor/iterations/kconfig/rootfs/tbox_group/test/testcase:
  gcc-12/performance/4x/x86_64-rhel-8.3/debian-11.1-x86_64-20220510.cgz/lkp-icl-2sp6/numa01_THREAD_ALLOC/autonuma-benchmark

2f88c8e802c8b128 68cfe9439a1baa642e05883fa64
---------------- ---------------------------
         %stddev     %change         %stddev
             \          |                \
361.53 ± 6% -10.4% 323.83 ± 3% autonuma-benchmark.numa01.seconds
255.31 -60.1% 101.90 ± 2% autonuma-benchmark.numa01_THREAD_ALLOC.seconds
14.95 -4.6% 14.26 autonuma-benchmark.numa02.seconds
2530 ± 3% -30.3% 1763 ± 2% autonuma-benchmark.time.elapsed_time
2530 ± 3% -30.3% 1763 ± 2% autonuma-benchmark.time.elapsed_time.max
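A quick note on reading the tables in this report: each row shows the value at the base commit, the relative change, and the value at the compared commit; "± N%" is the %stddev, i.e. run-to-run variation across the 4x iterations. As an illustration only (this snippet is not part of the LKP tooling), the %change column can be reproduced like this in Python, using numbers from the Sapphire Rapids summary table above:

  def pct_change(base: float, compared: float) -> float:
      # Relative change of the compared commit vs. the base commit, in percent.
      return (compared - base) / base * 100.0

  # autonuma-benchmark.numa01.seconds: 271.01 -> 199.49
  print(f"{pct_change(271.01, 199.49):+.1f}%")  # prints -26.4% (fewer seconds = faster)

  # autonuma-benchmark.numa01_THREAD_ALLOC.seconds: 76.28 -> 40.49
  print(f"{pct_change(76.28, 40.49):+.1f}%")    # prints -46.9%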
Below is the auto-generated report part, FYI.

Hello,

kernel test robot noticed a -17.6% improvement of autonuma-benchmark.numa01.seconds on:

commit: 167773d1ddb5ffdd944f851f2cbdd4e65425a358 ("[RFC PATCH V1 4/6] sched/numa: Increase tasks' access history")
url: https://github.com/intel-lab-lkp/linux/commits/Raghavendra-K-T/sched-numa-Move-up-the-access-pid-reset-logic/20230829-141007
base: https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git 2f88c8e802c8b128a155976631f4eb2ce4f3c805
patch link: https://lore.kernel.org/all/cf200aaf594caae68350219fa1f781d64136fa2c.1693287931.git.raghavendra.kt@amd.com/
patch subject: [RFC PATCH V1 4/6] sched/numa: Increase tasks' access history

testcase: autonuma-benchmark
test machine: 224 threads 2 sockets Intel(R) Xeon(R) Platinum 8480CTDX (Sapphire Rapids) with 256G memory
parameters:
  iterations: 4x
  test: numa01_THREAD_ALLOC
  cpufreq_governor: performance

In addition to that, the commit also has significant impact on the following tests:

+------------------+----------------------------------------------------------------------------------------------------+
| testcase: change | autonuma-benchmark: autonuma-benchmark.numa01.seconds -15.4% improvement                          |
| test machine     | 96 threads 2 sockets Intel(R) Xeon(R) Platinum 8260L CPU @ 2.40GHz (Cascade Lake) with 128G memory |
| test parameters  | cpufreq_governor=performance                                                                       |
|                  | iterations=4x                                                                                      |
|                  | test=numa01_THREAD_ALLOC                                                                           |
+------------------+----------------------------------------------------------------------------------------------------+
| testcase: change | autonuma-benchmark: autonuma-benchmark.numa01.seconds -14.8% improvement                           |
| test machine     | 96 threads 2 sockets Intel(R) Xeon(R) Platinum 8260L CPU @ 2.40GHz (Cascade Lake) with 128G memory |
| test parameters  | cpufreq_governor=performance                                                                       |
|                  | iterations=4x                                                                                      |
|                  | test=_INVERSE_BIND                                                                                 |
+------------------+----------------------------------------------------------------------------------------------------+
| testcase: change | autonuma-benchmark: autonuma-benchmark.numa01_THREAD_ALLOC.seconds -10.7% improvement              |
| test machine     | 128 threads 2 sockets Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz (Ice Lake) with 128G memory     |
| test parameters  | cpufreq_governor=performance                                                                       |
|                  | iterations=4x                                                                                      |
|                  | test=numa01_THREAD_ALLOC                                                                           |
+------------------+----------------------------------------------------------------------------------------------------+

Details are as below:
-------------------------------------------------------------------------------------------------->

The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20230912/202309122114.b9e08a43-oliver.sang@intel.com

=========================================================================================
compiler/cpufreq_governor/iterations/kconfig/rootfs/tbox_group/test/testcase:
  gcc-12/performance/4x/x86_64-rhel-8.3/debian-11.1-x86_64-20220510.cgz/lkp-spr-r02/numa01_THREAD_ALLOC/autonuma-benchmark

commit:
  fc769221b2 ("sched/numa: Remove unconditional scan logic using mm numa_scan_seq")
  167773d1dd ("sched/numa: Increase tasks' access history")

fc769221b23064c0 167773d1ddb5ffdd944f851f2cb
---------------- ---------------------------
         %stddev     %change         %stddev
             \          |                \
105.67 ± 8% -20.3% 84.17 ± 10% perf-c2c.HITM.remote
1.856e+10 ± 7% -18.8% 1.508e+10 ± 8% cpuidle..time
19025348 ± 7% -18.6% 15481744 ± 8% cpuidle..usage
0.00 ± 28% +0.0 0.01 ± 10% mpstat.cpu.all.iowait%
0.10 ± 2% -0.0 0.09 ± 4% mpstat.cpu.all.soft%
1443 ± 2% -14.2% 1238 ± 4% uptime.boot
26312 ± 5% -12.8% 22935 ± 5% uptime.idle
8774783 ± 7% -19.0% 7104495 ± 8% turbostat.C1E
10147966 ± 7% -18.4% 8280745 ± 8% turbostat.C6
3.225e+08 ± 2% -14.1% 2.77e+08 ± 4% turbostat.IRQ
2.81 ± 24% +3.5 6.35 ± 24% turbostat.PKG_%
638.24 +2.0% 650.74 turbostat.PkgWatt
57.57 +10.9% 63.85 ± 2% turbostat.RAMWatt
271.39 ± 2% -17.6% 223.53 ± 5% autonuma-benchmark.numa01.seconds
1401 ± 2% -14.6% 1197 ± 4% autonuma-benchmark.time.elapsed_time
1401 ± 2% -14.6% 1197 ± 4% autonuma-benchmark.time.elapsed_time.max
1088153 ± 2% -14.1% 934904 ± 6% autonuma-benchmark.time.involuntary_context_switches
3953 -2.6% 3852 ± 2% autonuma-benchmark.time.system_time
287110 -14.5% 245511 ± 4% autonuma-benchmark.time.user_time
22704 ± 7% +15.9% 26303 ± 8% autonuma-benchmark.time.voluntary_context_switches
191.10 ± 64% +94.9% 372.49 ± 7% perf-sched.wait_and_delay.avg.ms.schedule_hrtimeout_range_clock.ep_poll.do_epoll_wait.__x64_sys_epoll_wait
4.09 ± 49% +85.6% 7.59 ± 14% perf-sched.wait_and_delay.max.ms.__cond_resched.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
1.99 ± 40% +99.8% 3.97 ± 30% perf-sched.wait_time.avg.ms.__cond_resched.__kmem_cache_alloc_node.kmalloc_trace.vmstat_start.seq_read_iter
14.18 ±158% -82.6% 2.47 ± 22% perf-sched.wait_time.avg.ms.__cond_resched.unmap_vmas.unmap_region.do_vmi_align_munmap.do_vmi_munmap
189.39 ± 65% +96.5% 372.20 ± 7% perf-sched.wait_time.avg.ms.schedule_hrtimeout_range_clock.ep_poll.do_epoll_wait.__x64_sys_epoll_wait
2.18 ± 21% -33.3% 1.46 ± 41% perf-sched.wait_time.avg.ms.wait_for_partner.fifo_open.do_dentry_open.do_open
3.22 ± 32% -73.0% 0.87 ± 81% perf-sched.wait_time.max.ms.__cond_resched.__kmem_cache_alloc_node.kmalloc_trace.single_open.do_dentry_open
4.73 ± 20% +60.6% 7.59 ± 14% perf-sched.wait_time.max.ms.__cond_resched.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
9.61 ± 30% -32.8% 6.46 ± 16% perf-sched.wait_time.max.ms.__cond_resched.lock_mm_and_find_vma.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
13.57 ± 65% -60.2% 5.40 ± 24% perf-sched.wait_time.max.ms.wait_for_partner.fifo_open.do_dentry_open.do_open
6040567 -6.2% 5667640 proc-vmstat.numa_hit
32278 ± 7% +51.7% 48955 ± 18% proc-vmstat.numa_huge_pte_updates
4822780 -7.5% 4459553 proc-vmstat.numa_local
3187796 ± 9% +73.2% 5521800 ± 16% proc-vmstat.numa_pages_migrated
16792299 ± 7% +50.8% 25319315 ± 18% proc-vmstat.numa_pte_updates
6242814 -8.5% 5711173 ± 2% proc-vmstat.pgfault
3187796 ± 9% +73.2% 5521800 ± 16% proc-vmstat.pgmigrate_success
254872 ± 2% -12.3% 223591 ± 5% proc-vmstat.pgreuse
6151 ± 9% +74.2% 10717 ± 16% proc-vmstat.thp_migration_success
4201550 -13.7% 3627350 ± 3% proc-vmstat.unevictable_pgs_scanned
1.823e+08 ± 2% -15.2% 1.547e+08 ± 5% sched_debug.cfs_rq:/.avg_vruntime.avg
1.872e+08 ± 2% -15.3% 1.585e+08 ± 5% sched_debug.cfs_rq:/.avg_vruntime.max
1.423e+08 ± 4% -14.0% 1.224e+08 ± 3% sched_debug.cfs_rq:/.avg_vruntime.min
4320209 ± 8% -18.1% 3537344 ± 8% sched_debug.cfs_rq:/.avg_vruntime.stddev
3349 ± 40% +58.3% 5300 ± 27% sched_debug.cfs_rq:/.load_avg.max
1.823e+08 ± 2% -15.2% 1.547e+08 ± 5% sched_debug.cfs_rq:/.min_vruntime.avg
1.872e+08 ± 2% -15.3% 1.585e+08 ± 5% sched_debug.cfs_rq:/.min_vruntime.max
1.423e+08 ± 4% -14.0% 1.224e+08 ± 3% sched_debug.cfs_rq:/.min_vruntime.min
4320208 ± 8% -18.1% 3537344 ± 8% sched_debug.cfs_rq:/.min_vruntime.stddev
1852009 ± 3% -13.2% 1607461 ± 2% sched_debug.cpu.avg_idle.avg
751880 ± 2% -15.1% 638555 ± 4% sched_debug.cpu.avg_idle.stddev
725827 ± 2% -14.1% 623617 ± 4% sched_debug.cpu.clock.avg
726857 ± 2% -14.1% 624498 ± 4% sched_debug.cpu.clock.max
724740 ± 2% -14.1% 622692 ± 4% sched_debug.cpu.clock.min
717315 ± 2% -14.1% 616349 ± 4% sched_debug.cpu.clock_task.avg
719648 ± 2% -14.1% 618089 ± 4% sched_debug.cpu.clock_task.max
698681 ± 2% -14.2% 599424 ± 4% sched_debug.cpu.clock_task.min
1839 ± 8% -18.1% 1506 ± 7% sched_debug.cpu.clock_task.stddev
27352 -9.6% 24731 ± 2% sched_debug.cpu.curr->pid.max
293258 ± 5% -16.4% 245303 ± 7% sched_debug.cpu.max_idle_balance_cost.stddev
-14.96 +73.6% -25.98 sched_debug.cpu.nr_uninterruptible.min
6.27 ± 4% +18.7% 7.44 ± 6% sched_debug.cpu.nr_uninterruptible.stddev
724723 ± 2% -14.1% 622678 ± 4% sched_debug.cpu_clk
723514 ± 2% -14.1% 621468 ± 4% sched_debug.ktime
725604 ± 2% -14.1% 623550 ± 4% sched_debug.sched_clk
29.50 ± 3% +24.9% 36.83 ± 9% perf-stat.i.MPKI
3.592e+08 +5.7% 3.797e+08 ± 2% perf-stat.i.branch-instructions
1823514 +3.7% 1891464 perf-stat.i.branch-misses
28542234 ± 3% +22.0% 34809605 ± 10% perf-stat.i.cache-misses
72486859 ± 3% +19.6% 86713561 ± 7% perf-stat.i.cache-references
224.48 +3.2% 231.63 perf-stat.i.cpu-migrations
145250 ± 2% -10.8% 129549 ± 4% perf-stat.i.cycles-between-cache-misses
0.08 ± 5% -0.0 0.07 ± 10% perf-stat.i.dTLB-load-miss-rate%
272123 ± 6% -15.0% 231302 ± 10% perf-stat.i.dTLB-load-misses
4.515e+08 +4.7% 4.729e+08 ± 2% perf-stat.i.dTLB-loads
995784 +1.9% 1014848 perf-stat.i.dTLB-store-misses
1.844e+08 +1.5% 1.871e+08 perf-stat.i.dTLB-stores
1.711e+09 +5.0% 1.797e+09 ± 2% perf-stat.i.instructions
3.25 +8.3% 3.52 ± 3% perf-stat.i.metric.M/sec
4603 +6.7% 4912 ± 3% perf-stat.i.minor-faults
488266 ± 2% +25.0% 610436 ± 6% perf-stat.i.node-load-misses
618022 ± 4% +13.4% 701130 ± 5% perf-stat.i.node-loads
4603 +6.7% 4912 ± 3% perf-stat.i.page-faults
39.67 ± 2% +16.0% 46.04 ± 6% perf-stat.overall.MPKI
375.84 -4.9% 357.36 ± 2% perf-stat.overall.cpi
24383 ± 3% -19.0% 19742 ± 12% perf-stat.overall.cycles-between-cache-misses
0.06 ± 7% -0.0 0.05 ± 10% perf-stat.overall.dTLB-load-miss-rate%
0.00 +5.2% 0.00 ± 2% perf-stat.overall.ipc
41.99 ± 2% +2.8 44.83 ± 4% perf-stat.overall.node-load-miss-rate%
3.355e+08 +6.3% 3.567e+08 ± 2% perf-stat.ps.branch-instructions
1758832 +4.4% 1835699 perf-stat.ps.branch-misses
24888631 ± 3% +25.6% 31268733 ± 12% perf-stat.ps.cache-misses
64007362 ± 3% +22.5% 78424799 ± 8% perf-stat.ps.cache-references
221.69 +3.0% 228.32 perf-stat.ps.cpu-migrations
4.273e+08 +5.2% 4.495e+08 ± 2% perf-stat.ps.dTLB-loads
992569 +1.8% 1010389 perf-stat.ps.dTLB-store-misses
1.818e+08 +1.6% 1.847e+08 perf-stat.ps.dTLB-stores
1.613e+09 +5.5% 1.701e+09 ± 2% perf-stat.ps.instructions
4331 +7.2% 4644 ± 3% perf-stat.ps.minor-faults
477740 ± 2% +26.3% 603330 ± 7% perf-stat.ps.node-load-misses
660610 ± 5% +12.3% 741896 ± 6% perf-stat.ps.node-loads
4331 +7.2% 4644 ± 3% perf-stat.ps.page-faults
2.264e+12 -10.0% 2.038e+12 ± 3% perf-stat.total.instructions
1.16 ± 20% -0.6 0.59 ± 47% perf-profile.calltrace.cycles-pp.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
1.07 ± 20% -0.5 0.54 ± 47% perf-profile.calltrace.cycles-pp.process_one_work.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
1.96 ± 25% -0.7 1.27 ± 23% perf-profile.children.cycles-pp.task_mm_cid_work
1.16 ± 20% -0.5 0.67 ± 19% perf-profile.children.cycles-pp.worker_thread
1.07 ± 20% -0.5 0.61 ± 21% perf-profile.children.cycles-pp.process_one_work
0.84 ± 44% -0.4 0.43 ± 25% perf-profile.children.cycles-pp.evlist__id2evsel
0.58 ± 34% -0.2 0.33 ± 21% perf-profile.children.cycles-pp.do_mprotect_pkey
0.54 ± 26% -0.2 0.30 ± 23% perf-profile.children.cycles-pp.drm_fb_helper_damage_work
0.54 ± 26% -0.2 0.30 ± 23% perf-profile.children.cycles-pp.drm_fbdev_generic_helper_fb_dirty
0.58 ± 34% -0.2 0.34 ± 22% perf-profile.children.cycles-pp.__x64_sys_mprotect
0.34 ± 23% -0.2 0.12 ± 64% perf-profile.children.cycles-pp.drm_gem_vmap_unlocked
0.34 ± 23% -0.2 0.12 ± 64% perf-profile.children.cycles-pp.drm_gem_vmap
0.34 ± 23% -0.2 0.12 ± 64% perf-profile.children.cycles-pp.drm_gem_shmem_object_vmap
0.34 ± 23% -0.2 0.12 ± 64% perf-profile.children.cycles-pp.drm_gem_shmem_vmap_locked
0.55 ± 32% -0.2 0.33 ± 18% perf-profile.children.cycles-pp.__wp_page_copy_user
0.50 ± 35% -0.2 0.28 ± 21% perf-profile.children.cycles-pp.mprotect_fixup
0.28 ± 25% -0.2 0.08 ±101% perf-profile.children.cycles-pp.drm_gem_shmem_get_pages_locked
0.28 ± 25% -0.2 0.08 ±101% perf-profile.children.cycles-pp.drm_gem_get_pages
0.28 ± 25% -0.2 0.08 ±102% perf-profile.children.cycles-pp.shmem_read_folio_gfp
0.28 ± 25% -0.2 0.08 ±102% perf-profile.children.cycles-pp.drm_gem_shmem_get_pages
0.62 ± 15% -0.2 0.43 ± 16% perf-profile.children.cycles-pp.try_to_wake_up
0.25 ± 19% -0.2 0.08 ± 84% perf-profile.children.cycles-pp.drm_client_buffer_vmap
0.44 ± 19% -0.2 0.28 ± 31% perf-profile.children.cycles-pp.filemap_get_entry
0.39 ± 14% -0.1 0.26 ± 22% perf-profile.children.cycles-pp.perf_event_mmap
0.38 ± 13% -0.1 0.25 ± 23% perf-profile.children.cycles-pp.perf_event_mmap_event
0.22 ± 22% -0.1 0.11 ± 25% perf-profile.children.cycles-pp.lru_add_drain_cpu
0.24 ± 21% -0.1 0.14 ± 36% perf-profile.children.cycles-pp.do_open_execat
0.24 ± 13% -0.1 0.14 ± 42% perf-profile.children.cycles-pp.arch_do_signal_or_restart
0.22 ± 30% -0.1 0.13 ± 10% perf-profile.children.cycles-pp.wake_up_q
0.14 ± 17% -0.1 0.05 ±101% perf-profile.children.cycles-pp.open_exec
0.16 ± 21% -0.1 0.07 ± 51% perf-profile.children.cycles-pp.path_init
0.23 ± 30% -0.1 0.15 ± 22% perf-profile.children.cycles-pp.ttwu_do_activate
0.26 ± 11% -0.1 0.18 ± 20% perf-profile.children.cycles-pp.perf_iterate_sb
0.14 ± 50% -0.1 0.07 ± 12% perf-profile.children.cycles-pp.security_inode_getattr
0.18 ± 27% -0.1 0.11 ± 20% perf-profile.children.cycles-pp.select_task_rq
0.14 ± 21% -0.1 0.08 ± 29% perf-profile.children.cycles-pp.get_unmapped_area
0.10 ± 19% -0.1 0.04 ± 73% perf-profile.children.cycles-pp.expand_downwards
0.18 ± 16% -0.1 0.13 ± 26% perf-profile.children.cycles-pp.__d_alloc
0.09 ± 15% -0.1 0.04 ± 71% perf-profile.children.cycles-pp.anon_vma_clone
0.13 ± 36% -0.1 0.08 ± 19% perf-profile.children.cycles-pp.file_free_rcu
0.08 ± 23% -0.0 0.03 ±101% perf-profile.children.cycles-pp.__legitimize_mnt
0.09 ± 15% -0.0 0.04 ± 45% perf-profile.children.cycles-pp.__pipe
1.92 ± 26% -0.7 1.24 ± 23% perf-profile.self.cycles-pp.task_mm_cid_work
0.82 ± 43% -0.4 0.42 ± 24% perf-profile.self.cycles-pp.evlist__id2evsel
0.42 ± 39% -0.2 0.22 ± 19% perf-profile.self.cycles-pp.evsel__read_counter
0.27 ± 24% -0.2 0.10 ± 56% perf-profile.self.cycles-pp.filemap_get_entry
0.15 ± 48% -0.1 0.06 ± 11% perf-profile.self.cycles-pp.ksys_read
0.10 ± 34% -0.1 0.03 ±101% perf-profile.self.cycles-pp.enqueue_task_fair
0.13 ± 36% -0.1 0.08 ± 19% perf-profile.self.cycles-pp.file_free_rcu

***************************************************************************************************
lkp-csl-2sp3: 96 threads 2 sockets Intel(R) Xeon(R) Platinum 8260L CPU @ 2.40GHz (Cascade Lake) with 128G memory

=========================================================================================
compiler/cpufreq_governor/iterations/kconfig/rootfs/tbox_group/test/testcase:
  gcc-12/performance/4x/x86_64-rhel-8.3/debian-11.1-x86_64-20220510.cgz/lkp-csl-2sp3/numa01_THREAD_ALLOC/autonuma-benchmark

commit:
  fc769221b2 ("sched/numa: Remove unconditional scan logic using mm numa_scan_seq")
  167773d1dd ("sched/numa: Increase tasks' access history")

fc769221b23064c0 167773d1ddb5ffdd944f851f2cb
---------------- ---------------------------
         %stddev     %change         %stddev
             \          |                \
2.309e+10 ± 6% -27.8% 1.668e+10 ± 5% cpuidle..time
23855797 ± 6% -27.9% 17210884 ± 5% cpuidle..usage
2514 -11.9% 2215 uptime.boot
27543 ± 5% -23.1% 21189 ± 5% uptime.idle
9.80 ± 5% -1.8 8.05 ± 6% mpstat.cpu.all.idle%
0.01 ± 6% +0.0 0.01 ± 17% mpstat.cpu.all.iowait%
0.08 -0.0 0.07 ± 2% mpstat.cpu.all.soft%
845597 ± 12% -26.1% 624549 ± 19% numa-numastat.node0.other_node
2990301 ± 6% -13.1% 2598273 ± 4% numa-numastat.node1.local_node
471614 ± 21% +45.0% 684016 ± 18% numa-numastat.node1.other_node
845597 ± 12% -26.1% 624549 ± 19% numa-vmstat.node0.numa_other
4073 ±106% -82.5% 711.67 ± 23% numa-vmstat.node1.nr_mapped
2989568 ± 6% -13.1% 2597798 ± 4% numa-vmstat.node1.numa_local
471614 ± 21% +45.0% 684016 ± 18% numa-vmstat.node1.numa_other
375.07 ± 4% -15.4% 317.31 ± 2% autonuma-benchmark.numa01.seconds
2462 -12.2% 2162 autonuma-benchmark.time.elapsed_time
2462 -12.2% 2162 autonuma-benchmark.time.elapsed_time.max
1354545 -12.9% 1179617 autonuma-benchmark.time.involuntary_context_switches
3212023 -6.5% 3001966 autonuma-benchmark.time.minor_page_faults
8377 +2.3% 8572 autonuma-benchmark.time.percent_of_cpu_this_job_got
199714 -10.4% 179020 autonuma-benchmark.time.user_time
50675 ± 8% -19.0% 41038 ± 12% turbostat.C1
183835 ± 7% -17.6% 151526 ± 6% turbostat.C1E
23556011 ± 6% -28.0% 16965247 ± 5% turbostat.C6
9.72 ± 5% -1.7 7.99 ± 6% turbostat.C6%
9.54 ± 6% -18.1% 7.81 ± 6% turbostat.CPU%c1
2.404e+08 -12.0% 2.116e+08 turbostat.IRQ
280.51 +1.2% 283.99 turbostat.PkgWatt
63.94 +6.7% 68.23 turbostat.RAMWatt
282375 ± 3% -9.8% 254565 ± 7% proc-vmstat.numa_hint_faults
217705 ± 6% -12.6% 190234 ± 8% proc-vmstat.numa_hint_faults_local
7081835 -7.9% 6524239 proc-vmstat.numa_hit
107927 ± 10% +16.6% 125887 proc-vmstat.numa_huge_pte_updates
5764595 -9.5% 5215673 proc-vmstat.numa_local
7379523 ± 15% +25.7% 9272505 ± 4% proc-vmstat.numa_pages_migrated
55530575 ± 10% +16.5% 64669707 proc-vmstat.numa_pte_updates
8852860 -9.3% 8028738 proc-vmstat.pgfault
7379523 ± 15% +25.7% 9272505 ± 4% proc-vmstat.pgmigrate_success
393902 -9.6% 356099 proc-vmstat.pgreuse
14358 ± 15% +25.8% 18064 ± 5% proc-vmstat.thp_migration_success
18273792 -11.5% 16166144 proc-vmstat.unevictable_pgs_scanned
1.45e+08 -8.7% 1.325e+08 sched_debug.cfs_rq:/.avg_vruntime.max
3995873 -14.0% 3437625 ± 2% sched_debug.cfs_rq:/.avg_vruntime.stddev
0.23 ± 3% -8.6% 0.21 ± 6% sched_debug.cfs_rq:/.h_nr_running.stddev
1.45e+08 -8.7% 1.325e+08 sched_debug.cfs_rq:/.min_vruntime.max
3995873 -14.0% 3437625 ± 2% sched_debug.cfs_rq:/.min_vruntime.stddev
0.53 ± 71% +195.0% 1.56 ± 37% sched_debug.cfs_rq:/.removed.load_avg.avg
25.54 ± 2% +13.0% 28.87 sched_debug.cfs_rq:/.removed.load_avg.max
3.40 ± 35% +85.6% 6.32 ± 17% sched_debug.cfs_rq:/.removed.load_avg.stddev
0.16 ± 74% +275.6% 0.59 ± 39% sched_debug.cfs_rq:/.removed.runnable_avg.avg
8.03 ± 31% +84.9% 14.84 sched_debug.cfs_rq:/.removed.runnable_avg.max
1.02 ± 44% +154.3% 2.59 ± 16% sched_debug.cfs_rq:/.removed.runnable_avg.stddev
0.16 ± 74% +275.6% 0.59 ± 39% sched_debug.cfs_rq:/.removed.util_avg.avg
8.03 ± 31% +84.9% 14.84 sched_debug.cfs_rq:/.removed.util_avg.max
1.02 ± 44% +154.3% 2.59 ± 16% sched_debug.cfs_rq:/.removed.util_avg.stddev
146.33 ± 4% -12.0% 128.80 ± 8% sched_debug.cfs_rq:/.util_avg.stddev
361281 ± 5% -13.6% 312127 ± 3% sched_debug.cpu.avg_idle.stddev
1229022 -9.9% 1107544 sched_debug.cpu.clock.avg
1229436 -9.9% 1107919 sched_debug.cpu.clock.max
1228579 -9.9% 1107137 sched_debug.cpu.clock.min
248.12 ± 6% -8.9% 226.15 ± 2% sched_debug.cpu.clock.stddev
1201071 -9.7% 1084858 sched_debug.cpu.clock_task.avg
1205361 -9.7% 1088445 sched_debug.cpu.clock_task.max
1190139 -9.7% 1074355 sched_debug.cpu.clock_task.min
156325 ± 4% -21.3% 123055 ± 3% sched_debug.cpu.max_idle_balance_cost.stddev
0.00 ± 5% -8.8% 0.00 ± 2% sched_debug.cpu.next_balance.stddev
0.23 ± 3% -6.9% 0.21 ± 4% sched_debug.cpu.nr_running.stddev
22855 -11.9% 20146 ± 2% sched_debug.cpu.nr_switches.avg
0.00 ± 74% +301.6% 0.00 ± 41% sched_debug.cpu.nr_uninterruptible.avg
-20.99 +50.9% -31.67 sched_debug.cpu.nr_uninterruptible.min
1228564 -9.9% 1107124 sched_debug.cpu_clk
1227997 -9.9% 1106556 sched_debug.ktime
0.00 ± 70% +66.1% 0.00 sched_debug.rt_rq:.rt_nr_migratory.avg
0.02 ± 70% +66.1% 0.03 sched_debug.rt_rq:.rt_nr_migratory.max
0.00 ± 70% +66.1% 0.00 sched_debug.rt_rq:.rt_nr_migratory.stddev
0.00 ± 70% +66.1% 0.00 sched_debug.rt_rq:.rt_nr_running.avg
0.02 ± 70% +66.1% 0.03 sched_debug.rt_rq:.rt_nr_running.max
0.00 ± 70% +66.1% 0.00 sched_debug.rt_rq:.rt_nr_running.stddev
1229125 -9.9% 1107673 sched_debug.sched_clk
36.73 +9.2% 40.12 perf-stat.i.MPKI
1.156e+08 +0.9% 1.166e+08 perf-stat.i.branch-instructions
1.41 +0.1 1.49 perf-stat.i.branch-miss-rate%
1755317 +6.4% 1868497 perf-stat.i.branch-misses
65.90 +2.6 68.53 perf-stat.i.cache-miss-rate%
13292768 +13.0% 15016556 perf-stat.i.cache-misses
20180664 +9.2% 22041180 perf-stat.i.cache-references
1620 -2.0% 1588 perf-stat.i.context-switches
492.61 +2.2% 503.60 perf-stat.i.cpi
2.624e+11 +2.3% 2.685e+11 perf-stat.i.cpu-cycles
20261 -9.6% 18315 perf-stat.i.cycles-between-cache-misses
0.08 ± 5% -0.0 0.07 perf-stat.i.dTLB-load-miss-rate%
114641 ± 5% -6.6% 107104 perf-stat.i.dTLB-load-misses
0.24 +0.0 0.25 perf-stat.i.dTLB-store-miss-rate%
202887 +3.4% 209829 perf-stat.i.dTLB-store-misses
479259 ± 2% -9.8% 432243 ± 6% perf-stat.i.iTLB-load-misses
272948 ± 5% -16.4% 228065 ± 3% perf-stat.i.iTLB-loads
5.888e+08 +0.8% 5.938e+08 perf-stat.i.instructions
1349 +15.8% 1561 ± 2% perf-stat.i.instructions-per-iTLB-miss
2.73 +2.3% 2.80 perf-stat.i.metric.GHz
3510 +2.9% 3612 perf-stat.i.minor-faults
302696 ± 4% +8.0% 327055 perf-stat.i.node-load-misses
5025469 ± 3% +16.0% 5831348 ± 2% perf-stat.i.node-store-misses
6419781 +11.7% 7171575 perf-stat.i.node-stores
3510 +2.9% 3613 perf-stat.i.page-faults
34.43 +8.1% 37.21 perf-stat.overall.MPKI
1.51 +0.1 1.59 perf-stat.overall.branch-miss-rate%
66.31 +2.2 68.53 perf-stat.overall.cache-miss-rate%
19793 -9.3% 17950 perf-stat.overall.cycles-between-cache-misses
0.07 ± 5% -0.0 0.07 perf-stat.overall.dTLB-load-miss-rate%
0.23 +0.0 0.24 perf-stat.overall.dTLB-store-miss-rate%
1227 ± 2% +12.1% 1376 ± 6% perf-stat.overall.instructions-per-iTLB-miss
1729818 +6.4% 1840962 perf-stat.ps.branch-misses
13346402 +12.6% 15031113 perf-stat.ps.cache-misses
20127330 +9.0% 21934543 perf-stat.ps.cache-references
1624 -2.1% 1590 perf-stat.ps.context-switches
2.641e+11 +2.1% 2.698e+11 perf-stat.ps.cpu-cycles
113287 ± 5% -6.8% 105635 perf-stat.ps.dTLB-load-misses
203569 +3.2% 210036 perf-stat.ps.dTLB-store-misses
476376 ± 2% -9.8% 429901 ± 6% perf-stat.ps.iTLB-load-misses
259293 ± 5% -16.3% 217088 ± 3% perf-stat.ps.iTLB-loads
3465 +3.1% 3571 perf-stat.ps.minor-faults
299695 ± 4% +8.3% 324433 perf-stat.ps.node-load-misses
5044747 ± 3% +15.7% 5834322 ± 2% perf-stat.ps.node-store-misses
6459846 +11.3% 7189821 perf-stat.ps.node-stores
3465 +3.1% 3571 perf-stat.ps.page-faults
1.44e+12 -11.4% 1.275e+12 perf-stat.total.instructions
0.47 ± 58% +593.5% 3.27 ± 81% perf-sched.sch_delay.avg.ms.__cond_resched.__wait_for_common.wait_for_completion_state.kernel_clone.__x64_sys_vfork
0.37 ±124% +352.3% 1.67 ± 58% perf-sched.sch_delay.avg.ms.__cond_resched.copy_strings.isra.0.do_execveat_common
0.96 ± 74% -99.0% 0.01 ±141% perf-sched.sch_delay.avg.ms.__cond_resched.dput.step_into.link_path_walk.part
2.01 ± 79% -100.0% 0.00 perf-sched.sch_delay.avg.ms.__cond_resched.kmem_cache_alloc.vm_area_alloc.__install_special_mapping.map_vdso
1.35 ± 72% -69.8% 0.41 ± 80% perf-sched.sch_delay.avg.ms.__cond_resched.kmem_cache_alloc.vm_area_dup.dup_mmap.dup_mm
0.17 ± 18% -26.5% 0.13 ± 5% perf-sched.sch_delay.avg.ms.rcu_gp_kthread.kthread.ret_from_fork.ret_from_fork_asm
0.26 ± 16% -39.0% 0.16 ± 7% perf-sched.sch_delay.avg.ms.schedule_timeout.kcompactd.kthread.ret_from_fork
2.57 ± 65% +1027.2% 28.92 ±120% perf-sched.sch_delay.max.ms.__cond_resched.__wait_for_common.wait_for_completion_state.kernel_clone.__x64_sys_vfork
0.38 ±119% +669.3% 2.92 ± 19% perf-sched.sch_delay.max.ms.__cond_resched.copy_strings.isra.0.do_execveat_common
0.51 ±141% +234.9% 1.71 ± 69% perf-sched.sch_delay.max.ms.__cond_resched.down_write_killable.vm_mmap_pgoff.elf_map.load_elf_binary
1.63 ± 74% -98.9% 0.02 ±141% perf-sched.sch_delay.max.ms.__cond_resched.dput.step_into.link_path_walk.part
3.38 ± 12% -55.7% 1.50 ± 78% perf-sched.sch_delay.max.ms.__cond_resched.filemap_read.__kernel_read.search_binary_handler.exec_binprm
2.37 ± 68% -100.0% 0.00 perf-sched.sch_delay.max.ms.__cond_resched.kmem_cache_alloc.vm_area_alloc.__install_special_mapping.map_vdso
2.05 ± 62% -68.1% 0.65 ± 93% perf-sched.sch_delay.max.ms.__cond_resched.kmem_cache_alloc.vm_area_dup.dup_mmap.dup_mm
9.09 ±119% -96.0% 0.36 ± 42% perf-sched.sch_delay.max.ms.devkmsg_read.vfs_read.ksys_read.do_syscall_64
3.86 ± 40% -50.1% 1.93 ± 30% perf-sched.sch_delay.max.ms.schedule_hrtimeout_range_clock.do_select.core_sys_select.kern_select
2.77 ± 78% -88.0% 0.33 ± 29% perf-sched.sch_delay.max.ms.schedule_hrtimeout_range_clock.ep_poll.do_epoll_wait.__x64_sys_epoll_wait
2.48 ± 60% -86.1% 0.34 ± 7% perf-sched.sch_delay.max.ms.schedule_timeout.kcompactd.kthread.ret_from_fork
85.92 ± 73% +97.7% 169.86 ± 31% perf-sched.sch_delay.max.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
95.98 ± 6% -9.5% 86.82 ± 4% perf-sched.total_wait_and_delay.average.ms
95.30 ± 6% -9.6% 86.19 ± 4% perf-sched.total_wait_time.average.ms
725.88 ± 28% -73.5% 192.63 ±141% perf-sched.wait_and_delay.avg.ms.__cond_resched.copy_pte_range.copy_p4d_range.copy_page_range.dup_mmap
2.22 ± 42% -76.2% 0.53 ±141% perf-sched.wait_and_delay.avg.ms.__cond_resched.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
4.02 ± 5% -31.9% 2.74 ± 19% perf-sched.wait_and_delay.avg.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt
653.51 ± 9% -13.3% 566.43 ± 7% perf-sched.wait_and_delay.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
775.33 ± 4% -19.8% 621.67 ± 13% perf-sched.wait_and_delay.count.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi
88.33 ± 14% -16.6% 73.67 ± 11% perf-sched.wait_and_delay.count.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
6.28 ± 19% -73.5% 1.67 ±141% perf-sched.wait_and_delay.max.ms.__cond_resched.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
1286 ± 3% -65.6% 442.66 ± 91% perf-sched.wait_and_delay.max.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt
222.90 ± 16% +53.8% 342.84 ± 30% perf-sched.wait_and_delay.max.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
0.91 ± 70% +7745.7% 71.06 ±129% perf-sched.wait_time.avg.ms.__cond_resched.__kmem_cache_alloc_node.kmalloc_trace.vmstat_start.seq_read_iter
21.65 ± 34% +42.0% 30.75 ± 12% perf-sched.wait_time.avg.ms.__cond_resched.__wait_for_common.wait_for_completion_state.kernel_clone.__x64_sys_vfork
2.67 ± 26% -96.6% 0.09 ±141% perf-sched.wait_time.avg.ms.__cond_resched.change_pmd_range.change_p4d_range.change_protection_range.mprotect_fixup
725.14 ± 28% -73.5% 192.24 ±141% perf-sched.wait_time.avg.ms.__cond_resched.copy_pte_range.copy_p4d_range.copy_page_range.dup_mmap
2.87 ± 28% -96.7% 0.09 ± 77% perf-sched.wait_time.avg.ms.__cond_resched.dput.open_last_lookups.path_openat.do_filp_open
2.10 ± 73% +4020.9% 86.55 ±135% perf-sched.wait_time.avg.ms.__cond_resched.dput.step_into.open_last_lookups.path_openat
1.96 ± 73% -94.8% 0.10 ±141% perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc.getname_flags.part.0
3.24 ± 21% -65.0% 1.13 ± 69% perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc.mas_alloc_nodes.mas_preallocate.mmap_region
338.18 ±140% -100.0% 0.07 ±141% perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc.prepare_creds.copy_creds.copy_process
21.80 ±122% -94.7% 1.16 ±130% perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc.vm_area_dup.__split_vma.do_vmi_align_munmap
4.29 ± 11% -66.2% 1.45 ±118% perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc.vm_area_dup.__split_vma.mprotect_fixup
0.94 ±126% -100.0% 0.00 perf-sched.wait_time.avg.ms.__cond_resched.mutex_lock.pipe_write.vfs_write.ksys_write
3.69 ± 29% -72.9% 1.00 ±141% perf-sched.wait_time.avg.ms.__cond_resched.task_work_run.do_exit.do_group_exit.__x64_sys_exit_group
0.04 ±141% +6192.3% 2.73 ± 63% perf-sched.wait_time.avg.ms.__cond_resched.task_work_run.exit_to_user_mode_loop.exit_to_user_mode_prepare.syscall_exit_to_user_mode
32.86 ±128% -95.2% 1.57 ± 12% perf-sched.wait_time.avg.ms.do_nanosleep.hrtimer_nanosleep.common_nsleep.__x64_sys_clock_nanosleep
3.96 ± 5% -33.0% 2.66 ± 19% perf-sched.wait_time.avg.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt
7.38 ± 57% -89.8% 0.75 ± 88% perf-sched.wait_time.avg.ms.schedule_timeout.khugepaged_wait_work.khugepaged.kthread
643.25 ± 9% -12.8% 560.82 ± 8% perf-sched.wait_time.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
2.22 ± 74% +15121.1% 338.52 ±138% perf-sched.wait_time.max.ms.__cond_resched.__kmem_cache_alloc_node.kmalloc_trace.vmstat_start.seq_read_iter
4.97 ± 39% -98.2% 0.09 ±141% perf-sched.wait_time.max.ms.__cond_resched.change_pmd_range.change_p4d_range.change_protection_range.mprotect_fixup
3.98 -96.1% 0.16 ± 94% perf-sched.wait_time.max.ms.__cond_resched.dput.open_last_lookups.path_openat.do_filp_open
4.28 ± 3% -66.5% 1.44 ±126% perf-sched.wait_time.max.ms.__cond_resched.dput.terminate_walk.path_openat.do_filp_open
3.95 ± 14% +109.8% 8.28 ± 45% perf-sched.wait_time.max.ms.__cond_resched.exit_mmap.__mmput.exit_mm.do_exit
2.04 ± 74% -95.0% 0.10 ±141% perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc.getname_flags.part.0
340.63 ±140% -100.0% 0.12 ±141% perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc.prepare_creds.copy_creds.copy_process
4.74 ± 22% -68.4% 1.50 ±117% perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc.vm_area_dup.__split_vma.mprotect_fixup
1.30 ±141% +205.8% 3.99 perf-sched.wait_time.max.ms.__cond_resched.mmput.m_stop.seq_read_iter.seq_read
1.42 ±131% -100.0% 0.00 perf-sched.wait_time.max.ms.__cond_resched.mutex_lock.pipe_write.vfs_write.ksys_write
337.62 ±140% -99.6% 1.33 ±141% perf-sched.wait_time.max.ms.__cond_resched.slab_pre_alloc_hook.constprop.0.kmem_cache_alloc_lru
4.91 ± 27% +4797.8% 240.69 ± 69% perf-sched.wait_time.max.ms.__cond_resched.stop_one_cpu.sched_exec.bprm_execve.part
4.29 ± 7% -76.7% 1.00 ±141% perf-sched.wait_time.max.ms.__cond_resched.task_work_run.do_exit.do_group_exit.__x64_sys_exit_group
0.05 ±141% +5358.6% 2.77 ± 61% perf-sched.wait_time.max.ms.__cond_resched.task_work_run.exit_to_user_mode_loop.exit_to_user_mode_prepare.syscall_exit_to_user_mode
338.90 ±138% -98.8% 3.95 perf-sched.wait_time.max.ms.do_nanosleep.hrtimer_nanosleep.common_nsleep.__x64_sys_clock_nanosleep
1284 ± 3% -68.7% 401.56 ±106% perf-sched.wait_time.max.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt
7.38 ± 57% -89.8% 0.75 ± 88% perf-sched.wait_time.max.ms.schedule_timeout.khugepaged_wait_work.khugepaged.kthread
20.80 ± 72% -20.8 0.00 perf-profile.calltrace.cycles-pp.__cmd_record
20.80 ± 72% -20.8 0.00 perf-profile.calltrace.cycles-pp.record__finish_output.__cmd_record
20.78 ± 72% -20.8 0.00 perf-profile.calltrace.cycles-pp.perf_session__process_events.record__finish_output.__cmd_record
20.74 ± 72% -20.7 0.00 perf-profile.calltrace.cycles-pp.reader__read_event.perf_session__process_events.record__finish_output.__cmd_record
20.43 ± 72% -20.4 0.00 perf-profile.calltrace.cycles-pp.process_simple.reader__read_event.perf_session__process_events.record__finish_output.__cmd_record
20.03 ± 72% -20.0 0.00 perf-profile.calltrace.cycles-pp.ordered_events__queue.process_simple.reader__read_event.perf_session__process_events.record__finish_output
19.84 ± 72% -19.8 0.00 perf-profile.calltrace.cycles-pp.queue_event.ordered_events__queue.process_simple.reader__read_event.perf_session__process_events
0.77 ± 26% +0.2 1.00 ± 13% perf-profile.calltrace.cycles-pp.do_open.path_openat.do_filp_open.do_sys_openat2.__x64_sys_openat
0.73 ± 26% +0.3 1.00 ± 21% perf-profile.calltrace.cycles-pp.seq_read_iter.vfs_read.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.74 ± 18% +0.3 1.07 ± 19% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.write
0.73 ± 18% +0.3 1.07 ± 19% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
0.78 ± 36% +0.3 1.11 ± 19% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__fxstat64
0.44 ± 73% +0.3 0.77 ± 14% perf-profile.calltrace.cycles-pp.do_dentry_open.do_open.path_openat.do_filp_open.do_sys_openat2
0.78 ± 36% +0.3 1.12 ± 19% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__fxstat64
0.76 ± 17% +0.3 1.10 ± 19% perf-profile.calltrace.cycles-pp.write
0.81 ± 34% +0.4 1.16 ± 16% perf-profile.calltrace.cycles-pp.__fxstat64
0.96 ± 33% +0.4 1.35 ± 15% perf-profile.calltrace.cycles-pp.__x64_sys_openat.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.96 ± 33% +0.4 1.35 ± 15% perf-profile.calltrace.cycles-pp.do_sys_openat2.__x64_sys_openat.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.18 ±141% +0.4 0.60 ± 13% perf-profile.calltrace.cycles-pp.walk_component.link_path_walk.path_openat.do_filp_open.do_sys_openat2
1.00 ± 28% +0.4 1.43 ± 6% perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.__close_nocancel
0.22 ±141% +0.4 0.65 ± 18% perf-profile.calltrace.cycles-pp.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
0.47 ± 76% +0.5 0.93 ± 10% perf-profile.calltrace.cycles-pp.mm_init.alloc_bprm.do_execveat_common.__x64_sys_execve.do_syscall_64
0.42 ± 73% +0.5 0.90 ± 23% perf-profile.calltrace.cycles-pp.ksys_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe.__mmap
1.14 ± 29% +0.5 1.62 ± 7% perf-profile.calltrace.cycles-pp.__close_nocancel
0.41 ± 73% +0.5 0.90 ± 23% perf-profile.calltrace.cycles-pp.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe.__mmap
1.10 ± 28% +0.5 1.59 ± 8% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__close_nocancel
1.10 ± 28% +0.5 1.59 ± 8% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__close_nocancel
1.13 ± 19% +0.5 1.66 ± 17% perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.read
0.58 ± 77% +0.5 1.12 ± 8% perf-profile.calltrace.cycles-pp.alloc_bprm.do_execveat_common.__x64_sys_execve.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.22 ±141% +0.5 0.77 ± 18% perf-profile.calltrace.cycles-pp.lookup_fast.open_last_lookups.path_openat.do_filp_open.do_sys_openat2
0.27 ±141% +0.5 0.82 ± 20% perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.__fxstat64
0.00 +0.6 0.56 ± 9% perf-profile.calltrace.cycles-pp.lookup_fast.walk_component.link_path_walk.path_openat.do_filp_open
0.22 ±141% +0.6 0.85 ± 18% perf-profile.calltrace.cycles-pp.open_last_lookups.path_openat.do_filp_open.do_sys_openat2.__x64_sys_openat
1.03 ± 71% +5.3 6.34 ± 64% perf-profile.calltrace.cycles-pp.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary
1.04 ± 71% +5.3 6.37 ± 64% perf-profile.calltrace.cycles-pp.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
1.07 ± 71% +5.4 6.47 ± 63% perf-profile.calltrace.cycles-pp.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
1.07 ± 71% +5.4 6.47 ± 63% perf-profile.calltrace.cycles-pp.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
1.07 ± 71% +5.4 6.47 ± 63% perf-profile.calltrace.cycles-pp.start_secondary.secondary_startup_64_no_verify
1.00 ± 71% +5.5 6.50 ± 57% perf-profile.calltrace.cycles-pp.intel_idle.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle
1.03 ± 71% +5.6 6.61 ± 58% perf-profile.calltrace.cycles-pp.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry
1.07 ± 71% +5.7 6.74 ± 57% perf-profile.calltrace.cycles-pp.secondary_startup_64_no_verify
1.38 ± 78% +6.2 7.53 ± 41% perf-profile.calltrace.cycles-pp.copy_page.folio_copy.migrate_folio_extra.move_to_new_folio.migrate_pages_batch
1.44 ± 80% +6.2 7.63 ± 41% perf-profile.calltrace.cycles-pp.folio_copy.migrate_folio_extra.move_to_new_folio.migrate_pages_batch.migrate_pages
1.44 ± 80% +6.2 7.67 ± 41% perf-profile.calltrace.cycles-pp.move_to_new_folio.migrate_pages_batch.migrate_pages.migrate_misplaced_page.do_huge_pmd_numa_page
1.44 ± 80% +6.2 7.67 ± 41% perf-profile.calltrace.cycles-pp.migrate_folio_extra.move_to_new_folio.migrate_pages_batch.migrate_pages.migrate_misplaced
1.52 ± 78% +6.5 8.07 ± 41% perf-profile.calltrace.cycles-pp.migrate_pages_batch.migrate_pages.migrate_misplaced_page.do_huge_pmd_numa_page.__handle_mm_fault
1.52 ± 78% +6.5 8.07 ± 41% perf-profile.calltrace.cycles-pp.migrate_pages.migrate_misplaced_page.do_huge_pmd_numa_page.__handle_mm_fault.handle_mm_fault
1.52 ± 78% +6.6 8.08 ± 41% perf-profile.calltrace.cycles-pp.migrate_misplaced_page.do_huge_pmd_numa_page.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
1.53 ± 78% +6.6 8.14 ± 41% perf-profile.calltrace.cycles-pp.do_huge_pmd_numa_page.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault
5.22 ± 49% +7.3 12.52 ± 23% perf-profile.calltrace.cycles-pp.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
5.49 ± 48% +7.5 12.98 ± 22% perf-profile.calltrace.cycles-pp.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
6.00 ± 47% +7.6 13.57 ± 20% perf-profile.calltrace.cycles-pp.exc_page_fault.asm_exc_page_fault
5.97 ± 48% +7.6 13.55 ± 20% perf-profile.calltrace.cycles-pp.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
6.99 ± 45% +7.8 14.80 ± 19% perf-profile.calltrace.cycles-pp.asm_exc_page_fault
20.83 ± 73% -20.8 0.00 perf-profile.children.cycles-pp.queue_event
20.80 ± 72% -20.8 0.00 perf-profile.children.cycles-pp.record__finish_output
20.78 ± 72% -20.8 0.00 perf-profile.children.cycles-pp.perf_session__process_events
20.75 ± 72% -20.8 0.00 perf-profile.children.cycles-pp.reader__read_event
20.43 ± 72% -20.4 0.00 perf-profile.children.cycles-pp.process_simple
20.03 ± 72% -20.0 0.00 perf-profile.children.cycles-pp.ordered_events__queue
0.37 ± 14% -0.1 0.26 ± 15% perf-profile.children.cycles-pp.rebalance_domains
0.11 ± 8% -0.1 0.06 ± 75% perf-profile.children.cycles-pp.wake_up_q
0.13 ± 7% +0.0 0.15 ± 13% perf-profile.children.cycles-pp.get_unmapped_area
0.05 +0.0 0.08 ± 22% perf-profile.children.cycles-pp.complete_signal
0.07 ± 23% +0.0 0.10 ± 19% perf-profile.children.cycles-pp.lru_add_fn
0.08 ± 24% +0.0 0.12 ± 10% perf-profile.children.cycles-pp.__do_sys_brk
0.08 ± 11% +0.0 0.13 ± 19% perf-profile.children.cycles-pp.arch_get_unmapped_area_topdown
0.08 ± 12% +0.0 0.12 ± 27% perf-profile.children.cycles-pp.__mem_cgroup_uncharge_list
0.02 ±141% +0.0 0.06 ± 19% perf-profile.children.cycles-pp.workingset_age_nonresident
0.02 ±141% +0.0 0.06 ± 19% perf-profile.children.cycles-pp.workingset_activation
0.04 ± 71% +0.1 0.09 ± 5% perf-profile.children.cycles-pp.page_add_file_rmap
0.09 ± 18% +0.1 0.14 ± 23% perf-profile.children.cycles-pp.terminate_walk
0.08 ± 12% +0.1 0.13 ± 19% perf-profile.children.cycles-pp.__send_signal_locked
0.00 +0.1 0.06 ± 8% perf-profile.children.cycles-pp.proc_pident_lookup
0.11 ± 15% +0.1 0.17 ± 15% perf-profile.children.cycles-pp.exit_notify
0.15 ± 31% +0.1 0.21 ± 15% perf-profile.children.cycles-pp.try_charge_memcg
0.04 ± 71% +0.1 0.10 ± 27% perf-profile.children.cycles-pp.__mod_lruvec_state
0.04 ± 73% +0.1 0.10 ± 24% perf-profile.children.cycles-pp.__mod_node_page_state
0.11 ± 25% +0.1 0.17 ± 22% perf-profile.children.cycles-pp.perf_mux_hrtimer_handler
0.08 ± 12% +0.1 0.14 ± 26% perf-profile.children.cycles-pp.get_slabinfo
0.02 ±141% +0.1 0.08 ± 27% perf-profile.children.cycles-pp.fput
0.12 ± 6% +0.1 0.18 ± 20% perf-profile.children.cycles-pp.xas_find
0.08 ± 17% +0.1 0.15 ± 39% perf-profile.children.cycles-pp.task_numa_fault
0.07 ± 44% +0.1 0.14 ± 18% perf-profile.children.cycles-pp.___slab_alloc
0.02 ±141% +0.1 0.09 ± 35% perf-profile.children.cycles-pp.copy_creds
0.08 ± 12% +0.1 0.15 ± 18% perf-profile.children.cycles-pp._exit
0.07 ± 78% +0.1 0.15 ± 27% perf-profile.children.cycles-pp.file_free_rcu
0.02 ±141% +0.1 0.09 ± 25% perf-profile.children.cycles-pp.do_task_dead
0.19 ± 22% +0.1 0.27 ± 10% perf-profile.children.cycles-pp.dequeue_entity
0.18 ± 29% +0.1 0.26 ± 16% perf-profile.children.cycles-pp.lru_add_drain
0.03 ± 70% +0.1 0.11 ± 25% perf-profile.children.cycles-pp.node_read_numastat
0.07 ± 25% +0.1 0.15 ± 51% perf-profile.children.cycles-pp.__kernel_read
0.20 ± 4% +0.1 0.28 ± 24% perf-profile.children.cycles-pp.__do_fault
0.23 ± 17% +0.1 0.31 ± 9% perf-profile.children.cycles-pp.native_irq_return_iret
0.11 ± 27% +0.1 0.20 ± 17% perf-profile.children.cycles-pp.__pte_alloc
0.06 ± 86% +0.1 0.14 ± 44% perf-profile.children.cycles-pp.cgroup_rstat_flush
0.06 ± 86% +0.1 0.14 ± 44% perf-profile.children.cycles-pp.cgroup_rstat_flush_locked
0.06 ± 86% +0.1 0.14 ± 44% perf-profile.children.cycles-pp.do_flush_stats
0.06 ± 86% +0.1 0.14 ± 44% perf-profile.children.cycles-pp.flush_memcg_stats_dwork
0.12 ± 28% +0.1 0.20 ± 18% perf-profile.children.cycles-pp.d_path
0.08 ± 36% +0.1 0.16 ± 17% perf-profile.children.cycles-pp.lookup_open
0.11 ± 7% +0.1 0.20 ± 33% perf-profile.children.cycles-pp.copy_pte_range
0.13 ± 16% +0.1 0.22 ± 18% perf-profile.children.cycles-pp.dev_attr_show
0.04 ± 73% +0.1 0.13 ± 49% perf-profile.children.cycles-pp.task_numa_migrate
0.19 ± 17% +0.1 0.28 ± 7% perf-profile.children.cycles-pp.__count_memcg_events
0.15 ± 17% +0.1 0.24 ± 10% perf-profile.children.cycles-pp.__pmd_alloc
0.00 +0.1 0.09 ± 31% perf-profile.children.cycles-pp.remove_vma
0.13 ± 16% +0.1 0.22 ± 22% perf-profile.children.cycles-pp.sysfs_kf_seq_show
0.12 ± 26% +0.1 0.21 ± 26% perf-profile.children.cycles-pp.__do_set_cpus_allowed
0.08 ± 78% +0.1 0.18 ± 20% perf-profile.children.cycles-pp.free_unref_page
0.02 ±141% +0.1 0.11 ± 32% perf-profile.children.cycles-pp.nd_jump_root
0.05 ± 74% +0.1 0.14 ± 23% perf-profile.children.cycles-pp._find_next_bit
0.12 ± 22% +0.1 0.21 ± 21% perf-profile.children.cycles-pp.clock_gettime
0.02 ±141% +0.1 0.11 ± 29% perf-profile.children.cycles-pp.free_percpu
0.00 +0.1 0.10 ± 25% perf-profile.children.cycles-pp.lockref_get
0.25 ± 40% +0.1 0.35 ± 24% perf-profile.children.cycles-pp.shift_arg_pages
0.26 ± 29% +0.1 0.36 ± 14% perf-profile.children.cycles-pp.rmqueue
0.13 ± 35% +0.1 0.23 ± 24% perf-profile.children.cycles-pp.single_open
0.05 ± 78% +0.1 0.15 ± 29% perf-profile.children.cycles-pp.vma_expand
0.09 ± 5% +0.1 0.21 ± 41% perf-profile.children.cycles-pp.prepare_task_switch
0.08 ± 12% +0.1 0.19 ± 37% perf-profile.children.cycles-pp.copy_page_to_iter
0.22 ± 40% +0.1 0.34 ± 33% perf-profile.children.cycles-pp.mas_wr_node_store
0.16 ± 41% +0.1 0.27 ± 13% perf-profile.children.cycles-pp.__set_cpus_allowed_ptr_locked
0.16 ± 10% +0.1 0.28 ± 26% perf-profile.children.cycles-pp.free_pages_and_swap_cache
0.11 ± 28% +0.1 0.23 ± 27% perf-profile.children.cycles-pp.single_release
0.00 +0.1 0.12 ± 37% perf-profile.children.cycles-pp.find_busiest_queue
0.23 ± 28% +0.1 0.35 ± 23% perf-profile.children.cycles-pp.pte_alloc_one
0.23 ± 32% +0.1 0.35 ± 16% perf-profile.children.cycles-pp.strncpy_from_user
0.20 ± 55% +0.1 0.33 ± 25% perf-profile.children.cycles-pp.gather_stats
0.16 ± 30% +0.1 0.30 ± 12% perf-profile.children.cycles-pp._raw_spin_lock_irqsave
0.29 ± 31% +0.1 0.43 ± 14% perf-profile.children.cycles-pp.setup_arg_pages
0.13 ± 18% +0.1 0.27 ± 28% perf-profile.children.cycles-pp.aa_file_perm
0.03 ± 70% +0.1 0.18 ± 73% perf-profile.children.cycles-pp.set_pmd_migration_entry
0.09 ±103% +0.1 0.23 ± 39% perf-profile.children.cycles-pp.__wait_for_common
0.19 ± 16% +0.1 0.33 ± 27% perf-profile.children.cycles-pp.obj_cgroup_charge
0.03 ± 70% +0.1 0.18 ± 74% perf-profile.children.cycles-pp.try_to_migrate_one
0.14 ± 41% +0.2 0.29 ± 34% perf-profile.children.cycles-pp.select_task_rq
0.28 ± 35% +0.2 0.44 ± 28% perf-profile.children.cycles-pp.vm_area_alloc
0.04 ± 71% +0.2 0.20 ± 73% perf-profile.children.cycles-pp.try_to_migrate
0.04 ± 71% +0.2 0.22 ± 70% perf-profile.children.cycles-pp.rmap_walk_anon
0.37 ± 28% +0.2 0.55 ± 23% perf-profile.children.cycles-pp.pick_next_task_fair
0.04 ± 71% +0.2 0.22 ± 57% perf-profile.children.cycles-pp.migrate_folio_unmap
0.11 ± 51% +0.2 0.31 ± 30% perf-profile.children.cycles-pp.on_each_cpu_cond_mask
0.30 ± 30% +0.2 0.50 ± 16% perf-profile.children.cycles-pp.__mod_memcg_lruvec_state
0.30 ± 19% +0.2 0.50 ± 23% perf-profile.children.cycles-pp.__perf_sw_event
0.21 ± 30% +0.2 0.41 ± 19% perf-profile.children.cycles-pp.apparmor_file_permission
0.25 ± 29% +0.2 0.45 ± 15% perf-profile.children.cycles-pp.security_file_permission
0.13 ± 55% +0.2 0.34 ± 24% perf-profile.children.cycles-pp.smp_call_function_many_cond
0.31 ± 34% +0.2 0.52 ± 30% perf-profile.children.cycles-pp.pipe_read
0.32 ± 16% +0.2 0.55 ± 8% perf-profile.children.cycles-pp.getname_flags
0.33 ± 11% +0.2 0.55 ± 21% perf-profile.children.cycles-pp.___perf_sw_event
0.17 ± 44% +0.2 0.40 ± 38% perf-profile.children.cycles-pp.newidle_balance
0.38 ± 38% +0.2 0.60 ± 12% perf-profile.children.cycles-pp.__percpu_counter_init
0.38 ± 37% +0.2 0.61 ± 18% perf-profile.children.cycles-pp.readlink
0.27 ± 40% +0.2 0.51 ± 21% perf-profile.children.cycles-pp.mod_objcg_state
0.76 ± 17% +0.3 1.10 ± 19% perf-profile.children.cycles-pp.write
0.48 ± 42% +0.4 0.83 ± 13% perf-profile.children.cycles-pp.pid_revalidate
0.61 ± 34% +0.4 0.98 ± 17% perf-profile.children.cycles-pp.__d_lookup_rcu
0.73 ± 35% +0.4 1.12 ± 8% perf-profile.children.cycles-pp.alloc_bprm
0.59 ± 42% +0.4 0.98 ± 11% perf-profile.children.cycles-pp.pcpu_alloc
0.77 ± 31% +0.4 1.21 ± 4% perf-profile.children.cycles-pp.mm_init
0.92 ± 31% +0.5 1.38 ± 12% perf-profile.children.cycles-pp.__fxstat64
0.74 ± 32% +0.5 1.27 ± 20% perf-profile.children.cycles-pp.open_last_lookups
1.37 ± 29% +0.6 1.94 ± 19% perf-profile.children.cycles-pp.kmem_cache_alloc
1.35 ± 38% +0.7 2.09 ± 15% perf-profile.children.cycles-pp.lookup_fast
1.13 ± 59% +5.3 6.47 ± 63% perf-profile.children.cycles-pp.start_secondary
1.06 ± 60% +5.4 6.50 ± 57% perf-profile.children.cycles-pp.intel_idle
1.09 ± 59% +5.5 6.62 ± 58% perf-profile.children.cycles-pp.cpuidle_enter
1.09 ± 59% +5.5 6.62 ± 58% perf-profile.children.cycles-pp.cpuidle_enter_state
1.10 ± 59% +5.5 6.65 ± 58% perf-profile.children.cycles-pp.cpuidle_idle_call
1.13 ± 59% +5.6 6.74 ± 57% perf-profile.children.cycles-pp.secondary_startup_64_no_verify
1.13 ± 59% +5.6 6.74 ± 57% perf-profile.children.cycles-pp.cpu_startup_entry
1.13 ± 59% +5.6 6.74 ± 57% perf-profile.children.cycles-pp.do_idle
1.51 ± 69% +6.1 7.65 ± 41% perf-profile.children.cycles-pp.folio_copy
1.52 ± 69% +6.2 7.68 ± 41% perf-profile.children.cycles-pp.move_to_new_folio
1.52 ± 69% +6.2 7.68 ± 41% perf-profile.children.cycles-pp.migrate_folio_extra
1.74 ± 63% +6.2 7.96 ± 39% perf-profile.children.cycles-pp.copy_page
1.61 ± 68% +6.5 8.08 ± 41% perf-profile.children.cycles-pp.migrate_pages_batch
1.61 ± 68% +6.5 8.09 ± 41% perf-profile.children.cycles-pp.migrate_pages
1.61 ± 68% +6.5 8.10 ± 41% perf-profile.children.cycles-pp.migrate_misplaced_page
1.62 ± 67% +6.5 8.14 ± 41% perf-profile.children.cycles-pp.do_huge_pmd_numa_page
7.23 ± 41% +7.5 14.76 ± 19% perf-profile.children.cycles-pp.__handle_mm_fault
8.24 ± 38% +7.6 15.86 ± 17% perf-profile.children.cycles-pp.exc_page_fault
8.20 ± 38% +7.6 15.84 ± 17% perf-profile.children.cycles-pp.do_user_addr_fault
9.84 ± 35% +7.7 17.51 ± 15% perf-profile.children.cycles-pp.asm_exc_page_fault
7.71 ± 40% +7.7 15.41 ± 18% perf-profile.children.cycles-pp.handle_mm_fault
20.00 ± 72% -20.0 0.00 perf-profile.self.cycles-pp.queue_event
0.18 ± 22% -0.1 0.10 ± 24% perf-profile.self.cycles-pp.__d_lookup
0.07 ± 25% +0.0 0.10 ± 9% perf-profile.self.cycles-pp.__perf_read_group_add
0.08 ± 16% +0.0 0.12 ± 26% perf-profile.self.cycles-pp.check_heap_object
0.05 ± 8% +0.0 0.09 ± 30% perf-profile.self.cycles-pp.__memcg_kmem_charge_page
0.02 ±141% +0.0 0.06 ± 13% perf-profile.self.cycles-pp.try_to_wake_up
0.08 ± 31% +0.1 0.14 ± 30% perf-profile.self.cycles-pp.task_dump_owner
0.05 ± 74% +0.1 0.10 ± 24% perf-profile.self.cycles-pp.rmqueue
0.14 ± 26% +0.1 0.20 ± 6% perf-profile.self.cycles-pp.init_file
0.05 ± 78% +0.1 0.10 ± 4% perf-profile.self.cycles-pp.enqueue_task_fair
0.05 ± 78% +0.1 0.10 ± 27% perf-profile.self.cycles-pp.___slab_alloc
0.02 ±141% +0.1 0.08 ± 24% perf-profile.self.cycles-pp.pick_link
0.04 ± 73% +0.1 0.10 ± 24% perf-profile.self.cycles-pp.__mod_node_page_state
0.07 ± 17% +0.1 0.14 ± 26% perf-profile.self.cycles-pp.get_slabinfo
0.00 +0.1 0.07 ± 18% perf-profile.self.cycles-pp.select_task_rq
0.07 ± 78% +0.1 0.15 ± 27% perf-profile.self.cycles-pp.file_free_rcu
0.09 ± 44% +0.1 0.16 ± 15% perf-profile.self.cycles-pp.apparmor_file_permission
0.08 ± 27% +0.1 0.15 ± 35% perf-profile.self.cycles-pp.malloc
0.02 ±141% +0.1 0.10 ± 29% perf-profile.self.cycles-pp.memcg_account_kmem
0.23 ± 17% +0.1 0.31 ± 9% perf-profile.self.cycles-pp.native_irq_return_iret
0.13 ± 32% +0.1 0.21 ± 32% perf-profile.self.cycles-pp.obj_cgroup_charge
0.10 ± 43% +0.1 0.19 ± 11% perf-profile.self.cycles-pp.perf_read
0.14 ± 12% +0.1 0.23 ± 25% perf-profile.self.cycles-pp.cgroup_rstat_updated
0.13 ± 43% +0.1 0.23 ± 27% perf-profile.self.cycles-pp.mod_objcg_state
0.00 +0.1 0.10 ± 25% perf-profile.self.cycles-pp.lockref_get
0.07 ± 78% +0.1 0.18 ± 34% perf-profile.self.cycles-pp.update_rq_clock_task
0.00 +0.1 0.10 ± 27% perf-profile.self.cycles-pp.find_busiest_queue
0.09 ± 59% +0.1 0.21 ± 29% perf-profile.self.cycles-pp.smp_call_function_many_cond
0.15 ± 31% +0.1 0.27 ± 16% perf-profile.self.cycles-pp._raw_spin_lock_irqsave
0.19 ± 39% +0.1 0.32 ± 19% perf-profile.self.cycles-pp.zap_pte_range
0.13 ± 18% +0.1 0.26 ± 23% perf-profile.self.cycles-pp.aa_file_perm
0.19 ± 50% +0.1 0.32 ± 24% perf-profile.self.cycles-pp.gather_stats
0.24 ± 16% +0.2 0.40 ± 17% perf-profile.self.cycles-pp.___perf_sw_event
0.25 ± 31% +0.2 0.41 ± 16% perf-profile.self.cycles-pp.__mod_memcg_lruvec_state
0.08 ± 71% +0.2 0.25 ± 24% perf-profile.self.cycles-pp.pcpu_alloc
0.16 ± 38% +0.2 0.34 ± 21% perf-profile.self.cycles-pp.filemap_map_pages
0.32 ± 41% +0.2 0.54 ± 17% perf-profile.self.cycles-pp.pid_revalidate
0.47 ± 19% +0.3 0.73 ± 21% perf-profile.self.cycles-pp.kmem_cache_alloc
0.60 ± 34% +0.4 0.96 ± 18% perf-profile.self.cycles-pp.__d_lookup_rcu
1.06 ± 60% +5.4 6.50 ± 57% perf-profile.self.cycles-pp.intel_idle
1.74 ± 63% +6.2 7.92 ± 39% perf-profile.self.cycles-pp.copy_page
***************************************************************************************************
lkp-csl-2sp3: 96 threads 2 sockets Intel(R) Xeon(R) Platinum 8260L CPU @ 2.40GHz (Cascade Lake) with 128G memory

=========================================================================================
compiler/cpufreq_governor/iterations/kconfig/rootfs/tbox_group/test/testcase:
  gcc-12/performance/4x/x86_64-rhel-8.3/debian-11.1-x86_64-20220510.cgz/lkp-csl-2sp3/_INVERSE_BIND/autonuma-benchmark

commit:
  fc769221b2 ("sched/numa: Remove unconditional scan logic using mm numa_scan_seq")
  167773d1dd ("sched/numa: Increase tasks' access history")

fc769221b23064c0 167773d1ddb5ffdd944f851f2cb
---------------- ---------------------------
         %stddev     %change         %stddev
             \          |                \
0.01 ± 20% +0.0 0.01 ± 15% mpstat.cpu.all.iowait%
25370 ± 3% -13.5% 21946 ± 6% uptime.idle
2.098e+10 ± 4% -15.8% 1.767e+10 ± 7% cpuidle..time
21696014 ± 4% -15.8% 18274389 ± 7% cpuidle..usage
3567832 ± 2% -12.9% 3106532 ± 5% numa-numastat.node1.local_node
4472555 ± 2% -10.8% 3989658 ± 6% numa-numastat.node1.numa_hit
21420616 ± 4% -15.9% 18019892 ± 7% turbostat.C6
62.12 +3.8% 64.46 turbostat.RAMWatt
185236 ± 6% -17.4% 152981 ± 15% numa-meminfo.node1.Active
184892 ± 6% -17.5% 152523 ± 15% numa-meminfo.node1.Active(anon)
190876 ± 6% -17.4% 157580 ± 15% numa-meminfo.node1.Shmem
373.94 ± 4% -14.8% 318.67 ± 6% autonuma-benchmark.numa01.seconds
3066 ± 2% -7.6% 2833 ± 3% autonuma-benchmark.time.elapsed_time
3066 ± 2% -7.6% 2833 ± 3% autonuma-benchmark.time.elapsed_time.max
1770652 ± 3% -7.7% 1634112 ± 3% autonuma-benchmark.time.involuntary_context_switches
258701 ± 2% -6.9% 240826 ± 3% autonuma-benchmark.time.user_time
46235 ± 6% -17.5% 38150 ± 15% numa-vmstat.node1.nr_active_anon
47723 ± 6% -17.4% 39411 ± 15% numa-vmstat.node1.nr_shmem
46235 ± 6% -17.5% 38150 ± 15% numa-vmstat.node1.nr_zone_active_anon
4471422 ± 2% -10.8% 3989129 ± 6% numa-vmstat.node1.numa_hit
3566699 ± 2% -12.9% 3106004 ± 5% numa-vmstat.node1.numa_local
2.37 ± 23% +45.3% 3.44 ± 16% sched_debug.cfs_rq:/.removed.runnable_avg.stddev
2.26 ± 28% +45.0% 3.28 ± 20% sched_debug.cfs_rq:/.removed.util_avg.stddev
203.53 ± 4% -12.8% 177.48 ± 3% sched_debug.cfs_rq:/.util_est_enqueued.stddev
128836 ± 7% -16.9% 107001 ± 8% sched_debug.cpu.max_idle_balance_cost.stddev
12639 ± 6% -12.1% 11108 ± 8% sched_debug.cpu.nr_switches.min
0.06 ± 41% -44.9% 0.04 ± 20% perf-sched.sch_delay.avg.ms.ipmi_thread.kthread.ret_from_fork.ret_from_fork_asm
1.84 ± 23% -56.4% 0.80 ± 33% perf-sched.sch_delay.avg.ms.schedule_hrtimeout_range_clock.do_select.core_sys_select.kern_select
0.08 ± 38% -55.2% 0.04 ± 22% perf-sched.sch_delay.max.ms.ipmi_thread.kthread.ret_from_fork.ret_from_fork_asm
7.55 ± 60% -77.2% 1.72 ±152% perf-sched.wait_time.avg.ms.__cond_resched.khugepaged.kthread.ret_from_fork.ret_from_fork_asm
10.72 ± 60% -73.8% 2.81 ±171% perf-sched.wait_time.max.ms.__cond_resched.khugepaged.kthread.ret_from_fork.ret_from_fork_asm
0.28 ± 12% -16.4% 0.23 ± 5% perf-sched.wait_time.max.ms.ipmi_thread.kthread.ret_from_fork.ret_from_fork_asm
8802 ± 3% -4.3% 8427 proc-vmstat.nr_mapped
54506 ± 5% -5.2% 51656 proc-vmstat.nr_shmem
8510048 -4.5% 8124296 proc-vmstat.numa_hit
43091 ± 8% +15.9% 49938 ± 6% proc-vmstat.numa_huge_pte_updates
7242046 -5.3% 6860532 ± 2% proc-vmstat.numa_local
3762770 ± 5% +34.7% 5068087 ± 3% proc-vmstat.numa_pages_migrated
22235827 ± 8% +15.8% 25759214 ± 6% proc-vmstat.numa_pte_updates
10591821 -5.4% 10024519 ± 2% proc-vmstat.pgfault
3762770 ± 5% +34.7% 5068087 ± 3% proc-vmstat.pgmigrate_success
489883 ± 2% -6.8% 456801 ± 3% proc-vmstat.pgreuse
7297 ± 5% +34.8% 9838 ± 3% proc-vmstat.thp_migration_success
22825216 -7.4% 21132800 ± 3% proc-vmstat.unevictable_pgs_scanned
40.10 +4.2% 41.80 perf-stat.i.MPKI
1.64 +0.1 1.74 perf-stat.i.branch-miss-rate%
1920111 +6.9% 2051982 perf-stat.i.branch-misses
60.50 +1.2 61.72 perf-stat.i.cache-miss-rate%
12369678 +6.9% 13223477 perf-stat.i.cache-misses
21918348 +4.6% 22934958 perf-stat.i.cache-references
22544 -4.0% 21634 perf-stat.i.cycles-between-cache-misses
1458 +12.1% 1635 ± 5% perf-stat.i.instructions-per-iTLB-miss
2.51 +2.4% 2.57 perf-stat.i.metric.M/sec
3383 +2.3% 3460 perf-stat.i.minor-faults
244016 +5.0% 256219 perf-stat.i.node-load-misses
4544736 +9.5% 4977101 ± 3% perf-stat.i.node-store-misses
6126744 +5.5% 6463826 ± 2% perf-stat.i.node-stores
3383 +2.3% 3460 perf-stat.i.page-faults
37.34 +3.4% 38.60 perf-stat.overall.MPKI
1.64 +0.1 1.74 perf-stat.overall.branch-miss-rate%
21951 -5.4% 20763 perf-stat.overall.cycles-between-cache-misses
1866870 +7.1% 2000069 perf-stat.ps.branch-misses
12385090 +6.6% 13198317 perf-stat.ps.cache-misses
21609219 +4.6% 22595642 perf-stat.ps.cache-references
3340 +2.3% 3418 perf-stat.ps.minor-faults
243774 +4.9% 255759 perf-stat.ps.node-load-misses
4560352 +9.0% 4973035 ± 3% perf-stat.ps.node-store-misses
6135666 +5.2% 6452858 ± 2% perf-stat.ps.node-stores
3340 +2.3% 3418 perf-stat.ps.page-faults
1.775e+12 -6.5% 1.659e+12 ± 2% perf-stat.total.instructions
32.90 ± 14% -14.9 17.99 ± 40% perf-profile.calltrace.cycles-pp.asm_sysvec_apic_timer_interrupt
0.60 ± 14% +0.3 0.88 ± 23% perf-profile.calltrace.cycles-pp.do_dentry_open.do_open.path_openat.do_filp_open.do_sys_openat2
0.57 ± 49% +0.4 0.93 ± 14% perf-profile.calltrace.cycles-pp.update_sg_wakeup_stats.find_idlest_group.find_idlest_cpu.select_task_rq_fair.sched_exec
0.78 ± 12% +0.4 1.15 ± 34% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_read.readn.perf_evsel__read
0.80 ± 14% +0.4 1.17 ± 26% perf-profile.calltrace.cycles-pp.do_open.path_openat.do_filp_open.do_sys_openat2.__x64_sys_openat
0.82 ± 15% +0.4 1.19 ± 33% perf-profile.calltrace.cycles-pp.__libc_read.readn.perf_evsel__read.read_counters.process_interval
0.80 ± 14% +0.4 1.19 ± 33% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__libc_read.readn.perf_evsel__read.read_counters
0.50 ± 46% +0.4 0.89 ± 25% perf-profile.calltrace.cycles-pp.open_last_lookups.path_openat.do_filp_open.do_sys_openat2.__x64_sys_openat
0.59 ± 49% +0.4 0.98 ± 19% perf-profile.calltrace.cycles-pp.find_idlest_group.find_idlest_cpu.select_task_rq_fair.sched_exec.bprm_execve
0.59 ± 48% +0.4 1.00 ± 25% perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.__fxstat64
0.67 ± 47% +0.4 1.10 ± 22% perf-profile.calltrace.cycles-pp.sched_exec.bprm_execve.do_execveat_common.__x64_sys_execve.do_syscall_64
0.90 ± 18% +0.4 1.33 ± 24% perf-profile.calltrace.cycles-pp.show_numa_map.seq_read_iter.seq_read.vfs_read.ksys_read
0.66 ± 46% +0.4 1.09 ± 27% perf-profile.calltrace.cycles-pp.gather_pte_stats.walk_pmd_range.walk_pud_range.walk_p4d_range.walk_pgd_range
0.68 ± 46% +0.5 1.13 ± 27% perf-profile.calltrace.cycles-pp.walk_p4d_range.walk_pgd_range.__walk_page_range.walk_page_vma.show_numa_map
0.68 ± 46% +0.5 1.13 ± 27% perf-profile.calltrace.cycles-pp.walk_pud_range.walk_p4d_range.walk_pgd_range.__walk_page_range.walk_page_vma
0.68 ± 46% +0.5 1.14 ± 27% perf-profile.calltrace.cycles-pp.walk_page_vma.show_numa_map.seq_read_iter.seq_read.vfs_read
0.68 ± 46% +0.5 1.14 ± 27% perf-profile.calltrace.cycles-pp.__walk_page_range.walk_page_vma.show_numa_map.seq_read_iter.seq_read 0.68 ± 46% +0.5 1.14 ± 27% perf-profile.calltrace.cycles-pp.walk_pgd_range.__walk_page_range.walk_page_vma.show_numa_map.seq_read_iter 0.40 ± 71% +0.5 0.88 ± 20% perf-profile.calltrace.cycles-pp._dl_addr 0.93 ± 18% +0.5 1.45 ± 28% perf-profile.calltrace.cycles-pp.__fxstat64 0.88 ± 18% +0.5 1.41 ± 27% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__fxstat64 0.88 ± 18% +0.5 1.42 ± 28% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__fxstat64 0.60 ± 73% +0.6 1.24 ± 18% perf-profile.calltrace.cycles-pp.seq_read_iter.vfs_read.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe 0.23 ±142% +0.7 0.88 ± 26% perf-profile.calltrace.cycles-pp.show_stat.seq_read_iter.vfs_read.ksys_read.do_syscall_64 2.87 ± 14% +1.3 4.21 ± 23% perf-profile.calltrace.cycles-pp.do_sys_openat2.__x64_sys_openat.do_syscall_64.entry_SYSCALL_64_after_hwframe.open64 2.88 ± 14% +1.4 4.23 ± 23% perf-profile.calltrace.cycles-pp.__x64_sys_openat.do_syscall_64.entry_SYSCALL_64_after_hwframe.open64 34.28 ± 13% -14.6 19.70 ± 36% perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt 0.13 ± 29% -0.1 0.05 ± 76% perf-profile.children.cycles-pp.schedule_tail 0.12 ± 20% -0.1 0.05 ± 78% perf-profile.children.cycles-pp.__put_user_4 0.18 ± 16% +0.1 0.23 ± 13% perf-profile.children.cycles-pp.__x64_sys_munmap 0.09 ± 17% +0.1 0.16 ± 27% perf-profile.children.cycles-pp.__do_sys_brk 0.01 ±223% +0.1 0.08 ± 27% perf-profile.children.cycles-pp.acpi_ex_insert_into_field 0.01 ±223% +0.1 0.08 ± 27% perf-profile.children.cycles-pp.acpi_ex_opcode_1A_1T_1R 0.01 ±223% +0.1 0.08 ± 27% perf-profile.children.cycles-pp.acpi_ex_store_object_to_node 0.01 ±223% +0.1 0.08 ± 27% perf-profile.children.cycles-pp.acpi_ex_write_data_to_field 0.02 ±142% +0.1 0.09 ± 50% perf-profile.children.cycles-pp.common_perm_cond 0.06 ± 58% +0.1 0.14 ± 24% perf-profile.children.cycles-pp.___slab_alloc 0.02 ±144% +0.1 0.10 ± 63% perf-profile.children.cycles-pp.__alloc_pages_bulk 0.06 ± 18% +0.1 0.14 ± 58% perf-profile.children.cycles-pp.security_inode_getattr 0.12 ± 40% +0.1 0.21 ± 28% perf-profile.children.cycles-pp.__ptrace_may_access 0.07 ± 33% +0.1 0.18 ± 40% perf-profile.children.cycles-pp.brk 0.15 ± 14% +0.1 0.26 ± 23% perf-profile.children.cycles-pp.wq_worker_comm 0.09 ± 87% +0.1 0.21 ± 30% perf-profile.children.cycles-pp.irq_get_next_irq 0.93 ± 12% +0.2 1.17 ± 19% perf-profile.children.cycles-pp.do_dentry_open 0.15 ± 30% +0.3 0.43 ± 56% perf-profile.children.cycles-pp.run_ksoftirqd 0.54 ± 17% +0.4 0.89 ± 20% perf-profile.children.cycles-pp._dl_addr 0.74 ± 19% +0.4 1.09 ± 27% perf-profile.children.cycles-pp.gather_pte_stats 0.74 ± 25% +0.4 1.10 ± 21% perf-profile.children.cycles-pp.sched_exec 0.76 ± 19% +0.4 1.13 ± 27% perf-profile.children.cycles-pp.walk_p4d_range 0.76 ± 19% +0.4 1.13 ± 27% perf-profile.children.cycles-pp.walk_pud_range 0.76 ± 19% +0.4 1.14 ± 27% perf-profile.children.cycles-pp.walk_page_vma 0.76 ± 19% +0.4 1.14 ± 27% perf-profile.children.cycles-pp.__walk_page_range 0.76 ± 19% +0.4 1.14 ± 27% perf-profile.children.cycles-pp.walk_pgd_range 0.92 ± 13% +0.4 1.33 ± 20% perf-profile.children.cycles-pp.open_last_lookups 0.90 ± 17% +0.4 1.33 ± 24% perf-profile.children.cycles-pp.show_numa_map 0.43 ± 51% +0.5 0.88 ± 26% perf-profile.children.cycles-pp.show_stat 1.49 ± 11% +0.5 1.94 ± 15% perf-profile.children.cycles-pp.__do_softirq 1.22 ± 18% +0.6 1.78 ± 16% 
perf-profile.children.cycles-pp.update_sg_wakeup_stats 1.28 ± 20% +0.6 1.88 ± 18% perf-profile.children.cycles-pp.find_idlest_group 1.07 ± 16% +0.6 1.67 ± 30% perf-profile.children.cycles-pp.__fxstat64 1.36 ± 20% +0.6 1.98 ± 21% perf-profile.children.cycles-pp.find_idlest_cpu 30.64 ± 15% -14.9 15.70 ± 46% perf-profile.self.cycles-pp.asm_sysvec_apic_timer_interrupt 0.01 ±223% +0.1 0.07 ± 36% perf-profile.self.cycles-pp.pick_next_task_fair 0.10 ± 28% +0.1 0.17 ± 28% perf-profile.self.cycles-pp.__get_obj_cgroup_from_memcg 0.00 +0.1 0.07 ± 32% perf-profile.self.cycles-pp.touch_atime 0.04 ±106% +0.1 0.11 ± 18% perf-profile.self.cycles-pp.___slab_alloc 0.12 ± 37% +0.1 0.20 ± 27% perf-profile.self.cycles-pp.__ptrace_may_access 0.05 ± 52% +0.1 0.13 ± 75% perf-profile.self.cycles-pp.pick_link 0.14 ± 28% +0.1 0.24 ± 34% perf-profile.self.cycles-pp.__slab_free 0.47 ± 19% +0.3 0.79 ± 16% perf-profile.self.cycles-pp._dl_addr 1.00 ± 19% +0.4 1.44 ± 18% perf-profile.self.cycles-pp.update_sg_wakeup_stats 6.04 ± 14% +1.9 7.99 ± 18% perf-profile.self.cycles-pp.syscall_exit_to_user_mode *************************************************************************************************** lkp-icl-2sp6: 128 threads 2 sockets Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz (Ice Lake) with 128G memory ========================================================================================= compiler/cpufreq_governor/iterations/kconfig/rootfs/tbox_group/test/testcase: gcc-12/performance/4x/x86_64-rhel-8.3/debian-11.1-x86_64-20220510.cgz/lkp-icl-2sp6/numa01_THREAD_ALLOC/autonuma-benchmark commit: fc769221b2 ("sched/numa: Remove unconditional scan logic using mm numa_scan_seq") 167773d1dd ("sched/numa: Increase tasks' access history") fc769221b23064c0 167773d1ddb5ffdd944f851f2cb ---------------- --------------------------- %stddev %change %stddev \ | \ 36796 ± 6% -19.0% 29811 ± 8% uptime.idle 3.231e+10 ± 7% -21.6% 2.534e+10 ± 10% cpuidle..time 33785162 ± 7% -21.8% 26431366 ± 10% cpuidle..usage 10.56 ± 7% -1.5 9.02 ± 9% mpstat.cpu.all.idle% 0.01 ± 22% +0.0 0.01 ± 11% mpstat.cpu.all.iowait% 0.17 ± 2% -0.0 0.15 ± 4% mpstat.cpu.all.soft% 388157 ± 31% +60.9% 624661 ± 36% numa-numastat.node0.other_node 4511165 ± 4% -13.5% 3901276 ± 7% numa-numastat.node1.numa_hit 951382 ± 12% -30.4% 661932 ± 31% numa-numastat.node1.other_node 388157 ± 31% +60.9% 624658 ± 36% numa-vmstat.node0.numa_other 4510646 ± 4% -13.5% 3900948 ± 7% numa-vmstat.node1.numa_hit 951382 ± 12% -30.4% 661932 ± 31% numa-vmstat.node1.numa_other 305.08 ± 5% +19.6% 364.96 ± 6% sched_debug.cfs_rq:/.util_est_enqueued.avg 989.11 ± 4% +13.0% 1117 ± 6% sched_debug.cfs_rq:/.util_est_enqueued.max 5082 ± 6% -19.0% 4114 ± 12% sched_debug.cpu.curr->pid.stddev 85229 -13.2% 74019 ± 9% sched_debug.cpu.max_idle_balance_cost.stddev 7575 ± 5% -8.3% 6946 ± 3% sched_debug.cpu.nr_switches.min 394498 ± 5% -21.0% 311653 ± 10% turbostat.C1E 33233046 ± 8% -21.7% 26018024 ± 10% turbostat.C6 10.39 ± 7% -1.5 8.90 ± 9% turbostat.C6% 7.77 ± 6% -17.5% 6.41 ± 9% turbostat.CPU%c1 206.88 +2.9% 212.86 turbostat.RAMWatt 372.30 -8.3% 341.49 autonuma-benchmark.numa01.seconds 209.06 -10.7% 186.67 ± 6% autonuma-benchmark.numa01_THREAD_ALLOC.seconds 2408 -8.6% 2200 ± 2% autonuma-benchmark.time.elapsed_time 2408 -8.6% 2200 ± 2% autonuma-benchmark.time.elapsed_time.max 1221333 ± 2% -5.1% 1159380 ± 2% autonuma-benchmark.time.involuntary_context_switches 3508627 -4.1% 3363550 autonuma-benchmark.time.minor_page_faults 11174 +1.9% 11388 autonuma-benchmark.time.percent_of_cpu_this_job_got 261419 -7.0% 243046 ± 2% 
autonuma-benchmark.time.user_time 220972 ± 7% +22.1% 269753 ± 3% proc-vmstat.numa_hint_faults 164886 ± 11% +19.4% 196883 ± 5% proc-vmstat.numa_hint_faults_local 7964964 -5.9% 7494239 proc-vmstat.numa_hit 82885 ± 6% +43.4% 118829 ± 6% proc-vmstat.numa_huge_pte_updates 6625289 -6.3% 6207618 proc-vmstat.numa_local 6636312 ± 4% +33.1% 8834573 ± 3% proc-vmstat.numa_pages_migrated 42671823 ± 6% +43.2% 61094857 ± 6% proc-vmstat.numa_pte_updates 9173569 -6.2% 8602789 proc-vmstat.pgfault 6636312 ± 4% +33.1% 8834573 ± 3% proc-vmstat.pgmigrate_success 397595 -6.5% 371818 proc-vmstat.pgreuse 12917 ± 4% +33.2% 17200 ± 3% proc-vmstat.thp_migration_success 17964288 -8.7% 16401792 ± 2% proc-vmstat.unevictable_pgs_scanned 0.63 ± 12% -0.3 0.28 ±100% perf-profile.calltrace.cycles-pp.__libc_read.readn.evsel__read_counter.read_counters.process_interval 1.17 ± 4% -0.2 0.96 ± 14% perf-profile.children.cycles-pp.__irq_exit_rcu 0.65 ± 19% -0.2 0.46 ± 13% perf-profile.children.cycles-pp.task_mm_cid_work 0.23 ± 16% -0.2 0.08 ± 61% perf-profile.children.cycles-pp.rcu_gp_kthread 0.30 ± 5% -0.1 0.16 ± 43% perf-profile.children.cycles-pp.rebalance_domains 0.13 ± 21% -0.1 0.03 ±100% perf-profile.children.cycles-pp.rcu_gp_fqs_loop 0.25 ± 16% -0.1 0.18 ± 14% perf-profile.children.cycles-pp.lru_add_drain_cpu 0.17 ± 9% -0.1 0.11 ± 23% perf-profile.children.cycles-pp.__perf_read_group_add 0.09 ± 21% -0.0 0.04 ± 72% perf-profile.children.cycles-pp.__evlist__disable 0.11 ± 19% -0.0 0.07 ± 53% perf-profile.children.cycles-pp.vma_link 0.13 ± 6% -0.0 0.09 ± 27% perf-profile.children.cycles-pp.ptep_clear_flush 0.07 ± 7% -0.0 0.03 ±100% perf-profile.children.cycles-pp.__kernel_read 0.07 ± 7% -0.0 0.03 ±100% perf-profile.children.cycles-pp.simple_lookup 0.09 ± 9% +0.0 0.11 ± 10% perf-profile.children.cycles-pp.exit_notify 0.12 ± 14% +0.0 0.16 ± 17% perf-profile.children.cycles-pp.__do_set_cpus_allowed 0.02 ±141% +0.1 0.09 ± 40% perf-profile.children.cycles-pp.__sysvec_call_function 0.05 ± 78% +0.1 0.13 ± 42% perf-profile.children.cycles-pp.__flush_smp_call_function_queue 0.03 ±141% +0.1 0.12 ± 41% perf-profile.children.cycles-pp.sysvec_call_function 0.64 ± 19% -0.2 0.45 ± 12% perf-profile.self.cycles-pp.task_mm_cid_work 0.07 ± 7% -0.0 0.03 ±100% perf-profile.self.cycles-pp.dequeue_task_fair 0.05 ± 8% +0.0 0.08 ± 14% perf-profile.self.cycles-pp.file_free_rcu 1057 +9.9% 1162 ± 2% perf-stat.i.MPKI 76.36 ± 2% +4.6 80.91 ± 2% perf-stat.i.cache-miss-rate% 5.353e+08 ± 4% +18.2% 6.327e+08 ± 3% perf-stat.i.cache-misses 7.576e+08 +9.3% 8.282e+08 ± 2% perf-stat.i.cache-references 3.727e+11 +1.7% 3.792e+11 perf-stat.i.cpu-cycles 154.73 +1.5% 157.11 perf-stat.i.cpu-migrations 722.61 ± 2% -8.9% 658.12 ± 3% perf-stat.i.cycles-between-cache-misses 2.91 +1.7% 2.96 perf-stat.i.metric.GHz 1242 ± 3% +5.7% 1312 ± 2% perf-stat.i.metric.K/sec 12.73 +9.8% 13.98 ± 2% perf-stat.i.metric.M/sec 245601 +5.4% 258749 perf-stat.i.node-load-misses 43.38 -2.5 40.91 ± 3% perf-stat.i.node-store-miss-rate% 2.267e+08 ± 3% +8.8% 2.467e+08 ± 4% perf-stat.i.node-store-misses 3.067e+08 ± 5% +25.2% 3.841e+08 ± 6% perf-stat.i.node-stores 915.00 +9.1% 998.24 ± 2% perf-stat.overall.MPKI 71.29 ± 3% +5.7 77.00 ± 3% perf-stat.overall.cache-miss-rate% 702.58 ± 3% -14.0% 604.23 ± 3% perf-stat.overall.cycles-between-cache-misses 42.48 ± 2% -3.3 39.20 ± 5% perf-stat.overall.node-store-miss-rate% 5.33e+08 ± 4% +18.1% 6.296e+08 ± 3% perf-stat.ps.cache-misses 7.475e+08 +9.4% 8.178e+08 ± 2% perf-stat.ps.cache-references 3.739e+11 +1.6% 3.8e+11 perf-stat.ps.cpu-cycles 154.22 +1.6% 156.62 
perf-stat.ps.cpu-migrations 3655 +2.5% 3744 perf-stat.ps.minor-faults 242759 +5.4% 255974 perf-stat.ps.node-load-misses 2.255e+08 ± 3% +8.9% 2.457e+08 ± 3% perf-stat.ps.node-store-misses 3.057e+08 ± 5% +24.9% 3.82e+08 ± 6% perf-stat.ps.node-stores 3655 +2.5% 3744 perf-stat.ps.page-faults 1.968e+12 -8.3% 1.805e+12 ± 2% perf-stat.total.instructions 0.03 ±141% +283.8% 0.13 ± 85% perf-sched.sch_delay.avg.ms.__cond_resched.zap_pmd_range.isra.0.unmap_page_range 0.06 ± 77% +254.1% 0.20 ± 54% perf-sched.sch_delay.avg.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_exc_page_fault 0.08 ± 28% -89.5% 0.01 ±223% perf-sched.sch_delay.avg.ms.schedule_preempt_disabled.kthread.ret_from_fork.ret_from_fork_asm 0.92 ± 10% -33.4% 0.62 ± 20% perf-sched.sch_delay.avg.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone 0.10 ± 22% -27.2% 0.07 ± 8% perf-sched.sch_delay.avg.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm 0.35 ±141% +186.8% 1.02 ± 69% perf-sched.sch_delay.max.ms.__cond_resched.zap_pmd_range.isra.0.unmap_page_range 1.47 ± 81% +262.6% 5.32 ± 79% perf-sched.sch_delay.max.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_exc_page_fault 2.42 ± 42% +185.9% 6.91 ± 52% perf-sched.sch_delay.max.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt 0.26 ± 9% +1470.7% 4.16 ±115% perf-sched.sch_delay.max.ms.rcu_gp_kthread.kthread.ret_from_fork.ret_from_fork_asm 3.61 ± 7% -25.3% 2.70 ± 18% perf-sched.sch_delay.max.ms.schedule_hrtimeout_range_clock.do_select.core_sys_select.kern_select 0.08 ± 28% -89.5% 0.01 ±223% perf-sched.sch_delay.max.ms.schedule_preempt_disabled.kthread.ret_from_fork.ret_from_fork_asm 17.44 ± 4% -19.0% 14.12 ± 13% perf-sched.wait_and_delay.avg.ms.do_wait.kernel_wait4.__do_sys_wait4.do_syscall_64 23.36 ± 21% -37.2% 14.67 ± 22% perf-sched.wait_and_delay.avg.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone 107.00 +11.5% 119.33 ± 4% perf-sched.wait_and_delay.count.do_wait.kernel_wait4.__do_sys_wait4.do_syscall_64 75.00 +9.6% 82.17 ± 2% perf-sched.wait_and_delay.count.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone 79.99 ± 97% -86.8% 10.52 ± 41% perf-sched.wait_and_delay.max.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_sysvec_call_function_single 145.98 ± 14% -41.5% 85.46 ± 22% perf-sched.wait_and_delay.max.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone 1.20 ± 94% +152.3% 3.03 ± 31% perf-sched.wait_time.avg.ms.__cond_resched.change_pmd_range.change_p4d_range.change_protection_range.mprotect_fixup 2.30 ± 57% -90.9% 0.21 ±205% perf-sched.wait_time.avg.ms.__cond_resched.down_read.walk_component.link_path_walk.part 0.06 ± 8% -100.0% 0.00 perf-sched.wait_time.avg.ms.__cond_resched.down_write_killable.exec_mmap.begin_new_exec.load_elf_binary 0.58 ± 81% -76.6% 0.14 ± 50% perf-sched.wait_time.avg.ms.__cond_resched.dput.terminate_walk.path_lookupat.filename_lookup 2.63 ± 21% -59.4% 1.07 ± 68% perf-sched.wait_time.avg.ms.__cond_resched.dput.terminate_walk.path_openat.do_filp_open 2.68 ± 40% -79.5% 0.55 ±174% perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc.getname_flags.part.0 3.59 ± 17% -52.9% 1.69 ± 98% perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc.mas_alloc_nodes.mas_preallocate.mmap_region 4.05 ± 2% -80.6% 0.79 ±133% 
perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc.vm_area_dup.__split_vma.mprotect_fixup 3.75 ± 19% -81.9% 0.68 ±135% perf-sched.wait_time.avg.ms.__cond_resched.mmput.m_stop.seq_read_iter.seq_read 1527 ± 70% -84.5% 236.84 ±223% perf-sched.wait_time.avg.ms.__cond_resched.mutex_lock.perf_poll.do_poll.constprop 16.13 ± 4% -21.4% 12.69 ± 15% perf-sched.wait_time.avg.ms.do_wait.kernel_wait4.__do_sys_wait4.do_syscall_64 1.16 ±117% -99.1% 0.01 ±223% perf-sched.wait_time.avg.ms.io_schedule.folio_wait_bit_common.filemap_fault.__do_fault 0.26 ± 25% -93.2% 0.02 ±223% perf-sched.wait_time.avg.ms.schedule_preempt_disabled.rwsem_down_read_slowpath.down_read_killable.__access_remote_vm 22.43 ± 21% -37.4% 14.05 ± 22% perf-sched.wait_time.avg.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone 4.41 ± 8% -94.9% 0.22 ±191% perf-sched.wait_time.max.ms.__cond_resched.down_read.walk_component.link_path_walk.part 0.08 ± 29% -100.0% 0.00 perf-sched.wait_time.max.ms.__cond_resched.down_write_killable.exec_mmap.begin_new_exec.load_elf_binary 6.20 ± 8% -21.6% 4.87 ± 13% perf-sched.wait_time.max.ms.__cond_resched.lock_mm_and_find_vma.do_user_addr_fault.exc_page_fault.asm_exc_page_fault 4.23 ± 5% -68.3% 1.34 ±136% perf-sched.wait_time.max.ms.__cond_resched.mmput.m_stop.seq_read_iter.seq_read 3053 ± 70% -92.2% 236.84 ±223% perf-sched.wait_time.max.ms.__cond_resched.mutex_lock.perf_poll.do_poll.constprop 4.78 ± 33% +10431.5% 502.95 ± 99% perf-sched.wait_time.max.ms.__x64_sys_pause.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown] 79.99 ± 97% -86.9% 10.51 ± 41% perf-sched.wait_time.max.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_sysvec_call_function_single 2.13 ±128% -99.5% 0.01 ±223% perf-sched.wait_time.max.ms.io_schedule.folio_wait_bit_common.filemap_fault.__do_fault 0.26 ± 25% -92.4% 0.02 ±223% perf-sched.wait_time.max.ms.schedule_preempt_disabled.rwsem_down_read_slowpath.down_read_killable.__access_remote_vm 142.79 ± 13% -40.9% 84.32 ± 22% perf-sched.wait_time.max.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone Disclaimer: Results have been estimated based on internal Intel analysis and are provided for informational purposes only. Any difference in system hardware or software design or configuration may affect actual performance.
On 9/12/2023 7:54 PM, kernel test robot wrote: > > > hi, Raghu, > > hope this third performance report for same one patch-set won't annoy you, > and better, have some value to you. Not at all. Thanks a lot; I am in fact glad to see such exhaustive results. Here is why: it is easy to show that a patchset improves code readability, maintainability, and so on, and while I try my best to confirm that regressions stay within the noise level for corner cases and that some benchmarks improve noticeably, there is always room to miss something. Reports like this help to boost confidence in the patchset. Your cumulative (bisection) report also helped to evaluate the importance of each patch. > > we won't send more autonuma-benchmark performance improvement reports for this > patch-set, of course, unless you still hope we do. > > BTW, we will still send out performance/function regression reports if any. > > as in previous reports, we know that you want to see the performance impact > of whole patch set, so let me give a full summary here: > > let me list how we apply your patch set again: > > 68cfe9439a1ba (linux-review/Raghavendra-K-T/sched-numa-Move-up-the-access-pid-reset-logic/20230829-141007) sched/numa: Allow scanning of shared VMAs > af46f3c9ca2d1 sched/numa: Allow recently accessed VMAs to be scanned <-- we reported [1] > 167773d1ddb5f sched/numa: Increase tasks' access history <---- for this report > fc769221b2306 sched/numa: Remove unconditional scan logic using mm numa_scan_seq > 1ef5cbb92bdb3 sched/numa: Add disjoint vma unconditional scan logic <--- we reported [2] > 2a806eab1c2e1 sched/numa: Move up the access pid reset logic > 2f88c8e802c8b (tip/sched/core) sched/eevdf/doc: Modify the documented knob to base_slice_ns as well > > [1] https://lore.kernel.org/all/202309102311.84b42068-oliver.sang@intel.com/ > [2] https://lore.kernel.org/all/202309121417.53f44ad6-oliver.sang@intel.com/ > > below will only give out the comparison between 2f88c8e802c8b and 68cfe9439a1ba > in a summary way, if you want detail data for more commits, or more comparison > data, please let me know. Thanks!
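For readers joining the thread here: the "access history" that patch 4/6 grows is, roughly, a per-VMA record of which tasks recently faulted on the VMA, kept as PID-hash bits per scan window, so a task only scans VMAs that it (or a recent sharer) has actually touched. Below is a minimal userspace sketch of that idea; the window count, hash function, and all names are illustrative, not the exact kernel implementation in kernel/sched/fair.c.

/*
 * Sketch of a per-VMA access-PID history: each scan window records
 * which tasks faulted on the VMA by hashing their PID into a bitmask;
 * the VMA counts as "recently accessed" by a task if its hash bit is
 * set in any retained window.  Growing NR_PID_WINDOWS is the spirit
 * of "Increase tasks' access history".
 */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define NR_PID_WINDOWS	4	/* illustrative; the series grows this history */

struct vma_pid_history {
	uint64_t window[NR_PID_WINDOWS];	/* one bitmask per scan window */
	unsigned int cur;			/* active window index */
};

/* Map a PID to one of 64 bits (stand-in for the kernel's hash_32()). */
static unsigned int pid_to_bit(int pid)
{
	return ((uint32_t)pid * 2654435761u) >> 26;	/* top 6 bits: 0..63 */
}

static void record_access(struct vma_pid_history *h, int pid)
{
	h->window[h->cur] |= 1ull << pid_to_bit(pid);
}

/* At each new scan window, recycle the oldest bitmask. */
static void new_scan_window(struct vma_pid_history *h)
{
	h->cur = (h->cur + 1) % NR_PID_WINDOWS;
	h->window[h->cur] = 0;
}

static bool vma_recently_accessed(struct vma_pid_history *h, int pid)
{
	uint64_t all = 0;

	for (unsigned int i = 0; i < NR_PID_WINDOWS; i++)
		all |= h->window[i];
	return all & (1ull << pid_to_bit(pid));
}

int main(void)
{
	struct vma_pid_history h = { 0 };

	record_access(&h, 1234);
	new_scan_window(&h);
	/* Still remembered: the history spans more than the current window. */
	printf("pid 1234 recent? %d\n", vma_recently_accessed(&h, 1234));
	/* Almost certainly not set (modulo hash collisions). */
	printf("pid 5678 recent? %d\n", vma_recently_accessed(&h, 5678));
	return 0;
}

A longer history trades a little staleness for fewer false "not accessed" skips, which is consistent with the higher numa_pte_updates and pgmigrate_success counts visible in the tables above.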
> > on > test machine: 224 threads 2 sockets Intel(R) Xeon(R) Platinum 8480CTDX (Sapphire Rapids) with 256G memory > > ========================================================================================= > compiler/cpufreq_governor/iterations/kconfig/rootfs/tbox_group/test/testcase: > gcc-12/performance/4x/x86_64-rhel-8.3/debian-11.1-x86_64-20220510.cgz/lkp-spr-r02/numa01_THREAD_ALLOC/autonuma-benchmark > > 2f88c8e802c8b128 68cfe9439a1baa642e05883fa64 > ---------------- --------------------------- > %stddev %change %stddev > \ | \ > 271.01 -26.4% 199.49 ± 3% autonuma-benchmark.numa01.seconds > 76.28 -46.9% 40.49 ± 5% autonuma-benchmark.numa01_THREAD_ALLOC.seconds > 8.11 -0.1% 8.10 autonuma-benchmark.numa02.seconds > 1425 -30.1% 996.02 ± 2% autonuma-benchmark.time.elapsed_time > 1425 -30.1% 996.02 ± 2% autonuma-benchmark.time.elapsed_time.max > > > on > test machine: 128 threads 2 sockets Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz (Ice Lake) with 128G memory > > ========================================================================================= > compiler/cpufreq_governor/iterations/kconfig/rootfs/tbox_group/test/testcase: > gcc-12/performance/4x/x86_64-rhel-8.3/debian-11.1-x86_64-20220510.cgz/lkp-icl-2sp6/numa01_THREAD_ALLOC/autonuma-benchmark > > 2f88c8e802c8b128 68cfe9439a1baa642e05883fa64 > ---------------- --------------------------- > %stddev %change %stddev > \ | \ > 361.53 ± 6% -10.4% 323.83 ± 3% autonuma-benchmark.numa01.seconds > 255.31 -60.1% 101.90 ± 2% autonuma-benchmark.numa01_THREAD_ALLOC.seconds > 14.95 -4.6% 14.26 autonuma-benchmark.numa02.seconds > 2530 ± 3% -30.3% 1763 ± 2% autonuma-benchmark.time.elapsed_time > 2530 ± 3% -30.3% 1763 ± 2% autonuma-benchmark.time.elapsed_time.max > > This gives me fair confidence that we are able to get a decent improvement overall. > below is the auto-generated report part, FYI. 
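A note on reading the paired columns above, in case it saves anyone a calculator: the left value belongs to the base commit (2f88c8e802c8b, tip/sched/core) and the right value to the head of the series (68cfe9439a1ba), with %change computed as (new - old) / old * 100. A throwaway check against the numa01_THREAD_ALLOC row quoted above:

/* Reproduce the %change column: (new - old) / old * 100. */
#include <stdio.h>

int main(void)
{
	double old_s = 76.28;	/* 2f88c8e802c8b128: numa01_THREAD_ALLOC.seconds */
	double new_s = 40.49;	/* 68cfe9439a1baa6...: same metric */

	printf("%+.1f%%\n", (new_s - old_s) / old_s * 100.0);	/* -46.9% */
	return 0;
}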
> > Hello, > > kernel test robot noticed a -17.6% improvement of autonuma-benchmark.numa01.seconds on: > > > commit: 167773d1ddb5ffdd944f851f2cbdd4e65425a358 ("[RFC PATCH V1 4/6] sched/numa: Increase tasks' access history") > url: https://github.com/intel-lab-lkp/linux/commits/Raghavendra-K-T/sched-numa-Move-up-the-access-pid-reset-logic/20230829-141007 > base: https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git 2f88c8e802c8b128a155976631f4eb2ce4f3c805 > patch link: https://lore.kernel.org/all/cf200aaf594caae68350219fa1f781d64136fa2c.1693287931.git.raghavendra.kt@amd.com/ > patch subject: [RFC PATCH V1 4/6] sched/numa: Increase tasks' access history > > testcase: autonuma-benchmark > test machine: 224 threads 2 sockets Intel(R) Xeon(R) Platinum 8480CTDX (Sapphire Rapids) with 256G memory > parameters: > > iterations: 4x > test: numa01_THREAD_ALLOC > cpufreq_governor: performance > > > In addition to that, the commit also has significant impact on the following tests: > > +------------------+----------------------------------------------------------------------------------------------------+ > | testcase: change | autonuma-benchmark: autonuma-benchmark.numa01.seconds -15.4% improvement | > | test machine | 96 threads 2 sockets Intel(R) Xeon(R) Platinum 8260L CPU @ 2.40GHz (Cascade Lake) with 128G memory | > | test parameters | cpufreq_governor=performance | > | | iterations=4x | > | | test=numa01_THREAD_ALLOC | > +------------------+----------------------------------------------------------------------------------------------------+ > | testcase: change | autonuma-benchmark: autonuma-benchmark.numa01.seconds -14.8% improvement | > | test machine | 96 threads 2 sockets Intel(R) Xeon(R) Platinum 8260L CPU @ 2.40GHz (Cascade Lake) with 128G memory | > | test parameters | cpufreq_governor=performance | > | | iterations=4x | > | | test=_INVERSE_BIND | > +------------------+----------------------------------------------------------------------------------------------------+ > | testcase: change | autonuma-benchmark: autonuma-benchmark.numa01_THREAD_ALLOC.seconds -10.7% improvement | > | test machine | 128 threads 2 sockets Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz (Ice Lake) with 128G memory | > | test parameters | cpufreq_governor=performance | > | | iterations=4x | > | | test=numa01_THREAD_ALLOC | > +------------------+----------------------------------------------------------------------------------------------------+ > > Will go through this too. 
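On judging which of these deltas are real: as a rule of thumb (a sketch of the idea, not 0day/LKP's actual filtering criterion), a delta is worth trusting when it comfortably exceeds the run-to-run noise on both sides, for example the -17.6% numa01 headline against the +/-2% and +/-5% stddev shown in the detailed table below.

/*
 * Heuristic significance check: treat a delta as real only if it
 * exceeds the two relative stddevs combined in quadrature.
 * Build: cc sig.c -lm
 */
#include <math.h>
#include <stdbool.h>
#include <stdio.h>

static bool looks_significant(double change_pct,
			      double stddev_old_pct, double stddev_new_pct)
{
	double noise = sqrt(stddev_old_pct * stddev_old_pct +
			    stddev_new_pct * stddev_new_pct);

	return fabs(change_pct) > noise;
}

int main(void)
{
	/* numa01.seconds: 271.39 +/-2%  ->  223.53 +/-5%, i.e. -17.6%. */
	printf("significant? %d\n", looks_significant(-17.6, 2.0, 5.0));
	return 0;
}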
> > > Details are as below: > --------------------------------------------------------------------------------------------------> > > > The kernel config and materials to reproduce are available at: > https://download.01.org/0day-ci/archive/20230912/202309122114.b9e08a43-oliver.sang@intel.com > > ========================================================================================= > compiler/cpufreq_governor/iterations/kconfig/rootfs/tbox_group/test/testcase: > gcc-12/performance/4x/x86_64-rhel-8.3/debian-11.1-x86_64-20220510.cgz/lkp-spr-r02/numa01_THREAD_ALLOC/autonuma-benchmark > > commit: > fc769221b2 ("sched/numa: Remove unconditional scan logic using mm numa_scan_seq") > 167773d1dd ("sched/numa: Increase tasks' access history") > > fc769221b23064c0 167773d1ddb5ffdd944f851f2cb > ---------------- --------------------------- > %stddev %change %stddev > \ | \ > 105.67 ± 8% -20.3% 84.17 ± 10% perf-c2c.HITM.remote > 1.856e+10 ± 7% -18.8% 1.508e+10 ± 8% cpuidle..time > 19025348 ± 7% -18.6% 15481744 ± 8% cpuidle..usage > 0.00 ± 28% +0.0 0.01 ± 10% mpstat.cpu.all.iowait% > 0.10 ± 2% -0.0 0.09 ± 4% mpstat.cpu.all.soft% > 1443 ± 2% -14.2% 1238 ± 4% uptime.boot > 26312 ± 5% -12.8% 22935 ± 5% uptime.idle > 8774783 ± 7% -19.0% 7104495 ± 8% turbostat.C1E > 10147966 ± 7% -18.4% 8280745 ± 8% turbostat.C6 > 3.225e+08 ± 2% -14.1% 2.77e+08 ± 4% turbostat.IRQ > 2.81 ± 24% +3.5 6.35 ± 24% turbostat.PKG_% > 638.24 +2.0% 650.74 turbostat.PkgWatt > 57.57 +10.9% 63.85 ± 2% turbostat.RAMWatt > 271.39 ± 2% -17.6% 223.53 ± 5% autonuma-benchmark.numa01.seconds > 1401 ± 2% -14.6% 1197 ± 4% autonuma-benchmark.time.elapsed_time > 1401 ± 2% -14.6% 1197 ± 4% autonuma-benchmark.time.elapsed_time.max > 1088153 ± 2% -14.1% 934904 ± 6% autonuma-benchmark.time.involuntary_context_switches > 3953 -2.6% 3852 ± 2% autonuma-benchmark.time.system_time > 287110 -14.5% 245511 ± 4% autonuma-benchmark.time.user_time > 22704 ± 7% +15.9% 26303 ± 8% autonuma-benchmark.time.voluntary_context_switches > 191.10 ± 64% +94.9% 372.49 ± 7% perf-sched.wait_and_delay.avg.ms.schedule_hrtimeout_range_clock.ep_poll.do_epoll_wait.__x64_sys_epoll_wait > 4.09 ± 49% +85.6% 7.59 ± 14% perf-sched.wait_and_delay.max.ms.__cond_resched.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write > 1.99 ± 40% +99.8% 3.97 ± 30% perf-sched.wait_time.avg.ms.__cond_resched.__kmem_cache_alloc_node.kmalloc_trace.vmstat_start.seq_read_iter > 14.18 ±158% -82.6% 2.47 ± 22% perf-sched.wait_time.avg.ms.__cond_resched.unmap_vmas.unmap_region.do_vmi_align_munmap.do_vmi_munmap > 189.39 ± 65% +96.5% 372.20 ± 7% perf-sched.wait_time.avg.ms.schedule_hrtimeout_range_clock.ep_poll.do_epoll_wait.__x64_sys_epoll_wait > 2.18 ± 21% -33.3% 1.46 ± 41% perf-sched.wait_time.avg.ms.wait_for_partner.fifo_open.do_dentry_open.do_open > 3.22 ± 32% -73.0% 0.87 ± 81% perf-sched.wait_time.max.ms.__cond_resched.__kmem_cache_alloc_node.kmalloc_trace.single_open.do_dentry_open > 4.73 ± 20% +60.6% 7.59 ± 14% perf-sched.wait_time.max.ms.__cond_resched.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write > 9.61 ± 30% -32.8% 6.46 ± 16% perf-sched.wait_time.max.ms.__cond_resched.lock_mm_and_find_vma.do_user_addr_fault.exc_page_fault.asm_exc_page_fault > 13.57 ± 65% -60.2% 5.40 ± 24% perf-sched.wait_time.max.ms.wait_for_partner.fifo_open.do_dentry_open.do_open > 6040567 -6.2% 5667640 proc-vmstat.numa_hit > 32278 ± 7% +51.7% 48955 ± 18% proc-vmstat.numa_huge_pte_updates > 4822780 -7.5% 4459553 proc-vmstat.numa_local > 3187796 ± 9% +73.2% 5521800 ± 16% 
proc-vmstat.numa_pages_migrated > 16792299 ± 7% +50.8% 25319315 ± 18% proc-vmstat.numa_pte_updates > 6242814 -8.5% 5711173 ± 2% proc-vmstat.pgfault > 3187796 ± 9% +73.2% 5521800 ± 16% proc-vmstat.pgmigrate_success > 254872 ± 2% -12.3% 223591 ± 5% proc-vmstat.pgreuse > 6151 ± 9% +74.2% 10717 ± 16% proc-vmstat.thp_migration_success > 4201550 -13.7% 3627350 ± 3% proc-vmstat.unevictable_pgs_scanned > 1.823e+08 ± 2% -15.2% 1.547e+08 ± 5% sched_debug.cfs_rq:/.avg_vruntime.avg > 1.872e+08 ± 2% -15.3% 1.585e+08 ± 5% sched_debug.cfs_rq:/.avg_vruntime.max > 1.423e+08 ± 4% -14.0% 1.224e+08 ± 3% sched_debug.cfs_rq:/.avg_vruntime.min > 4320209 ± 8% -18.1% 3537344 ± 8% sched_debug.cfs_rq:/.avg_vruntime.stddev > 3349 ± 40% +58.3% 5300 ± 27% sched_debug.cfs_rq:/.load_avg.max > 1.823e+08 ± 2% -15.2% 1.547e+08 ± 5% sched_debug.cfs_rq:/.min_vruntime.avg > 1.872e+08 ± 2% -15.3% 1.585e+08 ± 5% sched_debug.cfs_rq:/.min_vruntime.max > 1.423e+08 ± 4% -14.0% 1.224e+08 ± 3% sched_debug.cfs_rq:/.min_vruntime.min > 4320208 ± 8% -18.1% 3537344 ± 8% sched_debug.cfs_rq:/.min_vruntime.stddev > 1852009 ± 3% -13.2% 1607461 ± 2% sched_debug.cpu.avg_idle.avg > 751880 ± 2% -15.1% 638555 ± 4% sched_debug.cpu.avg_idle.stddev > 725827 ± 2% -14.1% 623617 ± 4% sched_debug.cpu.clock.avg > 726857 ± 2% -14.1% 624498 ± 4% sched_debug.cpu.clock.max > 724740 ± 2% -14.1% 622692 ± 4% sched_debug.cpu.clock.min > 717315 ± 2% -14.1% 616349 ± 4% sched_debug.cpu.clock_task.avg > 719648 ± 2% -14.1% 618089 ± 4% sched_debug.cpu.clock_task.max > 698681 ± 2% -14.2% 599424 ± 4% sched_debug.cpu.clock_task.min > 1839 ± 8% -18.1% 1506 ± 7% sched_debug.cpu.clock_task.stddev > 27352 -9.6% 24731 ± 2% sched_debug.cpu.curr->pid.max > 293258 ± 5% -16.4% 245303 ± 7% sched_debug.cpu.max_idle_balance_cost.stddev > -14.96 +73.6% -25.98 sched_debug.cpu.nr_uninterruptible.min > 6.27 ± 4% +18.7% 7.44 ± 6% sched_debug.cpu.nr_uninterruptible.stddev > 724723 ± 2% -14.1% 622678 ± 4% sched_debug.cpu_clk > 723514 ± 2% -14.1% 621468 ± 4% sched_debug.ktime > 725604 ± 2% -14.1% 623550 ± 4% sched_debug.sched_clk > 29.50 ± 3% +24.9% 36.83 ± 9% perf-stat.i.MPKI > 3.592e+08 +5.7% 3.797e+08 ± 2% perf-stat.i.branch-instructions > 1823514 +3.7% 1891464 perf-stat.i.branch-misses > 28542234 ± 3% +22.0% 34809605 ± 10% perf-stat.i.cache-misses > 72486859 ± 3% +19.6% 86713561 ± 7% perf-stat.i.cache-references > 224.48 +3.2% 231.63 perf-stat.i.cpu-migrations > 145250 ± 2% -10.8% 129549 ± 4% perf-stat.i.cycles-between-cache-misses > 0.08 ± 5% -0.0 0.07 ± 10% perf-stat.i.dTLB-load-miss-rate% > 272123 ± 6% -15.0% 231302 ± 10% perf-stat.i.dTLB-load-misses > 4.515e+08 +4.7% 4.729e+08 ± 2% perf-stat.i.dTLB-loads > 995784 +1.9% 1014848 perf-stat.i.dTLB-store-misses > 1.844e+08 +1.5% 1.871e+08 perf-stat.i.dTLB-stores > 1.711e+09 +5.0% 1.797e+09 ± 2% perf-stat.i.instructions > 3.25 +8.3% 3.52 ± 3% perf-stat.i.metric.M/sec > 4603 +6.7% 4912 ± 3% perf-stat.i.minor-faults > 488266 ± 2% +25.0% 610436 ± 6% perf-stat.i.node-load-misses > 618022 ± 4% +13.4% 701130 ± 5% perf-stat.i.node-loads > 4603 +6.7% 4912 ± 3% perf-stat.i.page-faults > 39.67 ± 2% +16.0% 46.04 ± 6% perf-stat.overall.MPKI > 375.84 -4.9% 357.36 ± 2% perf-stat.overall.cpi > 24383 ± 3% -19.0% 19742 ± 12% perf-stat.overall.cycles-between-cache-misses > 0.06 ± 7% -0.0 0.05 ± 10% perf-stat.overall.dTLB-load-miss-rate% > 0.00 +5.2% 0.00 ± 2% perf-stat.overall.ipc > 41.99 ± 2% +2.8 44.83 ± 4% perf-stat.overall.node-load-miss-rate% > 3.355e+08 +6.3% 3.567e+08 ± 2% perf-stat.ps.branch-instructions > 1758832 +4.4% 1835699 
perf-stat.ps.branch-misses > 24888631 ± 3% +25.6% 31268733 ± 12% perf-stat.ps.cache-misses > 64007362 ± 3% +22.5% 78424799 ± 8% perf-stat.ps.cache-references > 221.69 +3.0% 228.32 perf-stat.ps.cpu-migrations > 4.273e+08 +5.2% 4.495e+08 ± 2% perf-stat.ps.dTLB-loads > 992569 +1.8% 1010389 perf-stat.ps.dTLB-store-misses > 1.818e+08 +1.6% 1.847e+08 perf-stat.ps.dTLB-stores > 1.613e+09 +5.5% 1.701e+09 ± 2% perf-stat.ps.instructions > 4331 +7.2% 4644 ± 3% perf-stat.ps.minor-faults > 477740 ± 2% +26.3% 603330 ± 7% perf-stat.ps.node-load-misses > 660610 ± 5% +12.3% 741896 ± 6% perf-stat.ps.node-loads > 4331 +7.2% 4644 ± 3% perf-stat.ps.page-faults > 2.264e+12 -10.0% 2.038e+12 ± 3% perf-stat.total.instructions > 1.16 ± 20% -0.6 0.59 ± 47% perf-profile.calltrace.cycles-pp.worker_thread.kthread.ret_from_fork.ret_from_fork_asm > 1.07 ± 20% -0.5 0.54 ± 47% perf-profile.calltrace.cycles-pp.process_one_work.worker_thread.kthread.ret_from_fork.ret_from_fork_asm > 1.96 ± 25% -0.7 1.27 ± 23% perf-profile.children.cycles-pp.task_mm_cid_work > 1.16 ± 20% -0.5 0.67 ± 19% perf-profile.children.cycles-pp.worker_thread > 1.07 ± 20% -0.5 0.61 ± 21% perf-profile.children.cycles-pp.process_one_work > 0.84 ± 44% -0.4 0.43 ± 25% perf-profile.children.cycles-pp.evlist__id2evsel > 0.58 ± 34% -0.2 0.33 ± 21% perf-profile.children.cycles-pp.do_mprotect_pkey > 0.54 ± 26% -0.2 0.30 ± 23% perf-profile.children.cycles-pp.drm_fb_helper_damage_work > 0.54 ± 26% -0.2 0.30 ± 23% perf-profile.children.cycles-pp.drm_fbdev_generic_helper_fb_dirty > 0.58 ± 34% -0.2 0.34 ± 22% perf-profile.children.cycles-pp.__x64_sys_mprotect > 0.34 ± 23% -0.2 0.12 ± 64% perf-profile.children.cycles-pp.drm_gem_vmap_unlocked > 0.34 ± 23% -0.2 0.12 ± 64% perf-profile.children.cycles-pp.drm_gem_vmap > 0.34 ± 23% -0.2 0.12 ± 64% perf-profile.children.cycles-pp.drm_gem_shmem_object_vmap > 0.34 ± 23% -0.2 0.12 ± 64% perf-profile.children.cycles-pp.drm_gem_shmem_vmap_locked > 0.55 ± 32% -0.2 0.33 ± 18% perf-profile.children.cycles-pp.__wp_page_copy_user > 0.50 ± 35% -0.2 0.28 ± 21% perf-profile.children.cycles-pp.mprotect_fixup > 0.28 ± 25% -0.2 0.08 ±101% perf-profile.children.cycles-pp.drm_gem_shmem_get_pages_locked > 0.28 ± 25% -0.2 0.08 ±101% perf-profile.children.cycles-pp.drm_gem_get_pages > 0.28 ± 25% -0.2 0.08 ±102% perf-profile.children.cycles-pp.shmem_read_folio_gfp > 0.28 ± 25% -0.2 0.08 ±102% perf-profile.children.cycles-pp.drm_gem_shmem_get_pages > 0.62 ± 15% -0.2 0.43 ± 16% perf-profile.children.cycles-pp.try_to_wake_up > 0.25 ± 19% -0.2 0.08 ± 84% perf-profile.children.cycles-pp.drm_client_buffer_vmap > 0.44 ± 19% -0.2 0.28 ± 31% perf-profile.children.cycles-pp.filemap_get_entry > 0.39 ± 14% -0.1 0.26 ± 22% perf-profile.children.cycles-pp.perf_event_mmap > 0.38 ± 13% -0.1 0.25 ± 23% perf-profile.children.cycles-pp.perf_event_mmap_event > 0.22 ± 22% -0.1 0.11 ± 25% perf-profile.children.cycles-pp.lru_add_drain_cpu > 0.24 ± 21% -0.1 0.14 ± 36% perf-profile.children.cycles-pp.do_open_execat > 0.24 ± 13% -0.1 0.14 ± 42% perf-profile.children.cycles-pp.arch_do_signal_or_restart > 0.22 ± 30% -0.1 0.13 ± 10% perf-profile.children.cycles-pp.wake_up_q > 0.14 ± 17% -0.1 0.05 ±101% perf-profile.children.cycles-pp.open_exec > 0.16 ± 21% -0.1 0.07 ± 51% perf-profile.children.cycles-pp.path_init > 0.23 ± 30% -0.1 0.15 ± 22% perf-profile.children.cycles-pp.ttwu_do_activate > 0.26 ± 11% -0.1 0.18 ± 20% perf-profile.children.cycles-pp.perf_iterate_sb > 0.14 ± 50% -0.1 0.07 ± 12% perf-profile.children.cycles-pp.security_inode_getattr > 0.18 ± 27% -0.1 0.11 ± 
20% perf-profile.children.cycles-pp.select_task_rq > 0.14 ± 21% -0.1 0.08 ± 29% perf-profile.children.cycles-pp.get_unmapped_area > 0.10 ± 19% -0.1 0.04 ± 73% perf-profile.children.cycles-pp.expand_downwards > 0.18 ± 16% -0.1 0.13 ± 26% perf-profile.children.cycles-pp.__d_alloc > 0.09 ± 15% -0.1 0.04 ± 71% perf-profile.children.cycles-pp.anon_vma_clone > 0.13 ± 36% -0.1 0.08 ± 19% perf-profile.children.cycles-pp.file_free_rcu > 0.08 ± 23% -0.0 0.03 ±101% perf-profile.children.cycles-pp.__legitimize_mnt > 0.09 ± 15% -0.0 0.04 ± 45% perf-profile.children.cycles-pp.__pipe > 1.92 ± 26% -0.7 1.24 ± 23% perf-profile.self.cycles-pp.task_mm_cid_work > 0.82 ± 43% -0.4 0.42 ± 24% perf-profile.self.cycles-pp.evlist__id2evsel > 0.42 ± 39% -0.2 0.22 ± 19% perf-profile.self.cycles-pp.evsel__read_counter > 0.27 ± 24% -0.2 0.10 ± 56% perf-profile.self.cycles-pp.filemap_get_entry > 0.15 ± 48% -0.1 0.06 ± 11% perf-profile.self.cycles-pp.ksys_read > 0.10 ± 34% -0.1 0.03 ±101% perf-profile.self.cycles-pp.enqueue_task_fair > 0.13 ± 36% -0.1 0.08 ± 19% perf-profile.self.cycles-pp.file_free_rcu > > > *************************************************************************************************** > lkp-csl-2sp3: 96 threads 2 sockets Intel(R) Xeon(R) Platinum 8260L CPU @ 2.40GHz (Cascade Lake) with 128G memory > ========================================================================================= > compiler/cpufreq_governor/iterations/kconfig/rootfs/tbox_group/test/testcase: > gcc-12/performance/4x/x86_64-rhel-8.3/debian-11.1-x86_64-20220510.cgz/lkp-csl-2sp3/numa01_THREAD_ALLOC/autonuma-benchmark > > commit: > fc769221b2 ("sched/numa: Remove unconditional scan logic using mm numa_scan_seq") > 167773d1dd ("sched/numa: Increase tasks' access history") > > fc769221b23064c0 167773d1ddb5ffdd944f851f2cb > ---------------- --------------------------- > %stddev %change %stddev > \ | \ > 2.309e+10 ± 6% -27.8% 1.668e+10 ± 5% cpuidle..time > 23855797 ± 6% -27.9% 17210884 ± 5% cpuidle..usage > 2514 -11.9% 2215 uptime.boot > 27543 ± 5% -23.1% 21189 ± 5% uptime.idle > 9.80 ± 5% -1.8 8.05 ± 6% mpstat.cpu.all.idle% > 0.01 ± 6% +0.0 0.01 ± 17% mpstat.cpu.all.iowait% > 0.08 -0.0 0.07 ± 2% mpstat.cpu.all.soft% > 845597 ± 12% -26.1% 624549 ± 19% numa-numastat.node0.other_node > 2990301 ± 6% -13.1% 2598273 ± 4% numa-numastat.node1.local_node > 471614 ± 21% +45.0% 684016 ± 18% numa-numastat.node1.other_node > 845597 ± 12% -26.1% 624549 ± 19% numa-vmstat.node0.numa_other > 4073 ±106% -82.5% 711.67 ± 23% numa-vmstat.node1.nr_mapped > 2989568 ± 6% -13.1% 2597798 ± 4% numa-vmstat.node1.numa_local > 471614 ± 21% +45.0% 684016 ± 18% numa-vmstat.node1.numa_other > 375.07 ± 4% -15.4% 317.31 ± 2% autonuma-benchmark.numa01.seconds > 2462 -12.2% 2162 autonuma-benchmark.time.elapsed_time > 2462 -12.2% 2162 autonuma-benchmark.time.elapsed_time.max > 1354545 -12.9% 1179617 autonuma-benchmark.time.involuntary_context_switches > 3212023 -6.5% 3001966 autonuma-benchmark.time.minor_page_faults > 8377 +2.3% 8572 autonuma-benchmark.time.percent_of_cpu_this_job_got > 199714 -10.4% 179020 autonuma-benchmark.time.user_time > 50675 ± 8% -19.0% 41038 ± 12% turbostat.C1 > 183835 ± 7% -17.6% 151526 ± 6% turbostat.C1E > 23556011 ± 6% -28.0% 16965247 ± 5% turbostat.C6 > 9.72 ± 5% -1.7 7.99 ± 6% turbostat.C6% > 9.54 ± 6% -18.1% 7.81 ± 6% turbostat.CPU%c1 > 2.404e+08 -12.0% 2.116e+08 turbostat.IRQ > 280.51 +1.2% 283.99 turbostat.PkgWatt > 63.94 +6.7% 68.23 turbostat.RAMWatt > 282375 ± 3% -9.8% 254565 ± 7% proc-vmstat.numa_hint_faults > 217705 ± 6% -12.6% 
190234 ± 8% proc-vmstat.numa_hint_faults_local > 7081835 -7.9% 6524239 proc-vmstat.numa_hit > 107927 ± 10% +16.6% 125887 proc-vmstat.numa_huge_pte_updates > 5764595 -9.5% 5215673 proc-vmstat.numa_local > 7379523 ± 15% +25.7% 9272505 ± 4% proc-vmstat.numa_pages_migrated > 55530575 ± 10% +16.5% 64669707 proc-vmstat.numa_pte_updates > 8852860 -9.3% 8028738 proc-vmstat.pgfault > 7379523 ± 15% +25.7% 9272505 ± 4% proc-vmstat.pgmigrate_success > 393902 -9.6% 356099 proc-vmstat.pgreuse > 14358 ± 15% +25.8% 18064 ± 5% proc-vmstat.thp_migration_success > 18273792 -11.5% 16166144 proc-vmstat.unevictable_pgs_scanned > 1.45e+08 -8.7% 1.325e+08 sched_debug.cfs_rq:/.avg_vruntime.max > 3995873 -14.0% 3437625 ± 2% sched_debug.cfs_rq:/.avg_vruntime.stddev > 0.23 ± 3% -8.6% 0.21 ± 6% sched_debug.cfs_rq:/.h_nr_running.stddev > 1.45e+08 -8.7% 1.325e+08 sched_debug.cfs_rq:/.min_vruntime.max > 3995873 -14.0% 3437625 ± 2% sched_debug.cfs_rq:/.min_vruntime.stddev > 0.53 ± 71% +195.0% 1.56 ± 37% sched_debug.cfs_rq:/.removed.load_avg.avg > 25.54 ± 2% +13.0% 28.87 sched_debug.cfs_rq:/.removed.load_avg.max > 3.40 ± 35% +85.6% 6.32 ± 17% sched_debug.cfs_rq:/.removed.load_avg.stddev > 0.16 ± 74% +275.6% 0.59 ± 39% sched_debug.cfs_rq:/.removed.runnable_avg.avg > 8.03 ± 31% +84.9% 14.84 sched_debug.cfs_rq:/.removed.runnable_avg.max > 1.02 ± 44% +154.3% 2.59 ± 16% sched_debug.cfs_rq:/.removed.runnable_avg.stddev > 0.16 ± 74% +275.6% 0.59 ± 39% sched_debug.cfs_rq:/.removed.util_avg.avg > 8.03 ± 31% +84.9% 14.84 sched_debug.cfs_rq:/.removed.util_avg.max > 1.02 ± 44% +154.3% 2.59 ± 16% sched_debug.cfs_rq:/.removed.util_avg.stddev > 146.33 ± 4% -12.0% 128.80 ± 8% sched_debug.cfs_rq:/.util_avg.stddev > 361281 ± 5% -13.6% 312127 ± 3% sched_debug.cpu.avg_idle.stddev > 1229022 -9.9% 1107544 sched_debug.cpu.clock.avg > 1229436 -9.9% 1107919 sched_debug.cpu.clock.max > 1228579 -9.9% 1107137 sched_debug.cpu.clock.min > 248.12 ± 6% -8.9% 226.15 ± 2% sched_debug.cpu.clock.stddev > 1201071 -9.7% 1084858 sched_debug.cpu.clock_task.avg > 1205361 -9.7% 1088445 sched_debug.cpu.clock_task.max > 1190139 -9.7% 1074355 sched_debug.cpu.clock_task.min > 156325 ± 4% -21.3% 123055 ± 3% sched_debug.cpu.max_idle_balance_cost.stddev > 0.00 ± 5% -8.8% 0.00 ± 2% sched_debug.cpu.next_balance.stddev > 0.23 ± 3% -6.9% 0.21 ± 4% sched_debug.cpu.nr_running.stddev > 22855 -11.9% 20146 ± 2% sched_debug.cpu.nr_switches.avg > 0.00 ± 74% +301.6% 0.00 ± 41% sched_debug.cpu.nr_uninterruptible.avg > -20.99 +50.9% -31.67 sched_debug.cpu.nr_uninterruptible.min > 1228564 -9.9% 1107124 sched_debug.cpu_clk > 1227997 -9.9% 1106556 sched_debug.ktime > 0.00 ± 70% +66.1% 0.00 sched_debug.rt_rq:.rt_nr_migratory.avg > 0.02 ± 70% +66.1% 0.03 sched_debug.rt_rq:.rt_nr_migratory.max > 0.00 ± 70% +66.1% 0.00 sched_debug.rt_rq:.rt_nr_migratory.stddev > 0.00 ± 70% +66.1% 0.00 sched_debug.rt_rq:.rt_nr_running.avg > 0.02 ± 70% +66.1% 0.03 sched_debug.rt_rq:.rt_nr_running.max > 0.00 ± 70% +66.1% 0.00 sched_debug.rt_rq:.rt_nr_running.stddev > 1229125 -9.9% 1107673 sched_debug.sched_clk > 36.73 +9.2% 40.12 perf-stat.i.MPKI > 1.156e+08 +0.9% 1.166e+08 perf-stat.i.branch-instructions > 1.41 +0.1 1.49 perf-stat.i.branch-miss-rate% > 1755317 +6.4% 1868497 perf-stat.i.branch-misses > 65.90 +2.6 68.53 perf-stat.i.cache-miss-rate% > 13292768 +13.0% 15016556 perf-stat.i.cache-misses > 20180664 +9.2% 22041180 perf-stat.i.cache-references > 1620 -2.0% 1588 perf-stat.i.context-switches > 492.61 +2.2% 503.60 perf-stat.i.cpi > 2.624e+11 +2.3% 2.685e+11 perf-stat.i.cpu-cycles > 20261 -9.6% 18315 
perf-stat.i.cycles-between-cache-misses > 0.08 ± 5% -0.0 0.07 perf-stat.i.dTLB-load-miss-rate% > 114641 ± 5% -6.6% 107104 perf-stat.i.dTLB-load-misses > 0.24 +0.0 0.25 perf-stat.i.dTLB-store-miss-rate% > 202887 +3.4% 209829 perf-stat.i.dTLB-store-misses > 479259 ± 2% -9.8% 432243 ± 6% perf-stat.i.iTLB-load-misses > 272948 ± 5% -16.4% 228065 ± 3% perf-stat.i.iTLB-loads > 5.888e+08 +0.8% 5.938e+08 perf-stat.i.instructions > 1349 +15.8% 1561 ± 2% perf-stat.i.instructions-per-iTLB-miss > 2.73 +2.3% 2.80 perf-stat.i.metric.GHz > 3510 +2.9% 3612 perf-stat.i.minor-faults > 302696 ± 4% +8.0% 327055 perf-stat.i.node-load-misses > 5025469 ± 3% +16.0% 5831348 ± 2% perf-stat.i.node-store-misses > 6419781 +11.7% 7171575 perf-stat.i.node-stores > 3510 +2.9% 3613 perf-stat.i.page-faults > 34.43 +8.1% 37.21 perf-stat.overall.MPKI > 1.51 +0.1 1.59 perf-stat.overall.branch-miss-rate% > 66.31 +2.2 68.53 perf-stat.overall.cache-miss-rate% > 19793 -9.3% 17950 perf-stat.overall.cycles-between-cache-misses > 0.07 ± 5% -0.0 0.07 perf-stat.overall.dTLB-load-miss-rate% > 0.23 +0.0 0.24 perf-stat.overall.dTLB-store-miss-rate% > 1227 ± 2% +12.1% 1376 ± 6% perf-stat.overall.instructions-per-iTLB-miss > 1729818 +6.4% 1840962 perf-stat.ps.branch-misses > 13346402 +12.6% 15031113 perf-stat.ps.cache-misses > 20127330 +9.0% 21934543 perf-stat.ps.cache-references > 1624 -2.1% 1590 perf-stat.ps.context-switches > 2.641e+11 +2.1% 2.698e+11 perf-stat.ps.cpu-cycles > 113287 ± 5% -6.8% 105635 perf-stat.ps.dTLB-load-misses > 203569 +3.2% 210036 perf-stat.ps.dTLB-store-misses > 476376 ± 2% -9.8% 429901 ± 6% perf-stat.ps.iTLB-load-misses > 259293 ± 5% -16.3% 217088 ± 3% perf-stat.ps.iTLB-loads > 3465 +3.1% 3571 perf-stat.ps.minor-faults > 299695 ± 4% +8.3% 324433 perf-stat.ps.node-load-misses > 5044747 ± 3% +15.7% 5834322 ± 2% perf-stat.ps.node-store-misses > 6459846 +11.3% 7189821 perf-stat.ps.node-stores > 3465 +3.1% 3571 perf-stat.ps.page-faults > 1.44e+12 -11.4% 1.275e+12 perf-stat.total.instructions > 0.47 ± 58% +593.5% 3.27 ± 81% perf-sched.sch_delay.avg.ms.__cond_resched.__wait_for_common.wait_for_completion_state.kernel_clone.__x64_sys_vfork > 0.37 ±124% +352.3% 1.67 ± 58% perf-sched.sch_delay.avg.ms.__cond_resched.copy_strings.isra.0.do_execveat_common > 0.96 ± 74% -99.0% 0.01 ±141% perf-sched.sch_delay.avg.ms.__cond_resched.dput.step_into.link_path_walk.part > 2.01 ± 79% -100.0% 0.00 perf-sched.sch_delay.avg.ms.__cond_resched.kmem_cache_alloc.vm_area_alloc.__install_special_mapping.map_vdso > 1.35 ± 72% -69.8% 0.41 ± 80% perf-sched.sch_delay.avg.ms.__cond_resched.kmem_cache_alloc.vm_area_dup.dup_mmap.dup_mm > 0.17 ± 18% -26.5% 0.13 ± 5% perf-sched.sch_delay.avg.ms.rcu_gp_kthread.kthread.ret_from_fork.ret_from_fork_asm > 0.26 ± 16% -39.0% 0.16 ± 7% perf-sched.sch_delay.avg.ms.schedule_timeout.kcompactd.kthread.ret_from_fork > 2.57 ± 65% +1027.2% 28.92 ±120% perf-sched.sch_delay.max.ms.__cond_resched.__wait_for_common.wait_for_completion_state.kernel_clone.__x64_sys_vfork > 0.38 ±119% +669.3% 2.92 ± 19% perf-sched.sch_delay.max.ms.__cond_resched.copy_strings.isra.0.do_execveat_common > 0.51 ±141% +234.9% 1.71 ± 69% perf-sched.sch_delay.max.ms.__cond_resched.down_write_killable.vm_mmap_pgoff.elf_map.load_elf_binary > 1.63 ± 74% -98.9% 0.02 ±141% perf-sched.sch_delay.max.ms.__cond_resched.dput.step_into.link_path_walk.part > 3.38 ± 12% -55.7% 1.50 ± 78% perf-sched.sch_delay.max.ms.__cond_resched.filemap_read.__kernel_read.search_binary_handler.exec_binprm > 2.37 ± 68% -100.0% 0.00 
perf-sched.sch_delay.max.ms.__cond_resched.kmem_cache_alloc.vm_area_alloc.__install_special_mapping.map_vdso > 2.05 ± 62% -68.1% 0.65 ± 93% perf-sched.sch_delay.max.ms.__cond_resched.kmem_cache_alloc.vm_area_dup.dup_mmap.dup_mm > 9.09 ±119% -96.0% 0.36 ± 42% perf-sched.sch_delay.max.ms.devkmsg_read.vfs_read.ksys_read.do_syscall_64 > 3.86 ± 40% -50.1% 1.93 ± 30% perf-sched.sch_delay.max.ms.schedule_hrtimeout_range_clock.do_select.core_sys_select.kern_select > 2.77 ± 78% -88.0% 0.33 ± 29% perf-sched.sch_delay.max.ms.schedule_hrtimeout_range_clock.ep_poll.do_epoll_wait.__x64_sys_epoll_wait > 2.48 ± 60% -86.1% 0.34 ± 7% perf-sched.sch_delay.max.ms.schedule_timeout.kcompactd.kthread.ret_from_fork > 85.92 ± 73% +97.7% 169.86 ± 31% perf-sched.sch_delay.max.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread > 95.98 ± 6% -9.5% 86.82 ± 4% perf-sched.total_wait_and_delay.average.ms > 95.30 ± 6% -9.6% 86.19 ± 4% perf-sched.total_wait_time.average.ms > 725.88 ± 28% -73.5% 192.63 ±141% perf-sched.wait_and_delay.avg.ms.__cond_resched.copy_pte_range.copy_p4d_range.copy_page_range.dup_mmap > 2.22 ± 42% -76.2% 0.53 ±141% perf-sched.wait_and_delay.avg.ms.__cond_resched.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write > 4.02 ± 5% -31.9% 2.74 ± 19% perf-sched.wait_and_delay.avg.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt > 653.51 ± 9% -13.3% 566.43 ± 7% perf-sched.wait_and_delay.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm > 775.33 ± 4% -19.8% 621.67 ± 13% perf-sched.wait_and_delay.count.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi > 88.33 ± 14% -16.6% 73.67 ± 11% perf-sched.wait_and_delay.count.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone > 6.28 ± 19% -73.5% 1.67 ±141% perf-sched.wait_and_delay.max.ms.__cond_resched.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write > 1286 ± 3% -65.6% 442.66 ± 91% perf-sched.wait_and_delay.max.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt > 222.90 ± 16% +53.8% 342.84 ± 30% perf-sched.wait_and_delay.max.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread > 0.91 ± 70% +7745.7% 71.06 ±129% perf-sched.wait_time.avg.ms.__cond_resched.__kmem_cache_alloc_node.kmalloc_trace.vmstat_start.seq_read_iter > 21.65 ± 34% +42.0% 30.75 ± 12% perf-sched.wait_time.avg.ms.__cond_resched.__wait_for_common.wait_for_completion_state.kernel_clone.__x64_sys_vfork > 2.67 ± 26% -96.6% 0.09 ±141% perf-sched.wait_time.avg.ms.__cond_resched.change_pmd_range.change_p4d_range.change_protection_range.mprotect_fixup > 725.14 ± 28% -73.5% 192.24 ±141% perf-sched.wait_time.avg.ms.__cond_resched.copy_pte_range.copy_p4d_range.copy_page_range.dup_mmap > 2.87 ± 28% -96.7% 0.09 ± 77% perf-sched.wait_time.avg.ms.__cond_resched.dput.open_last_lookups.path_openat.do_filp_open > 2.10 ± 73% +4020.9% 86.55 ±135% perf-sched.wait_time.avg.ms.__cond_resched.dput.step_into.open_last_lookups.path_openat > 1.96 ± 73% -94.8% 0.10 ±141% perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc.getname_flags.part.0 > 3.24 ± 21% -65.0% 1.13 ± 69% perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc.mas_alloc_nodes.mas_preallocate.mmap_region > 338.18 ±140% -100.0% 0.07 ±141% perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc.prepare_creds.copy_creds.copy_process > 21.80 ±122% -94.7% 1.16 ±130% 
perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc.vm_area_dup.__split_vma.do_vmi_align_munmap > 4.29 ± 11% -66.2% 1.45 ±118% perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc.vm_area_dup.__split_vma.mprotect_fixup > 0.94 ±126% -100.0% 0.00 perf-sched.wait_time.avg.ms.__cond_resched.mutex_lock.pipe_write.vfs_write.ksys_write > 3.69 ± 29% -72.9% 1.00 ±141% perf-sched.wait_time.avg.ms.__cond_resched.task_work_run.do_exit.do_group_exit.__x64_sys_exit_group > 0.04 ±141% +6192.3% 2.73 ± 63% perf-sched.wait_time.avg.ms.__cond_resched.task_work_run.exit_to_user_mode_loop.exit_to_user_mode_prepare.syscall_exit_to_user_mode > 32.86 ±128% -95.2% 1.57 ± 12% perf-sched.wait_time.avg.ms.do_nanosleep.hrtimer_nanosleep.common_nsleep.__x64_sys_clock_nanosleep > 3.96 ± 5% -33.0% 2.66 ± 19% perf-sched.wait_time.avg.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt > 7.38 ± 57% -89.8% 0.75 ± 88% perf-sched.wait_time.avg.ms.schedule_timeout.khugepaged_wait_work.khugepaged.kthread > 643.25 ± 9% -12.8% 560.82 ± 8% perf-sched.wait_time.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm > 2.22 ± 74% +15121.1% 338.52 ±138% perf-sched.wait_time.max.ms.__cond_resched.__kmem_cache_alloc_node.kmalloc_trace.vmstat_start.seq_read_iter > 4.97 ± 39% -98.2% 0.09 ±141% perf-sched.wait_time.max.ms.__cond_resched.change_pmd_range.change_p4d_range.change_protection_range.mprotect_fixup > 3.98 -96.1% 0.16 ± 94% perf-sched.wait_time.max.ms.__cond_resched.dput.open_last_lookups.path_openat.do_filp_open > 4.28 ± 3% -66.5% 1.44 ±126% perf-sched.wait_time.max.ms.__cond_resched.dput.terminate_walk.path_openat.do_filp_open > 3.95 ± 14% +109.8% 8.28 ± 45% perf-sched.wait_time.max.ms.__cond_resched.exit_mmap.__mmput.exit_mm.do_exit > 2.04 ± 74% -95.0% 0.10 ±141% perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc.getname_flags.part.0 > 340.63 ±140% -100.0% 0.12 ±141% perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc.prepare_creds.copy_creds.copy_process > 4.74 ± 22% -68.4% 1.50 ±117% perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc.vm_area_dup.__split_vma.mprotect_fixup > 1.30 ±141% +205.8% 3.99 perf-sched.wait_time.max.ms.__cond_resched.mmput.m_stop.seq_read_iter.seq_read > 1.42 ±131% -100.0% 0.00 perf-sched.wait_time.max.ms.__cond_resched.mutex_lock.pipe_write.vfs_write.ksys_write > 337.62 ±140% -99.6% 1.33 ±141% perf-sched.wait_time.max.ms.__cond_resched.slab_pre_alloc_hook.constprop.0.kmem_cache_alloc_lru > 4.91 ± 27% +4797.8% 240.69 ± 69% perf-sched.wait_time.max.ms.__cond_resched.stop_one_cpu.sched_exec.bprm_execve.part > 4.29 ± 7% -76.7% 1.00 ±141% perf-sched.wait_time.max.ms.__cond_resched.task_work_run.do_exit.do_group_exit.__x64_sys_exit_group > 0.05 ±141% +5358.6% 2.77 ± 61% perf-sched.wait_time.max.ms.__cond_resched.task_work_run.exit_to_user_mode_loop.exit_to_user_mode_prepare.syscall_exit_to_user_mode > 338.90 ±138% -98.8% 3.95 perf-sched.wait_time.max.ms.do_nanosleep.hrtimer_nanosleep.common_nsleep.__x64_sys_clock_nanosleep > 1284 ± 3% -68.7% 401.56 ±106% perf-sched.wait_time.max.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt > 7.38 ± 57% -89.8% 0.75 ± 88% perf-sched.wait_time.max.ms.schedule_timeout.khugepaged_wait_work.khugepaged.kthread > 20.80 ± 72% -20.8 0.00 perf-profile.calltrace.cycles-pp.__cmd_record > 20.80 ± 72% -20.8 0.00 perf-profile.calltrace.cycles-pp.record__finish_output.__cmd_record > 20.78 ± 72% -20.8 0.00 
perf-profile.calltrace.cycles-pp.perf_session__process_events.record__finish_output.__cmd_record > 20.74 ± 72% -20.7 0.00 perf-profile.calltrace.cycles-pp.reader__read_event.perf_session__process_events.record__finish_output.__cmd_record > 20.43 ± 72% -20.4 0.00 perf-profile.calltrace.cycles-pp.process_simple.reader__read_event.perf_session__process_events.record__finish_output.__cmd_record > 20.03 ± 72% -20.0 0.00 perf-profile.calltrace.cycles-pp.ordered_events__queue.process_simple.reader__read_event.perf_session__process_events.record__finish_output > 19.84 ± 72% -19.8 0.00 perf-profile.calltrace.cycles-pp.queue_event.ordered_events__queue.process_simple.reader__read_event.perf_session__process_events > 0.77 ± 26% +0.2 1.00 ± 13% perf-profile.calltrace.cycles-pp.do_open.path_openat.do_filp_open.do_sys_openat2.__x64_sys_openat > 0.73 ± 26% +0.3 1.00 ± 21% perf-profile.calltrace.cycles-pp.seq_read_iter.vfs_read.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe > 0.74 ± 18% +0.3 1.07 ± 19% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.write > 0.73 ± 18% +0.3 1.07 ± 19% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.write > 0.78 ± 36% +0.3 1.11 ± 19% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__fxstat64 > 0.44 ± 73% +0.3 0.77 ± 14% perf-profile.calltrace.cycles-pp.do_dentry_open.do_open.path_openat.do_filp_open.do_sys_openat2 > 0.78 ± 36% +0.3 1.12 ± 19% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__fxstat64 > 0.76 ± 17% +0.3 1.10 ± 19% perf-profile.calltrace.cycles-pp.write > 0.81 ± 34% +0.4 1.16 ± 16% perf-profile.calltrace.cycles-pp.__fxstat64 > 0.96 ± 33% +0.4 1.35 ± 15% perf-profile.calltrace.cycles-pp.__x64_sys_openat.do_syscall_64.entry_SYSCALL_64_after_hwframe > 0.96 ± 33% +0.4 1.35 ± 15% perf-profile.calltrace.cycles-pp.do_sys_openat2.__x64_sys_openat.do_syscall_64.entry_SYSCALL_64_after_hwframe > 0.18 ±141% +0.4 0.60 ± 13% perf-profile.calltrace.cycles-pp.walk_component.link_path_walk.path_openat.do_filp_open.do_sys_openat2 > 1.00 ± 28% +0.4 1.43 ± 6% perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.__close_nocancel > 0.22 ±141% +0.4 0.65 ± 18% perf-profile.calltrace.cycles-pp.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write > 0.47 ± 76% +0.5 0.93 ± 10% perf-profile.calltrace.cycles-pp.mm_init.alloc_bprm.do_execveat_common.__x64_sys_execve.do_syscall_64 > 0.42 ± 73% +0.5 0.90 ± 23% perf-profile.calltrace.cycles-pp.ksys_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe.__mmap > 1.14 ± 29% +0.5 1.62 ± 7% perf-profile.calltrace.cycles-pp.__close_nocancel > 0.41 ± 73% +0.5 0.90 ± 23% perf-profile.calltrace.cycles-pp.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe.__mmap > 1.10 ± 28% +0.5 1.59 ± 8% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__close_nocancel > 1.10 ± 28% +0.5 1.59 ± 8% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__close_nocancel > 1.13 ± 19% +0.5 1.66 ± 17% perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.read > 0.58 ± 77% +0.5 1.12 ± 8% perf-profile.calltrace.cycles-pp.alloc_bprm.do_execveat_common.__x64_sys_execve.do_syscall_64.entry_SYSCALL_64_after_hwframe > 0.22 ±141% +0.5 0.77 ± 18% perf-profile.calltrace.cycles-pp.lookup_fast.open_last_lookups.path_openat.do_filp_open.do_sys_openat2 > 0.27 ±141% +0.5 0.82 ± 20% 
perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.__fxstat64 > 0.00 +0.6 0.56 ± 9% perf-profile.calltrace.cycles-pp.lookup_fast.walk_component.link_path_walk.path_openat.do_filp_open > 0.22 ±141% +0.6 0.85 ± 18% perf-profile.calltrace.cycles-pp.open_last_lookups.path_openat.do_filp_open.do_sys_openat2.__x64_sys_openat > 1.03 ± 71% +5.3 6.34 ± 64% perf-profile.calltrace.cycles-pp.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary > 1.04 ± 71% +5.3 6.37 ± 64% perf-profile.calltrace.cycles-pp.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify > 1.07 ± 71% +5.4 6.47 ± 63% perf-profile.calltrace.cycles-pp.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify > 1.07 ± 71% +5.4 6.47 ± 63% perf-profile.calltrace.cycles-pp.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify > 1.07 ± 71% +5.4 6.47 ± 63% perf-profile.calltrace.cycles-pp.start_secondary.secondary_startup_64_no_verify > 1.00 ± 71% +5.5 6.50 ± 57% perf-profile.calltrace.cycles-pp.intel_idle.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle > 1.03 ± 71% +5.6 6.61 ± 58% perf-profile.calltrace.cycles-pp.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry > 1.07 ± 71% +5.7 6.74 ± 57% perf-profile.calltrace.cycles-pp.secondary_startup_64_no_verify > 1.38 ± 78% +6.2 7.53 ± 41% perf-profile.calltrace.cycles-pp.copy_page.folio_copy.migrate_folio_extra.move_to_new_folio.migrate_pages_batch > 1.44 ± 80% +6.2 7.63 ± 41% perf-profile.calltrace.cycles-pp.folio_copy.migrate_folio_extra.move_to_new_folio.migrate_pages_batch.migrate_pages > 1.44 ± 80% +6.2 7.67 ± 41% perf-profile.calltrace.cycles-pp.move_to_new_folio.migrate_pages_batch.migrate_pages.migrate_misplaced_page.do_huge_pmd_numa_page > 1.44 ± 80% +6.2 7.67 ± 41% perf-profile.calltrace.cycles-pp.migrate_folio_extra.move_to_new_folio.migrate_pages_batch.migrate_pages.migrate_misplaced_page > 1.52 ± 78% +6.5 8.07 ± 41% perf-profile.calltrace.cycles-pp.migrate_pages_batch.migrate_pages.migrate_misplaced_page.do_huge_pmd_numa_page.__handle_mm_fault > 1.52 ± 78% +6.5 8.07 ± 41% perf-profile.calltrace.cycles-pp.migrate_pages.migrate_misplaced_page.do_huge_pmd_numa_page.__handle_mm_fault.handle_mm_fault > 1.52 ± 78% +6.6 8.08 ± 41% perf-profile.calltrace.cycles-pp.migrate_misplaced_page.do_huge_pmd_numa_page.__handle_mm_fault.handle_mm_fault.do_user_addr_fault > 1.53 ± 78% +6.6 8.14 ± 41% perf-profile.calltrace.cycles-pp.do_huge_pmd_numa_page.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault > 5.22 ± 49% +7.3 12.52 ± 23% perf-profile.calltrace.cycles-pp.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault > 5.49 ± 48% +7.5 12.98 ± 22% perf-profile.calltrace.cycles-pp.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault > 6.00 ± 47% +7.6 13.57 ± 20% perf-profile.calltrace.cycles-pp.exc_page_fault.asm_exc_page_fault > 5.97 ± 48% +7.6 13.55 ± 20% perf-profile.calltrace.cycles-pp.do_user_addr_fault.exc_page_fault.asm_exc_page_fault > 6.99 ± 45% +7.8 14.80 ± 19% perf-profile.calltrace.cycles-pp.asm_exc_page_fault > 20.83 ± 73% -20.8 0.00 perf-profile.children.cycles-pp.queue_event > 20.80 ± 72% -20.8 0.00 perf-profile.children.cycles-pp.record__finish_output > 20.78 ± 72% -20.8 0.00 perf-profile.children.cycles-pp.perf_session__process_events > 20.75 ± 72% -20.8 0.00 perf-profile.children.cycles-pp.reader__read_event > 20.43 ± 72% -20.4 0.00 
> 20.03 ± 72% -20.0 0.00 perf-profile.children.cycles-pp.ordered_events__queue
> 0.37 ± 14% -0.1 0.26 ± 15% perf-profile.children.cycles-pp.rebalance_domains
> 0.11 ± 8% -0.1 0.06 ± 75% perf-profile.children.cycles-pp.wake_up_q
> 0.13 ± 7% +0.0 0.15 ± 13% perf-profile.children.cycles-pp.get_unmapped_area
> 0.05 +0.0 0.08 ± 22% perf-profile.children.cycles-pp.complete_signal
> 0.07 ± 23% +0.0 0.10 ± 19% perf-profile.children.cycles-pp.lru_add_fn
> 0.08 ± 24% +0.0 0.12 ± 10% perf-profile.children.cycles-pp.__do_sys_brk
> 0.08 ± 11% +0.0 0.13 ± 19% perf-profile.children.cycles-pp.arch_get_unmapped_area_topdown
> 0.08 ± 12% +0.0 0.12 ± 27% perf-profile.children.cycles-pp.__mem_cgroup_uncharge_list
> 0.02 ±141% +0.0 0.06 ± 19% perf-profile.children.cycles-pp.workingset_age_nonresident
> 0.02 ±141% +0.0 0.06 ± 19% perf-profile.children.cycles-pp.workingset_activation
> 0.04 ± 71% +0.1 0.09 ± 5% perf-profile.children.cycles-pp.page_add_file_rmap
> 0.09 ± 18% +0.1 0.14 ± 23% perf-profile.children.cycles-pp.terminate_walk
> 0.08 ± 12% +0.1 0.13 ± 19% perf-profile.children.cycles-pp.__send_signal_locked
> 0.00 +0.1 0.06 ± 8% perf-profile.children.cycles-pp.proc_pident_lookup
> 0.11 ± 15% +0.1 0.17 ± 15% perf-profile.children.cycles-pp.exit_notify
> 0.15 ± 31% +0.1 0.21 ± 15% perf-profile.children.cycles-pp.try_charge_memcg
> 0.04 ± 71% +0.1 0.10 ± 27% perf-profile.children.cycles-pp.__mod_lruvec_state
> 0.04 ± 73% +0.1 0.10 ± 24% perf-profile.children.cycles-pp.__mod_node_page_state
> 0.11 ± 25% +0.1 0.17 ± 22% perf-profile.children.cycles-pp.perf_mux_hrtimer_handler
> 0.08 ± 12% +0.1 0.14 ± 26% perf-profile.children.cycles-pp.get_slabinfo
> 0.02 ±141% +0.1 0.08 ± 27% perf-profile.children.cycles-pp.fput
> 0.12 ± 6% +0.1 0.18 ± 20% perf-profile.children.cycles-pp.xas_find
> 0.08 ± 17% +0.1 0.15 ± 39% perf-profile.children.cycles-pp.task_numa_fault
> 0.07 ± 44% +0.1 0.14 ± 18% perf-profile.children.cycles-pp.___slab_alloc
> 0.02 ±141% +0.1 0.09 ± 35% perf-profile.children.cycles-pp.copy_creds
> 0.08 ± 12% +0.1 0.15 ± 18% perf-profile.children.cycles-pp._exit
> 0.07 ± 78% +0.1 0.15 ± 27% perf-profile.children.cycles-pp.file_free_rcu
> 0.02 ±141% +0.1 0.09 ± 25% perf-profile.children.cycles-pp.do_task_dead
> 0.19 ± 22% +0.1 0.27 ± 10% perf-profile.children.cycles-pp.dequeue_entity
> 0.18 ± 29% +0.1 0.26 ± 16% perf-profile.children.cycles-pp.lru_add_drain
> 0.03 ± 70% +0.1 0.11 ± 25% perf-profile.children.cycles-pp.node_read_numastat
> 0.07 ± 25% +0.1 0.15 ± 51% perf-profile.children.cycles-pp.__kernel_read
> 0.20 ± 4% +0.1 0.28 ± 24% perf-profile.children.cycles-pp.__do_fault
> 0.23 ± 17% +0.1 0.31 ± 9% perf-profile.children.cycles-pp.native_irq_return_iret
> 0.11 ± 27% +0.1 0.20 ± 17% perf-profile.children.cycles-pp.__pte_alloc
> 0.06 ± 86% +0.1 0.14 ± 44% perf-profile.children.cycles-pp.cgroup_rstat_flush
> 0.06 ± 86% +0.1 0.14 ± 44% perf-profile.children.cycles-pp.cgroup_rstat_flush_locked
> 0.06 ± 86% +0.1 0.14 ± 44% perf-profile.children.cycles-pp.do_flush_stats
> 0.06 ± 86% +0.1 0.14 ± 44% perf-profile.children.cycles-pp.flush_memcg_stats_dwork
> 0.12 ± 28% +0.1 0.20 ± 18% perf-profile.children.cycles-pp.d_path
> 0.08 ± 36% +0.1 0.16 ± 17% perf-profile.children.cycles-pp.lookup_open
> 0.11 ± 7% +0.1 0.20 ± 33% perf-profile.children.cycles-pp.copy_pte_range
> 0.13 ± 16% +0.1 0.22 ± 18% perf-profile.children.cycles-pp.dev_attr_show
> 0.04 ± 73% +0.1 0.13 ± 49% perf-profile.children.cycles-pp.task_numa_migrate
> 0.19 ± 17% +0.1 0.28 ± 7% perf-profile.children.cycles-pp.__count_memcg_events
> 0.15 ± 17% +0.1 0.24 ± 10% perf-profile.children.cycles-pp.__pmd_alloc
> 0.00 +0.1 0.09 ± 31% perf-profile.children.cycles-pp.remove_vma
> 0.13 ± 16% +0.1 0.22 ± 22% perf-profile.children.cycles-pp.sysfs_kf_seq_show
> 0.12 ± 26% +0.1 0.21 ± 26% perf-profile.children.cycles-pp.__do_set_cpus_allowed
> 0.08 ± 78% +0.1 0.18 ± 20% perf-profile.children.cycles-pp.free_unref_page
> 0.02 ±141% +0.1 0.11 ± 32% perf-profile.children.cycles-pp.nd_jump_root
> 0.05 ± 74% +0.1 0.14 ± 23% perf-profile.children.cycles-pp._find_next_bit
> 0.12 ± 22% +0.1 0.21 ± 21% perf-profile.children.cycles-pp.clock_gettime
> 0.02 ±141% +0.1 0.11 ± 29% perf-profile.children.cycles-pp.free_percpu
> 0.00 +0.1 0.10 ± 25% perf-profile.children.cycles-pp.lockref_get
> 0.25 ± 40% +0.1 0.35 ± 24% perf-profile.children.cycles-pp.shift_arg_pages
> 0.26 ± 29% +0.1 0.36 ± 14% perf-profile.children.cycles-pp.rmqueue
> 0.13 ± 35% +0.1 0.23 ± 24% perf-profile.children.cycles-pp.single_open
> 0.05 ± 78% +0.1 0.15 ± 29% perf-profile.children.cycles-pp.vma_expand
> 0.09 ± 5% +0.1 0.21 ± 41% perf-profile.children.cycles-pp.prepare_task_switch
> 0.08 ± 12% +0.1 0.19 ± 37% perf-profile.children.cycles-pp.copy_page_to_iter
> 0.22 ± 40% +0.1 0.34 ± 33% perf-profile.children.cycles-pp.mas_wr_node_store
> 0.16 ± 41% +0.1 0.27 ± 13% perf-profile.children.cycles-pp.__set_cpus_allowed_ptr_locked
> 0.16 ± 10% +0.1 0.28 ± 26% perf-profile.children.cycles-pp.free_pages_and_swap_cache
> 0.11 ± 28% +0.1 0.23 ± 27% perf-profile.children.cycles-pp.single_release
> 0.00 +0.1 0.12 ± 37% perf-profile.children.cycles-pp.find_busiest_queue
> 0.23 ± 28% +0.1 0.35 ± 23% perf-profile.children.cycles-pp.pte_alloc_one
> 0.23 ± 32% +0.1 0.35 ± 16% perf-profile.children.cycles-pp.strncpy_from_user
> 0.20 ± 55% +0.1 0.33 ± 25% perf-profile.children.cycles-pp.gather_stats
> 0.16 ± 30% +0.1 0.30 ± 12% perf-profile.children.cycles-pp._raw_spin_lock_irqsave
> 0.29 ± 31% +0.1 0.43 ± 14% perf-profile.children.cycles-pp.setup_arg_pages
> 0.13 ± 18% +0.1 0.27 ± 28% perf-profile.children.cycles-pp.aa_file_perm
> 0.03 ± 70% +0.1 0.18 ± 73% perf-profile.children.cycles-pp.set_pmd_migration_entry
> 0.09 ±103% +0.1 0.23 ± 39% perf-profile.children.cycles-pp.__wait_for_common
> 0.19 ± 16% +0.1 0.33 ± 27% perf-profile.children.cycles-pp.obj_cgroup_charge
> 0.03 ± 70% +0.1 0.18 ± 74% perf-profile.children.cycles-pp.try_to_migrate_one
> 0.14 ± 41% +0.2 0.29 ± 34% perf-profile.children.cycles-pp.select_task_rq
> 0.28 ± 35% +0.2 0.44 ± 28% perf-profile.children.cycles-pp.vm_area_alloc
> 0.04 ± 71% +0.2 0.20 ± 73% perf-profile.children.cycles-pp.try_to_migrate
> 0.04 ± 71% +0.2 0.22 ± 70% perf-profile.children.cycles-pp.rmap_walk_anon
> 0.37 ± 28% +0.2 0.55 ± 23% perf-profile.children.cycles-pp.pick_next_task_fair
> 0.04 ± 71% +0.2 0.22 ± 57% perf-profile.children.cycles-pp.migrate_folio_unmap
> 0.11 ± 51% +0.2 0.31 ± 30% perf-profile.children.cycles-pp.on_each_cpu_cond_mask
> 0.30 ± 30% +0.2 0.50 ± 16% perf-profile.children.cycles-pp.__mod_memcg_lruvec_state
> 0.30 ± 19% +0.2 0.50 ± 23% perf-profile.children.cycles-pp.__perf_sw_event
> 0.21 ± 30% +0.2 0.41 ± 19% perf-profile.children.cycles-pp.apparmor_file_permission
> 0.25 ± 29% +0.2 0.45 ± 15% perf-profile.children.cycles-pp.security_file_permission
> 0.13 ± 55% +0.2 0.34 ± 24% perf-profile.children.cycles-pp.smp_call_function_many_cond
> 0.31 ± 34% +0.2 0.52 ± 30% perf-profile.children.cycles-pp.pipe_read
> 0.32 ± 16% +0.2 0.55 ± 8% perf-profile.children.cycles-pp.getname_flags
> 0.33 ± 11% +0.2 0.55 ± 21% perf-profile.children.cycles-pp.___perf_sw_event
> 0.17 ± 44% +0.2 0.40 ± 38% perf-profile.children.cycles-pp.newidle_balance
> 0.38 ± 38% +0.2 0.60 ± 12% perf-profile.children.cycles-pp.__percpu_counter_init
> 0.38 ± 37% +0.2 0.61 ± 18% perf-profile.children.cycles-pp.readlink
> 0.27 ± 40% +0.2 0.51 ± 21% perf-profile.children.cycles-pp.mod_objcg_state
> 0.76 ± 17% +0.3 1.10 ± 19% perf-profile.children.cycles-pp.write
> 0.48 ± 42% +0.4 0.83 ± 13% perf-profile.children.cycles-pp.pid_revalidate
> 0.61 ± 34% +0.4 0.98 ± 17% perf-profile.children.cycles-pp.__d_lookup_rcu
> 0.73 ± 35% +0.4 1.12 ± 8% perf-profile.children.cycles-pp.alloc_bprm
> 0.59 ± 42% +0.4 0.98 ± 11% perf-profile.children.cycles-pp.pcpu_alloc
> 0.77 ± 31% +0.4 1.21 ± 4% perf-profile.children.cycles-pp.mm_init
> 0.92 ± 31% +0.5 1.38 ± 12% perf-profile.children.cycles-pp.__fxstat64
> 0.74 ± 32% +0.5 1.27 ± 20% perf-profile.children.cycles-pp.open_last_lookups
> 1.37 ± 29% +0.6 1.94 ± 19% perf-profile.children.cycles-pp.kmem_cache_alloc
> 1.35 ± 38% +0.7 2.09 ± 15% perf-profile.children.cycles-pp.lookup_fast
> 1.13 ± 59% +5.3 6.47 ± 63% perf-profile.children.cycles-pp.start_secondary
> 1.06 ± 60% +5.4 6.50 ± 57% perf-profile.children.cycles-pp.intel_idle
> 1.09 ± 59% +5.5 6.62 ± 58% perf-profile.children.cycles-pp.cpuidle_enter
> 1.09 ± 59% +5.5 6.62 ± 58% perf-profile.children.cycles-pp.cpuidle_enter_state
> 1.10 ± 59% +5.5 6.65 ± 58% perf-profile.children.cycles-pp.cpuidle_idle_call
> 1.13 ± 59% +5.6 6.74 ± 57% perf-profile.children.cycles-pp.secondary_startup_64_no_verify
> 1.13 ± 59% +5.6 6.74 ± 57% perf-profile.children.cycles-pp.cpu_startup_entry
> 1.13 ± 59% +5.6 6.74 ± 57% perf-profile.children.cycles-pp.do_idle
> 1.51 ± 69% +6.1 7.65 ± 41% perf-profile.children.cycles-pp.folio_copy
> 1.52 ± 69% +6.2 7.68 ± 41% perf-profile.children.cycles-pp.move_to_new_folio
> 1.52 ± 69% +6.2 7.68 ± 41% perf-profile.children.cycles-pp.migrate_folio_extra
> 1.74 ± 63% +6.2 7.96 ± 39% perf-profile.children.cycles-pp.copy_page
> 1.61 ± 68% +6.5 8.08 ± 41% perf-profile.children.cycles-pp.migrate_pages_batch
> 1.61 ± 68% +6.5 8.09 ± 41% perf-profile.children.cycles-pp.migrate_pages
> 1.61 ± 68% +6.5 8.10 ± 41% perf-profile.children.cycles-pp.migrate_misplaced_page
> 1.62 ± 67% +6.5 8.14 ± 41% perf-profile.children.cycles-pp.do_huge_pmd_numa_page
> 7.23 ± 41% +7.5 14.76 ± 19% perf-profile.children.cycles-pp.__handle_mm_fault
> 8.24 ± 38% +7.6 15.86 ± 17% perf-profile.children.cycles-pp.exc_page_fault
> 8.20 ± 38% +7.6 15.84 ± 17% perf-profile.children.cycles-pp.do_user_addr_fault
> 9.84 ± 35% +7.7 17.51 ± 15% perf-profile.children.cycles-pp.asm_exc_page_fault
> 7.71 ± 40% +7.7 15.41 ± 18% perf-profile.children.cycles-pp.handle_mm_fault
> 20.00 ± 72% -20.0 0.00 perf-profile.self.cycles-pp.queue_event
> 0.18 ± 22% -0.1 0.10 ± 24% perf-profile.self.cycles-pp.__d_lookup
> 0.07 ± 25% +0.0 0.10 ± 9% perf-profile.self.cycles-pp.__perf_read_group_add
> 0.08 ± 16% +0.0 0.12 ± 26% perf-profile.self.cycles-pp.check_heap_object
> 0.05 ± 8% +0.0 0.09 ± 30% perf-profile.self.cycles-pp.__memcg_kmem_charge_page
> 0.02 ±141% +0.0 0.06 ± 13% perf-profile.self.cycles-pp.try_to_wake_up
> 0.08 ± 31% +0.1 0.14 ± 30% perf-profile.self.cycles-pp.task_dump_owner
> 0.05 ± 74% +0.1 0.10 ± 24% perf-profile.self.cycles-pp.rmqueue
> 0.14 ± 26% +0.1 0.20 ± 6% perf-profile.self.cycles-pp.init_file
> 0.05 ± 78% +0.1 0.10 ± 4% perf-profile.self.cycles-pp.enqueue_task_fair
> 0.05 ± 78% +0.1 0.10 ± 27% perf-profile.self.cycles-pp.___slab_alloc
> 0.02 ±141% +0.1 0.08 ± 24% perf-profile.self.cycles-pp.pick_link
> 0.04 ± 73% +0.1 0.10 ± 24% perf-profile.self.cycles-pp.__mod_node_page_state
> 0.07 ± 17% +0.1 0.14 ± 26% perf-profile.self.cycles-pp.get_slabinfo
> 0.00 +0.1 0.07 ± 18% perf-profile.self.cycles-pp.select_task_rq
> 0.07 ± 78% +0.1 0.15 ± 27% perf-profile.self.cycles-pp.file_free_rcu
> 0.09 ± 44% +0.1 0.16 ± 15% perf-profile.self.cycles-pp.apparmor_file_permission
> 0.08 ± 27% +0.1 0.15 ± 35% perf-profile.self.cycles-pp.malloc
> 0.02 ±141% +0.1 0.10 ± 29% perf-profile.self.cycles-pp.memcg_account_kmem
> 0.23 ± 17% +0.1 0.31 ± 9% perf-profile.self.cycles-pp.native_irq_return_iret
> 0.13 ± 32% +0.1 0.21 ± 32% perf-profile.self.cycles-pp.obj_cgroup_charge
> 0.10 ± 43% +0.1 0.19 ± 11% perf-profile.self.cycles-pp.perf_read
> 0.14 ± 12% +0.1 0.23 ± 25% perf-profile.self.cycles-pp.cgroup_rstat_updated
> 0.13 ± 43% +0.1 0.23 ± 27% perf-profile.self.cycles-pp.mod_objcg_state
> 0.00 +0.1 0.10 ± 25% perf-profile.self.cycles-pp.lockref_get
> 0.07 ± 78% +0.1 0.18 ± 34% perf-profile.self.cycles-pp.update_rq_clock_task
> 0.00 +0.1 0.10 ± 27% perf-profile.self.cycles-pp.find_busiest_queue
> 0.09 ± 59% +0.1 0.21 ± 29% perf-profile.self.cycles-pp.smp_call_function_many_cond
> 0.15 ± 31% +0.1 0.27 ± 16% perf-profile.self.cycles-pp._raw_spin_lock_irqsave
> 0.19 ± 39% +0.1 0.32 ± 19% perf-profile.self.cycles-pp.zap_pte_range
> 0.13 ± 18% +0.1 0.26 ± 23% perf-profile.self.cycles-pp.aa_file_perm
> 0.19 ± 50% +0.1 0.32 ± 24% perf-profile.self.cycles-pp.gather_stats
> 0.24 ± 16% +0.2 0.40 ± 17% perf-profile.self.cycles-pp.___perf_sw_event
> 0.25 ± 31% +0.2 0.41 ± 16% perf-profile.self.cycles-pp.__mod_memcg_lruvec_state
> 0.08 ± 71% +0.2 0.25 ± 24% perf-profile.self.cycles-pp.pcpu_alloc
> 0.16 ± 38% +0.2 0.34 ± 21% perf-profile.self.cycles-pp.filemap_map_pages
> 0.32 ± 41% +0.2 0.54 ± 17% perf-profile.self.cycles-pp.pid_revalidate
> 0.47 ± 19% +0.3 0.73 ± 21% perf-profile.self.cycles-pp.kmem_cache_alloc
> 0.60 ± 34% +0.4 0.96 ± 18% perf-profile.self.cycles-pp.__d_lookup_rcu
> 1.06 ± 60% +5.4 6.50 ± 57% perf-profile.self.cycles-pp.intel_idle
> 1.74 ± 63% +6.2 7.92 ± 39% perf-profile.self.cycles-pp.copy_page
>
>
>
> ***************************************************************************************************
> lkp-csl-2sp3: 96 threads 2 sockets Intel(R) Xeon(R) Platinum 8260L CPU @ 2.40GHz (Cascade Lake) with 128G memory
> =========================================================================================
> compiler/cpufreq_governor/iterations/kconfig/rootfs/tbox_group/test/testcase:
>   gcc-12/performance/4x/x86_64-rhel-8.3/debian-11.1-x86_64-20220510.cgz/lkp-csl-2sp3/_INVERSE_BIND/autonuma-benchmark
>
> commit:
>   fc769221b2 ("sched/numa: Remove unconditional scan logic using mm numa_scan_seq")
>   167773d1dd ("sched/numa: Increase tasks' access history")
>
> fc769221b23064c0  167773d1ddb5ffdd944f851f2cb
> ----------------  ---------------------------
>          %stddev      %change         %stddev
>              \           |                 \
> 0.01 ± 20% +0.0 0.01 ± 15% mpstat.cpu.all.iowait%
> 25370 ± 3% -13.5% 21946 ± 6% uptime.idle
> 2.098e+10 ± 4% -15.8% 1.767e+10 ± 7% cpuidle..time
> 21696014 ± 4% -15.8% 18274389 ± 7% cpuidle..usage
> 3567832 ± 2% -12.9% 3106532 ± 5% numa-numastat.node1.local_node
> 4472555 ± 2% -10.8% 3989658 ± 6% numa-numastat.node1.numa_hit
> 21420616 ± 4% -15.9% 18019892 ± 7% turbostat.C6
> 62.12 +3.8% 64.46 turbostat.RAMWatt
> 185236 ± 6% -17.4% 152981 ± 15% numa-meminfo.node1.Active
> 184892 ± 6% -17.5% 152523 ± 15% numa-meminfo.node1.Active(anon)
> 190876 ± 6% -17.4% 157580 ± 15% numa-meminfo.node1.Shmem
> 373.94 ± 4% -14.8% 318.67 ± 6% autonuma-benchmark.numa01.seconds
> 3066 ± 2% -7.6% 2833 ± 3% autonuma-benchmark.time.elapsed_time
> 3066 ± 2% -7.6% 2833 ± 3% autonuma-benchmark.time.elapsed_time.max
> 1770652 ± 3% -7.7% 1634112 ± 3% autonuma-benchmark.time.involuntary_context_switches
> 258701 ± 2% -6.9% 240826 ± 3% autonuma-benchmark.time.user_time
> 46235 ± 6% -17.5% 38150 ± 15% numa-vmstat.node1.nr_active_anon
> 47723 ± 6% -17.4% 39411 ± 15% numa-vmstat.node1.nr_shmem
> 46235 ± 6% -17.5% 38150 ± 15% numa-vmstat.node1.nr_zone_active_anon
> 4471422 ± 2% -10.8% 3989129 ± 6% numa-vmstat.node1.numa_hit
> 3566699 ± 2% -12.9% 3106004 ± 5% numa-vmstat.node1.numa_local
> 2.37 ± 23% +45.3% 3.44 ± 16% sched_debug.cfs_rq:/.removed.runnable_avg.stddev
> 2.26 ± 28% +45.0% 3.28 ± 20% sched_debug.cfs_rq:/.removed.util_avg.stddev
> 203.53 ± 4% -12.8% 177.48 ± 3% sched_debug.cfs_rq:/.util_est_enqueued.stddev
> 128836 ± 7% -16.9% 107001 ± 8% sched_debug.cpu.max_idle_balance_cost.stddev
> 12639 ± 6% -12.1% 11108 ± 8% sched_debug.cpu.nr_switches.min
> 0.06 ± 41% -44.9% 0.04 ± 20% perf-sched.sch_delay.avg.ms.ipmi_thread.kthread.ret_from_fork.ret_from_fork_asm
> 1.84 ± 23% -56.4% 0.80 ± 33% perf-sched.sch_delay.avg.ms.schedule_hrtimeout_range_clock.do_select.core_sys_select.kern_select
> 0.08 ± 38% -55.2% 0.04 ± 22% perf-sched.sch_delay.max.ms.ipmi_thread.kthread.ret_from_fork.ret_from_fork_asm
> 7.55 ± 60% -77.2% 1.72 ±152% perf-sched.wait_time.avg.ms.__cond_resched.khugepaged.kthread.ret_from_fork.ret_from_fork_asm
> 10.72 ± 60% -73.8% 2.81 ±171% perf-sched.wait_time.max.ms.__cond_resched.khugepaged.kthread.ret_from_fork.ret_from_fork_asm
> 0.28 ± 12% -16.4% 0.23 ± 5% perf-sched.wait_time.max.ms.ipmi_thread.kthread.ret_from_fork.ret_from_fork_asm
> 8802 ± 3% -4.3% 8427 proc-vmstat.nr_mapped
> 54506 ± 5% -5.2% 51656 proc-vmstat.nr_shmem
> 8510048 -4.5% 8124296 proc-vmstat.numa_hit
> 43091 ± 8% +15.9% 49938 ± 6% proc-vmstat.numa_huge_pte_updates
> 7242046 -5.3% 6860532 ± 2% proc-vmstat.numa_local
> 3762770 ± 5% +34.7% 5068087 ± 3% proc-vmstat.numa_pages_migrated
> 22235827 ± 8% +15.8% 25759214 ± 6% proc-vmstat.numa_pte_updates
> 10591821 -5.4% 10024519 ± 2% proc-vmstat.pgfault
> 3762770 ± 5% +34.7% 5068087 ± 3% proc-vmstat.pgmigrate_success
> 489883 ± 2% -6.8% 456801 ± 3% proc-vmstat.pgreuse
> 7297 ± 5% +34.8% 9838 ± 3% proc-vmstat.thp_migration_success
> 22825216 -7.4% 21132800 ± 3% proc-vmstat.unevictable_pgs_scanned
> 40.10 +4.2% 41.80 perf-stat.i.MPKI
> 1.64 +0.1 1.74 perf-stat.i.branch-miss-rate%
> 1920111 +6.9% 2051982 perf-stat.i.branch-misses
> 60.50 +1.2 61.72 perf-stat.i.cache-miss-rate%
> 12369678 +6.9% 13223477 perf-stat.i.cache-misses
> 21918348 +4.6% 22934958 perf-stat.i.cache-references
> 22544 -4.0% 21634 perf-stat.i.cycles-between-cache-misses
> 1458 +12.1% 1635 ± 5% perf-stat.i.instructions-per-iTLB-miss
> 2.51 +2.4% 2.57 perf-stat.i.metric.M/sec
> 3383 +2.3% 3460 perf-stat.i.minor-faults
> 244016 +5.0% 256219 perf-stat.i.node-load-misses
> 4544736 +9.5% 4977101 ± 3% perf-stat.i.node-store-misses
> 6126744 +5.5% 6463826 ± 2% perf-stat.i.node-stores
> 3383 +2.3% 3460 perf-stat.i.page-faults
> 37.34 +3.4% 38.60 perf-stat.overall.MPKI
> 1.64 +0.1 1.74 perf-stat.overall.branch-miss-rate%
> 21951 -5.4% 20763 perf-stat.overall.cycles-between-cache-misses
> 1866870 +7.1% 2000069 perf-stat.ps.branch-misses
> 12385090 +6.6% 13198317 perf-stat.ps.cache-misses
> 21609219 +4.6% 22595642 perf-stat.ps.cache-references
> 3340 +2.3% 3418 perf-stat.ps.minor-faults
> 243774 +4.9% 255759 perf-stat.ps.node-load-misses
> 4560352 +9.0% 4973035 ± 3% perf-stat.ps.node-store-misses
> 6135666 +5.2% 6452858 ± 2% perf-stat.ps.node-stores
> 3340 +2.3% 3418 perf-stat.ps.page-faults
> 1.775e+12 -6.5% 1.659e+12 ± 2% perf-stat.total.instructions
> 32.90 ± 14% -14.9 17.99 ± 40% perf-profile.calltrace.cycles-pp.asm_sysvec_apic_timer_interrupt
> 0.60 ± 14% +0.3 0.88 ± 23% perf-profile.calltrace.cycles-pp.do_dentry_open.do_open.path_openat.do_filp_open.do_sys_openat2
> 0.57 ± 49% +0.4 0.93 ± 14% perf-profile.calltrace.cycles-pp.update_sg_wakeup_stats.find_idlest_group.find_idlest_cpu.select_task_rq_fair.sched_exec
> 0.78 ± 12% +0.4 1.15 ± 34% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_read.readn.perf_evsel__read
> 0.80 ± 14% +0.4 1.17 ± 26% perf-profile.calltrace.cycles-pp.do_open.path_openat.do_filp_open.do_sys_openat2.__x64_sys_openat
> 0.82 ± 15% +0.4 1.19 ± 33% perf-profile.calltrace.cycles-pp.__libc_read.readn.perf_evsel__read.read_counters.process_interval
> 0.80 ± 14% +0.4 1.19 ± 33% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__libc_read.readn.perf_evsel__read.read_counters
> 0.50 ± 46% +0.4 0.89 ± 25% perf-profile.calltrace.cycles-pp.open_last_lookups.path_openat.do_filp_open.do_sys_openat2.__x64_sys_openat
> 0.59 ± 49% +0.4 0.98 ± 19% perf-profile.calltrace.cycles-pp.find_idlest_group.find_idlest_cpu.select_task_rq_fair.sched_exec.bprm_execve
> 0.59 ± 48% +0.4 1.00 ± 25% perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.__fxstat64
> 0.67 ± 47% +0.4 1.10 ± 22% perf-profile.calltrace.cycles-pp.sched_exec.bprm_execve.do_execveat_common.__x64_sys_execve.do_syscall_64
> 0.90 ± 18% +0.4 1.33 ± 24% perf-profile.calltrace.cycles-pp.show_numa_map.seq_read_iter.seq_read.vfs_read.ksys_read
> 0.66 ± 46% +0.4 1.09 ± 27% perf-profile.calltrace.cycles-pp.gather_pte_stats.walk_pmd_range.walk_pud_range.walk_p4d_range.walk_pgd_range
> 0.68 ± 46% +0.5 1.13 ± 27% perf-profile.calltrace.cycles-pp.walk_p4d_range.walk_pgd_range.__walk_page_range.walk_page_vma.show_numa_map
> 0.68 ± 46% +0.5 1.13 ± 27% perf-profile.calltrace.cycles-pp.walk_pud_range.walk_p4d_range.walk_pgd_range.__walk_page_range.walk_page_vma
> 0.68 ± 46% +0.5 1.14 ± 27% perf-profile.calltrace.cycles-pp.walk_page_vma.show_numa_map.seq_read_iter.seq_read.vfs_read
> 0.68 ± 46% +0.5 1.14 ± 27% perf-profile.calltrace.cycles-pp.__walk_page_range.walk_page_vma.show_numa_map.seq_read_iter.seq_read
> 0.68 ± 46% +0.5 1.14 ± 27% perf-profile.calltrace.cycles-pp.walk_pgd_range.__walk_page_range.walk_page_vma.show_numa_map.seq_read_iter
> 0.40 ± 71% +0.5 0.88 ± 20% perf-profile.calltrace.cycles-pp._dl_addr
> 0.93 ± 18% +0.5 1.45 ± 28% perf-profile.calltrace.cycles-pp.__fxstat64
> 0.88 ± 18% +0.5 1.41 ± 27% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__fxstat64
> 0.88 ± 18% +0.5 1.42 ± 28% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__fxstat64
> 0.60 ± 73% +0.6 1.24 ± 18% perf-profile.calltrace.cycles-pp.seq_read_iter.vfs_read.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe
> 0.23 ±142% +0.7 0.88 ± 26% perf-profile.calltrace.cycles-pp.show_stat.seq_read_iter.vfs_read.ksys_read.do_syscall_64
> 2.87 ± 14% +1.3 4.21 ± 23% perf-profile.calltrace.cycles-pp.do_sys_openat2.__x64_sys_openat.do_syscall_64.entry_SYSCALL_64_after_hwframe.open64
> 2.88 ± 14% +1.4 4.23 ± 23% perf-profile.calltrace.cycles-pp.__x64_sys_openat.do_syscall_64.entry_SYSCALL_64_after_hwframe.open64
> 34.28 ± 13% -14.6 19.70 ± 36% perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
> 0.13 ± 29% -0.1 0.05 ± 76% perf-profile.children.cycles-pp.schedule_tail
> 0.12 ± 20% -0.1 0.05 ± 78% perf-profile.children.cycles-pp.__put_user_4
> 0.18 ± 16% +0.1 0.23 ± 13% perf-profile.children.cycles-pp.__x64_sys_munmap
> 0.09 ± 17% +0.1 0.16 ± 27% perf-profile.children.cycles-pp.__do_sys_brk
> 0.01 ±223% +0.1 0.08 ± 27% perf-profile.children.cycles-pp.acpi_ex_insert_into_field
> 0.01 ±223% +0.1 0.08 ± 27% perf-profile.children.cycles-pp.acpi_ex_opcode_1A_1T_1R
> 0.01 ±223% +0.1 0.08 ± 27% perf-profile.children.cycles-pp.acpi_ex_store_object_to_node
> 0.01 ±223% +0.1 0.08 ± 27% perf-profile.children.cycles-pp.acpi_ex_write_data_to_field
> 0.02 ±142% +0.1 0.09 ± 50% perf-profile.children.cycles-pp.common_perm_cond
> 0.06 ± 58% +0.1 0.14 ± 24% perf-profile.children.cycles-pp.___slab_alloc
> 0.02 ±144% +0.1 0.10 ± 63% perf-profile.children.cycles-pp.__alloc_pages_bulk
> 0.06 ± 18% +0.1 0.14 ± 58% perf-profile.children.cycles-pp.security_inode_getattr
> 0.12 ± 40% +0.1 0.21 ± 28% perf-profile.children.cycles-pp.__ptrace_may_access
> 0.07 ± 33% +0.1 0.18 ± 40% perf-profile.children.cycles-pp.brk
> 0.15 ± 14% +0.1 0.26 ± 23% perf-profile.children.cycles-pp.wq_worker_comm
> 0.09 ± 87% +0.1 0.21 ± 30% perf-profile.children.cycles-pp.irq_get_next_irq
> 0.93 ± 12% +0.2 1.17 ± 19% perf-profile.children.cycles-pp.do_dentry_open
> 0.15 ± 30% +0.3 0.43 ± 56% perf-profile.children.cycles-pp.run_ksoftirqd
> 0.54 ± 17% +0.4 0.89 ± 20% perf-profile.children.cycles-pp._dl_addr
> 0.74 ± 19% +0.4 1.09 ± 27% perf-profile.children.cycles-pp.gather_pte_stats
> 0.74 ± 25% +0.4 1.10 ± 21% perf-profile.children.cycles-pp.sched_exec
> 0.76 ± 19% +0.4 1.13 ± 27% perf-profile.children.cycles-pp.walk_p4d_range
> 0.76 ± 19% +0.4 1.13 ± 27% perf-profile.children.cycles-pp.walk_pud_range
> 0.76 ± 19% +0.4 1.14 ± 27% perf-profile.children.cycles-pp.walk_page_vma
> 0.76 ± 19% +0.4 1.14 ± 27% perf-profile.children.cycles-pp.__walk_page_range
> 0.76 ± 19% +0.4 1.14 ± 27% perf-profile.children.cycles-pp.walk_pgd_range
> 0.92 ± 13% +0.4 1.33 ± 20% perf-profile.children.cycles-pp.open_last_lookups
> 0.90 ± 17% +0.4 1.33 ± 24% perf-profile.children.cycles-pp.show_numa_map
> 0.43 ± 51% +0.5 0.88 ± 26% perf-profile.children.cycles-pp.show_stat
> 1.49 ± 11% +0.5 1.94 ± 15% perf-profile.children.cycles-pp.__do_softirq
> 1.22 ± 18% +0.6 1.78 ± 16% perf-profile.children.cycles-pp.update_sg_wakeup_stats
> 1.28 ± 20% +0.6 1.88 ± 18% perf-profile.children.cycles-pp.find_idlest_group
> 1.07 ± 16% +0.6 1.67 ± 30% perf-profile.children.cycles-pp.__fxstat64
> 1.36 ± 20% +0.6 1.98 ± 21% perf-profile.children.cycles-pp.find_idlest_cpu
> 30.64 ± 15% -14.9 15.70 ± 46% perf-profile.self.cycles-pp.asm_sysvec_apic_timer_interrupt
> 0.01 ±223% +0.1 0.07 ± 36% perf-profile.self.cycles-pp.pick_next_task_fair
> 0.10 ± 28% +0.1 0.17 ± 28% perf-profile.self.cycles-pp.__get_obj_cgroup_from_memcg
> 0.00 +0.1 0.07 ± 32% perf-profile.self.cycles-pp.touch_atime
> 0.04 ±106% +0.1 0.11 ± 18% perf-profile.self.cycles-pp.___slab_alloc
> 0.12 ± 37% +0.1 0.20 ± 27% perf-profile.self.cycles-pp.__ptrace_may_access
> 0.05 ± 52% +0.1 0.13 ± 75% perf-profile.self.cycles-pp.pick_link
> 0.14 ± 28% +0.1 0.24 ± 34% perf-profile.self.cycles-pp.__slab_free
> 0.47 ± 19% +0.3 0.79 ± 16% perf-profile.self.cycles-pp._dl_addr
> 1.00 ± 19% +0.4 1.44 ± 18% perf-profile.self.cycles-pp.update_sg_wakeup_stats
> 6.04 ± 14% +1.9 7.99 ± 18% perf-profile.self.cycles-pp.syscall_exit_to_user_mode
>
>
>
> ***************************************************************************************************
> lkp-icl-2sp6: 128 threads 2 sockets Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz (Ice Lake) with 128G memory
> =========================================================================================
> compiler/cpufreq_governor/iterations/kconfig/rootfs/tbox_group/test/testcase:
>   gcc-12/performance/4x/x86_64-rhel-8.3/debian-11.1-x86_64-20220510.cgz/lkp-icl-2sp6/numa01_THREAD_ALLOC/autonuma-benchmark
>
> commit:
>   fc769221b2 ("sched/numa: Remove unconditional scan logic using mm numa_scan_seq")
>   167773d1dd ("sched/numa: Increase tasks' access history")
>
> fc769221b23064c0  167773d1ddb5ffdd944f851f2cb
> ----------------  ---------------------------
>          %stddev      %change         %stddev
>              \           |                 \
> 36796 ± 6% -19.0% 29811 ± 8% uptime.idle
> 3.231e+10 ± 7% -21.6% 2.534e+10 ± 10% cpuidle..time
> 33785162 ± 7% -21.8% 26431366 ± 10% cpuidle..usage
> 10.56 ± 7% -1.5 9.02 ± 9% mpstat.cpu.all.idle%
> 0.01 ± 22% +0.0 0.01 ± 11% mpstat.cpu.all.iowait%
> 0.17 ± 2% -0.0 0.15 ± 4% mpstat.cpu.all.soft%
> 388157 ± 31% +60.9% 624661 ± 36% numa-numastat.node0.other_node
> 4511165 ± 4% -13.5% 3901276 ± 7% numa-numastat.node1.numa_hit
> 951382 ± 12% -30.4% 661932 ± 31% numa-numastat.node1.other_node
> 388157 ± 31% +60.9% 624658 ± 36% numa-vmstat.node0.numa_other
> 4510646 ± 4% -13.5% 3900948 ± 7% numa-vmstat.node1.numa_hit
> 951382 ± 12% -30.4% 661932 ± 31% numa-vmstat.node1.numa_other
> 305.08 ± 5% +19.6% 364.96 ± 6% sched_debug.cfs_rq:/.util_est_enqueued.avg
> 989.11 ± 4% +13.0% 1117 ± 6% sched_debug.cfs_rq:/.util_est_enqueued.max
> 5082 ± 6% -19.0% 4114 ± 12% sched_debug.cpu.curr->pid.stddev
> 85229 -13.2% 74019 ± 9% sched_debug.cpu.max_idle_balance_cost.stddev
> 7575 ± 5% -8.3% 6946 ± 3% sched_debug.cpu.nr_switches.min
> 394498 ± 5% -21.0% 311653 ± 10% turbostat.C1E
> 33233046 ± 8% -21.7% 26018024 ± 10% turbostat.C6
> 10.39 ± 7% -1.5 8.90 ± 9% turbostat.C6%
> 7.77 ± 6% -17.5% 6.41 ± 9% turbostat.CPU%c1
> 206.88 +2.9% 212.86 turbostat.RAMWatt
> 372.30 -8.3% 341.49 autonuma-benchmark.numa01.seconds
> 209.06 -10.7% 186.67 ± 6% autonuma-benchmark.numa01_THREAD_ALLOC.seconds
> 2408 -8.6% 2200 ± 2% autonuma-benchmark.time.elapsed_time
> 2408 -8.6% 2200 ± 2% autonuma-benchmark.time.elapsed_time.max
> 1221333 ± 2% -5.1% 1159380 ± 2% autonuma-benchmark.time.involuntary_context_switches
> 3508627 -4.1% 3363550 autonuma-benchmark.time.minor_page_faults
> 11174 +1.9% 11388 autonuma-benchmark.time.percent_of_cpu_this_job_got
> 261419 -7.0% 243046 ± 2% autonuma-benchmark.time.user_time
> 220972 ± 7% +22.1% 269753 ± 3% proc-vmstat.numa_hint_faults
> 164886 ± 11% +19.4% 196883 ± 5% proc-vmstat.numa_hint_faults_local
> 7964964 -5.9% 7494239 proc-vmstat.numa_hit
> 82885 ± 6% +43.4% 118829 ± 6% proc-vmstat.numa_huge_pte_updates
> 6625289 -6.3% 6207618 proc-vmstat.numa_local
> 6636312 ± 4% +33.1% 8834573 ± 3% proc-vmstat.numa_pages_migrated
> 42671823 ± 6% +43.2% 61094857 ± 6% proc-vmstat.numa_pte_updates
> 9173569 -6.2% 8602789 proc-vmstat.pgfault
> 6636312 ± 4% +33.1% 8834573 ± 3% proc-vmstat.pgmigrate_success
> 397595 -6.5% 371818 proc-vmstat.pgreuse
> 12917 ± 4% +33.2% 17200 ± 3% proc-vmstat.thp_migration_success
> 17964288 -8.7% 16401792 ± 2% proc-vmstat.unevictable_pgs_scanned
> 0.63 ± 12% -0.3 0.28 ±100% perf-profile.calltrace.cycles-pp.__libc_read.readn.evsel__read_counter.read_counters.process_interval
> 1.17 ± 4% -0.2 0.96 ± 14% perf-profile.children.cycles-pp.__irq_exit_rcu
> 0.65 ± 19% -0.2 0.46 ± 13% perf-profile.children.cycles-pp.task_mm_cid_work
> 0.23 ± 16% -0.2 0.08 ± 61% perf-profile.children.cycles-pp.rcu_gp_kthread
> 0.30 ± 5% -0.1 0.16 ± 43% perf-profile.children.cycles-pp.rebalance_domains
> 0.13 ± 21% -0.1 0.03 ±100% perf-profile.children.cycles-pp.rcu_gp_fqs_loop
> 0.25 ± 16% -0.1 0.18 ± 14% perf-profile.children.cycles-pp.lru_add_drain_cpu
> 0.17 ± 9% -0.1 0.11 ± 23% perf-profile.children.cycles-pp.__perf_read_group_add
> 0.09 ± 21% -0.0 0.04 ± 72% perf-profile.children.cycles-pp.__evlist__disable
> 0.11 ± 19% -0.0 0.07 ± 53% perf-profile.children.cycles-pp.vma_link
> 0.13 ± 6% -0.0 0.09 ± 27% perf-profile.children.cycles-pp.ptep_clear_flush
> 0.07 ± 7% -0.0 0.03 ±100% perf-profile.children.cycles-pp.__kernel_read
> 0.07 ± 7% -0.0 0.03 ±100% perf-profile.children.cycles-pp.simple_lookup
> 0.09 ± 9% +0.0 0.11 ± 10% perf-profile.children.cycles-pp.exit_notify
> 0.12 ± 14% +0.0 0.16 ± 17% perf-profile.children.cycles-pp.__do_set_cpus_allowed
> 0.02 ±141% +0.1 0.09 ± 40% perf-profile.children.cycles-pp.__sysvec_call_function
> 0.05 ± 78% +0.1 0.13 ± 42% perf-profile.children.cycles-pp.__flush_smp_call_function_queue
> 0.03 ±141% +0.1 0.12 ± 41% perf-profile.children.cycles-pp.sysvec_call_function
> 0.64 ± 19% -0.2 0.45 ± 12% perf-profile.self.cycles-pp.task_mm_cid_work
> 0.07 ± 7% -0.0 0.03 ±100% perf-profile.self.cycles-pp.dequeue_task_fair
> 0.05 ± 8% +0.0 0.08 ± 14% perf-profile.self.cycles-pp.file_free_rcu
> 1057 +9.9% 1162 ± 2% perf-stat.i.MPKI
> 76.36 ± 2% +4.6 80.91 ± 2% perf-stat.i.cache-miss-rate%
> 5.353e+08 ± 4% +18.2% 6.327e+08 ± 3% perf-stat.i.cache-misses
> 7.576e+08 +9.3% 8.282e+08 ± 2% perf-stat.i.cache-references
> 3.727e+11 +1.7% 3.792e+11 perf-stat.i.cpu-cycles
> 154.73 +1.5% 157.11 perf-stat.i.cpu-migrations
> 722.61 ± 2% -8.9% 658.12 ± 3% perf-stat.i.cycles-between-cache-misses
> 2.91 +1.7% 2.96 perf-stat.i.metric.GHz
> 1242 ± 3% +5.7% 1312 ± 2% perf-stat.i.metric.K/sec
> 12.73 +9.8% 13.98 ± 2% perf-stat.i.metric.M/sec
> 245601 +5.4% 258749 perf-stat.i.node-load-misses
> 43.38 -2.5 40.91 ± 3% perf-stat.i.node-store-miss-rate%
> 2.267e+08 ± 3% +8.8% 2.467e+08 ± 4% perf-stat.i.node-store-misses
> 3.067e+08 ± 5% +25.2% 3.841e+08 ± 6% perf-stat.i.node-stores
> 915.00 +9.1% 998.24 ± 2% perf-stat.overall.MPKI
> 71.29 ± 3% +5.7 77.00 ± 3% perf-stat.overall.cache-miss-rate%
> 702.58 ± 3% -14.0% 604.23 ± 3% perf-stat.overall.cycles-between-cache-misses
> 42.48 ± 2% -3.3 39.20 ± 5% perf-stat.overall.node-store-miss-rate%
> 5.33e+08 ± 4% +18.1% 6.296e+08 ± 3% perf-stat.ps.cache-misses
> 7.475e+08 +9.4% 8.178e+08 ± 2% perf-stat.ps.cache-references
> 3.739e+11 +1.6% 3.8e+11 perf-stat.ps.cpu-cycles
> 154.22 +1.6% 156.62 perf-stat.ps.cpu-migrations
> 3655 +2.5% 3744 perf-stat.ps.minor-faults
> 242759 +5.4% 255974 perf-stat.ps.node-load-misses
> 2.255e+08 ± 3% +8.9% 2.457e+08 ± 3% perf-stat.ps.node-store-misses
> 3.057e+08 ± 5% +24.9% 3.82e+08 ± 6% perf-stat.ps.node-stores
> 3655 +2.5% 3744 perf-stat.ps.page-faults
> 1.968e+12 -8.3% 1.805e+12 ± 2% perf-stat.total.instructions
> 0.03 ±141% +283.8% 0.13 ± 85% perf-sched.sch_delay.avg.ms.__cond_resched.zap_pmd_range.isra.0.unmap_page_range
> 0.06 ± 77% +254.1% 0.20 ± 54% perf-sched.sch_delay.avg.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_exc_page_fault
> 0.08 ± 28% -89.5% 0.01 ±223% perf-sched.sch_delay.avg.ms.schedule_preempt_disabled.kthread.ret_from_fork.ret_from_fork_asm
> 0.92 ± 10% -33.4% 0.62 ± 20% perf-sched.sch_delay.avg.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
> 0.10 ± 22% -27.2% 0.07 ± 8% perf-sched.sch_delay.avg.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
> 0.35 ±141% +186.8% 1.02 ± 69% perf-sched.sch_delay.max.ms.__cond_resched.zap_pmd_range.isra.0.unmap_page_range
> 1.47 ± 81% +262.6% 5.32 ± 79% perf-sched.sch_delay.max.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_exc_page_fault
> 2.42 ± 42% +185.9% 6.91 ± 52% perf-sched.sch_delay.max.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt
> 0.26 ± 9% +1470.7% 4.16 ±115% perf-sched.sch_delay.max.ms.rcu_gp_kthread.kthread.ret_from_fork.ret_from_fork_asm
> 3.61 ± 7% -25.3% 2.70 ± 18% perf-sched.sch_delay.max.ms.schedule_hrtimeout_range_clock.do_select.core_sys_select.kern_select
> 0.08 ± 28% -89.5% 0.01 ±223% perf-sched.sch_delay.max.ms.schedule_preempt_disabled.kthread.ret_from_fork.ret_from_fork_asm
> 17.44 ± 4% -19.0% 14.12 ± 13% perf-sched.wait_and_delay.avg.ms.do_wait.kernel_wait4.__do_sys_wait4.do_syscall_64
> 23.36 ± 21% -37.2% 14.67 ± 22% perf-sched.wait_and_delay.avg.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
> 107.00 +11.5% 119.33 ± 4% perf-sched.wait_and_delay.count.do_wait.kernel_wait4.__do_sys_wait4.do_syscall_64
> 75.00 +9.6% 82.17 ± 2% perf-sched.wait_and_delay.count.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
> 79.99 ± 97% -86.8% 10.52 ± 41% perf-sched.wait_and_delay.max.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_sysvec_call_function_single
> 145.98 ± 14% -41.5% 85.46 ± 22% perf-sched.wait_and_delay.max.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
> 1.20 ± 94% +152.3% 3.03 ± 31% perf-sched.wait_time.avg.ms.__cond_resched.change_pmd_range.change_p4d_range.change_protection_range.mprotect_fixup
> 2.30 ± 57% -90.9% 0.21 ±205% perf-sched.wait_time.avg.ms.__cond_resched.down_read.walk_component.link_path_walk.part
> 0.06 ± 8% -100.0% 0.00 perf-sched.wait_time.avg.ms.__cond_resched.down_write_killable.exec_mmap.begin_new_exec.load_elf_binary
> 0.58 ± 81% -76.6% 0.14 ± 50% perf-sched.wait_time.avg.ms.__cond_resched.dput.terminate_walk.path_lookupat.filename_lookup
> 2.63 ± 21% -59.4% 1.07 ± 68% perf-sched.wait_time.avg.ms.__cond_resched.dput.terminate_walk.path_openat.do_filp_open
> 2.68 ± 40% -79.5% 0.55 ±174% perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc.getname_flags.part.0
> 3.59 ± 17% -52.9% 1.69 ± 98% perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc.mas_alloc_nodes.mas_preallocate.mmap_region
> 4.05 ± 2% -80.6% 0.79 ±133% perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc.vm_area_dup.__split_vma.mprotect_fixup
> 3.75 ± 19% -81.9% 0.68 ±135% perf-sched.wait_time.avg.ms.__cond_resched.mmput.m_stop.seq_read_iter.seq_read
> 1527 ± 70% -84.5% 236.84 ±223% perf-sched.wait_time.avg.ms.__cond_resched.mutex_lock.perf_poll.do_poll.constprop
> 16.13 ± 4% -21.4% 12.69 ± 15% perf-sched.wait_time.avg.ms.do_wait.kernel_wait4.__do_sys_wait4.do_syscall_64
> 1.16 ±117% -99.1% 0.01 ±223% perf-sched.wait_time.avg.ms.io_schedule.folio_wait_bit_common.filemap_fault.__do_fault
> 0.26 ± 25% -93.2% 0.02 ±223% perf-sched.wait_time.avg.ms.schedule_preempt_disabled.rwsem_down_read_slowpath.down_read_killable.__access_remote_vm
> 22.43 ± 21% -37.4% 14.05 ± 22% perf-sched.wait_time.avg.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
> 4.41 ± 8% -94.9% 0.22 ±191% perf-sched.wait_time.max.ms.__cond_resched.down_read.walk_component.link_path_walk.part
> 0.08 ± 29% -100.0% 0.00 perf-sched.wait_time.max.ms.__cond_resched.down_write_killable.exec_mmap.begin_new_exec.load_elf_binary
> 6.20 ± 8% -21.6% 4.87 ± 13% perf-sched.wait_time.max.ms.__cond_resched.lock_mm_and_find_vma.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
> 4.23 ± 5% -68.3% 1.34 ±136% perf-sched.wait_time.max.ms.__cond_resched.mmput.m_stop.seq_read_iter.seq_read
> 3053 ± 70% -92.2% 236.84 ±223% perf-sched.wait_time.max.ms.__cond_resched.mutex_lock.perf_poll.do_poll.constprop
> 4.78 ± 33% +10431.5% 502.95 ± 99% perf-sched.wait_time.max.ms.__x64_sys_pause.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
> 79.99 ± 97% -86.9% 10.51 ± 41% perf-sched.wait_time.max.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_sysvec_call_function_single
> 2.13 ±128% -99.5% 0.01 ±223% perf-sched.wait_time.max.ms.io_schedule.folio_wait_bit_common.filemap_fault.__do_fault
> 0.26 ± 25% -92.4% 0.02 ±223% perf-sched.wait_time.max.ms.schedule_preempt_disabled.rwsem_down_read_slowpath.down_read_killable.__access_remote_vm
> 142.79 ± 13% -40.9% 84.32 ± 22% perf-sched.wait_time.max.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
>
>
>
>

I hope I can add your Tested-by if I need to REBASE the patch for the -mm
tree, depending on any further feedback, with any minor changes.
hi, Raghu,

On Wed, Sep 13, 2023 at 11:45:18AM +0530, Raghavendra K T wrote:
> On 9/12/2023 7:54 PM, kernel test robot wrote:
> >
> > hi, Raghu,
> >
> > hope this third performance report for same one patch-set won't annoy you,
> > and better, have some value to you.
>
> Not at all. But thanks a lot, and I am rather happy to see these
> exhaustive results.
>
> Because: it is easy to show that the patchset increases the readability
> or maintainability of the code, and while I try my best to keep
> regressions within noise level for corner cases, with some benchmarks
> improving noticeably, there is always room to miss something.
> Reports like this help to boost confidence in the patchset.
>
> Also, your cumulative (bisection) report helped to evaluate the
> importance of each patch.

ok, will keep sending reports if any :)

>
> I hope I can add your Tested-by if I need to REBASE the patch for the -mm
> tree, depending on any further feedback, with any minor changes.

sure!

Tested-by: kernel test robot <oliver.sang@intel.com>
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 406ab9ea818f..7794dc91c50f 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1689,10 +1689,14 @@ static inline int xchg_page_access_time(struct page *page, int time)
 static inline void vma_set_access_pid_bit(struct vm_area_struct *vma)
 {
 	unsigned int pid_bit;
-
-	pid_bit = hash_32(current->pid, ilog2(BITS_PER_LONG));
-	if (vma->numab_state && !test_bit(pid_bit, &vma->numab_state->access_pids[1])) {
-		__set_bit(pid_bit, &vma->numab_state->access_pids[1]);
+	unsigned long *pids, pid_idx;
+
+	if (vma->numab_state) {
+		pid_bit = hash_32(current->pid, ilog2(BITS_PER_LONG));
+		pid_idx = READ_ONCE(vma->numab_state->access_pid_idx);
+		pids = vma->numab_state->access_pids + pid_idx;
+		if (!test_bit(pid_bit, pids))
+			__set_bit(pid_bit, pids);
 	}
 }
 #else /* !CONFIG_NUMA_BALANCING */
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 647d9fc5da8d..676afa9e497c 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -475,10 +475,12 @@
 struct vma_lock {
 	struct rw_semaphore lock;
 };
 
+#define NR_ACCESS_PID_HIST	4
 struct vma_numab_state {
 	unsigned long next_scan;
 	unsigned long next_pid_reset;
-	unsigned long access_pids[2];
+	unsigned long access_pids[NR_ACCESS_PID_HIST];
+	unsigned long access_pid_idx;
 	unsigned long vma_scan_select;
 };
 
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index e26e847a8e26..3ae2a1a3ef5c 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2958,12 +2958,26 @@ static bool task_disjoint_vma_select(struct vm_area_struct *vma)
 	return true;
 }
 
+static inline bool vma_test_access_pid_history(struct vm_area_struct *vma)
+{
+	unsigned int i, pid_bit;
+	unsigned long pids = 0;
+
+	pid_bit = hash_32(current->pid, ilog2(BITS_PER_LONG));
+
+	for (i = 0; i < NR_ACCESS_PID_HIST; i++)
+		pids |= vma->numab_state->access_pids[i];
+
+	return test_bit(pid_bit, &pids);
+}
+
 static bool vma_is_accessed(struct vm_area_struct *vma)
 {
-	unsigned long pids;
+	/* Check if the current task had historically accessed VMA. */
+	if (vma_test_access_pid_history(vma))
+		return true;
 
-	pids = vma->numab_state->access_pids[0] | vma->numab_state->access_pids[1];
-	return test_bit(hash_32(current->pid, ilog2(BITS_PER_LONG)), &pids);
+	return false;
 }
 
 #define VMA_PID_RESET_PERIOD (4 * sysctl_numa_balancing_scan_delay)
@@ -2983,6 +2997,7 @@ static void task_numa_work(struct callback_head *work)
 	unsigned long nr_pte_updates = 0;
 	long pages, virtpages;
 	struct vma_iterator vmi;
+	unsigned long pid_idx;
 
 	SCHED_WARN_ON(p != container_of(work, struct task_struct, numa_work));
 
@@ -3097,8 +3112,12 @@ static void task_numa_work(struct callback_head *work)
 		    time_after(jiffies, vma->numab_state->next_pid_reset)) {
 			vma->numab_state->next_pid_reset = vma->numab_state->next_pid_reset +
 				msecs_to_jiffies(VMA_PID_RESET_PERIOD);
-			vma->numab_state->access_pids[0] = READ_ONCE(vma->numab_state->access_pids[1]);
-			vma->numab_state->access_pids[1] = 0;
+
+			pid_idx = vma->numab_state->access_pid_idx;
+			pid_idx = (pid_idx + 1) % NR_ACCESS_PID_HIST;
+
+			vma->numab_state->access_pid_idx = pid_idx;
+			vma->numab_state->access_pids[pid_idx] = 0;
 		}
 
 	/*
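For anyone skimming the diff above: the patch turns the old two-slot access-PID
filter into a small ring of NR_ACCESS_PID_HIST per-VMA bitmap windows that is
rotated once per reset period, so a task's access history is retained for more
periods before it is forgotten. Below is a minimal userspace sketch of that
rotation, for illustration only: hash32() approximates the kernel's
hash_32(val, 6) golden-ratio hash, and the struct and function names
(pid_history, record_access, recently_accessed, rotate) are invented for this
example, not taken from the patch.

/*
 * Illustrative userspace model of the rotating PID-access history;
 * not kernel code. NR_ACCESS_PID_HIST mirrors the patch.
 */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define NR_ACCESS_PID_HIST 4	/* number of history windows, as in the patch */

struct pid_history {
	uint64_t access_pids[NR_ACCESS_PID_HIST];	/* one bitmap per window */
	unsigned int access_pid_idx;			/* current window */
};

/* Golden-ratio multiplicative hash, 32 bits down to 6 (6 == ilog2(64)). */
static unsigned int hash32(uint32_t pid)
{
	return (pid * 0x61C88647u) >> (32 - 6);
}

/* Fault path: mark the faulting task in the current window. */
static void record_access(struct pid_history *h, uint32_t pid)
{
	h->access_pids[h->access_pid_idx] |= 1ull << hash32(pid);
}

/* Scan path: did this task access the VMA in any retained window? */
static bool recently_accessed(const struct pid_history *h, uint32_t pid)
{
	uint64_t pids = 0;

	for (int i = 0; i < NR_ACCESS_PID_HIST; i++)
		pids |= h->access_pids[i];
	return pids & (1ull << hash32(pid));
}

/* Reset period: advance to the oldest window and clear it for reuse. */
static void rotate(struct pid_history *h)
{
	h->access_pid_idx = (h->access_pid_idx + 1) % NR_ACCESS_PID_HIST;
	h->access_pids[h->access_pid_idx] = 0;
}

int main(void)
{
	struct pid_history h = { 0 };

	record_access(&h, 1234);
	rotate(&h);					/* one reset period passes */
	printf("%d\n", recently_accessed(&h, 1234));	/* 1: still in history */
	rotate(&h);
	rotate(&h);
	rotate(&h);					/* window holding 1234 is reused */
	printf("%d\n", recently_accessed(&h, 1234));	/* 0: history expired */
	return 0;
}

The observable effect of the wider ring is that a task's recorded access
survives roughly four reset periods instead of two before vma_is_accessed()
stops treating the VMA as recently accessed by that task, which is the longer
"tasks' access history" the patch subject refers to.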