Message ID | 20240829183741.3331213-9-andrii@kernel.org (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | uprobes: RCU-protected hot path optimizations | expand |
Hi Andrii, kernel test robot noticed the following build errors: [auto build test ERROR on tip/perf/core] [also build test ERROR on next-20240830] [cannot apply to perf-tools-next/perf-tools-next perf-tools/perf-tools linus/master acme/perf/core v6.11-rc5] [If your patch is applied to the wrong git tree, kindly drop us a note. And when submitting patch, we suggest to use '--base' as documented in https://git-scm.com/docs/git-format-patch#_base_tree_information] url: https://github.com/intel-lab-lkp/linux/commits/Andrii-Nakryiko/uprobes-revamp-uprobe-refcounting-and-lifetime-management/20240830-024135 base: tip/perf/core patch link: https://lore.kernel.org/r/20240829183741.3331213-9-andrii%40kernel.org patch subject: [PATCH v4 8/8] uprobes: switch to RCU Tasks Trace flavor for better performance config: i386-buildonly-randconfig-004-20240830 (https://download.01.org/0day-ci/archive/20240831/202408310130.t9EBKteQ-lkp@intel.com/config) compiler: clang version 18.1.5 (https://github.com/llvm/llvm-project 617a15a9eac96088ae5e9134248d8236e34b91b1) reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240831/202408310130.t9EBKteQ-lkp@intel.com/reproduce) If you fix the issue in a separate patch/commit (i.e. not just a new version of the same patch/commit), kindly add following tags | Reported-by: kernel test robot <lkp@intel.com> | Closes: https://lore.kernel.org/oe-kbuild-all/202408310130.t9EBKteQ-lkp@intel.com/ All errors (new ones prefixed by >>): >> kernel/events/uprobes.c:1157:2: error: call to undeclared function 'synchronize_rcu_tasks_trace'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration] 1157 | synchronize_rcu_tasks_trace(); | ^ kernel/events/uprobes.c:1157:2: note: did you mean 'synchronize_rcu_tasks_rude'? include/linux/rcupdate.h:206:6: note: 'synchronize_rcu_tasks_rude' declared here 206 | void synchronize_rcu_tasks_rude(void); | ^ 1 error generated. vim +/synchronize_rcu_tasks_trace +1157 kernel/events/uprobes.c 1145 1146 void uprobe_unregister_sync(void) 1147 { 1148 /* 1149 * Now that handler_chain() and handle_uretprobe_chain() iterate over 1150 * uprobe->consumers list under RCU protection without holding 1151 * uprobe->register_rwsem, we need to wait for RCU grace period to 1152 * make sure that we can't call into just unregistered 1153 * uprobe_consumer's callbacks anymore. If we don't do that, fast and 1154 * unlucky enough caller can free consumer's memory and cause 1155 * handler_chain() or handle_uretprobe_chain() to do an use-after-free. 1156 */ > 1157 synchronize_rcu_tasks_trace(); 1158 } 1159 EXPORT_SYMBOL_GPL(uprobe_unregister_sync); 1160
On Fri, Aug 30, 2024 at 10:42 AM kernel test robot <lkp@intel.com> wrote: > > Hi Andrii, > > kernel test robot noticed the following build errors: > > [auto build test ERROR on tip/perf/core] > [also build test ERROR on next-20240830] > [cannot apply to perf-tools-next/perf-tools-next perf-tools/perf-tools linus/master acme/perf/core v6.11-rc5] > [If your patch is applied to the wrong git tree, kindly drop us a note. > And when submitting patch, we suggest to use '--base' as documented in > https://git-scm.com/docs/git-format-patch#_base_tree_information] > > url: https://github.com/intel-lab-lkp/linux/commits/Andrii-Nakryiko/uprobes-revamp-uprobe-refcounting-and-lifetime-management/20240830-024135 > base: tip/perf/core > patch link: https://lore.kernel.org/r/20240829183741.3331213-9-andrii%40kernel.org > patch subject: [PATCH v4 8/8] uprobes: switch to RCU Tasks Trace flavor for better performance > config: i386-buildonly-randconfig-004-20240830 (https://download.01.org/0day-ci/archive/20240831/202408310130.t9EBKteQ-lkp@intel.com/config) > compiler: clang version 18.1.5 (https://github.com/llvm/llvm-project 617a15a9eac96088ae5e9134248d8236e34b91b1) > reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240831/202408310130.t9EBKteQ-lkp@intel.com/reproduce) > > If you fix the issue in a separate patch/commit (i.e. not just a new version of > the same patch/commit), kindly add following tags > | Reported-by: kernel test robot <lkp@intel.com> > | Closes: https://lore.kernel.org/oe-kbuild-all/202408310130.t9EBKteQ-lkp@intel.com/ > > All errors (new ones prefixed by >>): > > >> kernel/events/uprobes.c:1157:2: error: call to undeclared function 'synchronize_rcu_tasks_trace'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration] > 1157 | synchronize_rcu_tasks_trace(); > | ^ > kernel/events/uprobes.c:1157:2: note: did you mean 'synchronize_rcu_tasks_rude'? > include/linux/rcupdate.h:206:6: note: 'synchronize_rcu_tasks_rude' declared here > 206 | void synchronize_rcu_tasks_rude(void); > | ^ > 1 error generated. Missing #include <linux/rcupdate_trace.h>, will add. > > > vim +/synchronize_rcu_tasks_trace +1157 kernel/events/uprobes.c > > 1145 > 1146 void uprobe_unregister_sync(void) > 1147 { > 1148 /* > 1149 * Now that handler_chain() and handle_uretprobe_chain() iterate over > 1150 * uprobe->consumers list under RCU protection without holding > 1151 * uprobe->register_rwsem, we need to wait for RCU grace period to > 1152 * make sure that we can't call into just unregistered > 1153 * uprobe_consumer's callbacks anymore. If we don't do that, fast and > 1154 * unlucky enough caller can free consumer's memory and cause > 1155 * handler_chain() or handle_uretprobe_chain() to do an use-after-free. > 1156 */ > > 1157 synchronize_rcu_tasks_trace(); > 1158 } > 1159 EXPORT_SYMBOL_GPL(uprobe_unregister_sync); > 1160 > > -- > 0-DAY CI Kernel Test Service > https://github.com/intel/lkp-tests/wiki
Hi Andrii, kernel test robot noticed the following build errors: [auto build test ERROR on tip/perf/core] [also build test ERROR on next-20240830] [cannot apply to perf-tools-next/perf-tools-next perf-tools/perf-tools linus/master acme/perf/core v6.11-rc5] [If your patch is applied to the wrong git tree, kindly drop us a note. And when submitting patch, we suggest to use '--base' as documented in https://git-scm.com/docs/git-format-patch#_base_tree_information] url: https://github.com/intel-lab-lkp/linux/commits/Andrii-Nakryiko/uprobes-revamp-uprobe-refcounting-and-lifetime-management/20240830-024135 base: tip/perf/core patch link: https://lore.kernel.org/r/20240829183741.3331213-9-andrii%40kernel.org patch subject: [PATCH v4 8/8] uprobes: switch to RCU Tasks Trace flavor for better performance config: x86_64-defconfig (https://download.01.org/0day-ci/archive/20240831/202408310111.2dkrylJ9-lkp@intel.com/config) compiler: gcc-11 (Debian 11.3.0-12) 11.3.0 reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240831/202408310111.2dkrylJ9-lkp@intel.com/reproduce) If you fix the issue in a separate patch/commit (i.e. not just a new version of the same patch/commit), kindly add following tags | Reported-by: kernel test robot <lkp@intel.com> | Closes: https://lore.kernel.org/oe-kbuild-all/202408310111.2dkrylJ9-lkp@intel.com/ All errors (new ones prefixed by >>): kernel/events/uprobes.c: In function 'uprobe_unregister_sync': >> kernel/events/uprobes.c:1157:9: error: implicit declaration of function 'synchronize_rcu_tasks_trace'; did you mean 'synchronize_rcu_tasks'? [-Werror=implicit-function-declaration] 1157 | synchronize_rcu_tasks_trace(); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~ | synchronize_rcu_tasks cc1: some warnings being treated as errors vim +1157 kernel/events/uprobes.c 1145 1146 void uprobe_unregister_sync(void) 1147 { 1148 /* 1149 * Now that handler_chain() and handle_uretprobe_chain() iterate over 1150 * uprobe->consumers list under RCU protection without holding 1151 * uprobe->register_rwsem, we need to wait for RCU grace period to 1152 * make sure that we can't call into just unregistered 1153 * uprobe_consumer's callbacks anymore. If we don't do that, fast and 1154 * unlucky enough caller can free consumer's memory and cause 1155 * handler_chain() or handle_uretprobe_chain() to do an use-after-free. 1156 */ > 1157 synchronize_rcu_tasks_trace(); 1158 } 1159 EXPORT_SYMBOL_GPL(uprobe_unregister_sync); 1160
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c index 8a464cf38127..a5d39cec53d5 100644 --- a/kernel/events/uprobes.c +++ b/kernel/events/uprobes.c @@ -42,8 +42,6 @@ static struct rb_root uprobes_tree = RB_ROOT; static DEFINE_RWLOCK(uprobes_treelock); /* serialize rbtree access */ static seqcount_rwlock_t uprobes_seqcount = SEQCNT_RWLOCK_ZERO(uprobes_seqcount, &uprobes_treelock); -DEFINE_STATIC_SRCU(uprobes_srcu); - #define UPROBES_HASH_SZ 13 /* serialize uprobe->pending_list */ static struct mutex uprobes_mmap_mutex[UPROBES_HASH_SZ]; @@ -652,7 +650,7 @@ static void put_uprobe(struct uprobe *uprobe) delayed_uprobe_remove(uprobe, NULL); mutex_unlock(&delayed_uprobe_lock); - call_srcu(&uprobes_srcu, &uprobe->rcu, uprobe_free_rcu); + call_rcu_tasks_trace(&uprobe->rcu, uprobe_free_rcu); } static __always_inline @@ -707,7 +705,7 @@ static struct uprobe *find_uprobe_rcu(struct inode *inode, loff_t offset) struct rb_node *node; unsigned int seq; - lockdep_assert(srcu_read_lock_held(&uprobes_srcu)); + lockdep_assert(rcu_read_lock_trace_held()); do { seq = read_seqcount_begin(&uprobes_seqcount); @@ -935,8 +933,7 @@ static bool filter_chain(struct uprobe *uprobe, struct mm_struct *mm) bool ret = false; down_read(&uprobe->consumer_rwsem); - list_for_each_entry_srcu(uc, &uprobe->consumers, cons_node, - srcu_read_lock_held(&uprobes_srcu)) { + list_for_each_entry_rcu(uc, &uprobe->consumers, cons_node, rcu_read_lock_trace_held()) { ret = consumer_filter(uc, mm); if (ret) break; @@ -1157,7 +1154,7 @@ void uprobe_unregister_sync(void) * unlucky enough caller can free consumer's memory and cause * handler_chain() or handle_uretprobe_chain() to do an use-after-free. */ - synchronize_srcu(&uprobes_srcu); + synchronize_rcu_tasks_trace(); } EXPORT_SYMBOL_GPL(uprobe_unregister_sync); @@ -1241,19 +1238,18 @@ EXPORT_SYMBOL_GPL(uprobe_register); int uprobe_apply(struct uprobe *uprobe, struct uprobe_consumer *uc, bool add) { struct uprobe_consumer *con; - int ret = -ENOENT, srcu_idx; + int ret = -ENOENT; down_write(&uprobe->register_rwsem); - srcu_idx = srcu_read_lock(&uprobes_srcu); - list_for_each_entry_srcu(con, &uprobe->consumers, cons_node, - srcu_read_lock_held(&uprobes_srcu)) { + rcu_read_lock_trace(); + list_for_each_entry_rcu(con, &uprobe->consumers, cons_node, rcu_read_lock_trace_held()) { if (con == uc) { ret = register_for_each_vma(uprobe, add ? uc : NULL); break; } } - srcu_read_unlock(&uprobes_srcu, srcu_idx); + rcu_read_unlock_trace(); up_write(&uprobe->register_rwsem); @@ -2123,8 +2119,7 @@ static void handler_chain(struct uprobe *uprobe, struct pt_regs *regs) current->utask->auprobe = &uprobe->arch; - list_for_each_entry_srcu(uc, &uprobe->consumers, cons_node, - srcu_read_lock_held(&uprobes_srcu)) { + list_for_each_entry_rcu(uc, &uprobe->consumers, cons_node, rcu_read_lock_trace_held()) { int rc = 0; if (uc->handler) { @@ -2162,15 +2157,13 @@ handle_uretprobe_chain(struct return_instance *ri, struct pt_regs *regs) { struct uprobe *uprobe = ri->uprobe; struct uprobe_consumer *uc; - int srcu_idx; - srcu_idx = srcu_read_lock(&uprobes_srcu); - list_for_each_entry_srcu(uc, &uprobe->consumers, cons_node, - srcu_read_lock_held(&uprobes_srcu)) { + rcu_read_lock_trace(); + list_for_each_entry_rcu(uc, &uprobe->consumers, cons_node, rcu_read_lock_trace_held()) { if (uc->ret_handler) uc->ret_handler(uc, ri->func, regs); } - srcu_read_unlock(&uprobes_srcu, srcu_idx); + rcu_read_unlock_trace(); } static struct return_instance *find_next_ret_chain(struct return_instance *ri) @@ -2255,13 +2248,13 @@ static void handle_swbp(struct pt_regs *regs) { struct uprobe *uprobe; unsigned long bp_vaddr; - int is_swbp, srcu_idx; + int is_swbp; bp_vaddr = uprobe_get_swbp_addr(regs); if (bp_vaddr == uprobe_get_trampoline_vaddr()) return uprobe_handle_trampoline(regs); - srcu_idx = srcu_read_lock(&uprobes_srcu); + rcu_read_lock_trace(); uprobe = find_active_uprobe_rcu(bp_vaddr, &is_swbp); if (!uprobe) { @@ -2319,7 +2312,7 @@ static void handle_swbp(struct pt_regs *regs) out: /* arch_uprobe_skip_sstep() succeeded, or restart if can't singlestep */ - srcu_read_unlock(&uprobes_srcu, srcu_idx); + rcu_read_unlock_trace(); } /*
This patch switches uprobes SRCU usage to RCU Tasks Trace flavor, which is optimized for more lightweight and quick readers (at the expense of slower writers, which for uprobes is a fine tradeof) and has better performance and scalability with number of CPUs. Similarly to baseline vs SRCU, we've benchmarked SRCU-based implementation vs RCU Tasks Trace implementation. SRCU ==== uprobe-nop ( 1 cpus): 3.276 ± 0.005M/s ( 3.276M/s/cpu) uprobe-nop ( 2 cpus): 4.125 ± 0.002M/s ( 2.063M/s/cpu) uprobe-nop ( 4 cpus): 7.713 ± 0.002M/s ( 1.928M/s/cpu) uprobe-nop ( 8 cpus): 8.097 ± 0.006M/s ( 1.012M/s/cpu) uprobe-nop (16 cpus): 6.501 ± 0.056M/s ( 0.406M/s/cpu) uprobe-nop (32 cpus): 4.398 ± 0.084M/s ( 0.137M/s/cpu) uprobe-nop (64 cpus): 6.452 ± 0.000M/s ( 0.101M/s/cpu) uretprobe-nop ( 1 cpus): 2.055 ± 0.001M/s ( 2.055M/s/cpu) uretprobe-nop ( 2 cpus): 2.677 ± 0.000M/s ( 1.339M/s/cpu) uretprobe-nop ( 4 cpus): 4.561 ± 0.003M/s ( 1.140M/s/cpu) uretprobe-nop ( 8 cpus): 5.291 ± 0.002M/s ( 0.661M/s/cpu) uretprobe-nop (16 cpus): 5.065 ± 0.019M/s ( 0.317M/s/cpu) uretprobe-nop (32 cpus): 3.622 ± 0.003M/s ( 0.113M/s/cpu) uretprobe-nop (64 cpus): 3.723 ± 0.002M/s ( 0.058M/s/cpu) RCU Tasks Trace =============== uprobe-nop ( 1 cpus): 3.396 ± 0.002M/s ( 3.396M/s/cpu) uprobe-nop ( 2 cpus): 4.271 ± 0.006M/s ( 2.135M/s/cpu) uprobe-nop ( 4 cpus): 8.499 ± 0.015M/s ( 2.125M/s/cpu) uprobe-nop ( 8 cpus): 10.355 ± 0.028M/s ( 1.294M/s/cpu) uprobe-nop (16 cpus): 7.615 ± 0.099M/s ( 0.476M/s/cpu) uprobe-nop (32 cpus): 4.430 ± 0.007M/s ( 0.138M/s/cpu) uprobe-nop (64 cpus): 6.887 ± 0.020M/s ( 0.108M/s/cpu) uretprobe-nop ( 1 cpus): 2.174 ± 0.001M/s ( 2.174M/s/cpu) uretprobe-nop ( 2 cpus): 2.853 ± 0.001M/s ( 1.426M/s/cpu) uretprobe-nop ( 4 cpus): 4.913 ± 0.002M/s ( 1.228M/s/cpu) uretprobe-nop ( 8 cpus): 5.883 ± 0.002M/s ( 0.735M/s/cpu) uretprobe-nop (16 cpus): 5.147 ± 0.001M/s ( 0.322M/s/cpu) uretprobe-nop (32 cpus): 3.738 ± 0.008M/s ( 0.117M/s/cpu) uretprobe-nop (64 cpus): 4.397 ± 0.002M/s ( 0.069M/s/cpu) Peak throughput for uprobes increases from 8 mln/s to 10.3 mln/s (+28%!), and for uretprobes from 5.3 mln/s to 5.8 mln/s (+11%), as we have more work to do on uretprobes side. Even single-thread (no contention) performance is slightly better: 3.276 mln/s to 3.396 mln/s (+3.5%) for uprobes, and 2.055 mln/s to 2.174 mln/s (+5.8%) for uretprobes. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> --- kernel/events/uprobes.c | 37 +++++++++++++++---------------------- 1 file changed, 15 insertions(+), 22 deletions(-)