diff mbox series

[v4,8/8] uprobes: switch to RCU Tasks Trace flavor for better performance

Message ID 20240829183741.3331213-9-andrii@kernel.org (mailing list archive)
State New
Headers show
Series uprobes: RCU-protected hot path optimizations | expand

Commit Message

Andrii Nakryiko Aug. 29, 2024, 6:37 p.m. UTC
This patch switches uprobes SRCU usage to RCU Tasks Trace flavor, which
is optimized for more lightweight and quick readers (at the expense of
slower writers, which for uprobes is a fine tradeof) and has better
performance and scalability with number of CPUs.

Similarly to baseline vs SRCU, we've benchmarked SRCU-based
implementation vs RCU Tasks Trace implementation.

SRCU
====
uprobe-nop      ( 1 cpus):    3.276 ± 0.005M/s  (  3.276M/s/cpu)
uprobe-nop      ( 2 cpus):    4.125 ± 0.002M/s  (  2.063M/s/cpu)
uprobe-nop      ( 4 cpus):    7.713 ± 0.002M/s  (  1.928M/s/cpu)
uprobe-nop      ( 8 cpus):    8.097 ± 0.006M/s  (  1.012M/s/cpu)
uprobe-nop      (16 cpus):    6.501 ± 0.056M/s  (  0.406M/s/cpu)
uprobe-nop      (32 cpus):    4.398 ± 0.084M/s  (  0.137M/s/cpu)
uprobe-nop      (64 cpus):    6.452 ± 0.000M/s  (  0.101M/s/cpu)

uretprobe-nop   ( 1 cpus):    2.055 ± 0.001M/s  (  2.055M/s/cpu)
uretprobe-nop   ( 2 cpus):    2.677 ± 0.000M/s  (  1.339M/s/cpu)
uretprobe-nop   ( 4 cpus):    4.561 ± 0.003M/s  (  1.140M/s/cpu)
uretprobe-nop   ( 8 cpus):    5.291 ± 0.002M/s  (  0.661M/s/cpu)
uretprobe-nop   (16 cpus):    5.065 ± 0.019M/s  (  0.317M/s/cpu)
uretprobe-nop   (32 cpus):    3.622 ± 0.003M/s  (  0.113M/s/cpu)
uretprobe-nop   (64 cpus):    3.723 ± 0.002M/s  (  0.058M/s/cpu)

RCU Tasks Trace
===============
uprobe-nop      ( 1 cpus):    3.396 ± 0.002M/s  (  3.396M/s/cpu)
uprobe-nop      ( 2 cpus):    4.271 ± 0.006M/s  (  2.135M/s/cpu)
uprobe-nop      ( 4 cpus):    8.499 ± 0.015M/s  (  2.125M/s/cpu)
uprobe-nop      ( 8 cpus):   10.355 ± 0.028M/s  (  1.294M/s/cpu)
uprobe-nop      (16 cpus):    7.615 ± 0.099M/s  (  0.476M/s/cpu)
uprobe-nop      (32 cpus):    4.430 ± 0.007M/s  (  0.138M/s/cpu)
uprobe-nop      (64 cpus):    6.887 ± 0.020M/s  (  0.108M/s/cpu)

uretprobe-nop   ( 1 cpus):    2.174 ± 0.001M/s  (  2.174M/s/cpu)
uretprobe-nop   ( 2 cpus):    2.853 ± 0.001M/s  (  1.426M/s/cpu)
uretprobe-nop   ( 4 cpus):    4.913 ± 0.002M/s  (  1.228M/s/cpu)
uretprobe-nop   ( 8 cpus):    5.883 ± 0.002M/s  (  0.735M/s/cpu)
uretprobe-nop   (16 cpus):    5.147 ± 0.001M/s  (  0.322M/s/cpu)
uretprobe-nop   (32 cpus):    3.738 ± 0.008M/s  (  0.117M/s/cpu)
uretprobe-nop   (64 cpus):    4.397 ± 0.002M/s  (  0.069M/s/cpu)

Peak throughput for uprobes increases from 8 mln/s to 10.3 mln/s
(+28%!), and for uretprobes from 5.3 mln/s to 5.8 mln/s (+11%), as we
have more work to do on uretprobes side.

Even single-thread (no contention) performance is slightly better: 3.276
mln/s to 3.396 mln/s (+3.5%) for uprobes, and 2.055 mln/s to 2.174 mln/s
(+5.8%) for uretprobes.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 kernel/events/uprobes.c | 37 +++++++++++++++----------------------
 1 file changed, 15 insertions(+), 22 deletions(-)

Comments

kernel test robot Aug. 30, 2024, 5:41 p.m. UTC | #1
Hi Andrii,

kernel test robot noticed the following build errors:

[auto build test ERROR on tip/perf/core]
[also build test ERROR on next-20240830]
[cannot apply to perf-tools-next/perf-tools-next perf-tools/perf-tools linus/master acme/perf/core v6.11-rc5]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Andrii-Nakryiko/uprobes-revamp-uprobe-refcounting-and-lifetime-management/20240830-024135
base:   tip/perf/core
patch link:    https://lore.kernel.org/r/20240829183741.3331213-9-andrii%40kernel.org
patch subject: [PATCH v4 8/8] uprobes: switch to RCU Tasks Trace flavor for better performance
config: i386-buildonly-randconfig-004-20240830 (https://download.01.org/0day-ci/archive/20240831/202408310130.t9EBKteQ-lkp@intel.com/config)
compiler: clang version 18.1.5 (https://github.com/llvm/llvm-project 617a15a9eac96088ae5e9134248d8236e34b91b1)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240831/202408310130.t9EBKteQ-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202408310130.t9EBKteQ-lkp@intel.com/

All errors (new ones prefixed by >>):

>> kernel/events/uprobes.c:1157:2: error: call to undeclared function 'synchronize_rcu_tasks_trace'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
    1157 |         synchronize_rcu_tasks_trace();
         |         ^
   kernel/events/uprobes.c:1157:2: note: did you mean 'synchronize_rcu_tasks_rude'?
   include/linux/rcupdate.h:206:6: note: 'synchronize_rcu_tasks_rude' declared here
     206 | void synchronize_rcu_tasks_rude(void);
         |      ^
   1 error generated.


vim +/synchronize_rcu_tasks_trace +1157 kernel/events/uprobes.c

  1145	
  1146	void uprobe_unregister_sync(void)
  1147	{
  1148		/*
  1149		 * Now that handler_chain() and handle_uretprobe_chain() iterate over
  1150		 * uprobe->consumers list under RCU protection without holding
  1151		 * uprobe->register_rwsem, we need to wait for RCU grace period to
  1152		 * make sure that we can't call into just unregistered
  1153		 * uprobe_consumer's callbacks anymore. If we don't do that, fast and
  1154		 * unlucky enough caller can free consumer's memory and cause
  1155		 * handler_chain() or handle_uretprobe_chain() to do an use-after-free.
  1156		 */
> 1157		synchronize_rcu_tasks_trace();
  1158	}
  1159	EXPORT_SYMBOL_GPL(uprobe_unregister_sync);
  1160
Andrii Nakryiko Aug. 30, 2024, 5:55 p.m. UTC | #2
On Fri, Aug 30, 2024 at 10:42 AM kernel test robot <lkp@intel.com> wrote:
>
> Hi Andrii,
>
> kernel test robot noticed the following build errors:
>
> [auto build test ERROR on tip/perf/core]
> [also build test ERROR on next-20240830]
> [cannot apply to perf-tools-next/perf-tools-next perf-tools/perf-tools linus/master acme/perf/core v6.11-rc5]
> [If your patch is applied to the wrong git tree, kindly drop us a note.
> And when submitting patch, we suggest to use '--base' as documented in
> https://git-scm.com/docs/git-format-patch#_base_tree_information]
>
> url:    https://github.com/intel-lab-lkp/linux/commits/Andrii-Nakryiko/uprobes-revamp-uprobe-refcounting-and-lifetime-management/20240830-024135
> base:   tip/perf/core
> patch link:    https://lore.kernel.org/r/20240829183741.3331213-9-andrii%40kernel.org
> patch subject: [PATCH v4 8/8] uprobes: switch to RCU Tasks Trace flavor for better performance
> config: i386-buildonly-randconfig-004-20240830 (https://download.01.org/0day-ci/archive/20240831/202408310130.t9EBKteQ-lkp@intel.com/config)
> compiler: clang version 18.1.5 (https://github.com/llvm/llvm-project 617a15a9eac96088ae5e9134248d8236e34b91b1)
> reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240831/202408310130.t9EBKteQ-lkp@intel.com/reproduce)
>
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <lkp@intel.com>
> | Closes: https://lore.kernel.org/oe-kbuild-all/202408310130.t9EBKteQ-lkp@intel.com/
>
> All errors (new ones prefixed by >>):
>
> >> kernel/events/uprobes.c:1157:2: error: call to undeclared function 'synchronize_rcu_tasks_trace'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
>     1157 |         synchronize_rcu_tasks_trace();
>          |         ^
>    kernel/events/uprobes.c:1157:2: note: did you mean 'synchronize_rcu_tasks_rude'?
>    include/linux/rcupdate.h:206:6: note: 'synchronize_rcu_tasks_rude' declared here
>      206 | void synchronize_rcu_tasks_rude(void);
>          |      ^
>    1 error generated.

Missing #include <linux/rcupdate_trace.h>, will add.

>
>
> vim +/synchronize_rcu_tasks_trace +1157 kernel/events/uprobes.c
>
>   1145
>   1146  void uprobe_unregister_sync(void)
>   1147  {
>   1148          /*
>   1149           * Now that handler_chain() and handle_uretprobe_chain() iterate over
>   1150           * uprobe->consumers list under RCU protection without holding
>   1151           * uprobe->register_rwsem, we need to wait for RCU grace period to
>   1152           * make sure that we can't call into just unregistered
>   1153           * uprobe_consumer's callbacks anymore. If we don't do that, fast and
>   1154           * unlucky enough caller can free consumer's memory and cause
>   1155           * handler_chain() or handle_uretprobe_chain() to do an use-after-free.
>   1156           */
> > 1157          synchronize_rcu_tasks_trace();
>   1158  }
>   1159  EXPORT_SYMBOL_GPL(uprobe_unregister_sync);
>   1160
>
> --
> 0-DAY CI Kernel Test Service
> https://github.com/intel/lkp-tests/wiki
kernel test robot Aug. 30, 2024, 8:36 p.m. UTC | #3
Hi Andrii,

kernel test robot noticed the following build errors:

[auto build test ERROR on tip/perf/core]
[also build test ERROR on next-20240830]
[cannot apply to perf-tools-next/perf-tools-next perf-tools/perf-tools linus/master acme/perf/core v6.11-rc5]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Andrii-Nakryiko/uprobes-revamp-uprobe-refcounting-and-lifetime-management/20240830-024135
base:   tip/perf/core
patch link:    https://lore.kernel.org/r/20240829183741.3331213-9-andrii%40kernel.org
patch subject: [PATCH v4 8/8] uprobes: switch to RCU Tasks Trace flavor for better performance
config: x86_64-defconfig (https://download.01.org/0day-ci/archive/20240831/202408310111.2dkrylJ9-lkp@intel.com/config)
compiler: gcc-11 (Debian 11.3.0-12) 11.3.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240831/202408310111.2dkrylJ9-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202408310111.2dkrylJ9-lkp@intel.com/

All errors (new ones prefixed by >>):

   kernel/events/uprobes.c: In function 'uprobe_unregister_sync':
>> kernel/events/uprobes.c:1157:9: error: implicit declaration of function 'synchronize_rcu_tasks_trace'; did you mean 'synchronize_rcu_tasks'? [-Werror=implicit-function-declaration]
    1157 |         synchronize_rcu_tasks_trace();
         |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~
         |         synchronize_rcu_tasks
   cc1: some warnings being treated as errors


vim +1157 kernel/events/uprobes.c

  1145	
  1146	void uprobe_unregister_sync(void)
  1147	{
  1148		/*
  1149		 * Now that handler_chain() and handle_uretprobe_chain() iterate over
  1150		 * uprobe->consumers list under RCU protection without holding
  1151		 * uprobe->register_rwsem, we need to wait for RCU grace period to
  1152		 * make sure that we can't call into just unregistered
  1153		 * uprobe_consumer's callbacks anymore. If we don't do that, fast and
  1154		 * unlucky enough caller can free consumer's memory and cause
  1155		 * handler_chain() or handle_uretprobe_chain() to do an use-after-free.
  1156		 */
> 1157		synchronize_rcu_tasks_trace();
  1158	}
  1159	EXPORT_SYMBOL_GPL(uprobe_unregister_sync);
  1160
diff mbox series

Patch

diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 8a464cf38127..a5d39cec53d5 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -42,8 +42,6 @@  static struct rb_root uprobes_tree = RB_ROOT;
 static DEFINE_RWLOCK(uprobes_treelock);	/* serialize rbtree access */
 static seqcount_rwlock_t uprobes_seqcount = SEQCNT_RWLOCK_ZERO(uprobes_seqcount, &uprobes_treelock);
 
-DEFINE_STATIC_SRCU(uprobes_srcu);
-
 #define UPROBES_HASH_SZ	13
 /* serialize uprobe->pending_list */
 static struct mutex uprobes_mmap_mutex[UPROBES_HASH_SZ];
@@ -652,7 +650,7 @@  static void put_uprobe(struct uprobe *uprobe)
 	delayed_uprobe_remove(uprobe, NULL);
 	mutex_unlock(&delayed_uprobe_lock);
 
-	call_srcu(&uprobes_srcu, &uprobe->rcu, uprobe_free_rcu);
+	call_rcu_tasks_trace(&uprobe->rcu, uprobe_free_rcu);
 }
 
 static __always_inline
@@ -707,7 +705,7 @@  static struct uprobe *find_uprobe_rcu(struct inode *inode, loff_t offset)
 	struct rb_node *node;
 	unsigned int seq;
 
-	lockdep_assert(srcu_read_lock_held(&uprobes_srcu));
+	lockdep_assert(rcu_read_lock_trace_held());
 
 	do {
 		seq = read_seqcount_begin(&uprobes_seqcount);
@@ -935,8 +933,7 @@  static bool filter_chain(struct uprobe *uprobe, struct mm_struct *mm)
 	bool ret = false;
 
 	down_read(&uprobe->consumer_rwsem);
-	list_for_each_entry_srcu(uc, &uprobe->consumers, cons_node,
-				 srcu_read_lock_held(&uprobes_srcu)) {
+	list_for_each_entry_rcu(uc, &uprobe->consumers, cons_node, rcu_read_lock_trace_held()) {
 		ret = consumer_filter(uc, mm);
 		if (ret)
 			break;
@@ -1157,7 +1154,7 @@  void uprobe_unregister_sync(void)
 	 * unlucky enough caller can free consumer's memory and cause
 	 * handler_chain() or handle_uretprobe_chain() to do an use-after-free.
 	 */
-	synchronize_srcu(&uprobes_srcu);
+	synchronize_rcu_tasks_trace();
 }
 EXPORT_SYMBOL_GPL(uprobe_unregister_sync);
 
@@ -1241,19 +1238,18 @@  EXPORT_SYMBOL_GPL(uprobe_register);
 int uprobe_apply(struct uprobe *uprobe, struct uprobe_consumer *uc, bool add)
 {
 	struct uprobe_consumer *con;
-	int ret = -ENOENT, srcu_idx;
+	int ret = -ENOENT;
 
 	down_write(&uprobe->register_rwsem);
 
-	srcu_idx = srcu_read_lock(&uprobes_srcu);
-	list_for_each_entry_srcu(con, &uprobe->consumers, cons_node,
-				 srcu_read_lock_held(&uprobes_srcu)) {
+	rcu_read_lock_trace();
+	list_for_each_entry_rcu(con, &uprobe->consumers, cons_node, rcu_read_lock_trace_held()) {
 		if (con == uc) {
 			ret = register_for_each_vma(uprobe, add ? uc : NULL);
 			break;
 		}
 	}
-	srcu_read_unlock(&uprobes_srcu, srcu_idx);
+	rcu_read_unlock_trace();
 
 	up_write(&uprobe->register_rwsem);
 
@@ -2123,8 +2119,7 @@  static void handler_chain(struct uprobe *uprobe, struct pt_regs *regs)
 
 	current->utask->auprobe = &uprobe->arch;
 
-	list_for_each_entry_srcu(uc, &uprobe->consumers, cons_node,
-				 srcu_read_lock_held(&uprobes_srcu)) {
+	list_for_each_entry_rcu(uc, &uprobe->consumers, cons_node, rcu_read_lock_trace_held()) {
 		int rc = 0;
 
 		if (uc->handler) {
@@ -2162,15 +2157,13 @@  handle_uretprobe_chain(struct return_instance *ri, struct pt_regs *regs)
 {
 	struct uprobe *uprobe = ri->uprobe;
 	struct uprobe_consumer *uc;
-	int srcu_idx;
 
-	srcu_idx = srcu_read_lock(&uprobes_srcu);
-	list_for_each_entry_srcu(uc, &uprobe->consumers, cons_node,
-				 srcu_read_lock_held(&uprobes_srcu)) {
+	rcu_read_lock_trace();
+	list_for_each_entry_rcu(uc, &uprobe->consumers, cons_node, rcu_read_lock_trace_held()) {
 		if (uc->ret_handler)
 			uc->ret_handler(uc, ri->func, regs);
 	}
-	srcu_read_unlock(&uprobes_srcu, srcu_idx);
+	rcu_read_unlock_trace();
 }
 
 static struct return_instance *find_next_ret_chain(struct return_instance *ri)
@@ -2255,13 +2248,13 @@  static void handle_swbp(struct pt_regs *regs)
 {
 	struct uprobe *uprobe;
 	unsigned long bp_vaddr;
-	int is_swbp, srcu_idx;
+	int is_swbp;
 
 	bp_vaddr = uprobe_get_swbp_addr(regs);
 	if (bp_vaddr == uprobe_get_trampoline_vaddr())
 		return uprobe_handle_trampoline(regs);
 
-	srcu_idx = srcu_read_lock(&uprobes_srcu);
+	rcu_read_lock_trace();
 
 	uprobe = find_active_uprobe_rcu(bp_vaddr, &is_swbp);
 	if (!uprobe) {
@@ -2319,7 +2312,7 @@  static void handle_swbp(struct pt_regs *regs)
 
 out:
 	/* arch_uprobe_skip_sstep() succeeded, or restart if can't singlestep */
-	srcu_read_unlock(&uprobes_srcu, srcu_idx);
+	rcu_read_unlock_trace();
 }
 
 /*