From patchwork Thu Aug 29 18:37:40 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Andrii Nakryiko
X-Patchwork-Id: 13783568
From: Andrii Nakryiko
To: linux-trace-kernel@vger.kernel.org, peterz@infradead.org, oleg@redhat.com
Cc: rostedt@goodmis.org, mhiramat@kernel.org, bpf@vger.kernel.org,
    linux-kernel@vger.kernel.org, jolsa@kernel.org, paulmck@kernel.org,
    willy@infradead.org, surenb@google.com, akpm@linux-foundation.org,
    linux-mm@kvack.org, Andrii Nakryiko
Subject: [PATCH v4 7/8] uprobes: perform lockless SRCU-protected uprobes_tree lookup
Date: Thu, 29 Aug 2024 11:37:40 -0700
Message-ID: <20240829183741.3331213-8-andrii@kernel.org>
X-Mailer: git-send-email 2.43.5
In-Reply-To: <20240829183741.3331213-1-andrii@kernel.org>
References: <20240829183741.3331213-1-andrii@kernel.org>

Another big bottleneck to scalability is uprobes_treelock, which is
taken in a very hot path in handle_swbp(). Now that uprobes are
SRCU-protected, take advantage of that and make the uprobes_tree
RB-tree lookup lockless.

To make an RCU-protected lockless RB-tree lookup correct, we need to
take into account that such a lookup can return false negatives if
there are parallel RB-tree modifications (rotations) going on. We use a
seqcount to detect whether the RB-tree changed, and if we find nothing
while the RB-tree was modified in between, we just retry. If a uprobe
was found, it is guaranteed to be a correct lookup.

With all the lock-avoiding changes done, we get a pretty decent
improvement in performance and scalability of uprobes with the number
of CPUs, even though we are still nowhere near linear scalability. This
is due to SRCU not really scaling very well with the number of CPUs on
the particular hardware that was used for testing (80-core Intel Xeon
Gold 6138 CPU @ 2.00GHz), but also due to the remaining mmap_lock,
which is currently taken to resolve the interrupt address to
inode+offset and then to a uprobe instance. And, of course, uretprobes
still need similar RCU treatment to avoid the refcount in the hot path,
which will be addressed in follow-up patches.

Nevertheless, the improvement is good. We used the BPF selftest-based
uprobe-nop and uretprobe-nop benchmarks to get the numbers below,
varying the number of CPUs on which uprobes and uretprobes are
triggered.

BASELINE
========
uprobe-nop      ( 1 cpus):    3.032 ± 0.023M/s  (  3.032M/s/cpu)
uprobe-nop      ( 2 cpus):    3.452 ± 0.005M/s  (  1.726M/s/cpu)
uprobe-nop      ( 4 cpus):    3.663 ± 0.005M/s  (  0.916M/s/cpu)
uprobe-nop      ( 8 cpus):    3.718 ± 0.038M/s  (  0.465M/s/cpu)
uprobe-nop      (16 cpus):    3.344 ± 0.008M/s  (  0.209M/s/cpu)
uprobe-nop      (32 cpus):    2.288 ± 0.021M/s  (  0.071M/s/cpu)
uprobe-nop      (64 cpus):    3.205 ± 0.004M/s  (  0.050M/s/cpu)

uretprobe-nop   ( 1 cpus):    1.979 ± 0.005M/s  (  1.979M/s/cpu)
uretprobe-nop   ( 2 cpus):    2.361 ± 0.005M/s  (  1.180M/s/cpu)
uretprobe-nop   ( 4 cpus):    2.309 ± 0.002M/s  (  0.577M/s/cpu)
uretprobe-nop   ( 8 cpus):    2.253 ± 0.001M/s  (  0.282M/s/cpu)
uretprobe-nop   (16 cpus):    2.007 ± 0.000M/s  (  0.125M/s/cpu)
uretprobe-nop   (32 cpus):    1.624 ± 0.003M/s  (  0.051M/s/cpu)
uretprobe-nop   (64 cpus):    2.149 ± 0.001M/s  (  0.034M/s/cpu)

SRCU CHANGES
============
uprobe-nop      ( 1 cpus):    3.276 ± 0.005M/s  (  3.276M/s/cpu)
uprobe-nop      ( 2 cpus):    4.125 ± 0.002M/s  (  2.063M/s/cpu)
uprobe-nop      ( 4 cpus):    7.713 ± 0.002M/s  (  1.928M/s/cpu)
uprobe-nop      ( 8 cpus):    8.097 ± 0.006M/s  (  1.012M/s/cpu)
uprobe-nop      (16 cpus):    6.501 ± 0.056M/s  (  0.406M/s/cpu)
uprobe-nop      (32 cpus):    4.398 ± 0.084M/s  (  0.137M/s/cpu)
uprobe-nop      (64 cpus):    6.452 ± 0.000M/s  (  0.101M/s/cpu)

uretprobe-nop   ( 1 cpus):    2.055 ± 0.001M/s  (  2.055M/s/cpu)
uretprobe-nop   ( 2 cpus):    2.677 ± 0.000M/s  (  1.339M/s/cpu)
uretprobe-nop   ( 4 cpus):    4.561 ± 0.003M/s  (  1.140M/s/cpu)
uretprobe-nop   ( 8 cpus):    5.291 ± 0.002M/s  (  0.661M/s/cpu)
uretprobe-nop   (16 cpus):    5.065 ± 0.019M/s  (  0.317M/s/cpu)
uretprobe-nop   (32 cpus):    3.622 ± 0.003M/s  (  0.113M/s/cpu)
uretprobe-nop   (64 cpus):    3.723 ± 0.002M/s  (  0.058M/s/cpu)

Peak throughput increased from 3.7 mln/s (uprobe triggerings) up to
about 8 mln/s. For uretprobes the gain is a bit more modest, with a
bump from 2.4 mln/s to 5 mln/s.
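
For readers less familiar with the pattern, the retry rule used by the
lockless lookup (trust a hit, re-validate a miss against the seqcount)
can be illustrated with a small userspace sketch. This is not the
kernel code: struct seqcount, the seq_*() helpers, lookup_retry(),
struct demo and demo_find() are hypothetical stand-ins invented for
illustration, and the "tree" is a single integer. The sketch only
exercises the control flow single-threaded; a real concurrent reader
additionally relies on RCU-safe tree traversal, which is not modelled
here.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a seqcount: odd while a writer is active. */
struct seqcount {
	atomic_uint sequence;
};

/* Reader side: wait for an even value, i.e. no writer in progress. */
unsigned int seq_read_begin(struct seqcount *s)
{
	unsigned int seq;

	while ((seq = atomic_load(&s->sequence)) & 1)
		;
	return seq;
}

/* Returns true if any writer ran since seq_read_begin() returned start. */
bool seq_read_retry(struct seqcount *s, unsigned int start)
{
	return atomic_load(&s->sequence) != start;
}

/* Writer side: callers must already be serialized by an outer lock. */
void seq_write_begin(struct seqcount *s)
{
	atomic_fetch_add(&s->sequence, 1);
}

void seq_write_end(struct seqcount *s)
{
	atomic_fetch_add(&s->sequence, 1);
}

typedef const int *(*find_fn)(int key, void *ctx);

/*
 * A hit is returned immediately, because the lockless search can only
 * produce false negatives. A miss is believed only if the sequence did
 * not change while we were searching; otherwise we search again.
 */
const int *lookup_retry(struct seqcount *s, find_fn find, int key,
			void *ctx, unsigned int *attempts)
{
	unsigned int seq;

	do {
		seq = seq_read_begin(s);
		(*attempts)++;
		const int *hit = find(key, ctx);
		if (hit)
			return hit;
	} while (seq_read_retry(s, seq));

	return NULL;
}

/* Toy "tree" holding one value; can fake a racing writer on demand. */
struct demo {
	struct seqcount *seq;
	int value;
	int misses_left;	/* lookups that will race with a writer */
};

const int *demo_find(int key, void *ctx)
{
	struct demo *d = ctx;

	if (d->misses_left > 0) {
		/* Pretend a writer rotated the tree and we missed the node. */
		d->misses_left--;
		seq_write_begin(d->seq);
		seq_write_end(d->seq);
		return NULL;
	}
	return key == d->value ? &d->value : NULL;
}
```

With misses_left set to 1, a lookup of the stored value misses once,
sees that the sequence changed, and succeeds on the second pass. A
lookup of an absent key with no writer activity returns NULL after a
single pass.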

Suggested-by: Peter Zijlstra (Intel)
Signed-off-by: Andrii Nakryiko
---
 kernel/events/uprobes.c | 30 ++++++++++++++++++++++++------
 1 file changed, 24 insertions(+), 6 deletions(-)

diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index e9b755ddf960..8a464cf38127 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -40,6 +40,7 @@ static struct rb_root uprobes_tree = RB_ROOT;
 #define no_uprobe_events()	RB_EMPTY_ROOT(&uprobes_tree)
 
 static DEFINE_RWLOCK(uprobes_treelock);	/* serialize rbtree access */
+static seqcount_rwlock_t uprobes_seqcount = SEQCNT_RWLOCK_ZERO(uprobes_seqcount, &uprobes_treelock);
 
 DEFINE_STATIC_SRCU(uprobes_srcu);
 
@@ -634,8 +635,11 @@ static void put_uprobe(struct uprobe *uprobe)
 
 	write_lock(&uprobes_treelock);
 
-	if (uprobe_is_active(uprobe))
+	if (uprobe_is_active(uprobe)) {
+		write_seqcount_begin(&uprobes_seqcount);
 		rb_erase(&uprobe->rb_node, &uprobes_tree);
+		write_seqcount_end(&uprobes_seqcount);
+	}
 
 	write_unlock(&uprobes_treelock);
 
@@ -701,14 +705,26 @@ static struct uprobe *find_uprobe_rcu(struct inode *inode, loff_t offset)
 		.offset = offset,
 	};
 	struct rb_node *node;
+	unsigned int seq;
 
 	lockdep_assert(srcu_read_lock_held(&uprobes_srcu));
 
-	read_lock(&uprobes_treelock);
-	node = rb_find(&key, &uprobes_tree, __uprobe_cmp_key);
-	read_unlock(&uprobes_treelock);
+	do {
+		seq = read_seqcount_begin(&uprobes_seqcount);
+		node = rb_find_rcu(&key, &uprobes_tree, __uprobe_cmp_key);
+		/*
+		 * Lockless RB-tree lookups can result only in false negatives.
+		 * If the element is found, it is correct and can be returned
+		 * under RCU protection. If we find nothing, we need to
+		 * validate that seqcount didn't change. If it did, we have to
+		 * try again as we might have missed the element (false
+		 * negative). If seqcount is unchanged, search truly failed.
+		 */
+		if (node)
+			return __node_2_uprobe(node);
+	} while (read_seqcount_retry(&uprobes_seqcount, seq));
 
-	return node ? __node_2_uprobe(node) : NULL;
+	return NULL;
 }
 
 /*
@@ -730,7 +746,7 @@ static struct uprobe *__insert_uprobe(struct uprobe *uprobe)
 {
 	struct rb_node *node;
 again:
-	node = rb_find_add(&uprobe->rb_node, &uprobes_tree, __uprobe_cmp);
+	node = rb_find_add_rcu(&uprobe->rb_node, &uprobes_tree, __uprobe_cmp);
 	if (node) {
 		struct uprobe *u = __node_2_uprobe(node);
 
@@ -755,7 +771,9 @@ static struct uprobe *insert_uprobe(struct uprobe *uprobe)
 	struct uprobe *u;
 
 	write_lock(&uprobes_treelock);
+	write_seqcount_begin(&uprobes_seqcount);
 	u = __insert_uprobe(uprobe);
+	write_seqcount_end(&uprobes_seqcount);
 	write_unlock(&uprobes_treelock);
 
 	return u;