From patchwork Thu Aug 29 18:37:37 2024
From: Andrii Nakryiko <andrii@kernel.org>
To: linux-trace-kernel@vger.kernel.org, peterz@infradead.org, oleg@redhat.com
Cc: rostedt@goodmis.org, mhiramat@kernel.org, bpf@vger.kernel.org,
    linux-kernel@vger.kernel.org, jolsa@kernel.org, paulmck@kernel.org,
    willy@infradead.org, surenb@google.com, akpm@linux-foundation.org,
    linux-mm@kvack.org, Andrii Nakryiko
Subject: [PATCH v4 4/8] uprobes: traverse uprobe's consumer list locklessly under SRCU protection
Date: Thu, 29 Aug 2024 11:37:37 -0700
Message-ID: <20240829183741.3331213-5-andrii@kernel.org>
X-Mailer: git-send-email 2.43.5
In-Reply-To: <20240829183741.3331213-1-andrii@kernel.org>
References: <20240829183741.3331213-1-andrii@kernel.org>
MIME-Version: 1.0

uprobe->register_rwsem is one of a few big bottlenecks to uprobe
scalability, so we need to get rid of it to improve uprobe performance
and multi-CPU scalability.

First, we turn uprobe's consumer list into a typical doubly-linked list
and use the existing RCU-aware helpers for traversing such lists, as
well as for adding and removing elements from them.

For entry uprobes, SRCU protection is already active from before the
uprobe lookup. For uretprobes we keep a refcount, guaranteeing that the
uprobe won't go away from under us, but we add SRCU protection around
the consumer list traversal as well.

Lastly, to keep handler_chain()'s UPROBE_HANDLER_REMOVE handling
simple, we remember whether any removal was requested during handler
calls, and then double-check that decision under register_rwsem using
the consumers' filter callbacks. Handler removal is very rare, so this
extra lock won't hurt performance overall, and it also avoids the need
for any extra protection (e.g., seqcount locks).
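To make the resulting pattern concrete, here is a minimal sketch of the
lockless reader side (illustrative only, simplified from the diff below;
it assumes the uprobes_srcu SRCU domain and the cons_node list linkage
introduced by this series):

	int srcu_idx;
	struct uprobe_consumer *uc;

	/* readers: no register_rwsem, only an SRCU read-side critical section */
	srcu_idx = srcu_read_lock(&uprobes_srcu);
	list_for_each_entry_srcu(uc, &uprobe->consumers, cons_node,
				 srcu_read_lock_held(&uprobes_srcu)) {
		if (uc->handler)
			uc->handler(uc, regs);
	}
	srcu_read_unlock(&uprobes_srcu, srcu_idx);

	/*
	 * Writers (consumer_add()/consumer_del()) still serialize on
	 * uprobe->consumer_rwsem and use list_add_rcu()/list_del_rcu();
	 * uprobe_unregister() then waits for an SRCU grace period via
	 * synchronize_srcu(&uprobes_srcu) before the caller may free
	 * the uprobe_consumer.
	 */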
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 include/linux/uprobes.h |   2 +-
 kernel/events/uprobes.c | 104 +++++++++++++++++++++++-----------------
 2 files changed, 62 insertions(+), 44 deletions(-)

diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h
index 9cf0dce62e4c..29c935b0d504 100644
--- a/include/linux/uprobes.h
+++ b/include/linux/uprobes.h
@@ -35,7 +35,7 @@ struct uprobe_consumer {
 				struct pt_regs *regs);
 	bool (*filter)(struct uprobe_consumer *self, struct mm_struct *mm);
 
-	struct uprobe_consumer *next;
+	struct list_head cons_node;
 };
 
 #ifdef CONFIG_UPROBES
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 8bdcdc6901b2..97e58d160647 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -59,7 +59,7 @@ struct uprobe {
 	struct rw_semaphore	register_rwsem;
 	struct rw_semaphore	consumer_rwsem;
 	struct list_head	pending_list;
-	struct uprobe_consumer	*consumers;
+	struct list_head	consumers;
 	struct inode		*inode;		/* Also hold a ref to inode */
 	struct rcu_head		rcu;
 	loff_t			offset;
@@ -783,6 +783,7 @@ static struct uprobe *alloc_uprobe(struct inode *inode, loff_t offset,
 	uprobe->inode = inode;
 	uprobe->offset = offset;
 	uprobe->ref_ctr_offset = ref_ctr_offset;
+	INIT_LIST_HEAD(&uprobe->consumers);
 	init_rwsem(&uprobe->register_rwsem);
 	init_rwsem(&uprobe->consumer_rwsem);
 	RB_CLEAR_NODE(&uprobe->rb_node);
@@ -808,32 +809,19 @@ static struct uprobe *alloc_uprobe(struct inode *inode, loff_t offset,
 static void consumer_add(struct uprobe *uprobe, struct uprobe_consumer *uc)
 {
 	down_write(&uprobe->consumer_rwsem);
-	uc->next = uprobe->consumers;
-	uprobe->consumers = uc;
+	list_add_rcu(&uc->cons_node, &uprobe->consumers);
 	up_write(&uprobe->consumer_rwsem);
 }
 
 /*
  * For uprobe @uprobe, delete the consumer @uc.
- * Return true if the @uc is deleted successfully
- * or return false.
+ * Should never be called with consumer that's not part of @uprobe->consumers.
  */
-static bool consumer_del(struct uprobe *uprobe, struct uprobe_consumer *uc)
+static void consumer_del(struct uprobe *uprobe, struct uprobe_consumer *uc)
 {
-	struct uprobe_consumer **con;
-	bool ret = false;
-
 	down_write(&uprobe->consumer_rwsem);
-	for (con = &uprobe->consumers; *con; con = &(*con)->next) {
-		if (*con == uc) {
-			*con = uc->next;
-			ret = true;
-			break;
-		}
-	}
+	list_del_rcu(&uc->cons_node);
 	up_write(&uprobe->consumer_rwsem);
-
-	return ret;
 }
 
 static int __copy_insn(struct address_space *mapping, struct file *filp,
@@ -929,7 +917,8 @@ static bool filter_chain(struct uprobe *uprobe, struct mm_struct *mm)
 	bool ret = false;
 
 	down_read(&uprobe->consumer_rwsem);
-	for (uc = uprobe->consumers; uc; uc = uc->next) {
+	list_for_each_entry_srcu(uc, &uprobe->consumers, cons_node,
+				 srcu_read_lock_held(&uprobes_srcu)) {
 		ret = consumer_filter(uc, mm);
 		if (ret)
 			break;
@@ -1125,18 +1114,29 @@ void uprobe_unregister(struct uprobe *uprobe, struct uprobe_consumer *uc)
 	int err;
 
 	down_write(&uprobe->register_rwsem);
-	if (WARN_ON(!consumer_del(uprobe, uc))) {
-		err = -ENOENT;
-	} else {
-		err = register_for_each_vma(uprobe, NULL);
-		/* TODO : cant unregister? schedule a worker thread */
-		if (unlikely(err))
-			uprobe_warn(current, "unregister, leaking uprobe");
-	}
+	consumer_del(uprobe, uc);
+	err = register_for_each_vma(uprobe, NULL);
 	up_write(&uprobe->register_rwsem);
 
-	if (!err)
-		put_uprobe(uprobe);
+	/* TODO : cant unregister? schedule a worker thread */
+	if (unlikely(err)) {
+		uprobe_warn(current, "unregister, leaking uprobe");
+		goto out_sync;
+	}
+
+	put_uprobe(uprobe);
+
+out_sync:
+	/*
+	 * Now that handler_chain() and handle_uretprobe_chain() iterate over
+	 * uprobe->consumers list under RCU protection without holding
+	 * uprobe->register_rwsem, we need to wait for RCU grace period to
+	 * make sure that we can't call into just unregistered
+	 * uprobe_consumer's callbacks anymore. If we don't do that, fast and
+	 * unlucky enough caller can free consumer's memory and cause
+	 * handler_chain() or handle_uretprobe_chain() to do an use-after-free.
+	 */
+	synchronize_srcu(&uprobes_srcu);
 }
 EXPORT_SYMBOL_GPL(uprobe_unregister);
 
@@ -1214,13 +1214,20 @@ EXPORT_SYMBOL_GPL(uprobe_register);
 int uprobe_apply(struct uprobe *uprobe, struct uprobe_consumer *uc, bool add)
 {
 	struct uprobe_consumer *con;
-	int ret = -ENOENT;
+	int ret = -ENOENT, srcu_idx;
 
 	down_write(&uprobe->register_rwsem);
-	for (con = uprobe->consumers; con && con != uc ; con = con->next)
-		;
-	if (con)
-		ret = register_for_each_vma(uprobe, add ? uc : NULL);
+
+	srcu_idx = srcu_read_lock(&uprobes_srcu);
+	list_for_each_entry_srcu(con, &uprobe->consumers, cons_node,
+				 srcu_read_lock_held(&uprobes_srcu)) {
+		if (con == uc) {
+			ret = register_for_each_vma(uprobe, add ? uc : NULL);
+			break;
+		}
+	}
+	srcu_read_unlock(&uprobes_srcu, srcu_idx);
+
 	up_write(&uprobe->register_rwsem);
 
 	return ret;
@@ -2085,10 +2092,12 @@ static void handler_chain(struct uprobe *uprobe, struct pt_regs *regs)
 	struct uprobe_consumer *uc;
 	int remove = UPROBE_HANDLER_REMOVE;
 	bool need_prep = false; /* prepare return uprobe, when needed */
+	bool has_consumers = false;
 
-	down_read(&uprobe->register_rwsem);
 	current->utask->auprobe = &uprobe->arch;
-	for (uc = uprobe->consumers; uc; uc = uc->next) {
+
+	list_for_each_entry_srcu(uc, &uprobe->consumers, cons_node,
+				 srcu_read_lock_held(&uprobes_srcu)) {
 		int rc = 0;
 
 		if (uc->handler) {
@@ -2101,17 +2110,24 @@ static void handler_chain(struct uprobe *uprobe, struct pt_regs *regs)
 			need_prep = true;
 
 		remove &= rc;
+		has_consumers = true;
 	}
 	current->utask->auprobe = NULL;
 
 	if (need_prep && !remove)
 		prepare_uretprobe(uprobe, regs); /* put bp at return */
 
-	if (remove && uprobe->consumers) {
-		WARN_ON(!uprobe_is_active(uprobe));
-		unapply_uprobe(uprobe, current->mm);
+	if (remove && has_consumers) {
+		down_read(&uprobe->register_rwsem);
+
+		/* re-check that removal is still required, this time under lock */
+		if (!filter_chain(uprobe, current->mm)) {
+			WARN_ON(!uprobe_is_active(uprobe));
+			unapply_uprobe(uprobe, current->mm);
+		}
+
+		up_read(&uprobe->register_rwsem);
 	}
-	up_read(&uprobe->register_rwsem);
 }
 
 static void
@@ -2119,13 +2135,15 @@ handle_uretprobe_chain(struct return_instance *ri, struct pt_regs *regs)
 {
 	struct uprobe *uprobe = ri->uprobe;
 	struct uprobe_consumer *uc;
+	int srcu_idx;
 
-	down_read(&uprobe->register_rwsem);
-	for (uc = uprobe->consumers; uc; uc = uc->next) {
+	srcu_idx = srcu_read_lock(&uprobes_srcu);
+	list_for_each_entry_srcu(uc, &uprobe->consumers, cons_node,
+				 srcu_read_lock_held(&uprobes_srcu)) {
 		if (uc->ret_handler)
 			uc->ret_handler(uc, ri->func, regs);
 	}
-	up_read(&uprobe->register_rwsem);
+	srcu_read_unlock(&uprobes_srcu, srcu_idx);
 }
 
 static struct return_instance *find_next_ret_chain(struct return_instance *ri)