From patchwork Mon Oct 28 01:08:15 2024
X-Patchwork-Submitter: Andrii Nakryiko
X-Patchwork-Id: 13852773
From: Andrii Nakryiko
To: linux-trace-kernel@vger.kernel.org, linux-mm@kvack.org, akpm@linux-foundation.org, peterz@infradead.org
Cc: oleg@redhat.com, rostedt@goodmis.org, mhiramat@kernel.org, bpf@vger.kernel.org, linux-kernel@vger.kernel.org, jolsa@kernel.org, paulmck@kernel.org, willy@infradead.org, surenb@google.com, mjguzik@gmail.com, brauner@kernel.org, jannh@google.com, mhocko@kernel.org, vbabka@suse.cz, shakeel.butt@linux.dev, hannes@cmpxchg.org, Liam.Howlett@oracle.com, lorenzo.stoakes@oracle.com, david@redhat.com, arnd@arndb.de, richard.weiyang@gmail.com, zhangpeng.00@bytedance.com, linmiaohe@huawei.com, viro@zeniv.linux.org.uk, hca@linux.ibm.com, Andrii Nakryiko
Subject: [PATCH v4 tip/perf/core 1/4] mm: Convert mm_lock_seq to a proper seqcount
Date: Sun, 27 Oct 2024 18:08:15 -0700
Message-ID: <20241028010818.2487581-2-andrii@kernel.org>
In-Reply-To: <20241028010818.2487581-1-andrii@kernel.org>
References: <20241028010818.2487581-1-andrii@kernel.org>

From: Suren Baghdasaryan

Convert mm_lock_seq to be seqcount_t and change all mmap_write_lock
variants to increment it, in line with the usual seqcount usage pattern.
This lets us check whether the mmap_lock is write-locked by checking the
mm_lock_seq.sequence counter (odd=locked, even=unlocked). This will be
used when implementing mmap_lock speculation functions.

As a result, vm_lock_seq is also changed to be unsigned to match the
type of mm_lock_seq.sequence.
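As an illustration of the odd=locked/even=unlocked convention above (not kernel
code and not part of the patch; the real implementation uses the seqcount_t API
rather than a bare integer, and all names below are made up), a tiny userspace
C model:

/* Illustrative userspace model of the odd=locked / even=unlocked scheme. */
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

static unsigned int mm_lock_seq;	/* models mm->mm_lock_seq.sequence, starts at 0 */

static void model_mmap_write_lock(void)   { mm_lock_seq++; }	/* becomes odd  */
static void model_mmap_write_unlock(void) { mm_lock_seq++; }	/* becomes even */

/* The check this patch enables: an odd sequence means mmap_lock is write-locked. */
static bool model_is_write_locked(void)
{
	return (mm_lock_seq & 1) == 1;
}

int main(void)
{
	assert(!model_is_write_locked());	/* seq == 0: unlocked */
	model_mmap_write_lock();
	assert(model_is_write_locked());	/* seq == 1: locked   */
	model_mmap_write_unlock();
	assert(!model_is_write_locked());	/* seq == 2: unlocked */
	printf("final sequence: %u\n", mm_lock_seq);
	return 0;
}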
Suggested-by: Peter Zijlstra
Signed-off-by: Suren Baghdasaryan
Signed-off-by: Andrii Nakryiko
Acked-by: Vlastimil Babka
---
 include/linux/mm.h               | 12 +++----
 include/linux/mm_types.h         |  7 ++--
 include/linux/mmap_lock.h        | 58 +++++++++++++++++++++-----------
 kernel/fork.c                    |  5 +--
 mm/init-mm.c                     |  2 +-
 tools/testing/vma/vma.c          |  4 +--
 tools/testing/vma/vma_internal.h |  4 +--
 7 files changed, 56 insertions(+), 36 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index ecf63d2b0582..94b537088142 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -698,7 +698,7 @@ static inline bool vma_start_read(struct vm_area_struct *vma)
 	 * we don't rely on for anything - the mm_lock_seq read against which we
 	 * need ordering is below.
 	 */
-	if (READ_ONCE(vma->vm_lock_seq) == READ_ONCE(vma->vm_mm->mm_lock_seq))
+	if (READ_ONCE(vma->vm_lock_seq) == READ_ONCE(vma->vm_mm->mm_lock_seq.sequence))
 		return false;
 
 	if (unlikely(down_read_trylock(&vma->vm_lock->lock) == 0))
@@ -715,7 +715,7 @@ static inline bool vma_start_read(struct vm_area_struct *vma)
 	 * after it has been unlocked.
 	 * This pairs with RELEASE semantics in vma_end_write_all().
 	 */
-	if (unlikely(vma->vm_lock_seq == smp_load_acquire(&vma->vm_mm->mm_lock_seq))) {
+	if (unlikely(vma->vm_lock_seq == raw_read_seqcount(&vma->vm_mm->mm_lock_seq))) {
 		up_read(&vma->vm_lock->lock);
 		return false;
 	}
@@ -730,7 +730,7 @@ static inline void vma_end_read(struct vm_area_struct *vma)
 }
 
 /* WARNING! Can only be used if mmap_lock is expected to be write-locked */
-static bool __is_vma_write_locked(struct vm_area_struct *vma, int *mm_lock_seq)
+static bool __is_vma_write_locked(struct vm_area_struct *vma, unsigned int *mm_lock_seq)
 {
 	mmap_assert_write_locked(vma->vm_mm);
 
@@ -738,7 +738,7 @@ static bool __is_vma_write_locked(struct vm_area_struct *vma, int *mm_lock_seq)
 	 * current task is holding mmap_write_lock, both vma->vm_lock_seq and
 	 * mm->mm_lock_seq can't be concurrently modified.
 	 */
-	*mm_lock_seq = vma->vm_mm->mm_lock_seq;
+	*mm_lock_seq = vma->vm_mm->mm_lock_seq.sequence;
 	return (vma->vm_lock_seq == *mm_lock_seq);
 }
 
@@ -749,7 +749,7 @@ static bool __is_vma_write_locked(struct vm_area_struct *vma, int *mm_lock_seq)
  */
 static inline void vma_start_write(struct vm_area_struct *vma)
 {
-	int mm_lock_seq;
+	unsigned int mm_lock_seq;
 
 	if (__is_vma_write_locked(vma, &mm_lock_seq))
 		return;
@@ -767,7 +767,7 @@ static inline void vma_start_write(struct vm_area_struct *vma)
 
 static inline void vma_assert_write_locked(struct vm_area_struct *vma)
 {
-	int mm_lock_seq;
+	unsigned int mm_lock_seq;
 
 	VM_BUG_ON_VMA(!__is_vma_write_locked(vma, &mm_lock_seq), vma);
 }
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 6e3bdf8e38bc..76e0cdc0462b 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -715,7 +715,7 @@ struct vm_area_struct {
 	 * counter reuse can only lead to occasional unnecessary use of the
 	 * slowpath.
 	 */
-	int vm_lock_seq;
+	unsigned int vm_lock_seq;
 	/* Unstable RCU readers are allowed to read this. */
 	struct vma_lock *vm_lock;
 #endif
@@ -887,6 +887,9 @@ struct mm_struct {
 		 * Roughly speaking, incrementing the sequence number is
 		 * equivalent to releasing locks on VMAs; reading the sequence
 		 * number can be part of taking a read lock on a VMA.
+		 * Incremented every time mmap_lock is write-locked/unlocked.
+		 * Initialized to 0, therefore odd values indicate mmap_lock
+		 * is write-locked and even values that it's released.
 		 *
 		 * Can be modified under write mmap_lock using RELEASE
 		 * semantics.
@@ -895,7 +898,7 @@ struct mm_struct {
 		 * Can be read with ACQUIRE semantics if not holding write
 		 * mmap_lock.
 		 */
-		int mm_lock_seq;
+		seqcount_t mm_lock_seq;
 #endif
 
diff --git a/include/linux/mmap_lock.h b/include/linux/mmap_lock.h
index de9dc20b01ba..6b3272686860 100644
--- a/include/linux/mmap_lock.h
+++ b/include/linux/mmap_lock.h
@@ -71,39 +71,38 @@ static inline void mmap_assert_write_locked(const struct mm_struct *mm)
 }
 
 #ifdef CONFIG_PER_VMA_LOCK
-/*
- * Drop all currently-held per-VMA locks.
- * This is called from the mmap_lock implementation directly before releasing
- * a write-locked mmap_lock (or downgrading it to read-locked).
- * This should normally NOT be called manually from other places.
- * If you want to call this manually anyway, keep in mind that this will release
- * *all* VMA write locks, including ones from further up the stack.
- */
-static inline void vma_end_write_all(struct mm_struct *mm)
+static inline void mm_lock_seqcount_init(struct mm_struct *mm)
 {
-	mmap_assert_write_locked(mm);
-	/*
-	 * Nobody can concurrently modify mm->mm_lock_seq due to exclusive
-	 * mmap_lock being held.
-	 * We need RELEASE semantics here to ensure that preceding stores into
-	 * the VMA take effect before we unlock it with this store.
-	 * Pairs with ACQUIRE semantics in vma_start_read().
-	 */
-	smp_store_release(&mm->mm_lock_seq, mm->mm_lock_seq + 1);
+	seqcount_init(&mm->mm_lock_seq);
+}
+
+static inline void mm_lock_seqcount_begin(struct mm_struct *mm)
+{
+	do_raw_write_seqcount_begin(&mm->mm_lock_seq);
+}
+
+static inline void mm_lock_seqcount_end(struct mm_struct *mm)
+{
+	do_raw_write_seqcount_end(&mm->mm_lock_seq);
 }
+
 #else
-static inline void vma_end_write_all(struct mm_struct *mm) {}
+static inline void mm_lock_seqcount_init(struct mm_struct *mm) {}
+static inline void mm_lock_seqcount_begin(struct mm_struct *mm) {}
+static inline void mm_lock_seqcount_end(struct mm_struct *mm) {}
 #endif
 
 static inline void mmap_init_lock(struct mm_struct *mm)
 {
 	init_rwsem(&mm->mmap_lock);
+	mm_lock_seqcount_init(mm);
 }
 
 static inline void mmap_write_lock(struct mm_struct *mm)
 {
 	__mmap_lock_trace_start_locking(mm, true);
 	down_write(&mm->mmap_lock);
+	mm_lock_seqcount_begin(mm);
 	__mmap_lock_trace_acquire_returned(mm, true, true);
 }
 
@@ -111,6 +110,7 @@ static inline void mmap_write_lock_nested(struct mm_struct *mm, int subclass)
 {
 	__mmap_lock_trace_start_locking(mm, true);
 	down_write_nested(&mm->mmap_lock, subclass);
+	mm_lock_seqcount_begin(mm);
 	__mmap_lock_trace_acquire_returned(mm, true, true);
 }
 
@@ -120,10 +120,30 @@ static inline int mmap_write_lock_killable(struct mm_struct *mm)
 
 	__mmap_lock_trace_start_locking(mm, true);
 	ret = down_write_killable(&mm->mmap_lock);
+	if (!ret)
+		mm_lock_seqcount_begin(mm);
 	__mmap_lock_trace_acquire_returned(mm, true, ret == 0);
 	return ret;
 }
 
+/*
+ * Drop all currently-held per-VMA locks.
+ * This is called from the mmap_lock implementation directly before releasing
+ * a write-locked mmap_lock (or downgrading it to read-locked).
+ * This should normally NOT be called manually from other places.
+ * If you want to call this manually anyway, keep in mind that this will release
+ * *all* VMA write locks, including ones from further up the stack.
+ */
+static inline void vma_end_write_all(struct mm_struct *mm)
+{
+	mmap_assert_write_locked(mm);
+	/*
+	 * Nobody can concurrently modify mm->mm_lock_seq due to exclusive
+	 * mmap_lock being held.
+	 */
+	mm_lock_seqcount_end(mm);
+}
+
 static inline void mmap_write_unlock(struct mm_struct *mm)
 {
 	__mmap_lock_trace_released(mm, true);
diff --git a/kernel/fork.c b/kernel/fork.c
index 89ceb4a68af2..55c4088543dc 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -448,7 +448,7 @@ static bool vma_lock_alloc(struct vm_area_struct *vma)
 		return false;
 
 	init_rwsem(&vma->vm_lock->lock);
-	vma->vm_lock_seq = -1;
+	vma->vm_lock_seq = UINT_MAX;
 
 	return true;
 }
@@ -1261,9 +1261,6 @@ static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p,
 	seqcount_init(&mm->write_protect_seq);
 	mmap_init_lock(mm);
 	INIT_LIST_HEAD(&mm->mmlist);
-#ifdef CONFIG_PER_VMA_LOCK
-	mm->mm_lock_seq = 0;
-#endif
 	mm_pgtables_bytes_init(mm);
 	mm->map_count = 0;
 	mm->locked_vm = 0;
diff --git a/mm/init-mm.c b/mm/init-mm.c
index 24c809379274..6af3ad675930 100644
--- a/mm/init-mm.c
+++ b/mm/init-mm.c
@@ -40,7 +40,7 @@ struct mm_struct init_mm = {
 	.arg_lock	= __SPIN_LOCK_UNLOCKED(init_mm.arg_lock),
 	.mmlist		= LIST_HEAD_INIT(init_mm.mmlist),
 #ifdef CONFIG_PER_VMA_LOCK
-	.mm_lock_seq	= 0,
+	.mm_lock_seq	= SEQCNT_ZERO(init_mm.mm_lock_seq),
 #endif
 	.user_ns	= &init_user_ns,
 	.cpu_bitmap	= CPU_BITS_NONE,
diff --git a/tools/testing/vma/vma.c b/tools/testing/vma/vma.c
index c53f220eb6cc..bcdf831dfe3e 100644
--- a/tools/testing/vma/vma.c
+++ b/tools/testing/vma/vma.c
@@ -87,7 +87,7 @@ static struct vm_area_struct *alloc_and_link_vma(struct mm_struct *mm,
 	 * begun. Linking to the tree will have caused this to be incremented,
 	 * which means we will get a false positive otherwise.
 	 */
-	vma->vm_lock_seq = -1;
+	vma->vm_lock_seq = UINT_MAX;
 
 	return vma;
 }
@@ -212,7 +212,7 @@ static bool vma_write_started(struct vm_area_struct *vma)
 	int seq = vma->vm_lock_seq;
 
 	/* We reset after each check. */
-	vma->vm_lock_seq = -1;
+	vma->vm_lock_seq = UINT_MAX;
 
 	/* The vma_start_write() stub simply increments this value. */
 	return seq > -1;
diff --git a/tools/testing/vma/vma_internal.h b/tools/testing/vma/vma_internal.h
index c5b9da034511..4007ec580f85 100644
--- a/tools/testing/vma/vma_internal.h
+++ b/tools/testing/vma/vma_internal.h
@@ -231,7 +231,7 @@ struct vm_area_struct {
 	 * counter reuse can only lead to occasional unnecessary use of the
 	 * slowpath.
 	 */
-	int vm_lock_seq;
+	unsigned int vm_lock_seq;
 	struct vma_lock *vm_lock;
 #endif
 
@@ -406,7 +406,7 @@ static inline bool vma_lock_alloc(struct vm_area_struct *vma)
 		return false;
 
 	init_rwsem(&vma->vm_lock->lock);
-	vma->vm_lock_seq = -1;
+	vma->vm_lock_seq = UINT_MAX;
 
 	return true;
 }

From patchwork Mon Oct 28 01:08:16 2024
X-Patchwork-Submitter: Andrii Nakryiko
X-Patchwork-Id: 13852774
From: Andrii Nakryiko
To: linux-trace-kernel@vger.kernel.org, linux-mm@kvack.org, akpm@linux-foundation.org, peterz@infradead.org
Cc: oleg@redhat.com, rostedt@goodmis.org, mhiramat@kernel.org, bpf@vger.kernel.org, linux-kernel@vger.kernel.org, jolsa@kernel.org, paulmck@kernel.org, willy@infradead.org, surenb@google.com, mjguzik@gmail.com, brauner@kernel.org, jannh@google.com, mhocko@kernel.org, vbabka@suse.cz, shakeel.butt@linux.dev, hannes@cmpxchg.org, Liam.Howlett@oracle.com, lorenzo.stoakes@oracle.com, david@redhat.com, arnd@arndb.de, richard.weiyang@gmail.com, zhangpeng.00@bytedance.com, linmiaohe@huawei.com, viro@zeniv.linux.org.uk, hca@linux.ibm.com, Andrii Nakryiko
Subject: [PATCH v4 tip/perf/core 2/4] mm: Introduce mmap_lock_speculation_{begin|end}
Date: Sun, 27 Oct 2024 18:08:16 -0700
Message-ID: <20241028010818.2487581-3-andrii@kernel.org>
In-Reply-To: <20241028010818.2487581-1-andrii@kernel.org>
References: <20241028010818.2487581-1-andrii@kernel.org>

From: Suren Baghdasaryan

Add helper functions to speculatively perform operations without
read-locking mmap_lock, expecting that mmap_lock will not be write-locked
and that mm is not modified from under us.
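The intended caller-side pattern for these helpers is begin -> speculative
reads -> end, falling back to the locked path if either step fails. As an
illustration only (a single-threaded userspace model with made-up names and
no memory barriers; the real helpers use raw_read_seqcount() and
do_read_seqcount_retry(), shown in the diff below):

/* Illustrative userspace model of the begin/end speculation protocol. */
#include <stdbool.h>
#include <stdio.h>

static unsigned int mm_lock_seq;	/* models mm->mm_lock_seq.sequence */

/* Models mmap_lock_speculation_begin(): snapshot and require "not write-locked". */
static bool model_speculation_begin(unsigned int *seq)
{
	*seq = mm_lock_seq;
	return (*seq & 1) == 0;		/* even => no writer holds mmap_lock */
}

/* Models mmap_lock_speculation_end(): succeed only if the sequence didn't move. */
static bool model_speculation_end(unsigned int seq)
{
	return mm_lock_seq == seq;
}

int main(void)
{
	unsigned int seq;

	if (model_speculation_begin(&seq)) {
		/* ... speculative, lock-free reads of mm state would go here ... */
		if (model_speculation_end(seq))
			printf("speculation valid, use the result\n");
		else
			printf("mm changed underneath us, fall back to the locked path\n");
	} else {
		printf("writer active, fall back to the locked path\n");
	}
	return 0;
}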
Suggested-by: Peter Zijlstra
Signed-off-by: Suren Baghdasaryan
Signed-off-by: Andrii Nakryiko
Acked-by: Vlastimil Babka
---
 include/linux/mmap_lock.h | 29 +++++++++++++++++++++++++++--
 1 file changed, 27 insertions(+), 2 deletions(-)

diff --git a/include/linux/mmap_lock.h b/include/linux/mmap_lock.h
index 6b3272686860..58dde2e35f7e 100644
--- a/include/linux/mmap_lock.h
+++ b/include/linux/mmap_lock.h
@@ -71,6 +71,7 @@ static inline void mmap_assert_write_locked(const struct mm_struct *mm)
 }
 
 #ifdef CONFIG_PER_VMA_LOCK
+
 static inline void mm_lock_seqcount_init(struct mm_struct *mm)
 {
 	seqcount_init(&mm->mm_lock_seq);
@@ -86,11 +87,35 @@ static inline void mm_lock_seqcount_end(struct mm_struct *mm)
 	do_raw_write_seqcount_end(&mm->mm_lock_seq);
 }
 
-#else
+static inline bool mmap_lock_speculation_begin(struct mm_struct *mm, unsigned int *seq)
+{
+	*seq = raw_read_seqcount(&mm->mm_lock_seq);
+	/* Allow speculation if mmap_lock is not write-locked */
+	return (*seq & 1) == 0;
+}
+
+static inline bool mmap_lock_speculation_end(struct mm_struct *mm, unsigned int seq)
+{
+	return !do_read_seqcount_retry(&mm->mm_lock_seq, seq);
+}
+
+#else /* CONFIG_PER_VMA_LOCK */
+
 static inline void mm_lock_seqcount_init(struct mm_struct *mm) {}
 static inline void mm_lock_seqcount_begin(struct mm_struct *mm) {}
 static inline void mm_lock_seqcount_end(struct mm_struct *mm) {}
-#endif
+
+static inline bool mmap_lock_speculation_begin(struct mm_struct *mm, unsigned int *seq)
+{
+	return false;
+}
+
+static inline bool mmap_lock_speculation_end(struct mm_struct *mm, unsigned int seq)
+{
+	return false;
+}
+
+#endif /* CONFIG_PER_VMA_LOCK */
 
 static inline void mmap_init_lock(struct mm_struct *mm)
 {

From patchwork Mon Oct 28 01:08:17 2024
X-Patchwork-Submitter: Andrii Nakryiko
X-Patchwork-Id: 13852775
From: Andrii Nakryiko
To: linux-trace-kernel@vger.kernel.org, linux-mm@kvack.org, akpm@linux-foundation.org, peterz@infradead.org
Cc: oleg@redhat.com, rostedt@goodmis.org, mhiramat@kernel.org, bpf@vger.kernel.org, linux-kernel@vger.kernel.org, jolsa@kernel.org, paulmck@kernel.org, willy@infradead.org, surenb@google.com, mjguzik@gmail.com, brauner@kernel.org, jannh@google.com, mhocko@kernel.org, vbabka@suse.cz, shakeel.butt@linux.dev, hannes@cmpxchg.org, Liam.Howlett@oracle.com, lorenzo.stoakes@oracle.com, david@redhat.com, arnd@arndb.de, richard.weiyang@gmail.com, zhangpeng.00@bytedance.com, linmiaohe@huawei.com, viro@zeniv.linux.org.uk, hca@linux.ibm.com, Andrii Nakryiko
Subject: [PATCH v4 tip/perf/core 3/4] uprobes: simplify find_active_uprobe_rcu() VMA checks
Date: Sun, 27 Oct 2024 18:08:17 -0700
Message-ID: <20241028010818.2487581-4-andrii@kernel.org>
In-Reply-To: <20241028010818.2487581-1-andrii@kernel.org>
References: <20241028010818.2487581-1-andrii@kernel.org>
At the point where find_active_uprobe_rcu() is used we know that the VMA
in question has triggered a software breakpoint, so we don't need to
validate vma->vm_flags. Keep only the vma->vm_file NULL check.

Acked-by: Oleg Nesterov
Suggested-by: Oleg Nesterov
Signed-off-by: Andrii Nakryiko
Reviewed-by: Masami Hiramatsu (Google)
---
 kernel/events/uprobes.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 4ef4b51776eb..290c445768fa 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -2084,7 +2084,7 @@ static struct uprobe *find_active_uprobe_rcu(unsigned long bp_vaddr, int *is_swb
 	mmap_read_lock(mm);
 	vma = vma_lookup(mm, bp_vaddr);
 	if (vma) {
-		if (valid_vma(vma, false)) {
+		if (vma->vm_file) {
 			struct inode *inode = file_inode(vma->vm_file);
 			loff_t offset = vaddr_to_offset(vma, bp_vaddr);
 

From patchwork Mon Oct 28 01:08:18 2024
X-Patchwork-Submitter: Andrii Nakryiko
X-Patchwork-Id: 13852776
From: Andrii Nakryiko
To: linux-trace-kernel@vger.kernel.org, linux-mm@kvack.org, akpm@linux-foundation.org, peterz@infradead.org
Cc: oleg@redhat.com, rostedt@goodmis.org, mhiramat@kernel.org, bpf@vger.kernel.org, linux-kernel@vger.kernel.org, jolsa@kernel.org, paulmck@kernel.org, willy@infradead.org, surenb@google.com, mjguzik@gmail.com, brauner@kernel.org, jannh@google.com, mhocko@kernel.org, vbabka@suse.cz, shakeel.butt@linux.dev, hannes@cmpxchg.org, Liam.Howlett@oracle.com, lorenzo.stoakes@oracle.com, david@redhat.com, arnd@arndb.de, richard.weiyang@gmail.com, zhangpeng.00@bytedance.com, linmiaohe@huawei.com, viro@zeniv.linux.org.uk, hca@linux.ibm.com, Andrii Nakryiko
Subject: [PATCH v4 tip/perf/core 4/4] uprobes: add speculative lockless VMA-to-inode-to-uprobe resolution
Date: Sun, 27 Oct 2024 18:08:18 -0700
Message-ID: <20241028010818.2487581-5-andrii@kernel.org>
In-Reply-To: <20241028010818.2487581-1-andrii@kernel.org>
References: <20241028010818.2487581-1-andrii@kernel.org>

Given filp_cachep is marked SLAB_TYPESAFE_BY_RCU (and FMODE_BACKING
files, a special case, now go through RCU-delayed freeing), we can
safely access the vma->vm_file->f_inode field locklessly under just
rcu_read_lock() protection, which enables looking up a uprobe from
uprobes_tree completely locklessly and speculatively, without the need
to acquire mmap_lock for reads - in most cases, anyway, assuming that
there are no parallel mm and/or VMA modifications. The underlying
struct file's memory won't go away from under us (even if the struct
file can be reused in the meantime).

We rely on the newly added mmap_lock_speculation_{begin,end}() helpers
to validate that mm_struct stays intact for the entire duration of this
speculation. If not, we fall back to the mmap_lock-protected lookup.
The speculative logic is written in such a way that it will safely
handle any garbage values that might be read from vma or file structs.

Benchmarking results speak for themselves.
BEFORE (latest tip/perf/core)
=============================
uprobe-nop            ( 1 cpus):    3.384 ± 0.004M/s  (  3.384M/s/cpu)
uprobe-nop            ( 2 cpus):    5.456 ± 0.005M/s  (  2.728M/s/cpu)
uprobe-nop            ( 3 cpus):    7.863 ± 0.015M/s  (  2.621M/s/cpu)
uprobe-nop            ( 4 cpus):    9.442 ± 0.008M/s  (  2.360M/s/cpu)
uprobe-nop            ( 5 cpus):   11.036 ± 0.013M/s  (  2.207M/s/cpu)
uprobe-nop            ( 6 cpus):   10.884 ± 0.019M/s  (  1.814M/s/cpu)
uprobe-nop            ( 7 cpus):    7.897 ± 0.145M/s  (  1.128M/s/cpu)
uprobe-nop            ( 8 cpus):   10.021 ± 0.128M/s  (  1.253M/s/cpu)
uprobe-nop            (10 cpus):    9.932 ± 0.170M/s  (  0.993M/s/cpu)
uprobe-nop            (12 cpus):    8.369 ± 0.056M/s  (  0.697M/s/cpu)
uprobe-nop            (14 cpus):    8.678 ± 0.017M/s  (  0.620M/s/cpu)
uprobe-nop            (16 cpus):    7.392 ± 0.003M/s  (  0.462M/s/cpu)
uprobe-nop            (24 cpus):    5.326 ± 0.178M/s  (  0.222M/s/cpu)
uprobe-nop            (32 cpus):    5.426 ± 0.059M/s  (  0.170M/s/cpu)
uprobe-nop            (40 cpus):    5.262 ± 0.070M/s  (  0.132M/s/cpu)
uprobe-nop            (48 cpus):    6.121 ± 0.010M/s  (  0.128M/s/cpu)
uprobe-nop            (56 cpus):    6.252 ± 0.035M/s  (  0.112M/s/cpu)
uprobe-nop            (64 cpus):    7.644 ± 0.023M/s  (  0.119M/s/cpu)
uprobe-nop            (72 cpus):    7.781 ± 0.001M/s  (  0.108M/s/cpu)
uprobe-nop            (80 cpus):    8.992 ± 0.048M/s  (  0.112M/s/cpu)

AFTER
=====
uprobe-nop            ( 1 cpus):    3.534 ± 0.033M/s  (  3.534M/s/cpu)
uprobe-nop            ( 2 cpus):    6.701 ± 0.007M/s  (  3.351M/s/cpu)
uprobe-nop            ( 3 cpus):   10.031 ± 0.007M/s  (  3.344M/s/cpu)
uprobe-nop            ( 4 cpus):   13.003 ± 0.012M/s  (  3.251M/s/cpu)
uprobe-nop            ( 5 cpus):   16.274 ± 0.006M/s  (  3.255M/s/cpu)
uprobe-nop            ( 6 cpus):   19.563 ± 0.024M/s  (  3.261M/s/cpu)
uprobe-nop            ( 7 cpus):   22.696 ± 0.054M/s  (  3.242M/s/cpu)
uprobe-nop            ( 8 cpus):   24.534 ± 0.010M/s  (  3.067M/s/cpu)
uprobe-nop            (10 cpus):   30.475 ± 0.117M/s  (  3.047M/s/cpu)
uprobe-nop            (12 cpus):   33.371 ± 0.017M/s  (  2.781M/s/cpu)
uprobe-nop            (14 cpus):   38.864 ± 0.004M/s  (  2.776M/s/cpu)
uprobe-nop            (16 cpus):   41.476 ± 0.020M/s  (  2.592M/s/cpu)
uprobe-nop            (24 cpus):   64.696 ± 0.021M/s  (  2.696M/s/cpu)
uprobe-nop            (32 cpus):   85.054 ± 0.027M/s  (  2.658M/s/cpu)
uprobe-nop            (40 cpus):  101.979 ± 0.032M/s  (  2.549M/s/cpu)
uprobe-nop            (48 cpus):  110.518 ± 0.056M/s  (  2.302M/s/cpu)
uprobe-nop            (56 cpus):  117.737 ± 0.020M/s  (  2.102M/s/cpu)
uprobe-nop            (64 cpus):  124.613 ± 0.079M/s  (  1.947M/s/cpu)
uprobe-nop            (72 cpus):  133.239 ± 0.032M/s  (  1.851M/s/cpu)
uprobe-nop            (80 cpus):  142.037 ± 0.138M/s  (  1.775M/s/cpu)

Previously, total throughput was maxing out at 11 mln/s and gradually
declining past 8 cores. With this change it now keeps growing with each
added CPU, reaching 142 mln/s at 80 CPUs (this was measured on an
80-core Intel(R) Xeon(R) Gold 6138 CPU @ 2.00GHz).
Reviewed-by: Oleg Nesterov
Suggested-by: Matthew Wilcox
Suggested-by: Peter Zijlstra
Signed-off-by: Andrii Nakryiko
Acked-by: Masami Hiramatsu (Google)
---
 kernel/events/uprobes.c | 45 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 45 insertions(+)

diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 290c445768fa..efcd62f7051d 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -2074,6 +2074,47 @@ static int is_trap_at_addr(struct mm_struct *mm, unsigned long vaddr)
 	return is_trap_insn(&opcode);
 }
 
+static struct uprobe *find_active_uprobe_speculative(unsigned long bp_vaddr)
+{
+	struct mm_struct *mm = current->mm;
+	struct uprobe *uprobe = NULL;
+	struct vm_area_struct *vma;
+	struct file *vm_file;
+	loff_t offset;
+	unsigned int seq;
+
+	guard(rcu)();
+
+	if (!mmap_lock_speculation_begin(mm, &seq))
+		return NULL;
+
+	vma = vma_lookup(mm, bp_vaddr);
+	if (!vma)
+		return NULL;
+
+	/*
+	 * vm_file memory can be reused for another instance of struct file,
+	 * but can't be freed from under us, so it's safe to read fields from
+	 * it, even if the values are some garbage values; ultimately
+	 * find_uprobe_rcu() + mmap_lock_speculation_end() check will ensure
+	 * that whatever we speculatively found is correct
+	 */
+	vm_file = READ_ONCE(vma->vm_file);
+	if (!vm_file)
+		return NULL;
+
+	offset = (loff_t)(vma->vm_pgoff << PAGE_SHIFT) + (bp_vaddr - vma->vm_start);
+	uprobe = find_uprobe_rcu(vm_file->f_inode, offset);
+	if (!uprobe)
+		return NULL;
+
+	/* now double check that nothing about MM changed */
+	if (!mmap_lock_speculation_end(mm, seq))
+		return NULL;
+
+	return uprobe;
+}
+
 /* assumes being inside RCU protected region */
 static struct uprobe *find_active_uprobe_rcu(unsigned long bp_vaddr, int *is_swbp)
 {
@@ -2081,6 +2122,10 @@ static struct uprobe *find_active_uprobe_rcu(unsigned long bp_vaddr, int *is_swb
 	struct uprobe *uprobe = NULL;
 	struct vm_area_struct *vma;
 
+	uprobe = find_active_uprobe_speculative(bp_vaddr);
+	if (uprobe)
+		return uprobe;
+
 	mmap_read_lock(mm);
 	vma = vma_lookup(mm, bp_vaddr);
 	if (vma) {
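The only arithmetic in the speculative path above is the VMA-to-file-offset
translation, offset = (vm_pgoff << PAGE_SHIFT) + (bp_vaddr - vm_start). A small
standalone illustration (not kernel code; types, names, and numbers below are
made up, and PAGE_SHIFT == 12 assumes 4 KiB pages):

/* Illustrative userspace model of the VMA-to-file-offset translation. */
#include <inttypes.h>
#include <stdio.h>

#define PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/* Minimal stand-in for the two vm_area_struct fields the computation uses. */
struct vma_model {
	uint64_t vm_start;	/* virtual address where the mapping starts */
	uint64_t vm_pgoff;	/* offset into the mapped file, in pages */
};

/* Mirrors: (loff_t)(vma->vm_pgoff << PAGE_SHIFT) + (bp_vaddr - vma->vm_start) */
static uint64_t vaddr_to_file_offset(const struct vma_model *vma, uint64_t bp_vaddr)
{
	return (vma->vm_pgoff << PAGE_SHIFT) + (bp_vaddr - vma->vm_start);
}

int main(void)
{
	/* Hypothetical mapping: file offset 0x2000 mapped at 0x7f0000001000. */
	struct vma_model vma = { .vm_start = 0x7f0000001000ULL, .vm_pgoff = 2 };
	uint64_t bp_vaddr = 0x7f0000001234ULL;	/* breakpoint address inside the VMA */

	/* Expected: 0x2000 + 0x234 = 0x2234 */
	printf("file offset = 0x%" PRIx64 "\n", vaddr_to_file_offset(&vma, bp_vaddr));
	return 0;
}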