From patchwork Sat Nov 23 06:09:39 2024
X-Patchwork-Submitter: Shakeel Butt
X-Patchwork-Id: 13883763
From: Shakeel Butt <shakeel.butt@linux.dev>
To: Andrew Morton
Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Muchun Song, Vlastimil Babka, Axel Rasmussen, Steven Rostedt, Suren Baghdasaryan, linux-mm@kvack.org, cgroups@vger.kernel.org, linux-kernel@vger.kernel.org, Meta kernel team
Subject: [PATCH] mm: mmap_lock: optimize mmap_lock tracepoints
Date: Fri, 22 Nov 2024 22:09:39 -0800
Message-ID: <20241123060939.169978-1-shakeel.butt@linux.dev>
MIME-Version: 1.0
We are starting to deploy mmap_lock tracepoint monitoring across our
fleet, and the early results showed that these tracepoints consume a
significant amount of CPU in kernfs_path_from_node when enabled. The
kernel is resolving the cgroup path in the fast path of the locking
code whenever the tracepoints are enabled. In addition, for some
applications, metrics regress when the monitoring is enabled.
The cgroup path resolution can be slow and should not be done in the
fast path. Most userspace tools, like bpftrace, provide functionality
to resolve the cgroup path from the cgroup id, so let's just trace the
cgroup id and let users resolve the path with better tools in the slow
path.

Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
---
 include/linux/memcontrol.h       | 18 ++++++++++++
 include/trace/events/mmap_lock.h | 32 ++++++++++----------
 mm/mmap_lock.c                   | 50 ++------------------------------
 3 files changed, 36 insertions(+), 64 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 5502aa8e138e..d82f08cd70cd 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -1046,6 +1046,19 @@ static inline void memcg_memory_event_mm(struct mm_struct *mm,
 
 void split_page_memcg(struct page *head, int old_order, int new_order);
 
+static inline u64 memcg_id_from_mm(struct mm_struct *mm)
+{
+	struct mem_cgroup *memcg;
+	u64 id = 0;
+
+	rcu_read_lock();
+	memcg = mem_cgroup_from_task(rcu_dereference(mm->owner));
+	if (likely(memcg))
+		id = cgroup_id(memcg->css.cgroup);
+	rcu_read_unlock();
+	return id;
+}
+
 #else /* CONFIG_MEMCG */
 
 #define MEM_CGROUP_ID_SHIFT	0
@@ -1466,6 +1479,11 @@ void count_memcg_event_mm(struct mm_struct *mm, enum vm_event_item idx)
 static inline void split_page_memcg(struct page *head, int old_order, int new_order)
 {
 }
+
+static inline u64 memcg_id_from_mm(struct mm_struct *mm)
+{
+	return 0;
+}
 #endif /* CONFIG_MEMCG */
 
 /*
diff --git a/include/trace/events/mmap_lock.h b/include/trace/events/mmap_lock.h
index bc2e3ad787b3..5529933d19c5 100644
--- a/include/trace/events/mmap_lock.h
+++ b/include/trace/events/mmap_lock.h
@@ -5,6 +5,7 @@
 #if !defined(_TRACE_MMAP_LOCK_H) || defined(TRACE_HEADER_MULTI_READ)
 #define _TRACE_MMAP_LOCK_H
 
+#include <linux/memcontrol.h>
 #include <linux/tracepoint.h>
 #include <linux/types.h>
 
@@ -12,64 +13,61 @@ struct mm_struct;
 
 DECLARE_EVENT_CLASS(mmap_lock,
 
-	TP_PROTO(struct mm_struct *mm, const char *memcg_path, bool write),
+	TP_PROTO(struct mm_struct *mm, bool write),
 
-	TP_ARGS(mm, memcg_path, write),
+	TP_ARGS(mm, write),
 
 	TP_STRUCT__entry(
 		__field(struct mm_struct *, mm)
-		__string(memcg_path, memcg_path)
+		__field(u64, memcg_id)
 		__field(bool, write)
 	),
 
 	TP_fast_assign(
 		__entry->mm = mm;
-		__assign_str(memcg_path);
+		__entry->memcg_id = memcg_id_from_mm(mm);
 		__entry->write = write;
 	),
 
 	TP_printk(
-		"mm=%p memcg_path=%s write=%s",
-		__entry->mm,
-		__get_str(memcg_path),
+		"mm=%p memcg_id=%llu write=%s",
+		__entry->mm, __entry->memcg_id,
 		__entry->write ? "true" : "false"
 	)
 );
 
 #define DEFINE_MMAP_LOCK_EVENT(name)                                    \
 	DEFINE_EVENT(mmap_lock, name,                                   \
-		TP_PROTO(struct mm_struct *mm, const char *memcg_path,  \
-			bool write),                                    \
-		TP_ARGS(mm, memcg_path, write))
+		TP_PROTO(struct mm_struct *mm, bool write),             \
+		TP_ARGS(mm, write))
 
 DEFINE_MMAP_LOCK_EVENT(mmap_lock_start_locking);
 DEFINE_MMAP_LOCK_EVENT(mmap_lock_released);
 
 TRACE_EVENT(mmap_lock_acquire_returned,
 
-	TP_PROTO(struct mm_struct *mm, const char *memcg_path, bool write,
-		bool success),
+	TP_PROTO(struct mm_struct *mm, bool write, bool success),
 
-	TP_ARGS(mm, memcg_path, write, success),
+	TP_ARGS(mm, write, success),
 
 	TP_STRUCT__entry(
 		__field(struct mm_struct *, mm)
-		__string(memcg_path, memcg_path)
+		__field(u64, memcg_id)
 		__field(bool, write)
 		__field(bool, success)
 	),
 
 	TP_fast_assign(
 		__entry->mm = mm;
-		__assign_str(memcg_path);
+		__entry->memcg_id = memcg_id_from_mm(mm);
 		__entry->write = write;
 		__entry->success = success;
 	),
 
 	TP_printk(
-		"mm=%p memcg_path=%s write=%s success=%s",
+		"mm=%p memcg_id=%llu write=%s success=%s",
 		__entry->mm,
-		__get_str(memcg_path),
+		__entry->memcg_id,
 		__entry->write ? "true" : "false",
 		__entry->success ? "true" : "false"
 	)
diff --git a/mm/mmap_lock.c b/mm/mmap_lock.c
index f186d57df2c6..e7dbaf96aa17 100644
--- a/mm/mmap_lock.c
+++ b/mm/mmap_lock.c
@@ -17,51 +17,7 @@ EXPORT_TRACEPOINT_SYMBOL(mmap_lock_start_locking);
 EXPORT_TRACEPOINT_SYMBOL(mmap_lock_acquire_returned);
 EXPORT_TRACEPOINT_SYMBOL(mmap_lock_released);
 
-#ifdef CONFIG_MEMCG
-
-/*
- * Size of the buffer for memcg path names. Ignoring stack trace support,
- * trace_events_hist.c uses MAX_FILTER_STR_VAL for this, so we also use it.
- */
-#define MEMCG_PATH_BUF_SIZE MAX_FILTER_STR_VAL
-
-#define TRACE_MMAP_LOCK_EVENT(type, mm, ...)                            \
-	do {                                                            \
-		if (trace_mmap_lock_##type##_enabled()) {               \
-			char buf[MEMCG_PATH_BUF_SIZE];                  \
-			get_mm_memcg_path(mm, buf, sizeof(buf));        \
-			trace_mmap_lock_##type(mm, buf, ##__VA_ARGS__); \
-		}                                                       \
-	} while (0)
-
-#else /* !CONFIG_MEMCG */
-
-#define TRACE_MMAP_LOCK_EVENT(type, mm, ...)                            \
-	trace_mmap_lock_##type(mm, "", ##__VA_ARGS__)
-
-#endif /* CONFIG_MEMCG */
-
 #ifdef CONFIG_TRACING
-#ifdef CONFIG_MEMCG
-/*
- * Write the given mm_struct's memcg path to a buffer. If the path cannot be
- * determined, empty string is written.
- */
-static void get_mm_memcg_path(struct mm_struct *mm, char *buf, size_t buflen)
-{
-	struct mem_cgroup *memcg;
-
-	buf[0] = '\0';
-	memcg = get_mem_cgroup_from_mm(mm);
-	if (memcg == NULL)
-		return;
-	if (memcg->css.cgroup)
-		cgroup_path(memcg->css.cgroup, buf, buflen);
-	css_put(&memcg->css);
-}
-
-#endif /* CONFIG_MEMCG */
-
 /*
  * Trace calls must be in a separate file, as otherwise there's a circular
  * dependency between linux/mmap_lock.h and trace/events/mmap_lock.h.
@@ -69,20 +25,20 @@ static void get_mm_memcg_path(struct mm_struct *mm, char *buf, size_t buflen)
 
 void __mmap_lock_do_trace_start_locking(struct mm_struct *mm, bool write)
 {
-	TRACE_MMAP_LOCK_EVENT(start_locking, mm, write);
+	trace_mmap_lock_start_locking(mm, write);
 }
 EXPORT_SYMBOL(__mmap_lock_do_trace_start_locking);
 
 void __mmap_lock_do_trace_acquire_returned(struct mm_struct *mm, bool write,
 					   bool success)
 {
-	TRACE_MMAP_LOCK_EVENT(acquire_returned, mm, write, success);
+	trace_mmap_lock_acquire_returned(mm, write, success);
 }
 EXPORT_SYMBOL(__mmap_lock_do_trace_acquire_returned);
 
 void __mmap_lock_do_trace_released(struct mm_struct *mm, bool write)
 {
-	TRACE_MMAP_LOCK_EVENT(released, mm, write);
+	trace_mmap_lock_released(mm, write);
 }
 EXPORT_SYMBOL(__mmap_lock_do_trace_released);
 #endif /* CONFIG_TRACING */