From patchwork Tue Oct 15 21:37:21 2024
X-Patchwork-Submitter: Shakeel Butt
X-Patchwork-Id: 13837272
From: Shakeel Butt
To: Andrew Morton
Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Muchun Song,
    Steven Rostedt, JP Kobryn, Yosry Ahmed, linux-mm@kvack.org,
    cgroups@vger.kernel.org, linux-kernel@vger.kernel.org,
    Meta kernel team, Michal Hocko, Muchun Song
Subject: [PATCH v2] memcg: add tracing for memcg stat updates
Date: Tue, 15 Oct 2024 14:37:21 -0700
Message-ID: <20241015213721.3804209-1-shakeel.butt@linux.dev>

The memcg stats are maintained in the rstat infrastructure, which
provides a very fast update side and a reasonably fast read side.
However, memcg added a plethora of stats, which made the read side,
i.e. the cgroup rstat flush, very slow. To solve that, a threshold was
added on the memcg stats read side, i.e. there is no need to flush the
stats if the pending updates are within the threshold.

This threshold-based improvement worked for some time, but more stats
were added to memcg and the read codepath started getting triggered in
performance-sensitive paths, which made the threshold-based
ratelimiting ineffective.

We need more visibility into the hot and cold stats, i.e. which stats
receive a lot of updates. Let's add tracing to get that visibility.

Signed-off-by: Shakeel Butt
Acked-by: Roman Gushchin
Reviewed-by: Yosry Ahmed
Cc: Michal Hocko
Cc: Johannes Weiner
Cc: Muchun Song
Cc: JP Kobryn
Cc: Steven Rostedt (Google)
Cc: Andrew Morton
Acked-by: Johannes Weiner
Reviewed-by: T.J. Mercier
---
Changes since v1:
- Used unsigned long type for memcg_rstat_events (Yosry)
- Kept the Acks and Reviews tags

 include/trace/events/memcg.h | 81 ++++++++++++++++++++++++++++++++++++
 mm/memcontrol.c              | 13 +++++-
 2 files changed, 92 insertions(+), 2 deletions(-)
 create mode 100644 include/trace/events/memcg.h

diff --git a/include/trace/events/memcg.h b/include/trace/events/memcg.h
new file mode 100644
index 000000000000..8667e57816d2
--- /dev/null
+++ b/include/trace/events/memcg.h
@@ -0,0 +1,81 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM memcg
+
+#if !defined(_TRACE_MEMCG_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_MEMCG_H
+
+#include
+#include
+
+
+DECLARE_EVENT_CLASS(memcg_rstat_stats,
+
+	TP_PROTO(struct mem_cgroup *memcg, int item, int val),
+
+	TP_ARGS(memcg, item, val),
+
+	TP_STRUCT__entry(
+		__field(u64, id)
+		__field(int, item)
+		__field(int, val)
+	),
+
+	TP_fast_assign(
+		__entry->id = cgroup_id(memcg->css.cgroup);
+		__entry->item = item;
+		__entry->val = val;
+	),
+
+	TP_printk("memcg_id=%llu item=%d val=%d",
+		  __entry->id, __entry->item, __entry->val)
+);
+
+DEFINE_EVENT(memcg_rstat_stats, mod_memcg_state,
+
+	TP_PROTO(struct mem_cgroup *memcg, int item, int val),
+
+	TP_ARGS(memcg, item, val)
+);
+
+DEFINE_EVENT(memcg_rstat_stats, mod_memcg_lruvec_state,
+
+	TP_PROTO(struct mem_cgroup *memcg, int item, int val),
+
+	TP_ARGS(memcg, item, val)
+);
+
+DECLARE_EVENT_CLASS(memcg_rstat_events,
+
+	TP_PROTO(struct mem_cgroup *memcg, int item, unsigned long val),
+
+	TP_ARGS(memcg, item, val),
+
+	TP_STRUCT__entry(
+		__field(u64, id)
+		__field(int, item)
+		__field(unsigned long, val)
+	),
+
+	TP_fast_assign(
+		__entry->id = cgroup_id(memcg->css.cgroup);
+		__entry->item = item;
+		__entry->val = val;
+	),
+
+	TP_printk("memcg_id=%llu item=%d val=%lu",
+		  __entry->id, __entry->item, __entry->val)
+);
+
+DEFINE_EVENT(memcg_rstat_events, count_memcg_events,
+
+	TP_PROTO(struct mem_cgroup *memcg, int item, unsigned long val),
+
+	TP_ARGS(memcg, item, val)
+);
+
+
+#endif /* _TRACE_MEMCG_H */
+
+/* This part must be outside protection */
+#include
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index c098fd7f5c5e..17af08367c68 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -71,6 +71,10 @@
 
 #include
 
+#define CREATE_TRACE_POINTS
+#include
+#undef CREATE_TRACE_POINTS
+
 #include
 
 struct cgroup_subsys memory_cgrp_subsys __read_mostly;
@@ -682,7 +686,9 @@ void __mod_memcg_state(struct mem_cgroup *memcg, enum memcg_stat_item idx,
 		return;
 
 	__this_cpu_add(memcg->vmstats_percpu->state[i], val);
-	memcg_rstat_updated(memcg, memcg_state_val_in_pages(idx, val));
+	val = memcg_state_val_in_pages(idx, val);
+	memcg_rstat_updated(memcg, val);
+	trace_mod_memcg_state(memcg, idx, val);
 }
 
 /* idx can be of type enum memcg_stat_item or node_stat_item. */
@@ -741,7 +747,9 @@ static void __mod_memcg_lruvec_state(struct lruvec *lruvec,
 	/* Update lruvec */
 	__this_cpu_add(pn->lruvec_stats_percpu->state[i], val);
 
-	memcg_rstat_updated(memcg, memcg_state_val_in_pages(idx, val));
+	val = memcg_state_val_in_pages(idx, val);
+	memcg_rstat_updated(memcg, val);
+	trace_mod_memcg_lruvec_state(memcg, idx, val);
 	memcg_stats_unlock();
 }
 
@@ -832,6 +840,7 @@ void __count_memcg_events(struct mem_cgroup *memcg, enum vm_event_item idx,
 	memcg_stats_lock();
 	__this_cpu_add(memcg->vmstats_percpu->events[i], count);
 	memcg_rstat_updated(memcg, count);
+	trace_count_memcg_events(memcg, idx, count);
 	memcg_stats_unlock();
 }
 
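
For illustration only, and not part of the patch: a minimal userspace
sketch of how the output of these tracepoints could be post-processed
to find hot stats. It assumes each trace line carries the payload
printed by the TP_printk() formats above ("memcg_id=... item=...
val=...") preceded by the event name and a colon. The names
MAX_ITEMS, item_counts, parse_payload(), account_line() and
hottest_item() are hypothetical helpers invented for this sketch.
Lines are filtered by event name because the item index of
count_memcg_events comes from a different enum than that of the two
state events.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical bound on the item index; not taken from the patch. */
#define MAX_ITEMS 512

struct item_counts {
	unsigned long updates[MAX_ITEMS];
};

/*
 * Parse the payload printed by the TP_printk() formats in the patch.
 * "%lld" accepts both the signed val of the stats class and the
 * unsigned val of the events class (for values that fit in long long).
 * Returns 1 if all three fields were parsed, 0 otherwise.
 */
static int parse_payload(const char *line, unsigned long long *id,
			 int *item, long long *val)
{
	const char *p = strstr(line, "memcg_id=");

	if (!p)
		return 0;
	return sscanf(p, "memcg_id=%llu item=%d val=%lld", id, item, val) == 3;
}

/*
 * Count one update if the line belongs to the given event, e.g.
 * "mod_memcg_state:". Returns 1 if the line was counted, 0 otherwise.
 */
static int account_line(struct item_counts *c, const char *event,
			const char *line)
{
	unsigned long long id;
	long long val;
	int item;

	assert(c && event && line);
	if (!strstr(line, event) || !parse_payload(line, &id, &item, &val))
		return 0;
	if (item < 0 || item >= MAX_ITEMS)
		return 0;
	c->updates[item]++;
	return 1;
}

/* Return the item index with the most updates, or -1 if none were seen. */
static int hottest_item(const struct item_counts *c)
{
	unsigned long best_n = 0;
	int best = -1;

	for (int i = 0; i < MAX_ITEMS; i++) {
		if (c->updates[i] > best_n) {
			best_n = c->updates[i];
			best = i;
		}
	}
	return best;
}
```

A caller would zero an item_counts, feed it every line read from the
trace buffer with the event name of interest, and then call
hottest_item(). The result is a raw enum index; mapping it back to a
stat name depends on the kernel version and is left out here.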