From patchwork Thu Mar 26 11:12:06 2020
From: Yafang Shao
To: hannes@cmpxchg.org, peterz@infradead.org, akpm@linux-foundation.org, mhocko@kernel.org, axboe@kernel.dk, mgorman@suse.de, rostedt@goodmis.org, mingo@redhat.com
Cc: linux-mm@kvack.org, linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, Yafang Shao
Subject: [PATCH 1/2] psi: introduce various types of memstall
Date: Thu, 26 Mar 2020 07:12:06 -0400
Message-Id: <1585221127-11458-2-git-send-email-laoar.shao@gmail.com>
In-Reply-To: <1585221127-11458-1-git-send-email-laoar.shao@gmail.com>

Memstall is currently used as a memory pressure index. However, there are
many paths into a memstall, so when one happens we can't tell its
specific cause.

This patch introduces the following types of memstall:

	MEMSTALL_KSWAPD
	MEMSTALL_RECLAIM_DIRECT
	MEMSTALL_RECLAIM_MEMCG
	MEMSTALL_RECLAIM_HIGH
	MEMSTALL_KCOMPACTD
	MEMSTALL_COMPACT
	MEMSTALL_WORKINGSET_REFAULT
	MEMSTALL_WORKINGSET_THRASH
	MEMSTALL_MEMDELAY
	MEMSTALL_SWAP

and adds a new parameter, 'type', to psi_memstall_{enter, leave}.

With that in place, we can trace specific types of memstall with other
powerful tools such as tracepoints, kprobes and eBPF. It can also help us
analyze latency spikes caused by memory pressure. Note, however, that it
can't be used to build a pressure index for a specific type of memstall
(e.g. memcg pressure or compaction pressure), because there is no
per-type equivalent of the task's memstall state, such as
task->in_memcgstall or task->in_compactionstall. IOW, the main goal is to
trace the spread of latencies and the specific reason for each of them.

Some existing tracepoints can already help us achieve this goal, e.g.

	vmscan:mm_vmscan_kswapd_{wake, sleep}
	vmscan:mm_vmscan_direct_reclaim_{begin, end}
	vmscan:mm_vmscan_memcg_reclaim_{begin, end}
	/* no tracepoint for memcg high reclaim */
	compaction:mm_compaction_kcompactd_{wake, sleep}
	compaction:mm_compaction_{begin, end}
	/* no tracepoint for workingset refault */
	/* no tracepoint for workingset thrashing */
	/* no tracepoint for use_memdelay */
	/* no tracepoint for swap IO */

but psi_memstall_{enter, leave} gives us a unified entry point for all
types of memstall, so we don't need to add the many begin/end
tracepoints that haven't been implemented yet.
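
Tools that hook these entry points (a kprobe on psi_memstall_enter(), or the tracepoints added in patch 2/2 when read through bpftrace's args->type) see the type only as an integer. As a decoding aid, here is a minimal userspace sketch, not part of the patch, that mirrors the enum memstall_types this patch adds to include/linux/psi_types.h and maps a raw value back to its name, much like the show_psi_memstall_type() helper in patch 2/2 does for the trace text output. The NR_MEMSTALL_TYPES sentinel and memstall_type_name() are hypothetical additions for illustration, and the sketch assumes the enum keeps its implicit 0-based numbering.

```c
#include <assert.h>
#include <string.h>

/* Mirrors enum memstall_types as added to include/linux/psi_types.h. */
enum memstall_types {
	MEMSTALL_KSWAPD,
	MEMSTALL_RECLAIM_DIRECT,
	MEMSTALL_RECLAIM_MEMCG,
	MEMSTALL_RECLAIM_HIGH,
	MEMSTALL_KCOMPACTD,
	MEMSTALL_COMPACT,
	MEMSTALL_WORKINGSET_REFAULT,
	MEMSTALL_WORKINGSET_THRASH,
	MEMSTALL_MEMDELAY,
	MEMSTALL_SWAP,
	NR_MEMSTALL_TYPES	/* hypothetical sentinel, not in the patch */
};

/* Designated initializers keep each name tied to its enum value. */
static const char *const memstall_type_names[NR_MEMSTALL_TYPES] = {
	[MEMSTALL_KSWAPD]		= "MEMSTALL_KSWAPD",
	[MEMSTALL_RECLAIM_DIRECT]	= "MEMSTALL_RECLAIM_DIRECT",
	[MEMSTALL_RECLAIM_MEMCG]	= "MEMSTALL_RECLAIM_MEMCG",
	[MEMSTALL_RECLAIM_HIGH]		= "MEMSTALL_RECLAIM_HIGH",
	[MEMSTALL_KCOMPACTD]		= "MEMSTALL_KCOMPACTD",
	[MEMSTALL_COMPACT]		= "MEMSTALL_COMPACT",
	[MEMSTALL_WORKINGSET_REFAULT]	= "MEMSTALL_WORKINGSET_REFAULT",
	[MEMSTALL_WORKINGSET_THRASH]	= "MEMSTALL_WORKINGSET_THRASH",
	[MEMSTALL_MEMDELAY]		= "MEMSTALL_MEMDELAY",
	[MEMSTALL_SWAP]			= "MEMSTALL_SWAP",
};

/* Hypothetical helper: map a raw type value seen by a tracer to its name. */
static const char *memstall_type_name(int type)
{
	if (type < 0 || type >= NR_MEMSTALL_TYPES)
		return "UNKNOWN";
	return memstall_type_names[type];
}
```

With this ordering, @time[kswapd0, 0] in the patch 2/2 histograms is MEMSTALL_KSWAPD, @time[khugepaged, 5] is MEMSTALL_COMPACT and @time[objdump, 7] is MEMSTALL_WORKINGSET_THRASH.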

Signed-off-by: Yafang Shao
---
 block/blk-cgroup.c        |  4 ++--
 block/blk-core.c          |  4 ++--
 include/linux/psi.h       | 15 +++++++++++----
 include/linux/psi_types.h | 13 +++++++++++++
 kernel/sched/psi.c        |  6 ++++--
 mm/compaction.c           |  4 ++--
 mm/filemap.c              |  4 ++--
 mm/memcontrol.c           |  4 ++--
 mm/page_alloc.c           |  8 ++++----
 mm/page_io.c              |  4 ++--
 mm/vmscan.c               |  8 ++++----
 11 files changed, 48 insertions(+), 26 deletions(-)

diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index a229b94..fc24095 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -1593,7 +1593,7 @@ static void blkcg_maybe_throttle_blkg(struct blkcg_gq *blkg, bool use_memdelay)
 	delay_nsec = min_t(u64, delay_nsec, 250 * NSEC_PER_MSEC);
 
 	if (use_memdelay)
-		psi_memstall_enter(&pflags);
+		psi_memstall_enter(&pflags, MEMSTALL_MEMDELAY);
 
 	exp = ktime_add_ns(now, delay_nsec);
 	tok = io_schedule_prepare();
@@ -1605,7 +1605,7 @@ static void blkcg_maybe_throttle_blkg(struct blkcg_gq *blkg, bool use_memdelay)
 	io_schedule_finish(tok);
 
 	if (use_memdelay)
-		psi_memstall_leave(&pflags);
+		psi_memstall_leave(&pflags, MEMSTALL_MEMDELAY);
 }
 
 /**
diff --git a/block/blk-core.c b/block/blk-core.c
index 60dc955..e2039cf 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1190,12 +1190,12 @@ blk_qc_t submit_bio(struct bio *bio)
 	 * submission can be a significant part of overall IO time.
 	 */
 	if (workingset_read)
-		psi_memstall_enter(&pflags);
+		psi_memstall_enter(&pflags, MEMSTALL_WORKINGSET_REFAULT);
 
 	ret = generic_make_request(bio);
 
 	if (workingset_read)
-		psi_memstall_leave(&pflags);
+		psi_memstall_leave(&pflags, MEMSTALL_WORKINGSET_REFAULT);
 
 	return ret;
 }
diff --git a/include/linux/psi.h b/include/linux/psi.h
index 7b3de73..7bf94f6 100644
--- a/include/linux/psi.h
+++ b/include/linux/psi.h
@@ -19,8 +19,8 @@ void psi_task_change(struct task_struct *task, int clear, int set);
 
 void psi_memstall_tick(struct task_struct *task, int cpu);
-void psi_memstall_enter(unsigned long *flags);
-void psi_memstall_leave(unsigned long *flags);
+void psi_memstall_enter(unsigned long *flags, enum memstall_types type);
+void psi_memstall_leave(unsigned long *flags, enum memstall_types type);
 
 int psi_show(struct seq_file *s, struct psi_group *group, enum psi_res res);
 
@@ -41,8 +41,15 @@ __poll_t psi_trigger_poll(void **trigger_ptr, struct file *file,
 
 static inline void psi_init(void) {}
 
-static inline void psi_memstall_enter(unsigned long *flags) {}
-static inline void psi_memstall_leave(unsigned long *flags) {}
+static inline void psi_memstall_enter(unsigned long *flags,
+				      enum memstall_types type)
+{
+}
+
+static inline void psi_memstall_leave(unsigned long *flags,
+				      enum memstall_types type)
+{
+}
 
 #ifdef CONFIG_CGROUPS
 static inline int psi_cgroup_alloc(struct cgroup *cgrp)
diff --git a/include/linux/psi_types.h b/include/linux/psi_types.h
index 07aaf9b..52a3f08 100644
--- a/include/linux/psi_types.h
+++ b/include/linux/psi_types.h
@@ -7,6 +7,19 @@
 #include
 #include
 
+enum memstall_types {
+	MEMSTALL_KSWAPD,
+	MEMSTALL_RECLAIM_DIRECT,
+	MEMSTALL_RECLAIM_MEMCG,
+	MEMSTALL_RECLAIM_HIGH,
+	MEMSTALL_KCOMPACTD,
+	MEMSTALL_COMPACT,
+	MEMSTALL_WORKINGSET_REFAULT,
+	MEMSTALL_WORKINGSET_THRASH,
+	MEMSTALL_MEMDELAY,
+	MEMSTALL_SWAP,
+};
+
 #ifdef CONFIG_PSI
 
 /* Tracked task states */
diff --git a/kernel/sched/psi.c b/kernel/sched/psi.c
index 0285207..460f084 100644
--- a/kernel/sched/psi.c
+++ b/kernel/sched/psi.c
@@ -806,11 +806,12 @@ void psi_memstall_tick(struct task_struct *task, int cpu)
 /**
  * psi_memstall_enter - mark the beginning of a memory stall section
  * @flags: flags to handle nested sections
+ * @type: type of memstall
  *
  * Marks the calling task as being stalled due to a lack of memory,
  * such as waiting for a refault or performing reclaim.
  */
-void psi_memstall_enter(unsigned long *flags)
+void psi_memstall_enter(unsigned long *flags, enum memstall_types type)
 {
 	struct rq_flags rf;
 	struct rq *rq;
@@ -837,10 +838,11 @@ void psi_memstall_enter(unsigned long *flags)
 /**
  * psi_memstall_leave - mark the end of an memory stall section
  * @flags: flags to handle nested memdelay sections
+ * @type: type of memstall
  *
  * Marks the calling task as no longer stalled due to lack of memory.
  */
-void psi_memstall_leave(unsigned long *flags)
+void psi_memstall_leave(unsigned long *flags, enum memstall_types type)
 {
 	struct rq_flags rf;
 	struct rq *rq;
diff --git a/mm/compaction.c b/mm/compaction.c
index 672d3c7..c0d5331 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -2647,9 +2647,9 @@ static int kcompactd(void *p)
 		wait_event_freezable(pgdat->kcompactd_wait,
 				kcompactd_work_requested(pgdat));
 
-		psi_memstall_enter(&pflags);
+		psi_memstall_enter(&pflags, MEMSTALL_KCOMPACTD);
 		kcompactd_do_work(pgdat);
-		psi_memstall_leave(&pflags);
+		psi_memstall_leave(&pflags, MEMSTALL_KCOMPACTD);
 	}
 
 	return 0;
diff --git a/mm/filemap.c b/mm/filemap.c
index 1784478..f5459e3 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1123,7 +1123,7 @@ static inline int wait_on_page_bit_common(wait_queue_head_t *q,
 			delayacct_thrashing_start();
 			delayacct = true;
 		}
-		psi_memstall_enter(&pflags);
+		psi_memstall_enter(&pflags, MEMSTALL_WORKINGSET_THRASH);
 		thrashing = true;
 	}
 
@@ -1182,7 +1182,7 @@ static inline int wait_on_page_bit_common(wait_queue_head_t *q,
 	if (thrashing) {
 		if (delayacct)
 			delayacct_thrashing_end();
-		psi_memstall_leave(&pflags);
+		psi_memstall_leave(&pflags, MEMSTALL_WORKINGSET_THRASH);
 	}
 
 	/*
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 7a4bd8b..a9b336e 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2399,9 +2399,9 @@ void mem_cgroup_handle_over_high(void)
 	 * schedule_timeout_killable sets TASK_KILLABLE). This means we don't
 	 * need to account for any ill-begotten jiffies to pay them off later.
 	 */
-	psi_memstall_enter(&pflags);
+	psi_memstall_enter(&pflags, MEMSTALL_RECLAIM_HIGH);
 	schedule_timeout_killable(penalty_jiffies);
-	psi_memstall_leave(&pflags);
+	psi_memstall_leave(&pflags, MEMSTALL_RECLAIM_HIGH);
 
 out:
 	css_put(&memcg->css);
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 3c4eb75..8789234a 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3884,14 +3884,14 @@ void warn_alloc(gfp_t gfp_mask, nodemask_t *nodemask, const char *fmt, ...)
 	if (!order)
 		return NULL;
 
-	psi_memstall_enter(&pflags);
+	psi_memstall_enter(&pflags, MEMSTALL_COMPACT);
 	noreclaim_flag = memalloc_noreclaim_save();
 
 	*compact_result = try_to_compact_pages(gfp_mask, order, alloc_flags, ac,
 								prio, &page);
 
 	memalloc_noreclaim_restore(noreclaim_flag);
-	psi_memstall_leave(&pflags);
+	psi_memstall_leave(&pflags, MEMSTALL_COMPACT);
 
 	/*
 	 * At least in one zone compaction wasn't deferred or skipped, so let's
@@ -4106,7 +4106,7 @@ void fs_reclaim_release(gfp_t gfp_mask)
 
 	/* We now go into synchronous reclaim */
 	cpuset_memory_pressure_bump();
-	psi_memstall_enter(&pflags);
+	psi_memstall_enter(&pflags, MEMSTALL_RECLAIM_DIRECT);
 	fs_reclaim_acquire(gfp_mask);
 	noreclaim_flag = memalloc_noreclaim_save();
 
@@ -4115,7 +4115,7 @@ void fs_reclaim_release(gfp_t gfp_mask)
 
 	memalloc_noreclaim_restore(noreclaim_flag);
 	fs_reclaim_release(gfp_mask);
-	psi_memstall_leave(&pflags);
+	psi_memstall_leave(&pflags, MEMSTALL_RECLAIM_DIRECT);
 
 	cond_resched();
 
diff --git a/mm/page_io.c b/mm/page_io.c
index 76965be..67de6b1 100644
--- a/mm/page_io.c
+++ b/mm/page_io.c
@@ -369,7 +369,7 @@ int swap_readpage(struct page *page, bool synchronous)
 	 * or the submitting cgroup IO-throttled, submission can be a
 	 * significant part of overall IO time.
 	 */
-	psi_memstall_enter(&pflags);
+	psi_memstall_enter(&pflags, MEMSTALL_SWAP);
 
 	if (frontswap_load(page) == 0) {
 		SetPageUptodate(page);
@@ -431,7 +431,7 @@ int swap_readpage(struct page *page, bool synchronous)
 	bio_put(bio);
 
 out:
-	psi_memstall_leave(&pflags);
+	psi_memstall_leave(&pflags, MEMSTALL_SWAP);
 	return ret;
 }
 
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 8763705..4445c1d 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -3352,13 +3352,13 @@ unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *memcg,
 
 	trace_mm_vmscan_memcg_reclaim_begin(0, sc.gfp_mask);
 
-	psi_memstall_enter(&pflags);
+	psi_memstall_enter(&pflags, MEMSTALL_RECLAIM_MEMCG);
 	noreclaim_flag = memalloc_noreclaim_save();
 
 	nr_reclaimed = do_try_to_free_pages(zonelist, &sc);
 
 	memalloc_noreclaim_restore(noreclaim_flag);
-	psi_memstall_leave(&pflags);
+	psi_memstall_leave(&pflags, MEMSTALL_RECLAIM_MEMCG);
 
 	trace_mm_vmscan_memcg_reclaim_end(nr_reclaimed);
 	set_task_reclaim_state(current, NULL);
@@ -3568,7 +3568,7 @@ static int balance_pgdat(pg_data_t *pgdat, int order, int classzone_idx)
 	};
 
 	set_task_reclaim_state(current, &sc.reclaim_state);
-	psi_memstall_enter(&pflags);
+	psi_memstall_enter(&pflags, MEMSTALL_KSWAPD);
 	__fs_reclaim_acquire();
 
 	count_vm_event(PAGEOUTRUN);
@@ -3747,7 +3747,7 @@ static int balance_pgdat(pg_data_t *pgdat, int order, int classzone_idx)
 
 	snapshot_refaults(NULL, pgdat);
 	__fs_reclaim_release();
-	psi_memstall_leave(&pflags);
+	psi_memstall_leave(&pflags, MEMSTALL_KSWAPD);
 	set_task_reclaim_state(current, NULL);
 
 	/*

From patchwork Thu Mar 26 11:12:07 2020
From: Yafang Shao
To: hannes@cmpxchg.org, peterz@infradead.org, akpm@linux-foundation.org, mhocko@kernel.org, axboe@kernel.dk, mgorman@suse.de, rostedt@goodmis.org, mingo@redhat.com
Cc: linux-mm@kvack.org, linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, Yafang Shao
Subject: [PATCH 2/2] psi, tracepoint: introduce tracepoints for psi_memstall_{enter, leave}
Date: Thu, 26 Mar 2020 07:12:07 -0400
Message-Id: <1585221127-11458-3-git-send-email-laoar.shao@gmail.com>
In-Reply-To: <1585221127-11458-1-git-send-email-laoar.shao@gmail.com>

With the new parameter introduced in psi_memstall_{enter, leave}, we can
get the specific type of memstall. To make it easier to use, it's better
to introduce tracepoints for them. Once these two tracepoints are added,
we can easily use other tools such as eBPF or bash scripts to collect and
analyze the memstall data.
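
Note that both tracepoints fire only after the early `if (*flags) return;` nesting check in psi_memstall_enter() and psi_memstall_leave(). If memstall sections nest, only the outermost pair emits events, tagged with the outer type, and its duration includes the inner stall. The userspace simulation below sketches that rule under those assumptions; it is not kernel code, and task_in_memstall, sim_enter_events/sim_leave_events and the sim_memstall_*() functions are hypothetical stand-ins for the PF_MEMSTALL bit, the two trace events and the real functions.

```c
#include <assert.h>
#include <stdbool.h>

/* Mirrors enum memstall_types from include/linux/psi_types.h. */
enum memstall_types {
	MEMSTALL_KSWAPD,
	MEMSTALL_RECLAIM_DIRECT,
	MEMSTALL_RECLAIM_MEMCG,
	MEMSTALL_RECLAIM_HIGH,
	MEMSTALL_KCOMPACTD,
	MEMSTALL_COMPACT,
	MEMSTALL_WORKINGSET_REFAULT,
	MEMSTALL_WORKINGSET_THRASH,
	MEMSTALL_MEMDELAY,
	MEMSTALL_SWAP,
	NR_MEMSTALL_TYPES	/* hypothetical sentinel, not in the patch */
};

/* Hypothetical stand-in for the PF_MEMSTALL bit in current->flags. */
static bool task_in_memstall;

/* Hypothetical counters standing in for the enter/leave trace events. */
static unsigned int sim_enter_events[NR_MEMSTALL_TYPES];
static unsigned int sim_leave_events[NR_MEMSTALL_TYPES];

/* Models psi_memstall_enter(): a nested call returns before tracing. */
static void sim_memstall_enter(unsigned long *flags, enum memstall_types type)
{
	*flags = task_in_memstall;
	if (*flags)
		return;
	sim_enter_events[type]++;	/* trace_psi_memstall_enter(type) */
	task_in_memstall = true;
}

/* Models psi_memstall_leave(): only the outermost section is traced. */
static void sim_memstall_leave(unsigned long *flags, enum memstall_types type)
{
	if (*flags)
		return;
	sim_leave_events[type]++;	/* trace_psi_memstall_leave(type) */
	task_in_memstall = false;
}
```

For example, entering MEMSTALL_RECLAIM_DIRECT and then MEMSTALL_SWAP before leaving both records one enter/leave pair for MEMSTALL_RECLAIM_DIRECT and nothing for MEMSTALL_SWAP, so a bpftrace script keyed on args->type attributes the whole interval to the outer type.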

Here's an example that uses bpftrace to measure an application's memstall
latency, broken down by memstall type:

tracepoint:sched:psi_memstall_enter
{
	@start[tid, args->type] = nsecs
}

tracepoint:sched:psi_memstall_leave
{
	@time[comm, args->type] = hist(nsecs - @start[tid, args->type]);
	delete(@start[tid, args->type]);
}

Below is part of the output after generating some memory pressure:

@time[objdump, 7]:
[256K, 512K) 1 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[512K, 1M) 0 | |
[1M, 2M) 0 | |
[2M, 4M) 0 | |
[4M, 8M) 1 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|

@time[objdump, 6]:
[8K, 16K) 2 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|

@time[objcopy, 7]:
[16K, 32K) 1 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[32K, 64K) 0 | |
[64K, 128K) 0 | |
[128K, 256K) 0 | |
[256K, 512K) 1 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|

@time[ld, 7]:
[4M, 8M) 1 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[8M, 16M) 1 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|

@time[khugepaged, 5]:
[4K, 8K) 1 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[8K, 16K) 0 | |
[16K, 32K) 0 | |
[32K, 64K) 0 | |
[64K, 128K) 0 | |
[128K, 256K) 0 | |
[256K, 512K) 0 | |
[512K, 1M) 0 | |
[1M, 2M) 0 | |
[2M, 4M) 0 | |
[4M, 8M) 0 | |
[8M, 16M) 0 | |
[16M, 32M) 0 | |
[32M, 64M) 1 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|

@time[kswapd0, 0]:
[16K, 32K) 1 |@@@@@ |
[32K, 64K) 0 | |
[64K, 128K) 0 | |
[128K, 256K) 0 | |
[256K, 512K) 0 | |
[512K, 1M) 0 | |
[1M, 2M) 0 | |
[2M, 4M) 0 | |
[4M, 8M) 0 | |
[8M, 16M) 1 |@@@@@ |
[16M, 32M) 10 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[32M, 64M) 9 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ |
[64M, 128M) 2 |@@@@@@@@@@ |
[128M, 256M) 2 |@@@@@@@@@@ |
[256M, 512M) 3 |@@@@@@@@@@@@@@@ |
[512M, 1G) 1 |@@@@@ |

@time[kswapd1, 0]:
[1M, 2M) 1 |@@@@ |
[2M, 4M) 2 |@@@@@@@@ |
[4M, 8M) 0 | |
[8M, 16M) 12 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[16M, 32M) 7 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ |
[32M, 64M) 5 |@@@@@@@@@@@@@@@@@@@@@ |
[64M, 128M) 5 |@@@@@@@@@@@@@@@@@@@@@ |
[128M, 256M) 3 |@@@@@@@@@@@@@ |
[256M, 512M) 1 |@@@@ |

@time[khugepaged, 1]:
[2M, 4M) 1 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|

With bpftrace's builtin variable 'cgroup', we can also filter by a memcg
and its descendants.

Signed-off-by: Yafang Shao
---
 include/trace/events/sched.h | 41 +++++++++++++++++++++++++++++++++++++++++
 kernel/sched/psi.c           |  8 ++++++++
 2 files changed, 49 insertions(+)

diff --git a/include/trace/events/sched.h b/include/trace/events/sched.h
index 420e80e..6aca996 100644
--- a/include/trace/events/sched.h
+++ b/include/trace/events/sched.h
@@ -7,8 +7,20 @@
 
 #include
 #include
+#include
 #include
 
+#define show_psi_memstall_type(type) __print_symbolic(type,	\
+	{MEMSTALL_KSWAPD, "MEMSTALL_KSWAPD"},	\
+	{MEMSTALL_RECLAIM_DIRECT, "MEMSTALL_RECLAIM_DIRECT"},	\
+	{MEMSTALL_RECLAIM_MEMCG, "MEMSTALL_RECLAIM_MEMCG"},	\
+	{MEMSTALL_RECLAIM_HIGH, "MEMSTALL_RECLAIM_HIGH"},	\
+	{MEMSTALL_KCOMPACTD, "MEMSTALL_KCOMPACTD"},	\
+	{MEMSTALL_COMPACT, "MEMSTALL_COMPACT"},	\
+	{MEMSTALL_WORKINGSET_REFAULT, "MEMSTALL_WORKINGSET_REFAULT"},	\
+	{MEMSTALL_WORKINGSET_THRASH, "MEMSTALL_WORKINGSET_THRASH"},	\
+	{MEMSTALL_MEMDELAY, "MEMSTALL_MEMDELAY"},	\
+	{MEMSTALL_SWAP, "MEMSTALL_SWAP"})
 /*
  * Tracepoint for calling kthread_stop, performed to end a kthread:
  */
@@ -625,6 +637,35 @@ static inline long __trace_sched_switch_state(bool preempt, struct task_struct *
 	TP_PROTO(struct root_domain *rd, bool overutilized),
 	TP_ARGS(rd, overutilized));
 
+DECLARE_EVENT_CLASS(psi_memstall_template,
+
+	TP_PROTO(int type),
+
+	TP_ARGS(type),
+
+	TP_STRUCT__entry(
+		__field(int, type)
+	),
+
+	TP_fast_assign(
+		__entry->type = type;
+	),
+
+	TP_printk("type=%s",
+		show_psi_memstall_type(__entry->type))
+);
+
+DEFINE_EVENT(psi_memstall_template, psi_memstall_enter,
+	TP_PROTO(int type),
+	TP_ARGS(type)
+);
+
+DEFINE_EVENT(psi_memstall_template, psi_memstall_leave,
+	TP_PROTO(int type),
+	TP_ARGS(type)
+);
+
+
 #endif /* _TRACE_SCHED_H */
 
 /* This part must be outside protection */
diff --git a/kernel/sched/psi.c b/kernel/sched/psi.c
index 460f084..4c5a402 100644
--- a/kernel/sched/psi.c
+++ b/kernel/sched/psi.c
@@ -142,6 +142,8 @@
 #include
 #include "sched.h"
 
+#include
+
 static int psi_bug __read_mostly;
 
 DEFINE_STATIC_KEY_FALSE(psi_disabled);
@@ -822,6 +824,9 @@ void psi_memstall_enter(unsigned long *flags, enum memstall_types type)
 	*flags = current->flags & PF_MEMSTALL;
 	if (*flags)
 		return;
+
+	trace_psi_memstall_enter(type);
+
 	/*
 	 * PF_MEMSTALL setting & accounting needs to be atomic wrt
 	 * changes to the task's scheduling state, otherwise we can
@@ -852,6 +857,9 @@ void psi_memstall_leave(unsigned long *flags, enum memstall_types type)
 
 	if (*flags)
 		return;
+
+	trace_psi_memstall_leave(type);
+
 	/*
 	 * PF_MEMSTALL clearing & accounting needs to be atomic wrt
 	 * changes to the task's scheduling state, otherwise we could