From patchwork Tue Mar 31 10:04:36 2020
X-Patchwork-Submitter: Yafang Shao
X-Patchwork-Id: 11467189
From: Yafang Shao
To: hannes@cmpxchg.org, peterz@infradead.org, akpm@linux-foundation.org
Cc: linux-mm@kvack.org, linux-block@vger.kernel.org,
 linux-kernel@vger.kernel.org, Yafang Shao
Subject: [PATCH v2 1/2] psi: introduce various types of memstall
Date: Tue, 31 Mar 2020 06:04:36 -0400
Message-Id: <1585649077-10896-2-git-send-email-laoar.shao@gmail.com>
In-Reply-To: <1585649077-10896-1-git-send-email-laoar.shao@gmail.com>
References: <1585649077-10896-1-git-send-email-laoar.shao@gmail.com>

Memstall is currently used as a memory pressure index. But there are
many paths into memstall, so once a memstall happens we don't know the
specific reason for it.

This patch introduces various types of memstall as below,

	MEMSTALL_KSWAPD
	MEMSTALL_RECLAIM_DIRECT
	MEMSTALL_RECLAIM_MEMCG
	MEMSTALL_RECLAIM_HIGH
	MEMSTALL_KCOMPACTD
	MEMSTALL_COMPACT
	MEMSTALL_WORKINGSET_REFAULT
	MEMSTALL_WORKINGSET_THRASH
	MEMSTALL_MEMDELAY
	MEMSTALL_SWAPIO

and adds a new parameter 'type' to psi_memstall_{enter, leave}.
After that, we can trace specific types of memstall with other powerful
tools such as tracepoints, kprobes, eBPF, etc. It can also help us to
analyze latency spikes caused by memory pressure.

But note that we can't use it to build memory pressure for a specific
type of memstall, e.g. memcg pressure, compaction pressure, etc.,
because it doesn't implement per-type variants of task->in_memstall,
e.g. task->in_memcgstall, task->in_compactionstall, etc.

IOW, the main goal is to trace the distribution of latencies and the
specific reason for each of them.

Although there are already some tracepoints that can help us to achieve
this goal, e.g.

	vmscan:mm_vmscan_kswapd_{wake, sleep}
	vmscan:mm_vmscan_direct_reclaim_{begin, end}
	vmscan:mm_vmscan_memcg_reclaim_{begin, end}
	/* no tracepoint for memcg high reclaim */
	compaction:mm_compaction_kcompactd_{wake, sleep}
	compaction:mm_compaction_{begin, end}
	/* no tracepoint for workingset refault */
	/* no tracepoint for workingset thrashing */
	/* no tracepoint for use memdelay */
	/* no tracepoint for swapio */

psi_memstall_{enter, leave} gives us a unified entry point for all
types of memstall, and we don't need to add the many begin and end
tracepoints that haven't been implemented yet.

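
A tool that hooks psi_memstall_{enter, leave} with a kprobe sees 'type'
only as a plain integer argument, so it has to map the number back to a
name itself. The following is a minimal userspace sketch of such a
mapping, assuming the enum order introduced by this patch. The
MEMSTALL_NR_TYPES sentinel, the memstall_type_names[] table and the
memstall_type_name() helper are illustrative names and are not part of
the patch.

```c
/* Same order as enum memstall_types in this patch. */
enum memstall_types {
	MEMSTALL_KSWAPD,
	MEMSTALL_RECLAIM_DIRECT,
	MEMSTALL_RECLAIM_MEMCG,
	MEMSTALL_RECLAIM_HIGH,
	MEMSTALL_KCOMPACTD,
	MEMSTALL_COMPACT,
	MEMSTALL_WORKINGSET_REFAULT,
	MEMSTALL_WORKINGSET_THRASH,
	MEMSTALL_MEMDELAY,
	MEMSTALL_SWAPIO,
	MEMSTALL_NR_TYPES,	/* hypothetical sentinel, not in the patch */
};

/* Illustrative lookup table, indexed by the numeric type. */
static const char *const memstall_type_names[MEMSTALL_NR_TYPES] = {
	[MEMSTALL_KSWAPD]		= "MEMSTALL_KSWAPD",
	[MEMSTALL_RECLAIM_DIRECT]	= "MEMSTALL_RECLAIM_DIRECT",
	[MEMSTALL_RECLAIM_MEMCG]	= "MEMSTALL_RECLAIM_MEMCG",
	[MEMSTALL_RECLAIM_HIGH]		= "MEMSTALL_RECLAIM_HIGH",
	[MEMSTALL_KCOMPACTD]		= "MEMSTALL_KCOMPACTD",
	[MEMSTALL_COMPACT]		= "MEMSTALL_COMPACT",
	[MEMSTALL_WORKINGSET_REFAULT]	= "MEMSTALL_WORKINGSET_REFAULT",
	[MEMSTALL_WORKINGSET_THRASH]	= "MEMSTALL_WORKINGSET_THRASH",
	[MEMSTALL_MEMDELAY]		= "MEMSTALL_MEMDELAY",
	[MEMSTALL_SWAPIO]		= "MEMSTALL_SWAPIO",
};

/*
 * Hypothetical helper: translate a numeric memstall type, as seen by a
 * kprobe or printed by a tracing tool, into its symbolic name.
 */
const char *memstall_type_name(int type)
{
	if (type < 0 || type >= MEMSTALL_NR_TYPES)
		return "MEMSTALL_UNKNOWN";
	return memstall_type_names[type];
}
```

With this table, memstall_type_name(7) yields the name of the eighth
enumerator, MEMSTALL_WORKINGSET_THRASH. The table has to be kept in sync
by hand if the enum order ever changes.
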
Signed-off-by: Yafang Shao
---
 block/blk-cgroup.c        |  4 ++--
 block/blk-core.c          |  4 ++--
 include/linux/psi.h       | 15 +++++++++++----
 include/linux/psi_types.h | 13 +++++++++++++
 kernel/sched/psi.c        |  6 ++++--
 mm/compaction.c           |  4 ++--
 mm/filemap.c              |  4 ++--
 mm/memcontrol.c           |  4 ++--
 mm/page_alloc.c           |  8 ++++----
 mm/page_io.c              |  4 ++--
 mm/vmscan.c               |  8 ++++----
 11 files changed, 48 insertions(+), 26 deletions(-)

diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index a229b94d5390..fc24095c13c0 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -1593,7 +1593,7 @@ static void blkcg_maybe_throttle_blkg(struct blkcg_gq *blkg, bool use_memdelay)
 	delay_nsec = min_t(u64, delay_nsec, 250 * NSEC_PER_MSEC);
 
 	if (use_memdelay)
-		psi_memstall_enter(&pflags);
+		psi_memstall_enter(&pflags, MEMSTALL_MEMDELAY);
 
 	exp = ktime_add_ns(now, delay_nsec);
 	tok = io_schedule_prepare();
@@ -1605,7 +1605,7 @@ static void blkcg_maybe_throttle_blkg(struct blkcg_gq *blkg, bool use_memdelay)
 	io_schedule_finish(tok);
 
 	if (use_memdelay)
-		psi_memstall_leave(&pflags);
+		psi_memstall_leave(&pflags, MEMSTALL_MEMDELAY);
 }
 
 /**
diff --git a/block/blk-core.c b/block/blk-core.c
index 60dc9552ef8d..e2039cf4719a 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1190,12 +1190,12 @@ blk_qc_t submit_bio(struct bio *bio)
 	 * submission can be a significant part of overall IO time.
 	 */
 	if (workingset_read)
-		psi_memstall_enter(&pflags);
+		psi_memstall_enter(&pflags, MEMSTALL_WORKINGSET_REFAULT);
 
 	ret = generic_make_request(bio);
 
 	if (workingset_read)
-		psi_memstall_leave(&pflags);
+		psi_memstall_leave(&pflags, MEMSTALL_WORKINGSET_REFAULT);
 
 	return ret;
 }
diff --git a/include/linux/psi.h b/include/linux/psi.h
index 7b3de7321219..7bf94f6fb5e8 100644
--- a/include/linux/psi.h
+++ b/include/linux/psi.h
@@ -19,8 +19,8 @@ void psi_init(void);
 void psi_task_change(struct task_struct *task, int clear, int set);
 
 void psi_memstall_tick(struct task_struct *task, int cpu);
-void psi_memstall_enter(unsigned long *flags);
-void psi_memstall_leave(unsigned long *flags);
+void psi_memstall_enter(unsigned long *flags, enum memstall_types type);
+void psi_memstall_leave(unsigned long *flags, enum memstall_types type);
 
 int psi_show(struct seq_file *s, struct psi_group *group, enum psi_res res);
 
@@ -41,8 +41,15 @@ __poll_t psi_trigger_poll(void **trigger_ptr, struct file *file,
 
 static inline void psi_init(void) {}
 
-static inline void psi_memstall_enter(unsigned long *flags) {}
-static inline void psi_memstall_leave(unsigned long *flags) {}
+static inline void psi_memstall_enter(unsigned long *flags,
+				      enum memstall_types type)
+{
+}
+
+static inline void psi_memstall_leave(unsigned long *flags,
+				      enum memstall_types type)
+{
+}
 
 #ifdef CONFIG_CGROUPS
 static inline int psi_cgroup_alloc(struct cgroup *cgrp)
diff --git a/include/linux/psi_types.h b/include/linux/psi_types.h
index 07aaf9b82241..48ebb51484f9 100644
--- a/include/linux/psi_types.h
+++ b/include/linux/psi_types.h
@@ -7,6 +7,19 @@
 #include
 #include
 
+enum memstall_types {
+	MEMSTALL_KSWAPD,
+	MEMSTALL_RECLAIM_DIRECT,
+	MEMSTALL_RECLAIM_MEMCG,
+	MEMSTALL_RECLAIM_HIGH,
+	MEMSTALL_KCOMPACTD,
+	MEMSTALL_COMPACT,
+	MEMSTALL_WORKINGSET_REFAULT,
+	MEMSTALL_WORKINGSET_THRASH,
+	MEMSTALL_MEMDELAY,
+	MEMSTALL_SWAPIO,
+};
+
 #ifdef CONFIG_PSI
 
 /* Tracked task states */
diff --git a/kernel/sched/psi.c b/kernel/sched/psi.c
index 028520702717..460f08436b58 100644
--- a/kernel/sched/psi.c
+++ b/kernel/sched/psi.c
@@ -806,11 +806,12 @@ void psi_memstall_tick(struct task_struct *task, int cpu)
 /**
  * psi_memstall_enter - mark the beginning of a memory stall section
  * @flags: flags to handle nested sections
+ * @type: type of memstall
  *
  * Marks the calling task as being stalled due to a lack of memory,
  * such as waiting for a refault or performing reclaim.
  */
-void psi_memstall_enter(unsigned long *flags)
+void psi_memstall_enter(unsigned long *flags, enum memstall_types type)
 {
 	struct rq_flags rf;
 	struct rq *rq;
@@ -837,10 +838,11 @@ void psi_memstall_enter(unsigned long *flags)
 /**
  * psi_memstall_leave - mark the end of an memory stall section
  * @flags: flags to handle nested memdelay sections
+ * @type: type of memstall
  *
  * Marks the calling task as no longer stalled due to lack of memory.
  */
-void psi_memstall_leave(unsigned long *flags)
+void psi_memstall_leave(unsigned long *flags, enum memstall_types type)
 {
 	struct rq_flags rf;
 	struct rq *rq;
diff --git a/mm/compaction.c b/mm/compaction.c
index 672d3c78c6ab..c0d533192974 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -2647,9 +2647,9 @@ static int kcompactd(void *p)
 		wait_event_freezable(pgdat->kcompactd_wait,
 				kcompactd_work_requested(pgdat));
 
-		psi_memstall_enter(&pflags);
+		psi_memstall_enter(&pflags, MEMSTALL_KCOMPACTD);
 		kcompactd_do_work(pgdat);
-		psi_memstall_leave(&pflags);
+		psi_memstall_leave(&pflags, MEMSTALL_KCOMPACTD);
 	}
 
 	return 0;
diff --git a/mm/filemap.c b/mm/filemap.c
index 1784478270e1..f5459e3850ef 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1123,7 +1123,7 @@ static inline int wait_on_page_bit_common(wait_queue_head_t *q,
 			delayacct_thrashing_start();
 			delayacct = true;
 		}
-		psi_memstall_enter(&pflags);
+		psi_memstall_enter(&pflags, MEMSTALL_WORKINGSET_THRASH);
 		thrashing = true;
 	}
 
@@ -1182,7 +1182,7 @@ static inline int wait_on_page_bit_common(wait_queue_head_t *q,
 	if (thrashing) {
 		if (delayacct)
 			delayacct_thrashing_end();
-		psi_memstall_leave(&pflags);
+		psi_memstall_leave(&pflags, MEMSTALL_WORKINGSET_THRASH);
 	}
 
 	/*
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 7a4bd8b9adc2..a9b336ea7fe5 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2399,9 +2399,9 @@ void mem_cgroup_handle_over_high(void)
 	 * schedule_timeout_killable sets TASK_KILLABLE). This means we don't
 	 * need to account for any ill-begotten jiffies to pay them off later.
 	 */
-	psi_memstall_enter(&pflags);
+	psi_memstall_enter(&pflags, MEMSTALL_RECLAIM_HIGH);
 	schedule_timeout_killable(penalty_jiffies);
-	psi_memstall_leave(&pflags);
+	psi_memstall_leave(&pflags, MEMSTALL_RECLAIM_HIGH);
 
 out:
 	css_put(&memcg->css);
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 3c4eb750a199..8789234a2fca 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3884,14 +3884,14 @@ __alloc_pages_direct_compact(gfp_t gfp_mask, unsigned int order,
 	if (!order)
 		return NULL;
 
-	psi_memstall_enter(&pflags);
+	psi_memstall_enter(&pflags, MEMSTALL_COMPACT);
 	noreclaim_flag = memalloc_noreclaim_save();
 
 	*compact_result = try_to_compact_pages(gfp_mask, order, alloc_flags, ac,
 								prio, &page);
 
 	memalloc_noreclaim_restore(noreclaim_flag);
-	psi_memstall_leave(&pflags);
+	psi_memstall_leave(&pflags, MEMSTALL_COMPACT);
 
 	/*
 	 * At least in one zone compaction wasn't deferred or skipped, so let's
@@ -4106,7 +4106,7 @@ __perform_reclaim(gfp_t gfp_mask, unsigned int order,
 
 	/* We now go into synchronous reclaim */
 	cpuset_memory_pressure_bump();
-	psi_memstall_enter(&pflags);
+	psi_memstall_enter(&pflags, MEMSTALL_RECLAIM_DIRECT);
 	fs_reclaim_acquire(gfp_mask);
 	noreclaim_flag = memalloc_noreclaim_save();
 
@@ -4115,7 +4115,7 @@ __perform_reclaim(gfp_t gfp_mask, unsigned int order,
 
 	memalloc_noreclaim_restore(noreclaim_flag);
 	fs_reclaim_release(gfp_mask);
-	psi_memstall_leave(&pflags);
+	psi_memstall_leave(&pflags, MEMSTALL_RECLAIM_DIRECT);
 
 	cond_resched();
 
diff --git a/mm/page_io.c b/mm/page_io.c
index 76965be1d40e..67de6b1801a4 100644
--- a/mm/page_io.c
+++ b/mm/page_io.c
@@ -369,7 +369,7 @@ int swap_readpage(struct page *page, bool synchronous)
 	 * or the submitting cgroup IO-throttled, submission can be a
 	 * significant part of overall IO time.
 	 */
-	psi_memstall_enter(&pflags);
+	psi_memstall_enter(&pflags, MEMSTALL_SWAPIO);
 
 	if (frontswap_load(page) == 0) {
 		SetPageUptodate(page);
@@ -431,7 +431,7 @@ int swap_readpage(struct page *page, bool synchronous)
 	bio_put(bio);
 
 out:
-	psi_memstall_leave(&pflags);
+	psi_memstall_leave(&pflags, MEMSTALL_SWAPIO);
 	return ret;
 }
 
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 876370565455..4445c1dd9551 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -3352,13 +3352,13 @@ unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *memcg,
 
 	trace_mm_vmscan_memcg_reclaim_begin(0, sc.gfp_mask);
 
-	psi_memstall_enter(&pflags);
+	psi_memstall_enter(&pflags, MEMSTALL_RECLAIM_MEMCG);
 	noreclaim_flag = memalloc_noreclaim_save();
 
 	nr_reclaimed = do_try_to_free_pages(zonelist, &sc);
 
 	memalloc_noreclaim_restore(noreclaim_flag);
-	psi_memstall_leave(&pflags);
+	psi_memstall_leave(&pflags, MEMSTALL_RECLAIM_MEMCG);
 
 	trace_mm_vmscan_memcg_reclaim_end(nr_reclaimed);
 	set_task_reclaim_state(current, NULL);
@@ -3568,7 +3568,7 @@ static int balance_pgdat(pg_data_t *pgdat, int order, int classzone_idx)
 	};
 
 	set_task_reclaim_state(current, &sc.reclaim_state);
-	psi_memstall_enter(&pflags);
+	psi_memstall_enter(&pflags, MEMSTALL_KSWAPD);
 	__fs_reclaim_acquire();
 
 	count_vm_event(PAGEOUTRUN);
@@ -3747,7 +3747,7 @@ static int balance_pgdat(pg_data_t *pgdat, int order, int classzone_idx)
 
 	snapshot_refaults(NULL, pgdat);
 	__fs_reclaim_release();
-	psi_memstall_leave(&pflags);
+	psi_memstall_leave(&pflags, MEMSTALL_KSWAPD);
 	set_task_reclaim_state(current, NULL);
 
 	/*

From patchwork Tue Mar 31 10:04:37 2020
X-Patchwork-Submitter: Yafang Shao
X-Patchwork-Id: 11467191
From: Yafang Shao
To: hannes@cmpxchg.org, peterz@infradead.org, akpm@linux-foundation.org
Cc: linux-mm@kvack.org, linux-block@vger.kernel.org,
 linux-kernel@vger.kernel.org, Yafang Shao
Subject: [PATCH v2 2/2] psi, tracepoint: introduce tracepoints for
 psi_memstall_{enter, leave}
Date: Tue, 31 Mar 2020 06:04:37 -0400
Message-Id: <1585649077-10896-3-git-send-email-laoar.shao@gmail.com>
In-Reply-To: <1585649077-10896-1-git-send-email-laoar.shao@gmail.com>
References: <1585649077-10896-1-git-send-email-laoar.shao@gmail.com>

With the new parameter introduced in psi_memstall_{enter, leave} we can
get the specific type of memstall. To make it easier to use, we'd better
introduce tracepoints for them. Once these two tracepoints are added we
can easily use other tools such as eBPF or a bash script to collect the
memstall data and analyze it.

The output of these tracepoints is,

         usemem-30288 [012] .... 302479.734290: psi_memstall_enter: type=MEMSTALL_RECLAIM_DIRECT
         usemem-30288 [012] .N.. 302479.741186: psi_memstall_leave: type=MEMSTALL_RECLAIM_DIRECT
         usemem-30288 [021] .... 302479.742075: psi_memstall_enter: type=MEMSTALL_COMPACT
         usemem-30288 [021] .... 302479.744869: psi_memstall_leave: type=MEMSTALL_COMPACT
            <...>-388   [000] .... 302514.609040: psi_memstall_enter: type=MEMSTALL_KSWAPD
          kswapd0-388   [000] .... 302514.616376: psi_memstall_leave: type=MEMSTALL_KSWAPD
            <...>-223   [024] .... 302514.616380: psi_memstall_enter: type=MEMSTALL_KCOMPACTD
       kcompactd0-223   [024] .... 302514.618414: psi_memstall_leave: type=MEMSTALL_KCOMPACTD
    supervisorctl-31675 [014] .... 302516.281293: psi_memstall_enter: type=MEMSTALL_WORKINGSET_REFAULT
    supervisorctl-31675 [014] .N.. 302516.281314: psi_memstall_leave: type=MEMSTALL_WORKINGSET_REFAULT
             bash-32092 [034] .... 302526.225639: psi_memstall_enter: type=MEMSTALL_WORKINGSET_THRASH
             bash-32092 [034] .... 302526.225843: psi_memstall_leave: type=MEMSTALL_WORKINGSET_THRASH

Here's one example with bpftrace to measure an application's latency
with these tracepoints.

	tracepoint:sched:psi_memstall_enter
	{
		@start[tid, args->type] = nsecs
	}

	tracepoint:sched:psi_memstall_leave
	{
		@time[comm, args->type] = hist(nsecs - @start[tid, args->type]);
		delete(@start[tid, args->type]);
	}

Below is part of the result after producing some memory pressure.

@time[objdump, 7]:
[256K, 512K)           1 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[512K, 1M)             0 |                                                    |
[1M, 2M)               0 |                                                    |
[2M, 4M)               0 |                                                    |
[4M, 8M)               1 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|

@time[objdump, 6]:
[8K, 16K)              2 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|

@time[objcopy, 7]:
[16K, 32K)             1 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[32K, 64K)             0 |                                                    |
[64K, 128K)            0 |                                                    |
[128K, 256K)           0 |                                                    |
[256K, 512K)           1 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|

@time[ld, 7]:
[4M, 8M)               1 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[8M, 16M)              1 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|

@time[khugepaged, 5]:
[4K, 8K)               1 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[8K, 16K)              0 |                                                    |
[16K, 32K)             0 |                                                    |
[32K, 64K)             0 |                                                    |
[64K, 128K)            0 |                                                    |
[128K, 256K)           0 |                                                    |
[256K, 512K)           0 |                                                    |
[512K, 1M)             0 |                                                    |
[1M, 2M)               0 |                                                    |
[2M, 4M)               0 |                                                    |
[4M, 8M)               0 |                                                    |
[8M, 16M)              0 |                                                    |
[16M, 32M)             0 |                                                    |
[32M, 64M)             1 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|

@time[kswapd0, 0]:
[16K, 32K)             1 |@@@@@                                               |
[32K, 64K)             0 |                                                    |
[64K, 128K)            0 |                                                    |
[128K, 256K)           0 |                                                    |
[256K, 512K)           0 |                                                    |
[512K, 1M)             0 |                                                    |
[1M, 2M)               0 |                                                    |
[2M, 4M)               0 |                                                    |
[4M, 8M)               0 |                                                    |
[8M, 16M)              1 |@@@@@                                               |
[16M, 32M)            10 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[32M, 64M)             9 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@      |
[64M, 128M)            2 |@@@@@@@@@@                                          |
[128M, 256M)           2 |@@@@@@@@@@                                          |
[256M, 512M)           3 |@@@@@@@@@@@@@@@                                     |
[512M, 1G)             1 |@@@@@                                               |

@time[kswapd1, 0]:
[1M, 2M)               1 |@@@@                                                |
[2M, 4M)               2 |@@@@@@@@                                            |
[4M, 8M)               0 |                                                    |
[8M, 16M)             12 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[16M, 32M)             7 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@                      |
[32M, 64M)             5 |@@@@@@@@@@@@@@@@@@@@@                               |
[64M, 128M)            5 |@@@@@@@@@@@@@@@@@@@@@                               |
[128M, 256M)           3 |@@@@@@@@@@@@@                                       |
[256M, 512M)           1 |@@@@                                                |

@time[khugepaged, 1]:
[2M, 4M)               1 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|

From this trace data, we can see that the high latencies of user tasks
are always memstall type 7, which is MEMSTALL_WORKINGSET_THRASH,
and then we should look into the details of the workingset of those
user tasks and think about how to improve it, for example by reducing
the workingset.

With the bpftrace builtin variable 'cgroup' we can also filter on a
memcg and its descendants.

Signed-off-by: Yafang Shao
---
 include/trace/events/sched.h | 41 ++++++++++++++++++++++++++++++++++++
 kernel/sched/psi.c           |  8 +++++++
 2 files changed, 49 insertions(+)

diff --git a/include/trace/events/sched.h b/include/trace/events/sched.h
index 420e80e56e55..8ea2cdf78810 100644
--- a/include/trace/events/sched.h
+++ b/include/trace/events/sched.h
@@ -7,8 +7,20 @@
 
 #include
 #include
+#include
 #include
 
+#define show_psi_memstall_type(type) __print_symbolic(type, \
+	{MEMSTALL_KSWAPD, "MEMSTALL_KSWAPD"}, \
+	{MEMSTALL_RECLAIM_DIRECT, "MEMSTALL_RECLAIM_DIRECT"}, \
+	{MEMSTALL_RECLAIM_MEMCG, "MEMSTALL_RECLAIM_MEMCG"}, \
+	{MEMSTALL_RECLAIM_HIGH, "MEMSTALL_RECLAIM_HIGH"}, \
+	{MEMSTALL_KCOMPACTD, "MEMSTALL_KCOMPACTD"}, \
+	{MEMSTALL_COMPACT, "MEMSTALL_COMPACT"}, \
+	{MEMSTALL_WORKINGSET_REFAULT, "MEMSTALL_WORKINGSET_REFAULT"}, \
+	{MEMSTALL_WORKINGSET_THRASH, "MEMSTALL_WORKINGSET_THRASH"}, \
+	{MEMSTALL_MEMDELAY, "MEMSTALL_MEMDELAY"}, \
+	{MEMSTALL_SWAPIO, "MEMSTALL_SWAPIO"})
 /*
  * Tracepoint for calling kthread_stop, performed to end a kthread:
  */
@@ -625,6 +637,35 @@ DECLARE_TRACE(sched_overutilized_tp,
 	TP_PROTO(struct root_domain *rd, bool overutilized),
 	TP_ARGS(rd, overutilized));
 
+DECLARE_EVENT_CLASS(psi_memstall_template,
+
+	TP_PROTO(int type),
+
+	TP_ARGS(type),
+
+	TP_STRUCT__entry(
+		__field(int, type)
+	),
+
+	TP_fast_assign(
+		__entry->type = type;
+	),
+
+	TP_printk("type=%s",
+		show_psi_memstall_type(__entry->type))
+);
+
+DEFINE_EVENT(psi_memstall_template, psi_memstall_enter,
+	TP_PROTO(int type),
+	TP_ARGS(type)
+);
+
+DEFINE_EVENT(psi_memstall_template, psi_memstall_leave,
+	TP_PROTO(int type),
+	TP_ARGS(type)
+);
+
+
 #endif /* _TRACE_SCHED_H */
 
 /* This part must be outside protection */
diff --git a/kernel/sched/psi.c b/kernel/sched/psi.c
index 460f08436b58..4c5a40222e88 100644
--- a/kernel/sched/psi.c
+++ b/kernel/sched/psi.c
@@ -142,6 +142,8 @@
 #include
 #include "sched.h"
 
+#include
+
 static int psi_bug __read_mostly;
 
 DEFINE_STATIC_KEY_FALSE(psi_disabled);
@@ -822,6 +824,9 @@ void psi_memstall_enter(unsigned long *flags, enum memstall_types type)
 	*flags = current->flags & PF_MEMSTALL;
 	if (*flags)
 		return;
+
+	trace_psi_memstall_enter(type);
+
 	/*
 	 * PF_MEMSTALL setting & accounting needs to be atomic wrt
 	 * changes to the task's scheduling state, otherwise we can
@@ -852,6 +857,9 @@ void psi_memstall_leave(unsigned long *flags, enum memstall_types type)
 
 	if (*flags)
 		return;
+
+	trace_psi_memstall_leave(type);
+
 	/*
 	 * PF_MEMSTALL clearing & accounting needs to be atomic wrt
 	 * changes to the task's scheduling state, otherwise we could
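
One consequence of placing the trace calls after the 'if (*flags)
return;' checks is that only the outermost section of a nested memstall
emits events, so the reported type is that of the outer section. The
following is a simplified userspace sketch of that behaviour, assuming a
single task and no locking or accounting. SIM_PF_MEMSTALL, the sim_*
variables and the sim_memstall_{enter, leave} functions are illustrative
stand-ins, not kernel code.

```c
#define SIM_PF_MEMSTALL 0x1UL	/* illustrative bit, not the kernel's value */

static unsigned long sim_task_flags;	/* stands in for current->flags */
static int sim_enter_events;		/* counts would-be enter tracepoints */
static int sim_leave_events;		/* counts would-be leave tracepoints */
static int sim_last_type = -1;		/* type of the last emitted enter event */

void sim_memstall_enter(unsigned long *flags, int type)
{
	/* Remember whether we were already inside a memstall section. */
	*flags = sim_task_flags & SIM_PF_MEMSTALL;
	if (*flags)
		return;		/* nested section: no event is emitted */

	/* Stands in for trace_psi_memstall_enter(type). */
	sim_enter_events++;
	sim_last_type = type;

	sim_task_flags |= SIM_PF_MEMSTALL;
}

void sim_memstall_leave(unsigned long *flags, int type)
{
	if (*flags)
		return;		/* nested section: the outer one is still open */

	/* Stands in for trace_psi_memstall_leave(type). */
	(void)type;
	sim_leave_events++;

	sim_task_flags &= ~SIM_PF_MEMSTALL;
}
```

Entering type 1, then type 9, and leaving both in reverse order produces
one enter event and one leave event in this model, both attributed to
type 1. A histogram keyed on the type therefore charges the whole
interval to the outer section.
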