From patchwork Mon Aug 1 00:42:04 2022
X-Patchwork-Submitter: CGEL
X-Patchwork-Id: 12933296
From: cgel.zte@gmail.com
X-Google-Original-From: ran.xiaokai@zte.com.cn
To: hannes@cmpxchg.org, akpm@linux-foundation.org, tj@kernel.org, axboe@kernel.dk, vdavydov.dev@gmail.com
Cc: ran.xiaokai@zte.com.cn, linux-mm@kvack.org, linux-kernel@vger.kernel.org, cgel
Subject: [RFC PATCH 1/2] psi: introduce memory.pressure.stat
Date: Mon, 1 Aug 2022 00:42:04 +0000
Message-Id: <20220801004205.1593100-1-ran.xiaokai@zte.com.cn>
X-Mailer: git-send-email 2.25.1
MIME-Version: 1.0
From: cgel

For now, psi accounts for all memory stalls in the system as a single
number and does not say why a stall happened. This patch introduces a
cgroup knob, memory.pressure.stat, which reports the stall time broken
down by memory event, and a corresponding proc interface with the same
format.
For each cgroup, a memory.pressure.stat file is added which shows:

kswapd: avg10=0.00 avg60=0.00 avg300=0.00 total=0
direct reclaim: avg10=0.00 avg60=0.00 avg300=0.12 total=42356
kcompacted: avg10=0.00 avg60=0.00 avg300=0.00 total=0
direct compact: avg10=0.00 avg60=0.00 avg300=0.00 total=0
cgroup reclaim: avg10=0.00 avg60=0.00 avg300=0.00 total=0
workingset thrashing: avg10=0.00 avg60=0.00 avg300=0.00 total=0

System-wide, a proc file, pressure/memory_stat, is introduced with the
same format as the cgroup interface. With this detailed breakdown, if,
for example, the system is stalling in kcompactd, compaction_proactiveness
can be raised so that proactive compaction kicks in earlier.

Signed-off-by: cgel
---
 include/linux/psi.h       |   7 +--
 include/linux/psi_types.h |  34 +++++++++++++
 kernel/cgroup/cgroup.c    |  11 ++++
 kernel/sched/psi.c        | 126 +++++++++++++++++++++++++++++++++++++++++++---
 4 files changed, 168 insertions(+), 10 deletions(-)

diff --git a/include/linux/psi.h b/include/linux/psi.h
index 7b3de73..163da43 100644
--- a/include/linux/psi.h
+++ b/include/linux/psi.h
@@ -19,10 +19,11 @@ void psi_init(void);
 void psi_task_change(struct task_struct *task, int clear, int set);
 void psi_memstall_tick(struct task_struct *task, int cpu);
-void psi_memstall_enter(unsigned long *flags);
-void psi_memstall_leave(unsigned long *flags);
+void psi_memstall_enter(unsigned long *flags, int mem_state);
+void psi_memstall_leave(unsigned long *flags, int mem_state);
 int psi_show(struct seq_file *s, struct psi_group *group, enum psi_res res);
+int psi_mem_pressure_stat_show(struct seq_file *m, void *v);

 #ifdef CONFIG_CGROUPS
 int psi_cgroup_alloc(struct cgroup *cgrp);
@@ -41,7 +42,7 @@ __poll_t psi_trigger_poll(void **trigger_ptr, struct file *file,
 static inline void psi_init(void) {}
-static inline void psi_memstall_enter(unsigned long *flags) {}
+static inline void psi_memstall_enter(unsigned long *flags, int mem_state) {}
 static inline void psi_memstall_leave(unsigned long
*flags, int mem_state) {}

 #ifdef CONFIG_CGROUPS

diff --git a/include/linux/psi_types.h b/include/linux/psi_types.h
index 07aaf9b..194ea78 100644
--- a/include/linux/psi_types.h
+++ b/include/linux/psi_types.h
@@ -9,6 +9,8 @@

 #ifdef CONFIG_PSI

+#define PSI_MASK(x)	((1UL << (x)) - 1)
+
 /* Tracked task states */
 enum psi_task_count {
 	NR_IOWAIT,
@@ -22,6 +24,10 @@ enum psi_task_count {
 #define TSK_MEMSTALL	(1 << NR_MEMSTALL)
 #define TSK_RUNNING	(1 << NR_RUNNING)

+#define TSK_COUNT_MASK	PSI_MASK(NR_PSI_TASK_COUNTS)
+#define TSK_COUNT_SHIFT	8
+
 /* Resources that workloads could be stalled on */
 enum psi_res {
 	PSI_IO,
@@ -53,6 +59,27 @@ enum psi_aggregators {
 	NR_PSI_AGGREGATORS,
 };

+/* Causes of memory pressure */
+enum psi_memstall_states {
+	PSI_MEM_KSWAPD,
+	PSI_MEM_DRECLAIM,
+	PSI_MEM_KCOMPACTED,
+	PSI_MEM_DCOMPACT,
+	PSI_MEM_CGROUP,
+	PSI_MEM_SWAP,
+	PSI_MEM_WORKINGSET,
+	PSI_MEM_STATES,
+};
+
+#define TSK_MEMSTALL_SHIFT	8
+#define TSK_MEMSTALL_KSWAPD	(1 << (PSI_MEM_KSWAPD + TSK_MEMSTALL_SHIFT))
+#define TSK_MEMSTALL_DRECLAIM	(1 << (PSI_MEM_DRECLAIM + TSK_MEMSTALL_SHIFT))
+#define TSK_MEMSTALL_KCOMPACTED	(1 << (PSI_MEM_KCOMPACTED + TSK_MEMSTALL_SHIFT))
+#define TSK_MEMSTALL_DCOMPACT	(1 << (PSI_MEM_DCOMPACT + TSK_MEMSTALL_SHIFT))
+#define TSK_MEMSTALL_CGROUP	(1 << (PSI_MEM_CGROUP + TSK_MEMSTALL_SHIFT))
+#define TSK_MEMSTALL_SWAP	(1 << (PSI_MEM_SWAP + TSK_MEMSTALL_SHIFT))
+#define TSK_MEMSTALL_WORKINGSET	(1 << (PSI_MEM_WORKINGSET + TSK_MEMSTALL_SHIFT))
+#define TSK_MEMSTALL_MASK	(PSI_MASK(TSK_MEMSTALL_SHIFT) << TSK_COUNT_SHIFT)
+
 struct psi_group_cpu {
 	/* 1st cacheline updated by the scheduler */
@@ -64,9 +91,11 @@ struct psi_group_cpu {
 	/* Aggregate pressure state derived from the tasks */
 	u32 state_mask;
+	u32 state_memstall;

 	/* Period time sampling buckets for each state of interest (ns) */
 	u32 times[NR_PSI_STATES];
+	u32 times_mem[PSI_MEM_STATES];

 	/* Time of last task change in this group (rq_clock) */
 	u64 state_start;
@@ -76,6 +105,7 @@ struct psi_group_cpu {
 	/* Delta detection against the sampling buckets */
 	u32
times_prev[NR_PSI_AGGREGATORS][NR_PSI_STATES] ____cacheline_aligned_in_smp;
+	u32 times_mem_prev[PSI_MEM_STATES];
 };

 /* PSI growth tracking window */
@@ -144,6 +174,10 @@ struct psi_group {
 	u64 total[NR_PSI_AGGREGATORS][NR_PSI_STATES - 1];
 	unsigned long avg[NR_PSI_STATES - 1][3];

+	u64 total_mems[PSI_MEM_STATES - 1];
+	unsigned long avg_mems[PSI_MEM_STATES - 1][3];
+	u64 avg_total_mems[PSI_MEM_STATES - 1];
+
 	/* Monitor work control */
 	atomic_t poll_scheduled;
 	struct kthread_worker __rcu *poll_kworker;

diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index 806fc9d..b50ab92 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -3613,6 +3613,13 @@ static int cgroup_memory_pressure_show(struct seq_file *seq, void *v)
 	return psi_show(seq, psi, PSI_MEM);
 }

+static int cgroup_memory_pressure_stat_show(struct seq_file *seq, void *v)
+{
+	struct cgroup *cgroup = seq_css(seq)->cgroup;
+	struct psi_group *psi = cgroup->id == 1 ? &psi_system : &cgroup->psi;
+
+	return psi_mem_pressure_stat_show(seq, psi);
+}
+
 static int cgroup_cpu_pressure_show(struct seq_file *seq, void *v)
 {
 	struct cgroup *cgroup = seq_css(seq)->cgroup;
@@ -4930,6 +4937,10 @@ static struct cftype cgroup_base_files[] = {
 		.poll = cgroup_pressure_poll,
 		.release = cgroup_pressure_release,
 	},
+	{
+		.name = "memory.pressure.stat",
+		.seq_show = cgroup_memory_pressure_stat_show,
+	},
 	{
 		.name = "cpu.pressure",
 		.seq_show = cgroup_cpu_pressure_show,

diff --git a/kernel/sched/psi.c b/kernel/sched/psi.c
index 9154e74..072d535 100644
--- a/kernel/sched/psi.c
+++ b/kernel/sched/psi.c
@@ -279,6 +279,35 @@ static void get_recent_times(struct psi_group *group, int cpu,
 	}
 }

+static void get_recent_mem_times(struct psi_group *group, int cpu, u32 *times_mem)
+{
+	struct psi_group_cpu *groupc = per_cpu_ptr(group->pcpu, cpu);
+	u64 now, state_start;
+	enum psi_memstall_states s;
+	unsigned int seq;
+	u32 state_mask;
+
+	do {
+		seq = read_seqcount_begin(&groupc->seq);
+		now = cpu_clock(cpu);
+
memcpy(times_mem, groupc->times_mem, sizeof(groupc->times_mem));
+		state_mask = groupc->state_memstall;
+		state_start = groupc->state_start;
+	} while (read_seqcount_retry(&groupc->seq, seq));
+
+	for (s = 0; s < PSI_MEM_STATES; s++) {
+		u32 delta;
+
+		if (state_mask & (1 << (s + TSK_MEMSTALL_SHIFT)))
+			times_mem[s] += now - state_start;
+
+		delta = times_mem[s] - groupc->times_mem_prev[s];
+		groupc->times_mem_prev[s] = times_mem[s];
+
+		times_mem[s] = delta;
+	}
+}
+
 static void calc_avgs(unsigned long avg[3], int missed_periods,
 		      u64 time, u64 period)
 {
@@ -304,6 +333,7 @@ static void collect_percpu_times(struct psi_group *group,
 				 u32 *pchanged_states)
 {
 	u64 deltas[NR_PSI_STATES - 1] = { 0, };
+	u64 delta_mems[PSI_MEM_STATES - 1] = { 0, };
 	unsigned long nonidle_total = 0;
 	u32 changed_states = 0;
 	int cpu;
@@ -319,11 +349,16 @@ static void collect_percpu_times(struct psi_group *group,
 	 */
 	for_each_possible_cpu(cpu) {
 		u32 times[NR_PSI_STATES];
+		u32 times_mem[PSI_MEM_STATES];
+		u32 nonidle;
 		u32 cpu_changed_states;

 		get_recent_times(group, cpu, aggregator, times,
				 &cpu_changed_states);

+		if (times[PSI_MEM_SOME])
+			get_recent_mem_times(group, cpu, times_mem);
+
 		changed_states |= cpu_changed_states;

 		nonidle = nsecs_to_jiffies(times[PSI_NONIDLE]);
+		if (times[PSI_MEM_SOME])
+			for (s = 0; s < PSI_MEM_STATES - 1; s++)
+				delta_mems[s] += (u64)times_mem[s] * nonidle;
@@ -350,6 +385,10 @@ static void collect_percpu_times(struct psi_group *group,
 		group->total[aggregator][s] +=
				div_u64(deltas[s], max(nonidle_total, 1UL));

+	for (s = 0; s < PSI_MEM_STATES - 1; s++)
+		group->total_mems[s] +=
+				div_u64(delta_mems[s], max(nonidle_total, 1UL));
+
 	if (pchanged_states)
 		*pchanged_states = changed_states;
 }
@@ -404,6 +443,16 @@ static u64 update_averages(struct psi_group *group, u64 now)
 		calc_avgs(group->avg[s], missed_periods, sample, period);
 	}

+	for (s = 0; s < PSI_MEM_STATES - 1; s++) {
+		u32 sample;
+
+		sample = group->total_mems[s] - group->avg_total_mems[s];
+		if (sample > period)
+			sample = period;
+		group->avg_total_mems[s] += sample;
+		calc_avgs(group->avg_mems[s], missed_periods, sample, period);
+	}
+
 	return
avg_next_update;
 }

@@ -628,6 +677,7 @@ static void record_times(struct psi_group_cpu *groupc, int cpu,
 {
 	u32 delta;
 	u64 now;
+	int s, state_memstall = groupc->state_memstall;

 	now = cpu_clock(cpu);
 	delta = now - groupc->state_start;
@@ -641,6 +691,7 @@ static void record_times(struct psi_group_cpu *groupc, int cpu,

 	if (groupc->state_mask & (1 << PSI_MEM_SOME)) {
 		groupc->times[PSI_MEM_SOME] += delta;
+		for (s = 0; s < PSI_MEM_STATES; s++)
+			if (state_memstall & (1 << (s + TSK_MEMSTALL_SHIFT)))
+				groupc->times_mem[s] += delta;
 		if (groupc->state_mask & (1 << PSI_MEM_FULL))
 			groupc->times[PSI_MEM_FULL] += delta;
 		else if (memstall_tick) {
@@ -676,7 +727,12 @@ static u32 psi_group_change(struct psi_group *group, int cpu,
 	unsigned int t, m;
 	enum psi_states s;
 	u32 state_mask = 0;
+	u32 state_memstall = 0;

+	if (set & TSK_MEMSTALL) {
+		state_memstall = set & TSK_MEMSTALL_MASK;
+		set &= TSK_COUNT_MASK;
+	}
 	groupc = per_cpu_ptr(group->pcpu, cpu);

 	/*
@@ -714,7 +770,7 @@ static u32 psi_group_change(struct psi_group *group, int cpu,
 			state_mask |= (1 << s);
 	}
 	groupc->state_mask = state_mask;
-
+	groupc->state_memstall = state_memstall;
 	write_seqcount_end(&groupc->seq);

 	return state_mask;
@@ -810,7 +866,7 @@ void psi_memstall_tick(struct task_struct *task, int cpu)
 * Marks the calling task as being stalled due to a lack of memory,
 * such as waiting for a refault or performing reclaim.
 */
-void psi_memstall_enter(unsigned long *flags)
+void psi_memstall_enter(unsigned long *flags, int mem_state)
 {
 	struct rq_flags rf;
 	struct rq *rq;
@@ -829,7 +885,7 @@ void psi_memstall_enter(unsigned long *flags)
 	rq = this_rq_lock_irq(&rf);

 	current->flags |= PF_MEMSTALL;
-	psi_task_change(current, 0, TSK_MEMSTALL);
+	psi_task_change(current, 0, TSK_MEMSTALL | mem_state);

 	rq_unlock_irq(rq, &rf);
 }

@@ -840,7 +896,7 @@ void psi_memstall_enter(unsigned long *flags)
 *
 * Marks the calling task as no longer stalled due to lack of memory.
 */
-void psi_memstall_leave(unsigned long *flags)
+void psi_memstall_leave(unsigned long *flags, int mem_state)
 {
 	struct rq_flags rf;
 	struct rq *rq;
@@ -858,7 +914,7 @@ void psi_memstall_leave(unsigned long *flags)
 	rq = this_rq_lock_irq(&rf);

 	current->flags &= ~PF_MEMSTALL;
-	psi_task_change(current, TSK_MEMSTALL, 0);
+	psi_task_change(current, TSK_MEMSTALL | mem_state, 0);

 	rq_unlock_irq(rq, &rf);
 }

@@ -974,6 +1030,53 @@ int psi_show(struct seq_file *m, struct psi_group *group, enum psi_res res)
 	return 0;
 }

+static const char * const memstall_text[] = {
+	"kswapd",
+	"direct reclaim",
+	"kcompacted",
+	"direct compact",
+	"cgroup reclaim",
+	"swap",
+	"workingset",
+};
+
+int psi_mem_pressure_stat_show(struct seq_file *m, void *v)
+{
+	int s;
+	u64 now;
+	struct psi_group *group = &psi_system;
+
+	if (static_branch_likely(&psi_disabled))
+		return -EOPNOTSUPP;
+
+	mutex_lock(&group->avgs_lock);
+	now = sched_clock();
+	collect_percpu_times(group, PSI_AVGS, NULL);
+	if (now >= group->avg_next_update)
+		group->avg_next_update = update_averages(group, now);
+	mutex_unlock(&group->avgs_lock);
+
+	for (s = 0; s < PSI_MEM_STATES; s++) {
+		unsigned long avg[3];
+		u64 total;
+		int w;
+
+		for (w = 0; w < 3; w++)
+			avg[w] = group->avg_mems[s][w];
+
+		total = div_u64(group->total_mems[s], NSEC_PER_USEC);
+
+		seq_printf(m, "%s avg10=%lu.%02lu avg60=%lu.%02lu avg300=%lu.%02lu total=%llu\n",
+			   memstall_text[s],
+			   LOAD_INT(avg[0]), LOAD_FRAC(avg[0]),
+			   LOAD_INT(avg[1]), LOAD_FRAC(avg[1]),
+			   LOAD_INT(avg[2]), LOAD_FRAC(avg[2]),
+			   total);
+	}
+
+	return 0;
+}
+
 static int psi_io_show(struct seq_file *m, void *v)
 {
 	return psi_show(m, &psi_system, PSI_IO);
@@ -998,7 +1101,10 @@ static int psi_memory_open(struct inode *inode, struct file *file)
 {
 	return single_open(file, psi_memory_show, NULL);
 }
-
+static int psi_memory_stat_open(struct inode *inode, struct file *file)
+{
+	return single_open(file, psi_mem_pressure_stat_show, NULL);
+}
 static int psi_cpu_open(struct inode *inode,
struct file *file)
 {
 	return single_open(file, psi_cpu_show, NULL);
@@ -1271,7 +1377,12 @@ static const struct file_operations psi_memory_fops = {
 	.poll = psi_fop_poll,
 	.release = psi_fop_release,
 };
-
+static const struct file_operations psi_memory_stat_fops = {
+	.open = psi_memory_stat_open,
+	.read = seq_read,
+	.llseek = seq_lseek,
+	.release = psi_fop_release,
+};
 static const struct file_operations psi_cpu_fops = {
 	.open = psi_cpu_open,
 	.read = seq_read,
@@ -1286,6 +1397,7 @@ static int __init psi_proc_init(void)
 	proc_mkdir("pressure", NULL);
 	proc_create("pressure/io", 0, NULL, &psi_io_fops);
 	proc_create("pressure/memory", 0, NULL, &psi_memory_fops);
+	proc_create("pressure/memory_stat", 0, NULL, &psi_memory_stat_fops);
 	proc_create("pressure/cpu", 0, NULL, &psi_cpu_fops);
 	return 0;
 }

From patchwork Mon Aug 1 00:42:05 2022
X-Patchwork-Submitter: CGEL
X-Patchwork-Id: 12933297
From: cgel.zte@gmail.com
X-Google-Original-From: ran.xiaokai@zte.com.cn
To: hannes@cmpxchg.org, akpm@linux-foundation.org, tj@kernel.org, axboe@kernel.dk, vdavydov.dev@gmail.com
Cc: ran.xiaokai@zte.com.cn, linux-mm@kvack.org, linux-kernel@vger.kernel.org, cgel
Subject: [RFC PATCH 2/2] psi: account for memory stall types
Date: Mon, 1 Aug 2022 00:42:05 +0000
Message-Id: <20220801004205.1593100-2-ran.xiaokai@zte.com.cn>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20220801004205.1593100-1-ran.xiaokai@zte.com.cn>
References: <20220801004205.1593100-1-ran.xiaokai@zte.com.cn>
MIME-Version: 1.0
From: cgel

Now that psi can tell the cause of a memory stall, add a second
argument to psi_memstall_enter()/psi_memstall_leave() so that callers
can identify the cause of the stall.

Signed-off-by: cgel
---
 block/blk-cgroup.c        | 4 ++--
 block/blk-core.c          | 4 ++--
 include/linux/psi_types.h | 6 +++---
 mm/compaction.c           | 4 ++--
 mm/filemap.c              | 4 ++--
 mm/memcontrol.c           | 4 ++--
 mm/page_alloc.c           | 8 ++++----
 mm/vmscan.c               | 8 ++++----
 8 files changed, 21 insertions(+), 21 deletions(-)

diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index 3d34ac0..857898f 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -1732,7 +1732,7 @@ static void blkcg_maybe_throttle_blkg(struct blkcg_gq *blkg, bool use_memdelay)
 	delay_nsec = min_t(u64, delay_nsec, 250 * NSEC_PER_MSEC);

 	if (use_memdelay)
-		psi_memstall_enter(&pflags);
+		psi_memstall_enter(&pflags, TSK_MEMSTALL_WORKINGSET);

 	exp = ktime_add_ns(now, delay_nsec);
 	tok = io_schedule_prepare();
@@ -1744,7 +1744,7 @@ static void blkcg_maybe_throttle_blkg(struct blkcg_gq *blkg, bool use_memdelay)
 	io_schedule_finish(tok);

 	if (use_memdelay)
-		psi_memstall_leave(&pflags);
+		psi_memstall_leave(&pflags, TSK_MEMSTALL_WORKINGSET);
 }

 /**
diff --git a/block/blk-core.c b/block/blk-core.c
index d221322..ebbbe49 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1185,12 +1185,12 @@ blk_qc_t submit_bio(struct bio *bio)
 	 * submission can be a significant part of overall IO time.
	 */
 	if (workingset_read)
-		psi_memstall_enter(&pflags);
+		psi_memstall_enter(&pflags, TSK_MEMSTALL_WORKINGSET);

 	ret = generic_make_request(bio);

 	if (workingset_read)
-		psi_memstall_leave(&pflags);
+		psi_memstall_leave(&pflags, TSK_MEMSTALL_WORKINGSET);

 	return ret;
 }
diff --git a/include/linux/psi_types.h b/include/linux/psi_types.h
index 194ea78..8200623 100644
--- a/include/linux/psi_types.h
+++ b/include/linux/psi_types.h
@@ -174,9 +174,9 @@ struct psi_group {
 	u64 total[NR_PSI_AGGREGATORS][NR_PSI_STATES - 1];
 	unsigned long avg[NR_PSI_STATES - 1][3];

-	u64 total_mems[PSI_MEM_STATES - 1];
-	unsigned long avg_mems[PSI_MEM_STATES - 1][3];
-	u64 avg_total_mems[PSI_MEM_STATES - 1];
+	u64 total_mems[PSI_MEM_STATES];
+	unsigned long avg_mems[PSI_MEM_STATES][3];
+	u64 avg_total_mems[PSI_MEM_STATES];

 	/* Monitor work control */
 	atomic_t poll_scheduled;
diff --git a/mm/compaction.c b/mm/compaction.c
index 903aea9..62d1416 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -2664,9 +2664,9 @@ static int kcompactd(void *p)
 		wait_event_freezable(pgdat->kcompactd_wait,
				kcompactd_work_requested(pgdat));

-		psi_memstall_enter(&pflags);
+		psi_memstall_enter(&pflags, TSK_MEMSTALL_KCOMPACTED);
 		kcompactd_do_work(pgdat);
-		psi_memstall_leave(&pflags);
+		psi_memstall_leave(&pflags, TSK_MEMSTALL_KCOMPACTED);
 	}

 	return 0;
diff --git a/mm/filemap.c b/mm/filemap.c
index 3d43769..cf08388 100755
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1151,7 +1151,7 @@ static inline int wait_on_page_bit_common(wait_queue_head_t *q,
 			delayacct_thrashing_start();
 			delayacct = true;
 		}
-		psi_memstall_enter(&pflags);
+		psi_memstall_enter(&pflags, TSK_MEMSTALL_WORKINGSET);
 		thrashing = true;
 	}

@@ -1210,7 +1210,7 @@ static inline int wait_on_page_bit_common(wait_queue_head_t *q,
 	if (thrashing) {
 		if (delayacct)
 			delayacct_thrashing_end();
-		psi_memstall_leave(&pflags);
+		psi_memstall_leave(&pflags, TSK_MEMSTALL_WORKINGSET);
 	}

 	/*
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 94e9a1c..fab06b7 100644
---
 a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2589,9 +2589,9 @@ void mem_cgroup_handle_over_high(void)
 	 * schedule_timeout_killable sets TASK_KILLABLE). This means we don't
 	 * need to account for any ill-begotten jiffies to pay them off later.
 	 */
-	psi_memstall_enter(&pflags);
+	psi_memstall_enter(&pflags, TSK_MEMSTALL_CGROUP);
 	schedule_timeout_killable(penalty_jiffies);
-	psi_memstall_leave(&pflags);
+	psi_memstall_leave(&pflags, TSK_MEMSTALL_CGROUP);

 out:
 	css_put(&memcg->css);
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index ef077b8..52d86c2 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3988,7 +3988,7 @@ __alloc_pages_direct_compact(gfp_t gfp_mask, unsigned int order,
 	if (!order)
 		return NULL;

-	psi_memstall_enter(&pflags);
+	psi_memstall_enter(&pflags, TSK_MEMSTALL_DCOMPACT);
 	delayacct_compact_start();
 	noreclaim_flag = memalloc_noreclaim_save();
@@ -3996,7 +3996,7 @@ __alloc_pages_direct_compact(gfp_t gfp_mask, unsigned int order,
								prio, &page);
 	memalloc_noreclaim_restore(noreclaim_flag);
-	psi_memstall_leave(&pflags);
+	psi_memstall_leave(&pflags, TSK_MEMSTALL_DCOMPACT);
 	delayacct_compact_end();

 	/*
@@ -4212,7 +4212,7 @@ __perform_reclaim(gfp_t gfp_mask, unsigned int order,
 	/* We now go into synchronous reclaim */
 	cpuset_memory_pressure_bump();
-	psi_memstall_enter(&pflags);
+	psi_memstall_enter(&pflags, TSK_MEMSTALL_DRECLAIM);
 	fs_reclaim_acquire(gfp_mask);
 	noreclaim_flag = memalloc_noreclaim_save();
@@ -4221,7 +4221,7 @@ __perform_reclaim(gfp_t gfp_mask, unsigned int order,
 	memalloc_noreclaim_restore(noreclaim_flag);
 	fs_reclaim_release(gfp_mask);
-	psi_memstall_leave(&pflags);
+	psi_memstall_leave(&pflags, TSK_MEMSTALL_DRECLAIM);

 	cond_resched();
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 075da44..c2038b4 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -3416,13 +3416,13 @@ unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *memcg,

 	trace_mm_vmscan_memcg_reclaim_begin(0, sc.gfp_mask);

-	psi_memstall_enter(&pflags);
+	psi_memstall_enter(&pflags,
TSK_MEMSTALL_CGROUP);
 	noreclaim_flag = memalloc_noreclaim_save();

 	nr_reclaimed = do_try_to_free_pages(zonelist, &sc);

 	memalloc_noreclaim_restore(noreclaim_flag);
-	psi_memstall_leave(&pflags);
+	psi_memstall_leave(&pflags, TSK_MEMSTALL_CGROUP);

 	trace_mm_vmscan_memcg_reclaim_end(nr_reclaimed);
 	set_task_reclaim_state(current, NULL);
@@ -3794,7 +3794,7 @@ static int balance_pgdat(pg_data_t *pgdat, int order, int classzone_idx)
 	};

 	set_task_reclaim_state(current, &sc.reclaim_state);
-	psi_memstall_enter(&pflags);
+	psi_memstall_enter(&pflags, TSK_MEMSTALL_KSWAPD);
 	__fs_reclaim_acquire();

 	count_vm_event(PAGEOUTRUN);
@@ -3973,7 +3973,7 @@ static int balance_pgdat(pg_data_t *pgdat, int order, int classzone_idx)
 	snapshot_refaults(NULL, pgdat);
 	__fs_reclaim_release();
-	psi_memstall_leave(&pflags);
+	psi_memstall_leave(&pflags, TSK_MEMSTALL_KSWAPD);
 	set_task_reclaim_state(current, NULL);

 	/*