From: Zhaoyang Huang
To: Andrew Morton,
    Vlastimil Babka, Pavel Tatashin, Joonsoo Kim, David Rientjes, Zhaoyang Huang, Roman Gushchin, Jeff Layton, Matthew Wilcox, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: [RFC PATCH] mm/workingset : judge file page activity via timestamp
Date: Wed, 17 Apr 2019 15:47:26 +0800
Message-Id: <1555487246-15764-1-git-send-email-huangzhaoyang@gmail.com>

This patch introduces a timestamp into the workingset shadow entry and judges whether a refaulting page is active or inactive via active_file/refault_ratio instead of the refault distance.

The idea came from the logs we collected with the trace_printk() calls in this patch. About 1/5 of file page refaults fall under scenario [1], in which the page is counted as inactive because its refault distance between accesses is long. However, the timing information shows that these pages refault quickly compared with the average refault time, which is calculated as active_file / refault_ratio. We want to protect such pages from the early eviction they currently suffer.

The refault ratio reflects the LRU's average file access frequency and can also be treated as a prediction of future accesses.

The patch was tested on an Android system and reduced page faults by 30%, while 60% of the pages kept the same status that the original (refault_distance < active_file) check would give them. The page states collected via ftrace during the test are listed in [2].
[1]
system_server workingset_refault: WKST_ACT[0]:rft_dis 265976, act_file 34268 rft_ratio 3047 rft_time 0 avg_rft_time 11 refault 295592 eviction 29616 secs 97 pre_secs 97
HwBinder:922 workingset_refault: WKST_ACT[0]:rft_dis 264478, act_file 35037 rft_ratio 3070 rft_time 2 avg_rft_time 11 refault 310078 eviction 45600 secs 101 pre_secs 99

[2]
WKST_ACT[0]: original--INACTIVE commit--ACTIVE
WKST_ACT[1]: original--ACTIVE commit--ACTIVE
WKST_INACT[0]: original--INACTIVE commit--INACTIVE
WKST_INACT[1]: original--ACTIVE commit--INACTIVE

Signed-off-by: Zhaoyang Huang
---
 include/linux/mmzone.h |   1 +
 mm/workingset.c        | 120 +++++++++++++++++++++++++++++++++++++++++++++----
 2 files changed, 112 insertions(+), 9 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 32699b2..6f30673 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -240,6 +240,7 @@ struct lruvec {
 	atomic_long_t			inactive_age;
 	/* Refaults at the time of last reclaim cycle */
 	unsigned long			refaults;
+	atomic_long_t			refaults_ratio;
 #ifdef CONFIG_MEMCG
 	struct pglist_data *pgdat;
 #endif
diff --git a/mm/workingset.c b/mm/workingset.c
index 40ee02c..66c177b 100644
--- a/mm/workingset.c
+++ b/mm/workingset.c
@@ -160,6 +160,21 @@
 			 MEM_CGROUP_ID_SHIFT)
 #define EVICTION_MASK	(~0UL >> EVICTION_SHIFT)
 
+#ifdef CONFIG_64BIT
+#define EVICTION_SECS_POS_SHIFT 20
+#define EVICTION_SECS_SHRINK_SHIFT 4
+#define EVICTION_SECS_POS_MASK  ((1UL << EVICTION_SECS_POS_SHIFT) - 1)
+#else
+#ifndef CONFIG_MEMCG
+#define EVICTION_SECS_POS_SHIFT 12
+#define EVICTION_SECS_SHRINK_SHIFT 4
+#define EVICTION_SECS_POS_MASK  ((1UL << EVICTION_SECS_POS_SHIFT) - 1)
+#else
+#define EVICTION_SECS_POS_SHIFT 0
+#define EVICTION_SECS_SHRINK_SHIFT 0
+#define NO_SECS_IN_WORKINGSET
+#endif
+#endif
 /*
  * Eviction timestamps need to be able to cover the full range of
  * actionable refaults. However, bits are tight in the radix tree
@@ -169,10 +184,54 @@
  * evictions into coarser buckets by shaving off lower timestamp bits.
  */
 static unsigned int bucket_order __read_mostly;
-
+#ifdef NO_SECS_IN_WORKINGSET
+static void pack_secs(unsigned long *peviction) { }
+static unsigned int unpack_secs(unsigned long entry) {return 0; }
+#else
+/*
+ * Shrink the timestamp according to its value and store it together
+ * with the shrink size in the entry.
+ */
+static void pack_secs(unsigned long *peviction)
+{
+	unsigned int secs;
+	unsigned long eviction;
+	int order;
+	int secs_shrink_size;
+	struct timespec ts;
+
+	get_monotonic_boottime(&ts);
+	secs = (unsigned int)ts.tv_sec ? (unsigned int)ts.tv_sec : 1;
+	order = get_count_order(secs);
+	secs_shrink_size = (order <= EVICTION_SECS_POS_SHIFT)
+			? 0 : (order - EVICTION_SECS_POS_SHIFT);
+
+	eviction = *peviction;
+	eviction = (eviction << EVICTION_SECS_POS_SHIFT)
+			| ((secs >> secs_shrink_size) & EVICTION_SECS_POS_MASK);
+	eviction = (eviction << EVICTION_SECS_SHRINK_SHIFT) | (secs_shrink_size & 0xf);
+	*peviction = eviction;
+}
+/*
+ * Unpack the second from the entry and restore the value according to the
+ * shrink size.
+ */
+static unsigned int unpack_secs(unsigned long entry)
+{
+	unsigned int secs;
+	int secs_shrink_size;
+
+	secs_shrink_size = entry & ((1 << EVICTION_SECS_SHRINK_SHIFT) - 1);
+	entry >>= EVICTION_SECS_SHRINK_SHIFT;
+	secs = entry & EVICTION_SECS_POS_MASK;
+	secs = secs << secs_shrink_size;
+	return secs;
+}
+#endif
 static void *pack_shadow(int memcgid, pg_data_t *pgdat, unsigned long eviction)
 {
 	eviction >>= bucket_order;
+	pack_secs(&eviction);
 	eviction = (eviction << MEM_CGROUP_ID_SHIFT) | memcgid;
 	eviction = (eviction << NODES_SHIFT) | pgdat->node_id;
 	eviction = (eviction << RADIX_TREE_EXCEPTIONAL_SHIFT);
@@ -181,20 +240,24 @@ static void *pack_shadow(int memcgid, pg_data_t *pgdat, unsigned long eviction)
 }
 
 static void unpack_shadow(void *shadow, int *memcgidp, pg_data_t **pgdat,
-			  unsigned long *evictionp)
+			  unsigned long *evictionp, unsigned int *prev_secs)
 {
 	unsigned long entry = (unsigned long)shadow;
 	int memcgid, nid;
+	unsigned int secs;
 
 	entry >>= RADIX_TREE_EXCEPTIONAL_SHIFT;
 	nid = entry & ((1UL << NODES_SHIFT) - 1);
 	entry >>= NODES_SHIFT;
 	memcgid = entry & ((1UL << MEM_CGROUP_ID_SHIFT) - 1);
 	entry >>= MEM_CGROUP_ID_SHIFT;
+	secs = unpack_secs(entry);
+	entry >>= (EVICTION_SECS_POS_SHIFT + EVICTION_SECS_SHRINK_SHIFT);
 
 	*memcgidp = memcgid;
 	*pgdat = NODE_DATA(nid);
 	*evictionp = entry << bucket_order;
+	*prev_secs = secs;
 }
 
 /**
@@ -242,9 +305,22 @@ bool workingset_refault(void *shadow)
 	unsigned long refault;
 	struct pglist_data *pgdat;
 	int memcgid;
+#ifndef NO_SECS_IN_WORKINGSET
+	unsigned long avg_refault_time;
+	unsigned long refault_time;
+	int tradition;
+	unsigned int prev_secs;
+	unsigned int secs;
+	unsigned long refaults_ratio;
+#endif
+	struct timespec ts;
+	/*
+	convert jiffies to second
+	*/
+	get_monotonic_boottime(&ts);
+	secs = (unsigned int)ts.tv_sec ? (unsigned int)ts.tv_sec : 1;
 
-	unpack_shadow(shadow, &memcgid, &pgdat, &eviction);
-
+	unpack_shadow(shadow, &memcgid, &pgdat, &eviction, &prev_secs);
 	rcu_read_lock();
 	/*
 	 * Look up the memcg associated with the stored ID. It might
@@ -288,14 +364,37 @@ bool workingset_refault(void *shadow)
 	 * list is not a problem.
 	 */
 	refault_distance = (refault - eviction) & EVICTION_MASK;
-
 	inc_lruvec_state(lruvec, WORKINGSET_REFAULT);
-
-	if (refault_distance <= active_file) {
+#ifndef NO_SECS_IN_WORKINGSET
+	refaults_ratio = (atomic_long_read(&lruvec->inactive_age) + 1) / secs;
+	atomic_long_set(&lruvec->refaults_ratio, refaults_ratio);
+	refault_time = secs - prev_secs;
+	avg_refault_time = active_file / refaults_ratio;
+	tradition = !!(refault_distance < active_file);
+	if (refault_time <= avg_refault_time) {
+#else
+	if (refault_distance < active_file) {
+#endif
 		inc_lruvec_state(lruvec, WORKINGSET_ACTIVATE);
+#ifndef NO_SECS_IN_WORKINGSET
+		trace_printk("WKST_ACT[%d]:rft_dis %ld, act_file %ld \
+				rft_ratio %ld rft_time %ld avg_rft_time %ld \
+				refault %ld eviction %ld secs %d pre_secs %d\n",
+				tradition, refault_distance, active_file,
+				refaults_ratio, refault_time, avg_refault_time,
+				refault, eviction, secs, prev_secs);
+#endif
 		rcu_read_unlock();
 		return true;
 	}
+#ifndef NO_SECS_IN_WORKINGSET
+	trace_printk("WKST_INACT[%d]:rft_dis %ld, act_file %ld \
+			rft_ratio %ld rft_time %ld avg_rft_time %ld \
+			refault %ld eviction %ld secs %d pre_secs %d\n",
+			tradition, refault_distance, active_file,
+			refaults_ratio, refault_time, avg_refault_time,
+			refault, eviction, secs, prev_secs);
+#endif
 	rcu_read_unlock();
 	return false;
 }
@@ -513,7 +612,9 @@ static int __init workingset_init(void)
 	unsigned int max_order;
 	int ret;
 
-	BUILD_BUG_ON(BITS_PER_LONG < EVICTION_SHIFT);
+	BUILD_BUG_ON(BITS_PER_LONG < (EVICTION_SHIFT
+				+ EVICTION_SECS_POS_SHIFT
+				+ EVICTION_SECS_SHRINK_SHIFT));
 	/*
 	 * Calculate the eviction bucket size to cover the longest
 	 * actionable refault distance, which is currently half of
@@ -521,7 +622,8 @@ static int __init workingset_init(void)
 	 * some more pages at runtime, so keep working with up to
 	 * double the initial memory by using totalram_pages as-is.
 	 */
-	timestamp_bits = BITS_PER_LONG - EVICTION_SHIFT;
+	timestamp_bits = BITS_PER_LONG - EVICTION_SHIFT
+			- EVICTION_SECS_POS_SHIFT - EVICTION_SECS_SHRINK_SHIFT;
 	max_order = fls_long(totalram_pages - 1);
 	if (max_order > timestamp_bits)
 		bucket_order = max_order - timestamp_bits;