Message ID | 20210324083209.527427-1-ying.huang@intel.com (mailing list archive)
State      | New, archived
Series     | [RFC] mm: activate access-more-than-once page via NUMA balancing
On Wed, Mar 24, 2021 at 04:32:09PM +0800, Huang Ying wrote:
> One idea behind the LRU page reclaiming algorithm is to put the
> access-once pages in the inactive list and access-more-than-once pages
> in the active list. This is true for the file pages that are accessed
> via syscall (read()/write(), etc.), but not for the pages accessed via
> the page tables. We can only activate them via page reclaim scanning
> now. This may cause some problems. For example, even if there are
> only hot file pages accessed via the page tables in the inactive list,
> we will enable the cache trim mode incorrectly to scan only the hot
> file pages instead of cold anon pages.

I caution against this patch.

It's non-deterministic for a number of reasons. As it requires NUMA
balancing to be enabled, the pageout behaviour of a system changes when
NUMA balancing is active. If this led to pages being artificially and
inappropriately preserved, NUMA balancing could be disabled for the
wrong reasons. It only applies to pages that have no target node, so
memory policies affect which pages are activated differently. Similarly,
NUMA balancing does not scan all VMAs, so some pages may never trap a
NUMA fault at all. The timing of when an address space gets scanned is
driven by the locality of pages, so the timing of page activation
potentially becomes linked to whether pages are local or need to migrate
(although not right now for this patch, as it only affects pages with a
target nid of NUMA_NO_NODE). In other words, changes in NUMA balancing
that affect migration potentially affect the aging rate. Similarly, the
activation rates of a single-threaded and a multi-threaded process can
differ.

Finally, the NUMA balancing scan algorithm is sub-optimal. It
potentially walks the entire address space even though only a small
number of pages are scanned. This is particularly problematic when a
process has many threads, because the threads redundantly scan the same
regions. If NUMA balancing ever introduced range tracking of faulted
pages to limit how much scanning it has to do, it would inadvertently
change the page activation rate.

NUMA balancing is about page locality; it should not be conflated with
page aging.
Hi, Mel,

Thanks for the comments!

Mel Gorman <mgorman@suse.de> writes:
> On Wed, Mar 24, 2021 at 04:32:09PM +0800, Huang Ying wrote:
>> [snip]
>
> I caution against this patch.
>
> [snip]
>
> NUMA balancing is about page locality; it should not be conflated with
> page aging.

I understand your concerns about binding NUMA balancing and page
reclaiming together. The requirements of page locality and page aging
are different, so the policies need to be different. This is the wrong
part of the patch.

From another point of view, it's still possible to share some underlying
mechanisms (and code) between them. That is, scanning the page tables to
make pages inaccessible and capturing the page accesses via the page
fault. Now this page access information is used for page locality. Do
you think it's a good idea to use this information for page aging too
(but with a different policy, as you pointed out)?

From yet another point of view :-), the current NUMA balancing
implementation assumes that the node-private pages can fit in the
accessing node. But this may not always be true. Is it a valid
optimization to migrate the hot private pages first?

Best Regards,
Huang, Ying
On Thu, Mar 25, 2021 at 12:33:45PM +0800, Huang, Ying wrote:
> > [snip]
>
> I understand your concerns about binding NUMA balancing and page
> reclaiming together. The requirements of page locality and page aging
> are different, so the policies need to be different. This is the wrong
> part of the patch.
>
> From another point of view, it's still possible to share some
> underlying mechanisms (and code) between them. That is, scanning the
> page tables to make pages inaccessible and capturing the page accesses
> via the page fault.

Potentially yes, but not necessarily recommended for page aging. NUMA
balancing has to be careful about the rate at which it scans pages to
avoid excessive overhead, so it's driven by locality. The scanning
happens within a task's context, so during that time the task is not
executing its normal work, and it incurs the overhead of the faults.
Generally, this is not too much overhead, because once pages are
migrated locally, the scan rate drops and so does the overhead.

However, if you want to drive page aging, that is constant, so the rate
could not be easily adapted in a way that would be deterministic.

> Now this page access information is used for page locality. Do you
> think it's a good idea to use this information for page aging too (but
> with a different policy, as you pointed out)?

I'm not completely opposed to it, but I think the overhead it would
introduce could be severe. Worse, if a workload fits in memory and there
is limited to no memory pressure, it's all overhead for no gain. Early
generations of NUMA balancing had to find a balance to make sure the
gains from locality exceeded the cost of measuring locality, and doing
the same for page aging is in some ways even more challenging.

> From yet another point of view :-), the current NUMA balancing
> implementation assumes that the node-private pages can fit in the
> accessing node. But this may not always be true. Is it a valid
> optimization to migrate the hot private pages first?

I'm not sure how the hotness of pages could be ranked. At the time of a
hinting fault, the page is by definition active, because it has just
been accessed. Prioritising which pages to migrate based on the number
of faults that have been trapped would require storing that information
somewhere.
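[Editor's note: the locality-driven scan-rate feedback Mel describes can be illustrated with a simplified model. This is a hypothetical sketch, not the actual update_task_scan_period() logic in kernel/sched/fair.c; the constants and function names are invented.]

```c
#include <assert.h>

/* Hypothetical bounds, loosely modelled on the sysctl defaults
 * (numa_balancing_scan_period_{min,max}_ms). */
#define SCAN_PERIOD_MIN_MS   1000
#define SCAN_PERIOD_MAX_MS  60000

/*
 * Simplified model: if most hinting faults were already local, back off
 * scanning (double the period); if many faults were remote, scan more
 * aggressively (halve the period).  The real kernel logic is more
 * involved, but the feedback direction is the same: as pages migrate
 * to the right node, locality improves and the scan rate drops.
 */
static unsigned int adapt_scan_period(unsigned int period_ms,
				      unsigned int local_faults,
				      unsigned int remote_faults)
{
	unsigned int total = local_faults + remote_faults;

	if (total == 0)
		return period_ms;
	if (local_faults * 10 >= total * 7)	/* >= 70% local */
		period_ms *= 2;
	else
		period_ms /= 2;
	if (period_ms < SCAN_PERIOD_MIN_MS)
		period_ms = SCAN_PERIOD_MIN_MS;
	if (period_ms > SCAN_PERIOD_MAX_MS)
		period_ms = SCAN_PERIOD_MAX_MS;
	return period_ms;
}
```

Page aging has no such self-damping signal: memory pressure is constant while it lasts, which is why the scan rate cannot be adapted deterministically for that purpose.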
Mel Gorman <mgorman@suse.de> writes:
> On Thu, Mar 25, 2021 at 12:33:45PM +0800, Huang, Ying wrote:
>> [snip]
>
> Potentially yes, but not necessarily recommended for page aging. NUMA
> balancing has to be careful about the rate at which it scans pages to
> avoid excessive overhead, so it's driven by locality. [snip]
>
> I'm not completely opposed to it, but I think the overhead it would
> introduce could be severe. Worse, if a workload fits in memory and
> there is limited to no memory pressure, it's all overhead for no gain.
> [snip]

Yes. I will think more about it from the overhead vs. gain point of
view. Thanks a lot for sharing your thoughts on that.

>> From yet another point of view :-), the current NUMA balancing
>> implementation assumes that the node-private pages can fit in the
>> accessing node. But this may not always be true. Is it a valid
>> optimization to migrate the hot private pages first?
>
> I'm not sure how the hotness of pages could be ranked. At the time of
> a hinting fault, the page is by definition active, because it has just
> been accessed. Prioritising which pages to migrate based on the number
> of faults that have been trapped would require storing that
> information somewhere.

Yes. We need to store some information about that. In an old version of
the patchset, which uses NUMA balancing to promote hot pages from PMEM
to DRAM, we designed a method to measure the hotness of the pages. The
basic idea is as follows:

- When the page table of a process is scanned, the latest N scanned
  address ranges and their scan times are recorded in a ring buffer in
  the mm_struct.

- In the hint page fault handler, the ring buffer is searched with the
  fault address to get the scan time.

Then the hint page fault latency of the page is defined as

  hint page fault latency = fault time - scan time

The shorter the hint page fault latency, the hotter the page.

Then we need a way to determine the hot/cold threshold. We used a
rate-limit-based threshold adjustment method. If the number of pages
that pass the threshold is much more than the rate limit, we lower the
threshold (stricter), or vice versa.

Best Regards,
Huang, Ying
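[Editor's note: the scheme described above can be sketched roughly as follows. This is a user-space model with invented names (`scan_ring`, `page_is_hot`); the real patchset keeps the ring buffer in mm_struct and does the lookup in the hint fault handler.]

```c
#include <assert.h>
#include <stdbool.h>

#define NR_SCAN_RECORDS 4	/* "latest N scanned address ranges" */

/* One record per scanned address range: [start, end) plus scan time. */
struct scan_record {
	unsigned long start, end;
	unsigned long scan_time;
};

/* Ring buffer of the most recent scans, as described for mm_struct. */
struct scan_ring {
	struct scan_record rec[NR_SCAN_RECORDS];
	unsigned int head;
};

static void record_scan(struct scan_ring *ring, unsigned long start,
			unsigned long end, unsigned long scan_time)
{
	ring->rec[ring->head] = (struct scan_record){ start, end, scan_time };
	ring->head = (ring->head + 1) % NR_SCAN_RECORDS;
}

/*
 * At hint-fault time, look up when the faulting address was last made
 * inaccessible.  hint page fault latency = fault time - scan time; the
 * page counts as hot if the latency is below the current threshold
 * (which the rate-limit feedback would raise or lower over time).
 */
static bool page_is_hot(const struct scan_ring *ring, unsigned long addr,
			unsigned long fault_time, unsigned long threshold)
{
	for (int i = 0; i < NR_SCAN_RECORDS; i++) {
		const struct scan_record *r = &ring->rec[i];

		if (addr >= r->start && addr < r->end)
			return fault_time - r->scan_time < threshold;
	}
	return false;	/* scan record already overwritten: treat as cold */
}
```

The ring buffer answers Mel's storage question cheaply: only the last N scan ranges are kept per mm, rather than per-page fault counts.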
On Fri, Mar 26, 2021 at 12:21 AM Huang, Ying <ying.huang@intel.com> wrote:
>
> [snip]
>
> Then we need a way to determine the hot/cold threshold. We used a
> rate-limit-based threshold adjustment method. If the number of pages
> that pass the threshold is much more than the rate limit, we lower the
> threshold (stricter), or vice versa.

Sorry for the late reply. I do see where you are coming from, and I
agree in principle. The aging and NUMA balancing should be talking to
each other, and IMO it is easier for the aging to help NUMA balancing,
because it has to do the legwork anyway.

My idea is to make the page table scanning in the multigenerational LRU
NUMA-policy aware -- I don't have any concrete plan yet. But in
general, it can range from mildly skewing the aging of pages from wrong
nodes so they become preferable during eviction, to aggressively
working against those pages like queue_pages_pte_range() does.
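[Editor's note: as an illustration only, the "mild skew" end of that spectrum could look like biasing an eviction score by node preference. All names here (`page_info`, `eviction_score`) are invented; nothing like this exists in the multigenerational LRU patches as posted.]

```c
#include <assert.h>

/* Hypothetical per-page info for the sketch. */
struct page_info {
	int nid;		/* node the page currently resides on */
	unsigned int age;	/* higher = older / colder */
};

/*
 * Mildly skew eviction order: pages on a non-preferred node get an age
 * bonus so they are evicted somewhat earlier, without hard-forcing
 * them out the way queue_pages_pte_range()-based rebinding would.
 */
static unsigned int eviction_score(const struct page_info *pi,
				   int preferred_nid,
				   unsigned int wrong_node_bonus)
{
	unsigned int score = pi->age;

	if (preferred_nid >= 0 && pi->nid != preferred_nid)
		score += wrong_node_bonus;
	return score;
}
```

With such a bias, a slightly younger page on the wrong node can outrank an older local page during eviction, which is the "preferable during eviction" behaviour described above.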
Yu Zhao <yuzhao@google.com> writes:
> On Fri, Mar 26, 2021 at 12:21 AM Huang, Ying <ying.huang@intel.com> wrote:
>>
>> [snip]
>
> Sorry for the late reply. I do see where you are coming from, and I
> agree in principle. The aging and NUMA balancing should be talking to
> each other, and IMO it is easier for the aging to help NUMA balancing,
> because it has to do the legwork anyway.
>
> My idea is to make the page table scanning in the multigenerational
> LRU NUMA-policy aware -- I don't have any concrete plan yet. But in
> general, it can range from mildly skewing the aging of pages from
> wrong nodes so they become preferable during eviction, to aggressively
> working against those pages like queue_pages_pte_range() does.

As Mel has pointed out, the policies for page aging and page locality
are different, so it's not easy to simply combine them. And it appears
that we can already get some page hotness estimation from the NUMA
balancing hint page fault latency.

Best Regards,
Huang, Ying
diff --git a/mm/memory.c b/mm/memory.c
index 5efa07fb6cdc..b44b6fd577a8 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4165,6 +4165,13 @@ static vm_fault_t do_numa_page(struct vm_fault *vmf)
 				&flags);
 	pte_unmap_unlock(vmf->pte, vmf->ptl);
 	if (target_nid == NUMA_NO_NODE) {
+		if (!PageActive(page) && page_evictable(page) &&
+		    (!PageSwapBacked(page) || total_swap_pages)) {
+			if (pte_young(old_pte) && !PageReferenced(page))
+				SetPageReferenced(page);
+			if (PageReferenced(page))
+				mark_page_accessed(page);
+		}
 		put_page(page);
 		goto out;
 	}
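[Editor's note: the activation rule the hunk implements (the young bit records one access, the hint fault itself is another, and a referenced page gets activated) can be modelled in user space as follows. The structs and helpers (`fake_page`, `fake_mark_page_accessed`) are invented stand-ins; the kernel's mark_page_accessed() in mm/swap.c is more involved.]

```c
#include <assert.h>
#include <stdbool.h>

/* Minimal stand-in for the page flags the hunk consults. */
struct fake_page {
	bool active;		/* PageActive */
	bool referenced;	/* PageReferenced */
	bool evictable;		/* page_evictable() */
	bool swap_backed;	/* PageSwapBacked */
};

/* Simplified mark_page_accessed(): referenced -> active in two steps. */
static void fake_mark_page_accessed(struct fake_page *page)
{
	if (!page->referenced)
		page->referenced = true;
	else if (!page->active)
		page->active = true;
}

/*
 * Model of the added do_numa_page() logic for target_nid ==
 * NUMA_NO_NODE.  pte_young means the page was accessed before the
 * hint fault, and the fault itself is a second access, so a young
 * page is access-more-than-once and gets activated; an access-once
 * (not-young, not-referenced) page stays on the inactive list.
 */
static void numa_hint_fault(struct fake_page *page, bool pte_young,
			    bool have_swap)
{
	if (page->active || !page->evictable)
		return;
	if (page->swap_backed && !have_swap)
		return;		/* anon page but nowhere to swap it */
	if (pte_young && !page->referenced)
		page->referenced = true;
	if (page->referenced)
		fake_mark_page_accessed(page);
}
```

This is exactly the "access-more-than-once" distinction the commit message below aims for: only pages with evidence of a repeat access leave the inactive list.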
One idea behind the LRU page reclaiming algorithm is to put the
access-once pages in the inactive list and access-more-than-once pages
in the active list. This is true for the file pages that are accessed
via syscall (read()/write(), etc.), but not for the pages accessed via
the page tables. We can only activate them via page reclaim scanning
now. This may cause some problems. For example, even if there are only
hot file pages accessed via the page tables in the inactive list, we
will enable the cache trim mode incorrectly to scan only the hot file
pages instead of cold anon pages.

This can be improved via NUMA balancing, where the page tables of all
processes are scanned gradually to trap page accesses. With that, we
can identify whether a page in the inactive list has been accessed at
least twice. If so, we can activate the page, leaving only the
access-once pages in the inactive list. This patch implements this.

It may sound like overkill to enable NUMA balancing only to activate
some pages. But firstly, if you have NUMA balancing enabled already,
the added overhead is negligible. Secondly, this patch is only the
first step in taking advantage of NUMA balancing to optimize page
reclaiming. We may improve page reclaim further with the help of NUMA
balancing. For example, we have implemented a way to measure page
hotness/coldness via NUMA balancing in

  https://lore.kernel.org/linux-mm/20210311081821.138467-5-ying.huang@intel.com/

That may help to improve the LRU algorithm. For example, instead of
migrating from PMEM to DRAM, the hot pages can be put at the head of
the active list (or a separate hot page list) to make it easier to
reclaim the cold pages at the tail of the LRU.

This patch is inspired by the work done by Yu Zhao in the
multigenerational LRU patchset,

  https://lore.kernel.org/linux-mm/20210313075747.3781593-1-yuzhao@google.com/

It may be possible to combine some ideas from the multigenerational LRU
patchset with the NUMA balancing page table scanning to improve the LRU
page reclaiming algorithm. Compared with the page table scanning method
used in the multigenerational LRU patchset, the page tables can be
scanned much more slowly via NUMA balancing, because page faults
instead of the Accessed bit are used to trap the page accesses. This
can reduce the peak overhead of scanning.

To show the effect of the patch, we designed a test as follows. On a
system with 128 GB DRAM and 2 NVMe disks as swap:

* Run workload A with about 60 GB of hot anon pages.
* After 100 seconds, run workload B with about 58 GB of cold
  (accessed-once) anon pages.
* After another 200 seconds, run workload C with about 57 GB of hot
  anon pages.

It's desirable that the 58 GB of cold pages of workload B will be
swapped out to accommodate the 57 GB of memory of workload C. The test
results are as follows:

                                      base    patched
  Pages swapped in (GB)                2.3        0.0
  Pages swapped out (GB)              59.0       55.9
  Pages scanned (GB)                 296.7      172.5
  Avg length of active list (GB)      18.1       58.4
  Avg length of inactive list (GB)    89.1       48.4

Because the size of the cold workload B (58 GB) is larger than the size
of workload C, the accessed-once pages of workload B should be
reclaimed to accommodate workload C, so there should be no pages
swapped in. But in the base kernel, because the pages of workload A are
scanned before those of workload B, some hot pages (~2.3 GB) from
workload A are wrongly swapped out. In the patched kernel, the pages of
workload A are activated to the active list beforehand, so the pages
swapped in are greatly reduced (to ~14.2 MB). Because the inactive list
is much shorter in the patched kernel, far fewer pages need to be
scanned to reclaim memory for workload C (172.5 GB vs. 296.7 GB).

As always, the VM subsystem is complex, and any change may cause some
regressions. We have observed some for this patch too. The fundamental
effect of the patch is to reduce the size of the inactive list, which
reduces the scanning overhead and improves scanning correctness. But in
some situations, the long inactive list in the base (unpatched) kernel
can help performance, because it takes longer to scan a (not so) hot
page twice, making it easier to distinguish hot and cold pages.
Generally, though, I don't think it is a good idea to improve
performance purely by increasing the system overhead.

Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Inspired-by: Yu Zhao <yuzhao@google.com>
Cc: Hillf Danton <hdanton@sina.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Roman Gushchin <guro@fb.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
Cc: Yang Shi <shy828301@gmail.com>
---
 mm/memory.c | 7 +++++++
 1 file changed, 7 insertions(+)