From patchwork Tue Sep 12 18:45:07 2023
X-Patchwork-Submitter: Kairui Song
X-Patchwork-Id: 13382045
From: Kairui Song <ryncsn@gmail.com>
To: linux-mm@kvack.org
Cc: Andrew Morton, Yu Zhao, Roman Gushchin, Johannes Weiner,
 Michal Hocko, Hugh Dickins, Nhat Pham, Yuanchu Xie,
 Suren Baghdasaryan, T. J. Mercier, linux-kernel@vger.kernel.org,
 Kairui Song
Subject: [RFC PATCH v2 1/5] workingset: simplify and use a more intuitive model
Date: Wed, 13 Sep 2023 02:45:07 +0800
Message-ID: <20230912184511.49333-2-ryncsn@gmail.com>
In-Reply-To: <20230912184511.49333-1-ryncsn@gmail.com>
References: <20230912184511.49333-1-ryncsn@gmail.com>
MIME-Version: 1.0
From: Kairui Song <ryncsn@gmail.com>

This basically removes workingset_activation() and reduces the number of
calls to workingset_age_nonresident(). The idea behind this change is a
new way to calculate the refault distance, preparing for refault
distance based re-activation for multi-gen LRU.

Currently, refault distance re-activation is based on two assumptions:

1. Activating an inactive page left-shifts the LRU pages (considering
   the LRU starts from the right).
2. Evicting an inactive page left-shifts the LRU pages.

Assumption 2 is correct, but assumption 1 is not always true: an
activated page could be anywhere in the LRU list (through
mark_page_accessed()), so it only left-shifts the pages to its right.
Besides, one page can get activated/deactivated multiple times.

Multi-gen LRU also doesn't fit this model well: pages are aged and
activated constantly as the generation sliding window slides.

So instead, introduce a simpler idea here: just presume the evicted
pages are still in memory, each with an eviction sequence number like
before.
Let the `nonresident_age` counter (NA) still get increased on each
eviction, which gives us a "shadow LRU" for every evicted page:

Let SP = ((NA's reading @ current) - (NA's reading @ eviction))

                +-memory available to cache-+
                |                           |
+-------------------------+===============+===========+
| *   shadows  O O  O     |   INACTIVE    |  ACTIVE   |
+-+-----------------------+===============+===========+
  |                       |
  +-----------------------+
  |         SP
  fault page          O -> Hole left by previously faulted in pages
                      * -> The page corresponding to SP

It can be easily seen that SP stands for how far the current workflow
could push a page out of available memory. Since every evicted page was
once the head of the INACTIVE list, the page could have such an access
distance:

  SP + NR_INACTIVE

It *may* get re-activated before getting evicted again if:

  SP + NR_INACTIVE < NR_INACTIVE + NR_ACTIVE

which can be simplified to:

  SP < NR_ACTIVE

Then the page is worth re-activating to start from the ACTIVE part,
since its access distance is shorter than the total memory needed to
make it stay. The calculation is the same as before; only assumption 1
above has been dropped.

Since this is only an estimation, based on several hypotheses, and it
could break the ability of the LRU to distinguish a working set out of
caches, throttle it by two factors:

1. Previously re-faulted pages may leave "holes" in the shadow part of
   the LRU. That part is left unhandled on purpose, to decrease the
   re-activation rate for pages that have a large SP value (the larger
   the SP value, the more likely a page is affected by such holes).
2. When the ACTIVE part of the LRU is long enough, challenging ACTIVE
   pages by re-activating a previously INACTIVE page that faulted in
   once may not be a good idea, so throttle the re-activation when
   ACTIVE > INACTIVE by comparing against INACTIVE instead.
Combining all of the above: upon refault, mark the page active if either
of the following conditions is met:

- If the ACTIVE LRU is low (NR_ACTIVE < NR_INACTIVE), check:
    SP < NR_ACTIVE
- If the ACTIVE LRU is high (NR_ACTIVE >= NR_INACTIVE), check:
    SP < NR_INACTIVE

The code is almost the same as before, but simpler, since there is no
longer any need to update lruvec statistics when activating a page.

A few benchmarks showed similar or better results, and when combined
with multi-gen LRU (in later commits) there is a measurable performance
gain for some workloads. Tests use the memtier and fio setups from
commit ac35a4902374, scaled down to fit my test environment, plus some
other tests:

memtier test (with a 16G ramdisk as swap and a 2G memcg limit, on an
i7-9700):

  memcached -u nobody -m 16384 -s /tmp/memcached.socket \
    -a 0700 -t 12 -B binary &
  memtier_benchmark -S /tmp/memcached.socket -P memcache_binary -n allkeys \
    --key-minimum=1 --key-maximum=24000000 --key-pattern=P:P -c 1 \
    -t 12 --ratio 1:0 --pipeline 8 -d 2000 -x 6

fio test 1 (with a 16G ramdisk, on a 28G VM on an i7-9700):

  fio -name=refault --numjobs=12 --directory=/mnt --size=1024m \
    --buffered=1 --ioengine=io_uring --iodepth=128 \
    --iodepth_batch_submit=32 --iodepth_batch_complete=32 \
    --rw=randread --random_distribution=random --norandommap \
    --time_based --ramp_time=5m --runtime=5m --group_reporting

fio test 2 (with a 16G ramdisk, on a 28G VM on an i7-9700):

  fio -name=mglru --numjobs=10 --directory=/mnt --size=1536m \
    --buffered=1 --ioengine=io_uring --iodepth=128 \
    --iodepth_batch_submit=32 --iodepth_batch_complete=32 \
    --rw=randread --random_distribution=zipf:1.2 --norandommap \
    --time_based --ramp_time=10m --runtime=5m --group_reporting

mysql (using oltp_read_only from sysbench, with a 12G buffer pool in a
10G memcg):

  sysbench /usr/share/sysbench/oltp_read_only.lua \
    --mysql-db=sb --tables=36 --table-size=2000000 --threads=12 --time=1800

Before (average of 6 test runs):
  fio: IOPS=5213.7k
  fio2: IOPS=7315.3k
  memcached: 49493.75 ops/s
  mysql: 6237.45 tps

After (average of 6 test runs):
  fio: IOPS=5230.5k
  fio2: IOPS=7349.3k
  memcached: 49912.79 ops/s
  mysql: 6240.62 tps

Signed-off-by: Kairui Song <ryncsn@gmail.com>
---
 include/linux/swap.h |   2 -
 mm/swap.c            |   1 -
 mm/vmscan.c          |   2 -
 mm/workingset.c      | 215 +++++++++++++++++++++----------------------
 4 files changed, 106 insertions(+), 114 deletions(-)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index 493487ed7c38..ca51d79842b7 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -344,10 +344,8 @@ static inline swp_entry_t page_swap_entry(struct page *page)
 
 /* linux/mm/workingset.c */
 bool workingset_test_recent(void *shadow, bool file, bool *workingset);
-void workingset_age_nonresident(struct lruvec *lruvec, unsigned long nr_pages);
 void *workingset_eviction(struct folio *folio, struct mem_cgroup *target_memcg);
 void workingset_refault(struct folio *folio, void *shadow);
-void workingset_activation(struct folio *folio);
 
 /* Only track the nodes of mappings with shadow entries */
 void workingset_update_node(struct xa_node *node);
diff --git a/mm/swap.c b/mm/swap.c
index cd8f0150ba3a..685b446fd4f9 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -482,7 +482,6 @@ void folio_mark_accessed(struct folio *folio)
 		else
 			__lru_cache_activate_folio(folio);
 		folio_clear_referenced(folio);
-		workingset_activation(folio);
 	}
 	if (folio_test_idle(folio))
 		folio_clear_idle(folio);
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 6f13394b112e..3f4de75e5186 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2539,8 +2539,6 @@ static unsigned int move_folios_to_lru(struct lruvec *lruvec,
 		lruvec_add_folio(lruvec, folio);
 		nr_pages = folio_nr_pages(folio);
 		nr_moved += nr_pages;
-		if (folio_test_active(folio))
-			workingset_age_nonresident(lruvec, nr_pages);
 	}
 
 	/*
diff --git a/mm/workingset.c b/mm/workingset.c
index da58a26d0d4d..babda11601ea 100644
--- a/mm/workingset.c
+++ b/mm/workingset.c
@@ -180,9 +180,10 @@
  */
 #define WORKINGSET_SHIFT 1
-#define EVICTION_SHIFT	((BITS_PER_LONG - BITS_PER_XA_VALUE) +	\
+#define EVICTION_SHIFT	((BITS_PER_LONG - BITS_PER_XA_VALUE) + \
 			 WORKINGSET_SHIFT + NODES_SHIFT + \
 			 MEM_CGROUP_ID_SHIFT)
+#define EVICTION_BITS	(BITS_PER_LONG - (EVICTION_SHIFT))
 #define EVICTION_MASK	(~0UL >> EVICTION_SHIFT)
 
 /*
@@ -226,8 +227,103 @@ static void unpack_shadow(void *shadow, int *memcgidp, pg_data_t **pgdat,
 	*workingsetp = workingset;
 }
 
-#ifdef CONFIG_LRU_GEN
+/*
+ * Get the distance reading at eviction time.
+ */
+static inline unsigned long lru_eviction(struct lruvec *lruvec,
+					 int bits, int bucket_order)
+{
+	unsigned long eviction = atomic_long_read(&lruvec->nonresident_age);
+
+	eviction >>= bucket_order;
+	eviction &= ~0UL >> (BITS_PER_LONG - bits);
+
+	return eviction;
+}
+
+/*
+ * Calculate and test refault distance
+ */
+static inline bool lru_refault(struct mem_cgroup *memcg,
+			       struct lruvec *lruvec,
+			       unsigned long eviction, bool file,
+			       int bits, int bucket_order)
+{
+	unsigned long refault, distance;
+	unsigned long workingset, active, inactive, inactive_file, inactive_anon = 0;
+
+	eviction <<= bucket_order;
+	refault = atomic_long_read(&lruvec->nonresident_age);
+
+	/*
+	 * The unsigned subtraction here gives an accurate distance
+	 * across nonresident_age overflows in most cases. There is a
+	 * special case: usually, shadow entries have a short lifetime
+	 * and are either refaulted or reclaimed along with the inode
+	 * before they get too old. But it is not impossible for the
+	 * nonresident_age to lap a shadow entry in the field, which
+	 * can then result in a false small refault distance, leading
+	 * to a false activation should this old entry actually
+	 * refault again. However, earlier kernels used to deactivate
+	 * unconditionally with *every* reclaim invocation for the
+	 * longest time, so the occasional inappropriate activation
+	 * leading to pressure on the active list is not a problem.
+	 */
+	distance = (refault - eviction) & (~0UL >> (BITS_PER_LONG - bits));
+	active = lruvec_page_state(lruvec, NR_ACTIVE_FILE);
+	inactive_file = lruvec_page_state(lruvec, NR_INACTIVE_FILE);
+	if (mem_cgroup_get_nr_swap_pages(memcg) > 0) {
+		active += lruvec_page_state(lruvec, NR_ACTIVE_ANON);
+		inactive_anon = lruvec_page_state(lruvec, NR_INACTIVE_ANON);
+	}
+
+	/*
+	 * Compare the distance to the existing workingset size. We
+	 * don't activate pages that couldn't stay resident even if
+	 * all the memory was available to the workingset. Whether
+	 * workingset competition needs to consider anon or not depends
+	 * on having free swap space.
+	 *
+	 * When there are already enough active pages, be less aggressive
+	 * on reactivating pages, challenge an already established set of
+	 * active pages with one time refaulted page may not be a good idea.
+	 */
+	if (active >= (inactive_anon + inactive_file))
+		return distance < inactive_anon + inactive_file;
+	else
+		return distance < active + (file ? inactive_anon : inactive_file);
+}
+
+/**
+ * workingset_age_nonresident - age non-resident entries as LRU ages
+ * @lruvec: the lruvec that was aged
+ * @nr_pages: the number of pages to count
+ *
+ * As in-memory pages are aged, non-resident pages need to be aged as
+ * well, in order for the refault distances later on to be comparable
+ * to the in-memory dimensions. This function allows reclaim and LRU
+ * operations to drive the non-resident aging along in parallel.
+ */
+static void workingset_age_nonresident(struct lruvec *lruvec, unsigned long nr_pages)
+{
+	/*
+	 * Reclaiming a cgroup means reclaiming all its children in a
+	 * round-robin fashion. That means that each cgroup has an LRU
+	 * order that is composed of the LRU orders of its child
+	 * cgroups; and every page has an LRU position not just in the
+	 * cgroup that owns it, but in all of that group's ancestors.
+	 *
+	 * So when the physical inactive list of a leaf cgroup ages,
+	 * the virtual inactive lists of all its parents, including
+	 * the root cgroup's, age as well.
+	 */
+	do {
+		atomic_long_add(nr_pages, &lruvec->nonresident_age);
+	} while ((lruvec = parent_lruvec(lruvec)));
+}
+
+#ifdef CONFIG_LRU_GEN
 static void *lru_gen_eviction(struct folio *folio)
 {
 	int hist;
@@ -342,34 +438,6 @@ static void lru_gen_refault(struct folio *folio, void *shadow)
 
 #endif /* CONFIG_LRU_GEN */
 
-/**
- * workingset_age_nonresident - age non-resident entries as LRU ages
- * @lruvec: the lruvec that was aged
- * @nr_pages: the number of pages to count
- *
- * As in-memory pages are aged, non-resident pages need to be aged as
- * well, in order for the refault distances later on to be comparable
- * to the in-memory dimensions. This function allows reclaim and LRU
- * operations to drive the non-resident aging along in parallel.
- */
-void workingset_age_nonresident(struct lruvec *lruvec, unsigned long nr_pages)
-{
-	/*
-	 * Reclaiming a cgroup means reclaiming all its children in a
-	 * round-robin fashion. That means that each cgroup has an LRU
-	 * order that is composed of the LRU orders of its child
-	 * cgroups; and every page has an LRU position not just in the
-	 * cgroup that owns it, but in all of that group's ancestors.
-	 *
-	 * So when the physical inactive list of a leaf cgroup ages,
-	 * the virtual inactive lists of all its parents, including
-	 * the root cgroup's, age as well.
-	 */
-	do {
-		atomic_long_add(nr_pages, &lruvec->nonresident_age);
-	} while ((lruvec = parent_lruvec(lruvec)));
-}
-
 /**
  * workingset_eviction - note the eviction of a folio from memory
  * @target_memcg: the cgroup that is causing the reclaim
@@ -396,11 +464,11 @@ void *workingset_eviction(struct folio *folio, struct mem_cgroup *target_memcg)
 	lruvec = mem_cgroup_lruvec(target_memcg, pgdat);
 	/* XXX: target_memcg can be NULL, go through lruvec */
 	memcgid = mem_cgroup_id(lruvec_memcg(lruvec));
-	eviction = atomic_long_read(&lruvec->nonresident_age);
-	eviction >>= bucket_order;
+
+	eviction = lru_eviction(lruvec, EVICTION_BITS, bucket_order);
 	workingset_age_nonresident(lruvec, folio_nr_pages(folio));
 	return pack_shadow(memcgid, pgdat, eviction,
-			folio_test_workingset(folio));
+			   folio_test_workingset(folio));
 }
 
 /**
@@ -418,9 +486,6 @@ bool workingset_test_recent(void *shadow, bool file, bool *workingset)
 {
 	struct mem_cgroup *eviction_memcg;
 	struct lruvec *eviction_lruvec;
-	unsigned long refault_distance;
-	unsigned long workingset_size;
-	unsigned long refault;
 	int memcgid;
 	struct pglist_data *pgdat;
 	unsigned long eviction;
@@ -429,7 +494,6 @@ bool workingset_test_recent(void *shadow, bool file, bool *workingset)
 		return lru_gen_test_recent(shadow, file, &eviction_lruvec,
 					   &eviction, workingset);
 	unpack_shadow(shadow, &memcgid, &pgdat, &eviction, workingset);
-	eviction <<= bucket_order;
 
 	/*
 	 * Look up the memcg associated with the stored ID. It might
@@ -450,50 +514,10 @@ bool workingset_test_recent(void *shadow, bool file, bool *workingset)
 	eviction_memcg = mem_cgroup_from_id(memcgid);
 	if (!mem_cgroup_disabled() && !eviction_memcg)
 		return false;
 	eviction_lruvec = mem_cgroup_lruvec(eviction_memcg, pgdat);
-	refault = atomic_long_read(&eviction_lruvec->nonresident_age);
-
-	/*
-	 * Calculate the refault distance
-	 *
-	 * The unsigned subtraction here gives an accurate distance
-	 * across nonresident_age overflows in most cases. There is a
-	 * special case: usually, shadow entries have a short lifetime
-	 * and are either refaulted or reclaimed along with the inode
-	 * before they get too old. But it is not impossible for the
-	 * nonresident_age to lap a shadow entry in the field, which
-	 * can then result in a false small refault distance, leading
-	 * to a false activation should this old entry actually
-	 * refault again. However, earlier kernels used to deactivate
-	 * unconditionally with *every* reclaim invocation for the
-	 * longest time, so the occasional inappropriate activation
-	 * leading to pressure on the active list is not a problem.
-	 */
-	refault_distance = (refault - eviction) & EVICTION_MASK;
-
-	/*
-	 * Compare the distance to the existing workingset size. We
-	 * don't activate pages that couldn't stay resident even if
-	 * all the memory was available to the workingset. Whether
-	 * workingset competition needs to consider anon or not depends
-	 * on having free swap space.
-	 */
-	workingset_size = lruvec_page_state(eviction_lruvec, NR_ACTIVE_FILE);
-	if (!file) {
-		workingset_size += lruvec_page_state(eviction_lruvec,
-						     NR_INACTIVE_FILE);
-	}
-	if (mem_cgroup_get_nr_swap_pages(eviction_memcg) > 0) {
-		workingset_size += lruvec_page_state(eviction_lruvec,
-						     NR_ACTIVE_ANON);
-		if (file) {
-			workingset_size += lruvec_page_state(eviction_lruvec,
-						     NR_INACTIVE_ANON);
-		}
-	}
-
-	return refault_distance <= workingset_size;
+	return lru_refault(eviction_memcg, eviction_lruvec, eviction, file,
+			   EVICTION_BITS, bucket_order);
 }
 
 /**
@@ -543,7 +567,6 @@ void workingset_refault(struct folio *folio, void *shadow)
 		goto out;
 
 	folio_set_active(folio);
-	workingset_age_nonresident(lruvec, nr);
 	mod_lruvec_state(lruvec, WORKINGSET_ACTIVATE_BASE + file, nr);
 
 	/* Folio was active prior to eviction */
@@ -560,30 +583,6 @@ void workingset_refault(struct folio *folio, void *shadow)
 	rcu_read_unlock();
 }
 
-/**
- * workingset_activation - note a page activation
- * @folio: Folio that is being activated.
- */
-void workingset_activation(struct folio *folio)
-{
-	struct mem_cgroup *memcg;
-
-	rcu_read_lock();
-	/*
-	 * Filter non-memcg pages here, e.g. unmap can call
-	 * mark_page_accessed() on VDSO pages.
-	 *
-	 * XXX: See workingset_refault() - this should return
-	 * root_mem_cgroup even for !CONFIG_MEMCG.
-	 */
-	memcg = folio_memcg_rcu(folio);
-	if (!mem_cgroup_disabled() && !memcg)
-		goto out;
-	workingset_age_nonresident(folio_lruvec(folio), folio_nr_pages(folio));
-out:
-	rcu_read_unlock();
-}
-
 /*
  * Shadow entries reflect the share of the working set that does not
  * fit into memory, so their number depends on the access pattern of
@@ -778,7 +777,6 @@ static struct lock_class_key shadow_nodes_key;
 
 static int __init workingset_init(void)
 {
-	unsigned int timestamp_bits;
 	unsigned int max_order;
 	int ret;
 
@@ -790,12 +788,11 @@ static int __init workingset_init(void)
 	 * some more pages at runtime, so keep working with up to
 	 * double the initial memory by using totalram_pages as-is.
 	 */
-	timestamp_bits = BITS_PER_LONG - EVICTION_SHIFT;
 	max_order = fls_long(totalram_pages() - 1);
-	if (max_order > timestamp_bits)
-		bucket_order = max_order - timestamp_bits;
+	if (max_order > EVICTION_BITS)
+		bucket_order = max_order - EVICTION_BITS;
 	pr_info("workingset: timestamp_bits=%d max_order=%d bucket_order=%u\n",
-		timestamp_bits, max_order, bucket_order);
+		EVICTION_BITS, max_order, bucket_order);
 
 	ret = prealloc_shrinker(&workingset_shadow_shrinker, "mm-shadow");
 	if (ret)

From patchwork Tue Sep 12 18:45:08 2023
X-Patchwork-Submitter: Kairui Song
X-Patchwork-Id: 13382046
From: Kairui Song <ryncsn@gmail.com>
To: linux-mm@kvack.org
Cc: Andrew Morton, Yu Zhao, Roman Gushchin, Johannes Weiner,
 Michal Hocko, Hugh Dickins, Nhat Pham, Yuanchu Xie,
 Suren Baghdasaryan, T. J. Mercier, linux-kernel@vger.kernel.org,
 Kairui Song
Subject: [RFC PATCH v2 2/5] workingset: update comment in workingset.c
Date: Wed, 13 Sep 2023 02:45:08 +0800
Message-ID: <20230912184511.49333-3-ryncsn@gmail.com>
In-Reply-To: <20230912184511.49333-1-ryncsn@gmail.com>
References: <20230912184511.49333-1-ryncsn@gmail.com>
MIME-Version: 1.0

Update the comments to match the new implementation.
Signed-off-by: Kairui Song
---
 mm/workingset.c | 98 ++++++++++++++++++++++---------------------
 1 file changed, 44 insertions(+), 54 deletions(-)

diff --git a/mm/workingset.c b/mm/workingset.c
index babda11601ea..b5c565a5a959 100644
--- a/mm/workingset.c
+++ b/mm/workingset.c
@@ -64,74 +64,64 @@
  * thrashing on the inactive list, after which refaulting pages can be
  * activated optimistically to compete with the existing active pages.
  *
- * Approximating inactive page access frequency - Observations:
+ * For such approximation, we introduce a counter `nonresistence_age` (NA)
+ * here. This counter increases each time a page is evicted, and each evicted
+ * page will have a shadow that stores the counter reading at the eviction
+ * time as a timestamp. So when an evicted page is faulted again, we have:
  *
- * 1. When a page is accessed for the first time, it is added to the
- *    head of the inactive list, slides every existing inactive page
- *    towards the tail by one slot, and pushes the current tail page
- *    out of memory.
+ *   Let SP = ((NA's reading @ current) - (NA's reading @ eviction))
  *
- * 2. When a page is accessed for the second time, it is promoted to
- *    the active list, shrinking the inactive list by one slot.  This
- *    also slides all inactive pages that were faulted into the cache
- *    more recently than the activated page towards the tail of the
- *    inactive list.
+ *                           +-memory available to cache-+
+ *                           |                           |
+ * +-------------------------+===============+===========+
+ * | *   shadows   O O  O    |   INACTIVE    |   ACTIVE  |
+ * +-+-----------------------+===============+===========+
+ *   |                       |
+ *   +-----------------------+
+ *   |            SP
+ * fault page        O -> Hole left by previously faulted in pages
+ *                   * -> The page corresponding to SP
  *
- * Thus:
+ * Here SP stands for how far the current workload could push a page
+ * out of available memory. Since every evicted page was once the head
+ * of the INACTIVE list, the page could have an access distance of:
  *
- * 1. The sum of evictions and activations between any two points in
- *    time indicate the minimum number of inactive pages accessed in
- *    between.
+ *   SP + NR_INACTIVE
  *
- * 2. Moving one inactive page N page slots towards the tail of the
- *    list requires at least N inactive page accesses.
+ * So if:
  *
- * Combining these:
+ *   SP + NR_INACTIVE < NR_INACTIVE + NR_ACTIVE
  *
- * 1. When a page is finally evicted from memory, the number of
- *    inactive pages accessed while the page was in cache is at least
- *    the number of page slots on the inactive list.
+ * which can be simplified to:
  *
- * 2. In addition, measuring the sum of evictions and activations (E)
- *    at the time of a page's eviction, and comparing it to another
- *    reading (R) at the time the page faults back into memory tells
- *    the minimum number of accesses while the page was not cached.
- *    This is called the refault distance.
+ *   SP < NR_ACTIVE
  *
- * Because the first access of the page was the fault and the second
- * access the refault, we combine the in-cache distance with the
- * out-of-cache distance to get the complete minimum access distance
- * of this page:
+ * then the page is worth re-activating to start from the ACTIVE part,
+ * since its access distance is shorter than the total memory required
+ * for it to stay.
  *
- *      NR_inactive + (R - E)
+ * Since this is only an estimation based on several hypotheses, and it
+ * could weaken the LRU's ability to distinguish a workingset out of
+ * caches, throttle it by two factors:
  *
- * And knowing the minimum access distance of a page, we can easily
- * tell if the page would be able to stay in cache assuming all page
- * slots in the cache were available:
+ * 1. Re-faulted-in pages may leave "holes" in the shadow part of the
+ *    LRU. That part is left unhandled on purpose, to decrease the
+ *    re-activation rate for pages that have a large SP value (the
+ *    larger the SP value of a page, the more likely it is affected by
+ *    such holes).
+ * 2. When the ACTIVE part of the LRU is long enough, challenging ACTIVE
+ *    pages by re-activating a once-faulted, previously INACTIVE page
+ *    may not be a good idea, so throttle the re-activation when
+ *    ACTIVE > INACTIVE by comparing with INACTIVE instead.
  *
- *   NR_inactive + (R - E) <= NR_inactive + NR_active
+ * Combining all of the above, we have:
+ * Upon refault, if any of the following conditions is met, mark the
+ * page as active:
  *
- * If we have swap we should consider about NR_inactive_anon and
- * NR_active_anon, so for page cache and anonymous respectively:
- *
- *   NR_inactive_file + (R - E) <= NR_inactive_file + NR_active_file
- *   + NR_inactive_anon + NR_active_anon
- *
- *   NR_inactive_anon + (R - E) <= NR_inactive_anon + NR_active_anon
- *   + NR_inactive_file + NR_active_file
- *
- * Which can be further simplified to:
- *
- *   (R - E) <= NR_active_file + NR_inactive_anon + NR_active_anon
- *
- *   (R - E) <= NR_active_anon + NR_inactive_file + NR_active_file
- *
- * Put into words, the refault distance (out-of-cache) can be seen as
- * a deficit in inactive list space (in-cache). If the inactive list
- * had (R - E) more page slots, the page would not have been evicted
- * in between accesses, but activated instead. And on a full system,
- * the only thing eating into inactive list space is active pages.
+ * - If the ACTIVE LRU is low (NR_ACTIVE < NR_INACTIVE), check if:
+ *   SP < NR_ACTIVE
  *
+ * - If the ACTIVE LRU is high (NR_ACTIVE >= NR_INACTIVE), check if:
+ *   SP < NR_INACTIVE
  *
  * Refaulting inactive pages
  *
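The re-activation rule described in the updated comment reduces to comparing SP against the smaller of the two LRU parts. Below is a minimal userspace sketch of that rule, not kernel code; the function name and plain integer parameters are illustrative:

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Sketch of the refault re-activation rule: activate when SP is below
 * the smaller of the two LRU parts, which throttles re-activation
 * when ACTIVE >= INACTIVE as described in the comment above.
 */
static bool should_activate(unsigned long sp,
                            unsigned long nr_inactive,
                            unsigned long nr_active)
{
    if (nr_active < nr_inactive)   /* ACTIVE LRU is low */
        return sp < nr_active;
    return sp < nr_inactive;       /* ACTIVE LRU is high */
}
```

Note that the two branches together are equivalent to `sp < min(nr_active, nr_inactive)`; writing them out separately mirrors how the comment presents the two cases.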
From: Kairui Song
Subject: [RFC PATCH v2 3/5] workingset: simplify lru_gen_test_recent
Date: Wed, 13 Sep 2023 02:45:09 +0800
Message-ID: <20230912184511.49333-4-ryncsn@gmail.com>
From: Kairui Song

Simplify the code and move some common paths into the caller, preparing for the following commits.

Signed-off-by: Kairui Song
---
 mm/workingset.c | 30 +++++++++++++-----------------
 1 file changed, 13 insertions(+), 17 deletions(-)

diff --git a/mm/workingset.c b/mm/workingset.c
index b5c565a5a959..ff7587456b7f 100644
--- a/mm/workingset.c
+++ b/mm/workingset.c
@@ -345,42 +345,38 @@ static void *lru_gen_eviction(struct folio *folio)
  * Tests if the shadow entry is for a folio that was recently evicted.
  * Fills in @lruvec, @token, @workingset with the values unpacked from shadow.
  */
-static bool lru_gen_test_recent(void *shadow, bool file, struct lruvec **lruvec,
-				unsigned long *token, bool *workingset)
+static bool lru_gen_test_recent(struct lruvec *lruvec, bool file,
+				unsigned long token)
 {
-	int memcg_id;
 	unsigned long min_seq;
-	struct mem_cgroup *memcg;
-	struct pglist_data *pgdat;
 
-	unpack_shadow(shadow, &memcg_id, &pgdat, token, workingset);
-
-	memcg = mem_cgroup_from_id(memcg_id);
-	*lruvec = mem_cgroup_lruvec(memcg, pgdat);
-
-	min_seq = READ_ONCE((*lruvec)->lrugen.min_seq[file]);
-	return (*token >> LRU_REFS_WIDTH) == (min_seq & (EVICTION_MASK >> LRU_REFS_WIDTH));
+	min_seq = READ_ONCE(lruvec->lrugen.min_seq[file]);
+	return (token >> LRU_REFS_WIDTH) == (min_seq & (EVICTION_MASK >> LRU_REFS_WIDTH));
 }
 
 static void lru_gen_refault(struct folio *folio, void *shadow)
 {
+	int memcgid;
 	bool recent;
-	int hist, tier, refs;
 	bool workingset;
 	unsigned long token;
+	int hist, tier, refs;
 	struct lruvec *lruvec;
+	struct pglist_data *pgdat;
 	struct lru_gen_folio *lrugen;
 	int type = folio_is_file_lru(folio);
 	int delta = folio_nr_pages(folio);
 
 	rcu_read_lock();
 
-	recent = lru_gen_test_recent(shadow, type, &lruvec, &token, &workingset);
+	unpack_shadow(shadow, &memcgid, &pgdat, &token, &workingset);
+	lruvec = mem_cgroup_lruvec(mem_cgroup_from_id(memcgid), pgdat);
 	if (lruvec != folio_lruvec(folio))
 		goto unlock;
 
 	mod_lruvec_state(lruvec, WORKINGSET_REFAULT_BASE + type, delta);
 
+	recent = lru_gen_test_recent(lruvec, type, token);
 	if (!recent)
 		goto unlock;
 
@@ -480,9 +476,6 @@ bool workingset_test_recent(void *shadow, bool file, bool *workingset)
 	struct pglist_data *pgdat;
 	unsigned long eviction;
 
-	if (lru_gen_enabled())
-		return lru_gen_test_recent(shadow, file, &eviction_lruvec, &eviction, workingset);
-
 	unpack_shadow(shadow, &memcgid, &pgdat, &eviction, workingset);
 
 	/*
@@ -506,6 +499,9 @@ bool workingset_test_recent(void *shadow, bool file, bool *workingset)
 		return false;
 
 	eviction_lruvec = mem_cgroup_lruvec(eviction_memcg, pgdat);
 
+	if (lru_gen_enabled())
+		return lru_gen_test_recent(eviction_lruvec, file, eviction);
+
 	return lru_refault(eviction_memcg, eviction_lruvec, eviction, file,
 			   EVICTION_BITS, bucket_order);
 }
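After this simplification, the recency check itself is a pure comparison of the shadow token's sequence bits against the current min_seq, with unpacking and lruvec lookup hoisted into the caller. A standalone sketch of that comparison, using made-up constants (the real LRU_REFS_WIDTH and EVICTION_MASK depend on the kernel configuration):

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative stand-ins for the kernel's config-dependent values. */
#define LRU_REFS_WIDTH  2
#define EVICTION_MASK   0x3fffffffUL

/*
 * The token carries a snapshot of min_seq (taken at eviction time) in
 * its upper bits; the shadow entry is "recent" iff that snapshot still
 * matches the current, truncated min_seq.
 */
static bool test_recent(unsigned long token, unsigned long min_seq)
{
    return (token >> LRU_REFS_WIDTH) ==
           (min_seq & (EVICTION_MASK >> LRU_REFS_WIDTH));
}
```

The design point of the refactor is visible here: with no `unpack_shadow()` or memcg lookup inside, the helper can be called with an lruvec the caller has already resolved and validated.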
From: Kairui Song
Subject: [RFC PATCH v2 4/5] lru_gen: convert avg_total and avg_refaulted to atomic
Date: Wed, 13 Sep 2023 02:45:10 +0800
Message-ID: <20230912184511.49333-5-ryncsn@gmail.com>
From: Kairui Song

No functional change; prepare for a later patch.

Signed-off-by: Kairui Song
---
 include/linux/mmzone.h |  4 ++--
 mm/vmscan.c            | 16 ++++++++--------
 2 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 4106fbc5b4b3..d944987b67d3 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -425,9 +425,9 @@ struct lru_gen_folio {
 	/* the multi-gen LRU sizes, eventually consistent */
 	long nr_pages[MAX_NR_GENS][ANON_AND_FILE][MAX_NR_ZONES];
 	/* the exponential moving average of refaulted */
-	unsigned long avg_refaulted[ANON_AND_FILE][MAX_NR_TIERS];
+	atomic_long_t avg_refaulted[ANON_AND_FILE][MAX_NR_TIERS];
 	/* the exponential moving average of evicted+protected */
-	unsigned long avg_total[ANON_AND_FILE][MAX_NR_TIERS];
+	atomic_long_t avg_total[ANON_AND_FILE][MAX_NR_TIERS];
 	/* the first tier doesn't need protection, hence the minus one */
 	unsigned long protected[NR_HIST_GENS][ANON_AND_FILE][MAX_NR_TIERS - 1];
 	/* can be modified without holding the LRU lock */
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 3f4de75e5186..82acc1934c86 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -3705,9 +3705,9 @@ static void read_ctrl_pos(struct lruvec *lruvec, int type, int tier, int gain,
 	struct lru_gen_folio *lrugen = &lruvec->lrugen;
 	int hist = lru_hist_from_seq(lrugen->min_seq[type]);
 
-	pos->refaulted = lrugen->avg_refaulted[type][tier] +
+	pos->refaulted = atomic_long_read(&lrugen->avg_refaulted[type][tier]) +
 			 atomic_long_read(&lrugen->refaulted[hist][type][tier]);
-	pos->total = lrugen->avg_total[type][tier] +
+	pos->total = atomic_long_read(&lrugen->avg_total[type][tier]) +
 		     atomic_long_read(&lrugen->evicted[hist][type][tier]);
 	if (tier)
 		pos->total += lrugen->protected[hist][type][tier - 1];
@@ -3732,15 +3732,15 @@ static void reset_ctrl_pos(struct lruvec *lruvec, int type, bool carryover)
 	if (carryover) {
 		unsigned long sum;
 
-		sum = lrugen->avg_refaulted[type][tier] +
+		sum = atomic_long_read(&lrugen->avg_refaulted[type][tier]) +
 		      atomic_long_read(&lrugen->refaulted[hist][type][tier]);
-		WRITE_ONCE(lrugen->avg_refaulted[type][tier], sum / 2);
+		atomic_long_set(&lrugen->avg_refaulted[type][tier], sum / 2);
 
-		sum = lrugen->avg_total[type][tier] +
+		sum = atomic_long_read(&lrugen->avg_total[type][tier]) +
 		      atomic_long_read(&lrugen->evicted[hist][type][tier]);
 		if (tier)
 			sum += lrugen->protected[hist][type][tier - 1];
-		WRITE_ONCE(lrugen->avg_total[type][tier], sum / 2);
+		atomic_long_set(&lrugen->avg_total[type][tier], sum / 2);
 	}
 
 	if (clear) {
@@ -5885,8 +5885,8 @@ static void lru_gen_seq_show_full(struct seq_file *m, struct lruvec *lruvec,
 		if (seq == max_seq) {
 			s = "RT ";
-			n[0] = READ_ONCE(lrugen->avg_refaulted[type][tier]);
-			n[1] = READ_ONCE(lrugen->avg_total[type][tier]);
+			n[0] = atomic_long_read(&lrugen->avg_refaulted[type][tier]);
+			n[1] = atomic_long_read(&lrugen->avg_total[type][tier]);
 		} else if (seq == min_seq[type] || NR_HIST_GENS > 1) {
 			s = "rep";
 			n[0] = atomic_long_read(&lrugen->refaulted[hist][type][tier]);
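The patch swaps plain `unsigned long` fields accessed via READ_ONCE()/WRITE_ONCE() for `atomic_long_t`, so the moving averages can later be updated from contexts that do not hold the LRU lock. A userspace analogue of the carry-over in reset_ctrl_pos(), using C11 atomics in place of the kernel's atomic_long API (the helper name is illustrative, not the patch's code):

```c
#include <assert.h>
#include <stdatomic.h>

/* Userspace stand-in for the kernel's atomic_long_t field. */
static atomic_long avg_refaulted;

/*
 * Carry over half of the accumulated count into the moving average:
 * avg = (avg + refaulted_this_interval) / 2, as reset_ctrl_pos() does.
 * atomic_load/atomic_store correspond to atomic_long_read/atomic_long_set.
 */
static void carry_over(long refaulted_this_interval)
{
    long sum = atomic_load(&avg_refaulted) + refaulted_this_interval;
    atomic_store(&avg_refaulted, sum / 2);
}
```

Note the load and store are individually atomic but the read-modify-write as a whole is not a single atomic operation; like the kernel code, this tolerates the occasional race because the averages only feed a heuristic, while still guaranteeing readers never see a torn value.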
From: Kairui Song
Subject: [RFC PATCH v2 5/5] workingset, lru_gen: apply refault-distance based re-activation
Date: Wed, 13 Sep 2023 02:45:11 +0800
Message-ID: <20230912184511.49333-6-ryncsn@gmail.com>
HlRSZJNmiqBO/bCH0nJauaU9cu1YC6E4Ri2AFvSPBuo7sdoJK+RgjcQVgo5y7K5OwBPTE4aMsnpRXjP+8CW0VfbYTdZjosV3pFT31ntngjQdRgDAGuqLemBkoqpilNHb3QqVpbivEOqQeaJ4GfASos1B+NacIj/v4CBAatbQEgGtVL6ihTYnIhJ6n4EoKc/VEyE81D/bRFPJJRwyLblUo/mNKwtpcOZ7LmHvEoo7uHNBI6e9l7Wdy7B5Q2xdZPgNaOHd2qGhFzYRiqCGqCB6/MRp5whJstAvebD4H3HlD010FpHGeJBtTByLDIO1hBa6oDGRSP4Or7d0ke99rgpwGkc6UniKF/qrKkCUH1AhYOzUVLlX8u/5pmGwOuTQ5Q2TD5M/NnUDEvEw6N8Wib2TMteuXSuEvd/4ZLP2G0b9nv2JoQ4WaAUPuc19DDw+G4sXSa/ygiGrEhGev8fRA9cji16k32BEgSkPDPj7h1MEMkLnFziuTi4mg+0njk0Qglp79lclBP2HGEgrTat+1+D3dA43XAcLCQySkSQ92hzA0PUfUIRAVChYAbKBf6TOyzQnKNaIFC3jUXlW2QiHi7XaljE1+A+XdBwg0cRyGzT/W5S/l9aO+63QI9fsv+c1AB+qgaUw7Uuoy9jrnfnSIEgNZHJJDUa1lzNBq+GaDa4rHyzjW9k4h4doPKopMg8svYW1lfRMQihfptPbeQHg= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Kairui Song I noticed MGLRU not working very well on certain workflows, which is observed on some heavily stressed databases. That is when the file page workingset size exceeds total memory, and the access distance (the left-shift time of a page before it gets activated, considering LRU starts from right) of file pages also larger than total memory. All file pages are stuck on the oldest generation and getting read-in then evicted permutably. Despite anon pages being idle, they never get aged. PID controller didn't kickin until there are some minor access pattern changes. And file pages are not promoted or reused. Even though the memory can't cover the whole workingset, the refault-distance based re-activation can help hold part of the workingset in-memory to help reduce the IO workload significantly. So apply it for MGLRU as well. The updated refault-distance model fits well for MGLRU in most cases, if we just consider the last two generation as the inactive LRU and the first two generations as active LRU. 
Some adjustments are done to fit the logic better, and the
refault-distance now contributes to page tiering and the PID refault
detection of MGLRU:

- If a tier-0 page has a qualified refault-distance, promote it to a
  higher tier and send it to the second-oldest generation.
- If a tier >= 1 page has a qualified refault-distance, mark it as
  active and send it to the youngest generation.
- Increase the reference count of every page that has a qualified
  refault-distance, and increase the PID-controlled refault rate of
  the updated tier.

The following benchmark shows the improvement. To simulate the
workload, I set up a 3-replica mongodb cluster on a 32G VM, each
replica using 5 GB of cache and 10 GB of oplog. The benchmark is done
using https://github.com/apavlo/py-tpcc.git, modified to run the
STOCK_LEVEL query only, to simulate a slow query and get a stable
result.

Before the patch (with 10G swap; the result doesn't change whether
swap is on or not):

$ tpcc.py --config=mongodb.config mongodb --duration=900 --warehouses=500 --clients=30
==================================================================
Execution Results after 904 seconds
------------------------------------------------------------------
                  Executed        Time (µs)       Rate
  STOCK_LEVEL     503             27150226136.4   0.02 txn/s
------------------------------------------------------------------
  TOTAL           503             27150226136.4   0.02 txn/s

$ cat /proc/vmstat | grep working
workingset_nodes 53391
workingset_refault_anon 0
workingset_refault_file 23856735
workingset_activate_anon 0
workingset_activate_file 23845737
workingset_restore_anon 0
workingset_restore_file 18280692
workingset_nodereclaim 1024

$ free -m
               total        used        free      shared  buff/cache   available
Mem:           31837        6752         379          23       24706       24607
Swap:          10239           0       10239

After the patch (with 10G swap on the same disk; similar result using
ZRAM):

$ tpcc.py --config=mongodb.config mongodb --duration=900 --warehouses=500 --clients=30
==================================================================
Execution Results after 903 seconds
------------------------------------------------------------------
                  Executed        Time (µs)       Rate
  STOCK_LEVEL     2575            27094953498.8   0.10 txn/s
------------------------------------------------------------------
  TOTAL           2575            27094953498.8   0.10 txn/s

$ cat /proc/vmstat | grep working
workingset_nodes 78249
workingset_refault_anon 10139
workingset_refault_file 23001863
workingset_activate_anon 7238
workingset_activate_file 6718032
workingset_restore_anon 7432
workingset_restore_file 6719406
workingset_nodereclaim 9747

$ free -m
               total        used        free      shared  buff/cache   available
Mem:           31837        7376         320           3       24140       24014
Swap:          10239        1662        8577

The performance is 5x better than before, and the idle anon pages now
get swapped out as expected. Testing with lower stress also shows an
improvement.

I also checked the benchmark with memtier/memcached and fio, using a
setup similar to the one in commit ac35a4902374 but scaled down to fit
my test environment:

memtier test (16G ramdisk as swap, 2G memcg limit, VM on an EPYC 7K62):
  memcached -u nobody -m 16384 -s /tmp/memcached.socket -a 0766 \
    -t 12 -B binary &
  memtier_benchmark -S /tmp/memcached.socket -P memcache_binary -n allkeys \
    --key-minimum=1 --key-maximum=24000000 --key-pattern=P:P -c 1 \
    -t 12 --ratio 1:0 --pipeline 8 -d 2000 -x 6

fio test (16G ramdisk on /mnt, 4G memcg limit, VM on an EPYC 7K62):
  fio -name=refault --numjobs=14 --directory=/mnt --size=1024m \
    --buffered=1 --ioengine=io_uring --iodepth=128 \
    --iodepth_batch_submit=32 --iodepth_batch_complete=32 \
    --rw=randread --random_distribution=random --norandommap \
    --time_based --ramp_time=5m --runtime=5m --group_reporting

mysql test (15G buffer pool with 16G memcg limit, VM on an EPYC 7K62):
  sysbench /usr/share/sysbench/oltp_read_only.lua \
    --tables=48 --table-size=2000000 --threads=32 --time=1800

Before this patch:
  memtier: 379329.77 op/s
  fio:     5786.8k iops
  mysql:   150190.43 qps

After this patch:
  memtier: 373877.41 op/s
  fio:     5805.5k iops
  mysql:
  150220.93 qps

The tests look OK, except for a bit of extra overhead introduced by
the added atomic operations; there seems to be no drop in LRU
accuracy.

Signed-off-by: Kairui Song
---
 mm/workingset.c | 78 +++++++++++++++++++++++++++++++++----------------
 1 file changed, 53 insertions(+), 25 deletions(-)

diff --git a/mm/workingset.c b/mm/workingset.c
index ff7587456b7f..1fa336054528 100644
--- a/mm/workingset.c
+++ b/mm/workingset.c
@@ -175,6 +175,7 @@
 			 MEM_CGROUP_ID_SHIFT)
 #define EVICTION_BITS	(BITS_PER_LONG - (EVICTION_SHIFT))
 #define EVICTION_MASK	(~0UL >> EVICTION_SHIFT)
+#define LRU_GEN_EVICTION_BITS	(EVICTION_BITS - LRU_REFS_WIDTH - LRU_GEN_WIDTH)
 
 /*
  * Eviction timestamps need to be able to cover the full range of
@@ -185,6 +186,7 @@
  * evictions into coarser buckets by shaving off lower timestamp bits.
  */
 static unsigned int bucket_order __read_mostly;
+static unsigned int lru_gen_bucket_order __read_mostly;
 
 static void *pack_shadow(int memcgid, pg_data_t *pgdat, unsigned long eviction,
 			 bool workingset)
@@ -240,7 +242,7 @@ static inline bool lru_refault(struct mem_cgroup *memcg,
 			       int bits, int bucket_order)
 {
 	unsigned long refault, distance;
-	unsigned long workingset, active, inactive, inactive_file, inactive_anon = 0;
+	unsigned long active, inactive_file, inactive_anon = 0;
 
 	eviction <<= bucket_order;
 	refault = atomic_long_read(&lruvec->nonresident_age);
@@ -280,7 +282,7 @@ static inline bool lru_refault(struct mem_cgroup *memcg,
 	 * active pages with one time refaulted page may not be a good idea.
 	 */
 	if (active >= (inactive_anon + inactive_file))
-		return distance < inactive_anon + inactive_file;
+		return distance < (inactive_anon + inactive_file);
 	else
 		return distance < active + (file ? inactive_anon : inactive_file);
 }
@@ -333,10 +335,14 @@ static void *lru_gen_eviction(struct folio *folio)
 	lruvec = mem_cgroup_lruvec(memcg, pgdat);
 	lrugen = &lruvec->lrugen;
 	min_seq = READ_ONCE(lrugen->min_seq[type]);
+	token = (min_seq << LRU_REFS_WIDTH) | max(refs - 1, 0);
+	token <<= LRU_GEN_EVICTION_BITS;
+	token |= lru_eviction(lruvec, LRU_GEN_EVICTION_BITS, lru_gen_bucket_order);
 
 	hist = lru_hist_from_seq(min_seq);
 	atomic_long_add(delta, &lrugen->evicted[hist][type][tier]);
+	workingset_age_nonresident(lruvec, folio_nr_pages(folio));
 
 	return pack_shadow(mem_cgroup_id(memcg), pgdat, token, refs);
 }
@@ -351,44 +357,55 @@ static bool lru_gen_test_recent(struct lruvec *lruvec, bool file,
 	unsigned long min_seq;
 
 	min_seq = READ_ONCE(lruvec->lrugen.min_seq[file]);
+	token >>= LRU_GEN_EVICTION_BITS;
 	return (token >> LRU_REFS_WIDTH) == (min_seq & (EVICTION_MASK >> LRU_REFS_WIDTH));
 }
 
 static void lru_gen_refault(struct folio *folio, void *shadow)
 {
 	int memcgid;
-	bool recent;
+	bool refault;
 	bool workingset;
 	unsigned long token;
+	bool recent = false;
+	int refault_tier = 0;
 	int hist, tier, refs;
 	struct lruvec *lruvec;
+	struct mem_cgroup *memcg;
 	struct pglist_data *pgdat;
 	struct lru_gen_folio *lrugen;
 	int type = folio_is_file_lru(folio);
 	int delta = folio_nr_pages(folio);
 
-	rcu_read_lock();
-
 	unpack_shadow(shadow, &memcgid, &pgdat, &token, &workingset);
-	lruvec = mem_cgroup_lruvec(mem_cgroup_from_id(memcgid), pgdat);
-	if (lruvec != folio_lruvec(folio))
-		goto unlock;
+	memcg = mem_cgroup_from_id(memcgid);
+	lruvec = mem_cgroup_lruvec(memcg, pgdat);
+	/* memcg can be NULL, go through lruvec */
+	memcg = lruvec_memcg(lruvec);
 
 	mod_lruvec_state(lruvec, WORKINGSET_REFAULT_BASE + type, delta);
-
-	recent = lru_gen_test_recent(lruvec, type, token);
-	if (!recent)
-		goto unlock;
+	refault = lru_refault(memcg, lruvec, token, type,
+			      LRU_GEN_EVICTION_BITS, lru_gen_bucket_order);
+	if (lruvec == folio_lruvec(folio))
+		recent = lru_gen_test_recent(lruvec, type, token);
+	if (!recent && !refault)
+		return;
 
 	lrugen = &lruvec->lrugen;
-
 	hist = lru_hist_from_seq(READ_ONCE(lrugen->min_seq[type]));
 	/* see the comment in folio_lru_refs() */
+	token >>= LRU_GEN_EVICTION_BITS;
 	refs = (token & (BIT(LRU_REFS_WIDTH) - 1)) + workingset;
 	tier = lru_tier_from_refs(refs);
-
-	atomic_long_add(delta, &lrugen->refaulted[hist][type][tier]);
-	mod_lruvec_state(lruvec, WORKINGSET_ACTIVATE_BASE + type, delta);
+	refault_tier = tier;
+
+	if (refault) {
+		if (refs)
+			folio_set_active(folio);
+		if (refs != BIT(LRU_REFS_WIDTH))
+			refault_tier = lru_tier_from_refs(refs + 1);
+		mod_lruvec_state(lruvec, WORKINGSET_ACTIVATE_BASE + type, delta);
+	}
 
 	/*
 	 * Count the following two cases as stalls:
@@ -397,12 +414,17 @@ static void lru_gen_refault(struct folio *folio, void *shadow)
 	 * 2. For pages accessed multiple times through file descriptors,
 	 *    numbers of accesses might have been out of the range.
 	 */
-	if (lru_gen_in_fault() || refs == BIT(LRU_REFS_WIDTH)) {
+	if (refault || lru_gen_in_fault() || refs == BIT(LRU_REFS_WIDTH)) {
 		folio_set_workingset(folio);
 		mod_lruvec_state(lruvec, WORKINGSET_RESTORE_BASE + type, delta);
 	}
-unlock:
-	rcu_read_unlock();
+
+	if (recent && refault_tier == tier) {
+		atomic_long_add(delta, &lrugen->refaulted[hist][type][tier]);
+	} else {
+		atomic_long_add(delta, &lrugen->avg_total[type][refault_tier]);
+		atomic_long_add(delta, &lrugen->avg_refaulted[type][refault_tier]);
+	}
 }
 
 #else /* !CONFIG_LRU_GEN */
 
@@ -524,16 +546,15 @@ void workingset_refault(struct folio *folio, void *shadow)
 	bool workingset;
 	long nr;
 
-	if (lru_gen_enabled()) {
-		lru_gen_refault(folio, shadow);
-		return;
-	}
-
 	/* Flush stats (and potentially sleep) before holding RCU read lock */
 	mem_cgroup_flush_stats_ratelimited();
 
-	rcu_read_lock();
+	if (lru_gen_enabled()) {
+		lru_gen_refault(folio, shadow);
+		goto out;
+	}
+
 	/*
 	 * The activation decision for this folio is made at the level
 	 * where the eviction occurred, as that is where the LRU order
@@ -780,6 +801,13 @@ static int __init workingset_init(void)
 	pr_info("workingset: timestamp_bits=%d max_order=%d bucket_order=%u\n",
 		EVICTION_BITS, max_order, bucket_order);
 
+#ifdef CONFIG_LRU_GEN
+	if (max_order > LRU_GEN_EVICTION_BITS)
+		lru_gen_bucket_order = max_order - LRU_GEN_EVICTION_BITS;
+	pr_info("workingset: lru_gen_timestamp_bits=%d lru_gen_bucket_order=%u\n",
+		LRU_GEN_EVICTION_BITS, lru_gen_bucket_order);
+#endif
+
 	ret = prealloc_shrinker(&workingset_shadow_shrinker, "mm-shadow");
 	if (ret)
 		goto err;