From: "Aneesh Kumar K.V"
To: linux-mm@kvack.org, akpm@linux-foundation.org
Cc: Dave Hansen, Johannes Weiner, Matthew Wilcox, Mel Gorman, Yu Zhao, Wei Xu, Guru Anbalagane, "Aneesh Kumar K.V"
Subject: [RFC PATCH v1 0/7] Support arch-specific page aging mechanism
Date: Sun, 2 Apr 2023 16:12:33 +0530
Message-Id: <20230402104240.1734931-1-aneesh.kumar@linux.ibm.com>
X-Patchwork-Id: 13197362

Architectures like
powerpc support a page access count mechanism, which can be used to
better identify hot and cold pages in the system. POWER10 supports a
32-bit page access count that is incremented on page access and
decremented based on time decay. The count is maintained through
physical address filtering, so it should capture accesses made both
through the page table (mmap) and through the read/write syscalls.

This patch series updates multi-gen LRU to use this page access count
instead of the page table reference bit to classify a page into a
generation. Pages are classified into generations during the sorting
phase of reclaim. Currently the sorting phase uses the generation
details stored in page flags; with this change, we can avoid using page
flags for storing the generation. That will free the 3 bits in page
flags used to store the generation. Since the page access counting
mechanism can also count accesses via read/write, we can look at
avoiding the tier index in page flags. That should free the 2 bits in
page flags used for REFS (this is not done in this patch series).

I also added a patch that did the below

@@ -5243,7 +5243,8 @@ static int evict_folios(struct lruvec *lruvec, struct scan_control *sc, int swap
 	if (list_empty(&list))
 		return scanned;
 retry:
-	reclaimed = shrink_folio_list(&list, pgdat, sc, &stat, false);
+	reclaimed = shrink_folio_list(&list, pgdat, sc,
+				      &stat, arch_supports_page_access_count());
 	sc->nr_reclaimed += reclaimed;

The performance did improve, but it also resulted in a large increase in
workingset_refault_anon. I think this is because it takes some minimal
number of accesses to classify a page into a younger generation, and we
can have high page refaults during that window.

PATCH 2 did result in some improvements on powerpc because it removes
all the additional code that is not used in page classification.

memcached:
Patch details                   Total Ops/sec
mglru                           160821
PATCH 2                         164572

mongodb:
Patch details                   Throughput (Ops/sec)
mglru                           92987
PATCH 2                         93740

Enabling the architecture-supported page access count does impact
workload performance, since updating the access count involves some
memory access overhead. Another challenge with the page access count is
determining relative hotness between pages. I tried two methods,
density-based clustering and k-means clustering, to classify pages into
LRU generations based on sampled hotness. Doing more work during page
classification increases lock contention on lru_lock and hence hurts
performance.

memcached:
Patch details                   Total Ops/sec
arch page access count          161940
avoid folio_check_reference     171631 (but refault count increases from 2606765 -> 7793482)

mongodb:
Patch details                   Throughput (Ops/sec)
arch page access count          92533
avoid folio_check_reference     91105 (refault: 828951 -> 4592539)

The patch series does show that using the page access count does not
result in any regression and can keep the code simpler w.r.t. the
different feedback loops used during multi-gen LRU reclaim. This also
saves some bits in page->flags. It was also observed that the overhead
of counting page accesses is not that high and can be mitigated by
further tuning of the page generation classification logic. This also
enables us to start looking at using the page access count in other
parts of the Linux kernel, such as page promotion. I haven't been able
to measure the impact on page promotion yet due to hardware
availability.
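
For readers unfamiliar with the k-means approach mentioned above, here
is a minimal user-space sketch of one-dimensional k-means over sampled
access counts. It is not the code from this series. The names
classify_hotness, NR_CLUSTERS and MAX_ITERS are hypothetical, as is the
assumption that cluster 0 is the coldest group and that each cluster
would map to one LRU generation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NR_CLUSTERS 4	/* hypothetical: stands in for the number of generations */
#define MAX_ITERS   32	/* hypothetical bound on refinement passes */

static double distance(double a, double b)
{
	return a > b ? a - b : b - a;
}

/*
 * One-dimensional k-means over sampled access counts.
 * On return, cluster[i] is 0 (coldest) .. NR_CLUSTERS - 1 (hottest)
 * for counts[i].
 */
static void classify_hotness(const uint32_t *counts, size_t n, int *cluster)
{
	double centroid[NR_CLUSTERS];
	uint32_t min, max;
	size_t i;
	int k, iter;

	assert(n > 0);

	min = max = counts[0];
	for (i = 1; i < n; i++) {
		if (counts[i] < min)
			min = counts[i];
		if (counts[i] > max)
			max = counts[i];
	}

	/* Spread the initial centroids evenly over [min, max], in sorted order. */
	for (k = 0; k < NR_CLUSTERS; k++)
		centroid[k] = min + (double)(max - min) * k / (NR_CLUSTERS - 1);

	for (iter = 0; iter < MAX_ITERS; iter++) {
		double sum[NR_CLUSTERS] = { 0 };
		size_t members[NR_CLUSTERS] = { 0 };
		int changed = 0;

		/* Assignment step: each sample joins its nearest centroid. */
		for (i = 0; i < n; i++) {
			int best = 0;
			double best_d = distance(counts[i], centroid[0]);

			for (k = 1; k < NR_CLUSTERS; k++) {
				double d = distance(counts[i], centroid[k]);

				if (d < best_d) {
					best_d = d;
					best = k;
				}
			}
			if (iter == 0 || cluster[i] != best)
				changed = 1;
			cluster[i] = best;
			sum[best] += counts[i];
			members[best]++;
		}

		/* Update step: move each non-empty centroid to its members' mean. */
		for (k = 0; k < NR_CLUSTERS; k++)
			if (members[k])
				centroid[k] = sum[k] / members[k];

		if (!changed)
			break;
	}
}
```

With the samples {120, 1, 42, 80, 0, 118, 40, 2, 79} this assigns the
clusters {3, 0, 1, 2, 0, 3, 1, 0, 2}. Each pass visits every sample once
per centroid, which gives a rough idea of why doing this work during
classification can lengthen the time spent under lru_lock.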

Aneesh Kumar K.V (7):
  mm: Move some code around so that next patch is simpler
  mm: Don't build multi-gen LRU page table walk code on architecture not supported
  mm: multi-gen LRU: avoid using generation stored in page flags for generation
  mm: multi-gen LRU: support different page aging mechanism
  powerpc/mm: Add page access count support
  powerpc/mm: Clear page access count on allocation
  mm: multi-gen LRU: Shrink folio list without checking for page table reference

 arch/Kconfig                          |   3 +
 arch/arm64/Kconfig                    |   1 +
 arch/powerpc/Kconfig                  |  10 +
 arch/powerpc/include/asm/hca.h        |  49 ++++
 arch/powerpc/include/asm/page.h       |   5 +
 arch/powerpc/include/asm/page_aging.h |  35 +++
 arch/powerpc/mm/Makefile              |   1 +
 arch/powerpc/mm/hca.c                 | 288 ++++++++++++++++++++
 arch/x86/Kconfig                      |   1 +
 include/linux/memcontrol.h            |   2 +-
 include/linux/mm_inline.h             |  47 +---
 include/linux/mm_types.h              |   8 +-
 include/linux/mmzone.h                |  15 +-
 include/linux/page_aging.h            |  43 +++
 include/linux/swap.h                  |   2 +-
 kernel/fork.c                         |   2 +-
 mm/Kconfig                            |   4 +
 mm/memcontrol.c                       |   2 +-
 mm/rmap.c                             |   4 +-
 mm/vmscan.c                           | 372 ++++++++++++++++++++++----
 20 files changed, 780 insertions(+), 114 deletions(-)
 create mode 100644 arch/powerpc/include/asm/hca.h
 create mode 100644 arch/powerpc/include/asm/page_aging.h
 create mode 100644 arch/powerpc/mm/hca.c
 create mode 100644 include/linux/page_aging.h