From patchwork Fri Sep 29 11:44:12 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ryan Roberts X-Patchwork-Id: 13404125 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id AB8D1E810D5 for ; Fri, 29 Sep 2023 11:44:39 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 203968D00ED; Fri, 29 Sep 2023 07:44:39 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 1B4E58D0023; Fri, 29 Sep 2023 07:44:39 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 006AE8D00ED; Fri, 29 Sep 2023 07:44:38 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0013.hostedemail.com [216.40.44.13]) by kanga.kvack.org (Postfix) with ESMTP id E78CD8D0023 for ; Fri, 29 Sep 2023 07:44:38 -0400 (EDT) Received: from smtpin23.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay06.hostedemail.com (Postfix) with ESMTP id BE18FB463F for ; Fri, 29 Sep 2023 11:44:38 +0000 (UTC) X-FDA: 81289452636.23.658607B Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by imf16.hostedemail.com (Postfix) with ESMTP id 1D81918001D for ; Fri, 29 Sep 2023 11:44:36 +0000 (UTC) Authentication-Results: imf16.hostedemail.com; dkim=none; dmarc=pass (policy=none) header.from=arm.com; spf=pass (imf16.hostedemail.com: domain of ryan.roberts@arm.com designates 217.140.110.172 as permitted sender) smtp.mailfrom=ryan.roberts@arm.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1695987877; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=wW8PmLeN+NODDjeW+k+loXBT9eKUllZt4ss/4twQx6U=; b=DMRf/jz0bmT1zHLc6DbqEzh0iEOYCJ7VMgpn+NGRYOdRB5lHw31DuGArwJXyM8i5JgVVMV 0GFcEDrZW/UBuL6gQ/hYo3d/Qzk3j1kP8/ePQkxtWcMgF6fxyfELBwhDPBwB0K+XZtFnjQ csbtPGAHqavWdxTRGUhREnFN5d3leTk= ARC-Authentication-Results: i=1; imf16.hostedemail.com; dkim=none; dmarc=pass (policy=none) header.from=arm.com; spf=pass (imf16.hostedemail.com: domain of ryan.roberts@arm.com designates 217.140.110.172 as permitted sender) smtp.mailfrom=ryan.roberts@arm.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1695987877; a=rsa-sha256; cv=none; b=OciZf3tIbe59e0YvTASbJgPhLFrtUcWGM6ZHhizbotCN0NU0J9HvWt6pD1Go47n2H+FXNX a3UyyHsP07km7m5F63S6DXx4T6m3m9qniKFqBHa/UaOBHDucZo0FH79khHjvVPjM6ryAgz QgP2InVoegjrSVEfMfdh5GCzo1KwYyU= Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 77FA4DA7; Fri, 29 Sep 2023 04:45:14 -0700 (PDT) Received: from e125769.cambridge.arm.com (e125769.cambridge.arm.com [10.1.196.26]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 94DAD3F59C; Fri, 29 Sep 2023 04:44:33 -0700 (PDT) From: Ryan Roberts To: Andrew Morton , Matthew Wilcox , Yin Fengwei , David Hildenbrand , Yu Zhao , Catalin Marinas , Anshuman Khandual , Yang Shi , "Huang, Ying" , Zi Yan , Luis Chamberlain , Itaru Kitayama , "Kirill A. 
Shutemov" , John Hubbard , David Rientjes , Vlastimil Babka , Hugh Dickins Cc: Ryan Roberts , linux-mm@kvack.org, linux-kernel@vger.kernel.org, linux-arm-kernel@lists.infradead.org Subject: [PATCH v6 1/9] mm: Allow deferred splitting of arbitrary anon large folios Date: Fri, 29 Sep 2023 12:44:12 +0100 Message-Id: <20230929114421.3761121-2-ryan.roberts@arm.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20230929114421.3761121-1-ryan.roberts@arm.com> References: <20230929114421.3761121-1-ryan.roberts@arm.com> MIME-Version: 1.0 X-Rspamd-Queue-Id: 1D81918001D X-Rspam-User: X-Rspamd-Server: rspam02 X-Stat-Signature: qu4i7wbaseqgj6mqe3f94ewyu4ppjoai X-HE-Tag: 1695987876-873634 X-HE-Meta: U2FsdGVkX1+avFHH2SjvYWvR2b5qJT01UhmeQGI6vBkAFGoz3wHmUhJWqk7XIjH/6tzULlyQ33aSP+D1T9pGqsnz3WPwfdK+LZPPeIaGXAhRMJUP1uhbR/h4VBPQsi1YKiN/4d6MxrEKuk076Z/WmuohEPJZmAVc1W/J6jXiEiBBpH8Nju/Vi7aiqAOhFfAGTW6WT8SYOAeVFs/0Rs4JzD0hvDHndqOIU5Z05gCsnQILcsDdpQ/36h7aSITFyCcSSauPHd/D1hG+lI9QhwC5F+IIqWGUrjL+D0qg+pGLNvFQChSHA3XLhRULiZ23tCebheL+0Jjj+G6Y5c41sxU+3FSuH6Lyco0NY+lQnBAVkf1b0SXrR6O6H9JuSiOeo5YVnNwzoU2GzoUcISBJJV7+WwMEpFiBMTG+w1y9ffynyxcmh08QtPEfAKMOVN8HtKK1PmubfLEOHSAWcIUE5iDrpH+l/Ovlgdq7XbuJQsqqRPg39yn+1PC5UeIt5K6TljdQWiLwqQDEV3JEQx7X3zQdU+3DquMakgipCpIhHigomJlnqfQDs0bd0DLhVbtI5I7lLwoaxq45GCQ0/J6eXYUqJfjp8h5y8mb3EHJQjxKzBflUXy4vvIMUptn+HINUSsP8Hrhhp8CpwRhHBG7JCeUHynXRRxuLGNLy94ctSjwH+kXwqnEBuE+sfmyLQlUDbIIlcFHCthYq7OvAsoWp9gdpnV5RoJPXQZZmhN3oZnyp/hj8OLYQyiTq3GN7aa+hLtVGXelU0VB6v44b45jr9UZD7K+0Zh+SACLOgVau6l6am8+EM5OTWUc7j6rBO9ksC6KOVdQIAvfbfrfBAelWbe3itb5yppKRHOPsSyd3VE2blpA8tJ7TscnONLVoy79TnCGlDSZL9lRi1odGzPuE/1Veq6mxNNQgL+x5vQBTfRpw4AKU9xbgtL5204nf7mfWUfp35EcM1Y7u3N6VbEbSsK2 h9jwIOAj YYDzxN2PDuoboTtRC70nOvASoEuEe2hUSb7cy8q7lrWmGd3S8svQs8crAJrkHfS+oyvXZUxlbjrnD6rS3B8TjKTnYgesMWfFIBxIf2AmlY2zWjozo+TBwGLJ+7bEe6yL4L4MARSSqKzZDeX65nveS8gmDJ+pzgB4rfrZV0WZ5mmw50bylIxdZA2tEP5CMzxRrw5Gdyd1xctZW7T+HieytCkKMPAJCOktQXAtg3tSgNjLecyPdbo7Z8oSM/mMGcB9e/8lEEdxWbdCufHYrWZLWmM+YX1z12R0nIZUDB9UXUicTI5KeLWbtlwPhKmTJq8JeSoOVBzolhyAhOyMoLprcnaY6Yjtc6+nJnNmS X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: In preparation for the introduction of large folios for anonymous memory, we would like to be able to split them when they have unmapped subpages, in order to free those unused pages under memory pressure. So remove the artificial requirement that the large folio needed to be at least PMD-sized. Reviewed-by: Yu Zhao Reviewed-by: Yin Fengwei Reviewed-by: Matthew Wilcox (Oracle) Reviewed-by: David Hildenbrand Signed-off-by: Ryan Roberts --- mm/rmap.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/mm/rmap.c b/mm/rmap.c index 9f795b93cf40..8600bd029acf 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -1446,11 +1446,11 @@ void page_remove_rmap(struct page *page, struct vm_area_struct *vma, __lruvec_stat_mod_folio(folio, idx, -nr); /* - * Queue anon THP for deferred split if at least one + * Queue anon large folio for deferred split if at least one * page of the folio is unmapped and at least one page * is still mapped. 
*/ - if (folio_test_pmd_mappable(folio) && folio_test_anon(folio)) + if (folio_test_large(folio) && folio_test_anon(folio)) if (!compound || nr < nr_pmdmapped) deferred_split_folio(folio); } From patchwork Fri Sep 29 11:44:13 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ryan Roberts X-Patchwork-Id: 13404126 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 29D67E810DF for ; Fri, 29 Sep 2023 11:44:42 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id A41128D00EE; Fri, 29 Sep 2023 07:44:41 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 9F1358D0023; Fri, 29 Sep 2023 07:44:41 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 8E0BA8D00EE; Fri, 29 Sep 2023 07:44:41 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0014.hostedemail.com [216.40.44.14]) by kanga.kvack.org (Postfix) with ESMTP id 80C3B8D0023 for ; Fri, 29 Sep 2023 07:44:41 -0400 (EDT) Received: from smtpin02.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay06.hostedemail.com (Postfix) with ESMTP id 57102B2E54 for ; Fri, 29 Sep 2023 11:44:41 +0000 (UTC) X-FDA: 81289452762.02.0C35522 Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by imf23.hostedemail.com (Postfix) with ESMTP id A3314140012 for ; Fri, 29 Sep 2023 11:44:39 +0000 (UTC) Authentication-Results: imf23.hostedemail.com; dkim=none; dmarc=pass (policy=none) header.from=arm.com; spf=pass (imf23.hostedemail.com: domain of ryan.roberts@arm.com designates 217.140.110.172 as permitted sender) smtp.mailfrom=ryan.roberts@arm.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1695987879; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=Mr+mjEqwfDoSUXJ6kY2GFEcxLi6J7er8Wf2qGr66jIc=; b=7HS7SHCkLh79SJhTPC1ydB39O0qtVYUBRETgj38hxaSi3gFNM8Z4jtVs+kqnzxmPCvsSgo fl2EADtjIm6SgutRsaqI7eIfWLtEgrGCyPDI4PosduMLcX8R0bJuMxTOYIC78Cy3T0i/Qy lhy8Dakg80yWsOPiPFCgDiEte9SNhf8= ARC-Authentication-Results: i=1; imf23.hostedemail.com; dkim=none; dmarc=pass (policy=none) header.from=arm.com; spf=pass (imf23.hostedemail.com: domain of ryan.roberts@arm.com designates 217.140.110.172 as permitted sender) smtp.mailfrom=ryan.roberts@arm.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1695987879; a=rsa-sha256; cv=none; b=LQLhPW/b5R0SgPONiS1r/fDUHqu11nYuKjd26rZbr/AgvXQUzdLvJKi4yTvYmackmuHDLV bijNXR8do8NJn1zt4Uut98xqqClxAdOfjH/k9JsefwlIQcdh4wkwlOB2E88DyBm8jQIp1o VWjefw+ewU+/UD683/vh58q0HhzEwhM= Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 3537F1007; Fri, 29 Sep 2023 04:45:17 -0700 (PDT) Received: from e125769.cambridge.arm.com (e125769.cambridge.arm.com [10.1.196.26]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 527553F59C; Fri, 29 Sep 2023 04:44:36 -0700 (PDT) From: Ryan Roberts To: Andrew Morton , Matthew Wilcox , Yin Fengwei , David Hildenbrand , Yu Zhao , Catalin Marinas , Anshuman Khandual , Yang Shi , "Huang, Ying" , Zi Yan , Luis Chamberlain , Itaru Kitayama , 
"Kirill A. Shutemov" , John Hubbard , David Rientjes , Vlastimil Babka , Hugh Dickins Cc: Ryan Roberts , linux-mm@kvack.org, linux-kernel@vger.kernel.org, linux-arm-kernel@lists.infradead.org Subject: [PATCH v6 2/9] mm: Non-pmd-mappable, large folios for folio_add_new_anon_rmap() Date: Fri, 29 Sep 2023 12:44:13 +0100 Message-Id: <20230929114421.3761121-3-ryan.roberts@arm.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20230929114421.3761121-1-ryan.roberts@arm.com> References: <20230929114421.3761121-1-ryan.roberts@arm.com> MIME-Version: 1.0 X-Rspamd-Queue-Id: A3314140012 X-Rspam-User: X-Rspamd-Server: rspam02 X-Stat-Signature: pcqhbx8y5h468x144g7djcf1n14jt4tj X-HE-Tag: 1695987879-709487 X-HE-Meta: U2FsdGVkX1+Z+IM7JEYTPckQQ0xRMQHxRlm4wuUGhoLNbMqU8FtkgFN/NSOIilkrlwgyn/5OxYJ9r37qu5FghWXZmb+zp26y5Nj/ANEVCFfrnSfsP+JIe7nbt+crHMkMpuUL1kjjohxBLPWKlFSJJXRXDbtPMAsRdi0FBzjady8jgdoSpTL3vNReTV5odxng5U1naQKzuNDa73VkHZMtKeqJ0dHLSkwTx8koFSTgOI32NwJZqway9IPXTd0NER1+x4kxUy5offQhpH+vQmX2IHcaHPicuyPeXP/bC1m1Ri8Hv3RqexK+lZz4vYkk5EGP3sJ7SJoRgoK7pBnWK75I7BUQYVX86i6mmIn7W9tCG+U7zhyeedqkUNer9YFHCwiGTKv5OzlZ54GcXp93icNKTiO9HufMxYmz+2Ng63Kn2O6ApgLm/qQhRK4S+yhP1x0UnsDxcAzvqvRkL4itgEeyVoNoqX9io9vzloheDmX/icOnaDhXNah2MJkDp6Kx3J9px5tiY4RVHJLSp9qNbFv4UOzocOvIKG8A80F+/0MpaD2fD9KJ1yj5cioFR0xd8S0E3gHBjpeqBZe+cOP5x1XMZ+LS+1W+XG2vH3NmFyoEpDn73YL20W4vuHt+E5nrVd/eeROJsqIo2TtDYJRjkaPLzWgtBIuVeZzhI8LJQ+ThN5cBZHkhsy4Q0qnhJaVj0IoXeKnjIV+HY5HBaoKpTlUoQ8x7zT5AiywinTl4N2dIaG0q4SgIUbpQlAM7HumUlZRFgQtKraUPcbjTjoUigB3cooeRKES3ifS66hWBueuLdPScF+sdVVYGHCzaaxnVB+vtavy4PsuQCRk6NoZGnapyGRrjnfulOfpz16RH5N+H1Wydu7Fivwhnrwav3kHVexHiy2BGMI8jKEFLJwf84RIITwfFdGuwDO+NlDGKWyH4IcDTI2QTHB0SK9f04xEEnO/yKDKRZsWj02jWF9cMZ7F IaGoYG1n PN6bHTRF1VxqVkL7LijKws6oVg1eavVeJErjpcR9h6f68bD+zdNcmj52Yo4G6FeGdqXdevuMxoC/3r9pmm38rataj/mUtw42pVb98BoebzWYoi4sfjCvCn4Z5bCm7rftoAMFrCLAvki3ljPl5mA/KkgoDiKs3PwjiyLgE/hMYjf8ML5gyWuvEGm2v0ikarXXabA7P9yW7j/HYfJGDDi1XnabrciHoYG6repN8h2lB9CroM6rIyf9NB8LHmVttkkto4aJ4IPtyCQcdG5nBme8p+anEokNwUD2AFZdTU4IMpaU+oAMKf8qH8YmcllpmjRFLoXs+ X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: In preparation for anonymous large folio support, improve folio_add_new_anon_rmap() to allow a non-pmd-mappable, large folio to be passed to it. In this case, all contained pages are accounted using the order-0 folio (or base page) scheme. Reviewed-by: Yu Zhao Reviewed-by: Yin Fengwei Signed-off-by: Ryan Roberts --- mm/rmap.c | 27 ++++++++++++++++++++------- 1 file changed, 20 insertions(+), 7 deletions(-) diff --git a/mm/rmap.c b/mm/rmap.c index 8600bd029acf..106149690366 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -1266,31 +1266,44 @@ void page_add_anon_rmap(struct page *page, struct vm_area_struct *vma, * This means the inc-and-test can be bypassed. * The folio does not have to be locked. * - * If the folio is large, it is accounted as a THP. As the folio + * If the folio is pmd-mappable, it is accounted as a THP. As the folio * is new, it's assumed to be mapped exclusively by a single process. 
*/ void folio_add_new_anon_rmap(struct folio *folio, struct vm_area_struct *vma, unsigned long address) { - int nr; + int nr = folio_nr_pages(folio); - VM_BUG_ON_VMA(address < vma->vm_start || address >= vma->vm_end, vma); + VM_BUG_ON_VMA(address < vma->vm_start || + address + (nr << PAGE_SHIFT) > vma->vm_end, vma); __folio_set_swapbacked(folio); - if (likely(!folio_test_pmd_mappable(folio))) { + if (likely(!folio_test_large(folio))) { /* increment count (starts at -1) */ atomic_set(&folio->_mapcount, 0); - nr = 1; + __page_set_anon_rmap(folio, &folio->page, vma, address, 1); + } else if (!folio_test_pmd_mappable(folio)) { + int i; + + for (i = 0; i < nr; i++) { + struct page *page = folio_page(folio, i); + + /* increment count (starts at -1) */ + atomic_set(&page->_mapcount, 0); + __page_set_anon_rmap(folio, page, vma, + address + (i << PAGE_SHIFT), 1); + } + + atomic_set(&folio->_nr_pages_mapped, nr); } else { /* increment count (starts at -1) */ atomic_set(&folio->_entire_mapcount, 0); atomic_set(&folio->_nr_pages_mapped, COMPOUND_MAPPED); - nr = folio_nr_pages(folio); + __page_set_anon_rmap(folio, &folio->page, vma, address, 1); __lruvec_stat_mod_folio(folio, NR_ANON_THPS, nr); } __lruvec_stat_mod_folio(folio, NR_ANON_MAPPED, nr); - __page_set_anon_rmap(folio, &folio->page, vma, address, 1); } /** From patchwork Fri Sep 29 11:44:14 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ryan Roberts X-Patchwork-Id: 13404127 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 40BBDE7F154 for ; Fri, 29 Sep 2023 11:44:45 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id D76328D0023; Fri, 29 Sep 2023 07:44:44 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id D50568E0001; Fri, 29 Sep 2023 07:44:44 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id BED668D00EF; Fri, 29 Sep 2023 07:44:44 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0016.hostedemail.com [216.40.44.16]) by kanga.kvack.org (Postfix) with ESMTP id ACC308D0023 for ; Fri, 29 Sep 2023 07:44:44 -0400 (EDT) Received: from smtpin07.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay05.hostedemail.com (Postfix) with ESMTP id 6C3F741389 for ; Fri, 29 Sep 2023 11:44:44 +0000 (UTC) X-FDA: 81289452888.07.6CA99BD Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by imf23.hostedemail.com (Postfix) with ESMTP id A8B79140002 for ; Fri, 29 Sep 2023 11:44:42 +0000 (UTC) Authentication-Results: imf23.hostedemail.com; dkim=none; dmarc=pass (policy=none) header.from=arm.com; spf=pass (imf23.hostedemail.com: domain of ryan.roberts@arm.com designates 217.140.110.172 as permitted sender) smtp.mailfrom=ryan.roberts@arm.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1695987882; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=BCOYVCPEAW5w4k6+XBBjGI0EWgGCdsX2k3cf8C21COc=; b=39UqHPKDjXVhO7/LkYdGIJ9QSzzHjIKjcXoEImy/xb0RzTOneWphfbgB+K1GToFlyWVEYn F1cXMNzTjwgLTyDXxa2jGrrRNfEtWqRZkqv0SlTBiHFPTdDnsdE0+qX2+CpKoqGEOCwr6f 
oVPrWG4TJ6FTfDGU1N7GtSorJ0GMsvU= ARC-Authentication-Results: i=1; imf23.hostedemail.com; dkim=none; dmarc=pass (policy=none) header.from=arm.com; spf=pass (imf23.hostedemail.com: domain of ryan.roberts@arm.com designates 217.140.110.172 as permitted sender) smtp.mailfrom=ryan.roberts@arm.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1695987882; a=rsa-sha256; cv=none; b=zhQd8SvSmlgqwvCTHQdQnoVpM4qzYAk6b5R1znxflfpBwcugXwH4btocZTN0d+HzeMl3/a wTMHxUiMTLEdLeAB99tfkIzMRj9qLokYVX7Ls2hLrpNMjJOERd+lSyfEqrCPKn1fSepv/x dBgGxEDP05kJZyTgAFAsoR71mcTFUYg= Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 0CDDC1063; Fri, 29 Sep 2023 04:45:20 -0700 (PDT) Received: from e125769.cambridge.arm.com (e125769.cambridge.arm.com [10.1.196.26]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 103F93F59C; Fri, 29 Sep 2023 04:44:38 -0700 (PDT) From: Ryan Roberts To: Andrew Morton , Matthew Wilcox , Yin Fengwei , David Hildenbrand , Yu Zhao , Catalin Marinas , Anshuman Khandual , Yang Shi , "Huang, Ying" , Zi Yan , Luis Chamberlain , Itaru Kitayama , "Kirill A. Shutemov" , John Hubbard , David Rientjes , Vlastimil Babka , Hugh Dickins Cc: Ryan Roberts , linux-mm@kvack.org, linux-kernel@vger.kernel.org, linux-arm-kernel@lists.infradead.org Subject: [PATCH v6 3/9] mm: thp: Account pte-mapped anonymous THP usage Date: Fri, 29 Sep 2023 12:44:14 +0100 Message-Id: <20230929114421.3761121-4-ryan.roberts@arm.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20230929114421.3761121-1-ryan.roberts@arm.com> References: <20230929114421.3761121-1-ryan.roberts@arm.com> MIME-Version: 1.0 X-Rspamd-Queue-Id: A8B79140002 X-Rspam-User: X-Rspamd-Server: rspam02 X-Stat-Signature: ah5nygocxmbgz1mubrft3frw5qr33ou7 X-HE-Tag: 1695987882-574341 X-HE-Meta: U2FsdGVkX1+zwe8ptxZBV1DoSbPnC8eu3/Uenvj1Qw8bcCsXrsvJR9JxTV2pntXhP89paBuqwUK7G5IZH9WbzLubFUVYGD43Wv+Xxd0ieEe8dRcFqxV5Kvt5v3/WVYlfSvPEp46z02rVtbSqYWBj47AfdyndXBVZ6TqckMar8u25+pBygpSlN8QQuJtHExpQIuD1xn9TzgI8k4mTi2sWYV9EKXjkLXVemdKObqTBN88jJsL+9mK1bREvmKwXh5erPXsdHmPXPpgtFfs4U4i5t+peeoqV0oYFWocshNKmjXupW2UuJGPDNC8Su2iKFSlioGHIizTz9kz5BJpWTK972bnFMSFrYBl+yOK8qBvj7jGZBCBi0ShqgMr/KylVx8giW+nTBUgRkGPhMz/BPQEk/skbb5qAmbladKg9sIYzZk8hpR0sMic/Am9xx6DqoVZwquYG8+i7/jYVTC/vIcdp4o79XJqgkqqWrGKM1HcftuYD4jf7CY6z5pKGgGRgPRF4it9rcC8IVulu+xyxYlCYvCZtic/kff120L2Ya710CRR5gFveuWh/4UWWFTQQv8ycNsjQuYyLsncVINOYTdeLW6Ca+JQwGcOSdd3e6fQ86q7IqeOJ/YL7BURLKe9Dws3cUvuYcEd7cnpZsb+uDJBh0qssXmDuxTkonRkxAD6hwSRnI4SKThjwF75RLz3ISXyaIiD2jLLdK9puMrRHBB/VBAhxPhspNNQUPwtT8y3G3QKnBSoIqM104hEAV+pxZd9HzdNxDTJVHWtaE6IF4f8CPZNKBHSqIZjDoBlIlpHZ2sAI3hsFdPnmpaikOMKsf1fUo16EOQibHXp+uIJCkzcIIZrycQnOHrjqKbRdBrhB1n4oNTo0giIcbMwPqD0KyLSI7Lh0KDnkNLQMn75+16YeC0xawuv6wzbAaZ0dUp9J1PkgP/WwxblT4Bn16/6AGsXHfvPUS4k5gA9uLOlANwD 2QGhArrh Si+1NYKQMO4IptUJw3cjJjwMeD96UHgyToXC2s9cf6p+al3FcLWJt3I7dEPBDTidPXih42nG9w3i7I56z9lT5ngY9KTE1CJRDb726AIY1XIC1AbatUHo3fJLKhKTx5wm9SSR20fQlrOzzpqp/euZZhMr7xmTHjb3AUjZ450dXzi27O5RiZBl/vei3Wg0pTBC/1ZKY1cTKzluo4I59bfzRI+Q+FxX28aa/+XSJmzObKroVIQg= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Add accounting for pte-mapped anonymous transparent hugepages at various locations. This visibility will aid in debugging and tuning performance for the "small order" thp extension that will be added in a subsequent commit, where hugepages can be allocated which are large (greater than order-0) but smaller than PMD_ORDER. 
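For concreteness, the size of an order-N folio is simply PAGE_SIZE << N. The small sketch below (an illustration only, not part of this patch, and assuming a 4K base page size where PMD_ORDER is 9) prints the sizes that fall in the "greater than order-0 but smaller than PMD_ORDER" range::

  #include <stdio.h>

  /* Illustration only: assumes 4K base pages and PMD_ORDER == 9 (2M PMD). */
  #define PAGE_SIZE_KB 4
  #define PMD_ORDER    9

  int main(void)
  {
          /* Orders strictly between 0 and PMD_ORDER are the "small order" THPs. */
          for (int order = 0; order <= PMD_ORDER; order++)
                  printf("order-%d folio: %d pages, %d KiB\n",
                         order, 1 << order, PAGE_SIZE_KB << order);
          return 0;
  }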
This new accounting follows a similar pattern to the existing NR_ANON_THPS, which measures pmd-mapped anonymous transparent hugepages. We account pte-mapped anonymous thp mappings per-page, where the page is mapped at least once via PTE and the page belongs to a large folio. So when a page belonging to a large folio is PTE-mapped for the first time, we add 1 to NR_ANON_THPS_PTEMAPPED. And when a page belonging to a large folio is PTE-unmapped for the last time, we remove 1 from NR_ANON_THPS_PTEMAPPED.

/proc/meminfo: Introduce new "AnonHugePteMap" field, which reports the amount of memory (in KiB) mapped from large folios globally (similar to the AnonHugePages field).

/proc/vmstat: Introduce new "nr_anon_thp_pte" field, which reports the amount of memory (in pages) mapped from large folios globally (similar to the nr_anon_transparent_hugepages field).

/sys/devices/system/node/nodeX/meminfo: Introduce new "AnonHugePteMap" field, which reports the amount of memory (in KiB) mapped from large folios per-node (similar to the AnonHugePages field).

show_mem (panic logger): Introduce new "anon_thp_pte" field, which reports the amount of memory (in KiB) mapped from large folios per-node (similar to the anon_thp field).

memory.stat (cgroup v1 and v2): Introduce new "anon_thp_pte" field, which reports the amount of memory (in bytes) mapped from large folios in the memcg (similar to the rss_huge (v1) / anon_thp (v2) fields).

/proc/<pid>/smaps & /proc/<pid>/smaps_rollup: Introduce new "AnonHugePteMap" field, which reports the amount of memory (in KiB) mapped from large folios within the vma/process (similar to the AnonHugePages field).

NOTE on charge migration: The new NR_ANON_THPS_PTEMAPPED charge is NOT moved between cgroups, even when the (v1) memory.move_charge_at_immigrate feature is enabled. That feature is marked deprecated, and the current code does not attempt to move the NR_ANON_MAPPED charge for large PTE-mapped folios anyway (see the comment in mem_cgroup_move_charge_pte_range()). If that code were enhanced to move the NR_ANON_MAPPED charge for large PTE-mapped folios, it would also need to move the new NR_ANON_THPS_PTEMAPPED charge, which would likely get quite fiddly. Given the deprecation of memory.move_charge_at_immigrate, I assume it is not valuable to implement.

NOTE on naming: Given the new small order anonymous thp feature will be exposed to user space as an extension to thp, I've opted to name the new counters after thp too (as opposed to "large"/"large folio"/etc.), so "huge" no longer strictly means PMD; one could argue hugetlb already breaks this rule anyway. I also did not want to risk breaking backwards compatibility by renaming/redefining the existing counters (which would have resulted in more consistent and clearer names). So the existing NR_ANON_THPS counters remain and continue to refer only to PMD-mapped THPs, and the new counters refer only to PTE-mapped THPs.
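As a quick way to observe the new counter, the following userspace sketch (not part of this patch; it simply assumes a kernel with this series applied, so that the AnonHugePteMap field exists) reads the existing AnonHugePages field and the new AnonHugePteMap field back out of /proc/meminfo::

  #include <stdio.h>

  int main(void)
  {
          char line[256];
          unsigned long pmd_kb = 0, pte_kb = 0;
          FILE *f = fopen("/proc/meminfo", "r");

          if (!f)
                  return 1;
          while (fgets(line, sizeof(line), f)) {
                  /* AnonHugePages: PMD-mapped anon THP; AnonHugePteMap: PTE-mapped. */
                  sscanf(line, "AnonHugePages: %lu kB", &pmd_kb);
                  sscanf(line, "AnonHugePteMap: %lu kB", &pte_kb);
          }
          fclose(f);
          printf("PMD-mapped anon THP: %lu kB\nPTE-mapped anon THP: %lu kB\n",
                 pmd_kb, pte_kb);
          return 0;
  }

On an unpatched kernel the second field is simply absent and the sketch reports 0 kB for it.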
Signed-off-by: Ryan Roberts --- Documentation/ABI/testing/procfs-smaps_rollup | 1 + Documentation/admin-guide/cgroup-v1/memory.rst | 5 ++++- Documentation/admin-guide/cgroup-v2.rst | 6 +++++- Documentation/admin-guide/mm/transhuge.rst | 11 +++++++---- Documentation/filesystems/proc.rst | 14 ++++++++++++-- drivers/base/node.c | 2 ++ fs/proc/meminfo.c | 2 ++ fs/proc/task_mmu.c | 4 ++++ include/linux/mmzone.h | 1 + mm/memcontrol.c | 8 ++++++++ mm/rmap.c | 11 +++++++++-- mm/show_mem.c | 2 ++ mm/vmstat.c | 1 + 13 files changed, 58 insertions(+), 10 deletions(-) diff --git a/Documentation/ABI/testing/procfs-smaps_rollup b/Documentation/ABI/testing/procfs-smaps_rollup index b446a7154a1b..b50b3eda5a3f 100644 --- a/Documentation/ABI/testing/procfs-smaps_rollup +++ b/Documentation/ABI/testing/procfs-smaps_rollup @@ -34,6 +34,7 @@ Description: Anonymous: 68 kB LazyFree: 0 kB AnonHugePages: 0 kB + AnonHugePteMap: 0 kB ShmemPmdMapped: 0 kB Shared_Hugetlb: 0 kB Private_Hugetlb: 0 kB diff --git a/Documentation/admin-guide/cgroup-v1/memory.rst b/Documentation/admin-guide/cgroup-v1/memory.rst index 5f502bf68fbc..b7efc7531896 100644 --- a/Documentation/admin-guide/cgroup-v1/memory.rst +++ b/Documentation/admin-guide/cgroup-v1/memory.rst @@ -535,7 +535,10 @@ memory.stat file includes following statistics: cache # of bytes of page cache memory. rss # of bytes of anonymous and swap cache memory (includes transparent hugepages). - rss_huge # of bytes of anonymous transparent hugepages. + rss_huge # of bytes of anonymous transparent hugepages, mapped by + PMD. + anon_thp_pte # of bytes of anonymous transparent hugepages, mapped by + PTE. mapped_file # of bytes of mapped file (includes tmpfs/shmem) pgpgin # of charging events to the memory cgroup. The charging event happens each time a page is accounted as either mapped diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst index b26b5274eaaf..48b961b8fc6d 100644 --- a/Documentation/admin-guide/cgroup-v2.rst +++ b/Documentation/admin-guide/cgroup-v2.rst @@ -1421,7 +1421,11 @@ PAGE_SIZE multiple when read back. anon_thp Amount of memory used in anonymous mappings backed by - transparent hugepages + transparent hugepages, mapped by PMD + + anon_thp_pte + Amount of memory used in anonymous mappings backed by + transparent hugepages, mapped by PTE file_thp Amount of cached filesystem data backed by transparent diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst index b0cc8243e093..ebda57850643 100644 --- a/Documentation/admin-guide/mm/transhuge.rst +++ b/Documentation/admin-guide/mm/transhuge.rst @@ -291,10 +291,13 @@ Monitoring usage ================ The number of anonymous transparent huge pages currently used by the -system is available by reading the AnonHugePages field in ``/proc/meminfo``. -To identify what applications are using anonymous transparent huge pages, -it is necessary to read ``/proc/PID/smaps`` and count the AnonHugePages fields -for each mapping. +system is available by reading the AnonHugePages and AnonHugePteMap +fields in ``/proc/meminfo``. To identify what applications are using +anonymous transparent huge pages, it is necessary to read +``/proc/PID/smaps`` and count the AnonHugePages and AnonHugePteMap +fields for each mapping. Note that in both cases, AnonHugePages refers +only to PMD-mapped THPs. AnonHugePteMap refers to THPs that are mapped +using PTEs. 
The number of file transparent huge pages mapped to userspace is available by reading ShmemPmdMapped and ShmemHugePages fields in ``/proc/meminfo``. diff --git a/Documentation/filesystems/proc.rst b/Documentation/filesystems/proc.rst index 2b59cff8be17..ccbb76a509f0 100644 --- a/Documentation/filesystems/proc.rst +++ b/Documentation/filesystems/proc.rst @@ -464,6 +464,7 @@ Memory Area, or VMA) there is a series of lines such as the following:: KSM: 0 kB LazyFree: 0 kB AnonHugePages: 0 kB + AnonHugePteMap: 0 kB ShmemPmdMapped: 0 kB Shared_Hugetlb: 0 kB Private_Hugetlb: 0 kB @@ -511,7 +512,11 @@ pressure if the memory is clean. Please note that the printed value might be lower than the real value due to optimizations used in the current implementation. If this is not desirable please file a bug report. -"AnonHugePages" shows the amount of memory backed by transparent hugepage. +"AnonHugePages" shows the amount of memory backed by transparent hugepage, +mapped by PMD. + +"AnonHugePteMap" shows the amount of memory backed by transparent hugepage, +mapped by PTE. "ShmemPmdMapped" shows the amount of shared (shmem/tmpfs) memory backed by huge pages. @@ -1006,6 +1011,7 @@ Example output. You may not have all of these fields. EarlyMemtestBad: 0 kB HardwareCorrupted: 0 kB AnonHugePages: 4149248 kB + AnonHugePteMap: 0 kB ShmemHugePages: 0 kB ShmemPmdMapped: 0 kB FileHugePages: 0 kB @@ -1165,7 +1171,11 @@ HardwareCorrupted The amount of RAM/memory in KB, the kernel identifies as corrupted. AnonHugePages - Non-file backed huge pages mapped into userspace page tables + Non-file backed huge pages mapped into userspace page tables by + PMD +AnonHugePteMap + Non-file backed huge pages mapped into userspace page tables by + PTE ShmemHugePages Memory used by shared memory (shmem) and tmpfs allocated with huge pages diff --git a/drivers/base/node.c b/drivers/base/node.c index 493d533f8375..08f1759387d2 100644 --- a/drivers/base/node.c +++ b/drivers/base/node.c @@ -443,6 +443,7 @@ static ssize_t node_read_meminfo(struct device *dev, "Node %d SUnreclaim: %8lu kB\n" #ifdef CONFIG_TRANSPARENT_HUGEPAGE "Node %d AnonHugePages: %8lu kB\n" + "Node %d AnonHugePteMap: %8lu kB\n" "Node %d ShmemHugePages: %8lu kB\n" "Node %d ShmemPmdMapped: %8lu kB\n" "Node %d FileHugePages: %8lu kB\n" @@ -475,6 +476,7 @@ static ssize_t node_read_meminfo(struct device *dev, #ifdef CONFIG_TRANSPARENT_HUGEPAGE , nid, K(node_page_state(pgdat, NR_ANON_THPS)), + nid, K(node_page_state(pgdat, NR_ANON_THPS_PTEMAPPED)), nid, K(node_page_state(pgdat, NR_SHMEM_THPS)), nid, K(node_page_state(pgdat, NR_SHMEM_PMDMAPPED)), nid, K(node_page_state(pgdat, NR_FILE_THPS)), diff --git a/fs/proc/meminfo.c b/fs/proc/meminfo.c index 45af9a989d40..bac20cc60b6a 100644 --- a/fs/proc/meminfo.c +++ b/fs/proc/meminfo.c @@ -143,6 +143,8 @@ static int meminfo_proc_show(struct seq_file *m, void *v) #ifdef CONFIG_TRANSPARENT_HUGEPAGE show_val_kb(m, "AnonHugePages: ", global_node_page_state(NR_ANON_THPS)); + show_val_kb(m, "AnonHugePteMap: ", + global_node_page_state(NR_ANON_THPS_PTEMAPPED)); show_val_kb(m, "ShmemHugePages: ", global_node_page_state(NR_SHMEM_THPS)); show_val_kb(m, "ShmemPmdMapped: ", diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c index 3dd5be96691b..7b5dad163533 100644 --- a/fs/proc/task_mmu.c +++ b/fs/proc/task_mmu.c @@ -392,6 +392,7 @@ struct mem_size_stats { unsigned long anonymous; unsigned long lazyfree; unsigned long anonymous_thp; + unsigned long anonymous_thp_pte; unsigned long shmem_thp; unsigned long file_thp; unsigned long swap; @@ 
-452,6 +453,8 @@ static void smaps_account(struct mem_size_stats *mss, struct page *page, mss->anonymous += size; if (!PageSwapBacked(page) && !dirty && !PageDirty(page)) mss->lazyfree += size; + if (!compound && PageTransCompound(page)) + mss->anonymous_thp_pte += size; } if (PageKsm(page)) @@ -833,6 +836,7 @@ static void __show_smap(struct seq_file *m, const struct mem_size_stats *mss, SEQ_PUT_DEC(" kB\nKSM: ", mss->ksm); SEQ_PUT_DEC(" kB\nLazyFree: ", mss->lazyfree); SEQ_PUT_DEC(" kB\nAnonHugePages: ", mss->anonymous_thp); + SEQ_PUT_DEC(" kB\nAnonHugePteMap: ", mss->anonymous_thp_pte); SEQ_PUT_DEC(" kB\nShmemPmdMapped: ", mss->shmem_thp); SEQ_PUT_DEC(" kB\nFilePmdMapped: ", mss->file_thp); SEQ_PUT_DEC(" kB\nShared_Hugetlb: ", mss->shared_hugetlb); diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index 4106fbc5b4b3..5032fc31c651 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -186,6 +186,7 @@ enum node_stat_item { NR_FILE_THPS, NR_FILE_PMDMAPPED, NR_ANON_THPS, + NR_ANON_THPS_PTEMAPPED, NR_VMSCAN_WRITE, NR_VMSCAN_IMMEDIATE, /* Prioritise for reclaim when writeback ends */ NR_DIRTIED, /* page dirtyings since bootup */ diff --git a/mm/memcontrol.c b/mm/memcontrol.c index d13dde2f8b56..07d8e0b55b0e 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -809,6 +809,7 @@ void __mod_memcg_lruvec_state(struct lruvec *lruvec, enum node_stat_item idx, case NR_ANON_MAPPED: case NR_FILE_MAPPED: case NR_ANON_THPS: + case NR_ANON_THPS_PTEMAPPED: case NR_SHMEM_PMDMAPPED: case NR_FILE_PMDMAPPED: WARN_ON_ONCE(!in_task()); @@ -1512,6 +1513,7 @@ static const struct memory_stat memory_stats[] = { #endif #ifdef CONFIG_TRANSPARENT_HUGEPAGE { "anon_thp", NR_ANON_THPS }, + { "anon_thp_pte", NR_ANON_THPS_PTEMAPPED }, { "file_thp", NR_FILE_THPS }, { "shmem_thp", NR_SHMEM_THPS }, #endif @@ -4052,6 +4054,7 @@ static const unsigned int memcg1_stats[] = { NR_ANON_MAPPED, #ifdef CONFIG_TRANSPARENT_HUGEPAGE NR_ANON_THPS, + NR_ANON_THPS_PTEMAPPED, #endif NR_SHMEM, NR_FILE_MAPPED, @@ -4067,6 +4070,7 @@ static const char *const memcg1_stat_names[] = { "rss", #ifdef CONFIG_TRANSPARENT_HUGEPAGE "rss_huge", + "anon_thp_pte", #endif "shmem", "mapped_file", @@ -6259,6 +6263,10 @@ static int mem_cgroup_move_charge_pte_range(pmd_t *pmd, * can be done but it would be too convoluted so simply * ignore such a partial THP and keep it in original * memcg. There should be somebody mapping the head. + * This simplification also means that pte-mapped large + * folios are never migrated, which means we don't need + * to worry about migrating the NR_ANON_THPS_PTEMAPPED + * accounting. 
*/ if (PageTransCompound(page)) goto put; diff --git a/mm/rmap.c b/mm/rmap.c index 106149690366..52dabee73023 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -1205,7 +1205,7 @@ void page_add_anon_rmap(struct page *page, struct vm_area_struct *vma, { struct folio *folio = page_folio(page); atomic_t *mapped = &folio->_nr_pages_mapped; - int nr = 0, nr_pmdmapped = 0; + int nr = 0, nr_pmdmapped = 0, nr_lgmapped = 0; bool compound = flags & RMAP_COMPOUND; bool first = true; @@ -1214,6 +1214,7 @@ void page_add_anon_rmap(struct page *page, struct vm_area_struct *vma, first = atomic_inc_and_test(&page->_mapcount); nr = first; if (first && folio_test_large(folio)) { + nr_lgmapped = 1; nr = atomic_inc_return_relaxed(mapped); nr = (nr < COMPOUND_MAPPED); } @@ -1241,6 +1242,8 @@ void page_add_anon_rmap(struct page *page, struct vm_area_struct *vma, if (nr_pmdmapped) __lruvec_stat_mod_folio(folio, NR_ANON_THPS, nr_pmdmapped); + if (nr_lgmapped) + __lruvec_stat_mod_folio(folio, NR_ANON_THPS_PTEMAPPED, nr_lgmapped); if (nr) __lruvec_stat_mod_folio(folio, NR_ANON_MAPPED, nr); @@ -1295,6 +1298,7 @@ void folio_add_new_anon_rmap(struct folio *folio, struct vm_area_struct *vma, } atomic_set(&folio->_nr_pages_mapped, nr); + __lruvec_stat_mod_folio(folio, NR_ANON_THPS_PTEMAPPED, nr); } else { /* increment count (starts at -1) */ atomic_set(&folio->_entire_mapcount, 0); @@ -1405,7 +1409,7 @@ void page_remove_rmap(struct page *page, struct vm_area_struct *vma, { struct folio *folio = page_folio(page); atomic_t *mapped = &folio->_nr_pages_mapped; - int nr = 0, nr_pmdmapped = 0; + int nr = 0, nr_pmdmapped = 0, nr_lgmapped = 0; bool last; enum node_stat_item idx; @@ -1423,6 +1427,7 @@ void page_remove_rmap(struct page *page, struct vm_area_struct *vma, last = atomic_add_negative(-1, &page->_mapcount); nr = last; if (last && folio_test_large(folio)) { + nr_lgmapped = 1; nr = atomic_dec_return_relaxed(mapped); nr = (nr < COMPOUND_MAPPED); } @@ -1454,6 +1459,8 @@ void page_remove_rmap(struct page *page, struct vm_area_struct *vma, idx = NR_FILE_PMDMAPPED; __lruvec_stat_mod_folio(folio, idx, -nr_pmdmapped); } + if (nr_lgmapped && folio_test_anon(folio)) + __lruvec_stat_mod_folio(folio, NR_ANON_THPS_PTEMAPPED, -nr_lgmapped); if (nr) { idx = folio_test_anon(folio) ? 
NR_ANON_MAPPED : NR_FILE_MAPPED; __lruvec_stat_mod_folio(folio, idx, -nr); diff --git a/mm/show_mem.c b/mm/show_mem.c index 4b888b18bdde..e648a815f0fb 100644 --- a/mm/show_mem.c +++ b/mm/show_mem.c @@ -254,6 +254,7 @@ static void show_free_areas(unsigned int filter, nodemask_t *nodemask, int max_z " shmem_thp:%lukB" " shmem_pmdmapped:%lukB" " anon_thp:%lukB" + " anon_thp_pte:%lukB" #endif " writeback_tmp:%lukB" " kernel_stack:%lukB" @@ -280,6 +281,7 @@ static void show_free_areas(unsigned int filter, nodemask_t *nodemask, int max_z K(node_page_state(pgdat, NR_SHMEM_THPS)), K(node_page_state(pgdat, NR_SHMEM_PMDMAPPED)), K(node_page_state(pgdat, NR_ANON_THPS)), + K(node_page_state(pgdat, NR_ANON_THPS_PTEMAPPED)), #endif K(node_page_state(pgdat, NR_WRITEBACK_TEMP)), node_page_state(pgdat, NR_KERNEL_STACK_KB), diff --git a/mm/vmstat.c b/mm/vmstat.c index 00e81e99c6ee..267de0e4ddca 100644 --- a/mm/vmstat.c +++ b/mm/vmstat.c @@ -1224,6 +1224,7 @@ const char * const vmstat_text[] = { "nr_file_hugepages", "nr_file_pmdmapped", "nr_anon_transparent_hugepages", + "nr_anon_thp_pte", "nr_vmscan_write", "nr_vmscan_immediate_reclaim", "nr_dirtied", From patchwork Fri Sep 29 11:44:15 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ryan Roberts X-Patchwork-Id: 13404128 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id DE346E810D5 for ; Fri, 29 Sep 2023 11:44:47 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 587A18E0002; Fri, 29 Sep 2023 07:44:47 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 50FCA8E0001; Fri, 29 Sep 2023 07:44:47 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 33E518E0002; Fri, 29 Sep 2023 07:44:47 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0010.hostedemail.com [216.40.44.10]) by kanga.kvack.org (Postfix) with ESMTP id 1E5368E0001 for ; Fri, 29 Sep 2023 07:44:47 -0400 (EDT) Received: from smtpin15.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay04.hostedemail.com (Postfix) with ESMTP id E9AF11A0237 for ; Fri, 29 Sep 2023 11:44:46 +0000 (UTC) X-FDA: 81289452972.15.480E62D Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by imf14.hostedemail.com (Postfix) with ESMTP id 37A5E100017 for ; Fri, 29 Sep 2023 11:44:45 +0000 (UTC) Authentication-Results: imf14.hostedemail.com; dkim=none; dmarc=pass (policy=none) header.from=arm.com; spf=pass (imf14.hostedemail.com: domain of ryan.roberts@arm.com designates 217.140.110.172 as permitted sender) smtp.mailfrom=ryan.roberts@arm.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1695987885; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=mFKsrfYRPYbVeNdBA8ZVRwXhiya+e01CfoMJVaq5bvk=; b=e6WVrWCBbfkvyuh2j7MllCW/g87eh+yEdFgd+QOlHB7mDD6NKzg0MQMrgbApvIDeMGNH6O /fGwqchQ89OXCVRm4/iIQI4Btvs5eEf5gDVMcfPR69vViW06GOVLQBFqHkN3GKLu/BJ/Bz ydx0G4XCbW13gHgL00MV9Hwc7ShUfSE= ARC-Authentication-Results: i=1; imf14.hostedemail.com; dkim=none; dmarc=pass (policy=none) header.from=arm.com; spf=pass (imf14.hostedemail.com: 
domain of ryan.roberts@arm.com designates 217.140.110.172 as permitted sender) smtp.mailfrom=ryan.roberts@arm.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1695987885; a=rsa-sha256; cv=none; b=XMVw/8ihO/yuaCt6aahLWaqHwcNU9W5wVDEIdGfajVlKO0K3qHvA9tMNJKLBuN1d63tC/v u4zk4VbvUsnYHXc1QRXmlRqd4zN8D+JrvHR22vcp7DidnBCcX+OGKvbUw8b9DKd52cG74h lEZ9OTZiG+XnzTeNgJPYddqO5K3VoKA= Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id D9363DA7; Fri, 29 Sep 2023 04:45:22 -0700 (PDT) Received: from e125769.cambridge.arm.com (e125769.cambridge.arm.com [10.1.196.26]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id DC3AE3F59C; Fri, 29 Sep 2023 04:44:41 -0700 (PDT) From: Ryan Roberts To: Andrew Morton , Matthew Wilcox , Yin Fengwei , David Hildenbrand , Yu Zhao , Catalin Marinas , Anshuman Khandual , Yang Shi , "Huang, Ying" , Zi Yan , Luis Chamberlain , Itaru Kitayama , "Kirill A. Shutemov" , John Hubbard , David Rientjes , Vlastimil Babka , Hugh Dickins Cc: Ryan Roberts , linux-mm@kvack.org, linux-kernel@vger.kernel.org, linux-arm-kernel@lists.infradead.org Subject: [PATCH v6 4/9] mm: thp: Introduce anon_orders and anon_always_mask sysfs files Date: Fri, 29 Sep 2023 12:44:15 +0100 Message-Id: <20230929114421.3761121-5-ryan.roberts@arm.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20230929114421.3761121-1-ryan.roberts@arm.com> References: <20230929114421.3761121-1-ryan.roberts@arm.com> MIME-Version: 1.0 X-Rspam-User: X-Rspamd-Server: rspam12 X-Rspamd-Queue-Id: 37A5E100017 X-Stat-Signature: b9z5mcobd65joqphsobh84jwj8xuocr6 X-HE-Tag: 1695987885-78226 X-HE-Meta: U2FsdGVkX19XzsgPGIUw7Ck7cjqLSwbX6W3X9lcS/9+0kLWi6cTS7FmYsRd5xonG/J4lBaBj4d/aVY61my02xszgnBydBXg04AmQnJOuqWYE5sPh0Zg/e3Yl0TrcMJAGy/g3aNTrWmWUeK4F/r7IFspWv4px4uEmINWf3ARvkXp2NZnfO+7xmz7fOjcSjWbx8nklK/eF8QVNz8FBGPln3Nn7onT7gUqbwtvsM2ISspX+HWDQT6D0yaFhZfUbiFtQ7ITJUUw1miNz8/Zvt9yzi91Z8nthLUtj7j4vlT/SEf+oD2CuFjqWpX2BpewHLkaS6TWNV+MmWDHTb23tiMYEkaSwVkiVhXkNTN6Or9CFWItUwV0EBu340CYHYHrBq3zrzIKHvdu09mk7XU8WwDsvCIFEfigm3/qItzsJKPTkvgTjpLukFu+Ev7vavNzxa04g4XcSQXrnmbHoQxb8bwUzm19HMTnZz+YR5BFW3mapeb6RaRW7h42F88QqqOQkjNAz2x7cimzIfbEjShHbdyZzjtfPVrjyJUdKMSy1N9uSBCQ9OhrSP3ArN60HcRCzxot/g42oOM56ZK7F3JVjMmE2q/optZzFPhU2g9S/lDRuUJM1CYH4jIESOIrUHW2nTC+OQ+wEsgs8lSjqqB88liktNzrWXFn+/bgJkkGGVwywzlOMeLpwT9dJOtZOETU1TTHWOASgerwiH+HKG2R/Om03MvVxeISqo43S1DDkruKfKZ/n+FcDyXd0Zyt3V/eUx18xX7MUcq4tzLmKwUnP0PWcJVE+TjtHVmccf8U2j8N84oRauXpIeBHioT8zSXI7dn+opP7gvNpTvwCiPk+XI+kBYAPDpC0HRg7dieVTKdyM9+3KE6OHmUATS/9sY2xqR+MpMAQ7fXvfX3W/7WLCXFHiJTf4vRWnEck1vs3lOmHn+wAzYgFI8CekDuISx4NWoHkaKsmKnhsMYu/1L07GZBJ uir0ZmL8 fSNRvhik1AXqMS78FrCum62WFC5OIdRLsH6X+lPY6gUTwCviat9kj20t48zTNuKMRrUZt1XGkKEIFZLH7E2zKEDCRXWji/f4Qv5XhA8/Ie6A0IQFt2vyQQZ7shwWkx0E/7wkhJlwMq6r9fmDe5ULC7gSXmRoTna9uUps2xZADUFTqmont+LKJDJVxg5fIz29nRia+Ilkaj2RrtVZhJsVuLyENP5+P8LghGABHioWKAnm/RguCoN84CLqxs8M0auS6nVBBaetkLEKWyvd06QKJAmnmPA== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: In preparation for adding support for anonymous large folios that are smaller than the PMD-size, introduce 2 new sysfs files that will be used to control the new behaviours via the transparent_hugepage interface. For now, the kernel still only supports PMD-order anonymous THP, so when reading back anon_orders, it will reflect that. Therefore there are no behavioural changes intended here. 
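To illustrate how the read-back value can be interpreted, here is a small userspace sketch (not part of this patch) that decodes the anon_orders bitfield. It assumes a 4K base page size, where PMD_ORDER is 9, so with the current PMD-only support the file reads back as 0x00000200::

  #include <stdio.h>

  int main(void)
  {
          unsigned int orders;
          FILE *f = fopen("/sys/kernel/mm/transparent_hugepage/anon_orders", "r");

          if (!f)
                  return 1;
          if (fscanf(f, "%x", &orders) != 1) {
                  fclose(f);
                  return 1;
          }
          fclose(f);

          /* Bit N set => order-N anon folios enabled, i.e. 2^N pages per folio. */
          for (unsigned int order = 1; order < 32; order++)
                  if (orders & (1u << order))
                          printf("order-%u enabled: %u pages (%u KiB with 4K pages)\n",
                                 order, 1u << order, 4u << order);
          return 0;
  }

For example, a value of 0x208 decodes to order-9 (PMD-sized) plus order-3 (32K with 4K pages), matching the example given below.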
The bulk of the change is implemented by converting transhuge_vma_suitable() and hugepage_vma_check() so that they take a bitfield of orders for which the user wants to determine support, and the functions filter out all the orders that can't be supported. If there is only 1 order set in the input then the output can continue to be treated like a boolean; this is the case for most call sites. The remainder is copied from Documentation/admin-guide/mm/transhuge.rst, as modified by this commit. See that file for further details. By default, allocation of anonymous THPs that are smaller than PMD-size is disabled. These smaller allocation orders can be enabled by writing an encoded set of orders as follows:: echo 0x208 >/sys/kernel/mm/transparent_hugepage/anon_orders Where an order refers to the number of pages in the large folio as 2^order, and where each order is encoded in the written value such that each set bit represents an enabled order; So setting bit-2 indicates that order-2 folios are in use, and order-2 means 2^2=4 pages (=16K if the page size is 4K). The example above enables order-9 (PMD-order) and order-3. By enabling multiple orders, allocation of each order will be attempted, highest to lowest, until a successful allocation is made. If the PMD-order is unset, then no PMD-sized THPs will be allocated. The kernel will ignore any orders that it does not support so read the file back to determine which orders are enabled:: cat /sys/kernel/mm/transparent_hugepage/anon_orders For some workloads it may be desirable to limit some THP orders to be used only for MADV_HUGEPAGE regions, while allowing others to be used always. For example, a workload may only benefit from PMD-sized THP in specific areas, but can take benefit of 32K sized THP more generally. In this case, THP can be enabled in ``madvise`` mode as normal, but specific orders can be configured to be allocated as if in ``always`` mode. The below example enables orders 9 and 3, with order-9 only applied to MADV_HUGEPAGE regions, and order-3 applied always:: echo madvise >/sys/kernel/mm/transparent_hugepage/enabled echo 0x208 >/sys/kernel/mm/transparent_hugepage/anon_orders echo 0x008 >/sys/kernel/mm/transparent_hugepage/anon_always_mask Signed-off-by: Ryan Roberts --- Documentation/admin-guide/mm/transhuge.rst | 74 ++++++++-- Documentation/filesystems/proc.rst | 6 +- fs/proc/task_mmu.c | 3 +- include/linux/huge_mm.h | 93 +++++++++--- mm/huge_memory.c | 164 ++++++++++++++++++--- mm/khugepaged.c | 18 ++- mm/memory.c | 6 +- mm/page_vma_mapped.c | 3 +- 8 files changed, 296 insertions(+), 71 deletions(-) diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst index ebda57850643..9f954e73a4ca 100644 --- a/Documentation/admin-guide/mm/transhuge.rst +++ b/Documentation/admin-guide/mm/transhuge.rst @@ -45,10 +45,22 @@ components: the two is using hugepages just because of the fact the TLB miss is going to run faster. +Furthermore, it is possible to configure THP to allocate large folios +to back anonymous memory, which are smaller than PMD-size (for example +16K, 32K, 64K, etc). These THPs continue to be PTE-mapped, but in many +cases can still provide the similar benefits to those outlined above: +Page faults are significantly reduced (by a factor of e.g. 4, 8, 16, +etc), but latency spikes are much less prominent because the size of +each page isn't as huge as the PMD-sized variant and there is less +memory to clear in each page fault. 
Some architectures also employ TLB +compression mechanisms to squeeze more entries in when a set of PTEs +are virtually and physically contiguous and approporiately aligned. In +this case, TLB misses will occur less often. + THP can be enabled system wide or restricted to certain tasks or even memory ranges inside task's address space. Unless THP is completely disabled, there is ``khugepaged`` daemon that scans memory and -collapses sequences of basic pages into huge pages. +collapses sequences of basic pages into PMD-sized huge pages. The THP behaviour is controlled via :ref:`sysfs ` interface and using madvise(2) and prctl(2) system calls. @@ -146,25 +158,69 @@ madvise never should be self-explanatory. -By default kernel tries to use huge zero page on read page fault to -anonymous mapping. It's possible to disable huge zero page by writing 0 -or enable it back by writing 1:: +By default kernel tries to use huge, PMD-mapped zero page on read page +fault to anonymous mapping. It's possible to disable huge zero page by +writing 0 or enable it back by writing 1:: echo 0 >/sys/kernel/mm/transparent_hugepage/use_zero_page echo 1 >/sys/kernel/mm/transparent_hugepage/use_zero_page Some userspace (such as a test program, or an optimized memory allocation -library) may want to know the size (in bytes) of a transparent hugepage:: +library) may want to know the size (in bytes) of a PMD-mappable +transparent hugepage:: cat /sys/kernel/mm/transparent_hugepage/hpage_pmd_size +By default, allocation of anonymous THPs that are smaller than +PMD-size is disabled. These smaller allocation orders can be enabled +by writing an encoded set of orders as follows:: + + echo 0x208 >/sys/kernel/mm/transparent_hugepage/anon_orders + +Where an order refers to the number of pages in the large folio as +2^order, and where each order is encoded in the written value such +that each set bit represents an enabled order; So setting bit-2 +indicates that order-2 folios are in use, and order-2 means 2^2=4 +pages (=16K if the page size is 4K). The example above enables order-9 +(PMD-order) and order-3. + +By enabling multiple orders, allocation of each order will be +attempted, highest to lowest, until a successful allocation is made. +If the PMD-order is unset, then no PMD-sized THPs will be allocated. + +The kernel will ignore any orders that it does not support so read the +file back to determine which orders are enabled:: + + cat /sys/kernel/mm/transparent_hugepage/anon_orders + +For some workloads it may be desirable to limit some THP orders to be +used only for MADV_HUGEPAGE regions, while allowing others to be used +always. For example, a workload may only benefit from PMD-sized THP in +specific areas, but can take benefit of 32K sized THP more generally. +In this case, THP can be enabled in ``madvise`` mode as normal, but +specific orders can be configured to be allocated as if in ``always`` +mode. The below example enables orders 9 and 3, with order-9 only +applied to MADV_HUGEPAGE regions, and order-3 applied always:: + + echo madvise >/sys/kernel/mm/transparent_hugepage/enabled + echo 0x208 >/sys/kernel/mm/transparent_hugepage/anon_orders + echo 0x008 >/sys/kernel/mm/transparent_hugepage/anon_always_mask + khugepaged will be automatically started when -transparent_hugepage/enabled is set to "always" or "madvise, and it'll -be automatically shutdown if it's set to "never". 
+transparent_hugepage/enabled is set to "always" or "madvise", +providing the PMD-order is enabled in +transparent_hugepage/anon_orders, and it'll be automatically shutdown +if it's set to "never" or the PMD-order is disabled in +transparent_hugepage/anon_orders. Khugepaged controls ------------------- +.. note:: + khugepaged currently only searches for opportunities to collapse to + PMD-sized THP and no attempt is made to collapse to smaller order + THP. + khugepaged runs usually at low frequency so while one may not want to invoke defrag algorithms synchronously during the page faults, it should be worth invoking defrag at least in khugepaged. However it's @@ -285,7 +341,7 @@ Need of application restart The transparent_hugepage/enabled values and tmpfs mount option only affect future behavior. So to make them effective you need to restart any application that could have been using hugepages. This also applies to the -regions registered in khugepaged. +regions registered in khugepaged, and transparent_hugepage/anon_orders. Monitoring usage ================ @@ -416,7 +472,7 @@ for huge pages. Optimizing the applications =========================== -To be guaranteed that the kernel will map a 2M page immediately in any +To be guaranteed that the kernel will map a thp immediately in any memory region, the mmap region has to be hugepage naturally aligned. posix_memalign() can provide that guarantee. diff --git a/Documentation/filesystems/proc.rst b/Documentation/filesystems/proc.rst index ccbb76a509f0..72526f8bb658 100644 --- a/Documentation/filesystems/proc.rst +++ b/Documentation/filesystems/proc.rst @@ -533,9 +533,9 @@ replaced by copy-on-write) part of the underlying shmem object out on swap. does not take into account swapped out page of underlying shmem objects. "Locked" indicates whether the mapping is locked in memory or not. -"THPeligible" indicates whether the mapping is eligible for allocating THP -pages as well as the THP is PMD mappable or not - 1 if true, 0 otherwise. -It just shows the current status. +"THPeligible" indicates whether the mapping is eligible for allocating +naturally aligned THP pages of any currently enabled order. 1 if true, 0 +otherwise. It just shows the current status. "VmFlags" field deserves a separate description. This member represents the kernel flags associated with the particular virtual memory area in two letter diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c index 7b5dad163533..f978dce7f7ce 100644 --- a/fs/proc/task_mmu.c +++ b/fs/proc/task_mmu.c @@ -869,7 +869,8 @@ static int show_smap(struct seq_file *m, void *v) __show_smap(m, &mss, false); seq_printf(m, "THPeligible: %8u\n", - hugepage_vma_check(vma, vma->vm_flags, true, false, true)); + !!hugepage_vma_check(vma, vma->vm_flags, true, false, true, + THP_ORDERS_ALL)); if (arch_pkeys_enabled()) seq_printf(m, "ProtectionKey: %8u\n", vma_pkey(vma)); diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h index fa0350b0812a..2e7c338229a6 100644 --- a/include/linux/huge_mm.h +++ b/include/linux/huge_mm.h @@ -67,6 +67,21 @@ extern struct kobj_attribute shmem_enabled_attr; #define HPAGE_PMD_ORDER (HPAGE_PMD_SHIFT-PAGE_SHIFT) #define HPAGE_PMD_NR (1<vm_start >> PAGE_SHIFT) - vma->vm_pgoff, - HPAGE_PMD_NR)) - return false; +static inline unsigned int transhuge_vma_suitable(struct vm_area_struct *vma, + unsigned long addr, unsigned int orders) +{ + int order; + + /* + * Iterate over orders, highest to lowest, removing orders that don't + * meet alignment requirements from the set. 
Exit loop at first order + * that meets requirements, since all lower orders must also meet + * requirements. + */ + + order = first_order(orders); + + while (orders) { + unsigned long hpage_size = PAGE_SIZE << order; + unsigned long haddr = ALIGN_DOWN(addr, hpage_size); + + if (haddr >= vma->vm_start && + haddr + hpage_size <= vma->vm_end) { + if (!vma_is_anonymous(vma)) { + if (IS_ALIGNED((vma->vm_start >> PAGE_SHIFT) - + vma->vm_pgoff, + hpage_size >> PAGE_SHIFT)) + break; + } else + break; + } + + order = next_order(&orders, order); } - haddr = addr & HPAGE_PMD_MASK; - - if (haddr < vma->vm_start || haddr + HPAGE_PMD_SIZE > vma->vm_end) - return false; - return true; + return orders; } static inline bool file_thp_enabled(struct vm_area_struct *vma) @@ -130,8 +173,9 @@ static inline bool file_thp_enabled(struct vm_area_struct *vma) !inode_is_open_for_write(inode) && S_ISREG(inode->i_mode); } -bool hugepage_vma_check(struct vm_area_struct *vma, unsigned long vm_flags, - bool smaps, bool in_pf, bool enforce_sysfs); +unsigned int hugepage_vma_check(struct vm_area_struct *vma, + unsigned long vm_flags, bool smaps, bool in_pf, + bool enforce_sysfs, unsigned int orders); #define transparent_hugepage_use_zero_page() \ (transparent_hugepage_flags & \ @@ -267,17 +311,18 @@ static inline bool folio_test_pmd_mappable(struct folio *folio) return false; } -static inline bool transhuge_vma_suitable(struct vm_area_struct *vma, - unsigned long addr) +static inline unsigned int transhuge_vma_suitable(struct vm_area_struct *vma, + unsigned long addr, unsigned int orders) { - return false; + return 0; } -static inline bool hugepage_vma_check(struct vm_area_struct *vma, - unsigned long vm_flags, bool smaps, - bool in_pf, bool enforce_sysfs) +static inline unsigned int hugepage_vma_check(struct vm_area_struct *vma, + unsigned long vm_flags, bool smaps, + bool in_pf, bool enforce_sysfs, + unsigned int orders) { - return false; + return 0; } static inline void folio_prep_large_rmappable(struct folio *folio) {} diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 064fbd90822b..bcecce769017 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -70,12 +70,48 @@ static struct shrinker deferred_split_shrinker; static atomic_t huge_zero_refcount; struct page *huge_zero_page __read_mostly; unsigned long huge_zero_pfn __read_mostly = ~0UL; +unsigned int huge_anon_orders __read_mostly = BIT(PMD_ORDER); +static unsigned int huge_anon_always_mask __read_mostly; -bool hugepage_vma_check(struct vm_area_struct *vma, unsigned long vm_flags, - bool smaps, bool in_pf, bool enforce_sysfs) +/** + * hugepage_vma_check - determine which hugepage orders can be applied to vma + * @vma: the vm area to check + * @vm_flags: use these vm_flags instead of vma->vm_flags + * @smaps: whether answer will be used for smaps file + * @in_pf: whether answer will be used by page fault handler + * @enforce_sysfs: whether sysfs config should be taken into account + * @orders: bitfield of all orders to consider + * + * Calculates the intersection of the requested hugepage orders and the allowed + * hugepage orders for the provided vma. Permitted orders are encoded as a set + * bit at the corresponding bit position (bit-2 corresponds to order-2, bit-3 + * corresponds to order-3, etc). Order-0 is never considered a hugepage order. + * + * Return: bitfield of orders allowed for hugepage in the vma. 0 if no hugepage + * orders are allowed. 
+ */ +unsigned int hugepage_vma_check(struct vm_area_struct *vma, + unsigned long vm_flags, bool smaps, bool in_pf, + bool enforce_sysfs, unsigned int orders) { + /* + * Fix up the orders mask; Supported orders for file vmas are static. + * Supported orders for anon vmas are configured dynamically - but only + * use the dynamic set if enforce_sysfs=true, otherwise use the full + * set. + */ + if (vma_is_anonymous(vma)) + orders &= enforce_sysfs ? READ_ONCE(huge_anon_orders) + : THP_ORDERS_ALL_ANON; + else + orders &= THP_ORDERS_ALL_FILE; + + /* No orders in the intersection. */ + if (!orders) + return 0; + if (!vma->vm_mm) /* vdso */ - return false; + return 0; /* * Explicitly disabled through madvise or prctl, or some @@ -84,16 +120,16 @@ bool hugepage_vma_check(struct vm_area_struct *vma, unsigned long vm_flags, * */ if ((vm_flags & VM_NOHUGEPAGE) || test_bit(MMF_DISABLE_THP, &vma->vm_mm->flags)) - return false; + return 0; /* * If the hardware/firmware marked hugepage support disabled. */ if (transparent_hugepage_flags & (1 << TRANSPARENT_HUGEPAGE_UNSUPPORTED)) - return false; + return 0; /* khugepaged doesn't collapse DAX vma, but page fault is fine. */ if (vma_is_dax(vma)) - return in_pf; + return in_pf ? orders : 0; /* * Special VMA and hugetlb VMA. @@ -101,17 +137,29 @@ bool hugepage_vma_check(struct vm_area_struct *vma, unsigned long vm_flags, * VM_MIXEDMAP set. */ if (vm_flags & VM_NO_KHUGEPAGED) - return false; + return 0; /* - * Check alignment for file vma and size for both file and anon vma. + * Check alignment for file vma and size for both file and anon vma by + * filtering out the unsuitable orders. * * Skip the check for page fault. Huge fault does the check in fault - * handlers. And this check is not suitable for huge PUD fault. + * handlers. */ - if (!in_pf && - !transhuge_vma_suitable(vma, (vma->vm_end - HPAGE_PMD_SIZE))) - return false; + if (!in_pf) { + int order = first_order(orders); + unsigned long addr; + + while (orders) { + addr = vma->vm_end - (PAGE_SIZE << order); + if (transhuge_vma_suitable(vma, addr, BIT(order))) + break; + order = next_order(&orders, order); + } + + if (!orders) + return 0; + } /* * Enabled via shmem mount options or sysfs settings. @@ -120,23 +168,35 @@ bool hugepage_vma_check(struct vm_area_struct *vma, unsigned long vm_flags, */ if (!in_pf && shmem_file(vma->vm_file)) return shmem_is_huge(file_inode(vma->vm_file), vma->vm_pgoff, - !enforce_sysfs, vma->vm_mm, vm_flags); + !enforce_sysfs, vma->vm_mm, vm_flags) + ? orders : 0; /* Enforce sysfs THP requirements as necessary */ - if (enforce_sysfs && - (!hugepage_flags_enabled() || (!(vm_flags & VM_HUGEPAGE) && - !hugepage_flags_always()))) - return false; + if (enforce_sysfs) { + /* enabled=never. */ + if (!hugepage_flags_enabled()) + return 0; + + /* enabled=madvise without VM_HUGEPAGE. */ + if (!(vm_flags & VM_HUGEPAGE) && !hugepage_flags_always()) { + if (vma_is_anonymous(vma)) { + orders &= READ_ONCE(huge_anon_always_mask); + if (!orders) + return 0; + } else + return 0; + } + } /* Only regular file is valid */ if (!in_pf && file_thp_enabled(vma)) - return true; + return orders; if (!vma_is_anonymous(vma)) - return false; + return 0; if (vma_is_temporary_stack(vma)) - return false; + return 0; /* * THPeligible bit of smaps should show 1 for proper VMAs even @@ -146,9 +206,9 @@ bool hugepage_vma_check(struct vm_area_struct *vma, unsigned long vm_flags, * the first page fault. */ if (!vma->anon_vma) - return (smaps || in_pf); + return (smaps || in_pf) ? 
orders : 0; - return true; + return orders; } static bool get_huge_zero_page(void) @@ -391,11 +451,69 @@ static ssize_t hpage_pmd_size_show(struct kobject *kobj, static struct kobj_attribute hpage_pmd_size_attr = __ATTR_RO(hpage_pmd_size); +static ssize_t anon_orders_show(struct kobject *kobj, + struct kobj_attribute *attr, char *buf) +{ + return sysfs_emit(buf, "0x%08x\n", READ_ONCE(huge_anon_orders)); +} + +static ssize_t anon_orders_store(struct kobject *kobj, + struct kobj_attribute *attr, + const char *buf, size_t count) +{ + int err; + int ret = count; + unsigned int orders; + + err = kstrtouint(buf, 0, &orders); + if (err) + ret = -EINVAL; + + if (ret > 0) { + orders &= THP_ORDERS_ALL_ANON; + WRITE_ONCE(huge_anon_orders, orders); + + err = start_stop_khugepaged(); + if (err) + ret = err; + } + + return ret; +} + +static struct kobj_attribute anon_orders_attr = __ATTR_RW(anon_orders); + +static ssize_t anon_always_mask_show(struct kobject *kobj, + struct kobj_attribute *attr, char *buf) +{ + return sysfs_emit(buf, "0x%08x\n", READ_ONCE(huge_anon_always_mask)); +} + +static ssize_t anon_always_mask_store(struct kobject *kobj, + struct kobj_attribute *attr, + const char *buf, size_t count) +{ + int err; + unsigned int always_mask; + + err = kstrtouint(buf, 0, &always_mask); + if (err) + return -EINVAL; + + WRITE_ONCE(huge_anon_always_mask, always_mask); + + return count; +} + +static struct kobj_attribute anon_always_mask_attr = __ATTR_RW(anon_always_mask); + static struct attribute *hugepage_attr[] = { &enabled_attr.attr, &defrag_attr.attr, &use_zero_page_attr.attr, &hpage_pmd_size_attr.attr, + &anon_orders_attr.attr, + &anon_always_mask_attr.attr, #ifdef CONFIG_SHMEM &shmem_enabled_attr.attr, #endif @@ -778,7 +896,7 @@ vm_fault_t do_huge_pmd_anonymous_page(struct vm_fault *vmf) struct folio *folio; unsigned long haddr = vmf->address & HPAGE_PMD_MASK; - if (!transhuge_vma_suitable(vma, haddr)) + if (!transhuge_vma_suitable(vma, haddr, BIT(PMD_ORDER))) return VM_FAULT_FALLBACK; if (unlikely(anon_vma_prepare(vma))) return VM_FAULT_OOM; diff --git a/mm/khugepaged.c b/mm/khugepaged.c index 88433cc25d8a..2b5c0321d96b 100644 --- a/mm/khugepaged.c +++ b/mm/khugepaged.c @@ -446,7 +446,8 @@ void khugepaged_enter_vma(struct vm_area_struct *vma, { if (!test_bit(MMF_VM_HUGEPAGE, &vma->vm_mm->flags) && hugepage_flags_enabled()) { - if (hugepage_vma_check(vma, vm_flags, false, false, true)) + if (hugepage_vma_check(vma, vm_flags, false, false, true, + BIT(PMD_ORDER))) __khugepaged_enter(vma->vm_mm); } } @@ -921,10 +922,10 @@ static int hugepage_vma_revalidate(struct mm_struct *mm, unsigned long address, if (!vma) return SCAN_VMA_NULL; - if (!transhuge_vma_suitable(vma, address)) + if (!transhuge_vma_suitable(vma, address, BIT(PMD_ORDER))) return SCAN_ADDRESS_RANGE; if (!hugepage_vma_check(vma, vma->vm_flags, false, false, - cc->is_khugepaged)) + cc->is_khugepaged, BIT(PMD_ORDER))) return SCAN_VMA_CHECK; /* * Anon VMA expected, the address may be unmapped then @@ -1499,7 +1500,8 @@ int collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr, * and map it by a PMD, regardless of sysfs THP settings. As such, let's * analogously elide sysfs THP settings here. 
*/ - if (!hugepage_vma_check(vma, vma->vm_flags, false, false, false)) + if (!hugepage_vma_check(vma, vma->vm_flags, false, false, false, + BIT(PMD_ORDER))) return SCAN_VMA_CHECK; /* Keep pmd pgtable for uffd-wp; see comment in retract_page_tables() */ @@ -2369,7 +2371,8 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, int *result, progress++; break; } - if (!hugepage_vma_check(vma, vma->vm_flags, false, false, true)) { + if (!hugepage_vma_check(vma, vma->vm_flags, false, false, true, + BIT(PMD_ORDER))) { skip: progress++; continue; @@ -2626,7 +2629,7 @@ int start_stop_khugepaged(void) int err = 0; mutex_lock(&khugepaged_mutex); - if (hugepage_flags_enabled()) { + if (hugepage_flags_enabled() && (huge_anon_orders & BIT(PMD_ORDER))) { if (!khugepaged_thread) khugepaged_thread = kthread_run(khugepaged, NULL, "khugepaged"); @@ -2706,7 +2709,8 @@ int madvise_collapse(struct vm_area_struct *vma, struct vm_area_struct **prev, *prev = vma; - if (!hugepage_vma_check(vma, vma->vm_flags, false, false, false)) + if (!hugepage_vma_check(vma, vma->vm_flags, false, false, false, + BIT(PMD_ORDER))) return -EINVAL; cc = kmalloc(sizeof(*cc), GFP_KERNEL); diff --git a/mm/memory.c b/mm/memory.c index e4b0f6a461d8..b5b82fc8e164 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -4256,7 +4256,7 @@ vm_fault_t do_set_pmd(struct vm_fault *vmf, struct page *page) pmd_t entry; vm_fault_t ret = VM_FAULT_FALLBACK; - if (!transhuge_vma_suitable(vma, haddr)) + if (!transhuge_vma_suitable(vma, haddr, BIT(PMD_ORDER))) return ret; page = compound_head(page); @@ -5055,7 +5055,7 @@ static vm_fault_t __handle_mm_fault(struct vm_area_struct *vma, return VM_FAULT_OOM; retry_pud: if (pud_none(*vmf.pud) && - hugepage_vma_check(vma, vm_flags, false, true, true)) { + hugepage_vma_check(vma, vm_flags, false, true, true, BIT(PUD_ORDER))) { ret = create_huge_pud(&vmf); if (!(ret & VM_FAULT_FALLBACK)) return ret; @@ -5089,7 +5089,7 @@ static vm_fault_t __handle_mm_fault(struct vm_area_struct *vma, goto retry_pud; if (pmd_none(*vmf.pmd) && - hugepage_vma_check(vma, vm_flags, false, true, true)) { + hugepage_vma_check(vma, vm_flags, false, true, true, BIT(PMD_ORDER))) { ret = create_huge_pmd(&vmf); if (!(ret & VM_FAULT_FALLBACK)) return ret; diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c index e0b368e545ed..5f7e89c5b595 100644 --- a/mm/page_vma_mapped.c +++ b/mm/page_vma_mapped.c @@ -268,7 +268,8 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw) * cleared *pmd but not decremented compound_mapcount(). 
*/ if ((pvmw->flags & PVMW_SYNC) && - transhuge_vma_suitable(vma, pvmw->address) && + transhuge_vma_suitable(vma, pvmw->address, + BIT(PMD_ORDER)) && (pvmw->nr_pages >= HPAGE_PMD_NR)) { spinlock_t *ptl = pmd_lock(mm, pvmw->pmd); From patchwork Fri Sep 29 11:44:16 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ryan Roberts X-Patchwork-Id: 13404129 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id A382BE80ABE for ; Fri, 29 Sep 2023 11:44:50 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 38A488E0003; Fri, 29 Sep 2023 07:44:50 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 313838E0001; Fri, 29 Sep 2023 07:44:50 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 1B63C8E0003; Fri, 29 Sep 2023 07:44:50 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0012.hostedemail.com [216.40.44.12]) by kanga.kvack.org (Postfix) with ESMTP id 0E2328E0001 for ; Fri, 29 Sep 2023 07:44:50 -0400 (EDT) Received: from smtpin27.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay09.hostedemail.com (Postfix) with ESMTP id D579080B31 for ; Fri, 29 Sep 2023 11:44:49 +0000 (UTC) X-FDA: 81289453098.27.A2BFDCC Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by imf27.hostedemail.com (Postfix) with ESMTP id 208FC4002C for ; Fri, 29 Sep 2023 11:44:47 +0000 (UTC) Authentication-Results: imf27.hostedemail.com; dkim=none; dmarc=pass (policy=none) header.from=arm.com; spf=pass (imf27.hostedemail.com: domain of ryan.roberts@arm.com designates 217.140.110.172 as permitted sender) smtp.mailfrom=ryan.roberts@arm.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1695987888; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=2gExMYv36b1IpQ9pVpTbz0yc/Ix+8JpSzZniXtjIQmc=; b=lvXGIwo8k1wJ/rdn/f8nVPJPeinTyDxWqwXffqHtmo5IQohxkkrwoCA9aZZa2/PqiHH9TX bNnOqF2qR/b88C0GSNu9jVt+PWKtbTV2F+hwmK7KE8W6nXuDACjrVTfV/7/o26XUrIt/x4 CKBm75nH0XTBuDXbMPvgv4zCTHiDZ0s= ARC-Authentication-Results: i=1; imf27.hostedemail.com; dkim=none; dmarc=pass (policy=none) header.from=arm.com; spf=pass (imf27.hostedemail.com: domain of ryan.roberts@arm.com designates 217.140.110.172 as permitted sender) smtp.mailfrom=ryan.roberts@arm.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1695987888; a=rsa-sha256; cv=none; b=OXVS7CuqkCZvNGbTeRRMiWyQeEW4ZdK7EsMx5RbQh/98WhkGEgfHfCibPTkd1g/XTJzg/H irSq78yzqSq7swQtlv0cqVFkknjTb4AclOaisnsyZ7iYyxQY/GkjGKfZX+0CsxADFGJYHq waKGOp1RorsJno6ljB9nBc+7KYWh0HE= Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 9765D1FB; Fri, 29 Sep 2023 04:45:25 -0700 (PDT) Received: from e125769.cambridge.arm.com (e125769.cambridge.arm.com [10.1.196.26]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id B3E903F59C; Fri, 29 Sep 2023 04:44:44 -0700 (PDT) From: Ryan Roberts To: Andrew Morton , Matthew Wilcox , Yin Fengwei , David Hildenbrand , Yu Zhao , Catalin Marinas , Anshuman Khandual , Yang Shi , "Huang, Ying" , Zi Yan , Luis 
Chamberlain , Itaru Kitayama , "Kirill A. Shutemov" , John Hubbard , David Rientjes , Vlastimil Babka , Hugh Dickins Cc: Ryan Roberts , linux-mm@kvack.org, linux-kernel@vger.kernel.org, linux-arm-kernel@lists.infradead.org Subject: [PATCH v6 5/9] mm: thp: Extend THP to allocate anonymous large folios Date: Fri, 29 Sep 2023 12:44:16 +0100 Message-Id: <20230929114421.3761121-6-ryan.roberts@arm.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20230929114421.3761121-1-ryan.roberts@arm.com> References: <20230929114421.3761121-1-ryan.roberts@arm.com> MIME-Version: 1.0

Introduce the logic to allow THP to be configured (through the new anon_orders interface we just added) to allocate large folios to back anonymous memory, which are smaller than PMD-size (for example order-2, order-3, order-4, etc). These THPs continue to be PTE-mapped, but in many cases can still provide similar benefits to traditional PMD-sized THP: page faults are significantly reduced (by a factor of e.g. 4, 8, 16, etc. depending on the configured order), but latency spikes are much less prominent because the size of each page isn't as huge as the PMD-sized variant and there is less memory to clear in each page fault. The number of per-page operations (e.g. ref counting, rmap management, lru list management) is also significantly reduced since those ops now become per-folio. Some architectures also employ TLB compression mechanisms to squeeze more entries in when a set of PTEs are virtually and physically contiguous and appropriately aligned. In this case, TLB misses will occur less often. The new behaviour is disabled by default because anon_orders defaults to enabling only the PMD-order, but it can be enabled at runtime by writing to anon_orders (see the documentation in the previous commit). The long-term aim is to default anon_orders to include suitable lower orders, but there are some risks around internal fragmentation that need to be better understood first.
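For illustration, a minimal userspace sketch of driving the new interface, assuming a 4K base page size (so the PMD-order is 9) and the anon_orders sysfs file introduced earlier in this series; it mirrors the write_anon_orders() selftest helper added later in the series and requests order-3 and order-4 anon THP in addition to the PMD-order:

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	/* bit-N enables order-N: here orders 3, 4 and 9 (the PMD-order with 4K pages) */
	unsigned int orders = (1u << 3) | (1u << 4) | (1u << 9);
	char buf[32];
	int fd, len;

	fd = open("/sys/kernel/mm/transparent_hugepage/anon_orders", O_WRONLY);
	if (fd < 0) {
		perror("open");
		return 1;
	}

	/* the file accepts a bitfield in the same 0x%08x format it reports */
	len = snprintf(buf, sizeof(buf), "0x%08x\n", orders);
	if (write(fd, buf, len) != len)
		perror("write");

	close(fd);
	return 0;
}

Reading the file back afterwards shows which of the requested orders the kernel actually accepted, since unsupported orders are masked off.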
Signed-off-by: Ryan Roberts --- Documentation/admin-guide/mm/transhuge.rst | 9 +- include/linux/huge_mm.h | 6 +- mm/memory.c | 108 +++++++++++++++++++-- 3 files changed, 111 insertions(+), 12 deletions(-) diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst index 9f954e73a4ca..732c3b2f4ba8 100644 --- a/Documentation/admin-guide/mm/transhuge.rst +++ b/Documentation/admin-guide/mm/transhuge.rst @@ -353,7 +353,9 @@ anonymous transparent huge pages, it is necessary to read ``/proc/PID/smaps`` and count the AnonHugePages and AnonHugePteMap fields for each mapping. Note that in both cases, AnonHugePages refers only to PMD-mapped THPs. AnonHugePteMap refers to THPs that are mapped -using PTEs. +using PTEs. This includes all THPs whose order is smaller than +PMD-order, as well as any PMD-order THPs that happen to be PTE-mapped +for other reasons. The number of file transparent huge pages mapped to userspace is available by reading ShmemPmdMapped and ShmemHugePages fields in ``/proc/meminfo``. @@ -367,6 +369,11 @@ frequently will incur overhead. There are a number of counters in ``/proc/vmstat`` that may be used to monitor how successfully the system is providing huge pages for use. +.. note:: + Currently the below counters only record events relating to + PMD-order THPs. Events relating to smaller order THPs are not + included. + thp_fault_alloc is incremented every time a huge page is successfully allocated to handle a page fault. diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h index 2e7c338229a6..c4860476a1f5 100644 --- a/include/linux/huge_mm.h +++ b/include/linux/huge_mm.h @@ -68,9 +68,11 @@ extern struct kobj_attribute shmem_enabled_attr; #define HPAGE_PMD_NR (1<pte + i))) + return true; + } + + return false; +} + +#ifdef CONFIG_TRANSPARENT_HUGEPAGE +static struct folio *alloc_anon_folio(struct vm_fault *vmf) +{ + gfp_t gfp; + pte_t *pte; + unsigned long addr; + struct folio *folio; + struct vm_area_struct *vma = vmf->vma; + unsigned int orders; + int order; + + /* + * If uffd is active for the vma we need per-page fault fidelity to + * maintain the uffd semantics. + */ + if (userfaultfd_armed(vma)) + goto fallback; + + /* + * Get a list of all the (large) orders below PMD_ORDER that are enabled + * for this vma. Then filter out the orders that can't be allocated over + * the faulting address and still be fully contained in the vma. 
+ */ + orders = hugepage_vma_check(vma, vma->vm_flags, false, true, true, + BIT(PMD_ORDER) - 1); + orders = transhuge_vma_suitable(vma, vmf->address, orders); + + if (!orders) + goto fallback; + + pte = pte_offset_map(vmf->pmd, vmf->address & PMD_MASK); + if (!pte) + return ERR_PTR(-EAGAIN); + + order = first_order(orders); + while (orders) { + addr = ALIGN_DOWN(vmf->address, PAGE_SIZE << order); + vmf->pte = pte + pte_index(addr); + if (!vmf_pte_range_changed(vmf, 1 << order)) + break; + order = next_order(&orders, order); + } + + vmf->pte = NULL; + pte_unmap(pte); + + gfp = vma_thp_gfp_mask(vma); + + while (orders) { + addr = ALIGN_DOWN(vmf->address, PAGE_SIZE << order); + folio = vma_alloc_folio(gfp, order, vma, addr, true); + if (folio) { + clear_huge_page(&folio->page, addr, 1 << order); + return folio; + } + order = next_order(&orders, order); + } + +fallback: + return vma_alloc_zeroed_movable_folio(vma, vmf->address); +} +#else +#define alloc_anon_folio(vmf) \ + vma_alloc_zeroed_movable_folio((vmf)->vma, (vmf)->address) +#endif + /* * We enter with non-exclusive mmap_lock (to exclude vma changes, * but allow concurrent faults), and pte mapped but not yet locked. @@ -4066,6 +4147,9 @@ vm_fault_t do_swap_page(struct vm_fault *vmf) */ static vm_fault_t do_anonymous_page(struct vm_fault *vmf) { + int i; + int nr_pages = 1; + unsigned long addr = vmf->address; bool uffd_wp = vmf_orig_pte_uffd_wp(vmf); struct vm_area_struct *vma = vmf->vma; struct folio *folio; @@ -4110,10 +4194,15 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf) /* Allocate our own private page. */ if (unlikely(anon_vma_prepare(vma))) goto oom; - folio = vma_alloc_zeroed_movable_folio(vma, vmf->address); + folio = alloc_anon_folio(vmf); + if (IS_ERR(folio)) + return 0; if (!folio) goto oom; + nr_pages = folio_nr_pages(folio); + addr = ALIGN_DOWN(vmf->address, nr_pages * PAGE_SIZE); + if (mem_cgroup_charge(folio, vma->vm_mm, GFP_KERNEL)) goto oom_free_page; folio_throttle_swaprate(folio, GFP_KERNEL); @@ -4130,12 +4219,12 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf) if (vma->vm_flags & VM_WRITE) entry = pte_mkwrite(pte_mkdirty(entry), vma); - vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd, vmf->address, - &vmf->ptl); + vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd, addr, &vmf->ptl); if (!vmf->pte) goto release; - if (vmf_pte_changed(vmf)) { - update_mmu_tlb(vma, vmf->address, vmf->pte); + if (vmf_pte_range_changed(vmf, nr_pages)) { + for (i = 0; i < nr_pages; i++) + update_mmu_tlb(vma, addr + PAGE_SIZE * i, vmf->pte + i); goto release; } @@ -4150,16 +4239,17 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf) return handle_userfault(vmf, VM_UFFD_MISSING); } - inc_mm_counter(vma->vm_mm, MM_ANONPAGES); - folio_add_new_anon_rmap(folio, vma, vmf->address); + folio_ref_add(folio, nr_pages - 1); + add_mm_counter(vma->vm_mm, MM_ANONPAGES, nr_pages); + folio_add_new_anon_rmap(folio, vma, addr); folio_add_lru_vma(folio, vma); setpte: if (uffd_wp) entry = pte_mkuffd_wp(entry); - set_pte_at(vma->vm_mm, vmf->address, vmf->pte, entry); + set_ptes(vma->vm_mm, addr, vmf->pte, entry, nr_pages); /* No need to invalidate - it was non-present before */ - update_mmu_cache_range(vmf, vma, vmf->address, vmf->pte, 1); + update_mmu_cache_range(vmf, vma, addr, vmf->pte, nr_pages); unlock: if (vmf->pte) pte_unmap_unlock(vmf->pte, vmf->ptl); From patchwork Fri Sep 29 11:44:17 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ryan 
Roberts X-Patchwork-Id: 13404130 From: Ryan Roberts To: Andrew Morton , Matthew Wilcox , Yin Fengwei , David Hildenbrand , Yu Zhao , Catalin Marinas , Anshuman Khandual , Yang Shi , "Huang, Ying" , Zi Yan , Luis Chamberlain , Itaru Kitayama , "Kirill A.
Shutemov" , John Hubbard , David Rientjes , Vlastimil Babka , Hugh Dickins Cc: Ryan Roberts , linux-mm@kvack.org, linux-kernel@vger.kernel.org, linux-arm-kernel@lists.infradead.org Subject: [PATCH v6 6/9] mm: thp: Add "recommend" option for anon_orders Date: Fri, 29 Sep 2023 12:44:17 +0100 Message-Id: <20230929114421.3761121-7-ryan.roberts@arm.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20230929114421.3761121-1-ryan.roberts@arm.com> References: <20230929114421.3761121-1-ryan.roberts@arm.com> MIME-Version: 1.0 X-Rspamd-Queue-Id: 9DFD720005 X-Rspam-User: X-Rspamd-Server: rspam05 X-Stat-Signature: gub9cknaiokukroqrwgeko1rggfc4caj X-HE-Tag: 1695987890-743839 X-HE-Meta: U2FsdGVkX1+u6eTJ2PRyfG3YmEUOkvJFQFb58kF9jl5cqs7ijrgx2YbDvWCSJE5aClp1QDkOwUWMJTmONq+pFnNsIK4jiVkuUia3rUdgh5lrEmDG872M0W/R9eeNlmj4fhExOJEIlzBY8dH0Iks6QssxiEoAwkuZuBuHlBogYqtACcw+jJIckjtkZkma5kT0NuPljg8tCX9XhkWFq8C5k585g0S6qUkq5pmEG0zCekMt9sqSgX/rfqGVdJjJD9hFl1K+DHRyxJSSI6EybcK1Bak0vkezfmQifZyhl9unqlvbFaZEAKIigkXASctsIwJSEtzI2hesSeo02xxvPcM6G/Fh7y7zE339hqou/+3iuSSeVdq2cjuEJrrfN8gebF/P2esz2i7OrEUZQXTD1bZJEmXChWXkVPQxgJ4OXPLq8+ZHb4ZF8YftV5tHjeuN3XGY2eBY/cFR9ZO6khTnvB7rrOhJQXOfFCRS6s/0iZILvkeBTLTIdd/xxw7zTTgfdriXH0N4puHjwwaL8fR2qJc7rRatRoqNHRh6qwKe0XoscHv1b1kcBrEFBXYStFZJuq3M2KB7qzc86rc1veJGIWpLxGB6mX66S7EGYhHiwm4s56uMeAAAwkuycvujrSTliNTJw9/v6nsV352vsaOwGAoTCi+k8gcWqsXKg42SnZmhljLrsSIu9Zq0339j/GoLfv2Sf7oHKj9RdDnJS8eV/nKpuaa+IO40m9vwAnQC/AFU0kg5NfK3xOWPG+zbfzjqf8E5zUYD8ncS0QLEg0/D6hPzqob71d+vw/5S8C05HRxTm9YCL/pWAwbbkK8F9dpIvhXNRL50eu/us9tYUFm9usQPYBrI0ms7RRcY3Mg2qSfXHzdeiZTLcuUBmb/XfWm7npcTxvYWRjkMuBVAw0QQYSu6KyoYzDQ0gHViqgObp/jzxok7ZBon3w4qldxcE8HDscr96y9VsDwCKtT5cdeigCZ MsrggzHt cz0jZ+6SUsR7PuP8SFx6nrhGcrAOB7mIyqgTpRey809Mc74DZ1fZpT8Nn0sa+CxFdrTiU/jsoQ9a6BXG8Glxnh2JiIAjVBoCAqSv0HK2LLMBN4SdXAHVf0zbrariIdQWZSKgB0uiDYr8mawuPsPEjHt6RQEy/Epz+Deq8CubY05FL7EHJRsRtNLf8UGuPjAxJ6sbUOPEIXPrASwoLLhtvNBwZzwynzhba9Y6GU44+2I0DzNMwKT3gIQJyKIMCHijfWjokg30o+/pDi85wHw89R680Zg== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: In addition to passing a bitfield of folio orders to enable for THP, allow the string "recommend" to be written, which has the effect of causing the system to enable the orders preferred by the architecture and by the mm. The user can see what these orders are by subsequently reading back the file. Note that these recommended orders are expected to be static for a given boot of the system, and so the keyword "auto" was deliberately not used, as I want to reserve it for a possible future use where the "best" order is chosen more dynamically at runtime. Recommended orders are determined as follows: - PMD_ORDER: The traditional THP size - arch_wants_pte_order() if implemented by the arch - PAGE_ALLOC_COSTLY_ORDER: The largest order kept on per-cpu free list arch_wants_pte_order() can be overridden by the architecture if desired. Some architectures (e.g. arm64) can coalsece TLB entries if a contiguous set of ptes map physically contigious, naturally aligned memory, so this mechanism allows the architecture to optimize as required. Here we add the default implementation of arch_wants_pte_order(), used when the architecture does not define it, which returns -1, implying that the HW has no preference. 
Signed-off-by: Ryan Roberts --- Documentation/admin-guide/mm/transhuge.rst | 4 ++++ include/linux/pgtable.h | 13 +++++++++++++ mm/huge_memory.c | 14 +++++++++++--- 3 files changed, 28 insertions(+), 3 deletions(-) diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst index 732c3b2f4ba8..d6363d4efa3a 100644 --- a/Documentation/admin-guide/mm/transhuge.rst +++ b/Documentation/admin-guide/mm/transhuge.rst @@ -187,6 +187,10 @@ pages (=16K if the page size is 4K). The example above enables order-9 By enabling multiple orders, allocation of each order will be attempted, highest to lowest, until a successful allocation is made. If the PMD-order is unset, then no PMD-sized THPs will be allocated. +It is also possible to enable the recommended set of orders, which +will be optimized for the architecture and mm:: + + echo recommend >/sys/kernel/mm/transparent_hugepage/anon_orders The kernel will ignore any orders that it does not support so read the file back to determine which orders are enabled:: diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h index af7639c3b0a3..0e110ce57cc3 100644 --- a/include/linux/pgtable.h +++ b/include/linux/pgtable.h @@ -393,6 +393,19 @@ static inline void arch_check_zapped_pmd(struct vm_area_struct *vma, } #endif +#ifndef arch_wants_pte_order +/* + * Returns preferred folio order for pte-mapped memory. Must be in range [0, + * PMD_ORDER) and must not be order-1 since THP requires large folios to be at + * least order-2. Negative value implies that the HW has no preference and mm + * will choose it's own default order. + */ +static inline int arch_wants_pte_order(void) +{ + return -1; +} +#endif + #ifndef __HAVE_ARCH_PTEP_GET_AND_CLEAR static inline pte_t ptep_get_and_clear(struct mm_struct *mm, unsigned long address, diff --git a/mm/huge_memory.c b/mm/huge_memory.c index bcecce769017..e2e2d3906a21 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -464,10 +464,18 @@ static ssize_t anon_orders_store(struct kobject *kobj, int err; int ret = count; unsigned int orders; + int arch; - err = kstrtouint(buf, 0, &orders); - if (err) - ret = -EINVAL; + if (sysfs_streq(buf, "recommend")) { + arch = max(arch_wants_pte_order(), PAGE_ALLOC_COSTLY_ORDER); + orders = BIT(arch); + orders |= BIT(PAGE_ALLOC_COSTLY_ORDER); + orders |= BIT(PMD_ORDER); + } else { + err = kstrtouint(buf, 0, &orders); + if (err) + ret = -EINVAL; + } if (ret > 0) { orders &= THP_ORDERS_ALL_ANON; From patchwork Fri Sep 29 11:44:18 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ryan Roberts X-Patchwork-Id: 13404131 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id DF38CE80ABE for ; Fri, 29 Sep 2023 11:44:55 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 7822B8D00EF; Fri, 29 Sep 2023 07:44:55 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 70B2E8D008F; Fri, 29 Sep 2023 07:44:55 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 55E958D00EF; Fri, 29 Sep 2023 07:44:55 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0015.hostedemail.com [216.40.44.15]) by kanga.kvack.org (Postfix) with ESMTP id 3FEAA8D008F for ; Fri, 29 Sep 2023 07:44:55 -0400 (EDT) Received: from 
smtpin14.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay05.hostedemail.com (Postfix) with ESMTP id 19E854138B for ; Fri, 29 Sep 2023 11:44:55 +0000 (UTC) X-FDA: 81289453350.14.5388730 Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by imf22.hostedemail.com (Postfix) with ESMTP id 771CDC0016 for ; Fri, 29 Sep 2023 11:44:53 +0000 (UTC) Authentication-Results: imf22.hostedemail.com; dkim=none; spf=pass (imf22.hostedemail.com: domain of ryan.roberts@arm.com designates 217.140.110.172 as permitted sender) smtp.mailfrom=ryan.roberts@arm.com; dmarc=pass (policy=none) header.from=arm.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1695987893; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=R12DE0IaGz6VExNqcRsTIxpYiYIG/gDiSa9v3a/m4ts=; b=hLZPiOKRd4MuJDpHDKW99EGCPtbcrE11qtonBYgwoF/nusH/S3e/uqHcPRkwXl23xMz4Pi thOLjjiSor1SWygbfjWWI9i7UjAFDM3XIC0DxlzhM8CiCKOCGTmk9l7vn/1Eqq0TftXibP bre8WDosd3gjfUv2fKRswpJfItGFNK8= ARC-Authentication-Results: i=1; imf22.hostedemail.com; dkim=none; spf=pass (imf22.hostedemail.com: domain of ryan.roberts@arm.com designates 217.140.110.172 as permitted sender) smtp.mailfrom=ryan.roberts@arm.com; dmarc=pass (policy=none) header.from=arm.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1695987893; a=rsa-sha256; cv=none; b=LgCNNxZ9hLWux5oM/SN+5UFmW75vNOD6TD6xTNBhbKWasMspQ2rMZ9M1DTtGWNU9/JTh2Z M6iR6zrT6gn9Ma8JTFc+A2fBy2f4CIfIcob1DCeTUMAbUtBQdEhotdYbMR0wQLE4VTVdaz tqvPGB0c4Z5ltsOz7rOFpBwjBEIcJWI= Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 0EAF31FB; Fri, 29 Sep 2023 04:45:31 -0700 (PDT) Received: from e125769.cambridge.arm.com (e125769.cambridge.arm.com [10.1.196.26]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 2C6323F59C; Fri, 29 Sep 2023 04:44:50 -0700 (PDT) From: Ryan Roberts To: Andrew Morton , Matthew Wilcox , Yin Fengwei , David Hildenbrand , Yu Zhao , Catalin Marinas , Anshuman Khandual , Yang Shi , "Huang, Ying" , Zi Yan , Luis Chamberlain , Itaru Kitayama , "Kirill A. 
Shutemov" , John Hubbard , David Rientjes , Vlastimil Babka , Hugh Dickins Cc: Ryan Roberts , linux-mm@kvack.org, linux-kernel@vger.kernel.org, linux-arm-kernel@lists.infradead.org Subject: [PATCH v6 7/9] arm64/mm: Override arch_wants_pte_order() Date: Fri, 29 Sep 2023 12:44:18 +0100 Message-Id: <20230929114421.3761121-8-ryan.roberts@arm.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20230929114421.3761121-1-ryan.roberts@arm.com> References: <20230929114421.3761121-1-ryan.roberts@arm.com> MIME-Version: 1.0 X-Rspamd-Queue-Id: 771CDC0016 X-Rspam-User: X-Stat-Signature: adj6urhp89rwmfy8j39m451tcoczpc8y X-Rspamd-Server: rspam01 X-HE-Tag: 1695987893-565885 X-HE-Meta: U2FsdGVkX185J1NqzJwCWI710aex2F4hXiyU/80kQ+SBm7XCgv+T8p7kBlJ/Skl24o6EngrVkVKC2Xzu5hkE+XYLPI/vGJkrsjnXoFMB5tTkFoB1zD2Uqo5ettmIL4QQ6cr8oDgud7fb54cxAy8gPm5uOc2SUvaw4udcbUuIEY8VrGYIYjWHkrxQy9R5pPOwPNsBURLXqejEjkQK8UT1POU1bLhTyBVi+uMfdSE7jK8OSwbRhwuF6pCo/ndmo87fxSpUUHwcCL1LBd/L+CPYdevfxuJ62q0OWE7/p0XWJLh6HrlyOTC0eSDUZ74E0Saq8Wr16Tl+0SGCinb/R/xsOLmcWnn+crJN2DQY/5DwWqnJDHXOHJhcSYTmilvxYU7OpRSO2kyZ4gFyPLgJZgbBkCRZqlgb1dU3a/XdnMw5ZWepiaSxfYcfu/JF9OyQrlFde7rnYhmyo6VzrU50d0ZAvXp+XY2dmkmladgCLeZi9BxAI+Wa+3+M7DDDXPd9cLQnhpPoZB4DY3/4CuY9aCOLd6AkPQo0DhME4q+bBviA7ZAJOE0hLG0fQZient9z5tWG1RIrQApzUQvCUKGuXFC5+QhhyguUrcv0hG/I+iHwvcIN1ob5TTCQgi10zPH6AgltVIHzRERcoMK3qPmVxHdYPvQrt59kWRTusbvy+xCw38Q0vwueI6ZJGCy9cCuHBnVkdDjynLW+mPw3/RUoWTR6qsZ05ntksyrdIuSvt/mHj6v2K44wqfwLxA4JRmTd8idqcLGdndJZnDhz4pTBZ6na0e15ML6st351S/6jZ7AKvrwWtUeaGjZ3kn32DG1JZhgwlz2a7qR2ooRXIF6MCTq/V+b1uR/CY+JdMiB5BUQ7kv9uhBgICFrAOlSCMJP2tlvk/QGXjFFQ5HFAVwQRmutCJ7KEaCtsCKWsz35CNLX+zaD+PT5OVjPLVeMlFfwGfFoIAmpuKRPNOPqUiHDuvwZ 4AGgIZJY qb1CrYiavbHwiw1xXI87tFSnyBOrwuEc0Me/DxWvO8PaAkpE8aRfp30abCkzvUXxd9MFy2hfCP9Th1+vtqN/oiKfRXu4LEqx9CCuCI75ifgArC2dHP/mPqhjfjAACl4Dr5wYQGWsN0EFwlrBdG+uJUJSSkoGiTPIAJ6k8jhAy46yZD+K6J3MbaI+6glSqhX1/chsLwlVnQV/yTbs6atgz8/6KScGhRFwU6gOy+UROPJxw56TyrueyJaEGcPkNcLqjy5nFXSxMB/RSscUEWvXyO3R/uEAxkB3b1ulQo0pFjdH/QrU= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Define an arch-specific override of arch_wants_pte_order() so that when anon_orders=recommend is set, large folios will be allocated for anonymous memory with an order that is compatible with arm64's HPA uarch feature. Reviewed-by: Yu Zhao Signed-off-by: Ryan Roberts Acked-by: Catalin Marinas --- arch/arm64/include/asm/pgtable.h | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h index 7f7d9b1df4e5..e3d2449dec5c 100644 --- a/arch/arm64/include/asm/pgtable.h +++ b/arch/arm64/include/asm/pgtable.h @@ -1110,6 +1110,16 @@ extern pte_t ptep_modify_prot_start(struct vm_area_struct *vma, extern void ptep_modify_prot_commit(struct vm_area_struct *vma, unsigned long addr, pte_t *ptep, pte_t old_pte, pte_t new_pte); + +#define arch_wants_pte_order arch_wants_pte_order +static inline int arch_wants_pte_order(void) +{ + /* + * Many arm64 CPUs support hardware page aggregation (HPA), which can + * coalesce 4 contiguous pages into a single TLB entry. 
+ */ + return 2; +} #endif /* !__ASSEMBLY__ */ #endif /* __ASM_PGTABLE_H */ From patchwork Fri Sep 29 11:44:19 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ryan Roberts X-Patchwork-Id: 13404132 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8D245E80ABE for ; Fri, 29 Sep 2023 11:44:58 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 27B6A8E0001; Fri, 29 Sep 2023 07:44:58 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 229878D008F; Fri, 29 Sep 2023 07:44:58 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 054A38E0001; Fri, 29 Sep 2023 07:44:57 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0010.hostedemail.com [216.40.44.10]) by kanga.kvack.org (Postfix) with ESMTP id DFF938D008F for ; Fri, 29 Sep 2023 07:44:57 -0400 (EDT) Received: from smtpin24.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay05.hostedemail.com (Postfix) with ESMTP id BA32E41388 for ; Fri, 29 Sep 2023 11:44:57 +0000 (UTC) X-FDA: 81289453434.24.5337072 Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by imf19.hostedemail.com (Postfix) with ESMTP id 14A8F1A0008 for ; Fri, 29 Sep 2023 11:44:55 +0000 (UTC) Authentication-Results: imf19.hostedemail.com; dkim=none; spf=pass (imf19.hostedemail.com: domain of ryan.roberts@arm.com designates 217.140.110.172 as permitted sender) smtp.mailfrom=ryan.roberts@arm.com; dmarc=pass (policy=none) header.from=arm.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1695987896; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=quTNzgeRKuuO6Qb2fbhO+s0uzcvHJvNtQAmGH7Z0kG8=; b=Ej83MC4eFddJWcLeWS32RWAF/T29TZUcQdpifuumK6r/cJoX7PoWc6yU0hWR5CocPDMw3z /cyR5P1LHn9IcjQ9GbHDqcytwOqvgtTl6AaFqU11Wz0ooFfAwyp/z1K5yfdiLakSboyyMU BuC9sy8/eiqcfi5W6Ilr5TcDalpllok= ARC-Authentication-Results: i=1; imf19.hostedemail.com; dkim=none; spf=pass (imf19.hostedemail.com: domain of ryan.roberts@arm.com designates 217.140.110.172 as permitted sender) smtp.mailfrom=ryan.roberts@arm.com; dmarc=pass (policy=none) header.from=arm.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1695987896; a=rsa-sha256; cv=none; b=B77EtneBt4OwDUlN+l8UzP6f4zTbsbGKyxKZ4Y+Br1UqBC9J7wt4WYKRHacjZA+kworJKt dheDMsQfv1ulp6NmPc30kgqx7ot2+aL9F73zzK7DGxdOPM5FYw2bgVpH6B/x5D8kb4xW5q QRgmlW3AOLw5eQ7kgKYMdQhupFJIyjI= Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id BF64FDA7; Fri, 29 Sep 2023 04:45:33 -0700 (PDT) Received: from e125769.cambridge.arm.com (e125769.cambridge.arm.com [10.1.196.26]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id DDAFE3F59C; Fri, 29 Sep 2023 04:44:52 -0700 (PDT) From: Ryan Roberts To: Andrew Morton , Matthew Wilcox , Yin Fengwei , David Hildenbrand , Yu Zhao , Catalin Marinas , Anshuman Khandual , Yang Shi , "Huang, Ying" , Zi Yan , Luis Chamberlain , Itaru Kitayama , "Kirill A. 
Shutemov" , John Hubbard , David Rientjes , Vlastimil Babka , Hugh Dickins Cc: Ryan Roberts , linux-mm@kvack.org, linux-kernel@vger.kernel.org, linux-arm-kernel@lists.infradead.org Subject: [PATCH v6 8/9] selftests/mm/cow: Generalize do_run_with_thp() helper Date: Fri, 29 Sep 2023 12:44:19 +0100 Message-Id: <20230929114421.3761121-9-ryan.roberts@arm.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20230929114421.3761121-1-ryan.roberts@arm.com> References: <20230929114421.3761121-1-ryan.roberts@arm.com> MIME-Version: 1.0 X-Rspamd-Queue-Id: 14A8F1A0008 X-Rspam-User: X-Stat-Signature: xzywijg643ff1u4eodqge1mdwxyaab9z X-Rspamd-Server: rspam01 X-HE-Tag: 1695987895-834507 X-HE-Meta: U2FsdGVkX1/qK8dietQjs9QW/vg7yVAPhkzrOog7GYBzTsY/nju9vCFHVOFa40QD78jIElGoPD+EyE3G/dS3tCKONDOL1WdR8fffTZkLHfFgIDnRWico6EMvCgLRKIi6dv8VD4dhXAnnO1wehamkOEMJtTWinZndYqs5+ZLjjUb59tZqD4oX5dQEWeT++JGmkIjYBVIg/BstO4gBA3YhPJc1k3NgBGAF4MMziMTEYCiVtl7PJNrbdqcNT4a3aHrwZHlw1u5NSsNkT3PGJH6Hli66c2Z4niGf/6lQx1z/kOr0sXJADGtJEnl5gkOWEOHeTRQSpnf88OiGHklUzx5y4n6IEdLN7urGVuOUpK2WUwCK6WRyw8ofefBY+gT/M63076QJyCYCsBPT0vpZ885v4auv7BydaPiuRaCgf/UDi3pS1xhU4w+ul7PxvwrTzz+UhI3ei+6Mx53sVuhksrVS1UjuYQAPZZlAdaz31BPrh5MtDaLvGh+cTyumu91ssPTop7xDJWWzglkincOSKGz8xUv/o/L5l8m9ooGQw5k1aPT6KD5yfXGU0Brcs+KfcJ+D7TX9QM90X+TkpCSnZqR+YrXi1w/LIAYbe4pTIkOwycNTkNI/sJstPXRUk+qEyuwRkt7Smge8UDVTbUCCP47168axqnBY6uXOuasf297CiLm6IhdoT152JRHoh5K/1P7X1dHszIHfxujUBvBDxhEkbFXkx18wtjLUTeIYcKRz2GcAEH3YN7kaFcpPdCv6OInaRIDCwQSlrpRXLf6NKmxRXnzOd8FOrF0ziCAwdTZGA/pwgYEzF59zkvxd02Fxgf/kNqeDl3C081Rp/WkXFOKYN+qiB//BxHRASCc8oSqphxlpcFFxn/fYef9PNuCd/67UzRrKUoZFoXqKwjXeFBmpKbZaMSI+xSnLJlcR/2o+YwAOdLQX5/7hpvP85/WXJpuNKo1ooPUtJDxI9f9EXH1 lgENYD13 SHJnFP1EmW0hWwiJ5dBFwUIxlAqf8upJmO5yHsUCSQ917U1obsjHmgpSy4gl1gJjC5AiQXnlUpIKpiax6Hyc/QXweGdHQIinOmPeKSXCZSQVYVFQi2A9pp8A2FIcRJqeDo8kuowoEGCxGLAgReF/ZPjIZqi9eh3dVwLeXT61op0SAjo8oTVdmAa/paGwnCLHkUMqHfD3u40fZuEg0tqHTFfyLE2D5qRU+N7C0mM6K2YX8/BhSmNwC+ia1VgSTqYvW9wtljOwhPARm6O8pnfyQDbSdhw== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: do_run_with_thp() prepares (PMD-sized) THP memory into different states before running tests. With the introduction of THP orders that are smaller than PMD_ORDER, we would like to reuse this logic to also test those smaller orders. So let's add a size parameter which tells the function what size THP it should operate on. No functional change intended here, but a separate commit will add new tests for smaller order THP, where available. Signed-off-by: Ryan Roberts --- tools/testing/selftests/mm/cow.c | 151 +++++++++++++++++-------------- 1 file changed, 84 insertions(+), 67 deletions(-) diff --git a/tools/testing/selftests/mm/cow.c b/tools/testing/selftests/mm/cow.c index 7324ce5363c0..d887ce454e34 100644 --- a/tools/testing/selftests/mm/cow.c +++ b/tools/testing/selftests/mm/cow.c @@ -32,7 +32,7 @@ static size_t pagesize; static int pagemap_fd; -static size_t thpsize; +static size_t pmdsize; static int nr_hugetlbsizes; static size_t hugetlbsizes[10]; static int gup_fd; @@ -734,14 +734,14 @@ enum thp_run { THP_RUN_PARTIAL_SHARED, }; -static void do_run_with_thp(test_fn fn, enum thp_run thp_run) +static void do_run_with_thp(test_fn fn, enum thp_run thp_run, size_t size) { char *mem, *mmap_mem, *tmp, *mremap_mem = MAP_FAILED; - size_t size, mmap_size, mremap_size; + size_t mmap_size, mremap_size; int ret; - /* For alignment purposes, we need twice the thp size. 
*/ - mmap_size = 2 * thpsize; + /* For alignment purposes, we need twice the requested size. */ + mmap_size = 2 * size; mmap_mem = mmap(NULL, mmap_size, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0); if (mmap_mem == MAP_FAILED) { @@ -749,36 +749,40 @@ static void do_run_with_thp(test_fn fn, enum thp_run thp_run) return; } - /* We need a THP-aligned memory area. */ - mem = (char *)(((uintptr_t)mmap_mem + thpsize) & ~(thpsize - 1)); + /* We need to naturally align the memory area. */ + mem = (char *)(((uintptr_t)mmap_mem + size) & ~(size - 1)); - ret = madvise(mem, thpsize, MADV_HUGEPAGE); + ret = madvise(mem, size, MADV_HUGEPAGE); if (ret) { ksft_test_result_fail("MADV_HUGEPAGE failed\n"); goto munmap; } /* - * Try to populate a THP. Touch the first sub-page and test if we get - * another sub-page populated automatically. + * Try to populate a THP. Touch the first sub-page and test if + * we get the last sub-page populated automatically. */ mem[0] = 0; - if (!pagemap_is_populated(pagemap_fd, mem + pagesize)) { + if (!pagemap_is_populated(pagemap_fd, mem + size - pagesize)) { ksft_test_result_skip("Did not get a THP populated\n"); goto munmap; } - memset(mem, 0, thpsize); + memset(mem, 0, size); - size = thpsize; switch (thp_run) { case THP_RUN_PMD: case THP_RUN_PMD_SWAPOUT: + if (size != pmdsize) { + ksft_test_result_fail("test bug: can't PMD-map size\n"); + goto munmap; + } break; case THP_RUN_PTE: case THP_RUN_PTE_SWAPOUT: /* * Trigger PTE-mapping the THP by temporarily mapping a single - * subpage R/O. + * subpage R/O. This is a noop if the THP is not pmdsize (and + * therefore already PTE-mapped). */ ret = mprotect(mem + pagesize, pagesize, PROT_READ); if (ret) { @@ -797,7 +801,7 @@ static void do_run_with_thp(test_fn fn, enum thp_run thp_run) * Discard all but a single subpage of that PTE-mapped THP. What * remains is a single PTE mapping a single subpage. */ - ret = madvise(mem + pagesize, thpsize - pagesize, MADV_DONTNEED); + ret = madvise(mem + pagesize, size - pagesize, MADV_DONTNEED); if (ret) { ksft_test_result_fail("MADV_DONTNEED failed\n"); goto munmap; @@ -809,7 +813,7 @@ static void do_run_with_thp(test_fn fn, enum thp_run thp_run) * Remap half of the THP. We need some new memory location * for that. */ - mremap_size = thpsize / 2; + mremap_size = size / 2; mremap_mem = mmap(NULL, mremap_size, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0); if (mem == MAP_FAILED) { @@ -830,7 +834,7 @@ static void do_run_with_thp(test_fn fn, enum thp_run thp_run) * child. This will result in some parts of the THP never * have been shared. */ - ret = madvise(mem + pagesize, thpsize - pagesize, MADV_DONTFORK); + ret = madvise(mem + pagesize, size - pagesize, MADV_DONTFORK); if (ret) { ksft_test_result_fail("MADV_DONTFORK failed\n"); goto munmap; @@ -844,7 +848,7 @@ static void do_run_with_thp(test_fn fn, enum thp_run thp_run) } wait(&ret); /* Allow for sharing all pages again. */ - ret = madvise(mem + pagesize, thpsize - pagesize, MADV_DOFORK); + ret = madvise(mem + pagesize, size - pagesize, MADV_DOFORK); if (ret) { ksft_test_result_fail("MADV_DOFORK failed\n"); goto munmap; @@ -875,52 +879,65 @@ static void do_run_with_thp(test_fn fn, enum thp_run thp_run) munmap(mremap_mem, mremap_size); } -static void run_with_thp(test_fn fn, const char *desc) +static int sz2ord(size_t size) +{ + return __builtin_ctzll(size / pagesize); +} + +static void run_with_thp(test_fn fn, const char *desc, size_t size) { - ksft_print_msg("[RUN] %s ... 
with THP\n", desc); - do_run_with_thp(fn, THP_RUN_PMD); + ksft_print_msg("[RUN] %s ... with order-%d THP\n", + desc, sz2ord(size)); + do_run_with_thp(fn, THP_RUN_PMD, size); } -static void run_with_thp_swap(test_fn fn, const char *desc) +static void run_with_thp_swap(test_fn fn, const char *desc, size_t size) { - ksft_print_msg("[RUN] %s ... with swapped-out THP\n", desc); - do_run_with_thp(fn, THP_RUN_PMD_SWAPOUT); + ksft_print_msg("[RUN] %s ... with swapped-out order-%d THP\n", + desc, sz2ord(size)); + do_run_with_thp(fn, THP_RUN_PMD_SWAPOUT, size); } -static void run_with_pte_mapped_thp(test_fn fn, const char *desc) +static void run_with_pte_mapped_thp(test_fn fn, const char *desc, size_t size) { - ksft_print_msg("[RUN] %s ... with PTE-mapped THP\n", desc); - do_run_with_thp(fn, THP_RUN_PTE); + ksft_print_msg("[RUN] %s ... with PTE-mapped order-%d THP\n", + desc, sz2ord(size)); + do_run_with_thp(fn, THP_RUN_PTE, size); } -static void run_with_pte_mapped_thp_swap(test_fn fn, const char *desc) +static void run_with_pte_mapped_thp_swap(test_fn fn, const char *desc, size_t size) { - ksft_print_msg("[RUN] %s ... with swapped-out, PTE-mapped THP\n", desc); - do_run_with_thp(fn, THP_RUN_PTE_SWAPOUT); + ksft_print_msg("[RUN] %s ... with swapped-out, PTE-mapped order-%d THP\n", + desc, sz2ord(size)); + do_run_with_thp(fn, THP_RUN_PTE_SWAPOUT, size); } -static void run_with_single_pte_of_thp(test_fn fn, const char *desc) +static void run_with_single_pte_of_thp(test_fn fn, const char *desc, size_t size) { - ksft_print_msg("[RUN] %s ... with single PTE of THP\n", desc); - do_run_with_thp(fn, THP_RUN_SINGLE_PTE); + ksft_print_msg("[RUN] %s ... with single PTE of order-%d THP\n", + desc, sz2ord(size)); + do_run_with_thp(fn, THP_RUN_SINGLE_PTE, size); } -static void run_with_single_pte_of_thp_swap(test_fn fn, const char *desc) +static void run_with_single_pte_of_thp_swap(test_fn fn, const char *desc, size_t size) { - ksft_print_msg("[RUN] %s ... with single PTE of swapped-out THP\n", desc); - do_run_with_thp(fn, THP_RUN_SINGLE_PTE_SWAPOUT); + ksft_print_msg("[RUN] %s ... with single PTE of swapped-out order-%d THP\n", + desc, sz2ord(size)); + do_run_with_thp(fn, THP_RUN_SINGLE_PTE_SWAPOUT, size); } -static void run_with_partial_mremap_thp(test_fn fn, const char *desc) +static void run_with_partial_mremap_thp(test_fn fn, const char *desc, size_t size) { - ksft_print_msg("[RUN] %s ... with partially mremap()'ed THP\n", desc); - do_run_with_thp(fn, THP_RUN_PARTIAL_MREMAP); + ksft_print_msg("[RUN] %s ... with partially mremap()'ed order-%d THP\n", + desc, sz2ord(size)); + do_run_with_thp(fn, THP_RUN_PARTIAL_MREMAP, size); } -static void run_with_partial_shared_thp(test_fn fn, const char *desc) +static void run_with_partial_shared_thp(test_fn fn, const char *desc, size_t size) { - ksft_print_msg("[RUN] %s ... with partially shared THP\n", desc); - do_run_with_thp(fn, THP_RUN_PARTIAL_SHARED); + ksft_print_msg("[RUN] %s ... 
with partially shared order-%d THP\n", + desc, sz2ord(size)); + do_run_with_thp(fn, THP_RUN_PARTIAL_SHARED, size); } static void run_with_hugetlb(test_fn fn, const char *desc, size_t hugetlbsize) @@ -1091,15 +1108,15 @@ static void run_anon_test_case(struct test_case const *test_case) run_with_base_page(test_case->fn, test_case->desc); run_with_base_page_swap(test_case->fn, test_case->desc); - if (thpsize) { - run_with_thp(test_case->fn, test_case->desc); - run_with_thp_swap(test_case->fn, test_case->desc); - run_with_pte_mapped_thp(test_case->fn, test_case->desc); - run_with_pte_mapped_thp_swap(test_case->fn, test_case->desc); - run_with_single_pte_of_thp(test_case->fn, test_case->desc); - run_with_single_pte_of_thp_swap(test_case->fn, test_case->desc); - run_with_partial_mremap_thp(test_case->fn, test_case->desc); - run_with_partial_shared_thp(test_case->fn, test_case->desc); + if (pmdsize) { + run_with_thp(test_case->fn, test_case->desc, pmdsize); + run_with_thp_swap(test_case->fn, test_case->desc, pmdsize); + run_with_pte_mapped_thp(test_case->fn, test_case->desc, pmdsize); + run_with_pte_mapped_thp_swap(test_case->fn, test_case->desc, pmdsize); + run_with_single_pte_of_thp(test_case->fn, test_case->desc, pmdsize); + run_with_single_pte_of_thp_swap(test_case->fn, test_case->desc, pmdsize); + run_with_partial_mremap_thp(test_case->fn, test_case->desc, pmdsize); + run_with_partial_shared_thp(test_case->fn, test_case->desc, pmdsize); } for (i = 0; i < nr_hugetlbsizes; i++) run_with_hugetlb(test_case->fn, test_case->desc, @@ -1120,7 +1137,7 @@ static int tests_per_anon_test_case(void) { int tests = 2 + nr_hugetlbsizes; - if (thpsize) + if (pmdsize) tests += 8; return tests; } @@ -1329,7 +1346,7 @@ static void run_anon_thp_test_cases(void) { int i; - if (!thpsize) + if (!pmdsize) return; ksft_print_msg("[INFO] Anonymous THP tests\n"); @@ -1338,13 +1355,13 @@ static void run_anon_thp_test_cases(void) struct test_case const *test_case = &anon_thp_test_cases[i]; ksft_print_msg("[RUN] %s\n", test_case->desc); - do_run_with_thp(test_case->fn, THP_RUN_PMD); + do_run_with_thp(test_case->fn, THP_RUN_PMD, pmdsize); } } static int tests_per_anon_thp_test_case(void) { - return thpsize ? 1 : 0; + return pmdsize ? 1 : 0; } typedef void (*non_anon_test_fn)(char *mem, const char *smem, size_t size); @@ -1419,7 +1436,7 @@ static void run_with_huge_zeropage(non_anon_test_fn fn, const char *desc) } /* For alignment purposes, we need twice the thp size. */ - mmap_size = 2 * thpsize; + mmap_size = 2 * pmdsize; mmap_mem = mmap(NULL, mmap_size, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0); if (mmap_mem == MAP_FAILED) { @@ -1434,11 +1451,11 @@ static void run_with_huge_zeropage(non_anon_test_fn fn, const char *desc) } /* We need a THP-aligned memory area. 
*/ - mem = (char *)(((uintptr_t)mmap_mem + thpsize) & ~(thpsize - 1)); - smem = (char *)(((uintptr_t)mmap_smem + thpsize) & ~(thpsize - 1)); + mem = (char *)(((uintptr_t)mmap_mem + pmdsize) & ~(pmdsize - 1)); + smem = (char *)(((uintptr_t)mmap_smem + pmdsize) & ~(pmdsize - 1)); - ret = madvise(mem, thpsize, MADV_HUGEPAGE); - ret |= madvise(smem, thpsize, MADV_HUGEPAGE); + ret = madvise(mem, pmdsize, MADV_HUGEPAGE); + ret |= madvise(smem, pmdsize, MADV_HUGEPAGE); if (ret) { ksft_test_result_fail("MADV_HUGEPAGE failed\n"); goto munmap; @@ -1457,7 +1474,7 @@ static void run_with_huge_zeropage(non_anon_test_fn fn, const char *desc) goto munmap; } - fn(mem, smem, thpsize); + fn(mem, smem, pmdsize); munmap: munmap(mmap_mem, mmap_size); if (mmap_smem != MAP_FAILED) @@ -1650,7 +1667,7 @@ static void run_non_anon_test_case(struct non_anon_test_case const *test_case) run_with_zeropage(test_case->fn, test_case->desc); run_with_memfd(test_case->fn, test_case->desc); run_with_tmpfile(test_case->fn, test_case->desc); - if (thpsize) + if (pmdsize) run_with_huge_zeropage(test_case->fn, test_case->desc); for (i = 0; i < nr_hugetlbsizes; i++) run_with_memfd_hugetlb(test_case->fn, test_case->desc, @@ -1671,7 +1688,7 @@ static int tests_per_non_anon_test_case(void) { int tests = 3 + nr_hugetlbsizes; - if (thpsize) + if (pmdsize) tests += 1; return tests; } @@ -1681,10 +1698,10 @@ int main(int argc, char **argv) int err; pagesize = getpagesize(); - thpsize = read_pmd_pagesize(); - if (thpsize) - ksft_print_msg("[INFO] detected THP size: %zu KiB\n", - thpsize / 1024); + pmdsize = read_pmd_pagesize(); + if (pmdsize) + ksft_print_msg("[INFO] detected PMD-mapped THP size: %zu KiB\n", + pmdsize / 1024); nr_hugetlbsizes = detect_hugetlb_page_sizes(hugetlbsizes, ARRAY_SIZE(hugetlbsizes)); detect_huge_zeropage(); From patchwork Fri Sep 29 11:44:20 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ryan Roberts X-Patchwork-Id: 13404133 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 797D8E810DF for ; Fri, 29 Sep 2023 11:45:01 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 0CB638D008F; Fri, 29 Sep 2023 07:45:01 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 055618E0006; Fri, 29 Sep 2023 07:45:00 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id E5EAD8D00F0; Fri, 29 Sep 2023 07:45:00 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0012.hostedemail.com [216.40.44.12]) by kanga.kvack.org (Postfix) with ESMTP id D5C5F8D008F for ; Fri, 29 Sep 2023 07:45:00 -0400 (EDT) Received: from smtpin28.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay06.hostedemail.com (Postfix) with ESMTP id B8507B4715 for ; Fri, 29 Sep 2023 11:45:00 +0000 (UTC) X-FDA: 81289453560.28.50471E1 Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by imf14.hostedemail.com (Postfix) with ESMTP id EBF99100024 for ; Fri, 29 Sep 2023 11:44:58 +0000 (UTC) Authentication-Results: imf14.hostedemail.com; dkim=none; dmarc=pass (policy=none) header.from=arm.com; spf=pass (imf14.hostedemail.com: domain of ryan.roberts@arm.com designates 217.140.110.172 as permitted sender) smtp.mailfrom=ryan.roberts@arm.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; 
From: Ryan Roberts To: Andrew Morton , Matthew Wilcox , Yin Fengwei , David Hildenbrand , Yu Zhao , Catalin Marinas , Anshuman Khandual , Yang Shi , "Huang, Ying" , Zi Yan , Luis Chamberlain , Itaru Kitayama , "Kirill A. Shutemov" , John Hubbard , David Rientjes , Vlastimil Babka , Hugh Dickins Cc: Ryan Roberts , linux-mm@kvack.org, linux-kernel@vger.kernel.org, linux-arm-kernel@lists.infradead.org Subject: [PATCH v6 9/9] selftests/mm/cow: Add tests for small-order anon THP Date: Fri, 29 Sep 2023 12:44:20 +0100 Message-Id: <20230929114421.3761121-10-ryan.roberts@arm.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20230929114421.3761121-1-ryan.roberts@arm.com> References: <20230929114421.3761121-1-ryan.roberts@arm.com> MIME-Version: 1.0
Add tests similar to the existing THP tests, but which operate on memory backed by smaller-order, PTE-mapped THP. This reuses all the existing infrastructure. If the test suite detects that small-order THP is not supported by the kernel, the new tests are skipped.

Signed-off-by: Ryan Roberts
---
 tools/testing/selftests/mm/cow.c | 93 ++++++++++++++++++++++++++++++++
 1 file changed, 93 insertions(+)

diff --git a/tools/testing/selftests/mm/cow.c b/tools/testing/selftests/mm/cow.c
index d887ce454e34..6c5e37d8bb69 100644
--- a/tools/testing/selftests/mm/cow.c
+++ b/tools/testing/selftests/mm/cow.c
@@ -33,10 +33,13 @@ static size_t pagesize;
 static int pagemap_fd;
 static size_t pmdsize;
+static size_t ptesize;
 static int nr_hugetlbsizes;
 static size_t hugetlbsizes[10];
 static int gup_fd;
 static bool has_huge_zeropage;
+static unsigned int orig_anon_orders;
+static bool orig_anon_orders_valid;
 
 static void detect_huge_zeropage(void)
 {
@@ -1118,6 +1121,14 @@ static void run_anon_test_case(struct test_case const *test_case)
 		run_with_partial_mremap_thp(test_case->fn, test_case->desc, pmdsize);
 		run_with_partial_shared_thp(test_case->fn, test_case->desc, pmdsize);
 	}
+	if (ptesize) {
+		run_with_pte_mapped_thp(test_case->fn, test_case->desc, ptesize);
+		run_with_pte_mapped_thp_swap(test_case->fn, test_case->desc, ptesize);
+		run_with_single_pte_of_thp(test_case->fn, test_case->desc, ptesize);
+		run_with_single_pte_of_thp_swap(test_case->fn, test_case->desc, ptesize);
+		run_with_partial_mremap_thp(test_case->fn, test_case->desc, ptesize);
+		run_with_partial_shared_thp(test_case->fn, test_case->desc, ptesize);
+	}
 	for (i = 0; i < nr_hugetlbsizes; i++)
 		run_with_hugetlb(test_case->fn, test_case->desc,
 				 hugetlbsizes[i]);
@@ -1139,6 +1150,8 @@ static int tests_per_anon_test_case(void)
 
 	if (pmdsize)
 		tests += 8;
+	if (ptesize)
+		tests += 6;
 	return tests;
 }
 
@@ -1693,6 +1706,80 @@ static int tests_per_non_anon_test_case(void)
 	return tests;
 }
 
+#define ANON_ORDERS_FILE "/sys/kernel/mm/transparent_hugepage/anon_orders"
+
+static int read_anon_orders(unsigned int *orders)
+{
+	ssize_t buflen = 80;
+	char buf[buflen];
+	int fd;
+
+	fd = open(ANON_ORDERS_FILE, O_RDONLY);
+	if (fd == -1)
+		return -1;
+
+	buflen = read(fd, buf, buflen);
+	close(fd);
+
+	if (buflen < 1)
+		return -1;
+
+	*orders = strtoul(buf, NULL, 16);
+
+	return 0;
+}
+
+static int write_anon_orders(unsigned int orders)
+{
+	ssize_t buflen = 80;
+	char buf[buflen];
+	int fd;
+
+	fd = open(ANON_ORDERS_FILE, O_WRONLY);
+	if (fd == -1)
+		return -1;
+
+	buflen = snprintf(buf, buflen, "0x%08x\n", orders);
+	buflen = write(fd, buf, buflen);
+	close(fd);
+
+	if (buflen < 1)
+		return -1;
+
+	return 0;
+}
+
+static size_t save_thp_anon_orders(void)
+{
+	/*
+	 * If the kernel supports multiple orders for anon THP (indicated by the
+	 * presence of anon_orders file), configure it for the PMD-order and the
+	 * PMD-order - 1, which we will report back and use as the PTE-order THP
+	 * size. Save the original value so that it can be restored on exit. If
+	 * the kernel does not support multiple orders, report back 0 for the
+	 * PTE-size so those tests are skipped.
+	 */
+
+	int pteorder = sz2ord(pmdsize) - 1;
+	unsigned int orders = (1UL << sz2ord(pmdsize)) | (1UL << pteorder);
+
+	if (read_anon_orders(&orig_anon_orders))
+		return 0;
+
+	orig_anon_orders_valid = true;
+
+	if (write_anon_orders(orders))
+		return 0;
+
+	return pagesize << pteorder;
+}
+
+static void restore_thp_anon_orders(void)
+{
+	if (orig_anon_orders_valid)
+		write_anon_orders(orig_anon_orders);
+}
+
 int main(int argc, char **argv)
 {
 	int err;
@@ -1702,6 +1789,10 @@ int main(int argc, char **argv)
 	if (pmdsize)
 		ksft_print_msg("[INFO] detected PMD-mapped THP size: %zu KiB\n",
 			       pmdsize / 1024);
+	ptesize = save_thp_anon_orders();
+	if (ptesize)
+		ksft_print_msg("[INFO] configured PTE-mapped THP size: %zu KiB\n",
+			       ptesize / 1024);
 	nr_hugetlbsizes = detect_hugetlb_page_sizes(hugetlbsizes,
 						    ARRAY_SIZE(hugetlbsizes));
 	detect_huge_zeropage();
@@ -1720,6 +1811,8 @@ int main(int argc, char **argv)
 	run_anon_thp_test_cases();
 	run_non_anon_test_cases();
 
+	restore_thp_anon_orders();
+
 	err = ksft_get_fail_cnt();
 	if (err)
 		ksft_exit_fail_msg("%d out of %d tests failed\n",
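To make the order arithmetic in save_thp_anon_orders() above concrete, here is a rough worked example assuming 4 KiB base pages and a 2 MiB PMD-mapped THP. The sz2ord() helper below is a stand-in for the selftest's own helper (which is not shown in this hunk), and the anon_orders bitfield is the interface proposed by this series rather than an established ABI; the numbers are illustrative only.

#include <stdio.h>

/* Stand-in for the selftest helper: smallest order such that
 * (pagesize << order) >= sz. */
static int sz2ord(unsigned long sz, unsigned long pagesize)
{
	int order = 0;

	while ((pagesize << order) < sz)
		order++;
	return order;
}

int main(void)
{
	unsigned long pagesize = 4096;			/* assumed 4 KiB base page */
	unsigned long pmdsize = 2UL * 1024 * 1024;	/* assumed 2 MiB PMD size */
	int pteorder = sz2ord(pmdsize, pagesize) - 1;	/* 9 - 1 = 8 */
	unsigned int orders = (1U << sz2ord(pmdsize, pagesize)) | (1U << pteorder);

	/* With these assumptions: orders == 0x00000300,
	 * ptesize == 1048576 bytes (1 MiB). */
	printf("orders=0x%08x ptesize=%lu\n", orders, pagesize << pteorder);
	return 0;
}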