From patchwork Tue Jun 13 16:09:49 2023
X-Patchwork-Submitter: Ryan Roberts
X-Patchwork-Id: 13279026
From: Ryan Roberts
To: Jonathan Corbet, Andrew Morton, "Matthew Wilcox (Oracle)", Yu Zhao
Cc: Ryan Roberts, linux-kernel@vger.kernel.org, linux-mm@kvack.org,
 linux-fsdevel@vger.kernel.org, linux-doc@vger.kernel.org,
 linux-arm-kernel@lists.infradead.org
Subject: [PATCH v1 1/2] mm: /proc/pid/smaps: Report large folio mappings
Date: Tue, 13 Jun 2023 17:09:49 +0100
Message-Id: <20230613160950.3554675-2-ryan.roberts@arm.com>
In-Reply-To: <20230613160950.3554675-1-ryan.roberts@arm.com>
References: <20230613160950.3554675-1-ryan.roberts@arm.com>

With the addition of large folios for page cache pages, it is useful to
see which orders of folios are being mapped into a process. Additionally,
with planned future improvements to allocate large folios for anonymous
memory this will become even more useful. Visibility will help to tune
performance.

New fields "AnonContXXX" and "FileContXXX" indicate physically contiguous
runs of memory, binned into power-of-2 sizes starting with the page size
and ending with the pmd size. Therefore the exact set of keys will vary by
platform. It only includes pte-mapped memory and reports on anonymous and
file-backed memory separately.

Rollup Example:

aaaac9960000-ffffddfdd000 ---p 00000000 00:00 0                          [rollup]
Rss:               10852 kB
...
AnonCont4K:         3480 kB
AnonCont8K:            0 kB
AnonCont16K:           0 kB
AnonCont32K:           0 kB
AnonCont64K:           0 kB
AnonCont128K:          0 kB
AnonCont256K:          0 kB
AnonCont512K:          0 kB
AnonCont1M:            0 kB
AnonCont2M:            0 kB
FileCont4K:         3060 kB
FileCont8K:           40 kB
FileCont16K:        3792 kB
FileCont32K:         160 kB
FileCont64K:         320 kB
FileCont128K:          0 kB
FileCont256K:          0 kB
FileCont512K:          0 kB
FileCont1M:            0 kB
FileCont2M:            0 kB

Signed-off-by: Ryan Roberts
---
 Documentation/filesystems/proc.rst |  26 +++++++
 fs/proc/task_mmu.c                 | 115 +++++++++++++++++++++++++++++
 2 files changed, 141 insertions(+)

diff --git a/Documentation/filesystems/proc.rst b/Documentation/filesystems/proc.rst
index 7897a7dafcbc..5fa3f638848d 100644
--- a/Documentation/filesystems/proc.rst
+++ b/Documentation/filesystems/proc.rst
@@ -471,6 +471,26 @@ Memory Area, or VMA) there is a series of lines such as the following::
     KernelPageSize:        4 kB
     MMUPageSize:           4 kB
     Locked:                0 kB
+    AnonCont4K:            0 kB
+    AnonCont8K:            0 kB
+    AnonCont16K:           0 kB
+    AnonCont32K:           0 kB
+    AnonCont64K:           0 kB
+    AnonCont128K:          0 kB
+    AnonCont256K:          0 kB
+    AnonCont512K:          0 kB
+    AnonCont1M:            0 kB
+    AnonCont2M:            0 kB
+    FileCont4K:          348 kB
+    FileCont8K:            0 kB
+    FileCont16K:          32 kB
+    FileCont32K:           0 kB
+    FileCont64K:         512 kB
+    FileCont128K:          0 kB
+    FileCont256K:          0 kB
+    FileCont512K:          0 kB
+    FileCont1M:            0 kB
+    FileCont2M:            0 kB
     THPeligible:           0
     VmFlags: rd ex mr mw me dw
 
@@ -524,6 +544,12 @@ replaced by copy-on-write) part of the underlying shmem object out on swap.
 does not take into account swapped out page of underlying shmem objects.
 "Locked" indicates whether the mapping is locked in memory or not.
 
+"AnonContXXX" and "FileContXXX" indicate physically contiguous runs of memory,
+binned into power-of-2 sizes starting with the page size and ending with the
+pmd size. Therefore the exact set of keys will vary by platform. It only
+includes pte-mapped memory and reports on anonymous and file-backed memory
+separately.
+
 "THPeligible" indicates whether the mapping is eligible for allocating THP
 pages as well as the THP is PMD mappable or not - 1 if true, 0 otherwise.
 It just shows the current status.
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 507cd4e59d07..29fee5b7b00b 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -397,6 +397,49 @@ const struct file_operations proc_pid_maps_operations = {
 #define PSS_SHIFT 12
 
 #ifdef CONFIG_PROC_PAGE_MONITOR
+
+#define CONT_ORDER_MAX		(PMD_SHIFT-PAGE_SHIFT)
+#define CONT_LABEL_FIELD_SIZE	8
+#define CONT_LABEL_BUF_SIZE	32
+
+static char *cont_label(int order, char buf[CONT_LABEL_BUF_SIZE])
+{
+	unsigned long size = ((1UL << order) * PAGE_SIZE) >> 10;
+	char suffix = 'K';
+	int count;
+
+	if (size >= SZ_1K) {
+		size >>= 10;
+		suffix = 'M';
+	}
+
+	if (size >= SZ_1K) {
+		size >>= 10;
+		suffix = 'G';
+	}
+
+	count = snprintf(buf, CONT_LABEL_BUF_SIZE, "%lu%c:", size, suffix);
+
+	/*
+	 * If the string is less than the field size, pad it with spaces so that
+	 * the values line up in smaps.
+	 */
+	if (count < CONT_LABEL_FIELD_SIZE) {
+		memset(&buf[count], ' ', CONT_LABEL_FIELD_SIZE - count);
+		buf[CONT_LABEL_FIELD_SIZE] = '\0';
+	}
+
+	return buf;
+}
+
+struct cont_accumulator {
+	bool anon;
+	unsigned long folio_start_pfn;
+	unsigned long folio_end_pfn;
+	unsigned long next_pfn;
+	unsigned long nrpages;
+};
+
 struct mem_size_stats {
 	unsigned long resident;
 	unsigned long shared_clean;
@@ -419,8 +462,60 @@ struct mem_size_stats {
 	u64 pss_dirty;
 	u64 pss_locked;
 	u64 swap_pss;
+	unsigned long anon_cont[CONT_ORDER_MAX + 1];
+	unsigned long file_cont[CONT_ORDER_MAX + 1];
+	struct cont_accumulator cacc;
 };
 
+static void cacc_init(struct mem_size_stats *mss)
+{
+	struct cont_accumulator *cacc = &mss->cacc;
+
+	cacc->next_pfn = -1;
+	cacc->nrpages = 0;
+}
+
+static void cacc_drain(struct mem_size_stats *mss)
+{
+	struct cont_accumulator *cacc = &mss->cacc;
+	unsigned long *cont = cacc->anon ? mss->anon_cont : mss->file_cont;
+	unsigned long order;
+	unsigned long nrpages;
+
+	while (cacc->nrpages > 0) {
+		order = ilog2(cacc->nrpages);
+		nrpages = 1UL << order;
+		cacc->nrpages -= nrpages;
+		cont[order] += nrpages * PAGE_SIZE;
+	}
+}
+
+static void cacc_accumulate(struct mem_size_stats *mss, struct page *page)
+{
+	struct cont_accumulator *cacc = &mss->cacc;
+	unsigned long pfn = page_to_pfn(page);
+	bool anon = PageAnon(page);
+	struct folio *folio;
+	unsigned long start_pfn;
+
+	if (cacc->next_pfn == pfn && cacc->anon == anon &&
+	    pfn >= cacc->folio_start_pfn && pfn < cacc->folio_end_pfn) {
+		cacc->next_pfn++;
+		cacc->nrpages++;
+	} else {
+		cacc_drain(mss);
+
+		folio = page_folio(page);
+		start_pfn = page_to_pfn(&folio->page);
+
+		cacc->anon = anon;
+		cacc->folio_start_pfn = start_pfn;
+		cacc->folio_end_pfn = start_pfn + folio_nr_pages(folio);
+		cacc->next_pfn = pfn + 1;
+		cacc->nrpages = 1;
+	}
+}
+
 static void smaps_page_accumulate(struct mem_size_stats *mss,
 		struct page *page, unsigned long size, unsigned long pss,
 		bool dirty, bool locked, bool private)
@@ -473,6 +568,10 @@ static void smaps_account(struct mem_size_stats *mss, struct page *page,
 	if (young || page_is_young(page) || PageReferenced(page))
 		mss->referenced += size;
 
+	/* Accumulate physically contiguous map size information. */
+	if (!compound)
+		cacc_accumulate(mss, page);
+
 	/*
 	 * Then accumulate quantities that may depend on sharing, or that may
 	 * differ page-by-page.
@@ -622,6 +721,7 @@ static int smaps_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
 			   struct mm_walk *walk)
 {
 	struct vm_area_struct *vma = walk->vma;
+	struct mem_size_stats *mss = walk->private;
 	pte_t *pte;
 	spinlock_t *ptl;
 
@@ -632,6 +732,7 @@ static int smaps_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
 		goto out;
 	}
 
+	cacc_init(mss);
 	pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
 	if (!pte) {
 		walk->action = ACTION_AGAIN;
@@ -640,6 +741,7 @@ static int smaps_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
 	for (; addr != end; pte++, addr += PAGE_SIZE)
 		smaps_pte_entry(pte, addr, walk);
 	pte_unmap_unlock(pte - 1, ptl);
+	cacc_drain(mss);
 out:
 	cond_resched();
 	return 0;
@@ -816,6 +918,9 @@ static void smap_gather_stats(struct vm_area_struct *vma,
 static void __show_smap(struct seq_file *m, const struct mem_size_stats *mss,
 	bool rollup_mode)
 {
+	int i;
+	char label[CONT_LABEL_BUF_SIZE];
+
 	SEQ_PUT_DEC("Rss:            ", mss->resident);
 	SEQ_PUT_DEC(" kB\nPss:            ", mss->pss >> PSS_SHIFT);
 	SEQ_PUT_DEC(" kB\nPss_Dirty:      ", mss->pss_dirty >> PSS_SHIFT);
@@ -849,6 +954,16 @@ static void __show_smap(struct seq_file *m, const struct mem_size_stats *mss,
 					mss->swap_pss >> PSS_SHIFT);
 	SEQ_PUT_DEC(" kB\nLocked:         ",
 					mss->pss_locked >> PSS_SHIFT);
+	for (i = 0; i <= CONT_ORDER_MAX; i++) {
+		seq_printf(m, " kB\nAnonCont%s%8lu",
+			cont_label(i, label),
+			mss->anon_cont[i] >> 10);
+	}
+	for (i = 0; i <= CONT_ORDER_MAX; i++) {
+		seq_printf(m, " kB\nFileCont%s%8lu",
+			cont_label(i, label),
+			mss->file_cont[i] >> 10);
+	}
 	seq_puts(m, " kB\n");
 }
 

From patchwork Tue Jun 13 16:09:50 2023
X-Patchwork-Submitter: Ryan Roberts
X-Patchwork-Id: 13279027
From: Ryan Roberts
To: Jonathan Corbet, Andrew Morton, "Matthew Wilcox (Oracle)", Yu Zhao
Cc: Ryan Roberts, linux-kernel@vger.kernel.org, linux-mm@kvack.org,
 linux-fsdevel@vger.kernel.org, linux-doc@vger.kernel.org,
 linux-arm-kernel@lists.infradead.org
Subject: [PATCH v1 2/2] mm: /proc/pid/smaps: Report contpte mappings
Date: Tue, 13 Jun 2023 17:09:50 +0100
Message-Id: <20230613160950.3554675-3-ryan.roberts@arm.com>
In-Reply-To: <20230613160950.3554675-1-ryan.roberts@arm.com>
References: <20230613160950.3554675-1-ryan.roberts@arm.com>

arm64 intends to start using its "contpte" bit in pgtables more
frequently, and therefore it would be useful to know how well utilised it
is in order to help diagnose and fix performance issues.

Add "ContPTEMapped" field, which shows how much of the rss is mapped using
contptes. For architectures that do not support contpte mappings (as
determined by pte_cont() not being defined) the field will be suppressed.

Rollup Example:

aaaac5150000-ffffccf07000 ---p 00000000 00:00 0                          [rollup]
Rss:               11504 kB
...
ContPTEMapped:      6848 kB

Signed-off-by: Ryan Roberts
---
 Documentation/filesystems/proc.rst |  5 +++++
 fs/proc/task_mmu.c                 | 19 +++++++++++++++----
 2 files changed, 20 insertions(+), 4 deletions(-)

diff --git a/Documentation/filesystems/proc.rst b/Documentation/filesystems/proc.rst
index 5fa3f638848d..726951374c57 100644
--- a/Documentation/filesystems/proc.rst
+++ b/Documentation/filesystems/proc.rst
@@ -491,6 +491,7 @@ Memory Area, or VMA) there is a series of lines such as the following::
     FileCont512K:          0 kB
     FileCont1M:            0 kB
     FileCont2M:            0 kB
+    ContPTEMapped:         0 kB
     THPeligible:           0
     VmFlags: rd ex mr mw me dw
 
@@ -550,6 +551,10 @@ pmd size. Therefore the exact set of keys will vary by platform. It only
 includes pte-mapped memory and reports on anonymous and file-backed memory
 separately.
 
+"ContPTEMapped" is only present for architectures that support indicating a set
+of contiguously mapped ptes in their page tables. In this case, it indicates
+how much of the memory is currently mapped using contpte mappings.
+
 "THPeligible" indicates whether the mapping is eligible for allocating THP
 pages as well as the THP is PMD mappable or not - 1 if true, 0 otherwise.
 It just shows the current status.
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 29fee5b7b00b..0ebd6eb7efd4 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -465,6 +465,7 @@ struct mem_size_stats {
 	unsigned long anon_cont[CONT_ORDER_MAX + 1];
 	unsigned long file_cont[CONT_ORDER_MAX + 1];
 	struct cont_accumulator cacc;
+	unsigned long contpte_mapped;
 };
 
 static void cacc_init(struct mem_size_stats *mss)
@@ -548,7 +549,7 @@ static void smaps_page_accumulate(struct mem_size_stats *mss,
 
 static void smaps_account(struct mem_size_stats *mss, struct page *page,
 		bool compound, bool young, bool dirty, bool locked,
-		bool migration)
+		bool migration, bool contpte)
 {
 	int i, nr = compound ? compound_nr(page) : 1;
 	unsigned long size = nr * PAGE_SIZE;
@@ -572,6 +573,10 @@ static void smaps_account(struct mem_size_stats *mss, struct page *page,
 	if (!compound)
 		cacc_accumulate(mss, page);
 
+	/* Accumulate all the pages that are part of a contpte. */
+	if (contpte)
+		mss->contpte_mapped += size;
+
 	/*
 	 * Then accumulate quantities that may depend on sharing, or that may
 	 * differ page-by-page.
@@ -636,13 +641,16 @@ static void smaps_pte_entry(pte_t *pte, unsigned long addr,
 	struct vm_area_struct *vma = walk->vma;
 	bool locked = !!(vma->vm_flags & VM_LOCKED);
 	struct page *page = NULL;
-	bool migration = false, young = false, dirty = false;
+	bool migration = false, young = false, dirty = false, cont = false;
 	pte_t ptent = ptep_get(pte);
 
 	if (pte_present(ptent)) {
 		page = vm_normal_page(vma, addr, ptent);
 		young = pte_young(ptent);
 		dirty = pte_dirty(ptent);
+#ifdef pte_cont
+		cont = pte_cont(ptent);
+#endif
 	} else if (is_swap_pte(ptent)) {
 		swp_entry_t swpent = pte_to_swp_entry(ptent);
 
@@ -672,7 +680,7 @@ static void smaps_pte_entry(pte_t *pte, unsigned long addr,
 	if (!page)
 		return;
 
-	smaps_account(mss, page, false, young, dirty, locked, migration);
+	smaps_account(mss, page, false, young, dirty, locked, migration, cont);
 }
 
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
@@ -708,7 +716,7 @@ static void smaps_pmd_entry(pmd_t *pmd, unsigned long addr,
 		mss->file_thp += HPAGE_PMD_SIZE;
 
 	smaps_account(mss, page, true, pmd_young(*pmd), pmd_dirty(*pmd),
-		      locked, migration);
+		      locked, migration, false);
 }
 #else
 static void smaps_pmd_entry(pmd_t *pmd, unsigned long addr,
@@ -964,6 +972,9 @@ static void __show_smap(struct seq_file *m, const struct mem_size_stats *mss,
 			cont_label(i, label),
 			mss->file_cont[i] >> 10);
 	}
+#ifdef pte_cont
+	SEQ_PUT_DEC(" kB\nContPTEMapped:  ", mss->contpte_mapped);
+#endif
 	seq_puts(m, " kB\n");
 }
 
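
For readers following the binning done by cacc_drain() in patch 1, here is a
minimal userspace sketch of the same decomposition. It is not part of either
patch. The SKETCH_* constants and the sketch_ilog2() and sketch_drain() names
are hypothetical stand-ins, assuming 4K pages and a 2M pmd; the kernel code
uses ilog2(), PAGE_SIZE and CONT_ORDER_MAX instead.

```c
#include <assert.h>

/* Hypothetical stand-ins for PAGE_SHIFT and PMD_SHIFT (4K pages, 2M pmd). */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PMD_SHIFT  21
#define SKETCH_ORDER_MAX  (SKETCH_PMD_SHIFT - SKETCH_PAGE_SHIFT)

/* Userspace replacement for the kernel's ilog2(): floor(log2(n)) for n > 0. */
static unsigned int sketch_ilog2(unsigned long n)
{
	unsigned int order = 0;

	while (n >>= 1)
		order++;
	return order;
}

/*
 * Split a run of nrpages contiguous pages into power-of-2 chunks, largest
 * first, and add each chunk's size in bytes to the bin for its order.
 */
static void sketch_drain(unsigned long nrpages,
			 unsigned long cont[SKETCH_ORDER_MAX + 1])
{
	/* Assumption: a run never exceeds one pmd's worth of pages. */
	assert(nrpages <= (1UL << SKETCH_ORDER_MAX));

	while (nrpages > 0) {
		unsigned int order = sketch_ilog2(nrpages);
		unsigned long chunk = 1UL << order;

		nrpages -= chunk;
		cont[order] += chunk << SKETCH_PAGE_SHIFT;
	}
}
```

Under these assumptions, a run of 13 contiguous pages is split into chunks of
8, 4 and 1 pages, so 32K is added to the order-3 bin, 16K to the order-2 bin
and 4K to the order-0 bin.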