[WIP,v1,02/20] mm: add a total mapcount for large folios

Message ID	20231124132626.235350-3-david@redhat.com (mailing list archive)
State	New
Headers
Return-Path: <owner-linux-mm@kvack.org>
From: David Hildenbrand <david@redhat.com>
To: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org, David Hildenbrand <david@redhat.com>, Andrew Morton <akpm@linux-foundation.org>, Linus Torvalds <torvalds@linux-foundation.org>, Ryan Roberts <ryan.roberts@arm.com>, Matthew Wilcox <willy@infradead.org>, Hugh Dickins <hughd@google.com>, Yin Fengwei <fengwei.yin@intel.com>, Yang Shi <shy828301@gmail.com>, Ying Huang <ying.huang@intel.com>, Zi Yan <ziy@nvidia.com>, Peter Zijlstra <peterz@infradead.org>, Ingo Molnar <mingo@redhat.com>, Will Deacon <will@kernel.org>, Waiman Long <longman@redhat.com>, "Paul E. McKenney" <paulmck@kernel.org>
Subject: [PATCH WIP v1 02/20] mm: add a total mapcount for large folios
Date: Fri, 24 Nov 2023 14:26:07 +0100
Message-ID: <20231124132626.235350-3-david@redhat.com>
In-Reply-To: <20231124132626.235350-1-david@redhat.com>
References: <20231124132626.235350-1-david@redhat.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Sender: owner-linux-mm@kvack.org
Precedence: bulk
Series	mm: precise "mapped shared" vs. "mapped exclusively" detection for PTE-mapped THP / partially-mappable folios
[WIP,v1,00/20] mm: precise "mapped shared" vs. "mapped exclusively" detection for PTE-mapped THP / …
[WIP,v1,01/20] mm/rmap: factor out adding folio range into __folio_add_rmap_range()
[WIP,v1,02/20] mm: add a total mapcount for large folios
[WIP,v1,03/20] mm: convert folio_estimated_sharers() to folio_mapped_shared() and improve it
[WIP,v1,04/20] mm/rmap: pass dst_vma to page_try_dup_anon_rmap() and page_dup_file_rmap()
[WIP,v1,05/20] mm/rmap: abstract total mapcount operations for partially-mappable folios
[WIP,v1,06/20] atomic_seqcount: new (raw) seqcount variant to support concurrent writers
[WIP,v1,07/20] mm/rmap_id: track if one ore multiple MMs map a partially-mappable folio
[WIP,v1,08/20] mm: pass MM to folio_mapped_shared()
[WIP,v1,09/20] mm: improve folio_mapped_shared() for partially-mappable folios using rmap IDs
[WIP,v1,10/20] mm/memory: COW reuse support for PTE-mapped THP with rmap IDs
[WIP,v1,11/20] mm/rmap_id: support for 1, 2 and 3 values by manual calculation
[WIP,v1,12/20] mm/rmap: introduce folio_add_anon_rmap_range()
[WIP,v1,13/20] mm/huge_memory: batch rmap operations in __split_huge_pmd_locked()
[WIP,v1,14/20] mm/huge_memory: avoid folio_refcount() < folio_mapcount() in __split_huge_pmd_locked…
[WIP,v1,15/20] mm/rmap_id: verify precalculated subids with CONFIG_DEBUG_VM
[WIP,v1,16/20] atomic_seqcount: support a single exclusive writer in the absence of other writers
[WIP,v1,17/20] mm/rmap_id: reduce atomic RMW operations when we are the exclusive writer
[WIP,v1,18/20] atomic_seqcount: use atomic add-return instead of atomic cmpxchg on 64bit
[WIP,v1,19/20] mm/rmap: factor out removing folio range into __folio_remove_rmap_range()
[WIP,v1,20/20] mm/rmap: perform all mapcount operations of large folios under the rmap seqcount

diff --git a/Documentation/mm/transhuge.rst b/Documentation/mm/transhuge.rst
index 9a607059ea11..b0d3b1d3e8ea 100644
--- a/Documentation/mm/transhuge.rst
+++ b/Documentation/mm/transhuge.rst
@@ -116,14 +116,14 @@ pages:
     succeeds on tail pages.
 
   - map/unmap of a PMD entry for the whole THP increment/decrement
-    folio->_entire_mapcount and also increment/decrement
-    folio->_nr_pages_mapped by COMPOUND_MAPPED when _entire_mapcount
-    goes from -1 to 0 or 0 to -1.
+    folio->_entire_mapcount, increment/decrement folio->_total_mapcount
+    and also increment/decrement folio->_nr_pages_mapped by COMPOUND_MAPPED
+    when _entire_mapcount goes from -1 to 0 or 0 to -1.
 
   - map/unmap of individual pages with PTE entry increment/decrement
-    page->_mapcount and also increment/decrement folio->_nr_pages_mapped
-    when page->_mapcount goes from -1 to 0 or 0 to -1 as this counts
-    the number of pages mapped by PTE.
+    page->_mapcount, increment/decrement folio->_total_mapcount and also
+    increment/decrement folio->_nr_pages_mapped when page->_mapcount goes
+    from -1 to 0 or 0 to -1 as this counts the number of pages mapped by PTE.
 
 split_huge_page internally has to distribute the refcounts in the head
 page to the tail pages before clearing all PG_head/tail bits from the page
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 418d26608ece..fe91aaefa3db 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1207,17 +1207,16 @@ static inline int page_mapcount(struct page *page)
 	return mapcount;
 }
 
-int folio_total_mapcount(struct folio *folio);
+static inline int folio_total_mapcount(struct folio *folio)
+{
+	VM_WARN_ON_FOLIO(!folio_test_large(folio), folio);
+	return atomic_read(&folio->_total_mapcount) + 1;
+}
 
 /**
- * folio_mapcount() - Calculate the number of mappings of this folio.
+ * folio_mapcount() - Number of mappings of this folio.
  * @folio: The folio.
  *
- * A large folio tracks both how many times the entire folio is mapped,
- * and how many times each individual page in the folio is mapped.
- * This function calculates the total number of times the folio is
- * mapped.
- *
  * Return: The number of times this folio is mapped.
  */
 static inline int folio_mapcount(struct folio *folio)
@@ -1229,19 +1228,7 @@ static inline int folio_mapcount(struct folio *folio)
 
 static inline int total_mapcount(struct page *page)
 {
-	if (likely(!PageCompound(page)))
-		return atomic_read(&page->_mapcount) + 1;
-	return folio_total_mapcount(page_folio(page));
-}
-
-static inline bool folio_large_is_mapped(struct folio *folio)
-{
-	/*
-	 * Reading _entire_mapcount below could be omitted if hugetlb
-	 * participated in incrementing nr_pages_mapped when compound mapped.
-	 */
-	return atomic_read(&folio->_nr_pages_mapped) > 0 ||
-		atomic_read(&folio->_entire_mapcount) >= 0;
+	return folio_mapcount(page_folio(page));
 }
 
 /**
@@ -1252,9 +1239,7 @@ static inline bool folio_large_is_mapped(struct folio *folio)
  */
 static inline bool folio_mapped(struct folio *folio)
 {
-	if (likely(!folio_test_large(folio)))
-		return atomic_read(&folio->_mapcount) >= 0;
-	return folio_large_is_mapped(folio);
+	return folio_mapcount(folio) > 0;
 }
 
 /*
@@ -1264,9 +1249,7 @@ static inline bool folio_mapped(struct folio *folio)
  */
 static inline bool page_mapped(struct page *page)
 {
-	if (likely(!PageCompound(page)))
-		return atomic_read(&page->_mapcount) >= 0;
-	return folio_large_is_mapped(page_folio(page));
+	return folio_mapped(page_folio(page));
 }
 
 static inline struct page *virt_to_head_page(const void *x)
@@ -2139,7 +2122,7 @@ static inline size_t folio_size(struct folio *folio)
  * looking at the precise mapcount of the first subpage in the folio, and
  * assuming the other subpages are the same. This may not be true for large
  * folios. If you want exact mapcounts for exact calculations, look at
- * page_mapcount() or folio_total_mapcount().
+ * page_mapcount() or folio_mapcount().
  *
  * Return: The estimated number of processes sharing a folio.
  */
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 957ce38768b2..99b84b4797b9 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -264,7 +264,8 @@ typedef struct {
  * @virtual: Virtual address in the kernel direct map.
  * @_last_cpupid: IDs of last CPU and last process that accessed the folio.
  * @_entire_mapcount: Do not use directly, call folio_entire_mapcount().
- * @_nr_pages_mapped: Do not use directly, call folio_mapcount().
+ * @_total_mapcount: Do not use directly, call folio_mapcount().
+ * @_nr_pages_mapped: Do not use outside of rmap code.
  * @_pincount: Do not use directly, call folio_maybe_dma_pinned().
  * @_folio_nr_pages: Do not use directly, call folio_nr_pages().
  * @_hugetlb_subpool: Do not use directly, use accessor in hugetlb.h.
@@ -323,8 +324,8 @@ struct folio {
 		struct {
 			unsigned long _flags_1;
 			unsigned long _head_1;
-			unsigned long _folio_avail;
 	/* public: */
+			atomic_t _total_mapcount;
 			atomic_t _entire_mapcount;
 			atomic_t _nr_pages_mapped;
 			atomic_t _pincount;
diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index b26fe858fd44..42e2c74d4d6e 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -210,14 +210,19 @@ void hugepage_add_new_anon_rmap(struct folio *, struct vm_area_struct *,
 
 static inline void __page_dup_rmap(struct page *page, bool compound)
 {
-	if (compound) {
-		struct folio *folio = (struct folio *)page;
+	struct folio *folio = page_folio(page);
 
-		VM_BUG_ON_PAGE(compound && !PageHead(page), page);
-		atomic_inc(&folio->_entire_mapcount);
-	} else {
+	VM_BUG_ON_PAGE(compound && !PageHead(page), page);
+	if (likely(!folio_test_large(folio))) {
 		atomic_inc(&page->_mapcount);
+		return;
 	}
+
+	if (compound)
+		atomic_inc(&folio->_entire_mapcount);
+	else
+		atomic_inc(&page->_mapcount);
+	atomic_inc(&folio->_total_mapcount);
 }
 
 static inline void page_dup_file_rmap(struct page *page, bool compound)
diff --git a/mm/debug.c b/mm/debug.c
index ee533a5ceb79..97f6f6b32ae7 100644
--- a/mm/debug.c
+++ b/mm/debug.c
@@ -99,10 +99,10 @@ static void __dump_page(struct page *page)
 			page, page_ref_count(head), mapcount, mapping,
 			page_to_pgoff(page), page_to_pfn(page));
 	if (compound) {
-		pr_warn("head:%p order:%u entire_mapcount:%d nr_pages_mapped:%d pincount:%d\n",
+		pr_warn("head:%p order:%u entire_mapcount:%d total_mapcount:%d pincount:%d\n",
 				head, compound_order(head),
 				folio_entire_mapcount(folio),
-				folio_nr_pages_mapped(folio),
+				folio_mapcount(folio),
 				atomic_read(&folio->_pincount));
 	}
 
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 1169ef2f2176..cf84784064c7 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1509,7 +1509,7 @@ static void __destroy_compound_gigantic_folio(struct folio *folio,
 	struct page *p;
 
 	atomic_set(&folio->_entire_mapcount, 0);
-	atomic_set(&folio->_nr_pages_mapped, 0);
+	atomic_set(&folio->_total_mapcount, 0);
 	atomic_set(&folio->_pincount, 0);
 
 	for (i = 1; i < nr_pages; i++) {
@@ -2119,7 +2119,7 @@ static bool __prep_compound_gigantic_folio(struct folio *folio,
 	/* we rely on prep_new_hugetlb_folio to set the destructor */
 	folio_set_order(folio, order);
 	atomic_set(&folio->_entire_mapcount, -1);
-	atomic_set(&folio->_nr_pages_mapped, 0);
+	atomic_set(&folio->_total_mapcount, -1);
 	atomic_set(&folio->_pincount, 0);
 	return true;
 
diff --git a/mm/internal.h b/mm/internal.h
index b61034bd50f5..bb2e55c402e7 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -67,15 +67,6 @@ void page_writeback_init(void);
  */
 #define SHOW_MEM_FILTER_NODES		(0x0001u)	/* disallowed nodes */
 
-/*
- * How many individual pages have an elevated _mapcount. Excludes
- * the folio's entire_mapcount.
- */
-static inline int folio_nr_pages_mapped(struct folio *folio)
-{
-	return atomic_read(&folio->_nr_pages_mapped) & FOLIO_PAGES_MAPPED;
-}
-
 static inline void *folio_raw_mapping(struct folio *folio)
 {
 	unsigned long mapping = (unsigned long)folio->mapping;
@@ -429,6 +420,7 @@ static inline void prep_compound_head(struct page *page, unsigned int order)
 	struct folio *folio = (struct folio *)page;
 
 	folio_set_order(folio, order);
+	atomic_set(&folio->_total_mapcount, -1);
 	atomic_set(&folio->_entire_mapcount, -1);
 	atomic_set(&folio->_nr_pages_mapped, 0);
 	atomic_set(&folio->_pincount, 0);
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 733732e7e0ba..aad45758c0c7 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -988,6 +988,10 @@ static int free_tail_page_prepare(struct page *head_page, struct page *page)
 			bad_page(page, "nonzero entire_mapcount");
 			goto out;
 		}
+		if (unlikely(atomic_read(&folio->_total_mapcount) + 1)) {
+			bad_page(page, "nonzero total_mapcount");
+			goto out;
+		}
 		if (unlikely(atomic_read(&folio->_nr_pages_mapped))) {
 			bad_page(page, "nonzero nr_pages_mapped");
 			goto out;
diff --git a/mm/rmap.c b/mm/rmap.c
index afddf3d82a8f..38765796dca8 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1104,35 +1104,12 @@ int pfn_mkclean_range(unsigned long pfn, unsigned long nr_pages, pgoff_t pgoff,
 	return page_vma_mkclean_one(&pvmw);
 }
 
-int folio_total_mapcount(struct folio *folio)
-{
-	int mapcount = folio_entire_mapcount(folio);
-	int nr_pages;
-	int i;
-
-	/* In the common case, avoid the loop when no pages mapped by PTE */
-	if (folio_nr_pages_mapped(folio) == 0)
-		return mapcount;
-	/*
-	 * Add all the PTE mappings of those pages mapped by PTE.
-	 * Limit the loop to folio_nr_pages_mapped()?
-	 * Perhaps: given all the raciness, that may be a good or a bad idea.
-	 */
-	nr_pages = folio_nr_pages(folio);
-	for (i = 0; i < nr_pages; i++)
-		mapcount += atomic_read(&folio_page(folio, i)->_mapcount);
-
-	/* But each of those _mapcounts was based on -1 */
-	mapcount += nr_pages;
-	return mapcount;
-}
-
 static unsigned int __folio_add_rmap_range(struct folio *folio,
 		struct page *page, unsigned int nr_pages, bool compound,
 		int *nr_pmdmapped)
 {
 	atomic_t *mapped = &folio->_nr_pages_mapped;
-	int first, nr = 0;
+	int first, count, nr = 0;
 
 	VM_WARN_ON_FOLIO(compound && page != &folio->page, folio);
 	VM_WARN_ON_FOLIO(compound && !folio_test_pmd_mappable(folio), folio);
@@ -1144,6 +1121,7 @@ static unsigned int __folio_add_rmap_range(struct folio *folio,
 
 	/* Is page being mapped by PTE? Is this its first map to be added? */
 	if (!compound) {
+		count = nr_pages;
 		do {
 			first = atomic_inc_and_test(&page->_mapcount);
 			if (first) {
@@ -1151,7 +1129,8 @@ static unsigned int __folio_add_rmap_range(struct folio *folio,
 				if (first < COMPOUND_MAPPED)
 					nr++;
 			}
-		} while (page++, --nr_pages > 0);
+		} while (page++, --count > 0);
+		atomic_add(nr_pages, &folio->_total_mapcount);
 	} else if (folio_test_pmd_mappable(folio)) {
 		/* That test is redundant: it's for safety or to optimize out */
 
@@ -1169,6 +1148,7 @@ static unsigned int __folio_add_rmap_range(struct folio *folio,
 				nr = 0;
 			}
 		}
+		atomic_inc(&folio->_total_mapcount);
 	} else {
 		VM_WARN_ON_ONCE_FOLIO(true, folio);
 	}
@@ -1348,6 +1328,10 @@ void folio_add_new_anon_rmap(struct folio *folio, struct vm_area_struct *vma,
 		__lruvec_stat_mod_folio(folio, NR_ANON_THPS, nr);
 	}
 
+	if (folio_test_large(folio))
+		/* increment count (starts at -1) */
+		atomic_set(&folio->_total_mapcount, 0);
+
 	__lruvec_stat_mod_folio(folio, NR_ANON_MAPPED, nr);
 	__folio_set_anon(folio, vma, address, true);
 	SetPageAnonExclusive(&folio->page);
@@ -1427,6 +1411,9 @@ void page_remove_rmap(struct page *page, struct vm_area_struct *vma,
 
 	VM_BUG_ON_PAGE(compound && !PageHead(page), page);
 
+	if (folio_test_large(folio))
+		atomic_dec(&folio->_total_mapcount);
+
 	/* Hugetlb pages are not counted in NR_*MAPPED */
 	if (unlikely(folio_test_hugetlb(folio))) {
 		/* hugetlb pages are always mapped with pmds */
@@ -2576,6 +2563,7 @@ void hugepage_add_anon_rmap(struct folio *folio, struct vm_area_struct *vma,
 	VM_WARN_ON_FOLIO(!folio_test_anon(folio), folio);
 
 	atomic_inc(&folio->_entire_mapcount);
+	atomic_inc(&folio->_total_mapcount);
 	if (flags & RMAP_EXCLUSIVE)
 		SetPageAnonExclusive(&folio->page);
 	VM_WARN_ON_FOLIO(folio_entire_mapcount(folio) > 1 &&
@@ -2588,6 +2576,7 @@ void hugepage_add_new_anon_rmap(struct folio *folio,
 	BUG_ON(address < vma->vm_start || address >= vma->vm_end);
 	/* increment count (starts at -1) */
 	atomic_set(&folio->_entire_mapcount, 0);
+	atomic_set(&folio->_total_mapcount, 0);
 	folio_clear_hugetlb_restore_reserve(folio);
 	__folio_set_anon(folio, vma, address, true);
 	SetPageAnonExclusive(&folio->page);
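
The counting scheme in this patch can be modeled in userspace for illustration. The sketch below is not kernel code: struct model_folio, MODEL_NR_PAGES and all model_*() helpers are hypothetical names, and C11 atomics stand in for the kernel's atomic_t. It only mirrors the idea that every counter is stored minus one, that each PTE or PMD map/unmap also adjusts the total, and that reading the total then gives the same answer as the loop over per-page mapcounts that the patch removes.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define MODEL_NR_PAGES 4	/* hypothetical folio size, for illustration only */

/* Userspace stand-in for the mapcount fields of a large folio. */
struct model_folio {
	atomic_int total_mapcount;	/* all mappings, stored minus one */
	atomic_int entire_mapcount;	/* PMD mappings, stored minus one */
	atomic_int page_mapcount[MODEL_NR_PAGES]; /* PTE mappings, minus one */
};

static void model_folio_init(struct model_folio *f)
{
	atomic_init(&f->total_mapcount, -1);
	atomic_init(&f->entire_mapcount, -1);
	for (int i = 0; i < MODEL_NR_PAGES; i++)
		atomic_init(&f->page_mapcount[i], -1);
}

/* Map one page by PTE: bump the per-page counter and the total. */
static void model_map_pte(struct model_folio *f, int idx)
{
	assert(idx >= 0 && idx < MODEL_NR_PAGES);
	atomic_fetch_add(&f->page_mapcount[idx], 1);
	atomic_fetch_add(&f->total_mapcount, 1);
}

/* Unmap one page by PTE: the page must currently be mapped. */
static void model_unmap_pte(struct model_folio *f, int idx)
{
	assert(idx >= 0 && idx < MODEL_NR_PAGES);
	assert(atomic_load(&f->page_mapcount[idx]) >= 0);
	atomic_fetch_sub(&f->page_mapcount[idx], 1);
	atomic_fetch_sub(&f->total_mapcount, 1);
}

/* Map the whole folio by PMD: bump the entire counter and the total. */
static void model_map_pmd(struct model_folio *f)
{
	atomic_fetch_add(&f->entire_mapcount, 1);
	atomic_fetch_add(&f->total_mapcount, 1);
}

/* Unmap the whole folio by PMD: a PMD mapping must currently exist. */
static void model_unmap_pmd(struct model_folio *f)
{
	assert(atomic_load(&f->entire_mapcount) >= 0);
	atomic_fetch_sub(&f->entire_mapcount, 1);
	atomic_fetch_sub(&f->total_mapcount, 1);
}

/* Single atomic read, in the spirit of the new folio_total_mapcount(). */
static int model_total_mapcount(struct model_folio *f)
{
	return atomic_load(&f->total_mapcount) + 1;
}

/* Loop over all pages, in the spirit of the removed implementation. */
static int model_total_mapcount_by_sum(struct model_folio *f)
{
	int mapcount = atomic_load(&f->entire_mapcount) + 1;

	for (int i = 0; i < MODEL_NR_PAGES; i++)
		mapcount += atomic_load(&f->page_mapcount[i]) + 1;
	return mapcount;
}

static bool model_mapped(struct model_folio *f)
{
	return model_total_mapcount(f) > 0;
}
```

In this model, after model_folio_init() both readers return 0. Mapping page 0 twice by PTE, page 3 once by PTE and the whole folio once by PMD makes both return 4, and undoing those four mappings brings both back to 0. The model ignores _nr_pages_mapped, hugetlb and any concurrency between the two counter updates.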
