[v1] mm: convert folio_estimated_sharers() to folio_likely_mapped_shared()

Message ID: 20240227201548.857831-1-david@redhat.com
State: New
Series: [v1] mm: convert folio_estimated_sharers() to folio_likely_mapped_shared()

Commit Message

David Hildenbrand Feb. 27, 2024, 8:15 p.m. UTC
Callers of folio_estimated_sharers() only care about "mapped shared vs.
mapped exclusively", not the exact estimate of sharers. Let's consolidate
and unify the condition users are checking. While at it, clarify the
semantics and extend the discussion of the fuzziness.

Use the "likely mapped shared" terminology to better express what the
(adjusted) function actually checks.

Whether a partially-mappable folio is more likely to be fully mapped than
partially mapped is debatable. In the future, we might be able to improve
our estimate for partially-mappable folios, though.

Note that we will now consistently detect "mapped shared" only if the
first subpage is actually mapped multiple times. When the first subpage
is not mapped, we will consistently detect it as "mapped exclusively".
This change should currently only affect the usage in
madvise_free_pte_range() and queue_folios_pte_range() for large folios: if
the first page was already unmapped, we would have skipped the folio.

Cc: Barry Song <v-songbaohua@oppo.com>
Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 include/linux/mm.h | 46 ++++++++++++++++++++++++++++++++++++----------
 mm/huge_memory.c   |  2 +-
 mm/madvise.c       |  6 +++---
 mm/memory.c        |  2 +-
 mm/mempolicy.c     | 14 ++++++--------
 mm/migrate.c       |  8 ++++----
 6 files changed, 51 insertions(+), 27 deletions(-)
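
The consolidation boils down to a single predicate. As a rough sketch (the
helper body below is taken verbatim from the patch; the three open-coded
variants are paraphrased from the call sites it replaces):

/*
 * Call sites previously open-coded three variants of the same test:
 *
 *   folio_estimated_sharers(folio) != 1   // treats mapcount 0 as "shared"
 *   folio_estimated_sharers(folio) > 1    // treats mapcount 0 as "exclusive"
 *   folio_estimated_sharers(folio) == 1   // treats mapcount 0 as "shared"
 *
 * The unified helper always treats a mapcount of 0 as "mapped exclusively",
 * which is the behavioral change called out in the commit message:
 */
static inline bool folio_likely_mapped_shared(struct folio *folio)
{
	return page_mapcount(folio_page(folio, 0)) > 1;
}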

Comments

Khalid Aziz Feb. 27, 2024, 11:18 p.m. UTC | #1
On Tue, 2024-02-27 at 21:15 +0100, David Hildenbrand wrote:
> [snip]


This patch adds clarity while retaining current behavior, so looks good
to me.

Reviewed-by: Khalid Aziz <khalid.aziz@oracle.com>


Barry Song Feb. 27, 2024, 11:29 p.m. UTC | #2
On Wed, Feb 28, 2024 at 9:16 AM David Hildenbrand <david@redhat.com> wrote:
> [snip]

LGTM,
Acked-by: Barry Song <v-songbaohua@oppo.com>


Thanks
Barry
Vishal Moola Feb. 27, 2024, 11:46 p.m. UTC | #3
On Tue, Feb 27, 2024 at 12:15 PM David Hildenbrand <david@redhat.com> wrote:
> [snip]

Reviewed-by: Vishal Moola (Oracle) <vishal.moola@gmail.com>

Ryan Roberts Feb. 29, 2024, 11:45 a.m. UTC | #4
On 27/02/2024 20:15, David Hildenbrand wrote:
> [snip]

Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>

Zi Yan Feb. 29, 2024, 3:28 p.m. UTC | #5
On 27 Feb 2024, at 15:15, David Hildenbrand wrote:

> [snip]
LGTM. Thanks for the documentation.

Reviewed-by: Zi Yan <ziy@nvidia.com>

--
Best Regards,
Yan, Zi
David Hildenbrand March 25, 2024, 8:57 a.m. UTC | #6
On 27.02.24 21:15, David Hildenbrand wrote:
> [snip]

The following fixup on top makes kerneldoc happy:


From 3e472636d266e3acba3755ed5712992adbc2151d Mon Sep 17 00:00:00 2001
From: David Hildenbrand <david@redhat.com>
Date: Mon, 25 Mar 2024 09:23:03 +0100
Subject: [PATCH] folio_likely_mapped_shared() kerneldoc fixup

Fixup "mm: convert folio_estimated_sharers() to
folio_likely_mapped_shared()".

Signed-off-by: David Hildenbrand <david@redhat.com>
---
  include/linux/mm.h | 32 +++++++++++++++++---------------
  1 file changed, 17 insertions(+), 15 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index afe27ff3fa94..fb3724723448 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2183,23 +2183,25 @@ static inline size_t folio_size(struct folio *folio)
   * at most once into an MM, and they cannot be partially mapped.
   *
   * For other folios, the result can be fuzzy:
- * (a) For partially-mappable large folios (THP), the return value can wrongly
- *     indicate "mapped exclusively" (false negative) when the folio is
- *     only partially mapped into at least one MM.
- * (b) For pagecache folios (including hugetlb), the return value can wrongly
- *     indicate "mapped shared" (false positive) when two VMAs in the same MM
- *     cover the same file range.
- * (c) For (small) KSM folios, the return value can wrongly indicate "mapped
- *     shared" (false positive), when the folio is mapped multiple times into
- *     the same MM.
+ *    #. For partially-mappable large folios (THP), the return value can wrongly
+ *       indicate "mapped exclusively" (false negative) when the folio is
+ *       only partially mapped into at least one MM.
+ *    #. For pagecache folios (including hugetlb), the return value can wrongly
+ *       indicate "mapped shared" (false positive) when two VMAs in the same MM
+ *       cover the same file range.
+ *    #. For (small) KSM folios, the return value can wrongly indicate "mapped
+ *       shared" (false positive), when the folio is mapped multiple times into
+ *       the same MM.
   *
   * Further, this function only considers current page table mappings that
- * are tracked using the folio mapcount(s). It does not consider:
- * (1) If the folio might get mapped in the (near) future (e.g., swapcache,
- *     pagecache, temporary unmapping for migration).
- * (2) If the folio is mapped differently (VM_PFNMAP).
- * (3) If hugetlb page table sharing applies. Callers might want to check
- *     hugetlb_pmd_shared().
+ * are tracked using the folio mapcount(s).
+ *
+ * This function does not consider:
+ *    #. If the folio might get mapped in the (near) future (e.g., swapcache,
+ *       pagecache, temporary unmapping for migration).
+ *    #. If the folio is mapped differently (VM_PFNMAP).
+ *    #. If hugetlb page table sharing applies. Callers might want to check
+ *       hugetlb_pmd_shared().
   *
   * Return: Whether the folio is estimated to be mapped into more than one MM.
   */
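
To make the pagecache false positive in the list above concrete: two VMAs in
one MM covering the same file range each contribute a mapping of the same
pagecache folio. A minimal userspace sketch (the file path is hypothetical;
this is an illustration, not part of the patch):

#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
	int fd = open("/etc/hostname", O_RDONLY);	/* any readable file */
	if (fd < 0)
		return 1;

	/* Two VMAs in the same MM covering the same file range. */
	char *a = mmap(NULL, 4096, PROT_READ, MAP_SHARED, fd, 0);
	char *b = mmap(NULL, 4096, PROT_READ, MAP_SHARED, fd, 0);
	if (a == MAP_FAILED || b == MAP_FAILED)
		return 1;

	/*
	 * Faulting both mappings maps the same pagecache folio twice, so the
	 * mapcount of its first page becomes 2 and
	 * folio_likely_mapped_shared() reports "mapped shared" although only
	 * a single MM is involved.
	 */
	(void)*(volatile char *)a;
	(void)*(volatile char *)b;

	close(fd);
	return 0;
}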

Patch

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 6f4825d829656..795c89632265f 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2147,21 +2147,47 @@ static inline size_t folio_size(struct folio *folio)
 }
 
 /**
- * folio_estimated_sharers - Estimate the number of sharers of a folio.
+ * folio_likely_mapped_shared - Estimate if the folio is mapped into the page
+ *				tables of more than one MM
  * @folio: The folio.
  *
- * folio_estimated_sharers() aims to serve as a function to efficiently
- * estimate the number of processes sharing a folio. This is done by
- * looking at the precise mapcount of the first subpage in the folio, and
- * assuming the other subpages are the same. This may not be true for large
- * folios. If you want exact mapcounts for exact calculations, look at
- * page_mapcount() or folio_total_mapcount().
+ * This function checks if the folio is currently mapped into more than one
+ * MM ("mapped shared"), or if the folio is only mapped into a single MM
+ * ("mapped exclusively").
  *
- * Return: The estimated number of processes sharing a folio.
+ * As precise information is not easily available for all folios, this function
+ * estimates the number of MMs ("sharers") that are currently mapping a folio
+ * using the number of times the first page of the folio is currently mapped
+ * into page tables.
+ *
+ * For small anonymous folios (except KSM folios) and anonymous hugetlb folios,
+ * the return value will be exactly correct, because they can only be mapped
+ * at most once into an MM, and they cannot be partially mapped.
+ *
+ * For other folios, the result can be fuzzy:
+ * (a) For partially-mappable large folios (THP), the return value can wrongly
+ *     indicate "mapped exclusively" (false negative) when the folio is
+ *     only partially mapped into at least one MM.
+ * (b) For pagecache folios (including hugetlb), the return value can wrongly
+ *     indicate "mapped shared" (false positive) when two VMAs in the same MM
+ *     cover the same file range.
+ * (c) For (small) KSM folios, the return value can wrongly indicate "mapped
+ *     shared" (false positive), when the folio is mapped multiple times into
+ *     the same MM.
+ *
+ * Further, this function only considers current page table mappings that
+ * are tracked using the folio mapcount(s). It does not consider:
+ * (1) If the folio might get mapped in the (near) future (e.g., swapcache,
+ *     pagecache, temporary unmapping for migration).
+ * (2) If the folio is mapped differently (VM_PFNMAP).
+ * (3) If hugetlb page table sharing applies. Callers might want to check
+ *     hugetlb_pmd_shared().
+ *
+ * Return: Whether the folio is estimated to be mapped into more than one MM.
  */
-static inline int folio_estimated_sharers(struct folio *folio)
+static inline bool folio_likely_mapped_shared(struct folio *folio)
 {
-	return page_mapcount(folio_page(folio, 0));
+	return page_mapcount(folio_page(folio, 0)) > 1;
 }
 
 #ifndef HAVE_ARCH_MAKE_PAGE_ACCESSIBLE
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 50d146eb248ff..4d10904fef70c 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1829,7 +1829,7 @@ bool madvise_free_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
 	 * If other processes are mapping this folio, we couldn't discard
 	 * the folio unless they all do MADV_FREE so let's skip the folio.
 	 */
-	if (folio_estimated_sharers(folio) != 1)
+	if (folio_likely_mapped_shared(folio))
 		goto out;
 
 	if (!folio_trylock(folio))
diff --git a/mm/madvise.c b/mm/madvise.c
index 44a498c94158c..32a534d200219 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -366,7 +366,7 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *pmd,
 		folio = pfn_folio(pmd_pfn(orig_pmd));
 
 		/* Do not interfere with other mappings of this folio */
-		if (folio_estimated_sharers(folio) != 1)
+		if (folio_likely_mapped_shared(folio))
 			goto huge_unlock;
 
 		if (pageout_anon_only_filter && !folio_test_anon(folio))
@@ -453,7 +453,7 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *pmd,
 		if (folio_test_large(folio)) {
 			int err;
 
-			if (folio_estimated_sharers(folio) > 1)
+			if (folio_likely_mapped_shared(folio))
 				break;
 			if (pageout_anon_only_filter && !folio_test_anon(folio))
 				break;
@@ -677,7 +677,7 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigned long addr,
 		if (folio_test_large(folio)) {
 			int err;
 
-			if (folio_estimated_sharers(folio) != 1)
+			if (folio_likely_mapped_shared(folio))
 				break;
 			if (!folio_trylock(folio))
 				break;
diff --git a/mm/memory.c b/mm/memory.c
index 1c45b6a42a1b9..8394a9843ca06 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -5173,7 +5173,7 @@ static vm_fault_t do_numa_page(struct vm_fault *vmf)
 	 * Flag if the folio is shared between multiple address spaces. This
 	 * is later used when determining whether to group tasks together
 	 */
-	if (folio_estimated_sharers(folio) > 1 && (vma->vm_flags & VM_SHARED))
+	if (folio_likely_mapped_shared(folio) && (vma->vm_flags & VM_SHARED))
 		flags |= TNF_SHARED;
 
 	nid = folio_nid(folio);
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index f60b4c99f1302..0b92fde395182 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -642,12 +642,11 @@ static int queue_folios_hugetlb(pte_t *pte, unsigned long hmask,
 	 * Unless MPOL_MF_MOVE_ALL, we try to avoid migrating a shared folio.
 	 * Choosing not to migrate a shared folio is not counted as a failure.
 	 *
-	 * To check if the folio is shared, ideally we want to make sure
-	 * every page is mapped to the same process. Doing that is very
-	 * expensive, so check the estimated sharers of the folio instead.
+	 * See folio_likely_mapped_shared() on possible imprecision when we
+	 * cannot easily detect if a folio is shared.
 	 */
 	if ((flags & MPOL_MF_MOVE_ALL) ||
-	    (folio_estimated_sharers(folio) == 1 && !hugetlb_pmd_shared(pte)))
+	    (!folio_likely_mapped_shared(folio) && !hugetlb_pmd_shared(pte)))
 		if (!isolate_hugetlb(folio, qp->pagelist))
 			qp->nr_failed++;
 unlock:
@@ -1032,11 +1031,10 @@ static bool migrate_folio_add(struct folio *folio, struct list_head *foliolist,
 	 * Unless MPOL_MF_MOVE_ALL, we try to avoid migrating a shared folio.
 	 * Choosing not to migrate a shared folio is not counted as a failure.
 	 *
-	 * To check if the folio is shared, ideally we want to make sure
-	 * every page is mapped to the same process. Doing that is very
-	 * expensive, so check the estimated sharers of the folio instead.
+	 * See folio_likely_mapped_shared() on possible imprecision when we
+	 * cannot easily detect if a folio is shared.
 	 */
-	if ((flags & MPOL_MF_MOVE_ALL) || folio_estimated_sharers(folio) == 1) {
+	if ((flags & MPOL_MF_MOVE_ALL) || !folio_likely_mapped_shared(folio)) {
 		if (folio_isolate_lru(folio)) {
 			list_add_tail(&folio->lru, foliolist);
 			node_stat_mod_folio(folio,
diff --git a/mm/migrate.c b/mm/migrate.c
index 73a052a382f13..35d376969f8b9 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -2568,11 +2568,11 @@ int migrate_misplaced_folio(struct folio *folio, struct vm_area_struct *vma,
 	/*
 	 * Don't migrate file folios that are mapped in multiple processes
 	 * with execute permissions as they are probably shared libraries.
-	 * To check if the folio is shared, ideally we want to make sure
-	 * every page is mapped to the same process. Doing that is very
-	 * expensive, so check the estimated mapcount of the folio instead.
+	 *
+	 * See folio_likely_mapped_shared() on possible imprecision when we
+	 * cannot easily detect if a folio is shared.
 	 */
-	if (folio_estimated_sharers(folio) != 1 && folio_is_file_lru(folio) &&
+	if (folio_likely_mapped_shared(folio) && folio_is_file_lru(folio) &&
 	    (vma->vm_flags & VM_EXEC))
 		goto out;
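
For the THP false negative (case (a) above), one way to construct the
situation from userspace, assuming the kernel actually backs the region with
a single PMD-sized folio (alignment and khugepaged behavior can get in the
way); a sketch for illustration, not part of the patch:

#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define THP_SIZE	(2UL << 20)

int main(void)
{
	char *p = mmap(NULL, THP_SIZE, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return 1;
	madvise(p, THP_SIZE, MADV_HUGEPAGE);
	memset(p, 1, THP_SIZE);	/* fault in, ideally as one large folio */

	if (fork() == 0) {
		/*
		 * Child: unmap only the first 4KiB. The folio stays mapped
		 * into both MMs through the remaining subpages, but the
		 * mapcount of subpage 0 drops to 1, so
		 * folio_likely_mapped_shared() now reports "mapped
		 * exclusively", the false negative described in case (a).
		 */
		munmap(p, getpagesize());
	}

	/* Park both processes so the mapping state can be inspected. */
	pause();
	return 0;
}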