
[v9,03/10] mm/rmap: Split try_to_munlock from try_to_unmap

Message ID 20210524132725.12697-4-apopple@nvidia.com (mailing list archive)
State New, archived
Series Add support for SVM atomics in Nouveau

Commit Message

Alistair Popple May 24, 2021, 1:27 p.m. UTC
The behaviour of try_to_unmap_one() is difficult to follow because it
performs different operations based on a fairly large set of flags used
in different combinations.

TTU_MUNLOCK is one such flag. However, it is exclusively used by
try_to_munlock(), which specifies no other flags. Therefore, rather than
overload try_to_unmap_one() with unrelated behaviour, split this out into
its own function and remove the flag.

Signed-off-by: Alistair Popple <apopple@nvidia.com>
Reviewed-by: Ralph Campbell <rcampbell@nvidia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>

---

v9:
* Improved comments

v8:
* Renamed try_to_munlock to page_mlock to better reflect what the
  function actually does.
* Removed the TODO from the documentation that this patch addresses.

v7:
* Added Christoph's Reviewed-by

v4:
* Removed redundant check for VM_LOCKED
---
 Documentation/vm/unevictable-lru.rst | 33 ++++++---------
 include/linux/rmap.h                 |  3 +-
 mm/mlock.c                           | 10 ++---
 mm/rmap.c                            | 61 ++++++++++++++++++++--------
 4 files changed, 63 insertions(+), 44 deletions(-)
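
In short, munlock no longer funnels through try_to_unmap_one() with a
special flag but gets its own rmap_walk() callback. Below is an
illustrative sketch of that callback contract, condensed from the patch
itself (the sketch_* names are placeholders and the page_vma_mapped_walk()
step is elided): returning true continues the scan with the next VMA,
returning false terminates the walk early.

/*
 * Sketch only, not part of the patch: the real page_mlock_one() in the
 * mm/rmap.c hunk below also runs page_vma_mapped_walk() to confirm the
 * page is actually mapped into this VMA before mlocking it.
 */
static bool sketch_mlock_one(struct page *page, struct vm_area_struct *vma,
			     unsigned long address, void *unused)
{
	/* An un-locked vma has nothing to mlock; scan the next one */
	if (!(vma->vm_flags & VM_LOCKED))
		return true;

	/* PTE-mapped THP are never mlocked */
	if (!PageTransCompound(page))
		mlock_vma_page(page);

	/* One locked vma is enough; terminate the walk */
	return false;
}

static void sketch_page_mlock(struct page *page)
{
	struct rmap_walk_control rwc = {
		.rmap_one = sketch_mlock_one,	/* was try_to_unmap_one + TTU_MUNLOCK */
		.done = page_not_mapped,
		.anon_lock = page_lock_anon_vma_read,
	};

	rmap_walk(page, &rwc);
}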

Comments

Liam R. Howlett May 25, 2021, 6:39 p.m. UTC | #1
* Alistair Popple <apopple@nvidia.com> [210524 09:29]:
> The behaviour of try_to_unmap_one() is difficult to follow because it
> performs different operations based on a fairly large set of flags used
> in different combinations.
> 
> TTU_MUNLOCK is one such flag. However, it is exclusively used by
> try_to_munlock(), which specifies no other flags. Therefore, rather than
> overload try_to_unmap_one() with unrelated behaviour, split this out into
> its own function and remove the flag.
> 
> Signed-off-by: Alistair Popple <apopple@nvidia.com>
> Reviewed-by: Ralph Campbell <rcampbell@nvidia.com>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> 
> ---
> 
> v9:
> * Improved comments
> 
> v8:
> * Renamed try_to_munlock to page_mlock to better reflect what the
>   function actually does.
> * Removed the TODO from the documentation that this patch addresses.
> 
> v7:
> * Added Christoph's Reviewed-by
> 
> v4:
> * Removed redundant check for VM_LOCKED
> ---
>  Documentation/vm/unevictable-lru.rst | 33 ++++++---------
>  include/linux/rmap.h                 |  3 +-
>  mm/mlock.c                           | 10 ++---
>  mm/rmap.c                            | 61 ++++++++++++++++++++--------
>  4 files changed, 63 insertions(+), 44 deletions(-)
> 
> diff --git a/Documentation/vm/unevictable-lru.rst b/Documentation/vm/unevictable-lru.rst
> index 0e1490524f53..eae3af17f2d9 100644
> --- a/Documentation/vm/unevictable-lru.rst
> +++ b/Documentation/vm/unevictable-lru.rst
> @@ -389,14 +389,14 @@ mlocked, munlock_vma_page() updates that zone statistics for the number of
>  mlocked pages.  Note, however, that at this point we haven't checked whether
>  the page is mapped by other VM_LOCKED VMAs.
>  
> -We can't call try_to_munlock(), the function that walks the reverse map to
> +We can't call page_mlock(), the function that walks the reverse map to
>  check for other VM_LOCKED VMAs, without first isolating the page from the LRU.
> -try_to_munlock() is a variant of try_to_unmap() and thus requires that the page
> +page_mlock() is a variant of try_to_unmap() and thus requires that the page
>  not be on an LRU list [more on these below].  However, the call to
> -isolate_lru_page() could fail, in which case we couldn't try_to_munlock().  So,
> +isolate_lru_page() could fail, in which case we can't call page_mlock().  So,
>  we go ahead and clear PG_mlocked up front, as this might be the only chance we
> -have.  If we can successfully isolate the page, we go ahead and
> -try_to_munlock(), which will restore the PG_mlocked flag and update the zone
> +have.  If we can successfully isolate the page, we go ahead and call
> +page_mlock(), which will restore the PG_mlocked flag and update the zone
>  page statistics if it finds another VMA holding the page mlocked.  If we fail
>  to isolate the page, we'll have left a potentially mlocked page on the LRU.
>  This is fine, because we'll catch it later if and if vmscan tries to reclaim
> @@ -545,31 +545,24 @@ munlock or munmap system calls, mm teardown (munlock_vma_pages_all), reclaim,
>  holepunching, and truncation of file pages and their anonymous COWed pages.
>  
>  
> -try_to_munlock() Reverse Map Scan
> +page_mlock() Reverse Map Scan
>  ---------------------------------
>  
> -.. warning::
> -   [!] TODO/FIXME: a better name might be page_mlocked() - analogous to the
> -   page_referenced() reverse map walker.
> -
>  When munlock_vma_page() [see section :ref:`munlock()/munlockall() System Call
>  Handling <munlock_munlockall_handling>` above] tries to munlock a
>  page, it needs to determine whether or not the page is mapped by any
>  VM_LOCKED VMA without actually attempting to unmap all PTEs from the
>  page.  For this purpose, the unevictable/mlock infrastructure
> -introduced a variant of try_to_unmap() called try_to_munlock().
> +introduced a variant of try_to_unmap() called page_mlock().
>  
> -try_to_munlock() calls the same functions as try_to_unmap() for anonymous and
> -mapped file and KSM pages with a flag argument specifying unlock versus unmap
> -processing.  Again, these functions walk the respective reverse maps looking
> -for VM_LOCKED VMAs.  When such a VMA is found, as in the try_to_unmap() case,
> -the functions mlock the page via mlock_vma_page() and return SWAP_MLOCK.  This
> -undoes the pre-clearing of the page's PG_mlocked done by munlock_vma_page.
> +page_mlock() walks the respective reverse maps looking for VM_LOCKED VMAs. When
> +such a VMA is found the page is mlocked via mlock_vma_page(). This undoes the
> +pre-clearing of the page's PG_mlocked done by munlock_vma_page.
>  
> -Note that try_to_munlock()'s reverse map walk must visit every VMA in a page's
> +Note that page_mlock()'s reverse map walk must visit every VMA in a page's
>  reverse map to determine that a page is NOT mapped into any VM_LOCKED VMA.
>  However, the scan can terminate when it encounters a VM_LOCKED VMA.
> -Although try_to_munlock() might be called a great many times when munlocking a
> +Although page_mlock() might be called a great many times when munlocking a
>  large region or tearing down a large address space that has been mlocked via
>  mlockall(), overall this is a fairly rare event.
>  
> @@ -602,7 +595,7 @@ inactive lists to the appropriate node's unevictable list.
>  shrink_inactive_list() should only see SHM_LOCK'd pages that became SHM_LOCK'd
>  after shrink_active_list() had moved them to the inactive list, or pages mapped
>  into VM_LOCKED VMAs that munlock_vma_page() couldn't isolate from the LRU to
> -recheck via try_to_munlock().  shrink_inactive_list() won't notice the latter,
> +recheck via page_mlock().  shrink_inactive_list() won't notice the latter,
>  but will pass on to shrink_page_list().
>  
>  shrink_page_list() again culls obviously unevictable pages that it could
> diff --git a/include/linux/rmap.h b/include/linux/rmap.h
> index def5c62c93b3..38a746787c2f 100644
> --- a/include/linux/rmap.h
> +++ b/include/linux/rmap.h
> @@ -87,7 +87,6 @@ struct anon_vma_chain {
>  
>  enum ttu_flags {
>  	TTU_MIGRATION		= 0x1,	/* migration mode */
> -	TTU_MUNLOCK		= 0x2,	/* munlock mode */
>  
>  	TTU_SPLIT_HUGE_PMD	= 0x4,	/* split huge PMD if any */
>  	TTU_IGNORE_MLOCK	= 0x8,	/* ignore mlock */
> @@ -239,7 +238,7 @@ int page_mkclean(struct page *);
>   * called in munlock()/munmap() path to check for other vmas holding
>   * the page mlocked.
>   */
> -void try_to_munlock(struct page *);
> +void page_mlock(struct page *page);
>  
>  void remove_migration_ptes(struct page *old, struct page *new, bool locked);
>  
> diff --git a/mm/mlock.c b/mm/mlock.c
> index df590fda5688..a518d4c48e65 100644
> --- a/mm/mlock.c
> +++ b/mm/mlock.c
> @@ -108,7 +108,7 @@ void mlock_vma_page(struct page *page)
>  /*
>   * Finish munlock after successful page isolation
>   *
> - * Page must be locked. This is a wrapper for try_to_munlock()
> + * Page must be locked. This is a wrapper for page_mlock()
>   * and putback_lru_page() with munlock accounting.
>   */
>  static void __munlock_isolated_page(struct page *page)
> @@ -118,7 +118,7 @@ static void __munlock_isolated_page(struct page *page)
>  	 * and we don't need to check all the other vmas.
>  	 */
>  	if (page_mapcount(page) > 1)
> -		try_to_munlock(page);
> +		page_mlock(page);
>  
>  	/* Did try_to_unlock() succeed or punt? */
>  	if (!PageMlocked(page))
> @@ -158,7 +158,7 @@ static void __munlock_isolation_failed(struct page *page)
>   * munlock()ed or munmap()ed, we want to check whether other vmas hold the
>   * page locked so that we can leave it on the unevictable lru list and not
>   * bother vmscan with it.  However, to walk the page's rmap list in
> - * try_to_munlock() we must isolate the page from the LRU.  If some other
> + * page_mlock() we must isolate the page from the LRU.  If some other
>   * task has removed the page from the LRU, we won't be able to do that.
>   * So we clear the PageMlocked as we might not get another chance.  If we
>   * can't isolate the page, we leave it for putback_lru_page() and vmscan
> @@ -168,7 +168,7 @@ unsigned int munlock_vma_page(struct page *page)
>  {
>  	int nr_pages;
>  
> -	/* For try_to_munlock() and to serialize with page migration */
> +	/* For page_mlock() and to serialize with page migration */
>  	BUG_ON(!PageLocked(page));
>  	VM_BUG_ON_PAGE(PageTail(page), page);
>  
> @@ -205,7 +205,7 @@ static int __mlock_posix_error_return(long retval)
>   *
>   * The fast path is available only for evictable pages with single mapping.
>   * Then we can bypass the per-cpu pvec and get better performance.
> - * when mapcount > 1 we need try_to_munlock() which can fail.
> + * when mapcount > 1 we need page_mlock() which can fail.
>   * when !page_evictable(), we need the full redo logic of putback_lru_page to
>   * avoid leaving evictable page in unevictable list.
>   *
> diff --git a/mm/rmap.c b/mm/rmap.c
> index bc08c4d4b58a..e88966903e1e 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -1405,10 +1405,6 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
>  	struct mmu_notifier_range range;
>  	enum ttu_flags flags = (enum ttu_flags)(long)arg;
>  
> -	/* munlock has nothing to gain from examining un-locked vmas */
> -	if ((flags & TTU_MUNLOCK) && !(vma->vm_flags & VM_LOCKED))
> -		return true;
> -
>  	if (IS_ENABLED(CONFIG_MIGRATION) && (flags & TTU_MIGRATION) &&
>  	    is_zone_device_page(page) && !is_device_private_page(page))
>  		return true;
> @@ -1469,8 +1465,6 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
>  				page_vma_mapped_walk_done(&pvmw);
>  				break;
>  			}
> -			if (flags & TTU_MUNLOCK)
> -				continue;
>  		}
>  
>  		/* Unexpected PMD-mapped THP? */
> @@ -1784,20 +1778,53 @@ bool try_to_unmap(struct page *page, enum ttu_flags flags)
>  	return !page_mapcount(page) ? true : false;
>  }
>  
> +/*
> + * Walks the vma's mapping a page and mlocks the page if any locked vma's are
> + * found. Once one is found the page is locked and the scan can be terminated.
> + */

Can you please add that this requires the mmap_sem() lock to the
comments?

> +static bool page_mlock_one(struct page *page, struct vm_area_struct *vma,
> +				 unsigned long address, void *unused)
> +{
> +	struct page_vma_mapped_walk pvmw = {
> +		.page = page,
> +		.vma = vma,
> +		.address = address,
> +	};
> +
> +	/* An un-locked vma doesn't have any pages to lock, continue the scan */
> +	if (!(vma->vm_flags & VM_LOCKED))
> +		return true;
> +
> +	while (page_vma_mapped_walk(&pvmw)) {
> +		/* PTE-mapped THP are never mlocked */
> +		if (!PageTransCompound(page))
> +			mlock_vma_page(page);
> +		page_vma_mapped_walk_done(&pvmw);
> +
> +		/*
> +		 * no need to continue scanning other vma's if the page has
> +		 * been locked.
> +		 */
> +		return false;
> +	}
> +
> +	return true;
> +}
> +
>  /**
> - * try_to_munlock - try to munlock a page
> - * @page: the page to be munlocked
> + * page_mlock - try to mlock a page
> + * @page: the page to be mlocked
>   *
> - * Called from munlock code.  Checks all of the VMAs mapping the page
> - * to make sure nobody else has this page mlocked. The page will be
> - * returned with PG_mlocked cleared if no other vmas have it mlocked.
> + * Called from munlock code. Checks all of the VMAs mapping the page and mlocks
> + * the page if any are found. The page will be returned with PG_mlocked cleared
> + * if it is not mapped by any locked vmas.
> + *
> + * mmap_lock should be held for read or write.
>   */
> -
> -void try_to_munlock(struct page *page)
> +void page_mlock(struct page *page)
>  {
>  	struct rmap_walk_control rwc = {
> -		.rmap_one = try_to_unmap_one,
> -		.arg = (void *)TTU_MUNLOCK,
> +		.rmap_one = page_mlock_one,
>  		.done = page_not_mapped,
>  		.anon_lock = page_lock_anon_vma_read,
>  
> @@ -1849,7 +1876,7 @@ static struct anon_vma *rmap_walk_anon_lock(struct page *page,
>   * Find all the mappings of a page using the mapping pointer and the vma chains
>   * contained in the anon_vma struct it points to.
>   *
> - * When called from try_to_munlock(), the mmap_lock of the mm containing the vma
> + * When called from page_mlock(), the mmap_lock of the mm containing the vma
>   * where the page was found will be held for write.  So, we won't recheck
>   * vm_flags for that VMA.  That should be OK, because that vma shouldn't be
>   * LOCKED.
> @@ -1901,7 +1928,7 @@ static void rmap_walk_anon(struct page *page, struct rmap_walk_control *rwc,
>   * Find all the mappings of a page using the mapping pointer and the vma chains
>   * contained in the address_space struct it points to.
>   *
> - * When called from try_to_munlock(), the mmap_lock of the mm containing the vma
> + * When called from page_mlock(), the mmap_lock of the mm containing the vma
>   * where the page was found will be held for write.  So, we won't recheck
>   * vm_flags for that VMA.  That should be OK, because that vma shouldn't be
>   * LOCKED.
> -- 
> 2.20.1
> 
> 

I believe munlock_vma_pages_range() still references the old function
name?

Thanks,
Liam
Shakeel Butt May 25, 2021, 11:45 p.m. UTC | #2
On Tue, May 25, 2021 at 11:40 AM Liam Howlett <liam.howlett@oracle.com> wrote:
>
[...]
> >
> > +/*
> > + * Walks the vma's mapping a page and mlocks the page if any locked vma's are
> > + * found. Once one is found the page is locked and the scan can be terminated.
> > + */
>
> Can you please add that this requires the mmap_sem() lock to the
> comments?
>

Why does this require mmap_sem() lock? Also mmap_sem() lock of which mm_struct?

> > +static bool page_mlock_one(struct page *page, struct vm_area_struct *vma,
> > +                              unsigned long address, void *unused)
> > +{
> > +     struct page_vma_mapped_walk pvmw = {
> > +             .page = page,
> > +             .vma = vma,
> > +             .address = address,
> > +     };
> > +
> > +     /* An un-locked vma doesn't have any pages to lock, continue the scan */
> > +     if (!(vma->vm_flags & VM_LOCKED))
> > +             return true;
> > +
> > +     while (page_vma_mapped_walk(&pvmw)) {
> > +             /* PTE-mapped THP are never mlocked */
> > +             if (!PageTransCompound(page))
> > +                     mlock_vma_page(page);
> > +             page_vma_mapped_walk_done(&pvmw);
> > +
> > +             /*
> > +              * no need to continue scanning other vma's if the page has
> > +              * been locked.
> > +              */
> > +             return false;
> > +     }
> > +
> > +     return true;
> > +}
Liam R. Howlett June 4, 2021, 8:49 p.m. UTC | #3
* Shakeel Butt <shakeelb@google.com> [210525 19:45]:
> On Tue, May 25, 2021 at 11:40 AM Liam Howlett <liam.howlett@oracle.com> wrote:
> >
> [...]
> > >
> > > +/*
> > > + * Walks the vma's mapping a page and mlocks the page if any locked vma's are
> > > + * found. Once one is found the page is locked and the scan can be terminated.
> > > + */
> >
> > Can you please add that this requires the mmap_sem() lock to the
> > comments?
> >
> 
> Why does this require mmap_sem() lock? Also mmap_sem() lock of which mm_struct?


Doesn't the mlock_vma_page() require the mmap_sem() for reading?  The
mm_struct in vma->vm_mm;


From what I can see, at least the following paths have mmap_lock held
for writing:

munlock_vma_pages_range() from __do_munmap()
munlock_vma_pages_range() from remap_file_pages()

> 
> > > +static bool page_mlock_one(struct page *page, struct vm_area_struct *vma,
> > > +                              unsigned long address, void *unused)
> > > +{
> > > +     struct page_vma_mapped_walk pvmw = {
> > > +             .page = page,
> > > +             .vma = vma,
> > > +             .address = address,
> > > +     };
> > > +
> > > +     /* An un-locked vma doesn't have any pages to lock, continue the scan */
> > > +     if (!(vma->vm_flags & VM_LOCKED))
> > > +             return true;
> > > +
> > > +     while (page_vma_mapped_walk(&pvmw)) {
> > > +             /* PTE-mapped THP are never mlocked */
> > > +             if (!PageTransCompound(page))
> > > +                     mlock_vma_page(page);
> > > +             page_vma_mapped_walk_done(&pvmw);
> > > +
> > > +             /*
> > > +              * no need to continue scanning other vma's if the page has
> > > +              * been locked.
> > > +              */
> > > +             return false;
> > > +     }
> > > +
> > > +     return true;
> > > +}

munlock_vma_pages_range() comments still reference try_to_{munlock|unmap}
Shakeel Butt June 5, 2021, 12:41 a.m. UTC | #4
On Fri, Jun 4, 2021 at 1:49 PM Liam Howlett <liam.howlett@oracle.com> wrote:
>
> * Shakeel Butt <shakeelb@google.com> [210525 19:45]:
> > On Tue, May 25, 2021 at 11:40 AM Liam Howlett <liam.howlett@oracle.com> wrote:
> > >
> > [...]
> > > >
> > > > +/*
> > > > + * Walks the vma's mapping a page and mlocks the page if any locked vma's are
> > > > + * found. Once one is found the page is locked and the scan can be terminated.
> > > > + */
> > >
> > > Can you please add that this requires the mmap_sem() lock to the
> > > comments?
> > >
> >
> > Why does this require mmap_sem() lock? Also mmap_sem() lock of which mm_struct?
>
>
> Doesn't the mlock_vma_page() require the mmap_sem() for reading?  The
> mm_struct in vma->vm_mm;
>

We are traversing all the vmas where this page is mapped of possibly
different mm_structs. I don't think we want to take mmap_sem() of all
those mm_structs. The commit b87537d9e2fe ("mm: rmap use pte lock not
mmap_sem to set PageMlocked") removed exactly that.

>
> From what I can see, at least the following paths have mmap_lock held
> for writing:
>
> munlock_vma_pages_range() from __do_munmap()
> munlokc_vma_pages_range() from remap_file_pages()
>

The following path does not hold mmap_sem:

exit_mmap() -> munlock_vma_pages_all() -> munlock_vma_pages_range().

I would really suggest all to carefully read the commit message of
b87537d9e2fe ("mm: rmap use pte lock not mmap_sem to set
PageMlocked").

Particularly the following paragraph:
...
    Vlastimil Babka points out another race which this patch protects against.
     try_to_unmap_one() might reach its mlock_vma_page() TestSetPageMlocked a
    moment after munlock_vma_pages_all() did its Phase 1 TestClearPageMlocked:
    leaving PageMlocked and unevictable when it should be evictable.  mmap_sem
    is ineffective because exit_mmap() does not hold it; page lock ineffective
    because __munlock_pagevec() only takes it afterwards, in Phase 2; pte lock
    is effective because __munlock_pagevec_fill() takes it to get the page,
    after VM_LOCKED was cleared from vm_flags, so visible to try_to_unmap_one.
...

Alistair, please bring back the VM_LOCKED check with pte lock held and
the comment "Holding pte lock, we do *not* need mmap_lock here".

One positive outcome of this cleanup patch is the removal of
unnecessary invalidation (unmapping for kvm case) of secondary mmus.
Liam R. Howlett June 5, 2021, 3:39 a.m. UTC | #5
* Shakeel Butt <shakeelb@google.com> [210604 20:41]:
> On Fri, Jun 4, 2021 at 1:49 PM Liam Howlett <liam.howlett@oracle.com> wrote:
> >
> > * Shakeel Butt <shakeelb@google.com> [210525 19:45]:
> > > On Tue, May 25, 2021 at 11:40 AM Liam Howlett <liam.howlett@oracle.com> wrote:
> > > >
> > > [...]
> > > > >
> > > > > +/*
> > > > > + * Walks the vma's mapping a page and mlocks the page if any locked vma's are
> > > > > + * found. Once one is found the page is locked and the scan can be terminated.
> > > > > + */
> > > >
> > > > Can you please add that this requires the mmap_sem() lock to the
> > > > comments?
> > > >
> > >
> > > Why does this require mmap_sem() lock? Also mmap_sem() lock of which mm_struct?
> >
> >
> > Doesn't the mlock_vma_page() require the mmap_sem() for reading?  The
> > mm_struct in vma->vm_mm;
> >
> 
> We are traversing all the vmas where this page is mapped of possibly
> different mm_structs. I don't think we want to take mmap_sem() of all
> those mm_structs. The commit b87537d9e2fe ("mm: rmap use pte lock not
> mmap_sem to set PageMlocked") removed exactly that.
> 
> >
> > From what I can see, at least the following paths have mmap_lock held
> > for writing:
> >
> > munlock_vma_pages_range() from __do_munmap()
> > munlock_vma_pages_range() from remap_file_pages()
> >
> 
> The following path does not hold mmap_sem:
> 
> exit_mmap() -> munlock_vma_pages_all() -> munlock_vma_pages_range().

Isn't this the benign race referenced by Hugh in the commit you point to
below?

> 
> I would really suggest all to carefully read the commit message of
> b87537d9e2fe ("mm: rmap use pte lock not mmap_sem to set
> PageMlocked").
> 
> Particularly the following paragraph:
> ...
>     Vlastimil Babka points out another race which this patch protects against.
>      try_to_unmap_one() might reach its mlock_vma_page() TestSetPageMlocked a
>     moment after munlock_vma_pages_all() did its Phase 1 TestClearPageMlocked:
>     leaving PageMlocked and unevictable when it should be evictable.  mmap_sem
>     is ineffective because exit_mmap() does not hold it; page lock ineffective
>     because __munlock_pagevec() only takes it afterwards, in Phase 2; pte lock
>     is effective because __munlock_pagevec_fill() takes it to get the page,
>     after VM_LOCKED was cleared from vm_flags, so visible to try_to_unmap_one.
> ...

So this is saying the race with exit_mmap() isn't benign after all?

> 
> Alistair, please bring back the VM_LOCKED check with pte lock held and
> the comment "Holding pte lock, we do *not* need mmap_lock here".
> 
> One positive outcome of this cleanup patch is the removal of
> unnecessary invalidation (unmapping for kvm case) of secondary mmus.
Shakeel Butt June 5, 2021, 4:19 a.m. UTC | #6
On Fri, Jun 4, 2021 at 8:39 PM Liam Howlett <liam.howlett@oracle.com> wrote:
>
> > Particularly the following paragraph:
> > ...
> >     Vlastimil Babka points out another race which this patch protects against.
> >      try_to_unmap_one() might reach its mlock_vma_page() TestSetPageMlocked a
> >     moment after munlock_vma_pages_all() did its Phase 1 TestClearPageMlocked:
> >     leaving PageMlocked and unevictable when it should be evictable.  mmap_sem
> >     is ineffective because exit_mmap() does not hold it; page lock ineffective
> >     because __munlock_pagevec() only takes it afterwards, in Phase 2; pte lock
> >     is effective because __munlock_pagevec_fill() takes it to get the page,
> >     after VM_LOCKED was cleared from vm_flags, so visible to try_to_unmap_one.
> > ...
>
> So this is saying the race with exit_mmap() isn't benign after all?
>

Yes, not benign at all.
Alistair Popple June 7, 2021, 4:51 a.m. UTC | #7
On Saturday, 5 June 2021 10:41:03 AM AEST Shakeel Butt wrote:
> 
> 
> On Fri, Jun 4, 2021 at 1:49 PM Liam Howlett <liam.howlett@oracle.com> wrote:
> >
> > * Shakeel Butt <shakeelb@google.com> [210525 19:45]:
> > > On Tue, May 25, 2021 at 11:40 AM Liam Howlett <liam.howlett@oracle.com> wrote:
> > > >
> > > [...]
> > > > >
> > > > > +/*
> > > > > + * Walks the vma's mapping a page and mlocks the page if any locked vma's are
> > > > > + * found. Once one is found the page is locked and the scan can be terminated.
> > > > > + */
> > > >
> > > > Can you please add that this requires the mmap_sem() lock to the
> > > > comments?
> > > >
> > >
> > > Why does this require mmap_sem() lock? Also mmap_sem() lock of which mm_struct?
> >
> >
> > Doesn't the mlock_vma_page() require the mmap_sem() for reading?  The
> > mm_struct in vma->vm_mm;
> >
> 
> We are traversing all the vmas where this page is mapped of possibly
> different mm_structs. I don't think we want to take mmap_sem() of all
> those mm_structs. The commit b87537d9e2fe ("mm: rmap use pte lock not
> mmap_sem to set PageMlocked") removed exactly that.
> 
> >
> > From what I can see, at least the following paths have mmap_lock held
> > for writing:
> >
> > munlock_vma_pages_range() from __do_munmap()
> > munlock_vma_pages_range() from remap_file_pages()
> >
> 
> The following path does not hold mmap_sem:
> 
> exit_mmap() -> munlock_vma_pages_all() -> munlock_vma_pages_range().
> 
> I would really suggest all to carefully read the commit message of
> b87537d9e2fe ("mm: rmap use pte lock not mmap_sem to set
> PageMlocked").
> 
> Particularly the following paragraph:
> ...
>     Vlastimil Babka points out another race which this patch protects against.
>      try_to_unmap_one() might reach its mlock_vma_page() TestSetPageMlocked a
>     moment after munlock_vma_pages_all() did its Phase 1 TestClearPageMlocked:
>     leaving PageMlocked and unevictable when it should be evictable.  mmap_sem
>     is ineffective because exit_mmap() does not hold it; page lock ineffective
>     because __munlock_pagevec() only takes it afterwards, in Phase 2; pte lock
>     is effective because __munlock_pagevec_fill() takes it to get the page,
>     after VM_LOCKED was cleared from vm_flags, so visible to try_to_unmap_one.
> ...
>
> Alistair, please bring back the VM_LOCKED check with pte lock held and
> the comment "Holding pte lock, we do *not* need mmap_lock here".

Actually thanks for highlighting that paragraph. I have gone back through the 
code again in munlock_vma_pages_range() and think I have a better 
understanding of it now. So now I agree - the check of VM_LOCKED under the PTL 
is important to ensure mlock_vma_page() does not run after VM_LOCKED has been 
cleared and __munlock_pagevec_fill() has run.

Will post v10 to fix this and the try_to_munlock reference pointed out by Liam 
which I missed for v9. Thanks Shakeel for taking the time to point this out.

> One positive outcome of this cleanup patch is the removal of
> unnecessary invalidation (unmapping for kvm case) of secondary mmus.
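
For readers following the thread, here is a sketch of the shape that fix
could take, per Shakeel's suggestion above. This is an illustration, not
the posted v10 patch; the point is that VM_LOCKED is rechecked inside the
page_vma_mapped_walk() loop, i.e. with the pte lock held, so the check
cannot race with munlock_vma_pages_range() clearing VM_LOCKED.

static bool page_mlock_one(struct page *page, struct vm_area_struct *vma,
				 unsigned long address, void *unused)
{
	struct page_vma_mapped_walk pvmw = {
		.page = page,
		.vma = vma,
		.address = address,
	};

	/* An un-locked vma doesn't have any pages to lock, continue the scan */
	if (!(vma->vm_flags & VM_LOCKED))
		return true;

	while (page_vma_mapped_walk(&pvmw)) {
		/*
		 * Recheck now that page_vma_mapped_walk() has taken the pte
		 * lock: holding the pte lock, we do *not* need mmap_lock here
		 * (see b87537d9e2fe referenced above).
		 */
		if (vma->vm_flags & VM_LOCKED) {
			/* PTE-mapped THP are never mlocked */
			if (!PageTransCompound(page))
				mlock_vma_page(page);
			page_vma_mapped_walk_done(&pvmw);

			/* Page is mlocked; no need to scan other vma's */
			return false;
		}
	}

	return true;
}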

Patch

diff --git a/Documentation/vm/unevictable-lru.rst b/Documentation/vm/unevictable-lru.rst
index 0e1490524f53..eae3af17f2d9 100644
--- a/Documentation/vm/unevictable-lru.rst
+++ b/Documentation/vm/unevictable-lru.rst
@@ -389,14 +389,14 @@  mlocked, munlock_vma_page() updates that zone statistics for the number of
 mlocked pages.  Note, however, that at this point we haven't checked whether
 the page is mapped by other VM_LOCKED VMAs.
 
-We can't call try_to_munlock(), the function that walks the reverse map to
+We can't call page_mlock(), the function that walks the reverse map to
 check for other VM_LOCKED VMAs, without first isolating the page from the LRU.
-try_to_munlock() is a variant of try_to_unmap() and thus requires that the page
+page_mlock() is a variant of try_to_unmap() and thus requires that the page
 not be on an LRU list [more on these below].  However, the call to
-isolate_lru_page() could fail, in which case we couldn't try_to_munlock().  So,
+isolate_lru_page() could fail, in which case we can't call page_mlock().  So,
 we go ahead and clear PG_mlocked up front, as this might be the only chance we
-have.  If we can successfully isolate the page, we go ahead and
-try_to_munlock(), which will restore the PG_mlocked flag and update the zone
+have.  If we can successfully isolate the page, we go ahead and call
+page_mlock(), which will restore the PG_mlocked flag and update the zone
 page statistics if it finds another VMA holding the page mlocked.  If we fail
 to isolate the page, we'll have left a potentially mlocked page on the LRU.
 This is fine, because we'll catch it later if and if vmscan tries to reclaim
@@ -545,31 +545,24 @@  munlock or munmap system calls, mm teardown (munlock_vma_pages_all), reclaim,
 holepunching, and truncation of file pages and their anonymous COWed pages.
 
 
-try_to_munlock() Reverse Map Scan
+page_mlock() Reverse Map Scan
 ---------------------------------
 
-.. warning::
-   [!] TODO/FIXME: a better name might be page_mlocked() - analogous to the
-   page_referenced() reverse map walker.
-
 When munlock_vma_page() [see section :ref:`munlock()/munlockall() System Call
 Handling <munlock_munlockall_handling>` above] tries to munlock a
 page, it needs to determine whether or not the page is mapped by any
 VM_LOCKED VMA without actually attempting to unmap all PTEs from the
 page.  For this purpose, the unevictable/mlock infrastructure
-introduced a variant of try_to_unmap() called try_to_munlock().
+introduced a variant of try_to_unmap() called page_mlock().
 
-try_to_munlock() calls the same functions as try_to_unmap() for anonymous and
-mapped file and KSM pages with a flag argument specifying unlock versus unmap
-processing.  Again, these functions walk the respective reverse maps looking
-for VM_LOCKED VMAs.  When such a VMA is found, as in the try_to_unmap() case,
-the functions mlock the page via mlock_vma_page() and return SWAP_MLOCK.  This
-undoes the pre-clearing of the page's PG_mlocked done by munlock_vma_page.
+page_mlock() walks the respective reverse maps looking for VM_LOCKED VMAs. When
+such a VMA is found the page is mlocked via mlock_vma_page(). This undoes the
+pre-clearing of the page's PG_mlocked done by munlock_vma_page.
 
-Note that try_to_munlock()'s reverse map walk must visit every VMA in a page's
+Note that page_mlock()'s reverse map walk must visit every VMA in a page's
 reverse map to determine that a page is NOT mapped into any VM_LOCKED VMA.
 However, the scan can terminate when it encounters a VM_LOCKED VMA.
-Although try_to_munlock() might be called a great many times when munlocking a
+Although page_mlock() might be called a great many times when munlocking a
 large region or tearing down a large address space that has been mlocked via
 mlockall(), overall this is a fairly rare event.
 
@@ -602,7 +595,7 @@  inactive lists to the appropriate node's unevictable list.
 shrink_inactive_list() should only see SHM_LOCK'd pages that became SHM_LOCK'd
 after shrink_active_list() had moved them to the inactive list, or pages mapped
 into VM_LOCKED VMAs that munlock_vma_page() couldn't isolate from the LRU to
-recheck via try_to_munlock().  shrink_inactive_list() won't notice the latter,
+recheck via page_mlock().  shrink_inactive_list() won't notice the latter,
 but will pass on to shrink_page_list().
 
 shrink_page_list() again culls obviously unevictable pages that it could
diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index def5c62c93b3..38a746787c2f 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -87,7 +87,6 @@  struct anon_vma_chain {
 
 enum ttu_flags {
 	TTU_MIGRATION		= 0x1,	/* migration mode */
-	TTU_MUNLOCK		= 0x2,	/* munlock mode */
 
 	TTU_SPLIT_HUGE_PMD	= 0x4,	/* split huge PMD if any */
 	TTU_IGNORE_MLOCK	= 0x8,	/* ignore mlock */
@@ -239,7 +238,7 @@  int page_mkclean(struct page *);
  * called in munlock()/munmap() path to check for other vmas holding
  * the page mlocked.
  */
-void try_to_munlock(struct page *);
+void page_mlock(struct page *page);
 
 void remove_migration_ptes(struct page *old, struct page *new, bool locked);
 
diff --git a/mm/mlock.c b/mm/mlock.c
index df590fda5688..a518d4c48e65 100644
--- a/mm/mlock.c
+++ b/mm/mlock.c
@@ -108,7 +108,7 @@  void mlock_vma_page(struct page *page)
 /*
  * Finish munlock after successful page isolation
  *
- * Page must be locked. This is a wrapper for try_to_munlock()
+ * Page must be locked. This is a wrapper for page_mlock()
  * and putback_lru_page() with munlock accounting.
  */
 static void __munlock_isolated_page(struct page *page)
@@ -118,7 +118,7 @@  static void __munlock_isolated_page(struct page *page)
 	 * and we don't need to check all the other vmas.
 	 */
 	if (page_mapcount(page) > 1)
-		try_to_munlock(page);
+		page_mlock(page);
 
 	/* Did try_to_unlock() succeed or punt? */
 	if (!PageMlocked(page))
@@ -158,7 +158,7 @@  static void __munlock_isolation_failed(struct page *page)
  * munlock()ed or munmap()ed, we want to check whether other vmas hold the
  * page locked so that we can leave it on the unevictable lru list and not
  * bother vmscan with it.  However, to walk the page's rmap list in
- * try_to_munlock() we must isolate the page from the LRU.  If some other
+ * page_mlock() we must isolate the page from the LRU.  If some other
  * task has removed the page from the LRU, we won't be able to do that.
  * So we clear the PageMlocked as we might not get another chance.  If we
  * can't isolate the page, we leave it for putback_lru_page() and vmscan
@@ -168,7 +168,7 @@  unsigned int munlock_vma_page(struct page *page)
 {
 	int nr_pages;
 
-	/* For try_to_munlock() and to serialize with page migration */
+	/* For page_mlock() and to serialize with page migration */
 	BUG_ON(!PageLocked(page));
 	VM_BUG_ON_PAGE(PageTail(page), page);
 
@@ -205,7 +205,7 @@  static int __mlock_posix_error_return(long retval)
  *
  * The fast path is available only for evictable pages with single mapping.
  * Then we can bypass the per-cpu pvec and get better performance.
- * when mapcount > 1 we need try_to_munlock() which can fail.
+ * when mapcount > 1 we need page_mlock() which can fail.
  * when !page_evictable(), we need the full redo logic of putback_lru_page to
  * avoid leaving evictable page in unevictable list.
  *
diff --git a/mm/rmap.c b/mm/rmap.c
index bc08c4d4b58a..e88966903e1e 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1405,10 +1405,6 @@  static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
 	struct mmu_notifier_range range;
 	enum ttu_flags flags = (enum ttu_flags)(long)arg;
 
-	/* munlock has nothing to gain from examining un-locked vmas */
-	if ((flags & TTU_MUNLOCK) && !(vma->vm_flags & VM_LOCKED))
-		return true;
-
 	if (IS_ENABLED(CONFIG_MIGRATION) && (flags & TTU_MIGRATION) &&
 	    is_zone_device_page(page) && !is_device_private_page(page))
 		return true;
@@ -1469,8 +1465,6 @@  static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
 				page_vma_mapped_walk_done(&pvmw);
 				break;
 			}
-			if (flags & TTU_MUNLOCK)
-				continue;
 		}
 
 		/* Unexpected PMD-mapped THP? */
@@ -1784,20 +1778,53 @@  bool try_to_unmap(struct page *page, enum ttu_flags flags)
 	return !page_mapcount(page) ? true : false;
 }
 
+/*
+ * Walks the vma's mapping a page and mlocks the page if any locked vma's are
+ * found. Once one is found the page is locked and the scan can be terminated.
+ */
+static bool page_mlock_one(struct page *page, struct vm_area_struct *vma,
+				 unsigned long address, void *unused)
+{
+	struct page_vma_mapped_walk pvmw = {
+		.page = page,
+		.vma = vma,
+		.address = address,
+	};
+
+	/* An un-locked vma doesn't have any pages to lock, continue the scan */
+	if (!(vma->vm_flags & VM_LOCKED))
+		return true;
+
+	while (page_vma_mapped_walk(&pvmw)) {
+		/* PTE-mapped THP are never mlocked */
+		if (!PageTransCompound(page))
+			mlock_vma_page(page);
+		page_vma_mapped_walk_done(&pvmw);
+
+		/*
+		 * no need to continue scanning other vma's if the page has
+		 * been locked.
+		 */
+		return false;
+	}
+
+	return true;
+}
+
 /**
- * try_to_munlock - try to munlock a page
- * @page: the page to be munlocked
+ * page_mlock - try to mlock a page
+ * @page: the page to be mlocked
  *
- * Called from munlock code.  Checks all of the VMAs mapping the page
- * to make sure nobody else has this page mlocked. The page will be
- * returned with PG_mlocked cleared if no other vmas have it mlocked.
+ * Called from munlock code. Checks all of the VMAs mapping the page and mlocks
+ * the page if any are found. The page will be returned with PG_mlocked cleared
+ * if it is not mapped by any locked vmas.
+ *
+ * mmap_lock should be held for read or write.
  */
-
-void try_to_munlock(struct page *page)
+void page_mlock(struct page *page)
 {
 	struct rmap_walk_control rwc = {
-		.rmap_one = try_to_unmap_one,
-		.arg = (void *)TTU_MUNLOCK,
+		.rmap_one = page_mlock_one,
 		.done = page_not_mapped,
 		.anon_lock = page_lock_anon_vma_read,
 
@@ -1849,7 +1876,7 @@  static struct anon_vma *rmap_walk_anon_lock(struct page *page,
  * Find all the mappings of a page using the mapping pointer and the vma chains
  * contained in the anon_vma struct it points to.
  *
- * When called from try_to_munlock(), the mmap_lock of the mm containing the vma
+ * When called from page_mlock(), the mmap_lock of the mm containing the vma
  * where the page was found will be held for write.  So, we won't recheck
  * vm_flags for that VMA.  That should be OK, because that vma shouldn't be
  * LOCKED.
@@ -1901,7 +1928,7 @@  static void rmap_walk_anon(struct page *page, struct rmap_walk_control *rwc,
  * Find all the mappings of a page using the mapping pointer and the vma chains
  * contained in the address_space struct it points to.
  *
- * When called from try_to_munlock(), the mmap_lock of the mm containing the vma
+ * When called from page_mlock(), the mmap_lock of the mm containing the vma
  * where the page was found will be held for write.  So, we won't recheck
  * vm_flags for that VMA.  That should be OK, because that vma shouldn't be
  * LOCKED.
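
For context on the caller side, this is how the renamed walk is driven
from the munlock path. It is a condensed sketch of the mm/mlock.c hunks
above (the sketch_ name is a placeholder, and the munlock accounting done
by the real __munlock_isolated_page() is omitted):

/*
 * Condensed sketch of the existing caller: by this point the page has
 * already had PG_mlocked cleared and been isolated from the LRU.
 */
static void sketch_munlock_isolated_page(struct page *page)
{
	/*
	 * Optimization: if the page was mapped just once, that's our mapping
	 * and we don't need to check all the other vmas.
	 */
	if (page_mapcount(page) > 1)
		page_mlock(page);	/* re-sets PG_mlocked if a VM_LOCKED vma maps it */

	/* Whether or not page_mlock() re-mlocked it, put it back on an LRU */
	putback_lru_page(page);
}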