Message ID | 20210524132725.12697-4-apopple@nvidia.com (mailing list archive) |
---|---|
State | New, archived |
Series | Add support for SVM atomics in Nouveau |
* Alistair Popple <apopple@nvidia.com> [210524 09:29]: > The behaviour of try_to_unmap_one() is difficult to follow because it > performs different operations based on a fairly large set of flags used > in different combinations. > > TTU_MUNLOCK is one such flag. However it is exclusively used by > try_to_munlock() which specifies no other flags. Therefore rather than > overload try_to_unmap_one() with unrelated behaviour split this out into > it's own function and remove the flag. > > Signed-off-by: Alistair Popple <apopple@nvidia.com> > Reviewed-by: Ralph Campbell <rcampbell@nvidia.com> > Reviewed-by: Christoph Hellwig <hch@lst.de> > > --- > > v9: > * Improved comments > > v8: > * Renamed try_to_munlock to page_mlock to better reflect what the > function actually does. > * Removed the TODO from the documentation that this patch addresses. > > v7: > * Added Christoph's Reviewed-by > > v4: > * Removed redundant check for VM_LOCKED > --- > Documentation/vm/unevictable-lru.rst | 33 ++++++--------- > include/linux/rmap.h | 3 +- > mm/mlock.c | 10 ++--- > mm/rmap.c | 61 ++++++++++++++++++++-------- > 4 files changed, 63 insertions(+), 44 deletions(-) > > diff --git a/Documentation/vm/unevictable-lru.rst b/Documentation/vm/unevictable-lru.rst > index 0e1490524f53..eae3af17f2d9 100644 > --- a/Documentation/vm/unevictable-lru.rst > +++ b/Documentation/vm/unevictable-lru.rst > @@ -389,14 +389,14 @@ mlocked, munlock_vma_page() updates that zone statistics for the number of > mlocked pages. Note, however, that at this point we haven't checked whether > the page is mapped by other VM_LOCKED VMAs. > > -We can't call try_to_munlock(), the function that walks the reverse map to > +We can't call page_mlock(), the function that walks the reverse map to > check for other VM_LOCKED VMAs, without first isolating the page from the LRU. > -try_to_munlock() is a variant of try_to_unmap() and thus requires that the page > +page_mlock() is a variant of try_to_unmap() and thus requires that the page > not be on an LRU list [more on these below]. However, the call to > -isolate_lru_page() could fail, in which case we couldn't try_to_munlock(). So, > +isolate_lru_page() could fail, in which case we can't call page_mlock(). So, > we go ahead and clear PG_mlocked up front, as this might be the only chance we > -have. If we can successfully isolate the page, we go ahead and > -try_to_munlock(), which will restore the PG_mlocked flag and update the zone > +have. If we can successfully isolate the page, we go ahead and call > +page_mlock(), which will restore the PG_mlocked flag and update the zone > page statistics if it finds another VMA holding the page mlocked. If we fail > to isolate the page, we'll have left a potentially mlocked page on the LRU. > This is fine, because we'll catch it later if and if vmscan tries to reclaim > @@ -545,31 +545,24 @@ munlock or munmap system calls, mm teardown (munlock_vma_pages_all), reclaim, > holepunching, and truncation of file pages and their anonymous COWed pages. > > > -try_to_munlock() Reverse Map Scan > +page_mlock() Reverse Map Scan > --------------------------------- > > -.. warning:: > - [!] TODO/FIXME: a better name might be page_mlocked() - analogous to the > - page_referenced() reverse map walker. 
> - > When munlock_vma_page() [see section :ref:`munlock()/munlockall() System Call > Handling <munlock_munlockall_handling>` above] tries to munlock a > page, it needs to determine whether or not the page is mapped by any > VM_LOCKED VMA without actually attempting to unmap all PTEs from the > page. For this purpose, the unevictable/mlock infrastructure > -introduced a variant of try_to_unmap() called try_to_munlock(). > +introduced a variant of try_to_unmap() called page_mlock(). > > -try_to_munlock() calls the same functions as try_to_unmap() for anonymous and > -mapped file and KSM pages with a flag argument specifying unlock versus unmap > -processing. Again, these functions walk the respective reverse maps looking > -for VM_LOCKED VMAs. When such a VMA is found, as in the try_to_unmap() case, > -the functions mlock the page via mlock_vma_page() and return SWAP_MLOCK. This > -undoes the pre-clearing of the page's PG_mlocked done by munlock_vma_page. > +page_mlock() walks the respective reverse maps looking for VM_LOCKED VMAs. When > +such a VMA is found the page is mlocked via mlock_vma_page(). This undoes the > +pre-clearing of the page's PG_mlocked done by munlock_vma_page. > > -Note that try_to_munlock()'s reverse map walk must visit every VMA in a page's > +Note that page_mlock()'s reverse map walk must visit every VMA in a page's > reverse map to determine that a page is NOT mapped into any VM_LOCKED VMA. > However, the scan can terminate when it encounters a VM_LOCKED VMA. > -Although try_to_munlock() might be called a great many times when munlocking a > +Although page_mlock() might be called a great many times when munlocking a > large region or tearing down a large address space that has been mlocked via > mlockall(), overall this is a fairly rare event. > > @@ -602,7 +595,7 @@ inactive lists to the appropriate node's unevictable list. > shrink_inactive_list() should only see SHM_LOCK'd pages that became SHM_LOCK'd > after shrink_active_list() had moved them to the inactive list, or pages mapped > into VM_LOCKED VMAs that munlock_vma_page() couldn't isolate from the LRU to > -recheck via try_to_munlock(). shrink_inactive_list() won't notice the latter, > +recheck via page_mlock(). shrink_inactive_list() won't notice the latter, > but will pass on to shrink_page_list(). > > shrink_page_list() again culls obviously unevictable pages that it could > diff --git a/include/linux/rmap.h b/include/linux/rmap.h > index def5c62c93b3..38a746787c2f 100644 > --- a/include/linux/rmap.h > +++ b/include/linux/rmap.h > @@ -87,7 +87,6 @@ struct anon_vma_chain { > > enum ttu_flags { > TTU_MIGRATION = 0x1, /* migration mode */ > - TTU_MUNLOCK = 0x2, /* munlock mode */ > > TTU_SPLIT_HUGE_PMD = 0x4, /* split huge PMD if any */ > TTU_IGNORE_MLOCK = 0x8, /* ignore mlock */ > @@ -239,7 +238,7 @@ int page_mkclean(struct page *); > * called in munlock()/munmap() path to check for other vmas holding > * the page mlocked. > */ > -void try_to_munlock(struct page *); > +void page_mlock(struct page *page); > > void remove_migration_ptes(struct page *old, struct page *new, bool locked); > > diff --git a/mm/mlock.c b/mm/mlock.c > index df590fda5688..a518d4c48e65 100644 > --- a/mm/mlock.c > +++ b/mm/mlock.c > @@ -108,7 +108,7 @@ void mlock_vma_page(struct page *page) > /* > * Finish munlock after successful page isolation > * > - * Page must be locked. This is a wrapper for try_to_munlock() > + * Page must be locked. This is a wrapper for page_mlock() > * and putback_lru_page() with munlock accounting. 
> */ > static void __munlock_isolated_page(struct page *page) > @@ -118,7 +118,7 @@ static void __munlock_isolated_page(struct page *page) > * and we don't need to check all the other vmas. > */ > if (page_mapcount(page) > 1) > - try_to_munlock(page); > + page_mlock(page); > > /* Did try_to_unlock() succeed or punt? */ > if (!PageMlocked(page)) > @@ -158,7 +158,7 @@ static void __munlock_isolation_failed(struct page *page) > * munlock()ed or munmap()ed, we want to check whether other vmas hold the > * page locked so that we can leave it on the unevictable lru list and not > * bother vmscan with it. However, to walk the page's rmap list in > - * try_to_munlock() we must isolate the page from the LRU. If some other > + * page_mlock() we must isolate the page from the LRU. If some other > * task has removed the page from the LRU, we won't be able to do that. > * So we clear the PageMlocked as we might not get another chance. If we > * can't isolate the page, we leave it for putback_lru_page() and vmscan > @@ -168,7 +168,7 @@ unsigned int munlock_vma_page(struct page *page) > { > int nr_pages; > > - /* For try_to_munlock() and to serialize with page migration */ > + /* For page_mlock() and to serialize with page migration */ > BUG_ON(!PageLocked(page)); > VM_BUG_ON_PAGE(PageTail(page), page); > > @@ -205,7 +205,7 @@ static int __mlock_posix_error_return(long retval) > * > * The fast path is available only for evictable pages with single mapping. > * Then we can bypass the per-cpu pvec and get better performance. > - * when mapcount > 1 we need try_to_munlock() which can fail. > + * when mapcount > 1 we need page_mlock() which can fail. > * when !page_evictable(), we need the full redo logic of putback_lru_page to > * avoid leaving evictable page in unevictable list. > * > diff --git a/mm/rmap.c b/mm/rmap.c > index bc08c4d4b58a..e88966903e1e 100644 > --- a/mm/rmap.c > +++ b/mm/rmap.c > @@ -1405,10 +1405,6 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma, > struct mmu_notifier_range range; > enum ttu_flags flags = (enum ttu_flags)(long)arg; > > - /* munlock has nothing to gain from examining un-locked vmas */ > - if ((flags & TTU_MUNLOCK) && !(vma->vm_flags & VM_LOCKED)) > - return true; > - > if (IS_ENABLED(CONFIG_MIGRATION) && (flags & TTU_MIGRATION) && > is_zone_device_page(page) && !is_device_private_page(page)) > return true; > @@ -1469,8 +1465,6 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma, > page_vma_mapped_walk_done(&pvmw); > break; > } > - if (flags & TTU_MUNLOCK) > - continue; > } > > /* Unexpected PMD-mapped THP? */ > @@ -1784,20 +1778,53 @@ bool try_to_unmap(struct page *page, enum ttu_flags flags) > return !page_mapcount(page) ? true : false; > } > > +/* > + * Walks the vma's mapping a page and mlocks the page if any locked vma's are > + * found. Once one is found the page is locked and the scan can be terminated. > + */ Can you please add that this requires the mmap_sem() lock to the comments? 
> +static bool page_mlock_one(struct page *page, struct vm_area_struct *vma, > + unsigned long address, void *unused) > +{ > + struct page_vma_mapped_walk pvmw = { > + .page = page, > + .vma = vma, > + .address = address, > + }; > + > + /* An un-locked vma doesn't have any pages to lock, continue the scan */ > + if (!(vma->vm_flags & VM_LOCKED)) > + return true; > + > + while (page_vma_mapped_walk(&pvmw)) { > + /* PTE-mapped THP are never mlocked */ > + if (!PageTransCompound(page)) > + mlock_vma_page(page); > + page_vma_mapped_walk_done(&pvmw); > + > + /* > + * no need to continue scanning other vma's if the page has > + * been locked. > + */ > + return false; > + } > + > + return true; > +} > + > /** > - * try_to_munlock - try to munlock a page > - * @page: the page to be munlocked > + * page_mlock - try to mlock a page > + * @page: the page to be mlocked > * > - * Called from munlock code. Checks all of the VMAs mapping the page > - * to make sure nobody else has this page mlocked. The page will be > - * returned with PG_mlocked cleared if no other vmas have it mlocked. > + * Called from munlock code. Checks all of the VMAs mapping the page and mlocks > + * the page if any are found. The page will be returned with PG_mlocked cleared > + * if it is not mapped by any locked vmas. > + * > + * mmap_lock should be held for read or write. > */ > - > -void try_to_munlock(struct page *page) > +void page_mlock(struct page *page) > { > struct rmap_walk_control rwc = { > - .rmap_one = try_to_unmap_one, > - .arg = (void *)TTU_MUNLOCK, > + .rmap_one = page_mlock_one, > .done = page_not_mapped, > .anon_lock = page_lock_anon_vma_read, > > @@ -1849,7 +1876,7 @@ static struct anon_vma *rmap_walk_anon_lock(struct page *page, > * Find all the mappings of a page using the mapping pointer and the vma chains > * contained in the anon_vma struct it points to. > * > - * When called from try_to_munlock(), the mmap_lock of the mm containing the vma > + * When called from page_mlock(), the mmap_lock of the mm containing the vma > * where the page was found will be held for write. So, we won't recheck > * vm_flags for that VMA. That should be OK, because that vma shouldn't be > * LOCKED. > @@ -1901,7 +1928,7 @@ static void rmap_walk_anon(struct page *page, struct rmap_walk_control *rwc, > * Find all the mappings of a page using the mapping pointer and the vma chains > * contained in the address_space struct it points to. > * > - * When called from try_to_munlock(), the mmap_lock of the mm containing the vma > + * When called from page_mlock(), the mmap_lock of the mm containing the vma > * where the page was found will be held for write. So, we won't recheck > * vm_flags for that VMA. That should be OK, because that vma shouldn't be > * LOCKED. > -- > 2.20.1 > > I believe munlock_vma_pages_range() still references the old function name? Thanks, Liam
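For reference, the callback contract page_mlock_one() is written against: rmap_walk() calls rwc->rmap_one() once for every VMA that maps the page and stops the scan as soon as the callback returns false. A much-simplified sketch of the anonymous-page side of that walk (loosely modelled on rmap_walk_anon() in mm/rmap.c; locking details, the file/KSM cases, cond_resched() and error checks are omitted, and the _sketch name is made up here):

static void rmap_walk_anon_sketch(struct page *page, struct rmap_walk_control *rwc)
{
	struct anon_vma *anon_vma = rwc->anon_lock(page); /* page_lock_anon_vma_read() */
	pgoff_t pgoff_start = page_to_pgoff(page);
	pgoff_t pgoff_end = pgoff_start + thp_nr_pages(page) - 1;
	struct anon_vma_chain *avc;

	if (!anon_vma)
		return;

	anon_vma_interval_tree_foreach(avc, &anon_vma->rb_root, pgoff_start, pgoff_end) {
		/* Note: each avc->vma may belong to a different process's mm. */
		struct vm_area_struct *vma = avc->vma;
		unsigned long address = vma_address(page, vma);

		/* page_mlock_one() returns false once it has dealt with the page. */
		if (!rwc->rmap_one(page, vma, address, rwc->arg))
			break;
		/* .done (page_not_mapped) lets the walk stop early as well. */
		if (rwc->done && rwc->done(page))
			break;
	}
	anon_vma_unlock_read(anon_vma);
}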
On Tue, May 25, 2021 at 11:40 AM Liam Howlett <liam.howlett@oracle.com> wrote: > [...] > > > > +/* > > + * Walks the vma's mapping a page and mlocks the page if any locked vma's are > > + * found. Once one is found the page is locked and the scan can be terminated. > > + */ > > Can you please add that this requires the mmap_sem() lock to the > comments? > Why does this require mmap_sem() lock? Also mmap_sem() lock of which mm_struct? > > +static bool page_mlock_one(struct page *page, struct vm_area_struct *vma, > > + unsigned long address, void *unused) > > +{ > > + struct page_vma_mapped_walk pvmw = { > > + .page = page, > > + .vma = vma, > > + .address = address, > > + }; > > + > > + /* An un-locked vma doesn't have any pages to lock, continue the scan */ > > + if (!(vma->vm_flags & VM_LOCKED)) > > + return true; > > + > > + while (page_vma_mapped_walk(&pvmw)) { > > + /* PTE-mapped THP are never mlocked */ > > + if (!PageTransCompound(page)) > > + mlock_vma_page(page); > > + page_vma_mapped_walk_done(&pvmw); > > + > > + /* > > + * no need to continue scanning other vma's if the page has > > + * been locked. > > + */ > > + return false; > > + } > > + > > + return true; > > +}
* Shakeel Butt <shakeelb@google.com> [210525 19:45]: > On Tue, May 25, 2021 at 11:40 AM Liam Howlett <liam.howlett@oracle.com> wrote: > > > [...] > > > > > > +/* > > > + * Walks the vma's mapping a page and mlocks the page if any locked vma's are > > > + * found. Once one is found the page is locked and the scan can be terminated. > > > + */ > > > > Can you please add that this requires the mmap_sem() lock to the > > comments? > > > > Why does this require mmap_sem() lock? Also mmap_sem() lock of which mm_struct? Doesn't the mlock_vma_page() require the mmap_sem() for reading? The mm_struct in vma->vm_mm; From what I can see, at least the following paths have mmap_lock held for writing: munlock_vma_pages_range() from __do_munmap() munlokc_vma_pages_range() from remap_file_pages() > > > > +static bool page_mlock_one(struct page *page, struct vm_area_struct *vma, > > > + unsigned long address, void *unused) > > > +{ > > > + struct page_vma_mapped_walk pvmw = { > > > + .page = page, > > > + .vma = vma, > > > + .address = address, > > > + }; > > > + > > > + /* An un-locked vma doesn't have any pages to lock, continue the scan */ > > > + if (!(vma->vm_flags & VM_LOCKED)) > > > + return true; > > > + > > > + while (page_vma_mapped_walk(&pvmw)) { > > > + /* PTE-mapped THP are never mlocked */ > > > + if (!PageTransCompound(page)) > > > + mlock_vma_page(page); > > > + page_vma_mapped_walk_done(&pvmw); > > > + > > > + /* > > > + * no need to continue scanning other vma's if the page has > > > + * been locked. > > > + */ > > > + return false; > > > + } > > > + > > > + return true; > > > +} munlock_vma_pages_range() comments still references try_to_{munlock|unmap}
On Fri, Jun 4, 2021 at 1:49 PM Liam Howlett <liam.howlett@oracle.com> wrote: > > * Shakeel Butt <shakeelb@google.com> [210525 19:45]: > > On Tue, May 25, 2021 at 11:40 AM Liam Howlett <liam.howlett@oracle.com> wrote: > > > > > [...] > > > > > > > > +/* > > > > + * Walks the vma's mapping a page and mlocks the page if any locked vma's are > > > > + * found. Once one is found the page is locked and the scan can be terminated. > > > > + */ > > > > > > Can you please add that this requires the mmap_sem() lock to the > > > comments? > > > > > > > Why does this require mmap_sem() lock? Also mmap_sem() lock of which mm_struct? > > > Doesn't the mlock_vma_page() require the mmap_sem() for reading? The > mm_struct in vma->vm_mm; > We are traversing all the vmas where this page is mapped of possibly different mm_structs. I don't think we want to take mmap_sem() of all those mm_structs. The commit b87537d9e2fe ("mm: rmap use pte lock not mmap_sem to set PageMlocked") removed exactly that. > > From what I can see, at least the following paths have mmap_lock held > for writing: > > munlock_vma_pages_range() from __do_munmap() > munlokc_vma_pages_range() from remap_file_pages() > The following path does not hold mmap_sem: exit_mmap() -> munlock_vma_pages_all() -> munlock_vma_pages_range(). I would really suggest all to carefully read the commit message of b87537d9e2fe ("mm: rmap use pte lock not mmap_sem to set PageMlocked"). Particularly the following paragraph: ... Vlastimil Babka points out another race which this patch protects against. try_to_unmap_one() might reach its mlock_vma_page() TestSetPageMlocked a moment after munlock_vma_pages_all() did its Phase 1 TestClearPageMlocked: leaving PageMlocked and unevictable when it should be evictable. mmap_sem is ineffective because exit_mmap() does not hold it; page lock ineffective because __munlock_pagevec() only takes it afterwards, in Phase 2; pte lock is effective because __munlock_pagevec_fill() takes it to get the page, after VM_LOCKED was cleared from vm_flags, so visible to try_to_unmap_one. ... Alistair, please bring back the VM_LOCKED check with pte lock held and the comment "Holding pte lock, we do *not* need mmap_lock here". One positive outcome of this cleanup patch is the removal of unnecessary invalidation (unmapping for kvm case) of secondary mmus.
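To make the window described above concrete, here is a rough interleaving pieced together from the quoted commit message (illustrative only; the munlock side is compressed into a few steps):

/*
 * exit_mmap() -> munlock_vma_pages_all()       rmap walk calling mlock_vma_page()
 * (no mmap_lock held)                          (try_to_unmap_one / page_mlock_one)
 *
 * vma->vm_flags &= ~VM_LOCKED
 * __munlock_pagevec_fill() gathers the page
 *   under the pte lock
 * Phase 1: TestClearPageMlocked(page)
 *                                              checks vma->vm_flags with no lock
 *                                              ordering it against the clear, so it
 *                                              can still observe VM_LOCKED and call
 *                                              mlock_vma_page() ->
 *                                              TestSetPageMlocked(page)
 * Phase 2: takes the page lock, but the
 *   damage is already done
 *
 * Result: PG_mlocked is left set on a page whose last VM_LOCKED mapping is gone,
 * so it stays unevictable.  Rechecking VM_LOCKED while page_vma_mapped_walk()
 * holds the pte lock closes the window, because the munlock side takes that same
 * pte lock only after clearing VM_LOCKED from vm_flags.
 */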
* Shakeel Butt <shakeelb@google.com> [210604 20:41]: > On Fri, Jun 4, 2021 at 1:49 PM Liam Howlett <liam.howlett@oracle.com> wrote: > > > > * Shakeel Butt <shakeelb@google.com> [210525 19:45]: > > > On Tue, May 25, 2021 at 11:40 AM Liam Howlett <liam.howlett@oracle.com> wrote: > > > > > > > [...] > > > > > > > > > > +/* > > > > > + * Walks the vma's mapping a page and mlocks the page if any locked vma's are > > > > > + * found. Once one is found the page is locked and the scan can be terminated. > > > > > + */ > > > > > > > > Can you please add that this requires the mmap_sem() lock to the > > > > comments? > > > > > > > > > > Why does this require mmap_sem() lock? Also mmap_sem() lock of which mm_struct? > > > > > > Doesn't the mlock_vma_page() require the mmap_sem() for reading? The > > mm_struct in vma->vm_mm; > > > > We are traversing all the vmas where this page is mapped of possibly > different mm_structs. I don't think we want to take mmap_sem() of all > those mm_structs. The commit b87537d9e2fe ("mm: rmap use pte lock not > mmap_sem to set PageMlocked") removed exactly that. > > > > > From what I can see, at least the following paths have mmap_lock held > > for writing: > > > > munlock_vma_pages_range() from __do_munmap() > > munlokc_vma_pages_range() from remap_file_pages() > > > > The following path does not hold mmap_sem: > > exit_mmap() -> munlock_vma_pages_all() -> munlock_vma_pages_range(). Isn't this the benign race referenced by Hugh in the commit you point to below? > > I would really suggest all to carefully read the commit message of > b87537d9e2fe ("mm: rmap use pte lock not mmap_sem to set > PageMlocked"). > > Particularly the following paragraph: > ... > Vlastimil Babka points out another race which this patch protects against. > try_to_unmap_one() might reach its mlock_vma_page() TestSetPageMlocked a > moment after munlock_vma_pages_all() did its Phase 1 TestClearPageMlocked: > leaving PageMlocked and unevictable when it should be evictable. mmap_sem > is ineffective because exit_mmap() does not hold it; page lock ineffective > because __munlock_pagevec() only takes it afterwards, in Phase 2; pte lock > is effective because __munlock_pagevec_fill() takes it to get the page, > after VM_LOCKED was cleared from vm_flags, so visible to try_to_unmap_one. > ... So this is saying the race with exit_mmap() isn't benign after all? > > Alistair, please bring back the VM_LOCKED check with pte lock held and > the comment "Holding pte lock, we do *not* need mmap_lock here". > > One positive outcome of this cleanup patch is the removal of > unnecessary invalidation (unmapping for kvm case) of secondary mmus.
On Fri, Jun 4, 2021 at 8:39 PM Liam Howlett <liam.howlett@oracle.com> wrote: > > > Particularly the following paragraph: > > ... > > Vlastimil Babka points out another race which this patch protects against. > > try_to_unmap_one() might reach its mlock_vma_page() TestSetPageMlocked a > > moment after munlock_vma_pages_all() did its Phase 1 TestClearPageMlocked: > > leaving PageMlocked and unevictable when it should be evictable. mmap_sem > > is ineffective because exit_mmap() does not hold it; page lock ineffective > > because __munlock_pagevec() only takes it afterwards, in Phase 2; pte lock > > is effective because __munlock_pagevec_fill() takes it to get the page, > > after VM_LOCKED was cleared from vm_flags, so visible to try_to_unmap_one. > > ... > > So this is saying the race with exit_mmap() isn't benign after all? > Yes, not benign at all.
On Saturday, 5 June 2021 10:41:03 AM AEST Shakeel Butt wrote: > External email: Use caution opening links or attachments > > > On Fri, Jun 4, 2021 at 1:49 PM Liam Howlett <liam.howlett@oracle.com> wrote: > > > > * Shakeel Butt <shakeelb@google.com> [210525 19:45]: > > > On Tue, May 25, 2021 at 11:40 AM Liam Howlett <liam.howlett@oracle.com> wrote: > > > > > > > [...] > > > > > > > > > > +/* > > > > > + * Walks the vma's mapping a page and mlocks the page if any locked vma's are > > > > > + * found. Once one is found the page is locked and the scan can be terminated. > > > > > + */ > > > > > > > > Can you please add that this requires the mmap_sem() lock to the > > > > comments? > > > > > > > > > > Why does this require mmap_sem() lock? Also mmap_sem() lock of which mm_struct? > > > > > > Doesn't the mlock_vma_page() require the mmap_sem() for reading? The > > mm_struct in vma->vm_mm; > > > > We are traversing all the vmas where this page is mapped of possibly > different mm_structs. I don't think we want to take mmap_sem() of all > those mm_structs. The commit b87537d9e2fe ("mm: rmap use pte lock not > mmap_sem to set PageMlocked") removed exactly that. > > > > > From what I can see, at least the following paths have mmap_lock held > > for writing: > > > > munlock_vma_pages_range() from __do_munmap() > > munlokc_vma_pages_range() from remap_file_pages() > > > > The following path does not hold mmap_sem: > > exit_mmap() -> munlock_vma_pages_all() -> munlock_vma_pages_range(). > > I would really suggest all to carefully read the commit message of > b87537d9e2fe ("mm: rmap use pte lock not mmap_sem to set > PageMlocked"). > > Particularly the following paragraph: > ... > Vlastimil Babka points out another race which this patch protects against. > try_to_unmap_one() might reach its mlock_vma_page() TestSetPageMlocked a > moment after munlock_vma_pages_all() did its Phase 1 TestClearPageMlocked: > leaving PageMlocked and unevictable when it should be evictable. mmap_sem > is ineffective because exit_mmap() does not hold it; page lock ineffective > because __munlock_pagevec() only takes it afterwards, in Phase 2; pte lock > is effective because __munlock_pagevec_fill() takes it to get the page, > after VM_LOCKED was cleared from vm_flags, so visible to try_to_unmap_one. > ... > > Alistair, please bring back the VM_LOCKED check with pte lock held and > the comment "Holding pte lock, we do *not* need mmap_lock here". Actually thanks for highlighting that paragraph. I have gone back through the code again in munlock_vma_pages_range() and think I have a better understanding of it now. So now I agree - the check of VM_LOCKED under the PTL is important to ensure mlock_vma_page() does not run after VM_LOCKED has been cleared and __munlock_pagevec_fill() has run. Will post v10 to fix this and the try_to_munlock reference pointed out by Liam which I missed for v9. Thanks Shakeel for taking the time to point this out. > One positive outcome of this cleanup patch is the removal of > unnecessary invalidation (unmapping for kvm case) of secondary mmus.
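For readers following along, a sketch of what moving the VM_LOCKED check under the pte lock could look like (based on this discussion, not on the actual v10 posting; comments and structure are illustrative):

static bool page_mlock_one(struct page *page, struct vm_area_struct *vma,
			   unsigned long address, void *unused)
{
	struct page_vma_mapped_walk pvmw = {
		.page = page,
		.vma = vma,
		.address = address,
	};

	/* Cheap early exit; the authoritative check is redone under the pte lock. */
	if (!(vma->vm_flags & VM_LOCKED))
		return true;

	while (page_vma_mapped_walk(&pvmw)) {
		/*
		 * Recheck now that the pte lock is held: munlock clears
		 * VM_LOCKED before __munlock_pagevec_fill() takes this lock,
		 * so a racing munlock is visible here and we must not re-set
		 * PG_mlocked behind its back.
		 */
		if (vma->vm_flags & VM_LOCKED) {
			/* PTE-mapped THP are never mlocked */
			if (!PageTransCompound(page))
				mlock_vma_page(page);
		}
		page_vma_mapped_walk_done(&pvmw);

		/* No need to scan other vmas once this mapping has been handled. */
		return false;
	}

	return true;
}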
diff --git a/Documentation/vm/unevictable-lru.rst b/Documentation/vm/unevictable-lru.rst index 0e1490524f53..eae3af17f2d9 100644 --- a/Documentation/vm/unevictable-lru.rst +++ b/Documentation/vm/unevictable-lru.rst @@ -389,14 +389,14 @@ mlocked, munlock_vma_page() updates that zone statistics for the number of mlocked pages. Note, however, that at this point we haven't checked whether the page is mapped by other VM_LOCKED VMAs. -We can't call try_to_munlock(), the function that walks the reverse map to +We can't call page_mlock(), the function that walks the reverse map to check for other VM_LOCKED VMAs, without first isolating the page from the LRU. -try_to_munlock() is a variant of try_to_unmap() and thus requires that the page +page_mlock() is a variant of try_to_unmap() and thus requires that the page not be on an LRU list [more on these below]. However, the call to -isolate_lru_page() could fail, in which case we couldn't try_to_munlock(). So, +isolate_lru_page() could fail, in which case we can't call page_mlock(). So, we go ahead and clear PG_mlocked up front, as this might be the only chance we -have. If we can successfully isolate the page, we go ahead and -try_to_munlock(), which will restore the PG_mlocked flag and update the zone +have. If we can successfully isolate the page, we go ahead and call +page_mlock(), which will restore the PG_mlocked flag and update the zone page statistics if it finds another VMA holding the page mlocked. If we fail to isolate the page, we'll have left a potentially mlocked page on the LRU. This is fine, because we'll catch it later if and if vmscan tries to reclaim @@ -545,31 +545,24 @@ munlock or munmap system calls, mm teardown (munlock_vma_pages_all), reclaim, holepunching, and truncation of file pages and their anonymous COWed pages. -try_to_munlock() Reverse Map Scan +page_mlock() Reverse Map Scan --------------------------------- -.. warning:: - [!] TODO/FIXME: a better name might be page_mlocked() - analogous to the - page_referenced() reverse map walker. - When munlock_vma_page() [see section :ref:`munlock()/munlockall() System Call Handling <munlock_munlockall_handling>` above] tries to munlock a page, it needs to determine whether or not the page is mapped by any VM_LOCKED VMA without actually attempting to unmap all PTEs from the page. For this purpose, the unevictable/mlock infrastructure -introduced a variant of try_to_unmap() called try_to_munlock(). +introduced a variant of try_to_unmap() called page_mlock(). -try_to_munlock() calls the same functions as try_to_unmap() for anonymous and -mapped file and KSM pages with a flag argument specifying unlock versus unmap -processing. Again, these functions walk the respective reverse maps looking -for VM_LOCKED VMAs. When such a VMA is found, as in the try_to_unmap() case, -the functions mlock the page via mlock_vma_page() and return SWAP_MLOCK. This -undoes the pre-clearing of the page's PG_mlocked done by munlock_vma_page. +page_mlock() walks the respective reverse maps looking for VM_LOCKED VMAs. When +such a VMA is found the page is mlocked via mlock_vma_page(). This undoes the +pre-clearing of the page's PG_mlocked done by munlock_vma_page. -Note that try_to_munlock()'s reverse map walk must visit every VMA in a page's +Note that page_mlock()'s reverse map walk must visit every VMA in a page's reverse map to determine that a page is NOT mapped into any VM_LOCKED VMA. However, the scan can terminate when it encounters a VM_LOCKED VMA. 
-Although try_to_munlock() might be called a great many times when munlocking a +Although page_mlock() might be called a great many times when munlocking a large region or tearing down a large address space that has been mlocked via mlockall(), overall this is a fairly rare event. @@ -602,7 +595,7 @@ inactive lists to the appropriate node's unevictable list. shrink_inactive_list() should only see SHM_LOCK'd pages that became SHM_LOCK'd after shrink_active_list() had moved them to the inactive list, or pages mapped into VM_LOCKED VMAs that munlock_vma_page() couldn't isolate from the LRU to -recheck via try_to_munlock(). shrink_inactive_list() won't notice the latter, +recheck via page_mlock(). shrink_inactive_list() won't notice the latter, but will pass on to shrink_page_list(). shrink_page_list() again culls obviously unevictable pages that it could diff --git a/include/linux/rmap.h b/include/linux/rmap.h index def5c62c93b3..38a746787c2f 100644 --- a/include/linux/rmap.h +++ b/include/linux/rmap.h @@ -87,7 +87,6 @@ struct anon_vma_chain { enum ttu_flags { TTU_MIGRATION = 0x1, /* migration mode */ - TTU_MUNLOCK = 0x2, /* munlock mode */ TTU_SPLIT_HUGE_PMD = 0x4, /* split huge PMD if any */ TTU_IGNORE_MLOCK = 0x8, /* ignore mlock */ @@ -239,7 +238,7 @@ int page_mkclean(struct page *); * called in munlock()/munmap() path to check for other vmas holding * the page mlocked. */ -void try_to_munlock(struct page *); +void page_mlock(struct page *page); void remove_migration_ptes(struct page *old, struct page *new, bool locked); diff --git a/mm/mlock.c b/mm/mlock.c index df590fda5688..a518d4c48e65 100644 --- a/mm/mlock.c +++ b/mm/mlock.c @@ -108,7 +108,7 @@ void mlock_vma_page(struct page *page) /* * Finish munlock after successful page isolation * - * Page must be locked. This is a wrapper for try_to_munlock() + * Page must be locked. This is a wrapper for page_mlock() * and putback_lru_page() with munlock accounting. */ static void __munlock_isolated_page(struct page *page) @@ -118,7 +118,7 @@ static void __munlock_isolated_page(struct page *page) * and we don't need to check all the other vmas. */ if (page_mapcount(page) > 1) - try_to_munlock(page); + page_mlock(page); /* Did try_to_unlock() succeed or punt? */ if (!PageMlocked(page)) @@ -158,7 +158,7 @@ static void __munlock_isolation_failed(struct page *page) * munlock()ed or munmap()ed, we want to check whether other vmas hold the * page locked so that we can leave it on the unevictable lru list and not * bother vmscan with it. However, to walk the page's rmap list in - * try_to_munlock() we must isolate the page from the LRU. If some other + * page_mlock() we must isolate the page from the LRU. If some other * task has removed the page from the LRU, we won't be able to do that. * So we clear the PageMlocked as we might not get another chance. If we * can't isolate the page, we leave it for putback_lru_page() and vmscan @@ -168,7 +168,7 @@ unsigned int munlock_vma_page(struct page *page) { int nr_pages; - /* For try_to_munlock() and to serialize with page migration */ + /* For page_mlock() and to serialize with page migration */ BUG_ON(!PageLocked(page)); VM_BUG_ON_PAGE(PageTail(page), page); @@ -205,7 +205,7 @@ static int __mlock_posix_error_return(long retval) * * The fast path is available only for evictable pages with single mapping. * Then we can bypass the per-cpu pvec and get better performance. - * when mapcount > 1 we need try_to_munlock() which can fail. + * when mapcount > 1 we need page_mlock() which can fail. 
* when !page_evictable(), we need the full redo logic of putback_lru_page to * avoid leaving evictable page in unevictable list. * diff --git a/mm/rmap.c b/mm/rmap.c index bc08c4d4b58a..e88966903e1e 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -1405,10 +1405,6 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma, struct mmu_notifier_range range; enum ttu_flags flags = (enum ttu_flags)(long)arg; - /* munlock has nothing to gain from examining un-locked vmas */ - if ((flags & TTU_MUNLOCK) && !(vma->vm_flags & VM_LOCKED)) - return true; - if (IS_ENABLED(CONFIG_MIGRATION) && (flags & TTU_MIGRATION) && is_zone_device_page(page) && !is_device_private_page(page)) return true; @@ -1469,8 +1465,6 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma, page_vma_mapped_walk_done(&pvmw); break; } - if (flags & TTU_MUNLOCK) - continue; } /* Unexpected PMD-mapped THP? */ @@ -1784,20 +1778,53 @@ bool try_to_unmap(struct page *page, enum ttu_flags flags) return !page_mapcount(page) ? true : false; } +/* + * Walks the vma's mapping a page and mlocks the page if any locked vma's are + * found. Once one is found the page is locked and the scan can be terminated. + */ +static bool page_mlock_one(struct page *page, struct vm_area_struct *vma, + unsigned long address, void *unused) +{ + struct page_vma_mapped_walk pvmw = { + .page = page, + .vma = vma, + .address = address, + }; + + /* An un-locked vma doesn't have any pages to lock, continue the scan */ + if (!(vma->vm_flags & VM_LOCKED)) + return true; + + while (page_vma_mapped_walk(&pvmw)) { + /* PTE-mapped THP are never mlocked */ + if (!PageTransCompound(page)) + mlock_vma_page(page); + page_vma_mapped_walk_done(&pvmw); + + /* + * no need to continue scanning other vma's if the page has + * been locked. + */ + return false; + } + + return true; +} + /** - * try_to_munlock - try to munlock a page - * @page: the page to be munlocked + * page_mlock - try to mlock a page + * @page: the page to be mlocked * - * Called from munlock code. Checks all of the VMAs mapping the page - * to make sure nobody else has this page mlocked. The page will be - * returned with PG_mlocked cleared if no other vmas have it mlocked. + * Called from munlock code. Checks all of the VMAs mapping the page and mlocks + * the page if any are found. The page will be returned with PG_mlocked cleared + * if it is not mapped by any locked vmas. + * + * mmap_lock should be held for read or write. */ - -void try_to_munlock(struct page *page) +void page_mlock(struct page *page) { struct rmap_walk_control rwc = { - .rmap_one = try_to_unmap_one, - .arg = (void *)TTU_MUNLOCK, + .rmap_one = page_mlock_one, .done = page_not_mapped, .anon_lock = page_lock_anon_vma_read, @@ -1849,7 +1876,7 @@ static struct anon_vma *rmap_walk_anon_lock(struct page *page, * Find all the mappings of a page using the mapping pointer and the vma chains * contained in the anon_vma struct it points to. * - * When called from try_to_munlock(), the mmap_lock of the mm containing the vma + * When called from page_mlock(), the mmap_lock of the mm containing the vma * where the page was found will be held for write. So, we won't recheck * vm_flags for that VMA. That should be OK, because that vma shouldn't be * LOCKED. @@ -1901,7 +1928,7 @@ static void rmap_walk_anon(struct page *page, struct rmap_walk_control *rwc, * Find all the mappings of a page using the mapping pointer and the vma chains * contained in the address_space struct it points to. 
* - * When called from try_to_munlock(), the mmap_lock of the mm containing the vma + * When called from page_mlock(), the mmap_lock of the mm containing the vma * where the page was found will be held for write. So, we won't recheck * vm_flags for that VMA. That should be OK, because that vma shouldn't be * LOCKED.
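For completeness, the caller-side flow that the unevictable-lru.rst hunk above describes, reduced to a sketch (loosely following munlock_vma_page() and __munlock_isolated_page() in mm/mlock.c; THP handling, zone statistics and vm event counters are omitted, and the _sketch name is made up):

static void munlock_vma_page_sketch(struct page *page)
{
	/* Clear PG_mlocked up front - this may be the only chance we get. */
	if (!TestClearPageMlocked(page))
		return;

	/* isolate_lru_page() returns non-zero on failure. */
	if (isolate_lru_page(page))
		return;	/* couldn't isolate: leave the page for vmscan to recheck */

	/*
	 * Only walk the rmap if another mapping could still hold the page
	 * mlocked; page_mlock() re-sets PG_mlocked if it finds one.
	 */
	if (page_mapcount(page) > 1)
		page_mlock(page);

	putback_lru_page(page);
}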