[133/192] mm/rmap: split migration into its own function

Message ID	20210701015416.0t4MkxtDR%akpm@linux-foundation.org (mailing list archive)
State	New

Headers

DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 23EB361241
Date: Wed, 30 Jun 2021 18:54:16 -0700
From: Andrew Morton <akpm@linux-foundation.org>
To: akpm@linux-foundation.org, apopple@nvidia.com, bskeggs@redhat.com,
 hch@lst.de, hughd@google.com, jgg@nvidia.com, jhubbard@nvidia.com,
 linux-mm@kvack.org, mm-commits@vger.kernel.org, peterx@redhat.com,
 rcampbell@nvidia.com, shakeelb@google.com,
 torvalds@linux-foundation.org, willy@infradead.org
Subject: [patch 133/192] mm/rmap: split migration into its own
 function
Message-ID: <20210701015416.0t4MkxtDR%akpm@linux-foundation.org>
In-Reply-To: <20210630184624.9ca1937310b0dd5ce66b30e7@linux-foundation.org>
User-Agent: s-nail v14.8.16
Sender: owner-linux-mm@kvack.org
Precedence: bulk

Commit Message

Andrew Morton July 1, 2021, 1:54 a.m. UTC

From: Alistair Popple <apopple@nvidia.com>
Subject: mm/rmap: split migration into its own function

Migration is currently implemented as a mode of operation for
try_to_unmap_one(), generally selected by passing the TTU_MIGRATION flag
or, in the case of splitting a huge anonymous page, TTU_SPLIT_FREEZE.

However, it has little in common with the rest of the unmap
functionality of try_to_unmap_one(), so splitting it into a separate
function reduces the complexity of try_to_unmap_one() and makes it more
readable.

Several simplifications can also be made in try_to_migrate_one() based on
the following observations:

 - All users of TTU_MIGRATION also set TTU_IGNORE_MLOCK.
 - No users of TTU_MIGRATION ever set TTU_IGNORE_HWPOISON.
 - No users of TTU_MIGRATION ever set TTU_BATCH_FLUSH.
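
Taken together, these observations mean the migration entry point only
needs to accept TTU_RMAP_LOCKED, TTU_SPLIT_HUGE_PMD and TTU_SYNC.  The
following is a minimal userspace sketch of that flag check, not kernel
code: the flag values are copied from the enum ttu_flags hunk in this
patch, while migrate_flags_ok() and migrate_flags_selfcheck() are
hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Flag values as they appear in the enum ttu_flags hunk of this patch. */
enum ttu_flags {
	TTU_SPLIT_HUGE_PMD	= 0x4,	/* split huge PMD if any */
	TTU_IGNORE_MLOCK	= 0x8,	/* ignore mlock */
	TTU_SYNC		= 0x10,	/* avoid racy checks with PVMW_SYNC */
	TTU_RMAP_LOCKED		= 0x80,	/* caller holds the rmap lock */
};

/*
 * Hypothetical helper: returns true when @flags contains only the bits
 * that the migration path is described as supporting.
 */
static bool migrate_flags_ok(unsigned int flags)
{
	const unsigned int allowed = TTU_RMAP_LOCKED | TTU_SPLIT_HUGE_PMD |
				     TTU_SYNC;

	return (flags & ~allowed) == 0;
}

/* Hypothetical usage example: which flag sets pass the check. */
static void migrate_flags_selfcheck(void)
{
	assert(migrate_flags_ok(0));
	assert(migrate_flags_ok(TTU_RMAP_LOCKED | TTU_SYNC));
	assert(!migrate_flags_ok(TTU_IGNORE_MLOCK));
}
```

In the patch itself the equivalent test is wrapped in WARN_ON_ONCE() at
the top of try_to_migrate(), so an unsupported flag causes an early
return rather than a silent change in behaviour.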

TTU_SPLIT_FREEZE is a special case of migration used when splitting an
anonymous page.  This is most easily dealt with by calling the correct
function from unmap_page() in mm/huge_memory.c - either try_to_migrate()
for PageAnon or try_to_unmap().
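
The choice made in unmap_page() can be modelled in userspace roughly as
below.  This is only a sketch under stated assumptions: the flag values
come from the enum ttu_flags hunk in this patch, whereas struct
unmap_choice, enum unmap_path, choose_unmap_path() and
choose_unmap_path_selfcheck() are hypothetical names that do not exist
in the kernel.

```c
#include <assert.h>
#include <stdbool.h>

/* Flag values as they appear in the enum ttu_flags hunk of this patch. */
enum ttu_flags {
	TTU_SPLIT_HUGE_PMD	= 0x4,
	TTU_IGNORE_MLOCK	= 0x8,
	TTU_SYNC		= 0x10,
	TTU_RMAP_LOCKED		= 0x80,
};

/* Hypothetical: which rmap entry point the caller would use. */
enum unmap_path {
	PATH_TRY_TO_MIGRATE,
	PATH_TRY_TO_UNMAP,
};

/* Hypothetical: the chosen entry point plus the flags passed to it. */
struct unmap_choice {
	enum unmap_path path;
	unsigned int flags;
};

/*
 * Hypothetical model of the branch in unmap_page(): anonymous pages get
 * migration entries, file pages are simply unmapped with mlock ignored.
 */
static struct unmap_choice choose_unmap_path(bool page_is_anon)
{
	const unsigned int base = TTU_RMAP_LOCKED | TTU_SPLIT_HUGE_PMD |
				  TTU_SYNC;
	struct unmap_choice choice;

	if (page_is_anon) {
		choice.path = PATH_TRY_TO_MIGRATE;
		choice.flags = base;
	} else {
		choice.path = PATH_TRY_TO_UNMAP;
		choice.flags = base | TTU_IGNORE_MLOCK;
	}
	return choice;
}

/* Hypothetical usage example covering both branches. */
static void choose_unmap_path_selfcheck(void)
{
	assert(choose_unmap_path(true).path == PATH_TRY_TO_MIGRATE);
	assert(choose_unmap_path(false).path == PATH_TRY_TO_UNMAP);
}
```

TTU_IGNORE_MLOCK is only added on the try_to_unmap() branch because the
migration path always ignores mlock and rejects that flag.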

Link: https://lkml.kernel.org/r/20210616105937.23201-5-apopple@nvidia.com
Signed-off-by: Alistair Popple <apopple@nvidia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ralph Campbell <rcampbell@nvidia.com>
Cc: Ben Skeggs <bskeggs@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: Peter Xu <peterx@redhat.com>
Cc: Shakeel Butt <shakeelb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 include/linux/rmap.h |    4 
 mm/huge_memory.c     |   16 +
 mm/migrate.c         |    9 -
 mm/rmap.c            |  367 ++++++++++++++++++++++++++++++-----------
 4 files changed, 289 insertions(+), 107 deletions(-)

--- a/include/linux/rmap.h~mm-rmap-split-migration-into-its-own-function
+++ a/include/linux/rmap.h
@@ -86,8 +86,6 @@  struct anon_vma_chain {
 };
 
 enum ttu_flags {
-	TTU_MIGRATION		= 0x1,	/* migration mode */
-
 	TTU_SPLIT_HUGE_PMD	= 0x4,	/* split huge PMD if any */
 	TTU_IGNORE_MLOCK	= 0x8,	/* ignore mlock */
 	TTU_SYNC		= 0x10,	/* avoid racy checks with PVMW_SYNC */
@@ -97,7 +95,6 @@  enum ttu_flags {
 					 * do a final flush if necessary */
 	TTU_RMAP_LOCKED		= 0x80,	/* do not grab rmap lock:
 					 * caller holds it */
-	TTU_SPLIT_FREEZE	= 0x100,		/* freeze pte under splitting thp */
 };
 
 #ifdef CONFIG_MMU
@@ -194,6 +191,7 @@  static inline void page_dup_rmap(struct
 int page_referenced(struct page *, int is_locked,
 			struct mem_cgroup *memcg, unsigned long *vm_flags);
 
+void try_to_migrate(struct page *page, enum ttu_flags flags);
 void try_to_unmap(struct page *, enum ttu_flags flags);
 
 /* Avoid racy checks */
--- a/mm/huge_memory.c~mm-rmap-split-migration-into-its-own-function
+++ a/mm/huge_memory.c
@@ -2309,16 +2309,20 @@  void vma_adjust_trans_huge(struct vm_are
 
 static void unmap_page(struct page *page)
 {
-	enum ttu_flags ttu_flags = TTU_IGNORE_MLOCK | TTU_SYNC |
-		TTU_RMAP_LOCKED | TTU_SPLIT_HUGE_PMD;
+	enum ttu_flags ttu_flags = TTU_RMAP_LOCKED | TTU_SPLIT_HUGE_PMD |
+		TTU_SYNC;
 
 	VM_BUG_ON_PAGE(!PageHead(page), page);
 
-	/* If TTU_SPLIT_FREEZE is ever extended to file, update remap_page() */
+	/*
+	 * Anon pages need migration entries to preserve them, but file
+	 * pages can simply be left unmapped, then faulted back on demand.
+	 * If that is ever changed (perhaps for mlock), update remap_page().
+	 */
 	if (PageAnon(page))
-		ttu_flags |= TTU_SPLIT_FREEZE;
-
-	try_to_unmap(page, ttu_flags);
+		try_to_migrate(page, ttu_flags);
+	else
+		try_to_unmap(page, ttu_flags | TTU_IGNORE_MLOCK);
 
 	VM_WARN_ON_ONCE_PAGE(page_mapped(page), page);
 }
--- a/mm/migrate.c~mm-rmap-split-migration-into-its-own-function
+++ a/mm/migrate.c
@@ -1109,7 +1109,7 @@  static int __unmap_and_move(struct page
 		/* Establish migration ptes */
 		VM_BUG_ON_PAGE(PageAnon(page) && !PageKsm(page) && !anon_vma,
 				page);
-		try_to_unmap(page, TTU_MIGRATION|TTU_IGNORE_MLOCK);
+		try_to_migrate(page, 0);
 		page_was_mapped = 1;
 	}
 
@@ -1311,7 +1311,7 @@  static int unmap_and_move_huge_page(new_
 
 	if (page_mapped(hpage)) {
 		bool mapping_locked = false;
-		enum ttu_flags ttu = TTU_MIGRATION|TTU_IGNORE_MLOCK;
+		enum ttu_flags ttu = 0;
 
 		if (!PageAnon(hpage)) {
 			/*
@@ -1328,7 +1328,7 @@  static int unmap_and_move_huge_page(new_
 			ttu |= TTU_RMAP_LOCKED;
 		}
 
-		try_to_unmap(hpage, ttu);
+		try_to_migrate(hpage, ttu);
 		page_was_mapped = 1;
 
 		if (mapping_locked)
@@ -2602,7 +2602,6 @@  static void migrate_vma_prepare(struct m
  */
 static void migrate_vma_unmap(struct migrate_vma *migrate)
 {
-	int flags = TTU_MIGRATION | TTU_IGNORE_MLOCK;
 	const unsigned long npages = migrate->npages;
 	const unsigned long start = migrate->start;
 	unsigned long addr, i, restore = 0;
@@ -2614,7 +2613,7 @@  static void migrate_vma_unmap(struct mig
 			continue;
 
 		if (page_mapped(page)) {
-			try_to_unmap(page, flags);
+			try_to_migrate(page, 0);
 			if (page_mapped(page))
 				goto restore;
 		}
--- a/mm/rmap.c~mm-rmap-split-migration-into-its-own-function
+++ a/mm/rmap.c
@@ -1411,14 +1411,8 @@  static bool try_to_unmap_one(struct page
 	if (flags & TTU_SYNC)
 		pvmw.flags = PVMW_SYNC;
 
-	if (IS_ENABLED(CONFIG_MIGRATION) && (flags & TTU_MIGRATION) &&
-	    is_zone_device_page(page) && !is_device_private_page(page))
-		return true;
-
-	if (flags & TTU_SPLIT_HUGE_PMD) {
-		split_huge_pmd_address(vma, address,
-				flags & TTU_SPLIT_FREEZE, page);
-	}
+	if (flags & TTU_SPLIT_HUGE_PMD)
+		split_huge_pmd_address(vma, address, false, page);
 
 	/*
 	 * For THP, we have to assume the worse case ie pmd for invalidation.
@@ -1443,16 +1437,6 @@  static bool try_to_unmap_one(struct page
 	mmu_notifier_invalidate_range_start(&range);
 
 	while (page_vma_mapped_walk(&pvmw)) {
-#ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION
-		/* PMD-mapped THP migration entry */
-		if (!pvmw.pte && (flags & TTU_MIGRATION)) {
-			VM_BUG_ON_PAGE(PageHuge(page) || !PageTransCompound(page), page);
-
-			set_pmd_migration_entry(&pvmw, page);
-			continue;
-		}
-#endif
-
 		/*
 		 * If the page is mlock()d, we cannot swap it out.
 		 * If it's recently referenced (perhaps page_referenced
@@ -1514,46 +1498,6 @@  static bool try_to_unmap_one(struct page
 			}
 		}
 
-		if (IS_ENABLED(CONFIG_MIGRATION) &&
-		    (flags & TTU_MIGRATION) &&
-		    is_zone_device_page(page)) {
-			swp_entry_t entry;
-			pte_t swp_pte;
-
-			pteval = ptep_get_and_clear(mm, pvmw.address, pvmw.pte);
-
-			/*
-			 * Store the pfn of the page in a special migration
-			 * pte. do_swap_page() will wait until the migration
-			 * pte is removed and then restart fault handling.
-			 */
-			entry = make_readable_migration_entry(page_to_pfn(page));
-			swp_pte = swp_entry_to_pte(entry);
-
-			/*
-			 * pteval maps a zone device page and is therefore
-			 * a swap pte.
-			 */
-			if (pte_swp_soft_dirty(pteval))
-				swp_pte = pte_swp_mksoft_dirty(swp_pte);
-			if (pte_swp_uffd_wp(pteval))
-				swp_pte = pte_swp_mkuffd_wp(swp_pte);
-			set_pte_at(mm, pvmw.address, pvmw.pte, swp_pte);
-			/*
-			 * No need to invalidate here it will synchronize on
-			 * against the special swap migration pte.
-			 *
-			 * The assignment to subpage above was computed from a
-			 * swap PTE which results in an invalid pointer.
-			 * Since only PAGE_SIZE pages can currently be
-			 * migrated, just set it to page. This will need to be
-			 * changed when hugepage migrations to device private
-			 * memory are supported.
-			 */
-			subpage = page;
-			goto discard;
-		}
-
 		/* Nuke the page table entry. */
 		flush_cache_page(vma, address, pte_pfn(*pvmw.pte));
 		if (should_defer_flush(mm, flags)) {
@@ -1606,39 +1550,6 @@  static bool try_to_unmap_one(struct page
 			/* We have to invalidate as we cleared the pte */
 			mmu_notifier_invalidate_range(mm, address,
 						      address + PAGE_SIZE);
-		} else if (IS_ENABLED(CONFIG_MIGRATION) &&
-				(flags & (TTU_MIGRATION|TTU_SPLIT_FREEZE))) {
-			swp_entry_t entry;
-			pte_t swp_pte;
-
-			if (arch_unmap_one(mm, vma, address, pteval) < 0) {
-				set_pte_at(mm, address, pvmw.pte, pteval);
-				ret = false;
-				page_vma_mapped_walk_done(&pvmw);
-				break;
-			}
-
-			/*
-			 * Store the pfn of the page in a special migration
-			 * pte. do_swap_page() will wait until the migration
-			 * pte is removed and then restart fault handling.
-			 */
-			if (pte_write(pteval))
-				entry = make_writable_migration_entry(
-							page_to_pfn(subpage));
-			else
-				entry = make_readable_migration_entry(
-							page_to_pfn(subpage));
-			swp_pte = swp_entry_to_pte(entry);
-			if (pte_soft_dirty(pteval))
-				swp_pte = pte_swp_mksoft_dirty(swp_pte);
-			if (pte_uffd_wp(pteval))
-				swp_pte = pte_swp_mkuffd_wp(swp_pte);
-			set_pte_at(mm, address, pvmw.pte, swp_pte);
-			/*
-			 * No need to invalidate here it will synchronize on
-			 * against the special swap migration pte.
-			 */
 		} else if (PageAnon(page)) {
 			swp_entry_t entry = { .val = page_private(subpage) };
 			pte_t swp_pte;
@@ -1766,6 +1677,277 @@  void try_to_unmap(struct page *page, enu
 		.anon_lock = page_lock_anon_vma_read,
 	};
 
+	if (flags & TTU_RMAP_LOCKED)
+		rmap_walk_locked(page, &rwc);
+	else
+		rmap_walk(page, &rwc);
+}
+
+/*
+ * @arg: enum ttu_flags will be passed to this argument.
+ *
+ * If TTU_SPLIT_HUGE_PMD is specified any PMD mappings will be split into PTEs
+ * containing migration entries. This and TTU_RMAP_LOCKED are the only supported
+ * flags.
+ */
+static bool try_to_migrate_one(struct page *page, struct vm_area_struct *vma,
+		     unsigned long address, void *arg)
+{
+	struct mm_struct *mm = vma->vm_mm;
+	struct page_vma_mapped_walk pvmw = {
+		.page = page,
+		.vma = vma,
+		.address = address,
+	};
+	pte_t pteval;
+	struct page *subpage;
+	bool ret = true;
+	struct mmu_notifier_range range;
+	enum ttu_flags flags = (enum ttu_flags)(long)arg;
+
+	if (is_zone_device_page(page) && !is_device_private_page(page))
+		return true;
+
+	/*
+	 * When racing against e.g. zap_pte_range() on another cpu,
+	 * in between its ptep_get_and_clear_full() and page_remove_rmap(),
+	 * try_to_migrate() may return before page_mapped() has become false,
+	 * if page table locking is skipped: use TTU_SYNC to wait for that.
+	 */
+	if (flags & TTU_SYNC)
+		pvmw.flags = PVMW_SYNC;
+
+	/*
+	 * unmap_page() in mm/huge_memory.c is the only user of migration with
+	 * TTU_SPLIT_HUGE_PMD and it wants to freeze.
+	 */
+	if (flags & TTU_SPLIT_HUGE_PMD)
+		split_huge_pmd_address(vma, address, true, page);
+
+	/*
+	 * For THP, we have to assume the worse case ie pmd for invalidation.
+	 * For hugetlb, it could be much worse if we need to do pud
+	 * invalidation in the case of pmd sharing.
+	 *
+	 * Note that the page can not be free in this function as call of
+	 * try_to_unmap() must hold a reference on the page.
+	 */
+	range.end = PageKsm(page) ?
+			address + PAGE_SIZE : vma_address_end(page, vma);
+	mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma, vma->vm_mm,
+				address, range.end);
+	if (PageHuge(page)) {
+		/*
+		 * If sharing is possible, start and end will be adjusted
+		 * accordingly.
+		 */
+		adjust_range_if_pmd_sharing_possible(vma, &range.start,
+						     &range.end);
+	}
+	mmu_notifier_invalidate_range_start(&range);
+
+	while (page_vma_mapped_walk(&pvmw)) {
+#ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION
+		/* PMD-mapped THP migration entry */
+		if (!pvmw.pte) {
+			VM_BUG_ON_PAGE(PageHuge(page) ||
+				       !PageTransCompound(page), page);
+
+			set_pmd_migration_entry(&pvmw, page);
+			continue;
+		}
+#endif
+
+		/* Unexpected PMD-mapped THP? */
+		VM_BUG_ON_PAGE(!pvmw.pte, page);
+
+		subpage = page - page_to_pfn(page) + pte_pfn(*pvmw.pte);
+		address = pvmw.address;
+
+		if (PageHuge(page) && !PageAnon(page)) {
+			/*
+			 * To call huge_pmd_unshare, i_mmap_rwsem must be
+			 * held in write mode.  Caller needs to explicitly
+			 * do this outside rmap routines.
+			 */
+			VM_BUG_ON(!(flags & TTU_RMAP_LOCKED));
+			if (huge_pmd_unshare(mm, vma, &address, pvmw.pte)) {
+				/*
+				 * huge_pmd_unshare unmapped an entire PMD
+				 * page.  There is no way of knowing exactly
+				 * which PMDs may be cached for this mm, so
+				 * we must flush them all.  start/end were
+				 * already adjusted above to cover this range.
+				 */
+				flush_cache_range(vma, range.start, range.end);
+				flush_tlb_range(vma, range.start, range.end);
+				mmu_notifier_invalidate_range(mm, range.start,
+							      range.end);
+
+				/*
+				 * The ref count of the PMD page was dropped
+				 * which is part of the way map counting
+				 * is done for shared PMDs.  Return 'true'
+				 * here.  When there is no other sharing,
+				 * huge_pmd_unshare returns false and we will
+				 * unmap the actual page and drop map count
+				 * to zero.
+				 */
+				page_vma_mapped_walk_done(&pvmw);
+				break;
+			}
+		}
+
+		/* Nuke the page table entry. */
+		flush_cache_page(vma, address, pte_pfn(*pvmw.pte));
+		pteval = ptep_clear_flush(vma, address, pvmw.pte);
+
+		/* Move the dirty bit to the page. Now the pte is gone. */
+		if (pte_dirty(pteval))
+			set_page_dirty(page);
+
+		/* Update high watermark before we lower rss */
+		update_hiwater_rss(mm);
+
+		if (is_zone_device_page(page)) {
+			swp_entry_t entry;
+			pte_t swp_pte;
+
+			/*
+			 * Store the pfn of the page in a special migration
+			 * pte. do_swap_page() will wait until the migration
+			 * pte is removed and then restart fault handling.
+			 */
+			entry = make_readable_migration_entry(
+							page_to_pfn(page));
+			swp_pte = swp_entry_to_pte(entry);
+
+			/*
+			 * pteval maps a zone device page and is therefore
+			 * a swap pte.
+			 */
+			if (pte_swp_soft_dirty(pteval))
+				swp_pte = pte_swp_mksoft_dirty(swp_pte);
+			if (pte_swp_uffd_wp(pteval))
+				swp_pte = pte_swp_mkuffd_wp(swp_pte);
+			set_pte_at(mm, pvmw.address, pvmw.pte, swp_pte);
+			/*
+			 * No need to invalidate here it will synchronize on
+			 * against the special swap migration pte.
+			 *
+			 * The assignment to subpage above was computed from a
+			 * swap PTE which results in an invalid pointer.
+			 * Since only PAGE_SIZE pages can currently be
+			 * migrated, just set it to page. This will need to be
+			 * changed when hugepage migrations to device private
+			 * memory are supported.
+			 */
+			subpage = page;
+		} else if (PageHWPoison(page)) {
+			pteval = swp_entry_to_pte(make_hwpoison_entry(subpage));
+			if (PageHuge(page)) {
+				hugetlb_count_sub(compound_nr(page), mm);
+				set_huge_swap_pte_at(mm, address,
+						     pvmw.pte, pteval,
+						     vma_mmu_pagesize(vma));
+			} else {
+				dec_mm_counter(mm, mm_counter(page));
+				set_pte_at(mm, address, pvmw.pte, pteval);
+			}
+
+		} else if (pte_unused(pteval) && !userfaultfd_armed(vma)) {
+			/*
+			 * The guest indicated that the page content is of no
+			 * interest anymore. Simply discard the pte, vmscan
+			 * will take care of the rest.
+			 * A future reference will then fault in a new zero
+			 * page. When userfaultfd is active, we must not drop
+			 * this page though, as its main user (postcopy
+			 * migration) will not expect userfaults on already
+			 * copied pages.
+			 */
+			dec_mm_counter(mm, mm_counter(page));
+			/* We have to invalidate as we cleared the pte */
+			mmu_notifier_invalidate_range(mm, address,
+						      address + PAGE_SIZE);
+		} else {
+			swp_entry_t entry;
+			pte_t swp_pte;
+
+			if (arch_unmap_one(mm, vma, address, pteval) < 0) {
+				set_pte_at(mm, address, pvmw.pte, pteval);
+				ret = false;
+				page_vma_mapped_walk_done(&pvmw);
+				break;
+			}
+
+			/*
+			 * Store the pfn of the page in a special migration
+			 * pte. do_swap_page() will wait until the migration
+			 * pte is removed and then restart fault handling.
+			 */
+			if (pte_write(pteval))
+				entry = make_writable_migration_entry(
+							page_to_pfn(subpage));
+			else
+				entry = make_readable_migration_entry(
+							page_to_pfn(subpage));
+
+			swp_pte = swp_entry_to_pte(entry);
+			if (pte_soft_dirty(pteval))
+				swp_pte = pte_swp_mksoft_dirty(swp_pte);
+			if (pte_uffd_wp(pteval))
+				swp_pte = pte_swp_mkuffd_wp(swp_pte);
+			set_pte_at(mm, address, pvmw.pte, swp_pte);
+			/*
+			 * No need to invalidate here it will synchronize on
+			 * against the special swap migration pte.
+			 */
+		}
+
+		/*
+		 * No need to call mmu_notifier_invalidate_range() it has be
+		 * done above for all cases requiring it to happen under page
+		 * table lock before mmu_notifier_invalidate_range_end()
+		 *
+		 * See Documentation/vm/mmu_notifier.rst
+		 */
+		page_remove_rmap(subpage, PageHuge(page));
+		put_page(page);
+	}
+
+	mmu_notifier_invalidate_range_end(&range);
+
+	return ret;
+}
+
+/**
+ * try_to_migrate - try to replace all page table mappings with swap entries
+ * @page: the page to replace page table entries for
+ * @flags: action and flags
+ *
+ * Tries to remove all the page table entries which are mapping this page and
+ * replace them with special swap entries. Caller must hold the page lock.
+ *
+ * If is successful, return true. Otherwise, false.
+ */
+void try_to_migrate(struct page *page, enum ttu_flags flags)
+{
+	struct rmap_walk_control rwc = {
+		.rmap_one = try_to_migrate_one,
+		.arg = (void *)flags,
+		.done = page_not_mapped,
+		.anon_lock = page_lock_anon_vma_read,
+	};
+
+	/*
+	 * Migration always ignores mlock and only supports TTU_RMAP_LOCKED and
+	 * TTU_SPLIT_HUGE_PMD and TTU_SYNC flags.
+	 */
+	if (WARN_ON_ONCE(flags & ~(TTU_RMAP_LOCKED | TTU_SPLIT_HUGE_PMD |
+					TTU_SYNC)))
+		return;
+
 	/*
 	 * During exec, a temporary VMA is setup and later moved.
 	 * The VMA is moved under the anon_vma lock but not the
@@ -1774,8 +1956,7 @@  void try_to_unmap(struct page *page, enu
 	 * locking requirements of exec(), migration skips
 	 * temporary VMAs until after exec() completes.
 	 */
-	if ((flags & (TTU_MIGRATION|TTU_SPLIT_FREEZE))
-	    && !PageKsm(page) && PageAnon(page))
+	if (!PageKsm(page) && PageAnon(page))
 		rwc.invalid_vma = invalid_migration_vma;
 
 	if (flags & TTU_RMAP_LOCKED)
