diff mbox series

[v2,13/17] khugepaged: Lock all VMAs mapping the PTE table

Message ID 20250211111326.14295-14-dev.jain@arm.com (mailing list archive)
State New
Series khugepaged: Asynchronous mTHP collapse

Commit Message

Dev Jain Feb. 11, 2025, 11:13 a.m. UTC
After enabling khugepaged to handle VMAs of any size, it may happen that
the process faults on a VMA other than the VMA under collapse, and both
these VMAs span the same PTE table. As a result, the fault handler will
install a new PTE table after khugepaged isolates the PTE table. Therefore,
scan the PTE table range, retrieve all VMAs mapping it, and write-lock
them. Note that rmap can still reach the PTE table from folios not under
collapse; this is fine, since it interferes neither with the PTEs nor the
folios under collapse, and rmap cannot fill the PMD.

Signed-off-by: Dev Jain <dev.jain@arm.com>
---
 mm/khugepaged.c | 21 ++++++++++++++++++++-
 1 file changed, 20 insertions(+), 1 deletion(-)

Patch

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 048f990d8507..e1c2c5b89f6d 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1139,6 +1139,23 @@  static int alloc_charge_folio(struct folio **foliop, struct mm_struct *mm,
 	return SCAN_SUCCEED;
 }
 
+static void take_vma_locks_per_pte(struct mm_struct *mm, unsigned long haddress)
+{
+	struct vm_area_struct *vma;
+	unsigned long start = haddress;
+	unsigned long end = haddress + HPAGE_PMD_SIZE;
+
+	while (start < end) {
+		vma = vma_lookup(mm, start);
+		if (!vma) {
+			start += PAGE_SIZE;
+			continue;
+		}
+		vma_start_write(vma);
+		start = vma->vm_end;
+	}
+}
+
 static int vma_collapse_anon_folio_pmd(struct mm_struct *mm, unsigned long address,
 		struct vm_area_struct *vma, struct collapse_control *cc, pmd_t *pmd,
 		struct folio *folio)
@@ -1270,7 +1287,9 @@  static int vma_collapse_anon_folio(struct mm_struct *mm, unsigned long address,
 	if (result != SCAN_SUCCEED)
 		goto out;
 
-	vma_start_write(vma);
+	/* Faulting may fill the PMD after the flush; lock all VMAs mapping this PTE table */
+	take_vma_locks_per_pte(mm, haddress);
+
 	anon_vma_lock_write(vma->anon_vma);
 
 	mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, mm, haddress,