From patchwork Mon Nov 13 09:05:58 2023
X-Patchwork-Submitter: Xu Yu
X-Patchwork-Id: 13453717
From: Xu Yu
To: linux-mm@kvack.org
Cc: akpm@linux-foundation.org, zokeefe@google.com, song@kernel.org, shy828301@gmail.com
Subject: [PATCH 1/1] mm/khugepaged: map anonymous pte-mapped THPs by pmds
Date: Mon, 13 Nov 2023 17:05:58 +0800
Message-Id: <5e56a480be9294108bff6ff0bcb0980dc7ee27d4.1699865107.git.xuyu@linux.alibaba.com>

In the anonymous collapse path, khugepaged collapses pte-mapped hugepages by allocating a new hugepage and copying the contents into it, which is suboptimal.

In fact, for anonymous pte-mapped THPs we only need to update the mapping page tables, in the same way as for file/shmem-backed pte-mapped THPs, as done in commit 58ac9a8993a1 ("mm/khugepaged: attempt to map file/shmem-backed pte-mapped THPs by pmds").
Signed-off-by: Xu Yu
---
 mm/khugepaged.c | 187 ++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 180 insertions(+), 7 deletions(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 88433cc25d8a..14069dedebdc 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1384,6 +1384,12 @@ static int hpage_collapse_scan_pmd(struct mm_struct *mm,
 		     PageReferenced(page) || mmu_notifier_test_young(vma->vm_mm,
 								     address)))
 			referenced++;
+
+		if (compound_order(page) == HPAGE_PMD_ORDER &&
+		    !is_huge_zero_page(page)) {
+			result = SCAN_PTE_MAPPED_HUGEPAGE;
+			goto out_unmap;
+		}
 	}
 	if (!writable) {
 		result = SCAN_PAGE_RO;
@@ -1402,6 +1408,11 @@ static int hpage_collapse_scan_pmd(struct mm_struct *mm,
 		/* collapse_huge_page will return with the mmap_lock released */
 		*mmap_locked = false;
 	}
+	if (result == SCAN_PTE_MAPPED_HUGEPAGE) {
+		/* adapt to calling convention of collapse_pte_mapped_thp() */
+		mmap_read_unlock(mm);
+		*mmap_locked = false;
+	}
 out:
 	trace_mm_khugepaged_scan_pmd(mm, page, writable, referenced,
 				     none_or_zero, result, unmapped);
@@ -1454,6 +1465,140 @@ static int set_huge_pmd(struct vm_area_struct *vma, unsigned long addr,
 	return SCAN_SUCCEED;
 }
 
+static struct page *find_lock_pte_mapped_page_unsafe(struct vm_area_struct *vma,
+						unsigned long addr, pmd_t *pmd)
+{
+	pte_t *pte, pteval;
+	struct page *page = NULL;
+
+	/* caller should recheck with ptl. */
+	pte = pte_offset_map(pmd, addr);
+	if (!pte)
+		return NULL;
+
+	pteval = ptep_get_lockless(pte);
+	if (pte_none(pteval) || !pte_present(pteval))
+		goto out;
+
+	page = vm_normal_page(vma, addr, pteval);
+	if (unlikely(!page) || unlikely(is_zone_device_page(page)))
+		goto out;
+
+	page = compound_head(page);
+
+	if (!trylock_page(page)) {
+		page = NULL;
+		goto out;
+	}
+
+	if (!get_page_unless_zero(page)) {
+		unlock_page(page);
+		page = NULL;
+		goto out;
+	}
+
+out:
+	pte_unmap(pte);
+	return page;
+}
+
+/* call with mmap write lock, and hpage is PG_locked. */
+static noinline int collapse_pte_mapped_thp_anon(struct mm_struct *mm,
+					struct vm_area_struct *vma,
+					unsigned long haddr, struct page *hpage)
+{
+	struct mmu_notifier_range range;
+	unsigned long addr;
+	pmd_t *pmd, pmdval;
+	pte_t *start_pte, *pte;
+	spinlock_t *pml, *ptl;
+	pgtable_t pgtable;
+	int result, i;
+
+	result = find_pmd_or_thp_or_none(mm, haddr, &pmd);
+	if (result != SCAN_SUCCEED)
+		goto out;
+
+	result = SCAN_FAIL;
+	start_pte = pte_offset_map_lock(mm, pmd, haddr, &ptl);
+	if (!start_pte)		/* mmap_lock + page lock should prevent this */
+		goto out;
+	/* step 1: check all mapped PTEs are to the right huge page */
+	for (i = 0, addr = haddr, pte = start_pte;
+	     i < HPAGE_PMD_NR; i++, addr += PAGE_SIZE, pte++) {
+		struct page *page;
+		pte_t pteval = ptep_get(pte);
+
+		if (pte_none(pteval) || !pte_present(pteval)) {
+			result = SCAN_PTE_NON_PRESENT;
+			goto out_unmap;
+		}
+
+		page = vm_normal_page(vma, addr, pteval);
+		if (WARN_ON_ONCE(page && is_zone_device_page(page)))
+			page = NULL;
+		/*
+		 * Note that uprobe, debugger, or MAP_PRIVATE may change the
+		 * page table, but the new page will not be a subpage of hpage.
+		 */
+		if (hpage + i != page)
+			goto out_unmap;
+	}
+	pte_unmap_unlock(start_pte, ptl);
+
+	/* step 2: clear page table and adjust rmap */
+	vma_start_write(vma);
+
+	mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, mm,
+				haddr, haddr + HPAGE_PMD_SIZE);
+	mmu_notifier_invalidate_range_start(&range);
+
+	pml = pmd_lock(mm, pmd);
+	pmdval = pmdp_collapse_flush(vma, haddr, pmd);
+	spin_unlock(pml);
+
+	mmu_notifier_invalidate_range_end(&range);
+	tlb_remove_table_sync_one();
+
+	start_pte = pte_offset_map_lock(mm, &pmdval, haddr, &ptl);
+	if (!start_pte)
+		goto abort;
+	for (i = 0, addr = haddr, pte = start_pte;
+	     i < HPAGE_PMD_NR; i++, addr += PAGE_SIZE, pte++) {
+		struct page *page;
+		pte_t pteval = ptep_get(pte);
+
+		page = vm_normal_page(vma, addr, pteval);
+		page_remove_rmap(page, vma, false);
+	}
+	pte_unmap_unlock(start_pte, ptl);
+
+	/* step 3: install pmd entry */
+	pgtable = pmd_pgtable(pmdval);
+
+	pmdval = mk_huge_pmd(hpage, vma->vm_page_prot);
+	pmdval = maybe_pmd_mkwrite(pmd_mkdirty(pmdval), vma);
+
+	spin_lock(pml);
+	page_add_anon_rmap(hpage, vma, haddr, RMAP_COMPOUND);
+	pgtable_trans_huge_deposit(mm, pmd, pgtable);
+	set_pmd_at(mm, haddr, pmd, pmdval);
+	update_mmu_cache_pmd(vma, haddr, pmd);
+	spin_unlock(pml);
+
+	result = SCAN_SUCCEED;
+	return result;
+abort:
+	spin_lock(pml);
+	pmd_populate(mm, pmd, pmd_pgtable(pmdval));
+	spin_unlock(pml);
+out_unmap:
+	if (start_pte)
+		pte_unmap_unlock(start_pte, ptl);
+out:
+	return result;
+}
+
 /**
  * collapse_pte_mapped_thp - Try to collapse a pte-mapped THP for mm at
  * address haddr.
@@ -1479,14 +1624,16 @@ int collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr,
 	spinlock_t *pml = NULL, *ptl;
 	int nr_ptes = 0, result = SCAN_FAIL;
 	int i;
+	bool file;
 
 	mmap_assert_locked(mm);
 
 	/* First check VMA found, in case page tables are being torn down */
-	if (!vma || !vma->vm_file ||
-	    !range_in_vma(vma, haddr, haddr + HPAGE_PMD_SIZE))
+	if (!vma || !range_in_vma(vma, haddr, haddr + HPAGE_PMD_SIZE))
 		return SCAN_VMA_CHECK;
 
+	file = !!vma->vm_file;
+
 	/* Fast check before locking page if already PMD-mapped */
 	result = find_pmd_or_thp_or_none(mm, haddr, &pmd);
 	if (result == SCAN_PMD_MAPPED)
@@ -1506,8 +1653,11 @@ int collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr,
 	if (userfaultfd_wp(vma))
 		return SCAN_PTE_UFFD_WP;
 
-	hpage = find_lock_page(vma->vm_file->f_mapping,
-			       linear_page_index(vma, haddr));
+	if (file)
+		hpage = find_lock_page(vma->vm_file->f_mapping,
+				       linear_page_index(vma, haddr));
+	else
+		hpage = find_lock_pte_mapped_page_unsafe(vma, haddr, pmd);
 	if (!hpage)
 		return SCAN_PAGE_NULL;
 
@@ -1521,6 +1671,11 @@ int collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr,
 		goto drop_hpage;
 	}
 
+	if (!file) {
+		result = collapse_pte_mapped_thp_anon(mm, vma, haddr, hpage);
+		goto drop_hpage;
+	}
+
 	result = find_pmd_or_thp_or_none(mm, haddr, &pmd);
 	switch (result) {
 	case SCAN_SUCCEED:
@@ -2415,6 +2570,18 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, int *result,
 			} else {
 				*result = hpage_collapse_scan_pmd(mm, vma,
 					khugepaged_scan.address, &mmap_locked, cc);
+				if (*result == SCAN_PTE_MAPPED_HUGEPAGE) {
+					mmap_write_lock(mm);
+					if (hpage_collapse_test_exit(mm)) {
+						mmap_write_unlock(mm);
+						goto breakouterloop_mmap_lock;
+					}
+					*result = collapse_pte_mapped_thp(mm,
+						khugepaged_scan.address, true);
+					if (*result == SCAN_PMD_MAPPED)
+						*result = SCAN_SUCCEED;
+					mmap_write_unlock(mm);
+				}
 			}
 
 			if (*result == SCAN_SUCCEED)
@@ -2764,9 +2931,15 @@ int madvise_collapse(struct vm_area_struct *vma, struct vm_area_struct **prev,
 		case SCAN_PTE_MAPPED_HUGEPAGE:
 			BUG_ON(mmap_locked);
 			BUG_ON(*prev);
-			mmap_read_lock(mm);
-			result = collapse_pte_mapped_thp(mm, addr, true);
-			mmap_read_unlock(mm);
+			if (vma->vm_file) {
+				mmap_read_lock(mm);
+				result = collapse_pte_mapped_thp(mm, addr, true);
+				mmap_read_unlock(mm);
+			} else {
+				mmap_write_lock(mm);
+				result = collapse_pte_mapped_thp(mm, addr, true);
+				mmap_write_unlock(mm);
+			}
 			goto handle_result;
 		/* Whitelisted set of results where continuing OK */
 		case SCAN_PMD_NULL:
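The three steps of collapse_pte_mapped_thp_anon() in the patch can be summarized with a simplified, hypothetical model of the rmap bookkeeping. It is a sketch under stated assumptions, not kernel code: it assumes step 1 has already verified that PTE i maps subpage i, tracks a per-subpage PTE mapcount and a whole-THP "entire" mapcount as plain integers (not the kernel's folio fields), and uses a can_map_table flag to stand in for whether pte_offset_map_lock() on the saved pmdval succeeds. The names toy_thp, toy_pmd_state and toy_collapse_anon() are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumes 4 KiB base pages and 2 MiB PMDs, so a PMD-sized THP has 512 subpages. */
#define TOY_THP_NR 512

/* Hypothetical model of a THP's mapcounts; not the kernel's struct folio. */
struct toy_thp {
	int subpage_mapcount[TOY_THP_NR]; /* PTE-level mappings of each subpage */
	int entire_mapcount;              /* PMD-level mappings of the whole THP */
};

/* What the PMD entry currently holds in this model. */
enum toy_pmd_state {
	TOY_PMD_TABLE,   /* points to a PTE table (pte-mapped THP) */
	TOY_PMD_CLEARED, /* cleared by the collapse flush */
	TOY_PMD_HUGE,    /* maps the THP directly */
};

/*
 * Models steps 2 and 3, assuming step 1 already succeeded.
 * can_map_table stands in for pte_offset_map_lock() on pmdval succeeding.
 * Returns true if the huge PMD was installed.
 */
static bool toy_collapse_anon(struct toy_thp *thp, enum toy_pmd_state *pmd,
			      bool can_map_table)
{
	assert(*pmd == TOY_PMD_TABLE);

	/* step 2: pmdp_collapse_flush() detaches the PTE table */
	*pmd = TOY_PMD_CLEARED;
	if (!can_map_table) {
		/* abort: pmd_populate() reattaches the old table; rmap untouched */
		*pmd = TOY_PMD_TABLE;
		return false;
	}
	for (int i = 0; i < TOY_THP_NR; i++) {
		assert(thp->subpage_mapcount[i] > 0);
		thp->subpage_mapcount[i]--; /* page_remove_rmap(page, vma, false) */
	}

	/* step 3: page_add_anon_rmap(..., RMAP_COMPOUND), then set_pmd_at() */
	thp->entire_mapcount++;
	*pmd = TOY_PMD_HUGE;
	return true;
}
```

After a successful call in this model, the 512 PTE-level mappings have been traded for one PMD-level mapping of the same physical pages, and no data was copied; on the abort path the PMD points at the original PTE table again and no mapcount has changed.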