From patchwork Mon Mar 6 09:22:56 2023
X-Patchwork-Submitter: Yin Fengwei <fengwei.yin@intel.com>
X-Patchwork-Id: 13160758
From: Yin Fengwei <fengwei.yin@intel.com>
To: linux-mm@kvack.org, akpm@linux-foundation.org, willy@infradead.org,
	mike.kravetz@oracle.com, sidhartha.kumar@oracle.com,
	naoya.horiguchi@nec.com, jane.chu@oracle.com, david@redhat.com
Cc: fengwei.yin@intel.com
Subject: [PATCH v3 2/5] rmap: move page unmap operation to dedicated function
Date: Mon, 6 Mar 2023 17:22:56 +0800
Message-Id: <20230306092259.3507807-3-fengwei.yin@intel.com>
X-Mailer: git-send-email 2.30.2
In-Reply-To: <20230306092259.3507807-1-fengwei.yin@intel.com>
References: <20230306092259.3507807-1-fengwei.yin@intel.com>

No functional change. Just code reorganized.
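To illustrate the shape of the change (illustration only: the struct and
function names below are invented for this example and are not kernel code),
the refactoring follows the usual extract-helper pattern: the per-PTE work is
moved into a dedicated function that returns false when the walk loop in the
caller must stop.

#include <stdbool.h>
#include <stdio.h>

/*
 * Toy model of the refactoring shape only. In the real patch the helper is
 * try_to_unmap_one_page() and the loop is the page_vma_mapped_walk() loop
 * in try_to_unmap_one(); everything here (struct item, unmap_one_item,
 * unmap_all) is made up for the example.
 */
struct item {
	int value;
	bool unmapped;
};

/* Dedicated per-item operation, analogous to try_to_unmap_one_page(). */
static bool unmap_one_item(struct item *it)
{
	if (it->value < 0)
		return false;	/* failure: caller stops the walk */
	it->unmapped = true;
	return true;
}

/* Walk loop: delegate each step and break as soon as the helper fails. */
static bool unmap_all(struct item *items, int n)
{
	bool ret = true;
	int i;

	for (i = 0; i < n; i++) {
		ret = unmap_one_item(&items[i]);
		if (!ret)
			break;
	}
	return ret;
}

int main(void)
{
	struct item items[] = { { 1, false }, { 2, false }, { -1, false } };

	printf("walk %s\n", unmap_all(items, 3) ? "completed" : "stopped early");
	return 0;
}

In the patch itself the helper restores the PTE and calls
page_vma_mapped_walk_done() on failure, so the caller only needs to break out
of the walk when the helper returns false.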
Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
---
 mm/rmap.c | 369 ++++++++++++++++++++++++++++--------------------
 1 file changed, 194 insertions(+), 175 deletions(-)

diff --git a/mm/rmap.c b/mm/rmap.c
index 508d141dacc5..013643122d0c 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1538,17 +1538,204 @@ static bool try_to_unmap_one_hugetlb(struct folio *folio,
 	return ret;
 }
 
+static bool try_to_unmap_one_page(struct folio *folio,
+		struct vm_area_struct *vma, struct mmu_notifier_range range,
+		struct page_vma_mapped_walk pvmw, unsigned long address,
+		enum ttu_flags flags)
+{
+	bool anon_exclusive, ret = true;
+	struct page *subpage;
+	struct mm_struct *mm = vma->vm_mm;
+	pte_t pteval;
+
+	subpage = folio_page(folio,
+			pte_pfn(*pvmw.pte) - folio_pfn(folio));
+	anon_exclusive = folio_test_anon(folio) &&
+			 PageAnonExclusive(subpage);
+
+	flush_cache_page(vma, address, pte_pfn(*pvmw.pte));
+	/* Nuke the page table entry. */
+	if (should_defer_flush(mm, flags)) {
+		/*
+		 * We clear the PTE but do not flush so potentially
+		 * a remote CPU could still be writing to the folio.
+		 * If the entry was previously clean then the
+		 * architecture must guarantee that a clear->dirty
+		 * transition on a cached TLB entry is written through
+		 * and traps if the PTE is unmapped.
+		 */
+		pteval = ptep_get_and_clear(mm, address, pvmw.pte);
+
+		set_tlb_ubc_flush_pending(mm, pte_dirty(pteval));
+	} else {
+		pteval = ptep_clear_flush(vma, address, pvmw.pte);
+	}
+
+	/*
+	 * Now the pte is cleared. If this pte was uffd-wp armed,
+	 * we may want to replace a none pte with a marker pte if
+	 * it's file-backed, so we don't lose the tracking info.
+	 */
+	pte_install_uffd_wp_if_needed(vma, address, pvmw.pte, pteval);
+
+	/* Set the dirty flag on the folio now the pte is gone. */
+	if (pte_dirty(pteval))
+		folio_mark_dirty(folio);
+
+	/* Update high watermark before we lower rss */
+	update_hiwater_rss(mm);
+
+	if (PageHWPoison(subpage) && (flags & TTU_HWPOISON)) {
+		pteval = swp_entry_to_pte(make_hwpoison_entry(subpage));
+		dec_mm_counter(mm, mm_counter(&folio->page));
+		set_pte_at(mm, address, pvmw.pte, pteval);
+	} else if (pte_unused(pteval) && !userfaultfd_armed(vma)) {
+		/*
+		 * The guest indicated that the page content is of no
+		 * interest anymore. Simply discard the pte, vmscan
+		 * will take care of the rest.
+		 * A future reference will then fault in a new zero
+		 * page. When userfaultfd is active, we must not drop
+		 * this page though, as its main user (postcopy
+		 * migration) will not expect userfaults on already
+		 * copied pages.
+		 */
+		dec_mm_counter(mm, mm_counter(&folio->page));
+		/* We have to invalidate as we cleared the pte */
+		mmu_notifier_invalidate_range(mm, address,
+					      address + PAGE_SIZE);
+	} else if (folio_test_anon(folio)) {
+		swp_entry_t entry = { .val = page_private(subpage) };
+		pte_t swp_pte;
+		/*
+		 * Store the swap location in the pte.
+		 * See handle_pte_fault() ...
+		 */
+		if (unlikely(folio_test_swapbacked(folio) !=
+				folio_test_swapcache(folio))) {
+			WARN_ON_ONCE(1);
+			ret = false;
+			/* We have to invalidate as we cleared the pte */
+			mmu_notifier_invalidate_range(mm, address,
+					address + PAGE_SIZE);
+			page_vma_mapped_walk_done(&pvmw);
+			goto discard;
+		}
+
+		/* MADV_FREE page check */
+		if (!folio_test_swapbacked(folio)) {
+			int ref_count, map_count;
+
+			/*
+			 * Synchronize with gup_pte_range():
+			 * - clear PTE; barrier; read refcount
+			 * - inc refcount; barrier; read PTE
+			 */
+			smp_mb();
+
+			ref_count = folio_ref_count(folio);
+			map_count = folio_mapcount(folio);
+
+			/*
+			 * Order reads for page refcount and dirty flag
+			 * (see comments in __remove_mapping()).
+			 */
+			smp_rmb();
+
+			/*
+			 * The only page refs must be one from isolation
+			 * plus the rmap(s) (dropped by discard:).
+			 */
+			if (ref_count == 1 + map_count &&
+			    !folio_test_dirty(folio)) {
+				/* Invalidate as we cleared the pte */
+				mmu_notifier_invalidate_range(mm,
+						address, address + PAGE_SIZE);
+				dec_mm_counter(mm, MM_ANONPAGES);
+				goto discard;
+			}
+
+			/*
+			 * If the folio was redirtied, it cannot be
+			 * discarded. Remap the page to page table.
+			 */
+			set_pte_at(mm, address, pvmw.pte, pteval);
+			folio_set_swapbacked(folio);
+			ret = false;
+			page_vma_mapped_walk_done(&pvmw);
+			goto discard;
+		}
+
+		if (swap_duplicate(entry) < 0) {
+			set_pte_at(mm, address, pvmw.pte, pteval);
+			ret = false;
+			page_vma_mapped_walk_done(&pvmw);
+			goto discard;
+		}
+		if (arch_unmap_one(mm, vma, address, pteval) < 0) {
+			swap_free(entry);
+			set_pte_at(mm, address, pvmw.pte, pteval);
+			ret = false;
+			page_vma_mapped_walk_done(&pvmw);
+			goto discard;
+		}
+
+		/* See page_try_share_anon_rmap(): clear PTE first. */
+		if (anon_exclusive &&
+		    page_try_share_anon_rmap(subpage)) {
+			swap_free(entry);
+			set_pte_at(mm, address, pvmw.pte, pteval);
+			ret = false;
+			page_vma_mapped_walk_done(&pvmw);
+			goto discard;
+		}
+		if (list_empty(&mm->mmlist)) {
+			spin_lock(&mmlist_lock);
+			if (list_empty(&mm->mmlist))
+				list_add(&mm->mmlist, &init_mm.mmlist);
+			spin_unlock(&mmlist_lock);
+		}
+		dec_mm_counter(mm, MM_ANONPAGES);
+		inc_mm_counter(mm, MM_SWAPENTS);
+		swp_pte = swp_entry_to_pte(entry);
+		if (anon_exclusive)
+			swp_pte = pte_swp_mkexclusive(swp_pte);
+		if (pte_soft_dirty(pteval))
+			swp_pte = pte_swp_mksoft_dirty(swp_pte);
+		if (pte_uffd_wp(pteval))
+			swp_pte = pte_swp_mkuffd_wp(swp_pte);
+		set_pte_at(mm, address, pvmw.pte, swp_pte);
+		/* Invalidate as we cleared the pte */
+		mmu_notifier_invalidate_range(mm, address,
+					      address + PAGE_SIZE);
+	} else {
+		/*
+		 * This is a locked file-backed folio,
+		 * so it cannot be removed from the page
+		 * cache and replaced by a new folio before
+		 * mmu_notifier_invalidate_range_end, so no
+		 * concurrent thread might update its page table
+		 * to point at a new folio while a device is
+		 * still using this folio.
+		 *
+		 * See Documentation/mm/mmu_notifier.rst
+		 */
+		dec_mm_counter(mm, mm_counter_file(&folio->page));
+	}
+
+discard:
+	return ret;
+}
+
 /*
  * @arg: enum ttu_flags will be passed to this argument
  */
 static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 		     unsigned long address, void *arg)
 {
-	struct mm_struct *mm = vma->vm_mm;
 	DEFINE_FOLIO_VMA_WALK(pvmw, folio, vma, address, 0);
-	pte_t pteval;
 	struct page *subpage;
-	bool anon_exclusive, ret = true;
+	bool ret = true;
 	struct mmu_notifier_range range;
 	enum ttu_flags flags = (enum ttu_flags)(long)arg;
 
@@ -1613,179 +1800,11 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 		subpage = folio_page(folio,
 					pte_pfn(*pvmw.pte) - folio_pfn(folio));
-		anon_exclusive = folio_test_anon(folio) &&
-				 PageAnonExclusive(subpage);
-
-		flush_cache_page(vma, address, pte_pfn(*pvmw.pte));
-		/* Nuke the page table entry. */
-		if (should_defer_flush(mm, flags)) {
-			/*
-			 * We clear the PTE but do not flush so potentially
-			 * a remote CPU could still be writing to the folio.
-			 * If the entry was previously clean then the
-			 * architecture must guarantee that a clear->dirty
-			 * transition on a cached TLB entry is written through
-			 * and traps if the PTE is unmapped.
-			 */
-			pteval = ptep_get_and_clear(mm, address, pvmw.pte);
-
-			set_tlb_ubc_flush_pending(mm, pte_dirty(pteval));
-		} else {
-			pteval = ptep_clear_flush(vma, address, pvmw.pte);
-		}
-
-		/*
-		 * Now the pte is cleared. If this pte was uffd-wp armed,
-		 * we may want to replace a none pte with a marker pte if
-		 * it's file-backed, so we don't lose the tracking info.
-		 */
-		pte_install_uffd_wp_if_needed(vma, address, pvmw.pte, pteval);
-
-		/* Set the dirty flag on the folio now the pte is gone. */
-		if (pte_dirty(pteval))
-			folio_mark_dirty(folio);
-
-		/* Update high watermark before we lower rss */
-		update_hiwater_rss(mm);
-
-		if (PageHWPoison(subpage) && (flags & TTU_HWPOISON)) {
-			pteval = swp_entry_to_pte(make_hwpoison_entry(subpage));
-			dec_mm_counter(mm, mm_counter(&folio->page));
-			set_pte_at(mm, address, pvmw.pte, pteval);
-		} else if (pte_unused(pteval) && !userfaultfd_armed(vma)) {
-			/*
-			 * The guest indicated that the page content is of no
-			 * interest anymore. Simply discard the pte, vmscan
-			 * will take care of the rest.
-			 * A future reference will then fault in a new zero
-			 * page. When userfaultfd is active, we must not drop
-			 * this page though, as its main user (postcopy
-			 * migration) will not expect userfaults on already
-			 * copied pages.
-			 */
-			dec_mm_counter(mm, mm_counter(&folio->page));
-			/* We have to invalidate as we cleared the pte */
-			mmu_notifier_invalidate_range(mm, address,
-						      address + PAGE_SIZE);
-		} else if (folio_test_anon(folio)) {
-			swp_entry_t entry = { .val = page_private(subpage) };
-			pte_t swp_pte;
-			/*
-			 * Store the swap location in the pte.
-			 * See handle_pte_fault() ...
-			 */
-			if (unlikely(folio_test_swapbacked(folio) !=
-					folio_test_swapcache(folio))) {
-				WARN_ON_ONCE(1);
-				ret = false;
-				/* We have to invalidate as we cleared the pte */
-				mmu_notifier_invalidate_range(mm, address,
-						address + PAGE_SIZE);
-				page_vma_mapped_walk_done(&pvmw);
-				break;
-			}
-
-			/* MADV_FREE page check */
-			if (!folio_test_swapbacked(folio)) {
-				int ref_count, map_count;
-
-				/*
-				 * Synchronize with gup_pte_range():
-				 * - clear PTE; barrier; read refcount
-				 * - inc refcount; barrier; read PTE
-				 */
-				smp_mb();
-
-				ref_count = folio_ref_count(folio);
-				map_count = folio_mapcount(folio);
-
-				/*
-				 * Order reads for page refcount and dirty flag
-				 * (see comments in __remove_mapping()).
-				 */
-				smp_rmb();
-
-				/*
-				 * The only page refs must be one from isolation
-				 * plus the rmap(s) (dropped by discard:).
-				 */
-				if (ref_count == 1 + map_count &&
-				    !folio_test_dirty(folio)) {
-					/* Invalidate as we cleared the pte */
-					mmu_notifier_invalidate_range(mm,
-							address, address + PAGE_SIZE);
-					dec_mm_counter(mm, MM_ANONPAGES);
-					goto discard;
-				}
-
-				/*
-				 * If the folio was redirtied, it cannot be
-				 * discarded. Remap the page to page table.
-				 */
-				set_pte_at(mm, address, pvmw.pte, pteval);
-				folio_set_swapbacked(folio);
-				ret = false;
-				page_vma_mapped_walk_done(&pvmw);
-				break;
-			}
-
-			if (swap_duplicate(entry) < 0) {
-				set_pte_at(mm, address, pvmw.pte, pteval);
-				ret = false;
-				page_vma_mapped_walk_done(&pvmw);
-				break;
-			}
-			if (arch_unmap_one(mm, vma, address, pteval) < 0) {
-				swap_free(entry);
-				set_pte_at(mm, address, pvmw.pte, pteval);
-				ret = false;
-				page_vma_mapped_walk_done(&pvmw);
-				break;
-			}
+		ret = try_to_unmap_one_page(folio, vma,
+						range, pvmw, address, flags);
+		if (!ret)
+			break;
 
-			/* See page_try_share_anon_rmap(): clear PTE first. */
-			if (anon_exclusive &&
-			    page_try_share_anon_rmap(subpage)) {
-				swap_free(entry);
-				set_pte_at(mm, address, pvmw.pte, pteval);
-				ret = false;
-				page_vma_mapped_walk_done(&pvmw);
-				break;
-			}
-			if (list_empty(&mm->mmlist)) {
-				spin_lock(&mmlist_lock);
-				if (list_empty(&mm->mmlist))
-					list_add(&mm->mmlist, &init_mm.mmlist);
-				spin_unlock(&mmlist_lock);
-			}
-			dec_mm_counter(mm, MM_ANONPAGES);
-			inc_mm_counter(mm, MM_SWAPENTS);
-			swp_pte = swp_entry_to_pte(entry);
-			if (anon_exclusive)
-				swp_pte = pte_swp_mkexclusive(swp_pte);
-			if (pte_soft_dirty(pteval))
-				swp_pte = pte_swp_mksoft_dirty(swp_pte);
-			if (pte_uffd_wp(pteval))
-				swp_pte = pte_swp_mkuffd_wp(swp_pte);
-			set_pte_at(mm, address, pvmw.pte, swp_pte);
-			/* Invalidate as we cleared the pte */
-			mmu_notifier_invalidate_range(mm, address,
-						      address + PAGE_SIZE);
-		} else {
-			/*
-			 * This is a locked file-backed folio,
-			 * so it cannot be removed from the page
-			 * cache and replaced by a new folio before
-			 * mmu_notifier_invalidate_range_end, so no
-			 * concurrent thread might update its page table
-			 * to point at a new folio while a device is
-			 * still using this folio.
-			 *
-			 * See Documentation/mm/mmu_notifier.rst
-			 */
-			dec_mm_counter(mm, mm_counter_file(&folio->page));
-		}
-discard:
 		/*
 		 * No need to call mmu_notifier_invalidate_range() it has be
 		 * done above for all cases requiring it to happen under page