From patchwork Mon Mar 13 12:45:23 2023
X-Patchwork-Submitter: Yin Fengwei
X-Patchwork-Id: 13172441
From: Yin Fengwei
To: linux-mm@kvack.org, akpm@linux-foundation.org, willy@infradead.org,
 mike.kravetz@oracle.com, sidhartha.kumar@oracle.com, naoya.horiguchi@nec.com,
 jane.chu@oracle.com, david@redhat.com
Cc: fengwei.yin@intel.com
Subject: [PATCH v4 2/5] rmap: move page unmap operation to dedicated function
Date: Mon, 13 Mar 2023 20:45:23 +0800
Message-Id: <20230313124526.1207490-3-fengwei.yin@intel.com>
In-Reply-To: <20230313124526.1207490-1-fengwei.yin@intel.com>
References: <20230313124526.1207490-1-fengwei.yin@intel.com>

No functional change. Just code reorganized.
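
For readers skimming the diff: the reorganization pulls the body of the per-PTE
loop in try_to_unmap_one() out into a new helper, try_to_unmap_one_page(), whose
boolean return value tells the caller whether to keep walking the mappings. The
following is only a minimal sketch of that shape in plain C, not kernel code; the
names pte_walk, pte_walk_next(), unmap_one() and unmap_all() are illustrative
stand-ins:

/* sketch.c - illustrative stand-ins only, not the kernel's types or APIs */
#include <stdbool.h>
#include <stddef.h>

struct pte_walk {
	const unsigned long *ptes;	/* pretend PTE values */
	size_t cur, nr;
};

/* stand-in for page_vma_mapped_walk(): true while another PTE maps the folio */
static bool pte_walk_next(struct pte_walk *w)
{
	return w->cur++ < w->nr;
}

/*
 * The extracted per-PTE helper: do the work for one mapping and return
 * false when the caller must stop the walk (the real helper returns false
 * after calling page_vma_mapped_walk_done()).
 */
static bool unmap_one(struct pte_walk *w)
{
	/* ... clear the PTE, flush the TLB, install a swap entry, ... */
	return w->ptes[w->cur - 1] != 0;	/* arbitrary failure case for the sketch */
}

/* The caller keeps only the walk loop, as try_to_unmap_one() does below. */
bool unmap_all(struct pte_walk *w)
{
	bool ret = true;

	while (pte_walk_next(w)) {
		ret = unmap_one(w);
		if (!ret)
			break;
	}
	return ret;
}

The real helper additionally takes the folio, vma, mmu_notifier range, pvmw walk
state, address and TTU flags, as the first hunk below shows.
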
Signed-off-by: Yin Fengwei
---
 mm/rmap.c | 369 ++++++++++++++++++++++++++++--------------------
 1 file changed, 194 insertions(+), 175 deletions(-)

diff --git a/mm/rmap.c b/mm/rmap.c
index 3a2e3ccb8031..23eda671447a 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1538,17 +1538,204 @@ static bool try_to_unmap_one_hugetlb(struct folio *folio,
 	return ret;
 }
 
+static bool try_to_unmap_one_page(struct folio *folio,
+		struct vm_area_struct *vma, struct mmu_notifier_range range,
+		struct page_vma_mapped_walk pvmw, unsigned long address,
+		enum ttu_flags flags)
+{
+	bool anon_exclusive, ret = true;
+	struct page *subpage;
+	struct mm_struct *mm = vma->vm_mm;
+	pte_t pteval;
+
+	subpage = folio_page(folio,
+			pte_pfn(*pvmw.pte) - folio_pfn(folio));
+	anon_exclusive = folio_test_anon(folio) &&
+		PageAnonExclusive(subpage);
+
+	flush_cache_page(vma, address, pte_pfn(*pvmw.pte));
+	/* Nuke the page table entry. */
+	if (should_defer_flush(mm, flags)) {
+		/*
+		 * We clear the PTE but do not flush so potentially
+		 * a remote CPU could still be writing to the folio.
+		 * If the entry was previously clean then the
+		 * architecture must guarantee that a clear->dirty
+		 * transition on a cached TLB entry is written through
+		 * and traps if the PTE is unmapped.
+		 */
+		pteval = ptep_get_and_clear(mm, address, pvmw.pte);
+
+		set_tlb_ubc_flush_pending(mm, pte_dirty(pteval));
+	} else {
+		pteval = ptep_clear_flush(vma, address, pvmw.pte);
+	}
+
+	/*
+	 * Now the pte is cleared. If this pte was uffd-wp armed,
+	 * we may want to replace a none pte with a marker pte if
+	 * it's file-backed, so we don't lose the tracking info.
+	 */
+	pte_install_uffd_wp_if_needed(vma, address, pvmw.pte, pteval);
+
+	/* Set the dirty flag on the folio now the pte is gone. */
+	if (pte_dirty(pteval))
+		folio_mark_dirty(folio);
+
+	/* Update high watermark before we lower rss */
+	update_hiwater_rss(mm);
+
+	if (PageHWPoison(subpage) && (flags & TTU_HWPOISON)) {
+		pteval = swp_entry_to_pte(make_hwpoison_entry(subpage));
+		dec_mm_counter(mm, mm_counter(&folio->page));
+		set_pte_at(mm, address, pvmw.pte, pteval);
+	} else if (pte_unused(pteval) && !userfaultfd_armed(vma)) {
+		/*
+		 * The guest indicated that the page content is of no
+		 * interest anymore. Simply discard the pte, vmscan
+		 * will take care of the rest.
+		 * A future reference will then fault in a new zero
+		 * page. When userfaultfd is active, we must not drop
+		 * this page though, as its main user (postcopy
+		 * migration) will not expect userfaults on already
+		 * copied pages.
+		 */
+		dec_mm_counter(mm, mm_counter(&folio->page));
+		/* We have to invalidate as we cleared the pte */
+		mmu_notifier_invalidate_range(mm, address,
+				address + PAGE_SIZE);
+	} else if (folio_test_anon(folio)) {
+		swp_entry_t entry = { .val = page_private(subpage) };
+		pte_t swp_pte;
+		/*
+		 * Store the swap location in the pte.
+		 * See handle_pte_fault() ...
+		 */
+		if (unlikely(folio_test_swapbacked(folio) !=
+					folio_test_swapcache(folio))) {
+			WARN_ON_ONCE(1);
+			ret = false;
+			/* We have to invalidate as we cleared the pte */
+			mmu_notifier_invalidate_range(mm, address,
+					address + PAGE_SIZE);
+			page_vma_mapped_walk_done(&pvmw);
+			goto discard;
+		}
+
+		/* MADV_FREE page check */
+		if (!folio_test_swapbacked(folio)) {
+			int ref_count, map_count;
+
+			/*
+			 * Synchronize with gup_pte_range():
+			 * - clear PTE; barrier; read refcount
+			 * - inc refcount; barrier; read PTE
+			 */
+			smp_mb();
+
+			ref_count = folio_ref_count(folio);
+			map_count = folio_mapcount(folio);
+
+			/*
+			 * Order reads for page refcount and dirty flag
+			 * (see comments in __remove_mapping()).
+			 */
+			smp_rmb();
+
+			/*
+			 * The only page refs must be one from isolation
+			 * plus the rmap(s) (dropped by discard:).
+			 */
+			if (ref_count == 1 + map_count &&
+			    !folio_test_dirty(folio)) {
+				/* Invalidate as we cleared the pte */
+				mmu_notifier_invalidate_range(mm,
+						address, address + PAGE_SIZE);
+				dec_mm_counter(mm, MM_ANONPAGES);
+				goto discard;
+			}
+
+			/*
+			 * If the folio was redirtied, it cannot be
+			 * discarded. Remap the page to page table.
+			 */
+			set_pte_at(mm, address, pvmw.pte, pteval);
+			folio_set_swapbacked(folio);
+			ret = false;
+			page_vma_mapped_walk_done(&pvmw);
+			goto discard;
+		}
+
+		if (swap_duplicate(entry) < 0) {
+			set_pte_at(mm, address, pvmw.pte, pteval);
+			ret = false;
+			page_vma_mapped_walk_done(&pvmw);
+			goto discard;
+		}
+		if (arch_unmap_one(mm, vma, address, pteval) < 0) {
+			swap_free(entry);
+			set_pte_at(mm, address, pvmw.pte, pteval);
+			ret = false;
+			page_vma_mapped_walk_done(&pvmw);
+			goto discard;
+		}
+
+		/* See page_try_share_anon_rmap(): clear PTE first. */
+		if (anon_exclusive &&
+		    page_try_share_anon_rmap(subpage)) {
+			swap_free(entry);
+			set_pte_at(mm, address, pvmw.pte, pteval);
+			ret = false;
+			page_vma_mapped_walk_done(&pvmw);
+			goto discard;
+		}
+		if (list_empty(&mm->mmlist)) {
+			spin_lock(&mmlist_lock);
+			if (list_empty(&mm->mmlist))
+				list_add(&mm->mmlist, &init_mm.mmlist);
+			spin_unlock(&mmlist_lock);
+		}
+		dec_mm_counter(mm, MM_ANONPAGES);
+		inc_mm_counter(mm, MM_SWAPENTS);
+		swp_pte = swp_entry_to_pte(entry);
+		if (anon_exclusive)
+			swp_pte = pte_swp_mkexclusive(swp_pte);
+		if (pte_soft_dirty(pteval))
+			swp_pte = pte_swp_mksoft_dirty(swp_pte);
+		if (pte_uffd_wp(pteval))
+			swp_pte = pte_swp_mkuffd_wp(swp_pte);
+		set_pte_at(mm, address, pvmw.pte, swp_pte);
+		/* Invalidate as we cleared the pte */
+		mmu_notifier_invalidate_range(mm, address,
+				address + PAGE_SIZE);
+	} else {
+		/*
+		 * This is a locked file-backed folio,
+		 * so it cannot be removed from the page
+		 * cache and replaced by a new folio before
+		 * mmu_notifier_invalidate_range_end, so no
+		 * concurrent thread might update its page table
+		 * to point at a new folio while a device is
+		 * still using this folio.
+		 *
+		 * See Documentation/mm/mmu_notifier.rst
+		 */
+		dec_mm_counter(mm, mm_counter_file(&folio->page));
+	}
+
+discard:
+	return ret;
+}
+
 /*
  * @arg: enum ttu_flags will be passed to this argument
  */
 static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 		     unsigned long address, void *arg)
 {
-	struct mm_struct *mm = vma->vm_mm;
 	DEFINE_FOLIO_VMA_WALK(pvmw, folio, vma, address, 0);
-	pte_t pteval;
 	struct page *subpage;
-	bool anon_exclusive, ret = true;
+	bool ret = true;
 	struct mmu_notifier_range range;
 	enum ttu_flags flags = (enum ttu_flags)(long)arg;
 
@@ -1613,179 +1800,11 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 		subpage = folio_page(folio,
 					pte_pfn(*pvmw.pte) - folio_pfn(folio));
-		anon_exclusive = folio_test_anon(folio) &&
-				 PageAnonExclusive(subpage);
-
-		flush_cache_page(vma, address, pte_pfn(*pvmw.pte));
-		/* Nuke the page table entry. */
-		if (should_defer_flush(mm, flags)) {
-			/*
-			 * We clear the PTE but do not flush so potentially
-			 * a remote CPU could still be writing to the folio.
-			 * If the entry was previously clean then the
-			 * architecture must guarantee that a clear->dirty
-			 * transition on a cached TLB entry is written through
-			 * and traps if the PTE is unmapped.
-			 */
-			pteval = ptep_get_and_clear(mm, address, pvmw.pte);
-
-			set_tlb_ubc_flush_pending(mm, pte_dirty(pteval));
-		} else {
-			pteval = ptep_clear_flush(vma, address, pvmw.pte);
-		}
-
-		/*
-		 * Now the pte is cleared. If this pte was uffd-wp armed,
-		 * we may want to replace a none pte with a marker pte if
-		 * it's file-backed, so we don't lose the tracking info.
-		 */
-		pte_install_uffd_wp_if_needed(vma, address, pvmw.pte, pteval);
-
-		/* Set the dirty flag on the folio now the pte is gone. */
-		if (pte_dirty(pteval))
-			folio_mark_dirty(folio);
-
-		/* Update high watermark before we lower rss */
-		update_hiwater_rss(mm);
-
-		if (PageHWPoison(subpage) && (flags & TTU_HWPOISON)) {
-			pteval = swp_entry_to_pte(make_hwpoison_entry(subpage));
-			dec_mm_counter(mm, mm_counter(&folio->page));
-			set_pte_at(mm, address, pvmw.pte, pteval);
-		} else if (pte_unused(pteval) && !userfaultfd_armed(vma)) {
-			/*
-			 * The guest indicated that the page content is of no
-			 * interest anymore. Simply discard the pte, vmscan
-			 * will take care of the rest.
-			 * A future reference will then fault in a new zero
-			 * page. When userfaultfd is active, we must not drop
-			 * this page though, as its main user (postcopy
-			 * migration) will not expect userfaults on already
-			 * copied pages.
-			 */
-			dec_mm_counter(mm, mm_counter(&folio->page));
-			/* We have to invalidate as we cleared the pte */
-			mmu_notifier_invalidate_range(mm, address,
-						      address + PAGE_SIZE);
-		} else if (folio_test_anon(folio)) {
-			swp_entry_t entry = { .val = page_private(subpage) };
-			pte_t swp_pte;
-			/*
-			 * Store the swap location in the pte.
-			 * See handle_pte_fault() ...
-			 */
-			if (unlikely(folio_test_swapbacked(folio) !=
-						folio_test_swapcache(folio))) {
-				WARN_ON_ONCE(1);
-				ret = false;
-				/* We have to invalidate as we cleared the pte */
-				mmu_notifier_invalidate_range(mm, address,
-						address + PAGE_SIZE);
-				page_vma_mapped_walk_done(&pvmw);
-				break;
-			}
-
-			/* MADV_FREE page check */
-			if (!folio_test_swapbacked(folio)) {
-				int ref_count, map_count;
-
-				/*
-				 * Synchronize with gup_pte_range():
-				 * - clear PTE; barrier; read refcount
-				 * - inc refcount; barrier; read PTE
-				 */
-				smp_mb();
-
-				ref_count = folio_ref_count(folio);
-				map_count = folio_mapcount(folio);
-
-				/*
-				 * Order reads for page refcount and dirty flag
-				 * (see comments in __remove_mapping()).
-				 */
-				smp_rmb();
-
-				/*
-				 * The only page refs must be one from isolation
-				 * plus the rmap(s) (dropped by discard:).
-				 */
-				if (ref_count == 1 + map_count &&
-				    !folio_test_dirty(folio)) {
-					/* Invalidate as we cleared the pte */
-					mmu_notifier_invalidate_range(mm,
-						address, address + PAGE_SIZE);
-					dec_mm_counter(mm, MM_ANONPAGES);
-					goto discard;
-				}
-
-				/*
-				 * If the folio was redirtied, it cannot be
-				 * discarded. Remap the page to page table.
-				 */
-				set_pte_at(mm, address, pvmw.pte, pteval);
-				folio_set_swapbacked(folio);
-				ret = false;
-				page_vma_mapped_walk_done(&pvmw);
-				break;
-			}
-
-			if (swap_duplicate(entry) < 0) {
-				set_pte_at(mm, address, pvmw.pte, pteval);
-				ret = false;
-				page_vma_mapped_walk_done(&pvmw);
-				break;
-			}
-			if (arch_unmap_one(mm, vma, address, pteval) < 0) {
-				swap_free(entry);
-				set_pte_at(mm, address, pvmw.pte, pteval);
-				ret = false;
-				page_vma_mapped_walk_done(&pvmw);
-				break;
-			}
+		ret = try_to_unmap_one_page(folio, vma,
+						range, pvmw, address, flags);
+		if (!ret)
+			break;
-			/* See page_try_share_anon_rmap(): clear PTE first. */
-			if (anon_exclusive &&
-			    page_try_share_anon_rmap(subpage)) {
-				swap_free(entry);
-				set_pte_at(mm, address, pvmw.pte, pteval);
-				ret = false;
-				page_vma_mapped_walk_done(&pvmw);
-				break;
-			}
-			if (list_empty(&mm->mmlist)) {
-				spin_lock(&mmlist_lock);
-				if (list_empty(&mm->mmlist))
-					list_add(&mm->mmlist, &init_mm.mmlist);
-				spin_unlock(&mmlist_lock);
-			}
-			dec_mm_counter(mm, MM_ANONPAGES);
-			inc_mm_counter(mm, MM_SWAPENTS);
-			swp_pte = swp_entry_to_pte(entry);
-			if (anon_exclusive)
-				swp_pte = pte_swp_mkexclusive(swp_pte);
-			if (pte_soft_dirty(pteval))
-				swp_pte = pte_swp_mksoft_dirty(swp_pte);
-			if (pte_uffd_wp(pteval))
-				swp_pte = pte_swp_mkuffd_wp(swp_pte);
-			set_pte_at(mm, address, pvmw.pte, swp_pte);
-			/* Invalidate as we cleared the pte */
-			mmu_notifier_invalidate_range(mm, address,
-						      address + PAGE_SIZE);
-		} else {
-			/*
-			 * This is a locked file-backed folio,
-			 * so it cannot be removed from the page
-			 * cache and replaced by a new folio before
-			 * mmu_notifier_invalidate_range_end, so no
-			 * concurrent thread might update its page table
-			 * to point at a new folio while a device is
-			 * still using this folio.
-			 *
-			 * See Documentation/mm/mmu_notifier.rst
-			 */
-			dec_mm_counter(mm, mm_counter_file(&folio->page));
-		}
-discard:
 		/*
 		 * No need to call mmu_notifier_invalidate_range() it has be
 		 * done above for all cases requiring it to happen under page
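
A note on one piece of logic the move carries over verbatim: the MADV_FREE check.
A clean, lazily freed anonymous folio may be discarded only when its reference
count equals the isolation reference plus one reference per remaining mapping;
if it was redirtied, the PTE is restored and the folio is marked swapbacked
again. A rough model of that decision, using a made-up struct in place of
struct folio (the real code also orders these reads against GUP with
smp_mb()/smp_rmb()):

#include <stdbool.h>

/* toy stand-in for the folio state the real check reads */
struct folio_model {
	int refcount;		/* folio_ref_count() */
	int mapcount;		/* folio_mapcount() */
	bool dirty;		/* folio_test_dirty() */
	bool swapbacked;	/* folio_test_swapbacked(); cleared by MADV_FREE */
};

/*
 * Mirror of the check in the moved code: the only references allowed are
 * the one taken by isolation plus one per remaining mapping, and the folio
 * must still be clean; anything else means it was touched after MADV_FREE
 * and cannot be dropped.
 */
bool can_discard_lazyfree(const struct folio_model *f)
{
	if (f->swapbacked)	/* not lazily freed at all */
		return false;
	return f->refcount == 1 + f->mapcount && !f->dirty;
}

When this check fails because the folio was redirtied, the moved code restores
the PTE with set_pte_at(), sets the folio swapbacked again and returns false,
exactly as the open-coded version did.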