From patchwork Mon Mar 6 09:22:55 2023
X-Patchwork-Submitter: Yin Fengwei
X-Patchwork-Id: 13160757
From: Yin Fengwei <fengwei.yin@intel.com>
To: linux-mm@kvack.org, akpm@linux-foundation.org, willy@infradead.org, mike.kravetz@oracle.com, sidhartha.kumar@oracle.com, naoya.horiguchi@nec.com, jane.chu@oracle.com, david@redhat.com
Cc: fengwei.yin@intel.com
Subject: [PATCH v3 1/5] rmap: move hugetlb try_to_unmap to dedicated function
Date: Mon, 6 Mar 2023 17:22:55 +0800
Message-Id: <20230306092259.3507807-2-fengwei.yin@intel.com>
In-Reply-To: <20230306092259.3507807-1-fengwei.yin@intel.com>
References: <20230306092259.3507807-1-fengwei.yin@intel.com>

This prepares for batched rmap updates of large folios. There is no need
to handle hugetlb inside the mapped-walk loop; handle hugetlb in a
dedicated function and bail out of the loop early.
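The core of the change is a control-flow simplification in try_to_unmap_one():
a hugetlb folio is now detected at the top of the mapped-walk loop, handed to a
dedicated helper, and the loop is left immediately. The snippet below is a
minimal, self-contained sketch of that flow only, under simplified assumptions;
the type and helper names (folio_model, unmap_hugetlb_once, unmap_one_pte,
try_to_unmap_one_model) are illustrative stand-ins, not kernel APIs and not the
code added by this patch.

/*
 * Stand-alone model of the new control flow only.  The types and helpers
 * are made up for illustration; they are not the kernel's real APIs.
 */
#include <stdbool.h>
#include <stdio.h>

struct folio_model {
        bool is_hugetlb;
        int nr_ptes;
};

/* Stand-in for the dedicated helper (try_to_unmap_one_hugetlb). */
static bool unmap_hugetlb_once(struct folio_model *f)
{
        printf("hugetlb: unmap the whole %d-pte mapping in one step\n",
               f->nr_ptes);
        return true;
}

/* Stand-in for the per-pte work done for normal folios. */
static bool unmap_one_pte(struct folio_model *f, int i)
{
        (void)f;
        printf("pte %d: unmap\n", i);
        return true;
}

/* After the patch: hugetlb is handled once, then the walk loop is left. */
static bool try_to_unmap_one_model(struct folio_model *f)
{
        bool ret = true;

        for (int i = 0; i < f->nr_ptes; i++) {
                if (f->is_hugetlb) {
                        ret = unmap_hugetlb_once(f);
                        break;          /* no need to loop for hugetlb */
                }
                ret = unmap_one_pte(f, i);
        }
        return ret;
}

int main(void)
{
        struct folio_model huge = { .is_hugetlb = true, .nr_ptes = 512 };
        struct folio_model normal = { .is_hugetlb = false, .nr_ptes = 4 };

        try_to_unmap_one_model(&huge);
        try_to_unmap_one_model(&normal);
        return 0;
}

Running the sketch prints one line for the hugetlb case and one line per PTE
for the normal case, mirroring the early bail-out the patch introduces.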
Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
---
 mm/rmap.c | 200 +++++++++++++++++++++++++++++++++---------------------
 1 file changed, 121 insertions(+), 79 deletions(-)

diff --git a/mm/rmap.c b/mm/rmap.c
index ba901c416785..508d141dacc5 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1441,6 +1441,103 @@ void page_remove_rmap(struct page *page, struct vm_area_struct *vma,
         munlock_vma_folio(folio, vma, compound);
 }
 
+static bool try_to_unmap_one_hugetlb(struct folio *folio,
+                struct vm_area_struct *vma, struct mmu_notifier_range range,
+                struct page_vma_mapped_walk pvmw, unsigned long address,
+                enum ttu_flags flags)
+{
+        struct mm_struct *mm = vma->vm_mm;
+        pte_t pteval;
+        bool ret = true, anon = folio_test_anon(folio);
+
+        /*
+         * The try_to_unmap() is only passed a hugetlb page
+         * in the case where the hugetlb page is poisoned.
+         */
+        VM_BUG_ON_FOLIO(!folio_test_hwpoison(folio), folio);
+        /*
+         * huge_pmd_unshare may unmap an entire PMD page.
+         * There is no way of knowing exactly which PMDs may
+         * be cached for this mm, so we must flush them all.
+         * start/end were already adjusted above to cover this
+         * range.
+         */
+        flush_cache_range(vma, range.start, range.end);
+
+        /*
+         * To call huge_pmd_unshare, i_mmap_rwsem must be
+         * held in write mode. Caller needs to explicitly
+         * do this outside rmap routines.
+         *
+         * We also must hold hugetlb vma_lock in write mode.
+         * Lock order dictates acquiring vma_lock BEFORE
+         * i_mmap_rwsem. We can only try lock here and fail
+         * if unsuccessful.
+         */
+        if (!anon) {
+                VM_BUG_ON(!(flags & TTU_RMAP_LOCKED));
+                if (!hugetlb_vma_trylock_write(vma)) {
+                        ret = false;
+                        goto out;
+                }
+                if (huge_pmd_unshare(mm, vma, address, pvmw.pte)) {
+                        hugetlb_vma_unlock_write(vma);
+                        flush_tlb_range(vma,
+                                range.start, range.end);
+                        mmu_notifier_invalidate_range(mm,
+                                range.start, range.end);
+                        /*
+                         * The ref count of the PMD page was
+                         * dropped which is part of the way map
+                         * counting is done for shared PMDs.
+                         * Return 'true' here. When there is
+                         * no other sharing, huge_pmd_unshare
+                         * returns false and we will unmap the
+                         * actual page and drop map count
+                         * to zero.
+                         */
+                        goto out;
+                }
+                hugetlb_vma_unlock_write(vma);
+        }
+        pteval = huge_ptep_clear_flush(vma, address, pvmw.pte);
+
+        /*
+         * Now the pte is cleared. If this pte was uffd-wp armed,
+         * we may want to replace a none pte with a marker pte if
+         * it's file-backed, so we don't lose the tracking info.
+         */
+        pte_install_uffd_wp_if_needed(vma, address, pvmw.pte, pteval);
+
+        /* Set the dirty flag on the folio now the pte is gone. */
+        if (pte_dirty(pteval))
+                folio_mark_dirty(folio);
+
+        /* Update high watermark before we lower rss */
+        update_hiwater_rss(mm);
+
+        /* Poisoned hugetlb folio with TTU_HWPOISON always cleared in flags */
+        pteval = swp_entry_to_pte(make_hwpoison_entry(&folio->page));
+        set_huge_pte_at(mm, address, pvmw.pte, pteval);
+        hugetlb_count_sub(folio_nr_pages(folio), mm);
+
+        /*
+         * No need to call mmu_notifier_invalidate_range() it has be
+         * done above for all cases requiring it to happen under page
+         * table lock before mmu_notifier_invalidate_range_end()
+         *
+         * See Documentation/mm/mmu_notifier.rst
+         */
+        page_remove_rmap(&folio->page, vma, folio_test_hugetlb(folio));
+        /* No VM_LOCKED set in vma->vm_flags for hugetlb. So not
+         * necessary to call mlock_drain_local().
+         */
+        folio_put(folio);
+
+out:
+        return ret;
+}
+
 /*
  * @arg: enum ttu_flags will be passed to this argument
  */
@@ -1504,86 +1601,37 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
                         break;
                 }
 
+                address = pvmw.address;
+                if (folio_test_hugetlb(folio)) {
+                        ret = try_to_unmap_one_hugetlb(folio, vma, range,
+                                                pvmw, address, flags);
+
+                        /* no need to loop for hugetlb */
+                        page_vma_mapped_walk_done(&pvmw);
+                        break;
+                }
+
                 subpage = folio_page(folio,
                                 pte_pfn(*pvmw.pte) - folio_pfn(folio));
-                address = pvmw.address;
                 anon_exclusive = folio_test_anon(folio) &&
                                  PageAnonExclusive(subpage);
 
-                if (folio_test_hugetlb(folio)) {
-                        bool anon = folio_test_anon(folio);
-
-                        /*
-                         * The try_to_unmap() is only passed a hugetlb page
-                         * in the case where the hugetlb page is poisoned.
-                         */
-                        VM_BUG_ON_PAGE(!PageHWPoison(subpage), subpage);
+                flush_cache_page(vma, address, pte_pfn(*pvmw.pte));
+                /* Nuke the page table entry. */
+                if (should_defer_flush(mm, flags)) {
                         /*
-                         * huge_pmd_unshare may unmap an entire PMD page.
-                         * There is no way of knowing exactly which PMDs may
-                         * be cached for this mm, so we must flush them all.
-                         * start/end were already adjusted above to cover this
-                         * range.
+                         * We clear the PTE but do not flush so potentially
+                         * a remote CPU could still be writing to the folio.
+                         * If the entry was previously clean then the
+                         * architecture must guarantee that a clear->dirty
+                         * transition on a cached TLB entry is written through
+                         * and traps if the PTE is unmapped.
                          */
-                        flush_cache_range(vma, range.start, range.end);
+                        pteval = ptep_get_and_clear(mm, address, pvmw.pte);
 
-                        /*
-                         * To call huge_pmd_unshare, i_mmap_rwsem must be
-                         * held in write mode. Caller needs to explicitly
-                         * do this outside rmap routines.
-                         *
-                         * We also must hold hugetlb vma_lock in write mode.
-                         * Lock order dictates acquiring vma_lock BEFORE
-                         * i_mmap_rwsem. We can only try lock here and fail
-                         * if unsuccessful.
-                         */
-                        if (!anon) {
-                                VM_BUG_ON(!(flags & TTU_RMAP_LOCKED));
-                                if (!hugetlb_vma_trylock_write(vma)) {
-                                        page_vma_mapped_walk_done(&pvmw);
-                                        ret = false;
-                                        break;
-                                }
-                                if (huge_pmd_unshare(mm, vma, address, pvmw.pte)) {
-                                        hugetlb_vma_unlock_write(vma);
-                                        flush_tlb_range(vma,
-                                                range.start, range.end);
-                                        mmu_notifier_invalidate_range(mm,
-                                                range.start, range.end);
-                                        /*
-                                         * The ref count of the PMD page was
-                                         * dropped which is part of the way map
-                                         * counting is done for shared PMDs.
-                                         * Return 'true' here. When there is
-                                         * no other sharing, huge_pmd_unshare
-                                         * returns false and we will unmap the
-                                         * actual page and drop map count
-                                         * to zero.
-                                         */
-                                        page_vma_mapped_walk_done(&pvmw);
-                                        break;
-                                }
-                                hugetlb_vma_unlock_write(vma);
-                        }
-                        pteval = huge_ptep_clear_flush(vma, address, pvmw.pte);
+                        set_tlb_ubc_flush_pending(mm, pte_dirty(pteval));
                 } else {
-                        flush_cache_page(vma, address, pte_pfn(*pvmw.pte));
-                        /* Nuke the page table entry. */
-                        if (should_defer_flush(mm, flags)) {
-                                /*
-                                 * We clear the PTE but do not flush so potentially
-                                 * a remote CPU could still be writing to the folio.
-                                 * If the entry was previously clean then the
-                                 * architecture must guarantee that a clear->dirty
-                                 * transition on a cached TLB entry is written through
-                                 * and traps if the PTE is unmapped.
-                                 */
-                                pteval = ptep_get_and_clear(mm, address, pvmw.pte);
-
-                                set_tlb_ubc_flush_pending(mm, pte_dirty(pteval));
-                        } else {
-                                pteval = ptep_clear_flush(vma, address, pvmw.pte);
-                        }
+                        pteval = ptep_clear_flush(vma, address, pvmw.pte);
                 }
 
                 /*
@@ -1602,14 +1650,8 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 
                 if (PageHWPoison(subpage) && (flags & TTU_HWPOISON)) {
                         pteval = swp_entry_to_pte(make_hwpoison_entry(subpage));
-                        if (folio_test_hugetlb(folio)) {
-                                hugetlb_count_sub(folio_nr_pages(folio), mm);
-                                set_huge_pte_at(mm, address, pvmw.pte, pteval);
-                        } else {
-                                dec_mm_counter(mm, mm_counter(&folio->page));
-                                set_pte_at(mm, address, pvmw.pte, pteval);
-                        }
-
+                        dec_mm_counter(mm, mm_counter(&folio->page));
+                        set_pte_at(mm, address, pvmw.pte, pteval);
                 } else if (pte_unused(pteval) && !userfaultfd_armed(vma)) {
                         /*
                          * The guest indicated that the page content is of no