From patchwork Tue Feb 28 12:23:04 2023
X-Patchwork-Submitter: Yin Fengwei <fengwei.yin@intel.com>
X-Patchwork-Id: 13154853
From: Yin Fengwei <fengwei.yin@intel.com>
To: linux-mm@kvack.org, akpm@linux-foundation.org, willy@infradead.org,
	sidhartha.kumar@oracle.com, mike.kravetz@oracle.com,
	jane.chu@oracle.com, naoya.horiguchi@nec.com
Cc: fengwei.yin@intel.com
Subject: [PATCH v2 1/5] rmap: move hugetlb try_to_unmap to dedicated function
Date: Tue, 28 Feb 2023 20:23:04 +0800
Message-Id: <20230228122308.2972219-2-fengwei.yin@intel.com>
In-Reply-To: <20230228122308.2972219-1-fengwei.yin@intel.com>
References: <20230228122308.2972219-1-fengwei.yin@intel.com>

This prepares for batched rmap updates of large folios: hugetlb folios do
not need to be handled inside the pte-walk loop. Move the hugetlb path
into a dedicated function, handle it once, and bail out of the loop early.
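In rough terms, the resulting shape is "detect the special case inside the
walk loop, hand it to a dedicated helper, and break out", rather than
carrying a large hugetlb branch through every iteration. A minimal
stand-alone sketch of that shape (plain userspace C, not kernel code;
unmap_one_special() and unmap_one_pte() are hypothetical stand-ins for
try_to_unmap_one_hugetlb() and the regular per-pte path):

#include <stdbool.h>
#include <stdio.h>

struct mapping {
	bool special;		/* models folio_test_hugetlb() */
	int nr_ptes;
};

/* hypothetical stand-in for try_to_unmap_one_hugetlb() */
static bool unmap_one_special(struct mapping *m)
{
	printf("special mapping handled in one shot (%d ptes)\n", m->nr_ptes);
	return true;
}

/* hypothetical stand-in for the regular per-pte unmap path */
static bool unmap_one_pte(struct mapping *m, int idx)
{
	printf("pte %d of %d unmapped\n", idx, m->nr_ptes);
	return true;
}

static bool unmap(struct mapping *m)
{
	bool ret = true;

	for (int i = 0; i < m->nr_ptes; i++) {
		if (m->special) {
			/* dedicated helper; no need to keep walking */
			ret = unmap_one_special(m);
			break;
		}
		ret = unmap_one_pte(m, i);
	}
	return ret;
}

int main(void)
{
	struct mapping huge = { .special = true, .nr_ptes = 512 };
	struct mapping normal = { .special = false, .nr_ptes = 4 };

	unmap(&huge);
	unmap(&normal);
	return 0;
}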
Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
---
 mm/rmap.c | 200 +++++++++++++++++++++++++++++++++---------------------
 1 file changed, 121 insertions(+), 79 deletions(-)

diff --git a/mm/rmap.c b/mm/rmap.c
index 8632e02661ac..0f09518d6f30 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1443,6 +1443,103 @@ void page_remove_rmap(struct page *page, struct vm_area_struct *vma,
 	munlock_vma_folio(folio, vma, compound);
 }
 
+static bool try_to_unmap_one_hugetlb(struct folio *folio,
+		struct vm_area_struct *vma, struct mmu_notifier_range range,
+		struct page_vma_mapped_walk pvmw, unsigned long address,
+		enum ttu_flags flags)
+{
+	struct mm_struct *mm = vma->vm_mm;
+	pte_t pteval;
+	bool ret = true, anon = folio_test_anon(folio);
+
+	/*
+	 * The try_to_unmap() is only passed a hugetlb page
+	 * in the case where the hugetlb page is poisoned.
+	 */
+	VM_BUG_ON_FOLIO(!folio_test_hwpoison(folio), folio);
+	/*
+	 * huge_pmd_unshare may unmap an entire PMD page.
+	 * There is no way of knowing exactly which PMDs may
+	 * be cached for this mm, so we must flush them all.
+	 * start/end were already adjusted above to cover this
+	 * range.
+	 */
+	flush_cache_range(vma, range.start, range.end);
+
+	/*
+	 * To call huge_pmd_unshare, i_mmap_rwsem must be
+	 * held in write mode. Caller needs to explicitly
+	 * do this outside rmap routines.
+	 *
+	 * We also must hold hugetlb vma_lock in write mode.
+	 * Lock order dictates acquiring vma_lock BEFORE
+	 * i_mmap_rwsem. We can only try lock here and fail
+	 * if unsuccessful.
+	 */
+	if (!anon) {
+		VM_BUG_ON(!(flags & TTU_RMAP_LOCKED));
+		if (!hugetlb_vma_trylock_write(vma)) {
+			ret = false;
+			goto out;
+		}
+		if (huge_pmd_unshare(mm, vma, address, pvmw.pte)) {
+			hugetlb_vma_unlock_write(vma);
+			flush_tlb_range(vma,
+				range.start, range.end);
+			mmu_notifier_invalidate_range(mm,
+				range.start, range.end);
+			/*
+			 * The ref count of the PMD page was
+			 * dropped which is part of the way map
+			 * counting is done for shared PMDs.
+			 * Return 'true' here. When there is
+			 * no other sharing, huge_pmd_unshare
+			 * returns false and we will unmap the
+			 * actual page and drop map count
+			 * to zero.
+			 */
+			goto out;
+		}
+		hugetlb_vma_unlock_write(vma);
+	}
+	pteval = huge_ptep_clear_flush(vma, address, pvmw.pte);
+
+	/*
+	 * Now the pte is cleared. If this pte was uffd-wp armed,
+	 * we may want to replace a none pte with a marker pte if
+	 * it's file-backed, so we don't lose the tracking info.
+	 */
+	pte_install_uffd_wp_if_needed(vma, address, pvmw.pte, pteval);
+
+	/* Set the dirty flag on the folio now the pte is gone. */
+	if (pte_dirty(pteval))
+		folio_mark_dirty(folio);
+
+	/* Update high watermark before we lower rss */
+	update_hiwater_rss(mm);
+
+	/* Poisoned hugetlb folio with TTU_HWPOISON always cleared in flags */
+	pteval = swp_entry_to_pte(make_hwpoison_entry(&folio->page));
+	set_huge_pte_at(mm, address, pvmw.pte, pteval);
+	hugetlb_count_sub(folio_nr_pages(folio), mm);
+
+	/*
+	 * No need to call mmu_notifier_invalidate_range() it has be
+	 * done above for all cases requiring it to happen under page
+	 * table lock before mmu_notifier_invalidate_range_end()
+	 *
+	 * See Documentation/mm/mmu_notifier.rst
+	 */
+	page_remove_rmap(&folio->page, vma, folio_test_hugetlb(folio));
+	/* No VM_LOCKED set in vma->vm_flags for hugetlb. So not
+	 * necessary to call mlock_drain_local().
+	 */
+	folio_put(folio);
+
+out:
+	return ret;
+}
+
 /*
  * @arg: enum ttu_flags will be passed to this argument
  */
@@ -1506,86 +1603,37 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 			break;
 		}
 
+		address = pvmw.address;
+		if (folio_test_hugetlb(folio)) {
+			ret = try_to_unmap_one_hugetlb(folio, vma, range,
+						pvmw, address, flags);
+
+			/* no need to loop for hugetlb */
+			page_vma_mapped_walk_done(&pvmw);
+			break;
+		}
+
 		subpage = folio_page(folio,
 					pte_pfn(*pvmw.pte) - folio_pfn(folio));
-		address = pvmw.address;
 		anon_exclusive = folio_test_anon(folio) &&
 				 PageAnonExclusive(subpage);
 
-		if (folio_test_hugetlb(folio)) {
-			bool anon = folio_test_anon(folio);
-
-			/*
-			 * The try_to_unmap() is only passed a hugetlb page
-			 * in the case where the hugetlb page is poisoned.
-			 */
-			VM_BUG_ON_PAGE(!PageHWPoison(subpage), subpage);
+		flush_cache_page(vma, address, pte_pfn(*pvmw.pte));
+		/* Nuke the page table entry. */
+		if (should_defer_flush(mm, flags)) {
 			/*
-			 * huge_pmd_unshare may unmap an entire PMD page.
-			 * There is no way of knowing exactly which PMDs may
-			 * be cached for this mm, so we must flush them all.
-			 * start/end were already adjusted above to cover this
-			 * range.
+			 * We clear the PTE but do not flush so potentially
+			 * a remote CPU could still be writing to the folio.
+			 * If the entry was previously clean then the
+			 * architecture must guarantee that a clear->dirty
+			 * transition on a cached TLB entry is written through
+			 * and traps if the PTE is unmapped.
 			 */
-			flush_cache_range(vma, range.start, range.end);
+			pteval = ptep_get_and_clear(mm, address, pvmw.pte);
 
-			/*
-			 * To call huge_pmd_unshare, i_mmap_rwsem must be
-			 * held in write mode. Caller needs to explicitly
-			 * do this outside rmap routines.
-			 *
-			 * We also must hold hugetlb vma_lock in write mode.
-			 * Lock order dictates acquiring vma_lock BEFORE
-			 * i_mmap_rwsem. We can only try lock here and fail
-			 * if unsuccessful.
-			 */
-			if (!anon) {
-				VM_BUG_ON(!(flags & TTU_RMAP_LOCKED));
-				if (!hugetlb_vma_trylock_write(vma)) {
-					page_vma_mapped_walk_done(&pvmw);
-					ret = false;
-					break;
-				}
-				if (huge_pmd_unshare(mm, vma, address, pvmw.pte)) {
-					hugetlb_vma_unlock_write(vma);
-					flush_tlb_range(vma,
-						range.start, range.end);
-					mmu_notifier_invalidate_range(mm,
-						range.start, range.end);
-					/*
-					 * The ref count of the PMD page was
-					 * dropped which is part of the way map
-					 * counting is done for shared PMDs.
-					 * Return 'true' here. When there is
-					 * no other sharing, huge_pmd_unshare
-					 * returns false and we will unmap the
-					 * actual page and drop map count
-					 * to zero.
-					 */
-					page_vma_mapped_walk_done(&pvmw);
-					break;
-				}
-				hugetlb_vma_unlock_write(vma);
-			}
-			pteval = huge_ptep_clear_flush(vma, address, pvmw.pte);
+			set_tlb_ubc_flush_pending(mm, pte_dirty(pteval));
 		} else {
-			flush_cache_page(vma, address, pte_pfn(*pvmw.pte));
-			/* Nuke the page table entry. */
-			if (should_defer_flush(mm, flags)) {
-				/*
-				 * We clear the PTE but do not flush so potentially
-				 * a remote CPU could still be writing to the folio.
-				 * If the entry was previously clean then the
-				 * architecture must guarantee that a clear->dirty
-				 * transition on a cached TLB entry is written through
-				 * and traps if the PTE is unmapped.
-				 */
-				pteval = ptep_get_and_clear(mm, address, pvmw.pte);
-
-				set_tlb_ubc_flush_pending(mm, pte_dirty(pteval));
-			} else {
-				pteval = ptep_clear_flush(vma, address, pvmw.pte);
-			}
+			pteval = ptep_clear_flush(vma, address, pvmw.pte);
 		}
 
 		/*
@@ -1604,14 +1652,8 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 
 		if (PageHWPoison(subpage) && (flags & TTU_HWPOISON)) {
 			pteval = swp_entry_to_pte(make_hwpoison_entry(subpage));
-			if (folio_test_hugetlb(folio)) {
-				hugetlb_count_sub(folio_nr_pages(folio), mm);
-				set_huge_pte_at(mm, address, pvmw.pte, pteval);
-			} else {
-				dec_mm_counter(mm, mm_counter(&folio->page));
-				set_pte_at(mm, address, pvmw.pte, pteval);
-			}
-
+			dec_mm_counter(mm, mm_counter(&folio->page));
+			set_pte_at(mm, address, pvmw.pte, pteval);
 		} else if (pte_unused(pteval) && !userfaultfd_armed(vma)) {
 			/*
 			 * The guest indicated that the page content is of no