From patchwork Tue Feb 28 12:23:04 2023
X-Patchwork-Submitter: Yin Fengwei <fengwei.yin@intel.com>
X-Patchwork-Id: 13154853
From: Yin Fengwei <fengwei.yin@intel.com>
To: linux-mm@kvack.org, akpm@linux-foundation.org, willy@infradead.org,
	sidhartha.kumar@oracle.com, mike.kravetz@oracle.com,
	jane.chu@oracle.com, naoya.horiguchi@nec.com
Cc: fengwei.yin@intel.com
Subject: [PATCH v2 1/5] rmap: move hugetlb try_to_unmap to dedicated function
Date: Tue, 28 Feb 2023 20:23:04 +0800
Message-Id: <20230228122308.2972219-2-fengwei.yin@intel.com>
In-Reply-To: <20230228122308.2972219-1-fengwei.yin@intel.com>
References: <20230228122308.2972219-1-fengwei.yin@intel.com>

This prepares for batched rmap updates of large folios: hugetlb folios do
not need to be handled inside the pte-walk loop. Move the hugetlb path
into a dedicated function, handle it once, and bail out of the loop early.
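In rough terms, the resulting shape is "detect the special case inside the
walk loop, hand it to a dedicated helper, and break out", rather than
carrying a large hugetlb branch through every iteration. A minimal
stand-alone sketch of that shape (plain userspace C, not kernel code;
unmap_one_special() and unmap_one_pte() are hypothetical stand-ins for
try_to_unmap_one_hugetlb() and the regular per-pte path):

#include <stdbool.h>
#include <stdio.h>

struct mapping {
	bool special;		/* models folio_test_hugetlb() */
	int nr_ptes;
};

/* hypothetical stand-in for try_to_unmap_one_hugetlb() */
static bool unmap_one_special(struct mapping *m)
{
	printf("special mapping handled in one shot (%d ptes)\n", m->nr_ptes);
	return true;
}

/* hypothetical stand-in for the regular per-pte unmap path */
static bool unmap_one_pte(struct mapping *m, int idx)
{
	printf("pte %d of %d unmapped\n", idx, m->nr_ptes);
	return true;
}

static bool unmap(struct mapping *m)
{
	bool ret = true;

	for (int i = 0; i < m->nr_ptes; i++) {
		if (m->special) {
			/* dedicated helper; no need to keep walking */
			ret = unmap_one_special(m);
			break;
		}
		ret = unmap_one_pte(m, i);
	}
	return ret;
}

int main(void)
{
	struct mapping huge = { .special = true, .nr_ptes = 512 };
	struct mapping normal = { .special = false, .nr_ptes = 4 };

	unmap(&huge);
	unmap(&normal);
	return 0;
}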
Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
---
 mm/rmap.c | 200 +++++++++++++++++++++++++++++++++---------------------
 1 file changed, 121 insertions(+), 79 deletions(-)

diff --git a/mm/rmap.c b/mm/rmap.c
index 8632e02661ac..0f09518d6f30 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1443,6 +1443,103 @@ void page_remove_rmap(struct page *page, struct vm_area_struct *vma,
 	munlock_vma_folio(folio, vma, compound);
 }
 
+static bool try_to_unmap_one_hugetlb(struct folio *folio,
+		struct vm_area_struct *vma, struct mmu_notifier_range range,
+		struct page_vma_mapped_walk pvmw, unsigned long address,
+		enum ttu_flags flags)
+{
+	struct mm_struct *mm = vma->vm_mm;
+	pte_t pteval;
+	bool ret = true, anon = folio_test_anon(folio);
+
+	/*
+	 * The try_to_unmap() is only passed a hugetlb page
+	 * in the case where the hugetlb page is poisoned.
+	 */
+	VM_BUG_ON_FOLIO(!folio_test_hwpoison(folio), folio);
+	/*
+	 * huge_pmd_unshare may unmap an entire PMD page.
+	 * There is no way of knowing exactly which PMDs may
+	 * be cached for this mm, so we must flush them all.
+	 * start/end were already adjusted above to cover this
+	 * range.
+	 */
+	flush_cache_range(vma, range.start, range.end);
+
+	/*
+	 * To call huge_pmd_unshare, i_mmap_rwsem must be
+	 * held in write mode. Caller needs to explicitly
+	 * do this outside rmap routines.
+	 *
+	 * We also must hold hugetlb vma_lock in write mode.
+	 * Lock order dictates acquiring vma_lock BEFORE
+	 * i_mmap_rwsem. We can only try lock here and fail
+	 * if unsuccessful.
+	 */
+	if (!anon) {
+		VM_BUG_ON(!(flags & TTU_RMAP_LOCKED));
+		if (!hugetlb_vma_trylock_write(vma)) {
+			ret = false;
+			goto out;
+		}
+		if (huge_pmd_unshare(mm, vma, address, pvmw.pte)) {
+			hugetlb_vma_unlock_write(vma);
+			flush_tlb_range(vma,
+				range.start, range.end);
+			mmu_notifier_invalidate_range(mm,
+				range.start, range.end);
+			/*
+			 * The ref count of the PMD page was
+			 * dropped which is part of the way map
+			 * counting is done for shared PMDs.
+			 * Return 'true' here. When there is
+			 * no other sharing, huge_pmd_unshare
+			 * returns false and we will unmap the
+			 * actual page and drop map count
+			 * to zero.
+			 */
+			goto out;
+		}
+		hugetlb_vma_unlock_write(vma);
+	}
+	pteval = huge_ptep_clear_flush(vma, address, pvmw.pte);
+
+	/*
+	 * Now the pte is cleared. If this pte was uffd-wp armed,
+	 * we may want to replace a none pte with a marker pte if
+	 * it's file-backed, so we don't lose the tracking info.
+	 */
+	pte_install_uffd_wp_if_needed(vma, address, pvmw.pte, pteval);
+
+	/* Set the dirty flag on the folio now the pte is gone. */
+	if (pte_dirty(pteval))
+		folio_mark_dirty(folio);
+
+	/* Update high watermark before we lower rss */
+	update_hiwater_rss(mm);
+
+	/* Poisoned hugetlb folio with TTU_HWPOISON always cleared in flags */
+	pteval = swp_entry_to_pte(make_hwpoison_entry(&folio->page));
+	set_huge_pte_at(mm, address, pvmw.pte, pteval);
+	hugetlb_count_sub(folio_nr_pages(folio), mm);
+
+	/*
+	 * No need to call mmu_notifier_invalidate_range() it has be
+	 * done above for all cases requiring it to happen under page
+	 * table lock before mmu_notifier_invalidate_range_end()
+	 *
+	 * See Documentation/mm/mmu_notifier.rst
+	 */
+	page_remove_rmap(&folio->page, vma, folio_test_hugetlb(folio));
+	/* No VM_LOCKED set in vma->vm_flags for hugetlb. So not
+	 * necessary to call mlock_drain_local().
+	 */
+	folio_put(folio);
+
+out:
+	return ret;
+}
+
 /*
  * @arg: enum ttu_flags will be passed to this argument
  */
@@ -1506,86 +1603,37 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 			break;
 		}
 
+		address = pvmw.address;
+		if (folio_test_hugetlb(folio)) {
+			ret = try_to_unmap_one_hugetlb(folio, vma, range,
+						pvmw, address, flags);
+
+			/* no need to loop for hugetlb */
+			page_vma_mapped_walk_done(&pvmw);
+			break;
+		}
+
 		subpage = folio_page(folio,
 					pte_pfn(*pvmw.pte) - folio_pfn(folio));
-		address = pvmw.address;
 		anon_exclusive = folio_test_anon(folio) &&
 				 PageAnonExclusive(subpage);
 
-		if (folio_test_hugetlb(folio)) {
-			bool anon = folio_test_anon(folio);
-
-			/*
-			 * The try_to_unmap() is only passed a hugetlb page
-			 * in the case where the hugetlb page is poisoned.
-			 */
-			VM_BUG_ON_PAGE(!PageHWPoison(subpage), subpage);
+		flush_cache_page(vma, address, pte_pfn(*pvmw.pte));
+		/* Nuke the page table entry. */
+		if (should_defer_flush(mm, flags)) {
 			/*
-			 * huge_pmd_unshare may unmap an entire PMD page.
-			 * There is no way of knowing exactly which PMDs may
-			 * be cached for this mm, so we must flush them all.
-			 * start/end were already adjusted above to cover this
-			 * range.
+			 * We clear the PTE but do not flush so potentially
+			 * a remote CPU could still be writing to the folio.
+			 * If the entry was previously clean then the
+			 * architecture must guarantee that a clear->dirty
+			 * transition on a cached TLB entry is written through
+			 * and traps if the PTE is unmapped.
 			 */
-			flush_cache_range(vma, range.start, range.end);
+			pteval = ptep_get_and_clear(mm, address, pvmw.pte);
 
-			/*
-			 * To call huge_pmd_unshare, i_mmap_rwsem must be
-			 * held in write mode. Caller needs to explicitly
-			 * do this outside rmap routines.
-			 *
-			 * We also must hold hugetlb vma_lock in write mode.
-			 * Lock order dictates acquiring vma_lock BEFORE
-			 * i_mmap_rwsem. We can only try lock here and fail
-			 * if unsuccessful.
-			 */
-			if (!anon) {
-				VM_BUG_ON(!(flags & TTU_RMAP_LOCKED));
-				if (!hugetlb_vma_trylock_write(vma)) {
-					page_vma_mapped_walk_done(&pvmw);
-					ret = false;
-					break;
-				}
-				if (huge_pmd_unshare(mm, vma, address, pvmw.pte)) {
-					hugetlb_vma_unlock_write(vma);
-					flush_tlb_range(vma,
-						range.start, range.end);
-					mmu_notifier_invalidate_range(mm,
-						range.start, range.end);
-					/*
-					 * The ref count of the PMD page was
-					 * dropped which is part of the way map
-					 * counting is done for shared PMDs.
-					 * Return 'true' here. When there is
-					 * no other sharing, huge_pmd_unshare
-					 * returns false and we will unmap the
-					 * actual page and drop map count
-					 * to zero.
-					 */
-					page_vma_mapped_walk_done(&pvmw);
-					break;
-				}
-				hugetlb_vma_unlock_write(vma);
-			}
-			pteval = huge_ptep_clear_flush(vma, address, pvmw.pte);
+			set_tlb_ubc_flush_pending(mm, pte_dirty(pteval));
 		} else {
-			flush_cache_page(vma, address, pte_pfn(*pvmw.pte));
-			/* Nuke the page table entry. */
-			if (should_defer_flush(mm, flags)) {
-				/*
-				 * We clear the PTE but do not flush so potentially
-				 * a remote CPU could still be writing to the folio.
-				 * If the entry was previously clean then the
-				 * architecture must guarantee that a clear->dirty
-				 * transition on a cached TLB entry is written through
-				 * and traps if the PTE is unmapped.
-				 */
-				pteval = ptep_get_and_clear(mm, address, pvmw.pte);
-
-				set_tlb_ubc_flush_pending(mm, pte_dirty(pteval));
-			} else {
-				pteval = ptep_clear_flush(vma, address, pvmw.pte);
-			}
+			pteval = ptep_clear_flush(vma, address, pvmw.pte);
 		}
 
 		/*
@@ -1604,14 +1652,8 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 
 		if (PageHWPoison(subpage) && (flags & TTU_HWPOISON)) {
 			pteval = swp_entry_to_pte(make_hwpoison_entry(subpage));
-			if (folio_test_hugetlb(folio)) {
-				hugetlb_count_sub(folio_nr_pages(folio), mm);
-				set_huge_pte_at(mm, address, pvmw.pte, pteval);
-			} else {
-				dec_mm_counter(mm, mm_counter(&folio->page));
-				set_pte_at(mm, address, pvmw.pte, pteval);
-			}
-
+			dec_mm_counter(mm, mm_counter(&folio->page));
+			set_pte_at(mm, address, pvmw.pte, pteval);
 		} else if (pte_unused(pteval) && !userfaultfd_armed(vma)) {
 			/*
 			 * The guest indicated that the page content is of no