From patchwork Tue Apr 15 14:57:31 2025
X-Patchwork-Submitter: Dev Jain
X-Patchwork-Id: 14052374
From: Dev Jain <dev.jain@arm.com>
To: akpm@linux-foundation.org
Cc: ryan.roberts@arm.com, david@redhat.com, willy@infradead.org,
    linux-mm@kvack.org, linux-kernel@vger.kernel.org, hughd@google.com,
    vishal.moola@gmail.com, yang@os.amperecomputing.com, ziy@nvidia.com,
    Dev Jain <dev.jain@arm.com>
Subject: [PATCH v2] mempolicy: Optimize queue_folios_pte_range by PTE batching
Date: Tue, 15 Apr 2025 20:27:31 +0530
Message-Id: <20250415145731.86281-1-dev.jain@arm.com>

After the check for queue_folio_required(), the code only cares about the
folio in the for loop, i.e. the PTEs are redundant. Therefore, optimize
this loop by skipping over a PTE batch mapping the same folio.

With a test program migrating pages of the calling process, which includes
a mapped VMA of size 4GB with pte-mapped large folios of order-9, and
migrating once back and forth between node-0 and node-1, the average
execution time reduces from 7.5 to 4 seconds, giving an approx 47% speedup.
v1->v2:
 - Follow reverse xmas tree declarations
 - Don't initialize nr
 - Move folio_pte_batch() immediately after retrieving a normal folio
 - Increment nr_failed in one shot

Signed-off-by: Dev Jain <dev.jain@arm.com>
Acked-by: David Hildenbrand <david@redhat.com>
---
 mm/mempolicy.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index b28a1e6ae096..ca90cdcd3207 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -566,6 +566,7 @@ static void queue_folios_pmd(pmd_t *pmd, struct mm_walk *walk)
 static int queue_folios_pte_range(pmd_t *pmd, unsigned long addr,
 			unsigned long end, struct mm_walk *walk)
 {
+	const fpb_t fpb_flags = FPB_IGNORE_DIRTY | FPB_IGNORE_SOFT_DIRTY;
 	struct vm_area_struct *vma = walk->vma;
 	struct folio *folio;
 	struct queue_pages *qp = walk->private;
@@ -573,6 +574,7 @@ static int queue_folios_pte_range(pmd_t *pmd, unsigned long addr,
 	pte_t *pte, *mapped_pte;
 	pte_t ptent;
 	spinlock_t *ptl;
+	int max_nr, nr;
 
 	ptl = pmd_trans_huge_lock(pmd, vma);
 	if (ptl) {
@@ -586,7 +588,8 @@ static int queue_folios_pte_range(pmd_t *pmd, unsigned long addr,
 		walk->action = ACTION_AGAIN;
 		return 0;
 	}
-	for (; addr != end; pte++, addr += PAGE_SIZE) {
+	for (; addr != end; pte += nr, addr += nr * PAGE_SIZE) {
+		nr = 1;
 		ptent = ptep_get(pte);
 		if (pte_none(ptent))
 			continue;
@@ -598,6 +601,11 @@ static int queue_folios_pte_range(pmd_t *pmd, unsigned long addr,
 		folio = vm_normal_folio(vma, addr, ptent);
 		if (!folio || folio_is_zone_device(folio))
 			continue;
+		if (folio_test_large(folio) &&
+		    (max_nr = ((end - addr) >> PAGE_SHIFT)) != 1)
+			nr = folio_pte_batch(folio, addr, pte, ptent,
+					     max_nr, fpb_flags,
+					     NULL, NULL, NULL);
 		/*
 		 * vm_normal_folio() filters out zero pages, but there might
 		 * still be reserved folios to skip, perhaps in a VDSO.
@@ -630,7 +638,7 @@ static int queue_folios_pte_range(pmd_t *pmd, unsigned long addr,
 		if (!(flags & (MPOL_MF_MOVE | MPOL_MF_MOVE_ALL)) ||
 		    !vma_migratable(vma) ||
 		    !migrate_folio_add(folio, qp->pagelist, flags)) {
-			qp->nr_failed += nr;
+			qp->nr_failed += nr;
 			if (strictly_unmovable(flags))
 				break;
 		}