From patchwork Wed Apr 16 05:30:48 2025
X-Patchwork-Submitter: Dev Jain
X-Patchwork-Id: 14053176
From: Dev Jain
To: akpm@linux-foundation.org
Cc: ryan.roberts@arm.com, david@redhat.com, willy@infradead.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, hughd@google.com, vishal.moola@gmail.com, yang@os.amperecomputing.com, ziy@nvidia.com, Dev Jain
Subject: [PATCH v3] mempolicy: Optimize queue_folios_pte_range by PTE batching
Date: Wed, 16 Apr 2025 11:00:48 +0530
Message-Id: <20250416053048.96479-1-dev.jain@arm.com>
MIME-Version: 1.0
After the check for queue_folio_required(), the code only cares about the folio in the for loop, i.e. the individual PTEs are redundant. Therefore, optimize this loop by skipping over a whole PTE batch that maps the same folio.

With a test program migrating pages of the calling process, which includes a mapped VMA of size 4GB with PTE-mapped large folios of order-9, migrating once back and forth between node-0 and node-1, the average execution time reduces from 7.5 to 4 seconds, giving an approx 47% speedup.
v2->v3:
 - Don't use assignment in if condition
v1->v2:
 - Follow reverse xmas tree declarations
 - Don't initialize nr
 - Move folio_pte_batch() immediately after retrieving a normal folio
 - Increment nr_failed in one shot

Acked-by: David Hildenbrand
Signed-off-by: Dev Jain
---
 mm/mempolicy.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index b28a1e6ae096..4d2dc8b63965 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -566,6 +566,7 @@ static void queue_folios_pmd(pmd_t *pmd, struct mm_walk *walk)
 static int queue_folios_pte_range(pmd_t *pmd, unsigned long addr,
 			unsigned long end, struct mm_walk *walk)
 {
+	const fpb_t fpb_flags = FPB_IGNORE_DIRTY | FPB_IGNORE_SOFT_DIRTY;
 	struct vm_area_struct *vma = walk->vma;
 	struct folio *folio;
 	struct queue_pages *qp = walk->private;
@@ -573,6 +574,7 @@ static int queue_folios_pte_range(pmd_t *pmd, unsigned long addr,
 	pte_t *pte, *mapped_pte;
 	pte_t ptent;
 	spinlock_t *ptl;
+	int max_nr, nr;
 
 	ptl = pmd_trans_huge_lock(pmd, vma);
 	if (ptl) {
@@ -586,7 +588,9 @@ static int queue_folios_pte_range(pmd_t *pmd, unsigned long addr,
 		walk->action = ACTION_AGAIN;
 		return 0;
 	}
-	for (; addr != end; pte++, addr += PAGE_SIZE) {
+	for (; addr != end; pte += nr, addr += nr * PAGE_SIZE) {
+		max_nr = (end - addr) >> PAGE_SHIFT;
+		nr = 1;
 		ptent = ptep_get(pte);
 		if (pte_none(ptent))
 			continue;
@@ -598,6 +602,10 @@ static int queue_folios_pte_range(pmd_t *pmd, unsigned long addr,
 		folio = vm_normal_folio(vma, addr, ptent);
 		if (!folio || folio_is_zone_device(folio))
 			continue;
+		if (folio_test_large(folio) && max_nr != 1)
+			nr = folio_pte_batch(folio, addr, pte, ptent,
+					     max_nr, fpb_flags,
+					     NULL, NULL, NULL);
 		/*
 		 * vm_normal_folio() filters out zero pages, but there might
 		 * still be reserved folios to skip, perhaps in a VDSO.
@@ -630,7 +638,7 @@ static int queue_folios_pte_range(pmd_t *pmd, unsigned long addr,
 		if (!(flags & (MPOL_MF_MOVE | MPOL_MF_MOVE_ALL)) ||
 		    !vma_migratable(vma) ||
 		    !migrate_folio_add(folio, qp->pagelist, flags)) {
-			qp->nr_failed++;
+			qp->nr_failed += nr;
 			if (strictly_unmovable(flags))
 				break;
 		}