From patchwork Mon Sep 25 08:35:03 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Hugh Dickins <hughd@google.com>
X-Patchwork-Id: 13397490
Date: Mon, 25 Sep 2023 01:35:03 -0700 (PDT)
From: Hugh Dickins <hughd@google.com>
X-X-Sender: hugh@ripple.attlocal.net
To: Andrew Morton <akpm@linux-foundation.org>
cc: Andi Kleen <ak@linux.intel.com>, Christoph Lameter <cl@linux.com>,
    Matthew Wilcox <willy@infradead.org>,
    Mike Kravetz <mike.kravetz@oracle.com>,
    David Hildenbrand <david@redhat.com>,
    Suren Baghdasaryan <surenb@google.com>, Yang Shi <shy828301@gmail.com>,
    Sidhartha Kumar <sidhartha.kumar@oracle.com>,
    Vishal Moola <vishal.moola@gmail.com>,
    Kefeng Wang <wangkefeng.wang@huawei.com>,
    Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
    Tejun Heo <tj@kernel.org>,
    Mel Gorman <mgorman@techsingularity.net>, Michal Hocko <mhocko@suse.com>,
    linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: [PATCH 11/12] mempolicy: mmap_lock is not needed while migrating
 folios
In-Reply-To: <2d872cef-7787-a7ca-10e-9d45a64c80b4@google.com>
Message-ID: <73183de1-6529-b146-f2cc-fcd5b812166@google.com>
References: <2d872cef-7787-a7ca-10e-9d45a64c80b4@google.com>
Sender: owner-linux-mm@kvack.org
Precedence: bulk
X-Loop: owner-majordomo@kvack.org
List-ID: <linux-mm.kvack.org>

mbind(2) holds down_write of current task's mmap_lock throughout
(exclusive because it needs to set the new mempolicy on the vmas);
migrate_pages(2) holds down_read of pid's mmap_lock throughout.

They both hold mmap_lock across the internal migrate_pages(), under which
all new page allocations (huge or small) are made.  I'm nervous about it;
and migrate_pages() certainly does not need mmap_lock itself.  It's done
this way for mbind(2), because its page allocator is vma_alloc_folio() or
alloc_hugetlb_folio_vma(), both of which depend on vma and address.

Now that we have alloc_pages_mpol(), depending on (refcounted) memory
policy and interleave index, mbind(2) can be modified to use that or
alloc_hugetlb_folio_nodemask(), and then not need mmap_lock across the
internal migrate_pages() at all: add alloc_migration_target_by_mpol()
to replace mbind's new_folio().

(After that change, alloc_hugetlb_folio_vma() is used by nothing but a
userfaultfd function: move it out of hugetlb.h and into the #ifdef.)

migrate_pages(2) has chosen its target node before migrating, so can
continue to use the standard alloc_migration_target(); but let it take
and drop mmap_lock just around migrate_to_node()'s queue_pages_range():
neither the node-to-node calculations nor the page migrations need it.

It seems unlikely, but it is conceivable that some userspace depends on
the kernel's mmap_lock exclusion here, instead of doing its own locking:
more likely in a testsuite than in real life.  It is also possible, of
course, that some pages on the list will be munmapped by another thread
before they are migrated, or a newer memory policy applied to the range
by that time: but such races could happen before, as soon as mmap_lock
was dropped, so it does not appear to be a concern.

Signed-off-by: Hugh Dickins <hughd@google.com>
---
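Note (illustrative only, not part of the patch): the interleave index
that alloc_migration_target_by_mpol() hands to alloc_pages_mpol() can be
pictured with the userspace sketch below.  The toy_* names are
hypothetical, and picking nodes[ilx % nr_nodes] is an assumed
simplification of how an interleave policy might map an index to a node;
the kernel's own selection is not reproduced here.  For hugetlb folios
the patch adds src->index unshifted, since hugetlbfs already indexes in
huge-page units.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper mirroring "ilx += src->index >> order":
 * folios of the same order at consecutive offsets get consecutive indexes.
 */
static unsigned long toy_interleave_index(unsigned long base_ilx,
					  unsigned long index,
					  unsigned int order)
{
	return base_ilx + (index >> order);
}

/*
 * Assumed simplification: spread indexes round-robin over the allowed
 * nodes.  This is a model for illustration, not the kernel's code.
 */
static int toy_interleave_node(const int *nodes, size_t nr_nodes,
			       unsigned long ilx)
{
	assert(nr_nodes > 0);
	return nodes[ilx % nr_nodes];
}
```

In this model, with allowed nodes {0, 2, 3}, an order-9 folio at page
index 1024 gets interleave index 2 and so lands on node 3.
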
 include/linux/hugetlb.h |  9 -----
 mm/hugetlb.c            | 38 +++++++++---------
 mm/mempolicy.c          | 85 +++++++++++++++++++++--------------------
 3 files changed, 64 insertions(+), 68 deletions(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 6522eb3cd007..9c4265c73f76 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -714,8 +714,6 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
 				unsigned long addr, int avoid_reserve);
 struct folio *alloc_hugetlb_folio_nodemask(struct hstate *h, int preferred_nid,
 				nodemask_t *nmask, gfp_t gfp_mask);
-struct folio *alloc_hugetlb_folio_vma(struct hstate *h, struct vm_area_struct *vma,
-				unsigned long address);
 int hugetlb_add_to_page_cache(struct folio *folio, struct address_space *mapping,
 			pgoff_t idx);
 void restore_reserve_on_error(struct hstate *h, struct vm_area_struct *vma,
@@ -1024,13 +1022,6 @@ alloc_hugetlb_folio_nodemask(struct hstate *h, int preferred_nid,
 	return NULL;
 }
 
-static inline struct folio *alloc_hugetlb_folio_vma(struct hstate *h,
-					       struct vm_area_struct *vma,
-					       unsigned long address)
-{
-	return NULL;
-}
-
 static inline int __alloc_bootmem_huge_page(struct hstate *h)
 {
 	return 0;
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index ba6d39b71cb1..1af54dbbd7cc 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -2479,24 +2479,6 @@ struct folio *alloc_hugetlb_folio_nodemask(struct hstate *h, int preferred_nid,
 	return alloc_migrate_hugetlb_folio(h, gfp_mask, preferred_nid, nmask);
 }
 
-/* mempolicy aware migration callback */
-struct folio *alloc_hugetlb_folio_vma(struct hstate *h, struct vm_area_struct *vma,
-		unsigned long address)
-{
-	struct mempolicy *mpol;
-	nodemask_t *nodemask;
-	struct folio *folio;
-	gfp_t gfp_mask;
-	int node;
-
-	gfp_mask = htlb_alloc_mask(h);
-	node = huge_node(vma, address, gfp_mask, &mpol, &nodemask);
-	folio = alloc_hugetlb_folio_nodemask(h, node, nodemask, gfp_mask);
-	mpol_cond_put(mpol);
-
-	return folio;
-}
-
 /*
  * Increase the hugetlb pool such that it can accommodate a reservation
  * of size 'delta'.
@@ -6225,6 +6207,26 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 }
 
 #ifdef CONFIG_USERFAULTFD
+/*
+ * Can probably be eliminated, but still used by hugetlb_mfill_atomic_pte().
+ */
+static struct folio *alloc_hugetlb_folio_vma(struct hstate *h,
+		struct vm_area_struct *vma, unsigned long address)
+{
+	struct mempolicy *mpol;
+	nodemask_t *nodemask;
+	struct folio *folio;
+	gfp_t gfp_mask;
+	int node;
+
+	gfp_mask = htlb_alloc_mask(h);
+	node = huge_node(vma, address, gfp_mask, &mpol, &nodemask);
+	folio = alloc_hugetlb_folio_nodemask(h, node, nodemask, gfp_mask);
+	mpol_cond_put(mpol);
+
+	return folio;
+}
+
 /*
  * Used by userfaultfd UFFDIO_* ioctls. Based on userfaultfd's mfill_atomic_pte
  * with modifications for hugetlb pages.
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index d74df1e1b14a..74b1894d29c1 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -417,6 +417,8 @@ static const struct mempolicy_operations mpol_ops[MPOL_MAX] = {
 
 static bool migrate_folio_add(struct folio *folio, struct list_head *foliolist,
 				unsigned long flags);
+static nodemask_t *policy_nodemask(gfp_t gfp, struct mempolicy *pol,
+				pgoff_t ilx, int *nid);
 
 static bool strictly_unmovable(unsigned long flags)
 {
@@ -1040,6 +1042,8 @@ static long migrate_to_node(struct mm_struct *mm, int source, int dest,
 	node_set(source, nmask);
 
 	VM_BUG_ON(!(flags & (MPOL_MF_MOVE | MPOL_MF_MOVE_ALL)));
+
+	mmap_read_lock(mm);
 	vma = find_vma(mm, 0);
 
 	/*
@@ -1050,6 +1054,7 @@ static long migrate_to_node(struct mm_struct *mm, int source, int dest,
 	 */
 	nr_failed = queue_pages_range(mm, vma->vm_start, mm->task_size, &nmask,
 				      flags | MPOL_MF_DISCONTIG_OK, &pagelist);
+	mmap_read_unlock(mm);
 
 	if (!list_empty(&pagelist)) {
 		err = migrate_pages(&pagelist, alloc_migration_target, NULL,
@@ -1078,8 +1083,6 @@ int do_migrate_pages(struct mm_struct *mm, const nodemask_t *from,
 
 	lru_cache_disable();
 
-	mmap_read_lock(mm);
-
 	/*
 	 * Find a 'source' bit set in 'tmp' whose corresponding 'dest'
 	 * bit in 'to' is not also set in 'tmp'.  Clear the found 'source'
@@ -1159,7 +1162,6 @@ int do_migrate_pages(struct mm_struct *mm, const nodemask_t *from,
 		if (err < 0)
 			break;
 	}
-	mmap_read_unlock(mm);
 
 	lru_cache_enable();
 	if (err < 0)
@@ -1168,44 +1170,38 @@ int do_migrate_pages(struct mm_struct *mm, const nodemask_t *from,
 }
 
 /*
- * Allocate a new page for page migration based on vma policy.
- * Start by assuming the page is mapped by the same vma as contains @start.
- * Search forward from there, if not.  N.B., this assumes that the
- * list of pages handed to migrate_pages()--which is how we get here--
- * is in virtual address order.
+ * Allocate a new folio for page migration, according to NUMA mempolicy.
  */
-static struct folio *new_folio(struct folio *src, unsigned long start)
+static struct folio *alloc_migration_target_by_mpol(struct folio *src,
+						    unsigned long private)
 {
-	struct vm_area_struct *vma;
-	unsigned long address;
-	VMA_ITERATOR(vmi, current->mm, start);
-	gfp_t gfp = GFP_HIGHUSER_MOVABLE | __GFP_RETRY_MAYFAIL;
-
-	for_each_vma(vmi, vma) {
-		address = page_address_in_vma(&src->page, vma);
-		if (address != -EFAULT)
-			break;
-	}
-
-	/*
-	 * __get_vma_policy() now expects a genuine non-NULL vma. Return NULL
-	 * when the page can no longer be located in a vma: that is not ideal
-	 * (migrate_pages() will give up early, presuming ENOMEM), but good
-	 * enough to avoid a crash by syzkaller or concurrent holepunch.
-	 */
-	if (!vma)
-		return NULL;
+	struct mempolicy *pol = (struct mempolicy *)private;
+	pgoff_t ilx = 0;	/* improve on this later */
+	struct page *page;
+	unsigned int order;
+	int nid = numa_node_id();
+	gfp_t gfp;
 
 	if (folio_test_hugetlb(src)) {
-		return alloc_hugetlb_folio_vma(folio_hstate(src),
-				vma, address);
+		nodemask_t *nodemask;
+		struct hstate *h;
+
+		ilx += src->index;	/* HugeTLBfs indexes in hpage_size */
+		h = folio_hstate(src);
+		gfp = htlb_alloc_mask(h);
+		nodemask = policy_nodemask(gfp, pol, ilx, &nid);
+		return alloc_hugetlb_folio_nodemask(h, nid, nodemask, gfp);
 	}
 
 	if (folio_test_large(src))
 		gfp = GFP_TRANSHUGE;
+	else
+		gfp = GFP_HIGHUSER_MOVABLE | __GFP_RETRY_MAYFAIL | __GFP_COMP;
 
-	return vma_alloc_folio(gfp, folio_order(src), vma, address,
-			folio_test_large(src));
+	order = folio_order(src);
+	ilx += src->index >> order;
+	page = alloc_pages_mpol(gfp, order, pol, ilx, nid);
+	return page_rmappable_folio(page);
 }
 #else
 
@@ -1221,7 +1217,8 @@ int do_migrate_pages(struct mm_struct *mm, const nodemask_t *from,
 	return -ENOSYS;
 }
 
-static struct folio *new_folio(struct folio *src, unsigned long start)
+static struct folio *alloc_migration_target_by_mpol(struct folio *src,
+						    unsigned long private)
 {
 	return NULL;
 }
@@ -1295,6 +1292,7 @@ static long do_mbind(unsigned long start, unsigned long len,
 
 	if (nr_failed < 0) {
 		err = nr_failed;
+		nr_failed = 0;
 	} else {
 		vma_iter_init(&vmi, mm, start);
 		prev = vma_prev(&vmi);
@@ -1305,19 +1303,24 @@ static long do_mbind(unsigned long start, unsigned long len,
 		}
 	}
 
-	if (!err) {
-		if (!list_empty(&pagelist)) {
-			nr_failed |= migrate_pages(&pagelist, new_folio, NULL,
-				start, MIGRATE_SYNC, MR_MEMPOLICY_MBIND, NULL);
+	mmap_write_unlock(mm);
+
+	if (!err && !list_empty(&pagelist)) {
+		/* Convert MPOL_DEFAULT's NULL to task or default policy */
+		if (!new) {
+			new = get_task_policy(current);
+			mpol_get(new);
 		}
-		if (nr_failed && (flags & MPOL_MF_STRICT))
-			err = -EIO;
+		nr_failed |= migrate_pages(&pagelist,
+				alloc_migration_target_by_mpol, NULL,
+				(unsigned long)new, MIGRATE_SYNC,
+				MR_MEMPOLICY_MBIND, NULL);
 	}
 
+	if (nr_failed && (flags & MPOL_MF_STRICT))
+		err = -EIO;
 	if (!list_empty(&pagelist))
 		putback_movable_pages(&pagelist);
-
-	mmap_write_unlock(mm);
 mpol_out:
 	mpol_put(new);
 	if (flags & (MPOL_MF_MOVE | MPOL_MF_MOVE_ALL))