From patchwork Wed Mar 10 16:14:27 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Minchan Kim <minchan@kernel.org>
X-Patchwork-Id: 12128385
Return-Path: <linux-fsdevel-owner@kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level: 
X-Spam-Status: No, score=-17.0 required=3.0 tests=BAYES_00,DKIM_SIGNED,
	DKIM_VALID,INCLUDES_CR_TRAILER,INCLUDES_PATCH,MAILING_LIST_MULTI,
	SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham autolearn_force=no
	version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id CA870C433E6
	for <linux-fsdevel@archiver.kernel.org>;
 Wed, 10 Mar 2021 16:15:15 +0000 (UTC)
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by mail.kernel.org (Postfix) with ESMTP id 9ECC564FBC
	for <linux-fsdevel@archiver.kernel.org>;
 Wed, 10 Mar 2021 16:15:15 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S231490AbhCJQOo (ORCPT
        <rfc822;linux-fsdevel@archiver.kernel.org>);
        Wed, 10 Mar 2021 11:14:44 -0500
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:54642 "EHLO
        lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S231295AbhCJQOf (ORCPT
        <rfc822;linux-fsdevel@vger.kernel.org>);
        Wed, 10 Mar 2021 11:14:35 -0500
Received: from mail-pg1-x52a.google.com (mail-pg1-x52a.google.com
 [IPv6:2607:f8b0:4864:20::52a])
        by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 88FFFC061760;
        Wed, 10 Mar 2021 08:14:35 -0800 (PST)
Received: by mail-pg1-x52a.google.com with SMTP id o38so11698590pgm.9;
        Wed, 10 Mar 2021 08:14:35 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=gmail.com; s=20161025;
        h=sender:from:to:cc:subject:date:message-id:mime-version
         :content-transfer-encoding;
        bh=txDbp7Wx2vX6YjkhEkbwKwYgP5c1xPLVUUyDZKo4vU0=;
        b=RqZkiJOX3AzUZI0tbgk2ZL2AhsevK4JuBLj9cn+6ODa4Ec8DKsNAVu3OOcWPsBeAn+
         jfFLTa5qDcsP53lb9sX/+eYM2fVU+IGHtE36xW1v1HW9p1oS7ftygo5d9gmSkAFQEkwH
         lEONCRwxPVxtQ12E6VAsz1FiN7XXoMnrqQLfOrhFjt1SSH1SQ9pzIBHgGYM94yHLVniC
         AZ2eJSQj+5Kv9eA/ow6Qi413f+H+9qVltIPQFLCehT0dlVaTfW+ghQYvvpkK+PNo6sN8
         L3wsMig1XfKGIdQUXxTpHWNVBcOpOHHiY2Y+Y05XABIY7UmzKRE81/UK/C3GvEoWMxEX
         gRew==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20161025;
        h=x-gm-message-state:sender:from:to:cc:subject:date:message-id
         :mime-version:content-transfer-encoding;
        bh=txDbp7Wx2vX6YjkhEkbwKwYgP5c1xPLVUUyDZKo4vU0=;
        b=gzxPG8SIkgok5IkbY/RjJ1ojZcHwsLLaVYO90bwCyCKEb7+Y5XZXGX/8hd7aFifWUK
         8v3jbX04TmK7JuMSybFahsFzMg0LpeDQvdXou+BdMwVqpnoAY3a0Ef1/NRR7U+wKwHke
         ai3iFQIbm8frwr+vYC6KldsTDEAbfznX8Hl/jpywPynJVKwcYGWRYPUP82MifGNwqM9P
         YkPRwKPKaGvnCpNKSIuM8co3jc9HgrKtmg3+gv6EEXj4L5KydqoVeWGxktAPiUIhiXZK
         4x28RuGFu7fOKC9tWb9j9YvtFuR1DwIemhIXnI4h6C4RvtKd5060Wj+22AeEETWM5thy
         C0BQ==
X-Gm-Message-State: AOAM5313pO73wPB9XPilh6cq3097ZWp69Rx0U+jUHeGrpBH2JOEazH1I
        lHsHgRQGaTKxbrEWNfsn/xE=
X-Google-Smtp-Source: 
 ABdhPJyWcNFIgTYVyW1GyKn81RImjZdoe1yYrSeMfSwu8whqA4bM2QbmpiG4gnMx36igaAIn9kSfRQ==
X-Received: by 2002:a65:6642:: with SMTP id z2mr3380674pgv.214.1615392875014;
        Wed, 10 Mar 2021 08:14:35 -0800 (PST)
Received: from bbox-1.mtv.corp.google.com
 ([2620:15c:211:201:64cb:74c7:f2c:e5e0])
        by smtp.gmail.com with ESMTPSA id
 d1sm7121189pjc.24.2021.03.10.08.14.33
        (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256);
        Wed, 10 Mar 2021 08:14:34 -0800 (PST)
Sender: Minchan Kim <minchan.kim@gmail.com>
From: Minchan Kim <minchan@kernel.org>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-mm <linux-mm@kvack.org>, LKML <linux-kernel@vger.kernel.org>,
        joaodias@google.com, surenb@google.com, cgoldswo@codeaurora.org,
        willy@infradead.org, mhocko@suse.com, david@redhat.com,
        vbabka@suse.cz, linux-fsdevel@vger.kernel.org,
        Minchan Kim <minchan@kernel.org>
Subject: [PATCH v3 1/3] mm: replace migrate_prep with lru_add_drain_all
Date: Wed, 10 Mar 2021 08:14:27 -0800
Message-Id: <20210310161429.399432-1-minchan@kernel.org>
X-Mailer: git-send-email 2.30.1.766.gb4fecdf3b7-goog
MIME-Version: 1.0
Precedence: bulk
List-ID: <linux-fsdevel.vger.kernel.org>
X-Mailing-List: linux-fsdevel@vger.kernel.org

Currently, migrate_prep is merely a wrapper of lru_cache_add_all.
There is not much to gain from having additional abstraction.

Use lru_add_drain_all instead of migrate_prep, which would be more
descriptive.

note: migrate_prep_local in compaction.c changed into lru_add_drain
to avoid CPU schedule cost with involving many other CPUs to keep
keep old behavior.

Signed-off-by: Minchan Kim <minchan@kernel.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
---
 include/linux/migrate.h |  5 -----
 mm/compaction.c         |  3 ++-
 mm/mempolicy.c          |  4 ++--
 mm/migrate.c            | 24 +-----------------------
 mm/page_alloc.c         |  2 +-
 mm/swap.c               |  5 +++++
 6 files changed, 11 insertions(+), 32 deletions(-)

diff --git a/include/linux/migrate.h b/include/linux/migrate.h
index 3a389633b68f..6155d97ec76c 100644
--- a/include/linux/migrate.h
+++ b/include/linux/migrate.h
@@ -45,8 +45,6 @@ extern struct page *alloc_migration_target(struct page *page, unsigned long priv
 extern int isolate_movable_page(struct page *page, isolate_mode_t mode);
 extern void putback_movable_page(struct page *page);
 
-extern void migrate_prep(void);
-extern void migrate_prep_local(void);
 extern void migrate_page_states(struct page *newpage, struct page *page);
 extern void migrate_page_copy(struct page *newpage, struct page *page);
 extern int migrate_huge_page_move_mapping(struct address_space *mapping,
@@ -66,9 +64,6 @@ static inline struct page *alloc_migration_target(struct page *page,
 static inline int isolate_movable_page(struct page *page, isolate_mode_t mode)
 	{ return -EBUSY; }
 
-static inline int migrate_prep(void) { return -ENOSYS; }
-static inline int migrate_prep_local(void) { return -ENOSYS; }
-
 static inline void migrate_page_states(struct page *newpage, struct page *page)
 {
 }
diff --git a/mm/compaction.c b/mm/compaction.c
index e04f4476e68e..3be017ececc0 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -2319,7 +2319,8 @@ compact_zone(struct compact_control *cc, struct capture_control *capc)
 	trace_mm_compaction_begin(start_pfn, cc->migrate_pfn,
 				cc->free_pfn, end_pfn, sync);
 
-	migrate_prep_local();
+	/* lru_add_drain_all could be expensive with involving other CPUs */
+	lru_add_drain();
 
 	while ((ret = compact_finished(cc)) == COMPACT_CONTINUE) {
 		int err;
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index ab51132547b8..fc024e97be37 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -1124,7 +1124,7 @@ int do_migrate_pages(struct mm_struct *mm, const nodemask_t *from,
 	int err = 0;
 	nodemask_t tmp;
 
-	migrate_prep();
+	lru_add_drain_all();
 
 	mmap_read_lock(mm);
 
@@ -1323,7 +1323,7 @@ static long do_mbind(unsigned long start, unsigned long len,
 
 	if (flags & (MPOL_MF_MOVE | MPOL_MF_MOVE_ALL)) {
 
-		migrate_prep();
+		lru_add_drain_all();
 	}
 	{
 		NODEMASK_SCRATCH(scratch);
diff --git a/mm/migrate.c b/mm/migrate.c
index 62b81d5257aa..45f925e10f5a 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -57,28 +57,6 @@
 
 #include "internal.h"
 
-/*
- * migrate_prep() needs to be called before we start compiling a list of pages
- * to be migrated using isolate_lru_page(). If scheduling work on other CPUs is
- * undesirable, use migrate_prep_local()
- */
-void migrate_prep(void)
-{
-	/*
-	 * Clear the LRU lists so pages can be isolated.
-	 * Note that pages may be moved off the LRU after we have
-	 * drained them. Those pages will fail to migrate like other
-	 * pages that may be busy.
-	 */
-	lru_add_drain_all();
-}
-
-/* Do the necessary work of migrate_prep but not if it involves other CPUs */
-void migrate_prep_local(void)
-{
-	lru_add_drain();
-}
-
 int isolate_movable_page(struct page *page, isolate_mode_t mode)
 {
 	struct address_space *mapping;
@@ -1769,7 +1747,7 @@ static int do_pages_move(struct mm_struct *mm, nodemask_t task_nodes,
 	int start, i;
 	int err = 0, err1;
 
-	migrate_prep();
+	lru_add_drain_all();
 
 	for (i = start = 0; i < nr_pages; i++) {
 		const void __user *p;
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 2e8348936df8..f05a8db741ca 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -8467,7 +8467,7 @@ static int __alloc_contig_migrate_range(struct compact_control *cc,
 		.gfp_mask = GFP_USER | __GFP_MOVABLE | __GFP_RETRY_MAYFAIL,
 	};
 
-	migrate_prep();
+	lru_add_drain_all();
 
 	while (pfn < end || !list_empty(&cc->migratepages)) {
 		if (fatal_signal_pending(current)) {
diff --git a/mm/swap.c b/mm/swap.c
index 31b844d4ed94..441d1ae1f285 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -729,6 +729,11 @@ static void lru_add_drain_per_cpu(struct work_struct *dummy)
 }
 
 /*
+ * lru_add_drain_all() usually needs to be called before we start compiling
+ * a list of pages to be migrated using isolate_lru_page(). Note that pages
+ * may be moved off the LRU after we have drained them. Those pages will
+ * fail to migrate like other pages that may be busy.
+ *
  * Doesn't need any cpu hotplug locking because we do rely on per-cpu
  * kworkers being shut down before our page_alloc_cpu_dead callback is
  * executed on the offlined cpu.

From patchwork Wed Mar 10 16:14:28 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Minchan Kim <minchan@kernel.org>
X-Patchwork-Id: 12128389
Return-Path: <linux-fsdevel-owner@kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level: 
X-Spam-Status: No, score=-17.0 required=3.0 tests=BAYES_00,DKIM_SIGNED,
	DKIM_VALID,INCLUDES_CR_TRAILER,INCLUDES_PATCH,MAILING_LIST_MULTI,
	SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=unavailable
	autolearn_force=no version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id EF4ACC43381
	for <linux-fsdevel@archiver.kernel.org>;
 Wed, 10 Mar 2021 16:15:15 +0000 (UTC)
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by mail.kernel.org (Postfix) with ESMTP id CC58364FBD
	for <linux-fsdevel@archiver.kernel.org>;
 Wed, 10 Mar 2021 16:15:15 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S232284AbhCJQOo (ORCPT
        <rfc822;linux-fsdevel@archiver.kernel.org>);
        Wed, 10 Mar 2021 11:14:44 -0500
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:54650 "EHLO
        lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S231359AbhCJQOh (ORCPT
        <rfc822;linux-fsdevel@vger.kernel.org>);
        Wed, 10 Mar 2021 11:14:37 -0500
Received: from mail-pg1-x52b.google.com (mail-pg1-x52b.google.com
 [IPv6:2607:f8b0:4864:20::52b])
        by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 8AFD9C061760;
        Wed, 10 Mar 2021 08:14:37 -0800 (PST)
Received: by mail-pg1-x52b.google.com with SMTP id 16so5276827pgo.13;
        Wed, 10 Mar 2021 08:14:37 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=gmail.com; s=20161025;
        h=sender:from:to:cc:subject:date:message-id:in-reply-to:references
         :mime-version:content-transfer-encoding;
        bh=KFzn+HW7QU1YuokVuuLvjBXwnGWstmVEZGxg/O7T27g=;
        b=VlZhYEdYh2D1yn1Rbg0s72eZSpzYr0KpDPCmHl4nZjWVVDUT1XCmSMAySiCMOahHBD
         BraIyK0n++UWwDVoSSm+eC/FCVf0t16sFrkd965utyuDbnExST4iFgM91XlLiBPDCSrj
         5zcOdZhmUzvyY8aCuS092HxpuSkfgLlVscRSxImjRJIiUtMTFg5KLPkf1d5eEa6FvG3H
         j2rVIbDq9lxEJaLhvRGGOw2iv5qVRUxxKML2bwv5eMsB2GX/JkPmkUnr0IMELDLoxsnF
         RHHOFXKlCx93cONQgM5T0JUlYdfyPvKwihal32BEsQjVJuYJtW01p3Z/3L4GyQJ8LMxm
         nmzw==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20161025;
        h=x-gm-message-state:sender:from:to:cc:subject:date:message-id
         :in-reply-to:references:mime-version:content-transfer-encoding;
        bh=KFzn+HW7QU1YuokVuuLvjBXwnGWstmVEZGxg/O7T27g=;
        b=Ub/mfJNYWiCFqR3PE9Tq8bjcTXlZtnvtR9kHSqNYWrpiE+7lD0xeHtivW/FzTcEbLS
         vcWSWmfVuO9Lco/BA5rVCrXKjGJrz9RzQJkAsOZEsXUooAdvkpna+JuVBveiarmV5xAL
         ZR57J3M+6N4K2qtS81CRRGpeD8BkD7AktI7eKmStQUx4Alq2dt6er/d3DkLgIN1Vwgpq
         qqd9w4pgvIVyuBxYBO1kOPfr8E683au17C2TmM1aVAwyWMCU0uF+fLvZo2IUJ/GaJZuB
         LZau8ZjWVoLznonjbgab9MGFkXU5i9n8eb6roZKWQGnNbZfJXYwewI6ragHk45hTAJkB
         X1Ug==
X-Gm-Message-State: AOAM532g6YzvjOL+lIxl7rzed3RSXXsjsXkW1hKx5wZ1eCJeN7R37vM0
        ZLM3GUc6rIgUSh9sdQFQ/B0=
X-Google-Smtp-Source: 
 ABdhPJz4uZh+gkV3hCHXAzdT1lkzni3EbzIsdM3/Zq33xtta7u3dDjDaFK7QSYhM0+a6PCPlula+0w==
X-Received: by 2002:a62:5344:0:b029:1df:c7d:3c3e with SMTP id
 h65-20020a6253440000b02901df0c7d3c3emr3482626pfb.11.1615392877113;
        Wed, 10 Mar 2021 08:14:37 -0800 (PST)
Received: from bbox-1.mtv.corp.google.com
 ([2620:15c:211:201:64cb:74c7:f2c:e5e0])
        by smtp.gmail.com with ESMTPSA id
 d1sm7121189pjc.24.2021.03.10.08.14.35
        (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256);
        Wed, 10 Mar 2021 08:14:36 -0800 (PST)
Sender: Minchan Kim <minchan.kim@gmail.com>
From: Minchan Kim <minchan@kernel.org>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-mm <linux-mm@kvack.org>, LKML <linux-kernel@vger.kernel.org>,
        joaodias@google.com, surenb@google.com, cgoldswo@codeaurora.org,
        willy@infradead.org, mhocko@suse.com, david@redhat.com,
        vbabka@suse.cz, linux-fsdevel@vger.kernel.org,
        Minchan Kim <minchan@kernel.org>
Subject: [PATCH v3 2/3] mm: disable LRU pagevec during the migration
 temporarily
Date: Wed, 10 Mar 2021 08:14:28 -0800
Message-Id: <20210310161429.399432-2-minchan@kernel.org>
X-Mailer: git-send-email 2.30.1.766.gb4fecdf3b7-goog
In-Reply-To: <20210310161429.399432-1-minchan@kernel.org>
References: <20210310161429.399432-1-minchan@kernel.org>
MIME-Version: 1.0
Precedence: bulk
List-ID: <linux-fsdevel.vger.kernel.org>
X-Mailing-List: linux-fsdevel@vger.kernel.org

LRU pagevec holds refcount of pages until the pagevec are drained.
It could prevent migration since the refcount of the page is greater
than the expection in migration logic. To mitigate the issue,
callers of migrate_pages drains LRU pagevec via migrate_prep or
lru_add_drain_all before migrate_pages call.

However, it's not enough because pages coming into pagevec after the
draining call still could stay at the pagevec so it could keep
preventing page migration. Since some callers of migrate_pages have
retrial logic with LRU draining, the page would migrate at next trail
but it is still fragile in that it doesn't close the fundamental race
between upcoming LRU pages into pagvec and migration so the migration
failure could cause contiguous memory allocation failure in the end.

To close the race, this patch disables lru caches(i.e, pagevec)
during ongoing migration until migrate is done.

Since it's really hard to reproduce, I measured how many times
migrate_pages retried with force mode(it is about a fallback to a
sync migration) with below debug code.

int migrate_pages(struct list_head *from, new_page_t get_new_page,
			..
			..

if (rc && reason == MR_CONTIG_RANGE && pass > 2) {
       printk(KERN_ERR, "pfn 0x%lx reason %d\n", page_to_pfn(page), rc);
       dump_page(page, "fail to migrate");
}

The test was repeating android apps launching with cma allocation
in background every five seconds. Total cma allocation count was
about 500 during the testing. With this patch, the dump_page count
was reduced from 400 to 30.

The new interface is also useful for memory hotplug which currently
drains lru pcp caches after each migration failure. This is rather
suboptimal as it has to disrupt others running during the operation.
With the new interface the operation happens only once. This is also in
line with pcp allocator cache which are disabled for the offlining as
well.

Signed-off-by: Minchan Kim <minchan@kernel.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Chris Goldsworthy <cgoldswo@codeaurora.org>
---
 include/linux/swap.h |  3 ++
 mm/memory_hotplug.c  |  3 +-
 mm/mempolicy.c       |  4 ++-
 mm/migrate.c         |  3 +-
 mm/swap.c            | 79 ++++++++++++++++++++++++++++++++++++--------
 5 files changed, 75 insertions(+), 17 deletions(-)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index 32f665b1ee85..a3e258335a7f 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -339,6 +339,9 @@ extern void lru_note_cost(struct lruvec *lruvec, bool file,
 extern void lru_note_cost_page(struct page *);
 extern void lru_cache_add(struct page *);
 extern void mark_page_accessed(struct page *);
+extern void lru_cache_disable(void);
+extern void lru_cache_enable(void);
+extern bool lru_cache_disabled(void);
 extern void lru_add_drain(void);
 extern void lru_add_drain_cpu(int cpu);
 extern void lru_add_drain_cpu_zone(struct zone *zone);
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 5ba51a8bdaeb..959f659ef085 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1611,6 +1611,7 @@ int __ref offline_pages(unsigned long start_pfn, unsigned long nr_pages)
 	 * in a way that pages from isolated pageblock are left on pcplists.
 	 */
 	zone_pcp_disable(zone);
+	lru_cache_disable();
 
 	/* set above range as isolated */
 	ret = start_isolate_page_range(start_pfn, end_pfn,
@@ -1642,7 +1643,6 @@ int __ref offline_pages(unsigned long start_pfn, unsigned long nr_pages)
 			}
 
 			cond_resched();
-			lru_add_drain_all();
 
 			ret = scan_movable_pages(pfn, end_pfn, &pfn);
 			if (!ret) {
@@ -1687,6 +1687,7 @@ int __ref offline_pages(unsigned long start_pfn, unsigned long nr_pages)
 	zone->nr_isolate_pageblock -= nr_pages / pageblock_nr_pages;
 	spin_unlock_irqrestore(&zone->lock, flags);
 
+	lru_cache_enable();
 	zone_pcp_enable(zone);
 
 	/* removal success */
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index fc024e97be37..658238e69551 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -1323,7 +1323,7 @@ static long do_mbind(unsigned long start, unsigned long len,
 
 	if (flags & (MPOL_MF_MOVE | MPOL_MF_MOVE_ALL)) {
 
-		lru_add_drain_all();
+		lru_cache_disable();
 	}
 	{
 		NODEMASK_SCRATCH(scratch);
@@ -1371,6 +1371,8 @@ static long do_mbind(unsigned long start, unsigned long len,
 	mmap_write_unlock(mm);
 mpol_out:
 	mpol_put(new);
+	if (flags & (MPOL_MF_MOVE | MPOL_MF_MOVE_ALL))
+		lru_cache_enable();
 	return err;
 }
 
diff --git a/mm/migrate.c b/mm/migrate.c
index 45f925e10f5a..acc9913e4303 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1747,7 +1747,7 @@ static int do_pages_move(struct mm_struct *mm, nodemask_t task_nodes,
 	int start, i;
 	int err = 0, err1;
 
-	lru_add_drain_all();
+	lru_cache_disable();
 
 	for (i = start = 0; i < nr_pages; i++) {
 		const void __user *p;
@@ -1816,6 +1816,7 @@ static int do_pages_move(struct mm_struct *mm, nodemask_t task_nodes,
 	if (err >= 0)
 		err = err1;
 out:
+	lru_cache_enable();
 	return err;
 }
 
diff --git a/mm/swap.c b/mm/swap.c
index 441d1ae1f285..fbdf6ac05aec 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -235,6 +235,18 @@ static void pagevec_move_tail_fn(struct page *page, struct lruvec *lruvec)
 	}
 }
 
+/* return true if pagevec needs to drain */
+static bool pagevec_add_and_need_flush(struct pagevec *pvec, struct page *page)
+{
+	bool ret = false;
+
+	if (!pagevec_add(pvec, page) || PageCompound(page) ||
+			lru_cache_disabled())
+		ret = true;
+
+	return ret;
+}
+
 /*
  * Writeback is about to end against a page which has been marked for immediate
  * reclaim.  If it still appears to be reclaimable, move it to the tail of the
@@ -252,7 +264,7 @@ void rotate_reclaimable_page(struct page *page)
 		get_page(page);
 		local_lock_irqsave(&lru_rotate.lock, flags);
 		pvec = this_cpu_ptr(&lru_rotate.pvec);
-		if (!pagevec_add(pvec, page) || PageCompound(page))
+		if (pagevec_add_and_need_flush(pvec, page))
 			pagevec_lru_move_fn(pvec, pagevec_move_tail_fn);
 		local_unlock_irqrestore(&lru_rotate.lock, flags);
 	}
@@ -343,7 +355,7 @@ static void activate_page(struct page *page)
 		local_lock(&lru_pvecs.lock);
 		pvec = this_cpu_ptr(&lru_pvecs.activate_page);
 		get_page(page);
-		if (!pagevec_add(pvec, page) || PageCompound(page))
+		if (pagevec_add_and_need_flush(pvec, page))
 			pagevec_lru_move_fn(pvec, __activate_page);
 		local_unlock(&lru_pvecs.lock);
 	}
@@ -458,7 +470,7 @@ void lru_cache_add(struct page *page)
 	get_page(page);
 	local_lock(&lru_pvecs.lock);
 	pvec = this_cpu_ptr(&lru_pvecs.lru_add);
-	if (!pagevec_add(pvec, page) || PageCompound(page))
+	if (pagevec_add_and_need_flush(pvec, page))
 		__pagevec_lru_add(pvec);
 	local_unlock(&lru_pvecs.lock);
 }
@@ -654,7 +666,7 @@ void deactivate_file_page(struct page *page)
 		local_lock(&lru_pvecs.lock);
 		pvec = this_cpu_ptr(&lru_pvecs.lru_deactivate_file);
 
-		if (!pagevec_add(pvec, page) || PageCompound(page))
+		if (pagevec_add_and_need_flush(pvec, page))
 			pagevec_lru_move_fn(pvec, lru_deactivate_file_fn);
 		local_unlock(&lru_pvecs.lock);
 	}
@@ -676,7 +688,7 @@ void deactivate_page(struct page *page)
 		local_lock(&lru_pvecs.lock);
 		pvec = this_cpu_ptr(&lru_pvecs.lru_deactivate);
 		get_page(page);
-		if (!pagevec_add(pvec, page) || PageCompound(page))
+		if (pagevec_add_and_need_flush(pvec, page))
 			pagevec_lru_move_fn(pvec, lru_deactivate_fn);
 		local_unlock(&lru_pvecs.lock);
 	}
@@ -698,7 +710,7 @@ void mark_page_lazyfree(struct page *page)
 		local_lock(&lru_pvecs.lock);
 		pvec = this_cpu_ptr(&lru_pvecs.lru_lazyfree);
 		get_page(page);
-		if (!pagevec_add(pvec, page) || PageCompound(page))
+		if (pagevec_add_and_need_flush(pvec, page))
 			pagevec_lru_move_fn(pvec, lru_lazyfree_fn);
 		local_unlock(&lru_pvecs.lock);
 	}
@@ -729,18 +741,13 @@ static void lru_add_drain_per_cpu(struct work_struct *dummy)
 }
 
 /*
- * lru_add_drain_all() usually needs to be called before we start compiling
- * a list of pages to be migrated using isolate_lru_page(). Note that pages
- * may be moved off the LRU after we have drained them. Those pages will
- * fail to migrate like other pages that may be busy.
- *
  * Doesn't need any cpu hotplug locking because we do rely on per-cpu
  * kworkers being shut down before our page_alloc_cpu_dead callback is
  * executed on the offlined cpu.
  * Calling this function with cpu hotplug locks held can actually lead
  * to obscure indirect dependencies via WQ context.
  */
-void lru_add_drain_all(void)
+static void __lru_add_drain_all(bool force_all_cpus)
 {
 	/*
 	 * lru_drain_gen - Global pages generation number
@@ -785,7 +792,7 @@ void lru_add_drain_all(void)
 	 * (C) Exit the draining operation if a newer generation, from another
 	 * lru_add_drain_all(), was already scheduled for draining. Check (A).
 	 */
-	if (unlikely(this_gen != lru_drain_gen))
+	if (unlikely(this_gen != lru_drain_gen && !force_all_cpus))
 		goto done;
 
 	/*
@@ -815,7 +822,8 @@ void lru_add_drain_all(void)
 	for_each_online_cpu(cpu) {
 		struct work_struct *work = &per_cpu(lru_add_drain_work, cpu);
 
-		if (pagevec_count(&per_cpu(lru_pvecs.lru_add, cpu)) ||
+		if (force_all_cpus ||
+		    pagevec_count(&per_cpu(lru_pvecs.lru_add, cpu)) ||
 		    data_race(pagevec_count(&per_cpu(lru_rotate.pvec, cpu))) ||
 		    pagevec_count(&per_cpu(lru_pvecs.lru_deactivate_file, cpu)) ||
 		    pagevec_count(&per_cpu(lru_pvecs.lru_deactivate, cpu)) ||
@@ -833,6 +841,11 @@ void lru_add_drain_all(void)
 done:
 	mutex_unlock(&lock);
 }
+
+void lru_add_drain_all(void)
+{
+	__lru_add_drain_all(false);
+}
 #else
 void lru_add_drain_all(void)
 {
@@ -840,6 +853,44 @@ void lru_add_drain_all(void)
 }
 #endif /* CONFIG_SMP */
 
+static atomic_t lru_disable_count = ATOMIC_INIT(0);
+
+bool lru_cache_disabled(void)
+{
+	return atomic_read(&lru_disable_count);
+}
+
+void lru_cache_enable(void)
+{
+	atomic_dec(&lru_disable_count);
+}
+
+/*
+ * lru_cache_disable() needs to be called before we start compiling
+ * a list of pages to be migrated using isolate_lru_page().
+ * It drains pages on LRU cache and then disable on all cpus until
+ * lru_cache_enable is called.
+ *
+ * Must be paired with a call to lru_cache_enable().
+ */
+void lru_cache_disable(void)
+{
+	atomic_inc(&lru_disable_count);
+#ifdef CONFIG_SMP
+	/*
+	 * lru_add_drain_all in the force mode will schedule draining on
+	 * all online CPUs so any calls of lru_cache_disabled wrapped by
+	 * local_lock or preemption disabled would be ordered by that.
+	 * The atomic operation doesn't need to have stronger ordering
+	 * requirements because that is enforeced by the scheduling
+	 * guarantees.
+	 */
+	__lru_add_drain_all(true);
+#else
+	lru_add_drain();
+#endif
+}
+
 /**
  * release_pages - batched put_page()
  * @pages: array of pages to release

From patchwork Wed Mar 10 16:14:29 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Minchan Kim <minchan@kernel.org>
X-Patchwork-Id: 12128387
Return-Path: <linux-fsdevel-owner@kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level: 
X-Spam-Status: No, score=-17.0 required=3.0 tests=BAYES_00,DKIM_SIGNED,
	DKIM_VALID,INCLUDES_CR_TRAILER,INCLUDES_PATCH,MAILING_LIST_MULTI,
	SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=unavailable
	autolearn_force=no version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id D7335C433DB
	for <linux-fsdevel@archiver.kernel.org>;
 Wed, 10 Mar 2021 16:15:15 +0000 (UTC)
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by mail.kernel.org (Postfix) with ESMTP id AD76364F4A
	for <linux-fsdevel@archiver.kernel.org>;
 Wed, 10 Mar 2021 16:15:15 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S232520AbhCJQOp (ORCPT
        <rfc822;linux-fsdevel@archiver.kernel.org>);
        Wed, 10 Mar 2021 11:14:45 -0500
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:54654 "EHLO
        lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S231366AbhCJQOj (ORCPT
        <rfc822;linux-fsdevel@vger.kernel.org>);
        Wed, 10 Mar 2021 11:14:39 -0500
Received: from mail-pg1-x534.google.com (mail-pg1-x534.google.com
 [IPv6:2607:f8b0:4864:20::534])
        by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 46878C061760;
        Wed, 10 Mar 2021 08:14:39 -0800 (PST)
Received: by mail-pg1-x534.google.com with SMTP id w34so10640768pga.8;
        Wed, 10 Mar 2021 08:14:39 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=gmail.com; s=20161025;
        h=sender:from:to:cc:subject:date:message-id:in-reply-to:references
         :mime-version:content-transfer-encoding;
        bh=2lZIChPtI3p/aQ6I3GH7R9vIF95h/LXfK2s4My5ecFY=;
        b=PpJW7kp5pzBkHhRmXGWgHhfCoR5r78l0ND0ggR1yOLv9HV1gPw2Ucrv9D1FahpYdNX
         v/Pq+6lsYoRIRGMfHS9cp+RFs5LOFR4izsjtsBv3ctsddBwDVuUBJos2yKEW0g/I8CIy
         GZaSi5mU7jBOOTlw6J9itpKWr2FrElsM6Nz5PVwFamSgbcaN9/vIs24Qkol7SQq8xJPN
         diOupJhOcJ7Jv+3wbA3GI3OLkZI7pJhQCMsrPwzOtlw5oWxeSuXNSoq2guC86l5HRGqI
         /iIhvjog/gC5R0GSdyCFa8oavOc9B8gg+M1WVWD60fKpBBTm3FKaWDS+9OtiqPDbA/eo
         kYPg==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20161025;
        h=x-gm-message-state:sender:from:to:cc:subject:date:message-id
         :in-reply-to:references:mime-version:content-transfer-encoding;
        bh=2lZIChPtI3p/aQ6I3GH7R9vIF95h/LXfK2s4My5ecFY=;
        b=px4eyER33OsZkGbAma4t0aBgcOYcGJpdr5kQDPuoMjKXpkkV6EilunEVl9btfiShLZ
         ZHcrv5LMzUv1puN6irlhB5PB6NcnXWdzSXeN/d0QEnZfxFRAuoxpNAlZuGIQhaIgA9wI
         y2gD1hB17XSDCjOkQuvJmqFS7klR5sj8yUbtpYSjhmtNBBNwchSM6G1MP0fgVpaHMQEt
         Sj61cb984gddSu1MYmWsT08a53xkxeL7UHzH315/WQK4QtQ5ClvhnBzCji+eQ1YnUgIw
         ejSpy8qjip79TtmXt+Y540vFwBAUAqqxBh9+tsVsSom/ly85x9jAAw8rXs1i5v65lZBJ
         c4mw==
X-Gm-Message-State: AOAM533wlDWWcR/cvM1v074gMC115h+jmU2ndvrIAZ+tlZqSSWOZEDMs
        COiF9+Kyj3fpCaDFQriVUK8=
X-Google-Smtp-Source: 
 ABdhPJyPQhcFlW2wScizcKfaE6eSRP+gjVd97qisGxEWmX7zZWVrW7WndKT1z6Wy60BgoWv6yGaykA==
X-Received: by 2002:a05:6a00:3:b029:1f3:1959:2e3b with SMTP id
 h3-20020a056a000003b02901f319592e3bmr3444355pfk.11.1615392878842;
        Wed, 10 Mar 2021 08:14:38 -0800 (PST)
Received: from bbox-1.mtv.corp.google.com
 ([2620:15c:211:201:64cb:74c7:f2c:e5e0])
        by smtp.gmail.com with ESMTPSA id
 d1sm7121189pjc.24.2021.03.10.08.14.37
        (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256);
        Wed, 10 Mar 2021 08:14:38 -0800 (PST)
Sender: Minchan Kim <minchan.kim@gmail.com>
From: Minchan Kim <minchan@kernel.org>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-mm <linux-mm@kvack.org>, LKML <linux-kernel@vger.kernel.org>,
        joaodias@google.com, surenb@google.com, cgoldswo@codeaurora.org,
        willy@infradead.org, mhocko@suse.com, david@redhat.com,
        vbabka@suse.cz, linux-fsdevel@vger.kernel.org,
        Minchan Kim <minchan@kernel.org>
Subject: [PATCH v3 3/3] mm: fs: Invalidate BH LRU during page migration
Date: Wed, 10 Mar 2021 08:14:29 -0800
Message-Id: <20210310161429.399432-3-minchan@kernel.org>
X-Mailer: git-send-email 2.30.1.766.gb4fecdf3b7-goog
In-Reply-To: <20210310161429.399432-1-minchan@kernel.org>
References: <20210310161429.399432-1-minchan@kernel.org>
MIME-Version: 1.0
Precedence: bulk
List-ID: <linux-fsdevel.vger.kernel.org>
X-Mailing-List: linux-fsdevel@vger.kernel.org

Pages containing buffer_heads that are in one of the per-CPU
buffer_head LRU caches will be pinned and thus cannot be migrated.
This can prevent CMA allocations from succeeding, which are often used
on platforms with co-processors (such as a DSP) that can only use
physically contiguous memory. It can also prevent memory
hot-unplugging from succeeding, which involves migrating at least
MIN_MEMORY_BLOCK_SIZE bytes of memory, which ranges from 8 MiB to 1
GiB based on the architecture in use.

Correspondingly, invalidate the BH LRU caches before a migration
starts and stop any buffer_head from being cached in the LRU caches,
until migration has finished.

Signed-off-by: Chris Goldsworthy <cgoldswo@codeaurora.org>
Signed-off-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Minchan Kim <minchan@kernel.org>
Reported-by: kernel test robot <oliver.sang@intel.com>
Acked-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Chris Goldsworthy <cgoldswo@codeaurora.org>
Signed-off-by: Minchan Kim <minchan@kernel.org>
Reported-by: kernel test robot <oliver.sang@intel.com>
Tested-by: Oliver Sang <oliver.sang@intel.com>
---
 fs/buffer.c                 | 12 ++++++++++--
 include/linux/buffer_head.h |  4 ++++
 mm/swap.c                   |  5 ++++-
 3 files changed, 18 insertions(+), 3 deletions(-)

diff --git a/fs/buffer.c b/fs/buffer.c
index 0cb7ffd4977c..ca9dd736bcb8 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -1264,6 +1264,14 @@ static void bh_lru_install(struct buffer_head *bh)
 	int i;
 
 	check_irqs_on();
+	/*
+	 * buffer_head in bh_lru could increase refcount of the page
+	 * until it will be invalidated. It causes page migraion failure.
+	 * Skip putting upcoming bh into bh_lru until migration is done.
+	 */
+	if (lru_cache_disabled())
+		return;
+
 	bh_lru_lock();
 
 	b = this_cpu_ptr(&bh_lrus);
@@ -1409,7 +1417,7 @@ EXPORT_SYMBOL(__bread_gfp);
  * This doesn't race because it runs in each cpu either in irq
  * or with preempt disabled.
  */
-static void invalidate_bh_lru(void *arg)
+void invalidate_bh_lru(void *arg)
 {
 	struct bh_lru *b = &get_cpu_var(bh_lrus);
 	int i;
@@ -1421,7 +1429,7 @@ static void invalidate_bh_lru(void *arg)
 	put_cpu_var(bh_lrus);
 }
 
-static bool has_bh_in_lru(int cpu, void *dummy)
+bool has_bh_in_lru(int cpu, void *dummy)
 {
 	struct bh_lru *b = per_cpu_ptr(&bh_lrus, cpu);
 	int i;
diff --git a/include/linux/buffer_head.h b/include/linux/buffer_head.h
index 6b47f94378c5..05998b5947a2 100644
--- a/include/linux/buffer_head.h
+++ b/include/linux/buffer_head.h
@@ -194,6 +194,8 @@ void __breadahead_gfp(struct block_device *, sector_t block, unsigned int size,
 struct buffer_head *__bread_gfp(struct block_device *,
 				sector_t block, unsigned size, gfp_t gfp);
 void invalidate_bh_lrus(void);
+void invalidate_bh_lru(void *arg);
+bool has_bh_in_lru(int cpu, void *dummy);
 struct buffer_head *alloc_buffer_head(gfp_t gfp_flags);
 void free_buffer_head(struct buffer_head * bh);
 void unlock_buffer(struct buffer_head *bh);
@@ -406,6 +408,8 @@ static inline int inode_has_buffers(struct inode *inode) { return 0; }
 static inline void invalidate_inode_buffers(struct inode *inode) {}
 static inline int remove_inode_buffers(struct inode *inode) { return 1; }
 static inline int sync_mapping_buffers(struct address_space *mapping) { return 0; }
+static inline void invalidate_bh_lru(void *arg) {}
+static inline bool has_bh_in_lru(int cpu, void *dummy) { return 0; }
 #define buffer_heads_over_limit 0
 
 #endif /* CONFIG_BLOCK */
diff --git a/mm/swap.c b/mm/swap.c
index fbdf6ac05aec..2a431959a45d 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -36,6 +36,7 @@
 #include <linux/hugetlb.h>
 #include <linux/page_idle.h>
 #include <linux/local_lock.h>
+#include <linux/buffer_head.h>
 
 #include "internal.h"
 
@@ -641,6 +642,7 @@ void lru_add_drain_cpu(int cpu)
 		pagevec_lru_move_fn(pvec, lru_lazyfree_fn);
 
 	activate_page_drain(cpu);
+	invalidate_bh_lru(NULL);
 }
 
 /**
@@ -828,7 +830,8 @@ static void __lru_add_drain_all(bool force_all_cpus)
 		    pagevec_count(&per_cpu(lru_pvecs.lru_deactivate_file, cpu)) ||
 		    pagevec_count(&per_cpu(lru_pvecs.lru_deactivate, cpu)) ||
 		    pagevec_count(&per_cpu(lru_pvecs.lru_lazyfree, cpu)) ||
-		    need_activate_page_drain(cpu)) {
+		    need_activate_page_drain(cpu) ||
+		    has_bh_in_lru(cpu, NULL)) {
 			INIT_WORK(work, lru_add_drain_per_cpu);
 			queue_work_on(cpu, mm_percpu_wq, work);
 			__cpumask_set_cpu(cpu, &has_work);