[v19,06/20] mm/thp: narrow lru locking

Message ID	1600918115-22007-7-git-send-email-alex.shi@linux.alibaba.com (mailing list archive)
State	New, archived
Headers	show Return-Path: <SRS0=vOKg=DB=kvack.org=owner-linux-mm@kernel.org> DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 2B69823899 From: Alex Shi <alex.shi@linux.alibaba.com> To: akpm@linux-foundation.org, mgorman@techsingularity.net, tj@kernel.org, hughd@google.com, khlebnikov@yandex-team.ru, daniel.m.jordan@oracle.com, willy@infradead.org, hannes@cmpxchg.org, lkp@intel.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, shakeelb@google.com, iamjoonsoo.kim@lge.com, richard.weiyang@gmail.com, kirill@shutemov.name, alexander.duyck@gmail.com, rong.a.chen@intel.com, mhocko@suse.com, vdavydov.dev@gmail.com, shy828301@gmail.com, aaron.lwe@gmail.com Cc: Andrea Arcangeli <aarcange@redhat.com> Subject: [PATCH v19 06/20] mm/thp: narrow lru locking Date: Thu, 24 Sep 2020 11:28:21 +0800 Message-Id: <1600918115-22007-7-git-send-email-alex.shi@linux.alibaba.com> In-Reply-To: <1600918115-22007-1-git-send-email-alex.shi@linux.alibaba.com> References: <1600918115-22007-1-git-send-email-alex.shi@linux.alibaba.com> Sender: owner-linux-mm@kvack.org Precedence: bulk
Series	per memcg lru_lock \| expand [v19,00/20] per memcg lru_lock [v19,01/20] mm/memcg: warning on !memcg after readahead page charged [v19,02/20] mm/memcg: bail early from swap accounting if memcg disabled [v19,03/20] mm/thp: move lru_add_page_tail func to huge_memory.c [v19,04/20] mm/thp: use head for head page in lru_add_page_tail [v19,05/20] mm/thp: Simplify lru_add_page_tail() [v19,06/20] mm/thp: narrow lru locking [v19,07/20] mm/vmscan: remove unnecessary lruvec adding [v19,08/20] mm: page_idle_get_page() does not need lru_lock [v19,09/20] mm/memcg: add debug checking in lock_page_memcg [v19,10/20] mm/swap.c: fold vm event PGROTATED into pagevec_move_tail_fn [v19,11/20] mm/lru: move lock into lru_note_cost [v19,12/20] mm/vmscan: remove lruvec reget in move_pages_to_lru [v19,13/20] mm/mlock: remove lru_lock on TestClearPageMlocked [v19,14/20] mm/mlock: remove __munlock_isolate_lru_page [v19,15/20] mm/lru: introduce TestClearPageLRU [v19,16/20] mm/compaction: do page isolation first in compaction [v19,17/20] mm/swap.c: serialize memcg changes in pagevec_lru_move_fn [v19,18/20] mm/lru: replace pgdat lru_lock with lruvec lock [v19,19/20] mm/lru: introduce the relock_page_lruvec function [v19,20/20] mm/lru: revise the comments of lru_lock

Message ID

1600918115-22007-7-git-send-email-alex.shi@linux.alibaba.com (mailing list archive)

State

New, archived

Headers

DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 2B69823899
From: Alex Shi <alex.shi@linux.alibaba.com>
To: akpm@linux-foundation.org,
	mgorman@techsingularity.net,
	tj@kernel.org,
	hughd@google.com,
	khlebnikov@yandex-team.ru,
	daniel.m.jordan@oracle.com,
	willy@infradead.org,
	hannes@cmpxchg.org,
	lkp@intel.com,
	linux-mm@kvack.org,
	linux-kernel@vger.kernel.org,
	cgroups@vger.kernel.org,
	shakeelb@google.com,
	iamjoonsoo.kim@lge.com,
	richard.weiyang@gmail.com,
	kirill@shutemov.name,
	alexander.duyck@gmail.com,
	rong.a.chen@intel.com,
	mhocko@suse.com,
	vdavydov.dev@gmail.com,
	shy828301@gmail.com,
	aaron.lwe@gmail.com
Cc: Andrea Arcangeli <aarcange@redhat.com>
Subject: [PATCH v19 06/20] mm/thp: narrow lru locking
Date: Thu, 24 Sep 2020 11:28:21 +0800
Message-Id: <1600918115-22007-7-git-send-email-alex.shi@linux.alibaba.com>
In-Reply-To: <1600918115-22007-1-git-send-email-alex.shi@linux.alibaba.com>
References: <1600918115-22007-1-git-send-email-alex.shi@linux.alibaba.com>
Sender: owner-linux-mm@kvack.org
Precedence: bulk

Series

per memcg lru_lock | expand

Commit Message

Alex Shi Sept. 24, 2020, 3:28 a.m. UTC

lru_lock and page cache xa_lock have no obvious reason to be taken
one way round or the other: until now, lru_lock has been taken before
page cache xa_lock, when splitting a THP; but nothing else takes them
together.  Reverse that ordering: let's narrow the lru locking - but
leave local_irq_disable to block interrupts throughout, like before.

Hugh Dickins point: split_huge_page_to_list() was already silly, to be
using the _irqsave variant: it's just been taking sleeping locks, so
would already be broken if entered with interrupts enabled.  So we
can save passing flags argument down to __split_huge_page().

Why change the lock ordering here? That was hard to decide. One reason:
when this series reaches per-memcg lru locking, it relies on the THP's
memcg to be stable when taking the lru_lock: that is now done after the
THP's refcount has been frozen, which ensures page memcg cannot change.

Another reason: previously, lock_page_memcg()'s move_lock was presumed
to nest inside lru_lock; but now lru_lock must nest inside (page cache
lock inside) move_lock, so it becomes possible to use lock_page_memcg()
to stabilize page memcg before taking its lru_lock.  That is not the
mechanism used in this series, but it is an option we want to keep open.

[Hugh Dickins: rewrite commit log]
Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com>
Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org
---
 mm/huge_memory.c | 25 +++++++++++++------------
 1 file changed, 13 insertions(+), 12 deletions(-)

Comments

Alex Shi Sept. 26, 2020, 6:08 a.m. UTC | #1

Rebased to today's akpm branch.

Thanks
Alex

From f9af3691a7163d8461a140066ddd0eff5d3e44cb Mon Sep 17 00:00:00 2001
From: Alex Shi <alex.shi@linux.alibaba.com>
Date: Mon, 6 Apr 2020 23:43:31 +0800
Subject: [PATCH v19 06/20] mm/thp: narrow lru locking

lru_lock and page cache xa_lock have no obvious reason to be taken
one way round or the other: until now, lru_lock has been taken before
page cache xa_lock, when splitting a THP; but nothing else takes them
together.  Reverse that ordering: let's narrow the lru locking - but
leave local_irq_disable to block interrupts throughout, like before.

Hugh Dickins point: split_huge_page_to_list() was already silly, to be
using the _irqsave variant: it's just been taking sleeping locks, so
would already be broken if entered with interrupts enabled.  So we
can save passing flags argument down to __split_huge_page().

Why change the lock ordering here? That was hard to decide. One reason:
when this series reaches per-memcg lru locking, it relies on the THP's
memcg to be stable when taking the lru_lock: that is now done after the
THP's refcount has been frozen, which ensures page memcg cannot change.

Another reason: previously, lock_page_memcg()'s move_lock was presumed
to nest inside lru_lock; but now lru_lock must nest inside (page cache
lock inside) move_lock, so it becomes possible to use lock_page_memcg()
to stabilize page memcg before taking its lru_lock.  That is not the
mechanism used in this series, but it is an option we want to keep open.

[Hugh Dickins: rewrite commit log]
Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com>
Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org
---
 mm/huge_memory.c | 25 +++++++++++++------------
 1 file changed, 13 insertions(+), 12 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index ba849d0629b7..9a8fcec3239e 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2407,7 +2407,7 @@ static void __split_huge_page_tail(struct page *head, int tail,
 }
 
 static void __split_huge_page(struct page *page, struct list_head *list,
-		pgoff_t end, unsigned long flags)
+		pgoff_t end)
 {
 	struct page *head = compound_head(page);
 	pg_data_t *pgdat = page_pgdat(head);
@@ -2417,8 +2417,6 @@ static void __split_huge_page(struct page *page, struct list_head *list,
 	unsigned int nr = thp_nr_pages(head);
 	int i;
 
-	lruvec = mem_cgroup_page_lruvec(head, pgdat);
-
 	/* complete memcg works before add pages to LRU */
 	mem_cgroup_split_huge_fixup(head);
 
@@ -2430,6 +2428,11 @@ static void __split_huge_page(struct page *page, struct list_head *list,
 		xa_lock(&swap_cache->i_pages);
 	}
 
+	/* prevent PageLRU to go away from under us, and freeze lru stats */
+	spin_lock(&pgdat->lru_lock);
+
+	lruvec = mem_cgroup_page_lruvec(head, pgdat);
+
 	for (i = nr - 1; i >= 1; i--) {
 		__split_huge_page_tail(head, i, lruvec, list);
 		/* Some pages can be beyond i_size: drop them from page cache */
@@ -2449,6 +2452,8 @@ static void __split_huge_page(struct page *page, struct list_head *list,
 	}
 
 	ClearPageCompound(head);
+	spin_unlock(&pgdat->lru_lock);
+	/* Caller disabled irqs, so they are still disabled here */
 
 	split_page_owner(head, nr);
 
@@ -2466,8 +2471,7 @@ static void __split_huge_page(struct page *page, struct list_head *list,
 		page_ref_add(head, 2);
 		xa_unlock(&head->mapping->i_pages);
 	}
-
-	spin_unlock_irqrestore(&pgdat->lru_lock, flags);
+	local_irq_enable();
 
 	remap_page(head, nr);
 
@@ -2607,12 +2611,10 @@ bool can_split_huge_page(struct page *page, int *pextra_pins)
 int split_huge_page_to_list(struct page *page, struct list_head *list)
 {
 	struct page *head = compound_head(page);
-	struct pglist_data *pgdata = NODE_DATA(page_to_nid(head));
 	struct deferred_split *ds_queue = get_deferred_split_queue(head);
 	struct anon_vma *anon_vma = NULL;
 	struct address_space *mapping = NULL;
 	int count, mapcount, extra_pins, ret;
-	unsigned long flags;
 	pgoff_t end;
 
 	VM_BUG_ON_PAGE(is_huge_zero_page(head), head);
@@ -2673,9 +2675,8 @@ int split_huge_page_to_list(struct page *page, struct list_head *list)
 	unmap_page(head);
 	VM_BUG_ON_PAGE(compound_mapcount(head), head);
 
-	/* prevent PageLRU to go away from under us, and freeze lru stats */
-	spin_lock_irqsave(&pgdata->lru_lock, flags);
-
+	/* block interrupt reentry in xa_lock and spinlock */
+	local_irq_disable();
 	if (mapping) {
 		XA_STATE(xas, &mapping->i_pages, page_index(head));
 
@@ -2705,7 +2706,7 @@ int split_huge_page_to_list(struct page *page, struct list_head *list)
 				__dec_node_page_state(head, NR_FILE_THPS);
 		}
 
-		__split_huge_page(page, list, end, flags);
+		__split_huge_page(page, list, end);
 		if (PageSwapCache(head)) {
 			swp_entry_t entry = { .val = page_private(head) };
 
@@ -2724,7 +2725,7 @@ int split_huge_page_to_list(struct page *page, struct list_head *list)
 		spin_unlock(&ds_queue->split_queue_lock);
 fail:		if (mapping)
 			xa_unlock(&mapping->i_pages);
-		spin_unlock_irqrestore(&pgdata->lru_lock, flags);
+		local_irq_enable();
 		remap_page(head, thp_nr_pages(head));
 		ret = -EBUSY;
 	}

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 8b92cd197218..63af7611afaf 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2402,7 +2402,7 @@  static void __split_huge_page_tail(struct page *head, int tail,
 }
 
 static void __split_huge_page(struct page *page, struct list_head *list,
-		pgoff_t end, unsigned long flags)
+		pgoff_t end)
 {
 	struct page *head = compound_head(page);
 	pg_data_t *pgdat = page_pgdat(head);
@@ -2411,8 +2411,6 @@  static void __split_huge_page(struct page *page, struct list_head *list,
 	unsigned long offset = 0;
 	int i;
 
-	lruvec = mem_cgroup_page_lruvec(head, pgdat);
-
 	/* complete memcg works before add pages to LRU */
 	mem_cgroup_split_huge_fixup(head);
 
@@ -2424,6 +2422,11 @@  static void __split_huge_page(struct page *page, struct list_head *list,
 		xa_lock(&swap_cache->i_pages);
 	}
 
+	/* prevent PageLRU to go away from under us, and freeze lru stats */
+	spin_lock(&pgdat->lru_lock);
+
+	lruvec = mem_cgroup_page_lruvec(head, pgdat);
+
 	for (i = HPAGE_PMD_NR - 1; i >= 1; i--) {
 		__split_huge_page_tail(head, i, lruvec, list);
 		/* Some pages can be beyond i_size: drop them from page cache */
@@ -2443,6 +2446,8 @@  static void __split_huge_page(struct page *page, struct list_head *list,
 	}
 
 	ClearPageCompound(head);
+	spin_unlock(&pgdat->lru_lock);
+	/* Caller disabled irqs, so they are still disabled here */
 
 	split_page_owner(head, HPAGE_PMD_ORDER);
 
@@ -2460,8 +2465,7 @@  static void __split_huge_page(struct page *page, struct list_head *list,
 		page_ref_add(head, 2);
 		xa_unlock(&head->mapping->i_pages);
 	}
-
-	spin_unlock_irqrestore(&pgdat->lru_lock, flags);
+	local_irq_enable();
 
 	remap_page(head);
 
@@ -2600,12 +2604,10 @@  bool can_split_huge_page(struct page *page, int *pextra_pins)
 int split_huge_page_to_list(struct page *page, struct list_head *list)
 {
 	struct page *head = compound_head(page);
-	struct pglist_data *pgdata = NODE_DATA(page_to_nid(head));
 	struct deferred_split *ds_queue = get_deferred_split_queue(head);
 	struct anon_vma *anon_vma = NULL;
 	struct address_space *mapping = NULL;
 	int count, mapcount, extra_pins, ret;
-	unsigned long flags;
 	pgoff_t end;
 
 	VM_BUG_ON_PAGE(is_huge_zero_page(head), head);
@@ -2666,9 +2668,8 @@  int split_huge_page_to_list(struct page *page, struct list_head *list)
 	unmap_page(head);
 	VM_BUG_ON_PAGE(compound_mapcount(head), head);
 
-	/* prevent PageLRU to go away from under us, and freeze lru stats */
-	spin_lock_irqsave(&pgdata->lru_lock, flags);
-
+	/* block interrupt reentry in xa_lock and spinlock */
+	local_irq_disable();
 	if (mapping) {
 		XA_STATE(xas, &mapping->i_pages, page_index(head));
 
@@ -2698,7 +2699,7 @@  int split_huge_page_to_list(struct page *page, struct list_head *list)
 				__dec_node_page_state(head, NR_FILE_THPS);
 		}
 
-		__split_huge_page(page, list, end, flags);
+		__split_huge_page(page, list, end);
 		if (PageSwapCache(head)) {
 			swp_entry_t entry = { .val = page_private(head) };
 
@@ -2717,7 +2718,7 @@  int split_huge_page_to_list(struct page *page, struct list_head *list)
 		spin_unlock(&ds_queue->split_queue_lock);
 fail:		if (mapping)
 			xa_unlock(&mapping->i_pages);
-		spin_unlock_irqrestore(&pgdata->lru_lock, flags);
+		local_irq_enable();
 		remap_page(head);
 		ret = -EBUSY;
 	}

[v19,06/20] mm/thp: narrow lru locking

Commit Message

Comments

Patch