Message ID | 1573567588-47048-5-git-send-email-alex.shi@linux.alibaba.com (mailing list archive) |
---|---|
State | New, archived |
Series | per lruvec lru_lock for memcg |
On Tue, Nov 12, 2019 at 10:06:24PM +0800, Alex Shi wrote:
> +/* Don't lock again iff page's lruvec locked */
> +static inline struct lruvec *relock_page_lruvec_irq(struct page *page,
> +					struct lruvec *locked_lruvec)
> +{
> +	struct pglist_data *pgdat = page_pgdat(page);
> +	struct lruvec *lruvec;
> +
> +	rcu_read_lock();
> +	lruvec = mem_cgroup_page_lruvec(page, pgdat);
> +
> +	if (locked_lruvec == lruvec) {
> +		rcu_read_unlock();
> +		return lruvec;
> +	}
> +	rcu_read_unlock();

Why not simply:

	rcu_read_lock();
	lruvec = mem_cgroup_page_lruvec(page, pgdat);
	rcu_read_unlock();

	if (locked_lruvec == lruvec)
		return lruvec;

Also, why are you bothering to re-enable interrupts here?  Surely if
you're holding lock A with interrupts disabled, you can just drop lock A,
acquire lock B and leave the interrupts alone.  That way you only need
one of this variety of function, and not the separate irq/irqsave variants.
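[For illustration, here is a minimal sketch, not taken from the posted series, of the single relock helper Matthew appears to be suggesting: it never touches the caller's interrupt state, so only one variant would be needed. The function name relock_page_lruvec() is hypothetical.]

	/*
	 * Hedged sketch only -- not code from the patch.  It assumes the
	 * caller has already put interrupts into whatever state lru_lock
	 * requires, so the helper itself leaves irq state alone and no
	 * irq/irqsave variants are needed.
	 */
	static inline struct lruvec *relock_page_lruvec(struct page *page,
						struct lruvec *locked_lruvec)
	{
		struct lruvec *lruvec;

		rcu_read_lock();
		lruvec = mem_cgroup_page_lruvec(page, page_pgdat(page));
		rcu_read_unlock();

		/* Already holding the right lock: nothing to do. */
		if (locked_lruvec == lruvec)
			return lruvec;

		/* Drop lock A, take lock B; interrupt state is left alone. */
		if (locked_lruvec)
			spin_unlock(&locked_lruvec->lru_lock);
		spin_lock(&lruvec->lru_lock);

		return lruvec;
	}

[The NULL locked_lruvec case Alex raises below is exactly where this simplification breaks down: on a first-time lock somebody still has to decide between the irq and irqsave flavours.]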
Hi Matthew,

Thanks a lot for the comments!

On 2019/11/12 at 10:36 PM, Matthew Wilcox wrote:
> On Tue, Nov 12, 2019 at 10:06:24PM +0800, Alex Shi wrote:
>> +/* Don't lock again iff page's lruvec locked */
>> +static inline struct lruvec *relock_page_lruvec_irq(struct page *page,
>> +					struct lruvec *locked_lruvec)
>> +{
>> +	struct pglist_data *pgdat = page_pgdat(page);
>> +	struct lruvec *lruvec;
>> +
>> +	rcu_read_lock();
>> +	lruvec = mem_cgroup_page_lruvec(page, pgdat);
>> +
>> +	if (locked_lruvec == lruvec) {
>> +		rcu_read_unlock();
>> +		return lruvec;
>> +	}
>> +	rcu_read_unlock();
>
> Why not simply:
>
> 	rcu_read_lock();
> 	lruvec = mem_cgroup_page_lruvec(page, pgdat);
> 	rcu_read_unlock();
>
> 	if (locked_lruvec == lruvec)

The rcu_read_unlock() here is placed after the locked_lruvec/lruvec
comparison to guard it. Otherwise the memcg/lruvec may be changed
underneath us, e.g. by memcg migration, I guess.

> 		return lruvec;
>
> Also, why are you bothering to re-enable interrupts here?  Surely if
> you're holding lock A with interrupts disabled, you can just drop lock A,
> acquire lock B and leave the interrupts alone.  That way you only need
> one of this variety of function, and not the separate irq/irqsave variants.

Thanks for the suggestion! Yes, if we only ever re-locked, it would be
better to leave the irq state unchanged. But when locked_lruvec is NULL,
this becomes a first-time lock, where irq and irqsave behave differently.
A combined function would therefore need another parameter to indicate
whether it should do irqsaving. Compared to an extra, indistinct
parameter, I guess two inline functions are a bit simpler and clearer?

Thanks a lot!
Alex
On Wed, Nov 13, 2019 at 10:26:24AM +0800, Alex Shi wrote:
> On 2019/11/12 at 10:36 PM, Matthew Wilcox wrote:
> > On Tue, Nov 12, 2019 at 10:06:24PM +0800, Alex Shi wrote:
> >> +/* Don't lock again iff page's lruvec locked */
> >> +static inline struct lruvec *relock_page_lruvec_irq(struct page *page,
> >> +					struct lruvec *locked_lruvec)
> >> +{
> >> +	struct pglist_data *pgdat = page_pgdat(page);
> >> +	struct lruvec *lruvec;
> >> +
> >> +	rcu_read_lock();
> >> +	lruvec = mem_cgroup_page_lruvec(page, pgdat);
> >> +
> >> +	if (locked_lruvec == lruvec) {
> >> +		rcu_read_unlock();
> >> +		return lruvec;
> >> +	}
> >> +	rcu_read_unlock();
> >
> > Why not simply:
> >
> > 	rcu_read_lock();
> > 	lruvec = mem_cgroup_page_lruvec(page, pgdat);
> > 	rcu_read_unlock();
> >
> > 	if (locked_lruvec == lruvec)
>
> The rcu_read_unlock() here is placed after the locked_lruvec/lruvec
> comparison to guard it. Otherwise the memcg/lruvec may be changed
> underneath us, e.g. by memcg migration, I guess.

How does holding the RCU lock guard the comparison?  You're comparing two
pointers for equality.  Nothing any other CPU can do at this point will
change the results of that comparison.

> > Also, why are you bothering to re-enable interrupts here?  Surely if
> > you're holding lock A with interrupts disabled, you can just drop lock A,
> > acquire lock B and leave the interrupts alone.  That way you only need
> > one of this variety of function, and not the separate irq/irqsave variants.
>
> Thanks for the suggestion! Yes, if we only ever re-locked, it would be
> better to leave the irq state unchanged. But when locked_lruvec is NULL,
> this becomes a first-time lock, where irq and irqsave behave differently.
> A combined function would therefore need another parameter to indicate
> whether it should do irqsaving. Compared to an extra, indistinct
> parameter, I guess two inline functions are a bit simpler and clearer?

Ah, right, I missed the "if it's not held" case.
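[To make the trade-off concrete, here is a hedged sketch of the combined form the thread argues against; the function name relock_page_lruvec_common() and the do_irqsave flag are hypothetical, used only to show the extra parameter a single helper would need once the NULL (first-time lock) case is included.]

	/*
	 * Hedged sketch only -- not code from the posted series.  With a
	 * NULL locked_lruvec the call is a first-time lock, so the caller
	 * must pick the irq or irqsave flavour; that choice is what forces
	 * the extra (hypothetical) do_irqsave parameter here.
	 */
	static inline struct lruvec *relock_page_lruvec_common(struct page *page,
						struct lruvec *locked_lruvec,
						bool do_irqsave)
	{
		struct pglist_data *pgdat = page_pgdat(page);
		struct lruvec *lruvec;

		rcu_read_lock();
		lruvec = mem_cgroup_page_lruvec(page, pgdat);
		rcu_read_unlock();

		if (locked_lruvec == lruvec)
			return lruvec;

		if (locked_lruvec) {
			if (do_irqsave)
				spin_unlock_irqrestore(&locked_lruvec->lru_lock,
						       locked_lruvec->flags);
			else
				spin_unlock_irq(&locked_lruvec->lru_lock);
		}

		/* First-time or changed lock: the flavour depends on the flag. */
		if (do_irqsave)
			lruvec = lock_page_lruvec_irqsave(page, pgdat);
		else
			lruvec = lock_page_lruvec_irq(page, pgdat);

		return lruvec;
	}

[Against this, the posted patch keeps two small inline helpers, relock_page_lruvec_irq() and relock_page_lruvec_irqsave(), which is the trade-off Alex defends above.]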
On 2019/11/13 at 9:45 PM, Matthew Wilcox wrote:
>>> Why not simply:
>>>
>>> 	rcu_read_lock();
>>> 	lruvec = mem_cgroup_page_lruvec(page, pgdat);
>>> 	rcu_read_unlock();
>>>
>>> 	if (locked_lruvec == lruvec)
>>
>> The rcu_read_unlock() here is placed after the locked_lruvec/lruvec
>> comparison to guard it. Otherwise the memcg/lruvec may be changed
>> underneath us, e.g. by memcg migration, I guess.
>
> How does holding the RCU lock guard the comparison?  You're comparing two
> pointers for equality.  Nothing any other CPU can do at this point will
> change the results of that comparison.

Hi Matthew,

Thanks for the reply!

The reason for guarding the lruvec comparison here is admittedly a bit
obscure: in fact the RCU read section guards the page's memcg against
being freed or migrated, since each lruvec is bound to its memcg. A
detailed explanation can be found in lock_page_memcg().

Thanks
Alex
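[To make that point concrete, here is the relevant fragment of the posted helper again with comments added; the interpretation follows Alex's explanation above and is an editorial annotation, not part of the patch.]

	/*
	 * Annotated copy of the fragment under discussion.  The claim
	 * (per Alex, by analogy with lock_page_memcg()) is that the RCU
	 * read section keeps the page's memcg -- and thus the lruvec
	 * derived from it -- from going away while we look it up and
	 * decide whether the lock we already hold can be reused.
	 */
	rcu_read_lock();
	lruvec = mem_cgroup_page_lruvec(page, pgdat);	/* memcg pinned by RCU */

	if (locked_lruvec == lruvec) {
		/* Still inside the RCU section while reusing the lock. */
		rcu_read_unlock();
		return lruvec;
	}
	rcu_read_unlock();	/* different lruvec: fall through and relock */

[Whether that protection is actually needed for a bare pointer comparison is the question Matthew raises, and the thread leaves it open at this point.]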
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 1c1e68537eca..2421b720d272 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -1300,6 +1300,55 @@ static inline void dec_lruvec_page_state(struct page *page,
 	mod_lruvec_page_state(page, idx, -1);
 }
 
+/* Don't lock again iff page's lruvec locked */
+static inline struct lruvec *relock_page_lruvec_irq(struct page *page,
+					struct lruvec *locked_lruvec)
+{
+	struct pglist_data *pgdat = page_pgdat(page);
+	struct lruvec *lruvec;
+
+	rcu_read_lock();
+	lruvec = mem_cgroup_page_lruvec(page, pgdat);
+
+	if (locked_lruvec == lruvec) {
+		rcu_read_unlock();
+		return lruvec;
+	}
+	rcu_read_unlock();
+
+	if (locked_lruvec)
+		spin_unlock_irq(&locked_lruvec->lru_lock);
+
+	lruvec = lock_page_lruvec_irq(page, pgdat);
+
+	return lruvec;
+}
+
+/* Don't lock again iff page's lruvec locked */
+static inline struct lruvec *relock_page_lruvec_irqsave(struct page *page,
+					struct lruvec *locked_lruvec)
+{
+	struct pglist_data *pgdat = page_pgdat(page);
+	struct lruvec *lruvec;
+
+	rcu_read_lock();
+	lruvec = mem_cgroup_page_lruvec(page, pgdat);
+
+	if (locked_lruvec == lruvec) {
+		rcu_read_unlock();
+		return lruvec;
+	}
+	rcu_read_unlock();
+
+	if (locked_lruvec)
+		spin_unlock_irqrestore(&locked_lruvec->lru_lock,
+					locked_lruvec->flags);
+
+	lruvec = lock_page_lruvec_irqsave(page, pgdat);
+
+	return lruvec;
+}
+
 #ifdef CONFIG_CGROUP_WRITEBACK
 
 struct wb_domain *mem_cgroup_wb_domain(struct bdi_writeback *wb);
diff --git a/mm/mlock.c b/mm/mlock.c
index b509b80b8513..8b3a97b62c0a 100644
--- a/mm/mlock.c
+++ b/mm/mlock.c
@@ -290,6 +290,7 @@ static void __munlock_pagevec(struct pagevec *pvec, struct zone *zone)
 {
 	int i;
 	int nr = pagevec_count(pvec);
+	int delta_munlocked = -nr;
 	struct pagevec pvec_putback;
 	struct lruvec *lruvec = NULL;
 	int pgrescued = 0;
@@ -300,20 +301,19 @@ static void __munlock_pagevec(struct pagevec *pvec, struct zone *zone)
 	for (i = 0; i < nr; i++) {
 		struct page *page = pvec->pages[i];
 
-		lruvec = lock_page_lruvec_irq(page, page_pgdat(page));
+		lruvec = relock_page_lruvec_irq(page, lruvec);
 		if (TestClearPageMlocked(page)) {
 			/*
 			 * We already have pin from follow_page_mask()
 			 * so we can spare the get_page() here.
 			 */
-			if (__munlock_isolate_lru_page(page, lruvec, false)) {
-				__mod_zone_page_state(zone, NR_MLOCK, -1);
-				spin_unlock_irq(&lruvec->lru_lock);
+			if (__munlock_isolate_lru_page(page, lruvec, false))
 				continue;
-			} else
+			else
 				__munlock_isolation_failed(page);
-		}
+		} else
+			delta_munlocked++;
 
 		/*
 		 * We won't be munlocking this page in the next phase
@@ -323,8 +323,10 @@ static void __munlock_pagevec(struct pagevec *pvec, struct zone *zone)
 		 */
 		pagevec_add(&pvec_putback, pvec->pages[i]);
 		pvec->pages[i] = NULL;
-		spin_unlock_irq(&lruvec->lru_lock);
 	}
+	__mod_zone_page_state(zone, NR_MLOCK, delta_munlocked);
+	if (lruvec)
+		spin_unlock_irq(&lruvec->lru_lock);
 
 	/* Now we can release pins of pages that we are not munlocking */
 	pagevec_release(&pvec_putback);
diff --git a/mm/swap.c b/mm/swap.c
index 267c3e262254..0639b3e9e03b 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -195,11 +195,12 @@ static void pagevec_lru_move_fn(struct pagevec *pvec,
 	for (i = 0; i < pagevec_count(pvec); i++) {
 		struct page *page = pvec->pages[i];
 
-		lruvec = lock_page_lruvec_irqsave(page, page_pgdat(page));
+		lruvec = relock_page_lruvec_irqsave(page, lruvec);
 
 		(*move_fn)(page, lruvec, arg);
-		spin_unlock_irqrestore(&lruvec->lru_lock, lruvec->flags);
 	}
+	if (lruvec)
+		spin_unlock_irqrestore(&lruvec->lru_lock, lruvec->flags);
 
 	release_pages(pvec->pages, pvec->nr);
 	pagevec_reinit(pvec);
@@ -802,15 +803,12 @@ void release_pages(struct page **pages, int nr)
 		}
 
 		if (PageLRU(page)) {
-			struct lruvec *new_lruvec = mem_cgroup_page_lruvec(page, page_pgdat(page));
+			struct lruvec *pre_lruvec = lruvec;
 
-			if (new_lruvec != lruvec) {
-				if (lruvec)
-					spin_unlock_irqrestore(&lruvec->lru_lock, lruvec->flags);
+			lruvec = relock_page_lruvec_irqsave(page, lruvec);
+			if (pre_lruvec != lruvec)
 				lock_batch = 0;
-				lruvec = lock_page_lruvec_irqsave(page, page_pgdat(page));
-			}
 
 			VM_BUG_ON_PAGE(!PageLRU(page), page);
 			__ClearPageLRU(page);
 			del_page_from_lru_list(page, lruvec, page_off_lru(page));
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 50f15d1e5f18..cbebd9b0b9c8 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1874,22 +1874,25 @@ static int too_many_isolated(struct pglist_data *pgdat, int file,
  * Returns the number of pages moved to the given lruvec.
  */
-static unsigned noinline_for_stack move_pages_to_lru(struct lruvec *lruvec,
+static unsigned noinline_for_stack move_pages_to_lru(struct lruvec *llvec,
 						     struct list_head *list)
 {
 	int nr_pages, nr_moved = 0;
 	LIST_HEAD(pages_to_free);
 	struct page *page;
 	enum lru_list lru;
+	struct lruvec *lruvec = llvec;
 
 	while (!list_empty(list)) {
 		page = lru_to_page(list);
+		lruvec = relock_page_lruvec_irq(page, lruvec);
+
 		VM_BUG_ON_PAGE(PageLRU(page), page);
 		if (unlikely(!page_evictable(page))) {
 			list_del(&page->lru);
 			spin_unlock_irq(&lruvec->lru_lock);
+			lruvec = NULL;
 			putback_lru_page(page);
-			spin_lock_irq(&lruvec->lru_lock);
 			continue;
 		}
 
@@ -1907,8 +1910,8 @@ static unsigned noinline_for_stack move_pages_to_lru(struct lruvec *lruvec,
 
 			if (unlikely(PageCompound(page))) {
 				spin_unlock_irq(&lruvec->lru_lock);
+				lruvec = NULL;
 				(*get_compound_page_dtor(page))(page);
-				spin_lock_irq(&lruvec->lru_lock);
 			} else
 				list_add(&page->lru, &pages_to_free);
 		} else {
@@ -1916,6 +1919,11 @@ static unsigned noinline_for_stack move_pages_to_lru(struct lruvec *lruvec,
 		}
 	}
 
+	if (lruvec != llvec) {
+		if (lruvec)
+			spin_unlock_irq(&lruvec->lru_lock);
+		spin_lock_irq(&llvec->lru_lock);
+	}
 	/*
 	 * To save our caller's stack, now use input list for pages to free.
 	 */
@@ -4338,18 +4346,10 @@ void check_move_unevictable_pages(struct pagevec *pvec)
 
 	for (i = 0; i < pvec->nr; i++) {
 		struct page *page = pvec->pages[i];
-		struct pglist_data *pgdat = page_pgdat(page);
-		struct lruvec *new_lruvec = mem_cgroup_page_lruvec(page, pgdat);
-
 		pgscanned++;
-		if (lruvec != new_lruvec) {
-			if (lruvec)
-				spin_unlock_irq(&lruvec->lru_lock);
-			lruvec = new_lruvec;
-			spin_lock_irq(&lruvec->lru_lock);
-		}
+		lruvec = relock_page_lruvec_irq(page, lruvec);
 
 		if (!PageLRU(page) || !PageUnevictable(page))
 			continue;
During pagevec locking, a new page's lruvec may be the same as the
previous one, so we can save a lruvec unlock/lock cycle when the lruvec is
unchanged and only switch the lock iff the lruvec is new. The function is
named relock_page_lruvec, following Hugh Dickins' patch.

Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Roman Gushchin <guro@fb.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Chris Down <chris@chrisdown.name>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: swkhack <swkhack@gmail.com>
Cc: "Potyra, Stefan" <Stefan.Potyra@elektrobit.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Cc: Peng Fan <peng.fan@nxp.com>
Cc: Nikolay Borisov <nborisov@suse.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
Cc: Yang Shi <yang.shi@linux.alibaba.com>
Cc: Yafang Shao <laoar.shao@gmail.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Cc: Hugh Dickins <hughd@google.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: linux-kernel@vger.kernel.org
Cc: cgroups@vger.kernel.org
Cc: linux-mm@kvack.org
---
 include/linux/memcontrol.h | 49 ++++++++++++++++++++++++++++++++++++++++++++++
 mm/mlock.c                 | 16 ++++++++-------
 mm/swap.c                  | 14 ++++++-------
 mm/vmscan.c                | 24 +++++++++++------------
 4 files changed, 76 insertions(+), 27 deletions(-)
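[As a usage illustration, here is a condensed, editorial sketch of the caller pattern the patch converts the pagevec walkers to (cf. the mm/mlock.c and mm/vmscan.c hunks above); do_something_locked() is a placeholder, not real kernel code.]

	/*
	 * Sketch of the converted caller pattern: start unlocked, let the
	 * relock helper switch locks only when a page belongs to a new
	 * lruvec, and do a single unlock at the end of the walk.
	 */
	struct lruvec *lruvec = NULL;
	int i;

	for (i = 0; i < pagevec_count(pvec); i++) {
		struct page *page = pvec->pages[i];

		/* Only switches locks when the page's lruvec changes. */
		lruvec = relock_page_lruvec_irq(page, lruvec);
		do_something_locked(page, lruvec);
	}
	if (lruvec)
		spin_unlock_irq(&lruvec->lru_lock);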