Message ID: 20220503170341.1413961-1-minchan@kernel.org (mailing list archive)
State: New
Series: mm: don't be stuck to rmap lock on reclaim path
Hi Minchan,
I love your patch! Perhaps something to improve:
[auto build test WARNING on hnaz-mm/master]
url: https://github.com/intel-lab-lkp/linux/commits/Minchan-Kim/mm-don-t-be-stuck-to-rmap-lock-on-reclaim-path/20220504-010625
base: https://github.com/hnaz/linux-mm master
config: x86_64-randconfig-a013 (https://download.01.org/0day-ci/archive/20220504/202205041057.a78ABZ95-lkp@intel.com/config)
compiler: gcc-11 (Debian 11.2.0-20) 11.2.0
reproduce (this is a W=1 build):
# https://github.com/intel-lab-lkp/linux/commit/0e190ce022ef259d63eb2cf50110b292ba17c79c
git remote add linux-review https://github.com/intel-lab-lkp/linux
git fetch --no-tags linux-review Minchan-Kim/mm-don-t-be-stuck-to-rmap-lock-on-reclaim-path/20220504-010625
git checkout 0e190ce022ef259d63eb2cf50110b292ba17c79c
# save the config file
mkdir build_dir && cp config build_dir/.config
make W=1 O=build_dir ARCH=x86_64 SHELL=/bin/bash
If you fix the issue, kindly add the following tag as appropriate:
Reported-by: kernel test robot <lkp@intel.com>
All warnings (new ones prefixed by >>):
mm/page_idle.c: In function 'page_idle_clear_pte_refs':
>> mm/page_idle.c:106:26: warning: passing argument 2 of 'rmap_walk' discards 'const' qualifier from pointer target type [-Wdiscarded-qualifiers]
106 | rmap_walk(folio, &rwc);
| ^~~~
In file included from mm/page_idle.c:11:
include/linux/rmap.h:403:63: note: expected 'struct rmap_walk_control *' but argument is of type 'const struct rmap_walk_control *'
403 | void rmap_walk(struct folio *folio, struct rmap_walk_control *rwc);
| ~~~~~~~~~~~~~~~~~~~~~~~~~~^~~
vim +106 mm/page_idle.c
33c3fc71c8cfa3 Vladimir Davydov 2015-09-09 85
33c3fc71c8cfa3 Vladimir Davydov 2015-09-09 86 static void page_idle_clear_pte_refs(struct page *page)
33c3fc71c8cfa3 Vladimir Davydov 2015-09-09 87 {
4aed23a2f8aaaa Matthew Wilcox (Oracle 2022-01-29 88) struct folio *folio = page_folio(page);
33c3fc71c8cfa3 Vladimir Davydov 2015-09-09 89 /*
33c3fc71c8cfa3 Vladimir Davydov 2015-09-09 90 * Since rwc.arg is unused, rwc is effectively immutable, so we
33c3fc71c8cfa3 Vladimir Davydov 2015-09-09 91 * can make it static const to save some cycles and stack.
33c3fc71c8cfa3 Vladimir Davydov 2015-09-09 92 */
33c3fc71c8cfa3 Vladimir Davydov 2015-09-09 93 static const struct rmap_walk_control rwc = {
33c3fc71c8cfa3 Vladimir Davydov 2015-09-09 94 .rmap_one = page_idle_clear_pte_refs_one,
2f031c6f042cb8 Matthew Wilcox (Oracle 2022-01-29 95) .anon_lock = folio_lock_anon_vma_read,
33c3fc71c8cfa3 Vladimir Davydov 2015-09-09 96 };
33c3fc71c8cfa3 Vladimir Davydov 2015-09-09 97 bool need_lock;
33c3fc71c8cfa3 Vladimir Davydov 2015-09-09 98
4aed23a2f8aaaa Matthew Wilcox (Oracle 2022-01-29 99) if (!folio_mapped(folio) || !folio_raw_mapping(folio))
33c3fc71c8cfa3 Vladimir Davydov 2015-09-09 100 return;
33c3fc71c8cfa3 Vladimir Davydov 2015-09-09 101
4aed23a2f8aaaa Matthew Wilcox (Oracle 2022-01-29 102) need_lock = !folio_test_anon(folio) || folio_test_ksm(folio);
4aed23a2f8aaaa Matthew Wilcox (Oracle 2022-01-29 103) if (need_lock && !folio_trylock(folio))
33c3fc71c8cfa3 Vladimir Davydov 2015-09-09 104 return;
33c3fc71c8cfa3 Vladimir Davydov 2015-09-09 105
84fbbe21894bb9 Matthew Wilcox (Oracle 2022-01-29 @106) rmap_walk(folio, &rwc);
33c3fc71c8cfa3 Vladimir Davydov 2015-09-09 107
33c3fc71c8cfa3 Vladimir Davydov 2015-09-09 108 if (need_lock)
4aed23a2f8aaaa Matthew Wilcox (Oracle 2022-01-29 109) folio_unlock(folio);
33c3fc71c8cfa3 Vladimir Davydov 2015-09-09 110 }
33c3fc71c8cfa3 Vladimir Davydov 2015-09-09 111
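The warning above is the plain C const-qualifier rule at work: once the walker writes any field of the control struct, its parameter cannot be `const`, and passing a `static const` object to it discards the qualifier. A minimal userspace sketch (all names hypothetical, not the kernel API) of the same shape:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for rmap_walk_control: the walker writes
 * the visited field, so it must take a non-const pointer. */
struct walk_control {
	bool try_lock;
	bool contended;
	int visited;	/* out-param: incremented by the walk */
};

/* Non-const parameter, mirroring rmap_walk() after the const removal.
 * Passing a `const struct walk_control *` here is exactly what gcc's
 * -Wdiscarded-qualifiers (and clang's error) complains about. */
static int do_walk(struct walk_control *walk, const int *items, int n)
{
	int sum = 0;
	for (int i = 0; i < n; i++) {
		sum += items[i];
		walk->visited++;	/* the write that forbids const */
	}
	return sum;
}
```

gcc treats the qualifier discard as a warning while clang under -Werror makes it a hard error, which is why the two bot reports differ in severity for the same line.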
Hi Minchan,
I love your patch! Yet something to improve:
[auto build test ERROR on hnaz-mm/master]
url: https://github.com/intel-lab-lkp/linux/commits/Minchan-Kim/mm-don-t-be-stuck-to-rmap-lock-on-reclaim-path/20220504-010625
base: https://github.com/hnaz/linux-mm master
config: x86_64-randconfig-a016 (https://download.01.org/0day-ci/archive/20220504/202205041158.b7On1pwy-lkp@intel.com/config)
compiler: clang version 15.0.0 (https://github.com/llvm/llvm-project 363b3a645a1e30011cc8da624f13dac5fd915628)
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# https://github.com/intel-lab-lkp/linux/commit/0e190ce022ef259d63eb2cf50110b292ba17c79c
git remote add linux-review https://github.com/intel-lab-lkp/linux
git fetch --no-tags linux-review Minchan-Kim/mm-don-t-be-stuck-to-rmap-lock-on-reclaim-path/20220504-010625
git checkout 0e190ce022ef259d63eb2cf50110b292ba17c79c
# save the config file
mkdir build_dir && cp config build_dir/.config
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=x86_64 SHELL=/bin/bash
If you fix the issue, kindly add the following tag as appropriate:
Reported-by: kernel test robot <lkp@intel.com>
All errors (new ones prefixed by >>):
>> mm/page_idle.c:106:19: error: passing 'const struct rmap_walk_control *' to parameter of type 'struct rmap_walk_control *' discards qualifiers [-Werror,-Wincompatible-pointer-types-discards-qualifiers]
rmap_walk(folio, &rwc);
^~~~
include/linux/rmap.h:403:63: note: passing argument to parameter 'rwc' here
void rmap_walk(struct folio *folio, struct rmap_walk_control *rwc);
^
1 error generated.
vim +106 mm/page_idle.c
On Tue, May 03, 2022 at 10:03:41AM -0700, Minchan Kim wrote:
> -void rmap_walk(struct folio *folio, const struct rmap_walk_control *rwc);
> -void rmap_walk_locked(struct folio *folio, const struct rmap_walk_control *rwc);
> +void rmap_walk(struct folio *folio, struct rmap_walk_control *rwc);
> +void rmap_walk_locked(struct folio *folio, struct rmap_walk_control *rwc);

I see the build bot already beat me to pointing out why this is wrong,
but do you not look at git log to figure out why code was changed to be
the way it is now, before you change it back?
On Wed, May 04, 2022 at 04:32:13AM +0100, Matthew Wilcox wrote:
> On Tue, May 03, 2022 at 10:03:41AM -0700, Minchan Kim wrote:
> > -void rmap_walk(struct folio *folio, const struct rmap_walk_control *rwc);
> > -void rmap_walk_locked(struct folio *folio, const struct rmap_walk_control *rwc);
> > +void rmap_walk(struct folio *folio, struct rmap_walk_control *rwc);
> > +void rmap_walk_locked(struct folio *folio, struct rmap_walk_control *rwc);
>
> I see the build bot already beat me to pointing out why this is wrong,
> but do you not look at git log to figure out why code was changed to be
> the way it is now, before you change it back?

This patch added a new field as out param like compact_control so
the rmap_walk_control is not immutable.
On Tue, May 03, 2022 at 09:30:38PM -0700, Minchan Kim wrote:
> On Wed, May 04, 2022 at 04:32:13AM +0100, Matthew Wilcox wrote:
> > On Tue, May 03, 2022 at 10:03:41AM -0700, Minchan Kim wrote:
> > > -void rmap_walk(struct folio *folio, const struct rmap_walk_control *rwc);
> > > -void rmap_walk_locked(struct folio *folio, const struct rmap_walk_control *rwc);
> > > +void rmap_walk(struct folio *folio, struct rmap_walk_control *rwc);
> > > +void rmap_walk_locked(struct folio *folio, struct rmap_walk_control *rwc);
> >
> > I see the build bot already beat me to pointing out why this is wrong,
> > but do you not look at git log to figure out why code was changed to be
> > the way it is now, before you change it back?
>
> This patch added a new field as out param like compact_control so
> the rmap_walk_control is not immutable.

... but we have a user which treats it as if it is.
On Wed, May 04, 2022 at 07:09:37AM +0100, Matthew Wilcox wrote:
> On Tue, May 03, 2022 at 09:30:38PM -0700, Minchan Kim wrote:
> > On Wed, May 04, 2022 at 04:32:13AM +0100, Matthew Wilcox wrote:
> > > On Tue, May 03, 2022 at 10:03:41AM -0700, Minchan Kim wrote:
> > > > [...]
> > >
> > > I see the build bot already beat me to pointing out why this is wrong,
> > > but do you not look at git log to figure out why code was changed to be
> > > the way it is now, before you change it back?
> >
> > This patch added a new field as out param like compact_control so
> > the rmap_walk_control is not immutable.
>
> ... but we have a user which treats it as if it is.

True. I don't think it will show sizable benefit on runtime overhead
since rmap_walk is already one of the most expensive operation in MM.

I could reintroduce the typecast for page_idle_clear_pte_refs to remove
the const as we had several years.

If your concern was to make rmap_walk_control mutable back, I could
change rmap_walk function having return value or adding a addtional
new out param. However, I thought rmap_walk_control is more readable/
easier than them.
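The try_lock/contended out-param design being debated here can be sketched in plain userspace C with a toy lock (all names are hypothetical illustrations, not the kernel API):

```c
#include <assert.h>
#include <stdbool.h>

/* Toy lock: false = free, true = held. Stands in for the rmap rwsem. */
static bool toy_trylock(bool *lock)
{
	if (*lock)
		return false;
	*lock = true;
	return true;
}

/* Hypothetical analog of rmap_walk_control's two new fields. */
struct walk_ctl {
	bool try_lock;	/* in: caller asks walker to bail, not sleep */
	bool contended;	/* out: walker reports that it did bail */
};

/* Mirrors the shape the patch adds to rmap_walk_file(): trylock first;
 * only a caller that did not set try_lock falls back to the blocking
 * acquire (modeled here by simply taking the toy lock). */
static bool lock_or_bail(bool *lock, struct walk_ctl *ctl)
{
	if (toy_trylock(lock))
		return true;
	if (ctl->try_lock) {
		ctl->contended = true;	/* tell caller why we gave up */
		return false;
	}
	*lock = true;	/* stand-in for the sleeping slow path */
	return true;
}
```

Because `contended` is written by the callee, the control struct is inherently mutable in this design, which is what forces the `const` removal the build bots flagged.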
On Wed, May 04, 2022 at 08:52:13AM -0700, Minchan Kim wrote:
> On Wed, May 04, 2022 at 07:09:37AM +0100, Matthew Wilcox wrote:
> > On Tue, May 03, 2022 at 09:30:38PM -0700, Minchan Kim wrote:
> > > [...]
> > > This patch added a new field as out param like compact_control so
> > > the rmap_walk_control is not immutable.
> >
> > ... but we have a user which treats it as if it is.
>
> True. I don't think it will show sizable benefit on runtime overhead
> since rmap_walk is already one of the most expensive operation in MM.
>
> I could reintroduce the typecast for page_idle_clear_pte_refs to remove
> the const as we had several years.
>
> If your concern was to make rmap_walk_control mutable back, I could
> change rmap_walk function having return value or adding a addtional
> new out param. However, I thought rmap_walk_control is more readable/
> easier than them.

I haven't thought deeply about it, but I suspect the right approach is
to remove the rather dubious optimisation in page_idle_clear_pte_refs().
On Wed, May 04, 2022 at 05:55:33PM +0100, Matthew Wilcox wrote:
> On Wed, May 04, 2022 at 08:52:13AM -0700, Minchan Kim wrote:
> > [...]
> > If your concern was to make rmap_walk_control mutable back, I could
> > change rmap_walk function having return value or adding a addtional
> > new out param. However, I thought rmap_walk_control is more readable/
> > easier than them.
>
> I haven't thought deeply about it, but I suspect the right approach is
> to remove the rather dubious optimisation in page_idle_clear_pte_refs().
Ccing Vladimir

From a7f755e7e5cbe5d33893f9d4ca6bd95638ce1b16 Mon Sep 17 00:00:00 2001
From: Minchan Kim <minchan@kernel.org>
Date: Tue, 3 May 2022 09:47:37 -0700
Subject: [PATCH v2] mm: don't be stuck to rmap lock on reclaim path

The rmap locks (i_mmap_rwsem and anon_vma->root->rwsem) could be
contended under memory pressure if processes keep working on their
vmas (e.g., fork, mmap, munmap). It makes the reclaim path get stuck.
In our real workload traces, we see kswapd waiting on the lock for
300ms+ (in the worst case, a second), which pushes other processes
into direct reclaim, where they were also stuck on the lock.

This patch makes the LRU aging path use try_lock mode, like
shrink_page_list, so the reclaim context keeps working on the next
LRU pages without being stuck.

Since this patch introduces a new "contended" field as out param
in rmap_walk_control, it's not immutable any longer so remove
const keywords on rmap related functions. Since rmap walking
is already expensive operation, I doubt the const would help sizable
benefit(And we didn't have it until 5.17).
Signed-off-by: Minchan Kim <minchan@kernel.org>
---
 include/linux/fs.h   |  5 +++++
 include/linux/ksm.h  |  4 ++--
 include/linux/rmap.h | 28 +++++++++++++++++-------
 mm/ksm.c             | 10 +++++++--
 mm/memory-failure.c  |  2 +-
 mm/page_idle.c       |  7 ++----
 mm/rmap.c            | 52 ++++++++++++++++++++++++++++++++++----------
 mm/vmscan.c          |  6 ++++-
 8 files changed, 84 insertions(+), 30 deletions(-)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index f5ec00b2f073..5a169066f463 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -477,6 +477,11 @@ static inline void i_mmap_unlock_write(struct address_space *mapping)
 	up_write(&mapping->i_mmap_rwsem);
 }
 
+static inline int i_mmap_trylock_read(struct address_space *mapping)
+{
+	return down_read_trylock(&mapping->i_mmap_rwsem);
+}
+
 static inline void i_mmap_lock_read(struct address_space *mapping)
 {
 	down_read(&mapping->i_mmap_rwsem);
diff --git a/include/linux/ksm.h b/include/linux/ksm.h
index 0630e545f4cb..0b4f17418f64 100644
--- a/include/linux/ksm.h
+++ b/include/linux/ksm.h
@@ -51,7 +51,7 @@ static inline void ksm_exit(struct mm_struct *mm)
 struct page *ksm_might_need_to_copy(struct page *page,
 			struct vm_area_struct *vma, unsigned long address);
 
-void rmap_walk_ksm(struct folio *folio, const struct rmap_walk_control *rwc);
+void rmap_walk_ksm(struct folio *folio, struct rmap_walk_control *rwc);
 void folio_migrate_ksm(struct folio *newfolio, struct folio *folio);
 
 #else  /* !CONFIG_KSM */
@@ -79,7 +79,7 @@ static inline struct page *ksm_might_need_to_copy(struct page *page,
 }
 
 static inline void rmap_walk_ksm(struct folio *folio,
-			const struct rmap_walk_control *rwc)
+			struct rmap_walk_control *rwc)
 {
 }
diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index cbe279a6f0de..9ec23138e410 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -128,6 +128,11 @@ static inline void anon_vma_lock_read(struct anon_vma *anon_vma)
 	down_read(&anon_vma->root->rwsem);
 }
 
+static inline int anon_vma_trylock_read(struct anon_vma *anon_vma)
+{
+	return down_read_trylock(&anon_vma->root->rwsem);
+}
+
 static inline void anon_vma_unlock_read(struct anon_vma *anon_vma)
 {
 	up_read(&anon_vma->root->rwsem);
@@ -366,17 +371,14 @@ int pfn_mkclean_range(unsigned long pfn, unsigned long nr_pages, pgoff_t pgoff,
 
 void remove_migration_ptes(struct folio *src, struct folio *dst, bool locked);
 
-/*
- * Called by memory-failure.c to kill processes.
- */
-struct anon_vma *folio_lock_anon_vma_read(struct folio *folio);
-void page_unlock_anon_vma_read(struct anon_vma *anon_vma);
-
 int page_mapped_in_vma(struct page *page, struct vm_area_struct *vma);
 
 /*
  * rmap_walk_control: To control rmap traversing for specific needs
  *
  * arg: passed to rmap_one() and invalid_vma()
+ * try_lock: bail out if the rmap lock is contended
+ * contended: indicate the rmap traversal bailed out due to lock contention
  * rmap_one: executed on each vma where page is mapped
  * done: for checking traversing termination condition
  * anon_lock: for getting anon_lock by optimized way rather than default
@@ -384,6 +386,8 @@ int page_mapped_in_vma(struct page *page, struct vm_area_struct *vma);
  */
 struct rmap_walk_control {
 	void *arg;
+	bool try_lock;
+	bool contended;
 	/*
 	 * Return false if page table scanning in rmap_walk should be stopped.
 	 * Otherwise, return true.
@@ -391,12 +395,20 @@ struct rmap_walk_control {
 	bool (*rmap_one)(struct folio *folio, struct vm_area_struct *vma,
 					unsigned long addr, void *arg);
 	int (*done)(struct folio *folio);
-	struct anon_vma *(*anon_lock)(struct folio *folio);
+	struct anon_vma *(*anon_lock)(struct folio *folio,
+				      struct rmap_walk_control *rwc);
 	bool (*invalid_vma)(struct vm_area_struct *vma, void *arg);
 };
 
-void rmap_walk(struct folio *folio, const struct rmap_walk_control *rwc);
-void rmap_walk_locked(struct folio *folio, const struct rmap_walk_control *rwc);
+void rmap_walk(struct folio *folio, struct rmap_walk_control *rwc);
+void rmap_walk_locked(struct folio *folio, struct rmap_walk_control *rwc);
+
+/*
+ * Called by memory-failure.c to kill processes.
+ */
+struct anon_vma *folio_lock_anon_vma_read(struct folio *folio,
+					  struct rmap_walk_control *rwc);
+void page_unlock_anon_vma_read(struct anon_vma *anon_vma);
 
 #else	/* !CONFIG_MMU */
diff --git a/mm/ksm.c b/mm/ksm.c
index ee607d3f8720..26da7f813f23 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -2614,7 +2614,7 @@ struct page *ksm_might_need_to_copy(struct page *page,
 	return new_page;
 }
 
-void rmap_walk_ksm(struct folio *folio, const struct rmap_walk_control *rwc)
+void rmap_walk_ksm(struct folio *folio, struct rmap_walk_control *rwc)
 {
 	struct stable_node *stable_node;
 	struct rmap_item *rmap_item;
@@ -2638,7 +2638,13 @@ void rmap_walk_ksm(struct folio *folio, const struct rmap_walk_control *rwc)
 		struct vm_area_struct *vma;
 
 		cond_resched();
-		anon_vma_lock_read(anon_vma);
+		if (!anon_vma_trylock_read(anon_vma)) {
+			if (rwc->try_lock) {
+				rwc->contended = true;
+				return;
+			}
+			anon_vma_lock_read(anon_vma);
+		}
 		anon_vma_interval_tree_foreach(vmac, &anon_vma->rb_root,
 					       0, ULONG_MAX) {
 			unsigned long addr;
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index a83d32bbc567..09d60bc93140 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -485,7 +485,7 @@ static void collect_procs_anon(struct page *page, struct list_head *to_kill,
 	struct anon_vma *av;
 	pgoff_t pgoff;
 
-	av = folio_lock_anon_vma_read(folio);
+	av = folio_lock_anon_vma_read(folio, NULL);
 	if (av == NULL)	/* Not actually mapped anymore */
 		return;
diff --git a/mm/page_idle.c b/mm/page_idle.c
index fc0435abf909..fdff8c6dcd2d 100644
--- a/mm/page_idle.c
+++ b/mm/page_idle.c
@@ -86,11 +86,8 @@ static bool page_idle_clear_pte_refs_one(struct folio *folio,
 static void page_idle_clear_pte_refs(struct page *page)
 {
 	struct folio *folio = page_folio(page);
-	/*
-	 * Since rwc.arg is unused, rwc is effectively immutable, so we
-	 * can make it static const to save some cycles and stack.
-	 */
-	static const struct rmap_walk_control rwc = {
+
+	static struct rmap_walk_control rwc = {
 		.rmap_one = page_idle_clear_pte_refs_one,
 		.anon_lock = folio_lock_anon_vma_read,
 	};
diff --git a/mm/rmap.c b/mm/rmap.c
index 61e63db5dc6f..bbf32dbeb8ee 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -527,9 +527,11 @@ struct anon_vma *page_get_anon_vma(struct page *page)
  *
  * Its a little more complex as it tries to keep the fast path to a single
  * atomic op -- the trylock. If we fail the trylock, we fall back to getting a
- * reference like with page_get_anon_vma() and then block on the mutex.
+ * reference like with page_get_anon_vma() and then block on the mutex
+ * on !rwc->try_lock case.
  */
-struct anon_vma *folio_lock_anon_vma_read(struct folio *folio)
+struct anon_vma *folio_lock_anon_vma_read(struct folio *folio,
+					  struct rmap_walk_control *rwc)
 {
 	struct anon_vma *anon_vma = NULL;
 	struct anon_vma *root_anon_vma;
@@ -557,6 +559,12 @@ struct anon_vma *folio_lock_anon_vma_read(struct folio *folio)
 		goto out;
 	}
 
+	if (rwc && rwc->try_lock) {
+		anon_vma = NULL;
+		rwc->contended = true;
+		goto out;
+	}
+
 	/* trylock failed, we got to sleep */
 	if (!atomic_inc_not_zero(&anon_vma->refcount)) {
 		anon_vma = NULL;
@@ -883,7 +891,8 @@ static bool invalid_folio_referenced_vma(struct vm_area_struct *vma, void *arg)
  *
  * Quick test_and_clear_referenced for all mappings of a folio,
  *
- * Return: The number of mappings which referenced the folio.
+ * Return: The number of mappings which referenced the folio. Return -1 if
+ * the function bailed out due to lock contention.
  */
 int folio_referenced(struct folio *folio, int is_locked,
 		     struct mem_cgroup *memcg, unsigned long *vm_flags)
@@ -897,6 +906,7 @@ int folio_referenced(struct folio *folio, int is_locked,
 		.rmap_one = folio_referenced_one,
 		.arg = (void *)&pra,
 		.anon_lock = folio_lock_anon_vma_read,
+		.try_lock = true,
 	};
 
 	*vm_flags = 0;
@@ -927,7 +937,7 @@ int folio_referenced(struct folio *folio, int is_locked,
 	if (we_locked)
 		folio_unlock(folio);
 
-	return pra.referenced;
+	return rwc.contended ? -1 : pra.referenced;
 }
 
 static int page_vma_mkclean_one(struct page_vma_mapped_walk *pvmw)
@@ -2307,12 +2317,12 @@ void __put_anon_vma(struct anon_vma *anon_vma)
 }
 
 static struct anon_vma *rmap_walk_anon_lock(struct folio *folio,
-					const struct rmap_walk_control *rwc)
+					struct rmap_walk_control *rwc)
 {
 	struct anon_vma *anon_vma;
 
 	if (rwc->anon_lock)
-		return rwc->anon_lock(folio);
+		return rwc->anon_lock(folio, rwc);
 
 	/*
 	 * Note: remove_migration_ptes() cannot use folio_lock_anon_vma_read()
@@ -2324,7 +2334,17 @@ static struct anon_vma *rmap_walk_anon_lock(struct folio *folio,
 	if (!anon_vma)
 		return NULL;
 
+	if (anon_vma_trylock_read(anon_vma))
+		goto out;
+
+	if (rwc->try_lock) {
+		anon_vma = NULL;
+		rwc->contended = true;
+		goto out;
+	}
+
 	anon_vma_lock_read(anon_vma);
+out:
 	return anon_vma;
 }
 
@@ -2338,7 +2358,7 @@ static struct anon_vma *rmap_walk_anon_lock(struct folio *folio,
  * contained in the anon_vma struct it points to.
  */
 static void rmap_walk_anon(struct folio *folio,
-		const struct rmap_walk_control *rwc, bool locked)
+		struct rmap_walk_control *rwc, bool locked)
 {
 	struct anon_vma *anon_vma;
 	pgoff_t pgoff_start, pgoff_end;
@@ -2386,7 +2406,7 @@ static void rmap_walk_anon(struct folio *folio,
  * contained in the address_space struct it points to.
  */
 static void rmap_walk_file(struct folio *folio,
-		const struct rmap_walk_control *rwc, bool locked)
+		struct rmap_walk_control *rwc, bool locked)
 {
 	struct address_space *mapping = folio_mapping(folio);
 	pgoff_t pgoff_start, pgoff_end;
@@ -2405,8 +2425,18 @@ static void rmap_walk_file(struct folio *folio,
 
 	pgoff_start = folio_pgoff(folio);
 	pgoff_end = pgoff_start + folio_nr_pages(folio) - 1;
-	if (!locked)
+	if (!locked) {
+		if (i_mmap_trylock_read(mapping))
+			goto lookup;
+
+		if (rwc->try_lock) {
+			rwc->contended = true;
+			return;
+		}
+
 		i_mmap_lock_read(mapping);
+	}
+lookup:
 	vma_interval_tree_foreach(vma, &mapping->i_mmap,
 			pgoff_start, pgoff_end) {
 		unsigned long address = vma_address(&folio->page, vma);
@@ -2428,7 +2458,7 @@ static void rmap_walk_file(struct folio *folio,
 		i_mmap_unlock_read(mapping);
 }
 
-void rmap_walk(struct folio *folio, const struct rmap_walk_control *rwc)
+void rmap_walk(struct folio *folio, struct rmap_walk_control *rwc)
 {
 	if (unlikely(folio_test_ksm(folio)))
 		rmap_walk_ksm(folio, rwc);
@@ -2439,7 +2469,7 @@ void rmap_walk(struct folio *folio, const struct rmap_walk_control *rwc)
 }
 
 /* Like rmap_walk, but caller holds relevant rmap lock */
-void rmap_walk_locked(struct folio *folio, const struct rmap_walk_control *rwc)
+void rmap_walk_locked(struct folio *folio, struct rmap_walk_control *rwc)
 {
 	/* no ksm support for now */
 	VM_BUG_ON_FOLIO(folio_test_ksm(folio), folio);
diff --git a/mm/vmscan.c b/mm/vmscan.c
index c6918fff06e1..e68c5715270a 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1391,6 +1391,10 @@ static enum page_references folio_check_references(struct folio *folio,
 	if (vm_flags & VM_LOCKED)
 		return PAGEREF_ACTIVATE;
 
+	/* page_referenced didn't work due to lock contention */
+	if (referenced_ptes == -1)
+		return PAGEREF_KEEP;
+
 	if (referenced_ptes) {
 		/*
 		 * All mapped folios start out with page table
@@ -2492,7 +2496,7 @@ static void shrink_active_list(unsigned long nr_to_scan,
 		}
 
 		if (folio_referenced(folio, 0, sc->target_mem_cgroup,
-				     &vm_flags)) {
+				     &vm_flags) > 0) {
 			/*
 			 * Identify referenced, file-backed active pages and
 			 * give them one more trip around the active list. So
On Wed, May 04, 2022 at 04:47:51PM -0700, Minchan Kim wrote:
> Since this patch introduces a new "contended" field as out param
> in rmap_walk_control, it's not immutable any longer so remove
> const keywords on rmap related functions. Since rmap walking
> is already expensive operation, I doubt the const would help sizable
> benefit(And we didn't have it until 5.17).

Um?  If it's now mutable, it surely can't be static as that means it
would be shared by all callers, and you might write to the new fields
in one caller and have them interpreted by the other caller!

Or if it is safe, then the comment needs to not be deleted, but modified
to explain why it's safe to do so in this instance, and that other
instances should not copy the approach unless they are similarly safe.

> diff --git a/mm/page_idle.c b/mm/page_idle.c
> index fc0435abf909..fdff8c6dcd2d 100644
> --- a/mm/page_idle.c
> +++ b/mm/page_idle.c
> @@ -86,11 +86,8 @@ static bool page_idle_clear_pte_refs_one(struct folio *folio,
>  static void page_idle_clear_pte_refs(struct page *page)
>  {
>  	struct folio *folio = page_folio(page);
> -	/*
> -	 * Since rwc.arg is unused, rwc is effectively immutable, so we
> -	 * can make it static const to save some cycles and stack.
> -	 */
> -	static const struct rmap_walk_control rwc = {
> +
> +	static struct rmap_walk_control rwc = {
>  		.rmap_one = page_idle_clear_pte_refs_one,
>  		.anon_lock = folio_lock_anon_vma_read,
>  	};
On Thu, May 05, 2022 at 01:42:17AM +0100, Matthew Wilcox wrote:
> On Wed, May 04, 2022 at 04:47:51PM -0700, Minchan Kim wrote:
> > Since this patch introduces a new "contended" field as out param
> > in rmap_walk_control, it's not immutable any longer so remove
> > const keywords on rmap related functions. Since rmap walking
> > is already expensive operation, I doubt the const would help sizable
> > benefit(And we didn't have it until 5.17).
>
> Um?  If it's now mutable, it surely can't be static as that means it
> would be shared by all callers, and you might write to the new fields
> in one caller and have them interpreted by the other caller!
>
> Or if it is safe, then the comment needs to not be deleted, but modified
> to explain why it's safe to do so in this instance, and that other
> instances should not copy the approach unless they are similarly safe.

It's safe since rwc.contended is used only when rwc.try_lock is true.

> > [...]
> > -	static const struct rmap_walk_control rwc = {
> > +
> > +	static struct rmap_walk_control rwc = {
> >  		.rmap_one = page_idle_clear_pte_refs_one,
> >  		.anon_lock = folio_lock_anon_vma_read,
> >  	};

So, delta is

diff --git a/mm/page_idle.c b/mm/page_idle.c
index fdff8c6dcd2d..bc08332a609c 100644
--- a/mm/page_idle.c
+++ b/mm/page_idle.c
@@ -87,6 +87,10 @@ static void page_idle_clear_pte_refs(struct page *page)
 {
 	struct folio *folio = page_folio(page);
 
+	/*
+	 * Since rwc.try_lock is unused, rwc is effectively immutable, so we
+	 * can make it static to save some cycles and stack.
+	 */
 	static struct rmap_walk_control rwc = {
 		.rmap_one = page_idle_clear_pte_refs_one,
 		.anon_lock = folio_lock_anon_vma_read,
On Wed, 4 May 2022 23:11:04 -0700 Minchan Kim <minchan@kernel.org> wrote:

> > > +
> > > +	static struct rmap_walk_control rwc = {
> > >  		.rmap_one = page_idle_clear_pte_refs_one,
> > >  		.anon_lock = folio_lock_anon_vma_read,
> > >  	};
>
> So, delta is

--- a/mm/page_idle.c~mm-dont-be-stuck-to-rmap-lock-on-reclaim-path-fix
+++ a/mm/page_idle.c
@@ -87,6 +87,10 @@ static void page_idle_clear_pte_refs(str
 {
 	struct folio *folio = page_folio(page);
 
+	/*
+	 * Since rwc.try_lock is unused, rwc is effectively immutable, so we
+	 * can make it static to save some cycles and stack.
+	 */
 	static struct rmap_walk_control rwc = {
 		.rmap_one = page_idle_clear_pte_refs_one,
 		.anon_lock = folio_lock_anon_vma_read,
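The objection about a `static` control struct, and why the restored comment answers it, can be demonstrated with a small userspace C sketch (hypothetical names, not the kernel code): a `static` struct persists across calls and is shared by every caller, so it is only safe if no call ever writes to it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical analog of rmap_walk_control's new fields. */
struct walk_ctl {
	bool try_lock;
	bool contended;
};

/* Walker that writes contended only when the caller opted in. */
static void walk(struct walk_ctl *ctl, bool lock_busy)
{
	if (ctl->try_lock && lock_busy)
		ctl->contended = true;
}

/* Mirrors page_idle_clear_pte_refs(): the control block is static
 * and shared by every call.  That stays safe here only because
 * try_lock is never set, so contended is never written and no state
 * leaks from one call into the next. */
static bool idle_walk(bool lock_busy)
{
	static struct walk_ctl ctl = {
		.try_lock = false,
	};
	walk(&ctl, lock_busy);
	return ctl.contended;
}
```

If `try_lock` were ever set in such a static struct, a contended walk would leave `contended = true` behind for every later caller, which is exactly the sharing hazard raised in the review above.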
diff --git a/include/linux/fs.h b/include/linux/fs.h
index f5ec00b2f073..5a169066f463 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -477,6 +477,11 @@ static inline void i_mmap_unlock_write(struct address_space *mapping)
 	up_write(&mapping->i_mmap_rwsem);
 }
 
+static inline int i_mmap_trylock_read(struct address_space *mapping)
+{
+	return down_read_trylock(&mapping->i_mmap_rwsem);
+}
+
 static inline void i_mmap_lock_read(struct address_space *mapping)
 {
 	down_read(&mapping->i_mmap_rwsem);
diff --git a/include/linux/ksm.h b/include/linux/ksm.h
index 0630e545f4cb..0b4f17418f64 100644
--- a/include/linux/ksm.h
+++ b/include/linux/ksm.h
@@ -51,7 +51,7 @@ static inline void ksm_exit(struct mm_struct *mm)
 struct page *ksm_might_need_to_copy(struct page *page,
 			struct vm_area_struct *vma, unsigned long address);
 
-void rmap_walk_ksm(struct folio *folio, const struct rmap_walk_control *rwc);
+void rmap_walk_ksm(struct folio *folio, struct rmap_walk_control *rwc);
 void folio_migrate_ksm(struct folio *newfolio, struct folio *folio);
 
 #else  /* !CONFIG_KSM */
@@ -79,7 +79,7 @@ static inline struct page *ksm_might_need_to_copy(struct page *page,
 }
 
 static inline void rmap_walk_ksm(struct folio *folio,
-			const struct rmap_walk_control *rwc)
+			struct rmap_walk_control *rwc)
 {
 }
 
diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index cbe279a6f0de..9ec23138e410 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -128,6 +128,11 @@ static inline void anon_vma_lock_read(struct anon_vma *anon_vma)
 	down_read(&anon_vma->root->rwsem);
 }
 
+static inline int anon_vma_trylock_read(struct anon_vma *anon_vma)
+{
+	return down_read_trylock(&anon_vma->root->rwsem);
+}
+
 static inline void anon_vma_unlock_read(struct anon_vma *anon_vma)
 {
 	up_read(&anon_vma->root->rwsem);
@@ -366,17 +371,14 @@ int pfn_mkclean_range(unsigned long pfn, unsigned long nr_pages, pgoff_t pgoff,
 void remove_migration_ptes(struct folio *src, struct folio *dst, bool locked);
 
-/*
- * Called by memory-failure.c to kill processes.
- */
-struct anon_vma *folio_lock_anon_vma_read(struct folio *folio);
-void page_unlock_anon_vma_read(struct anon_vma *anon_vma);
 
 int page_mapped_in_vma(struct page *page, struct vm_area_struct *vma);
 
 /*
  * rmap_walk_control: To control rmap traversing for specific needs
  *
  * arg: passed to rmap_one() and invalid_vma()
+ * try_lock: bail out if the rmap lock is contended
+ * contended: indicate the rmap traversal bailed out due to lock contention
  * rmap_one: executed on each vma where page is mapped
  * done: for checking traversing termination condition
  * anon_lock: for getting anon_lock by optimized way rather than default
@@ -384,6 +386,8 @@ int page_mapped_in_vma(struct page *page, struct vm_area_struct *vma);
  */
 struct rmap_walk_control {
 	void *arg;
+	bool try_lock;
+	bool contended;
 	/*
 	 * Return false if page table scanning in rmap_walk should be stopped.
 	 * Otherwise, return true.
@@ -391,12 +395,20 @@ struct rmap_walk_control {
 	bool (*rmap_one)(struct folio *folio, struct vm_area_struct *vma,
 					unsigned long addr, void *arg);
 	int (*done)(struct folio *folio);
-	struct anon_vma *(*anon_lock)(struct folio *folio);
+	struct anon_vma *(*anon_lock)(struct folio *folio,
+				      struct rmap_walk_control *rwc);
 	bool (*invalid_vma)(struct vm_area_struct *vma, void *arg);
 };
 
-void rmap_walk(struct folio *folio, const struct rmap_walk_control *rwc);
-void rmap_walk_locked(struct folio *folio, const struct rmap_walk_control *rwc);
+void rmap_walk(struct folio *folio, struct rmap_walk_control *rwc);
+void rmap_walk_locked(struct folio *folio, struct rmap_walk_control *rwc);
+
+/*
+ * Called by memory-failure.c to kill processes.
+ */
+struct anon_vma *folio_lock_anon_vma_read(struct folio *folio,
+					  struct rmap_walk_control *rwc);
+void page_unlock_anon_vma_read(struct anon_vma *anon_vma);
 
 #else	/* !CONFIG_MMU */
 
diff --git a/mm/ksm.c b/mm/ksm.c
index ee607d3f8720..26da7f813f23 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -2614,7 +2614,7 @@ struct page *ksm_might_need_to_copy(struct page *page,
 	return new_page;
 }
 
-void rmap_walk_ksm(struct folio *folio, const struct rmap_walk_control *rwc)
+void rmap_walk_ksm(struct folio *folio, struct rmap_walk_control *rwc)
 {
 	struct stable_node *stable_node;
 	struct rmap_item *rmap_item;
@@ -2638,7 +2638,13 @@ void rmap_walk_ksm(struct folio *folio, const struct rmap_walk_control *rwc)
 		struct vm_area_struct *vma;
 
 		cond_resched();
-		anon_vma_lock_read(anon_vma);
+		if (!anon_vma_trylock_read(anon_vma)) {
+			if (rwc->try_lock) {
+				rwc->contended = true;
+				return;
+			}
+			anon_vma_lock_read(anon_vma);
+		}
 		anon_vma_interval_tree_foreach(vmac, &anon_vma->rb_root,
 					       0, ULONG_MAX) {
 			unsigned long addr;
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index a83d32bbc567..09d60bc93140 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -485,7 +485,7 @@ static void collect_procs_anon(struct page *page, struct list_head *to_kill,
 	struct anon_vma *av;
 	pgoff_t pgoff;
 
-	av = folio_lock_anon_vma_read(folio);
+	av = folio_lock_anon_vma_read(folio, NULL);
 	if (av == NULL)	/* Not actually mapped anymore */
 		return;
 
diff --git a/mm/rmap.c b/mm/rmap.c
index 61e63db5dc6f..bbf32dbeb8ee 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -527,9 +527,11 @@ struct anon_vma *page_get_anon_vma(struct page *page)
  *
  * Its a little more complex as it tries to keep the fast path to a single
  * atomic op -- the trylock. If we fail the trylock, we fall back to getting a
- * reference like with page_get_anon_vma() and then block on the mutex.
+ * reference like with page_get_anon_vma() and then block on the mutex
+ * on !rwc->try_lock case.
  */
-struct anon_vma *folio_lock_anon_vma_read(struct folio *folio)
+struct anon_vma *folio_lock_anon_vma_read(struct folio *folio,
+					  struct rmap_walk_control *rwc)
 {
 	struct anon_vma *anon_vma = NULL;
 	struct anon_vma *root_anon_vma;
@@ -557,6 +559,12 @@ struct anon_vma *folio_lock_anon_vma_read(struct folio *folio)
 		goto out;
 	}
 
+	if (rwc && rwc->try_lock) {
+		anon_vma = NULL;
+		rwc->contended = true;
+		goto out;
+	}
+
 	/* trylock failed, we got to sleep */
 	if (!atomic_inc_not_zero(&anon_vma->refcount)) {
 		anon_vma = NULL;
@@ -883,7 +891,8 @@ static bool invalid_folio_referenced_vma(struct vm_area_struct *vma, void *arg)
  *
  * Quick test_and_clear_referenced for all mappings of a folio,
  *
- * Return: The number of mappings which referenced the folio.
+ * Return: The number of mappings which referenced the folio. Return -1 if
+ * the function bailed out due to lock contention.
  */
 int folio_referenced(struct folio *folio, int is_locked,
 		     struct mem_cgroup *memcg, unsigned long *vm_flags)
@@ -897,6 +906,7 @@ int folio_referenced(struct folio *folio, int is_locked,
 		.rmap_one = folio_referenced_one,
 		.arg = (void *)&pra,
 		.anon_lock = folio_lock_anon_vma_read,
+		.try_lock = true,
 	};
 
 	*vm_flags = 0;
@@ -927,7 +937,7 @@ int folio_referenced(struct folio *folio, int is_locked,
 	if (we_locked)
 		folio_unlock(folio);
 
-	return pra.referenced;
+	return rwc.contended ? -1 : pra.referenced;
 }
 
 static int page_vma_mkclean_one(struct page_vma_mapped_walk *pvmw)
@@ -2307,12 +2317,12 @@ void __put_anon_vma(struct anon_vma *anon_vma)
 }
 
 static struct anon_vma *rmap_walk_anon_lock(struct folio *folio,
-					const struct rmap_walk_control *rwc)
+					struct rmap_walk_control *rwc)
 {
 	struct anon_vma *anon_vma;
 
 	if (rwc->anon_lock)
-		return rwc->anon_lock(folio);
+		return rwc->anon_lock(folio, rwc);
 
 	/*
 	 * Note: remove_migration_ptes() cannot use folio_lock_anon_vma_read()
@@ -2324,7 +2334,17 @@ static struct anon_vma *rmap_walk_anon_lock(struct folio *folio,
 	if (!anon_vma)
 		return NULL;
 
+	if (anon_vma_trylock_read(anon_vma))
+		goto out;
+
+	if (rwc->try_lock) {
+		anon_vma = NULL;
+		rwc->contended = true;
+		goto out;
+	}
+
 	anon_vma_lock_read(anon_vma);
+out:
 	return anon_vma;
 }
 
@@ -2338,7 +2358,7 @@ static struct anon_vma *rmap_walk_anon_lock(struct folio *folio,
  * contained in the anon_vma struct it points to.
  */
 static void rmap_walk_anon(struct folio *folio,
-		const struct rmap_walk_control *rwc, bool locked)
+		struct rmap_walk_control *rwc, bool locked)
 {
 	struct anon_vma *anon_vma;
 	pgoff_t pgoff_start, pgoff_end;
@@ -2386,7 +2406,7 @@ static void rmap_walk_anon(struct folio *folio,
  * contained in the address_space struct it points to.
  */
 static void rmap_walk_file(struct folio *folio,
-		const struct rmap_walk_control *rwc, bool locked)
+		struct rmap_walk_control *rwc, bool locked)
 {
 	struct address_space *mapping = folio_mapping(folio);
 	pgoff_t pgoff_start, pgoff_end;
@@ -2405,8 +2425,18 @@ static void rmap_walk_file(struct folio *folio,
 	pgoff_start = folio_pgoff(folio);
 	pgoff_end = pgoff_start + folio_nr_pages(folio) - 1;
-	if (!locked)
+	if (!locked) {
+		if (i_mmap_trylock_read(mapping))
+			goto lookup;
+
+		if (rwc->try_lock) {
+			rwc->contended = true;
+			return;
+		}
+
 		i_mmap_lock_read(mapping);
+	}
+lookup:
 	vma_interval_tree_foreach(vma, &mapping->i_mmap,
 			pgoff_start, pgoff_end) {
 		unsigned long address = vma_address(&folio->page, vma);
@@ -2428,7 +2458,7 @@ static void rmap_walk_file(struct folio *folio,
 		i_mmap_unlock_read(mapping);
 }
 
-void rmap_walk(struct folio *folio, const struct rmap_walk_control *rwc)
+void rmap_walk(struct folio *folio, struct rmap_walk_control *rwc)
 {
 	if (unlikely(folio_test_ksm(folio)))
 		rmap_walk_ksm(folio, rwc);
@@ -2439,7 +2469,7 @@ void rmap_walk(struct folio *folio, const struct rmap_walk_control *rwc)
 }
 
 /* Like rmap_walk, but caller holds relevant rmap lock */
-void rmap_walk_locked(struct folio *folio, const struct rmap_walk_control *rwc)
+void rmap_walk_locked(struct folio *folio, struct rmap_walk_control *rwc)
 {
 	/* no ksm support for now */
 	VM_BUG_ON_FOLIO(folio_test_ksm(folio), folio);
diff --git a/mm/vmscan.c b/mm/vmscan.c
index c6918fff06e1..e68c5715270a 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1391,6 +1391,10 @@ static enum page_references folio_check_references(struct folio *folio,
 	if (vm_flags & VM_LOCKED)
 		return PAGEREF_ACTIVATE;
 
+	/* page_referenced didn't work due to lock contention */
+	if (referenced_ptes == -1)
+		return PAGEREF_KEEP;
+
 	if (referenced_ptes) {
 		/*
 		 * All mapped folios start out with page table
@@ -2492,7 +2496,7 @@ static void shrink_active_list(unsigned long nr_to_scan,
 	}
 
 	if (folio_referenced(folio, 0, sc->target_mem_cgroup,
-			     &vm_flags)) {
+			     &vm_flags) > 0) {
 		/*
 		 * Identify referenced, file-backed active pages and
 		 * give them one more trip around the active list. So
The rmap locks (i_mmap_rwsem and anon_vma->root->rwsem) can be contended
under memory pressure if processes keep working on their vmas (e.g., fork,
mmap, munmap), which leaves the reclaim path stuck on the lock. In our real
workload traces, we see kswapd waiting on the lock for 300ms+ (a second in
the worst case), which in turn pushes other processes into direct reclaim,
where they also get stuck on the same lock.

This patch makes the LRU aging path use trylock mode, like
shrink_page_list, so the reclaim context keeps working on the next LRU
pages instead of being stuck.

Signed-off-by: Minchan Kim <minchan@kernel.org>
---
 include/linux/fs.h   |  5 +++++
 include/linux/ksm.h  |  4 ++--
 include/linux/rmap.h | 28 +++++++++++++++++-------
 mm/ksm.c             | 10 +++++++--
 mm/memory-failure.c  |  2 +-
 mm/rmap.c            | 52 ++++++++++++++++++++++++++++++++++----------
 mm/vmscan.c          |  6 ++++-
 7 files changed, 82 insertions(+), 25 deletions(-)