| Message ID | 20181009184732.762-4-hannes@cmpxchg.org (mailing list archive) |
|---|---|
| State | New, archived |
| Series | mm: workingset & shrinker fixes |
On Tue, 9 Oct 2018 14:47:32 -0400 Johannes Weiner <hannes@cmpxchg.org> wrote:

> Make it easier to catch bugs in the shadow node shrinker by adding a
> counter for the shadow nodes in circulation.
>
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> ---
>  include/linux/mmzone.h |  1 +
>  mm/vmstat.c            |  1 +
>  mm/workingset.c        | 12 ++++++++++--
>  3 files changed, 12 insertions(+), 2 deletions(-)
>
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index 4179e67add3d..d82e80d82aa6 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -161,6 +161,7 @@ enum node_stat_item {
>  	NR_SLAB_UNRECLAIMABLE,
>  	NR_ISOLATED_ANON,	/* Temporary isolated pages from anon lru */
>  	NR_ISOLATED_FILE,	/* Temporary isolated pages from file lru */
> +	WORKINGSET_NODES,

Documentation/admin-guide/cgroup-v2.rst, please. And please check for
any other missing items while in there?
On Tue, 9 Oct 2018 14:47:32 -0400 Johannes Weiner <hannes@cmpxchg.org> wrote:

> --- a/mm/workingset.c
> +++ b/mm/workingset.c
> @@ -378,11 +378,17 @@ void workingset_update_node(struct xa_node *node)
>  	 * as node->private_list is protected by the i_pages lock.
>  	 */
>  	if (node->count && node->count == node->nr_values) {
> -		if (list_empty(&node->private_list))
> +		if (list_empty(&node->private_list)) {
>  			list_lru_add(&shadow_nodes, &node->private_list);
> +			__inc_lruvec_page_state(virt_to_page(node),
> +						WORKINGSET_NODES);
> +		}
>  	} else {
> -		if (!list_empty(&node->private_list))
> +		if (!list_empty(&node->private_list)) {
>  			list_lru_del(&shadow_nodes, &node->private_list);
> +			__dec_lruvec_page_state(virt_to_page(node),
> +						WORKINGSET_NODES);
> +		}
>  	}
>  }

A bit worried that we're depending on the caller's caller to have
disabled interrupts to avoid subtle and rare errors.

Can we do this?

--- a/mm/workingset.c~mm-workingset-add-vmstat-counter-for-shadow-nodes-fix
+++ a/mm/workingset.c
@@ -377,6 +377,8 @@ void workingset_update_node(struct radix
 	 * already where they should be. The list_empty() test is safe
 	 * as node->private_list is protected by the i_pages lock.
 	 */
+	WARN_ON_ONCE(!irqs_disabled());	/* For __inc_lruvec_page_state */
+
 	if (node->count && node->count == node->exceptional) {
 		if (list_empty(&node->private_list)) {
 			list_lru_add(&shadow_nodes, &node->private_list);
On Tue, Oct 09, 2018 at 03:04:01PM -0700, Andrew Morton wrote:
> On Tue, 9 Oct 2018 14:47:32 -0400 Johannes Weiner <hannes@cmpxchg.org> wrote:
>
> > Make it easier to catch bugs in the shadow node shrinker by adding a
> > counter for the shadow nodes in circulation.
> >
> > Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> > ---
> >  include/linux/mmzone.h |  1 +
> >  mm/vmstat.c            |  1 +
> >  mm/workingset.c        | 12 ++++++++++--
> >  3 files changed, 12 insertions(+), 2 deletions(-)
> >
> > diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> > index 4179e67add3d..d82e80d82aa6 100644
> > --- a/include/linux/mmzone.h
> > +++ b/include/linux/mmzone.h
> > @@ -161,6 +161,7 @@ enum node_stat_item {
> >  	NR_SLAB_UNRECLAIMABLE,
> >  	NR_ISOLATED_ANON,	/* Temporary isolated pages from anon lru */
> >  	NR_ISOLATED_FILE,	/* Temporary isolated pages from file lru */
> > +	WORKINGSET_NODES,
>
> Documentation/admin-guide/cgroup-v2.rst, please. And please check for
> any other missing items while in there?

The new counter isn't being added to the per-cgroup memory.stat,
actually, it just shows up in /proc/vmstat. It seemed a bit too
low-level for the cgroup interface, and the other stats in there are
in bytes, which isn't straightforward to calculate with sl*b packing.

Not that I'm against adding a cgroup breakdown in general, but the
global counter was enough to see if things were working right or not,
so I'd cross that bridge when somebody needs it per cgroup.

But I checked cgroup-v2.rst anyway: all the exported items are
documented. Only the reclaim vs. refault stats were in different
orders: the doc has the refault stats first, the interface leads with
the reclaim stats.

The refault stats go better with the page fault stats, and are
probably of more interest (since they have higher impact on
performance) than the LRU shuffling, so maybe this?

---
Subject: [PATCH] mm: memcontrol: fix memory.stat item ordering

The refault stats go better with the page fault stats, and are of
higher interest than the stats on LRU operations. In fact they used
to be grouped together; when the LRU operation stats were added later
on, they were wedged in between.

Move them back together. Documentation/admin-guide/cgroup-v2.rst
already lists them in the right order.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 81b47d0b14d7..ed15f233d31d 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -5575,6 +5575,13 @@ static int memory_stat_show(struct seq_file *m, void *v)
 	seq_printf(m, "pgfault %lu\n", acc.events[PGFAULT]);
 	seq_printf(m, "pgmajfault %lu\n", acc.events[PGMAJFAULT]);
 
+	seq_printf(m, "workingset_refault %lu\n",
+		   acc.stat[WORKINGSET_REFAULT]);
+	seq_printf(m, "workingset_activate %lu\n",
+		   acc.stat[WORKINGSET_ACTIVATE]);
+	seq_printf(m, "workingset_nodereclaim %lu\n",
+		   acc.stat[WORKINGSET_NODERECLAIM]);
+
 	seq_printf(m, "pgrefill %lu\n", acc.events[PGREFILL]);
 	seq_printf(m, "pgscan %lu\n", acc.events[PGSCAN_KSWAPD] +
 		   acc.events[PGSCAN_DIRECT]);
@@ -5585,13 +5592,6 @@ static int memory_stat_show(struct seq_file *m, void *v)
 
 	seq_printf(m, "pglazyfree %lu\n", acc.events[PGLAZYFREE]);
 	seq_printf(m, "pglazyfreed %lu\n", acc.events[PGLAZYFREED]);
 
-	seq_printf(m, "workingset_refault %lu\n",
-		   acc.stat[WORKINGSET_REFAULT]);
-	seq_printf(m, "workingset_activate %lu\n",
-		   acc.stat[WORKINGSET_ACTIVATE]);
-	seq_printf(m, "workingset_nodereclaim %lu\n",
-		   acc.stat[WORKINGSET_NODERECLAIM]);
-
 	return 0;
 }
On Tue, Oct 09, 2018 at 03:08:45PM -0700, Andrew Morton wrote:
> On Tue, 9 Oct 2018 14:47:32 -0400 Johannes Weiner <hannes@cmpxchg.org> wrote:
>
> > --- a/mm/workingset.c
> > +++ b/mm/workingset.c
> > @@ -378,11 +378,17 @@ void workingset_update_node(struct xa_node *node)
> >  	 * as node->private_list is protected by the i_pages lock.
> >  	 */
> >  	if (node->count && node->count == node->nr_values) {
> > -		if (list_empty(&node->private_list))
> > +		if (list_empty(&node->private_list)) {
> >  			list_lru_add(&shadow_nodes, &node->private_list);
> > +			__inc_lruvec_page_state(virt_to_page(node),
> > +						WORKINGSET_NODES);
> > +		}
> >  	} else {
> > -		if (!list_empty(&node->private_list))
> > +		if (!list_empty(&node->private_list)) {
> >  			list_lru_del(&shadow_nodes, &node->private_list);
> > +			__dec_lruvec_page_state(virt_to_page(node),
> > +						WORKINGSET_NODES);
> > +		}
> >  	}
> >  }
>
> A bit worried that we're depending on the caller's caller to have
> disabled interrupts to avoid subtle and rare errors.
>
> Can we do this?

I'm not opposed to it, but the i_pages lock is guaranteed to be held
during the tree update, and that lock is also taken from the io
completion irq to maintain the tree's dirty/writeback state. It seems
like a robust assumption that interrupts will be disabled here.

But all that isn't very obvious from the code at hand, so I wouldn't
mind adding the check for documentation purposes. It's not a super
hot path, but maybe VM_WARN_ON_ONCE()?

> --- a/mm/workingset.c~mm-workingset-add-vmstat-counter-for-shadow-nodes-fix
> +++ a/mm/workingset.c
> @@ -377,6 +377,8 @@ void workingset_update_node(struct radix
>  	 * already where they should be. The list_empty() test is safe
>  	 * as node->private_list is protected by the i_pages lock.
>  	 */
> +	WARN_ON_ONCE(!irqs_disabled());	/* For __inc_lruvec_page_state */
> +
>  	if (node->count && node->count == node->exceptional) {
>  		if (list_empty(&node->private_list)) {
>  			list_lru_add(&shadow_nodes, &node->private_list);
> _
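The caller pattern Johannes is describing looks roughly like the page
cache deletion path. A simplified sketch, abbreviated from the
mm/filemap.c of that era rather than the exact code:

#include <linux/mm.h>
#include <linux/pagemap.h>
#include <linux/xarray.h>

/*
 * Sketch of a tree-update caller: the i_pages lock is taken with IRQs
 * disabled because the same lock is also acquired from the I/O
 * completion interrupt to maintain the tree's dirty/writeback state.
 * Node changes made under it therefore reach workingset_update_node()
 * with interrupts off, which is what makes the non-atomic
 * __inc/__dec_lruvec_page_state() calls safe there.
 */
void delete_from_page_cache(struct page *page)
{
	struct address_space *mapping = page_mapping(page);
	unsigned long flags;

	BUG_ON(!PageLocked(page));
	xa_lock_irqsave(&mapping->i_pages, flags);
	/* May drop a shadow entry and call workingset_update_node() */
	__delete_from_page_cache(page, NULL);
	xa_unlock_irqrestore(&mapping->i_pages, flags);

	/* ... release the page reference ... */
}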
On Tue, Oct 09, 2018 at 03:08:45PM -0700, Andrew Morton wrote:
> On Tue, 9 Oct 2018 14:47:32 -0400 Johannes Weiner <hannes@cmpxchg.org> wrote:
>
> > --- a/mm/workingset.c
> > +++ b/mm/workingset.c
> > @@ -378,11 +378,17 @@ void workingset_update_node(struct xa_node *node)
> >  	 * as node->private_list is protected by the i_pages lock.
> >  	 */
> >  	if (node->count && node->count == node->nr_values) {
> > -		if (list_empty(&node->private_list))
> > +		if (list_empty(&node->private_list)) {
> >  			list_lru_add(&shadow_nodes, &node->private_list);
> > +			__inc_lruvec_page_state(virt_to_page(node),
> > +						WORKINGSET_NODES);
> > +		}
> >  	} else {
> > -		if (!list_empty(&node->private_list))
> > +		if (!list_empty(&node->private_list)) {
> >  			list_lru_del(&shadow_nodes, &node->private_list);
> > +			__dec_lruvec_page_state(virt_to_page(node),
> > +						WORKINGSET_NODES);
> > +		}
> >  	}
> >  }
>
> A bit worried that we're depending on the caller's caller to have
> disabled interrupts to avoid subtle and rare errors.
>
> Can we do this?
>
> --- a/mm/workingset.c~mm-workingset-add-vmstat-counter-for-shadow-nodes-fix
> +++ a/mm/workingset.c
> @@ -377,6 +377,8 @@ void workingset_update_node(struct radix
>  	 * already where they should be. The list_empty() test is safe
>  	 * as node->private_list is protected by the i_pages lock.
>  	 */
> +	WARN_ON_ONCE(!irqs_disabled());	/* For __inc_lruvec_page_state */
> +
>  	if (node->count && node->count == node->exceptional) {
>  		if (list_empty(&node->private_list)) {
>  			list_lru_add(&shadow_nodes, &node->private_list);

Note that for whatever reason, I've observed that irqs_disabled() is
actually quite an expensive call. I'm not saying the warning is a bad
idea, but it should not be sprinkled around unnecessarily and may be
more suitable as a debug option.
On Tue, 16 Oct 2018 09:49:23 +0100 Mel Gorman <mgorman@techsingularity.net> wrote:

> > Can we do this?
> >
> > --- a/mm/workingset.c~mm-workingset-add-vmstat-counter-for-shadow-nodes-fix
> > +++ a/mm/workingset.c
> > @@ -377,6 +377,8 @@ void workingset_update_node(struct radix
> >  	 * already where they should be. The list_empty() test is safe
> >  	 * as node->private_list is protected by the i_pages lock.
> >  	 */
> > +	WARN_ON_ONCE(!irqs_disabled());	/* For __inc_lruvec_page_state */
> > +
> >  	if (node->count && node->count == node->exceptional) {
> >  		if (list_empty(&node->private_list)) {
> >  			list_lru_add(&shadow_nodes, &node->private_list);
>
> Note that for whatever reason, I've observed that irqs_disabled() is
> actually quite an expensive call. I'm not saying the warning is a bad
> idea, but it should not be sprinkled around unnecessarily and may be
> more suitable as a debug option.

Yup, it is now VM_WARN_ON_ONCE().
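For reference, VM_WARN_ON_ONCE() addresses Mel's concern because,
unlike plain WARN_ON_ONCE(), it compiles away when CONFIG_DEBUG_VM is
off. A simplified rendering of the include/linux/mmdebug.h
definitions of that time:

#ifdef CONFIG_DEBUG_VM
#define VM_WARN_ON_ONCE(cond)	(void)WARN_ON_ONCE(cond)
#else
/*
 * Without CONFIG_DEBUG_VM this expands to no runtime code at all;
 * BUILD_BUG_ON_INVALID() only typechecks the condition, so production
 * builds never execute the (relatively expensive) irqs_disabled() test.
 */
#define VM_WARN_ON_ONCE(cond)	BUILD_BUG_ON_INVALID(cond)
#endif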
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 4179e67add3d..d82e80d82aa6 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -161,6 +161,7 @@ enum node_stat_item {
 	NR_SLAB_UNRECLAIMABLE,
 	NR_ISOLATED_ANON,	/* Temporary isolated pages from anon lru */
 	NR_ISOLATED_FILE,	/* Temporary isolated pages from file lru */
+	WORKINGSET_NODES,
 	WORKINGSET_REFAULT,
 	WORKINGSET_ACTIVATE,
 	WORKINGSET_RESTORE,
diff --git a/mm/vmstat.c b/mm/vmstat.c
index d08ed044759d..6038ce593ce3 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1143,6 +1143,7 @@ const char * const vmstat_text[] = {
 	"nr_slab_unreclaimable",
 	"nr_isolated_anon",
 	"nr_isolated_file",
+	"workingset_nodes",
 	"workingset_refault",
 	"workingset_activate",
 	"workingset_restore",
diff --git a/mm/workingset.c b/mm/workingset.c
index f564aaa6b71d..cfdf6adf7e7c 100644
--- a/mm/workingset.c
+++ b/mm/workingset.c
@@ -378,11 +378,17 @@ void workingset_update_node(struct xa_node *node)
 	 * as node->private_list is protected by the i_pages lock.
 	 */
 	if (node->count && node->count == node->nr_values) {
-		if (list_empty(&node->private_list))
+		if (list_empty(&node->private_list)) {
 			list_lru_add(&shadow_nodes, &node->private_list);
+			__inc_lruvec_page_state(virt_to_page(node),
+						WORKINGSET_NODES);
+		}
 	} else {
-		if (!list_empty(&node->private_list))
+		if (!list_empty(&node->private_list)) {
 			list_lru_del(&shadow_nodes, &node->private_list);
+			__dec_lruvec_page_state(virt_to_page(node),
+						WORKINGSET_NODES);
+		}
 	}
 }
@@ -472,6 +478,8 @@ static enum lru_status shadow_lru_isolate(struct list_head *item,
 	}
 
 	list_lru_isolate(lru, item);
+	__dec_lruvec_page_state(virt_to_page(node), WORKINGSET_NODES);
+
 	spin_unlock(lru_lock);
 
 	/*
Make it easier to catch bugs in the shadow node shrinker by adding a
counter for the shadow nodes in circulation.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
 include/linux/mmzone.h |  1 +
 mm/vmstat.c            |  1 +
 mm/workingset.c        | 12 ++++++++++--
 3 files changed, 12 insertions(+), 2 deletions(-)
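With the patch applied, the new counter can be sanity-checked from
userspace by reading /proc/vmstat. A minimal test program, shown here
for illustration only (it is not part of the patch):

#include <stdio.h>
#include <string.h>

/* Print the shadow-node counter this patch exports in /proc/vmstat. */
int main(void)
{
	FILE *f = fopen("/proc/vmstat", "r");
	char line[256];

	if (!f) {
		perror("/proc/vmstat");
		return 1;
	}
	while (fgets(line, sizeof(line), f))
		if (!strncmp(line, "workingset_nodes ", 17))
			fputs(line, stdout);	/* e.g. "workingset_nodes 4231" */
	fclose(f);
	return 0;
}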