mm/memcontrol: make the local VM stats consistent with total stats

Message ID 1562750823-2762-1-git-send-email-laoar.shao@gmail.com
State New, archived
Series mm/memcontrol: make the local VM stats consistent with total stats

Commit Message

Yafang Shao July 10, 2019, 9:27 a.m. UTC
After commit 815744d75152 ("mm: memcontrol: don't batch updates of local VM stats and events"),
the local VM stats are not consistent with the total VM stats.

Below is one example on my server (with 8 CPUs):
	inactive_file 3567570944
	total_inactive_file 3568029696

The deviation is large because the 'val' passed to __mod_memcg_state()
is in pages, while the values reported by memcg_stat_show() are in
bytes. The maximum deviation between the local VM stats and the total
VM stats is therefore (MEMCG_CHARGE_BATCH (32) * number_of_cpus *
PAGE_SIZE), which may be unacceptably large.
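
For the 8-CPU example above, the worst case works out as follows
(assuming the common 4096-byte PAGE_SIZE; the exact value is
architecture-dependent):

	MEMCG_CHARGE_BATCH * number_of_cpus * PAGE_SIZE
	= 32 * 8 * 4096
	= 1048576 bytes (1 MiB)

The gap actually observed above, 3568029696 - 3567570944 = 458752 bytes
(112 pages), is well within that bound.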

We should make the local VM stats consistent with the total stats.
Although the deviation between local VM events and total events is not
as large, I think we'd better make them consistent with each other as
well.

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Yafang Shao <shaoyafang@didiglobal.com>
---
 mm/memcontrol.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Comments

Johannes Weiner July 10, 2019, 8:38 p.m. UTC | #1
On Wed, Jul 10, 2019 at 05:27:03AM -0400, Yafang Shao wrote:
> After commit 815744d75152 ("mm: memcontrol: don't batch updates of local VM stats and events"),
> the local VM stats are not consistent with the total VM stats.
>
> Below is one example on my server (with 8 CPUs):
> 	inactive_file 3567570944
> 	total_inactive_file 3568029696
>
> The deviation is large because the 'val' passed to __mod_memcg_state()
> is in pages, while the values reported by memcg_stat_show() are in
> bytes. The maximum deviation between the local VM stats and the total
> VM stats is therefore (MEMCG_CHARGE_BATCH (32) * number_of_cpus *
> PAGE_SIZE), which may be unacceptably large.
>
> We should make the local VM stats consistent with the total stats.
> Although the deviation between local VM events and total events is not
> as large, I think we'd better make them consistent with each other as
> well.

Ha - the local stats are not percpu-fuzzy enough... But I guess that
is a valid complaint.

> ---
>  mm/memcontrol.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index ba9138a..a9448c3 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -691,12 +691,12 @@ void __mod_memcg_state(struct mem_cgroup *memcg, int idx, int val)
>  	if (mem_cgroup_disabled())
>  		return;
>  
> -	__this_cpu_add(memcg->vmstats_local->stat[idx], val);
>  
>  	x = val + __this_cpu_read(memcg->vmstats_percpu->stat[idx]);
>  	if (unlikely(abs(x) > MEMCG_CHARGE_BATCH)) {
>  		struct mem_cgroup *mi;
>  
> +		__this_cpu_add(memcg->vmstats_local->stat[idx], x);
>  		for (mi = memcg; mi; mi = parent_mem_cgroup(mi))
>  			atomic_long_add(x, &mi->vmstats[idx]);
>  		x = 0;
> @@ -773,12 +773,12 @@ void __count_memcg_events(struct mem_cgroup *memcg, enum vm_event_item idx,
>  	if (mem_cgroup_disabled())
>  		return;
>  
> -	__this_cpu_add(memcg->vmstats_local->events[idx], count);
>  
>  	x = count + __this_cpu_read(memcg->vmstats_percpu->events[idx]);
>  	if (unlikely(x > MEMCG_CHARGE_BATCH)) {
>  		struct mem_cgroup *mi;
>  
> +		__this_cpu_add(memcg->vmstats_local->events[idx], x);
>  		for (mi = memcg; mi; mi = parent_mem_cgroup(mi))
>  			atomic_long_add(x, &mi->vmevents[idx]);
>  		x = 0;

Please also update __mod_lruvec_state() to keep this behavior the same
across counters, to make sure we won't have any surprises when
switching between them.

And please add comments explaining that we batch local counters to
keep them in sync with the hierarchical ones. Because it does look a
little odd without explanation.
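
For reference, a minimal sketch of what the analogous
__mod_lruvec_state() change could look like, assuming the 5.2-era
layout of mm/memcontrol.c where the per-node local counters live in
pn->lruvec_stat_local->count[] beside the batched
pn->lruvec_stat_cpu->count[]. This is an illustration of the requested
follow-up, not the submitted patch:

@@ void __mod_lruvec_state(struct lruvec *lruvec, enum node_stat_item idx,
-	__this_cpu_add(pn->lruvec_stat_local->count[idx], val);
-
 	x = val + __this_cpu_read(pn->lruvec_stat_cpu->count[idx]);
 	if (unlikely(abs(x) > MEMCG_CHARGE_BATCH)) {
 		struct mem_cgroup_per_node *pi;
 
+		/*
+		 * Batch the local counter too, so it stays in sync with
+		 * the hierarchical one.
+		 */
+		__this_cpu_add(pn->lruvec_stat_local->count[idx], x);
 		for (pi = pn; pi; pi = parent_nodeinfo(pi, pgdat->node_id))
 			atomic_long_add(x, &pi->lruvec_stat[idx]);
 		x = 0;
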
Yafang Shao July 11, 2019, 1:10 a.m. UTC | #2
On Thu, Jul 11, 2019 at 4:38 AM Johannes Weiner <hannes@cmpxchg.org> wrote:
>
> [...]
>
> Please also update __mod_lruvec_state() to keep this behavior the same
> across counters, to make sure we won't have any surprises when
> switching between them.
>
> And please add comments explaining that we batch local counters to
> keep them in sync with the hierarchical ones. Because it does look a
> little odd without explanation.

Sure, I will do it.

Thanks
Yafang

Patch

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index ba9138a..a9448c3 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -691,12 +691,12 @@ void __mod_memcg_state(struct mem_cgroup *memcg, int idx, int val)
 	if (mem_cgroup_disabled())
 		return;
 
-	__this_cpu_add(memcg->vmstats_local->stat[idx], val);
 
 	x = val + __this_cpu_read(memcg->vmstats_percpu->stat[idx]);
 	if (unlikely(abs(x) > MEMCG_CHARGE_BATCH)) {
 		struct mem_cgroup *mi;
 
+		__this_cpu_add(memcg->vmstats_local->stat[idx], x);
 		for (mi = memcg; mi; mi = parent_mem_cgroup(mi))
 			atomic_long_add(x, &mi->vmstats[idx]);
 		x = 0;
@@ -773,12 +773,12 @@ void __count_memcg_events(struct mem_cgroup *memcg, enum vm_event_item idx,
 	if (mem_cgroup_disabled())
 		return;
 
-	__this_cpu_add(memcg->vmstats_local->events[idx], count);
 
 	x = count + __this_cpu_read(memcg->vmstats_percpu->events[idx]);
 	if (unlikely(x > MEMCG_CHARGE_BATCH)) {
 		struct mem_cgroup *mi;
 
+		__this_cpu_add(memcg->vmstats_local->events[idx], x);
 		for (mi = memcg; mi; mi = parent_mem_cgroup(mi))
 			atomic_long_add(x, &mi->vmevents[idx]);
 		x = 0;
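
To see the effect of the change in isolation, here is a small userspace
mockup of the batching scheme (not kernel code; BATCH, NCPUS, and the
counter names are made up for illustration). Because the local and the
hierarchical counter are now folded up together, only when a per-CPU
delta exceeds the batch size, the two views can never drift apart:

/*
 * Userspace mockup of the per-CPU batched counter pattern used above.
 * Per-CPU deltas accumulate in 'pending' and are flushed into both the
 * local and the total counter at the same time, keeping them in sync.
 */
#include <stdio.h>
#include <stdlib.h>

#define BATCH	32
#define NCPUS	8

static long total;		/* hierarchical counter (batched)  */
static long local;		/* "local" counter                 */
static long pending[NCPUS];	/* per-CPU not-yet-flushed deltas  */

static void mod_state(int cpu, int val)
{
	long x = pending[cpu] + val;

	if (labs(x) > BATCH) {
		/* Flush local and total together so they stay in sync. */
		local += x;
		total += x;
		x = 0;
	}
	pending[cpu] = x;
}

int main(void)
{
	int i;

	/* Simulate scattered single-page updates across CPUs. */
	for (i = 0; i < 1000; i++)
		mod_state(rand() % NCPUS, 1);

	/*
	 * local == total by construction; both lag the true sum only by
	 * the still-pending per-CPU deltas (at most BATCH * NCPUS).
	 */
	printf("local=%ld total=%ld\n", local, total);
	return 0;
}

Compiled with 'cc demo.c && ./a.out', local and total always print the
same value, which is the consistency the patch restores.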