
[v2] mm/memcontrol: keep local VM counters in sync with the hierarchical ones

Message ID 1562851979-10610-1-git-send-email-laoar.shao@gmail.com (mailing list archive)
State New, archived

Commit Message

Yafang Shao July 11, 2019, 1:32 p.m. UTC
After commit 815744d75152 ("mm: memcontrol: don't batch updates of local VM stats and events"),
the local VM counters are no longer in sync with the hierarchical ones.

Below is one example from a leaf memcg on my server (with 8 CPUs):
	inactive_file 3567570944
	total_inactive_file 3568029696
The deviation looks large because the 'val' accumulated in
__mod_memcg_state() is in pages while the value reported by
memcg_stat_show() is in bytes, so any skew is amplified by PAGE_SIZE.
The maximum deviation between the local VM stats and the total VM
stats is therefore (32 * number_of_cpus * PAGE_SIZE), which may be an
unacceptably great value; with 8 CPUs and 4096-byte pages that is up
to 32 * 8 * 4096 = 1048576 bytes.

We should keep the local VM stats in sync with the total stats.
In order to keep this behavior the same across counters, this patch updates
__mod_lruvec_state() and __count_memcg_events() as well.

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Yafang Shao <shaoyafang@didiglobal.com>
---
 mm/memcontrol.c | 22 +++++++++++++++-------
 1 file changed, 15 insertions(+), 7 deletions(-)

Comments

Johannes Weiner July 11, 2019, 3:19 p.m. UTC | #1
On Thu, Jul 11, 2019 at 09:32:59AM -0400, Yafang Shao wrote:
> [...]

Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Andrew Morton July 11, 2019, 11:42 p.m. UTC | #2
On Thu, 11 Jul 2019 09:32:59 -0400 Yafang Shao <laoar.shao@gmail.com> wrote:

> [...]

hm.

So the local counters are presently more accurate than the hierarchical
ones because the hierarchical counters use batching.  And the proposal
is to make the local counters less accurate so that the inaccuracies
will match.

It is a bit counterintuitive to hear that worsened accuracy is a good
thing!  We're told that the difference may be "unacceptably great" but
we aren't told why.  Some additional information to support this
surprising assertion would be useful, please.  What are the use-cases
which are harmed by this difference and how are they harmed?

> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -691,12 +691,15 @@ void __mod_memcg_state(struct mem_cgroup *memcg, int idx, int val)
>  	if (mem_cgroup_disabled())
>  		return;
>  
> -	__this_cpu_add(memcg->vmstats_local->stat[idx], val);
> -
>  	x = val + __this_cpu_read(memcg->vmstats_percpu->stat[idx]);
>  	if (unlikely(abs(x) > MEMCG_CHARGE_BATCH)) {
>  		struct mem_cgroup *mi;
>  
> +		/*
> +		 * Batch local counters to keep them in sync with
> +		 * the hierarchical ones.
> +		 */
> +		__this_cpu_add(memcg->vmstats_local->stat[idx], x);

Given that we are now batching updates to the local counters, I
wonder if it is still necessary to accumulate the counters on a per-cpu
basis.  I.e., can we now do

		atomic_long_add(x, &memcg->vmstats_local->stat[idx]);

and remove the loop in memcg_events_local()?
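
(For reference, the read-side loop in question sums the per-CPU slots
on every read. It has roughly the following shape; this is paraphrased
from mm/memcontrol.c of that kernel, not quoted from the thread:

	static unsigned long memcg_events_local(struct mem_cgroup *memcg,
						int event)
	{
		long x = 0;
		int cpu;

		for_each_possible_cpu(cpu)
			x += per_cpu(memcg->vmstats_local->events[event], cpu);
		return x;
	}

A single atomic_long_t would reduce this to one atomic_long_read(), at
the price of moving the cacheline traffic to the update side, which is
what the reply below addresses.)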
Yafang Shao July 12, 2019, 1:47 a.m. UTC | #3
On Fri, Jul 12, 2019 at 7:42 AM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> On Thu, 11 Jul 2019 09:32:59 -0400 Yafang Shao <laoar.shao@gmail.com> wrote:
>
> > [...]
>
> hm.
>
> So the local counters are presently more accurate than the hierarchical
> ones because the hierarchical counters use batching.  And the proposal
> is to make the local counters less accurate so that the inaccuracies
> will match.
>
> It is a bit counterintuitive to hear that worsened accuracy is a good
> thing!  We're told that the difference may be "unacceptably great" but
> we aren't told why.  Some additional information to support this
> surprising assertion would be useful, please.  What are the use-cases
> which are harmed by this difference and how are they harmed?
>

Hi Andrew,

Both the local counters and the hierarchical ones are exposed to the user.
In a leaf memcg, the local counters should be equal to the hierarchical
ones; if they differ, the user may wonder what is wrong with this memcg.
IOW, the difference makes these counters unreliable, and if they are not
reliable we can't use them to help us analyze issues.
Another option is to make the hierarchical counters as accurate as the
local ones, but that would consume more CPU cycles, because we would
have to account all the descendants' counters; that may not scale, as
some of these counters are updated in critical paths.


> > --- a/mm/memcontrol.c
> > +++ b/mm/memcontrol.c
> > @@ -691,12 +691,15 @@ void __mod_memcg_state(struct mem_cgroup *memcg, int idx, int val)
> >       if (mem_cgroup_disabled())
> >               return;
> >
> > -     __this_cpu_add(memcg->vmstats_local->stat[idx], val);
> > -
> >       x = val + __this_cpu_read(memcg->vmstats_percpu->stat[idx]);
> >       if (unlikely(abs(x) > MEMCG_CHARGE_BATCH)) {
> >               struct mem_cgroup *mi;
> >
> > +             /*
> > +              * Batch local counters to keep them in sync with
> > +              * the hierarchical ones.
> > +              */
> > +             __this_cpu_add(memcg->vmstats_local->stat[idx], x);
>
> Given that we are now batching updates to the local counters, I
> wonder if it is still necessary to accumulate the counters on a per-cpu
> basis.  I.e., can we now do
>
>                 atomic_long_add(x, &memcg->vmstats_local->stat[idx]);
>
> and remove the loop in memcg_events_local()?
>
As explained in commit 815744d75152, that may cause a performance hit
by bouncing additional cachelines, just as the hierarchical statistics
counters already do.

Thanks
Yafang
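
(Yafang's cacheline argument can be reproduced in user space with a toy
micro-benchmark; this is an illustrative sketch, not code from the
thread. Eight threads bump one shared atomic counter, then each bumps
its own cacheline-padded slot, which models the percpu layout:

	#include <pthread.h>
	#include <stdatomic.h>
	#include <stdio.h>
	#include <time.h>

	#define NTHREADS 8
	#define ITERS (1 << 22)

	static atomic_long shared;	/* one cacheline, all threads */
	static struct { long v; char pad[56]; } slot[NTHREADS];

	static void *bounce(void *arg)	/* shared atomic add */
	{
		for (long i = 0; i < ITERS; i++)
			atomic_fetch_add_explicit(&shared, 1,
						  memory_order_relaxed);
		return arg;
	}

	static void *local(void *arg)	/* per-thread add */
	{
		long id = (long)arg;

		for (long i = 0; i < ITERS; i++)
			slot[id].v++;
		return NULL;
	}

	static double run(void *(*fn)(void *))
	{
		pthread_t t[NTHREADS];
		struct timespec a, b;

		clock_gettime(CLOCK_MONOTONIC, &a);
		for (long i = 0; i < NTHREADS; i++)
			pthread_create(&t[i], NULL, fn, (void *)i);
		for (int i = 0; i < NTHREADS; i++)
			pthread_join(t[i], NULL);
		clock_gettime(CLOCK_MONOTONIC, &b);
		return (b.tv_sec - a.tv_sec) +
		       (b.tv_nsec - a.tv_nsec) / 1e9;
	}

	int main(void)
	{
		printf("shared atomic: %.3fs\n", run(bounce));
		printf("per-thread   : %.3fs\n", run(local));
		return 0;
	}

On a multi-core machine the shared-atomic run is typically several
times slower; that slowdown is the bounce that keeping vmstats_local
percpu avoids.)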
Michal Hocko July 12, 2019, 5:29 a.m. UTC | #4
On Fri 12-07-19 09:47:14, Yafang Shao wrote:
> On Fri, Jul 12, 2019 at 7:42 AM Andrew Morton <akpm@linux-foundation.org> wrote:
> > [...]
> 
> Hi Andrew,
> 
> Both the local counters and the hierarchical ones are exposed to the user.
> In a leaf memcg, the local counters should be equal to the hierarchical
> ones; if they differ, the user may wonder what is wrong with this memcg.
> IOW, the difference makes these counters unreliable, and if they are not
> reliable we can't use them to help us analyze issues.

But those numbers are in flight anyway. We do not stop updating them
while they are read, so there is no guarantee they will be consistent
anyway, right?
Yafang Shao July 12, 2019, 6:12 a.m. UTC | #5
On Fri, Jul 12, 2019 at 1:29 PM Michal Hocko <mhocko@kernel.org> wrote:
>
> On Fri 12-07-19 09:47:14, Yafang Shao wrote:
> > [...]
>
> But those numbers are in flight anyway. We do not stop updating them
> while they are read, so there is no guarantee they will be consistent
> anyway, right?

Right.
They can't be guaranteed to be consistent.
When we read them, it may be that only the local counters have been
updated and the hierarchical ones haven't been yet.
But the current deviation is so large that it can't be ignored.
The question is similar to asking: what about increasing
MEMCG_CHARGE_BATCH from 32 to 32 * 4096? (Since the displayed values
are in bytes, a 32-page batch shows up as a 32 * 4096 = 131072-byte
step per CPU.)

Thanks
Yafang
Michal Hocko July 12, 2019, 6:53 a.m. UTC | #6
On Fri 12-07-19 14:12:30, Yafang Shao wrote:
> On Fri, Jul 12, 2019 at 1:29 PM Michal Hocko <mhocko@kernel.org> wrote:
> > [...]
> >
> > But those numbers are in flight anyway. We do not stop updating them
> > while they are read, so there is no guarantee they will be consistent
> > anyway, right?
> 
> Right.
> They can't be guaranteed to be consistent.
> When we read them, it may be that only the local counters have been
> updated and the hierarchical ones haven't been yet.
> But the current deviation is so large that it can't be ignored.

Is 32 pages per CPU really all that large?

Please note that I am not objecting to the patch (yet) because I haven't
had a chance to think about it thoroughly, but I do agree with Andrew
that the changelog should state the exact problem, including why it
matters. I do agree that inconsistencies are confusing, but maybe we
just need to document the existing behavior better.
Yafang Shao July 12, 2019, 7:14 a.m. UTC | #7
On Fri, Jul 12, 2019 at 2:53 PM Michal Hocko <mhocko@kernel.org> wrote:
>
> On Fri 12-07-19 14:12:30, Yafang Shao wrote:
> > [...]
> >
> > Right.
> > They can't be guaranteed to be consistent.
> > When we read them, it may be that only the local counters have been
> > updated and the hierarchical ones haven't been yet.
> > But the current deviation is so large that it can't be ignored.
>
> Is 32 pages per CPU really all that large?
>

As I pointed out in the commit log, the local inactive_file is
3567570944 while the total_inactive_file is 3568029696;
the difference between these two values is 458752.

> Please note that I am not objecting to the patch (yet) because I haven't
> had a chance to think about it thoroughly, but I do agree with Andrew
> that the changelog should state the exact problem, including why it
> matters. I do agree that inconsistencies are confusing, but maybe we
> just need to document the existing behavior better.

I'm not sure whether documenting it is enough.
What about removing all the hierarchical counters if this is a leaf memcg?
That is, neither calculate the hierarchical counters nor display them for
a leaf memcg; I don't know whether that is worth doing.

Thanks
Yafang
Michal Hocko July 12, 2019, 7:54 a.m. UTC | #8
On Fri 12-07-19 15:14:01, Yafang Shao wrote:
> On Fri, Jul 12, 2019 at 2:53 PM Michal Hocko <mhocko@kernel.org> wrote:
> > [...]
> >
> > Is 32 pages per CPU really all that large?
> >
> 
> As I pointed out in the commit log, the local inactive_file is
> 3567570944 while the total_inactive_file is 3568029696;
> the difference between these two values is 458752.

And that is less than 1%, right?

> > Please note that I am not objecting to the patch (yet) because I haven't
> > had a chance to think about it thoroughly, but I do agree with Andrew
> > that the changelog should state the exact problem, including why it
> > matters. I do agree that inconsistencies are confusing, but maybe we
> > just need to document the existing behavior better.
> 
> I'm not sure whether documenting it is enough.

Well, the fact is that these numbers will always be a snapshot of a
single counter at a specific time, so two different counters might
still be inconsistent. Caching just adds on top of this fact. If that
is too much, then we should think about reducing it, especially for
machines with many CPUs.

> What about removing all the hierarchical counters if this is a leaf memcg?

Removing them would make parsing more complex, because then userspace
code would have to special-case leaf and intermediate memcgs. If we
want it to do that, then we could also let it calculate the
hierarchical numbers itself when it needs them.

We could also special-case leaf memcgs and print the normal counters as
the hierarchical values. But I am still not sure this is really
necessary.
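
(To make that last alternative concrete: "let them calculate the
hierarchical numbers" would mean userspace walking the memcg subtree
and summing the local counters itself. A rough sketch of such a tool,
with hypothetical naming, a cgroup-v1 layout assumed, and error
handling mostly omitted:

	#define _XOPEN_SOURCE 500
	#include <ftw.h>
	#include <stdio.h>
	#include <stdlib.h>
	#include <string.h>

	static long long total;
	/* local counter to sum; v1 memory.stat reports it in bytes */
	static const char *key = "inactive_file ";

	static int visit(const char *path, const struct stat *sb,
			 int type, struct FTW *ftwbuf)
	{
		char statpath[4352], line[256];
		FILE *f;

		if (type != FTW_D)	/* cgroup directories only */
			return 0;
		snprintf(statpath, sizeof(statpath), "%s/memory.stat", path);
		f = fopen(statpath, "r");
		if (!f)
			return 0;
		while (fgets(line, sizeof(line), f))
			if (!strncmp(line, key, strlen(key)))
				total += atoll(line + strlen(key));
		fclose(f);
		return 0;
	}

	int main(int argc, char **argv)
	{
		if (argc != 2) {
			fprintf(stderr,
				"usage: %s /sys/fs/cgroup/memory/<group>\n",
				argv[0]);
			return 1;
		}
		if (nftw(argv[1], visit, 16, FTW_PHYS))
			return 1;
		printf("total_%s%lld\n", key, total);
		return 0;
	}

Whether pushing this work onto every stat consumer is better than
printing total_* from the kernel is exactly the trade-off being
discussed above.)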

Patch

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index ba9138a..07b4ca5 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -691,12 +691,15 @@  void __mod_memcg_state(struct mem_cgroup *memcg, int idx, int val)
 	if (mem_cgroup_disabled())
 		return;
 
-	__this_cpu_add(memcg->vmstats_local->stat[idx], val);
-
 	x = val + __this_cpu_read(memcg->vmstats_percpu->stat[idx]);
 	if (unlikely(abs(x) > MEMCG_CHARGE_BATCH)) {
 		struct mem_cgroup *mi;
 
+		/*
+		 * Batch local counters to keep them in sync with
+		 * the hierarchical ones.
+		 */
+		__this_cpu_add(memcg->vmstats_local->stat[idx], x);
 		for (mi = memcg; mi; mi = parent_mem_cgroup(mi))
 			atomic_long_add(x, &mi->vmstats[idx]);
 		x = 0;
@@ -745,13 +748,15 @@  void __mod_lruvec_state(struct lruvec *lruvec, enum node_stat_item idx,
 	/* Update memcg */
 	__mod_memcg_state(memcg, idx, val);
 
-	/* Update lruvec */
-	__this_cpu_add(pn->lruvec_stat_local->count[idx], val);
-
 	x = val + __this_cpu_read(pn->lruvec_stat_cpu->count[idx]);
 	if (unlikely(abs(x) > MEMCG_CHARGE_BATCH)) {
 		struct mem_cgroup_per_node *pi;
 
+		/*
+		 * Batch local counters to keep them in sync with
+		 * the hierarchical ones.
+		 */
+		__this_cpu_add(pn->lruvec_stat_local->count[idx], x);
 		for (pi = pn; pi; pi = parent_nodeinfo(pi, pgdat->node_id))
 			atomic_long_add(x, &pi->lruvec_stat[idx]);
 		x = 0;
@@ -773,12 +778,15 @@  void __count_memcg_events(struct mem_cgroup *memcg, enum vm_event_item idx,
 	if (mem_cgroup_disabled())
 		return;
 
-	__this_cpu_add(memcg->vmstats_local->events[idx], count);
-
 	x = count + __this_cpu_read(memcg->vmstats_percpu->events[idx]);
 	if (unlikely(x > MEMCG_CHARGE_BATCH)) {
 		struct mem_cgroup *mi;
 
+		/*
+		 * Batch local counters to keep them in sync with
+		 * the hierarchical ones.
+		 */
+		__this_cpu_add(memcg->vmstats_local->events[idx], x);
 		for (mi = memcg; mi; mi = parent_mem_cgroup(mi))
 			atomic_long_add(x, &mi->vmevents[idx]);
 		x = 0;
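
(To see why the patch keeps a leaf's two counters in sync, here is a
minimal single-threaded user-space model of the update path above;
the names are hypothetical and plain longs stand in for the percpu and
atomic variables:

	#include <stdio.h>
	#include <stdlib.h>

	#define NR_CPUS 8
	#define BATCH   32	/* stands in for MEMCG_CHARGE_BATCH */

	struct memcg {
		struct memcg *parent;
		long local;		/* memcg-local counter */
		long hier;		/* hierarchical counter */
		long delta[NR_CPUS];	/* unflushed per-CPU remainder */
	};

	/*
	 * Models __mod_memcg_state() after the patch: one flush feeds
	 * the local counter and every ancestor's hierarchical counter,
	 * so in a leaf the two can never disagree.
	 */
	static void mod_state(struct memcg *memcg, int cpu, long val)
	{
		long x = memcg->delta[cpu] + val;

		if (labs(x) > BATCH) {
			struct memcg *mi;

			memcg->local += x;
			for (mi = memcg; mi; mi = mi->parent)
				mi->hier += x;
			x = 0;
		}
		memcg->delta[cpu] = x;
	}

	int main(void)
	{
		struct memcg root = {0};
		struct memcg leaf = { .parent = &root };

		for (int i = 0; i < 100000; i++)
			mod_state(&leaf, i % NR_CPUS, 1);

		printf("leaf local=%ld leaf hier=%ld root hier=%ld\n",
		       leaf.local, leaf.hier, root.hier);
		return 0;
	}

Before the patch the local counter was bumped by val on every call
while the ancestors only saw the batched x, so the two values could
drift by up to BATCH pages per CPU; after the patch both sides receive
the same x at the same moment, and only the per-CPU remainders stay
unflushed.)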