[7/7] mm: memcontrol: consolidate lruvec stat flushing

Message ID: 20210202184746.119084-8-hannes@cmpxchg.org
State: New, archived
From: Johannes Weiner <hannes@cmpxchg.org>
To: Andrew Morton <akpm@linux-foundation.org>, Tejun Heo <tj@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>, Roman Gushchin <guro@fb.com>, linux-mm@kvack.org, cgroups@vger.kernel.org, linux-kernel@vger.kernel.org, kernel-team@fb.com
Subject: [PATCH 7/7] mm: memcontrol: consolidate lruvec stat flushing
Date: Tue, 2 Feb 2021 13:47:46 -0500
Series: mm: memcontrol: switch to rstat (patch 7 of 7)

Johannes Weiner Feb. 2, 2021, 6:47 p.m. UTC

There are two functions to flush the per-cpu data of an lruvec into
the rest of the cgroup tree: when the cgroup is being freed, and when
a CPU disappears during hotplug. The difference is whether all CPUs or
just one is being collected, but the rest of the flushing code is the
same. Merge them into one function and share the common code.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
 mm/memcontrol.c | 88 +++++++++++++++++++++++--------------------------
 1 file changed, 42 insertions(+), 46 deletions(-)

Roman Gushchin Feb. 3, 2021, 2:25 a.m. UTC | #1

On Tue, Feb 02, 2021 at 01:47:46PM -0500, Johannes Weiner wrote:
> There are two functions to flush the per-cpu data of an lruvec into
> the rest of the cgroup tree: when the cgroup is being freed, and when
> a CPU disappears during hotplug. The difference is whether all CPUs or
> just one is being collected, but the rest of the flushing code is the
> same. Merge them into one function and share the common code.
> 
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> ---
>  mm/memcontrol.c | 88 +++++++++++++++++++++++--------------------------
>  1 file changed, 42 insertions(+), 46 deletions(-)
> 
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index b205b2413186..88e8afc49a46 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -2410,39 +2410,56 @@ static void drain_all_stock(struct mem_cgroup *root_memcg)
>  	mutex_unlock(&percpu_charge_mutex);
>  }
>  
> -static int memcg_hotplug_cpu_dead(unsigned int cpu)
> +static void memcg_flush_lruvec_page_state(struct mem_cgroup *memcg, int cpu)
>  {
> -	struct memcg_stock_pcp *stock;
> -	struct mem_cgroup *memcg;
> -
> -	stock = &per_cpu(memcg_stock, cpu);
> -	drain_stock(stock);
> +	int nid;
>  
> -	for_each_mem_cgroup(memcg) {
> +	for_each_node(nid) {
> +		struct mem_cgroup_per_node *pn = memcg->nodeinfo[nid];
> +		unsigned long stat[NR_VM_NODE_STAT_ITEMS] = { 0, };
  			      				      ^^^^
							   Same here.

> +		struct batched_lruvec_stat *lstatc;
>  		int i;
>  
> -		for (i = 0; i < NR_VM_NODE_STAT_ITEMS; i++) {
> -			int nid;
> -
> -			for_each_node(nid) {
> -				struct batched_lruvec_stat *lstatc;
> -				struct mem_cgroup_per_node *pn;
> -				long x;
> -
> -				pn = memcg->nodeinfo[nid];
> +		if (cpu == -1) {
> +			int cpui;
> +			/*
> +			 * The memcg is about to be freed, collect all
> +			 * CPUs, no need to zero anything out.
> +			 */
> +			for_each_online_cpu(cpui) {
> +				lstatc = per_cpu_ptr(pn->lruvec_stat_cpu, cpui);
> +				for (i = 0; i < NR_VM_NODE_STAT_ITEMS; i++)
> +					stat[i] += lstatc->count[i];
> +			}
> +		} else {
> +			/*
> +			 * The CPU has gone away, collect and zero out
> +			 * its stats, it may come back later.
> +			 */
> +			for (i = 0; i < NR_VM_NODE_STAT_ITEMS; i++) {
>  				lstatc = per_cpu_ptr(pn->lruvec_stat_cpu, cpu);
> -
> -				x = lstatc->count[i];
> +				stat[i] = lstatc->count[i];
>  				lstatc->count[i] = 0;
> -
> -				if (x) {
> -					do {
> -						atomic_long_add(x, &pn->lruvec_stat[i]);
> -					} while ((pn = parent_nodeinfo(pn, nid)));
> -				}
>  			}
>  		}
> +
> +		do {
> +			for (i = 0; i < NR_VM_NODE_STAT_ITEMS; i++)
> +				atomic_long_add(stat[i], &pn->lruvec_stat[i]);
> +		} while ((pn = parent_nodeinfo(pn, nid)));
>  	}
> +}
> +
> +static int memcg_hotplug_cpu_dead(unsigned int cpu)
> +{
> +	struct memcg_stock_pcp *stock;
> +	struct mem_cgroup *memcg;
> +
> +	stock = &per_cpu(memcg_stock, cpu);
> +	drain_stock(stock);
> +
> +	for_each_mem_cgroup(memcg)
> +		memcg_flush_lruvec_page_state(memcg, cpu);
>  
>  	return 0;
>  }
> @@ -3636,27 +3653,6 @@ static u64 mem_cgroup_read_u64(struct cgroup_subsys_state *css,
>  	}
>  }
>  
> -static void memcg_flush_lruvec_page_state(struct mem_cgroup *memcg)
> -{
> -	int node;
> -
> -	for_each_node(node) {
> -		struct mem_cgroup_per_node *pn = memcg->nodeinfo[node];
> -		unsigned long stat[NR_VM_NODE_STAT_ITEMS] = {0, };
> -		struct mem_cgroup_per_node *pi;
> -		int cpu, i;
> -
> -		for_each_online_cpu(cpu)
> -			for (i = 0; i < NR_VM_NODE_STAT_ITEMS; i++)
> -				stat[i] += per_cpu(
> -					pn->lruvec_stat_cpu->count[i], cpu);
> -
> -		for (pi = pn; pi; pi = parent_nodeinfo(pi, node))
> -			for (i = 0; i < NR_VM_NODE_STAT_ITEMS; i++)
> -				atomic_long_add(stat[i], &pi->lruvec_stat[i]);
> -	}
> -}
> -
>  #ifdef CONFIG_MEMCG_KMEM
>  static int memcg_online_kmem(struct mem_cgroup *memcg)
>  {
> @@ -5197,7 +5193,7 @@ static void mem_cgroup_free(struct mem_cgroup *memcg)
>  	 * Flush percpu lruvec stats to guarantee the value
>  	 * correctness on parent's and all ancestor levels.
>  	 */
> -	memcg_flush_lruvec_page_state(memcg);
> +	memcg_flush_lruvec_page_state(memcg, -1);

I wonder if adding "cpu" or "percpu" to the function name would make it clearer what -1 means?
E.g. memcg_flush_(per)cpu_lruvec_stats(memcg, -1).

Reviewed-by: Roman Gushchin <guro@fb.com>
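
For readers following the diff above, here is a minimal userspace sketch of the control flow it introduces, not the kernel code itself. It assumes a fixed CPU count and plain `long` counters; `struct node_stats`, `flush_node_stats()`, `NR_CPUS` and `NR_ITEMS` are hypothetical stand-ins for `struct mem_cgroup_per_node`, the merged flush function, and `NR_VM_NODE_STAT_ITEMS`. The kernel version uses per-cpu pointers and `atomic_long_add()` instead.

```c
#define NR_CPUS  4	/* hypothetical CPU count for this model */
#define NR_ITEMS 3	/* stands in for NR_VM_NODE_STAT_ITEMS */

/* One node's stats for one cgroup; parent is NULL at the root. */
struct node_stats {
	struct node_stats *parent;
	long percpu[NR_CPUS][NR_ITEMS];	/* models pn->lruvec_stat_cpu */
	long total[NR_ITEMS];		/* models pn->lruvec_stat */
};

static void flush_node_stats(struct node_stats *pn, int cpu)
{
	long stat[NR_ITEMS] = { 0 };
	int i, c;

	if (cpu == -1) {
		/* Cgroup is being freed: sum all CPUs, leave counters as-is. */
		for (c = 0; c < NR_CPUS; c++)
			for (i = 0; i < NR_ITEMS; i++)
				stat[i] += pn->percpu[c][i];
	} else {
		/* One CPU went away: take its counts and zero them. */
		for (i = 0; i < NR_ITEMS; i++) {
			stat[i] = pn->percpu[cpu][i];
			pn->percpu[cpu][i] = 0;
		}
	}

	/* Shared tail: add the collected counts at this level and every ancestor. */
	do {
		for (i = 0; i < NR_ITEMS; i++)
			pn->total[i] += stat[i];
	} while ((pn = pn->parent));
}
```

In this model, `flush_node_stats(&child, 1)` corresponds to the hotplug path and `flush_node_stats(&child, -1)` to the free path; the `-1` carries no meaning at the call site beyond what the function body gives it, which is the readability concern raised here.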

Johannes Weiner Feb. 4, 2021, 9:44 p.m. UTC | #2

On Tue, Feb 02, 2021 at 06:25:30PM -0800, Roman Gushchin wrote:
> On Tue, Feb 02, 2021 at 01:47:46PM -0500, Johannes Weiner wrote:
> > -	memcg_flush_lruvec_page_state(memcg);
> > +	memcg_flush_lruvec_page_state(memcg, -1);
> 
> I wonder if adding "cpu" or "percpu" to the function name would make it clearer what -1 means?
> E.g. memcg_flush_(per)cpu_lruvec_stats(memcg, -1).

Yes, it's a bit ominous. I changed it to

	memcg_flush_lruvec_page_state_cpu(memcg, -1);

percpu would have pushed the function signature over 80 characters.

> Reviewed-by: Roman Gushchin <guro@fb.com>

Thanks

Roman Gushchin Feb. 4, 2021, 9:47 p.m. UTC | #3

On Thu, Feb 04, 2021 at 04:44:27PM -0500, Johannes Weiner wrote:
> On Tue, Feb 02, 2021 at 06:25:30PM -0800, Roman Gushchin wrote:
> > On Tue, Feb 02, 2021 at 01:47:46PM -0500, Johannes Weiner wrote:
> > > -	memcg_flush_lruvec_page_state(memcg);
> > > +	memcg_flush_lruvec_page_state(memcg, -1);
> > 
> > I wonder if adding "cpu" or "percpu" to the function name would make it clearer what -1 means?
> > E.g. memcg_flush_(per)cpu_lruvec_stats(memcg, -1).
> 
> Yes, it's a bit ominous. I changed it to
> 
> 	memcg_flush_lruvec_page_state_cpu(memcg, -1);

Works for me!
But honestly, I don't understand what "page_state" means in this context.

Thanks!

Michal Hocko Feb. 5, 2021, 3:17 p.m. UTC | #4

On Tue 02-02-21 13:47:46, Johannes Weiner wrote:
> There are two functions to flush the per-cpu data of an lruvec into
> the rest of the cgroup tree: when the cgroup is being freed, and when
> a CPU disappears during hotplug. The difference is whether all CPUs or
> just one is being collected, but the rest of the flushing code is the
> same. Merge them into one function and share the common code.

IIUC the only reason for the cpu == -1 special case is to avoid
zeroing, right? Is this optimization worth the special case? The code
would be slightly easier to follow without it.

> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>

Anyway the above is not really a fundamental objection. It is more important
to unify the flushing.

Acked-by: Michal Hocko <mhocko@suse.com>

Johannes Weiner Feb. 5, 2021, 5:10 p.m. UTC | #5

On Fri, Feb 05, 2021 at 04:17:27PM +0100, Michal Hocko wrote:
> On Tue 02-02-21 13:47:46, Johannes Weiner wrote:
> > There are two functions to flush the per-cpu data of an lruvec into
> > the rest of the cgroup tree: when the cgroup is being freed, and when
> > a CPU disappears during hotplug. The difference is whether all CPUs or
> > just one is being collected, but the rest of the flushing code is the
> > same. Merge them into one function and share the common code.
> 
> IIUC the only reason for the cpu == -1 special case is to avoid
> zeroing, right? Is this optimization worth the special case? The code
> would be slightly easier to follow without it.

Hm, it was less about the optimization and more about which CPU(s)
need(s) to be handled. But the way it's written is pretty silly,
indeed. I'll move the for_each_online_cpu() to the caller and drop the
cpu == -1 special casing; that makes things much simpler and more obvious.

> > Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> 
> Anyway the above is not really a fundamental objection. It is more important
> to unify the flushing.
> 
> Acked-by: Michal Hocko <mhocko@suse.com>

Thanks. v2 is different, so I'll hold off on taking the ack.
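
The actual v2 is not part of this thread, so the following is only a guess at the shape described above: a helper that always handles exactly one CPU (collect and zero), with the loop over CPUs moved into the caller on the free path. It is a userspace model under the same assumptions as any such sketch: `struct node_stats`, `flush_node_stats_cpu()`, `flush_node_stats_all()`, `NR_CPUS` and `NR_ITEMS` are hypothetical names, and plain `long` additions stand in for `atomic_long_add()`.

```c
#define NR_CPUS  4	/* hypothetical CPU count for this model */
#define NR_ITEMS 3	/* stands in for NR_VM_NODE_STAT_ITEMS */

/* One node's stats for one cgroup; parent is NULL at the root. */
struct node_stats {
	struct node_stats *parent;
	long percpu[NR_CPUS][NR_ITEMS];	/* models pn->lruvec_stat_cpu */
	long total[NR_ITEMS];		/* models pn->lruvec_stat */
};

/* Fold one CPU's counters into this level and all ancestors, then zero them. */
static void flush_node_stats_cpu(struct node_stats *pn, int cpu)
{
	long stat[NR_ITEMS];
	int i;

	for (i = 0; i < NR_ITEMS; i++) {
		stat[i] = pn->percpu[cpu][i];
		pn->percpu[cpu][i] = 0;
	}

	do {
		for (i = 0; i < NR_ITEMS; i++)
			pn->total[i] += stat[i];
	} while ((pn = pn->parent));
}

/* Free path: the caller, not the helper, decides which CPUs to visit. */
static void flush_node_stats_all(struct node_stats *pn)
{
	int cpu;

	for (cpu = 0; cpu < NR_CPUS; cpu++)
		flush_node_stats_cpu(pn, cpu);
}
```

With this split there is no sentinel value: the hotplug path would call the per-CPU helper once, and the free path would loop. Zeroing counters that are about to be freed costs a few extra stores but leaves the totals unchanged.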
