diff mbox series

mm: memcontrol: switch to rcu protection in drain_all_stock()

Message ID 20190801233513.137917-1-guro@fb.com (mailing list archive)
State New, archived
Headers show
Series mm: memcontrol: switch to rcu protection in drain_all_stock() | expand

Commit Message

Roman Gushchin Aug. 1, 2019, 11:35 p.m. UTC
Commit 72f0184c8a00 ("mm, memcg: remove hotplug locking from try_charge")
introduced css_tryget()/css_put() calls in drain_all_stock(),
which are supposed to protect the target memory cgroup from being
released during the mem_cgroup_is_descendant() call.

However, it's not completely safe. In theory, memcg can go away
between reading stock->cached pointer and calling css_tryget().

So, let's read the stock->cached pointer and evaluate the memory
cgroup inside a rcu read section, and get rid of
css_tryget()/css_put() calls.

Signed-off-by: Roman Gushchin <guro@fb.com>
Cc: Michal Hocko <mhocko@suse.com>
---
 mm/memcontrol.c | 17 +++++++++--------
 1 file changed, 9 insertions(+), 8 deletions(-)

Comments

Michal Hocko Aug. 2, 2019, 8:04 a.m. UTC | #1
On Thu 01-08-19 16:35:13, Roman Gushchin wrote:
> Commit 72f0184c8a00 ("mm, memcg: remove hotplug locking from try_charge")
> introduced css_tryget()/css_put() calls in drain_all_stock(),
> which are supposed to protect the target memory cgroup from being
> released during the mem_cgroup_is_descendant() call.
> 
> However, it's not completely safe. In theory, memcg can go away
> between reading stock->cached pointer and calling css_tryget().

I have to remember how is this whole thing supposed to work, it's been
some time since I've looked into that.

> So, let's read the stock->cached pointer and evaluate the memory
> cgroup inside a rcu read section, and get rid of
> css_tryget()/css_put() calls.

Could you be more specific why does RCU help here?

> Signed-off-by: Roman Gushchin <guro@fb.com>
> Cc: Michal Hocko <mhocko@suse.com>
> ---
>  mm/memcontrol.c | 17 +++++++++--------
>  1 file changed, 9 insertions(+), 8 deletions(-)
> 
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 5c7b9facb0eb..d856b64426b7 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -2235,21 +2235,22 @@ static void drain_all_stock(struct mem_cgroup *root_memcg)
>  	for_each_online_cpu(cpu) {
>  		struct memcg_stock_pcp *stock = &per_cpu(memcg_stock, cpu);
>  		struct mem_cgroup *memcg;
> +		bool flush = false;
>  
> +		rcu_read_lock();
>  		memcg = stock->cached;
> -		if (!memcg || !stock->nr_pages || !css_tryget(&memcg->css))
> -			continue;
> -		if (!mem_cgroup_is_descendant(memcg, root_memcg)) {
> -			css_put(&memcg->css);
> -			continue;
> -		}
> -		if (!test_and_set_bit(FLUSHING_CACHED_CHARGE, &stock->flags)) {
> +		if (memcg && stock->nr_pages &&
> +		    mem_cgroup_is_descendant(memcg, root_memcg))
> +			flush = true;
> +		rcu_read_unlock();
> +
> +		if (flush &&
> +		    !test_and_set_bit(FLUSHING_CACHED_CHARGE, &stock->flags)) {
>  			if (cpu == curcpu)
>  				drain_local_stock(&stock->work);
>  			else
>  				schedule_work_on(cpu, &stock->work);
>  		}
> -		css_put(&memcg->css);
>  	}
>  	put_cpu();
>  	mutex_unlock(&percpu_charge_mutex);
> -- 
> 2.21.0
Michal Hocko Aug. 2, 2019, 8:59 a.m. UTC | #2
On Fri 02-08-19 10:04:22, Michal Hocko wrote:
> On Thu 01-08-19 16:35:13, Roman Gushchin wrote:
> > Commit 72f0184c8a00 ("mm, memcg: remove hotplug locking from try_charge")
> > introduced css_tryget()/css_put() calls in drain_all_stock(),
> > which are supposed to protect the target memory cgroup from being
> > released during the mem_cgroup_is_descendant() call.
> > 
> > However, it's not completely safe. In theory, memcg can go away
> > between reading stock->cached pointer and calling css_tryget().
> 
> I have to remember how is this whole thing supposed to work, it's been
> some time since I've looked into that.

OK, I guess I remember now and I do not see how the race is possible.
Stock cache is keeping its memcg alive because it elevates the reference
counting for each cached charge. And that should keep the whole chain up
to the root (of draining) alive, no? Or do I miss something, could you
generate a sequence of events that would lead to use-after-free?
Roman Gushchin Aug. 2, 2019, 4:55 p.m. UTC | #3
On Fri, Aug 02, 2019 at 11:33:33AM +0800, Hillf Danton wrote:
> 
> On Thu, 1 Aug 2019 16:35:13 -0700 Roman Gushchin wrote:
> > 
> > Commit 72f0184c8a00 ("mm, memcg: remove hotplug locking from try_charge")
> > introduced css_tryget()/css_put() calls in drain_all_stock(),
> > which are supposed to protect the target memory cgroup from being
> > released during the mem_cgroup_is_descendant() call.
> > 
> > However, it's not completely safe. In theory, memcg can go away
> > between reading stock->cached pointer and calling css_tryget().
> 
> Good catch!
> > 
> > So, let's read the stock->cached pointer and evaluate the memory
> > cgroup inside a rcu read section, and get rid of
> > css_tryget()/css_put() calls.
> 
> You need to either adjust the boundry of the rcu-protected section, or
> retain the call pairs, as the memcg cache is dereferenced again in
> drain_stock().

Not really. drain_stock() is always accessing the local percpu stock, and
stock->cached memcg pointer is protected by references of stocked pages.
Pages are stocked and drained only locally, so they can't go away.
So if (stock->nr_pages > 0), the memcg has at least stock->nr_pages references.

Also, because stocks on other cpus are drained via scheduled work,
neither rcu_read_lock(), not css_tryget()/css_put() protects it.

That's exactly the reason why I think this code is worth changing: it
looks confusing. It looks like css_tryget()/css_put() protect stock
draining, however it's not true.

Thanks!

> > 
> > Signed-off-by: Roman Gushchin <guro@fb.com>
> > Cc: Michal Hocko <mhocko@suse.com>
> > ---
> >  mm/memcontrol.c | 17 +++++++++--------
> >  1 file changed, 9 insertions(+), 8 deletions(-)
> > 
> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > index 5c7b9facb0eb..d856b64426b7 100644
> > --- a/mm/memcontrol.c
> > +++ b/mm/memcontrol.c
> > @@ -2235,21 +2235,22 @@ static void drain_all_stock(struct mem_cgroup *root_memcg)
> >  	for_each_online_cpu(cpu) {
> >  		struct memcg_stock_pcp *stock = &per_cpu(memcg_stock, cpu);
> >  		struct mem_cgroup *memcg;
> > +		bool flush = false;
> >  
> > +		rcu_read_lock();
> >  		memcg = stock->cached;
> > -		if (!memcg || !stock->nr_pages || !css_tryget(&memcg->css))
> > -			continue;
> > -		if (!mem_cgroup_is_descendant(memcg, root_memcg)) {
> > -			css_put(&memcg->css);
> > -			continue;
> > -		}
> > -		if (!test_and_set_bit(FLUSHING_CACHED_CHARGE, &stock->flags)) {
> > +		if (memcg && stock->nr_pages &&
> > +		    mem_cgroup_is_descendant(memcg, root_memcg))
> > +			flush = true;
> > +		rcu_read_unlock();
> > +
> > +		if (flush &&
> > +		    !test_and_set_bit(FLUSHING_CACHED_CHARGE, &stock->flags)) {
> >  			if (cpu == curcpu)
> >  				drain_local_stock(&stock->work);
> >  			else
> >  				schedule_work_on(cpu, &stock->work);
> >  		}
> > -		css_put(&memcg->css);
> >  	}
> >  	put_cpu();
> >  	mutex_unlock(&percpu_charge_mutex);
> > -- 
> > 2.21.0
> > 
> 
>
Roman Gushchin Aug. 2, 2019, 5 p.m. UTC | #4
On Fri, Aug 02, 2019 at 10:59:47AM +0200, Michal Hocko wrote:
> On Fri 02-08-19 10:04:22, Michal Hocko wrote:
> > On Thu 01-08-19 16:35:13, Roman Gushchin wrote:
> > > Commit 72f0184c8a00 ("mm, memcg: remove hotplug locking from try_charge")
> > > introduced css_tryget()/css_put() calls in drain_all_stock(),
> > > which are supposed to protect the target memory cgroup from being
> > > released during the mem_cgroup_is_descendant() call.
> > > 
> > > However, it's not completely safe. In theory, memcg can go away
> > > between reading stock->cached pointer and calling css_tryget().
> > 
> > I have to remember how is this whole thing supposed to work, it's been
> > some time since I've looked into that.
> 
> OK, I guess I remember now and I do not see how the race is possible.
> Stock cache is keeping its memcg alive because it elevates the reference
> counting for each cached charge. And that should keep the whole chain up
> to the root (of draining) alive, no? Or do I miss something, could you
> generate a sequence of events that would lead to use-after-free?

Right, but it's true when you reading a local percpu stock.
But here we read a remote stock->cached pointer, which can be cleared
by a remote concurrent drain_local_stock() execution.

In theory, it could be the last reference, and the memcg can be destroyed
remotely, so we end up trying to call css_tryget() over freed memory.

The race is theoretical, but as I wrote in the thread, I think
that it's still worth fixing, because the current code looks confusing
(and this confirms my feelings).

Thanks!
Michal Hocko Aug. 2, 2019, 5:14 p.m. UTC | #5
On Fri 02-08-19 17:00:34, Roman Gushchin wrote:
> On Fri, Aug 02, 2019 at 10:59:47AM +0200, Michal Hocko wrote:
> > On Fri 02-08-19 10:04:22, Michal Hocko wrote:
> > > On Thu 01-08-19 16:35:13, Roman Gushchin wrote:
> > > > Commit 72f0184c8a00 ("mm, memcg: remove hotplug locking from try_charge")
> > > > introduced css_tryget()/css_put() calls in drain_all_stock(),
> > > > which are supposed to protect the target memory cgroup from being
> > > > released during the mem_cgroup_is_descendant() call.
> > > > 
> > > > However, it's not completely safe. In theory, memcg can go away
> > > > between reading stock->cached pointer and calling css_tryget().
> > > 
> > > I have to remember how is this whole thing supposed to work, it's been
> > > some time since I've looked into that.
> > 
> > OK, I guess I remember now and I do not see how the race is possible.
> > Stock cache is keeping its memcg alive because it elevates the reference
> > counting for each cached charge. And that should keep the whole chain up
> > to the root (of draining) alive, no? Or do I miss something, could you
> > generate a sequence of events that would lead to use-after-free?
> 
> Right, but it's true when you reading a local percpu stock.
> But here we read a remote stock->cached pointer, which can be cleared
> by a remote concurrent drain_local_stock() execution.

OK, I can see how refill_stock can race with drain_all_stock. I am not
sure I see drain_local_stock race because that should be triggered only
from drain_all_stock and only one cpu is allowed to do that. Maybe we
might have scheduled a work from the previous run?

In any case, please document the race in the changelog please. This code
is indeed tricky and a comment would help as well.
Roman Gushchin Aug. 2, 2019, 5:36 p.m. UTC | #6
On Fri, Aug 02, 2019 at 07:14:51PM +0200, Michal Hocko wrote:
> On Fri 02-08-19 17:00:34, Roman Gushchin wrote:
> > On Fri, Aug 02, 2019 at 10:59:47AM +0200, Michal Hocko wrote:
> > > On Fri 02-08-19 10:04:22, Michal Hocko wrote:
> > > > On Thu 01-08-19 16:35:13, Roman Gushchin wrote:
> > > > > Commit 72f0184c8a00 ("mm, memcg: remove hotplug locking from try_charge")
> > > > > introduced css_tryget()/css_put() calls in drain_all_stock(),
> > > > > which are supposed to protect the target memory cgroup from being
> > > > > released during the mem_cgroup_is_descendant() call.
> > > > > 
> > > > > However, it's not completely safe. In theory, memcg can go away
> > > > > between reading stock->cached pointer and calling css_tryget().
> > > > 
> > > > I have to remember how is this whole thing supposed to work, it's been
> > > > some time since I've looked into that.
> > > 
> > > OK, I guess I remember now and I do not see how the race is possible.
> > > Stock cache is keeping its memcg alive because it elevates the reference
> > > counting for each cached charge. And that should keep the whole chain up
> > > to the root (of draining) alive, no? Or do I miss something, could you
> > > generate a sequence of events that would lead to use-after-free?
> > 
> > Right, but it's true when you reading a local percpu stock.
> > But here we read a remote stock->cached pointer, which can be cleared
> > by a remote concurrent drain_local_stock() execution.
> 
> OK, I can see how refill_stock can race with drain_all_stock. I am not
> sure I see drain_local_stock race because that should be triggered only
> from drain_all_stock and only one cpu is allowed to do that. Maybe we
> might have scheduled a work from the previous run?

Exactly. Previously executed drain_all_stock() -> schedule_work ->
drain_local_stock() on a remote cpu races with checking memcg pointer
from drain_all_stock.

> 
> In any case, please document the race in the changelog please. This code
> is indeed tricky and a comment would help as well.

Sure, will send out v2 soon.

Thanks!
diff mbox series

Patch

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 5c7b9facb0eb..d856b64426b7 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2235,21 +2235,22 @@  static void drain_all_stock(struct mem_cgroup *root_memcg)
 	for_each_online_cpu(cpu) {
 		struct memcg_stock_pcp *stock = &per_cpu(memcg_stock, cpu);
 		struct mem_cgroup *memcg;
+		bool flush = false;
 
+		rcu_read_lock();
 		memcg = stock->cached;
-		if (!memcg || !stock->nr_pages || !css_tryget(&memcg->css))
-			continue;
-		if (!mem_cgroup_is_descendant(memcg, root_memcg)) {
-			css_put(&memcg->css);
-			continue;
-		}
-		if (!test_and_set_bit(FLUSHING_CACHED_CHARGE, &stock->flags)) {
+		if (memcg && stock->nr_pages &&
+		    mem_cgroup_is_descendant(memcg, root_memcg))
+			flush = true;
+		rcu_read_unlock();
+
+		if (flush &&
+		    !test_and_set_bit(FLUSHING_CACHED_CHARGE, &stock->flags)) {
 			if (cpu == curcpu)
 				drain_local_stock(&stock->work);
 			else
 				schedule_work_on(cpu, &stock->work);
 		}
-		css_put(&memcg->css);
 	}
 	put_cpu();
 	mutex_unlock(&percpu_charge_mutex);