Message ID | 20190801233513.137917-1-guro@fb.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | mm: memcontrol: switch to rcu protection in drain_all_stock() | expand |
On Thu 01-08-19 16:35:13, Roman Gushchin wrote: > Commit 72f0184c8a00 ("mm, memcg: remove hotplug locking from try_charge") > introduced css_tryget()/css_put() calls in drain_all_stock(), > which are supposed to protect the target memory cgroup from being > released during the mem_cgroup_is_descendant() call. > > However, it's not completely safe. In theory, memcg can go away > between reading stock->cached pointer and calling css_tryget(). I have to remember how is this whole thing supposed to work, it's been some time since I've looked into that. > So, let's read the stock->cached pointer and evaluate the memory > cgroup inside a rcu read section, and get rid of > css_tryget()/css_put() calls. Could you be more specific why does RCU help here? > Signed-off-by: Roman Gushchin <guro@fb.com> > Cc: Michal Hocko <mhocko@suse.com> > --- > mm/memcontrol.c | 17 +++++++++-------- > 1 file changed, 9 insertions(+), 8 deletions(-) > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c > index 5c7b9facb0eb..d856b64426b7 100644 > --- a/mm/memcontrol.c > +++ b/mm/memcontrol.c > @@ -2235,21 +2235,22 @@ static void drain_all_stock(struct mem_cgroup *root_memcg) > for_each_online_cpu(cpu) { > struct memcg_stock_pcp *stock = &per_cpu(memcg_stock, cpu); > struct mem_cgroup *memcg; > + bool flush = false; > > + rcu_read_lock(); > memcg = stock->cached; > - if (!memcg || !stock->nr_pages || !css_tryget(&memcg->css)) > - continue; > - if (!mem_cgroup_is_descendant(memcg, root_memcg)) { > - css_put(&memcg->css); > - continue; > - } > - if (!test_and_set_bit(FLUSHING_CACHED_CHARGE, &stock->flags)) { > + if (memcg && stock->nr_pages && > + mem_cgroup_is_descendant(memcg, root_memcg)) > + flush = true; > + rcu_read_unlock(); > + > + if (flush && > + !test_and_set_bit(FLUSHING_CACHED_CHARGE, &stock->flags)) { > if (cpu == curcpu) > drain_local_stock(&stock->work); > else > schedule_work_on(cpu, &stock->work); > } > - css_put(&memcg->css); > } > put_cpu(); > mutex_unlock(&percpu_charge_mutex); > -- > 2.21.0
On Fri 02-08-19 10:04:22, Michal Hocko wrote: > On Thu 01-08-19 16:35:13, Roman Gushchin wrote: > > Commit 72f0184c8a00 ("mm, memcg: remove hotplug locking from try_charge") > > introduced css_tryget()/css_put() calls in drain_all_stock(), > > which are supposed to protect the target memory cgroup from being > > released during the mem_cgroup_is_descendant() call. > > > > However, it's not completely safe. In theory, memcg can go away > > between reading stock->cached pointer and calling css_tryget(). > > I have to remember how is this whole thing supposed to work, it's been > some time since I've looked into that. OK, I guess I remember now and I do not see how the race is possible. Stock cache is keeping its memcg alive because it elevates the reference counting for each cached charge. And that should keep the whole chain up to the root (of draining) alive, no? Or do I miss something, could you generate a sequence of events that would lead to use-after-free?
On Fri, Aug 02, 2019 at 11:33:33AM +0800, Hillf Danton wrote: > > On Thu, 1 Aug 2019 16:35:13 -0700 Roman Gushchin wrote: > > > > Commit 72f0184c8a00 ("mm, memcg: remove hotplug locking from try_charge") > > introduced css_tryget()/css_put() calls in drain_all_stock(), > > which are supposed to protect the target memory cgroup from being > > released during the mem_cgroup_is_descendant() call. > > > > However, it's not completely safe. In theory, memcg can go away > > between reading stock->cached pointer and calling css_tryget(). > > Good catch! > > > > So, let's read the stock->cached pointer and evaluate the memory > > cgroup inside a rcu read section, and get rid of > > css_tryget()/css_put() calls. > > You need to either adjust the boundry of the rcu-protected section, or > retain the call pairs, as the memcg cache is dereferenced again in > drain_stock(). Not really. drain_stock() is always accessing the local percpu stock, and stock->cached memcg pointer is protected by references of stocked pages. Pages are stocked and drained only locally, so they can't go away. So if (stock->nr_pages > 0), the memcg has at least stock->nr_pages references. Also, because stocks on other cpus are drained via scheduled work, neither rcu_read_lock(), not css_tryget()/css_put() protects it. That's exactly the reason why I think this code is worth changing: it looks confusing. It looks like css_tryget()/css_put() protect stock draining, however it's not true. Thanks! > > > > Signed-off-by: Roman Gushchin <guro@fb.com> > > Cc: Michal Hocko <mhocko@suse.com> > > --- > > mm/memcontrol.c | 17 +++++++++-------- > > 1 file changed, 9 insertions(+), 8 deletions(-) > > > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c > > index 5c7b9facb0eb..d856b64426b7 100644 > > --- a/mm/memcontrol.c > > +++ b/mm/memcontrol.c > > @@ -2235,21 +2235,22 @@ static void drain_all_stock(struct mem_cgroup *root_memcg) > > for_each_online_cpu(cpu) { > > struct memcg_stock_pcp *stock = &per_cpu(memcg_stock, cpu); > > struct mem_cgroup *memcg; > > + bool flush = false; > > > > + rcu_read_lock(); > > memcg = stock->cached; > > - if (!memcg || !stock->nr_pages || !css_tryget(&memcg->css)) > > - continue; > > - if (!mem_cgroup_is_descendant(memcg, root_memcg)) { > > - css_put(&memcg->css); > > - continue; > > - } > > - if (!test_and_set_bit(FLUSHING_CACHED_CHARGE, &stock->flags)) { > > + if (memcg && stock->nr_pages && > > + mem_cgroup_is_descendant(memcg, root_memcg)) > > + flush = true; > > + rcu_read_unlock(); > > + > > + if (flush && > > + !test_and_set_bit(FLUSHING_CACHED_CHARGE, &stock->flags)) { > > if (cpu == curcpu) > > drain_local_stock(&stock->work); > > else > > schedule_work_on(cpu, &stock->work); > > } > > - css_put(&memcg->css); > > } > > put_cpu(); > > mutex_unlock(&percpu_charge_mutex); > > -- > > 2.21.0 > > > >
On Fri, Aug 02, 2019 at 10:59:47AM +0200, Michal Hocko wrote: > On Fri 02-08-19 10:04:22, Michal Hocko wrote: > > On Thu 01-08-19 16:35:13, Roman Gushchin wrote: > > > Commit 72f0184c8a00 ("mm, memcg: remove hotplug locking from try_charge") > > > introduced css_tryget()/css_put() calls in drain_all_stock(), > > > which are supposed to protect the target memory cgroup from being > > > released during the mem_cgroup_is_descendant() call. > > > > > > However, it's not completely safe. In theory, memcg can go away > > > between reading stock->cached pointer and calling css_tryget(). > > > > I have to remember how is this whole thing supposed to work, it's been > > some time since I've looked into that. > > OK, I guess I remember now and I do not see how the race is possible. > Stock cache is keeping its memcg alive because it elevates the reference > counting for each cached charge. And that should keep the whole chain up > to the root (of draining) alive, no? Or do I miss something, could you > generate a sequence of events that would lead to use-after-free? Right, but it's true when you reading a local percpu stock. But here we read a remote stock->cached pointer, which can be cleared by a remote concurrent drain_local_stock() execution. In theory, it could be the last reference, and the memcg can be destroyed remotely, so we end up trying to call css_tryget() over freed memory. The race is theoretical, but as I wrote in the thread, I think that it's still worth fixing, because the current code looks confusing (and this confirms my feelings). Thanks!
On Fri 02-08-19 17:00:34, Roman Gushchin wrote: > On Fri, Aug 02, 2019 at 10:59:47AM +0200, Michal Hocko wrote: > > On Fri 02-08-19 10:04:22, Michal Hocko wrote: > > > On Thu 01-08-19 16:35:13, Roman Gushchin wrote: > > > > Commit 72f0184c8a00 ("mm, memcg: remove hotplug locking from try_charge") > > > > introduced css_tryget()/css_put() calls in drain_all_stock(), > > > > which are supposed to protect the target memory cgroup from being > > > > released during the mem_cgroup_is_descendant() call. > > > > > > > > However, it's not completely safe. In theory, memcg can go away > > > > between reading stock->cached pointer and calling css_tryget(). > > > > > > I have to remember how is this whole thing supposed to work, it's been > > > some time since I've looked into that. > > > > OK, I guess I remember now and I do not see how the race is possible. > > Stock cache is keeping its memcg alive because it elevates the reference > > counting for each cached charge. And that should keep the whole chain up > > to the root (of draining) alive, no? Or do I miss something, could you > > generate a sequence of events that would lead to use-after-free? > > Right, but it's true when you reading a local percpu stock. > But here we read a remote stock->cached pointer, which can be cleared > by a remote concurrent drain_local_stock() execution. OK, I can see how refill_stock can race with drain_all_stock. I am not sure I see drain_local_stock race because that should be triggered only from drain_all_stock and only one cpu is allowed to do that. Maybe we might have scheduled a work from the previous run? In any case, please document the race in the changelog please. This code is indeed tricky and a comment would help as well.
On Fri, Aug 02, 2019 at 07:14:51PM +0200, Michal Hocko wrote: > On Fri 02-08-19 17:00:34, Roman Gushchin wrote: > > On Fri, Aug 02, 2019 at 10:59:47AM +0200, Michal Hocko wrote: > > > On Fri 02-08-19 10:04:22, Michal Hocko wrote: > > > > On Thu 01-08-19 16:35:13, Roman Gushchin wrote: > > > > > Commit 72f0184c8a00 ("mm, memcg: remove hotplug locking from try_charge") > > > > > introduced css_tryget()/css_put() calls in drain_all_stock(), > > > > > which are supposed to protect the target memory cgroup from being > > > > > released during the mem_cgroup_is_descendant() call. > > > > > > > > > > However, it's not completely safe. In theory, memcg can go away > > > > > between reading stock->cached pointer and calling css_tryget(). > > > > > > > > I have to remember how is this whole thing supposed to work, it's been > > > > some time since I've looked into that. > > > > > > OK, I guess I remember now and I do not see how the race is possible. > > > Stock cache is keeping its memcg alive because it elevates the reference > > > counting for each cached charge. And that should keep the whole chain up > > > to the root (of draining) alive, no? Or do I miss something, could you > > > generate a sequence of events that would lead to use-after-free? > > > > Right, but it's true when you reading a local percpu stock. > > But here we read a remote stock->cached pointer, which can be cleared > > by a remote concurrent drain_local_stock() execution. > > OK, I can see how refill_stock can race with drain_all_stock. I am not > sure I see drain_local_stock race because that should be triggered only > from drain_all_stock and only one cpu is allowed to do that. Maybe we > might have scheduled a work from the previous run? Exactly. Previously executed drain_all_stock() -> schedule_work -> drain_local_stock() on a remote cpu races with checking memcg pointer from drain_all_stock. > > In any case, please document the race in the changelog please. This code > is indeed tricky and a comment would help as well. Sure, will send out v2 soon. Thanks!
diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 5c7b9facb0eb..d856b64426b7 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -2235,21 +2235,22 @@ static void drain_all_stock(struct mem_cgroup *root_memcg) for_each_online_cpu(cpu) { struct memcg_stock_pcp *stock = &per_cpu(memcg_stock, cpu); struct mem_cgroup *memcg; + bool flush = false; + rcu_read_lock(); memcg = stock->cached; - if (!memcg || !stock->nr_pages || !css_tryget(&memcg->css)) - continue; - if (!mem_cgroup_is_descendant(memcg, root_memcg)) { - css_put(&memcg->css); - continue; - } - if (!test_and_set_bit(FLUSHING_CACHED_CHARGE, &stock->flags)) { + if (memcg && stock->nr_pages && + mem_cgroup_is_descendant(memcg, root_memcg)) + flush = true; + rcu_read_unlock(); + + if (flush && + !test_and_set_bit(FLUSHING_CACHED_CHARGE, &stock->flags)) { if (cpu == curcpu) drain_local_stock(&stock->work); else schedule_work_on(cpu, &stock->work); } - css_put(&memcg->css); } put_cpu(); mutex_unlock(&percpu_charge_mutex);
Commit 72f0184c8a00 ("mm, memcg: remove hotplug locking from try_charge") introduced css_tryget()/css_put() calls in drain_all_stock(), which are supposed to protect the target memory cgroup from being released during the mem_cgroup_is_descendant() call. However, it's not completely safe. In theory, memcg can go away between reading stock->cached pointer and calling css_tryget(). So, let's read the stock->cached pointer and evaluate the memory cgroup inside a rcu read section, and get rid of css_tryget()/css_put() calls. Signed-off-by: Roman Gushchin <guro@fb.com> Cc: Michal Hocko <mhocko@suse.com> --- mm/memcontrol.c | 17 +++++++++-------- 1 file changed, 9 insertions(+), 8 deletions(-)