Message ID: 20160121212416.GL6357@twins.programming.kicks-ass.net (mailing list archive)
State: New, archived
On Thu, Jan 21, 2016 at 10:24:16PM +0100, Peter Zijlstra wrote:
> On Thu, Jan 21, 2016 at 03:31:11PM -0500, Tejun Heo wrote:
> > There are three subsystem callbacks in css shutdown path -
> > css_offline(), css_released() and css_free(). Except for
> > css_released(), cgroup core didn't use to guarantee the order of
> > invocation. css_offline() or css_free() could be called on a parent
> > css before its children. This behavior is unexpected and led to
> > use-after-free in cpu controller.
> >
> > This patch updates offline path so that a parent css is never offlined
> > before its children. Each css keeps online_cnt which reaches zero iff
> > itself and all its children are offline and offline_css() is invoked
> > only after online_cnt reaches zero.
> >
> > This fixes the reported cpu controller malfunction. The next patch
> > will update css_free() handling.
>
> No, I need to fix the cpu controller too, because the offending code
> sits off of css_free() (the next patch), but also does a call_rcu() in
> between, which also doesn't guarantee order.

Ah, I see. Christian, can you please apply all three patches and see
whether the problem gets fixed? Once verified, I'll update the patch
description and repost.

Thanks.
On 01/21/2016 10:28 PM, Tejun Heo wrote:
> On Thu, Jan 21, 2016 at 10:24:16PM +0100, Peter Zijlstra wrote:
>> On Thu, Jan 21, 2016 at 03:31:11PM -0500, Tejun Heo wrote:
>>> There are three subsystem callbacks in css shutdown path -
>>> css_offline(), css_released() and css_free(). Except for
>>> css_released(), cgroup core didn't use to guarantee the order of
>>> invocation. css_offline() or css_free() could be called on a parent
>>> css before its children. This behavior is unexpected and led to
>>> use-after-free in cpu controller.
>>>
>>> This patch updates offline path so that a parent css is never offlined
>>> before its children. Each css keeps online_cnt which reaches zero iff
>>> itself and all its children are offline and offline_css() is invoked
>>> only after online_cnt reaches zero.
>>>
>>> This fixes the reported cpu controller malfunction. The next patch
>>> will update css_free() handling.
>>
>> No, I need to fix the cpu controller too, because the offending code
>> sits off of css_free() (the next patch), but also does a call_rcu() in
>> between, which also doesn't guarantee order.
>
> Ah, I see. Christian, can you please apply all three patches and see
> whether the problem gets fixed? Once verified, I'll update the patch
> description and repost.

With these 3 patches I always run into the dio/scsi problem, but never
into the css issue. So I cannot test for a full day or so, but it looks
like the problem is gone. At least it worked multiple times for 30
minutes or so until my system was killed by the io issue.

Tested-by: Christian Borntraeger <borntraeger@de.ibm.com>
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index b8bd352dc63f..d589a140fe0e 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -7865,11 +7865,9 @@ void sched_destroy_group(struct task_group *tg)
 void sched_offline_group(struct task_group *tg)
 {
 	unsigned long flags;
-	int i;
 
 	/* end participation in shares distribution */
-	for_each_possible_cpu(i)
-		unregister_fair_sched_group(tg, i);
+	unregister_fair_sched_group(tg);
 
 	spin_lock_irqsave(&task_group_lock, flags);
 	list_del_rcu(&tg->list);
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 7f60da0f0fd7..aff660b70bf5 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8244,11 +8244,8 @@ void free_fair_sched_group(struct task_group *tg)
 	for_each_possible_cpu(i) {
 		if (tg->cfs_rq)
 			kfree(tg->cfs_rq[i]);
-		if (tg->se) {
-			if (tg->se[i])
-				remove_entity_load_avg(tg->se[i]);
+		if (tg->se)
 			kfree(tg->se[i]);
-		}
 	}
 
 	kfree(tg->cfs_rq);
@@ -8296,21 +8293,29 @@ int alloc_fair_sched_group(struct task_group *tg, struct task_group *parent)
 	return 0;
 }
 
-void unregister_fair_sched_group(struct task_group *tg, int cpu)
+void unregister_fair_sched_group(struct task_group *tg)
 {
-	struct rq *rq = cpu_rq(cpu);
 	unsigned long flags;
+	struct rq *rq;
+	int cpu;
 
-	/*
-	 * Only empty task groups can be destroyed; so we can speculatively
-	 * check on_list without danger of it being re-added.
-	 */
-	if (!tg->cfs_rq[cpu]->on_list)
-		return;
+	for_each_possible_cpu(cpu) {
+		if (tg->se[cpu])
+			remove_entity_load_avg(tg->se[cpu]);
 
-	raw_spin_lock_irqsave(&rq->lock, flags);
-	list_del_leaf_cfs_rq(tg->cfs_rq[cpu]);
-	raw_spin_unlock_irqrestore(&rq->lock, flags);
+		/*
+		 * Only empty task groups can be destroyed; so we can speculatively
+		 * check on_list without danger of it being re-added.
+		 */
+		if (!tg->cfs_rq[cpu]->on_list)
+			continue;
+
+		rq = cpu_rq(cpu);
+
+		raw_spin_lock_irqsave(&rq->lock, flags);
+		list_del_leaf_cfs_rq(tg->cfs_rq[cpu]);
+		raw_spin_unlock_irqrestore(&rq->lock, flags);
+	}
 }
 
 void init_tg_cfs_entry(struct task_group *tg, struct cfs_rq *cfs_rq,
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 837bcd383cda..492478bb717c 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -313,7 +313,7 @@ extern int tg_nop(struct task_group *tg, void *data);
 
 extern void free_fair_sched_group(struct task_group *tg);
 extern int alloc_fair_sched_group(struct task_group *tg, struct task_group *parent);
-extern void unregister_fair_sched_group(struct task_group *tg, int cpu);
+extern void unregister_fair_sched_group(struct task_group *tg);
 extern void init_tg_cfs_entry(struct task_group *tg, struct cfs_rq *cfs_rq,
 			struct sched_entity *se, int cpu,
 			struct sched_entity *parent);