Message ID | 20191001091837.GK4536@hirez.programming.kicks-ass.net (mailing list archive)
---|---
State | New, archived
Series | sched: Avoid spurious lock dependencies
On 01/10/2019 10:18, Peter Zijlstra wrote:
> On Thu, Sep 26, 2019 at 08:29:34AM -0400, Qian Cai wrote:
>
>> Oh, you were talking about taking #3 while holding #2. Anyway, your patch is
>> working fine so far. Care to post/merge it officially or do you want me to post
>> it?
>
> Does the below adequately describe the situation?
>
> ---
> Subject: sched: Avoid spurious lock dependencies
>
> While seemingly harmless, __sched_fork() does hrtimer_init(), which,
> when DEBUG_OBJETS, can end up doing allocations.
>
> This then results in the following lock order:
>
>   rq->lock
>     zone->lock.rlock
>       batched_entropy_u64.lock
>
> Which in turn causes deadlocks when we do wakeups while holding that
> batched_entropy lock -- as the random code does.
>
> Solve this by moving __sched_fork() out from under rq->lock. This is
> safe because nothing there relies on rq->lock, as also evident from the
> other __sched_fork() callsite.
>
> Fixes: b7d5dc21072c ("random: add a spinlock_t to struct batched_entropy")
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>

Funky dependency, but the change looks fine to me.

Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>

> ---
>  kernel/sched/core.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 7880f4f64d0e..1832fc0fbec5 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -6039,10 +6039,11 @@ void init_idle(struct task_struct *idle, int cpu)
>  	struct rq *rq = cpu_rq(cpu);
>  	unsigned long flags;
>
> +	__sched_fork(0, idle);
> +
>  	raw_spin_lock_irqsave(&idle->pi_lock, flags);
>  	raw_spin_lock(&rq->lock);
>
> -	__sched_fork(0, idle);
>  	idle->state = TASK_RUNNING;
>  	idle->se.exec_start = sched_clock();
>  	idle->flags |= PF_IDLE;
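The dependency Valentin calls funky is, at bottom, a classic two-path lock-order inversion, with the allocator as the middle hop of the chain. A minimal userspace sketch of the pattern, using pthread mutexes as stand-ins for the kernel spinlocks (path_one/path_two and everything else here are illustrative stand-ins, not kernel code):

#include <pthread.h>

/* Stand-ins for rq->lock and batched_entropy_u64.lock. */
static pthread_mutex_t rq_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t entropy_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Path one, like init_idle() before the patch: under rq->lock,
 * hrtimer_init() with DEBUG_OBJECTS enabled may allocate, and the
 * allocation path (via zone->lock) can take the entropy lock.
 */
static void *path_one(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&rq_lock);
	pthread_mutex_lock(&entropy_lock);	/* order: rq -> entropy */
	pthread_mutex_unlock(&entropy_lock);
	pthread_mutex_unlock(&rq_lock);
	return NULL;
}

/*
 * Path two, like the random code: while holding the batched entropy
 * lock, a wakeup needs rq->lock -- the opposite order.
 */
static void *path_two(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&entropy_lock);
	pthread_mutex_lock(&rq_lock);		/* order: entropy -> rq */
	pthread_mutex_unlock(&rq_lock);
	pthread_mutex_unlock(&entropy_lock);
	return NULL;
}

int main(void)
{
	pthread_t a, b;

	pthread_create(&a, NULL, path_one, NULL);
	pthread_create(&b, NULL, path_two, NULL);
	pthread_join(a, NULL);	/* may hang forever if the ABBA window hits */
	pthread_join(b, NULL);
	return 0;
}

The patch breaks the cycle by removing the rq->lock -> entropy edge rather than the reverse one: __sched_fork(), and with it the possible allocation, now runs before rq->lock is taken.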
> On Oct 1, 2019, at 5:18 AM, Peter Zijlstra <peterz@infradead.org> wrote:
>
> Does the below adequately describe the situation?

Yes, looks fine.
> Subject: sched: Avoid spurious lock dependencies
>
> While seemingly harmless, __sched_fork() does hrtimer_init(), which,
> when DEBUG_OBJETS, can end up doing allocations.
>

NIT: s/DEBUG_OBJETS/DEBUG_OBJECTS

> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 7880f4f64d0e..1832fc0fbec5 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -6039,10 +6039,11 @@ void init_idle(struct task_struct *idle, int cpu)
>  	struct rq *rq = cpu_rq(cpu);
>  	unsigned long flags;
>
> +	__sched_fork(0, idle);
> +
>  	raw_spin_lock_irqsave(&idle->pi_lock, flags);
>  	raw_spin_lock(&rq->lock);
>
> -	__sched_fork(0, idle);
>  	idle->state = TASK_RUNNING;
>  	idle->se.exec_start = sched_clock();
>  	idle->flags |= PF_IDLE;
>

Given that there is a comment just after this which says
"init_task() gets called multiple times on a task",
should we add a check if rq->idle is present and bail out?

	if (rq->idle) {
		raw_spin_unlock(&rq->lock);
		raw_spin_unlock_irqrestore(&idle->pi_lock, flags);
		return;
	}

Also, can we move the above 3 statements before the lock?
On Tue, Oct 01, 2019 at 05:06:56PM +0530, Srikar Dronamraju wrote:
> > Subject: sched: Avoid spurious lock dependencies
> >
> > While seemingly harmless, __sched_fork() does hrtimer_init(), which,
> > when DEBUG_OBJETS, can end up doing allocations.
> >
>
> NIT: s/DEBUG_OBJETS/DEBUG_OBJECTS
>
> > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > index 7880f4f64d0e..1832fc0fbec5 100644
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -6039,10 +6039,11 @@ void init_idle(struct task_struct *idle, int cpu)
> >  	struct rq *rq = cpu_rq(cpu);
> >  	unsigned long flags;
> >
> > +	__sched_fork(0, idle);
> > +
> >  	raw_spin_lock_irqsave(&idle->pi_lock, flags);
> >  	raw_spin_lock(&rq->lock);
> >
> > -	__sched_fork(0, idle);
> >  	idle->state = TASK_RUNNING;
> >  	idle->se.exec_start = sched_clock();
> >  	idle->flags |= PF_IDLE;
> >
>
> Given that there is a comment just after this which says
> "init_task() gets called multiple times on a task",
> should we add a check if rq->idle is present and bail out?
>
> 	if (rq->idle) {
> 		raw_spin_unlock(&rq->lock);
> 		raw_spin_unlock_irqrestore(&idle->pi_lock, flags);
> 		return;
> 	}

Not really worth it; the best solution is to fix the callchains leading
up to it. It's all hotplug related IIRC and so it's slow anyway.

> Also, can we move the above 3 statements before the lock?

Probably, but to what effect?
> On Oct 1, 2019, at 5:18 AM, Peter Zijlstra <peterz@infradead.org> wrote:
>
> Does the below adequately describe the situation?
>
> ---
> Subject: sched: Avoid spurious lock dependencies
>
> While seemingly harmless, __sched_fork() does hrtimer_init(), which,
> when DEBUG_OBJETS, can end up doing allocations.
>
> This then results in the following lock order:
>
>   rq->lock
>     zone->lock.rlock
>       batched_entropy_u64.lock
>
> Which in turn causes deadlocks when we do wakeups while holding that
> batched_entropy lock -- as the random code does.
>
> Solve this by moving __sched_fork() out from under rq->lock. This is
> safe because nothing there relies on rq->lock, as also evident from the
> other __sched_fork() callsite.
>
> Fixes: b7d5dc21072c ("random: add a spinlock_t to struct batched_entropy")
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  kernel/sched/core.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 7880f4f64d0e..1832fc0fbec5 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -6039,10 +6039,11 @@ void init_idle(struct task_struct *idle, int cpu)
>  	struct rq *rq = cpu_rq(cpu);
>  	unsigned long flags;
>
> +	__sched_fork(0, idle);
> +
>  	raw_spin_lock_irqsave(&idle->pi_lock, flags);
>  	raw_spin_lock(&rq->lock);
>
> -	__sched_fork(0, idle);
>  	idle->state = TASK_RUNNING;
>  	idle->se.exec_start = sched_clock();
>  	idle->flags |= PF_IDLE;

It looks like this patch has been forgotten forever. Do you need to
repost, so Ingo might have a better chance to pick it up?
On Tue, Oct 29, 2019 at 07:10:34AM -0400, Qian Cai wrote:
>
> It looks like this patch has been forgotten forever. Do you need to
> repost, so Ingo might have a better chance to pick it up?

I've queued it now, sorry!
> On Oct 29, 2019, at 8:44 AM, Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Tue, Oct 29, 2019 at 07:10:34AM -0400, Qian Cai wrote:
>>
>> It looks like this patch has been forgotten forever. Do you need to
>> repost, so Ingo might have a better chance to pick it up?
>
> I've queued it now, sorry!

Hmm, this is still not even in linux-next after another 2 weeks. Not
sure what to do except carrying the patch on my own.
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 7880f4f64d0e..1832fc0fbec5 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6039,10 +6039,11 @@ void init_idle(struct task_struct *idle, int cpu)
 	struct rq *rq = cpu_rq(cpu);
 	unsigned long flags;
 
+	__sched_fork(0, idle);
+
 	raw_spin_lock_irqsave(&idle->pi_lock, flags);
 	raw_spin_lock(&rq->lock);
 
-	__sched_fork(0, idle);
 	idle->state = TASK_RUNNING;
 	idle->se.exec_start = sched_clock();
 	idle->flags |= PF_IDLE;
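Why the reorder is safe, in one picture: at this point the idle task is still private to init_idle(), so its per-task setup needs no lock; only publishing it to the runqueue does. A sketch of that init-before-publish pattern (again illustrative stand-ins, not the kernel code):

#include <pthread.h>
#include <stdio.h>

struct task { int state; };

static pthread_mutex_t rq_lock = PTHREAD_MUTEX_INITIALIZER;
static struct task *rq_idle;	/* stands in for rq->idle */

/*
 * Like __sched_fork(): writes only to @p, which no other thread can
 * reach yet, so holding rq_lock across this buys no protection -- it
 * only adds lock-order edges (e.g. allocations under the lock).
 */
static void task_setup(struct task *p)
{
	p->state = 0;
}

/* Like the patched init_idle(): private setup first, lock only to publish. */
static void task_install(struct task *p)
{
	task_setup(p);			/* @p is still private here */

	pthread_mutex_lock(&rq_lock);
	rq_idle = p;			/* now visible to other threads */
	pthread_mutex_unlock(&rq_lock);
}

int main(void)
{
	static struct task idle;

	task_install(&idle);
	printf("installed, state=%d\n", rq_idle->state);
	return 0;
}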