Message ID | 20210602131225.336600299@infradead.org (mailing list archive) |
---|---|
Headers | show |
Series | sched: Cleanup task_struct::state | expand |
On Wed, Jun 02, 2021 at 03:12:26PM +0200, Peter Zijlstra wrote: > Remove broken task->state references and let wake_up_process() DTRT. > > The anti-pattern in these patches breaks the ordering of ->state vs > COND as described in the comment near set_current_state() and can lead > to missed wakeups: > > (OoO load, observes RUNNING)<-. > for (;;) { | > t->state = UNINTERRUPTIBLE; | > smp_mb(); ,-----> ,' (OoO load, observed !COND) > | | > | | COND = 1; > | `- if (t->state != RUNNING) > | wake_up_process(t); // not done > if (COND) ---------' > break; > schedule(); // forever waiting > } > t->state = TASK_RUNNING; > > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> > --- > drivers/net/ethernet/qualcomm/qca_spi.c | 6 ++---- > drivers/usb/gadget/udc/max3420_udc.c | 15 +++++---------- > drivers/usb/host/max3421-hcd.c | 3 +-- > kernel/softirq.c | 2 +- > 4 files changed, 9 insertions(+), 17 deletions(-) For USB stuff: Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel
----- On Jun 2, 2021, at 9:12 AM, Peter Zijlstra peterz@infradead.org wrote: > Remove yet another few p->state accesses. [...] > > --- a/include/linux/sched.h > +++ b/include/linux/sched.h > @@ -212,6 +212,8 @@ struct task_group; > > #endif > > +#define get_current_state() READ_ONCE(current->state) Why use a macro rather than a static inline here ? Thanks, Mathieu
----- On Jun 2, 2021, at 9:12 AM, Peter Zijlstra peterz@infradead.org wrote: > Change the type and name of task_struct::state. Drop the volatile and > shrink it to an 'unsigned int'. Rename it in order to find all uses > such that we can use READ_ONCE/WRITE_ONCE as appropriate. > [...] > > --- a/block/blk-mq.c > +++ b/block/blk-mq.c [...] > @@ -1559,7 +1560,8 @@ static int fill_psinfo(struct elf_prpsin > psinfo->pr_pgrp = task_pgrp_vnr(p); > psinfo->pr_sid = task_session_vnr(p); > > - i = p->state ? ffz(~p->state) + 1 : 0; > + state = READ_ONCE(p->__state); > + i = state ? ffz(~state) + 1 : 0; > psinfo->pr_state = i; > psinfo->pr_sname = (i > 5) ? '.' : "RSDTZW"[i]; > psinfo->pr_zomb = psinfo->pr_sname == 'Z'; [...] > --- a/include/linux/sched.h > +++ b/include/linux/sched.h > @@ -113,13 +113,13 @@ struct task_group; > __TASK_TRACED | EXIT_DEAD | EXIT_ZOMBIE | \ > TASK_PARKED) > > -#define task_is_running(task) (READ_ONCE((task)->state) == TASK_RUNNING) > +#define task_is_running(task) (READ_ONCE((task)->__state) == TASK_RUNNING) > > -#define task_is_traced(task) ((task->state & __TASK_TRACED) != 0) > +#define task_is_traced(task) ((READ_ONCE(task->__state) & __TASK_TRACED) != 0) > > -#define task_is_stopped(task) ((task->state & __TASK_STOPPED) != 0) > +#define task_is_stopped(task) ((READ_ONCE(task->__state) & __TASK_STOPPED) != > 0) > > -#define task_is_stopped_or_traced(task) ((task->state & (__TASK_STOPPED | > __TASK_TRACED)) != 0) > +#define task_is_stopped_or_traced(task) ((READ_ONCE(task->__state) & > (__TASK_STOPPED | __TASK_TRACED)) != 0) > > #ifdef CONFIG_DEBUG_ATOMIC_SLEEP > > @@ -134,14 +134,14 @@ struct task_group; > do { \ > WARN_ON_ONCE(is_special_task_state(state_value));\ > current->task_state_change = _THIS_IP_; \ > - current->state = (state_value); \ > + WRITE_ONCE(current->__state, (state_value)); \ > } while (0) Why not introduce set_task_state(p) and get_task_state(p) rather than sprinkle READ_ONCE() and WRITE_ONCE() all over the kernel ? Thanks, Mathieu
On Wed, Jun 02, 2021 at 10:01:29AM -0400, Mathieu Desnoyers wrote: > ----- On Jun 2, 2021, at 9:12 AM, Peter Zijlstra peterz@infradead.org wrote: > > > Remove yet another few p->state accesses. > > [...] > > > > > --- a/include/linux/sched.h > > +++ b/include/linux/sched.h > > @@ -212,6 +212,8 @@ struct task_group; > > > > #endif > > > > +#define get_current_state() READ_ONCE(current->state) > > Why use a macro rather than a static inline here ? Mostly to be consistent, all that state stuff is macros. I suppose we could try and make them inlines at the end or so -- if the header maze allows. -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel
On Wed, Jun 02, 2021 at 10:06:58AM -0400, Mathieu Desnoyers wrote: > ----- On Jun 2, 2021, at 9:12 AM, Peter Zijlstra peterz@infradead.org wrote: > > @@ -134,14 +134,14 @@ struct task_group; > > do { \ > > WARN_ON_ONCE(is_special_task_state(state_value));\ > > current->task_state_change = _THIS_IP_; \ > > - current->state = (state_value); \ > > + WRITE_ONCE(current->__state, (state_value)); \ > > } while (0) > > Why not introduce set_task_state(p) and get_task_state(p) rather than sprinkle > READ_ONCE() and WRITE_ONCE() all over the kernel ? set_task_state() is fundamentally unsound, there's very few sites that _set_ state on anything other than current, and those sites are super tricky, eg. ptrace. Having get_task_state() would seem to suggest it's actually a sane thing to do, it's not really. Inspecting remote state is full of races, and some of that really wants cleaning up, but that's for another day. -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel
On Wed, Jun 02, 2021 at 03:12:26PM +0200, Peter Zijlstra wrote: > Remove broken task->state references and let wake_up_process() DTRT. > > The anti-pattern in these patches breaks the ordering of ->state vs > COND as described in the comment near set_current_state() and can lead > to missed wakeups: > > (OoO load, observes RUNNING)<-. > for (;;) { | > t->state = UNINTERRUPTIBLE; | > smp_mb(); ,-----> ,' (OoO load, observed !COND) > | | > | | COND = 1; > | `- if (t->state != RUNNING) > | wake_up_process(t); // not done > if (COND) ---------' > break; > schedule(); // forever waiting > } > t->state = TASK_RUNNING; > > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> > --- > drivers/net/ethernet/qualcomm/qca_spi.c | 6 ++---- > drivers/usb/gadget/udc/max3420_udc.c | 15 +++++---------- > drivers/usb/host/max3421-hcd.c | 3 +-- > kernel/softirq.c | 2 +- > 4 files changed, 9 insertions(+), 17 deletions(-) Acked-by: Will Deacon <will@kernel.org> I couldn't spot any others. Will -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel
On Wed, Jun 02, 2021 at 03:12:27PM +0200, Peter Zijlstra wrote: > Replace a bunch of 'p->state == TASK_RUNNING' with a new helper: > task_is_running(p). > > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> > --- > arch/x86/kernel/process.c | 4 ++-- > block/blk-mq.c | 2 +- > include/linux/sched.h | 2 ++ > kernel/locking/lockdep.c | 2 +- > kernel/rcu/tree_plugin.h | 2 +- > kernel/sched/core.c | 6 +++--- > kernel/sched/stats.h | 2 +- > kernel/signal.c | 2 +- > kernel/softirq.c | 3 +-- > mm/compaction.c | 2 +- > 10 files changed, 14 insertions(+), 13 deletions(-) > > --- a/arch/x86/kernel/process.c > +++ b/arch/x86/kernel/process.c > @@ -931,7 +931,7 @@ unsigned long get_wchan(struct task_stru > unsigned long start, bottom, top, sp, fp, ip, ret = 0; > int count = 0; > > - if (p == current || p->state == TASK_RUNNING) > + if (p == current || task_is_running(p)) Looks like this one in get_wchan() has been cargo-culted across most of arch/ so they'll need fixing up before you rename the struct member. There's also a weird one in tools/bpf/runqslower/runqslower.bpf.c (!) Will -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel
On Wed, Jun 02, 2021 at 03:12:31PM +0200, Peter Zijlstra wrote: > Change the type and name of task_struct::state. Drop the volatile and > shrink it to an 'unsigned int'. Rename it in order to find all uses > such that we can use READ_ONCE/WRITE_ONCE as appropriate. > > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> > --- > block/blk-mq.c | 2 - > drivers/md/dm.c | 6 ++-- > fs/binfmt_elf.c | 8 +++--- > fs/userfaultfd.c | 4 +-- > include/linux/sched.h | 31 +++++++++++------------ > include/linux/sched/debug.h | 2 - > include/linux/sched/signal.h | 2 - > init/init_task.c | 2 - > kernel/cgroup/cgroup-v1.c | 2 - > kernel/debug/kdb/kdb_support.c | 18 +++++++------ > kernel/fork.c | 4 +-- > kernel/hung_task.c | 2 - > kernel/kthread.c | 4 +-- > kernel/locking/mutex.c | 6 ++-- > kernel/locking/rtmutex.c | 4 +-- > kernel/locking/rwsem.c | 2 - > kernel/ptrace.c | 12 ++++----- > kernel/rcu/rcutorture.c | 4 +-- > kernel/rcu/tree_stall.h | 12 ++++----- > kernel/sched/core.c | 53 +++++++++++++++++++++-------------------- > kernel/sched/deadline.c | 10 +++---- > kernel/sched/fair.c | 11 +++++--- > lib/syscall.c | 4 +-- > net/core/dev.c | 2 - > 24 files changed, 108 insertions(+), 99 deletions(-) I think this makes the code a _lot_ easier to understand, so: Acked-by: Will Deacon <will@kernel.org> on the assumption that you'll fix get_wchan() for !x86 as well. Cheers, Will -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel
On Wed, Jun 02, 2021 at 03:59:21PM +0100, Will Deacon wrote: > On Wed, Jun 02, 2021 at 03:12:27PM +0200, Peter Zijlstra wrote: > > Replace a bunch of 'p->state == TASK_RUNNING' with a new helper: > > task_is_running(p). > > > > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> > > --- > > arch/x86/kernel/process.c | 4 ++-- > > block/blk-mq.c | 2 +- > > include/linux/sched.h | 2 ++ > > kernel/locking/lockdep.c | 2 +- > > kernel/rcu/tree_plugin.h | 2 +- > > kernel/sched/core.c | 6 +++--- > > kernel/sched/stats.h | 2 +- > > kernel/signal.c | 2 +- > > kernel/softirq.c | 3 +-- > > mm/compaction.c | 2 +- > > 10 files changed, 14 insertions(+), 13 deletions(-) > > > > --- a/arch/x86/kernel/process.c > > +++ b/arch/x86/kernel/process.c > > @@ -931,7 +931,7 @@ unsigned long get_wchan(struct task_stru > > unsigned long start, bottom, top, sp, fp, ip, ret = 0; > > int count = 0; > > > > - if (p == current || p->state == TASK_RUNNING) > > + if (p == current || task_is_running(p)) > > Looks like this one in get_wchan() has been cargo-culted across most of > arch/ so they'll need fixing up before you rename the struct member. Yeah, this was x86_64 allmodconfig driven, I've already got a bunch of robot mail telling me other archs need help, I'll fix it iup. > There's also a weird one in tools/bpf/runqslower/runqslower.bpf.c (!) I'm tempted to let the bpf people sort their own gunk. This is not an ABI. I so don't care breaking every script out there. -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel
On Wed, 02 Jun 2021, Peter Zijlstra wrote: >Remove broken task->state references and let wake_up_process() DTRT. > >The anti-pattern in these patches breaks the ordering of ->state vs >COND as described in the comment near set_current_state() and can lead >to missed wakeups: > > (OoO load, observes RUNNING)<-. > for (;;) { | > t->state = UNINTERRUPTIBLE; | > smp_mb(); ,-----> ,' (OoO load, observed !COND) > | | > | | COND = 1; > | `- if (t->state != RUNNING) > | wake_up_process(t); // not done > if (COND) ---------' > break; > schedule(); // forever waiting > } > t->state = TASK_RUNNING; > >Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Davidlohr Bueso <dbueso@suse.de> -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel
On Wed, 02 Jun 2021, Peter Zijlstra wrote: >Replace a bunch of 'p->state == TASK_RUNNING' with a new helper: >task_is_running(p). > >Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Davidlohr Bueso But afaict .... >--- > arch/x86/kernel/process.c | 4 ++-- > block/blk-mq.c | 2 +- > include/linux/sched.h | 2 ++ > kernel/locking/lockdep.c | 2 +- > kernel/rcu/tree_plugin.h | 2 +- > kernel/sched/core.c | 6 +++--- > kernel/sched/stats.h | 2 +- > kernel/signal.c | 2 +- > kernel/softirq.c | 3 +-- > mm/compaction.c | 2 +- > 10 files changed, 14 insertions(+), 13 deletions(-) there are also (on top of the already mentioned arch/): kernel/kcsan/report.c: const bool is_running = current->state == TASK_RUNNING; kernel/locking/lockdep.c: if (p->state == TASK_RUNNING && p != current) -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel