Message ID | 1457697574-6710-3-git-send-email-jgross@suse.com (mailing list archive) |
---|---|
State | New, archived |
On Fri, Mar 11, 2016 at 12:59:30PM +0100, Juergen Gross wrote:
> +int call_sync_on_phys_cpu(unsigned cpu, int (*func)(void *), void *par)
> +{
> +	cpumask_var_t old_mask;
> +	int ret;
> +
> +	if (cpu >= nr_cpu_ids)
> +		return -EINVAL;
> +
> +	if (!alloc_cpumask_var(&old_mask, GFP_KERNEL))
> +		return -ENOMEM;
> +
> +	cpumask_copy(old_mask, &current->cpus_allowed);
> +	ret = set_cpus_allowed_ptr(current, cpumask_of(cpu));
> +	if (ret)
> +		goto out;

So what happens if someone does sched_setaffinity() right about here?

> +
> +	ret = func(par);
> +
> +	set_cpus_allowed_ptr(current, old_mask);
> +
> +out:
> +	free_cpumask_var(old_mask);
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(call_sync_on_phys_cpu);

This is disgusting, and you're adding this to !Xen kernels too.
On Fri, Mar 11, 2016 at 01:19:50PM +0100, Peter Zijlstra wrote:
> On Fri, Mar 11, 2016 at 12:59:30PM +0100, Juergen Gross wrote:
> > +int call_sync_on_phys_cpu(unsigned cpu, int (*func)(void *), void *par)
> > +{
> > +	cpumask_var_t old_mask;
> > +	int ret;
> > +
> > +	if (cpu >= nr_cpu_ids)
> > +		return -EINVAL;
> > +
> > +	if (!alloc_cpumask_var(&old_mask, GFP_KERNEL))
> > +		return -ENOMEM;
> > +
> > +	cpumask_copy(old_mask, &current->cpus_allowed);
> > +	ret = set_cpus_allowed_ptr(current, cpumask_of(cpu));
> > +	if (ret)
> > +		goto out;
> 
> So what happens if someone does sched_setaffinity() right about here?
> 
> > +
> > +	ret = func(par);
> > +
> > +	set_cpus_allowed_ptr(current, old_mask);
> > +
> > +out:
> > +	free_cpumask_var(old_mask);
> > +	return ret;
> > +}
> > +EXPORT_SYMBOL_GPL(call_sync_on_phys_cpu);
> 
> This is disgusting, and you're adding this to !Xen kernels too.

how about something like:

struct xen_callback_struct {
	struct work_struct work;
	struct completion done;
	void * data;
	int ret;
};

static void xen_callback_f(struct work_struct *work)
{
	struct xen_callback_struct *xcs = container_of(work, struct xen_callback_struct, work);

	xcs->ret = xcs->func(xcs->data);

	complete(&xcs->done);
}

xen_call_on_cpu_sync(int cpu, int (*func)(void *), void *data)
{
	struct xen_callback_state xcs = {
		.work = __WORK_INITIALIZER(xcs.work, xen_callback_f);
		.done = COMPLETION_INITIALIZER_ONSTACK(xcs.done),
		.data = data,
	};

	queue_work_on(&work, cpu);
	wait_for_completion(&xcs.done);

	return xcs.ret;
}

No mucking about with the scheduler state, no new exported functions
etc..
On 11/03/16 13:19, Peter Zijlstra wrote:
> On Fri, Mar 11, 2016 at 12:59:30PM +0100, Juergen Gross wrote:
>> +int call_sync_on_phys_cpu(unsigned cpu, int (*func)(void *), void *par)
>> +{
>> +	cpumask_var_t old_mask;
>> +	int ret;
>> +
>> +	if (cpu >= nr_cpu_ids)
>> +		return -EINVAL;
>> +
>> +	if (!alloc_cpumask_var(&old_mask, GFP_KERNEL))
>> +		return -ENOMEM;
>> +
>> +	cpumask_copy(old_mask, &current->cpus_allowed);
>> +	ret = set_cpus_allowed_ptr(current, cpumask_of(cpu));
>> +	if (ret)
>> +		goto out;
> 
> So what happens if someone does sched_setaffinity() right about here?

Aah, okay. Seems I should add preempt_disable() in this patch already
and retry the set_cpus_allowed_ptr() in case cpus_allowed has changed.

> 
>> +
>> +	ret = func(par);
>> +
>> +	set_cpus_allowed_ptr(current, old_mask);
>> +
>> +out:
>> +	free_cpumask_var(old_mask);
>> +	return ret;
>> +}
>> +EXPORT_SYMBOL_GPL(call_sync_on_phys_cpu);
> 
> This is disgusting, and you're adding this to !Xen kernels too.

Sure. It is called on !Xen kernels too. Without Xen it is just omitting
the vcpu pinning. I've copied above logic from dcdbas/i8k (it was open
coded twice).

BTW: I tried to get rid of the complete logic to call a function on
cpu 0 only. It was rejected by the Dell maintainers.


Juergen
On Fri, Mar 11, 2016 at 01:43:53PM +0100, Juergen Gross wrote:
> On 11/03/16 13:19, Peter Zijlstra wrote:
> > On Fri, Mar 11, 2016 at 12:59:30PM +0100, Juergen Gross wrote:
> >> +int call_sync_on_phys_cpu(unsigned cpu, int (*func)(void *), void *par)
> >> +{
> >> +	cpumask_var_t old_mask;
> >> +	int ret;
> >> +
> >> +	if (cpu >= nr_cpu_ids)
> >> +		return -EINVAL;
> >> +
> >> +	if (!alloc_cpumask_var(&old_mask, GFP_KERNEL))
> >> +		return -ENOMEM;
> >> +
> >> +	cpumask_copy(old_mask, &current->cpus_allowed);
> >> +	ret = set_cpus_allowed_ptr(current, cpumask_of(cpu));
> >> +	if (ret)
> >> +		goto out;
> > 
> > So what happens if someone does sched_setaffinity() right about here?
> 
> Aah, okay. Seems I should add preempt_disable() in this patch already
> and retry the set_cpus_allowed_ptr() in case cpus_allowed has changed.

Doesn't help, you should simply not _ever_ touch this for random tasks.
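Concretely, the window being flagged: the relevant lines of the patch again,
with illustrative comments added (the code itself is unchanged from above).

	ret = set_cpus_allowed_ptr(current, cpumask_of(cpu));
	if (ret)
		goto out;

	/*
	 * Nothing pins the affinity here: another thread may call
	 * sched_setaffinity() on this task in this window, replacing
	 * cpumask_of(cpu) with an arbitrary mask ...
	 */
	ret = func(par);	/* ... so func() may run on the wrong CPU ... */

	set_cpus_allowed_ptr(current, old_mask);
	/*
	 * ... and restoring old_mask afterwards silently wipes out the
	 * affinity the user just asked for, whether or not preemption was
	 * disabled around the call.
	 */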
On 11/03/16 13:42, Peter Zijlstra wrote:
> On Fri, Mar 11, 2016 at 01:19:50PM +0100, Peter Zijlstra wrote:
>> On Fri, Mar 11, 2016 at 12:59:30PM +0100, Juergen Gross wrote:
>>> +int call_sync_on_phys_cpu(unsigned cpu, int (*func)(void *), void *par)
>>> +{
>>> +	cpumask_var_t old_mask;
>>> +	int ret;
>>> +
>>> +	if (cpu >= nr_cpu_ids)
>>> +		return -EINVAL;
>>> +
>>> +	if (!alloc_cpumask_var(&old_mask, GFP_KERNEL))
>>> +		return -ENOMEM;
>>> +
>>> +	cpumask_copy(old_mask, &current->cpus_allowed);
>>> +	ret = set_cpus_allowed_ptr(current, cpumask_of(cpu));
>>> +	if (ret)
>>> +		goto out;
>>
>> So what happens if someone does sched_setaffinity() right about here?
>>
>>> +
>>> +	ret = func(par);
>>> +
>>> +	set_cpus_allowed_ptr(current, old_mask);
>>> +
>>> +out:
>>> +	free_cpumask_var(old_mask);
>>> +	return ret;
>>> +}
>>> +EXPORT_SYMBOL_GPL(call_sync_on_phys_cpu);
>>
>> This is disgusting, and you're adding this to !Xen kernels too.
>
> how about something like:
>
> struct xen_callback_struct {
> 	struct work_struct work;
> 	struct completion done;
> 	void * data;
> 	int ret;
> };
>
> static void xen_callback_f(struct work_struct *work)
> {
> 	struct xen_callback_struct *xcs = container_of(work, struct xen_callback_struct, work);
>
> 	xcs->ret = xcs->func(xcs->data);
>
> 	complete(&xcs->done);
> }
>
> xen_call_on_cpu_sync(int cpu, int (*func)(void *), void *data)
> {
> 	struct xen_callback_state xcs = {
> 		.work = __WORK_INITIALIZER(xcs.work, xen_callback_f);
> 		.done = COMPLETION_INITIALIZER_ONSTACK(xcs.done),
> 		.data = data,
> 	};
>
> 	queue_work_on(&work, cpu);
> 	wait_for_completion(&xcs.done);
>
> 	return xcs.ret;
> }
>
> No mucking about with the scheduler state, no new exported functions
> etc..
>

Hey, I like it. Can't be limited to Xen as on bare metal the function
needs to be called on cpu 0, too. But avoiding the scheduler fiddling
is much better! As this seems to be required for Dell hardware only,
I could add it to some Dell base driver in case you don't want to add
it to core code.


Juergen
On Fri, Mar 11, 2016 at 01:48:12PM +0100, Juergen Gross wrote:
> On 11/03/16 13:42, Peter Zijlstra wrote:
> > how about something like:
> > 
> > struct xen_callback_struct {
> > 	struct work_struct work;
> > 	struct completion done;
	int (*func)(void*);
> > 	void * data;
> > 	int ret;
> > };
> > 
> > static void xen_callback_f(struct work_struct *work)
> > {
> > 	struct xen_callback_struct *xcs = container_of(work, struct xen_callback_struct, work);
> > 
> > 	xcs->ret = xcs->func(xcs->data);
> > 
> > 	complete(&xcs->done);
> > }
> > 
> > xen_call_on_cpu_sync(int cpu, int (*func)(void *), void *data)
> > {
> > 	struct xen_callback_state xcs = {
> > 		.work = __WORK_INITIALIZER(xcs.work, xen_callback_f);
> > 		.done = COMPLETION_INITIALIZER_ONSTACK(xcs.done),
		.func = func,
> > 		.data = data,
> > 	};
> > 
> > 	queue_work_on(&work, cpu);
> > 	wait_for_completion(&xcs.done);
> > 
> > 	return xcs.ret;
> > }
> > 
> > No mucking about with the scheduler state, no new exported functions
> > etc..
> > 
> 
> Hey, I like it. Can't be limited to Xen as on bare metal the function
> needs to be called on cpu 0, too. But avoiding the scheduler fiddling
> is much better! As this seems to be required for Dell hardware only,
> I could add it to some Dell base driver in case you don't want to add
> it to core code.

Urgh yeah, saw that in your other mail. It looks like I should go look
at set_cpus_allowed_ptr() abuse :/

Not sure where this would fit best, maybe somewhere near workqueue.c or
smp.c.
On 11/03/16 13:57, Peter Zijlstra wrote:
> On Fri, Mar 11, 2016 at 01:48:12PM +0100, Juergen Gross wrote:
>> On 11/03/16 13:42, Peter Zijlstra wrote:
>>> how about something like:
>>>
>>> struct xen_callback_struct {
>>> 	struct work_struct work;
>>> 	struct completion done;
> int (*func)(void*);
>>> 	void * data;
>>> 	int ret;
>>> };
>>>
>>> static void xen_callback_f(struct work_struct *work)
>>> {
>>> 	struct xen_callback_struct *xcs = container_of(work, struct xen_callback_struct, work);
>>>
>>> 	xcs->ret = xcs->func(xcs->data);
>>>
>>> 	complete(&xcs->done);
>>> }
>>>
>>> xen_call_on_cpu_sync(int cpu, int (*func)(void *), void *data)
>>> {
>>> 	struct xen_callback_state xcs = {
>>> 		.work = __WORK_INITIALIZER(xcs.work, xen_callback_f);
>>> 		.done = COMPLETION_INITIALIZER_ONSTACK(xcs.done),
> .func = func,
>>> 		.data = data,
>>> 	};
>>>
>>> 	queue_work_on(&work, cpu);
>>> 	wait_for_completion(&xcs.done);
>>>
>>> 	return xcs.ret;
>>> }
>>>
>>> No mucking about with the scheduler state, no new exported functions
>>> etc..
>>>
>>
>> Hey, I like it. Can't be limited to Xen as on bare metal the function
>> needs to be called on cpu 0, too. But avoiding the scheduler fiddling
>> is much better! As this seems to be required for Dell hardware only,
>> I could add it to some Dell base driver in case you don't want to add
>> it to core code.
> 
> Urgh yeah, saw that in your other mail. It looks like I should go look
> at set_cpus_allowed_ptr() abuse :/
> 
> Not sure where this would fit best, maybe somewhere near workqueue.c or
> smp.c.

At a first glance I think smp.c would be the better choice. I'll have
a try.


Thanks,

Juergen
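For reference, a compilable consolidation of the sketch above with the fixes
from this subthread folded in (the missing func member and .func initializer),
plus the remaining mechanical repairs it still needs: the
xen_callback_state/xen_callback_struct name mismatch, the stray semicolon in
the designated-initializer list, a return type, and queue_work_on()'s actual
(cpu, wq, work) argument order. The call_on_cpu_* names and the use of
system_wq are assumptions of this sketch, not anything from the thread.

#include <linux/kernel.h>
#include <linux/workqueue.h>
#include <linux/completion.h>

struct call_on_cpu_struct {
	struct work_struct	work;
	struct completion	done;
	int			(*func)(void *);
	void			*data;
	int			ret;
};

static void call_on_cpu_f(struct work_struct *work)
{
	struct call_on_cpu_struct *cs =
		container_of(work, struct call_on_cpu_struct, work);

	/* Runs in a worker bound to the CPU requested below. */
	cs->ret = cs->func(cs->data);
	complete(&cs->done);
}

static int call_on_cpu_sync(int cpu, int (*func)(void *), void *data)
{
	struct call_on_cpu_struct cs = {
		.work = __WORK_INITIALIZER(cs.work, call_on_cpu_f),
		.done = COMPLETION_INITIALIZER_ONSTACK(cs.done),
		.func = func,
		.data = data,
	};

	/* queue_work_on() takes (cpu, wq, work); system_wq assumed here. */
	queue_work_on(cpu, system_wq, &cs.work);
	wait_for_completion(&cs.done);

	return cs.ret;
}

A stricter version would use INIT_WORK_ONSTACK()/destroy_work_on_stack() for
the on-stack work item; the pattern is essentially what work_on_cpu() in
kernel/workqueue.c already implements, and a variant of it eventually landed
upstream as smp_call_on_cpu() in kernel/smp.c.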
diff --git a/include/linux/sched.h b/include/linux/sched.h
index a10494a..dfadf1a 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2247,6 +2247,7 @@ extern void do_set_cpus_allowed(struct task_struct *p,
 
 extern int set_cpus_allowed_ptr(struct task_struct *p,
 				const struct cpumask *new_mask);
+int call_sync_on_phys_cpu(unsigned cpu, int (*func)(void *), void *par);
 #else
 static inline void do_set_cpus_allowed(struct task_struct *p,
 				      const struct cpumask *new_mask)
@@ -2259,6 +2260,14 @@ static inline int set_cpus_allowed_ptr(struct task_struct *p,
 		return -EINVAL;
 	return 0;
 }
+static inline int call_sync_on_phys_cpu(unsigned cpu, int (*func)(void *),
+					void *par)
+{
+	if (cpu != 0)
+		return -EINVAL;
+
+	return func(par);
+}
 #endif
 
 #ifdef CONFIG_NO_HZ_COMMON
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 41f6b22..cb9955f 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1265,6 +1265,32 @@ int set_cpus_allowed_ptr(struct task_struct *p, const struct cpumask *new_mask)
 }
 EXPORT_SYMBOL_GPL(set_cpus_allowed_ptr);
 
+int call_sync_on_phys_cpu(unsigned cpu, int (*func)(void *), void *par)
+{
+	cpumask_var_t old_mask;
+	int ret;
+
+	if (cpu >= nr_cpu_ids)
+		return -EINVAL;
+
+	if (!alloc_cpumask_var(&old_mask, GFP_KERNEL))
+		return -ENOMEM;
+
+	cpumask_copy(old_mask, &current->cpus_allowed);
+	ret = set_cpus_allowed_ptr(current, cpumask_of(cpu));
+	if (ret)
+		goto out;
+
+	ret = func(par);
+
+	set_cpus_allowed_ptr(current, old_mask);
+
+out:
+	free_cpumask_var(old_mask);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(call_sync_on_phys_cpu);
+
 void set_task_cpu(struct task_struct *p, unsigned int new_cpu)
 {
 #ifdef CONFIG_SCHED_DEBUG
On some hardware models (e.g. Dell Studio 1555 laptop) some hardware
related functions (e.g. SMIs) are to be executed on physical cpu 0
only. Instead of open coding such a functionality multiple times in
the kernel add a service function for this purpose. This will enable
the possibility to take special measures in virtualized environments
like Xen, too.

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 include/linux/sched.h | 9 +++++++++
 kernel/sched/core.c   | 26 ++++++++++++++++++++++++++
 2 files changed, 35 insertions(+)
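As a usage illustration, a hypothetical caller in the dcdbas/i8k style the
thread refers to; struct smi_cmd and generate_smi() are invented stand-ins
for the Dell driver's real interface, and only call_sync_on_phys_cpu()
comes from the patch.

#include <linux/types.h>
#include <linux/io.h>

/* Hypothetical stand-in for the driver's SMI command descriptor. */
struct smi_cmd {
	u16	command_address;
	u8	command_code;
};

/* Executed with the calling task pinned to the requested CPU. */
static int generate_smi(void *par)
{
	struct smi_cmd *cmd = par;

	outb(cmd->command_code, cmd->command_address);
	return 0;
}

static int dell_do_smi(struct smi_cmd *cmd)
{
	/* The SMI has to be triggered from (physical) cpu 0. */
	return call_sync_on_phys_cpu(0, generate_smi, cmd);
}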