| Message ID | 20230306200849.376804-4-longman@redhat.com (mailing list archive) |
|---|---|
| State | New |
| Series | cgroup/cpuset: Miscellaneous updates |
Hello.

On Mon, Mar 06, 2023 at 03:08:47PM -0500, Waiman Long <longman@redhat.com> wrote:
> On a system with asymmetric CPUs, a restricted task is one that can run
> only a selected subset of available CPUs. When a CPU goes offline or
> when "cpuset.cpus" is changed, it is possible that a restricted task
> may not have any runnable CPUs left in the current cpuset even if there
> are still some CPUs in effective_cpus. In this case, the restricted task
> cannot be run at all.
>
> There are several ways we may be able to handle this situation. Treating
> it like empty effective_cpus is probably too disruptive and is unfair to
> the normal tasks. So it is better to have some special handling for these
> restricted tasks. One possibility is to move the restricted tasks up the
> cpuset hierarchy, but it is tricky to do it right. Another solution is
> to assign other usable CPUs to these tasks. This patch implements the
> latter alternative by finding one usable CPU by walking up the cpuset
> hierarchy and printing an informational message to let the users know
> that these restricted tasks are running in a cpuset with no usable CPU.
>
> Signed-off-by: Waiman Long <longman@redhat.com>
> ---
>  kernel/cgroup/cpuset.c | 56 +++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 55 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
> index bbf57dcb2f68..aa8225daf1d3 100644
> --- a/kernel/cgroup/cpuset.c
> +++ b/kernel/cgroup/cpuset.c
> @@ -1202,6 +1202,38 @@ void rebuild_sched_domains(void)
>          cpus_read_unlock();
>  }
>
> [...]
> /**
>  * update_tasks_cpumask - Update the cpumasks of tasks in the cpuset.
>  * @cs: the cpuset in which each task's cpus_allowed mask needs to be changed
> @@ -1218,6 +1250,7 @@ static void update_tasks_cpumask(struct cpuset *cs, struct cpumask *new_cpus)
>          struct task_struct *task;
>          bool top_cs = cs == &top_cpuset;
>
> +        percpu_rwsem_assert_held(&cpuset_rwsem);
>          css_task_iter_start(&cs->css, 0, &it);
>          while ((task = css_task_iter_next(&it))) {
>                  const struct cpumask *possible_mask = task_cpu_possible_mask(task);
> @@ -1232,7 +1265,28 @@ static void update_tasks_cpumask(struct cpuset *cs, struct cpumask *new_cpus)
>                  } else {
>                          cpumask_and(new_cpus, cs->effective_cpus, possible_mask);
>                  }
> -                set_cpus_allowed_ptr(task, new_cpus);
> +                /*
> +                 * On systems with asymmetric CPUs, it is possible that
> +                 * cpumask will become empty or set_cpus_allowed_ptr() will
> +                 * return an error even if we still have CPUs in
> +                 * effective_cpus. In this case, we find a usable CPU walking
> +                 * up the cpuset hierarchy and use that for this particular
> +                 * task with an informational message about the change in the
> +                 * hope that the users will adjust "cpuset.cpus" accordingly.
> +                 */
> +                if (cpumask_empty(new_cpus) ||
> +                    set_cpus_allowed_ptr(task, new_cpus)) {

IIUC, cpumask_empty(new_cpus) here implies cpumask_empty(cs->effective_cpus),
but that shouldn't happen (cs should inherit a non-empty mask from an
ancestor). Do I miss/forget anything?

This thus covers the case when p->user_cpus_ptr is incompatible with
hotplug or cpuset.cpus allowance and a different affinity must be chosen.
But doesn't that mean that the task would run _out_ of cs->effective_cpus?
I guess that's unavoidable on asymmetric CPU archs, but not on SMPs.
Shouldn't the solution distinguish between the two? (I.e. never run out
of effective_cpus on SMP.)

Thanks,
Michal
On 3/14/23 14:17, Michal Koutný wrote:
> Hello.
>
> On Mon, Mar 06, 2023 at 03:08:47PM -0500, Waiman Long <longman@redhat.com> wrote:
>> On a system with asymmetric CPUs, a restricted task is one that can run
>> only a selected subset of available CPUs. When a CPU goes offline or
>> when "cpuset.cpus" is changed, it is possible that a restricted task
>> may not have any runnable CPUs left in the current cpuset even if there
>> are still some CPUs in effective_cpus. In this case, the restricted task
>> cannot be run at all.
>>
>> There are several ways we may be able to handle this situation. Treating
>> it like empty effective_cpus is probably too disruptive and is unfair to
>> the normal tasks. So it is better to have some special handling for these
>> restricted tasks. One possibility is to move the restricted tasks up the
>> cpuset hierarchy, but it is tricky to do it right. Another solution is
>> to assign other usable CPUs to these tasks. This patch implements the
>> latter alternative by finding one usable CPU by walking up the cpuset
>> hierarchy and printing an informational message to let the users know
>> that these restricted tasks are running in a cpuset with no usable CPU.
>>
>> Signed-off-by: Waiman Long <longman@redhat.com>
>> ---
>>  kernel/cgroup/cpuset.c | 56 +++++++++++++++++++++++++++++++++++++++++-
>>  1 file changed, 55 insertions(+), 1 deletion(-)
>>
>> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
>> index bbf57dcb2f68..aa8225daf1d3 100644
>> --- a/kernel/cgroup/cpuset.c
>> +++ b/kernel/cgroup/cpuset.c
>> @@ -1202,6 +1202,38 @@ void rebuild_sched_domains(void)
>>          cpus_read_unlock();
>>  }
>>
>> [...]
>> /**
>>  * update_tasks_cpumask - Update the cpumasks of tasks in the cpuset.
>>  * @cs: the cpuset in which each task's cpus_allowed mask needs to be changed
>> @@ -1218,6 +1250,7 @@ static void update_tasks_cpumask(struct cpuset *cs, struct cpumask *new_cpus)
>>          struct task_struct *task;
>>          bool top_cs = cs == &top_cpuset;
>>
>> +        percpu_rwsem_assert_held(&cpuset_rwsem);
>>          css_task_iter_start(&cs->css, 0, &it);
>>          while ((task = css_task_iter_next(&it))) {
>>                  const struct cpumask *possible_mask = task_cpu_possible_mask(task);
>> @@ -1232,7 +1265,28 @@ static void update_tasks_cpumask(struct cpuset *cs, struct cpumask *new_cpus)
>>                  } else {
>>                          cpumask_and(new_cpus, cs->effective_cpus, possible_mask);
>>                  }
>> -                set_cpus_allowed_ptr(task, new_cpus);
>> +                /*
>> +                 * On systems with asymmetric CPUs, it is possible that
>> +                 * cpumask will become empty or set_cpus_allowed_ptr() will
>> +                 * return an error even if we still have CPUs in
>> +                 * effective_cpus. In this case, we find a usable CPU walking
>> +                 * up the cpuset hierarchy and use that for this particular
>> +                 * task with an informational message about the change in the
>> +                 * hope that the users will adjust "cpuset.cpus" accordingly.
>> +                 */
>> +                if (cpumask_empty(new_cpus) ||
>> +                    set_cpus_allowed_ptr(task, new_cpus)) {
> IIUC, cpumask_empty(new_cpus) here implies cpumask_empty(cs->effective_cpus),
> but that shouldn't happen (cs should inherit a non-empty mask from an
> ancestor). Do I miss/forget anything?
>
> This thus covers the case when p->user_cpus_ptr is incompatible with
> hotplug or cpuset.cpus allowance and a different affinity must be chosen.
> But doesn't that mean that the task would run _out_ of cs->effective_cpus?
> I guess that's unavoidable on asymmetric CPU archs, but not on SMPs.
> Shouldn't the solution distinguish between the two? (I.e. never run out
> of effective_cpus on SMP.)

Some arm64 systems can have asymmetric CPUs where certain tasks are only
runnable on a selected subset of CPUs. This information is not captured
in the cpuset. As a result, task_cpu_possible_mask() may return a mask
that has no overlap with effective_cpus, causing new_cpus to become empty.

It took me a while to understand the implication of that. I will
elaborate this point a bit more in the patch.

Cheers,
Longman
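To make that scenario concrete: on an asymmetric arm64 system, task_cpu_possible_mask() for a 32-bit task covers only the 32-bit-capable CPUs (say CPUs 0-3), so a cpuset whose effective_cpus is, say, CPUs 4-7 leaves that task with an empty intersection even though effective_cpus itself is non-empty. The sketch below merely restates the check that update_tasks_cpumask() performs; the helper name and the CPU numbers are illustrative assumptions, not part of the patch, and the code assumes the kernel-internal context of kernel/cgroup/cpuset.c.

```c
/*
 * Illustrative sketch only -- not part of the patch. Assumes the
 * in-file context of kernel/cgroup/cpuset.c (struct cpuset is local
 * to that file). Shows how the intersection computed by
 * update_tasks_cpumask() can come out empty even though
 * cs->effective_cpus is non-empty.
 */
static bool task_has_no_runnable_cpu(struct task_struct *task,
				     struct cpuset *cs,
				     struct cpumask *scratch)
{
	/* e.g. CPUs 0-3 for a 32-bit task on an asymmetric arm64 system */
	const struct cpumask *possible = task_cpu_possible_mask(task);

	/*
	 * cpumask_and() returns false when the resulting mask is empty,
	 * i.e. when effective_cpus (say CPUs 4-7) and the task's possible
	 * mask do not overlap.
	 */
	return !cpumask_and(scratch, cs->effective_cpus, possible);
}
```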
On Tue, Mar 14, 2023 at 04:22:06PM -0400, Waiman Long <longman@redhat.com> wrote:
> Some arm64 systems can have asymmetric CPUs where certain tasks are only
> runnable on a selected subset of CPUs.

Ah, I'm catching up.

> This information is not captured in the cpuset. As a result,
> task_cpu_possible_mask() may return a mask that has no overlap with
> effective_cpus, causing new_cpus to become empty.

I can see that historically, there was an approach of terminating tasks
that cannot be accommodated:
94f9c00f6460 ("arm64: Remove logic to kill 32-bit tasks on 64-bit-only cores")
the removal of killing had been made possible with
df950811f4a8 ("arm64: Prevent offlining first CPU with 32-bit EL0 on mismatched system").

That gives two other alternatives to affinity modification:
2) kill such tasks (not unlike OOM upon memory.max reduction),
3) reject cpuset reduction (violates cgroup v2 delegation).

What do you think about 2)?

Michal
On 3/17/23 08:27, Michal Koutný wrote:
> On Tue, Mar 14, 2023 at 04:22:06PM -0400, Waiman Long <longman@redhat.com> wrote:
>> Some arm64 systems can have asymmetric CPUs where certain tasks are only
>> runnable on a selected subset of CPUs.
> Ah, I'm catching up.
>
>> This information is not captured in the cpuset. As a result,
>> task_cpu_possible_mask() may return a mask that has no overlap with
>> effective_cpus, causing new_cpus to become empty.
> I can see that historically, there was an approach of terminating tasks
> that cannot be accommodated:
> 94f9c00f6460 ("arm64: Remove logic to kill 32-bit tasks on 64-bit-only cores")
> the removal of killing had been made possible with
> df950811f4a8 ("arm64: Prevent offlining first CPU with 32-bit EL0 on mismatched system").
>
> That gives two other alternatives to affinity modification:
> 2) kill such tasks (not unlike OOM upon memory.max reduction),
> 3) reject cpuset reduction (violates cgroup v2 delegation).
>
> What do you think about 2)?

Yes, killing it is one possible solution.

(3) doesn't work if the affinity change is due to hot cpu removal. So that
leaves this patch or (2) as the only alternatives. I would like to hear
what Will and Tejun think about it.

I am going to remove this patch from the series for the time being.

Thanks,
Longman
On Fri, Mar 17, 2023 at 10:59:26AM -0400, Waiman Long wrote:
> On 3/17/23 08:27, Michal Koutný wrote:
> > On Tue, Mar 14, 2023 at 04:22:06PM -0400, Waiman Long <longman@redhat.com> wrote:
> > > Some arm64 systems can have asymmetric CPUs where certain tasks are only
> > > runnable on a selected subset of CPUs.
> > Ah, I'm catching up.
> >
> > > This information is not captured in the cpuset. As a result,
> > > task_cpu_possible_mask() may return a mask that has no overlap with
> > > effective_cpus, causing new_cpus to become empty.
> > I can see that historically, there was an approach of terminating tasks
> > that cannot be accommodated:
> > 94f9c00f6460 ("arm64: Remove logic to kill 32-bit tasks on 64-bit-only cores")
> > the removal of killing had been made possible with
> > df950811f4a8 ("arm64: Prevent offlining first CPU with 32-bit EL0 on mismatched system").
> >
> > That gives two other alternatives to affinity modification:
> > 2) kill such tasks (not unlike OOM upon memory.max reduction),
> > 3) reject cpuset reduction (violates cgroup v2 delegation).
> >
> > What do you think about 2)?
>
> Yes, killing it is one possible solution.
>
> (3) doesn't work if the affinity change is due to hot cpu removal. So that
> leaves this patch or (2) as the only alternatives. I would like to hear
> what Will and Tejun think about it.

The main constraint from the Android side (the lucky ecosystem where these
SoCs tend to show up) is that existing userspace (including 32-bit binaries)
continues to function without modification. So approaches such as killing
tasks or rejecting system calls tend not to work as well, since you
inevitably get divergent behaviour leading to functional breakage rather
than e.g. performance anomalies.

Having said that, the behaviour we currently have in mainline seems to
be alright, so please don't go out of your way to accommodate these SoCs.
I'm mainly just concerned about introducing any regressions, which is why
I ran my tests on this series.

Cheers,

Will
On 3/24/23 10:32, Will Deacon wrote:
> On Fri, Mar 17, 2023 at 10:59:26AM -0400, Waiman Long wrote:
>> On 3/17/23 08:27, Michal Koutný wrote:
>>> On Tue, Mar 14, 2023 at 04:22:06PM -0400, Waiman Long <longman@redhat.com> wrote:
>>>> Some arm64 systems can have asymmetric CPUs where certain tasks are only
>>>> runnable on a selected subset of CPUs.
>>> Ah, I'm catching up.
>>>
>>>> This information is not captured in the cpuset. As a result,
>>>> task_cpu_possible_mask() may return a mask that has no overlap with
>>>> effective_cpus, causing new_cpus to become empty.
>>> I can see that historically, there was an approach of terminating tasks
>>> that cannot be accommodated:
>>> 94f9c00f6460 ("arm64: Remove logic to kill 32-bit tasks on 64-bit-only cores")
>>> the removal of killing had been made possible with
>>> df950811f4a8 ("arm64: Prevent offlining first CPU with 32-bit EL0 on mismatched system").
>>>
>>> That gives two other alternatives to affinity modification:
>>> 2) kill such tasks (not unlike OOM upon memory.max reduction),
>>> 3) reject cpuset reduction (violates cgroup v2 delegation).
>>>
>>> What do you think about 2)?
>> Yes, killing it is one possible solution.
>>
>> (3) doesn't work if the affinity change is due to hot cpu removal. So that
>> leaves this patch or (2) as the only alternatives. I would like to hear
>> what Will and Tejun think about it.
> The main constraint from the Android side (the lucky ecosystem where these
> SoCs tend to show up) is that existing userspace (including 32-bit binaries)
> continues to function without modification. So approaches such as killing
> tasks or rejecting system calls tend not to work as well, since you
> inevitably get divergent behaviour leading to functional breakage rather
> than e.g. performance anomalies.
>
> Having said that, the behaviour we currently have in mainline seems to
> be alright, so please don't go out of your way to accommodate these SoCs.
> I'm mainly just concerned about introducing any regressions, which is why
> I ran my tests on this series.

I agree that killing it may be too draconian. I am withholding this patch
for now.

Thanks,
Longman
On Fri, Mar 24, 2023 at 02:32:50PM +0000, Will Deacon <will@kernel.org> wrote:
> So approaches such as killing tasks or rejecting system calls tend not
> to work as well, since you inevitably get divergent behaviour leading
> to functional breakage rather than e.g. performance anomalies.

What about a temporary performance drop from 100% to 0%, aka freezing
the tasks for the duration of the mismatching affinity config?

> Having said that, the behaviour we currently have in mainline seems to
> be alright, so please don't go out of your way to accommodate these SoCs.

I see. (Just wondering what you think about the fourth option above.)

Thanks,
Michal
On 3/24/23 14:19, Michal Koutný wrote:
> On Fri, Mar 24, 2023 at 02:32:50PM +0000, Will Deacon <will@kernel.org> wrote:
>> So approaches such as killing tasks or rejecting system calls tend not
>> to work as well, since you inevitably get divergent behaviour leading
>> to functional breakage rather than e.g. performance anomalies.
> What about a temporary performance drop from 100% to 0%, aka freezing
> the tasks for the duration of the mismatching affinity config?

Freezing the tasks can be a lot of extra work. I would prefer something
simpler. Without this patch, I believe it will lead to a cpumask of 0,
which will cause the scheduler to pick a fallback cpu. It looks like the
fallback code may be able to pick the right cpu, or it may panic the
system (less likely).

Cheers,
Longman

>> Having said that, the behaviour we currently have in mainline seems to
>> be alright, so please don't go out of your way to accommodate these SoCs.
> I see. (Just wondering what you think about the fourth option above.)
>
> Thanks,
> Michal
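For context on the fallback behaviour Longman refers to above, the general idea behind the scheduler's fallback path (select_fallback_rq() in kernel/sched/core.c) can be pictured roughly as follows. This is only a simplified sketch, not the kernel's actual code: the function name is made up, the real code also has to consult cpuset state, handles more corner cases, and typically logs a message when it has to break a task's affinity.

```c
/*
 * Simplified sketch (not the kernel's actual fallback code): when a
 * task's affinity mask contains no usable CPU, widen the search step
 * by step until an online CPU is found.
 */
static int fallback_cpu_sketch(struct task_struct *p)
{
	int cpu;

	/* 1. Prefer an online CPU that is still in the task's cpumask. */
	for_each_cpu(cpu, p->cpus_ptr)
		if (cpu_online(cpu))
			return cpu;

	/* 2. Otherwise, any online CPU the task could possibly run on. */
	for_each_cpu(cpu, task_cpu_possible_mask(p))
		if (cpu_online(cpu))
			return cpu;

	/* 3. Last resort: pick any online CPU at all. */
	return cpumask_any(cpu_online_mask);
}
```

Seen this way, the patch's find_usable_cpu() mainly refines step 2: it prefers CPUs from ancestor cpusets before falling back to the task's full possible mask.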
```diff
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index bbf57dcb2f68..aa8225daf1d3 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -1202,6 +1202,38 @@ void rebuild_sched_domains(void)
         cpus_read_unlock();
 }
 
+/*
+ * Find a usable effective (online) CPU up the cpuset hierarchy and return it.
+ */
+static int find_usable_cpu(struct cpuset *cs, struct cpumask *new_cpus,
+                           const struct cpumask *possible_mask)
+{
+        struct cpuset *parent;
+        unsigned long flags;
+        int cpu;
+
+        /*
+         * When offlining cpu, some effective_cpus may not be up to date.
+         * So check cpu_online_mask to be sure.
+         */
+        parent = parent_cs(cs);
+        while (parent &&
+               (!cpumask_and(new_cpus, parent->effective_cpus, possible_mask) ||
+                !cpumask_and(new_cpus, new_cpus, cpu_online_mask)))
+                parent = parent_cs(parent);
+
+        /* Fall back to all possible online cpus, if necessary */
+        if (!parent)
+                cpumask_and(new_cpus, possible_mask, cpu_online_mask);
+
+        /* cpumask_any_distribute() has to be called with preemption disabled */
+        local_irq_save(flags);
+        cpu = cpumask_any_distribute(new_cpus);
+        local_irq_restore(flags);
+
+        return cpu;
+}
+
 /**
  * update_tasks_cpumask - Update the cpumasks of tasks in the cpuset.
  * @cs: the cpuset in which each task's cpus_allowed mask needs to be changed
@@ -1218,6 +1250,7 @@ static void update_tasks_cpumask(struct cpuset *cs, struct cpumask *new_cpus)
         struct task_struct *task;
         bool top_cs = cs == &top_cpuset;
 
+        percpu_rwsem_assert_held(&cpuset_rwsem);
         css_task_iter_start(&cs->css, 0, &it);
         while ((task = css_task_iter_next(&it))) {
                 const struct cpumask *possible_mask = task_cpu_possible_mask(task);
@@ -1232,7 +1265,28 @@ static void update_tasks_cpumask(struct cpuset *cs, struct cpumask *new_cpus)
                 } else {
                         cpumask_and(new_cpus, cs->effective_cpus, possible_mask);
                 }
-                set_cpus_allowed_ptr(task, new_cpus);
+                /*
+                 * On systems with asymmetric CPUs, it is possible that
+                 * cpumask will become empty or set_cpus_allowed_ptr() will
+                 * return an error even if we still have CPUs in
+                 * effective_cpus. In this case, we find a usable CPU walking
+                 * up the cpuset hierarchy and use that for this particular
+                 * task with an informational message about the change in the
+                 * hope that the users will adjust "cpuset.cpus" accordingly.
+                 */
+                if (cpumask_empty(new_cpus) ||
+                    set_cpus_allowed_ptr(task, new_cpus)) {
+                        char name[80];
+                        int cpu;
+
+                        cpu = find_usable_cpu(cs, new_cpus, possible_mask);
+                        cpumask_clear(new_cpus);
+                        cpumask_set_cpu(cpu, new_cpus);
+                        WARN_ON_ONCE(set_cpus_allowed_ptr(task, new_cpus));
+                        cgroup_name(cs->css.cgroup, name, sizeof(name));
+                        pr_info("cpuset: Restricted task %s(%d) in cpuset %s is forced to run on outside CPU %d\n",
+                                task->comm, task->pid, name, cpu);
+                }
         }
         css_task_iter_end(&it);
 }
```
On a system with asymmetric CPUs, a restricted task is one that can run
only a selected subset of available CPUs. When a CPU goes offline or
when "cpuset.cpus" is changed, it is possible that a restricted task
may not have any runnable CPUs left in the current cpuset even if there
are still some CPUs in effective_cpus. In this case, the restricted task
cannot be run at all.

There are several ways we may be able to handle this situation. Treating
it like empty effective_cpus is probably too disruptive and is unfair to
the normal tasks. So it is better to have some special handling for these
restricted tasks. One possibility is to move the restricted tasks up the
cpuset hierarchy, but it is tricky to do it right. Another solution is
to assign other usable CPUs to these tasks. This patch implements the
latter alternative by finding one usable CPU by walking up the cpuset
hierarchy and printing an informational message to let the users know
that these restricted tasks are running in a cpuset with no usable CPU.

Signed-off-by: Waiman Long <longman@redhat.com>
---
 kernel/cgroup/cpuset.c | 56 +++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 55 insertions(+), 1 deletion(-)