Message ID | 20220419020958.40419-1-feng.tang@intel.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | [RFC] cgroup/cpuset: fix a memory binding failure for cgroup v2 | expand |
cc'ing Waiman and copying the whole body. Waiman, can you please take a look? Thanks. On Tue, Apr 19, 2022 at 10:09:58AM +0800, Feng Tang wrote: > We got report that setting cpuset.mems failed when the nodemask > contains a newly onlined memory node (not enumerated during boot) > for cgroup v2, while the binding succeeded for cgroup v1. > > The root cause is, for cgroup v2, when a new memory node is onlined, > top_cpuset's 'mem_allowed' is not updated with the new nodemask of > memory nodes, and the following setting memory nodemask will fail, > if the nodemask contains a new node. > > Fix it by updating top_cpuset.mems_allowed right after the > new memory node is onlined, just like v1. > > Signed-off-by: Feng Tang <feng.tang@intel.com> > --- > Very likely I missed some details here, but it looks strange that > the top_cpuset.mem_allowed is not updatd even after we onlined > several memory nodes after boot. > > kernel/cgroup/cpuset.c | 3 +-- > 1 file changed, 1 insertion(+), 2 deletions(-) > > diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c > index 9390bfd9f1cd..b97caaf16374 100644 > --- a/kernel/cgroup/cpuset.c > +++ b/kernel/cgroup/cpuset.c > @@ -3314,8 +3314,7 @@ static void cpuset_hotplug_workfn(struct work_struct *work) > /* synchronize mems_allowed to N_MEMORY */ > if (mems_updated) { > spin_lock_irq(&callback_lock); > - if (!on_dfl) > - top_cpuset.mems_allowed = new_mems; > + top_cpuset.mems_allowed = new_mems; > top_cpuset.effective_mems = new_mems; > spin_unlock_irq(&callback_lock); > update_tasks_nodemask(&top_cpuset); > -- > 2.27.0 >
On 4/21/22 18:22, Tejun Heo wrote: > cc'ing Waiman and copying the whole body. > > Waiman, can you please take a look? > > Thanks. > > On Tue, Apr 19, 2022 at 10:09:58AM +0800, Feng Tang wrote: >> We got report that setting cpuset.mems failed when the nodemask >> contains a newly onlined memory node (not enumerated during boot) >> for cgroup v2, while the binding succeeded for cgroup v1. >> >> The root cause is, for cgroup v2, when a new memory node is onlined, >> top_cpuset's 'mem_allowed' is not updated with the new nodemask of >> memory nodes, and the following setting memory nodemask will fail, >> if the nodemask contains a new node. >> >> Fix it by updating top_cpuset.mems_allowed right after the >> new memory node is onlined, just like v1. >> >> Signed-off-by: Feng Tang <feng.tang@intel.com> >> --- >> Very likely I missed some details here, but it looks strange that >> the top_cpuset.mem_allowed is not updatd even after we onlined >> several memory nodes after boot. >> >> kernel/cgroup/cpuset.c | 3 +-- >> 1 file changed, 1 insertion(+), 2 deletions(-) >> >> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c >> index 9390bfd9f1cd..b97caaf16374 100644 >> --- a/kernel/cgroup/cpuset.c >> +++ b/kernel/cgroup/cpuset.c >> @@ -3314,8 +3314,7 @@ static void cpuset_hotplug_workfn(struct work_struct *work) >> /* synchronize mems_allowed to N_MEMORY */ >> if (mems_updated) { >> spin_lock_irq(&callback_lock); >> - if (!on_dfl) >> - top_cpuset.mems_allowed = new_mems; >> + top_cpuset.mems_allowed = new_mems; >> top_cpuset.effective_mems = new_mems; >> spin_unlock_irq(&callback_lock); >> update_tasks_nodemask(&top_cpuset); The on_dfl check was added by commit 7e88291beefb ("cpuset: make cs->{cpus, mems}_allowed as user-configured masks"). This is the expected behavior for cgroup v2 as we don't want to remove a node because it is hot-removed. However, I do see a problem in case we are adding a node that is not originally in top_cpuset.mems_allowed. We should be allowed to add the extra memory node. So something like if (!on_dfl) top_cpuset.mems_allowed = new_mems; else if (!nodes_subset(new_mems, top_cpuset.mems_allowed)) nodes_or(top_cpuset.mems_allowed, top_cpuset.mems_allowed, new_mems); For v2, top_cpuset.mems_allowed is set to node_possible_map in cpuset_bind(). Perhaps node_possible_map may not include all the nodes that are hot-pluggable. I don't know if that is similar problem with cpu_possible_mask or not. Cheers, Longman
On 4/24/22 12:04, Waiman Long wrote: > On 4/21/22 18:22, Tejun Heo wrote: >> cc'ing Waiman and copying the whole body. >> >> Waiman, can you please take a look? >> >> Thanks. >> >> On Tue, Apr 19, 2022 at 10:09:58AM +0800, Feng Tang wrote: >>> We got report that setting cpuset.mems failed when the nodemask >>> contains a newly onlined memory node (not enumerated during boot) >>> for cgroup v2, while the binding succeeded for cgroup v1. >>> >>> The root cause is, for cgroup v2, when a new memory node is onlined, >>> top_cpuset's 'mem_allowed' is not updated with the new nodemask of >>> memory nodes, and the following setting memory nodemask will fail, >>> if the nodemask contains a new node. >>> >>> Fix it by updating top_cpuset.mems_allowed right after the >>> new memory node is onlined, just like v1. >>> >>> Signed-off-by: Feng Tang <feng.tang@intel.com> >>> --- >>> Very likely I missed some details here, but it looks strange that >>> the top_cpuset.mem_allowed is not updatd even after we onlined >>> several memory nodes after boot. >>> >>> kernel/cgroup/cpuset.c | 3 +-- >>> 1 file changed, 1 insertion(+), 2 deletions(-) >>> >>> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c >>> index 9390bfd9f1cd..b97caaf16374 100644 >>> --- a/kernel/cgroup/cpuset.c >>> +++ b/kernel/cgroup/cpuset.c >>> @@ -3314,8 +3314,7 @@ static void cpuset_hotplug_workfn(struct >>> work_struct *work) >>> /* synchronize mems_allowed to N_MEMORY */ >>> if (mems_updated) { >>> spin_lock_irq(&callback_lock); >>> - if (!on_dfl) >>> - top_cpuset.mems_allowed = new_mems; >>> + top_cpuset.mems_allowed = new_mems; >>> top_cpuset.effective_mems = new_mems; >>> spin_unlock_irq(&callback_lock); >>> update_tasks_nodemask(&top_cpuset); > The on_dfl check was added by commit 7e88291beefb ("cpuset: make > cs->{cpus, mems}_allowed as user-configured masks"). This is the > expected behavior for cgroup v2 as we don't want to remove a node > because it is hot-removed. However, I do see a problem in case we are > adding a node that is not originally in top_cpuset.mems_allowed. We > should be allowed to add the extra memory node. So something like > > if (!on_dfl) > top_cpuset.mems_allowed = new_mems; > else if (!nodes_subset(new_mems, top_cpuset.mems_allowed)) > nodes_or(top_cpuset.mems_allowed, > top_cpuset.mems_allowed, new_mems); > > For v2, top_cpuset.mems_allowed is set to node_possible_map in > cpuset_bind(). Perhaps node_possible_map may not include all the nodes > that are hot-pluggable. > > I don't know if that is similar problem with cpu_possible_mask or not. Ah, I know why the top_cpuset.mems_allowed isn't correctly set. There are 2 places where it is set: - cpuset_bind(): top_cpuset.mems_allowed = node_possible_map - cpuset_init_smp(): top_cpuset.mems_allowed = node_states[N_MEMORY]; The first one is correct, but the second isn't. It turns out that cpuset_init_smp() can be called later than cpuset_bind(). [ 2.611207] cpuset_bind called [ 2.631182] cblist_init_generic: Setting adjustable number of callback queues. [ 3.082357] cpuset_init_smp called So the proper fix may be to make sure that top_cpuset.mems_allowed is initialized correctly. Cheers, Longman
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c index 9390bfd9f1cd..b97caaf16374 100644 --- a/kernel/cgroup/cpuset.c +++ b/kernel/cgroup/cpuset.c @@ -3314,8 +3314,7 @@ static void cpuset_hotplug_workfn(struct work_struct *work) /* synchronize mems_allowed to N_MEMORY */ if (mems_updated) { spin_lock_irq(&callback_lock); - if (!on_dfl) - top_cpuset.mems_allowed = new_mems; + top_cpuset.mems_allowed = new_mems; top_cpuset.effective_mems = new_mems; spin_unlock_irq(&callback_lock); update_tasks_nodemask(&top_cpuset);
We got report that setting cpuset.mems failed when the nodemask contains a newly onlined memory node (not enumerated during boot) for cgroup v2, while the binding succeeded for cgroup v1. The root cause is, for cgroup v2, when a new memory node is onlined, top_cpuset's 'mem_allowed' is not updated with the new nodemask of memory nodes, and the following setting memory nodemask will fail, if the nodemask contains a new node. Fix it by updating top_cpuset.mems_allowed right after the new memory node is onlined, just like v1. Signed-off-by: Feng Tang <feng.tang@intel.com> --- Very likely I missed some details here, but it looks strange that the top_cpuset.mem_allowed is not updatd even after we onlined several memory nodes after boot. kernel/cgroup/cpuset.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-)