Message ID | 20220905033852.18988-1-kernelfans@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Series | [1/3] rcu: remove redundant cpu affinity setting during teardown |
On Mon, Sep 05, 2022 at 11:38:50AM +0800, Pingfan Liu wrote:
> At present, during the cpu teardown, rcu_boost_kthread_setaffinity() is
> called twice. Firstly by rcutree_offline_cpu(), then by
> rcutree_dead_cpu() as the CPUHP_RCUTREE_PREP cpuhp_step callback.
>
> From the scheduler's perspective, a bit in cpu_online_mask means that the cpu
> is visible to the scheduler. Furthermore, a bit in cpu_active_mask
> means that the cpu is suitable as a migration destination.
>
> Now turning back to the case in rcu offlining. sched_cpu_deactivate()
> has disabled the dying cpu as the migration destination before
> rcutree_offline_cpu(). Furthermore, if the boost kthread is on the dying
> cpu, it will be migrated to another suitable online cpu by the scheduler.
> So the affinity setting by rcutree_offline_cpu() is redundant and can be
> eliminated.
>
> Besides, this patch does a trivial code rearrangement by unfolding
> rcutree_affinity_setting() into rcutree_online_cpu(), considering that
> the latter one is the only user of the former.
>
> Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
> Cc: "Paul E. McKenney" <paulmck@kernel.org>
> Cc: David Woodhouse <dwmw@amazon.co.uk>
> Cc: Frederic Weisbecker <frederic@kernel.org>
> Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
> Cc: Josh Triplett <josh@joshtriplett.org>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> Cc: Lai Jiangshan <jiangshanlai@gmail.com>
> Cc: Joel Fernandes <joel@joelfernandes.org>
> Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
> To: rcu@vger.kernel.org
> ---
>  kernel/rcu/tree.c | 14 +-------------
>  1 file changed, 1 insertion(+), 13 deletions(-)
>
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 79aea7df4345..b90f6487fd45 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -3978,16 +3978,6 @@ int rcutree_prepare_cpu(unsigned int cpu)
>  	return 0;
>  }
>
> -/*
> - * Update RCU priority boot kthread affinity for CPU-hotplug changes.
> - */
> -static void rcutree_affinity_setting(unsigned int cpu, int outgoing)
> -{
> -	struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
> -
> -	rcu_boost_kthread_setaffinity(rdp->mynode, outgoing);
> -}

Good point, given how simple a wrapper this is and how little it is used,
getting rid of it does sound like a reasonable idea.

> /*
>  * Near the end of the CPU-online process. Pretty much all services
>  * enabled, and the CPU is now very much alive.
> @@ -4006,7 +3996,7 @@ int rcutree_online_cpu(unsigned int cpu)
>  	if (rcu_scheduler_active == RCU_SCHEDULER_INACTIVE)
>  		return 0; /* Too early in boot for scheduler work. */
>  	sync_sched_exp_online_cleanup(cpu);
> -	rcutree_affinity_setting(cpu, -1);
> +	rcu_boost_kthread_setaffinity(rdp->mynode, -1);
>
>  	// Stop-machine done, so allow nohz_full to disable tick.
>  	tick_dep_clear(TICK_DEP_BIT_RCU);
> @@ -4029,8 +4019,6 @@ int rcutree_offline_cpu(unsigned int cpu)
>  	rnp->ffmask &= ~rdp->grpmask;
>  	raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
>
> -	rcutree_affinity_setting(cpu, cpu);

We do need to keep this one because the CPU is going away.

On the other hand, it might well be that we could get rid of the call
to rcutree_affinity_setting() in rcutree_dead_cpu().

Or am I missing something subtle here?

							Thanx, Paul

> -
>  	// nohz_full CPUs need the tick for stop-machine to work quickly
>  	tick_dep_set(TICK_DEP_BIT_RCU);
>  	return 0;
> --
> 2.31.1
>
On Tue, Sep 06, 2022 at 10:24:41AM -0700, Paul E. McKenney wrote:
> On Mon, Sep 05, 2022 at 11:38:50AM +0800, Pingfan Liu wrote:
> > At present, during the cpu teardown, rcu_boost_kthread_setaffinity() is
> > called twice. Firstly by rcutree_offline_cpu(), then by
> > rcutree_dead_cpu() as the CPUHP_RCUTREE_PREP cpuhp_step callback.
> >
> > From the scheduler's perspective, a bit in cpu_online_mask means that the cpu
> > is visible to the scheduler. Furthermore, a bit in cpu_active_mask
> > means that the cpu is suitable as a migration destination.
> >
> > Now turning back to the case in rcu offlining. sched_cpu_deactivate()
> > has disabled the dying cpu as the migration destination before
> > rcutree_offline_cpu(). Furthermore, if the boost kthread is on the dying
> > cpu, it will be migrated to another suitable online cpu by the scheduler.
> > So the affinity setting by rcutree_offline_cpu() is redundant and can be
> > eliminated.
> >
> > Besides, this patch does a trivial code rearrangement by unfolding
> > rcutree_affinity_setting() into rcutree_online_cpu(), considering that
> > the latter one is the only user of the former.
> >
> > Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
> > Cc: "Paul E. McKenney" <paulmck@kernel.org>
> > Cc: David Woodhouse <dwmw@amazon.co.uk>
> > Cc: Frederic Weisbecker <frederic@kernel.org>
> > Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
> > Cc: Josh Triplett <josh@joshtriplett.org>
> > Cc: Steven Rostedt <rostedt@goodmis.org>
> > Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> > Cc: Lai Jiangshan <jiangshanlai@gmail.com>
> > Cc: Joel Fernandes <joel@joelfernandes.org>
> > Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
> > To: rcu@vger.kernel.org
> > ---
> >  kernel/rcu/tree.c | 14 +-------------
> >  1 file changed, 1 insertion(+), 13 deletions(-)
> >
> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index 79aea7df4345..b90f6487fd45 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -3978,16 +3978,6 @@ int rcutree_prepare_cpu(unsigned int cpu)
> >  	return 0;
> >  }
> >
> > -/*
> > - * Update RCU priority boot kthread affinity for CPU-hotplug changes.
> > - */
> > -static void rcutree_affinity_setting(unsigned int cpu, int outgoing)
> > -{
> > -	struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
> > -
> > -	rcu_boost_kthread_setaffinity(rdp->mynode, outgoing);
> > -}
>
> Good point, given how simple a wrapper this is and how little it is used,
> getting rid of it does sound like a reasonable idea.
>
> > /*
> >  * Near the end of the CPU-online process. Pretty much all services
> >  * enabled, and the CPU is now very much alive.
> > @@ -4006,7 +3996,7 @@ int rcutree_online_cpu(unsigned int cpu)
> >  	if (rcu_scheduler_active == RCU_SCHEDULER_INACTIVE)
> >  		return 0; /* Too early in boot for scheduler work. */
> >  	sync_sched_exp_online_cleanup(cpu);
> > -	rcutree_affinity_setting(cpu, -1);
> > +	rcu_boost_kthread_setaffinity(rdp->mynode, -1);
> >
> >  	// Stop-machine done, so allow nohz_full to disable tick.
> >  	tick_dep_clear(TICK_DEP_BIT_RCU);
> > @@ -4029,8 +4019,6 @@ int rcutree_offline_cpu(unsigned int cpu)
> >  	rnp->ffmask &= ~rdp->grpmask;
> >  	raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
> >
> > -	rcutree_affinity_setting(cpu, cpu);
>
> We do need to keep this one because the CPU is going away.
>
> On the other hand, it might well be that we could get rid of the call
> to rcutree_affinity_setting() in rcutree_dead_cpu().
>
> Or am I missing something subtle here?

Oops, I think I need to rephrase my commit log to describe this nuance.
The key point is whether ->qsmaskinitnext is stable.

The teardown code path on a single dying cpu looks like:

sched_cpu_deactivate() // prevent this dying cpu as a migration dst.

rcutree_offline_cpu()  // as a result, the scheduler core will take care
                       // of the transient affinity mismatching until
                       // rcutree_dead_cpu(). (I think it also stands in
                       // the concurrent offlining)

rcu_report_dead()      // running on the dying cpu, and clear its bit in ->qsmaskinitnext

rcutree_dead_cpu()     // running on the initiator (an initiator cpu will
                       // execute this function for each dying cpu)
                       // At this point, ->qsmaskinitnext reflects the
                       // offlining, and the affinity can get right.

Sorry that my commit log had emphasized the first part, but forgot to
mention the ->qsmaskinitnext.


Does this justification stand?

Thanks,

	Pingfan
On Wed, Sep 07, 2022 at 09:40:29AM +0800, Pingfan Liu wrote:
> On Tue, Sep 06, 2022 at 10:24:41AM -0700, Paul E. McKenney wrote:
> > On Mon, Sep 05, 2022 at 11:38:50AM +0800, Pingfan Liu wrote:
> > > At present, during the cpu teardown, rcu_boost_kthread_setaffinity() is
> > > called twice. Firstly by rcutree_offline_cpu(), then by
> > > rcutree_dead_cpu() as the CPUHP_RCUTREE_PREP cpuhp_step callback.
> > >
> > > From the scheduler's perspective, a bit in cpu_online_mask means that the cpu
> > > is visible to the scheduler. Furthermore, a bit in cpu_active_mask
> > > means that the cpu is suitable as a migration destination.
> > >
> > > Now turning back to the case in rcu offlining. sched_cpu_deactivate()
> > > has disabled the dying cpu as the migration destination before
> > > rcutree_offline_cpu(). Furthermore, if the boost kthread is on the dying
> > > cpu, it will be migrated to another suitable online cpu by the scheduler.
> > > So the affinity setting by rcutree_offline_cpu() is redundant and can be
> > > eliminated.
> > >
> > > Besides, this patch does a trivial code rearrangement by unfolding
> > > rcutree_affinity_setting() into rcutree_online_cpu(), considering that
> > > the latter one is the only user of the former.
> > >
> > > Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
> > > Cc: "Paul E. McKenney" <paulmck@kernel.org>
> > > Cc: David Woodhouse <dwmw@amazon.co.uk>
> > > Cc: Frederic Weisbecker <frederic@kernel.org>
> > > Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
> > > Cc: Josh Triplett <josh@joshtriplett.org>
> > > Cc: Steven Rostedt <rostedt@goodmis.org>
> > > Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> > > Cc: Lai Jiangshan <jiangshanlai@gmail.com>
> > > Cc: Joel Fernandes <joel@joelfernandes.org>
> > > Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
> > > To: rcu@vger.kernel.org
> > > ---
> > >  kernel/rcu/tree.c | 14 +-------------
> > >  1 file changed, 1 insertion(+), 13 deletions(-)
> > >
> > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > index 79aea7df4345..b90f6487fd45 100644
> > > --- a/kernel/rcu/tree.c
> > > +++ b/kernel/rcu/tree.c
> > > @@ -3978,16 +3978,6 @@ int rcutree_prepare_cpu(unsigned int cpu)
> > >  	return 0;
> > >  }
> > >
> > > -/*
> > > - * Update RCU priority boot kthread affinity for CPU-hotplug changes.
> > > - */
> > > -static void rcutree_affinity_setting(unsigned int cpu, int outgoing)
> > > -{
> > > -	struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
> > > -
> > > -	rcu_boost_kthread_setaffinity(rdp->mynode, outgoing);
> > > -}
> >
> > Good point, given how simple a wrapper this is and how little it is used,
> > getting rid of it does sound like a reasonable idea.
> >
> > > /*
> > >  * Near the end of the CPU-online process. Pretty much all services
> > >  * enabled, and the CPU is now very much alive.
> > > @@ -4006,7 +3996,7 @@ int rcutree_online_cpu(unsigned int cpu)
> > >  	if (rcu_scheduler_active == RCU_SCHEDULER_INACTIVE)
> > >  		return 0; /* Too early in boot for scheduler work. */
> > >  	sync_sched_exp_online_cleanup(cpu);
> > > -	rcutree_affinity_setting(cpu, -1);
> > > +	rcu_boost_kthread_setaffinity(rdp->mynode, -1);
> > >
> > >  	// Stop-machine done, so allow nohz_full to disable tick.
> > >  	tick_dep_clear(TICK_DEP_BIT_RCU);
> > > @@ -4029,8 +4019,6 @@ int rcutree_offline_cpu(unsigned int cpu)
> > >  	rnp->ffmask &= ~rdp->grpmask;
> > >  	raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
> > >
> > > -	rcutree_affinity_setting(cpu, cpu);
> >
> > We do need to keep this one because the CPU is going away.
> >
> > On the other hand, it might well be that we could get rid of the call
> > to rcutree_affinity_setting() in rcutree_dead_cpu().
> >
> > Or am I missing something subtle here?
>
> Oops, I think I need to rephrase my commit log to describe this nuance.
> The key point is whether ->qsmaskinitnext is stable.
>
> The teardown code path on a single dying cpu looks like:
>
> sched_cpu_deactivate() // prevent this dying cpu as a migration dst.

Suppose that this was the last CPU that the task was permitted to
run on.

> rcutree_offline_cpu()  // as a result, the scheduler core will take care
>                        // of the transient affinity mismatching until
>                        // rcutree_dead_cpu(). (I think it also stands in
>                        // the concurrent offlining)
>
> rcu_report_dead()      // running on the dying cpu, and clear its bit in ->qsmaskinitnext
>
> rcutree_dead_cpu()     // running on the initiator (an initiator cpu will
>                        // execute this function for each dying cpu)
>                        // At this point, ->qsmaskinitnext reflects the
>                        // offlining, and the affinity can get right.
>
> Sorry that my commit log had emphasized the first part, but forgot to
> mention the ->qsmaskinitnext.
>
>
> Does this justification stand?

We should ensure that the task's permitted set of CPUs always contained
at least one online CPU. Unless I am missing something, your suggested
change will sometimes end up with the task having no online CPUs in
its mask.

So what am I missing here?

							Thanx, Paul
On Sat, Sep 10, 2022 at 01:36:22PM -0700, Paul E. McKenney wrote:
> On Wed, Sep 07, 2022 at 09:40:29AM +0800, Pingfan Liu wrote:
> > On Tue, Sep 06, 2022 at 10:24:41AM -0700, Paul E. McKenney wrote:
> > > On Mon, Sep 05, 2022 at 11:38:50AM +0800, Pingfan Liu wrote:
> > > > At present, during the cpu teardown, rcu_boost_kthread_setaffinity() is
> > > > called twice. Firstly by rcutree_offline_cpu(), then by
> > > > rcutree_dead_cpu() as the CPUHP_RCUTREE_PREP cpuhp_step callback.
> > > >
> > > > From the scheduler's perspective, a bit in cpu_online_mask means that the cpu
> > > > is visible to the scheduler. Furthermore, a bit in cpu_active_mask
> > > > means that the cpu is suitable as a migration destination.
> > > >
> > > > Now turning back to the case in rcu offlining. sched_cpu_deactivate()
> > > > has disabled the dying cpu as the migration destination before
> > > > rcutree_offline_cpu(). Furthermore, if the boost kthread is on the dying
> > > > cpu, it will be migrated to another suitable online cpu by the scheduler.
> > > > So the affinity setting by rcutree_offline_cpu() is redundant and can be
> > > > eliminated.
> > > >
> > > > Besides, this patch does a trivial code rearrangement by unfolding
> > > > rcutree_affinity_setting() into rcutree_online_cpu(), considering that
> > > > the latter one is the only user of the former.
> > > >
> > > > Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
> > > > Cc: "Paul E. McKenney" <paulmck@kernel.org>
> > > > Cc: David Woodhouse <dwmw@amazon.co.uk>
> > > > Cc: Frederic Weisbecker <frederic@kernel.org>
> > > > Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
> > > > Cc: Josh Triplett <josh@joshtriplett.org>
> > > > Cc: Steven Rostedt <rostedt@goodmis.org>
> > > > Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> > > > Cc: Lai Jiangshan <jiangshanlai@gmail.com>
> > > > Cc: Joel Fernandes <joel@joelfernandes.org>
> > > > Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
> > > > To: rcu@vger.kernel.org
> > > > ---
> > > >  kernel/rcu/tree.c | 14 +-------------
> > > >  1 file changed, 1 insertion(+), 13 deletions(-)
> > > >
> > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > > index 79aea7df4345..b90f6487fd45 100644
> > > > --- a/kernel/rcu/tree.c
> > > > +++ b/kernel/rcu/tree.c
> > > > @@ -3978,16 +3978,6 @@ int rcutree_prepare_cpu(unsigned int cpu)
> > > >  	return 0;
> > > >  }
> > > >
> > > > -/*
> > > > - * Update RCU priority boot kthread affinity for CPU-hotplug changes.
> > > > - */
> > > > -static void rcutree_affinity_setting(unsigned int cpu, int outgoing)
> > > > -{
> > > > -	struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
> > > > -
> > > > -	rcu_boost_kthread_setaffinity(rdp->mynode, outgoing);
> > > > -}
> > >
> > > Good point, given how simple a wrapper this is and how little it is used,
> > > getting rid of it does sound like a reasonable idea.
> > >
> > > > /*
> > > >  * Near the end of the CPU-online process. Pretty much all services
> > > >  * enabled, and the CPU is now very much alive.
> > > > @@ -4006,7 +3996,7 @@ int rcutree_online_cpu(unsigned int cpu)
> > > >  	if (rcu_scheduler_active == RCU_SCHEDULER_INACTIVE)
> > > >  		return 0; /* Too early in boot for scheduler work. */
> > > >  	sync_sched_exp_online_cleanup(cpu);
> > > > -	rcutree_affinity_setting(cpu, -1);
> > > > +	rcu_boost_kthread_setaffinity(rdp->mynode, -1);
> > > >
> > > >  	// Stop-machine done, so allow nohz_full to disable tick.
> > > >  	tick_dep_clear(TICK_DEP_BIT_RCU);
> > > > @@ -4029,8 +4019,6 @@ int rcutree_offline_cpu(unsigned int cpu)
> > > >  	rnp->ffmask &= ~rdp->grpmask;
> > > >  	raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
> > > >
> > > > -	rcutree_affinity_setting(cpu, cpu);
> > >
> > > We do need to keep this one because the CPU is going away.
> > >
> > > On the other hand, it might well be that we could get rid of the call
> > > to rcutree_affinity_setting() in rcutree_dead_cpu().
> > >
> > > Or am I missing something subtle here?
> >
> > Oops, I think I need to rephrase my commit log to describe this nuance.
> > The key point is whether ->qsmaskinitnext is stable.
> >
> > The teardown code path on a single dying cpu looks like:
> >
> > sched_cpu_deactivate() // prevent this dying cpu as a migration dst.
>
> Suppose that this was the last CPU that the task was permitted to
> run on.
>

I get your concern, please see the comment below.

> > rcutree_offline_cpu()  // as a result, the scheduler core will take care
> >                        // of the transient affinity mismatching until
> >                        // rcutree_dead_cpu(). (I think it also stands in
> >                        // the concurrent offlining)
> >
> > rcu_report_dead()      // running on the dying cpu, and clear its bit in ->qsmaskinitnext
> >
> > rcutree_dead_cpu()     // running on the initiator (an initiator cpu will
> >                        // execute this function for each dying cpu)
> >                        // At this point, ->qsmaskinitnext reflects the
> >                        // offlining, and the affinity can get right.
> >
> > Sorry that my commit log had emphasized the first part, but forgot to
> > mention the ->qsmaskinitnext.
> >
> >
> > Does this justification stand?
>
> We should ensure that the task's permitted set of CPUs always contained
> at least one online CPU. Unless I am missing something, your suggested
> change will sometimes end up with the task having no online CPUs in
> its mask.
>

Actually, the scheduler will pick up one online cpu for the boost
thread.

On the last cpu, boost is subject to migration, and is pushed away by
__balance_push_cpu_stop()
{
	if (task_rq(p) == rq && task_on_rq_queued(p)) {
		cpu = select_fallback_rq(rq->cpu, p);
		rq = __migrate_task(rq, &rf, p, cpu);
	}
}

Now, turning to select_fallback_rq(), inside that function, if
p->cpus_ptr has no suitable cpus, then case cpuset or possible will make it
point to cpuset or cpu_possible_mask. So finally, a valid cpu
is picked up. (I have not found the online cpu there. The trick should hide
elsewhere. But that rendered the same result as in v5.16, where
rcu_boost_kthread_setaffinity()->cpumask_setall(cm))

But now, in 6.0, it changes into housekeeping_cpumask(HK_TYPE_RCU),
which appeals for a more careful thinking.
If there is a cpuset for housekeeping so that select_fallback_rq() can
pick up one from that cpuset?

To Frederic, could you shed some light here and is it worth introducing
such a cpuset so that select_fallback_rq() can pick up a cpu in
housekeeping set in this case?


Anyway, as the last resort, TICK_DEP_BIT_RCU has already appealed for
the awareness of the concurrent rcutree_offline_cpu(), the affinity
setting could be done there if select_fallback_rq() can not work.

Thanks,

	Pingfan
On Wed, Sep 14, 2022 at 06:12:27PM +0800, Pingfan Liu wrote:
> On Sat, Sep 10, 2022 at 01:36:22PM -0700, Paul E. McKenney wrote:
> Actually, the scheduler will pick up one online cpu for the boost
> thread.
>
> On the last cpu, boost is subject to migration, and is pushed away by
> __balance_push_cpu_stop()
> {
> 	if (task_rq(p) == rq && task_on_rq_queued(p)) {
> 		cpu = select_fallback_rq(rq->cpu, p);
> 		rq = __migrate_task(rq, &rf, p, cpu);
> 	}
> }
>
> Now, turning to select_fallback_rq(), inside that function, if
> p->cpus_ptr has no suitable cpus, then case cpuset or possible will make it
> point to cpuset or cpu_possible_mask. So finally, a valid cpu
> is picked up. (I have not found the online cpu there. The trick should hide
> elsewhere. But that rendered the same result as in v5.16, where
> rcu_boost_kthread_setaffinity()->cpumask_setall(cm))
>
> But now, in 6.0, it changes into housekeeping_cpumask(HK_TYPE_RCU),
> which appeals for a more careful thinking.
> If there is a cpuset for housekeeping so that select_fallback_rq() can
> pick up one from that cpuset?
>
> To Frederic, could you shed some light here and is it worth introducing
> such a cpuset so that select_fallback_rq() can pick up a cpu in
> housekeeping set in this case?
>
>
> Anyway, as the last resort, TICK_DEP_BIT_RCU has already appealed for
> the awareness of the concurrent rcutree_offline_cpu(), the affinity
> setting could be done there if select_fallback_rq() can not work.

Here is the way I understand it: suppose the outgoing CPU was the last online
one in this rcu node, in this case the migration performed in
sched_cpu_deactivate() will find only one CPU in p->cpus_mask, but that CPU is
now out of cpu_active_mask, so the task is affined by default to cpu_possible_mask.

So there is a short window between sched_cpu_deactivate() and
rcutree_offline_cpu() where the task may run on all cpu_possible_mask. Does it
matter? Maybe, I don't know...

After that we reach rcutree_offline_cpu() -> rcu_boost_kthread_setaffinity()
that fails to fill the cpumask since the node is now empty. As the resulting
cpumask is empty, it fills it as a last resort with the HK_TYPE_RCU housekeeping
set, which has to have at least 1 online CPU (nohz_full constraint). So the task
is finally affine to that housekeeping set.

Like Paul said, I'd rather indeed remove the rcu_boost_kthread_setaffinity()
call from rcutree_dead_cpu() as this one looks useless.

Thanks.

>
> Thanks,
>
> 	Pingfan
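For reference, the last-resort fallback Frederic describes can be sketched roughly as below. This is a simplified reconstruction of rcu_boost_kthread_setaffinity() as it looked around v6.0 (kernel/rcu/tree_plugin.h), written from memory for illustration only: helper names such as rcu_rnp_online_cpus(), for_each_leaf_node_possible_cpu() and leaf_node_cpu_bit() are recalled rather than quoted, and the real function differs in detail (locking, extra checks).

/*
 * Rough sketch of the boost-kthread affinity update discussed above.
 * Not the verbatim kernel source; modeled on rcu_boost_kthread_setaffinity().
 */
static void boost_setaffinity_sketch(struct rcu_node *rnp, int outgoingcpu)
{
	struct task_struct *t = rnp->boost_kthread_task;
	unsigned long mask = rcu_rnp_online_cpus(rnp);
	cpumask_var_t cm;
	int cpu;

	if (!t)
		return;
	if (!zalloc_cpumask_var(&cm, GFP_KERNEL))
		return;
	/* Collect the node's online CPUs, minus the one going away. */
	for_each_leaf_node_possible_cpu(rnp, cpu)
		if ((mask & leaf_node_cpu_bit(rnp, cpu)) && cpu != outgoingcpu)
			cpumask_set_cpu(cpu, cm);
	/* Restrict to housekeeping CPUs... */
	cpumask_and(cm, cm, housekeeping_cpumask(HK_TYPE_RCU));
	/*
	 * ...and if the node has no eligible CPU left (the case discussed in
	 * this thread), fall back to the whole HK_TYPE_RCU set, which always
	 * contains at least one online CPU.
	 */
	if (cpumask_empty(cm))
		cpumask_copy(cm, housekeeping_cpumask(HK_TYPE_RCU));
	set_cpus_allowed_ptr(t, cm);
	free_cpumask_var(cm);
}

This is the mechanism that keeps the boost kthread's mask non-empty even when an rcu node's last CPU goes offline, which is why the extra affinity update later in the teardown path looks redundant.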
On Wed, Sep 14, 2022 at 8:27 PM Frederic Weisbecker <frederic@kernel.org> wrote:
>
> On Wed, Sep 14, 2022 at 06:12:27PM +0800, Pingfan Liu wrote:
> > On Sat, Sep 10, 2022 at 01:36:22PM -0700, Paul E. McKenney wrote:
> > Actually, the scheduler will pick up one online cpu for the boost
> > thread.
> >
> > On the last cpu, boost is subject to migration, and is pushed away by
> > __balance_push_cpu_stop()
> > {
> > 	if (task_rq(p) == rq && task_on_rq_queued(p)) {
> > 		cpu = select_fallback_rq(rq->cpu, p);
> > 		rq = __migrate_task(rq, &rf, p, cpu);
> > 	}
> > }
> >
> > Now, turning to select_fallback_rq(), inside that function, if
> > p->cpus_ptr has no suitable cpus, then case cpuset or possible will make it
> > point to cpuset or cpu_possible_mask. So finally, a valid cpu
> > is picked up. (I have not found the online cpu there. The trick should hide
> > elsewhere. But that rendered the same result as in v5.16, where
> > rcu_boost_kthread_setaffinity()->cpumask_setall(cm))
> >
> > But now, in 6.0, it changes into housekeeping_cpumask(HK_TYPE_RCU),
> > which appeals for a more careful thinking.
> > If there is a cpuset for housekeeping so that select_fallback_rq() can
> > pick up one from that cpuset?
> >
> > To Frederic, could you shed some light here and is it worth introducing
> > such a cpuset so that select_fallback_rq() can pick up a cpu in
> > housekeeping set in this case?
> >
> >
> > Anyway, as the last resort, TICK_DEP_BIT_RCU has already appealed for
> > the awareness of the concurrent rcutree_offline_cpu(), the affinity
> > setting could be done there if select_fallback_rq() can not work.
>
> Here is the way I understand it: suppose the outgoing CPU was the last online
> one in this rcu node, in this case the migration performed in
> sched_cpu_deactivate() will find only one CPU in p->cpus_mask, but that CPU is
> now out of cpu_active_mask, so the task is affined by default to cpu_possible_mask.
>
> So there is a short window between sched_cpu_deactivate() and
> rcutree_offline_cpu() where the task may run on all cpu_possible_mask. Does it
> matter? Maybe, I don't know...
>

OK, then it is pointless to introduce a housekeeping cpuset.

> After that we reach rcutree_offline_cpu() -> rcu_boost_kthread_setaffinity()
> that fails to fill the cpumask since the node is now empty. As the resulting
> cpumask is empty, it fills it as a last resort with the HK_TYPE_RCU housekeeping
> set, which has to have at least 1 online CPU (nohz_full constraint). So the task
> is finally affine to that housekeeping set.
>

Yes, exactly.

> Like Paul said, I'd rather indeed remove the rcu_boost_kthread_setaffinity()
> call from rcutree_dead_cpu() as this one looks useless.
>

OK, I will take that way. Thanks to both of you for the generous help.
I will bring up V2 soon.

Thanks,

	Pingfan
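For context on the V2 direction agreed above, the sketch below shows where the remaining teardown-side affinity call sits. The body of rcutree_dead_cpu() is reconstructed from memory of the v6.0 tree and may differ in detail from the real function; the thread only confirms that an affinity update happens in this callback and that V2 intends to drop it.

/*
 * Rough sketch of rcutree_dead_cpu() (kernel/rcu/tree.c, around v6.0),
 * reconstructed from memory.  V2, as discussed above, would remove the
 * rcu_boost_kthread_setaffinity() call: by the time this callback runs,
 * ->qsmaskinitnext already reflects the offline CPU, and
 * rcutree_offline_cpu() has already updated the kthread's affinity.
 */
int rcutree_dead_cpu(unsigned int cpu)
{
	struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
	struct rcu_node *rnp = rdp->mynode;  /* Outgoing CPU's leaf node. */

	if (!IS_ENABLED(CONFIG_HOTPLUG_CPU))
		return 0;

	WRITE_ONCE(rcu_state.n_online_cpus, rcu_state.n_online_cpus - 1);
	/* Adjust any no-longer-needed kthreads -- the call V2 would drop. */
	rcu_boost_kthread_setaffinity(rnp, -1);
	// Stop-machine done, so allow nohz_full to disable tick.
	tick_dep_clear(TICK_DEP_BIT_RCU);
	return 0;
}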
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 79aea7df4345..b90f6487fd45 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3978,16 +3978,6 @@ int rcutree_prepare_cpu(unsigned int cpu)
 	return 0;
 }
 
-/*
- * Update RCU priority boot kthread affinity for CPU-hotplug changes.
- */
-static void rcutree_affinity_setting(unsigned int cpu, int outgoing)
-{
-	struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
-
-	rcu_boost_kthread_setaffinity(rdp->mynode, outgoing);
-}
-
 /*
  * Near the end of the CPU-online process. Pretty much all services
  * enabled, and the CPU is now very much alive.
@@ -4006,7 +3996,7 @@ int rcutree_online_cpu(unsigned int cpu)
 	if (rcu_scheduler_active == RCU_SCHEDULER_INACTIVE)
 		return 0; /* Too early in boot for scheduler work. */
 	sync_sched_exp_online_cleanup(cpu);
-	rcutree_affinity_setting(cpu, -1);
+	rcu_boost_kthread_setaffinity(rdp->mynode, -1);
 
 	// Stop-machine done, so allow nohz_full to disable tick.
 	tick_dep_clear(TICK_DEP_BIT_RCU);
@@ -4029,8 +4019,6 @@ int rcutree_offline_cpu(unsigned int cpu)
 	rnp->ffmask &= ~rdp->grpmask;
 	raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
 
-	rcutree_affinity_setting(cpu, cpu);
-
 	// nohz_full CPUs need the tick for stop-machine to work quickly
 	tick_dep_set(TICK_DEP_BIT_RCU);
 	return 0;
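For illustration, this is how rcutree_online_cpu() would read with the hunk above applied. The lines inside the diff context (the RCU_SCHEDULER_INACTIVE check through the tick_dep_clear() call) come from the patch itself; the earlier part of the function is reconstructed from memory of the v6.0 tree and may differ slightly.

/* Sketch of rcutree_online_cpu() after the patch; not a verbatim copy. */
int rcutree_online_cpu(unsigned int cpu)
{
	unsigned long flags;
	struct rcu_data *rdp;
	struct rcu_node *rnp;

	rdp = per_cpu_ptr(&rcu_data, cpu);
	rnp = rdp->mynode;
	raw_spin_lock_irqsave_rcu_node(rnp, flags);
	rnp->ffmask |= rdp->grpmask;
	raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
	if (rcu_scheduler_active == RCU_SCHEDULER_INACTIVE)
		return 0; /* Too early in boot for scheduler work. */
	sync_sched_exp_online_cleanup(cpu);
	/* Direct call replaces the removed rcutree_affinity_setting() wrapper. */
	rcu_boost_kthread_setaffinity(rdp->mynode, -1);

	// Stop-machine done, so allow nohz_full to disable tick.
	tick_dep_clear(TICK_DEP_BIT_RCU);
	return 0;
}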
At present, during the cpu teardown, rcu_boost_kthread_setaffinity() is
called twice. Firstly by rcutree_offline_cpu(), then by
rcutree_dead_cpu() as the CPUHP_RCUTREE_PREP cpuhp_step callback.

From the scheduler's perspective, a bit in cpu_online_mask means that the cpu
is visible to the scheduler. Furthermore, a bit in cpu_active_mask
means that the cpu is suitable as a migration destination.

Now turning back to the case in rcu offlining. sched_cpu_deactivate()
has disabled the dying cpu as the migration destination before
rcutree_offline_cpu(). Furthermore, if the boost kthread is on the dying
cpu, it will be migrated to another suitable online cpu by the scheduler.
So the affinity setting by rcutree_offline_cpu() is redundant and can be
eliminated.

Besides, this patch does a trivial code rearrangement by unfolding
rcutree_affinity_setting() into rcutree_online_cpu(), considering that
the latter one is the only user of the former.

Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Frederic Weisbecker <frederic@kernel.org>
Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
To: rcu@vger.kernel.org
---
 kernel/rcu/tree.c | 14 +-------------
 1 file changed, 1 insertion(+), 13 deletions(-)