diff mbox series

[11/20] sched: Handle CPU isolation on last resort fallback rq selection

Message ID 20240926224910.11106-12-frederic@kernel.org (mailing list archive)
State New
Headers show
Series [1/2] arm64: Keep first mismatched 32bits el0 capable CPU online through its callbacks | expand

Commit Message

Frederic Weisbecker Sept. 26, 2024, 10:48 p.m. UTC
When a kthread or any other task has an affinity mask that is fully
offline or unallowed, the scheduler reaffines the task to all possible
CPUs as a last resort.

This default decision doesn't mix up very well with nohz_full CPUs that
are part of the possible cpumask but don't want to be disturbed by
unbound kthreads or even detached pinned user tasks.

Make the fallback affinity setting aware of nohz_full. This applies to
all architectures supporting nohz_full except arm32. However this
architecture that overrides the task possible mask is unlikely to be
willing to integrate new development.

Suggested-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
 kernel/sched/core.c | 17 ++++++++++++++++-
 1 file changed, 16 insertions(+), 1 deletion(-)

Comments

Michal Hocko Sept. 27, 2024, 7:26 a.m. UTC | #1
On Fri 27-09-24 00:48:59, Frederic Weisbecker wrote:
> When a kthread or any other task has an affinity mask that is fully
> offline or unallowed, the scheduler reaffines the task to all possible
> CPUs as a last resort.
> 
> This default decision doesn't mix up very well with nohz_full CPUs that
> are part of the possible cpumask but don't want to be disturbed by
> unbound kthreads or even detached pinned user tasks.
> 
> Make the fallback affinity setting aware of nohz_full. This applies to
> all architectures supporting nohz_full except arm32. However this
> architecture that overrides the task possible mask is unlikely to be
> willing to integrate new development.
> 
> Suggested-by: Michal Hocko <mhocko@suse.com>
> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>

Thanks, this makes sense to me. Up to scheduler maitainers whether this
makes sense in general though.

Thanks for looking into this Frederic!

> ---
>  kernel/sched/core.c | 17 ++++++++++++++++-
>  1 file changed, 16 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 43e453ab7e20..d4b759c1cbf1 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -3421,6 +3421,21 @@ void kick_process(struct task_struct *p)
>  }
>  EXPORT_SYMBOL_GPL(kick_process);
>  
> +static const struct cpumask *task_cpu_fallback_mask(struct task_struct *t)
> +{
> +	const struct cpumask *mask;
> +
> +	mask = task_cpu_possible_mask(p);
> +	/*
> +	 * Architectures that overrides the task possible mask
> +	 * must handle CPU isolation.
> +	 */
> +	if (mask != cpu_possible_mask)
> +		return mask;
> +	else
> +		return housekeeping_cpumask(HK_TYPE_TICK);
> +}
> +
>  /*
>   * ->cpus_ptr is protected by both rq->lock and p->pi_lock
>   *
> @@ -3489,7 +3504,7 @@ static int select_fallback_rq(int cpu, struct task_struct *p)
>  			 *
>  			 * More yuck to audit.
>  			 */
> -			do_set_cpus_allowed(p, task_cpu_possible_mask(p));
> +			do_set_cpus_allowed(p, task_cpu_fallback_mask(p));
>  			state = fail;
>  			break;
>  		case fail:
> -- 
> 2.46.0
Will Deacon Oct. 8, 2024, 10:54 a.m. UTC | #2
On Fri, Sep 27, 2024 at 12:48:59AM +0200, Frederic Weisbecker wrote:
> When a kthread or any other task has an affinity mask that is fully
> offline or unallowed, the scheduler reaffines the task to all possible
> CPUs as a last resort.
> 
> This default decision doesn't mix up very well with nohz_full CPUs that
> are part of the possible cpumask but don't want to be disturbed by
> unbound kthreads or even detached pinned user tasks.
> 
> Make the fallback affinity setting aware of nohz_full. This applies to
> all architectures supporting nohz_full except arm32. However this
> architecture that overrides the task possible mask is unlikely to be
> willing to integrate new development.

I'm not sure I understand this last sentence. The possible mask is
overridden for 32-bit tasks on an *arm64* kernel when running on an SoC
featuring some CPUs that can execute only 64-bit tasks. Who is unwilling
to integrate what?

Will
Frederic Weisbecker Oct. 8, 2024, 12:27 p.m. UTC | #3
Le Tue, Oct 08, 2024 at 11:54:35AM +0100, Will Deacon a écrit :
> On Fri, Sep 27, 2024 at 12:48:59AM +0200, Frederic Weisbecker wrote:
> > When a kthread or any other task has an affinity mask that is fully
> > offline or unallowed, the scheduler reaffines the task to all possible
> > CPUs as a last resort.
> > 
> > This default decision doesn't mix up very well with nohz_full CPUs that
> > are part of the possible cpumask but don't want to be disturbed by
> > unbound kthreads or even detached pinned user tasks.
> > 
> > Make the fallback affinity setting aware of nohz_full. This applies to
> > all architectures supporting nohz_full except arm32. However this
> > architecture that overrides the task possible mask is unlikely to be
> > willing to integrate new development.
> 
> I'm not sure I understand this last sentence. The possible mask is
> overridden for 32-bit tasks on an *arm64* kernel when running on an SoC
> featuring some CPUs that can execute only 64-bit tasks. Who is unwilling
> to integrate what?

Right I've been lazy on that, assuming that nohz_full is a niche, and
nohz_full on arm 32 bits tasks must be even more a niche. But I can make
it a macro just like task_cpu_possible_mask() so that architectures
can override it?

Thanks.
Frederic Weisbecker Oct. 15, 2024, 1:48 p.m. UTC | #4
Le Tue, Oct 08, 2024 at 11:54:35AM +0100, Will Deacon a écrit :
> On Fri, Sep 27, 2024 at 12:48:59AM +0200, Frederic Weisbecker wrote:
> > When a kthread or any other task has an affinity mask that is fully
> > offline or unallowed, the scheduler reaffines the task to all possible
> > CPUs as a last resort.
> > 
> > This default decision doesn't mix up very well with nohz_full CPUs that
> > are part of the possible cpumask but don't want to be disturbed by
> > unbound kthreads or even detached pinned user tasks.
> > 
> > Make the fallback affinity setting aware of nohz_full. This applies to
> > all architectures supporting nohz_full except arm32. However this
> > architecture that overrides the task possible mask is unlikely to be
> > willing to integrate new development.
> 
> I'm not sure I understand this last sentence. The possible mask is
> overridden for 32-bit tasks on an *arm64* kernel when running on an SoC
> featuring some CPUs that can execute only 64-bit tasks. Who is unwilling
> to integrate what?
> 
> Will

Will, how does the (untested) following look like? The rationale is that
we must deal with the fact that CPU supporting 32-bits el0 may appear at
any time and those may not intersect housekeeping CPUs (housekeeping CPUs
are CPUs that are not part of nohz_full=. If nohz_full= isn't used then
it's cpu_possible_mask). If there is a housekeeping CPU supporting el0 32bits
then it will be disallowed to be ever offlined. But if the first mismatching
CPU supporting el0 that pops up is not housekeeping then we may end up
with that CPU disallowed to be offlined + later if a housekeeping CPU appears
that also supports 32bits el0 will also be disallowed to be offlined. Ideally
it should turn back the previous CPU to be offlinable but there may be
other things that have forbidden that CPU to be offline so...

Oh well I made a mess.

diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h
index 3d261cc123c1..992d782f2899 100644
--- a/arch/arm64/include/asm/cpufeature.h
+++ b/arch/arm64/include/asm/cpufeature.h
@@ -663,6 +663,7 @@ static inline bool supports_clearbhb(int scope)
 }
 
 const struct cpumask *system_32bit_el0_cpumask(void);
+const struct cpumask *fallback_32bit_el0_cpumask(void);
 DECLARE_STATIC_KEY_FALSE(arm64_mismatched_32bit_el0);
 
 static inline bool system_supports_32bit_el0(void)
diff --git a/arch/arm64/include/asm/mmu_context.h b/arch/arm64/include/asm/mmu_context.h
index 7c09d47e09cb..30cb30438fec 100644
--- a/arch/arm64/include/asm/mmu_context.h
+++ b/arch/arm64/include/asm/mmu_context.h
@@ -282,6 +282,19 @@ task_cpu_possible_mask(struct task_struct *p)
 }
 #define task_cpu_possible_mask	task_cpu_possible_mask
 
+static inline const struct cpumask *
+task_cpu_fallback_mask(struct task_struct *p)
+{
+	if (!static_branch_unlikely(&arm64_mismatched_32bit_el0))
+		return housekeeping_cpumask(HK_TYPE_TICK);
+
+	if (!is_compat_thread(task_thread_info(p)))
+		return housekeeping_cpumask(HK_TYPE_TICK);
+
+	return fallback_32bit_el0_cpumask();
+}
+#define task_cpu_fallback_mask	task_cpu_fallback_mask
+
 void verify_cpu_asid_bits(void);
 void post_ttbr_update_workaround(void);
 
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 718728a85430..3e4400df588f 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -133,6 +133,7 @@ DEFINE_STATIC_KEY_FALSE(arm64_mismatched_32bit_el0);
  * Only valid if arm64_mismatched_32bit_el0 is enabled.
  */
 static cpumask_var_t cpu_32bit_el0_mask __cpumask_var_read_mostly;
+static cpumask_var_t fallback_32bit_el0_mask __cpumask_var_read_mostly;
 
 void dump_cpu_features(void)
 {
@@ -1618,6 +1619,21 @@ const struct cpumask *system_32bit_el0_cpumask(void)
 	return cpu_possible_mask;
 }
 
+const struct cpumask *fallback_32bit_el0_cpumask(void)
+{
+	if (!system_supports_32bit_el0())
+		return cpu_none_mask;
+
+	if (static_branch_unlikely(&arm64_mismatched_32bit_el0)) {
+		if (!cpumask_empty(fallback_32bit_el0_mask))
+			return fallback_32bit_el0_mask;
+		else
+			return cpu_32bit_el0_mask
+	}
+
+	return housekeeping_cpumask(HK_TYPE_TICK);
+}
+
 static int __init parse_32bit_el0_param(char *str)
 {
 	allow_mismatched_32bit_el0 = true;
@@ -3598,20 +3614,30 @@ static int enable_mismatched_32bit_el0(unsigned int cpu)
 	 * be offlined by userspace. -1 indicates we haven't yet onlined
 	 * a 32-bit-capable CPU.
 	 */
-	static int lucky_winner = -1;
+	static int unofflinable = nr_cpu_ids;
+	static int unofflinable_hk = nr_cpu_ids;
 
 	struct cpuinfo_arm64 *info = &per_cpu(cpu_data, cpu);
 	bool cpu_32bit = id_aa64pfr0_32bit_el0(info->reg_id_aa64pfr0);
 
 	if (cpu_32bit) {
 		cpumask_set_cpu(cpu, cpu_32bit_el0_mask);
+		if (housekeeping_cpu(cpu, HK_TYPE_TICK))
+			cpumask_set_cpu(cpu, fallback_32bit_el0_mask);
 		static_branch_enable_cpuslocked(&arm64_mismatched_32bit_el0);
 	}
 
-	if (cpumask_test_cpu(0, cpu_32bit_el0_mask) == cpu_32bit)
+	if (unofflinable < nr_cpu_ids) {
+		if (unofflinable_hk >= nr_cpu_ids && cpu_32bit && housekeeping_cpu(cpu, HK_TYPE_TICK)) {
+			unofflinable_hk = cpu;
+			get_cpu_device(unofflinable_hk)->offline_disabled = true;
+			pr_info("Asymmetric 32-bit EL0 support detected on housekeeping CPU %u;"
+				"CPU hot-unplug disabled on CPU %u\n", cpu, cpu);
+		}
 		return 0;
+	}
 
-	if (lucky_winner >= 0)
+	if (cpumask_test_cpu(0, cpu_32bit_el0_mask) == cpu_32bit)
 		return 0;
 
 	/*
@@ -3619,9 +3645,13 @@ static int enable_mismatched_32bit_el0(unsigned int cpu)
 	 * 32-bit EL0 online so that is_cpu_allowed() doesn't end up rejecting
 	 * every CPU in the system for a 32-bit task.
 	 */
-	lucky_winner = cpu_32bit ? cpu : cpumask_any_and(cpu_32bit_el0_mask,
-							 cpu_active_mask);
-	get_cpu_device(lucky_winner)->offline_disabled = true;
+	unofflinable_hk = cpumask_any_and(fallback_32bit_el0_mask, cpu_active_mask);
+	if (unofflinable_hk < nr_cpu_ids)
+		unofflinable = unofflinable_hk;
+	else
+		unofflinable = cpumask_any_and(cpu_32bit_el0_mask, cpu_active_mask);
+
+	get_cpu_device(unofflinable)->offline_disabled = true;
 	setup_elf_hwcaps(compat_elf_hwcaps);
 	elf_hwcap_fixup();
 	pr_info("Asymmetric 32-bit EL0 support detected on CPU %u; CPU hot-unplug disabled on CPU %u\n",
@@ -3637,6 +3667,9 @@ static int __init init_32bit_el0_mask(void)
 	if (!zalloc_cpumask_var(&cpu_32bit_el0_mask, GFP_KERNEL))
 		return -ENOMEM;
 
+	if (!zalloc_cpumask_var(&fallback_32bit_el0_mask, GFP_KERNEL))
+		return -ENOMEM;
+
 	return cpuhp_setup_state(CPUHP_AP_ONLINE_DYN,
 				 "arm64/mismatched_32bit_el0:online",
 				 enable_mismatched_32bit_el0, NULL);
diff --git a/include/linux/mmu_context.h b/include/linux/mmu_context.h
index bbaec80c78c5..5b8d017a17f9 100644
--- a/include/linux/mmu_context.h
+++ b/include/linux/mmu_context.h
@@ -28,6 +28,10 @@ static inline void leave_mm(void) { }
 # define task_cpu_possible(cpu, p)	cpumask_test_cpu((cpu), task_cpu_possible_mask(p))
 #endif
 
+#ifndef task_cpu_fallback_mask
+# define task_cpu_fallback_mask(p)	housekeeping_cpumask(HK_TYPE_TICK);
+#endif
+
 #ifndef mm_untag_mask
 static inline unsigned long mm_untag_mask(struct mm_struct *mm)
 {
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index aeb595514461..1edce360f1a6 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3489,7 +3489,7 @@ static int select_fallback_rq(int cpu, struct task_struct *p)
 			 *
 			 * More yuck to audit.
 			 */
-			do_set_cpus_allowed(p, task_cpu_possible_mask(p));
+			do_set_cpus_allowed(p, task_cpu_fallback_mask(p));
 			state = fail;
 			break;
 		case fail:
Will Deacon Oct. 28, 2024, 4:25 p.m. UTC | #5
Hi Frederic,

Thanks for having a crack at this, but I'm pretty confused now so please
prepare for a bunch of silly questions!

On Tue, Oct 15, 2024 at 03:48:55PM +0200, Frederic Weisbecker wrote:
> Le Tue, Oct 08, 2024 at 11:54:35AM +0100, Will Deacon a écrit :
> > On Fri, Sep 27, 2024 at 12:48:59AM +0200, Frederic Weisbecker wrote:
> > > When a kthread or any other task has an affinity mask that is fully
> > > offline or unallowed, the scheduler reaffines the task to all possible
> > > CPUs as a last resort.
> > > 
> > > This default decision doesn't mix up very well with nohz_full CPUs that
> > > are part of the possible cpumask but don't want to be disturbed by
> > > unbound kthreads or even detached pinned user tasks.
> > > 
> > > Make the fallback affinity setting aware of nohz_full. This applies to
> > > all architectures supporting nohz_full except arm32. However this
> > > architecture that overrides the task possible mask is unlikely to be
> > > willing to integrate new development.
> > 
> > I'm not sure I understand this last sentence. The possible mask is
> > overridden for 32-bit tasks on an *arm64* kernel when running on an SoC
> > featuring some CPUs that can execute only 64-bit tasks. Who is unwilling
> > to integrate what?

I should've been clearer in my reply, but I think the most important thing
here for the arm64 heterogeneous SoCs is that we document whatever the
behaviour is in Documentation/arch/arm64/asymmetric-32bit.rst. There are
a few other kernel features that don't play well (e.g. SCHED_DEADLINE),
so it might be sufficient just to call out the limitations relating to
CPU isolation there.

However:

> Will, how does the (untested) following look like? The rationale is that
> we must deal with the fact that CPU supporting 32-bits el0 may appear at
> any time and those may not intersect housekeeping CPUs (housekeeping CPUs
> are CPUs that are not part of nohz_full=.

In the funky SoCs, all CPUs support 64-bit and we have a 64-bit kernel.
Some CPUs additionally support 32-bit but that should only be a concern
for the scheduling of user tasks.

> If nohz_full= isn't used then
> it's cpu_possible_mask). If there is a housekeeping CPU supporting el0 32bits
> then it will be disallowed to be ever offlined. But if the first mismatching
> CPU supporting el0 that pops up is not housekeeping then we may end up
> with that CPU disallowed to be offlined + later if a housekeeping CPU appears
> that also supports 32bits el0 will also be disallowed to be offlined. Ideally
> it should turn back the previous CPU to be offlinable but there may be
> other things that have forbidden that CPU to be offline so...

I'd have thought the bigger problem would be if the set of nohz_full=
CPUs was defined as the set of CPUs that support 32-bit. In that case,
executing a 32-bit task will give the scheduler no choice but to run
the task on a !housekeeping core.

So perhaps we could turn this on its head and explicitly mark the first
32-bit capable CPU as a housekeeping core when the mismatched mode is
enabled? We're already preventing CPU hotplug for the thing, so it's
"special" already. If that conflicts with the nohz_full_option, we can
emit a warning message that we're overriding it. I think that's ok, as
the user will have had to specify 'allow_mismatched_32bit_el0' as well.

Will
Frederic Weisbecker Oct. 28, 2024, 4:51 p.m. UTC | #6
Le Mon, Oct 28, 2024 at 04:25:15PM +0000, Will Deacon a écrit :
> > If nohz_full= isn't used then
> > it's cpu_possible_mask). If there is a housekeeping CPU supporting el0 32bits
> > then it will be disallowed to be ever offlined. But if the first mismatching
> > CPU supporting el0 that pops up is not housekeeping then we may end up
> > with that CPU disallowed to be offlined + later if a housekeeping CPU appears
> > that also supports 32bits el0 will also be disallowed to be offlined. Ideally
> > it should turn back the previous CPU to be offlinable but there may be
> > other things that have forbidden that CPU to be offline so...
> 
> I'd have thought the bigger problem would be if the set of nohz_full=
> CPUs was defined as the set of CPUs that support 32-bit. In that case,
> executing a 32-bit task will give the scheduler no choice but to run
> the task on a !housekeeping core.

Right.

> 
> So perhaps we could turn this on its head and explicitly mark the first
> 32-bit capable CPU as a housekeeping core when the mismatched mode is
> enabled? We're already preventing CPU hotplug for the thing, so it's
> "special" already. If that conflicts with the nohz_full_option, we can
> emit a warning message that we're overriding it. I think that's ok, as
> the user will have had to specify 'allow_mismatched_32bit_el0' as well.

It's very complicated to revert a CPU once it is set as nohz_full. But we can
retain a 32 bits capable nohz_full CPU from offlining until we finally find
a non-nohz_full 2bits capable CPU. I was about to repost the whole kthread
patchset but lemme post just the specific bits of interests here, it's "just"
two patches.

Thanks.
diff mbox series

Patch

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 43e453ab7e20..d4b759c1cbf1 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3421,6 +3421,21 @@  void kick_process(struct task_struct *p)
 }
 EXPORT_SYMBOL_GPL(kick_process);
 
+static const struct cpumask *task_cpu_fallback_mask(struct task_struct *t)
+{
+	const struct cpumask *mask;
+
+	mask = task_cpu_possible_mask(p);
+	/*
+	 * Architectures that overrides the task possible mask
+	 * must handle CPU isolation.
+	 */
+	if (mask != cpu_possible_mask)
+		return mask;
+	else
+		return housekeeping_cpumask(HK_TYPE_TICK);
+}
+
 /*
  * ->cpus_ptr is protected by both rq->lock and p->pi_lock
  *
@@ -3489,7 +3504,7 @@  static int select_fallback_rq(int cpu, struct task_struct *p)
 			 *
 			 * More yuck to audit.
 			 */
-			do_set_cpus_allowed(p, task_cpu_possible_mask(p));
+			do_set_cpus_allowed(p, task_cpu_fallback_mask(p));
 			state = fail;
 			break;
 		case fail: