diff mbox series

[-next,v4,06/19] arm64: entry: Move arm64_preempt_schedule_irq() into exit_to_kernel_mode()

Message ID 20241025100700.3714552-7-ruanjinjie@huawei.com (mailing list archive)
State New
Headers show
Series arm64: entry: Convert to generic entry | expand

Commit Message

Jinjie Ruan Oct. 25, 2024, 10:06 a.m. UTC
Move arm64_preempt_schedule_irq() into exit_to_kernel_mode(), so not
only __el1_irq() but also every time when kernel mode irq return,
there is a chance to reschedule.

As Mark pointed out, this change will have the following key impact:

    "We'll preempt even without taking a "real" interrupt. That
    shouldn't result in preemption that wasn't possible before,
    but it does change the probability of preempting at certain points,
    and might have a performance impact, so probably warrants a
    benchmark."

Suggested-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
---
 arch/arm64/kernel/entry-common.c | 88 ++++++++++++++++----------------
 1 file changed, 44 insertions(+), 44 deletions(-)

Comments

Mark Rutland Oct. 29, 2024, 2:52 p.m. UTC | #1
On Fri, Oct 25, 2024 at 06:06:47PM +0800, Jinjie Ruan wrote:
> Move arm64_preempt_schedule_irq() into exit_to_kernel_mode(), so not
> only __el1_irq() but also every time when kernel mode irq return,
> there is a chance to reschedule.

We use exit_to_kernel_mode() for every non-NMI exception return to the
kernel, not just IRQ returns.

> As Mark pointed out, this change will have the following key impact:
> 
>     "We'll preempt even without taking a "real" interrupt. That
>     shouldn't result in preemption that wasn't possible before,
>     but it does change the probability of preempting at certain points,
>     and might have a performance impact, so probably warrants a
>     benchmark."

For anyone following along at home, I said that at:

  https://lore.kernel.org/linux-arm-kernel/ZxejvAmccYMTa4P1@J2N7QTR9R3/

... and there I specifically said:

> I's suggest you first write a patch to align arm64's entry code with the
> generic code, by removing the call to arm64_preempt_schedule_irq() from
> __el1_irq(), and adding a call to arm64_preempt_schedule_irq() in
> __exit_to_kernel_mode(), e.g.
> 
> | static __always_inline void __exit_to_kernel_mode(struct pt_regs *regs)
> | {
> | 	...
> | 	if (interrupts_enabled(regs)) {
> | 		...
> | 		if (regs->exit_rcu) {
> | 			...
> | 		}
> | 		...
> | 		arm64_preempt_schedule_irq();
> | 		...
> | 	} else {
> | 		...
> | 	}
> | }

[...]

> +#ifdef CONFIG_PREEMPT_DYNAMIC
> +DEFINE_STATIC_KEY_TRUE(sk_dynamic_irqentry_exit_cond_resched);
> +#define need_irq_preemption() \
> +	(static_branch_unlikely(&sk_dynamic_irqentry_exit_cond_resched))
> +#else
> +#define need_irq_preemption()	(IS_ENABLED(CONFIG_PREEMPTION))
> +#endif
> +
> +static void __sched arm64_preempt_schedule_irq(void)
> +{
> +	if (!need_irq_preemption())
> +		return;
> +
> +	/*
> +	 * Note: thread_info::preempt_count includes both thread_info::count
> +	 * and thread_info::need_resched, and is not equivalent to
> +	 * preempt_count().
> +	 */
> +	if (READ_ONCE(current_thread_info()->preempt_count) != 0)
> +		return;
> +
> +	/*
> +	 * DAIF.DA are cleared at the start of IRQ/FIQ handling, and when GIC
> +	 * priority masking is used the GIC irqchip driver will clear DAIF.IF
> +	 * using gic_arch_enable_irqs() for normal IRQs. If anything is set in
> +	 * DAIF we must have handled an NMI, so skip preemption.
> +	 */
> +	if (system_uses_irq_prio_masking() && read_sysreg(daif))
> +		return;
> +
> +	/*
> +	 * Preempting a task from an IRQ means we leave copies of PSTATE
> +	 * on the stack. cpufeature's enable calls may modify PSTATE, but
> +	 * resuming one of these preempted tasks would undo those changes.
> +	 *
> +	 * Only allow a task to be preempted once cpufeatures have been
> +	 * enabled.
> +	 */
> +	if (system_capabilities_finalized())
> +		preempt_schedule_irq();
> +}
> +
>  /*
>   * Handle IRQ/context state management when exiting to kernel mode.
>   * After this function returns it is not safe to call regular kernel code,
> @@ -72,6 +114,8 @@ static noinstr irqentry_state_t enter_from_kernel_mode(struct pt_regs *regs)
>  static void noinstr exit_to_kernel_mode(struct pt_regs *regs,
>  					irqentry_state_t state)
>  {
> +	arm64_preempt_schedule_irq();

This is broken; exit_to_kernel_mode() is called for any non-NMI return
excpetion return to the kernel, and this doesn't check that interrupts
were enabled in the context the exception was taken from.

This will preempt in cases where we should not, e.g. if we WARN() in a section with
IRQs disabled.

Mark.
Jinjie Ruan Oct. 31, 2024, 4:02 a.m. UTC | #2
On 2024/10/29 22:52, Mark Rutland wrote:
> On Fri, Oct 25, 2024 at 06:06:47PM +0800, Jinjie Ruan wrote:
>> Move arm64_preempt_schedule_irq() into exit_to_kernel_mode(), so not
>> only __el1_irq() but also every time when kernel mode irq return,
>> there is a chance to reschedule.
> 
> We use exit_to_kernel_mode() for every non-NMI exception return to the
> kernel, not just IRQ returns.

Yes, it it not only irq but other non-NMI exception, will update it.

> 
>> As Mark pointed out, this change will have the following key impact:
>>
>>     "We'll preempt even without taking a "real" interrupt. That
>>     shouldn't result in preemption that wasn't possible before,
>>     but it does change the probability of preempting at certain points,
>>     and might have a performance impact, so probably warrants a
>>     benchmark."
> 
> For anyone following along at home, I said that at:
> 
>   https://lore.kernel.org/linux-arm-kernel/ZxejvAmccYMTa4P1@J2N7QTR9R3/
> 
> ... and there I specifically said:

Thank you!

This one and the next patch will be merged as you suggested.

I would have thought it would have been clearer to put it in
__exit_to_kernel_mode() and move it to interrupt enabled block in two steps.

> 
>> I's suggest you first write a patch to align arm64's entry code with the
>> generic code, by removing the call to arm64_preempt_schedule_irq() from
>> __el1_irq(), and adding a call to arm64_preempt_schedule_irq() in
>> __exit_to_kernel_mode(), e.g.
>>
>> | static __always_inline void __exit_to_kernel_mode(struct pt_regs *regs)
>> | {
>> | 	...
>> | 	if (interrupts_enabled(regs)) {
>> | 		...
>> | 		if (regs->exit_rcu) {
>> | 			...
>> | 		}
>> | 		...
>> | 		arm64_preempt_schedule_irq();
>> | 		...
>> | 	} else {
>> | 		...
>> | 	}
>> | }
> 
> [...]
> 
>> +#ifdef CONFIG_PREEMPT_DYNAMIC
>> +DEFINE_STATIC_KEY_TRUE(sk_dynamic_irqentry_exit_cond_resched);
>> +#define need_irq_preemption() \
>> +	(static_branch_unlikely(&sk_dynamic_irqentry_exit_cond_resched))
>> +#else
>> +#define need_irq_preemption()	(IS_ENABLED(CONFIG_PREEMPTION))
>> +#endif
>> +
>> +static void __sched arm64_preempt_schedule_irq(void)
>> +{
>> +	if (!need_irq_preemption())
>> +		return;
>> +
>> +	/*
>> +	 * Note: thread_info::preempt_count includes both thread_info::count
>> +	 * and thread_info::need_resched, and is not equivalent to
>> +	 * preempt_count().
>> +	 */
>> +	if (READ_ONCE(current_thread_info()->preempt_count) != 0)
>> +		return;
>> +
>> +	/*
>> +	 * DAIF.DA are cleared at the start of IRQ/FIQ handling, and when GIC
>> +	 * priority masking is used the GIC irqchip driver will clear DAIF.IF
>> +	 * using gic_arch_enable_irqs() for normal IRQs. If anything is set in
>> +	 * DAIF we must have handled an NMI, so skip preemption.
>> +	 */
>> +	if (system_uses_irq_prio_masking() && read_sysreg(daif))
>> +		return;
>> +
>> +	/*
>> +	 * Preempting a task from an IRQ means we leave copies of PSTATE
>> +	 * on the stack. cpufeature's enable calls may modify PSTATE, but
>> +	 * resuming one of these preempted tasks would undo those changes.
>> +	 *
>> +	 * Only allow a task to be preempted once cpufeatures have been
>> +	 * enabled.
>> +	 */
>> +	if (system_capabilities_finalized())
>> +		preempt_schedule_irq();
>> +}
>> +
>>  /*
>>   * Handle IRQ/context state management when exiting to kernel mode.
>>   * After this function returns it is not safe to call regular kernel code,
>> @@ -72,6 +114,8 @@ static noinstr irqentry_state_t enter_from_kernel_mode(struct pt_regs *regs)
>>  static void noinstr exit_to_kernel_mode(struct pt_regs *regs,
>>  					irqentry_state_t state)
>>  {
>> +	arm64_preempt_schedule_irq();
> 
> This is broken; exit_to_kernel_mode() is called for any non-NMI return
> excpetion return to the kernel, and this doesn't check that interrupts
> were enabled in the context the exception was taken from.
> 
> This will preempt in cases where we should not, e.g. if we WARN() in a section with
> IRQs disabled.
> 
> Mark.
>
diff mbox series

Patch

diff --git a/arch/arm64/kernel/entry-common.c b/arch/arm64/kernel/entry-common.c
index 137481a3f0fa..e0380812d71e 100644
--- a/arch/arm64/kernel/entry-common.c
+++ b/arch/arm64/kernel/entry-common.c
@@ -61,6 +61,48 @@  static noinstr irqentry_state_t enter_from_kernel_mode(struct pt_regs *regs)
 	return ret;
 }
 
+#ifdef CONFIG_PREEMPT_DYNAMIC
+DEFINE_STATIC_KEY_TRUE(sk_dynamic_irqentry_exit_cond_resched);
+#define need_irq_preemption() \
+	(static_branch_unlikely(&sk_dynamic_irqentry_exit_cond_resched))
+#else
+#define need_irq_preemption()	(IS_ENABLED(CONFIG_PREEMPTION))
+#endif
+
+static void __sched arm64_preempt_schedule_irq(void)
+{
+	if (!need_irq_preemption())
+		return;
+
+	/*
+	 * Note: thread_info::preempt_count includes both thread_info::count
+	 * and thread_info::need_resched, and is not equivalent to
+	 * preempt_count().
+	 */
+	if (READ_ONCE(current_thread_info()->preempt_count) != 0)
+		return;
+
+	/*
+	 * DAIF.DA are cleared at the start of IRQ/FIQ handling, and when GIC
+	 * priority masking is used the GIC irqchip driver will clear DAIF.IF
+	 * using gic_arch_enable_irqs() for normal IRQs. If anything is set in
+	 * DAIF we must have handled an NMI, so skip preemption.
+	 */
+	if (system_uses_irq_prio_masking() && read_sysreg(daif))
+		return;
+
+	/*
+	 * Preempting a task from an IRQ means we leave copies of PSTATE
+	 * on the stack. cpufeature's enable calls may modify PSTATE, but
+	 * resuming one of these preempted tasks would undo those changes.
+	 *
+	 * Only allow a task to be preempted once cpufeatures have been
+	 * enabled.
+	 */
+	if (system_capabilities_finalized())
+		preempt_schedule_irq();
+}
+
 /*
  * Handle IRQ/context state management when exiting to kernel mode.
  * After this function returns it is not safe to call regular kernel code,
@@ -72,6 +114,8 @@  static noinstr irqentry_state_t enter_from_kernel_mode(struct pt_regs *regs)
 static void noinstr exit_to_kernel_mode(struct pt_regs *regs,
 					irqentry_state_t state)
 {
+	arm64_preempt_schedule_irq();
+
 	mte_check_tfsr_exit();
 
 	lockdep_assert_irqs_disabled();
@@ -257,48 +301,6 @@  static void noinstr arm64_exit_el1_dbg(struct pt_regs *regs,
 		lockdep_hardirqs_on(CALLER_ADDR0);
 }
 
-#ifdef CONFIG_PREEMPT_DYNAMIC
-DEFINE_STATIC_KEY_TRUE(sk_dynamic_irqentry_exit_cond_resched);
-#define need_irq_preemption() \
-	(static_branch_unlikely(&sk_dynamic_irqentry_exit_cond_resched))
-#else
-#define need_irq_preemption()	(IS_ENABLED(CONFIG_PREEMPTION))
-#endif
-
-static void __sched arm64_preempt_schedule_irq(void)
-{
-	if (!need_irq_preemption())
-		return;
-
-	/*
-	 * Note: thread_info::preempt_count includes both thread_info::count
-	 * and thread_info::need_resched, and is not equivalent to
-	 * preempt_count().
-	 */
-	if (READ_ONCE(current_thread_info()->preempt_count) != 0)
-		return;
-
-	/*
-	 * DAIF.DA are cleared at the start of IRQ/FIQ handling, and when GIC
-	 * priority masking is used the GIC irqchip driver will clear DAIF.IF
-	 * using gic_arch_enable_irqs() for normal IRQs. If anything is set in
-	 * DAIF we must have handled an NMI, so skip preemption.
-	 */
-	if (system_uses_irq_prio_masking() && read_sysreg(daif))
-		return;
-
-	/*
-	 * Preempting a task from an IRQ means we leave copies of PSTATE
-	 * on the stack. cpufeature's enable calls may modify PSTATE, but
-	 * resuming one of these preempted tasks would undo those changes.
-	 *
-	 * Only allow a task to be preempted once cpufeatures have been
-	 * enabled.
-	 */
-	if (system_capabilities_finalized())
-		preempt_schedule_irq();
-}
-
 static void do_interrupt_handler(struct pt_regs *regs,
 				 void (*handler)(struct pt_regs *))
 {
@@ -567,8 +569,6 @@  static __always_inline void __el1_irq(struct pt_regs *regs,
 	do_interrupt_handler(regs, handler);
 	irq_exit_rcu();
 
-	arm64_preempt_schedule_irq();
-
 	exit_to_kernel_mode(regs, state);
 }
 static void noinstr el1_interrupt(struct pt_regs *regs,