diff mbox series

[v1,5/5] PM: sleep: Spread out async kworker threads during dpm_resume*() phases

Message ID 20241114220921.2529905-6-saravanak@google.com (mailing list archive)
State New
Headers show
Series Optimize async device suspend/resume | expand

Commit Message

Saravana Kannan Nov. 14, 2024, 10:09 p.m. UTC
As of today, the scheduler doesn't spread out all the kworker threads
across all the available CPUs during suspend/resume. This causes
significant resume latency during the dpm_resume*() phases.

System resume latency is a very user-visible event. Reducing the
latency is more important than trying to be energy aware during that
period.

Since there are no userspace processes running during this time and
this is a very short time window, we can simply disable EAS during
resume so that the parallel resume of the devices is spread across all
the CPUs.

On a Pixel 6, averaging over 100 suspend/resume cycles, the new logic
plus disabling EAS for resume yields significant improvements:
+---------------------------+-----------+------------+------------------+
| Phase			    | Old full sync | New full async | % change |
|			    |		    | + EAS disabled |		|
+---------------------------+-----------+------------+------------------+
| Total dpm_suspend*() time |        107 ms |          62 ms |     -42% |
+---------------------------+-----------+------------+------------------+
| Total dpm_resume*() time  |         75 ms |          61 ms |     -19% |
+---------------------------+-----------+------------+------------------+
| Sum			    |        182 ms |         123 ms |     -32% |
+---------------------------+-----------+------------+------------------+

Signed-off-by: Saravana Kannan <saravanak@google.com>
---
 kernel/power/suspend.c  | 16 ++++++++++++++++
 kernel/sched/topology.c | 13 +++++++++++++
 2 files changed, 29 insertions(+)

Comments

Saravana Kannan Nov. 15, 2024, 5:25 a.m. UTC | #1
On Thu, Nov 14, 2024 at 2:09 PM Saravana Kannan <saravanak@google.com> wrote:
>
> As of today, the scheduler doesn't spread out all the kworker threads
> across all the available CPUs during suspend/resume. This causes
> significant resume latency during the dpm_resume*() phases.
>
> System resume latency is a very user-visible event. Reducing the
> latency is more important than trying to be energy aware during that
> period.
>
> Since there are no userspace processes running during this time and
> this is a very short time window, we can simply disable EAS during
> resume so that the parallel resume of the devices is spread across all
> the CPUs.
>
> On a Pixel 6, averaging over 100 suspend/resume cycles, the new logic
> plus disabling EAS for resume yields significant improvements:
> +---------------------------+-----------+------------+------------------+
> | Phase                     | Old full sync | New full async | % change |
> |                           |               | + EAS disabled |          |
> +---------------------------+-----------+------------+------------------+
> | Total dpm_suspend*() time |        107 ms |          62 ms |     -42% |
> +---------------------------+-----------+------------+------------------+
> | Total dpm_resume*() time  |         75 ms |          61 ms |     -19% |
> +---------------------------+-----------+------------+------------------+
> | Sum                       |        182 ms |         123 ms |     -32% |
> +---------------------------+-----------+------------+------------------+
>
> Signed-off-by: Saravana Kannan <saravanak@google.com>
> ---
>  kernel/power/suspend.c  | 16 ++++++++++++++++
>  kernel/sched/topology.c | 13 +++++++++++++
>  2 files changed, 29 insertions(+)
>
> diff --git a/kernel/power/suspend.c b/kernel/power/suspend.c
> index 09f8397bae15..7304dc39958f 100644
> --- a/kernel/power/suspend.c
> +++ b/kernel/power/suspend.c
> @@ -393,6 +393,12 @@ void __weak arch_suspend_enable_irqs(void)
>         local_irq_enable();
>  }
>
> +/*
> + * Intentionally not part of a header file to avoid risk of abuse by other
> + * drivers.
> + */
> +void sched_set_energy_aware(unsigned int enable);
> +
>  /**
>   * suspend_enter - Make the system enter the given sleep state.
>   * @state: System sleep state to enter.
> @@ -468,6 +474,15 @@ static int suspend_enter(suspend_state_t state, bool *wakeup)
>
>   Platform_wake:
>         platform_resume_noirq(state);
> +       /*
> +        * We do this only for resume instead of suspend and resume for these
> +        * reasons:
> +        * - Performance is more important than power for resume.
> +        * - Power spent entering suspend is more important for suspend. Also,
> +        *   stangely, disabling EAS was making suspent a few milliseconds
> +        *   slower in my testing.
> +        */
> +       sched_set_energy_aware(0);
>         dpm_resume_noirq(PMSG_RESUME);
>
>   Platform_early_resume:
> @@ -520,6 +535,7 @@ int suspend_devices_and_enter(suspend_state_t state)
>   Resume_devices:
>         suspend_test_start();
>         dpm_resume_end(PMSG_RESUME);
> +       sched_set_energy_aware(1);
>         suspend_test_finish("resume devices");
>         trace_suspend_resume(TPS("resume_console"), state, true);
>         resume_console();
> diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
> index 9748a4c8d668..c069c0b17cbf 100644
> --- a/kernel/sched/topology.c
> +++ b/kernel/sched/topology.c
> @@ -284,6 +284,19 @@ void rebuild_sched_domains_energy(void)
>         mutex_unlock(&sched_energy_mutex);
>  }
>
> +void sched_set_energy_aware(unsigned int enable)

  CC      kernel/sched/build_utility.o
In file included from kernel/sched/build_utility.c:88:
kernel/sched/topology.c:287:6: warning: no previous prototype for
‘sched_set_energy_aware’ [-Wmissing-prototypes]
  287 | void sched_set_energy_aware(unsigned int enable)
      |      ^~~~~~~~~~~~~~~~~~~~~~

Peter/Vincent,

I noticed that I'm getting a warning for this line. But I'm not sure
what to do about it. I intentionally didn't put this in a header file
because I'm guessing we don't want to make this available to
drivers/frameworks in general.

Let me know how you want me to handle this.

-Saravana

> +{
> +       int state;
> +
> +       if (!sched_is_eas_possible(cpu_active_mask))
> +               return;
> +
> +       sysctl_sched_energy_aware = enable;
> +       state = static_branch_unlikely(&sched_energy_present);
> +       if (state != sysctl_sched_energy_aware)
> +               rebuild_sched_domains_energy();
> +}
> +
>  #ifdef CONFIG_PROC_SYSCTL
>  static int sched_energy_aware_handler(const struct ctl_table *table, int write,
>                 void *buffer, size_t *lenp, loff_t *ppos)
> --
> 2.47.0.338.g60cca15819-goog
>
Geert Uytterhoeven Nov. 15, 2024, 9:25 a.m. UTC | #2
Hi Saravana,

On Fri, Nov 15, 2024 at 6:25 AM Saravana Kannan <saravanak@google.com> wrote:
> On Thu, Nov 14, 2024 at 2:09 PM Saravana Kannan <saravanak@google.com> wrote:
> > As of today, the scheduler doesn't spread out all the kworker threads
> > across all the available CPUs during suspend/resume. This causes
> > significant resume latency during the dpm_resume*() phases.
> >
> > System resume latency is a very user-visible event. Reducing the
> > latency is more important than trying to be energy aware during that
> > period.
> >
> > Since there are no userspace processes running during this time and
> > this is a very short time window, we can simply disable EAS during
> > resume so that the parallel resume of the devices is spread across all
> > the CPUs.
> >
> > On a Pixel 6, averaging over 100 suspend/resume cycles, the new logic
> > plus disabling EAS for resume yields significant improvements:
> > +---------------------------+-----------+------------+------------------+
> > | Phase                     | Old full sync | New full async | % change |
> > |                           |               | + EAS disabled |          |
> > +---------------------------+-----------+------------+------------------+
> > | Total dpm_suspend*() time |        107 ms |          62 ms |     -42% |
> > +---------------------------+-----------+------------+------------------+
> > | Total dpm_resume*() time  |         75 ms |          61 ms |     -19% |
> > +---------------------------+-----------+------------+------------------+
> > | Sum                       |        182 ms |         123 ms |     -32% |
> > +---------------------------+-----------+------------+------------------+
> >
> > Signed-off-by: Saravana Kannan <saravanak@google.com>
> > ---
> >  kernel/power/suspend.c  | 16 ++++++++++++++++
> >  kernel/sched/topology.c | 13 +++++++++++++
> >  2 files changed, 29 insertions(+)
> >
> > diff --git a/kernel/power/suspend.c b/kernel/power/suspend.c
> > index 09f8397bae15..7304dc39958f 100644
> > --- a/kernel/power/suspend.c
> > +++ b/kernel/power/suspend.c
> > @@ -393,6 +393,12 @@ void __weak arch_suspend_enable_irqs(void)
> >         local_irq_enable();
> >  }
> >
> > +/*
> > + * Intentionally not part of a header file to avoid risk of abuse by other
> > + * drivers.
> > + */
> > +void sched_set_energy_aware(unsigned int enable);
> > +
> >  /**
> >   * suspend_enter - Make the system enter the given sleep state.
> >   * @state: System sleep state to enter.
> > @@ -468,6 +474,15 @@ static int suspend_enter(suspend_state_t state, bool *wakeup)
> >
> >   Platform_wake:
> >         platform_resume_noirq(state);
> > +       /*
> > +        * We do this only for resume instead of suspend and resume for these
> > +        * reasons:
> > +        * - Performance is more important than power for resume.
> > +        * - Power spent entering suspend is more important for suspend. Also,
> > +        *   stangely, disabling EAS was making suspent a few milliseconds
> > +        *   slower in my testing.
> > +        */
> > +       sched_set_energy_aware(0);
> >         dpm_resume_noirq(PMSG_RESUME);
> >
> >   Platform_early_resume:
> > @@ -520,6 +535,7 @@ int suspend_devices_and_enter(suspend_state_t state)
> >   Resume_devices:
> >         suspend_test_start();
> >         dpm_resume_end(PMSG_RESUME);
> > +       sched_set_energy_aware(1);
> >         suspend_test_finish("resume devices");
> >         trace_suspend_resume(TPS("resume_console"), state, true);
> >         resume_console();
> > diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
> > index 9748a4c8d668..c069c0b17cbf 100644
> > --- a/kernel/sched/topology.c
> > +++ b/kernel/sched/topology.c
> > @@ -284,6 +284,19 @@ void rebuild_sched_domains_energy(void)
> >         mutex_unlock(&sched_energy_mutex);
> >  }
> >
> > +void sched_set_energy_aware(unsigned int enable)
>
>   CC      kernel/sched/build_utility.o
> In file included from kernel/sched/build_utility.c:88:
> kernel/sched/topology.c:287:6: warning: no previous prototype for
> ‘sched_set_energy_aware’ [-Wmissing-prototypes]
>   287 | void sched_set_energy_aware(unsigned int enable)
>       |      ^~~~~~~~~~~~~~~~~~~~~~
>
> Peter/Vincent,
>
> I noticed that I'm getting a warning for this line. But I'm not sure
> what to do about it. I intentionally didn't put this in a header file
> because I'm guessing we don't want to make this available to
> drivers/frameworks in general.
>
> Let me know how you want me to handle this.

Put the prototype in kernel/sched/sched.h, and include
../sched/sched.h from kernel/power/suspend.c?

Gr{oetje,eeting}s,

                        Geert
Vincent Guittot Nov. 15, 2024, 3:30 p.m. UTC | #3
On Fri, 15 Nov 2024 at 06:25, Saravana Kannan <saravanak@google.com> wrote:
>
> On Thu, Nov 14, 2024 at 2:09 PM Saravana Kannan <saravanak@google.com> wrote:
> >
> > As of today, the scheduler doesn't spread out all the kworker threads
> > across all the available CPUs during suspend/resume. This causes
> > significant resume latency during the dpm_resume*() phases.
> >
> > System resume latency is a very user-visible event. Reducing the
> > latency is more important than trying to be energy aware during that
> > period.
> >
> > Since there are no userspace processes running during this time and
> > this is a very short time window, we can simply disable EAS during
> > resume so that the parallel resume of the devices is spread across all
> > the CPUs.
> >
> > On a Pixel 6, averaging over 100 suspend/resume cycles, the new logic
> > plus disabling EAS for resume yields significant improvements:
> > +---------------------------+-----------+------------+------------------+
> > | Phase                     | Old full sync | New full async | % change |
> > |                           |               | + EAS disabled |          |
> > +---------------------------+-----------+------------+------------------+
> > | Total dpm_suspend*() time |        107 ms |          62 ms |     -42% |
> > +---------------------------+-----------+------------+------------------+
> > | Total dpm_resume*() time  |         75 ms |          61 ms |     -19% |
> > +---------------------------+-----------+------------+------------------+
> > | Sum                       |        182 ms |         123 ms |     -32% |
> > +---------------------------+-----------+------------+------------------+
> >
> > Signed-off-by: Saravana Kannan <saravanak@google.com>
> > ---
> >  kernel/power/suspend.c  | 16 ++++++++++++++++
> >  kernel/sched/topology.c | 13 +++++++++++++
> >  2 files changed, 29 insertions(+)
> >
> > diff --git a/kernel/power/suspend.c b/kernel/power/suspend.c
> > index 09f8397bae15..7304dc39958f 100644
> > --- a/kernel/power/suspend.c
> > +++ b/kernel/power/suspend.c
> > @@ -393,6 +393,12 @@ void __weak arch_suspend_enable_irqs(void)
> >         local_irq_enable();
> >  }
> >
> > +/*
> > + * Intentionally not part of a header file to avoid risk of abuse by other
> > + * drivers.
> > + */
> > +void sched_set_energy_aware(unsigned int enable);

extern void sched_set_energy_aware(unsigned int enable);

clear the warning

> > +
> >  /**
> >   * suspend_enter - Make the system enter the given sleep state.
> >   * @state: System sleep state to enter.
> > @@ -468,6 +474,15 @@ static int suspend_enter(suspend_state_t state, bool *wakeup)
> >
> >   Platform_wake:
> >         platform_resume_noirq(state);
> > +       /*
> > +        * We do this only for resume instead of suspend and resume for these
> > +        * reasons:
> > +        * - Performance is more important than power for resume.
> > +        * - Power spent entering suspend is more important for suspend. Also,
> > +        *   stangely, disabling EAS was making suspent a few milliseconds
> > +        *   slower in my testing.
> > +        */
> > +       sched_set_energy_aware(0);
> >         dpm_resume_noirq(PMSG_RESUME);
> >
> >   Platform_early_resume:
> > @@ -520,6 +535,7 @@ int suspend_devices_and_enter(suspend_state_t state)
> >   Resume_devices:
> >         suspend_test_start();
> >         dpm_resume_end(PMSG_RESUME);
> > +       sched_set_energy_aware(1);
> >         suspend_test_finish("resume devices");
> >         trace_suspend_resume(TPS("resume_console"), state, true);
> >         resume_console();
> > diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
> > index 9748a4c8d668..c069c0b17cbf 100644
> > --- a/kernel/sched/topology.c
> > +++ b/kernel/sched/topology.c
> > @@ -284,6 +284,19 @@ void rebuild_sched_domains_energy(void)
> >         mutex_unlock(&sched_energy_mutex);
> >  }
> >
> > +void sched_set_energy_aware(unsigned int enable)
>
>   CC      kernel/sched/build_utility.o
> In file included from kernel/sched/build_utility.c:88:
> kernel/sched/topology.c:287:6: warning: no previous prototype for
> ‘sched_set_energy_aware’ [-Wmissing-prototypes]
>   287 | void sched_set_energy_aware(unsigned int enable)
>       |      ^~~~~~~~~~~~~~~~~~~~~~
>
> Peter/Vincent,
>
> I noticed that I'm getting a warning for this line. But I'm not sure
> what to do about it. I intentionally didn't put this in a header file
> because I'm guessing we don't want to make this available to
> drivers/frameworks in general.
>
> Let me know how you want me to handle this.
>
> -Saravana
>
> > +{
> > +       int state;
> > +
> > +       if (!sched_is_eas_possible(cpu_active_mask))
> > +               return;
> > +
> > +       sysctl_sched_energy_aware = enable;
> > +       state = static_branch_unlikely(&sched_energy_present);
> > +       if (state != sysctl_sched_energy_aware)
> > +               rebuild_sched_domains_energy();
> > +}
> > +
> >  #ifdef CONFIG_PROC_SYSCTL
> >  static int sched_energy_aware_handler(const struct ctl_table *table, int write,
> >                 void *buffer, size_t *lenp, loff_t *ppos)
> > --
> > 2.47.0.338.g60cca15819-goog
> >
Vincent Guittot Nov. 15, 2024, 4:12 p.m. UTC | #4
On Thu, 14 Nov 2024 at 23:09, Saravana Kannan <saravanak@google.com> wrote:
>
> As of today, the scheduler doesn't spread out all the kworker threads
> across all the available CPUs during suspend/resume. This causes
> significant resume latency during the dpm_resume*() phases.
>
> System resume latency is a very user-visible event. Reducing the
> latency is more important than trying to be energy aware during that
> period.
>
> Since there are no userspace processes running during this time and
> this is a very short time window, we can simply disable EAS during
> resume so that the parallel resume of the devices is spread across all
> the CPUs.
>
> On a Pixel 6, averaging over 100 suspend/resume cycles, the new logic
> plus disabling EAS for resume yields significant improvements:
> +---------------------------+-----------+------------+------------------+
> | Phase                     | Old full sync | New full async | % change |
> |                           |               | + EAS disabled |          |
> +---------------------------+-----------+------------+------------------+
> | Total dpm_suspend*() time |        107 ms |          62 ms |     -42% |
> +---------------------------+-----------+------------+------------------+
> | Total dpm_resume*() time  |         75 ms |          61 ms |     -19% |
> +---------------------------+-----------+------------+------------------+
> | Sum                       |        182 ms |         123 ms |     -32% |
> +---------------------------+-----------+------------+------------------+

in cover letter you have figures for
 - Old full sync
 - New full async
 - New full async  + EAS disabled

you should better use the figures for  New full async vs New full
async  + EAS disabled to show EAS disabled impact

I would be interested to get figures about the impact of disabling it
during full suspend sequence as I'm not convince that it's worth the
complexity especially with fix OPP during suspend

>
> Signed-off-by: Saravana Kannan <saravanak@google.com>
> ---
>  kernel/power/suspend.c  | 16 ++++++++++++++++
>  kernel/sched/topology.c | 13 +++++++++++++
>  2 files changed, 29 insertions(+)
>
> diff --git a/kernel/power/suspend.c b/kernel/power/suspend.c
> index 09f8397bae15..7304dc39958f 100644
> --- a/kernel/power/suspend.c
> +++ b/kernel/power/suspend.c
> @@ -393,6 +393,12 @@ void __weak arch_suspend_enable_irqs(void)
>         local_irq_enable();
>  }
>
> +/*
> + * Intentionally not part of a header file to avoid risk of abuse by other
> + * drivers.
> + */
> +void sched_set_energy_aware(unsigned int enable);
> +
>  /**
>   * suspend_enter - Make the system enter the given sleep state.
>   * @state: System sleep state to enter.
> @@ -468,6 +474,15 @@ static int suspend_enter(suspend_state_t state, bool *wakeup)
>
>   Platform_wake:
>         platform_resume_noirq(state);
> +       /*
> +        * We do this only for resume instead of suspend and resume for these
> +        * reasons:
> +        * - Performance is more important than power for resume.
> +        * - Power spent entering suspend is more important for suspend. Also,
> +        *   stangely, disabling EAS was making suspent a few milliseconds
> +        *   slower in my testing.
> +        */
> +       sched_set_energy_aware(0);
>         dpm_resume_noirq(PMSG_RESUME);
>
>   Platform_early_resume:
> @@ -520,6 +535,7 @@ int suspend_devices_and_enter(suspend_state_t state)
>   Resume_devices:
>         suspend_test_start();
>         dpm_resume_end(PMSG_RESUME);
> +       sched_set_energy_aware(1);

If we end up having a special scheduling mode during suspend, we
should make the function more generic and not only EAS/ smartphone
specific

Like a sched_suspend and sched_resume

>         suspend_test_finish("resume devices");
>         trace_suspend_resume(TPS("resume_console"), state, true);
>         resume_console();
> diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
> index 9748a4c8d668..c069c0b17cbf 100644
> --- a/kernel/sched/topology.c
> +++ b/kernel/sched/topology.c
> @@ -284,6 +284,19 @@ void rebuild_sched_domains_energy(void)
>         mutex_unlock(&sched_energy_mutex);
>  }
>
> +void sched_set_energy_aware(unsigned int enable)

This is a copy/paste of sched_energy_aware_handler() below, we should
have 1 helper for both

> +{
> +       int state;
> +
> +       if (!sched_is_eas_possible(cpu_active_mask))
> +               return;
> +
> +       sysctl_sched_energy_aware = enable;
> +       state = static_branch_unlikely(&sched_energy_present);
> +       if (state != sysctl_sched_energy_aware)
> +               rebuild_sched_domains_energy();
> +}
> +
>  #ifdef CONFIG_PROC_SYSCTL
>  static int sched_energy_aware_handler(const struct ctl_table *table, int write,
>                 void *buffer, size_t *lenp, loff_t *ppos)
> --
> 2.47.0.338.g60cca15819-goog
>
Saravana Kannan Nov. 15, 2024, 6:33 p.m. UTC | #5
On Fri, Nov 15, 2024 at 8:13 AM Vincent Guittot
<vincent.guittot@linaro.org> wrote:
>
> On Thu, 14 Nov 2024 at 23:09, Saravana Kannan <saravanak@google.com> wrote:
> >
> > As of today, the scheduler doesn't spread out all the kworker threads
> > across all the available CPUs during suspend/resume. This causes
> > significant resume latency during the dpm_resume*() phases.
> >
> > System resume latency is a very user-visible event. Reducing the
> > latency is more important than trying to be energy aware during that
> > period.
> >
> > Since there are no userspace processes running during this time and
> > this is a very short time window, we can simply disable EAS during
> > resume so that the parallel resume of the devices is spread across all
> > the CPUs.
> >
> > On a Pixel 6, averaging over 100 suspend/resume cycles, the new logic
> > plus disabling EAS for resume yields significant improvements:
> > +---------------------------+-----------+------------+------------------+
> > | Phase                     | Old full sync | New full async | % change |
> > |                           |               | + EAS disabled |          |
> > +---------------------------+-----------+------------+------------------+
> > | Total dpm_suspend*() time |        107 ms |          62 ms |     -42% |
> > +---------------------------+-----------+------------+------------------+
> > | Total dpm_resume*() time  |         75 ms |          61 ms |     -19% |
> > +---------------------------+-----------+------------+------------------+
> > | Sum                       |        182 ms |         123 ms |     -32% |
> > +---------------------------+-----------+------------+------------------+
>
> in cover letter you have figures for
>  - Old full sync
>  - New full async
>  - New full async  + EAS disabled
>
> you should better use the figures for  New full async vs New full
> async  + EAS disabled to show EAS disabled impact

I do give those numbers in the commit text of each patch making the changes.

Patch 4 commit text shows how it's improving things compared to the
older logic full sync (this is the baseline) - resume is 1% faster.
Patch 5 commit text shows you how disabling EAS is improving numbers
compared to baseline - resume 19% faster.

So, yeah, all the numbers are there in one of these emails. Patch 5
(which is the only one touching EAS) is the one that has the
comparison you are asking for.

> I would be interested to get figures about the impact of disabling it
> during full suspend sequence as I'm not convince that it's worth the
> complexity especially with fix OPP during suspend

1. Device suspend actually got worse by 5ms or so. I already provided that.

2. As I said in the Patch 5, suspend is more about reducing the energy
going into suspend. It's a balance of how quick you can be to how much
power you use to be quick. So, disabling EAS across all of
suspend/resume will have a huge impact on power because userspace is
still running, there are a ton of threads and userspace could get
preempted between disabling suspend and kicking off suspend. Lots of
obvious power concerns overall.

Thanks,
Saravana

>
> >
> > Signed-off-by: Saravana Kannan <saravanak@google.com>
> > ---
> >  kernel/power/suspend.c  | 16 ++++++++++++++++
> >  kernel/sched/topology.c | 13 +++++++++++++
> >  2 files changed, 29 insertions(+)
> >
> > diff --git a/kernel/power/suspend.c b/kernel/power/suspend.c
> > index 09f8397bae15..7304dc39958f 100644
> > --- a/kernel/power/suspend.c
> > +++ b/kernel/power/suspend.c
> > @@ -393,6 +393,12 @@ void __weak arch_suspend_enable_irqs(void)
> >         local_irq_enable();
> >  }
> >
> > +/*
> > + * Intentionally not part of a header file to avoid risk of abuse by other
> > + * drivers.
> > + */
> > +void sched_set_energy_aware(unsigned int enable);
> > +
> >  /**
> >   * suspend_enter - Make the system enter the given sleep state.
> >   * @state: System sleep state to enter.
> > @@ -468,6 +474,15 @@ static int suspend_enter(suspend_state_t state, bool *wakeup)
> >
> >   Platform_wake:
> >         platform_resume_noirq(state);
> > +       /*
> > +        * We do this only for resume instead of suspend and resume for these
> > +        * reasons:
> > +        * - Performance is more important than power for resume.
> > +        * - Power spent entering suspend is more important for suspend. Also,
> > +        *   stangely, disabling EAS was making suspent a few milliseconds
> > +        *   slower in my testing.
> > +        */
> > +       sched_set_energy_aware(0);
> >         dpm_resume_noirq(PMSG_RESUME);
> >
> >   Platform_early_resume:
> > @@ -520,6 +535,7 @@ int suspend_devices_and_enter(suspend_state_t state)
> >   Resume_devices:
> >         suspend_test_start();
> >         dpm_resume_end(PMSG_RESUME);
> > +       sched_set_energy_aware(1);
>
> If we end up having a special scheduling mode during suspend, we
> should make the function more generic and not only EAS/ smartphone
> specific
>
> Like a sched_suspend and sched_resume
>
> >         suspend_test_finish("resume devices");
> >         trace_suspend_resume(TPS("resume_console"), state, true);
> >         resume_console();
> > diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
> > index 9748a4c8d668..c069c0b17cbf 100644
> > --- a/kernel/sched/topology.c
> > +++ b/kernel/sched/topology.c
> > @@ -284,6 +284,19 @@ void rebuild_sched_domains_energy(void)
> >         mutex_unlock(&sched_energy_mutex);
> >  }
> >
> > +void sched_set_energy_aware(unsigned int enable)
>
> This is a copy/paste of sched_energy_aware_handler() below, we should
> have 1 helper for both
>
> > +{
> > +       int state;
> > +
> > +       if (!sched_is_eas_possible(cpu_active_mask))
> > +               return;
> > +
> > +       sysctl_sched_energy_aware = enable;
> > +       state = static_branch_unlikely(&sched_energy_present);
> > +       if (state != sysctl_sched_energy_aware)
> > +               rebuild_sched_domains_energy();
> > +}
> > +
> >  #ifdef CONFIG_PROC_SYSCTL
> >  static int sched_energy_aware_handler(const struct ctl_table *table, int write,
> >                 void *buffer, size_t *lenp, loff_t *ppos)
> > --
> > 2.47.0.338.g60cca15819-goog
> >
kernel test robot Nov. 17, 2024, 12:34 a.m. UTC | #6
Hi Saravana,

kernel test robot noticed the following build errors:

[auto build test ERROR on rafael-pm/linux-next]
[also build test ERROR on rafael-pm/bleeding-edge tip/sched/core amd-pstate/linux-next amd-pstate/bleeding-edge linus/master v6.12-rc7 next-20241115]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Saravana-Kannan/PM-sleep-Fix-runtime-PM-issue-in-dpm_resume/20241115-183855
base:   https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git linux-next
patch link:    https://lore.kernel.org/r/20241114220921.2529905-6-saravanak%40google.com
patch subject: [PATCH v1 5/5] PM: sleep: Spread out async kworker threads during dpm_resume*() phases
config: arm-s3c6400_defconfig (https://download.01.org/0day-ci/archive/20241117/202411170802.SnHuptE7-lkp@intel.com/config)
compiler: arm-linux-gnueabi-gcc (GCC) 14.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20241117/202411170802.SnHuptE7-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202411170802.SnHuptE7-lkp@intel.com/

All errors (new ones prefixed by >>):

   arm-linux-gnueabi-ld: kernel/power/suspend.o: in function `suspend_enter':
>> kernel/power/suspend.c:485:(.text+0x2f4): undefined reference to `sched_set_energy_aware'
>> arm-linux-gnueabi-ld: kernel/power/suspend.c:485:(.text+0x48c): undefined reference to `sched_set_energy_aware'
   arm-linux-gnueabi-ld: kernel/power/suspend.c:485:(.text+0x4c0): undefined reference to `sched_set_energy_aware'
   arm-linux-gnueabi-ld: kernel/power/suspend.c:485:(.text+0x4d0): undefined reference to `sched_set_energy_aware'
   arm-linux-gnueabi-ld: kernel/power/suspend.o: in function `suspend_devices_and_enter':
   kernel/power/suspend.c:538:(.text+0x748): undefined reference to `sched_set_energy_aware'


vim +485 kernel/power/suspend.c

   401	
   402	/**
   403	 * suspend_enter - Make the system enter the given sleep state.
   404	 * @state: System sleep state to enter.
   405	 * @wakeup: Returns information that the sleep state should not be re-entered.
   406	 *
   407	 * This function should be called after devices have been suspended.
   408	 */
   409	static int suspend_enter(suspend_state_t state, bool *wakeup)
   410	{
   411		int error;
   412	
   413		error = platform_suspend_prepare(state);
   414		if (error)
   415			goto Platform_finish;
   416	
   417		error = dpm_suspend_late(PMSG_SUSPEND);
   418		if (error) {
   419			pr_err("late suspend of devices failed\n");
   420			goto Platform_finish;
   421		}
   422		error = platform_suspend_prepare_late(state);
   423		if (error)
   424			goto Devices_early_resume;
   425	
   426		error = dpm_suspend_noirq(PMSG_SUSPEND);
   427		if (error) {
   428			pr_err("noirq suspend of devices failed\n");
   429			goto Platform_early_resume;
   430		}
   431		error = platform_suspend_prepare_noirq(state);
   432		if (error)
   433			goto Platform_wake;
   434	
   435		if (suspend_test(TEST_PLATFORM))
   436			goto Platform_wake;
   437	
   438		if (state == PM_SUSPEND_TO_IDLE) {
   439			s2idle_loop();
   440			goto Platform_wake;
   441		}
   442	
   443		error = pm_sleep_disable_secondary_cpus();
   444		if (error || suspend_test(TEST_CPUS))
   445			goto Enable_cpus;
   446	
   447		arch_suspend_disable_irqs();
   448		BUG_ON(!irqs_disabled());
   449	
   450		system_state = SYSTEM_SUSPEND;
   451	
   452		error = syscore_suspend();
   453		if (!error) {
   454			*wakeup = pm_wakeup_pending();
   455			if (!(suspend_test(TEST_CORE) || *wakeup)) {
   456				trace_suspend_resume(TPS("machine_suspend"),
   457					state, true);
   458				error = suspend_ops->enter(state);
   459				trace_suspend_resume(TPS("machine_suspend"),
   460					state, false);
   461			} else if (*wakeup) {
   462				error = -EBUSY;
   463			}
   464			syscore_resume();
   465		}
   466	
   467		system_state = SYSTEM_RUNNING;
   468	
   469		arch_suspend_enable_irqs();
   470		BUG_ON(irqs_disabled());
   471	
   472	 Enable_cpus:
   473		pm_sleep_enable_secondary_cpus();
   474	
   475	 Platform_wake:
   476		platform_resume_noirq(state);
   477		/*
   478		 * We do this only for resume instead of suspend and resume for these
   479		 * reasons:
   480		 * - Performance is more important than power for resume.
   481		 * - Power spent entering suspend is more important for suspend. Also,
   482		 *   stangely, disabling EAS was making suspent a few milliseconds
   483		 *   slower in my testing.
   484		 */
 > 485		sched_set_energy_aware(0);
   486		dpm_resume_noirq(PMSG_RESUME);
   487	
   488	 Platform_early_resume:
   489		platform_resume_early(state);
   490	
   491	 Devices_early_resume:
   492		dpm_resume_early(PMSG_RESUME);
   493	
   494	 Platform_finish:
   495		platform_resume_finish(state);
   496		return error;
   497	}
   498
kernel test robot Nov. 17, 2024, 1:17 a.m. UTC | #7
Hi Saravana,

kernel test robot noticed the following build errors:

[auto build test ERROR on rafael-pm/linux-next]
[also build test ERROR on rafael-pm/bleeding-edge tip/sched/core amd-pstate/linux-next amd-pstate/bleeding-edge linus/master v6.12-rc7 next-20241115]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Saravana-Kannan/PM-sleep-Fix-runtime-PM-issue-in-dpm_resume/20241115-183855
base:   https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git linux-next
patch link:    https://lore.kernel.org/r/20241114220921.2529905-6-saravanak%40google.com
patch subject: [PATCH v1 5/5] PM: sleep: Spread out async kworker threads during dpm_resume*() phases
config: powerpc-tqm8560_defconfig (https://download.01.org/0day-ci/archive/20241117/202411170920.Pv29JceD-lkp@intel.com/config)
compiler: powerpc-linux-gcc (GCC) 14.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20241117/202411170920.Pv29JceD-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202411170920.Pv29JceD-lkp@intel.com/

All errors (new ones prefixed by >>):

   powerpc-linux-ld: kernel/power/suspend.o: in function `suspend_enter':
   suspend.c:(.text+0x414): undefined reference to `sched_set_energy_aware'
>> powerpc-linux-ld: suspend.c:(.text+0x63c): undefined reference to `sched_set_energy_aware'
   powerpc-linux-ld: kernel/power/suspend.o: in function `suspend_devices_and_enter':
   suspend.c:(.text+0x834): undefined reference to `sched_set_energy_aware'
   powerpc-linux-ld: suspend.c:(.text+0x950): undefined reference to `sched_set_energy_aware'
   powerpc-linux-ld: suspend.c:(.text+0x970): undefined reference to `sched_set_energy_aware'
kernel test robot Nov. 17, 2024, 1:34 p.m. UTC | #8
Hi Saravana,

kernel test robot noticed the following build warnings:

[auto build test WARNING on rafael-pm/linux-next]
[also build test WARNING on rafael-pm/bleeding-edge tip/sched/core amd-pstate/linux-next amd-pstate/bleeding-edge linus/master v6.12-rc7 next-20241115]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Saravana-Kannan/PM-sleep-Fix-runtime-PM-issue-in-dpm_resume/20241115-183855
base:   https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git linux-next
patch link:    https://lore.kernel.org/r/20241114220921.2529905-6-saravanak%40google.com
patch subject: [PATCH v1 5/5] PM: sleep: Spread out async kworker threads during dpm_resume*() phases
config: arm-randconfig-003-20241117 (https://download.01.org/0day-ci/archive/20241117/202411172344.QqFap290-lkp@intel.com/config)
compiler: arm-linux-gnueabi-gcc (GCC) 14.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20241117/202411172344.QqFap290-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202411172344.QqFap290-lkp@intel.com/

All warnings (new ones prefixed by >>):

   In file included from kernel/sched/build_utility.c:88:
>> kernel/sched/topology.c:287:6: warning: no previous prototype for 'sched_set_energy_aware' [-Wmissing-prototypes]
     287 | void sched_set_energy_aware(unsigned int enable)
         |      ^~~~~~~~~~~~~~~~~~~~~~


vim +/sched_set_energy_aware +287 kernel/sched/topology.c

   286	
 > 287	void sched_set_energy_aware(unsigned int enable)
   288	{
   289		int state;
   290	
   291		if (!sched_is_eas_possible(cpu_active_mask))
   292			return;
   293	
   294		sysctl_sched_energy_aware = enable;
   295		state = static_branch_unlikely(&sched_energy_present);
   296		if (state != sysctl_sched_energy_aware)
   297			rebuild_sched_domains_energy();
   298	}
   299
Christian Loehle Nov. 18, 2024, 9:52 a.m. UTC | #9
On 11/14/24 22:09, Saravana Kannan wrote:
> As of today, the scheduler doesn't spread out all the kworker threads
> across all the available CPUs during suspend/resume. This causes
> significant resume latency during the dpm_resume*() phases.
> 
> System resume latency is a very user-visible event. Reducing the
> latency is more important than trying to be energy aware during that
> period.
> 
> Since there are no userspace processes running during this time and
> this is a very short time window, we can simply disable EAS during
> resume so that the parallel resume of the devices is spread across all
> the CPUs.
> 
> On a Pixel 6, averaging over 100 suspend/resume cycles, the new logic
> plus disabling EAS for resume yields significant improvements:
> +---------------------------+-----------+------------+------------------+
> | Phase			    | Old full sync | New full async | % change |
> |			    |		    | + EAS disabled |		|
> +---------------------------+-----------+------------+------------------+
> | Total dpm_suspend*() time |        107 ms |          62 ms |     -42% |
> +---------------------------+-----------+------------+------------------+
> | Total dpm_resume*() time  |         75 ms |          61 ms |     -19% |
> +---------------------------+-----------+------------+------------------+
> | Sum			    |        182 ms |         123 ms |     -32% |
> +---------------------------+-----------+------------+------------------+
> 
> Signed-off-by: Saravana Kannan <saravanak@google.com>
> ---
>  kernel/power/suspend.c  | 16 ++++++++++++++++
>  kernel/sched/topology.c | 13 +++++++++++++
>  2 files changed, 29 insertions(+)
> 
> diff --git a/kernel/power/suspend.c b/kernel/power/suspend.c
> index 09f8397bae15..7304dc39958f 100644
> --- a/kernel/power/suspend.c
> +++ b/kernel/power/suspend.c
> @@ -393,6 +393,12 @@ void __weak arch_suspend_enable_irqs(void)
>  	local_irq_enable();
>  }
>  
> +/*
> + * Intentionally not part of a header file to avoid risk of abuse by other
> + * drivers.
> + */
> +void sched_set_energy_aware(unsigned int enable);
> +
>  /**
>   * suspend_enter - Make the system enter the given sleep state.
>   * @state: System sleep state to enter.
> @@ -468,6 +474,15 @@ static int suspend_enter(suspend_state_t state, bool *wakeup)
>  
>   Platform_wake:
>  	platform_resume_noirq(state);
> +	/*
> +	 * We do this only for resume instead of suspend and resume for these
> +	 * reasons:
> +	 * - Performance is more important than power for resume.
> +	 * - Power spent entering suspend is more important for suspend. Also,
> +	 *   stangely, disabling EAS was making suspent a few milliseconds
> +	 *   slower in my testing.

s/stangely/strangely
s/suspent/suspend
I'd also be curious why that is. Disabling EAS shouldn't be that expensive.
What if you just hack the static branch switch (without the sd rebuild)?

> +	 */
> +	sched_set_energy_aware(0);
>  	dpm_resume_noirq(PMSG_RESUME);
>  
>   Platform_early_resume:
> @@ -520,6 +535,7 @@ int suspend_devices_and_enter(suspend_state_t state)
>   Resume_devices:
>  	suspend_test_start();
>  	dpm_resume_end(PMSG_RESUME);
> +	sched_set_energy_aware(1);
>  	suspend_test_finish("resume devices");
>  	trace_suspend_resume(TPS("resume_console"), state, true);
>  	resume_console();
> diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
> index 9748a4c8d668..c069c0b17cbf 100644
> --- a/kernel/sched/topology.c
> +++ b/kernel/sched/topology.c
> @@ -284,6 +284,19 @@ void rebuild_sched_domains_energy(void)
>  	mutex_unlock(&sched_energy_mutex);
>  }
>  
> +void sched_set_energy_aware(unsigned int enable)

bool enable?

> +{
> +	int state;
> +
> +	if (!sched_is_eas_possible(cpu_active_mask))
> +		return;
> +
> +	sysctl_sched_energy_aware = enable;
> +	state = static_branch_unlikely(&sched_energy_present);
> +	if (state != sysctl_sched_energy_aware)
> +		rebuild_sched_domains_energy();
> +}
> +

This definitely shouldn't just overwrite
sysctl_sched_energy_aware, otherwise you enable EAS
for users that explicitly disabled it.

If it ever comes to other users wanting this we might
need a eas_pause counter so this can be nested, but
let's just hope that's never needed.

Regards,
Christian
Saravana Kannan Nov. 18, 2024, 5:18 p.m. UTC | #10
On Mon, Nov 18, 2024 at 1:52 AM Christian Loehle
<christian.loehle@arm.com> wrote:
>
> On 11/14/24 22:09, Saravana Kannan wrote:
> > As of today, the scheduler doesn't spread out all the kworker threads
> > across all the available CPUs during suspend/resume. This causes
> > significant resume latency during the dpm_resume*() phases.
> >
> > System resume latency is a very user-visible event. Reducing the
> > latency is more important than trying to be energy aware during that
> > period.
> >
> > Since there are no userspace processes running during this time and
> > this is a very short time window, we can simply disable EAS during
> > resume so that the parallel resume of the devices is spread across all
> > the CPUs.
> >
> > On a Pixel 6, averaging over 100 suspend/resume cycles, the new logic
> > plus disabling EAS for resume yields significant improvements:
> > +---------------------------+-----------+------------+------------------+
> > | Phase                           | Old full sync | New full async | % change |
> > |                         |               | + EAS disabled |          |
> > +---------------------------+-----------+------------+------------------+
> > | Total dpm_suspend*() time |        107 ms |          62 ms |     -42% |
> > +---------------------------+-----------+------------+------------------+
> > | Total dpm_resume*() time  |         75 ms |          61 ms |     -19% |
> > +---------------------------+-----------+------------+------------------+
> > | Sum                     |        182 ms |         123 ms |     -32% |
> > +---------------------------+-----------+------------+------------------+
> >
> > Signed-off-by: Saravana Kannan <saravanak@google.com>
> > ---
> >  kernel/power/suspend.c  | 16 ++++++++++++++++
> >  kernel/sched/topology.c | 13 +++++++++++++
> >  2 files changed, 29 insertions(+)
> >
> > diff --git a/kernel/power/suspend.c b/kernel/power/suspend.c
> > index 09f8397bae15..7304dc39958f 100644
> > --- a/kernel/power/suspend.c
> > +++ b/kernel/power/suspend.c
> > @@ -393,6 +393,12 @@ void __weak arch_suspend_enable_irqs(void)
> >       local_irq_enable();
> >  }
> >
> > +/*
> > + * Intentionally not part of a header file to avoid risk of abuse by other
> > + * drivers.
> > + */
> > +void sched_set_energy_aware(unsigned int enable);
> > +
> >  /**
> >   * suspend_enter - Make the system enter the given sleep state.
> >   * @state: System sleep state to enter.
> > @@ -468,6 +474,15 @@ static int suspend_enter(suspend_state_t state, bool *wakeup)
> >
> >   Platform_wake:
> >       platform_resume_noirq(state);
> > +     /*
> > +      * We do this only for resume instead of suspend and resume for these
> > +      * reasons:
> > +      * - Performance is more important than power for resume.
> > +      * - Power spent entering suspend is more important for suspend. Also,
> > +      *   stangely, disabling EAS was making suspent a few milliseconds
> > +      *   slower in my testing.
>
> s/stangely/strangely
> s/suspent/suspend

Will fix it in the next version.

> I'd also be curious why that is. Disabling EAS shouldn't be that expensive.
> What if you just hack the static branch switch (without the sd rebuild)?

I don't think the enabling/disabling is the expensive part. Because I
do it around dpm_resume*() and it helps performance. I tried to see if
I could spot a reason, looking at the trace. But nothing stood out.

My educated guess is that when going into suspend, the "thundering
herd" happens early (all the leaf nodes suspend first) and then peters
out. Whereas, during resume it's a slow ramp up until the "thundering
herd" happens at the end (all the leaf nodes resume last).  Spreading
out the threads immediately (no EAS) probably has a different impact
on these two styles of thundering herds.

>
> > +      */
> > +     sched_set_energy_aware(0);
> >       dpm_resume_noirq(PMSG_RESUME);
> >
> >   Platform_early_resume:
> > @@ -520,6 +535,7 @@ int suspend_devices_and_enter(suspend_state_t state)
> >   Resume_devices:
> >       suspend_test_start();
> >       dpm_resume_end(PMSG_RESUME);
> > +     sched_set_energy_aware(1);
> >       suspend_test_finish("resume devices");
> >       trace_suspend_resume(TPS("resume_console"), state, true);
> >       resume_console();
> > diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
> > index 9748a4c8d668..c069c0b17cbf 100644
> > --- a/kernel/sched/topology.c
> > +++ b/kernel/sched/topology.c
> > @@ -284,6 +284,19 @@ void rebuild_sched_domains_energy(void)
> >       mutex_unlock(&sched_energy_mutex);
> >  }
> >
> > +void sched_set_energy_aware(unsigned int enable)
>
> bool enable?

Will do.

>
> > +{
> > +     int state;
> > +
> > +     if (!sched_is_eas_possible(cpu_active_mask))
> > +             return;
> > +
> > +     sysctl_sched_energy_aware = enable;
> > +     state = static_branch_unlikely(&sched_energy_present);
> > +     if (state != sysctl_sched_energy_aware)
> > +             rebuild_sched_domains_energy();
> > +}
> > +
>
> This definitely shouldn't just overwrite
> sysctl_sched_energy_aware, otherwise you enable EAS
> for users that explicitly disabled it.

Good point. Will fix it in the next version.

Thanks for the review!

-Saravana

>
> If it ever comes to other users wanting this we might
> need a eas_pause counter so this can be nested, but
> let's just hope that's never needed.
>
> Regards,
> Christian
>
diff mbox series

Patch

diff --git a/kernel/power/suspend.c b/kernel/power/suspend.c
index 09f8397bae15..7304dc39958f 100644
--- a/kernel/power/suspend.c
+++ b/kernel/power/suspend.c
@@ -393,6 +393,12 @@  void __weak arch_suspend_enable_irqs(void)
 	local_irq_enable();
 }
 
+/*
+ * Intentionally not part of a header file to avoid risk of abuse by other
+ * drivers.
+ */
+void sched_set_energy_aware(unsigned int enable);
+
 /**
  * suspend_enter - Make the system enter the given sleep state.
  * @state: System sleep state to enter.
@@ -468,6 +474,15 @@  static int suspend_enter(suspend_state_t state, bool *wakeup)
 
  Platform_wake:
 	platform_resume_noirq(state);
+	/*
+	 * We do this only for resume instead of suspend and resume for these
+	 * reasons:
+	 * - Performance is more important than power for resume.
+	 * - Power spent entering suspend is more important for suspend. Also,
+	 *   stangely, disabling EAS was making suspent a few milliseconds
+	 *   slower in my testing.
+	 */
+	sched_set_energy_aware(0);
 	dpm_resume_noirq(PMSG_RESUME);
 
  Platform_early_resume:
@@ -520,6 +535,7 @@  int suspend_devices_and_enter(suspend_state_t state)
  Resume_devices:
 	suspend_test_start();
 	dpm_resume_end(PMSG_RESUME);
+	sched_set_energy_aware(1);
 	suspend_test_finish("resume devices");
 	trace_suspend_resume(TPS("resume_console"), state, true);
 	resume_console();
diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index 9748a4c8d668..c069c0b17cbf 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -284,6 +284,19 @@  void rebuild_sched_domains_energy(void)
 	mutex_unlock(&sched_energy_mutex);
 }
 
+void sched_set_energy_aware(unsigned int enable)
+{
+	int state;
+
+	if (!sched_is_eas_possible(cpu_active_mask))
+		return;
+
+	sysctl_sched_energy_aware = enable;
+	state = static_branch_unlikely(&sched_energy_present);
+	if (state != sysctl_sched_energy_aware)
+		rebuild_sched_domains_energy();
+}
+
 #ifdef CONFIG_PROC_SYSCTL
 static int sched_energy_aware_handler(const struct ctl_table *table, int write,
 		void *buffer, size_t *lenp, loff_t *ppos)