diff mbox series

[v2] PM / s2idle: Clear _TIF_POLLING_NRFLAG before suspend to idle

Message ID 20200616040442.21515-1-yu.c.chen@intel.com (mailing list archive)
State Superseded, archived
Headers show
Series [v2] PM / s2idle: Clear _TIF_POLLING_NRFLAG before suspend to idle | expand

Commit Message

Chen Yu June 16, 2020, 4:04 a.m. UTC
Suspend to idle was found to not work on Goldmont CPU recently.
And the issue was triggered due to:

1. On Goldmont the CPU in idle can only be woken up via IPIs,
   not POLLING mode:
   Commit 08e237fa56a1 ("x86/cpu: Add workaround for MONITOR
   instruction erratum on Goldmont based CPUs")
2. When the CPU is entering suspend to idle process, the
   _TIF_POLLING_NRFLAG is kept on, due to cpuidle_enter_s2idle()
   doesn't properly match call_cpuidle().
3. Commit b2a02fc43a1f ("smp: Optimize send_call_function_single_ipi()")
   makes use of _TIF_POLLING_NRFLAG to avoid sending IPIs to
   idle CPUs.
4. As a result, some IPIs related functions might not work
   well during suspend to idle on Goldmont. For example, one
   suspected victim:
   tick_unfreeze() -> timekeeping_resume() -> hrtimers_resume()
   -> clock_was_set() -> on_each_cpu() might wait forever,
   because the IPIs will not be sent to the CPUs which are
   sleeping with _TIF_POLLING_NRFLAG set, and Goldmont CPU
   could not be woken up by only setting _TIF_NEED_RESCHED
   on the monitor address.

I don't find a way in Ubuntu to update the firmware of Goldmont
and check if the issue was gone, a fix patch would do no harm.
Clear the _TIF_POLLING_NRFLAG flag before entering suspend to idle,
and let the driver's enter_s2idle() to decide whether to set
_TIF_POLLING_NRFLAG or not. So that to avoid the scenario described
above and keep the context consistent with before. Also adjust
the naming to be consistent with call_cpuidle().

Fixes: b2a02fc43a1f ("smp: Optimize send_call_function_single_ipi()")
Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reported-by: kbuild test robot <lkp@intel.com>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Len Brown <lenb@kernel.org>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
---
v2: According to Peter's review, v1 is racy, if someone already
    set TIF_NEED_RESCHED this patch just clear POLLING and go to sleep.
    Check TIF_NEED_RESCHED before entering suspend to idle and
    adjust the naming to be consistent with call_cpuidle().
--
 drivers/cpuidle/cpuidle.c | 16 +++++++++++++---
 1 file changed, 13 insertions(+), 3 deletions(-)

Comments

Rafael J. Wysocki June 22, 2020, 4:19 p.m. UTC | #1
On Tue, Jun 16, 2020 at 6:03 AM Chen Yu <yu.c.chen@intel.com> wrote:
>
> Suspend to idle was found to not work on Goldmont CPU recently.
> And the issue was triggered due to:
>
> 1. On Goldmont the CPU in idle can only be woken up via IPIs,
>    not POLLING mode:
>    Commit 08e237fa56a1 ("x86/cpu: Add workaround for MONITOR
>    instruction erratum on Goldmont based CPUs")
> 2. When the CPU is entering suspend to idle process, the
>    _TIF_POLLING_NRFLAG is kept on, due to cpuidle_enter_s2idle()
>    doesn't properly match call_cpuidle().
> 3. Commit b2a02fc43a1f ("smp: Optimize send_call_function_single_ipi()")
>    makes use of _TIF_POLLING_NRFLAG to avoid sending IPIs to
>    idle CPUs.
> 4. As a result, some IPIs related functions might not work
>    well during suspend to idle on Goldmont. For example, one
>    suspected victim:
>    tick_unfreeze() -> timekeeping_resume() -> hrtimers_resume()
>    -> clock_was_set() -> on_each_cpu() might wait forever,
>    because the IPIs will not be sent to the CPUs which are
>    sleeping with _TIF_POLLING_NRFLAG set, and Goldmont CPU
>    could not be woken up by only setting _TIF_NEED_RESCHED
>    on the monitor address.
>
> I don't find a way in Ubuntu to update the firmware of Goldmont
> and check if the issue was gone, a fix patch would do no harm.
> Clear the _TIF_POLLING_NRFLAG flag before entering suspend to idle,
> and let the driver's enter_s2idle() to decide whether to set
> _TIF_POLLING_NRFLAG or not. So that to avoid the scenario described
> above and keep the context consistent with before. Also adjust
> the naming to be consistent with call_cpuidle().
>
> Fixes: b2a02fc43a1f ("smp: Optimize send_call_function_single_ipi()")
> Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> Reported-by: kbuild test robot <lkp@intel.com>
> Cc: "Rafael J. Wysocki" <rafael@kernel.org>
> Cc: Len Brown <lenb@kernel.org>
> Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
> Signed-off-by: Chen Yu <yu.c.chen@intel.com>

Peter, any more comments here?

> ---
> v2: According to Peter's review, v1 is racy, if someone already
>     set TIF_NEED_RESCHED this patch just clear POLLING and go to sleep.
>     Check TIF_NEED_RESCHED before entering suspend to idle and
>     adjust the naming to be consistent with call_cpuidle().
> --
>  drivers/cpuidle/cpuidle.c | 16 +++++++++++++---
>  1 file changed, 13 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
> index c149d9e20dfd..b003767abebd 100644
> --- a/drivers/cpuidle/cpuidle.c
> +++ b/drivers/cpuidle/cpuidle.c
> @@ -13,6 +13,7 @@
>  #include <linux/mutex.h>
>  #include <linux/sched.h>
>  #include <linux/sched/clock.h>
> +#include <linux/sched/idle.h>
>  #include <linux/notifier.h>
>  #include <linux/pm_qos.h>
>  #include <linux/cpu.h>
> @@ -133,8 +134,8 @@ int cpuidle_find_deepest_state(struct cpuidle_driver *drv,
>  }
>
>  #ifdef CONFIG_SUSPEND
> -static void enter_s2idle_proper(struct cpuidle_driver *drv,
> -                               struct cpuidle_device *dev, int index)
> +static void s2idle_enter(struct cpuidle_driver *drv,
> +                        struct cpuidle_device *dev, int index)
>  {
>         ktime_t time_start, time_end;
>
> @@ -168,6 +169,15 @@ static void enter_s2idle_proper(struct cpuidle_driver *drv,
>         dev->states_usage[index].s2idle_usage++;
>  }
>
> +static int call_s2idle(struct cpuidle_driver *drv, struct cpuidle_device *dev,
> +                      int index)
> +{
> +       if (!current_clr_polling_and_test())
> +               s2idle_enter(drv, dev, index);
> +
> +       return index;

Is the value returned here used at all?

> +}
> +
>  /**
>   * cpuidle_enter_s2idle - Enter an idle state suitable for suspend-to-idle.
>   * @drv: cpuidle driver for the given CPU.
> @@ -187,7 +197,7 @@ int cpuidle_enter_s2idle(struct cpuidle_driver *drv, struct cpuidle_device *dev)
>          */
>         index = find_deepest_state(drv, dev, U64_MAX, 0, true);
>         if (index > 0)
> -               enter_s2idle_proper(drv, dev, index);
> +               call_s2idle(drv, dev, index);

I'm wondering why this can't be

    if (index > 0 && !current_clr_polling_and_test())
            enter_s2idle_proper(drv, dev, index);

>         return index;
>  }
> --
Chen Yu June 22, 2020, 5:18 p.m. UTC | #2
Hi Rafael,
On Mon, Jun 22, 2020 at 06:19:35PM +0200, Rafael J. Wysocki wrote:
[cut]
> > +{
> > +       if (!current_clr_polling_and_test())
> > +               s2idle_enter(drv, dev, index);
> > +
> > +       return index;
> 
> Is the value returned here used at all?
>
It is not used for now IMO.
> >          */
> >         index = find_deepest_state(drv, dev, U64_MAX, 0, true);
> >         if (index > 0)
> > -               enter_s2idle_proper(drv, dev, index);
> > +               call_s2idle(drv, dev, index);
> 
> I'm wondering why this can't be
> 
>     if (index > 0 && !current_clr_polling_and_test())
>             enter_s2idle_proper(drv, dev, index);
> 
Yes, it should be simpler, but I guess Peter was trying to
make call_s2idle() consistent with call_cpuidle(),
and also s2idle_enter() is analogous to cpuidle_enter().

Thanks,
Chenyu
Peter Zijlstra June 22, 2020, 5:29 p.m. UTC | #3
On Mon, Jun 22, 2020 at 06:19:35PM +0200, Rafael J. Wysocki wrote:
> > Fixes: b2a02fc43a1f ("smp: Optimize send_call_function_single_ipi()")
> > Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> > Reported-by: kbuild test robot <lkp@intel.com>
> > Cc: "Rafael J. Wysocki" <rafael@kernel.org>
> > Cc: Len Brown <lenb@kernel.org>
> > Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
> > Signed-off-by: Chen Yu <yu.c.chen@intel.com>
> 
> Peter, any more comments here?

Only that the whole s2idle stuff could do with a cleanup :-)

> > +static int call_s2idle(struct cpuidle_driver *drv, struct cpuidle_device *dev,
> > +                      int index)
> > +{
> > +       if (!current_clr_polling_and_test())
> > +               s2idle_enter(drv, dev, index);
> > +
> > +       return index;
> 
> Is the value returned here used at all?
> 
> > +}
> > +
> >  /**
> >   * cpuidle_enter_s2idle - Enter an idle state suitable for suspend-to-idle.
> >   * @drv: cpuidle driver for the given CPU.
> > @@ -187,7 +197,7 @@ int cpuidle_enter_s2idle(struct cpuidle_driver *drv, struct cpuidle_device *dev)
> >          */
> >         index = find_deepest_state(drv, dev, U64_MAX, 0, true);
> >         if (index > 0)
> > -               enter_s2idle_proper(drv, dev, index);
> > +               call_s2idle(drv, dev, index);
> 
> I'm wondering why this can't be
> 
>     if (index > 0 && !current_clr_polling_and_test())
>             enter_s2idle_proper(drv, dev, index);

Works for me. Some Wysocki guy wrote much of it, best ask him :-)

The thing that confused me is that all this is way different from the
normal idle path and didn't keep the invariants.

Ideally; much of that gets folded back into the normal patch somehow.
Rafael J. Wysocki June 22, 2020, 7:45 p.m. UTC | #4
On Mon, Jun 22, 2020 at 7:16 PM Chen Yu <yu.c.chen@intel.com> wrote:
>
> Hi Rafael,
> On Mon, Jun 22, 2020 at 06:19:35PM +0200, Rafael J. Wysocki wrote:
> [cut]
> > > +{
> > > +       if (!current_clr_polling_and_test())
> > > +               s2idle_enter(drv, dev, index);
> > > +
> > > +       return index;
> >
> > Is the value returned here used at all?
> >
> It is not used for now IMO.
> > >          */
> > >         index = find_deepest_state(drv, dev, U64_MAX, 0, true);
> > >         if (index > 0)
> > > -               enter_s2idle_proper(drv, dev, index);
> > > +               call_s2idle(drv, dev, index);
> >
> > I'm wondering why this can't be
> >
> >     if (index > 0 && !current_clr_polling_and_test())
> >             enter_s2idle_proper(drv, dev, index);
> >
> Yes, it should be simpler, but I guess Peter was trying to
> make call_s2idle() consistent with call_cpuidle(),
> and also s2idle_enter() is analogous to cpuidle_enter().

So IMO it would be better to do the simplest fix first and then do the
cleanup on top of it.

Thanks!
Chen Yu June 23, 2020, 1:56 a.m. UTC | #5
On Mon, Jun 22, 2020 at 09:45:35PM +0200, Rafael J. Wysocki wrote:
> On Mon, Jun 22, 2020 at 7:16 PM Chen Yu <yu.c.chen@intel.com> wrote:
> >
> > Hi Rafael,
> > On Mon, Jun 22, 2020 at 06:19:35PM +0200, Rafael J. Wysocki wrote:
> > [cut]
> > > > +{
> > > > +       if (!current_clr_polling_and_test())
> > > > +               s2idle_enter(drv, dev, index);
> > > > +
> > > > +       return index;
> > >
> > > Is the value returned here used at all?
> > >
> > It is not used for now IMO.
> > > >          */
> > > >         index = find_deepest_state(drv, dev, U64_MAX, 0, true);
> > > >         if (index > 0)
> > > > -               enter_s2idle_proper(drv, dev, index);
> > > > +               call_s2idle(drv, dev, index);
> > >
> > > I'm wondering why this can't be
> > >
> > >     if (index > 0 && !current_clr_polling_and_test())
> > >             enter_s2idle_proper(drv, dev, index);
> > >
> > Yes, it should be simpler, but I guess Peter was trying to
> > make call_s2idle() consistent with call_cpuidle(),
> > and also s2idle_enter() is analogous to cpuidle_enter().
> 
> So IMO it would be better to do the simplest fix first and then do the
> cleanup on top of it.
>
Okay, I'll do that.

Thanks,
Chenyu
diff mbox series

Patch

diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
index c149d9e20dfd..b003767abebd 100644
--- a/drivers/cpuidle/cpuidle.c
+++ b/drivers/cpuidle/cpuidle.c
@@ -13,6 +13,7 @@ 
 #include <linux/mutex.h>
 #include <linux/sched.h>
 #include <linux/sched/clock.h>
+#include <linux/sched/idle.h>
 #include <linux/notifier.h>
 #include <linux/pm_qos.h>
 #include <linux/cpu.h>
@@ -133,8 +134,8 @@  int cpuidle_find_deepest_state(struct cpuidle_driver *drv,
 }
 
 #ifdef CONFIG_SUSPEND
-static void enter_s2idle_proper(struct cpuidle_driver *drv,
-				struct cpuidle_device *dev, int index)
+static void s2idle_enter(struct cpuidle_driver *drv,
+			 struct cpuidle_device *dev, int index)
 {
 	ktime_t time_start, time_end;
 
@@ -168,6 +169,15 @@  static void enter_s2idle_proper(struct cpuidle_driver *drv,
 	dev->states_usage[index].s2idle_usage++;
 }
 
+static int call_s2idle(struct cpuidle_driver *drv, struct cpuidle_device *dev,
+		       int index)
+{
+	if (!current_clr_polling_and_test())
+		s2idle_enter(drv, dev, index);
+
+	return index;
+}
+
 /**
  * cpuidle_enter_s2idle - Enter an idle state suitable for suspend-to-idle.
  * @drv: cpuidle driver for the given CPU.
@@ -187,7 +197,7 @@  int cpuidle_enter_s2idle(struct cpuidle_driver *drv, struct cpuidle_device *dev)
 	 */
 	index = find_deepest_state(drv, dev, U64_MAX, 0, true);
 	if (index > 0)
-		enter_s2idle_proper(drv, dev, index);
+		call_s2idle(drv, dev, index);
 
 	return index;
 }