Message ID | a079bba5a0e47d6534b307553fc3772d26ce911b.camel@infradead.org (mailing list archive) |
---|---|
State | Accepted |
Commit | 9bb69ba4c177dccaa1f5b5cbdf80b67813328348 |
Headers | show |
Series | acpi_idle: use raw_safe_halt() from acpi_idle_play_dead() | expand |
On Fri, Oct 27, 2023 at 07:36:51PM +0100, David Woodhouse wrote: > From: David Woodhouse <dwmw@amazon.co.uk> > > Xen HVM guests were observed taking triple-faults when attempting to > online a previously offlined vCPU. > > Investigation showed that the fault was coming from a failing call > to lockdep_assert_irqs_disabled(), in load_current_idt() which was > too early in the CPU bringup to actually catch the exception and > report the failure cleanly. > > This was a false positive, caused by acpi_idle_play_dead() setting > the per-cpu hardirqs_enabled flag by calling safe_halt(). Switch it > to use raw_safe_halt() instead, which doesn't do so. > > Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> > --- > We might {also,instead} explicitly set the hardirqs_enabled flag to > zero when bringing up an AP? So I fixed up the idle paths the other day (see all that __cpuidle stuff) but I've not yet gone through the whole hotplug thing :/ This seems right, at this point everything, including RCU is very much gone, any instrumentation is undesired. Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> > > drivers/acpi/processor_idle.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/acpi/processor_idle.c b/drivers/acpi/processor_idle.c > index 3a34a8c425fe..55437f5e0c3a 100644 > --- a/drivers/acpi/processor_idle.c > +++ b/drivers/acpi/processor_idle.c > @@ -592,7 +592,7 @@ static int acpi_idle_play_dead(struct cpuidle_device *dev, int index) > while (1) { > > if (cx->entry_method == ACPI_CSTATE_HALT) > - safe_halt(); > + raw_safe_halt(); > else if (cx->entry_method == ACPI_CSTATE_SYSTEMIO) { > io_idle(cx->address); > } else > -- > 2.41.0 > >
On Fri, 2023-10-27 at 21:14 +0200, Peter Zijlstra wrote: > On Fri, Oct 27, 2023 at 07:36:51PM +0100, David Woodhouse wrote: > > From: David Woodhouse <dwmw@amazon.co.uk> > > > > Xen HVM guests were observed taking triple-faults when attempting to > > online a previously offlined vCPU. > > > > Investigation showed that the fault was coming from a failing call > > to lockdep_assert_irqs_disabled(), in load_current_idt() which was > > too early in the CPU bringup to actually catch the exception and > > report the failure cleanly. > > > > This was a false positive, caused by acpi_idle_play_dead() setting > > the per-cpu hardirqs_enabled flag by calling safe_halt(). Switch it > > to use raw_safe_halt() instead, which doesn't do so. > > > > Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> > > --- > > We might {also,instead} explicitly set the hardirqs_enabled flag to > > zero when bringing up an AP? > > So I fixed up the idle paths the other day (see all that __cpuidle > stuff) but I've not yet gone through the whole hotplug thing :/ > > This seems right, at this point everything, including RCU is very much > gone, any instrumentation is undesired. > > Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Ping? Who's taking this? Needs a Cc:stable@vger.kernel.org now too, to fix 6.6.x.
On Mon, Nov 20, 2023 at 1:20 PM David Woodhouse <dwmw2@infradead.org> wrote: > > On Fri, 2023-10-27 at 21:14 +0200, Peter Zijlstra wrote: > > On Fri, Oct 27, 2023 at 07:36:51PM +0100, David Woodhouse wrote: > > > From: David Woodhouse <dwmw@amazon.co.uk> > > > > > > Xen HVM guests were observed taking triple-faults when attempting to > > > online a previously offlined vCPU. > > > > > > Investigation showed that the fault was coming from a failing call > > > to lockdep_assert_irqs_disabled(), in load_current_idt() which was > > > too early in the CPU bringup to actually catch the exception and > > > report the failure cleanly. > > > > > > This was a false positive, caused by acpi_idle_play_dead() setting > > > the per-cpu hardirqs_enabled flag by calling safe_halt(). Switch it > > > to use raw_safe_halt() instead, which doesn't do so. > > > > > > Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> > > > --- > > > We might {also,instead} explicitly set the hardirqs_enabled flag to > > > zero when bringing up an AP? > > > > So I fixed up the idle paths the other day (see all that __cpuidle > > stuff) but I've not yet gone through the whole hotplug thing :/ > > > > This seems right, at this point everything, including RCU is very much > > gone, any instrumentation is undesired. > > > > Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> > > Ping? Who's taking this? I'm going to apply it. > Needs a Cc:stable@vger.kernel.org now too, to fix 6.6.x. Sure.
diff --git a/drivers/acpi/processor_idle.c b/drivers/acpi/processor_idle.c index 3a34a8c425fe..55437f5e0c3a 100644 --- a/drivers/acpi/processor_idle.c +++ b/drivers/acpi/processor_idle.c @@ -592,7 +592,7 @@ static int acpi_idle_play_dead(struct cpuidle_device *dev, int index) while (1) { if (cx->entry_method == ACPI_CSTATE_HALT) - safe_halt(); + raw_safe_halt(); else if (cx->entry_method == ACPI_CSTATE_SYSTEMIO) { io_idle(cx->address); } else