acpi_idle: use raw_safe_halt() from acpi_idle_play_dead()


Message ID

a079bba5a0e47d6534b307553fc3772d26ce911b.camel@infradead.org (mailing list archive)

State

Mainlined, archived

Headers

Message-ID: <a079bba5a0e47d6534b307553fc3772d26ce911b.camel@infradead.org>
Subject: [PATCH] acpi_idle: use raw_safe_halt() from acpi_idle_play_dead()
From: David Woodhouse <dwmw2@infradead.org>
To: linux-acpi <linux-acpi@vger.kernel.org>, linux-kernel
	 <linux-kernel@vger.kernel.org>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>, Len Brown <lenb@kernel.org>,
 Juergen Gross <jgross@suse.com>, xen-devel
 <xen-devel@lists.xenproject.org>, Peter Zijlstra <peterz@infradead.org>,
 Ingo Molnar <mingo@redhat.com>, Will Deacon <will@kernel.org>, Waiman Long
 <longman@redhat.com>, Boqun Feng <boqun.feng@gmail.com>
Date: Fri, 27 Oct 2023 19:36:51 +0100
Content-Type: multipart/signed; micalg="sha-256";
 protocol="application/pkcs7-signature";
	boundary="=-m3QW9bQiD6p/UWx5bqvF"
User-Agent: Evolution 3.44.4-0ubuntu2 
Precedence: bulk
MIME-Version: 1.0

Series

acpi_idle: use raw_safe_halt() from acpi_idle_play_dead()

Commit Message

David Woodhouse Oct. 27, 2023, 6:36 p.m. UTC

From: David Woodhouse <dwmw@amazon.co.uk>

Xen HVM guests were observed taking triple-faults when attempting to
online a previously offlined vCPU.

Investigation showed that the fault came from a failing call to
lockdep_assert_irqs_disabled() in load_current_idt(), which runs too
early in CPU bring-up for the exception to be caught and the failure
reported cleanly.

This was a false positive, caused by acpi_idle_play_dead() setting
the per-cpu hardirqs_enabled flag by calling safe_halt(). Switch it
to use raw_safe_halt() instead, which doesn't do so.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
We might {also,instead} explicitly set the hardirqs_enabled flag to
zero when bringing up an AP?

 drivers/acpi/processor_idle.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Peter Zijlstra Oct. 27, 2023, 7:14 p.m. UTC | #1

On Fri, Oct 27, 2023 at 07:36:51PM +0100, David Woodhouse wrote:
> We might {also,instead} explicitly set the hardirqs_enabled flag to
> zero when bringing up an AP?

So I fixed up the idle paths the other day (see all that __cpuidle
stuff) but I've not yet gone through the whole hotplug thing :/

This seems right: at this point everything, including RCU, is very much
gone, so any instrumentation is undesired.

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>


David Woodhouse Nov. 20, 2023, 12:19 p.m. UTC | #2

On Fri, 2023-10-27 at 21:14 +0200, Peter Zijlstra wrote:
> So I fixed up the idle paths the other day (see all that __cpuidle
> stuff) but I've not yet gone through the whole hotplug thing :/
> 
> This seems right, at this point everything, including RCU is very much
> gone, any instrumentation is undesired.
> 
> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>

Ping? Who's taking this?

Needs a Cc:stable@vger.kernel.org now too, to fix 6.6.x.

Rafael J. Wysocki Nov. 20, 2023, 12:57 p.m. UTC | #3

On Mon, Nov 20, 2023 at 1:20 PM David Woodhouse <dwmw2@infradead.org> wrote:
> Ping? Who's taking this?

I'm going to apply it.

> Needs a Cc:stable@vger.kernel.org now too, to fix 6.6.x.

Sure.

Patch

diff --git a/drivers/acpi/processor_idle.c b/drivers/acpi/processor_idle.c
index 3a34a8c425fe..55437f5e0c3a 100644
--- a/drivers/acpi/processor_idle.c
+++ b/drivers/acpi/processor_idle.c
@@ -592,7 +592,7 @@ static int acpi_idle_play_dead(struct cpuidle_device *dev, int index)
 	while (1) {
 
 		if (cx->entry_method == ACPI_CSTATE_HALT)
-			safe_halt();
+			raw_safe_halt();
 		else if (cx->entry_method == ACPI_CSTATE_SYSTEMIO) {
 			io_idle(cx->address);
 		} else
