diff mbox series

[1/2] KVM: arm64: Initialize HCR_EL2.E2H early

Message ID 20250227180526.1204723-2-mark.rutland@arm.com (mailing list archive)
State New
Headers show
Series KVM: arm64: PSCI relay fixes | expand

Commit Message

Mark Rutland Feb. 27, 2025, 6:05 p.m. UTC
On CPUs without FEAT_E2H0, HCR_EL2.E2H is RES1, but may reset to an
UNKNOWN value out of reset and consequently may not read as 1 unless it
has been explicitly initialized.

We handled this for the head.S boot code in commits:

  3944382fa6f22b54 ("arm64: Treat HCR_EL2.E2H as RES1 when ID_AA64MMFR4_EL1.E2H0 is negative")
  b3320142f3db9b3f ("arm64: Fix early handling of FEAT_E2H0 not being implemented")

Unfortunately, we forgot to apply a similar fix to the KVM PSCI entry
points used when relaying CPU_ON, CPU_SUSPEND, and SYSTEM SUSPEND. When
KVM is entered via these entry points, the value of HCR_EL2.E2H may be
consumed before it has been initialized (e.g. by the 'init_el2_state'
macro).

Initialize HCR_EL2.E2H early in these paths such that it can be consumed
reliably. The existing code in head.S is factored out into a new
'init_el2_hcr' macro, and this is used in the __kvm_hyp_init_cpu()
function common to all the relevant PSCI entry points.

For clarity, I've tweaked the assembly used to check whether
ID_AA64MMFR4_EL1.E2H0 is negative. The bitfield is extracted as a signed
value, and this is checked with a signed-less-than (LT) comparison.

As the hyp code will reconfigure HCR_EL2 later in ___kvm_hyp_init(), all
bits other than E2H are initialized to zero in __kvm_hyp_init_cpu().

Fixes: 3944382fa6f22b54 ("arm64: Treat HCR_EL2.E2H as RES1 when ID_AA64MMFR4_EL1.E2H0 is negative")
Fixes: b3320142f3db9b3f ("arm64: Fix early handling of FEAT_E2H0 not being implemented")
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Ahmed Genidi <ahmed.genidi@arm.com>
Cc: Ben Horgan <ben.horgan@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Leo Yan <leo.yan@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Will Deacon <will@kernel.org>
---
 arch/arm64/include/asm/el2_setup.h | 26 ++++++++++++++++++++++++++
 arch/arm64/kernel/head.S           | 19 +------------------
 arch/arm64/kvm/hyp/nvhe/hyp-init.S |  8 +++++++-
 3 files changed, 34 insertions(+), 19 deletions(-)

Comments

Leo Yan Feb. 28, 2025, 9:29 a.m. UTC | #1
Hi Mark,

On Thu, Feb 27, 2025 at 06:05:25PM +0000, Mark Rutland wrote:

[...]

> +.macro init_el2_hcr	val
> +	mov_q	x0, \val
> +
> +	/*
> +	 * Compliant CPUs advertise their VHE-onlyness with
> +	 * ID_AA64MMFR4_EL1.E2H0 < 0. On such CPUs HCR_EL2.E2H is RES1, but it
> +	 * can reset into an UNKNOWN state and might not read as 1 until it has
> +	 * been initialized explicitly.

For ID_AA64MMFR4_EL1.E2H0 < 0 case, the code actually clears the
HCR_EL2.E2H bit.

Hence, the comment should be corrected as: "... it can reset into an
UNKNOWN state and might not read as 0 until it has been initialized
explicitly".

> +	 *
> +	 * Fruity CPUs seem to have HCR_EL2.E2H set to RAO/WI, but
> +	 * don't advertise it (they predate this relaxation).
> +	 *
> +	 * Initalize HCR_EL2.E2H so that later code can rely upon HCR_EL2.E2H
> +	 * indicating whether the CPU is running in E2H mode.
> +	 */

I think it is even better to clear the HCR_E2H bit first. This can
avoid any dependency on the passed parameter 'val'.

        bic     x0, x0, #HCR_E2H

> +	mrs_s	x1, SYS_ID_AA64MMFR4_EL1
> +	sbfx	x1, x1, #ID_AA64MMFR4_EL1_E2H0_SHIFT, #ID_AA64MMFR4_EL1_E2H0_WIDTH
> +	cmp	x1, #0
> +	b.lt	.LnVHE_\@
> +
> +	orr	x0, x0, #HCR_E2H
> +.LnVHE_\@:
> +	msr	hcr_el2, x0
> +	isb
> +.endm
Marc Zyngier Feb. 28, 2025, 9:43 a.m. UTC | #2
On Fri, 28 Feb 2025 09:29:55 +0000,
Leo Yan <leo.yan@arm.com> wrote:
> 
> Hi Mark,
> 
> On Thu, Feb 27, 2025 at 06:05:25PM +0000, Mark Rutland wrote:
> 
> [...]
> 
> > +.macro init_el2_hcr	val
> > +	mov_q	x0, \val
> > +
> > +	/*
> > +	 * Compliant CPUs advertise their VHE-onlyness with
> > +	 * ID_AA64MMFR4_EL1.E2H0 < 0. On such CPUs HCR_EL2.E2H is RES1, but it
> > +	 * can reset into an UNKNOWN state and might not read as 1 until it has
> > +	 * been initialized explicitly.
> 
> For ID_AA64MMFR4_EL1.E2H0 < 0 case, the code actually clears the
> HCR_EL2.E2H bit.
>
> Hence, the comment should be corrected as: "... it can reset into an
> UNKNOWN state and might not read as 0 until it has been initialized
> explicitly".

The comment is just fine. It is the code that is wrong, as it avoids
setting E2H when E2H0 < 0 while we want the exact opposite behaviour.

As a result, 'b.lt' really should be a 'b.ge'. Or the original code
kept as is.

> 
> > +	 *
> > +	 * Fruity CPUs seem to have HCR_EL2.E2H set to RAO/WI, but
> > +	 * don't advertise it (they predate this relaxation).
> > +	 *
> > +	 * Initalize HCR_EL2.E2H so that later code can rely upon HCR_EL2.E2H
> > +	 * indicating whether the CPU is running in E2H mode.
> > +	 */
> 
> I think it is even better to clear the HCR_E2H bit first. This can
> avoid any dependency on the passed parameter 'val'.

What are you trying to avoid? A random value passed as a parameter to
the macro?

	M.
Mark Rutland Feb. 28, 2025, 9:52 a.m. UTC | #3
On Fri, Feb 28, 2025 at 09:43:20AM +0000, Marc Zyngier wrote:
> On Fri, 28 Feb 2025 09:29:55 +0000,
> Leo Yan <leo.yan@arm.com> wrote:
> > 
> > Hi Mark,
> > 
> > On Thu, Feb 27, 2025 at 06:05:25PM +0000, Mark Rutland wrote:
> > 
> > [...]
> > 
> > > +.macro init_el2_hcr	val
> > > +	mov_q	x0, \val
> > > +
> > > +	/*
> > > +	 * Compliant CPUs advertise their VHE-onlyness with
> > > +	 * ID_AA64MMFR4_EL1.E2H0 < 0. On such CPUs HCR_EL2.E2H is RES1, but it
> > > +	 * can reset into an UNKNOWN state and might not read as 1 until it has
> > > +	 * been initialized explicitly.
> > 
> > For ID_AA64MMFR4_EL1.E2H0 < 0 case, the code actually clears the
> > HCR_EL2.E2H bit.
> >
> > Hence, the comment should be corrected as: "... it can reset into an
> > UNKNOWN state and might not read as 0 until it has been initialized
> > explicitly".
> 
> The comment is just fine. It is the code that is wrong, as it avoids
> setting E2H when E2H0 < 0 while we want the exact opposite behaviour.
> 
> As a result, 'b.lt' really should be a 'b.ge'. Or the original code
> kept as is.

Ugh, yes. I got confused and got the condition backwards.

Either works. Using 'b.ge' is closer to my intention -- I found the
'tbz' of the sign bit somewhat surprising and that needed a longer line
after the lable name changed.

Would you like me to respin, or would you be hapy to fix up when
applying?

Mark.
Leo Yan Feb. 28, 2025, 10:14 a.m. UTC | #4
On Fri, Feb 28, 2025 at 09:43:20AM +0000, Marc Zyngier wrote:

[...]

> On Fri, 28 Feb 2025 09:29:55 +0000,
> Leo Yan <leo.yan@arm.com> wrote:
> >
> > Hi Mark,
> >
> > On Thu, Feb 27, 2025 at 06:05:25PM +0000, Mark Rutland wrote:
> >
> > [...]
> >
> > > +.macro init_el2_hcr        val
> > > +   mov_q   x0, \val
> > > +
> > > +   /*
> > > +    * Compliant CPUs advertise their VHE-onlyness with
> > > +    * ID_AA64MMFR4_EL1.E2H0 < 0. On such CPUs HCR_EL2.E2H is RES1, but it
> > > +    * can reset into an UNKNOWN state and might not read as 1 until it has
> > > +    * been initialized explicitly.
> >
> > For ID_AA64MMFR4_EL1.E2H0 < 0 case, the code actually clears the
> > HCR_EL2.E2H bit.
> >
> > Hence, the comment should be corrected as: "... it can reset into an
> > UNKNOWN state and might not read as 0 until it has been initialized
> > explicitly".
> 
> The comment is just fine. It is the code that is wrong, as it avoids
> setting E2H when E2H0 < 0 while we want the exact opposite behaviour.
> 
> As a result, 'b.lt' really should be a 'b.ge'. Or the original code
> kept as is.

Thanks for correcting.

> > > +    *
> > > +    * Fruity CPUs seem to have HCR_EL2.E2H set to RAO/WI, but
> > > +    * don't advertise it (they predate this relaxation).
> > > +    *
> > > +    * Initalize HCR_EL2.E2H so that later code can rely upon HCR_EL2.E2H
> > > +    * indicating whether the CPU is running in E2H mode.
> > > +    */
> >
> > I think it is even better to clear the HCR_E2H bit first. This can
> > avoid any dependency on the passed parameter 'val'.
> 
> What are you trying to avoid? A random value passed as a parameter to
> the macro?

Yes, an unexpected value passed to the macro is a concern.

If the intentaion here is only to set the HCR_EL2.E2H bit when E2H0 < 0,
we don't need to cosndier other configurations and current code is just
fine?

Thanks,
Leo
Marc Zyngier Feb. 28, 2025, 10:20 a.m. UTC | #5
On Fri, 28 Feb 2025 09:52:50 +0000,
Mark Rutland <mark.rutland@arm.com> wrote:
> 
> On Fri, Feb 28, 2025 at 09:43:20AM +0000, Marc Zyngier wrote:
> > On Fri, 28 Feb 2025 09:29:55 +0000,
> > Leo Yan <leo.yan@arm.com> wrote:
> > > 
> > > Hi Mark,
> > > 
> > > On Thu, Feb 27, 2025 at 06:05:25PM +0000, Mark Rutland wrote:
> > > 
> > > [...]
> > > 
> > > > +.macro init_el2_hcr	val
> > > > +	mov_q	x0, \val
> > > > +
> > > > +	/*
> > > > +	 * Compliant CPUs advertise their VHE-onlyness with
> > > > +	 * ID_AA64MMFR4_EL1.E2H0 < 0. On such CPUs HCR_EL2.E2H is RES1, but it
> > > > +	 * can reset into an UNKNOWN state and might not read as 1 until it has
> > > > +	 * been initialized explicitly.
> > > 
> > > For ID_AA64MMFR4_EL1.E2H0 < 0 case, the code actually clears the
> > > HCR_EL2.E2H bit.
> > >
> > > Hence, the comment should be corrected as: "... it can reset into an
> > > UNKNOWN state and might not read as 0 until it has been initialized
> > > explicitly".
> > 
> > The comment is just fine. It is the code that is wrong, as it avoids
> > setting E2H when E2H0 < 0 while we want the exact opposite behaviour.
> > 
> > As a result, 'b.lt' really should be a 'b.ge'. Or the original code
> > kept as is.
> 
> Ugh, yes. I got confused and got the condition backwards.
> 
> Either works. Using 'b.ge' is closer to my intention -- I found the
> 'tbz' of the sign bit somewhat surprising and that needed a longer line
> after the lable name changed.
> 
> Would you like me to respin, or would you be hapy to fix up when
> applying?

I can fix it on the fly, but it needs retesting, as I don't understand
how things could work in this state.

Thanks,

	M.
Mark Rutland Feb. 28, 2025, 11:13 a.m. UTC | #6
On Fri, Feb 28, 2025 at 10:20:38AM +0000, Marc Zyngier wrote:
> On Fri, 28 Feb 2025 09:52:50 +0000,
> Mark Rutland <mark.rutland@arm.com> wrote:
> > 
> > On Fri, Feb 28, 2025 at 09:43:20AM +0000, Marc Zyngier wrote:
> > > On Fri, 28 Feb 2025 09:29:55 +0000,
> > > Leo Yan <leo.yan@arm.com> wrote:
> > > > 
> > > > Hi Mark,
> > > > 
> > > > On Thu, Feb 27, 2025 at 06:05:25PM +0000, Mark Rutland wrote:
> > > > 
> > > > [...]
> > > > 
> > > > > +.macro init_el2_hcr	val
> > > > > +	mov_q	x0, \val
> > > > > +
> > > > > +	/*
> > > > > +	 * Compliant CPUs advertise their VHE-onlyness with
> > > > > +	 * ID_AA64MMFR4_EL1.E2H0 < 0. On such CPUs HCR_EL2.E2H is RES1, but it
> > > > > +	 * can reset into an UNKNOWN state and might not read as 1 until it has
> > > > > +	 * been initialized explicitly.
> > > > 
> > > > For ID_AA64MMFR4_EL1.E2H0 < 0 case, the code actually clears the
> > > > HCR_EL2.E2H bit.
> > > >
> > > > Hence, the comment should be corrected as: "... it can reset into an
> > > > UNKNOWN state and might not read as 0 until it has been initialized
> > > > explicitly".
> > > 
> > > The comment is just fine. It is the code that is wrong, as it avoids
> > > setting E2H when E2H0 < 0 while we want the exact opposite behaviour.
> > > 
> > > As a result, 'b.lt' really should be a 'b.ge'. Or the original code
> > > kept as is.
> > 
> > Ugh, yes. I got confused and got the condition backwards.
> > 
> > Either works. Using 'b.ge' is closer to my intention -- I found the
> > 'tbz' of the sign bit somewhat surprising and that needed a longer line
> > after the lable name changed.
> > 
> > Would you like me to respin, or would you be hapy to fix up when
> > applying?
> 
> I can fix it on the fly, but it needs retesting, as I don't understand
> how things could work in this state.

This happened to work by virtue of coincidence :/

Critically, I have not tested this on a CPU where HCR_EL2.E2H is
writeable but one polarity has no effect, as I don't have such a CPU to
hand. IIUC you tested that with hVHE under NV per commit:

  b3320142f3db9b3f ("arm64: Fix early handling of FEAT_E2H0 not being implemented")

... but I don't currently have a good setup to test that configuration.

In other cases, this largely falls out in the wash, e.g.

* On a CPU without E2H, where the HCR_EL2.E2H bit is implemented as
  RAZ/WI, the bit always reads as 0. Trying to set the bit has no
  effect. Later reads see 0.

  Hence this case happens to work.

* On a CPU with E2H and without FEAT_E2H0, where the HCR_EL2.E2H bit is
  implemented as RAO/WI, the bit always reads as 1. Trying to clear the
  bit has no effect. Later reads see 1.

  Hence this case happens to work.

* On a CPU with E2H and with FEAT_E2H0, there the HCR_EL2.E2H bit is
  implemented and has an effect, writing to the bit moves the CPU into
  E2H mode.

  The early boot code handles this the same as FEAT_E2H0 being absent,
  and so that happens to work. I haven't dug into how HCR_EL2 gets
  properly initialized later by KVM, but testing seems to indicate that
  this works.

Mark.
diff mbox series

Patch

diff --git a/arch/arm64/include/asm/el2_setup.h b/arch/arm64/include/asm/el2_setup.h
index 25e1626517500..bc8ebd55788ac 100644
--- a/arch/arm64/include/asm/el2_setup.h
+++ b/arch/arm64/include/asm/el2_setup.h
@@ -16,6 +16,32 @@ 
 #include <asm/sysreg.h>
 #include <linux/irqchip/arm-gic-v3.h>
 
+.macro init_el2_hcr	val
+	mov_q	x0, \val
+
+	/*
+	 * Compliant CPUs advertise their VHE-onlyness with
+	 * ID_AA64MMFR4_EL1.E2H0 < 0. On such CPUs HCR_EL2.E2H is RES1, but it
+	 * can reset into an UNKNOWN state and might not read as 1 until it has
+	 * been initialized explicitly.
+	 *
+	 * Fruity CPUs seem to have HCR_EL2.E2H set to RAO/WI, but
+	 * don't advertise it (they predate this relaxation).
+	 *
+	 * Initalize HCR_EL2.E2H so that later code can rely upon HCR_EL2.E2H
+	 * indicating whether the CPU is running in E2H mode.
+	 */
+	mrs_s	x1, SYS_ID_AA64MMFR4_EL1
+	sbfx	x1, x1, #ID_AA64MMFR4_EL1_E2H0_SHIFT, #ID_AA64MMFR4_EL1_E2H0_WIDTH
+	cmp	x1, #0
+	b.lt	.LnVHE_\@
+
+	orr	x0, x0, #HCR_E2H
+.LnVHE_\@:
+	msr	hcr_el2, x0
+	isb
+.endm
+
 .macro __init_el2_sctlr
 	mov_q	x0, INIT_SCTLR_EL2_MMU_OFF
 	msr	sctlr_el2, x0
diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
index 5ab1970ee5436..2d56459d6c94c 100644
--- a/arch/arm64/kernel/head.S
+++ b/arch/arm64/kernel/head.S
@@ -298,25 +298,8 @@  SYM_INNER_LABEL(init_el2, SYM_L_LOCAL)
 	msr	sctlr_el2, x0
 	isb
 0:
-	mov_q	x0, HCR_HOST_NVHE_FLAGS
-
-	/*
-	 * Compliant CPUs advertise their VHE-onlyness with
-	 * ID_AA64MMFR4_EL1.E2H0 < 0. HCR_EL2.E2H can be
-	 * RES1 in that case. Publish the E2H bit early so that
-	 * it can be picked up by the init_el2_state macro.
-	 *
-	 * Fruity CPUs seem to have HCR_EL2.E2H set to RAO/WI, but
-	 * don't advertise it (they predate this relaxation).
-	 */
-	mrs_s	x1, SYS_ID_AA64MMFR4_EL1
-	tbz	x1, #(ID_AA64MMFR4_EL1_E2H0_SHIFT + ID_AA64MMFR4_EL1_E2H0_WIDTH - 1), 1f
-
-	orr	x0, x0, #HCR_E2H
-1:
-	msr	hcr_el2, x0
-	isb
 
+	init_el2_hcr	HCR_HOST_NVHE_FLAGS
 	init_el2_state
 
 	/* Hypervisor stub */
diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-init.S b/arch/arm64/kvm/hyp/nvhe/hyp-init.S
index fc18662260676..3fb5504a7d7fc 100644
--- a/arch/arm64/kvm/hyp/nvhe/hyp-init.S
+++ b/arch/arm64/kvm/hyp/nvhe/hyp-init.S
@@ -73,8 +73,12 @@  __do_hyp_init:
 	eret
 SYM_CODE_END(__kvm_hyp_init)
 
+/*
+ * Initialize EL2 CPU state to sane values.
+ *
+ * HCR_EL2.E2H must have been initialized already.
+ */
 SYM_CODE_START_LOCAL(__kvm_init_el2_state)
-	/* Initialize EL2 CPU state to sane values. */
 	init_el2_state				// Clobbers x0..x2
 	finalise_el2_state
 	ret
@@ -206,6 +210,8 @@  SYM_CODE_START_LOCAL(__kvm_hyp_init_cpu)
 
 2:	msr	SPsel, #1			// We want to use SP_EL{1,2}
 
+	init_el2_hcr	0
+
 	bl	__kvm_init_el2_state
 
 	__init_el2_nvhe_prepare_eret